java.lang.Object
  org.jpedal.grouping.PdfGroupingAlgorithms
public class PdfGroupingAlgorithms
Applies heuristics to unstructured PDF text to create content
| Field Summary | |
|---|---|
| static char | MARKER2 |
| static boolean | oldTextExtraction: flag used to debug new text routines |
| static boolean | useUnrotatedCoords |
| Constructor Summary | |
|---|---|
| PdfGroupingAlgorithms() | |
| PdfGroupingAlgorithms(org.jpedal.objects.PdfData pdf_data) | Creates a new instance, passing in raw data. |
| Method Summary | |
|---|---|
| void | cleanupText(org.jpedal.objects.PdfData pdf_data): Generic decode that only cleans up the data and removes the library's embedded information. |
| java.util.Map | extractTextAsTable(int x1, int y1, int x2, int y2, int pageNumber, boolean isCSV, boolean keepFontInfo, boolean keepWidthInfo, boolean keepAlignmentInfo, int borderWidth, boolean AddCustomTags): Calls various low-level merging routines to extract the text as a table. isCSV sets whether the output is in XHTML or CSV format. XHTML output also has options to include font tags (keepFontInfo), preserve widths (keepWidthInfo), try to preserve alignment (keepAlignmentInfo), and set a table border width (borderWidth). AddCustomTags should always be set to false. |
| java.util.Vector | extractTextAsWordlist(int x1, int y1, int x2, int y2, int page_number, boolean estimateParagraphs, boolean breakFragments, java.lang.String punctuation): Algorithm (hardcoded into the program) that places the data for a page into an object. Co-ordinates are x1,y1 (top left-hand corner) and x2,y2 (bottom right-hand corner). If the co-ordinates are not valid, a PdfException is thrown. Returns a Vector with the words and their co-ordinates (all values are Strings). |
| java.lang.String | extractTextInRectangle(int x1, int y1, int x2, int y2, int page_number, boolean estimateParagraphs, boolean breakFragments): Algorithm (hardcoded into the program) that places the data for a page into an object. Co-ordinates are x1,y1 (top left-hand corner) and x2,y2 (bottom right-hand corner). If the co-ordinates are not valid, a PdfException is thrown. |
| java.util.List | findMultipleTermsInRectangle(int x1, int y1, int x2, int y2, int rotation, int page_number, java.lang.String[] terms, boolean orderResults, int searchType, org.jpedal.grouping.SearchListener listener): Algorithm to find multiple text terms in the x1,y1,x2,y2 rectangle on page_number. |
| java.util.SortedMap | findMultipleTermsInRectangleWithMatchingTeasers(int x1, int y1, int x2, int y2, int rotation, int page_number, java.lang.String[] terms, int searchType, org.jpedal.grouping.SearchListener listener): Algorithm to find multiple text terms in the x1,y1,x2,y2 rectangle on page_number, with matching teasers. |
| float[] | findTextInRectangle(int x1, int y1, int x2, int y2, int page_number, java.lang.String textValue): Deprecated. |
| float[] | findTextInRectangle(int x1, int y1, int x2, int y2, int page_number, java.lang.String textValue, int searchType): Algorithm to find textValue in the x1,y1,x2,y2 rectangle on page_number. |
| void | generateTeasers(): Tells the text search to generate teasers as well. |
| float[] | getEndPoints(): Returns the endpoints from the last text search. |
| java.lang.String[] | getTeasers(): Returns the text teasers from the last text search, if generateTeasers() was called before the search. |
| static java.lang.String | removeHiddenMarkers(java.lang.String contents): Method to show the data without encoding. |
| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static char MARKER2
public static boolean oldTextExtraction
public static boolean useUnrotatedCoords
| Constructor Detail |
|---|
public PdfGroupingAlgorithms()
public PdfGroupingAlgorithms(org.jpedal.objects.PdfData pdf_data)
| Method Detail |
|---|
public final void cleanupText(org.jpedal.objects.PdfData pdf_data)
public static java.lang.String removeHiddenMarkers(java.lang.String contents)
public final java.util.Map extractTextAsTable(int x1,
int y1,
int x2,
int y2,
int pageNumber,
boolean isCSV,
boolean keepFontInfo,
boolean keepWidthInfo,
boolean keepAlignmentInfo,
int borderWidth,
boolean AddCustomTags)
throws PdfException
Throws:
PdfException
public final java.util.Vector extractTextAsWordlist(int x1,
int y1,
int x2,
int y2,
int page_number,
boolean estimateParagraphs,
boolean breakFragments,
java.lang.String punctuation)
throws PdfException
Throws:
PdfException
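The summary for extractTextAsWordlist says the returned Vector holds the words and their co-ordinates, all as Strings. The following is a minimal sketch of unpacking such a list. It assumes, which the text above does not state, that each word is followed by four co-ordinate Strings in the order x1, y1, x2, y2. The `WordBox` class and `parse` helper are hypothetical names, and a plain List with made-up sample data stands in for the Vector so the example runs without JPedal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WordListSketch {

    /** Hypothetical holder for one word and its bounding box. */
    static final class WordBox {
        final String word;
        final float x1, y1, x2, y2;

        WordBox(String word, float x1, float y1, float x2, float y2) {
            this.word = word;
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
        }
    }

    /**
     * Unpacks a flat list of Strings into WordBox objects.
     * ASSUMPTION: entries come in groups of five (word, x1, y1, x2, y2).
     */
    static List<WordBox> parse(List<String> raw) {
        if (raw.size() % 5 != 0) {
            throw new IllegalArgumentException(
                    "expected groups of five entries, got " + raw.size());
        }
        List<WordBox> boxes = new ArrayList<>();
        for (int i = 0; i < raw.size(); i += 5) {
            boxes.add(new WordBox(
                    raw.get(i),
                    Float.parseFloat(raw.get(i + 1)),
                    Float.parseFloat(raw.get(i + 2)),
                    Float.parseFloat(raw.get(i + 3)),
                    Float.parseFloat(raw.get(i + 4))));
        }
        return boxes;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the Vector returned by extractTextAsWordlist.
        List<String> raw = Arrays.asList(
                "Hello", "72", "720", "100", "708",
                "world", "104", "720", "135", "708");
        for (WordBox b : parse(raw)) {
            System.out.println(b.word + " at (" + b.x1 + "," + b.y1
                    + ") to (" + b.x2 + "," + b.y2 + ")");
        }
    }
}
```

If the real layout differs (for example, a different number of values per word), only the group size and field order in `parse` would need to change.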
public final java.lang.String extractTextInRectangle(int x1,
int y1,
int x2,
int y2,
int page_number,
boolean estimateParagraphs,
boolean breakFragments)
throws PdfException
Throws:
PdfException
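The extraction methods expect x1,y1 to be the top left-hand corner and x2,y2 the bottom right, and throw a PdfException otherwise. The sketch below shows one way a caller might check or normalise a rectangle before calling them. It assumes y increases up the page, as in PDF user space, so the top edge has the larger y value; the exact validity rule the library applies is not documented here, and both helper names are hypothetical.

```java
import java.util.Arrays;

final class RectangleCheckSketch {

    /**
     * Returns true when (x1,y1) can be the top left corner and (x2,y2) the bottom right.
     * ASSUMPTION: y grows up the page, so the top edge has the larger y value.
     */
    static boolean isTopLeftToBottomRight(int x1, int y1, int x2, int y2) {
        return x1 < x2 && y1 > y2;
    }

    /** Reorders two arbitrary opposite corners into {x1, y1, x2, y2}, top left first. */
    static int[] normalise(int xa, int ya, int xb, int yb) {
        return new int[] {
            Math.min(xa, xb), Math.max(ya, yb),
            Math.max(xa, xb), Math.min(ya, yb)
        };
    }

    public static void main(String[] args) {
        // Corners given in the "wrong" order (bottom right first).
        int[] r = normalise(500, 100, 50, 700);
        System.out.println(Arrays.toString(r)); // prints [50, 700, 500, 100]
        System.out.println(isTopLeftToBottomRight(r[0], r[1], r[2], r[3])); // prints true
    }
}
```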
public java.util.SortedMap findMultipleTermsInRectangleWithMatchingTeasers(int x1,
int y1,
int x2,
int y2,
int rotation,
int page_number,
java.lang.String[] terms,
int searchType,
org.jpedal.grouping.SearchListener listener)
throws PdfException
Parameters:
x1 - the left x co-ordinate
y1 - the upper y co-ordinate
x2 - the right x co-ordinate
y2 - the lower y co-ordinate
rotation - the rotation of the page to be searched
page_number - the page number to search on
terms - the terms to search for
searchType - the search type, made up from one or more constants obtained from the SearchType class
listener - an implementation of SearchListener is required; this enables searching to be cancelled
Throws:
PdfException - if the co-ordinates are not valid
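The searchType parameter is described as being made up from one or more constants of the SearchType class, which suggests the constants are bit flags combined into a single int. The sketch below illustrates that pattern only. The constant names and values are hypothetical placeholders, not the real SearchType values, and the `combine` and `has` helpers are not part of JPedal.

```java
final class SearchFlagsSketch {

    // HYPOTHETICAL flag values for illustration only;
    // real code would use the constants from the SearchType class.
    static final int WHOLE_WORDS = 1;
    static final int CASE_SENSITIVE = 2;
    static final int FIRST_MATCH_ONLY = 4;

    /** Combines any number of flags into one int with bitwise OR. */
    static int combine(int... flags) {
        int result = 0;
        for (int f : flags) {
            result |= f;
        }
        return result;
    }

    /** Tests whether a combined value contains a given flag. */
    static boolean has(int searchType, int flag) {
        return (searchType & flag) == flag;
    }

    public static void main(String[] args) {
        int searchType = combine(WHOLE_WORDS, CASE_SENSITIVE);
        System.out.println(searchType);                        // prints 3
        System.out.println(has(searchType, CASE_SENSITIVE));   // prints true
        System.out.println(has(searchType, FIRST_MATCH_ONLY)); // prints false
    }
}
```

With real flag constants the same effect is usually written inline, for example `FLAG_A | FLAG_B`, and the result is passed as the searchType argument.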
public java.util.List findMultipleTermsInRectangle(int x1,
int y1,
int x2,
int y2,
int rotation,
int page_number,
java.lang.String[] terms,
boolean orderResults,
int searchType,
org.jpedal.grouping.SearchListener listener)
throws PdfException
Parameters:
x1 - the left x co-ordinate
y1 - the upper y co-ordinate
x2 - the right x co-ordinate
y2 - the lower y co-ordinate
rotation - the rotation of the page to be searched
page_number - the page number to search on
terms - the terms to search for
orderResults - if true, the returned list is ordered so that the resulting rectangles are in a logical order descending down the page; if false, rectangles for multiple terms are grouped together
searchType - the search type, made up from one or more constants obtained from the SearchType class
listener - an implementation of SearchListener is required; this enables searching to be cancelled
Throws:
PdfException - if the co-ordinates are not valid
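The listener parameter exists so that a long-running search can be cancelled. The sketch below shows the general idea of a search loop that polls a listener and stops early. The `CancelCheck` interface and its `isCancelled` method are hypothetical stand-ins for SearchListener, whose actual methods are not described on this page, and the search runs over plain Strings rather than PDF text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CancellableSearchSketch {

    /** HYPOTHETICAL stand-in for SearchListener; the real interface may differ. */
    interface CancelCheck {
        boolean isCancelled();
    }

    /** Returns the indices of lines containing any term, stopping early once cancelled. */
    static List<Integer> find(List<String> lines, String[] terms, CancelCheck listener) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (listener.isCancelled()) {
                break; // caller asked to stop; return what was found so far
            }
            for (String term : terms) {
                if (lines.get(i).contains(term)) {
                    hits.add(i);
                    break; // one hit per line is enough for this sketch
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("alpha beta", "gamma", "beta delta");
        String[] terms = {"beta", "delta"};
        System.out.println(find(lines, terms, () -> false)); // prints [0, 2]
        System.out.println(find(lines, terms, () -> true));  // prints []
    }
}
```

A caller that never needs to cancel can pass a listener that always reports "not cancelled", as the first call in `main` does.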
public final float[] findTextInRectangle(int x1,
int y1,
int x2,
int y2,
int page_number,
java.lang.String textValue)
throws PdfException
Throws:
PdfException
public final float[] findTextInRectangle(int x1,
int y1,
int x2,
int y2,
int page_number,
java.lang.String textValue,
int searchType)
throws PdfException
Parameters:
x1 - the left x co-ordinate
y1 - the upper y co-ordinate
x2 - the right x co-ordinate
y2 - the lower y co-ordinate
page_number - the page number to search on
textValue - the text string to search for
searchType - the search type, made up from one or more constants obtained from the SearchType class
Throws:
PdfException - if the co-ordinates are not valid
public float[] getEndPoints()
public java.lang.String[] getTeasers()
public void generateTeasers()