Extract Text In Rectangle from PDF files - output in table
| Description | Extract PDF text on a PDF page using PDF co-ordinates - output in table |
| Jar Path | org.jpedal.examples.text.ExtractTextInRectangleAsTable |
How to run the code
| Windows: | java -cp %jpedalDir%/jpedal.jar org/jpedal/examples/text/ExtractTextInRectangleAsTable inputValues |
| Mac OS X / Linux: | java -cp $jpedalDir/jpedal.jar org/jpedal/examples/text/ExtractTextInRectangleAsTable inputValues |
Click here for help on running Java programs from the command line
An explanation of the input values
This example expects between one and five space-delimited input values.
| First value: | The PDF filename (including the path if needed) or a directory containing PDF files. If it contains spaces it must be enclosed in double quotes (e.g. "C:/Path with spaces/"). |
| Remaining values: | If you wish to extract PDF text from a specified area of the page, enter the co-ordinates of the bounding rectangle in the order x1 y1 x2 y2 |
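As an illustration of the input format described above, the following is a minimal sketch of how such values could be parsed and checked before use. It is not JPedal's own code: the class RectangleArgs and its fields are hypothetical, whole-number co-ordinates are an assumption, and because the handling of two to four values is not described here, the sketch simply rejects them.

```java
import java.util.Arrays;

// Hypothetical helper, not part of JPedal: parses "file [x1 y1 x2 y2]".
final class RectangleArgs {
    final String pdfPath;   // PDF file or directory of PDF files
    final int[] rectangle;  // {x1, y1, x2, y2}, or null for the whole page

    private RectangleArgs(String pdfPath, int[] rectangle) {
        this.pdfPath = pdfPath;
        this.rectangle = rectangle;
    }

    static RectangleArgs parse(String[] args) {
        // Assumption: either the file alone, or the file plus all four co-ordinates.
        if (args.length != 1 && args.length != 5) {
            throw new IllegalArgumentException("Expected: file [x1 y1 x2 y2]");
        }
        if (args.length == 1) {
            return new RectangleArgs(args[0], null);
        }
        int[] r = new int[4];
        for (int i = 0; i < 4; i++) {
            // Assumption: whole-number co-ordinates; throws NumberFormatException otherwise.
            r[i] = Integer.parseInt(args[i + 1]);
        }
        // x1,y1 is the top left and x2,y2 the bottom right, with y running up the page,
        // so a valid rectangle has x1 < x2 and y1 > y2.
        if (r[0] >= r[2] || r[1] <= r[3]) {
            throw new IllegalArgumentException("Need x1 < x2 and y1 > y2");
        }
        return new RectangleArgs(args[0], r);
    }

    public static void main(String[] args) {
        RectangleArgs parsed = parse(args);
        String area = parsed.rectangle == null ? "whole page" : Arrays.toString(parsed.rectangle);
        System.out.println(parsed.pdfPath + " -> " + area);
    }
}
```

Checking the order of the co-ordinates up front gives a clear error message when the two corners have been swapped.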
Other details/screenshot
This example will extract PDF text from either the whole PDF page, or any rectangular area of the PDF page, and calculate the table structure containing the PDF text.
For best results, try to avoid including PDF text which is not contained within the table.
Output is written to a directory called text, which contains sub-directories holding the extracted files. Each page in the PDF document is extracted to a separate file, named using the page number, which contains the PDF text from that page in a tabular format. The output can be either CSV (the default) or XML.
NB: PDF co-ordinates start at the bottom left corner of the page and run up the page, so y values increase towards the top. x1,y1 is the top left corner of the rectangle and x2,y2 the bottom right, so y1 is larger than y2.
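Rectangles measured in a viewer or image editor usually have their origin at the top left with y running down the page, so they need flipping before being passed as x1 y1 x2 y2. The sketch below shows that conversion. The class PdfRectangle and its method are hypothetical (not part of JPedal), and it assumes the page origin is at 0,0 with no rotation or crop offset, and that the page height is known in the same units (842 is the height of an A4 page in points).

```java
// Hypothetical helper, not part of JPedal: converts a top-down rectangle
// (origin top left, y running down) into PDF co-ordinates
// (origin bottom left, y running up).
final class PdfRectangle {

    // Returns {x1, y1, x2, y2}, where x1,y1 is the top left corner and
    // x2,y2 the bottom right corner in PDF co-ordinates.
    static int[] fromTopDown(int left, int top, int right, int bottom, int pageHeight) {
        int x1 = left;
        int y1 = pageHeight - top;     // top edge, measured from the bottom of the page
        int x2 = right;
        int y2 = pageHeight - bottom;  // bottom edge, measured from the bottom of the page
        return new int[] {x1, y1, x2, y2};
    }

    public static void main(String[] args) {
        // A region 100 units below the top of an A4 page (842 points high).
        int[] r = fromTopDown(50, 100, 300, 250, 842);
        // Prints the values in the order expected on the command line: 50 742 300 592
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]);
    }
}
```

Only the y values change; x is measured from the left edge in both systems.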
Links to related articles
Getting started tutorial
ExtractTextInRectangle - extract PDF text in any rectangular area.
Additional optional libraries lists items which may be needed for some PDF files or functionality.




