
Extract Text In Rectangle from PDF files - output in table

Description: Extract the text from a PDF page, or from a rectangle on it given in PDF co-ordinates, and output it as a table.
Jar Path: org.jpedal.examples.text.ExtractTextInRectangleAsTable


How to run the code

Windows: java -cp %jpedalDir%/jpedal.jar org/jpedal/examples/text/ExtractTextInRectangleAsTable inputValues
macOS / Linux: java -cp $jpedalDir/jpedal.jar org/jpedal/examples/text/ExtractTextInRectangleAsTable inputValues


 

An explanation of the input values

This example expects between one and five space-delimited input values.

First value: The PDF filename (including the path if needed) or a directory containing PDF files. If it contains spaces, it must be enclosed in double quotes (e.g. "C:/Path with spaces/").
Remaining values: To extract PDF text from a specific area of the page rather than the whole page, enter the co-ordinates of the bounding rectangle in the order x1 y1 x2 y2.
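The argument layout described above can be sketched in plain Java. This is only an illustration and not the example's actual source: the RectArgs class is hypothetical, it assumes the co-ordinates are whole numbers, and it assumes that only one value (whole page) or five values (file plus rectangle) are meaningful.

```java
import java.util.Arrays;

// Hypothetical helper, not part of JPedal: illustrates the argument layout only.
final class RectArgs {
    final String path; // PDF file, or a directory containing PDF files
    final int[] rect;  // {x1, y1, x2, y2}, or null to mean the whole page

    private RectArgs(String path, int[] rect) {
        this.path = path;
        this.rect = rect;
    }

    // Assumption: one value selects the whole page, five values select a rectangle.
    static RectArgs parse(String[] args) {
        if (args.length == 1) {
            return new RectArgs(args[0], null);
        }
        if (args.length == 5) {
            int[] rect = new int[4];
            for (int i = 0; i < 4; i++) {
                // Assumption: co-ordinates are given as integers.
                rect[i] = Integer.parseInt(args[i + 1]);
            }
            return new RectArgs(args[0], rect);
        }
        throw new IllegalArgumentException("Expected 1 or 5 values, got " + args.length);
    }

    public static void main(String[] args) {
        // Sample values in the order: file x1 y1 x2 y2
        RectArgs parsed = parse(new String[] {"sample.pdf", "50", "742", "400", "542"});
        String area = parsed.rect == null ? "whole page" : Arrays.toString(parsed.rect);
        System.out.println(parsed.path + " -> " + area);
    }
}
```

The shell handles the double quotes around a path with spaces, so the quoted path arrives in the program as a single value.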

 

Other details/screenshot

This example will extract PDF text from either the whole PDF page or any rectangular area of the PDF page, and calculate the table structure containing the PDF text.

For best results, try to avoid including PDF text which is not contained within the table.

The output is written to a directory called text, which contains sub-directories holding the extracted files. Each page in the PDF document is extracted to a separate file, named using the page number, which contains the PDF text from that page in a tabular format. The output can be either XML or CSV (the default).

NB: PDF co-ordinates start at the bottom left corner and run up the page. x1,y1 is the top left corner of the rectangle and x2,y2 the bottom right, so y1 is larger than y2.
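If the rectangle was measured with the origin at the top left (as in most image and screen tools), it needs flipping before use. The following is a minimal sketch of that conversion. The PdfRect class is hypothetical and not part of JPedal, and it assumes an unrotated page whose origin is at 0,0 and whose height in PDF units is known (842 is used here as the height of an A4 page in points).

```java
// Hypothetical helper, not part of JPedal.
final class PdfRect {

    // Converts a rectangle measured from the top left of the page
    // into x1 y1 x2 y2 measured from the bottom left.
    // Assumption: the page is unrotated and its origin is at 0,0.
    static int[] fromTopLeftOrigin(int left, int top, int right, int bottom, int pageHeight) {
        int x1 = left;
        int y1 = pageHeight - top;    // top edge becomes the larger y value
        int x2 = right;
        int y2 = pageHeight - bottom; // bottom edge becomes the smaller y value
        return new int[] {x1, y1, x2, y2};
    }

    public static void main(String[] args) {
        int[] r = fromTopLeftOrigin(50, 100, 400, 300, 842);
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]); // prints "50 742 400 542"
    }
}
```

The four printed numbers are in the x1 y1 x2 y2 order expected as input values.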

 

Links to related articles

Getting started tutorial
ExtractTextInRectangle - extract PDF text in any rectangular area.
Additional optional libraries - lists items which may be needed for some PDF files or functionality.

 
