
PDF to text conversion

Description: PDF to text conversion from any rectangular area of the page
Jar Path: org.jpedal.examples.text.ExtractTextInRectangle

How to run the code

Windows: java -cp %jpedalDir%/jpedal.jar org/jpedal/examples/text/ExtractTextInRectangle inputValues
macOS / Linux: java -cp $jpedalDir/jpedal.jar org/jpedal/examples/text/ExtractTextInRectangle inputValues

An explanation of the input values

This example expects between one and five space-delimited input values.

First value: The PDF filename (including the path if needed) or a directory containing PDF files. If it contains spaces, it must be enclosed in double quotes (e.g. "C:/Path with spaces/").
Remaining values: If you wish to extract PDF text from a specified area of the page, enter the PDF co-ordinates of the bounding rectangle in the order x1 y1 x2 y2.
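
The input convention above can be sketched as a small argument parser. This is only an illustration, not the code in ExtractTextInRectangle: the class name RectangleArgs is hypothetical, and the sketch assumes that the co-ordinates are whole numbers, that a single value means the whole page, and that either one value or all five are supplied (how the real example treats two to four values is not stated here).

```java
import java.util.Arrays;

// Hypothetical helper, not part of JPedal: parses "<pdf>" or "<pdf> x1 y1 x2 y2".
final class RectangleArgs {
    final String pdfPath; // file or directory, exactly as typed
    final int[] rect;     // {x1, y1, x2, y2}, or null when no area was given

    private RectangleArgs(String pdfPath, int[] rect) {
        this.pdfPath = pdfPath;
        this.rect = rect;
    }

    static RectangleArgs parse(String[] args) {
        // Assumption: only 1 value (no area) or 5 values (path + rectangle) are accepted.
        if (args.length != 1 && args.length != 5) {
            throw new IllegalArgumentException(
                "Expected <pdf> or <pdf> x1 y1 x2 y2, got " + args.length + " value(s)");
        }
        if (args.length == 1) {
            return new RectangleArgs(args[0], null);
        }
        int[] rect = new int[4];
        for (int i = 0; i < 4; i++) {
            // Assumption: whole-number co-ordinates; throws NumberFormatException otherwise.
            rect[i] = Integer.parseInt(args[i + 1]);
        }
        return new RectangleArgs(args[0], rect);
    }

    public static void main(String[] args) {
        RectangleArgs parsed = parse(args);
        System.out.println("PDF: " + parsed.pdfPath);
        System.out.println("Area: "
            + (parsed.rect == null ? "none given" : Arrays.toString(parsed.rect)));
    }
}
```

A path in double quotes reaches the program as a single value, so "C:/Path with spaces/" counts as one argument, not three.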

 

XML extraction

By default, the PDF text is extracted as plain text. You can include XML tagging by running the example with the command line parameter -Dxml=true, and there are additional options in the source code which can be altered.
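
Because -Dxml=true uses the JVM's -D syntax, it sets a Java system property named xml, and it has to appear before the class name (for example, java -Dxml=true -cp $jpedalDir/jpedal.jar org/jpedal/examples/text/ExtractTextInRectangle inputValues). The sketch below shows one way a program can read such a property with the standard library. The class XmlFlag is hypothetical; how the JPedal example actually reads the flag is not shown on this page.

```java
// Hypothetical demo class, not part of JPedal: reads the "xml" system property.
final class XmlFlag {
    // Boolean.getBoolean returns true only if the system property exists
    // and equals "true" (ignoring case); otherwise it returns false.
    static boolean xmlRequested() {
        return Boolean.getBoolean("xml");
    }

    public static void main(String[] args) {
        System.out.println(xmlRequested() ? "XML output requested" : "Plain text output (default)");
    }
}
```

Running java -Dxml=true XmlFlag selects the XML branch, while java XmlFlag -Dxml=true does not, because anything after the class name is passed to the program as an ordinary argument.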

 

Demo Limitations

In the demo version, every sixth letter is replaced with a 1.

 

Other details

A directory called text contains sub-directories with multiple files. Each page in the PDF document is extracted as a separate file, and is named using that page number. The output can be either XML or plain text (the default).

NB: PDF co-ordinates start at the bottom left corner of the page and run up the page, so y1 is larger than y2. x1,y1 is the top left corner of the rectangle and x2,y2 the bottom right.
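
If a rectangle was measured from the top left of the page (as in most image and screen tools), it needs flipping before it can be passed as x1 y1 x2 y2. The following is a minimal sketch of that arithmetic; the class PdfRect and its method are hypothetical, all values are assumed to be in PDF units, and page rotation and any crop box offset are ignored.

```java
import java.util.Arrays;

// Hypothetical helper, not part of JPedal: converts a top-left-origin rectangle
// into the x1 y1 x2 y2 order described above (origin at the bottom left).
final class PdfRect {
    static int[] fromTopLeft(int left, int top, int width, int height, int pageHeight) {
        int x1 = left;
        int y1 = pageHeight - top;          // top edge, measured up from the page bottom
        int x2 = left + width;
        int y2 = pageHeight - top - height; // bottom edge, so y2 < y1
        return new int[] {x1, y1, x2, y2};
    }

    public static void main(String[] args) {
        // A 200 x 50 box placed 50 units in and 100 units down on a page 842 units high.
        int[] rect = fromTopLeft(50, 100, 200, 50, 842);
        System.out.println(Arrays.toString(rect)); // prints [50, 742, 250, 692]
    }
}
```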

 

Links to related articles

Getting started tutorial
ExtractTextInRectangleAsTable - extract PDF text in any rectangular area.
Additional optional libraries lists items which may be needed for some PDF files or functionality.

 
