
Extract PDF text from Structured Content in PDF files

Description: Extracts structured text from PDF files (the PDF files must contain structured content).
Jar Path: org.jpedal.examples.text.ExtractStructuredText

Click for source code link

How to run the Java code

Windows: java -cp %jpedalDir%/jpedal.jar org/jpedal/examples/text/ExtractStructuredText inputValues
macOS / Linux: java -cp $jpedalDir/jpedal.jar org/jpedal/examples/text/ExtractStructuredText inputValues

Click here for help on running Java programs from the command line

 

An explanation of the input values

This example expects one or two space-delimited input values.

First value: The PDF filename (including the path if needed) or a directory containing PDF files. If it contains spaces, it must be enclosed in double quotes (e.g. "C:/Path with spaces/").
Second value (optional): Target directory for output data.
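
The input handling described above can be sketched roughly as follows. This is not the ExtractStructuredText source and calls no JPedal API. The class name InputValuesSketch, its helper methods, the ".pdf" filename filter, and the fallback directory "output" are all hypothetical and only illustrate how a file-or-directory first value and an optional second value might be interpreted.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the two input values; not part of JPedal.
public class InputValuesSketch {

    // Assumption: a name counts as a PDF if it ends in ".pdf" (case-insensitive).
    static boolean isPdfName(String name) {
        return name.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    // First value: either a single PDF filename or a directory containing PDF files.
    // A single filename is accepted by name only; its existence is not checked here.
    static List<File> resolvePdfInputs(String firstValue) {
        File input = new File(firstValue);
        List<File> pdfs = new ArrayList<>();
        if (input.isDirectory()) {
            File[] children = input.listFiles();
            if (children != null) {
                for (File child : children) {
                    if (child.isFile() && isPdfName(child.getName())) {
                        pdfs.add(child);
                    }
                }
            }
        } else if (isPdfName(input.getName())) {
            pdfs.add(input);
        }
        return pdfs;
    }

    // Second value (optional): target directory, else a caller-supplied fallback.
    static String resolveOutputDir(String[] args, String fallback) {
        return args.length > 1 ? args[1] : fallback;
    }

    public static void main(String[] args) {
        if (args.length < 1 || args.length > 2) {
            System.err.println("Usage: InputValuesSketch <pdf file or directory> [output directory]");
            return;
        }
        // "output" is an assumed fallback, not the real example's default.
        String outputDir = resolveOutputDir(args, "output");
        for (File pdf : resolvePdfInputs(args[0])) {
            System.out.println(pdf.getPath() + " -> " + outputDir);
        }
    }
}
```

Because the shell splits arguments on spaces, a quoted first value such as "C:/Path with spaces/" arrives as a single element of args, which is why the quoting rule above matters.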

 

Other details/screenshot

This example only works on PDF files created with structured content. If a file has no structured content, use ExtractTextInRectangle, ExtractTextInRectangleAsTable or ExtractTextAsWordlist for PDF to text conversion instead.

 

Links to related articles

Getting started tutorial
Additional optional libraries lists items which may be needed for some PDF files or functionality.

 
