Extract text from Structured Content in PDF files
| Description | Extract structured text from PDF files (the PDF files must contain Structured content) |
| Jar Path | org.jpedal.examples.text.ExtractStructuredText |
How to run the Java code
| Windows: | java -cp %jpedalDir%/jpedal.jar org.jpedal.examples.text.ExtractStructuredText inputValues |
| macOS / Linux: | java -cp $jpedalDir/jpedal.jar org.jpedal.examples.text.ExtractStructuredText inputValues |
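You may prefer to start the example from another Java program instead of a shell. The sketch below does this with the standard `ProcessBuilder` class. It assumes that `jpedal.jar` is in `jpedalDir`, as in the command lines above, and that `java` is on the PATH. The class name `RunExtractStructuredText`, the helper `buildCommand` and the example paths are illustrative only and are not part of JPedal. `ProcessBuilder` passes each argument to the program separately, so a path containing spaces normally needs no extra double quotes.

```java
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class RunExtractStructuredText {

    // Builds the same command as the command lines above, as a list of
    // separate arguments. outputDir may be null because the second value is optional.
    static List<String> buildCommand(String jpedalDir, String input, String outputDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        // Paths.get uses the separator of the current operating system.
        cmd.add(Paths.get(jpedalDir, "jpedal.jar").toString());
        cmd.add("org.jpedal.examples.text.ExtractStructuredText");
        cmd.add(input);
        if (outputDir != null) {
            cmd.add(outputDir);
        }
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical locations: replace them with your own paths.
        List<String> cmd = buildCommand("/opt/jpedal", "/path with spaces/file.pdf", "/tmp/out");
        // inheritIO shows the example's console output in this program's console.
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(process.waitFor());
    }
}
```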
An explanation of the input values
This example takes one or two space-delimited input values.
| First value: | The PDF filename (including the path if needed) or a directory containing PDF files. If the path contains spaces, it must be enclosed in double quotes (i.e. "C:/Path with spaces/"). |
| Second value (optional): | Target directory for output data |
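This page does not describe how the example reacts to bad input, so a caller may want to check the values before starting it. Below is a minimal sketch. It checks only the rules stated in the table above: one required value naming an existing file or directory, and one optional output directory. It does not open the PDF or test for Structured content. The class `CheckInputValues` and its method `check` are hypothetical names.

```java
import java.io.File;

public class CheckInputValues {

    // Returns null if the values look usable, otherwise a short error message.
    static String check(String... values) {
        if (values.length < 1 || values.length > 2) {
            return "expected a PDF file or directory, optionally followed by an output directory";
        }
        File input = new File(values[0]);
        if (!input.exists()) {
            return "input not found: " + values[0];
        }
        if (values.length == 2) {
            File output = new File(values[1]);
            // A missing output directory is allowed here. This page does not say
            // whether the example creates it, but an ordinary file is rejected.
            if (output.isFile()) {
                return "output path is a file, not a directory: " + values[1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String error = check(args);
        System.out.println(error == null ? "input values look OK" : "Error: " + error);
    }
}
```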
Other details
This example needs PDF files created with Structured content. If a file does not contain it, use ExtractTextInRectangle, ExtractTextInRectangleAsTable or ExtractTextAsWordlist for PDF to text conversion instead.
Links to related articles
Getting started tutorial
Additional optional libraries: lists items that may be needed for some PDF files or functionality.