Category Archives: Extraction

There are several versions of each image inside your PDF file

When you look at a PDF file, you see images displayed. In fact, there are ‘several’ versions of each image. First, there is the raw, unclipped version of the image. This may be in an ‘odd colorspace’ – see this …

Posted in Extraction, Image

PDF hacks and HTML5 – ‘hidden’ PDF text

While debugging our PDF to HTML5 converter, we have come across all sorts of interesting PDF features that need converting to an HTML5 equivalent. Today, I have been looking at a PDF page that had extra text in the HTML5 version. It …

Posted in Extraction, html

Starting with GlassFish – Part 2, The Web Page

I have recently begun experimenting with GlassFish, with the idea of creating a few web pages that use GlassFish to call some of jPedal's example code (such as our PDF to HTML5 converter) and provide the user with the output. …

Posted in Extraction, glassfish, html, Java, PDF

Starting with GlassFish – Part 1, The Code

I have recently begun experimenting with GlassFish, with the idea of creating a few web pages that use GlassFish to call some of jPedal's example code (such as our PDF to HTML5 converter) and provide the user with the output. …

Posted in Extraction, glassfish, html, Java, PDF

PDF to HTML5 conversion – duplicate text in PDF files for bold effects

A popular trick in PDF files is to print some text twice (with the second copy shifted slightly) to create a bold effect. You cannot do this in HTML5, so all you get is overlapping double text. How ugly! So …

Posted in Extraction, html
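The bold trick described in the last post can be undone before output by spotting pairs of identical text runs that sit almost on top of each other, keeping one copy and flagging it as bold. The sketch below is only an illustration of that idea, not jPedal's actual implementation: `TextRun`, `OFFSET_FRACTION` and `mergeFakeBold` are hypothetical names, and the 10%-of-font-size tolerance is an assumed threshold.

```java
import java.util.ArrayList;
import java.util.List;

public class FakeBoldMerger {

    // Hypothetical representation of a piece of text extracted from a PDF page.
    // This is NOT a jPedal class; the fields are assumptions for this sketch.
    record TextRun(String text, double x, double y, double fontSize, boolean bold) {}

    // Assumed tolerance: a second copy counts as a "bold" duplicate if it is offset
    // by no more than this fraction of the font size, horizontally and vertically.
    static final double OFFSET_FRACTION = 0.1;

    static boolean isBoldDuplicate(TextRun a, TextRun b) {
        double tol = OFFSET_FRACTION * Math.max(a.fontSize(), b.fontSize());
        return a.text().equals(b.text())
                && Math.abs(a.x() - b.x()) <= tol
                && Math.abs(a.y() - b.y()) <= tol;
    }

    // Drops near-identical overlapping copies and marks the surviving run as bold,
    // so the HTML5 output can emit a single run styled with font-weight: bold.
    static List<TextRun> mergeFakeBold(List<TextRun> runs) {
        List<TextRun> result = new ArrayList<>();
        for (TextRun run : runs) {
            boolean merged = false;
            for (int i = 0; i < result.size(); i++) {
                TextRun kept = result.get(i);
                if (isBoldDuplicate(kept, run)) {
                    // Replace the kept run with a bold-flagged copy; discard the duplicate.
                    result.set(i, new TextRun(kept.text(), kept.x(), kept.y(), kept.fontSize(), true));
                    merged = true;
                    break;
                }
            }
            if (!merged) {
                result.add(run);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<TextRun> runs = List.of(
                new TextRun("Heading", 72.0, 700.0, 12.0, false),
                new TextRun("Heading", 72.4, 700.0, 12.0, false), // faux-bold second copy
                new TextRun("Body text", 72.0, 680.0, 12.0, false));
        for (TextRun r : mergeFakeBold(runs)) {
            System.out.println(r.text() + " bold=" + r.bold());
        }
    }
}
```

Running `main` prints `Heading bold=true` followed by `Body text bold=false`. The pairwise comparison is quadratic, which is usually acceptable for the handful of runs on a single page; a real converter would probably also compare font and colour before deciding two runs are the same text.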