PDF to HTML5 conversion – duplicate text in PDF files for bold effects

A popular trick in PDF files is to print some text twice (with the second copy moved slightly) to create a bold effect.

pdf text

You cannot do this in HTML5, so all you get is overlapping double text. How ugly!

html text

So we add some ‘intelligence’ to the conversion to ignore these duplicate characters. It needs to be smart enough to work correctly when we get genuine double characters, as in ‘following’ or ‘moon’, so we look at the position of the letters and the gap between them.
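To make the idea concrete, here is a minimal sketch of the position-and-gap check in Python. It is not our converter's actual code: the `Glyph` record, the `drop_bold_duplicates` function and the `max_offset_ratio` threshold are all made up for illustration, and a real converter would also need to consider font, size and rotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """Hypothetical record for one drawn character (not a real library type)."""
    char: str
    x: float      # left edge of the glyph on the page
    y: float      # baseline position
    width: float  # advance width of the glyph


def drop_bold_duplicates(glyphs, max_offset_ratio=0.25):
    """Return glyphs with fake-bold duplicates removed.

    A glyph is treated as a duplicate when an already kept glyph has the
    same character and sits within a small fraction of a glyph width of it.
    Genuine doubles (the 'oo' in 'moon') are about a full width apart, so
    they survive. The 0.25 threshold is an assumed value, not a measured one.
    """
    kept = []
    for g in glyphs:
        is_duplicate = any(
            g.char == k.char
            and abs(g.x - k.x) <= max_offset_ratio * k.width
            and abs(g.y - k.y) <= max_offset_ratio * k.width
            for k in kept
        )
        if not is_duplicate:
            kept.append(g)
    return kept


# 'moon' drawn twice, the second pass shifted 0.3 units to the right.
first_pass = [Glyph(c, i * 6.0, 0.0, 6.0) for i, c in enumerate("moon")]
second_pass = [Glyph(c, i * 6.0 + 0.3, 0.0, 6.0) for i, c in enumerate("moon")]
cleaned = drop_bold_duplicates(first_pass + second_pass)
print("".join(g.char for g in cleaned))
```

The sketch compares each glyph against every glyph already kept, rather than only the previous one, because the second copy may be drawn after the whole word instead of straight after each letter. That is slow on long pages, so a real implementation would want to limit the search to nearby glyphs.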

This gives a much better representation of the text :-)

html text

The PDF file format uses lots of tricks which work very well in PDF but need care when being translated into HTML5.

 
