PDF to image conversion in Java

I get asked a lot of questions about PDF to image conversion in Java, so I thought it would make a good topic for a blog post. I will write some follow-up articles covering specific image issues with TIFF, JPEG and PNG.

A PDF is essentially a vector graphics format. Developers often want to create images of the pages to show on the web or to use as thumbnails, so the first decision is how big to make the image. The visible area of a PDF page is defined by its CropBox, so it often makes sense to use this: a common page size is 595 by 842 points (A4), so we create an image 595×842 pixels in size. This is what our standard PDF to image conversion example does.

That is fine unless the user wants to zoom into the image, in which case we need to create a bigger image. The issue here is that we can rapidly use up a great deal of memory: doubling the width and height to 1190×1684 pixels uses four times the memory. So you need to think very carefully about the optimum values to use.
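To put rough numbers on this, here is a minimal sketch that estimates the raw pixel memory for a page image at a given scaling. It assumes 4 bytes per pixel (as in `BufferedImage.TYPE_INT_ARGB`) and a page size given in points; the class and method names are made up for illustration and are not part of our library.

```java
/**
 * Rough estimate of the raw pixel memory needed for a rendered page.
 * Assumption: 4 bytes per pixel (e.g. BufferedImage.TYPE_INT_ARGB).
 * All names here are illustrative only.
 */
final class PageImageEstimate {

    static final int BYTES_PER_PIXEL = 4;

    /** Pixel dimension for a page dimension in points at the given scaling. */
    static int scaledPixels(double points, double scale) {
        return (int) Math.ceil(points * scale);
    }

    /** Approximate number of bytes needed for the pixel data alone. */
    static long estimateBytes(double widthPts, double heightPts, double scale) {
        long w = scaledPixels(widthPts, scale);
        long h = scaledPixels(heightPts, scale);
        return w * h * BYTES_PER_PIXEL;
    }

    public static void main(String[] args) {
        // Print the estimate for a 595 x 842 page at a few scalings
        for (double scale : new double[] {1.0, 2.0, 4.0}) {
            long kb = estimateBytes(595, 842, scale) / 1024;
            System.out.printf("scale %.1f -> %d x %d px, about %d KB%n",
                    scale, scaledPixels(595, scale), scaledPixels(842, scale), kb);
        }
    }
}
```

The real footprint will be somewhat higher than this because the JVM and the image object add their own overhead, so treat the figure as a lower bound.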

There is also the question of how much detail the page contains. Sometimes the PDF will contain highly detailed images. If you squeeze them down you will lose image quality, so you might want to use a higher scaling to take advantage of them. But if the PDF only contains poor quality images, there is no advantage in creating a big image.

So we created a second example which can produce bigger images if the embedded images are high quality, and which allows the user to create different sizes of image for a page. Interestingly, the main performance hit is the actual creation of the blank image in Java.
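One way to pick a scaling is to compare how many pixels an embedded image holds with the space it occupies on the page. The following is only a sketch under that assumption: the method, its parameters and the upper limit are hypothetical, and this is not necessarily how our example makes the decision.

```java
/**
 * Hypothetical helper: choose a page scaling from the resolution of an
 * embedded image. Not taken from any library.
 */
final class ScaleChooser {

    /**
     * @param imagePixelWidth width of the embedded image in pixels
     * @param drawnWidthPts   width the image occupies on the page, in points
     * @param maxScale        upper limit, to keep memory use under control
     * @return a scaling between 1.0 and maxScale (1.0 if maxScale is below 1.0)
     */
    static double chooseScale(int imagePixelWidth, double drawnWidthPts, double maxScale) {
        if (imagePixelWidth <= 0 || drawnWidthPts <= 0) {
            return 1.0; // nothing sensible to measure, fall back to page size
        }
        // Scaling at which one image pixel maps to one output pixel
        double nativeScale = imagePixelWidth / drawnWidthPts;
        // Never go below page size, never above the caller's limit
        return Math.max(1.0, Math.min(nativeScale, maxScale));
    }
}
```

A poor quality image (fewer pixels than the space it fills) gives a scaling of 1.0, so no memory is wasted on detail that is not there.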

So converting PDF files into images is very much a balancing act. Think very carefully about what you want to achieve to get the best balance on image size: a bigger image has more detail and lets you zoom in, but it takes longer to create and the file will be larger.

And if you want to look at PDF to image conversion in Java, do have a look at our examples.


PDF to HTML conversion – 3 ways to make a HTML version of a PDF

As I have been asked this frequently, I thought it would make a good blog topic.

When you create an HTML version of a PDF file, you can essentially do it in one of the following ways:

1. Draw everything onto an image of the page and show the image. This has the advantage that it looks exact, but it results in very large files, and when you zoom in you get pixellation.

2. Place the text inside a div element and put the rest on the image. This gives you better text quality on scaling, but you will need the correct fonts and the text positioning may not be exact. You still get the problems with a big image.

3. Place the text inside a div element and use Canvas or SVG for all vector content. This removes the need for a huge image. You could still see some pixellation on the canvas but the file is much smaller. The downside is that you need a modern browser.
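To make option 2 a little more concrete, here is a minimal sketch in Java that builds the HTML for a page image with text layered on top of it. The class, the method names and the inline styles are all assumptions made for illustration; this is not the output format of our converter.

```java
/**
 * Illustrative only: text in positioned div elements over a page image.
 */
final class TextOverImageHtml {

    /** Escape the characters that are special inside HTML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** One absolutely positioned text div; coordinates are CSS pixels from the top left. */
    static String textDiv(String text, int left, int top, int fontSizePx) {
        return "<div style=\"position:absolute;left:" + left + "px;top:" + top
                + "px;font-size:" + fontSizePx + "px\">" + escape(text) + "</div>";
    }

    /** The page image followed by the text divs, which are painted on top of it. */
    static String page(String imageFile, String textDivs) {
        return "<img src=\"" + imageFile
                + "\" style=\"position:absolute;left:0;top:0\" alt=\"\">\n" + textDivs;
    }
}
```

For example, `TextOverImageHtml.page("page1.png", TextOverImageHtml.textDiv("Hello", 72, 100, 12))` gives an image tag followed by one text div. The browser then renders the text with its own fonts, which is why the fonts and positions may not match the PDF exactly.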

If you want to see the sort of results you can get with option 3, why not give our PDF2HTML5 converter a spin?


PDF to HTML5 conversion – non-standard glyphs

I have been looking at a PDF to HTML5 conversion issue where there was some odd text appearing on the HTML5 page but not in the PDF file. It turned out to be rather interesting…

Every glyph inside a PDF file has a name (A, B, Space, ellipsis, etc). There is a whole set of standard names defined, but you can also use any arbitrary value. The names are listed in the charset and inside the fonts. So long as the names match wherever they are used, you can call the glyphs what you want.

However, if you create your own glyph names, the software may not be able to work out which actual character to display or extract as text. So what should we do when generating HTML5 from these files? The only value we have is the glyph name, so that is the odd text we were seeing on the screen (in this case angbracketleft and angbracketright).

So we have added some mapping code into the static helper class HTMLHelper so you can replace these with an appropriate value:

/**
 * Replace any non-standard glyf names with the text to display.
 */
public String mapNonstandardGlyfName(String glyf, PdfFont currentFontData) {

    // replace() matches literal text, so no regular expression is involved
    glyf = glyf.replace("angbracketright", ")");
    glyf = glyf.replace("angbracketleft", "(");

    return glyf;
}

That looks rather better!

[Image: fixed text]
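If you have more than a couple of custom names, a lookup table may be easier to maintain than a chain of replace calls. The sketch below is a hypothetical standalone class, not part of HTMLHelper. Note that it matches the whole glyph name rather than replacing text inside a longer string, and it starts with the same two replacements as the method above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical lookup table from non-standard glyph names to display text.
 */
final class GlyphNameMapper {

    private final Map<String, String> mappings = new HashMap<>();

    GlyphNameMapper() {
        // The same replacements as the mapping method shown above
        mappings.put("angbracketleft", "(");
        mappings.put("angbracketright", ")");
    }

    /** Register the text to show for another non-standard glyph name. */
    void addMapping(String glyphName, String replacement) {
        mappings.put(glyphName, replacement);
    }

    /** Return the mapped text, or the name unchanged if there is no mapping. */
    String map(String glyphName) {
        return mappings.getOrDefault(glyphName, glyphName);
    }
}
```

Unknown names fall through unchanged, so the odd text is still visible on the page and you can spot which names still need a mapping.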


JavaFX and HTML5 differences – Shapes work differently

Over the past few months I have been working on a couple of projects involving HTML5 and JavaFX. There are many similarities and, as you can imagine, some differences as well. In this series I will concentrate mainly on the differences.

The first one I noticed, and the subject of today’s article, is the moveTo command. It is available in both HTML5 and JavaFX for drawing shapes. In JavaFX, MoveTo is a class, which means a new instance has to be created, whereas in HTML5 moveTo is a method called on the canvas's 2D context.

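import javafx.collections.ObservableList;
import javafx.scene.paint.Color;
import javafx.scene.shape.LineTo;
import javafx.scene.shape.MoveTo;
import javafx.scene.shape.Path;
import javafx.scene.shape.PathElement;

// red square: start at the top left and draw clockwise
Path path_4 = new Path();
ObservableList<PathElement> shape_4 = path_4.getElements();
shape_4.add(new MoveTo(50,50));
shape_4.add(new LineTo(150,50));
shape_4.add(new LineTo(150,150));
shape_4.add(new LineTo(50,150));
shape_4.add(new LineTo(50,50));
path_4.setStrokeWidth(2);
path_4.setStroke(Color.rgb(255,0,0));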

JavaFX implementation

var pdf_canvas=document.getElementById("pdf6");
var pdf_context=pdf_canvas.getContext("2d");

pdf_context.beginPath();
pdf_context.moveTo(50,50);
pdf_context.lineTo(150,50);
pdf_context.lineTo(150,150);
pdf_context.lineTo(50,150);
pdf_context.lineTo(50,50);
pdf_context.closePath(); // close the shape before stroking it
pdf_context.lineWidth = 2;
pdf_context.strokeStyle = 'rgb(255,0,0)';
pdf_context.stroke();

HTML5 implementation

This is the code for the red box below; the code for the other boxes is similar.

[Image: colored shapes]

When the MoveTo command is omitted for any reason, be it a mistake, an attempted short-cut or lazy programming, the result is rather interesting…

In the HTML5 output, the first line after where the moveTo command is meant to be does not get drawn, because a lineTo on an empty path only sets the starting point. The other lineTo commands are still executed, producing an awkward shape. For example, the red square below is drawn clockwise: top left to top right, top right to bottom right, bottom right to bottom left and finally bottom left to top left. By comparison, the green one is drawn anti-clockwise.
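The HTML5 behaviour is easier to see with a small model. The class below is plain Java that I have made up to imitate one canvas rule (a lineTo with no current point only sets the starting point); it is not browser code or JavaFX code, and it ignores everything else the canvas does.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of how the HTML5 canvas handles lineTo on an empty path.
 */
final class CanvasPathModel {

    private double[] current; // null until the path has a starting point
    private final List<double[]> segments = new ArrayList<>(); // {x1, y1, x2, y2}

    void moveTo(double x, double y) {
        current = new double[] {x, y};
    }

    void lineTo(double x, double y) {
        if (current == null) {
            // No starting point yet: behaves like moveTo, so nothing is drawn
            current = new double[] {x, y};
            return;
        }
        segments.add(new double[] {current[0], current[1], x, y});
        current = new double[] {x, y};
    }

    /** The line segments that would actually be drawn. */
    List<double[]> segments() {
        return segments;
    }
}
```

Feeding it the four lineTo calls for the red square without the moveTo gives three segments instead of four: the top edge from (50,50) to (150,50) is the one that goes missing.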

JavaFX does the complete opposite. Where HTML5 displays the remaining lines, JavaFX compiles and runs but reports an error stating that the initial MoveTo is missing. The program keeps running, BUT the whole shape is missing.

Not sure which you prefer? I would rather the shape did not show up, so that I know I have omitted something (i.e. the MoveTo), as opposed to being deceived into thinking it is still there. What do you think?

Stay tuned, as no doubt I will encounter some more interesting differences, which I am going to document for you.


PDF to HTML conversion – relative positioning of content

One of the most interesting things about developing the PDF to HTML converter is the number of ideas and enhancements which arise from actual usage. I am constantly surprised at the number of different uses people have found for it and the creative ways it is being used (if you are doing something interesting and would like to write a short blog post to publicise it, let us know).

One of the requests we have had from several customers is the ability to use the HTML more flexibly. The content is positioned on the page using CSS. This works fine until users want to manipulate the CSS, because we use absolute positions.

So for today’s release we have added an option to place all our content inside a div tag with relative positioning. We give this div the CSS name ‘jpedal’ as we think this should be reasonably unique. This allows us to use absolute positions within our div, while users can alter the CSS and content around it.

The new option is enabled by default in our example code, or you can add it to your code with this line:

/**
* include our content in a Div so you can position relative
*/
HTMLoutput.setBooleanValue(HTMLDisplay.EncloseContentInDiv,true);

This should make it much easier if you wish to use the content with your own set of CSS tags or wrap it in other content. Any other suggestions?
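To show why the enclosing div helps, here is a minimal sketch of the idea in Java. It is not the converter's actual output: the helper is hypothetical, and using ‘jpedal’ as a class attribute (rather than an id) is my assumption for the example.

```java
/**
 * Hypothetical helper: wrap absolutely positioned content in a div with
 * relative positioning, so the content moves as one block.
 */
final class RelativeWrapper {

    /**
     * @param absoluteContent HTML whose elements use position:absolute
     * @param widthPx         width of the page area in CSS pixels
     * @param heightPx        height of the page area in CSS pixels
     */
    static String wrap(String absoluteContent, int widthPx, int heightPx) {
        // Assumption: 'jpedal' is applied as a class name
        return "<div class=\"jpedal\" style=\"position:relative;width:" + widthPx
                + "px;height:" + heightPx + "px\">\n" + absoluteContent + "\n</div>";
    }
}
```

Because the outer div is positioned, the absolute coordinates of everything inside it are measured from the div's own top left corner rather than from the page, so you can move the div around with your own CSS without touching the content.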
