PDF to HTML5 conversion – duplicate font names

When you convert PDF to HTML5, duplicate font names can be a problem. A PDF file can embed lots of fonts and subset them to include just the glyphs you are using (keeping the size of the font data down). So a page could contain several fonts, all called Arial. This is not an issue in a PDF file because the font name is just a piece of information, not the key used to identify the font.

In HTML5, however, the font name is the unique key: it identifies the font in the CSS font-family property and in the @font-face rule used to embed the font. So we need to ensure that the font name is unique in the HTML5. How would you handle this?

This is how we deal with it. The first time Arial is used, we call it Arial. If a different version of Arial appears, we append the FontID (which is how the PDF identifies it) and the size of the font data to give a unique name (Arial_C2_0_5400). Luckily, because the PDF does not use the font name as a key, we can alter it for our own use without breaking anything else and handle all these fonts. Does this seem sensible?
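The naming rule above can be sketched in a few lines of Java. This is only an illustration, not the converter's actual code: the class UniqueFontNames, the method cssNameFor and the sample FontIDs and sizes are all made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not the converter's real code): one unique CSS name per PDF font. */
final class UniqueFontNames {

    // CSS names already handed out, mapped to the PDF FontID that owns each one
    private final Map<String, String> ownerByCssName = new HashMap<>();

    /**
     * @param fontName name stored in the PDF, e.g. "Arial" (not guaranteed unique)
     * @param fontId   ID the PDF uses to identify the font, e.g. "C2_0"
     * @param dataSize size of the embedded font data
     * @return a name that is safe to use as the CSS font-family
     */
    String cssNameFor(String fontName, String fontId, int dataSize) {
        String owner = ownerByCssName.get(fontName);

        // first font seen with this name keeps the plain name
        if (owner == null) {
            ownerByCssName.put(fontName, fontId);
            return fontName;
        }

        // same PDF font asked for again: reuse the plain name
        if (owner.equals(fontId)) {
            return fontName;
        }

        // a different font with the same name: append FontID and data size
        String unique = fontName + '_' + fontId + '_' + dataSize;
        ownerByCssName.put(unique, fontId);
        return unique;
    }
}
```

With this sketch, a first Arial (say FontID C1_0) stays Arial, while a second Arial with FontID C2_0 and 5400 bytes of font data becomes Arial_C2_0_5400.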


PDF to HTML5 conversion – the standard PDF base font families

With a PDF, you can generally assume that 8 font families (Courier, Arial, etc.) are already set up for you, so you do not need to embed them. This keeps down the size of the PDF file. However, this does not stop you embedding your own version of a font and using that instead.

So how do we handle this when we convert PDF to HTML5?

HTML5 generally has passable versions of all these fonts, but the user may want to use the version in the PDF file, so it probably makes sense to let the person doing the conversion decide. So we have a JVM flag (org.jpedal.pdf2html.fontMode) whose possible values include HTMLFontMapper.EMBED_ALL_EXCEPT_BASE_FAMILIES and HTMLFontMapper.EMBED_ALL, so that you can choose. If you use EMBED_ALL, you will see all the fonts appearing in the HTML5:

<!-- Any embedded fonts defined here -->
<style type="text/css">

@font-face {
font-family: KHBMAA-Quorum-Bold;
src: url("01/fonts/KHBMAA-Quorum-Bold.otf");
}

@font-face {
font-family: KHBLNK-Helvetica;
src: url("01/fonts/KHBLNK-Helvetica.otf");
}

@font-face {
font-family: KHBLON-Helvetica-Bold;
src: url("01/fonts/KHBLON-Helvetica-Bold.otf");
}

@font-face {
font-family: KHBLOO-HelveticaNeue-Black;
src: url("01/fonts/KHBLOO-HelveticaNeue-Black.otf");
}

</style>
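To make the difference between the two modes concrete, here is a rough Java sketch that produces rules in the same shape as the output above. It is not the converter's code: the FontMode enum only stands in for the two HTMLFontMapper values named in the post, and fontFaceCss, baseFamilyFonts and the .otf file layout are assumptions for illustration. How the converter itself decides that a font belongs to a base family is not shown here, so the caller simply passes that set in.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of how a font mode could filter the @font-face rules. */
final class FontFaceSketch {

    /** Stand-in for the two HTMLFontMapper modes named in the post. */
    enum FontMode { EMBED_ALL, EMBED_ALL_EXCEPT_BASE_FAMILIES }

    /**
     * @param fontNames       unique font names in the order they should appear
     * @param baseFamilyFonts names the caller has classed as base-family fonts (assumption)
     * @param mode            which fonts to embed
     * @param fontDir         directory holding the font files, e.g. "01/fonts"
     */
    static String fontFaceCss(List<String> fontNames, Set<String> baseFamilyFonts,
                              FontMode mode, String fontDir) {
        StringBuilder css = new StringBuilder();
        for (String name : fontNames) {
            // in the "except base families" mode, leave base fonts to the browser
            boolean skip = mode == FontMode.EMBED_ALL_EXCEPT_BASE_FAMILIES
                    && baseFamilyFonts.contains(name);
            if (skip) {
                continue;
            }
            css.append("@font-face {\n")
               .append("font-family: ").append(name).append(";\n")
               .append("src: url(\"").append(fontDir).append('/').append(name).append(".otf\");\n")
               .append("}\n");
        }
        return css.toString();
    }
}
```

With EMBED_ALL every name gets a rule; with EMBED_ALL_EXCEPT_BASE_FAMILIES any name in baseFamilyFonts is left out, so the browser falls back to its own version of that font.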


IDRsolutions moving to bigger offices

In the last 12 years, we have grown IDRsolutions slowly and steadily. This has allowed us to keep the personal touch and focus on our product development, with a steadily growing team of developers working in several locations.

We have some new developers arriving in the main office next month to help with the PDF to HTML5 developments, so we have finally outgrown our current office and it is time to move to a bigger one. So from Monday, we will be in a new, bigger office…

My favourite feature in our current office is the view of green fields so one critical requirement for the new office was a similar view for my desk to provide the occasional distraction from code. If you look at the view, I think you will agree, I managed to keep that. What is most important to you in your office?


Replacing the deprecated Java JPEG classes for Java 7

In the early days of Java, Sun produced a really handy set of classes to handle JPEG images. These included some really nifty little features like the ability to easily set the amount of compression and the resolution. When ImageIO came along, the classes were deprecated. This means that they are still in Java but not guaranteed to be in any later releases. ImageIO was more complicated to use for JPEG images and we felt the earlier code produced better results, so we continued to use it.

I have been checking our code against the new Java 7 (release 4) build for the Mac and it now appears that the old JPEG classes have finally been removed. So I have updated my code to use ImageIO. Here is my updated version with both old and new versions so you can see the changes if you are still using these classes.

import java.awt.image.BufferedImage;
import java.io.FileOutputStream;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.stream.ImageOutputStream;
import org.w3c.dom.Element;

public static void saveAsJPEG(String jpgFlag, BufferedImage image_to_save, float JPEGcompression, FileOutputStream fos) throws IOException {

    //useful documentation at http://docs.oracle.com/javase/7/docs/api/javax/imageio/metadata/doc-files/jpeg_metadata.html
    //useful example program at http://johnbokma.com/java/obtaining-image-metadata.html to output JPEG data

    //old jpeg class
    //com.sun.image.codec.jpeg.JPEGImageEncoder jpegEncoder = com.sun.image.codec.jpeg.JPEGCodec.createJPEGEncoder(fos);
    //com.sun.image.codec.jpeg.JPEGEncodeParam jpegEncodeParam = jpegEncoder.getDefaultJPEGEncodeParam(image_to_save);

    // Image writer (public ImageWriter type, so no cast to an internal class is needed)
    ImageWriter imageWriter = ImageIO.getImageWritersBySuffix("jpeg").next();
    ImageOutputStream ios = ImageIO.createImageOutputStream(fos);
    imageWriter.setOutput(ios);

    //and metadata
    IIOMetadata imageMetaData = imageWriter.getDefaultImageMetadata(new ImageTypeSpecifier(image_to_save), null);

    if (jpgFlag != null) {

        int dpi = 96;

        try {
            dpi = Integer.parseInt(jpgFlag);
        } catch (NumberFormatException e) {
            e.printStackTrace();
        }

        //old metadata
        //jpegEncodeParam.setDensityUnit(com.sun.image.codec.jpeg.JPEGEncodeParam.DENSITY_UNIT_DOTS_INCH);
        //jpegEncodeParam.setXDensity(dpi);
        //jpegEncodeParam.setYDensity(dpi);

        //new metadata
        Element tree = (Element) imageMetaData.getAsTree("javax_imageio_jpeg_image_1.0");
        Element jfif = (Element) tree.getElementsByTagName("app0JFIF").item(0);

        //the JFIF segment can be missing (for example images with an alpha channel)
        if (jfif != null) {
            jfif.setAttribute("resUnits", "1"); //1 = dots per inch
            jfif.setAttribute("Xdensity", Integer.toString(dpi));
            jfif.setAttribute("Ydensity", Integer.toString(dpi));

            //getAsTree returns a copy, so write the edited tree back into the metadata
            imageMetaData.setFromTree("javax_imageio_jpeg_image_1.0", tree);
        }
    }

    //declared outside the if so that the settings actually reach write() below
    ImageWriteParam jpegParams = imageWriter.getDefaultWriteParam();

    if (JPEGcompression >= 0 && JPEGcompression <= 1f) {

        //old compression
        //jpegEncodeParam.setQuality(JPEGcompression,false);

        // new Compression
        jpegParams.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        jpegParams.setCompressionQuality(JPEGcompression);
    }

    //old write and clean
    //jpegEncoder.encode(image_to_save, jpegEncodeParam);

    //new Write and clean up (image metadata travels inside the IIOImage, not as stream metadata)
    imageWriter.write(null, new IIOImage(image_to_save, null, imageMetaData), jpegParams);
    ios.close();
    imageWriter.dispose();

}
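If you want to check that the resolution really ends up in the file, one option is to encode to memory and read the density back from the JFIF header at the start of the JPEG data. The sketch below is a self-contained illustration rather than part of our code. The class JpegDensityCheck and its method names are made up for the example, and it assumes an image type (such as TYPE_INT_RGB) for which ImageIO writes a JFIF segment.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;
import org.w3c.dom.Element;

/** Hypothetical check: encode with a given DPI, then read it back from the raw bytes. */
final class JpegDensityCheck {

    private static final String JPEG_FORMAT = "javax_imageio_jpeg_image_1.0";

    /** Encodes the image as an in-memory JPEG with the given DPI and quality (0 to 1). */
    static byte[] encode(BufferedImage image, int dpi, float quality) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersBySuffix("jpeg").next();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ImageOutputStream ios = new MemoryCacheImageOutputStream(bytes);
        try {
            writer.setOutput(ios);

            // edit a copy of the metadata tree, then write it back
            IIOMetadata meta = writer.getDefaultImageMetadata(new ImageTypeSpecifier(image), null);
            Element tree = (Element) meta.getAsTree(JPEG_FORMAT);
            Element jfif = (Element) tree.getElementsByTagName("app0JFIF").item(0);
            jfif.setAttribute("resUnits", "1"); // 1 = dots per inch
            jfif.setAttribute("Xdensity", Integer.toString(dpi));
            jfif.setAttribute("Ydensity", Integer.toString(dpi));
            meta.setFromTree(JPEG_FORMAT, tree);

            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);

            writer.write(null, new IIOImage(image, null, meta), param);
        } finally {
            ios.close();      // flushes the remaining data into 'bytes'
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    /** Returns the horizontal density stored in a leading JFIF APP0 segment, or -1 if there is none. */
    static int jfifXDensity(byte[] jpeg) {
        boolean hasJfif = jpeg.length > 17
                && (jpeg[0] & 0xFF) == 0xFF && (jpeg[1] & 0xFF) == 0xD8  // start of image
                && (jpeg[2] & 0xFF) == 0xFF && (jpeg[3] & 0xFF) == 0xE0  // APP0 marker
                && jpeg[6] == 'J' && jpeg[7] == 'F' && jpeg[8] == 'I' && jpeg[9] == 'F';
        if (!hasJfif) {
            return -1;
        }
        // bytes 14-15 hold Xdensity as a big-endian 16-bit value
        return ((jpeg[14] & 0xFF) << 8) | (jpeg[15] & 0xFF);
    }
}
```

Calling jfifXDensity(encode(image, 300, 0.8f)) on a small RGB test image should then report the DPI you asked for. If it returns -1, the writer did not put a JFIF segment at the start of the data.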

Otherwise, Java 7 looks to be a definite progression over Java 6. What do you think of it?


What you missed at the iText summit (and how you can catch up)

In March, iText organised a free conference in Belgium and invited an incredible mix of speakers to talk about PDF. There was lots of technical content about PDF and also some amazing examples of how companies were using PDF files on computers and mobile devices. I posted a brief review in a blog article covering some of the things you missed.

Well, the good news is that if you missed it, you have a second chance! All the videos have been posted for free viewing on Parleys. Which is your favorite talk?
