Unicode characters dropped in PDF files generated with iText and Flying Saucer

Flying Saucer is a very useful Java library that uses iText to convert HTML pages to PDF documents. Here is a nice tutorial on how to use Flying Saucer.

For the last few days I had been trying, unsuccessfully, to generate a report that contained non-Latin Unicode characters. In my case it was Greek, but I guess the same problem exists for other scripts as well, such as Cyrillic and Armenian. The problem was that the Greek characters were silently omitted: they simply didn't show up in the document.

The code I was using was roughly the following:

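import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import java.io.*;
public class Html2Pdf {
    public static void main(String[] args) throws DocumentException, IOException {
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(new File("output.pdf")));
        document.open(); // the document must be open before XML Worker adds content
        XMLWorkerHelper.getInstance().parseXHtml(writer, document, new FileInputStream("input.html"));
        document.close(); // finishes and flushes output.pdf
    }
}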

And the input HTML file was something like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<body><h1>Αρνάκι άσπρο και παχύ</h1></body>
</html>

When I tried to convert this simple HTML to PDF, I got a blank page.

After many hours of troubleshooting, I finally discovered that when no specific font is set, the generated PDF falls back to some kind of default font (probably Helvetica). That font covers only a very limited character set, which obviously does not include Greek.
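To see why the default font drops the text, you can check which characters fall outside WinAnsi (Cp1252), the single-byte encoding that iText's built-in Helvetica uses by default. The sketch below is a hypothetical diagnostic using only the JDK; it assumes, as suspected above, that the fallback really is Helvetica with WinAnsi encoding, and the class name WinAnsiCheck is illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

class WinAnsiCheck {

    // Returns every character of text that has no code in Cp1252 (WinAnsi).
    // Assumption: the fallback font is iText's standard Helvetica with WinAnsi encoding,
    // so these are the characters it has no way to render.
    static List<Character> unsupportedChars(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        List<Character> missing = new ArrayList<>();
        for (char c : text.toCharArray()) {
            if (!encoder.canEncode(c)) {
                missing.add(c);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(unsupportedChars("Hello, world")); // prints []
        // Lists every Greek letter; only the spaces are encodable in Cp1252.
        System.out.println(unsupportedChars("Αρνάκι άσπρο και παχύ"));
    }
}
```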

So I came up with this simple trick, which seems to solve the problem. I only had to make sure that every element in my HTML file uses a font that contains Greek glyphs, like Arial:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head><style type="text/css">* { font-family: Arial; }</style></head>
<body><h1>Αρνάκι άσπρο και παχύ</h1></body>
</html>

Arial is a pretty standard font: it ships with Windows and macOS (though usually not with Linux distributions), and it covers a wide variety of alphabets, including Greek.
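If you can't edit every HTML template by hand, the same rule could be injected into the markup before it is handed to the converter. This is a minimal sketch using only the JDK, assuming the markup contains a literal, attribute-free head tag; FontInjector and withDefaultFont are hypothetical names, not part of iText or Flying Saucer.

```java
class FontInjector {

    // Hypothetical helper: inserts a universal font-family rule as the first child of
    // <head>, so every element is styled with a font that (hopefully) has the needed glyphs.
    // Assumes the markup contains a literal "<head>" tag with no attributes.
    static String withDefaultFont(String html, String fontFamily) {
        String style = "<style type=\"text/css\">* { font-family: " + fontFamily + "; }</style>";
        int head = html.indexOf("<head>");
        if (head < 0) {
            throw new IllegalArgumentException("no <head> element found");
        }
        int insertAt = head + "<head>".length();
        return html.substring(0, insertAt) + style + html.substring(insertAt);
    }

    public static void main(String[] args) {
        String html = "<html><head></head><body><h1>Αρνάκι άσπρο και παχύ</h1></body></html>";
        System.out.println(withDefaultFont(html, "Arial"));
    }
}
```

Because the universal selector has the lowest possible specificity, any font-family the document already sets on specific elements still takes precedence; the injected rule only fills in where nothing else applies.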

I hope this helped…



4 Responses to Unicode characters dropped in PDF files generated with iText and Flying Saucer

  1. Ider Lkhagvasuren says:

    Good Point. Save my day

  2. JUG says:

    Not working….

  3. Goutham says:

    not working
