Thursday, July 28, 2005

PDFtoHTML

pdftohtml is a utility which converts PDF files into HTML and XML formats.

The latest release is 0.36.
It is based on xpdf 2.02 by Derek Noonburg.
Demo: PDF Document, HTML Document

You can get a win32 GUI for pdftohtml here

This is a neat one. I use PDFs every day and would like to keep copies of them in HTML. I found pdftohtml fast and very simple to use. My PDFs contain a lot of complex graphics, so it could not reproduce them exactly, but I am happy that it still presented the content correctly and neatly. I do wonder why the pages are generated as separate files rather than as a single HTML page.
To install this open source application and get the Windows GUI running, you will need to install the following:
GhostScript (the "Windows Binary" package)
PDFtoHTML (Get the "Windows Binary" package)
PDF2HTMLgui v1.3 (the Windows Binary again! ;) )

9 comments:

Anonymous said...

I've downloaded pdftohtml and a Ghostscript package, but the GUI is no longer available for download (broken link on their site). Also, in console mode, pdftohtml does not output the images correctly and gives "Error - .dll not found".

Can you send me the GUI package and tell me which distribution of Ghostscript I should use (for Win32)? I am m_chris05@yahoo.com
m_chris05 at yahoo com

Techknight said...

Hi Chris,
I too saw that the GUI is not available and the link is broken.
But I tried a Google search for "pdf2htmlgui.zip" and found the link below.

Download it from here and give it another shot.
http://ftp.pub.cri74.org/pub/win9x/conversion_fichiers/PDF-HTML/PDFtoHTML/

As for your other question, use gs854w32.exe, the .exe for 32-bit Windows. That is the latest version I found at http://sourceforge.net/project/showfiles.php?group_id=1897

Let me know if it still does not work. I have tested it and it works well enough.

Sorry, I have not tried console mode, so I am not aware of the .dll problem. It could be a DLL missing from its path. Maybe someone can try the console version and help us out.

Cheers

emotions.pirad said...

That link is broken by now as well.

Techknight said...


http://cypherswipe.googlepages.com/pdf2htmlgui.zip

Try the link above. It seems to offer a file for download. In the meantime I will try to put this file up somewhere I know so that we don't lose it again :)

Please let us know if this works!

Anonymous said...

Thanks for providing all the information above; it helped me out a great deal. Some additional thoughts to throw into the mix:

1. Command-line usage doesn't seem to reference the .ini file created by the GUI install, which tells the GUI where to find the Ghostscript binaries. After installing GS, I added the resulting /bin folder to the Windows PATH variable, and pdftohtml was then able to find all the DLLs it needed.

2. Running pdftohtml on the command line with the -h switch lists more options than are documented elsewhere.

3. Using the -noframes switch on the command line gives you one consolidated HTML file instead of one HTML file per PDF page. There is an equivalent checkbox in the GUI labelled "generate no frames".
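Points 1 and 3 can be sketched as a small Python wrapper. This is only an illustration, not part of pdftohtml: the helper names (env_with_gs, build_command) and the example paths are made up, and it assumes pdftohtml.exe itself can already be found by Windows. The only pdftohtml option used is -noframes, as described above.

```python
import os
import subprocess


def env_with_gs(gs_bin_dir, base_env=None):
    """Return a copy of the environment with the Ghostscript bin folder
    put at the front of PATH, so its DLLs can be found (point 1)."""
    env = dict(os.environ if base_env is None else base_env)
    env["PATH"] = gs_bin_dir + os.pathsep + env.get("PATH", "")
    return env


def build_command(pdf_path, single_page=True, exe="pdftohtml"):
    """Build the pdftohtml command line; -noframes asks for one
    consolidated HTML file instead of one file per page (point 3)."""
    cmd = [exe]
    if single_page:
        cmd.append("-noframes")
    cmd.append(pdf_path)
    return cmd


if __name__ == "__main__":
    # Hypothetical locations; adjust to your own install.
    gs_bin = r"c:\progra~1\ghostscript\gs8.71\bin"
    cmd = build_command("document.pdf")
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, env=env_with_gs(gs_bin), check=False)
```

Changing PATH this way only affects the one pdftohtml run, which is handy if you would rather not edit the system-wide PATH variable.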

Thanks,

Peter

Anonymous said...

(1) New (working!) PDF2HTMLgui download link.

Click on 'Serveur 1'. Don't worry, the webpage is in French but the program is in English.

(2) How to convert a document to a single html page:
More Options -> tick "generate no frames".

(3) Which version of Ghostscript should be installed? The last open source version, i.e. gs871w32.exe from the GhostScript site linked to in the main article above.

(4) PDF2HTMLgui bugs: I found a couple of bugs. The program crashed when navigating to a file, and it would not produce any picture files when "generate complex document" was ticked.
Workaround: I manually edited pdftohtmlgui.ini (a file created by the program). The 2nd line must be edited to use short names where needed, so that there are no spaces in the Ghostscript executable pathname. Mine now reads:

GSpath=c:\progra~1\ghostscript\gs8.71\bin\gswin32c.exe

Do this after everything has been installed and PDF2HTMLgui has been run for the first time, so that you have already answered both of its questions about the locations of pdftohtml.exe and gswin32c.exe. Under Windows XP, the DOS pathname can be found by navigating in an MS-DOS prompt to the Ghostscript executable folder by folder from the root, and on the way using
DIR /x
to find the short name of each subfolder in the pathname.
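The same workaround can be sketched in Python, for anyone who prefers not to edit the file by hand. This is a rough sketch under assumptions: the function names are made up, the only thing taken from the description above is that the .ini file holds a line starting with GSpath=, and to_short_path relies on the Windows API call GetShortPathNameW, so it only works on Windows and only for a path that exists.

```python
import sys


def to_short_path(long_path):
    """Windows only: ask the OS for the short (8.3) form of an existing
    path, instead of collecting it folder by folder with DIR /x."""
    if sys.platform != "win32":
        raise OSError("short (8.3) names only exist on Windows")
    import ctypes
    buf = ctypes.create_unicode_buffer(1024)
    n = ctypes.windll.kernel32.GetShortPathNameW(long_path, buf, len(buf))
    if n == 0 or n >= len(buf):
        raise OSError("could not get a short name for " + long_path)
    return buf.value


def set_gspath(ini_text, gs_exe):
    """Return the .ini text with its GSpath= line pointing at gs_exe.
    Refuses paths with spaces, since those are what the workaround avoids."""
    if " " in gs_exe:
        raise ValueError("path still contains spaces: " + gs_exe)
    lines = ini_text.splitlines()
    found = False
    for i, line in enumerate(lines):
        if line.startswith("GSpath="):
            lines[i] = "GSpath=" + gs_exe
            found = True
    if not found:
        raise ValueError("no GSpath= line found")
    return "\n".join(lines) + "\n"
```

A typical use would be to read pdftohtmlgui.ini, pass its text through set_gspath together with the result of to_short_path, and write it back. Keep a copy of the original file first.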

(5) Be aware that the "generate complex document" option must be ticked to preserve all text formatting, but doing so produces a large page-sized background picture for every document page. With this option unticked, each picture in the PDF is converted to its own picture file.

(6) Here is a review & instruction page describing PDFtoHTML/PDF2HTMLgui.

Steven Stevenson

A-r said...

Sorry for my bad English. The GUI download link is broken. Please help me with this software.

A-r said...

All of these links are broken. I have Windows XP.