PDFtoHTML
pdftohtml is a utility which converts PDF files into HTML and XML formats.
The latest release is 0.36.
It is based on xpdf 2.02 by Derek Noonburg.
Demo: PDF Document HTML Document
You can get a win32 GUI for pdftohtml here
This is a neat one. I use PDFs every day and would like to keep copies of them in HTML. I found this tool fast and very simple to use. Because my PDFs contain a lot of complex graphics, it could not reproduce them perfectly, but I am happy that it still presented them neatly. I did wonder why the pages are generated separately rather than as a single HTML page.
To install this open-source application and get the Windows GUI running, you will have to install the following:
GhostScript (the "Windows Binary" package)
PDFtoHTML (the "Windows Binary" package)
PDF2HTMLgui v1.3 (the Windows Binary again! ;) )
Comments
Can you send me the GUI package and tell me which distribution of GhostScript I should use (for Win32)? I am m_chris05@yahoo.com
m_chris05 at yahoo com
I too saw that the GUI is no longer available and that the link is broken.
But I tried a Google search for "pdf2htmlgui.zip" and found the link below.
Download it from there and give it another try.
http://ftp.pub.cri74.org/pub/win9x/conversion_fichiers/PDF-HTML/PDFtoHTML/
As for your other question, use gs854w32.exe, the .exe for 32-bit Windows. That is the latest version I found at http://sourceforge.net/project/showfiles.php?group_id=1897
Let me know if it still does not work. I have tested it and it works well enough.
Sorry, I have not tried it in console mode, so I am not aware of the .dll problem. It could be a DLL missing from its path. Maybe someone can try the console version and help us out.
Cheers
http://cypherswipe.googlepages.com/pdf2htmlgui.zip
Try the link above. It seems to offer the file for download. Until then, I will try to put this file up somewhere I know so that we don't lose it again :)
Please let us know if this works!
1. Command-line usage doesn't seem to reference the .ini file created by the GUI install, which tells it where to find the Ghostscript binaries. After installing GS, I added the resulting /bin folder to the Windows PATH variable, and it was then able to find all the needed .dlls.
2. Running pdftohtml on the command line with the -h switch lists more options than are documented elsewhere.
3. Using the -noframes switch on the command line gives you one consolidated HTML file instead of one HTML file per PDF page. There is an equivalent checkbox option in the GUI labelled -generate no frames-
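Points 1 and 3 above can be sketched as a small script. This is only an illustration, not part of pdftohtml itself: it assumes Python is installed, that the converter can be started as "pdftohtml" from the PATH, and that you pass in the Ghostscript bin folder yourself. The function names are made up for this example, and the only switch used is -noframes.

```python
import os
import subprocess


def build_command(pdf_path, single_page=True, exe="pdftohtml"):
    """Build the pdftohtml argument list (exe name is an assumption)."""
    cmd = [exe]
    if single_page:
        # -noframes: one consolidated HTML file instead of one per page
        cmd.append("-noframes")
    cmd.append(pdf_path)
    return cmd


def env_with_ghostscript(gs_bin_dir, base_env=None):
    """Return a copy of the environment with the Ghostscript bin folder
    placed first on PATH, so its .dlls can be found."""
    env = dict(os.environ if base_env is None else base_env)
    env["PATH"] = gs_bin_dir + os.pathsep + env.get("PATH", "")
    return env


def convert(pdf_path, gs_bin_dir, single_page=True):
    """Run the conversion; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(
        build_command(pdf_path, single_page),
        env=env_with_ghostscript(gs_bin_dir),
        check=True,
    )
```

A call might look like convert("manual.pdf", r"c:\progra~1\ghostscript\gs8.71\bin"), where both the file name and the folder are placeholders for your own values. Changing PATH only for the child process avoids editing the system-wide Windows setting.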
Thanks,
Peter
Click on 'Serveur 1'. Don't worry, the webpage is in French but the program is in English.
(2) How to convert a document to a single html page:
More Options->tick generate no frames.
(3) Which version of Ghostscript should be installed? The last open source version, i.e. gs871w32.exe from the GhostScript site linked to in the main article above.
(4) PDF2HTMLgui bugs: I found a couple of bugs. The program crashed when navigating to a file. And it would not produce pic files at all when generate complex document was ticked.
Workaround: I manually edited pdftohtmlgui.ini (a file created by the program). The 2nd line must be edited so it uses shortnames where needed such that there are no spaces in the Ghostscript executable pathname. Mine now reads:
GSpath=c:\progra~1\ghostscript\gs8.71\bin\gswin32c.exe
Do this after everything has been installed and PDF2HTMLgui has been run for the first time, so that you have already answered both of its questions about the locations of pdftohtml.exe and gswin32c.exe. Under Windows XP, the DOS pathname can be found by navigating in MS-DOS to the Ghostscript executable folder by folder from the root, and on the way using
DIR /x
to ascertain the short name of each subfolder that makes up the pathname.
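The manual edit described above could also be scripted. The following is a rough sketch, assuming Python on Windows: short_path asks Windows for the 8.3 form through the GetShortPathNameW API (it may simply return the long name if short names are disabled on the drive), and set_gspath rewrites the GSpath= line in the text of pdftohtmlgui.ini. Both function names are invented for this example, and nothing is assumed about the other lines of the file.

```python
import ctypes


def short_path(long_path):
    """Return the DOS 8.3 form of an existing path (Windows only)."""
    buf = ctypes.create_unicode_buffer(260)
    n = ctypes.windll.kernel32.GetShortPathNameW(long_path, buf, len(buf))
    if n == 0 or n >= len(buf):
        raise OSError("could not get a short name for " + long_path)
    return buf.value


def set_gspath(ini_text, gs_exe):
    """Replace the value of the GSpath= line; leave all other lines alone."""
    out = []
    for line in ini_text.splitlines():
        if line.lower().startswith("gspath="):
            out.append("GSpath=" + gs_exe)
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

To use it, read pdftohtmlgui.ini into a string, pass it through set_gspath together with the result of short_path for your gswin32c.exe, and write the string back. Keep a backup copy of the .ini file first.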
(5) Be aware that the generate complex document option must be ticked to preserve all text formatting, but doing so produces a large page-sized background pic for every document page. With this option unticked, each picture is converted to a single picture file.
(6) Here is a review & instruction page describing PDFtoHTML/PDF2HTMLgui.
Steven Stevenson