Using OpenOffice to convert MS Word documents

Rickard Öberg recently posted a request for suggestions about using Java to convert MS Word documents into HTML. I have been doing some work on this lately, using the freely available, open-source OpenOffice.org to do the hard parts and making calls to a running OpenOffice server from within my Java code. There seems to be wider interest in this from the Java community, so I am posting some source code here for anyone who wants it.

You need a running instance of OpenOffice in 'server' mode. My classes connect to it and ask it to perform the conversions. I have successfully tested conversions of Word 97 and Word 2000 documents to HTML, plain ASCII text, 'flat' XML and OpenOffice's native format. The documentation at OpenOffice.org suggests that there are many other possibilities.
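As a rough illustration of what 'server' mode means in practice, here is a minimal sketch that starts OpenOffice listening for connections on a socket and checks whether something is already listening there. It is not the code from my classes. The class name OfficeServerLauncher, the port 8100, the host localhost and the assumption that the soffice executable is on your PATH are all illustrative choices; the -headless and -accept switches are the single-dash forms used by OpenOffice.org, and your version or platform may need something different.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: starts OpenOffice so that it accepts UNO connections on a socket.
class OfficeServerLauncher {

    // Builds the command line for 'server' mode.
    // Host and port are assumptions; use whatever your client code connects to.
    static List<String> buildCommand(String sofficePath, String host, int port) {
        String accept = "-accept=socket,host=" + host + ",port=" + port + ";urp;";
        return Arrays.asList(sofficePath, "-headless", accept);
    }

    // Returns true if something accepts TCP connections on host:port.
    // This only shows that the port is open, not that it is OpenOffice.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes 'soffice' is on the PATH unless a path is given as the first argument.
        String soffice = args.length > 0 ? args[0] : "soffice";
        if (isListening("localhost", 8100, 500)) {
            System.out.println("Something is already listening on port 8100.");
            return;
        }
        List<String> command = buildCommand(soffice, "localhost", 8100);
        System.out.println("Starting: " + String.join(" ", command));
        new ProcessBuilder(command).inheritIO().start();
    }
}
```

Once the server is up, the conversion code connects to the same host and port. Checking the port first avoids starting a second instance when one is already running.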

Much of the code was freely adapted from examples in the OpenOffice documentation. This is not something I will be actively developing further, but feel free to use this in your own projects if it's useful.

This was previously published at http://blog.sockdrawer.org and was retrieved from the Internet Archive.