A bit about PDF files
Files in Portable Document Format (PDF), a file format originally created by Adobe, come in two types.
Some PDFs are created as scanned images of a document. When one is opened in a PDF viewer, such as Adobe Acrobat Reader, the text is not selectable. As you move the cursor over the text, the cursor is displayed as a crosshair.
This indicates that the content of the PDF is only selectable as an image. In effect, each page of the PDF is a picture of a page and the text can only be extracted by Optical Character Recognition (OCR). These are image-based PDFs.
The other type of PDF is one that has been created as the output of an application such as Microsoft Word or other page layout or word processing software. These PDFs were typically "printed" to PDF format and contain data about the text in the PDF. When the PDF is viewed in a PDF viewer the text of the document may be selected and copied. As you move the cursor over the text, the cursor will be displayed as an i-beam cursor.
This indicates that the text is selectable. Unless a PDF document's security settings prevent it, PDF files with selectable text can be converted back to an editable format such as Microsoft Word. These are text-based PDFs.
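If you have many PDFs to sort, the cursor test described above can be approximated in a script. The following is a minimal sketch, not a definitive tool: it assumes the third-party pypdf package is installed, the 20-character threshold is an arbitrary choice, and the "mixed" label (for files where only some pages have extractable text) is our own addition. Note that a scanned PDF that has already been run through OCR contains a text layer and will be reported as text-based.

```python
from typing import List


def classify_pages(page_texts: List[str], min_chars: int = 20) -> str:
    """Label a PDF from the text extracted from each of its pages.

    min_chars is an arbitrary threshold (an assumption) that keeps stray
    characters on a scanned page from counting as real text.
    """
    pages_with_text = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars
    )
    if pages_with_text == 0:
        # No page yields text (a PDF with no pages also lands here).
        return "image-based"
    if pages_with_text == len(page_texts):
        return "text-based"
    return "mixed"


def extract_page_texts(path: str) -> List[str]:
    """Return the extractable text of each page (requires pypdf)."""
    # Imported here so classify_pages can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]
```

For a file on disk, classify_pages(extract_page_texts("interview.pdf")) returns one of the three labels; the file name here is only a placeholder.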
The PDF format also supports a variety of security features that the creator of a PDF can apply to their document. These security features, such as disabling the copying and pasting of text from the document or disabling printing, can inhibit or prevent the use of the PDF as electronic source material for qualitative research.
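For readers curious about how these restrictions are recorded: with the standard password-based security handler, the PDF specification stores them as bit flags in a single integer (the /P entry of the file's encryption dictionary). The sketch below decodes the four flags most relevant here. The function name and the sample value are illustrative only, and whether a given viewer or conversion tool honors the flags is up to that tool.

```python
# Bit positions are 1-based in the PDF specification,
# so "bit 3" corresponds to the value 1 << 2.
PERMISSION_BITS = {
    "print": 3,     # print the document
    "modify": 4,    # change the document's contents
    "copy": 5,      # copy or extract text and graphics
    "annotate": 6,  # add or modify annotations
}


def decode_permissions(p: int) -> dict:
    """Map each permission name to True (allowed) or False (denied)."""
    return {
        name: bool(p & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


# Illustrative value only: printing and copying allowed, editing denied.
print(decode_permissions(-44))
```

A PDF whose "copy" flag is off is the kind of file for which the "Save as text..." and conversion options described below may be unavailable.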
Converting a text-based PDF to a Microsoft Word document
Many PDF viewers, such as Acrobat Reader, offer an option to save the text of the PDF document to a plain text file if the permissions of the PDF file allow it. In Acrobat Reader, select "Save as text..." from the "File" menu. If you just need to convert the PDF to a text file, using "Save as text..." is the fastest way to do so.
A number of commercial and free tools will convert a text-based PDF file to a Microsoft Word (or OpenOffice) document. Once converted, the source material can be saved as .DOCX (Microsoft Word 2007 and above) or .ODT (OpenOffice Writer) and opened in HyperRESEARCH 3.5 and later, saved as Rich Text Format (RTF) and opened in HyperRESEARCH 3.0 and later, or saved as plain text and opened in any version of HyperRESEARCH.
- Nuance (http://www.nuance.com/) makes a commercial PDF conversion tool called PDF Converter. Details can be found on the Nuance web site. We recommend a commercial tool if you have a lot of PDF files to convert and want them converted quickly.
- NitroPDF Software (http://www.nitropdf.com/) provides a free web-based service for converting PDFs to either Microsoft Word "doc" format or "rtf" format (we recommend converting to .doc and then using Word to save the .doc as a .rtf file). The service is very popular, and the turn-around time (the time from when you submit your file to when the result shows up in your email) can be hours. It is useful when you have a few PDF files to convert and the conversion is not time sensitive. NitroPDF also makes commercial conversion software for higher volumes, with more conversion options. Details can be found on the NitroPDF web site.
There are other PDF to Microsoft Word conversion tools available, both free and commercial. Enter "PDF to Word Converter" into your favorite search engine to see many of them.
Converting an image-based PDF to a Microsoft Word document
The process for converting an image-based PDF is similar to that for a text-based PDF. With these PDF documents, the conversion tool must look at each character of text as an image and, by matching the shape of the character, convert it to a typable character. This technology is called Optical Character Recognition (OCR). OCR capability is generally only found in commercial PDF conversion tools.
Once a PDF file has been converted to Microsoft Word or OpenOffice, you can save it as .DOCX (Microsoft Word 2007 and above) or .ODT (OpenOffice Writer) and open it in HyperRESEARCH 3.5 and later, save it as Rich Text Format (RTF) and open it in HyperRESEARCH 3.0 and later, or save it as plain text and open it in any version of HyperRESEARCH.
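To make the idea of "matching the shape of the character" concrete, here is a toy sketch of shape matching. It is not how any particular conversion tool works: the three hand-drawn 3x5 templates and the function names are invented for illustration, and real OCR software must also cope with fonts, sizes, skew, and page layout. Each candidate glyph is compared with every template, and the template that differs in the fewest pixels wins.

```python
# Hand-drawn 3x5 templates: "#" is ink, "." is blank paper.
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def pixel_difference(a, b):
    """Count the pixels at which two same-sized bitmaps differ."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognize(glyph):
    """Return the template character whose shape is closest to glyph."""
    return min(
        TEMPLATES,
        key=lambda ch: pixel_difference(glyph, TEMPLATES[ch]),
    )


# A "T" with one pixel of ink missing, as might happen in a poor scan.
smudged_t = ("###", ".#.", ".#.", "...", ".#.")
print(recognize(smudged_t))
```

Because the closest template is chosen even when no template matches exactly, a smudged or faint character can still be recognized, and a badly damaged one can be misread. This is why OCR output should be proofread before coding.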
Other conversion options
Other options exist for PDF conversion. We've found the web site http://www.online-convert.com/ to provide a very wide array of free conversion options. PDFs can also be converted to a series of image files (each page an image) and coded as images. This is potentially useful when the PDF is a scanned image, or when exact preservation of the original colors, layout, and styles in a complex PDF is required.