
Researchware, Inc.

Simply Powerful Tools for Qualitative Research

File Conversions: PDF to Microsoft Word

A bit about PDF files

Portable Document Format (PDF) files, a file format originally created by Adobe, come in two types.

Some PDFs are created as scanned images of a document. When one is opened in a PDF viewer, such as Adobe Acrobat Reader, the text is not selectable. As you move the cursor over the text, it is displayed as a crosshair.

This indicates that the content of the PDF is only selectable as an image. In effect, each page of the PDF is a picture of a page and the text can only be extracted by Optical Character Recognition (OCR). These are image-based PDFs.

The other type of PDF is one that has been created as the output of an application such as Microsoft Word or other page layout or word processing software. These PDFs were typically "printed" to PDF format and contain data about the text in the PDF. When the PDF is viewed in a PDF viewer the text of the document may be selected and copied. As you move the cursor over the text, the cursor will be displayed as an i-beam cursor.


This indicates that the text is selectable. Unless a PDF document's security settings prevent it, PDF files with selectable text can be converted back to an editable format such as Microsoft Word. These are text-based PDFs.
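If you have many files to sort, the two types can sometimes be told apart without opening each one in a viewer. The following is a rough sketch only, not a feature of any tool mentioned here. It assumes the PDF stores its object dictionaries uncompressed, and the function name guess_pdf_type is our own invention. Many newer PDFs compress these dictionaries, in which case the result is "unknown", and a scanned PDF that already carries a hidden OCR text layer will be reported as text-based.

```python
import re

# Font resources suggest real text; image XObjects suggest scanned pages.
FONT_RE = re.compile(rb"/Font\b")
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def guess_pdf_type(data: bytes) -> str:
    """Guess whether raw PDF bytes look text-based or image-based.

    This is a heuristic on the raw bytes, not a real PDF parser.
    """
    if not data.startswith(b"%PDF-"):
        raise ValueError("not a PDF file")
    if FONT_RE.search(data):
        return "text-based"
    if IMAGE_RE.search(data):
        return "image-based"
    # Nothing recognizable, for example because the objects are compressed.
    return "unknown"
```

To try it on a real file, read the file in binary mode and pass the bytes in, for example guess_pdf_type(open("interview.pdf", "rb").read()). The cursor test described above remains the more reliable check.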

The PDF format also supports a variety of security features that the creator of a PDF can apply to a document. These features, such as disabling copying and pasting of text or disabling printing, can inhibit or prevent the use of the PDF as electronic source material for qualitative research.
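For readers curious about how these restrictions are recorded, the sketch below decodes the permission flags that an encrypted PDF stores as a single integer (the "P" entry of its encryption dictionary). The bit positions follow our reading of the PDF specification's standard security handler. The names PERMISSION_BITS and decode_permissions are hypothetical, and the sketch assumes you have already obtained the P value by some other means; it does not read it from a file.

```python
# Bit positions are 1-based, counted from the low-order bit.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy_text": 5,
    "annotate": 6,
}


def decode_permissions(p_value: int) -> dict[str, bool]:
    """Map each permission name to True (allowed) or False (disabled)."""
    return {
        name: bool(p_value & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }
```

For example, decode_permissions(-44) reports that printing and copying text are allowed while modifying and annotating are not. A document whose copy_text flag is False is the kind that a conversion tool may refuse to process.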

Converting a text-based PDF to a Microsoft Word document

Many PDF viewers, such as Acrobat Reader, offer an option to save the text of the PDF document to a plain text file if the permissions of the PDF file allow it. In Acrobat Reader, select "Save as text..." from the "File" menu. If you just need to convert the PDF to a text file, using "Save as text..." is the fastest way to do so.
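If you have a whole folder of text-based PDFs, the same plain text extraction can be scripted. The sketch below is one possible approach and uses a tool we have not discussed above: it assumes the free pdftotext command-line program (part of Poppler) is installed and on your PATH. The function names are our own, and files that pdftotext cannot convert (for example, because of security settings) are simply skipped.

```python
from pathlib import Path
import shutil
import subprocess


def build_pdftotext_command(pdf_path: Path) -> list[str]:
    """Return the pdftotext command that writes <name>.txt next to the PDF."""
    txt_path = pdf_path.with_suffix(".txt")
    # -layout asks pdftotext to keep the original physical layout where it can.
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def convert_folder(folder: Path) -> list[Path]:
    """Convert every .pdf in folder; return the text files that were written."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (Poppler) was not found on PATH")
    written = []
    for pdf_path in sorted(folder.glob("*.pdf")):
        result = subprocess.run(build_pdftotext_command(pdf_path))
        if result.returncode == 0:  # skip files that failed to convert
            written.append(pdf_path.with_suffix(".txt"))
    return written
```

Calling convert_folder(Path("sources")) would then leave a .txt file beside each PDF in a folder named "sources". For a single file, "Save as text..." is still simpler.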

A number of commercial and free tools exist that will convert a text-based PDF file to a Microsoft Word (or OpenOffice) document. Once converted, the source material can be saved as .DOCX (Microsoft Word 2007 and above) or .ODT (OpenOffice Writer) and opened in HyperRESEARCH 3.5 and later, saved as Rich Text Format (RTF) and opened in HyperRESEARCH 3.0 and later, or saved as plain text and opened in any version of HyperRESEARCH.

  • Nuance (http://www.nuance.com/) makes a commercial PDF conversion tool called PDF Converter. Details can be found on the Nuance web site. We recommend a commercial tool if you have a lot of PDF files to convert and want them converted in a timely fashion.
  • NitroPDF Software (http://www.nitropdf.com/) provides a free web-based service for converting PDFs to either Microsoft Word "doc" format or to "rtf" format (we recommend converting to .doc and then using Word to save the .doc as a .rtf file). The converted file is sent to you by email. This service is very popular, and the turn-around time (the time from when you submit your file to when it shows up in your email) can be hours, so it is most useful when you have a few PDF files whose conversion is not time sensitive. NitroPDF also makes commercial conversion software for higher volumes with more conversion options. Details can be found on the NitroPDF web site.

There are other PDF to Microsoft Word conversion tools available, both free and commercial. Enter "PDF to Word Converter" into your favorite search engine to see many of them.

Converting an image-based PDF to a Microsoft Word document

The process of converting an image-based PDF is similar to that of a text-based PDF. With these PDF documents, the conversion tool must look at each character of text as an image and, by matching the shape of the character, convert it to a typable character. This technology is called Optical Character Recognition (OCR). OCR capability is generally only found in commercial PDF conversion tools.
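To give a feel for what "matching the shape of the character" means, here is a toy illustration. It is not how any commercial OCR product works; it assumes each character has already been cut out as a tiny 3-by-5 black-and-white grid, and it simply picks the known letter whose grid differs in the fewest cells. The three templates and all names are made up for the example.

```python
# Each template is a 3x5 bitmap: "1" is an inked cell, "0" is blank.
TEMPLATES = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def distance(a, b):
    """Count the cells in which two same-sized bitmaps differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(glyph):
    """Return the template letter whose shape is closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: distance(glyph, TEMPLATES[ch]))


# A "T" with one stray speck of ink in the fourth row.
smudged_t = ("111", "010", "010", "110", "010")
print(recognize(smudged_t))
```

Real OCR has to cope with different fonts, sizes, skewed scans, and touching letters, which is one reason accuracy varies so much between tools and why converted text should always be proofread.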

From Nuance (http://www.nuance.com/), only the PDF Converter Pro version supports OCR. Details can be found on the Nuance web site.

From NitroPDF Software (http://www.nitropdf.com/), only the NitroPDF Professional version supports OCR. Details can be found on the NitroPDF web site.

Once a PDF file has been converted to Microsoft Word or OpenOffice, you can save it as .DOCX (Microsoft Word 2007 and above) or .ODT (OpenOffice Writer) and open it in HyperRESEARCH 3.5 and later, save it as Rich Text Format (RTF) and open it in HyperRESEARCH 3.0 and later, or save it as plain text and open it in any version of HyperRESEARCH.
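The version rules above can be summarized in a few lines of code. This is only a restatement of the paragraph as a lookup, offered as a sketch; the function name openable_formats is hypothetical and is not part of HyperRESEARCH.

```python
def openable_formats(hyperresearch_version: str) -> list[str]:
    """List the file types the given HyperRESEARCH version can open,
    according to the version rules described in this article."""
    parts = [int(part) for part in hyperresearch_version.split(".")]
    while len(parts) < 2:  # treat "3" the same as "3.0"
        parts.append(0)
    version = tuple(parts)

    formats = [".txt"]  # plain text opens in any version
    if version >= (3, 0):
        formats.append(".rtf")
    if version >= (3, 5):
        formats += [".docx", ".odt"]
    return formats
```

For instance, openable_formats("3.0") returns [".txt", ".rtf"], so with that version you would save the converted document as RTF to keep its formatting.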

Other conversion options

Other options exist for PDF conversion. We've found that the web site http://www.online-convert.com/ provides a very wide array of free conversion options. PDFs can also be converted to a series of image files (each page an image) and coded as images. This is potentially useful when the PDF is a scanned image or when exact preservation of the original colors, layout, and styles in a complex PDF is required.

