Does the type of PDF created matter? Yes, it does. When it comes to converting PDFs, the nature of the PDF determines how reliable the conversion will be. Here's a behind-the-scenes look at the types of PDFs.
Native PDFs: As noted, native PDFs are ones that are generated from an electronic source, such as a Word document, a computer-generated report, or spreadsheet data. These have an internal structure that can be read and interpreted.
These "generated" PDF documents, thus, already contain characters that have an electronic character designation. In most cases, the PDF creation software will take information from the structure of the Word document - such as character information, word placement information, etc. - and retain these items in the created PDF, which is why you can word search a text-based document. Conversion from such a PDF can rely on these electronic character designations and provide reliable output.
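To make the idea of "electronic character designations" concrete, here is a minimal Python sketch of how a converter might rebuild searchable lines of text from characters and their placement. The `Glyph` record, the `glyphs_to_lines` and `contains_word` helpers, and the top-down `y` coordinate are all assumptions made for illustration; they are not the data model of the PDF format or of any particular converter.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str  # the electronic character designation
    x: float   # horizontal position on the page
    y: float   # baseline position, measured from the top (assumption)


def glyphs_to_lines(glyphs, line_tolerance=2.0):
    """Group glyphs into lines by baseline, then order each line left to right."""
    lines = []  # each entry is (baseline_y, [glyphs on that line])
    for g in sorted(glyphs, key=lambda g: g.y):
        if lines and abs(lines[-1][0] - g.y) <= line_tolerance:
            lines[-1][1].append(g)
        else:
            lines.append((g.y, [g]))
    return ["".join(g.char for g in sorted(row, key=lambda g: g.x))
            for _, row in lines]


def contains_word(glyphs, word):
    """A word search works because every glyph carries its character code."""
    return any(word in line for line in glyphs_to_lines(glyphs))


# Glyphs can be stored in any order; position information restores reading order.
page = [Glyph("i", 16, 10), Glyph("H", 10, 10), Glyph("K", 16, 30), Glyph("O", 10, 30)]
print(glyphs_to_lines(page))
print(contains_word(page, "OK"))
```

The sketch ignores spacing, fonts, and columns, which a real converter also has to infer from placement information.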
There are a variety of PDF converters available on the market that will take the data from native PDFs and move it into MS Word, Excel and other formats. Able2Extract by Investintech (investintech.com/prod_a2e.htm) is one example of a PDF converter that can handle native PDFs.
Scanned PDFs: Because not all documents that need to be transmitted are in electronic form yet, physical paper documents still have to be converted into electronic form. This is where the scanned PDF type comes into play.
= Background
It would be inefficient to re-type documents manually into electronic form and then convert them into PDFs. The solution is to scan them, using an electronic scanning device. Like the PDF writer, a scanner "digitally captures" the physical document in electronic form. Unlike the PDF writer, however, a scanner doesn't reconstruct the characters of every word when it creates the scanned image; it simply takes a "snap-shot" of the document. This snap-shot is then turned into a PDF by software integrated with the scanner. The result is a scanned PDF document.
However, even though the image may show a document that contains words, the computer recognizes those words only as "images" that it displays without any information structure behind them. If you try to text search the document, the PDF search engine won't yield any results.
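The practical consequence can be sketched in a few lines of Python. The example assumes you already have the text layer of each page as a string, obtained from whatever PDF library you use, with an empty string or `None` for pages that contain only an image. The function names `needs_ocr` and `search` and the `min_chars` threshold are hypothetical, chosen for this illustration.

```python
def needs_ocr(page_texts, min_chars=1):
    """Return the indexes of pages whose text layer is empty or nearly empty."""
    return [i for i, text in enumerate(page_texts)
            if len((text or "").strip()) < min_chars]


def search(page_texts, term):
    """Return the indexes of pages whose text layer contains the term."""
    term = term.lower()
    return [i for i, text in enumerate(page_texts)
            if term in (text or "").lower()]


# Page 0 came from a native PDF; pages 1 and 2 are scanned images with no text.
pages = ["Invoice 2024 total due", "", None]
print(search(pages, "invoice"))
print(needs_ocr(pages))
```

A search can only ever match page 0 here, no matter what words are visible in the scanned images on the other pages.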
= Solution
To convert a scanned PDF into an editable format, OCR (Optical Character Recognition) software is required to analyze the "image" of each character and match it to an electronic character. Because the software has to infer each character from its shape, it is much more difficult to be sure that the character "recognized" by the OCR software is, indeed, the character on the scanned document.
Note that the quality of OCR output is affected by factors such as poor image quality of the scanned document, a mixture of fonts in the document, and italicized or underlined text, all of which can obscure the shape of individual characters.
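A toy example shows why image quality matters so much. The Python sketch below uses nearest-template matching on tiny 3x5 bitmaps, a deliberately simplified stand-in for OCR and not the method of any real OCR engine or of Able2Extract. The `TEMPLATES` glyphs and the `distance` and `recognize` helpers are invented for this illustration.

```python
# Each template is a 3x5 bitmap: "1" is an inked pixel, "0" is blank paper.
TEMPLATES = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def distance(a, b):
    """Count the pixels that differ between two bitmaps of equal size."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(bitmap):
    """Return (best character, its distance, margin over the runner-up)."""
    ranked = sorted((distance(bitmap, t), ch) for ch, t in TEMPLATES.items())
    (best_d, best_ch), (second_d, _) = ranked[0], ranked[1]
    return best_ch, best_d, second_d - best_d


clean_t = ("111", "010", "010", "010", "010")
smudged_t = ("111", "010", "010", "010", "011")  # one stray pixel at the bottom

print(recognize(clean_t))    # exact match with a clear margin
print(recognize(smudged_t))  # margin of 0: "T" and "I" are equally close
```

With a clean scan the match is exact. A single smudged pixel leaves the margin at zero, so the program can no longer tell a "T" from an "I", which is the kind of uncertainty that poor scans, mixed fonts, and underlining introduce at scale.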
Finding a PDF converter that handles image PDFs is more difficult. The professional version of Able2Extract (investintech.com/prod_a2e_pro.htm) can handle image PDFs as well as native PDFs.
Bio:
Investintech.com Inc. (investintech.com) is the developer and publisher of powerful PDF creation and extraction software products, including Able2Extract (investintech.com/prod_a2e.htm) and the Sonic PDF Creator (investintech.com/products/sonic/prod_sonic.htm).