3 reasons to consider scan-to-text now
October 26, 2009 by Sam Narisi
Posted in: In this week's e-newsletter, Latest News & Views, Solutions
In the past decade, Optical Character Recognition has gone from being an expensive curiosity to a no-brainer, something included in the box with every inkjet multifunction printer.
These programs, which convert scanned documents back into editable, searchable text, have gotten steadily better and faster over the years. Many pages can now be converted back into text with near-total accuracy, even with entry-level software.
And though the basic product has been pretty stable over the last five years or so, there have been a number of advances in Optical Character Recognition (OCR) technology that make it more useful than ever.
We talked with Scott Thompson, senior product marketing manager at ABBYY, the maker of FineReader, one of the leading OCR programs on the market. Here are some of the new features his company has come up with recently, things that may help you create a smooth workflow at the touch of a button.
1. HTML scanning
OCR programs have been able to convert scanned input into PDF, text, and Word formats for a long time. One critical advance is conversion into formatted HTML files, the kind that make attractive Web pages out of product sheets, price lists or annual reports.
Automating this process is a big plus for companies that regularly post documents onto either intranet or internet sites. The ability to capture a printed document and make it available online is a major workflow bonus for anyone who has had to code HTML by hand. Sure, you can post a PDF file online, but PDFs are slow to open and hard to update.
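To make the idea concrete, here is a minimal Python sketch of the last step of such a conversion: turning already-recognized page elements into a simple HTML page. It is not how FineReader works internally. The `blocks` list, its element names, and the `render_block` and `to_html` functions are all assumptions made up for illustration.

```python
from html import escape

# Hypothetical OCR output: (element kind, recognized content) pairs.
# A real OCR engine would produce its own, richer structure.
blocks = [
    ("heading", "Price List 2009"),
    ("paragraph", "Prices are in USD & exclude tax."),
    ("table", [["Item", "Price"], ["Widget", "4.50"]]),
]

def render_block(kind, content):
    """Render one recognized element as an HTML fragment."""
    if kind == "heading":
        return f"<h1>{escape(content)}</h1>"
    if kind == "table":
        rows = "".join(
            "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
            for row in content
        )
        return f"<table>{rows}</table>"
    # Anything else is treated as a plain paragraph.
    return f"<p>{escape(content)}</p>"

def to_html(blocks):
    """Join all rendered fragments into a minimal HTML page."""
    body = "\n".join(render_block(kind, content) for kind, content in blocks)
    return f"<html><body>\n{body}\n</body></html>"

page = to_html(blocks)
print(page)
```

Escaping the recognized text matters here because scanned documents often contain characters such as `&` or `<` that would otherwise break the page.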
2. Analyzing document structure
ABBYY is pushing not just to understand individual letters and pages, but whole documents as well. Using a technology called Adaptive Document Recognition Technology (ADRT), the professional version of its software can break down a page and recognize distinct elements, including headlines, headers and footers, images, tables, and page numbers. In this way, it preserves the digitized text without the operator having to spend a lot of time separating out elements and editing the resulting document in great detail.
This technology starts to overcome one of the biggest traditional problems with OCR: it works well with straightforward text documents but struggles with the complications of a laid-out, compartmentalized page. Now the program can approximate the original layout in Word, giving a digital file that looks like the original and is editable and searchable.
3. Indexing by touch
ABBYY has developed a technology that simplifies one of the most labor-intensive tasks in managing documents, namely, indexing documents by specific data types. Think of the issue of scanning invoices for accounts payable. Ideally, you’d like to capture the data (the invoice number, the name and address of the invoicing party, the amount invoiced, and so on) so you will have multiple ways of looking up key information. For example, an accountant or sales manager can call up all invoices between March and June from Company X.
To do that normally, you’d have to have someone typing in the index data before, during, or after the scanning process. That’s because no two companies use the same invoice format, so it is difficult to pinpoint where, for example, the invoice number would be located on the page. But manual entry can be slow and lead to all kinds of clerical errors. The TouchTo program lets you scan in a document, bring it up on a touchscreen (available on a new model from Fujitsu), and identify specific fields by touch. So, the digitized invoice number and/or date can be tagged directly to the scanned document in a few touches.
Another potential plus is portability. Using a network-based scanner with a touchscreen, the operator can bring the scanning to the documents, rather than the documents to the scanner. And this means an employee on the spot in a remote office can do the indexing, rather than leaving it to a back-office operation. In any case, ABBYY estimates that the touchscreen feature can make an operator three times more efficient than with traditional keyboard-based indexing.
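The invoice lookup mentioned earlier (all invoices from Company X between March and June) shows why index fields matter. Below is a minimal Python sketch of what tagged index data might look like and how it could be queried. The `InvoiceIndex` record, its field names, the sample data, and `find_invoices` are hypothetical and are not part of any ABBYY or Fujitsu product.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class InvoiceIndex:
    """Index fields tagged to one scanned invoice (hypothetical schema)."""
    invoice_number: str
    vendor: str
    invoice_date: date
    amount: float
    scan_file: str  # path to the scanned image or PDF

# Made-up sample records, standing in for fields captured at scan time.
records = [
    InvoiceIndex("INV-1001", "Company X", date(2009, 2, 14), 250.00, "scan_001.pdf"),
    InvoiceIndex("INV-1002", "Company X", date(2009, 4, 3), 975.50, "scan_002.pdf"),
    InvoiceIndex("INV-2001", "Company Y", date(2009, 5, 20), 120.00, "scan_003.pdf"),
    InvoiceIndex("INV-1003", "Company X", date(2009, 6, 30), 410.25, "scan_004.pdf"),
]

def find_invoices(records, vendor, start, end):
    """Return invoices from one vendor dated between start and end, inclusive."""
    return [
        r for r in records
        if r.vendor == vendor and start <= r.invoice_date <= end
    ]

# All Company X invoices from March through June 2009.
matches = find_invoices(records, "Company X", date(2009, 3, 1), date(2009, 6, 30))
for r in matches:
    print(r.invoice_number, r.invoice_date, r.amount, r.scan_file)
```

Without the tagged vendor and date fields, the same lookup would mean opening each scanned file and reading it by eye.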
Tags: ABBYY, FineReader, OCR, Optical Character Recognition
