3 reasons to consider scan-to-text now

October 26, 2009 by Sam Narisi

In the past decade, Optical Character Recognition has gone from being an expensive curiosity to a no-brainer, something included in the box with every ink jet multifunctional.

These programs, which convert scanned documents back into editable, searchable text, have gotten steadily better and faster over the years, so that now many pages can be converted into the original text with near total accuracy, even with entry-level software.

And though the basic product has been pretty stable over the last five years or so, a number of advances in Optical Character Recognition (OCR) technology have made it more useful than ever.

We talked with Scott Thompson, senior product marketing manager at ABBYY, the maker of FineReader, one of the leading OCR programs on the market. Here are some of the new features his company has come up with recently, things that may help you create a smooth workflow that runs at the touch of a button.

1. HTML scanning

OCR programs have been able to convert scanned input into PDF, text, and Word formats for a long time. One critical advance is conversion into formatted HTML files, the kind that make attractive Web pages out of product sheets, price lists or annual reports.

Automating this process is a big plus for companies that regularly post documents onto either intranet or internet sites. The ability to capture a printed document and make it available online is a major workflow bonus for anyone who has had to code HTML by hand. Sure, you can post a PDF file online, but PDFs are slow to open and hard to update.
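To give a rough idea of what this kind of automation saves, here is a minimal Python sketch that wraps already-recognized text in a simple Web page. It is an illustration only, not how FineReader or any other OCR product works internally; the function name text_to_html and the sample price-list text are made up for the example.

```python
from html import escape


def text_to_html(title, paragraphs):
    """Wrap recognized text in a minimal HTML page (hypothetical helper).

    Special characters such as & and < are escaped so the scanned text
    displays correctly in a browser.
    """
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body>\n<h1>{escape(title)}</h1>\n{body}\n</body></html>\n"
    )


# Sample text standing in for the output of an OCR pass (invented data).
page = text_to_html(
    "Price List",
    ["Widgets & gadgets: $5 each", "Prices valid through <December>"],
)
print(page)
```

Even in this toy version, the escaping step shows the kind of detail that is easy to get wrong when pages are coded by hand.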

2. Analyzing document structure

ABBYY is pushing not just to understand individual letters and pages, but whole documents as well. Using a technology called Adaptive Document Recognition Technology (ADRT), the professional version of its software can break down a page and recognize distinct elements, including headlines, headers and footers, images, tables, and page numbers. In this way, it preserves the structure of the digitized text, so the operator does not have to spend a lot of time separating out elements and editing the resulting document in detail.

This technology starts to overcome one of the biggest traditional problems with OCR: it works great with straightforward text documents but has trouble handling the complications of a laid-out, compartmentalized page. Now the program can approximate the original layout in Word, producing a digital file that looks like the original and is editable and searchable.
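One way to picture why recognizing page elements matters: once each block on a page carries a label, software can treat the blocks differently. The Python sketch below is a simplified illustration under that assumption, not ABBYY's ADRT; the Element class, the label names and the sample page are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One labeled block on a page (hypothetical structure)."""
    kind: str  # e.g. "headline", "header", "footer", "paragraph", "page_number"
    text: str


def searchable_text(elements):
    """Keep headlines and body text; drop running headers, footers and page numbers."""
    skip = {"header", "footer", "page_number"}
    return "\n".join(e.text for e in elements if e.kind not in skip)


# Invented sample page, as it might look after its elements have been labeled.
sample_page = [
    Element("header", "Acme Corp. Confidential"),
    Element("headline", "Annual Report"),
    Element("paragraph", "Revenue grew this year."),
    Element("page_number", "12"),
    Element("footer", "www.example.com"),
]
clean = searchable_text(sample_page)
print(clean)
```

Without the labels, the header, footer and page number would end up mixed into the body text and would have to be edited out by hand.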

3. Indexing by touch

ABBYY has developed a technology that simplifies one of the most labor-intensive tasks in managing documents: indexing them by specific data types. Consider scanning invoices for accounts payable. Ideally, you’d like to capture the key data (the invoice number, the name and address of the invoicing party, the amount invoiced, and so on) so you have multiple ways of looking up a document. For example, an accountant or sales manager could call up all invoices from Company X dated between March and June.

To do that normally, you’d need someone to type in the index data before, during and/or after the scanning process. That’s because no two companies use the same invoice format, so it is difficult to pinpoint where, for example, the invoice number is located on the page. But manual entry can be slow and lead to all kinds of clerical errors. The TouchTo program lets you scan in a document, bring it up on a touchscreen (available on a new model from Fujitsu), and identify specific fields by touch. So the digitized invoice number and/or date can be tagged directly to the scanned document in a few touches.
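To make the payoff of indexing concrete, here is a minimal Python sketch of the kind of lookup that tagged fields allow, such as pulling all invoices from Company X dated between March and June. It assumes the index data has already been captured; the InvoiceIndex record, its field names, the file paths and the sample invoices are hypothetical and have nothing to do with how TouchTo stores its data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class InvoiceIndex:
    """Index fields tagged to one scanned invoice (hypothetical layout)."""
    invoice_number: str
    vendor: str
    invoice_date: date
    amount: float
    scan_path: str  # where the scanned image is stored


def find_invoices(records, vendor, start, end):
    """Return one vendor's invoices dated from start to end inclusive, oldest first."""
    matches = [
        r for r in records
        if r.vendor == vendor and start <= r.invoice_date <= end
    ]
    return sorted(matches, key=lambda r: r.invoice_date)


# Invented sample data for illustration.
records = [
    InvoiceIndex("1003", "Company X", date(2009, 5, 20), 410.00, "scans/1003.pdf"),
    InvoiceIndex("1001", "Company X", date(2009, 3, 15), 250.00, "scans/1001.pdf"),
    InvoiceIndex("1002", "Company Y", date(2009, 4, 2), 980.50, "scans/1002.pdf"),
    InvoiceIndex("1004", "Company X", date(2009, 7, 1), 75.25, "scans/1004.pdf"),
]

hits = find_invoices(records, "Company X", date(2009, 3, 1), date(2009, 6, 30))
for r in hits:
    print(r.invoice_number, r.invoice_date, r.amount)
```

The lookup itself is trivial; the expensive part is getting the invoice number, vendor and date attached to each scan accurately in the first place, which is the step the touchscreen approach is meant to speed up.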

Another potential plus is portability. Using a network-based scanner with a touchscreen, the operator can bring the scanning to the documents, rather than the documents to the scanner. This means an employee on the spot in a remote office can do the indexing, rather than leaving it to a back-office operation. In any case, ABBYY estimates that the touchscreen feature can make an operator three times more efficient than with traditional keyboard-based indexing.
