Whether it’s auto-extracting information from a scanned receipt for an expense report or translating a foreign language using your phone’s camera, optical character recognition (OCR) technology can seem mesmerizing. And while it seems miraculous that we have computers that can digitize analog text with a degree of accuracy, the reality is that the accuracy we have come to expect falls short of what’s possible. And that’s because, despite the perception of OCR as an extraordinary leap forward, it’s actually pretty old-fashioned and limited, largely because it’s run by an oligopoly that’s holding back further innovation.
What’s New Is Old
OCR’s precursor was invented over 100 years ago in Birmingham, England by the scientist Edmund Edward Fournier d’Albe. Wanting to help blind people “read” text, d’Albe built a device, the Optophone, that used photo sensors to detect black print and convert it into sounds. The sounds could then be translated into words by the visually impaired reader. The devices proved so expensive, and the process of reading so slow, that the potentially revolutionary Optophone was never commercially viable.
While additional development of text-to-sound continued in the early 20th century, OCR as we know it today didn’t get off the ground until the 1970s, when inventor and futurist Ray Kurzweil developed an OCR computer program. By 1980, Kurzweil had sold his company to Xerox, which continued to commercialize paper-to-computer text conversion. Since then, very little has changed. You convert a document to an image, then the software tries to match letters against character sets that have been uploaded by a human operator.
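To make that matching step concrete, here is a minimal sketch of rule-based character matching. It is a toy illustration under stated assumptions, not how any commercial OCR engine is implemented: the 3x5 bitmaps, the `TEMPLATES` table and the function names are all hypothetical.

```python
# Toy rule-based character matching: each glyph is a 3x5 bitmap, and
# recognition picks the stored template with the fewest differing pixels.
# The templates stand in for the "character sets" loaded by an operator.
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def pixel_distance(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def match_character(glyph, templates=TEMPLATES):
    """Return the label of the stored template closest to the scanned glyph."""
    return min(templates, key=lambda label: pixel_distance(glyph, templates[label]))


# A slightly noisy "T" (one stray pixel) is still closest to the "T" template.
noisy_t = ["111", "010", "010", "110", "010"]

# An "O" is not in the stored set, so it is forced onto the nearest template.
unknown_o = ["111", "101", "101", "101", "111"]

print(match_character(noisy_t))
print(match_character(unknown_o))
```

The second call shows the limitation: a character the operator never supplied cannot be recognized, only mistaken for whichever stored template happens to be nearest.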
And therein lies the problem with OCR as we know it. There are countless variations in document and text types, yet most OCR is built on a limited set of existing rules that ultimately limits the technology’s true utility. As Morpheus proclaimed in The Matrix: “Yet their strength and their speed are still based in a world that is built on rules. Because of that, they will never be as strong or as fast as you can be.”
Furthermore, additional innovation in OCR has been stymied by the technology’s gatekeepers, as well as by its few-cents-per-page business model, which has made investing billions in its development about as viable as the Optophone.
But that’s starting to change.
Now, a new generation of engineers is rebooting OCR in a way that would astonish Edmund Edward Fournier d’Albe. Built on machine learning, these new systems aren’t limited by the rules-based character matching of existing OCR software. With machine learning, algorithms trained on a significant volume of data learn to recognize patterns for themselves. Instead of being restricted to a fixed number of character sets, these new OCR programs accumulate knowledge and learn to recognize any number of characters.
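The difference between matching fixed character sets and learning from data can be sketched in a few lines. The example below uses a nearest-centroid classifier as a deliberately simple stand-in for the learned approach; production OCR systems use far larger neural models, and the 3x3 glyphs, labels and function names here are hypothetical.

```python
from collections import defaultdict


def train(samples):
    """Average labeled glyphs (flat lists of 0/1 pixels) into one prototype per label."""
    sums, counts = {}, defaultdict(int)
    for label, pixels in samples:
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
        sums[label] = [s + p for s, p in zip(sums[label], pixels)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(model, pixels):
    """Return the label whose learned prototype is closest in squared distance."""
    def dist(proto):
        return sum((a - b) ** 2 for a, b in zip(proto, pixels))
    return min(model, key=lambda label: dist(model[label]))


# Hypothetical 3x3 training glyphs, flattened row by row.
samples = [
    ("I", [0, 1, 0, 0, 1, 0, 0, 1, 0]),
    ("I", [0, 1, 0, 0, 1, 0, 1, 1, 0]),  # noisy variant
    ("L", [1, 0, 0, 1, 0, 0, 1, 1, 1]),
    ("L", [1, 0, 0, 1, 0, 0, 1, 1, 0]),  # noisy variant
]
model = train(samples)

# An unseen, imperfect "I" is still classified from what the model learned.
print(predict(model, [0, 1, 0, 0, 1, 0, 0, 1, 1]))

# Supporting a new character takes only new examples, not new rules.
model = train(samples + [("O", [1, 1, 1, 1, 0, 1, 1, 1, 1])])
print(predict(model, [1, 1, 1, 1, 0, 1, 1, 1, 1]))
```

Nothing in `predict` is specific to any character. What the system recognizes is determined by the examples it was trained on, which is the property the paragraph above describes.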
One of the best examples of modern-day OCR is Tesseract, the 34-year-old OCR software that was originally developed at Hewlett-Packard, released as open source in 2005 and sponsored by Google from 2006. Since then, the OCR community’s brightest minds have been working to improve the software’s stability, and a dozen years later, Tesseract can process text in 100 languages, including right-to-left languages like Arabic and Hebrew.
Amazon has also released a powerful OCR engine, Textract. Made available through Amazon Web Services in May of this year, the technology already has a reputation for being among the most accurate to date.
These readily available technologies have vastly reduced the cost of building a high-quality OCR system. Still, they don’t necessarily solve the problems that most OCR users are looking to fix.
The intrinsic difficulty of character recognition has long blinded us to the reality that simple digitization was never the end goal of using OCR. We don’t use OCR just so we can put analog text into digital formats. What we want is to turn analog text into digital insights. For example, a company might scan hundreds of insurance contracts with the end goal of uncovering its climate-risk exposure. On their own, digital copies of all those paper contracts are of little more use than the originals.
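The gap between digitized text and insight is easy to see in code. The sketch below assumes the contracts have already been converted to plain text and uses a simple keyword scan to surface the clauses that matter. The term list, function name and sample contract are hypothetical, and a keyword scan is only the crudest form of extraction.

```python
import re

# Hypothetical climate-risk vocabulary; a real system would curate or learn this.
RISK_TERMS = ("flood", "wildfire", "hurricane", "sea level")


def find_risk_clauses(ocr_text, terms=RISK_TERMS):
    """Return (term, sentence) pairs for sentences that mention a risk term."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", ocr_text.strip())
    hits = []
    for sentence in sentences:
        lowered = sentence.lower()
        for term in terms:
            if term in lowered:
                hits.append((term, sentence))
    return hits


# Stand-in for the text an OCR engine would return for one scanned contract.
sample = (
    "The insurer covers fire damage to the premises. "
    "Losses caused by flood are excluded from this policy. "
    "Premiums are due annually."
)

for term, sentence in find_risk_clauses(sample):
    print(f"{term}: {sentence}")
```

Even this crude pass turns a page of text into something a risk analyst can act on. It also shows the limit of rules: the scan finds the word "flood" but cannot tell that the clause is an exclusion rather than coverage.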
That is why many are now looking beyond traditional machine learning to deep learning, a branch of machine learning built on multi-layered neural networks loosely inspired by the human brain. Rather than relying on hand-engineered rules and features, these networks learn their own representations directly from data. The benefit is that, with deep learning, the technology does more than just recognize text; it can derive meaning from it.
With deep-learning-driven OCR, the company scanning insurance contracts gets more than just digital versions of its paper documents. It gets instant visibility into the meaning of the text in those documents. And that can unlock billions of dollars’ worth of insights and saved time.
Adding Insight To Recognition
OCR is finally moving away from just seeing and matching. Driven by deep learning, it’s entering a new phase where it first recognizes scanned text, then makes meaning of it. The competitive edge will go to the software that provides the most powerful information extraction and the highest-quality insights. And since each business category has its own particular document types, structures and considerations, there’s room for multiple companies to succeed based on vertical-specific competencies.
Users of traditional OCR services should reevaluate their current licenses and payment terms. They can also try out free or low-cost options like Amazon’s Textract or the open-source Tesseract to see the latest advances in OCR and determine whether those advances align with their business goals. It will also be important to scope out independent providers in the RPA and artificial intelligence space that are making strides for the industry overall.
And in five years, I expect what’s been fairly static for the past 30 — if not 100 — years will be completely unrecognizable.