List of optical character recognition software
An OCR SDK is a software development kit for adding optical character recognition capabilities to forms processing applications, document imaging management systems, e-discovery systems and records management solutions.
To reduce the difficulty of incorporating OCR technology, some OCR SDKs provide a large number of APIs and support multiple operating systems and programming languages.
Here is a non-exhaustive comparison of optical character recognition software:
Name | Founded year | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Programming language | SDK? | Languages | Fonts | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ABBYY FineReader | 1989 | 11 | 2011 | | | | | | | C/C++ | | 186 | | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License editions for Windows; Express Edition for Mac. |
AnyDoc Software | 1989 | | | | | | | | | VBScript | | | | Works with structured, semi-structured, and unstructured documents. |
CuneiForm/OpenOCR | | 12 | 2007 | | | | | | | C/C++ | | 28 | Any printed font | Enterprise-class system; can preserve text formatting and recognizes complicated tables of any structure. |
ExperVision TypeReader & RTK | 1987 | 7.1.170.1125 | 2010 | | | | | | | C/C++ | | 17 | 2618 | Won the highest marks in the independent testing performed by UNLV for X consecutive years (in 1994). PC Magazine reported that ExperVision's OpenRTK is four to eight times faster than the competition, but also described it as "Not as accurate as rival products, clumsy interface, limited options for proofreading, couldn't open some files in standard PDF or image formats." |
GOCR | | 0.47 | 2009 | | | | | | | C | | | | |
LEADTOOLS | 1990 | 17 | 2010 | | | | | | | Various | | 56 | Any printed font | Supports Latin, Asian, Arabic, and MICR character sets. For full-page, zonal, and form image processing. Includes OCR, barcode, OMR and forms recognition. ICR (handwritten text recognition) is supported. |
Java OCR | | | 2010 | | | | | | | | | | | Uses Java |
Microsoft Office Document Imaging | | Office 2007 | 2007 | | | | | | | | | | | Uses OmniPage |
Microsoft Office OneNote 2007 | | 2007 | 2007 | | | | | | | | | | | |
Ocrad | | 0.20 | 2010 | | | | | | | C++ | | Latin alphabet | | Command line |
OCRopus | | 0.3.1 | 2008 | | | | | | | C++ and Lua | | | | Pluggable framework that can use Tesseract. |
OCRFeeder | | 0.7.6 | 2009 | | | | | | | Python | | | | Features a full user interface and a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines such as Tesseract or Ocrad. |
OmniPage | 2005 | 18 | 2011 | | | | | | | C/C++/C# | | | | Product of Nuance Communications. |
Puma.NET | | | | | | | | | | C# | | 28 | Any printed font | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps the Puma COM server and provides a simplified API for .NET applications. |
Readiris | | 12 Pro | 2009 | | | | | | | C++ | | | | Product of I.R.I.S. Group of Belgium. Available in Asian and Middle Eastern editions. |
ReadSoft | | | | | | | | | | | | | | Scans, captures and classifies business documents such as invoices, forms and purchase orders, integrated with business processes. |
RelayFax | | | | | | | | | | | | Many | | Converts faxed pages into editable document formats (DOC, PDF, etc.). |
Scantron Cognition | | | | | | | | | | | | | | For working with localized interfaces, corresponding language support is required. |
SimpleOCR | 2002 | 3.5 | 2008 | | | | | | | | | | | |
SmartScore | | | | | | | | | | | | | | For musical scores. |
Tesseract | | 3.00 | 2010 | | | | | | | C++, C | | 35+ | | Created by Hewlett-Packard; under further development by Google. |
Transym OCR | | 3.0 | 2008 | | | | | | | C#, C/C++, VB, VB.NET | | 11 | | |
Zonal OCR | | | | | | | | | | | | | | |