Introduction to Programming with LEADTOOLS .NET OCR - Multi-thr...

http://www.codeproject.com/Articles/30341/Introduction-to-Programm...


About Article
Introduces the key features of the new .NET OCR classes, provides a step-by-step approach for creating an OCR application, and provides sample code.
Type: Article. First posted: 22 Oct 2008. Views: 27,884. Bookmarked: 34 times.

By LEADTOOLS Support, 7 Jan 2013



This article is in the Product Showcase section for our sponsors at CodeProject. These reviews are intended to provide you with information on products and services that we consider useful and of value to developers.


LEADTOOLS is the #1 imaging toolkit in the world and has earned its place on top by consistently delivering imaging components of the highest quality, performance and stability in a format that is programmer-friendly. Developers are able to significantly reduce time-to-market for their applications, thereby maximizing productivity and ensuring the greatest possible return on investment.

LEADTOOLS has an all-new design that greatly simplifies development without sacrificing control. One important enhancement is the set of high-level .NET classes for enabling Optical Character Recognition (OCR) of scanned images. This new architecture is intuitive, flexible and easy to follow. A programmer can enable image OCR functionality in as little as three lines of code, while keeping the level of control required by the specific application or workflow.

In this article, we will introduce you to the key features of the new .NET OCR classes, provide you with a step-by-step approach for creating an OCR application, and provide you with sample code. Feel free to try it out for yourself by downloading a fully functional evaluation SDK from the links provided below.

LEADTOOLS provides methods to:

- Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
- Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
- Use multiple OCR engines, which are abstracted from the user through a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
- Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
- Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
- Set accuracy thresholds prior to recognition to control the accuracy of recognition.
- Learn, save, and load character recognition data for similar documents. The software learns as a result of normal recognition, and acquires additional information by using the OCR's text verification system.
- Recognize text from 5 to 72 points in virtually any typeface.
- Increase recognition accuracy with built-in and user dictionaries.
- Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
- Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
- Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.


The LEADTOOLS OCR .NET class library comes in Win32 and x64 editions that can support development of software applications for any of the following environments:

- Windows 8 (32 and 64-bit editions)
- Windows 7 (32 and 64-bit editions)
- Windows 2008 (32 and 64-bit editions)
- Windows Vista (32 and 64-bit editions)
- Windows XP (32 and 64-bit editions)
- Windows 2000

Samples provided will work in Visual Studio 2005 or Visual Studio 2008.


LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.

The following is an outline of the general steps involved in recognizing one or more pages. For a more detailed explanation, download the LEADTOOLS evaluation and refer to the Programming with LEADTOOLS .NET OCR topic in the .NET help:

1. Select the engine type you wish to use and create an instance of the IOcrEngine interface.
2. Start up the OCR engine with the IOcrEngine.Startup method.
3. Establish an OCR document with one or more pages.
4. Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.)
5. Optional. Set the active languages to be used by the OCR engine. (The default is English.)
6. Optional. Set the spell checking language. (The default is English.)
7. Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually.
8. Recognize.
9. Save recognition results, if desired. The results can be saved to either a file or to memory.
10. Shut down the OCR engine when finished.


Steps 4, 5, 6, and 7 can be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.

You can start using LEADTOOLS for .NET OCR in your application by adding a reference to the Leadtools.Forms.Ocr.dll assembly. This assembly contains the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR. Because the toolkit supports multiple engines, the code that interfaces with each engine is stored in a separate assembly that is loaded dynamically once an instance of the IOcrEngine interface is created. You must therefore make sure the engine assembly you plan to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can also add the engine assembly as a reference to your project so that its dependencies are detected automatically, although LEADTOOLS does not require this.
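Because the engine assembly is loaded dynamically, a missing file only shows up at run time. As a rough sketch of a pre-flight check, the helper below confirms that an engine assembly sits in the same folder as Leadtools.Forms.Ocr.dll. The class name, the method name and the engine file name passed in by the caller are our own assumptions for illustration; they are not part of LEADTOOLS, so substitute the file name of the engine assembly that ships with your installation.

```csharp
using System;
using System.IO;

static class EngineAssemblyCheck
{
    // Hypothetical helper (not a LEADTOOLS API): returns true when both
    // Leadtools.Forms.Ocr.dll and the named engine assembly exist in the
    // given directory.
    public static bool EngineAssemblyIsAdjacent(string directory, string engineAssemblyFileName)
    {
        string ocrAssembly = Path.Combine(directory, "Leadtools.Forms.Ocr.dll");
        string engineAssembly = Path.Combine(directory, engineAssemblyFileName);
        return File.Exists(ocrAssembly) && File.Exists(engineAssembly);
    }
}
```

A typical call would pass AppDomain.CurrentDomain.BaseDirectory as the directory, so that the check looks in the folder the application was started from.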

The following example shows how to perform the above steps in code:


Visual Basic
    ' *** Step 1: Select the engine type and
    ' create an instance of the IOcrEngine interface.
    ' We will use the LEADTOOLS OCR Plus engine and use it in the same process
    Dim ocrEngine As IOcrEngine = _
        OcrEngineManager.CreateEngine(OcrEngineType.Plus, False)

    ' *** Step 2: Start up the engine.
    ' Use the default parameters
    ocrEngine.Startup(Nothing, Nothing, Nothing)

    ' *** Step 3: Create an OCR document with one or more pages.
    Dim ocrDocument As IOcrDocument = _
        ocrEngine.DocumentManager.CreateDocument()
    ' Add all the pages of a multipage TIF image to the document
    ' (first page 1, last page -1 meaning "through the last page")
    ocrDocument.Pages.AddPages("C:\Images\Ocr.tif", 1, -1, Nothing)

    ' *** Step 4: Establish zones on the page(s), either manually or automatically
    ' Automatic zoning
    ocrDocument.Pages.AutoZone(Nothing)

    ' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
    ' Enable English and German languages
    ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})

    ' *** Step 6: (Optional) Set the spell checking language
    ' Enable the spell checking system and set English as the spell language
    ocrEngine.SpellCheckManager.Enabled = True
    ocrEngine.SpellCheckManager.SpellLanguage = "en"

    ' *** Step 7: (Optional) Set any special recognition module options
    ' Change the fill method for the first zone in the first page to be default
    Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
    ocrZone.FillMethod = OcrZoneFillMethod.Default
    ocrDocument.Pages(0).Zones(0) = ocrZone

    ' *** Step 8: Recognize
    ocrDocument.Pages.Recognize(Nothing)

    ' *** Step 9: Save recognition results
    ' Save the results to a PDF file
    ocrDocument.Save("C:\Images\Document.pdf", OcrDocumentFormat.PdfA, Nothing)
    ocrDocument.Dispose()

    ' *** Step 10: Shut down the OCR engine when finished
    ocrEngine.Shutdown()
    ocrEngine.Dispose()


C#

    // *** Step 1: Select the engine type and
    // create an instance of the IOcrEngine interface.
    // We will use the LEADTOOLS OCR Plus engine and use it in the same process
    IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Plus, false);

    // *** Step 2: Start up the engine.
    // Use the default parameters
    ocrEngine.Startup(null, null, null);

    // *** Step 3: Create an OCR document with one or more pages.
    IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
    // Add all the pages of a multipage TIF image to the document
    // (first page 1, last page -1 meaning "through the last page")
    ocrDocument.Pages.AddPages(@"C:\Images\Ocr.tif", 1, -1, null);

    // *** Step 4: Establish zones on the page(s), either manually or automatically
    // Automatic zoning
    ocrDocument.Pages.AutoZone(null);

    // *** Step 5: (Optional) Set the active languages to be used by the OCR engine
    // Enable English and German languages
    ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

    // *** Step 6: (Optional) Set the spell checking language
    // Enable the spell checking system and set English as the spell language
    ocrEngine.SpellCheckManager.Enabled = true;
    ocrEngine.SpellCheckManager.SpellLanguage = "en";

    // *** Step 7: (Optional) Set any special recognition module options
    // Change the fill method for the first zone in the first page to be default
    OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
    ocrZone.FillMethod = OcrZoneFillMethod.Default;
    ocrDocument.Pages[0].Zones[0] = ocrZone;

    // *** Step 8: Recognize
    ocrDocument.Pages.Recognize(null);

    // *** Step 9: Save recognition results
    // Save the results to a PDF file
    ocrDocument.Save(@"C:\Images\Document.pdf", OcrDocumentFormat.PdfA, null);
    ocrDocument.Dispose();

    // *** Step 10: Shut down the OCR engine when finished
    ocrEngine.Shutdown();
    ocrEngine.Dispose();
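In the step-by-step code, an exception thrown during recognition or saving would skip the cleanup calls at the end. One possible way to guard against that, shown here only as a sketch, is to wrap the engine and document in using blocks and call Shutdown in a finally block. It uses only the calls that appear in the sample above. The class and method names are our own, the Leadtools.Forms.Ocr namespace is assumed from the assembly name, and we assume IOcrDocument can be used in a using statement because the sample calls Dispose on it.

```csharp
using Leadtools.Forms.Ocr; // assumed namespace, matching Leadtools.Forms.Ocr.dll

static class OcrSteps
{
    // Hypothetical helper: recognizes the first page of an image file and
    // saves the result as PDF/A, releasing resources even if a step throws.
    public static void RecognizeFirstPageToPdf(string inputPath, string outputPath)
    {
        using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Plus, false))
        {
            ocrEngine.Startup(null, null, null);
            try
            {
                using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
                {
                    // First and last page numbers are both 1: add page 1 only
                    ocrDocument.Pages.AddPages(inputPath, 1, 1, null);
                    ocrDocument.Pages.AutoZone(null);
                    ocrDocument.Pages.Recognize(null);
                    ocrDocument.Save(outputPath, OcrDocumentFormat.PdfA, null);
                }
            }
            finally
            {
                // Runs whether or not recognition succeeded
                ocrEngine.Shutdown();
            }
        }
    }
}
```

The optional language, spell checking and zone options from steps 5 to 7 are left out to keep the sketch short; they would go between AutoZone and Recognize.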

Finally, the following sample shows how to perform the same task using the one-shot, "fire and forget" IOcrAutoRecognizeManager interface:

Visual Basic

    ' Create the engine instance
    Using ocrEngine As IOcrEngine = _
        OcrEngineManager.CreateEngine(OcrEngineType.Plus, False)
        ' Start up the engine
        ocrEngine.Startup(Nothing, Nothing, Nothing)

        ' Convert the multipage TIF image to a PDF document
        ocrEngine.AutoRecognizeManager.Run( _
            "C:\Images\Ocr.tif", _
            "C:\Images\Document.pdf", _
            Nothing, _
            OcrDocumentFormat.PdfA, _
            Nothing)
    End Using

C#

    // Create the engine instance
    using (IOcrEngine ocrEngine =
        OcrEngineManager.CreateEngine(OcrEngineType.Plus, false))
    {
        // Start up the engine
        ocrEngine.Startup(null, null, null);

        // Convert the multipage TIF image to a PDF document
        ocrEngine.AutoRecognizeManager.Run(
            @"C:\Images\Ocr.tif",
            @"C:\Images\Document.pdf",
            null,
            OcrDocumentFormat.PdfA,
            null);
    }
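The one-shot call lends itself to batch work. The sketch below, offered as an illustration rather than a recommended design, converts every TIF file in a folder by calling AutoRecognizeManager.Run once per file with the same five arguments used above. The class, the PdfPathFor and ConvertFolder methods, and the "*.tif" filter are our own assumptions; the Leadtools.Forms.Ocr namespace is assumed from the assembly name, and we also assume a single started engine can serve several Run calls in a row.

```csharp
using System.IO;
using Leadtools.Forms.Ocr; // assumed namespace, matching Leadtools.Forms.Ocr.dll

static class BatchOcr
{
    // Hypothetical helper: maps "in/Scan.tif" to "<outputDirectory>/Scan.pdf"
    public static string PdfPathFor(string imagePath, string outputDirectory)
    {
        string name = Path.GetFileNameWithoutExtension(imagePath) + ".pdf";
        return Path.Combine(outputDirectory, name);
    }

    // Hypothetical helper: converts each TIF in inputDirectory to a PDF/A
    // file of the same base name in outputDirectory.
    public static void ConvertFolder(string inputDirectory, string outputDirectory)
    {
        using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Plus, false))
        {
            ocrEngine.Startup(null, null, null);

            foreach (string imagePath in Directory.GetFiles(inputDirectory, "*.tif"))
            {
                ocrEngine.AutoRecognizeManager.Run(
                    imagePath,
                    PdfPathFor(imagePath, outputDirectory),
                    null,
                    OcrDocumentFormat.PdfA,
                    null);
            }
        }
    }
}
```

Starting the engine once and reusing it for the whole folder avoids paying the startup cost per file, which is the main reason to prefer this shape over calling the single-file sample in a loop.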

LEADTOOLS provides developers with access to the world's best performing and most stable imaging libraries in an easy-to-use, high-level programming interface enabling rapid development of business-critical applications. The new design will simplify the development effort, without sacrificing the level of control dictated by the specific application.

As demonstrated by the samples above, LEAD's new high-level OCR interface and design provide a logical and flexible approach to converting scanned images to editable and searchable documents. Classes are provided to allow you to control the entire process, or you can simply start the engine and convert any of the 150+ supported image formats to all common document formats with a single method call.

OCR is one of the many things LEADTOOLS has to offer. For more information be sure to visit our home page and download a free fully functioning evaluation SDK.

LEADTOOLS provides several toolkits, add-ons and cost-saving product bundles that provide its award-winning OCR technology. We recommend either Recognition Imaging or Document Imaging Suite, which include the Document Imaging SDK and all the required add-ons for OCR and searchable PDF output. For more options, please contact our sales department. If you want to try it before you make a purchasing decision, you can download the free 60-day fully functional evaluation for LEADTOOLS.

Need help getting this sample up and going? Contact our support team for free evaluation support!

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
