Lab Exercises
Copyright IBM Corporation, 2011 US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM Software
Contents
Lab 1  Accounts Payable Invoice Processing
    1.1  Overview
    1.2  Processing Invoices Using the Taskmaster APT Thick Client
    1.3  Exploring the Taskmaster Web Interface
Lab 2  Quick and Easy Document Setup with Taskmaster Flex
    2.1  Overview
    2.2  Using Taskmaster Flex Manager
    2.3  Exploring the Taskmaster Flex Application
Lab 3  Reporting with IBM Datacap Taskmaster RV2
    3.1  Overview
    3.2  Viewing Reports
    3.3  Creating a Filter for a Report
    3.4  Creating a Dashboard of Reports
Lab 4  Datacap Studio Deep Dive
    4.1  Overview of Datacap Studio
    4.2  Rulesets, Rules, Functions, and Actions
    4.3  Lab Scenario
    4.4  Starting the Datacap Studio Application Wizard
    4.5  Setting Up the Document Hierarchy
    4.6  Configuring Scanning, Page Identification, and Field Extraction
    4.7  Creating the Documents and Fields
    4.8  Testing the Configuration
    4.9  Configuring and Testing Visual Verification
    4.10  Creating a Simple Validation Rule
Lab 5  IBM Datacap NENU Monitoring
    5.1  Overview
    5.2  Creating a NENU Configuration Using Datacap Studio
    5.3  Testing NENU
Lab 6  Integrating IBM Datacap Taskmaster with the FileNet P8 ECM Repository
    6.1  Overview
    6.2  Updating the Application Configuration in Datacap Studio
    6.3  Testing the Updated Application
Lab 7  IBM Datacap Taskmaster and Email Integration
    7.1  Overview of Datacap Connector for Email and Electronic Documents
    7.2  Lab Overview
    7.3  Getting Started
Lab 8  Batch Splitting
    8.1  Lab Overview
    8.2  The Sample Check Processing Application
    8.3  Updating the Application
    8.4  Testing the Updated Application
Appendix A  Notices
Appendix B  Trademarks and Copyrights
Overview
Welcome to the IBM Datacap Taskmaster Proof of Technology! IBM Datacap Taskmaster is a powerful tool for capturing content (regardless of the content type), extracting important indexing and application data, and then storing both the content and data into various backend systems. It helps you eliminate labor-intensive document preparation and manual data entry, thus expediting the Capture process, as well as improving data accuracy. IBM Datacap Taskmaster can help by:
• Capturing content from a variety of input points, including scanners, multifunction devices (MFDs), fax servers, email systems, and file systems
• Supporting remote users for both scanning and indexing/verification without having to deploy distributed servers
• Extracting machine print, handprint, checkbox, and bar code data with multiple recognition engines (OCR/ICR, OMR) to reduce manual data entry
• Applying advanced validations, such as database lookups, math calculations, and checksums, to assure accurate data
• Enabling design and deployment of complex capture applications without expensive programming
• Integrating seamlessly with a variety of IBM and non-IBM ECM repositories, including IBM Content Manager Enterprise Edition, IBM FileNet Content Manager, IBM FileNet Image Services, Microsoft SharePoint, OpenText LiveLink, and EMC Documentum
• Providing all this capability as a web service that can integrate with your line-of-business applications to provide Capture capability at any point in your enterprise's process
Introduction
In this Proof of Technology (PoT), you will have the opportunity to explore many different aspects of the IBM Datacap Taskmaster capture solution. This is a hands-on exploration and there are several labs for you to perform. Some are geared towards end users and focus on the standard Taskmaster end user interfaces. Other labs are designed with administrators and developers in mind. Those labs delve into the deep technical details of IBM Datacap Taskmaster. The labs do not assume any previous experience with IBM Datacap Taskmaster, nor with any other Capture solutions. Additionally, there isn't any specific sequence to the labs. You may choose to do the labs that are of the most interest to you, and in any sequence. No lab depends on the successful completion of any other lab. Your instructor will outline which labs are most applicable to you, based upon the role you play in your organization.
Lab 1
1.1
Overview
Virtually every company needs to be able to process invoices. Most companies have some form of A/P software that is used to track, pay, report on, control payment of, and archive inbound invoices. Information needs to be read off the invoice and entered into those systems. Manual processing is costly and error-prone. Automation to provide straight-through processing is the goal. However, invoices can come from dozens, if not hundreds, of different sources. And two invoices from the same company can have critical information located in different areas if the number of line items differs from invoice to invoice. Taskmaster's APT solution is an out-of-the-box, ready-to-use invoice capture application. Inbound invoices are scanned and matched against an existing fingerprint database of known invoices. Key data elements are recognized through OCR/ICR and validated against business rules. Manual verification lets an end user correct low-confidence recognition reads and adjust fields that don't abide by established business rules. All the extracted information, along with the image of the invoice itself, is then available for export to third-party LOB applications, databases, and ECM repositories.
1.2
Processing Invoices Using the Taskmaster APT Thick Client
In this first lab, we will process a batch of invoices. Some of the invoices are from vendors with whom we have already done business, so the fingerprints for their invoices are already in our APT fingerprint database. Other invoices are from a vendor whose invoices we have never seen. We will see how this is dealt with dynamically, instead of having to go back to the IT department.
Because we are performing all the steps to process the batch on a single computer, it will be necessary to switch between the role of an end user and that of a Taskmaster server that is doing background processing. In a real customer environment, the end user activity (such as scanning and verification) would be done at a user desktop, and background processes would run on a server in the computer room.
1.2.1
Under normal circumstances, you would place all the invoices in an actual scanner and click an icon to initiate scanning. For our lab, the images are already scanned and are on the filesystem. So we will use something called virtual scanning. This is similar to processing content that is already available as an image, such as faxes or email attachments.
__1.
Start the APT Client application by clicking Start -> All Programs -> Datacap -> Applications -> APT -> APT Client.
__2.
You will be prompted to log on to the APT Client. Use the default user of admin and enter a password of admin.
__3.
This brings up the main APT client interface. Note that there are two panes or sections.
The top section (Operations window) has icons for all the user interfaces as well as background processes. A regular end user would only see icons for functions such as scanning and verification, whereas an administrator (such as yourself) sees all available functions.
The bottom section is the Job Monitor and shows the administrator the status of all the batches being processed by Taskmaster. We will use the Job Monitor to manually process our batches through the capture workflow.
__4.
We will simulate the scanning of the invoices. Double click on the Scan icon in the Operations window.
__5.
A dialog box opens with a list of Job definitions to choose from. A job essentially states which Taskmaster configuration you want to use to process the batch.
Select the Demo job at the top of the list and click the OK button.
__6.
The demo job has been configured to use virtual scanning instead of an actual scanner. A window will appear showing you the status of the job. It should complete in a very short period of time.
Note that if you were using a physical scanner, some type of scanner interface window would appear. You would be able to see the images as they were being scanned and make modifications (e.g., rearrange pages, rotate the image, delete images). The specific functions that would appear depend on the scanner you are using and the capabilities of the scanner software driver.
__7.
Click the Stop button. If you don't click the Stop button before the timeout period expires, Taskmaster assumes that you want to create another batch and the Select Job window appears again. The timeout period is completely configurable.
__8.
Click on the Job Monitor window so that it is the active window. Press F5 to refresh the job monitor.
Note that your batch has been created and that it is pending a task called Batch Profiler.
1.2.2
__9.
In the Job Monitor, double-click on the batch's ID number (as shown above by the red arrow).
__10.
You will be prompted to confirm that you want to execute the selected batch.
__11.
A window will appear indicating that Batch Profiler is running. This is the name Taskmaster gives the background processing task. It will take a few seconds as Taskmaster's Rule Runner service does all the background processing necessary to process the invoices.
Rule Runner is the service that does all the heavy lifting in Taskmaster. It does all the image cleanup, recognition, and other background tasks mentioned previously. One key competitive advantage that Taskmaster has over other capture solutions is that its Rule Runner engine can be called as a web service. This allows external applications to take advantage of Rule Runner. For example, you could initiate a job to capture and process images from within a BPM workflow. Rule Runner's availability as a web service allows for true in-process capture. Capture is no longer limited to a front-end application!
__12.
A window pops up when Batch Profiler has completed processing the batch.
Click the OK button to continue.
__13.
Click on the Job Monitor window and refresh it by pressing the F5 key.
Note that the status has been updated to indicate that the batch is pending the Verification step. Double-click on the batch ID (as shown above by the red arrow) to initiate Verification.
You will be asked to confirm if you want to execute the selected batch. Click OK to continue.
Note: Another way you could have started the Verification task would have been to double click the Verify icon in the Operations window. This is the way most end users would initiate Verification.
1.2.3
__14.
The Verification window we are looking at has the recognition results on the left-hand side of the screen and the image of the invoice on the right. You can size the windows to your preference, as well as zoom the image.
This first invoice has been processed with high confidence. The system has seen invoices from Stinger Wellhead Protection before, and therefore knows where all the data elements are located. Elements such as the PO Number, invoice total, tax, and invoice date have all been located and recognized correctly.
You can tab from one field to the next and the related part of the image will be highlighted.
__15.
Let's take a look at the individual line items that appear in the invoice.
Note that Taskmaster APT knows that there are 5 line items on the invoice. This is one of the key strengths of Taskmaster APT: it can handle variability in a document. The number of line items will vary from invoice to invoice, and Taskmaster APT can accommodate that. Contrast that with other applications, which use a fixed template approach to handling similar documents. You can click the Next button to go from one line item to the next. As you do so, note how the highlighted portion of the image also changes.
A teal or light blue background in a field means that the field was recognized with very high confidence. If a field has a yellow background, one or more of its characters were read with less than optimal confidence. The characters in question are shown in red. The operator can visually verify the OCR results and make changes if necessary.
__16.
Once the operator has looked through all the fields and line items, they can move on to the next document or the next problem. There are several ways to move from one problem or document to the next. You can use icons on the user interface or hot keys. Many experienced users eventually prefer hot keys because they are much faster. But icons are easier for new users, so we'll click the next problem icon.
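The field color coding described in the note before step 16 can be sketched in a few lines of Python. This is only an illustration of the idea, not Taskmaster's implementation: the `RecognizedChar` type, the 0.0 to 1.0 confidence scale, and the `LOW_CONFIDENCE_THRESHOLD` value are all assumptions, since the lab does not say how confidence is scored or where the cutoff lies.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the lab does not state what counts as "low" confidence.
LOW_CONFIDENCE_THRESHOLD = 0.80


@dataclass
class RecognizedChar:
    """One OCR/ICR character with an assumed 0.0-1.0 confidence score."""
    char: str
    confidence: float


def low_confidence_positions(chars):
    """Indexes of characters an operator should check (shown in red)."""
    return [i for i, c in enumerate(chars)
            if c.confidence < LOW_CONFIDENCE_THRESHOLD]


def field_background(chars):
    """'yellow' if any character is doubtful, otherwise 'teal'."""
    return "yellow" if low_confidence_positions(chars) else "teal"


# Made-up recognition result for a three-character field.
po_number = [
    RecognizedChar("4", 0.99),
    RecognizedChar("7", 0.62),  # doubtful read
    RecognizedChar("1", 0.97),
]
print(field_background(po_number))          # yellow
print(low_confidence_positions(po_number))  # [1]
```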
Look to the top of the screen, just under the tool bar, for the arrow pointing to the right with the question mark (as shown above by the red arrow). Click this icon to move to the next problem document.
__17.
The next document in the batch appears.
Review the results for this invoice.
__18.
Note that there is a button just under the shipping field marked TIO. You can click this button to see the original TIFF image that was created by the scanner but before any image cleanup was done.
__19.
Notice the difference in the images. The original image (shown below) has all sorts of horizontal and vertical lines on it.
The cleaned-up image doesn't have any of those lines. The ability to remove lines (and other image cleanup functions) can greatly improve the accuracy of the OCR/ICR engines; however, your enterprise may choose to store the original version of the image for legal purposes. The choice is up to you.
__20.
Click the next problem icon to advance to the next invoice in the batch.
__21.
The third invoice in the batch appears.
Here we have something a little different. One of the fields (invoice date) has a light red background. This indicates that a validation error occurred. In this case, the invoice date, though recognized correctly, fails validation because it is an invalid date. Feb 29 can only occur in a leap year, and 2009 is not a leap year. So the operator must correct it. Change the date to 2/28/09 to get around this validation error.
__22.
Taskmaster is capable of much more than simple validations like ensuring dates are valid or that something matches a datatype check (e.g., is numeric only). More complex, business-specific validations can be enforced. For example, Taskmaster APT is configured to ensure that the total amount of the invoice is equal to the sum of the line items plus shipping and tax.
Change the Invoice Total to something other than its true value. Now try to click the next problem icon to move to the next invoice. You'll get a validation error that says:
Click No. Change the Invoice Total back to its correct value of $3166.44 and click the next problem icon. You will now be able to move to the next invoice.
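The two validations exercised above (a calendar check on the invoice date and a cross-field check on the invoice total) can be sketched as follows. This is a minimal illustration in Python, not Taskmaster's rule syntax. The function names are hypothetical, the two-digit year is assumed to mean 20YY, and the line item, shipping, and tax amounts are invented so that they add up to the lab's $3166.44 total.

```python
from datetime import date
from decimal import Decimal


def parse_invoice_date(text):
    """Parse M/D/YY text; return a date, or None if no such date exists."""
    try:
        month, day, year = (int(part) for part in text.split("/"))
        # Assumption: two-digit years belong to the 2000s.
        return date(2000 + year, month, day)
    except ValueError:
        # Raised for non-numeric parts, a wrong number of parts,
        # or an impossible date such as Feb 29 in a non-leap year.
        return None


def total_matches(invoice_total, line_totals, shipping, tax):
    """True when the total equals line items plus shipping and tax."""
    return invoice_total == sum(line_totals, Decimal("0")) + shipping + tax


print(parse_invoice_date("2/29/09"))  # None
print(parse_invoice_date("2/28/09"))  # 2009-02-28

# Hypothetical breakdown; only the 3166.44 total comes from the lab.
lines = [Decimal("1500.00"), Decimal("1400.00")]
shipping, tax = Decimal("66.44"), Decimal("200.00")
print(total_matches(Decimal("3166.44"), lines, shipping, tax))  # True
print(total_matches(Decimal("3000.00"), lines, shipping, tax))  # False
```

`Decimal` is used rather than `float` so that currency amounts compare exactly.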
1.2.4
__23.
Note that virtually all the fields are blank! This is because Taskmaster APT has never seen an invoice from this vendor before and therefore doesn't know where to find most of the fields. Some of the fields, like Invoice Date and Invoice Total, were found through a location technique that searches the page for specific text that tells APT where the data is. But APT will require that a human tell it where the majority of the field data is located. With most capture products, this means going back to the capture administrator and asking them to create a new template for the invoice. Taskmaster APT doesn't require that. Instead, we will use a Taskmaster feature called Click and Key, which lets an end user dynamically add a new fingerprint to the fingerprint database.
__24.
The first thing we want to do is ensure that the vendor we are paying is someone we are authorized to pay. Most companies have a vendor database that tracks approved suppliers. Taskmaster APT has database lookups directly integrated into the Verification process.
Enter the first few characters of the vendor name (in this case JWS) and click the button marked Lookup Vendor.
__25.
A window pops up showing the vendors who start with JWS.
Double-click on the entry for JWS of Colorado Inc.
__26.
Now we will start using the Click and Key feature to tell Taskmaster APT where to get the rest of its fields. Tab to the Remittance_Zip field and click on the zip code for the vendor.
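A prefix lookup like the one behind the Lookup Vendor button can be sketched with a parameterized SQL query. The lab does not describe APT's actual vendor database, so everything here is an assumption for illustration: the in-memory SQLite database, the `vendors` table, its `name` column, and the "Acme Example Co" row are all hypothetical.

```python
import sqlite3

# Hypothetical stand-in for the approved-vendor database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO vendors (name) VALUES (?)",
    [
        ("JWS of Colorado Inc",),
        ("Stinger Wellhead Protection",),
        ("Acme Example Co",),  # invented row for contrast
    ],
)


def lookup_vendors(connection, prefix):
    """Return vendor names that start with the typed prefix."""
    rows = connection.execute(
        "SELECT name FROM vendors WHERE name LIKE ? ORDER BY name",
        (prefix + "%",),  # bound parameter, never string concatenation in SQL
    )
    return [row[0] for row in rows]


print(lookup_vendors(conn, "JWS"))  # ['JWS of Colorado Inc']
```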
When you move the cursor to a string of text, it will get highlighted in yellow. Click on the string to select it. The selected ZIP code will show up in the Remittance Zip field.
To make things a little easier, you can right-click on the image view window and select one of the zoom options. Zoom to width is probably a good idea.
__27.
Tab to the Invoice_Number field and click on the invoice number, located in the top right corner of the invoice.
__28.
Tab to the Tax field and click on the sales tax amount in the lower left part of the invoice.
__29.
Now tab to the PO_Number field and click the purchase order number on the image.
__30.
Now we can start telling Taskmaster APT where the line item detail is. The great thing about APT is that you only have to tell it where the first line item is, and it will figure out all the remaining line items after that.
Click the Add button in the Details section of the screen (as shown above by the red arrow). The line item number should change to 1 of 1.
__31.
Tab to the ItemID field and click on the first item number (the string 201 on the image).
__32.
Tab to the Qty field and click on the first quantity amount (the string 6 on the image).
__33.
Tab to the ItemDesc field. In this case, we want to select multiple words, not just a single string. There are two ways we can do this. You can hold down the Shift key while you click on each word in the string. An easier way is to simply hold down the left mouse button while you draw a box around the desired area.
__34.
Tab to the Price field and click on the first unit price amount (the string 65.00 on the image).
__35.
Tab to the LineTotal field and click on the first line total amount (the string 390.00 on the image).
__36.
__37.
We have only found the first line item. We need to find ALL the line items. Click on the Find Details button at the bottom of the Details section.
__38.
Note how the details section is updated to show you that you are looking at the first of three line items.
You can click the Next button to see each of the line item details.
Here we see the details for the second line item on the invoice.
__39.
We have completed defining the new invoice fingerprint. We can immediately add it to the fingerprint database by clicking on the New button.
A little fingerprint is displayed to indicate that the fingerprint database has been updated for this new invoice.
__40.
__41.
Now we have another invoice from JWS of Colorado. Taskmaster APT knows that a fingerprint for this invoice exists in the system. This is indicated by the appearance of the Sticky Available button.
__42.
Taskmaster APT will use the fingerprint information from the previous invoice to locate and extract all the necessary data elements from this new invoice.
__43.
Click the next problem icon to advance to the last invoice in our batch.
Note that this is a two-page invoice. You can move from page 1 to page 2 by clicking on the buttons labeled with < and >.
__44.
Note that on page 2 of the invoice, Taskmaster APT was able to continue to locate the additional line item details, even though the header information was repeated.
__45.
We have reached the end of the batch and there are no more problems to verify. Click the Yes button to finish the batch.
Click the OK button on the notification message saying that the batch has completed verification.
__46.
Refresh the Job Monitor screen by pressing the F5 key.
Note that the job is pending export. The export step is where we might integrate with an ECM repository such as IBM CM8 or FileNet P8. It's also the step where our changes to the fingerprint database are confirmed and updates are published for other users. So double-click on the batch ID and let the Export job complete.
IMPORTANT! Don't miss this step! If you forget to export the batch, the fingerprint database won't get updated with the new fingerprint you created.
Congratulations! You've completed the first Datacap Taskmaster lab. Let's move on to the next lab exercise.
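As a recap, the batch in this lab moved through four tasks in a fixed order, and the Job Monitor showed which task was pending after each one completed. The sketch below models that progression in Python. It only mirrors the sequence seen in this lab: the `WORKFLOW` tuple and `pending_after` function are hypothetical names, and a real Taskmaster job definition may contain different or additional tasks.

```python
from typing import Optional

# Task order as observed in this lab's Demo job.
WORKFLOW = ("Scan", "Batch Profiler", "Verification", "Export")


def pending_after(completed_task: str) -> Optional[str]:
    """Return the task a batch is pending once completed_task has run.

    Returns None after the last task. Raises ValueError for a task
    name that is not part of the workflow.
    """
    index = WORKFLOW.index(completed_task)
    if index + 1 < len(WORKFLOW):
        return WORKFLOW[index + 1]
    return None


print(pending_after("Scan"))          # Batch Profiler
print(pending_after("Verification"))  # Export
print(pending_after("Export"))        # None
```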
1.3
Exploring the Taskmaster Web Interface
In this lab, we will take a quick look at one of the web-based user interfaces for scanning and verifying batches. One of the key strengths of Datacap Taskmaster is that all functions that can be executed from thick clients can also be done from thin, web-based clients without installing any kind of additional software (including web plugins) and without the need for remote servers. We'll go through the same APT application, but using a browser interface.
1.3.1
__47.
After IE starts up, click the link on the top toolbar to go directly to the Login screen for Datacap Taskmaster Web.
__48.
Ensure that the application name is APT, the userid/password is admin, and the station is 1.
__49.
The main end user interface for Taskmaster APT Web is displayed.
There are three basic end user functions: Scan, Upload, and Verify. Click the Scan link.
__50.
The list of available jobs is displayed.
__51.
As with the last lab exercise, instead of actual scanning, we will virtually scan existing documents that are on our hard drive.
Click the Browse button on the top right side of the screen.
__52.
Browse to C:\Datacap\APT\images\Input and select the first document in that directory.
Select the first of the images and click the Open button.
Ensure that the check box is selected to tell Taskmaster that you want to virtually scan more than a single image. Setting the Expected pages value to zero means that you want to process all the images in that directory location. Click the Scan button.
__53.
Thumbnails of the images are displayed, along with a confirmation message.
Click the OK button.
__54.
At this point, you could examine the scanned pages in more detail, rearrange the pages, etc.
For now, let's just click the Done button to move to the next step.
__55.
You now have the option to click the Continue button to scan another batch. We are done with scanning, so just click the Stop button.
__56.
You are returned to the main Operations screen.
At this point, all the images that we've scanned are kept on your local workstation. They are not stored on the Taskmaster server until you upload them. The advantage of this approach is that remote users can run their scanners at rated speeds without worrying about the bandwidth of their network connection. By delaying the upload, it can be scheduled to run when network bandwidth is at its best. For now, click the Upload button to store the scanned documents on the Taskmaster server.
Click the Stop button since we have no more batches to upload.
__57.
Return to your thick Taskmaster APT client.
__58.
Note that there is now a web job that is awaiting the Batch Profiler background process. Double click on the batch ID for that job to start the background processor.
Click the Yes button to indicate you want to execute the selected batch.
Please keep in mind that in a real production environment, there would be a separate server that would continually monitor Taskmaster for batches to execute. We are manually processing the batches just for the purposes of this lab exercise.
__59.
Click the OK button.
__60.
Refresh the Job Monitor window by pressing the F5 key.
Now our job is ready for verification. So we will switch from our background processing role to the role of a remotely based end user who will verify the batch of invoices.
__61.
Return to the Internet Explorer browser. You should be at the main Operations screen. If not, click the Operations link at the top of the screen.
__62.
We are processing the same images that we processed in the first lab exercise.
Since we have seen all these images before, let's just quickly advance through the batch.
The icon for the next problem document is a little different on the web interface. It's the yellow arrow (pointing to the right) with the exclamation mark, as indicated above by the red arrow. Move to the next problem document.
__63.
Let's keep moving, so click on the next problem link again.
Note that the same validation error comes up with the invalid date on the third invoice. Correct the date to 2/28/09 and move on to the next document.
__64.
The next document is from JWS of Colorado. Note that the invoice is processed without having to click a Sticky Available button. This is because the export process that we ran at the end of the first lab exercise has updated the fingerprint database for all users. Now anyone who processes an invoice from this vendor will have access to the updated fingerprint. At some point in going through these documents, select the Disp Snippet check box.
__65.
This function is also available in the thick client by using the Ctrl+S hot key combination. Again, anything that can be done in the thick client can also be done in the thin client!
__66.
Continue using the next problem icon to move through the rest of the documents in the batch. When you've reached the end, you'll be prompted to finish the batch.
Click OK.
Click OK.
Click the Stop button to indicate you don't want to verify any more batches.
Lab 2
2.1
Overview
The Taskmaster APT application is a highly specialized Accounts Payable application that is in use by hundreds of enterprises for the handling of large numbers of invoices. Let's step back and take a look at a more general type of application. A standard application that comes out of the box with the Taskmaster product is called Taskmaster Flex. Taskmaster Flex is a simple way to set up new document classes and then use Taskmaster's Click and Key technology to define the form layouts. Whereas with APT we were only dealing with a single document class (invoices), we will deal with multiple document classes in Flex. We will start by setting up a basic document type, then move on to examining the client interface.
2.2
Using Taskmaster Flex Manager
In this part of the lab, we'll examine the part of Taskmaster Flex that allows us to set up new document types.
__67.
Let's say that we're going to start capturing emails with Datacap. Our users have asked that we add a new document type to Datacap called Email. Each email will be indexed by the Subject, To, From, and CC fields. Start the Taskmaster Flex Manager by clicking Start -> All Programs -> Datacap -> Applications -> Flex -> Taskmaster Flex Manager.
__68.
There are two tabs for this configuration screen: Index Fields and Document Classes. Index fields are fields that you might want to OCR on a document, then use that information later to search a content repository. We are currently on the Index Fields tab. The list of existing fields is shown on the left. The characteristics of the selected field are shown on the right side.
__69.
Click on the Document Class tab.
All defined document classes are shown in the list on the left hand side. Selecting one of the existing document classes shows all the index fields that are used for that document class. __70. Go back to the Index Fields tab. We will define the four indexes that we want to use for the emails.
Enter From for the Index Field Name. Select Alphanumeric from the Data Type dropdown. Enter a Min Length of 5 and a Max Length of 30. Enter Z for the Picture String. The picture string determines the list of allowable characters. A picture string of Z means any printable ASCII character is allowed. (The complete list of Picture Strings is in the Flex Quick Start manual.) Select No from the Required drop down list. Select Yes from the Find by Zone drop down. This means that we will be defining a zone using the Click and Key method. Once the zone is defined, we will perform OCR on that field. Click the Save button when you have finished entering all the field characteristics.
__71. Now we'll define the Subject index field.
Change the Index Field Name to Subject. Leave everything else the same as before, except change the Max Length to 128. Click the Save button. __72. Define the To index field.
Change the Index Field Name to To. Change the Max Length to 30. Click Save. __73. Lastly, we will define the CC field.
You can use all the same parameters that you used for the previous field except change the Index Field Name to CC. Click Save. __74. We are done defining the index fields.
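The four index field definitions entered above can be summarized in a short Python sketch. This is only an illustration of the constraints described in the text, not how Taskmaster Flex stores or enforces them: the `IndexField` class, its attribute names, and the rule that an empty value is acceptable for a non-required field are all assumptions. Only the Z picture string (any printable ASCII character) is modeled.

```python
import string
from dataclasses import dataclass

# Assumption: "printable ASCII" means letters, digits, punctuation, and the
# space character, without control whitespace such as tabs and newlines.
PRINTABLE_ASCII = set(string.printable) - set("\t\n\r\x0b\x0c")


@dataclass(frozen=True)
class IndexField:
    """Hypothetical stand-in for one entry on the Index Fields tab."""
    name: str
    min_length: int
    max_length: int
    picture: str = "Z"
    required: bool = False

    def accepts(self, value: str) -> bool:
        """Return True if value satisfies this field's constraints."""
        if value == "":
            # Assumption: an empty value is fine unless the field is required.
            return not self.required
        if not self.min_length <= len(value) <= self.max_length:
            return False
        if self.picture == "Z":
            return all(ch in PRINTABLE_ASCII for ch in value)
        raise ValueError(f"picture string {self.picture!r} is not modeled")


# The values entered in steps 70 through 73.
EMAIL_FIELDS = [
    IndexField("From", 5, 30),
    IndexField("Subject", 5, 128),
    IndexField("To", 5, 30),
    IndexField("CC", 5, 30),
]
```

For example, `EMAIL_FIELDS[0].accepts("Tom Stuart")` is true, while a three-character value such as "Tom" falls below the minimum length of 5 and is rejected.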
__75.
__77.
Select From from the list of Available Indexes, then click the Add button. Repeat the process with the To, Subject, and CC indexes.
Note that you can change the order of the index fields by using the Up/Down buttons. __78. Click Save to save your changes.
Click OK.
__79. We're done with adding our new document class. Close the Flex Configuration client.
2.3
Exploring the Taskmaster Flex Application
__80.
Enter admin for the password and click OK. __81. Like Taskmaster APT, the Flex Client has two sections: Operations and Job Monitor. We are seeing both because we are logged on as an administrator. End users would typically only see the Operations screen.
Double click on the Scan icon to create a new batch. __82. A list of possible job configurations is displayed.
Select the first job (Demo Job) and click the OK button.
__83.
We are doing virtual scanning, which is essentially importing existing images that are on our hard drive. You'll see the progress window.
Click the Stop button to indicate you don't want to create another batch. (You have 10 seconds to click the Stop button before another batch is created. If this happens, just wait for the batch to be completed and you'll get this message again. Click Stop. You can delete the unneeded batches in the next step by selecting the unneeded batch in the Job Monitor and pressing the Delete key on your keyboard.)
__84. Go to the Job Monitor and refresh it by pressing F5.
Note that the job is awaiting the Rule Runner background process. Double click on the job ID (as indicated above by the red arrow) to start the background process.
Click the Yes button to confirm that you do want to execute the selected batch. The background Rule Runner process will be launched.
Click OK. __85. Return to the Job Monitor and refresh it by pressing F5.
Note that the job is now ready for verification. You can start the Flex verification client by double clicking on the job ID.
Alternatively, recall that you can also do it by clicking on the Verify icon in the Operations window. __86. The first document in the batch is displayed in the Verification client.
Recall that in Flex, we are handling batches which could contain many different classes of documents. Since we have never seen this document before, Flex does not know what document class to use.
Click on the drop down list to see the available list of document classes. Select the AP Invoice document class.
__87.
The left hand side of the screen changes and displays the different attribute values associated with an AP Invoice.
__88.
Note that there is a button marked Locate just to the right of the attribute fields.
Click the Locate button. The Locate function is a very powerful capability in Taskmaster. It can use either a specific text pattern or a keyword file to locate information on a form without knowing its physical location. An example of a specific text pattern would be something like a Social Security Number, which is always of the form nnn-nn-nnnn (where n is a numeric digit). Another way to locate information is to look for a specific piece of text (e.g. "Invoice Number") and then use the data near that text as the value for our field. We can put all the possible specific text strings in something called a keyword file.
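The two Locate strategies described above can be illustrated with a minimal Python sketch. It is not Taskmaster's implementation: the function names, the keyword list, and the rule "take the first token after the keyword" are assumptions made purely for illustration, and the input is assumed to be plain OCR text.

```python
import re
from typing import Optional, Sequence

# Pattern from the text: a Social Security Number has the form nnn-nn-nnnn.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical keyword list; a real keyword file would hold the label
# variants that actually appear on the forms being processed.
INVOICE_KEYWORDS = ["Invoice Number", "Invoice No", "Invoice #"]


def locate_by_pattern(ocr_text: str, pattern: re.Pattern) -> Optional[str]:
    """Return the first substring that matches the pattern, or None."""
    match = pattern.search(ocr_text)
    return match.group(0) if match else None


def locate_by_keyword(ocr_text: str, keywords: Sequence[str]) -> Optional[str]:
    """Return the token that follows the first keyword found, or None."""
    for keyword in keywords:
        # Allow an optional colon and whitespace between label and value.
        found = re.search(
            re.escape(keyword) + r"\s*:?\s*(\S+)", ocr_text, flags=re.IGNORECASE
        )
        if found:
            return found.group(1)
    return None
```

Neither function needs to know where on the page the value sits, which is the point the text makes about Locate: the pattern or the nearby label identifies the data, not its coordinates.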
__89.
See how two of the fields were automatically found for us.
The date is found by looking through the OCR data for patterns of the type mm/dd/yyyy. The PO number is found through the use of a keyword file. It is looking for specific text strings such as Purchase Order #, PO#, Order Number, etc. __90. Now we will handle the remaining fields.
The Vendor Name is displayed on the invoice as a graphic. As such, it cannot be OCR'd and needs to be manually entered.
__91. Tab to the Invoice Number field and click on the invoice number located in the top right-hand corner.
__92.
Tab to the Terms field and then draw a box around the payment terms, which are located just above the line items.
__93.
Tab to the Total field and click on the part of the image where the invoice total is located.
__94.
__95.
The next document in the batch is displayed. It's an income tax form.
Select Tax Form from the document class drop down list and then click on the Locate button. __96. Note that there is a different set of attributes associated with Tax Forms than there are for AP Invoices. Each document class can have its own metadata structure.
The Locate function was able to automatically find the Social Security Number since it has a well known structure. Now we just have to use the Click and Key feature to show Flex where the Client name is found.
Tab to the Client field and draw a box around the name at the top of the tax form. You may need to zoom in on the image to make drawing the box easier for you. Click the next problem icon to move to the next document in the batch.
__97.
Another invoice from a different customer is displayed. Select AP Invoice from the document class dropdown list.
Click the Locate button to find as many of the fields as possible. Using the same process as you did with the first AP invoice, use the Click and Key feature to populate all the remaining fields. Then click the next problem icon to advance to the last document in our batch. __98. The last document is an email so select Email from the document class dropdown list.
Tab to the From field and draw a box around the name of the person who sent the email (Tom Stuart).
Tab to the To field and draw a box around the name of the person who received the email (Thomas Simalchik).
Tab to the CC field and draw a box around the name of the person who was carbon copied on the email (Scott Blau).
Lastly, tab to the Subject field and draw a box around the email subject line (Taskmaster Flex).
__99. Click the next problem icon. There aren't any more documents in the batch.
Click the OK button to confirm the completion of the verification task.
__100. We've seen how we can just click on a part of a document to easily enter data for indexing purposes. We've also seen the ability to automatically locate information without even knowing its specific location. Additionally, information about the physical location of zones is maintained in the system. As a result, the next time we see the same forms (e.g. another invoice from Sloss Industries, or a 1040 tax form), Taskmaster will be able to automatically find the data without us having to click on a zone. Congratulations! You've completed this last lab exercise for the Taskmaster Flex application.
Lab 3
Reporting with IBM Datacap Taskmaster RV2
3.1
Overview
IBM Datacap Taskmaster comes with a robust reporting tool that allows you to view a wide variety of predefined reports, create new reports, dynamically filter reports, and then export results to PDF or Excel format. You can even create a dashboard of multiple reports which refreshes itself automatically, thus giving you a real time view of your enterprise's Datacap environment. In this lab, we will examine some of the basic reporting capabilities in IBM Datacap Taskmaster RV2. Note that, at this time, we are unable to lead you through custom report creation, as this requires a license of Microsoft Visual Studio 2008, which we're unable to distribute on VMware images.
3.2
Viewing Reports
__101. RV2 reports are viewed via web browser so you can see reports at any time from anywhere. Start a Firefox browser by clicking on the Firefox icon on the toolbar.
Start RV2 by clicking on the RV2 Login link on the Firefox Bookmarks toolbar.
__102. Log in to RV2 using a userid/password of admin/admin and station 1.
__103. A list of all predefined reports is shown on the left hand side. When you select a report, you can run it for a single application or for multiple applications. There's also the capability to filter the data in these reports. We'll examine filtering shortly. Let's take a look at some of these reports. The first report in the list is Problem Batches. Leave that report selected, select All from the list of applications, and click the Run Report button.
__104. A problem batch is defined as any batch that has either aborted or has been in a running state for over two hours.
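The problem batch definition above amounts to a simple predicate, sketched here in Python. The `BatchStatus` record, its field names, and the status strings are hypothetical and do not reflect the RV2 schema; only the two conditions (aborted, or running for more than two hours) come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RUNNING_LIMIT = timedelta(hours=2)  # threshold stated in the lab text


@dataclass
class BatchStatus:
    """Hypothetical record; field names are illustrative only."""
    batch_id: str
    status: str              # e.g. "Aborted", "Running", "Pending"
    status_since: datetime   # when the batch entered its current status


def is_problem_batch(batch: BatchStatus, now: datetime) -> bool:
    """True if the batch aborted or has been running for over two hours."""
    if batch.status == "Aborted":
        return True
    return batch.status == "Running" and now - batch.status_since > RUNNING_LIMIT
```

A batch that has been running for exactly two hours is not flagged under this reading of "over two hours"; whether RV2 treats the boundary the same way is not stated in the lab.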
__105. Return to the main list of reports by clicking on the Reports link in the upper left corner.
__106. Let's look at a few more reports. Select Current Batches from the list of reports. Select All from the Application list, and click on Run Report.
This shows you a list of all batches in the system, what stage of their workflow they are in, what their status is, what their priority is, and when they were created. You can sort the report output by any of these criteria by clicking on the column headings.
__107. Go back to the main RV2 page and select the Current Stations report (select All applications and click Run Report).
This shows you a listing of all active stations, and who is logged on. Because we're running on a single server, the only station ID we've been using is number 1. Your report may look different from the screen capture; it will depend on the number of activities you have running at this time.
__108. Go back to the main RV2 screen by clicking on the Reports link. Let's take a look at the Station Activity report for all applications.
This report shows you how many batches were processed during different points in the day by each station. Again, our output is quite simple because we have only been processing a small amount of data on a single station. But you can use this kind of report to see if any of the background stations are processing batches at an unusual rate. Unusual processing rates can be an indicator of a potential problem. __109. Go back to the main RV2 page and select the Scan Summary from the list of reports. Select All applications before you run the report.
Expand the summary detail for the admin user by clicking on the plus sign next to the user name.
This scan summary is specific to thin client scanning. You could always modify this report definition if you wanted to include thick client scanning.
__110. Take some time to review some of the predefined reports. Because of the limited amount of data on these Proof of Technology images, it's best to view the data for All the applications. This will give you a better idea of the kind of knowledge that you can get from these reports.
3.3
Creating a Filter for a Report
__111. Let's say that you want to create a filter for the Current Batches report. You don't want to actually modify the report, just selectively filter certain batches out of it. For example, let's say that we only want to see current batches that are on hold for some reason.
Select Current Batches from the list of reports on the main RV2 screen and then click on the Manage Filters link at the center-top of the screen.
__112. There shouldn't be any filters currently on your image for this report type.
Enter the name Held Batches as the filter name and click the Add button as shown above to add a new filter.
__113. You can use basically any column as a filter criterion.
Select Status from the field drop down list. Select Equal To as the condition. Enter Hold to complete the Search Criteria.
Click the Save button to save your filtering criteria. This will allow us to reuse the filter later.
__114. Now let's view the report.
Now we only see the current batches that are in Hold status. We've saved this filter so, in the future, you can select this filter when you run the report from the main RV2 screen and dynamically filter the report results as shown above.
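Conceptually, the Held Batches filter is a saved (field, condition, value) triple applied to the report rows. The Python sketch below illustrates that idea only; the row layout, the batch names, and the `apply_filter` helper are hypothetical, and only the Equal To condition used in this lab is modeled.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def apply_filter(rows: Iterable[Row], field: str, condition: str, value: str) -> List[Row]:
    """Keep only the rows whose field satisfies the condition."""
    if condition != "Equal To":
        raise ValueError(f"condition {condition!r} is not modeled in this sketch")
    return [row for row in rows if row.get(field) == value]


# The saved filter from steps 112 and 113.
HELD_BATCHES = {"field": "Status", "condition": "Equal To", "value": "Hold"}

# Hypothetical report rows; real column names and values may differ.
current_batches = [
    {"Batch": "B001", "Task": "Verify", "Status": "Hold"},
    {"Batch": "B002", "Task": "Rulerunner", "Status": "Pending"},
    {"Batch": "B003", "Task": "Verify", "Status": "Hold"},
]

held = apply_filter(current_batches, **HELD_BATCHES)
```

Because the filter is stored separately from the report definition, the same report can be run with or without it, which matches the lab's point that the report itself is not modified.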
3.4
Creating a Dashboard of Reports
__115. Click on the Dashboard link in the upper left corner of the screen.
Select a report type, such as Problem Batches, from the Report drop down list. Select All from the Application list.
__117. The selected report will automatically be displayed.
Note the shaded corner of the report window. You can click and drag this corner to size the window to any size you like. Resize the report window so that it takes up the top left quarter of the screen.
__118. Let's add another report to the dashboard. Click the Add link at the top of the screen.
__119. Another report window appears. It may actually appear on top of your existing report window. You can always move it by clicking on the blue title bar and dragging it to the desired location.
Move the new report window so that it's beside the Problem Batches report window. Select the Current Batches report for All applications. Repeat this process for the Scan Summary report (or any report that you want, for that matter) and position it wherever you like. Here's an example of what the dashboard could look like:
__120. Note that there is a drop down list at the top of the screen to set a Refresh rate.
You can set the Refresh rate to update your dashboard as often as every minute, thereby giving you a real time view of your enterprise's Taskmaster system.
__122. Select Current Batches from the list of reports and run the report.
__123. When the report completes running, note that there are links to create PDF and Excel versions of the report.
Click on the PDF link. __124. A PDF version of the report is displayed. You can use standard PDF functions to save/print the PDF file.
Congratulations! You've completed the RV2 lab. You should now have an idea of the kind of reporting capabilities that come with the IBM Datacap Taskmaster system.
Lab 4
Datacap Studio Deep Dive
4.1
Datacap Studio is a rich development environment which allows a user to easily develop, modify, and test new Datacap applications without needing programming or development skills. A Datacap application can be thought of as the processing rules for a batch of documents. The batch can contain one or more document types, each with differing processing requirements. When creating a Datacap application, you typically start by defining the document hierarchy and then create the rulesets that are applied to different elements within the document hierarchy. Rulesets are composed of rules which are, in turn, composed of predefined actions. The document hierarchy describes the structure of the batch. It describes the different types of documents that can occur in a batch; the structure of each document type, which includes the different types of pages that can appear in a document; and the various fields that can occur on a given page.
The four elements of a document hierarchy are the batch, the documents in a batch, the pages in a document, and the fields on a page. Rules can be bound to the different elements within a document hierarchy. You may have rules that are only executed once (e.g. connecting to a lookup database when the batch is opened). There may be rules that are executed once per document (for example, uploading a document to an ECM repository). You could have rules that are only executed once per page (for example, examining the page to see if it is a blank page). And finally, there could be rules that are executed once per field (for example, verifying that a field like SSN is in your customer database). You use the actions (which are reusable and located in the Actions library) to create functions. A function can be thought of as a group of actions that work together to perform a specific task. We will look at rulesets, rules, functions, and actions in more detail shortly. Let's take a quick look at Datacap Studio.
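To make the binding levels concrete, the following Python sketch models a batch as nested documents, pages, and fields, and counts how many times a rule bound at each level would run. The classes and the `executions_per_level` helper are hypothetical illustrations, not part of Datacap Studio.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Page:
    fields: List[str] = field(default_factory=list)


@dataclass
class Document:
    pages: List[Page] = field(default_factory=list)


@dataclass
class Batch:
    documents: List[Document] = field(default_factory=list)


def executions_per_level(batch: Batch) -> Dict[str, int]:
    """Count how often a rule bound at each hierarchy level would execute."""
    pages = [page for doc in batch.documents for page in doc.pages]
    return {
        "batch": 1,                                  # once per batch
        "document": len(batch.documents),            # once per document
        "page": len(pages),                          # once per page
        "field": sum(len(p.fields) for p in pages),  # once per field
    }
```

For a batch of two documents holding three pages and five fields in total, a batch-level rule runs once, a document-level rule twice, a page-level rule three times, and a field-level rule five times.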
Note that there are three tabs at the top of the Datacap Studio interface.
1. Rulemanager: this is where the main configuration for your application is done.
2. Zones: this is where you identify any fingerprints that you might use for classification purposes, as well as any zones you might want to set up for OCR/ICR.
3. Test: this is a test environment for your application.
We'll focus for the moment on the Rulemanager page. Notice how the Rulemanager page itself is divided into three main sections. Looking from left to right are the following:
1. On the far left is the Document hierarchy. This is where the structure of the batch is defined. The illustration above is for the APT application, which does invoice processing. The batch (called APT, meaning Accounts Payable Transactions) is made up of three possible document types: Invoices, Separator Pages, and Other (a catch all for anything that cannot be classified). The Invoice document type can have a Main Page, a Trailing Page, an Attachment Separator page, and the Attachment itself. On the Main Page are many fields, such as the Vendor Number and Invoice Total.
2. The middle section is where all of our rulesets are. A ruleset is made up of one or more rules. Rules are bound to different elements within the document hierarchy. For example, we might have a VScan rule which controls the scanning of the batch. You would use PageID and ImageFix rules on individual pages. You might use an Export rule to export invoice data to a line of business database.
3. The far right is where the Action library and Task profiles are managed. The Action library (shown in the above illustration) is where all the reusable actions that come with Taskmaster are organized. You simply click on actions to make use of them when creating your rules. The Task profile (not shown in the above illustration) describes the order in which rules are applied to the document hierarchy. For example, you would want to run Scanning rules first, before running PageID, which would have to run before you could run Recognition rules. We will examine all parts of the Datacap Studio in more detail as we go through this lab exercise.
4.2
The key to Datacap Taskmaster is the Rules paradigm. It is a unique method for configuring Capture applications. It stresses efficient reusability and reduces, if not outright eliminates, the need to do any custom scripting. We will take a moment to examine this important aspect of Taskmaster configuration. We've introduced the concepts of rulesets, rules, functions, and actions. Let's look at these more closely.
Actions
Actions are our most basic elements and they perform very specific tasks. An action may perform OCR, connect to a database, or return information about a field. Actions can return information (e.g. the results of a SQL call) but all actions will return a Boolean value (true or false) indicating the success of the action. An example of an action is PDFDocumentToImage. This action takes a PDF document and converts it to a multipage TIFF image. The action returns true if the conversion completes successfully.
Functions
Functions are groups of actions. The actions within a function are executed in sequence until one action returns false. If all actions return true, then the function returns true. Let's look at an example. Let's say we've created a function that will be used to determine if a field is a proper ZIP code. The function could look like this:
Function: Is_5_Digit_ZIP_Code
    IsFieldPercentNumeric(100)
    MinimumLength(5)
    MaximumLength(5)
The first action returns true if 100% of the characters in the field are numeric. The second action returns true if the field is at least 5 characters long. The third action returns true if the field has a maximum length of 5 characters. The actions are executed one after the other as long as all actions return true. So if the field value is 28010, then all three actions will return true and the function will return true. But let's say the field value was 28O1O (with capital letter O's instead of zeros). Then the first action would return false. The remainder of the actions would not be executed, and the entire function would return false.
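The short-circuit behavior of a function can be mimicked in a few lines of Python. This is a sketch of the logic only: the Python names below are stand-ins for the Taskmaster actions of the same name, and treating IsFieldPercentNumeric(n) as "at least n percent of the characters are digits" is an assumption.

```python
from typing import Callable, Sequence

Action = Callable[[str], bool]


def is_field_percent_numeric(percent: float) -> Action:
    """Build an action that passes when at least `percent` % of chars are digits."""
    def action(value: str) -> bool:
        if not value:
            return False
        digits = sum(ch.isdigit() for ch in value)
        return 100.0 * digits / len(value) >= percent
    return action


def minimum_length(n: int) -> Action:
    return lambda value: len(value) >= n


def maximum_length(n: int) -> Action:
    return lambda value: len(value) <= n


def run_function(actions: Sequence[Action], value: str) -> bool:
    """Run actions in order; stop and return False at the first failure."""
    for action in actions:
        if not action(value):
            return False
    return True


IS_5_DIGIT_ZIP_CODE = [
    is_field_percent_numeric(100),
    minimum_length(5),
    maximum_length(5),
]
```

With these definitions, `run_function(IS_5_DIGIT_ZIP_CODE, "28010")` passes all three actions, while "28O1O" fails at the first action and the two length checks never run.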
Rules
Rules are a collection of functions. One big difference between rules and functions is that functions execute actions until an action returns false, whereas rules execute functions until a function returns true. Those familiar with programming can think of the logic associated with functions as being equivalent to a series of logical AND conditions. The logic associated with rules is equivalent to a series of logical OR conditions. Let's use an example to see why and how this works. Let's build on our ZIP code example. We could have a rule that looks like the following:
Rule: Is_A_Valid_ZIP_Code
    Function1: Is_5_Digit_ZIP_Code
        IsFieldPercentNumeric(100)
        MinimumLength(5)
        MaximumLength(5)
    Function2: Is_9_Digit_ZIP_Code
        IsFieldPercentNumeric(90)
        MinimumLength(10)
        MaximumLength(10)
The second function returns true if 9 out of the field's 10 characters are numeric and the field is exactly 10 characters long. This means a field value of something like 28010-8990 would return true. We continue testing the field against possible ZIP code conditions until one of them returns true. There's no reason to continue testing the ZIP code once a function returns true, so we stop.
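The AND/OR relationship between functions and rules maps directly onto Python's `all` and a loop that stops at the first success. The sketch below is illustrative only; the lambdas stand in for the Taskmaster actions, and reading IsFieldPercentNumeric(n) as "at least n percent digits" is an assumption.

```python
from typing import Callable, Dict, List, Optional

Action = Callable[[str], bool]


def percent_numeric(value: str) -> float:
    """Percentage of characters in value that are digits (0 for empty)."""
    if not value:
        return 0.0
    return 100.0 * sum(ch.isdigit() for ch in value) / len(value)


# Each function is an ordered list of actions, mirroring the rule listing.
IS_A_VALID_ZIP_CODE: Dict[str, List[Action]] = {
    "Is_5_Digit_ZIP_Code": [
        lambda v: percent_numeric(v) >= 100,
        lambda v: len(v) >= 5,
        lambda v: len(v) <= 5,
    ],
    "Is_9_Digit_ZIP_Code": [
        lambda v: percent_numeric(v) >= 90,
        lambda v: len(v) >= 10,
        lambda v: len(v) <= 10,
    ],
}


def run_function(actions: List[Action], value: str) -> bool:
    # all() stops at the first action that returns False (logical AND).
    return all(action(value) for action in actions)


def first_passing_function(rule: Dict[str, List[Action]], value: str) -> Optional[str]:
    # The loop stops at the first function that returns True (logical OR).
    for name, actions in rule.items():
        if run_function(actions, value):
            return name
    return None
```

Under these assumptions, 28010 is accepted by the first function, 28010-8990 falls through to the second, and 28O1O is rejected by both. Note that a string of ten digits with no hyphen would also satisfy the second function as written, so a production rule would probably want a stricter check.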
Rulesets
Rulesets are groups of related rules. For example, you might associate all the rules that validate your OCR results into a single ruleset. An example of this might look like the following:
Ruleset: Validations
    Rule1: Is_A_Valid_ZIP_Code
    Rule2: Is_Date_Valid
    Rule3: Customer_Number_In_Database
As we've noted before, rules are linked to different elements within the document hierarchy. Rulesets are linked to tasks in the task profile. We'll examine this in more detail when we look into workflow.
4.3
Lab Scenario
The scenario we will follow in our lab is a simple one. As part of a larger financing application, we want to verify an applicant's income. We will do this by requesting that the applicant submit a Verification of Employment form and a copy of their most recent W2 statement. This will then be used by our line of business application to determine if the applicant's income meets our requirements. We will need to be able to automatically classify these two document types and then extract some data from them. Because we are doing a short lab, there are some elements of a production configuration that we will not concern ourselves with. The main idea is to understand the capabilities of Datacap Studio in setting up a new application, and the ease with which these capabilities can be used.
4.4
The first step in creating a new application is using the Datacap Studio Application Wizard to create a skeleton application. The wizard will create a simple document hierarchy, define some commonly used rulesets, and bind those rulesets to the typical parts of the document hierarchy. In this part of the lab, we'll create our skeleton application and take a look at what the wizard has defined for us.
__125. Start Datacap Studio by clicking Start -> All Programs -> Datacap -> Datacap Studio -> Datacap Studio.
__126. Datacap Studio will ask if you want to connect to an existing application. We are creating a new application so just click on the Close button.
__127. Datacap Studio opens with an empty application area. Look in the top right corner of the window. There are some icons there, including one for the Application Wizard.
Click on the icon immediately to the right of Settings as shown above by the red arrow.
Click Next.
__129. The wizard gives you the option of creating a new application, copying an existing one, or converting one from a previous version of Taskmaster.
Select the option to Create a new RRS application and click Next.
__130. Enter Income_Verification as the name of the new application. Leave the directory locations for the next two fields as C:\Datacap
Click Next. __131. At this point, the Application Wizard allows you to enter some preliminary information about your application.
__132. The Application Wizard asks if you want to define any fingerprints.
Again, click Next to skip this step. We'll do it later.
__133. The Application Wizard asks if you want to enter any sample images.
__134. The Application Wizard is done collecting information about the new application.
Click the Finish button to have the skeleton application created. __135. A status window appears when the Application Wizard has created the skeleton application.
It creates a folder for the application with the Datacap directory structure. It also creates a folder for all batches in process, as well as a folder where all process information is stored. Databases containing application and batch information are created (by default, they are created as Microsoft Access databases, but you can replace these with Microsoft SQL Server or Oracle databases for production purposes). Click the Close button to shut down the wizard.
Click on the Connection icon as shown above by the red arrow. __137. The list of applications is presented.
__139. The application is presented. Let us take a look at what is included in the skeleton application. Expand all elements of the Document hierarchy tab on the left side of the screen.
A batch called Income_Verification has been created. Under it is a default document type called Document. This document type is a generic document that includes a single field called Field. There is a default page type called Other, which is assigned to all pages until they have gone through some classification process. You'll notice that there are Open and Close sections for each batch, document, page and field. Rules can be executed when a batch/document/page/field is opened or closed for a particular task (more on tasks later). In the skeleton application, the VScan, Create Docs, Set Export Params, Set Fingerprint Params, Batch Document Integrity Check, and ImageFix Load Settings rules are all associated at the batch level. Many of these rules have to do with setting global parameters that are used for all content within the batch.
__140. Now let's focus on the other side of the Datacap Studio window. Look on the far right side. The Actions library tab is displayed. Click on the Task profiles tab.
These are the tasks that constitute the workflow for our skeleton application. New tasks can be created and added to the profile if necessary. The Application Wizard automatically creates basic tasks that virtually all applications use.
__141. Expand all the tasks in the Task profiles and let's take a look at the specific rulesets that make up the tasks.
Each task is linked to one or more rulesets. Note that the same ruleset can appear in multiple tasks. For example, the Validate ruleset appears in both the Rulerunner task (which is a background task) as well as the Verify task (which is an interactive task that is driven by a user interface). This is because you may have to perform some of the validation rules in the background while others may have to wait until the user actually types in information on the indexing screen.
__142. Now let's examine the rules in the rulesets. These are in the middle section of the Studio. Locate the VScan ruleset and expand all the elements within it.
The VScan ruleset has only one rule in it, which is also called VScan (VScan stands for virtual scan). This rule scans all the documents in a directory to create a batch of documents. You can see the actions that are used to create the function. The directory location to be monitored is set, as is the maximum number of files to be scanned. Then the scan actually occurs and the batch is created.
__143. Expand the elements of the ImageFix ruleset.
There are two rules here. The first one (ImageFixLoadSettings) is bound at the batch level and is executed once per batch. It tells Taskmaster which image enhancement settings to use. The second rule is bound at the page level and executed on every page with a page type of Other. This rule applies the image enhancement settings and saves the original page with an extension of tio, which means "tiff original". Therefore, the original image can be maintained, if necessary. Note: As an aside, associating the image enhancement rules with the Other page type ensures that every page gets enhanced. However, let's say that only certain pages need to be enhanced (perhaps for OCR purposes). You could bind the enhancement rule only to those page types. This would reduce the amount of processing that Taskmaster would have to do.
The SetFingerprintParams rule is bound to the batch and executed when the batch is first created. It sets the parameters for use of fingerprint matching as the page identification method. The location of the fingerprint database is specified, the area of the image to limit the fingerprint search is defined, and the minimum classification confidence level of 70% is set. The PageID rule is bound to the page level. Every page is initially classified as page type Other. All Other pages execute this rule to see if a valid fingerprint can be found to help identify the correct page type. The first action analyzes the image, and the second action uses the analysis information to search the fingerprint database. __145. The skeleton application has a very basic Recognize rule set associated with it. Expand the Recognize ruleset.
This is just a general purpose recognition ruleset. The ReadZones action looks at the fingerprint for the image and determines the physical locations of all the zones that are associated with the page. Then the RecognizePageFieldsOCR_S action uses the ScanSoft OCR engine to attempt to read the data in those zones.
The Set Export Params rule is executed at the batch level. It sets the path for the file that will contain all the metadata for each document, and writes some basic header information. The Export Page Fields rule is executed at the page level and will write out all the information for the page. Now that we've examined the skeleton application that is created by the Application Wizard, we can start creating our own application.
4.5
In this part of the lab, we will examine our sample documents and set up the document hierarchy for the documents. Recall that in our lab scenario, we are receiving documents used to verify a customer's income. These documents consist of a W-2 tax form, and a standard Verification of Employment letter that is filled out by the customer's employer.
__147. Let's examine our sample documents. Open Windows Explorer and navigate to C:\Sample Images\Technical Deep Dive
__148. Select any one of the W2 images and take a quick look at it by double clicking on the file.
We will use OCR technology to extract the following fields from the W2 (shown highlighted)
__149. Choose one of the Verification letters and take a quick look at it by double clicking on the file.
We will be extracting
__150. Now we will start defining the document structure within Datacap Studio. We will need to lock the Document hierarchy in order to modify it. This is a protection mechanism designed to ensure no one makes accidental modifications.
Click on the icon resembling a lock at the top of the document hierarchy. __151. As we previously noted, the batch is called Income_Verification and there are two objects that the Application Wizard has created for us. There is the default page type of Other and a single default document called Document. We will rename the default document type.
Expand the hierarchy if necessary and single click on the document until it changes appearance as shown above. This means that you can update the name. Change the name of the document from Document to W2. __152. The Application Wizard has created a single default page for this document. We will rename it also.
Again, single click on the page name and change it from Page to W2_Main_Page.
__153. Expand the W2_Main_Page. We will now rename the default field that the Application Wizard created.
Single click on the default field and change its name to Borrower. __154. Three additional fields will be needed for this page.
Right click on W2_Main_Page and select Add multiple -> Fields. Enter 3 for the number of fields you want to add, then click Enter.
__155. Note that three new fields have been added, named Field1, Field2, and Field3.
__156. Use the same process that you used to change the default field name to Borrower to change the name of the new fields to EIN, Income, and SSN. The hierarchy should look as follows:
__157. The Application Wizard creates one default document but our application will need two document types. So now we will add a second document.
Right click on the batch name (Income_Verification) and select Add-> Document. A new document called Document1 will be added to your hierarchy.
__158. Single click on the new document name and change it to Verification_Letter.
__159. There aren't any pages yet in our new document. The Verification Letter is a single page document so we need to add one page to this document. Right click on the Verification_Letter document type and click Add-> Page.
__160. A new page will be displayed called Page1 (you may need to expand the hierarchy).
__161. The main page of the Verification Letter has 5 fields associated with it. Right click on the page name and click Add multiple -> Fields -> 5.
__162. Change the name of the first new field to Borrower. You will see a message box appear telling you that the field name already exists somewhere else.
You can choose to use the properties of the existing field called Borrower. This means that any rules and properties associated with the existing field will be used for this new field. Click Yes.
__163. Change the name of the remaining fields to Employer, Lender, Income_Last, and Income_YTD. Your document hierarchy should now look like this.
Take the time to confirm that everything matches the above picture.
__164. Now we are going to define some properties that determine the document integrity of our documents. Document integrity refers to whether a particular object is mandatory, the maximum and minimum number of those objects, and the order that they appear in. Document integrity applies to the document, page, and field level. There are three properties called MAX, MIN, and ORDER that determine document integrity. Right click on the W2 document type and select Manage Variables
__165. A small dialog box appears. The section on Object general information should already be expanded for you.
Leave the variables as they are. Saying that the maximum and minimum number of documents is 0 means that the documents are completely optional within a batch. In other words, you could scan a batch of Income_Verification documents without having any W2s in the batch. Click Done in the upper right corner.
__166. Right click on the W2_Main_Page page type and select Manage Variables. Note that the Max, Min, and Order values are 1.
This means that if a W2 document is scanned in, you must have 1 and only 1 main page (and obviously, it's the first page in the document). In other words, you are saying that the W2 is a single page document. Click Done.
__167. Right click on the Borrower field type and select Manage Variables. Change the Max and Min values to 1.
This means that there can only be one occurrence of the Borrower field on a single W2. There isn't an order to the fields. Click Done.
__168. Repeat the above step for all the remaining fields in the W2 document type. They should all have Max and Min values of 1 and an Order of 0.
__169. Change the Max, Min, and Order variables for the Verification_Letter_Main_Page so that they are equal to 1 (in other words, the Verification Letter is a single page document).
__170. Change the settings for each of the fields in the Verification_Letter_Main_Page. They should all have Max and Min values of 1 and an Order of 0.
__i. Click on the diskette icon to save your changes.
__ii. Click on the lock icon to unlock the DCO.
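The Max, Min, and Order settings from the steps above can be pictured with a small Python sketch. This is not Datacap code: the `Integrity` class and `check_count` function are hypothetical names used only for illustration, and treating a Max of 0 as "no upper limit" is an assumption based on the lab's statement that Max = Min = 0 makes an object completely optional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Integrity:
    """Hypothetical stand-in for the MIN, MAX and ORDER variables."""
    min_count: int = 0
    max_count: int = 0   # assumption: 0 means "no upper limit"
    order: int = 0       # assumption: 0 means "position does not matter"


def check_count(rule: Integrity, found: int) -> bool:
    """Return True when `found` occurrences satisfy the rule."""
    if found < rule.min_count:
        return False
    if rule.max_count and found > rule.max_count:
        return False
    return True


# Values used in this lab.
w2_document = Integrity(min_count=0, max_count=0, order=0)    # optional
w2_main_page = Integrity(min_count=1, max_count=1, order=1)   # exactly one, first
borrower_field = Integrity(min_count=1, max_count=1, order=0)

print(check_count(w2_document, 0))    # a batch with no W2s
print(check_count(w2_main_page, 2))   # a W2 with two main pages
```

The sketch only checks counts. How Taskmaster itself applies the Order value is not modeled here.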
4.6
We've completed defining the structure of our batch, including the types of documents that will comprise the batch, and the page/field characteristics of each document. Now we're ready to set up scanning, and then configure how we'll identify our documents.
__172. First we'll make a few minor changes to our VScan ruleset. In order to do so, we have to lock the ruleset.
Click on the VScan ruleset name (shown highlighted above) and click on the lock icon as indicated by the arrow. __173. Expand the VScan ruleset, as well as all its rules and functions. The first action, SetSourceDirectory, tells Taskmaster where to look for images during the virtual scan process. There is a registry variable called vscanimagedir whose value is C:\Datacap\Income_Verification\images. Copy all the sample images from C:\Sample Images\Technical Deep Dive to the C:\Datacap\Income_Verification\images.
__174. Note that the SetMaxImageFiles action is set to 4. This means the maximum size of a batch is only 4 pages. This is pretty small, so let's change it to a larger value.
__i. Look at the right side of the window. Near the bottom is the Properties tab. This is where we enter the parameters for the actions.
__ii. Change the parameter from 4 to 20.
Press the Enter key to make your change effective. The action should now look as follows
__i. Click on the diskette icon to save your changes.
__ii. Click on the lock icon. A drop down appears. Select Publish ruleset. This makes your changes available to everyone.
__176. Now let's test the scanning function. Recall that there is an integrated test environment built into Datacap Studio.
Click on the Test tab in the top left corner of the screen. __177. This will take you to the test environment. Look at the Workflow section, which is in the top left corner.
By default, the Application Wizard creates three different workflows. We will focus on the Main Job. The rest of the workflows are best left for a more detailed workflow discussion. The Main Job consists of 5 tasks: VScan, PageID, Rulerunner, Verify, and Export. Note that these are the same tasks that were listed in the Task profiles back on the initial Datacap Studio page. __178. Right click on the VScan task and select New.
__179. A new batch is created for us, but we have yet to run any of the rules associated with the VScan task.
Click the green arrow as indicated above. This will start the VScan task. __180. The task should complete quite quickly and a message box will appear.
__181. Look at the Runtime batch hierarchy which is on the middle of the left side of the screen.
This shows you the status of the current batch. Note that six pages have been scanned in. They are internally named TMnnnnnn where nnnnnn is a sequential six digit number. Next to the image name is the page type. Recall that the default page type for newly scanned images is Other. This will change when we run the PageID task. You can click on any of the pages in the batch and the associated image will appear in the middle of the Studio.
__182. We are done with this batch for now. Let's just cancel it.
Right click on the batch in the Workflow window and select Cancel.
__183. There's more to the batch than just the image files. Open Windows Explorer and navigate to C:\Datacap\Income_Verification\batches. There will be a subdirectory in there for your new batch. The name of the subdirectory will be based on the date and time that the batch was created. Open the batch directory.
You will see your six image files. There is also a log and an XML file. Logs and XML files are created for each task in the Task profiles. __184. Open the VScan.xml file (double clicking on it should automatically open it within Internet Explorer. Respond Yes to any messages asking if you want to allow scripts to run).
A lot of good diagnostic information is stored in these XML files. We have only scanned the batch so there's not a lot at this point. But it shows you the original source image filename as well as the current page type and status. Close the XML file.
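If you prefer to inspect the batch directory from a script rather than Windows Explorer, a minimal Python sketch follows. The function names are hypothetical, and because the exact date-and-time naming format of the batch folders is not given here, the sketch picks the most recently modified subdirectory instead of parsing the folder name.

```python
from collections import defaultdict
from pathlib import Path


def latest_batch_dir(batches_root):
    """Return the most recently modified subdirectory, or None if there is none."""
    subdirs = [p for p in Path(batches_root).iterdir() if p.is_dir()]
    return max(subdirs, key=lambda p: p.stat().st_mtime, default=None)


def files_by_extension(batch_dir):
    """Group the file names in batch_dir by lower-cased extension."""
    groups = defaultdict(list)
    for item in sorted(Path(batch_dir).iterdir()):
        if item.is_file():
            groups[item.suffix.lower()].append(item.name)
    return dict(groups)


if __name__ == "__main__":
    root = Path(r"C:\Datacap\Income_Verification\batches")  # path from this lab
    if root.is_dir():
        batch = latest_batch_dir(root)
        if batch is not None:
            for ext, names in files_by_extension(batch).items():
                print(ext or "(none)", names)
```

After a VScan run you would expect the image files to appear under one extension and the log and XML files under theirs.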
__186. Expand the PageID section of the Task profiles (on the right side of the screen).
There are two rulesets that make up the PageID task: ImageFix and PageID. __187. Go to the Rulesets section of the Rulemanager page (in the middle) and expand all of the elements associated with the ImageFix ruleset.
By default, the Application Wizard bound some of these rules to parts of the document hierarchy when it built the skeleton application. Let's see where they were bound.
__188. Instead of looking through the Document hierarchy to see where the rules were bound, we can simply sync the rule to the hierarchy. Let's see how to do this.
__i. Click on the ImageFix Load Settings rule.
__ii. Look at the bar that separates the Rulesets tab from the Document hierarchy tab. There are arrows pointing to the left and to the right. Click on the arrow pointing toward the Document hierarchy.
The Document hierarchy will automatically change its view and show you where the ImageFix Load Settings rule is bound. Notice that the ImageFix Load Settings is bound at the Batch level. In other words, when the batch is created (and after the images are scanned), the settings for image enhancement are loaded.
__189. Now click on the Enhance Image rule and select the sync views arrow that points towards the Document hierarchy.
All images initially come in with a page type of Other. This is why we bind the Enhance Image rule to the Other page type. This way, all images will get enhanced as soon as the ImageFix task runs. We'll see this when we run our test of this task.
__190. Now let's take a look at the PageID ruleset. Expand all elements within the PageID ruleset.
There are two rules in this ruleset. The Set Fingerprint Params tells Taskmaster where the fingerprint directory is stored, what area of the image to search for when looking for the fingerprint, and sets the minimum page identification confidence level. The PageID rule actually does the analysis of the image and looks it up in the fingerprint database.
__191. Let's examine where these rules are bound to the document hierarchy. Select the Set Fingerprint Params rule and then click the sync views arrow pointing towards the Document hierarchy.
This rule is bound at the batch level, which makes sense since you are setting parameters that will apply to page identification for the entire batch.
__192. Now click on the Page ID rule, and then click on the sync views arrow pointing towards the Document hierarchy to see where it's bound.
This rule is bound at the page level for the Other page type. Recall that all pages are set as Other when they initially are scanned. So binding the page identification rule to the Other page type ensures all images attempt to use fingerprint identification.
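The effect of the minimum confidence level on page identification can be sketched in a few lines of Python. This is only an illustration of the thresholding idea, not Taskmaster's matching algorithm: the match scores are assumed to be computed elsewhere, and the function name, fingerprint IDs, and scores below are all hypothetical.

```python
MIN_CONFIDENCE = 0.70  # the 70% minimum set by the Set Fingerprint Params rule


def classify_page(scores, fingerprint_types, min_confidence=MIN_CONFIDENCE):
    """Pick a page type from per-fingerprint match scores (0.0 to 1.0).

    scores:            {fingerprint_id: confidence}, assumed computed elsewhere
    fingerprint_types: {fingerprint_id: page type assigned on the Zones tab}
    Returns (page_type, fingerprint_id, confidence). The page stays "Other"
    when no fingerprint reaches the minimum confidence.
    """
    if not scores:
        return ("Other", None, 0.0)
    best_id = max(scores, key=scores.get)
    best = scores[best_id]
    if best < min_confidence:
        return ("Other", None, best)
    return (fingerprint_types[best_id], best_id, best)


# Hypothetical fingerprint IDs and scores, for illustration only.
types = {1: "W2_Main_Page", 2: "Verification_Letter_Main_Page"}
print(classify_page({1: 0.98, 2: 0.31}, types))
print(classify_page({1: 0.42, 2: 0.55}, types))
```

In the second call neither fingerprint reaches 70%, so the page keeps the Other page type.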
__193. One nice feature of Datacap Studio is that all the documentation for the Taskmaster actions is contained online. You can click on an action in the action library and retrieve the information for it. However, the library contains literally hundreds of actions and searching through the library could be time consuming. So there's another Sync Views button which can help us. Let's say that we wanted to know the details behind the AnalyzeImage action.
__i. Click on the AnalyzeImage action from the Rulesets part of Studio.
__ii. Click on the Actions library tab on the right side of Studio.
On the bar separating the Rulesets tab from the Actions library is another set of sync view arrows. Click on the sync view arrow that points towards the Action Library.
Notice how the Recog_Shared action set is expanded and the Analyze Image action is highlighted. __194. Right click on the Analyze Image action in the Actions library and select Information.
The online documentation for the action appears. Keep this in mind as you go through the rest of this lab. Close the Information window.
__195. We've reviewed all the rules for the ImageFix and PageID rulesets. No changes were needed. Let's set up some fingerprints so we can test the configuration.
Click on the Zones tab.
__196. The top left corner of the Zones tab is the Fingerprint section. Since this is a brand new application, there won't be any fingerprints defined yet. We'll add some. Remember that there can be many different fingerprints for a single document type (e.g. a form may change from year to year but it still contains the same essential information). So we will first create a fingerprint class which represents all the fingerprints for a given document type.
Right click on <New > and select Add fingerprint class __197. Enter W2 as the name for the new fingerprint class.
Click OK.
__198. Right click on <New> and select Add fingerprint class Enter Verification as the name for the second fingerprint class
Click OK. The Fingerprint section of your Zones tab should look like this
__199. Now we will add the individual fingerprints. Right-click on the W2 fingerprint class and select Add fingerprint
__200. When you add a fingerprint, you are essentially adding a sample image. Taskmaster will analyze the properties of the image and store those properties in the fingerprint database.
Navigate to C:\Sample Images\Technical Deep Dive and select one of the W2 images. __201. You will be asked if you want to enhance the image using the default image enhancement settings.
Select Yes. __202. The Image Enhancement window opens. There will be two images (the original on the left, the enhanced version on the right). You can also review all the image enhancement settings along the right side of the window. Maximizing the window will make things easier to see. Scroll through the image enhancement settings. Note that there is a section for line removal.
__203. Enhancement has NOT been run yet, which is why the two images look the same.
Click the green arrow which will actually run the image enhancement. __204. Note how the image on the right changes. Notably, all the lines are removed. This will enhance recognition capabilities, especially when the characters are very close to the lines.
__205. Recall that in the Task profiles, we run the ImageFix ruleset prior to running the PageID ruleset. This means that page identification will be run on the enhanced version of the image. Therefore we want to save the enhanced version of the image as the fingerprint. (Go back to the Rulemanager tab if you need to review the order of the rulesets).
Click on the diskette icon and select Save image.
__206. You'll get a message saying that the image has been saved.
__208. The W2 section of the Fingerprints tab should look like the following:
Make sure the fingerprint itself is selected (as shown above) __209. Now we have to associate the fingerprint with an actual page type.
From the Type pulldown, select W2_Main_Page. __210. Ensure that the fingerprint now looks like the following
__211. Now we will define the fingerprint for the Verification Letter.
Make sure the Verification fingerprint class is selected. Right click and select Add fingerprint
__212. Navigate to C:\Sample Images\Technical Deep Dive and select any of the verification images.
Click Open __213. We will repeat the process that we did for the W2.
Click Yes.
Click OK
Close the image enhancement window. __216. Ensure that the new fingerprint is selected.
Select Verification_Letter_Main_Page from the Type dropdown.
__217. Now the Verification fingerprint should look like the following
__218. The fingerprints have been defined. While we're here, we'll also identify the zones and associate them with the different fields.
__i. Click on W2_Main_Page in the Document hierarchy.
__ii. Click on the W2_Main_Page fingerprint from the list of fingerprints above the Document hierarchy.
The sample image should be displayed on the right side of the screen.
__219. Lock the W2_Main_Page hierarchy by clicking on the lock icon so that you can update it.
__220. Now we're going to define the actual zones for each of the fields.
Click on the Borrower field in the hierarchy.
__221. Draw a box around the section of the W2 that contains the borrower's name and address.
__223. Draw a box around the part of the W2 that contains the employer's identification number.
__225. Draw a box around the part of the W2 that contains the income amount for the borrower.
__227. Draw a box around the part of the W2 that contains the borrower's SSN.
__228. Now we will define the zones for the Verification Letter.
__i. Select the Verification_Letter in the hierarchy.
__ii. Select the Verification_Letter_Main_Page from the list of fingerprints.
__229. Expand the Verification_Letter_Main_Page and click on the Borrower field in the hierarchy.
Draw a box around the part of the Verification Letter that contains the borrower's name.
Draw a box around the part of the Verification Letter that contains the lender's name and address.
__231. Click on the Income_Last field in the hierarchy. Draw a box around the part of the Verification Letter that contains last year's income.
__232. Click on the Income_YTD field in the hierarchy. Draw a box around the part of the Verification Letter that contains the year to date income.
__233. Click on the Employer field in the hierarchy. Draw a box around the part of the Verification Letter that contains the employer's name and address.
__i. Click on the diskette icon to save your changes.
__ii. Click on the lock icon to unlock the hierarchy.
4.7
Defined the document hierarchy
Created fingerprints and associated them with our two page types
Created zones for each of the fingerprints and associated them with the fields for each page type
One thing that we haven't done yet is examine how the documents are constructed from the individual pages that have been scanned and identified. That is our next step.
__235. Return to the Rulemanager tab in Datacap Studio. Select the Task profiles tab and ensure that the Rulerunner task is expanded.
The first ruleset that is executed is the CreateDocs ruleset. __236. Expand the CreateDocs ruleset in the middle section of the Rulemanager screen. The ruleset is made up of two rules: Create Docs and Create Fields. Click on the Create Docs rule (not the ruleset, but the rule, as shown highlighted below). Click on the sync views arrow pointing towards the Document hierarchy.
The Create Docs rule is bound at the batch level. This rule takes the individual pages and creates documents out of them using the page identification information, and the document integrity values of Max, Min, and Order that we saw before.
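To make the idea of building documents from identified pages concrete, here is a simplified Python sketch. It is not the Create Docs implementation: the `create_docs` function and the `FIRST_PAGE_OF` table are hypothetical, the sketch only handles the case where a document's first page type opens a new document, and collecting stray pages under "Unassigned" is an assumption made for the example.

```python
# Page type -> document type, mirroring the hierarchy built in this lab.
# Both documents are single-page, so every identified page starts a document.
FIRST_PAGE_OF = {
    "W2_Main_Page": "W2",
    "Verification_Letter_Main_Page": "Verification_Letter",
}


def create_docs(page_types):
    """Group an ordered list of page types into (document_type, page_indexes).

    A page whose type is the first page of a document opens a new document.
    Any other page is attached to the current document, or collected under
    "Unassigned" when no document has been opened yet (an assumption).
    """
    documents = []
    for index, page_type in enumerate(page_types):
        doc_type = FIRST_PAGE_OF.get(page_type)
        if doc_type is not None:
            documents.append((doc_type, [index]))
        elif documents:
            documents[-1][1].append(index)
        else:
            documents.append(("Unassigned", [index]))
    return documents


batch = ["W2_Main_Page", "Verification_Letter_Main_Page", "W2_Main_Page"]
for doc_type, pages in create_docs(batch):
    print(doc_type, pages)
```

With the Max and Min values of 1 that we set for both main pages, each identified page in this lab ends up as its own single-page document.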
__237. There is a second rule in the CreateDocs ruleset called Create Fields. We identified where zones are located for a fingerprint, and associated those with the fields in our page types. But what we haven't done is examine the rule which tells Taskmaster to use that configuration information and create the fields. That rule is Create Fields. Click on the Create Fields rule. Click on the sync views arrow pointing towards the Document hierarchy.
Note that the rule is associated with the W2_Main_Page. This was done automatically by the Application Wizard when the skeleton application was originally built. But the skeleton application only had one default document and page type. We manually created the second document and page, which is the Verification Letter and Verification_Letter_Main_Page. Expand the Document hierarchy so that you can see the actions associated with opening a Verification_Letter_Main_Page (as shown above). Create Fields is not bound to this new page type. Our next task is to bind Create Fields with our second page type.
__i. Select the Verification_Letter_Main_Page.
__ii. Click on the lock icon to lock it for editing.
Note: You may not see the Global section when you are doing the exercise. This is not a problem; please continue.
__239. Now we'll bind the Create Fields rule to the page.
Select the Create Fields rule. Select the Open for the Verification_Letter_Main_Page. Click the Add to DCO button on the bar separating the Rulesets from the Document hierarchy.
There are four other rules that are associated with the W2_Main_Page that are not bound to the Verification_Letter_Main_Page. Use the process that you used just now to bind the following rules to Verification_Letter_Main_Page.
Ruleset      Rule
Recognize    Recognize Page
Export       Export Page Fields
Validate     Validate Page
Routing      Routing Rule 1
__241. Recall that a default field called Field was created by the Application Wizard. This was for the W2_Main_Page and that field was renamed to Borrower. Take a look at that field in the Document hierarchy.
The Application Wizard automatically associated a rule called Fields Clean with the default field. You can see what this rule is doing if you expand the Clean ruleset. There is one action being executed called DeleteAllMiscChars. This action gets rid of most special characters that are often introduced when noise interferes with recognition. This is added by default and may not be suitable for all applications. However we will use this action since all our fields are purely alphanumeric. __242. None of the fields on the W2_Main_Page, except for Borrower, have the Fields Clean rule bound to them. We will do it for the EIN field.
Select the Fields Clean rule from the Clean ruleset Select the Open associated with the EIN field Click the Add to DCO button on the bar separating the Rulesets from the Document hierarchy.
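As a rough picture of what a cleanup action such as DeleteAllMiscChars does to a recognized value, consider the Python sketch below. The exact set of characters the real action removes is documented in the Actions library and is not reproduced here; this sketch assumes that letters, digits, and whitespace are kept, and the `keep` parameter is a hypothetical addition for fields that legitimately contain punctuation.

```python
import re


def delete_misc_chars(text, keep=""):
    """Remove characters that are not letters, digits, whitespace, or in `keep`."""
    pattern = r"[^A-Za-z0-9\s" + re.escape(keep) + r"]"
    cleaned = re.sub(pattern, "", text)
    # Collapse runs of spaces left behind by removed characters.
    return re.sub(r" {2,}", " ", cleaned).strip()


print(delete_misc_chars("J~ohn Sm|ith*"))
print(delete_misc_chars("12-3456789", keep="-"))
print(delete_misc_chars("45,~000.00", keep=".,"))
```

The second and third calls show why a blanket cleanup may not suit every field: an identification number or an amount can contain punctuation that should survive.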
__243. Repeat this process for all the remaining fields on the W2_Main_Page so that they look like this:
__244. Repeat this process for all the fields on the Verification_Letter_Main_Page. The one exception is the Borrower field. Recall that we inherited the properties of the Borrower field on the W2_Main_Page. That's why the Fields Clean rule is already associated with it. The fields for the Verification_Letter_Main_Page should all look as follows:
__i. Click the diskette icon to save your changes.
__ii. Click the lock icon to unlock the hierarchy.
4.8
Defined the document hierarchy
Created fingerprints and associated them with our two page types
Created zones for each of the fingerprints and associated them with the fields for each page type
Recognized that the Application Wizard automatically associated some pages and fields with the rules the wizard created. We bound those rules to the pages and fields that we manually added to the hierarchy.
Now let's test the configuration.
__246. We need to return to the test environment.
Click on the Test tab. __247. We will create a new test batch.
Right click on VScan and select New. Click the green arrow to run the batch. When the task completes, click the Advance button to move to the next step in the workflow.
__248. The batch will be created with six unclassified pages (page type = Other).
Now let's run the page identification task. The task should already be selected (in the drop down list). Click the green arrow to run the Page ID task. Click Advance when the task is complete.
__249. The page types should have been set to either W2_Main_Page or Verification_Letter_Main_Page. Now we'll run the Rulerunner task.
The task name in the drop down list should be Rulerunner. Click the green arrow to run the Rulerunner task and click Advance after it completes.
__250. Note how the batch hierarchy changes after Rulerunner completes.
There are now additional elements for each page. These are the fields that we defined.
__251. Select the first Verification Letter and expand all the elements under it.
All the fields are listed along with the recognition results. Examine all the rest of the pages to see what the OCR results are like.
__252. We previously looked at the PageId.xml file to see what kind of diagnostic information is available to us during processing. Now let's look at the XML files that are created after the Rulerunner task completes. Open Windows Explorer and navigate to C:\Datacap\Income_Verification\Batches and look for the most recent batch folder.
An XML file is created for each image, which is where we can see the OCR results for each field, as well as the character by character confidence levels. Look at some of the XML files for other pages (TM00005.xml is a good one to examine because of the variety of character confidence levels).
There is additional diagnostic information in the Rulerunner.xml file for the batch. Some additional information about the page classification is in here. You'll see a property called Confidence. The confidence of the page classification is rated on a scale of 0 to 1. You'll note that in the example shown here, the first image is classified with perfect confidence, which is not unexpected since this is the image that was used to define the fingerprint. Subsequent images are not perfect matches so you'll see confidence levels like 0.9827 or 0.9838 (still highly confident). The Template_ID variable tells you which fingerprint the page was matched to (the template IDs are shown in Datacap Studio). We've now completed configuring and testing Scanning, Page Classification, and Recognition. Right click on the test batch in Datacap Studio and cancel it. The next step is to move on to visual Verification.
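If you want to pull the classification confidence out of a batch XML file with a script, something like the following Python sketch could be a starting point. The element and attribute names in `SAMPLE` are a simplified, assumed layout written for this example, not a copy of a real Rulerunner.xml; open your own file and adjust the names to match what you see. Only the Confidence and Template_ID names come from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified layout; check your own Rulerunner.xml for the real one.
SAMPLE = """
<B id="batch">
  <P id="TM000001">
    <V n="TYPE">W2_Main_Page</V>
    <V n="Confidence">1</V>
    <V n="Template_ID">1</V>
  </P>
  <P id="TM000002">
    <V n="TYPE">W2_Main_Page</V>
    <V n="Confidence">0.9827</V>
    <V n="Template_ID">1</V>
  </P>
</B>
"""


def page_confidences(xml_text):
    """Return [(page_id, page_type, confidence, template_id), ...]."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for page in root.iter("P"):
        variables = {v.get("n"): (v.text or "").strip() for v in page.findall("V")}
        rows.append((
            page.get("id"),
            variables.get("TYPE", ""),
            float(variables.get("Confidence", "0") or 0),
            variables.get("Template_ID", ""),
        ))
    return rows


for row in page_confidences(SAMPLE):
    print(row)
```

A listing like this makes it easy to spot pages whose classification confidence is close to the 70% minimum.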
4.9
Of course, in a real environment, we would not be using the test facility of Datacap Studio to process batches. We would use the standard Taskmaster Clients for anything involving a user interface (such as scanning or verification). The verification interface is something that can be customized quite easily. Datacap Taskmaster has two options for building a verification interface: DotEdit and Batch Pilot. We will examine how to use Batch Pilot for building a simple, customized verification user interface.
Click Start -> All Programs -> Datacap -> Batch Pilot -> Batch Pilot __255. Click File -> Open Project.
__257. Look at the bottom of the project (you may have to move the toolbar) for the batch view.
Expand the batch structure until you see the W2_Main_Page. Right click on the W2_Main_Page and select AutoForm. The AutoForm function automatically creates a default user interface for you. __258. A simple interface is built for you.
There are three parts to the UI: field labels, image snippets, and data entry areas. You can rearrange and resize these elements, as well as change fonts and text for labels. Elements can be rearranged simply by dragging them and dropping them from one location to another.
__259. Let's make some simple changes. The Borrower field contains a lot of information so we might want to make the data entry area bigger. Also, let's say that your users have told you that they want the image snippet on top of the data entry area instead of next to it.
Feel free to change the UI so that it looks like the above example, or change it to a layout of your own choosing. Just be careful not to delete any elements. __260. Click File -> Save Form As
Expand the batch view, right click on the Verification_Letter_Main_Page and select AutoForm.
__263. Reorganize and resize the UI elements like you did for the W2. Here's an example of what it might look like:
__267. Datacap Taskmaster uses industry standard OCR and ICR engines to provide highly confident recognition results. When configured correctly, most pages won't need any manual verification because the confidence levels will be so high. However, as we've seen from the XML files, there may occasionally be exceptions and character recognition won't always produce highly confident results. There are some fields on some pages (like TM00005) where confidence levels for some characters were somewhat moderate. At the same time, we don't want to have to verify every page. Our next task will be to configure confidence levels so that we're comfortable skipping most pages, and then configure visual verification so that only low-confidence pages are shown. First, let's take a look at the Routing ruleset. This was executed as part of the Rulerunner task. Return to the Rulemanager tab in Datacap Studio.
__268. Locate the Routing ruleset and expand all of the elements within.
Routing Rule 1 has a function which calls the ChkConfidence action. This action checks the confidences of all the fields on a page. The first parameter is the minimum acceptable confidence level. If any of the fields on a page have a confidence level lower than that acceptable level, then the page status is set to whatever value is in the second parameter. You can interpret this action as saying: if the confidence for any field is lower than 8 out of 10, then set the page status to 1, which indicates that a problem exists.
__269. Now we're going to update the Taskmaster Client configuration so that it filters out pages that don't have any problems, and only shows pages that have problems. Start the Taskmaster Client for the Income_Verification application by clicking Start -> All Programs -> Datacap -> Applications -> Income_Verification -> Income_Verification Client
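The ChkConfidence behavior described for Routing Rule 1 can be restated as a short Python sketch. The function below is a hypothetical illustration, not the action's implementation. It assumes each field's confidence is summarized as a single number from 0 to 10 (for example the lowest character confidence in the field) and that a status of 0 means "no problem"; the text above only states that 1 marks a problem.

```python
def chk_confidence(field_confidences, min_confidence=8, problem_status=1):
    """Return the page status after checking every field confidence.

    field_confidences: {field_name: confidence on a 0 to 10 scale}
    Returns problem_status if any field is below min_confidence, else 0
    (0 as the "no problem" status is an assumption).
    """
    for value in field_confidences.values():
        if value < min_confidence:
            return problem_status
    return 0


# Made-up confidences for the four W2 fields defined in this lab.
clean_page = {"Borrower": 10, "EIN": 9, "Income": 10, "SSN": 8}
doubtful_page = {"Borrower": 10, "EIN": 6, "Income": 10, "SSN": 9}
print(chk_confidence(clean_page))
print(chk_confidence(doubtful_page))
```

A single low-confidence field is enough to flag the whole page, which is what lets the Verify task skip pages where every field is confident.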
__271. Expand the Main Job, and select the Verify task.
Click the Setup button. __272. Click File -> Task Settings.
Click the Filters tab. We will add filters so that only problem pages are shown.
__274. Recall that a page status equal to 1 means that there's a problem.
Select the Verification_Letter_Main_Page from the Type list. Select the STATUS from the Property drop down. Enter 1 as the Problem value. Click the Add button.
Select the W2_Main_Page, select STATUS, enter 1 as the problem value, and click the Add button.
__276. Now your settings should look like this:
Click OK.
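Taken together, the ChkConfidence rule and the Verify filters amount to a simple threshold-and-filter. The following Python sketch is only an illustration of that logic, not Datacap code. The function names, the sample confidence values, and the use of 0 as the non-problem status are assumptions made for the example; the threshold of 8 and the problem status of 1 come from the Routing rule.

```python
# Hypothetical model of a page: field name -> confidence on a 0-10 scale.
# None of these names belong to the Datacap API; they are for illustration.

PROBLEM_STATUS = 1  # page status that flags a problem, as in the Routing rule


def check_confidence(fields, min_confidence=8, problem_status=PROBLEM_STATUS):
    """Return problem_status if any field is below min_confidence, else 0.

    Using 0 for "no problem" is an assumption of this sketch.
    """
    if any(conf < min_confidence for conf in fields.values()):
        return problem_status
    return 0


def pages_to_verify(pages, problem_status=PROBLEM_STATUS):
    """Keep only the pages whose STATUS equals the problem value."""
    return [name for name, page in pages.items() if page["STATUS"] == problem_status]


# Made-up confidence values for two pages.
pages = {
    "TM00001": {"fields": {"SSN": 10, "Wages": 9}},
    "TM00005": {"fields": {"SSN": 6, "Wages": 9}},
}
for page in pages.values():
    page["STATUS"] = check_confidence(page["fields"])

print(pages_to_verify(pages))  # ['TM00005']
```

Only TM00005 reaches the operator here, because one of its fields falls below the threshold.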
We've finished setting up the filter, so now only problem pages will be displayed.
__280. Now let's try out the application using the Client instead of the Datacap Studio Test facility.
__281. There may be some entries in the Job Monitor that are left over from our testing. Let's clean things up a bit.
Select all the jobs in the Job Monitor (you can use the Shift key to select multiple jobs at once). Press the Delete key on your keyboard.
__282. You'll get two messages asking you to confirm your deletion request.
Click Yes.
Click the Yes to All button. Press F5 to refresh your Job Monitor. All the jobs should eventually disappear.
__283. Create a new batch by double-clicking the VScan icon in the Operations window.
Click Stop when the message box appears. If you don't click Stop within 10 seconds, another batch will be created.
__285. Make the Job Monitor the active window and press F5 to refresh it. You should see the new job in the Job Monitor.
Note that the status shows the job is pending the PageID task.
__286. Double-click the Page ID icon to execute the next task.
Click Stop when the message appears.
__288. Refresh your Job Monitor. Note that the job is awaiting the Rulerunner task.
The job is now pending the Verify task.
__291. Double-click the Verify icon.
__292. The verification user interface you created in Batch Pilot should appear.
Note how the low-confidence characters are shown in red. You can simply press the Enter key to move from one field to another. The background color changes from yellow to blue, indicating that the field has been verified. NOTE: The documents that actually require verification may differ from what is in the lab guide, because the zones you've drawn may differ slightly from the zones we drew when creating the lab guide.
__293. You will move to the next problem document automatically.
Again, press Enter to verify the low-confidence fields and move through the fields. The batch should be complete after showing two problem documents. Remember, we configured Verify to filter out any documents that didn't have problems.
Click Yes to complete the batch.
__294. A completion message will be displayed.
Congratulations! You've completed the configuration and testing of a Taskmaster application from scratch.
__295. Go back to your Datacap Studio session and make sure that you are on the Rulemanager tab. We have a simple customer database on our lab VMware image. The first task will be to create a connection to the database. We'll add a rule to the Validate ruleset.
Select the Validate ruleset from the Rulesets section.
__296. Expand the Validate ruleset.
There is one existing rule which is bound at the Page level (you can see this by using the sync views function).
__297. We will add a rule at the page level to connect to our customer database. Expand the Lookup actions in the Actions library. Click the lock icon to lock the Rulesets in the middle pane.
Select the OpenConnection action. Select the Validate: Page Function 1 function. Click the Add to function button on the bar separating the Actions library from the Rulesets pane.
__298. Update the parameter for the OpenConnection action with the following value, entered on a single line:
provider=microsoft.jet.oledb.4.0;data source=C:\CustomerDB.mdb; persist security info=false
With this update, we will create a connection to our customer database whenever we process a W2 main page.
__299. Right-click on the Validate ruleset and select Add Rule. A default rule and function will be created for you.
Rename the rule to Validate SSN
Rename the function to Lookup SSN
__300. Go to the Actions library and expand the Validations set of actions. Select the SetIsOverrideable action.
This action, when bound to a field, will generate an error if the subsequent validation action fails. Users will not be able to complete processing a batch if a non-overrideable field fails validation.
__301. Add the SetIsOverrideable action to the function.
Click on the SetIsOverrideable action. Click on the Lookup SSN function. Click the Add to function button.
Select the SmartSQL action. Click on the Lookup SSN function. Click the Add to function button.
__304. Note that there are two parameters for the SmartSQL action. Update the parameters with the following values:
First parameter: Select SSN from CustTable Where SSN=+@F+;
Second parameter: No
This action will perform the actual lookup on the CustTable table in the Customer database. The lookup is based on the current SSN field value. The action returns true if the SSN is found and false if it is not. The action should look like this now:
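The lookup logic itself can be sketched outside Datacap. The following Python example is a minimal illustration, assuming an in-memory SQLite database as a stand-in for C:\CustomerDB.mdb; the function name ssn_exists is hypothetical. Only the table name, the column name, and the sample SSN 111-22-3333 come from this lab.

```python
import sqlite3

# Stand-in for the lab's customer database: an in-memory SQLite database
# with the same table and column names (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustTable (SSN TEXT PRIMARY KEY)")
conn.execute("INSERT INTO CustTable (SSN) VALUES (?)", ("111-22-3333",))


def ssn_exists(connection, ssn):
    """Return True if the SSN is present in CustTable, False otherwise."""
    row = connection.execute(
        "SELECT SSN FROM CustTable WHERE SSN = ?", (ssn,)
    ).fetchone()
    return row is not None


print(ssn_exists(conn, "111-22-3333"))  # True
print(ssn_exists(conn, "999-99-9999"))  # False
```

The sketch binds the field value as a query parameter rather than building the SQL text by concatenation, which is the usual practice in Python. How SmartSQL substitutes the field value internally is not covered here.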
__305. Save and publish the ruleset. Click on the diskette icon to save your changes. Click on the lock icon and select Publish ruleset.
__306. We created a new rule so the rule needs to be bound to the Document hierarchy.
First, click on the lock icon in the Document hierarchy. Expand the W2 document so that you can see all the fields.
__307. We will bind the Validate SSN rule to the SSN field.
Click on the Validate SSN rule in the Ruleset section. Click on the SSN field in the Document hierarchy. Click on the Add to DCO button.
__308. Click on the diskette icon to save your changes to the Document hierarchy. Click on the lock icon to unlock the hierarchy.
__309. We have made all the required changes to perform the lookup. We will test it now. Create and process a new batch from the Income_Verification client by clicking on the VScan, PageID, Rulerunner, and Verify icons.
__310. Note what the fields look like in the Verify client when you get to the first W2.
The SSN has a light red background. This means that, while the OCR was OK, the field has failed a validation action.
__311. Try to tab through all the fields so you can go to the next W2 document.
You will get an error message saying that the validation failed and it must be corrected. It cannot be overridden.
__312. Change the SSN value to 111-22-3333. This is a value that is in our customer database.
Press Enter and you'll be able to move to the next document.
__313. The next document has a combination of confidence issues and another validation error due to the SSN not being in the customer database.
Change the SSN to 111-22-3333 so that you can complete processing the batch. Note: The error message that is currently displayed is very generic and not meaningful to end users. There are ways to add additional information to the error message box, but that is outside the scope of this lab.
__314. Continue to process the rest of the batch (there is only one more document). You will get the message
Click Yes. We've finished processing the batch. Congratulations! You've completed the configuration and testing of a Taskmaster application from scratch. NOTE: We've omitted the configuration of the export function. This is explored in a separate lab, which we encourage you to run through if you are interested.
Lab 5
5.1 Overview
NENU stands for New Enhanced Notification Utility. NENU is a monitoring and notification tool. You can use NENU to monitor Taskmaster applications, query batch information, change batch settings such as order or status, delete batches, send email notifications, and move batches from one application to another. NENU monitoring can be done manually through the NENU Manager, or set to run automatically at specific times using Windows Task Scheduler. Some sample uses of NENU include:
Notifying the system administrator when something has gone wrong with a Taskmaster component
Correcting expected problems
Generating data that will be used later for RV2 reporting
Deleting/archiving processed batches
It is this last use case that we will cover in our lab. This should give you an idea of how NENU monitoring is implemented and what its capabilities are. We will search for batches that have a status of Job Done. Those batches will be deleted from the application, the batches will be archived to an alternate location on the hard drive, and the database records associated with the batch will be moved to a different application for reference purposes.
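To make the archiving step concrete, here is a minimal Python sketch of the same idea: find the batches whose status is Job Done and move their directories to an archive location. It is not how NENU is implemented. The function name, the batch IDs, and the use of temporary directories are assumptions for the example, and the move of database records is not modeled.

```python
import shutil
import tempfile
from pathlib import Path


def archive_done_batches(batch_status, batches_dir, archive_dir, done_status="Job Done"):
    """Move the directory of every batch whose status equals done_status.

    batch_status maps a batch ID to its status string.
    Returns the list of batch IDs that were moved.
    """
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for batch_id, status in batch_status.items():
        source = Path(batches_dir) / batch_id
        if status == done_status and source.is_dir():
            shutil.move(str(source), str(archive_dir / batch_id))
            moved.append(batch_id)
    return moved


# Demonstration with temporary directories and made-up batch IDs.
root = Path(tempfile.mkdtemp())
batches = root / "batches"
for batch_id in ("B001", "B002"):
    (batches / batch_id).mkdir(parents=True)

moved = archive_done_batches(
    {"B001": "Job Done", "B002": "Pending"}, batches, root / "Completed_Batches"
)
print(moved)  # ['B001']
```

Only the completed batch is moved; the pending batch stays where it is.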
5.2
NENU monitoring settings are configured in Datacap Studio. NENU capabilities are implemented as Datacap actions. We'll create a Datacap application, but since we're not processing batches in the typical sense, we will remove much of the scaffolding that is automatically created by Datacap.
__315. Start Datacap Studio by clicking Start -> All Programs -> Datacap -> Datacap Studio -> Datacap Studio
__316. We will create a new application instead of connecting to an existing one, so close the connection screen.
__317. Start the Datacap Application Wizard by clicking on the icon in the upper right corner of the Datacap Studio.
Click Next.
Click Next.
__320. Enter a name for the application like NENU_Application.
Click Next.
__321. We aren't going to create a typical application, so we will ignore most of the setup screens.
Click Next.
__322. Click Next again for the Fingerprints and Sample Images screen.
__324. A summary screen will be displayed when the skeleton application is created.
Click Close.
__325. Now we will connect to our newly created skeleton application.
Click on the icon for the Connection Wizard in the upper right corner of the screen.
__326. Select the NENU_Application.
__328. The skeleton application is created. It is set up with a simple document hierarchy. Typical rulesets and tasks that most applications use are automatically created.
__329. We aren't actually creating or processing batches of documents. Rather, we're going to monitor the progress of existing batches, so none of the typical rulesets are needed. Indeed, we're going to have to delete all of them.
Click on the lock icon on the Rulesets section of Datacap Studio.
__330. Select the VScan ruleset and click on the delete icon.
Click Yes to confirm that you want to delete the VScan ruleset.
__331. The VScan ruleset is used by the sample document hierarchy. Deleting the ruleset means that the document hierarchy contains references to rules that no longer exist. A message appears informing us of that fact.
__332. Repeat this process for ALL the rulesets. All the rulesets should be gone when you are done.
__334. Select the Actions Library tab on the right hand side of the screen. Scroll through the actions. Locate and expand the NENU set of actions.
__335. Select Function1. Select the SetUser action from the Actions Library. Click the Add to function button that is on the bar between the Rulesets and Actions Library panes.
__i. Select the SetPassword action from the Actions Library.
__ii. Click the Add to function button.
__337. Repeat this process to add the following actions to Function1:
SetStation
SetApplication
SetupOpenApplication
QuerySetStatus
ProcessRunSqlQuery
ProcessMoveBatches
ProcessMoveDBRecords
__338. Now we will start updating the parameters for these actions.
__i. Select the SetUser action from the Rulesets pane.
__ii. Enter admin in the parameter area under the Actions Library pane. This will update the action with the correct parameter.
SetPassword: admin
SetStation: 1
SetApplication: APT (we will be monitoring the APT application)
QuerySetStatus: Job Done (the job status changes to Job Done when the final Export step is complete)
ProcessMoveBatches: C:\Completed_Batches (this is the directory location where completed batches will be archived)
ProcessMoveDBRecords: NENU_Application,,,False,admin,admin,1,True (after the batch is archived, we'll move the database records regarding the batch to this NENU_Application application)
The parameter list for the ProcessMoveDBRecords action is a little difficult to read as a single line, so here's the screen capture for that Parameter section of Datacap Studio.
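One way to check the single-line value before typing it in is to split it on commas and count the positions. The short Python sketch below does only that. It numbers the positions rather than naming them, because the lab does not name the individual parameters.

```python
# The parameter string is copied from the lab step. The positions are only
# numbered here; their meanings are not stated in this lab.
parameter = "NENU_Application,,,False,admin,admin,1,True"

for position, value in enumerate(parameter.split(","), start=1):
    print(position, repr(value))
```

The value has eight comma-separated positions, and the second and third are empty, which is why three commas appear in a row.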
__i. Click on the diskette icon to save your changes.
__ii. Click on the lock icon and select Publish ruleset from the dropdown.
__341. We will bind our newly created ruleset to the Document hierarchy.
Click the lock icon on the Document hierarchy section of Datacap Studio. You will get another message reminding you that you deleted a whole bunch of rulesets that were linked to the Document hierarchy.
This time, you'll get the option to remove all those non-existent rulesets. Click Yes to remove those unnecessary references.
__342. Expand the Document hierarchy so that you can see the Open section under NENU_Application (see the picture below).
Select the Open section. Select Rule1 from the Rulesets pane. Click the Add to DCO button on the bar separating the Document hierarchy from the Rulesets.
__343. Click on the diskette icon to save your changes. Click on the lock icon to unlock the Document hierarchy.
__344. Finally, we'll add a new Task that is specific to our NENU monitoring rules. Select the Task profiles tab on the right side of Datacap Studio.
Click on the lock icon to lock the tasks for editing.
__345. Click on the Add task icon.
__346. A window appears allowing you to select typical tasks. We will not use any of these tasks.
Select the Custom task at the bottom of the window, enter a name of AutoDelete, and click OK.
Select the AutoDelete ruleset. Select the AutoDelete task. Click the Add ruleset to profile button.
__348. Click the icons to Save and Unlock the task profiles.
5.3 Testing NENU
We will use the APT application to test our configuration. We recommended running through the APT lab before doing anything else, since APT is such an easy way to test Taskmaster capabilities. You'll need a completed APT batch. If a completed batch doesn't exist, go to the APT (Accounts Payable) lab and run through the first section to bring a batch to completion.
__349. Start the APT Client.
Click OK.
__351. Make sure you have a job that has completed APT processing.
__352. Now we will use NENU Manager to manually start the batch monitoring. Remember that this could also be scheduled to run periodically using the Windows Task Scheduler.
Click Start -> All Programs -> Datacap -> Taskmaster Client -> NENU Manager.
__353. NENU Manager will start.
Click Create to create a new monitor setting.
__354. Expand the RRS Application settings, if they are not already expanded.
Change the lib parameter to NENU_Application
Change the tprofile parameter to AutoDelete
Click the Save button.
Click the Run Profile button. A confirmation message appears when the task is complete.
Make sure the Job Monitor is the active pane and press F5 to refresh it. The completed job should be gone from the Job Monitor.
__357. Open Windows Explorer and navigate to C:\Completed_Batches.
The batch directory has been moved to this archive location.
__358. Recall that we also moved the database records for this batch to the NENU_Application. Open the client for that application to see the records.
Click Start -> All Programs -> Datacap -> Applications -> NENU_Application -> NENU_Application Client.
__359. Log on with a userid/password of admin.
Note that the database records have moved to this application (your screen may look a little different based on the number of batches you have archived). Congratulations! You've completed the NENU Monitoring lab. Now you should have a better idea of how to use the NENU tool to monitor batches and take actions on them.
Lab 6
6.1 Overview
In this lab, we will see how easy it is to take a Datacap Taskmaster application and update it so that content is stored in the Filenet P8 content repository. We will start with the existing 1040EZ application that comes automatically installed with Datacap.
6.2
__1.
__2.
Lab 6 - Integrating IBM Datacap Taskmaster with the Filenet P8 ECM Repository
__3.
__4.
The main Datacap Studio page is opened. There are three sections: the Document Hierarchy on the left, the Actions Library/Task Profiles on the right, and the Rulesets in the middle.
__5. We are going to start by adding a new Ruleset. This new ruleset will define all the actions needed to log on to a P8 repository and upload documents to it.
__i. Select the 1040EZ overall rulesets.
__ii. Click the Add ruleset icon.
__6. We will need two rules in our new ruleset. One will act at the batch level and log in to the repository (since we only need to log in once per batch). The other will act at the page level and upload each document to the repository. So we will need to add a rule to the new ruleset.
Right-click on the new ruleset and select Add Rule.
__7. Now that we have the necessary number of rulesets, rules, and functions, let's rename them to something more meaningful.
Single-click on the name Ruleset1 so that you can rename it (similar to how you would do this in Windows Explorer). Change the name of the ruleset to Export to P8.
Single-click on the name Rule1 and change the name to Batch Level Rule.
Single-click on the name Rule2 and change the name to Page Level Rule.
Under Rule1, single-click on the name Function1 and change the name to Login to P8
Under Rule2, single-click on the name Function1 and change the name to Upload Tax Form
Your ruleset should now look like the following:
__8. Click on the Actions Library tab on the right side of the screen.
Locate and expand the FileNetP8 actions. These are all the actions that Taskmaster can perform on a P8 repository.
__9. Now we'll add the action to our Login rule that sets the URL for the P8 repository. This is the URL for the P8 web services interface.
On the Rulesets pane, select the Login to P8 function. On the Actions Library pane, select the FNP8_SetURL action. Click the Add to function button on the bar separating the Rulesets and Actions Library panes.
__10. Repeat the above process to add the following actions to the Login to P8 function:
__i. FNP8_SetTargetObjectID
__ii. FNP8_SetTargetClassID
__iii. FNP8_Login
__11. Select the FNP8_SetURL action in your rule. On the right side of the screen (under the Actions Library) is where you can set the parameter values. Enter http://hqdemo1:9080/wsi/FNCEWS35DIME as the URL value. Press Enter to confirm your change to the parameter. The action should now look like this:
__12. Follow the above process to change the parameters for the next three actions.
__i. Change the parameter for FNP8_SetTargetObjectID to ECM
__ii. Change the parameter for FNP8_SetTargetClassID to Objectstore
__iii. Change the parameter for FNP8_Login to administrator,filenet
So what we've done here is specify the web services interface URL for P8, say that we are going to store content in the ECM object store, and then log in to P8 with userid administrator and password filenet.
__13. Now we'll add the actions for the Upload Tax Form function.
Select the Upload Tax Form function. Select the FNP8_SetDocClassID action from the Actions Library. Click the Add to function button.
Use the same process to add the following additional actions to the function:
FNP8_SetDocTitle
FNP8_SetProperty
FNP8_Upload
Then change the parameter values as follows:
Change the FNP8_SetDocClassID parameter to TaxForm
Change the FNP8_SetDocTitle parameter to 1040EZ
Change the FNP8_SetProperty parameter to SSN,@TaxpayerSSN
What the Upload Tax Form function will do is upload the scanned image to P8, using a document class of TaxForm and setting the SSN property to whatever was OCR'd in the TaxpayerSSN field.
__15. We'll save our changes.
Click the diskette icon to save our changes. Click the lock icon and select Publish ruleset to make our changes available.
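As a recap, the Export to P8 ruleset can be written down as plain data. The Python sketch below is only a summary aid and does not reflect how Datacap stores rulesets. The action names and parameter values are the ones entered in this lab; the nested layout, the resolve_property helper, and the sample SSN are assumptions for illustration. The helper mimics what the lab describes for SSN,@TaxpayerSSN: the property is set to the value of the named field.

```python
# Plain-data summary of the ruleset built in this lab (illustration only).
# None means the lab gives no parameter value for that action.
export_to_p8 = {
    "Batch Level Rule": {
        "Login to P8": [
            ("FNP8_SetURL", "http://hqdemo1:9080/wsi/FNCEWS35DIME"),
            ("FNP8_SetTargetObjectID", "ECM"),
            ("FNP8_SetTargetClassID", "Objectstore"),
            ("FNP8_Login", "administrator,filenet"),
        ]
    },
    "Page Level Rule": {
        "Upload Tax Form": [
            ("FNP8_SetDocClassID", "TaxForm"),
            ("FNP8_SetDocTitle", "1040EZ"),
            ("FNP8_SetProperty", "SSN,@TaxpayerSSN"),
            ("FNP8_Upload", None),
        ]
    },
}


def resolve_property(parameter, fields):
    """Split 'Name,@Field' and replace the @Field reference with its value."""
    name, value = parameter.split(",", 1)
    if value.startswith("@"):
        value = fields[value[1:]]
    return name, value


# Hypothetical OCR result for one page.
print(resolve_property("SSN,@TaxpayerSSN", {"TaxpayerSSN": "111-22-3333"}))
# ('SSN', '111-22-3333')
```

The batch-level function runs once per batch to log in, and the page-level function runs for each page to set the document class, title, and SSN property before the upload.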
Now let's associate, or bind, the rules to the appropriate levels of the document hierarchy.
First we'll lock the Document Hierarchy for editing. Click on the lock icon.
__17. Expand the 1040EZ document hierarchy so that you can see the global rules that get executed when a new batch is opened.
Click on global in the document hierarchy. Select the Batch Level Rule from your Rulesets pane. Click on the Add to DCO button.
We've bound the batch level rule (which simply logs into the correct object store within P8) to the main 1040EZ batch. This rule is executed once whenever a batch of 1040EZ documents is scanned.
__18. The document hierarchy should be updated to look like this now:
__19. Now let's take care of binding the rule that does the actual upload of documents. We could bind this to either the document or page level because the 1040EZ is a single-page document. Binding at the page level is a little easier so, for the purposes of this lab, we'll do it that way.
Expand the document hierarchy for the Page_1040EZ page and select the global open. Select the Page Level Rule from the Rulesets pane. Click the Add to DCO button.
The document hierarchy for the 1040EZ page should now look like this:
__21. Click the diskette icon to save our changes. Click the lock icon to unlock the hierarchy.
Finally, we need to update one of the task profiles. Tasks are elements of the capture workflow, and they're executed in an orchestrated fashion (as opposed to a simply sequential one). Tasks can be repeated or executed conditionally.
Click on the Task Profiles tab and click the lock icon to lock the profiles for editing.
__23. Expand the Export task. This task is executed after all the documents have been scanned, processed, and manually verified. Right now, the only rules being executed are those that export OCR'd fields to a database.
Select the Export to P8 ruleset from the Rulesets pane. Select the Export task in the Task Profiles. Click the Add ruleset to profile button.
__i. Click the diskette icon to save your changes.
__ii. Click the lock icon to unlock the profiles.
6.3
We've completed making the necessary changes to the application. Now we can test our changes.
__25. Start the 1040EZ client by clicking Start -> All Programs -> Datacap -> Applications -> 1040EZ -> 1040EZ Client
__26.
__27.
__28.
Click the Stop button to stop the scanning process.
__29. Now we will run the background processes which do things like document integrity checking, image enhancement, page classification, and OCR. Click the Background icon.
Click the Stop button. Another status message will appear indicating the recognition is running.
Click the Stop button.
__30. Now we need to manually verify our results.
Click on the Verify/Fixup icon and the Verification client will start.
You can tab through the results to see how the OCR did. Fields with a yellow background indicate that a low-confidence read occurred. Fields with a blue or teal background were recognized with high confidence. Click the Next Problem button on the taskbar to go to the next problem document.
There are no other documents in the batch which had recognition issues, so you'll be asked if you want to close the batch.
Select Yes.
Click OK.
__31. Finally, we will run the Export task. This is where our new rulesets will get executed.
If we have made our changes correctly, the completion message will indicate that the task finished successfully.
Click the Stop button.
__32. The last thing we need to do is see if the documents were uploaded to P8 properly. Start the Filenet P8 Workplace XT client using the icon on the desktop.
__33.
__34. Click on the Search icon, located in the top left corner of the main Workplace XT screen.
__35.
Select the Tax Form document class and click OK.
__36. Change the search criteria to search for documents where the Document title starts with 1040EZ
Click the Search button.
__37. If we did our work correctly, we'll see that a 1040EZ was uploaded with the current date and time.
You should see that the SSN field was updated with the OCR'd value.
Congratulations! You've completed the process of integrating Taskmaster with Filenet P8.
Lab 7
7.1
The IBM Datacap Taskmaster Capture Connector for Email and Electronic Documents is an optional component of the IBM Datacap Taskmaster product. This connector allows for easy capture of emails and electronic attachments, including Microsoft Word, Excel, PDF, and zipped files. It lets customers capture a wide variety of electronic content with a unified approach. The same classification, data extraction, quality control, and export capabilities can be used across a wide spectrum of content.
7.2 Lab Overview
In this lab, we will explore how we integrate email systems with Datacap Taskmaster. We will modify the APT (Accounts Payable) application for invoice processing. In the out-of-the-box application, invoices are scanned in or imported from a file system. In this lab, you will modify the application configuration to monitor a Microsoft Outlook mailbox for inbound emails. Many customers use mailboxes as an alternate method of receiving content. Customer correspondence can be directed to custserv@xyzcorp.com, or invoices can be sent to AcctPayable@abcEnterprises.com. Emails will be retrieved from the mailbox, and attached invoices will be processed as though they were scanned input. Once processed, emails are filed into other mailboxes for later archival. Problem emails are directed to a different mailbox where an administrator can review them. For simplicity's sake, we will assume that all attachments are received as TIFF images. In a production environment, the attachments could be in a variety of formats; Word, PDF, and JPG are possible alternatives. Datacap Taskmaster is fully capable of handling these document types with the Datacap Connector for Email and Electronic Documents.
7.3 Getting Started
__1. There is a folder called Start ECM on the desktop of your VMware image.
__2. Open the folder called Email Exchange.
__3. Double-click on the program called step 1 Start Exchange. This will start the MS Exchange server.
A window will open showing the progress of the server startup. Wait for the window to disappear; this means that Exchange has started.
__4. Now we'll open up Datacap Studio so we can make a copy of the APT (accounts payable) application.
Click Start -> All Programs -> Datacap Studio -> Datacap Studio
__5. Click on the Close button.
__6. We will use Datacap Studio's Application Wizard to copy an existing application.
Click on the Application Wizard (as shown by the red arrow above), located in the top right-hand corner of Datacap Studio.
__7. Click Next.
__8. Select the option to Copy an existing RRS application.
Click Next.
__9. Select the APT application from the drop-down list. Select the box to rename the copy. Enter APT_Email as the name of the new application.
__11. Click Close.
__12. Now we will connect to the copy of the APT application that we created.
Click on the Connection icon in the top right corner of Datacap Studio (as shown by the arrow above).
__13. Select the APT_Email application. Click Next.
__14. Enter admin for both the userid and password.
__15. There are three sections to the Datacap Studio. The leftmost section is called the Document Hierarchy, and it describes the structure of the batch. The rightmost section is where the Actions Library and Task Profiles are located. The Actions Library is a reusable library of commonly used capture functions. The Task Profile is the list of tasks that make up the capture workflow. In the middle are the Rulesets, which are groupings of actions and rules. This is where the heart of an application's configuration is located. Expand the VScan ruleset so we can examine what is happening here.
VScan stands for virtual scan, and it works by scanning a directory for images. It sets the directory it's going to scan, sets the maximum number of images in a directory, says that it will accept multi-page TIFFs, and then starts scanning the directory for TIFF images to ingest.
__16. Click on the Task Profiles on the right side of Datacap Studio and expand the VScan task.
The only ruleset that is being executed by this task is VScan. We will replace the VScan ruleset with a new ruleset that will monitor the Exchange mailbox.
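Before replacing it, it may help to see the kind of work the VScan ruleset does. The Python sketch below is a rough stand-in for the directory scan described above: it picks up TIFF files from a directory, up to a maximum count. It is not the VScan implementation. The function name, the file names, and the sort order are assumptions for the example, and multi-page TIFF handling is not modeled.

```python
import tempfile
from pathlib import Path


def scan_for_tiffs(directory, max_images):
    """Return up to max_images TIFF files from directory, sorted by name."""
    tiffs = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in {".tif", ".tiff"}
    )
    return tiffs[:max_images]


# Demonstration with a temporary directory and empty placeholder files.
scan_dir = Path(tempfile.mkdtemp())
for name in ("inv1.tif", "inv2.TIF", "inv3.tiff", "notes.txt"):
    (scan_dir / name).touch()

batch = scan_for_tiffs(scan_dir, max_images=2)
print([p.name for p in batch])  # ['inv1.tif', 'inv2.TIF']
```

The email ruleset built in the rest of this lab plays the same role, with a mailbox as the source instead of a directory.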
__17. Select Add Ruleset.
__18. A new ruleset will be added to the bottom of the page.
__19. You can rename any of these elements by single-clicking on it (similar to how you'd rename a file from Windows Explorer).
Rename the ruleset to Email Scan
Rename the rule to Monitor Mailbox
Leave the name of the function as Function1.
__20. Now we will start adding actions to our new rule. Select the Actions Library tab on the right side of Datacap Studio.
Locate and expand the Ewsmail set of actions. These are all the reusable actions that relate to Exchange email.
__21. The ex_Types action defines the allowable attachment types that Taskmaster will process. The default is PDF, but you can set it to any type that you want to process. We will add this action to our function.
Select the ex_Types action from the Actions Library. Select Function1 from the Rulesets pane. Click on the Add to function button, which is on the bar that separates the Rulesets pane from the Actions Library pane.
Make sure the ex_Types action is selected.
__23. Just under the Actions Library is where you can enter parameter values for the action. For our lab, we are going to limit the types of allowable attachments to TIFFs. You would monitor for a much wider range of attachment types in a production environment.
__24.
__25.
We will add another action to our function. The action is ex_done_folder.
After Taskmaster is finished processing an email, it will file the email in the folder specified by the ex_done_folder action.
__i. Select ex_done_folder from the Actions Library.
__ii. Select Function1 from the Rulesets pane.
__iii. Click the Add to function button.
__26.
__27. There is an Exchange folder called Processed. We want to file processed emails in there.
Select the ex_done_folder action from the Rulesets pane and enter the parameter value of Processed.
__28. The function should now look like this:
__29. There are six more actions that need to be added to the function. Use the process you've used previously in this lab to add the following actions to Function1 and assign the appropriate parameter values:
ex_problem_folder (parameter value is Problems). This is where emails that Taskmaster cannot process are filed.
ex_wait_time (parameter value is 10). This is the time, in seconds, to wait for a full batch of emails.
ex_ews_version (parameter value is 1). This specifies the exact version of Exchange we are using (1 means Exchange 2007 SP1).
ex_max_docs (parameter value is 20). This is the maximum number of emails in a single batch.
ex_login (parameter values are https://hqdemo1dom.filenet.com/EWS/Exchange.asmx, datacap_Service@hqdemo1dom.filenet.com, filenet). This is the URL for the Exchange server, and the user ID and password to log on to the server with.
ex_scan (no parameter required). This is the action that performs the actual mailbox scan.
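For reference, the complete set of actions added to Function1 in this lab can be summarized as one ordered list. The Python structure below is only a note-taking aid, not a format that Datacap Studio reads. The value shown for ex_Types is an assumption, since the lab only says that attachments are limited to TIFFs; all other values are the ones given in the steps above, in the order the lab adds them.

```python
# Ordered summary of the actions added to Function1 in this lab.
# Each entry is (action name, parameter values). Plain Python for
# illustration only; Datacap Studio does not use this format.
MONITOR_MAILBOX_ACTIONS = [
    ("ex_Types", ["tif"]),            # assumed value; the lab only says "TIFFs"
    ("ex_done_folder", ["Processed"]),
    ("ex_problem_folder", ["Problems"]),
    ("ex_wait_time", ["10"]),         # seconds to wait for a full batch
    ("ex_ews_version", ["1"]),        # 1 means Exchange 2007 SP1
    ("ex_max_docs", ["20"]),          # maximum emails in a single batch
    ("ex_login", [
        "https://hqdemo1dom.filenet.com/EWS/Exchange.asmx",
        "datacap_Service@hqdemo1dom.filenet.com",
        "filenet",
    ]),
    ("ex_scan", []),                  # performs the mailbox scan; no parameters
]


def describe(actions):
    """Render each action as 'name(param1, param2, ...)' in list order."""
    return [f"{name}({', '.join(params)})" for name, params in actions]


for line in describe(MONITOR_MAILBOX_ACTIONS):
    print(line)
```

Keeping a summary like this next to the lab can make it easier to compare the finished function in Datacap Studio against the intended configuration.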
__30. Select the Email Scan ruleset you created. Click on the diskette icon to save your changes. Click on the lock icon and select Publish ruleset from the dropdown.
__31. Now we will update the VScan task to use our new ruleset.
Select the VScan task from the Task Profiles tab. Click on the lock icon to lock the task for editing.
__i. __ii.
Select the VScan ruleset (make sure it's the ruleset and not the task). Click the delete icon.
__33. Select the Email Scan ruleset you created. Select the VScan profile from the Task Profiles tab. Click the Add ruleset to profile button.
Make sure that the VScan task has been updated correctly. It should look like this:
We need to save the changes you made to the task profile. Click on the diskette icon to save the changes. Click on the lock icon to unlock the task.
__35. Lastly, we need to bind the new rule to the appropriate part of the Document Hierarchy.
Select the top of the Document Hierarchy and click on the lock icon to lock it for editing.
__36. The rule to monitor the mailbox for emails is bound at the batch level. That is because this is the rule that actually creates the batch.
Expand the Document Hierarchy as shown above. Select the (global) part of the hierarchy. Select the Monitor Mailbox rule that you created. Click on the Add to DCO button that is on the bar separating the Document Hierarchy from the Rulesets.
Click on the diskette icon to save your changes. Click on the lock icon to unlock the hierarchy. We're done making changes in Datacap Studio! Now to test our configuration.
__37. Start the Outlook client.
__38. We will log on as the Administrator and send an email with an invoice attached to it.
Select Administrator from the Profile Name pulldown and click OK.
__39. The Outlook client opens with you logged on as the Administrator.
Click on the icon to compose a new email.
__40. We will compose our email.
Send the email to datacap_Service@hqdemo1dom.filenet.com. This is the owner of the mailbox that we are monitoring.
Click on the icon to attach files to the email (as indicated by the red arrow).
__41. Navigate to My Documents.
Select Invoice_0001.tif and Invoice_0002.tif (you can select multiple documents by holding down the Shift key). Click Insert.
__42. Make sure that the invoices are attached. Add a subject line if you like (it's not mandatory).
Click the button to Send the email. Close the Outlook client.
__43. Now we'll log back on to the Outlook client and see what's in the datacap_Service mailbox. Start the Outlook client by clicking on its icon on the taskbar.
Select datacap_Service from the profile dropdown and click OK.
__44. You will be prompted to enter a password. Type in filenet and click OK. Note: The reason you weren't asked to enter the Administrator's password is that you're already logged on to Windows as Administrator.
__45. Note that in the Inbox are two subfolders called Problems and Processed. Close the Outlook client.
__46. Double-click on the icon to start the Taskmaster Client.
__47. Log on with user ID and password admin.
__48. Start the mailbox scanning task by clicking on the Scan icon in the Operations window.
__49. Click OK.
__50. You'll get a completion message when the task is complete (remember that the task will wait a full 10 seconds to see if any additional emails are received, based on the value you used for the ex_wait_time action).
Click Stop.
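The way ex_wait_time and ex_max_docs work together can be pictured as a polling loop that closes the batch when it is full or when the wait time runs out. The Python sketch below shows that general pattern under stated assumptions; it is not Taskmaster's implementation. The names collect_batch and fetch_new_emails, the poll interval, and the injectable clock and sleep functions are all hypothetical.

```python
import time


def collect_batch(fetch_new_emails, max_docs=20, wait_time=10.0,
                  poll_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Gather emails until the batch is full or wait_time seconds have elapsed."""
    batch = []
    deadline = clock() + wait_time
    while len(batch) < max_docs:
        room = max_docs - len(batch)
        # fetch_new_emails is a hypothetical callable returning a list of emails;
        # the slice guards against it returning more than requested.
        batch.extend(fetch_new_emails(room)[:room])
        if len(batch) >= max_docs or clock() >= deadline:
            break
        sleep(poll_interval)
    return batch


# Simulated run: one email is waiting and no more arrive during the wait.
inbox = ["email with Invoice_0001.tif and Invoice_0002.tif"]


def fetch_from_inbox(limit):
    taken = inbox[:limit]
    del inbox[:limit]
    return taken


# A fake clock lets the example run instantly instead of sleeping for real.
fake_now = [0.0]
batch = collect_batch(
    fetch_from_inbox,
    clock=lambda: fake_now[0],
    sleep=lambda seconds: fake_now.__setitem__(0, fake_now[0] + seconds),
)
print(len(batch), "email(s) in batch after", fake_now[0], "simulated seconds")
```

With a single email in the mailbox, the loop keeps polling until the simulated 10 seconds have passed, which mirrors the pause you see before the completion message in this step.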
__51. Refresh the Job Monitor window by clicking on it (to make it active) and pressing F5. You should see your job listed there, pending the Batch Profiler step.
Double-click on the Background icon to start the background processor, which will take care of the Batch Profiler step. As with VScan, you'll get a status window and a completion message.
__52. Refresh the Job Monitor by pressing F5 after the background process has completed. The job should now be waiting for verification.
__53. So we see that Taskmaster successfully extracted the attachment from the email and processed it just as if it were a scanned document.
__54. Click the Next Problem icon (as shown above by the red arrow) to move to the next problem document. View the results and click the Next Problem icon again.
The batch will be complete, so click Yes to finish with the batch.
__55. Click Stop.
__56. Let's take one more look at the datacap_Service mailbox. Start the Outlook client by clicking on the icon on the taskbar.
Select datacap_Service from the dropdown and click OK.
__57. Enter filenet for the password.
__58.
__59.
The email that was in the Inbox has been moved to the Processed folder, as we configured Taskmaster to do. Close the Outlook client.
Congratulations! You've completed the Email Integration lab. Now you have a better understanding of how easy it is to integrate email systems with Datacap Taskmaster.
Lab 8 Batch Splitting
8.1 Lab Overview
One of the key advantages of Datacap Taskmaster is its fluid workflow. Many capture solutions force you into a fixed processing sequence. Problems with a single document in a given batch mean that the entire batch is held back until the issue is resolved. There isn't any way to do conditional processing, for example, taking additional validation steps if recognition confidence levels aren't optimal, or having especially problematic documents go to an administrator for additional research.
In this lab, we will update a very simple sample application to illustrate the batch splitting and workflow capabilities of Datacap Taskmaster. As it stands, the application does some basic check processing. The amount of the check is read, and low-confidence reads on the check amount are sent to a verification client. Let's say that there is a requirement for checks over a certain amount ($1,000.00 in this example) to go to a supervisor for review. We'll add that logic to our application.
8.2
Let's take a look at the basic application that we've created for our lab.
__1. Start the sample client. We have called the application SplitBatch. Open Windows Explorer and navigate to C:\Datacap\SplitBatch. Locate the link for SplitBatch Client and double-click on it to start the client application.
__2. Log on with user ID and password admin.
__3.
__4.
__5. Now run Page Identification by clicking on the Page ID icon in the Operations window.
Click Stop to end the page identification task.
__6. Now we will run the Rulerunner task. This task performs the character recognition, among other activities.
Click Stop.
__7. Finally, we will run the Verification client to see our processing results.
Double-click on the Verify/FixUp icon in the Operations window and the Verification user interface is displayed.
This is a very simple sample application. We are only recognizing the check amount here. Also, the client is configured to show only problem images. Anything that was recognized with high confidence is not displayed for verification. Click the Next Problem icon (it's the blue arrow with the red question mark above it on the toolbar). It's indicated on the above screen capture by the red arrow in the top left corner.
A message window is displayed saying that there are no more problem documents. Click No so that we can look at the other documents that were processed in the batch. Press Ctrl+Shift+P to go to the previous document. Do this until you get to the beginning of the batch, when you'll get the following message:
Click Cancel. Note the check image that is at the beginning of the batch.
We have one check in the batch which is for an amount greater than $1,000.00. We are going to update the application so that checks greater than $1,000.00 go to a special processing queue. Press Ctrl+N to go through the four checks again; when you get to the end, you'll get the following message:
8.3
Now that we've seen how the sample application works, let's modify it to add in our exception handling logic.
__8. Start Datacap Studio by clicking Start -> All Programs -> Datacap -> Datacap Studio -> Datacap Studio.
The main Datacap Studio page is shown. See how Datacap Studio is essentially split into three sections: there's the Document Hierarchy on the left, the Actions Library and Task Profiles on the right, and the Rulesets in the middle. The Rulesets are where our processing logic is defined.
__9. We could create a separate ruleset for our logic or we can just add some rules to an existing one. Either option is fine. For our lab, we'll just add some rules to an existing ruleset. We'll modify the Routing ruleset.
__i. Select the Routing ruleset.
__ii. Click on the lock icon to lock the ruleset for editing.
__10. Right-click the Routing ruleset and select Add Rule.
__11. Change the name of the newly created rule and the function underneath it by single-clicking on the name of the rule (or function), then typing in the new name. This is similar to the way you would rename objects using Windows Explorer.
Change the name of the new rule to Split Based on Amount and change the name of the function to Check Amount. This function will check the value of the field to which it is bound. If the field is greater than or equal to $1,000.00, then we will set a flag for the system to place that document in a separate batch.
__12. Click on the Actions Library tab on the right-hand side. Locate the Validations set of actions and expand it. Locate the action called IsFieldGreaterOrEqual.
__13. Click on the IsFieldGreaterOrEqual action in the Actions Library pane. Click on the Check Amount function in the Rulesets pane. Click on the Add to function button, which is on the bar separating the two panes.
Now locate the DCO set of actions in the Actions Library pane. Expand it and locate the SetPageStatus action.
Click on the SetPageStatus action in the Actions Library pane. Click on the Check Amount function in the Rulesets pane. Click on the Add to function button.
__15. Finally, we'll add one more action to our function. Locate and expand the rrunner set of actions. Find the rrSet action (don't confuse this with the rr_Set action, an older action that has been replaced by rrSet).
Click on the rrSet action in the Actions Library pane. Click on the Check Amount function in the Rulesets pane. Click on the Add to function button.
__16. Now we will set the parameter values for these actions. Click on the IsFieldGreaterOrEqual action in the Rulesets pane.
Click on the Parameter field on the right side of the screen, under the Actions Library pane. Enter a value of 1000.00. Press Enter. Your action should now appear as follows:
__17. Repeat this process for the other two actions. Set the parameter value for SetPageStatus to 1. There are two parameter values for rrSet. The first parameter should be Yes. The second parameter should be @D.Split.
What will happen is that the field value will be examined. If it is greater than or equal to $1,000.00, the status for the page will be set to 1 (which means that there is a problem with the page) and a document-level variable called Split will be set to Yes.
__18. We will now add a second rule to the Routing ruleset.
Right-click on the Routing ruleset and select Add Rule. Change the name of the rule to Split Batch. Change the name of the function in the new rule to Perform Split. It should now look like the following:
__19. Select the SplitBatch action and add it to the Perform Split function, using the same process that you used to update the Check Amount function. Change the parameter value for the SplitBatch action to @D.Split.
This rule will be bound at the batch level. When the batch is finished, the SplitBatch action will cause all the documents in the batch to be examined. Any document whose Split variable is set to Yes will be sent to a separate queue.
__20. Save the changes made to the rulesets.
__i. Click the diskette icon to save your changes.
__ii. Click the lock icon and select Publish ruleset.
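The Check Amount function built earlier in this section can be read as a short chain of actions in which the later actions run only if the comparison succeeds, as the lab's description of the function implies. The Python sketch below mimics that chain for illustration; it is not Datacap's rule engine. The Page and Document classes, the lowercase function names, and the amount parsing are assumptions. Only the action names and the parameter values (1000.00, 1, Yes, and @D.Split) come from the lab.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Document:
    # Document-level variables, such as the Split flag used in this lab.
    variables: dict = field(default_factory=dict)


@dataclass
class Page:
    document: Document
    status: int = 0


def is_field_greater_or_equal(field_value: str, threshold: str) -> bool:
    """Stand-in for IsFieldGreaterOrEqual: compare a recognized amount to a threshold."""
    cleaned = field_value.replace("$", "").replace(",", "").strip()
    return Decimal(cleaned) >= Decimal(threshold)


def set_page_status(page: Page, status: int) -> bool:
    """Stand-in for SetPageStatus."""
    page.status = status
    return True


def rr_set(document: Document, value: str, variable: str) -> bool:
    """Stand-in for rrSet with parameters (value, @D.<variable>)."""
    document.variables[variable] = value
    return True


def check_amount(field_value: str, page: Page) -> bool:
    """Run the three actions in order; stop as soon as one returns False."""
    actions = [
        lambda: is_field_greater_or_equal(field_value, "1000.00"),
        lambda: set_page_status(page, 1),
        lambda: rr_set(page.document, "Yes", "Split"),
    ]
    return all(action() for action in actions)  # all() short-circuits


doc = Document()
page = Page(document=doc)
check_amount("$1,210.20", page)
print(page.status, doc.variables)
```

Because all() stops at the first False, a check below the threshold leaves both the page status and the Split variable untouched.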
__21. Now we will bind these newly created rules to the proper parts of the document hierarchy.
Click the lock icon on the Document Hierarchy pane so that it is locked for editing. Expand the Document Hierarchy as shown below so that you can see the actions executed when the Amount field is opened.
Select the Split Based on Amount rule from the Rulesets pane. Select the Open part of the Document Hierarchy associated with the Amount field. Click on the Add to DCO button on the bar separating the Rulesets and Document Hierarchy panes.
The hierarchy of the Amount field will now look like this:
__22. Expand the very bottom of the hierarchy so that you see the actions executed when the batch is closed.
Select the Split Batch rule from the Rulesets pane. Select the Close part of the hierarchy associated with the batch. Click on the Add to DCO button.
Your document hierarchy should look like the above screen capture.
__23. Save your changes to the document hierarchy.
__i. Click the diskette icon to save your changes.
__ii. Click on the lock icon to unlock the document hierarchy.
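Putting the two bindings together: the field-level rule runs once for each Amount field, and the batch-level rule runs once when the batch closes. The Python sketch below simulates that with four checks, one of which is for $1,210.20 as in the sample batch. Everything else is assumed for illustration: the other three amounts, the dictionary layout, the queue names used as keys, and the split_batch helper, which only imitates what this lab says the SplitBatch action does.

```python
from decimal import Decimal


def flag_for_split(document, threshold=Decimal("1000.00")):
    """Field-level step: mark the document when its amount reaches the threshold."""
    if document["amount"] >= threshold:
        document["page_status"] = 1
        document["Split"] = "Yes"


def split_batch(batch, variable="Split"):
    """Batch-level step: separate documents whose variable is set to Yes."""
    queues = {"Main": [], "Supervisor": []}
    for document in batch:
        target = "Supervisor" if document.get(variable) == "Yes" else "Main"
        queues[target].append(document)
    return queues


# Four checks; only the $1,210.20 amount comes from the lab, the rest are made up.
batch = [
    {"id": 1, "amount": Decimal("1210.20")},
    {"id": 2, "amount": Decimal("85.00")},
    {"id": 3, "amount": Decimal("432.10")},
    {"id": 4, "amount": Decimal("999.99")},
]

for document in batch:        # rule bound to the Amount field
    flag_for_split(document)
queues = split_batch(batch)   # rule bound to the batch Close event

print([d["id"] for d in queues["Supervisor"]])
print([d["id"] for d in queues["Main"]])
```

One document lands in the Supervisor list and three stay in the Main list, which is the outcome to look for in the Job Monitor when the modified application is run.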
8.4
__24.
__25. Do NOT click the Verify icon just yet. After Rulerunner is finished, make the Job Monitor the active window and press F5 to refresh it.
Note that the one batch that you created has now been split into two. Some documents are in the Main verification queue (as usual) and some documents have gone to the Supervisor verification queue.
__26. Click on the Supervisor Verify icon in the Operations pane.
The document displayed is the check whose amount is $1,210.20. Click the Next Problem icon (indicated above by the red arrow).
There aren't any more documents in the batch, so click Yes. A completion message is displayed.
Click the Stop button.
__27. Select the normal Verify/FixUp icon from the Operations pane.
There are only three documents in the batch, and the only one that is displayed is the one with low-confidence characters.
Click the Next Problem icon to move to the next document in the batch. There shouldn't be any further documents to process.
Click Stop.
Congratulations! You've completed the Batch Splitting lab. You should now understand some of the powerful workflow capabilities of Datacap Taskmaster. You are no longer tied to a fixed, sequential process. Instead, you can use logic to determine which tasks you want to perform, and which users will handle manual processing tasks.
Appendix A. Notices
This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing IBM Corporation North Castle Drive Armonk, NY 10504-1785 U.S.A. For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to: IBM World Trade Asia Corporation Licensing 2-31 Roppongi 3-chome, Minato-ku Tokyo 106-0032, Japan The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. 
Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have
been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. All references to fictitious companies or individuals are used for illustration purposes only. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Appendix
Adobe, Acrobat, Portable Document Format (PDF), and PostScript are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, other countries, or both. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. See Java Guidelines Microsoft, Windows, Windows NT, and the Windows logo are registered trademarks of Microsoft Corporation in the United States, other countries, or both. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. ITIL is a registered trademark and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Other company, product and service names may be trademarks or service marks of others.
Copyright IBM Corporation 2011. The information contained in these materials is provided for informational purposes only, and is provided AS IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, these materials. Nothing contained in these materials is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in these materials to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. This information is based on current IBM product plans and strategy, which are subject to change by IBM without notice. Product release dates and/or capabilities referenced in these materials may change at any time at IBM's sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way.
IBM, the IBM logo and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml. Other company, product and service names may be trademarks or service marks of others.