Semi-Structured Data
Processing semi-structured data (i.e. forms) involves applying one or more recognition technologies to extract the data embedded in them. Forms fall into two categories:
Structured Forms
Structured forms present information in fixed locations that do not change from one copy of the form to the next. Examples include surveys, questionnaires, claim forms, and tests.
Semi-Structured Forms
Semi-structured forms follow a recognizable layout but are designed to accommodate optional components (e.g. appendices, schedules, annexes) and/or repeatable components (e.g. table rows, repeating sections), so their length and content vary from one document to the next. Examples include mortgage applications, contracts, invoices, timesheets, and purchase orders.
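To make the distinction concrete, here is a minimal Python sketch of how the data from a semi-structured form might be modeled once extracted. The `Invoice` and `LineItem` classes and all of their field names are hypothetical, chosen only for illustration: a fixed field is always present, a repeatable component becomes a list, and an optional component may be absent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LineItem:
    """One repeatable table row (hypothetical invoice line)."""
    description: str
    quantity: int
    unit_price: float


@dataclass
class Invoice:
    """Hypothetical semi-structured form: a fixed field plus
    repeatable rows and an optional section."""
    invoice_number: str                                        # fixed field
    line_items: List[LineItem] = field(default_factory=list)   # repeatable
    remittance_notes: Optional[str] = None                     # optional

    def total(self) -> float:
        # Sum quantity * unit price over however many rows were extracted.
        return sum(item.quantity * item.unit_price for item in self.line_items)


invoice = Invoice(
    invoice_number="INV-001",
    line_items=[LineItem("Widget", 2, 5.0), LineItem("Gadget", 1, 12.5)],
)
print(invoice.total())
```

A structured form, by contrast, could be modeled with fixed fields alone, since every copy carries the same set of values in the same places.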
To process forms, we use a range of recognition technologies, including optical character recognition (OCR, for printed text), optical mark recognition (OMR, for checkboxes and radio buttons), bar code recognition, intelligent character recognition (ICR, for handwriting), and magnetic ink character recognition (MICR, for checks), in each case supported by rigorous automated and "human-in-the-loop" quality control processes.
Once extracted, the data can be transformed, enriched, and delivered in a wide variety of computer-readable formats such as Excel, JSON, XML, CSV, PDF, and database formats.
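As a rough illustration of that last step, the sketch below serializes a small set of extracted field values to JSON, CSV, and XML using only the Python standard library. The sample records, the `field`/`value` keys, the `<form>` element name, and the helper function names are all assumptions made for this example, not a description of any particular production pipeline.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical extraction output: one record per recognized field.
records = [
    {"field": "invoice_number", "value": "INV-001"},
    {"field": "total", "value": "22.50"},
]


def to_json(rows):
    """Serialize the records as a JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the records as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["field", "value"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_xml(rows):
    """Serialize the records as <field name="..."> elements under <form>."""
    root = ET.Element("form")
    for row in rows:
        element = ET.SubElement(root, "field", name=row["field"])
        element.text = row["value"]
    return ET.tostring(root, encoding="unicode")


print(to_json(records))
print(to_csv(records))
print(to_xml(records))
```

Formats such as Excel, PDF, or a database would typically need a third-party library or a database driver, so they are left out of this sketch.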