Semi-Structured Data

Processing semi-structured data (i.e. forms) involves applying one or more recognition technologies to extract the data embedded in them. Forms fall into two categories:


Structured Forms 

  • Structured forms present information in fixed locations that do not change from one copy of the form to the next. Examples include surveys, questionnaires, claim forms, tests, etc.


Semi-structured Forms

  • Semi-structured forms follow a general layout but are specifically designed to accommodate optional components (e.g. appendices, schedules, annexes) and/or repeatable components (e.g. table rows, repeating sections), so the position and amount of information can vary from one document to the next. Examples include mortgage applications, contracts, invoices, timesheets, purchase orders, etc.


To process forms, we use a range of recognition technologies: optical character recognition (OCR, for printed text), optical mark recognition (OMR, for checkboxes and radio buttons), barcode recognition, intelligent character recognition (ICR, for handwriting), and magnetic ink character recognition (MICR, for checks). In each case, recognition is supported by rigorous automated and "human-in-the-loop" quality control processes.
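
As a rough illustration of how field types pair with recognition technologies, the minimal Python sketch below maps each kind of form field to the technology named above. The field-kind labels, the function names, and the 0.9 confidence threshold for routing a value to human review are all hypothetical and chosen only for this example; they do not describe any particular product or our actual pipeline.

```python
from enum import Enum


class Recognition(Enum):
    OCR = "optical character recognition"
    OMR = "optical mark recognition"
    BARCODE = "barcode recognition"
    ICR = "intelligent character recognition"
    MICR = "magnetic ink character recognition"


# Hypothetical field-kind labels mapped to the technology suited to each.
TECHNOLOGY_FOR_FIELD = {
    "printed_text": Recognition.OCR,
    "checkbox": Recognition.OMR,
    "radio_button": Recognition.OMR,
    "barcode": Recognition.BARCODE,
    "handwriting": Recognition.ICR,
    "check_micr_line": Recognition.MICR,
}


def choose_technology(field_kind):
    """Return the recognition technology for a field kind, or raise ValueError."""
    try:
        return TECHNOLOGY_FOR_FIELD[field_kind]
    except KeyError:
        raise ValueError(f"unknown field kind: {field_kind!r}") from None


def needs_human_review(confidence, threshold=0.9):
    """Flag a recognized value for human-in-the-loop review.

    The threshold is an assumed, illustrative value, not a recommendation.
    """
    return confidence < threshold
```

For example, `choose_technology("handwriting")` returns `Recognition.ICR`, and a value recognized with confidence 0.72 would be flagged for review under the assumed threshold.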


Once extracted, the data can be transformed and enriched, then delivered in a wide variety of computer-readable formats such as Excel, JSON, XML, CSV, PDF and database formats.
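
To make the export step concrete, the sketch below writes one extracted record to two of the formats mentioned above, JSON and CSV, using only the Python standard library. The invoice record, its field names, and the helper functions are invented for illustration; a real extraction result would have whatever schema the form defines.

```python
import csv
import io
import json

# Hypothetical extraction result for an invoice (a semi-structured form):
# one fixed header field plus a repeatable table of line items.
extracted = {
    "invoice_number": "INV-001",
    "line_items": [
        {"description": "Widget", "quantity": 2, "unit_price": 9.5},
        {"description": "Gadget", "quantity": 1, "unit_price": 20.0},
    ],
}


def to_json(record):
    """Serialize the whole record, nested rows included, as JSON text."""
    return json.dumps(record, indent=2, sort_keys=True)


def line_items_to_csv(record):
    """Flatten the repeating rows to CSV, copying the header field onto each row."""
    columns = ["invoice_number", "description", "quantity", "unit_price"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for item in record["line_items"]:
        writer.writerow({"invoice_number": record["invoice_number"], **item})
    return buffer.getvalue()
```

JSON keeps the repeating section nested inside the record, while CSV needs the rows flattened, which is why the invoice number is repeated on every line.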
