Kodexa comprises a core framework and a surrounding set of extensions. These extensions let developers work with documents and unstructured data in a flexible, extensible way, and they can be deployed on a range of platforms. The framework's structure draws on standard approaches from data engineering and data science, and on our experience labeling data for machine learning algorithms.
The framework is focused on the following areas:
Content Normalization and Classification
Labeling (Tagging), Feature Engineering and Content Organization
Data Extraction and Document Processing
Deployment in Batch or as a Service
Before we detail how the platform approaches each of these areas, let's explain the core of Kodexa.