Key Concepts

What is the Kodexa content model?

Kodexa is a content model and library for working with unstructured data, from PDFs and Excel documents to text and images. Our approach to handling this data comes from many years of experience in large financial services organizations, tackling a wide range of unstructured data problems and use cases.

Those experiences have been distilled into Kodexa. Our mission is to provide a framework, approach, and capabilities that enable anyone to work with unstructured data. Whether you are a developer who wants to add new capabilities to your product, a product owner looking to create new value, or a business person looking for new opportunities, Kodexa provides a flexible, powerful, and proven approach to working with unstructured data (currently 80% of all the data in the business landscape).

In this introduction we will review Kodexa's key concepts and components, provide background on why we developed them, and explain how they fit together to solve the problem of handling unstructured data in modern applications.

Document and Content Nodes

The Kodexa Document is the content model that makes it possible to manage and structure unstructured data. Content nodes are the key feature that gives this structure its flexibility. Begin your journey into all things Kodexa here.
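
To make the idea of a document built from content nodes more concrete, here is a minimal sketch in plain Python. It is not the Kodexa API: the ContentNode and Document classes, their fields, and the node type names ("page", "line") are all hypothetical, chosen only to illustrate a tree of typed nodes that can hold content at any level.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class ContentNode:
    """Hypothetical stand-in for a content node (not Kodexa's actual class)."""
    node_type: str
    content: Optional[str] = None
    children: List["ContentNode"] = field(default_factory=list)

    def add_child(self, child: "ContentNode") -> "ContentNode":
        # Attach the child and return it so calls can be chained.
        self.children.append(child)
        return child

    def walk(self) -> Iterator["ContentNode"]:
        # Yield this node, then every descendant, depth first.
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class Document:
    """Hypothetical document wrapper that holds the root content node."""
    root: ContentNode


# Build a tiny document: one page containing two lines of text.
root = ContentNode("document")
page = root.add_child(ContentNode("page"))
page.add_child(ContentNode("line", "Invoice 1234"))
page.add_child(ContentNode("line", "Total due: 99.50"))
doc = Document(root)

node_types = [node.node_type for node in doc.root.walk()]
```

Because every node has the same shape, the same tree can represent a PDF page, a spreadsheet row, or a paragraph of text simply by using different node types.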

Selectors, Tags, and Features

Selectors, tags, and features are the building blocks of our processing. Selectors retrieve the specific nodes that match the selector's value. Features and tags enhance a document with additional data without changing the document's structure. Together, they let users find the precise data they're looking for and tag it so it can be referenced in later processing. Learn more about these elements here.
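
The interplay of selectors, tags, and features can be sketched with a small self-contained example. Everything here is an assumption made for illustration: the Node class, the select function, and the "total" tag and "amount" feature names are hypothetical, and the selector is reduced to a node type plus an optional regular expression rather than Kodexa's own selector syntax.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Set


@dataclass
class Node:
    """Hypothetical node that carries tags and features alongside its content."""
    node_type: str
    content: str = ""
    children: List["Node"] = field(default_factory=list)
    tags: Set[str] = field(default_factory=set)
    features: Dict[str, object] = field(default_factory=dict)


def walk(node: Node) -> Iterator[Node]:
    # Depth-first traversal of the node and all of its descendants.
    yield node
    for child in node.children:
        yield from walk(child)


def select(root: Node, node_type: str, pattern: Optional[str] = None) -> List[Node]:
    # Simplified selector: match on node type and, optionally, a regex over content.
    matches = []
    for node in walk(root):
        if node.node_type != node_type:
            continue
        if pattern is not None and not re.search(pattern, node.content):
            continue
        matches.append(node)
    return matches


root = Node("document", children=[
    Node("line", "Invoice 1234"),
    Node("line", "Total due: 99.50"),
])

# Find the matching nodes, then enrich them without altering the tree.
for node in select(root, "line", r"Total due"):
    node.tags.add("total")
    amount = re.search(r"(\d+\.\d+)", node.content)
    if amount:
        node.features["amount"] = float(amount.group(1))

tagged = [n.content for n in walk(root) if "total" in n.tags]
shape_after = [n.node_type for n in walk(root)]
```

Note that tagging and adding a feature only attach data to existing nodes; the list of node types is the same before and after, which mirrors the point that enrichment does not change the document's structure.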

Pipelines, Steps, and Actions

Pipelines and steps are arranged to process documents, enrich them with additional features and information, and extract the data for external use. Actions encapsulate discrete portions of functionality and are initiated as a step in the pipeline. Learn how pipelines and steps will help you get more value from your unstructured data here.
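
As a rough illustration of how steps might be arranged into a pipeline, the sketch below chains three plain functions over a simple dictionary-based document. It is an assumption-laden toy, not Kodexa's Pipeline class: the Pipeline class shown here, its add_step and run methods, the step functions, and the dictionary layout are all hypothetical.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]
Step = Callable[[Doc], Doc]


class Pipeline:
    """Hypothetical pipeline: runs each step in order on the same document."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def add_step(self, step: Step) -> "Pipeline":
        # Return self so steps can be added in a fluent chain.
        self.steps.append(step)
        return self

    def run(self, document: Doc) -> Doc:
        for step in self.steps:
            document = step(document)
        return document


def normalize_whitespace(doc: Doc) -> Doc:
    # Processing step: collapse runs of whitespace in every line.
    doc["lines"] = [" ".join(line.split()) for line in doc["lines"]]
    return doc


def tag_totals(doc: Doc) -> Doc:
    # Enrichment step: record which line indexes look like totals.
    doc["tags"] = {
        index: "total"
        for index, line in enumerate(doc["lines"])
        if line.lower().startswith("total")
    }
    return doc


def extract_totals(doc: Doc) -> Doc:
    # Extraction step: pull the tagged lines out for external use.
    doc["extracted"] = [doc["lines"][index] for index in sorted(doc["tags"])]
    return doc


pipeline = (
    Pipeline()
    .add_step(normalize_whitespace)
    .add_step(tag_totals)
    .add_step(extract_totals)
)
result = pipeline.run({"lines": ["Invoice   1234", "Total  due: 99.50"]})
```

Each function plays the role of an action: a discrete piece of functionality that only does work once it is placed in the pipeline as a step, so steps can be reordered or swapped without touching the others.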