Actions provide the processing capabilities in Kodexa. Every action accepts a Document, performs some evaluation or processing on it, and returns a Document. To use an action in a pipeline, you wrap it in a pipeline step.
Kodexa actions all implement the same interface and work against the structure of the Kodexa Document. Because every action supports this universal interface, you can combine multiple action implementations to solve a wide range of problems.
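To make the idea of a shared interface concrete, here is a minimal sketch in plain Python. It does not use the Kodexa library: the `Document` class, the `Action` protocol, the `process` method name, and the two example actions are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Document:
    """Toy stand-in for a Kodexa Document (hypothetical, not the real class)."""
    metadata: dict = field(default_factory=dict)
    lines: List[str] = field(default_factory=list)


class Action(Protocol):
    """Assumed shape of the shared interface: a Document in, a Document out."""

    def process(self, document: Document) -> Document: ...


class Uppercase:
    """Example action: returns a new Document with upper-cased lines."""

    def process(self, document: Document) -> Document:
        return Document(dict(document.metadata), [line.upper() for line in document.lines])


class DropBlankLines:
    """Example action: returns a new Document without blank lines."""

    def process(self, document: Document) -> Document:
        return Document(dict(document.metadata), [line for line in document.lines if line.strip()])


def apply_actions(document: Document, actions: List[Action]) -> Document:
    """Any mix of actions composes, because each one honors the same contract."""
    for action in actions:
        document = action.process(document)
    return document


original = Document(lines=["first", "", "second"])
result = apply_actions(original, [DropBlankLines(), Uppercase()])
print(result.lines)
```

Because both example actions expose the same `process` method, `apply_actions` needs no knowledge of what each one does, which is the property that lets independently written actions be combined.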
We classify actions into one of the following types:
Parsers take the metadata from a Kodexa Document and work against a "Source" to build the content structure for a document.
A Parser will always remove all the content in a document and replace it.
Taggers add tags or features to parts of a Document's content to enrich it. What makes an action a "Tagger" is that it only adds information to the Document; it doesn't change the Document's structure or content.
A transformer is an action that changes the structure of a Document. For example, it may remove a certain type of node or it may collapse the nodes in a structure. Transformers may also add new nodes (such as columns or sentences) to the Document.
An extractor is used to pull tagged data from a document and put it into a structured form. These structured forms may be tabular, like a CSV, or more document-like, like JSON.
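The four action types above can be illustrated with a small sketch on a toy document model. Everything here is an assumption made for the example: the `Node` and `Document` classes, the `"source"` metadata key, and the function names are hypothetical and are not part of the Kodexa API. The extractor writes into a caller-supplied list so that it still returns a Document like the other actions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    """Hypothetical content node: some text plus a set of tags."""
    text: str
    tags: Set[str] = field(default_factory=set)


@dataclass
class Document:
    """Hypothetical document: metadata plus a flat list of nodes."""
    metadata: Dict[str, str] = field(default_factory=dict)
    nodes: List[Node] = field(default_factory=list)


def parse(document: Document) -> Document:
    """Parser: replace all existing content with nodes built from the source."""
    source_text = document.metadata["source"]  # assumed key holding raw text
    document.nodes = [Node(line) for line in source_text.splitlines()]
    return document


def tag_numbers(document: Document) -> Document:
    """Tagger: add information only; node text and structure stay the same."""
    for node in document.nodes:
        if node.text.strip().isdigit():
            node.tags.add("number")
    return document


def drop_blank_nodes(document: Document) -> Document:
    """Transformer: change the structure, here by removing empty nodes."""
    document.nodes = [node for node in document.nodes if node.text.strip()]
    return document


def extract_tagged(document: Document, tag: str, rows: List[dict]) -> Document:
    """Extractor: copy tagged data into a structured form (a list of rows)."""
    for node in document.nodes:
        if tag in node.tags:
            rows.append({"tag": tag, "value": node.text.strip()})
    return document


rows: List[dict] = []
doc = Document(metadata={"source": "Total\n\n42\n"})
doc = parse(doc)
doc = tag_numbers(doc)
doc = drop_blank_nodes(doc)
doc = extract_tagged(doc, "number", rows)
print(rows)
```

Note how the tagger leaves the node count unchanged while the transformer reduces it, which mirrors the distinction drawn between the two types.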
A pipeline is a linear set of steps that can be applied to a Kodexa Document. Each step calls an action that will parse, enrich, transform, or replace the document. This approach allows you to assemble a set of steps that can structure, tag, and normalize a document, file, or textual content.
In order to promote re-use and composability, these pipelines can be defined in code or metadata.
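As a rough illustration of the difference between defining a pipeline in code and in metadata, the sketch below builds the same two-step pipeline both ways. It is not the Kodexa metadata format: the JSON layout, the `ACTIONS` registry, and the helper names are hypothetical, and a plain list of strings stands in for the document.

```python
import json
from typing import Callable, Dict, List

Lines = List[str]  # stand-in for a document in this sketch


def lowercase(lines: Lines) -> Lines:
    return [line.lower() for line in lines]


def truncate(lines: Lines, limit: int = 80) -> Lines:
    return [line[:limit] for line in lines]


# Hypothetical registry that maps names used in metadata to action callables.
ACTIONS: Dict[str, Callable[..., Lines]] = {
    "lowercase": lowercase,
    "truncate": truncate,
}

# Hypothetical metadata layout: an ordered list of steps with optional options.
PIPELINE_JSON = """
{"steps": [
  {"action": "lowercase"},
  {"action": "truncate", "options": {"limit": 5}}
]}
"""


def build_pipeline(definition: dict) -> List[Callable[[Lines], Lines]]:
    """Turn a metadata definition into an ordered list of callable steps."""
    steps = []
    for spec in definition["steps"]:
        action = ACTIONS[spec["action"]]
        options = spec.get("options", {})
        # Bind the loop variables as defaults so each step keeps its own values.
        steps.append(lambda lines, a=action, o=options: a(lines, **o))
    return steps


def run_pipeline(steps: List[Callable[[Lines], Lines]], lines: Lines) -> Lines:
    for step in steps:
        lines = step(lines)
    return lines


# The same pipeline, once written directly in code and once loaded from metadata.
from_code = [lowercase, lambda lines: truncate(lines, limit=5)]
from_metadata = build_pipeline(json.loads(PIPELINE_JSON))

print(run_pipeline(from_code, ["Hello World"]))
print(run_pipeline(from_metadata, ["Hello World"]))
```

Keeping the definition as data means it can be stored, shared, and reused without shipping the code that assembled it, at the cost of needing a registry that resolves names to actions.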
Let's take a look at how a pipeline logically works:
The Kodexa Document is passed from the start to the end, and each action (wrapped in a step) accepts the document and returns a document. This doesn't mean that the action is returning the same document that it received, since each action can change the content of the document along the way.
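The flow described above can be sketched as a loop that hands each step's output to the next step. This is a minimal model under stated assumptions, not the Kodexa implementation: `Document`, `Step`, `Pipeline`, `add_step`, and the two example actions are hypothetical names. The document is made immutable here so that every changing action must return a new object, which makes the "not necessarily the same document" point visible.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Document:
    """Hypothetical immutable document holding a tuple of text lines."""
    content: Tuple[str, ...] = ()


@dataclass
class Step:
    """Hypothetical wrapper that lets a pipeline call an action."""
    name: str
    action: Callable[[Document], Document]

    def run(self, document: Document) -> Document:
        result = self.action(document)
        if not isinstance(result, Document):
            raise TypeError(f"step {self.name!r} did not return a Document")
        return result


@dataclass
class Pipeline:
    steps: List[Step] = field(default_factory=list)

    def add_step(self, name: str, action: Callable[[Document], Document]) -> "Pipeline":
        self.steps.append(Step(name, action))
        return self  # returning self allows chained calls

    def run(self, document: Document) -> Document:
        # Each step receives the document returned by the previous step.
        for step in self.steps:
            document = step.run(document)
        return document


def strip_lines(document: Document) -> Document:
    return replace(document, content=tuple(line.strip() for line in document.content))


def drop_empty(document: Document) -> Document:
    return replace(document, content=tuple(line for line in document.content if line))


start = Document(content=("  hello ", "", " world"))
pipeline = Pipeline().add_step("strip", strip_lines).add_step("drop-empty", drop_empty)
end = pipeline.run(start)

print(end.content)
print(end is start)
```

The type check in `Step.run` enforces the contract that every action returns a document, while the final identity check shows that the object leaving the pipeline need not be the one that entered it.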