Document Stores

Stores manage documents that have already been parsed and structured as Kodexa Documents. A store contains only documents in the Kodexa document structure.

In Memory

JsonDocumentStore

The JsonDocumentStore stores parsed or processed documents. It can be used as a connector (a source of documents for a pipeline), as a sink, or as a stand-alone saved file. When persisted to disk, the JsonDocumentStore is saved in JSON format.

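# Using a Store for caching a test
# get_test_directory(), filename and options are assumed to be defined by the surrounding test code
store = JsonDocumentStore(store_path="/tmp/my-json-store")
if store.count() == 0:
    # The store is empty, so parse the matching files once and write the results into the store
    pipeline = Pipeline(FolderConnector(path=str(get_test_directory()), file_filter=filename))
    pipeline.add_step(ExcelParser(**options))
    pipeline.set_sink(store)
    pipeline.run()
# Read the first document from the store, whether it was just parsed or already cached
document = store.get_document(0)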
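
Assuming the store's contents persist under store_path between runs, the count() check in this example lets later runs skip the pipeline and read the cached documents instead. The following is a minimal sketch of how a JSON-backed store of this kind could work, written in plain Python so it runs without Kodexa. Everything in it is hypothetical: the class name ToyJsonStore, its put method, the one-JSON-file-per-document layout and the index.json file are assumptions for illustration, not the actual API or on-disk format of Kodexa's JsonDocumentStore, and plain dicts stand in for Kodexa Documents.

```python
import json
import os
import tempfile


class ToyJsonStore:
    """Hypothetical, simplified JSON-backed store (not Kodexa's implementation).

    Each document is saved as <id>.json under store_path, and index.json
    records the stored document ids in insertion order.
    """

    def __init__(self, store_path):
        self.store_path = store_path
        os.makedirs(store_path, exist_ok=True)
        self.index_path = os.path.join(store_path, "index.json")
        # Reload the index if the store already exists on disk
        if os.path.exists(self.index_path):
            with open(self.index_path) as f:
                self.index = json.load(f)
        else:
            self.index = []

    def put(self, document):
        # Write the document to its own JSON file, then record its id in the index
        doc_id = str(len(self.index))
        with open(os.path.join(self.store_path, doc_id + ".json"), "w") as f:
            json.dump(document, f)
        self.index.append(doc_id)
        with open(self.index_path, "w") as f:
            json.dump(self.index, f)

    def count(self):
        return len(self.index)

    def get_document(self, position):
        # Look up the id at this position and load that document from disk
        doc_id = self.index[position]
        with open(os.path.join(self.store_path, doc_id + ".json")) as f:
            return json.load(f)


# Usage: populate the store only when it is empty, as in the caching example
path = os.path.join(tempfile.mkdtemp(), "my-json-store")
store = ToyJsonStore(path)
if store.count() == 0:
    # Stand-in for running a parsing pipeline with the store as its sink
    store.put({"content": "parsed sheet", "pages": 1})

# A later "run" reopens the same directory and finds the cached document
reopened = ToyJsonStore(path)
print(reopened.count())  # 1
print(reopened.get_document(0)["content"])  # parsed sheet
```

Keeping the index of ids separate from the document files lets count() and positional lookup work without loading every document; whether the real JsonDocumentStore is organised this way is not stated here.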

To delete the stored documents in an existing, populated JsonDocumentStore, pass force_initialize=True when you create it. This removes all of the documents in the store and clears its index of document IDs.

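# Re-create the store, removing any previously cached documents and their index
# get_test_directory(), filename and options are assumed to be defined by the surrounding test code
store = JsonDocumentStore(store_path="/tmp/my-json-store", force_initialize=True)
if store.count() == 0:
    # After force_initialize the store is empty, so the pipeline always runs here
    pipeline = Pipeline(FolderConnector(path=str(get_test_directory()), file_filter=filename))
    pipeline.add_step(ExcelParser(**options))
    pipeline.set_sink(store)
    pipeline.run()
document = store.get_document(0)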
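
The effect of force_initialize described above can be sketched with a small plain-Python helper. The function open_store_dir and the directory layout it assumes (one JSON file per document plus an index.json list of ids) are hypothetical and not part of the Kodexa API; the sketch only illustrates the described behaviour of removing every stored document and resetting the index of document IDs.

```python
import json
import os
import shutil
import tempfile


def open_store_dir(store_path, force_initialize=False):
    """Hypothetical helper: prepare a JSON store directory and return its index.

    With force_initialize=True, all stored documents and the index of
    document ids are discarded before the store is used.
    """
    if force_initialize and os.path.exists(store_path):
        shutil.rmtree(store_path)  # remove every stored document file
    os.makedirs(store_path, exist_ok=True)
    index_path = os.path.join(store_path, "index.json")
    if not os.path.exists(index_path):
        with open(index_path, "w") as f:
            json.dump([], f)  # start with an empty index of document ids
    with open(index_path) as f:
        return json.load(f)


# Usage: simulate a populated store, then reopen it with and without clearing
path = os.path.join(tempfile.mkdtemp(), "my-json-store")
open_store_dir(path)
with open(os.path.join(path, "0.json"), "w") as f:
    json.dump({"content": "old"}, f)
with open(os.path.join(path, "index.json"), "w") as f:
    json.dump(["0"], f)

print(len(open_store_dir(path)))  # 1: existing documents are kept
print(len(open_store_dir(path, force_initialize=True)))  # 0: store cleared
print(os.listdir(path))  # ['index.json']
```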