Basics
Layer Structure
- Document Context and Meta Information including DocumentType
- DocumentEssentials: Includes essential information for this Document Type
- DocumentEntities: First semantic layer with basic word entity information (doc-type agnostic)
- DocumentTexts: Character, Word, Lines of this document (doc-type agnostic)
- DocumentBinaries: Binary Representation of this document (doc-type embedded or linked)
Common Rules
- Empty Value in DocumentEssential:
  - An empty Value (Value: "") indicates that this element is NOT on the document
    E.g. if there is no DeliveryDate on your DeliveryNote, your JSON includes an Item under DocumentEssential with "Label": "DeliveryDate" and "Value": ""
- Leave out what you do not know:
  - E.g. if you do not know the Page structure of a document, leave out the "Pages" array
  - E.g. if you do not know the Location of a DocumentEssential, leave out the "Location" element
- Null:
  - In some cases generic attributes like Location, Text, etc. are not applicable for certain items; in these cases these attributes must be set to null
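The three rules above give each attribute three distinct "nothing here" states: an empty Value, an omitted key, and an explicit null. The following is a minimal Python sketch of how a consumer might tell them apart, assuming an item is a plain dict parsed from JSON. The helper name `attribute_state` and the returned state strings are hypothetical and not part of the BluDoc format.

```python
import json


def attribute_state(item: dict, key: str) -> str:
    """Classify one attribute of a DocumentEssential item (hypothetical helper)."""
    if key not in item:
        return "unknown"           # producer left the attribute out
    value = item[key]
    if value is None:
        return "not applicable"    # attribute explicitly set to null
    if key == "Value" and value == "":
        return "not on document"   # empty Value: element is absent from the document
    return "present"


item = json.loads('{"Label": "DeliveryDate", "Value": "", "Text": null}')
print(attribute_state(item, "Value"))     # not on document
print(attribute_state(item, "Text"))      # not applicable
print(attribute_state(item, "Location"))  # unknown
```

Checking for the missing key first matters: `item.get(key)` alone would not distinguish an omitted attribute from one set to null.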
DataTypes in Label names
A Label describes the content of a DocumentEssential. DataTypes MUST be declared within the label name of a detail. This enables generic handling of DocumentEssentials in your source.
[Country][Industry][Customer][DocType][DetailDescription][.]<Datatype>
<xyz> is a mandatory notation
[abc] is an optional notation and denotes the validity domain (e.g. the detail is only valid for a certain customer or in a certain country)
Label names use Pascal case, e.g. PurchaseOrder.Id (every new word starts with a capital letter)
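Because the datatype is the last part of every label name, a consumer can dispatch on it generically. The sketch below is one possible reading of the pattern, not a rule stated by the format: it assumes the datatype is whatever follows the last dot, and, when the optional dot is absent, the last Pascal-case word of the label. The function name `split_label` is hypothetical.

```python
import re


def split_label(label: str) -> tuple:
    """Split a label name into (prefix, datatype) under the assumptions above."""
    if "." in label:
        prefix, datatype = label.rsplit(".", 1)
        return prefix, datatype
    # No dot: fall back to the last Pascal-case word, e.g. "DeliveryDate" -> "Date".
    match = re.search(r"[A-Z][a-z0-9]*$", label)
    if match is None:
        raise ValueError(f"label is not Pascal case: {label!r}")
    return label[: match.start()], match.group(0)


print(split_label("Invoice.Id"))        # ('Invoice', 'Id')
print(split_label("Created.DateTime"))  # ('Created', 'DateTime')
print(split_label("DeliveryDate"))      # ('Delivery', 'Date')
```

The dot-less fallback cannot recognise multi-word datatypes such as DateTime, which is one reason a producer may prefer to always write the dot.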
Simple Sample of a BLU DOC
{
  "BluDoc.Version": "1.2.0",
  "DocumentProvider.Name": "customer x",
  "Created.DateTime": "2023-03-21T21:54:29.123Z",
  "CreatorSoftware.Name": "BluDoc Creator",
  "CreatorSoftware.Version": "1.0.0",
  "Document.Type": "Invoice",
  "Document.Languages": [ "de", "en" ],
  "Pages": [
    {
      "Id": 1,
      "Width": 827,
      "Height": 1169,
      "Dpi": 300,
      "Orientation": "Portrait"
    }
  ],
  "DocumentEssentials": [
    {
      "Label": "Invoice.Type",
      "Value": "CreditMemo",
      "Text": null,
      "Location": null,
      "Confidence": 1.0000,
      "ConfidenceThreshold": 0.8123
    },
    {
      "Label": "Invoice.Id",
      "Value": "2000008238",
      "Text": "2000008238",
      "Location": { "Page": 1, "Left": 414, "Top": 155, "Width": 91, "Height": 18 },
      "Confidence": 1.0000,
      "ConfidenceThreshold": 0.8123
    }
  ]
}
DocumentEssentials Layer
A DocumentEssential describes an essential, structured, and normalized piece of information within a document. A Simple DocumentEssential describes a single piece of information within a document, such as a "PurchaseOrder" or a "DeliveryDate". A Composite DocumentEssential consists of a list of DocumentEssentials. Hierarchies can be expressed with this simple composite structure.
Simple DocumentEssential
{
  "Label": "Invoice.Id",          // label name, see "DataTypes in Label names"
  "Value": "3453445-45",          // Value holds a normalized value
  "Text": "345 3445 - 45",        // Text equals the characters printed on the invoice (not normalized)
  "Location": {                   // page-based location of the coordinates
    "Page": 1,
    "Left": 752,
    "Top": 443,
    "Width": 33,
    "Height": 13
  },
  "Confidence": 1.0000,           // Confidence has 4 decimals; model confidence < 1.0, Ground Truth confidence = 1.0000
  "ConfidenceThreshold": 0.8123   // Confidence values above this threshold can be assumed to be safe; if no threshold is available, use -1.0000
}
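The Confidence and ConfidenceThreshold comments above imply a simple acceptance check. Here is a minimal Python sketch of that check, assuming items are dicts parsed from JSON. The function name `is_safe` is hypothetical, and treating a missing ConfidenceThreshold key the same as -1.0000 is an assumption of this sketch, not something the format states.

```python
from typing import Optional

NO_THRESHOLD = -1.0  # sentinel the sample uses when no threshold is available


def is_safe(item: dict) -> Optional[bool]:
    """Return True/False when a threshold exists, None when it cannot be decided."""
    confidence = item.get("Confidence")
    # Assumption: a missing threshold key is handled like the -1.0000 sentinel.
    threshold = item.get("ConfidenceThreshold", NO_THRESHOLD)
    if confidence is None or threshold == NO_THRESHOLD:
        return None
    return confidence > threshold  # "above this threshold" is read as strictly greater


print(is_safe({"Confidence": 1.0, "ConfidenceThreshold": 0.8123}))   # True
print(is_safe({"Confidence": 0.5, "ConfidenceThreshold": 0.8123}))   # False
print(is_safe({"Confidence": 0.9, "ConfidenceThreshold": -1.0}))     # None
```

Returning None instead of False for the sentinel keeps "no threshold available" distinct from "below threshold", so a caller can route such items to manual review if desired.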
Composite DocumentEssential
{
  "Label": "Vat.Item",            // Changed group to Item
  "Value": null,
  "Location": null,
  "Items": [
    {
      "Label": "Vat.Rate",
      "Value": "",                // empty value
                                  // no location available
      "Text": "",
      "Confidence": 1.0000,
      "ConfidenceThreshold": 0.8123
    },
    {
      "Label": "Net.Amount",
      "Value": "640.00",
      "Text": "640.00",
      "Location": { "Page": 1, "Left": 752, "Top": 443, "Width": 33, "Height": 13 },
      "Confidence": 1.0000,
      "ConfidenceThreshold": 0.8123
    },
    {
      "Label": "Vat.Amount",
      "Value": "0.00",
      "Text": "0.00",
      "Location": { "Page": 1, "Left": 759, "Top": 460, "Width": 27, "Height": 13 },
      "Confidence": 1.0000,
      "ConfidenceThreshold": 0.8123
    }
  ]
}
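Since a Composite DocumentEssential is a list of DocumentEssentials, a hierarchy of any depth can be walked recursively. The sketch below yields every Simple DocumentEssential together with the label path that leads to it. It assumes that an item is composite exactly when it carries an "Items" key; the function name `iter_simple` and the abbreviated sample data are illustrative only.

```python
from typing import Iterator, Tuple


def iter_simple(item: dict, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], dict]]:
    """Yield (label_path, item) for every simple DocumentEssential below `item`."""
    here = path + (item["Label"],)
    children = item.get("Items")
    if children is None:
        yield here, item          # no "Items" key: treat as a simple DocumentEssential
        return
    for child in children:
        yield from iter_simple(child, here)


# Abbreviated version of the composite sample above.
vat_item = {
    "Label": "Vat.Item", "Value": None, "Location": None,
    "Items": [
        {"Label": "Vat.Rate", "Value": ""},
        {"Label": "Net.Amount", "Value": "640.00"},
        {"Label": "Vat.Amount", "Value": "0.00"},
    ],
}

for label_path, simple in iter_simple(vat_item):
    print("/".join(label_path), "=", repr(simple["Value"]))
```

Keeping the full label path lets repeated composites (for example several Vat.Item entries) stay distinguishable once the caller adds an index to the path.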