Basics

Layer Structure

  • Document Context and Meta Information including DocumentType
  • DocumentEssentials: Includes essential information for this Document Type
  • DocumentEntities: First semantic layer with basic word entity information (doc-type agnostic)
  • DocumentTexts: Characters, words, and lines of this document (doc-type agnostic)
  • DocumentBinaries: Binary Representation of this document (doc-type embedded or linked)

Common Rules

  • Empty Value in a DocumentEssential:
    An empty Value ("Value": "") indicates that this element is NOT on the document.
    E.g. if there is no DeliveryDate on your DeliveryNote, your JSON includes an item under DocumentEssentials with "Label": "DeliveryDate" and "Value": "".
  • Leave out what you do not know:
    E.g. if you do not know the page structure of a document, leave out the "Pages" array.
    E.g. if you do not know the location of a DocumentEssential, leave out the "Location" element.
  • Null:
    In some cases generic attributes like Location, Text, etc. are not applicable to certain items; in these cases the attributes must be set to null.
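
The three rules above give a consumer three distinguishable states for a Value: the item is missing (unknown), the Value is empty (not on the document), or the Value is null (not applicable). The following Python sketch illustrates one way to tell them apart. The function names find_essential and value_state and the returned state strings are hypothetical and not part of the format.

```python
from typing import Any, Optional


def find_essential(essentials: list[dict[str, Any]], label: str) -> Optional[dict[str, Any]]:
    """Return the first item with the given Label, or None if there is none."""
    for item in essentials:
        if item.get("Label") == label:
            return item
    return None


def value_state(essentials: list[dict[str, Any]], label: str) -> str:
    """Classify a Value according to the common rules (state names are hypothetical)."""
    item = find_essential(essentials, label)
    if item is None or "Value" not in item:
        return "unknown"          # left out: the producer did not know
    if item["Value"] is None:
        return "not applicable"   # null: attribute does not apply to this item
    if item["Value"] == "":
        return "not on document"  # empty string: element is NOT on the document
    return "present"


essentials = [
    {"Label": "DeliveryDate", "Value": ""},
    {"Label": "Invoice.Id", "Value": "2000008238"},
    {"Label": "Vat.Item", "Value": None},
]
print(value_state(essentials, "DeliveryDate"))
print(value_state(essentials, "Order.Id"))
```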

DataTypes in Label names

A Label describes the content of a DocumentEssential. The data type MUST be declared within the label name. This enables generic handling of DocumentEssentials in your source code.

[Country][Industry][Customer][DocType][DetailDescription][.]<Datatype>

<xyz> denotes a mandatory part.
[abc] denotes an optional part that restricts the validity domain (e.g. the detail is only valid for a certain customer or in a certain country).
Label names use Pascal case, i.e. every new word starts with a capital letter, e.g. PurchaseOrder.Id.
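
Because the data type is the last part of the label name, generic code can derive it from the label alone. The Python sketch below shows one possible approach; split_label is a hypothetical helper. It assumes the data type follows the last dot when a dot is present, and otherwise falls back to the last Pascal-case word, which is only a heuristic (a two-word type such as DateTime cannot be recovered without the dot).

```python
import re


def split_label(label: str) -> tuple[str, str]:
    """Split a label into (prefix, datatype). Hypothetical helper, not part of the format."""
    prefix, dot, datatype = label.rpartition(".")
    if dot:
        # Dotted form, e.g. "Invoice.Id": the data type follows the last dot.
        return prefix, datatype
    # No dot: assume the last Pascal-case word is the data type (heuristic).
    match = re.search(r"[A-Z][a-z0-9]*$", label)
    if match is None:
        raise ValueError(f"cannot derive a data type from label {label!r}")
    return label[:match.start()], match.group(0)


print(split_label("Invoice.Id"))
print(split_label("Created.DateTime"))
print(split_label("DeliveryDate"))
```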

Simple Sample of a BLU DOC

{
"BluDoc.Version": "1.2.0",
"DocumentProvider.Name": "customer x",
"Created.DateTime": "2023-03-21T21:54:29.123Z",
"CreatorSoftware.Name": "BluDoc Creator",
"CreatorSoftware.Version": "1.0.0",
"Document.Type": "Invoice",
"Document.Languages": [
"de",
"en"
],
"Pages": [
{
"Id": 1,
"Width": 827,
"Height": 1169,
"Dpi": 300,
"Orientation": "Portrait"
}
],
"DocumentEssentials": [
{
"Label": "Invoice.Type",
"Value": "CreditMemo",
"Text": null,
"Location": null,
"Confidence": 1.0000,
"ConfidenceThreshold": 0.8123
},
{
"Label": "Invoice.Id",
"Value": "2000008238",
"Text": "2000008238",
"Location": {
"Page": 1,
"Left": 414,
"Top": 155,
"Width": 91,
"Height": 18
},
"Confidence": 1.0000,
"ConfidenceThreshold": 0.8123
}
]
}
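
As an illustration of how such a document might be consumed, the Python sketch below parses a trimmed copy of the sample and indexes its DocumentEssentials by label. It assumes that labels are unique at the top level and that "Page" inside a Location refers to the "Id" of an entry in "Pages"; neither assumption is stated explicitly in this article.

```python
import json

# Trimmed copy of the sample above (one page, one DocumentEssential).
SAMPLE = """
{
  "BluDoc.Version": "1.2.0",
  "Document.Type": "Invoice",
  "Pages": [
    {"Id": 1, "Width": 827, "Height": 1169, "Dpi": 300, "Orientation": "Portrait"}
  ],
  "DocumentEssentials": [
    {
      "Label": "Invoice.Id",
      "Value": "2000008238",
      "Text": "2000008238",
      "Location": {"Page": 1, "Left": 414, "Top": 155, "Width": 91, "Height": 18},
      "Confidence": 1.0,
      "ConfidenceThreshold": 0.8123
    }
  ]
}
"""

doc = json.loads(SAMPLE)

# Index top-level essentials by label (assumption: labels are unique here).
essentials_by_label = {item["Label"]: item for item in doc["DocumentEssentials"]}

# "Pages" may be left out when the page structure is unknown, hence .get().
pages_by_id = {page["Id"]: page for page in doc.get("Pages", [])}

invoice_id = essentials_by_label["Invoice.Id"]
location = invoice_id.get("Location")  # missing or None when unknown / not applicable
# Assumption: Location.Page matches Pages[].Id.
page = pages_by_id.get(location["Page"]) if location else None

print(invoice_id["Value"], page["Width"] if page else None)
```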

DocumentEssentials Layer

A DocumentEssential describes an essential, structured, and normalized piece of information within a document. A Simple DocumentEssential describes a single piece of information within a document, such as a "PurchaseOrder" or a "DeliveryDate". A Composite DocumentEssential consists of a list of DocumentEssentials. This simple composite structure allows hierarchies to be expressed.

Simple DocumentEssential

{
"Label": "Invoice.Id", // label name, see "DataTypes in Label names"
"Value": "3453445-45", // Value holds a normalized value
"Text": "345 3445 - 45", // Text equals the characters printed on the invoice (not normalized)
"Location": { // page-based location of the coordinates
"Page": 1,
"Left": 752,
"Top": 443,
"Width": 33,
"Height": 13
},
"Confidence": 1.0000, // Confidence has 4 decimals; model confidence < 1.0, ground truth confidence = 1.0000
"ConfidenceThreshold": 0.8123 // Confidence values above this threshold can be assumed to be safe; -1.0000 if no threshold is available
}
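
The interplay of Confidence and ConfidenceThreshold described in the comments above could be checked as in the following Python sketch. The helper is_safe is hypothetical. It treats a negative threshold (the -1.0000 marker) and a missing threshold alike as "no threshold available" and returns None in that case; handling a missing key this way is an assumption.

```python
from typing import Any, Optional


def is_safe(item: dict[str, Any]) -> Optional[bool]:
    """Return True/False when a threshold exists, None when none is available."""
    threshold = item.get("ConfidenceThreshold", -1.0)
    if threshold < 0:
        return None  # -1.0000 marks "no threshold available"
    # "Above this threshold" is read as strictly greater.
    return item["Confidence"] > threshold


invoice_id = {
    "Label": "Invoice.Id",
    "Value": "3453445-45",
    "Confidence": 1.0,
    "ConfidenceThreshold": 0.8123,
}
print(is_safe(invoice_id))
```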

Composite DocumentEssential

{
"Label": "Vat.Item", // label of the composite item
"Value": null,
"Location": null,
"Items": [
{
"Label": "Vat.Rate",
"Value": "", // empty value
// no location available
"Text": "",
"Confidence": 1.0000,
"ConfidenceThreshold": 0.8123
},
{
"Label": "Net.Amount",
"Value": "640.00",
"Text": "640.00",
"Location": {
"Page": 1,
"Left": 752,
"Top": 443,
"Width": 33,
"Height": 13
},
"Confidence": 1.0000,
"ConfidenceThreshold": 0.8123
},
{
"Label": "Vat.Amount",
"Value": "0.00",
"Text": "0.00",
"Location": {
"Page": 1,
"Left": 759,
"Top": 460,
"Width": 27,
"Height": 13
},
"Confidence": 1.0000,
"ConfidenceThreshold": 0.8123
}
]
}
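
Since a Composite DocumentEssential is a list of DocumentEssentials, a hierarchy of any depth can be traversed recursively. The Python sketch below flattens such a structure into (path, value) pairs. The function walk and the "/" path separator are hypothetical choices for illustration; an item is treated as composite when it has a non-empty "Items" list.

```python
from typing import Any, Iterator, Optional


def walk(item: dict[str, Any], path: tuple[str, ...] = ()) -> Iterator[tuple[str, Optional[str]]]:
    """Yield (label path, Value) for every simple item below the given item."""
    current = path + (item["Label"],)
    children = item.get("Items")
    if children:
        # Composite: descend into each child DocumentEssential.
        for child in children:
            yield from walk(child, current)
    else:
        # Simple: emit the normalized Value (may be "" or None).
        yield "/".join(current), item.get("Value")


vat_item = {
    "Label": "Vat.Item",
    "Value": None,
    "Location": None,
    "Items": [
        {"Label": "Vat.Rate", "Value": "", "Text": ""},
        {"Label": "Net.Amount", "Value": "640.00", "Text": "640.00"},
        {"Label": "Vat.Amount", "Value": "0.00", "Text": "0.00"},
    ],
}

for label_path, value in walk(vat_item):
    print(label_path, repr(value))
```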

