Basics

Layer Structure

  • Document Context and Meta Information including DocumentType
  • DocumentEssentials: Essential information for this document type
  • DocumentEntities: First semantic layer with basic word-entity information (doc-type agnostic)
  • DocumentTexts: Characters, words, and lines of this document (doc-type agnostic)
  • DocumentBinaries: Binary Representation of this document (doc-type embedded or linked)

Common Rules

Empty Value in DocumentEssential:

  • An empty Value (Value: "") indicates that this element is NOT on the document.
    E.g. if there is no DeliveryDate on your DeliveryNote, your JSON includes an Item under DocumentEssentials with "Label": "DeliveryDate" and "Value": "".
  • Leave out what you do not know:
    E.g. if you do not know the page structure of a document, leave out the "Pages" array.
    E.g. if you do not know the location of a DocumentEssential, leave out the "Location" element.
  • Null:
    In some cases generic attributes like Location, Text, etc. are not applicable to certain items; in these cases the attributes must be set to null.
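
The three rules above can be told apart in code. The following is a minimal Python sketch, not part of the BluDoc specification: the function name attribute_state and the returned state names are hypothetical and chosen only for illustration.

```python
from typing import Any

_MISSING = object()  # sentinel for "key is not present in the JSON object"


def attribute_state(item: dict[str, Any], key: str) -> str:
    """Classify one attribute of an item according to the Common Rules.

    The state names returned here are illustrative assumptions.
    """
    value = item.get(key, _MISSING)
    if value is _MISSING:
        return "unknown"         # attribute was left out: the producer does not know it
    if value is None:
        return "not_applicable"  # null: the attribute does not apply to this item
    if value == "":
        return "empty"           # for "Value": the element is NOT on the document
    return "present"


delivery_date = {"Label": "DeliveryDate", "Value": "", "Text": None}
print(attribute_state(delivery_date, "Value"))     # empty
print(attribute_state(delivery_date, "Text"))      # not_applicable
print(attribute_state(delivery_date, "Location"))  # unknown
```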

DataTypes in Label names

A Label describes the content of a DocumentEssential. The data type MUST be declared within the label name of a detail. This enables generic handling of DocumentEssentials in your source code.

[Country][Industry][Customer][DocType][DetailDescription][.]<Datatype>

<xyz> denotes a mandatory part.
[abc] denotes an optional part that defines the validity domain (e.g. the detail is only valid for a certain customer or in a certain country).
Label names use Pascal case, i.e. every new word starts with a capital letter, e.g. PurchaseOrder.Id.
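
Because the data type is the last part of a label name, generic code can read it from the label. The Python sketch below is one possible approach: the function name label_datatype is hypothetical, and the fallback for labels without a dot (taking the last Pascal-case word) is an assumption that is ambiguous for multi-word types such as DateTime.

```python
import re


def label_datatype(label: str) -> str:
    """Return the trailing data type of a label name (illustrative sketch)."""
    if "." in label:
        # With the optional "." separator, the data type is everything after the last dot.
        return label.rsplit(".", 1)[1]
    # Assumption: without a dot, treat the last Pascal-case word as the data type.
    words = re.findall(r"[A-Z][a-z0-9]*", label)
    return words[-1] if words else label


print(label_datatype("Invoice.Id"))        # Id
print(label_datatype("Created.DateTime"))  # DateTime
print(label_datatype("DeliveryDate"))      # Date
```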

Simple Sample of a BLU DOC

{
    "BluDoc.Version": "1.2.0",
    "DocumentProvider.Name": "customer x",
    "Created.DateTime": "2023-03-21T21:54:29.123Z",
    "CreatorSoftware.Name": "BluDoc Creator",
    "CreatorSoftware.Version": "1.0.0",
    "Document.Type": "Invoice",
    "Document.Languages": [
        "de",
        "en"
    ],
    "Pages": [
        {
            "Id": 1, 
            "Width": 827,
            "Height": 1169,
            "Dpi": 300,
            "Orientation": "Portrait"
        }
    ],
    "DocumentEssentials": [
        {
            "Label": "Invoice.Type",
            "Value": "CreditMemo", 
            "Text": null,           
            "Location": null,
            "Confidence": 1.0000, 
            "ConfidenceThreshold": 0.8123
        },
        {
            "Label": "Invoice.Id",
            "Value": "2000008238",
            "Text": "2000008238", 
            "Location": {
                "Page": 1, 
                "Left": 414, 
                "Top": 155,
                "Width": 91,
                "Height": 18
            },
            "Confidence": 1.0000,
            "ConfidenceThreshold": 0.8123 
        }
    ]
}
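
A consumer could read such a document as sketched below in Python. The sample is shortened, and the helper find_essentials is a hypothetical name, not part of any BluDoc library. It returns a list because a label may occur more than once in a document.

```python
import json
from typing import Any

# Shortened version of the sample above (valid JSON, no comments).
blu_doc_json = """
{
    "BluDoc.Version": "1.2.0",
    "Document.Type": "Invoice",
    "DocumentEssentials": [
        {"Label": "Invoice.Type", "Value": "CreditMemo", "Text": null, "Location": null,
         "Confidence": 1.0, "ConfidenceThreshold": 0.8123},
        {"Label": "Invoice.Id", "Value": "2000008238", "Text": "2000008238",
         "Location": {"Page": 1, "Left": 414, "Top": 155, "Width": 91, "Height": 18},
         "Confidence": 1.0, "ConfidenceThreshold": 0.8123}
    ]
}
"""


def find_essentials(doc: dict[str, Any], label: str) -> list[dict[str, Any]]:
    """Return all top-level DocumentEssentials with the given label."""
    return [e for e in doc.get("DocumentEssentials", []) if e.get("Label") == label]


doc = json.loads(blu_doc_json)
invoice_id = find_essentials(doc, "Invoice.Id")[0]
print(doc["Document.Type"], invoice_id["Value"], invoice_id["Location"]["Page"])
```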

DocumentEssentials Layer

A DocumentEssential describes an essential, structured, and normalized piece of information within a document. A Simple DocumentEssential describes a single piece of information within a document, such as a "PurchaseOrder" or a "DeliveryDate". A Composite DocumentEssential consists of a list of DocumentEssentials. This simple composite structure allows hierarchies to be expressed.

Simple DocumentEssential

{
     "Label": "Invoice.Id",             // label name, built as described under "DataTypes in Label names"
     "Value": "3453445-45",             // Value holds the normalized value
     "Text": "345 3445 - 45",           // Text holds the characters as printed on the invoice (not normalized)
     "Location": {                      // page-based location coordinates
         "Page": 1,
         "Left": 752,
         "Top": 443,
         "Width": 33,
         "Height": 13
     },
     "Confidence": 1.0000,              // Confidence has 4 decimals; model confidence < 1.0, ground truth confidence = 1.0000
     "ConfidenceThreshold": 0.8123      // Confidence values above this threshold can be assumed to be safe; -1.0000 if no threshold is available
}
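
The relation between Confidence and ConfidenceThreshold can be checked as in the following Python sketch. The function name is_safe is hypothetical. Returning None when no threshold is available (-1.0000) is an assumption about how a consumer might handle that case.

```python
from typing import Any, Optional

NO_THRESHOLD = -1.0  # documented marker for "no threshold available"


def is_safe(essential: dict[str, Any]) -> Optional[bool]:
    """Return True/False if a threshold exists, otherwise None (undecided)."""
    confidence = essential.get("Confidence")
    threshold = essential.get("ConfidenceThreshold", NO_THRESHOLD)
    if confidence is None or threshold == NO_THRESHOLD:
        return None  # assumption: without a threshold no decision is made
    # Values strictly above the threshold are assumed to be safe.
    return confidence > threshold


print(is_safe({"Confidence": 1.0, "ConfidenceThreshold": 0.8123}))   # True
print(is_safe({"Confidence": 0.75, "ConfidenceThreshold": 0.8123}))  # False
print(is_safe({"Confidence": 0.75, "ConfidenceThreshold": -1.0}))    # None
```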

Composite DocumentEssential

{
            "Label": "Vat.Item", // composite: groups the Items below
            "Value": null,
            "Location": null,
            "Items": [
                {
                    "Label": "Vat.Rate",
                    "Value": "", // empty value
                    // no location available
                    "Text": "",
                    "Confidence": 1.0000,
                    "ConfidenceThreshold": 0.8123
                },
                {
                    "Label": "Net.Amount",
                    "Value": "640.00",
                    "Text": "640.00",
                    "Location": {
                        "Page": 1,
                        "Left": 752,
                        "Top": 443,
                        "Width": 33,
                        "Height": 13
                    },
                    "Confidence": 1.0000,
                    "ConfidenceThreshold": 0.8123
                },
                {
                    "Label": "Vat.Amount",
                    "Value": "0.00",
                    "Text": "0.00",
                    "Location": {
                        "Page": 1,
                        "Left": 759,
                        "Top": 460,
                        "Width": 27,
                        "Height": 13
                    },
                    "Confidence": 1.0000,
                    "ConfidenceThreshold": 0.8123
                }
            ]
}
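
Since a Composite DocumentEssential is a list of DocumentEssentials, a hierarchy can be traversed recursively. The Python sketch below flattens a composite into its simple leaves. The function name iter_simple and the "/"-joined path format are hypothetical choices made for illustration.

```python
from typing import Any, Iterator


def iter_simple(
    essential: dict[str, Any], path: tuple[str, ...] = ()
) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (path, essential) for every simple DocumentEssential in a hierarchy."""
    current = path + (essential["Label"],)
    items = essential.get("Items")
    if items:
        # Composite: descend into each child item.
        for child in items:
            yield from iter_simple(child, current)
    else:
        # Simple: a leaf that carries its own Value.
        yield "/".join(current), essential


vat_item = {
    "Label": "Vat.Item", "Value": None, "Location": None,
    "Items": [
        {"Label": "Vat.Rate", "Value": ""},
        {"Label": "Net.Amount", "Value": "640.00"},
        {"Label": "Vat.Amount", "Value": "0.00"},
    ],
}

for leaf_path, leaf in iter_simple(vat_item):
    print(leaf_path, repr(leaf["Value"]))
```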

