Prompt Templates

Services to list, create, and modify the prompt templates used by the Scan Prompt service.

Get Templates (GET)

GET - https://api.verifik.co/v2/ocr/scan-prompt/templates

The Get Templates endpoint retrieves the Scan Prompt templates stored in the Verifik database.

The response contains an array of Template objects, each representing one template in the system. Iterate over this array to read the details of each template your application needs.

Headers

Name | Value
Content-Type | application/json
Authorization | Bearer <token>

Response

{
  "data": [
    {
      "__v": 2,
      "_id": "64cd7b5012b18c7901e3f451",
      "name": "Escaneo de licencias de conducir de Colombia",
      "active": true,
      "fields": [
        "firstName",
        "lastName",
        "fullName",
        "dateOfBirth",
        "documentNumber",
        "expeditionDate",
        "driverRestrictions",
        "bloodType"
      ],
      "prompt": "From the provided JSON input, extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Colombian culture, individuals may have two last names, both of which must be included. People also normally have 2 first names, like Juan Miguel or Jose Manuel, so when you see a 4 words name, extract it that way. For the driver restrictions, its usually like driver needs glasses or another detail about his/her negative observations  Please avoid translation, simply extract the information as provided. The output must be in a JSON format. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score.",
      "system": true,
      "deleted": false,
      "createdAt": "2023-08-03T15:54:41.178Z",
      "updatedAt": "2023-08-05T17:23:52.556Z",
      "description": "Prompt para extraer licencias de conducir de Colombia",
      "documentTypes": [
        "DLCO"
      ]
    },
    {
      "__v": 2,
      "_id": "64ce89a612b18c7901e3f453",
      "name": "Escaneo de cédulas de Colombia",
      "active": true,
      "fields": [
        "firstName",
        "lastName",
        "fullName",
        "documentNumber"
      ],
      "prompt": "From the provided JSON input, extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Colombian culture, individuals may have two last names, both of which must be included. People also normally have 2 first names, like Juan Miguel or Jose Manuel, so when you see a 4 words name, extract it that way. For the driver restrictions, its usually like driver needs glasses or another detail about his/her negative observations  Please avoid translation, simply extract the information as provided. The output must be in a JSON format. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score.",
      "system": true,
      "deleted": false,
      "createdAt": "2023-08-03T15:54:41.178Z",
      "updatedAt": "2023-08-05T17:23:52.556Z",
      "description": "Prompt para extraer cédulas de Colombia",
      "documentTypes": [
        "CC"
      ]
    },
    {
      "__v": 1,
      "_id": "64d12c1a12b18c7901e3f45c",
      "name": "Global Passport Scanning",
      "active": true,
      "fields": [
        "firstName",
        "lastName",
        "fullName",
        "documentNumber",
        "country",
        "dateOfBirth",
        "gender",
        "expirationDate",
        "personalNo"
      ],
      "prompt": "From the provided JSON input, extract the fields {{fields}}. The text may be in multiple languages and may contain special characters. Individuals might have one or two first names and one or two last names. The names must be extracted exactly as they appear, preserving special characters. In some cultures, individuals may have a middle name, which should also be considered. For any observations or special notes related to the passport holder, extract them as provided without translating. The output must be in JSON format. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score.",
      "system": true,
      "deleted": false,
      "createdAt": "2023-08-03T15:54:41.178Z",
      "updatedAt": "2023-08-05T17:23:52.556Z",
      "description": "Prompt to extract passport details from all countries",
      "documentTypes": [
        "PA"
      ]
    },
    {
      "__v": 1,
      "_id": "64d1329312b18c7901e3f45f",
      "name": "Escaneo de tarjetas INE de México",
      "active": true,
      "fields": [
        "firstName",
        "lastName",
        "fullName",
        "documentNumber",
        "CURP",
        "voterKey",
        "state",
        "municipality",
        "address",
        "gender",
        "birthDate",
        "expirationDate"
      ],
      "prompt": "From the provided JSON input, extract the following fields: {{fields}}. The JSON input may contain OCR data from documents in Spanish, such as a Mexican Voter ID card (Credencial para Votar), and may include special characters.\n\nIn extracting the full name, consider that individuals may have one or two first names and normally two last names. You must ensure proper extraction by identifying these components accurately. Specifically, you MUST find the first two letters of each part of the name (first name, middle name if applicable, and last names) inside the first 6-8 characters of the documentNumber before the numbers start (CLAVE DE ELECTOR). This means that the first two letters of the first name, followed by the first two letters of each part of the last name, must appear in the same sequence as they are in the documentNumber. If any part of the name does not have corresponding letters in the documentNumber, the extraction is considered incorrect.\n\nThe output must be in JSON format without any additional explanation, as it will be processed with JSON.parse(resultStringContent). Only consider data with a confidence level of 0.5 or higher, and disregard any fields that do not have a specified confidence score.\n",
      "system": true,
      "deleted": false,
      "createdAt": "2023-08-03T15:54:41.178Z",
      "updatedAt": "2023-08-05T17:23:52.556Z",
      "description": "Prompt to extract details from Mexican INE (Instituto Nacional Electoral) cards",
      "documentTypes": [
        "INE"
      ]
    },
    {
      "__v": 1,
      "_id": "64d1540e12b18c7901e3f461",
      "name": "Chilean ID Card Scanning",
      "active": true,
      "fields": [
        "firstName",
        "lastName",
        "fullName",
        "documentNumber",
        "nationality",
        "dateOfBirth",
        "gender",
        "expirationDate",
        "emissionDate",
        "RUT",
        "RUN"
      ],
      "prompt": "From the provided JSON input, extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Chilean culture, individuals may have two last names, both of which must be included. People also normally have 2 first names, like Juan Miguel or Jose Manuel, so when you see a 4 words name, extract it that way. For the driver restrictions, its usually like driver needs glasses or another detail about his/her negative observations  Please avoid translation, simply extract the information as provided. The output must be in a JSON format. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score.",
      "system": true,
      "deleted": false,
      "createdAt": "2023-08-03T15:54:41.178Z",
      "updatedAt": "2023-08-05T17:23:52.556Z",
      "description": "Prompt to extract Chilean ID card details",
      "documentTypes": [
        "RUT"
      ]
    },
    {
      "__v": 1,
      "_id": "653942aa27953fa62c752c83",
      "name": "CUI Peruvian ID Card Scanning",
      "active": true,
      "fields": [
        "firstName",
        "lastName",
        "secondLastName",
        "fullName",
        "documentNumber",
        "nationality",
        "dateOfBirth",
        "placeOfBirth",
        "dateOfIssue",
        "gender",
        "expirationDate",
        "maritalStatus"
      ],
      "format": "JSON",
      "prompt": "From the provided JSON input, extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Peruvian culture, individuals have up to 5 names in their ID, both of which must be included. Citizens also normally have 2 to 3 names, so when you see a 4 to 5 words name, extract it that way. Please avoid translation, simply extract the information as provided. The output must be in a JSON format. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score. Dates should be in format YYYY/MM/DD",
      "system": true,
      "deleted": false,
      "createdAt": "2023-08-03T15:54:41.178Z",
      "updatedAt": "2023-08-05T17:23:52.556Z",
      "description": "Prompt to extract the new Peruvian card details",
      "documentTypes": [
        "CUI"
      ]
    },
    {
      "__v": 1,
      "_id": "6539430227953fa62c752c87",
      "name": "Peruvian ID Card Scanning",
      "active": true,
      "fields": [
        "name1",
        "name2",
        "name3",
        "lastName",
        "secondLastName",
        "fullName",
        "documentNumber",
        "nationality",
        "dateOfBirth",
        "placeOfBirth",
        "dateOfIssue",
        "gender",
        "expirationDate",
        "maritalStatus"
      ],
      "format": "JSON",
      "prompt": "From the provided JSON input, extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Peruvian culture, recognize the division of names properly. Please avoid translation, simply extract the information as provided. The output must be in a JSON format. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score. Dates should be in format YYYY/MM/DD",
      "system": true,
      "deleted": false,
      "createdAt": "2023-08-03T15:54:41.178Z",
      "updatedAt": "2023-08-05T17:23:52.556Z",
      "description": "Prompt to extract Peruvian ID card details",
      "documentTypes": [
        "DNI"
      ]
    }
  ]
}
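
As a minimal sketch of calling this endpoint and iterating over the result, the following Python uses only the standard library. The helper names (`build_list_request`, `templates_by_document_type`, `fetch_templates`) are hypothetical and the token is a placeholder. The URL, the headers, and the `data`, `active`, `deleted`, `name`, and `documentTypes` keys come from the request and response shown above.

```python
import json
import urllib.request

TEMPLATES_URL = "https://api.verifik.co/v2/ocr/scan-prompt/templates"


def build_list_request(token: str) -> urllib.request.Request:
    """Build the GET request with the two documented headers."""
    return urllib.request.Request(
        TEMPLATES_URL,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="GET",
    )


def templates_by_document_type(response: dict) -> dict:
    """Map each document type code to the names of usable templates.

    Skipping inactive or deleted templates is an assumption about how a
    client might use the list, not documented API behaviour.
    """
    index: dict = {}
    for template in response.get("data", []):
        if not template.get("active") or template.get("deleted"):
            continue
        for doc_type in template.get("documentTypes", []):
            index.setdefault(doc_type, []).append(template["name"])
    return index


def fetch_templates(token: str) -> dict:
    """Send the request and parse the JSON body (performs a network call)."""
    with urllib.request.urlopen(build_list_request(token)) as resp:
        return json.load(resp)
```

With a valid token, `templates_by_document_type(fetch_templates(token))` would give a lookup from codes such as `CC` or `PA` to template names.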

Create Template (POST)

POST - https://api.verifik.co/v2/ocr/scan-prompt/template

This endpoint creates a new extraction template for a specific document type. Templates structure the extraction of key data fields from documents so that the required information is captured accurately. Pay particular attention to the prompt parameter, which instructs the system on how to extract data from the provided documents.

By creating custom templates with detailed prompts, you can ensure that the API accurately extracts the required data from documents, even when dealing with multilingual, distorted, or culturally specific text.

Note: It's essential to tailor the prompt to the unique characteristics of the documents you're processing to achieve the best results.

Headers

Name | Value
Content-Type | application/json
Authorization | Bearer <token>

Prompt

Take a look at this prompt example:

{
  "prompt": "From the provided JSON input, extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Colombian culture, individuals may have two last names, both of which must be included. People also normally have 2 first names, like Juan Miguel or Jose Manuel, so when you see a 4 words name, extract it that way. For the driver restrictions, its usually like driver needs glasses or another detail about his/her negative observations  Please avoid translation, simply extract the information as provided. The output must be in a {{format}}. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score."
}

A prompt is a series of instructions and considerations about the document, so it should describe the document's most important aspects. The Body section below gives more detail on how to write the prompt correctly.

The prompt must include the {{fields}} and {{format}} placeholders, which tell the AI what data to extract and in which format to return it.
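
A quick local check can catch a prompt that is missing one of these placeholders before it is sent. The sketch below is illustrative only: the function names are hypothetical, and `preview_prompt` joins the fields with a comma purely for preview purposes, since the way the service itself substitutes the placeholders is not described here.

```python
import re

REQUIRED_PLACEHOLDERS = ("fields", "format")


def missing_placeholders(prompt: str) -> list:
    """Return the required placeholder names that do not appear in the prompt."""
    found = set(re.findall(r"\{\{(\w+)\}\}", prompt))
    return [name for name in REQUIRED_PLACEHOLDERS if name not in found]


def preview_prompt(prompt: str, fields: list, output_format: str) -> str:
    """Substitute the placeholders locally to preview the final instructions.

    Joining fields with ", " is an assumption made for this preview only.
    """
    return (
        prompt.replace("{{fields}}", ", ".join(fields))
        .replace("{{format}}", output_format)
    )
```

For example, `missing_placeholders("Extract {{fields}}.")` reports that `format` is absent, which is the case for the prompts of the system templates listed earlier on this page.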

Body

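Name | Type | Description
documentTypes | array of strings | Document type codes the template applies to, for example "CCVE".
fields | array of strings | Fields to extract, referenced in the prompt as {{fields}}.
name | string | Name of the template.
format | string | Output format, for example "json", referenced in the prompt as {{format}}.
prompt | string | Extraction instructions; must include {{fields}} and {{format}}.
description | string | Short description of the template.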

Body Example

{
  "system": false,
  "documentTypes": [
    "CCVE"
  ],
  "fields": [
    "nombres (can be 1 or more Name)",
    "apellidos (can be 1 or more last Name)",
    "Numero de cedula (ID number always comes after this 'V.')",
    "fecha de nacimiento",
    "fecha de expedicion",
    "fecha de expiracion",
    "nacionalidad (the nationality is until the end of the text)",
    "age(fecha actual - fecha de nacimiento)"
  ],
  "name": "CCVE Verifik",
  "format": "json",
  "prompt": "This text was extracted from a Venezuelan ID card, it's in Spanish. I need you to provide me with the following data in {{format}}: {{fields}}. all fields are in uppercase, the order is document number, apellidos, nombres, sign (the sign sometimes can be letters) and Ignore the directors name this is located near to the right corner . This is the desired JSON format: firstName: Nombres, lastName: Apellidos, documentNumber: numero de cedula, dateOfBirth: fecha de nacimiento,civilStatus: estado civil, expeditionDate: fecha de expedicion, expireDate: fecha de expiracion,  nationality: nacionalidad ",
  "description": "Prompt for Venezuelan IDs for Verifik"
}
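
A request carrying a body like the one above could be assembled as follows. This is a sketch using only the Python standard library; the helper names and the token are placeholders, and the short prompt in the payload is illustrative. The payload keys mirror the body example, and the URL and headers are the ones documented for this endpoint.

```python
import json
import urllib.request

CREATE_URL = "https://api.verifik.co/v2/ocr/scan-prompt/template"

# Illustrative payload; the keys mirror the body example above.
template = {
    "documentTypes": ["CC"],
    "fields": ["firstName", "lastName", "fullName", "documentNumber"],
    "name": "Escaneo de cédulas de Colombia",
    "format": "json",
    "prompt": (
        "From the provided JSON input, extract the fields {{fields}}. "
        "The output must be in {{format}}."
    ),
    "description": "Prompt para extraer cédulas de Colombia",
}


def build_create_request(token: str, payload: dict) -> urllib.request.Request:
    """Serialize the payload as UTF-8 JSON and attach the documented headers."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        CREATE_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def create_template(token: str, payload: dict) -> dict:
    """Send the request and return the parsed response (network call)."""
    with urllib.request.urlopen(build_create_request(token, payload)) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it possible to inspect the serialized body before any network call is made.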

Response

{
  "data": {
    "__v": 0,
    "_id": "653bdcc2ff7de0cee3b0760a",
    "code": "unique_identifier",
    "name": "Escaneo de cédulas de Colombia",
    "active": true,
    "client": "623b6317fe5fd1774be9f566",
    "fields": [
      "firstName",
      "lastName",
      "fullName",
      "documentNumber"
    ],
    "format": "json",
    "prompt": "From the provided JSON input , extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Colombian culture, individuals may have two last names, both of which must be included. People also normally have 2 first names, like Juan Miguel or Jose Manuel, so when you see a 4 words name, extract it that way. For the driver restrictions, its usually like driver needs glasses or another detail about his/her negative observations  Please avoid translation, simply extract the information as provided. The output must be in a {{format}}. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score.",
    "system": false,
    "deleted": false,
    "createdAt": "2023-10-27T15:52:34.058Z",
    "updatedAt": "2023-10-27T15:52:34.058Z",
    "description": "Prompt para extraer cédulas de Colombia",
    "documentTypes": [
      "CC"
    ]
  }
}

Update Template (PUT)

PUT - https://api.verifik.co/v2/ocr/scan-prompt/template

This endpoint updates a specific OCR template. OCR templates are used to extract structured data from documents; when you update one, you can modify its settings, such as the fields to extract and the document types it applies to.

Updating templates lets you adapt data extraction to changing document types and requirements while keeping the extraction accurate.

Note: When updating templates, ensure that the changes align with your OCR data extraction requirements, and test the updated template with sample documents to confirm its effectiveness.

Headers

Name | Value
Content-Type | application/json
Authorization | Bearer <token>

Body

Name | Type | Description
image | string | Image in Base64 encoded format or a URL where the image is hosted.

Response

{
  "data": {
    "__v": 0,
    "_id": "653bdcc2ff7de0cee3b0760a",
    "code": "unique_identifier",
    "name": "Escaneo de cédulas de Colombia",
    "active": true,
    "client": "623b6317fe5fd1774be9f566",
    "fields": [
      "firstName",
      "lastName",
      "fullName",
      "documentNumber"
    ],
    "format": "json",
    "prompt": "From the provided JSON input , extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Colombian culture, individuals may have two last names, both of which must be included. People also normally have 2 first names, like Juan Miguel or Jose Manuel, so when you see a 4 words name, extract it that way. For the driver restrictions, its usually like driver needs glasses or another detail about his/her negative observations  Please avoid translation, simply extract the information as provided. The output must be in a {{format}}. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score.",
    "system": false,
    "deleted": false,
    "createdAt": "2023-10-27T15:52:34.058Z",
    "updatedAt": "2023-10-27T15:52:34.058Z",
    "description": "Prompt para extraer cédulas de Colombia",
    "documentTypes": [
      "CC"
    ]
  }
}
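
One way to confirm that an update was applied is to compare the values you sent with the `data` object in the response. The sketch below assumes the response echoes the updated template, as the example response suggests but does not state; the function name is hypothetical.

```python
def unapplied_changes(sent: dict, response: dict) -> dict:
    """Return {key: (sent_value, returned_value)} for values that differ.

    An empty result means every value that was sent is reflected in the
    response's data object.
    """
    data = response.get("data", {})
    return {
        key: (value, data.get(key))
        for key, value in sent.items()
        if data.get(key) != value
    }
```

A non-empty result points to the settings worth re-checking before testing the template with sample documents.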
