ScanPrompt Templates

Services to query, create, and modify the trained models (templates) used by the Scan Prompt service.

Get Templates (GET)

GET - https://api.verifik.co/v2/ocr/scan-prompt/templates

The Retrieve ScanPrompt Templates API lets you retrieve information about the templates stored in Verifik's database.

The response contains an array of template objects, where each object represents a single template in the system. You can iterate over this array to read and use each template's information as your application requires.

Headers

Name            Value
Content-Type    application/json
Authorization   Bearer <token>

Response

{
  "data": [
    {
      "__v": 2,
      "_id": "64cd7b5012b18c7901e3f451",
      "name": "Escaneo de licencias de conducir de Colombia",
      "active": true,
      "fields": [
        "firstName",
        "lastName",
        "fullName",
        "dateOfBirth",
        "documentNumber",
        "expeditionDate",
        "driverRestrictions",
        "bloodType"
      ],
      "prompt": "From the provided JSON input, extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Colombian culture, individuals may have two last names, both of which must be included. People also normally have 2 first names, like Juan Miguel or Jose Manuel, so when you see a 4 words name, extract it that way. For the driver restrictions, its usually like driver needs glasses or another detail about his/her negative observations  Please avoid translation, simply extract the information as provided. The output must be in a JSON format. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score.",
      "system": true,
      "deleted": false,
      "createdAt": "2023-08-03T15:54:41.178Z",
      "updatedAt": "2023-08-05T17:23:52.556Z",
      "description": "Prompt para extraer licencias de conducir de Colombia",
      "documentTypes": [
        "DLCO"
      ]
    },
    {
      "__v": 2,
      "_id": "64ce89a612b18c7901e3f453",
      "name": "Escaneo de cédulas de Colombia",
      "active": true,
      "fields": [
        "firstName",
        "lastName",
        "fullName",
        "documentNumber"
      ],
      "prompt": "From the provided JSON input, extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Colombian culture, individuals may have two last names, both of which must be included. People also normally have 2 first names, like Juan Miguel or Jose Manuel, so when you see a 4 words name, extract it that way. For the driver restrictions, its usually like driver needs glasses or another detail about his/her negative observations  Please avoid translation, simply extract the information as provided. The output must be in a JSON format. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score.",
      "system": true,
      "deleted": false,
      "createdAt": "2023-08-03T15:54:41.178Z",
      "updatedAt": "2023-08-05T17:23:52.556Z",
      "description": "Prompt para extraer cédulas de Colombia",
      "documentTypes": [
        "CC"
      ]
    },
    {
      "__v": 1,
      "_id": "64d12c1a12b18c7901e3f45c",
      "name": "Global Passport Scanning",
      "active": true,
      "fields": [
        "firstName",
        "lastName",
        "fullName",
        "documentNumber",
        "country",
        "dateOfBirth",
        "gender",
        "expirationDate",
        "personalNo"
      ],
      "prompt": "From the provided JSON input, extract the fields {{fields}}. The text may be in multiple languages and may contain special characters. Individuals might have one or two first names and one or two last names. The names must be extracted exactly as they appear, preserving special characters. In some cultures, individuals may have a middle name, which should also be considered. For any observations or special notes related to the passport holder, extract them as provided without translating. The output must be in JSON format. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score.",
      "system": true,
      "deleted": false,
      "createdAt": "2023-08-03T15:54:41.178Z",
      "updatedAt": "2023-08-05T17:23:52.556Z",
      "description": "Prompt to extract passport details from all countries",
      "documentTypes": [
        "PA"
      ]
    },
    {
      "__v": 1,
      "_id": "64d1329312b18c7901e3f45f",
      "name": "Escaneo de tarjetas INE de México",
      "active": true,
      "fields": [
        "firstName",
        "lastName",
        "fullName",
        "documentNumber",
        "CURP",
        "voterKey",
        "state",
        "municipality",
        "address",
        "gender",
        "birthDate",
        "expirationDate"
      ],
      "prompt": "From the provided JSON input, extract the following fields: {{fields}}. The JSON input may contain OCR data from documents in Spanish, such as a Mexican Voter ID card (Credencial para Votar), and may include special characters.\n\nIn extracting the full name, consider that individuals may have one or two first names and normally two last names. You must ensure proper extraction by identifying these components accurately. Specifically, you MUST find the first two letters of each part of the name (first name, middle name if applicable, and last names) inside the first 6-8 characters of the documentNumber before the numbers start (CLAVE DE ELECTOR). This means that the first two letters of the first name, followed by the first two letters of each part of the last name, must appear in the same sequence as they are in the documentNumber. If any part of the name does not have corresponding letters in the documentNumber, the extraction is considered incorrect.\n\nThe output must be in JSON format without any additional explanation, as it will be processed with JSON.parse(resultStringContent). Only consider data with a confidence level of 0.5 or higher, and disregard any fields that do not have a specified confidence score.\n",
      "system": true,
      "deleted": false,
      "createdAt": "2023-08-03T15:54:41.178Z",
      "updatedAt": "2023-08-05T17:23:52.556Z",
      "description": "Prompt to extract details from Mexican INE (Instituto Nacional Electoral) cards",
      "documentTypes": [
        "INE"
      ]
    },
    {
      "__v": 1,
      "_id": "64d1540e12b18c7901e3f461",
      "name": "Chilean ID Card Scanning",
      "active": true,
      "fields": [
        "firstName",
        "lastName",
        "fullName",
        "documentNumber",
        "nationality",
        "dateOfBirth",
        "gender",
        "expirationDate",
        "emissionDate",
        "RUT",
        "RUN"
      ],
      "prompt": "From the provided JSON input, extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Chilean culture, individuals may have two last names, both of which must be included. People also normally have 2 first names, like Juan Miguel or Jose Manuel, so when you see a 4 words name, extract it that way. For the driver restrictions, its usually like driver needs glasses or another detail about his/her negative observations  Please avoid translation, simply extract the information as provided. The output must be in a JSON format. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score.",
      "system": true,
      "deleted": false,
      "createdAt": "2023-08-03T15:54:41.178Z",
      "updatedAt": "2023-08-05T17:23:52.556Z",
      "description": "Prompt to extract Chilean ID card details",
      "documentTypes": [
        "RUT"
      ]
    },
    {
      "__v": 1,
      "_id": "653942aa27953fa62c752c83",
      "name": "CUI Peruvian ID Card Scanning",
      "active": true,
      "fields": [
        "firstName",
        "lastName",
        "secondLastName",
        "fullName",
        "documentNumber",
        "nationality",
        "dateOfBirth",
        "placeOfBirth",
        "dateOfIssue",
        "gender",
        "expirationDate",
        "maritalStatus"
      ],
      "format": "JSON",
      "prompt": "From the provided JSON input, extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Peruvian culture, individuals have up to 5 names in their ID, both of which must be included. Citizens also normally have 2 to 3 names, so when you see a 4 to 5 words name, extract it that way. Please avoid translation, simply extract the information as provided. The output must be in a JSON format. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score. Dates should be in format YYYY/MM/DD",
      "system": true,
      "deleted": false,
      "createdAt": "2023-08-03T15:54:41.178Z",
      "updatedAt": "2023-08-05T17:23:52.556Z",
      "description": "Prompt to extract the new Peruvian card details",
      "documentTypes": [
        "CUI"
      ]
    },
    {
      "__v": 1,
      "_id": "6539430227953fa62c752c87",
      "name": "Peruvian ID Card Scanning",
      "active": true,
      "fields": [
        "name1",
        "name2",
        "name3",
        "lastName",
        "secondLastName",
        "fullName",
        "documentNumber",
        "nationality",
        "dateOfBirth",
        "placeOfBirth",
        "dateOfIssue",
        "gender",
        "expirationDate",
        "maritalStatus"
      ],
      "format": "JSON",
      "prompt": "From the provided JSON input, extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Peruvian culture, recognize the division of names properly. Please avoid translation, simply extract the information as provided. The output must be in a JSON format. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score. Dates should be in format YYYY/MM/DD",
      "system": true,
      "deleted": false,
      "createdAt": "2023-08-03T15:54:41.178Z",
      "updatedAt": "2023-08-05T17:23:52.556Z",
      "description": "Prompt to extract Peruvian ID card details",
      "documentTypes": [
        "DNI"
      ]
    }
  ]
}
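
As a minimal sketch of consuming this response, the Python snippet below builds the GET request with the standard library and groups the returned templates by document type. The helper names (`build_list_request`, `index_by_document_type`, `fetch_templates`) are illustrative assumptions and not part of the Verifik API; only the URL, the headers, and the response fields come from the documentation above.

```python
import json
import urllib.request

TEMPLATES_URL = "https://api.verifik.co/v2/ocr/scan-prompt/templates"


def build_list_request(token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request with the documented headers."""
    return urllib.request.Request(
        TEMPLATES_URL,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="GET",
    )


def index_by_document_type(payload: dict) -> dict:
    """Map each document type code (e.g. "CC") to the names of usable templates.

    Templates that are inactive or flagged as deleted are skipped.
    """
    index = {}
    for template in payload.get("data", []):
        if not template.get("active") or template.get("deleted"):
            continue
        for doc_type in template.get("documentTypes", []):
            index.setdefault(doc_type, []).append(template["name"])
    return index


def fetch_templates(token: str) -> dict:
    """Send the request and decode the JSON body (needs network access and a valid token)."""
    with urllib.request.urlopen(build_list_request(token)) as response:
        return json.load(response)
```

Calling `index_by_document_type(fetch_templates(token))` with a valid token would then give a lookup from document type code to template names, which is one convenient way to pick a template for a given document.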

Create Template (POST)

POST - https://api.verifik.co/v2/ocr/scan-prompt/template

This API endpoint lets you create new information-extraction templates for specific document types. Templates structure the extraction of key data fields so that the required information is captured accurately. When creating a template, pay particular attention to the prompt parameter, since it tells the system how to extract data from the documents provided.

Custom templates with detailed prompts help the API extract the required information accurately, even when the text is multilingual, distorted, or culturally specific.

Note: Tailor the prompt to the unique characteristics of the documents you are processing to get the best results.

Headers

Name            Value
Content-Type    application/json
Authorization   Bearer <token>

Prompt

Here is an example prompt:

{
  "prompt": "From the provided JSON input , extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Colombian culture, individuals may have two last names, both of which must be included. People also normally have 2 first names, like Juan Miguel or Jose Manuel, so when you see a 4 words name, extract it that way. For the driver restrictions, its usually like driver needs glasses or another detail about his/her negative observations  Please avoid translation, simply extract the information as provided. The output must be in a {{format}}. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score."
}

In essence, the prompt is a set of instructions and considerations about the document. It should describe the document's most important aspects. The request body details below give more information on how to write the prompt correctly.

The prompt should include {{fields}} and {{format}}, since these tell the AI which data to extract and in which format to return it.
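
Before sending a template, it can be useful to check locally that the prompt carries both placeholders. The following Python sketch does only that; the function names are hypothetical, and it assumes the placeholders must appear exactly as `{{fields}}` and `{{format}}` (no inner spaces), which the documentation does not state explicitly.

```python
import re

# Placeholders the documentation says the prompt should contain.
REQUIRED_PLACEHOLDERS = ("fields", "format")


def find_placeholders(prompt: str) -> set:
    """Return the names found inside {{...}} markers in the prompt."""
    return set(re.findall(r"\{\{(\w+)\}\}", prompt))


def missing_placeholders(prompt: str) -> list:
    """List the required placeholders that the prompt does not contain."""
    found = find_placeholders(prompt)
    return [name for name in REQUIRED_PLACEHOLDERS if name not in found]


complete = "Extract the fields {{fields}}. The output must be in {{format}}."
incomplete = "Extract the fields {{fields}}. The output must be in JSON format."

print(missing_placeholders(complete))    # []
print(missing_placeholders(incomplete))  # ['format']
```

Note that several of the system prompts returned by the GET endpoint hard-code the output format instead of using `{{format}}`, so a check like this is a convenience for new templates rather than a rule the API is known to enforce.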

Body

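Name           Type     Description
documentType   string   Document type code the template applies to, e.g. "CCVE" (the body example below sends these as a documentTypes array).
fields         string   Fields to extract, referenced in the prompt through {{fields}} (sent as an array of strings in the body example below).
name           string   Name of the template.
format         string   Output format, e.g. "json", referenced in the prompt through {{format}}.
prompt         string   Instructions that tell the system how to extract the data from the document.
description    string   Short description of the template.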

Body Example

{
  "system": false,
  "documentTypes": [
    "CCVE"
  ],
  "fields": [
    "nombres (can be 1 or more Name)",
    "apellidos (can be 1 or more last Name)",
    "Numero de cedula (ID number always comes after this 'V.')",
    "fecha de nacimiento",
    "fecha de expedicion",
    "fecha de expiracion",
    "nacionalidad (the nationality is until the end of the text)",
    "age(fecha actual - fecha de nacimiento)"
  ],
  "name": "CCVE Verifik",
  "format": "json",
  "prompt": "This text was extracted from a Venezuelan ID card, it's in Spanish. I need you to provide me with the following data in {{format}}: {{fields}}. all fields are in uppercase, the order is document number, apellidos, nombres, sign (the sign sometimes can be letters) and Ignore the directors name this is located near to the right corner . This is the desired JSON format: firstName: Nombres, lastName: Apellidos, documentNumber: numero de cedula, dateOfBirth: fecha de nacimiento,civilStatus: estado civil, expeditionDate: fecha de expedicion, expireDate: fecha de expiracion,  nationality: nacionalidad ",
  "description": "Prompt for Venezuelan IDs for Verifik"
}
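
A request with this body could be assembled as in the Python sketch below, which uses only the standard library. The `build_create_request` helper and the `YOUR_TOKEN` placeholder are assumptions for illustration, and the `fields` list and `prompt` are shortened versions of the example body above; the URL, headers, and key names are taken from this page.

```python
import json
import urllib.request

CREATE_URL = "https://api.verifik.co/v2/ocr/scan-prompt/template"


def build_create_request(token: str, template: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a template."""
    # ensure_ascii=False keeps accented characters readable in the payload.
    body = json.dumps(template, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        CREATE_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Shortened version of the example body above.
template = {
    "system": False,
    "documentTypes": ["CCVE"],
    "fields": ["nombres", "apellidos", "fecha de nacimiento"],
    "name": "CCVE Verifik",
    "format": "json",
    "prompt": "This text was extracted from a Venezuelan ID card, it's in Spanish. "
              "I need you to provide me with the following data in {{format}}: {{fields}}.",
    "description": "Prompt for Venezuelan IDs for Verifik",
}

request = build_create_request("YOUR_TOKEN", template)
# To actually send it (needs network access and a valid token):
# with urllib.request.urlopen(request) as response:
#     created = json.load(response)["data"]
```

Building the request separately from sending it keeps the payload easy to inspect or log before any call is made.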

Response

{
  "data": {
    "__v": 0,
    "_id": "653bdcc2ff7de0cee3b0760a",
    "code": "unique_identifier",
    "name": "Escaneo de cédulas de Colombia",
    "active": true,
    "client": "623b6317fe5fd1774be9f566",
    "fields": [
      "firstName",
      "lastName",
      "fullName",
      "documentNumber"
    ],
    "format": "json",
    "prompt": "From the provided JSON input , extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Colombian culture, individuals may have two last names, both of which must be included. People also normally have 2 first names, like Juan Miguel or Jose Manuel, so when you see a 4 words name, extract it that way. For the driver restrictions, its usually like driver needs glasses or another detail about his/her negative observations  Please avoid translation, simply extract the information as provided. The output must be in a {{format}}. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score.",
    "system": false,
    "deleted": false,
    "createdAt": "2023-10-27T15:52:34.058Z",
    "updatedAt": "2023-10-27T15:52:34.058Z",
    "description": "Prompt para extraer cédulas de Colombia",
    "documentTypes": [
      "CC"
    ]
  }
}

Update Template (PUT)

PUT - https://api.verifik.co/v2/ocr/scan-prompt/template

This API endpoint lets you update a specific OCR template. OCR templates are used to extract structured data from documents. When updating a template, you can modify its settings, such as the fields to extract and the document types it applies to.

The endpoint is intended for managing OCR templates, so you can adapt data extraction to specific document types and requirements. Updating templates lets you adjust them to changing needs and keep data extraction accurate.

Note: When updating templates, make sure the changes align with your OCR data extraction requirements, and test the updated template with sample documents to confirm its effectiveness.

Headers

Name            Value
Content-Type    application/json
Authorization   Bearer <token>

Body

Name    Type     Description
image   string   Image in Base64 encoded format or a URL where the image is hosted.

Response

{
  "data": {
    "__v": 0,
    "_id": "653bdcc2ff7de0cee3b0760a",
    "code": "unique_identifier",
    "name": "Escaneo de cédulas de Colombia",
    "active": true,
    "client": "623b6317fe5fd1774be9f566",
    "fields": [
      "firstName",
      "lastName",
      "fullName",
      "documentNumber"
    ],
    "format": "json",
    "prompt": "From the provided JSON input , extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Colombian culture, individuals may have two last names, both of which must be included. People also normally have 2 first names, like Juan Miguel or Jose Manuel, so when you see a 4 words name, extract it that way. For the driver restrictions, its usually like driver needs glasses or another detail about his/her negative observations  Please avoid translation, simply extract the information as provided. The output must be in a {{format}}. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score.",
    "system": false,
    "deleted": false,
    "createdAt": "2023-10-27T15:52:34.058Z",
    "updatedAt": "2023-10-27T15:52:34.058Z",
    "description": "Prompt para extraer cédulas de Colombia",
    "documentTypes": [
      "CC"
    ]
  }
}
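
One way to confirm that an update took effect is to compare the template object from before the update with the `data` object returned by the PUT call. The Python sketch below shows such a comparison; `changed_keys` is a hypothetical helper, and ignoring `__v` and `updatedAt` rests on the assumption that these are bookkeeping fields that change on every update.

```python
def changed_keys(before: dict, after: dict, ignore=("__v", "updatedAt")) -> dict:
    """Return {key: (old, new)} for every top-level key whose value differs."""
    keys = (set(before) | set(after)) - set(ignore)
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(keys)
        if before.get(key) != after.get(key)
    }


# Template as returned earlier, and as returned in "data" after an update.
before = {
    "_id": "653bdcc2ff7de0cee3b0760a",
    "fields": ["firstName", "lastName", "fullName", "documentNumber"],
    "documentTypes": ["CC"],
    "updatedAt": "2023-10-27T15:52:34.058Z",
}
after = {
    "_id": "653bdcc2ff7de0cee3b0760a",
    "fields": ["firstName", "lastName", "documentNumber"],
    "documentTypes": ["CC"],
    "updatedAt": "2023-10-28T09:00:00.000Z",  # illustrative value
}

print(changed_keys(before, after))
```

Here only `fields` is reported as changed, which matches the intended edit; an unexpected key in the result would be a signal to review the update before testing it with sample documents.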

