Develop your client library
Are you sure?
Do you really need to develop a custom client for the Extract API? First, consider that an easy-to-use, open-source Python client is already available. Follow the instructions on its GitHub page to install and use the package.
OpenAPI specification
The starting point for developing an Extract API client is the OpenAPI specification.
This is a human-readable document, but it's also meant to be interpreted by special programs [1] that automatically generate client code in a variety of languages.
You can try out the Extract API through the OpenAPI specification using the Swagger UI available on the dedicated page of the developer portal.
Under the hood
REST API
Extract API is a cloud-based service with a REST interface. This means that to use it, a program must be able to access the Web and carry out an HTTP conversation with the API interface.
Whenever the program has to analyze a document, it must request the most suitable API resource (similar to what you do when requesting the page of a site with a Web browser).
Your program must use an HTTP client to request the API resources. The program, via the HTTP client, transmits a request to the API server.
The request body must be a UTF-8 encoded JSON object (responses are UTF-8 encoded JSON objects too).
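As a minimal sketch of the encoding rule above, the following helpers serialize and parse UTF-8 JSON bodies with Python's standard library. The helper names and the sample payload are illustrative assumptions, not part of the Extract API.

```python
import json


def encode_json_body(payload: dict) -> bytes:
    """Serialize a dictionary to a UTF-8 encoded JSON request body."""
    # ensure_ascii=False keeps non-ASCII characters as real UTF-8 bytes
    # instead of \uXXXX escape sequences (both forms are valid JSON).
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def decode_json_body(raw: bytes) -> dict:
    """Parse a UTF-8 encoded JSON response body."""
    return json.loads(raw.decode("utf-8"))


# Hypothetical payload, used only to show the round trip.
body = encode_json_body({"text": "caffè"})
```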
Requests and responses are described in the reference section of this manual.
Authentication and authorization
Each API request must contain an authorization token. The bearer authentication mechanism is employed, so the token must be obtained with an authentication operation and then included as a header in each request.
The authentication operation is carried out by requesting a resource that is shared between all the expert.ai cloud services.
Its address is:
https://developer.expert.ai/oauth2/token
This resource must be requested with the POST method, and the body of the request must be a JSON object like this:
{
  "username": "yourusername",
  "password": "yourpassword"
}
with yourusername and yourpassword replaced by the developer credentials (your email address and password) used when registering on the expert.ai developer portal.
The Content-Type header of the request must be set to:
application/json; charset=utf-8
The response body is the token itself, returned as plain text like this:
eyJraWQiOiJlZXEzSnB5 ... CqJmhj2sLA
The client program must therefore "know" the credentials of the developer and obtain the authorization token through them.
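The authentication operation described above could be sketched as follows, using only Python's standard library. The function names are illustrative assumptions; the URL, method, body and Content-Type header are the ones given in this section. Error handling is left out for brevity.

```python
import json
import urllib.request

TOKEN_URL = "https://developer.expert.ai/oauth2/token"


def build_token_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that asks for a token."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def fetch_token(username: str, password: str) -> str:
    """Send the request and return the plain-text token (needs network access)."""
    with urllib.request.urlopen(build_token_request(username, password)) as response:
        return response.read().decode("utf-8").strip()
```

Keeping the request construction separate from the network call makes it possible to inspect or test the request without contacting the server.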
When the program requests Extract API resources, it must add the Authorization header to each request using this format:
Bearer token
with token replaced by the actual token.
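A minimal sketch of adding the header with Python's standard library follows. The helper name is an assumption, and the URL in the usage lines is a placeholder, not a real Extract API address.

```python
import urllib.request


def with_bearer_token(request: urllib.request.Request, token: str) -> urllib.request.Request:
    """Attach the Authorization header in the 'Bearer token' format."""
    # strip() guards against a trailing newline in the plain-text token.
    request.add_header("Authorization", f"Bearer {token.strip()}")
    return request


# Placeholder URL, for illustration only.
example_request = with_bearer_token(
    urllib.request.Request("https://example.invalid/resource"), "abc123\n"
)
```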
Authorization tokens expire after 24 hours.
If the application continues to make requests with an expired token, it will get 401 Unauthorized errors. In that case, it must request a new token to replace the old one.
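One possible way to manage token renewal is a small cache that fetches a new token shortly before the 24-hour lifetime ends, or on demand after a 401 Unauthorized response. This is a sketch under assumptions: the class name, the injected fetch function and the five-minute safety margin are illustrative choices, not part of the API.

```python
import time
from typing import Callable, Optional

TOKEN_LIFETIME_SECONDS = 24 * 60 * 60  # tokens expire after 24 hours


class TokenCache:
    """Keep a token and renew it when it is about to expire."""

    def __init__(
        self,
        fetch_token: Callable[[], str],
        clock: Callable[[], float] = time.monotonic,
        safety_margin: float = 300.0,  # assumed margin: renew 5 minutes early
    ) -> None:
        self._fetch_token = fetch_token
        self._clock = clock
        self._safety_margin = safety_margin
        self._token: Optional[str] = None
        self._obtained_at = 0.0

    def get(self) -> str:
        """Return a valid token, fetching a new one if needed."""
        age = self._clock() - self._obtained_at
        if self._token is None or age >= TOKEN_LIFETIME_SECONDS - self._safety_margin:
            self._token = self._fetch_token()
            self._obtained_at = self._clock()
        return self._token

    def invalidate(self) -> None:
        """Call after a 401 Unauthorized so the next get() fetches a new token."""
        self._token = None
```

A client would call get() before each request and invalidate() whenever the server answers 401 Unauthorized, then retry the request once with the new token.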
Warning
Use of the Extract API requires an additional activation that must be requested from expert.ai.
Without the activation, the client program will get a 403 Forbidden error.
An asynchronous processor
Extract Beta processes documents asynchronously.
The document to be analyzed is submitted to Extract Beta by requesting the layout-document-async resource with the POST verb. The request contains the Base64-encoded document.
In response to this request, Extract Beta starts a recognition task and immediately returns a response containing the ID of that task. The response, therefore, is synchronous (the whole API is) and immediate, but the processing takes place in the background and, depending on the size, type and complexity of the document, can take anywhere from a few seconds to a few minutes.
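The Base64 encoding of the document could be sketched as below. The exact shape of the request body is defined in the reference section, so the JSON field name is passed in by the caller rather than fixed here; the name "document" in the usage line is a hypothetical placeholder, and the function name is an assumption.

```python
import base64
import json


def build_submit_body(document_bytes: bytes, field_name: str) -> bytes:
    """Return a UTF-8 JSON body carrying the Base64-encoded document.

    field_name must be taken from the API reference; it is not defined here.
    """
    encoded = base64.b64encode(document_bytes).decode("ascii")
    return json.dumps({field_name: encoded}).encode("utf-8")


# "document" is a hypothetical field name, used for illustration only.
submit_body = build_submit_body(b"%PDF-1.4 sample bytes", "document")
```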
To get the task's status, ask for the status resource using the GET verb. The resource path must contain the ID of the task.
The response contains task status information or, if the task has finished, the results.
The client program must therefore poll the task's status until it gets the results.
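The polling described above might look like the following sketch. It assumes a caller-supplied function, here called fetch_status, that performs the GET request on the status resource and returns the results once the task has finished, or None while it is still running. That function, the two-second interval and the ten-minute timeout are all illustrative assumptions.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    fetch_status: Callable[[], Optional[dict]],
    interval: float = 2.0,   # assumed pause between status requests, in seconds
    timeout: float = 600.0,  # assumed overall limit, in seconds
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status repeatedly until it returns results or time runs out."""
    deadline = clock() + timeout
    while True:
        results = fetch_status()
        if results is not None:
            return results
        # Stop if waiting one more interval would go past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting; in normal use the defaults are enough.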
1. For example, OpenAPI Generator; several more are available on the Web.