Deep linguistic analysis overview
Deep linguistic analysis is a type of document analysis that combines the following interdependent processes:
- Text subdivision
- Part-of-speech tagging
- Morphological analysis
- Lemmatization
- Syntactic analysis
- Semantic analysis
The analysis is "deep" because:
- It performs all the common linguistic analyses listed above.
- It disambiguates the terms of the text, that is, it determines the exact meaning of each term after considering all the possible alternatives; for example, it decides whether "Jordan" in the sample text used below refers to the basketball player or to the country.
Deep linguistic analysis also performs knowledge linking: Knowledge Graph information and open data—Wikidata, DBpedia and GeoNames references—are returned for text tokens corresponding to syncons of the expert.ai Knowledge Graph. In the case of actual places, geographic coordinates are also provided.
Full analysis includes deep linguistic analysis, but if you are not interested in the other analyses, you can use specific resources having paths like this:
analyze/{context}/{language}/disambiguation
The parts in braces are placeholders.
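For example, with the standard context and the English language (the combination used by all the examples in this section), the path and the full endpoint can be built as in this small Python sketch; the base URL is the one used by the curl examples at the end of the section:

context = "standard"
language = "en"

# Resource path: placeholders replaced with actual values
path = f"analyze/{context}/{language}/disambiguation"

# Full endpoint: base URL taken from the curl examples below
endpoint = f"https://nlapi.expert.ai/v2/{path}"
print(endpoint)  # https://nlapi.expert.ai/v2/analyze/standard/en/disambiguation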
In the manual's reference section you will find all the information required to perform deep linguistic analysis using the API's RESTful interface, specifically:
- The format of the request to be submitted to the resources.
- How to build resources' paths and full endpoints.
- The format of the output.
Even if you use the API through a client that hides the REST interface, whether you wrote it yourself or use one of the clients made available by expert.ai, the last piece of information is still useful because it tells you what data the API returns.
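For instance, to see the raw output without any client library, you could call the REST interface directly. Here is a minimal Python sketch using the requests package; the endpoint, headers, and body mirror the curl examples at the end of this section, and token is a placeholder for the actual authorization token:

import requests

endpoint = "https://nlapi.expert.ai/v2/analyze/standard/en/disambiguation"
headers = {
    "Authorization": "Bearer token",  # replace "token" with the actual token
    "Content-Type": "application/json; charset=utf-8",
}
body = {"document": {"text": "Michael Jordan was one of the best basketball players of all time."}}

response = requests.post(endpoint, headers=headers, json=body)
print(response.json())  # the JSON object described in the reference section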
Here is an example of performing deep linguistic analysis on a short English text:
This example is based on the Python client you can find on GitHub.
The client gets user credentials from two environment variables:
EAI_USERNAME
EAI_PASSWORD
Set those variables with your account credentials before running the sample program below.
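For a quick test you can also set the variables from Python itself, before the client performs its first request; the values below are placeholders for your real credentials:

import os

# Placeholder credentials: replace with your real account values.
# In production, set EAI_USERNAME and EAI_PASSWORD in the environment
# instead of hard-coding them in the source.
os.environ["EAI_USERNAME"] = "you@example.com"
os.environ["EAI_PASSWORD"] = "your-password"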
The program prints the list of tokens' lemmas together with their part of speech.
from expertai.nlapi.cloud.client import ExpertAiClient

client = ExpertAiClient()

text = "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half."
language = 'en'

output = client.specific_resource_analysis(
    body={"document": {"text": text}},
    params={'language': language, 'resource': 'disambiguation'})

# Output tokens' data
print("Output tokens' data:")
print(f'{"TEXT":{20}} {"LEMMA":{40}} {"POS":{6}}')
print(f'{"----":{20}} {"-----":{40}} {"---":{6}}')

for token in output.tokens:
    print(f'{text[token.start:token.end]:{20}} {token.lemma:{40}} {token.pos:{6}}')
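Building on this example, you could also inspect the knowledge linking described at the beginning of this section. The following sketch assumes that the client exposes the response's knowledge array as output.knowledge, with syncon and properties fields mirroring the REST output described in the reference section; the attribute names may differ in your client version:

# Index the knowledge records by syncon ID (assumed layout, see note above)
knowledge_by_syncon = {item.syncon: item for item in output.knowledge}

for token in output.tokens:
    item = knowledge_by_syncon.get(token.syncon)
    if item:
        # Open data references, e.g. Wikidata, DBpedia and GeoNames
        links = ", ".join(f"{prop.type}={prop.value}" for prop in item.properties)
        print(f"{text[token.start:token.end]:{20}} {links}")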
This example is based on the Java client you can find on GitHub.
The client gets user credentials from two environment variables:
EAI_USERNAME
EAI_PASSWORD
Set those variables with your account credentials before running the sample program below.
The program prints a JSON representation of the results and the list of tokens' lemmas with their part-of-speech.
import ai.expert.nlapi.security.Authentication;
import ai.expert.nlapi.security.Authenticator;
import ai.expert.nlapi.security.BasicAuthenticator;
import ai.expert.nlapi.security.DefaultCredentialsProvider;
import ai.expert.nlapi.v2.API;
import ai.expert.nlapi.v2.Analyzer;
import ai.expert.nlapi.v2.AnalyzerConfig;
import ai.expert.nlapi.v2.message.AnalyzeResponse;
import ai.expert.nlapi.v2.model.AnalyzeDocument;

public class Main {

    public static Authentication createAuthentication() throws Exception {
        DefaultCredentialsProvider credentialsProvider = new DefaultCredentialsProvider();
        Authenticator authenticator = new BasicAuthenticator(credentialsProvider);
        return new Authentication(authenticator);
    }

    public static Analyzer createAnalyzer() throws Exception {
        return new Analyzer(AnalyzerConfig.builder()
                .withVersion(API.Versions.V2)
                .withContext("standard")
                .withLanguage(API.Languages.en)
                .withAuthentication(createAuthentication())
                .build());
    }

    public static void main(String[] args) {
        try {
            String text = "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half.";
            Analyzer analyzer = createAnalyzer();
            AnalyzeResponse disambiguation = analyzer.disambiguation(text);

            // Output JSON representation
            System.out.println("JSON representation:");
            disambiguation.prettyPrint();

            // Tokens' lemma and part-of-speech
            System.out.println("Tab separated list of tokens' lemma and part-of-speech:");
            AnalyzeDocument data = disambiguation.getData();
            data.getTokens().stream().forEach(c -> System.out.println(c.getLemma() + "\t" + c.getPos()));
        }
        catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
The following curl command posts a document to the deep linguistic analysis resource of the API's REST interface.
Run the command from a shell after replacing token with the actual authorization token.
curl -X POST https://nlapi.expert.ai/v2/analyze/standard/en/disambiguation \
    -H 'Authorization: Bearer token' \
    -H 'Content-Type: application/json; charset=utf-8' \
    -d '{
        "document": {
            "text": "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan'\''s stand-out skill, but he still holds a defensive NBA record, with eight steals in a half."
        }
    }'
The server returns a JSON object.
The following curl command posts a document to the deep linguistic analysis resource of the API's REST interface.
Open a command prompt in the folder where you installed curl and run the command after replacing token with the actual authorization token.
curl -X POST https://nlapi.expert.ai/v2/analyze/standard/en/disambiguation -H "Authorization: Bearer token" -H "Content-Type: application/json; charset=utf-8" -d "{\"document\": {\"text\": \"Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half.\"}}"
The server returns a JSON object.