Life Sciences - Medical Knowledge Model
Overview
The Life Sciences - Medical Knowledge Model (display name: Life Sciences - Medical EN v#) is an extraction model that extracts various types of biomedical entities from English texts.
Extracted entities correspond to entries in Unified Medical Language System (UMLS) vocabularies.
More specifically, the model can extract entities belonging to three of the most common classes in the UMLS taxonomy:
- Drugs
- Diseases
- Signs or symptoms
The model has been designed for a specific text type, that is, scientific papers and academic articles like those found on PubMed.
Covered vocabularies
UMLS is a resource that gathers more than 200 vocabularies in the health and biomedical sciences, and that integrates and distributes key terminology and coding standards in order to promote and enable interoperability between computer systems and services.
The biggest component of UMLS is the Metathesaurus, through which all the concepts are interconnected and linked to similar concepts and terminologies inside the vocabularies.
The concepts gathered in UMLS are organized by semantic types and by the semantic groups that collect those types.
The Life Sciences - Medical Knowledge Model covers these UMLS vocabularies:
- Medical Subject Headings (MeSH)
- International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM)
- International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM)
- Metathesaurus Additional Entry Terms for ICD-9-CM (MTHICD9)
- SNOMED CT United States Edition
- NCI Dictionary of Cancer Terms
- Consumer Health Vocabulary (CHV)
- Logical Observation Identifiers Names and Codes terminology, LOINC® (LNC)
- Online Mendelian Inheritance in Man (OMIM)
- International Classification of Primary Care - 2 PLUS (ICPC2P)
Extraction groups and classes
UMLS semantic groups:
- Drugs
- Diseases
- Signs or symptoms
are mapped to the extraction groups DRUGS, DISEASES and SIGNORSYMPTOMS, respectively.
For example, in the UMLS these are considered drugs:
- Antibiotics
- Vitamins
- Pharmacological substances and many others
So they are predicted as the DRUG class of the DRUGS group.
In addition to the types belonging to the drug semantic group, the model extracts mentions of substances that are classified by their mechanism of action and that belong to the Mecanismofaction semantic class and the Action semantic group. This group comprises substances falling into the agonist, antagonist, inhibitor, blocker and activator categories.
Thus, the model can extract a drug's trade name (for example Trulicity), its molecule name (Dulaglutide) and the name of its mechanism of action (glucagon-like peptide-1/GLP-1 receptor agonist).
Each of the model's extraction groups has one class:
Group | Class |
---|---|
DRUGS | DRUG |
DISEASES | DISEASE |
SIGNORSYMPTOMS | SIGNORSYMPTOM |
For example, given this text:
A 42-year-old Hispanic woman, with end-stage renal disease, anemia, hypertension, and a history of an anaphylactic reaction to basiliximab, was scheduled to receive a living donor transplant and received basiliximab uneventfully.
Dulaglutide was generally well tolerated, with a low inherent risk of hypoglycemia. The most frequently reported adverse events in clinical trials were gastrointestinal-related (for example nausea, vomiting, and diarrhea).
extractions are:
Group | Class | Class value |
---|---|---|
DISEASES | DISEASE | Chronic kidney disease stage 5 |
DISEASES | DISEASE | anaemia |
DISEASES | DISEASE | hypertension |
DISEASES | DISEASE | Anaphylaxis |
DISEASES | DISEASE | hypoglycemia |
DRUGS | DRUG | basiliximab |
DRUGS | DRUG | basiliximab |
DRUGS | DRUG | dulaglutide |
SIGNORSYMPTOMS | SIGNORSYMPTOM | nausea |
SIGNORSYMPTOMS | SIGNORSYMPTOM | vomit |
SIGNORSYMPTOMS | SIGNORSYMPTOM | diarrhoea |
Output structure
The model output has the same structure as any other model and is affected by the functional properties of the workflow block.
The part of the output that is specific to this model is the result of information extraction, i.e. the extractions array.
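As a rough aid for client code, the shape of one item of the extractions array can be written down as Python types. This is only a sketch inferred from the sample output in this section, not an official schema: the type names `Position`, `Field` and `Extraction` and the helper `looks_like_extraction` are hypothetical, and real output may carry additional keys.

```python
from typing import List, TypedDict


class Position(TypedDict):
    # Character offsets of one mention in the input text (assumed shape).
    start: int
    end: int


class Field(TypedDict):
    name: str                  # class name, e.g. "DRUG"
    positions: List[Position]  # one entry per mention of the value
    value: str                 # extracted (normalized) value


class Extraction(TypedDict):
    namespace: str             # e.g. "lifescience_med_en"
    template: str              # group name, e.g. "DRUGS"
    fields: List[Field]


def looks_like_extraction(item: object) -> bool:
    """Shallow structural check against the assumed shape above."""
    if not isinstance(item, dict):
        return False
    if not isinstance(item.get("namespace"), str):
        return False
    if not isinstance(item.get("template"), str):
        return False
    fields = item.get("fields")
    if not isinstance(fields, list):
        return False
    for field in fields:
        if not isinstance(field, dict):
            return False
        if not isinstance(field.get("name"), str):
            return False
        if not isinstance(field.get("value"), str):
            return False
        positions = field.get("positions")
        if not isinstance(positions, list):
            return False
        for pos in positions:
            if not isinstance(pos, dict):
                return False
            start, end = pos.get("start"), pos.get("end")
            if not isinstance(start, int) or not isinstance(end, int):
                return False
            if start > end:
                return False
    return True
```

The check is deliberately shallow: it only confirms that the keys used in this section are present with the expected types, so it can serve as a guard before further processing.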
Example
In this model's output, the template key corresponds to the concept of group and template fields correspond to classes.
Considering the text for the above example, the extraction output is:
"extractions": [
  {
    "fields": [
      {
        "name": "DISEASE",
        "positions": [
          {
            "end": 58,
            "start": 35
          }
        ],
        "value": "Chronic kidney disease stage 5"
      }
    ],
    "namespace": "lifescience_med_en",
    "template": "DISEASES"
  },
  {
    "fields": [
      {
        "name": "DISEASE",
        "positions": [
          {
            "end": 123,
            "start": 102
          }
        ],
        "value": "anaphylaxis"
      }
    ],
    "namespace": "lifescience_med_en",
    "template": "DISEASES"
  },
  {
    "fields": [
      {
        "name": "DISEASE",
        "positions": [
          {
            "end": 80,
            "start": 68
          }
        ],
        "value": "hypertension"
      }
    ],
    "namespace": "lifescience_med_en",
    "template": "DISEASES"
  },
  {
    "fields": [
      {
        "name": "DISEASE",
        "positions": [
          {
            "end": 66,
            "start": 60
          }
        ],
        "value": "anaemia"
      }
    ],
    "namespace": "lifescience_med_en",
    "template": "DISEASES"
  },
  {
    "fields": [
      {
        "name": "DISEASE",
        "positions": [
          {
            "end": 312,
            "start": 300
          }
        ],
        "value": "hypoglycemia"
      }
    ],
    "namespace": "lifescience_med_en",
    "template": "DISEASES"
  },
  {
    "fields": [
      {
        "name": "DRUG",
        "positions": [
          {
            "end": 138,
            "start": 127
          },
          {
            "end": 215,
            "start": 204
          }
        ],
        "value": "basiliximab"
      }
    ],
    "namespace": "lifescience_med_en",
    "template": "DRUGS"
  },
  {
    "fields": [
      {
        "name": "DRUG",
        "positions": [
          {
            "end": 241,
            "start": 230
          }
        ],
        "value": "dulaglutide"
      }
    ],
    "namespace": "lifescience_med_en",
    "template": "DRUGS"
  },
  {
    "fields": [
      {
        "name": "SIGNORSYMPTOM",
        "positions": [
          {
            "end": 426,
            "start": 420
          }
        ],
        "value": "nausea"
      }
    ],
    "namespace": "lifescience_med_en",
    "template": "SIGNSORSYMPTOMS"
  },
  {
    "fields": [
      {
        "name": "SIGNORSYMPTOM",
        "positions": [
          {
            "end": 436,
            "start": 428
          }
        ],
        "value": "vomit"
      }
    ],
    "namespace": "lifescience_med_en",
    "template": "SIGNSORSYMPTOMS"
  },
  {
    "fields": [
      {
        "name": "SIGNORSYMPTOM",
        "positions": [
          {
            "end": 450,
            "start": 442
          }
        ],
        "value": "diarrhoea"
      }
    ],
    "namespace": "lifescience_med_en",
    "template": "SIGNSORSYMPTOMS"
  }
]
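To relate each extracted value back to the wording of the input text, the positions can be used to slice the original string. The following is a minimal sketch, not part of the product: the function name `iter_mentions` is hypothetical, and it assumes that offsets are zero-based character indexes with an exclusive end and that the two paragraphs of the sample text are separated by a single newline. Both assumptions are consistent with the sample values above but are not stated by the documentation. Only three items of the sample output are reproduced.

```python
from typing import Dict, Iterator, List, Tuple

# Sample text from this section; the two paragraphs are assumed to be
# joined by one newline character.
TEXT = (
    "A 42-year-old Hispanic woman, with end-stage renal disease, anemia, "
    "hypertension, and a history of an anaphylactic reaction to basiliximab, "
    "was scheduled to receive a living donor transplant and received "
    "basiliximab uneventfully.\n"
    "Dulaglutide was generally well tolerated, with a low inherent risk of "
    "hypoglycemia. The most frequently reported adverse events in clinical "
    "trials were gastrointestinal-related (for example nausea, vomiting, "
    "and diarrhea)."
)

# A subset of the "extractions" array shown above.
EXTRACTIONS: List[Dict] = [
    {
        "fields": [{"name": "DISEASE",
                    "positions": [{"end": 58, "start": 35}],
                    "value": "Chronic kidney disease stage 5"}],
        "namespace": "lifescience_med_en",
        "template": "DISEASES",
    },
    {
        "fields": [{"name": "DRUG",
                    "positions": [{"end": 138, "start": 127},
                                  {"end": 215, "start": 204}],
                    "value": "basiliximab"}],
        "namespace": "lifescience_med_en",
        "template": "DRUGS",
    },
    {
        "fields": [{"name": "SIGNORSYMPTOM",
                    "positions": [{"end": 436, "start": 428}],
                    "value": "vomit"}],
        "namespace": "lifescience_med_en",
        "template": "SIGNSORSYMPTOMS",
    },
]


def iter_mentions(text: str, extractions: List[Dict]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (group, class, value, surface text) for every position."""
    for extraction in extractions:
        group = extraction["template"]          # template = group
        for field in extraction["fields"]:      # field name = class
            for pos in field["positions"]:
                # Assumption: end offset is exclusive.
                surface = text[pos["start"]:pos["end"]]
                yield group, field["name"], field["value"], surface


if __name__ == "__main__":
    for row in iter_mentions(TEXT, EXTRACTIONS):
        print(" | ".join(row))
```

Under these assumptions the first row pairs the value "Chronic kidney disease stage 5" with the surface text "end-stage renal disease", which illustrates that the value field holds a normalized term rather than the literal wording, and basiliximab yields two rows because it has two positions.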