Language Detector processor
Description
The Language Detector processor predicts the languages in which a text is written. It can accept fragmented text from the Text Fragmenter processor, and, in that case, it make predictions about the languages for each fragment too.
Old version
There are two versions of the component available: the latest and 1.0.0. This version is present for backward compatibility with old workflows created with previous versions of NL Flow. For new workflows always use the latest version.
The blocks of version 1.0.0 are identical to those of version 1.9 of NL Flow, however something has changed in the block properties:
- The name of the block, instead of editing the Block name field, can be modified by selecting Edit component name .
- The Propagate input text to output block property has been removed from the Functional tab.
- The Output tab has been added (see below).
All other block properties remain unchanged, so if you are dealing with a version 1.0.0 block refer to the NL Flow version 1.9 documentation and ignore the following here except for the Output tab.
Input
The processor requires the input JSON to contain at least this top level key:
"text": "text"
Optional top level keys are:
"fragments": fragments
and:
"options": options
where:
text
(string): text to process.fragments
(object): it corresponds to the value of thefragments
key in the output of a Text Fragmenter block. With this input, the processor makes predictions about the language of each fragment in addition to the predictions for the whole text.-
options
(object): its properties can be used to override the values of block's functional properties (see below). This is the correspondence between the properties ofoptions
and the functional properties of the block:Object property Corresponding functional parameter languages
Detectable languages outputText
Propagate input text to output enableOthers
Enable "Other language" prediction maxPredictions
Max number of predictions
Block properties
Block properties can be set by editing the block.
Language Detector workflow blocks have the following properties:
-
Basic properties:
- Block name, it can be edited
- Component version (read only)
- Block ID (read only)
-
Functional:
-
Detectable languages
The comma separated list of ISO-639-1 codes1 of the languages the processor can choose from when making predictions.
-
Enable "Other language" prediction: enables the prediction of a
other
label corresponding to languages that are not listed in Detectable languages. Default: on. -
Max number of predictions: maximum number of predictions. Default: 10.
-
-
Deployment:
- Timeout: execution timeout expressed in minutes (m) or seconds (s).
- Replicas: number of required instances
- Memory: required memory
- CPU: thousandths of a CPU required (for example: 1000 = 1 CPU)
-
Input
These properties correspond to the top level key of the input JSON.
They need to be set only when input mapping is necessary. -
Output: read-only, this property is a navigable description of the structure of the output array.
Output
In case the input contains only text—no fragments—, the block output has this structure:
{
"prediction": {}
}
If there are also fragments in the input, the output has this structure:
{
"prediction": {},
"fragmentsPredictions": []
}
If input key options.outputText
is set to true
or is missing and functional property Propagate input text to output is turned on, the output also contains a top level key text
which is the echo of input key text
, for example:
{
"prediction": {
"others": [
{
"label": "de",
"score": 0.004342068452388048
},
{
"label": "es",
"score": 0.0034704774152487518
},
{
"label": "ru",
"score": 0.0028054893482476475
}
]
"winner": {
"label": "en",
"score": 0.9228296875953674
}
},
"text": "How to Pick the Right Coffee Table\nWhen you shop for a coffee table you may be overwhelmed by the wealth of choices available. Coffee tables, sometimes called cocktail tables, come in many styles and materials. Whether you have a comfortable farmhouse look, breezy coastal decor or sleek contemporary furniture, you can find the perfect coffee table for your main living space. If you make the coffee table the last piece of furniture you choose for the room, it is easier to judge the right style, color, material, size and shape.\nHere are some guidelines for finding just the right coffee table to hold the remote and a drink when you settle in for a night of relaxation:\n1. Choose a Style\nRemember that as functional as a coffee table may be, it is really an example of accent furniture."
}
The prediction
object has this structure:
"prediction": {
"winner": {},
"others": []
}
where:
-
winner
(object): it corresponds to the most likely language prediction for the entire text. It has these properties:label
(string): ISO-639-1 code of the predicted languagescore
(decimal number between 0 and 1): confidence score of the prediction
-
others
(array): it contains one item for each least likely language. Each item has the same structure as thewinner
object, with a label and a confidence score. In the array, the items are sorted in descending order on the value of the score property, so the labels with the highest confidence score are found first.
The total number of predictions is influenced by the values of the functional properties of the block, possibly overwritten using the options
input key.
The total number of languages the processor can choose from is determined by the input key options.languages
or, if missing, by the Detectable languages property, with the possible addition of the other
label—corresponding to extra languages—when input key options.enableOthers
is true
or, if missing, property Enable "Other language" prediction is turned on.
In any case, the total number of predictions is at most equal to the value of input key options.maxPredictions
or, if this key is missing, the value of the Max number of predictions property.
fragmentsPredictions
is an array of objects, each of which contains the language predictions for one of the fragments passed in input using the fragments
key.
Each item has this structure:
{
"others": [],
"position": {},
"winner": {}
}
where:
winner
andothers
have the same structure and the same meaning—but with a scope equal to the text fragment—of the homonymous properties of theprediction
object.position
contains the fragment position in the text and is the echo of the item in input arrayfragments.positions
that corresponds to the fragment.
-
There are the ISO-639-1 codes of the languages that can be detected: af, als, am, an, ar, arz, asm, ast, av, az, azb, ba, bar, bcl, be, bg, bh, bn, bo, bpy, br, bs, bxr, ca, cbk, ce, ceb, ckb, co, cs, cv, cy, da, de, diq, dsb, dty, dv, el, eml, en, eo, es, et, eu, fa, fi, fr, frr, fy, ga, gd, gl, gn, gom, gu, gv, he, hi, hif, hr, hsb, ht, hu, hy, ia, id, ie, ilo, io, is, it, ja, jbo, jv, ka, kk, km, kn, ko, krc, ku, kv, kw, ky, la, lb, lez, li, lmo, lo, lrc, lt, lv, mai, mg, mhr, min, mk, ml, mn, mr, mrj, ms, mt, mwl, my, myv, mzn, nah, nap, nds, ne, new, nl, nn, no, oc, or, os, pa, pam, pfl, pl, pms, pnb, ps, ps, pt, qu, rm, ru, rue, sa, sah, sc, scn, sco, sd, sh, si, sk, sl, so, sq, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, tyv, ug, uk, ur, uz, vec, vep, vi, vls, vo, wa, war, wuu, xal, xmf, yi, yo, yue, zh ↩