Language Detector processor
Description
The Language Detector processor predicts the languages in which a text is written. It can also accept fragmented text from the Text Fragmenter processor, in which case it predicts the languages of each fragment as well.
Input
The processor requires the input JSON to contain at least this top-level key:
"text": "text"
where text is the text to process.
Optional top level keys are:
"fragments": fragments
and:
"options": options
fragments is an object and corresponds to the value of the fragments key in the output of a Text Fragmenter block. With this input, the processor makes predictions about the language of each fragment in addition to the predictions for the whole text.
options is an object. Its properties can be used to override the values of the block's functional properties (see below). This is the correspondence between the properties of options and the functional properties of the block:
Object property | Corresponding functional parameter |
---|---|
languages | Detectable languages |
outputText | Propagate input text to output |
enableOthers | Enable "Other language" prediction |
maxPredictions | Max number of predictions |
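As an illustration, a minimal input payload could be assembled as shown here. This is only a sketch: the helper name build_input and the sample values are assumptions made for the example, and the format used for options.languages (a comma-separated string, mirroring the Detectable languages property) is likewise assumed. Only the key names text, fragments, options and the four option properties come from this page.

```python
import json


def build_input(text, fragments=None, options=None):
    """Assemble the input JSON for the processor (hypothetical helper)."""
    payload = {"text": text}  # the only required top-level key
    if fragments is not None:
        # Value of the "fragments" key taken from a Text Fragmenter output.
        payload["fragments"] = fragments
    if options is not None:
        # Per-request overrides of the block's functional properties.
        payload["options"] = options
    return payload


payload = build_input(
    "How to Pick the Right Coffee Table",
    options={
        "languages": "en,de,es,ru",  # assumed format: comma-separated codes
        "outputText": True,
        "enableOthers": False,
        "maxPredictions": 3,
    },
)
print(json.dumps(payload, indent=2))
```

Keys that are not supplied are simply left out of the payload, so the block's own functional properties apply to them.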
Block properties
Block properties can be set by editing the block.
Language Detector workflow blocks have the following properties:
- Common:
  - The unique block ID and the service version, displayed in the title bar (read only, also displayed in the block tooltip in the canvas).
  - Block name: the name of the block; it can be edited.
  - Description: the description of the processor (read only).
- Type Specific:
  - Timeout: execution timeout expressed in minutes (m) or seconds (s).
- Functional:
  - Detectable languages: the comma-separated list of ISO-639-1 codes (listed in the footnote at the end of this page) of the languages the processor can choose from when making predictions.
  - Propagate input text to output: when turned on, the input key text is echoed in the output JSON. Default: off.
  - Enable "Other language" prediction: enables the prediction of an other label corresponding to languages that are not listed in Detectable languages. Default: on.
  - Max number of predictions: maximum number of predictions. Default: 10.
- Deployment:
  - Replicas: number of required instances
  - Memory: required memory
  - CPU: thousandths of a CPU required (for example: 1000 = 1 CPU)
- Input: these properties correspond to the top-level keys of the input JSON. They need to be set only when input mapping is necessary.
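The precedence between the options input key and the functional properties can be pictured with a short sketch. It is not the processor's implementation, only an illustration of the rule that a property given in options wins over the block's setting. The names BLOCK_PROPERTIES and effective_settings are hypothetical, and the languages value is a placeholder because no default is documented for Detectable languages.

```python
# Defaults taken from the Functional properties listed above.
# "languages" has no documented default; the value here is a placeholder.
BLOCK_PROPERTIES = {
    "languages": "en,de,es,ru",
    "outputText": False,     # Propagate input text to output: off
    "enableOthers": True,    # Enable "Other language" prediction: on
    "maxPredictions": 10,    # Max number of predictions
}


def effective_settings(block_properties, options=None):
    """Return the settings in force for one request: options win when present."""
    settings = dict(block_properties)
    for key, value in (options or {}).items():
        if key in settings:  # ignore keys that map to no functional property
            settings[key] = value
    return settings


settings = effective_settings(
    BLOCK_PROPERTIES, {"maxPredictions": 3, "outputText": True}
)
print(settings)
```

Properties missing from options keep the value configured on the block, which matches the "or, if missing" wording used throughout this page.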
Output
If the input contains only text, with no fragments, the block output has this structure:
{
"prediction": {}
}
If there are also fragments in the input, the output has this structure:
{
"fragmentsPredictions": [],
"prediction": {}
}
If the input key options.outputText is set to true, or if it is missing and the functional property Propagate input text to output is turned on, the output also contains a top-level key text, which echoes the input key text. For example:
{
"prediction": {
"others": [
{
"label": "de",
"score": 0.004342068452388048
},
{
"label": "es",
"score": 0.0034704774152487518
},
{
"label": "ru",
"score": 0.0028054893482476475
}
],
"winner": {
"label": "en",
"score": 0.9228296875953674
}
},
"text": "How to Pick the Right Coffee Table\nWhen you shop for a coffee table you may be overwhelmed by the wealth of choices available. Coffee tables, sometimes called cocktail tables, come in many styles and materials. Whether you have a comfortable farmhouse look, breezy coastal decor or sleek contemporary furniture, you can find the perfect coffee table for your main living space. If you make the coffee table the last piece of furniture you choose for the room, it is easier to judge the right style, color, material, size and shape.\nHere are some guidelines for finding just the right coffee table to hold the remote and a drink when you settle in for a night of relaxation:\n1. Choose a Style\nRemember that as functional as a coffee table may be, it is really an example of accent furniture."
}
The prediction object has this structure:
"prediction": {
"others": [],
"winner": {}
}
winner is an object corresponding to the most likely language prediction for the entire text. It has these properties:
- label (string): ISO-639-1 code of the predicted language
- score (decimal number between 0 and 1): confidence score of the prediction
others is an array with one item for each less likely language.
Each item has the same structure as the winner object, with a label and a confidence score. The items are sorted by the value of the score property in descending order, so the labels with the highest confidence scores come first.
The total number of predictions depends on the values of the block's functional properties, possibly overridden by the options input key.
The set of languages the processor can choose from is determined by the input key options.languages or, if that key is missing, by the Detectable languages property. The other label, which stands for any language outside that set, is added when the input key options.enableOthers is true or, if that key is missing, when the Enable "Other language" prediction property is turned on.
In any case, the total number of predictions is at most equal to the value of the input key options.maxPredictions or, if this key is missing, the value of the Max number of predictions property.
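Read together, these rules suggest an upper bound on how many predictions a request can return. The sketch below is an interpretation of the text, not documented behavior: it assumes that the winner counts toward the total, that the language list is a comma-separated string, and that the other label adds exactly one candidate. The function name max_prediction_count is hypothetical.

```python
def max_prediction_count(languages, enable_others, max_predictions):
    """Upper bound on the number of predictions, under the stated assumptions."""
    codes = [code.strip() for code in languages.split(",") if code.strip()]
    # The "other" label adds one more candidate when it is enabled.
    candidates = len(codes) + (1 if enable_others else 0)
    return min(candidates, max_predictions)


# Four detectable languages plus the "other" label, capped at 10 predictions.
print(max_prediction_count("en,de,es,ru", True, 10))   # 5
# Same candidates, but the cap of 3 is the tighter limit.
print(max_prediction_count("en,de,es,ru", True, 3))    # 3
```

The processor may return fewer predictions than this bound; the page only states the maximum.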
fragmentsPredictions is an array of objects, each of which contains the language predictions for one of the fragments passed in input using the fragments key.
Each item has this structure:
{
"others": [],
"position": {},
"winner": {}
}
where winner and others have the same structure and meaning as the properties of the same name in the prediction object, but scoped to the text fragment, while position contains the fragment's position in the text and echoes the item of the input array fragments.positions that corresponds to the fragment.
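To show how the per-fragment results might be consumed, the sketch below pairs each fragment's position with its winning label. The sample output is invented for illustration: the content of position is defined by the Text Fragmenter block and is treated as opaque here, so the start and end fields, like the function name fragment_winners, are assumptions.

```python
def fragment_winners(output):
    """Return (position, label, score) for each fragment prediction."""
    return [
        (item["position"], item["winner"]["label"], item["winner"]["score"])
        for item in output.get("fragmentsPredictions", [])
    ]


# Hypothetical output; "start" and "end" are invented placeholders for
# whatever the Text Fragmenter block puts in each position object.
output = {
    "fragmentsPredictions": [
        {
            "others": [],
            "position": {"start": 0, "end": 34},
            "winner": {"label": "en", "score": 0.97},
        },
        {
            "others": [],
            "position": {"start": 35, "end": 80},
            "winner": {"label": "de", "score": 0.88},
        },
    ],
    "prediction": {"others": [], "winner": {"label": "en", "score": 0.91}},
}

for position, label, score in fragment_winners(output):
    print(position, label, score)
```

When the input has no fragments, the fragmentsPredictions key is absent and the function returns an empty list.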
1. These are the ISO-639-1 codes of the languages that can be detected: af, als, am, an, ar, arz, asm, ast, av, az, azb, ba, bar, bcl, be, bg, bh, bn, bo, bpy, br, bs, bxr, ca, cbk, ce, ceb, ckb, co, cs, cv, cy, da, de, diq, dsb, dty, dv, el, eml, en, eo, es, et, eu, fa, fi, fr, frr, fy, ga, gd, gl, gn, gom, gu, gv, he, hi, hif, hr, hsb, ht, hu, hy, ia, id, ie, ilo, io, is, it, ja, jbo, jv, ka, kk, km, kn, ko, krc, ku, kv, kw, ky, la, lb, lez, li, lmo, lo, lrc, lt, lv, mai, mg, mhr, min, mk, ml, mn, mr, mrj, ms, mt, mwl, my, myv, mzn, nah, nap, nds, ne, new, nl, nn, no, oc, or, os, pa, pam, pfl, pl, pms, pnb, ps, pt, qu, rm, ru, rue, sa, sah, sc, scn, sco, sd, sh, si, sk, sl, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, tyv, ug, uk, ur, uz, vec, vep, vi, vls, vo, wa, war, wuu, xal, xmf, yi, yo, yue, zh