LinguaLID
Synopsis
Identifies the language of content using Lingua.
This module is experimental.
Description
LinguaLID evaluates target as a list of elements, then evaluates form for each one as a string. The language of evaluated content is predicted using the Lingua library.
The predicted language is stored in the feature specified by languageFeature as an ISO 639-1 two-letter code. Optionally, the confidence score is stored in the feature specified by languageConfidenceFeature.
There may be more than one prediction if languageCandidates is set to a number above 1. The last language value has the highest confidence. Low-confidence predictions can be excluded by setting confidenceThreshold.
The set of predicted languages can be restricted with includeLanguages.
Snippet
<lingualid class="LinguaLID">
</lingualid>
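As an illustration of how the optional parameters described on this page might be combined, here is a minimal sketch of a filled-in snippet. The feature names (lang, lang-confidence), the numeric values, and the comma-separated form of the language list are assumptions chosen for the example, not values documented here; check them against your installation before use.

```xml
<lingualid class="LinguaLID">
  <!-- hypothetical feature names, chosen only for this example -->
  <languageFeature>lang</languageFeature>
  <languageConfidenceFeature>lang-confidence</languageConfidenceFeature>
  <!-- keep up to two predictions per element -->
  <languageCandidates>2</languageCandidates>
  <!-- assumed scale: confidence between 0 and 1 -->
  <confidenceThreshold>0.5</confidenceThreshold>
  <!-- assumed list syntax: comma-separated ISO 639-1 codes -->
  <includeLanguages>en,fr,de</includeLanguages>
</lingualid>
```

With a configuration along these lines, each element would receive at most two language values restricted to English, French, and German, and predictions scoring under the threshold would be left out.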
Mandatory parameters
Optional parameters
includeLanguages
Languages to consider in the prediction. Languages can be specified as ISO 639-1 two-letter codes, ISO 639-3 three-letter codes, or full language names.
languageConfidenceFeature
Feature in which to store the prediction confidence score.
confidenceThreshold
Minimum confidence score; predictions with a lower confidence are excluded.
form
String content of the target (section contents by default).
languageCandidates
Number of languages to predict.
languageFeature
Feature where to store the predicted language.
target
Elements whose language is to be predicted (document.contents by default).