YateaExtractor
Synopsis
Extract terms from the corpus using the YaTeA term extractor.
Description
YateaExtractor hands the corpus to the YaTeA extractor. The corpus is first written to a file in the YaTeA input format. Tokens are the annotations in the layer wordLayer; their surface form, POS tag and lemma are taken from the formFeature, posFeature and lemmaFeature features respectively. If sentenceLayer is set, an additional SENT marker is added to reinforce the sentence boundaries corresponding to annotations in this layer.
YaTeA is called using the executable set in yateaExecutable; the result is written to xmlTermsFile and/or termListFile.
Snippet
<yateaextractor class="YateaExtractor">
    <rcFile></rcFile>
    <yateaExecutable></yateaExecutable>
</yateaextractor>
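The snippet above lists only the mandatory parameters. A fuller configuration might look like the following sketch, which also sets the token and sentence parameters described above and both output files. All paths and all layer and feature names here are placeholder assumptions for illustration, not documented defaults; replace them with the values used in your own plan.

```xml
<yateaextractor class="YateaExtractor">
    <!-- Mandatory parameters (hypothetical paths) -->
    <rcFile>/path/to/yatea.rc</rcFile>
    <yateaExecutable>/path/to/yatea</yateaExecutable>

    <!-- Where tokens and their properties are read from (assumed names) -->
    <wordLayer>words</wordLayer>
    <formFeature>form</formFeature>
    <posFeature>pos</posFeature>
    <lemmaFeature>lemma</lemmaFeature>

    <!-- Optional: adds SENT markers at sentence boundaries (assumed name) -->
    <sentenceLayer>sentences</sentenceLayer>

    <!-- Output files (hypothetical paths); either one or both may be set -->
    <xmlTermsFile>/path/to/output/candidates.xml</xmlTermsFile>
    <termListFile>/path/to/output/termList.txt</termListFile>
</yateaextractor>
```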
Mandatory parameters
rcFile
Path to the YaTeA configuration file.
yateaExecutable
Path to the YaTeA executable file.
Optional parameters
configDir
language
localeDir
outputDir
perlLib
Value of the PERLLIB environment variable set for the YaTeA executable.
postProcessingConfig
BioYaTeA option: path to the post-processing configuration file.
postProcessingOutput
BioYaTeA option: path to the result file after post-processing.
suffix
termListFile
Path where to write the candidates list produced by YaTeA.
testifiedTerminology
xmlTermsFile
Path where to write the candidates XML file produced by YaTeA.
bioYatea
documentFilter
UNDOCUMENTED
documentTokens
Whether to write DOCUMENT special tokens. Not every YaTeA version accepts them.
formFeature
Feature containing the word form.
lemmaFeature
Feature containing the word lemma.
posFeature
Feature containing the word POS tag.
sectionFilter
UNDOCUMENTED
sentenceLayer
Name of the layer containing sentence annotations. If set, sentence boundaries are reinforced with an additional SENT marker.
wordLayer
Name of the layer containing the word annotations.
yateaDefaultConfig
yateaOptions
Deprecated parameters
sentenceLayerName
Deprecated alias for sentenceLayer.
wordLayerName
Deprecated alias for wordLayer.
workingDir
Path to the directory where YaTeA is launched. This parameter is deprecated; use xmlTermsFile and termListFile instead.
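As a rough migration sketch, a plan that still uses the deprecated names could be rewritten with the current ones as shown below. The paths and the layer names are hypothetical placeholders; only the parameter names come from this page.

```xml
<yateaextractor class="YateaExtractor">
    <rcFile>/path/to/yatea.rc</rcFile>
    <yateaExecutable>/path/to/yatea</yateaExecutable>

    <!-- Use wordLayer and sentenceLayer instead of wordLayerName and sentenceLayerName -->
    <wordLayer>words</wordLayer>
    <sentenceLayer>sentences</sentenceLayer>

    <!-- Use explicit output files instead of workingDir -->
    <xmlTermsFile>/path/to/output/candidates.xml</xmlTermsFile>
    <termListFile>/path/to/output/termList.txt</termListFile>
</yateaextractor>
```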