AlvisNLP

corpus processing engine

YateaExtractor

Synopsis

Extract terms from the corpus using the YaTeA term extractor.

Description

YateaExtractor hands the corpus to the YaTeA term extractor. The corpus is first written to a file in the YaTeA input format. Tokens are the annotations in the layer wordLayer; their surface form, POS tag and lemma are taken from the formFeature, posFeature and lemmaFeature features, respectively. If sentenceLayer is set, an additional SENT marker is added to reinforce the sentence boundaries given by the annotations in that layer.

YaTeA is then run using the executable set in yateaExecutable; the result is written to xmlTermsFile and/or termListFile.

Snippet

<yateaextractor class="YateaExtractor">
    <rcFile></rcFile>
    <yateaExecutable></yateaExecutable>
</yateaextractor>
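
As an illustration, the snippet could be filled in as follows. This is only a sketch: every path is a hypothetical placeholder, not a default or recommended location, and should be replaced with the paths of the local YaTeA installation. It sets the two mandatory parameters and asks for both output files described above.

```xml
<yateaextractor class="YateaExtractor">
    <!-- Mandatory: YaTeA configuration file and executable (hypothetical paths) -->
    <rcFile>/path/to/yatea.rc</rcFile>
    <yateaExecutable>/path/to/yatea</yateaExecutable>
    <!-- Optional: where to write the results (hypothetical paths) -->
    <xmlTermsFile>output/candidates.xml</xmlTermsFile>
    <termListFile>output/candidates.txt</termListFile>
</yateaextractor>
```

Either output parameter can be left out if only one of the two result formats is needed.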

Mandatory parameters

rcFile

Mandatory

Path to the YaTeA configuration file.

yateaExecutable

Mandatory

Path to the YaTeA executable file.

Optional parameters

configDir

Optional

language

Optional
Type: String

localeDir

Optional

outputDir

Optional

perlLib

Optional
Type: String

Value of the PERLLIB environment variable set when running the YaTeA executable.

postProcessingConfig

Optional
Type: InputFile

BioYaTeA option: path to the post-processing configuration file.

postProcessingOutput

Optional
Type: OutputFile

BioYaTeA option: path to the result file after post-processing.

suffix

Optional
Type: String

termListFile

Optional
Type: OutputFile

Path to the file where the candidate list produced by YaTeA is written.

testifiedTerminology

Optional

xmlTermsFile

Optional
Type: OutputFile

Path to the file where the candidate XML file produced by YaTeA is written.

bioYatea

Default value: `false`
Type: Boolean

documentFilter

Default value: `true`
Type: Expression

UNDOCUMENTED

documentTokens

Default value: `true`
Type: Boolean

Whether to write DOCUMENT special tokens. Not every YaTeA version accepts them.

formFeature

Default value: `form`
Type: String

Feature containing the word form.

lemmaFeature

Default value: `lemma`
Type: String

Feature containing the word lemma.

posFeature

Default value: `pos`
Type: String

Feature containing the word POS tag.

sectionFilter

Default value: `true and layer:words`
Type: Expression

UNDOCUMENTED

sentenceLayer

Default value: `sentences`
Type: String

Name of the layer containing the sentence annotations. Sentence boundaries are reinforced with an additional SENT marker.

wordLayer

Default value: `words`
Type: String

Name of the layer containing the word annotations.

yateaDefaultConfig

Default value: `{}`
Type: Mapping

yateaOptions

Default value: `{}`
Type: Mapping
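
The token-related parameters can be combined when the corpus does not use the default layer and feature names. The sketch below assumes a corpus whose words are in a layer named tokens, whose sentences are in a layer named sent, and whose POS tags and lemmas are stored in features named tag and canonical; all of these names, and the paths, are hypothetical. The sectionFilter value is an assumption modelled on the default expression shown above, adjusted to the renamed word layer.

```xml
<yateaextractor class="YateaExtractor">
    <!-- Mandatory parameters (hypothetical paths) -->
    <rcFile>/path/to/yatea.rc</rcFile>
    <yateaExecutable>/path/to/yatea</yateaExecutable>
    <!-- Layers holding the word and sentence annotations (hypothetical names) -->
    <wordLayer>tokens</wordLayer>
    <sentenceLayer>sent</sentenceLayer>
    <!-- Assumption: same form as the default filter, pointing at the renamed layer -->
    <sectionFilter>true and layer:tokens</sectionFilter>
    <!-- Features read from each word annotation (hypothetical names) -->
    <formFeature>form</formFeature>
    <posFeature>tag</posFeature>
    <lemmaFeature>canonical</lemmaFeature>
    <!-- Skip DOCUMENT special tokens for YaTeA versions that reject them -->
    <documentTokens>false</documentTokens>
    <xmlTermsFile>output/candidates.xml</xmlTermsFile>
</yateaextractor>
```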

Deprecated parameters

sentenceLayerName

Deprecated
Type: String

Deprecated alias for sentenceLayer.

wordLayerName

Deprecated
Type: String

Deprecated alias for wordLayer.

workingDir

Deprecated

Path to the directory where YaTeA is launched. This parameter is deprecated; use xmlTermsFile and termListFile instead.