AlvisNLP

corpus processing engine

TreeTaggerTermsProjector

Synopsis

Projects terms from a lexicon in TreeTagger format.

Description

TreeTaggerTermsProjector reads `termsFile`, which is assumed to be in the 3-column TreeTagger format. Entries must be separated by a period line (`.`/`SENT`/`.`).
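To make the expected layout concrete, here is a minimal sketch of how such a file could be grouped into entries. It is not the module's actual reader. It assumes tab-separated columns in the usual TreeTagger order (surface form, POS tag, lemma), and that the tokens of an entry are joined with a single space to form its key; `read_entries` and the sample data are hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumed column order: (surface form, POS tag, lemma).
Token = Tuple[str, str, str]

def read_entries(lines: Iterable[str]) -> List[List[Token]]:
    """Group 3-column lines into entries, closing an entry at each ./SENT/. line."""
    entries: List[List[Token]] = []
    current: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # blank lines are ignored (an assumption of this sketch)
        columns = line.split("\t")
        if len(columns) != 3:
            raise ValueError(f"expected 3 tab-separated columns, got: {line!r}")
        form, pos, lemma = columns
        if (form, pos, lemma) == (".", "SENT", "."):
            if current:
                entries.append(current)
            current = []
        else:
            current.append((form, pos, lemma))
    if current:
        entries.append(current)  # tolerate a missing final separator
    return entries

# Hypothetical lexicon with two entries.
sample = [
    "amino\tJJ\tamino\n",
    "acids\tNNS\tacid\n",
    ".\tSENT\t.\n",
    "proteins\tNNS\tprotein\n",
    ".\tSENT\t.\n",
]
entries = read_entries(sample)
# Keys built from surface forms, and from lemmas (compare the lemmaKeys parameter).
surface_keys = [" ".join(form for form, _, _ in entry) for entry in entries]
lemma_keys = [" ".join(lemma for _, _, lemma in entry) for entry in entries]
```

In this sketch the first entry yields the surface key `amino acids` and the lemma key `amino acid`.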

Snippet

<treetaggertermsprojector class="TreeTaggerTermsProjector">
    <targetLayer></targetLayer>
    <termsFile></termsFile>
</treetaggertermsprojector>

Mandatory parameters

targetLayer

Mandatory
Type: String

Name of the layer that contains the match annotations.

termsFile

Mandatory

File from which the entries are read.

Optional parameters

constantAnnotationFeatures

Optional
Type: Mapping

Constant features to add to each annotation created by this module.

trieSink

Optional
Type: OutputFile

If set, then TreeTaggerTermsProjector writes the compiled dictionary to the specified file.

trieSource

Optional
Type: InputFile

If set, read the compiled dictionary from the specified file. Loading a compiled dictionary is usually faster for large dictionaries.

allUpperCaseInsensitive

Default value: `false`
Type: Boolean

If set to `true`, then allow case folding on all characters in words that are all upper case.

allowJoined

Default value: `false`
Type: Boolean

If set to `true`, then allow arbitrary suppression of whitespace characters in the subject. For instance, the contents `aminoacid` matches the key `amino acid`.

caseInsensitive

Default value: `false`
Type: Boolean

If set to `true`, then allow case folding on all characters.

documentFilter

Default value: `true`
Type: Expression

Only process documents that satisfy this expression.

ignoreDiacritics

Default value: `false`
Type: Boolean

If set to `true`, then allow diacritic removal on all characters. For instance, the contents `acide amine` matches the key `acide aminé`.
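One common way to remove diacritics is to decompose the text with Unicode NFD normalization and drop the combining marks. The sketch below illustrates that technique only; whether the module works this way is an assumption, and both helper names are hypothetical. It also strips both sides before comparing, which is a simplification.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def matches_ignoring_diacritics(contents: str, key: str) -> bool:
    """Compare contents and key after removing diacritics on both sides."""
    return strip_diacritics(contents) == strip_diacritics(key)

print(matches_ignoring_diacritics("acide amine", "acide aminé"))
```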

joinDash

Default value: `false`
Type: Boolean

If set to `true`, then treat dash characters (-) as whitespace characters with regard to `allowJoined`. For instance, the contents `aminoacid` matches the entry `amino-acid`.
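The combined effect of allowJoined and joinDash can be pictured as making each separator of the key optional. The following is a rough sketch under that reading, using a regular expression rather than the module's own dictionary structure; `joined_pattern` and `joined_match` are hypothetical names.

```python
import re

def joined_pattern(key: str, join_dash: bool = False) -> "re.Pattern[str]":
    """Build a pattern in which each separator of the key may be dropped."""
    parts = []
    for ch in key:
        if ch.isspace() or (join_dash and ch == "-"):
            parts.append(f"(?:{re.escape(ch)})?")  # separator is optional
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def joined_match(contents: str, key: str, join_dash: bool = False) -> bool:
    """True if the whole contents matches the key with separators optionally removed."""
    return joined_pattern(key, join_dash).fullmatch(contents) is not None

print(joined_match("aminoacid", "amino acid"))                  # allowJoined only
print(joined_match("aminoacid", "amino-acid"))                  # dash not joined
print(joined_match("aminoacid", "amino-acid", join_dash=True))  # with joinDash
```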

lemmaFeature

Default value: `lemma`
Type: String

Feature where to store the term lemma.

lemmaKeys

Default value: `false`
Type: Boolean

Use lemmas as keys instead of the surface form.

matchStartCaseInsensitive

Default value: `false`
Type: Boolean

If set to `true`, then allow case folding on the first character of the entry key.

multipleEntryBehaviour

Default value: `all`

Specifies the behavior if the lexicon contains several entries with the same key.

posFeature

Default value: `pos`
Type: String

Feature where to store the POS tag of matches.

sectionFilter

Default value: `true`
Type: Expression

Process only sections that satisfy this expression.

skipConsecutiveWhitespaces

Default value: `false`
Type: Boolean

If set to `true`, then allow the insertion of consecutive whitespace characters in the subject. For instance, the contents `amino  acid` (two spaces) matches the entry `amino acid`.

skipWhitespace

Default value: `false`
Type: Boolean

If set to `true`, then allow arbitrary insertion of whitespace characters in the subject. For instance, the contents `amino acid` matches the key `aminoacid`.
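The difference between skipConsecutiveWhitespaces and skipWhitespace can be illustrated as follows. This is only a sketch of the described behavior, not the module's code: the first helper collapses runs of whitespace, the second tolerates whitespace between any two characters of the key. Both helper names are hypothetical.

```python
import re

def collapse_whitespace_runs(text: str) -> str:
    """skipConsecutiveWhitespaces-like: keep only the first character of each whitespace run."""
    return re.sub(r"(\s)\s+", r"\1", text)

def skip_whitespace_pattern(key: str) -> "re.Pattern[str]":
    """skipWhitespace-like: whitespace may appear between any two key characters."""
    return re.compile(r"\s*".join(re.escape(ch) for ch in key))

# Extra whitespace inside an existing gap is tolerated.
print(collapse_whitespace_runs("amino  acid") == "amino acid")
# Whitespace inserted where the key has none is tolerated.
print(skip_whitespace_pattern("aminoacid").fullmatch("amino acid") is not None)
```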

subject

Default value: `WORD`
Type: Subject

Specifies the contents to match.

substituteWhitespace

Default value: `false`
Type: Boolean

If set to `true`, then all whitespace characters match each other (including ‘\n’, ‘\r’, ‘\t’, and non-breaking spaces).
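A simple way to picture this option is to map every whitespace character to a plain space before comparing. The sketch below assumes that reading; `substitute_whitespace` is a hypothetical helper, and it relies on Python's `str.isspace`, which also covers the non-breaking space (U+00A0).

```python
def substitute_whitespace(text: str) -> str:
    """Replace every whitespace character (tab, newline, NBSP, ...) with a plain space."""
    return "".join(" " if ch.isspace() else ch for ch in text)

key = "amino acid"
for contents in ("amino\tacid", "amino\nacid", "amino\u00a0acid"):
    print(substitute_whitespace(contents) == substitute_whitespace(key))
```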

termFeature

Default value: `term`
Type: String

Feature where to store the term surface form.

wordStartCaseInsensitive

Default value: `false`
Type: Boolean

If set to `true`, then allow case folding on the first character of each word.
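The four case-folding options (caseInsensitive, allUpperCaseInsensitive, matchStartCaseInsensitive, wordStartCaseInsensitive) differ only in which character positions may be folded. The sketch below illustrates that idea under two assumptions: words are delimited by whitespace, and "all upper case" is evaluated on the words of the contents. The function names and the use of the parameter names as `mode` strings are hypothetical.

```python
import re
from typing import List, Optional

def foldable_positions(contents: str, mode: Optional[str]) -> List[bool]:
    """Mark the positions of contents where case folding is allowed."""
    n = len(contents)
    if mode == "caseInsensitive":
        return [True] * n
    flags = [False] * n
    if mode == "matchStartCaseInsensitive":
        if n:
            flags[0] = True  # first character of the match only
    elif mode == "wordStartCaseInsensitive":
        for i, ch in enumerate(contents):
            if not ch.isspace() and (i == 0 or contents[i - 1].isspace()):
                flags[i] = True  # first character of each word
    elif mode == "allUpperCaseInsensitive":
        for word in re.finditer(r"\S+", contents):
            if word.group(0).isupper():
                for i in range(word.start(), word.end()):
                    flags[i] = True  # every character of an all-upper-case word
    return flags

def case_match(contents: str, key: str, mode: Optional[str] = None) -> bool:
    """Compare contents and key, folding case only where the mode allows it."""
    if len(contents) != len(key):
        return False
    flags = foldable_positions(contents, mode)
    return all(
        c == k or (fold and c.lower() == k.lower())
        for c, k, fold in zip(contents, key, flags)
    )

print(case_match("Amino acid", "amino acid", "matchStartCaseInsensitive"))
print(case_match("Amino Acid", "amino acid", "wordStartCaseInsensitive"))
print(case_match("AMINO acid", "amino acid", "allUpperCaseInsensitive"))
```

With no mode set, the comparison in this sketch is exact, which corresponds to leaving all four options at `false`.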

Deprecated parameters

lemmaFeatureName

Deprecated
Type: String

Deprecated alias for `lemmaFeature`.

posFeatureName

Deprecated
Type: String

Deprecated alias for `posFeature`.

targetLayerName

Deprecated
Type: String

Deprecated alias for `targetLayer`.

termFeatureName

Deprecated
Type: String

Deprecated alias for `termFeature`.