TreeTaggerTermsProjector
Synopsis
Projects terms from a lexicon in TreeTagger format.
Description
TreeTaggerTermsProjector reads termsFile, which it assumes to be in the 3-column TreeTagger format. Entries must be separated by a period line (./SENT/.), that is, a line whose three columns are ".", "SENT" and ".".
Snippet
<treetaggertermsprojector class="TreeTaggerTermsProjector">
<targetLayer></targetLayer>
<termsFile></termsFile>
</treetaggertermsprojector>
Mandatory parameters
targetLayer
Name of the layer that contains the match annotations.
termsFile
File from which to read entries.
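As an illustration, the snippet above could be filled in as follows. This is a minimal sketch: the layer name terms and the path lexicon/terms.tt are hypothetical, and the lexicon layout shown in the comment assumes the usual TreeTagger column order (token, POS tag, lemma), which this page does not spell out.

```xml
<treetaggertermsprojector class="TreeTaggerTermsProjector">
  <!-- Hypothetical layer name: match annotations are added to this layer. -->
  <targetLayer>terms</targetLayer>
  <!-- Hypothetical path to a 3-column lexicon. Assumed layout: one token per
       line, tab-separated columns, each entry closed by a period line:
         amino   JJ     amino
         acid    NN     acid
         .       SENT   .
  -->
  <termsFile>lexicon/terms.tt</termsFile>
</treetaggertermsprojector>
```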
Optional parameters
constantAnnotationFeatures
Constant features to add to each annotation created by this module.
trieSink
If set, then TreeTaggerTermsProjector writes the compiled dictionary to the specified file.
trieSource
If set, read the compiled dictionary from the specified file. Compiled dictionaries are usually faster for large dictionaries.
allUpperCaseInsensitive
If set to true, then allow case folding on all characters in words that are all upper case.
allowJoined
If set to true, then allow arbitrary suppression of whitespace characters in the subject. For instance, the contents aminoacid matches the key amino acid.
caseInsensitive
If set to true, then allow case folding on all characters.
documentFilter
Process only documents that satisfy this expression.
ignoreDiacritics
If set to true, then allow diacritic removal on all characters. For instance, the contents acide amine matches the key acide aminé.
joinDash
If set to true, then treat dash characters (-) as whitespace characters with regard to allowJoined. For instance, the contents aminoacid matches the entry amino-acid.
lemmaFeature
Feature in which to store the term lemma.
lemmaKeys
Use lemmas as keys instead of the surface form.
matchStartCaseInsensitive
If set to true, then allow case folding on the first character of the entry key.
multipleEntryBehaviour
Specifies the behaviour when the lexicon contains several entries with the same key.
posFeature
Feature in which to store the POS tag of matches.
sectionFilter
Process only sections that satisfy this expression.
skipConsecutiveWhitespaces
If set to true, then allow the insertion of consecutive whitespace characters in the subject. For instance, the contents amino  acid (with two consecutive spaces) matches the entry amino acid.
skipWhitespace
If set to true, then allow arbitrary insertion of whitespace characters in the subject. For instance, the contents amino acid matches the key aminoacid.
subject
Specifies the contents to match.
substituteWhitespace
If set to true, then all whitespace characters match each other (including '\n', '\r', '\t', and non-breaking spaces).
termFeature
Feature in which to store the term surface form.
wordStartCaseInsensitive
If set to true, then allow case folding on the first character of each word.
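The optional parameters can be combined in a single module declaration. The sketch below relaxes matching and stores the lexicon columns as features. All values are hypothetical (layer name, file path, and the feature names term, pos and lemma), and the boolean parameters are assumed to be written as the element text true.

```xml
<treetaggertermsprojector class="TreeTaggerTermsProjector">
  <targetLayer>terms</targetLayer>
  <termsFile>lexicon/terms.tt</termsFile>

  <!-- Matching relaxations: fold case, drop diacritics, and tolerate
       runs of consecutive whitespace in the subject. -->
  <caseInsensitive>true</caseInsensitive>
  <ignoreDiacritics>true</ignoreDiacritics>
  <skipConsecutiveWhitespaces>true</skipConsecutiveWhitespaces>

  <!-- Hypothetical feature names for the three lexicon columns. -->
  <termFeature>term</termFeature>
  <posFeature>pos</posFeature>
  <lemmaFeature>lemma</lemmaFeature>
</treetaggertermsprojector>
```

Going by the parameter descriptions, this configuration should let the contents ACIDE AMINE match an entry acide aminé, since both case folding and diacritic removal are allowed.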
Deprecated parameters
lemmaFeatureName
Deprecated alias for lemmaFeature.
posFeatureName
Deprecated alias for posFeature.
targetLayerName
Deprecated alias for targetLayer.
termFeatureName
Deprecated alias for termFeature.
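A configuration that still uses the deprecated names can be updated by renaming the elements, as sketched below. The values (terms, lemma, pos, term, and the file path) are hypothetical placeholders; only the element names come from the parameter lists on this page.

```xml
<treetaggertermsprojector class="TreeTaggerTermsProjector">
  <!-- was: <targetLayerName>terms</targetLayerName> -->
  <targetLayer>terms</targetLayer>
  <termsFile>lexicon/terms.tt</termsFile>
  <!-- was: <lemmaFeatureName>lemma</lemmaFeatureName> -->
  <lemmaFeature>lemma</lemmaFeature>
  <!-- was: <posFeatureName>pos</posFeatureName> -->
  <posFeature>pos</posFeature>
  <!-- was: <termFeatureName>term</termFeatureName> -->
  <termFeature>term</termFeature>
</treetaggertermsprojector>
```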