AlvisNLP

corpus processing engine

TomapProjector

Synopsis

TomapProjector searches for terms and assigns each one a category identifier using ToMap.

Description

TomapProjector searches for the terms specified by yateaFile (in YaTeA XML output format) and classifies them using the ToMap classifier specified by tomapClassifier.

Snippet

<tomapprojector class="TomapProjector">
    <conceptFeature></conceptFeature>
    <targetLayer></targetLayer>
    <tomapClassifier></tomapClassifier>
    <yateaFile></yateaFile>
</tomapprojector>

Mandatory parameters

conceptFeature

Mandatory
Type: String

Name of the feature in which to store the concept identifier.

If not set, the concept identifier will not be stored.

targetLayer

Mandatory
Type: String

Name of the layer that contains the match annotations.

tomapClassifier

Mandatory

Path to the file containing proxy terms, their associated identifiers, and their syntactic structure. Generate this file with TomapTrain.

yateaFile

Mandatory

Path to the file containing extracted terms. This file may be generated with YateaExtractor.
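
As an illustration, the four mandatory parameters might be filled in as follows. This is a minimal sketch: the feature name `concept-id`, the layer name `tomap-terms`, and both file paths are hypothetical placeholders chosen for this example, not defaults of the module.

```xml
<tomapprojector class="TomapProjector">
    <!-- Hypothetical feature name that receives the concept identifier -->
    <conceptFeature>concept-id</conceptFeature>
    <!-- Hypothetical layer name for the match annotations -->
    <targetLayer>tomap-terms</targetLayer>
    <!-- Placeholder path to a classifier file generated with TomapTrain -->
    <tomapClassifier>path/to/tomap-classifier.xml</tomapClassifier>
    <!-- Placeholder path to a term file in YaTeA XML output format -->
    <yateaFile>path/to/yatea-candidates.xml</yateaFile>
</tomapprojector>
```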

Optional parameters

constantAnnotationFeatures

Optional
Type: Mapping

Constant features to add to each annotation created by this module.

explanationFeaturePrefix

Optional
Type: String

Prefix of feature names for the assignment explanation. Features are concept-synonym, significant-head, and candidate-head.

If not set, the features will not be stored.

scoreFeature

Optional
Type: String

Feature in which to store the similarity between the candidate and proxy terms.

If not set, the similarity will not be stored.
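
The following sketch shows how explanationFeaturePrefix and scoreFeature might be set together to keep a trace of each assignment. The prefix `tomap-`, the feature name `tomap-score`, and all values of the mandatory parameters are hypothetical. With this prefix, the explanation features would presumably be named `tomap-concept-synonym`, `tomap-significant-head`, and `tomap-candidate-head`; that naming is an assumption based on the description of the prefix, not confirmed behaviour.

```xml
<tomapprojector class="TomapProjector">
    <!-- Mandatory parameters; all values are placeholders -->
    <conceptFeature>concept-id</conceptFeature>
    <targetLayer>tomap-terms</targetLayer>
    <tomapClassifier>path/to/tomap-classifier.xml</tomapClassifier>
    <yateaFile>path/to/yatea-candidates.xml</yateaFile>
    <!-- Hypothetical prefix for the assignment explanation features -->
    <explanationFeaturePrefix>tomap-</explanationFeaturePrefix>
    <!-- Hypothetical feature name for the candidate/proxy similarity -->
    <scoreFeature>tomap-score</scoreFeature>
</tomapprojector>
```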

trieSink

Optional
Type: OutputFile

If set, then TomapProjector writes the compiled dictionary to the specified file.

trieSource

Optional
Type: InputFile

If set, TomapProjector reads the compiled dictionary from the specified file. Reading a compiled dictionary is usually faster when the dictionary is large.
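
One possible way to use trieSink and trieSource is to write the compiled dictionary during a first run and read it back in later runs. The sketch below shows the first run; the path `path/to/tomap.trie` and the values of the mandatory parameters are hypothetical. Whether the classifier file is still read when trieSource is set is not stated here, so the mandatory parameters are kept in both cases.

```xml
<tomapprojector class="TomapProjector">
    <!-- Mandatory parameters; all values are placeholders -->
    <conceptFeature>concept-id</conceptFeature>
    <targetLayer>tomap-terms</targetLayer>
    <tomapClassifier>path/to/tomap-classifier.xml</tomapClassifier>
    <yateaFile>path/to/yatea-candidates.xml</yateaFile>
    <!-- First run: write the compiled dictionary to a placeholder path -->
    <trieSink>path/to/tomap.trie</trieSink>
    <!-- Later runs: replace trieSink with
         <trieSource>path/to/tomap.trie</trieSource> -->
</tomapprojector>
```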

allUpperCaseInsensitive

Default value: `false`
Type: Boolean

If set to `true`, then allow case folding on all characters in words that are all upper case.

allowJoined

Default value: `false`
Type: Boolean

If set to `true`, then allow arbitrary suppression of whitespace characters in the subject. For instance, the contents `aminoacid` matches the key `amino acid`.

caseInsensitive

Default value: `false`
Type: Boolean

If set to `true`, then allow case folding on all characters.

documentFilter

Default value: `true`
Type: Expression

Process only documents that satisfy this expression.

ignoreDiacritics

Default value: `false`
Type: Boolean

If set to `true`, then allow diacritic removal on all characters. For instance, the contents `acide amine` matches the key `acide aminé`.

joinDash

Default value: `false`
Type: Boolean

If set to `true`, then treat dash characters (`-`) as whitespace characters with regard to allowJoined. For instance, the contents `aminoacid` matches the entry `amino-acid`.
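
The sketch below combines allowJoined, joinDash, and ignoreDiacritics, which might suit text where terms are written inconsistently. Going by the descriptions of these parameters, the contents `aminoacid` would then match both `amino acid` and `amino-acid`, and `acide amine` would match `acide aminé`. All values of the mandatory parameters are hypothetical.

```xml
<tomapprojector class="TomapProjector">
    <!-- Mandatory parameters; all values are placeholders -->
    <conceptFeature>concept-id</conceptFeature>
    <targetLayer>tomap-terms</targetLayer>
    <tomapClassifier>path/to/tomap-classifier.xml</tomapClassifier>
    <yateaFile>path/to/yatea-candidates.xml</yateaFile>
    <!-- Allow whitespace in a key to be dropped in the subject -->
    <allowJoined>true</allowJoined>
    <!-- Treat dashes like whitespace with regard to allowJoined -->
    <joinDash>true</joinDash>
    <!-- Allow diacritic removal on all characters -->
    <ignoreDiacritics>true</ignoreDiacritics>
</tomapprojector>
```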

lemmaKeys

Default value: `false`
Type: Boolean

Compare candidate and proxy terms by their lemma. By default TomapProjector compares their surface forms.

This parameter also affects how significant components and token heads are matched.

matchStartCaseInsensitive

Default value: `false`
Type: Boolean

If set to `true`, then allow case folding on the first character of the entry key.

multipleEntryBehaviour

Default value: `all`

Specifies the behavior if the lexicon contains several entries with the same key.

onlyMNP

Default value: `false`
Type: Boolean

Search only for maximal noun phrase terms. By default, TomapProjector searches for all terms.
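
As a sketch, lemmaKeys and onlyMNP could be enabled together to compare terms by lemma and restrict the search to maximal noun phrases. Whether this combination is appropriate depends on the corpus and on how the classifier was trained. All values of the mandatory parameters are hypothetical.

```xml
<tomapprojector class="TomapProjector">
    <!-- Mandatory parameters; all values are placeholders -->
    <conceptFeature>concept-id</conceptFeature>
    <targetLayer>tomap-terms</targetLayer>
    <tomapClassifier>path/to/tomap-classifier.xml</tomapClassifier>
    <yateaFile>path/to/yatea-candidates.xml</yateaFile>
    <!-- Compare candidate and proxy terms by lemma instead of surface form -->
    <lemmaKeys>true</lemmaKeys>
    <!-- Search only for maximal noun phrase terms -->
    <onlyMNP>true</onlyMNP>
</tomapprojector>
```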

sectionFilter

Default value: `true`
Type: Expression

Process only sections that satisfy this expression.
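
The sketch below restricts processing with documentFilter and sectionFilter. The expressions are assumptions: they presume that `@name` reads a feature in the AlvisNLP expression language, that documents carry a hypothetical `lang` feature, and that sections are named `abstract`. Check the expression syntax and the feature names against your own corpus before relying on it. All values of the mandatory parameters are hypothetical as well.

```xml
<tomapprojector class="TomapProjector">
    <!-- Mandatory parameters; all values are placeholders -->
    <conceptFeature>concept-id</conceptFeature>
    <targetLayer>tomap-terms</targetLayer>
    <tomapClassifier>path/to/tomap-classifier.xml</tomapClassifier>
    <yateaFile>path/to/yatea-candidates.xml</yateaFile>
    <!-- Assumed expression: keep documents whose "lang" feature is "en" -->
    <documentFilter>@lang == "en"</documentFilter>
    <!-- Assumed expression: keep sections named "abstract" -->
    <sectionFilter>@name == "abstract"</sectionFilter>
</tomapprojector>
```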

skipConsecutiveWhitespaces

Default value: `false`
Type: Boolean

If set to `true`, then allow the insertion of consecutive whitespace characters in the subject. For instance, the contents `amino  acid` (with two spaces) matches the entry `amino acid`.

skipWhitespace

Default value: `false`
Type: Boolean

If set to `true`, then allow arbitrary insertion of whitespace characters in the subject. For instance, the contents `amino acid` matches the key `aminoacid`.

subject

Default value: `WORD`
Type: Subject

Specifies the contents to match.

substituteWhitespace

Default value: `false`
Type: Boolean

If set to `true`, then all whitespace characters match each other (including `\n`, `\r`, `\t`, and non-breaking spaces).
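
For text where line breaks or runs of spaces interrupt terms, the whitespace options might be combined as in the sketch below. Going by their descriptions, a term split across a line break or padded with extra spaces would then still match its entry. All values of the mandatory parameters are hypothetical.

```xml
<tomapprojector class="TomapProjector">
    <!-- Mandatory parameters; all values are placeholders -->
    <conceptFeature>concept-id</conceptFeature>
    <targetLayer>tomap-terms</targetLayer>
    <tomapClassifier>path/to/tomap-classifier.xml</tomapClassifier>
    <yateaFile>path/to/yatea-candidates.xml</yateaFile>
    <!-- Allow runs of whitespace where the entry has a single one -->
    <skipConsecutiveWhitespaces>true</skipConsecutiveWhitespaces>
    <!-- Let newlines, tabs, and non-breaking spaces match a plain space -->
    <substituteWhitespace>true</substituteWhitespace>
</tomapprojector>
```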

wordStartCaseInsensitive

Default value: `false`
Type: Boolean

If set to `true`, then allow case folding on the first character of each word.
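
As a more conservative alternative to caseInsensitive, the sketch below relaxes case only at the start of each word and in words that are entirely upper case. It assumes the two options can be enabled together, which their descriptions suggest but do not state. All values of the mandatory parameters are hypothetical.

```xml
<tomapprojector class="TomapProjector">
    <!-- Mandatory parameters; all values are placeholders -->
    <conceptFeature>concept-id</conceptFeature>
    <targetLayer>tomap-terms</targetLayer>
    <tomapClassifier>path/to/tomap-classifier.xml</tomapClassifier>
    <yateaFile>path/to/yatea-candidates.xml</yateaFile>
    <!-- Allow case folding on the first character of each word -->
    <wordStartCaseInsensitive>true</wordStartCaseInsensitive>
    <!-- Allow case folding on all characters of all-upper-case words -->
    <allUpperCaseInsensitive>true</allUpperCaseInsensitive>
</tomapprojector>
```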

Deprecated parameters

targetLayerName

Deprecated
Type: String

Deprecated alias for targetLayer.