TabularProjector

Synopsis

Searches the section contents for entries specified in a tabular text file.

Description

TabularProjector reads a list of entries from dictFile and searches for each entry key in the section contents. The dictionary contains one entry per line. Each line is split into columns separated by tab characters. The column specified by keyIndex is the entry key to search for; the other columns are data associated with the entry.
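The dictionary format described above can be illustrated with a minimal Python sketch. It is not TabularProjector's own code: the function name read_dictionary, the zero-based key_index and the sample lines are assumptions made for illustration only.

```python
def read_dictionary(lines, key_index=0, separator="\t"):
    """Return a list of (key, columns) pairs, one per non-empty line.

    Assumptions of this sketch: key_index counts from zero and
    empty lines are skipped.
    """
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        columns = line.split(separator)
        entries.append((columns[key_index], columns))
    return entries


# Made-up sample dictionary: one entry per line, tab-separated columns.
sample = [
    "aspirin\tdrug\tD001\n",
    "ibuprofen\tdrug\tD002\n",
]
entries = read_dictionary(sample, key_index=0)
```

Each pair holds the key to be searched and the full list of columns that will later become annotation features.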

The parameters skipBlank, skipEmpty, strictColumnNumber, trimColumns, separator and multipleEntryBehaviour control how TabularProjector reads the dictionary file.

The parameters allowJoined, allUpperCaseInsensitive, caseInsensitive, ignoreDiacritics, joinDash, matchStartCaseInsensitive, skipConsecutiveWhitespaces, skipWhitespace and wordStartCaseInsensitive control how the keys can match the section contents.
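As a rough illustration of what relaxed matching can mean, the following Python sketch normalizes a string before comparison. It covers only two of the options, and it reflects an assumed reading of caseInsensitive and ignoreDiacritics based on their names, not the module's actual matching algorithm. The function name normalize is hypothetical.

```python
import unicodedata


def normalize(text, case_insensitive=False, ignore_diacritics=False):
    """Return a comparison form of text (assumed semantics, see lead-in)."""
    if ignore_diacritics:
        # Decompose accented characters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if case_insensitive:
        text = text.casefold()
    return text


# A key and a text fragment match if their comparison forms are equal.
same = normalize("Café", True, True) == normalize("cafe", True, True)
```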

The subject parameter specifies which text of the section should be matched. There are two alternatives:

the entries are matched on the contents of the section (the default); subject can also control whether match boundaries coincide with word delimiters;
the entries are matched on the values of a specified feature of the annotations in a given layer, separated by whitespace; in this way entries can be searched against word lemmas, for instance.
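The second alternative can be pictured with a small Python sketch that builds the text to be searched from feature values. The annotation representation (plain dicts), the function name subject_from_features and the sample tokens are hypothetical; the module's internal data structures are not described here.

```python
def subject_from_features(annotations, feature):
    """Join the feature values with single spaces.

    Also returns (start, end, annotation index) spans so that a match in
    the joined string can be traced back to the annotations it covers.
    """
    parts = []
    spans = []
    position = 0
    for index, annotation in enumerate(annotations):
        value = annotation[feature]
        spans.append((position, position + len(value), index))
        parts.append(value)
        position += len(value) + 1  # +1 for the separating space
    return " ".join(parts), spans


# Made-up word annotations carrying a lemma feature.
words = [
    {"form": "mice", "lemma": "mouse"},
    {"form": "were", "lemma": "be"},
    {"form": "fed", "lemma": "feed"},
]
subject_text, subject_spans = subject_from_features(words, "lemma")
```

With this subject, a dictionary key such as mouse is searched in the lemma string rather than in the surface text, so it can match the word mice.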

TabularProjector creates an annotation for each matched key and adds these annotations to the layer specified by targetLayer. The created annotations will have features that correspond to the entry columns. Feature keys are specified by valueFeatures. For instance, if valueFeatures is [a,b,c], then each annotation will have three features named a, b and c with the respective values of the entry’s first, second and third columns. A feature name left blank in valueFeatures will not create a feature. Thus, in order to drop the first column of the entry, valueFeatures should be [,b,c]. In addition, the created annotations will have the constant features specified in constantAnnotationFeatures.
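The mapping from entry columns to annotation features can be sketched in Python as follows. The function name build_features and the sample values are hypothetical. How the module resolves a name clash between a column feature and a constant feature is not stated here, so the sketch simply lets constant features win and avoids clashes in the example.

```python
def build_features(columns, value_features, constant_features=None):
    """Map entry columns to feature names; blank names drop the column."""
    features = {}
    for name, value in zip(value_features, columns):
        if name:  # a blank name creates no feature
            features[name] = value
    # Assumption of this sketch: constant features are added last.
    features.update(constant_features or {})
    return features


columns = ["aspirin", "drug", "D001"]  # made-up entry
all_three = build_features(columns, ["a", "b", "c"])
without_first = build_features(columns, ["", "b", "c"], {"source": "dict"})
```

Here all_three carries the features a, b and c, while without_first carries only b and c plus the constant feature source.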

If trieSource is specified, then TabularProjector assumes that this file contains a compiled version of the dictionary. In this case dictFile is not read.

If trieSink is specified, TabularProjector writes a compiled version of the dictionary to that file. Using a compiled dictionary may speed up processing for large dictionaries.
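To give an intuition of what a compiled dictionary is, the Python sketch below builds a character trie from entries, searches a text with it, and round-trips it through serialization. This is an illustration under assumptions: the actual file format and trie layout used by TabularProjector are not described here, pickle merely stands in for them, and the names compile_trie and find_matches are hypothetical.

```python
import pickle


def compile_trie(entries):
    """Build a nested-dict trie; the None key holds the entries ending at a node."""
    root = {}
    for key, columns in entries:
        node = root
        for char in key:
            node = node.setdefault(char, {})
        node.setdefault(None, []).append(columns)
    return root


def find_matches(trie, text):
    """Return (start, end, columns) for every key occurrence in text."""
    matches = []
    for start in range(len(text)):
        node = trie
        for end in range(start, len(text)):
            node = node.get(text[end])
            if node is None:
                break
            for columns in node.get(None, []):
                matches.append((start, end + 1, columns))
    return matches


# Made-up entries and text.
entries = [
    ("aspirin", ["aspirin", "drug"]),
    ("aspirin tablet", ["aspirin tablet", "form"]),
]
trie = compile_trie(entries)
restored = pickle.loads(pickle.dumps(trie))  # stand-in for trieSink / trieSource
matches = find_matches(restored, "one aspirin tablet daily")
```

Loading a prebuilt structure avoids parsing and indexing the tabular file again on each run, which is where the benefit for large dictionaries would come from.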

Snippet

<tabularprojector class="TabularProjector">
    <dictFile></dictFile>
    <targetLayer></targetLayer>
</tabularprojector>

Mandatory parameters

dictFile

Mandatory

Type: SourceStream

The dictionary.

targetLayer

Mandatory

Type: String

Name of the layer that contains the match annotations.

Optional parameters

constantAnnotationFeatures

Optional

Type: Mapping

Constant features to add to each annotation created by this module.

trieSink

Optional

Type: OutputFile

If set, then TabularProjector writes the compiled dictionary to the specified file.

trieSource

Optional

Type: InputFile

If set, read the compiled dictionary from the specified file. Compiled dictionaries are usually faster for large dictionaries.

valueFeatures

Optional

Type: String[]

Target features in match annotations. The values are the columns in the entry. Ignored if headerLine is set (unless trieSource is set).

allUpperCaseInsensitive

Default value: `false`
