AlvisNLP

corpus processing engine

OgmiosTokenizer

Synopsis

Tokenizes the section contents according to the Ogmios tokenizer specifications.

Description

OgmiosTokenizer creates an annotation for each token found in the section contents, according to the Ogmios tokenizer specifications, and adds these annotations to the layer named by targetLayer. Each created annotation carries the feature named by tokenTypeFeature, whose value is one of: alpha, num, sep, symb.

If separatorTokens is false, the OgmiosTokenizer does not create annotations corresponding to whitespace tokens.

Snippet

<ogmiostokenizer class="OgmiosTokenizer">
    <targetLayer></targetLayer>
    <tokenTypeFeature></tokenTypeFeature>
</ogmiostokenizer>
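
As an illustration, a filled-in version of the snippet might look like the following. The values words and tokenType are hypothetical names chosen for this example; they are not defaults defined by the module.

```xml
<!-- Sketch only: "words" and "tokenType" are hypothetical names -->
<ogmiostokenizer class="OgmiosTokenizer">
    <!-- layer that receives one annotation per token -->
    <targetLayer>words</targetLayer>
    <!-- feature set to alpha, num, sep or symb on each token -->
    <tokenTypeFeature>tokenType</tokenTypeFeature>
</ogmiostokenizer>
```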

Mandatory parameters

targetLayer

Mandatory
Type: String

Name of the layer in which to store the tokens.

tokenTypeFeature

Mandatory
Type: String

Name of the token feature in which to store the token type (alpha, num, sep, symb).

Optional parameters

constantAnnotationFeatures

Optional
Type: Mapping

Constant features to add to each annotation created by this module.

documentFilter

Default value: `true`
Type: Expression

Process only documents that satisfy this filter.

sectionFilter

Default value: `true`
Type: Expression

Process only sections that satisfy this filter.

separatorTokens

Default value: `true`
Type: Boolean

Whether annotations should be created for separator tokens.
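
To leave separator tokens out of the target layer, this parameter can be set alongside the mandatory ones. The following is a sketch: the layer and feature names are hypothetical, and writing the Boolean as the literal false is an assumption based on the default value shown above.

```xml
<!-- Sketch only: "words" and "tokenType" are hypothetical names -->
<ogmiostokenizer class="OgmiosTokenizer">
    <targetLayer>words</targetLayer>
    <tokenTypeFeature>tokenType</tokenTypeFeature>
    <!-- assumed Boolean syntax; overrides the default value true -->
    <separatorTokens>false</separatorTokens>
</ogmiostokenizer>
```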

Deprecated parameters

targetLayerName

Deprecated
Type: String

Deprecated alias for targetLayer.