AlvisNLP

corpus processing engine

NGrams

Synopsis

Computes annotation n-grams.

Description

NGrams computes the n-grams of the annotations in tokenLayer and creates an annotation for each n-gram. If sentenceLayer is set, no n-gram crosses the boundaries of annotations in that layer. If keepAnnotations is set, NGrams searches these layers for annotations that have the same boundaries as the n-gram; if such an annotation is found, it is recycled instead of a new annotation being created.
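The boundary rule can be illustrated with a minimal Python sketch. This is not the AlvisNLP implementation: the function name `ngram_spans`, the representation of annotations as `(start, end)` character offsets, and the choice to emit every size from 1 up to the maximum are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # hypothetical stand-in: (start, end) offsets, end exclusive


def ngram_spans(tokens: List[Span], max_size: int,
                sentences: Optional[List[Span]] = None) -> List[Span]:
    """Return the spans of all n-grams of 1..max_size consecutive tokens.

    If sentences is given, only tokens contained in the same sentence
    are combined, so no n-gram crosses a sentence boundary.
    """
    if sentences is None:
        groups = [sorted(tokens)]
    else:
        groups = [
            sorted(t for t in tokens if s_start <= t[0] and t[1] <= s_end)
            for (s_start, s_end) in sentences
        ]
    spans = []
    for group in groups:
        for i in range(len(group)):
            for n in range(1, max_size + 1):
                if i + n > len(group):
                    break  # not enough tokens left in this group
                # The n-gram runs from the first token's start to the last token's end.
                spans.append((group[i][0], group[i + n - 1][1]))
    return spans


# Text "A b. C d": four tokens, two sentences.
tokens = [(0, 1), (2, 3), (5, 6), (7, 8)]
sentences = [(0, 4), (5, 8)]
with_sentences = ngram_spans(tokens, 2, sentences)
without_sentences = ngram_spans(tokens, 2)
```

With the sentence spans supplied, the bigram covering "b. C" (offsets 2 to 6) is not produced; without them it is.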

Snippet

<ngrams class="NGrams">
    <maxNGramSize></maxNGramSize>
    <targetLayer></targetLayer>
</ngrams>

Mandatory parameters

maxNGramSize

Mandatory
Type: Integer

Maximum number of tokens in n-grams.

targetLayer

Mandatory
Type: String

Name of the layer to which n-gram annotations are added. Recycled annotations are also added to this layer.

Optional parameters

constantAnnotationFeatures

Optional
Type: Mapping

Constant features to add to each annotation created by this module.

documentFilter

Default value: `true`
Type: Expression

Only process documents that satisfy this expression.

keepAnnotations

Default value: ``
Type: String[]

Names of the layers in which to search for annotations to recycle.
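A rough Python sketch of the recycling lookup follows. It is only an illustration under stated assumptions: annotations are modelled as plain dictionaries, `add_ngram` is a hypothetical helper, and the sketch reuses the first annotation with matching boundaries, since the behaviour when several annotations match is not specified here.

```python
from typing import Dict, List

Annotation = Dict[str, object]  # hypothetical stand-in for an annotation


def add_ngram(start: int, end: int,
              layers: Dict[str, List[Annotation]],
              keep_layers: List[str],
              target_layer: str) -> Annotation:
    """Add an n-gram annotation to target_layer, reusing an existing one if possible."""
    for name in keep_layers:
        for ann in layers.get(name, []):
            if ann["start"] == start and ann["end"] == end:
                # Same boundaries: recycle the existing annotation.
                layers.setdefault(target_layer, []).append(ann)
                return ann
    # No match in any of the searched layers: create a new annotation.
    ann = {"start": start, "end": end}
    layers.setdefault(target_layer, []).append(ann)
    return ann


layers = {"terms": [{"start": 0, "end": 3, "lemma": "a b"}]}
reused = add_ngram(0, 3, layers, ["terms"], "ngrams")
created = add_ngram(5, 8, layers, ["terms"], "ngrams")
```

Here `reused` is the same object as the existing annotation in the hypothetical `terms` layer, so its features are preserved, while `created` is a new annotation; both end up in the target layer.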

sectionFilter

Default value: `true and layer:words`
Type: Expression

Process only sections that satisfy this expression.

sentenceLayer

Default value: `sentences`
Type: String

Name of the sentence layer.

tokenLayer

Default value: `words`
Type: String

Name of the token layer.

Deprecated parameters

sentenceLayerName

Deprecated
Type: String

Deprecated alias for sentenceLayer.

targetLayerName

Deprecated
Type: String

Deprecated alias for targetLayer.

tokenLayerName

Deprecated
Type: String

Deprecated alias for tokenLayer.