AlvisNLP

corpus processing engine

SeSMig

Synopsis

Detects sentence boundaries and creates one annotation for each sentence.

This module assumes WoSMig processed the same sections.

Description

SeSMig scans the annotations in wordLayer and detects sentence boundaries, defined as either:

SeSMig creates an annotation for each sentence and adds it to targetLayer. The eosStatusFeature of each word annotation is given a new value:

If noBreakLayer is defined, SeSMig does not place sentence boundaries inside annotations in this layer.
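As an illustration, a module declaration that sets noBreakLayer might look like the sketch below. It assumes that parameters are written as child elements of the module element, which the snippet on this page does not show. The layer name `quotes` is hypothetical and stands for any layer filled by an earlier module.

```xml
<!-- Sketch only: assumes parameters are given as child elements of the module element -->
<sesmig class="SeSMig">
  <!-- "quotes" is a hypothetical layer name, not one defined by SeSMig -->
  <noBreakLayer>quotes</noBreakLayer>
</sesmig>
```

With such a setting, a strong punctuation mark that falls inside a `quotes` annotation would not end the sentence.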

Snippet

<sesmig class="SeSMig">
</sesmig>
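Since this module assumes that WoSMig has processed the same sections, a plan would normally declare WoSMig before SeSMig. The fragment below is a minimal sketch of that ordering. It assumes that modules run in the order in which they are declared and that WoSMig is usable without extra parameters. The identifier `wosmig` simply mirrors the naming used in the snippet above.

```xml
<!-- Fragment of a plan, not a complete file: the enclosing plan element and the reader module are omitted -->
<!-- Assumption: modules run in declaration order, so words exist before sentences are detected -->
<wosmig class="WoSMig"/>
<sesmig class="SeSMig"/>
```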

Mandatory parameters

Optional parameters

constantAnnotationFeatures

Optional
Type: Mapping

Constant features to add to each annotation created by this module.

noBreakLayer

Optional
Type: String

Name of the layer containing annotations within which there cannot be sentence boundaries.

documentFilter

Default value: `true`
Type: Expression

Only process documents that satisfy this expression.

eosStatusFeature

Default value: `eos`
Type: String

Name of the feature (in words) containing the end-of-sentence status (not-eos, maybe-eos).

formFeature

Default value: `form`
Type: String

Name of the feature containing the word surface form.

sectionFilter

Default value: `true and layer:words`
Type: Expression

Process only sections that satisfy this expression.

strongPunctuations

Default value: `?.!`
Type: String

List of strong punctuation marks.
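The default value suggests that each character of the string is one strong punctuation mark. Under that assumption, and assuming that parameters are written as child elements of the module element, adding the semicolon to the set could be sketched as:

```xml
<sesmig class="SeSMig">
  <!-- Assumption: every character in this string is treated as one strong punctuation mark -->
  <!-- The default set is ?.! and the semicolon is added here for illustration -->
  <strongPunctuations>?.!;</strongPunctuations>
</sesmig>
```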

targetLayer

Default value: `sentences`
Type: String

Name of the layer in which to store sentence annotations.

typeFeature

Default value: `wordType`
Type: String

Name of the feature from which to read the word annotation type.

wordLayer

Default value: `words`
Type: String

Name of the layer containing word annotations.

Deprecated parameters

noBreakLayerName

Deprecated
Type: String

Deprecated alias for noBreakLayer.

targetLayerName

Deprecated
Type: String

Deprecated alias for targetLayer.

wordLayerName

Deprecated
Type: String

Deprecated alias for wordLayer.
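Because each deprecated parameter is an alias, a plan that still uses the old names can presumably be updated by renaming the elements and keeping the values. The sketch below assumes that parameters are written as child elements of the module element. `quotes` is a hypothetical layer name.

```xml
<!-- Before: deprecated parameter names -->
<sesmig class="SeSMig">
  <wordLayerName>words</wordLayerName>
  <targetLayerName>sentences</targetLayerName>
  <noBreakLayerName>quotes</noBreakLayerName>
</sesmig>

<!-- After: current parameter names with the same values -->
<sesmig class="SeSMig">
  <wordLayer>words</wordLayer>
  <targetLayer>sentences</targetLayer>
  <noBreakLayer>quotes</noBreakLayer>
</sesmig>
```

The two declarations are alternatives. Only one of them would appear in a given plan.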