SeSMig
Synopsis
Detects sentence boundaries and creates one annotation for each sentence.
This module assumes that WoSMig has already processed the same sections.
Description
SeSMig scans the annotations in wordLayer and detects sentence boundaries, defined as any of the following:
- an annotation whose feature eosStatusFeature equals eos;
- an annotation whose surface form contains only characters listed in strongPunctuations and which is followed by an uppercase character;
- an annotation whose feature eosStatusFeature equals maybe-eos and which is followed by an uppercase character.
SeSMig creates an annotation for each sentence and adds it to targetLayer. The eosStatusFeature of each word annotation is given a new value:
- eos : for the last word of each sentence;
- not-eos : for all other words.
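The boundary rules and the status rewrite described above can be sketched in Python. This is an illustration only, not the module's implementation: the Word class, the split_sentences function, the feature name "eos" and the punctuation set ".?!" are all hypothetical. It also assumes that "followed by an uppercase character" means the next word starts with an uppercase letter, and that the end of the input always closes the current sentence.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for a word annotation; not the module's real data model.
@dataclass
class Word:
    form: str
    features: dict = field(default_factory=dict)


def split_sentences(words, eos_feature="eos", strong_punctuations=".?!"):
    """Group words into sentences using the three boundary rules."""
    sentences, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        nxt = words[i + 1] if i + 1 < len(words) else None
        # Assumption: the "uppercase character" test looks at the next word's first character.
        next_upper = nxt is not None and nxt.form[:1].isupper()
        status = word.features.get(eos_feature)
        is_strong = bool(word.form) and all(c in strong_punctuations for c in word.form)
        boundary = (
            status == "eos"
            or (is_strong and next_upper)
            or (status == "maybe-eos" and next_upper)
        )
        # Assumption: the last word of the input always ends a sentence.
        if boundary or nxt is None:
            sentences.append(current)
            current = []
    # Rewrite the status feature: eos on the last word, not-eos elsewhere.
    for sentence in sentences:
        for word in sentence:
            word.features[eos_feature] = "not-eos"
        sentence[-1].features[eos_feature] = "eos"
    return sentences


words = [Word(f) for f in ["It", "works", ".", "Dr.", "smith", "left", "!", "Bye"]]
words[3].features["eos"] = "maybe-eos"  # "Dr." is ambiguous
sentences = split_sentences(words)
forms = [[w.form for w in s] for s in sentences]
```

In this example "Dr." is marked maybe-eos but is followed by a lowercase word, so it does not end a sentence, whereas "." and "!" are each followed by a capitalized word and do.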
If noBreakLayer is defined, then SeSMig will prevent sentence boundaries inside annotations in this layer.
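The effect of noBreakLayer can be pictured as a filter over candidate boundaries. The sketch below is a guess at the behavior, not the module's code: filter_boundaries is a hypothetical helper, boundaries and spans are given as character offsets, and "inside" is assumed to mean strictly between the start and the end of a no-break annotation.

```python
def filter_boundaries(boundaries, no_break_spans):
    """Keep only boundary offsets that do not fall strictly inside a no-break span."""
    return [
        pos
        for pos in boundaries
        if not any(start < pos < end for start, end in no_break_spans)
    ]


# Candidate sentence ends (offsets) and one no-break annotation covering 25..40.
candidates = [12, 30, 47]
no_break = [(25, 40)]
kept = filter_boundaries(candidates, no_break)
```

Here the candidate at offset 30 is dropped because it lies within the no-break annotation, while the other two are kept.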
Snippet
<sesmig class="SeSMig">
</sesmig>
Mandatory parameters
Optional parameters
constantAnnotationFeatures
Constant features to add to each annotation created by this module.
noBreakLayer
Name of the layer containing annotations within which there cannot be sentence boundaries.
documentFilter
Process only documents that satisfy this expression.
eosStatusFeature
Name of the feature (in words) containing the end-of-sentence status (eos, not-eos, maybe-eos).
formFeature
Name of the feature containing the word surface form.
sectionFilter
Process only sections that satisfy this expression.
strongPunctuations
Characters considered strong punctuation marks.
targetLayer
Name of the layer in which to store sentence annotations.
typeFeature
Name of the feature from which to read the word annotation type.
wordLayer
Name of the layer containing word annotations.
Deprecated parameters
noBreakLayerName
Deprecated alias for noBreakLayer.
targetLayerName
Deprecated alias for targetLayer.
wordLayerName
Deprecated alias for wordLayer.