TokenizedReader
Synopsis
Reads a tokenized corpus: one token per line, with an empty line separating sentences.
Description
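Reads a tokenized corpus in which each line holds one token and an empty line marks a sentence boundary. Tokens are recorded in the layer named by tokenLayer and sentences in the layer named by sentenceLayer, both in the section named by section.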
Snippet
<tokenizedreader class="TokenizedReader">
<source></source>
</tokenizedreader>
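As an illustration, the snippet might be filled in as follows. This is a sketch only: the path `corpus/tokens.txt` is hypothetical, and writing each parameter as a child element named after it is assumed from the form of the `source` element above. The three optional values shown are the documented defaults, so they could be left out.

```xml
<tokenizedreader class="TokenizedReader">
  <!-- Hypothetical path: a file with one token per line,
       an empty line between sentences -->
  <source>corpus/tokens.txt</source>
  <!-- Documented defaults, spelled out for clarity -->
  <section>text</section>
  <sentenceLayer>sentences</sentenceLayer>
  <tokenLayer>words</tokenLayer>
</tokenizedreader>
```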
Mandatory parameters
source
Mandatory
Type: SourceStream
Path to the file or directory containing the tokenized text.
Optional parameters
constantAnnotationFeatures
Optional
Type: Mapping
Constant features to add to each annotation created by this module.
constantDocumentFeatures
Optional
Type: Mapping
Constant features to add to each document created by this module.
constantSectionFeatures
Optional
Type: Mapping
Constant features to add to each section created by this module.
section
Default value: `text`
Type: String
Name of the section containing the tokenized text.
sentenceLayer
Default value: `sentences`
Type: String
Name of the sentence layer.
tokenLayer
Default value: `words`
Type: String
Name of the token layer.
Deprecated parameters
sectionName
Deprecated
Type: String
Deprecated alias for `section`.
sentenceLayerName
Deprecated
Type: String
Deprecated alias for `sentenceLayer`.
tokenLayerName
Deprecated
Type: String
Deprecated alias for `tokenLayer`.
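Because the deprecated names are aliases, a plan that still uses them can presumably be updated by renaming the elements and keeping their values. The sketch below assumes parameters are written as child elements, as in the snippet for `source`; the path and the values `body`, `sent`, and `tok` are hypothetical.

```xml
<tokenizedreader class="TokenizedReader">
  <!-- Hypothetical path -->
  <source>corpus/tokens.txt</source>
  <!-- was: <sectionName>body</sectionName> -->
  <section>body</section>
  <!-- was: <sentenceLayerName>sent</sentenceLayerName> -->
  <sentenceLayer>sent</sentenceLayer>
  <!-- was: <tokenLayerName>tok</tokenLayerName> -->
  <tokenLayer>tok</tokenLayer>
</tokenizedreader>
```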