TreeTaggerReader
Synopsis
Reads files in TreeTagger output format and creates one document for each file read.
Description
Each document contains a single section named sectionName; its contents are built by concatenating the first column (the word form) of each token, separated by a single space character.
TreeTaggerReader preserves the TreeTagger tokenization: each token becomes an annotation in the layer wordLayer. The POS tag and lemma are recorded in the annotation’s posFeature and lemmaFeature features, respectively.
The document identifier is the path of the corresponding file.
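The construction described above can be illustrated with a short Python sketch. It is not the module's implementation, only an approximation under stated assumptions: the input is the usual tab-separated TreeTagger output (word form, POS tag, lemma), blank lines are skipped, and the function name read_treetagger, the dictionary layout, and the default feature names "pos" and "lemma" are hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple


def read_treetagger(
    text: str,
    pos_feature: str = "pos",      # hypothetical default, not taken from the module
    lemma_feature: str = "lemma",  # hypothetical default, not taken from the module
) -> Tuple[str, List[Dict[str, object]]]:
    """Build section contents and word annotations from TreeTagger output.

    Assumes one token per line with tab-separated columns:
    word form, POS tag, lemma.
    """
    forms: List[str] = []
    words: List[Dict[str, object]] = []
    offset = 0
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines carry no token
        cols = line.split("\t")
        form = cols[0]
        pos = cols[1] if len(cols) > 1 else ""
        lemma = cols[2] if len(cols) > 2 else ""
        if forms:
            offset += 1  # account for the single space between tokens
        start = offset
        offset += len(form)
        forms.append(form)
        # One annotation per token, with offsets into the section contents.
        words.append(
            {"start": start, "end": offset, pos_feature: pos, lemma_feature: lemma}
        )
    # Section contents: first column of each token, joined with a space.
    return " ".join(forms), words


sample = "The\tDT\tthe\ncat\tNN\tcat\nsleeps\tVBZ\tsleep\n.\tSENT\t.\n"
contents, words = read_treetagger(sample)
print(contents)
print(words[1])
```

Because tokens are always joined with a single space, the offsets of each word annotation refer to the reconstructed section contents, not to positions in the original TreeTagger file.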
Snippet
<treetaggerreader class="TreeTaggerReader">
<source></source>
</treetaggerreader>
Mandatory parameters
source
Path to the source directory or source file.
Optional parameters
constantAnnotationFeatures
Constant features to add to each annotation created by this module.
constantDocumentFeatures
Constant features to add to each document created by this module.
constantSectionFeatures
Constant features to add to each section created by this module.
lemmaFeature
Name of the feature in which to store word lemmas.
posFeature
Name of the feature in which to store word POS tags.
charset
Character set of input files.
sectionName
Name of the section of each document.
sentenceLayer
Name of the layer in which to store sentence annotations.
wordLayer
Name of the layer in which to store word annotations.
Deprecated parameters
lemmaFeatureKey
Deprecated alias for lemmaFeature.
posFeatureKey
Deprecated alias for posFeature.
sentenceLayerName
Deprecated alias for sentenceLayer.
sourcePath
Deprecated alias for source. Use source instead.
wordLayerName
Deprecated alias for wordLayer.