TreeTaggerReader

Synopsis

Reads files in TreeTagger output format and creates a document for each file read.

Description

Each document contains a single section named sectionName; its contents are constructed by concatenating the first column (the word form) of each token, separated by a space character.

TreeTaggerReader preserves the TreeTagger tokenization as annotations added to the layer wordLayer. The POS tag and lemma of each token are recorded in the annotation’s posFeature and lemmaFeature features, respectively.

The document identifier is the path of the corresponding file.

Snippet

<treetaggerreader class="TreeTaggerReader">
    <source></source>
</treetaggerreader>
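
As an illustration only, the snippet above might be filled in as follows. The path `corpus/treetagger-output` and the feature names `pos` and `lemma` are placeholder values chosen for this example, not documented defaults. The other values restate the defaults listed in the parameter descriptions, so those elements could be left out. Writing each parameter as a child element with text content is an assumption based on the form of `<source>` in the snippet.

```xml
<treetaggerreader class="TreeTaggerReader">
    <!-- placeholder path: a directory or file holding TreeTagger output -->
    <source>corpus/treetagger-output</source>
    <!-- documented default values, written out explicitly -->
    <charset>UTF-8</charset>
    <sectionName>text</sectionName>
    <wordLayer>words</wordLayer>
    <sentenceLayer>sentences</sentenceLayer>
    <!-- placeholder feature names: no default is documented for these -->
    <posFeature>pos</posFeature>
    <lemmaFeature>lemma</lemmaFeature>
</treetaggerreader>
```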

Mandatory parameters

source

Mandatory

Type: SourceStream

Path to the source directory or source file.

Optional parameters

constantAnnotationFeatures

Optional

Type: Mapping

Constant features to add to each annotation created by this module.

constantDocumentFeatures

Optional

Type: Mapping

Constant features to add to each document created by this module.

constantSectionFeatures

Optional

Type: Mapping

Constant features to add to each section created by this module.

lemmaFeature

Optional

Type: String

Name of the feature in which to store word lemmas.

posFeature

Optional

Type: String

Name of the feature in which to store word POS tags.

charset

Default value: `UTF-8`

Type: String

Character set of input files.

sectionName

Default value: `text`

Type: String

Name of the section of each document.

sentenceLayer

Default value: `sentences`

Type: String

Name of the layer in which to store sentence annotations.

wordLayer

Default value: `words`

Type: String

Name of the layer in which to store word annotations.

Deprecated parameters

lemmaFeatureKey

Deprecated

Type: String

Deprecated alias for lemmaFeature.

posFeatureKey

Deprecated

Type: String

Deprecated alias for posFeature.

sentenceLayerName

Deprecated

Type: String

Deprecated alias for sentenceLayer.

sourcePath

Deprecated

Type: SourceStream

Deprecated alias for source. Use source instead.

wordLayerName

Deprecated

Type: String

Deprecated alias for wordLayer.
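
A configuration that still uses the deprecated names can be migrated by renaming each element to its current equivalent. The sketch below assumes that parameters are written as child elements, as `<source>` is in the snippet; the path and the feature names are placeholders, not values taken from this documentation.

```xml
<!-- Before, with deprecated parameter names:
     <sourcePath>corpus/treetagger-output</sourcePath>
     <wordLayerName>words</wordLayerName>
     <sentenceLayerName>sentences</sentenceLayerName>
     <posFeatureKey>pos</posFeatureKey>
     <lemmaFeatureKey>lemma</lemmaFeatureKey>
-->
<!-- After, with current parameter names: -->
<treetaggerreader class="TreeTaggerReader">
    <source>corpus/treetagger-output</source>
    <wordLayer>words</wordLayer>
    <sentenceLayer>sentences</sentenceLayer>
    <posFeature>pos</posFeature>
    <lemmaFeature>lemma</lemmaFeature>
</treetaggerreader>
```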