AlvisNLP

corpus processing engine

PESVReader

Synopsis

Read documents and entities in the PESV format.

Description

PESVReader reads CSV files from docStream and creates one document for each record. The document identifier is taken from the id column. The content of the document's single section is built from the tokenization provided in the processed_text column, and the tokens themselves are recorded in the layer named by tokenLayer.

PESVReader also reads CSV files from entitiesStream and creates one entity annotation for each record, in the layer named by entityLayer. Each property is recorded in its own feature, and all properties are also recorded together in a single feature named by propertiesFeature.

Snippet

<pesvreader class="PESVReader">
    <docStream></docStream>
    <entitiesStream></entitiesStream>
</pesvreader>

Mandatory parameters

docStream

Mandatory

Path to the file(s) or directory(ies) where to look for document files.

entitiesStream

Mandatory

Path to the file(s) or directory(ies) where to look for entities files.
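As an illustration, the snippet above could be filled in as follows. This is only a sketch: the two paths are hypothetical placeholders and should be replaced with the actual location of your PESV files.

```xml
<pesvreader class="PESVReader">
    <!-- Hypothetical path: CSV file(s) holding one record per document -->
    <docStream>corpus/documents.csv</docStream>
    <!-- Hypothetical path: CSV file(s) holding one record per entity -->
    <entitiesStream>corpus/entities.csv</entitiesStream>
</pesvreader>
```

Both parameters are mandatory, so a plan that omits either one is incomplete.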

Optional parameters

constantAnnotationFeatures

Optional
Type: Mapping

Constant features to add to each annotation created by this module.

constantDocumentFeatures

Optional
Type: Mapping

Constant features to add to each document created by this module.

constantSectionFeatures

Optional
Type: Mapping

Constant features to add to each section created by this module.

entityLayer

Default value: `entities`
Type: String

Name of the layer where to create entities.

ordFeature

Default value: `ord`
Type: String

Name of the feature where to record the token ordinal.

propertiesFeature

Default value: `properties`
Type: String

Name of the feature where to record all entity properties. PESVReader also records each property in a separate feature.

section

Default value: `text`
Type: String

Name of the (unique) section.

tokenLayer

Default value: `tokens`
Type: String

Name of the layer where to create tokens.
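The optional string parameters can be set in the same way as the mandatory ones. The following sketch overrides the section and layer names; the paths and the values `body`, `pesv-tokens` and `pesv-entities` are hypothetical and chosen only for illustration.

```xml
<pesvreader class="PESVReader">
    <!-- Hypothetical input paths -->
    <docStream>corpus/documents.csv</docStream>
    <entitiesStream>corpus/entities.csv</entitiesStream>
    <!-- Hypothetical names replacing the defaults text, tokens and entities -->
    <section>body</section>
    <tokenLayer>pesv-tokens</tokenLayer>
    <entityLayer>pesv-entities</entityLayer>
</pesvreader>
```

Parameters that are left out keep the default values listed above.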

Deprecated parameters

entityLayerName

Deprecated
Type: String

Deprecated alias for entityLayer.

ordFeatureKey

Deprecated
Type: String

Deprecated alias for ordFeature.

propertiesFeatureKey

Deprecated
Type: String

Deprecated alias for propertiesFeature.

sectionName

Deprecated
Type: String

Deprecated alias for section.

tokenLayerName

Deprecated
Type: String

Deprecated alias for tokenLayer.
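Since each deprecated parameter is an alias, updating an older plan should amount to renaming the elements. The sketch below shows the current names, with a comment giving the deprecated name each one replaces; the paths and values are hypothetical.

```xml
<pesvreader class="PESVReader">
    <!-- Hypothetical input paths -->
    <docStream>corpus/documents.csv</docStream>
    <entitiesStream>corpus/entities.csv</entitiesStream>
    <!-- formerly sectionName -->
    <section>text</section>
    <!-- formerly tokenLayerName -->
    <tokenLayer>tokens</tokenLayer>
    <!-- formerly entityLayerName -->
    <entityLayer>entities</entityLayer>
    <!-- formerly ordFeatureKey -->
    <ordFeature>ord</ordFeature>
    <!-- formerly propertiesFeatureKey -->
    <propertiesFeature>properties</propertiesFeature>
</pesvreader>
```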