AlvisNLP

corpus processing engine

StanfordCoreNLP

Synopsis

Process the documents with Stanford’s CoreNLP.

This module is experimental.

Description

StanfordCoreNLP tokenizes, POS-tags and lemmatizes each section using CoreNLP.

If ner is set then StanfordCoreNLP also performs Named Entity Recognition. Refer to CoreNLP NER for details on methods and Named Entity Types.

If parse is set, then StanfordCoreNLP parses the sentences and creates dependency tuples in the relation named by dependencyRelation.

If pretokenized is set, then StanfordCoreNLP will not create annotations for tokens and sentences. Thus the segmentation must be performed beforehand.
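As an illustration of the switches described above, a module declaration that enables both NER and dependency parsing might look like the following. This is only a sketch: it assumes that boolean parameters are written as child elements with a `true` text value, in the usual style of AlvisNLP plans.

```xml
<stanfordcorenlp class="StanfordCoreNLP">
  <!-- Assumed syntax: each parameter is a child element named after it. -->
  <!-- Also run Named Entity Recognition (default: false). -->
  <ner>true</ner>
  <!-- Also parse sentences and create dependency tuples (default: false). -->
  <parse>true</parse>
</stanfordcorenlp>
```

With the default values listed below, named entities would then go to the `named-entities` layer and dependency tuples to the `dependencies` relation.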

Snippet

<stanfordcorenlp class="StanfordCoreNLP">
</stanfordcorenlp>

Mandatory parameters

Optional parameters

constantAnnotationFeatures

Optional
Type: Mapping

Constant features to add to each annotation created by this module.

constantRelationFeatures

Optional
Type: Mapping

Constant features to add to each relation created by this module.

constantTupleFeatures

Optional
Type: Mapping

Constant features to add to each tuple created by this module.

dependencyLabelFeature

Default value: `label`
Type: String

Name of the feature where to store the dependency label.

dependencyRelation

Default value: `dependencies`
Type: String

Name of the relation where to store dependency tuples.

dependencySentenceRole

Default value: `sentence`
Type: String

Name of the role of the dependency tuple argument that references the parsed sentence.

dependentRole

Default value: `dependent`
Type: String

Name of the role of the dependency tuple argument that references the modifier (dependent) token.

documentFilter

Default value: `true`
Type: Expression

Only process documents that satisfy this expression.

headRole

Default value: `head`
Type: String

Name of the role of the dependency tuple argument that references the head (governor) token.
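The dependency-related names can be overridden together. The sketch below assumes the same child-element parameter syntax as the snippet at the top of this page; the values `syntax`, `governor` and `modifier` are hypothetical names chosen for illustration only.

```xml
<stanfordcorenlp class="StanfordCoreNLP">
  <!-- Dependency tuples are only created when parse is set. -->
  <parse>true</parse>
  <!-- Hypothetical relation name replacing the default "dependencies". -->
  <dependencyRelation>syntax</dependencyRelation>
  <!-- Hypothetical role names replacing the defaults "head" and "dependent". -->
  <headRole>governor</headRole>
  <dependentRole>modifier</dependentRole>
</stanfordcorenlp>
```

Under these assumptions, each tuple in `syntax` would carry the arguments `governor`, `modifier` and `sentence`, with the dependency label stored in the `label` feature.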

lemmaFeature

Default value: `lemma`
Type: String

Feature where to record the lemma.

namedEntityLayer

Default value: `named-entities`
Type: String

Layer where to create named entity annotations.

namedEntityTypeFeature

Default value: `ne-type`
Type: String

Feature where to record the named entity type.

ner

Default value: `false`
Type: Boolean

Perform NER.

parse

Default value: `false`
Type: Boolean

Perform dependency parsing.

pipelineProperties

Default value: `{}`
Type: Mapping

Additional properties to pass to CoreNLP pipeline. See the documentation of each pipeline annotator for available options.

posFeature

Default value: `pos`
Type: String

Feature where to record the POS tag.

pretokenized

Default value: `false`
Type: Boolean

Do not perform tokenization and sentence splitting. Read tokens and sentences generated by previous steps.

sectionFilter

Default value: `true`
Type: Expression

Process only sections that satisfy this expression.

sentenceLayer

Default value: `sentences`
Type: String

Layer where to place (or read if pretokenized is set) sentence annotations.

wordLayer

Default value: `words`
Type: String

Layer where to place (or read if pretokenized is set) token annotations.
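When segmentation is done by earlier steps, pretokenized can be combined with sentenceLayer and wordLayer so that the module reads the existing annotations. The following is a sketch under the same assumed child-element syntax; the layer names `my-sentences` and `my-tokens` are hypothetical and stand for whatever layers the earlier steps filled.

```xml
<stanfordcorenlp class="StanfordCoreNLP">
  <!-- Do not tokenize or split sentences; read existing annotations instead. -->
  <pretokenized>true</pretokenized>
  <!-- Hypothetical layer holding sentences created by a previous step. -->
  <sentenceLayer>my-sentences</sentenceLayer>
  <!-- Hypothetical layer holding tokens created by a previous step. -->
  <wordLayer>my-tokens</wordLayer>
</stanfordcorenlp>
```

If these layers are empty when the module runs, there is nothing for it to tag, so the segmentation steps must come earlier in the plan.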

Deprecated parameters