AlvisNLP

corpus processing engine

Stanza

Synopsis

Applies a Stanza pipeline to the contents of sections.

This module is experimental.

Description

Stanza applies a Stanza pipeline to the contents of sections.

By default the pipeline tokenizes and predicts POS-tags. Stanza also applies dependency parsing if parse is set, constituency parsing if constituency is set, and named entity recognition if ner is set.

Tokenization can be skipped by setting pretokenized, in which case Stanza uses the existing tokens and sentences.

Snippet

<stanza class="Stanza">
    <alvisnlpPythonDirectory></alvisnlpPythonDirectory>
</stanza>
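
As an illustration of the options listed in the description, the following sketch enables dependency parsing and named entity recognition on top of the default tokenization and POS tagging. It assumes that Boolean parameters are written as child elements containing `true`, in the same element form as the snippet above; the directory path is a hypothetical placeholder, not a value taken from an actual installation.

```xml
<!-- Sketch only: the directory path below is a hypothetical placeholder. -->
<stanza class="Stanza">
    <alvisnlpPythonDirectory>/path/to/alvisnlp/python</alvisnlpPythonDirectory>
    <!-- Language of the text; "en" is the documented default. -->
    <language>en</language>
    <!-- Assumed Boolean syntax: element content "true" sets the parameter. -->
    <parse>true</parse>
    <ner>true</ner>
</stanza>
```

With ner set, the named entities are stored in the layer named entities, as described under the ner parameter.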

Mandatory parameters

alvisnlpPythonDirectory

Mandatory

Directory where the AlvisNLP Python library is found. In principle, this parameter is set by default when AlvisNLP is installed.

Optional parameters

conda

Optional

Path to the conda executable. If not set, Stanza uses the conda executable found in PATH. This parameter is ignored if condaEnvironment is not set.

condaEnvironment

Optional
Type: String

Name of the conda environment in which the script must be run. If this parameter is not set, then the script is not run in a conda environment.
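
A minimal sketch of how conda and condaEnvironment might be combined, assuming both are given as child elements with plain text values. The executable path, the environment name `stanza-env`, and the library directory are hypothetical placeholders.

```xml
<!-- Sketch only: all three values below are hypothetical placeholders. -->
<stanza class="Stanza">
    <alvisnlpPythonDirectory>/path/to/alvisnlp/python</alvisnlpPythonDirectory>
    <!-- Optional: omit conda to use the executable found in PATH. -->
    <conda>/opt/conda/bin/conda</conda>
    <!-- Without condaEnvironment, the conda parameter is ignored. -->
    <condaEnvironment>stanza-env</condaEnvironment>
</stanza>
```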

constantAnnotationFeatures

Optional
Type: Mapping

Constant features to add to each annotation created by this module.

constantDocumentFeatures

Optional
Type: Mapping

Constant features to add to each document created by this module.

constantRelationFeatures

Optional
Type: Mapping

Constant features to add to each relation created by this module.

constantSectionFeatures

Optional
Type: Mapping

Constant features to add to each section created by this module.

constantTupleFeatures

Optional
Type: Mapping

Constant features to add to each tuple created by this module.

environment

Optional
Type: Mapping

Additional variable values to pass to the script’s environment.

python

Optional

Path to the Python executable. By default, the Python executable is located through the PATH environment variable.

workingDirectory

Optional

Directory in which to run the script. Defaults to the working directory of AlvisNLP.

constituency

Default value: `false`
Type: Boolean

Whether to predict constituents.

documentFilter

Default value: `true`
Type: Expression

Only process documents that satisfy this expression.

language

Default value: `en`
Type: String

Language of the text.

ner

Default value: `false`
Type: Boolean

Whether to perform named entity recognition (NER). Named entities are stored in a layer named entities.

parse

Default value: `false`
Type: Boolean

Whether to predict dependency trees.

pretokenized

Default value: `false`
Type: Boolean

Whether to skip tokenization and use the existing tokens and sentences.
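
A minimal sketch of a configuration that reuses existing tokens and sentences, assuming an earlier step of the plan has already created them and that Boolean parameters are written as child elements containing `true`. The directory path is a hypothetical placeholder.

```xml
<!-- Sketch only: assumes tokens and sentences already exist in the corpus. -->
<stanza class="Stanza">
    <alvisnlpPythonDirectory>/path/to/alvisnlp/python</alvisnlpPythonDirectory>
    <!-- Skip Stanza tokenization and keep the existing segmentation. -->
    <pretokenized>true</pretokenized>
    <!-- Dependency parsing then runs over the existing tokens. -->
    <parse>true</parse>
</stanza>
```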

sectionFilter

Default value: `true`
Type: Expression

Process only sections that satisfy this expression.

Deprecated parameters