AlvisNLP

corpus processing engine

PythonScript

Synopsis

Runs a Python script. This module is useful for processing the corpus with Python libraries dedicated to NLP.

This module is experimental.

Description

PythonScript assumes that the script reads the AlvisNLP data structure, serialized as JSON, from standard input. It also assumes that the script writes its modifications, serialized as JSON, to standard output, unless outputFile is set.

The alvisnlp.py library facilitates the deserialization, serialization, and manipulation of the AlvisNLP data structure. It is located in the directory specified by alvisnlpPythonDirectory.

The script to run is specified with script.
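
The input and output contract described above can be sketched with the Python standard library alone. This is a minimal illustration, not the alvisnlp.py API: the layout of the JSON is not assumed, so the payload is treated as opaque, and the process function is a hypothetical placeholder that returns it unchanged.

```python
import json


def process(data):
    """Hypothetical placeholder for the real work.

    The layout of the AlvisNLP JSON is not assumed here, so the
    structure is returned unchanged.
    """
    return data


def run(stdin, stdout):
    """Read one JSON value from stdin and write the result to stdout."""
    # PythonScript sends the serialized data structure on standard input.
    data = json.load(stdin)
    # The result goes back on standard output, also as JSON.
    json.dump(process(data), stdout)
    stdout.write("\n")
```

In a real script, the file would end with a call such as run(sys.stdin, sys.stdout) after importing sys. Anything else printed to standard output would corrupt the JSON, so diagnostics are better sent to standard error.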

Snippet

<pythonscript class="PythonScript">
    <alvisnlpPythonDirectory></alvisnlpPythonDirectory>
    <script></script>
</pythonscript>

Mandatory parameters

alvisnlpPythonDirectory

Mandatory

Directory where the AlvisNLP Python library is found. This parameter is normally set by default when AlvisNLP is installed.

script

Mandatory

Path to the script to run.

Optional parameters

conda

Optional

Path to the conda executable. If not set, PythonScript uses the conda executable found on PATH. This parameter is ignored if condaEnvironment is not set.

condaEnvironment

Optional
Type: String

Name of the conda environment in which the script must be run. If this parameter is not set, then the script is not run in a conda environment.

constantAnnotationFeatures

Optional
Type: Mapping

Constant features to add to each annotation created by this module.

constantDocumentFeatures

Optional
Type: Mapping

Constant features to add to each document created by this module.

constantRelationFeatures

Optional
Type: Mapping

Constant features to add to each relation created by this module.

constantSectionFeatures

Optional
Type: Mapping

Constant features to add to each section created by this module.

constantTupleFeatures

Optional
Type: Mapping

Constant features to add to each tuple created by this module.

environment

Optional
Type: Mapping

Additional environment variables to set for the script.
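
On the script side, such variables can be read with os.environ. The sketch below is only an illustration: the helper read_setting and the variable name MODEL_DIR used in the usage note are hypothetical and not defined by AlvisNLP.

```python
import os


def read_setting(name, default=None, environ=None):
    """Return the value of an environment variable, or a default.

    `environ` can be replaced by a plain dict, which makes the
    function easy to exercise outside AlvisNLP.
    """
    env = os.environ if environ is None else environ
    return env.get(name, default)
```

For example, read_setting("MODEL_DIR", "/tmp/models") returns the value of MODEL_DIR if the module configuration sets it, and "/tmp/models" otherwise.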

layers

Optional
Type: String[]

Names of layers to serialize. Layers not mentioned in this parameter will not be serialized. Use this to limit the amount of serialized data. By default PythonScript serializes all annotations in all layers.

outputFile

Optional
Type: OutputFile

Path of the file to which the script's standard output is written. If this parameter is set, then PythonScript will not read the script output for modifications.
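
Because the output is not parsed when outputFile is set, the script is free to print something other than JSON, for instance a plain-text report. The sketch below is a hypothetical example: it makes no assumption about the AlvisNLP JSON layout and only lists the top-level entries of whatever it receives, with the type of each value.

```python
import json


def write_report(stdin, stdout):
    """Write one 'key<TAB>type' line per top-level entry of the input."""
    data = json.load(stdin)
    if isinstance(data, dict):
        entries = data.items()
    elif isinstance(data, list):
        entries = enumerate(data)
    else:
        entries = []
    for key, value in entries:
        # type(value).__name__ is e.g. 'list', 'dict' or 'str'.
        stdout.write(f"{key}\t{type(value).__name__}\n")
```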

python

Optional

Path to the Python executable. By default, the executable is located through the PATH environment variable.

relations

Optional
Type: String[]

Names of relations to serialize. Relations not mentioned in this parameter will not be serialized. Use this to limit the amount of serialized data. By default PythonScript serializes all tuples in all relations.

workingDirectory

Optional

Directory in which to run the script. Defaults to the working directory of AlvisNLP.

callPython

Default value: `false`
Type: Boolean

Whether to call the Python interpreter as the executable, with the script as its argument. If this parameter is false, then the user must have execute permission on the script, and the script must start with an appropriate shebang line to locate the Python interpreter.
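
The two requirements for running the script directly can be checked beforehand. The helper below is a hypothetical sketch, not part of AlvisNLP: it assumes a POSIX system, and it only verifies that the file is executable by the current user and begins with "#!", not that the interpreter named in the shebang line exists.

```python
import os


def can_run_directly(path):
    """Return True if `path` is executable and starts with a shebang."""
    # Execute permission for the current user.
    if not os.access(path, os.X_OK):
        return False
    # A shebang line starts with the two bytes '#!'.
    with open(path, "rb") as f:
        return f.read(2) == b"#!"
```

A script that fails this check would typically need chmod +x and a first line such as #!/usr/bin/env python3.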

commandLine

Default value: ``
Type: String[]

Additional command line arguments to pass to the script.
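
Inside the script, these arguments can be parsed with argparse. The option names --model and --lowercase below are hypothetical and only illustrate the mechanism; the sketch also assumes that each element of commandLine reaches the script as a separate argument.

```python
import argparse


def parse_args(argv=None):
    """Parse the extra arguments; argv=None means sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Example AlvisNLP script")
    # Hypothetical options, not defined by AlvisNLP.
    parser.add_argument("--model", default="default")
    parser.add_argument("--lowercase", action="store_true")
    return parser.parse_args(argv)
```

With these options, setting commandLine to the two strings --model and en would make parse_args().model equal to "en" inside the script.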

documentFilter

Default value: `true`
Type: Expression

Only process documents that satisfy this expression.

scriptParams

Default value: `{}`

Parameters to pass to the script through the serialized data structure. Expressions are evaluated from the corpus as strings.

sectionFilter

Default value: `true`
Type: Expression

Process only sections that satisfy this expression.

Deprecated parameters

layerNames

Deprecated
Type: String[]

Deprecated alias for layers.

relationNames

Deprecated
Type: String[]

Deprecated alias for relations.