AlvisNLP

corpus processing engine

WapitiTrain

Synopsis

Train a CRF model using Wapiti.

Description

WapitiTrain trains a CRF sequence tagging model using Wapiti. Sequences are built from the annotations in tokenLayer, segmented by the annotations in sentenceLayer, so that each sentence yields one sequence of tokens.

The trained model is written to modelFile. Use this model with WapitiLabel to run predictions on unlabelled sequences.

Wapiti requires a set of patterns that specify the dependencies of each token. The patterns are specified in the file patternFile in the CRF++ template language.
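
As an illustration of what a pattern file can contain, the following sketch generates a small set of CRF++-style templates. The helper name `unigram_templates`, the window size and the column indexes are assumptions made for this example; they are not part of AlvisNLP or Wapiti. In the CRF++ template language, `%x[row,col]` refers to the feature in column `col` of the token at relative position `row`, and a line containing only `B` requests a bigram feature over consecutive labels.

```python
def unigram_templates(columns, window=1):
    """Build CRF++-style unigram templates for each feature column and offset.

    Hypothetical helper for illustration only.
    """
    lines = []
    index = 0
    for col in columns:
        for offset in range(-window, window + 1):
            # %x[row,col]: row is relative to the current token,
            # col is the index of the feature column.
            lines.append(f"U{index:02d}:%x[{offset},{col}]")
            index += 1
    # A lone "B" adds a bigram feature over the previous and current labels.
    lines.append("B")
    return "\n".join(lines) + "\n"


# Two feature columns (for instance form and part-of-speech), window of one token.
pattern_text = unigram_templates(columns=[0, 1], window=1)
print(pattern_text)
```

The resulting text can be saved to a file and referenced through patternFile. Which templates work best depends on the features declared for the tokens, so this should be read as a starting point rather than a recommended configuration.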

Snippet

<wapititrain class="WapitiTrain">
    <features></features>
    <modelFile></modelFile>
    <wapitiExecutable></wapitiExecutable>
</wapititrain>

Mandatory parameters

features

Mandatory

A list of expressions, each evaluated as a string on every token annotation. The results form the set of features of the token. During training, the last feature is the label.
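
To make the role of the last feature concrete, the sketch below lays out tokens in the tabular format read by Wapiti and CRF++: one token per line, features separated by whitespace, and a blank line between sequences. The function `to_wapiti_rows` and the feature names `form`, `pos` and `label` are hypothetical; how WapitiTrain serializes its input internally is not described here, so this is an assumption about the general shape of the data only.

```python
def to_wapiti_rows(sentences, feature_keys):
    """Render sentences as whitespace-separated feature rows.

    Hypothetical helper: each token is a dict, feature_keys gives the
    column order, and the last key is treated as the label column.
    """
    blocks = []
    for sentence in sentences:
        rows = [
            " ".join(str(token[key]) for key in feature_keys)
            for token in sentence
        ]
        blocks.append("\n".join(rows))
    # Sequences are separated by one blank line.
    return "\n\n".join(blocks) + "\n"


sentences = [
    [
        {"form": "Wapiti", "pos": "NNP", "label": "B-TOOL"},
        {"form": "trains", "pos": "VBZ", "label": "O"},
    ],
    [
        {"form": "It", "pos": "PRP", "label": "O"},
    ],
]

# The label is listed last, as required for the training phase.
training_text = to_wapiti_rows(sentences, ["form", "pos", "label"])
print(training_text)
```

Feature values containing whitespace would break the column layout in this sketch, so such values would need to be normalized first.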

modelFile

Mandatory
Type: OutputFile

Path of the trained model file.

wapitiExecutable

Mandatory

Path to the wapiti executable.

Optional parameters

commandLineOptions

Optional
Type: String[]

Additional command line options to pass to wapiti. See the Wapiti manual for the list of options. Note that the mode and the options -T, -a, -p and -m are set automatically by WapitiTrain.

modelType

Optional
Type: String

Model type. Allowed values are: maxent, memm, crf (default).

patternFile

Optional
Type: InputFile

Pattern file that specifies token dependencies.

trainAlgorithm

Optional
Type: String

Training algorithm. Allowed values are: l-bfgs (default), sgd-l1, bcd, rprop, rprop+, rprop-.
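
The following sketch shows how the optional parameters could map onto a `wapiti train` command line. It is not the implementation used by WapitiTrain: the function `build_train_command` and the file names are hypothetical, and the sketch passes the model file as the final positional argument rather than reproducing every option that WapitiTrain sets by itself. It assumes the usual `wapiti train [options] [input data] [model file]` invocation, with `-T` for the model type, `-a` for the algorithm and `-p` for the pattern file.

```python
MODEL_TYPES = ("maxent", "memm", "crf")
TRAIN_ALGORITHMS = ("l-bfgs", "sgd-l1", "bcd", "rprop", "rprop+", "rprop-")


def build_train_command(executable, input_file, model_file,
                        model_type="crf", train_algorithm="l-bfgs",
                        pattern_file=None, extra_options=()):
    """Assemble an argument list for a Wapiti training run (illustrative only)."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unsupported model type: {model_type!r}")
    if train_algorithm not in TRAIN_ALGORITHMS:
        raise ValueError(f"unsupported training algorithm: {train_algorithm!r}")
    command = [executable, "train", "-T", model_type, "-a", train_algorithm]
    if pattern_file is not None:
        command += ["-p", pattern_file]
    # Extra options play the role of commandLineOptions.
    command += list(extra_options)
    command += [input_file, model_file]
    return command


command = build_train_command(
    "wapiti", "train.txt", "model.wapiti",
    pattern_file="patterns.txt",
    extra_options=["--maxiter", "100"],
)
print(" ".join(command))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces, for example when the list is handed to `subprocess.run`.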

documentFilter

Default value: `true`
Type: Expression

Only process documents that satisfy this expression.

sectionFilter

Default value: `true and layer:words`
Type: Expression

Process only sections that satisfy this expression.

sentenceLayer

Default value: `sentences`
Type: String

Layer containing sentence annotations.

tokenLayer

Default value: `words`
Type: String

Layer containing token annotations.

Deprecated parameters

sentenceLayerName

Deprecated
Type: String

Deprecated alias for sentenceLayer.

tokenLayerName

Deprecated
Type: String

Deprecated alias for tokenLayer.