WapitiTrain

Synopsis

Train a CRF model using Wapiti.

Description

WapitiTrain trains a CRF sequence tagging model using Wapiti. Sequences are built from the annotations in tokenLayer, segmented by the annotations in sentenceLayer.

The trained model is written to modelFile. Use this model with WapitiLabel to run predictions on unlabelled sequences.

Wapiti requires a set of patterns that specify the dependencies of each token. The patterns are written in the CRF++ template language and read from patternFile.
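To make the pattern idea concrete, here is a minimal Python sketch of how a CRF++-style `%x[row,col]` macro can be expanded, where `row` is an offset relative to the current token and `col` is a feature column. This is an illustration only, not Wapiti's own code: the function `expand`, the sample sentence, the template `U01:%x[-1,0]/%x[0,1]` and the `_PAD` placeholder are all assumptions of this sketch (the real tools use their own boundary markers).

```python
import re

# Matches a CRF++-style macro such as %x[-1,0]: relative row, absolute column.
MACRO = re.compile(r"%x\[([-+]?\d+),(\d+)\]")


def expand(template, sentence, position, pad="_PAD"):
    """Expand every %x[row,col] macro for the token at `position`.

    `sentence` is a list of rows, each row a list of column values.
    `pad` is a placeholder chosen for this sketch; it stands in for
    positions that fall outside the sentence.
    """
    def substitute(match):
        row = position + int(match.group(1))
        col = int(match.group(2))
        if 0 <= row < len(sentence):
            return sentence[row][col]
        return pad

    return MACRO.sub(substitute, template)


# Hypothetical sentence: column 0 is the word form, column 1 a POS tag.
sentence = [["The", "DT"], ["cat", "NN"], ["sleeps", "VBZ"]]
template = "U01:%x[-1,0]/%x[0,1]"
expanded = [expand(template, sentence, i) for i in range(len(sentence))]
```

Each pattern therefore yields one observation string per token, combining values taken from the token itself and from its neighbours.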

Snippet

<wapititrain class="WapitiTrain">
    <features></features>
    <modelFile></modelFile>
    <wapitiExecutable></wapitiExecutable>
</wapititrain>

Mandatory parameters

features

Mandatory

Type: Expression[]

A list of expressions, each evaluated as a string on a token annotation. The results are the features of that token. During training, the last feature is the label.
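As a rough picture of what these features amount to, the sketch below serialises tokens in the column layout that CRF++-style tools read: one token per line, one column per feature, the label in the last column, and an empty line between sequences. How WapitiTrain writes its input internally is not stated here, so the function `to_training_text`, the tab separator and the sample values are assumptions of this sketch.

```python
def to_training_text(sentences):
    """Serialise sentences as one token per line with tab-separated columns.

    `sentences` is a list of sentences; each sentence is a list of tokens;
    each token is a list of feature strings whose last item is the label.
    An empty line closes every sentence.
    """
    lines = []
    for sentence in sentences:
        for features in sentence:
            lines.append("\t".join(features))
        lines.append("")  # sequence separator
    return "\n".join(lines)


# Hypothetical data: word form, POS tag, then the label as last feature.
sentences = [
    [["The", "DT", "O"], ["cat", "NN", "B-ANIMAL"]],
    [["Hello", "UH", "O"]],
]
training_text = to_training_text(sentences)
```

With this layout, column 0 and column 1 are what a pattern addresses as `%x[row,0]` and `%x[row,1]`, while the last column is what the model learns to predict.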

modelFile

Mandatory

Type: OutputFile

Path of the trained model file.

wapitiExecutable

Mandatory

Type: ExecutableFile

Path to the wapiti executable.

Optional parameters

commandLineOptions

Optional

Type: String[]

Additional command line options to pass to wapiti. See the Wapiti manual for the list of options. Note that options mode, -T, -a, -p, -m are set automatically by WapitiTrain.
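This page does not say what happens when one of the automatically set flags is also passed in commandLineOptions. A pre-check such as the following Python sketch is one way to catch the overlap before running a plan. The helper `reserved_conflicts` and the sample option list are hypothetical, and only the short flags named above are checked (the mode is a positional argument, so it is left out).

```python
# Flags that this page says WapitiTrain sets by itself.
RESERVED_FLAGS = {"-T", "-a", "-p", "-m"}


def reserved_conflicts(command_line_options):
    """Return the reserved flags found in a list of extra options, in order."""
    return [opt for opt in command_line_options if opt in RESERVED_FLAGS]


# Illustrative extra options; "-a" overlaps with trainAlgorithm.
extra = ["--maxiter", "100", "-a", "sgd-l1"]
conflicts = reserved_conflicts(extra)
```

If `conflicts` is not empty, the corresponding setting is better expressed through modelType, trainAlgorithm, patternFile or modelFile.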

modelType

Optional

Type: String

Model type. Allowed values are: maxent, memm, crf (default).

patternFile

Optional

Type: InputFile

Pattern file that specifies token dependencies.

trainAlgorithm

Optional

Type: String

Training algorithm. Allowed values are: l-bfgs (default), sgd-l1, bcd, rprop, rprop+, rprop-.

documentFilter

Default value: `true`

Type: Expression

Process only documents that satisfy this expression.

sectionFilter

Default value: `true and layer:words`

Type: Expression

Process only sections that satisfy this expression.

sentenceLayer

Default value: `sentences`

Type: String

Layer containing sentence annotations.

tokenLayer

Default value: `words`

Type: String

Layer containing token annotations.

Deprecated parameters

sentenceLayerName

Deprecated

Type: String

Deprecated alias for sentenceLayer.

tokenLayerName

Deprecated

Type: String

Deprecated alias for tokenLayer.