WapitiTrain
Synopsis
Train a CRF model using Wapiti.
Description
WapitiTrain trains a CRF sequence tagging model using Wapiti. The training sequences are built from the annotations in tokenLayer, segmented by the annotations in sentenceLayer: each sentence yields one sequence of tokens.
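To make the segmentation concrete, here is a minimal Python sketch that groups token annotations into one sequence per sentence. It assumes annotations are plain character-offset spans (the `Span` class is hypothetical, not AlvisNLP's data model) and that a token belongs to a sentence when it lies entirely inside it; how WapitiTrain handles tokens that straddle or fall outside sentences is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical annotation: character offsets [start, end) and its text."""
    start: int
    end: int
    text: str = ""


def group_into_sequences(tokens, sentences):
    """Return one list of tokens per sentence, in document order.

    A token belongs to a sentence when it lies entirely within the
    sentence offsets (an assumption of this sketch). Sentences that
    contain no token produce no sequence.
    """
    ordered_tokens = sorted(tokens, key=lambda t: (t.start, t.end))
    sequences = []
    for sent in sorted(sentences, key=lambda s: (s.start, s.end)):
        seq = [t for t in ordered_tokens
               if t.start >= sent.start and t.end <= sent.end]
        if seq:
            sequences.append(seq)
    return sequences
```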
The trained model is written to modelFile. Use this model with WapitiLabel to predict labels for unlabelled sequences.
Wapiti requires a set of patterns that specify the dependencies of each token. The patterns are read from patternFile and written in the CRF++ template language.
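In the CRF++ template language, a macro `%x[row,col]` refers to column `col` of the token `row` positions away from the current one. The following Python sketch expands one template line for a given token. It only illustrates the macro semantics and is not Wapiti's implementation; the `_B` placeholder used for positions outside the sequence is an assumption.

```python
import re

# A CRF++ macro %x[row,col]: row is an offset relative to the current
# token, col is the index of a feature column.
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand_pattern(pattern, sequence, pos, boundary="_B"):
    """Expand one template line for the token at index pos.

    sequence is a list of feature rows (lists of strings). Offsets that
    fall outside the sequence are replaced by boundary plus the signed
    offset, e.g. "_B-1" (a placeholder chosen for this sketch).
    """
    def substitute(match):
        offset = int(match.group(1))
        col = int(match.group(2))
        row = pos + offset
        if 0 <= row < len(sequence):
            return sequence[row][col]
        return f"{boundary}{offset:+d}"

    return MACRO.sub(substitute, pattern)
```

For example, `U01:%x[-1,0]/%x[0,1]` applied to the second token of `[["The", "DT"], ["cat", "NN"], ["sat", "VBD"]]` expands to `U01:The/NN`.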
Snippet
<wapititrain class="WapitiTrain">
<features></features>
<modelFile></modelFile>
<wapitiExecutable></wapitiExecutable>
</wapititrain>
Mandatory parameters
features
A list of expressions, each evaluated as a string with the token annotation as the context. The results form the feature set of each token. During training, the last feature is the label.
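Wapiti reads training data as one token per line, with columns separated by whitespace and a blank line between sequences. Below is a minimal Python sketch of that serialization, assuming each token's features have already been evaluated to strings with the label last. Rejecting values that are empty or contain whitespace is a choice of this sketch, since such values would shift or split the columns.

```python
def to_wapiti_lines(sequences):
    """Serialize labelled sequences in a whitespace-separated column format.

    Each sequence is a list of tokens; each token is a list of feature
    strings whose last element is the label. Writes one token per line,
    columns joined by a single space, and a blank line after each sequence.
    """
    lines = []
    for seq in sequences:
        for features in seq:
            if any(not f or any(c.isspace() for c in f) for f in features):
                raise ValueError(
                    f"feature values must be non-empty and contain no whitespace: {features!r}")
            lines.append(" ".join(features))
        lines.append("")  # blank line terminates the sequence
    return "\n".join(lines) + "\n"
```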
modelFile
Path of the trained model file.
wapitiExecutable
Path to the wapiti executable.
Optional parameters
commandLineOptions
Additional command-line options to pass to wapiti. See the Wapiti manual for the list of options. Note that the mode and the options -T, -a, -p and -m are set automatically by WapitiTrain.
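Because WapitiTrain sets some options itself, a commandLineOptions value that repeats one of them is likely a mistake. This hypothetical Python helper splits the option string with shell-like quoting and rejects the short forms listed above. It does not check long option names or the positional mode argument, and it is not part of AlvisNLP.

```python
import shlex

# Short options that the WapitiTrain documentation says are set automatically.
RESERVED = {"-T", "-a", "-p", "-m"}


def check_extra_options(command_line_options):
    """Split a commandLineOptions string and reject reserved options.

    Returns the list of arguments, or raises ValueError if any argument
    is one of the reserved short options.
    """
    args = shlex.split(command_line_options)
    clashes = [a for a in args if a in RESERVED]
    if clashes:
        raise ValueError(f"options set by WapitiTrain: {', '.join(clashes)}")
    return args
```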
modelType
Model type. Allowed values are: maxent, memm, crf (default).
patternFile
Pattern file that specifies token dependencies.
trainAlgorithm
Training algorithm. Allowed values are: l-bfgs (default), sgd-l1, bcd, rprop, rprop+, rprop-.
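The modelType, trainAlgorithm and patternFile parameters presumably map to Wapiti's -T, -a and -p options. The sketch below validates the allowed values and assembles an argument list of the form `wapiti train [options] [input] [model]`. It is an assumption about how such a command can be built, not a description of what WapitiTrain actually runs; -m is left out because this page does not say what WapitiTrain passes with it.

```python
MODEL_TYPES = ("maxent", "memm", "crf")
TRAIN_ALGORITHMS = ("l-bfgs", "sgd-l1", "bcd", "rprop", "rprop+", "rprop-")


def build_train_command(executable, train_file, model_file,
                        model_type="crf", algorithm="l-bfgs",
                        pattern_file=None, extra_args=()):
    """Assemble a hypothetical `wapiti train` argument list.

    Defaults mirror the documented defaults of modelType and
    trainAlgorithm. extra_args stands in for commandLineOptions.
    """
    if model_type not in MODEL_TYPES:
        raise ValueError(f"modelType must be one of {MODEL_TYPES}")
    if algorithm not in TRAIN_ALGORITHMS:
        raise ValueError(f"trainAlgorithm must be one of {TRAIN_ALGORITHMS}")
    cmd = [executable, "train", "-T", model_type, "-a", algorithm]
    if pattern_file is not None:
        cmd += ["-p", pattern_file]
    cmd += list(extra_args)
    cmd += [train_file, model_file]  # input data, then output model
    return cmd
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with file paths.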
documentFilter
Process only documents that satisfy this expression.
sectionFilter
Process only sections that satisfy this expression.
sentenceLayer
Layer containing sentence annotations.
tokenLayer
Layer containing token annotations.
Deprecated parameters
sentenceLayerName
Deprecated alias for sentenceLayer.
tokenLayerName
Deprecated alias for tokenLayer.