Word2Vec
Synopsis
Computes word embeddings using the CONTES/Gensim implementation.
This module is experimental.
Description
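Computes word embeddings using the CONTES/Gensim implementation. The model is trained on the word forms of the tokens in each sentence. The resulting embeddings can be written to a JSON file, written to a text table, or stored in a token feature.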
Snippet
<word2vec class="Word2Vec">
<contesDir></contesDir>
<python3Executable></python3Executable>
<workers></workers>
</word2vec>
Mandatory parameters
contesDir
Root directory of CONTES.
python3Executable
Path to the Python 3 executable.
workers
Number of worker threads used to train the model. More threads give faster training on multicore machines.
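As an illustration, the snippet with the three mandatory parameters filled in might look like the following. The paths and the thread count are hypothetical values chosen for the example; adjust them to the local installation.

```xml
<word2vec class="Word2Vec">
  <!-- hypothetical location of the CONTES checkout -->
  <contesDir>/opt/contes</contesDir>
  <!-- hypothetical path to the Python 3 interpreter -->
  <python3Executable>/usr/bin/python3</python3Executable>
  <!-- example value: four training threads -->
  <workers>4</workers>
</word2vec>
```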
Optional parameters
additionalArguments
UNDOCUMENTED
jsonFile
File to which the embeddings are written as a JSON object.
modelFile
UNDOCUMENTED
txtFile
File to which the embeddings are written as a table.
vectorFeature
Name of the token feature in which to store the embedding of each token. If this parameter is not set, embeddings are not stored in any feature.
documentFilter
Process only documents that satisfy this expression.
formFeature
Token feature to use as the word form.
minCount
UNDOCUMENTED
sectionFilter
Process only sections that satisfy this expression.
sentenceLayer
Name of the layer containing sentence annotations.
tokenLayer
Name of the layer containing token annotations.
vectorSize
The dimensionality of the embedding vectors. Values between 100 and 300 are often effective.
windowSize
The maximum distance between the current and predicted word within a sentence.
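A sketch of a configuration that combines the mandatory parameters with some of the optional ones is shown below. It assumes that optional parameters are written as child elements in the same way as the mandatory ones in the snippet. All paths, the feature name embedding, and the numeric values are hypothetical.

```xml
<word2vec class="Word2Vec">
  <!-- mandatory parameters (hypothetical values) -->
  <contesDir>/opt/contes</contesDir>
  <python3Executable>/usr/bin/python3</python3Executable>
  <workers>4</workers>
  <!-- training settings: 200-dimensional vectors, context of 5 words -->
  <vectorSize>200</vectorSize>
  <windowSize>5</windowSize>
  <!-- outputs: a JSON object, a text table, and a token feature -->
  <jsonFile>embeddings.json</jsonFile>
  <txtFile>embeddings.txt</txtFile>
  <vectorFeature>embedding</vectorFeature>
</word2vec>
```

If vectorFeature is omitted, the embeddings are available only through the output files.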
Deprecated parameters
formFeatureName
Deprecated alias for formFeature.
sentenceLayerName
Deprecated alias for sentenceLayer.
tokenLayerName
Deprecated alias for tokenLayer.
vectorFeatureName
Deprecated alias for vectorFeature.
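When updating an older configuration, each deprecated name can be replaced by its current equivalent. The fragment below is a sketch of the result after renaming. The layer and feature names are hypothetical, and the mandatory values are placeholders.

```xml
<word2vec class="Word2Vec">
  <contesDir>/opt/contes</contesDir>
  <python3Executable>/usr/bin/python3</python3Executable>
  <workers>4</workers>
  <!-- was: formFeatureName -->
  <formFeature>form</formFeature>
  <!-- was: sentenceLayerName -->
  <sentenceLayer>sentences</sentenceLayer>
  <!-- was: tokenLayerName -->
  <tokenLayer>words</tokenLayer>
  <!-- was: vectorFeatureName -->
  <vectorFeature>embedding</vectorFeature>
</word2vec>
```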