AlvisNLP

corpus processing engine

OpenNLPDocumentCategorizer

Synopsis

Categorizes documents using a model trained with OpenNLPDocumentCategorizerTrain.

This module is experimental.

Description

OpenNLPDocumentCategorizer uses a model trained with OpenNLPDocumentCategorizerTrain to categorize unlabeled documents. The documents to categorize are selected by `documents`. The classifier reads the content of each document from the tokens selected by `tokens`, taking the text of each token from `form`.

Snippet

<opennlpdocumentcategorizer class="OpenNLPDocumentCategorizer">
    <categoryFeature></categoryFeature>
    <model></model>
</opennlpdocumentcategorizer>

Mandatory parameters

categoryFeature

Mandatory
Type: String

Name of the feature in which the predicted category is stored.

model

Mandatory

Model file generated with OpenNLPDocumentCategorizerTrain.
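
As an illustration of the two mandatory parameters, the snippet above might be filled in as follows. This is only a sketch: the feature name `category` and the model path `models/doccat.bin` are hypothetical values chosen for the example, not names defined by AlvisNLP.

```xml
<opennlpdocumentcategorizer class="OpenNLPDocumentCategorizer">
    <!-- hypothetical feature name: receives the predicted category -->
    <categoryFeature>category</categoryFeature>
    <!-- hypothetical path to a model produced by OpenNLPDocumentCategorizerTrain -->
    <model>models/doccat.bin</model>
</opennlpdocumentcategorizer>
```

With only these two parameters set, the optional parameters described below keep their default values.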

Optional parameters

scoreFeature

Optional
Type: String

Name of the feature in which the score of the predicted category is stored.

scoresFeaturePrefix

Optional
Type: String

Prefix of the feature names in which the score of each category is stored.

documents

Default value: `documents`
Type: Expression

Elements to classify. This expression is evaluated from the corpus.

form

Default value: `@form`
Type: Expression

Form of the token. This expression is evaluated as a string from the token.

tokens

Default value: `sections.layer:words`
Type: Expression

Tokens of the elements to classify. This expression is evaluated as a list of elements from the element to classify.
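
The optional parameters can be combined with the mandatory ones in the same module element. The sketch below writes out the documented defaults for `documents`, `tokens` and `form` explicitly, and adds the two score parameters. The feature names `category`, `category-score` and `score-`, and the model path, are hypothetical and serve only as placeholders.

```xml
<opennlpdocumentcategorizer class="OpenNLPDocumentCategorizer">
    <!-- mandatory parameters (hypothetical values) -->
    <categoryFeature>category</categoryFeature>
    <model>models/doccat.bin</model>

    <!-- optional score outputs (hypothetical feature names) -->
    <scoreFeature>category-score</scoreFeature>
    <scoresFeaturePrefix>score-</scoresFeaturePrefix>

    <!-- documented default values, written out explicitly -->
    <documents>documents</documents>
    <tokens>sections.layer:words</tokens>
    <form>@form</form>
</opennlpdocumentcategorizer>
```

Writing out the defaults is redundant, but it shows which expressions to change when the documents are tokenized into a different layer or when the classifier should read a token feature other than the form.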

Deprecated parameters