AlvisNLP

corpus processing engine

KeywordsSelector

Synopsis

Selects most relevant keywords in documents.

Description

KeywordsSelector selects the most relevant keywords in documents. The candidate keywords are given by the `keywords` expression, which is evaluated as a list of elements with the document as the context element. The text of each keyword is given by `keywordForm`.

KeywordsSelector ranks the keywords according to `scoreFunction`, then selects the `keywordCount` keywords with the highest scores. The selected keywords are stored in the document feature `keywordFeature`, and the corresponding scores in `scoreFeature`.
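The rank-then-select behaviour described above can be illustrated with a minimal Python sketch. This is not the module's implementation: the function name `select_keywords` is hypothetical, raw frequency stands in for whatever `scoreFunction` computes, and the alphabetical tie-break is an assumption made only to keep the output deterministic.

```python
from collections import Counter


def select_keywords(keyword_forms, keyword_count=2147483647):
    """Rank keyword forms by a score and keep the top `keyword_count`.

    `keyword_forms` plays the role of the strings obtained by evaluating
    `keywordForm` on each element returned by `keywords` for one document.
    Raw frequency is used as a stand-in score (an assumption).
    """
    scores = Counter(keyword_forms)
    # Highest score first; ties broken alphabetically (assumed, for determinism).
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    # Keep only the best `keyword_count` entries, as (form, score) pairs.
    return [(form, float(score)) for form, score in ranked[:keyword_count]]


forms = ["gene", "protein", "gene", "cell", "gene", "protein"]
print(select_keywords(forms, keyword_count=2))
# [('gene', 3.0), ('protein', 2.0)]
```

In the module itself, the two halves of each pair would end up in `keywordFeature` and `scoreFeature` respectively.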

Snippet

<keywordsselector class="KeywordsSelector">
</keywordsselector>

Mandatory parameters

Optional parameters

keywordFeature

Optional
Type: String

Document feature in which to store the selected keywords.

outFile

Optional

scoreFeature

Optional
Type: String

Document feature in which to store the scores of the selected keywords, as computed by `scoreFunction`.

charset

Default value: `UTF-8`
Type: String

documentId

Default value: `@id`
Type: Expression

documents

Default value: `documents`
Type: Expression

keywordCount

Default value: `2147483647`
Type: Integer

Number of keywords to select.

keywordForm

Default value: `@form`
Type: Expression

Text of the keyword. This expression is evaluated as a string with the keyword element as the context.

keywords

Default value: `sections.layer:words`
Type: Expression

Expression evaluated as a list of elements with the document as the context element. Each element represents a keyword of the document.

scoreFunction

Default value: `ABSOLUTE`

Function used to rank keywords. Available functions include keyword frequency, several variants of tf-idf, and Okapi BM25.
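For orientation, the two families of weighting schemes named here can be sketched in their common textbook forms. The exact variants, names, and constants used by KeywordsSelector are not stated on this page, so the function names and the defaults `k1=1.2` and `b=0.75` below are assumptions taken from the general literature.

```python
import math


def tf_idf(tf, df, n_docs):
    """Textbook tf-idf: term frequency times log inverse document frequency.

    tf: occurrences of the keyword in the document
    df: number of documents containing the keyword (must be > 0)
    n_docs: total number of documents in the corpus
    """
    return tf * math.log(n_docs / df)


def bm25(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Textbook Okapi BM25 weight of one keyword in one document."""
    # Smoothed idf that stays non-negative even for very common keywords.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    # Term-frequency saturation with document-length normalisation.
    norm = tf + k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / norm
```

Both functions grow with the keyword's frequency in the document and shrink as the keyword appears in more documents; BM25 additionally saturates for high frequencies and normalises by document length.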

scoreThreshold

Default value: `0.0`
Type: Double

separator

Default value: ` `
Type: Character

Deprecated parameters