AlvisNLP

corpus processing engine

FSOVFileReader

Synopsis

Project-specific text file reader.

Description

FSOVFileReader reads text files in the same way as TextFileReader. In addition, for each file it reads, it loads metadata from a file with the same name and the .xml extension, located in xmlDir.

Snippet

<fsovfilereader class="FSOVFileReader">
    <sourcePath></sourcePath>
    <xmlDir></xmlDir>
</fsovfilereader>

Mandatory parameters

sourcePath

Mandatory

Path to the source directory or source file.

xmlDir

Mandatory

Directory containing the metadata files.
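
As an illustration, a filled-in version of the snippet might look like the following. This is a minimal sketch: the paths `corpus/txt` and `corpus/xml` are hypothetical placeholders, not values taken from any real project.

```xml
<!-- Sketch only: both paths below are hypothetical placeholders. -->
<fsovfilereader class="FSOVFileReader">
    <!-- Directory (or single file) containing the text files to read -->
    <sourcePath>corpus/txt</sourcePath>
    <!-- Directory holding the .xml metadata files that match the text files by name -->
    <xmlDir>corpus/xml</xmlDir>
</fsovfilereader>
```

Keeping the text files and the metadata files in separate directories, as assumed here, avoids the reader picking up the .xml files as source documents when sourcePath is a directory.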

Optional parameters

constantDocumentFeatures

Optional
Type: Mapping

Constant features to add to each document created by this module.

constantSectionFeatures

Optional
Type: Mapping

Constant features to add to each section created by this module.

linesLimit

Optional
Type: Integer

Maximum number of lines per document.

sizeLimit

Optional
Type: Integer

Maximum number of characters per document. No limit if not set.

bodySection

Default value: `body`
Type: String

Name of the section containing the contents of the document.

charset

Default value: `UTF-8`
Type: String

Character set of the input files.

titleSection

Default value: `title`
Type: String

Name of the section containing the title of the document.
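
The optional parameters above could be combined as in the sketch below. It assumes that Integer and String parameters are written as child elements, in the same way as the mandatory parameters in the Snippet section. All paths and values are illustrative only, and the Mapping parameters are left out because their XML syntax is not shown on this page.

```xml
<!-- Sketch only: paths and values are hypothetical examples. -->
<fsovfilereader class="FSOVFileReader">
    <sourcePath>corpus/txt</sourcePath>
    <xmlDir>corpus/xml</xmlDir>
    <!-- Input files are not UTF-8 in this example, so override the default charset -->
    <charset>ISO-8859-1</charset>
    <!-- Maximum number of characters per document -->
    <sizeLimit>100000</sizeLimit>
    <!-- Maximum number of lines per document -->
    <linesLimit>2000</linesLimit>
    <!-- Override the default section names (body and title) -->
    <bodySection>text</bodySection>
    <titleSection>heading</titleSection>
</fsovfilereader>
```

Any parameter that is omitted keeps its documented default, so a plan only needs to list the values it changes.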

Deprecated parameters

bodySectionName

Deprecated
Type: String

Deprecated alias for bodySection.

titleSectionName

Deprecated
Type: String

Deprecated alias for titleSection.
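
When updating an older plan, the deprecated names can be replaced with their current equivalents. The sketch below assumes the parameters are written as child elements, as in the Snippet section; the paths are hypothetical placeholders.

```xml
<!-- Sketch only: paths are hypothetical placeholders. -->
<fsovfilereader class="FSOVFileReader">
    <sourcePath>corpus/txt</sourcePath>
    <xmlDir>corpus/xml</xmlDir>
    <!-- Use bodySection where an older plan used bodySectionName -->
    <bodySection>body</bodySection>
    <!-- Use titleSection where an older plan used titleSectionName -->
    <titleSection>title</titleSection>
</fsovfilereader>
```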