TextFileReader
Synopsis
Reads files and adds a document to the corpus for each file.
Description
TextFileReader reads one or more files from source and creates a document in the corpus for each file. The identifier of the created document is the absolute path of the corresponding file. The created document has a single section, named by section, whose contents are the contents of the corresponding file.
If source is a path to a file, then TextFileReader will read this file. If source is a path to a directory, then TextFileReader will read the files in this directory.
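The file-or-directory rule above can be sketched in Python. This is only an illustration, not the module's implementation: the function name list_source_files is hypothetical, and the sketch assumes that a directory is read without descending into subdirectories, which the description does not state.

```python
from pathlib import Path
from typing import List


def list_source_files(source: str) -> List[Path]:
    """Return the absolute paths of the files that would be read for `source`.

    Hypothetical helper: mirrors the documented rule (a file is read as is,
    a directory contributes the files it contains).
    """
    path = Path(source).resolve()
    if path.is_file():
        # A single file yields exactly one document.
        return [path]
    if path.is_dir():
        # Assumption: only regular files directly inside the directory,
        # with no recursion into subdirectories.
        return sorted(p for p in path.iterdir() if p.is_file())
    raise FileNotFoundError(f"source does not exist: {source}")
```

Each returned path is absolute, which matches the documented document identifier.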
If linesLimit is set, then TextFileReader creates a new document for each successive group of at most linesLimit lines. For instance, if linesLimit is set to 10 and a file contains 25 lines, then 3 documents are created: two containing 10 lines each and one containing the last 5 lines.
All files are read using the same character encoding, given by charset.
The created documents will all have the features defined in constantDocumentFeatures. The single section of each document will have the features defined in constantSectionFeatures.
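The splitting arithmetic can be checked with a short Python sketch. It is an illustration under assumptions, not the module's code: the function name split_lines is hypothetical, and it only models how lines are grouped, not how the resulting documents are named.

```python
from typing import Iterator, List


def split_lines(lines: List[str], lines_limit: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `lines_limit` lines."""
    if lines_limit <= 0:
        raise ValueError("lines_limit must be positive")
    for start in range(0, len(lines), lines_limit):
        # The last group may be shorter than lines_limit.
        yield lines[start:start + lines_limit]


# A 25-line file with a limit of 10 gives groups of 10, 10 and 5 lines.
lines = [f"line {i}" for i in range(1, 26)]
print([len(group) for group in split_lines(lines, 10)])  # [10, 10, 5]
```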
Snippet
<textfilereader class="TextFileReader">
<source></source>
</textfilereader>
Mandatory parameters
source
Path to the source directory or source file.
Optional parameters
constantDocumentFeatures
Constant features to add to each document created by this module.
constantSectionFeatures
Constant features to add to each section created by this module.
linesLimit
Maximum number of lines per document.
sizeLimit
Maximum number of characters per document. No limit if not set.
baseNameId
Use the filename base name instead of the full path as document identifier.
charset
Character set of the input files.
section
Name of the single section containing the whole contents of a file.
Deprecated parameters
sectionName
Deprecated alias for section.