MultiRegExp
Synopsis
Searches for several regular expressions in section contents.
This module is experimental.
Description
MultiRegExp attempts to match the regular expression patterns read from patternsFile against section contents. The patterns file is a CSV file in which one column (see keyColumn) contains the patterns. The patterns must follow the Java Pattern syntax.
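The following is a minimal Java sketch of how patterns could be loaded from such a file. It is not the module's actual implementation. It assumes a plain delimiter with no quoting or escaping, and the class and method names (PatternsFileSketch, loadPatterns) are hypothetical. It shows that each pattern cell is compiled with java.util.regex.Pattern, so an invalid pattern fails with a PatternSyntaxException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: reads patterns from CSV-like lines.
// Assumption: plain delimiter, no quote or escape handling.
final class PatternsFileSketch {
    static List<Pattern> loadPatterns(List<String> lines, char delimiter,
                                      int keyColumn, boolean headerLine) {
        List<Pattern> patterns = new ArrayList<>();
        // Skip the first row when it is a header line
        for (int i = headerLine ? 1 : 0; i < lines.size(); i++) {
            // Limit -1 keeps trailing empty columns
            String[] columns = lines.get(i).split(Pattern.quote(String.valueOf(delimiter)), -1);
            // Throws PatternSyntaxException if the cell is not valid Java regex syntax
            patterns.add(Pattern.compile(columns[keyColumn]));
        }
        return patterns;
    }
}
```

For example, with a tab delimiter, key column 0 and a header line, the rows `[Bb]acill(us|i)	genus` and `E\. coli	species` yield two compiled patterns.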
MultiRegExp creates an annotation in targetLayer for each match. Additionally, MultiRegExp adds to the annotation one feature for each column of the row that contains the matched pattern; the feature names are given by valueFeatures.
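As an illustration of how row columns could become annotation features, the sketch below pairs each name in valueFeatures with the column at the same index. The class name FeatureSketch is hypothetical, and rejecting a row whose column count differs from the number of feature names is a choice of this sketch, not documented module behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: maps feature names to the columns of the matched row.
final class FeatureSketch {
    static Map<String, String> features(List<String> valueFeatures, String[] row) {
        // Assumption: one feature name per column, including the patterns column
        if (valueFeatures.size() != row.length) {
            throw new IllegalArgumentException("expected " + valueFeatures.size()
                + " columns, got " + row.length);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < row.length; i++) {
            result.put(valueFeatures.get(i), row[i]);
        }
        return result;
    }
}
```

With valueFeatures `pattern,label` and the row `[Bb]acill(us|i)` / `genus`, an annotation for a match of that pattern would carry `label=genus`.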
Matches of an individual pattern do not overlap; however, matches of different patterns may overlap.
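These overlap semantics are what a standard java.util.regex scan produces: Matcher.find() resumes after the end of the previous match, so one pattern never yields overlapping matches, while separate scans for separate patterns are independent. The sketch below (hypothetical class OverlapSketch) illustrates this; it is not the module's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: collects [start, end) offsets of all matches of one pattern.
final class OverlapSketch {
    static List<int[]> matches(Pattern pattern, String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        // Each find() starts after the previous match, so spans never overlap
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        return spans;
    }
}
```

For instance, `aa` matches `aaaa` only at [0, 2) and [2, 4), not three times. In contrast, `New York` and `York City` both match `New York City`, at [0, 8) and [4, 13), and these two spans overlap.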
Snippet
<multiregexp class="MultiRegExp">
<patternsFile></patternsFile>
<targetLayer></targetLayer>
<valueFeatures></valueFeatures>
</multiregexp>
Mandatory parameters
patternsFile
CSV file containing patterns.
targetLayer
Layer where to place annotations.
valueFeatures
Names of the features created for each annotation, one per column of patternsFile, including the patterns column.
Optional parameters
constantAnnotationFeatures
Constant features to add to each annotation created by this module.
delimiter
Column delimiter of the CSV file.
escape
Character used to escape characters in column values.
headerLine
Whether to skip the first row (header line).
quote
Character used to quote the column values.
trimValues
Whether to trim leading and trailing whitespace from column values.
baseFormat
Base format of the CSV file. Must be one of: default, excel, mysql, rfc4180, oracle, postgresql_csv, postgresql_text, tdf, tab.
caseInsensitive
Whether matching is case-insensitive.
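In java.util.regex, case-insensitive matching is enabled with compile flags. The sketch below assumes that caseInsensitive corresponds to Pattern.CASE_INSENSITIVE combined with Pattern.UNICODE_CASE; whether the module also sets UNICODE_CASE is an assumption. Without UNICODE_CASE, Java folds case only for US-ASCII characters. The class name CaseSketch is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of how a caseInsensitive option could map to Pattern flags.
final class CaseSketch {
    static Pattern compile(String regex, boolean caseInsensitive) {
        // Assumption: UNICODE_CASE is added so non-ASCII letters also fold
        int flags = caseInsensitive ? Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE : 0;
        return Pattern.compile(regex, flags);
    }
}
```

With the flag set, `bacillus` matches `BACILLUS`; without it, it does not.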
documentFilter
Process only documents that satisfy this expression.
keyColumn
Index of the column that contains the patterns. The first column is 0.
matchWordBoundaries
Only create annotations for matches that fit exactly between word boundaries.
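A minimal sketch of such a filter, assuming that a match "fits between word boundaries" when the characters immediately before and after it are not letters or digits (or are the text edges). The module's exact definition of a word character is not stated here, and the class name WordBoundarySketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: keeps only matches not glued to surrounding word characters.
final class WordBoundarySketch {
    // Assumption: a word character is a letter or a digit; out-of-range positions are not
    private static boolean isWordChar(String text, int i) {
        return i >= 0 && i < text.length() && Character.isLetterOrDigit(text.charAt(i));
    }

    static List<String> boundedMatches(Pattern pattern, String text) {
        List<String> kept = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            // Reject the match if a word character touches it on either side
            if (!isWordChar(text, m.start() - 1) && !isWordChar(text, m.end())) {
                kept.add(m.group());
            }
        }
        return kept;
    }
}
```

For example, `York` is rejected inside `Yorkshire` but kept in `New York`. Because this sketch filters the matches of an ordinary scan, it can behave differently from wrapping the pattern in `\b...\b`, which would change which matches the scan finds in the first place.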
sectionFilter
Process only sections that satisfy this expression.