OBOProjector
Synopsis
Projects OBO terms and synonyms on sections.
Description
OBOProjector reads oboFiles in OBO format and searches for term names and synonyms in sections.
The parameters allowJoined, allUpperCaseInsensitive, caseInsensitive, ignoreDiacritics, joinDash, matchStartCaseInsensitive, skipConsecutiveWhitespaces, skipWhitespace and wordStartCaseInsensitive control the matching between the section and the entry keys.
The subject parameter specifies which text of the section should be matched. There are two options:
- the entries are matched on the contents of the section; subject can also control whether match boundaries must coincide with word delimiters;
- the entries are matched on the feature value of annotations of a given layer separated by a whitespace, in this way entries can be searched against word lemmas for instance.
OBOProjector creates an annotation for each matched entry and adds these annotations to the layer named targetLayer. The created annotations have the features nameFeature, idFeature and pathFeature set to the matched term name, identifier and path, respectively.
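As an illustration of the feature parameters, the following plan fragment is a minimal sketch that names the three features explicitly. The file path, the layer name and the feature names are hypothetical values chosen for this example, not defaults of the module.

```xml
<!-- Sketch only: all values below are illustrative assumptions. -->
<oboprojector class="OBOProjector">
  <!-- hypothetical path to an OBO file -->
  <oboFiles>ontology.obo</oboFiles>
  <!-- hypothetical layer that receives one annotation per match -->
  <targetLayer>ontology-terms</targetLayer>
  <!-- hypothetical feature names for the matched term name, identifier and path -->
  <nameFeature>term-name</nameFeature>
  <idFeature>term-id</idFeature>
  <pathFeature>term-path</pathFeature>
</oboprojector>
```

With such a configuration, each annotation in the ontology-terms layer would be expected to carry term-name, term-id and term-path features describing the matched term.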
If trieSource is specified, OBOProjector assumes that it contains a compiled version of the dictionary and does not read oboFiles. If trieSink is specified, OBOProjector writes a compiled version of the dictionary to trieSink. Using a compiled dictionary may speed up processing for large dictionaries.
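The compiled dictionary workflow can be sketched as two separate runs: a first plan that writes the compiled dictionary, and a later plan that reads it back. The file names and the layer name are hypothetical. Keeping oboFiles in the second fragment is an assumption based on it being listed as a mandatory parameter.

```xml
<!-- First run (sketch): read the OBO file and also write a compiled dictionary. -->
<oboprojector class="OBOProjector">
  <oboFiles>ontology.obo</oboFiles>             <!-- hypothetical path -->
  <targetLayer>ontology-terms</targetLayer>     <!-- hypothetical layer name -->
  <trieSink>ontology.trie</trieSink>            <!-- hypothetical output file -->
</oboprojector>

<!-- Later run (sketch): read the compiled dictionary instead of the OBO file. -->
<oboprojector class="OBOProjector">
  <!-- assumed to be required still, although it is not read when trieSource is set -->
  <oboFiles>ontology.obo</oboFiles>
  <targetLayer>ontology-terms</targetLayer>
  <trieSource>ontology.trie</trieSource>        <!-- file written by the first run -->
</oboprojector>
```

A compiled dictionary presumably reflects the OBO files it was built from, so it seems prudent to regenerate it whenever those files change.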
Snippet
<oboprojector class="OBOProjector">
<oboFiles></oboFiles>
<targetLayer></targetLayer>
</oboprojector>
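A filled-in variant of the snippet above might look like the following sketch, which enables two of the matching options. The path and layer name are hypothetical, and writing boolean values as element text is an assumption that follows the style of the snippet.

```xml
<!-- Sketch only: path and layer name are illustrative assumptions. -->
<oboprojector class="OBOProjector">
  <oboFiles>ontology.obo</oboFiles>
  <targetLayer>ontology-terms</targetLayer>
  <!-- allow case folding on all characters -->
  <caseInsensitive>true</caseInsensitive>
  <!-- allow diacritic removal, e.g. "acide amine" matches the key "acide aminé" -->
  <ignoreDiacritics>true</ignoreDiacritics>
</oboprojector>
```

Looser matching generally finds more mentions at the cost of more spurious matches, so it may be worth enabling these options one at a time.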
Mandatory parameters
oboFiles
Path to the source OBO files.
targetLayer
Name of the layer that contains the match annotations.
Optional parameters
altPathFeatures
UNDOCUMENTED
ancestorsFeature
Name of the feature that contains the identifiers of the term's ancestors.
childrenFeature
Name of the feature that contains the identifiers of the term's children.
constantAnnotationFeatures
Constant features to add to each annotation created by this module.
idFeature
Feature where to store the matched term identifier.
nameFeature
Feature where to store the matched term name.
parentsFeature
Name of the feature that contains the identifiers of the term's parents.
pathFeature
Feature where to store the matched term path.
trieSink
If set, then OBOProjector writes the compiled dictionary to the specified file.
trieSource
If set, read the compiled dictionary from the specified file. Compiled dictionaries are usually faster for large dictionaries.
versionFeature
Name of the feature where to store the ontology version.
allUpperCaseInsensitive
If set to true, then allow case folding on all characters in words that are all upper case.
allowJoined
If set to true, then allow arbitrary suppression of whitespace characters in the subject. For instance, the contents aminoacid matches the key amino acid.
caseInsensitive
If set to true, then allow case folding on all characters.
documentFilter
Process only documents that satisfy this expression.
ignoreDiacritics
If set to true, then allow diacritic removal on all characters. For instance, the contents acide amine matches the key acide aminé.
joinDash
If set to true, then treat dash characters (-) as whitespace characters with regard to allowJoined. For instance, the contents aminoacid matches the entry amino-acid.
keepDBXref
Add all database cross-references of the term. OBOProjector creates a feature key-value pair for each dbxref in the matching term.
matchStartCaseInsensitive
If set to true, then allow case folding on the first character of the entry key.
multipleEntryBehaviour
Specifies the behavior if the lexicon contains several entries with the same key.
sectionFilter
Process only sections that satisfy this expression.
skipConsecutiveWhitespaces
If set to true, then allow the insertion of consecutive whitespace characters in the subject. For instance, the contents amino  acid (with two spaces) matches the entry amino acid.
skipWhitespace
If set to true, then allow arbitrary insertion of whitespace characters in the subject. For instance, the contents amino acid matches the key aminoacid.
subject
Specifies the contents to match.
substituteWhitespace
If set to true, then all whitespace characters match each other (including ‘\n’, ‘\r’, ‘\t’, and non-breaking spaces).
wordStartCaseInsensitive
If set to true, then allow case folding on the first character of each word.
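The whitespace-related options documented above can be combined in one module. The following is a sketch under the same assumptions as the other fragments on this page: hypothetical path and layer name, and boolean values written as element text.

```xml
<!-- Sketch only: path and layer name are illustrative assumptions. -->
<oboprojector class="OBOProjector">
  <oboFiles>ontology.obo</oboFiles>
  <targetLayer>ontology-terms</targetLayer>
  <!-- "aminoacid" in the text may match the key "amino acid" -->
  <allowJoined>true</allowJoined>
  <!-- dashes count as whitespace for allowJoined: "aminoacid" may match "amino-acid" -->
  <joinDash>true</joinDash>
  <!-- runs of whitespace in the text are tolerated -->
  <skipConsecutiveWhitespaces>true</skipConsecutiveWhitespaces>
  <!-- newlines, tabs and non-breaking spaces match ordinary spaces -->
  <substituteWhitespace>true</substituteWhitespace>
</oboprojector>
```

This combination is aimed at text with irregular spacing or line breaks inside multi-word terms. skipWhitespace is left out here because arbitrary whitespace insertion is the most permissive of these options.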
Deprecated parameters
targetLayerName
Deprecated alias for targetLayer.