OBOProjector

Synopsis

Projects OBO terms and synonyms on sections.

Description

OBOProjector reads oboFiles in OBO format and searches for term names and synonyms in sections.

The parameters allowJoined, allUpperCaseInsensitive, caseInsensitive, ignoreDiacritics, joinDash, matchStartCaseInsensitive, skipConsecutiveWhitespaces, skipWhitespace and wordStartCaseInsensitive control the matching between the section and the entry keys.
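
As an illustration, the fragment below enables a few of these options. It is a minimal sketch rather than a reference configuration: the file name ontology.obo and the layer name obo-terms are hypothetical, and it assumes that Boolean parameters are written as true or false in the element body, following the layout used in the Snippet section.

```xml
<oboprojector class="OBOProjector">
    <!-- hypothetical input file and target layer name -->
    <oboFiles>ontology.obo</oboFiles>
    <targetLayer>obo-terms</targetLayer>
    <!-- allow case folding on all characters -->
    <caseInsensitive>true</caseInsensitive>
    <!-- the contents "acide amine" matches the key "acide aminé" -->
    <ignoreDiacritics>true</ignoreDiacritics>
    <!-- the contents "aminoacid" matches the key "amino acid" -->
    <allowJoined>true</allowJoined>
    <!-- with allowJoined, "aminoacid" also matches the entry "amino-acid" -->
    <joinDash>true</joinDash>
</oboprojector>
```

All of these options default to false, so only the relaxations that are actually wanted need to be listed.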

The subject parameter specifies which text of the section is matched. There are two options:

the entries are matched against the contents of the section; in this mode, subject can also control whether match boundaries must coincide with word delimiters;
the entries are matched against the feature values of the annotations of a given layer, separated by a whitespace character; in this way entries can be searched against word lemmas, for instance.
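
The second option might be configured as sketched below. This page does not describe the syntax of the Subject type, so the layer and feature attributes on subject are an assumption made for illustration, as are the layer name words, the feature name lemma, the file name ontology.obo and the target layer name obo-terms. Check the exact form against the documentation of the Subject type before relying on it.

```xml
<oboprojector class="OBOProjector">
    <!-- hypothetical input file and target layer name -->
    <oboFiles>ontology.obo</oboFiles>
    <targetLayer>obo-terms</targetLayer>
    <!-- ASSUMED syntax: match entries against the "lemma" feature of the
         annotations in the "words" layer instead of the section contents -->
    <subject layer="words" feature="lemma"/>
</oboprojector>
```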

OBOProjector creates an annotation for each matched entry and adds these annotations to the layer named targetLayer. The created annotations will have the features nameFeature, idFeature and pathFeature set to the matched term name, identifier and path.
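
For example, the following fragment names the three features explicitly. It is a sketch under stated assumptions: the feature names term-name, term-id and term-path, the layer name obo-terms and the file name ontology.obo are all hypothetical.

```xml
<oboprojector class="OBOProjector">
    <!-- hypothetical input file and target layer name -->
    <oboFiles>ontology.obo</oboFiles>
    <targetLayer>obo-terms</targetLayer>
    <!-- hypothetical feature names for the matched term name, identifier and path -->
    <nameFeature>term-name</nameFeature>
    <idFeature>term-id</idFeature>
    <pathFeature>term-path</pathFeature>
</oboprojector>
```

With this configuration, each annotation added to obo-terms would carry the matched term name in term-name, its identifier in term-id and its path in term-path.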

If trieSource is specified, OBOProjector assumes that it contains a compiled version of the dictionary and does not read oboFiles. If trieSink is specified, OBOProjector writes a compiled version of the dictionary to it. Using a compiled dictionary may speed up processing when the dictionary is large.
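
A possible two-step use of compiled dictionaries is sketched below: a first run compiles the dictionary, and later runs load it. The file names ontology.obo and ontology.trie and the layer name obo-terms are hypothetical. oboFiles is kept in the second fragment only because it is listed as mandatory; according to the description above it is not read when trieSource is set.

```xml
<!-- First run: read the OBO file and write the compiled dictionary -->
<oboprojector class="OBOProjector">
    <oboFiles>ontology.obo</oboFiles>
    <targetLayer>obo-terms</targetLayer>
    <trieSink>ontology.trie</trieSink>
</oboprojector>

<!-- Later runs: load the compiled dictionary instead of parsing the OBO file -->
<oboprojector class="OBOProjector">
    <oboFiles>ontology.obo</oboFiles>
    <targetLayer>obo-terms</targetLayer>
    <trieSource>ontology.trie</trieSource>
</oboprojector>
```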

Snippet

<oboprojector class="OBOProjector">
    <oboFiles></oboFiles>
    <targetLayer></targetLayer>
</oboprojector>

Mandatory parameters

oboFiles

Mandatory

Type: InputFile[]

Path to the source OBO files.

targetLayer

Mandatory

Type: String

Name of the layer that contains the match annotations.

Optional parameters

altPathFeatures

Optional

Type: MultiMapping

UNDOCUMENTED

ancestorsFeature

Optional

Type: String

Name of the feature that contains the identifiers of the term's ancestors.

childrenFeature

Optional

Type: String

Name of the feature that contains the identifiers of the term's children.

constantAnnotationFeatures

Optional

Type: Mapping

Constant features to add to each annotation created by this module.

idFeature

Optional

Type: String

Feature in which to store the matched term identifier.

nameFeature

Optional

Type: String

Feature in which to store the matched term name.

parentsFeature

Optional

Type: String

Name of the feature that contains the identifiers of the term's parents.

pathFeature

Optional

Type: String

Feature in which to store the matched term path.

trieSink

Optional

Type: OutputFile

If set, then OBOProjector writes the compiled dictionary to the specified file.

trieSource

Optional

Type: InputFile

If set, then OBOProjector reads the compiled dictionary from the specified file. Using a compiled dictionary is usually faster when the dictionary is large.

versionFeature

Optional

Type: String

Name of the feature in which to store the ontology version.

allUpperCaseInsensitive

Default value: `false`

Type: Boolean

If set to true, then allow case folding on all characters in words that are all upper case.

allowJoined

Default value: `false`

Type: Boolean

If set to true, then allow arbitrary suppression of whitespace characters in the subject. For instance, the contents aminoacid matches the key amino acid.

caseInsensitive

Default value: `false`

Type: Boolean

If set to true, then allow case folding on all characters.

documentFilter

Default value: `true`

Type: Expression

Process only documents that satisfy this expression.

ignoreDiacritics

Default value: `false`

Type: Boolean

If set to true, then allow diacritic removal on all characters. For instance, the contents acide amine matches the key acide aminé.

joinDash

Default value: `false`

Type: Boolean

If set to true, then treat dash characters (-) as whitespace characters with regard to allowJoined. For instance, the contents aminoacid matches the entry amino-acid.

keepDBXref

Default value: `false`

Type: Boolean

Add all database cross-references of the term. OBOProjector creates a feature key-value pair for each dbxref in the matching term.

matchStartCaseInsensitive

Default value: `false`

Type: Boolean

If set to true, then allow case folding on the first character of the entry key.

multipleEntryBehaviour

Default value: `all`

Type: MultipleEntryBehaviour

Specifies the behavior if the lexicon contains several entries with the same key.

sectionFilter

Default value: `true`

Type: Expression

Process only sections that satisfy this expression.

skipConsecutiveWhitespaces

Default value: `false`

Type: Boolean

If set to true, then allow the insertion of consecutive whitespace characters in the subject. For instance, the contents amino  acid (with two consecutive spaces) matches the entry amino acid.

skipWhitespace

Default value: `false`

Type: Boolean

If set to true, then allow arbitrary insertion of whitespace characters in the subject. For instance, the contents amino acid matches the key aminoacid.

subject

Default value: `WORD`

Type: Subject

Specifies the contents to match.

substituteWhitespace

Default value: `false`

Type: Boolean

If set to true, then all whitespace characters match each other (including ‘\n’, ‘\r’, ‘\t’, and non-breaking spaces).

wordStartCaseInsensitive

Default value: `false`

Type: Boolean

If set to true, then allow case folding on the first character of each word.

Deprecated parameters

targetLayerName

Deprecated

Type: String

Deprecated alias for targetLayer.