AlvisNLP

corpus processing engine

RDFProjector

Synopsis

Projects SKOS concept and OWL class labels on sections.

Description

RDFProjector reads source SKOS terminologies or OWL ontologies and searches for class and concept labels in sections.

The parameters allowJoined, allUpperCaseInsensitive, caseInsensitive, ignoreDiacritics, joinDash, matchStartCaseInsensitive, skipConsecutiveWhitespaces, skipWhitespace and wordStartCaseInsensitive control the matching between the section and the entry keys.

The subject parameter specifies which text of the section should be matched. There are two options:

RDFProjector creates an annotation for each matched entry and adds these annotations to the layer named targetLayer. The created annotations have the feature uriFeature, which contains the URI of the matched class or concept. RDFProjector may also map property object values into features specified by labelFeatures.

Snippet

<rdfprojector class="RDFProjector">
    <source></source>
    <targetLayer></targetLayer>
    <uriFeature></uriFeature>
</rdfprojector>
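
As an illustration, a filled-in version of the snippet might look like the following. The module id project-ontology, the file path, the layer name and the feature name are hypothetical values chosen for this sketch; none of them is a default.

```xml
<!-- Sketch only: the id, path, layer name and feature name are hypothetical -->
<project-ontology class="RDFProjector">
    <source>resources/ontology.owl</source>
    <targetLayer>concepts</targetLayer>
    <uriFeature>uri</uriFeature>
</project-ontology>
```

With these values, each match would become an annotation in the layer concepts, with the URI of the matched class or concept stored in the feature uri.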

Mandatory parameters

source

Mandatory

Path to the source SKOS/OWL files.

targetLayer

Mandatory
Type: String

Name of the layer that contains the match annotations.

uriFeature

Mandatory
Type: String

Feature where to store the entry URI.

Optional parameters

constantAnnotationFeatures

Optional
Type: Mapping

Constant features to add to each annotation created by this module.

language

Optional
Type: String

Specify the language of labels to project. If this parameter is not set then labels of any language are projected. Labels without a language qualifier are always projected regardless of the value of this parameter.

matchedLabelFeature

Optional
Type: String

Feature where to store the matched label.

matchedLanguageFeature

Optional
Type: String

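Undocumented. Judging by the parameter name, presumably the feature where to store the language of the matched label.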

matchedPropertyFeature

Optional
Type: String

Feature where to store the URI of the property of the matched label.
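
The label-related optional parameters can be combined in one module. The sketch below restricts projection to English labels and records which label matched and through which property. The module id, path, layer name and all feature names are hypothetical, and the language code en is an assumption about how labels are tagged in the source file.

```xml
<!-- Sketch only: id, path, layer and feature names are hypothetical -->
<project-ontology class="RDFProjector">
    <source>resources/ontology.owl</source>
    <targetLayer>concepts</targetLayer>
    <uriFeature>uri</uriFeature>
    <!-- Assumes labels carry the language tag "en"; untagged labels are projected anyway -->
    <language>en</language>
    <matchedLabelFeature>matched-label</matchedLabelFeature>
    <matchedPropertyFeature>matched-property</matchedPropertyFeature>
</project-ontology>
```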

trieSink

Optional
Type: OutputFile

If set, then RDFProjector writes the compiled dictionary to the specified file.

trieSource

Optional
Type: InputFile

If set, read the compiled dictionary from the specified file. Compiled dictionaries are usually faster for large dictionaries.
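
One possible way to use trieSink and trieSource together is to compile the dictionary in a first run and reuse it in later runs. The two fragments below are a sketch under that assumption: the module ids, paths and the file name ontology.trie are hypothetical, and source is still given in the second fragment because it is listed as mandatory.

```xml
<!-- First run (sketch): project and also write the compiled dictionary -->
<compile-ontology class="RDFProjector">
    <source>resources/ontology.owl</source>
    <targetLayer>concepts</targetLayer>
    <uriFeature>uri</uriFeature>
    <trieSink>resources/ontology.trie</trieSink>
</compile-ontology>

<!-- Later runs (sketch): read the compiled dictionary written earlier -->
<project-ontology class="RDFProjector">
    <source>resources/ontology.owl</source>
    <targetLayer>concepts</targetLayer>
    <uriFeature>uri</uriFeature>
    <trieSource>resources/ontology.trie</trieSource>
</project-ontology>
```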

allUpperCaseInsensitive

Default value: `false`
Type: Boolean

If set to true, then allow case folding on all characters in words that are all upper case.

allowJoined

Default value: `false`
Type: Boolean

If set to true, then allow arbitrary suppression of whitespace characters in the subject. For instance, the contents aminoacid matches the key amino acid.

caseInsensitive

Default value: `false`
Type: Boolean

If set to true, then allow case folding on all characters.

documentFilter

Default value: `true`
Type: Expression

Process only documents that satisfy this expression.

ignoreDiacritics

Default value: `false`
Type: Boolean

If set to true, then allow diacritic removal on all characters. For instance, the contents acide amine matches the key acide aminé.

joinDash

Default value: `false`
Type: Boolean

If set to true, then treat dash characters (-) as whitespace characters with regard to allowJoined. For instance, the contents aminoacid matches the entry amino-acid.
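
Since joinDash is defined with regard to allowJoined, the two are naturally set together. The following sketch enables both; the module id, path, layer name and feature name are hypothetical.

```xml
<!-- Sketch only: id, path, layer and feature names are hypothetical -->
<project-ontology class="RDFProjector">
    <source>resources/ontology.owl</source>
    <targetLayer>concepts</targetLayer>
    <uriFeature>uri</uriFeature>
    <!-- Lets "aminoacid" in the text match the key "amino acid" -->
    <allowJoined>true</allowJoined>
    <!-- Treats "-" like whitespace for allowJoined, so "aminoacid" also matches "amino-acid" -->
    <joinDash>true</joinDash>
</project-ontology>
```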

labelFeatures

Default value: `{rdfs-label=rdfs:label, skos-prefLabel=skos:prefLabel}`
Type: Mapping

Mapping from feature names to property URIs. This parameter indicates the properties of the entry to record in features.
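
A labelFeatures value could be written as in the sketch below, which records the preferred label and the notation of each matched entry. It assumes that Mapping parameters accept comma-separated key=value pairs as element text, mirroring the way the default value is displayed; check the parameter conversion rules of your AlvisNLP version before relying on this form. The module id, path, layer name and the feature names pref-label and notation are hypothetical.

```xml
<!-- Sketch only: assumes the key=value text form is accepted for Mapping parameters -->
<project-ontology class="RDFProjector">
    <source>resources/thesaurus.rdf</source>
    <targetLayer>concepts</targetLayer>
    <uriFeature>uri</uriFeature>
    <!-- feature name = property URI -->
    <labelFeatures>pref-label=skos:prefLabel,notation=skos:notation</labelFeatures>
</project-ontology>
```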

labelURIs

Default value: `rdfs:label,skos:prefLabel,skos:altLabel,skos:hiddenLabel,skos:notation,oboInOwl:hasBroadSynonym,oboInOwl:hasExactSynonym,oboInOwl:hasRelatedSynonym,oboInOwl:hasSynonym`
Type: String[]

RDF properties whose object values represent entry keys.

matchStartCaseInsensitive

Default value: `false`
Type: Boolean

If set to true, then allow case folding on the first character of the entry key.

multipleEntryBehaviour

Default value: `all`

Specifies the behavior if the lexicon contains several entries with the same key.

prefixes

Default value: `{}`
Type: Mapping

Specify URI prefixes to be used in resourceTypeURIs, labelURIs, and labelFeatures.

rdfFormat

Default value: `Lang:RDF/XML`
Type: Lang

Specify the RDF serialization format (xml, rdfxml, xmlrdf, turtle, ttl, n3, ntriples, ntriple, nt, jsonld, rdfjson, jsonrdf, json, trig, nquads, nq, nthrift, csv, tsv, trix).
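
For a source file that is not serialized as RDF/XML, the format has to be stated explicitly. The sketch below reads a Turtle file, using the value turtle from the list above; the module id, path, layer name and feature name are hypothetical.

```xml
<!-- Sketch only: id, path, layer and feature names are hypothetical -->
<project-thesaurus class="RDFProjector">
    <source>resources/thesaurus.ttl</source>
    <targetLayer>concepts</targetLayer>
    <uriFeature>uri</uriFeature>
    <rdfFormat>turtle</rdfFormat>
</project-thesaurus>
```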

resourceTypeURIs

Default value: `owl:Class,skos:Concept`
Type: String[]

Types of RDF resources that represent an entry.

sectionFilter

Default value: `true`
Type: Expression

Process only sections that satisfy this expression.

skipConsecutiveWhitespaces

Default value: `false`
Type: Boolean

If set to true, then allow the insertion of consecutive whitespace characters in the subject. For instance, the contents amino  acid (with two spaces) matches the entry amino acid.

skipWhitespace

Default value: `false`
Type: Boolean

If set to true, then allow arbitrary insertion of whitespace characters in the subject. For instance, the contents amino acid matches the key aminoacid.

subject

Default value: `WORD`
Type: Subject

Specifies the contents to match.

substituteWhitespace

Default value: `false`
Type: Boolean

If set to true, then all whitespace characters match each other (including ‘\n’, ‘\r’, ‘\t’, and non-breaking spaces).

wordStartCaseInsensitive

Default value: `false`
Type: Boolean

If set to true, then allow case folding on the first character of each word.

Deprecated parameters

targetLayerName

Deprecated
Type: String

Deprecated alias for targetLayer.

uriFeatureName

Deprecated
Type: String

Deprecated alias for uriFeature.