AlvisNLP

corpus processing engine

WebOfKnowledgeReader

Synopsis

Reads Web of Knowledge search result import files.

Description

WARNING: WoK delivers files with an incorrect byte order mark (BOM). It is advised that you remove it, for example with a text editor, before feeding the file to WebOfKnowledgeReader.
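If editing by hand is impractical, the BOM can also be removed with a script. The following is a minimal sketch, not part of AlvisNLP: the function names `strip_bom` and `strip_bom_file` are hypothetical. It assumes the unwanted mark is one of the standard UTF-8, UTF-16 or UTF-32 BOM byte sequences at the very start of the file, and it removes those bytes without re-encoding the rest.

```python
import codecs

# Standard BOM byte sequences. UTF-32 comes first because the UTF-32 LE mark
# begins with the same two bytes as the UTF-16 LE mark.
_BOMS = (
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF8,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def strip_bom(data: bytes) -> bytes:
    """Return data without its leading byte order mark, if it has one."""
    for bom in _BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    return data


def strip_bom_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, dropping a leading BOM (hypothetical helper)."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(strip_bom(data))
```

Writing to a separate destination file keeps the original export untouched, so the result can be checked before it is passed to the reader.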

The PT field (Publication Type) is used as a document marker: WebOfKnowledgeReader creates a new document each time it reads a PT field.

The VR field is read; if its value is anything other than “1.0”, WebOfKnowledgeReader fails.

The following fields will be read and stored as document features, one feature per line: AU, AF, BA, BF, CA, GP, BE, SO, SE, BS, LA, CT, CY, CL, SP, HO, C1, RP, EM, RI, OI, FU, CR, TC, Z9, PU, PI, PA, SN, BN, J9, JI, PD, PY, VL, IS, PN, SU, MA, BP, EP, AR, DI, D2, PG, P2, GA, UT, SI, NR.

The following fields will be read and stored as document features, several features per line split with semicolons: DE, DT, ID, WC, SC.

The following fields will be read and stored as sections, all lines concatenated for the contents: TI, AB, FX.

The following fields will be ignored: ER, EF, FN.

The feature and section names are the 2-character field codes. For an interpretation of the field codes, see the WoK format documentation.
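The field handling described above can be illustrated with a short Python sketch. This is not the AlvisNLP implementation, and the name `parse_wok` is hypothetical. It makes several assumptions that the description does not state: a line starting with whitespace continues the previous field, section lines are joined with a single space, PT itself is not stored, and fields not listed above are skipped.

```python
# Fields stored as features, one feature per line.
SINGLE = set(
    "AU AF BA BF CA GP BE SO SE BS LA CT CY CL SP HO C1 RP EM RI OI FU CR TC "
    "Z9 PU PI PA SN BN J9 JI PD PY VL IS PN SU MA BP EP AR DI D2 PG P2 GA UT "
    "SI NR".split()
)
# Fields stored as features, each line split on semicolons.
MULTI = {"DE", "DT", "ID", "WC", "SC"}
# Fields stored as sections, all lines concatenated.
SECTIONS = {"TI", "AB", "FX"}
STORED = SINGLE | MULTI | SECTIONS


def parse_wok(text):
    """Parse tagged WoK text into a list of document dicts (illustrative only)."""
    docs = []
    doc = None    # document currently being filled
    field = None  # field code of the last tagged line
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if raw[0].isspace():
            # Assumption: an indented line continues the previous field.
            value = raw.strip()
        else:
            field, value = raw[:2], raw[2:].strip()
            if field == "VR" and value != "1.0":
                raise ValueError("unsupported VR value: " + value)
            if field == "PT":
                # Each PT field starts a new document.
                doc = {"features": {}, "sections": {}}
                docs.append(doc)
        # ER, EF, FN, PT, VR and unlisted fields are not stored.
        if doc is None or field not in STORED:
            continue
        if field in SINGLE:
            doc["features"].setdefault(field, []).append(value)
        elif field in MULTI:
            parts = [p.strip() for p in value.split(";") if p.strip()]
            doc["features"].setdefault(field, []).extend(parts)
        else:
            doc["sections"].setdefault(field, []).append(value)
    for d in docs:
        # Assumption: concatenated section lines are separated by one space.
        d["sections"] = {k: " ".join(v) for k, v in d["sections"].items()}
    return docs
```

With a made-up record containing `AU Smith, J` followed by an indented `Doe, A` line, the sketch yields two AU feature values for that document, while a two-line TI field yields a single TI section.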

Snippet

<webofknowledgereader class="WebOfKnowledgeReader">
    <source></source>
</webofknowledgereader>

Mandatory parameters

source

Mandatory

Location of the WoK file(s).

Optional parameters

constantDocumentFeatures

Optional
Type: Mapping

Constant features to add to each document created by this module.

constantSectionFeatures

Optional
Type: Mapping

Constant features to add to each section created by this module.

tabularFormat

Default value: `false`
Type: Boolean

Read files in tabular export format.

Deprecated parameters