CTS2-LE Supported Terminologies
Revision as of 18 January 2022, 20:30
Loading Standard Terminologies
Due to the license policies of standard terminology providers, we do not distribute the providers' input files. Customers have to download these files from the provider sites themselves.
To load a standard terminology, the customer has to copy the input files into a dedicated directory (called LD in the following) together with a JSON specification file (SF). In the context of Docker, Kubernetes, etc., a dedicated volume should be used for LD.
Currently, the following constraints have to be fulfilled:
- the cts2le instance should not already contain the terminology to load
- the services of the cts2le instance cannot be accessed during the load
Specification File (SF)
{
"terminologyDesignator": "<regex: 'icd-alpha|ops|hl7Fhir|mesh|ucum|loinc|snomed'>",
// usual version string
"version": "<string>",
// group name (used in the navigator)
"groupName": "<string>",
// unique resource id in context of a cts2le instance
"resourceId": "<string>",
// input file paths relative to the given directory <LD>
"files": [
"<string>"
// , ...
]
}
!!! For the designator hl7Fhir, the version, groupName and resourceId properties are defined by the standard itself; setting them in the specification file has no effect.
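The template above can be checked before a load is triggered. The following is a minimal sketch, not part of CTS2-LE: the function name `check_spec` is hypothetical, and it assumes that version, groupName and resourceId are required for every designator except hl7Fhir, which the template suggests but does not state explicitly.

```python
import re

# Designators copied from the regex in the specification file template.
DESIGNATOR_RE = re.compile(r"icd-alpha|ops|hl7Fhir|mesh|ucum|loinc|snomed")

# Assumption: these properties are mandatory unless the designator is hl7Fhir.
METADATA_KEYS = ("version", "groupName", "resourceId")


def check_spec(spec: dict) -> list[str]:
    """Return a list of problems found in a specification dict (empty if none)."""
    problems = []
    designator = spec.get("terminologyDesignator")
    if not isinstance(designator, str) or not DESIGNATOR_RE.fullmatch(designator):
        problems.append("terminologyDesignator is missing or not supported")
    # For hl7Fhir the standard defines these values, so they may be omitted.
    if designator != "hl7Fhir":
        for key in METADATA_KEYS:
            if not isinstance(spec.get(key), str) or not spec[key]:
                problems.append(f"{key} must be a non-empty string")
    files = spec.get("files")
    if (
        not isinstance(files, list)
        or not files
        or not all(isinstance(name, str) for name in files)
    ):
        problems.append("files must be a non-empty list of strings")
    return problems
```

An empty result only means the structure looks plausible; it does not guarantee that the loader accepts the file.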
REST interface
GET <HOST>/WebCts2LE/service/crud/bulk/load-std-terminology
Query Parameters
- directory: directory path (LD)
- loadSpec: path to the specification file SF (relative to LD)
Note
After loading, the suggester (used by the navigator) has to be updated:
GET <HOST>/WebCts2LE/service/manage/index/suggester/update
Example
GET <HOST>/WebCts2LE/service/crud/bulk/load-std-terminology?directory=terminologies/loinc&loadSpec=load-spec-2.71.json
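The two requests can also be assembled programmatically. The sketch below only builds the URLs from the paths and query parameters documented here; the function names are hypothetical, and the host used in the usage note is an assumption. Sending the requests is left to whichever HTTP client is in use.

```python
from urllib.parse import urlencode

# Paths as documented in the REST interface section.
LOAD_PATH = "/WebCts2LE/service/crud/bulk/load-std-terminology"
SUGGESTER_PATH = "/WebCts2LE/service/manage/index/suggester/update"


def load_url(host: str, directory: str, load_spec: str) -> str:
    """Build the GET URL that triggers loading of a standard terminology."""
    # safe="/" keeps the slashes of the directory path readable, as in the example.
    query = urlencode({"directory": directory, "loadSpec": load_spec}, safe="/")
    return f"{host.rstrip('/')}{LOAD_PATH}?{query}"


def suggester_update_url(host: str) -> str:
    """Build the GET URL for the suggester update that follows a load."""
    return f"{host.rstrip('/')}{SUGGESTER_PATH}"
```

For a hypothetical host `http://localhost:8080`, `load_url(host, "terminologies/loinc", "load-spec-2.71.json")` reproduces the example request, and `suggester_update_url(host)` gives the follow-up call.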
Supported Terminologies
{
"terminologyDesignator": "icd-alpha",
"version": "2021",
"groupName": "ICD",
"resourceId": "Icd-2021",
"files": [
// file 1, 2 must be the icd xml and the alphaid txt, respectively
"icd10gm2021syst_claml_20200918_20201111.xml",
"icd10gm2021_alphaid_edvtxt_20201002.txt"
]
}
{
"terminologyDesignator": "ops",
"version": "2021",
"groupName": "OPS",
"resourceId": "Ops-2021",
"files": [
"ops2021syst_claml_20201016.xml"
]
}
{
"terminologyDesignator": "ucum",
"version": "2017",
"groupName": "UCUM",
"resourceId": "Ucum-2017",
"files": [
"ucum.tsv"
]
}
{
"terminologyDesignator": "loinc",
"version": "2.71",
"groupName": "LOINC",
"resourceId": "Loinc-2.71",
"files": [
// file 1, 2 must be the hierarchy csv and the table csv, respectively
// usually file 1 is at 'AccessoryFiles/MultiAxialHierarchy/' and file 2 at
// 'LoincTable/'
// in the providers zip file, e.g. 'loinc271.zip'
"MultiAxialHierarchy.csv",
"Loinc.csv"
]
}
{
"terminologyDesignator": "hl7Fhir",
// version, groupName and resourceId are generated automatically
"files": [
"valuesets.xml",
"v3-codesystems.xml"
]
}
{
"terminologyDesignator": "snomed",
"version": "2021-07",
"groupName": "SNOMED",
"resourceId": "Snomed-2021-07",
// following files usually are located at 'Snapshot/Terminology' within the snomed zip file
"files": [
"sct2_Description_Snapshot-en_INT_20210731.txt",
"sct2_Concept_Snapshot_INT_20210731.txt",
"sct2_Relationship_Snapshot_INT_20210731.txt",
"sct2_StatedRelationship_Snapshot_INT_20210731.txt",
"sct2_TextDefinition_Snapshot-en_INT_20210731.txt"
]
}
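Since the file paths in a specification are relative to LD, it can help to verify that every listed file is actually present before starting a long load. The following is a sketch under assumptions: the helper names and the spec file name `load-spec-ucum.json` are hypothetical. The spec is written as plain JSON without `//` comments, because it is not stated here whether the loader accepts comments.

```python
import json
import tempfile
from pathlib import Path

# UCUM specification taken from the examples above.
UCUM_SPEC = {
    "terminologyDesignator": "ucum",
    "version": "2017",
    "groupName": "UCUM",
    "resourceId": "Ucum-2017",
    "files": ["ucum.tsv"],
}


def write_spec(load_dir: Path, file_name: str, spec: dict) -> Path:
    """Write the specification as comment-free JSON into the load directory."""
    target = load_dir / file_name
    target.write_text(json.dumps(spec, indent=2), encoding="utf-8")
    return target


def missing_files(load_dir: Path, spec: dict) -> list[str]:
    """Return the entries of spec["files"] that do not exist below load_dir."""
    return [name for name in spec.get("files", []) if not (load_dir / name).is_file()]


def demo() -> tuple[list[str], list[str]]:
    """Check a scratch directory before and after the UCUM input file is created."""
    with tempfile.TemporaryDirectory() as tmp:
        load_dir = Path(tmp)
        write_spec(load_dir, "load-spec-ucum.json", UCUM_SPEC)
        before = missing_files(load_dir, UCUM_SPEC)
        (load_dir / "ucum.tsv").write_text("", encoding="utf-8")
        after = missing_files(load_dir, UCUM_SPEC)
    return before, after
```

The check covers only file presence; the required order of the files (for example for icd-alpha and loinc) still has to be respected by hand.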
Notes
Removal
Removing a terminology takes nearly as long as loading it. It is therefore recommended to load a terminology only into a store that does not already contain it.
Loading Time
Because of the RDF quad store technology used, loading (i.e. 'weaving' the RDF triple knowledge graph from the flat files) takes considerable time. On a notebook (~2 GHz throttled, 16 GB RAM) the loading times are:
- Snomed: ~90 min
- Loinc: ~45 min
- icd/alphaid: ~20 min
- MeSH: ~15 min
- FHIR: ~25 min (~1500 single CS/VS files)
- ops: ~8 min
- ucum: <1 min
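For planning a maintenance window, the figures above can simply be added up. This is a rough sketch that assumes the terminologies are loaded one after another on comparable hardware; "<1 min" for ucum is counted as 1 minute, so the result is an upper estimate.

```python
# Approximate load times in minutes, copied from the list above.
# ucum ("<1 min") is rounded up to 1 minute.
LOAD_MINUTES = {
    "snomed": 90,
    "loinc": 45,
    "icd-alpha": 20,
    "mesh": 15,
    "hl7Fhir": 25,
    "ops": 8,
    "ucum": 1,
}


def total_minutes(designators) -> int:
    """Sum the approximate load times, assuming sequential loading."""
    return sum(LOAD_MINUTES[name] for name in designators)


hours, minutes = divmod(total_minutes(LOAD_MINUTES), 60)
print(f"all terminologies: about {hours} h {minutes} min")
```

Loading everything on the reference notebook therefore comes to roughly 204 minutes, about 3 h 24 min, during which the services are not accessible.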
In the context of Docker, Kubernetes, etc., it is recommended to assign generous CPU and RAM resources to the Fuseki container to decrease the loading time.
It is a known restriction of quad stores that loading takes long compared to other NoSQL data stores. In return, a quad store offers elaborate functionality based on semantic web technologies. Future versions of CTS2-LE could utilize NoSQL data stores.
Database Space
Quad stores usually occupy a lot of disk space because every quad is indexed. For example, Snomed has ~18,000,000 quads, and the Fuseki quad store occupies ~50 GB directly after loading. Fuseki can, however, compact the database to ~8 GB with the call
java -cp fuseki-server.jar tdb2.tdbcompact [--help] --loc=<DB>
where DB is the disk location of the database; in a Docker or Kubernetes context this is the path /etc/fuseki/apache-jena-fuseki-3.8.0/run/databases/cts2le (for details see <a href="https://jena.apache.org/documentation/tdb2/tdb2_cmds.html">https://jena.apache.org/documentation/tdb2/tdb2_cmds.html</a>, keyword tdb2.tdbcompact). !!! Note that the Fuseki container must not be running during compaction.
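When compaction is scripted, the call can be wrapped as follows. This is a sketch under assumptions: the function names are hypothetical, `fuseki-server.jar` is expected in the working directory, and stopping Fuseki beforehand remains the caller's responsibility.

```python
import subprocess


def compact_command(db_location: str, jar: str = "fuseki-server.jar") -> list[str]:
    """Build the tdb2.tdbcompact call shown above as an argument list."""
    return ["java", "-cp", jar, "tdb2.tdbcompact", f"--loc={db_location}"]


def compact(db_location: str) -> None:
    """Run the compaction; raises CalledProcessError if the command fails."""
    # Fuseki must not be running while this executes.
    subprocess.run(compact_command(db_location), check=True)
```

Passing the arguments as a list avoids shell quoting issues if the database path contains spaces.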
A future version of CTS2-LE will provide compaction via a REST call.