CTS2-LE Supported Terminologies

Current version as of 21 December 2025, 18:44

Preface

Due to the license policies of the standard terminology providers, we do not distribute provider input files. Customers have to download these files from the provider sites.

To load a standard terminology, the customer has to copy the input files into a dedicated directory (called LD in the following) together with a JSON specification file (SF). In Docker, Kubernetes, and similar environments, a dedicated volume should be used. To avoid inconsistencies, the services should not be accessed while a load is running.

Terminology Directory (LD)

A dedicated directory, called LD in the following, contains all files required for a specific standard terminology or terminology package. Each of them should have its own directory; currently these are Snomed, Loinc, Mesh, and BfArM.

The deployment for Kubernetes provides an extra volume with the path

  • /etc/webcts2le/inst/terminologies

Subdirectories of this path should be used as LD. The LD is specified when starting the load (see the section REST interface).

Standard Terminologies

Specification File (SF)

{
    "terminologyDesignator": "<regex: 'mesh|loinc|snomed'>",
    // usual version string
    "version": "<string>",
    // group name (used in the navigator)
    "groupName": "<string>",
    // unique resource id in context of a cts2le instance
    "resourceId": "<string>",
    // input file paths relative to the given directory <LD>
    "files": [
        "<string>"
        // , ...
    ]
}
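
Before starting a load, it can help to check a specification file against the structure above. The following Python sketch is not part of CTS2-LE; the helper names `read_spec` and `check_spec` are hypothetical. It assumes that all five keys shown above are required and that comments only appear on lines of their own.

```python
import json
import re
from pathlib import Path

# Assumption: every key of the generic specification above is mandatory.
REQUIRED_KEYS = ("terminologyDesignator", "version", "groupName", "resourceId", "files")
DESIGNATOR = re.compile(r"mesh|loinc|snomed")


def read_spec(path):
    """Parse a specification file, ignoring whole-line // comments."""
    text = Path(path).read_text(encoding="utf-8")
    kept = [line for line in text.splitlines() if not line.lstrip().startswith("//")]
    return json.loads("\n".join(kept))


def check_spec(ld, spec_name):
    """Return a list of problems found in the spec file <ld>/<spec_name>."""
    ld = Path(ld)
    spec = read_spec(ld / spec_name)
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in spec]
    designator = spec.get("terminologyDesignator")
    if designator is not None and not DESIGNATOR.fullmatch(str(designator)):
        problems.append(f"unexpected terminologyDesignator: {designator}")
    # Input file paths are relative to LD, as stated in the specification.
    for rel in spec.get("files", []):
        if not (ld / rel).is_file():
            problems.append(f"missing file: {rel}")
    return problems
```

An empty result only means that the file is structurally plausible; whether the loader accepts it is decided by CTS2-LE itself.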

Supported Standard Terminologies

{
  "terminologyDesignator": "snomed",
  "version": "20250515",
  "groupName": "snomed-ct",
  "resourceId": "Snomed-20250515",
  "defaultLanguage": "de",
  // the following files are usually located at 'Snapshot/Terminology' within the Snomed zip file;
  // they must be listed in the order shown below
  "files": [
    "sct2_Description_Snapshot_GermanyEdition_20250515.txt",
    "sct2_Concept_Snapshot_GermanyEdition_20250515.txt",
    "sct2_Relationship_Snapshot_GermanyEdition_20250515.txt",
    "sct2_StatedRelationship_Snapshot_GermanyEdition_20250515.txt",
    "sct2_TextDefinition_Snapshot_GermanyEdition_20250515.txt"
  ]
}

{
    "terminologyDesignator": "loinc",
    "version": "2.80",
    "groupName": "loinc-tree",
    "resourceId": "Loinc-2.80",
    // defines the corresponding display language
    "defaultLanguage": "de",
    // the following linguistic variants must exist in the directory 'AccessoryFiles/LinguisticVariants'
    "linguisticVariants": [
        {"lang": "de", "file": "Loinc_2.80/AccessoryFiles/LinguisticVariants/deDE15LinguisticVariant.csv"},
        {"lang": "at", "file": "Loinc_2.80/AccessoryFiles/LinguisticVariants/deAT24LinguisticVariant.csv"}
    ],
    "files": [
        // files 1 and 2 must be the hierarchy csv and the table csv, respectively;
        // usually file 1 is at 'AccessoryFiles/MultiAxialHierarchy/' (in this example:
        // 'AccessoryFiles/ComponentHierarchyBySystem/') and file 2 at 'LoincTable/'
        // in the provider's zip file, e.g. 'loinc271.zip'
        "Loinc_2.80/AccessoryFiles/ComponentHierarchyBySystem/ComponentHierarchyBySystem.csv",
        "Loinc_2.80/LoincTable/Loinc.csv"
    ]
}

{
    "terminologyDesignator": "mesh",
    "version": "2025",
    "groupName": "mesh",
    "resourceId": "Mesh2025",
    "files": [
        "desc2025.xml"
    ]
}
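
Because the Snomed file names differ only in the release date, the specification for a new release can be generated instead of edited by hand. The sketch below is an illustration under assumptions: the function name `snomed_spec` is hypothetical, and the file name pattern is taken from the 20250515 Germany Edition example above and may not hold for other editions.

```python
import json

# Order matters: the loader expects the files in exactly this sequence.
SNOMED_COMPONENTS = (
    "Description",
    "Concept",
    "Relationship",
    "StatedRelationship",
    "TextDefinition",
)


def snomed_spec(release, edition="GermanyEdition", language="de"):
    """Build a Snomed specification dict for one release date (YYYYMMDD)."""
    return {
        "terminologyDesignator": "snomed",
        "version": release,
        "groupName": "snomed-ct",
        "resourceId": f"Snomed-{release}",
        "defaultLanguage": language,
        "files": [
            f"sct2_{component}_Snapshot_{edition}_{release}.txt"
            for component in SNOMED_COMPONENTS
        ],
    }


if __name__ == "__main__":
    # Write the result to a file in LD and pass its name as loadSpec.
    print(json.dumps(snomed_spec("20250515"), indent=2))
```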

BfArM Terminologies

BfArM (Bundesinstitut für Arzneimittel und Medizinprodukte, the German Federal Institute for Drugs and Medical Devices) provides the standard terminologies for Germany.

Packages

To download these terminologies, first clone the project

  • https://github.com/gematik/zts-api-client-examples.git

Then run the command

  • cd curl ; ./download.sh -c true

which downloads all packages to the directory packages (call ./download.sh --help for details).

Structure

The directory packages downloaded from BfArM must be placed in a dedicated directory (e.g. bfarm as LD). The following structure is an example with two packages (ICD-10-GM, OPS).

bfarm
|_ packages
|  |_ bfarm.terminologien.icd10gm-2025.0.0.tar.gz
|  |  |_ package/CodeSystem-icd10gm-agelow-2025.json
|  |  |_ package/CodeSystem-icd10gm-agereject-2025.json
|  |  |_ package/package.json
|  |  |_ ...
|  |_ bfarm.terminologien.ops-2025.0.0.tar.gz
|  |  |_ ...
|_ fhir-packs.jsonc

The specification file (here fhir-packs.jsonc) must be located within the bfarm directory.
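
To see which packages a directory laid out like this actually contains, one can read the `name` and `version` properties from each archive's `package/package.json`. This is a minimal sketch, not CTS2-LE code; the function name `canonical_package_names` is hypothetical, and it assumes every archive is a gzip-compressed tar file as in the structure above.

```python
import json
import tarfile
from pathlib import Path


def canonical_package_names(ld):
    """Return '<name>|<version>' for every *.tar.gz below <ld>/packages."""
    names = []
    for archive in sorted((Path(ld) / "packages").glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            try:
                member = tar.extractfile("package/package.json")
            except KeyError:
                continue  # archive without a package definition file
            if member is None:
                continue  # entry exists but is not a regular file
            meta = json.load(member)
        names.append(f"{meta['name']}|{meta['version']}")
    return names
```

The returned strings have the same `<name>|<version>` form that the `packageRegex` filter of the specification file is matched against.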

Specification File (SF)

For BfArM terminologies, the following specification file is used.

{
    "terminologyDesignator": "fhir-package",
    "canonicalUrlRegex": "<regex>", // optional
    "packageRegex": "<regex>"
}

  • terminologyDesignator

    • must be set to fhir-package
  • canonicalUrlRegex

    • This filter loads only terminologies whose canonical URL (https://hl7.org/fhir/R4/datatypes.html#canonical) matches <regex>. E.g., the regex .*(agerejec|agelow).* loads only the terminologies CodeSystem-icd10gm-agereject-2025.json and CodeSystem-icd10gm-agelow-2025.json, because their canonical URLs are
      • https://terminologien.bfarm.de/fhir/CodeSystem/icd10gm-agereject|2025 and
      • https://terminologien.bfarm.de/fhir/CodeSystem/icd10gm-agelow|2025, respectively.
  • packageRegex

    • This filter loads only packages whose canonical package name matches <regex>. The canonical package name has the form <name>|<version>, where name and version are the properties in the package definition file, e.g.
      • bfarm/packages/bfarm.terminologien.icd10gm-2025.0.0.tar.gz/package/package.json (see section Packages above).
      • E.g., the regex .*(icd10gm\\|2025|ops\\|2025).* (backslashes doubled as required inside a JSON string) loads only the ICD and OPS packages.
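
The effect of the two filters can be tried out locally before a load. The following Python sketch only illustrates the idea and is not the CTS2-LE implementation: it assumes that "conforms to" means a match of the whole string, and the package names used in the usage note are inferred from the archive file names. The helper names `load_filters` and `is_selected` are hypothetical.

```python
import json
import re

# Specification as it would appear in the JSON file (note the doubled backslashes).
SPEC_TEXT = r"""
{
    "terminologyDesignator": "fhir-package",
    "canonicalUrlRegex": ".*(agerejec|agelow).*",
    "packageRegex": ".*(icd10gm\\|2025|ops\\|2025).*"
}
"""


def load_filters(spec_text):
    """Compile both filters; a missing canonicalUrlRegex accepts every URL."""
    spec = json.loads(spec_text)
    package_re = re.compile(spec["packageRegex"])
    url_re = re.compile(spec.get("canonicalUrlRegex", ".*"))
    return package_re, url_re


def is_selected(package_name, canonical_url, package_re, url_re):
    """True if both the package name and the canonical URL match entirely."""
    return bool(package_re.fullmatch(package_name)) and bool(url_re.fullmatch(canonical_url))
```

With these assumptions, `is_selected("bfarm.terminologien.icd10gm|2025.0.0", "https://terminologien.bfarm.de/fhir/CodeSystem/icd10gm-agelow|2025", *load_filters(SPEC_TEXT))` evaluates to `True`, while the same URL combined with a package name that contains neither `icd10gm|2025` nor `ops|2025` is rejected.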

Example

{
    "terminologyDesignator": "fhir-package",
    "canonicalUrlRegex": ".*(agerejec|exotic|einmalk).*",
    "packageRegex": ".*(icd10gm\\|2025|ops\\|2025).*"
}

In this example, only the age-rejection and exotic terminologies of the ICD package and the one-time codes of the OPS package are loaded.

REST interface

  • HTTP GET <host-cts2le>/WebCts2LE/service/crud/bulk/load-std-terminology

where <host-cts2le> is the Kubernetes service URL of the cts2le container.

Query Parameters

  • directory: directory path (LD)
  • loadSpec: path to specification file SF (relative to LD)

Example

  • HTTP GET <host-cts2le>/WebCts2LE/service/crud/bulk/load-std-terminology?directory=/etc/webcts2le/inst/terminologies/loinc&loadSpec=load-spec-2.80.json
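
When the call is scripted, the query string is best assembled by a URL library rather than by string concatenation. The sketch below shows one way to do this in Python; the function name `build_load_url` and the host value in the usage note are hypothetical, while the path and the two parameter names come from the text above.

```python
from urllib.parse import urlencode

LOAD_PATH = "/WebCts2LE/service/crud/bulk/load-std-terminology"


def build_load_url(host, directory, load_spec):
    """Return the GET URL that starts a load for the given LD and SF."""
    # safe="/" keeps the slashes of the directory path readable in the URL.
    query = urlencode({"directory": directory, "loadSpec": load_spec}, safe="/")
    return f"{host.rstrip('/')}{LOAD_PATH}?{query}"
```

For instance, `build_load_url("http://cts2le:8080", "/etc/webcts2le/inst/terminologies/loinc", "load-spec-2.80.json")` yields the URL of the example above with `http://cts2le:8080` in place of `<host-cts2le>`.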

Note

Depending on the Kubernetes settings, the request can time out. The loading process is nevertheless started and can be followed in the container logs. Future versions could introduce a task concept for such purposes.

After the load has finished, the suggester (used by the navigator) has to be updated:

  • HTTP GET <host-cts2le>/WebCts2LE/service/manage/index/suggester/update
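
A client script should therefore treat a timeout differently from a real failure. The following Python sketch shows one possible way; it is not part of CTS2-LE. The names `trigger` and `suggester_url` are hypothetical, and treating HTTP 504 as the timeout signal is an assumption about the gateway in front of the service.

```python
import socket
import urllib.error
import urllib.request

SUGGESTER_PATH = "/WebCts2LE/service/manage/index/suggester/update"


def suggester_url(host):
    """Return the GET URL that updates the suggester."""
    return f"{host.rstrip('/')}{SUGGESTER_PATH}"


def trigger(url, timeout_s=60.0, opener=urllib.request.urlopen):
    """Send a GET request and return 'done', 'timeout' or 'failed: ...'."""
    try:
        with opener(url, timeout=timeout_s) as response:
            if 200 <= response.status < 300:
                return "done"
            return f"failed: HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # Assumption: a gateway timeout is reported as HTTP 504.
        return "timeout" if err.code == 504 else f"failed: HTTP {err.code}"
    except urllib.error.URLError as err:
        if isinstance(err.reason, (TimeoutError, socket.timeout)):
            return "timeout"
        return f"failed: {err.reason}"
    except (TimeoutError, socket.timeout):
        return "timeout"
```

A result of `"timeout"` for the load request does not mean that the load failed; the container logs show when it has finished, and only then should `trigger(suggester_url(host))` be called.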

Metrics

Removal

Because of the RDF store, removing a terminology takes nearly as long as loading it. It is therefore recommended that the terminology is not already present in the store before a load. The following numbers assume an empty store.

Loading Time

Because of the RDF quad store technology used, the loading time (i.e. 'weaving' the RDF-triple knowledge graph from the flat files) on e.g. OpenShift is approximately

  • Snomed: ~15 min
  • Loinc: ~9 min
  • MeSH: ~7 min
  • BfArM: ~10 min

(Measured with a CPU at ~2.7 GHz according to /proc/cpuinfo; the pod metrics in the OpenShift UI showed ~7.6 GB of used memory.)

It is a known restriction of quad stores that loading times are high compared to other NoSQL data stores. In return, a quad store offers advanced functionality based on semantic web technologies. Future versions of CTS2-LE could use other NoSQL data stores.

Disk Space

After loading the complete terminology set, the following disk space is occupied per container:

  • cts2le: ~3 GB
  • fuseki: ~27 GB
  • solr: ~1 GB

Repeatedly deleting and re-adding the same terminology increases resource consumption over time, and quad stores generally occupy a lot of disk space because every quad is indexed. E.g., Snomed has ~18,000,000 quads. fuseki offers a runtime compaction of the database with the call

curl --request POST --url 'http://fuseki:3030/$/compact/cts2le?deleteOld=true'

For details, please visit https://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html#compact or contact the Fhg Fokus team.