Currently, CodelistGenerator provides codesFromConceptSet() and codesFromCohort() functions that extract codelists from JSON files containing concept set expressions or cohort definitions.
We propose extending concept set support to the database itself. In some OMOP CDM setups (such as those managed by IOMED), concept sets are stored directly in database tables (concept_set, concept_set_item, etc.) within the same database instance as the analysis data.
This feature request proposes adding a new function, getCodelistFromConceptSet(), that queries these database tables directly to build formal codelist objects, similar to how other functions in the package query vocabulary tables directly (e.g., getDrugIngredientCodes(), getICD10StandardCodes()).
Rationale
• Cleaner workflow: Eliminates the need to export/import JSON files when concept sets are already stored natively in the database.
• Consistency: Aligns with the package's philosophy of direct database queries for vocabulary-based codelists.
• Tested workflow: At IOMED, we maintain concept sets in dedicated database tables within the OMOP instance, allowing for streamlined querying without intermediate file handling.
• Efficiency: Reduces overhead of JSON parsing and file I/O when database access is already available.
Proposed Database Schema
The function would work with the OMOP CDM tables and a small extension:
erDiagram
concept_set ||--o{ concept_set_item : "has items"
concept ||--o{ concept_set_item : "is included in"
concept_set {
int concept_set_id PK
text concept_set_name
}
concept {
int concept_id PK
varchar concept_name
varchar domain_id
varchar vocabulary_id
varchar concept_class_id
varchar standard_concept
varchar concept_code
date valid_start_date
date valid_end_date
varchar invalid_reason
}
concept_set_item {
int concept_set_id PK,FK
int concept_id PK,FK
}
concept_class ||--o{ concept : "classifies"
domain ||--o{ concept : "belongs to"
vocabulary ||--o{ concept : "from"
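To make the extension concrete, here is a minimal sketch that creates the two extra tables in an in-memory SQLite database and loads one concept set. The use of DBI and RSQLite, the set name, and the concept ids (1001, 1002) are assumptions for illustration only; a real OMOP instance would keep these tables next to the vocabulary tables and would also enforce the foreign key from concept_set_item.concept_id to concept.

```r
library(DBI)

# In-memory SQLite database standing in for the OMOP instance (assumption)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# The two extension tables from the diagram above
dbExecute(con, "
  CREATE TABLE concept_set (
    concept_set_id   INTEGER PRIMARY KEY,
    concept_set_name TEXT NOT NULL
  )")
dbExecute(con, "
  CREATE TABLE concept_set_item (
    concept_set_id INTEGER NOT NULL REFERENCES concept_set (concept_set_id),
    concept_id     INTEGER NOT NULL,
    PRIMARY KEY (concept_set_id, concept_id)
  )")

# Placeholder rows: the name and the concept ids are made up
dbExecute(con, "INSERT INTO concept_set VALUES (1, 'Example set')")
dbExecute(con, "INSERT INTO concept_set_item VALUES (1, 1001), (1, 1002)")

# One row per concept in the set
items <- dbGetQuery(con, "
  SELECT s.concept_set_name, i.concept_id
  FROM concept_set s
  JOIN concept_set_item i ON i.concept_set_id = s.concept_set_id
  ORDER BY i.concept_id")
print(items)

dbDisconnect(con)
```

The composite primary key on concept_set_item mirrors the PK,FK markers in the diagram and prevents the same concept from being listed twice in one set.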
Proposed Function Signature and Implementation
getCodelistFromConceptSet <- function(conceptSetId, con, cdmSchema) {
  # Point to the required tables in the database
  concept_set_tbl <- dplyr::tbl(con, dbplyr::in_schema(cdmSchema, "concept_set"))
  concept_set_item_tbl <- dplyr::tbl(con, dbplyr::in_schema(cdmSchema, "concept_set_item"))

  # Retrieve the name of the concept set to use as the codelist name
  codelistName <- concept_set_tbl |>
    dplyr::filter(.data$concept_set_id == conceptSetId) |>
    dplyr::pull("concept_set_name") |>
    unique()

  # Error handling: check if the concept set ID was found
  if (length(codelistName) == 0) {
    stop(glue::glue("No concept set found for concept_set_id: {conceptSetId}"))
  }

  # Warning if multiple names exist for the same ID
  if (length(codelistName) > 1) {
    warning(glue::glue("Multiple names found for concept_set_id: {conceptSetId}. Using the first one: '{codelistName[1]}'"))
    codelistName <- codelistName[1]
  }
  codelistName <- clean_name(codelistName)

  # Retrieve all unique concept IDs associated with the concept set ID
  concept_ids <- concept_set_item_tbl |>
    dplyr::filter(.data$concept_set_id == conceptSetId) |>
    dplyr::pull("concept_id") |>
    unique()

  # Create a named list structure required by newCodelist
  codelist <- list(concept_ids) |> magrittr::set_names(codelistName)

  # Return the formal, validated codelist object
  return(omopgenerics::newCodelist(codelist))
}
Implementation Details
The function would:
• Query concept_set table: Retrieve the concept_set_name for the given conceptSetId to use as the codelist name.
• Query concept_set_item table: Get all associated concept_ids for the concept set.
• Name cleaning: Apply name standardization (e.g., via a clean_name() helper function).
• Codelist creation: Build a named list and return an omopgenerics::newCodelist object.
• Error handling: Validate that the concept set exists and handle edge cases like multiple names.
Dependencies
• Requires omopgenerics package for newCodelist()
• Uses dplyr and dbplyr for database operations
• Uses glue for error and warning messages, and magrittr::set_names() to name the list
• Assumes clean_name() helper function (could be added or use existing package utilities)
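Since clean_name() is only assumed here, the following base R sketch shows one plausible behaviour: lower-case the name and collapse anything that is not a letter or digit into underscores. This is a hypothetical helper, not an existing CodelistGenerator or omopgenerics function, and the exact naming rules would need to follow whatever the package already enforces for codelist names.

```r
# Hypothetical helper: one possible shape for the assumed clean_name()
clean_name <- function(x) {
  x <- tolower(trimws(x))
  # Replace every run of characters outside a-z and 0-9 with one underscore
  x <- gsub("[^a-z0-9]+", "_", x)
  # Drop underscores left at either end
  gsub("^_+|_+$", "", x)
}

clean_name("  Type 2 Diabetes (broad) ")
# → "type_2_diabetes_broad"
```

Note that this version also replaces accented letters with underscores, which may or may not be desirable for concept set names maintained in languages other than English.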
Related Functions
• codesFromConceptSet(): Current JSON-based approach
• getDrugIngredientCodes(): Similar direct database querying pattern
• getICD10StandardCodes(): Another vocabulary table query function
Testing Considerations
• Unit tests with mock database containing concept_set tables
• Integration tests with real OMOP CDM databases
• Edge case testing (missing concept sets, empty results, etc.)
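A rough sketch of what the unit tests could look like with testthat and an in-memory SQLite mock. The helper local_concept_set_db(), the placeholder ids, and the choice of RSQLite are all assumptions; the tests skip themselves when getCodelistFromConceptSet() is not loaded, so the file runs on its own. They also rely on a clean_name() helper being available once the function is loaded.

```r
library(testthat)

# Hypothetical test helper: in-memory SQLite database holding the two
# extension tables, filled with placeholder ids and names
local_concept_set_db <- function() {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "concept_set", data.frame(
    concept_set_id = 1L,
    concept_set_name = "Example set"
  ))
  DBI::dbWriteTable(con, "concept_set_item", data.frame(
    concept_set_id = c(1L, 1L),
    concept_id = c(1001L, 1002L)
  ))
  con
}

test_that("returns one codelist holding the stored concept ids", {
  skip_if_not(exists("getCodelistFromConceptSet"), "function not loaded")
  con <- local_concept_set_db()
  # "main" is the default schema name in SQLite
  codes <- getCodelistFromConceptSet(1L, con, "main")
  expect_length(codes, 1)
  expect_true(setequal(as.integer(codes[[1]]), c(1001L, 1002L)))
  DBI::dbDisconnect(con)
})

test_that("errors when the concept set id does not exist", {
  skip_if_not(exists("getCodelistFromConceptSet"), "function not loaded")
  con <- local_concept_set_db()
  expect_error(
    getCodelistFromConceptSet(99L, con, "main"),
    "No concept set found"
  )
  DBI::dbDisconnect(con)
})
```

A concept set that exists but has no items is not covered here; whether that case should return an empty codelist or raise an error is a design decision left open by the proposal.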
See OmopHelpers for the full implementation of getCodelistFromConceptSet().