Presentation Title

Minimizing Data-Transformational Information Loss in Comparative Effectiveness Research Infrastructure

Domain

Clinical Informatics

Type

Infrastructure/Architecture Overview

Theme

effectiveness

Start Date

June 7, 2014, 1:15 PM

End Date

June 7, 2014, 2:45 PM

Structured Abstract

Introduction

Creating multi-site comparative effectiveness research (CER) networks often requires transforming each data partner’s local terminologies and data models to central terminologies and common data models (CDMs). We present our methods for minimizing the semantic and syntactic information loss (or gain) that arises from these transformations.

Methods

The PHIS+ project has created a clinical database for conducting CER. We based our work on the OpenFurther platform, which aggregates/federates heterogeneous data from multiple data sources and provides syntactic and semantic data interoperability for clinical research. The six PHIS+ hospitals use different electronic clinical data source systems and implement different local coding schemata. We harmonized the hospital data models to CDMs that we developed for each data stream (laboratory, microbiology, and radiology) and mapped their local terminologies to standard terminologies (STs). Working groups of individuals with diverse expertise considered national recommendations, the availability of local metadata, and the needs of CER in order to develop the CDMs and select the STs to which the local coding schemata would be mapped. We then worked with each hospital’s IT team to harmonize its local data model with the CDMs. The working groups reviewed and compared the harmonizations across the six hospitals and made any necessary changes.
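
As an illustration only, harmonizing one hospital's laboratory record to a CDM could be sketched as below. The CDM field names, hospital names, and local column names are all hypothetical and are not taken from PHIS+ or OpenFurther; reporting unmapped and missing fields is an assumption about how syntactic loss might be made visible for working group review.

```python
# Hypothetical CDM fields for the laboratory data stream (not the PHIS+ CDM).
CDM_FIELDS = ("patient_id", "test_code", "result_value", "result_unit", "collected_at")

# Hypothetical per-hospital field maps: local column name -> CDM field name.
FIELD_MAPS = {
    "hospital_a": {"mrn": "patient_id", "lab_cd": "test_code", "val": "result_value",
                   "uom": "result_unit", "coll_dt": "collected_at"},
    "hospital_b": {"pat": "patient_id", "test": "test_code", "result": "result_value",
                   "drawn": "collected_at"},
}


def harmonize(hospital, record):
    """Map a local record to the CDM and report what did not carry over.

    Returns (cdm_row, unmapped_local_fields, missing_cdm_fields).
    """
    field_map = FIELD_MAPS[hospital]
    row = {field: None for field in CDM_FIELDS}
    unmapped = {}
    for local_field, value in record.items():
        target = field_map.get(local_field)
        if target is None:
            # Local data with no CDM home: kept aside rather than silently dropped.
            unmapped[local_field] = value
        else:
            row[target] = value
    # CDM fields the local record could not populate.
    missing = [field for field in CDM_FIELDS if row[field] is None]
    return row, unmapped, missing
```

Calling `harmonize("hospital_b", record)` returns the CDM row together with the local fields that had no CDM target and the CDM fields left empty, which is the kind of difference a cross-hospital review could compare.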

We developed metadata specifications that define the level of detail required for mapping local codes to STs. These specifications were incorporated into a metadata collection tool that the hospitals used to provide the necessary metadata in the required formats. We mapped each local code to an ST concept based on the code’s available metadata, rather than to a pre-defined (highest common granularity) ST concept. Initial mappings were reviewed by other terminologists on the informatics team and then by the hospitals. We then loaded the mappings and the provided local metadata into a terminology server. OpenFurther leverages this stored knowledge of how local systems define, use, and store their data to process the local terminologies and data models and to populate a database using the CDMs. The mappings were reviewed once more as part of the data quality validation efforts for each CER study.
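
The idea of mapping each local code to the most granular ST concept that its metadata supports can be sketched as follows. This is a minimal illustration under stated assumptions: the concept identifiers, attribute names, and the "most attributes wins" rule are hypothetical, and the actual mappings were made and reviewed by terminologists rather than by an automated rule.

```python
from dataclasses import dataclass


@dataclass
class StConcept:
    concept_id: str   # hypothetical identifier, not a real standard code
    attributes: dict  # attribute name -> value the concept requires


# Hypothetical candidate concepts, from least to most granular.
CANDIDATES = [
    StConcept("ST:culture", {"test": "culture"}),
    StConcept("ST:culture-blood", {"test": "culture", "specimen": "blood"}),
    StConcept("ST:culture-blood-aerobic",
              {"test": "culture", "specimen": "blood", "method": "aerobic"}),
]


def most_specific_match(metadata, candidates=CANDIDATES):
    """Return the candidate with the most attributes, all supported by the metadata."""
    supported = [
        concept for concept in candidates
        if all(metadata.get(name) == value for name, value in concept.attributes.items())
    ]
    if not supported:
        return None
    return max(supported, key=lambda concept: len(concept.attributes))


def map_local_code(hospital, local_code, metadata, version=1):
    """Build a mapping record that stores the local metadata next to the mapping."""
    concept = most_specific_match(metadata)
    return {
        "hospital": hospital,
        "local_code": local_code,
        "st_concept": concept.concept_id if concept else None,
        "metadata": dict(metadata),  # retained as provenance for later review
        "version": version,          # incremented when the local schema changes
    }
```

With this sketch, a local code whose metadata names only the test and specimen maps to `ST:culture-blood`, while a code that also records the method maps to the more granular `ST:culture-blood-aerobic`; neither is forced down to a single pre-defined level.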

For each CER study within the PHIS+ project, ST concepts are used to define the study cohort (inclusion/exclusion criteria), exposure variables, and study outcomes. ST concepts were selected at the semantic granularity that the study cohort requirements dictated, and alternatives were discussed with the study leads. Selections of less granular ST concepts included the concepts they subsume, either hierarchically or axially (e.g., LOINC).
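
Selecting a less granular concept together with everything it subsumes can be sketched as a traversal of the terminology hierarchy. The example covers hierarchical subsumption only (axial subsumption is not shown), and the concept identifiers, hospital names, and local codes are hypothetical.

```python
# Hypothetical parent -> children links in a standard terminology (not real codes).
CHILDREN = {
    "ST:glucose": ["ST:glucose-serum", "ST:glucose-urine"],
    "ST:glucose-serum": ["ST:glucose-serum-fasting"],
}

# Hypothetical reviewed mappings: (hospital, local code) -> ST concept.
MAPPINGS = {
    ("hospital_a", "GLU"): "ST:glucose-serum",
    ("hospital_b", "1234"): "ST:glucose-serum-fasting",
    ("hospital_c", "UGLU"): "ST:glucose-urine",
}


def descendants_or_self(concept, children=CHILDREN):
    """Return the concept plus every concept it subsumes hierarchically."""
    found = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children.get(current, []))
    return found


def local_codes_for(concept, mappings=MAPPINGS):
    """List the local codes whose mapped concept falls under the selected concept."""
    wanted = descendants_or_self(concept)
    return sorted(key for key, mapped in mappings.items() if mapped in wanted)
```

A study that needs any serum glucose result would call `local_codes_for("ST:glucose-serum")` and pick up both the serum and the fasting serum codes, while a study that needs any glucose result would select `ST:glucose` and pick up all three.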

Innovation, Discussion & Conclusion

This approach of using each data source’s available metadata, reviewing mappings multiple times, and selecting subsuming concepts minimizes information loss by preserving data provenance. In addition, using local metadata allows changes to the local coding schemata and their associated mappings to be monitored and versioned over time. It has also led to the development of a rich database for current and future pediatric CER studies that require different levels of semantic granularity.

Next Steps

Next steps include developing methods to quantify the information loss that arises from these data transformations.

Acknowledgements

This project was funded under grant number R01 HS019862 from the AHRQ, U.S. Department of Health and Human Services (DHHS). The opinions expressed [in this document] are those of the authors and do not reflect the official position of AHRQ or the DHHS. FURTHeR development was supported by the NCRR and the NCATS, NIH, through Grant UL1RR025764 and supplement 3UL1RR025764-02S2. We acknowledge the terminology support provided by Apelon. We would also like to thank the PI(s) of the PHIS+ project, the PHIS+ teams at each hospital and Children's Hospital Association, and the OpenFurther team.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.
