Domain

Analytic Methods

Type

Empirical Study

Theme

effectiveness; population

Start Date

7-6-2014 1:15 PM

End Date

7-6-2014 2:45 PM

Structured Abstract

Introduction

Clinical research, especially randomized clinical trials, generates high-quality evidence for medical decision-making. However, a lack of population representativeness has compromised the generalizability of many clinical studies and has been a significant concern for both the general public and the scientific community. Open-access clinical research registries, such as ClinicalTrials.gov, and the increasingly vast amounts of available electronic health information together present a new opportunity to improve the generalizability and feasibility of clinical research at the early design phases.

This proof-of-concept study presents a distribution-based method to aggregate the target populations of existing Type 2 diabetes clinical trials and to profile the general Type 2 diabetic population in electronic health records, using selected clinical variables. A comparison of these two populations answers questions such as “Who has been systematically omitted from Type 2 diabetes trials?” and “How closely do Type 2 diabetic patients and the target populations of Type 2 diabetes clinical trials correlate on selected clinical variables?”

Methods

Using Type 2 diabetes as an example, we downloaded 1,761 clinical trials with a study condition of “Type 2 diabetes” from ClinicalTrials.gov. We also obtained the electronic health records (EHRs) of 26,120 Type 2 diabetes patients who were treated at Columbia University Medical Center. Our text-mining tool, EliXR, automatically extracts numerical comparison statements and semantic eligibility features from free-text eligibility criteria. We used EliXR to analyze the free-text eligibility criteria of each trial and extract each eligibility element together with its associated numerical comparison statement (e.g., hemoglobin A1c (HbA1c) > 7.5%). We then compared the distributions of age, HbA1c, and kidney disease status in the eligibility criteria of the 1,761 clinical trials with the characteristics of the 26,120 patients.
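To make the idea of a "numerical comparison statement" concrete, the following is a minimal sketch of pulling such statements out of free-text criteria. It is not EliXR and does not reproduce its behavior: it assumes a simple regular expression, a hypothetical two-term vocabulary (HbA1c and age), and symbolic operators only. The names Comparison and extract_comparisons are illustrative.

```python
import re
from typing import List, NamedTuple


class Comparison(NamedTuple):
    """One numerical comparison statement, e.g. ('hba1c', '>', 7.5)."""
    variable: str
    operator: str
    value: float


# Assumption: a tiny, hand-picked vocabulary and symbolic operators only.
# Real criteria also use phrases such as "at least" or "between ... and ...".
_PATTERN = re.compile(
    r"\b(?P<variable>HbA1c|age)\s*"
    r"(?P<operator>>=|<=|>|<|=)\s*"
    r"(?P<value>\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def extract_comparisons(criteria_text: str) -> List[Comparison]:
    """Return every variable/operator/number triple found in the text."""
    return [
        Comparison(
            m.group("variable").lower(),
            m.group("operator"),
            float(m.group("value")),
        )
        for m in _PATTERN.finditer(criteria_text)
    ]


if __name__ == "__main__":
    text = "Inclusion: HbA1c > 7.5% and age >= 18 years"
    for comparison in extract_comparisons(text):
        print(comparison)
```

The word boundary before the variable name keeps a term such as "stage" from being read as "age"; a production extractor would need a far richer vocabulary and unit handling.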

Findings

A 44-year-old person, or a person with an HbA1c value of 8.2%, has the best chance of being included in Type 2 diabetes trials. In contrast, most diabetic patients are around 63 years old or have an HbA1c value around 6.3%. In addition, about 38% of diabetes trials exclude patients with renal failure or kidney disease, while 3% include patients with related kidney conditions. These kidney conditions are present in about 22% of the diabetic population.
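One way to read "best chance of being included" is as the value admitted by the largest share of trials. The sketch below illustrates that reading under stated assumptions: each trial's criterion is reduced to a closed interval with optional bounds, and the four intervals are invented toy data, not the study's trials. The function names are hypothetical.

```python
from typing import Iterable, Optional, Sequence, Tuple

# (lower bound, upper bound); None means the trial sets no bound on that side.
Interval = Tuple[Optional[float], Optional[float]]


def inclusion_fraction(value: float, intervals: Sequence[Interval]) -> float:
    """Share of trials whose eligibility interval admits the given value."""
    if not intervals:
        return 0.0
    admitted = sum(
        1
        for lower, upper in intervals
        if (lower is None or value >= lower) and (upper is None or value <= upper)
    )
    return admitted / len(intervals)


def most_included_value(
    candidates: Iterable[float], intervals: Sequence[Interval]
) -> float:
    """Candidate admitted by the most trials (first one wins on ties)."""
    return max(candidates, key=lambda v: inclusion_fraction(v, intervals))


if __name__ == "__main__":
    # Toy age criteria for four imaginary trials (assumption, not study data).
    toy_age_criteria = [(18, 65), (30, 70), (40, 75), (18, None)]
    ages = range(18, 91)
    print(most_included_value(ages, toy_age_criteria))
    print(inclusion_fraction(80, toy_age_criteria))
```

Plotting inclusion_fraction across all candidate values gives an aggregated trial-population curve that can be set beside the patient distribution from the EHR.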

Discussion

Although we used only one disease and three variables to prove the concept, this method can be applied to any medical condition and can include more variables. By aligning the aggregated trial populations with the general patient population, we can identify the population subgroups that tend to be omitted or overstudied in existing clinical research, and hence enable data-driven subject selection and eligibility criteria design.
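The alignment described above could be sketched as a per-subgroup comparison: bin the patients by a clinical variable, then record what share of patients falls in each bin and what share of trials would admit them. This is one possible formulation, not the authors' implementation; the bin width, the 0.5 threshold, the function names, and all data below are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Interval = Tuple[Optional[float], Optional[float]]  # None = no bound


def _admits(interval: Interval, value: float) -> bool:
    lower, upper = interval
    return (lower is None or value >= lower) and (upper is None or value <= upper)


def coverage_by_bin(
    patient_values: Sequence[float],
    intervals: Sequence[Interval],
    bin_width: int,
) -> Dict[int, Tuple[float, float]]:
    """Map bin start -> (share of patients, mean share of trials admitting them)."""
    if not patient_values or not intervals:
        raise ValueError("need at least one patient value and one trial interval")
    patients_in_bin: Counter = Counter()
    admissions_in_bin: Counter = Counter()
    for value in patient_values:
        start = int(value // bin_width) * bin_width
        patients_in_bin[start] += 1
        admissions_in_bin[start] += sum(_admits(iv, value) for iv in intervals)
    total = len(patient_values)
    n_trials = len(intervals)
    return {
        start: (count / total, admissions_in_bin[start] / (count * n_trials))
        for start, count in sorted(patients_in_bin.items())
    }


def omitted_bins(
    profile: Dict[int, Tuple[float, float]], max_trial_share: float = 0.5
) -> List[int]:
    """Bins that contain patients but are admitted by few trials."""
    return [start for start, (_, share) in profile.items() if share < max_trial_share]


if __name__ == "__main__":
    # Toy data (assumption): eight patient ages and four trial age criteria.
    toy_ages = [45, 52, 58, 63, 66, 71, 78, 84]
    toy_criteria = [(18, 65), (30, 70), (40, 75), (18, None)]
    profile = coverage_by_bin(toy_ages, toy_criteria, bin_width=10)
    print(profile)
    print(omitted_bins(profile))
```

Bins with a large patient share and a small trial share are candidates for subgroups that existing trials tend to omit; the reverse pattern suggests a subgroup that is studied out of proportion to its prevalence.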

Conclusions

This study confirms that (a) Type 2 diabetes clinical trials systematically target younger and sicker patients (i.e., those with higher HbA1c values) among the diabetic population; and (b) Type 2 diabetes clinical trials tend to exclude patients with comorbid conditions such as kidney disease. More importantly, we contribute a distribution-based method for using electronic data to profile population health and identify clinical evidence gaps, as a step towards precision in clinical research eligibility criteria design. This method can potentially help improve the generalizability and precision of population-based clinical research.

Acknowledgements

This work was supported by National Library of Medicine grants R01LM009886, R01LM010815, and R01LM006910, and by National Center for Advancing Translational Sciences grant UL1TR000040.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.

Jun 7th, 1:15 PM Jun 7th, 2:45 PM

Using Electronic Data To Profile Population Health and Identify Clinical Evidence Gaps: Towards Precision Clinical Research Design
