Submission Type

Empirical Research


disease progression, chronic kidney disease, hidden Markov model, transition probability, missing at random, missing not at random, EM algorithm


Chronic diseases are often described by stages of severity. Clinical decisions are influenced by a patient's current stage, whether the patient is progressing, and the rate of progression. For chronic kidney disease (CKD), relatively little is known about the transition rates between stages. To address this, we used electronic health record (EHR) data on a large primary care population, which should offer both the follow-up time and the sample size needed to estimate CKD transition rates reliably. However, EHR data have features that threaten the validity of any analysis. In particular, the timing and frequency of laboratory values and clinical measurements are not set a priori by research investigators; instead, they depend on many factors, including the patient's current health. We developed an approach for estimating CKD stage transition rates using hidden Markov models (HMMs) when the amount of information and the observation time vary among individuals. To keep HMM estimation computationally manageable, we used a “discretization” method that transforms daily data into intervals of 30, 90, or 180 days. We assessed the accuracy and computation time of this method in simulation studies. We also used simulations to study the effect of informative observation times on the estimated transition rates. The simulations showed that the method performed well, even when missing data were non-ignorable. We applied the methods to EHR data from over 60,000 primary care patients with chronic kidney disease (stage 2 and above) and estimated transition rates between six underlying disease states. The results were similar for men and women.
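The discretization idea can be illustrated with a minimal sketch. It makes two assumptions that the abstract does not specify. First, when several observations fall in the same interval, the hypothetical helper `discretize` keeps the last one. Second, intervals with no observation are treated as missing at random, so they contribute no emission term to the HMM likelihood. The hypothetical `log_likelihood` function is a standard scaled forward algorithm over the discretized sequence. It does not model the non-ignorable (MNAR) observation process studied in the simulations, and the EM fitting step is not shown.

```python
import numpy as np


def discretize(days, stages, interval=30, n_intervals=None):
    """Map daily observations onto fixed-length intervals (e.g. 30, 90, or 180 days).

    Assumption (hypothetical rule): if several observations fall in one interval,
    the last one is kept. Intervals with no observation are marked missing (-1).
    """
    days = np.asarray(days)
    stages = np.asarray(stages)
    bins = days // interval  # index of the interval each observation falls into
    if n_intervals is None:
        n_intervals = int(bins.max()) + 1
    obs = np.full(n_intervals, -1, dtype=int)
    order = np.argsort(days, kind="stable")  # process observations chronologically
    for b, s in zip(bins[order], stages[order]):
        obs[b] = s  # a later observation overwrites an earlier one in the same interval
    return obs


def log_likelihood(obs, init, trans, emit):
    """Scaled forward algorithm for a discrete-time HMM on discretized data.

    init:  initial hidden-state distribution, shape (K,)
    trans: per-interval transition matrix, shape (K, K), rows sum to 1
    emit:  emission matrix P(observed stage | hidden state), shape (K, M)
    Missing intervals (-1) are treated as ignorable: no emission factor is applied.
    """
    alpha = np.array(init, dtype=float)
    loglik = 0.0
    for t, o in enumerate(obs):
        if t > 0:
            alpha = alpha @ trans  # propagate the hidden state one interval forward
        if o >= 0:
            alpha = alpha * emit[:, o]  # condition on the observed stage
        c = alpha.sum()
        loglik += np.log(c)
        alpha = alpha / c  # rescale to avoid numerical underflow
    return loglik
```

For example, `discretize([0, 10, 45, 200], [2, 3, 3, 4], interval=30)` yields `[3, 3, -1, -1, -1, -1, 4]`. The two observations in the first 30-day interval collapse to one, and the empty intervals become missing values that the forward recursion simply carries through the transition matrix. Longer intervals shorten the sequence and thus the computation, at the cost of discarding more within-interval detail.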

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.