Learning About Machine Learning: The Promise and Pitfalls of Big Data and the Electronic Health Record
In medicine, we are often interested in understanding differences between those with and without a specific disease. Such differences may point toward modifiable risk factors or, when combined into a quantitative model, allow us to predict who is at high risk of disease development to direct treatment. Along these lines, predictive models of heart failure (HF) have been a popular target: in the past decade, over 28 models of incident HF have been published.1 Most have good discriminatory properties (C statistics ranging from 0.70 to 0.89) and were developed using a modest number of predictors (typically <15), combined in linear models. Study populations varied from traditional epidemiological cohorts with regularly scheduled visits and physician-adjudicated outcomes to large patient databases that rely on diagnostic codes for defining HF cases and supplying predictive features.
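For readers less familiar with the C statistic quoted above, it can be read as the probability that a randomly chosen patient who develops the disease was assigned a higher predicted risk than a randomly chosen patient who does not. The following is a minimal sketch of that pairwise definition, not the method of any cited model; the function name `c_statistic` and the sample risks and outcomes are hypothetical and chosen only for illustration.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """Share of case/non-case pairs in which the case has the higher
    predicted risk; tied pairs count as half."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case_risk, non_case_risk in product(cases, non_cases):
        if case_risk > non_case_risk:
            concordant += 1.0
        elif case_risk == non_case_risk:
            concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Hypothetical predicted risks and outcomes (1 = developed HF).
risks = [0.9, 0.8, 0.4, 0.35, 0.2, 0.1]
outcomes = [1, 0, 1, 0, 0, 0]
print(c_statistic(risks, outcomes))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from non-cases, which is the scale on which the published range of 0.70 to 0.89 should be read.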
Article, see p 649
In recent years, there has been increasing excitement about the application of machine-learning strategies toward these same problems.2 Although such approaches are based on the same goal of patient classification, the machine-learning community, with its roots in computer science, tends to embrace far larger numbers of predictors, sometimes transformed or grouped or even derived empirically through feature engineering. These new predictors are then incorporated into a more flexible range of models to improve performance. Beyond attempting to achieve superior models, practitioners of machine learning also look to implement clinical decision support tools that use model predictions to guide physician behavior.
In this issue of Circulation: Cardiovascular Quality and Outcomes, Ng et al3 describe an application of machine-learning approaches to the problem of predicting incident HF within the electronic health record (EHR), with its potentially vast trove of data. A unique aspect of this study is the authors’ interest in reporting not just their model's performance but also its sensitivity to critical …