Building Scalable Health Analytic Platform: Computational Phenotyping and Cloud-based Predictive Modelling | Event | AIHI - Australian Institute of Health Innovation

Event Date
Monday, 10 August 2015

ABSTRACT

As the adoption of electronic health records (EHRs) has grown, EHRs are now composed of a diverse array of data, including structured information and unstructured clinical progress notes. Two unique challenges need to be addressed in order to utilize EHR data in clinical research and practice:

1) Computational phenotyping: How to turn complex and messy EHR data into meaningful clinical concepts or phenotypes?

2) Scalable predictive modelling: How to efficiently construct and validate clinical predictive models from EHR data?

In this talk, we discuss our approaches to these challenges. For computational phenotyping, we represent EHR data as inter-connected high-order relations, i.e. tensors (e.g. tuples of patient-medication-diagnosis, patient-lab, and patient-symptoms), and then develop expert-guided sparse nonnegative tensor factorization for extracting multiple phenotype candidates from EHR data. Most of the phenotype candidates are considered clinically meaningful and have predictive power.
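To make the tensor view concrete, here is a minimal sketch of a plain nonnegative CP (CANDECOMP/PARAFAC) factorization of a three-way patient-medication-diagnosis count tensor, using multiplicative updates in NumPy. It is not the expert-guided sparse method described in the talk: the function names, the rank, and the synthetic data are all illustrative assumptions. Each of the `rank` components can be read as one phenotype candidate, with a column of weights over patients, medications, and diagnoses.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product; row (i, j) maps to i * b.shape[0] + j."""
    rank = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, rank)


def unfold(x, mode):
    """Mode-n unfolding with the remaining modes flattened in C order."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def reconstruct(factors):
    """Rebuild a three-way tensor from its CP factor matrices."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)


def nonneg_cp(x, rank, n_iter=200, eps=1e-9, seed=0):
    """Nonnegative CP of a three-way tensor via multiplicative updates.

    Illustrative only: no sparsity penalty and no expert guidance.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            numer = unfold(x, mode) @ kr
            # Gram matrix of the Khatri-Rao product is the Hadamard product of Grams.
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            denom = factors[mode] @ gram + eps
            # Multiplying by a nonnegative ratio keeps every entry nonnegative.
            factors[mode] *= numer / denom
    return factors


# Synthetic stand-in for patient x medication x diagnosis co-occurrence counts.
demo_rng = np.random.default_rng(42)
counts = demo_rng.poisson(1.0, size=(30, 8, 6)).astype(float)
patients, medications, diagnoses = nonneg_cp(counts, rank=3)
```

In this sketch, the largest entries in column `r` of `medications` and `diagnoses` would suggest which medications and diagnoses co-occur in candidate `r`, while column `r` of `patients` gives each patient's membership weight.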

For predictive modelling, we introduce CloudAtlas, a cloud-based parallel predictive modelling system using big data infrastructure including Hadoop and Spark. Besides parallel model building, CloudAtlas can accurately estimate the running time and cost of a predictive modelling workflow and then provision the proper cluster on demand in the cloud. In particular, we demonstrate that CloudAtlas can achieve a 40X speedup plus a 40% cost saving compared to traditional sequential execution on large EHR datasets.
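The idea of estimating running time and cost before provisioning a cluster can be sketched as follows. This is not CloudAtlas's estimator, which the abstract does not describe: the runtime model (a serial part, a perfectly divisible parallel part, and a per-node coordination overhead), the class and function names, and all numbers are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowEstimate:
    """Hypothetical runtime profile of a modelling workflow, in hours."""
    serial_hours: float             # work that cannot be parallelised
    parallel_hours: float           # work that divides evenly across nodes
    overhead_hours_per_node: float  # assumed coordination cost per extra node


def estimated_hours(w, nodes):
    """Assumed runtime model: serial + parallel / n + overhead * n."""
    return w.serial_hours + w.parallel_hours / nodes + w.overhead_hours_per_node * nodes


def estimated_cost(w, nodes, price_per_node_hour):
    """Cost is node count times hourly price times estimated runtime."""
    return nodes * price_per_node_hour * estimated_hours(w, nodes)


def cheapest_cluster(w, price_per_node_hour, deadline_hours, max_nodes=64):
    """Return the cheapest node count that meets the deadline, or None."""
    feasible = [
        n for n in range(1, max_nodes + 1)
        if estimated_hours(w, n) <= deadline_hours
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda n: estimated_cost(w, n, price_per_node_hour))


# Illustrative numbers only: 1 serial hour, 40 parallelisable hours, no overhead.
workflow = WorkflowEstimate(serial_hours=1.0, parallel_hours=40.0,
                            overhead_hours_per_node=0.0)
nodes = cheapest_cluster(workflow, price_per_node_hour=2.0, deadline_hours=6.0)
```

Under this simple model, cost never decreases as nodes are added, so the search returns the smallest cluster that still meets the deadline. A real system would fit the runtime model from profiling data and account for instance types and pricing.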

SPEAKER PROFILE

Jimeng Sun is an Associate Professor in the School of Computational Science and Engineering at the College of Computing, Georgia Institute of Technology. Prior to joining Georgia Tech, he was a research staff member at the IBM TJ Watson Research Center. His research focuses on health analytics using electronic health records and data mining, especially designing novel tensor analysis and similarity learning methods and developing large-scale predictive modelling systems. He has published over 70 papers and filed over 20 patents (5 granted). He received the ICDM best research paper award in 2008, the SDM best research paper award in 2007, and the KDD Dissertation runner-up award in 2008. Dr. Sun received his B.S. in Computer Science from Hong Kong University of Science and Technology in 2002, and his PhD in Computer Science from Carnegie Mellon University in 2007.
