July 1: Lecture by J. Richard Landis (University of Pennsylvania Perelman School of Medicine)

Posted by: Zhou Xiaoying    Date posted: 2018-07-01



School of Statistics and Data Science



  -------------------------------------------------

             Academic Lecture


Title: Subgroup Discovery Using Consensus Clustering Methods: Applications to Deep Phenotyping Data

Speaker: Professor J. Richard Landis, PhD

Department of Biostatistics, Epidemiology and Informatics

University of Pennsylvania Perelman School of Medicine

Time: July 2, 2018 (Monday), 10:00

Venue: Room 426, School of Statistics and Data Science

Introduction: Discovery of patient subtypes with differential profiles of risk for multiple outcomes is essential for improving health promotion and precision medicine strategies. For many disease subpopulations, unique phenotypes exist, but their subgroup identity is not observable. Discovery of unique clusters will potentially open up prognostic insights into likely outcomes, etiological insights into causal pathways, and prevention strategy insights for improved health outcomes.

Methods: This talk will describe hierarchical clustering methods, first clustering baseline variables into I item clusters, followed by consensus clustering of patients into K patient subgroups, separately within selected subdomains of deep phenotyping data. A K-means clustering algorithm, using an average linkage method, was applied to the I item clusters as variables, and was repeated within 1,000 randomly selected subsamples, each of size 80% of N, utilizing the R Bioconductor package ConsensusClusterPlus (Wilkerson and Hayes (2010)). An evaluation criterion based on the “Proportion of Ambiguous Pairs within Clusters” (PAC) was used to generate a mean consensus score for each of the K clusters. This entire consensus clustering process was then implemented sequentially for K=2, ..., 10 clusters, to facilitate the optimal selection of K within each subdomain of phenotype data. A Monte Carlo consensus clustering algorithm utilizing the R Bioconductor package M3C (John (2017)) was also implemented to evaluate the likelihood that resulting subgroups for selected choices of K were due to chance, based on comparisons between the real data and 100 sets of simulated data under the null hypothesis of no distinct subgroups (K=1).
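The resampling loop described in the Methods can be illustrated with a minimal sketch. It is not the ConsensusClusterPlus or M3C implementation: it uses a plain Lloyd's K-means written with the Python standard library, 200 subsamples rather than 1,000, and hypothetical one-dimensional toy data standing in for an item-cluster score. PAC is computed here as the fraction of patient pairs whose consensus value falls strictly between two thresholds; the thresholds 0.1 and 0.9 are assumptions, as are all function and variable names.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns one label per point."""
    centers = rng.sample(points, k)  # k distinct data points as starting centers
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; an empty cluster keeps its center.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_matrix(points, k, n_resamples=200, frac=0.8, seed=0):
    """For each pair: share of subsamples containing both in which they co-cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        idx = rng.sample(range(n), int(frac * n))  # subsample without replacement
        labels = kmeans([points[i] for i in idx], k, rng)
        for (a, i), (b, j) in combinations(enumerate(idx), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[a] == labels[b]:
                together[i][j] += 1
                together[j][i] += 1
    matrix = [[1.0] * n for _ in range(n)]  # diagonal is 1 by convention
    for i, j in combinations(range(n), 2):
        value = together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
        matrix[i][j] = matrix[j][i] = value
    return matrix


def pac(matrix, lower=0.1, upper=0.9):
    """Fraction of pairs with an ambiguous consensus value (assumed thresholds)."""
    pairs = list(combinations(range(len(matrix)), 2))
    ambiguous = sum(1 for i, j in pairs if lower < matrix[i][j] < upper)
    return ambiguous / len(pairs)


# Hypothetical toy data: two well-separated groups of 15 "patients" each.
data_rng = random.Random(1)
patients = [(data_rng.uniform(0, 1),) for _ in range(15)]
patients += [(data_rng.uniform(10, 11),) for _ in range(15)]

# Repeat the whole procedure for several K and compare PAC (lower is more stable).
pac_by_k = {k: pac(consensus_matrix(patients, k)) for k in (2, 3, 4)}
for k, value in pac_by_k.items():
    print(f"K={k}: PAC={value:.3f}")
```

In this sketch a lower PAC indicates a more stable partition, so the candidate K with the smallest PAC would be preferred. A Monte Carlo reference in the spirit of M3C would repeat the same computation on data simulated without subgroup structure and compare the resulting scores.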

Results: Utilizing deep phenotyping data from the Epidemiology and Phenotyping (EP) Cohort Study conducted within the Multidisciplinary Approach to the Study of Chronic Pelvic Pain (MAPP) Research Network (http://www.mappnetwork.org/), item clustering was implemented among 26 features from the pelvic pain, urgency and frequency (PUF) domains, and separately among 45 sites from the body map pain domain, to determine the final set of variables for consensus clustering of patients into subgroups within each domain. Consensus clustering among all I=26 features within the pelvic PUF domain resulted in K=3 patient clusters (mean consensus: 90%, 92%, 94%), and among I=23 body site clusters of features resulted in K=3 patient clusters (mean consensus: 84%, 87%, 89%), suggesting that these patient subtypes are highly reproducible within each of these 2 domains of phenotyping data. Furthermore, the resulting PUF clusters were predictive of longitudinal change in urological pain severity, and separately in urinary severity, when adjusted for a relevant set of potential confounders, such as baseline severity and physical quality of life.
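A per-cluster mean consensus such as the percentages quoted above can be read as the average pairwise consensus among the members of each final cluster. The sketch below assumes that definition; the abstract does not spell out how the scores were computed, and the matrix, labels and function name are hypothetical.

```python
from itertools import combinations


def cluster_consensus(matrix, labels):
    """Mean pairwise consensus among the members of each cluster (singletons are skipped)."""
    scores = {}
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        pairs = list(combinations(members, 2))
        if pairs:
            scores[c] = sum(matrix[i][j] for i, j in pairs) / len(pairs)
    return scores


# Hypothetical consensus matrix for five patients and their final cluster labels.
toy_matrix = [
    [1.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 1.0, 1.0, 0.0, 0.1],
    [0.8, 1.0, 1.0, 0.2, 0.0],
    [0.1, 0.0, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.0, 0.7, 1.0],
]
toy_labels = [0, 0, 0, 1, 1]

scores = cluster_consensus(toy_matrix, toy_labels)
for cluster, score in scores.items():
    print(f"cluster {cluster}: mean consensus {score:.0%}")
```

Values close to 100% mean that members of a cluster were almost always grouped together across subsamples.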

Conclusions: These consensus clustering methods provide credible re-sampling criteria for evaluating the reproducibility of domain-specific subgroups, and for scaling discovery of more refined subgroups by expanding to additional symptom domains. Furthermore, these methods hold promise in identifying clinically relevant patient sub-phenotypes with differential longitudinal profiles for multiple outcomes.


REFERENCES:


1. Monti, Stefano et al. (2003). “Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data.” Machine Learning, vol. 52, pp. 91–118. Kluwer Academic Publishers, link.springer.com/content/pdf/10.1023/A:1023949509487.pdf.


2. Wilkerson, Matthew D., and D. Neil Hayes (2010). “ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking.” Bioinformatics, vol. 26, no. 12, pp. 1572–1573.


3. Senbabaoglu, Yasin et al. (2014). “Critical limitations of consensus clustering in class discovery.” Scientific Reports, vol. 4, article 6207, www.nature.com/articles/srep06207.


4. Kriegel, Hans-Peter, Erich Schubert, and Arthur Zimek (2016). “The (black) art of runtime evaluation: Are we comparing algorithms or implementations?” Knowledge and Information Systems, vol. 52, no. 2, pp. 341–378, link.springer.com/article/10.1007%2Fs10115-016-1004-2.


5. John, C. (2017). M3C: Monte Carlo Consensus Clustering. R package version 1.0.0.


Speaker biography: J. Richard Landis, PhD, is Professor of Biostatistics and Director of the Division of Biostatistics in the Department of Biostatistics, Epidemiology and Informatics, at the University of Pennsylvania Perelman School of Medicine, and holds a secondary appointment as Professor of Statistics in the Wharton School (http://www.cceb.med.upenn.edu/faculty/index.php?id=18). He has extensive experience leading NIH-funded Data Coordinating Centers (DCCs) for multi-center research networks, and has co-authored more than 180 articles in the peer-reviewed scientific literature in the areas of statistical methods for repeated measurement and longitudinal categorical data, epidemiological studies, complex sample surveys and applications to cardiovascular, ophthalmology, respiratory, psychiatric, renal and urological research. Dr. Landis is Director of the Research Methods Core and Co-Investigator of the Informatics Core in Penn’s CTSA-Hub (2016-21). He serves as DCC PI for NIDDK’s “Multidisciplinary Approach to Pelvic Pain (MAPP) Research Network” (2008-19) (http://www.mappnetwork.org), having previously served as PI of the DCCs for NIDDK’s Urologic Pelvic Pain Collaborative Research Networks (CPCRN, ICCTG, CPCRN-2, ICCRN) that conducted 9 RCTs (1998-2008). Within renal research, he is Co-investigator of the Scientific and Data Coordinating Center (SDCC) for NIDDK’s “Chronic Renal Insufficiency Cohort (CRIC) Research Network” (2001-18) (http://www.cristudy.org), MPI for NIDDK’s “Data Coordinating Center for Hemodialysis Pilot Studies Consortium” (2013-18), and Co-Investigator for the “Pragmatic Trials in Maintenance Hemodialysis” (2012-17).


Host: Professor Zou Changliang


All faculty and students are welcome to attend!


School of Statistics and Data Science

July 1, 2018