The following speakers will present the challenges and results from this year's Dialogue for Reverse Engineering Assessments and Methods. Visit the DREAM website for more information about the challenges.
The goal of the Sage Bionetworks/DREAM Breast Cancer Prognosis Challenge is to assess the accuracy of computational models in predicting breast cancer survival based on gene expression data, copy number data, and standard clinical information from 2,000 breast cancer patients (the METABRIC dataset). The challenge was structured as a scientific collaborative competition in which participants were encouraged to use a common software platform (Synapse) for model submission and assessment. Here we summarize the results of the more than 30 teams that participated in the challenge. An analysis of the leaderboard history reveals that top-performing models steadily improved with time. We present a comparison of all the submissions, the methods used by the participants, and the result of aggregating individual predictions. Finally, we discuss the performance of the best predictions on different subsets of the test dataset based on known breast cancer biomarkers such as ER, HER2, or histological grade.
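The abstract does not state which metric was used to assess accuracy. As an illustration only, one common way to score survival predictions is a concordance index: the fraction of comparable patient pairs whose predicted risk ordering agrees with their observed survival ordering. The sketch below assumes right-censored follow-up data; the function name `concordance_index` and the argument names `times`, `events`, and `risks` are hypothetical and are not taken from the challenge's scoring code.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Fraction of comparable pairs ordered correctly by predicted risk.

    times  : observed follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    risks  : predicted risk score (higher = shorter expected survival)
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("times, events and risks must have the same length")
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the two times differ and the
        # shorter one ends in an observed event (not censoring).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Under this definition a score of 1.0 means every comparable pair is ordered correctly, and 0.5 is what random risk scores would achieve on average.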
Speaker Biography: Erhan Bilal joined IBM in 2010, after receiving a PhD degree in computational biology from Rutgers University, BioMapS. Before that, he completed MSc and BS degrees in automatic control and industrial informatics at the Politehnica University of Bucharest, Romania.
Boston University, Howard Hughes Medical Institute
Over 45 teams participated in two challenges sponsored by the National Cancer Institute to predict drug sensitivities. In the first challenge, teams were supplied with genome-scale datasets over 53 cell lines: namely, gene expression (microarray), exome sequencing, RNA-sequencing, DNA methylation, copy number variation, and protein quantification. Across 31 drug treatments, participants were asked to predict the ranking of cell lines from most sensitive to most resistant. For the second challenge, teams were supplied with gene expression microarray data sampled from a lymphoma cell line treated with a drug at the 20% inhibitory concentration. Participants were then asked to predict the ranking of drug combinations from most synergistic to most antagonistic. Here, we present the evaluation methodology and results for both challenges.
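The abstract does not describe the evaluation methodology itself. As a hedged illustration of how a predicted ranking can be compared with a measured one, the sketch below computes a Spearman rank correlation for rankings without ties. This is a stand-in technique, not the challenge's actual scoring method, and the function name and cell line labels are hypothetical.

```python
def spearman_rank_correlation(predicted_order, measured_order):
    """Spearman correlation between two rankings of the same items (no ties).

    Each argument lists the same items, ordered from most sensitive
    to most resistant.
    """
    n = len(predicted_order)
    if n < 2:
        raise ValueError("need at least two items to compare rankings")
    if len(measured_order) != n or len(set(predicted_order)) != n:
        raise ValueError("rankings must have equal length and no duplicates")
    if set(predicted_order) != set(measured_order):
        raise ValueError("both rankings must contain the same items")
    # Position of each item in the predicted ranking.
    predicted_rank = {item: rank for rank, item in enumerate(predicted_order)}
    # Sum of squared rank differences across all items.
    d2 = sum((predicted_rank[item] - rank) ** 2
             for rank, item in enumerate(measured_order))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical cell line labels, for illustration only.
measured = ["line_A", "line_B", "line_C", "line_D"]
predicted = ["line_A", "line_C", "line_B", "line_D"]
score = spearman_rank_correlation(predicted, measured)
```

A value of 1 indicates identical rankings and -1 indicates a fully reversed ranking.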
Speaker Biography: James Costello received his PhD in informatics from the School of Informatics and Computing at Indiana University, Bloomington. There he studied data integration and gene function prediction methods with a focus on Drosophila genomics. Jim is currently a postdoctoral researcher in the laboratory of Dr. James J. Collins at Boston University, where he takes systems biology approaches to address questions in cancer genomics, gene regulatory network reconstruction, and the genomic effects of antibiotics on both microbial and mammalian biology. Jim helped co-organize the DREAM5 network inference challenge and is a co-organizer of the 2012 NCI-DREAM drug sensitivity challenge.
Oregon Health & Science University
Genomic, epigenomic, transcriptional, and proteomic analyses of breast cancers reveal heterogeneity on every level, indicating that there are many mechanisms involved in disease development and progression. In order to gain a greater understanding of disease development and response to therapeutics, we have employed a panel of ~50 well characterized breast cancer cell lines for experimental investigation. Clustering of gene expression profiles from the panel indicates that the cell line collection models the luminal, basal, and claudin-low subtypes defined in primary tumor samples. Further, high-resolution SNP copy number analysis confirms that the cell line panel contains many of the same regions of recurrent aberration as are found in primary tumors. Nearly all compounds tested against the panel show differential responses, and many are associated with subtype, pathways, and/or genomic aberrations. Altogether, these observations suggest mechanisms of response and resistance that may inform development of clinical trials and guide therapeutic deployment.
Speaker Biography: Laura Heiser is a research assistant professor at Oregon Health and Science University. Dr. Heiser earned her PhD in neuroscience from the University of Pittsburgh, and completed postdoctoral studies under the direction of Paul Spellman at Lawrence Berkeley National Laboratory. During her postdoctoral work, she developed algorithms to analyze and integrate genomic and proteomic datasets into a computational model of deregulated cell signaling in breast cancer. Her research focuses on the genomic and epigenomic changes that drive oncogenic behavior in breast cancer, with the goal of identifying aberrations and pathways associated with therapeutic response and resistance.
Ludwig Maximilians University Munich
The DREAM-Phil Bowen challenge was conducted to improve the treatment of amyotrophic lateral sclerosis (ALS) patients by predicting their future disease progression. This challenge is based on the PRO-ACT database, which records monthly clinical examinations, including lab tests and questionnaires, of thousands of ALS patients. The determination of ALS progression is difficult due to the wide variability among patients, which is reflected in the PRO-ACT clinical data. We give a brief overview of the contents of this database, focusing on the features important for predictions. We will then present the general layout and scoring of the challenge. Finally, the participating teams will be introduced and their achieved performance will be discussed in relation to the employed prediction techniques as well as the exploited clinical data items.
Speaker Biographies: Robert Küffner habilitated in informatics in 2010 and has been, since 2003, a group leader and lecturer for computer science and bioinformatics at the Ludwig-Maximilians Universität München, Germany. Between the years 2000 and 2003, he was head of software development at the National Center for Genome Resources (NCGR) in New Mexico, USA. He received his PhD in molecular biology in 1998 at the Heinrich-Heine Universität Düsseldorf, Germany. Küffner’s main interests include the investigation and reconstruction of biological networks via Petri nets as well as research in the areas of text mining, expression analysis, gene regulation, and systems biology. Approaches and tools resulting from this research have been applied in many projects to provide systematic bioinformatics support, e.g., for the pre-processing, analysis, and integration of large-scale transcriptomics and proteomics datasets. Recently, his team was recognized as best performer in two international community-wide challenges where comprehensive blinded assessments of network inference approaches have been conducted.
Johann Hawe finished his Bachelor of Science in bioinformatics in 2012 at the Ludwig-Maximilians Universität München, where he is currently pursuing his Master's studies. Before studying bioinformatics, he worked as a software developer at ModulAcht e. K. in 2009. Johann's main interest is the application of machine learning techniques to medical problems. During his work as a student assistant, he explored ways to improve the treatment of patients in intensive care units by analyzing the MIMIC II database. He further conducted his Bachelor's thesis in the context of the DREAM-Phil Bowen ALS Prediction Prize4Life, where he predicted the future rate of disease progression of ALS patients.
The past two decades have seen amazing growth in the ability to generate genomic data. This has been fueled, in part, by the rapidly decreasing cost of sequencing technologies. However, with only a few exceptions, the acquisition of this type of data has failed to generate significant improvements in the treatment of human diseases. Improving the methodology around this genotype-to-phenotype prediction problem is an active area of research. Unfortunately, a key impediment to meaningful advances in this area is the relatively closed nature of scientific research that stems from both the publication-grant-work cycle of academic research and the walled-off approaches of industry. To address these cultural barriers, the DREAM project formed 6 years ago with a mission to organize open computational challenges in systems biology that get multiple groups working on the same problems from different angles. Similarly, Sage Bionetworks started out of a conviction that the open sharing of clinical and biological data is the lynchpin to building more predictive models of disease. Driven by these shared objectives, Sage Bionetworks and DREAM have partnered to organize the Sage/DREAM Breast Cancer Prognosis Challenge. The goal of the Breast Cancer Prognosis Challenge is to assess the accuracy of computational models designed to predict breast cancer survival (median 10-year follow-up) based on clinical information about the patient's tumor as well as genome-wide molecular profiling data including gene expression and copy number profiles. The Challenge leverages statistical and machine-learning approaches to analyze clinical and genomic data, as well as innovative rewards to incentivize scientists to work together and even build on one another’s models.
Speaker Biography: Adam Margolin is a pioneer in computational approaches for inferring cellular regulatory networks and developing predictive models linking alterations in cellular networks to genotype-specific cancer therapeutics. Prior to joining Sage Bionetworks, Dr. Margolin initiated and led the genotype-specific therapeutics initiative at the Broad Institute, leading to development of approaches to analyze data from genetic characterization of cancer cell lines coupled with viability screens following small molecule treatment. The results of this work were recently published in Nature and Cancer Cell, demonstrating that most well-characterized genetic alterations known to confer sensitivity to particular compounds can be discovered in an unbiased de novo analysis, and that cell lines sensitive or resistant to compound treatment could be predicted with high accuracy. As Director of Computational Biology at Sage Bionetworks, Dr. Margolin oversees the development of novel computational methods for predicting disease-related phenotypes, develops large-scale data processing and sharing tools, and leads several consortium-based projects to leverage cloud-enabled computing technologies for collaborative analysis across a distributed network of investigators.
We present the scoring results for the 12 teams that participated in the DREAM7 Network Topology and Parameter Estimation Challenge. The challenge consisted of estimating parameters and predicting outcomes of perturbations for two gene regulatory networks, one with 9 genes (model 1) and the other with 11 genes (model 2). Participants were also asked to discover 3 “missing” connections not originally provided for model 2. Teams were given an initial data set for each model and allowed to "buy" data on a limited credit-based system, allowing for strategies that included experimental design as part of the inference method.
Performance assessment for model 1 was based on the discrepancy between actual and predicted values of three proteins and on the distance between estimated and known parameters. For model 2, scoring was based on the number of correctly predicted links. We present a comparative analysis of the submissions and a survey of the methods used to solve this challenge.
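The abstract names the quantities that were scored but not the formulas. The sketch below shows one plausible form for each, purely as an assumption: a mean squared difference for the prediction discrepancy, a mean squared log-ratio for the parameter distance, and a simple count of correctly predicted links. All function names, data layouts, and example labels are hypothetical and should not be read as the challenge's actual scoring code.

```python
import math


def prediction_discrepancy(actual, predicted):
    """Mean squared difference between actual and predicted time courses.

    Both arguments map a protein name to a list of values sampled at
    the same time points (an assumed layout).
    """
    total, count = 0.0, 0
    for protein, actual_values in actual.items():
        predicted_values = predicted[protein]
        if len(actual_values) != len(predicted_values):
            raise ValueError(f"length mismatch for {protein}")
        for a, p in zip(actual_values, predicted_values):
            total += (a - p) ** 2
            count += 1
    return total / count


def parameter_distance(known, estimated):
    """Mean squared log-ratio between estimated and known positive parameters."""
    return sum(math.log(estimated[name] / value) ** 2
               for name, value in known.items()) / len(known)


def count_correct_links(true_links, predicted_links):
    """Number of predicted links that match a true missing link."""
    return len(set(true_links) & set(predicted_links))


# Hypothetical link labels (regulator, target), for illustration only.
true_links = [("g1", "g4"), ("g2", "g7"), ("g5", "g9")]
guessed_links = [("g1", "g4"), ("g3", "g7"), ("g5", "g9")]
n_correct = count_correct_links(true_links, guessed_links)
```

A log-ratio is used for the parameter distance in this sketch so that parameters of very different magnitudes contribute comparably; a plain Euclidean distance would be an equally valid reading of the abstract.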
Speaker Biography: Pablo Meyer received his Bachelor's degree in physics from the Universidad Nacional Autonoma de Mexico (2002), MS in physics from the University of Paris (1999), and his PhD in biology from the Rockefeller University (2005), receiving a Burroughs Wellcome Fellowship for his work on live imaging of protein interactions in the Drosophila circadian clock. In 2007 he received a Helen Hay Whitney Fellowship to work at Columbia University on live imaging of metabolism during sporulation in Bacillus subtilis. In 2010 he joined the IBM Computational Biology center at IBM Research, where he is a DREAM project organizer and finds himself working at the intersection of modelling, data analysis, and wet lab experimentation. His most recent interests are in studying enzyme distribution in the cell and its link to metabolism/cancer using high-throughput biological data analysis.
European Bioinformatics Institute
The recent focus on reverse engineering has been on the identification of causal interaction topologies. Such interactions are usually inferred from high-dimensional gene expression experiments involving different perturbations. Once network interaction topologies are characterized, how can we decide between models having similar topologies and how do we characterize the actual kinetics of these networks? In the Network Topology and Parameter Inference Challenge we aimed to explore these questions, in particular how to find the most informative experiments to address them – given the normal situation of limited budget and time, this is of critical importance. In subchallenge 1, participants were given the topology but not the parameters of a 9-gene regulatory model, along with a budget to buy experiments (generated with the model), and were asked to find the parameters. In subchallenge 2, participants were given an incomplete topology with 11 genes and asked to find 3 missing links in the model. More information at www.the-dream-project.org/challenges/network-topology-and-parameter-inference-challenge
Speaker Biography: Julio Saez-Rodriguez has been a group leader at the European Bioinformatics Institute (EMBL-EBI) since 2010, with a joint appointment in the EMBL Genome Biology Unit in Heidelberg, as well as a senior fellow at Wolfson College (Cambridge). Since 2010 he has been a co-organizer of the DREAM initiative, started by Gustavo Stolovitzky and Andrea Califano to catalyze the development of computational methods in systems biology and medicine. He studied chemical engineering at the University of Oviedo (Spain) and the University of Stuttgart (Germany) (2001), obtaining his PhD at the University of Magdeburg (2007) based on his research at the Max-Planck-Institute with E. D. Gilles. Afterwards, he was a postdoctoral fellow at Harvard Medical School with Peter Sorger, in collaboration with Doug Lauffenburger at MIT, and a scientific coordinator of the NIH-NIGMS Cell Decision Process Center. Julio's group develops and applies computational methods to acquire a functional understanding of signaling networks and their deregulation in disease, with a focus on cancer, and is working to apply this knowledge to develop novel therapeutics. To this end, his group collaborates closely with experimental groups in academia and pharmaceutical companies.
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease. The majority of ALS patients will die within 3-5 years of diagnosis. However, there is substantial variability in progression, with a small subset of patients surviving 10 years or more. Predicting progression on an individual patient level is important for patients and can mean substantial reductions in the costs of clinical trials. The DREAM-Phil Bowen ALS Prediction Prize4Life used crowdsourcing to address this critical question by incentivizing the development of algorithms that use standard clinical data to predict disease progression. The challenge used data from the first-of-its-kind PRO-ACT (Pooled Resource Open-access ALS Clinical Trials) database, containing over 8,500 ALS patients' records, which will be launched later this year. In this presentation we will discuss the lessons learned from the conduct of this unique challenge regarding the use of crowdsourcing to generate real solutions to urgent clinical needs.
Speaker Biography: Neta Zach has been a scientific director at Prize4Life since 2011. She has a PhD in computational neuroscience from the Hebrew University in Israel, completed her postdoctoral work as a Fulbright scholar at the Rockefeller University in New York, and joined the University of Pennsylvania as a faculty member before deciding to switch from academic research to the health philanthropy sector. She believes that much better, more innovative scientific administration is key to enabling important scientific breakthroughs. In addition to her scientific work, she also holds a Master’s degree in public administration from the University of Pennsylvania and had a five-year tenure as a science journalist at the Israeli daily newspaper HaAretz.