Discovery on Target

Big Data Analytics and Solutions

About This Conference:

Effectively utilizing big-data opportunities can help biopharma companies better identify new potential drug candidates and develop them into effective, approved and reimbursed medicines more quickly. This potential cannot be unlocked without addressing key issues, including the role of big data in drug design; big data approaches across multiple research initiatives; scalability; modeling, simulating and visualizing data in novel ways; and translating data into knowledge for improved clinical decision making, patient care and drug development.

Cambridge Healthtech Institute’s Inaugural Conference on Big Data Analytics and Solutions assembles leading researchers and thought leaders who will share big data approaches to help map diseases, identify biomarkers, and discover targets for potential therapies. 


Wednesday, October 8

7:00 am Registration and Morning Coffee


8:05 Chairperson’s Opening Remarks

Michael Liebman, Ph.D., Managing Director, IPQ Analytics, LLC

8:15 Disease and Big (NGS) Data: Searching for Needles in Needlestacks

Joseph D. Szustakowski, Ph.D., Senior Group Head, Novartis Institutes for BioMedical Research

9:00 Big Data: We May Have the Right Drugs, but Do We Have the Right Targets?

Michael Liebman, Ph.D., Managing Director, IPQ Analytics, LLC

Drug development targets the proposed molecular mechanism associated with a disease but in much of medicine, accurate definition/diagnosis of the disease or phenotype is lacking. We focus on applying big data to understand the complexity of the disease to both improve clinical decision making/patient care and drug development, particularly in complex diseases and syndromes. Our approach uniquely starts with clinical need rather than data generation.

9:30 Rapid Drug Repositioning by Combining High-Throughput Experiments and Publicly Available Datasets

Blake Borgeson, Co-Founder and CTO, Recursion Pharmaceuticals

Drug repositioning has emerged as a compelling opportunity for quickly bringing treatments to market to meet medical need, as well as for generating more value from previously shelved drug candidates. Combining large datasets, both publicly available and generated internally, has the potential to accelerate this process, and Recursion is at the forefront of this advance. Previous groups have separately proposed experimental or computational approaches to identify opportunities for repositioning. This talk will describe integrating various sources of data from both data-intensive high-throughput experiments and large public datasets such as the Connectivity Map project. This approach helps us improve experimental design, boost confidence in selecting hits for further validation, and bring treatments to market even faster. We will discuss multiple types and sources of data, including public databases of various types and high-content experimental data, and give a practical evaluation of how these sources can be integrated to support drug repositioning efforts and how they can complement and reinforce each other. Additionally, the talk will describe relevant challenges and pitfalls of combining public datasets with in-house experimental results.

10:00 Grand Opening Coffee Break in the Exhibit Hall with Poster Viewing


10:45 Cell-Based Assays for Drug Target Discovery: Lessons from Transcriptomic Studies With Human Lymphoblastoid Cell Lines

Noam Shomron, Ph.D., Associate Professor, Head, Functional Genomics Laboratory, Faculty of Medicine, Tel-Aviv University

Genome-wide pharmacogenomic studies for developing targeted therapies offer the advantage of hypothesis-free search for tentative drug response biomarkers (efficacy and safety). However, they require large patient cohorts and are therefore very costly. This talk presents our experience with an alternative approach, based on genome-wide transcriptomic profiling of a panel of human lymphoblastoid cell lines (LCLs) representing unrelated healthy donors.

11:15 Using Big Data Analytics in the Cloud to Improve Vaccine Yields

Jerry Megaro, Director, Manufacturing Advanced Analytics and Innovation, Merck

Craig Sutherland, Executive Director, Technology & Data Science, Life Sciences and Health Practice, Booz Allen Hamilton

Here we present a proof-of-concept experiment where big data tools and techniques were applied to integrate and analyze 12 years' worth of vaccine manufacturing data from 16 data sources to identify characteristics that influence yield. Additionally, the talk will describe how we leveraged shared data lake platform services operated within an Amazon Web Services Virtual Private Cloud (VPC) and scaled up Elastic Compute Cloud (EC2) services on demand to support the analysis effort.

11:45 From Big Data to Smart Data: Using Quantitative Systems Pharmacology for De-risking R&D Projects in CNS R&D

Hugo Geerts, Ph.D., CSO, Computational Neuropharmacology, In Silico Biosciences

Despite heavy investment in CNS R&D and increased available information in multiple -omics databases, the clinical failure rate remains above 90%. The ‘reductionist’ trap that focuses on very detailed bits and pieces of data makes it a huge challenge to generate actionable knowledge in an integrative and useful way. We propose Quantitative Systems Pharmacology as an example of a completely new generation of deep analytics approaches that can be a powerful tool for knowledge generation in the context of going from ‘Big Data to Smart Data’. The use of computer-based mechanistic modeling, based upon the physiology of (human) brain networks, functional imaging of genetics and pharmacology of drug-receptor interaction, and parametrized with clinical data as a common language, allows a wide diversity of information to be integrated into an actionable platform that can be interrogated at different levels. We will show examples in Alzheimer’s disease and schizophrenia where this approach could have made a substantial impact on clinical trial success rate. As in other ‘engineering’ industries, the platform can also be developed as a knowledge repository that conserves and expands ‘tacit’ corporate scientific knowledge.

12:15 pm Selected Poster Presentation: An Integrative Platform for Discovery of Drugs and Small Chemicals Associated With Autism Spectrum Disorder

Adam Brown, Ph.D. Student, Biological and Biomedical Sciences, Harvard University

Repositioning approved drug molecules in novel therapeutic areas is of key interest to the pharmaceutical industry. To aid in this effort, several large databases have been compiled in an attempt to aggregate gene-level information about drug molecules (e.g. DrugBank, the Comparative Toxicogenomics Database, and the Connectivity Map, among others). However, no pipelines exist to comprehensively query gene expression databases for associations between drugs and diseases. To address this gap, we developed a generalized Kolmogorov-Smirnov enrichment test-based methodology for identifying compounds that modulate genes derived from expert-curated and publicly available datasets. We demonstrated our approach by correctly identifying two out of three frontline prostate cancer therapies from a database of over 7,000 drug and non-drug compounds. We predict a list of eight drugs that modulate genes that are also perturbed in autism spectrum disorder (ASD), seven of which target genes that have previously been implicated in ASD. Finally, we show that our method significantly enriches for drugs from a database containing both drug and non-drug compounds, demonstrating its utility as a hypothesis-generating tool for drug repositioning studies.
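The abstract does not give the details of the generalized test, so the following is only a minimal sketch of the classic unweighted Kolmogorov-Smirnov-style running-sum statistic that enrichment methods of this family build on. It is not the authors' method. The function names, the toy gene identifiers and the compound names are all hypothetical, and each compound profile is assumed to be a list of genes ranked from most up-regulated to most down-regulated.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def ks_enrichment_score(ranked_genes: Sequence[str], gene_set: Iterable[str]) -> float:
    """Unweighted KS-style running-sum statistic (illustrative only).

    Walks down the ranked list, stepping up for genes in the set and down
    for genes outside it, and returns the signed maximum deviation from zero.
    """
    hits = set(gene_set) & set(ranked_genes)
    n, n_hits = len(ranked_genes), len(hits)
    if n_hits == 0 or n_hits == n:
        return 0.0  # no contrast between hits and misses, so no enrichment signal
    hit_step = 1.0 / n_hits
    miss_step = 1.0 / (n - n_hits)
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += hit_step if gene in hits else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


def rank_compounds(
    profiles: Dict[str, Sequence[str]], gene_set: Iterable[str]
) -> List[Tuple[str, float]]:
    """Score every compound profile against one gene set, highest score first."""
    genes = set(gene_set)
    scores = [(name, ks_enrichment_score(ranked, genes)) for name, ranked in profiles.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two compounds with opposite expression rankings.
profiles = {
    "compound_A": ["g1", "g2", "g3", "g4", "g5", "g6"],
    "compound_B": ["g6", "g5", "g4", "g3", "g2", "g1"],
}
disease_genes = {"g1", "g2"}
ranking = rank_compounds(profiles, disease_genes)
print(ranking)
```

A positive score means the gene set clusters near the top of a compound's ranking and a negative score means it clusters near the bottom. A real pipeline would also need a significance estimate, for example by permutation, which this sketch omits.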

12:45 Session Break

1:00 Luncheon Presentation - MetaCore™: The Next Generation of Systems and Network Biology to Support Gene Variant and Expression Co-Analysis

Chris Willis, Ph.D., Solution Scientist, Discovery and Translational Sciences, Thomson Reuters

To date, drugs have only explored targeting 10% of the coding human genome. Systems biology approaches aim to increase these numbers through understanding molecular disruptions at the pathway level in disease. This talk will discuss the challenges associated with analyzing the exponential growth in biomedical research knowledge and multi-omics data. Furthermore, this talk will demonstrate the use of the Thomson Reuters MetaCore™ platform to analyze gene variant and transcriptomic data simultaneously. Lastly, the development of bioinformatic algorithms and the value of taking a combination knowledge/data driven approach to data analysis will highlight the roadmap for MetaCore™.

1:40 Session Break

1:50 Chairperson’s Opening Remarks

Cindy Crowninshield, RDN, LDN, Senior Conference Director/Team Lead, Cambridge Healthtech Institute

2:00-3:00 The Water Must Flow: A Data Services Architecture for the Broad Institute 

Chris Dwan, Assistant Director, Research Computing and Data Services, Broad Institute of MIT and Harvard 


3:00 Caleydo Entourage: Visualizing Relationships between Biological Pathways

Alexander Lex, Ph.D., Researcher, Harvard School of Engineering & Applied Sciences

This talk will introduce Entourage, a visualization technique for analyzing interrelationships between multiple related biological pathways. We use a novel technique, contextual subsets, to determine and present parts of other pathways that are relevant in the context of a focus pathway. Entourage supports dynamic querying of pathways based on node occurrence, overall similarity of pathways, or over-representation of pathways in experimental data. I will demonstrate three case studies showing how Entourage can be used to judge potential side effects of compounds, to find potential targets for drug repositioning, and how it can be combined with visualization of experimental data to reason about varying effects of compounds on samples. Entourage is part of Caleydo, an open-source visualization framework.

3:30 Refreshment Break in the Exhibit Hall with Poster Viewing 

4:10 The Secrets in Their Landscapes: Using ‘Google Exacycle’ to Elucidate Activation Mechanism of GPCRs for Selective Drug Design

Diwakar Shukla, Ph.D., Simbios Distinguished Fellow, Laboratory of Vijay Pande, Chemistry Department, Stanford University and soon to be Professor, Chemical Engineering, University of Illinois at Urbana-Champaign

Mechanistic understanding of GPCR activation could be obtained via in silico approaches, although this is very challenging due to the long activation timescales. Here, we employ a novel computational paradigm that couples cloud computing and Markov state model based sampling algorithms for mapping the conformational landscape of the β2-adrenergic receptor. These computations provide the atomistic picture of activation and help identify key structural intermediates for drug design.

4:40 Free Energies from a Molecular Printing Press

Kenneth M. Merz, Jr., Director, Institute for Cyber Enabled Research (iCER) and Joseph Zichis Chair in Chemistry, Department of Chemistry, Department of Biochemistry and Molecular Biology, Michigan State University

Docking (posing) calculations coupled with binding free energy estimates (scoring) are a mainstay of structure-based drug design. Docking and scoring methods have steadily improved over the years, but remain challenging because of the extensive sampling that is required, the need for accurate scoring functions and challenges encountered in accurately estimating entropy effects. To address this we developed the Moveable Type (MT) method that combines knowledge-based approaches (via data-mining of structural databases) with physics-based models to create molecular ensembles. The MT method employs an elegant approach to generate the necessary statistical mechanical ensembles by using a grid-based representation of a physics-based potential combined with atom pair probabilities extracted from structural databases like the Protein Data Bank (PDB) or the Cambridge Structural Database (CSD). In the realm of structure-based drug design this allows us to rapidly compute the ligand, protein and protein-ligand (inclusive of solvation effects) ensembles, which can then be used to directly estimate protein-ligand binding free energies using a ratio of partition functions. This approach improves the quality of the potential (scoring) function by reducing computational uncertainty, sampling phase space in one shot and accurately incorporating entropy effects. This allows us to compute binding free energies rapidly and accurately, and to obtain molecular poses at a minimal computational cost relative to currently available methods.
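For readers unfamiliar with the "ratio of partition functions" mentioned above, the general statistical-mechanical relation can be written as follows. This is the textbook form, not necessarily the exact expression used in the MT method. The symbols Z_PL, Z_P and Z_L (partition functions of the solvated complex, protein and ligand) are notation assumed here, and standard-state and concentration terms are omitted.

```latex
\Delta G_{\mathrm{bind}} \;\approx\; -RT \,\ln \frac{Z_{PL}}{Z_{P}\, Z_{L}}
```

A larger partition function for the complex relative to the separated partners gives a more negative, that is more favorable, binding free energy.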

5:10 Interactive Breakout Discussion Groups
This interactive session provides conference delegates and speakers an opportunity to choose a specific roundtable discussion group to join. Each group has a moderator to ensure focused discussions around key issues within the topic. This format allows participants to meet potential collaborators, share examples from their work, vet ideas with peers, and be part of a group problem-solving endeavor. The discussions provide an informal exchange of ideas and are not meant to be a corporate or specific product discussion.  

Drug Discovery....Is it about Big Data or Unmet Clinical Needs?

Michael Liebman, Ph.D., Managing Director, IPQ Analytics, LLC

  • Does Big Data actually address unmet clinical needs?
  • Do new technologies adequately address Pharma's needs?
  • Developing a drug/Curing a Disease/Improving Healthcare....are they all the same thing?

Re-use of EMR Data for Quality Improvement, Business Intelligence and Clinical Research in the Era of the Learning Healthcare System

Louis Fiore, M.D., MPH, Executive Director, Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Department of Veterans Affairs; Associate Professor, Boston University School of Medicine, Boston University School of Public Health

Valmeek Kudesia, M.D., Director, Clinical Informatics, Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC)

  • Create a learning healthcare system and understand its benefits
  • Navigate the cultural and operational issues of using EMR for secondary purposes
  • Bring data to life with tried and tested data visualisation techniques and tools
  • Design projects that will appeal to your CFO – delivering business intelligence that will cut costs or achieve greater ROI

6:10 Welcome Reception in the Exhibit Hall with Poster Viewing

7:15 Close of Day

Thursday, October 9

7:30 am Registration and Morning Coffee


8:00 Chairperson’s Opening Remarks

Peter Henstock, Ph.D., Senior Principal Scientist, Research Business Technology Group, Pfizer, Inc.

8:10 CARD: A Web-Based Application for Statistical Analysis and Interactive Visualization of RNAi Screen Data

Bhaskar Dutta, Ph.D., Staff Scientist, Laboratory of Systems Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health

CARD is a web-based application for comprehensive analysis and interpretation of RNAi screen data. It uses a client-server architecture and integrates multiple types of genome-scale biological data. Different existing and novel algorithms for RNAi screen data analyses are implemented as back-end R modules. The results generated by these modules are visualized in the web browser, using JavaScript libraries, as interactive tables and figures. All the data and results are saved and can be securely shared between researchers.

8:40 Genome Wide Association Visual Analysis (GWAVA) Enabled by a Scalable Data Pipeline

Peter Henstock, Ph.D., Senior Principal Scientist, Research Business Technology Group, Pfizer, Inc.

Ami Khandeshi, Manager, Research Business Technology Group, Pfizer Inc.

GWAVA is our corporate standard software for interactively querying and visualizing genome-wide association studies (GWAS) data loaded into the tranSMART platform. Managing multiple genes and studies, it facilitates an understanding of the relationship between SNP p-values and study endpoints by providing and managing access to different views of the data. The engine that powers GWAVA consists of a set of interfaces implemented with the tranSMART core technology stack to facilitate flexible and scalable data access using various public and custom annotation sources. GWAVA has recently been released as open source software available through the tranSMART Foundation.


9:10 A Proven Platform for Diagnosis and Discovery Using Massive WGS and Phenotypic Data in Real Time

Jeffrey Gulcher, Ph.D., M.D., Co-Founder, President & CSO, NextCODE Health

NextCODE offers the world’s only road-tested solutions for rapidly analyzing population-scale whole-genome and health data to detect disease-causing mutations, enabling clinical diagnosis and the optimization of existing and new treatments. Built to mine 350,000 whole genomes, our genome-ordered relational (GOR) database architecture is uniquely powered to take full advantage of the new WGS technology. Its SDL can query phenotypic datasets to define cases and controls as well as drug response.

9:40 Coffee Break in the Exhibit Hall with Poster Viewing

10:30 The Path to Establishing a Global Data Analysis Infrastructure at AstraZeneca

Justin Johnson, Principal Translational Genomic Scientist, Oncology, AstraZeneca

NGS technologies are evolving, data volumes are growing, the data are subject to artifacts, and NGS is moving towards the clinic. It is imperative that we build tools and methods to translate NGS data into knowledge for target discovery, patient selection, and translational medicine through a flexible, scalable and secure infrastructure. The hybrid cloud / local solution and novel data warehousing strategies being built at AstraZeneca will allow for a global streamlined ability to analyze, store and interpret NGS data. Ultimately it will provide the ability to identify variation responsible for tumorigenesis and stratify the disease, as well as identify mechanisms of resistance and correlate preclinical drug response with genome data.

11:00 Data and Computational Requirements for Implementing the Department of Veterans Affairs Precision Oncology Program - A Partnership Between Clinical Care and Clinical Research

Louis Fiore, M.D., MPH, Executive Director, Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Department of Veterans Affairs; Associate Professor, Boston University School of Medicine, Boston University School of Public Health

Precision Oncology centers on genetic profiling of cancers to identify driver mutations that can be targeted by novel anti-cancer agents. Routine testing of patient tumor samples is rapidly becoming the standard of care. The artifacts of routine care, that is detailed cancer mutational status and longitudinal patient information, can be efficiently re-purposed for research use. Products of this effort can include discovery and validation of cancer biomarkers and creation of a cancer 'knowledgebase' and 'predictive engine' that could serve to inform clinicians about best possible patient treatments based on similar patients in the database. This presentation will discuss the pilot project within VA to create such an integrated system.

11:30 Enjoy Lunch on Your Own

1:00 pm Plenary Keynote Program 

Chas Bountra, Ph.D., Professor of Translational Medicine & Head, Structural Genomics Consortium, University of Oxford

Martin Tolar, M.D., Ph.D., Founder, President & CEO, Alzheon, Inc.

Andrew L. Hopkins, D.Phil, FRSC, FSB, Chair of Medicinal Informatics and SULSA Research Professor of Translational Biology, Division of Biological Chemistry and Drug Discovery, College of Life Sciences, University of Dundee

2:45 Refreshment Break in the Exhibit Hall with Poster Viewing

3:45 Close of Conference

Final Agenda Now Available


The exhibit hall was sold out in 2015, so please contact us early to reserve your place. To customize your sponsorship or exhibit package for 2016, contact:

Jon Stroup
Sr. Business Development Manager






Next-Generation Histone Deacetylase Inhibitors

Strategies for Tackling Rare Genetic Diseases

Understanding CRISPR: Mechanisms and Applications

Autoimmunity – Small Molecule Approaches

NK Cell-Based Cancer Immunotherapy



Targeting Histone Methyltransferases and Demethylases

Targeting the Ubiquitin Proteasome System

Targeting the Microbiome – Part 1

GPCR-Based Drug Discovery - Part 1

Advances in Gene Editing and Gene Silencing – Part 1

Gene Therapy Breakthroughs

Antibodies Against Membrane Protein Targets – Part 1

Targeting Cardio-Metabolic Diseases

Targeting Ocular Disorders


Targeting Epigenetic Readers and Chromatin Remodelers

Kinase Inhibitor Discovery

Targeting the Microbiome – Part 2

GPCR-Based Drug Discovery - Part 2

Advances in Gene Editing and Gene Silencing – Part 2

Translating Cancer Genomics

Antibodies Against Membrane Protein Targets – Part 2

Metabolomics in Drug Discovery

TRAINING SEMINAR: Data Visualization

Monday, September 19 | 8:00 - 11:00 am

(SC1) Immunology Basics for Chemists

(SC2) Designing Peptide Therapeutics for Specific PPIs

(SC3) Phenotypic Screening and Chemical Probe Development

(SC4) Medical Dermatology Therapeutic R&D and Technical Innovation - Part 1

Monday, September 19 | 12:00 - 3:00 pm

(SC5) GPCR Structure-Based Drug Discovery

(SC6) RNA as a Small Molecule Drug Target

(SC7) Using IP Landscape Studies to Improve Your Confidence

(SC8) Medical Dermatology Therapeutic R&D and Technical Innovation - Part 2

Monday, September 19 | 3:30 - 6:30 pm

(SC9) Targeting of GPCRs with Monoclonal Antibodies

(SC10) Introduction to Targeted Covalent Inhibitors

(SC11) Contact Lens Drug Delivery Systems

(SC12) Introduction to Gene Editing

Monday, September 19 | 7:00 - 9:30 pm

(SC13) Convergence of Immunotherapy and Epigenetics for Cancer Treatment

Wednesday, September 21 | 7:00 - 9:30 pm

(SC14) Cancer Metabolism: Pathways, Targets and Clinical Updates

(SC15) Introduction to Allosteric Modulators and Biased Ligands of GPCRs

(SC16) Functional Screening Strategies Using CRISPR and RNAi

(SC17) Challenges and Opportunities in DNA Methyl Transferase (DNMT) Inhibitors as Therapeutics