1. Big Data in Medicine: Prize4Life ALS Prediction Challenge
The DREAM-Phil Bowen
ALS Prediction Prize
Bruce Toman
Cornell University, on behalf of
Prize4Life and
2. Medicine
• Technology = applied science
• IT
– Semiconductor = applied physics
– Encryption algorithms = applied math
• Medicine
– applied biology… but…
http://www.youtube.com/watch?v=RrS2uROUjK4&feature=BFa&list=PL757C8E14C021A1BF
3. Big Data in Medicine
• Clinical trial vs basic science data
• Basic science: gene sequences, protein
structures, small molecule structures, etc
• Clinical data
– Patient data
• Corporate owned: EHRs- huge privacy issues
• New “open source” databases – 23andMe,
PatientsLikeMe, Sage Bionetworks
– Clinical trial data
• Corporate owned
• Consortia: CAMD Alzheimer's disease database, and…
4. The Pooled Resource Open-Access ALS
Clinical Trials (PRO-ACT) Project
Collaboration with the Northeast ALS Consortium (NEALS) and the Neurology
Clinical Trials Unit at Massachusetts General Hospital
Will have > 8000 de-identified patient data records from 13 completed public
and industry clinical trials - Largest merged ALS patient record data set ever
Invaluable resource to help address questions around ALS natural history, trial
design, patient stratification, and biomarkers
Will be made available in December 2012
5. What is ALS
(amyotrophic lateral sclerosis, or Lou Gehrig’s Disease)?
Lou Gehrig (died within 2 years of diagnosis)
Stephen Hawking (has lived with the disease for over 40 years)
6. Challenges for ALS Therapy Development
The only existing FDA-approved treatment, approved in
1995, prolongs life by only 2-3 months
Orphan disease - 30k in the US (vs. >25 million diabetes
patients)
Challenge of ALS clinical trials
Disease progression varies, so need many patients for
statistical power
Limited number of patients available, so takes a long
time
High expense of these trials drives companies away
7. Prize4Life (P4L) Aims to Accelerate
Therapy Development for ALS
P4L is a results-oriented non-profit
organization focused exclusively on ALS
Founded in 2006 by Avichai Kremer and Harvard Business School colleagues when he
was diagnosed at age 29.
(Photo caption: Avi, ~9 mo after diagnosis)
P4L adopted the Incentive Prize Model
Focus research on unmet needs in ALS
Lower risk for therapy-development companies
Bring new ideas and new minds into the field
Winner of the 2011 PM award for Entrepreneurship and Innovation
(Photo caption: ~7 years after diagnosis)
8. Incentive Prizes
Accelerate Scientific Innovation
1927: The Orteig Prize revolutionized modern aviation (Charles Lindbergh)
2004: The Ansari X-Prize revolutionized personal space flight (Spaceship One)
2005: The DARPA Grand Challenge - robotics
2012: The Archon Genomics X Prize: Energizing personal medicine
9. The DREAM-Phil Bowen ALS Prediction Prize
$25,000 for a Predictive Algorithm of ALS Disease Progression
July 15 - October 15, 2012
www.innocentive.com/ar/challenge/9933047
10. The Prize has Launched!
Winner gets $25,000
Speaking invitation and travel expenses to the
DREAM7 conference (Nov 12-15, San
Francisco) for one member of each of the three
best-performing teams.
Make an impact for ALS patients
reducing clinical trial costs => more drugs tested!
Solve an intriguing challenge
11. The challenge details
• The training data contains 900 patients tracked for 12 months or more
• Each patient has between 100 and ~200 datapoints; ~135,000 datapoints in total.
• Goal: Based on the information from the first 3 months, predict the progression of
the disease over the next 9 months
• Progression: a change in a scale called ALSFRS (ALS functional rating scale) ranging
from 0-40
– 10 questions with a score of 0-4 for each one (4 being normal, 0 being complete inability)
– Predict (ALSFRS(12) - ALSFRS(3)) / number of months (9)
• Data: demographics, family history, medications, symptoms, lab results (blood and
urine), all available over a year's time
• Data is spotty. This is the reality of clinical trials, and one of the great things about
this challenge, but your algorithm needs to deal with it.
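The "spotty data" point can be made concrete with a minimal sketch of one common way to cope with missing values: filling each gap with the median of the values that were observed for that feature. This is only an illustration, not the method any team used; the challenge asked for R, Python is used here for readability, and the function name `impute_median`, the feature names, and all values are hypothetical.

```python
from statistics import median


def impute_median(rows):
    """Fill missing (None or absent) feature values with the per-feature median."""
    # Collect every feature name that appears in at least one patient record.
    features = sorted({name for row in rows for name in row})
    medians = {}
    for name in features:
        observed = [row[name] for row in rows if row.get(name) is not None]
        # A feature nobody reported falls back to 0.0 (an arbitrary choice).
        medians[name] = median(observed) if observed else 0.0
    return [
        {
            name: row[name] if row.get(name) is not None else medians[name]
            for name in features
        }
        for row in rows
    ]


# Hypothetical records summarizing the first 3 months; some values are missing.
patients = [
    {"age": 55, "creatinine": 70.0},
    {"age": 62, "creatinine": None},
    {"age": 48},
    {"age": None, "creatinine": 90.0},
]
filled = impute_median(patients)
```

Median imputation is deliberately simple; whether it is a sensible choice depends on why a value is missing, which a real submission would have to examine.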
12. Submitting Your Solution
July 15 - October 1:
Develop your code with a training set
Test your code on a leaderboard set that you cannot see. Innocentive
provides your score and ranks you on a leaderboard
On October 1: solvers obtain the leaderboard set to further
improve their algorithms
By October 15: Submit final code in R and a description
Algorithms will be evaluated with a validation set.
A winner will be announced in early November.
Judges include Merit Cudkowicz, MD, Director, Neurology Clinical Trials Unit,
MGH; Gustavo Stolovitzky, PhD, IBM Computational Biology Center,
co-founder of DREAM
13. The Winner!
The solver whose algorithm has the smallest RMSD and
who makes a complete submission will win!
The solver is required to publish their algorithm
description (alone or in coordination with Prize4Life)
Solver retains all intellectual property rights in the
code.
I’d like to tell you about a prize launched for the development of a computational approach to predict disease progression in ALS (Lou Gehrig's disease). This is a devastating disease in which all motor functions are lost while the mind remains unaffected, and there are currently no effective treatments. The ultimate goal of this prize is to accelerate the discovery of a treatment by improving ALS clinical trials.
Cochrane: http://www.cochrane.org/about-us/our-policies/support-free-access-to-all-data-from-all-clinical-trials
The C-Path Online Data Repository (CODR) currently houses the integrated Coalition Against Major Diseases (CAMD) Alzheimer's disease database, which contains data on over 6,000 subjects from more than 20 studies of Alzheimer's disease and Mild Cognitive Impairment. Additionally, a CODR database for Parkinson's disease has been constructed and is ready to receive SDTM-standardized clinical trials data.
The challenge is based on a database of patient information, the PRO-ACT database. The PRO-ACT database contains 7,500 patient records from industry and academic clinical trials. Once the challenge is over, we will publish the database for everyone to use.
In ALS, the motor neurons in the brain and spinal cord that control movement progressively degenerate and die, leading to rapid, progressive paralysis and difficulty speaking and swallowing, while the mind remains fully aware to the end. Death usually comes within 2-5 years of diagnosis (usually from respiratory failure), but a few patients (5-10%) survive much longer; the famous physicist Stephen Hawking has survived for almost 50 years. Gehrig was a fast progressor, while Hawking is a slow progressor. Hawking is an outlier at 49 years; the next longest was 21 years, then 17. Really, a slow progressor means 10 years. There is currently no way to predict how the disease will progress in each patient, and the extreme variability makes clinical trials very hard.
ALS affects men and women, primarily between the ages of 40 and 70, but it can strike at any adult age, and it has no racial, ethnic or socioeconomic boundaries. The cause is unknown. The majority of patients (90%) have no family history of ALS (sporadic disease); the remaining 10% have a close family member with ALS, which is referred to as familial ALS (FALS). There is a wealth of information on the physiology of the disease (what is happening to the body and why), but the primary trigger is not understood.
By the time symptoms emerge, a lot of damage has usually already been done: maybe 30% of the motor neurons are dead. Initial symptoms are generally weakness of some kind in one body part, usually an arm or leg. It really is weakness; it is not as though you wake up paralyzed. Initial progression is generally slow over the first year. That time is usually spent with your GP, trying to figure out why you have this complaint. You usually reach the neurologist because the weakness extends to a second body part. In a few cases it is “bulbar onset”: it hits the throat first and you have trouble speaking. So the initial slope of progression is slow, then it becomes very fast, then it becomes slow again at the end.
ALS affects 30K people in the US and 600K worldwide. The high cost of clinical trials limits drug companies’ ability to test potential treatments. Researchers must recruit 100-200 patients and run trials that last as long as 2 years just to eliminate a drug from the running. Costs can be reduced by:
• Strengthening the package on the preclinical side of the gap
• Decreasing the cost on the clinical side of the gap
• Increasing the probability of success (lowering risk) on the clinical side of the gap
Prize4Life was established to overcome these challenges, lower the risk for companies, and accelerate therapy development for ALS.
We are offering a $25k prize for a predictive algorithm of ALS disease progression, based on a large database of clinical trial data that we have recently compiled and that will be open to the public in Dec 2012.
DREAM (Dialogue for Reverse Engineering Assessments and Methods) is a leading force in bioinformatics and systems biology data challenges. It has run 4-5 challenges annually for the last 7 years and has produced a wealth of publications related to the reverse engineering of cellular networks. Its organizers are evangelists for data sharing and collaboration. Importantly, they also aim to study the value of collaboration and of combining different approaches, and how this often leads to a breakthrough in computational ability, thus developing tools for future collaborations.
Finer details, part one: What you will need to do is devise an algorithm. You will have 900 patients to train on. You will need to use information from the first 3 months of data to predict the progression over the next 9 months. Progression is measured as a change in a neurological scale called ALSFRS (ALS functional rating scale). ALSFRS consists of 10 questions assessing everyday activities, such as strength of hand grip, ability to walk, etc. Each question gets a score of 0-4, with 4 being normal and 0 being completely dysfunctional. The total ranges from 0 to 40, with a lower number indicating worse performance. The slope of change is the value at month 12, minus the value at month 3, divided by 9. This is what you need to predict. What do you have to predict with? Demographics, family history, medications, symptoms, and lab results (blood and urine), all available over a year's time.
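The target described in these notes can be sketched in a few lines of Python. This is only an illustration of the arithmetic under the definitions given here (10 questions scored 0-4, slope taken between month 3 and month 12); the function names and the answer values are hypothetical, and the challenge itself asked for submissions in R.

```python
def alsfrs_total(answers):
    """Sum the 10 question scores (each 0-4) into a total between 0 and 40."""
    if len(answers) != 10 or any(not 0 <= a <= 4 for a in answers):
        raise ValueError("expected 10 scores, each between 0 and 4")
    return sum(answers)


def alsfrs_slope(score_month_3, score_month_12):
    """Change in ALSFRS per month between month 3 and month 12 (9 months)."""
    return (score_month_12 - score_month_3) / 9


# Hypothetical answers for one patient at the two time points.
month_3 = alsfrs_total([4, 4, 3, 3, 4, 3, 3, 4, 4, 4])   # 36
month_12 = alsfrs_total([3, 3, 2, 2, 3, 2, 2, 3, 3, 4])  # 27
slope = alsfrs_slope(month_3, month_12)                  # -1.0 points per month
```

A negative slope means the patient lost function over the 9 months; a steeper negative slope means faster progression.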
RMSD: root mean square deviation (the deviation between the solver's predicted slope and the real slope).
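As a minimal sketch of that score, assuming one predicted slope and one real slope per patient, RMSD is the square root of the mean squared difference between the two. The function name `rmsd` and the numbers below are hypothetical; this is not the organizers' scoring code.

```python
import math


def rmsd(predicted, actual):
    """Root mean square deviation between predicted and real slopes."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical slopes (ALSFRS points per month) for three patients.
predicted = [-1.0, -0.5, 0.0]
actual = [-1.0, -1.5, 1.0]
score = rmsd(predicted, actual)  # sqrt((0 + 1 + 1) / 3)
```

Because the errors are squared before averaging, a few badly mispredicted patients raise the score more than many small misses do.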