Data has shape and shape has meaning™
2
• Overview of IRIS from Ayasdi
• A tool for looking at large datasets and trying to find meaning
• Walking through an example of an Ayasdi analysis
Outline
3
• We are gathering more data all the time
What IRIS is for…
4
…and while data are often collected to address specific questions, the data
may also hold additional insights
5
[Figure labels: CD; +Stim, Ab; Baseline]
“There isn’t a single story happening in your complex data” – Anthony Bak, Ayasdi
• IRIS combines topological math with a highly flexible and intuitive interface to
analyze large datasets
• Creates different shapes that can be explored
• Ayasdi can be used on different kinds of high complexity datasets
• Transcriptome profiling
• Clinical data
• Flow cytometry data
• Financial data
• Text
• Etc.
That’s where we think IRIS from Ayasdi will help
6
• Concept is: data has shape based on how elements in the datasets are mathematically
related to each other
• For example, how are samples alike?
• IRIS takes the data, performs a mathematical transformation, and uses the output to
group samples together and draw a picture
• This is done iteratively with different mathematical transformations to give multiple
different views of the data’s shapes
• The shapes highlight possibly interesting parts of the dataset
• In our case, disease or patient subsets
How does IRIS work?
7
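The bullets above can be sketched in a few lines of code. This is only a rough illustration of the general idea (apply a transformation, group similar samples within overlapping slices of its output, link groups that share a sample). It is not Ayasdi's implementation, and the function names, the binning scheme, and the distance threshold are all assumptions made for the example.

```python
import math
from itertools import combinations


def cluster(points, members, eps):
    """Group member indices into clusters of points chained by distance <= eps."""
    remaining = set(members)
    groups = []
    while remaining:
        seed = remaining.pop()
        group, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            group |= near
            frontier.extend(near)
        groups.append(frozenset(group))
    return groups


def build_shape(points, transform, n_bins=4, overlap=0.5, eps=1.0):
    """Return (nodes, edges): nodes are sets of sample indices, edges link nodes sharing a sample."""
    values = [transform(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero-width bins when all values are equal
    nodes = []
    for b in range(n_bins):
        # Widen each bin so neighbouring bins overlap and can share samples.
        start = lo + b * width - overlap * width / 2
        end = lo + (b + 1) * width + overlap * width / 2
        members = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(cluster(points, members, eps))
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]}
    return nodes, edges


# Hypothetical toy data: two separated clumps of samples along the x axis.
samples = [(0, 0), (0.5, 0), (1, 0), (1.5, 0), (2, 0), (10, 0), (10.5, 0)]
nodes, edges = build_shape(samples, transform=lambda p: p[0])
print(nodes, edges)
```

Changing the transformation, the number of bins, or the overlap produces a different shape from the same samples, which is the sense in which one dataset yields several views.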
8
From Ayasdi
The problem of having a liberal arts education…
9
Platonic ideal
of chair
What an IRIS analysis looks like
10
3 different shapes
made from the
same data
Explaining the parts
11
Dots represent
groups of
samples that
are similar to
each other
Connecting lines
represent at
least one shared
member
between groups
Features like
this arm on the
shape can be
examined in
further detail
Coloring (red = high to blue = low) can be based on the initial math or on annotations (e.g., gender, disease), gene expression, etc.
• Groups and shapes are analyzed and interpreted
• We try to understand what underlies the shapes and forms that arise
• Link back to biology, patients, effect
• Learn new insights
• Create hypotheses, test on the fly
• Iterate
• Next several slides will be an example of an IRIS analysis and insights
How does an IRIS analysis proceed?
12
• Institute for Health Metrics and Evaluation (IHME)
• Performed survey of smoking prevalence worldwide, from 1980-2012
• 187 countries
• Dataset contains smoking frequency broken down by age, gender, year
• 518 columns, 187 rows
• Some reasons to look at this data:
• Practice: the IRIS workflow is pretty much the same for any dataset
• Using non-gene expression data
• Smoking is a risk factor for RA, diabetes, etc.
Example analysis: Smoking prevalence
13
These were derived from the IHME data
14
Thinking like an
analyst: what do
different parts of
shapes mean?
There’s a lot to
potentially explore
Start with this basic shape:
15
What are these
two groups?
Upper arm
Lower arm
Certain mathematical transformations often create this antibody shape in large
datasets
First step: define groups and do numerical and categorical comparison to
rest of shape
16
Lower arm categorical table
Column Name | Value | Percent in Group 1 | Percent in Both Group 1 and Group 2 | Count in Group 1 | Count in Both Group 1 and Group 2 | p-value
ISOsubregion | 35 | 0.27 | 0.06 | 6 | 11 | 4.23E-04
Developing | Yes | 1.00 | 0.73 | 22 | 137 | 6.48E-04
ISOsubregion | 14 | 0.27 | 0.09 | 6 | 17 | 0.006991494
Annualized Rate of Change (%) Male and Female 1980 to 2012 | -0.5 | 0.18 | 0.04 | 4 | 8 | 0.007475094
Annualized Rate of Change (%) Male and Female 1980 to 2012 | -0.7 | 0.18 | 0.05 | 4 | 10 | 0.019024382
ISOregion | 2 | 0.45 | 0.27 | 10 | 50 | 0.035708684
Bangladesh
Burkina Faso
Burundi
Cambodia
Djibouti
Federated States of Micronesia
Ghana
Guinea-Bissau
Indonesia
Jamaica
Laos
Malawi
Maldives
Myanmar
Namibia
Paraguay
Philippines
Rwanda
Somalia
Sri Lanka
Thailand
Zimbabwe
Southeastern Asia
Eastern Africa
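The slides do not say how the p-values in the categorical table are computed. One plausible choice, assumed here purely for illustration, is a one-sided hypergeometric enrichment test: given how many of the 187 countries carry a value, how surprising is it to see at least this many of them among the 22 countries in the lower arm? The function name `enrichment_p` is hypothetical.

```python
from math import comb


def enrichment_p(total, with_value, group_size, in_group):
    """P(at least `in_group` of the group carry the value) under random draws without replacement."""
    upper = min(with_value, group_size)
    tail = sum(
        comb(with_value, k) * comb(total - with_value, group_size - k)
        for k in range(in_group, upper + 1)
    )
    return tail / comb(total, group_size)


# Counts read from the table: 187 countries overall, 22 in the lower arm.
p_developing = enrichment_p(total=187, with_value=137, group_size=22, in_group=22)
p_subregion_35 = enrichment_p(total=187, with_value=11, group_size=22, in_group=6)
print(f"Developing = Yes: {p_developing:.2e}")
print(f"ISOsubregion 35:  {p_subregion_35:.2e}")
```

Comparing the printed values with the table's p-value column is a quick way to see whether this assumed test is close to whatever the tool reports.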
Highlighting lower arm countries on a map
17
Some
geographical
clustering
Now looking at numerical annotations
18
Column Name KS Statistic KS p-value T-test p-value Group 1 Mean - Group 2 Mean KS Sign
Smoking Prevalence (%) Age 80+ 1997 0.62 4.83578E-07 3.79979E-05 6.960909091 +
Smoking Prevalence (%) Age 80+ 2000 0.62 4.83578E-07 2.55956E-05 7.112424242 +
Smoking Prevalence (%) Age 80+ 1999 0.62 6.72238E-07 2.9015E-05 7.072121212 +
Smoking Prevalence (%) Age 80+ 2001 0.62 6.72238E-07 2.5208E-05 7.133030303 +
Smoking Prevalence (%) Age 80+ 2002 0.62 6.72238E-07 2.38392E-05 7.140909091 +
Smoking Prevalence (%) Age 80+ 1996 0.61 9.31143E-07 4.89306E-05 6.880909091 +
Smoking Prevalence (%) Age 80+ 1998 0.61 9.31143E-07 3.31192E-05 7.008787879 +
Smoking Prevalence (%) Age 80+ 2003 0.61 9.31143E-07 2.36669E-05 7.144242424 +
Smoking Prevalence (%) Age 80+ 1995 0.58 3.66511E-06 5.92711E-05 6.813030303 +
Smoking Prevalence (%) Age 80+ 2004 0.58 4.98014E-06 2.33953E-05 7.080606061 +
Smoking Prevalence (%) Age 75 2004 0.57 5.51162E-06 1.50199E-05 7.676363636 +
Smoking Prevalence (%) Age 75 2008 0.57 5.51162E-06 2.02097E-05 7.436666667 +
Smoking Prevalence (%) Age 75 2009 0.57 5.51162E-06 2.04579E-05 7.365151515 +
Smoking Prevalence (%) Age 75 2011 0.57 6.09737E-06 2.0317E-05 7.224545455 +
Smoking Prevalence (%) Age 75 2012 0.57 6.09737E-06 1.89945E-05 7.184242424 +
Smoking Prevalence (%) Age 80+ 2005 0.57 6.09737E-06 2.25215E-05 7.026363636 +
Smoking Prevalence (%) Age 75 2003 0.57 7.45331E-06 1.28236E-05 7.777878788 +
Smoking Prevalence (%) Age 75 2005 0.57 7.45331E-06 1.61689E-05 7.576666667 +
Smoking Prevalence (%) Age 75 2006 0.57 7.45331E-06 1.84185E-05 7.536969697 +
Smoking Prevalence (%) Age 75 2007 0.57 7.45331E-06 1.94395E-05 7.496666667 +
Smoking Prevalence (%) Age 75 2010 0.57 7.45331E-06 2.08264E-05 7.294848485 +
Smoking Prevalence (%) Age 80+ 2012 0.57 7.45331E-06 3.11246E-05 6.652121212 +
Smoking Prevalence (%) Age 80+ 1994 0.56 8.23553E-06 6.50367E-05 6.795151515 +
Smoking Prevalence (%) Age 80+ 2007 0.56 8.23553E-06 2.66895E-05 6.890909091 +
Smoking Prevalence (%) Age 75 2002 0.56 1.00428E-05 1.19239E-05 7.858484848 +
Smoking Prevalence (%) Age 80+ 2011 0.56 1.00428E-05 3.17879E-05 6.670606061 +
Smoking Prevalence (%) Age 80+ 2006 0.56 1.10835E-05 2.3874E-05 6.958181818 +
Smoking Prevalence (%) Age 80+ 2010 0.55 1.22271E-05 3.14422E-05 6.696666667 +
Ranking by one of IRIS's built-in statistics quickly shows that the top-ranked data columns largely reflect smoking prevalence among the elderly
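For readers unfamiliar with the KS statistic in the table: the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical cumulative distributions of two groups, so a value near 1 means the groups barely overlap. A minimal sketch follows, using made-up prevalence values rather than the IHME data. It shows only the statistic and the mean difference, not the p-values, and it is not a claim about how IRIS computes its columns.

```python
from bisect import bisect_right
from statistics import mean


def ks_statistic(group_1, group_2):
    """Largest absolute gap between the two empirical cumulative distributions."""
    a, b = sorted(group_1), sorted(group_2)
    gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)  # share of group 1 at or below x
        cdf_b = bisect_right(b, x) / len(b)  # share of group 2 at or below x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical smoking prevalence values (%), for illustration only.
group_1 = [12.0, 15.5, 18.0, 21.0]
group_2 = [4.0, 6.5, 9.0, 13.0]

ks = ks_statistic(group_1, group_2)
mean_gap = mean(group_1) - mean(group_2)
print(f"KS statistic: {ks:.2f}, Group 1 Mean - Group 2 Mean: {mean_gap:.2f}")
```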
Pick a few years for the 80+ smoking prevalence to graph boxplots
19
Okay, so confirming
insights: we’re looking
at a subset of countries
that have a high rate of
smoking in the elderly.
Note that Upper Arm
group has a
substantially lower rate
Other countries
have high rates in
the elderly; and
within the lower
arm group, some
have relatively
low rates
So we’ve found a
subpopulation
But that’s not the whole story
20
Country | Lower arm group | Smoking Prevalence (%) Age 80+ 2000 | Country | Lower arm group | Smoking Prevalence (%) Age 80+ 2000
Pakistan | no | 34 | Laos | yes | 29.4
Tonga | no | 25.2 | Myanmar | yes | 26.4
Kiribati | no | 24.4 | Namibia | yes | 23.3
Nepal | no | 23.8 | Bangladesh | yes | 21.8
Lebanon | no | 22.2 | Cambodia | yes | 20
Timor-Leste | no | 18.8 | Indonesia | yes | 18.1
Denmark | no | 17.1 | Federated States of Micronesia | yes | 17.6
Tunisia | no | 16.4 | Philippines | yes | 15.8
Jordan | no | 16.2 | Paraguay | yes | 14.5
Lesotho | no | 15.9 | Malawi | yes | 14.4
South Korea | no | 15.9 | Djibouti | yes | 14.3
Malaysia | no | 15.8 | Zimbabwe | yes | 13.7
Dominican Republic | no | 15 | Thailand | yes | 13
Vanuatu | no | 14.5 | Maldives | yes | 12.5
Palestine | no | 14.2 | Sri Lanka | yes | 11.2
Vietnam | no | 13.9 | Burkina Faso | yes | 11
Cyprus | no | 13.7 | Burundi | yes | 9.7
Samoa | no | 13.6 | Rwanda | yes | 8.7
Albania | no | 13.4 | Somalia | yes | 8.5
Mongolia | no | 13.1 | Ghana | yes | 7.9
South Africa | no | 13.1 | Jamaica | yes | 7.6
China | no | 13 | Guinea-Bissau | yes | 7.5
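The point of the table can be checked directly from its numbers. The sketch below uses only the values listed above (the "no" column covers just the countries printed on the slide, not every country outside the lower arm). It counts how many listed non-members exceed the lower arm median, and how many lower arm members fall below every listed non-member. Variable names are illustrative.

```python
from statistics import median

# Smoking Prevalence (%) Age 80+ 2000, as listed in the table above.
lower_arm = [29.4, 26.4, 23.3, 21.8, 20, 18.1, 17.6, 15.8, 14.5, 14.4, 14.3,
             13.7, 13, 12.5, 11.2, 11, 9.7, 8.7, 8.5, 7.9, 7.6, 7.5]
listed_others = [34, 25.2, 24.4, 23.8, 22.2, 18.8, 17.1, 16.4, 16.2, 15.9, 15.9,
                 15.8, 15, 14.5, 14.2, 13.9, 13.7, 13.6, 13.4, 13.1, 13.1, 13]

lower_arm_median = median(lower_arm)
others_above_median = sum(1 for v in listed_others if v > lower_arm_median)
members_below_all_listed = sum(1 for v in lower_arm if v < min(listed_others))

print(f"lower arm median: {lower_arm_median:.1f}")
print(f"listed non-members above it: {others_above_median}")
print(f"lower arm members below every listed non-member: {members_below_all_listed}")
```

High elderly smoking alone therefore does not define the group, which is why the following slides look for other characteristics.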
• Many directions to go here
• In IRIS
• persistence of group
• Co-occurrence with other annotations beyond “developing”
• Outside of IRIS
• Once you know a subgroup exists, statistical analyses
• Visualization techniques such as heatmaps
What are the characteristics that define that subpopulation?
21
Persistence (or not) of subgroup integrity across shapes and analyses
22
From this we can go back to
the mathematical
transformations used to
make each set of shapes
and find clues to what is
driving this group to stay
together in some shapes
but not others
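One way to put a number on "staying together" is sketched below. It is a hypothetical measure, not an IRIS feature: for each shape, find the connected pieces of the network and report the largest share of the subgroup that lands in a single piece. Shapes are assumed to be given as a list of node member sets plus a set of edges between node indices.

```python
def connected_pieces(n_nodes, edges):
    """Return the connected components of the node graph as sets of node indices."""
    neighbours = {i: set() for i in range(n_nodes)}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, pieces = set(), []
    for start in range(n_nodes):
        if start in seen:
            continue
        piece, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in piece:
                continue
            piece.add(node)
            stack.extend(neighbours[node] - piece)
        seen |= piece
        pieces.append(piece)
    return pieces


def cohesion(subgroup, nodes, edges):
    """Largest fraction of the subgroup found within one connected piece of the shape."""
    best = 0.0
    for piece in connected_pieces(len(nodes), edges):
        members = set().union(*(nodes[i] for i in piece))
        best = max(best, len(subgroup & members) / len(subgroup))
    return best


# Hypothetical shapes built from the same five samples.
subgroup = {1, 2, 3}
shape_a = ([{1, 2}, {2, 3}, {4, 5}], {(0, 1)})  # subgroup sits in one arm
shape_b = ([{1, 4}, {2, 5}, {3}], set())        # subgroup is scattered

score_a = cohesion(subgroup, *shape_a)
score_b = cohesion(subgroup, *shape_b)
print(f"shape A: {score_a:.2f}, shape B: {score_b:.2f}")
```

A score that stays high under some transformations and drops under others points to the transformations worth revisiting.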
Overlay of different kinds of information
23
Comparison of developing
country status suggests
two groups we could
compare to look for
additional insights
Annualized rate of change
between 1980-1996 is
another annotation we
could look into more
Developing = no
Developing = yes
Population
Ann rate of change 1980-96
Comparing the two developing world enriched groups
24
• Found differences between older age smoking prevalence—lower arm group
has higher rate
• We already knew that
• Also found differences in 10yr old smoking prevalence—lower arm group has
lower rate
• We didn’t know that…
10 year old smoking prevalence
25
Panels: 1980, 1990, 2000, 2010
Smoking in kids is consistently low in the lower arm group. This suggests a possible focus for public health intervention in these countries. Need to confirm the pattern and, if it holds, look at when the transition from non-smoking to smoking happens.
Looking more closely at Annualized rate of change
26
Panels: Ann rate of change 1980-96; Ann rate of change 1996-2006; Ann rate of change 2006-2012; Ann rate of change 1980-2012
Suggestion that the lower arm group had a relatively smaller decrease in overall smoking rates in the 80s and 90s, but that its rate of decrease began to pick up in the 2000s, relative to other countries.
From a public health standpoint, now go back and ask what kinds of smoking cessation interventions were put in place in the 2000s.
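The slides do not define "annualized rate of change". A common definition, assumed here, is the constant yearly percentage change that would carry the starting prevalence to the ending prevalence over the period; IHME's own calculation may differ. The function name and the example numbers are illustrative.

```python
def annualized_rate_of_change(start_value, end_value, years):
    """Constant yearly percent change that turns start_value into end_value over `years`."""
    if start_value <= 0 or end_value < 0 or years <= 0:
        raise ValueError("need a positive start value, non-negative end value, and positive span")
    return ((end_value / start_value) ** (1 / years) - 1) * 100


# Hypothetical prevalence values (%), not taken from the IHME data.
rate_1980_1996 = annualized_rate_of_change(30.0, 28.0, years=16)
rate_2006_2012 = annualized_rate_of_change(26.0, 23.0, years=6)
print(f"1980-96: {rate_1980_1996:.2f}% per year")
print(f"2006-12: {rate_2006_2012:.2f}% per year")
```

A more negative value in the later window than in the earlier one is the pattern the slide describes as the rate of decrease picking up.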

Ayasdi with IHME data


Editor's Notes

  1. Flow and other immunological data, genomic and transcriptomic data, medical and clinical data, personal monitoring data
  2. RNAseq experiment; primary goal to identify
  3. This made me think of the analogy of platonic ideals and real-world reflections. Big data is like the ideal, full of all kinds of meaning. Each different take comes from the same ideal and gives its own perspective on the underlying structure.
  4. IHME is over on 5th Avenue
  5. I’ll be using the very technical terms, “Lower arm” and “Upper arm.”
  6. Here are some initial potential insights. Equatorial countries, cluster in SE Asia, some other in Africa, developing countries.
  7. Blue arrows denote developed, as opposed to developing, nations. Make the point that ahead of time, is it likely someone would have selected this group as being different from other developing nations with high smoking rates in the elderly?
  8. Analogous to finding disease subsets: find patterns that you might not have automatically assumed were there.