Ayasdi with IHME data

Data has shape and shape has meaning™

Outline
• Overview of IRIS from Ayasdi
  • A tool for looking at large datasets and trying to find meaning
• Walking through an example of an Ayasdi analysis

What IRIS is for…
• We are gathering more data all the time

…and while data are often collected to address specific questions, the data
may also hold additional insights
[Figure panel labels: CD; +Stim, Ab; Baseline]
“There isn’t a single story happening in your complex data” – Anthony Bak, Ayasdi

That’s where we think IRIS from Ayasdi will help
• IRIS combines topological math with a highly flexible and intuitive interface to analyze large datasets
• Creates different shapes that can be explored
• Ayasdi can be used on different kinds of high complexity datasets
  • Transcriptome profiling
  • Clinical data
  • Flow cytometry data
  • Financial data
  • Text
  • Etc.

How does IRIS work?
• The concept: data has shape based on how elements in the dataset are mathematically related to each other
  • For example, how are samples alike?
• IRIS takes the data, performs a mathematical transformation, and uses the output to group samples together and draw a picture
• This is done iteratively with different mathematical transformations to give multiple different views of the data’s shapes
• The shapes highlight possibly interesting parts of the dataset
  • In our case, disease or patient subsets
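The slides do not spell out IRIS's internal algorithm. As a rough illustration of the idea (transform the data, group similar samples, draw a picture), here is a toy version of the Mapper construction from topological data analysis. Everything in it is an assumption made for illustration: the function names, the choice of lens, the interval count, the overlap, and the distance cutoff `eps`.

```python
from itertools import combinations
from math import dist


def single_linkage(points, members, eps):
    """Split `members` (point indices) into clusters of points chained within `eps`."""
    clusters = []
    unvisited = set(members)
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            cur = frontier.pop()
            near = {j for j in unvisited if dist(points[cur], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, lens, n_intervals=4, overlap=0.5, eps=1.5):
    """Toy Mapper: nodes are clusters of samples, edges mean shared samples."""
    values = [lens(p) for p in points]          # the "mathematical transformation"
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0      # fall back if all values are equal
    nodes = []
    for i in range(n_intervals):
        # Overlapping intervals let one sample land in neighbouring bins.
        start = lo + i * width - overlap * width
        end = lo + (i + 1) * width + overlap * width
        members = [idx for idx, v in enumerate(values) if start <= v <= end]
        nodes.extend(single_linkage(points, members, eps))
    # Connect two nodes when they share at least one sample.
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges


if __name__ == "__main__":
    samples = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
    nodes, edges = mapper_graph(samples, lens=lambda p: p[0])
    print(nodes, edges)
```

Swapping in a different `lens` function is the toy analogue of applying a different mathematical transformation to get another view of the same data.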

The problem of having a liberal arts education…
Platonic ideal of chair

What an IRIS analysis looks like
3 different shapes made from the same data

Explaining the parts
• Dots represent groups of samples that are similar to each other
• Connecting lines represent at least one shared member between groups
• Features like this arm on the shape can be examined in further detail
• Coloring (red = high to blue = low) can be based on initial math or annotations (e.g., gender, disease), gene expression, etc.

How does an IRIS analysis proceed?
• Groups and shapes are analyzed and interpreted
• We try to understand what underlies the shapes and forms that arise
  • Link back to biology, patients, effect
  • Learn new insights
  • Create hypotheses, test on the fly
  • Iterate
• Next several slides will be an example of an IRIS analysis and insights

Example analysis: Smoking prevalence
• Institute for Health Metrics and Evaluation (IHME)
  • Performed a survey of smoking prevalence worldwide, from 1980 to 2012
  • 187 countries
• Dataset contains smoking frequency broken down by age, gender, year
  • 518 columns, 187 rows
• Some reasons to look at this data:
  • Practice: the IRIS workflow is pretty much the same for any dataset
  • Using non-gene expression data
  • Smoking is a risk factor for RA, diabetes, etc.

These were derived from the IHME data
Thinking like an analyst: what do different parts of shapes mean?
There’s a lot to potentially explore

Start with this basic shape:
What are these two groups?
[Figure labels: Upper arm; Lower arm]
Certain mathematical transformations often create this antibody shape in large datasets

First step: define groups and do numerical and categorical comparison to rest of shape
Lower arm categorical table
Column Name | Value | Percent in Group 1 | Percent in Both Group 1 and Group 2 | Count in Group 1 | Count in Both Group 1 and Group 2 | p-value
ISOsubregion | 35 | 0.27 | 0.06 | 6 | 11 | 4.23E-04
Developing | Yes | 1.00 | 0.73 | 22 | 137 | 6.48E-04
ISOsubregion | 14 | 0.27 | 0.09 | 6 | 17 | 0.006991494
Annualized Rate of Change (%) Male and Female 1980 to 2012 | -0.5 | 0.18 | 0.04 | 4 | 8 | 0.007475094
Annualized Rate of Change (%) Male and Female 1980 to 2012 | -0.7 | 0.18 | 0.05 | 4 | 10 | 0.019024382
ISOregion | 2 | 0.45 | 0.27 | 10 | 50 | 0.035708684
Bangladesh
Burkina Faso
Burundi
Cambodia
Djibouti
Federated States of Micronesia
Ghana
Guinea-Bissau
Indonesia
Jamaica
Laos
Malawi
Maldives
Myanmar
Namibia
Paraguay
Philippines
Rwanda
Somalia
Sri Lanka
Thailand
Zimbabwe
[Callouts on table rows: ISOsubregion 35 = Southeastern Asia; ISOsubregion 14 = Eastern Africa]
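The slides do not say which test IRIS uses for the categorical p-values. One common choice for this kind of "is this value over-represented in the group?" question is a one-sided hypergeometric test, sketched below as an assumption. The function name is made up for illustration. The counts in the usage lines come from the table (6 of the 22 lower arm countries have ISOsubregion 35, against 11 countries overall) and from the dataset size of 187 countries.

```python
from math import comb


def hypergeom_enrichment_p(k, group_size, total_with_value, total):
    """One-sided p-value: chance of seeing >= k carriers of a value in the group.

    Models the group as `group_size` draws without replacement from `total`
    items, of which `total_with_value` carry the value of interest.
    """
    upper = min(group_size, total_with_value)
    denom = comb(total, group_size)
    tail = sum(
        comb(total_with_value, i) * comb(total - total_with_value, group_size - i)
        for i in range(k, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # ISOsubregion 35: 6 of 22 lower arm countries, 11 of 187 countries overall.
    p = hypergeom_enrichment_p(k=6, group_size=22, total_with_value=11, total=187)
    print(f"enrichment p-value under the hypergeometric assumption: {p:.3g}")
```

The result need not match the p-value reported by IRIS, since the software may use a different test or correction.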

Highlighting lower arm countries on a map
Some geographical clustering

Now looking at numerical annotations
Column Name | KS Statistic | KS p-value | T-test p-value | Group 1 Mean - Group 2 Mean | KS Sign
Smoking Prevalence (%) Age 80+ 1997 | 0.62 | 4.83578E-07 | 3.79979E-05 | 6.960909091 | +
Smoking Prevalence (%) Age 75 2004 | 0.57 | 5.51162E-06 | 1.50199E-05 | 7.676363636 | +
Ranking by one of IRIS’s built-in statistics quickly shows that the top-ranked data columns largely reflect smoking prevalence among the elderly
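For readers unfamiliar with the KS statistic in the table: the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical cumulative distributions of the two groups. The sketch below computes it, together with the difference in means, using only the standard library. The function names and the numbers in the usage lines are made up for illustration and are not taken from the IHME data. How IRIS derives its "KS Sign" column is not stated in the slides.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: max distance between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    best = 0.0
    # The gap between step functions can only change at observed values.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        best = max(best, abs(cdf_a - cdf_b))
    return best


def mean_difference(a, b):
    """Group 1 mean minus Group 2 mean."""
    return sum(a) / len(a) - sum(b) / len(b)


if __name__ == "__main__":
    # Hypothetical prevalence values (%), for illustration only.
    group_1 = [21.0, 18.5, 25.2, 14.9, 19.7]
    group_2 = [9.1, 12.4, 7.8, 15.3, 10.0, 11.6]
    print(ks_statistic(group_1, group_2), mean_difference(group_1, group_2))
```

A statistic near 1 means the two groups barely overlap on that column. A statistic near 0 means their distributions are nearly identical.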

Pick a few years for the 80+ smoking prevalence to graph boxplots
Okay, so confirming insights: we’re looking at a subset of countries that have a high rate of smoking in the elderly.
Note that Upper Arm group has a substantially lower rate

So we’ve found a subpopulation
But that’s not the whole story
Other countries have high rates in the elderly; and within the lower arm group, some have relatively low rates
Country | Lower arm group | Smoking Prevalence (%) Age 80+ 2000 | Country | Lower arm group | Smoking Prevalence (%) Age 80+ 2000
Pakistan | no | 34 | Laos | yes | 29.4
Tonga | no | 25.2 | Myanmar | yes | 26.4
Kiribati | no | 24.4 | Namibia | yes | 23.3
Nepal | no | 23.8 | Bangladesh | yes | 21.8
Lebanon | no | 22.2 | Cambodia | yes | 20
Timor-Leste | no | 18.8 | Indonesia | yes | 18.1
Denmark | no | 17.1 | Federated States of Micronesia | yes | 17.6
Tunisia | no | 16.4 | Philippines | yes | 15.8
Jordan | no | 16.2 | Paraguay | yes | 14.5
Lesotho | no | 15.9 | Malawi | yes | 14.4
South Korea | no | 15.9 | Djibouti | yes | 14.3
Malaysia | no | 15.8 | Zimbabwe | yes | 13.7
Dominican Republic | no | 15 | Thailand | yes | 13
Vanuatu | no | 14.5 | Maldives | yes | 12.5
Palestine | no | 14.2 | Sri Lanka | yes | 11.2
Vietnam | no | 13.9 | Burkina Faso | yes | 11
Cyprus | no | 13.7 | Burundi | yes | 9.7
Samoa | no | 13.6 | Rwanda | yes | 8.7
Albania | no | 13.4 | Somalia | yes | 8.5
Mongolia | no | 13.1 | Ghana | yes | 7.9
South Africa | no | 13.1 | Jamaica | yes | 7.6
China | no | 13 | Guinea-Bissau | yes | 7.5
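The overlap described on this slide can be checked with a simple threshold comparison. The sketch below uses a handful of rows from the table (the three highest and three lowest listed values on each side), so the median it computes is for that subset only and is not the median of the full lower arm group. The helper name `above_threshold` is made up for illustration.

```python
from statistics import median

# Subset of the slide's table: Smoking Prevalence (%) Age 80+ 2000.
lower_arm = {"Laos": 29.4, "Myanmar": 26.4, "Namibia": 23.3,
             "Ghana": 7.9, "Jamaica": 7.6, "Guinea-Bissau": 7.5}
others = {"Pakistan": 34.0, "Tonga": 25.2, "Kiribati": 24.4,
          "Mongolia": 13.1, "South Africa": 13.1, "China": 13.0}


def above_threshold(rates, threshold):
    """Names of countries whose rate exceeds `threshold`, sorted alphabetically."""
    return sorted(name for name, rate in rates.items() if rate > threshold)


if __name__ == "__main__":
    cutoff = median(lower_arm.values())  # median of this subset only
    print("Non-group countries above the subset median:",
          above_threshold(others, cutoff))
    print("Lower arm countries at or below it:",
          sorted(set(lower_arm) - set(above_threshold(lower_arm, cutoff))))
```

Countries appearing in both printed lists are the reason a single column cannot fully define the subgroup.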

What are the characteristics that define that subpopulation?
• Many directions to go here
• In IRIS
  • Persistence of group
  • Co-occurrence with other annotations beyond “developing”
• Outside of IRIS
  • Once you know a subgroup exists, statistical analyses
  • Visualization techniques such as heatmaps

Persistence (or not) of subgroup integrity across shapes and analyses
From this we can go back to the mathematical transformations used to make each set of shapes and find clues to what is driving this group to stay together in some shapes but not others
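One simple way to put a number on "staying together" is to ask, for each shape, what fraction of the subgroup sits inside its best-matching node. This is a sketch of that idea only. The slides do not describe how persistence is assessed in IRIS, and the function name and the example shapes below are hypothetical.

```python
def group_cohesion(group, nodes):
    """Largest fraction of `group` contained in any single node of a shape."""
    group = set(group)
    return max((len(group & set(node)) / len(group) for node in nodes),
               default=0.0)


if __name__ == "__main__":
    subgroup = {"A", "B", "C", "D"}  # hypothetical sample labels
    # Each shape is a list of nodes; each node is a set of sample labels.
    shape_1 = [{"A", "B", "C", "D", "X"}, {"X", "Y"}]
    shape_2 = [{"A", "B"}, {"C", "Y"}, {"D"}]
    for name, shape in [("shape 1", shape_1), ("shape 2", shape_2)]:
        print(name, group_cohesion(subgroup, shape))
```

A score that stays high under some transformations and drops under others points to the transformations, and hence the data columns, that hold the group together.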

Overlay of different kinds of information
Comparison of developing country status suggests two groups we could compare to look for additional insights
Annualized rate of change between 1980-1996 is another annotation we could look into more
[Panel labels: Developing = no; Developing = yes; Population; Ann rate of change 1980-96]

Comparing the two developing world enriched groups
• Found differences in older-age smoking prevalence: lower arm group has higher rate
  • We already knew that
• Also found differences in 10-year-old smoking prevalence: lower arm group has lower rate
  • We didn’t know that…

10 year old smoking prevalence
[Panels: 1980; 1990; 2000; 2010]
Smoking in kids is consistently low in the lower arm group.
This suggests a possible point for public health intervention in these countries. Need to confirm the pattern and, if it holds, look at when the transition from non-smoking to smoking happens

Looking more closely at Annualized rate of change
[Panels: Ann rate of change 1980-96; Ann rate of change 1996-2006; Ann rate of change 2006-2012; Ann rate of change 1980-2012]
Suggestion that lower arm group had relatively less decrease in overall smoking rates in the 80s and 90s, but rate of decrease began to pick up in the 2000s, relative to other countries
From a public health standpoint, now go back and ask what kinds of smoking cessation interventions were put in place in the 2000s
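The slides do not define how the annualized rate of change was calculated. A common convention, assumed here, is the log-ratio of the end value to the start value divided by the number of years, expressed as a percentage. The function name and the numbers in the usage lines are illustrative and are not taken from the IHME data.

```python
from math import log


def annualized_rate_of_change(start_value, end_value, start_year, end_year):
    """Annualized rate of change in percent per year (log-ratio convention, assumed)."""
    return 100.0 * log(end_value / start_value) / (end_year - start_year)


if __name__ == "__main__":
    # Hypothetical prevalence falling from 30% to 24% between 1996 and 2006.
    print(annualized_rate_of_change(30.0, 24.0, 1996, 2006))
```

Computing the rate separately for each period makes it possible to compare how quickly a group of countries declined in one period relative to another.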

