This document discusses analyzing data from a study where rats were exposed to different doses of a potential carcinogen. Rats were followed until a tumor developed or until day 104. The study aims to determine whether exposed rats develop tumors faster than unexposed rats. The document covers challenges in analyzing time-to-event data, such as censoring and lack of symmetry, and key methods including hazard ratios, log-rank tests, and comparing tumor rates between groups while accounting for varying follow-up times.
2. Rats Data
Rats treated with a potential carcinogen
Control rats – no exposure
Different doses: low dose, medium dose, high dose
Rats followed until a tumor developed or until day 104 (15 weeks)
3. Rats Data
Experiment terminated at 104 days, and any tumor-free rats are sacrificed – censoring
If a rat escapes or is hurt during follow-up, follow-up is terminated – censoring
Question: do exposed rats develop tumors at a faster rate than unexposed rats?
4. Time-to-Event Data
Many different types of studies collect data on the length of time until a particular event happens:
Time to heart attack or stroke in the Framingham Heart Study
Time to AIDS in treatment studies with HIV-infected persons
Time to return to work following a workplace injury
Time to disease onset in hamsters following exposure to C. difficile
5. Time-to-Event Data
Time-to-event data has two characteristics that complicate its analysis:
Times to event are not symmetrically distributed
Not every person/hamster/rat/specimen has the event under observation
6. Lack of Symmetry
[Figure: four bar charts labeled temp.norm, temp.logs, temp.exp, and temp.weib. The top 2 bar charts show samples from symmetric distributions; the bottom 2 bar charts show samples from right-skewed distributions.]
7. Lack of Symmetry
When data aren't symmetric, the usual statistics don't work well:
The mean is no longer in the center of the sample; it is shifted to the right
The standard deviation isn't very useful: measuring one standard deviation to the right of the mean covers a lower percentage of the population than measuring one standard deviation to the left
9. Lack of Symmetry
Ways to cope with a lack of symmetry (a short sketch follows this list):
Transform the data using a log or square root – this tends to make the distribution more symmetric
Use robust or non-parametric methods: medians instead of means, inter-quartile ranges instead of standard deviations, rank-based Wilcoxon tests instead of t-tests
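As an illustration (mine, not from the slides), here is a minimal Python sketch of both coping strategies on a simulated right-skewed sample; the distribution and seed are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.lognormal(mean=2.0, sigma=0.8, size=1000)  # a right-skewed sample

# The mean is pulled to the right of the median by the long right tail
print(np.mean(x), np.median(x))

# Robust summaries: median and inter-quartile range
q1, q3 = np.percentile(x, [25, 75])
print(np.median(x), q3 - q1)

# A log transform makes this distribution symmetric:
# sample skewness drops from strongly positive to near 0
print(stats.skew(x), stats.skew(np.log(x)))
```

For the testing side, scipy's `ranksums` (the Wilcoxon rank-sum test) is the rank-based alternative to the two-sample t-test mentioned in the list.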
10. Lack of Symmetry
In survival analysis, there is an emphasis on the use of non-parametric methods because of the skewness
However, it is also possible to use non-symmetric distributions to cope with the lack of symmetry:
Exponential distribution
Weibull distribution
11. Censoring
That not every person/hamster/rat/specimen has the event under observation is a much harder problem
This is called censoring
Censoring is what prevents us from being able to use standard methods on time-to-event data
12. Censoring
Censoring creates a difference between:
What we get to see on everyone – the follow-up time (time to the event or censoring)
What we want to see on everyone – the time to event
13. Censoring
The follow-up times are always less than or equal to the times-to-event
This is a big problem – the follow-up times are systematically too short:
The mean follow-up time is less than the mean time-to-event
The median follow-up time is less than the median time-to-event
(A simulation sketch follows.)
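A quick simulation (my own sketch, with arbitrary distributions) makes this bias concrete:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
event_time = 5.0 * rng.weibull(1.5, size=n)   # true times-to-event (years)
censor_time = rng.uniform(0, 8, size=n)       # censoring times (dropout or study end)

# What we actually observe is the earlier of the two
follow_up = np.minimum(event_time, censor_time)

# Follow-up times are systematically too short
print(np.mean(follow_up), np.mean(event_time))      # mean follow-up < mean time-to-event
print(np.median(follow_up), np.median(event_time))  # median follow-up < median time-to-event
```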
14. Censoring
[Figure: a study with 7 subjects. Circles show when subjects had events and squares show when they were censored. The solid blue lines are the follow-up times, and the red dotted lines are the times from censoring until the event.]
Median follow-up time is 2.5 years; median survival time is 4.0 years
15. Censoring
The more censoring there is, the worse the discrepancy between follow-up time and time-to-event
We need to remove the effect of censoring from the follow-up times to get a sense of the time-to-event
This is what survival analysis is all about
16. Hazard
In longitudinal studies, data are collected on a sample of individuals over time
Often we are interested in the occurrence of an important event in our sample:
Death
Heart attack
Tumor
17. Risk
If our interest is in whether subjects ever experience the event in the study:
Analysis of the risk of the event
Based on the proportion of subjects who have the event
Unadjusted analysis done using contingency tables and chi-square tests (see the sketch after this list)
Adjusted analysis using logistic regression
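For the unadjusted risk analysis, here is a minimal sketch with scipy; the 2x2 counts are hypothetical:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: rows = exposed / unexposed, columns = event / no event
table = np.array([[12, 38],
                  [ 5, 45]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")

# Risk in each group = events / group size
print(table[:, 0] / table.sum(axis=1))
```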
18. Hazard
If our interest is in how long until subjects experience the event in the study:
Analysis of the hazard of the event
Based on the rate of occurrence of the event
The difference between risk and hazard is that the hazard incorporates time
19. Hazard
A risk analysis is based on the number of events per person
A hazard analysis is based on the number of events per person-year (or other person-time measure)
A hazard analysis would be more appropriate if the length of follow-up varies
20. Hazard
Risk: 2 out of 6 subjects have the event (33%)
Hazard: 2 events per 38 person-weeks (0.05 events per person-week)
(A worked sketch of this calculation follows.)
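Only the totals on this slide (6 subjects, 2 events, 38 person-weeks) are given; the individual follow-up times below are hypothetical values chosen to match them:

```python
import numpy as np

# Hypothetical follow-up times (weeks); they sum to the slide's 38 person-weeks
follow_up_weeks = np.array([5, 6, 7, 8, 4, 8])
had_event       = np.array([1, 0, 1, 0, 0, 0])   # the slide's 2 events

risk   = had_event.sum() / len(had_event)          # 2/6  = 0.33
hazard = had_event.sum() / follow_up_weeks.sum()   # 2/38 = 0.05 per person-week

print(f"risk = {risk:.2f}, hazard = {hazard:.3f} events per person-week")
```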
21. Censoring
In the example we just saw, one subject withdrew and did not complete the study or have the event
Some other subjects completed the study but never had the event
In a hazard analysis, both types of subject are considered censored, and their follow-up time is used in the denominator
In a risk analysis, incorporating the first type of subject requires some arbitrary rules:
Restricting analysis to subjects who do not withdraw
Assigning the event (or absence of event) to subjects who withdraw
22. Hazard
A person-years analysis is fine as long as we assume that the hazard rate is constant
Constant hazard -> exponential distribution
Most modern survival analysis is done in a way that avoids assuming the hazard rate is constant over time:
Makes the notation and terminology more abstract
Analyses are more complicated than taking the ratio of events to person-time
23. Hazard
To move away from having a constant hazard over time, to calculate the hazard at a particular time t:
Look only at subjects eligible to have the event at t
The hazard at t is the person-time event rate for a very short period of time starting at t (the limit as the period of time shrinks to 0)
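In standard notation, the limit this slide describes is the hazard function of the event time T:

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P\left(t \le T < t + \Delta t \mid T \ge t\right)}{\Delta t}
```

The conditioning on T >= t is the 'look only at subjects still eligible at t' step.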
24. Hazard Ratio
If we follow two groups over time, we may want to compare their hazard rates
The hazard ratio is the hazard (at time t) in group 1 divided by the hazard (at time t) in group 2:
Hazard Ratio(t) = Group 1 hazard(t) / Group 2 hazard(t)
25. Hazard Ratio
The hazard ratio has a similar construction to the relative risk, except that the hazards can change with time
To simplify the models, we often assume proportional hazards, i.e., the hazard ratio is constant over time
26. Hazard Ratio
Under proportional hazards:
Although the hazards within each group may change over time
The ratio of the hazards between groups stays the same over time:
Hazard Ratio = Group 1 hazard(t) / Group 2 hazard(t), constant in t
(A numeric sketch follows.)
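A numeric sketch (mine) using Weibull hazards shows what this looks like: each group's hazard changes with t, but their ratio does not. With a common shape parameter, the ratio works out to (scale2/scale1)**shape:

```python
import numpy as np

def weibull_hazard(t, shape, scale):
    """Weibull hazard: h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

t = np.array([0.5, 1.0, 2.0, 5.0])
h1 = weibull_hazard(t, shape=1.5, scale=2.0)  # group 1: hazard rises with t
h2 = weibull_hazard(t, shape=1.5, scale=3.0)  # group 2: hazard also rises with t

# Each hazard changes over time, but their ratio is constant:
print(h1 / h2)  # the same value at every t: (3.0/2.0)**1.5, about 1.84
```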
28. Testing Survival Difference
Nonparametric tests are typically used to test for survival differences between groups:
Logrank test – the best test if the hazards are proportional between groups; gives equal weight to events that happen throughout follow-up
Wilcoxon test – gives more weight to events early in follow-up
(A code sketch of the logrank test follows.)
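If the data were in Python, the logrank test could be run with the lifelines package; this sketch assumes durations and event indicators are available as arrays, and all the values are hypothetical:

```python
import numpy as np
from lifelines.statistics import logrank_test

# Hypothetical follow-up times (days) and event indicators (1 = tumor, 0 = censored)
time_exposed  = np.array([34, 55, 70, 104, 104, 89])
event_exposed = np.array([ 1,  1,  1,   0,   0,  1])
time_control  = np.array([104, 104, 90, 104, 76, 104])
event_control = np.array([  0,   0,  1,   0,  1,   0])

result = logrank_test(time_exposed, time_control,
                      event_observed_A=event_exposed,
                      event_observed_B=event_control)
print(result.p_value)
```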
29. Testing Survival Difference
A parametric test assuming exponential survival is also sometimes used
It is the best test if the hazard is constant within the groups being compared
It goes along with the person-time analysis (sketch below)
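One simple version of such a test (not necessarily the exact variant used in these slides) compares the two constant-hazard estimates d/T on the log scale, using the standard approximation Var(log rate) = 1/d; the counts below are hypothetical:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical events and total person-time in each group
d1, T1 = 14, 950.0    # exposed:   events, person-days
d2, T2 = 7, 1200.0    # unexposed: events, person-days

rate1, rate2 = d1 / T1, d2 / T2   # constant-hazard (exponential) estimates
z = (np.log(rate1) - np.log(rate2)) / np.sqrt(1 / d1 + 1 / d2)
p = 2 * norm.sf(abs(z))           # two-sided p-value
print(f"hazard ratio = {rate1 / rate2:.2f}, p = {p:.3f}")
```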
36. Survival Comparison
Comparing the 4 groups:
Logrank test p=0.095
Wilcoxon test p=0.311
Exponential test p=0.171
So we don't find a difference between groups that couldn't be attributed to chance
But the logrank result is suggestive that with a larger experiment we might attain statistical significance
(A sketch of the 4-group logrank comparison follows.)
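A 4-group logrank comparison like this one can be run in lifelines with `multivariate_logrank_test`; the rows below are hypothetical stand-ins for the actual rats data:

```python
import pandas as pd
from lifelines.statistics import multivariate_logrank_test

# Hypothetical long-format data: one row per rat
df = pd.DataFrame({
    "time":  [104, 88, 104, 62, 104, 70, 45, 104],  # days of follow-up
    "event": [  0,  1,   0,  1,   0,  1,  1,   0],  # 1 = tumor, 0 = censored
    "group": ["control", "control", "low", "low",
              "medium", "medium", "high", "high"],
})

result = multivariate_logrank_test(df["time"], df["group"], df["event"])
print(result.p_value)
```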
37. Analyzing Time-to-Event Outcomes
Becoming more and more common
The challenge is mostly due to censoring
Most basic analysis – exponential: the hazard is estimated by #events / total person-time
Modern methods avoid assuming exponential data:
Logrank test