Principles of Data Science
VasanthThirugnanam
Principles of Data science
Essential steps
1. Research Topic
2. Research Question
3. Hypothesis
4. Data collection plan
5. Data analysis
6. Data Reporting
Process flow: Research Question -> Hypothesis -> Experiment / Data collection plan -> Data Analysis -> Conclusion / Data Reporting -> Replication
Research Topic
A problem or need statement with a broad area of interest.
Example:
First responders' long-term health is at risk when they are involved in combating wildfires for several years. The majority of first responders suffer from cardiac arrest and trauma.
Can monitoring individual emission exposure help manage long-term health risks and extend their active life?
Research Question
A clearly articulated list of specific research questions defines the types of data that need to be collected.
Example:
RQ1. Are toxic emissions negatively associated with long-term health?
RQ2. Are the current data collection measures useful in monitoring the individual emission burden?
RQ3. Are the current methods of health risk assessment accurate?
Hypothesis
H0: The null hypothesis is a general statement or default position that there is no relationship between two measured phenomena, or no association among groups.
Ha: The alternative hypothesis is the hypothesis used in hypothesis testing that is contrary to the null hypothesis.
Example:
• H0 for RQ3: Current methods of health risk assessment are effective.
• Ha for RQ3: Current methods of health risk assessment are not sufficient.
Hypothesis: Type I and Type II errors
For example, take H0: "Current health risk assessments are effective at associating health risks with toxic emissions."

Decision                         H0 is true            H0 is false
Reject H0 (p < 0.05)             Type I error          Correct conclusion
Fail to reject H0 (p >= 0.05)    Correct conclusion    Type II error
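The decision table can be sketched in a few lines of Python. This is only an illustration: the function names `decide` and `outcome` are hypothetical, the 0.05 threshold is the one shown on the slide, and in real research the truth about H0 is not known, so `h0_is_true` exists here purely to label the four cells.

```python
ALPHA = 0.05  # significance level shown on the slide


def decide(p_value, alpha=ALPHA):
    """Return the test decision for a given p-value."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


def outcome(p_value, h0_is_true, alpha=ALPHA):
    """Label the decision against the (normally unknown) truth about H0."""
    rejected = p_value < alpha
    if rejected and h0_is_true:
        return "Type I error"        # rejected a true null hypothesis
    if not rejected and not h0_is_true:
        return "Type II error"       # kept a false null hypothesis
    return "correct conclusion"


# Walk through the four cells of the table with made-up p-values.
for p, truth in [(0.01, True), (0.01, False), (0.20, True), (0.20, False)]:
    print(p, truth, decide(p), outcome(p, truth))
```

Note that a p-value of exactly 0.05 falls in the "fail to reject" row, matching the `p >= 0.05` label in the table.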
Data Collection Plan
Elements of the plan: Type of Data, Data Location, Operational Definition
Type of Data
1. Acts, behavior, or events
2. Economic data
3. Organizational data
4. Demographic data
5. Self-identity
6. Cultural knowledge
7. Expert knowledge
8. Personal and psychological traits
9. Hidden social patterns
Data Collection Plan
Data collection plan for the Firefighters dataset:

Firefighters Dataset
  Who: Firefighters research associate
  What: Wildfire events and firefighters' data
  Why: To assess the emission exposure
  Where: The National Institute for Occupational Safety and Health (NIOSH)
  When: For the period 2008 to 2018

Health Report Dataset
  Who: Health report research associate
  What: Firefighters' health records
  Why: To capture the disease diagnosis
  Where: Search Firefighter fatalities in the United States
  When: For the period 2008 to 2018
Data Collection Plan
Sampling techniques
• Simple random sample
• Clustered sampling
• Representative subgroup sampling
Possible sources of uncertainty
• Sampling error
• Researcher bias
• Validity of instrument
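The three sampling techniques can be sketched with Python's standard library. Everything here is an assumption made for illustration: the roster of 60 firefighters, the `station` and `age_band` fields, and the function names are hypothetical, and "representative subgroup sampling" is read here as stratified sampling (the same fraction drawn from every subgroup).

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, seed=0):
    """Every unit has the same chance of being selected."""
    return random.Random(seed).sample(population, n)


def cluster_sample(population, cluster_of, n_clusters, seed=0):
    """Pick whole clusters at random and keep every member of them."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for unit in population:
        clusters[cluster_of(unit)].append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for key in chosen for unit in clusters[key]]


def stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw the same fraction from each subgroup so all are represented."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical roster: 60 firefighters, 6 stations, 3 age bands.
firefighters = [
    {"id": i, "station": f"S{i % 6}", "age_band": ["20-34", "35-49", "50+"][i % 3]}
    for i in range(60)
]

srs = simple_random_sample(firefighters, 10)
by_station = cluster_sample(firefighters, lambda f: f["station"], 2)
by_age = stratified_sample(firefighters, lambda f: f["age_band"], 0.2)
print(len(srs), len(by_station), len(by_age))
```

The fixed `seed` only makes the sketch repeatable; sampling error, one of the sources of uncertainty listed above, shows up as soon as the seed is varied and the sample statistics change.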
Data Management
Themes of concern in big data
• Growing data
• Real-time processing can be complex
• Data security

TYPES OF DATA STORAGE (Key Differences)
SQL
• Relational, tabular format
• Schema is essential
• Grows vertically
NoSQL
• Unstructured, semi-structured
• No schema
• Grows horizontally

Examples of SQL databases: MySQL, Oracle, SQLite, Postgres, and MS-SQL.
Examples of NoSQL databases: MongoDB, Bigtable, Redis, RavenDB, Cassandra, HBase, Neo4j, and CouchDB.
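The schema difference can be illustrated with a small sketch. It uses SQLite (one of the SQL databases listed) through Python's built-in `sqlite3` module; for the NoSQL side, plain JSON documents in a list stand in for a document store, which is an assumption made to keep the example self-contained rather than the API of any listed product. The table name, field names, and readings are invented.

```python
import json
import sqlite3

# Relational: the schema is declared up front and every row must fit it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exposure (firefighter_id INTEGER, event TEXT, co_ppm REAL)"
)
conn.executemany(
    "INSERT INTO exposure VALUES (?, ?, ?)",
    [(1, "fire_a", 35.0), (1, "fire_b", 50.0), (2, "fire_a", 20.0)],
)
sql_avg = conn.execute(
    "SELECT AVG(co_ppm) FROM exposure WHERE firefighter_id = 1"
).fetchone()[0]

# Document style: each record describes itself, so one document can carry
# an extra field ("notes") without any schema change.
documents = [
    json.dumps({"firefighter_id": 1, "event": "fire_a", "co_ppm": 35.0}),
    json.dumps({"firefighter_id": 1, "event": "fire_b", "co_ppm": 50.0,
                "notes": "night shift"}),
    json.dumps({"firefighter_id": 2, "event": "fire_a", "co_ppm": 20.0}),
]
records = [json.loads(d) for d in documents]
readings = [r["co_ppm"] for r in records if r["firefighter_id"] == 1]
doc_avg = sum(readings) / len(readings)

print(sql_avg, doc_avg)
```

Both sides answer the same question; the trade-off is that the relational table rejects rows that do not match its columns, while the document list accepts any shape and leaves validation to the reader of the data.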
Data Analysis
Flow of data based on its type to create insights

Descriptive statistics
• Categorical: calculate frequency, distribution
• Ordinal: calculate mode
• Interval-Ratio / Continuous: calculate mean, median, SD

Does the data vary?
• No: report no change
• Yes: inferential statistics (T-Test | Chi-Squared | Correlation | OLS Regression | Logistic Regression)

Report: table, pie chart, bar chart
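The descriptive half of this flow can be sketched as a single function that picks its summary from the measurement level. The function name `describe`, the level labels, and the sample values are hypothetical; only the pairing of data type and statistic comes from the slide.

```python
import statistics
from collections import Counter


def describe(values, level):
    """Summarize values according to their level of measurement."""
    if level == "categorical":
        counts = Counter(values)
        total = len(values)
        return {
            "frequency": dict(counts),
            "distribution": {k: c / total for k, c in counts.items()},
        }
    if level == "ordinal":
        return {"mode": statistics.mode(values)}
    if level == "continuous":
        return {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "sd": statistics.stdev(values),  # sample standard deviation
        }
    raise ValueError(f"unknown measurement level: {level}")


# Made-up values for illustration only.
print(describe(["smoke", "heat", "smoke"], "categorical"))
print(describe([1, 2, 2, 3], "ordinal"))
print(describe([2, 4, 4, 6], "continuous"))
```

If a continuous variable shows no variation (an SD of zero), there is nothing for the inferential tests to explain, which corresponds to the "report no change" branch.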
Data Analysis
Exploratory Data Analysis
Descriptive statistics on Health Risk, Emission level, Exposure duration, and Age
Data Reporting
The most common data reporting formats in business are as follows:
• Research Report
• Executive Summary
• Short Answers
• Slide Presentation
• White Paper
Summary
Basic research design consists of six core steps:
• Develop a good research question, identifying a small section of a wider topic that is worth exploring.
• Choose a logical structure for the research.
• Identify the type of data needed.
• Select a data collection method.
• Choose the data collection site, that is, the data source.
• Use the research question, the type of data, and the data collection method together to arrive at the correct data analysis method.
Ethics in Data Science
• Informed consent: A detailed informed consent form will state the scope of the research and describe a transparent method, and only the required information will be collected.
• Maximize benefit: When accessing first responders' information, utmost care will be taken to maximize benefits and minimize harm.
• Enhance well-being: For the most part, this research should enable interventions that are designed solely to enhance the mental well-being of an individual firefighter or subject and that have a reasonable expectation of success.
• Equal treatment: All participants will get equal treatment, and every measurement will be analyzed with the same method, without bias.
• Risk vs. benefit: The assessment of risks and benefits requires a careful collection of relevant data, including any alternative ways of obtaining the benefits sought in the research.


Editor's Notes

  1. This slide deck was created to demonstrate what I learned in this course; some interesting observations are included to show my level of understanding.
  2. The essential steps of data science research are discussed in this presentation. Together, the six steps ensure that all critical elements are considered in the research process and give other researchers a clear view of the method. The six steps are: Research Topic: describes a problem or need statement. Research Question: a precise list of questions that directly gives clues on the data type, unit of measure, and data source. Hypothesis: clearly defines the relationship between the variables. It starts with the baseline assumption that there is no relationship between the independent variable and the dependent variable. Data Collection Plan: a suitable method of collecting the data by following the right sampling methods. Data Analysis: descriptive and inferential statistics performed on the collected data. Data Reporting: various reporting techniques for varying levels of audiences.
  3. In recent decades, the Western United States has seen heightened wildfire activity, characterized by a higher frequency of massive wildfires, a longer fire season, larger fire size, and a higher total area burned. With projected temperature increases, soil moisture reduction, and more frequent air stagnation, the burden of wildfires on air quality, public health, and environmental management will likely increase. With state-of-the-art wearable sensors, AI models, and detailed health information, we propose to investigate the impacts of historical and future wildfires on first responders' long-term health risks.
  4. RQ1. Are toxic emissions negatively associated with long-term health? Study the levels of toxic emissions from past wildfire events and map them to the health records of the first responders to identify any correlation in the data sets. What are the health risks associated with this occupation? RQ2. Are the current data collection measures useful in monitoring the individual emission burden? Study the current data collection methods and evaluate their effectiveness in monitoring an individual firefighter's emission burden. Establish the correlation between current methods and their effectiveness in calculating the duration of emission burden. RQ3. Are the current methods of health risk assessment accurate? What are the different methods used in calculating the health risks, and how is a specific toxic emission associated with a health risk? What are the thresholds of the emission burden?
  5. Ho1: Toxic emissions do not affect long-term health risk. Ha1: Toxic emissions are negatively associated with long-term health. Ho2: Current data collection methods are not effective in calculating the individual emission burden of the firefighters. Ha2: Current data collection methods are useful in calculating the individual emission burden. Ho3: Current methods of health risk assessment are effective. Ha3: Current methods of health risk assessment are not sufficient.
  6. A Type I error is the rejection of a true null hypothesis. A Type II error is the failure to reject a false null hypothesis. In the example, the p-value is > 0.05: firefighters who worked longer hours in toxic emissions had a higher incidence of health disorders. We therefore fail to reject the null hypothesis and retain the position that the current methods of health risk assessment are effective in associating health risk with toxic emissions.
  7. Based on the formulated research questions, a retrospective analysis of wildfire events over the last ten years and an anonymized list of firefighters' health records are required. Both quantitative and qualitative data need to be carefully selected from specific wildfire events: duration of containment, level of emission, type of sensor used, firefighter's age, shift schedules, reported injuries, and pre-existing conditions. Longer-term details from health records also need to be collected: hospital visits, insurance claims, prescriptions, diagnosis dates, and diagnosis details. From the data source, a set of vital information will be extracted for each wildfire event. An event is a specific wildfire incident that burned more than 1,000 acres or produced significant structural damage or loss of life. Exposed-days is the number of days each firefighter worked in a job or at a location with the potential for exposure. It will be derived from the employment date and event date. Fire-runs is the total number of fire-runs made by each firefighter. It will be derived from the event date per event. Fire-hours is the total time spent at fires by each firefighter. It will be derived from the exposed hours per day. The individual emission burden is the total duration of individual emission exposure. The daily emission burden is the hours of emission burden in a day. A day is 24 hours, starting at 00:00 and ending at 23:59. The emission burden per event is the sum of the daily emission burden over that event. Level of toxicity is a qualitative assessment based on the pollutants, in six levels: Good, Moderate, Unhealthy for Sensitive Groups, Unhealthy, Very Unhealthy, and Hazardous. The first noted date is the date on which a specific disease condition was first diagnosed. The disease condition is the actual finding of the disease state and its stage.
  8. Both data sources will be quantitatively analyzed using two main methods: observations and questionnaires. Careful observation of the types of emission exposure, and quantification of its duration for each firefighter combating the fire, is important. Questionnaires will be developed to assess the emission levels at the event locations. Each identified disease condition and its first noted date will be collected per firefighter. The worsening of a disease condition will be assessed qualitatively from periodic health screening reports, based on its progress. The exposure assessment will be conducted by researchers who are blinded to the healthcare reports, to reduce the likelihood of information bias in the subsequent analyses. The table on the slide shows the high-level plan of who, what, why, where, and when for data collection.
  9. While descriptive and inferential statistics are vital for quantitative analysis, careful sample selection is needed to make a meaningful inference about the population statistics. The document discusses various sampling methods and reviews their relation to the population statistics. Each sampling technique was reviewed, along with my level of confidence that its sample mean approximates the population mean.
  10. The era of big data has resulted in the development and application of technologies and methods aimed at effectively using massive amounts of data to support decision-making and knowledge discovery. In this paper, the five Vs of big data (volume, velocity, variety, veracity, and value) are reviewed, as well as new technologies, including NoSQL databases, that have emerged to accommodate the needs of big data initiatives. Both SQL and NoSQL databases have their applications, depending on the development requirements.
  11. The datasets have a combination of continuous data, discrete numerical data, geospatial data, and categorical data types. A standard frequency of data aggregation will be determined before the analysis to calculate daily emission exposure, emission exposure per wildfire event, and emission exposure over the entire career. Establish the mean and median levels of emission exposure and the corresponding clinical diagnoses. Develop an unsupervised clustering of the dataset based on similar emission exposure and associated health risks. The analysis will help determine an emission exposure threshold that can be used to manage the health risk proactively and to develop recommendations on care pathways. The infographic shows a typical path of different data types in a research activity. From categorical data, we can calculate the frequency and mode before applying a chi-squared test, or a logistic regression in a classification scenario. From interval-ratio or continuous data, we can calculate the mean, median, and standard deviation to see if there is any variation, and fail to reject the null hypothesis when there is no variation. Several options are available for continuous data, depending on the spread and kurtosis. Finally, an appropriate method of visualization can be used to view and communicate the behavior of the data.
  12. Firefighter's Age: a continuous variable of type float, rounded to the nearest month of the firefighter's age. Measure of Central Tendency: the mean and median for this sample are close to each other, which suggests a roughly symmetric distribution; the mean is the balancing point (the average) of the values. Since all values in this sample are unique, there is no meaningful mode. Measure of Spread: the range, the difference between the minimum and maximum values, shows the dispersion, but when outliers are present it does not clearly indicate the spread. The standard deviation measures how far values typically lie from the mean. In general, for larger sample sizes, the sampling distribution of the mean approaches a normal distribution.
  13. CO emission levels vary widely and show a strong relationship with health risk. The duration of exposure varies significantly across the different health risks. Diagnostic Condition: the diagnosis reported by the physician, which serves as a qualitative variable describing the state of health. It is a discrete variable used to map the health condition of the firefighter.
  14. Research Report: the longest and most comprehensive presentation format. Executive Summary: one to two pages providing an overview of the findings with a statement of action items. Short Answers: a statement of action items. Slide Presentation: designed for an oral presentation that provides some context of the research, the findings, and the action items. White Paper: a short report that describes the research, the findings, the action items, and how they relate to other needs and broader findings in the research area.
  15. Develop recommendations, possibly a wearable sensor built to collect and manage emission exposure data effectively on an individual basis. Develop an AI/ML model to identify potentially at-risk firefighters early, so that health conditions can be managed effectively. After data collection, both datasets require careful mapping of the independent variables in order to identify positive or negative correlations and to study, retrospectively, their impact on overall health risk.
  16. Last but not least, 45 CFR 46 and the Belmont Report summarize the ethical principles identified by the commission in the context of its deliberations. Scientific research has produced substantial social benefits. It has also posed some troubling ethical questions. The code consists of rules, some general and others specific, that guide investigators and reviewers of research in their work. It was depressing to read how some early research participants were treated unethically, but those cases help us learn a systematic way to avoid repeating unfair practices. All interested parties, including scientists, research subjects, and reviewers, will be trained on the research scope and the extent of data collection required for this analysis. The main objective is to follow an analytical framework that will guide the resolution of ethical problems arising from research involving firefighters' health reports. However, some firefighters may not be capable of self-determination, or the capacity for self-determination may mature during research participation, and some participants may not be in a position to assess their liberty due to illness. Subjects are to be treated in an ethical manner, not only by respecting their decisions but also by protecting them from harm and securing their well-being. Just as the principle of respect for persons finds expression in the requirement for informed consent, and the principle of beneficence in risk/benefit assessment, the principle of justice gives rise to the moral requirement that there be fair procedures and outcomes in the selection of firefighters' events and health histories.
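
The derived exposure measures defined in note 7 (daily emission burden, emission burden per event, and individual emission burden) lend themselves to a short worked example. The sketch below is a minimal illustration, not the study's actual pipeline: the record layout `(firefighter_id, event_id, day, exposed_hours)`, the function name `emission_burden`, and the sample records are all assumptions made up for this example.

```python
from collections import defaultdict
from datetime import date


def emission_burden(records):
    """Aggregate exposure hours at three levels (hypothetical helper).

    records: iterable of (firefighter_id, event_id, day, exposed_hours),
    an assumed layout. Returns three dicts:
      daily       keyed by (firefighter_id, day)
      per_event   keyed by (firefighter_id, event_id)
      individual  keyed by firefighter_id
    """
    daily = defaultdict(float)
    per_event = defaultdict(float)
    individual = defaultdict(float)
    for firefighter, event, day, hours in records:
        # A single record cannot cover more than one 24-hour day.
        if not 0 <= hours <= 24:
            raise ValueError(f"exposed hours out of range: {hours}")
        daily[(firefighter, day)] += hours
        # The per-event burden is the sum of the daily burden in that event.
        per_event[(firefighter, event)] += hours
        # The individual burden is the total across all events.
        individual[firefighter] += hours
    return dict(daily), dict(per_event), dict(individual)


# Made-up records, for illustration only.
records = [
    ("FF1", "E1", date(2020, 8, 1), 6.0),
    ("FF1", "E1", date(2020, 8, 1), 2.0),
    ("FF1", "E1", date(2020, 8, 2), 5.0),
    ("FF1", "E2", date(2020, 9, 10), 4.0),
    ("FF2", "E1", date(2020, 8, 1), 7.5),
]
daily, per_event, individual = emission_burden(records)
```

Keeping the three levels as separate dictionaries matches the aggregation frequencies described in note 11 (daily, per wildfire event, and career-long), so each can be joined to the health records independently.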