Principles of Data Science
Vasanth Thirugnanam
Principles of Data science
Essential steps
1. Research Topic
2. Research Question
3. Hypothesis
4. Data collection plan
5. Data analysis
6. Data reporting
Workflow diagram: Research Question → Hypothesis → Experiment / Data collection plan → Data Analysis → Conclusion / Data Reporting → Replication
Research Topic
A problem or need statement within a broad area of interest.
Example:
First responders' long-term health is at risk when they are involved in combating wildfires for several years. Can monitoring individual emission exposure help manage long-term health risks and extend their active life?
The majority of first responders suffer from cardiac arrest and trauma.
Research Question
A clearly articulated list of specific research questions defines the types of data that need to be collected.
Example:
RQ1. Are toxic emissions negatively associated with long-term health?
RQ2. Are the current data collection measures useful in monitoring the individual emission burden?
RQ3. Are the current methods of health risk assessment accurate?
Hypothesis
H0: The null hypothesis is a general statement or default position that there is no relationship between two measured phenomena, or no association among groups.
Ha: The alternative hypothesis is the hypothesis used in hypothesis testing that is contrary to the null hypothesis.
Example:
• H0 (RQ3): Current methods of health risk assessment are effective.
• Ha (RQ3): Current methods of health risk assessment are not sufficient.
Hypothesis
What are Type I and Type II errors?
A Type I error rejects H0 when it is true; a Type II error fails to reject H0 when it is false.
For example, with H0: "Current health risk assessments are effective in relating health risk to toxic emissions":
Decision | H0 is true | H0 is false
Reject H0 (p < 0.05) | Type I error | Correct conclusion
Fail to reject H0 (p >= 0.05) | Correct conclusion | Type II error
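The decision table above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original deck: the function names `decide` and `classify` are hypothetical, the p-values are invented, and in practice the truth of H0 is unknown, so the second function only helps to reason about which error is possible.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the slide's decision rule: reject H0 only when p < alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


def classify(decision: str, h0_is_true: bool) -> str:
    """Label a decision against the (normally unknown) truth about H0."""
    if decision == "reject H0":
        return "Type I error" if h0_is_true else "correct conclusion"
    return "correct conclusion" if h0_is_true else "Type II error"


# Hypothetical p-values, for illustration only (not results from the study).
for p, truth in [(0.01, True), (0.01, False), (0.20, True), (0.20, False)]:
    d = decide(p)
    print(f"p={p:.2f}, H0 true={truth}: {d} -> {classify(d, truth)}")
```

Note that p = 0.05 exactly falls in the "fail to reject" branch, matching the slide's p >= 0.05 condition.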
Data Collection Plan
Type of data
1. Acts, behaviors, or events
2. Economic data
3. Organizational data
4. Demographic data
5. Self-identity
6. Cultural knowledge
7. Expert knowledge
8. Personal and psychological traits
9. Hidden social patterns
Data location
Operational definition
Data Collection Plan
Data collection plan for the Firefighters dataset
Dataset | Who | What | Why | Where | When
Firefighters dataset | Firefighters research associate | Wildfire events and firefighters' data | To assess the emission exposure | The National Institute for Occupational Safety and Health (NIOSH) | For the period 2008 to 2018
Health report dataset | Health report research associate | Firefighters' health records | To capture the disease diagnosis | Search "Firefighter fatalities in the United States" | For the period 2008 to 2018
Data Collection Plan
Sampling techniques
• Simple random sample
• Clustered sampling
• Representative subgroup sampling
Possible sources of uncertainty
• Sampling error
• Researcher bias
• Validity of instrument
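The three sampling techniques listed above can be sketched with Python's standard `random` module. This is an illustration under stated assumptions: the roster is invented, the helper names are hypothetical, and "representative subgroup sampling" is read here as stratified sampling with the same fraction drawn from every subgroup.

```python
import random
from collections import defaultdict

# Hypothetical roster: 12 firefighters spread over 3 crews (clusters),
# each tagged with an experience band (subgroup). Not real data.
roster = [
    {"id": i, "crew": f"crew_{i % 3}", "band": "senior" if i < 4 else "junior"}
    for i in range(12)
]


def simple_random_sample(units, n, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(units, n)


def cluster_sample(units, key, n_clusters, rng):
    """Pick whole clusters at random and keep every unit inside them."""
    clusters = sorted({u[key] for u in units})
    chosen = set(rng.sample(clusters, n_clusters))
    return [u for u in units if u[key] in chosen]


def stratified_sample(units, key, fraction, rng):
    """Sample the same fraction from each subgroup so all are represented."""
    groups = defaultdict(list)
    for u in units:
        groups[u[key]].append(u)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


rng = random.Random(42)  # fixed seed so the sketch is repeatable
srs = simple_random_sample(roster, 4, rng)
clu = cluster_sample(roster, "crew", 1, rng)
strat = stratified_sample(roster, "band", 0.5, rng)
print(len(srs), len(clu), len(strat))
```

Which individuals are drawn depends on the seed, but the sample sizes do not: four units, one whole crew, and half of each experience band.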
Data Management
Themes of concern with big data
• Growing data
• Real-time data can be complex
• Data security
Types of data storage (key differences)
SQL | NoSQL
Relational, tabular format | Unstructured or semi-structured
Schema is essential | No fixed schema
Grows vertically (scale up) | Grows horizontally (scale out)
Examples of SQL databases: MySQL, Oracle, SQLite, Postgres, and MS-SQL.
Examples of NoSQL databases: MongoDB, BigTable, Redis, RavenDB, Cassandra, HBase, Neo4j, and CouchDB.
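The schema difference can be made concrete with a short sketch. It uses SQLite, one of the SQL databases named above, through Python's built-in `sqlite3` module, and plain dictionaries as a stand-in for a document store, since no NoSQL server is assumed. The table name, field names, and values are all invented for illustration.

```python
import json
import sqlite3

# SQL side: the schema is declared up front and every row must fit it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exposure (firefighter_id INTEGER, year INTEGER, emission_level REAL)"
)
conn.executemany(
    "INSERT INTO exposure VALUES (?, ?, ?)",
    [(1, 2017, 3.2), (1, 2018, 4.1), (2, 2018, 2.7)],  # made-up values
)
rows = conn.execute(
    "SELECT firefighter_id, AVG(emission_level) FROM exposure "
    "GROUP BY firefighter_id ORDER BY firefighter_id"
).fetchall()
print(rows)

# Document side: each record is self-describing and fields may differ.
# Plain dicts stand in for a document store here; no NoSQL server is used.
documents = [
    {"firefighter_id": 1, "readings": [{"year": 2017, "level": 3.2}]},
    {"firefighter_id": 2, "notes": "sensor offline", "readings": []},
]
with_notes = [d["firefighter_id"] for d in documents if "notes" in d]
print(json.dumps(documents[1]), with_notes)
conn.close()
```

The relational table answers aggregate questions with one query because every row has the same columns. The document records can carry extra fields such as `notes` without any schema change, at the cost of checking for missing fields when reading.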
Data Analysis
Flow of data based on its type to create insights
Descriptive statistics
• Categorical: calculate frequency, distribution
• Ordinal: calculate mode
• Interval-ratio / continuous: calculate mean, median, SD
Does the variable vary?
• No: report no change
• Yes: report table, pie chart, bar chart
Inferential statistics
T-Test | Chi-Squared | Correlation | OLS Regression | Logistic Regression
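As one example of the inferential tests named in the flow, the sketch below computes a Pearson chi-squared statistic by hand for a 2x2 table. It is not taken from the deck. The counts are invented, the function name is hypothetical, no continuity correction is applied, and the decision compares against 3.841, the standard critical value for one degree of freedom at alpha = 0.05.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Made-up counts for illustration: rows = high / low exposure,
# columns = diagnosis present / absent.
observed = [[30, 20],
            [15, 35]]
stat = chi_squared_statistic(observed)

# 3.841 is the standard chi-squared critical value for df = 1, alpha = 0.05.
CRITICAL_DF1_ALPHA_05 = 3.841
decision = "reject H0" if stat > CRITICAL_DF1_ALPHA_05 else "fail to reject H0"
print(round(stat, 3), decision)
```

With these invented counts the statistic is about 9.09, which exceeds the critical value, so the null hypothesis of no association would be rejected at the 0.05 level.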
Data Analysis
Exploratory Data Analysis
Descriptive statistics on Health Risk, Emission level, Exposure duration, and Age
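A minimal sketch of such a descriptive summary, using only Python's standard library, might look as follows. The records are invented, the field names are hypothetical, and the measurement level assigned to each variable is an assumption. The choice of statistic per level follows the data-analysis flow described earlier in the deck.

```python
import statistics
from collections import Counter

# Hypothetical records; the values are invented for illustration.
records = [
    {"health_risk": "low", "emission_level": 2.1, "exposure_years": 3, "age": 29},
    {"health_risk": "high", "emission_level": 5.4, "exposure_years": 12, "age": 45},
    {"health_risk": "medium", "emission_level": 3.8, "exposure_years": 7, "age": 38},
    {"health_risk": "high", "emission_level": 6.0, "exposure_years": 15, "age": 51},
]

# Assumed measurement level of each variable.
LEVELS = {
    "health_risk": "ordinal",
    "emission_level": "continuous",
    "exposure_years": "continuous",
    "age": "continuous",
}


def describe(values, level):
    """Return the summary suited to a given measurement level."""
    if level == "categorical":
        return {"frequency": dict(Counter(values))}
    if level == "ordinal":
        return {"mode": statistics.mode(values), "frequency": dict(Counter(values))}
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
    }


summary = {
    name: describe([r[name] for r in records], level)
    for name, level in LEVELS.items()
}
for name, stats in summary.items():
    print(name, stats)
```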
Data Reporting
The most common data reporting formats in business are as follows:
• Research report
• Executive summary
• Short answers
• Slide presentation
• White paper
Summary
Basic research design consists of six core steps:
• Develop a good research question, identifying a small section of a wider topic that is worth exploring.
• Choose a logical structure for the research.
• Identify the type of data needed.
• Select a data collection method.
• Choose the data collection site, that is, the data source.
• The research question, the type of data, and the data collection method together lead us to the correct data analysis method to use.
Ethics in Data Science
• Informed consent: A detailed informed consent form will state the scope of the research and a transparent method, and only the required information will be collected.
• Maximize benefit: When accessing first responders' information, utmost care will be given to maximize benefits and minimize harm.
• Enhance wellbeing: For the most part, this research should enable interventions that are designed solely to enhance the mental well-being of an individual firefighter or subject and that have a reasonable expectation of success.
• Equal treatment: All participants will get equal treatment, and every measurement will be analyzed with the same method without any bias.
• Risk vs benefit: The assessment of risks and benefits requires a careful collection of relevant data, including any alternate ways of obtaining the benefits sought in the research.

Principles of data_science

  • 2. Principles of Data science Essential steps 1. ResearchTopic 2. Research Question 3. Hypothesis 4.Data collection plan 5. Data analysis 6.Data Reporting Research Question Hypothesis Experiment/ Data collection plan Data Analysis Conclusion/ Data Reporting Replication
  • 3. Principles of Data science Research Topic Example: First responders long term health is at risk when involved in combating wildfire for several years. Can monitoring individual emission exposure, help manage long term health risks and extend their active life? A problem or a need statement with a broad area of interest Majority of First responders suffer from Cardiac Arrest andTrauma
  • 4. Principles of Data science Research Question A clearly articulated list of specific research question will define the data types required to collect. Example: RQ1. Are toxic emissions negatively associated with long-term health? RQ2.Are the current data collection measures, useful in monitoring the individual emission burden? RQ3. Are the current methods of Health risk assessments accurate?
  • 5. Principles of Data science Hypothesis Example:  Ho3: Current methods of Health risk assessments are effective.  Ha3: Current methods of Health risk assessments are not sufficient. H0: null hypothesis is a general statement or default position that there is no relationship between two measured phenomena, or no association among groups. Ha: The alternative hypothesis is the hypothesis used in hypothesis testing that is contrary to the null hypothesis. H0 Ha
  • 6. Principles of Data science Hypothesis What is  Type I error  Type II error Hypothesis Ho: Current Health Risk Assessments are effective in associating to toxic emission (isTrue) Ho: Current Health Risk Assessments are effective in associating to toxic emission (is False) Reject Ho TYPE I Error Correct Conclusion (p < 0.05) Fail to Reject Ho Correct Conclusion (p >= 0.05) Type II Error For Example:
  • 7. Principles of Data science Data Collection Plan Type of Data 1. Act, Behavior, or Events 2. Economic data 3. Organizational data 4. Demographic data 5. Self-identity 6. Cultural knowledge 7. Expert knowledge 8. Personal and psychological traits 9. Hidden social patters Data Location Operational Definition
  • 8. Principles of Data science Data Collection Plan Dataset Who What Why Where When Firefighters Dataset Firefighters research associate Wildfire events and firefighter’s data To assess the emission exposure The National Institute for Occupational Safety and Health (NIOSH) For the period 2008 to 2018 Health Report Dataset Health report research associate Firefighters health records To capture the disease diagnosis Search Firefighter fatalities in the United States For the period 2008 to 2018 Data Collection plan for Firefighters dataset
  • 9. Principles of Data science Data Collection Plan Sampling techniques  Simple random sample  Clustered sampling  Representative subgroup sampling Possible sources of uncertainty  Sampling Error  Researcher Bias  Validity of Instrument
  • 10. Data Management: Themes of concern in big data: growing data; real-time data can be complex; data security. Types of data storage (key differences): SQL: relational, tabular format; schema is essential; grows vertically. NoSQL: unstructured or semi-structured; no schema; grows horizontally. Examples of SQL databases: MySQL, Oracle, SQLite, Postgres, and MS-SQL. Examples of NoSQL databases: MongoDB, BigTable, Redis, RavenDB, Cassandra, HBase, Neo4j, and CouchDB.
  • 11. Data Analysis: [Flowchart: flow of data, based on its type, to create insights. Data types: categorical, ordinal, interval-ratio/continuous. Descriptive statistics: calculate frequency and distribution; calculate mode; calculate mean, median, and SD. Decision: does the data vary? No: report no change. Yes: inferential statistics (t-test, chi-squared, correlation, OLS regression, logistic regression). Report: table, pie chart, bar chart.]
  • 12. Data Analysis: Exploratory Data Analysis
  • 13. Data Analysis: Exploratory Data Analysis. Descriptive statistics on health risk, emission level, exposure duration, and age.
  • 14. Data Reporting: The most common data reporting formats in business are as follows: Research Report, Executive Summary, Short Answers, Slide Presentation, White Paper.
  • 15. Summary: Basic research design consists of six core steps: Develop a good research question, identifying a small section of a wider topic that is worth exploring. Choose a logical structure for the research. Identify the type of data needed. Select a data collection method. Choose the data collection site, the data source. The research question, the type of data, and the data collection method together lead to the correct data analysis method to use.
  • 16. Ethics in Data Science: Informed consent: A detailed informed consent form will state the scope of the research and a transparent method, and only the required information will be collected. Maximize benefit: When accessing the first responders' information, utmost care will be taken to maximize benefits and minimize harm. Enhance well-being: For the most part, this research should enable interventions that are designed solely to enhance the mental well-being of an individual firefighter or subject and that have a reasonable expectation of success. Equal treatment: All participants will get equal treatment, and every measurement will be analyzed with the same method without any bias. Risk vs. benefit: The assessment of risks and benefits requires a careful collection of relevant data, including any alternate ways of obtaining the benefits sought in the research.

Editor's Notes

  1. This slide deck was created to demonstrate my learnings in this course and some of the interesting observations are included to show my level of understanding.
  2. The essential steps of data science research are discussed in this presentation. Together, the six steps ensure that all critical elements are considered in the research process and give other researchers a clear view of the work. The six steps are: Research Topic: describes a problem or need statement. Research Question: a precise list of questions that directly gives clues about the data type, unit of measure, and data source. Hypothesis: clearly defines the relationship between the variables, starting from the baseline assumption that there is no relationship between the independent variable and the dependent variable. Data Collection Plan: a suitable method of collecting the data by following the right sampling methods. Data Analysis: descriptive and inferential statistics performed on the collected data. Data Reporting: discusses various reporting techniques for varying levels of audiences.
  3. In recent decades, the Western United States has seen heightened wildfire activity, characterized by a higher frequency of massive wildfires, a more extended fire season, larger fire size, and a higher total area burned. With projected temperature increases, soil moisture reduction, and more frequent air stagnation, the burden of wildfires on air quality, public health, and environmental management will likely increase. With state-of-the-art wearable sensors, AI models, and detailed health information, we propose to investigate the impacts of historical and future wildfires on first responders' long-term health risks.
  4. RQ1. Are toxic emissions negatively associated with long-term health? Study the levels of toxic emissions from past wildfire events and map them to the health records of the first responders to identify any correlation between the datasets. What are the health risks associated with this occupation? RQ2. Are the current data collection measures useful in monitoring the individual emission burden? Study the current data collection methods and evaluate their effectiveness in monitoring an individual firefighter's emission burden. Establish how well the current methods calculate the duration of the emission burden. RQ3. Are the current methods of health risk assessment accurate? What are the different methods used in calculating the health risks, and how is a specific toxic emission associated with a health risk? What are the thresholds of the emission burden?
  5. Ho1: Toxic emissions do not affect the long-term health risk. Ha1: Toxic emissions have a negative association with the health risk. Ho2: Current data collection methods are not effective in calculating the individual emission burden of the firefighters. Ha2: Current data collection methods are useful in calculating the individual emission burden. Ho3: Current methods of health risk assessment are not accurate. Ha3: Current methods of health risk assessment are accurate.
  6. A Type I error is the rejection of a true null hypothesis. A Type II error is the failure to reject a false null hypothesis. In the example, the p-value is > 0.05; the firefighters with longer hours of work in toxic emissions had a higher incidence of health disorders. The null hypothesis was therefore not rejected, which is consistent with the conclusion that the methods of health risk assessment are effective in associating with the toxic emission.
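The decision rule and the two error types from note 6 can be sketched as follows. This is a minimal illustration, assuming a significance level of 0.05 as on the slide; the function names `decide` and `classify_outcome` are hypothetical and not part of the original deck.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for a p-value at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # Reject only when p is strictly below alpha; otherwise fail to reject.
    return "reject H0" if p_value < alpha else "fail to reject H0"


def classify_outcome(h0_is_true: bool, decision: str) -> str:
    """Label a decision as correct, a Type I error, or a Type II error."""
    rejected = decision == "reject H0"
    if h0_is_true and rejected:
        return "Type I error"       # rejected a true null hypothesis
    if not h0_is_true and not rejected:
        return "Type II error"      # failed to reject a false null hypothesis
    return "correct conclusion"
```

In practice the truth of Ho is unknown, so `classify_outcome` is only useful for reasoning about the table on slide 6 or for simulations where the truth is fixed in advance.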
  7. Based on the formulated research questions, a retrospective analysis of wildfire events over the last ten years and an anonymized list of firefighters' health records are required. Both quantitative and qualitative data need to be collected for specific wildfire events: duration of containment, level of emission, type of sensor used, firefighters' ages, shift schedules, reported injuries, and pre-existing conditions. Longer-term details from health records also need to be collected: hospital visits, insurance claims information, medicine prescription information, diagnosis dates, and diagnosis details. From the data source, a set of vital information will be extracted for each wildfire event. An event is a specific wildfire incident that burned more than 1,000 acres or produced significant structural damage or loss of life. Exposed-days is the number of days each firefighter worked in a job or at a location with the potential for exposure; it will be derived from the employment date and event date. Fire-runs is the total number of fire-runs made by each firefighter; it will be derived from the event date per event. Fire-hours is the total time spent at fires by each firefighter; it will be derived from the exposed hours per day. The individual emission burden is the total duration of individual emission exposure. The daily emission burden is the hours of emission burden in a day, where a day is 24 hours, starting at 00:00 and ending at 23:59. The emission burden per event is the sum of the daily emission burdens for that event. Level of toxicity is a qualitative assessment based on the pollutants, in six levels: Good, Moderate, Unhealthy for Sensitive Groups, Unhealthy, Very Unhealthy, and Hazardous. The first noted date is the date on which a specific disease condition was first diagnosed. The disease condition is the actual finding of the disease state and its stage.
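The emission-burden definitions in note 7 can be sketched as simple aggregations. This is an illustrative sketch only: the record layout, the identifiers such as `FF-01` and `EV-A`, and the function names are assumptions made for the example, not part of the original datasets.

```python
from collections import defaultdict

# Hypothetical records: (firefighter_id, event_id, date, exposed_hours_that_day)
records = [
    ("FF-01", "EV-A", "2018-07-01", 6.0),
    ("FF-01", "EV-A", "2018-07-02", 8.5),
    ("FF-01", "EV-B", "2018-08-10", 4.0),
    ("FF-02", "EV-A", "2018-07-01", 7.0),
]


def burden_per_event(rows):
    """Sum the daily emission burden (hours) per firefighter and event."""
    totals = defaultdict(float)
    for firefighter, event, _day, hours in rows:
        # A daily burden cannot exceed the 24 hours in a day.
        if not 0.0 <= hours <= 24.0:
            raise ValueError(f"daily burden out of range: {hours}")
        totals[(firefighter, event)] += hours
    return dict(totals)


def individual_burden(rows):
    """Total duration of emission exposure per firefighter across all events."""
    totals = defaultdict(float)
    for (firefighter, _event), hours in burden_per_event(rows).items():
        totals[firefighter] += hours
    return dict(totals)


per_event = burden_per_event(records)
per_person = individual_burden(records)
```

Keeping the per-event totals as an intermediate step mirrors the note's definitions, where the individual burden is built up from daily and per-event burdens.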
  8. Both data sources will be quantitatively analyzed using the two main methods, observations and questionnaires. Careful observation of the types of emission exposure, and quantification of its duration for each of the combating firefighters, is important. Questionnaires will be developed to assess the emission levels at the event locations. Each identified disease condition and its first noted date will be collected per firefighter. The worsening of a disease condition will be qualitatively assessed from periodic health screening reports, based on its progress. The exposure assessment will be conducted by researchers who are blinded to healthcare reports, to reduce the likelihood of information bias in the subsequent analyses. The table below shows the high-level plan of who, what, why, where, and when for data collection.
  9. While descriptive statistics and inferential statistics are vital for quantitative analysis, careful sample selection is needed to make a meaningful inference about the population statistics. The document discusses various sampling methods and reviews their relation to the population statistic. Each sampling technique was reviewed, along with my level of confidence in how well each sample mean approximates the population mean.
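The three sampling techniques named on slide 9 can be sketched with the standard library. This is a minimal sketch under stated assumptions: "representative subgroup sampling" is interpreted here as proportional stratified sampling, and the crew names, member IDs, and function names are hypothetical.

```python
import random


def simple_random_sample(population, n, seed=0):
    """Draw n members uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def cluster_sample(clusters, n_clusters, seed=0):
    """Pick whole clusters at random and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


def stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction from every subgroup (at least one member each)."""
    rng = random.Random(seed)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical crews of firefighter IDs, used as clusters and as strata.
crews = {
    "crew_a": [f"A{i}" for i in range(10)],
    "crew_b": [f"B{i}" for i in range(20)],
}
everyone = [member for crew in crews.values() for member in crew]
```

A fixed seed is used only to make the sketch repeatable; a real study would document its randomization procedure separately.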
  10. The era of big data has resulted in the development and application of technologies and methods aimed at effectively using massive amounts of data to support decision-making and knowledge discovery activities. In this paper, the five Vs of big data (volume, velocity, variety, veracity, and value) are reviewed, as well as new technologies, including NoSQL databases, that have emerged to accommodate the needs of big data initiatives. Both SQL and NoSQL databases have their applications, depending on the development requirements.
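The schema difference between the two storage types can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module for the SQL side and plain JSON documents as a stand-in for a document store; no real NoSQL database is used, and the table and field names are hypothetical.

```python
import json
import sqlite3

# SQL side: the schema is declared up front and enforced on every insert.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exposure ("
    "firefighter_id TEXT NOT NULL, event_id TEXT NOT NULL, hours REAL NOT NULL)"
)
conn.execute("INSERT INTO exposure VALUES (?, ?, ?)", ("FF-01", "EV-A", 6.0))

try:
    # Missing hours violates the NOT NULL constraint, so the row is refused.
    conn.execute("INSERT INTO exposure VALUES (?, ?, ?)", ("FF-02", "EV-A", None))
    schema_enforced = False
except sqlite3.IntegrityError:
    schema_enforced = True

row_count = conn.execute("SELECT COUNT(*) FROM exposure").fetchone()[0]
conn.close()

# Document side: each record carries its own fields; nothing enforces a shape.
documents = [
    json.dumps({"firefighter_id": "FF-01", "event_id": "EV-A", "hours": 6.0}),
    json.dumps({"firefighter_id": "FF-02", "sensor": "wearable", "notes": "no hours logged"}),
]
fields_per_document = [sorted(json.loads(doc)) for doc in documents]
```

The trade-off shown is the one on slide 10: the relational table rejects an incomplete row, while the documents accept differing fields and leave validation to the application.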
  11. The datasets have a combination of continuous data, discrete numerical data, geospatial data, and categorical data types. A standard frequency of data aggregation will be determined before the analysis to calculate daily emission exposure, emission exposure per wildfire event, and emission exposure for the entire career. Establish the mean and median levels of emission exposure and the corresponding clinical diagnoses. Develop an unsupervised clustering of the dataset based on similar emission exposure and associated health risks. The analysis will help determine an emission exposure threshold that can be used to manage the health risk proactively and to develop recommendations on care pathways. The infographic shows a typical path of different data types in a research activity. From categorical data, we can calculate the frequency and mode before applying a chi-squared test, or a logistic regression in a classification scenario. From interval-ratio or continuous data, we can calculate the mean, median, and standard deviation to see if there is any variation, and fail to reject the null hypothesis in the case of no variation. Several options are available for continuous data based on the spread and kurtosis. Finally, an appropriate method of visualization can be used to view and communicate the behavior of the data.
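The path from data type to descriptive statistics and candidate tests described in note 11 can be sketched as a small dispatch. This is an illustration only: the `describe` function and the `CANDIDATE_TESTS` mapping are hypothetical names, and the ordinal branch is left out because the note does not spell it out.

```python
from collections import Counter
from statistics import mean, median, stdev

# Candidate inferential tests per data type, as listed in the note and slide.
CANDIDATE_TESTS = {
    "categorical": ("chi-squared test", "logistic regression"),
    "continuous": ("t-test", "correlation", "OLS regression"),
}


def describe(values, kind):
    """Compute the descriptive statistics suited to the given data type."""
    if kind == "categorical":
        counts = Counter(values)
        # most_common(1) returns the most frequent category and its count.
        return {"frequency": dict(counts), "mode": counts.most_common(1)[0][0]}
    if kind == "continuous":
        return {"mean": mean(values), "median": median(values), "sd": stdev(values)}
    raise ValueError(f"unsupported data type: {kind}")


toxicity = describe(["Good", "Moderate", "Good"], "categorical")
exposure = describe([2.0, 4.0, 6.0], "continuous")
```

The descriptive step comes first on purpose: its output (for example, whether the values vary at all) informs whether an inferential test is worth running.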
  12. Firefighter's Age: a continuous variable of type float, the firefighter's age rounded to the nearest month. Measure of central tendency: the mean and median for this sample are close to each other, which suggests a roughly symmetric distribution; the mean is the balancing point of the data, that is, the average. Since all values are unique for this sample, there is no mode. Measure of spread: the range, the difference between the minimum and maximum values, shows the dispersion, but when outliers are present it does not clearly indicate the spread. The standard deviation measures how far values typically lie from the mean. In general, for larger sample sizes, the sampling distribution of the mean approaches a normal distribution.
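The points in note 12 about the mode of unique values and the sensitivity of the range to outliers can be checked with a short sketch. The ages below are made up for illustration and are not taken from the firefighters dataset; `mode_or_none` and `spread` are hypothetical helper names.

```python
from collections import Counter
from statistics import mean, median, stdev


def mode_or_none(values):
    """Return the most frequent value, or None when every value is unique."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > 1 else None


def spread(values):
    """Return the range and the sample standard deviation."""
    return {"range": max(values) - min(values), "sd": stdev(values)}


# Hypothetical ages in years, all unique.
ages = [29.5, 31.25, 33.0, 34.75, 36.5]
ages_with_outlier = ages + [62.0]

center = {"mean": mean(ages), "median": median(ages), "mode": mode_or_none(ages)}
base_spread = spread(ages)
outlier_spread = spread(ages_with_outlier)
```

A single outlier stretches the range from the width of the main group to the distance to the outlier, which is why the note treats the range as a weak indicator of spread on its own.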
  13. There is a huge variation in CO emission, and it shows a strong relationship with the health risk. The duration of exposure varies significantly across the various health risks. Diagnostic Condition: the diagnosis reported by the physician, serving as a qualitative variable describing the state of health. It is a discrete variable used to map the health condition of the firefighter. T
  14. Research Report: the longest and most comprehensive presentation format. Executive Summary: one to two pages providing an overview of the findings with a statement of action items. Short Answers: a statement of action items. Slide Presentation: designed for an oral presentation that provides some context of the research, the findings, and the action items. White Paper: a short report that describes the research and findings, the action items, and how they relate to other needs and broader findings in the research area.
  15. Develop recommendations, possibly a wearable sensor built to collect and manage emission exposure effectively on an individual basis. Develop an AI/ML model to identify at-risk firefighters early on, so that health conditions can be managed effectively. After data collection, both datasets require careful mapping of independent variables to identify positive or negative correlations and to study the impact on overall health risk retrospectively.
  16. Last but not least, 45 CFR 46 and the Belmont Report summarize the ethical principles identified by the commission in the context of its deliberations. Scientific research has produced substantial social benefits. It has also posed some troubling ethical questions. The code consists of rules, some general and others specific, that guide investigators and reviewers of research in their work. It was depressing to read how some early research participants were treated unethically, and it helps us learn a systematic method for not repeating unfair practices. All interested citizens, including scientists, research subjects, and reviewers, will be trained on the research scope and the extent of data collection required for this analysis. The main objective is to follow an analytical framework that will guide the resolution of ethical problems arising from research involving firefighters' health reports. However, some of the firefighters may not be capable of self-determination, or the capacity for self-determination may mature during research participation, and some participants may not be in a position to assess their liberty due to illness. Subjects are to be treated in an ethical manner, not only by respecting their decisions but also by protecting them from harm and securing their well-being. Just as the principle of respect for persons finds expression in the requirement for informed consent, and the principle of beneficence in risk/benefit assessment, the principle of justice gives rise to moral requirements that there be fair procedures and outcomes in the selection of firefighters' event and health histories.