Principles of Data Science

Vasanth Thirugnanam

Essential steps
1. Research Topic
2. Research Question
3. Hypothesis
4. Data Collection Plan
5. Data Analysis
6. Data Reporting
Workflow: Research Question → Hypothesis → Experiment / Data Collection Plan → Data Analysis → Conclusion / Data Reporting → Replication

Research Topic
A problem or need statement within a broad area of interest.
Example:
The long-term health of first responders is at risk when they combat wildfires for several years.
The majority of first responders suffer from cardiac arrest and trauma.
Can monitoring individual emission exposure help manage long-term health risks and extend their active life?

Research Question
A clearly articulated list of specific research questions defines the types of data that must be collected.
Example:
RQ1. Are toxic emissions negatively associated with long-term health?
RQ2. Are the current data collection measures useful in monitoring the individual emission burden?
RQ3. Are the current methods of health risk assessment accurate?

Hypothesis
H0: The null hypothesis is a general statement or default position that there is no relationship between two measured phenomena, or no association among groups.
Ha: The alternative hypothesis is the hypothesis used in hypothesis testing that is contrary to the null hypothesis.
Example:
• H0 (RQ3): Current methods of health risk assessment are effective.
• Ha (RQ3): Current methods of health risk assessment are not sufficient.
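As a rough illustration of how H0 and Ha are weighed against data, the sketch below runs a two-sided permutation test on the difference between two group means. The permutation test is a stand-in chosen because it needs only the Python standard library; the slides do not say which test the author used. The function names, the group names, and the scores are hypothetical and invented for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are interchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def decide(p_value, alpha=0.05):
    """Apply the usual decision rule at significance level alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Made-up illustrative scores, not real measurements.
low_exposure = [1, 2, 3, 4, 5, 6, 7, 8]
high_exposure = [11, 12, 13, 14, 15, 16, 17, 18]

p = permutation_p_value(low_exposure, high_exposure)
print(decide(p))
```

A small p-value means a difference this large would be unusual if H0 were true, which is the basis for rejecting H0 in favor of Ha.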

Hypothesis
What are Type I and Type II errors?
• Type I error: rejecting H0 when H0 is actually true.
• Type II error: failing to reject H0 when H0 is actually false.
For example, with H0: Current health risk assessments are effective in associating health risk with toxic emissions.
Decision | H0 is true | H0 is false
Reject H0 (p < 0.05) | Type I error | Correct conclusion
Fail to reject H0 (p >= 0.05) | Correct conclusion | Type II error
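The decision table above can be written as a small lookup. This is a minimal sketch: the function name `classify_outcome` is hypothetical, and in practice the true state of H0 is unknown, so the function only helps when reasoning about the four possible cases.

```python
def classify_outcome(p_value, h0_is_true, alpha=0.05):
    """Name the outcome of a test given the p-value and the true state of H0."""
    reject = p_value < alpha  # the slide's rule: reject H0 when p < 0.05
    if reject and h0_is_true:
        return "Type I error"  # rejected a true H0
    if not reject and not h0_is_true:
        return "Type II error"  # failed to reject a false H0
    return "Correct conclusion"


for p_value, h0_is_true in [(0.01, True), (0.01, False), (0.20, True), (0.20, False)]:
    print(p_value, h0_is_true, classify_outcome(p_value, h0_is_true))
```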

Data Collection Plan
Type of Data
1. Acts, behaviors, or events
2. Economic data
3. Organizational data
4. Demographic data
5. Self-identity
6. Cultural knowledge
7. Expert knowledge
8. Personal and psychological traits
9. Hidden social patterns
Data Location
Operational Definition

Data Collection Plan
Data collection plan for Firefighters dataset
Dataset | Who | What | Why | Where | When
Firefighters Dataset | Firefighters research associate | Wildfire events and firefighters' data | To assess the emission exposure | The National Institute for Occupational Safety and Health (NIOSH) | For the period 2008 to 2018
Health Report Dataset | Health report research associate | Firefighters' health records | To capture the disease diagnosis | Search Firefighter fatalities in the United States | For the period 2008 to 2018

Data Collection Plan
Sampling techniques
• Simple random sample
• Cluster sampling
• Representative subgroup sampling
Possible sources of uncertainty
• Sampling error
• Researcher bias
• Validity of instrument
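The three sampling techniques listed above can be sketched with the standard `random` module. The record fields (`id`, `station`, `age_band`), the function names, and the population itself are hypothetical and exist only for illustration; "representative subgroup sampling" is interpreted here as drawing the same fraction from every subgroup, which is an assumption.

```python
import random
from collections import defaultdict


def simple_random_sample(population, k, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(population, k)


def cluster_sample(population, cluster_of, n_clusters, rng):
    """Pick whole clusters at random and keep every unit inside them."""
    clusters = defaultdict(list)
    for unit in population:
        clusters[cluster_of(unit)].append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


def subgroup_sample(population, group_of, fraction, rng):
    """Draw the same fraction from each subgroup so all are represented."""
    groups = defaultdict(list)
    for unit in population:
        groups[group_of(unit)].append(unit)
    sample = []
    for name in sorted(groups):
        members = groups[name]
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population of 40 records: 4 stations, 2 age bands.
population = [
    {"id": i, "station": f"S{i % 4}", "age_band": "under_40" if i < 20 else "40_plus"}
    for i in range(40)
]
rng = random.Random(42)  # fixed seed so the sketch is repeatable

srs = simple_random_sample(population, 10, rng)
by_station = cluster_sample(population, lambda u: u["station"], 2, rng)
by_age = subgroup_sample(population, lambda u: u["age_band"], 0.25, rng)
print(len(srs), len(by_station), len(by_age))
```

Cluster sampling is cheaper when units are grouped by location, while subgroup sampling guards against a small group being missed by chance.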

Data Management
Themes of concern in big data
• Growing data
• Real-time data can be complex
• Data security
Types of data storage (key differences)
SQL
• Relational, tabular format
• Schema is essential
• Grows vertically
NoSQL
• Unstructured or semi-structured
• No schema
• Grows horizontally
Examples of SQL databases: MySQL, Oracle, SQLite, PostgreSQL, and MS SQL Server.
Examples of NoSQL databases: MongoDB, BigTable, Redis, RavenDB, Cassandra, HBase, Neo4j, and CouchDB.
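The schema difference described above can be seen in a few lines. The sketch uses SQLite through Python's built-in `sqlite3` module for the SQL side, and a plain list of JSON documents as a stand-in for a document store; it does not use the API of MongoDB or any other NoSQL product. The table name, column names, and records are hypothetical.

```python
import json
import sqlite3

# SQL side: the table schema must exist before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE firefighter ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, years_active INTEGER NOT NULL)"
)
conn.execute(
    "INSERT INTO firefighter (name, years_active) VALUES (?, ?)",
    ("A. Example", 12),
)

# A row with a column the schema does not define is rejected.
try:
    conn.execute(
        "INSERT INTO firefighter (name, helmet_size) VALUES (?, ?)",
        ("B. Example", "L"),
    )
    schema_enforced = False
except sqlite3.OperationalError:
    schema_enforced = True

row_count = conn.execute("SELECT COUNT(*) FROM firefighter").fetchone()[0]

# Document side: each record carries its own fields, with no shared schema.
documents = [
    {"name": "A. Example", "years_active": 12},
    {"name": "B. Example", "helmet_size": "L", "certifications": ["hazmat"]},
]
stored = [json.dumps(doc) for doc in documents]
fields = [sorted(json.loads(text)) for text in stored]

print(schema_enforced, row_count, fields)
```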

Data Analysis
Flow of data, based on its type, to create insights
Descriptive statistics
• Categorical: calculate frequency and distribution
• Ordinal: calculate mode
• Interval-ratio / continuous: calculate mean, median, and SD
Does the data vary?
• No: report no change
• Yes: continue to inferential statistics
Inferential statistics
T-Test | Chi-Squared | Correlation | OLS Regression | Logistic Regression
Report: table, pie chart, bar chart
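The descriptive half of this flow can be sketched with the standard `statistics` module: the measurement level of a variable decides which summary is computed. The function names `describe` and `varies`, the level labels, and the sample values are hypothetical and made up for illustration.

```python
from collections import Counter
from statistics import mean, median, mode, stdev


def describe(values, level):
    """Return the descriptive statistics that suit the measurement level."""
    if level == "categorical":
        counts = Counter(values)
        total = len(values)
        return {
            "frequency": dict(counts),
            "distribution": {key: count / total for key, count in counts.items()},
        }
    if level == "ordinal":
        return {"mode": mode(values)}
    if level == "continuous":
        # stdev is the sample standard deviation (n - 1 in the denominator).
        return {"mean": mean(values), "median": median(values), "sd": stdev(values)}
    raise ValueError(f"unknown measurement level: {level}")


def varies(values):
    """A variable with a single distinct value gives inference nothing to test."""
    return len(set(values)) > 1


# Made-up illustrative values, not real measurements.
print(describe(["smoke", "dust", "smoke", "smoke"], "categorical"))
print(describe([1, 2, 2, 3], "ordinal"))
print(describe([2, 4, 4, 6], "continuous"))
print(varies([5, 5, 5]))
```

When `varies` returns False there is no change to report; otherwise the variable can move on to one of the inferential methods named in the flow.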

Data Analysis
Exploratory Data Analysis
Descriptive statistics on health risk, emission level, exposure duration, and age

Data Reporting
The most common data reporting formats in business are as follows:
• Research Report
• Executive Summary
• Short Answers
• Slide Presentation
• White Paper

Summary
Basic research design consists of six core steps:
• Develop a good research question by identifying a small section of a wider topic that is worth exploring.
• Choose a logical structure for the research.
• Identify the type of data needed.
• Select a data collection method.
• Choose the data collection site, that is, the data source.
• The research question, the type of data, and the data collection method together lead us to the correct data analysis method.

Ethics in Data Science
• Informed consent: A detailed informed consent form will state the scope of the research and describe the method transparently; only the required information will be collected.
• Maximize benefit: When accessing first responders' information, utmost care will be taken to maximize benefits and minimize harm.
• Enhance well-being: For the most part, this research should enable interventions that are designed solely to enhance the mental well-being of an individual firefighter or subject and that have a reasonable expectation of success.
• Equal treatment: All participants will receive equal treatment, and every measurement will be analyzed with the same method, without bias.
• Risk vs. benefit: The assessment of risks and benefits requires a careful collection of relevant data, including any alternative ways of obtaining the benefits sought in the research.
