This document provides an overview of data science including:
- Definitions of data science and the motivations for its increasing importance due to factors like big data, cloud computing, and the internet of things.
- The key skills required of data scientists and an overview of the data science process.
- Descriptions of different types of databases like relational, NoSQL, and data warehouses versus data lakes.
- An introduction to machine learning, data mining, and data visualization.
- Details on courses for learning data science.
Intro to Data Science by DatalentTeam at Data Science Clinic#11
1. Introduction to Data Science
@Data Science Clinic #11
6-Jun-2017
All Season Place
Dr. Sotarat Thammaboosadee
@DatalentTeam
2. Agenda
• What is Data Science?
• Motivation
• Data Scientist’s Skill
• Data Science Process
• Relational Database vs NoSQL Database
• Data Warehouse vs Data Lake
• AI / Machine Learning / Data Mining
• Data Visualization
• Courses
7. Datalent Team Member
• Wichian Boonyaprapa (Aod)
– Business Analyst and Intelligence
• Siriraj Hospital
• Chanwit Onsumran (Earth)
– Financial Analyst
• Mahidol University
8. Datalent Team Member
• Teerapat Kansadub (Champ)
– Data Engineer
• Faculty of Physical Therapy, Mahidol University
• Chanon Srisuwan (Fern)
– Process Engineer
• Ramathibodi Hospital
10. Datalent Team: Data Talent Development Research Group
• Website: http://www.datalentteam.com
11. What is Data Science?
• Data Science
– is the study of the generalizable extraction
of knowledge from data (Wikipedia)
– is getting predictive and/or actionable
insight from data (Neil Raden)
– involves extracting, creating, and processing
data to turn it into business value (Vincent
Granville)
12. What’s new?
• Data science is not new; it is just modernizing existing
reporting, analytics, data warehousing, business intelligence,
and even data management solutions. (Jothi Periasamy)
• So, Data science is…
– New thinking
– New ideas
– New data source
– New data structure
– New data architecture
– New data processing mechanism
– New innovation on data
– New way of solving problems
14. Motivation: Why data science now?
http://datascienceth.com/why-data-science-now/
45. Six Types of Databases
• Relational
• Analytical (OLAP)
• Key-Value
• Column-Family
• Document
• Graph
https://www.slideshare.net/Couchbase/webinar-making-sense-of-nosql-applying-nonrelational-databases-to-business-needs
46. Relational
• Data is usually stored row by row
(row store)
• Standardized query language (SQL)
• Data model defined before you
add data
• Joins merge data from multiple
tables
• Results are tables
• Pros: mature ACID transactions
with fine-grain security controls
• Cons: Requires up front data
modeling, does not scale well
https://www.slideshare.net/Couchbase/webinar-making-sense-of-nosql-applying-nonrelational-databases-to-business-needs
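The points on this slide can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `patient` and `visit` tables and their rows are hypothetical, chosen to show that the schema is declared before any data is added, that a join merges two tables, and that the result is itself a table of rows.

```python
import sqlite3

# In-memory database; table and column names are made up for illustration.
conn = sqlite3.connect(":memory:")

# The data model is defined up front, before any rows exist.
conn.executescript("""
    CREATE TABLE patient (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE visit (
        id         INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES patient(id),
        ward       TEXT NOT NULL
    );
""")

conn.executemany("INSERT INTO patient VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO visit VALUES (?, ?, ?)",
                 [(10, 1, "ER"), (11, 1, "ICU"), (12, 2, "ER")])

# A join merges data from both tables; the result is again a table.
rows = conn.execute("""
    SELECT p.name, v.ward
    FROM patient AS p
    JOIN visit AS v ON v.patient_id = p.id
    ORDER BY v.id
""").fetchall()

for name, ward in rows:
    print(name, ward)
```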
47. Analytical (OLAP)
• Based on "Star" schema with
central fact table for each event
• Optimized for read-only
analysis of historical data
• Uses the MDX language to
query "measures" for
"categories" of data
• Pros: fast queries for large data
• Cons: not optimized for
transactions and updates
https://www.slideshare.net/Couchbase/webinar-making-sense-of-nosql-applying-nonrelational-databases-to-business-needs
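A star schema can be approximated in a few lines, again with sqlite3. Note the assumptions: the slide mentions MDX, but this sketch uses plain SQL as a stand-in, and the `fact_sales` and `dim_product` tables are hypothetical. It shows one central fact table (one row per event) whose "measure" (`amount`) is summed per "category" taken from a dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One dimension table and one central fact table (hypothetical names).
conn.executescript("""
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        category   TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        amount     REAL NOT NULL
    );
""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "book"), (2, "toy")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5)])

# Aggregate the measure (amount) for each category of the dimension.
totals = dict(conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
""").fetchall())

print(totals)
```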
48. Key-Value Stores
• Keys used to access opaque
blobs of data
• Values can contain any type
of data (images, video)
• Pros: scalable, simple API
(put, get, delete)
• Cons: no way to query based
on the content of the value
https://www.slideshare.net/Couchbase/webinar-making-sense-of-nosql-applying-nonrelational-databases-to-business-needs
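The put/get/delete API described above is small enough to sketch as a toy in-memory class. The class name `KeyValueStore` and the sample keys are hypothetical; a real key-value store adds persistence and distribution, which are not shown here.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store: values are opaque blobs of bytes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The store never inspects the value; it only files it under the key.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.put("image:42", b"\x89PNG...")   # any bytes: image, video, text
blob = store.get("image:42")
store.delete("image:42")
missing = store.get("image:42")        # None once the key is deleted
```

The API has no method that searches inside values, which is the drawback the slide lists: lookups work only by key.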
49. Column-Family
• Key includes a row, column
family and column name
• Store versioned blobs in one
large table
• Queries can be done on rows,
column families and column
names
• Pros: Good scale out
• Cons: Cannot query blob
content; row and column
designs are critical
Examples: HBase, Cassandra
https://www.slideshare.net/Couchbase/webinar-making-sense-of-nosql-applying-nonrelational-databases-to-business-needs
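The composite key and versioned cells can be illustrated with a toy in-memory table. This is a sketch under assumptions, not the HBase or Cassandra API: the class `ColumnFamilyTable`, its methods, and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

CellKey = Tuple[str, str, str]  # (row, column family, column name)


class ColumnFamilyTable:
    """Toy column-family table: each cell keeps every written version."""

    def __init__(self) -> None:
        self._cells: Dict[CellKey, List[Tuple[int, bytes]]] = defaultdict(list)

    def put(self, row: str, family: str, column: str,
            value: bytes, version: int) -> None:
        self._cells[(row, family, column)].append((version, value))

    def get(self, row: str, family: str, column: str) -> Optional[bytes]:
        versions = self._cells.get((row, family, column))
        if not versions:
            return None
        # Return the value with the highest version number.
        return max(versions, key=lambda pair: pair[0])[1]

    def columns(self, row: str, family: str) -> List[str]:
        # Queries work on row, family, and column names, not blob content.
        return sorted(c for (r, f, c) in self._cells
                      if r == row and f == family)


table = ColumnFamilyTable()
table.put("user1", "profile", "name", b"Ann", version=1)
table.put("user1", "profile", "name", b"Anne", version=2)
table.put("user1", "profile", "city", b"Bangkok", version=1)

latest_name = table.get("user1", "profile", "name")
profile_columns = table.columns("user1", "profile")
```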
50. Graph Store
• Data is stored in a series of nodes
and properties
• Queries are really graph traversals
• Ideal when relationships between
data are key:
– e.g. social networks
• Pros: fast network search, works
with public linked data sets
• Cons: Poor scalability when graphs
don't fit into RAM, specialized
query language
Examples: Neo4j, AllegroGraph
https://www.slideshare.net/Couchbase/webinar-making-sense-of-nosql-applying-nonrelational-databases-to-business-needs
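"Queries are really graph traversals" can be shown with a breadth-first search over a small adjacency list. The function `within_hops` and the social network below are hypothetical; a graph database would express the same idea in its own query language.

```python
from collections import deque
from typing import Dict, List, Set


def within_hops(graph: Dict[str, List[str]], start: str,
                max_hops: int) -> Set[str]:
    """Return every node reachable from start in at most max_hops edges."""
    seen = {start}
    reached: Set[str] = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.add(neighbor)
                queue.append((neighbor, depth + 1))
    return reached


# A tiny "follows" network (made-up data).
follows = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": [],
    "dan": ["eve"],
}

friends = within_hops(follows, "ann", 1)
friends_of_friends = within_hops(follows, "ann", 2)
```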
51. Document Store
• Data stored in nested
hierarchies
• Logical data remains stored
together as a unit
• Any item in the document can
be queried
• Pros: No object-relational
mapping layer, ideal for search
• Cons: Complex to implement,
incompatible with SQL
Examples: MongoDB, Couchbase
https://www.slideshare.net/Couchbase/webinar-making-sense-of-nosql-applying-nonrelational-databases-to-business-needs
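The idea that related data stays together in one nested unit, and that any item inside it can be queried, can be sketched with plain Python dictionaries. The helpers `get_path` and `find` and the sample documents are hypothetical; this is not the MongoDB or Couchbase query API.

```python
from typing import Any, Dict, List


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'address.city' into a nested document."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection: List[Dict[str, Any]], path: str,
         value: Any) -> List[Dict[str, Any]]:
    """Return the documents whose field at path equals value."""
    return [doc for doc in collection if get_path(doc, path) == value]


# Each document keeps its logical data together as one nested unit.
patients = [
    {"_id": 1, "name": "Ann",
     "address": {"city": "Bangkok", "zip": "10700"}},
    {"_id": 2, "name": "Bob",
     "address": {"city": "Chiang Mai", "zip": "50200"}},
]

matches = find(patients, "address.city", "Bangkok")
matching_ids = [doc["_id"] for doc in matches]
```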
57. AI / Machine Learning / Data Mining
http://blogs.sas.com/content/subconsciousmusings/2014/08/22/looking-backwards-looking-forwards-sas-data-mining-and-machine-learning/
65. CRISP-DM: Phases
• Business Understanding
– Understand project objectives and requirements; define the data mining problem
• Data Understanding
– Collect initial data and become familiar with it; identify data quality problems
• Data Preparation
– Select tables, records, and attributes; transform and clean the data
• Modeling
– Select and apply modeling techniques; calibrate parameters
• Evaluation
– Evaluate whether the business objectives are achieved and the issues addressed
• Deployment
– Deploy the resulting model; implement a repeatable data mining process
66. Phases and Tasks
• Business Understanding
– Determine Business Objectives
– Assess Situation
– Determine Data Mining Goals
– Produce Project Plan
• Data Understanding
– Collect Initial Data
– Describe Data
– Explore Data
– Verify Data Quality
• Data Preparation
– Select Data
– Clean Data
– Construct Data
– Integrate Data
– Format Data
• Modeling
– Select Modeling Technique
– Generate Test Design
– Build Model
– Assess Model
• Evaluation
– Evaluate Results
– Review Process
– Determine Next Steps
• Deployment
– Plan Deployment
– Plan Monitoring & Maintenance
– Produce Final Report
– Review Project