SlideShare a Scribd company logo
Dr. Windu Gata, M.Kom
Data Science
Profile
Dr. Windu Gata, M.Kom
Internal Trainer Multimatics
IT Consultant since 1995
Lecturer since 2003
Computer Researcher since 2008
Profile
Datawarehouse
Data Science
Datawarehouse dan Data Science
Introduction Data Science
• What and Why Data Science
• Main Role and Data Science
Method
• History and Data Science
Implementation
Introduction Data Science
• Data (noun)
• facts and statistics collected together for reference or analysis
• the quantities, characters, or symbols on which operations are performed by a
computer, which may be stored and transmitted in the form of electrical signals and
recorded on magnetic, optical, or mechanical recording media.
• (philosophy) things known or assumed as facts, making the basis of reasoning or
calculation.
• mid 17th century (as a term in philosophy): from Latin, plural of datum
• Datum (noun)
• a piece of information.
• a fixed starting point of a scale or operation
Introduction Data Science
• Da.ta
• n keterangan yang benar dan nyata; pengumpulan; untuk
memperoleh keterangan tentang kehidupan petani
• n keterangan atau bahan nyata yang dapat dijadikan dasar kajian
(analisis atau kesimpulan)
• nKomp informasi dalam bentuk yang dapat diproses oleh komputer,
seperti representasi digital dari teks, angka, gambar grafis, atau suara
Introduction Data Science
• Da.ta
• n keterangan yang benar dan nyata; pengumpulan; untuk
memperoleh keterangan tentang kehidupan petani
• n keterangan atau bahan nyata yang dapat dijadikan dasar kajian
(analisis atau kesimpulan)
• nKomp informasi dalam bentuk yang dapat diproses oleh komputer,
seperti representasi digital dari teks, angka, gambar grafis, atau suara
Introduction Data Science
Impact
• Data Information Knowledge Insight Wisdom
Introduction Data Science
• Data Information Knowledge Insight Wisdom
Introduction Data Science
• What Data Science
Data science combines math and
statistics, specialized programming,
advanced analytics, artificial
intelligence (AI), and machine
learning with specific subject matter
expertise to uncover actionable insights
hidden in an organization’s data. These
insights can be used to guide decision
making and strategic planning.
What is Data Science | IBM
Introduction Data Science
Data Science
Problem Solver
Introduction Data Science
• Why Data Science
1. Helps In Analyzing Needs & Resources
2. Helps Acquire Customers
3. Improves A Business’ Efficiency &Effectiveness
4. Derives Valuable Insights
5. Can Slash Marketing Costs
6. Helps To Make More Informed Decisions
7. Helping To Solve The Problem Of Big Data
8. Can Help Improve Business Productivity
9. Can Help A Business Expand Internationally
10. Helps A Business Against Competition
11. Can Help Improve Customer Service
12. Enables Personalized Service
13. Enhance The Workplace Experience
14. Applications In Healthcare & Biotech
15. A Part Of Research & Development
15 Reasons Why Data Science Is Important - Curious Desire
CRISP-DM
CRISP-DM
Business Understanding
The Business Understanding phase focuses on understanding
the objectives and requirements of the project
1. Determine business objectives: You should first “thoroughly understand, from a business perspective, what
the customer really wants to accomplish.” and then define business success criteria.
2. Assess situation: Determine resources availability, project requirements, assess risks and contingencies,
and conduct a cost-benefit analysis.
3. Determine data mining goals: In addition to defining the business objectives, you should also define what
success looks like from a technical data mining perspective.
4. Produce project plan: Select technologies and tools and define detailed plans for each project phase.
CRISP-DM
Data Understanding
it drives the focus to identify, collect, and analyze the data sets
that can help you accomplish the project goals
1. Collect initial data: Acquire the necessary data and (if necessary) load it into your analysis tool.
2. Describe data: Examine the data and document its surface properties like data format, number of
records, or field identities.
3. Explore data: Dig deeper into the data. Query it, visualize it, and identify relationships among the data.
4. Verify data quality: How clean/dirty is the data? Document any quality issues.
CRISP-DM
Data Preparation
prepares the final data set(s) for modeling
1. Select data: Determine which data sets will be used and document reasons for inclusion/exclusion.
2. Clean data: Often this is the lengthiest task. Without it, you’ll likely fall victim to garbage-in, garbage-out.
A common practice during this task is to correct, impute, or remove erroneous values.
3. Construct data: Derive new attributes that will be helpful. For example, derive someone’s body mass
index from height and weight fields.
4. Integrate data: Create new data sets by combining data from multiple sources.
5. Format data: Re-format data as necessary. For example, you might convert string values that store
numbers to numeric values so that you can perform mathematical operations.
CRISP-DM
Modeling
build and assess various models based on several different
modeling techniques
1. Select modeling techniques: Determine which algorithms to try (e.g. regression, neural net).
2. Generate test design: Pending your modeling approach, you might need to split the data into training,
test, and validation sets.
3. Build model: As glamorous as this might sound, this might just be executing a few lines of code like “reg
= LinearRegression().fit(X, y)”.
4. Assess model: Generally, multiple models are competing against each other, and the data scientist needs
to interpret the model results based on domain knowledge, the pre-defined success criteria, and the test
design.
Data Mining
CRISP-DM
Evaluation
Whereas the Assess Model task of the Modeling phase focuses
on technical model assessment, the Evaluation phase looks
more broadly at which model best meets the business and what
to do next
1. Evaluate results: Do the models meet the business success criteria? Which one(s) should we approve for
the business?
2. Review process: Review the work accomplished. Was anything overlooked? Were all steps properly
executed? Summarize findings and correct anything if needed.
3. Determine next steps: Based on the previous three tasks, determine whether to proceed to
deployment, iterate further, or initiate new projects.
CRISP-DM
Deployment
A model is not particularly useful unless the customer can
access its results.
1. Plan deployment: Develop and document a plan for deploying the model.
2. Plan monitoring and maintenance: Develop a thorough monitoring and maintenance plan to avoid
issues during the operational phase (or post-project phase) of a model.
3. Produce final report: The project team documents a summary of the project which might include a final
presentation of data mining results.
4. Review project: Conduct a project retrospective about what went well, what could have been better,
and how to improve in the future.
Data Mining Tools
Data Mining -Regression
Price Prediction
Forecasting
Production lead time prediction
Data Mining -Classification
Disease Diagnosis
Fraud Detection
Sentiment Analysis
Predictive Maintenance
Credit Risk Analysis
Etc
Data Mining -Association
Market Basket Analysis
Medicine Basket Analysis
Etc
Data Mining -Clustering
Customer Segmention
Network traffic
Price Risk Clustering
Etc
Data Mining –Data Model
Data Mining –Data Model
Data Science.pdf

More Related Content

Similar to Data Science.pdf

data science.pptx
data science.pptxdata science.pptx
data science.pptx
shaikruhiarsha3zenco
 
MODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptxMODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptx
nikshaikh786
 
Foundational Methodology for Data Science
Foundational Methodology for Data ScienceFoundational Methodology for Data Science
Foundational Methodology for Data ScienceJohn B. Rollins, Ph.D.
 
Data Science in Python.pptx
Data Science in Python.pptxData Science in Python.pptx
Data Science in Python.pptx
Ramakrishna Reddy Bijjam
 
Data Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxData Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptx
sumitkumar600840
 
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdfThe Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
Data Science Council of America
 
Data Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdfData Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdf
mustaq4
 
Introductions to Business Analytics
Introductions to Business Analytics Introductions to Business Analytics
Introductions to Business Analytics
Venkat .P
 
Data science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptxData science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptx
NagarajanG35
 
Lesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptxLesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptx
cloudserviceuit
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introduction
Basma Gamal
 
Business Intelligence
Business IntelligenceBusiness Intelligence
Business Intelligence
Sukirti Garg
 
Module Overview Careers in Analytics In this module, we .docx
Module Overview  Careers in Analytics In this module, we .docxModule Overview  Careers in Analytics In this module, we .docx
Module Overview Careers in Analytics In this module, we .docx
audeleypearl
 
Module Overview Careers in Analytics In this module, we .docx
Module Overview  Careers in Analytics In this module, we .docxModule Overview  Careers in Analytics In this module, we .docx
Module Overview Careers in Analytics In this module, we .docx
roushhsiu
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
KumarNaik21
 
1 UNIT-DSP.pptx
1 UNIT-DSP.pptx1 UNIT-DSP.pptx
1 UNIT-DSP.pptx
PothyeswariPothyes
 
DAT 520 Final Project Guidelines and Rubric Overview .docx
DAT 520 Final Project Guidelines and Rubric  Overview .docxDAT 520 Final Project Guidelines and Rubric  Overview .docx
DAT 520 Final Project Guidelines and Rubric Overview .docx
simonithomas47935
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?
DIGITALSAI1
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
KumarNaik21
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
SayyedYusufali
 

Similar to Data Science.pdf (20)

data science.pptx
data science.pptxdata science.pptx
data science.pptx
 
MODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptxMODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptx
 
Foundational Methodology for Data Science
Foundational Methodology for Data ScienceFoundational Methodology for Data Science
Foundational Methodology for Data Science
 
Data Science in Python.pptx
Data Science in Python.pptxData Science in Python.pptx
Data Science in Python.pptx
 
Data Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxData Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptx
 
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdfThe Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
 
Data Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdfData Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdf
 
Introductions to Business Analytics
Introductions to Business Analytics Introductions to Business Analytics
Introductions to Business Analytics
 
Data science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptxData science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptx
 
Lesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptxLesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptx
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introduction
 
Business Intelligence
Business IntelligenceBusiness Intelligence
Business Intelligence
 
Module Overview Careers in Analytics In this module, we .docx
Module Overview  Careers in Analytics In this module, we .docxModule Overview  Careers in Analytics In this module, we .docx
Module Overview Careers in Analytics In this module, we .docx
 
Module Overview Careers in Analytics In this module, we .docx
Module Overview  Careers in Analytics In this module, we .docxModule Overview  Careers in Analytics In this module, we .docx
Module Overview Careers in Analytics In this module, we .docx
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
 
1 UNIT-DSP.pptx
1 UNIT-DSP.pptx1 UNIT-DSP.pptx
1 UNIT-DSP.pptx
 
DAT 520 Final Project Guidelines and Rubric Overview .docx
DAT 520 Final Project Guidelines and Rubric  Overview .docxDAT 520 Final Project Guidelines and Rubric  Overview .docx
DAT 520 Final Project Guidelines and Rubric Overview .docx
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 

Recently uploaded

Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
2023240532
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 

Recently uploaded (20)

Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 

Data Science.pdf

  • 1. Dr. Windu Gata, M.Kom Data Science
  • 2. Profile Dr. Windu Gata, M.Kom Internal Trainer Multimatics IT Consultant since 1995 Lecturer since 2003 Computer Researcher since 2008
  • 6. Introduction Data Science • What and Why Data Science • Main Role and Data Science Method • History and Data Science Implementation
  • 7. Introduction Data Science • Data (noun) • facts and statistics collected together for reference or analysis • the quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media. • (philosophy) things known or assumed as facts, making the basis of reasoning or calculation. • mid 17th century (as a term in philosophy): from Latin, plural of datum • Datum (noun) • a piece of information. • a fixed starting point of a scale or operation
  • 8. Introduction Data Science • Da.ta • n keterangan yang benar dan nyata; pengumpulan; untuk memperoleh keterangan tentang kehidupan petani • n keterangan atau bahan nyata yang dapat dijadikan dasar kajian (analisis atau kesimpulan) • nKomp informasi dalam bentuk yang dapat diproses oleh komputer, seperti representasi digital dari teks, angka, gambar grafis, atau suara
  • 9. Introduction Data Science • Da.ta • n keterangan yang benar dan nyata; pengumpulan; untuk memperoleh keterangan tentang kehidupan petani • n keterangan atau bahan nyata yang dapat dijadikan dasar kajian (analisis atau kesimpulan) • nKomp informasi dalam bentuk yang dapat diproses oleh komputer, seperti representasi digital dari teks, angka, gambar grafis, atau suara
  • 10. Introduction Data Science Impact • Data Information Knowledge Insight Wisdom
  • 11. Introduction Data Science • Data Information Knowledge Insight Wisdom
  • 12. Introduction Data Science • What Data Science Data science combines math and statistics, specialized programming, advanced analytics, artificial intelligence (AI), and machine learning with specific subject matter expertise to uncover actionable insights hidden in an organization’s data. These insights can be used to guide decision making and strategic planning. What is Data Science | IBM
  • 13. Introduction Data Science Data Science Problem Solver
  • 14. Introduction Data Science • Why Data Science 1. Helps In Analyzing Needs & Resources 2. Helps Acquire Customers 3. Improves A Business’ Efficiency &Effectiveness 4. Derives Valuable Insights 5. Can Slash Marketing Costs 6. Helps To Make More Informed Decisions 7. Helping To Solve The Problem Of Big Data 8. Can Help Improve Business Productivity 9. Can Help A Business Expand Internationally 10. Helps A Business Against Competition 11. Can Help Improve Customer Service 12. Enables Personalized Service 13. Enhance The Workplace Experience 14. Applications In Healthcare & Biotech 15. A Part Of Research & Development 15 Reasons Why Data Science Is Important - Curious Desire
  • 16. CRISP-DM Business Understanding The Business Understanding phase focuses on understanding the objectives and requirements of the project 1. Determine business objectives: You should first “thoroughly understand, from a business perspective, what the customer really wants to accomplish.” and then define business success criteria. 2. Assess situation: Determine resources availability, project requirements, assess risks and contingencies, and conduct a cost-benefit analysis. 3. Determine data mining goals: In addition to defining the business objectives, you should also define what success looks like from a technical data mining perspective. 4. Produce project plan: Select technologies and tools and define detailed plans for each project phase.
  • 17. CRISP-DM Data Understanding it drives the focus to identify, collect, and analyze the data sets that can help you accomplish the project goals 1. Collect initial data: Acquire the necessary data and (if necessary) load it into your analysis tool. 2. Describe data: Examine the data and document its surface properties like data format, number of records, or field identities. 3. Explore data: Dig deeper into the data. Query it, visualize it, and identify relationships among the data. 4. Verify data quality: How clean/dirty is the data? Document any quality issues.
  • 18. CRISP-DM Data Preparation prepares the final data set(s) for modeling 1. Select data: Determine which data sets will be used and document reasons for inclusion/exclusion. 2. Clean data: Often this is the lengthiest task. Without it, you’ll likely fall victim to garbage-in, garbage-out. A common practice during this task is to correct, impute, or remove erroneous values. 3. Construct data: Derive new attributes that will be helpful. For example, derive someone’s body mass index from height and weight fields. 4. Integrate data: Create new data sets by combining data from multiple sources. 5. Format data: Re-format data as necessary. For example, you might convert string values that store numbers to numeric values so that you can perform mathematical operations.
  • 19. CRISP-DM Modeling build and assess various models based on several different modeling techniques 1. Select modeling techniques: Determine which algorithms to try (e.g. regression, neural net). 2. Generate test design: Pending your modeling approach, you might need to split the data into training, test, and validation sets. 3. Build model: As glamorous as this might sound, this might just be executing a few lines of code like “reg = LinearRegression().fit(X, y)”. 4. Assess model: Generally, multiple models are competing against each other, and the data scientist needs to interpret the model results based on domain knowledge, the pre-defined success criteria, and the test design.
  • 21. CRISP-DM Evaluation Whereas the Assess Model task of the Modeling phase focuses on technical model assessment, the Evaluation phase looks more broadly at which model best meets the business and what to do next 1. Evaluate results: Do the models meet the business success criteria? Which one(s) should we approve for the business? 2. Review process: Review the work accomplished. Was anything overlooked? Were all steps properly executed? Summarize findings and correct anything if needed. 3. Determine next steps: Based on the previous three tasks, determine whether to proceed to deployment, iterate further, or initiate new projects.
  • 22. CRISP-DM Deployment A model is not particularly useful unless the customer can access its results. 1. Plan deployment: Develop and document a plan for deploying the model. 2. Plan monitoring and maintenance: Develop a thorough monitoring and maintenance plan to avoid issues during the operational phase (or post-project phase) of a model. 3. Produce final report: The project team documents a summary of the project which might include a final presentation of data mining results. 4. Review project: Conduct a project retrospective about what went well, what could have been better, and how to improve in the future.
  • 24. Data Mining -Regression Price Prediction Forecasting Production lead time prediction
  • 25. Data Mining -Classification Disease Diagnosis Fraud Detection Sentiment Analysis Predictive Maintenance Credit Risk Analysis Etc
  • 26. Data Mining -Association Market Basket Analysis Medicine Basket Analysis Etc
  • 27. Data Mining -Clustering Customer Segmention Network traffic Price Risk Clustering Etc