This slide deck presents data analytics concepts. Topics include levels of analytics, CRISP-DM, and data science use cases such as customer segmentation, churn prediction, product recommendation, and demand forecasting.
Visualization is an important part of understanding and (re)presenting data in the (data) sciences. It rests on principles and tools that Christophe Bontemps (Toulouse School of Economics) will unpack in light of his experience and his reading.
Data Science With Python | Python For Data Science | Python Data Science Cour... by Simplilearn
This Data Science with Python presentation will help you understand what Data Science is, the basics of Python for data analysis, why to learn Python, how to install Python, Python libraries for data analysis, exploratory analysis using Pandas, an introduction to series and dataframes, the loan prediction problem, data wrangling using Pandas, building a predictive model using Scikit-Learn, and implementing a logistic regression model in Python. The video aims to give beginners who are new to Python for data analysis a comprehensive overview of the basic concepts they need. Now, let us understand how Python is used in Data Science for data analysis.
This Data Science with Python presentation will cover the following topics:
1. What is Data Science?
2. Basics of Python for data analysis
- Why learn Python?
- How to install Python?
3. Python libraries for data analysis
4. Exploratory analysis using Pandas
- Introduction to series and dataframe
- Loan prediction problem
5. Data wrangling using Pandas
6. Building a predictive model using Scikit-learn
- Logistic regression
This Data Science with Python course will establish your mastery of data science and analytics techniques using Python. With this Python for Data Science Course, you'll learn the essential concepts of Python programming and become an expert in data analytics, machine learning, data visualization, web scraping and natural language processing. Python is a required skill for many data science positions, so jumpstart your career with this interactive, hands-on course.
Why learn Data Science?
Data Scientists are being deployed in all kinds of industries, creating a huge demand for skilled professionals. Data scientist is the pinnacle rank in an analytics organization. Glassdoor has ranked data scientist first in the 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data scientist, you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.
You can gain in-depth knowledge of Data Science by taking our Data Science with Python certification training course. With Simplilearn's Data Science certification training course, you will prepare for a career as a Data Scientist as you master all the concepts and techniques.
Learn more at: https://www.simplilearn.com
This slide is about Data mining rules.
My presentation at The Richmond Data Science Community (Jan 2018). The slides are slightly different than what I had presented last year at The Data Intelligence Conference.
Introduction To Machine Learning | Edureka, by Edureka!
** Data Science Certification Training: https://www.edureka.co/data-science **
This Edureka PPT on "Introduction To Machine Learning" will help you understand the basics of Machine Learning and how it can be used to solve real-world problems. The following topics are covered in this session:
Need For Machine Learning
What is Machine Learning?
Machine Learning Definitions
Machine Learning Process
Types Of Machine Learning
Type Of Problems Solved Using Machine Learning
Demo
YouTube Video: https://youtu.be/BuezNNeOGCI
Blog Series: http://bit.ly/data-science-blogs
Data Science Training Playlist: http://bit.ly/data-science-playlist
Follow us to never miss an update in the future.
YouTube: https://www.youtube.com/user/edurekaIN
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Data Science Training | Data Science Tutorial for Beginners | Data Science wi... by Edureka!
***** Data Science Training - https://www.edureka.co/data-science *****
This Edureka tutorial on "Data Science Training" will provide you with a detailed and comprehensive training on Data Science, the real-life use cases and the various paths one can take to become a data scientist. It will also help you understand the various phases of Data Science.
Data Science Blog Series: https://goo.gl/1CKTyN
http://www.edureka.co/data-science
In this slide I answer the basic questions about machine learning like:
What is Machine Learning?
What are the types of machine learning?
How to deal with data?
How to test model performance?
Data Science Training | Data Science For Beginners | Data Science With Python... by Simplilearn
This Data Science presentation will help you understand what Data Science is, who a Data Scientist is, what a Data Scientist does, and how Python is used for Data Science. Data science is an interdisciplinary field of scientific methods, processes, algorithms and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining. This Data Science tutorial will help you establish your skills in analytical techniques using Python. With this Data Science video, you’ll learn the essential concepts of Data Science with Python programming and also understand how data acquisition, data preparation, data mining, model building and testing, and data visualization are done. This Data Science tutorial is ideal for beginners who aspire to become Data Scientists.
This Data Science presentation will cover the following topics:
1. What is Data Science?
2. Who is a Data Scientist?
3. What does a Data Scientist do?
This Data Science with Python course will establish your mastery of data science and analytics techniques using Python. With this Python for Data Science Course, you’ll learn the essential concepts of Python programming and become an expert in data analytics, machine learning, data visualization, web scraping and natural language processing. Python is a required skill for many data science positions, so jumpstart your career with this interactive, hands-on course.
Why learn Data Science?
Data Scientists are being deployed in all kinds of industries, creating a huge demand for skilled professionals. A data scientist is the pinnacle rank in an analytics organization. Glassdoor has ranked data scientist first in the 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data scientist, you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.
You can gain in-depth knowledge of Data Science by taking our Data Science with Python certification training course. With Simplilearn’s Data Science certification training course, you will prepare for a career as a Data Scientist as you master all the concepts and techniques. Those who complete the course will be able to:
1. Gain an in-depth understanding of data science processes, data wrangling, data exploration, data visualization, hypothesis building, and testing. You will also learn the basics of statistics.
2. Install the required Python environment and other auxiliary tools and libraries
3. Understand the essential concepts of Python programming such as data types, tuples, lists, dicts, basic operators and functions
4. Perform high-level mathematical computing using the NumPy package and its large library of mathematical functions.
Learn more at: https://www.simplilearn.com
What Is Machine Learning? | Machine Learning Basics | Edureka, by Edureka!
YouTube Link: https://youtu.be/hjh1ikznScg
*** Machine Learning Certification Training - https://www.edureka.co/machine-learning-certification-training ***
This PPT covers the basics of Machine Learning. It explains why machine learning came into existence and how it solved major problems. It also describes the various types of Machine Learning with real-life examples.
An introduction to data science: from the very beginning of the data science idea, through the latest designs, changing trends, and the technologies behind them, to applications that are already in real-world use today.
Data Science Tutorial | Introduction To Data Science | Data Science Training ... by Edureka!
This Edureka Data Science tutorial will help you understand the ins and outs of Data Science with examples. This tutorial is ideal for both beginners and professionals who want to learn or brush up on their Data Science concepts. Below are the topics covered in this tutorial:
1. Why Data Science?
2. What is Data Science?
3. Who is a Data Scientist?
4. How a Problem is Solved in Data Science?
5. Data Science Components
A Beginner's Guide to Machine Learning with Scikit-Learn, by Sarah Guido
Given at the PyData NYC 2013 conference (http://vimeo.com/79517341), and will be given at PyTennessee 2014.
Scikit-learn is one of the most well-known machine learning Python modules in existence. But how does it work, and what, for that matter, is machine learning? For those with programming experience but who are new to machine learning, this talk gives a beginner-level overview of how machine learning can be useful, important machine learning concepts, and how to implement them with scikit-learn. We’ll use real world data to look at supervised and unsupervised machine learning algorithms and why scikit-learn is useful for performing these tasks.
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ... by Simplilearn
This presentation on Machine Learning will help you understand what clustering is, K-Means clustering, a flowchart for understanding K-Means clustering along with a demo that clusters cars into brands, what logistic regression is, the logistic regression curve, the sigmoid function, and a demo on how to classify a tumor as malignant or benign based on its features. Machine Learning algorithms can help computers play chess, perform surgeries, and get smarter and more personal. K-Means and logistic regression are two widely used Machine Learning algorithms that we discuss in this video. Logistic regression is used to estimate discrete values (usually binary values like 0/1) from a set of independent variables. It helps to predict the probability of an event by fitting data to a logit function, and is also called logit regression. K-Means clustering is an unsupervised learning algorithm: unlike in supervised learning, you don't have labeled data. You have a set of data that you want to group into clusters, so that objects that are similar in nature and characteristics are put together. This is what K-Means clustering is all about. Now, let us get started and understand K-Means clustering and logistic regression in detail.
The following topics are explained in this Machine Learning tutorial, part 2:
1. Clustering
- What is clustering?
- K-Means clustering
- Flowchart to understand K-Means clustering
- Demo - Clustering of cars based on brands
2. Logistic regression
- What is logistic regression?
- Logistic regression curve & Sigmoid function
- Demo - Classify a tumor as malignant or benign based on features
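As a rough illustration of the two ideas named in this outline, the Python sketch below (not taken from the presentation) shows the sigmoid function that logistic regression uses to map a score to a probability, and a bare-bones one-dimensional K-Means loop. The function names, the sample points, and the starting centroids are assumptions made up for this example; in practice a library such as scikit-learn would normally be used.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def kmeans_1d(points, centroids, iterations=10):
    """Very small K-Means for numbers on a line (illustrative only).

    points: list of floats to cluster.
    centroids: hypothetical starting centroids, one per cluster.
    """
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up sample data: two obvious groups on a line.
sample_points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
final_centroids, final_clusters = kmeans_1d(sample_points, [1.0, 12.0])
print(final_centroids, final_clusters)
print(sigmoid(0))  # a score of 0 corresponds to a probability of 0.5
```

The fixed iteration count keeps the sketch short; a fuller version would stop once the assignments no longer change.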
About Simplilearn Machine Learning course:
A form of artificial intelligence, Machine Learning is revolutionizing the world of computing as well as all people’s digital interactions. Machine Learning powers such innovative automated technologies as recommendation engines, facial recognition, fraud protection and even self-driving cars. This Machine Learning course prepares engineers, data scientists and other professionals with knowledge and hands-on skills required for certification and job competency in Machine Learning.
We recommend this Machine Learning training course for the following professionals in particular:
1. Developers aspiring to be a data scientist or Machine Learning engineer
2. Information architects who want to gain expertise in Machine Learning algorithms
3. Analytics professionals who want to work in Machine Learning or artificial intelligence
4. Graduates looking to build a career in data science and Machine Learning
Learn more at: https://www.simplilearn.com/
How to Become a Data Scientist | Data Scientist Skills | Data Science Trainin... by Edureka!
** Data Science Master's Program: https://www.edureka.co/masters-program/data-scientist-certification **
This video on "How to become a Data Scientist" includes all the skills required for becoming a modern day Data Scientist. This video will answer the below questions:
1. Why should you go for data science?
2. What is the roadmap to become a data scientist?
3. What are the tools and techniques required to become a data scientist?
4. What are the roles of a data scientist?
Check out our Data Science Training Playlist: https://goo.gl/Jg1pJJ
This presentation describes the Big Data concept and then shows examples of applications in banking. The presenter is Dr. Tuangtong Wattarujeekrit, at the Big Data Analytics Day event.
These slides present the concepts of Data Mining and Big Data Analytics. The topics are:
- Internet of Things (IoT)
- Data Science/Mining applications
- Data Science/Mining techniques including (1) Association, (2) Clustering, (3) Classification
- CRISP-DM: Cross Industry Standard Process for Data Mining
Introduction to big data and analytic eakasit patcharawongsakda, by BAINIDA
Introduction to big data and analytic, by Eakasit Patcharawongsakda, at THE FIRST NIDA BUSINESS ANALYTICS AND DATA SCIENCES CONTEST/CONFERENCE, organized by the School of Applied Statistics and DATA SCIENCES THAILAND
Introduction to Predictive Analytics with case studies
1. Introduction to Predictive Analytics with case studies
Eakasit Pacharawongsakda, Ph.D.
Co-founders of Data Cube & Big Data Engineering Program, CITE, DPU
28 June 2017 at The 2nd NIDA Business Analytics and Data Sciences Conference
2. http://dataminingtrend.com http://facebook.com/datacube.th
About us
• Name: Eakasit Pacharawongsakda
• Education:
• Doctoral degree in Computer Science, Sirindhorn International Institute of Technology (SIIT), Thammasat University
• Master's degree in Computer Engineering, Kasetsart University
• Bachelor's degree in Computer Engineering, Kasetsart University (second-class honors)
• Experience:
• Certified RapidMiner Analyst and Ambassador
• Research Collaboration with Western Digital (Thailand), phase 1, duration 6 months
• Co-researcher on a data survey project for in-depth analysis of tourist behavior using data mining, Tourism Authority of Thailand (TAT)
• Trainer for courses on open source data mining software
4. About us
RapidMiner Analyst Certification
This is to certify that Eakasit Pacharawongsakda successfully passed the examination for the Certified RapidMiner Analyst.
The RapidMiner Analyst certification level is designed for individuals who wish to demonstrate a fundamental understanding of how RapidMiner software works and is used.
Certified Analyst professionals will be able to prepare data and create predictive models in standard data environments typically found within most analyst positions.
The candidate has proven the ability to:
• Prepare data
• Build predictive models
• Evaluate the model’s quality
• Score new data sets
• Deploy data mining models
With: RapidMiner Studio, RapidMiner Server
Date:
33. BI & Data Mining
[Figure: Business Intelligence and Data Mining positioned by Time (Past to Future) and Analytical Approach (Explanatory to Exploratory)]
source: Data Science and Big Data Analytics: Discovering, analyzing, visualizing and presenting data
BI questions
• What happened last quarter?
• How many units were sold?
• Where is the problem? In which situations?
Data Mining questions
• What if … ?
• What will happen next?
• Why is this happening?
34. What is data mining
• “The exploration and analysis of large quantities of data in order to discover meaningful patterns and rules” – Data Mining Techniques (3rd Edition)
• The analysis of data to find patterns or relations among the data in large databases
• “Extraction of interesting (non-trivial, previously unknown and potentially useful) information from data in large databases” – Data Mining Concepts & Techniques (3rd Edition)
• A process of extracting information that is interesting and useful, but previously unknown, from large databases
image sources: https://binarylinks.wordpress.com/tag/data-mining/
http://www.amazon.com/Data-Mining-Techniques-Relationship-Management/dp/0470650931
35. http://dataminingtrend.com http://facebook.com/datacube.th
What is data mining
Data → Data mining techniques → Useful patterns
image source:http://www.computerrepairanaheim.net
https://sites.google.com/a/whps.org/diamond-teamkp/
http://meetings2.informs.org/wordpress/analytics2014/2014/04/01/why-oranalytics-people-need-to-know-about-database-technology/
75. http://dataminingtrend.com http://facebook.com/datacube.th
Classification example
• Example: spam e-mail classification
• Identify which e-mails are spam
ID | Text | Type
1 | Please call our customer service representative on FREE PHONE 0808 145 4742 between 9am-11pm as you have WON a guaranteed £1000 cash | spam
2 | You have won $1,000 cash or a $2,000 prize! To claim, call 09050000327 | spam
3 | I'm gonna be home soon and I don't want to talk about this stuff anymore tonight | normal
4 | Is that seriously how you spell his name? | normal
5 | Double mins and txts 4 6months FREE Bluetooth on Orange. Available on Sony, Nokia Motorola phones. | spam
6 | FREE RINGTONE text FIRST to 87131 for a poly or text GET to 87131 for a true tone! | spam
7 | Sorry, I'll you call later in meeting. | normal
8 | Congratulations - in this week's competition draw u have won the £1450 prize to claim just call 09050002311 | spam
9 | Thanks a lot for your wishes on my birthday. Thanks you for making my birthday truly memorable. | normal
10 | Hello, What are you doing? Did you attend the training course today? | normal
76. http://dataminingtrend.com http://facebook.com/datacube.th
Classification example
• Example: spam e-mail classification
• Find the keywords that indicate a spam e-mail
Source messages
ID | Text | Type
1 | Please call our customer service representative on FREE PHONE 0808 145 4742 between 9am-11pm as you have WON a guaranteed £1000 cash | spam
2 | You have won $1,000 cash or a $2,000 prize! To claim, call 09050000327 | spam
3 | I'm gonna be home soon and I don't want to talk about this stuff anymore tonight | normal
4 | Is that seriously how you spell his name? | normal
5 | Double mins and txts 4 6months FREE Bluetooth on Orange. Available on Sony, Nokia Motorola phones. | spam
… | … | …
Keywords found in each message
ID Free Won Cash Type
1 Y Y Y spam
2 N Y Y spam
3 N N N normal
4 N N N normal
5 Y N N spam
6 Y N N spam
7 N N N normal
8 N Y N spam
9 N N N normal
10 N N N normal
77. http://dataminingtrend.com http://facebook.com/datacube.th
Classification example
• Example: spam e-mail classification
• Build a model (classification model) from training data that has labels
ID Free Won Cash Type
1 Y Y Y spam
2 N Y Y spam
3 N N N normal
4 N N N normal
5 Y N N spam
6 Y N N spam
7 N N N normal
8 N Y N spam
9 N N N normal
10 N N N normal
training data: attributes = Free, Won, Cash; label = Type
classification model (decision tree)
Free
  = Y → Spam
  = N → Won
    = Y → Spam
    = N → Normal
81. http://dataminingtrend.com http://facebook.com/datacube.th
Classification example
• Example: spam e-mail classification
1. training data (attributes: Free, Won, Cash; label: Type)
ID Free Won Cash Type
1 Y Y Y spam
2 N Y Y spam
3 N N N normal
4 N N N normal
5 Y N N spam
2. Build the classification model
3. Apply the classification model to unseen data
ID Free Won Cash Type
11 Y Y N ?
12 N Y N ?
4. Prediction results
ID Type
11 spam
12 spam
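The train-then-predict workflow above can be sketched in a few lines of Python. This is only an illustration: the `classify` function hard-codes the two-level tree as read from the slides (test Free first, then Won) rather than learning it, and the names `TRAINING`, `UNSEEN`, and `classify` are made up for this sketch.

```python
# Keyword table from the slides: (ID, Free, Won, Cash, Type)
TRAINING = [
    (1, "Y", "Y", "Y", "spam"),
    (2, "N", "Y", "Y", "spam"),
    (3, "N", "N", "N", "normal"),
    (4, "N", "N", "N", "normal"),
    (5, "Y", "N", "N", "spam"),
    (6, "Y", "N", "N", "spam"),
    (7, "N", "N", "N", "normal"),
    (8, "N", "Y", "N", "spam"),
    (9, "N", "N", "N", "normal"),
    (10, "N", "N", "N", "normal"),
]

# Unseen data from the slides: (ID, Free, Won, Cash), label unknown
UNSEEN = [(11, "Y", "Y", "N"), (12, "N", "Y", "N")]


def classify(free, won):
    """Walk the tree by hand: test Free first, then Won."""
    if free == "Y":
        return "spam"
    return "spam" if won == "Y" else "normal"


# IDs of training rows the tree gets wrong (none expected for this table)
mismatches = [row[0] for row in TRAINING if classify(row[1], row[2]) != row[4]]

# Predicted label for each unseen row, keyed by ID
predictions = {row[0]: classify(row[1], row[2]) for row in UNSEEN}

print(mismatches)
print(predictions)
```

The Cash attribute is never consulted because the tree on the slide does not use it.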
85. http://dataminingtrend.com http://facebook.com/datacube.th
Performance (classification)
• Consider the class normal
• True Positive (TP)
• True Negative (TN)
• False Positive (FP)
• False Negative (FN)
ID Type Predicted
1 spam spam
2 spam spam
3 normal normal
4 normal spam
5 spam spam
6 spam spam
7 normal spam
8 spam normal
9 normal normal
10 normal normal
11 spam spam
12 spam spam
13 spam normal
14 spam normal
15 normal normal
Confusion matrix for the class normal (rows: predicted, columns: true)
pred. \ true | normal | spam
normal | TP | FP
spam | FN | TN
86. http://dataminingtrend.com http://facebook.com/datacube.th
Performance (classification)
• Consider the class normal
• True Positive (TP)
  • The number of predictions that match the actual data in the class under consideration
• True Negative (TN)
• False Positive (FP)
• False Negative (FN)
ID Type Predicted
1 spam spam
2 spam spam
3 normal normal
4 normal spam
5 spam spam
6 spam spam
7 normal spam
8 spam normal
9 normal normal
10 normal normal
11 spam spam
12 spam spam
13 spam normal
14 spam normal
15 normal normal
Confusion matrix for the class normal (rows: predicted, columns: true)
pred. \ true | normal | spam
normal | 4 | FP
spam | FN | TN
87. http://dataminingtrend.com http://facebook.com/datacube.th
Performance (classification)
• Consider the class normal
• True Positive (TP)
• True Negative (TN)
  • The number of predictions that match the actual data in the class not under consideration
• False Positive (FP)
• False Negative (FN)
ID Type Predicted
1 spam spam
2 spam spam
3 normal normal
4 normal spam
5 spam spam
6 spam spam
7 normal spam
8 spam normal
9 normal normal
10 normal normal
11 spam spam
12 spam spam
13 spam normal
14 spam normal
15 normal normal
Confusion matrix for the class normal (rows: predicted, columns: true)
pred. \ true | normal | spam
normal | 4 | FP
spam | FN | 6
88. http://dataminingtrend.com http://facebook.com/datacube.th
Performance (classification)
• Consider the class normal
• True Positive (TP)
• True Negative (TN)
• False Positive (FP)
  • The number of records wrongly predicted as the class under consideration
• False Negative (FN)
ID Type Predicted
1 spam spam
2 spam spam
3 normal normal
4 normal spam
5 spam spam
6 spam spam
7 normal spam
8 spam normal
9 normal normal
10 normal normal
11 spam spam
12 spam spam
13 spam normal
14 spam normal
15 normal normal
Confusion matrix for the class normal (rows: predicted, columns: true)
pred. \ true | normal | spam
normal | 4 | 3
spam | FN | 6
89. http://dataminingtrend.com http://facebook.com/datacube.th
Performance (classification)
• Consider the class normal
• True Positive (TP)
• True Negative (TN)
• False Positive (FP)
• False Negative (FN)
  • The number of records wrongly predicted as the class not under consideration
ID Type Predicted
1 spam spam
2 spam spam
3 normal normal
4 normal spam
5 spam spam
6 spam spam
7 normal spam
8 spam normal
9 normal normal
10 normal normal
11 spam spam
12 spam spam
13 spam normal
14 spam normal
15 normal normal
Confusion matrix for the class normal (rows: predicted, columns: true)
pred. \ true | normal | spam
normal | 4 | 3
spam | 2 | 6
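The four counts can be checked with a short Python sketch. It uses the 15 (Type, Predicted) rows from the slide; the function name `confusion_counts` and the dictionary layout are choices made for this illustration only.

```python
# (ID, actual Type, Predicted) rows from the slide
ROWS = [
    (1, "spam", "spam"), (2, "spam", "spam"), (3, "normal", "normal"),
    (4, "normal", "spam"), (5, "spam", "spam"), (6, "spam", "spam"),
    (7, "normal", "spam"), (8, "spam", "normal"), (9, "normal", "normal"),
    (10, "normal", "normal"), (11, "spam", "spam"), (12, "spam", "spam"),
    (13, "spam", "normal"), (14, "spam", "normal"), (15, "normal", "normal"),
]


def confusion_counts(rows, positive):
    """Count TP, FP, FN, TN, treating `positive` as the class under consideration."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for _id, actual, predicted in rows:
        if predicted == positive and actual == positive:
            counts["TP"] += 1   # predicted as the class, and correct
        elif predicted == positive:
            counts["FP"] += 1   # predicted as the class, but wrong
        elif actual == positive:
            counts["FN"] += 1   # belongs to the class, but missed
        else:
            counts["TN"] += 1   # correctly predicted as the other class
    return counts


normal_counts = confusion_counts(ROWS, "normal")
spam_counts = confusion_counts(ROWS, "spam")
print(normal_counts)
print(spam_counts)
```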
91. http://dataminingtrend.com http://facebook.com/datacube.th
Performance (classification)
• Precision
  • The share of correct predictions among the records predicted as the class under consideration
• Precision for normal
  • True Positive / (True Positive + False Positive)
  • 4/7 x 100 = 57.14%
• Precision for spam
  • 6/8 x 100 = 75%
Confusion matrix for the class normal (rows: predicted, columns: true)
pred. \ true | normal | spam
normal | TP | FP
spam | FN | TN
Records predicted as class normal
ID Type Predicted
3 normal normal
8 spam normal
9 normal normal
10 normal normal
13 spam normal
14 spam normal
15 normal normal
Records predicted as class spam
ID Type Predicted
1 spam spam
2 spam spam
4 normal spam
5 spam spam
6 spam spam
7 normal spam
11 spam spam
12 spam spam
92. http://dataminingtrend.com http://facebook.com/datacube.th
Performance (classification)
• Recall
  • The share of correct predictions among the records that actually belong to the class under consideration
• Recall for normal
  • True Positive / (True Positive + False Negative)
  • 4/6 x 100 = 66.67%
• Recall for spam
  • 6/9 x 100 = 66.67%
Confusion matrix for the class normal (rows: predicted, columns: true)
pred. \ true | normal | spam
normal | TP | FP
spam | FN | TN
Records of class normal
ID Type Predicted
3 normal normal
4 normal spam
7 normal spam
9 normal normal
10 normal normal
15 normal normal
Records of class spam
ID Type Predicted
1 spam spam
2 spam spam
5 spam spam
6 spam spam
8 spam normal
11 spam spam
12 spam spam
13 spam normal
14 spam normal
94. http://dataminingtrend.com http://facebook.com/datacube.th
Performance (classification)
• F-Measure
  • The harmonic mean of Precision and Recall
  • (2 x Precision x Recall) / (Precision + Recall)
• F-Measure for normal
  • (2 x 57.14 x 66.67) / (57.14 + 66.67) = 61.54%
• F-Measure for spam
  • (2 x 75 x 66.67) / (75 + 66.67) = 70.59%
Records predicted as class normal: Precision = 4/7 x 100 = 57.14%
ID Type Predicted
3 normal normal
8 spam normal
9 normal normal
10 normal normal
13 spam normal
14 spam normal
15 normal normal
Records of class normal: Recall = 4/6 x 100 = 66.67%
ID Type Predicted
3 normal normal
4 normal spam
7 normal spam
9 normal normal
10 normal normal
15 normal normal
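As a quick arithmetic check, the three measures can be computed from the confusion-matrix counts. This is a minimal sketch; the helper name `metrics` and the rounding to two decimals are assumptions made here, and the counts are the ones from the confusion matrix for each class.

```python
def metrics(tp, fp, fn):
    """Return precision, recall, and F-measure as percentages rounded to 2 decimals."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * value, 2) for value in (precision, recall, f_measure))


# Class normal: TP = 4, FP = 3, FN = 2
normal = metrics(tp=4, fp=3, fn=2)
# Class spam: TP = 6, FP = 2, FN = 3
spam = metrics(tp=6, fp=2, fn=3)

print(normal)  # (57.14, 66.67, 61.54)
print(spam)    # (75.0, 66.67, 70.59)
```

The helper divides without guarding against zero, so it would fail for a class that is never predicted; that case needs separate handling.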
110. http://dataminingtrend.com http://facebook.com/datacube.th
Self Consistency test
• Use the training data itself to test the performance of the model
1. training data
ID Free Won Cash Type
1 Y Y Y spam
2 N Y Y spam
3 N N N normal
2. Build the classification model
3. testing data (the same data set as the training data)
ID Free Won Cash Type
1 Y Y Y spam
2 N Y Y spam
3 N N N normal
4. prediction results
ID Type Predicted
1 spam spam
2 spam spam
3 normal normal
Result: Good
111. http://dataminingtrend.com http://facebook.com/datacube.th
Split test
• Split the data into 2 sets
• training data for building the model and testing data for testing it
1. training data (2/3 of the data, used to build the model)
ID Free Won Cash Type
1 Y Y Y spam
2 N Y Y spam
2. Build the classification model
3. testing data (1/3 of the data, used to test the model)
ID Free Won Cash Type
3 N N N normal
4. prediction results
ID Type Predicted
3 normal normal
112. http://dataminingtrend.com http://facebook.com/datacube.th
Cross-validation
• Split the data into N parts, e.g. N = 5 or 10
• Use N-1 parts to build the model and the remaining part to test it; repeat until all N parts have been used for testing
1. training data (IDs 1 and 2 are used to build the model)
ID Free Won Cash Type
1 Y Y Y spam
2 N Y Y spam
2. Build the classification model
3. testing data (ID 3 is used to test the model)
ID Free Won Cash Type
3 N N N normal
4. prediction results
ID Type Predicted
3 normal normal
113. http://dataminingtrend.com http://facebook.com/datacube.th
Cross-validation
• Split the data into N parts, e.g. N = 5 or 10
• Use N-1 parts to build the model and the remaining part to test it; repeat until all N parts have been used for testing
1. training data (IDs 1 and 3 are used to build the model)
ID Free Won Cash Type
1 Y Y Y spam
3 N N N normal
2. Build the classification model
3. testing data (ID 2 is used to test the model)
ID Free Won Cash Type
2 N Y Y spam
4. prediction results
ID Type Predicted
2 spam spam
114. http://dataminingtrend.com http://facebook.com/datacube.th
Cross-validation
• Split the data into N parts, e.g. N = 5 or 10
• Use N-1 parts to build the model and the remaining part to test it; repeat until all N parts have been used for testing
1. training data (IDs 2 and 3 are used to build the model)
ID Free Won Cash Type
2 N Y Y spam
3 N N N normal
2. Build the classification model
3. testing data (ID 1 is used to test the model)
ID Free Won Cash Type
1 Y Y Y spam
4. prediction results
ID Type Predicted
1 spam spam
115. http://dataminingtrend.com http://facebook.com/datacube.th
Cross-validation
• Example of 5-fold cross-validation
ID Attributes Label
1 X1 spam
2 X2 spam
3 X3 normal
4 X4 spam
5 X5 spam
6 X6 spam
7 X7 spam
8 X8 normal
9 X9 normal
10 X10 normal
11 X11 spam
12 X12 spam
13 X13 normal
14 X14 normal
15 X15 normal
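The training data is split into 5 parts (1 to 5); one model is built in each round
Round 1: training = parts 2, 3, 4, 5; testing = part 1
Round 2: training = parts 1, 3, 4, 5; testing = part 2
Round 3: training = parts 1, 2, 4, 5; testing = part 3
Round 4: training = parts 1, 2, 3, 5; testing = part 4
Round 5: training = parts 1, 2, 3, 4; testing = part 5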
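The five rounds can be generated with a small Python sketch. The slide does not say which IDs fall into which part, so this example assumes contiguous blocks of three IDs (part 1 = IDs 1 to 3, and so on); the function name `k_fold_rounds` is also just for illustration.

```python
def k_fold_rounds(ids, k):
    """Return a list of (training_ids, testing_ids) pairs, one per round."""
    fold_size = len(ids) // k  # assumes len(ids) is divisible by k
    folds = [ids[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    rounds = []
    for test_index in range(k):
        # Every part except the current one is used for training
        training = [x for j, fold in enumerate(folds) if j != test_index for x in fold]
        rounds.append((training, folds[test_index]))
    return rounds


ids = list(range(1, 16))          # the 15 record IDs on the slide
rounds = k_fold_rounds(ids, 5)

for number, (training, testing) in enumerate(rounds, start=1):
    print(f"Round {number}: training={training} testing={testing}")
```

In practice the records are usually shuffled before being split, so that each part has a similar mix of labels.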
116. http://dataminingtrend.com http://facebook.com/datacube.th
Decision Tree
• Overview of a Decision Tree
Logins 4 weeks
  > 6.5 → yes
  < 6.5 → Email
    = premium → yes
    = free → Sales 4 weeks
      > 2 → yes
      < 2 → no
Depth = 1
Root: Top internal node
Branch: Outcome of test
Leaf Node: Class label
Internal Node: Decision on variable
120. http://dataminingtrend.com http://facebook.com/datacube.th
Decision Tree
• Rules can be generated from a Decision Tree by following each path of the tree
Decision tree model
Logins 4 weeks
  > 6.5 → yes
  < 6.5 → Email
    = premium → yes
    = free → Sales 4 weeks
      > 2 → yes
      < 2 → no
Business rules derived from the decision tree model
• IF Logins 4 weeks > 6.5 THEN Response = yes
• IF Logins 4 weeks < 6.5 AND Email = premium THEN Response = yes
• IF Logins 4 weeks < 6.5 AND Email = free AND Sales 4 weeks > 2 THEN Response = yes
• IF Logins 4 weeks < 6.5 AND Email = free AND Sales 4 weeks < 2 THEN Response = no
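The four business rules translate almost line for line into code. The sketch below is illustrative only: the function and parameter names are invented here, and because the slide does not say what happens at exactly 6.5 logins or exactly 2 sales, this version sends those boundary values down the "<" branch.

```python
def predict_response(logins_4_weeks, email, sales_4_weeks):
    """Apply the rules in order; each `if` corresponds to one path of the tree."""
    if logins_4_weeks > 6.5:
        return "yes"
    # From here on, Logins 4 weeks is not above 6.5
    if email == "premium":
        return "yes"
    # From here on, Email is treated as free
    if sales_4_weeks > 2:
        return "yes"
    return "no"


examples = [
    predict_response(10, "free", 0),     # rule 1
    predict_response(3, "premium", 0),   # rule 2
    predict_response(3, "free", 5),      # rule 3
    predict_response(3, "free", 1),      # rule 4
]
print(examples)
```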
123. http://dataminingtrend.com http://facebook.com/datacube.th
Decision Tree
• Weather data
• Weather conditions recorded for 14 days, used to decide whether a sports match can be played
ID Outlook Temperature Humidity Windy Play
1 sunny hot high FALSE no
2 sunny hot high TRUE no
3 overcast hot high FALSE yes
4 rainy mild high FALSE yes
5 rainy cool normal FALSE yes
6 rainy cool normal TRUE no
7 overcast mild normal TRUE yes
8 sunny mild high FALSE no
9 sunny mild normal FALSE yes
10 rainy mild normal FALSE yes
11 sunny mild normal TRUE yes
12 overcast mild high TRUE yes
13 overcast hot normal FALSE yes
14 rainy mild high TRUE no
135. http://dataminingtrend.com http://facebook.com/datacube.th
Decision Tree
• Using the model to predict new data
Decision tree model
Outlook
  = sunny → Humidity
    = high → No
    = normal → Yes
  = overcast → Yes
  = rainy → Windy
    = TRUE → No
    = FALSE → Yes
Test data
ID Outlook Temperature Humidity Windy
1 sunny hot high FALSE
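One way to picture how the model handles a new record is to store the tree as nested dictionaries and walk it from the root. This is a sketch under assumptions: the dictionary layout (`attribute`, `branches`) and the function name `predict` are invented for illustration, and the tree is the one read from the slide.

```python
# Internal nodes are dicts; leaves are plain strings holding the class label
TREE = {
    "attribute": "Outlook",
    "branches": {
        "sunny": {
            "attribute": "Humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rainy": {
            "attribute": "Windy",
            "branches": {"TRUE": "no", "FALSE": "yes"},
        },
    },
}


def predict(tree, record):
    """Follow the branch matching the record's value until a leaf is reached."""
    node = tree
    while isinstance(node, dict):
        value = record[node["attribute"]]
        node = node["branches"][value]
    return node


# Test record from the slide
record = {"Outlook": "sunny", "Temperature": "hot", "Humidity": "high", "Windy": "FALSE"}
prediction = predict(TREE, record)
print(prediction)  # no
```

Temperature is carried in the record but never tested, because the tree on the slide has no Temperature node.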
137. http://dataminingtrend.com http://facebook.com/datacube.th
Decision Tree
• Numeric data
  • Sort the numeric values in ascending order
  • Split the data into 2 parts at the midpoint between two adjacent values
  • Compute the Information Gain from the 2 resulting parts
  • Keep the midpoint that gives the highest Information Gain
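The procedure above can be sketched in Python as follows. The sketch assumes entropy-based Information Gain (base-2 logarithm), and the six values and labels at the bottom are hypothetical data chosen so the result is easy to verify by hand; they are not taken from the slides.

```python
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    result = 0.0
    for label in set(labels):
        p = labels.count(label) / total
        result -= p * log2(p)
    return result


def best_split(values, labels):
    """Return (cut_point, information_gain) for the best midpoint split."""
    pairs = sorted(zip(values, labels))              # step 1: sort ascending
    base = entropy([label for _, label in pairs])
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                                 # no midpoint between equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2    # step 2: midpoint of neighbours
        left = [label for value, label in pairs if value < cut]
        right = [label for value, label in pairs if value > cut]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted                       # step 3: Information Gain
        if best is None or gain > best[1]:
            best = (cut, gain)                       # step 4: keep the highest gain
    return best


# Hypothetical example data (not from the slides)
values = [60.0, 65.0, 70.0, 80.0, 90.0, 95.0]
labels = ["no", "no", "no", "yes", "yes", "yes"]
cut, gain = best_split(values, labels)
print(cut, gain)
```

With this toy data the midpoint 75.0 separates the two labels perfectly, so the gain equals the full entropy of the labels.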
138. http://dataminingtrend.com http://facebook.com/datacube.th
Decision Tree
• Numeric data
  • Splitting at Humidity = 67.5 gives IG = 0.11
ID Humidity Play
7 65.0 no
6 70.0 no
9 70.0 yes
11 70.0 yes
13 75.0 yes
3 78.0 no
5 80.0 yes
10 80.0 no
14 80.0 yes
1 85.0 yes
2 90.0 yes
12 90.0 yes
8 95.0 yes
4 96.0 no
Midpoint = 67.5 (between 65.0 and 70.0)
ID Humidity Play
7 < 67.5 no
6 > 67.5 no
9 > 67.5 yes
11 > 67.5 yes
13 > 67.5 yes
3 > 67.5 no
5 > 67.5 yes
10 > 67.5 no
14 > 67.5 yes
1 > 67.5 yes
2 > 67.5 yes
12 > 67.5 yes
8 > 67.5 yes
4 > 67.5 no
139. http://dataminingtrend.com http://facebook.com/datacube.th
Decision Tree
• Numeric data
  • Splitting at Humidity = 72.5 gives IG = 0.25
Midpoint = 72.5 (between 70.0 and 75.0)
ID Humidity Play
7 < 72.5 no
6 < 72.5 no
9 < 72.5 yes
11 < 72.5 yes
13 > 72.5 yes
3 > 72.5 no
5 > 72.5 yes
10 > 72.5 no
14 > 72.5 yes
1 > 72.5 yes
2 > 72.5 yes
12 > 72.5 yes
8 > 72.5 yes
4 > 72.5 no
140. http://dataminingtrend.com http://facebook.com/datacube.th
Decision Tree
• Numeric data
  • Splitting at Humidity = 76.5 gives IG = 0.03
Midpoint = 76.5 (between 75.0 and 78.0)
ID Humidity Play
7 < 76.5 no
6 < 76.5 no
9 < 76.5 yes
11 < 76.5 yes
13 < 76.5 yes
3 > 76.5 no
5 > 76.5 yes
10 > 76.5 no
14 > 76.5 yes
1 > 76.5 yes
2 > 76.5 yes
12 > 76.5 yes
8 > 76.5 yes
4 > 76.5 no
141. http://dataminingtrend.com http://facebook.com/datacube.th
Decision Tree
• Numeric data
  • Splitting at Humidity = 79.0 gives IG = 0.05
Midpoint = 79.0 (between 78.0 and 80.0)
ID Humidity Play
7 < 79.0 no
6 < 79.0 no
9 < 79.0 yes
11 < 79.0 yes
13 < 79.0 yes
3 < 79.0 no
5 > 79.0 yes
10 > 79.0 no
14 > 79.0 yes
1 > 79.0 yes
2 > 79.0 yes
12 > 79.0 yes
8 > 79.0 yes
4 > 79.0 no
142. http://dataminingtrend.com http://facebook.com/datacube.th
Decision Tree
• Numeric data
  • Splitting at Humidity = 82.5 gives IG = 0.05
Midpoint = 82.5 (between 80.0 and 85.0)
ID Humidity Play
7 < 82.5 no
6 < 82.5 no
9 < 82.5 yes
11 < 82.5 yes
13 < 82.5 yes
3 < 82.5 no
5 < 82.5 yes
10 < 82.5 no
14 < 82.5 yes
1 > 82.5 yes
2 > 82.5 yes
12 > 82.5 yes
8 > 82.5 yes
4 > 82.5 no
143. http://dataminingtrend.com http://facebook.com/datacube.th
Decision Tree
• Numeric data
  • Splitting at Humidity = 87.5 gives IG = 0.02
Midpoint = 87.5 (between 85.0 and 90.0)
ID Humidity Play
7 < 87.5 no
6 < 87.5 no
9 < 87.5 yes
11 < 87.5 yes
13 < 87.5 yes
3 < 87.5 no
5 < 87.5 yes
10 < 87.5 no
14 < 87.5 yes
1 < 87.5 yes
2 > 87.5 yes
12 > 87.5 yes
8 > 87.5 yes
4 > 87.5 no
144. http://dataminingtrend.com http://facebook.com/datacube.th
Decision Tree
• Numeric data
  • Splitting at Humidity = 92.5 gives IG = 0.01
Midpoint = 92.5 (between 90.0 and 95.0)
ID Humidity Play
7 < 92.5 no
6 < 92.5 no
9 < 92.5 yes
11 < 92.5 yes
13 < 92.5 yes
3 < 92.5 no
5 < 92.5 yes
10 < 92.5 no
14 < 92.5 yes
1 < 92.5 yes
2 < 92.5 yes
12 < 92.5 yes
8 > 92.5 yes
4 > 92.5 no
145. http://dataminingtrend.com http://facebook.com/datacube.th
Decision Tree
• Numeric data
  • Splitting at Humidity = 95.5 gives IG = 0.01
Midpoint = 95.5 (between 95.0 and 96.0)
ID Humidity Play
7 < 95.5 no
6 < 95.5 no
9 < 95.5 yes
11 < 95.5 yes
13 < 95.5 yes
3 < 95.5 no
5 < 95.5 yes
10 < 95.5 no
14 < 95.5 yes
1 < 95.5 yes
2 < 95.5 yes
12 < 95.5 yes
8 < 95.5 yes
4 > 95.5 no
146. http://dataminingtrend.com http://facebook.com/datacube.th
Decision Tree
• Numeric data
Table of cut points and Information Gain (IG)
Cut point | IG
67.5 | 0.11
72.5 | 0.25 (highest IG)
76.5 | 0.03
79.0 | 0.05
82.5 | 0.05
87.5 | 0.02
92.5 | 0.01
95.5 | 0.01
148. http://dataminingtrend.com http://facebook.com/datacube.th
Probability
• Joint Probability
  • The probability that 2 events occur together
  • e.g. the probability that an email contains the word Free and is a spam email
  • Notation: P(Free=Y ∩ spam)
Figure: all email (100 messages) divided into spam (20 messages) and normal (80 messages), with a region marked Free overlapping both. Labeled regions: probability of the word Free in normal email; probability of a spam email; probability of the word Free in spam email.
149. http://dataminingtrend.com http://facebook.com/datacube.th
Naive Bayes
• Based on the principles of probability
P(A|B) = P(A∩B) / P(B)
  P(A|B): the probability that A occurs given that B has already occurred
  P(A∩B): the probability that A and B occur together
P(B|A) = P(A∩B) / P(A)
P(A∩B) = P(A|B) x P(B) = P(B|A) x P(A)
Bayes Theorem: P(B|A) = P(A|B) x P(B) / P(A)
150. http://dataminingtrend.com http://facebook.com/datacube.th
Naive Bayes
• P(C|A) is the probability that a record with attributes A belongs to class C
• P(A|C) is the probability that a record of class C in the training data has attributes A
  • P(A|C) = P(a1 ∩ a2 ∩ a3 … ∩ aM | C)
  • P(A|C) = P(a1|C) x P(a2|C) x … x P(aM|C), assuming the attributes are independent of each other within a class (the "naive" assumption)
• P(C) or P(A) is the probability of class C or of attributes A
P(C|A) = P(A|C) x P(C) / P(A)
Posterior probability: P(C|A); Likelihood: P(A|C); Prior probability: P(C)
151. http://dataminingtrend.com http://facebook.com/datacube.th
Naive Bayes
• Build a model to predict spam email
P(Type = normal) = 5/10 = 0.50
P(Type = spam) = 5/10 = 0.50
attribute Type = normal Type = spam
Free = Y 0/5 = 0.00 3/5 = 0.60
Free = N 5/5 = 1.00 2/5 = 0.40
Won = Y 0/5 = 0.00 3/5 = 0.60
Won = N 5/5 = 1.00 2/5 = 0.40
Cash = Y 0/5 = 0.00 2/5 = 0.40
Cash = N 5/5 = 1.00 3/5 = 0.60
ID Free Won Cash Type
3 N N N normal
4 N N N normal
7 N N N normal
9 N N N normal
10 N N N normal
1 Y Y Y spam
2 N Y Y spam
5 Y N N spam
6 Y N N spam
8 N Y N spam
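Tables above: the Naive Bayes model (class probabilities and per-attribute probabilities), followed by the training data it was computed from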
152. http://dataminingtrend.com http://facebook.com/datacube.th
Naive Bayes
• Using the model to predict new data
P(Type = normal) = 5/10 = 0.50
P(Type = spam) = 5/10 = 0.50
attribute Type = normal Type = spam
Free = Y 0/5 = 0.00 3/5 = 0.60
Free = N 5/5 = 1.00 2/5 = 0.40
Won = Y 0/5 = 0.00 3/5 = 0.60
Won = N 5/5 = 1.00 2/5 = 0.40
Cash = Y 0/5 = 0.00 2/5 = 0.40
Cash = N 5/5 = 1.00 3/5 = 0.60
Table above: the Naive Bayes model
Test data
ID Free Won Cash
1 Y Y Y
P(C|A) ∝ P(A|C) x P(C), because P(A) is the same for every class and can be dropped when comparing classes
P(Type = normal|A) ∝ P(Free = Y|Type = normal) x P(Won = Y|Type = normal) x P(Cash = Y|Type = normal) x P(Type = normal)
= 0.00 x 0.00 x 0.00 x 0.50
= 0.00
P(Type = spam|A) ∝ P(Free = Y|Type = spam) x P(Won = Y|Type = spam) x P(Cash = Y|Type = spam) x P(Type = spam)
= 0.60 x 0.60 x 0.40 x 0.50
= 0.07 (highest probability, so the record is predicted as spam)
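The calculation can be reproduced with a short Python sketch that counts frequencies directly in the 10-row training table. The names `TRAINING` and `score` are made up for this example, and no smoothing is applied, so an attribute value never seen with a class gives that class a score of exactly 0, as on the slide.

```python
# Training rows from the slides: (Free, Won, Cash, Type)
TRAINING = [
    ("Y", "Y", "Y", "spam"), ("N", "Y", "Y", "spam"), ("N", "N", "N", "normal"),
    ("N", "N", "N", "normal"), ("Y", "N", "N", "spam"), ("Y", "N", "N", "spam"),
    ("N", "N", "N", "normal"), ("N", "Y", "N", "spam"), ("N", "N", "N", "normal"),
    ("N", "N", "N", "normal"),
]


def score(record, label):
    """Unnormalised posterior: P(C) times the product of P(a_i | C)."""
    rows = [row for row in TRAINING if row[-1] == label]
    result = len(rows) / len(TRAINING)               # prior P(C)
    for index, value in enumerate(record):
        matches = sum(1 for row in rows if row[index] == value)
        result *= matches / len(rows)                # likelihood P(a_i | C)
    return result


record = ("Y", "Y", "Y")                             # Free = Y, Won = Y, Cash = Y
scores = {label: score(record, label) for label in ("normal", "spam")}
prediction = max(scores, key=scores.get)             # class with the highest score

print(scores)
print(prediction)
```

The spam score comes out as 0.072, which the slide shows rounded to 0.07.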
154. http://dataminingtrend.com http://facebook.com/datacube.th
Classification: Balanced data
• Building a model requires training data to learn from
  • Regular attributes are the attributes or variables used to build the model
  • The label attribute is the attribute holding the answer we are interested in, e.g. spam/normal, response/no response
• The training data should contain an equal or similar number of records for each label (balanced data) so that the model can learn from every label
ID Free Won Cash Type
1 Y Y Y spam
2 N Y Y spam
3 N N N normal
4 N N N normal
5 Y N N spam
6 Y N N spam
7 N N N normal
8 N Y N spam
9 N N N normal
10 N N N normal
attributes: Free, Won, Cash; label: Type
Training data that is balanced (5 spam, 5 normal)
158. http://dataminingtrend.com http://facebook.com/datacube.th
Performance of unbalanced data
• Consider the class normal
• Accuracy = 85%
• Recall (normal) = 100%
• Recall (fraud) = 0%
• Precision (normal) = 85%
• Precision (fraud) = 0% (no record is predicted as fraud, so the ratio is 0/0 and is reported as 0)
ID Type Predicted
1 fraud normal
2 fraud normal
3 fraud normal
4 normal normal
5 normal normal
6 normal normal
7 normal normal
8 normal normal
9 normal normal
10 normal normal
11 normal normal
12 normal normal
13 normal normal
14 normal normal
15 normal normal
16 normal normal
17 normal normal
18 normal normal
19 normal normal
20 normal normal
Confusion matrix (rows: predicted, columns: true)
pred. \ true | true normal | true fraud
pred. normal | 17 | 3
pred. fraud | 0 | 0
159. http://dataminingtrend.com http://facebook.com/datacube.th
Classification: unbalanced data
• Handling imbalanced data
  • sampling approach
    • under-sampling
      • Randomly sample the majority class so that it has fewer records
    • over-sampling
      • Create additional sample records of the minority class so that it has more records
  • cost-sensitive approach
    • Assign a different weight to each label
    • The minority class gets a larger weight
    • The majority class gets a smaller weight
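The two sampling approaches can be sketched with the Python standard library. This is a simplified illustration: over-sampling here just duplicates randomly chosen minority records, which is only one way to "create" extra samples, and the function names and the fixed seed are choices made for this example. The data is a 3 fraud / 17 normal set like the one in the unbalanced example.

```python
import random
from collections import Counter


def group_by_label(rows):
    """Group (ID, label) rows into a dict keyed by label."""
    groups = {}
    for row in rows:
        groups.setdefault(row[1], []).append(row)
    return groups


def under_sample(rows, rng):
    """Shrink every class to the size of the smallest class."""
    groups = group_by_label(rows)
    smallest = min(len(group) for group in groups.values())
    return [row for group in groups.values() for row in rng.sample(group, smallest)]


def over_sample(rows, rng):
    """Grow every class to the size of the largest class by duplicating records."""
    groups = group_by_label(rows)
    largest = max(len(group) for group in groups.values())
    return [
        row
        for group in groups.values()
        for row in group + rng.choices(group, k=largest - len(group))
    ]


rows = [(i, "fraud") for i in (1, 2, 3)] + [(i, "normal") for i in range(4, 21)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

under_counts = Counter(label for _, label in under_sample(rows, rng))
over_counts = Counter(label for _, label in over_sample(rows, rng))
print(under_counts)
print(over_counts)
```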
165. http://dataminingtrend.com http://facebook.com/datacube.th
Attribute (Feature) Selection
• There are 2 approaches
  • Filter approach: compute a weight (or a correlation value) for each attribute and keep only the important attributes
  • Wrapper approach: compute the weights by using a classification model to measure how well the attributes perform
Attribute Selection: Filter Approach
All attributes in the training data
ID Free Won Cash Call Service Type
1 Y Y Y Y Y spam
2 N Y Y Y N spam
→ compute weight →
Attributes after selection
ID Free Won Type
1 Y Y spam
2 N Y spam
Attribute Selection: Wrapper Approach
All attributes in the training data
ID Free Won Cash Call Service Type
1 Y Y Y Y Y spam
2 N Y Y Y N spam
→ classification model →
Attributes after selection
ID Free Won Type
1 Y Y spam
2 N Y spam
166. http://dataminingtrend.com http://facebook.com/datacube.th
Wrapper Approach
• Attributes are added or removed when building the model, and the set of attributes that performs well is kept
• Use only the attribute Free
ID Free Won Cash Type
1 Y Y Y spam
2 N Y Y spam
3 N N N normal
4 N N N normal
5 Y N N spam
6 Y N N spam
7 N N N normal
8 N Y N spam
9 N N N normal
10 N N N normal
ID Free Type
1 Y spam
2 N spam
3 N normal
4 N normal
5 Y spam
6 Y spam
7 N normal
8 N spam
9 N normal
10 N normal
167. http://dataminingtrend.com http://facebook.com/datacube.th
Wrapper Approach
• Attributes are added or removed when building the model, and the set of attributes that performs well is kept
• Use only the attribute Won
ID Won Type
1 Y spam
2 Y spam
3 N normal
4 N normal
5 N spam
6 N spam
7 N normal
8 Y spam
9 N normal
10 N normal
168. http://dataminingtrend.com http://facebook.com/datacube.th
Wrapper Approach
• Attributes are added or removed when building the model, and the set of attributes that performs well is kept
• Use only the attribute Cash
ID Cash Type
1 Y spam
2 Y spam
3 N normal
4 N normal
5 N spam
6 N spam
7 N normal
8 N spam
9 N normal
10 N normal
169. http://dataminingtrend.com http://facebook.com/datacube.th
Wrapper Approach
• Attributes are added or removed when building the model, and the set of attributes that performs well is kept
• Use the attributes Free and Won
ID Free Won Type
1 Y Y spam
2 N Y spam
3 N N normal
4 N N normal
5 Y N spam
6 Y N spam
7 N N normal
8 N Y spam
9 N N normal
10 N N normal
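The wrapper idea of trying attribute subsets and scoring each one with a model can be sketched as follows. Several simplifications are assumed here: the "model" is a plain lookup that predicts the majority label for each combination of attribute values, it is scored on the training data itself (a self consistency test), and every subset is tried exhaustively. None of these choices come from the slides.

```python
from collections import Counter
from itertools import combinations

INDEX = {"Free": 0, "Won": 1, "Cash": 2}

# Training rows from the slides: (Free, Won, Cash, Type)
TRAINING = [
    ("Y", "Y", "Y", "spam"), ("N", "Y", "Y", "spam"), ("N", "N", "N", "normal"),
    ("N", "N", "N", "normal"), ("Y", "N", "N", "spam"), ("Y", "N", "N", "spam"),
    ("N", "N", "N", "normal"), ("N", "Y", "N", "spam"), ("N", "N", "N", "normal"),
    ("N", "N", "N", "normal"),
]


def training_accuracy(attributes):
    """Accuracy of a majority-label lookup that only sees the given attributes."""
    groups = {}
    for row in TRAINING:
        key = tuple(row[INDEX[name]] for name in attributes)
        groups.setdefault(key, Counter())[row[-1]] += 1
    # In each group, the majority label is predicted, so its count is the number correct
    correct = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return correct / len(TRAINING)


results = {
    subset: training_accuracy(subset)
    for size in (1, 2, 3)
    for subset in combinations(INDEX, size)
}
for subset, accuracy in results.items():
    print(subset, accuracy)
```

On this table, Free and Won together already classify every training record correctly, which matches the two-attribute set the slides arrive at.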