Machine Learning and Big Data
By Poo Kuan Hoong (Multimedia University)
Disclaimer: The views and opinions expressed in these slides are those of
the author and do not necessarily reflect the official policy or position
of Multimedia University. Examples of analysis performed within these
slides are only examples. They should not be utilized in real-world
analytic products as they are based only on very limited and dated
open source information. Assumptions made within the analysis are
not reflective of the position of Multimedia University.
Data Science Institute
• The Data Science Institute is a research
center based in the Faculty of Computing
& Informatics, Multimedia University.
• Its members bring expertise from across
faculties, including the Faculty of
Computing and Informatics, the Faculty of
Engineering, the Faculty of Management and
the Faculty of Information Science and
Technology.
• It conducts research in leading data science
areas, including stream mining, video
analytics, machine learning, deep
learning, next-generation data
visualization and advanced data
modelling.
Research Structure
Domain | Sub-Domain | Research Areas
Algorithm and Machine Learning | High Performance and Parallel Computing | 1. HPC for massive heterogeneous data sources; 2. Enhanced algorithmic performance using shared and distributed memory parallel processing (GPGPU)
Algorithm and Machine Learning | Performance Optimization | 1. Big Data Stream Mining; 2. Data Storage
Social Media Analytics | Data mining | 1. Predictive Analytics
Social Media Analytics | Social Media Modelling | 1. Sentiment Analysis; 2. Topic Modelling
Research Structure
Domain | Sub-Domain | Research Areas
Behavioral Analytics | Media Analytics | 1. Media Recommender; 2. Customer Profiling
 | Smart Cities | 1. Sensor networks
 | Transport & mobility management | 1. Image and Video Analytics
 | Network Analysis | 1. Fault Prediction; 2. Intrusion Prediction
Research Structure
Domain | Sub-Domain | Research Areas
Public Health Analytics | Public health data | 1. Infectious Disease modeling; 2. Home Monitoring and Sensing Technologies
Public Health Analytics | Multi-domain Electronic Health Records data | 1. Knowledge + Data Driven Risk Factor; 2. Text mining for clinical notes
Financial & Business Analytics | Marketing and e-commerce | 1. Finance and Banking
Financial & Business Analytics | Financial market design and behavior | 1. Time Series Analysis
In the near future…
Machine learning is all around us…
• Machine learning is part of our daily lives
• Email spam detection
• Photo search using keywords
• Movie and song recommender systems
• Voice recognition
• Video captioning
• Self-driving cars
• etc.
What is machine learning?
Data Algorithms Insight
Machine Learning 101
• Machine Learning is a process for generalizing
from examples
• examples = example or "training" data
• generalizing = building "statistical models" to capture
correlations
• process = an ongoing process: we keep validating and
refitting models to improve accuracy
• Simple machine learning workflow:
• explore data
• FIT models based on data
• APPLY models in prediction
• Evaluate and validate the models
*Essentially, all models are wrong, but some are
useful
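The explore / FIT / APPLY / evaluate workflow above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the data set, the single-threshold model and the names `fit`, `predict` and `accuracy` are assumptions made for the example, not part of the original slides.

```python
# Toy data, made up for illustration: (hours of use per day, label),
# where label 1 means "heavy user" and 0 means "light user".
data = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0), (3.0, 0),
        (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1), (8.0, 1)]

# 1. Explore: compare the average feature value of each class.
def mean(values):
    return sum(values) / len(values)

mean_light = mean([x for x, y in data if y == 0])
mean_heavy = mean([x for x, y in data if y == 1])

# Hold out every third example for validation (a simple deterministic split).
train = [row for i, row in enumerate(data) if i % 3 != 0]
test = [row for i, row in enumerate(data) if i % 3 == 0]

def accuracy(threshold, rows):
    """Share of rows where the rule 'x > threshold means label 1' is right."""
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)

# 2. FIT: choose the threshold with the best accuracy on the training rows.
def fit(rows):
    candidates = sorted(x for x, _ in rows)
    return max(candidates, key=lambda t: accuracy(t, rows))

# 3. APPLY: use the fitted threshold to predict a label for new input.
def predict(x, threshold):
    return 1 if x > threshold else 0

threshold = fit(train)

# 4. Evaluate: measure accuracy on the held-out rows only.
test_accuracy = accuracy(threshold, test)
print(threshold, test_accuracy, predict(5.0, threshold))
```

In practice the last two steps are repeated: when held-out accuracy drops, the model is refitted on newer data.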
3 types of machine learning
• Supervised Learning – generalizing from labeled data
http://cfs22.simplicdn.net/ice9/free_resources_article_thumb/Machine_Learning_2.jpg
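As a small illustration of generalizing from labeled data, here is a minimal k-nearest-neighbour classifier in plain Python. The email features and labels are hypothetical and chosen only to make the idea concrete.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled examples."""
    by_distance = sorted(train, key=lambda row: math.dist(row[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled emails: (number of links, number of "free"/"win" words).
labeled = [((0, 0), "ham"), ((1, 0), "ham"), ((0, 1), "ham"),
           ((5, 4), "spam"), ((6, 3), "spam"), ((4, 5), "spam")]

print(knn_predict(labeled, (5, 5)))  # spam
print(knn_predict(labeled, (1, 1)))  # ham
```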
3 types of machine learning
• Unsupervised learning – generalizing from unlabeled data
http://cfs22.simplicdn.net/ice9/free_resources_article_thumb/Machine_Learning_3.jpg
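For unlabeled data, a common starting point is clustering. The sketch below is a bare-bones k-means in plain Python; the points and the choice of starting centroids are assumptions made for the example.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# No labels here: the algorithm has to discover the two groups by itself.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(clusters)
```

Real implementations usually start from several random initializations, because k-means can settle into a poor local solution.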
3 types of machine learning
• Reinforcement learning – generalizing based on feedback over time
http://cfs22.simplicdn.net/ice9/free_resources_article_thumb/Machine_Learning_5.jpg
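Learning from feedback over time can be illustrated with an epsilon-greedy multi-armed bandit, one of the simplest reinforcement learning settings. The reward probabilities, step count and function name below are hypothetical.

```python
import random

def epsilon_greedy(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn which action pays best from trial-and-error feedback alone."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    values = [0.0] * len(reward_probs)   # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))                 # explore
        else:
            action = max(range(len(values)), key=values.__getitem__)  # exploit
        # The environment returns a reward of 1 or 0 for the chosen action.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

# Two actions with hidden success rates of 20% and 80%.
values, counts = epsilon_greedy([0.2, 0.8])
print(values, counts)
```

The learner is never told which action is better; it only sees rewards, and its estimates improve as feedback accumulates.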
Common machine learning techniques…
Naive Bayes
Decision Tree
K-Nearest Neighbour
Artificial Neural Network
Support Vector Machine
Ensemble Methods: Random Forest,
Bagging, AdaBoost
Logistic Regression
K-means
Which technique to use?
• What is the size and dimensionality of my
training set?
• Is my data linearly separable?
• How much do I care about
computational efficiency?
• Model building time vs real-time prediction time
• Eager vs lazy learning; online vs batch
learning
• Prediction performance vs speed
• Do I care about interpretability, or
should it "just work well"?
What can I do with machine learning?
• Customer Churn Analysis
• Predictive Maintenance
• Customer Segmentation
• Product Recommendation
Business Analytics: Predict Customer Churn
• Problem: Customer churn leads to lost income and high costs to
acquire new customers
• Solution: Build a predictive model to forecast possible churn, act
pre-emptively and learn from historical data
1. Get customer data (set-top boxes, web logs, transaction history)
2. Explore data, and fit predictive models based on past or real-time data
3. Apply and validate models until predictions are accurate
4. Identify customers likely to churn
5. Escalate the incidents to Business Ops. to investigate and act accordingly
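Steps 2 to 4 above could look roughly like the following sketch, which fits a logistic regression by gradient descent in plain Python and flags customers whose predicted churn probability exceeds 0.5. The features, the data and the 0.5 cut-off are assumptions for illustration only, not the method used in the original talk.

```python
import math

def predict_proba(weights, x):
    """Probability of churn for feature tuple x (the last weight is the bias)."""
    z = sum(w * xj for w, xj in zip(weights, list(x) + [1.0]))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict_proba(weights, x) - y
            for j, xj in enumerate(list(x) + [1.0]):
                grads[j] += error * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grads)]
    return weights

# Hypothetical history: (support calls last month, months since last purchase).
customers = [(0, 1), (1, 0), (1, 1), (0, 2), (4, 5), (5, 4), (3, 6), (6, 5)]
churned = [0, 0, 0, 0, 1, 1, 1, 1]

weights = fit_logistic(customers, churned)

# Score current customers and keep the ones that look likely to churn.
current = [(5, 5), (0, 1)]
at_risk = [c for c in current if predict_proba(weights, c) > 0.5]
print(at_risk)
```

The `at_risk` list is what would be handed over for follow-up in step 5.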
Operation Analytics: Predictive Maintenance
• Problem: Network or service outages lead to lost income and high
expenses
• Solution: Build a predictive model to forecast possible outages, act
pre-emptively and learn from historical data
1. Get resource usage data (latency, syslog, outage reports)
2. Explore data, and fit predictive models based on past or real-time data
3. Apply and validate models until predictions are accurate
4. Forecast resource saturation, demand and usage
5. Escalate the incidents to IT Ops. to investigate and act accordingly
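As one simple way to approach step 4, the sketch below fits a straight-line trend to daily usage readings and estimates how many days remain before a capacity limit is reached. The usage figures, the 90% limit and the helper names are hypothetical, and a linear trend is only one of many possible forecasting models.

```python
def fit_line(ys):
    """Ordinary least-squares line y = slope * t + intercept for t = 0, 1, 2, ..."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return slope, y_mean - slope * t_mean

def days_until(ys, capacity):
    """Days from the last observation until the fitted trend reaches `capacity`."""
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None   # no upward trend, so no saturation is forecast
    return (capacity - intercept) / slope - (len(ys) - 1)

# Hypothetical daily disk usage in percent.
usage = [50, 52, 54, 56, 58, 60]
print(days_until(usage, capacity=90))  # 15.0
```

A forecast like this gives IT Ops lead time to act before the outage occurs rather than after.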
Summary: The machine learning process
• Problem: Identify a problem that costs time and money
• Solution: Build a predictive model to forecast possible incidents, act
pre-emptively and learn
1. Get all data relevant to the problem
2. Explore data, and fit predictive models on past/real-time data
3. Apply and validate models until predictions are accurate
4. Forecast KPIs & metrics associated with the use case
5. Escalate the incidents to the respective units to investigate and act
Machine learning tools
Thanks!
Questions?
@kuanhoong
https://www.linkedin.com/in/kuanhoong
khpoo@mmu.edu.my
