
Machine learning and big data


A brief introduction to Machine Learning and Big Data

Published in: Technology

  1. Machine Learning and Big Data, by Poo Kuan Hoong (Multimedia University)
  2. Disclaimer: The views and opinions expressed in these slides are those of the author and do not necessarily reflect the official policy or position of Multimedia University. The analyses performed within these slides are examples only. They should not be used in real-world analytic products, as they are based on very limited and dated open-source information. Assumptions made within the analyses do not reflect the position of Multimedia University.
  3. Data Science Institute
     • The Data Science Institute is a research center based in the Faculty of Computing & Informatics, Multimedia University.
     • Its members bring expertise from across faculties, including the Faculty of Computing and Informatics, the Faculty of Engineering, the Faculty of Management, and the Faculty of Information Science and Technology.
     • It conducts research in leading data science areas, including stream mining, video analytics, machine learning, deep learning, next-generation data visualization, and advanced data modelling.
  4. Research Structure (Domain, Sub-Domain, Research Areas)
     • Algorithm and Machine Learning
       • High Performance and Parallel Computing: 1. HPC for massive heterogeneous data sources 2. Enhanced algorithmic performance using shared and distributed memory parallel processing (GPGPU)
       • Performance Optimization: 1. Big Data Stream Mining 2. Data Storage
     • Social Media Analytics
       • Data Mining: 1. Predictive Analytics
       • Social Media Modelling: 1. Sentiment Analysis 2. Topic Modelling
  5. Research Structure (Domain, Sub-Domain, Research Areas)
     • Behavioral Analytics
     • Media Analytics: 1. Media Recommender 2. Customer Profiling
     • Smart Cities: 1. Sensor networks
     • Transport & mobility management: 1. Image and Video Analytics
     • Network Analysis: 1. Fault Prediction 2. Intrusion Prediction
  6. Research Structure (Domain, Sub-Domain, Research Areas)
     • Public Health Analytics
       • Public health data: 1. Infectious Disease Modelling 2. Home Monitoring and Sensing Technologies
       • Multi-domain Electronic Health Records data: 1. Knowledge + Data Driven Risk Factor 2. Text mining for clinical notes
     • Financial & Business Analytics
       • Marketing and e-commerce: 1. Finance and Banking
       • Financial market design and behavior: 1. Time Series Analysis
  7. In the near future…
  8. Machine learning is all around us…
     • Machine learning is part of our daily lives:
     • Email spam detection
     • Photo search using keywords
     • Movie and song recommender systems
     • Voice recognition
     • Video captioning
     • Self-driving cars
     • etc.
  9. What is machine learning? Data, Algorithms, Insight
  10. Machine Learning 101
     • Machine learning is a process for generalizing from examples
     • examples = example or "training" data
     • generalizing = building "statistical models" to capture correlations
     • process = an ongoing process: we keep validating and refitting models to improve accuracy
     • Simple machine learning workflow:
     • EXPLORE the data
     • FIT models based on the data
     • APPLY the models in prediction
     • EVALUATE and validate the models
     * Essentially, all models are wrong, but some are useful
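
The explore, fit, apply, evaluate workflow can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data set (message length versus a spam label), the single-threshold model, and all function names are hypothetical and do not come from the slides.

```python
# Hypothetical labeled examples: (message length in words, is_spam).
train = [(5, 0), (8, 0), (12, 0), (15, 0), (40, 1), (55, 1), (60, 1), (72, 1)]
test = [(10, 0), (20, 0), (50, 1), (65, 1)]  # held out for validation

# EXPLORE: compare the average feature value per class.
for label in (0, 1):
    values = [x for x, y in train if y == label]
    print(f"class {label}: mean length = {sum(values) / len(values):.1f}")

def predict(threshold, x):
    """APPLY: predict class 1 when the feature exceeds the threshold."""
    return int(x > threshold)

def accuracy(threshold, examples):
    """EVALUATE: fraction of examples predicted correctly."""
    correct = sum(predict(threshold, x) == y for x, y in examples)
    return correct / len(examples)

def fit_threshold(examples):
    """FIT: choose the cut point with the best accuracy on the training data."""
    values = sorted(x for x, _ in examples)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda t: accuracy(t, examples))

threshold = fit_threshold(train)
print("threshold:", threshold)
print("held-out accuracy:", accuracy(threshold, test))
```

Evaluating on examples that were not used for fitting is what makes the last step a validation rather than a restatement of the training fit.
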
  11. 3 types of machine learning
     • Supervised learning: generalizing from labeled data
     Image: http://cfs22.simplicdn.net/ice9/free_resources_article_thumb/Machine_Learning_2.jpg
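
As a minimal sketch of supervised learning, the following uses k-nearest neighbours (one of the techniques the deck lists later) on a small labeled set. The features, labels, and the choice of k = 3 are assumptions made for illustration.

```python
from collections import Counter
from math import dist

def knn_predict(training, query, k=3):
    """Return the majority label among the k training points closest to query."""
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled emails: (number of links, number of exclamation marks) -> label.
training = [
    ((0, 0), "ham"), ((1, 0), "ham"), ((0, 1), "ham"), ((1, 1), "ham"),
    ((6, 5), "spam"), ((7, 4), "spam"), ((5, 6), "spam"), ((8, 7), "spam"),
]

print(knn_predict(training, (1, 2)))
print(knn_predict(training, (6, 6)))
```

The labels are what make this supervised: the prediction for a new point is generalized from examples whose correct answer is already known.
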
  12. 3 types of machine learning
     • Unsupervised learning: generalizing from unlabeled data
     Image: http://cfs22.simplicdn.net/ice9/free_resources_article_thumb/Machine_Learning_3.jpg
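
Unsupervised learning can be illustrated with a plain k-means loop, which groups unlabeled points around centroids. The customer data and the fixed starting centroids below are hypothetical; practical implementations usually pick initial centroids more carefully.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabeled customers: (monthly spend, visits per month).
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

No labels are supplied; the two groups emerge only from the distances between points.
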
  13. 3 types of machine learning
     • Reinforcement learning: generalizing from feedback received over time
     Image: http://cfs22.simplicdn.net/ice9/free_resources_article_thumb/Machine_Learning_5.jpg
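
A simple way to see learning from feedback over time is an epsilon-greedy agent on a multi-armed bandit. This is a sketch, not something taken from the slides: the reward probabilities, the exploration rate, and the seed are all assumptions.

```python
import random

def epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays best purely from the rewards observed so far."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    values = [0.0] * len(reward_probs)  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore a random action
        else:
            action = max(range(len(values)), key=values.__getitem__)  # exploit the best estimate
        # Feedback from the (simulated) environment: reward 1 or 0.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

# Hypothetical environment: three actions with hidden success probabilities.
values, counts = epsilon_greedy([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=values.__getitem__)
print("estimated values:", [round(v, 2) for v in values])
print("pull counts:", counts, "best action:", best_action)
```

The agent never sees the true probabilities; its estimates improve only through the rewards it receives as it acts.
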
  14. Common machine learning techniques…
     • Naive Bayes
     • Decision Tree
     • K-Nearest Neighbour
     • Artificial Neural Network
     • Support Vector Machine
     • Ensemble Methods: Random Forest, Bagging, AdaBoost
     • Logistic Regression
     • K-means
  15. Which technique to use?
     • What is the size and dimensionality of my training set?
     • Is my data linearly separable?
     • How much do I care about computational efficiency?
     • Model-building time vs real-time prediction time
     • Eager vs lazy learning / online vs batch learning
     • Prediction performance vs speed
     • Do I care about interpretability, or should it "just work well"?
  16. What can I do with machine learning?
     • Customer Churn Analysis
     • Predictive Maintenance
     • Customer Segmentation
     • Product Recommendation
  17. Business Analytics: Predict Customer Churn
     • Problem: Customer churn leads to lost income and to high expenses for finding new customers
     • Solution: Build a predictive model to forecast possible churn, act pre-emptively, and learn from historical data
     1. Get customer data (set-top boxes, web logs, transaction history)
     2. Explore the data, and fit predictive models based on past or real-time data
     3. Apply and validate the models until predictions are accurate
     4. Identify customers likely to churn
     5. Escalate the incidents to Business Ops to investigate and act accordingly
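
Steps 2 to 4 of the churn workflow could look roughly like the sketch below, which fits a logistic regression by batch gradient descent and flags customers above a probability cut-off. The slides do not say which model to use; logistic regression, the two scaled features, the learning rate, and the 0.5 cut-off are all assumptions for illustration.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def churn_probability(model, x):
    """Apply the fitted model to one customer's feature vector."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights, bias, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            error = churn_probability((weights, bias), x) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical history: (inactivity score, complaint score), both scaled to 0..1,
# with label 1 for customers who churned.
rows = [(0.1, 0.0), (0.2, 0.1), (0.1, 0.2), (0.3, 0.1),
        (0.8, 0.7), (0.9, 0.6), (0.7, 0.9), (0.6, 0.8)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_logistic(rows, labels)

# Identify current customers likely to churn (hypothetical customers A and B).
current = {"A": (0.85, 0.8), "B": (0.15, 0.1)}
at_risk = [name for name, x in current.items() if churn_probability(model, x) >= 0.5]
print("likely to churn:", at_risk)
```

In practice the cut-off would be tuned against validation data and the cost of contacting a customer unnecessarily.
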
  18. Operations Analytics: Predictive Maintenance
     • Problem: A network or service outage leads to lost income and high expenses
     • Solution: Build a predictive model to forecast possible outages, act pre-emptively, and learn from historical data
     1. Get resource usage data (latency, syslog, outage reports)
     2. Explore the data, and fit predictive models based on past or real-time data
     3. Apply and validate the models until predictions are accurate
     4. Forecast resource saturation, demand, and usage
     5. Escalate the incidents to IT Ops to investigate and act accordingly
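
Forecasting resource saturation (step 4) can be sketched with a least-squares trend line extrapolated to a capacity limit. The disk-usage readings, the 100% capacity, and the 30-day escalation window are hypothetical, and a straight-line trend is a simplifying assumption that real usage data may not satisfy.

```python
def fit_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = sum(values) / n
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def days_until_saturation(values, capacity=100.0):
    """Days after the latest reading until the trend reaches capacity.

    Returns None when usage is flat or falling, so no saturation is forecast.
    """
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None
    saturation_day = (capacity - intercept) / slope
    return max(0.0, saturation_day - (len(values) - 1))

# Hypothetical daily disk usage in percent, oldest reading first.
disk_usage = [52.0, 54.1, 55.9, 58.2, 60.0, 61.8, 64.1]
remaining = days_until_saturation(disk_usage)
print(f"forecast: full in about {remaining:.0f} days")

# Escalate when saturation is forecast within the (assumed) 30-day window.
if remaining is not None and remaining < 30:
    print("escalate to IT Ops")
```
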
  19. Summary: The machine learning process
     • Problem: Identify a problem that may cost time and incur high expenses
     • Solution: Build a predictive model to forecast possible incidents, act pre-emptively, and learn
     1. Get all data relevant to the problem
     2. Explore the data, and fit predictive models on past or real-time data
     3. Apply and validate the models until predictions are accurate
     4. Forecast the KPIs and metrics associated with the use case
     5. Escalate the incidents to the respective units to investigate and act
  20. Machine learning tools
  21. Thanks! Questions?
     • @kuanhoong
     • https://www.linkedin.com/in/kuanhoong
     • khpoo@mmu.edu.my
