Big Data and Machine Learning for Businesses

  1. Big Data and Machine Learning for Businesses, by Abdul Wahid
  2. Who Am I? Abdul Wahid, CEO at GleeTech • Started working as a Software Engineer in 2005. • PhD (Computer Science), Victoria University of Wellington, 2016. • Help people develop awesome digital products. • Passionate about: • Big Data • Machine Learning • Tech Startups/Businesses • Personal site:
  3. What Is Big Data?
  4. Big Data • Additional dimensions • Complexity: multiple sources and data streams • Variability: unpredictable data flows, social media trends
  5. “Data is the new oil of the digital economy.” (Wired, 2014)
  6. Why Big Data Is Important • Data contains information. • Information leads to insights. • Insights help us make better decisions.
  7. Examples: Using Data • Weather Data • Utility & Manufacturing Companies • Transport & Tourism • News, Accidents, Census • Insurance Companies • Surveys, Blogs, Comments, Tweets • Marketing Analysis • Customer Segmentation
  8. Examples: Benefits of Data • Sports: strategies, data-driven analytics • Entertainment: users' opinions, sentiments • Retail: consumer behavior • Health care: patient/symptom analysis • Financial institutions: fraud detection
  9. How Do We Derive Insights from Data?
  10. Machine Learning? “Field of study that gives computers the ability to learn without being explicitly programmed.” (Arthur Samuel, 1959)
  12. Top Algorithms • Supervised Learning: Decision Trees, Naïve Bayes Classification, Ordinary Least Squares Regression, Logistic Regression, Support Vector Machines, Ensemble Methods • Unsupervised Learning: Clustering Algorithms (centroid-based, connectivity-based, density-based, probabilistic), Dimensionality Reduction, Neural Networks / Deep Learning, Principal Component Analysis, Singular Value Decomposition, Independent Component Analysis
  13. Example • We want some hypothesis h that predicts whether we will be paid back, using the attributes a (address is known), c (criminal record), i (income is high), o (old), u (unemployed) and e; + means the attribute holds, - means it does not. Training data:
      1. +a, -c, +i, +e, +o, +u: Y
      2. -a, +c, -i, +e, -o, -u: N
      3. +a, -c, +i, -e, -o, -u: Y
      4. -a, -c, +i, +e, -o, -u: Y
      5. -a, +c, +i, -e, -o, -u: N
      6. -a, -c, +i, -e, -o, +u: Y
      7. +a, -c, -i, -e, +o, -u: N
      8. +a, +c, +i, -e, +o, -u: N
      • Lots of possible hypotheses: will be paid back if…
        • Income is high (wrong on 2 occasions in the training data)
        • Income is high and no criminal record (always right in the training data)
        • (Address is known AND ((NOT Old) OR Unemployed)) OR ((NOT Address is known) AND (NOT Criminal Record)) (always right in the training data)
      • Which one seems best? Anything better?
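To check the training-error claims on this slide, here is a minimal Python sketch that encodes the eight examples and counts how often each hypothesis disagrees with the label. The helper names (`parse`, `training_errors`, `HYPOTHESES`) are illustrative choices, not something from the talk; attribute e is parsed but no hypothesis uses it.

```python
# The eight training examples from the slide. "+x" means attribute x holds, "-x" that it
# does not: a = address known, c = criminal record, i = high income, o = old, u = unemployed.
ROWS = [
    "+a, -c, +i, +e, +o, +u: Y",
    "-a, +c, -i, +e, -o, -u: N",
    "+a, -c, +i, -e, -o, -u: Y",
    "-a, -c, +i, +e, -o, -u: Y",
    "-a, +c, +i, -e, -o, -u: N",
    "-a, -c, +i, -e, -o, +u: Y",
    "+a, -c, -i, -e, +o, -u: N",
    "+a, +c, +i, -e, +o, -u: N",
]


def parse(row):
    """Turn '+a, -c, ...: Y' into ({'a': True, 'c': False, ...}, True)."""
    attrs, label = row.split(":")
    features = {tok.strip()[1]: tok.strip()[0] == "+" for tok in attrs.split(",")}
    return features, label.strip() == "Y"


DATA = [parse(row) for row in ROWS]

# The three candidate hypotheses; each returns True for "will be paid back".
HYPOTHESES = {
    "income is high": lambda x: x["i"],
    "income is high and no criminal record": lambda x: x["i"] and not x["c"],
    "address/old/unemployed rule": lambda x: (x["a"] and (not x["o"] or x["u"]))
    or (not x["a"] and not x["c"]),
}


def training_errors(hypothesis):
    """Number of training examples on which the hypothesis disagrees with the label."""
    return sum(hypothesis(x) != y for x, y in DATA)


for name, hypothesis in HYPOTHESES.items():
    print(f"{name}: {training_errors(hypothesis)} errors")
```

It prints 2, 0 and 0 errors respectively, so training error alone cannot choose between the last two hypotheses.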
  14. Decision trees
      High income?
        no  → NO
        yes → Criminal record?
                yes → NO
                no  → YES
  15. Constructing a decision tree, one step at a time (examples numbered as on slide 13)
      Address?
        yes → 1 (Y), 3 (Y), 7 (N), 8 (N) → Criminal?
                yes → 8 (N)
                no  → 1 (Y), 3 (Y), 7 (N) → Income?
                        yes → 1 (Y), 3 (Y)
                        no  → 7 (N)
        no  → 2 (N), 4 (Y), 5 (N), 6 (Y) → Criminal?
                yes → 2 (N), 5 (N)
                no  → 4 (Y), 6 (Y)
      Address was perhaps not the best attribute to start with…
  16. Starting with a different attribute (examples numbered as on slide 13)
      Criminal?
        yes → 2 (N), 5 (N), 8 (N)
        no  → 1 (Y), 3 (Y), 4 (Y), 6 (Y), 7 (N)
      • Seems like a much better starting point than address
      • Each node is almost completely uniform
      • Almost completely predicts whether we will be paid back
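The slides judge "a better starting point" by eye. One common way to make that precise is information gain, the entropy reduction used by ID3-style tree learners; the slides do not name a splitting criterion, so treat this choice as an assumption. The sketch below re-encodes the same eight examples and scores every attribute as a candidate root split.

```python
import math
from collections import Counter

# The same eight loan examples (a = address known, c = criminal record, i = high income,
# o = old, u = unemployed; e is also on the slide).
ROWS = [
    "+a, -c, +i, +e, +o, +u: Y",
    "-a, +c, -i, +e, -o, -u: N",
    "+a, -c, +i, -e, -o, -u: Y",
    "-a, -c, +i, +e, -o, -u: Y",
    "-a, +c, +i, -e, -o, -u: N",
    "-a, -c, +i, -e, -o, +u: Y",
    "+a, -c, -i, -e, +o, -u: N",
    "+a, +c, +i, -e, +o, -u: N",
]


def parse(row):
    """Turn '+a, -c, ...: Y' into ({'a': True, 'c': False, ...}, True)."""
    attrs, label = row.split(":")
    features = {tok.strip()[1]: tok.strip()[0] == "+" for tok in attrs.split(",")}
    return features, label.strip() == "Y"


DATA = [parse(row) for row in ROWS]


def entropy(labels):
    """Shannon entropy, in bits, of a list of boolean labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(data, attr):
    """Drop in label entropy after splitting `data` on the boolean attribute `attr`."""
    remainder = 0.0
    for value in (True, False):
        subset = [y for x, y in data if x[attr] == value]
        if subset:
            # Weight each branch's entropy by the fraction of examples it receives.
            remainder += len(subset) / len(data) * entropy(subset)
    return entropy([y for _, y in data]) - remainder


gains = {attr: information_gain(DATA, attr) for attr in "acieou"}
for attr, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{attr}: {gain:.3f}")
```

Criminal record (c) comes out on top with a gain of about 0.549 bits, while address (a) gains exactly 0: each address branch still holds two Y and two N examples, which matches the slide's remark that address was not the best first split.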
  17. Naïve Bayes Classification
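  18. Linear Regression • The task of fitting a straight line through a set of points • Ordinary least squares picks the line that minimizes the sum of squared vertical distances between the points and the line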
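For a single input variable, the least-squares line has a closed form: the slope is the covariance of x and y divided by the variance of x, and the line passes through the point of means. A minimal pure-Python sketch follows; the example points are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept) minimizing sum((y - (slope*x + intercept))**2)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two points and equally long xs and ys")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical example points, not from the talk.
slope, intercept = fit_line([1, 2, 3, 4], [2, 4, 5, 4])
print(f"y = {slope:.2f}x + {intercept:.2f}")  # y = 0.70x + 2.00
```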
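  19. Logistic Regression • Measures the relationship between a categorical dependent variable and one or more independent variables by estimating probabilities with the logistic (sigmoid) function • Credit scoring • Measuring the success rates of marketing campaigns • Predicting the revenues of a certain product (a continuous target like revenue is usually modeled with linear regression instead)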
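As a rough illustration of the credit-scoring use case, here is a minimal sketch that fits P(y = 1 | x) = sigmoid(w·x + b) for one input by batch gradient descent on the log-loss. The data, learning rate and epoch count are assumptions made up for this toy example; production code would normally use a library solver.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(w*x + b) by batch gradient descent on the log-loss."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # For the log-loss, d(loss)/dz is simply (predicted probability - label).
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical credit-scoring data: x = a made-up score, y = 1 if the loan was repaid.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
for x in (2, 7):
    print(f"score {x}: P(repaid) = {sigmoid(w * x + b):.2f}")
```

The decision boundary, where the predicted probability crosses 0.5, sits at x = -b / w; for this symmetric toy data it lands near 4.5. The data is deliberately not perfectly separable, since on separable data the unregularized weights would keep growing.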
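  20. Support Vector Machine • Binary classification algorithm • Given points in an N-dimensional feature space, an SVM finds an (N − 1)-dimensional hyperplane that separates them into two groups, choosing the one with the largest margin to the nearest points of each group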
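A minimal sketch of a linear soft-margin SVM, trained here by stochastic subgradient descent on the hinge loss (a Pegasos-style approach chosen for brevity; library SVMs typically use dedicated solvers instead). The toy data and the hyperparameters `lam`, `lr` and `epochs` are assumptions. With N = 2 features, the separating hyperplane is a line.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=1000):
    """Linear soft-margin SVM trained by stochastic subgradient descent.

    Approximately minimizes lam/2 * ||w||^2 + average hinge loss
    max(0, 1 - y * (w.x + b)). Labels must be +1 or -1.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:
                # Inside the margin or misclassified: the hinge term contributes -y*x.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Outside the margin: only the regularizer acts, shrinking w.
                w = [wi * (1 - lr * lam) for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane w.x + b = 0 x lies."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical 2-D data (N = 2), so the separating hyperplane is a line.
points = [(2, 3), (3, 3), (3, 2), (-1, -2), (-2, -1), (-2, -2)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print("w =", w, "b =", b)
print([predict(w, b, p) for p in [(4, 4), (-3, -3)]])
```

Because this toy data is linearly separable, the learned line classifies every training point correctly; on overlapping data the hinge loss trades margin width against training errors.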
  21. Ensemble Methods
  22. Clustering Algorithms
  23. Principal Component Analysis
  24. Singular Value Decomposition • PCA is a simple application of SVD: the principal components of a centered data matrix are its right singular vectors
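The SVD route to PCA can be sketched in a few lines of NumPy, assuming samples are stored as rows of `X`. The synthetic data is illustrative only; the final check compares the result against the textbook route, the eigendecomposition of the covariance matrix.

```python
import numpy as np


def pca_via_svd(X, k):
    """Principal component analysis of the rows of X, computed with an SVD.

    Returns (scores, components, explained_variance) for the top-k components.
    """
    Xc = X - X.mean(axis=0)  # PCA works on centered data
    # Xc = U @ diag(S) @ Vt; the rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    # Squared singular values / (n - 1) are the eigenvalues of the covariance matrix.
    explained_variance = S[:k] ** 2 / (X.shape[0] - 1)
    scores = Xc @ components.T  # coordinates of each sample in the new basis
    return scores, components, explained_variance


# Synthetic correlated 3-D data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.2]])
scores, components, var = pca_via_svd(X, k=2)

# Cross-check: eigenvalues of the sample covariance matrix, sorted in descending order.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
print(np.allclose(var, eigvals[:2]))  # True
```

Working from the SVD of the centered data avoids forming the covariance matrix explicitly, which is numerically better behaved when features have very different scales.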
  25. Independent Component Analysis • Reveals hidden factors that underlie sets of random variables • The observed data variables are assumed to be linear mixtures of some unknown latent variables, and the mixing system is also unknown • The latent variables are assumed to be non-Gaussian and mutually independent • Applications: images, documents
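The slide does not show an ICA algorithm. One widely used method is FastICA (Hyvärinen and Oja). Below is a minimal NumPy sketch of its deflation variant with a tanh nonlinearity, assuming noiseless linear mixing. The source signals and the mixing matrix are made up for illustration; for real work a maintained implementation such as scikit-learn's `sklearn.decomposition.FastICA` is the safer choice.

```python
import numpy as np


def fast_ica(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """Estimate independent sources from the mixed signals in the columns of X.

    X has shape (n_samples, n_features). Returns an (n_samples, n_components) array
    of recovered sources, defined only up to order, sign and scale.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Whitening: transform the data so its covariance is the identity matrix.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Z = Xc @ eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    W = np.zeros((n_components, Z.shape[1]))
    for k in range(n_components):
        w = rng.normal(size=Z.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            g = np.tanh(Z @ w)
            # Fixed-point update: E[z g(w.z)] - E[g'(w.z)] w, with g = tanh.
            w_new = (Z * g[:, None]).mean(axis=0) - (1.0 - g**2).mean() * w
            # Deflation: stay orthogonal to the components already found.
            w_new -= W[:k].T @ (W[:k] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[k] = w
    return Z @ W.T


# Two made-up source signals, mixed by a matrix the algorithm never sees.
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]
A = np.array([[1.0, 1.0], [0.5, 2.0]])
X = S @ A.T
recovered = fast_ica(X, n_components=2)

# Each true source should correlate strongly (up to sign) with one recovered column.
corr = np.abs(np.corrcoef(S.T, recovered.T)[:2, 2:])
print(corr.max(axis=1))
```

The sign, scale and order of the recovered columns are arbitrary, which is why the check compares absolute correlations rather than the raw signals.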
  27. Conclusions • Data is nothing without insights • Machine learning is the key to deriving insights from data • Big Data and Machine Learning have huge potential • From a mobile-first to an AI-first approach
  28. Next Steps • Machine Learning Course • Big Data Course • Must-Know Machine Learning Algorithms: engineers.html • Machine Learning Startups • Recent Cool Startups: learning-startups-to-watch/d/d-id/1326571