
Predictive Analytics - Big Data & Artificial Intelligence


A quick overview of the latest in big data and artificial intelligence. With so many buzzwords being thrown around, this presentation aims to demystify the most common terms.

Published in: Technology


  1. October 2016. Predictive Analytics: Big Data & Artificial Intelligence
  2. Agenda. Demystify the following buzzwords: Artificial Intelligence (AI), Big Data, Machine Learning, Deep Learning, Neural Networks, Natural Language Processing (NLP), Image Recognition.
  3. Ultimate Goal: Predictive Analytics. Predict what users will want to buy. Example: a consumer searches for a TV and, based on previous customers' data, we show a product that also has a high probability of being bought.
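The "customers who bought a TV also bought…" idea above can be sketched as a simple co-purchase count. This is a minimal illustration, not a production recommender; the order data and item names are made up for the example.

```python
from collections import Counter

# Hypothetical purchase history: sets of items bought together in past orders.
orders = [
    {"tv", "hdmi cable"},
    {"tv", "soundbar"},
    {"tv", "hdmi cable", "wall mount"},
    {"laptop", "mouse"},
]

def recommend(item, orders):
    """Recommend the product most often bought alongside `item`."""
    co_bought = Counter()
    for order in orders:
        if item in order:
            co_bought.update(order - {item})  # count everything bought with it
    return co_bought.most_common(1)[0][0]

print(recommend("tv", orders))   # -> 'hdmi cable'
```

Real systems estimate these co-purchase probabilities over millions of orders, but the core signal is the same frequency count.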
  4. Evolution of Data Analytics. 1990s: Excel (What Happened?). 2000s: Business Intelligence (BI) Dashboards (What's Happening?). 2015 and beyond: Actionable Insights (What Will Happen?).
  5. The Process. Data Generated: structured and unstructured (e.g. video) data. Data Stored: data is stored in databases and servers (Big Data). Data Processing: process the data using Central Processing Units (CPUs) / Graphics Processing Units (GPUs) and AI algorithms to detect patterns (Artificial Intelligence). Actionable Insights: predictive signals are generated.
  6. How Did We Get Here? Databases (the 80s): relational databases, gigabytes in size, low latency. Data Warehousing (the 90s): terabytes in size, custom hardware.
  7. Today, it's Big Data
  8. Artificial Intelligence (AI)
  9. Artificial Intelligence (AI)
  10. When To Use Machine Learning. (1) A pattern exists. (2) We cannot pin down the pattern mathematically. (3) We have data, hopefully lots of it.
  11. Types of Machine Learning
  12. Supervised Learning. We know what we are trying to predict. We "train" the model on examples whose answers both we and the model know; it can then generate predictions for examples we don't know the answer to. Example: predict the price of a house based on its size (a scatter plot of price vs. square feet with a fitted line).
  13. Unsupervised Learning. We don't know what we are trying to predict. We are trying to identify naturally occurring patterns in the data which may be informative. Example: identify "clusters" of customers based on the data we have on them (groups of points on an X/Y scatter plot).
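The customer-clustering example can be sketched with k-means, a classic unsupervised algorithm: no labels, just groups that emerge from the data. The customer data and the deterministic initialization are assumptions made for this sketch.

```python
import math

# Hypothetical customer data: (age, yearly spend), made up for illustration.
customers = [(25, 300), (27, 320), (24, 280), (60, 900), (62, 950), (58, 880)]

def kmeans(points, k=2, iters=10):
    """Minimal k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its cluster, and repeat."""
    centroids = list(points[:k])          # deterministic init for the sketch
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        centroids = [
            tuple(sum(v) / len(c) for v in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters

young, older = kmeans(customers)
```

On this toy data the algorithm recovers two groups (younger low-spend customers vs. older high-spend ones) without ever being told those groups exist; that is the "naturally occurring pattern" the slide refers to.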
  14. What is Deep Learning? In practice the terms Deep Learning and (deep) Neural Networks are used interchangeably. It is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data using a deep graph with multiple processing layers, composed of multiple linear and non-linear transformations. What we see (an image) vs. what the computer "sees" (a grid of numbers).
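The "multiple linear and non-linear transformations" can be shown concretely as a forward pass through a tiny two-layer network. The weights below are arbitrary, untrained values chosen only to show data flowing through the layers; a real network would learn them.

```python
def dense(inputs, weights, biases):
    """One linear transformation: matrix-vector product plus bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(v):
    """One non-linear transformation, applied element-wise."""
    return [max(0.0, x) for x in v]

# Hypothetical, untrained weights -- for illustration only.
W1, b1 = [[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1]
W2, b2 = [[1.0, -1.0]], [0.0]

x = [1.0, 2.0]                 # what the computer "sees": raw numbers
h = relu(dense(x, W1, b1))     # hidden layer: a higher-level representation
y = dense(h, W2, b2)           # output layer
```

"Deep" just means stacking many such linear + non-linear layers, so each layer can build a more abstract representation (edges, then shapes, then objects) out of the raw numbers.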
  15. Tools of the Trade: Apache SystemML, Google Cloud Machine Learning.
  16. Questions?
  17. Appendix
  18. AI Researchers: Geoffrey Hinton (University of Toronto / Google), Yoshua Bengio (University of Montreal), Yann LeCun (New York University / Facebook), Andrew Ng (Stanford University / Baidu).
  19. CPU vs GPU Performance
  20. MapReduce
  21. The Name…Hadoop. Named after the yellow toy elephant of Doug Cutting's son. In 2006, while working at Yahoo, Doug created the Hadoop framework. In 2008 it became a project of the open source Apache Software Foundation, hence the official name Apache Hadoop.
  22. Hadoop to the Rescue: "an open source framework written in Java for storing and processing massive amounts of data in a distributed manner". Key components of the framework: (1) Storage: the Hadoop Distributed File System (HDFS), a scalable file system that distributes and stores data across many machines in a cluster. (2) Analysis: MapReduce, a framework for distributed processing.
  23. Hadoop Architecture. Hadoop can run on cheap commodity hardware, on premise or in the cloud. HDFS: stores files in large blocks (64 MB by default) across multiple machines for fault tolerance; by default each block is stored on 3 separate machines. MapReduce: breaks large data processing problems into multiple steps, namely Mappers and Reducers, that can be worked on in parallel across multiple machines.
  24. MapReduce example: 100 MB of store sales data is split under the Name Node across Data Node 1 (64 MB) and Data Node 2 (36 MB). Mappers emit records keyed by city (LA, NYC); the Shuffle and Sort phase groups the LA records on Task Tracker 1 and the NYC records on Task Tracker 2; the Reducers, coordinated by the Job Tracker, aggregate each group.
  25. MapReduce pipeline: Map → Shuffle & Sort → Reduce → Result.
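The Map → Shuffle & Sort → Reduce pipeline above can be simulated in a few lines on a single machine. The sales records echo the LA/NYC example but are invented; real Hadoop runs each phase distributed across the cluster, yet the data flow is the same.

```python
from collections import defaultdict

# Hypothetical store-sales records, echoing the LA / NYC example above.
records = [("LA", 120.0), ("NYC", 80.0), ("LA", 45.0), ("NYC", 200.0)]

def mapper(record):
    """Map step: emit (key, value) pairs -- here, (city, sale amount)."""
    city, amount = record
    yield city, amount

def shuffle(pairs):
    """Shuffle & Sort: group all values by key across mapper outputs."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reducer(key, values):
    """Reduce step: aggregate each key's values -- here, total sales."""
    return key, sum(values)

mapped = [pair for rec in records for pair in mapper(rec)]
result = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(result)   # total sales per city
```

Because mappers only look at one record at a time and reducers only at one key's group, each phase parallelizes naturally across machines, which is the whole point of the framework.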
  26. Hadoop 1.0 vs 2.0
  27. The Future…