Big Data & Artificial Intelligence 
2014 Technology Review and Primer 
Zavain Dar
High Level 
Data —> Infrastructure —> Enables more Data —> Analytics, 
Applications, & Artificial Intelligence 
If we buy the above, we see ‘AI’, ‘Big Data’, ‘Deep Learning’, etc… not as buzzwords, but as a logical next step in the technological progress of the past 20 years
Outline 
• Historical Context: The Web, Big Data, & Distributed Computing 
• Modern Infrastructure 
• Artificial Intelligence 
• Learnings & Thesis Directions 
Computing Infrastructure pre-Web
• Storage Paradigm: Relational Databases (Oracle, MySQL, etc…)
• Access Paradigm: Relational Algebra (SQL)
• Each computer owned its data; computation was generally done on a single computer
1983: 100 Nodes convert to TCP/IP
• Until 1983, there was no unified ‘internet’, rather a collection of fragmented networks using one-off protocols
• In 1983, the 100 most connected nodes switched to TCP/IP. The modern Internet was born
The Web as a ‘Big Data’-base 
• We can view the Web itself as the 
first big database 
• Storage Paradigm: HTML, DOM, 
Relational Databases (Oracle, 
MySQL) 
• Access Paradigm: HTTP 
The Web emerged as the first ‘Big Data’-set 
• Other than HTTP requests, which were slow and clunky, we had no way to index and parse web content
• A handful of search engines came and went, but all struggled to effectively deploy algorithms atop this massive distributed data set
Google in 1998 
• Data uniformly distributed across 
computers 
• Storage Paradigm: GFS (Google File System)
• Access Paradigm: ??? 
• Google kept Access Paradigm 
proprietary for years 
2004: Big Data leaves Google’s confines 
• Jeff Dean and Sanjay Ghemawat 
publish seminal paper outlining 
MapReduce, a distributed data 
access paradigm 
• Storage Paradigm: GFS 
• Access Paradigm: MapReduce
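As a rough illustration of the MapReduce access paradigm, the single-process Python sketch below counts words in three phases: map, shuffle, and reduce. The function names (map_phase, shuffle, reduce_phase) and the sample documents are assumptions made for this example. This is not Google's or Hadoop's API, and a real deployment would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    # On a real cluster, each document (or file chunk) would be mapped on a different machine.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    groups = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in groups.items())


# Hypothetical input: each string stands in for a chunk stored on a separate node.
documents = ["big data big web", "web data", "big index"]
word_counts = map_reduce(documents)
```

Because map and reduce only ever see one document or one key at a time, the same two functions could in principle be spread over as many machines as the data requires.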
Modern Big Data 
• Apache Hadoop was born as an open source project from Yahoo in 2005. It followed Google’s GFS and MapReduce designs
• Hadoop consisted of HDFS (Hadoop Distributed File System) and Hadoop MapReduce
• It took years for the open source framework to become enterprise ready. In the interim, Cloudera and Hortonworks began offering enterprise solutions based around Hadoop
• Others wrote completely black box, proprietary versions based on GFS and MapReduce. Examples: Palantir and Discovery Engine
• Palantir only recently switched over to Hadoop-based code.
Emergent Themes 
• Commoditization of Infrastructure 
• Early infrastructure providers have plateaued in value; Hortonworks is a recent example, with a down-round IPO
• DevOps
• As computing models shifted from local machines to distributed, heterogeneous hardware, new solutions emerged to help keep pace with innovation
• ‘Appification’ and Analytics atop Hadoop
DevOps: Docker 
• Programming and testing on a laptop is different from running on Dell x86 clusters or on mobile + HP servers
• Docker creates a portable container around an application, making it easy to port to heterogeneous environments
DevOps: Mesosphere
• The old world had Virtual Machines which sliced single computers into numerous ‘virtual instances’ for security, debugging, etc…
• Now we need the opposite: to view entire clusters as a single computer with shared and (hence) optimized storage, network, and compute
Artificial Intelligence 
Traditional AI broken into 2 categories 
1. Computational Logic (this guy!) & Search+Planning 
2. Machine Learning 
Computational Logic + Planning 
• Based on implementing static rules for a computer to follow. The 
end algorithm and rules are independent of the data 
• Old school (Chomskyan) NLP and chess playing followed this 
approach 
• Planning based on route optimization and ‘graph search’ 
• E.g. how do you efficiently plan a UPS route, or guide a robotic arm around the obstacles of a course known in advance
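A minimal sketch of the ‘graph search’ style of planning, assuming the course is a small grid known in advance. It uses breadth-first search, one of the simplest graph search algorithms; the grid, the function name shortest_path, and the coordinates are hypothetical. Note that the rules are fixed in the code and nothing is learned from data.

```python
from collections import deque


def shortest_path(grid, start, goal):
    """Breadth-first search on a grid; returns the cells from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start to rebuild the route.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


# Hypothetical course known in advance: 0 = free cell, 1 = obstacle.
course = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = shortest_path(course, start=(0, 0), goal=(2, 0))
```

Here the only way from the top-left to the bottom-left cell is around the wall of obstacles, so the returned path passes through the gap on the right.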
Computational Logic + Planning 
• From the 1940s through the early 1990s this was the preferred methodology for AI
• Key assumption: the world is guided by rules, and it is only a matter of time before we encode the minimal viable set from which computers can deduce future outcomes and propositions
• AI slowed in results, and hence in funding, from the 70s through the 80s. This was known as the AI Winter, and was largely due to the heavy academic emphasis on these methods
• The early 90s showed a focus on statistical methods, commonly dubbed the Bayesian Revolution
• This led to the proliferation and growth of machine learning
Machine Learning 
• Premise for machine learning:
• Have a dataset D
• Have an algorithm f
• f applied to the dataset D gives a new function (model) m
• m applied to any input i predicts an output o
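The premise can be sketched in a few lines of Python. Here the learning algorithm f is assumed, purely for illustration, to be a least-squares fit of a line through the origin; the dataset values are made up. The point is only the shape of the idea: f takes data and returns a function m, and m maps an input i to an output o.

```python
def f(D):
    """Learning algorithm: fit y = w * x to the dataset by least squares, return the model."""
    w = sum(x * y for x, y in D) / sum(x * x for x, _ in D)

    def m(i):
        """Model: predict an output o for any input i."""
        return w * i

    return m


# Hypothetical dataset of (input, output) pairs.
D = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
m = f(D)    # applying the algorithm to the data yields a model
o = m(4.0)  # applying the model to a new input yields a prediction
```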
Machine Learning (Pictorially) 
D -> f -> m(i) -> o
1. The machine learning algorithm f is applied to the dataset D, giving the model m
2. For any input i, the model m predicts an output o
3 Types of Machine Learning 
1) Supervised Learning 
D -> f -> m(i) -> o
• D consists of input, output pairs: <i, o>
• The larger D is, the more general and accurate the end model m tends to be
• Learn by example
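A small sketch of learning by example, assuming a 1-nearest-neighbor classifier as the algorithm f (one of many possible supervised learners). The feature values and labels below are invented for illustration.

```python
def train_nearest_neighbor(D):
    """Supervised learner f: D is a list of (input, output) pairs; returns a model m."""
    examples = list(D)

    def m(i):
        # Predict the output of the labeled example whose input is closest to i.
        _, nearest_output = min(
            examples,
            key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], i)),
        )
        return nearest_output

    return m


# Hypothetical labeled data: (height_cm, weight_kg) -> species label.
D = [
    ((25.0, 4.0), "cat"),
    ((30.0, 5.0), "cat"),
    ((60.0, 30.0), "dog"),
    ((70.0, 35.0), "dog"),
]
m = train_nearest_neighbor(D)
prediction = m((65.0, 32.0))
```

Adding more labeled pairs to D gives the model more examples to compare against, which is the intuition behind larger datasets tending to give better models.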
3 Types of Machine Learning 
2) Unsupervised (Topological) Learning 
D -> f -> m(i) -> o
• D consists of just inputs: <i>
• Generally end up with a partitioning of D
• Good at finding patterns
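To make the partitioning concrete, here is a minimal sketch assuming k-means (Lloyd's algorithm) on one-dimensional data as the algorithm f. The data points and the starting centers are arbitrary choices for the example; no labels are given.

```python
def k_means_1d(points, centers, iterations=10):
    """Unsupervised learner f: partitions unlabeled points around k centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster (unchanged if empty).
        centers = [sum(c) / len(c) if c else centers[k] for k, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical unlabeled inputs <i>; the initial centers are an arbitrary starting guess.
D = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = k_means_1d(D, centers=[0.0, 5.0])
```

The result is a partition of D into two groups, found without ever being told which group a point belongs to.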
3 Types of Machine Learning 
3) Reinforcement Learning 
D -> f -> m(i) -> o (fed back into D)
• You add some derivative of the output back to the initial dataset, and reoptimize your model
• E.g. learning to play chess by playing over and over again. Ideally the more you play the less you lose
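Chess is too large for a short example, so the sketch below swaps in a much simpler stand-in: an epsilon-greedy learner on a hypothetical two-move game, where each move wins with a fixed probability unknown to the learner. The win probabilities, the seed, and the function name play_and_learn are assumptions. The feedback step is the line that folds each outcome back into the running estimates.

```python
import random


def play_and_learn(win_probability, rounds=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner: keeps a running estimate of each move's win rate."""
    rng = random.Random(seed)
    n_moves = len(win_probability)
    counts = [0] * n_moves    # how often each move has been tried
    values = [0.0] * n_moves  # estimated win rate of each move
    for _ in range(rounds):
        if rng.random() < epsilon:
            move = rng.randrange(n_moves)  # explore: try a random move
        else:
            move = max(range(n_moves), key=lambda a: values[a])  # exploit the best estimate
        reward = 1.0 if rng.random() < win_probability[move] else 0.0
        # Feed the outcome back into the estimates (incremental mean).
        counts[move] += 1
        values[move] += (reward - values[move]) / counts[move]
    return counts, values


# Hypothetical game with two possible moves; move 1 wins far more often than move 0.
counts, values = play_and_learn(win_probability=[0.2, 0.8])
```

After enough rounds the learner's estimates favor the better move and it plays that move most of the time, which is the "more you play, less you lose" effect in miniature.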
Deep Learning 
• Deep Learning and Neural Nets are synonymous
• Deep Learning is a subset of machine learning; it is a class of functions f from the previous slides
• Deep learning algorithms take in a data set and spit out another function, or model, m
• Can be deployed in supervised, unsupervised, and reinforcement contexts
Deep Learning 
• First theorized and worked on in the 80s
• However, lacked the infrastructure and data to meaningfully deploy
• Has seen a massive resurgence from 2009 onwards
• Loosely inspired by (vague) knowledge of the brain: layers of abstraction
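The idea of layers of abstraction can be sketched with a tiny two-layer network that computes XOR, a function no single layer of this kind can represent. The weights here are picked by hand for illustration, not learned; a real deep learning algorithm would learn them from data. All names are hypothetical.

```python
def step(x):
    """Threshold activation: fires (1) when the weighted input is positive."""
    return 1 if x > 0 else 0


def layer(inputs, weights, biases):
    """One layer: every unit takes a weighted sum of all inputs, then applies the activation."""
    return [
        step(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def xor_net(a, b):
    # Hidden layer computes two intermediate features: "a OR b" and "a AND b".
    hidden = layer([a, b], weights=[[1, 1], [1, 1]], biases=[-0.5, -1.5])
    # Output layer combines them: OR but not AND, which is XOR.
    return layer(hidden, weights=[[1, -1]], biases=[-0.5])[0]


truth_table = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
```

The hidden layer turns raw inputs into slightly more abstract features, and the output layer works on those features rather than on the raw inputs.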
Deep Learning 
• Useful for noisy, large, human generated data
• That is, data for which even the correct form of the model input i can be tricky to characterize
• When I see a picture of a human face, I immediately recognize eyes, a nose and ears … hence a face
• When a computer receives the same image, it’s a rectangular grid of RGB values. How do we map the computer’s input space to our semantic space?
• Types of data that this makes sense for: Text, Visual (images & video), Audio, User behavior (my patterns on Twitter or Facebook), Basketball (player millisecond movement), etc…
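To show what the computer's input space looks like, the sketch below represents a hypothetical 2x2 image as a grid of RGB triples and flattens it into the kind of numeric vector a model would receive as its input i. The pixel values and the scaling to [0, 1] are assumptions for the example.

```python
# Hypothetical 2x2 image: each pixel is an (R, G, B) triple with values in 0..255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def to_input_vector(image):
    """Flatten the pixel grid row by row into one list of floats scaled to [0, 1]."""
    return [channel / 255.0 for row in image for pixel in row for channel in pixel]


i = to_input_vector(image)
```

Nothing in this vector says "eye" or "nose"; a real photo yields millions of such numbers, and the model has to bridge the gap to semantic concepts on its own.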
Deep Learning
Functions Artificial Neural Nets Can Learn
• Image Models: good fine-grained classification (“hibiscus”, “dahlia”) and sensible errors (“dog”)
• Audio: “sh ang hai res taur aun ts”
• LSTM for End to End Translation: sentence representations under PCA are linearly separable with respect to subject vs object
• Embeddings are Powerful: fall, fell, fallen; draw, drew, drawn; take, took, taken; give, gave, given
• Generating Image Captions from Pixels (work in progress by Oriol Vinyals)
• Human: A young girl asleep on the sofa cuddling a stuffed bear.
• Model sample 1: A close up of a child holding a stuffed animal.
• Model sample 2: A baby is asleep next to a teddy bear.
Current Landscape 
GPUs, FPGAs, ASICs (user wants specialized deployments either for the learning function f or the end model m):
Select examples: Nervana Systems, TerraDeep, Qualcomm Neuromorphic Group
APIs, SDKs (user wants to use prewritten algos on their datasets):
Select examples: Metamind, Skymind.io, Vicarious, DeepMind
Vertical (technology is black-boxed from user):
Select examples: Clarifai, Butterfly Networks, Binatix, etc…
Artificial Intelligence
• Computational Logic & Planning
• Machine Learning: Statistical Regressions, Deep Learning, etc…
Applications
• NLP 
• Computer Vision 
• Robotics 
• Audio 
• Sports 
• Genetics 
• Finance 
• Anomaly Detection
Learnings 
Static software commoditizes
• Early big data infrastructure providers stagnating
• Google’s algorithms are essentially public (PageRank etc…)
• Deep Learning algos are an arms race & race to the bottom
Defensibility, and the ability to grow into a large 100M+ company, lies in owning proprietary data from which you can train better models and/or in having network or scale effects
Why is now special? We’re sitting at the intersection of:
1. A matured big data infrastructure driven by well understood distributed storage and data access paradigms
2. Data that continues to explode, not only through the web, but also via noisy sensor and human generated data
3. The AI tools necessary to make sense of unstructured and noisy datasets whose features don’t map well to our a priori intuition
‘Virtuous’ Feedback Loops
Going back to Google:
D (on computers C) -> f -> m(i) -> o -> D’
Two parts of this loop are marked ‘Commoditized’
Feedback Loops 
• Google collects click-data with each user. This enables better search for the next user: the (n+1)th user has a better experience than the nth user
• Google increases its margin over the competition the more we use it
• Leads to a run-away effect
• Can explain Google’s monopoly in search
• Same analogy with Facebook/Twitter ads and other large tech cos
• Prediction: Early movers who can bootstrap the initial feedback loop will be big, potentially winner-take-all, winners
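The loop can be sketched with a toy ranker that reorders results by past clicks. This is an illustrative assumption about the mechanism, not a description of how Google's ranking actually works; the class name ClickRanker and the page names are hypothetical.

```python
from collections import Counter


class ClickRanker:
    """Toy search engine that re-ranks results by how often past users clicked them."""

    def __init__(self, results):
        self.results = list(results)
        self.clicks = Counter()

    def search(self):
        # Most-clicked results first; sorted() is stable, so ties keep their original order.
        return sorted(self.results, key=lambda r: -self.clicks[r])

    def record_click(self, result):
        # Each click becomes new data that improves the ranking for the next user.
        self.clicks[result] += 1


# Hypothetical result pages; the first users must scroll to find "page_c".
engine = ClickRanker(["page_a", "page_b", "page_c"])
before = engine.search()
for _ in range(3):
    engine.record_click("page_c")
after = engine.search()
```

Every user who clicks leaves the engine slightly better for the next one, so whoever has the most users accumulates the best data.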
Data —> Infrastructure —> Enables more Data —> Analytics, 
Applications, & Artificial Intelligence 
Empirical Timeline
• MapReduce
• Hadoop
• Big Data
• Deep Learning
Fin 
zavain.dar@luxcapital.com 
@zavaindar 

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

Big Data & Artificial Intelligence

  • 1. Big Data & Artificial Intelligence: 2014 Technology Review and Primer, by Zavain Dar
  • 2. High Level: Data -> Infrastructure -> Enables more Data -> Analytics, Applications, & Artificial Intelligence. If we buy the above, we see ‘AI’, ‘Big Data’, ‘Deep Learning’, etc. not as buzzwords, but as a logical next step of the technological progress of the past 20 years
  • 3. Outline • Historical Context: The Web, Big Data, & Distributed Computing • Modern Infrastructure • Artificial Intelligence • Learnings & Thesis Directions
  • 4. Computing Infrastructure pre Web • Storage Paradigm: Relational Databases (Oracle, MySQL, etc.) • Access Paradigm: Relational Algebra (SQL) • Each computer owned its data; computation was generally done on a single computer • [Diagram: three nodes labeled C, three labeled D]
  • 5. 1984: 100 Nodes convert to TCP/IP • Until 1984, there was no unified ‘internet’, but rather a collection of fragmented networks using one-off protocols • In 1984, the 100 most connected nodes switched to TCP/IP. The modern Internet was born
  • 6. The Web as a ‘Big Data’-base • We can view the Web itself as the first big database • Storage Paradigm: HTML, DOM, Relational Databases (Oracle, MySQL) • Access Paradigm: HTTP • [Diagram: four nodes labeled C, four labeled D]
  • 7. The Web emerged as the first ‘Big Data’-set • Other than HTTP requests, which were slow and clunky, we had no way to index and parse web content • A handful of search engines came and went, but all struggled to effectively deploy algorithms atop this massive distributed data set
  • 8. Google in 1998 • Data uniformly distributed across computers • Storage Paradigm: GFS (Google File System) • Access Paradigm: ??? • Google kept its Access Paradigm proprietary for years • [Diagram: one node labeled D, four labeled C]
  • 9. 2004: Big Data leaves Google’s confines • Jeff Dean and Sanjay Ghemawat publish a seminal paper outlining MapReduce, a distributed data access paradigm • Storage Paradigm: GFS • Access Paradigm: MapReduce
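To make the MapReduce access paradigm from slide 9 concrete, here is a minimal single-machine sketch in Python. It is an illustration only, not Google's or Hadoop's implementation: the names `map_reduce`, `word_mapper`, and `count_reducer` and the sample `pages` are hypothetical, and a real system would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Toy, in-process version of the map -> shuffle -> reduce pipeline."""
    groups: Dict[str, List[int]] = defaultdict(list)
    # Map: every record emits zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: collect all values that share a key.
            groups[key].append(value)
    # Reduce: collapse each key's values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line: str) -> Iterable[Tuple[str, int]]:
    # Emit (word, 1) for every word in the line.
    for word in line.lower().split():
        yield word, 1


def count_reducer(word: str, counts: List[int]) -> int:
    # Sum the ones emitted for this word.
    return sum(counts)


# Hypothetical stand-in for crawled web pages.
pages = ["big data big models", "data beats models"]
word_counts = map_reduce(pages, word_mapper, count_reducer)
print(word_counts)
```

The point of the split is that `word_mapper` and `count_reducer` never need to see the whole dataset at once, which is what lets the same program scale out over data spread across many computers.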
  • 10. Modern Big Data • Apache Hadoop was born as an open source project from Yahoo in 2005, following the designs of Google’s GFS and MapReduce • Hadoop consisted of HDFS (Hadoop Distributed File System) and Hadoop MapReduce • It took years for the open source framework to become enterprise ready. In the interim, Cloudera and Hortonworks began offering enterprise solutions based around Hadoop • Others wrote completely black-box, proprietary versions based on GFS and MapReduce. Examples: Palantir and Discovery Engine • Palantir only recently switched over to Hadoop-based code
  • 11. Emergent Themes • Commoditization of Infrastructure • Early infrastructure providers have plateaued in value; Hortonworks is a recent example, with a down-round IPO • DevOps • As computing models moved from local machines to distributed, heterogeneous hardware, new solutions emerged to help keep pace with innovation • ‘Appification’ and Analytics atop Hadoop
  • 12. DevOps: Docker • Programming and testing on a laptop is different from running on Dell x86 clusters or on a mobile + HP server setup • Docker creates a portable container around an application, making it easy to port to heterogeneous environments • [Diagram: the same Application shown on a laptop, on x86 machines, and on an HP + iOS setup]
  • 13. DevOps: Mesosphere • The old world had Virtual Machines, which sliced single computers into numerous ‘virtual instances’ for security, debugging, etc. • Now we need the opposite: to view entire clusters as a single computer with shared and (hence) optimized storage, network, and compute • [Diagram: nodes labeled C and C’]
  • 14. Artificial Intelligence • Traditional AI is broken into 2 categories: 1. Computational Logic (this guy!) & Search + Planning 2. Machine Learning
  • 15. Computational Logic + Planning • Based on implementing static rules for a computer to follow. The end algorithm and rules are independent of the data • Old school (Chomskyan) NLP and chess playing followed this approach • Planning is based on route optimization and ‘graph search’ • E.g., how do you efficiently plan a UPS route, or guide a robotic arm around the obstacles of a course known in advance?
  • 16. Computational Logic + Planning • From the 1940s through the early 1990s this was the preferred methodology for AI • Key assumption: the world is guided by rules, and it is only a matter of time before we can encode the minimal viable set from which computers can deduce future outcomes and propositions • AI results slowed, and hence so did funding, from the 70s through the 80s. This was known as the AI Winter, and was largely due to heavy academic emphasis on these methods • The early 90s showed a focus on statistical methods, commonly dubbed the Bayesian Revolution • This led to the proliferation and growth of machine learning
  • 17. Machine Learning • Premise for machine learning: • Have a dataset D • Have an algorithm f • f applied to the dataset D gives a new function (model) m = f(D) • m applied to any input i predicts an output o = m(i)
  • 18. Machine Learning (Pictorially) • [Diagram: D -> f -> m(i) -> o] 1. The machine learning algorithm f is applied to the dataset D, giving the model m 2. For any input i, the model m predicts an output o
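The notation on slides 17 and 18 maps directly onto code: f is a function that takes a dataset and returns another function. The sketch below assumes, purely for illustration, that f is ordinary least-squares line fitting and that D is a small made-up set of <i, o> pairs; the slides do not commit to any particular algorithm.

```python
from typing import Callable, Sequence, Tuple

Dataset = Sequence[Tuple[float, float]]  # pairs <i, o>
Model = Callable[[float], float]         # m: input i -> predicted output o


def f(D: Dataset) -> Model:
    """Learning algorithm: fit a straight line to D and return it as a model."""
    n = len(D)
    mean_i = sum(i for i, _ in D) / n
    mean_o = sum(o for _, o in D) / n
    # Closed-form least-squares slope and intercept.
    variance = sum((i - mean_i) ** 2 for i, _ in D)
    covariance = sum((i - mean_i) * (o - mean_o) for i, o in D)
    slope = covariance / variance
    intercept = mean_o - slope * mean_i

    def m(i: float) -> float:
        # The model only remembers the fitted parameters, not the dataset.
        return slope * i + intercept

    return m


# Made-up dataset that happens to lie on the line o = 2i + 1.
D = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
m = f(D)      # step 1: apply the learning algorithm to the dataset
o = m(4.0)    # step 2: the model predicts an output for a new input
print(o)
```

Swapping in a different f (a decision tree, a neural net) changes how m is built but not this two-step shape.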
  • 19. 3 Types of Machine Learning: 1) Supervised Learning • [Diagram: D -> f -> m(i) -> o] • D consists of input, output pairs: <i, o> • The larger D is, the more generalized and accurate the end model m tends to be • Learn by example
  • 20. 3 Types of Machine Learning: 2) Unsupervised (Topological) Learning • [Diagram: D -> f -> m(i) -> o] • D consists of just inputs: <i> • Generally end up with a partitioning of D • Good at finding patterns
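As a sketch of what "a partitioning of D" on slide 20 can look like, the snippet below runs a one-dimensional k-means on unlabeled inputs. k-means is one common choice picked here for illustration; the slides do not name an algorithm, and the function name `k_means_1d`, the data, and the starting centroids are all hypothetical.

```python
from typing import List, Sequence, Tuple


def k_means_1d(
    points: Sequence[float],
    initial_centroids: Sequence[float],
    iterations: int = 10,
) -> Tuple[List[float], List[List[float]]]:
    """Alternate between assigning points to the nearest centroid and re-centering."""
    centroids = list(initial_centroids)
    clusters: List[List[float]] = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its closest centroid.
            nearest = min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[k] for k, c in enumerate(clusters)
        ]
    return centroids, clusters


# D holds inputs only: no output labels are provided.
D = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, partition = k_means_1d(D, initial_centroids=[0.0, 5.0])


def m(i: float) -> int:
    # The resulting model maps a new input to the index of its nearest cluster.
    return min(range(len(centroids)), key=lambda k: abs(i - centroids[k]))


print(partition, m(8.7))
```

No one told the algorithm that there were "small" and "large" values; the two groups emerge from the structure of the inputs alone.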
  • 21. 3 Types of Machine Learning: 3) Reinforcement Learning • [Diagram: D -> f -> m(i) -> o, with a feedback arrow marked ‘?’] • You add some derivative of the output back to the initial dataset, and reoptimize your model • E.g., learning to play chess by playing over and over again. Ideally, the more you play, the less you lose
  • 22. Deep Learning • Deep Learning and Neural Nets are synonymous • Deep Learning is a subset of machine learning; it is a class of the functions f from the previous slides • Deep learning algorithms take in a dataset and spit out another function, or model, m • Can be deployed in supervised, unsupervised, and reinforcement contexts
  • 23. Deep Learning • First theorized and worked on in the 80s • However, the field lacked the infrastructure and data to deploy it meaningfully • Has seen a massive resurgence from 2009 onwards • Loosely inspired by (vague) knowledge of the brain: layers of abstraction
  • 24. Deep Learning • Useful for noisy, large, human-generated data • That is, data for which even the correct form of the model input i can be tricky to characterize • When I see a picture of a human face, I immediately recognize eyes, a nose, and ears, and hence a face • When a computer receives the same image, it’s a rectangular grid of RGB values. How do we map the computer’s input space to our semantic space? • Types of data that this makes sense for: Text, Visual (images & video), Audio, User behavior (my patterns on Twitter or Facebook), Basketball (player millisecond movement), etc.
  • 25. Deep Learning [Image-only slide: a collage of result panels. Recoverable panel titles and labels: ‘Functions Artificial Neural Nets Can Learn’ • ‘Good Fine-grained Classification’ (“hibiscus”, “dahlia”) • ‘Sensible Errors’ (“dog”) • ‘Image Models’ • ‘Audio’ (“sh ang hai res taur aun ts”) • ‘Embeddings are Powerful’ (verb forms such as take/took/taken, give/gave/given, fall/fell/fallen, draw/drew/drawn) • ‘LSTM for End to End Translation’ • sentence representations under PCA, linearly separable with respect to subject vs. object • ‘Generating Image Captions from Pixels’ (work in progress by Oriol Vinyals). Caption example: Human: “A young girl asleep on the sofa cuddling a stuffed bear.” Model sample 1: “A close up of a child holding a stuffed animal.” Model sample 2: “A baby is asleep next to a teddy bear.”]
  • 26. Current Landscape • GPUs, FPGAs, ASICs (user wants specialized deployments, either for the learning function f or for the end model m). Select examples: Nervana Systems, TeraDeep, Qualcomm Neuromorphic Group • APIs, SDKs (user wants to use prewritten algorithms on their own datasets). Select examples: MetaMind, Skymind.io, Vicarious, DeepMind • Vertical (technology is black-boxed from the user). Select examples: Clarifai, Butterfly Network, Binatix, etc.
  • 27. Artificial Intelligence • Computational Logic & Planning • Machine Learning: Statistical Regressions, Deep Learning, etc. • Applications: NLP, Computer Vision, Robotics, Audio, Sports, Genetics, Finance, Anomaly Detection
  • 28. Learnings • Static software commoditizes: • Early big data infrastructure providers are stagnating • Google’s algorithms are essentially public (PageRank etc.) • Deep Learning algorithms are an arms race and a race to the bottom • Defensibility, and the ability to grow into a large 100M+ company, lies in owning proprietary data from which you can train better models, and/or in having network or scale effects • Why is now special? We’re sitting at the intersection of: 1. A matured big data infrastructure driven by well understood distributed storage and data access paradigms 2. Data that continues to explode, not only through the web but also via noisy sensor and human-generated data 3. The AI tools necessary to make sense of unstructured and noisy datasets whose features don’t map well to our a priori intuition
  • 29. ‘Virtuous’ Feedback Loops • Going back to Google: [Diagram: D, spread across four computers C -> f -> m(i) -> o -> D’]
  • 30. ‘Virtuous’ Feedback Loops • Going back to Google: [Diagram: D, spread across four computers C -> f -> m(i) -> o -> D’, with two components labeled ‘Commoditized’]
  • 31. Feedback Loops • Google collects click data from each user, which enables better search for the next user: the (n+1)th user has a better experience than the nth user • The more we use Google, the more it widens its margin over the competition • Leads to a runaway effect • Can explain Google’s monopoly in search • The same analogy holds for Facebook/Twitter ads and other large tech companies • Prediction: early movers who can bootstrap an initial feedback loop will be big, potentially winner-take-all, winners
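The feedback loop on slide 31 can be illustrated with a deliberately tiny simulation. Everything in it is an assumption made for the sake of the sketch: the "search engine" simply ranks pages by accumulated clicks, every simulated user clicks the one relevant page, and the names `rank`, `results`, and `relevant` are hypothetical. It is not a description of how Google actually ranks results.

```python
from collections import Counter
from typing import List


def rank(results: List[str], clicks: Counter) -> List[str]:
    # Most-clicked results first; sorted() is stable, so ties keep their original order.
    return sorted(results, key=lambda r: -clicks[r])


results = ["page_a", "page_b", "page_c"]
relevant = "page_c"        # assumption: every user wants this page
clicks: Counter = Counter()  # the dataset D, initially empty
positions: List[int] = []    # where each successive user finds the relevant page

for user in range(3):
    ranking = rank(results, clicks)                 # model built from data so far
    positions.append(ranking.index(relevant) + 1)   # 1-based position seen by this user
    clicks[relevant] += 1                           # D' = D plus this user's click

print(positions)
```

The first user has to scroll to the last result; every later user finds the relevant page at the top because of data the earlier user left behind. A competitor starting with an empty click log serves each of its users the first user's experience, which is the runaway effect the slide describes.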
  • 32. Data -> Infrastructure -> Enables more Data -> Analytics, Applications, & Artificial Intelligence