How graphs became just another big data primitive 
Ted Willke 
Cloud Platforms Group / Big Data Solutions
Why graphs are cool: DEMO 
2
So, how did graphs become just another useful big data primitive? 
They DIDN’T.
Reduce the tool drag for graph analytics 
-- Vision (early 2012) 
Set off in the right direction 
4
A complete graph analytics solution 
5 
-- July 2013
6 
Wide on Analytics -> E2E on Graph -> Deep on Graph -> Wide on Analytics 
User Interest
Learning #1: Don’t ignore what’s popular! 
7
Popular Big Data (Structure) Primitives 
Which one is best? It depends… and it’s probably not just one. 
Key-Value Document Column Tabular Graph 
8
Key-Value 
Basic dictionary. 
Very fast. 
Very easy. 
No/minimal structure. 
Java, PIQL, Lua, XML, XQuery,… 
9
Document 
Key(s), metadata, hierarchy, document structure 
XML, BSON, JSON… 
Java, C, C++, REST, Clojure, Scala… 
10
Column 
Key:col_val, Key:col_val… 
Great for “do this to everything in this column” 
Not so much for multiple columns, specific keys 
Hadoop, Zookeeper, Java, Python,… 
11
Tabular 
Old-school RDBMS 
Collection of tables + relations that join them 
*SQL* 
12
Graph 
Nodes, edges, properties of nodes and edges 
Java, Clojure, Lisp, Ruby, C, C++, Scala, REST,… 
13
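To make the five primitives concrete, here is a minimal Python sketch that holds the same small dataset in each shape. The users, ages, and the "follows" relationship are invented for illustration and do not come from the talk.

```python
import json

# Hypothetical toy data: two users and one "follows" relationship.

# Key-value: an opaque value looked up by key; no structure is imposed.
key_value = {"user:1": "alice", "user:2": "bob"}

# Document: a self-describing, possibly nested record (serialized as JSON here).
document = json.dumps(
    {"id": 1, "name": "alice", "follows": [{"id": 2, "name": "bob"}]}
)

# Column: the values of each column are stored together, keyed by row id.
column = {
    "name": {1: "alice", 2: "bob"},
    "age": {1: 34, 2: 29},
}

# Tabular: rows with a fixed schema, plus a relation that joins them.
users = [(1, "alice", 34), (2, "bob", 29)]
follows = [(1, 2)]  # (follower_id, followee_id)

# Graph: nodes and edges, each carrying properties.
nodes = {1: {"name": "alice"}, 2: {"name": "bob"}}
edges = [(1, 2, {"label": "follows", "since": 2014})]

# "Do this to everything in this column" is a one-liner on the column layout.
mean_age = sum(column["age"].values()) / len(column["age"])

# A relationship question is a short traversal on the graph layout.
followed_by_alice = [nodes[dst]["name"] for src, dst, _ in edges if src == 1]
```

Neither layout is "best": the column layout suits the aggregate and the graph layout suits the traversal, which is the point of "it depends".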
Key-Value Document Graph 
Sync (I/O) Async (Bus) Off-line (Queue) 
API (Remote) LIB (Local) 
Model 
Access 
Implementation 
Column SQL 
14 
How we use the primitives
How are these primitives put to use? 
15
Ingest & Clean 
Engineer Features 
Structure Model 
Train Model 
Query & Analyze 
Learn 
Visualize 
Data workflow example 
16
Data Representation 
Personal Learning Knowledge Graph 
Nodes: 
Task Level 
-name: "10th Grade" 
-value: 10 
Learning Task 
-name: "Matrix Multiplication" 
-task_id: 101 
-description: "Demonstrate how to multiply two matrices" 
-type: "homework" 
Subject 
-name: "Linear Algebra" 
-subject_id: 100 
Task Outcome 
-score: 0.8 
-num_correct: 8 
-num_attempts: 2 
Learning Plan 
-plan_id: 1 
-num_tasks: 5 
-expected_time: 5h 
Learning Goal 
-goal_id: 9 
-description: "Achieve above average proficiency in all Linear Algebra course tasks" 
Proficiency 
-name: "Above Average" 
Edge labels: has_associated, has_result, contains, implemented_by, evaluated_by, summarized_by, has_prerequisite 
Graph? Columnar? Tabular?? 
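One way to see why the question has no single answer is to hold a slice of the knowledge graph as nodes and edges and then flatten it into rows. The sketch below is illustrative only: the node properties are taken from the slide, but the node ids and the choice of which nodes each edge label connects are assumptions, since the slide's layout is not recoverable from the transcript.

```python
# Nodes keyed by a hypothetical id; property values copied from the slide.
nodes = {
    "task": {"type": "Learning Task", "name": "Matrix Multiplication", "task_id": 101},
    "subject": {"type": "Subject", "name": "Linear Algebra", "subject_id": 100},
    "outcome": {"type": "Task Outcome", "score": 0.8, "num_correct": 8, "num_attempts": 2},
    "plan": {"type": "Learning Plan", "plan_id": 1, "num_tasks": 5},
}

# Edges as (source, label, target). The labels appear on the slide; the
# endpoints chosen for each label are an assumption made for this sketch.
edges = [
    ("plan", "contains", "task"),
    ("task", "has_result", "outcome"),
    ("subject", "has_associated", "task"),
]


def neighbors(node_id, label):
    """Return ids of nodes reachable from node_id over edges with this label."""
    return [dst for src, lbl, dst in edges if src == node_id and lbl == label]


def as_rows():
    """Flatten the graph view into tabular rows: (plan_id, task name, score)."""
    rows = []
    plan_ids = [n for n, props in nodes.items() if props["type"] == "Learning Plan"]
    for plan_id in plan_ids:
        for task_id in neighbors(plan_id, "contains"):
            for outcome_id in neighbors(task_id, "has_result"):
                rows.append(
                    (
                        nodes[plan_id]["plan_id"],
                        nodes[task_id]["name"],
                        nodes[outcome_id]["score"],
                    )
                )
    return rows


rows = as_rows()
```

The graph form keeps the relationships explicit; the row form is what a columnar or tabular engine would aggregate over.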
17
18 
Run a graph-based classifier (e.g. LBP) 
Build graph w/ features from frame 
Pull results back to frame to get model perf stats 
Engineer features (avg, ratios) 
Input from another model (segment/cluster)
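The round trip between a frame and a graph can be sketched in a few lines of Python. Everything here is hypothetical: the rows, the links, and the 0.5 threshold are invented, and the classifier is a simple neighbor-averaging label propagation used as a stand-in, not loopy belief propagation (LBP).

```python
from collections import defaultdict

# Hypothetical "frame": one row per entity; some rows carry a known label.
frame = [
    {"id": "a", "good": 9, "total": 10, "label": 1},
    {"id": "b", "good": 8, "total": 10, "label": None},
    {"id": "c", "good": 1, "total": 10, "label": 0},
    {"id": "d", "good": 2, "total": 10, "label": None},
]
links = [("a", "b"), ("c", "d")]  # hypothetical relationships between rows

# 1. Engineer a feature in the frame (a ratio).
for row in frame:
    row["ratio"] = row["good"] / row["total"]

# 2. Build a graph whose vertices carry the engineered feature.
vertices = {r["id"]: {"ratio": r["ratio"], "label": r["label"]} for r in frame}
adjacency = defaultdict(list)
for u, v in links:
    adjacency[u].append(v)
    adjacency[v].append(u)


# 3. Run a graph-based classifier (stand-in: neighbor-averaging propagation).
def propagate(vertices, adjacency, rounds=5):
    scores = {
        v: float(p["label"]) if p["label"] is not None else p["ratio"]
        for v, p in vertices.items()
    }
    for _ in range(rounds):
        updated = {}
        for v, p in vertices.items():
            if p["label"] is not None:
                updated[v] = float(p["label"])  # known labels stay clamped
                continue
            nbrs = adjacency[v]
            nbr_mean = sum(scores[n] for n in nbrs) / len(nbrs) if nbrs else scores[v]
            # Blend the vertex's own feature with its neighbors' current scores.
            updated[v] = 0.5 * p["ratio"] + 0.5 * nbr_mean
        scores = updated
    return scores


scores = propagate(vertices, adjacency)

# 4. Pull results back into the frame, where model statistics are computed.
for row in frame:
    row["predicted"] = 1 if scores[row["id"]] >= 0.5 else 0

predictions = [row["predicted"] for row in frame]
```

The data changes shape twice in four steps, which is why a tool that handles only one primitive forces a hand-off in the middle of the workflow.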
Learning #2: The primitives are not used in isolation. 
19
Ingest & Clean 
Engineer Features 
Structure Model 
Train Model 
Query & Analyze 
Learn 
Visualize 
Pig/MR 
PySpark 
ETL Tools? 
Pig/MR 
PySpark 
Java, Scala 
Giraph 
GraphX 
(Java, Scala…) 
Mahout 
MLlib 
?? 
*SQL* 
BI tools 
PySpark… 
Tooling mash-up! 
20
Tools are not used in isolation either. How can we cope with this? 
21
Direction #1: Unify primitives and processing on a workflow-oriented engine 
22
Unification with Apache Spark 
Image Source: Databricks 
•In-memory structures (RDDs) support both table and graph abstractions 
•Batch processing and Spark streaming 
Spark 
RDDs, Transformations, and Actions 
Spark Streaming real-time 
Spark 
SQL 
MLlib 
machine learning 
DStreams: Streams of RDDs 
SchemaRDDs 
RDD-Based Matrices 
GraphX 
graph processing/ 
machine learning 
RDD-Based Graphs 
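The split between transformations and actions can be illustrated without Spark. The class below is a toy stand-in written for this note, not the RDD API: transformations only record what to do, and work happens when an action is called.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    # Transformations: return a new dataset and do no work yet.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def _iterate(self):
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):  # a "filter" step rejected the item
                    keep = False
                    break
            if keep:
                yield item

    # Actions: trigger evaluation of the recorded steps.
    def collect(self):
        return list(self._iterate())

    def count(self):
        return sum(1 for _ in self._iterate())


calls = []


def traced_square(x):
    calls.append(x)  # record that work actually happened
    return x * x


squares = LazyDataset(range(5)).map(traced_square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)  # nothing has run yet
result = squares.collect()        # the action triggers evaluation
```

Because every abstraction in the stack is built on the same lazily evaluated structure, a table step and a graph step can in principle be chained in one job.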
23
Image Source: GraphX project 
•Graph processing engine on Spark 
•Supports Pregel-style vertex programming 
•View same data as either graphs or collections 
GraphX API for Spark 
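The two ideas on this slide, Pregel-style vertex programs and viewing the same data as a graph or a collection, can be sketched in plain Python. This is not the GraphX API; the function names and the edge data are made up for illustration.

```python
from collections import defaultdict

# One collection of edge rows (the "table" view) ...
edge_rows = [(1, 2), (2, 3), (4, 5)]


# ... viewed as a graph by grouping the same rows into adjacency sets.
def as_graph(rows):
    adj = defaultdict(set)
    for src, dst in rows:
        adj[src].add(dst)
        adj[dst].add(src)  # treat edges as undirected for this example
    return adj


def pregel_connected_components(adj):
    """Label each vertex with the smallest vertex id in its component."""
    value = {v: v for v in adj}
    # Superstep 0: every vertex sends its own id to its neighbors.
    inbox = defaultdict(list)
    for v in adj:
        for n in adj[v]:
            inbox[n].append(value[v])
    # Later supersteps: a vertex that learns a smaller id adopts it and
    # forwards it; vertices that receive nothing new stay silent.
    while inbox:
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            best = min(messages)
            if best < value[v]:
                value[v] = best
                for n in adj[v]:
                    outbox[n].append(best)
        inbox = outbox
    return value


graph = as_graph(edge_rows)
components = pregel_connected_components(graph)

# Collection view of the same rows: a plain aggregate, no traversal needed.
degree = defaultdict(int)
for src, dst in edge_rows:
    degree[src] += 1
    degree[dst] += 1
```

The computation stops when no messages are in flight, which is the usual Pregel halting condition.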
24
Python bindings for Spark (GraphX) 
25 
Client 
Server 
Python 
JVM 
Py4J 
Files 
JVM 
Akka 
Python 
Worker 
Pipes 
Serialized Python Functions 
Results 
“Transformations” 
“Actions” 
“Operations”
Python bindings for Spark GraphX 
26
Python bindings for Spark GraphX 
Coming soon to Apache! 
Vertex 
•Transformations: filter, mapValues, diff 
•Actions: aggregateUsingIndex 
•Join Operations: innerJoin, leftJoin 
Edge 
•Transformations: filter, mapValues, reverse 
•Join Operations: innerJoin 
Graph 
•Property Operators: mapVertices, mapEdges, mapTriplets 
•Structural Operators: subgraph, reverse, mask, groupEdges 
•Join Operations: joinVertices, outerJoinVertices 
•Neighborhood Aggregation: mapReduceTriplets 
•Analytics: ALS, SVDPlusPlus, TriangleCount, PageRank, ConnectedComponents, ShortestPaths 
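Neighborhood aggregation is the operator most of the analytics above build on. The sketch below imitates the idea in plain Python: a "send" function emits messages along each edge triplet and a "merge" function combines messages per vertex. The helper `aggregate_triplets` and the sample data are hypothetical and are not the signature of the planned Python binding.

```python
def aggregate_triplets(vertices, edges, send, merge):
    """Call send() on every (src, dst, attr) triplet and merge messages per target."""
    result = {}
    for src, dst, attr in edges:
        for target, msg in send(src, vertices[src], dst, vertices[dst], attr):
            result[target] = merge(result[target], msg) if target in result else msg
    return result


# Hypothetical data: vertex id -> age, edges as (follower, followee, label).
vertices = {"a": 30, "b": 25, "c": 41}
edges = [("a", "b", "follows"), ("c", "b", "follows"), ("b", "a", "follows")]


def send(src, src_attr, dst, dst_attr, edge_attr):
    # Each follower sends (1, its age) to the vertex it follows.
    yield dst, (1, src_attr)


def merge(x, y):
    # Combine two partial (count, total_age) messages.
    return (x[0] + y[0], x[1] + y[1])


agg = aggregate_triplets(vertices, edges, send, merge)
avg_follower_age = {v: total / count for v, (count, total) in agg.items()}
```

Vertices that receive no messages (here "c") do not appear in the result, so a join back to the vertex set is needed when a default value is wanted.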
27
Direction #1: Spark 
28 
•Feature engineering 
•Model training 
•Limited language bindings (Python, R getting better) 
•Lacks transactions and model serving
Lacks transactions and model serving... or does it? 
Image Source: Crankshaw, D., et al., “The Missing Piece in Complex Analytics: Low Latency, 
Scalable Model Management and Serving with Velox,” Cornell University Library Archive, retrieved November 2014 
Extending BDAS with Velox: 
A UC Berkeley AMPlab project (sponsored in part by Intel) 
29
Direction #2: Unify primitives and processing in relational database 
30
Source: Marcus Paradies, GRAph Data-management & ExperienceS Workshop (GRADES 2014) 
Unification within the In-Memory Database (IMDB) 
•Index data structure for graph traversal 
•Prototyped in SAP HANA distributed columnar IMDB 
•Lays foundation for complex graph query and algorithms 
31
Graph Traversal 
Source: Marcus Paradies, GRAph Data-management & ExperienceS Workshop (GRADES 2014) 
32
Graph Indexing 
Source: Marcus Paradies, GRAph Data-management & ExperienceS Workshop (GRADES 2014) 
33
Graph Traversal Results 
Source: Marcus Paradies, GRAph Data-management & ExperienceS Workshop (GRADES 2014) 
34
•Store graph as a set of nodes and a set of edges 
•Relational algebra captures all basic graph operations 
•Iterative algorithms captured as driver program that calls stored procedures 
Graph Analytics in Relational Databases? 
Source: ISTC for Big Data, Alekh Jindal, “Graph Analytics: The New Use Case for Relational Databases,” blog 
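The three bullets can be tried end to end with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the schema and sample graph are invented, and because SQLite has no stored procedures, each iteration is a parameterized SQL statement issued by the driver loop instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Store the graph as a set of nodes and a set of edges.
conn.executescript(
    """
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, dist INTEGER);
    CREATE TABLE edges (src INTEGER, dst INTEGER);
    INSERT INTO nodes (id, dist) VALUES (1, 0), (2, NULL), (3, NULL), (4, NULL), (5, NULL);
    INSERT INTO edges (src, dst) VALUES (1, 2), (2, 3), (1, 3), (3, 4);
    """
)


def expand_frontier(conn, level):
    """One BFS step in SQL: join the frontier to edges, mark unreached targets."""
    cur = conn.execute(
        """
        UPDATE nodes
        SET dist = ?
        WHERE dist IS NULL
          AND id IN (
              SELECT e.dst
              FROM edges AS e
              JOIN nodes AS n ON n.id = e.src
              WHERE n.dist = ?
          )
        """,
        (level + 1, level),
    )
    return cur.rowcount  # number of newly reached nodes


# Driver program: repeat the relational step until nothing changes.
level = 0
while expand_frontier(conn, level) > 0:
    level += 1

distances = dict(conn.execute("SELECT id, dist FROM nodes ORDER BY id"))
```

Node 1 is the source, and node 5 has no incoming path, so its distance stays NULL (None in Python). The join inside the UPDATE is the relational form of "follow every edge out of the frontier".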
35
Source: ISTC for Big Data, Alekh Jindal, “Graph Analytics: The New Use Case for Relational Databases,” blog 
Graph Analytics in Relational Databases? 
Relational and graphical analysis – better together! 
36
Source: ISTC for Big Data, Alekh Jindal 
Expressing Graph in SQL 
37
Real Time Database 
BQL – BigDAWG Query Language & Compiler 
Analytics Libraries 
Hardware Platforms 
Applications, Visualization, Languages 
“Narrow waist” provides portability 
Historical / Analytics Databases 
Spill 
Stream 
Future Vision – BigDAWG 
38
Future Vision – BigDAWG 
Real Time DBMSs 
BQL – BigDAWG Query Language & Compiler 
Visualization & Presentation, e.g., ScalaR, imMens, TweetMap, Prefetching 
Languages, e.g., Julia, R, MLbase, GraphLab 
SciDB 
Analytics, e.g., ScaLAPACK, ML algos, plsh, other analytics packages 
TupleWare 
Hardware Platforms, e.g., NVM simulator, Xeon Phi, Xeon 
Applications, e.g., medical data, astronomy, Twitter, urban sensing, IoT 
TileDB 
S-Store 
“Narrow waist” provides portability 
MyriaX 
Historical / Analytics DBMSs 
Spill 
Stream 
39
Direction #2: Relational DB 
40 
•Feature engineering 
•Transactions and model serving 
•Performant model training? 
•Just another Spark behind *QL?
Which direction do you favor? 
41 
Will the lines blur?
42 
Takeaway from both: 
Do all of the parallel distributed 
processing in one place and work with it 
through one UI!
43 
FILESYSTEMS AND NOSQL STORAGE 
HW PLATFORM 
APACHE HADOOP 
APACHE SPARK 
DATA WRANGLING 
MACHINE LEARNING AND STATISTICS 
Graphical Algorithms 
Classical Algorithms 
Graph Construction Tools 
Useful String Manipulation 
Useful Math Operators 
“DATA SCIENCE” REST API 
Intel Analytics Toolkit 
Unified UIs across the workflow 
Easier feature & model creation 
End-to-end graph pipeline 
Fully scalable throughout 
Multiple data primitives 
Optimized for IA 
Python 
Libraries 
3rd Party GUIs/SDKs 
Viz 
Tools 
Future Libraries 
BI Connectors 
Query Interfaces 
... 
Pressing forward with the Intel Analytics Toolkit
Analyzing the Semantic Web 
Reputations 
Neutral 
Good 
Bad 
Suspect 
44
Unified programming environment: DEMO 
45
46 
PROGRESS TOWARD VISION
47 
If we are successful... 
graph will become just another big data primitive!
49 
How graphs became just another big data primitive 
Graph-shaped data is used in product recommendation systems, social network analysis, network threat detection, image de-noising, and many other important applications. A growing number of these applications will benefit from parallel distributed processing for graph feature engineering, model training, and model serving. But today’s graph tools are riddled with limitations and shortcomings, such as a lack of language bindings, streaming support, and seamless integration with other popular data services. In this talk, we’ll argue that the key to doing more with graphs is doing less with specialized systems and more with systems already good at handling data of other shapes. We’ll examine some practical data science workflows to further motivate this argument, and we’ll talk about some of the things that Intel is doing with the open source community and industry to make graphs just another big data primitive.

Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF

MLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
MLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
MLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
MLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
MLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
MLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
MLconf
 

More from MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Recently uploaded

PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 

Recently uploaded (20)

PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 

Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF

  • 1. How graphs became just another big data primitive. Ted Willke, Cloud Platforms Group / Big Data Solutions
  • 2. Why graphs are cool: DEMO
  • 3. So, how did graphs become just another useful big data primitive? They DIDN’T.
  • 4. Set off in the right direction. Reduce the tool drag for graph analytics -- Vision (early 2012)
  • 5. A complete graph analytics solution -- July 2013
  • 6. User Interest: Wide on Analytics → E2E on Graph → Deep on Graph → Wide on Analytics
  • 7. Learning #1: Don’t ignore what’s popular!
  • 8. Popular Big Data (Structure) Primitives: Key-Value, Document, Column, Tabular, Graph. Which one is best? It depends… and it’s probably not just one.
  • 9. Popular Big Data (Structure) Primitives: Key-Value. Basic dictionary. Very fast. Very easy. No/minimal structure. Java, PIQL, Lua, XML, XQuery,…
  • 10. Popular Big Data (Structure) Primitives: Document. Key(s), metadata, hierarchy, document structure. XML, BSON, JSON… Java, C, C++, REST, Clojure, Scala…
  • 11. Popular Big Data (Structure) Primitives: Column. Key:col_val, Key:col_val… Great for “do this to everything in this column”. Not so much for multiple columns, specific keys. Hadoop, Zookeeper, Java, Python,…
  • 12. Popular Big Data (Structure) Primitives: Tabular. Old-school RDBMS. Collection of tables + relations that join them. *SQL*
  • 13. Popular Big Data (Structure) Primitives: Graph. Nodes, edges, properties of nodes and edges. Java, Clojure, Lisp, Ruby, C, C++, Scala, REST,…
  • 14. How we use the primitives. Model: Key-Value, Document, Graph, Column, SQL. Access: Sync (I/O), Async (Bus), Off-line (Queue). Implementation: API (Remote), LIB (Local).
  • 15. How are these primitives put to use?
  • 16. Data workflow example: Ingest & Clean, Engineer Features, Structure Model, Train Model, Query & Analyze, Learn, Visualize
  • 17. Data Representation: Personal Learning Knowledge Graph. Graph? Columnar? Tabular?? Nodes: Task Level (name: "10th Grade", value: 10); Learning Task (name: "Matrix Multiplication", task_id: 101, description: "Demonstrate how to multiply two matrices", type: "homework"); Subject (name: "Linear Algebra", subject_id: 100); Task Outcome (score: 0.8, num_correct: 8, num_attempts: 2); Learning Plan (plan_id: 1, num_tasks: 5, expected_time: 5h); Learning Goal (goal_id: 9, description: "Achieve above average proficiency in all Linear Algebra course tasks"); Proficiency (name: "Above Average"). Edge labels: has_associated, has_result, contains, implemented_by, evaluated_by, summarized_by, has_associated, has_prerequisite.
  • 18. Run a graph-based classifier (e.g. LBP); build graph w/ features from frame; pull results back to frame to get model perf stats; engineer features (avg, ratios); input from another model (segment/cluster)
  • 19. Learning #2: The primitives are not used in isolation.
  • 20. Tooling mash-up! Stages: Ingest & Clean, Engineer Features, Structure Model, Train Model, Query & Analyze, Learn, Visualize. Tools across the stages: Pig/MR, PySpark, ETL Tools?, Pig/MR, PySpark, Java, Scala, Giraph, GraphX (Java, Scala…), Mahout, MLlib, ??, *SQL*, BI tools, PySpark…
  • 21. Tools are not used in isolation either. How can we cope with this?
  • 22. Direction #1: Unify primitives and processing on a workflow-oriented engine
  • 23. Unification with Apache Spark. In-memory structures (RDDs) support both table and graph abstractions. Batch processing and Spark streaming. Components: Spark (RDDs, Transformations, and Actions); Spark Streaming (real-time; DStreams: streams of RDDs); Spark SQL (SchemaRDDs); MLlib (machine learning; RDD-based matrices); GraphX (graph processing/machine learning; RDD-based graphs). Image Source: Databricks
  • 24. GraphX API for Spark. Graph processing engine on Spark. Supports Pregel-style vertex programming. View same data as either graphs or collections. Image Source: GraphX project
  • 25. Python bindings for Spark (GraphX). Diagram labels: Client, Server, Python, JVM, Py4J, Files, JVM, Akka, Python Worker, Pipes, Serialized Python Functions, Results, “Transformations”, “Actions”, “Operations”
  • 26. Python bindings for Spark GraphX
  • 27. Python bindings for Spark GraphX. Coming soon to Apache! Vertex: Transformations (filter, mapValues, diff); Actions (aggregateUsingIndex); Join Operations (innerJoin, leftJoin). Edge: Transformations (filter, mapValues, reverse); Join Operations (innerJoin). Graph: Property Operators (mapVertices, mapEdges, mapTriplets); Structural Operators (subgraph, reverse, mask, groupEdges); Join Operations (joinVertices, outerJoinVertices); Neighborhood Aggregation (mapReduceTriplets); Analytics (ALS, SVDPlusPlus, TriangleCount, PageRank, ConnectedComponents, ShortestPaths)
  • 28. Direction #1: Spark. Feature engineering. Model training. Limited language binding (Python, R getting better). Lacks transactions and model serving.
  • 29. Lacks transactions and model serving... or does it? Extending BDAS with Velox: a UC Berkeley AMPLab project (sponsored in part by Intel). Image Source: Crankshaw, D., et al., “The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox,” Cornell University Library Archive, retrieved November 2014
  • 30. Direction #2: Unify primitives and processing in relational database
  • 31. Unification within the In-Memory Database (IMDB). Index data structure for graph traversal. Prototyped in SAP HANA distributed columnar IMDB. Lays foundation for complex graph query and algorithms. Source: Marcus Paradies, GRAph Data-management & ExperienceS Workshop (GRADES 2014)
  • 32. Graph Traversal. Source: Marcus Paradies, GRAph Data-management & ExperienceS Workshop (GRADES 2014)
  • 33. Graph Indexing. Source: Marcus Paradies, GRAph Data-management & ExperienceS Workshop (GRADES 2014)
  • 34. Graph Traversal Results. Source: Marcus Paradies, GRAph Data-management & ExperienceS Workshop (GRADES 2014)
  • 35. Graph Analytics in Relational Databases? Store graph as a set of nodes and a set of edges. Relational algebra captures all basic graph operations. Iterative algorithms captured as driver program that calls stored procedures. Source: ISTC for Big Data, Alekh Jindal, “Graph Analytics: The New Use Case for Relational Databases,” blog
  • 36. Graph Analytics in Relational Databases? Relational and graphical analysis – better together! Source: ISTC for Big Data, Alekh Jindal, “Graph Analytics: The New Use Case for Relational Databases,” blog
  • 37. Expressing Graph in SQL. Source: ISTC for Big Data, Alekh Jindal
  • 38. Future Vision – BigDAWG. Diagram labels: Applications, Visualization, Languages; BQL – BigDAWG Query Language & Compiler (“narrow waist” provides portability); Real Time Database; Historical / Analytics Databases; Analytics Libraries; Hardware Platforms; Spill; Stream
  • 39. Future Vision – BigDAWG. Diagram labels: Applications, e.g., medical data, astronomy, Twitter, urban sensing, IoT; Visualization & Presentation, e.g., ScalaR, imMens, TweetMap, Prefetching; Languages, e.g., Julia, R, MLbase, GraphLab; BQL – BigDAWG Query Language & Compiler (“narrow waist” provides portability); Real Time DBMSs; Historical / Analytics DBMSs; S-Store; SciDB; TupleWare; TileDB; MyriaX; Analytics, e.g., ScaLAPACK, ML algos, plsh, other analytics packages; Hardware Platforms, e.g., NVM simulator, Xeon Phi, Xeon; Spill; Stream
  • 40. Direction #2: Relational DB. Feature engineering. Transactions and model serving. Performant model training? Just another Spark behind *QL?
  • 41. Which direction do you favor? Will the lines blur?
  • 42. Takeaway from both: Do all of the parallel distributed processing in one place and work with it through one UI!
  • 43. Pressing forward with the Intel Analytics Toolkit. Unified UIs across the workflow. Easier feature & model creation. End-to-end graph pipeline. Fully scalable throughout. Multiple data primitives. Optimized for IA. Diagram labels: HW PLATFORM; FILESYSTEMS AND NOSQL STORAGE; APACHE HADOOP; APACHE SPARK; DATA WRANGLING; MACHINE LEARNING AND STATISTICS; Graphical Algorithms; Classical Algorithms; Graph Construction Tools; Useful String Manipulation; Useful Math Operators; “DATA SCIENCE” REST API; Python Libraries; 3rd Party GUIs/SDKs; Viz Tools; Future Libraries; BI Connectors; Query Interfaces; ...
  • 44. Analyzing the Semantic Web. Reputations: Neutral, Good, Bad, Suspect
  • 47. If we are successful... graph will become just another big data primitive!
  • 49. How graphs became just another big data primitive. Graph-shaped data is used in product recommendation systems, social network analysis, network threat detection, image de-noising, and many other important applications. A growing number of these applications will benefit from parallel distributed processing for graph feature engineering, model training, and model serving. But today’s graph tools are riddled with limitations and shortcomings, such as a lack of language bindings, streaming support, and seamless integration with other popular data services. In this talk, we’ll argue that the key to doing more with graphs is doing less with specialized systems and more with systems already good at handling data of other shapes. We’ll examine some practical data science workflows to further motivate this argument, and we’ll talk about some of the things that Intel is doing with the open source community and industry to make graphs just another big data primitive.
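
As a rough illustration of the relational direction on slide 35 (store the graph as a set of nodes and a set of edges, express graph operations relationally, and run iterative algorithms from a driver program), here is a minimal Python sketch using the standard-library `sqlite3` module. It is not taken from the talk or from the cited ISTC blog: the table names `node` and `edge`, the helper functions, and the choice of connected components by label propagation are all assumptions made for illustration. SQLite has no stored procedures, so the driver loop issues plain SQL statements instead.

```python
import sqlite3


def build_graph(conn, edges):
    """Store an undirected graph as a node table and an edge table (hypothetical schema)."""
    conn.executescript("""
        CREATE TABLE node (id INTEGER PRIMARY KEY, label INTEGER);
        CREATE TABLE edge (src INTEGER, dst INTEGER);
    """)
    # Insert every edge in both directions so joins treat the graph as undirected.
    both = [(s, d) for s, d in edges] + [(d, s) for s, d in edges]
    conn.executemany("INSERT INTO edge VALUES (?, ?)", both)
    # Every endpoint becomes a node whose initial component label is its own id.
    conn.execute("INSERT INTO node SELECT DISTINCT src, src FROM edge")


def neighbors(conn, node_id):
    """A basic graph operation expressed as a relational selection on the edge table."""
    rows = conn.execute(
        "SELECT dst FROM edge WHERE src = ? ORDER BY dst", (node_id,)
    )
    return [row[0] for row in rows]


def connected_components(conn):
    """Driver program: repeat one join-and-aggregate step until no label changes."""
    while True:
        # For each node, find the smallest label among its neighbors,
        # keeping only the nodes whose own label would shrink.
        updates = conn.execute("""
            SELECT n.id, MIN(m.label)
            FROM node n
            JOIN edge e ON e.src = n.id
            JOIN node m ON m.id = e.dst
            GROUP BY n.id, n.label
            HAVING MIN(m.label) < n.label
        """).fetchall()
        if not updates:
            break
        conn.executemany(
            "UPDATE node SET label = ? WHERE id = ?",
            [(label, node_id) for node_id, label in updates],
        )
    # Map each node id to the smallest node id in its component.
    return dict(conn.execute("SELECT id, label FROM node ORDER BY id"))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_graph(conn, [(1, 2), (2, 3), (4, 5)])
    print(neighbors(conn, 2))
    print(connected_components(conn))
```

Each iteration is an ordinary join plus a grouped aggregate, which is the sense in which an iterative graph algorithm can run on a system built for tabular data. A production system would likely push the loop into the database rather than round-trip through a client.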