N(ot)-o(nly)-(Ha)doop - the DAG showdown
Intel Corporation
Joydeep Ghosh & Seshu Edala
June, 2015
Copyright © 2015, Intel Corporation. All rights reserved.
Legal Message
THE INFORMATION PROVIDED IN THIS PRESENTATION IS INTENDED TO BE GENERAL IN NATURE AND IS NOT
SPECIFIC GUIDANCE. RECOMMENDATIONS (INCLUDING POTENTIAL COST SAVINGS) ARE BASED UPON
INTEL'S EXPERIENCE AND ARE ESTIMATES ONLY. INTEL DOES NOT GUARANTEE OR WARRANT OTHERS
WILL OBTAIN SIMILAR RESULTS.
This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN
THIS SUMMARY.
Software and workloads used in performance tests may have been optimized for performance only on Intel
microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer
systems, components, software, operations and functions. Any change to any of those factors may cause the results to
vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated
purchases, including the performance of that product when combined with other products.
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
* Other names and brands may be claimed as the property of others.
Key Messages
 Evolution of Hadoop to Big Data
 Introduction to DAG
 DAG runtimes
 Evaluation
 Performance
 Completeness
 Results
nodoop is still not-only-hadoop; time for “no-MR” looks closer
slow
fragmented
skills gap
block-oriented
data mutability
+
Hadoop to Big Data
Processing Model: Complex Event, Batch, In-Memory
Analytical Model: Machine Learning, Textual, Spatial, Aggregate, Temporal, Graph
Storage Model: Unstructured, Relational, Columnar, Hierarchic, Graph
Language Model: MR, SQL, NOSQL, JSQL, NOSPARQL
retrofitting Hadoop [unstructured batch analytics] to cater to the full big data demand
Map Reduce (MR) and Directed Acyclic Graph (DAG)
Stage - 1
Stage - 2
Stage - 3
DAG
 continuous dataflow
 relational semantics
 in-memory buffering
MR
 sequential dataflow
 MR semantics
 on-disk storage
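The contrast between sequential, on-disk dataflow and continuous, in-memory dataflow can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any of the evaluated engines is implemented: the three stage functions, the temporary JSON files standing in for on-disk storage, and the chained generators standing in for in-memory buffering are all hypothetical.

```python
import json
import os
import tempfile

def stage1(rows):
    # Filter: keep even numbers.
    return (r for r in rows if r % 2 == 0)

def stage2(rows):
    # Transform: square each value.
    return (r * r for r in rows)

def stage3(rows):
    # Aggregate: sum everything.
    return sum(rows)

def run_mr_style(rows, workdir):
    """Sequential dataflow: each stage's full output is written to disk
    before the next stage starts reading it."""
    data = list(rows)
    for i, stage in enumerate((stage1, stage2), start=1):
        path = os.path.join(workdir, f"stage{i}.json")
        with open(path, "w") as f:
            json.dump(list(stage(data)), f)  # materialize on disk
        with open(path) as f:
            data = json.load(f)              # next stage reads it back
    return stage3(data)

def run_dag_style(rows):
    """Continuous dataflow: stages are chained and rows stream through
    memory one at a time, with no intermediate files."""
    return stage3(stage2(stage1(rows)))

with tempfile.TemporaryDirectory() as workdir:
    mr_result = run_mr_style(range(10), workdir)
dag_result = run_dag_style(range(10))
print(mr_result, dag_result)
```

Both paths compute the same answer; the difference is only in where intermediate results live between stages.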
MR & DAG Runtimes
MR, DAG*
* Only a few products were chosen for evaluation.
Note: Other names and brands may be claimed as the property of others.
Versions: Hadoop 2.5.0-cdh5.3.0, Hive 0.13.1-cdh5.3.0, presto-server-0.103, Apache Drill 0.9.0, impalad version 2.1.0-cdh5, Spark 1.3.1, HPCC 5.0.14.1
Completeness Criteria
 On Disk failover
 HDFS Compatibility
 Yarn Integration
 File formats
 Expressive language
 Streaming support
 Connectivity
 Web UI
 Integrated Monitoring
 Security
 Hybrid Analytics
 Seamless Dataframes
Completeness Scores
Performance Criteria
[Chart: Performance comparison. Processing time in seconds (axis 0 to 1800) for Hive, Impala, Spark, Drill, Presto, and HPCC across six query types: full table scan, join fact dimension, aggregate function, join fact to fact, text analytics, log analytics.]
 All queries completed successfully
 A reliable baseline
Hive processing time (seconds)
  Full table scan: 670.99
  Join fact dimension: 640.37
  Aggregate function: 1705.75
  Join fact to fact: 983.73
  Text analytics: 1298.56
  Log analytics: 411.88
 All queries completed successfully
 Lack of window functions in Spark SQL makes moving average analytics challenging
 Mixed SQL & RDD programming
 Not DAG!
 ~2x to 8x
Spark processing time (seconds)
  Full table scan: 87.28
  Join fact dimension: 192.88
  Aggregate function: 669.09
  Join fact to fact: 231.55
  Text analytics: 132.05
  Log analytics: 285
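To show why missing window functions make a moving average awkward, here is a minimal sketch that computes a 3-row moving average twice: once with a self-join, the usual workaround when no window function is available, and once with a standard SQL window function. It uses SQLite through Python's `sqlite3` module purely as a stand-in; it is not Spark SQL 1.3.1, and the `hits` table and its values are hypothetical. The window query runs only when the bundled SQLite is 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (ts INTEGER PRIMARY KEY, n REAL)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0), (5, 50.0)],
)

# Workaround without window functions: join every row to itself and the
# two rows before it. This relies on ts being contiguous integers.
SELF_JOIN = """
SELECT a.ts, AVG(b.n)
FROM hits a JOIN hits b ON b.ts BETWEEN a.ts - 2 AND a.ts
GROUP BY a.ts
ORDER BY a.ts
"""

# The same computation as a window function over the current row and
# the two preceding rows.
WINDOW = """
SELECT ts, AVG(n) OVER (ORDER BY ts ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM hits
ORDER BY ts
"""

self_join_avg = [avg for _, avg in conn.execute(SELF_JOIN)]
if sqlite3.sqlite_version_info >= (3, 25, 0):
    window_avg = [avg for _, avg in conn.execute(WINDOW)]
else:
    window_avg = None  # window functions unavailable in this SQLite build
print(self_join_avg, window_avg)
```

The self-join form scans the table once per output row and depends on the key layout, which is part of what makes it harder to write and slower to run on large fact tables.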
 In-memory DAG
 Table generating functions and array functions are not supported; text analytics example failed
 ~1x to 20x
Impala processing time (seconds)
  Full table scan: 29.06
  Join fact dimension: 72.45
  Aggregate function: 222.98
  Join fact to fact: 168.86
  Text analytics: 0 (failed)
  Log analytics: 747.64
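A table generating function turns one input row into many output rows, which is why text analytics depends on it. The deck does not show the text analytics query, so the sketch below assumes it tokenizes tweets and counts words; in Hive this kind of step is typically written with `LATERAL VIEW explode(...)`. The `tweets` rows and the `explode` helper here are hypothetical plain-Python stand-ins.

```python
from collections import Counter

# Hypothetical input: (tweet_id, text) rows.
tweets = [
    (1, "spark beats hive"),
    (2, "hive is the baseline"),
]

def explode(rows):
    """Yield one (tweet_id, token) row per token, mimicking a table
    generating function that expands an array column into rows."""
    for tweet_id, text in rows:
        for token in text.split():
            yield tweet_id, token

token_rows = list(explode(tweets))
word_counts = Counter(token for _, token in token_rows)
print(len(token_rows), word_counts["hive"])
```

An engine without such a function has no direct way to go from two tweet rows to seven token rows inside a single query.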
 In-memory DAG – No Resilience
 Table generating functions not supported; text analytics example failed
 Window functions are still beta/unsupported; log analytics failed
 ~5x to 50x
Drill processing time (seconds)
  Full table scan: 126.99
  Join fact dimension: 83.97
  Aggregate function: 250.19
  Join fact to fact: 15.13
  Text analytics: 0 (failed)
  Log analytics: 0 (failed)
 In-memory DAG – No Resilience
 Table generating functions not supported; text analytics example failed
 ~5x to 60x
Presto processing time (seconds)
  Full table scan: 4.69
  Join fact dimension: 67
  Aggregate function: 491
  Join fact to fact: 233.66
  Text analytics: 0 (failed)
  Log analytics: 89.66
 All queries completed successfully
 On-disk DAG runtime; reliable, complete, performant
 Declarative ECL language; not SQL
 No native support for HDFS
 ~2x to 20x
HPCC processing time (seconds)
  Full table scan: 39.43
  Join fact dimension: 51.16
  Aggregate function: 305.5
  Join fact to fact: 10.43
  Text analytics: 315.5
  Log analytics: 206.1
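The per-engine timings can be turned into speedup ratios with a short script. Two assumptions are baked in: the values are matched to query types in the order the chart labels appear, and the "~Nx" figures on the slides are read as speedups relative to Hive, which the deck calls the baseline without stating the comparison explicitly. A time of 0 is treated as a failed query and skipped. The computed ranges need not match the rounded ranges quoted on the slides.

```python
QUERIES = [
    "full table scan", "join fact dimension", "aggregate function",
    "join fact to fact", "text analytics", "log analytics",
]

# Processing time in seconds per engine; 0 marks a query that failed.
TIMES = {
    "Hive":   [670.99, 640.37, 1705.75, 983.73, 1298.56, 411.88],
    "Spark":  [87.28, 192.88, 669.09, 231.55, 132.05, 285.0],
    "Impala": [29.06, 72.45, 222.98, 168.86, 0, 747.64],
    "Drill":  [126.99, 83.97, 250.19, 15.13, 0, 0],
    "Presto": [4.69, 67.0, 491.0, 233.66, 0, 89.66],
    "HPCC":   [39.43, 51.16, 305.5, 10.43, 315.5, 206.1],
}

def speedups(engine, baseline="Hive"):
    """Return {query: baseline_time / engine_time}, skipping failed queries."""
    return {
        q: round(b / t, 1)
        for q, b, t in zip(QUERIES, TIMES[baseline], TIMES[engine])
        if t > 0
    }

for engine in TIMES:
    if engine != "Hive":
        s = speedups(engine)
        print(f"{engine}: {min(s.values())}x to {max(s.values())}x "
              f"over {len(s)} completed queries")
```

A ratio below 1 means the engine was slower than Hive on that query.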
Problem Context
 Big data use cases stretch beyond unstructured batch jobs.
 Can DAG meet the demand and performance?
Findings
 DAG runtimes are still maturing
 Spark comes closest
NODOOP = Not only Hadoop
Questions
Backup
Benchmark Environment
 Cloudera Enterprise 5.3.2
 4-node cluster [1 master + 3 workers]
 Memory: 62.9 GiB in each node
 Cores: 16
 TPC-DS database with scale of 250
 Queries used
   Full Table Scan
   Fact and Dimension Join
   Aggregate functions
   Fact to Fact Join
   Text Analytics
   Log Analytics
Versions
 Hadoop 2.5.0-cdh5.3.0
 Hive 0.13.1-cdh5.3.0
 presto-server-0.103
 Apache Drill 0.9.0
 impalad version 2.1.0-cdh5
 Spark 1.3.1
 HPCC 5.0.14.1
Data Volume
 TPC-DS scale of 250: 19.3 GB
   Store Sales: 18.8 GB
   Customer: 300.3 MB
 Text Analytics (twitter): 436.6 MB
   CIKM twitter dataset
 Log Analytics (weblog): 5.0 GB
   HPCC ECL WLAM sample
Completeness Scores
To-disk failover        2   3   0   3   0   3   4
HDFS Compatibility      4   4   4   4   4   4   2
Yarn Integration        4   0   0   3   1   4   0
File formats            4   4   4   3   2   4   1
Expressive language     3   3   3   4   3   3   3
Streaming support       0   0   0   4   0   4   0
Connectivity            4   4   4   4   4   2   3
Web UI                  2   3   4   4   4   3   3
Integrated Monitoring   2   3   4   4   4   3   4
Security                3   3   1   1   1   1   1
Hybrid Analytics        3   2   1   4   1   3   4
Seamless Dataframes     1   1   1   4   1   4   2
Total                  32  30  26  42  25  38  25
*Score: 0 Min [0] - 4 Max [4]
Note: Other names and brands may be claimed as the property of others.
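The totals row can be checked by summing each column of the score table. The column headers did not survive in the extracted text, so the sketch below refers to columns by position (1 to 7) rather than by product name; that indexing is an assumption of the example. Summing the rows as printed, columns 1 to 6 agree with the totals row, while column 7 adds up to 27 against the 25 shown.

```python
# Scores per criterion, seven unlabeled columns, as printed in the table.
SCORES = {
    "To-disk failover":      [2, 3, 0, 3, 0, 3, 4],
    "HDFS Compatibility":    [4, 4, 4, 4, 4, 4, 2],
    "Yarn Integration":      [4, 0, 0, 3, 1, 4, 0],
    "File formats":          [4, 4, 4, 3, 2, 4, 1],
    "Expressive language":   [3, 3, 3, 4, 3, 3, 3],
    "Streaming support":     [0, 0, 0, 4, 0, 4, 0],
    "Connectivity":          [4, 4, 4, 4, 4, 2, 3],
    "Web UI":                [2, 3, 4, 4, 4, 3, 3],
    "Integrated Monitoring": [2, 3, 4, 4, 4, 3, 4],
    "Security":              [3, 3, 1, 1, 1, 1, 1],
    "Hybrid Analytics":      [3, 2, 1, 4, 1, 3, 4],
    "Seamless Dataframes":   [1, 1, 1, 4, 1, 4, 2],
}
PRINTED_TOTALS = [32, 30, 26, 42, 25, 38, 25]

# Transpose rows into columns and sum each one.
column_sums = [sum(col) for col in zip(*SCORES.values())]

# (column position, computed sum, printed total) wherever they differ.
mismatches = [
    (i + 1, got, want)
    for i, (got, want) in enumerate(zip(column_sums, PRINTED_TOTALS))
    if got != want
]
print(column_sums)
print(mismatches)
```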
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

N(ot)-o(nly)-(Ha)doop - the DAG showdown

  • 1. N(ot)-o(nly)-(Ha)doop - the DAG showdown Intel Corporation Joydeep Ghosh & Seshu Edala June, 2015
  • 2. Copyright © 2015, Intel Corporation. All rights reserved. Legal Message: THE INFORMATION PROVIDED IN THIS PRESENTATION IS INTENDED TO BE GENERAL IN NATURE AND IS NOT SPECIFIC GUIDANCE. RECOMMENDATIONS (INCLUDING POTENTIAL COST SAVINGS) ARE BASED UPON INTEL'S EXPERIENCE AND ARE ESTIMATES ONLY. INTEL DOES NOT GUARANTEE OR WARRANT OTHERS WILL OBTAIN SIMILAR RESULTS. This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. * Other names and brands may be claimed as the property of others.
  • 3. Key Messages: Evolution of Hadoop to Big Data; Introduction to DAG; DAG runtimes; Evaluation; Performance; Completeness; Results. NODOOP is still not-only-Hadoop; the time for “no-MR” looks closer.
  • 4. slow; fragmented; skills gap; block-oriented; data mutability
  • 5. Hadoop to Big Data. Processing Model: Complex Event, Batch, In-Memory. Analytical Model: Machine Learning, Textual, Spatial, Aggregate, Temporal, Graph. Storage Model: Unstructured, Relational, Columnar, Hierarchic, Graph. Language Model: MR, SQL, NOSQL, JSQL, NOSPARQL. Retrofitting Hadoop [unstructured batch analytics] to cater to the full big data demand.
  • 6. Map Reduce (MR) and Directed Acyclic Graph (DAG). [Diagram: Stage 1, Stage 2, Stage 3.] DAG: continuous dataflow; relational semantics; in-memory buffering. MR: sequential dataflow; MR semantics; on-disk storage.
  • 7. MR & DAG Runtimes. [Diagram with two groups: MR and DAG*.] * Only a few products were chosen for evaluation. Versions: Hadoop 2.5.0-cdh5.3.0, Hive 0.13.1-cdh5.3.0, presto-server-0.103, Apache Drill 0.9.0, impalad version 2.1.0-cdh5, Spark 1.3.1, HPCC 5.0.14.1.
  • 8. Completeness Criteria: On-disk failover; HDFS compatibility; Yarn integration; File formats; Expressive language; Streaming support; Connectivity; Web UI; Integrated monitoring; Security; Hybrid analytics; Seamless dataframes.
  • 9. Completeness Scores
  • 10. Performance Criteria. [Chart: Performance Comparison. Processing time in seconds (axis 0 to 1800) for Hive, Impala, Spark, Drill, Presto, and HPCC across six queries: full table scan, fact-dimension join, aggregate function, fact-to-fact join, text analytics, log analytics.]
  • 11. Hive: All queries completed successfully; a reliable baseline. Processing time (seconds): full table scan 670.99; fact-dimension join 640.37; aggregate function 1705.75; fact-to-fact join 983.73; text analytics 1298.56; log analytics 411.88.
  • 12. Spark: All queries completed successfully. Lack of window functions in Spark-SQL makes moving average analytics challenging. Mixed SQL & RDD programming. Not DAG! ~2x to 8x. Processing time (seconds): full table scan 87.28; fact-dimension join 192.88; aggregate function 669.09; fact-to-fact join 231.55; text analytics 132.05; log analytics 285.
  • 13. Impala: In-memory DAG. Table generating functions and array functions are not supported; text analytics example failed. ~1x to 20x. Processing time (seconds): full table scan 29.06; fact-dimension join 72.45; aggregate function 222.98; fact-to-fact join 168.86; text analytics 0 (failed); log analytics 747.64.
  • 14. Drill: In-memory DAG, no resilience. Table generating functions not supported; text analytics example failed. Window functions are still beta/unsupported; log analytics failed. ~5x to 50x. Processing time (seconds): full table scan 126.99; fact-dimension join 83.97; aggregate function 250.19; fact-to-fact join 15.13; text analytics 0 (failed); log analytics 0 (failed).
  • 15. Presto: In-memory DAG, no resilience. Table generating functions not supported; text analytics example failed. ~5x to 60x. Processing time (seconds): full table scan 4.69; fact-dimension join 67; aggregate function 491; fact-to-fact join 233.66; text analytics 0 (failed); log analytics 89.66.
  • 16. HPCC: All queries completed successfully. On-disk DAG runtime; reliable, complete, performant. Declarative ECL language; not SQL. No native support for HDFS. ~2x to 20x. Processing time (seconds): full table scan 39.43; fact-dimension join 51.16; aggregate function 305.5; fact-to-fact join 10.43; text analytics 315.5; log analytics 206.1.
  • 17. Problem Context: Big data use-cases stretch beyond unstructured batch jobs. Can DAG meet the demand and performance? Findings: DAG runtimes are still maturing. Spark comes closest.
  • 18. NODOOP = Not only Hadoop
  • 19. Questions
  • 20.
  • 21. Backup
  • 22. Benchmark Environment
       Cluster: Cloudera Enterprise 5.3.2; 4-node cluster [1 master + 3 workers]; memory 62.9 GiB in each node; 16 cores; TPC-DS database at scale 250.
       Queries used: Full Table Scan; Fact and Dimension Join; Aggregate functions; Fact to Fact Join; Text Analytics; Log Analytics.
       Versions: Hadoop 2.5.0-cdh5.3.0; Hive 0.13.1-cdh5.3.0; presto-server-0.103; Apache Drill 0.9.0; impalad version 2.1.0-cdh5; Spark 1.3.1; HPCC 5.0.14.1.
       Data Volume: TPC-DS at scale 250: 19.3 GB (Store Sales 18.8 GB; Customer 300.3 MB). Text Analytics (twitter): 436.6 MB, CIKM twitter dataset. Log Analytics (weblog): 5.0 GB, HPCC ECL WLAM sample.
  • 23. Completeness Scores (one column per evaluated runtime; column headings are not present in the transcript). Score: 0 (min) to 4 (max).
       To-disk failover        2   3   0   3   0   3   4
       HDFS Compatibility      4   4   4   4   4   4   2
       Yarn Integration        4   0   0   3   1   4   0
       File formats            4   4   4   3   2   4   1
       Expressive language     3   3   3   4   3   3   3
       Streaming support       0   0   0   4   0   4   0
       Connectivity            4   4   4   4   4   2   3
       Web UI                  2   3   4   4   4   3   3
       Integrated Monitoring   2   3   4   4   4   3   4
       Security                3   3   1   1   1   1   1
       Hybrid Analytics        3   2   1   4   1   3   4
       Seamless Dataframes     1   1   1   4   1   4   2
       Total                  32  30  26  42  25  38  25
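
The speedup ranges quoted on slides 12 to 16 (for example "~2x to 8x" for Spark) can be approximated by dividing Hive's processing time by each engine's time for the same query. The sketch below is illustrative only. It assumes that the six numbers on each slide are listed in the same order as the query labels, that Hive is the baseline the ranges refer to, and that a time of 0 marks a failed query (as the slide notes state for text and log analytics). The names `QUERIES`, `TIMES`, and `speedups` are hypothetical.

```python
# Processing times in seconds, copied from slides 11-16.
# Assumption: values appear in the same order as the query labels.
QUERIES = [
    "Full table scan",
    "Join fact dimension",
    "Aggregate function",
    "Join fact to fact",
    "Text analytics",
    "Log analytics",
]

TIMES = {
    "Hive":   [670.99, 640.37, 1705.75, 983.73, 1298.56, 411.88],
    "Spark":  [87.28, 192.88, 669.09, 231.55, 132.05, 285],
    "Impala": [29.06, 72.45, 222.98, 168.86, 0, 747.64],
    "Drill":  [126.99, 83.97, 250.19, 15.13, 0, 0],
    "Presto": [4.69, 67, 491, 233.66, 0, 89.66],
    "HPCC":   [39.43, 51.16, 305.5, 10.43, 315.5, 206.1],
}


def speedups(engine, baseline="Hive"):
    """Return {query: baseline_time / engine_time} for queries that completed.

    Assumption: a time of 0 means the query failed, so it is skipped.
    """
    result = {}
    for query, base, elapsed in zip(QUERIES, TIMES[baseline], TIMES[engine]):
        if elapsed > 0:
            result[query] = base / elapsed
    return result


for engine in TIMES:
    if engine == "Hive":
        continue
    ratios = speedups(engine)
    print(
        f"{engine}: {min(ratios.values()):.1f}x to {max(ratios.values()):.1f}x "
        f"over {len(ratios)} completed queries"
    )
```

A ratio below 1 means the engine was slower than Hive on that query. The computed ranges will not match the rounded figures on the slides exactly.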

Editor's Notes

  1. 2
  2. DAG definition. Directed: the connections between the nodes (edges) have a direction, so A -> B is not the same as B -> A. Acyclic: "non-circular"; moving from node to node by following the edges, you never encounter the same node a second time. Graph: a structure consisting of nodes that are connected to each other with edges. Every directed tree is a DAG, but a DAG is more general than a tree: a node may have more than one parent.
  3. Completeness criteria: In-memory to on-disk failover [DAG vs MR]; Storage compatibility [HDFS vs proprietary]; Resource management [Yarn vs other]; File format compatibility [Parquet/Columnar, Avro/Row, JSON/Hierarchic, Textfile/Linear]; Expressive language [Declarative/Functional vs Imperative]; Streaming + batch support [Temporal/Dimensional Partitioning vs Tabular Scans]; Connectivity to other systems [ODBC vs WS, Virtual vs Physical]; Ease of use, Web UIs [IDE vs Putty, Dashboard vs File-based aggregation]; Execution, monitoring, debugging, logging [Centralized vs Decentralized; Integrated with CM vs Fragmented]; Security [Authentication with LDAP/AD, Authorization/ACLs with Sentry/Posix/Kerberos, Encryption on disk vs on the wire, Key Management central vs isolated]; Integrated graph, temporal, and statistical analytics [one framework vs Multiple Libraries]; Integrated Files, Tables, Datasets, DataFrame “views” [ability to share results].
  4. https://cwiki.apache.org/confluence/display/DRILL/Release+Notes Drill now features complete support for UNION ALL and COUNT(DISTINCT). Drill 0.8 also includes new functions such as unix_timestamp and the window functions sum, count and rank. Note that these window functions should be considered beta.
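
The DAG definition in note 2 can be checked mechanically. The following is a minimal sketch, not taken from the presentation: it uses Kahn's algorithm to find a topological order of a directed graph given as (source, destination) edge pairs, and returns `None` when a cycle makes that impossible. The function names and the example graphs are hypothetical. The diamond-shaped example is acyclic, yet node D has two parents, which shows that a DAG does not have to be a tree.

```python
from collections import deque


def topological_order(edges):
    """Return the nodes in a topological order, or None if the graph has a cycle."""
    nodes = set()
    successors = {}
    indegree = {}
    for src, dst in edges:
        nodes.update((src, dst))
        successors.setdefault(src, []).append(dst)
        indegree[dst] = indegree.get(dst, 0) + 1

    # Start from nodes with no incoming edges (sorted for a stable result).
    ready = deque(sorted(n for n in nodes if indegree.get(n, 0) == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors.get(node, []):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # If some node was never released, it sits on a cycle.
    return order if len(order) == len(nodes) else None


def max_parents(edges):
    """Return the largest number of incoming edges of any node (0 if no edges)."""
    counts = {}
    for _, dst in edges:
        counts[dst] = counts.get(dst, 0) + 1
    return max(counts.values(), default=0)


diamond = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
cycle = [("A", "B"), ("B", "A")]

print(topological_order(diamond))  # a valid order exists, so this is a DAG
print(max_parents(diamond))        # D has two parents, so it is not a tree
print(topological_order(cycle))    # None: A -> B -> A is a cycle
```

In a DAG runtime the nodes would be stages of a query plan, and a topological order is one valid order in which those stages could be scheduled.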