Benchmarking Hadoop
with ALOJA
Oct 6, 2015
by Nicolas Poggi @ni_po
sudoers Barcelona:
About Nicolas Poggi @ni_po
Work:
Education:
Community:
Agenda
 Intro on Hadoop
 Current scenario and problems
 ALOJA project
 Open source tools
 Benchmarking DEMO
 Results
 DEMO results online
 Open questions and comments
Intro: Hadoop design and ecosystem
Hadoop design
 Hadoop designed to solve complex data
 Structured and non structured
 With [close to] linear scalability
 Simplifying the programming model
 From MPI, OpenMP, CUDA, …
 Operates as a blackbox for data analysts
Image source: Hadoop, the definitive guide
Hadoop parameters
 100+ tunable parameters
 mapred.map/reduce.tasks.speculative.execution
 obscure and interrelated
 io.sort.mb 100 (300)
 io.sort.record.percent 5% (15%)
 io.sort.spill.percent 80% (95 – 100%)
 Number of Mappers and Reducers
 Rule of thumb 0.5 - 2 per CPU core
Hadoop stack for tuning
Image source: Intel® Distribution for Apache Hadoop
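The rule of thumb of 0.5 to 2 mappers or reducers per CPU core can be turned into a quick range estimate. A minimal sketch, assuming integer core counts; the function name task_slot_range is hypothetical and not part of Hadoop or ALOJA.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the min and max number of concurrent
# map/reduce tasks per node from the 0.5 - 2 per core rule of thumb.
task_slot_range() {
  local cores="$1"
  local min=$(( cores / 2 ))      # 0.5 tasks per core (integer division)
  (( min < 1 )) && min=1          # always allow at least one task
  local max=$(( cores * 2 ))      # 2 tasks per core
  echo "$min $max"
}

# Example: a 4-core node
task_slot_range 4   # prints "2 8"
```

The range is only a starting point for the iterative tuning described in the slides; the best value still has to be found by benchmarking.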
Hadoop highly-scalable but…
 Not a high-performance solution!
 Requires
 Design,
 Clusters, topology clusters
 Setup,
 OS, Hadoop config
 and tuning required
 Iterative approach
 Time consuming
 And extensive benchmarking!
Hadoop ecosystem
 Large and spread
 Dominated by big players
 Custom patches
 Default values not ideal
 Product claims
 Cloud vs. On-premise
 IaaS
 PaaS
 EMR, HDInsight
 Needs standardization
and auditing!
DATA
Product claims
 Needs auditing!
Too many choices?
Figure labels: Cost, Performance, On-Premise, Cloud, Remote volumes, Rotational HDDs, JBODs, RAID, Large VMs, Small VMs, GbEthernet, InfiniBand, High availability, Replication
And where is my system configuration positioned on each of these axes?
Project ALOJA
 Open initiative to produce mechanisms for an automated characterization of cost-effectiveness of Big Data deployments
 Results from a growing need of the community to understand job execution details and create transparency
 Explore different configuration deployment options and their tradeoffs
 Both software and hardware
 Cloud services and on-premise
 Seeks to provide knowledge, tools, and an online service
 with which users can make better-informed decisions
 and reduce the TCO of their Big Data infrastructures
 Guide the future development and deployment of Big Data clusters and applications
Challenges, options, and implementation
Challenges (circa end 2013)
 Test different cluster architectures
 On-premise
 Commodity, high-end, appliance, low-power
 Cloud IaaS
 32 different VMs in Azure, similar in other providers
 Cloud PaaS
 HDInsight, EMR, CloudBigData
 Different access levels
 Full admin, user-only, request-to-install, everything ready, queuing systems (SGE)
 Different versions
 Hadoop, JVM, Spark, Hive, etc.
 Dev environments and testing
 Big Data usually requires a cluster to develop and test
Benchmarking vs. Production envs
 Need to compare different executions
 Not how the systems are doing now
 This is the main difference from production products
 Data does not change (non-OLTP)
 Temporary data for benchmarks vs. important data
 Fast iteration vs. reliability
 Iterates over configurations vs. fixed config
 Many fast, experimental changes
 Security can be relaxed
 Management for Hadoop
 Vendor lock-in
 Lack of systems support (Azure, on-prem, low-power)
 Hadoop is our use case, not the only one
 Leave no traces on the benchmarked system
Available options: (circa end 2013)
 Deployment
 jclouds
 Foreman
 Puppet
 Ambari
 Config and deploy
 Ambari (Hadoop only)
 Use Configuration Management (CM)
 Puppet, Chef, Ansible…
 Monitoring
 Ganglia, Zabbix
 Ambari
 Cloudera Manager
 Kibana, GraphD…
 Problems
 All systems designed for PROD
 Not for comparison
 No Azure support
 Many different packages
 No one-size-fits-all solution
 Solution
 Custom implementation
 Based on simple components
 Wrapping commands
ALOJA Platform main components
1 Big Data Benchmarking
•Deploy & Provision
•Conf Management
•Parameter selection & Queuing
•Perf counters
•Low-level instrumentation
•App logs
2 Online Repository
•Explore results
•Execution details
•Cluster details
•Costs
•Data sharing
3 Web Analytics
•Data views and evaluations
•Aggregates
•Abstracted Metrics
•Job characterization
•Machine Learning
•Predictions and clustering
NGINX, PHP, MySQL
BASH, Unix tools, CLIs
R, SQL, JS
Workflow in ALOJA
Cluster(s) definition
• VM sizes
• # nodes
• OS, disks
• Capabilities
Execution plan
• Start cluster
• Exec Benchmarks
• Gather results
• Cleanup
Import data
• Convert perf metrics
• Parse logs
• Import into DB
Evaluate data
• Data views in Vagrant VM
• Or http://hadoop.bsc.es
PA and KD
• Predictive Analytics
• Knowledge Discovery
Historic Repo (in progress)
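The execution plan step of this workflow (start cluster, exec benchmarks, gather results, cleanup) can be sketched as a small BASH driver. This is an illustrative outline, not ALOJA's actual code: start_cluster, exec_benchmarks, gather_results, and cleanup are hypothetical stubs that only print what they would do.

```shell
#!/usr/bin/env bash
# Illustrative execution plan; every function here is a hypothetical stub.
start_cluster()   { echo "start cluster $1"; }
exec_benchmarks() { echo "run benchmarks on $1"; }
gather_results()  { echo "gather results from $1"; }
cleanup()         { echo "cleanup $1"; }

run_execution_plan() {
  local cluster="$1"
  start_cluster "$cluster" || return 1
  # Run cleanup even if a benchmark fails, so that no traces are left
  # on the benchmarked system.
  local status=0
  exec_benchmarks "$cluster" && gather_results "$cluster" || status=$?
  cleanup "$cluster"
  return "$status"
}

run_execution_plan "al-08"
```

Keeping each step in its own function makes it possible to swap the stubs for provider-specific commands without changing the driver.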
Cluster and node definitions
Clusters (Azure example)
#load AZURE defaults
source "$CONF_DIR/azure_defaults.conf"
clusterName="al-08"
numberOfNodes="8"
vmSize="Large"
#details
vmCores="4"
vmRAM="7" #in GB
#costs
clusterCostHour="1.584" #0.176 * 9
clusterType="IaaS"
clusterDescription="A3 type VMs"
Node (Web in Rackspace)
#load node defaults
source "$CONF_DIR/node_defaults.conf"
defaultProvider="rackspace"
vm_name="aloja-web"
vmSize='io1-30'
attachedVolumes="2"
diskSize="1023"
# Node roles (install functions)
extraLocalCommands="
vm_install_webserver;
vm_install_repo 'provider/rackspace';
install_ganglia_gmond;
config_ganglia_gmond 'aloja-web-rackspace' 'aloja-web';
install_percona /scratch/attached/2/mysql;"
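The cost field in a cluster definition makes it possible to price a single benchmark run. A minimal sketch, assuming the cost is simply the hourly cluster price multiplied by the run's duration; execution_cost is a hypothetical helper, not an ALOJA function.

```shell
#!/usr/bin/env bash
# Value copied from the Azure example cluster definition.
clusterCostHour="1.584"   # 0.176 * 9

# Hypothetical helper: cost of a run lasting $1 seconds on a cluster
# priced at $2 per hour. awk is used because BASH has no float arithmetic.
execution_cost() {
  LC_ALL=C awk -v secs="$1" -v hourly="$2" \
    'BEGIN { printf "%.4f\n", secs / 3600 * hourly }'
}

# A 30-minute run on the al-08 cluster
execution_cost 1800 "$clusterCostHour"   # prints 0.7920
```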
Commands and providers
Provisioning commands
 Connect
 Node and Cluster
 Uses SSH proxies automatically
 Deploy
 Start, Stop
 Delete
 Nodes and clusters
Providers
 On-premise
 Custom settings for clusters
 Multiple disk types
 Different architectures
 Cloud IaaS
 Azure, OpenStack, Rackspace, AWS (testing)
 Cloud PaaS
 HDInsight, CloudBigData, EMR soon
Code at: https://github.com/Aloja/aloja/tree/master/aloja-deploy
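Reaching nodes that sit behind a gateway is commonly done with the OpenSSH ProxyCommand option. The sketch below only builds and prints such a command line; the host names, the user, and the build_ssh_cmd helper are hypothetical, and ALOJA's own implementation may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the ssh command line for reaching a node,
# tunnelling through a gateway when one is given as third argument.
build_ssh_cmd() {
  local user="$1" node="$2" gateway="${3:-}"
  if [ -n "$gateway" ]; then
    # ssh -W forwards stdin/stdout to host:port through the gateway
    echo "ssh -o ProxyCommand='ssh -W %h:%p ${user}@${gateway}' ${user}@${node}"
  else
    echo "ssh ${user}@${node}"
  fi
}

build_ssh_cmd benchuser al-08-01 gateway.example.com
build_ssh_cmd benchuser aloja-web
```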
Running benchmarks in ALOJA
 Example of submitting a job to run:
 https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh
 To queue jobs and control results:
 https://github.com/Aloja/aloja/blob/master/shell/exeq.sh
Benchmarking results
ALOJA Online Benchmark Repository
 Entry point to explore the results collected from the executions
 Index of executions
 Quick glance of executions
 Searchable, sortable
 Execution details
 Performance charts and histograms
 Hadoop counters
 Jobs and task details
 Data management of benchmark executions
 Data importing from different clusters
 Execution validation
 Data management and backup
 Cluster definitions
 Cluster capabilities (resources)
 Cluster costs
 Sharing results
 Download executions
 Add external executions
 Documentation and References
 Papers, links, and feature documentation
Available at: http://aloja.bsc.es
Impact of SW configurations in Speedup
(4 node clusters)
Charts: Number of mappers (4m, 6m, 8m, 10m); Compression algorithm (No comp., ZLIB, BZIP2, snappy)
Speedup (higher is better)
Results using: http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
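Speedup in these charts is a ratio of execution times. A minimal sketch, assuming speedup is defined as the baseline time divided by the time of the evaluated configuration (the exact baseline ALOJA uses is not stated on the slide); the speedup helper and the timings are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: speedup of a configuration that took $2 seconds
# relative to a baseline that took $1 seconds. Values above 1 mean faster.
speedup() {
  LC_ALL=C awk -v base="$1" -v t="$2" 'BEGIN {
    if (t <= 0) { exit 1 }          # reject invalid execution times
    printf "%.2f\n", base / t
  }'
}

# Made-up timings: baseline 1200 s, tuned configuration 800 s
speedup 1200 800   # prints 1.50
```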
Impact of HW configurations in Speedup
Charts: Disks and Network (HDD-ETH, HDD-IB, SSD-ETH, SSD-IB); Cloud remote volumes (Local only; 1, 2, and 3 Remotes; 1, 2, and 3 Remotes with /tmp local)
Speedup (higher is better)
Results using: http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
Speedup: all disk configurations, SSD vs JBOD
 For DFSIOE read, DFSIOE write, and Terasort
URL:
http://hadoop.bsc.es/configimprovement?datefrom=&dateto=&benchs%5B%5D=dfsioe_read&benchs%5B%5D=dfsioe_write&benchs%5B%5D=terasort&id_clusters%5B%5D=21&nets%5B%5D=None&disks%5B%5D=HD2&disks%5B%5D=HD3&disks%5B%5D=HD4&disks%5B%5D=HD5&disks%5B%5D=HDD&disks%5B%5D=HS5&disks%5B%5D=RL1&disks%5B%5D=RL2&disks%5B%5D=RL3&disks%5B%5D=RL4&disks%5B%5D=RL5&disks%5B%5D=RL6&disks%5B%5D=RR1&disks%5B%5D=SS2&disks%5B%5D=SSD&mapss%5B%5D=None&comps%5B%5D=None&replications%5B%5D=None&blk_sizes%5B%5D=None&iosfs%5B%5D=None&iofilebufs%5B%5D=None&datanodess%5B%5D=None&bench_types%5B%5D=HDI&bench_types%5B%5D=HiBench&vm_sizes%5B%5D=None&vm_coress%5B%5D=None&vm_RAMs%5B%5D=None&hadoop_versions%5B%5D=None&types%5B%5D=None&filters%5B%5D=valid&filters%5B%5D=filters&allunchecked=
Chart labels: 2 SSDs, 5 SATA, 1 SSD /tmp, 1 SSD, 1 SATA, 2 SATA, 3 SATA, 4 SATA, 5 SATA (higher is better)
Annotations: Fastest config; High capacity and fast; High capacity but slow
Speedup by disk configuration in the Cloud
(higher is better)
URL:
http://104.130.159.92/configimprovement?benchs%5B%5D=terasort&disks%5B%5D=HDD&disks%5B%5D=RL1&disks%5B%5D=RL2&disks%5B%5D=RL3&disks%5B%5D=RR1&disks%5B%5D=RR2&disks%5B%5D=RR3&disks%5B%5D=RR4&disks%5B%5D=RR5&disks%5B%5D=RR6&disks%5B%5D=RS1&disks%5B%5D=RS6&disks%5B%5D=SSD&bench_types%5B%5D=HiBench&filters%5B%5D=valid&filters%5B%5D=filters&allunchecked=&selected-groups=disk&datefrom=&dateto=&minexetime=150&maxexetime=1500
Chart labels: 1-6 remotes; 1 and 6 remotes with /tmp on SSD; SSD only (higher is better)
VM size comparison (Azure), lower is better
Preview: Cost/Performance Scalability
 This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size
 X axis: number of datanodes (cluster size)
 Left Y axis: execution time (lower is better)
 Right Y axis: execution cost
Chart labels: Execution time, Execution cost, Recommended size
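One way to read such a chart is to compute the cost of each run and pick the cluster size where it is lowest. The sketch below uses made-up sample data (datanodes and execution time in seconds) and a hypothetical cheapest_size helper; it assumes cost grows linearly with the number of datanodes and ignores master nodes, so it illustrates the idea rather than how the ALOJA screen computes its recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "nodes seconds" pairs on stdin and prints the
# node count with the lowest execution cost for a given price per node-hour.
cheapest_size() {
  LC_ALL=C awk -v price="$1" '
    NF == 2 {
      cost = $1 * price * $2 / 3600
      if (best == "" || cost < min) { min = cost; best = $1 }
    }
    END { print best }'
}

# Sample data only: larger clusters finish faster, but not proportionally.
printf '%s\n' "4 3600" "8 1500" "16 1000" | cheapest_size 0.176   # prints 8
```

With these sample numbers the 8-node run costs the least, because the 16-node cluster does not reduce execution time enough to pay for the extra nodes.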
Cost-effectiveness (On-premise vs. Cloud)
Chart axes: Price, Performance
Chart labels:
InfiniBand + SSD (LOCAL)
GbE + SSD (LOCAL)
InfiniBand + SATA disks (LOCAL)
GbE + SATA disks (LOCAL)
CLOUD (local disk: /tmp and HDFS)
CLOUD (/tmp in local disk, HDFS in Blob storage, 1-3 devices)
CLOUD (/tmp and HDFS in Blob storage, 1-3 devices)
Details at: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
Open questions: is BASH good enough?
PROs
 Simple and fast
 Well known (basics at least)
 Easy to hack
 Most of the work requires running sys commands
CONs
 Custom implementation problems
 Missing some systems
 Too simple, missing: objects, inheritance, types, data structures, testing
Alternatives
 Python? Perl?
 Puppet? Ansible?
 We’ll stick to bash for now.
 What’s missing for incubating in Apache?
More info:
 ALOJA Benchmarking platform and online repository
 http://aloja.bsc.es
 Benchmarking Big Data by Nicolas Poggi
 http://www.slideshare.net/ni_po/benchmarking-hadoop
 Big Data Benchmarking Community (BDBC) mailing list
 (~200 members from ~80 organizations)
 http://clds.sdsc.edu/bdbc/community
 Workshop Big Data Benchmarking (WBDB)
 Next: http://clds.sdsc.edu/wbdb2015.ca
 SPEC Research Big Data working group
 http://research.spec.org/working-groups/big-data-working-group.html
 Slides and video:
 Michael Frank on Big Data benchmarking
 http://www.tele-task.de/archive/podcast/20430/
 Tilmann Rabl, Big Data Benchmarking Tutorial
 http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl
@BDOOP_BCN
More info: http://aloja.bsc.es
or join BDOOP group
http://www.meetup.com/Barcelona-BigData-Perfomance-and-Operations
DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101
 
What it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! PerspectivesWhat it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! Perspectives
 
AI Scalability for the Next Decade
AI Scalability for the Next DecadeAI Scalability for the Next Decade
AI Scalability for the Next Decade
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data Science
 
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep... Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 

Recently uploaded

Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 

Recently uploaded (20)

Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 

sudoers: Benchmarking Hadoop with ALOJA

  • 1. Benchmarking Hadoop with ALOJA Oct 6, 2015 by Nicolas Poggi @ni_po sudoers Barcelona:
  • 2. About Nicolas Poggi @ni_po Work: Education: Community:
  • 3. Agenda  Intro on Hadoop  Current scenario and problematic  ALOJA project  Open source tools  Benchmarking DEMO  Results  DEMO results online  Open questions and comments
  • 4. Intro: Hadoop design and ecosystem
  • 5. Hadoop design  Hadoop designed to solve complex data  Structured and non structured  With [close to] linear scalability  Simplifying the programming model  From MPI, OpenMP, CUDA, …  Operates as a blackbox for data analysts Image source: Hadoop, the definitive guide
  • 6. Hadoop parameters  > 100+ tunable parameters  mapred.map/reduce.tasks.speculative.execution  obscure and interrelated  io.sort.mb 100 (300)  io.sort.record.percent 5% (15%)  io.sort.spill.percent 80% (95 – 100%)  Number of Mappers and Reducers  Rule of thumb 0.5 - 2 per CPU core
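The rule of thumb on this slide (0.5 to 2 mappers or reducers per CPU core) can be turned into a quick range check. This is only a minimal sketch: `slot_range` is a hypothetical helper name, not something shipped with Hadoop or ALOJA, and the right value inside the range still has to be found by benchmarking.

```shell
# Hypothetical helper (not part of Hadoop or ALOJA): prints the minimum and
# maximum number of map/reduce slots suggested by the 0.5 - 2 per core rule.
slot_range() {
  cores="$1"
  min=$(( cores / 2 ))            # 0.5 slots per core, rounded down
  [ "$min" -lt 1 ] && min=1       # always allow at least one slot
  max=$(( cores * 2 ))            # 2 slots per core
  echo "$min $max"
}

slot_range 4   # range for a 4-core node
```

The two numbers only bound the search space; the speedup slides later in the deck compare 4, 6, 8 and 10 mappers rather than fixing one value.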
  • 7. Hadoop stack for tuning Image source: Intel® Distribution for Apache Hadoop
  • 8. Hadoop highly-scalable but…  Not a high-performance solution!  Requires  Design,  Clusters, topology clusters  Setup,  OS, Hadoop config  and tuning required  Iterative approach  Time consuming  And extensive benchmarking!
  • 9. Hadoop ecosystem  Large and spread  Dominated by big players  Custom patches  Default values not ideal  Product claims  Cloud vs. On-premise  IaaS  PaaS  EMR, HDInsight  Needs standardization and auditing! DATA
  • 11. Too many choices? Remote volumes - - Rotational HDDs JBODs Large VMs Small VMs GbEthernet InfiniBand RAID Cost Performance On-Premise Cloud And where is my system configuration positioned on each of these axes? High availability Replication + +
  • 12. Project ALOJA  Open initiative to produce mechanisms for an  automated characterization of cost-effectiveness  of Big Data deployments  Results from a growing need of the community to understand job execution details and create transparency  Explore different configuration deployment options and their tradeoffs  Both software and hardware  Cloud services and on-premise  Seeks to provide knowledge, tools, and an online service  with which users can make better-informed decisions  reduce the TCO for their Big Data infrastructures  Guide the future development and deployment of Big Data clusters and applications
  • 13. Challenges, options, and implementation
  • 14. Challenges (circa end 2013)  Test different cluster architectures  On-premise  Commodity, high-end, appliance, low-power  Cloud IaaS  32 different VMs in Azure, similar in other providers  Cloud PaaS  HDInsight, EMR, CloudBigData  Different access levels  Full admin, user-only, request-to-install, everything ready, queuing systems (SGE)  Different versions  Hadoop, JVM, Spark, Hive, etc…  Dev environments and testing  Big Data usually requires a cluster to develop and test
  • 15. Benchmarking vs. Production envs  Need to compare different executions  Not how the systems are doing now  This is the main difference from production products  Data does not change (non-OLTP)  Temporary data for benchmarks vs. important data  Fast iteration vs. reliability  Iterates configurations vs. fixed config  Many fast, experimental changes  Security can be relaxed  Management for Hadoop  Vendor lock-in  Lack of systems support (Azure, on-prem, low-power)  Hadoop is our use case, not the only one  Leave no traces on the benchmarked system
  • 16. Available options: (circa end 2013)  Deployment  jclouds  foreman  Puppet  Ambari  Config and deploy  Ambari (Hadoop only)  Use Configuration Management (CM)  Puppet, Chef, Ansible…  Monitoring  Ganglia, Zabbix  Ambari  Cloudera Manager  Kibana, GraphD…  Problems  All systems thought for PROD  Not for comparison  No Azure support  Many different packages  No one-fits-all solution  Solution  Custom implementation  Based on simple components  Wrapping commands
  • 17. ALOJA Platform main components 2 Online Repository • Explore results • Execution details • Cluster details • Costs • Data sharing 3 Web Analytics • Data views and evaluations • Aggregates • Abstracted Metrics • Job characterization • Machine Learning • Predictions and clustering 1 Big Data Benchmarking • Deploy & Provision • Conf Management • Parameter selection & Queuing • Perf counters • Low-level instrumentation • App logs NGINX, PHP, MySQL BASH, Unix tools, CLIs R, SQL, JS
  • 18. Workflow in ALOJA Cluster(s) definition • VM sizes • # nodes • OS, disks • Capabilities Execution plan • Start cluster • Exec Benchmarks • Gather results • Cleanup Import data • Convert perf metric • Parse logs • Import into DB Evaluate data • Data views in Vagrant VM • Or http://hadoop.bsc.es PA and KD •Predictive Analytics •Knowledge Discovery Historic Repo (in progress)
  • 19. Cluster and node definitions
    Clusters (Azure example):
      #load AZURE defaults
      source "$CONF_DIR/azure_defaults.conf"
      clusterName="al-08"
      numberOfNodes="8"
      vmSize="Large"
      #details
      vmCores="4"
      vmRAM="7" #in GB
      #costs
      clusterCostHour="1.584" #0.176 * 9
      clusterType="IaaS"
      clusterDescription="A3 type VMs"
    Node (Web in Rackspace):
      #load node defaults
      source "$CONF_DIR/node_defaults.conf"
      defaultProvider="rackspace"
      vm_name="aloja-web"
      vmSize='io1-30'
      attachedVolumes="2"
      diskSize="1023"
      # Node roles (install functions)
      extraLocalCommands="
        vm_install_webserver;
        vm_install_repo 'provider/rackspace';
        install_ganglia_gmond;
        config_ganglia_gmond 'aloja-web-rackspace' 'aloja-web';
        install_percona /scratch/attached/2/mysql;"
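The `clusterCostHour` value in the Azure definition is what makes cost comparisons possible. As a rough sketch, the cost of a single run can be pro-rated from it. The helper name `run_cost` and the pro-rated formula are assumptions made for illustration; the slides do not state how ALOJA itself derives execution cost.

```shell
# Value taken from the Azure cluster definition on this slide.
clusterCostHour="1.584"   # 0.176 * 9, as noted in the definition's comment

# Hypothetical helper: pro-rated cost of one run, given the hourly cluster
# cost and the run duration in seconds (assumed formula, not ALOJA's code).
run_cost() {
  awk -v cost_hour="$1" -v secs="$2" \
    'BEGIN { printf "%.4f\n", cost_hour * secs / 3600 }'
}

run_cost "$clusterCostHour" 1800   # cost of a 30-minute run
```

Keeping the price in the cluster definition means the same benchmark log can be re-costed when provider prices change.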
  • 20. Commands and providers Provisioning commands Providers  Connect  Node and Cluster  Uses SSH proxies automatically  Deploy  Start, Stop  Delete  Nodes and clusters  On-premise  Custom settings for clusters  Multiple disk types  Different architectures  Cloud IaaS  Azure, OpenStack, Rackspace, AWS (testing)  Cloud PaaS  HDInsight, CloudBigData, EMR soon Code at: https://github.com/Aloja/aloja/tree/master/aloja-deploy
  • 21. Running benchmarks in ALOJA  Example of submitting a job to run:  https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh  To queue jobs and control results:  https://github.com/Aloja/aloja/blob/master/shell/exeq.sh
  • 23. ALOJA Online Benchmark Repository  Entry point to explore the results collected from the executions  Index of executions  Quick glance of executions  Searchable, sortable  Execution details  Performance charts and histograms  Hadoop counters  Jobs and task details  Data management of benchmark executions  Data importing from different clusters  Execution validation  Data management and backup  Cluster definitions  Cluster capabilities (resources)  Cluster costs  Sharing results  Download executions  Add external executions  Documentation and References  Papers, links, and feature documentation Available at: http://aloja.bsc.es
  • 24. Impact of SW configurations in Speedup (4 node clusters) Number of mappers Compression algorithm No comp. ZLIB BZIP2 snappy 4m 6m 8m 10m Speedup (higher is better) Results using: http://hadoop.bsc.es/configimprovement Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
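The charts on these slides report speedup, where higher is better. A minimal sketch of the usual calculation follows, assuming speedup is the baseline execution time divided by the execution time of the configuration under test. The slides do not say which baseline ALOJA uses, and the times below are made-up sample values.

```shell
# Hypothetical helper: speedup of a configuration relative to a baseline run.
# Both arguments are execution times in seconds; values above 1 mean faster.
speedup() {
  awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", base / t }'
}

speedup 600 400   # sample values: baseline 600 s, tuned configuration 400 s
```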
  • 25. Impact of HW configurations in Speedup Disks and Network Cloud remote volumes Local only 1 Remote 2 Remotes 3 Remotes 3 Remotes /tmp local 2 Remotes /tmp local 1 Remote /tmp local HDD-ETH HDD-IB SSD-ETH SSD-IB Speedup (higher is better) Results using: http://hadoop.bsc.es/configimprovement Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
  • 26. Speedup: all disk configurations, SSD vs JBOD  For DFSIOE read, DFSIOE write, and Terasort URL: http://hadoop.bsc.es/configimprovement?datefrom=&dateto=&benchs%5B%5D=dfsioe_read&benchs%5B%5D=dfsioe_write&benchs%5B%5D=terasort&id_clusters%5B%5D=21&nets%5B%5D=None&disks%5B%5D=HD2&disks%5B%5D=HD3&disks%5B%5D=HD4&disks%5B%5D=HD5&disks%5B%5D=HDD&disks%5B%5D=HS5&disks%5B%5D=RL1&disks%5B%5D=RL2&disks%5B%5D=RL3&disks%5B%5D=RL4&disks%5B%5D=RL5&disks%5B%5D=RL6&disks%5B%5D=RR1&disks%5B%5D=SS2&disks%5B%5D=SSD&mapss%5B%5D=None&comps%5B%5D=None&replications%5B%5D=None&blk_sizes%5B%5D=None&iosfs%5B%5D=None&iofilebufs%5B%5D=None&datanodess%5B%5D=None&bench_types%5B%5D=HDI&bench_types%5B%5D=HiBench&vm_sizes%5B%5D=None&vm_coress%5B%5D=None&vm_RAMs%5B%5D=None&hadoop_versions%5B%5D=None&types%5B%5D=None&filters%5B%5D=valid&filters%5B%5D=filters&allunchecked= 2 SSDs 5 SATA 1 SSD /tmp 1 SSD 1 SATA 2 SATA 3 SATA 4 SATA 5 SATA Higher is better Fastest config High capacity and fast High capacity but slow
  • 27. Speedup by disk configuration in the Cloud (higher is better) URL: http://104.130.159.92/configimprovement?benchs%5B%5D=terasort&disks%5B%5D=HDD&disks%5B%5D=RL1&disks%5B%5D=RL2&disks%5B%5D=RL3&disks%5B%5D=RR1&disks%5B%5D=RR2&disks%5B%5D=RR3&disks%5B%5D=RR4&disks%5B%5D=RR5&disks%5B%5D=RR6&disks%5B%5D=RS1&disks%5B%5D=RS6&disks%5B%5D=SSD&bench_types%5B%5D=HiBench&filters%5B%5D=valid&filters%5B%5D=filters&allunchecked=&selected-groups=disk&datefrom=&dateto=&minexetime=150&maxexetime=1500 1-6 remotes 1 and 6 remotes with /tmp on SSD SSD only Higher is better
  • 28. VM Size comparison (Azure) Lower is better
  • 29. Preview: Cost/Performance Scalability  This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size  X axis: number of datanodes (cluster size)  Left Y axis: execution time (lower is better)  Right Y axis: execution cost Execution time Execution cost Recommended size
  • 30. InfiniBand + SSD (LOCAL) GbE + SSD (LOCAL) CLOUD (local disk /tmp and HDFS) CLOUD (/tmp in local disk, HDFS in Blob storage 1-3 devices) CLOUD (/tmp and HDFS in Blob storage 1-3 devices) InfiniBand + SATA disks (LOCAL) GbE + SATA disks (LOCAL) Price Performance Cost-effectiveness (On-premise vs. Cloud) Details at: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
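The idea behind this screen can be sketched in a few lines: compute the execution cost for each cluster size and pick the cheapest. Everything here is illustrative. The sample timings are invented, `cheapest_size` is a hypothetical helper, and the cost model (price per node-hour times nodes times duration) is an assumption rather than ALOJA's actual formula.

```shell
# Hypothetical sample data, one line per cluster size:
# "<number of datanodes> <execution time in seconds>"
samples="4 3600
8 1700
16 1000
32 700"

# Hypothetical helper: prints the cluster size with the lowest execution cost.
# $1 = sample data, $2 = assumed price per node-hour.
cheapest_size() {
  printf '%s\n' "$1" | awk -v price="$2" '
    { cost = price * $1 * $2 / 3600            # nodes * hours * price
      if (NR == 1 || cost < best) { best = cost; nodes = $1 } }
    END { printf "%d nodes, cost %.3f\n", nodes, best }'
}

cheapest_size "$samples" 0.176   # 0.176 is the per-VM price from the Azure example
```

With these sample numbers the largest cluster is the fastest but not the cheapest, which is the trade-off the two Y axes on the slide are meant to expose.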
  • 31. Open questions: is BASH good enough? PROs CONs and Alternatives  Simple and Fast  Well known  (basics at least)  Easy to hack  Most of the work requires running sys commands  Custom implementation problems  Missing some systems  Too simple, missing:  objects, inheritance, types, data structures, testing  Python? Perl?  Puppet? Ansible?  We’ll stick to bash for now..  What’s missing for incubating in Apache?
  • 32. More info:  ALOJA Benchmarking platform and online repository  http://aloja.bsc.es  Benchmarking Big Data by Nicolas Poggi  http://www.slideshare.net/ni_po/benchmarking-hadoop  Big Data Benchmarking Community (BDBC) mailing list  (~200 members from ~80 organizations)  http://clds.sdsc.edu/bdbc/community  Workshop Big Data Benchmarking (WBDB)  Next: http://clds.sdsc.edu/wbdb2015.ca  SPEC Research Big Data working group  http://research.spec.org/working-groups/big-data-working-group.html  Slides and video:  Michael Frank on Big Data benchmarking  http://www.tele-task.de/archive/podcast/20430/  Tilmann Rabl Big Data Benchmarking Tutorial  http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl
  • 33. @BDOOP_BCN More info: http://aloja.bsc.es or join BDOOP group http://www.meetup.com/Barcelona-BigData-Perfomance-and-Operations Oct 06, 2015