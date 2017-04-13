Using BigBench to compare Hive and Spark
Nicolas Poggi, Alejandro Montero M...
Outline
1. Intro to BSC and ALOJA
2. BigBench
3. Sequential tests
   1. Data scales
4. Concurrency tests
5. Summary
Barcelona Supercomputing Center (BSC)
• Spanish national supercomputing center, 22 years of history in:
  • Computer Architecture, networking a...
ALOJA: towards cost-effective Big Data
• Research project for automating characterization and optimization of Big Data deployments ...
Benchmarking and BigBench
The need for a new benchmark standard
• A benchmark captures the solution to a problem and guides decision making
• Database-related benchmarks s...
What is BigBench (TPCx-BB¹)?
• End-to-end application-level benchmark
• Result of many years of collaboration
  • industry and academia
• Cover...
BigBench use cases and process overview
• 30 business use cases covering:
  • Merchandising,
  • Pricing O...
BigBench v1.2 – Reference Implementation
[Figure: software stack showing HDFS, Hive Metastore, MapReduce, Tez, Spark, YARN, Hive, SparkSQL, Mahout, ML, Custom, Spark MLlib, Machi...]
The cluster (I) – HDInsight PaaS
• Model: D4v2
• # Head nodes: 2
• # Worker nodes: 4
• # Zookeeper nodes: 3
• CPU: Intel(R) Xeon(R) CPU E5-2673 v3 8x2,...
Sequential runs (power)
Queries 1-30
Average of three executions of 100 GB Scale Factor
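Every result in the sequential runs is the average of three executions. A minimal sketch of that reduction, assuming raw timings are kept per engine and per query in a nested dict; the layout, the function name `average_runs`, and the sample numbers are hypothetical, not data from these runs:

```python
from statistics import mean

# Hypothetical layout: {engine: {query_number: [run1_s, run2_s, run3_s]}}
RunTimes = dict[str, dict[int, list[float]]]


def average_runs(runs: RunTimes, expected_runs: int = 3) -> dict[str, dict[int, float]]:
    """Average the repeated executions of each query, separately per engine."""
    averaged: dict[str, dict[int, float]] = {}
    for engine, per_query in runs.items():
        averaged[engine] = {}
        for query, times in per_query.items():
            # Refuse to average an incomplete set of repetitions.
            if len(times) != expected_runs:
                raise ValueError(
                    f"{engine} Q{query}: expected {expected_runs} runs, got {len(times)}"
                )
            averaged[engine][query] = mean(times)
    return averaged


if __name__ == "__main__":
    # Made-up timings for illustration only.
    sample = {"Hive+Tez": {12: [100.0, 110.0, 120.0]}, "Spark 2.0.2": {12: [60.0, 66.0, 72.0]}}
    print(average_runs(sample))
```

Checking the repetition count up front keeps a failed or missing run from silently skewing an average.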
BigBench workload – power test
[Figure: Data Generation, HDFS, Load to Hive Metastore, Hive; then Query 1, Query 2, …, Query 30 run one after another]
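The power test runs the queries strictly one at a time. A sketch of the timing loop, assuming data generation and the load into the Hive Metastore have already completed; `run_query` is a hypothetical callable that submits query `q` to the engine under test (Hive on Tez, or a Spark version) and blocks until it finishes:

```python
import time
from collections.abc import Callable, Iterable


def power_test(run_query: Callable[[int], None],
               queries: Iterable[int] = range(1, 31)) -> dict[int, float]:
    """Run each query once, sequentially, and record its wall-clock seconds."""
    elapsed: dict[int, float] = {}
    for q in queries:
        start = time.perf_counter()
        run_query(q)  # hypothetical: blocks until query q has finished
        elapsed[q] = time.perf_counter() - start
    return elapsed


if __name__ == "__main__":
    # A no-op stub exercises the loop without a cluster.
    timings = power_test(lambda q: None)
    print(len(timings))
```

Timing each query on its own, rather than only the whole sequence, is what allows the per-query and per-category comparisons between engines.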
Pure QL
Average of three executions using 100 GB Scale Factor
Query 12 CPU behavior
[Figure panels: Tez, Spark 1.6.2, Spark 2.0.2]
Average of three executions using 100 GB Scale Factor
Custom Reducers
Average of three executions using 100 GB Scale Factor
Query 2 CPU behavior
[Figure panels: Tez, Spark 1.6.2, Spark 2.0.2]
Average of three executions using 100 GB Scale Factor
Natural Language Processing
Average of three executions using 100 GB Scale Factor
Query 27 CPU behavior
[Figure panels: Tez, Spark 1.6.2, Spark 2.0.2]
Average of three executions using 100 GB Scale Factor
Machine Learning
Average of three executions using 100 GB Scale Factor
Query 5 CPU behavior
[Figure panels: Tez + Mahout, Tez + Spark MLlib]
Average of three executions using 100 GB Scale Factor
Aggregated Results
Average of three executions using 100 GB Scale Factor
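One way to roll per-query averages up into totals for the four query categories (Pure QL, Custom Reducers, Natural Language Processing, Machine Learning). The mapping below contains only the four queries whose CPU-behavior slides follow each category, assuming that slide order reflects the category; every other query lands in a hypothetical "Uncategorized" bucket until a full mapping of all 30 queries is supplied. The function name `aggregate` is also illustrative:

```python
from collections import defaultdict

# Partial, assumed mapping: one example query per category, taken from slide order.
CATEGORY_OF = {
    12: "Pure QL",
    2: "Custom Reducers",
    27: "Natural Language Processing",
    5: "Machine Learning",
}


def aggregate(avg_times: dict[int, float],
              category_of: dict[int, str] = CATEGORY_OF) -> dict[str, float]:
    """Sum averaged query times per category and add an overall total."""
    totals: dict[str, float] = defaultdict(float)
    for query, seconds in avg_times.items():
        totals[category_of.get(query, "Uncategorized")] += seconds
    totals["Total"] = sum(avg_times.values())
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical averaged times in seconds, not measured values.
    print(aggregate({12: 40.0, 2: 300.0, 27: 90.0, 5: 600.0}))
```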
Scaling from 1 GB to 1 TB
(Log scales)
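On log-log axes, the slope between two (data size, runtime) points is the scaling exponent: a slope near 1 means runtime grows in proportion to data size, while a slope well below 1 means runtime grows sub-linearly, often because fixed per-query overheads dominate at the small scales. A small helper for reading such plots; the function name and the example numbers are hypothetical, and 1 TB is taken as 1000 GB:

```python
import math


def loglog_slope(size_a: float, time_a: float, size_b: float, time_b: float) -> float:
    """Slope of the line through two points plotted on log-log axes."""
    if min(size_a, time_a, size_b, time_b) <= 0:
        raise ValueError("log scales need strictly positive values")
    if size_a == size_b:
        raise ValueError("the two points need different data sizes")
    # The ratio of logs is independent of the log base.
    return math.log(time_b / time_a) / math.log(size_b / size_a)


if __name__ == "__main__":
    # Hypothetical: 1 GB in 100 s and 1000 GB (1 TB) in 10,000 s.
    print(round(loglog_slope(1, 100, 1000, 10_000), 3))
```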
Concurrency runs (throughput)
2, 4, 8 parallel streams
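BigBench workload – Throughput test
[Figure: parallel query streams, each running the queries in its own order]
• Stream: Query 15, Query 21, …, Query 16
• Stream: Query 12, Query 18, …, Query 22
• Stream: Query 16, Query 30, …, Query 29
Load Data, Dat...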
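In the throughput test, each stream runs the query set in its own order, and the concurrency runs used 2, 4 and 8 streams. A minimal sketch of that pattern with threads, assuming each stream runs all 30 queries; `run_query` is again a hypothetical blocking callable, and the seeded shuffle is only a stand-in because the slide does not show how the benchmark kit chooses each stream's order:

```python
import random
import time
from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor


def make_streams(n_streams: int, queries: Iterable[int] = range(1, 31),
                 seed: int = 0) -> list[list[int]]:
    """One independently shuffled copy of the query list per stream (stand-in ordering)."""
    rng = random.Random(seed)
    streams = []
    for _ in range(n_streams):
        order = list(queries)
        rng.shuffle(order)
        streams.append(order)
    return streams


def throughput_test(run_query: Callable[[int], None], n_streams: int, seed: int = 0) -> float:
    """Run all streams concurrently; return seconds until the last stream finishes."""
    if n_streams < 1:
        raise ValueError("need at least one stream")
    streams = make_streams(n_streams, seed=seed)

    def run_stream(order: list[int]) -> None:
        for q in order:  # queries within one stream still run sequentially
            run_query(q)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        # list() consumes the results, re-raising any exception from a stream.
        list(pool.map(run_stream, streams))
    return time.perf_counter() - start
```

Calling it as `for n in (2, 4, 8): throughput_test(run_query, n)` mirrors the stream counts above. Threads suffice here because each stream spends its time waiting on the cluster rather than computing in Python.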
The cluster (II) – HDInsight PaaS
• Model: HDInsight D4v3
• # Head nodes: 2
• # Worker nodes: 7
• # Zookeeper nodes: 3
• CPU: Intel(R) Xeon(R) CPU E5-2...
Spark vs Hive+Tez in throughput tests
