Jianfeng Zhan, BigDataBench: Benchmarking Big Data Systems


BDTC 2013, Beijing, China

Published in: Technology

Transcript of "Jianfeng Zhan, BigDataBench: Benchmarking Big Data Systems"

  1. 1. BigDataBench: Benchmarking Big Data Systems. Jianfeng Zhan, Computer Systems Research Center, ICT, CAS (Institute of Computing Technology). http://prof.ict.ac.cn/jfzhan. CCF Big Data Technology Conference, 2013-12-06
  2. 2. Why Big Data Benchmarking? Measuring big data architecture and systems quantitatively
  3. 3. What is BigDataBench? An open source project on big data benchmarking: http://prof.ict.ac.cn/BigDataBench/ • 6 real-world data sets and 19 workloads (to be extended in the near future) • 4V characteristics: Volume, Variety, Velocity, and Veracity
  4. 4. Comparison of Big Data Benchmarking Efforts
  5. 5. Possible Users of BigDataBench • Systems: OS for big data, file systems for big data, distributed systems, scheduling, programming systems, ... • Architecture: processor, memory, networks, ... • Data management: performance optimization, co-design, ...
  6. 6. Research Publications • Characterizing data analysis workloads in data centers. Zhen Jia, Lei Wang, Jianfeng Zhan, Lixing Zhang, and Chunjie Luo. IISWC 2013, Best Paper Award. • BigDataBench: a Big Data Benchmark Suite from Internet Services. Lei Wang, Jianfeng Zhan, et al. HPCA 2014, Industry Session.
  7. 7. Outline • Benchmarking Methodology and Decision • Case Study • How to Use • Future Work
  8. 8. BigDataBench Methodology [diagram labels: 4V of Big Data; BigDataBench]
  9. 9. Methodology (Cont'd) • Investigate typical application domains • Representative data sets: data types (structured, semi-structured, unstructured); data sources (text data, graph data, table data, extended ...) • Big data sets preserving 4V: data generation tool preserving data characteristics • Diverse workloads: application types (offline analytics, realtime analytics, online services); basic and important operations and algorithms (extended ...); representative software stacks (extended ...) • Result: big data workloads in BigDataBench
  10. 10. Methodology (Cont'd) [diagram labels: 4V of Big Data; System and architecture characteristics; Similarity analysis; BigDataBench]
  11. 11. Top Sites on the Web. More details at http://www.alexa.com/topsites/global;0 • Search Engine, Social Network and Electronic Commerce account for 80% of the page views of all Internet services.
  12. 12. Workloads Chosen • Representative application scenarios: Search Engine, E-commerce, Social Network • Attention to different application types: online service, real-time analytics, offline analytics • Include different data sources: text data, graph data, table data • Cover representative software stacks
  13. 13. 19 Chosen Workloads • Micro Benchmarks • Basic Datastore Operations • Relational Queries • Application Scenarios: search engines, social networks, e-commerce systems
  14. 14. Data Generation Tools • Data sources: text, graph and table (six real raw data sets) • Synthetic data: scale from GB to PB; features preserve the characteristics of real-world data
  15. 15. Naïve Text Generator • Words are selected randomly from a vocabulary (machine, evaluate, big, system, data, mining, architecture, CPU, memory, benchmarking, learning) to form documents, with words following a multinomial distribution • Models the word level only
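The word-level sampling on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the BigDataBench tool itself: the function name, the vocabulary order and the word weights are all hypothetical, and a real generator would presumably fit the weights to a real data set.

```python
import random

# Hypothetical vocabulary (words taken from the slide) and made-up weights.
VOCABULARY = ["machine", "evaluate", "big", "system", "data", "mining",
              "architecture", "cpu", "memory", "benchmarking", "learning"]
WORD_WEIGHTS = [3, 1, 4, 2, 5, 2, 1, 3, 2, 1, 2]


def generate_naive_document(num_words, seed=None):
    """Draw every word independently from one multinomial over the vocabulary."""
    rng = random.Random(seed)
    # random.choices samples with replacement according to the given weights.
    return rng.choices(VOCABULARY, weights=WORD_WEIGHTS, k=num_words)


if __name__ == "__main__":
    print(" ".join(generate_naive_document(12, seed=2013)))
```

Because every word comes from the same distribution, the output has realistic word frequencies but no topical structure, which is the limitation the slide points out.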
  16. 16. Improved Text Generator • Select a topic randomly (topic1, topic2, topic3), with topics following a multinomial distribution • Select a word randomly under the chosen topic, with words following a multinomial distribution under that topic (topic2 in the slide's example) • Models both the topic level and the word level of a document
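The two-level sampling can be sketched as follows. This is a hedged illustration only: the topic names follow the slide, but the topic weights, the per-topic word tables and the function name are hypothetical. The sketch also assumes a topic is drawn once per word, as in a topic model; the slide does not say whether the tool draws a topic per word or per document.

```python
import random

# Hypothetical topic distribution and per-topic word distributions.
TOPIC_WEIGHTS = {"topic1": 0.5, "topic2": 0.3, "topic3": 0.2}
TOPIC_WORDS = {
    "topic1": {"machine": 4, "learning": 3, "data": 2, "mining": 1},
    "topic2": {"cpu": 4, "memory": 3, "architecture": 2, "system": 1},
    "topic3": {"benchmarking": 3, "evaluate": 2, "big": 2, "data": 1},
}


def generate_topic_document(num_words, seed=None):
    """For each word: draw a topic, then draw a word under that topic."""
    rng = random.Random(seed)
    topics = list(TOPIC_WEIGHTS)
    topic_probs = [TOPIC_WEIGHTS[t] for t in topics]
    document = []
    for _ in range(num_words):
        topic = rng.choices(topics, weights=topic_probs, k=1)[0]
        words = list(TOPIC_WORDS[topic])
        word_probs = [TOPIC_WORDS[topic][w] for w in words]
        document.append(rng.choices(words, weights=word_probs, k=1)[0])
    return document


if __name__ == "__main__":
    print(" ".join(generate_topic_document(12, seed=2013)))
```

Compared with word-level sampling alone, the extra topic draw lets related words co-occur, which is closer to how real text behaves.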
  17. 17. Outline • Benchmarking Methodology and Decision • Case Study • How to Use • Future Work
  18. 18. BigDataBench Case Study: users of BigDataBench • Topics: Performance Evaluation and Diagnosis; Workload Characterization; Evaluating Big Data Hardware Systems; Networks for Big Data; Energy Efficiency of Big Data Systems • Institutions: SJTU and XJTU; ICT, CAS; SIAT, CAS; USTC and Florida International University; OSU; CNCERT • http://prof.ict.ac.cn/BigDataBench/#users
  19. 19. Testbed
  20. 20. Workloads Analyzed http://prof.ict.ac.cn/BigDataBench
  21. 21. Floating Point Operation Intensity [chart panels: Data Analytics; Services] • Operation intensity: the total number of (floating point or integer) instructions divided by the total number of memory access bytes in a run of a workload • Very low floating point operation intensity (0.009), two orders of magnitude lower than the theoretical number for a state-of-practice CPU (1.8)
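The definition on this slide reduces to a single division. The sketch below applies it to hypothetical counter values, chosen only so that the result matches the 0.009 figure quoted above. The function and variable names are illustrative, and how the counters are collected is outside the scope of the sketch.

```python
import math


def operation_intensity(instruction_count, memory_access_bytes):
    """Instructions of one kind divided by bytes of memory accessed in a run."""
    if memory_access_bytes <= 0:
        raise ValueError("memory_access_bytes must be positive")
    return instruction_count / memory_access_bytes


# Hypothetical counter readings for one workload run.
fp_instructions = 9_000_000
memory_bytes = 1_000_000_000

fp_intensity = operation_intensity(fp_instructions, memory_bytes)

# Gap to the theoretical CPU figure quoted on the slide (1.8).
gap = 1.8 / fp_intensity
orders_of_magnitude = math.floor(math.log10(gap))

if __name__ == "__main__":
    print(f"intensity={fp_intensity:.3f}, gap={gap:.0f}x, "
          f"orders of magnitude={orders_of_magnitude}")
```

A gap of about 200x is what the slide summarizes as "two orders of magnitude".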
  22. 22. Instruction Breakdown [chart panels: Data Analytics; Services] • Fewer floating point operations • More integer operations
  23. 23. Ratio of Integer to Floating Point Operations [chart panels: Data Analytics; Services] • The average ratio for big data workloads is 100 • The ratios for PARSEC, HPCC and SPECFP are 1.4, 1.0 and 0.67
  24. 24. Integer Operation Intensity [chart panels: Data Analytics; Services] • The average integer operation intensity of big data workloads is 0.49 • That of PARSEC, HPCC and SPECFP is 1.5, 0.38 and 0.23
  25. 25. Cache Behaviors [chart panels: Data Analytics; Services] • Big data workloads have higher L1I misses than HPC workloads • Data analysis workloads have better L2 cache behaviors than service workloads, except BFS • Big data workloads have good L3 behaviors
  26. 26. TLB Behaviors [14 data analysis workloads; 5 service workloads] • ITLB misses of big data workloads are higher than those of HPC workloads • DTLB misses of big data workloads are higher than those of HPC workloads
  27. 27. BigDataBench Case Study: users of BigDataBench • Topics: Performance Evaluation and Diagnosis; Big Data Workload Characterization; Evaluating Big Data Hardware Systems; Networks for Big Data; Energy Efficiency of Big Data Systems • Institutions: SJTU and XJTU; ICT, CAS; SIAT, CAS; USTC and Florida International University; OSU; CNCERT • http://prof.ict.ac.cn/BigDataBench/#users
  28. 28. Evaluating Big Data Hardware Systems
  29. 29. Experimental Platforms: Brief Comparison
      Basic Information | Xeon (common processor) | Atom (low power processor) | Tilera (many core processor)
      CPU Type          | Intel Xeon E5310        | Intel Atom D510            | Tilera TilePro36
      CPU Core          | 4 cores @ 1.6GHz        | 2 cores @ 1.66GHz          | 36 cores @ 500MHz
      L1 I/D Cache      | 32KB                    | 24KB                       | 16KB/8KB
      L2 Cache          | 4096KB                  | 512KB                      | 64KB
  30. 30. Experimental Platforms: Hadoop Cluster Information • Comparison (the same logical core number) • Xeon vs. Atom: [1 Xeon master + 7 Xeon slaves] vs. [1 Atom master + 7 Atom slaves] • Xeon vs. Tilera: [1 Xeon master + 7 Xeon slaves] vs. [1 Xeon master + 1 Tilera slave] • Hadoop setting: following the guidance on the Hadoop official website
  31. 31. Benchmark Selection (BigDataBench 1.0)
      Application | Time Complexity | Characteristics
      Sort        | O(n*log2(n))    | Integer comparison
      WordCount   | O(n)            | Integer comparison and calculation
      Grep        | O(n)            | String comparison
      Naïve Bayes | O(m*n)          | Floating-point computation
      SVM         | O(n^3)          | Floating-point computation
  32. 32. Metrics • Performance: data processed per second (DPS) • Energy efficiency: Application Performance Power Usage Effectiveness (DPJ)
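Both metrics can be computed from three measurements of a run. The sketch below makes two assumptions that the slide does not state: that DPJ means data processed per joule, and that energy is average power multiplied by runtime. The function names and input numbers are hypothetical.

```python
import math


def data_processed_per_second(input_bytes, runtime_seconds):
    """DPS: input data volume divided by job runtime."""
    return input_bytes / runtime_seconds


def data_processed_per_joule(input_bytes, average_power_watts, runtime_seconds):
    """DPJ (assumed reading): input data volume divided by energy consumed."""
    energy_joules = average_power_watts * runtime_seconds  # 1 W * 1 s = 1 J
    return input_bytes / energy_joules


# Hypothetical run: 10 GB processed in 200 s at an average draw of 250 W.
INPUT_BYTES = 10 * 10**9
RUNTIME_S = 200.0
POWER_W = 250.0

dps = data_processed_per_second(INPUT_BYTES, RUNTIME_S)
dpj = data_processed_per_joule(INPUT_BYTES, POWER_W, RUNTIME_S)

if __name__ == "__main__":
    print(f"DPS = {dps / 1e6:.1f} MB/s, DPJ = {dpj / 1e3:.1f} KB/J")
```

Under these assumptions DPJ equals DPS divided by average power, so a slower platform can still score higher on DPJ if it draws proportionally less power.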
  33. 33. Xeon vs. Atom: DPJ
  34. 34. Xeon vs. Tilera: DPJ
  35. 35. Reference: Jing Quan (University of Science and Technology of China), Yingjie Shi (Chinese Academy of Sciences), Ming Zhao (Florida International University), Wei Yang (University of Science and Technology of China). "The Implications from Benchmarking Three Different Data Center Platforms". The First Workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware (BPOE 2013), in conjunction with the 2013 IEEE International Conference on Big Data (IEEE Big Data 2013).
  36. 36. Outline • Benchmarking Methodology and Decision • Case Study • How to Use • Future Work
  37. 37. BigDataBench Class • For architecture: 19 among 19 • For OS: 19 among 19 • For runtime environment (Hadoop): 9 of 19 workloads: Sort, Grep, WordCount, PageRank, Index, Kmeans, Connected Components, Collaborative Filtering and Naive Bayes • For data management: 6 of 19 workloads: Read, Write, Scan, Select Query, Aggregate Query, Join Query
  38. 38. BigDataBench Class: Data Sources • Text related: 6 of 19 workloads: Sort, Grep, WordCount, Index, Collaborative Filtering and Naive Bayes • Graph related: 4 of 19 workloads: BFS, PageRank, Kmeans, and Connected Components • Table related: 9 of 19 workloads: Read, Write, Scan, Select Query, Aggregate Query, Join Query, Nutch Server, Olio Server and Rubis Server
  39. 39. BigDataBench Class: Application Types • Online services: 6 of 19 workloads: Read, Write, Scan, Nutch Server, Olio Server and Rubis Server • Offline analytics: 10 of 19 workloads: Sort, Grep, WordCount, BFS, PageRank, Index, Kmeans, Connected Components, Collaborative Filtering and Naive Bayes • Realtime analytics: 3 of 19 workloads: Select Query, Aggregate Query and Join Query
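The data-source and application-type classifications on slides 38 and 39 each assign every workload to exactly one class. A user choosing a subset could encode them as sets, as in the sketch below. The workload names come from the slides; the dictionary layout and helper name are illustrative and are not part of BigDataBench.

```python
BY_DATA_SOURCE = {
    "text": {"Sort", "Grep", "WordCount", "Index",
             "Collaborative Filtering", "Naive Bayes"},
    "graph": {"BFS", "PageRank", "Kmeans", "Connected Components"},
    "table": {"Read", "Write", "Scan", "Select Query", "Aggregate Query",
              "Join Query", "Nutch Server", "Olio Server", "Rubis Server"},
}

BY_APPLICATION_TYPE = {
    "online services": {"Read", "Write", "Scan", "Nutch Server",
                        "Olio Server", "Rubis Server"},
    "offline analytics": {"Sort", "Grep", "WordCount", "BFS", "PageRank",
                          "Index", "Kmeans", "Connected Components",
                          "Collaborative Filtering", "Naive Bayes"},
    "realtime analytics": {"Select Query", "Aggregate Query", "Join Query"},
}

ALL_WORKLOADS = set().union(*BY_DATA_SOURCE.values())


def is_partition(groups, universe):
    """True if the groups are pairwise disjoint and together cover the universe."""
    members = [w for group in groups.values() for w in group]
    return len(members) == len(set(members)) and set(members) == universe


if __name__ == "__main__":
    print(len(ALL_WORKLOADS), "workloads")
    print("data sources partition:", is_partition(BY_DATA_SOURCE, ALL_WORKLOADS))
    print("application types partition:",
          is_partition(BY_APPLICATION_TYPE, ALL_WORKLOADS))
```

Intersecting a data-source set with an application-type set, for example `BY_DATA_SOURCE["graph"] & BY_APPLICATION_TYPE["offline analytics"]`, gives the workloads that match both criteria.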
  40. 40. BigDataBench Class: Application Domains • Search engine related (Basic Operations + Search Engine): 7 of 19 workloads: Sort, Grep, WordCount, BFS, PageRank, Index and Nutch Server • Social network related (Basic Cloud OLTP + Basic Relational Query + Social Network): 9 of 19 workloads: Read, Write, Scan, Select Query, Aggregate Query, Join Query, Olio Server, Kmeans and Connected Components • E-commerce related (Basic Cloud OLTP + Basic Relational Query + E-commerce): 9 of 19 workloads: Read, Write, Scan, Select Query, Aggregate Query, Join Query, Rubis Server, Collaborative Filtering and Naive Bayes
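The domain view overlaps: its three class sizes add up to 25, not 19, because the basic datastore operations and relational queries appear under both the social network and the e-commerce domains. A short sketch with the slide's workload names makes the overlap explicit. The variable names are illustrative only.

```python
BY_DOMAIN = {
    "search engine": {"Sort", "Grep", "WordCount", "BFS", "PageRank",
                      "Index", "Nutch Server"},
    "social network": {"Read", "Write", "Scan", "Select Query",
                       "Aggregate Query", "Join Query", "Olio Server",
                       "Kmeans", "Connected Components"},
    "e-commerce": {"Read", "Write", "Scan", "Select Query",
                   "Aggregate Query", "Join Query", "Rubis Server",
                   "Collaborative Filtering", "Naive Bayes"},
}

# Workloads listed under both the social network and e-commerce domains.
shared = BY_DOMAIN["social network"] & BY_DOMAIN["e-commerce"]

# Distinct workloads across all three domains.
distinct = set().union(*BY_DOMAIN.values())

# Sum of the per-domain counts as printed on the slide.
listed_total = sum(len(group) for group in BY_DOMAIN.values())

if __name__ == "__main__":
    print("listed:", listed_total, "shared:", len(shared),
          "distinct:", len(distinct))
```

The six shared workloads account for the difference between the 25 listed entries and the 19 distinct workloads.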
  41. 41. Outline • Benchmarking Methodology and Decision • Case Study • How to Use • Future Work
  42. 42. Near Future Work • Multi-media data • Deep learning workloads • HPC • Refine BigDataBench
  43. 43. Related Resources • BigDataBench project: http://prof.ict.ac.cn/BigDataBench • BPOE workshop: http://prof.ict.ac.cn/bpoe, a series of workshops on Big Data Benchmarks, Performance Optimization, and Emerging Hardware • BPOE-4: interaction among OS, architecture, and data management; co-located with ASPLOS 2014
  44. 44. BPOE-4 SC • Christos Kozyrakis, Stanford • Xiaofang Zhou, University of Queensland • Dhabaleswar K Panda, Ohio State University • Raghunath Nambiar, Cisco • Lizy K John, University of Texas at Austin • Xiaoyong Du, Renmin University of China • H. Peter Hofstee, IBM Austin Research Laboratory • Ippokratis Pandis, IBM Almaden Research Center • Alexandros Labrinidis, University of Pittsburgh • Bill Jia, Facebook • Jianfeng Zhan, ICT, Chinese Academy of Sciences
  45. 45. THANKS