Topic 1: Big Data and Warehouse-scale Computing

Cloud Computing Workshop 2013, ITU

Transcript

  • 1. 1: Big Data and Warehouse-scale Computing. Zubair Nabi (zubair.nabi@itu.edu.pk), April 17, 2013
  • 3. Outline
      1 Introduction
      2 Ecosystem
  • 7. From the very beginning
      • From the dawn of civilization to the year 2003, we created 5EB of information
      • We now create the same amount of data every 2 days!
      • By 2012, we had spawned 2.7ZB of data
      • Following the same trend, we will have 8ZB by 2015
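The figures on this slide can be put on a common yearly scale with a little arithmetic. The sketch below assumes decimal units (1 EB = 10^18 bytes, 1 ZB = 10^21 bytes) and a constant rate; the function names are illustrative and not part of the talk.

```python
# Assumed convention: decimal (SI) units.
EB = 10**18
ZB = 10**21


def annual_volume_zb(eb_per_period: float, period_days: float) -> float:
    """Data created per 365-day year, in ZB, at a constant rate."""
    return eb_per_period * EB * (365 / period_days) / ZB


def implied_annual_growth(start_zb: float, end_zb: float, years: int) -> float:
    """Compound yearly growth factor that takes start_zb to end_zb."""
    return (end_zb / start_zb) ** (1 / years)


rate = annual_volume_zb(5, 2)                         # 5 EB every 2 days
growth = implied_annual_growth(2.7, 8, 2015 - 2012)   # 2.7 ZB -> 8 ZB

print(f"{rate:.2f} ZB created per year at 5 EB every 2 days")
print(f"growth factor of about {growth:.2f}x per year from 2012 to 2015")
```

Under these assumptions, 5EB every 2 days is a little under 1ZB per year, and going from 2.7ZB to 8ZB in three years implies the total growing by roughly 1.4x each year.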
  • 13. Big Data
      • Large datasets whose processing and storage requirements exceed all traditional paradigms and infrastructure
      • On the order of exabytes and beyond
      • Generated by web 2.0 applications, sensor networks, scientific applications, financial applications, etc.
      • Radically different tools needed to record, store, process, and visualize
      • Moving away from the desktop
      • Offloaded to the “cloud”
  • 19. Example: Facebook’s “Haystack”
      • 65 billion photos
      • 4 images of different sizes stored for each photo
      • For a total of 260 billion images and 20PB of storage
      • 1 billion new photos uploaded each week (increment of 60TB)
      • At peak traffic, 1 million images served per second
      • An image request is like finding a needle in a haystack
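The Haystack numbers can be cross-checked with back-of-the-envelope arithmetic. The sketch below assumes decimal units (1 PB = 10^15 bytes, 1 TB = 10^12 bytes); the variable names are illustrative. The slides do not say whether either storage figure includes replication or metadata, so the two averages need not agree.

```python
PHOTOS = 65 * 10**9            # 65 billion photos
SIZES_PER_PHOTO = 4            # 4 stored images per photo
TOTAL_BYTES = 20 * 10**15      # 20 PB (decimal units assumed)
WEEKLY_PHOTOS = 10**9          # 1 billion new photos per week
WEEKLY_BYTES = 60 * 10**12     # 60 TB weekly increment

total_images = PHOTOS * SIZES_PER_PHOTO
avg_kb_per_image = TOTAL_BYTES / total_images / 1000
weekly_kb_per_photo = WEEKLY_BYTES / WEEKLY_PHOTOS / 1000

print(f"total images: {total_images:,}")
print(f"average stored size: {avg_kb_per_image:.1f} KB per image")
print(f"weekly increment: {weekly_kb_per_photo:.0f} KB per uploaded photo")
```

The first result reproduces the 260 billion images quoted on the slide; the other two give a rough sense of how small each stored object is compared with the total volume.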
  • 25. More examples
      • The LHC at CERN generates 22PB of data annually (after throwing away around 99% of readings)
      • The Square Kilometre Array (under construction) is expected to generate hundreds of PB each day
      • Farecast, a part of Bing, searches through 225 billion flight and price records to advise customers on their ticket purchases
      • The amount of annual traffic flowing over the Internet is around 700EB
      • Walmart handles in excess of 1 million transactions every hour (25PB in total)
      • 400 million Tweets every day
  • 26. Outline
      1 Introduction
      2 Ecosystem
  • 33. Big data ecosystem
      • Presentation layer
      • Application layer: frameworks + storage
      • Operating system layer
      • Virtualization layer (optional)
      • Network layer (intra- and inter-data center)
      • Physical infrastructure
    Can roughly be called the “cloud”
  • 39. Presentation Layer
      • Acts as the user-facing end of the entire ecosystem
      • Forwards user queries to the backend (potentially the rest of the stack)
      • Can be both local and remote
      • For most web 2.0 applications, the presentation layer is a web portal
      • For instance, the Google search website is a presentation layer: it takes user queries, forwards them to a scatter-gather application, and presents the results to the user (within a time bound)
      • Made up of many technologies, such as HTTP, HTML, AJAX, etc.
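The scatter-gather pattern with a time bound, as described for the search example, can be sketched roughly as follows. This is a minimal single-process illustration and not Google's implementation: the `SHARDS` data, the `search_shard` and `scatter_gather` functions, and the 0.5-second timeout are all hypothetical, and threads stand in for remote backend servers.

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical in-memory "shards"; a real backend would be remote servers.
SHARDS = [
    {"big data": 3, "cloud": 1},
    {"big data": 5},
    {"cloud": 2, "warehouse": 4},
]


def search_shard(shard: dict, query: str) -> int:
    """Return this shard's hit count for the query (0 if absent)."""
    return shard.get(query, 0)


def scatter_gather(query: str, timeout_s: float = 0.5) -> int:
    """Scatter the query to every shard, then gather what returns in time."""
    pool = ThreadPoolExecutor(max_workers=len(SHARDS))
    futures = [pool.submit(search_shard, shard, query) for shard in SHARDS]
    # Gather step: wait at most timeout_s, then use only the finished shards.
    done, not_done = wait(futures, timeout=timeout_s)
    for future in not_done:
        future.cancel()          # best effort; running tasks are not interrupted
    pool.shutdown(wait=False)    # do not block the response on slow shards
    return sum(future.result() for future in done)


print(scatter_gather("big data"))
```

The design choice worth noting is that the gather step returns a partial answer when some shards are slow, which is one way to respect a time bound at the cost of completeness.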
  • 43. Application Layer
      • Serves as the back-end
      • Either computes a result for the user, or fetches a previously computed result or content from storage
      • The execution is predominantly distributed
      • The computation itself might entail cross-disciplinary (across sciences) technology
  • 46. Computation
      • Can be a custom solution, such as a scatter-gather application
      • Might also be an existing data-intensive computation framework (MapReduce, Dryad, MPI, etc.) or a stream processing system (Storm, S4, etc.)
      • Analytics engines: R, Matlab, etc.
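To make the MapReduce style of computation concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to word counting. It only illustrates the programming model; it does not use the API of any real framework, and the function names are chosen for this example.

```python
from collections import defaultdict


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big cloud", "cloud data"])
print(counts)
```

In a distributed framework the map and reduce calls would run on many machines and the shuffle would move data over the network; here everything runs in one process for clarity.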
  • 53. Storage
      1 Relational database management systems (RDBMS): MySQL, Oracle DB, IBM DB2, etc. (structured data)
      2 NoSQL: key-value stores, document stores, graphs, tables, etc. (semi-structured and unstructured data)
          • Document stores: MongoDB, CouchDB, etc.
          • Graphs: FlockDB, etc.
          • Key-value stores: Dynamo, Cassandra, Voldemort, etc.
          • Tables: BigTable, HBase, etc.
      3 NewSQL: aims for the best of both worlds (relational semantics with NoSQL-style scalability): Spanner, VoltDB, etc.
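The difference between structured and semi-structured storage can be illustrated with one small record held three ways. This is only a sketch of the data models using the Python standard library (`sqlite3`, plain dictionaries, and `json`); it does not use any of the systems named on the slide, and the table, keys, and field names are made up for the example.

```python
import json
import sqlite3

# Relational (structured): a fixed schema, queried with SQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, owner TEXT, size_bytes INTEGER)")
db.execute("INSERT INTO photos VALUES (?, ?, ?)", (1, "alice", 60000))
row = db.execute("SELECT owner, size_bytes FROM photos WHERE id = ?", (1,)).fetchone()

# Key-value: an opaque value looked up by key; the store does not interpret it.
kv_store = {"photo:1": b"raw image bytes"}
blob = kv_store["photo:1"]

# Document: self-describing records whose fields may differ between documents.
doc_store = {
    "1": json.dumps({"owner": "alice", "size_bytes": 60000, "tags": ["cat"]}),
    "2": json.dumps({"owner": "bob"}),
}
doc = json.loads(doc_store["1"])

print(row, len(blob), doc["tags"])
```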
  • 55. Operating System Layer
      • Consists of the traditional operating system stack with the usual suspects: Windows, variants of *nix, etc.
      • Alternatives exist, though: operating systems specialized for the cloud or for multicore systems
  • 61. Virtualization Layer
      • Allows multiple operating systems to run on top of the same physical hardware
      • Enables infrastructure sharing, isolation, and optimized utilization
      • Different allocation strategies possible
      • Easier to dedicate CPU and memory but not the network
      • Allocation either in the form of VMs or containers
      • VMware, Xen, LXC, etc.
  • 65. Network Layer
      • Connects the entire ecosystem together
      • Consists of the entire protocol stack
      • Tenants assigned to Virtual LANs
      • Multiple protocols available across the stack
  • 71. Physical Infrastructure Layer
      • The physical hardware itself
      • Servers and network elements
      • Mechanisms for power distribution, wiring, and cooling
      • Servers are connected in various topologies using different interconnects
      • Dubbed datacenters
      • “We must treat the datacenter itself as one massive warehouse-scale computer” (Luiz André Barroso and Urs Hölzle)
  • 80. Example: Google
    All that infrastructure enables Google to:
      • Index 20 billion web pages a day
      • Handle in excess of 3 billion search queries daily
      • Provide email storage to 425 million Gmail users
      • Serve 3 billion YouTube videos a day
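The daily figures above are easier to relate to system design when expressed per second. The sketch below simply divides by the number of seconds in a day, assuming the load is spread evenly (real traffic has peaks); the dictionary keys are labels chosen for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Daily volumes quoted on the slide.
daily = {
    "web pages indexed": 20 * 10**9,
    "search queries": 3 * 10**9,
    "YouTube videos served": 3 * 10**9,
}

# Average rate per second, assuming a uniform load across the day.
per_second = {name: count / SECONDS_PER_DAY for name, count in daily.items()}

for name, rate in per_second.items():
    print(f"{name}: about {rate:,.0f} per second")
```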
  • 81. 1 Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, and Peter Vajgel. 2010. Finding a needle in Haystack: Facebook’s photo storage. In Proceedings of the 9th USENIX conference on Operating systems design and implementation (OSDI’10). USENIX Association, Berkeley, CA, USA.
        2 Urs Hoelzle and Luiz Andre Barroso. 2009. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines (1st ed.). Morgan and Claypool Publishers.
