1: Big Data and Warehouse-scale Computing

                                        Zubair Nabi

                              zubair.nabi@itu.edu.pk


                                      April 17, 2013




Zubair Nabi            1: Big Data and Warehouse-scale Computing   April 17, 2013   1 / 23
Outline




1    Introduction




2    Ecosystem




From the very beginning




      From the dawn of civilization to the year 2003, we created 5EB of
      information
      We now create the same amount of data every 2 days!
      By 2012, we had spawned 2.7ZB of data
      Following the same trend, we will have 8ZB by 2015
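
The figures above can be turned into rough rates. The sketch below is only arithmetic on the numbers quoted on this slide; it assumes decimal (SI) units and a 365-day year, and the variable names are illustrative.

```python
# Decimal (SI) units are an assumption: 1 EB = 10**18 bytes, 1 ZB = 10**21 bytes.
EB = 10**18
ZB = 10**21

# Figure from the slide: 5 EB created every 2 days.
bytes_per_day = 5 * EB / 2
bytes_per_year = bytes_per_day * 365

# Figures from the slide: 2.7 ZB by 2012, 8 ZB projected for 2015.
start_zb, end_zb, years = 2.7, 8.0, 3
annual_growth = (end_zb / start_zb) ** (1 / years)

print(f"Created per year at 5 EB every 2 days: {bytes_per_year / ZB:.2f} ZB")
print(f"Implied annual growth factor, 2012 to 2015: {annual_growth:.2f}x")
```

The growth factor is the compound rate that would take 2.7ZB to 8ZB in three years; it says nothing about how the original projection was derived.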




Big Data



      Large datasets whose processing and storage requirements exceed all
      traditional paradigms and infrastructure
                On the order of exabytes and beyond
      Generated by web 2.0 applications, sensor networks, scientific
      applications, financial applications, etc.
      Radically different tools needed to record, store, process, and visualize
      Moving away from the desktop
      Offloaded to the “cloud”




Example: Facebook’s “Haystack”




      65 billion photos
      4 images of different sizes stored for each photo
                For a total of 260 billion images and 20PB of storage
      1 billion new photos uploaded each week (increment of 60TB)
      At peak traffic 1 million images served per second
      An image request is like finding a needle in a haystack
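
A quick back-of-the-envelope check of these numbers, as a sketch: it uses only the figures on this slide, assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), and the derived averages are illustrative rather than figures from the Haystack paper.

```python
TB = 10**12
PB = 10**15

photos = 65 * 10**9        # from the slide
sizes_per_photo = 4        # from the slide
total_storage = 20 * PB    # from the slide
weekly_photos = 10**9      # from the slide
weekly_growth = 60 * TB    # from the slide

# Total stored images: one per size for every photo.
images = photos * sizes_per_photo

# Average stored bytes per image, and average upload rate over a week.
avg_image_bytes = total_storage / images
uploads_per_second = weekly_photos / (7 * 24 * 3600)

# Average storage added per newly uploaded photo (all sizes together).
bytes_per_new_photo = weekly_growth / weekly_photos

print(f"{images:,} images, about {avg_image_bytes / 1000:.0f} KB each on average")
print(f"about {uploads_per_second:,.0f} uploads per second on average")
print(f"about {bytes_per_new_photo / 1000:.0f} KB added per new photo")
```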




More examples


      The LHC at CERN generates 22PB of data annually (after throwing away
      around 99% of readings)
      The Square Kilometre Array (under construction) is expected to generate
      hundreds of PB each day
      Farecast, a part of Bing, searches through 225 billion flight and price
      records to advise customers on their ticket purchases
      The amount of annual traffic flowing over the Internet is around 700EB
      Walmart handles in excess of 1 million transactions every hour (25PB in
      total)
      400 million Tweets every day
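
To make these volumes easier to compare, the sketch below converts three of them to average per-second rates. It assumes decimal units, a 365-day year, and perfectly uniform load, which real systems do not see; `per_second` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 3600
PB = 10**15  # decimal petabyte (assumption)

def per_second(count, period_seconds):
    """Average rate per second for `count` events spread evenly over a period."""
    return count / period_seconds

tweets_per_s = per_second(400e6, SECONDS_PER_DAY)             # 400 million Tweets a day
walmart_tx_per_s = per_second(1e6, 3600)                      # 1 million transactions an hour
lhc_bytes_per_s = per_second(22 * PB, 365 * SECONDS_PER_DAY)  # 22PB a year

print(f"Tweets: about {tweets_per_s:,.0f} per second")
print(f"Walmart: about {walmart_tx_per_s:,.0f} transactions per second")
print(f"LHC: about {lhc_bytes_per_s / 1e9:.2f} GB per second retained")
```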




Big data ecosystem




      Presentation layer
      Application layer: frameworks + storage
      Operating system layer
      Virtualization layer (optional)
      Network layer (intra- and inter-data center)
      Physical infrastructure
Together, these layers can roughly be called the “cloud”




Presentation Layer



      Acts as the user-facing end of the entire ecosystem
      Forwards user queries to the backend (potentially the rest of the stack)
      Can be both local and remote
      For most web 2.0 applications, the presentation layer is a web portal
      For instance, the Google search website is a presentation layer: it takes
      user queries, forwards them to a scatter-gather application, and presents
      the results to the user (within a time bound)
      Made up of many technologies, such as HTTP, HTML, AJAX, etc.
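
The scatter-gather pattern with a time bound, mentioned above, can be sketched as follows. This is a minimal illustration and not how Google search is implemented: the shards, their contents, `search_shard`, and `scatter_gather` are all hypothetical, and threads stand in for remote backend servers.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical shards: each holds a slice of an index (term -> document ids).
SHARDS = [
    {"big": [1, 4], "data": [1]},
    {"big": [7], "cloud": [8]},
    {"data": [12], "big": [15]},
]

def search_shard(shard, term, delay=0.0):
    """Look up `term` in one shard; `delay` simulates a slow backend."""
    time.sleep(delay)
    return shard.get(term, [])

def scatter_gather(term, deadline_s=0.2, delays=None):
    """Fan the query out to every shard, then merge whatever returns in time."""
    delays = delays or [0.0] * len(SHARDS)
    pool = ThreadPoolExecutor(max_workers=len(SHARDS))
    # Scatter: one task per shard.
    futures = [pool.submit(search_shard, s, term, d) for s, d in zip(SHARDS, delays)]
    # Gather: wait only until the deadline, then answer with partial results.
    done, not_done = wait(futures, timeout=deadline_s)
    pool.shutdown(wait=False)  # do not block on stragglers
    results = []
    for future in done:
        results.extend(future.result())
    return sorted(results), len(not_done)

if __name__ == "__main__":
    print(scatter_gather("big"))
    print(scatter_gather("big", deadline_s=0.1, delays=[0.0, 1.0, 0.0]))
```

Returning partial results when a shard misses the deadline trades completeness for bounded response time, which is the point of the time bound.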




Application Layer




      Serves as the back-end
      Either computes a result for the user, or fetches a previously computed
      result or content from storage
      The execution is predominantly distributed
      The computation itself might entail cross-disciplinary (across sciences)
      technology




Computation




      Can be a custom solution, such as a scatter-gather application
      Might also be an existing data intensive computation framework, such as
      MapReduce, Dryad, MPI, etc. or a stream processing system, such as
      Storm, S4, etc.
      Analytics engines: R, Matlab, etc.
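
As an illustration of the MapReduce programming model named above, here is a single-process word-count sketch. It mimics the map, shuffle, and reduce phases only; it is not the API of Hadoop or any real framework, and all function names are hypothetical.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run the three phases in order over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

if __name__ == "__main__":
    print(word_count(["big data", "Big cloud"]))
```

In a real framework the map and reduce calls run in parallel on many machines and the shuffle moves data over the network; here everything runs in one process to show the data flow.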




Storage



 1    Relational database management systems (RDBMS): MySQL, Oracle
      DB, IBM DB2, etc. (structured data)
 2    NoSQL: Key-value stores, document stores, graphs, tables, etc.
      (semi-structured and unstructured data)
                Document stores: MongoDB, CouchDB, etc.
                Graphs: FlockDB, etc.
                Key-value stores: Dynamo, Cassandra, Voldemort, etc.
                Tables: BigTable, HBase, etc.
 3    NewSQL: The best of both worlds: Spanner, VoltDB, etc.
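
The difference between the data models listed above can be sketched with the standard library alone. SQLite stands in for an RDBMS, and plain dictionaries stand in for a key-value store and a document store; the record and key names are made up for illustration, and none of this reflects the APIs of the systems named on the slide.

```python
import json
import sqlite3

# Relational: a fixed schema, queried declaratively with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'Lahore')")
row = conn.execute("SELECT name, city FROM users WHERE id = ?", (1,)).fetchone()

# Key-value: an opaque value looked up by key; the store does not interpret it.
kv_store = {}
kv_store["user:1"] = b"Ada|Lahore"
value = kv_store["user:1"]

# Document: a self-describing, possibly nested record; fields may vary per document.
doc_store = {}
doc_store["user:1"] = json.dumps({"name": "Ada", "city": "Lahore", "tags": ["admin"]})
doc = json.loads(doc_store["user:1"])

print(row, value, doc["tags"])
```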




Operating System Layer




      Consists of the traditional operating system stack with the usual suspects,
      Windows, variants of *nix, etc.
      Alternatives exist, though: operating systems specialized for the cloud or
      multicore systems




Virtualization Layer




      Allows multiple operating systems to run on top of the same physical
      hardware
      Enables infrastructure sharing, isolation, and optimized utilization
      Different allocation strategies possible
      Easier to dedicate CPU and memory but not the network
      Allocation either in the form of VMs or containers
      Examples: VMware, Xen, LXC, etc.
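
One possible allocation strategy is first-fit placement, sketched below. This is an assumption chosen for illustration, not a strategy the slides prescribe; `Host` and `first_fit` are hypothetical names. In line with the point above, the sketch dedicates CPU and memory only and leaves network bandwidth unmodeled.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    """A physical machine with spare capacity (network is not modeled)."""
    name: str
    cpus: int
    mem_gb: int
    guests: list = field(default_factory=list)

def first_fit(hosts, guest_name, cpus, mem_gb):
    """Place a guest on the first host with enough spare CPU and memory."""
    for host in hosts:
        if host.cpus >= cpus and host.mem_gb >= mem_gb:
            # Dedicate the requested resources to this guest.
            host.cpus -= cpus
            host.mem_gb -= mem_gb
            host.guests.append(guest_name)
            return host.name
    return None  # no host has enough capacity

if __name__ == "__main__":
    hosts = [Host("h1", cpus=4, mem_gb=8), Host("h2", cpus=8, mem_gb=32)]
    print(first_fit(hosts, "vm-a", 2, 4))
    print(first_fit(hosts, "vm-b", 4, 16))
    print(first_fit(hosts, "vm-c", 16, 64))
```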




Network Layer




      Connects the entire ecosystem together
      Consists of the entire protocol stack
      Tenants assigned to Virtual LANs
      Multiple protocols available across the stack




Physical Infrastructure Layer




      The physical hardware itself
      Servers and network elements
      Mechanisms for power distribution, wiring, and cooling
      Servers are connected in various topologies using different interconnects
      Dubbed datacenters
      “We must treat the datacenter itself as one massive warehouse-scale
      computer” – Luiz André Barroso and Urs Hölzle




Example: Google




All that infrastructure enables Google to:
      Index 20 billion web pages a day
      Handle in excess of 3 billion search queries daily
      Provide email storage to 425 million Gmail users
      Serve 3 billion YouTube videos a day




1   Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, and Peter Vajgel.
    2010. Finding a needle in Haystack: Facebook’s photo storage. In
    Proceedings of the 9th USENIX conference on Operating systems design
    and implementation (OSDI’10). USENIX Association, Berkeley, CA, USA.
2   Urs Hoelzle and Luiz Andre Barroso. 2009. The Datacenter as a
    Computer: An Introduction to the Design of Warehouse-Scale Machines
    (1st ed.). Morgan and Claypool Publishers.




Zubair Nabi          1: Big Data and Warehouse-scale Computing   April 17, 2013   24 / 23

More Related Content

Similar to Topic 1: Big Data and Warehouse-scale Computing

DataEd Online: Demystifying Big Data
DataEd Online: Demystifying Big DataDataEd Online: Demystifying Big Data
DataEd Online: Demystifying Big DataDATAVERSITY
 
Data-Ed: Demystifying Big Data
Data-Ed: Demystifying Big DataData-Ed: Demystifying Big Data
Data-Ed: Demystifying Big DataData Blueprint
 
Research issues in the big data and its Challenges
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its ChallengesKathirvel Ayyaswamy
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyNishant Gandhi
 
The Semantic Web Exists. What Next?
The Semantic Web Exists. What Next?The Semantic Web Exists. What Next?
The Semantic Web Exists. What Next?Anna Fensel
 
Scale as a Competitive Advantage
Scale as a Competitive AdvantageScale as a Competitive Advantage
Scale as a Competitive AdvantageDavid Chou
 
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveChris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveGigaScience, BGI Hong Kong
 
Introduction to Big Data & Big Data 1.0 System
Introduction to Big Data & Big Data 1.0 SystemIntroduction to Big Data & Big Data 1.0 System
Introduction to Big Data & Big Data 1.0 SystemPetr Novotný
 
PUC Masterclass Big Data
PUC Masterclass Big DataPUC Masterclass Big Data
PUC Masterclass Big DataArjen de Vries
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreSoftweb Solutions
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014ALTER WAY
 
Big Data Ecosystem for Data-Driven Decision Making
Big Data Ecosystem for Data-Driven Decision MakingBig Data Ecosystem for Data-Driven Decision Making
Big Data Ecosystem for Data-Driven Decision MakingAbzetdin Adamov
 
re:Introduce Big Data and Hadoop Eco-system.
re:Introduce Big Data and Hadoop Eco-system.re:Introduce Big Data and Hadoop Eco-system.
re:Introduce Big Data and Hadoop Eco-system.Shakir Ali
 
How Google Does Big Data - DevNexus 2014
How Google Does Big Data - DevNexus 2014How Google Does Big Data - DevNexus 2014
How Google Does Big Data - DevNexus 2014James Chittenden
 
Evolving the Web into a Global Database - Advances and Applications.
Evolving the Web into a Global Database - Advances and Applications. Evolving the Web into a Global Database - Advances and Applications.
Evolving the Web into a Global Database - Advances and Applications. Chris Bizer
 
Big Data in NATO and Your Role
Big Data in NATO and Your RoleBig Data in NATO and Your Role
Big Data in NATO and Your RoleJay Gendron
 
Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...Humphrey Southall
 

Similar to Topic 1: Big Data and Warehouse-scale Computing (20)

DataEd Online: Demystifying Big Data
DataEd Online: Demystifying Big DataDataEd Online: Demystifying Big Data
DataEd Online: Demystifying Big Data
 
Data-Ed: Demystifying Big Data
Data-Ed: Demystifying Big DataData-Ed: Demystifying Big Data
Data-Ed: Demystifying Big Data
 
Research issues in the big data and its Challenges
Topic 1: Big Data and Warehouse-scale Computing

  • 1. 1: Big Data and Warehouse-scale Computing. Zubair Nabi (zubair.nabi@itu.edu.pk), April 17, 2013
  • 2. Outline
      1. Introduction
      2. Ecosystem
  • 7. From the very beginning
      • From the dawn of civilization to the year 2003, we created 5EB of information
      • We now create the same amount of data every 2 days!
      • By 2012, we had spawned 2.7ZB of data
      • Following the same trend, we will have 8ZB by 2015
  • 13. Big Data
      • Large datasets whose processing and storage requirements exceed all traditional paradigms and infrastructure
      • On the order of exabytes and beyond
      • Generated by web 2.0 applications, sensor networks, scientific applications, financial applications, etc.
      • Radically different tools needed to record, store, process, and visualize
      • Moving away from the desktop
      • Offloaded to the “cloud”
  • 19. Example: Facebook’s “Haystack”
      • 65 billion photos
      • 4 images of different sizes stored for each photo
      • For a total of 260 billion images and 20PB of storage
      • 1 billion new photos uploaded each week (increment of 60TB)
      • At peak traffic, 1 million images are served per second
      • An image request is like finding a needle in a haystack
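The Haystack figures above can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check: it assumes decimal (SI) units, and the slide does not say whether the storage totals include replicas, so the derived per-image sizes are rough indications rather than figures from the deck.

```python
# Back-of-the-envelope check of the Haystack figures quoted on the slide.
# Assumption: decimal (SI) units; the slide does not say which it uses.
PB = 10**15
TB = 10**12
KB = 10**3

photos = 65 * 10**9
sizes_per_photo = 4
images = photos * sizes_per_photo             # total stored images

total_storage = 20 * PB
avg_image_bytes = total_storage / images      # average size of one stored image

weekly_photos = 10**9
weekly_growth = 60 * TB
bytes_per_new_photo = weekly_growth / weekly_photos  # storage added per uploaded photo

print(f"images stored: {images:,}")
print(f"average image size: {avg_image_bytes / KB:.1f} KB")
print(f"storage per new photo: {bytes_per_new_photo / KB:.1f} KB")
```

The product of 65 billion photos and 4 sizes reproduces the 260 billion images on the slide; the two per-item sizes are derived here and do not appear in the deck.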
  • 25. More examples
      • The LHC at CERN generates 22PB of data annually (after throwing away around 99% of readings)
      • The Square Kilometre Array (under construction) is expected to generate hundreds of PB each day
      • Farecast, a part of Bing, searches through 225 billion flight and price records to advise customers on their ticket purchases
      • The amount of annual traffic flowing over the Internet is around 700EB
      • Walmart handles in excess of 1 million transactions every hour (25PB in total)
      • 400 million Tweets every day
  • 26. Outline
      1. Introduction
      2. Ecosystem
  • 33. Big data ecosystem
      • Presentation layer
      • Application layer: frameworks + storage
      • Operating system layer
      • Virtualization layer (optional)
      • Network layer (intra- and inter-data center)
      • Physical infrastructure
      • Can roughly be called the “cloud”
  • 39. Presentation Layer
      • Acts as the user-facing end of the entire ecosystem
      • Forwards user queries to the backend (potentially the rest of the stack)
      • Can be both local and remote
      • For most web 2.0 applications, the presentation layer is a web portal
      • For instance, the Google search website is a presentation layer: it takes user queries, forwards them to a scatter-gather application, and presents the results to the user (within a time bound)
      • Made up of many technologies, such as HTTP, HTML, AJAX, etc.
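The scatter-gather pattern with a time bound, mentioned in the search example above, can be sketched roughly as follows. This is a minimal illustration, not how any real search engine is built: the `SHARDS` data, `search_shard`, and `scatter_gather` are hypothetical names, and in-process threads stand in for remote backend servers.

```python
import concurrent.futures as cf
import time

# Hypothetical shards: each maps a document id to its text.
SHARDS = [
    {1: "big data systems", 2: "warehouse scale computing"},
    {3: "big clusters", 4: "stream processing"},
    {5: "data center networks"},
]

def search_shard(shard, term, delay=0.0):
    """Return ids of documents in one shard that contain the term."""
    time.sleep(delay)  # stands in for network and disk latency
    return [doc_id for doc_id, text in shard.items() if term in text.split()]

def scatter_gather(term, delays, timeout):
    """Query every shard in parallel; keep only answers that arrive in time."""
    pool = cf.ThreadPoolExecutor(max_workers=len(SHARDS))
    # Scatter: one request per shard.
    futures = [pool.submit(search_shard, s, term, d) for s, d in zip(SHARDS, delays)]
    # Gather: wait up to the time bound, then use whatever has completed.
    done, not_done = cf.wait(futures, timeout=timeout)
    results = sorted(doc_id for f in done for doc_id in f.result())
    pool.shutdown(wait=False)  # do not block on stragglers
    return results, len(not_done)
```

With no artificial delay, `scatter_gather("data", [0, 0, 0], timeout=1.0)` returns `([1, 5], 0)`. If the third shard is slower than the time bound, its answer is simply left out and reported as one missing shard, which is the trade-off a time-bounded presentation layer accepts.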
  • 43. Application Layer
      • Serves as the back-end
      • Either computes a result for the user, or fetches a previously computed result or content from storage
      • The execution is predominantly distributed
      • The computation itself might entail cross-disciplinary (across sciences) technology
  • 46. Computation
      • Can be a custom solution, such as a scatter-gather application
      • Might also be an existing data-intensive computation framework, such as MapReduce, Dryad, MPI, etc., or a stream processing system, such as Storm, S4, etc.
      • Analytics engines: R, Matlab, etc.
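To give a feel for the MapReduce programming model named above, here is a single-process word-count sketch. It only mimics the map, shuffle, and reduce phases in plain Python; the function names are illustrative assumptions, and a real framework would run these phases across many machines with fault tolerance.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map over every document, shuffle by word, then reduce each group."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

For example, `word_count(["big data", "Big clusters"])` yields a count of 2 for `big` and 1 each for `data` and `clusters`.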
  • 53. Storage: (1) Relational database management systems (RDBMS) for structured data: MySQL, Oracle DB, IBM DB2, etc. (2) NoSQL for semi-structured and unstructured data: document stores (MongoDB, CouchDB, etc.), graphs (FlockDB, etc.), key-value stores (Dynamo, Cassandra, Voldemort, etc.), and tables (BigTable, HBase, etc.). (3) NewSQL, the best of both worlds: Spanner, VoltDB, etc.
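
To make the contrast between the structured and semi-structured storage models concrete, here is a minimal sketch that stores the same record twice: once in a relational table with a fixed schema, and once as a schemaless document under a key. Python's built-in `sqlite3` stands in for an RDBMS and a plain `dict` stands in for a key-value store; neither is one of the systems named on the slide, and the table, key, and field names are hypothetical.

```python
import json
import sqlite3

# Relational model: a fixed schema declared up front, queried with SQL.
# sqlite3 is only a stand-in for a full RDBMS here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute(
    "INSERT INTO users (id, name, city) VALUES (?, ?, ?)", (1, "Ada", "Lahore")
)
row = conn.execute("SELECT name, city FROM users WHERE id = ?", (1,)).fetchone()

# Key-value / document model: the store maps a key to a self-describing value.
# A dict is only a stand-in for a distributed store; fields can vary per record.
kv_store = {}
kv_store["user:1"] = json.dumps({"name": "Ada", "city": "Lahore", "tags": ["admin"]})
doc = json.loads(kv_store["user:1"])

print(row)          # the relational row as a tuple
print(doc["tags"])  # a field the relational schema above has no column for
```

The document version accepts the extra `tags` field without any schema change, while the relational version would need a new column or a join table. That flexibility is the usual trade-off against the stronger structure and query support of an RDBMS.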
  • 55. Operating System Layer: Consists of the traditional operating system stack with the usual suspects: Windows, variants of *nix, etc. Alternatives exist, though, specialized for the cloud or multicore systems.
  • 61. Virtualization Layer: Allows multiple operating systems to run on top of the same physical hardware. Enables infrastructure sharing, isolation, and optimized utilization. Different allocation strategies are possible. It is easier to dedicate CPU and memory than the network. Allocation takes the form of either VMs or containers. Examples: VMware, Xen, LXC, etc.
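
The slide does not name a particular allocation strategy, so the following is only one possible illustration: a first-fit placement that puts each guest (VM or container) on the first host with enough spare CPU and memory. The `Host` class, the `place_first_fit` function, and all host and guest names are hypothetical. Network bandwidth is deliberately not modelled, in line with the point that it is harder to dedicate than CPU and memory.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A physical server with some unallocated CPU cores and memory."""
    name: str
    free_cpus: int
    free_mem_gb: int
    guests: list = field(default_factory=list)


def place_first_fit(hosts, guest_name, cpus, mem_gb):
    """Place a guest on the first host with enough spare CPU and memory.

    Returns the chosen host's name, or None if no host has capacity.
    """
    for host in hosts:
        if host.free_cpus >= cpus and host.free_mem_gb >= mem_gb:
            host.free_cpus -= cpus
            host.free_mem_gb -= mem_gb
            host.guests.append(guest_name)
            return host.name
    return None


hosts = [
    Host("host-a", free_cpus=4, free_mem_gb=16),
    Host("host-b", free_cpus=8, free_mem_gb=32),
]
placements = [
    place_first_fit(hosts, "vm-1", cpus=2, mem_gb=8),
    place_first_fit(hosts, "vm-2", cpus=4, mem_gb=8),
    place_first_fit(hosts, "vm-3", cpus=2, mem_gb=8),
]
print(placements)
```

First-fit is simple but can leave capacity fragmented across hosts. Other strategies (best-fit, spreading for fault tolerance, packing for power savings) trade utilization against isolation differently.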
  • 65. Network Layer: Connects the entire ecosystem together. Consists of the entire protocol stack. Tenants are assigned to Virtual LANs. Multiple protocols are available across the stack.
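
As a rough sketch of what assigning tenants to Virtual LANs can involve, the allocator below hands each tenant the next unused IEEE 802.1Q VLAN ID. The `VlanAllocator` class and the tenant names are hypothetical, and the scheme is a simplification: real deployments usually reserve some IDs (for example a default or management VLAN), which this sketch ignores.

```python
class VlanAllocator:
    """Assign each tenant its own VLAN ID, handed out in increasing order."""

    # The 802.1Q VLAN ID field is 12 bits wide; 0 and 4095 are reserved,
    # leaving 1..4094 as usable IDs.
    FIRST_ID = 1
    LAST_ID = 4094

    def __init__(self):
        self._by_tenant = {}
        self._next_id = self.FIRST_ID

    def assign(self, tenant):
        """Return the tenant's VLAN ID, allocating one on first use."""
        if tenant in self._by_tenant:
            return self._by_tenant[tenant]
        if self._next_id > self.LAST_ID:
            raise RuntimeError("VLAN ID space exhausted")
        vlan_id = self._next_id
        self._by_tenant[tenant] = vlan_id
        self._next_id += 1
        return vlan_id


allocator = VlanAllocator()
vlan_a = allocator.assign("tenant-a")
vlan_b = allocator.assign("tenant-b")
vlan_a_again = allocator.assign("tenant-a")  # repeated calls are stable
print(vlan_a, vlan_b, vlan_a_again)
```

The hard cap of 4094 usable IDs is one reason large multi-tenant datacenters often need more than plain VLANs to isolate tenants.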
  • 71. Physical Infrastructure Layer: The physical hardware itself: servers and network elements, plus mechanisms for power distribution, wiring, and cooling. Servers are connected in various topologies using different interconnects. Collectively, these facilities are dubbed datacenters. “We must treat the datacenter itself as one massive warehouse-scale computer” – Luiz André Barroso and Urs Hölzle
  • 80. Example: Google. All that infrastructure enables Google to: index 20 billion web pages a day; handle in excess of 3 billion search queries daily; provide email storage to 425 million Gmail users; serve 3 billion YouTube videos a day.
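
To put the daily figures on this slide in perspective, the short calculation below converts them to average per-second rates. These are plain averages of the slide's numbers; real traffic is bursty, so peak rates would be higher. The dictionary keys are illustrative labels, not Google terminology.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes as quoted on the slide.
daily = {
    "pages_indexed": 20_000_000_000,
    "search_queries": 3_000_000_000,
    "youtube_videos_served": 3_000_000_000,
}

# Average rate per second for each workload.
per_second = {name: count / SECONDS_PER_DAY for name, count in daily.items()}

for name, rate in per_second.items():
    print(f"{name}: about {rate:,.0f} per second")
```

On average that is roughly 35 thousand search queries and over 230 thousand indexed pages every second, which is the scale that motivates treating the datacenter as a single warehouse-scale computer.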
  • 81. References: [1] Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, and Peter Vajgel. 2010. Finding a needle in Haystack: Facebook’s photo storage. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI ’10). USENIX Association, Berkeley, CA, USA. [2] Urs Hölzle and Luiz André Barroso. 2009. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines (1st ed.). Morgan and Claypool Publishers.