Evolution of Big Data Architectures
Architecture Summit, Aug 2012
Ashish Thusoo
Outline

Demand for Big Data
Architectural Trade Offs and Evolution
Where next?
The Changing Planet

3 Technology Drivers
   Devices
   Infrastructure
   Applications
Evolution: Devices

Key Capabilities
   Connected
   Location Aware
   Sensory & Powerful
Evolution: Devices
Mobile Subscription Density 2004




Evolution: Connectivity
Mobile Subscription Density 2010




Evolution: Connectivity
Evolution: Bandwidth
Evolution: Applications

Salient Traits
   Cloud based
   Web scale
Explosion in Data

Big Data
   Volume
   Velocity
   Variety
Big Data: Volume

Volume:
   2011: digital universe of 1.8 zettabytes
   2009 - 2020: projected growth to 35 zettabytes
Big Data: Velocity

Velocity
   340 million tweets per day
   72 hours of video uploaded every minute on YouTube
   2.9 million emails a second
Big Data: Variety

Variety
   Video
   Pictures
   Application Logs
   etc.
Disruptive Architectures
Disruptions in Data Arch

Change in Focus (1990s -> 2000s)
   Performance -> Scalability & Availability
   Rigid/Structured -> Flexible/Semistructured
Scalability & Availability
Towards Scalability

Problem
  10K ops/sec -> 1M ops/sec
  TB of data -> PB of data
Towards Scalability

Solution: SHARDING (Divide and Conquer)

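A minimal sketch of the divide-and-conquer idea, assuming records are key -> value pairs and that a shard is chosen by hashing the key. The names `shard_of` and `partition`, the choice of MD5 as a stable hash, and the shard count of 4 are illustrative assumptions, not something the slides specify.

```python
import hashlib


def shard_of(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() because hash() of a
    string is salted per process and would route differently on restart.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(records: dict, num_shards: int) -> list:
    """Split one large key -> value collection into num_shards smaller dicts."""
    shards = [{} for _ in range(num_shards)]
    for key, value in records.items():
        shards[shard_of(key, num_shards)][key] = value
    return shards


# Hypothetical data set: 1000 user records spread over 4 shards.
records = {f"user{i}": i for i in range(1000)}
shards = partition(records, 4)
```

Each shard now holds only a slice of the data, so each one serves a fraction of the operations and stores a fraction of the bytes.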
Towards Scalability

How do we quickly route a record to a shard?

[Diagram: fn(record) -> shard]

  - Consistent Hashing
  - Mapping Table
Towards Scalability
What happens if part of the record is in one shard and part
in another?
Towards Scalability
Keep it Simple: Application deals with atomicity &
consistency semantics
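One way an application might take on atomicity itself is to undo the parts it already wrote when a later part fails. The sketch below assumes in-memory shards and a best-effort undo; `Shard`, `ShardUnavailable`, and `write_record` are hypothetical names, and this is not a real transaction: readers can observe the partial state, and the undo itself could fail.

```python
class ShardUnavailable(Exception):
    """Raised when a write reaches a shard that is down."""


class Shard:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True

    def put(self, key, value):
        if not self.up:
            raise ShardUnavailable(self.name)
        self.data[key] = value


_MISSING = object()  # marks "key did not exist before the write"


def write_record(parts):
    """Write every (shard, key, value) part; on failure, restore earlier parts.

    Returns True if all parts were written, False if the write was undone.
    """
    written = []
    try:
        for shard, key, value in parts:
            old = shard.data.get(key, _MISSING)
            shard.put(key, value)
            written.append((shard, key, old))
    except ShardUnavailable:
        # Compensate in reverse order. For simplicity the undo touches the
        # shard's dict directly; a real undo is another remote write.
        for shard, key, old in reversed(written):
            if old is _MISSING:
                shard.data.pop(key, None)
            else:
                shard.data[key] = old
        return False
    return True
```

The data system stays simple because all of this bookkeeping lives in the application, which is exactly the trade-off the slide describes.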
Towards Availability
What if my shard is down? Where do I put my record?



[Diagram: a record headed for a shard marked X (down); its destination is marked ?]
Towards Availability
Let's just replicate the shards and pray that one is available :)



[Diagram: replicated copies of each shard; one replica marked X (down)]
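A rough sketch of "replicate and hope one is up", assuming in-memory replicas and synchronous writes to every reachable copy. `Replica`, `ReplicatedShard`, and `ReplicaDown` are hypothetical names introduced only for this illustration.

```python
class ReplicaDown(Exception):
    """Raised when a replica cannot be reached."""


class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True

    def put(self, key, value):
        if not self.up:
            raise ReplicaDown(self.name)
        self.data[key] = value

    def get(self, key):
        if not self.up:
            raise ReplicaDown(self.name)
        return self.data[key]


class ReplicatedShard:
    """One logical shard backed by several copies."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def put(self, key, value) -> int:
        """Write to every reachable replica; return how many accepted the write."""
        acks = 0
        for replica in self.replicas:
            try:
                replica.put(key, value)
                acks += 1
            except ReplicaDown:
                pass  # skip the dead copy and keep going
        if acks == 0:
            raise ReplicaDown("no replica available")
        return acks

    def get(self, key):
        """Read from the first replica that answers."""
        for replica in self.replicas:
            try:
                return replica.get(key)
            except ReplicaDown:
                continue
        raise ReplicaDown("no replica available")
```

Note the gap this leaves: a replica that was down during a `put` comes back without that write, so the copies can disagree until something repairs them.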
Towards Availability

Replication strategies
       What should be the number of replicas?
       How to rebuild a replica?
       How to propagate a record to a replica?
1990s vs 2000s
Different Focus:
1990s (Raw Performance)
       Optimal I/O structures
       Cache Sensitive Algorithms
2000s (Scalability, Availability)
       Sharding
       Replication
Flexibility/Semi-structure
Towards Flexibility

Problem
  Does structure in a database make it slower to write
  applications (sprint vs waterfall model)?
  What if my data is not records and tables?
Towards Flexibility
How does knowing my record structure help my data system?
   Helps to optimize execution plans
   Helps to optimize my storage layouts
Trade off?
   Application change means database schema change,
   rebuilding indexes, etc.
Towards Flexibility

Most of my operations are simple lookups, range lookups
and updates
   Since the execution is simple we don’t need all the
   structure
   Keep enough structure to support fast gets and puts
Towards Flexibility

Solution: Key-Value Stores (NoSQL)

[Diagram: table of KEY -> VALUE entries]

  - Sorted HashMaps
  - Sorted Files
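A minimal in-memory sketch of a key-value store that keeps just enough structure (sorted keys) for fast gets, puts, and range lookups. The class name `SortedKeyValueStore` and its methods are assumptions made for illustration; a real store would also persist the data, for example as sorted files.

```python
import bisect


class SortedKeyValueStore:
    """Key -> value map that also keeps its keys in sorted order."""

    def __init__(self):
        self._keys = []    # sorted list of keys, for range lookups
        self._values = {}  # key -> value, for point lookups

    def put(self, key, value):
        # Insert the key into the sorted list only the first time it is seen.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]
```

The value is opaque to the store: nothing here needs a schema, which is what lets the application change shape without a database migration.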
Towards Flexibility

Need to update related “values” of a key (Some Atomicity)

[Diagram: table of KEY -> VALUE entries]
Towards Flexibility

Need to update related “values” of a key (Some Atomicity)

[Diagram: table of KEY, TAG and VALUE columns]

TAG = COLUMN FAMILY
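A sketch of the KEY / TAG / VALUE layout, assuming that "some atomicity" means several tagged values under one key can be updated as a unit. `ColumnFamilyStore` and `put_many` are hypothetical names, and the single process-wide lock is a simplification chosen for brevity.

```python
import threading
from collections import defaultdict


class ColumnFamilyStore:
    """Maps key -> {tag: value}, with multi-tag updates applied atomically per key."""

    def __init__(self):
        self._rows = defaultdict(dict)
        self._lock = threading.Lock()  # one lock for the whole store, for simplicity

    def put(self, key, tag, value):
        with self._lock:
            self._rows[key][tag] = value

    def put_many(self, key, updates: dict):
        """Update several related values of one key; readers see all or none."""
        with self._lock:
            self._rows[key].update(updates)

    def get(self, key, tag, default=None):
        with self._lock:
            # .get() on the outer dict avoids creating an empty row on a miss.
            return self._rows.get(key, {}).get(tag, default)
```

Atomicity here stops at the key boundary: updates that span two different keys get no such guarantee.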
Towards Flexibility

Gets and puts are fine for online applications BUT...
What about Analytics?
   Transformations can be really complicated...
Towards Flexibility

Is there a simple construct that can solve a number of
analytics queries?
   Of course: SORT
   And it can be parallelized too
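To illustrate why SORT covers so many analytics queries: once records are sorted by key, equal keys sit next to each other, so grouping, counting, and de-duplication become a single pass. The sketch assumes events are `(key, payload)` tuples; the function names are made up for this example.

```python
from itertools import groupby
from operator import itemgetter


def count_by_key(events):
    """GROUP BY key, COUNT(*): sort, then count each run of equal keys."""
    ordered = sorted(events, key=itemgetter(0))
    return [
        (key, sum(1 for _ in group))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


def distinct_keys(events):
    """SELECT DISTINCT key: sort, then keep one key per run."""
    ordered = sorted(events, key=itemgetter(0))
    return [key for key, _ in groupby(ordered, key=itemgetter(0))]


# Hypothetical click log: (page, user) pairs.
clicks = [("home", "u1"), ("cart", "u2"), ("home", "u3"), ("home", "u1")]
page_counts = count_by_key(clicks)
pages = distinct_keys(clicks)
```

Sorting also splits cleanly across machines: each machine sorts its own slice, and the sorted runs are then merged.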
Towards Flexibility

MAP/REDUCE (Scalable Parallel Pluggable SORT)
[Diagram: input -> Mappers m{ } -> Reducers r{ }]

  m: user defined map function
  r: user defined reduce function
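A single-process sketch of the map/reduce flow, assuming `m` turns one input record into `(key, value)` pairs and `r` folds all values of one key into a result. The driver name `map_reduce`, the partition count, and the word-count example are illustrative assumptions; a real framework runs mappers and reducers on many machines.

```python
from itertools import groupby
from operator import itemgetter


def map_reduce(inputs, m, r, num_reducers=2):
    """Run m over every input, sort the output by key, and run r once per key."""
    # Map phase: each emitted pair is assigned to a reducer partition by key.
    partitions = [[] for _ in range(num_reducers)]
    for record in inputs:
        for key, value in m(record):
            partitions[hash(key) % num_reducers].append((key, value))

    # Sort + reduce phase: sorting brings equal keys together in each partition.
    results = {}
    for partition in partitions:
        partition.sort(key=itemgetter(0))
        for key, group in groupby(partition, key=itemgetter(0)):
            results[key] = r(key, [value for _, value in group])
    return results


def m(line):
    """User-defined map function: emit (word, 1) for every word in a line."""
    return [(word, 1) for word in line.split()]


def r(key, values):
    """User-defined reduce function: add up the counts for one word."""
    return sum(values)


word_counts = map_reduce(["a b a", "b c"], m, r)
```

The framework part is essentially the sort in the middle; `m` and `r` are the pluggable pieces, which matches the "Scalable Parallel Pluggable SORT" description.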
Towards Flexibility

MAP/REDUCE and Failures

[Diagram: Mappers and Reducers; one task marked X (failed)]
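One plausible way to handle a failed task, sketched under the assumption that each task's output is kept once it completes, so only the failed task is rerun. `run_with_retries` and `FlakyTask` are hypothetical names, and the retry limit is an arbitrary choice.

```python
def run_with_retries(tasks, max_attempts=3):
    """Run each task, rerunning only the ones that fail.

    tasks maps a task id to a zero-argument callable. Outputs of finished
    tasks are kept, so one failure does not restart the whole job.
    """
    completed = {}
    for task_id, task in tasks.items():
        for attempt in range(1, max_attempts + 1):
            try:
                completed[task_id] = task()
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # give up on the job after repeated failures
    return completed


class FlakyTask:
    """Test double: fails a fixed number of times, then returns its result."""

    def __init__(self, result, failures):
        self.result = result
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("worker lost")
        return self.result


healthy = FlakyTask("part-0", failures=0)
flaky = FlakyTask("part-1", failures=1)
outputs = run_with_retries({"map-0": healthy, "map-1": flaky})
```

This works because a map task is a pure function of its input split: running it again yields the same output, so a rerun is safe.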
1990s vs 2000s
Different Focus:
1990s (Raw Performance)
      Structure important for speed optimizations
      Stream everything through Query plan
2000s (Sprint mode of application development)
      Support dev efficiency and data variety
      Checkpointing for restartability
Where now?
The New Meets The Old

Disruption?
   Well we still need SQL
   We still need to make these work with other components
   Guess what? Efficiency is also important at scale
Where Does New Fail?

Transactions?
   Moving money from one account to another
Graphs?
   Networks everywhere
   How to do second order analysis on graphs
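The money-transfer case can be made concrete with a small sketch. It assumes a plain key-value store with independent puts and no transactions; the `transfer` function, the `Crash` exception, and the account names are invented for the illustration.

```python
class Crash(Exception):
    """Simulates the process dying between two writes."""


def transfer(store, src, dst, amount, crash_between=False):
    """Move money using two independent puts, with nothing tying them together."""
    store[src] = store[src] - amount  # first put: debit
    if crash_between:
        raise Crash("process died after the debit, before the credit")
    store[dst] = store[dst] + amount  # second put: credit


accounts = {"alice": 100, "bob": 50}
total_before = sum(accounts.values())

try:
    transfer(accounts, "alice", "bob", 30, crash_between=True)
except Crash:
    pass

# The debit happened but the credit did not, so money has vanished.
total_after_crash = sum(accounts.values())
```

A transactional system would either apply both writes or neither. With plain gets and puts, the application has to detect and repair this state on its own.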
Thank You!
Ashish thusoo   evolution of big data architectures

More Related Content

Viewers also liked

True Life: I work at an advertising agency
True Life: I work at an advertising agencyTrue Life: I work at an advertising agency
True Life: I work at an advertising agency
Kait1788
 
IT Performance – what differentiates the Leaders
IT Performance – what differentiates the LeadersIT Performance – what differentiates the Leaders
IT Performance – what differentiates the Leaders
Capgemini
 
Marcos Dussoni, Regional CFO at Sodexo - Acquisitions in complex markets: tak...
Marcos Dussoni, Regional CFO at Sodexo - Acquisitions in complex markets: tak...Marcos Dussoni, Regional CFO at Sodexo - Acquisitions in complex markets: tak...
Marcos Dussoni, Regional CFO at Sodexo - Acquisitions in complex markets: tak...
Global Business Events
 
Salesforce dug meetup6_summer14apex
Salesforce dug meetup6_summer14apexSalesforce dug meetup6_summer14apex
Salesforce dug meetup6_summer14apex
Ikou Sanuki
 
RFID
RFIDRFID
Wikipedia presentation 20130514
Wikipedia presentation 20130514Wikipedia presentation 20130514
Wikipedia presentation 20130514
Vassia Atanassova
 
10 Strategies Startup Companies Need to Know to Aggressively Build a Patent P...
10 Strategies Startup Companies Need to Know to Aggressively Build a Patent P...10 Strategies Startup Companies Need to Know to Aggressively Build a Patent P...
10 Strategies Startup Companies Need to Know to Aggressively Build a Patent P...
Knobbe Martens - Intellectual Property Law
 

Viewers also liked (7)

True Life: I work at an advertising agency
True Life: I work at an advertising agencyTrue Life: I work at an advertising agency
True Life: I work at an advertising agency
 
IT Performance – what differentiates the Leaders
IT Performance – what differentiates the LeadersIT Performance – what differentiates the Leaders
IT Performance – what differentiates the Leaders
 
Marcos Dussoni, Regional CFO at Sodexo - Acquisitions in complex markets: tak...
Marcos Dussoni, Regional CFO at Sodexo - Acquisitions in complex markets: tak...Marcos Dussoni, Regional CFO at Sodexo - Acquisitions in complex markets: tak...
Marcos Dussoni, Regional CFO at Sodexo - Acquisitions in complex markets: tak...
 
Salesforce dug meetup6_summer14apex
Salesforce dug meetup6_summer14apexSalesforce dug meetup6_summer14apex
Salesforce dug meetup6_summer14apex
 
RFID
RFIDRFID
RFID
 
Wikipedia presentation 20130514
Wikipedia presentation 20130514Wikipedia presentation 20130514
Wikipedia presentation 20130514
 
10 Strategies Startup Companies Need to Know to Aggressively Build a Patent P...
10 Strategies Startup Companies Need to Know to Aggressively Build a Patent P...10 Strategies Startup Companies Need to Know to Aggressively Build a Patent P...
10 Strategies Startup Companies Need to Know to Aggressively Build a Patent P...
 

Similar to Ashish thusoo evolution of big data architectures

Speech Reognition Using FPGA Technology
Speech Reognition Using FPGA TechnologySpeech Reognition Using FPGA Technology
Speech Reognition Using FPGA Technology
Carlos
 
Sketch sort ochadai20101015-public
Sketch sort ochadai20101015-publicSketch sort ochadai20101015-public
Sketch sort ochadai20101015-public
Yasuo Tabei
 
Commerce Data Usability Project
Commerce Data Usability ProjectCommerce Data Usability Project
Commerce Data Usability Project
Rebecca Bilbro
 
9. lenguaje binario
9. lenguaje binario9. lenguaje binario
9. lenguaje binario
Oskii27
 
9. lenguaje binario
9. lenguaje binario9. lenguaje binario
9. lenguaje binario
Oskii27
 
Safe Data is Happy Data
Safe Data is Happy DataSafe Data is Happy Data
Safe Data is Happy Data
PostgreSQL Experts, Inc.
 
Digitizing Your Publishing Practice for the Museum Publishing Seminar 2012
Digitizing Your Publishing Practice for the Museum Publishing Seminar 2012Digitizing Your Publishing Practice for the Museum Publishing Seminar 2012
Digitizing Your Publishing Practice for the Museum Publishing Seminar 2012
Elizabeth Neely
 
побудова та організація комп'ютерних мереж
побудова та організація комп'ютерних мережпобудова та організація комп'ютерних мереж
побудова та організація комп'ютерних мереж
Sanya Dzhedzhera
 
Strukt web site
Strukt web siteStrukt web site
Strukt web site
Darina Koroleh
 
Finpro be inspired Ideo
Finpro be inspired IdeoFinpro be inspired Ideo
Finpro be inspired Ideo
Business Finland
 
Big Data Will Change Our World
Big Data Will Change Our WorldBig Data Will Change Our World
Big Data Will Change Our World
Fliptop
 
DX2000 from NEC lets you put big data to work - Infographic
DX2000 from NEC lets you put big data to work - InfographicDX2000 from NEC lets you put big data to work - Infographic
DX2000 from NEC lets you put big data to work - Infographic
Principled Technologies
 
A4 drive dev_ops_agility_and_operational_efficiency
A4 drive dev_ops_agility_and_operational_efficiencyA4 drive dev_ops_agility_and_operational_efficiency
A4 drive dev_ops_agility_and_operational_efficiency
Dr. Wilfred Lin (Ph.D.)
 
[Infographic] Empower Your Business With Digital Business Transformation
[Infographic] Empower Your Business With Digital Business Transformation[Infographic] Empower Your Business With Digital Business Transformation
[Infographic] Empower Your Business With Digital Business Transformation
Citrix
 
Operation Blackjack Decoded By Glp
Operation Blackjack Decoded By GlpOperation Blackjack Decoded By Glp
Operation Blackjack Decoded By Glp
truthseeker
 
Sketch sort sugiyamalab-20101026 - public
Sketch sort sugiyamalab-20101026 - publicSketch sort sugiyamalab-20101026 - public
Sketch sort sugiyamalab-20101026 - public
Yasuo Tabei
 
Introduction To Uae & Mena Trading Strategies By Peter Barr
Introduction To Uae & Mena Trading Strategies   By Peter BarrIntroduction To Uae & Mena Trading Strategies   By Peter Barr
Introduction To Uae & Mena Trading Strategies By Peter Barr
petebarr
 
Informe simulacion digital yolfred uzcategui - 25.242.800
Informe simulacion digital   yolfred uzcategui - 25.242.800Informe simulacion digital   yolfred uzcategui - 25.242.800
Informe simulacion digital yolfred uzcategui - 25.242.800
Yolfred Uzcategui
 
Data Quality Program Assessment
Data Quality Program AssessmentData Quality Program Assessment
Data Quality Program Assessment
Joaquin Marques
 
Sauron: DIY home security with Ruby!
Sauron: DIY home security with Ruby!Sauron: DIY home security with Ruby!
Sauron: DIY home security with Ruby!
1337807
 

Similar to Ashish thusoo evolution of big data architectures (20)

Speech Reognition Using FPGA Technology
Speech Reognition Using FPGA TechnologySpeech Reognition Using FPGA Technology
Speech Reognition Using FPGA Technology
 
Sketch sort ochadai20101015-public
Sketch sort ochadai20101015-publicSketch sort ochadai20101015-public
Sketch sort ochadai20101015-public
 
Commerce Data Usability Project
Commerce Data Usability ProjectCommerce Data Usability Project
Commerce Data Usability Project
 
9. lenguaje binario
9. lenguaje binario9. lenguaje binario
9. lenguaje binario
 
9. lenguaje binario
9. lenguaje binario9. lenguaje binario
9. lenguaje binario
 
Safe Data is Happy Data
Safe Data is Happy DataSafe Data is Happy Data
Safe Data is Happy Data
 
Digitizing Your Publishing Practice for the Museum Publishing Seminar 2012
Digitizing Your Publishing Practice for the Museum Publishing Seminar 2012Digitizing Your Publishing Practice for the Museum Publishing Seminar 2012
Digitizing Your Publishing Practice for the Museum Publishing Seminar 2012
 
побудова та організація комп'ютерних мереж
побудова та організація комп'ютерних мережпобудова та організація комп'ютерних мереж
побудова та організація комп'ютерних мереж
 
Strukt web site
Strukt web siteStrukt web site
Strukt web site
 
Finpro be inspired Ideo
Finpro be inspired IdeoFinpro be inspired Ideo
Finpro be inspired Ideo
 
Big Data Will Change Our World
Big Data Will Change Our WorldBig Data Will Change Our World
Big Data Will Change Our World
 
DX2000 from NEC lets you put big data to work - Infographic
DX2000 from NEC lets you put big data to work - InfographicDX2000 from NEC lets you put big data to work - Infographic
DX2000 from NEC lets you put big data to work - Infographic
 
A4 drive dev_ops_agility_and_operational_efficiency
A4 drive dev_ops_agility_and_operational_efficiencyA4 drive dev_ops_agility_and_operational_efficiency
A4 drive dev_ops_agility_and_operational_efficiency
 
[Infographic] Empower Your Business With Digital Business Transformation
[Infographic] Empower Your Business With Digital Business Transformation[Infographic] Empower Your Business With Digital Business Transformation
[Infographic] Empower Your Business With Digital Business Transformation
 
Operation Blackjack Decoded By Glp
Operation Blackjack Decoded By GlpOperation Blackjack Decoded By Glp
Operation Blackjack Decoded By Glp
 
Sketch sort sugiyamalab-20101026 - public
Sketch sort sugiyamalab-20101026 - publicSketch sort sugiyamalab-20101026 - public
Sketch sort sugiyamalab-20101026 - public
 
Introduction To Uae & Mena Trading Strategies By Peter Barr
Introduction To Uae & Mena Trading Strategies   By Peter BarrIntroduction To Uae & Mena Trading Strategies   By Peter Barr
Introduction To Uae & Mena Trading Strategies By Peter Barr
 
Informe simulacion digital yolfred uzcategui - 25.242.800
Informe simulacion digital   yolfred uzcategui - 25.242.800Informe simulacion digital   yolfred uzcategui - 25.242.800
Informe simulacion digital yolfred uzcategui - 25.242.800
 
Data Quality Program Assessment
Data Quality Program AssessmentData Quality Program Assessment
Data Quality Program Assessment
 
Sauron: DIY home security with Ruby!
Sauron: DIY home security with Ruby!Sauron: DIY home security with Ruby!
Sauron: DIY home security with Ruby!
 

More from drewz lin

Web security-–-everything-we-know-is-wrong-eoin-keary
Web security-–-everything-we-know-is-wrong-eoin-kearyWeb security-–-everything-we-know-is-wrong-eoin-keary
Web security-–-everything-we-know-is-wrong-eoin-keary
drewz lin
 
Via forensics appsecusa-nov-2013
Via forensics appsecusa-nov-2013Via forensics appsecusa-nov-2013
Via forensics appsecusa-nov-2013
drewz lin
 
Phu appsec13
Phu appsec13Phu appsec13
Phu appsec13
drewz lin
 
Owasp2013 johannesullrich
Owasp2013 johannesullrichOwasp2013 johannesullrich
Owasp2013 johannesullrich
drewz lin
 
Owasp advanced mobile-application-code-review-techniques-v0.2
Owasp advanced mobile-application-code-review-techniques-v0.2Owasp advanced mobile-application-code-review-techniques-v0.2
Owasp advanced mobile-application-code-review-techniques-v0.2
drewz lin
 
I mas appsecusa-nov13-v2
I mas appsecusa-nov13-v2I mas appsecusa-nov13-v2
I mas appsecusa-nov13-v2
drewz lin
 
Defeating xss-and-xsrf-with-my faces-frameworks-steve-wolf
Defeating xss-and-xsrf-with-my faces-frameworks-steve-wolfDefeating xss-and-xsrf-with-my faces-frameworks-steve-wolf
Defeating xss-and-xsrf-with-my faces-frameworks-steve-wolf
drewz lin
 
Csrf not-all-defenses-are-created-equal
Csrf not-all-defenses-are-created-equalCsrf not-all-defenses-are-created-equal
Csrf not-all-defenses-are-created-equal
drewz lin
 
Chuck willis-owaspbwa-beyond-1.0-app secusa-2013-11-21
Chuck willis-owaspbwa-beyond-1.0-app secusa-2013-11-21Chuck willis-owaspbwa-beyond-1.0-app secusa-2013-11-21
Chuck willis-owaspbwa-beyond-1.0-app secusa-2013-11-21
drewz lin
 
Appsec usa roberthansen
Appsec usa roberthansenAppsec usa roberthansen
Appsec usa roberthansen
drewz lin
 
Appsec usa2013 js_libinsecurity_stefanodipaola
Appsec usa2013 js_libinsecurity_stefanodipaolaAppsec usa2013 js_libinsecurity_stefanodipaola
Appsec usa2013 js_libinsecurity_stefanodipaola
drewz lin
 
Appsec2013 presentation-dickson final-with_all_final_edits
Appsec2013 presentation-dickson final-with_all_final_editsAppsec2013 presentation-dickson final-with_all_final_edits
Appsec2013 presentation-dickson final-with_all_final_edits
drewz lin
 
Appsec2013 presentation
Appsec2013 presentationAppsec2013 presentation
Appsec2013 presentation
drewz lin
 
Appsec 2013-krehel-ondrej-forensic-investigations-of-web-exploitations
Appsec 2013-krehel-ondrej-forensic-investigations-of-web-exploitationsAppsec 2013-krehel-ondrej-forensic-investigations-of-web-exploitations
Appsec 2013-krehel-ondrej-forensic-investigations-of-web-exploitations
drewz lin
 
Appsec2013 assurance tagging-robert martin
Appsec2013 assurance tagging-robert martinAppsec2013 assurance tagging-robert martin
Appsec2013 assurance tagging-robert martin
drewz lin
 
Amol scadaowasp
Amol scadaowaspAmol scadaowasp
Amol scadaowasp
drewz lin
 
Agile sdlc-v1.1-owasp-app sec-usa
Agile sdlc-v1.1-owasp-app sec-usaAgile sdlc-v1.1-owasp-app sec-usa
Agile sdlc-v1.1-owasp-app sec-usa
drewz lin
 
Vulnex app secusa2013
Vulnex app secusa2013Vulnex app secusa2013
Vulnex app secusa2013
drewz lin
 
基于虚拟化技术的分布式软件测试框架
基于虚拟化技术的分布式软件测试框架基于虚拟化技术的分布式软件测试框架
基于虚拟化技术的分布式软件测试框架drewz lin
 
新浪微博稳定性经验谈
新浪微博稳定性经验谈新浪微博稳定性经验谈
新浪微博稳定性经验谈drewz lin
 

More from drewz lin (20)

Web security-–-everything-we-know-is-wrong-eoin-keary
Web security-–-everything-we-know-is-wrong-eoin-kearyWeb security-–-everything-we-know-is-wrong-eoin-keary
Web security-–-everything-we-know-is-wrong-eoin-keary
 
Via forensics appsecusa-nov-2013
Via forensics appsecusa-nov-2013Via forensics appsecusa-nov-2013
Via forensics appsecusa-nov-2013
 
Phu appsec13
Phu appsec13Phu appsec13
Phu appsec13
 
Owasp2013 johannesullrich
Owasp2013 johannesullrichOwasp2013 johannesullrich
Owasp2013 johannesullrich
 
Owasp advanced mobile-application-code-review-techniques-v0.2
Owasp advanced mobile-application-code-review-techniques-v0.2Owasp advanced mobile-application-code-review-techniques-v0.2
Owasp advanced mobile-application-code-review-techniques-v0.2
 
I mas appsecusa-nov13-v2
I mas appsecusa-nov13-v2I mas appsecusa-nov13-v2
I mas appsecusa-nov13-v2
 
Defeating xss-and-xsrf-with-my faces-frameworks-steve-wolf
Defeating xss-and-xsrf-with-my faces-frameworks-steve-wolfDefeating xss-and-xsrf-with-my faces-frameworks-steve-wolf
Defeating xss-and-xsrf-with-my faces-frameworks-steve-wolf
 
Csrf not-all-defenses-are-created-equal
Csrf not-all-defenses-are-created-equalCsrf not-all-defenses-are-created-equal
Csrf not-all-defenses-are-created-equal
 
Chuck willis-owaspbwa-beyond-1.0-app secusa-2013-11-21
Chuck willis-owaspbwa-beyond-1.0-app secusa-2013-11-21Chuck willis-owaspbwa-beyond-1.0-app secusa-2013-11-21
Chuck willis-owaspbwa-beyond-1.0-app secusa-2013-11-21
 
Appsec usa roberthansen
Appsec usa roberthansenAppsec usa roberthansen
Appsec usa roberthansen
 
Appsec usa2013 js_libinsecurity_stefanodipaola
Appsec usa2013 js_libinsecurity_stefanodipaolaAppsec usa2013 js_libinsecurity_stefanodipaola
Appsec usa2013 js_libinsecurity_stefanodipaola
 
Appsec2013 presentation-dickson final-with_all_final_edits
Appsec2013 presentation-dickson final-with_all_final_editsAppsec2013 presentation-dickson final-with_all_final_edits
Appsec2013 presentation-dickson final-with_all_final_edits
 
Appsec2013 presentation
Appsec2013 presentationAppsec2013 presentation
Appsec2013 presentation
 
Appsec 2013-krehel-ondrej-forensic-investigations-of-web-exploitations
Appsec 2013-krehel-ondrej-forensic-investigations-of-web-exploitationsAppsec 2013-krehel-ondrej-forensic-investigations-of-web-exploitations
Appsec 2013-krehel-ondrej-forensic-investigations-of-web-exploitations
 
Appsec2013 assurance tagging-robert martin
Appsec2013 assurance tagging-robert martinAppsec2013 assurance tagging-robert martin
Appsec2013 assurance tagging-robert martin
 
Amol scadaowasp
Amol scadaowaspAmol scadaowasp
Amol scadaowasp
 
Agile sdlc-v1.1-owasp-app sec-usa
Agile sdlc-v1.1-owasp-app sec-usaAgile sdlc-v1.1-owasp-app sec-usa
Agile sdlc-v1.1-owasp-app sec-usa
 
Vulnex app secusa2013
Vulnex app secusa2013Vulnex app secusa2013
Vulnex app secusa2013
 
基于虚拟化技术的分布式软件测试框架
基于虚拟化技术的分布式软件测试框架基于虚拟化技术的分布式软件测试框架
基于虚拟化技术的分布式软件测试框架
 
新浪微博稳定性经验谈
新浪微博稳定性经验谈新浪微博稳定性经验谈
新浪微博稳定性经验谈
 

Recently uploaded

Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization


Ashish Thusoo: Evolution of Big Data Architectures

  • 1. Evolution of Big Data Architectures. Architecture Summit, Aug 2012. Ashish Thusoo
  • 2. Outline: Demand for Big Data; Architectural Trade Offs and Evolution; Where next?
  • 3. The Changing Planet. 3 Technology Drivers: Devices; Infrastructure; Applications
  • 5. Evolution: Devices. Key Capabilities: Connected; Location Aware; Sensory & Powerful
  • 7. Mobile Subscription Density 2004. Evolution: Connectivity
  • 8. Mobile Subscription Density 2010. Evolution: Connectivity
  • 10. Evolution: Applications. Salient Traits: Cloud based; Web scale
  • 11. Explosion in Data. Big Data: Volume; Velocity; Variety
  • 12. Big Data: Volume. 2011: 1.8 zettabytes of digital universe; 2009 - 2020: 35 zettabytes
  • 13. Big Data: Velocity. 340 million tweets per day; 72 hours of video uploaded every minute on YouTube; 2.9 million emails a second
  • 14. Big Data: Variety. Video; Pictures; Application Logs; etc.
  • 16. Disruptions in Data Arch. Change in Focus (1990s -> 2000s): Performance -> Scalability & Availability; Rigid/Structured -> Flexible/Semistructured
  • 18. Towards Scalability. Problem: 10K ops/sec -> 1M ops/sec; TB of data -> PB of data
  • 19. Towards Scalability. Solution: SHARDING (Divide and Conquer). [Diagram: data blocks distributed across shards]
  • 20. Towards Scalability. How do we quickly route a record to a shard? fn( ): Consistent Hashing; Mapping Table. [Diagram: fn( ) applied to records to pick a shard]
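The routing function on slide 20 can be sketched as a small consistent-hash ring. This is an illustrative sketch, not the implementation behind the talk: the class name `ConsistentHashRing`, the shard names, the use of MD5 and the choice of 64 virtual nodes per shard are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Routes record keys to shards; adding a shard moves only some keys."""

    def __init__(self, shards, vnodes=64):
        self._ring = []  # sorted list of (position, shard) pairs
        self._vnodes = vnodes  # virtual nodes per shard (assumed value)
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard):
        # Each shard owns several positions so load spreads more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{shard}#{i}"), shard))

    def shard_for(self, key):
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


# Hypothetical shard names and keys, for illustration only.
ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.shard_for(k) for k in keys}

ring.add_shard("shard-d")
after = {k: ring.shard_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

After `shard-d` joins, the only keys whose shard changes are the ones that now land on `shard-d`; everything else stays put. The slide's other option, a mapping table, would replace the ring with an explicit lookup from key (or key range) to shard, trading the extra table for full control over placement.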
  • 21. Towards Scalability. What happens if part of the record is in one shard and part in another? [Diagram: data blocks distributed across shards]
  • 22. Towards Scalability. Keep it Simple: Application deals with atomicity & consistency semantics. [Diagram: data blocks distributed across shards]
  • 23. Towards Availability. What if my shard is down? Where do I put my record? [Diagram: shards, with one marked X]
  • 24. Towards Availability. Let's just replicate the shards and pray that one is available :) [Diagram: replicated shards, with one marked X]
  • 25. Towards Availability. Replication strategies: What should be the number of replicas? How to rebuild a replica? How to propagate a record to a replica?
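One possible set of answers to the three replication questions on slide 25 is sketched below. Everything here is an assumption chosen for brevity: three replicas, synchronous propagation to every live replica, and rebuilding by copying a live peer. The names `Replica` and `ReplicatedShard` are hypothetical, and real systems make different trade-offs.

```python
class Replica:
    """One copy of a shard's data, held in memory for this sketch."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class ReplicatedShard:
    def __init__(self, replica_count=3):
        # Number of replicas: assumed to be 3 here.
        self.replicas = [Replica(f"replica-{i}") for i in range(replica_count)]

    def put(self, key, value):
        """Propagate the record synchronously to every live replica."""
        live = [r for r in self.replicas if r.up]
        if not live:
            raise RuntimeError("no replica available")
        for replica in live:
            replica.data[key] = value
        return len(live)

    def get(self, key):
        """Read from the first live replica."""
        for replica in self.replicas:
            if replica.up:
                return replica.data.get(key)
        raise RuntimeError("no replica available")

    def rebuild(self, index):
        """Bring a replica back by copying the data of a live peer."""
        target = self.replicas[index]
        source = next(r for r in self.replicas if r.up and r is not target)
        target.data = dict(source.data)
        target.up = True


shard = ReplicatedShard()
written_all = shard.put("k1", "v1")      # all three replicas are up
shard.replicas[0].up = False             # simulate a failed replica
written_degraded = shard.put("k2", "v2") # only the two live replicas get k2
value = shard.get("k2")                  # served by a surviving replica
shard.rebuild(0)                         # replica-0 catches up on k2
```

The sketch ignores concurrent writers and network partitions, which is where the hard part of these questions lies.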
  • 26. 1990s vs 2000s. Different Focus: 1990s (Raw Performance): Optimal I/O structures; Cache Sensitive Algorithms. 2000s (Scalability, Availability): Sharding; Replication
  • 27. Flexibility/Semi-structure
  • 28. Towards Flexibility. Problem: Does structure in a database make it slower to write applications (sprint vs waterfall model)? What if my data is not records and tables?
  • 29. Towards Flexibility. How does knowing my record structure help my data system? Helps to optimize execution plans; helps to optimize my storage layouts. Trade off? Application change means database schema change, rebuilding indexes, etc.
  • 30. Towards Flexibility. Most of my operations are simple lookups, range lookups and updates. Since the execution is simple we don’t need all the structure. Keep enough structure to support fast gets and puts
  • 31. Towards Flexibility. Solution: Key-Value Stores (NoSQL): Sorted HashMaps; Sorted Files. [Diagram: table with KEY and VALUE columns]
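A minimal in-memory sketch of the key-value idea from slides 30 and 31: just enough structure for fast gets, puts and range lookups. The class `SortedKVStore` and the sample keys are hypothetical, and a sorted key list next to a dict is only one assumed way to realise a "sorted hashmap".

```python
import bisect


class SortedKVStore:
    def __init__(self):
        self._keys = []    # kept sorted so a range lookup is a slice
        self._values = {}  # key -> value for constant-time point lookups

    def put(self, key, value):
        # Insert new keys in sorted position; updates only touch the dict.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = SortedKVStore()
store.put("user:003", "carol")
store.put("user:001", "alice")
store.put("user:002", "bob")
store.put("user:002", "bobby")  # an update, not a new key

first_two = store.range("user:001", "user:003")
```

Because the store never interprets the value, the application is free to change what it stores without a schema change, which is the flexibility the slides argue for.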
  • 32. Towards Flexibility. Need to update related “values” of a key (Some Atomicity). [Diagram: table with KEY and VALUE columns]
  • 33. Towards Flexibility. Need to update related “values” of a key (Some Atomicity). TAG = COLUMN FAMILY. [Diagram: table with KEY, TAG and VALUE columns]
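The KEY / TAG / VALUE layout on slide 33 can be sketched as a store that groups several tagged values under one key and updates them together. This is an assumption-laden illustration: `ColumnFamilyStore` is a hypothetical name, and a single in-process lock stands in for whatever per-key atomicity mechanism a real column-family store provides.

```python
import threading
from collections import defaultdict


class ColumnFamilyStore:
    def __init__(self):
        self._rows = defaultdict(dict)  # key -> {tag: value}
        self._lock = threading.Lock()

    def put_row(self, key, updates):
        """Apply all tag updates for one key under the lock, so readers
        using get() never observe only part of the update."""
        with self._lock:
            self._rows[key].update(updates)

    def get(self, key, tag):
        with self._lock:
            # .get on the defaultdict avoids creating empty rows on reads.
            return self._rows.get(key, {}).get(tag)


store = ColumnFamilyStore()
store.put_row("user:42", {"profile": "Ada", "settings": "dark"})
store.put_row("user:42", {"settings": "light"})

settings = store.get("user:42", "settings")
profile = store.get("user:42", "profile")
missing = store.get("user:7", "profile")
```

Atomicity here covers related values under a single key only; updates that span keys are still left to the application, as on slide 22.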
  • 34. Towards Flexibility. Gets and puts are fine for online applications, BUT... what about Analytics? Transformations can be really complicated...
  • 35. Towards Flexibility. Is there a simple construct that can solve a number of analytics queries? Of course: SORT. And it can be parallelized too
  • 36. Towards Flexibility. MAP/REDUCE (Scalable Parallel Pluggable SORT). m: user defined map function; r: user defined reduce function. [Diagram: Mappers m{ } feeding Reducers r{ }]
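Slide 36 describes map/reduce as a pluggable sort. The single-process sketch below shows that shape: a user-defined map function, a sort that brings equal keys together, and a user-defined reduce function. The word-count job and the function names are assumptions for illustration; a real framework would run the mappers and reducers in parallel across machines.

```python
from itertools import groupby
from operator import itemgetter


def map_reduce(records, mapper, reducer):
    # Map phase: each record yields zero or more (key, value) pairs.
    pairs = [kv for record in records for kv in mapper(record)]
    # Shuffle phase: the sort that groups equal keys next to each other.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct key.
    return [
        reducer(key, [value for _, value in group])
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


def word_mapper(line):
    """m: emit (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, counts):
    """r: sum the counts collected for one word."""
    return word, sum(counts)


lines = ["big data big clusters", "data moves fast"]
result = map_reduce(lines, word_mapper, count_reducer)
# result == [("big", 2), ("clusters", 1), ("data", 2), ("fast", 1), ("moves", 1)]
```

Only `word_mapper` and `count_reducer` are specific to the job; swapping them changes the query while the sort in the middle stays the same.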
  • 37. Towards Flexibility. MAP/REDUCE and Failures. [Diagram: Mappers and Reducers, with a failure marked X]
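One way to picture failure handling in map/reduce is to keep the output of finished tasks and re-run only the task that failed. The sketch below assumes that model; the helper `run_with_retries`, the task ids and the simulated failure are all hypothetical, and the in-memory dict stands in for outputs that a real system would persist.

```python
def run_with_retries(tasks, run_task, max_attempts=3):
    """Run each task, retrying only the ones that fail."""
    completed = {}  # task id -> output; plays the role of a checkpoint
    attempts = {}   # task id -> number of attempts used
    for task_id, payload in tasks.items():
        for attempt in range(1, max_attempts + 1):
            attempts[task_id] = attempt
            try:
                completed[task_id] = run_task(task_id, payload, attempt)
                break
            except RuntimeError:
                continue  # re-run this task; finished tasks are untouched
        else:
            raise RuntimeError(f"task {task_id} failed {max_attempts} times")
    return completed, attempts


def flaky_mapper(task_id, payload, attempt):
    # Simulated failure: the worker for task "m2" dies on its first attempt.
    if task_id == "m2" and attempt == 1:
        raise RuntimeError("worker lost")
    return sorted(payload)


tasks = {"m1": [3, 1, 2], "m2": [9, 7, 8], "m3": [6, 5, 4]}
outputs, attempts = run_with_retries(tasks, flaky_mapper)
```

Only `m2` runs twice; `m1` and `m3` are not repeated, which is what makes restarting cheap compared with re-running the whole job.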
  • 38. 1990s vs 2000s. Different Focus: 1990s (Raw Performance): Structure important for speed optimizations; Stream everything through Query plan. 2000s (Sprint mode of application development): Support dev efficiency and data variety; Checkpointing for restartability
  • 40. The New Meets The Old. Disruption? Well, we still need SQL. We still need to make these work with other components. Guess what? Efficiency is also important at scale
  • 41. Where Does New Fail? Transactions? Moving money from one account to another. Graphs? Networks everywhere; how to do second order analysis on graphs