®




New Data Stack Workshop: Building a
Scalable Cloud Datacenter

Ping Li, Accel Partners
ping@accel.com

July 14, 2010
Stanford University








Delivering Cloud Computing

“Cloud Frame”                       Mainframe
• Elasticity                        Monitoring – Security (RACF)
• Multi-app/user                    Monitoring – Performance (Mainview)
• User-provisioned                  Provisioning & Configuration Management
• Portability                       Virtualization (z/VM)
Private/Public                      Resource Scheduler (z/VM & OS 370)
                                    Performance Acceleration & dedicated processors (OS 370)
                                    Backup and DR: Tivoli Storage Manager, Parallel Sysplex
                                    Clustering, failover, and mirroring (OS 370 & purpose built hw & microcode)

• Cloud data centers will share infrastructure layers common to mainframes but redelivered for cloud capabilities
• “New Data Stack” will form foundation for cloud computing
Accel Partners Confidential                                                                                     2



        Data Explosion

[Chart: Cloud Application Data (New Data Stack) vs. Business Transaction Data (Legacy Stack)]

• 2,500 exabytes of new information in 2012 with Internet/web as primary driver
• “Digital universe” grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes this year
Source: An IDC White Paper - sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May 2009.




     “New Data” Trends

[Chart: Data, 61% CAGR vs. Transistors, 42% CAGR]
Data is growing faster than processing power – leading to coping strategies like throwing away data or frequent archiving to tape

Circa 1975 – Transaction Data                  Circa 2010 – Cloud Data
2,000 users = Huge                             2,000 users = Tiny
Smaller data sets (bytes)                      Extremely large data sets (petabytes)
Highly structured, relatively small            Unstructured, complex data blobs (images, voice, logs,
data records                                   video) – doesn’t fit nicely into rows/columns
Absolute consistency is the primary            Application responsiveness/scale trumps immediate
requirement – ACID transactions                consistency

Source: Gartner.
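The slide's point that data outgrows processing power is compound-growth arithmetic. The short Python sketch below applies the two CAGRs from the chart over a few horizons; it assumes both rates stay constant, which the slide does not claim.

```python
# Compound growth at the CAGRs shown on the slide: 61% for data, 42% for transistors.
# Illustrative arithmetic only; constant rates are an assumption.

def growth_factor(cagr: float, years: int) -> float:
    """Total growth multiple after `years` at a constant annual rate `cagr`."""
    return (1.0 + cagr) ** years

DATA_CAGR = 0.61
TRANSISTOR_CAGR = 0.42

for years in (1, 5, 10):
    data = growth_factor(DATA_CAGR, years)
    transistors = growth_factor(TRANSISTOR_CAGR, years)
    # The gap itself compounds at (1.61 / 1.42) per year.
    print(f"{years:>2} yr: data x{data:.1f}, transistors x{transistors:.1f}, "
          f"gap x{data / transistors:.2f}")
```

At these rates the data multiple after five years (about 10.8x) is roughly 1.9 times the transistor multiple (about 5.8x), and the gap keeps compounding.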




New Data Stack Technologies


  Legacy                                     Cloud

  Centralized/monolithic computing layer     Distributed computing layer (virtual machines, MapReduce, networked commodity servers)
  Computer networking limited                High speed networking is pervasive
  Relational databases                       Non-relational/“NoSQL” data stores
  FC SAN/NAS                                 Distributed file systems
  Disks/Tape (memory scarce/expensive)       Flash/SSD (high performance and abundant)
  Proprietary/closed vendors                 Open platforms
  Enterprise-scale                           Internet/cloud scale








             Agenda

             1:15 pm          NorthScale
                              Sharon Barr, Vice President Engineering
                              James Phillips, Founder, Chief Product Officer
                              Dustin Sallings, Chief Architect
                              Bob Wiederhold, President, CEO

             2:15 pm          Cloudera
                              Amr Awadallah, CTO/Co-Founder
                              Jeff Hammerbacher, Chief Scientist/Co-Founder

             3:15 pm          Facebook
                              Bobby Johnson, Director, Software Engineering
                              Mark Rabkin, Software Engineer

             4:15 pm          Fusion-io
                              Robert Wipfel, Fellow

             5:30 pm          Cocktails!


Elastic Data Management Software
for web applications and cloud computing environments
The opportunity.

“ Relational database technology has served us well for 40 years, and will likely continue to
  do so for the foreseeable future to support transactions requiring ACID guarantees. But a
  large, and increasingly dominant, class of software systems and data do not need those
  guarantees. Much of the data manipulated by Web applications have less strict
  transactional requirements but, for lack of a practical alternative, many IT teams continue
  to use relational technology, needlessly tolerating its cost and scalability limitations. For
  these applications and data, distributed key-value cache and database technologies such
  as NorthScale provide a promising alternative. ”
                                                       Carl Olofson
                                                       Research Vice President
                                                       Database Management Software Research
                                                       IDC
Modern interactive software architecture

                             To support more users …




                                … simply add more
                              commodity web servers
                               (or virtual machines)
                             behind a load balancer …



                              … but you must get a
                              bigger, more complex
                                 database server.



Application scales linearly, data hits a wall


                             Application Scales Out
                             Just add more commodity web servers




                             Database Scales Up
                             Get a bigger, more complex server




What’s driving the curves?

1. Transaction overhead.
   Same hardware, over an order of magnitude difference in supportable user base.
   RDBMS: 750 OPS          NorthScale: 15,000 OPS

2. Expensive hardware.
   More costly to start with, and the cost differential widens with growth.
   At 750 OPS:       RDBMS $7,500       NorthScale $2,500      (3x)
   At 15,000 OPS:    RDBMS $125,000     NorthScale $12,500     (10x)

3. Complex administration.
   RDBMS technology is extremely complex and expensive to administer.
   RDBMS: Schema committee, Shard if needed, Add new table(s), Re-normalize, Create indices,
   Update views, Tune performance, Insert and select.
   NorthScale: Set and get.
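The ratios on this slide can be rechecked from its own figures. The per-OPS costs printed below are derived numbers that do not appear on the slide.

```python
# Figures as read from the "What's driving the curves?" slide.
RDBMS_OPS, NORTHSCALE_OPS = 750, 15_000  # throughput on the same hardware

# Hardware cost in USD at each throughput level: ops -> (RDBMS, NorthScale)
HARDWARE_COST = {750: (7_500, 2_500), 15_000: (125_000, 12_500)}

print(f"Same hardware: {NORTHSCALE_OPS / RDBMS_OPS:.0f}x the throughput")

for ops, (rdbms, northscale) in HARDWARE_COST.items():
    # Cost per OPS is derived here; the slide gives only totals and ratios.
    print(f"{ops:>6} OPS: RDBMS ${rdbms / ops:.2f}/OPS vs "
          f"NorthScale ${northscale / ops:.2f}/OPS -> {rdbms / northscale:.0f}x")
```

The differential grows from 3x at 750 OPS to 10x at 15,000 OPS, which is the slide's "widens with growth" point.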
Billions in data management savings available




[Chart: relational database technology ideal vs. alternative database technology needed]

 RDBMS ideal for intended purpose, will continue to be appropriate for
 debit-credit data – costly overkill for most new data
 Relational database technology was an $18.8 billion market in 2007 (IDC)
Big leap from relational database to alternatives




  Where do I start? What data should I move first? Which alternative
  database technology will “win”? This looks really complicated.
NorthScale solution.


“ I can’t tell you how many email requests I’ve received
  from our developers asking for something that is as
  simple and fast as memcached, but that promises data
  durability. Cassandra is just far too complex and
  heavyweight and we won’t be doing any more deployments.
  NorthScale is definitely on to something here. ”

                               Director of Engineering
                               Leading Social Network
Before: Where you are today




  Relational database technology powers 99.999% of web applications.
Step 1: Cache relational data in memcached




                      NorthScale Memcached Servers




                      Relational Database




  Memcached is simple, fast and infinitely scalable. It is easy to adopt,
  and delivers immediate cost, performance and scalability benefits.
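Step 1 is commonly implemented as the cache-aside pattern: read from memcached, fall back to the database on a miss, and invalidate on writes. The slide does not prescribe a pattern; this is one common choice, sketched with a Python dict standing in for memcached and an in-memory SQLite database standing in for the RDBMS (both assumptions, as are the table and function names).

```python
import sqlite3
from typing import Optional

# Stand-ins (assumptions): a dict plays the memcached tier and an in-memory
# SQLite database plays the relational database.
cache: dict[str, str] = {}
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES ('42', 'alice')")
db.commit()

def get_user_name(user_id: str) -> Optional[str]:
    key = f"user:{user_id}"
    if key in cache:                     # cache hit: the database is not touched
        return cache[key]
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return None
    cache[key] = row[0]                  # cache miss: read the database, then populate the cache
    return row[0]

def update_user_name(user_id: str, name: str) -> None:
    db.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
    db.commit()
    cache.pop(f"user:{user_id}", None)   # invalidate so the next read fetches the new value

print(get_user_name("42"))  # alice (miss, now cached)
update_user_name("42", "bob")
print(get_user_name("42"))  # bob
```

In this pattern the relational database remains the system of record, so the cache can be emptied at any time without losing data.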
Step 2: Gradually migrate data to membase




       NorthScale Memcached Servers   NorthScale Membase Servers




       Relational Database




After: Elastic compute and data layers
Data layer now scales with linear cost and constant performance.


                                        Application Scales Out
                                        Just add more commodity web servers




                                        Database Scales Out
                                        Just add more commodity data servers




         Scaling out flattens the cost and performance curves.
An evolutionary path toward elastic data




NorthScale Membase Server
Membase is an elastic key-value database




           Application user




        Web application server




        Membase data servers




     In the data center          On the administrator console



Membase is Simple, Fast, Elastic

                Five minutes or less to a working
                cluster
                • Downloads for Linux and Windows
                • Start with a single node
                • One button press joins nodes to a cluster
                Easy to develop against
                • Just SET and GET – no schema required
                • Drop it in. 10,000+ existing applications
                  already “speak membase” (via memcached)
                • Practically every language and application
                  framework is supported, out of the box
                Easy to manage
                • One-click failover and cluster rebalancing
                • Graphical and programmatic interfaces
                • Configurable alerting
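Because existing applications "speak membase" via memcached, a SET or GET on the wire is just a memcached command. The sketch below builds and parses messages in memcached's ASCII (text) protocol without any networking; the helper names are hypothetical, and the binary protocol is not shown.

```python
# Build and parse memcached ASCII-protocol messages (no network I/O).
# Keys must not contain spaces or control characters in this protocol.

def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """set <key> <flags> <exptime> <bytes>\r\n<data>\r\n"""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode()
    return header + value + b"\r\n"

def build_get(key: str) -> bytes:
    """get <key>\r\n"""
    return f"get {key}\r\n".encode()

def parse_get_reply(reply: bytes) -> dict[str, bytes]:
    """Parse 'VALUE <key> <flags> <bytes>\r\n<data>\r\n' blocks up to 'END\r\n'."""
    items: dict[str, bytes] = {}
    pos = 0
    while True:
        eol = reply.index(b"\r\n", pos)
        line = reply[pos:eol]
        if line == b"END":
            return items
        _, key, _flags, length = line.split()
        start = eol + 2
        end = start + int(length)        # slice by length, so data may contain \r\n
        items[key.decode()] = reply[start:end]
        pos = end + 2                    # skip the data block's trailing \r\n

print(build_set("user:42", b"alice"))  # b'set user:42 0 0 5\r\nalice\r\n'
print(parse_get_reply(b"VALUE user:42 0 5\r\nalice\r\nEND\r\n"))  # {'user:42': b'alice'}
```

No schema is involved at any point: the server sees only a key, a few numeric fields and an opaque byte string.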

Membase is Simple, Fast, Elastic

                Predictable
                • “Never keep an application waiting”
                • Quasi-deterministic latency and throughput
                Low latency
                • Auto-migration of hot data to lowest latency
                  storage technology (RAM, SSD, Disk)
                • Selectable write behavior – asynchronous,
                  synchronous (on replication, persistence)
                • Back-channel rebalancing [FUTURE]
                High throughput
                •   Multi-threaded
                •   Low lock contention
                •   Asynchronous wherever possible
                •   Automatic write de-duplication
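"Automatic write de-duplication" is listed without detail. One plausible reading, sketched below purely as an assumption, is a persistence queue in which repeated SETs of a key that has not yet been flushed collapse into a single disk write carrying the latest value. The class and method names are hypothetical.

```python
from collections import OrderedDict

class DedupWriteQueue:
    """Hypothetical persistence queue: a key that is SET repeatedly before the
    next flush is written to disk once, with its latest value."""

    def __init__(self) -> None:
        # key -> latest value; insertion order = order of each key's first pending write
        self._pending: OrderedDict[str, bytes] = OrderedDict()

    def enqueue(self, key: str, value: bytes) -> None:
        # Assigning to an existing key replaces the queued value in place,
        # so no second disk write is queued for the same key.
        self._pending[key] = value

    def flush(self, disk: dict[str, bytes]) -> int:
        """Drain the queue into `disk` (a dict standing in for storage).
        Returns the number of writes actually performed."""
        writes = 0
        while self._pending:
            key, value = self._pending.popitem(last=False)  # oldest pending key first
            disk[key] = value
            writes += 1
        return writes


queue = DedupWriteQueue()
for i in range(1000):
    queue.enqueue("counter", str(i).encode())  # 1,000 SETs of one hot key
queue.enqueue("profile:7", b"{}")
disk: dict[str, bytes] = {}
print(queue.flush(disk), disk["counter"])  # 2 b'999'
```

For a hot key such as a game counter, this turns a thousand SETs into one disk write per flush.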
Membase is Simple, Fast, Elastic

                Scale out
                • Spread I/O and data across commodity
                  servers (or VMs)
                • Consistent performance with linear cost
                • Dynamic rebalancing of a live cluster
                All nodes are created equal
                • No special case nodes
                • Clone to grow
                Extensible
                • Filtered TAP interface provides hook points
                  for external systems (e.g. full-text search,
                  backup, warehouse)
                • Data bucket – engine API for specialized
                  container types
                • Membase NodeCode [FUTURE]

vBucket mapping

Key --(hash function)--> vBucket --(table lookup)--> Servers

All possible membase keys (Key1, Key2, ..., Keym) are hashed to vBuckets; each vBucket has a host server and replica servers.

vBucket       Host Server / Replica Servers
vBucket1      Server1 / Server2, Server3
vBucket2      Server1 / Server2, Server3
vBucket3      Server2 / Server3, Server4
...
vBucketn      Serverp / Serverq, Serverr

vBucket-Server Map - Example

vBucket       Host Server / Replica Servers
vBucket1      ServerA / ServerB, ServerC
vBucket2      ServerA / ServerB, ServerC
vBucket3      ServerB / ServerA, ServerC
vBucket4      ServerB / ServerA, ServerC
vBucket5      ServerC / ServerA, ServerB
vBucket6      ServerC / ServerA, ServerB
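The two-step lookup on this slide (hash the key to a vBucket, then look the vBucket up in a table) can be sketched as follows. CRC32 as the hash function, the count of six vBuckets, and the ServerD rebalance are assumptions for illustration; the slide names none of them.

```python
import zlib

# Six vBuckets to match the example map; the real count is a deployment setting (assumption).
NUM_VBUCKETS = 6

# vBucket id -> (host server, replica servers), copied from the example map.
VBUCKET_MAP: dict[int, tuple[str, list[str]]] = {
    1: ("ServerA", ["ServerB", "ServerC"]),
    2: ("ServerA", ["ServerB", "ServerC"]),
    3: ("ServerB", ["ServerA", "ServerC"]),
    4: ("ServerB", ["ServerA", "ServerC"]),
    5: ("ServerC", ["ServerA", "ServerB"]),
    6: ("ServerC", ["ServerA", "ServerB"]),
}

def vbucket_for_key(key: str) -> int:
    """Step 1 (hash function): key -> vBucket id in 1..NUM_VBUCKETS.
    CRC32 is an assumed choice; the slide only says 'hash function'."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS + 1

def servers_for_key(key: str,
                    vbucket_map: dict[int, tuple[str, list[str]]]) -> tuple[str, list[str]]:
    """Step 2 (table lookup): vBucket id -> (host server, replica servers)."""
    return vbucket_map[vbucket_for_key(key)]

# Hypothetical rebalance: a new ServerD takes over vBucket6. Only the table
# changes; vbucket_for_key() is untouched, so no key is rehashed.
REBALANCED_MAP = {**VBUCKET_MAP, 6: ("ServerD", ["ServerA", "ServerB"])}

for key in ("Key1", "Key2", "Key3"):
    print(key, "-> vBucket", vbucket_for_key(key), "->",
          servers_for_key(key, VBUCKET_MAP),
          "| after rebalance:", servers_for_key(key, REBALANCED_MAP))
```

The indirection through a fixed set of vBuckets means a live cluster can be rebalanced by editing the map and moving whole vBuckets, rather than rehashing individual keys.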
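Deployment options

OTC Memcached Server: application logic uses an OTC memcached client with a server list; data operations go to memcached on port 11211.

Deployment Option 1 – Membase Server, embedded proxy: application logic uses an OTC memcached client with a server list; data operations go to the proxy on each Membase Server (port 11211), which uses the vbucket map to route them to port 11210.

Deployment Option 2 – Membase Server, standalone proxy: application logic uses an OTC memcached client that talks to a proxy on localhost; the proxy holds the vbucket map and routes data operations to the Membase Servers.

Deployment Option 3 – Membase Server, “vBucket-aware” client: application logic uses a NEW memcached client that holds the vbucket map itself and sends data operations directly to the Membase Servers.

In all three Membase options, each server exposes ports 11211 and 11210, and a cluster operations path runs alongside the data operations path.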
Membase “write” data flow – application view

1. User action results in the need to change the VALUE of KEY.
2. Application updates KEY’s VALUE, performs SET operation.
3. Membase (memcached) client hashes KEY, identifies KEY’s master server.
4. SET request sent over network to master server.
5. Membase replicates KEY-VALUE pair, caches it in memory and stores it to disk.
Membase data flow – under the hood

Three servers take part: Replica Server 1 for KEY, the master server for KEY, and Replica Server 2 for KEY. Each has a Listener-Sender, RAM*, the membase storage engine, SSD and disk.

1. SET request arrives at KEY’s master server.
2. The master’s Listener-Sender passes the request to the Listener-Sender on each replica server.
3. The membase storage engine writes the data to SSD.
4. The data is written to disk.
5. SET acknowledgement returned to application.
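The numbered flow above, combined with the earlier "selectable write behavior" bullet (acknowledge asynchronously, or synchronously on replication or on persistence), can be illustrated with a toy model. Everything here, including the `AckOn` names, the `Node` class and the single write path, is hypothetical; it shows the ordering choices, not Membase's storage engine.

```python
from enum import Enum

class AckOn(Enum):
    """Hypothetical names for the slide's 'selectable write behavior'."""
    MEMORY = "memory"            # asynchronous: ack once the master has the value in RAM
    REPLICATION = "replication"  # synchronous on replication
    PERSISTENCE = "persistence"  # synchronous on persistence

class Node:
    """Toy server: a RAM cache plus a dict standing in for SSD/disk."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.ram: dict[str, bytes] = {}
        self.disk: dict[str, bytes] = {}

    def cache(self, key: str, value: bytes) -> None:
        self.ram[key] = value

    def persist(self, key: str) -> None:
        self.disk[key] = self.ram[key]

def handle_set(master: Node, replicas: list[Node], key: str, value: bytes,
               ack_on: AckOn) -> list[str]:
    """Apply a SET on KEY's master and return the stages finished before the ack.
    A real server would finish skipped stages asynchronously; this sketch
    simply leaves them undone."""
    done = []
    master.cache(key, value)          # step 1: request arrives, value cached in RAM
    done.append("cached")
    if ack_on is AckOn.MEMORY:
        return done
    for replica in replicas:          # step 2: forward to each replica
        replica.cache(key, value)
    done.append("replicated")
    if ack_on is AckOn.REPLICATION:
        return done
    master.persist(key)               # steps 3-4: write through to storage
    done.append("persisted")
    return done                       # step 5: acknowledgement

master, r1, r2 = Node("master"), Node("replica1"), Node("replica2")
print(handle_set(master, [r1, r2], "KEY", b"VALUE", AckOn.REPLICATION))  # ['cached', 'replicated']
```

Acknowledging earlier lowers write latency; acknowledging later narrows the window in which a node failure could lose the write.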
Membase Architecture

Data Manager
  moxi – port 11211 (memcapable 1.0)
  memcached – port 11210 (memcapable 2.0)
    protocol listener/sender
    engine interface
    membase storage engine

Cluster Manager (Erlang/OTP)
  REST management API/Web UI – HTTP, port 8080
  On each node: Heartbeat, Process monitor, Configuration manager, Global singleton supervisor
  One per cluster: Rebalance orchestrator, Node health monitor, vBucket state and replication manager
  Ports: erlang port mapper 4369; distributed erlang 21100 – 21199
Data buckets are secure membase “slices”



                Application user




             Web application server




 Bucket 1
 Bucket 2
     Aggregate Cluster Memory and Disk Capacity




              Membase data servers




        In the data center                        On the administrator console
NorthScale in production

Leading cloud service (PAAS) provider
   Over 65,000 hosted applications
   NorthScale Memcached Server serving over 1,200 Heroku customers (as of June 10, 2010)

Social game leader – FarmVille, Mafia Wars, Café World
   Over 230 million monthly users
   NorthScale Membase Server is the 500,000 ops-per-second database behind FarmVille and Café World
Wednesday, July 14, 2010
Evolving a New Analytical Platform
         What Works and What’s Missing


         Jeff Hammerbacher
         Chief Scientist, Cloudera
         July 14, 2010



My Background
         Thanks for Asking
         ▪   hammer@cloudera.com
         ▪   Studied Mathematics at Harvard
         ▪   Worked as a Quant on Wall Street
         ▪   Conceived, built, and led Data team at Facebook
             ▪   Nearly 30 amazing engineers and data scientists
             ▪   Several open source projects and research papers
         ▪   Founder of Cloudera
             ▪   Chief Scientist
             ▪   Also, check out the book “Beautiful Data”

Presentation Outline
         ▪   1. Defining the Platform
             ▪   BI: Science for Profit
             ▪   Need tools for whole research cycle
             ▪   SQL Server 2008 R2: defining the platform
         ▪   2. State of the Platform Ecosystem
         ▪   3. Foundations for a New Implementation
             ▪   Hadoop
             ▪   Boiling the Frog
         ▪   4. Future Developments
         ▪   Questions and Discussion


1. Defining the Platform




BI is looking more like science (for profit)




Jim Gray: Science entering Fourth Paradigm
            “We have to do better at producing tools to
                 support the whole research cycle”




RDBMS only a small part of this tool set




Example: SQL Server 2008 R2




Collaboration: SharePoint
MDM: Master Data Services
CEP: StreamInsight
ETL: SQL Server Integration Services
RDBMS: SQL Server
Reporting: SQL Server Reporting Services
Analysis: SQL Server Analysis Services
Search: Full-Text Search
OLAP: PowerPivot


What do we call this unified suite?




For today: Analytical Data Platform
LAMP Stack for Analytical Data Management




2. The State of the Platform Ecosystem




Who makes up the platform ecosystem?




Content Providers
Infrastructure Providers
Platform Providers
Application Developers
End Users




What is new about the ecosystem today?




Content Providers
1. > 95% of enterprise data is unstructured
2. Data volumes growing rapidly




Infrastructure Providers
1. Cloud
2. Warehouse-Scale Computers




Platform Providers
1. Open source
2. Driven by consumer web properties




Application Developers
1. Data Scientists
2. Diversity of languages




End Users
1. Browser is the client
2. Tell a story about the business




3. Foundations for a New Implementation




New foundations: HDFS and MapReduce
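The MapReduce programming model named on this slide can be illustrated in a single process with the classic word count: a map step emits (key, value) pairs, a shuffle groups the values by key, and a reduce step folds each group into one result. This is a minimal sketch of the model only, not Hadoop's Java API; the function names (`map_words`, `shuffle`, `reduce_counts`, `run_job`) are hypothetical, and the distribution across machines and HDFS storage that Hadoop adds are deliberately left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every whitespace-separated, lower-cased token."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group every emitted value under its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: fold all values for one key into a single count."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on one machine (no distribution)."""
    mapped = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_job(docs))
```

Running it prints `{'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}`. Because the map and reduce functions only see one record or one key group at a time, a framework is free to run many copies of them in parallel on different machines, which is the property the model relies on.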




2005: Doug/Mike start project inside Nutch




2006: Doug joins Yahoo!




2007: Make Hadoop scale
  •  Yahoo! makes Pig open source
  •  Jim Gray’s “Fourth Paradigm” lecture
  •  Randy Bryant’s “DISC” lecture
  •  Powerset makes HBase open source




2008: Make Hadoop fast
  •  Yahoo! wins Daytona terabyte sort benchmark
  •  First Hadoop Summit
  •  Yahoo! builds production webmap with Hadoop
  •  Facebook makes Hive open source
  •  “MapReduce: A Major Step Backwards”




2009: Insert Hadoop into the enterprise
  •  Cloudera releases CDH
  •  First Hadoop World NYC
  •  Yahoo! sorts a petabyte with Hadoop
  •  Cloudera adds training, support, services
  •  “The Unreasonable Effectiveness of Data”




2010: Integrate Hadoop into the enterprise
  •  IBM announces InfoSphere BigInsights
  •  Yahoo! completes enterprise-class security
  •  Datameer and Karmasphere funded
  •  Quest, Talend, Netezza, and more integrate
  •  Hive adds JDBC and ODBC




Hadoop will be an Analytical Data Platform




4. Future Developments




Capture: Log collection and CEP




Curate: Workflow and Scheduling




Curate: Secondary and Full-Text Indexing




Curate: Learn Structure from Data




Analyze: Mesos-enabled frameworks




Analyze: Link working set and historical data




All behind a single user interface




HUE
Making Many Computers Feel Like One




Cloudera's Distribution for Hadoop, version 3, is the enterprise open source platform for complex data
  •  Integrated: all components & functions interoperate with one another through standard APIs
  •  Simplified: Cloudera manages required component versions & dependencies
  •  Open source: 100% Apache licensed
  •  Reliable: patched with fixes from future releases to improve stability
  •  Supported: Cloudera employs [illegible]% of the project founders and at least one committer for [illegible]% of these open source components.


(c) 2010 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved. 1.0




ioMemory for Scale-out

             Robert Wipfel, Fellow
             rwipfel@fusionio.com


14th July, 2010, Accel Partners Panel Discussion
Factors impacting Scale-out

  •  Balance: CPU, Disk, Network
  •  Energy: Servers, RAM, Disks
  •  Contention: Sharing, Locking
  •  Management and Monitoring
  •  Graceful Recovery: No SPOFs, Fast Replay
  •  Throughput: IOPS, Bandwidth
  •  Latency: Distributed, Dependencies
What’s *really* Needed…

  DRAM
  •  Want: Really fast
  •  Don’t Want: Volatile, Expensive, Limited capacity

  Disk
  •  Want: Non-volatile, Cheap, Large capacity
  •  Don’t Want: Really slow

  Need
  •  Want: Non-volatile, Really fast, Large capacity, Reasonable price, Low energy
Solution: ioMemory




  A disruption called ioMemory
  •  High speed like DRAM
  •  Persistence and capacity of disks



  PCIe-based NAND Flash Storage
  •  Very high IOPS
  •  Micro-second latency
  •  Very high data throughput
Why is it called ioMemory?




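  [Figure: Access delay in time, on a scale from nanoseconds (10^-9 s) through microseconds (a 50 µs mark, 10^-6 s) to milliseconds (10^-3 s). CPU caches (L1, L2, L3) and DRAM sit at the nanosecond end, ioMemory sits in between, and SSDs and SAN, NAS, RAIDed DAS sit toward the millisecond end. The gaps between tiers are annotated as 3, 5, and 6 orders of magnitude.]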
ioMemory Performance



  [Figure: two bar charts comparing Fusion-io ioDrive configurations with RAID 0 SATA SSDs.
  Raw Storage Performance (H2benchw 3.6, interface bandwidth, MB/s): ioDrive 2x faster (Storage I/O).
  Application Performance (IOMeter Database Benchmark I/O, average throughput, MB/s): ioDrive 50x faster (Application I/O).
  Devices: Fusion-io ioDrive, flash, PCIe x4, as Maximum Write (24 GB), Improved Write (40 GB), and Maximum Capacity (80 GB); SSD SATA Vendor A, 3.0 Gbps 2.5-inch, RAID 0, 128 GB, SATA/300; SSD SATA Vendor B, 3.0 Gbps 2.5-inch, RAID 0, 64 GB, SATA/300; SSD SATA Vendor C, 32 GB, SATA/300.]
ioMemory Reliability



  •  Strong ECC
  •  Wear leveling
  •  Bad block re-mapping
  •  Data labeling
  •  Parity-protected pipelines
  •  Power cut protection
  •  PCI bus protection
  •  Flashback chip protection
  •  Checksums
  •  Poison bit
  •  MTBF = 2 Million Hours +
ioMemory is not a Solid State Disk



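  [Figure: I/O path comparison. SSD: Application → CPU → RAID Controller → multiple SSDs, a path with numbered steps up to 9, including sub-steps 3a/3b and 4a/4b for the individual SSDs. ioMemory: Application → CPU → ioMemory, a path of two steps.]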
ioMemory is Green


  Annual energy use (kWh/yr):
  •  Fusion-io ioDrive: 97
  •  SSD (ZeusIOPS): 3,013
  •  15,000 RPM FC HDD: 133,493
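To make the gap between the three bars concrete, the slide's annual figures (97, 3,013, and 133,493 kWh/yr) can be expressed as multiples of the ioDrive's consumption. This is plain arithmetic on the numbers shown; the slide does not state the workload or device count behind each bar, so the ratios inherit whatever assumptions the vendor comparison made. The dictionary keys and the `ratios_to_baseline` helper are illustrative names, not part of any product.

```python
from typing import Dict

# Annual energy use per storage option in kWh/yr, as given on the "ioMemory is Green" slide.
ENERGY_KWH_PER_YEAR: Dict[str, int] = {
    "Fusion-io ioDrive": 97,
    "SSD (ZeusIOPS)": 3_013,
    "15,000 RPM FC HDD": 133_493,
}


def ratios_to_baseline(energy: Dict[str, int], baseline: str) -> Dict[str, float]:
    """Return each option's energy use as a multiple of the baseline option's use."""
    base = energy[baseline]
    if base <= 0:
        raise ValueError("baseline energy must be positive")
    return {name: kwh / base for name, kwh in energy.items()}


if __name__ == "__main__":
    ratios = ratios_to_baseline(ENERGY_KWH_PER_YEAR, "Fusion-io ioDrive")
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:,.1f}x the ioDrive's annual energy")
```

With these inputs the SSD works out to about 31.1x and the 15,000 RPM drive to about 1,376.2x the ioDrive's annual energy.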
Case Study




  One of the world’s fastest growing Webmonsters
  •  Over 900% more database queries per second

  •  Dramatically improved server replication for most current data

  •  Over 800% improvement in disaster recovery backup time

  •  Cut server footprint, power costs, and IT overhead by 75%

  •  Full and immediate ROI on repurposed servers with

  •  Continued ROI on operational cost saving
Case Study




  Internet security company that protects over 1 billion inboxes

    •  5x improvement in
       •  Database replication performance
       •  Data intensive query response
       •  Analysis routines

    •  Eliminated 210 failure points from the system

    •  Implemented full system redundancy

    •  Dramatically lowered power and cooling expenses
Case Study
Disruption



By deploying ioMemory…   Cloudmark eliminated the need for this…
Other Customer Examples




       Does a 30 to 1 box reduction for their reliable messaging system


       HMO achieves a 200 HDD to 1 ioDrive reduction for their Data Warehouse


       Department of Defense takes NASTRAN from 3 days to 6 hours


       Stock exchange doubles the performance of their trading systems


       Shows a 35x performance increase of unstructured search at OracleWorld


       Demos Dynamics NAV can get a 4x performance improvement
ioMemory Products




  •  80 GB: 119,790 (4k read packet size); 89,549 (75/25 r/w mix 4k packet size)
  •  160 GB: 116,046 (4k read packet size); 93,199 (75/25 r/w mix 4k packet size)
  •  320 GB: 185,022 (4k read packet size); 129,699 (75/25 r/w mix 4k packet size)
  •  320 GB: 71,256 (4k read packet size); 67,659 (75/25 r/w mix 4k packet size)
  •  640 GB: 122,601 (4k read packet size); 121,008 (75/25 r/w mix 4k packet size)
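The product figures can be turned into rough data rates. This is a minimal sketch under two assumptions that the slide does not state: that the numbers are I/O operations per second (IOPS), and that a "4k packet" is 4,096 bytes, so throughput is simply IOPS × 4,096 bytes. The labels "(a)" and "(b)" used to tell the two 320 GB entries apart are hypothetical, as is the helper name `iops_to_mib_per_s`.

```python
from typing import Dict

# Assumption: a "4k packet" moves 4 KiB (4,096 bytes) per operation.
PACKET_BYTES = 4 * 1024
BYTES_PER_MIB = 1024 * 1024


def iops_to_mib_per_s(iops: int, packet_bytes: int = PACKET_BYTES) -> float:
    """Convert an operations-per-second figure to MiB/s for a fixed transfer size."""
    return iops * packet_bytes / BYTES_PER_MIB


# 4k read figures from the product slide; "(a)"/"(b)" are hypothetical labels.
READ_IOPS_4K: Dict[str, int] = {
    "80 GB": 119_790,
    "160 GB": 116_046,
    "320 GB (a)": 185_022,
    "320 GB (b)": 71_256,
    "640 GB": 122_601,
}

if __name__ == "__main__":
    for label, iops in READ_IOPS_4K.items():
        print(f"{label}: {iops:,} ops/s ≈ {iops_to_mib_per_s(iops):,.0f} MiB/s")
```

For example, 119,790 operations of 4,096 bytes each is 119,790 / 256 ≈ 468 MiB/s. The conversion says nothing about latency or queue depth, which is where the deck locates ioMemory's main advantage.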
OEM Partners




Confidential Information: Fusion-io
Questions?




THANK YOU

Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 

Recently uploaded (20)

Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 

New Data Stack Workshop: Building a Scalable Cloud Datacenter

  • 1. New Data Stack Workshop: Building a Scalable Cloud Datacenter. Ping Li, Accel Partners, ping@accel.com. July 14, 2010, Stanford University
  • 2. Delivering Cloud Computing. “Cloud Frame” (private/public): elasticity; multi-app/user; user-provisioned; portability. Mainframe: security monitoring (RACF); performance monitoring (Mainview); provisioning and configuration management; virtualization (z/VM); resource scheduler (z/VM and OS 370); performance acceleration and dedicated processors (OS 370); backup and DR (Tivoli Storage Manager, Parallel Sysplex); clustering, failover, and mirroring (OS 370 and purpose-built hardware and microcode). • Cloud data centers will share infrastructure layers common to mainframes, but redelivered for cloud capabilities. • The “New Data Stack” will form the foundation for cloud computing. Accel Partners Confidential
  • 3. Data Explosion. Diagram: Legacy Stack (business transaction data) vs. Cloud New Data Stack (application data). • 2,500 exabytes of new information in 2012, with the Internet/web as the primary driver • The “digital universe” grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes this year. Source: IDC white paper sponsored by EMC, “As the Economy Contracts, the Digital Universe Expands,” May 2009.
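The two digital-universe figures use different units. A minimal Python sketch that puts them on one scale, assuming decimal SI prefixes (1 EB = 1,000 PB, 1 ZB = 1,000 EB) and assuming the 800K petabytes is the starting point for the "1.2 zettabytes this year" projection:

```python
# Decimal (SI) storage prefixes, assumed here: 1 EB = 1,000 PB and 1 ZB = 1,000 EB.
PB_PER_EB = 1_000
EB_PER_ZB = 1_000


def pb_to_zb(petabytes: float) -> float:
    """Convert petabytes to zettabytes using decimal prefixes."""
    return petabytes / PB_PER_EB / EB_PER_ZB


current_zb = pb_to_zb(800_000)  # "800K petabytes"
projected_zb = 1.2              # "1.2 zettabytes this year"

# Year-over-year growth implied by going from the current size to the projection.
implied_growth = projected_zb / current_zb - 1

print(f"current size: {current_zb:.1f} ZB")
print(f"implied growth: {implied_growth:.0%}")
```

Under these assumptions, 800K PB is 0.8 ZB, so the projection implies about 50% growth for the year.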
  • 4. “New Data” Trends. Data (61% CAGR) is growing faster than data processing power (transistors, 42% CAGR), leading to coping strategies like throwing away data or frequently archiving to tape. Circa 1975, transaction data: 2,000 users = huge; smaller data sets (bytes); highly structured, relatively small data records; absolute consistency is the primary requirement (ACID transactions). Circa 2010, cloud data: 2,000 users = tiny; extremely large data sets (petabytes); unstructured, complex data blobs (images, voice, logs, video) that don't fit nicely into rows/columns; application responsiveness/scale trumps immediate consistency. Source: Gartner.
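Two compound growth rates diverge multiplicatively, not additively. A small sketch, taking the slide's 61% and 42% CAGRs as constant (an assumption), of how far data volume outruns processing power over time:

```python
DATA_CAGR = 0.61        # data growth per year (slide figure)
TRANSISTOR_CAGR = 0.42  # transistor / processing-power growth per year (slide figure)


def widening_gap(years: int) -> float:
    """Factor by which data volume has outgrown processing power after `years`."""
    return ((1 + DATA_CAGR) / (1 + TRANSISTOR_CAGR)) ** years


for years in (1, 5, 10):
    print(f"after {years:2d} years: data outgrows processing by {widening_gap(years):.2f}x")
```

The yearly ratio is only about 1.13, but compounded over a decade it reaches roughly 3.5x, which is why the slide describes coping strategies such as discarding or archiving data.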
  • 5. New Data Stack Technologies. Legacy → Cloud: centralized/monolithic computing layer → distributed computing layer (virtual machines, MapReduce, networked commodity servers); computer networking limited → high-speed networking is pervasive; relational databases → non-relational/“NoSQL” data stores; FC SAN/NAS → distributed file systems; disks/tape (memory scarce/expensive) → flash/SSD (high performance and abundant); proprietary/closed vendors → open platforms; enterprise scale → Internet/cloud scale.
  • 6. Agenda. 1:15 pm NorthScale: Sharon Barr, Vice President Engineering; James Phillips, Founder, Chief Product Officer; Dustin Sallings, Chief Architect; Bob Wiederhold, President, CEO. 2:15 pm Cloudera: Amr Awadallah, CTO/Co-Founder; Jeff Hammerbacher, Chief Scientist/Co-Founder. 3:15 pm Facebook: Bobby Johnson, Director, Software Engineering; Mark Rabkin, Software Engineer. 4:15 pm Fusion-io: Robert Wipfel, Fellow. 5:30 pm Cocktails!
  • 7. Elastic Data Management Software for web applications and cloud computing environments
  • 8. The opportunity. “Relational database technology has served us well for 40 years, and will likely continue to do so for the foreseeable future to support transactions requiring ACID guarantees. But a large, and increasingly dominant, class of software systems and data do not need those guarantees. Much of the data manipulated by Web applications have less strict transactional requirements but, for lack of a practical alternative, many IT teams continue to use relational technology, needlessly tolerating its cost and scalability limitations. For these applications and data, distributed key-value cache and database technologies such as NorthScale provide a promising alternative.” Carl Olofson, Research Vice President, Database Management Software Research, IDC
  • 9. Modern interactive software architecture. To support more users… simply add more commodity web servers (or virtual machines) behind a load balancer… but you must get a bigger, more complex database server.
  • 10. Application scales linearly, data hits a wall. Application scales out: just add more commodity web servers. Database scales up: get a bigger, more complex server.
  • 11. What’s driving the curves? 1. Transaction overhead: on the same hardware, RDBMS supports 750 OPS vs. 15,000 OPS for NorthScale, over an order of magnitude difference in supportable user base. 2. Expensive hardware: RDBMS is more costly to start with, and the cost differential widens with growth: at 750 OPS, $7,500 (RDBMS) vs. $2,500 (NorthScale), 3x; at 15,000 OPS, $125,000 vs. $12,500, 10x. 3. Complex administration: RDBMS technology is extremely complex and expensive to administer (schema committee, shard if needed, add new table(s), re-normalize, create indices, update views, tune performance, insert and select), whereas NorthScale is set and get.
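The cost figures can be normalized to dollars per operation/second of capacity. A minimal sketch using the numbers as read from the slide (the pairing of each price with a throughput level is an interpretation of the flattened layout):

```python
# (throughput in operations per second, hardware cost in USD), as read from the slide.
SCENARIOS = {
    "RDBMS @ 750 OPS": (750, 7_500),
    "NorthScale @ 750 OPS": (750, 2_500),
    "RDBMS @ 15,000 OPS": (15_000, 125_000),
    "NorthScale @ 15,000 OPS": (15_000, 12_500),
}


def cost_per_ops(ops: int, cost: int) -> float:
    """Hardware dollars spent per operation/second of capacity."""
    return cost / ops


for name, (ops, cost) in SCENARIOS.items():
    print(f"{name:26s} ${cost_per_ops(ops, cost):6.2f} per OPS")

# Cost multiples quoted on the slide.
small = SCENARIOS["RDBMS @ 750 OPS"][1] / SCENARIOS["NorthScale @ 750 OPS"][1]
large = SCENARIOS["RDBMS @ 15,000 OPS"][1] / SCENARIOS["NorthScale @ 15,000 OPS"][1]
print(f"cost multiple: {small:.0f}x at 750 OPS, {large:.0f}x at 15,000 OPS")
```

Per OPS, the RDBMS drops only from $10.00 to about $8.33 as it scales, while the scale-out tier drops from about $3.33 to about $0.83, which is how the 3x gap becomes 10x.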
  • 12. Billions in data management savings available. Relational database technology ideal vs. alternative database technology needed: the RDBMS is ideal for its intended purpose and will continue to be appropriate for debit-credit data, but it is costly overkill for most new data. Relational database technology was an $18.8 billion market in 2007 (IDC).
  • 13. Big leap from relational database to alternatives. Where do I start? What data should I move first? Which alternative database technology will “win”? This looks really complicated.
  • 14. NorthScale solution. “I can’t tell you how many email requests I’ve received from our developers asking for something that is as simple and fast as memcached, but that promises data durability. Cassandra is just far too complex and heavyweight and we won’t be doing any more deployments. NorthScale is definitely on to something here.” Director of Engineering, Leading Social Network
  • 15. Before: Where you are today. Relational database technology powers 99.999% of web applications.
  • 16. Step 1: Cache relational data in memcached. Diagram: NorthScale Memcached Servers in front of the relational database. Memcached is simple, fast and infinitely scalable. It is easy to adopt, and delivers immediate cost, performance and scalability benefits.
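Step 1 is commonly implemented as the cache-aside pattern: reads try the cache first and fall back to the database, and writes go to the database and invalidate the cached copy. A minimal sketch, assuming a dict-backed class as a stand-in for a real memcached client (`InMemoryCache`, `read_through`, and `update` are hypothetical names, not a NorthScale API):

```python
from typing import Callable, Dict, List, Optional


class InMemoryCache:
    """Hypothetical stand-in for a memcached client: get/set/delete on a dict."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def read_through(cache: InMemoryCache, key: str,
                 load_from_db: Callable[[str], str]) -> str:
    """Cache-aside read: serve from the cache, else load from the database and cache it."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # cache miss: query the relational database
        cache.set(key, value)      # later reads for this key come from memory
    return value


def update(cache: InMemoryCache, key: str, value: str,
           save_to_db: Callable[[str, str], None]) -> None:
    """Write to the database (still the system of record), then invalidate the cached copy."""
    save_to_db(key, value)
    cache.delete(key)


# Usage with a dict standing in for the relational database.
db = {"user:42": "alice"}
db_reads: List[str] = []


def load_from_db(key: str) -> str:
    db_reads.append(key)
    return db[key]


def save_to_db(key: str, value: str) -> None:
    db[key] = value


cache = InMemoryCache()
read_through(cache, "user:42", load_from_db)  # miss: reads the database
read_through(cache, "user:42", load_from_db)  # hit: served from the cache
update(cache, "user:42", "alice-v2", save_to_db)
print(read_through(cache, "user:42", load_from_db), len(db_reads))  # alice-v2 2
```

Deleting rather than overwriting the cached value on update keeps the database authoritative, which fits Step 1's premise that the relational database remains in place.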
  • 17. Step 2: Gradually migrate data to membase. Diagram: NorthScale Memcached Servers, NorthScale Membase Servers, and the relational database.
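The slide does not say how the gradual migration works. One common approach is lazy migration: each record is copied into the new store the first time it is read, and new writes go only to the new store. A sketch of that approach, with plain dicts standing in for the relational database and the membase cluster (`LazyMigrator` is a hypothetical name, not a NorthScale API):

```python
from typing import Dict, Optional


class LazyMigrator:
    """Moves records from a legacy store to a key-value store on first access."""

    def __init__(self, legacy: Dict[str, str], kv_store: Dict[str, str]) -> None:
        self.legacy = legacy  # stand-in for the relational database
        self.kv = kv_store    # stand-in for the membase cluster

    def get(self, key: str) -> Optional[str]:
        if key in self.kv:
            return self.kv[key]
        value = self.legacy.get(key)
        if value is not None:
            self.kv[key] = value  # copy forward so the next read skips the RDBMS
        return value

    def set(self, key: str, value: str) -> None:
        self.kv[key] = value          # new writes land only in the new store
        self.legacy.pop(key, None)    # retire the legacy copy to avoid stale reads


legacy = {"session:1": "alice", "session:2": "bob"}
kv: Dict[str, str] = {}
m = LazyMigrator(legacy, kv)
m.get("session:1")          # copied into the key-value store on first read
m.set("session:2", "bob-v2")  # written to the new store, removed from the old
print(sorted(kv))      # ['session:1', 'session:2']
print(sorted(legacy))  # ['session:1']
```

Whether to delete the legacy copy on read as well is a design choice; keeping it, as here, leaves a fallback until the migration is verified.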
  • 18. After: Elastic compute and data layers. The data layer now scales with linear cost and constant performance. Application scales out: just add more commodity web servers. Database scales out: just add more commodity data servers. Scaling out flattens the cost and performance curves.
  • 19. An evolutionary path toward elastic data
  • 21. Membase is an elastic key-value database. Diagram: application user, web application server, and Membase data servers, shown in the data center and on the administrator console.
  • 22. Membase is Simple, Fast, Elastic. Five minutes or less to a working cluster • Downloads for Linux and Windows • Start with a single node • One button press joins nodes to a cluster. Easy to develop against • Just SET and GET, no schema required • Drop it in: 10,000+ existing applications already “speak membase” (via memcached) • Practically every language and application framework is supported, out of the box. Easy to manage • One-click failover and cluster rebalancing • Graphical and programmatic interfaces • Configurable alerting
  • 23. Membase is Simple, Fast, Elastic. Predictable • “Never keep an application waiting” • Quasi-deterministic latency and throughput. Low latency • Auto-migration of hot data to the lowest-latency storage technology (RAM, SSD, disk) • Selectable write behavior: asynchronous, or synchronous (on replication, persistence) • Back-channel rebalancing [FUTURE]. High throughput • Multi-threaded • Low lock contention • Asynchronous wherever possible • Automatic write de-duplication
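"Automatic write de-duplication" means that if a key is updated several times before it reaches disk, only the latest value needs to be persisted. A minimal sketch of that idea, assuming a dict keyed by item key as the pending-write queue (this illustrates the concept and is not membase's actual implementation):

```python
from typing import Dict, List, Tuple


class DedupWriteQueue:
    """Coalesces repeated writes to the same key before they are persisted."""

    def __init__(self) -> None:
        # Dicts keep insertion order, and reassigning an existing key keeps its
        # original position, so each key appears once in flush order.
        self._pending: Dict[str, bytes] = {}

    def enqueue(self, key: str, value: bytes) -> None:
        self._pending[key] = value  # overwrite any value not yet written to disk

    def flush(self, disk: Dict[str, bytes]) -> List[Tuple[str, bytes]]:
        """Persist all pending writes and return what was written."""
        written = list(self._pending.items())
        for key, value in written:
            disk[key] = value
        self._pending.clear()
        return written


disk: Dict[str, bytes] = {}
q = DedupWriteQueue()
for score in (b"10", b"20", b"30"):
    q.enqueue("player:7:score", score)
q.enqueue("player:8:score", b"5")
writes = q.flush(disk)
print(len(writes))             # 2 disk writes instead of 4
print(disk["player:7:score"])  # b'30'
```

The saving grows with how "hot" a key is: a frequently updated counter may be written to RAM thousands of times between flushes but to disk only once.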
  • 24. Membase is Simple, Fast, Elastic. Scale out • Spread I/O and data across commodity servers (or VMs) • Consistent performance with linear cost • Dynamic rebalancing of a live cluster. All nodes are created equal • No special-case nodes • Clone to grow. Extensible • Filtered TAP interface provides hook points for external systems (e.g., full-text search, backup, warehouse) • Data bucket: engine API for specialized container types • Membase NodeCode [FUTURE]
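Rebalancing a live cluster is cheap when data is grouped into a fixed number of vBuckets: adding a server means reassigning some vBuckets, not rehashing every key. A greedy sketch under that assumption, which keeps each vBucket on its current server when possible and moves only the surplus (`rebalance` is a hypothetical helper, not membase's rebalance orchestrator):

```python
from collections import Counter
from typing import Dict, List


def rebalance(vb_map: Dict[int, str], servers: List[str]) -> Dict[int, str]:
    """Return a vBucket -> server map spread evenly over `servers`,
    moving as few vBuckets as this greedy pass allows."""
    base, extra = divmod(len(vb_map), len(servers))
    # Target count per server; the first `extra` servers take one more.
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}
    result: Dict[int, str] = {}
    load: Counter = Counter()
    orphans: List[int] = []
    # Keep a vBucket in place if its server is still in the cluster and has room.
    for vb, owner in sorted(vb_map.items()):
        if owner in target and load[owner] < target[owner]:
            result[vb] = owner
            load[owner] += 1
        else:
            orphans.append(vb)
    # Hand the remaining vBuckets to servers that are below target.
    for vb in orphans:
        dest = next(s for s in servers if load[s] < target[s])
        result[vb] = dest
        load[dest] += 1
    return result


# Usage: 12 vBuckets on three servers, then a fourth server joins.
old_map = {vb: ["A", "B", "C"][vb % 3] for vb in range(12)}
new_map = rebalance(old_map, ["A", "B", "C", "D"])
moved = [vb for vb in old_map if old_map[vb] != new_map[vb]]
print(f"moved {len(moved)} of {len(old_map)} vBuckets: {moved}")  # moved 3 of 12 vBuckets: [9, 10, 11]
print(Counter(new_map.values()))
```

Only a quarter of the vBuckets move, which matches the "clone to grow" idea: the new node takes its share and every other node keeps the rest.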
  • 25. vBucket mapping. Key → vBucket (hash function); vBucket → servers (table lookup). All possible membase keys (Key1 … Keym) hash onto vBuckets, and each vBucket maps to a host server plus replica servers. Example vBucket-server map: vBucket1 and vBucket2 → ServerA (replicas ServerB, ServerC); vBucket3 and vBucket4 → ServerB (replicas ServerA, ServerC); vBucket5 and vBucket6 → ServerC (replicas ServerA, ServerB). Alternative example: vBucket1 and vBucket2 → Server1 (replicas Server2, Server3); vBucket3 → Server2 (replicas Server3, Server4); in general, vBucketn → Serverp (replicas Serverq, Serverr).
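The two-step lookup can be sketched directly. This assumes CRC32 modulo the vBucket count as the hash function (an illustrative choice; the slide does not name the hash) and a small round-robin map in which each vBucket has one master and the remaining servers as replicas, mirroring the slide's example layout:

```python
import zlib
from typing import Dict, List, Tuple

NUM_VBUCKETS = 8  # illustrative only; production maps use many more vBuckets
SERVERS = ["ServerA", "ServerB", "ServerC"]


def build_map(num_vbuckets: int, servers: List[str]) -> Dict[int, Tuple[str, List[str]]]:
    """vBucket -> (master, replicas): round-robin masters, other servers as replicas."""
    vb_map = {}
    for vb in range(num_vbuckets):
        master = servers[vb % len(servers)]
        replicas = [s for s in servers if s != master]
        vb_map[vb] = (master, replicas)
    return vb_map


VBUCKET_MAP = build_map(NUM_VBUCKETS, SERVERS)


def key_to_vbucket(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Step 1 (hash function): the same key always lands in the same vBucket."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def locate(key: str) -> Tuple[int, str, List[str]]:
    """Step 2 (table lookup): find the master and replica servers for a key."""
    vb = key_to_vbucket(key)
    master, replicas = VBUCKET_MAP[vb]
    return vb, master, replicas


for key in ("Key1", "Key2", "Key3"):
    vb, master, replicas = locate(key)
    print(f"{key} -> vBucket{vb} -> {master} (replicas: {', '.join(replicas)})")
```

The indirection is the point of the design: the hash never changes, so moving a vBucket to another server only requires updating the table that clients or proxies consult.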
  • 26. Deployment options. In all three options, cluster operations run among the Membase Servers, and data operations reach each server's data port (11210). Option 1, embedded proxy: an OTC memcached client in the application sends data operations to port 11211 on a Membase Server, where a proxy holding the vBucket map forwards them. Option 2, standalone proxy: the OTC memcached client talks to a proxy on localhost (port 11211) that holds the vBucket map and forwards data operations to the Membase Servers. Option 3 (new), “vBucket-aware” client: the client holds the vBucket map itself and sends data operations directly to port 11210.
  • 27. Membase “write” data flow, application view. 1. A user action results in the need to change the VALUE of a KEY. 2. The application updates the key’s VALUE and performs a SET operation. 3. The Membase (memcached) client hashes the KEY and identifies the KEY’s master server. 4. The SET request is sent over the network to the master server. 5. Membase replicates the KEY-VALUE pair, caches it in memory and stores it to disk.
  • 28. Membase data flow, under the hood. 1. The SET request arrives at the KEY’s master server. 2. The master's listener-sender forwards it to replica server 1 and replica server 2 for the KEY. 3. The membase storage engine on each server holds the value in RAM. 4. The value is stored to SSD/disk. 5. The SET acknowledgement is returned to the application.
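The flow above, combined with the "selectable write behavior" listed on slide 23, can be simulated to show where the acknowledgement lands. A sketch with in-memory dicts for each server's RAM and disk; the `ack_after` levels are hypothetical labels for asynchronous, synchronous-on-replication, and synchronous-on-persistence writes. Events after "ack" would, in a real system, continue in the background:

```python
from dataclasses import dataclass, field
from typing import Dict, List

ACK_LEVELS = ("memory", "replication", "persistence")


@dataclass
class Node:
    """One server as modeled here: a RAM cache plus a disk."""
    name: str
    ram: Dict[str, str] = field(default_factory=dict)
    disk: Dict[str, str] = field(default_factory=dict)


def handle_set(master: Node, replicas: List[Node], key: str, value: str,
               ack_after: str = "memory") -> List[str]:
    """Simulate a SET at the key's master server; return the ordered events,
    including the point where the acknowledgement goes back to the application."""
    if ack_after not in ACK_LEVELS:
        raise ValueError(f"ack_after must be one of {ACK_LEVELS}")
    events: List[str] = []

    master.ram[key] = value  # cache in the master's RAM
    events.append(f"{master.name}: RAM")
    if ack_after == "memory":
        events.append("ack")

    for replica in replicas:  # listener-sender forwards to the replicas
        replica.ram[key] = value
        events.append(f"{replica.name}: RAM")
    if ack_after == "replication":
        events.append("ack")

    for node in [master, *replicas]:  # persist to SSD/disk
        node.disk[key] = node.ram[key]
        events.append(f"{node.name}: disk")
    if ack_after == "persistence":
        events.append("ack")
    return events


master = Node("master")
replicas = [Node("replica1"), Node("replica2")]
for level in ACK_LEVELS:
    print(level, handle_set(master, replicas, "user:42", "alice", ack_after=level))
```

An earlier acknowledgement means lower latency for the application but a wider window in which a failure could lose the write; the later levels trade latency for durability.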
  • 29. Membase Architecture. Ports: 11211 (moxi, memcapable 1.0) and 11210 (memcached protocol listener/sender, memcapable 2.0). Data Manager: memcached protocol listener/sender, engine interface, membase storage engine. Cluster Manager (Erlang/OTP; some components run on each node, others once per cluster): REST management API/Web UI (HTTP), vBucket state and replication manager, global singleton supervisor, rebalance orchestrator, configuration manager, node health monitor, process monitor, heartbeat. Cluster Manager ports: 8080 (HTTP), 4369 (Erlang port mapper), 21100–21199 (distributed Erlang).
  • 31. Data buckets are secure membase “slices”. Diagram: application user, web application server, and Bucket 1 and Bucket 2 drawing on the aggregate cluster memory and disk capacity of the Membase data servers (in the data center; on the administrator console).
  • 32. NorthScale in production. Heroku, a leading cloud service (PaaS) provider: over 65,000 hosted applications; the NorthScale Memcached Server is serving over 1,200 Heroku customers (as of June 10, 2010). Social game leader (FarmVille, Mafia Wars, Café World): over 230 million monthly users; the NorthScale Membase Server is the 500,000 ops-per-second database behind FarmVille and Café World.
  • 33.
  • 35. Evolving a New Analytical Platform What Works and What’s Missing Jeff Hammerbacher Chief Scientist, Cloudera July 14, 2010 Wednesday, July 14, 2010
  • 36. My Background (Thanks for Asking) ▪ hammer@cloudera.com ▪ Studied Mathematics at Harvard ▪ Worked as a Quant on Wall Street ▪ Conceived, built, and led Data team at Facebook ▪ Nearly 30 amazing engineers and data scientists ▪ Several open source projects and research papers ▪ Founder of Cloudera ▪ Chief Scientist ▪ Also, check out the book “Beautiful Data”
  • 37. Presentation Outline ▪ 1. Defining the Platform ▪ BI: Science for Profit ▪ Need tools for whole research cycle ▪ SQL Server 2008 R2: defining the platform ▪ 2. State of the Platform Ecosystem ▪ 3. Foundations for a New Implementation ▪ Hadoop ▪ Boiling the Frog ▪ 4. Future Developments ▪ Questions and Discussion
  • 38. 1. Defining the Platform
  • 39. BI is looking more like science (for profit)
  • 40. Jim Gray: Science entering Fourth Paradigm: “We have to do better at producing tools to support the whole research cycle”
  • 41. RDBMS only a small part of this tool set
  • 42. Example: SQL Server 2008 R2
  • 44. ETL: SQL Server Integration Services; RDBMS: SQL Server
  • 45. ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services
  • 46. ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services
  • 47. ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services; Search: Full-Text Search
  • 48. CEP: StreamInsight; ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services; Search: Full-Text Search
  • 49. CEP: StreamInsight; ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services; Search: Full-Text Search; OLAP: PowerPivot
  • 50. MDM: Master Data Services; CEP: StreamInsight; ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services; Search: Full-Text Search; OLAP: PowerPivot
  • 51. Collaboration: SharePoint; MDM: Master Data Services; CEP: StreamInsight; ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services; Search: Full-Text Search; OLAP: PowerPivot
  • 52. What do we call this unified suite?
  • 53. For today: Analytical Data Platform
  • 54. For today: Analytical Data Platform. LAMP Stack for Analytical Data Management
  • 55. 2. The State of the Platform Ecosystem
  • 56. Who makes up the platform ecosystem?
  • 58. Infrastructure Providers; Platform Providers
  • 59. Infrastructure Providers; Platform Providers; Application Developers
  • 60. Content Providers; Infrastructure Providers; Platform Providers; Application Developers
  • 61. Content Providers; Infrastructure Providers; Platform Providers; Application Developers; End Users
  • 62. What is new about the ecosystem today?
  • 63. Content Providers: 1. More than 95% of enterprise data is unstructured. 2. Data volumes are growing rapidly.
  • 64. Infrastructure Providers: 1. Cloud. 2. Warehouse-scale computers.
  • 65. Platform Providers: 1. Open source. 2. Driven by consumer web properties.
  • 66. Application Developers: 1. Data scientists. 2. Diversity of languages.
  • 67. End Users: 1. Browser is the client. 2. Tell a story about the business.
  • 68. 3. Foundations for a New Implementation
  • 69. New foundations: HDFS and MapReduce
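The MapReduce programming model splits a job into a map step that emits key-value pairs, a shuffle that groups pairs by key, and a reduce step per key. A single-process Python sketch of the classic word count, for illustration only; it is not Hadoop's API (Hadoop's native interface is Java `Mapper`/`Reducer` classes running over HDFS blocks across a cluster):

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input record."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key."""
    return key, sum(values)


def word_count(docs: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


print(word_count(["the cat sat", "the cat ran"]))  # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

Because map calls are independent per record and reduce calls are independent per key, a framework can spread both phases over many commodity machines, which is what HDFS and MapReduce provide together.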
  • 70. 2005: Doug/Mike start project inside Nutch
  • 71. 2006: Doug joins Yahoo!
  • 72. 2007: Make Hadoop scale
  • 73. 2007: Make Hadoop scale ▪ Yahoo! makes Pig open source
  • 74. 2007: Make Hadoop scale ▪ Yahoo! makes Pig open source ▪ Jim Gray’s “Fourth Paradigm” lecture
  • 75. 2007: Make Hadoop scale ▪ Yahoo! makes Pig open source ▪ Jim Gray’s “Fourth Paradigm” lecture ▪ Randy Bryant’s “DISC” lecture
  • 76. 2007: Make Hadoop scale ▪ Yahoo! makes Pig open source ▪ Jim Gray’s “Fourth Paradigm” lecture ▪ Randy Bryant’s “DISC” lecture ▪ Powerset makes HBase open source
  • 77. 2008: Make Hadoop fast
  • 78. 2008: Make Hadoop fast ▪ Yahoo! wins Daytona terabyte sort benchmark
  • 79. 2008: Make Hadoop fast ▪ Yahoo! wins Daytona terabyte sort benchmark ▪ First Hadoop Summit
  • 80. 2008: Make Hadoop fast ▪ Yahoo! wins Daytona terabyte sort benchmark ▪ First Hadoop Summit ▪ Yahoo! builds production webmap with Hadoop
  • 81. 2008: Make Hadoop fast ▪ Yahoo! wins Daytona terabyte sort benchmark ▪ First Hadoop Summit ▪ Yahoo! builds production webmap with Hadoop ▪ Facebook makes Hive open source
  • 82. 2008: Make Hadoop fast ▪ Yahoo! wins Daytona terabyte sort benchmark ▪ First Hadoop Summit ▪ Yahoo! builds production webmap with Hadoop ▪ Facebook makes Hive open source ▪ “MapReduce: A Major Step Backwards”
  • 83. 2009: Insert Hadoop into the enterprise
  • 84. 2009: Insert Hadoop into the enterprise ▪ Cloudera releases CDH
  • 85. 2009: Insert Hadoop into the enterprise ▪ Cloudera releases CDH ▪ First Hadoop World NYC
  • 86. 2009: Insert Hadoop into the enterprise ▪ Cloudera releases CDH ▪ First Hadoop World NYC ▪ Yahoo! sorts a petabyte with Hadoop
  • 87. 2009: Insert Hadoop into the enterprise ▪ Cloudera releases CDH ▪ First Hadoop World NYC ▪ Yahoo! sorts a petabyte with Hadoop ▪ Cloudera adds training, support, services
  • 88. 2009: Insert Hadoop into the enterprise ▪ Cloudera releases CDH ▪ First Hadoop World NYC ▪ Yahoo! sorts a petabyte with Hadoop ▪ Cloudera adds training, support, services ▪ “The Unreasonable Effectiveness of Data”
  • 89. 2010: Integrate Hadoop into the enterprise
  • 90. 2010: Integrate Hadoop into the enterprise ▪ IBM announces InfoSphere BigInsights
  • 91. 2010: Integrate Hadoop into the enterprise ▪ IBM announces InfoSphere BigInsights ▪ Yahoo! completes enterprise-class security
  • 92. 2010: Integrate Hadoop into the enterprise ▪ IBM announces InfoSphere BigInsights ▪ Yahoo! completes enterprise-class security ▪ Datameer and Karmasphere funded
  • 93. 2010: Integrate Hadoop into the enterprise ▪ IBM announces InfoSphere BigInsights ▪ Yahoo! completes enterprise-class security ▪ Datameer and Karmasphere funded ▪ Quest, Talend, Netezza, and more integrate
  • 94. 2010: Integrate Hadoop into the enterprise ▪ IBM announces InfoSphere BigInsights ▪ Yahoo! completes enterprise-class security ▪ Datameer and Karmasphere funded ▪ Quest, Talend, Netezza, and more integrate ▪ Hive adds JDBC and ODBC
  • 95. Hadoop will be an Analytical Data Platform
  • 97. Capture: Log collection and CEP
  • 98. Curate: Workflow and Scheduling
  • 99. Curate: Secondary and Full-Text Indexing
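Full-text indexing is usually built on an inverted index: a map from each term to the set of documents containing it, so a query intersects a few small sets instead of scanning every record. A minimal sketch with a simple regex tokenizer (an assumption; real indexers add stemming, ranking, and on-disk posting lists):

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the IDs of the documents that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


docs = {
    "log1": "Disk full on node 7",
    "log2": "Node 7 rebooted",
    "log3": "disk replaced",
}
index = build_index(docs)
print(search(index, "node 7"))  # ['log1', 'log2']
print(search(index, "disk"))    # ['log1', 'log3']
```

Building the index is itself a natural MapReduce job: map emits (term, document ID) pairs and reduce collects each term's posting list.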
  • 100. Curate: Learn Structure from Data
  • 102. Analyze: Link working set and historical data
  • 103. All behind a single user interface
  • 104. HUE: Making Many Computers Feel Like One
  • 105. Cloudera's Distribution for Hadoop, Version […], is the enterprise open source platform for complex data. ▪ Integrated: all components & functions interoperate with one another through standard APIs ▪ Simplified: Cloudera manages required component versions & dependencies ▪ Open source: 100% Apache licensed ▪ Reliable: patched with fixes from future releases to improve stability ▪ Supported: Cloudera employs […]% of the project founders and at least one committer for […]% of these open source components.
  • 106. © 2010 Cloudera, Inc. or its licensors. “Cloudera” is a registered trademark of Cloudera, Inc. All rights reserved. 1.0
  • 107. ioMemory for Scale-out Robert Wipfel, Fellow rwipfel@fusionio.com 14th July, 2010, Accel Partners Panel Discussion
  • 108. Factors impacting Scale-out Balance • CPU • Disk • Network Energy Contention • Servers • Sharing • RAM • Locking • Disks Management and Monitoring Graceful Recovery Throughput • No SPOFs • IOPS • Fast Replay • Bandwidth Latency • Distributed • Dependencies
  • 109. What’s *really* Needed… DRAM Disk Need Want Want Want •  Really fast •  Non-volatile •  Non-volatile Don’t Want •  Cheap •  Really fast •  Volatile •  Large capacity •  Large capacity •  Expensive Don’t Want •  Reasonable price •  Limited capacity •  Really slow •  Low energy
  • 110. Solution: ioMemory A disruption called ioMemory •  High speed like DRAM •  Persistence and capacity of disks PCIe based NAND Flash Storage •  Very high IOPS •  Micro-second latency •  Very high data throughput
  • 111. Why is it called ioMemory? SAN, NAS, RAIDed DAS ioMemory SSDs DRAM 50µs     5  orders  ooof   of   3  f   6  orders   rders   (10E-­‐6)     magnitude   magnitude   magnitude   L3 L2 L1 Nanosecond (10E-9) ACCESS DELAY IN TIME Millisecond (10E-3)
  • 112. ioMemory Performance [Charts: Raw Storage Performance, H2benchw 3.6 interface bandwidth (MB/s), with Storage I/O marked 2x faster; Application Performance, IOMeter Database Benchmark I/O average throughput (MB/s), with Application I/O marked 50x faster. Devices compared: Fusion-io ioDrive Maximum Write (24 GB flash, PCIe x4), ioDrive Improved Write (40 GB flash, PCIe x4), and ioDrive Maximum Capacity (80 GB flash, PCIe x4) versus SATA SSDs from Vendor A (128 GB flash, SATA/300, 3.0 Gbps 2.5" RAID 0), Vendor B (64 GB flash, SATA/300, 3.0 Gbps 2.5" RAID 0), and Vendor C (32 GB flash, SATA/300).]
  • 113. ioMemory Reliability • Strong ECC • Wear leveling • Bad block re-mapping • Data labeling • Parity-protected pipelines • Power cut protection • PCI bus • Flashback protection • Chip protection • Checksums • Poison bit • MTBF = 2 million hours
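  • 114. ioMemory is not a Solid State Disk [Diagram: with SSDs, data travels Application → CPU → RAID controller → SSDs over a multi-step path (numbered steps up to 9, with branches 3a/3b and 4a/4b); with ioMemory, it travels Application → CPU → ioMemory in two steps.]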
  • 115. ioMemory is Green. Annual energy use (kWh/yr): Fusion-io ioDrive 97 • ZeusIOPS SSD 3,013 • 15,000 RPM FC HDD 133,493
  • 116. Case Study: one of the world’s fastest-growing Webmonsters • Over 900% more database queries per second • Dramatically improved server replication for the most current data • Over 800% improvement to disaster recovery backup time • Cut server footprint, power costs, and IT overhead by 75% • Full and immediate ROI on repurposed servers, with continued ROI from operational cost savings
  • 118. Case Study: Internet security company that protects over 1 billion inboxes • 5x improvement in database replication performance, data-intensive query response, and analysis routines • Eliminated 210 failure points from the system • Implemented full system redundancy • Dramatically lowered power and cooling expenses
  • 120. Disruption By deploying ioMemory… Cloudmark eliminated the need for this…
  • 121. Other Customer Examples • Does a 30-to-1 box reduction for its reliable messaging system • An HMO achieves a 200-HDD-to-1-ioDrive reduction for its data warehouse • Department of Defense takes NASTRAN from 3 days to 6 hours • A stock exchange doubles the performance of its trading systems • Shows a 35x performance increase in unstructured search at OracleWorld • Demos that Dynamics NAV can get a 4x performance improvement
  • 124. ioMemory Products (4k packet size; 4k read / 75/25 r/w mix) • 80 GB: 119,790 / 89,549 • 160 GB: 116,046 / 93,199 • 320 GB: 185,022 / 129,699 • 320 GB: 71,256 / 67,659 • 640 GB: 122,601 / 121,008
  • 125. OEM Partners (Confidential Information: Fusion-io)
  • 126. Questions?
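Slide 111 places ioMemory between nanosecond-class memory and millisecond-class SAN, NAS, and RAIDed DAS storage on a log scale. A minimal sketch of that arithmetic, using only the slide's axis endpoints (10E-9 s and 10E-3 s) and its 50µs marker; the helper name `orders_of_magnitude` is our own, not something from the deck:

```python
import math

# Latencies in seconds from slide 111: the axis endpoints and the 50µs marker.
NANOSECOND = 1e-9
MILLISECOND = 1e-3
IOMEMORY_MARKER = 50e-6


def orders_of_magnitude(faster_s: float, slower_s: float) -> float:
    """Return how many powers of ten separate two access delays (log10 of their ratio)."""
    if faster_s <= 0 or slower_s <= 0:
        raise ValueError("latencies must be positive")
    return math.log10(slower_s / faster_s)


if __name__ == "__main__":
    print(f"ns -> ms:   {orders_of_magnitude(NANOSECOND, MILLISECOND):.2f}")
    print(f"ns -> 50us: {orders_of_magnitude(NANOSECOND, IOMEMORY_MARKER):.2f}")
    print(f"50us -> ms: {orders_of_magnitude(IOMEMORY_MARKER, MILLISECOND):.2f}")
```

The nanosecond-to-millisecond span is six powers of ten; a 50µs access sits about 4.7 orders above 1 ns and about 1.3 orders (20x) below 1 ms.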
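Slide 113 quotes an MTBF of 2 million hours, which is easier to interpret as an annualized failure rate. A minimal sketch, assuming a constant failure rate (exponential lifetimes) and continuous 24x7 operation, neither of which the slide states; the function names and the 1,000-card fleet are illustrative only:

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Probability that one unit fails within a year of 24x7 operation,
    assuming a constant failure rate (exponential lifetime model)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(units: int, mtbf_hours: float) -> float:
    """Expected failures per year across a fleet whose failed units are replaced,
    under the same constant-rate assumption."""
    return units * HOURS_PER_YEAR / mtbf_hours


if __name__ == "__main__":
    mtbf = 2_000_000  # slide 113: MTBF = 2 million hours
    print(f"AFR per card: {annualized_failure_rate(mtbf):.3%}")
    # Hypothetical fleet of 1,000 cards, for illustration only.
    print(f"Expected failures/year over 1,000 cards: {expected_failures_per_year(1000, mtbf):.2f}")
```

With the slide's figure this comes to an AFR of about 0.44% per card and roughly 4.4 expected failures per year across 1,000 cards.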
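Slide 115's energy comparison reads more clearly as ratios. This sketch pairs each kWh/yr value with a device according to our reading of the flattened chart labels (largest value with the 15,000 RPM FC HDD, smallest with the ioDrive); the slide does not state the workload or device counts behind the figures, and the helper name is ours:

```python
# Annual energy use from slide 115, in kWh/yr. The device-to-value pairing is our
# reading of the flattened chart, not something the transcript states explicitly.
ENERGY_KWH_PER_YEAR = {
    "Fusion-io ioDrive": 97,
    "ZeusIOPS SSD": 3_013,
    "15,000 RPM FC HDD": 133_493,
}


def ratios_to_baseline(figures: dict[str, float], baseline: str) -> dict[str, float]:
    """Express every figure as a multiple of the baseline device's figure."""
    base = figures[baseline]
    return {name: value / base for name, value in figures.items()}


if __name__ == "__main__":
    for name, ratio in ratios_to_baseline(ENERGY_KWH_PER_YEAR, "Fusion-io ioDrive").items():
        print(f"{name}: {ratio:,.0f}x the ioDrive's annual energy")
```

On that reading, the FC HDD configuration uses roughly 1,376 times, and the ZeusIOPS SSD roughly 31 times, the annual energy of the ioDrive.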
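The percentage claims on slide 116 are easy to misread: "900% more" queries per second means ten times the original rate, not nine, and a 75% cut leaves a quarter of the original footprint. A small sketch of both conversions; the helper names and the 40-server fleet are hypothetical:

```python
def multiplier_from_percent_more(percent_more: float) -> float:
    """'X% more' means the new value is (1 + X/100) times the old value."""
    return 1.0 + percent_more / 100.0


def remaining_fraction_after_cut(percent_cut: float) -> float:
    """'Cut by X%' leaves (1 - X/100) of the original amount."""
    return 1.0 - percent_cut / 100.0


if __name__ == "__main__":
    # Slide 116: "over 900% more database queries per second".
    print(f"Query rate multiplier: {multiplier_from_percent_more(900):g}x")
    # Slide 116: server footprint cut by 75%, applied to a hypothetical 40-server fleet.
    print(f"Servers remaining of 40: {40 * remaining_fraction_after_cut(75):g}")
```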
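Slide 124 lists 4k figures for each card without units. Assuming they are I/O operations per second and that "4k" means 4,096-byte transfers (both assumptions, not stated on the slide), they convert to bandwidth as below; the helper name is ours:

```python
BYTES_PER_MIB = 1024 * 1024


def iops_to_mib_per_s(iops: float, block_bytes: int = 4096) -> float:
    """Bandwidth implied by an IOPS figure at a fixed transfer size, in MiB/s."""
    return iops * block_bytes / BYTES_PER_MIB


if __name__ == "__main__":
    # 80 GB card figures from slide 124 (assumed to be IOPS at 4 KiB per operation).
    print(f"4k read:       {iops_to_mib_per_s(119_790):.1f} MiB/s")
    print(f"75/25 r/w mix: {iops_to_mib_per_s(89_549):.1f} MiB/s")
```

At 4 KiB per operation, every 256 IOPS is 1 MiB/s, so the 80 GB card's 119,790 read figure corresponds to roughly 468 MiB/s.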