Mike Miller, Co-founder, Chief Scientist

@mlmilleratmit
mike@cloudant.com
My Background

Cloudant Co-founder, Chief Scientist

Assistant Professor, Particle Physics
(U. Washington, Affiliate)

Background: machine learning, analysis, big data, globally distributed systems

                             Cloudant, 9-26-2012   2
The face of big data




   http://abstract.cs.washington.edu/~shwetak/
The face of big data




The face of big data




 “The future is stranger and sooner than you think”
              Reid Hoffman, LinkedIn/Greylock
Perfect Storm



Parallel Processing
Big Data
HTML5/JS
Mobile
9M Trained Developers
Focus on your Application
   not data operations




If your data is stuck in the warehouse...
            ... you’re losing




Data Layer for the Web
Founded (2009) by leading MIT data
scientists

Funded by Y Combinator & Avalon

Global network of 20+ data centers
-- Application Data Network (ADN)

Built on the leading NoSQL standard (Apache CouchDB):
the most durable data store on the planet

10,000 users and growing.

Cloudant: the Akamai of dynamic content
Cloudant Product Line
•   Application State
    Hyper-Scalable Document Store (JSON+HTTP)
    MVCC (multi-version concurrency control)
    Secondary indexes for flexible query

•   Application Data Security
    Accounts/API keys, data sharing, permission roles

•   Application Analytics
    Fully Integrated (Incremental) MapReduce engine

•   Application Search
    Fully Integrated (Incremental) Lucene + Geospatial
                                                          API Compatible
•   Application Object Storage
    images, audio, video...

•   Application State Distribution
    cloud <==> tablet <==> PC <==> mobile
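MVCC in a CouchDB-style document store means optimistic concurrency: every document carries a revision token (`_rev`), and a write that names a stale revision is rejected (CouchDB answers such a write with HTTP 409 Conflict). The toy in-memory store below illustrates only that rule. The class names and the simplified revision format (generation number, a dash, then a content hash) are assumptions for illustration, not Cloudant's implementation.

```python
import hashlib
import json


class ConflictError(Exception):
    """Raised when a write names a stale revision (CouchDB answers 409 Conflict)."""


class ToyDocStore:
    """Hypothetical in-memory store showing revision-checked (MVCC-style) writes."""

    def __init__(self):
        self._docs = {}  # doc_id -> (rev, body)

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        # Optimistic concurrency: the writer must name the revision it last read
        # (None for a brand-new document); anything else is a conflict.
        if rev != current_rev:
            raise ConflictError(f"{doc_id}: current rev is {current_rev!r}, got {rev!r}")
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        # Simplified revision ID: generation number plus a hash of the new content.
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()[:32]
        new_rev = f"{generation}-{digest}"
        self._docs[doc_id] = (new_rev, body)
        return new_rev

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return {"_id": doc_id, "_rev": rev, **body}


store = ToyDocStore()
r1 = store.put("user:42", {"name": "Ada"})
r2 = store.put("user:42", {"name": "Ada L."}, rev=r1)
try:
    store.put("user:42", {"name": "stale write"}, rev=r1)  # r1 is no longer current
except ConflictError as err:
    print("rejected:", err)
print(store.get("user:42")["_rev"].startswith("2-"))  # True
```

Readers never block writers in this model: a reader simply holds whatever revision it fetched, and the conflict surfaces only if it later tries to write on top of an outdated one.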
Cloudant Install

  You do this:




  We give you:


                 That’s It

API Examples




     Write a doc...from the browser
      No client install necessary
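Because the interface is plain JSON over HTTP, writing a document needs nothing beyond an HTTP client: in the CouchDB API a document is created with `PUT /{db}/{docid}`. The sketch below uses only Python's standard library to build (but not send) such a request. The account URL `https://example-account.cloudant.com`, the database `demo`, and the document are hypothetical placeholders, and authentication is omitted.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical account and database names; substitute your own.
BASE_URL = "https://example-account.cloudant.com"
DB = "demo"


def build_put_request(doc_id, doc):
    """Build a PUT /{db}/{doc_id} request whose body is the document as JSON."""
    url = f"{BASE_URL}/{DB}/{urllib.parse.quote(doc_id, safe='')}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_put_request("hello-world", {"greeting": "Hello from the browser"})
print(req.get_method(), req.full_url)
# prints: PUT https://example-account.cloudant.com/demo/hello-world
# Sending it would be urllib.request.urlopen(req), with credentials added.
```

A browser does the same thing with `XMLHttpRequest` or `fetch`, which is why no client library has to be installed.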



API Examples
                                      Create Secondary Indexes




Query Those Indexes
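In the CouchDB API that Cloudant implements, a secondary index is a view: a JavaScript `map` function stored in a design document and queried at `/{db}/_design/{ddoc}/_view/{view}`. The sketch builds such a design document and a query URL. The design-document name `lookup` echoes the demo URL that follows, while the view name `by_name`, the `name` field, and the base URL are hypothetical. View keys are JSON values, so a string key must be sent JSON-encoded, quotes included.

```python
import json
import urllib.parse

BASE_URL = "https://example-account.cloudant.com"  # hypothetical account
DB = "demo"

# Design document defining one view; the map function is JavaScript, stored as a string.
design_doc = {
    "_id": "_design/lookup",
    "views": {
        "by_name": {
            "map": "function (doc) { if (doc.name) { emit(doc.name, null); } }"
        }
    },
}


def view_query_url(ddoc, view, key, include_docs=True):
    """URL for querying a view by exact key; keys are JSON-encoded per the CouchDB API."""
    params = urllib.parse.urlencode({
        "key": json.dumps(key),
        "include_docs": "true" if include_docs else "false",
    })
    return f"{BASE_URL}/{DB}/_design/{ddoc}/_view/{view}?{params}"


# The design doc itself would be saved with PUT {BASE_URL}/{DB}/_design/lookup.
print(json.dumps(design_doc, indent=2))
print(view_query_url("lookup", "by_name", "Ada"))
```

The index is built from the map function's `emit` calls, so adding a new query pattern means adding a view, not migrating a schema.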




http://examples.cloudant.com/lobby-search/_design/lookup/index.html
Global Data Network




 Cloudant scales within & between data centers
 Availability, low latency


Anatomy of the Data Layer

Diagram labels:
PUT {document}
US-EAST: "Node", single-tenant cluster, multi-tenant cluster
Filtered Replication & Sync
Secondary Data Centers (for DR & distributed access): AP-JP, EU-NL
Disconnected Devices: Edge Database Cluster (Single-Tenant or Multi-Tenant)

Horizontally Scalable DB
    •   Fault tolerant
    •   Always consistent
    •   Schemaless (NoSQL)
    •   Automatic sharding
    •   Distributed, parallel analytics
    •   Incremental, chainable MapReduce
    •   Full-text search
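"Automatic sharding" means the cluster, not the application, decides where each document lives. A common Dynamo-style scheme, similar in spirit to BigCouch (the open-source clustering layer Cloudant built on CouchDB), hashes the document ID into one of Q equal ranges of a 32-bit hash space and keeps each range on N nodes. The sketch illustrates that idea only: the CRC32 hash, Q = 8, N = 3, the node names, and the round-robin placement are assumptions chosen for illustration, not Cloudant's actual configuration.

```python
import zlib

Q = 8   # number of shards (hash ranges); assumed value
N = 3   # copies kept of each shard; assumed value, must not exceed len(NODES)
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical cluster members


def shard_for(doc_id: str) -> int:
    """Map a document ID to one of Q equal ranges of the 32-bit CRC32 hash space."""
    h = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF
    return (h * Q) >> 32  # always in 0..Q-1 because h < 2**32


def replicas_for(shard: int) -> list:
    """Place N copies of a shard on consecutive nodes (round-robin, illustrative only)."""
    return [NODES[(shard + i) % len(NODES)] for i in range(N)]


for doc_id in ["user:42", "order:2012-09-26", "device:abc"]:
    s = shard_for(doc_id)
    print(doc_id, "-> shard", s, "on", replicas_for(s))
```

Because placement is a pure function of the document ID, any node can route a request without a central lookup table. In this model a document never changes shard; growing the cluster only changes which nodes hold each shard.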



https://cloudant.com/blog/cloudant-labs-on-google-spanner/

Why It Matters



>1. Visualization Wins




   http://sosolimited.com/blog/2012/07/from-tweets-to-lightshow/
>2. Prepare For Success




 Three #1 apps, from 6 to 90 servers in weeks
>3. Scale Invariance




>3. Scale Invariance


mobile/tablet, desktop, Cloud

Goal: Megabytes to Petabytes
>3. Scale Invariance




             ‘Carry Small, Live Large’
single user experience at vastly different scales

>4. No Preferred Frame




 So why do you have a global ‘write master’?
>4. No Preferred Frame
This simple document...




...establishes a continuous pipe from Europe to the US
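Assuming the document pictured is a CouchDB-style replication document: in the CouchDB replication API, saving a document with `source`, `target`, and `"continuous": true` into the special `_replicator` database starts a replication that keeps running and forwards new changes as they arrive. The sketch builds such a document. The EU and US account URLs and the `orders` database are hypothetical, and credentials are omitted.

```python
import json

# Hypothetical endpoints: one account hosted in Europe, one in the US.
EU_DB = "https://example-eu.cloudant.com/orders"
US_DB = "https://example-us.cloudant.com/orders"


def replication_doc(doc_id, source, target, continuous=True):
    """A document for the _replicator database describing a one-way replication."""
    return {
        "_id": doc_id,
        "source": source,
        "target": target,
        "continuous": continuous,  # keep following new changes instead of stopping
    }


eu_to_us = replication_doc("orders-eu-to-us", EU_DB, US_DB)
# Saving it would be a PUT to {your account}/_replicator/orders-eu-to-us.
print(json.dumps(eu_to_us, indent=2))
```

Deleting the same document from `_replicator` stops the replication, so the pipe is managed as data rather than as server configuration.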


>4. No Preferred Frame

And you can do the reverse...




                                        ...at the same time
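Running the reverse replication at the same time makes both sites writable, so the same document can be edited on both sides. CouchDB keeps every conflicting revision but picks the same winner everywhere without coordination: a non-deleted revision beats a deleted one, then the revision with the longer history (higher generation number) wins, and remaining ties go to the revision ID that sorts highest. The sketch is a simplified model of that rule, with revisions as hypothetical `(generation, rev_hash, deleted)` tuples; real servers compare full revision trees.

```python
from typing import List, NamedTuple


class Rev(NamedTuple):
    generation: int   # the N in "N-<hash>"
    rev_hash: str     # the part after the dash
    deleted: bool = False


def winning_rev(leaves: List[Rev]) -> Rev:
    """Deterministic winner: prefer live revisions, then longer history, then higher hash."""
    return max(leaves, key=lambda r: (not r.deleted, r.generation, r.rev_hash))


# The same document edited concurrently in Europe and the US (hypothetical revisions).
eu_edit = Rev(3, "7a1f")
us_edit = Rev(3, "c04e")
# After replication in both directions each site holds both leaves,
# and each computes the same winner regardless of arrival order.
print(winning_rev([eu_edit, us_edit]))  # Rev(generation=3, rev_hash='c04e', deleted=False)
print(winning_rev([us_edit, eu_edit]))  # Rev(generation=3, rev_hash='c04e', deleted=False)
```

The losing revision is not discarded, so an application can still inspect and merge it; the deterministic choice only guarantees that every data center shows the same current version.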


>4. No Preferred Frame




         Write local, live global
What could you do with relaxed constraints?
>4. No Preferred Frame
Data Import
[Figure: actual customer data, France to Amsterdam. Left axis: Data Size [GB] and Disk Size [GB] (0 to 18); right axis: Documents [M] (0 to 18 million); x axis: Time [sec] (0 to 14,000).]

One click (continuous) Import
Big and Getting Bigger




Big and Getting Bigger
• And of course, we are hiring
 Languages
 Erlang, Scala, C, JavaScript, Python, Clojure, HTML5, iOS, Android, Ruby/Chef

 Sample problems in the Seattle office

 Create file format optimized for (huge) structured time-series data
 Integrate Cubism into two-tier application stack
 Profile creation of 100M databases (real customer)
 Pig / Hive integration
 Prototype read-in-place Hadoop connector



