Big Data on Open Cloud
Analytical Compute Grid (ACG)
Elastic “Big Data” Infrastructure
                                    by Natasha Gajic

March 1, 2013
Rackspace’s EBI Environment

Current Environment
• Windows and Linux operating systems
• Oracle and Microsoft database solutions
• Microsoft and Oracle replication technology
• SSIS
• Informatica
• Dedicated servers
• Rapid data set growth

“Big Data” Problem
• Cost of purchasing additional licenses
• Time required to set up new hardware
• Increased demand for DBA resources
• System performance
• System scalability
• Capacity


                                        RACKSPACE® HOSTING   |   WWW.RACKSPACE.COM
Analytical Compute Grid (ACG) Features

• Host an ever-growing data set
• Quick data collection and retrieval
• Rapid scalability
• Ease of maintenance
• Provide a standard data access API




Analytical Compute Grid (ACG) Features
• Ability to provide a variety of storage types:
 • Columnar
 • Relational
 • HDFS
• Enable users to select the optimal storage type for the information collected
• Leverage Rackspace® Private Cloud powered by OpenStack® and open source technology

Analytical Compute Grid (ACG) Quality Attributes
[Figure: ACG quality attributes]

ACG on Rackspace® Private Cloud powered by OpenStack®
High Level Architecture

ACG on Rackspace® Private Cloud powered by OpenStack®
[Figure: high level architecture]

ACG on Rackspace® Private Cloud powered by OpenStack®
Image
[Figure: ACG image]

ACG on Rackspace® Private Cloud powered by OpenStack®
Database Engine Selection

  Storage Type     Database Engine
  Columnar         Cassandra
  Relational       PostgreSQL
  HDFS             Hadoop

ACG on Rackspace® Private Cloud powered by OpenStack®
Node
[Figures: ACG node, four slides]

ACG on Rackspace® Private Cloud powered by OpenStack®
Controller
[Figures: ACG Controller, three slides]

ACG on Rackspace® Private Cloud powered by OpenStack®
API
[Figure: ACG API]

ACG on Rackspace® Private Cloud powered by OpenStack®
Indexing Structure

ACG on Rackspace® Private Cloud powered by OpenStack®
Indexing Structure
[Figure: ACG indexing structure]

ACG on Rackspace® Private Cloud powered by OpenStack®
Indexing Structure

What is the ACG Indexing Structure?
• The system entry point
• A set of pointers that ultimately address database entities

Where is the Indexing Structure located?
• It is part of ACG, so it resides on the Open Cloud
• The ACG Controller manages the Indexing Structure


ACG on Rackspace® Private Cloud powered by OpenStack®
Indexing Structure

What does the ACG Indexing Structure enable?
• Splitting of large data sets across many instances
• Query parallelization
• Controlled data store size
• Optimal data store configuration
• Uniform access to data residing in various storage types
• System scalability, since the system expands horizontally and vertically to address an ever-growing data set
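The slides describe the Indexing Structure only as "a set of pointers" managed by the ACG Controller. As a minimal sketch under stated assumptions, the Python below models it as a list of pointers to size-capped data stores: keys are assumed to arrive in ascending order (as timestamps would), a new store on a hypothetical node `acg-node-N` is opened whenever the current one reaches `max_rows_per_store`, and a range lookup returns only the stores that can hold matching keys. All class, field, and node names are hypothetical and do not describe ACG's actual implementation.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Pointer:
    """Hypothetical pointer from the index to one data store on one ACG node."""
    node: str
    store: str
    start_key: int  # first key held by this store (inclusive)
    rows: int = 0


@dataclass
class IndexingStructure:
    max_rows_per_store: int  # assumed cap that keeps each data store's size controlled
    pointers: list = field(default_factory=list)

    def insert(self, key: int) -> Pointer:
        """Route an ascending key to the newest store, opening a new store when it is full."""
        if not self.pointers or self.pointers[-1].rows >= self.max_rows_per_store:
            n = len(self.pointers)
            self.pointers.append(Pointer(f"acg-node-{n}", f"store-{n}", start_key=key))
        self.pointers[-1].rows += 1
        return self.pointers[-1]

    def lookup(self, lo: int, hi: int) -> list:
        """Return every store whose key range may overlap [lo, hi]."""
        starts = [p.start_key for p in self.pointers]
        first = max(bisect_right(starts, lo) - 1, 0)
        last = bisect_right(starts, hi)
        return self.pointers[first:last]


if __name__ == "__main__":
    index = IndexingStructure(max_rows_per_store=4)
    for ts in range(10):
        index.insert(ts)
    print([p.store for p in index.lookup(2, 5)])  # ['store-0', 'store-1']
```

Because every store is capped, the number of stores a query touches depends only on the key range requested, not on the total size of the data set.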


ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes

ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes – Performance

Rackspace® Private Cloud powered by OpenStack®
• Creates an ACG node in 30 seconds
• Creates ACG nodes concurrently
• Resizes ACG nodes by adding CPUs

ACG
Indexing structure and controlled data set size allow for:
• Quick data distribution
• Query parallelization
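Query parallelization follows from splitting the data: each size-capped store can answer its part of a query independently, and the partial answers are merged afterwards. The sketch below shows this generic scatter-gather pattern with Python's `concurrent.futures`, assuming three hypothetical in-memory partitions in place of real data stores; it illustrates the pattern, not ACG's query engine.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each list stands in for one size-capped data store on its own node.
PARTITIONS = {
    "store-0": [(ts, ts * 10) for ts in range(0, 4)],
    "store-1": [(ts, ts * 10) for ts in range(4, 8)],
    "store-2": [(ts, ts * 10) for ts in range(8, 10)],
}


def partial_sum(rows, lo, hi):
    """Work done on one store: sum the values whose key lies in [lo, hi]."""
    return sum(value for key, value in rows if lo <= key <= hi)


def parallel_sum(partitions, lo, hi):
    """Scatter the same query to every partition concurrently, then gather and merge."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        futures = [pool.submit(partial_sum, rows, lo, hi) for rows in partitions.values()]
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    print(parallel_sum(PARTITIONS, 2, 5))  # 140 = 20 + 30 + 40 + 50
```

Threads are adequate here because, in a real deployment, each partial query is a network call to a separate data store and spends most of its time waiting; for CPU-bound local work in CPython, processes would be the usual choice.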




ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes – Availability

Rackspace® Private Cloud powered by OpenStack®
• Rapidly replaces failed ACG nodes

ACG
• Deploys the data store's native availability mechanisms (replication, data distribution…)
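ACG relies on each data store's own availability features. For the columnar store, Cassandra's `SimpleStrategy` places the first replica of a row on the node that owns its token and further replicas on the next nodes clockwise around the ring. The sketch below imitates that placement rule so the effect of a node failure is easy to see; the four-node ring, node names, MD5-based token function, and replication factor of 3 are illustrative assumptions, not Cassandra's code or ACG's configuration.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2**32


def token(key: str) -> int:
    """Map a row key to a ring position (MD5 is used here only as a convenient hash)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


def replicas(key: str, ring: dict, rf: int) -> list:
    """Owner of the key's token (first node token >= it, wrapping), then the next nodes clockwise."""
    tokens = sorted(ring)
    start = bisect_left(tokens, token(key)) % len(tokens)
    chosen = []
    for i in range(len(tokens)):
        node = ring[tokens[(start + i) % len(tokens)]]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == rf:
            break
    return chosen


# Hypothetical four-node ring with evenly spaced tokens.
RING = {
    0: "acg-node-a",
    RING_SIZE // 4: "acg-node-b",
    RING_SIZE // 2: "acg-node-c",
    3 * RING_SIZE // 4: "acg-node-d",
}

if __name__ == "__main__":
    owners = replicas("web-1:2013-03-01", RING, rf=3)
    survivors = [n for n in owners if n != owners[0]]  # suppose the first replica's node fails
    print(owners, "->", survivors)
```

In this sketch, losing any single node still leaves two live copies of every row while a replacement node is brought up.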




ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes – Maintainability

Rackspace® Private Cloud powered by OpenStack®
Adding ACG nodes expands:
• Storage capacity
• CPU power
• RAM
No DBA or system administrator activity required

ACG
Controlled data set size enables:
• Optimal and stable data store configuration
• Reduced demand for managing data store objects
• Stable query execution plans
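The claim that capacity grows by adding nodes rather than through DBA work rests on simple arithmetic, sketched below. The 2 CPU / 8 GB RAM node size comes from the use case later in this deck; the 100 GB per-node data store cap and the function names are hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeFlavor:
    cpus: int = 2            # matches the ACG nodes in the Rackspace use case
    ram_gb: int = 8          # matches the ACG nodes in the Rackspace use case
    store_cap_gb: int = 100  # hypothetical cap on each node's data store


def nodes_needed(data_gb: float, flavor: NodeFlavor) -> int:
    """Smallest node count that keeps every data store at or under its cap."""
    return max(1, math.ceil(data_gb / flavor.store_cap_gb))


def cluster_capacity(nodes: int, flavor: NodeFlavor) -> dict:
    """Storage, CPU, and RAM all grow linearly with the node count."""
    return {
        "storage_gb": nodes * flavor.store_cap_gb,
        "cpus": nodes * flavor.cpus,
        "ram_gb": nodes * flavor.ram_gb,
    }


if __name__ == "__main__":
    n = nodes_needed(250, NodeFlavor())
    print(n, cluster_capacity(n, NodeFlavor()))  # 3 {'storage_gb': 300, 'cpus': 6, 'ram_gb': 24}
```

Holding the per-store size fixed is also what keeps store configuration and query plans stable: growth shows up as more stores of the same shape, not as larger ones.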


ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes – Flexibility

ACG
Variety of storage types:
• Columnar (Cassandra): time series data
• Relational (PostgreSQL): relational data
• HDFS (Hadoop): unstructured data

Ability to select the optimal storage type for each use case
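Expressed as a lookup, the selection rule on this slide looks like the sketch below. The `DataShape` enum and `select_storage` function are hypothetical names; a real selection would weigh more than a single label.

```python
from enum import Enum


class DataShape(Enum):
    """Hypothetical classification of a data set, used only for illustration."""
    TIME_SERIES = "time series"
    RELATIONAL = "relational"
    UNSTRUCTURED = "unstructured"


# Mapping taken from the Flexibility slide: data shape -> (storage type, engine).
ENGINE_FOR_SHAPE = {
    DataShape.TIME_SERIES: ("Columnar", "Cassandra"),
    DataShape.RELATIONAL: ("Relational", "PostgreSQL"),
    DataShape.UNSTRUCTURED: ("HDFS", "Hadoop"),
}


def select_storage(shape: DataShape) -> tuple:
    """Return the (storage type, engine) pair suggested for a data shape."""
    return ENGINE_FOR_SHAPE[shape]


if __name__ == "__main__":
    for shape in DataShape:
        storage, engine = select_storage(shape)
        print(f"{shape.value:>12}: {storage} ({engine})")
```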




ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes – Usability

ACG
Standard interfaces:
• SQL
• JDBC API
• ODBC

ACG Management Console

ACG Monitoring Console

Loader utility implementing:
• Bulk Loader
• Insert Loader
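The deck names the two loading modes without describing them. The sketch below contrasts the two usual approaches, using Python's built-in `sqlite3` as a stand-in relational store: an insert loader that writes and commits row by row, and a bulk loader that sends all rows as one batch inside a single transaction. This is an analogy only; ACG's own loader, its Informatica integration, and the Cassandra bulk loader work differently, and the `samples` table is hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical target table for monitoring samples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (device TEXT, ts INTEGER, ok INTEGER)")
    return conn


def insert_loader(conn: sqlite3.Connection, rows) -> None:
    """Insert-loader style: one INSERT and one commit per row; each row is visible immediately."""
    for row in rows:
        conn.execute("INSERT INTO samples VALUES (?, ?, ?)", row)
        conn.commit()


def bulk_loader(conn: sqlite3.Connection, rows) -> None:
    """Bulk-loader style: one batched statement in a single transaction (rolled back on error)."""
    with conn:
        conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)


if __name__ == "__main__":
    rows = [("web-1", ts, 1) for ts in range(0, 600, 60)]
    db = make_db()
    bulk_loader(db, rows)
    print(db.execute("SELECT COUNT(*) FROM samples").fetchone()[0])  # 10
```

The insert style suits a trickle of new records that must be queryable at once; the bulk style amortizes per-statement and per-commit overhead when loading large historical extracts.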




ACG on Rackspace® Private Cloud powered by OpenStack®
Current State

ACG on Rackspace® Private Cloud powered by OpenStack®
Current State

ACG Controller
• ACG Manager
• Rule Engine
• Node Manager
• ACG Management Console
• ACG Monitoring

Columnar Implementation
• Data Store Controller
• JDBC extended to work with supercolumns
• Loader integrated with Informatica

Relational Implementation
• Data Store Controller
• JDBC driver extended with distributed query rewrite
• Loader integrated with Informatica
• ODBC (in progress)

HDFS Implementation
• Will start soon
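The relational driver's "distributed query rewrite" is not described further in the deck. One standard form of the technique is sketched below: an `AVG` over data split across several stores cannot be computed by averaging per-store averages, so the query sent to each store is rewritten to return `SUM` and `COUNT`, and the driver combines them. In-memory SQLite databases stand in for the PostgreSQL data stores and Python stands in for the JDBC driver; the table and function names are hypothetical.

```python
import sqlite3


def make_shard(rows) -> sqlite3.Connection:
    """One hypothetical relational data store holding part of the samples table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (device TEXT, latency_ms REAL)")
    conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
    return conn


# Client query:   SELECT device, AVG(latency_ms) FROM samples GROUP BY device
# Rewritten form sent to every shard, so partial results can be merged exactly:
SHARD_QUERY = "SELECT device, SUM(latency_ms), COUNT(latency_ms) FROM samples GROUP BY device"


def distributed_avg(shards) -> dict:
    """Run the rewritten query on each shard, add up sums and counts, then divide."""
    totals = {}
    for conn in shards:
        for device, part_sum, part_count in conn.execute(SHARD_QUERY):
            prev_sum, prev_count = totals.get(device, (0.0, 0))
            totals[device] = (prev_sum + (part_sum or 0.0), prev_count + part_count)
    return {device: s / c for device, (s, c) in totals.items() if c}


if __name__ == "__main__":
    shards = [
        make_shard([("web-1", 10), ("web-1", 20), ("db-1", 5)]),
        make_shard([("web-1", 30)]),
    ]
    # Averaging per-shard averages would give (15 + 30) / 2 = 22.5 for web-1; the rewrite gives 20.0.
    print(distributed_avg(shards))
```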



ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case

ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case
• Subject:
 • A complex availability calculation that sources 3 months of monitoring data and creates 1 billion records in the initial calculation
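The deck does not define the availability calculation itself. Purely as an assumption for illustration, the sketch below treats availability as the percentage of successful monitoring checks per device per UTC day, producing one output record per (device, day) pair; the input tuple layout and the function name are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_availability(samples) -> dict:
    """samples: iterable of (device, unix_ts, ok). Returns {(device, 'YYYY-MM-DD'): percent up}."""
    up = defaultdict(int)
    total = defaultdict(int)
    for device, ts, ok in samples:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        total[(device, day)] += 1
        if ok:
            up[(device, day)] += 1
    return {key: 100.0 * up[key] / total[key] for key in total}


if __name__ == "__main__":
    checks = [("web-1", 0, True), ("web-1", 60, True), ("web-1", 120, False), ("web-1", 180, True)]
    print(daily_availability(checks))  # {('web-1', '1970-01-01'): 75.0}
```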




ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case
• Environment 1
 • Data warehouse in a Microsoft SQL Server database
 • SSIS data loading
 • A SQL Server with 24 CPUs and 250 GB RAM was dedicated to the initial calculation
 • A SQL Server stored procedure performed the calculation
 • Source and result are stored in a traditional data warehouse structure

ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case
• Environment 2
 • ACG running two Cassandra clusters of 4 nodes each
 • Informatica with the Cassandra bulk loader
 • Each ACG node has 2 CPUs and 8 GB RAM
 • A Java program performing the calculation, running on an instance with 4 CPUs and 8 GB RAM
 • Source and result are stored in a columnar structure suitable for time series data
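A columnar store suits time series because readings for one source can be kept together and sorted by time, so a time-range read becomes a contiguous slice rather than a scan. The sketch below imitates a wide-row layout of the kind commonly used with Cassandra, with one row per (device, day) and timestamp-ordered columns inside it; the day bucketing, the class, and its methods are assumptions for illustration, not the schema used in this use case.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

DAY = 86_400  # seconds per row bucket (assumed bucketing: one row per device per day)


class WideRowStore:
    """In-memory imitation of a wide-row time series layout.

    Row key: (device, day number). Inside a row, timestamps are kept sorted,
    so reading a time range is a binary search plus a contiguous slice.
    """

    def __init__(self):
        self.rows = defaultdict(lambda: {"ts": [], "val": []})

    def put(self, device: str, ts: int, value) -> None:
        row = self.rows[(device, ts // DAY)]
        i = bisect_right(row["ts"], ts)
        row["ts"].insert(i, ts)
        row["val"].insert(i, value)

    def read_range(self, device: str, start: int, end: int) -> list:
        """All (ts, value) pairs for device with start <= ts <= end, oldest first."""
        out = []
        for day in range(start // DAY, end // DAY + 1):
            row = self.rows.get((device, day))  # .get does not create empty rows
            if row is None:
                continue
            lo, hi = bisect_left(row["ts"], start), bisect_right(row["ts"], end)
            out.extend(zip(row["ts"][lo:hi], row["val"][lo:hi]))
        return out


if __name__ == "__main__":
    store = WideRowStore()
    for ts, ok in [(100, 1), (50, 0), (DAY + 10, 1)]:
        store.put("web-1", ts, ok)
    print(store.read_range("web-1", 0, DAY + 20))
```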

ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case – Result
• Calculation duration
 • Microsoft SQL Server: 5 days
 • ACG: 3.5 hours
• Storage size
 • Microsoft SQL Server: 500 GB
 • ACG: 20 GB
• Complexity of the calculation
 • A columnar data store is optimal for time series data. Sourcing from the columnar data store resulted in a relatively simple Java calculation process compared to the SQL Server stored procedure.
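The headline ratios follow directly from the figures above (5 days is 120 hours). The hardware totals come from the two environment slides, counting the eight ACG nodes plus the instance running the Java program:

```latex
\begin{aligned}
\text{speed-up} &= \frac{5 \times 24\ \text{h}}{3.5\ \text{h}} = \frac{120}{3.5} \approx 34.3 \\
\text{storage reduction} &= \frac{500\ \text{GB}}{20\ \text{GB}} = 25 \\
\text{ACG CPUs} &= (2 \times 4) \times 2 + 4 = 20 \quad (\text{SQL Server: } 24) \\
\text{ACG RAM} &= (2 \times 4) \times 8\ \text{GB} + 8\ \text{GB} = 72\ \text{GB} \quad (\text{SQL Server: } 250\ \text{GB})
\end{aligned}
```

So the ACG run finished roughly 34 times faster while provisioned with fewer CPUs and less than a third of the RAM.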

ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case – Conclusion
• Selecting the optimal data store for the use case resulted in:
 • Substantial performance improvement
 • Reduced storage demand
 • Simplified processes
 • Ability to process terabytes of data per day, close to real time and on demand
 • Improved trending and reporting:
   • Enhanced support capabilities
   • Improved Rackspace customer experience
 • Significant cost reduction

RACKSPACE® HOSTING   |   5000 WALZEM ROAD   |   SAN ANTONIO, TX 78218
US SALES: 1-800-961-2888   |   US SUPPORT: 1-800-961-4454   |   WWW.RACKSPACE.COM

RACKSPACE® HOSTING   |   © RACKSPACE US, INC.   |   RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES.   |   WWW.RACKSPACE.COM

ACG UI
[Figures: ACG UI, three slides]
