Disaster Recovery
For the Real-Time Data Warehouse:
Replicating and Parallelizing Big Data
What you will learn: 4 strategies

1.   Separate operational warehouses from reporting systems
2.   Use changed data capture and Big Data replication
3.   Implement parallel, active-active data warehouses
4.   Maintain a “golden event” warehouse in Hadoop




                         Confidential & Proprietary           2
Analytics Have a Measurable Effect

•   For the median Fortune 1000 company, a 10% increase in data usability
    corresponds to $2.01B in annual revenue gains
    – “Big Data, Big Opportunity,” University of Texas at Austin, Sept 2011

•   A “real-time infrastructure” ranks #3 on the CIO’s list of strategies
    – Gartner

•   Organizations adept at analytics see
    1.6x the revenue growth,
    2.0x the profit growth, and
    2.5x the stock price appreciation
    of their peers
    – “Outperforming in a Data-Rich and Hyper-Connected World,”
      IBM Center for Applied Insights and Economic Intelligence
Data Warehousing: Now Part of Operations




•   Real-time pricing
•   Real-time marketing
•   Fraud detection
•   Inventory management
•   Customer service


Analytics in Business Operations:
Constant, Up-to-the-Minute Access to Big Data
ADVERTISING:             Click-stream, Mobile ads
CAPITAL MARKETS:         Market Data, Securities Trading
UTILITIES:               Energy usage, Power production
TRANSPORTATION:          Traffic & Logistics, Fleet Deployment
INFORMATION TECHNOLOGY:  Network Activity, IT Root-Cause
TELECOMMUNICATIONS:      Call Activity, Capacity Allocation

Expectations have changed




What we need…vs. what we have


               Need                           Have
Up-Time        SLAs: 99.999%                  Backup and recovery can take days in the
                                              event of an outage or system failure
Real-time      Access to information as it    ETL processes can take hours before
               happens                        information is available
Distribution   Add new applications as the    Access to warehouse is tightly controlled;
               business demands               performance bottlenecks of a single
                                              database can impact mission-critical systems
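
To put the 99.999% up-time target in perspective, the allowed downtime can be worked out directly. This is a quick sketch, assuming the SLA is measured over a 365-day year (the slide does not state the measurement period); the function name is illustrative.

```python
def allowed_downtime_minutes(availability_pct: float, days: float = 365.0) -> float:
    """Minutes of downtime permitted per period at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Downtime budget for a "five nines" SLA over one year (assumed period).
five_nines = allowed_downtime_minutes(99.999)
print(f"{five_nines:.2f} minutes per year")
```

Under that assumption the budget is a little over five minutes per year, so a single recovery that takes even one day (1,440 minutes) overshoots it by a factor of roughly 270.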



4 disaster recovery strategies for big data

1.   Separate operational warehouses from reporting systems
2.   Use changed data capture and Big Data replication
3.   Implement parallel, active-active data warehousing
4.   Maintain a “golden event” warehouse in Hadoop




1. Separate operations from reporting

Run day-to-day applications in one place. Ad-hoc reporting happens in a
separate warehouse.

BENEFIT
Better control over performance

CHALLENGE
Keeping changes in sync

[Diagram labels: application, DB2, Operations (Primary Warehouse), WAN, Reporting (Secondary Warehouse)]

2. Changed data capture
Determine what has changed, then replicate it to achieve parity between
environments

BENEFIT
Quickly propagate changes to remote sites

CHALLENGE
Identifying changes is difficult. As the volume of data continues to grow,
this approach is only a stop-gap.

[Diagram labels: application, Primary Cluster, Data Fabric (250 MB/s per box, load-balanced, linearly scalable, built-in persistence), 1 GB/s, WAN, Reporting Cluster]
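
One simple way to "determine what has changed" is to compare row fingerprints between two snapshots and replay the differences on the replica. The sketch below is illustrative only: it assumes rows are keyed by a primary key and fit in memory, and all function names are hypothetical. Production change capture usually reads the database's transaction log rather than diffing snapshots.

```python
import hashlib
import json


def fingerprint(row: dict) -> str:
    """Stable hash of a row's contents (keys sorted so column order does not matter)."""
    payload = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def capture_changes(previous: dict, current: dict) -> list:
    """Return (operation, key, row) tuples that turn `previous` into `current`.

    Both arguments map primary key -> row dict.
    """
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif fingerprint(previous[key]) != fingerprint(row):
            changes.append(("update", key, row))
    for key in previous:
        if key not in current:
            changes.append(("delete", key, None))
    return changes


def apply_changes(replica: dict, changes: list) -> None:
    """Replay captured changes against a replica to bring it to parity."""
    for op, key, row in changes:
        if op == "delete":
            replica.pop(key, None)
        else:
            replica[key] = dict(row)


# Hypothetical sample data: the primary before and after some activity.
primary_before = {1: {"sku": "A", "qty": 5}, 2: {"sku": "B", "qty": 7}}
primary_after = {1: {"sku": "A", "qty": 4}, 3: {"sku": "C", "qty": 1}}
replica = {k: dict(v) for k, v in primary_before.items()}

changes = capture_changes(primary_before, primary_after)
apply_changes(replica, changes)
```

Only the three changed rows cross the WAN here, which is the benefit the slide names. The cost is the full comparison pass on the primary, which grows with the data.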
3. Parallel, active-active data warehousing


Capture application data streams and load to parallel data warehouses
over the WAN

BENEFIT
Multiple warehouses are kept up to date

CHALLENGE
Synchronization of many data streams

[Diagram labels: Primary Cluster, Data Fabric (250 MB/s per box, load-balanced, linearly scalable, built-in persistence), 1 GB/s, WAN, Reporting Cluster]
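
The active-active idea can be sketched as a fan-out: each captured event is queued for every warehouse, and a per-warehouse buffer keeps a slow or unreachable site from holding up the others. This is a minimal in-memory illustration, not Tervela's API. The class and method names are hypothetical, and the "warehouses" are plain Python lists.

```python
from collections import deque


class Warehouse:
    """Stand-in for a warehouse loader; `online` simulates site availability."""

    def __init__(self, name: str):
        self.name = name
        self.online = True
        self.rows = []

    def load(self, event: dict) -> None:
        if not self.online:
            raise ConnectionError(f"{self.name} unreachable")
        self.rows.append(event)


class FanOut:
    """Delivers each event to every warehouse, buffering per destination."""

    def __init__(self, warehouses):
        self.warehouses = list(warehouses)
        self.pending = {w.name: deque() for w in self.warehouses}

    def publish(self, event: dict) -> None:
        for w in self.warehouses:
            self.pending[w.name].append(event)
        self.flush()

    def flush(self) -> None:
        for w in self.warehouses:
            queue = self.pending[w.name]
            while queue:
                try:
                    w.load(queue[0])
                except ConnectionError:
                    break  # keep the event buffered; retry on the next flush
                queue.popleft()


primary, reporting = Warehouse("primary"), Warehouse("reporting")
fabric = FanOut([primary, reporting])

fabric.publish({"id": 1})
reporting.online = False   # simulate a WAN outage to the remote site
fabric.publish({"id": 2})  # primary loads it; reporting's copy stays buffered
reporting.online = True
fabric.flush()             # buffered events are delivered in original order
```

Each destination has its own queue, so ordering is preserved per warehouse. Keeping many such streams consistent with one another is the synchronization challenge the slide calls out, and this sketch does not address it.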


4. “Golden Event” store

Capture raw data and store it in Hadoop

BENEFIT
New analytics are always possible

CHALLENGE
Best practices are only just being developed

[Diagram labels: application, Data Fabric (250 MB/s per box, load-balanced, linearly scalable, built-in persistence), Primary Data Warehouse, Reporting Data Warehouse (Optional), Golden Event Store, New Apps & Analytics]
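
The "golden event" pattern amounts to an append-only log of raw events that can be replayed later for analytics nobody had planned at capture time. The sketch below uses a local JSON-lines file as a stand-in for Hadoop storage, which is an assumption made to keep the example self-contained. The class name and event fields are hypothetical.

```python
import json
import os
import tempfile


class GoldenEventStore:
    """Append-only log of raw events, one JSON document per line."""

    def __init__(self, path: str):
        self.path = path

    def append(self, event: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def replay(self):
        """Yield every stored event in arrival order."""
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)


store_path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
store = GoldenEventStore(store_path)
for event in [
    {"type": "click", "user": "u1"},
    {"type": "purchase", "user": "u1", "amount": 30},
    {"type": "click", "user": "u2"},
]:
    store.append(event)

# A "new" analytic defined after the data was captured: clicks per user.
clicks_per_user = {}
for event in store.replay():
    if event["type"] == "click":
        clicks_per_user[event["user"]] = clicks_per_user.get(event["user"], 0) + 1
```

The raw events are never transformed on the way in, so the same log could also be replayed to rebuild a lost warehouse. That is what makes the store useful for disaster recovery as well as for new analytics.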
About Tervela Turbo

•   New release!
•   Capture, share, and distribute data
•   Accelerate any of the use cases we discussed today




Big Data Requires Big Data Movement

“As companies implement more big data solutions, the need to use
high-performance message delivery with those systems will grow.”



Gartner: Hype Cycle for Big Data, 2012

Key Features and Benefits of Tervela Turbo

Key Features                           Key Benefits

Data Capture                           Real-Time
• Adapters for top data stores         Regardless of data volume or
• Flexible multi-language API          number of sources
• Real-time acquisition

Data Availability                      Reliable
• Parallel loading                     For mission-critical operations
• Large-volume buffering               that can’t go down
• Automatic retry
• Data replay

Data Distribution                      Multi-Platform
• Continuous loading                   Feeds explosion of analytic
• No disruption with bad consumers     apps on any platform without
• Warehouses, DBs, Hadoop, etc.        disrupting other consumers
• Web, mobile, custom apps

Learn More About Big Data Movement



Capture, Share, and Distribute Big Data for Mission-Critical Analytics

Access videos, how-to guides, and other educational materials at:
tervela.com/datafabric

www.tervela.com
@tervela
info@tervela.com




Editor's Notes

  • #4 Sources: “Big Data, Big Opportunity” – University of Texas at Austin, Sept 2011. A “real-time infrastructure” – Gartner (ranks 3rd after “developing business solutions” and “reducing the cost of IT”). Organizations using analytics for competitive advantage – “Outperforming in a Data-Rich and Hyper-Connected World,” IBM Center for Applied Insights and Economic Intelligence.
  • #6 Use this, instead, for “new role of the data warehouse” slide??
  • #10 Benefit: better manage performance. Challenge: keep reporting systems up to date with changes.
  • #11 Benefit: get changes out to remote sites faster
  • #16 Second “about Tervela Turbo” slide??