Top 5 Considerations for a Big Data Solution

DataStax
Top 5 Factors to Consider When
Choosing a Big Data Solution
 Robin Schumacher, VP Products


©2012 DataStax                   1
•  VP Products, DataStax
    •  Director of Product Management at MySQL, then
       EnterpriseDB
    •  VP Product Management at Embarcadero
       Technologies
    •  DBA with Oracle, Teradata, SQL Server, DB2,
       others…
    •  Database software reviewer for various
       magazines
    •  Author of 3 database books

•  Define big data
       •  Identify the “must-haves” of a big data solution
       •  Discuss the difficulty of getting all of them, from both a
          business and a technical perspective
       •  Brief tour of NoSQL, Cassandra, and DataStax
          Enterprise




What big data is and the domains of data that need to be considered.




“Big data technologies describe a new generation of technologies and
    architectures, designed to economically extract value from very large volumes of a
    wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.”



     "Big data is data that exceeds the processing capacity of conventional database
     systems. The data is too big, moves too fast, or doesn't fit the strictures of your
     database architectures. To gain value from this data, you must choose an
     alternative way to process it."



    “Datasets whose size is beyond the ability of typical database software tools to
    capture, store, manage, and analyze.”



      * All definitions have one thing in common: new technology is needed for big data…

Big data spans three domains:

    1.  Real-time – transactional, online, streaming, low-latency
        data
    2.  Analytic – aggregated data from real-time feeds or other
        sources; often batch in nature
    3.  Search – supporting data, both external and internal, used
        to locate desired information or objects (e.g.,
        products, documents)




Research by McKinsey & Company shows eye-opening differences in 10-year
          category growth rates between businesses that use their big data
          smartly and those that do not.


What are the top five things to consider in a big data solution?




The characteristics that define big data are:

    1.  Velocity – the speed at which data arrives and the
        number of events/elements being stored
    2.  Variety – structured, semi-structured, and unstructured
        data
    3.  Volume – terabytes to petabytes of data
    4.  Complexity – typically the difficulty of distributing the
        data (e.g., across multiple data centers or the cloud) and of
        managing data traffic and movement (e.g., ETL, migrations)




•  Data arrives at a high rate
         •  Data consists of a large number of elements/events



                 • Sensor data
                 • Media streaming
                 • Mobile devices
                 • Financial streams
                 • Web clickstream
                 • Traffic monitoring
                 • Patient care
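High-velocity sources like these produce an unbounded stream of events per source. A common way to keep such data manageable in a wide-row store such as Cassandra is to partition each source's events by time bucket and keep each partition ordered by timestamp. This is an assumption made for illustration; the slides do not prescribe it. The minimal Python sketch below shows only that grouping logic. `partition_key`, `bucket_events`, and the day/hour granularities are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical granularities for time buckets (illustrative assumption).
BUCKET_FORMATS = {"day": "%Y-%m-%d", "hour": "%Y-%m-%dT%H"}


def partition_key(source_id: str, ts: datetime, bucket: str = "day") -> tuple[str, str]:
    """Map one event to a hypothetical (source, time-bucket) partition key."""
    if bucket not in BUCKET_FORMATS:
        raise ValueError(f"unsupported bucket: {bucket}")
    ts = ts.astimezone(timezone.utc)  # normalize so buckets do not depend on local time
    return (source_id, ts.strftime(BUCKET_FORMATS[bucket]))


def bucket_events(events, bucket: str = "day") -> dict:
    """Group (source_id, timestamp, value) events by partition, each ordered by time."""
    partitions = defaultdict(list)
    for source_id, ts, value in events:
        partitions[partition_key(source_id, ts, bucket)].append((ts, value))
    for rows in partitions.values():
        rows.sort(key=lambda row: row[0])  # clustering order within a partition: oldest first
    return dict(partitions)
```

Hourly buckets create more partitions than daily buckets, but each partition is smaller. The right granularity depends on each source's event rate, which this sketch does not model.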




•  Includes structured, semi-structured, and unstructured data
         •  Necessitates new data models and file formats
         •  Involves real-time, analytic, and search data




•  Terabytes to petabytes
         •  Also involves data maintenance functions (e.g., purging)




The McKinsey report found that the average investment firm with fewer than 1,000 employees has
      3.8 petabytes of data stored, experiences data growth of 40 percent per year, and stores
      structured, semi-structured, and unstructured data. Overall, McKinsey found that 15 of 17
      U.S. industry sectors have more data stored per company than the U.S. Library of Congress
      (which held 235 terabytes of information at the time of the study).
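The short Python calculation below compounds the quoted figures to make the growth rate concrete. It rests on two assumptions made purely for illustration, neither of which comes from the report: the 40 percent annual rate stays constant, and 1 PB = 1,000 TB (decimal units). Under those assumptions:

* Stored data doubles roughly every two years.
* 3.8 PB grows to about 20 PB after five years.
* 3.8 PB is already about 16 times the 235 TB Library of Congress figure.

```python
import math

START_PB = 3.8                # petabytes stored by the average firm cited above
ANNUAL_GROWTH = 0.40          # 40 percent per year, assumed constant here
LIBRARY_OF_CONGRESS_TB = 235  # terabytes, as quoted above
TB_PER_PB = 1000              # decimal units (assumption)


def projected_pb(years: float, start_pb: float = START_PB, growth: float = ANNUAL_GROWTH) -> float:
    """Stored data after `years` of constant compound growth."""
    return start_pb * (1 + growth) ** years


def doubling_time_years(growth: float = ANNUAL_GROWTH) -> float:
    """Years needed for stored data to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + growth)


def multiple_of_library_of_congress(pb: float = START_PB) -> float:
    """How many times larger `pb` petabytes is than the 235 TB figure."""
    return pb * TB_PER_PB / LIBRARY_OF_CONGRESS_TB


print(f"Doubling time: {doubling_time_years():.2f} years")
print(f"After 5 years: {projected_pb(5):.1f} PB")
print(f"Today vs. Library of Congress: {multiple_of_library_of_congress():.1f}x")
```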

•  Typically involves distributing and moving data
            across multiple data centers and geographies
         •  Can be on-premises, cloud, or hybrid
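When data is spread across data centers, Cassandra can give a keyspace a separate replication factor in each data center (NetworkTopologyStrategy). Clients can then ask for a majority within their local data center only (LOCAL_QUORUM) rather than across all replicas (QUORUM). The Python sketch below is a simplified model of that arithmetic, not DataStax code. The data center names and the `quorum` and `replica_plan` helpers are hypothetical.

```python
def quorum(replicas: int) -> int:
    """Majority of `replicas`: floor(replicas / 2) + 1."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return replicas // 2 + 1


def replica_plan(dc_replication: dict[str, int], local_dc: str) -> dict[str, int]:
    """Summarize a per-data-center replication layout as seen from `local_dc`."""
    if local_dc not in dc_replication:
        raise KeyError(f"unknown data center: {local_dc}")
    total = sum(dc_replication.values())
    return {
        "total_copies": total,                                  # copies of each row across all DCs
        "local_quorum_acks": quorum(dc_replication[local_dc]),  # acks needed from the local DC only
        "global_quorum_acks": quorum(total),                    # acks needed across all DCs
    }


# Hypothetical two-region layout with three copies in each region.
plan = replica_plan({"us-east": 3, "eu-west": 3}, local_dc="us-east")
print(plan)
```

With this layout, a LOCAL_QUORUM request in us-east waits for 2 local acknowledgements. A QUORUM request needs 4, so at least one must come from the remote region, and the request pays the cross-region round trip.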




Getting a big data technology that provides two of the three (cost, scalable performance, and
       operational ease) can be challenging; finding one that supplies all three can be very hard.

NoSQL, Cassandra, and DataStax Enterprise for big data.




NoSQL is a broad class of next-generation database management
        systems that differ from the classic relational database management
        system (RDBMS) model in several significant ways. Most importantly, they:

         •       Use a less rigid, more dynamic data model
         •       Give users control over the trade-offs described by the CAP
                 theorem (consistency, availability, partition tolerance)
         •       Do not support ANSI SQL or relational operations such as joins
         •       Attempt to solve some or all of the challenges of big data
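In Cassandra, this user-controlled trade-off appears as a consistency level chosen per request; ONE, QUORUM, and ALL are three of the levels it supports. A standard way to reason about them is the overlap rule. Suppose there are N replicas, a read waits for R of them, and a write waits for W. If R + W > N, a read issued after a successful write contacts at least one replica that acknowledged that write.

The Python sketch below encodes only that rule for those three levels. It is a simplified model that ignores other levels, node failures, and repair, and the function names are hypothetical.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must respond at a given consistency level (simplified model)."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,  # majority of replicas
        "ALL": replication_factor,
    }
    if level not in levels:
        raise ValueError(f"level not modeled in this sketch: {level}")
    return levels[level]


def reads_see_latest_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when every read replica set must overlap every write replica set (R + W > N)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


for read_cl, write_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_cl, write_cl, reads_see_latest_write(read_cl, write_cl, replication_factor=3))
```

With N = 3, ONE/ONE favors latency and availability but does not guarantee overlap. QUORUM/QUORUM guarantees overlap and still tolerates one unavailable replica.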




A NoSQL solution like Apache Cassandra:
         •  Handles high-velocity data with ease
         •  Uses a schema that supports a broad variety of data
         •  Scales from gigabytes to petabytes with linear performance
         •  Is built to handle multi-location/multi-data-center use cases
         •  Is designed for continuous availability
         •  Offers quick installation and configuration for multi-node
            clusters
         •  Is open source and/or costs 80-90% less than RDBMSs
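Scaling out rests on partitioning: each node owns a range of a hash ring, and every row key hashes to a position on it. The Python sketch below is a simplified consistent-hashing model, assumed here for illustration rather than taken from Cassandra's source. It makes three simplifications:

* Tokens are MD5-based, in the spirit of Cassandra's RandomPartitioner.
* Each node has one evenly spaced token.
* Replicas go on the next nodes clockwise, as Cassandra's SimpleStrategy places them.

`TokenRing` and its methods are hypothetical names.

```python
import bisect
import hashlib


class TokenRing:
    """Simplified consistent-hash ring with one token per node."""

    RING_SIZE = 2 ** 127

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("need at least one node")
        self.nodes = list(nodes)
        step = self.RING_SIZE // len(self.nodes)
        # Evenly spaced, ascending tokens: node i owns the range ending at tokens[i] (wrapping).
        self.tokens = [i * step for i in range(len(self.nodes))]

    def token_for(self, key: str) -> int:
        # Hash the row key onto the ring (MD5, like RandomPartitioner; details differ).
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % self.RING_SIZE

    def replicas(self, key: str, replication_factor: int = 3) -> list:
        """First node whose token is >= the key's token (wrapping), then the next nodes clockwise."""
        rf = min(replication_factor, len(self.nodes))
        start = bisect.bisect_left(self.tokens, self.token_for(key)) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(rf)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("customer:42"))
```

Because tokens are spread evenly and the hash is roughly uniform, each of N nodes owns about 1/N of the keys, which is what lets capacity grow by adding nodes. Real clusters must also reassign token ranges when nodes join, which this sketch does not model.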




Overview of DataStax
        •  Founded in April 2010
        •  Commercial leader in Apache Cassandra™, the
           popular open-source “big data” database
        •  140+ customers
        •  40+ employees
        •  Home to the Apache Cassandra project chair and most
           committers
        •  Headquartered in the San Francisco Bay Area
        •  Funded by prominent venture firms




* Uses Cassandra and Hadoop for data management

Cassandra is:
    Nearly 4x better in writes
    Nearly 2x better in reads
    Over 12x better in reads/updates

    YCSB Benchmark
    Source: http://blog.cubrid.org/dev-platform/nosql-benchmarking/?utm_source=NoSQL+Weekly+List&utm_campaign=143fae86b2-NoSQL_Weekly_Issue_41_September_8_2011&utm_medium=email

Stores financial options tick data in Cassandra, using a very fluid data model
                 for storage and analysis.



“The hundreds of millions of web pages that contain this information are
                 stored in a multi-terabyte cache that grows continually as we crawl the
                 web, analyzing new pages and finding new versions of existing pages.” –
                 Zoominfo Architect on using Cassandra

“I can create a Cassandra cluster in any region of the world in 10
                 minutes. When marketing guys decide we want to move into a
                 certain part of the world, we’re ready.” - Netflix architect

DataStax Enterprise:
         •       Fully integrated smart big data platform
         •       Production-certified Cassandra
         •       Continuously available analytics with Hadoop
         •       Scalable enterprise search with Solr
         •       Built-in workload isolation
         •       No costly and error-prone ETL operations
         •       Easy migration of RDBMS and log data
         •       Simple to install and grow
         •       OpsCenter management solution
         •       80-90% lower cost than RDBMS vendors




•  DataStax OpsCenter is a visual management and
           monitoring solution for DataStax Enterprise
        •  Manage and monitor all Cassandra, Hadoop, and Solr
           operations
        •  Visual alerts and notifications




1.  Does it handle high data velocity?
        2.  Can it tackle all types of data?
        3.  How well does it perform with large data volumes?
        4.  Can it handle complex distribution and implementation
            use cases (e.g., on-premises/cloud, multi-geo)?
        5.  How well does it hit the big data “bull’s-eye” of cost,
            scalable performance, and operational ease?




DataStax Enterprise is tailor-made for big data use cases that involve high velocity,
       wide variety, large volume, and complex deployments.




Recommended Reading




                 http://www.datastax.com/resources/whitepapers

Next Steps
         Download DataStax Enterprise and try it in your
         own environment.

          ›  Go to www.datastax.com/software
          ›  Download a copy of DataStax Enterprise
          ›  Installs and configures in minutes
          ›  Completely free for development use




For More Information




Move Faster.




DRBD Deep Dive - Philipp Reisner - LINBIT
ShapeBlue180 views
Digital Personal Data Protection (DPDP) Practical Approach For CISOs by Priyanka Aash
Digital Personal Data Protection (DPDP) Practical Approach For CISOsDigital Personal Data Protection (DPDP) Practical Approach For CISOs
Digital Personal Data Protection (DPDP) Practical Approach For CISOs
Priyanka Aash158 views
Confidence in CloudStack - Aron Wagner, Nathan Gleason - Americ by ShapeBlue
Confidence in CloudStack - Aron Wagner, Nathan Gleason - AmericConfidence in CloudStack - Aron Wagner, Nathan Gleason - Americ
Confidence in CloudStack - Aron Wagner, Nathan Gleason - Americ
ShapeBlue130 views
Future of AR - Facebook Presentation by Rob McCarty
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook Presentation
Rob McCarty64 views
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... by Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker54 views
Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And... by ShapeBlue
Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And...Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And...
Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And...
ShapeBlue106 views
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti... by ShapeBlue
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...
ShapeBlue139 views
Keynote Talk: Open Source is Not Dead - Charles Schulz - Vates by ShapeBlue
Keynote Talk: Open Source is Not Dead - Charles Schulz - VatesKeynote Talk: Open Source is Not Dead - Charles Schulz - Vates
Keynote Talk: Open Source is Not Dead - Charles Schulz - Vates
ShapeBlue252 views
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R... by ShapeBlue
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...
ShapeBlue173 views
KVM Security Groups Under the Hood - Wido den Hollander - Your.Online by ShapeBlue
KVM Security Groups Under the Hood - Wido den Hollander - Your.OnlineKVM Security Groups Under the Hood - Wido den Hollander - Your.Online
KVM Security Groups Under the Hood - Wido den Hollander - Your.Online
ShapeBlue221 views

Top 5 Considerations for a Big Data Solution

  • 1. Top 5 Factors to Consider When Choosing a Big Data Solution
       Robin Schumacher, VP Products
  • 2.
      •  VP Products, DataStax
      •  Director of Product Management at MySQL, then EnterpriseDB
      •  VP Product Management at Embarcadero Technologies
      •  DBA with Oracle, Teradata, SQL Server, DB2, others…
      •  Database software reviewer for various magazines
      •  Author of 3 database books
  • 3.
      •  Define big data
      •  Identify the “must-haves” of a big data solution
      •  Discuss the difficulty of getting all of them, from both a business and a technical perspective
      •  Brief tour of NoSQL, Cassandra, and DataStax Enterprise
  • 4. What big data is and the domains of data that need to be considered.
  • 6. “Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.”
       “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”
       “Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”
       * All definitions have one thing in common: new technology is needed for big data…
  • 7.
      1.  Real-time – transactional, online, streaming, low-latency data
      2.  Analytic – data aggregated from real-time feeds or other sources; often batch in nature
      3.  Search – supporting data, both external and internal, used to locate desired information and/or objects (e.g. products, documents, etc.)
  • 8. Research by McKinsey & Company shows eye-opening differences in 10-year category growth rates between businesses that use their big data smartly and those that do not.
  • 9. What are the top five things to consider in a big data solution?
  • 11. The characteristics that define big data are:
      1.  Velocity – the speed at which data comes in, and the number of events/elements being stored
      2.  Variety – structured, semi-structured, and unstructured data
      3.  Volume – can amount to terabytes or petabytes of data
      4.  Complexity – typically the difficulty of distributing the data (e.g. across multiple data centers, the cloud, etc.) and of managing data traffic/movement (e.g. ETL, migrations, etc.)
  • 12.
      •  Data has a high rate of input
      •  Data has a large number of elements/events
          • Sensor data
          • Media streaming
          • Mobile devices
          • Financial streams
          • Web clickstream
          • Traffic monitoring
          • Patient care
  • 13.
      •  Includes structured, semi-structured, and unstructured data
      •  Necessitates new data models and file formats
      •  Involves real-time, analytic, and search data
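One way to picture the flexible data model that variety calls for is the column-family layout Cassandra used at the time. In that layout, each row key maps to its own sorted set of columns, so rows in the same table do not have to share a fixed schema. The sketch below models this with plain Python dictionaries. The `ColumnFamily` class and its methods are hypothetical illustrations, not a Cassandra or DataStax API.

```python
from collections import defaultdict
from typing import Any, Dict


class ColumnFamily:
    """Hypothetical in-memory stand-in for a column family:
    row key -> {column name -> value}. Rows are sparse, so each
    row may carry a different set of columns."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Any]] = defaultdict(dict)

    def insert(self, row_key: str, **columns: Any) -> None:
        # Writing a column adds or overwrites it in that row only;
        # no table-wide schema change is needed.
        self._rows[row_key].update(columns)

    def get(self, row_key: str) -> Dict[str, Any]:
        # Return the row's columns ordered by name, mirroring the
        # sorted-column layout of the model.
        return dict(sorted(self._rows.get(row_key, {}).items()))


cf = ColumnFamily()
cf.insert("user:1", name="Ada", email="ada@example.com")           # structured
cf.insert("user:2", name="Lin", tags="mobile,beta", bio_len=512)   # different columns
cf.insert("doc:9", body="<html>...</html>")                       # unstructured blob
```

Here `user:1` never had to declare `tags` or `bio_len`. That column-level freedom is the point of the model, as opposed to an `ALTER TABLE` in a relational schema.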
  • 14.
      •  Terabytes to petabytes
      •  Also involves data maintenance functions (e.g. purging, etc.)
  • 15. The McKinsey report found that the average investment firm with fewer than 1,000 employees has 3.8 petabytes of data stored, experiences a data growth rate of 40 percent per year, and stores structured, semi-structured, and unstructured data. Overall, McKinsey found that 15 out of 17 industry sectors in the United States have more data stored per company than the U.S. Library of Congress (which held 235 terabytes of information at the time of McKinsey’s study).
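The short calculation below makes the growth figure concrete. It compounds the reported 3.8 PB at 40 percent per year and compares the result with the 235 TB Library of Congress figure. It assumes decimal units (1 PB = 1,000 TB) and a constant growth rate, which is a simplification. The projection is illustrative arithmetic on the cited numbers, not a result from the McKinsey report.

```python
import math

START_PB = 3.8     # average stored data cited in the McKinsey report
GROWTH = 0.40      # 40 percent annual growth rate cited in the report
LOC_TB = 235       # Library of Congress holdings cited in the report
TB_PER_PB = 1000   # assumption: decimal units


def projected_pb(years: int) -> float:
    """Stored data after `years` of constant compound growth."""
    return START_PB * (1 + GROWTH) ** years


def doubling_time_years() -> float:
    """Years needed for the data to double at a constant growth rate."""
    return math.log(2) / math.log(1 + GROWTH)


def multiples_of_loc(pb: float) -> float:
    """How many Library-of-Congress-sized collections fit in `pb` petabytes."""
    return pb * TB_PER_PB / LOC_TB


for year in (0, 1, 3, 5):
    pb = projected_pb(year)
    print(f"year {year}: {pb:6.2f} PB = {multiples_of_loc(pb):6.1f}x LoC")
print(f"doubling time: {doubling_time_years():.2f} years")
```

With these inputs, the starting 3.8 PB is already about 16 times the Library of Congress figure. The data roughly doubles every 2.06 years, which puts the average firm at about 20.4 PB after five years.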
  • 16.
      •  Typically involves data distribution, movement, etc., across multiple data centers and geographies
      •  Can be on-premises, cloud, or hybrid
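In Cassandra, multi-data-center distribution is usually expressed as a per-data-center replication factor, configured through the `NetworkTopologyStrategy` replication strategy. The sketch below works through the arithmetic for a hypothetical three-data-center layout. For each data center it computes how many replicas a `LOCAL_QUORUM` read or write needs and how many replica failures that level can tolerate. It also reports how many copies of each row exist in total. The data-center names and replication factors are assumptions chosen for illustration.

```python
from typing import Dict

# Hypothetical replication settings: data center name -> replication factor.
REPLICATION: Dict[str, int] = {"us_east": 3, "eu_west": 3, "ap_south": 2}


def quorum(rf: int) -> int:
    """Majority of `rf` replicas: floor(rf / 2) + 1."""
    return rf // 2 + 1


def summarize(replication: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Per data center: replicas needed for a local quorum, and how many
    replicas may be down while that quorum is still reachable."""
    return {
        dc: {"rf": rf, "local_quorum": quorum(rf), "tolerated_down": rf - quorum(rf)}
        for dc, rf in replication.items()
    }


total_copies = sum(REPLICATION.values())
for dc, info in summarize(REPLICATION).items():
    print(dc, info)
print("total copies of each row:", total_copies)
```

With a replication factor of 2 in a data center, a local quorum needs both replicas, so that data center tolerates no replica failures at that level. This is one reason a replication factor of 3 per data center is a common choice.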
  • 17. Getting a big data technology that provides two out of three can be challenging; finding one that supplies all three can be very hard.
  • 18. NoSQL, Cassandra, and DataStax Enterprise for big data.
  • 19. NoSQL is a broad class of next-generation database management systems that differ from the classic relational database management system (RDBMS) model in some significant ways, the most important being that they:
      •  Sport a less rigid, more dynamic data model
      •  Aim to give users control over the trade-offs described by the CAP theorem
      •  Do not support ANSI SQL or operations such as joins
      •  Attempt to solve some or all of the challenges of big data
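In Cassandra, the CAP trade-off appears as tunable consistency: each read and write specifies how many of the N replicas must respond. A common rule of thumb is that if R + W > N, every read touches at least one replica that acknowledged the latest write, which favors consistency. Smaller values of R and W favor availability and latency instead. The sketch below checks that overlap condition in closed form and by brute force. It ignores node failures, hinted handoff, and timestamp-based conflict resolution, so treat it as a simplified model rather than Cassandra's actual behavior.

```python
from itertools import combinations


def overlap_guaranteed(n: int, r: int, w: int) -> bool:
    """Closed-form rule: any R read replicas and any W write replicas
    out of N must share at least one node when R + W > N."""
    return r + w > n


def overlap_by_enumeration(n: int, r: int, w: int) -> bool:
    """Brute-force check: every possible read set intersects every
    possible write set."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


N = 3
for r, w, label in [(1, 1, "ONE/ONE"), (2, 2, "QUORUM/QUORUM"), (1, 3, "ONE/ALL")]:
    print(f"{label:14} R={r} W={w} read sees latest write: {overlap_guaranteed(N, r, w)}")
```

With three replicas, `QUORUM` reads and writes (2 + 2 > 3) always overlap. `ONE`/`ONE` (1 + 1 ≤ 3) trades that guarantee for lower latency and higher availability.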
  • 20. A NoSQL solution like Apache Cassandra:
      •  Handles high-velocity data with ease
      •  Uses a schema that supports a broad variety of data
      •  Scales from gigabytes to petabytes with linear performance
      •  Is built to handle multi-location/multi-data-center use cases
      •  Is designed for continuous availability
      •  Offers quick installation and configuration for multi-node clusters
      •  Is open source and/or costs 80-90% less than RDBMSs
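One mechanism behind adding capacity node by node is that Cassandra spreads rows around a hash ring. Each node owns a range of token values, and a row's partition key is hashed to a token. The row's replicas then go to the next nodes clockwise on the ring. The sketch below is a deliberately simplified model of that idea: one token per node, SHA-256 from Python's standard library as a stand-in hash, and replicas placed on the next nodes in ring order. It is not Cassandra's partitioner or replica-placement code, and the node names are hypothetical. The key property it demonstrates is that adding a node changes the primary owner only for keys in the ring segment the new node takes over, rather than reshuffling every key.

```python
import bisect
import hashlib
from typing import Dict, List


def token(key: str) -> int:
    """Hash a key onto the ring (SHA-256 as a stand-in partitioner hash)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Simplified hash ring: one token per node, replicas on the next
    nodes clockwise. Not Cassandra's actual placement code."""

    def __init__(self, nodes: List[str], replication_factor: int = 3):
        self.rf = replication_factor
        # Sorted (token, node) pairs, so walking the list walks the ring clockwise.
        self._ring = sorted((token(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        # The new node claims only the segment ending at its own token.
        bisect.insort(self._ring, (token(node), node))

    def replicas(self, key: str) -> List[str]:
        """Primary owner is the first node whose token is >= the key's token
        (wrapping past the end of the ring); the others follow clockwise."""
        tokens = [t for t, _ in self._ring]
        start = bisect.bisect_left(tokens, token(key)) % len(self._ring)
        count = min(self.rf, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
keys = [f"user:{i}" for i in range(1000)]
before: Dict[str, str] = {k: ring.replicas(k)[0] for k in keys}
ring.add_node("node-e")
after: Dict[str, str] = {k: ring.replicas(k)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Every key whose primary owner changed is now owned by the new node.
print(len(moved), "of", len(keys), "keys changed primary owner")
```

Real clusters refine this basic scheme to balance load more evenly, for example by giving each node many tokens (virtual nodes).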
  • 21. Overview of DataStax
      •  Founded in April 2010
      •  Commercial leader in Apache Cassandra™, the popular open-source “big data” database
      •  140+ customers
      •  40+ employees
      •  Home to the Apache Cassandra project chair and most committers
      •  Headquartered in the San Francisco Bay Area
      •  Funded by prominent venture firms
  • 22. * Uses Cassandra and Hadoop for data management
  • 23. Cassandra is:
      •  Nearly 4x better in writes
      •  Nearly 2x better in reads
      •  Over 12x better in reads/updates
      YCSB Benchmark
      Source: http://blog.cubrid.org/dev-platform/nosql-benchmarking/?utm_source=NoSQL+Weekly+List&utm_campaign=143fae86b2-NoSQL_Weekly_Issue_41_September_8_2011&utm_medium=email
  • 24. Stores financial options tick data in Cassandra, using a very fluid data model for storage and analysis.
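The slide does not show the customer's data model. A common way to store tick data in Cassandra, however, is to bucket it. The partition key combines the instrument symbol with a time bucket (here, one trading day), so no single partition grows without bound. Within a partition, ticks are kept sorted by timestamp so that time-range reads are a contiguous scan. The sketch below mimics that layout in memory. The daily bucket, the field names, the option symbol, and the `TickStore` class are all assumptions, not the customer's actual schema or a Cassandra API.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

# Hypothetical row layout: (timestamp, price, size).
Tick = Tuple[datetime, float, int]


class TickStore:
    """In-memory model of a bucketed time-series table:
    partition key = (symbol, trading day); rows clustered by timestamp."""

    def __init__(self) -> None:
        self._partitions: Dict[Tuple[str, str], List[Tick]] = defaultdict(list)

    @staticmethod
    def partition_key(symbol: str, ts: datetime) -> Tuple[str, str]:
        # One partition per symbol per day keeps each partition bounded in size.
        return (symbol, ts.date().isoformat())

    def write(self, symbol: str, ts: datetime, price: float, size: int) -> None:
        # Insert in timestamp order, even if ticks arrive out of order.
        rows = self._partitions[self.partition_key(symbol, ts)]
        bisect.insort(rows, (ts, price, size))

    def read_range(self, symbol: str, start: datetime, end: datetime) -> List[Tick]:
        """Ticks with start <= ts < end; assumes start and end fall on the
        same day, so the query touches a single partition."""
        rows = self._partitions.get(self.partition_key(symbol, start), [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]


store = TickStore()
t0 = datetime(2012, 6, 1, 14, 30, tzinfo=timezone.utc)
store.write("XYZ-C100", t0 + timedelta(seconds=2), 2.20, 10)
store.write("XYZ-C100", t0, 2.15, 25)                     # arrives late
store.write("XYZ-C100", t0 + timedelta(days=1), 2.40, 5)  # next day's partition
print(store.read_range("XYZ-C100", t0, t0 + timedelta(seconds=5)))
```

The late-arriving tick still comes back in timestamp order, and the next day's tick lands in a separate partition. The bucket size is the main design knob: smaller buckets keep partitions small, while larger ones let a single read cover a longer time span.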
  • 25. “The hundreds of millions of web pages that contain this information are stored in a multi-terabyte cache that grows continually as we crawl the web, analyzing new pages and finding new versions of existing pages.” – Zoominfo architect on using Cassandra
  • 26. “I can create a Cassandra cluster in any region of the world in 10 minutes. When marketing guys decide we want to move into a certain part of the world, we’re ready.” – Netflix architect
  • 27.
      •  Fully integrated smart big data platform
      •  Production-certified Cassandra
      •  Continuously available analytics with Hadoop
      •  Scalable enterprise search with Solr
      •  Built-in workload isolation
      •  No costly and error-prone ETL operations
      •  Easy migration of RDBMS and log data
      •  Simple to install and grow
      •  OpsCenter management solution
      •  80-90% lower cost than RDBMS vendors
  • 28.
      •  DataStax OpsCenter is a visual management and monitoring solution for DataStax Enterprise
      •  Manage and monitor all Cassandra, Hadoop, and Solr operations
      •  Visual alerts and notifications
  • 29.
      1.  Does it handle high data velocity?
      2.  Can it tackle all types of data?
      3.  How well does it perform with large data volumes?
      4.  Can it handle complex distribution and implementation use cases (e.g. on-premises/cloud, multi-geo)?
      5.  How does it stack up in hitting the big data “bull’s-eye” where cost, scalable performance, and operational ease are concerned?
  • 30. DataStax Enterprise is tailor-made for high-velocity, multi-variety, large-volume, and complex deployment use cases that involve big data.
  • 31. Recommended Reading: http://www.datastax.com/resources/whitepapers
  • 32. Next Steps
      Download DataStax Enterprise and try it in your own environment.
      ›  Go to www.datastax.com/software
      ›  Download a copy of DataStax Enterprise
      ›  Installs and configures in minutes
      ›  Completely free for development use