
The Top 5 Factors to Consider When Choosing a Big Data Solution

Published in: Technology, Business

  • Transcript of "The Top 5 Factors to Consider When Choosing a Big Data Solution"

    1. Top 5 Factors to Consider When Choosing a Big Data Solution
       Robin Schumacher, VP Products
       ©2012 DataStax
    2. • VP Products, DataStax
       • Director of Product Management at MySQL, then EnterpriseDB
       • VP Product Management at Embarcadero Technologies
       • DBA with Oracle, Teradata, SQL Server, DB2, others…
       • Database software reviewer for various magazines
       • Author of 3 database books
    3. Overview of DataStax
       • Founded in April 2010
       • Commercial leader in Apache Cassandra™, the popular open-source “big data” database
       • 140+ customers
       • 40+ employees
       • Home to Apache Cassandra Chair & most committers
       • Headquartered in San Francisco Bay area
       • Funded by prominent venture firms
    4. • Define big data
       • Identify “must-haves” of a big data solution
       • Discuss the difficulty of getting all of them from a business and technical perspective
       • Brief tour of NoSQL, Cassandra, and DataStax Enterprise
    5. What big data is and the domains of data that need to be considered.
    6. [No text on this slide]
    7. “Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.”
       “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”
       “Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”
       * All definitions have one thing in common: new technology is needed for big data…
    8. 1. Real-time – transactional, online, streaming, low-latency data
       2. Analytic – aggregated data from real-time feeds or other sources; often batch in nature
       3. Search – supporting data, both external and internal, used for locating desired information and/or objects (e.g. products, documents, etc.)
    9. Research by McKinsey & Company shows eye-opening differences in 10-year category growth rates between businesses that use their big data smartly and those that do not.
    10. What are the top five things to consider in a big data solution?
    11. [No text on this slide]
    12. The characteristics that define big data are:
        1. Velocity – includes the speed at which data comes in, and the number of events/elements being stored
        2. Variety – involves structured, semi-structured, and unstructured data
        3. Volume – can equate to terabytes to petabytes of data
        4. Complexity – typically entails the difficulty of distributing the data (e.g. multi-data centers, cloud, etc.) and managing the data traffic/movement (e.g. ETL, migrations, etc.)
    13. • Data has a high rate of input
        • Data has a large quantity of elements/events
          • Sensor data
          • Media streaming
          • Mobile devices
          • Financial streams
          • Web clickstream
          • Traffic monitoring
          • Patient care
    14. • Includes structured, semi-structured, and unstructured data
        • Necessitates new data models and file formats
        • Involves real-time, analytic, and search data
    15. • TBs to PBs
        • Also involves data maintenance functions (e.g. purging, etc.)
    16. The McKinsey report found that the average investment firm with fewer than 1,000 employees has 3.8 petabytes of data stored, experiences a data growth rate of 40 percent per year, and stores structured, semi-structured, and unstructured data. Overall, McKinsey found that 15 out of 17 industry sectors in the United States have more data stored per company than the U.S. Library of Congress (which had 235 terabytes of information at the time of McKinsey’s study).
    17. • Typically involves data distribution, movement, etc., across multiple data centers and geographies
        • Can be on-premise, cloud, or hybrid
    18. Getting a big data technology that provides two out of three can be challenging; finding one that supplies all three can be very hard.
    19. NoSQL, Cassandra, and DataStax Enterprise for big data.
    20. NoSQL is a broad class of next-generation database management systems that differ from the classic model of the relational database management system (RDBMS) in some significant ways, the most important being that they:
        • Sport a less rigid, more dynamic data model
        • Look to provide user-controlled trade-offs around the CAP theorem
        • Do not support ANSI SQL or operations such as joins
        • Attempt to solve some or all of the challenges of big data
    21. A NoSQL solution like Apache Cassandra:
        • Handles high-velocity data with ease
        • Uses schemas that support broad varieties of data
        • Scales from GBs to PBs with linear performance capabilities
        • Is built to handle multi-location/data center use cases
        • Is designed for continuous availability
        • Offers quick installation and configuration for multi-node clusters
        • Is open source and/or costs 80-90% less than an RDBMS
    22. * Uses Cassandra and Hadoop for data management
    23. Cassandra is:
        • Nearly 4x better in writes
        • Nearly 2x better in reads
        • Over 12x better in reads/updates
        YCSB Benchmark. Source: http://blog.cubrid.org/dev-platform/nosql-benchmarking/?utm_source=NoSQL+Weekly+List&utm_campaign=143fae86b2-NoSQL_Weekly_Issue_41_September_8_2011&utm_medium=email
    24. “Cassandra was just a better design all around – more truly horizontally scalable and with less management overhead – and there’s no single point of failure. I looked at Cassandra’s architecture and thought, ‘Yeah, that’s how you do it.’” - Matt Conway, VP of Engineering
    25. “The hundreds of millions of web pages that contain this information are stored in a multi-terabyte cache that grows continually as we crawl the web, analyzing new pages and finding new versions of existing pages.” – Zoominfo Architect on using Cassandra
    26. “I can create a Cassandra cluster in any region of the world in 10 minutes. When marketing guys decide we want to move into a certain part of the world, we’re ready.” - Netflix architect
    27. • Fully integrated smart big data platform
        • Production-certified Cassandra
        • Continuously available analytics with Hadoop
        • Scalable enterprise search with Solr
        • Built-in workload isolation
        • No costly and error-prone ETL operations
        • Easy migration of RDBMS and log data
        • Simple to install and grow
        • OpsCenter management solution
        • 80-90% less cost than RDBMS vendors
    28. DataStax Enterprise Server: No ETL and Built-in Workload Isolation
        • Data written to any node is automatically and transparently written to all other nodes.
        • Mixed workload management is automatic; real-time, analytic, and search workloads/nodes do not compete for compute or data resources with other nodes.
        [Diagram label: ETL Staff / Processes]
    29. DataStax Enterprise Server: Multi-Data Center and Cloud Capable
        • Built-in capabilities to maintain the same database cluster across many different data centers
        • Easily handles on-premise data center and cloud use cases
        [Diagram labels: Data Center 1, Data Center 2]
    30. • DataStax OpsCenter is a visual management and monitoring solution for DataStax Enterprise
        • Manage and monitor all Cassandra, Hadoop, and Solr operations
        • Visual alerts and notifications
    31. 1. Does it handle high data velocity?
        2. Can it tackle all types of data?
        3. How well does it perform with large data volumes?
        4. Can it handle complex distribution and implementation use cases (e.g. on-premise/cloud, multi-geo)?
        5. How does it stack up in hitting the big data “bull's-eye” (i.e. where cost, scalable performance, and operational ease are concerned)?
    32. DataStax Enterprise is tailor-made for high-velocity, multi-variety, large-volume, and complex deployment use cases that involve big data.
    33. Recommended Reading: http://www.datastax.com/resources/whitepapers
    34. Next Steps: Download DataStax Enterprise and try it in your own environment.
        • Go to www.datastax.com/download
        • Download a copy of DataStax Enterprise
        • Installs and configures in minutes
        • Completely free for development use
    35. For More Information
    36. Move Faster.
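
Slide 16 quotes a starting volume of 3.8 petabytes and a growth rate of 40 percent per year. The arithmetic behind those figures can be sketched as follows. This is only an illustration: it assumes the growth compounds once a year at a constant rate and that 1 PB equals 1,000 TB, neither of which the slide states, and the function name is made up for this example.

```python
# Figures taken from slide 16 of the transcript.
START_PB = 3.8            # average data stored per firm, in petabytes
ANNUAL_GROWTH = 0.40      # 40 percent per year
LIBRARY_OF_CONGRESS_TB = 235

# Assumption (not from the slides): decimal units, 1 PB = 1,000 TB.
TB_PER_PB = 1000


def projected_petabytes(start_pb: float, annual_growth: float, years: int) -> float:
    """Compound growth: start * (1 + rate) ** years (hypothetical helper)."""
    return start_pb * (1.0 + annual_growth) ** years


# How many times larger the firm's data is than the Library of Congress figure.
ratio_to_library = START_PB * TB_PER_PB / LIBRARY_OF_CONGRESS_TB

if __name__ == "__main__":
    for year in range(6):
        volume = projected_petabytes(START_PB, ANNUAL_GROWTH, year)
        print(f"year {year}: {volume:.1f} PB")
    print(f"ratio to Library of Congress: {ratio_to_library:.1f}x")
```

Under these assumptions the stored volume roughly doubles every two years, which is why the volume question on slide 31 is about sustained growth and not only about present size.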
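
Slide 20 mentions user-controlled trade-offs around the CAP theorem without saying how the control works. In Dynamo-style stores such as Cassandra, one common form is choosing how many replicas must acknowledge each read and write. The sketch below shows the usual overlap rule (a read is guaranteed to reach at least one replica holding the latest acknowledged write when R + W > N). The function names are hypothetical, and the code models only the counting rule, not any real driver API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def reads_overlap_writes(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    return read_acks + write_acks > replication_factor


if __name__ == "__main__":
    n = 3  # assumed replication factor, chosen only for illustration
    q = quorum(n)
    # Labels mirror Cassandra's consistency level names ONE, QUORUM, and ALL.
    settings = [
        ("write ONE / read ONE", 1, 1),
        ("write QUORUM / read QUORUM", q, q),
        ("write ALL / read ONE", n, 1),
    ]
    for label, w, r in settings:
        print(f"{label}: overlap guaranteed = {reads_overlap_writes(n, w, r)}")
```

Lower acknowledgement counts favor latency and availability; higher counts favor consistency. That is the trade-off the slide says is left to the user.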
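
The five questions on slide 31 can be turned into a simple comparison sheet. The sketch below is one possible way to do that, not a method from the deck: the 1-to-5 rating scale, the equal weighting, the factor keys, and the two candidate rating sets are all hypothetical values invented for illustration.

```python
from typing import Mapping

# One key per question on slide 31; the key names are made up for this sketch.
FACTORS = ("velocity", "variety", "volume", "complexity", "bulls_eye")


def score_solution(ratings: Mapping[str, int]) -> float:
    """Average a 1-5 rating across the five factors (equal weights assumed)."""
    missing = [factor for factor in FACTORS if factor not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")
    for factor in FACTORS:
        if not 1 <= ratings[factor] <= 5:
            raise ValueError(f"rating for {factor} must be between 1 and 5")
    return sum(ratings[factor] for factor in FACTORS) / len(FACTORS)


# Hypothetical ratings; they do not describe any real product.
candidate_a = {"velocity": 5, "variety": 4, "volume": 5, "complexity": 4, "bulls_eye": 3}
candidate_b = {"velocity": 3, "variety": 4, "volume": 3, "complexity": 2, "bulls_eye": 4}

if __name__ == "__main__":
    for name, ratings in (("candidate_a", candidate_a), ("candidate_b", candidate_b)):
        print(f"{name}: {score_solution(ratings):.1f}")
```

Equal weights keep the sketch short; a team that cares most about, say, multi-data-center deployment would weight the complexity factor more heavily.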