Exploring Big Data value for your business

Slides from our webinar on "Exploring Big Data value for your business."

Delivered by Andy Ormsby. V

Published in: Technology, Business
  • Slide 4: Analyst firms appear to be fighting with each other to come up with the most Vs to define Big Data. “Big” seems to imply the size of the dataset, but there’s more to it than that.
  • Slide 5: There’s nothing new about huge datasets; plenty of people have been working with big data sets for years:
      Seismic surveys: datasets in the PB range, very high-end supercomputing hardware
      Weather: ECMWF has supercomputers with > PB disk storage
      HEP: 15PB/year for CERN HB project
    The rest of us can’t afford supercomputers.
  • Slide 6: But we all have a challenge with datasets that grow faster than IT budgets (these numbers are from McKinsey and are probably optimistic w.r.t. IT budgets).
  • Slide 7: So maybe Big Data is really when we have one of these two things... and we’ve not already solved the problem. Perhaps there’s a silent [New] in “Big Data” :)
  • Slide 8: Here are some new datasets that typify both the challenge and the opportunity of big data.
    So we’ve explored what Big Data might be. Let’s move on to look at why the Big Data hype is happening now.
  • Slide 10: Disks got cheaper!
    http://everyjoe.com/technology/hard-drive-cost-per-gigabyte-from-1980-to-2009/
  • Slide 11: Exponential drop in price (1GB cost around $200K in 1980). Today, I can buy SATA disks at 4p/GB.
  • Slide 12: Basic economics. Reduce the price and the demand goes up.
  • Slide 13: With huge reductions in cost and waves of commoditisation, the scene is set for repeated disruptive innovation, not just in storage technology itself but in the products and services that rely on it.
  • Slide 15: What’s big data about? We’re looking to get insight from data. Data trumps intuition and common sense every time (funny anecdotal examples). More data means better decisions, based on fact not folklore.
  • Slide 17: Odd-coloured cars are more reliable.
    Vegetarians are less likely to miss flights.
    Computing hardware doesn’t fail at high temperatures as thought, but changing temperatures kill it.
    A person who’s just viewed a particular web page is more likely to buy product X.
  • Slide 19: RDBMS. Ted Codd, 1970, IBM. System/R, DB2, Oracle...
    By the late 1980s, it was the standard. Usurpers (e.g. object-oriented databases) failed to gain significant market share; hierarchical (IMS, developed for Apollo) and network databases (CODASYL) pretty much disappeared.
    However, while RDBMSs have become the default choice, they aren’t necessarily the best for some Big Data use cases. Some problems:
  • Slide 20: Problem 1: Performance. Dealing with time-series data is a common Big Data use case. We’re not looking to do complex transactions, but we need to store the data so we can access it for analytics etc. Relational databases do not handle this well.
    I’ve been investigating the performance of our “big vendor” RDBMS to hold months of sensor data. So far, the results are not really encouraging. I observe an exponential performance drop on the single-index (PK) table holding the data as more records are added. Here’s a plot of the performance of 5K records as records are continuously added to the table. Record addition is done with 5 parallel client threads, each inserting 1K records in batch mode. The client is an optimized Java app, using raw JDBC for the batch inserts. I haven’t found a faster way than that to add records to the relational DB.
  • Slide 21: Problem 2: Increasingly we’d like to scale out rather than scale up. Why? (i) incremental capacity (and cost) management; (ii) availability; (iii) distribution; (iv) potential cloud deployment (onto relatively small machines).
    Relational DBs tend to push towards a single big machine, or a tightly coupled cluster with expensive hardware like InfiniBand or SAN storage.
  • Slide 22: From Google via Yahoo. Not really a database, but provides a distributed filesystem intended to store large files, with no schema.
    Not trivial to set up, but the tools are getting better: you no longer need to write Java code to do queries. See Hive (HQL) for SQL-like access.
    But it’s still batch, and plenty of the time you want real-time.
  • Slide 23: We’re looking to act on the insights that the data bring. If we don’t act, we’re just observing. But action is often time-critical; the world is changing (e.g. we’re monitoring an oil well, trading financial instruments, trying to understand the behaviour of lots of people), and insights from yesterday’s data are historical documents: interesting, perhaps, but not great as a guide to action.
  • Slide 24: Some concrete examples of things that people we’re working with are interested in capturing.
  • Slide 25: Lots of databases.
  • Slide 26: Lots of different kinds of databases with different goals in mind.
    One way to view them is to see where they sit in CAP terms (Brewer’s CAP theorem).
    Many different data models.
  • Slide 28: Picking the right solution (or combination) can deliver significant cost savings, increased capability and allow granular growth over time.
  • Slide 30: Paradoxically for “big data”, consider starting with something small.
    Let's look at the other items...
  • Slide 33: How can we help?
    Acunu Activate: A focused package of work to help you discover big data opportunities and understand how to exploit them.
    Acunu Reflex: A fully supported distributed database to support real-time big data use cases.
    Acunu Analytics: Currently in preview, launching soon; provides real-time results for queries that would normally be costly to compute.
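The gap described in the note for slide 6 compounds quickly. As a rough sketch only, assuming the McKinsey figures on that slide (40% data growth and 5% IT spending growth per year) stayed constant year after year, which the slide itself does not claim, the amount of data per unit of budget after n years would be (1.40 / 1.05)^n. The function name and the five-year horizon are illustrative choices, not part of the deck.

```python
def growth_gap(years, data_growth=0.40, budget_growth=0.05):
    """Return data volume relative to IT budget after `years`.

    Both quantities start at 1.0 and are assumed (hypothetically) to grow
    at a constant annual rate.
    """
    data = (1 + data_growth) ** years
    budget = (1 + budget_growth) ** years
    return data / budget


if __name__ == "__main__":
    for year in range(6):
        print(f"year {year}: data per unit of budget = {growth_gap(year):.2f}x")
```

Under these assumptions, the data held per unit of budget roughly quadruples within five years, which is why cost per stored gigabyte matters so much in the rest of the deck.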
  • Exploring Big Data value for your business

    1. Business value from Big Data
    2. Agenda: What’s Big Data? • Why now? • Why care? • Key technologies • How can I get started?
    3. Agenda: What’s Big Data? • Why now? • Why care? • Key technologies • How can I get started?
    4. What’s Big Data? Volume • Velocity • Variety • Value
    5. Large datasets
    6. Challenge: 40% growth of data generated per year vs. 5% growth of IT spending per year. Source: McKinsey
    7. Maybe Big Data is... When any of volume, velocity, variety, value (cost?) becomes a problem • When new use cases emerge, new things become possible, because of new data sources
    8. For example: US cell updates: 600B/day • Items shared, social media: 4B/day • Smart meter readings, 2015: 29B/day
    9. Agenda: What’s Big Data? • Why now? • Why care? • Key technologies • How can I get started?
    10. Why now?
    11. Cost per gigabyte. [Chart: $ per GB on a log axis, 1992 to 2008; labelled points $569 and $0.13.] Source: Deloitte
    12. Guess what?
    13. Disruptive innovation
    14. Agenda: What’s Big Data? • Why now? • Why care? • Key technologies • How can I get started?
    15. Why care? “Companies that can harness big data will trample data incompetents” (The Economist, May 26th 2011)
    16. Why care - take 2: The competition will do it (and you’ll get fired) • Competitive advantage to be gained by doing it well (you get promoted) • It’s not hard to get started (no need for huge investment)
    17. What are we looking for? Data / Information • Insights • Actionable intelligence
    18. Agenda: What’s Big Data? • Why now? • Why care? • Key technologies • How can I get started?
    19. Databases: “A Relational Model of Data for Large Shared Data Banks”, Ted Codd, CACM, June 1970. Image: IBM
    20. Big = Slow? Throughput (records/ms) falls as datasets get larger. [Chart: throughput against records (in millions), 0 to 100.] Source: Gerard Maas, http://www.gerardmaas.net/2011/06/bigdata-on-rdbms
    21. Scale-out versus Scale-up
    22. Hadoop: Great for unstructured data or arbitrary queries • MapReduce framework for distributed compute • Tools now making it accessible • Still essentially a batch processing system
    23. What about real-time?
    24. Use cases: Tracking trending topics on social media • Network and infrastructure monitoring • Web and ad analytics dashboard and platforms • Real-time A-B testing • User profiling
    25. NOSQL: Voldemort
    26. No “one size fits all”: Column DBs and Key-Value stores • Document databases • Graph databases. [Diagram: triangle with corners labelled C, A and P.]
    27. Questions to ask: Who uses it? • Who can support it? Where are they? • How does it scale? Perform? • Maturity, both DB and tool ecosystem
    28. Changing economics. XDR metadata, Oracle / NetApp: 30 days of SMS; at capacity ceiling. XDR metadata, 30 x $3k Dell servers: 1/5th TCO of alternatives; cost grows predictably.
    29. Agenda: What’s Big Data? • Why now? • Why care? • What’s the new technology good for? • How can I get started?
    30. Start small: Identify data sources • Look at capabilities • Run experiments, PoCs
    31. Data sources: Web, SCM, Retail • Location Services • Infra Monitoring • Smart Metering • Oil/Gas Sensors • Ad Marketplaces • Fraud Detection • Social Media
    32. Capabilities: Open source, supported, or “packaged” solution? • How do “commodity” servers fit your infrastructure? • Don’t rule out Cloud deployments to get quick answers
    33. Acunu (two product sheets shown side by side)

        Discover the Potential of Real Time Big Data with Acunu Activate

        Every CIO, Architect and Analyst knows of existing data with huge untapped potential within their organisation. Evolving Big Data technologies provide new paths to revenue with both customers and prospects. Acunu partners with you to deliver competitive advantage by capturing data and exploring its benefits. You’ll validate the value of Big Data by building real applications and dashboards to drive new value for your business.

        At the outset, we work with you to identify and develop use cases and areas where Big Data tools could be utilised to add significant business value. We work with you to recommend solutions architectures for your specific use cases.

        We then deploy Acunu Reflex in your own data center or in the cloud and can include Apache Hadoop for investigative work and Acunu Analytics for real-time decision support. Once the software is installed, we work with you to integrate, capture and store sources of data from inside your organisation. We provide hands-on assistance to help you showcase the business value of your data through live proof-of-concept applications. You’ll get results quickly, with successive iterations delivering ongoing value.

        As a result, you gain an understanding of Big Data’s transformative capability through working demonstrations and have a clear route to deliver that competitive advantage to your business.

        “Key business and technology trends are disrupting the traditional data management and processing landscape. Data analysis is increasingly being viewed as a competitive advantage. An increasingly sensor-enabled and instrumented business environment is generating huge volumes of data… Traditional IT infrastructure is simply not able to meet the demands of this new situation.” (Gartner)

        A Comprehensive Big Data Discovery Package
        • Workshops: Acunu Specialist delivers workshops and provides on-demand consulting to enable your development team to build Big Data applications.
        • Structure & Planning: A dedicated Project Lead will keep the project on track through kickoff, reviews and regular calls. Progressively, we’ll help you plan next steps.
        • Ecosystem of Expertise: Acunu’s Big Data expertise is complemented through our partners. Together we will build your own Big Data ecosystem.
        • Deployment: We deploy Acunu’s database and storage software, complete with management tools, to your own hardware or to Amazon’s public cloud.
        • Data Source Integration: We work with you to integrate sources of log, clickstream, sensor, monitoring or similar data into Acunu Reflex.
        • Support & Training: We deliver hands-on training on the Acunu Reflex infrastructure to your operations staff, and provide support throughout the project.

        Acunu Reflex Makes Big Data results easy, economic and fast

        • Zero to Big Data Hero: Build a Big Data database cluster on commodity hardware in hours, not days.
        • Save Money versus Open Source Alternatives: Save up to 60% on hardware and operation costs.
        • Database lag getting you down? Milliseconds turning into minutes?

        What is Acunu Reflex?

        Easy: Acunu provides an integrated suite of technologies to support rapid development and deployment of your Big Data applications. Getting started is easy with a single, fast installation, handling all the details usually associated with OS tuning, storage optimization, database integration and management. This alleviates the complexity of NoSQL development, deployment and support. The platform is flexible and scalable, providing simple, one-click deployment. Scale linearly with ease and deploy across numerous machines within a data center or across a globally distributed public or private cloud.

        Economic: Acunu’s subscription-based pricing model ensures continuous value, skipping charges for non-production deployment, so you can defer technology expenses until your application goes into production. Acunu provides the NoSQL domain expertise you need, reducing your technology deployment costs without compromising your data security. The platform is architected to store significantly more data per node than competing technologies with a focus on reducing both your initial hardware and operational costs over time. Acunu’s support for commodity hardware and large capacity disks further reduces your costs.

        Fast: Acunu provides a suite of products focused on bringing you the performance your Big Data applications demand. Whether it’s a globally distributed database, millions to billions of records, tremendous amounts of machine generated data or managing millions of active users, Acunu provides you with real time results. Acunu has the professional services and support to get your applications up and running in the shortest possible time. Acunu leverages best in class open source solutions, adding additional management and performance technology to accelerate your Big Data results.
    34. www.acunu.com @acunu. Apache, Apache Cassandra, Cassandra, Hadoop, and the eye and elephant logos are trademarks of the Apache Software Foundation.
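The measurement behind slide 20 (batch inserts into a single-index table, with throughput sampled as the table grows) can be outlined in a few lines. The sketch below is not the benchmark cited on the slide: that one used a Java client with JDBC batch inserts from 5 parallel threads against a commercial RDBMS. This version assumes Python's built-in `sqlite3` module, an in-memory database as a stand-in, and a single thread, so its numbers say nothing about the system on the slide. The table name, column names and function name are hypothetical.

```python
import sqlite3
import time


def measure_insert_throughput(total_records=50_000, batch_size=1_000):
    """Insert rows in batches into a table whose only index is its primary key.

    Returns a list of (rows_in_table, records_per_ms) samples, one per batch.
    """
    # Stand-in database; NOT the "big vendor" RDBMS from the slide.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)"
    )
    samples = []
    next_id = 0
    while next_id < total_records:
        batch = [
            (next_id + i, f"sensor-{i % 10}", float(i)) for i in range(batch_size)
        ]
        start = time.perf_counter()
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", batch)
        conn.commit()
        elapsed_ms = (time.perf_counter() - start) * 1000
        next_id += batch_size
        rate = batch_size / elapsed_ms if elapsed_ms > 0 else float("inf")
        samples.append((next_id, rate))
    conn.close()
    return samples


if __name__ == "__main__":
    # Print every tenth sample to see how throughput changes as the table grows.
    for rows, rate in measure_insert_throughput()[::10]:
        print(f"{rows:>7} rows in table: {rate:.1f} records/ms")
```

Running the same loop against a candidate database, with a realistic record shape and table sizes in the tens of millions, is one inexpensive experiment of the kind slide 30 suggests.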