Business value from Big Data
Agenda

• What’s Big Data?
• Why now?
• Why care?
• Key technologies
• How can I get started?
What’s Big Data?

• Volume
• Velocity
• Variety
• Value
Large datasets
Challenge

• 40%: growth of data generated per year
• 5%: growth of IT spending per year

Source: McKinsey
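To make the gap concrete, here is a minimal sketch of how the two growth rates compound. It assumes both rates stay constant and that data volume and IT spending start from the same normalised baseline of 1.0; the function name `relative_gap` is illustrative only and does not come from the McKinsey source.

```python
def relative_gap(data_growth: float, budget_growth: float, years: int) -> float:
    """Ratio of data volume to IT spending after `years`.

    Both quantities are assumed to start at 1.0 and to grow at a
    constant annual rate (an assumption made for illustration).
    """
    return ((1 + data_growth) / (1 + budget_growth)) ** years


if __name__ == "__main__":
    # 40% data growth and 5% IT spending growth, as quoted on the slide.
    for years in (1, 3, 5, 10):
        ratio = relative_gap(0.40, 0.05, years)
        print(f"after {years:2d} years: data/budget ratio = {ratio:.1f}x")
```

Under these assumptions the data to manage per unit of budget roughly quadruples within five years, which is why the gap cannot be closed by spending alone.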
Maybe Big Data is...

• When any of volume, velocity, variety, value
  (cost?) becomes a problem
• When new use cases emerge, new things
  become possible, because of new data
  sources
For example

• US cell updates: 600B/day
• Items shared, social media: 4B/day
• Smart meter readings (2015): 29B/day
Why now?
Cost per gigabyte

[Chart: $ per GB (log scale, 1 to 1000) by year, 1992 to 2008, falling from $569 to $0.13]

Source: Deloitte
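As a rough check on how steep that curve is, the sketch below derives the implied average annual price change from the two labelled values. It assumes that $569 corresponds to 1992 and $0.13 to 2008 (the ends of the chart's axis) and that the decline was smooth; `annual_change` is a hypothetical helper name.

```python
def annual_change(start_price: float, end_price: float, years: int) -> float:
    """Constant yearly rate that turns start_price into end_price over `years`."""
    return (end_price / start_price) ** (1 / years) - 1


# Assumed endpoints: $569/GB in 1992 and $0.13/GB in 2008.
rate = annual_change(569.0, 0.13, 2008 - 1992)

if __name__ == "__main__":
    print(f"implied change in $/GB: {rate:.1%} per year")
```

With those endpoints the price per gigabyte falls by roughly 40% every year, about the same rate at which the McKinsey figure says data volumes grow.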
Guess what?
Disruptive innovation
Why care?

“Companies that can harness big data will
     trample data incompetents”
           The Economist, May 26th 2011
Why care - take 2

• The competition will do it (and you’ll get
  fired)
• Competitive advantage to be gained by
  doing it well (you get promoted)
• It’s not hard to get started (no need for huge
  investment)
What are we looking for?

• Data / Information
• Insights
• Actionable intelligence
Databases




A Relational Model of Data for Large Shared Data Banks
Ted Codd, CACM, June 1970

 Image: IBM
Big = Slow?

[Chart: throughput (records/ms) against records in the table (0 to 100 million); throughput falls as datasets get larger]

Source: Gerard Maas, http://www.gerardmaas.net/2011/06/bigdata-on-rdbms
Scale-out versus Scale-up




Hadoop
• Great for unstructured data or arbitrary
  queries
• MapReduce framework for distributed
  compute
• Tools now making it accessible
• Still essentially a batch processing system
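The MapReduce model mentioned above can be sketched in a few lines. This is a single-process illustration of the map, shuffle and reduce phases using a word count, not Hadoop's actual API; all function names are hypothetical, and a real cluster would run the phases in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over an in-memory dataset."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big value"]))
```

Because each map call sees one record and each reduce call sees one key, the work partitions naturally across machines. The job only produces an answer once all input has been processed, which is what makes the approach batch-oriented.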
What about real-time?
Use cases
• Tracking trending topics on social media
• Network and infrastructure monitoring
• Web and ad analytics dashboards and
  platforms
• Real-time A/B testing
• User profiling
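To illustrate what separates these use cases from batch processing, here is a minimal sketch of the first one: counting topics over a sliding time window so that the current top topics can be read at any moment. The class `TrendingTopics` is hypothetical, it keeps everything in memory on one machine, and it assumes events arrive in non-decreasing timestamp order.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class TrendingTopics:
    """Counts topic mentions seen within the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self.events: Deque[Tuple[float, str]] = deque()  # oldest event first
        self.counts: Counter = Counter()

    def add(self, timestamp: float, topic: str) -> None:
        """Record one mention, then drop events that fell out of the window."""
        self.events.append((timestamp, topic))
        self.counts[topic] += 1
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_topic = self.events.popleft()
            self.counts[old_topic] -= 1
            if self.counts[old_topic] == 0:
                del self.counts[old_topic]

    def top(self, n: int) -> List[Tuple[str, int]]:
        """Return the n most mentioned topics in the current window."""
        return self.counts.most_common(n)


if __name__ == "__main__":
    trending = TrendingTopics(window_seconds=60)
    for ts, topic in [(0, "a"), (10, "b"), (20, "b"), (65, "c")]:
        trending.add(ts, topic)
    print(trending.top(2))
```

Each update touches only the newest and the expired events, so the answer stays current without rescanning history. A production system would also need to distribute and persist this state, which is where the databases discussed next come in.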
NOSQL

        Voldemort
No “one size fits all”

• Column DBs and Key-Value stores
• Document databases
• Graph databases

[Diagram: CAP triangle with corners C, A and P]
Questions to ask

• Who uses it?
• Who can support it? Where are they?
• How does it scale? Perform?
• Maturity, both DB and tool ecosystem
Changing economics

XDR metadata on Oracle and NetApp:
• 30 days of SMS
• At capacity ceiling

XDR metadata on 30 x $3k Dell servers:
• 1/5th TCO of alternatives
• Cost grows predictably
Agenda

• What’s Big Data?
• Why now?
• Why care?
• What’s the new technology good for?
• How can I get started?
Start small

• Identify data sources
• Look at capabilities
• Run experiments, PoCs
Data sources

• Web, SCM, Retail
• Location Services
• Infra Monitoring
• Smart Metering
• Oil/Gas Sensors
• Ad Marketplaces
• Fraud Detection
• Social Media
Capabilities

• Open source, supported, or “packaged”
  solution?
• How do “commodity” servers fit your
  infrastructure?
• Don’t rule out Cloud deployments to get
  quick answers
Acunu

Discover the Potential of Real Time Big Data with Acunu Activate

Every CIO, Architect and Analyst knows of existing data with huge untapped potential within their organisation. Evolving Big Data technologies provide new paths to revenue with both customers and prospects.

Acunu partners with you to deliver competitive advantage by capturing data and exploring its benefits. You’ll validate the value of Big Data by building real applications and dashboards to drive new value for your business.

At the outset, we work with you to identify and develop use cases and areas where Big Data tools could be utilised to add significant business value. We work with you to recommend solutions architectures for your specific use cases.

We then deploy Acunu Reflex in your own data center or in the cloud and can include Apache Hadoop for investigative work and Acunu Analytics for real-time decision support.

Once the software is installed, we work with you to integrate, capture and store sources of data from inside your organisation. We provide hands-on assistance to help you showcase the business value of your data through live proof-of-concept applications. You’ll get results quickly, with successive iterations delivering ongoing value.

As a result, you gain an understanding of Big Data’s transformative capability through working demonstrations and have a clear route to deliver that competitive advantage to your business.

“Key business and technology trends are disrupting the traditional data management and processing landscape. Data analysis is increasingly being viewed as a competitive advantage. An increasingly sensor-enabled and instrumented business environment is generating huge volumes of data… Traditional IT infrastructure is simply not able to meet the demands of this new situation.”
-Gartner

A Comprehensive Big Data Discovery Package

• Workshops: Acunu Specialist delivers workshops and provides on-demand consulting to enable your development team to build Big Data applications.
• Structure & Planning: A dedicated Project Lead will keep the project on track through kickoff, reviews and regular calls. Progressively, we’ll help you plan next steps.
• Ecosystem of Expertise: Acunu’s Big Data expertise is complemented through our partners. Together we will build your own Big Data ecosystem.
• Deployment: We deploy Acunu’s database and storage software, complete with management tools, to your own hardware or to Amazon’s public cloud.
• Data Source Integration: We work with you to integrate sources of log, clickstream, sensor, monitoring or similar data into Acunu Reflex.
• Support & Training: We deliver hands-on training on the Acunu Reflex infrastructure to your operations staff, and provide support throughout the project.

Acunu Reflex
Makes Big Data results easy, economic and fast

• Zero to Big Data Hero: Build a Big Data database cluster on commodity hardware in hours, not days.
• Save Money versus Open Source Alternatives: Save up to 60% on hardware and operation costs.
• Database lag getting you down? Milliseconds turning into minutes?

What is Acunu Reflex?

Easy
Acunu provides an integrated suite of technologies to support rapid development and deployment of your Big Data applications. Getting started is easy with a single, fast installation, handling all the details usually associated with OS tuning, storage optimization, database integration and management. This alleviates the complexity of NoSQL development, deployment and support. The platform is flexible and scalable, providing simple, one-click deployment. Scale linearly with ease and deploy across numerous machines within a data center or across a globally distributed public or private cloud.

Economic
Acunu’s subscription-based pricing model ensures continuous value, skipping charges for non-production deployment, so you can defer technology expenses until your application goes into production. Acunu provides the NoSQL domain expertise you need, reducing your technology deployment costs without compromising your data security. The platform is architected to store significantly more data per node than competing technologies with a focus on reducing both your initial hardware and operational costs over time. Acunu’s support for commodity hardware and large capacity disks further reduces your costs.

Fast
Acunu provides a suite of products focused on bringing you the performance your Big Data applications demand. Whether it’s a globally distributed database, millions to billions of records, tremendous amounts of machine-generated data or managing millions of active users, Acunu provides you with real-time results. Acunu has the professional services and support to get your applications up and running in the shortest possible time. Acunu leverages best-in-class open source solutions, adding additional management and performance technology to accelerate your Big Data results.

www.acunu.com @acunu




Apache, Apache Cassandra, Cassandra, Hadoop, and the eye and
elephant logos are trademarks of the Apache Software Foundation.

Exploring Big Data value for your business


Editor's Notes

  • #5 Analyst firms appear to be fighting with each other to come up with the most Vs to define Big Data. “Big” seems to imply size of the dataset but there’s more to it than this.
  • #6 There’s nothing new about huge datasets; plenty of people have been playing with big data sets for years:
    Seismic survey datasets in the PB range, on very high-end supercomputing hardware
    Weather: ECMWF has supercomputers with > PB disk storage
    HEP: 15PB/year for the CERN LHC project
    The rest of us can’t afford supercomputers.
  • #7 But we all have a challenge with datasets that grow faster than IT budgets (these numbers are from McKinsey and are probably optimistic w.r.t. IT budgets).
  • #8 So maybe Big Data is really when we have one of these two things... and we’ve not already solved the problem. Perhaps there’s a silent [New] in “Big Data” :)
  • #9 Here are some new datasets that typify both the challenge and the opportunity of big data.
    So we’ve explored what Big Data might be. Let’s move on to look at why the Big Data hype is happening now.
  • #11 Disks got cheaper!
    http://everyjoe.com/technology/hard-drive-cost-per-gigabyte-from-1980-to-2009/
  • #12 Exponential drop in price. (1GB cost around $200K in 1980). Today, I can buy SATA disks at 4p/GB.
  • #13 Basic economics. Reduce the price and the demand goes up.
  • #14 With huge reductions in cost and waves of commoditisation, the scene is set for repeated disruptive innovation, not just in storage technology itself but in the products and services that rely on it.
  • #16 What’s big data about? We’re looking to get insight from data. Data trumps intuition and common sense every time (funny anecdotal examples). More data means better decisions, based on fact not folklore.
  • #18 Odd coloured cars more reliable.
    Vegetarians less likely to miss flights.
    Computing hardware doesn’t fail at high temperatures as thought, but changing temperatures kill it.
    A person who’s just viewed a particular web page is more likely to buy product X.
  • #20 RDBMS. Ted Codd, 1970, IBM. System R, DB2, Oracle...
    By the late 1980s, it was the standard. Usurpers (e.g. object-oriented databases) failed to gain significant market share; hierarchical (IMS, developed for Apollo) and network databases (CODASYL) pretty much disappeared.
    However, while RDBMSs have become the default choice, they aren’t necessarily the best for some Big Data use cases. Some problems:
  • #21 Problem 1: Performance. Dealing with time-series data is a common Big Data use case. We’re not looking to do complex transactions but we need to store the data so we can access it for analytics etc. Relational databases do not handle this well.
    I’ve been investigating the performance of our “big vendor” RDBMS to hold months of sensor data. So far, the results are not really encouraging. I observe an exponential performance drop on the single-index (PK) table holding the data as more records are added. Here’s a plot of the performance of 5K records as records are continuously added to the table. Record addition is done with 5 parallel client threads, each inserting 1K records in batch mode. The client is an optimized Java app, using raw JDBC for the batch inserts. I haven’t found a faster way than that to add records to the relational DB.
  • #22 Problem 2: Increasingly we’d like to scale out rather than scale up. Why? (i) incremental capacity (and cost) management; (ii) availability; (iii) distribution; (iv) potential cloud deployment (onto relatively small machines).
    Relational DBs tend to push towards a single big machine, or a tightly coupled cluster with expensive h/w like Infiniband or SAN storage.
  • #23 From Google via Yahoo. Not really a database, but provides a distributed filesystem intended to store large files, with no schema.
    Not trivial to set up, but tools are getting better: you no longer need to write Java code to do queries. See Hive (HQL) for SQL-like access.
    But it’s still batch, and plenty of times you want real-time.
  • #24 We’re looking to act on the insights that the data bring. If we don’t act, we’re just observing. But action is often time critical; the world is changing (e.g. we’re monitoring an oil well, trading financial instruments, trying to understand the behaviour of lots of people), and insights from yesterday’s data are historical documents: interesting, perhaps, but not great as a guide to action.
  • #25 Some concrete examples of things that people we’re working with are interested in capturing.
  • #26 Lots of databases.
  • #27 Lots of different kinds of databases with different goals in mind.
    One way to view them is to see where they sit in CAP terms (Brewer’s CAP theorem).
    Many different data models.
  • #29 Picking the right solution (or combination) can deliver significant cost savings, increased capability and allow granular growth over time.
  • #31 Paradoxically for “big data”, consider starting with something small.
    Let’s look at the other items...
  • #34 How can we help?
    Acunu Activate: A focused package of work to help you discover big data opportunities and understand how to exploit them.
    Acunu Reflex: A fully supported distributed database to support real-time big data use cases.
    Acunu Analytics: Currently in preview, launching soon, provides real-time results for queries that would normally be costly to compute.