Making Sense of Big Data, October 11, 2012 #MakeSenseBD
35. Who? #InfochimpsOfCourse
36. Infochimps, Enterprise Customers ...
37. Data ...
38. Elastic Big Data PaaS: Deployment From Laptop to Cloud (Public & Private) ...
39. Big Data Managed Service Offerings: Community, Public, Virtual Private ...
40. #LastPOLL
41. #1 Big Data Platform For The Cloud #MakeSenseBD www.infochimps...
Making Sense of Big Data

Read this presentation deck by big data expert and Infochimps CEO, Jim Kaskade. During the presentation, he explains where to start and how to effectively manage, protect and leverage the growing amounts of data in your enterprise. See the video presentation here: http://bigdata.infochimps.com/making-sense-of-big-data

  • Title slide: "Making Sense of Big Data" (I like the elephant on the motorcycle as the background image here, along with the descriptor "We provide a suite of big data services in the cloud, used by enterprise customers who want to quickly unlock the value of their data").
    Slide 1: "Information is Powerful, but it's how we use it..." Set the stage: we are here today to learn how to leverage Big Data to derive value and achieve insights.
    Slide 2: "What is Big Data?" The message here is to start at the beginning and define it for those in the audience who might be unclear (we know there are many people who are). Use the first slide from your CloudCon deck here.
    Slide 3: State of the world: data is increasing exponentially, and that will only continue, which will require infrastructure and management in order to provide useful insights. Use slide 2 from your CloudCon deck; it has a nice image of volume (which is one of the tenets of big data).
    Slide 4: Why is this occurring? Here the message is new types of data, batch vs. real-time: everyone is "listening" now and measuring more activity, actions, and conversations than ever before. Use the CloudCon slide that builds vertically from batch to real time and horizontally from large enterprise to small enterprise.
    Slide 5: Problem: "Little Data for Business Users" slide from CloudCon. The message here is that, due to the influx and types of data, the actual users are too far removed from it and therefore blind to how to draw insights from it. Walk through the build, as it explains really well how data moves throughout an organization and where the roadblocks are for getting insight to execs to act upon.
    Slide 6: Use the #thisreallysucks slide here to drive home the current state of being.
    Slide 7: "Big Data for Business Users" slide. This is the end state for executives looking to use data to improve operational efficiency and competitive advantage.
    Slide 8: Use the build slide here to show how we bring the data to the app developer and therefore reduce the friction for executives.
    Slide 9: Use the #thisisreallygood slide here to reinforce the point that this is the way data and info should flow.
    Slide 10: How do you achieve this state?
    Slide 11: Introducing Infochimps. Use the "Good to Great" (#2) slide from your 451 deck to give a brief overview of who we are. We cut our teeth on big data by building the largest data marketplace, where we leveraged the latest technology (Hadoop, etc.) to manage big data. We realized that others must be facing the same issues as IC and decided to externalize our platform to help companies implement their own Big Data infrastructure.
    Slide 12: Big Data Cloud Platform (the solution to the Big Data problem). Use slide #7 from the 451 deck. Walk through the platform and its components, allowing attendees to see that we offer an end-to-end, cloud-based solution. Call out the value of our 4 pillars here: Fast, Simple, Flexible, Enterprise-ready.
    Slide 13: Deployment options slide. This is where we talk about IC being offered as a managed service and the value it affords. NOTE: Not sure if we want to communicate the Data Intelligence Network, since we have not publicly or formally announced it.
    Slide 14: IC in action: Infomart use case (challenge, IC solution, result).
    Slide 15: One more use case if time (Koupon)?
    Slide 16: Close: Infochimps, the #1 Big Data Platform for the Cloud. Include the sales contact number at the bottom of the slide along with the web address.
  • Avinash Kaushik gave a talk at Strata 2012 in Santa Clara in March… and quoted a Kenyan farmer. If you listen to all the hype of Big Data, it solves for the first problem. If you listen to all the vendors, there is a lot of emphasis on the first part (perhaps Infochimps included), and very little on the second. I think that’s because we don’t exactly know how to truly empower the organization to interact directly with any/all data available. It’s too expensive, risky, complex.
  • 40%+ YoY growth, with 2012 alone generating 2.4 zettabytes.
    http://jameskaskade.com/?p=2040
    http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
  • Discussions with O’Reilly Media, Teradata, Aster Data, Yahoo!, eBay, and Facebook. The issue is not just that unstructured data is exploding, but also the number of sources and types of data… all fed by the explosion of devices people use to interact with each other, products, and services.
  • We have a problem today WITH our data infrastructure… our ability to glean insights. I think all of you know what I’m referring to… It’s the fact that we’re operating on less than 15% of the corporate data available to us… even with the ENTERPRISE DATA WAREHOUSE, the EDW, which is supposedly storing a COMPLETE, SINGLE VIEW OF THE TRUTH… We’re still giving our business users… a tiny bit… a little bit of data.
  • http://blogs.the451group.com/opensource/2011/04/15/nosql-newsql-and-beyond-the-answer-to-sprained-relational-databases/
    NoSQL databases: designed to meet the scalability requirements of distributed architectures and/or schema-less data management requirements, including big tables, key-value stores, document databases, and graph databases.
    NewSQL databases: designed to meet the scalability requirements of distributed architectures, or to improve performance such that horizontal scalability is no longer a necessity, including new MySQL storage engines, transparent sharding technologies, software and hardware appliances, and completely new databases.
    Data grid/cache products: designed to store data in memory to increase application and database performance, covering a spectrum of data management capabilities from non-persistent data caching to persistent caching, replication, and distributed data and compute grid functionality.
    http://en.wikipedia.org/wiki/Database
    The first generation of database systems were navigational;[2] applications typically accessed data by following pointers from one record to another. The two main data models at this time were the hierarchical model, epitomized by IBM's IMS system, and the CODASYL model (network model), implemented in a number of products such as IDMS.
    http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
    NewSQL: the “new relational databases” that retain SQL and ACID compliance and are either
    * scalable with distributed architectures, or
    * performance-improved such that horizontal scalability is no longer a necessity.
    SchoonerSQL: http://www.schoonerinfotech.com/
    Tokutek: http://www.tokutek.com/
    Continuent: http://www.continuent.com/
    Translattice: http://www.translattice.com/
    ScaleBase: http://www.scalebase.com/
    CodeFutures: http://www.codefutures.com/database-products/
    VoltDB: http://voltdb.com/
    HandlerSocket: https://github.com/ahiguti/HandlerSocket-Plugin-for-MySQL
    Akiban: http://www.akiban.com/
    MySQL Cluster: http://www.mysql.com/products/cluster/
    Clustrix: http://www.clustrix.com/
    Drizzle: http://www.drizzle.org/
    GenieDB: http://www.geniedb.com/
    ScalArc: http://scalarc.com/
    NimbusDB: http://nimbusdb.com/NimbusDB/NimbusDb.html
  • http://www.nrf-arts.org/content/unifiedpos
    Step 1: Integrate into CRM (email)
    Step 2: Integrate into Web
    Step 3: Integrate into POS (UnifiedPOS)
  • The Business User
  • Being the CEO of Infochimps, I felt compelled to share a little “chimpy” research with you… The “Infinite Monkey Theorem”… is a METAPHOR that directly relates to Big Data, and I think you’ll appreciate it. So what is the “Infinite Monkey Theorem”? The following definition is a variant of the original theorem… let me read it to you. This theorem has been traced back to Aristotle's “On Generation and Corruption”, where he makes deductions about the unexperienced and unobservable based on real experiences and real observations.
  • AMP: Access Module Processor
    PE: Parsing Engine
    BYNET: Banyan Cross-bar Switch YNET (Y Network)
    Store:
    The Parsing Engine dispatches a request to insert a row.
    The BYNET ensures that the row gets to the appropriate AMP via the hashing algorithm.
    The AMP stores the row on its associated disk. Each AMP can have multiple physical disks associated with it.
    Retrieve:
    The Parsing Engine dispatches a request to retrieve one or more rows.
    The BYNET ensures that the appropriate AMP(s) are activated.
    The AMPs locate and retrieve the desired rows in parallel, and will sort, aggregate, or format them if needed.
    The BYNET returns the retrieved rows to the Parsing Engine.
    The Parsing Engine returns the row(s) to the requesting client application.
    Teradata’s shared-nothing architecture allows for highly scalable data volumes.
  • 3-node Hadoop system: $8K/node; $10K switch; $4K/node Hadoop distro.
    $24K + $10K x 25% x3 maintenance = $43K
    $4K x 3 x 3 = $36K
    Total =
    There are three essential elements of an analytic platform:
    1. Strong support for analytic database query: a variety of query styles, at a minimum SQL, MDX, or graph.
    2. Strong support for analytic processes other than queries. Typically these would be in the areas of mathematics (statistics, predictive analytics, data mining, linear algebra, optimization, graph theory, etc.) and/or data transformation (e.g. sessionization, entity extraction).
    3. Strong integration between the first two.
    The point is that an analytic platform is something on which you can build a range of powerful analytic applications. Some specifics of what to look for in an analytic platform may be found at the first link below.
    http://www.dbms2.com/2011/02/24/analytic-platforms/
    http://www.dbms2.com/2011/01/18/architectural-options-for-analytic-database-management-systems/
    Enterprise data warehouse (full or partial)
      Kinds of data likely to be included: All, but especially operational
      Likely use styles: All
      Canonical example: Central EDW for a big enterprise
      Stresses: Concurrency, reliability, workload management
      Classical EDWs are Teradata, DB2, Exadata, and maybe Microsoft SQL Server.
    Traditional data mart
      Kinds of data likely to be included: All
      Likely use styles: Business intelligence, budgeting/consolidation, investigative
      Examples: Reporting servers, planning/consolidation servers, anything MOLAP, etc.
      Stresses: Performance, concurrency, TCO
      Columnar DBMS might have more attractive performance and TCO (Total Cost of Ownership); the same goes for Netezza. Some of them, e.g. Sybase IQ and Vertica, have excellent track records in concurrent usage as well.
    Investigative data mart: agile
      Kinds of data likely to be included: All, especially customer-centric
      Likely use styles: Investigative
      Canonical example: A few analysts getting a few TB to examine
      Stresses: Ease of setup/load, ease of admin, price/performance
      Infobright is often cost-effective among columnar analytic DBMS.
    Investigative data mart: big
      Kinds of data likely to be included: All, especially customer-centric, logs, financial trade, scientific
      Likely use styles: Investigative
      Canonical example: Single-subject 20 TB – 20 PB relational database
      Stresses: Performance, scale-out, analytic functionality
      Performance and scalability are major challenges, usually best addressed by MPP (Massively Parallel Processing) systems, such as Netezza, Vertica, Aster Data, ParAccel, Teradata, or Greenplum.
    Bit bucket: Hadoop
      Kinds of data likely to be included: Logs, other technical/external
      Likely use styles: Staging/ETL, investigative
      Canonical example: Log files in a Hadoop cluster
      Stresses: TCO, scale-out, transform/big-query performance, ETL functionality
    Archival data store
      Kinds of data likely to be included: Operational, CDR (call detail record), security log
      Likely use styles: Archival, reporting (for compliance), possibly also investigative
      Examples: Any long-term detailed historical store
      Stresses: TCO, compression, scale-out, performance (if multi-use)
      Perhaps only Rainstor truly embraces the archival positioning.
    Outsourced data mart
      Kinds of data likely to be included: All
      Likely use styles: Traditional BI, investigative analytics, staging/ETL
      Examples: Advertising tracking, SaaS CRM
      Stresses: Performance, TCO, reliability, concurrency
      Oracle shops = Vertica gets the nod in a number of these cases
    Operational analytic(s) server
      Kinds of data likely to be included: Customer-centric, log, financial trade
      Likely use styles: Advanced operational analytics
      Examples: Lower latency: Web or call-center personalization, anti-fraud. Higher latency: Customer profiling, Basel 3 risk analysis
      Stresses: Performance, reliability, analytic functionality, perhaps concurrency
    http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/
  • The way this is performed is by taking data sources like images and storing them in Hadoop, then using Big Data tools like MapReduce to perform sophisticated analysis on those aggregated data sets. Why is this concept so disruptive? A fraction of the price… no structured data model (aka no star schema)… yet the ability to run sophisticated queries and algorithms against all your detailed data.
  • The current image shows a Walmart in Wichita, Kansas. Analysts count cars in Wal-Mart parking lots to measure overall customer traffic and to understand growth versus the competition. For example, Wal-Mart's growth was determined to come mostly from areas of high unemployment. This type of analysis is being performed in Amazon's EC2…
  • The current image shows a Target in the Moraine Point Plaza located in Gardiner, North… Analysts comparing satellite parking-lot data with regional unemployment trends found that Target's growth tended to come from areas of lower-than-average unemployment.

Again, these processes are being performed in Amazon EC2… This is interesting… but how do we process the data further to help derive more relevant insights?
http://www.cnbc.com/id/38738810/Spying_For_Profits_The_Satellite_Image_Indicator
  • The previous examples of Walmart and Target involved using a regression algorithm which was executed against the satellite data + other data to produce a quarterly revenue prediction which BEAT all previous models.
  • Slide 1: Company Overview. The best way to give an overview of your company is to state concisely your core value proposition: What unique benefit will you provide to what set of customers to address what particular need? Then you can add three or four additional dot points to clarify your target markets, your unique technology/solution, and your status (launch date, current customers, revenue rate, pipeline, funding needed). Key objective: Flesh out the foundation you established at the beginning. At this point, no one should have any question about what it is that your company does, or plans to do. The only questions that should remain are the details of how you are going to do it. Another key objective you should have achieved by this point in your presentation is to make sure that if there are some compelling brand names associated with your company (customers, partners, investors, advisors), your audience knows about them. Feel free to drop names early and often, starting with your first email introduction to the investor. Brand name relationships build your credibility, but do not overstate them if they are tenuous.
    Use cases:
    Runa: Automated real-time online offers. Monitors and analyzes shopper behavior on the web, and then makes each shopper a personalized offer. Infochimps helps Runa configure and manage their entire production system, including Hadoop, HBase, messaging, monitoring, and more (using Ironfan – Robert Berger).
    SpringSense: Intelligent enterprise document search. SpringSense uses Infochimps to scale its award-winning technology to process the full Wikipedia corpus (over 4 million articles) for rapid meaning-based search (using Ironfan).
    Black Locus: Competitive pricing analytics platform for enterprises. Ingesting millions of product pricing data points from the web, analyzing historical and current data, and presenting analytic results in real time.
    Koupon Media: Mobile coupon platform. For every user who enters into the mobile coupon system, more demographic information is needed to help target the right coupon to the right customer, in real time.
    BlueCava: Behavioral target marketing platform. Joins customers across any/all devices and augments them with demographic/behavioral data for targeted advertising. For every user who enters into the mobile coupon system, more demographic information is needed to help target the right coupon to the right customer and in real-time. A new Attribution data product (using Hadoop) which determines correlations between customer purchases/conversions and advertising impressions and website behavior.
    InfoMart: Largest media company in Canada, transforming its business from print to digital, with a focus on engaging and better understanding its audiences. Social media listening platform consisting of both real-time social feed search/analytics/reporting for InfoMart and its customers and historic analysis/trending research.
  • Slide 3: Solution. What specifically are you offering to whom? Software, hardware, services, a combination? Use common terms to state concretely what you have, or what you do, that solves the problem you’ve identified. Avoid acronyms and don’t try to use these precious few words to create and trademark a bunch of terms that won’t mean anything to most people, and don’t use this as an opportunity to showcase your insider status and facility with the idiomatic lingo of the industry. If you can demonstrate your solution (briefly) in a meeting, this is the place to do it.
    Slide 3.1: Delivering the Solution. You might need an extra slide to show how your solution fits in the value chain or ecosystem of your target market. Do you complement commonly used technologies, or do you displace them? Do you change the way certain business processes get executed, or do you just do them the same way, but faster, better and cheaper? Do you disrupt the current value chain, or do you fit into established channels? Who exactly is the buyer, and is that person different than the user?
  • Slide 7: Go to Market Strategy. The single most compelling slide in any pitch is a pipeline of customers and strategic partners that have already expressed some interest in your solution, if they haven’t already joined your beta program. Too often this slide is, instead, a bland laundry list of standard sales and marketing tactics. You should focus on articulating the non-obvious, potentially disruptive elements of your strategy. Even better, frame your comments in terms of the critical hurdles you need to get over, and how you are going to jump them. If you don’t have a pipeline, and there is nothing unique or innovative about your strategy, then drop this slide and make the elements of your sales model clear in the discussion of your business model (next slide).
  • 1. Which best describes your position in your organization?
      a. Executive (VP, SVP, C-level)
      b. Business User (Marketing, Product, etc.)
      c. Analytics Team (Data Scientist, Analyst, etc.)
      d. IT User (App Dev, DevOps, Project Manager, etc.)
      e. Other
    2. Do you have a current or upcoming Big Data project?
      a. Yes
      b. No
      c. Not Sure
    3. Which deployment option do you prefer?
      a. Public Cloud
      b. Private Cloud
      c. Virtual Private Cloud
      d. No Cloud
      e. Not Sure
  • Making Sense of Big Data
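The Teradata speaker note describes the Parsing Engine dispatching requests, the BYNET routing each row to an AMP via a hashing algorithm, and all AMPs retrieving rows in parallel before the answer is merged for the client. The Python sketch below imitates that store/retrieve flow in a single process. It is a toy under stated assumptions: MD5 stands in for Teradata's real row-hash function, the AMP count is arbitrary, a thread pool stands in for independent nodes, and the names `Amp`, `ParsingEngine`, and `row_hash` are hypothetical, not Teradata APIs.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 4  # hypothetical cluster size


def row_hash(primary_index_value: str) -> int:
    """Stand-in row hash (assumption: MD5, not Teradata's actual hash function)."""
    digest = hashlib.md5(primary_index_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class Amp:
    """One access module processor that owns only its own rows (shared-nothing)."""

    def __init__(self, amp_id: int):
        self.amp_id = amp_id
        self.rows = []

    def store(self, row: dict) -> None:
        self.rows.append(row)

    def scan(self, predicate) -> list:
        # Each AMP filters only its local slice of the table.
        return [r for r in self.rows if predicate(r)]


class ParsingEngine:
    """Hypothetical PE that dispatches requests; route() plays the BYNET's role."""

    def __init__(self, num_amps: int, key_column: str):
        self.amps = [Amp(i) for i in range(num_amps)]
        self.key_column = key_column

    def route(self, key_value) -> Amp:
        # The same primary-index value always hashes to the same AMP.
        return self.amps[row_hash(str(key_value)) % len(self.amps)]

    def insert(self, row: dict) -> None:
        self.route(row[self.key_column]).store(row)

    def retrieve(self, predicate, sort_key=None) -> list:
        # Fan the request out to every AMP and scan them concurrently.
        with ThreadPoolExecutor(max_workers=len(self.amps)) as pool:
            partials = list(pool.map(lambda amp: amp.scan(predicate), self.amps))
        rows = [row for part in partials for row in part]
        # The PE merges the partial answers (sorting if requested) for the client.
        return sorted(rows, key=sort_key) if sort_key else rows


pe = ParsingEngine(NUM_AMPS, key_column="customer_id")
for i in range(20):
    pe.insert({"customer_id": i, "region": "west" if i % 2 else "east"})

west = pe.retrieve(lambda r: r["region"] == "west", sort_key=lambda r: r["customer_id"])
print([r["customer_id"] for r in west])  # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print([len(amp.rows) for amp in pe.amps])  # how the 20 rows spread across AMPs
```

Because placement depends only on the hash of the primary-index value, a well-distributed key spreads rows evenly across AMPs, while a skewed key piles them onto a few AMPs and slows every parallel scan.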

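The hedge-fund notes say a regression algorithm was run against the satellite data plus other data to produce a quarterly revenue prediction. The source does not name the model or its inputs, so what follows is only a minimal sketch assuming ordinary least squares with two hypothetical features (thousands of cars counted in lots, and local unemployment rate). The figures are synthetic, generated from a known linear rule so the fit can be checked; they are not data from the talk.

```python
def solve(a, b):
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(features, target):
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(row) for row in features]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, target)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, row):
    """Apply fitted coefficients to one feature row."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# Hypothetical quarters: (thousands of cars counted, local unemployment %)
history = [(4.10, 5.1), (4.55, 6.0), (3.90, 4.2), (4.80, 5.5), (4.30, 4.8), (5.00, 6.3)]
# Hypothetical revenue generated from a known rule so the fit can be verified.
revenue = [2.0 + 1.5 * cars - 0.3 * unemp for cars, unemp in history]

beta = fit_ols(history, revenue)
forecast = predict(beta, (4.70, 5.0))  # next quarter's hypothetical inputs
print([round(b, 6) for b in beta])  # recovers roughly [2.0, 1.5, -0.3]
print(round(forecast, 4))
```

The normal equations are used here only to keep the sketch dependency-free; a real model would use a numerical library and hold out past quarters to measure forecast accuracy.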
    1. Making Sense of Big Data, October 11, 2012 #MakeSenseBD
    2. “Information is powerful. But it is how we use it that will define us.” 10/15/2012 Infochimps Confidential
    3. First: What is Big Data? “data sets so large and complex that it becomes difficult to process using on-hand database management tools.”
    4. #Volume #Velocity #Variety: 2010 = 1.2 zettabytes/yr; 2020 = 35.2 zettabytes/yr (Source: 2011 IDC Digital Universe Study)
    5. It’s All About The Data: DIGITAL CONTENT, OPERATIONAL DATA, WEB LOGS, SOCIAL MEDIA, FILES, SMART GRIDS, TRANSACTIONAL DATA, AD IMPRESSIONS, R&D DATA
    6. Problem: “Little Data For Business Users”
    7.
    8. Problem: One Size Does Not Fit All (database landscape diagram; categories: Relational, Non-Relational, Analytic, Operational, Document, NoSQL, NewSQL, ‘Data as a Service’, Key Value, Big Tables, Graph. Products shown: Teradata, IBM InfoSphere, Aster, Netezza, HP Vertica, Infobright, Hadoop, Hadapt, ParAccel, Horton, Calpont, EMC, SAP Hana, Oracle, Cloudera, VectorWise, Greenplum, SAP Sybase IQ, Times-Ten, MapR, Zettaset, Spark, IBM DB2, SQLSrvr, JustOneDB, InterSystems, Progress, MarkLogic, MySQL, Ingres, PostgreSQL, Objectivity, McObject, Lotus Notes, Sybase ASE, EnterpriseDB, Versant, CouchDB, MongoDB, HandlerSocket, Amazon RDS, Couchbase, RavenDB, Akiban, Cloudant, SQL Azure, App Engine, MySQL Cluster, Database.com, SimpleDB, Clustrix, Xeround, FathomDB, Drizzle, Riak, GenieDB, Redis, Cassandra, SchoonerSQL, ScaleBase, ScalArc, Membrain, Tokutek, NimbusDB, FlockDB, CodeFutures, Voldemort, HyperTable, InfiniteGraph, Continuent, VoltDB, BerkeleyDB, HBase, Neo4j, Translattice, AllegroGraph)
    9. Problem: Complexity of A New Data Architecture (diagram: structured and semi-structured sources such as Online Click Data, CRM Data, POS Data, Cust Srvc Call Logs, and Social, loaded by IT (ETL) into a Teradata Data Warehouse, virtualized data marts (Virt DM), real-time data streaming, a Hadoop/NoSQL platform, in-memory analytics, and sandboxes, serving BI users, analytics users, and business users through BI servers, an app server, and departmental reports)
    10.
    11. “Big Data For Business Users”
    12. $ $ $ $ ? Executive Data
    13. #thisisreallygood
    14. #timeforaPOLL
    15. Next: Hadoop + NoSQL technologies = the ability to process large and complex data sets without the challenges associated with legacy systems, and at a fraction of the price.
    16. Enterprise Data Warehouse (diagram: Request and Answer flow through Parsing Engines and the BYNET Interconnect to AMPs on Node, Node, Node ....) PARC
    17. Big Data Warehouse (diagram: Analytic Request and Answer for Search, Recommend, Rank, Score, Next-Best-Action; Master: Name Node, Job Tracker; Ethernet Interconnect; Slaves: Task Trckr + Data Node ....; Semi-Structured Data)
    18. (chart: vertical axis from Batch to Real Time, horizontal axis from Large Enterprise to Small Enterprise; labels include Traditional Decision Support, Traditional Operational, Analytic Appliances, Application Ecosystem, Toolset Integration, Hardened, Deployment in Public/Private Cloud)
    19. 19. #MakeSenseBD #lotsofdata + #simpleanalytics 10/15/2012 Infochimps Confidential 19
    20. 20. #MakeSenseBD Images Web, Mobile, CRM, ERP, SCM… Business Docs, Transactions & Text Interactions Web Logs SQL NoSQL NewSQL Social EDW MPP NewSQL Sensors Business Intelligence & Analytics Dashboards, Reports GPS Visualization… 10/15/2012 Infochimps Confidential 20
    21. 21. #MakeSenseBD Use Case Hedge Fund How do I predict whether companies will make their quarterly earnings forecast? 10/15/2012 Infochimps Confidential 21
    22. 22. #MakeSenseBD Walmart 10/15/2012 Infochimps Confidential 22
    23. 23. #MakeSenseBD Target 10/15/2012 Infochimps Confidential 23
    24. 24. #MakeSenseBD Cars In Lot News Text Web Pricing Quarterly Revenue Prediction Social Sentiment Weather Sensors Local Employment 10/15/2012 Infochimps Confidential 24
    25. Use Case: Media Company. How do I merge my traditional media sources with new media sources to provide improved, instant insights to my customers?
    26. [Diagram: New Media sources (Gnip Powertrack, Gnip EDC, Moreover, Metabase) and Traditional Media sources (TV, Radio, and Print Transcription) pass through Sentiment and a Listening Service to In-Motion Data Delivery, NoSQL, APIs, and an Application used by Data Scientists, App Developers, Business Users, and IT Staff]
    27. Use Case: Retail Company. How do I increase online revenue?
    28. [Figure: mock-up of a dynamically populated personalized email (sample offers for denim, khakis, hoodies, and kids/baby items), driven by known and unknown existing customers, online/offline behavior, and approved product content]
    29. [Diagram: current campaign offers, online click data, past CRM data, POS data, customer service call logs, social data, and product content feed Hadoop, graph analytics, a cluster data model, a traditional BI data model, and online analytics, producing targeted offers and products, a personalized email campaign, and performance measurement]
    30. #85%AccurateFirstTime
    31. #timeforaPOLL
    32. I’m Ready. So How Do I Start? …without spending a *$#&-load of money before proving ROI?
    33. Deployment Options: On-Premise, Public Cloud Provider, Trusted Data Center Provider
    34. [Chart: deployment models from "You Manage" to "Someone Else Manages": Private Big Data Cloud (You Manage), Virtual Private Big Data Cloud (You Manage), Public Big Data Cloud (You Manage), Virtual Private Big Data Cloud (Managed Service), Public Big Data Cloud (Managed Service), compared on $ Cost, Security Risk, and Time To Market]
    35. Who? #InfochimpsOfCourse
    36. Infochimps for Enterprise Customers [Diagram components: App, BI, Analytics, Sys, Data Lang, Data Delivery, Hadoop, NoSQL, Infra, Analytic Framework, Data Intelligence Network, Global Network of Data Center Infrastructure Providers]
    • Managed Big Data Services
    • Elastic & Secure Private & Public Clouds
    • Across a Global Network of Trusted Data Center Providers
    • With Batch & Real-time Delivery
    • Supporting Structured & Unstructured Data
    37. Data Intelligence Network: Cloud-based Data Marketplace (15,000 Data Sets); Big Data PaaS on Virtual Private & Public Cloud (Tier4 Lights Out Data Centers, OpenStack & vSphere, Managed Services); Big Data PaaS on Public Cloud (Amazon & Rackspace, Managed Services)
    38. Elastic Big Data PaaS: Deployment from Laptop to Cloud (Public & Private) on Amazon, Rackspace, OpenStack & vSphere with Ironfan
    39. Big Data Managed Service Offerings: Community (access to the Infochimps Big Data Platform via open source; deploy anywhere), Public Cloud (pre-integrated, pre-tested Big Data stack; quickly deploy in Amazon Cloud or Rackspace Cloud), Virtual Private Cloud (pre-integrated, pre-tested Big Data stack; deployed in a trusted lights-out data center network), Private Cloud (pre-integrated, pre-tested Big Data stack; deployed in your data center on OpenStack or vSphere). Service levels shown: Try It; Fully Managed Service; High SLA Managed Service; Customized Managed Service Under Your Control.
    40. #LastPOLL
    41. #1 Big Data Platform For The Cloud. #MakeSenseBD www.infochimps.com/demo 1-855-DATA-FUN (1-855-328-2386)
