Telco Big Data Workshop Sample


  1. 1. Introduction to Big Data and Real Time Analytics Workshop Telco Big Data & Real Time Analytics Summit 2012 3-5 December 2012, London www.alanquayle.com/blog © 2012 Alan Quayle Business and Service Development 1
  2. 2. "There are three kinds of lies: lies, damned lies, and statistics." British Prime Minister Benjamin Disraeli (1804–1881), or perhaps Samuel Langhorne Clemens (1835–1910), better known as Mark Twain
  3. 3. Never Forget This! People (most projects fail here), Process, Technology
  4. 4. The Data Tsunami!
  5. 5. Why are we measuring so many things?
     • Atoms vibrate at about 10^13 Hz. Assuming we measure only the atom (not its subatomic constituents) at a resolution of just 1 byte per vibration, that’s 10 TB per second per atom.
     • There are roughly 7*10^27 atoms in the human body.
     • So just monitoring one human body’s atoms would generate 7*10^40 bytes per second.
     • That’s 2*10^48 bytes in a year, or 2 yotta-yotta bytes.
     • By 2020, the quantity of electronically stored data will reach 35 trillion gigabytes; that’s only 35*10^21 bytes.
     • It’s easy (and fun) to play with numbers! Lies, damned lies and statistics!
     • We do not need to measure each revolution of an airplane’s turbine; a measurement only matters when an event (out of tolerance) occurs.
       o Collect events and what matters, NOT everything all the time!
       o How do we know what matters? Common sense, knowing your business, and experimentation!
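The back-of-envelope figures on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal (SI) prefixes and a 365-day year; the constant and variable names are illustrative and do not come from the slides.

```python
# Back-of-envelope check of the slide's numbers (decimal SI prefixes assumed).
VIBRATION_HZ = 10**13               # atomic vibration rate quoted on the slide
BYTES_PER_SAMPLE = 1                # one byte of resolution per vibration
ATOMS_IN_BODY = 7 * 10**27          # rough atom count for one human body
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: a non-leap year

TERA = 10**12
YOTTA = 10**24

bytes_per_atom_per_sec = VIBRATION_HZ * BYTES_PER_SAMPLE
bytes_per_body_per_sec = bytes_per_atom_per_sec * ATOMS_IN_BODY
bytes_per_body_per_year = bytes_per_body_per_sec * SECONDS_PER_YEAR

# 35 trillion gigabytes (the 2020 forecast), expressed in bytes
stored_2020_bytes = 35 * 10**12 * 10**9

print(bytes_per_atom_per_sec // TERA, "TB per second per atom")
print(f"{bytes_per_body_per_sec:.1e} bytes per second per body")
print(f"{bytes_per_body_per_year / YOTTA**2:.1f} yotta-yotta bytes per year")
print(f"{stored_2020_bytes:.1e} bytes stored worldwide by 2020")
```

The yearly figure comes out slightly above 2 yotta-yotta bytes, which the slide rounds down.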
  6. 6. Beware the “Bait and Switch”
  7. 7. Data You Need Lots of It!!
  8. 8. But There’s a Shortage of Data Scientists to Do Anything With It
  9. 9. So Give Me All Your Money
  10. 10. Introduction
     • The purpose of this one-day workshop is to provide both an introduction to and pragmatic insight into Big Data, Data Science and Real-Time Analytics.
     • This course provides a frank and objective review of the state of the art and the market, examining what is working in practice and what is not through an extensive series of case studies.
     • Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, manage, and process the data.
       o Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes.
       o A new platform of "big data" tools has arisen to handle sense-making over large quantities of data, for example the Apache Hadoop Big Data Platform.
     • Analyzing large data sets in near real-time is not new; business intelligence is as old as business itself (that is, as old as human society).
       o IT automated it, and enabled an organization to own it rather than leaving it in the wetware of a few human brains (generally the owners of a business).
       o Some real-time analysis results in automated triggers (so-called machine learning), but most analysis still requires human interpretation, which is not straightforward.
       o Analysis of such large and mixed data sources has its own problems, as we’ll discuss in the course.
       o Privacy and regulation cannot be ignored; for some industries this will limit the application of Big Data.
  11. 11. Structure Part 1 of 5
     • 09:00 Registration
     • 09:30 History and Overview: Understanding Big Data and Real-Time Analytics in Context
     • What do we mean by Big Data?
     • Why does Big Data matter?
     • Big Data Maturity
     • The 3Vs: Volume, Variety and Velocity
     • What are the Domains of Big Data?
     • Big Data Technologies
     • What Enterprises Think of Big Data
     • How Enterprise Verticals are Impacted by Big Data
     • Why Now?
     • Key Trends driving towards Big Data
     • History of Big Data
     • Taxonomy of Big Data Companies
     • Big Data Landscape
     • List of Companies in Big Data (and their Big Data revenues)
     • Big Data Market Sizing
     • Telecoms and Real-Time
     • O2 More: Proof we can do it!
     • 10:45 Coffee Break
  12. 12. Structure Part 2 of 5
     • 11:00 Quick Technology Review: Diving into a little detail on a few of the key technologies (only as deep as the architecture) to understand their history and capabilities / limitations
     • Hadoop
       o What is Hadoop?
       o Ecosystem
       o History
       o Design Axioms
       o Hadoop Distributed File System
       o MapReduce: Distributed Processing
       o Architecture
       o Data Schemas
       o Query Language Flexibility
       o Economics
       o Case Studies
     • Hadoop and HBase in the Cloud (Amazon)
     • NoSQL and Cassandra + some use cases
     • HBase versus Cassandra
     • Graph Database introduction
  13. 13. Structure Part 3 of 5
     • 12:00 & 14:00 Application of Big Data
     • Hardware and Software Trends
       o Drivers
       o Execution and Results Characteristics
       o Framework: Ecosystem, Application Services, Data Management
     • Real-Time Analytics
       o Use Cases
       o Extended RDBMS versus MapReduce / Hadoop
       o Requirements, Trends, People and Organization Issues, Outlook
     • Big Data and the Cloud
       o Why the Cloud and Big Data?
       o Cloud benefits
       o Use Cases: Bankinter, Etsy, Razorfish
     • The Social Enterprise
       o Business Benefits
       o ALU example
       o Social + Data Analysis = Business intelligence
       o AT&T Case Study
       o Lessons Learned
     • Telcos and Big Data
       o TMF Survey
       o Big Data Framework
       o Predictive / Adaptive Analytics
       o Decision Engineering
       o The Problem with Telecom
     • Telco Analytics
       o Customer Profiling
       o Next Product Tools
       o Marketing Mix Modeling
       o Cost of Acquisition Tools
       o Case Study
     • 13:00ish Lunch
  14. 14. Structure Part 4 of 5
     • 15:00 Ecosystem, Taxonomies and Suppliers: Understanding the many suppliers, technology camps, and approaches
     • Taxonomy of Big Data Companies
     • Big Data Landscape
     • Cloudera
     • Autonomy
     • Vertica
     • InfoChimps
     • Guavas
       o Actual Analytics
     • Matrixx
       o Performance
       o One data any API
     • Case Studies
     • Real Time Analytics for Big Data: Lessons from Facebook
       o Quick technology review
       o Facebook Real-time Analytics System
       o Goal
       o Actual Analytics
       o Solution
       o Memory, Collocate, Economics
     • Real Time Analytics for Big Data: Lessons from Twitter
       o Requirements
       o Challenges
       o Solution
       o Memory, Collocate, Economics
     • Other Case Studies
     • Orbitz, Hertz, Yelp
  15. 15. Structure Part 5 of 5
     • 16:00 Global Enterprise and Telecom Survey on Big Data and Real-Time Analytics
     • Background
     • The Questions
     • The Importance of Analytics
     • Impact of Big Data on Analytics
     • Size of Data Sets, Number of Data Sources
     • Update Frequency
     • Integration of Data Sources
     • Data Set Responsibility
     • Types of Data, Types of Processing and Analytics
     • Challenges
     • Big Data Analytics Platforms
     • Benefits and Plans
     • Data Analytics Storage and IT Infrastructure Requirements
     • Increasing Interest in Hadoop MapReduce Framework Technology
     • Conclusions
     • Recommendations and Wrap Up
  16. 16. Alan Quayle
     • 22 years of experience in the telecommunication industry, focused on developing profitable new businesses in service providers, suppliers and start-ups.
     • Customers include:
       o Operators such as AT&T, BT, Charter, Etisalat, M1, O2, Rogers, Swisscom, T-Mobile, Telstra, Time Warner Cable, Verizon and Vodafone;
       o Suppliers such as Adobe, Alcatel-Lucent, Ericsson, Huawei, Nokia Siemens Networks, and Oracle; and
       o Innovative start-ups such as Apigee, AppTrigger (sold to Metaswitch), Camiant (sold to Tekelec), OpenCloud, and Voxeo.
     • Works with the developer community and sits on the boards of developers such as GotoCamera and hSenid Mobile, as well as suppliers such as Sigma Systems.
     • Weblog: www.alanquayle.com/blog
     • LinkedIn: http://www.linkedin.com/in/alanquayle
  17. 17. A Thank You to Those Helping Me Put This Course Together
     • In putting this workshop together I’d like to thank the following suppliers for their time, openness, and willingness to review and provide material to ensure this workshop is up-to-the-minute.
       o And especially for not requiring any editorial control over the content or my views expressed in this material (in reverse alphabetical order).
     • Guavas
     • HP (don’t mention the Autonomy deal)
     • Versant, NoSQL database vendor
     • Ty Wang, social media entrepreneur using FB Social Graph
     • Lorien Pratt, Data / Decision Scientist with Telco focus
     • Amazon Web Services
     • Matrixx
  18. 18. Introductions
     • Spend 2 minutes to introduce yourself
       o Name, current employer and job
       o Let us know your favorite hobby
         • For me it’s hiking with my family
       o What you want to get out of this course
         • What topics are most important to you?
  19. 19. History and Overview: Understanding Big Data and Real-Time Analytics in Context
  20. 20. Structure
     • What do we mean by Big Data?
     • Why does Big Data matter?
     • Big Data Maturity
     • The 3Vs: Volume, Variety and Velocity
     • What are the Domains of Big Data?
     • Big Data Technologies
     • What Enterprises Think of Big Data
     • How Enterprise Verticals are Impacted by Big Data
     • Why Now?
     • Key Trends driving towards Big Data
     • History of Big Data
     • Taxonomy of Big Data Companies
     • Big Data Landscape
     • List of Companies in Big Data (and their Big Data revenues)
     • Big Data Market Sizing
     • Telecoms and Real-Time
     • O2 More: Proof we can do it!
  21. 21. What Do We Mean by Big Data?
  22. 22. IDC’s Definition of Big Data
  23. 23. What is Big Data
  24. 24. Why does Big Data Matter?
  28. 28. Another Version of the 3 Vs
     • Volume: Data sets are expanding constantly. A strategic approach to big data takes into account ways to store and manage the huge volumes of data that are being generated.
     • Variety: Big data comes in many forms. Analyzing multi-structured data can yield important insights that can help direct a business strategy.
     • Velocity: The speed at which data is analyzed is everything, especially when working in a time-sensitive business environment.
  34. 34. What are the Domains of Big Data?
  35. 35. Big Data Technology Stack
  36. 36. Big Data Technologies
  37. 37. The Technology has Become Quite Fashionable
  43. 43. Big Data Use Cases
  45. 45. Companies in Big Data
     • Storage: HP, EMC, IBM, Dell, NetApp, Hitachi Ltd., Fujitsu, Oracle, NEC
     • Servers: IBM, HP, Dell, Oracle, Fujitsu, Acer, Cray, Groupe Bull, Hitachi, NEC, SGI, Stratus Technologies, Unisys, Cisco, Lenovo
     • Networking: Cisco, Brocade, HP, Dell, IBM, Alcatel-Lucent, F5 Networks, Citrix
     • Relational database software: Oracle Exadata, IBM Netezza, IBM Smart Analytics System, Teradata, HP Vertica and Autonomy, SAP Sybase IQ, EMC Greenplum DB and HD, Microsoft SQL Server Parallel Edition, IBM Netezza High Capacity Appliance, Teradata Extreme Performance Appliance
     • Hadoop-based data management and analysis software: Cloudera, MapR, EMC Greenplum HD, Oracle Big Data Appliance, IBM BigInsights, HStreaming, Platfora, Zettaset, DataStax, Karmasphere, Datameer, Hadapt, and so forth
     • XML databases: MarkLogic, Oracle XML DB, IBM pureXML, Software AG webMethods, Tamino XML Server, TigerLogic, Xyleme, and so forth
  46. 46. Companies in Big Data
     • Object-oriented databases: Jade Software, Objectivity, Progress Software, Versant
     • Graph databases: Neo Technology, Objectivity, Franz Inc., Sones, Ravel
     • Ultra-high-speed streaming data technologies: IBM InfoSphere Streams, Informatica Ultra Messaging Streaming Edition, TIBCO FTL and BusinessEvents, Progress Software Apama CEP
     • Analytics and discovery software: SAS, IBM, Attivio, HP Autonomy, Skytree, Oracle Advanced Analytics, IBM SPSS, Microsoft, Vivisimo, ZyLAB, Sinequa, Revolution Analytics, KXEN, BA Insight, Palantir, Perfect Search, Wolfram Alpha
     • Decision support and automation software including applications: Webtrends, Adobe-Omniture, IBM Coremetrics, FICO
     • Services: Accenture, Deloitte, TCS, HP, Teradata, Mu Sigma, Think Big Analytics, Hortonworks, Hashrocket, KloudData, Trendwise Analytics
  47. 47. Big Data Is a Big Market & Big Business - $50 Billion Market by 2017 (according to Wikibon)
     • Open source analyst firm Wikibon pegs the current Big Data market at just over $5 billion (IDC and others agree).
     • Wikibon forecasts that the Big Data market will grow at a CAGR of 58% between now and 2017, hitting $50 billion within five years.
     • Vendors from whales like IBM and HP to pure-plays like Vertica and Cloudera are bringing in significant revenue today helping enterprises, governments and healthcare organizations process and make sense of the torrents of unstructured data flowing from mobile devices, sensors, social media and other sources.
     • Today Big Data technologies like Hadoop are mostly in production at Web and online gaming companies, large financial services firms and banks, and online retailers.
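The growth claim on this slide is a compound-growth calculation and is easy to sanity-check. This is a minimal sketch, assuming a starting market of exactly $5 billion (the slide says "just over") and a constant 58% rate; `project_market` is a hypothetical helper name.

```python
def project_market(start_billion, cagr, years):
    """Return the market size for year 0..years under compound annual growth."""
    return [start_billion * (1 + cagr) ** year for year in range(years + 1)]


# Assumed inputs: $5B today, 58% CAGR, five years out (2012 to 2017).
projection = project_market(start_billion=5.0, cagr=0.58, years=5)
for year, size in enumerate(projection):
    print(f"year +{year}: ${size:.1f}B")
```

Under these assumptions the fifth-year figure lands just below $50 billion, consistent with the rounded headline number.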
  48. 48. Big Data Is a Big Market & Big Business - $50 Billion Market by 2017
     • Another important point is that, while Hadoop may be the poster child of Big Data, there are other important technologies at play. Besides Hadoop (an open source framework for distributing data processing across multiple nodes), these include:
       o Massively parallel data warehouses “that deliver fast data loading and real-time analytic capabilities”;
       o Analytic platforms and applications that allow Data Scientists and Business Analysts to manipulate Big Data; and
       o Data Visualization tools that bring insights from Big Data analysis alive for end users.
     • Of the current market, Big Data pure-play vendors account for $300 million in Big Data-related revenue.
       o Despite their relatively small percentage of current overall revenue (approximately 5%), Big Data pure-play vendors (such as Vertica, Splunk and Cloudera) are responsible for the vast majority of new innovations and modern approaches to data management and analytics that have emerged over the last several years and made Big Data the hottest sector in IT.
  49. 49. Wikibon Forecast
  50. 50. IDC’s Forecast
  73. 73. Technology Review: Diving into a little detail on a few of the key technologies (only as deep as the architecture) to understand their history and capabilities / limitations
  74. 74. Structure Part 2 of 5
     • Hadoop
       o What is Hadoop?
       o Ecosystem
       o History
       o Design Axioms
       o Hadoop Distributed File System
       o MapReduce: Distributed Processing
       o Architecture
       o Data Schemas
       o Query Language Flexibility
       o Economics
       o Case Studies
     • Hadoop and HBase in the Cloud (Amazon)
     • NoSQL and Cassandra + some use cases
     • HBase versus Cassandra
     • Graph Database introduction
  75. 75. HBase Versus Cassandra: History
     • HBase and its required supporting systems are derived from what is known of the original Google BigTable and Google File System designs (from the Google File System paper published in 2003 and the BigTable paper published in 2006).
     • Cassandra, on the other hand, is a recent open source fork of a standalone database system initially coded by Facebook. While it implements the BigTable data model, it stores data using a system inspired by Amazon’s Dynamo (in fact much of the initial development work on Cassandra was performed by two Dynamo engineers recruited to Facebook from Amazon).
  76. 76. HBase Versus Cassandra
     • These differing histories have resulted in HBase being more suitable for data warehousing and large-scale data processing and analysis (for example, the kind involved in indexing the Web).
     • Cassandra is more suitable for real-time transaction processing and the serving of interactive data.
     • For lightweight validation you’ll find the current makeup of the key committers interesting:
       o The primary committers to HBase work for Bing (Microsoft bought their search company last year, and gave them permission to continue submitting open source code after a couple of months).
       o By contrast, the primary committers on Cassandra work for Rackspace, which supports the idea of an advanced general-purpose NoSQL solution being freely available to counter the threat of companies becoming locked in to the proprietary NoSQL solutions offered by the likes of Google, Yahoo and Amazon EC2.
  77. 77. • The CAP theorem was developed by Professor Eric Brewer, co-founder and Chief Scientist of Inktomi.
     • The theorem states that a distributed (or “shared data”) system design can offer at most two out of three desirable properties: Consistency, Availability and tolerance to network Partitions. Consistency means that if someone writes a value to a database, other users will thereafter immediately be able to read the same value back. Availability means that if some number of nodes in your cluster fail, the distributed system remains operational. Tolerance to Partitions means that if a network failure divides the nodes in your cluster into two groups that can no longer communicate, the system again remains operational.
     • If you search online posts comparing HBase and Cassandra, you will regularly find the HBase community explaining that they have chosen CP, while Cassandra has chosen AP.
     • BUT the CAP theorem only applies to a single distributed algorithm. There is no reason why you cannot design a single system where, for any given operation, the underlying algorithm, and thus the trade-off achieved, is selectable.
     • Thus, while it is true that a system may only offer two of these properties per operation, what has been widely missed is that a system can be designed to let a caller choose which properties they want when any given operation is performed.
     • Not only that: reality is not nearly so black and white, and it is possible to offer differing degrees of balance between consistency, availability and tolerance to partition. This is Cassandra.
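The per-operation trade-off described on this slide is often explained with the quorum overlap rule: with N replicas, a read that contacts R replicas is guaranteed to see a write acknowledged by W replicas whenever R + W > N. The sketch below is a simplified model of that rule only. It borrows the level names ONE, QUORUM and ALL, but the functions are hypothetical helpers, not Cassandra's API, and it ignores mechanisms such as read repair and multi-datacenter levels.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must acknowledge an operation at a given level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def read_sees_latest_write(write_level, read_level, replication_factor):
    """True when every read replica set must overlap every write replica set."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


def tolerated_failures(level, replication_factor):
    """Replicas that can be down while operations at this level still succeed."""
    return replication_factor - replicas_required(level, replication_factor)


# Each caller picks its own point on the consistency/availability spectrum.
for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(
        write_level,
        read_level,
        "consistent reads:", read_sees_latest_write(write_level, read_level, 3),
        "write survives failures:", tolerated_failures(write_level, 3),
    )
```

In this model, writing and reading at ONE favours availability, QUORUM on both sides gives consistent reads while tolerating one failed replica out of three, and ALL gives consistency but tolerates no failures.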
  78. 78. Application of Big Data
  79. 79. Structure
     • Hardware and Software Trends
       o Drivers
       o Execution and Results Characteristics
       o Framework: Ecosystem, Application Services, Data Management
     • Real-Time Analytics
       o Use Cases
       o Extended RDBMS versus MapReduce / Hadoop
       o Requirements, Trends, People and Organization Issues, Outlook
     • Big Data and the Cloud
       o Why the Cloud and Big Data?
       o Cloud benefits
       o Use Cases: Bankinter, Etsy, Razorfish
     • The Social Enterprise
       o Business Benefits
       o ALU example
       o Social + Data Analysis = Business intelligence
       o AT&T Case Study
       o Lessons Learned
     • Telcos and Big Data
       o TMF Survey
       o Big Data Framework
       o Predictive / Adaptive Analytics
       o Decision Engineering
       o The Problem with Telecom
     • Telco Analytics
       o Customer Profiling
       o Next Product Tools
       o Marketing Mix Modeling
       o Cost of Acquisition Tools
       o Case Study
  80. 80. Use Cases for Big Data Analytics
     • Search ranking.
       o All search engines attempt to rank the relevance of a webpage to a search request against all other possible webpages.
       o Google’s PageRank algorithm is, of course, the poster child for this use case.
     • Ad tracking.
       o E-commerce sites typically record an enormous river of data including every page event in every user session.
       o This allows for very short turnaround of experiments in ad placement, color, size, wording, and other features.
       o When an experiment shows that such a feature change in an ad results in improved click-through behavior, the change can be implemented virtually in real time.
     • Location and proximity tracking.
       o Many use cases add precise GPS location tracking, together with frequent updates, in operational applications, security analysis, navigation, and social media.
       o Precise location tracking opens the door for an enormous ocean of data about other locations near the GPS measurement.
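For the search-ranking use case on this slide, the idea behind link-based ranking can be shown on a toy graph. This is a minimal sketch of the textbook power-iteration form of PageRank, not Google's production algorithm; the four-page `toy_web` graph, the damping factor of 0.85 and the iteration count are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by link structure; links maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its damped rank evenly across its outgoing links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", nothing links to "d".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(score, 3))
```

At web scale the same iteration is typically expressed as a distributed job over the link graph, which is where platforms such as Hadoop come in.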
  81. 81. Use Cases for Big Data Analytics
     • Causal factor discovery.
       o Point-of-sale data has long been able to show us when the sales of a product go sharply up or down. But searching for the causal factors that explain these deviations has been, at best, a guessing game or an art form.
       o The answers may be found in competitive pricing data, competitive promotional data including print and television media, weather, holidays, national events including disasters, and virally spread opinions found in social media.
     • Social CRM.
       o This use case is one of the hottest new areas for marketing analysis. The Altimeter Group has described a very useful set of key performance indicators for social CRM that include share of voice, audience engagement, conversation reach, active advocates, advocate influence, advocacy impact, resolution rate, resolution time, satisfaction score, topic trends, sentiment ratio, and idea impact.
       o The calculation of these KPIs involves in-depth trawling of a huge array of data sources, especially unstructured social media.
  82. 82. Use Cases for Big Data Analytics
     • Document similarity testing.
       o Two documents can be compared to derive a metric of similarity. There is a large body of academic research and tested algorithms, for example latent semantic analysis, that is just now finding its way to driving monetized insights of interest to big data practitioners.
       o For example, a single source document can be used as a kind of multifaceted template to compare against a large set of target documents. This could be used for threat discovery, sentiment analysis, and opinion polls. For example: “find all the documents that agree with my source document on global warming.”
     • Genomics analysis: e.g., commercial seed gene sequencing.
       o A few months ago the cotton research community was thrilled by a genome sequencing announcement that stated in part: “The sequence will serve a critical role as the reference for future assembly of the larger cotton crop genome. Cotton is the most important fiber crop worldwide and this sequence information will open the way for more rapid breeding for higher yield, better fiber quality and adaptation to environmental stresses and for insect and disease resistance.”
       o Scientist Ryan Rapp stressed the importance of involving the cotton research community in analyzing the sequence, identifying genes and gene families and determining the future directions of research.
       o This use case is just one example of a whole industry that is being formed to address genomics analysis broadly, beyond this example of seed gene sequencing.
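The document similarity use case on this slide can be illustrated with the simplest common metric, cosine similarity over word counts. Note that this is a deliberately simpler stand-in for the latent semantic analysis the slide mentions: it compares raw term frequencies and knows nothing about meaning. The function names and sample sentences are made up for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lower-case the text and count its words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    a, b = term_counts(doc_a), term_counts(doc_b)
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(source, targets):
    """Order target documents from most to least similar to the source."""
    return sorted(targets, key=lambda t: cosine_similarity(source, t), reverse=True)


source = "global warming is driven by rising carbon emissions"
targets = [
    "quarterly churn figures for prepaid subscribers",
    "rising carbon emissions are driving global warming",
]
print(rank_by_similarity(source, targets)[0])
```

Word overlap finds documents about the same topic, not documents that agree with the source; judging agreement needs sentiment or stance analysis on top.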
  83. 83. Use Cases for Big Data Analytics
     • Discovery of customer cohort groups.
       o Customer cohort groups are used by many enterprises to identify common demographic trends and behavior histories. We are all familiar with Amazon's cohort groups when they say other customers who bought the same book as you have also bought the following books. Of course, if you can sell your product or service to one member of a cohort group, then all the rest may be reasonable prospects. Cohort groups are represented logically and graphically as links, and much of the analysis of cohort groups involves specialized link analysis algorithms.
     • In-flight aircraft status.
       o This use case, as well as the following two use cases, is made possible by the introduction of sensor technology everywhere. In the case of aircraft systems, the in-flight status of hundreds of variables on engines, fuel systems, hydraulics, and electrical systems is measured and transmitted every few milliseconds. The value of this use case is not just the engineering telemetry data that could be analyzed at some future point in time; the data also drives real-time adaptive control, fuel usage, part failure prediction, and pilot notification.
     • Smart utility meters.
       o It didn't take long for utility companies to figure out that a smart meter can be used for more than just the monthly readout that produces the customer’s utility bill. By drastically cranking up the frequency of the readouts to as much as one readout per second per meter across the entire customer landscape, many useful analyses can be performed including dynamic load-balancing, failure response, adaptive pricing, and longer-term strategies for incenting customers to utilize the utility more effectively (either from the customers’ point of view or the utility's point of view!)
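The cohort idea on this slide ("customers who bought this also bought") reduces to counting links between items that share a buyer. This is a minimal sketch under the assumption that each customer's purchases are available as a set; the function names and the sample baskets are hypothetical, and real link analysis algorithms are considerably more elaborate.

```python
from collections import Counter, defaultdict
from itertools import permutations


def co_purchase_counts(baskets):
    """Count, for every item, how often each other item shares a customer with it."""
    links = defaultdict(Counter)
    for basket in baskets:
        for item, other in permutations(basket, 2):
            links[item][other] += 1
    return links


def also_bought(links, item, top_n=3):
    """Items most often bought by the same customers as `item`."""
    return [other for other, _count in links[item].most_common(top_n)]


# Made-up purchase histories, one set per customer.
baskets = [
    {"hadoop book", "cassandra book"},
    {"hadoop book", "cassandra book", "hbase book"},
    {"hadoop book", "pig book"},
]
links = co_purchase_counts(baskets)
print(also_bought(links, "hadoop book", top_n=1))
```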
  84. 84. Use Cases for Big Data Analytics
     • Building sensors.
       o Modern industrial buildings and high-rises are being fitted with thousands of small sensors to detect temperature, humidity, vibration, and noise.
       o Like the smart utility meters, collecting this data every few seconds 24 hours per day allows many forms of analysis including energy usage, unusual problems including security violations, component failure in air-conditioning and heating systems and plumbing systems, and the development of construction practices and pricing strategies.
     • Satellite image comparison.
       o Images of regions of the earth are captured on every pass of certain satellites, at intervals typically separated by a small number of days.
       o Overlaying these images and computing the differences allows the creation of hot spot maps showing what has changed. This analysis can identify construction, destruction, changes due to disasters like hurricanes and earthquakes and fires, and the spread of human encroachment.
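The overlay-and-difference step in the satellite image use case can be sketched on tiny grids. This is a minimal illustration, assuming the two images are already aligned, have the same dimensions, and hold one grayscale brightness value per cell; `changed_cells` and the threshold value are illustrative choices.

```python
def changed_cells(before, after, threshold):
    """Return (row, col) positions whose brightness changed by more than threshold."""
    same_shape = len(before) == len(after) and all(
        len(row_before) == len(row_after)
        for row_before, row_after in zip(before, after)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    return [
        (r, c)
        for r, (row_before, row_after) in enumerate(zip(before, after))
        for c, (p0, p1) in enumerate(zip(row_before, row_after))
        if abs(p1 - p0) > threshold
    ]


# Two passes over the same 2x2 region; one cell changes sharply.
pass_1 = [[10, 10], [10, 10]]
pass_2 = [[10, 12], [90, 10]]
print(changed_cells(pass_1, pass_2, threshold=20))
```

The threshold filters out small variations (lighting, sensor noise) so that only substantial changes reach the hot spot map.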
  85. 85. Use Cases for Big Data Analytics
     • CAT scan comparisons.
       o CAT scans are stacks of images taken as "slices" of the human body. Large libraries of CAT scans can be analyzed to facilitate the automatic diagnosis of medical issues and their prevalence.
     • Financial account fraud detection and intervention.
       o Account fraud, of course, has immediate and obvious financial impact. In many cases fraud can be detected by patterns of account behavior, in some cases crossing multiple financial systems. For example, "check kiting" requires the rapid transfer of money back and forth between two separate accounts.
       o Certain forms of broker fraud involve two conspiring brokers selling a security back and forth at ever-increasing prices, until an unsuspecting third party enters the action by buying the security, allowing the fraudulent brokers to quickly exit. Again, this behavior may take place across two separate exchanges in a short period of time.
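The check-kiting pattern described on this slide (money moving rapidly back and forth between two accounts) can be expressed as a simple rule over a transfer log. This is a toy heuristic, not a real fraud model: the tuple layout, the function name and both thresholds are assumptions chosen for illustration.

```python
from collections import defaultdict


def suspicious_pairs(transfers, window_hours=48, min_reversals=3):
    """Flag account pairs with repeated, rapid back-and-forth transfers.

    transfers: list of (hour, from_account, to_account, amount) tuples.
    """
    by_pair = defaultdict(list)
    for hour, src, dst, _amount in transfers:
        pair = tuple(sorted((src, dst)))
        by_pair[pair].append((hour, src))

    flagged = []
    for pair, events in by_pair.items():
        events.sort()
        # A reversal is a transfer in the opposite direction to the previous
        # one between the same two accounts, within the time window.
        reversals = sum(
            1
            for (h0, s0), (h1, s1) in zip(events, events[1:])
            if s0 != s1 and h1 - h0 <= window_hours
        )
        if reversals >= min_reversals:
            flagged.append(pair)
    return flagged


# Made-up transfer log: A and B bounce money back and forth, C pays D once.
transfers = [
    (0, "A", "B", 900.0),
    (5, "C", "D", 120.0),
    (10, "B", "A", 950.0),
    (20, "A", "B", 1000.0),
    (30, "B", "A", 1050.0),
]
print(suspicious_pairs(transfers))
```

The two accounts may sit in different institutions, so in practice a rule like this would need transfer data joined across multiple financial systems.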
  86. 86. Use Cases for Big Data Analytics
     • Computer system hacking detection and intervention.
       o System hacking in many cases involves an unusual entry mode or some other kind of behavior that in retrospect is a smoking gun but may be hard to detect in real time.
     • Online game gesture tracking.
       o Online game companies typically record every click and maneuver by every player at the most fine-grained level. This avalanche of "telemetry data" allows fraud detection, intervention for a player who is getting consistently defeated (and therefore discouraged), offers of additional features or game goals for players who are about to finish a game and depart, ideas for new game features, and experiments for new features in the games.
       o This can be generalized to television viewing. Your DVR box can capture remote control keystrokes, recording events, playback events, picture-in-picture viewing, and the context of the guide. All of this can be sent back to your provider.
     • Big science including atom smashers, weather analysis, space probe telemetry feeds.
       o Major scientific projects have always collected a lot of data, but now the techniques of big data analytics are allowing broader access and much more timely access to the data. Big science data, of course, is a mixture of all forms of data: scalar, vector, complex structures, analog wave forms, and images.
  87. 87. Use Cases for Big Data Analytics
     • "Data bag" exploration.
       o There are many situations in commercial environments and in the research communities where large volumes of raw data are collected. One example might be data collected about structure fires. Beyond the predictable dimensions of time, place, primary cause of fire, and responding firefighters, there may be a wealth of unpredictable anecdotal data that at best can be modeled as a disorderly collection of name-value pairs, such as "contributing weather = lightning."
       o Another example would be the listing of all relevant financial assets for a defendant in a lawsuit. Again, such a list is likely to be a disorderly collection of name-value pairs, such as "shared real estate ownership = condominium." The list of examples like this is endless.
       o What they have in common is the need to encapsulate the disorderly collection of name-value pairs, which is generally known as a "data bag." Complex data bags may contain both name-value pairs and embedded sub data bags. The challenge in this use case is to find a common way to approach the analysis of data bags when the content of the data may need to be discovered after the data is loaded.
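One way to picture the data bag described on this slide is as a nested dictionary whose names are only discovered after loading. This is a minimal sketch under that assumption; `flatten_bag`, `discover_names` and the sample fire reports are hypothetical, with only the "contributing weather" pair taken from the slide.

```python
from collections import Counter


def flatten_bag(bag, prefix=""):
    """Flatten nested data bags (dicts) into dotted name/value pairs."""
    flat = {}
    for name, value in bag.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # An embedded sub data bag: recurse and keep the parent name as prefix.
            flat.update(flatten_bag(value, path))
        else:
            flat[path] = value
    return flat


def discover_names(bags):
    """Count how often each (flattened) name appears across a collection of bags."""
    counts = Counter()
    for bag in bags:
        counts.update(flatten_bag(bag).keys())
    return counts


# Made-up fire reports: no two bags share the same set of names.
fire_reports = [
    {"place": "warehouse", "contributing weather": "lightning"},
    {"place": "condominium", "witness": {"name": "unknown", "statement": "smelled smoke"}},
]
print(discover_names(fire_reports).most_common())
```

Counting names after load gives a first view of which attributes are common enough to promote to regular dimensions and which remain anecdotal.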
  88. 88. Use Cases for Big Data Analytics • The final two use cases are old and even predate data warehousing itself. But new life has been breathed into these use cases because of the exciting potential of ultra-atomic customer behavior data. o Loan risk analysis and insurance policy underwriting. In order to evaluate the risk of a prospective loan or a prospective insurance policy, many data sources can be brought into play ranging from payment histories, detailed credit behavior, employment data, and financial asset disclosures. In some cases the collateral for a loan or the insured item may be accompanied by image data. o Customer churn analysis. Enterprises concerned with churn want to understand the predictive factors leading up to the loss of a customer, including that customer’s detailed behavior as well as many external factors including the economy, life stage and other demographics of the customer, and finally real time competitive issues. © 2012 Alan Quayle Business and Service Development 88
  89. 89. Characteristics of Big Data How the Cloud Is Big Data’s Best Friend Big Data on the Cloud In the Real World
  90. 90. Characteristics of Big Data
  91. 91. Features driven by MapReduce
  92. 92. Big Data is Getting Bigger • 2.7 zettabytes in 2012 • Over 90% will be unstructured • Data spread across a wide array of silos
  93. 93. Why is Big Data Hard (and Getting Harder)? Changing Data Requirements Faster response time of fresher data Sampling is not good enough & history is important Increasing complexity of analytics Users demand inexpensive experimentation
  94. 94. Where is it Coming From? • Computer Generated o Application server logs (web sites, games) o Sensor data (weather, water, smart grids) o Images/videos (traffic, security cameras) • Human Generated o Twitter “Fire Hose”: 50m tweets/day, 1,400% growth per year o Blogs/Reviews/Emails/Pictures o Social Graphs: Facebook, Linked-in, Contacts
  95. 95. Big Data Verticals • Social Network/Gaming: User Demographics, Usage analysis, In-game metrics • Media/Advertising: Targeted Advertising, Image and Video Processing • Life Sciences: Genome Analysis • Financial Services: Monte Carlo Simulations, Risk Analysis • Oil & Gas: Seismic Analysis • Retail: Recommendations, Transactions Analysis • Security: Anti-virus, Fraud Detection, Image Recognition
  96. 96. Bank – Monte Carlo Simulations • 23 Hours to 20 Minutes • “The AWS platform was a good fit for its unlimited and flexible computational power to our risk-simulation process requirements. With AWS, we now have the power to decide how fast we want to obtain simulation results, and, more importantly, we have the ability to run simulations not possible before due to the large amount of infrastructure required.” – Castillo, Director, Bankinter
  97. 97. Recommendations The Taste Test http://www.etsy.com/tastetest
  98. 98. Recommendations Gift Ideas for Facebook Friends etsy.com/gifts
  99. 99. Recommendations
  100. 100. Click Stream Analysis • User recently purchased a sports movie and is searching for video games • Targeted Ad (1.7 Million per day)
  101. 101. The Social Enterprise • Implementations are getting bigger and growing faster than ever • Virtually all data continue to show sustained real-world benefits (McKinsey, IBM, Frost and Sullivan, AIIM) • Everything is becoming social: Social features are appearing in virtually all types of applications • There continues to be considerable confusion about who “owns” social in the organization • The predicted social data explosion: It happened • Mining insight from social data has now become a major industry (#bigdata, #analytics) • The blur between internal and external social business has not progressed as far as many thought • The first serious talk about open social business standards has begun © 2012 Alan Quayle Business and Service Development 101
  103. 103. Decision Engineering Adaptive Analytics Predictive Analytics Reporting Data Management (including data migration, data quality, data modeling)
  104. 104. Decision Engineering Adaptive Analytics Predictive Analytics Reporting Data Management (including data migration, data quality, data modeling)
  105. 105. Predictive/Adaptive Analytics on 1 slide • Will this customer churn? • Yes/No data: If customer has an open trouble ticket: Yes, otherwise: No • Real-Valued: If customer age < 30: Yes, otherwise: No • Pattern Combination: If customer age < 30 AND has an open trouble ticket: Yes, otherwise: No • Linear Combination: If 2.3 x Age + 4.4 x Income > 40: Yes, otherwise: No • Predictive Analytics: Obtain these numbers by analyzing historical data • Adaptive Analytics: Update your historical data, and re-derive the numbers periodically to take changing situations into account. • Nonlinear Analytics: [two plots of Income vs. Age]
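The four rule types on this slide can be written down directly, and "predictive" then means deriving the numbers from history. The sketch below is illustrative only: the function names are hypothetical, the units of Age and Income are whatever the slide's 2.3, 4.4 and 40 assume, and `fit_age_threshold` is a deliberately naive search rather than a recommended modeling method.

```python
def churn_yes_no(has_open_ticket):
    """Yes/No data rule."""
    return has_open_ticket

def churn_real_valued(age):
    """Real-valued rule."""
    return age < 30

def churn_pattern(age, has_open_ticket):
    """Pattern combination rule."""
    return age < 30 and has_open_ticket

def churn_linear(age, income, w_age=2.3, w_income=4.4, threshold=40):
    """Linear combination rule with the slide's example numbers as defaults."""
    return w_age * age + w_income * income > threshold

def fit_age_threshold(history):
    """Predictive step: pick the cut-off t (rule: age < t) that gets the
    most historical (age, churned) records right."""
    candidates = sorted({age for age, _ in history})
    candidates.append(candidates[-1] + 1)   # allows "everyone churns"
    def correct(t):
        return sum((age < t) == churned for age, churned in history)
    return max(candidates, key=correct)

# Invented history for illustration: (age, churned)
history = [(22, True), (27, True), (35, False), (48, False)]
threshold = fit_age_threshold(history)
```

The adaptive step is then simply to append new records to `history` and call `fit_age_threshold` again on a schedule.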
  106. 106. Decision Engineering Adaptive Analytics Predictive Analytics Reporting Data Management (including data migration, data quality, data modeling)
  107. 107. Decision Model (part of Decision Engineering) From: Agile Decision Making: Improving business results with analytics TM Forum Quick Insight report, 2011. Source: Lorien Pratt …Decision engineering places analytics in the larger business context. Each “f” here is an analytic, or based on human expertise
  108. 108. [Diagram with steps numbered 1 to 5] • Data used to construct the analytic • Analytic: If 2.3 x Age + 4.4 x Income > 40: Yes, otherwise: No • Operational data: Sally • Result: Sally is likely enough to churn that we should call her
  109. 109. Key Distinctions • Automated versus human-in-the-loop while building analytics • Automated versus human-in-the-loop while using analytics • Strategic versus tactical goals • One-size fits all versus demographic versus personalized • Within-silo versus between-silo • Cleansing for operational versus analytic purposes
  110. 110. Moving Analytics to the Center: Retailers face new competition that is driving an advanced view of customers and interactions to the center of the business. • Center: Advanced Customer Intelligence • Surrounding functions: Multi-Channel Operations, Merchandising, Marketing & Sales, Supply Chain Operations, Supplier/Partner Collaboration • How to dynamically manage margin and brand perception with the right mix of regular, promotional and markdown products across categories, channels, and formats? • How do I leverage and operationalize customer insights and experience data to drive personal, timely, and relevant interactions across all channels? • How do I create a responsive analytics capability, and governance relative to the right-time application of analytic decision making? • Are inventory and demand data leveraged to optimize the customer experience and effectively respond to changing marketing conditions?
  111. 111. Semantic Framework: Applied Customer Analytics Capability
  112. 112. The New Analytical Competency • Focus of Efforts in the Past o Large-scale Integration of All Data Sources o Central Control of Meta Data and Information Usage o Developing the Most Technically Correct Analytical Point Solution Possible • New Competency Requirements o Connected Information & Analytics Governance for the Enterprise o Provisioning Information & Insights to Point of Leverage o Agile Analytical Modeling Processes & Rapid Evaluation of Business Lift • Example o FROM: How can we use all possible customer dimensions to predict customer churn? o TO: What is the optimum behavior modeling framework to rapidly build and deploy models applicable to multiple business objectives that change over time?
  113. 113. Predictive Analytics • Historical Approaches Rely on Static Data o Propensity to Churn o Propensity to Buy o Propensity to Pay o Customer Lifetime Value • Future Needs Require a More Dynamic Approach o Ability to intervene in customer interactions to create desired outcomes
  114. 114. Problem Statements Telcos are not traditionally nimble Telcos look at customers in groups, not individually. Telcos have very little idea what drives customer behavior Telcos have no idea how to influence customer behavior Even if they knew how to influence customer behavior, Telcos do not have the nimble decisioning tools required to impact customer behavior in real time.
  115. 115. Ecosystem, Taxonomies and Supplier Review Understanding the many suppliers, technology camps, and approaches © 2012 Alan Quayle Business and Service Development 115
  116. 116. Structure Part 4 of 5 • 15:00 Ecosystem, Taxonomies and Suppliers: Understanding the many suppliers, technology camps, and approaches • Taxonomy of Big Data Companies • Big Data Landscape • Cloudera • Autonomy • Vertica • InfoChimps • Guavus • Matrixx • Case Studies • Real Time Analytics for Big Data Lessons from Facebook o Quick technology review o Facebook Real-time Analytics System o Goal o Actual Analytics o Solution o Memory, Collocate, Economics • Real Time Analytics for Big Data Lessons from Twitter o Requirements o Actual Analytics o Challenges o Performance o One data any API o Solution o Memory, Collocate, Economics • Other Case Studies • Orbitz, Hertz, Yelp
  117. 117. Guavus provides integrated solutions to enable rapid decisions on big data for CSPs • Guavus delivers big data solutions, not just technology components • Unique ability to rapidly fuse huge quantities of data from diverse sources • Patent pending streaming analytics technology proven over 10+ years • Current customers include leading wireless, IP, and video service providers
  118. 118. Guavus at a Glance • Silicon Valley Venture Backed Company o US HQ in San Mateo, CA, R&D Offices in India o Raised $48 Million, 350 employees worldwide • Tier-1 CSP Customers & Partnerships o 3 of the top 5 NA mobile operators, 3 of the top 5 IP / MPLS backbone carriers, & CDN Networks o 4 of the top 6 largest global communications infrastructure equipment vendors • Industry Proven & Recognized o Mature (10+ years) patent-pending technology
  119. 119. Guavus Empowers LOB to Make Decisions • Information Systems (Enterprise Apps, Databases, Data Warehouses): Data at Rest, Views • Devices & Networks (Networks): Data in Motion, Flows • Data Collection, Fusion and Mining Across Disparate Data Sources • Finance & Regulatory o Profitability Analysis o Tiered Pricing Optimization o Contract/SLA Enforcement • Network & Operations o Traffic Engineering o Capacity Planning o Peering Optimization • Customer Care o Customer Segmentation o Campaign management • Executives o Continuous Business Optimization o Predictive Planning • Marketing & Sales o Churn Prediction o Focused Prospecting o Targeted Up-Sell & Cross-Sell
  120. 120. Operator Challenges in a Big Data World • EXPONENTIAL [ STREAMING ] DATA GROWTH • DATA SITTING IN SILOS • TIMELY INSIGHTS GENERATION • DISTRIBUTED NETWORK
  121. 121. Key Data Sources & Insights • Data sources: Content Providers, Internet, CDN, Edge Network, Access Network, CPE or End Device • Streaming Analytics Insights o Content trending & consumption o Fused network events o Subscriber dynamic usage profiles o Network usage patterns o Policy control functions
  122. 122. Transforming the Big Data Analytics Economic Model • Traditional: Centralized, Store-First Architecture (TRANSPORT, STORAGE, COMPUTE, then [ Insights ]) o Consolidate data in a repository o Transport and store data: transport and storage costs alone may put it over budget o Project may not even get started • Streaming Centric: Distributed, Compute-First Architecture (COMPUTE first, then TRANSPORT and STORAGE, with [ Insights ] earlier) o Move processing to data edge o Focus spend on analytics first o Continuous processing yields timely and actionable insights o Reduce overall spend per new analytics questions o Leverage off the shelf low cost processing and storage
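The compute-first idea can be illustrated with a toy example: each site reduces its raw records to a small summary, and only the summaries are transported and merged centrally. This is a sketch under simplifying assumptions (records are `(subscriber, bytes)` tuples, and the only question asked is total usage per subscriber); the function names are hypothetical and it does not represent any vendor's implementation.

```python
from collections import Counter

def summarize_at_edge(records):
    """Run at each distributed site: reduce raw (subscriber, bytes) records
    to per-subscriber totals before anything is transported."""
    totals = Counter()
    for subscriber, nbytes in records:
        totals[subscriber] += nbytes
    return totals

def merge_central(site_summaries):
    """Run centrally: combine the small per-site summaries."""
    merged = Counter()
    for summary in site_summaries:
        merged.update(summary)          # Counter.update adds the counts
    return merged

# Invented raw records at two sites
site1 = [("a", 100), ("b", 50), ("a", 25)]
site2 = [("a", 10), ("c", 5)]

summaries = [summarize_at_edge(site1), summarize_at_edge(site2)]
usage = merge_central(summaries)
```

Five raw records become four summary entries here; on real traffic the reduction is what moves spend from transport and storage toward analytics.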
  123. 123. Big Data Streaming Analytics Architecture Analytics Applications Examples 3rd Party Feeds & Customer Tools Data Market Capacity Broadban Ad Mobility Digital Warehouse Research Planning d Targeting Media s Centralized Clustering Master Master Business Machine Compute & & Fusion Aggregation Logic Learning Classifying Analyze Distributed Site 1 Distributed Site 2 Distributed Site 3 Aggregation Aggregation Aggregation Data Fusion Data Fusion Data Fusion Streaming / Batch Local Streaming / Batch Local Streaming / Batch Local Ingest Ingest Ingest Data Store Data Store Data Store Media Service DPI PDN Flow & AAA Web Web Advertisin Type Meta Consumption Data Flows Routing Data Activity Taxonomy g Traffic Data Traffic Data Sources
  124. 124. Guavus Analytics Platform Details Guavus Applications Customer UI Portals Insight Discovery 3rd Party System Support Mobility Reflex Consumer Guavus External API Network Management, Reporting & POC Sandbox Field Inventory, etc. IP Reflex Enterprise CDN Reflex Reporting Ad Reflex API Data Stores (IT, DWH, Cloud) Cube API HBASE API SQL SQL/Hive Ingest Export Processing Pipeline Caching Compute Nodes … XDR Guavus Stream ( Bus Cubes, Machine Learning Caching ) Analysis Store Traditional ETL Layers Central Compute ( Fusion, Aggregation & Compute ) Data Store Distributed Data Distributed Data Distributed Data Collectors Collectors Collectors Inventor DPI PCMD IPDR NetFlow RADIUS DNS … PM / FM CRM y Streaming Data Feeds
  125. 125. Matrixx. Parallel-MATRIXX™ • Parallel-MATRIXX™ technology re-invents real-time transaction processing and removes the limitations of the contemporary technologies described earlier. • The next slide shows the Parallel-MATRIXX™ functional architecture, which is based on multiple patented technologies and offers a performance improvement of at least two orders of magnitude relative to legacy approaches.
  127. 127. Matrixx. Algebraic-Decision Engine • OCS raters can be broadly classified as rule- or data driven. o The former offer great flexibility to configure rating scenarios of arbitrary sophistication but which can become challenging to maintain beyond a certain complexity. o Data driven systems typically offer a rich catalog of off-the-shelf templates that are easily configured to create real offers. • These templates are “baked” into code so performance can be highly optimized. The challenge with this approach arises when no suitable template is available, often requiring complex and costly customization. • With respect to real-time performance, both approaches share a common weakness. Every transaction results in execution of conditional logic reflecting the rating discriminators (if weekend, and if URL is On-net, and if…). • As rating, or indeed policy, rules become more sophisticated, execution code paths extend and performance degrades – often unpredictably. © 2012 Alan Quayle Business and Service Development 127
  128. 128. Matrixx. Algebraic-Decision Engine • The Parallel-MATRIXX™ Algebraic-Decision engine eliminates this degradation by building on the simple principle that any pricing concept can be represented as a set of mathematical equations. • Modern CPUs capable of 200 million multiplications per second are exceptionally efficient at solving such equations. • Pricing plans, offers, and policies are configured via a GUI and transparently compiled into an n-dimensional matrix where each dimension corresponds to a rating normalizer (such as time, location, service, etc.). • Stored at each matrix “intersection” is a linear equation representing the rating formula to be applied. As each transaction is mapped to the relevant intersection, solution of the associated linear equation is extremely fast. • As offers are extended with additional normalizers (for example, adding a device dependency to offer lower rates for a promoted device), the matrix dimensionality is extended accordingly. This simply results in a few additional CPU cycles to solve the rate equation with no significant impact on latency. © 2012 Alan Quayle Business and Service Development 128
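To make the matrix idea concrete, here is a minimal sketch of rating by table lookup instead of conditional logic. It is not the Parallel-MATRIXX™ implementation: the two normalizers, the rate values, and the `rate` function are invented for illustration, and each cell holds the coefficients of a linear equation `charge = a * units + b`.

```python
# Normalizers map raw transaction attributes to matrix indices.
TIME_BANDS = {"peak": 0, "offpeak": 1}
LOCATIONS = {"home": 0, "roaming": 1}

# RATES[time][location] = (a, b) for charge = a * units + b.
# All values are made up for this example.
RATES = [
    [(0.10, 0.00), (0.50, 0.25)],   # peak
    [(0.05, 0.00), (0.40, 0.25)],   # offpeak
]

def rate(time_band, location, units):
    """Map the transaction to its matrix intersection and solve the
    linear equation stored there."""
    a, b = RATES[TIME_BANDS[time_band]][LOCATIONS[location]]
    return a * units + b
```

Rating a transaction is two index lookups and one multiply-add, however many offers are configured. Adding a normalizer such as device type adds one more index into the table instead of another chain of if-statements.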
  129. 129. Contention-Free In-Memory Database and Parallel- MATRIXX™ Processing • Maintaining data and transaction integrity is a mission-critical requirement for any database containing CSP customer or financial data. For example, an attempt to transfer funds between two customers must complete successfully or be cleanly aborted. • A situation where the donor’s account is debited but some technical failure results in the recipient not receiving the funds would leave the database in an invalid state. • As described earlier, current real-time systems rely heavily on OLTP and locking techniques to assure data integrity but which can lead to rapidly degrading and unpredictable performance. • Parallel-MATRIXX™ technology is based on an in-memory database that does not utilize locking while still supporting full ACID-compliant transactions. • No transaction is ever blocked from accessing or updating data while newly developed algorithms detect and resolve transaction conflicts. © 2012 Alan Quayle Business and Service Development 129
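The slide does not describe how conflicts are detected, so the sketch below swaps in a generic, well-known technique, optimistic concurrency control with version numbers, to show how a funds transfer can either complete fully or abort cleanly without holding locks while it works. The `Store` class and `transfer` function are hypothetical, the demo is single-threaded, and a production system would need the validate-and-write step in `commit` to be atomic.

```python
class Store:
    def __init__(self, balances):
        # Each account maps to (balance, version).
        self.data = {k: (v, 0) for k, v in balances.items()}

    def read(self, key):
        return self.data[key]

    def commit(self, reads, writes):
        """Apply writes only if every account read is still at the version
        that was seen; otherwise apply nothing."""
        for key, version in reads.items():
            if self.data[key][1] != version:
                return False             # conflict detected
        for key, balance in writes.items():
            self.data[key] = (balance, self.data[key][1] + 1)
        return True

def transfer(store, donor, recipient, amount, max_retries=3):
    for _ in range(max_retries):
        d_bal, d_ver = store.read(donor)
        r_bal, r_ver = store.read(recipient)
        if d_bal < amount:
            return False                 # cleanly aborted, nothing debited
        ok = store.commit(
            reads={donor: d_ver, recipient: r_ver},
            writes={donor: d_bal - amount, recipient: r_bal + amount},
        )
        if ok:
            return True                  # both sides updated together
    return False

store = Store({"alice": 100, "bob": 20})
```

A transfer never leaves the donor debited without the recipient credited: either both writes are applied in `commit`, or neither is.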
  131. 131. Case Studies Understanding where big data is used in practice © 2012 Alan Quayle Business and Service Development 131
  132. 132. Structure Part 4 of 5 • 15:00 Ecosystem, Taxonomies and Suppliers: Understanding the many suppliers, technology camps, and approaches • Taxonomy of Big Data Companies • Big Data Landscape • Cloudera • Autonomy • Vertica • InfoChimps • Guavus • Matrixx • Case Studies • Real Time Analytics for Big Data Lessons from Facebook o Quick technology review o Facebook Real-time Analytics System o Goal o Actual Analytics o Solution o Memory, Collocate, Economics • Real Time Analytics for Big Data Lessons from Twitter o Requirements o Actual Analytics o Challenges o Performance o One data any API o Solution o Memory, Collocate, Economics • Other Case Studies • Orbitz, Hertz, Yelp
  138. 138. Global Enterprise and Telecom Survey on Big Data and Real-Time Analytics © 2012 Alan Quayle Business and Service Development 138
  139. 139. Structure • Background • The Questions • The Importance of Analytics • Impact of Big Data on Analytics • Size of Data Sets, Number of Data Sources • Update Frequency • Integration of Data Sources • Data Set Responsibility • Types of Data, Types of Processing and Analytics • Challenges • Big Data Analytics Platforms • Benefits and Plans • Data Analytics Storage and IT Infrastructure Requirements • Increasing Interest in Hadoop MapReduce Framework Technology • Conclusions © 2012 Alan Quayle Business and Service Development 139
  140. 140. Background • Global Survey • Across 200 business and IT executives, questioned in August and September 2012 • 105 enterprise (non Telco), 55 Telco – all large enterprises (no mid-market analysis) • Non-Telco included web service providers, financial services, healthcare, manufacturing, retail, education, government, military, entertainment verticals • Generally VP level with a few CxO level, all decision makers with budget responsibilities • Generally known to me, or through my contacts as I was trying to gather frank reviews • Surprisingly similar across Telco and non-Telco © 2012 Alan Quayle Business and Service Development 140
  141. 141. Importance of Enhancing Data Processing and Analytics versus all Business Priorities 39% 31% 20% 9% 1% Most Top 5 Top 10 Top 20 Not Important Important © 2012 Alan Quayle Business and Service Development 141
  142. 142. Impact of Big Data on Analytics • There is much market hype surrounding the term big data. When asked what the term means to them, a majority of respondents indicated that it simply refers to very large data sets, see next slide. • The big data movement born from the Hadoop open source initiative has not reached most IT departments or even analytics professionals, as evidenced by the fact that only 11% of survey respondents associate Hadoop MapReduce with the concept of big data. • Most organizations’ analytics efforts to date have dealt with structured data, sourced through relational databases and data warehouses, and for the vast majority of analytical undertakings this makes sense. • But even organizations that have not been captured by the Hadoop movement are still increasingly under the gun to deal with larger data volumes, and the incursion of unstructured data. This, plus the many public examples of big data that have caught the imagination of business executives, have reinvigorated interest in data analytics. © 2012 Alan Quayle Business and Service Development 142
  143. 143. What does the term Big Data mean to you? [Bar chart, scale 0% to 80%; response options:] • Hadoop / MapReduce • Web and search engine data • Problems in storing / processing data • Data Analytics • Data Warehouses • Very large databases • Very large data sets
  144. 144. Size of Data Sets • The majority (66%) of respondents revealed that the size of the largest data set on which their organization conducts analytics is no more than 5 terabytes (TB). • Overall, the largest data analytics set is approximately 10 TB. • While these numbers might not reflect the expectations that often accompany the concept of big data, the reality is that processing even gigabytes of data at a time during traditional analytics exercises is significant. © 2012 Alan Quayle Business and Service Development 144
  145. 145. What is the Largest Data Set? 32% 20% 19% 11% 9% 5% 3% 1% <250GB <500GB <1TB <5TB <10TB <25TB <50TB >50TB © 2012 Alan Quayle Business and Service Development 145
  146. 146. Number of Data Sources • A significant part of data analytics exercises is the amalgamation of data from multiple disparate sources. • The next slide shows that 57% of these organizations are pulling from at least three unique data sources, and one-quarter (25%) are integrating data from five or more sources.
  147. 147. Number of Data Sources 25% 21% 17% 16% 12% 9% Single Source 2 3 4 5 >5 © 2012 Alan Quayle Business and Service Development 147
  148. 148. Update Frequency • Many organizations identified improving business intelligence and/or delivery of real-time business information as a key business initiative that will have an impact on IT spending decisions. • Considering the volumes of data organizations intend to analyze in shorter timeframes, organizations will need to evaluate whether their current approaches are adaptable to these demanding and constantly changing requirements. As part of the same spending survey, organizations also identified major application deployments or upgrades as a top IT priority, which is significant since every newly deployed or upgraded application will have a corresponding impact on existing data integration processes. • When asked about the rate with which their largest data set data is updated, nearly two thirds (65%) of organizations revealed that the changes take place at an either real-time or near real-time pace. © 2012 Alan Quayle Business and Service Development 148
  149. 149. Frequency of Update 37% 35% 28% Realtime (streams) Near realtime Batch © 2012 Alan Quayle Business and Service Development 149
  150. 150. Integration of Data Sources • When asked about the primary method to integrate data sources comprising their organization’s largest data sets, nearly two fifths (39%) of respondents identified purpose-built applications such as Informatica, Oracle, and Teradata. • An additional 30% use custom extract, transform, load (ETL) scripts or custom extract, load, transform (ELT) scripts for data source integration purposes.
  151. 151. Main Method of Integrating Data Sources 39% 30% 12% 10% 9% Purpose built Custom ETL EAI Open Source Other © 2012 Alan Quayle Business and Service Development 151
  152. 152. Data Set Responsibility • In terms of the sources responsible for populating organizations’ largest data sets, just over half (51%) of respondents identified back office applications, such as resource planning, human capital management, and accounting systems. o For example, many years of order or payment information can yield useful insight into customer patterns. • Another common source involves the information gleaned from corporate data centers and computer networks in the form of network traffic and system log files. This information is important not only to those organizations looking to maximize network and system performance and utilization metrics, but also to those that rely on security analytics to help shape information privacy and information protection strategies. • Enterprise organizations were significantly more likely to identify internal back and front office applications, internal data center or computer networks, e-commerce applications (i.e., point-of-sale, supply chain, etc.), and scientific research as data sources that comprise their largest data sets.
  153. 153. Responsible for Populating Data Set Scientific research 7% Third Party 10% External public data 11% Telemetry 10% Social media 12% Web Applications 34% Front office 35% Internal data center 45% Internal back-office 51% © 2012 Alan Quayle Business and Service Development 153
  154. 154. Types of Data • What data types end up in organizations’ largest data sets from the aforementioned sources? More than half (52%) of respondents indicated that their largest data set is comprised of database data. • Nearly half (48%) of organizations have some measure of transactional data— such as point-of-sale (POS) or inventory—residing in their largest data set. • What is interesting is the number of organizations that report that unstructured data—especially machine-generated content such as log files and sensor data— populates their largest data sets. These data types precipitated the concept of big data and there are emerging signs that these will consume a vast amount of bandwidth, compute, and storage resources. Probably the most significant takeaway is that big data becomes really big when an organization starts to see unstructured / machine-generated data grow to the size of—or even surpass— relational information, which will serve to further exacerbate the integration challenges mentioned above. © 2012 Alan Quayle Business and Service Development 154
  155. 155. Source of Data Sensor data 9% Audio / video 11% Web log files 16% Location data 18% Text / messages 19% Log files 22% Office documents 30% Transaction database 48% Relational database 52% © 2012 Alan Quayle Business and Service Development 155
  156. 156. Challenges • When asked to identify the data processing and/or analytics challenges associated with their organization’s largest data set, nearly half cited security / regulation / compliance. • Personally identifiable information (PII) and other sensitive information is what drives this. • About one third of respondents identified data quality (35%) and data cleansing tasks (33%); data cleansing and preparation was also categorized as the most time-consuming data processing and analytics activity. • Lack of skills is a middle-of-the-pack challenge according to respondents. • Clearly, responses involving process-related considerations (i.e., data security, integration, cleansing, etc.) gravitated to the top of the challenges list.
  157. 157. Data Processing Challenges Lack of Skills 17% Costs 18% Data Synchronization 19% Business expectations 25% Data integration 29% Cleansing 32% Data quality 35% Security / Regulation / Compliance 48% © 2012 Alan Quayle Business and Service Development 157
  158. 158. Benefits • Cost containment is still an important business initiative to many organizations, especially when it comes to IT investments. • More than half (55%) of respondents identified reduced costs as a key benefit associated with their data analytics platform. • Other top benefits centered on simplicity and efficiency, including easier management and process improvements, as well as improved business agility, which is particularly significant since business requirements are constantly changing when it comes to data analytics. © 2012 Alan Quayle Business and Service Development 158
  159. 159. Benefits from Data Analytics Platform Fraud detection 21% Event monitoring 25% Better accuracy 32% Business agility 33% Process improvement 37% Cost reduction 55% © 2012 Alan Quayle Business and Service Development 159
  160. 160. Conclusions and Recommendations © 2012 Alan Quayle Business and Service Development 160
  161. 161. Recommendations to the Big Data Buyer • Recognize the value of unified information access and analysis in supporting fact-based decisions by individuals, groups, and systems. • Recognize the shortcomings of operating without having the right information at the right time. Use this awareness to help build the business case for addressing those shortcomings – find an anchor tenant for the project. NO ENTERPRISE WIDE PLATFORM PROJECTS YET, LOOK TO THE CLOUD. • Formulate a Big Data strategy that includes evaluation of decision makers’ requirements, decision processes, existing and new technology, and availability and quality of data. NOT TECHNOLOGY LED. • The application of Big Data technology will fall into two primary categories: o Doing more efficiently (including at lower cost) tasks that have been done for years. o Doing completely new things that were never before possible, driving up long-term strategic organizational value. o Identify opportunities to apply Big Data to both.
  162. 162. Recommendations to the Big Data Buyer • Beware of the confusion and hyperbolic marketing in the Big Data market today. WE ARE AT PEAK BS. • IT organizations will need to consider a coordinated approach to planning implementations - when more than one project exists. • It is important to develop an IT infrastructure strategy that optimizes the server, storage, and network resources. Well-developed plans for networking support of Big Data projects should address optimizing the network both within a Big Data domain and in the connection to traditional enterprise infrastructure. LEGACY MATTERS. • Consider the breadth of Big Data technologies and the functionality each technology brings to the overall portfolio of tools for collecting, accessing, analyzing, monitoring, and managing data. © 2012 Alan Quayle Business and Service Development 162
  163. 163. Recommendations to the Big Data Vendor • Revenue opportunities exist at all levels of the Big Data technology stack as well as in services. Service is where the bulk of the growth exists. • Articulate your value proposition by connecting technology capabilities to business problems or opportunities. NOT TECHNOLOGY LED. • Big Data technology is not an end in itself. NOT TECHNOLOGY LED. • Recognize the value of Big Data to drive employee and customer decisions and actions. • Decide if you want to be a niche player or enter the mainstream. o If the former, then build a network of consultants and partners to support your technology. o If the latter, then build a business case that assumes eventual acquisition. • The growth in appliances, cloud, and outsourcing deals for Big Data technology will likely mean that end users will choose new applications and services, based less on the technology itself and more on the business value they deliver. © 2012 Alan Quayle Business and Service Development 163
  164. 164. Recommendations to the Big Data Vendor • Whether the application is based on a database or is search based, and whether the database is row based or column based, is in-memory or disk based, or uses SQL or NoSQL technologies will become less relevant over time. Thus technology will provide only a short-lived competitive advantage to any vendor. • System performance, availability, security, and manageability will all matter greatly. However, how they are achieved will be less of a point for differentiation. • HPC vendors have an edge in Big Data because leading-edge data-intensive computing has been an integral part of HPC for decades. • Most HPC Big Data work involves established methods of analyzing increasingly large data volume related to numerical modeling and simulation. © 2012 Alan Quayle Business and Service Development 164
  165. Recommendations to the Big Data Vendor • Vendors should tout, not hide, their HPC histories. A number of vendors with HPC origins and strong HPC reputations have not capitalized on these assets when attempting to address Big Data markets outside of HPC. • It is better to position your high-end HPC experience as a strength for meeting the presumably less difficult, data-intensive challenges in the mainstream market. • Useful tools are largely lacking for very large data sets. Tools such as Hadoop and MapReduce can effectively expedite searches through the large, irregular data sets that characterize some of the newer Big Data problems. • These tools can be great for retrieving and moving through complex data, but they do not allow researchers to take the next step and pose intelligent questions. In addition, the going gets tough when data sets cross the 100 TB threshold.
  166. Recommendations to the Big Data Vendor • Sophisticated tools for data integration and analysis on this scale are largely lacking today. Vendors that create tools and applications for use at this scale can use them as a lever to seize leadership positions in the Big Data market. • Not all Big Data use cases involve analytics. Analytics may be at the heart of most Big Data opportunities in the enterprise market, but there are also opportunities to support operational workloads and information access applications. • Some of the emerging technologies and the vendors behind them will likely end up as components or features of the broader information management, access, and analysis platforms of larger vendors. Specialized application and service providers with localized and industry expertise will be critical to expanding the market.
  167. Walk or Run to Big Data? It depends on your situation. For most telcos the move to Big Data will be incremental and complementary to existing platforms and investments. Focus on the solution, the application of analytics to the business: people and process, not technology.
