    • SAP Solutions for Analytics. Big Data Analytics Guide. Better technology, more insight for the next generation of business applications
    • Big Data Analytics Guide 2012
    • Big Data Analytics Guide: 2012 Published by SAP © 2012 SAP AG. All rights reserved. SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP BusinessObjects Explorer, StreamWork, SAP HANA, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries. Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects Software Ltd. Business Objects is an SAP company. Sybase and Adaptive Server, iAnywhere, Sybase 365, SQL Anywhere, and other Sybase products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Sybase Inc. Sybase is an SAP company. Crossgate, m@gic EDDY, B2B 360°, and B2B 360° Services are registered trademarks of Crossgate AG in Germany and other countries. Crossgate is an SAP company. All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary. These materials are subject to change without notice. These materials are provided by SAP AG and its affiliated companies (“SAP Group”) for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. 
Library of Congress Cataloging-in-Publication Data SAP Big Data Analytics Guide 2012: How to prosper amid big data, market volatility and changing regulations / Edited by Don Marzetta. p. cm. ISBN 978-0-9851539-6-0 1. Big data. 2. Analytics. 3. Databases
    • Welcome to Big Data Analytics Guide 2012

Big Data Means Big Business

By Steve Lucas, Executive Vice President and General Manager, Database and Technology, SAP

With more and more people spending much of their existence in the digital world, whether for work, play, learning, or socializing, the amount of data being generated is truly astounding. Just think about the number of SMS messages and emails sent, phone calls placed, and Facebook updates made every minute, and it boggles the mind how much data is traversing networks around the world.

At SAP, we believe this confluence of events is a golden opportunity for enterprises to rethink how they do business, and our goal is to help them do their jobs better than ever. As co-chair of the Big Data commission sponsored by TechAmerica, I’m hearing firsthand how government and private enterprise are evaluating how to embrace a Big Data world.

But this requires organizations to take a new approach to data. Innovations in in-memory computing are turning the whole idea of data management on its head by allowing enterprises to get rid of the complexity that has been encroaching on their systems. That’s why we’ve built the real-time data platform with SAP HANA at its core, giving enterprises the foundation they need to embrace Big Data.

But we recognize that a platform is not enough. Together, we need to dream up new ways to do business by leveraging Big Data insights. At SAP we’re serious about making this a reality by investing in new businesses through the $155 million venture start-up fund dubbed the ‘SAP HANA Real-Time Fund,’ through the HANA Start-up program, through innovation labs around the world, and, most importantly, by co-innovating with our customers.

To further our path toward better, smarter ways of working with data, we’ve put together a series of articles in the Big Data Analytics Guide. Within these pages you’ll find real solutions, real ways of operating, real results, and, perhaps most importantly, real technology that can be used in your business today.

The guide outlines the opportunity and business case for Big Data in the first chapter. Subsequent chapters look at SAP technology innovations, real-world examples, and insights from analytic leaders who are at the forefront of the Big Data market. In the last chapter, you’ll find a set of interesting market research statistics that highlight how C-level executives are using Big Data now and how they plan to use it to their advantage in the future.

So join in the conversation and co-innovate with us to re-invent business.
    • Table of Contents

Welcome to Big Data Analytics Guide 2012
  Big Data Means Big Business. By Steve Lucas, Executive Vice President and General Manager, Database and Technology, SAP

Big Data Opportunity
  Measuring the Value and Potential Yield of Big Data Projects. By Dan Lahl, Director of Analytics, SAP
  The Numbers are In: Early Stage ROI and Proof of Concept. By David Jonker, Product Marketing Director, SAP
  Analytics in the Cloud: Traversing a Legal Minefield. By Dr. Brian Bandey, Principal, Patronus
  Big Data Analytics Earns High Scores in the Field
  Big Data Is Only a Small Part of the Opportunity. By Mike Upchurch, Chief Operating Officer, Fuzzy Logix

Business Analytics Roadmap
  Business Value through Operational BI. By Claudia Imhoff, President of Intelligent Solutions, Inc. and Founder of the Boulder BI Brain Trust
  Real-time Data Platform for a Real-time World. By Amit Sinha, Head, Database and Technology Innovation, SAP
  How HANA Changes the Database Market. By Ken Tsai, Vice President HANA Solution Marketing, SAP
  DBTA: Data Marts Can’t Dance to Data’s New Groove. By John Schitka, Senior Product Marketing Manager, Sybase IQ
  In-Database Analytics: Reducing Travel Time. By Courtney Claussen, Sybase IQ Product Manager, SAP

Analytics Advantage
  Data Variety Is the Spice of Analytics. By Amr Awadallah, CTO, Cloudera
  Text Analytics for Speed Reading: Do You Mean What You Say? By Seth Grimes, Strategy Consultant and Industry Analyst, Alta Plana
  Image Recognition, Pattern Identification, and the New Memory Game. By Joydeep Das, Director, Data Warehousing and Analytics Product Management, SAP
  Technology Alone is Not the Answer. By Byron Banks, Vice President of Business Analytics Marketing, SAP

Analytics Innovations
  What’s All the Hadoop-la About? By Wayne Eckerson, Principal, BI Leader Consulting
  Fast Flowing Decisions Through Streams of Data. By Irfan Khan, Senior Vice President and Chief Technology Officer, SAP Database and Technology
  Age of Influence: Making the Most of Social Networks. By Bruno Delahaye, Senior Vice President Worldwide Business Development, KXEN
  Embracing a Standard for Predictive Analytics. By Michael Zeller, Ph.D., CEO, Zementis
  How Modern Analytics “R” Done. By Jeff Erhardt, Chief Operations Officer, Revolution Analytics
  Navigating a 4G World. By Greg Dunn, Vice President, Sybase 365
  Increasing the IQ of Everyone with Analytics. By Jürgen Hirsch, CEO, Qyte GmbH

Market Data
  The Big Deal with Big Data. By IDC

Company Index
    • Big Data Opportunity

Measuring the Value and Potential Yield of Big Data Projects

Where should companies look for the return on investment that determines whether, or how much, Big Data projects pay off?

By Dan Lahl, Director of Analytics, SAP

Data analytics has traditionally been expensive and inefficient, but new analytical platforms optimized for Big Data are heralding a brave new world. Hadoop, an open-source Apache product, and Not Only SQL (NoSQL) databases don’t require the significant upfront license costs of traditional systems. That makes setting up an analytics platform, and seeing a return on the investment (ROI), more accessible than ever before.

Costs are coming down, but this is no free lunch. Big Data analytics still requires hardware, database administrators, developers to build the models, business intelligence tools, and training for the people who will use the results to make decisions.

Where to Look for ROI

Companies can look for ROI in three ways: do what they’re already doing better, do more of what they’re already doing, and do things they’ve never thought about before. Big Data provides new solutions for current problems, converts incomprehensible data to actionable business recommendations, and makes previously impossible business models possible.

The first, and often most compelling, benefits of upgrading to a Big Data platform are gains in speed and cost savings. The new systems allow organizations to do what they are already doing faster, better, and cheaper. What used to take hours or days suddenly takes only minutes. As data volume, velocity, and variety have grown, legacy data warehouse systems have been bogging down, unable to handle bigger data, more users, and increasingly complex queries. Results take longer. Users get frustrated and stop using the data because it takes too long. Many data warehouses are in this predicament today, having hit a performance and scalability wall. Big Data technologies break through these roadblocks, delivering faster performance and higher availability.

The second benefit is the ability to do more. Not only will a Big Data system not bog down at current usage rates, but it can also handle more users, more data, and more complex queries. For example, instead of storing a year’s worth of loan default records, a lender can store 30 years’ worth and perform more detailed analyses, with more accurate results.

Additionally, a Big Data system can handle a mixed workload. Instead of processing a few queries from a handful of power users, it can handle many short technical queries from a large front-line service team, such as a customer lookup that identifies cross-sell and up-sell opportunities in real time.

The third piece of Big Data ROI is that it opens up new opportunities. Once the data has been liberated and employees can get at it, they find all kinds of new ways to use it. For example, if a utility company had previously outsourced the analytics from its self-service Web pages and analyzed that data separately from its other customer service channels, Big Data technology can bring all the analysis together, in-house. That opens the possibility of tracking customer behavior in multiple channels at the same time. This kind of 360-degree visibility provides a more accurate picture of the customer experience and customer satisfaction, providing new and deeper insight into any business.

Show Me The Metrics

When establishing metrics for any new analytics project, focus on two areas: employee usage of the resulting intelligence, and key performance indicators for the processes where analytics will be used.

First, create metrics and track data for the project itself. Think of employees as the data team’s customers, and establish measures that show the total amount of information consumed in the organization: who’s using the data, how much are they using it, and what are they using it for? The higher the usage rates, the better.

Second, wherever data becomes part of a business process, such as in customer support or sales, companies can measure customer satisfaction, sales figures, and other established metrics. Comparing numbers before and after the Big Data implementation should clearly show what the organization has gained as a result, as well as other opportunities to collect data and put it to work. Used this way, analytics often deliver a huge ROI, helping a company identify problems early on (or before they happen) and take steps to fix (or prevent) them. Predictive analytics can recognize customer care or sales opportunities at the right moment, offering discounts or related products that increase satisfaction and boost sales.

Success for the Long Term

Once an enterprise decides that its analytics project has legs, there are some well-established tactics to help assure its long-term success. Foremost is communication. Keep everyone in the loop from strategy to deployment, and beyond. Key individuals should never be blindsided by hiccups or new developments in the process.

New technology built to handle today’s veritable deluge of data is bringing down the cost of analytics, delivering better performance, and helping companies put data to work in new ways. Measuring the ROI of a Big Data project can be accomplished by establishing company metrics that highlight analytics usage, encouraging data teams and employees to define actionable reports, and incorporating analytics into more decision making throughout the enterprise.

Dan Lahl has been in high tech for almost 30 years. In addition to bringing to market SAP Sybase SQL Server, SAP Sybase ASE, and SAP Sybase IQ, Lahl has evaluated multiple emerging technology areas leading to EII, ETL, and GRID technology purchases for the company.
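The two kinds of metrics described under "Show Me The Metrics" can be illustrated with a small sketch. It is only one possible way to compute them: the query-log layout, the user names, and the satisfaction scores below are hypothetical values chosen for illustration, not data from the guide.

```python
from collections import Counter

# Hypothetical query-log records as (user, purpose) pairs. The field layout and
# sample values are illustrative assumptions, not data from the guide.
QUERY_LOG = [
    ("ana", "customer lookup"),
    ("ana", "cross-sell report"),
    ("ben", "customer lookup"),
    ("cho", "loan default analysis"),
    ("ben", "customer lookup"),
]


def usage_metrics(log):
    """Summarize who uses the data, how much, and what for."""
    queries_per_user = Counter(user for user, _ in log)
    queries_per_purpose = Counter(purpose for _, purpose in log)
    return {
        "active_users": len(queries_per_user),
        "total_queries": len(log),
        "queries_per_user": dict(queries_per_user),
        "top_purpose": queries_per_purpose.most_common(1)[0][0],
    }


def percent_change(before, after):
    """Relative change in a KPI between two measurement periods, in percent."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100.0


metrics = usage_metrics(QUERY_LOG)
print(metrics)

# Hypothetical customer-satisfaction scores before and after the rollout.
print(f"{percent_change(72.0, 81.0):.1f}% change in satisfaction score")
```

Usage counts cover the first kind of metric (consumption of the data team's output); the before-and-after comparison covers the second (process KPIs). In practice the log would come from the warehouse's own audit or query history rather than a hard-coded list.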
    • ENTERPRISES, BUSINESSES, AND GOVERNMENTS ARE SEEING SIZABLE RETURNS ON THEIR INVESTMENT IN BIG DATA ANALYTICS PROJECTS.

The Numbers are In: Early Stage ROI and Proof of Concept

By David Jonker, Product Marketing Director, SAP

As new technology helps organizations put today’s massive data sets to use, real-world examples and research are proving that analytics helps cut costs, increase revenue, eliminate waste, and otherwise boost the bottom line.

Real Returns

Recent reports confirm that organizations around the world are finding an edge by incorporating advanced analytics into business processes, leading to informed decisions around customer satisfaction, new service success rates, and competitive analysis (see Table 1). American Airlines has saved roughly $1 million annually in fraud detection, helping the company identify forms of fraud it never knew existed and eliminating the loopholes that criminals were exploiting.

The Sao Paulo State Treasury Department has so far identified $100 million in untaxed earnings, which allowed tax inspectors to adopt a proactive approach to tax evaders, investigating telltale behavior patterns in its data and taking corrective measures early on, rather than punitive ones after the fact.

AOK Hessen, a public health insurance organization in Germany, uses pattern analysis to detect fraudulent invoicing, and it has reclaimed US $3.2 million of unjustified charges.

Lawson HMV Entertainment, a leading retailer in Japan of DVDs, Blu-ray discs, books, and games, integrates in-store and Website data into one massive marketing database that the company uses to fuel its targeted email campaigns to customers. The result has been 3 to 15 times higher purchase rates and double-digit revenue growth.

Table 1. Big Data Analytics ROI

Who                          What
American Airlines            $1 million annual fraud detection
State of Sao Paulo, Brazil   $100 million untaxed earnings
Cell C                       $20 million saved on one project
AOK Hessen                   $3.2 million fraud detection
HMV Japan                    3 to 15 times higher purchase rates

Research Backs Up the Numbers

After studying 179 large public companies that use what they call “data-driven decision making,” authors from MIT and the Wharton School concluded that organizations using analytics in their decision-making processes derived 5 to 6% better “output and productivity” than if they had not used analytics.

Researchers at the University of Texas studied how analytics affected the finances, customer activities, and operations of 150 Fortune 1000 firms. According to the research, the product development area alone justifies deploying analytics for a typical Fortune 1000 enterprise. The study states, “Revenue due to a company’s ability to innovate new products and services increases with data accessibility and specialty products and services, which, in turn, is positively affected by data intelligence.”

How positive is that impact? According to the analysis, a $16.8 billion company might see an extra $64 million top-line dollars over five years if analytics are put into the hands of “more authorized employees” who “can make use of information…to better spot trends, demand patterns, improve recommendations for decision making and profile match.” All of those benefits could contribute to new-product revenue. Such firms could also add $14 million in new customer sales annually.

The University of Texas report also found that the comprehensive use of analytics inside a company improved results in the operational areas of asset utilization, forecasting and planning, and on-time delivery of products or services. For example, wider use of analytics can lead to an 18.5% improvement in planning and forecasting for a typical firm studied.

The sources of these significant ROI metrics vary by company and industry. To examine an example in depth, consider how using analytics for real-time applications has affected the telecommunications market (see Table 2). You can see how a modern analytics environment leaves legacy decision-support systems in the proverbial dust.

Table 2. Legacy vs. Next-Generation Real-Time Analytics in Telecommunications

                               Legacy Analytics    Next-Generation Real-Time
                               Infrastructure      Analytics Infrastructure
Storage cost                   High                Low
Analytics                      Offline             Real time
Data loading speed             Low                 High
Data loading time              Long                Average 50% faster
Administration time            Long                Average 60% faster
Complex query response time    Hours/days          Minutes
Data compression technique     Not mature          Average 40 to 50% more data compression
Support cost                   High                Low

David Jonker is focused on market strategy for the SAP Data Management and Analytics product lines, including SAP Sybase IQ, ASE, Replication Server, and SQL Anywhere. Jonker’s career includes more than 10 years in software engineering and product management roles before leading the SAP product marketing teams for data management and analytics.
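To put the University of Texas figures quoted in this article in proportion, the short calculation below expresses them as a share of company size. It is a back-of-the-envelope sketch, and it assumes that "a $16.8 billion company" refers to annual revenue, which the article does not state explicitly.

```python
# Figures quoted in the article from the University of Texas study.
COMPANY_REVENUE = 16.8e9             # assumption: "$16.8 billion company" means annual revenue
EXTRA_TOP_LINE_5Y = 64e6             # extra top-line dollars over five years
NEW_CUSTOMER_SALES_PER_YEAR = 14e6   # additional new-customer sales per year


def share_of_revenue(amount, revenue=COMPANY_REVENUE):
    """Express an amount as a percentage of annual revenue."""
    return amount / revenue * 100.0


# Spread the five-year figure evenly across the years (a simplifying assumption).
extra_per_year = EXTRA_TOP_LINE_5Y / 5

print(f"Extra product revenue per year: ${extra_per_year / 1e6:.1f} million "
      f"({share_of_revenue(extra_per_year):.3f}% of revenue)")
print(f"New customer sales per year:    ${NEW_CUSTOMER_SALES_PER_YEAR / 1e6:.1f} million "
      f"({share_of_revenue(NEW_CUSTOMER_SALES_PER_YEAR):.3f}% of revenue)")
```

Under these assumptions each gain is well under one percent of annual revenue, so the case for analytics in the study rests on the absolute dollar amounts rather than on a large relative lift.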
    • TO AVOID LEGAL LIABILITY, ORGANIZATIONS THAT WANT TO REAP THE BENEFITS OF CLOUD-BASED BIG DATA ANALYTICS MUST CAREFULLY VET PARTNER TECHNOLOGY.

Analytics in the Cloud: Traversing a Legal Minefield

By Dr. Brian Bandey, Doctor of Law

When a corporation mines the Big Data within its IT infrastructure, a number of laws will automatically be in play. However, if that corporation wants to analyze the same Big Data in the cloud, a new tier of legal obligations and restrictions arises, some of them quite foreign to a management previously accustomed to dealing with its own data within its own infrastructure.

A corporation holding Big Data will possess different types of data, which the Law will automatically classify and to which it will attach law-based obligations.

Some of that data may not be owned by the corporation. It may be a third party’s data which it holds pursuant to a Confidentiality Agreement. Such agreements may not only produce obligations that go to nondisclosure, but may also restrict the uses to which the data can be put and define what level of security is to be employed.

Other data might be owned by the corporation but identify living individuals (whether directly or indirectly). Data Protection Law (as it’s generally known) is concerned with the access, use, and movement of Personal Identifying Information (PII), and with the technological safeguards that prevent its disclosure.

A corporation will also own secrets about itself which, if disclosed, might cause irreparable damage. Officers owe stakeholders a legally binding ‘duty of care’ to take all reasonable precautions to ensure the security of such information.

Due to restrictions on processing, both from Data Protection and Confidentiality Laws, care will need to be taken when building the data warehouse to be analyzed. Certain classes of data may need to be excluded.

All of these different types of law intersect over the area of Big Data storage, security, and processing. They produce a matrix of law-based obligations which, in many areas, cannot be delegated or avoided, only met.

Often law-based security obligations cannot be delegated to the cloud services provider. Legal responsibility may remain with the data controller.

Security

But what happens when we translate that matrix into the cloud?

The first matter is that of security. Breaches occasioning the loss of data can cause an abundance of law-based difficulties: breach of contract, fines under Data Protection Law, uncapped damages due to the release of third-party secrets, and so on.

But why is this the “first matter”? The corporation cedes actual security to its cloud services provider. Instead of the corporation implementing its own security directly, that role is handed to the cloud services provider. A great deal is said about service level agreements on this subject, their utility and importance. Frankly, I don’t see it that way. What remedies are available to the corporation under an SLA other than contractual remedies? Usually none!

In my opinion, it is highly likely that money damages will not put the corporation back in the position it would have been in but for the security or contractual breach.

No. What is needed is the choice of a correct cloud security architecture of sufficient robustness. One may ask why that is a legal topic. Surely it is strictly an IT matter? I take the view that one must look to the propensity of the cloud technology itself to cause the corporation legal exposure.

The Duty of Care owed by officers to their stakeholders, the corporation’s duty to those persons whose PII it holds, and the contractual obligations it owes with respect to third-party confidential information all compel the corporation to exercise expertise, care, and prudence in the selection of a technologically secure cloud computing environment. This means that the officers must look beyond the cloud services provider per se, and discharge the Duty of Care through due diligence on the technology underpinning it. How capable is the architecture of securing the data? Is the architecture built to be secure and resistant to the correct range of security threats? How robust and secure is it against measurable benchmarks?

Secondly, there are significant technological differences between a cloud computing environment and a corporation’s ‘owned’ infrastructure. I am referring especially to the integrity of multi-tenancy architectures. A leaky multi-tenancy system creates a significant probability that the corporation will be in breach of its obligations to many prospective litigants. Thus real attention will need to be given to the architecture that isolates one ‘data set’ from another and keeps it isolated.

These are not matters of academic technical interest. They go to the ability of the corporation to discharge what are often non-delegable, unavoidable legal duties.

Personal Identifying Information

Moving on from security, there is the matter that is generally referred to as the trans-border movement of PII. Many countries either restrict or prohibit the exporting of PII. To do so can even be a corporate crime, certainly exposing the wrongful exporter to the likelihood of a hefty fine, adverse publicity, and reputational loss.

Thus the problem for our conceptual corporation is the nature of cloud computing itself. By that I mean that the advantages of scalability, flexibility, and economies of scale are obtained by distributing data across a number of servers, which may not all be in the same country. Thus PII may be automatically exported illegally.

There are two avenues open to the corporation to obviate this ‘unlawfulness.’ The first is to choose a Big Data warehousing and analytics architecture which, with certainty, can confine data storage and processing to servers residing in nominated legal jurisdictions. The cloud computing architecture must be able to identify what data is in which jurisdiction and, if necessary, keep it there. The second is to transform the PII so that it no longer constitutes, in law, PII. Data which is not PII cannot be subject to data protection law.

For some time now, medical researchers have shared patient information internationally through a process of either anonymization or pseudonymization. Anonymization is a process whereby the identifier sub-data is removed prior to export, thus enabling any type of processing, anywhere. The data needs to be of a configuration that can still be effectively processed in the absence of identifier sub-data. Where the presence of a form of identifier sub-data is required for processing (or analysis), pseudonymization is used.

The aim of these two forms of de-identification is to obscure the identifier sub-data items within the patient records sufficiently that the risk of potential identification of the subject of a patient record is minimized to acceptable and permissible legal levels. Although the risk of identification cannot be fully removed, it can often be minimized so as to fall below the defining threshold.

Analytics

There is no reason, in law, why Big Data analytics cannot be performed lawfully in the cloud. However, in order to do so, significant attention needs to be directed to the actual software and hardware programming architectures to be employed, and to matching those to the matrix of laws which operate over the storage, use, processing, and movement of data. It may seem strange that I am advocating an almost technology-centric solution to what is clearly (and perhaps solely) a law-based problem. But as I said before, money damages in these scenarios will never, in my opinion, be sufficient compensation for the owners of Big Data.

Rather, the requirements of the law need to be soundly and accurately matched and, indeed, mapped onto the cloud computing technology at hand. Only then can the minefield of Big Data analytics in the cloud be successfully traversed without an explosion.

Dr. Brian Bandey is acknowledged as one of the leading experts on Computer Law and the international application of Intellectual Property Law to Computer and Internet Programming Technologies. His experience in the global computer law environment spans more than three decades. He is the author of a definitive legal practitioners’ textbook, and his commentaries on contemporary IT legal issues are regularly published throughout the world. Dr. Bandey is now well advanced upon the unique route of studying for a Second Doctorate of Law, advancing the current state of the art in the Intellectual Property in Internet and Cloud Technologies with St. Peter’s College at the University of Oxford in England.
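The two forms of de-identification described in this article, anonymization and pseudonymization, can be sketched in a few lines. This is an illustration only, not the author's method: the field names, the sample record, and the choice of a keyed hash (HMAC-SHA256) as the pseudonym are all assumptions made here. Whether the output falls below the legal threshold for PII in a given jurisdiction is a legal question that code cannot settle, and remaining fields may still identify a person in combination.

```python
import hashlib
import hmac

# Hypothetical names of the identifier sub-data fields in a patient record.
IDENTIFIER_FIELDS = ("name", "national_id")


def anonymize(record):
    """Remove the identifier sub-data entirely before export."""
    return {key: value for key, value in record.items()
            if key not in IDENTIFIER_FIELDS}


def pseudonymize(record, secret_key):
    """Replace the identifier sub-data with a stable keyed pseudonym.

    The same person always maps to the same pseudonym under the same key, so
    records can still be linked for analysis; the key stays with the exporter.
    """
    result = anonymize(record)
    joined = "|".join(str(record[field]) for field in IDENTIFIER_FIELDS)
    digest = hmac.new(secret_key, joined.encode("utf-8"), hashlib.sha256)
    result["pseudonym"] = digest.hexdigest()
    return result


# Illustrative record; all values are invented.
patient = {
    "name": "Jane Example",
    "national_id": "000-00-0000",
    "diagnosis_code": "E11",
    "visit_year": 2011,
}

print(anonymize(patient))
print(pseudonymize(patient, secret_key=b"keep-this-key-in-house"))
```

A keyed hash is used rather than a plain hash so that someone without the key cannot rebuild the mapping by hashing guessed names and identity numbers.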
    • INTERNET METRICS, TELECOMMUNICATIONS, AND FINANCIAL SERVICES PROVIDERS ARE USING BIG DATAANALYTICS TO BOOST PROFITS AND ADD CUSTOMERS.Big Data AnalyticsEarns High Scores in the FieldWhile industries vary greatly in what they need ready for analysis, modern Big Data analytic softwarefrom their data, and even companies within performs faster and readily scales to handle as many users and as much data as needed.the same industry are unalike, virtually everyorganization in every market has these two By seeking out Big Data inside and outside your organizationrelated problems: what to do with all the and using it to push intelligence deep into the enterprise,information pouring into data centers every organizations can be more responsive, more competitive, and more profitable. At the heart of these endeavors are columnar-second, and how to serve the growing number based databases and in-database analytics.of users who want to analyze that data. Companies from many industries are already taking advan-Massive data sets, even those including variable data types tage of technological advances in storing and analyzinglike unstructured data, can be analyzed. Not only are they Big Data to gain business insight and provide better service to customers. These real-world examples prove the value of Big Data analytics.In healthcare, the move to electronic comScore Stops Counting Visitors andmedical records and the data analysis of Starts Counting Profitspatient information are being spurred by comScore, a cloud-based provider of analytics services and solutions for the eCommerce marketplace, realized when itestimated annual savings to providers in began operations that the focus of Internet marketing wasthe tens of billions of dollars. shifting from visitor counts to profitability. comScore’s Customer Knowledge Platform provides a 360-degree view ofBig Data Analytics Guide 13
    • customer behavior and preferences as they visit sites throughout the Internet. The service monitors surfing and buying behavior at every site visited by consumers who have opted in to having their Internet behavior analyzed.

With millions of Web users signing up to be monitored, the data collected was enormous. comScore applies its analytics to more than 40 terabytes of compressed data, while adding close to 150 gigabytes every week.

Despite this volume of data, query-response time is exceptional. “We are able to mine the data and produce results for our customers much more quickly. That helps them market more effectively and generate more business,” says Ric Elert, vice president of engineering at comScore.

The company achieves 40% compression ratios using column-store technology. Had it used a traditional approach, comScore says, its storage costs would have been much higher.

“The compression is extremely important to us because we have fire hoses of data,” says Scott Smith, vice president of data warehousing. “We have a tremendous amount of data. More data than most people will ever see.”

Suntel Introduces Customized Service Offerings to Sri Lanka

As Sri Lanka’s fastest growing telecommunications company, Suntel has 500,000 customers. The company puts the latest technology, innovative thinking, and an unprecedented service commitment into customizing telecommunications solutions for its subscribers.

Suntel’s traditional relational database was unacceptably sluggish. “We reached a point,” explains Tariq Marikar, director of information technology and solutions delivery, “at which we were seeing a 20% overload on our production database, which was unacceptable to us.”

“Additionally, we wanted to be able to run reports and queries against years of historical data rather than just a few months. We knew we needed to create a separate repository, a data warehouse, specifically designated and designed for reporting and analytics in order to solve this problem.”

Suntel achieved its goal by adopting a column-store data warehouse designed for advanced analytics. “It’s very important to our business to be able to view large volumes of historical data,” says Marikar. As with comScore, compression matters: the “compression capability has meant that the data residing on our production database only requires about one-third the space.”

The new platform can scale as Suntel increases the number of users and explores other strategies for tapping into this valuable data. “We’re exploring ways to exploit this data trove to develop ways to customize the customer experience across different sized customers and to implement programs to cross-sell and upsell our services,” says Marikar.

Airtel Vodafone Makes Better Decisions with Business Intelligence

In Spain, Airtel Vodafone created a data warehouse to help it accurately analyze and predict customer activity. The company developed a data warehouse with information generated from multiple departments and organized the data according to the company’s business map. The data warehouse allows Airtel Vodafone to convert data into valuable business intelligence.

Query demands on the company’s data warehouse are intense. More than 1,000 employees use the data warehouse for multidimensional analysis. The information has specifically designed structures for the data concerning customers, infrastructures, and company processes. This structure allows users to extract the data to create modeling and simulation processes.

Data-mining techniques extract information on customer behavior patterns. Airtel Vodafone’s customer-facing personnel are able to input the information they collect on a daily basis, so that it is integrated with data already stored in
    • the warehouse. This data is subsequently combined and converted into information structures for inquiries.

The data warehouse environment comprises marketing databases, call systems, customer service, GSM network statistical data, invoicing systems, collections and retrievals, and all logistics information. The marketing team uses the same information as the finance team, although each looks at it from a different angle and uses it for different analyses.

Having this structured data allows Airtel Vodafone to provide both detailed and summarized information about the company’s activities directly from the data warehouse. These advantages are helping Airtel Vodafone make informed business decisions based on customer activity.

Vertical Industries Reap the Rewards of Data

Other industries are waking up and taking advantage of Big Data analytics. In healthcare, the move to electronic medical records and the data analysis of patient information are being spurred by estimated annual savings to providers in the tens of billions of dollars. In the manufacturing sector, outsourcing a supply chain may save money, according to McKinsey & Company, but it has made it even more critical for executives, business analysts, and forecasters to acquire, access, retain, and analyze as much information as possible about everything from availability of raw materials to a partner’s inventory levels.

In these and other industries, users are lining up outside the CIO’s door, asking to analyze the incoming flood of data. And when they get access, users want query-response times comparable to what they experience using search engines such as Google and Bing.

In some markets, response times that satisfy humans aren’t fast enough. These enterprises demand machine-to-machine speeds for analytics.

According to the publication Wall Street & Technology, financial services companies are under increasing pressure to accelerate decision making from “microseconds to milliseconds to nanoseconds.” To speed decision making, firms are applying analytics to business processes for financial transactions executed by computers. Humans once were solely responsible for the decisions, but now only computers can work as fast as the data is moving. The pricey technologies necessary to achieve these feats make it possible for only the largest players in the industry to continuously throw faster hardware and network gear at the problem. For everyone else, says Larry Tabb, CEO of the Tabb Group, a financial services technology consultant, you need to be significantly smarter. To compete, Tabb says, “you need to raise the analytical barriers.”

In the face of these Big Data challenges, some in the analytics industry caution that companies might be smarter to take a “more is less” approach to analyzing data sets. Sometimes you might hear arguments that applying analytics to smaller data sets is, in effect, “good enough.” More often than not, this argument is made by those who can’t analyze large data sets.

As Google Chief Economist Hal Varian observes, analyzing a small, random slice of data can indeed yield valid results. But to get a truly random data set, that sliver of information needs to come from a massive amount of information. Without a large enough pool of data to draw from, the validity of your analytics processes can be called into question. In other words, Big Data generates the best valid data.
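The sampling point attributed to Hal Varian above can be tried out with a small experiment. The following is a minimal sketch, not anything taken from the guide: the population is synthetic (one million made-up values), and the function name `sample_mean_error` and the sample size are assumptions chosen purely for illustration. It draws a simple random sample from the large pool and compares the sample mean with the mean of the full population.

```python
import random
import statistics


def sample_mean_error(population, sample_size, seed=0):
    """Return (population mean, mean of a simple random sample)."""
    rng = random.Random(seed)
    # random.sample draws without replacement, i.e. a simple random sample.
    sample = rng.sample(population, sample_size)
    return statistics.fmean(population), statistics.fmean(sample)


# Hypothetical data: one million synthetic values between 0 and 100.
_rng = random.Random(42)
population = [_rng.uniform(0, 100) for _ in range(1_000_000)]

pop_mean, est_mean = sample_mean_error(population, sample_size=10_000)
print(f"population mean: {pop_mean:.2f}, sample estimate: {est_mean:.2f}")
```

The estimate is only as trustworthy as the randomness of the draw, which is the article's argument for having the full pool available to sample from in the first place.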
    • CONSIDERING BIG DATA ALONE IS INSUFFICIENT; ANALYTICS MUST ALSO BECOME PERVASIVE ACROSS THE ENTERPRISE IN ORDER TO TRULY LEVERAGE THE OPPORTUNITY.

Big Data Is Only a Small Part of the Opportunity

By Mike Upchurch, Chief Operating Officer, Fuzzy Logix

The opportunity Big Data represents goes beyond the data and the related new technologies that capture and store it. The real benefit is that organizations can derive better business intelligence from far more sources than ever before, and make it available to decision makers at every level. The keys to success are designing your Big Data analytics to support business goals and enabling decision makers to take action.

It’s easy to collect large amounts of data. Knowing what to do with it all, and making changes based on what you learn, is the challenge. We can liken the task to searching for diamonds in a giant pile of sand. Storing the sand is easy, but sifting through it requires a special set of tools, as well as a sufficient understanding of what you’re looking for, why, and what you’re going to do when you find it.

Historically, data analysis has been a story of complexity, limited capacity, elaborate tools, cryptic results, and poor distribution. Special equipment was required; only a small number of people knew how to use it; and the demand on their time was high. Analysis also required moving data from a database to an analytics server, processing it, and pushing it back. Just moving the data was 80% of the work, akin to trucking our pile of sand 10 miles to sift it.

Today, new, powerful data warehouse systems using in-database analytics can quickly ingest and process Big Data wherever it resides. What’s more, business users can now sift through data using familiar reporting tools, gaining easy access to powerful on-demand analytics and allowing data scientists to focus on building models instead of running reports. Best of all, these new solutions generally cost around 20% less to build than traditional platforms and perform more than ten times faster.

Start With “Why”

Data analysis is more accessible than ever, and it can solve many problems, but not all of them. The key to identifying which problems to tackle is to start with “why.” Why are we analyzing Big Data? First, assess your strategic goals. These could be growing market share, controlling cost and risk, or understanding customer behavior. Then, determine if using analytics will deliver value.

There are two important questions to answer: Can the company use data models to derive insight, and can it act on the results? Working through this process will help determine where your organization can realize value from Big Data analytics.

Changing Company Culture

Companies need a focused plan, great execution, the right technical platform, and the ability to operationalize the results of analysis. Without accompanying cultural change, however, those things will only deliver a fraction of the potential value of Big Data analysis.

Let’s go back to the diamond mine one more time. The mine has new sifting equipment that tells the miners where the highest-value diamonds are, but the miners aren’t authorized to react to the information. The best equipment can’t make up for broken culture.

Employees should be able to run analytics and see actionable answers on demand: a forecast of how close the sales team
    • is to meeting this month’s numbers, a customer’s credit score, or a report of which advertising keywords to buy today. Armed with information, employees must also be comfortable and confident taking action before the value of the insight diminishes.

As a company incorporates the use of analytics, employees will have ideas about how to improve on the original models. Building a culture that encourages constant testing and learning, as well as providing access to a flexible platform that can accommodate new ideas, will greatly improve the value companies can reap from Big Data.

It’s crucial to create a culture that rewards decisions and encourages analytics innovation, which may require modifying incentive and bonus structures. Not allowing employees to act is the most common point of failure for analytics projects, so don’t make that mistake. It’s rarely mentioned in discussions of Big Data, but it can make or break an analytics initiative.

Maximizing Results

Many companies are succeeding at their search for value in Big Data. They have the systems and infrastructure to capture and analyze Big Data; they have operational processes in place; and their employees have permission to act on the results. For these companies, the payoff can be dramatic.

For example, equity traders may need to buy or sell assets during the trading day to balance their portfolios, but one day’s OPRA feed can contain data for 500,000 to 1 million trades. If portfolio risk can only be calculated overnight, then institutions are exposed to an unquantifiable amount of risk during each trading day. With Big Data analytics, traders can get real-time pricing and calculate risk throughout the day. The result is that they can rebalance their portfolios at the expense of less agile traders. Millions of dollars can be won and lost by having better information than the institution on the other side of the trade. Other examples of capitalizing on Big Data include modeling loan default risk on demand, and stress-testing entire portfolios in a fraction of the time required by traditional solutions.

Call centers use analytics to better serve customers, reduce churn, and cross-sell new products. By analyzing a customer’s history and the actions of customers with similar histories, an analytics engine can recommend actions that will reduce churn, or suggest products or services that will be the customer’s next likely purchase. One call center leveraged Big Data analytics and saw a 10% reduction in churn, an 8% increase in per-call revenue, and a 12% improvement in cross-sale revenue.

Health care organizations are using Big Data analytics when evaluating care quality and efficiency. Using traditional methods, analyzing more than 700 million lines of claims data can take six weeks and a dedicated team of analysts, and only produce reports twice a year. With Big Data solutions, risk management teams can now run the models in 22 minutes and take immediate action to improve quality of care, reducing the window during which risk can go unnoticed from six months to less than a week.

Big Data analytics are ushering in a new era of predictive insight that is changing how companies operate and engage with their customers, suppliers, and employees. To take advantage of the opportunity, companies must start with the “whys,” align analytics projects with business needs, and quantify the value that can be created. To realize the value, employees must have access to powerful, innovative, and proven technology, participate in the process, understand the results, and be empowered to act. Get all of this right, and your diamonds will shine bright, creating competitive advantage and financial gain.

Mike Upchurch is responsible for customer acquisition, partnerships, global operations, and corporate culture at Fuzzy Logix. Previously he worked at Bank of America, leveraging trading instruments to create consumer products, mining consumer data to identify trading opportunities, and building and implementing a strategy that grew telephone mortgage lending from $9 billion to $22 billion in four years. He has also held a number of strategy and operational roles at global technology companies.
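The article's contrast between moving data to an analytics server and analyzing it where it resides can be illustrated on a very small scale. The sketch below is an assumption-laden toy, not the product described in the article: it uses Python's built-in `sqlite3` module, a made-up `claims` table, and hypothetical function names. One function pulls every row out of the database before averaging; the other asks the database to compute the average, so only a single row crosses the boundary.

```python
import sqlite3


def build_demo_db(n_rows=10_000):
    """Create an in-memory SQLite table of hypothetical claim amounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claims (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO claims (amount) VALUES (?)",
        ((float(i % 500),) for i in range(n_rows)),
    )
    conn.commit()
    return conn


def average_by_moving_data(conn):
    """Fetch every row, then compute in the application.

    Returns (average, number of rows that left the database).
    """
    rows = conn.execute("SELECT amount FROM claims").fetchall()
    return sum(r[0] for r in rows) / len(rows), len(rows)


def average_in_database(conn):
    """Let the database compute the aggregate.

    Returns (average, number of rows that left the database).
    """
    rows = conn.execute("SELECT AVG(amount) FROM claims").fetchall()
    return rows[0][0], len(rows)


conn = build_demo_db()
print(average_by_moving_data(conn))
print(average_in_database(conn))
```

Both calls give the same answer; the difference is how much data had to travel to produce it, which is the cost the "trucking sand" analogy describes.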
    • Business Analytics Roadmap

Business Value through Operational BI

Align operational and real-time business analytics and analytics technology with true business requirements and capabilities to ensure greater success in reaching business and IT goals.

By Claudia Imhoff, President of Intelligent Solutions, Inc. and Founder of the Boulder BI Brain Trust

Excerpted from TDWI Checklist Report, “Delivering Higher Business Value with Operational Business Intelligence and Real-Time Information”

Operational BI (OBI) is a popular topic in most business intelligence (BI) shops these days, and rightfully so. OBI enables more informed business decisions by directly supporting specific business processes and activities.

OBI has had a dramatic impact on traditional BI environments and on a new audience of BI users. These users now have immediate access to the insights they need when making decisions about customers, products, and even campaigns while these business activities are happening.

This Checklist Report helps you determine how to align the implementation of operational and real-time BI and analytics technology with true business requirements and capabilities to ensure greater success in reaching business and IT goals.

1. RECOGNIZE THAT NOT ALL ANALYTICS MUST COME FROM THE DATA WAREHOUSE ENVIRONMENT.

The data warehouse (DW) is a key supplier of data analytics, but it’s not the sole supplier of analytics. Other forms of analytics are needed for a fully functioning OBI environment. Because many analytics used in OBI require low-latency or real-time data, organizations try to speed up the overall processes of the DW (trickle-feeding the data, automating analyses, and so on) in an effort to make it the sole supplier of analytics. Although this approach works for some low-latency analytics, at some point the DW team must turn to other analytical techniques to complete the OBI picture.

One of these techniques is event analytics. Event data is created by business activities (generated by banking transactions [ATM], retail operations [POS, RFID], market trades, and Web interactions) or by system events (generated by sensors, security devices, or system hardware or software). Event analytics applications often perform their analyses even before the transactional data is stored in an operational
    • system. For example, many fraud-detection applications analyze transactions for fraudulent characteristics first and then store them in transactional systems for further processing. Obviously, the DW contributes to the overall OBI environment by generating the fraud models used by the event analytics software.

Another technique is to make BI analytics (or its results) available as callable services within an operational workflow. Embedded BI services can be external to the workflow (as a part of a service-oriented architecture) or included within the workflow itself. These services come in two flavors. The first calls a stored analysis or model, uses it dynamically during the workflow, and receives the results before invoking the next activity; for example, calling a stored analysis to dynamically determine a loan applicant’s creditworthiness. The second type retrieves the static results from an earlier analysis; for example, a customer service representative (CSR) retrieves a customer’s lifetime value score or segment ID stored in a DW. Both types are employed by a business process or person to support real-time or near-real-time business decisions and actions.

The combination of traditional data analytics, embedded BI services, and event analytics forms the foundation of OBI. All three must come together at appropriate points in the workflow to provide a mature and effective operational decision-making environment.

2. MATCH REAL-TIME CAPABILITIES FOR INCREASING BI AGILITY TO ACTUAL BUSINESS NEEDS.

There is a lag between the time an event happens and the time a company responds to it. This lag is caused by several factors, such as preparing the data for analysis, running the analysis, and determining the best course of action based on the results; for example, taking action when a campaign sells a product that is about to run out of stock. Clearly, the ability to reduce the time to action here (stopping the campaign or changing the featured product) can have significant impact on a company’s revenues and reputation. This is BI agility. It requires that the action time match the business need.

However, there is a trade-off. Is it timely enough for the business or is it actually too fast? Even if the business requires reduced latency, can the business users correctly process the inputs that quickly? Can the operating procedures handle the time frame appropriately to ensure a correct reaction? There are many moving parts in an OBI environment, and any that are out of sync or incomplete can cause an erroneous decision to be made. In this situation, the cost of creating such a low-latency BI environment may be more than the actual benefit the company receives.

Another trade-off is the soundness and flexibility of the architectural infrastructure in terms of allowing for delivery of information in different latency time frames (more on this later). Building an OBI solution that is inflexible or fragile just to meet an arbitrary time frame may spell disaster. If the action time requirement changes (and it almost certainly will) from two hours to one hour, you don’t want to have to rebuild the entire architecture.

To avoid this situation, the BI implementers must understand how the business community interacts with OBI, from event occurrence to action taken. Interactions must include the impact of the growing usage of tablets and mobile devices. OBI must reach its audience with the appropriate information formatted for the myriad mobile devices available today.

3. DETERMINE THE PROPER INFRASTRUCTURE FOR BUSINESS-CRITICAL OPERATIONAL BI.

Although traditional BI processing is often critical to business operations, a temporary failure of the BI system will not typically affect short-term business operations. Also, because the BI system is separated from operational processing, BI processing has little effect on operational performance except during the capturing of operational data.

The situation with OBI is different from traditional BI because it is closely tied to the daily operations of the business. A failure in an OBI system could severely impact business operations. This risk is especially relevant for OBI applications that support close to real-time decision making, such as fraud detection.

There are several approaches to supporting OBI, including embedding BI in operational processes, accessing live operational data, and capturing operational data events and trickle-feeding them to a DW. All of these approaches have the ability to affect the performance of operational systems.

It is very important, therefore, that the infrastructure of the BI system, its underlying DW environment, and related operational systems be capable of providing the performance,
    • scalability, availability, and reliability to meet OBI service levels. The cost of providing such an infrastructure increases as these service levels approach real time, and these costs must be balanced against the business benefits achieved and the ability of the organization to exploit a more agile decision-making environment.

4. UNDERSTAND THAT OPERATIONAL BI IS NOT JUST A TECHNOLOGY SOLUTION.

It’s critical that BI implementers be able to tie BI applications to operational applications and, even more importantly, to operational processes. Yes, technology is important, but perhaps just as important are the standard operating procedures (SOPs) that must be followed by business personnel. Many BI implementers do not realize that their OBI solution impacts how people perform their jobs. Without understanding how SOPs will be affected, the OBI team can cause severe problems with operations or, worse, find their solutions being ignored or circumvented.

As a first step, the BI team should study, understand, and document the full business workflow using the new BI application. OBI applications can cause big changes to processes and procedures. When they do, the team must determine how the SOPs must change. For instance, will they need to be rewritten or enhanced to include the new OBI application? What impact will this have on the workforce? Who will create and maintain the new SOP?

The team must also determine which personnel will be affected by the new procedures and what training they will need. The team must study how these personnel make decisions, how they access and use information, and how they monitor the impact of their decisions on the company. Training must be ongoing and flexible to accommodate the inevitable turnover in operational personnel. Some of the workforce may immediately grasp this new paradigm; others may not.

5. UNDERSTAND THAT OPERATIONAL BI IS MORE THAN SIMPLY CAPTURING MORE TIMELY DATA.

It is often assumed (incorrectly) that OBI simply involves capturing more timely data. Certainly data consolidation (ETL), data replication, and data federation (enterprise information integration [EII]) technologies have advanced to the point that we can capture data and make it available in a far more timely fashion than ever before. For example, using log-based change data capture (CDC) has distinct advantages for speeding up data integration and processing for a DW. Without doubt, real-time or low-latency data is an important feature of OBI processing. In addition, there are other factors that need to be considered when improving BI agility and supporting faster decision making.

Once operational data has been captured, it needs to be analyzed and the results delivered to the BI consumer, which may be a business user or another application. The time it takes to analyze the data increases the time (the action time) it takes for a business user or an application to make a decision. It is important, therefore, that the actual queries used in the analysis are optimized for good performance. It is also important that the underlying query processing engine is optimized for efficient analytical processing. In some instances, the analytical results may be precalculated to reduce action times (customer lifetime value scores, for example).

The efficient delivery of results to the BI consumer is also important for OBI success. The delivery medium used (dashboard, portal, mobile device, action message) must be selected to match the action time requirements of the business. The availability of automated decision-making features such as alerts, recommendations, and decision workflows can help business users make faster decisions. In near-real-time decision-making situations (fraud detection, for example), fully automated decision-making features may be employed.

This contribution was extracted from “Delivering Higher Business Value with Operational Business Intelligence and Real-Time Information.” To read the entire document, go to: http://tdwi.org/research/2011/11/tdwi-checklist-report-delivering-higher-business-value-with-operational-bi-and-real-time-information.aspx

Claudia Imhoff, Ph.D. is an analyst and speaker on business intelligence and the infrastructure to support these initiatives. She is the president of Intelligent Solutions, Inc., a data warehousing and [...] and founder of the Boulder BI Brain Trust. She has co-authored five books on these topics and writes articles and research papers for technical and business magazines.
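The checklist's building blocks (event analytics that score a transaction before it is stored, and the two flavors of embedded BI service) can be made concrete with a short sketch. Everything in it is hypothetical: the `Transaction` fields, the rule weights in `FRAUD_MODEL`, the `LIFETIME_VALUE` table, and all function names are invented for illustration, and a real fraud model would be derived from data warehouse history rather than hard-coded.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    account_id: str
    amount: float
    country: str


# Hypothetical model parameters; in the checklist's terms these would be
# generated offline from data warehouse history.
FRAUD_MODEL = {"max_amount": 5_000.0, "home_country": "US"}

# Hypothetical precalculated results from an earlier analysis
# (the second, "static" flavor of embedded BI service).
LIFETIME_VALUE = {"acct-1": 1250.0, "acct-2": 310.0}


def fraud_score(txn, model=FRAUD_MODEL):
    """Dynamic flavor: apply a stored model to the event as it arrives."""
    score = 0.0
    if txn.amount > model["max_amount"]:
        score += 0.6
    if txn.country != model["home_country"]:
        score += 0.3
    return score


def process(txn, ledger, threshold=0.5):
    """Event analytics: analyze first, then store the transaction.

    Returns True when the transaction was flagged as suspicious.
    """
    score = fraud_score(txn)
    flagged = score >= threshold
    ledger.append((txn, score, flagged))
    return flagged


def lifetime_value(account_id):
    """Static flavor: look up a result computed by an earlier analysis."""
    return LIFETIME_VALUE.get(account_id)


ledger = []
print(process(Transaction("acct-1", 9_000.0, "FR"), ledger))
print(process(Transaction("acct-2", 20.0, "US"), ledger))
print(lifetime_value("acct-1"))
```

The ordering inside `process` is the point of the sketch: the score exists before the record reaches the ledger, so a workflow step could act on it without waiting for a warehouse load.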
    • A NEW APPROACH IS NECESSARY IN TODAY’S ALWAYS-ON WORLD. SAP IS DELIVERING A PORTFOLIO FOR THE REAL-TIME BUSINESS.

Real-time Data Platform for a Real-time World

By Amit Sinha, Head of Database and Technology Innovation, SAP

Professor Richard Wiseman, author of Quirkology, compared the ‘pace of life’ in 31 countries by studying how fast people walk. The study definitely fits the title of his book! More interesting is that the overall pace of life increased by 10% over a 10-year period, and it’s only getting faster. Smartphones, wireless networks, and an ‘always on’ lifestyle are further accelerating the pace of people’s lives and of business and generating vastly more data at the same time.

While business is happening faster, many IT departments are still using traditional data management tools designed in the 1980s, when the pace of life and business was slower and the amount of data was much smaller.

The challenge is that today enterprises are looking to analyze terabytes or petabytes of data in the moment, instead of days or weeks in the past. Yet, the underlying infrastructure has remained status quo, with enterprises being forced to spend time ‘shoehorning’ old technology into their data centers to address new problems. And many of them have reached the breaking point.

Instead, a new approach is required that can not only mine the information and make sense of it, but do it in real time. To empower organizations to remain competitive in today’s constantly evolving market, SAP has committed to helping them unleash the value of Big Data through a new approach to data management.

It all starts with a foundation based on the SAP HANA database, a state-of-the-art in-memory platform, which allows enterprises to cut out the complexity that’s crept into IT environments. SAP HANA’s extreme performance and
    • innovation for the next generation of applications is redefining the database market by helping customers access and deliver information at speeds up to 100,000 times faster than previously available.

Surrounding HANA, the centerpiece of SAP’s real-time data platform, are several components that bring the best of database innovation forward. Sybase IQ, the #1 column database on the market, offers enterprises the best overall total cost of ownership by reducing administration by 75% and reducing data storage volumes by more than 70% through advanced data compression algorithms.

SAP Sybase Adaptive Server Enterprise (ASE) is the #1 transactional database and in use by most Wall Street firms. SAP Sybase ASE delivers top performance for enterprises, reduces risk due to security breaches or system failures, and increases efficiency by simplifying administration and efficiently using hardware and storage.

Another piece of the real-time database puzzle is SAP Sybase SQL Anywhere, the #1 mobile and embedded database, which supports advanced synchronization, out-of-the-box performance with little to no DBA support, and the ability to enable applications in remote locations.

Lastly, enterprise information management (EIM) solutions from SAP enable enterprises to rapidly ingest, cleanse, model, and govern data in order to improve the effectiveness of Big Data across operational, analytical, and governance initiatives.

These solutions together are just the beginning of SAP’s goal of providing customers with a single, logical, real-time platform for all transactions and workloads. By leveraging the industry-leading SAP Sybase data management products, customers will be able to transact, move, store, process, and analyze data in real time while reducing overall costs.

The old way of doing things is no longer acceptable. The new world of data needs a new data platform, and SAP is committed to helping enterprise IT departments evolve from complex, slow-moving entities into a more simplified architecture that enables Big Data, cloud services, as well as analytic, transactional, and mobile applications while preserving investment in existing applications in a non-disruptive way.

Amit Sinha leads marketing for SAP’s technology platform, data management, and real-time applications. Prior to this role, he led the market introduction of SAP HANA. Previously, as Vice President of Business Network Transformation, Amit was responsible for driving the phenomenon of collaboration across business boundaries through innovations in SAP’s portfolio. He has worked with customers on new cloud-based collaborative applications that empower people and communities to collaborate, leverage information for collaborative decision making, and ultimately enhance the company’s business model. Amit is a graduate of the Indian Institute of Technology (IIT) Bombay and the Haas School of Business at the University of California, Berkeley.
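The article credits column databases with large storage reductions through compression but does not say how. One commonly cited reason is that values stored column by column tend to repeat, so even a simple scheme such as run-length encoding shrinks them considerably. The sketch below shows that general idea only; it is not a description of how Sybase IQ or SAP HANA compress data, and the sample column and function names are made up.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the original column."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


# Hypothetical column of country codes, stored contiguously and sorted.
country_column = ["DE"] * 4 + ["LK"] * 3 + ["US"] * 5

runs = run_length_encode(country_column)
print(runs)
print(len(country_column), "values stored as", len(runs), "runs")
```

In a row-oriented layout the same country codes would be scattered among other fields of each record, so long runs like these would rarely appear.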
    • ADVANCEMENTS TO IN-MEMORY DATABASES, LOWER MEMORY COSTS, AND THE COMBINATION OF TRANSACTIONS AND ANALYTICS MOVE HANA INTO A CLEAR LEADERSHIP POSITION.

How HANA Changes the Database Market

By Ken Tsai, Vice President of HANA Solution Marketing, SAP

In the 2011 spy thriller Page Eight, the director general of Britain’s domestic intelligence agency, MI5, played by veteran actor Michael Gambon, utters a lament expressed by many a corporate CEO. “This building is swimming in information,” he complains. “We have information coming out of ears.” What’s difficult, he adds, is to determine whether something is important or not.

Decision-makers “swimming in information” need a database designed to navigate data’s deep waters. While some databases can be a life raft, helping an organization stay afloat, the SAP HANA in-memory database gives a company fleet command over oceans of Big Data. This advanced, high-performance database is dramatically changing the market.

The cost of memory is one sign of market change. When traditional databases were first designed, memory was extremely expensive. The big database vendors traded the speed of memory for more cost-efficient storage on disk. But that has changed, and dramatically. In 1990 a terabyte of memory cost more than $100 million. Today a terabyte of memory costs under $5,000. In three years it’s estimated that price will fall to one quarter of that and by 2018 users will pay one-thirtieth. Given that CPU memory is at least 50,000 times faster than accessing data on a mechanical disk drive, with memory being that cheap the reasons not to use in-memory databases have vanished.

Combining analytics and transactions is another change upending the market. HANA provides the power needed in both analytics and transactions to streamline business activities. In fact, SAP HANA portends the end of the separation of online analytical processing (OLAP) and online transaction processing (OLTP) database functions in large organizations, providing instead a single, massive data store for both transactional and analytical database activity with performance levels previously unimaginable by decision makers. Such a combination of business functions will be revelatory for corporate leaders. Hasso Plattner, in his 2009 paper “A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database,” concludes that when the merging of the two processes occurs “that the impact on management of companies will be huge, probably like the impact of Internet search engines on all of us.”

Change for the Better

SAP HANA is a 100% in-memory database software appliance designed to run on Intel processors and optimized for specific advances in chip design such as multi-core processors, shared memory, and multi-socket topology. According to Intel, “SAP HANA enables real-time decision making by bringing all the data in your enterprise within the reach of decision makers in seconds, not weeks or months, in an easy-to-use format so your company can run smarter and perform better.” The company concludes that SAP HANA delivers “an unprecedented robustness in real-time business analysis.”

Cisco, for example, has applied HANA to cut its seasonality analysis of customer purchase sentiment to a mere five seconds, no matter what filters it applies to the report. Lockheed Martin has improved the responsiveness of its labor utilization report by 1,131x. And fashion and fragrance leader PUIG, with iconic brands such as Prada and Paco Rabanne, is now able to predict sales trends in real time for new products and markets with a 400x boost in report execution.

SAP HANA scales linearly along with the growth in the volume and velocity of a company’s information sources. SAP HANA’s columnar architecture is data agnostic, ideally suited for the variety of Big Data pouring into organizations today. There’s no practical limit to the capacity of SAP HANA.

Most important, SAP HANA is fast. Not just whiteboard-theory fast, but real-world business fast. Take Liechtenstein-based Hilti Corp., a global provider of value-added products to the construction and building maintenance industry. Its application of the SAP HANA database merged transactional and analytic functions to improve the sales and support process by many orders of magnitude; in one case, it cut the response time for analyzing 53 million customer data records to two to three seconds from what once took two or three hours. In Japan, Mitsui & Co. Ltd.’s retail operations experienced a stunning 400,000 times performance improvement in its inventory management application with SAP HANA over the prior database’s performance. And Germany’s T-Mobile implemented SAP HANA to analyze huge data volumes in seconds (up to 1 billion rows and a 300 trillion record set in as little as 16 seconds), dynamically modifying its marketing and promotions vehicles to deliver more effective results.

Leading through Innovation

The arrival of SAP HANA has already changed the market landscape. Competitors are following SAP’s lead and are announcing in-memory databases in an attempt to stay in the performance game. However, because SAP began its development years ago, it has a long head start and will be able to stay in the lead for the foreseeable future as it continues to innovate.

However, the biggest opportunity SAP HANA creates will be for business. It will unleash powerful and innovative applications that exploit the wealth of knowledge within a company’s trove of Big Data. It will improve the capabilities and responsiveness of operations, finance, marketing, engineering, and virtually all areas of business. No longer will CEOs feel like they are swimming in information. Rather, they will be sailing across it, fully in control and charting new opportunities for increased growth and profitability.

Ken Tsai is the head of the SAP HANA product marketing team at SAP and is responsible for driving marketing, communication, and adoption of the SAP HANA in-memory data platform worldwide. Tsai has 17 years of experience with application development, middleware, database, and enterprise applications. He has been with SAP for the past 7 years and is a graduate of the University of California, Berkeley.
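The memory-price figures quoted in the HANA article can be checked with a few lines of arithmetic. The sketch below is only an illustration: it treats the article’s bounds (“more than $100 million”, “under $5,000”) as exact values, and it assumes that “one-thirtieth” refers to one-thirtieth of today’s price, which the article does not state explicitly. The constant and function names are made up for this example.

```python
# Figures quoted in the article, treated as exact values for illustration.
COST_PER_TB_1990 = 100_000_000  # "more than $100 million"
COST_PER_TB_TODAY = 5_000       # "under $5,000" (2012)


def projected_cost(current_cost: float, fraction: float) -> float:
    """Return the price after it falls to the given fraction of current_cost."""
    return current_cost * fraction


# How many times cheaper a terabyte is today than in 1990.
drop_factor = COST_PER_TB_1990 / COST_PER_TB_TODAY

# "In three years ... one quarter of that".
cost_in_three_years = projected_cost(COST_PER_TB_TODAY, 1 / 4)

# "By 2018 users will pay one-thirtieth" (assumed: of today's price).
cost_in_2018 = projected_cost(COST_PER_TB_TODAY, 1 / 30)

if __name__ == "__main__":
    print(f"1990 -> today: {drop_factor:,.0f}x cheaper")
    print(f"In three years: ${cost_in_three_years:,.2f} per terabyte")
    print(f"By 2018:        ${cost_in_2018:,.2f} per terabyte")
```

Under these assumptions the quoted prices imply a 20,000-fold drop since 1990, and a terabyte that costs on the order of a few hundred dollars by 2018.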
    • TO PROVIDE BUSINESS INTELLIGENCE FOR EVERYONE IN AN ENTERPRISE, DATA DELIVERY AND ANALYSIS MUST BECOME MORE NIMBLE THAN DATA MARTS CAN BE.

DBTA: Data Marts Can’t Dance to Data’s New Groove

By John Schitka, Senior Product Marketing Manager, Sybase IQ

Limitations in scalability and business demand for analytics are causing IT departments to rethink the traditional data warehouse/data mart strategy in favor of a powerful, centralized business analytics information grid.

Few things in the world are changing as dramatically as data. Data has tapped out a powerful rhythm to keep time with technology’s bleeding edge, leaving many technologies struggling to keep up. It should come as no surprise, then, that many of the data strategies that IT departments developed, and still widely rely upon, are no longer sufficient for today’s needs. You can put data marts near the top of that list.

Data marts were a reaction to the extreme performance limitations of traditional enterprise data warehouses. The data warehouse itself, which came of age in the 1990s, represented a tremendously enticing vision: offering to virtually every department across the enterprise an opportunity to see its performance metrics and find out what’s working and why.

That is, data warehouses would have answered all of those questions, if only users could get to the data. Most organizations quickly discovered that data warehouses, with their centralized, brittle architecture, performed abysmally under unpredictable workload demands. Even the load of just a few users could degrade performance precipitously. It quickly became clear that if they wanted to scale the data warehouse, organizations would need to replicate and distribute the data locally. Thus, data marts were deployed.

The Power of Prediction

While data marts were never a perfect solution, they adequately addressed businesses’ urgent need to let stakeholders from across the organization explore the data and uncover the insights they hold. But while data mart deployments have largely continued unabated for the past decade, business has changed dramatically: Global competition, mobility, social media, and the accelerating pace of business are forcing enterprises to re-evaluate how they think about data.

In this fast-paced business climate, it’s no longer enough to use the data warehouse to find out what happened in the past; today’s businesses need real-time data, data capable of making credible predictions about what will happen in the future. Business leaders are looking for ways to gain deeper insights from data, to enable more business users to search for these deep insights, and to directly embed these insights into core business processes.

Such predictive analytics can have a tremendously uplifting effect on business, especially when they can be embedded into the workflows and applications that power key business processes. For example, analytics can be used on the fly to determine the likelihood of fraud for any transaction, identify cross-sell opportunities, or single out particularly influential customers. Imagine the power of being able to alert call-center operators or branch agents to such conditions at the earliest moments in the customer contact. AOK Hessen saved $3.2 million by using predictive analytics to identify fraudulent insurance claims. HMV Japan used it to better predict the interests of its customers and increased per-transaction revenue by 300 percent as a result.

According to a University of Texas study, product development alone justifies deploying analytics for a typical Fortune 1000 enterprise.

Intelligence for Everyone

Thus, at many organizations today, IT is under mounting pressure to abandon these wallflowers, traditional data warehouses (and data marts), in favor of a quick-footed modernized architecture that
• Can answer complex questions using massive volumes of data
• Can scale massively to support the analytics needs of all enterprise users
• Can embed advanced analytical models into end-processes to help increase revenue and limit risk.

These new business demands are driving recognition of a number of critical technology challenges:

1. Big Data: Today’s businesses are deluged with a massive volume of data, created in part by the recognition that all the data available to an enterprise can be analyzed.

2. Data type diversity: There are dozens of structured and unstructured data types that must be included in the data warehousing effort, including numeric, text, audio, video, high-resolution imagery, SMS, RFID, clickstream, and more.

3. Complex questions: The requirement for in-depth knowledge discovery means the solution must be capable of recognizing and adapting to data anomalies, recognizing data clusters and trends, identifying influencing factors and making reliable assumptions.

4. Decision velocity: Enterprises are looking to make decisions in seconds and minutes, and not days or weeks. The solution must be able to answer user questions at the speed of thought, and in some cases remove the user from the equation entirely.

5. Many users: Today’s business analytics environment must support decision-making at all levels of the organization: tactical, operational, and strategic. Furthermore, enterprises are increasingly looking to incorporate analytics directly into business operations.

While each is undoubtedly a critical requirement of the new architecture, the fifth challenge, servicing many users, is perhaps the one that will most definitively set apart successful solutions from those that fail to live up to their potential. After all, even the most insightful conclusions are of limited value if the data isn’t seen by the right people at the right time. For this reason, the next frontier in analytics is delivering intelligence for everyone.

The secret to delivering business analytics to the whole organization is to harness smart parallelization techniques. Whereas traditional data warehouses use a shared-nothing architecture, forcing users to wait in queues while resources are locked by other queries, a high-performance business analytics information grid will instead employ a shared-everything architecture. This will make it possible to:
• Share resources, making all data accessible to any server or a group of servers, allowing many simultaneous users with diverse workloads
• Scale out independently and heterogeneously across resources with or without private clouds
• Provide a “self-service” methodology that supports data analysis from anywhere, including from specialized applications, through the Web or on mobile devices.

The Death of the Data Mart

Building an enterprise data warehouse is generally viewed as a long-term investment. And yet, traditional solutions have proved to be surprisingly brittle: inadequate for the business needs of tomorrow and unable to learn the steps to data’s new groove. Tomorrow’s massively scalable, grid-style architectures provide an opportunity to create truly flexible and predictive business analytics while solving the very problem data marts were invented to address in the first place: a central place for all business users to access and analyze all enterprise data.

An analytics-optimized information grid is the right dance partner for today’s data. It will not only usurp the departmental data mart but it will take inflexible, flat-footed data warehouses of the past along with it.

John Schitka currently works in Solution Marketing at SAP, focusing on database and technology products. A graduate of McMaster University, he also holds an MBA from the University of Windsor. He has worked in product marketing and product management in the high tech arena for a number of years and has taught at a local college and co-authored a number of published textbooks. He has a true love of technology and all that it has to offer the world.
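The idea of many simultaneous users working against one shared copy of the data, as described for the information grid above, can be pictured with a toy example. This is a minimal sketch in plain Python, not how Sybase IQ or any grid product is implemented: the `SALES` data set, the query functions, and the thread pool standing in for “a group of servers” are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# One shared, read-only data set stands in for the grid's common storage.
# No query gets its own replicated copy (the data mart approach).
SALES = [
    {"region": "EMEA", "product": "A", "revenue": 120.0},
    {"region": "EMEA", "product": "B", "revenue": 80.0},
    {"region": "APAC", "product": "A", "revenue": 200.0},
    {"region": "AMER", "product": "B", "revenue": 50.0},
]


def revenue_by(key: str) -> dict:
    """Sum revenue grouped by the given column, reading the shared data."""
    totals = {}
    for row in SALES:
        totals[row[key]] = totals.get(row[key], 0.0) + row["revenue"]
    return totals


def total_revenue() -> float:
    """Sum revenue over every row of the shared data."""
    return sum(row["revenue"] for row in SALES)


def run_concurrent_queries():
    """Run three different 'user' workloads at once against the same data."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        by_region = pool.submit(revenue_by, "region")
        by_product = pool.submit(revenue_by, "product")
        total = pool.submit(total_revenue)
        return by_region.result(), by_product.result(), total.result()


if __name__ == "__main__":
    print(run_concurrent_queries())
```

Because every query only reads the shared rows, none of them has to wait for another to release a lock, which is the property the article attributes to a shared-everything design.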
    • IN-DATABASE ANALYTICS ELIMINATES THE DATABASE-TO-ANALYTICS SERVER TRAVEL OF TRADITIONAL METHODS, PROVIDING FASTER, MORE SECURE, AND MORE ECONOMICAL RESULTS.

In-Database Analytics: Reducing Travel Time

By Courtney Claussen, Sybase IQ Product Manager, SAP

Traditionally, data analysis has required data to commute from home to work and back again. When a business asked a question of its data, someone in the IT department had to move a data set out of the database where it resided, into a separate analytics environment for processing, and then back. This “commute” comprised the bulk of the time and the work of the analysis, often causing frustration and delays on the business side as it waited for results.

It doesn’t have to be this way, however. It’s now possible to build analytic logic into the database itself, or automatically port models to the database. Called in-database analytics, this technology eliminates the need to move data and significantly reduces the time and effort required for processing.

In-database analytical capabilities aren’t new. They have been available commercially for nearly 20 years, but only recently have they begun to gain popularity.

To run in a database, an analytics model first must be translated from the “native language” of its development environment to something its destination database can understand. Until recently, the way to do that would be to recode the model from scratch, which could take weeks, months, or longer, rendering the final results so late that they might no longer be useful. For this reason, in-database analytics simply didn’t deliver much of a benefit over traditional methods.

Under the triple threat of burgeoning data volumes, sped-up business transactions, and more data use in the enterprise, the back-and-forth analytical process has become unbearable for many organizations.

The PMML Catalyst

Predictive model markup language (PMML) is a big reason why in-database analytics has become a viable option. PMML, a flavor of XML and an industry standard, makes it easy to transfer complex analytic models between different environments. In practical terms, it means that the “translation” process that was once measured in days or weeks can now be completed in minutes or seconds, which dramatically reduces model implementation time.

Just as working from home saves travel time and fuel costs, performing analysis where the data resides saves time and money. In-database analytics provide accurate results up to 10 times faster than traditional methods, for roughly 20 percent less cost.

Another advantage is security: Corporate information never leaves the protection of the data warehouse.

Further, when reporting tools have the capability to run analytic models inside the database, business users can use familiar tools to get the answers they need. These types of systems give decision makers easy access to powerful analytics on demand.

In-Database In Action

In-database processing makes data analysis more accessible and relevant for high-throughput, real-time applications including fraud detection, credit scoring, risk management, trend and pattern recognition, predictive analytics, and ad hoc analysis, which allows business users to drill deeper into existing reports or create new ones on the fly.

Predictive analytics applications use in-database processing to fuel behavior-based ad targeting and recommendation engines, such as those used by retail Web sites to encourage upsell and cross-sell (How about some batteries to go with that flashlight?) and by customer service organizations to determine next-best actions.

The largest mortgage database in the United States uses in-database analytics to assemble ad hoc reports from billions of records, delivering fast results over the Web around the clock. Customers receive information to make buy decisions on mortgage securities 5 to 10 times faster than before.

Financial institutions use real-time analytics solutions to continuously assess risk positions and market opportunities so they can stay competitive. In an industry where fractions of seconds can mean success or failure, in-database analysis of historical data and streaming feeds provides fast query execution and immediate risk understanding across multiple business units, so traders can make split-second decisions.

A private stock exchange in Asia uses in-database analytics to run a comprehensive system that detects abusive trading patterns and fraud.

Credit card companies rely on the speed and accuracy of in-database analytics to identify possible fraudulent transactions. By storing years’ worth of usage data, they can flag atypical amounts, locations, and retailers, and follow up with cardholders before authorizing suspicious activity.

For enterprises around the world, in many industries, in-database analytics are providing a competitive advantage. When data doesn’t have to commute to work and back, it can deliver faster insights that help businesspeople make informed decisions in real time, for less expense than traditional data analysis tools.

Courtney Claussen is a product manager at SAP concentrating on SAP’s data warehousing and analytics products. She has enjoyed a 30-year career in software development, technical support and product marketing in the areas of computer aided design, computer aided software engineering, database management systems, middleware, and analytics.
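The “no commute” idea behind in-database analytics can be sketched in miniature. In the example below, SQLite (from Python’s standard library) stands in for an analytics database, and the rule for an “atypical amount” is computed by the database engine itself, so only the flagged rows ever leave it. The table layout, the `flag_atypical` helper, and the three-times-average threshold are all hypothetical choices for illustration; they are not taken from any product mentioned in the article.

```python
import sqlite3


def flag_atypical(conn: sqlite3.Connection, factor: float = 3.0) -> list:
    """Return transactions larger than `factor` times the card's average.

    The average is computed inside the database; raw rows are never
    exported to a separate analytics environment. Note that the average
    includes the suspicious transaction itself, which is a simplification.
    """
    query = """
        SELECT t.id, t.card, t.amount
        FROM transactions AS t
        JOIN (SELECT card, AVG(amount) AS avg_amount
              FROM transactions
              GROUP BY card) AS s
          ON s.card = t.card
        WHERE t.amount > ? * s.avg_amount
        ORDER BY t.id
    """
    return conn.execute(query, (factor,)).fetchall()


# Build a tiny in-memory usage history for two hypothetical cards.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, card TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions (card, amount) VALUES (?, ?)",
    [
        ("c1", 20.0), ("c1", 25.0), ("c1", 22.0), ("c1", 30.0), ("c1", 900.0),
        ("c2", 500.0), ("c2", 520.0),
    ],
)

if __name__ == "__main__":
    print(flag_atypical(conn))
```

Here the 900.00 charge on card c1 is flagged because it exceeds three times that card’s average of 199.40, while the consistently large charges on card c2 are not. A real deployment would use a far richer model, but the division of labor is the same: the logic travels to the data, not the other way around.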
    • Analytics Advantage

Data Variety Is the Spice of Analytics

Today’s data is as varied and diverse as the entities that create it. Organizations that learn to roll with the dynamic nature of complex data open a new world of business opportunity.

By Amr Awadallah, CTO, Cloudera

Great decisions are usually the result of lots of data. But for many years, businesses that wanted to leverage unstructured or complex data to aid their decision making were limited to what they could glean from extract, transform, load (ETL) processes. Basically, if you couldn’t store it in a structured database, it wasn’t a decision resource.

That is changing rapidly. Thanks to technologies such as Hadoop, businesses today can store raw data in a boggling array of formats and combine it all together in comprehensive analysis. Making sense of this variety is certainly an IT challenge, but it is also the source of great opportunity. When organizations properly instrument their analytics infrastructure to combine varied sources of data and react on the fly to changes in data attributes and schemas, the results can be game-changing.

The insights made possible by this dynamic approach to complex, aggregated data are unprecedented in business. For example, the data can tell you, with a high degree of accuracy, what to sell, where to advertise, or when to try something new. They are responsible for new heights of profitability and dozens of new business models based entirely on insights that were previously inaccessible.

The New Data Reality

We are surrounded by data and yet, until recently, most of it was of little use in its raw form. Each data type is typically so unlike the next in its syntax and structure that comparing, or even storing, such records side-by-side in a relational paradigm was impossible. Some examples of nonrelational data types include:
• sensor output
• mobile device data
• machine logs
• images and video
• social media

These data are very different in character and structure from any data that was previously suitable for analysis. They are complex. They lack schemas, fixed names, or fixed types. They have nested structures rather than tables.

But what is most notably different about these data types compared to traditional structured data is that these data frequently change form. The data characteristics that we care about today may not be the same ones that we value tomorrow; thus, individual attributes are often added, dropped, or modified.

This dynamic property of new data types is a 180-degree shift from data processing in the past. Traditional structured data doesn’t change form very often, and the set of analytical procedures performed on this data is very well defined. Therefore, when building out relational database systems, organizations could afford to develop a static schema and react to infrequent changes on an ad hoc basis. But in the new data reality, organizations need instrumentation that adapts quickly to change, enabling them to answer questions they didn’t anticipate or build into their data models.

Today, our most powerful tool for leveraging the potential of data variety is Hadoop. Hadoop makes no attempt to understand data as it is copied. Rather it uses a schema-on-read methodology: It parses data and extracts the required schema only when data is read for processing. Because no development cycle is required to accommodate new values or columns, agility and flexibility are maximized.

Solving New Problems

Processing data in this way is a bit like alchemy. While individually inert, these data combine to become something far larger than the sum of the parts. At Cloudera we have helped dozens of customers create powerful competitive advantages simply by seizing the opportunity that lies dormant in their complex data. These businesses span the gamut of industries and include agriculture, finance, manufacturing, and many more.

Consumer Goods. A maker of consumer products collects consumer preference and purchasing data extracted from surveys, purchases, web logs, product reviews from online retailers, phone conversations with customer call centers, even raw text picked up from around the Web. Their ambitious goal: to collect everything being said and communicated publicly about their products and extract meaning from it. By doing this, the company develops a nuanced understanding of why certain products succeed and why others fail. They can spot trends that can help them feature the right products in the right marketing media.

Figure 1. Hadoop Use Cases: Two Core Use Cases Applied Across Verticals

Advanced Analytics (industry term)    Vertical          Data Processing (industry term)
Social networking analysis            Web               Clickstream sessionization
Content optimization                  Media             Engagement
Network analytics                     Telco             Mediation
Loyalty and promotions analysis       Retail            Data factory
Fraud analysis                        Financial         Trade reconciliation
Entity analysis                       Federal           Signals Intelligence (SIGINT)
Sequencing analysis                   Bioinformatics    Genome mapping

Source: Cloudera

Where to Use Hadoop: In every vertical there are data tasks with which Hadoop can assist. These tasks have different terms depending on the industry but they all come down to either advanced analytics or data processing.

Agriculture. A biotechnology firm uses sensor data to optimize crop efficiency. It plants test crops and runs simulations to measure how plants react to various changes in condition. Its data environment constantly adjusts to changes in the attributes of various data it collects, including temperature, water levels, soil composition, growth, output, and gene sequencing of each plant in the test bed. These simulations allow it to discover the optimal environmental conditions for specific gene types.

Finance. A major financial institution grew wary of using third-party credit scoring when evaluating new credit applications. Today the bank performs its own credit score analysis for existing customers using a wide range of data, including checking, savings, credit cards, mortgages, and investment data.

A Multifaceted Advantage

While a nonrelational approach is great for encapsulating and drawing inferences from multivariate data, there are other advantages as well:

Scale: Hadoop is the only technology proven to scale to 80 petabytes in a single instance, making the size of the data challenge moot for most organizations.

Economy: Designed from the ground up to deal intelligently with commodity hardware, Hadoop can help organizations transition to low-cost servers.

Conservation: Keeping data in a merged, isolated system provides business intelligence benefits and is both financially and ecologically sound.

These are compelling advantages, but for many organizations that have been wishing for years to perform analytics on corporate data that can’t be normalized, they are icing on the cake. By transforming data of every type from a cost and management burden into a critical asset, a nonrelational approach to data is raising the efficiency of business around the globe.

Amr Awadallah is Co-Founder and CTO of Cloudera, where he is responsible for all engineering efforts from product development to release, for both open-source projects and Cloudera’s proprietary software. Prior to Cloudera, Amr served as Vice President of Engineering at Yahoo, and led a team that used Hadoop extensively for data analysis and business intelligence across the Yahoo online services. Amr holds bachelor’s and master’s degrees in electrical engineering from Cairo University, Egypt, and a doctorate in electrical engineering from Stanford University.
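The schema-on-read approach described in the Cloudera article can be illustrated without Hadoop at all. The sketch below is plain Python, not the Hadoop API: raw records are stored exactly as they arrived (here as JSON strings in a list named `RAW_LOG`, a stand-in for files in a distributed store), and a schema is applied only at the moment a particular analysis reads them. The record contents and the `read_with_schema` helper are invented for this example.

```python
import json

# Raw records are kept as-is; nothing validates or reshapes them on write.
# Note that each record has a different set of attributes.
RAW_LOG = [
    '{"device": "sensor-1", "temp_c": 21.5}',
    '{"device": "sensor-2", "temp_c": 19.0, "humidity": 0.41}',
    '{"device": "sensor-3", "soil": {"ph": 6.8}}',
]


def read_with_schema(raw_lines, fields):
    """Parse each raw line at read time and project it onto `fields`.

    Attributes a record does not carry come back as None, so new or
    missing columns never require a change to how data is stored.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


if __name__ == "__main__":
    # Today's question only cares about temperature.
    print(list(read_with_schema(RAW_LOG, ["device", "temp_c"])))
    # Tomorrow's question adds humidity; the stored data is untouched.
    print(list(read_with_schema(RAW_LOG, ["device", "humidity"])))
```

The second query asks about an attribute that only some records carry, yet nothing had to be migrated or reloaded first. That is the agility the article attributes to deferring the schema until read time.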
    • TEXT SEARCH, TEXT ANALYTICS, SEMANTICS, AND OTHER ANALYTICS STRATEGIES HELP MACHINES UNDERSTAND PEOPLE AND EXPOSE MEANINGFUL BUSINESS INFORMATION.

Text Analytics for Speed Reading: Do You Mean What You Say?

By Seth Grimes, Strategy Consultant and Industry Analyst, Alta Plana

Text analytics sounds abstruse, but the central idea is simple. Text analytics turns human communications (news articles, blogs, social status updates and online reviews, corporate filings, e-mail, and survey responses) into data that can be crunched to support fast, accurate, optimal business decision making.

Applications in customer service and support, market research and competitive intelligence, life sciences and clinical medicine, financial services and capital markets, law enforcement and intelligence, and other business tasks and industries can readily leverage text analytics. Businesses that do not apply these new strategies are missing out on valuable insight.

The algorithms are quite interesting. They apply statistics, linguistics, and machine learning to discern and exploit information captured in an array of textual sources. Automated solutions take adopters far beyond the capacity and speed, and sometimes the pattern-recognition acuity, achievable via human analyses. Text analytics is part of every comprehensive business intelligence (BI) program, which is a component of any complete analytics strategy.

Why Text Analytics?

Text conveys both quantitative and qualitative information; it records and communicates events, data, facts, and opinions. Think of all the text that individuals, businesses, governments, and communities of all stripes generate:

• A corporate 10-K includes financial data tables and extensive narrative describing products and services, market conditions, operations, and outlook.

• A news article or opinion piece covers and provides context for events in narrative form, intended to inform or inspire; it may describe people, organizations, and relationships, as well as locations, products, and actions.

• A warranty or insurance claim details product or service defects and damage, with significant quality, liability, customer-relationship, and reputation implications; a close reading may also expose fraud.

• E-mail traffic is part of corporate decision-making processes, but e-mail use creates risk of inadvertent (or intentional) exposure of sensitive or proprietary information, with compliance repercussions.

• A hotel or restaurant visitor posts likes and dislikes, visible to the world, to an online forum such as TripAdvisor or Yelp. These reviews, and similar examples such as Amazon-posted product reviews, expose preferences and flaws and influence consumers’ choices.

The richness and diversity of text sources create both opportunities and challenges. How do you systematically get the information you need and filter the noise? How do you turn text-sourced information into insights that can drive better decision making?

From Information to Insight

Text analytics involves a few basic, transformative steps:

1) Collect and select source material, whether via Web retrieval or via hooks into an e-mail system, document repository, database, or file system.

2) Extract the business-relevant information you need.

3) Apply BI and data-mining techniques that automate text processing and help you generate insights.

Information sourcing often starts with search, with a news feed, or via a software connector or application programming interface (API) into an external system. Search alone, it should be noted, is rarely sufficient in today’s fast-paced business environments. Text analytics seeks answers, not the links and documents retrieved by most search systems. Answers are found in the information content of documents, especially in the ensemble of linked documents and databases.

Text analytics extracts features such as:

• Entities, such as people, places, companies, products and brands, ticker symbols

• Patterned information such as telephone numbers, e-mail addresses, and dates

• Topics, themes, and concepts

• Associations, facts, relationships, and events, such as a person’s job title, a stock closing price, a vote in Congress

• Opinions, attitudes, and emotions, such as sentiment about the spectrum of entities and topics

Semantic information is of particular interest because this information helps us interrelate text-sourced entities and link them to database records and into the emerging semantic Web. Semantic computing overlays meaning on data objects. It enables content enrichment, advanced categorization and classification, dynamic data integration, and semantic search that extends beyond keywords to cover concepts, patterns, and relationships.

Text analytics generates the semantic information that fuels the linked-data Web as well as next-generation BI and data mining. There’s huge analytical lift in adding semantic, text-sourced information to the enterprise analytics mix, enabling integrated analysis of text and data.

Text as Big Data

A text-analytics solution can systematically, accurately, and quickly extract whatever information content interests a business. Text’s volume, velocity, and variety (the three Vs of Big Data) do pose a challenge, however. Text is produced 24 hours a day: online, via social media, within the enterprise, in informal chatter and formal settings (such as science labs, courts, and corporations), and in dozens of business-relevant languages around the globe.

Fortunately, text-analytics software can handle large data volumes via parallelized, distributed computing frameworks such as Apache Hadoop. The software runs in-memory to cope with data velocity and semantics, and the solutions are tailored to text’s many forms and languages to help discover opportunity in data variety.

Users benefit from a choice of installed, hosted, and cloud implementations. They can access the data via data-analysis workbenches, as a service (via APIs), and embedded in line-of-business applications. Options available are both commercial and free, with open source distributions. Challenges still exist, but the benefits are huge and the barriers to getting started are low.

Text analytics is here-and-now, meeting a spectrum of business needs. The idea is simple: Text analytics helps machines understand people. It erases enterprise data boundaries and is a key source of competitive advantage in 2012 and beyond.

Seth Grimes is an analytics strategy consultant with Alta Plana Corporation, located near Washington, DC, and a leading IT industry observer focusing on business intelligence, text analytics, and decision support. Grimes is a longtime InformationWeek contributing editor and founding chair of the Sentiment Analysis Symposium and the Text Analytics Summit.
    • WITH FACIAL RECOGNITION SOFTWARE, ANYONE CAN SORT AND IDENTIFY A SINGLE FACE FROM AMONG HUNDREDS, EVEN THOUSANDS, OF POSSIBILITIES.

Image Recognition, Pattern Identification, and the New Memory Game
By Joydeep Das, Director, Data Warehousing and Analytics Product Management, SAP

Luckily for society, image-recognition technology has become widely available to automate image analysis quickly. Without automated systems, the torrents of image data encountered daily would overwhelm us. Detecting subtle differences among thousands, even millions of images within reasonable time constraints is beyond human abilities. Simply put, professionals in a variety of enterprises could not get their jobs done without computer-aided image recognition.

Take facial recognition technology. As with any image recognition system, it relies on compute-intensive algorithms to determine a person’s unique features (eyes, mouth, nose, and more) to positively identify him or her. But it’s no longer the rarefied task of highly skilled professionals, using banks of servers to crunch enormous amounts of data, to identify a single face. It’s become an everyday tool for consumers, business, and government to improve their personal lives, target customers better, and deliver services to citizens more efficiently.

I See You
Today, consumers with off-the-shelf desktop computers use facial recognition software embedded in their photo-management applications to sort through thousands of their digital photographs. With it, they can quickly organize images of family members and friends. Popular social networking sites, such as Facebook and Google+, include automated “tagging” services that can detect individuals within uploaded photographs by comparing their faces to previously identified people in a member’s image portfolio.

Perhaps more impressive, people now carry facial recognition technology in their pockets. Users of iPhone and Android smartphones have applications at their fingertips that use facial recognition technology for various tasks. For example, Android users with the remembAR app can snap a photo of someone, then bring up stored information about that person based on their image when their own memory lets them down, a potential boon for salespeople. iPhone users can unlock their device with RecognizeMe, an app that uses facial recognition in lieu of a password. If deployed across a large enterprise, this app could save an average of $2.5 million a year in help-desk costs for handling forgotten passwords.

Marketers have begun to use facial recognition software to learn how well their advertising succeeds or fails at stimulating interest in their products. A recent study published in the Harvard Business Review looked at what kinds of advertisements compelled viewers to continue watching and what turned viewers off. Among their tools was “a system that analyzes facial expressions to reveal what viewers are feeling.” The research was designed to discover what kinds of promotions induced watchers to share the ads with their social network, helping marketers create ads most likely to “go viral” and improve sales.
    • Air passengers with ePassports in Australia and New Zealand can get first-class service through customs with the deployment of facial recognition technology called SmartGate. Instead of waiting in line for a border control officer, eligible travelers use a kiosk that uses facial recognition software combined with data held on the ePassport to process the person through customs. More than 1 million travelers have used SmartGate, and officials have deemed it “an unqualified success.”

State of the Art
SAP Sybase IQ is a platform for facial recognition technology in a variety of applications. It stores the image data and executes the image-processing functions inside the database through the User Defined Function interface.

In one implementation, every new image is represented by numeric data, called “Image DNAs,” which can be compared with the DNA of stored images. First, an IQ database is loaded with a set of training images having particular characteristics. Then a batch of new images is compared against the set of training images to filter out those with similar characteristics, resulting in a much smaller image set to be analyzed further.

A filtered and processed image is shown with people’s faces outlined with colored boxes. Users select a particular face, click on a “Search” button, and within a few seconds all images in the database that include that specific person will appear. Such a tool can be used for everything from law enforcement to deleting unwanted photos of yourself on Facebook.

Face the Facts
However, the ubiquity of facial recognition technology has raised some thorny social and legal issues, especially when combined with widespread adoption of social networks. For example, in what has been called an example of “digital vigilantism,” after the London riots in summer 2011, a Google group called London Riots Facial Recognition attempted to identify lawbreakers by matching their images caught on CCTV cameras with those on Facebook pages. The ad hoc group abandoned their effort when tests proved disappointing.

In Canada, authorities attempted to combine Facebook information and ICBC image data with photos taken during the riots in Vancouver after the local team failed to win the Stanley Cup. But a court ruled that a warrant would be needed by the authorities to use facial recognition tools in this instance.

Despite the murky legal and privacy ramifications of facial recognition, the technology is now widely available and will not vanish. It will be an increasingly vital tool to help consumers manage their digital selves while helping business and government to deliver improved products and services more efficiently and securely.

Joydeep Das has worked in the field of enterprise databases for over 20 years in leadership roles in engineering and product management. As an engineer, he led several research and development projects in leading DBMS firms. In his product management role, Das has been a strong advocate of SAP’s data warehousing product line in setting its product and business strategy and managing its day-to-day operations. He frequently speaks at tradeshows, user conferences and webcasts.
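The "Image DNA" filtering step described under State of the Art can be sketched in a few lines. This is only an illustration under stated assumptions: the article does not say how an Image DNA is computed or compared, so the sketch assumes each one is a fixed-length list of numbers and uses cosine similarity with a hypothetical threshold. The function names are invented for the example and are not part of SAP Sybase IQ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_candidates(training_dnas, new_images, threshold=0.9):
    """Keep the ids of new images whose vector is close to any training vector.

    training_dnas: list of vectors for the training images (assumed format).
    new_images: dict mapping image id -> vector.
    threshold: assumed similarity cutoff, chosen only for illustration.
    """
    kept = []
    for image_id, dna in new_images.items():
        best = max(cosine_similarity(dna, t) for t in training_dnas)
        if best >= threshold:
            kept.append(image_id)
    return kept


training = [[1.0, 0.0, 0.0]]
batch = {"img_a": [0.9, 0.1, 0.0], "img_b": [0.0, 1.0, 0.0]}
shortlist = filter_candidates(training, batch)
print(shortlist)
```

The point of the design is the one the article makes: a cheap numeric comparison reduces a large batch to a short list before any expensive analysis runs.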
    • WHEN IT COMES TO BIG DATA, PEOPLE UP AND DOWN HIGHWAY 101 IN SILICON VALLEY TALK ABOUT TECHNOLOGIES SUCH AS NOSQL, HADOOP, AND MAPREDUCE AS THOUGH THEY’LL SOLVE ALL OF OUR PROBLEMS. WHILE THESE ARE CERTAINLY EXCITING NEW CAPABILITIES, TECHNOLOGY ALONE IS ABOUT 10% OF THE ANSWER TO BIG DATA.

Technology Alone is Not the Answer
By Byron Banks, Vice President, Business Analytics, SAP

I’m Byron Banks and have more than 20 years of experience with enterprise applications. Currently I manage a solution marketing team at SAP that is focused on enterprise information management (EIM) and data warehousing (DW) solutions, so I am not new to the challenges that organizations face in improving business results by providing integrated, accurate, and trusted information throughout the enterprise. Hence, I do not want to discount the contribution of technologies to solving the Big Data dilemma. Hadoop, MapReduce, and other recent innovations are helping companies deal with ever-increasing amounts of data, whether they’re working with traditional rows of transactional data inside enterprise applications or information in documents, images, video, and the whole universe of social media out on the Web. From a technology point of view, we’ve made a lot of progress in providing people with better tools to solve the technical challenge of dealing with this massive amount of ever-changing data, but what we also need to do is help companies leverage this data to support business objectives, be it to improve the efficiency and operations of a business area or make better business decisions of almost any kind.

For example, technology has allowed people to spend entire days reading e-mail, scanning Facebook, and staying current with what’s happening on the Web. But does this make them more productive at work? Yes and no. No in the sense that access to more information in itself doesn’t necessarily make you a better employee, and access to everything at once can actually be overwhelming. But having insight into the “right” information can help you close a deal or deliver better customer service if it comes to you in a way that gives you insight into the task at hand.

Big Data Equals More Business Insight
At SAP, we are focusing not only on the technology dimension of Big Data but also on how to integrate these innovations into business solutions that help the individual lines of business and industries identify what pieces of the data stream people need access to and how to turn the data into actionable information that the business can understand and use. The real opportunity with Big Data is that it gives business users more sources of knowledge to tap into, to combine with sales and inventory data stored in traditional data warehouses, and thereby get a better, more complete understanding of how their customers perceive them and their brand, what products and services are most appealing, and perhaps what the competition is up to. This better, more complete picture of the market then informs the
    • business user on what they should be doing next, such as when to run promotions, adjust pricing, or maybe plan new product enhancements.

For example, let’s say you are a product manager in athletic footwear and you’re trying to decide what’s going to be the next update of your product. You’re designing next season’s running shoe, and you need to figure out when would be the right time to introduce the next version and what design changes you may want to incorporate. Part of that decision will be based on how well the current version is selling, the inventory level, and the cost and profitability of the current version. A lot of that information is easily accessible in enterprise systems you already have. But once you come to the conclusion that inventory is low, or that price discounting is increasing due to competition, then maybe it’s time to start planning a product update. What will that update entail?

Big Data on the Run
With running shoes, one popular trend for the past few years has been a very minimal, lightweight running shoe. Based on just past sales data, you could see the recent strong demand and say we need to create another minimalist shoe with a new color and pattern, and that’s your update. Good product managers would also go to industry events, read the relevant press and magazines, and maybe work with a consultant, for more knowledge about industry trends, or even conduct a focus group or two. That’s how you would have proceeded in the past. But wouldn’t it be better to augment that with more insight and analysis based on hard data?

By leveraging these new Big Data technologies and integrating them with existing business solutions and processes, innovative organizations can now give that product manager a lot more insight to validate some of the decisions being contemplated. In the realm of athletic footwear, there’s a huge amount of discussion occurring in online communities, blogs, expert commentary, and online magazines. The challenge is that there is so much discussion going on, it is more than a person, or team of people, can read and analyze on their own.

However, by using these new technologies to do the monitoring, aggregating, and analyzing of these numerous communities, tracking running publications, and even following influential runners and coaches on their blogs and Twitter feeds, the application can detect patterns and highlight trends in millions of individual postings. One trend discovered could be that a segment of the market, maybe the “weekend warrior” runner, is encountering Achilles heel injuries, which are serious for runners, when wearing the minimalist type of running shoe. With this type of information in hand, the product manager can now make more informed business decisions as to how to plan the full product line and associated marketing campaigns so that there will be products and advertising that appeal to the type of runner who will do well with minimalist footwear, while still retaining traditional running footwear styles and promotional spending to go after the people not suited to the “barefoot” runner trend.

Used effectively, Big Data combined with your existing enterprise data can help you get closer to your market and your business, to shift traditional conversations around pricing and profitability to one that considers a holistic view of not only what happened yesterday, but also what is happening now, next week, and next month. By doing this, you don’t replace your current best practices for, say, product planning; you just augment them with additional information sources so that you can ask questions and discover new trends and insights you may not have realized about your business.

Byron Banks has more than 20 years of experience with enterprise applications. He currently manages a solution marketing team that is focused on enterprise information management (EIM) and data warehousing (DW) solutions that enable organizations to improve business results by making integrated, accurate, and trusted information available throughout the enterprise.
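The running-shoe example turns on spotting that two things are mentioned together unusually often across many postings. A minimal sketch of that idea follows, assuming postings are plain strings and that simple substring matching is good enough for illustration. The function name `co_mention_rate` and the sample postings are hypothetical and not drawn from any SAP product.

```python
def co_mention_rate(postings, topic, issue):
    """Share of postings mentioning `topic` that also mention `issue`.

    Matching is case-insensitive substring search, a deliberate simplification;
    real text analytics would use entity and sentiment extraction instead.
    """
    topic_posts = [p for p in postings if topic in p.lower()]
    if not topic_posts:
        return 0.0
    both = sum(1 for p in topic_posts if issue in p.lower())
    return both / len(topic_posts)


# Invented sample data for illustration only.
postings = [
    "Loving my new minimalist shoes",
    "Minimalist shoes gave me Achilles pain after a 10K",
    "Achilles soreness again since switching to minimalist trainers",
    "Traditional cushioned shoes feel great",
]
rate = co_mention_rate(postings, "minimalist", "achilles")
print(f"{rate:.0%} of minimalist-shoe postings mention Achilles problems")
```

At the scale the article describes, the same count would run over millions of postings and be tracked over time, so that a rising rate shows up as a trend rather than a single number.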
    • Analytics Innovations

What’s All the Hadoop-la About?
Hadoop can bring value to Big Data analysis projects, but it’s not the solution to every need.
By Wayne Eckerson, Principal, BI Leader Consulting

There are two types of Big Data in the market today. There is open source software, focused largely around Hadoop, which eliminates upfront licensing costs for managing and processing large volumes of data. And then there are new analytical engines, including appliances and column stores, which provide significantly higher price-performance than the general-purpose relational databases that have dominated the market for three decades. Both sets of Big Data software deliver higher returns on investment than previous generations of data management technology, but in vastly different ways.

Hadoop and NoSQL
Free Software. Hadoop is an open-source framework, built around a distributed file system, that is capable of storing and processing large volumes of data in parallel across a grid of commodity servers. Hadoop emanated from companies such as Google and Yahoo, which needed a cost-effective way to build search indexes. Engineers at these companies knew that traditional relational databases would be prohibitively expensive and technically unwieldy, so they came up with an alternative that they built themselves. Eventually, they gave it to the Apache Software Foundation so others could benefit from their innovations.

Today, many companies are implementing Hadoop software from Apache as well as third-party providers such as Cloudera, Hortonworks, EMC, and IBM. Developers see Hadoop as a cost-effective way to get their arms around large volumes of data. Companies are using Hadoop to store, process, and analyze large volumes of Web log data so they can get a better feel for the browsing and shopping behavior of their customers. Previously, most companies outsourced the analysis of their clickstream data or simply let it “fall on the floor” since they couldn’t process it in a timely and cost-effective way.

Data Agnostic. Besides being free, the other major advantage of Hadoop software is that it can handle any type of data.
    • Unlike a data warehouse or traditional relational database, Hadoop doesn’t require administrators to model or transform data before they load it. With Hadoop, you simply load and go. This significantly reduces the cost compared to a data warehouse. Most experts assert that 60 to 80% of the cost of building a data warehouse, which can run into the tens of millions of dollars, involves extracting, transforming, and loading (ETL) data. Hadoop virtually eliminates this cost.

As a result, many companies are starting to use Hadoop as a general-purpose staging area and archive for all their data. So, a telecommunications company can store 12 months of call detail records instead of aggregating that data in the data warehouse and rolling the details to offline storage. With Hadoop, they can keep all their data online, eliminate the cost of data archival systems, and feed the data warehouse with subsets or aggregates that a majority of users want to view. They can also let power users query Hadoop data directly if they want to access all the details or can’t wait for the aggregates to be loaded into the data warehouse.

Hidden Costs. Of course, nothing in technology is ever free. When it comes to processing data, you either “pay the piper” upfront, as in the data warehousing world, or at query time, as in the Hadoop world. Before querying Hadoop data, a developer needs to understand the structure of the data and all of its anomalies. With a clean, well understood, homogenous data set, this is not difficult. But few corporate data sets fit this description. So a Hadoop developer ends up playing the role of a data warehousing developer at query time, interrogating the data and making sure its format and contents match expectations. Querying Hadoop today is a “buyer beware” environment.

Moreover, to run Big Data software, you still need to purchase, install, and manage commodity servers (unless you run your Big Data environment in the cloud, say through Amazon Web Services). While each server may not cost a lot, collectively the price adds up.

But what’s more costly is the expertise and software required to administer Hadoop and manage grids of commodity servers. Hadoop is still bleeding-edge technology, and few people have the skills or experience to run it efficiently in a production environment. These specialists are hard to find, and they don’t come cheap. Members of the Apache Software Foundation admit that Hadoop’s latest release is equivalent to version 1.0 software, so even the experts have a lot to learn since the technology is evolving at a rapid pace. Nonetheless, Hadoop and its NoSQL brethren have opened up a vast new frontier for organizations to profit from their data.

Analytic Platforms
The other type of Big Data predates Hadoop and NoSQL variants by several years. This version of Big Data is less a “movement” than an extension of existing relational database technology optimized for query processing. These analytical platforms span a range of technology, from appliances and columnar databases to shared-nothing, massively parallel processing (MPP) databases. The common thread among them is that most are read-only environments that deliver exceptional price-performance compared to general-purpose relational databases originally designed to run transaction-processing applications.

Sybase (now SAP) laid the groundwork for the analytical platform market when it launched the first columnar database in 1995. Teradata was also an early forerunner, shipping the first analytical appliance in the early 1980s. Netezza kicked the current market into high gear in 2003 when it unveiled a popular analytical appliance and was soon followed by dozens of startups. Recognizing the opportunity, all the big names in software and hardware (Oracle, IBM, HP, and SAP) subsequently jumped into the market, either by building or buying technology, to provide purpose-built analytical systems to new and existing customers.

Although the price tag of these systems often exceeds a million dollars, customers find that the exceptional price-performance delivers significant business value, in both tangible and intangible form. XO Communications recovered $3 million in lost revenue from a new revenue assurance application it built on an analytical appliance, even before it had paid for the system. It subsequently built or migrated a dozen applications to run on the new purpose-built system, testifying to its value.
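The "pay the piper at query time" point under Hidden Costs can be made concrete with a small sketch. It assumes, purely for illustration, that call detail records were loaded as raw comma-separated lines in the hypothetical layout `caller,callee,start_time,duration_seconds`; nothing here is Hadoop API code. It only shows the validation work that moves from load time into every query when data is stored without modeling.

```python
from datetime import datetime

# Raw lines stored exactly as they arrived ("load and go"); layout is assumed.
RAW_CALL_RECORDS = [
    "5551230001,5551230002,2012-01-15T09:30:00,120",
    "5551230003,5551230004,2012-01-15T09:31:00,abc",  # non-numeric duration
    "5551230001,5551230005,2012-01-16T10:00:00,60",
    "malformed line",                                  # wrong field count
]


def parse_record(line):
    """Parse one raw line, returning None when it does not match expectations."""
    parts = line.split(",")
    if len(parts) != 4:
        return None
    caller, callee, started, duration = parts
    try:
        return {
            "caller": caller,
            "callee": callee,
            "started": datetime.strptime(started, "%Y-%m-%dT%H:%M:%S"),
            "seconds": int(duration),
        }
    except ValueError:
        return None


def total_seconds_by_caller(lines):
    """Aggregate call time per caller, counting records rejected at query time."""
    totals = {}
    rejected = 0
    for line in lines:
        record = parse_record(line)
        if record is None:
            rejected += 1
            continue
        totals[record["caller"]] = totals.get(record["caller"], 0) + record["seconds"]
    return totals, rejected


totals, rejected = total_seconds_by_caller(RAW_CALL_RECORDS)
print(totals, "rejected:", rejected)
```

In a data warehouse the two bad records would have been caught once, during ETL. Here every query that touches the raw data has to repeat the check, which is the trade-off the article describes.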
    • To gain a competitive edge for its online automobile valuations, Kelley Blue Book purchased an analytical appliance to run its data warehouse, which was experiencing performance issues. The new system reduces the time needed to process hundreds of millions of automobile valuations from one week to one day. Kelley Blue Book now uses the system to analyze its Web advertising business and deliver dynamic pricing for its Web ads.

Challenges. Given the upfront costs of analytical platforms, organizations usually undertake a thorough evaluation of these systems before jumping on board.

First, companies must assess whether an analytical platform sufficiently outperforms their existing data warehouse database, which requires testing the systems in their own data centers using their own data across a range of queries. The good news is that the new analytical platforms usually deliver jaw-dropping performance for most queries tested. In fact, many customers don’t believe the initial results and rerun the queries to make sure that the results are valid.

Second, companies must choose from more than two dozen analytical platforms on the market today. They must decide whether to purchase an appliance or a software-only system, a columnar database or an MPP database, or an on-premise system or a Web service.

Finally, companies must decide what role an analytical platform will play in their data warehousing architectures. Should it serve as the data warehousing platform? If so, does it handle multiple workloads easily or is it a one-trick pony? If the latter, which applications and data sets should be offloaded to the new system? How do you rationalize having two data warehousing environments instead of one?

Today, we find that companies that have tapped out their SQL Server or MySQL data warehouses often replace them with analytical platforms to get better performance. However, companies that have implemented an enterprise data warehouse on Oracle, Teradata, or IBM database systems often find that the best use of analytical platforms is to sit alongside the data warehouse and offload existing analytical workloads or handle new applications. By offloading work, the analytical platform can help organizations avoid a costly upgrade to their data-warehousing platform, which might easily exceed the cost of purchasing an analytical platform.

Summary
The Big Data movement consists of two separate, but interrelated, markets: one for Hadoop and open source data management software and the other for purpose-built, analytical engines with SQL databases optimized for query processing. Hadoop avoids most of the upfront licensing and loading costs endemic to traditional relational database systems. However, since the technology is still immature, there are hidden costs that have thus far kept many Hadoop implementations experimental in nature. On the other hand, analytical platforms are a more proven technology, but impose significant upfront licensing fees and potential migration costs. Companies wading into the waters of the Big Data stream need to evaluate their options carefully.

Wayne Eckerson has been a thought leader in the business intelligence field since 1995. He has led numerous research studies and is a noted speaker, blogger, and consultant. He is the author of the best-selling book Performance Dashboards: Measuring, Monitoring, and Managing Your Business, and is currently working on a new book that profiles analytical leaders. He is principal consultant at BI Leader Consulting and director of research at TechTarget.
    • IN SOME COMPANIES, BIG DATA IS CAUSING INFORMATION OVERLOAD FOR DECISION MAKERS. FOR OTHERS THAT LEVERAGE CEP TECHNOLOGY, IT’S OFFERING A COMPETITIVE ADVANTAGE.

Fast Flowing Decisions Through Streams of Data
By Irfan Khan, Senior Vice President and Chief Technology Officer, SAP Database and Technology

Today, Big Data is swarming over the Internet at punishing volumes and velocities. And the smartest enterprise executives in a variety of markets are pushing their organizations to embrace Complex Event Processing (CEP) as well as in-memory database technologies to analyze and act upon vast amounts of data in the blink of an eye. They do so knowing that for their companies to gain or retain competitive advantage it is imperative that they be able to make fast-flowing decisions through the onrushing streams of data.

When Real Time Is Real Money
Eventually, nearly every large organization will confront Big Data in its own way. But some businesses will face it sooner. Take the wireless telecommunications sector. Carriers are already drowning in data. Yet, with the arrival of 4G LTE networks, their current data deluge may very well look like a trickle. According to Cisco, mobile data traffic will grow 26 times between 2010 and 2015, reaching 6.3 exabytes per month by 2015. That’s a stunning compound annual growth rate of 92%. Mobile IP traffic will jump 300% faster than fixed broadband IP levels and will be 8% of the total IP traffic by 2015. But even those amazing growth figures may be too low. Researchers at the New Paradigm Resources Group say Cisco underestimated the impact of 4G LTE on IP traffic levels by a factor of 10.

Crunching this data is critical to wireless operators. Real-time CEP analytics as well as in-memory database technologies can be used to predict how these huge traffic volumes, with their attendant massive capacity spikes, will affect resource consumption. They can be used to allocate bandwidth on the fly and target the deployment of resources to avoid network meltdown. Carriers that cannot detect critical changes to their infrastructure in real time will risk rising call-failure rates, increased customer churn, and unfulfilled corporate service-level agreements that inevitably lead to nasty surprises on their balance sheets. In the end, it’s all about money.
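The two Cisco figures quoted above, 26-fold growth over five years and a 92% compound annual growth rate, are consistent with each other, and the arithmetic is short enough to check. The sketch below uses the standard compound-growth formula; the function name is chosen only for this example.

```python
def compound_annual_growth_rate(growth_multiple, years):
    """Annual rate r such that (1 + r) ** years equals growth_multiple."""
    return growth_multiple ** (1.0 / years) - 1.0


# 26x growth between 2010 and 2015 is five annual compounding periods.
cagr = compound_annual_growth_rate(26, 5)
print(f"{cagr:.1%}")  # just under 92%
```

In other words, traffic that nearly doubles every year for five years ends up about 26 times its starting level.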
    • And speaking of money, in the financial services sector the growth in data and its speed of arrival are equally staggering. This sector has long been a leader in pushing the technology envelope to analyze huge amounts of data arriving at machine speed. Still, in the Big Data era, the information pours in at faster rates from numerous sources, potentially overloading existing conventional business intelligence systems.

For example, one of many such data sources, The Options Price Reporting Authority (OPRA), provides the details on completed trades and current options, among other vital aspects of the daily markets critical to many financial enterprises. It produces a vast amount of data each and every second. And its volume of information is growing at phenomenal rates. In 1995 OPRA delivered 500 peak messages per second (MPS) to its clients. By 2005 that increased to a whopping 83,000 peak MPS. And in 2010, in only five years, the OPRA feed ballooned to 2,200,000 peak MPS. Each message averages 120 bytes, translating to 264 megabytes per second coming from OPRA’s data fire hose alone.

And, as noted, OPRA is merely one of dozens of sources pumping out huge quantities of data. Yet, accepting this data is merely the cost of doing business for most financial services companies. Being able to apply CEP technology and instantly and effectively analyze and intelligently act on the data is the difference between a profit and a loss on each of the millions of daily transactions.

Prospering With CEP
Organizations that are unprepared for the data deluge descending upon them are ripe for disaster. If they are unable to glean insight from all the information they are gathering, they will be battered in the marketplace. Consider retailers during the past holiday season that were unable to predict product demand, took orders they could not deliver, or prematurely changed customer service plans, resulting in widespread customer dissatisfaction and hits to their balance sheets and public images.

Blaming the volume and velocity of the data for failing to meet customer needs is not a valid excuse. CEP and in-memory database tools exist to conduct real-time analytics in a variety of business environments. Companies that use them will prosper. Companies that don’t will muddle or fail.

As Senior Vice President and Chief Technology Officer, Irfan Khan oversees all technology offices in each of Sybase’s business units, ensuring market needs and customer aspirations are reflected within the company’s innovation and product development. Khan is also responsible for setting the architecture and technology direction for the worldwide technical sales organization.

IRONIC CONVERGENCE
Among the ironic convergences in technology history, the year 1962 stands out. It was then, according to the Oxford English Dictionary, that the phrase “information overload” found its way into the language. It is also the year that the Computer History Museum assigns to the origin of the Internet. Of course, 1962 is notable for being neither a time of information overload nor of the Internet. For that convergence we had to wait until our day, the era of Big Data.

However, the 1960s were when computer-generated data began to be used in formal “decision support” systems, the precursors to our modern analytics platforms. Basic quantitative models were developed to analyze information that, at the time, was considered too much for business managers to sift through. It’s amusing to consider that in those days data sets were measured in kilobytes, and 300-baud modems were the high-speed interconnects. Yet, even back then, with relatively paltry amounts of information to evaluate over leisurely time frames, savvy business leaders understood that there was value to be gleaned from the data.

There’s no point in wishing we were back in simpler times when “information overload” was a printed green-bar report. Nostalgia has no place in our Big Data era. Luckily, CEP does.
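The OPRA throughput figure follows directly from the peak message rate and the average message size quoted in the article. The sketch below reproduces that arithmetic and the growth between the three reported years. The constant and function names are chosen for this example, and a megabyte is taken as one million bytes, which is the convention under which the article's 264 figure holds.

```python
# Peak messages per second (MPS) as reported in the article.
PEAK_MPS = {1995: 500, 2005: 83_000, 2010: 2_200_000}
AVG_MESSAGE_BYTES = 120  # average message size quoted in the article


def peak_bytes_per_second(messages_per_second, avg_bytes=AVG_MESSAGE_BYTES):
    """Peak feed throughput in bytes per second."""
    return messages_per_second * avg_bytes


def growth_multiple(start_year, end_year):
    """How many times larger the peak rate is in end_year than in start_year."""
    return PEAK_MPS[end_year] / PEAK_MPS[start_year]


throughput_2010 = peak_bytes_per_second(PEAK_MPS[2010])
print(throughput_2010 / 1_000_000, "MB/s at peak in 2010")
print(round(growth_multiple(1995, 2005)), "x growth, 1995 to 2005")
print(round(growth_multiple(2005, 2010), 1), "x growth, 2005 to 2010")
```

The feed grew 166-fold over its first ten reported years and roughly another 26-fold in the following five, which is the acceleration the article is pointing at.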
    • FROM RETAIL TO LAW ENFORCEMENT, SAVVY COMPANIES ARE FIGURING OUT THAT THE CONNECTIONS WE MAKE TO ONE ANOTHER ARE A VALUABLE SOURCE OF MARKETING INTELLIGENCE.

Age of Influence: Making the Most of Social Networks
By Bruno Delahaye, Senior Vice President Worldwide Business Development, KXEN

In a world where media channels are saturated with noise, it’s more difficult than ever to be heard through traditional marketing. Marketers have learned that shouting louder is not effective. Reaching a smaller number of highly influential people, and letting them distribute your message, can be far more effective at a lower cost.

Welcome, then, to the age of influence. Whether it’s a movie recommendation, fashion advice, or simply directing others’ attention to an unusual YouTube video, influence is a hot commodity. But finding influential people is not as easy as it sounds. It requires advanced mathematical modeling techniques and high-powered computer networks. Despite these challenges, organizations are beginning to learn that social network analysis (SNA), the process of modeling the relationships between connected people to find patterns, is a powerful source of customer insight.

Not all social networks consist of individuals who know one another. That’s a misconception fueled by the popularity of social networking Web sites such as Facebook and LinkedIn.

A Sizable Challenge
The value of influence is hardly a well-kept secret. Ever since researcher Stanley Milgram published his study on the “six degrees of separation” in 1967, organizations have been trying to figure out how to reach the most influential people. But finding hubs of social influence is a complicated mathematical problem requiring advanced network analytics technology. Historically, such analysis has been expensive and complicated to support.

By their nature social networks are inclusive, involving individuals within and outside of the organization’s customer base. They also attempt to describe the nature of the relationship between individuals (nodes) in the model. Thus, the resulting
    • data sets are very large; it’s not unusual to have billions of records in a social network. Just a few years ago, the computational time frame to run such a model would take weeks or months, to say nothing of the cost to store the data.

      [Figure: Social Network Analysis. A Social Network Model: SNA provides useful visualizations of the interconnectedness of people.]

      Today the falling cost of storage and computer processing power has finally made SNA an achievable, even practical, undertaking. Today’s state-of-the-art in-memory technologies allow for a virtually unlimited amount of information to be held in memory, providing a real systematic solution for the massive analytic requirements of SNA.

      At the same time, new computational techniques have emerged to handle large volumes of social network data more efficiently. With less expensive infrastructure and better mathematic principles impelling it forward, SNA is poised to make big improvements in business intelligence quality for a wide variety of companies.

      Tuning Predictive Power
      There’s no such thing as a universal network. Every individual belongs to a range of different networks (for example, one for work, one for home; or perhaps weekday versus weekend behaviors), and within each of those networks the individual plays a different role. A person’s amount of influence can vary wildly from network to network. For example, he or she may be very influential in a technology-related network but be an advice seeker in a network related to fashion or fine art, having little or no influence on others.

      For marketers the trick is to leverage these differences by optimizing their analyses on a case-by-case basis. Social networks are especially useful when the results are used to amend existing predictive analytic efforts, thereby improving the results.
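The observation that one person's influence can differ sharply between networks can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions: it approximates "influence" with degree centrality (the share of other members a person is directly linked to), which is far simpler than the modeling the article refers to, and both networks and every name in them are invented for the example.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {person: share of the other members they are directly linked to}.

    Degree centrality is an assumed stand-in for "influence"; real SNA
    models are considerably more elaborate.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    others = len(neighbors) - 1
    if others < 1:
        return {}
    return {person: len(linked) / others for person, linked in neighbors.items()}


# Hypothetical data: the same person ("dana") belongs to two separate networks.
technology_network = [
    ("dana", "ali"), ("dana", "bo"), ("dana", "cy"), ("dana", "eve"), ("ali", "bo"),
]
fashion_network = [
    ("fay", "dana"), ("fay", "gus"), ("fay", "hal"), ("fay", "ivy"), ("gus", "hal"),
]

tech_scores = degree_centrality(technology_network)
fashion_scores = degree_centrality(fashion_network)

# The most connected member differs per network, and so does dana's score.
print("technology hub:", max(tech_scores, key=tech_scores.get), "dana:", tech_scores["dana"])
print("fashion hub:", max(fashion_scores, key=fashion_scores.get), "dana:", fashion_scores["dana"])
```

In this toy data the same person is the hub of one network and a peripheral member of the other, which is why the article suggests tuning the analysis per network rather than computing a single global score.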
    • One of the most valuable uses of SNA is in predicting purchasing behaviors. SNA can show that communities of users tend to change their loyalties in a somewhat predictable manner. For example, in a close community of phone users, it frequently takes only one high-volume phone user to change carriers in order for other friends and family members to follow suit. This is powerful knowledge for a marketing organization: For example, marketers can offer targeted incentives to prevent the exodus before it happens.

      SNA can also help organizations make sense of the way that viral messages or campaigns diffuse throughout a community. While these messages are often passed among direct friends and acquaintances, many bloggers and Twitter users possess sufficient influence to spur a viral event without direct, person-to-person contact.

      By adding social network interactions to predictive analytics, you can extract results that tell about user behavior based on who they are connected with, not just about the individual in isolation.

      Birds of a Feather
      Not all social networks consist of individuals who know one another. That’s a misconception fueled by the popularity of social networking Web sites such as Facebook and LinkedIn, where individuals establish friends or follow one another directly, which should not be confused with SNA. Direct social networks, where the individuals in a given community all know one another, are generally only available to a few companies. These include phone network operators and transportation companies, which have unique access to telephone call data and travel itineraries.

      But social networks don’t have to be direct communities of friends and acquaintances. Plenty of companies are getting high value out of indirect social networks, ones where the connections are measured in interest or similarity rather than familiarity. Some examples:

      Retail. SNA can create communities based on product interests and buying patterns. For example, a community of strangers who buy designer handbags can be mined for other preferences and commonalities. Using predictive analytics, you may be able to figure out what purchases certain customers are likely to make soon.

      Fraud/risk. Individuals or companies that engage in fraud are likely to have social connections to other fraudulent entities. Once you’ve identified one customer as a fraudster, SNA can help you identify other potential frauds through transactions, such as wire transfers or medical claims. Assessing risk of nonpayment or bankruptcy can be done in a similar fashion.

      Law enforcement. Social networks can help investigators locate dangerous individuals through the people they are connected to. In fact, SNA was used by the U.S. government to track, and ultimately locate, Osama bin Laden.

      Powerful Returns
      SNA isn’t just a cool idea riding the coattails of today’s most popular online pastime. Regardless of how you use it, the results can be meaningful. A European telecom company improved the accuracy of its predictive analytics by 50%. An online auction site used SNA to offer users recommendations based on past interests and purchases, and saw a 30% lift in clicks as a result.

      Interconnectedness of people is an old idea that is finally making the technology marriage it needs to reach its business potential. Whether the payoff is higher sales, preventing losses, or keeping customers more engaged and loyal, SNA is a breakthrough technology with broad applicability.

      Bruno Delahaye is responsible for managing and developing strategic partnerships worldwide. Delahaye brings extensive management and technical experience to partner relationships, with over 10 years of experience in providing high return on investment from data mining with partners in sectors like telecommunication, finance and retail.
    • PREDICTIVE MODEL MARKUP LANGUAGE PROVIDES A STANDARD LANGUAGE TO INTEGRATE WITH MANY DATA MINING TOOLS, AND TO AUTOMATE DEPLOYMENT AND EXECUTION OF PREDICTIVE MODELS INSIDE AN ANALYTICS SERVER.

      Embracing a Standard for Predictive Analytics
      By Michael Zeller, Ph.D., CEO, Zementis

      Data (its growing volume, velocity, and variety) is driving the rapid adoption of data mining and predictive analytics across all industries. While collecting and using data to make better decisions and understand customer behavior has historically been complex and expensive, it is becoming more standardized and affordable as the market matures. The Predictive Model Markup Language standard, or PMML, is one of the key reasons.

      PMML is an XML-based language used to define data mining and statistical models. Just as HTML is the standard language for the Web, and SQL is the standard for databases, PMML provides one common framework to address data mining and statistical analysis, making it easy to transfer models between systems and solutions from different vendors. PMML also reduces the time it takes to implement and deploy operational data-mining models. Supported in all major commercial and open-source data-mining tools, PMML also extends to business intelligence, database platforms, and Hadoop.

      Predictive Analytics
      Predictive analytics employs a variety of statistical techniques to analyze current and historical data to predict future events. When faced with large data volumes that may elicit complex structures and dependencies, it allows an organization to embed real-time intelligent decisions into many mission-critical business processes.

      Predictive solutions have historically been very specialized and costly to implement. Used frequently for credit-card fraud detection, for example, predictive models can identify unusual patterns (such as an unusually large amount charged in a foreign city), deny the charge, and recommend a follow-up call to the merchant.

      Until recently, the application of predictive analytics only made sense in scenarios where the potential damages were big enough to justify the investment in related systems and processes. However, as Big Data brings about a huge increase in the amount and use of data, interest, applications, and implementations are growing. Driven by lower cost of data storage and processing, combined with a standards-based solution stack, the total cost of ownership for predictive solutions is rapidly decreasing. As this trend continues, it will open new opportunities for applications that optimize business processes through smarter decisions. We are only at the beginning of this revolution.

      The Importance of Standards
      The Data Mining Group (DMG), an independent, vendor-led consortium, first released the PMML standard in 1999.
    • Companies have been using data and business intelligence applications to make better decisions for well over a decade. So why is PMML suddenly interesting? Why should you embrace it?

      The industry is rapidly transitioning from a stage in which each company developed its own custom solution or was content with a single vendor to a point where a standardized framework is increasingly considered a best practice, and vendor-independent, best-of-breed solutions are required to stay competitive. In recent years there has been a surge in interest in predictive analysis, and as more organizations use it, there is a greater need to adopt the standard.

      PMML provides a common process that results in an immediately deployable predictive model. Through the standard, everyone can speak the same language across the enterprise, external service providers, partners, and vendors. There is no more worrying about custom code or incompatible formats. Documentation of complex statistical models becomes much easier, which is an additional benefit to industries subject to regulatory requirements.

      That is especially important given today’s typical multivendor, cross-platform data center environment. Database administrators need to leverage existing architecture, skill sets, and software and hardware from different vendors, and they need to deploy analytic models across a heterogeneous infrastructure. PMML shines in that regard: alleviating various friction points, not only allowing models to easily move between different IT systems but also facilitating the communication between various project stakeholders. Having the capability to execute predictive models within existing infrastructure helps organizations quickly capitalize on opportunity while minimizing risk and cost. This is where the PMML standard makes a big difference in accelerating time to market and time to value.

      IN-DATABASE PREDICTIVE ANALYTICS: PMML IN ACTION
      Without a common standard, there is often a disconnect between a scientist’s completed predictive analytics model and its intended use case in the business context. The time required to move predictive models from a development environment to operational deployment can result in costly and frustrating delays, forcing some business decisions into limbo until the transition is complete.

      With the PMML standard, predictive models can instantly be deployed directly inside a database platform. No custom code or manual transition is required. PMML enables all major commercial and open-source data-mining tools to export models as a standard XML file, which can be efficiently executed inside the database on large data volumes.

      The advantages of in-database predictive analytics based on the PMML standard include:
        • Direct integration of advanced analytical algorithms for high-performance scoring
        • Minimization of data movement to enable efficient processing of very large data sets
        • Instant execution of predictive models from all major commercial and open-source data-mining tools
        • Lower total cost of ownership from streamlined, vendor-neutral, platform-independent data mining processes
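To make the idea of a model exported "as a standard XML file" more concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not a description of how any database or scoring engine works: the XML is a hand-written, heavily simplified fragment that borrows PMML element names (`RegressionModel`, `RegressionTable`, `NumericPredictor`) but omits required parts such as the `Header`, so it is not schema-valid; the field names (`amount`, `distance_km`, `risk`) and coefficients are invented; and the two helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified PMML-style document (illustrative, not schema-valid).
PMML_TEXT = """
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <DataDictionary numberOfFields="3">
    <DataField name="amount" optype="continuous" dataType="double"/>
    <DataField name="distance_km" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="amount"/>
      <MiningField name="distance_km"/>
      <MiningField name="risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="amount" coefficient="0.25"/>
      <NumericPredictor name="distance_km" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def load_linear_model(pmml_text):
    """Read the intercept and per-field coefficients from the XML (hypothetical helper)."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(model, record):
    """Apply the linear model to one record given as {field name: value}."""
    intercept, coefficients = model
    return intercept + sum(c * record[name] for name, c in coefficients.items())


model = load_linear_model(PMML_TEXT)
print(score(model, {"amount": 4.0, "distance_km": 2.0}))
```

The point of the sketch is the separation of concerns: the tool that trained the model only has to write the XML, and any system that can read it can score records without custom glue code.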
    • Big Data Opportunities
      Data is poised to become an organization’s new competitive advantage. Rather than treating data as an afterthought, it is important to recognize the value it can provide. Create awareness in an organization by first systematically capturing essential data and then consistently analyzing stored data to identify patterns and other knowledge it may contain. Data can tell you how to optimize business processes and make more informed, timely decisions. Predictive analytics uses algorithms that can “learn” and detect complex patterns in data that a human may never see, uncovering hidden value that would have gone undiscovered otherwise.

      With predictive analytics, many day-to-day decisions can be fully automated. Rather than creating more reports that still require the business user to review and decide, a more intelligent system minimizes manual tasks, allowing the executive to focus on the important decisions that truly require human intervention. Smarter decisions make for better customer experiences, because systems can remedy problems before they occur or recommend the right products at the right time. This can be seen as the next logical step in the evolution of business intelligence. Organizations have the data, often more data than they know what to do with, and the good news is that the tools are already available to turn this data into action right inside your existing database infrastructure.

      The closer to “real-time” that a business scores customers for cross-sell or upsell recommendations, the more accurate and valuable such recommendations will be. For example, say a customer goes to a Web site and looks at a DVD player one week. The next week, the same customer returns to buy diapers. Real-time scoring will help that Web site identify the customer, recommend the right products at the right time, and take into account new information that was not part of the customer’s profile before, all while balancing the recommendation with underlying business goals to maximize revenue. This is how predictive analysis leverages data in real-time to identify new trends and deliver better customer experiences.

      More companies are taking advantage of the opportunity to automate processes and become more efficient. The industry is awakening to the tremendous benefits of a common language and process. Once businesses see predictive analysis in action, they immediately recognize its value. Most important, the operational deployment and integration of predictive analytics, which used to be a monumental task, is rapidly becoming easier and more affordable, thanks in part to the PMML standard.

      Michael Zeller, Ph.D., is the CEO and co-founder of Zementis, a software company focused on predictive analytics and advanced enterprise decision-management technology. He has extensive experience in strategic technology implementation, business process improvement, and system integration. Previously he served as CEO of OTW Software and director of engineering for an aerospace firm.
    • RECENT ADVANCES IN ANALYTICS APPLICATIONS DELIVER BETTER PERFORMANCE IN CRUNCHING MASSIVE DATA SETS.

      How Modern Analytics “R” Done
      By Jeff Erhardt, Chief Operations Officer, Revolution Analytics

      For companies across a broad spectrum of industries data represents a new form of capital. Not surprisingly, only a fraction of the valuable data available is ever put to use because most of the tools built to analyze large amounts of data are slow, expensive, and old. Moreover, they were designed for use almost exclusively by specialists with advanced degrees in statistical analysis.

      Legacy analytic tools were designed to run on legacy hardware. Now, newer techniques enable companies to substitute multiple commodity servers for expensive legacy processors. These “commodity cores” are much less expensive to acquire and operate than single “superprocessors.” The newer multiprocessor methodology is faster, more flexible, and more in step with today’s real-world technology architectures than traditional solutions.

      New Technologies for a New Era
      The era of legacy analytic tools is ending, and a new era is beginning. This new era offers solutions that are faster, more cost-effective, more user friendly, and more extensible. These modern analytic technologies can handle very large volumes of data at very high speeds. Processes that used to take days to perform can now be accomplished in minutes.

      While most organizations have invested heavily in first-generation analytics applications, recent advances in second-generation “Big Analytics” platforms have improved both analytical and organizational performance. Big Analytics platforms are optimized for Big Data and utilize today’s massive data sets, thanks to recent performance advancements coupled with innovative analytic techniques. In addition to the analytic routines themselves, data visualization techniques and the way the analytics are executed on various hardware platforms have drastically improved and increased capabilities.
    • Welcome to the World of R
      The newer, faster, and more powerful technologies that make it possible to find needles of insight in haystacks of data are based on an open-source programming language called R.

      With more than 2 million users, R has become the de facto standard platform for statistical analysis in the academic, scientific, and analytic communities. If you are part of the data management team at a large global organization, chances are good that you are already developing programs using R.

      The adoption of R as the lingua franca of analytic statistics is creating a deep pool of fresh talent. Among students, scientists, programmers, and data managers, R is the accepted standard. R represents both the present and the future of statistical analytics.

      Unlike other programming languages used to crunch large data sets, R is not inextricably tied to any single proprietary system or solution. Because the R programming language is an open-source project, it evolves continually through the contributions of a global community.

      The Trend Toward Adopting R
      A “perfect storm” of events is now pushing R beyond its original core audience of students, scientists, and quantitative analysts, and transforming the analytics industry. Two conditions are driving this widespread adoption.

      The first driver is the data deluge, and the consensus that the companies that most effectively gain insight and predictions from their data will have a competitive edge.

      The second driver is the fact that applying predictive models to data is no longer a “secret art.” In universities and colleges worldwide, a new generation of data analysts has been trained in the analytic methods that offer competitive advantage. And the training tool of choice for the majority of those students is the R language.

      Finally, the economic opportunity is unmistakable: The market for data management and analytic technologies currently generates about $100 billion and is growing at a pace of 10% annually. The market leaders in data analysis software today are based on decades-old technology unable to meet current demands for analysis of huge data sets within an easy-to-use user interface.

      Benefits and Challenges
      Open-source software development models offer many benefits, and pose many challenges. The benefits include faster development cycles and lower development costs; the challenges include lack of controls, lack of clear accountability, and lack of support.

      For many businesses, especially those operating in complex or highly regulated markets, open-source software can be impractical or threatening. The commercial potential of R, however, has led to a surge of interest in developing enhanced “enterprise-grade” versions of R software. These newer applications address the key issues that have prevented R from realizing its full potential as a mainstream enterprise technology.

      The two primary obstacles facing many R users today involve capacity and performance. For example, most R software cannot currently handle the kind of enormous
    • data sets that are generated routinely by large multichannel retailers, consumer packaged good marketers, pharmaceutical companies, global finance organizations, and national government agencies.

      The capacity of R-based solutions is limited by the requirement that all the data has to fit in memory in order to be processed. The algorithms simply won’t scale to accommodate Big Data. This capacity limitation then forces analysts to use smaller samples of data, which can lead to inaccurate or suboptimal results.

      The second issue involves the inability of many R applications to read data quickly from files or other sources. Speed is critical in all areas of modern life, and it seems unreasonable to wait weeks or months for a computer to crunch through larger sets of data.

      Although some software packages claim to address these issues, what’s usually missing is an overarching framework for analyzing Big Data easily and efficiently. Typically, analysts find themselves struggling with a collection of software tools that can create more problems than they solve.

      This capacity problem can be overcome by using an external memory framework that enables extremely fast chunking of data from large data sets, which typically include billions of rows and thousands of columns. But even the fastest data processing can take hours if it is performed sequentially. Overcoming this performance obstacle requires the capability to distribute computations automatically among multiple cores and multiple computers through the use of parallel external memory algorithms.

      For example, a computer with four cores can perform analytic calculations very quickly because one core reads the data while the other three cores process the data. Performance can be improved even more dramatically by distributing the work across a network of computers, reducing processing time from hours to minutes or mere seconds.

      R You Ready?
      The R revolution is just beginning. As it spreads, it will become common practice for business leaders to rely on knowledge generated through rigorous numerical analysis of large data sets. Fact-based decision-making will become the norm instead of the exception.

      Jeff Erhardt, Chief Operations Officer at Revolution Analytics, is an executive with extensive and diverse experience at Fortune 400 companies in technology, operations, finance, strategy, and M&A. He began his career at Advanced Micro Devices where he was responsible for the development and commercialization of leading-edge semiconductor devices. Erhardt graduated with a B.S. in Engineering, Cum Laude, from Cornell University, and an M.B.A. with honors from The Wharton School.
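The chunking idea described in this article (process the data block by block, then combine small per-block results) can be sketched briefly. The example is written in Python rather than R, and it is a simplified illustration under assumptions, not the framework the article refers to: an in-memory list stands in for a data set on disk, the function names are hypothetical, and a thread pool is used only to keep the example self-contained (in CPython, threads would not speed up this arithmetic; a real system would use separate processes or machines).

```python
from concurrent.futures import ThreadPoolExecutor


def read_chunks(values, chunk_size):
    """Yield successive chunks; a stand-in for reading blocks from disk."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def summarize(chunk):
    """Per-chunk statistics that can be combined later: count, sum, sum of squares."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def chunked_mean_variance(values, chunk_size=4, workers=3):
    """Compute mean and population variance without needing all rows in one worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize, read_chunks(values, chunk_size)))
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    # Simple textbook formula; numerically fragile for very large values.
    variance = total_sq / n - mean * mean
    return mean, variance


data = [float(x) for x in range(1, 11)]
print(chunked_mean_variance(data))
```

Because each chunk is reduced to three numbers, only those small summaries have to be held and combined, which is what lets the same computation scale past available memory and be spread across workers.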
    • MOBILE PROVIDERS ARE TURNING TO ADVANCED ANALYTICS TO CONTROL COSTS AND ENSURE QUALITY OF SERVICE IN THE FACE OF VORACIOUS CUSTOMER DEMAND.

      Navigating a 4G World
      By Greg Dunn, Vice President, Sybase 365

      In a recent survey of 100 mobile service providers by Heavy Reading, almost half the respondents (46%) expected a fifth of their customers (20%) to be using a 4G LTE device by 2014. Three-quarters of the respondents expected that many of their subscribers to be on 4G LTE by 2016. Clearly, mobile service providers see a strong trend toward 4G in their industry.

      During that time, the 80% of their users with older devices will require providers to keep managing multistandard, multiband, and even multimode networks. This requirement, combined with growth in subscribers and bandwidth use, makes managing 4G LTE networks inordinately complex. That complexity can threaten a network’s quality-of-service levels.

      The shift to 4G LTE networks will be an obvious boon to mobile customers. And it has the potential to create new revenue opportunities for network operators.

      In addition, these highly intricate networks will increase the operational costs while competitive pressures continue to decrease average revenue per user (ARPU) for service providers. With these factors in place operators will be scrambling to employ high levels of automation within their networks to offset the rising operational costs.

      One of the most effective tools for bolstering existing network management systems will be the use of advanced analytics. Used strategically, advanced analytics can give operators the ability to more accurately plan network capacity while streamlining resource optimization, both vital to effectively managing operating expenses.

      In the 4G LTE era, advanced analytics will be critical for service providers for identifying and making informed decisions in near-real-time about subscribers’ usage, and acting to offset anomalies or exploit opportunities. Opportunity and technology are converging to transform the telecommunications industry, and Big Data, with its attendant advances in analytics, is driving the change. Nowhere is the impact of Big Data greater than among telecommunications because nowhere is Big Data bigger than in this industry, a status that will increase by orders of magnitude with the ongoing rollout of 4G LTE.

      Opportunity and Challenge
      To accommodate the needs of smartphone users, wireless carriers are rapidly making the move from a voice-oriented, circuit-switched design to a full-fledged, IP-based data-oriented architecture. The data-consumption habits of these users are pushing the old 3G architectures to the breaking
    • point. One study of 42 countries, encompassing nearly three-quarters of the world’s mobile subscribers, shows that some locales have surpassed 50% smartphone penetration, and another 20 nations already have 30% of mobile subscribers using smartphones.

      The telcos have virtually no choice but to embrace 4G LTE, because its higher performance is essential to sustaining business growth. ARPU is on the decline for 3G voice and short message service (SMS), while revenue from data use by smartphone users is on the rise. This pressures carriers to deploy 4G LTE as fast as possible to remain competitive.

      Optimizing Operations
      Rolling out a 4G LTE system is not only expensive to implement, it is complex to manage. One promising solution is to create multimode networks using much more complex, but much more effective, small cell stations. This type of environment has greater potential for service interruptions than a “pure” architecture. Savvy mobile providers are using intelligent tools to forecast everything from anticipated traffic loads to catastrophic equipment failures. Advanced analytics software, once considered primarily a front-office investment, is becoming an operational necessity. IDC estimates that carriers will get a 277% ROI boost in operational efficiency from applying analytics to operations.

      Telcos are taking advantage of the latest in analytics technology, such as in-database analysis, MapReduce, Hadoop, columnar architectures, and event stream processing (ESP). With these capabilities, sensor and machine data are analyzed in real-time conditions.

      CASE STUDY: BIG SAVINGS WITH BIG DATA
      Facing a flood of Big Data, a major wireless telecommunications carrier in the EMEA region reviewed its options and adopted a purpose-built analytics engine.

      Company policies dictated that the telco retain six months of data capable of being analyzed through hundreds of standardized reports, as well as countless daily ad hoc queries. Its analytics data warehouse would contain more than 600 TB of uncompressed data on average. With the indexes and summaries of the raw data, the total amounted to more than a petabyte. But the analytics engine’s columnar approach to storing information dramatically reduced the size of data to be stored, to just under 105 TB, delivering immediate and dramatic savings.

      On average, 400 users access 95 TB of information daily. At least 10 billion rows of data are processed every day, and 60 data streams pour information into the data center from the carrier’s network infrastructure. Within 30 minutes this binary deluge goes through an extract, transform, and load (ETL) process, making it ready for real-time queries.

      As with any large wireless carrier, innumerable changes to the data occur every second of every day. The analytics solution can immediately isolate and flag problems and inconsistencies in the network to assure, for example, that every call, message, or data transfer is charged appropriately. In this area alone, the company estimates it saves millions of dollars a year.
    • For example, using ESP lets carriers process multiple streams of data in real time, while filtering the data with custom business logic. This offers continuous insight with millisecond latency on hundreds of thousands of events per second. Alerts can be created when conditions warrant, and automated responses can be applied to predefined situations.

      [Figure 1. Higher performance 4G LTE networks are operationally more complex to run than previous generation mobile networks. The diagram groups the network into access/packet backhaul, the mobile packet core, and the data center (service enablers and app servers). Source: IDC]

      Big Data Is No Big Deal
      There has been much handwringing about how Big Data is overwhelming some organizations. But it does not have to be so.

      Even telecommunication carriers, awash in data storms arising from new technologies like 4G LTE or emerging regulations, have little to fear from Big Data deluges. With a purpose-built analytics engine, Big Data is truly no big deal. It is merely the ongoing business environment, albeit a challenging one.

      In fact, the era of Big Data is an opportunity for carriers to exploit competitive advantages. The knowledge within it can be used to develop new revenue-generating services, identify unnecessary costs, improve operations, predict subscriber activity, and more. By taking control of Big Data, telecommunications operators will gain a firmer grip on their market and their future. And that is a very big deal.

      Greg Dunn manages the global product management group leading the efforts for hosted and business solutions focusing on B2C services within the mCommerce, telco, and enterprise related verticals for SAP. He is specifically responsible for driving roadmaps, strategy, and product delivery to position SAP as a leader in the mobility sector.
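The event stream processing pattern described in this article (filter incoming events with business logic, keep a small rolling state, raise an alert when a condition holds) can be sketched in a few lines. The Python below is a toy illustration under assumptions, not a picture of any ESP product: the event shape `(cell_id, load)`, the 0.9 threshold, the three-reading window, and the function name `detect_overloads` are all invented for the example, and it processes one event at a time in a single process rather than at the throughput the article mentions.

```python
from collections import deque


def detect_overloads(events, threshold=0.9, window=3):
    """Yield an alert when a cell's average load over its last `window`
    valid readings exceeds `threshold`. Events are (cell_id, load) pairs."""
    recent = {}
    for cell_id, load in events:
        # Custom business logic: silently drop malformed readings.
        if not 0.0 <= load <= 1.0:
            continue
        readings = recent.setdefault(cell_id, deque(maxlen=window))
        readings.append(load)
        average = sum(readings) / window
        if len(readings) == window and average > threshold:
            yield {"cell": cell_id, "avg_load": average}


# Hypothetical stream of load readings from two cells.
stream = [
    ("cell-A", 0.5), ("cell-B", 0.875), ("cell-A", 0.75), ("cell-B", 1.0),
    ("cell-B", 7.0),  # malformed reading, filtered out
    ("cell-A", 0.5), ("cell-B", 1.0), ("cell-B", 0.25),
]

alerts = list(detect_overloads(stream))
print(alerts)
```

Because the function is a generator, alerts are produced as events arrive rather than after the whole stream has been collected; an automated response could be attached by consuming the alerts in a loop.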
    • EVERY DEPARTMENT AND EMPLOYEE CAN BENEFIT FROM THE INSIGHTS GENERATED BY ANALYZING BIG DATA, AND SOLUTIONS ARE AVAILABLE TO MAKE THOSE INSIGHTS PREVALENT THROUGHOUT THE ENTERPRISE.

      Increasing the IQ of Everyone with Analytics

      By Jürgen Hirsch, CEO, Qyte GmbH

      In our personal lives, we take for granted the ability to query mass data via the Web and receive results on-screen within seconds. Appointments and contacts are stored on laptops and mobile devices and in the cloud. Social contacts and news feeds, as well as media data amounting to hundreds of gigabytes, are available on several devices for permanent use. Engaging with this vast amount of data is a daily routine.

      In contrast, mass-data-querying options are rare for most companies. They lack high-security enterprise solutions that protect customer data and processes against unauthorized access and ensure data integrity. This void, in turn, limits employee potential and capabilities, as employees miss crucial information that could help them in their jobs.

      This challenge is especially notable in the area of data analysis and business intelligence applications. Here, a process has been established in which a user specifies the analysis requirements for the IT department and, after an unspecified waiting period, receives a report, which is often nothing like what was originally requested. The requested data analysis can be unrecognizable and unhelpful because the analytic tools are inflexible and impose strict rules.

    • Gaining Access to Data

      Why do expert users, who know and monitor relevant processes and methods, not get direct access to the data that they know and are in a position to evaluate?

      A number of common reasons and objections are often cited for not developing and supporting easier, more direct access to data.

      1. The performance of the data warehouse systems is at risk of breaking down if too many users run ad hoc queries across corporate data. Existing data warehouse solutions offer only short periods of time for direct queries; professional analyses require the creation of a data mart first.

      2. Nobody actually understands the data structure of Big Data instances, and nobody is able to define correct queries.

      3. Most users do not have the know-how it takes to run such queries on their own.

      While all of these objections are valid, how can these obstacles be overcome so that the business can benefit from its data?

      With regard to the first obstacle, the data warehouse market is still dominated by traditional relational database management systems (DBMSes). These were never designed for fast responses to ad hoc queries on mass data. To fulfill the increasing analytics demands, numerous IT managers opt for one of two paths. In one approach, the DBMSes are upgraded up to the limits of what is feasible, in the hope of increasing performance by adding more hardware. The systems are further normalized in the database design (one table becomes X tables that are interconnected).

      In the second approach, data is aggregated. Instead of a single set, sums are built for product groups and time ranges, which reduces the amount of data for the query but also reduces the amount of information returned.

      As a result, column-oriented database systems are strongly recommended. These systems store immense numbers of single data entries cost-efficiently and offer them for further deployment at very high performance. Data marts or data aggregates are thus made redundant. Access is controlled via user rights and roles, and giving most users read-only access to this data avoids manipulation. Proven and fast replication systems help to keep the data pools up-to-date.

      Regarding the second obstacle, unwieldy data structures and inexact queries, one way to increase performance in the DBMS is to transfer the data into as high a normal form as possible. This method reduces redundant information to a single entry. Yet it also reduces clarity and comprehensibility of table models. Often the person who requested the information will not even recognize the result, as the DB Optimizer has transformed it into 15 separate tables with eight different connector keys. Such procedures are no longer necessary for column-based DBMSes. Columnar databases store redundant information only once within one column and offer all other information via an appropriate index. Thereby, tables and views can be processed in such a way that the end user can still recognize and use the data.

    • As for the third obstacle, workers who lack querying expertise, this can be described best with the saying, “When the only tool you have is a hammer, every problem looks like a nail.” Until now, the use of data for analytic purposes was determined by the limited access and capabilities of the tools. With intelligent, robust tools available, more users can take on these queries and be successful in finding answers.

      Querying Power to the People: A Case Study

      Enterprise-wide access to data is gaining momentum. One early supporter, a large German health insurer, wanted to make data available to its end-users for analytical purposes. This data was stored in an Oracle database, and access was granted via Discoverer. As soon as end-users finished creating scripts with their query tool, their queries vanished into a query batch, and few returned.

      To get access to more analytic features, the health insurer deployed Qyte’s analytics tool RayQ and connected it to the Oracle database. To further improve the query performance, Qyte and the customer jointly implemented SAP Sybase IQ and fed it with the source data. Response times have been reduced to a fraction of the previous length, even though the number of users has grown considerably.

      Further improvements are possible by integrating SAP HANA. Since more and more customer data is generated in SAP systems, a direct replication of the data could be realized in SAP Sybase IQ via PBS/NLS. As more data from an SAP HANA environment is available, RayQ can connect to HANA and SAP Sybase IQ at the same time, use the advantages of both systems simultaneously, and make the data available to end-users.

      It is definitely possible to grant a broad group of expert users direct access to mass data and make the data available for professional analysis. It simply requires the application of suitable technologies and tools, which are available and proven in numerous tests.

      Jürgen Hirsch is CEO of Qyte GmbH. He has followed an entrepreneurial path since the age of 17. In addition to managing the company, he is responsible for strategic sales and partner management. He coordinates projects to open up new topics and application fields for the product RayQ. For more than 10 years, Hirsch has worked in fraud detection using data analytics, focusing on fraud prevention in healthcare and stock trading.
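      The trade-off the article draws between pre-aggregated data and column-oriented storage can be made concrete with a toy sketch. It is an in-memory illustration only and does not reflect how SAP Sybase IQ or any other product stores data; the sample rows, column names, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows: (product_group, month, customer, amount)
rows = [
    ("bikes", "2012-01", "c1", 100.0),
    ("bikes", "2012-01", "c2", 250.0),
    ("bikes", "2012-02", "c1", 80.0),
    ("toys", "2012-01", "c3", 40.0),
]


def aggregate(rows):
    """Pre-aggregate: keep only sums per (product_group, month)."""
    totals = defaultdict(float)
    for group, month, _customer, amount in rows:
        totals[(group, month)] += amount
    return dict(totals)


def to_columns(rows):
    """Column-oriented layout: one list per attribute, all detail retained."""
    group, month, customer, amount = (list(col) for col in zip(*rows))
    return {"group": group, "month": month, "customer": customer, "amount": amount}


def sum_where(columns, key, value):
    """Ad hoc query that reads only the two columns it needs."""
    return sum(a for k, a in zip(columns[key], columns["amount"]) if k == value)


summary = aggregate(rows)
columns = to_columns(rows)

print(summary[("bikes", "2012-01")])          # 350.0, answerable from the aggregate
print(sum_where(columns, "customer", "c1"))   # 180.0, not answerable from the aggregate
```

      The aggregate is smaller but has discarded the customer dimension, so a question nobody anticipated (spend per customer) cannot be answered from it. The column layout keeps every entry and lets a query touch only the attributes it needs, which is the property the article relies on when it says data marts and aggregates become redundant.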
    • Market Data

      The Big Deal with Big Data

      The volume, velocity, and variety of data coursing into organizations today are continually increasing. Organizations must find a way to ride the Big Data wave or risk being pulled under water.

      Today, organizations of all types and sizes are inundated with data from various internal and external sources, from transactional data to unstructured data from social media and other sources. Organizations can struggle to get ahead of, or out from under, the increasing piles of data flooding into their businesses, or they can leverage the data to gain competitive advantage, to fight fraud, to ease regulatory compliance, or to boost operational efficiencies.

      While there are many definitions of Big Data, it’s generally agreed that Big Data comprises enormous data sets and the technologies that are now available to help organizations successfully deal with and use the data deluge. What’s clear is that companies need to come up with a plan to manage, store, and take advantage of the potential benefits of Big Data.

      The good news is that the vast majority of organizations are at least exploring their Big Data options, according to findings from a survey of 154 C-suite executives at multinational companies performed online in the United States in April 2012 by Harris Interactive® on behalf of Bite Communications and its client, SAP. In general, organizations see the opportunities Big Data presents, as opposed to seeing only the challenges of wrestling with huge amounts of data, and most respondents identified an array of competitive and business benefits of successfully managing and using Big Data. Hopefully, this market data will provide useful insight for your upcoming plans for Big Data.
    • Figure MR_O1_Q705. Which definition of big data most closely identifies your company’s definition?

      Big Data Definitions
        Massive growth of transaction data, including data from customers and the supply chain: 28%
        New technologies designed to address the volume, variety, and velocity challenges of Big Data: 24%
        Requirement to store and archive data for regulatory and compliance: 19%
        Explosion of new data sources (social media, mobile device, and machine-generated devices): 18%
        Some other definition: 11%
      Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

      Defining Data: About a quarter of C-suite executives say their company believes Big Data has to do with the growth of transaction data. Another quarter of top-level executives say their organization defines Big Data as the technologies created to address volume, variety, and velocity challenges Big Data presents.

      Figure MR_O2_Q711. Do you view big data as more of a challenge, or more of an opportunity for your company?

      Big Data: Challenge or Opportunity
        More of an opportunity: 76%
        More of a challenge: 24%
      Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

      Challenge vs. Opportunity: A vast majority (76%) of C-suite executives believe that Big Data presents opportunities for their companies, while only a quarter see Big Data as creating challenges.

      Figure MR_O3_Q716. What type of data sets, either social/external or existing, does your company prioritize?

      Data Set Prioritizations (Company)
        Prioritize existing data sets: 73%
        Prioritize social/external data sets: 27%
      Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

      Existing Data Sets: Nearly three-fourths (73%) of C-suite executives said their organizations prioritize existing data sets, while only about one-quarter felt their companies put more importance on social and external data sets.
    • Figure MR_O4_Q716. Q721. What type of data sets, either social/external or existing, do you personally prioritize?

      Data Set Prioritizations (Personal Preference)
        Prioritize existing data sets: 70%
        Prioritize social/external data sets: 30%
      Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

      Focusing on What’s Within: A majority (70%) of C-suite executives personally prioritize existing data sets over social and external data sets, like their companies. Only three in 10 say they personally prioritize social and external data sets.

      Figure MR_O5_Q730. What infrastructure solution does your company use to store and manage its big data?

      Infrastructure Solutions for Storing and Managing Big Data
        Data warehouse (storage equipment, data localization): 33%
        Private cloud or off-premise server farms used only by my company: 27%
        A hybrid of data warehouse and cloud technology: 26%
        Public cloud (Rackspace, Amazon): 11%
        Other: 3%
      Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

      Looking Up to the Clouds: A majority of companies have adopted the cloud in some form. More than one-third (38%) of C-suite executives report that their company solely uses cloud technology to store and manage Big Data: 27% say their firm uses private cloud or off-premise server farms, and 11% use a public cloud.
    • Figure MR_O6_Q736. What percentage of the model you deploy for big data is comprised of data warehouse vs. cloud?

      Deployed Big Data Models
        Data warehouse: 53%
        Private cloud: 24.4%
        Public cloud: 22.6%
      Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

      Big Data Storage: Among C-level executives who say their company uses a hybrid of data warehouse and cloud technology to store Big Data, data warehousing comprises 53% of all hybrid solutions, on average.

      Figure MR_O7_Q740. Which areas of your business, if any, do you see receiving significant growth/benefits from the utilization of big data? Please select all that apply.

      Big Data Benefits and Growth Areas
        Information Technology (IT)/MIS: 58%
        Sales: 57%
        Marketing: 54%
        Customer service: 54%
        Production/Operations: 46%
        Research and development: 43%
        Finance/Billing: 37%
        Distribution/Warehousing/Shipping and receiving: 37%
        Administration: 33%
        Human Resources: 31%
        Advertising/Public relations: 24%
        Facilities: 17%
        Other: 3%
        None: 6%
      Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

      Big Data, Big Benefits: Nearly all (94%) of C-level executives were able to identify business areas that would benefit from Big Data. More than half of those surveyed said that information technology (IT)/MIS, sales, marketing, and customer service were areas that would benefit from utilizing Big Data.
    • Figure MR_O8_Q745. Which of the following competitive advantages, if any, do you anticipate your company would gain by utilizing big data? Please select all that apply.

      Competitive Advantages of Big Data
        Improving efficiency in business operations: 59%
        Increased sales: 54%
        Lowering IT costs: 50%
        Increasing business agility: 48%
        Attracting and retaining customers: 46%
        Ensuring compliance: 41%
        Increased savings and cutting of spending: 39%
        Increased brand exposure: 36%
        Lowering risk: 34%
        Introducing new products/services: 32%
        Developing new channels to market: 29%
        Outsourcing of non-core functions: 27%
        Mergers, acquisitions, and divestitures: 21%
        Expanding partner ecosystem: 20%
        Other: 1%
        None: 7%
      Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

      Gaining a Competitive Edge: The vast majority (93%) of C-suite executives surveyed were able to identify areas in which their company could potentially gain competitive advantage by using Big Data. The top five anticipated areas include improving business operations efficiencies, boosting sales, lowering IT costs, increasing business agility, and attracting and retaining customers.

      Figure MR_O9_Q750. When would you expect your company to see a return on big data investments?

      Big Data ROI Timeline
        Within a year (Net): 70%
          Within three months: 17%
          Within three to six months: 12%
          Within six months to a year: 41%
        More than one year: 19%
        Not sure: 11%
      Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

      Rapid Returns: About seven in 10 C-level executives surveyed anticipate their organization will see a return on their Big Data investments within one year.
    • Figure MR_10_Q755. What is the total amount your company has spent/plans to spend on investing in a solution to store and manage its big data?

      Big Data Investments
        Less than $10K: 9%
        $10K to $24.9K: 14%
        $25K to $49.9K: 12%
        $50K to $99.9K: 4%
        $100K to $249.9K: 7%
        $250K to $499.9K: 10%
        $500K to $749.9K: 11%
        $750K to $999.9K: 6%
        $1,000K to $2,499.9K: 5%
        $2,500K to $4,999.9K: 8%
        $5,000K or more: 8%
        Not sure: 6%
      Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

      Investing Big: More than half of C-level executives (53%) say their company has spent or plans to spend $100,000 or more on solutions to store and manage Big Data. Nearly a quarter (23%) plan to spend or have spent $1 million or more on Big Data management and storage.

      Figure MR_11_Q760. What is the amount your company typically spends on monthly maintenance costs for storing and managing its big data?

      Maintenance Costs for Storing and Managing Big Data
        Less than $5,000: 28%
        $5,000-$9,999: 7%
        $10,000-$24,999: 11%
        $25,000-$49,999: 5%
        $50,000 or more: 12%
        Not sure of monthly cost: 37%
      Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

      Monthly Maintenance: The median amount that C-suite executives say their company spends to store and manage Big Data is $5,000 a month. More than one-fourth (28%) of survey respondents report that their firm spends $10,000 or more each month on Big Data management and storage.
    • Figure MR_12_Q765. What is the average size of data that your company manages in a typical big data project? If you are not sure, please give your best estimate.

      Average Size of Data in Big Data Projects
        500TB or less (Net): 80%
          1TB or smaller: 20%
          10TB: 24%
          100TB: 20%
          500TB: 16%
        100TB or larger (Sub-sub-sub-net): 20%
          1000TB: 10%
          1PB or larger: 10%
      Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

      Great Big Data Projects: Collectively, more than half (56%) of C-suite executives report that the average size of their company’s Big Data projects is 100 TB or larger.

      Figure MR_13_Q770. How important is having instant access to data in mobile BI/real-time analytics?

      Importance of Instant Access to Data in Mobile Business Intelligence and Real-Time Analytics
        At least somewhat important (Net): 90%
          Absolutely essential: 16%
          Very important: 41%
          Somewhat important: 33%
        Not at all important: 10%
      Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

      Quick and Easy Access: Nine in 10 C-level executives surveyed consider having instant access to data in mobile business intelligence and real-time analytics at least somewhat important.

      Database Approaches and Methodologies Needing Support

        Approach                                     All respondents   Senior IT   Mid-level IT
        Agile business intelligence methodologies    36%               39%         32%
        Row-based data structure                     35%               35%         34%
        Columnar data structure                      28%               22%         34%
        Massively parallel processing grid           16%               15%         16%
        MapReduce/Hadoop data structure              14%               15%         13%
        Kimball method                               7%                9%          5%
        Inmon data warehouse principles              6%                7%          4%
        None of the above                            29%               24%         33%

      Companies managing 500-plus TB of data are more likely to have plans to support Agile BI as well as MapReduce/Hadoop data structure over the next year.
      Source: IDG Research Services

      Database Needs: Both senior and mid-level IT management report a need to support agile business intelligence and row-based data structure approaches over the next 12 months.
    • Figure MR_14. Data Management Challenges

      Data Management Challenges and Limitations

        Challenge                                                       All respondents   Senior IT   Mid-level IT
        Cost and budget constraints                                     54%               55%         53%
        Increasing data volumes                                         45%               39%         50%
        Integrating and managing siloed data and applications           33%               33%         32%
        Inadequate staffing for database management and maintenance     29%               25%         32%
        Scalability                                                     29%               26%         31%
        Data redundancy                                                 28%               25%         31%
        Too many tools/interfaces                                       28%               28%         27%
        Data quality                                                    26%               30%         22%
        Complex to use or administer                                    26%               26%         26%
        Slow querying/reporting speed                                   25%               18%         31%
        Difficult to maintain                                           23%               24%         22%
        Data latency                                                    17%               16%         18%
        Inability to support diverse data sources                       15%               20%         9%
        Inability to handle complex queries                             8%                8%          8%
        Current database solution can’t load data fast enough           7%                4%          10%
        Inability to support enough concurrent users                    7%                8%          5%
        Other                                                           4%                4%          3%
        None of the above                                               7%                5%          8%

      Senior IT respondents are more likely than mid-level IT to cite an inability to support diverse data sources as a challenge. Additionally, data redundancy is more likely to be identified as a challenge at companies managing 500-plus TB of data (47% vs. 25% among those managing less than 500 TB of data).
      Source: IDG Research Services

      Data Management Headaches: Cost and budget constraints, as well as increasing data volumes, are the top limitations companies are experiencing with respect to data management.
    • Company Index

      Seth Grimes founded Alta Plana Corporation in 1997 to deliver business analytics strategy consulting and implementation services with a focus on advanced analytics (business intelligence, text mining, data visualization, analytical databases, complex event processing), as well as management, analysis, and dissemination of governmental statistics. Via Alta Plana, Grimes consults, presents, writes, and trains, bridging the concerns of end-users and solution providers. The company delivers fresh, insightful, and actionable perspectives on critical challenges that face enterprises in today’s rapidly evolving information technology market.
      Visit altaplana.com

      BI Leader Consulting provides advisory services to user and vendor organizations in the areas of data warehousing, business intelligence, performance management, and business analytics. Wayne Eckerson, the company principal, is a veteran thought leader in the business intelligence field as well as a noted speaker, blogger, consultant, and author of several books and many in-depth reports. He also founded BI Leadership Forum, which promotes best practices and knowledge sharing among business intelligence directors worldwide.
      Visit bileader.com

      Dr. Brian Bandey is acknowledged as one of the leading experts on computer law and the international application of intellectual property law to computer and Internet programming technologies. His experience in the global computer law environment spans more than three decades. He is the author of a definitive legal practitioners textbook, and his commentaries on contemporary IT legal issues are regularly published throughout the world.
      Visit drbandey.com

      Cloudera, a leader in Apache Hadoop-based software and services, enables data driven enterprises to easily derive business value from all their structured and unstructured data. Cloudera’s Distribution including Apache Hadoop (CDH) is a comprehensive, tested, stable, and widely deployed distribution of Hadoop in commercial and non-commercial environments. For the fastest path to reliably using this open source technology in production for Big Data analytics and answering previously unanswerable, big questions, organizations can subscribe to Cloudera Enterprise, comprised of Cloudera Manager Software and Cloudera Support. Cloudera also offers training and certification on Apache technologies as well as consulting services. As a top contributor to the Apache open source community and with tens of thousands of nodes under management across customers in financial services, government, telecommunications, media, Web, advertising, retail, energy, bioinformatics, pharmaceuticals/healthcare, university research, oil and gas, and gaming, Cloudera has a depth of experience and commitment to sharing expertise.
      Visit cloudera.com
    • Fuzzy Logix, an analytics software and professional services company, provides a new generation of in-database analytic solutions that help companies make smarter decisions and improve effectiveness and performance. Clients can embed analytics directly in their business processes, enterprise applications, mobile devices, and Web services using in-database analytics that run inside the data warehouse.
      Visit www.fuzzyl.com

      KXEN is helping companies use predictive analytics to make better decisions. Based on patented innovations, the company’s InfiniteInsight™ delivers orders with speed and agility to optimize every step in the customer lifecycle, including acquisition, cross-sell, up-sell, retention, and next best activity. Proven with more than 400 deployments at companies such as Bank of America, Barclays, Wells Fargo, Lowe’s, Meredith Corporation, Rogers, and Vodafone, KXEN solutions deliver predictive power and infinite insight™. KXEN is headquartered in San Francisco, Calif., with field offices in the United States, Paris, and London.
      Visit kxen.com

      Qyte GmbH was founded in 1999 as a subsidiary of Hirsch & Sachs GmbH. While Hirsch & Sachs GmbH provided high quality IT services, Qyte GmbH worked on the programming of the data mining software RayQ. The strategic aim was to transform from a pure IT service company into a software house with consulting competencies in all questions regarding data. In May 2004 the transformation was successfully concluded and the operative business was restructured. Since then, the activities of Qyte GmbH have focused on the continuous development and distribution of the leading-edge data mining and business intelligence solution RayQ, as well as on the provisioning of services and consulting with regard to all aspects of its clients’ data. The company has established a network of reliable and highly competent partners, which help market the RayQ solution and enable Qyte GmbH to rely on a big competence pool when staffing project teams.
      Visit qyte.com

      Revolution Analytics is a leading commercial provider of software and services based on the open source R project for statistical computing. The company brings high performance, productivity, and enterprise readiness to R, the most powerful statistics language in the world. The company’s flagship Revolution R Enterprise product is designed to meet the production needs of large organizations in industries such as finance, life sciences, retail, manufacturing, and media. Used by more than 2 million analysts in academia and at cutting-edge companies such as Google, Bank of America, and Acxiom, R has emerged as the standard of innovation in statistical analysis. Revolution Analytics is committed to fostering the continued growth of the R community through sponsorship of the Inside-R.org community site, funding worldwide R user groups, and offering free licenses of Revolution R Enterprise to everyone in academia.
      Visit revolutionanalytics.com
    • TDWI Research provides research and advice for business intelligence and data warehousing professionals worldwide. TDWI Research focuses exclusively on business intelligence and data warehouse issues and teams up with industry thought leaders and practitioners to deliver both broad and deep understanding of the business and technical challenges surrounding the deployment and use of business intelligence and data warehousing solutions. TDWI Research offers in-depth research reports, commentary, inquiry services, and topical conferences as well as strategic planning services to user and vendor organizations.
      Visit TDWI.org

      Zementis is a software company focused on predictive analytics and advanced enterprise decision management technology. It combines science and software to create superior business and industrial solutions for clients. Scientific expertise includes statistical algorithms, machine learning, neural networks, and intelligent systems. Zementis scientists have a proven record in producing effective predictive models to extract hidden patterns from a variety of data types. This experience is complemented by the product offering ADAPA®, a decision engine framework for real-time execution of predictive models and business rules.
      Visit zementis.com

      Acknowledgments

      Editorial Team
      Editors: Lori Cleary, Becca Freed, Elke Peterson, BaySide Media
      Executive Producer: Don Marzetta, SAP
      Co-Producer: David Jonker, SAP
      Graphic Designer: Margaret Anderson, BaySide Media

      Developed and produced with help from BaySide Media, 201 4th St., Ste 305, Oakland, CA 94607
      BaySideMedia.com
    • www.sap.com/contactsap

      Material #2012/07(12/08) ©2012 SAP AG. All rights reserved.

      SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP BusinessObjects Explorer, StreamWork, SAP HANA, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries.

      Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects Software Ltd. Business Objects is an SAP company.

      Sybase and Adaptive Server, iAnywhere, Sybase 365, SQL Anywhere, and other Sybase products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Sybase Inc. Sybase is an SAP company.

      Crossgate, m@gic EDDY, B2B 360°, and B2B 360° Services are registered trademarks of Crossgate AG in Germany and other countries. Crossgate is an SAP company.

      All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary.

      These materials are subject to change without notice. These materials are provided by SAP AG and its affiliated companies (“SAP Group”) for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.