• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
What is big data - Architectures and Practical Use Cases
 

What is big data - Architectures and Practical Use Cases

on

  • 1,354 views

Five ways to get started with big data using IBM's big data platform.

Five ways to get started with big data using IBM's big data platform.

Statistics

Views

Total Views
1,354
Views on SlideShare
1,354
Embed Views
0

Actions

Likes
2
Downloads
121
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    What is big data - Architectures and Practical Use Cases What is big data - Architectures and Practical Use Cases Presentation Transcript

    • What is big data?Architectures and Practical use casesTony Pearson – IBM Master Inventor and Senior Managing ConsultantMarch 2013 © 2013 IBM Corporation
    • “Data is the new Oil.”In its raw form, oil has little value. Once processed & refined, it helps power the world. “Big Data has arrived at Seton “At the World Economic “Increasingly, businesses are Health Care Family, fortunately Forum last month in Davos, applying analytics to social accompanied by an Switzerland, Big Data was a media such as Facebook and analytics tool that will help marquee topic. A report by the Twitter, as well as to product deal with the complexity of forum, “big data, big impact,” review websites, to try to more than two million declared data a new class of “understand where customers are, patient contacts a year…” economic asset, like what makes them tick and what currency or gold. they want”, says Deepak Advani, who heads IBM’s predictive analytics group.” “Companies are being inundated with data—from information on customer-buying “…now Watson is being put to habits to supply-chain efficiency. work digesting millions of But many managers struggle to The Oscar Senti-meter — a tool pages of research, make sense of the numbers.” developed by the L.A. Times, IBM incorporating the best clinical and the USC Annenberg practices and monitoring the Innovation Lab — analyzes outcomes to assist physicians in opinions about the Academy treating cancer patients.” Awards race shared in millions of public messages on Twitter.” “Data is the new oil.” Clive Humby2 © 2013 IBM Corporation
    • Big data is the analysis of information to identify trends, patterns andinsights to make better business decisions Cost efficiently Responding to the Collectively processing the increasing Velocity Analyzing the growing Volume broadening Variety 30 50x 35 Billion ZB RFID 80% of the sensors and worlds data is counting unstructured 2010 2020 Establishing the 1 in 3 business leaders don’t trust Veracity of big the information they use to make data sources decisions3 © 2013 IBM Corporation
    • Where is all this data coming from?4 © 2013 IBM Corporation
    • Why is big data such a hot topic? Because technology makes itpossible to analyze ALL available data Cost effectively manage and analyze all available data, in its native form – unstructured, structured, streaming Website Social Media Billing5 ERP Network Switches © 2013 IBM Corporation CRM RFID
    • Imagine the Possibilities of Harnessing your Data Resources Big data challenges exist in every business today Government cuts acoustic Utility avoids power Hospital analyzes streaming analysis from hours to 70 failures by analyzing 10 vitals to intervene 24 Milliseconds PB of data in minutes hours earlier Retailer reduces time to Stock Exchange cuts Telco analyses streaming run queries by 80% to queries from 26 hours to network data to reduce optimize inventory 2 minutes on 2 PB hardware costs by 90%6 © 2013 IBM Corporation
    • Impact every aspect of your business Know Everything about your Customer Analyze all sources of data to know your customers as individuals, from channel interactions to social media. Run Zero-latency Operations Analyze all available operational data and react in real-time to optimize processes. Reduce the cost of IT with new cost-effective technologies. Innovate New Products at Speed and Scale Capture all sources of feedback and analyze vast amounts of market and research data to drive innovation. Instant Awareness of Fraud and Risk Develop better fraud/risk models by analyzing all available data, and detect fraud in real-time with streaming transaction analysis. Exploit Instrumented Assets Monitor assets from real-time data feeds to predict and prevent maintenance issues and develop new products & services. 77 © 2013 IBM Corporation
    • In order to realize new opportunities, you need to think beyondtraditional sources of data Machine Data Transactional & Social Data Enterprise Application Data Content • Velocity • Volume • Variety • Variety • Semi-structured • Structured • Highly unstructured • Highly unstructured • Ingestion • Throughput • Veracity • Volume8 © 2013 IBM Corporation
    • Leveraging big data requires multiple platform capabilities Understand and navigate Federated Discovery and federated big data sources Navigation Manage & store huge Hadoop File System volume of any data MapReduce Structure and Data Warehousing control data Manage Stream Computing Streaming Data Analyze Unstructured Data Text Analytics Engine Integrate and govern Integration, Data Quality, Security, all data sources Lifecycle Management, MDM9 © 2013 IBM Corporation
    • Business-centric big data enables you to start with most critical businesspain and expand the foundation for future requirements Not just new technology— big data is a business strategy for capitalizing on information resources Getting started is crucial IBM Success at each entry Pur eData S big data point is accelerated by e Opti m platform products within the big data platform her y st sp ms e fo In Build the foundation for future requirements by expanding further into the big data platform10 © 2013 IBM Corporation
    • 1 – Federated Discovery and Navigation • Customer Need – Understand existing data sources – Expose the data within existing content management and file systems for new uses, without copying the data to a central location – Search and navigate big data from federated sources • Value Statement – Get up and running quickly and discover and retrieve relevant big data – Use big data sources in new information-centric applications • Customer examples – Proctor and Gamble – Connect employees with a 360° view of big data sources • Get started with: IBM Vivisimo Velocity11 © 2013 IBM Corporation
    • 2 – Analyze raw data• Customer Need – Ingest data as-is into Hadoop and derive insight from it – Process large volumes of diverse data within Hadoop – Combine insights with the data warehouse – Low-cost ad-hoc analysis with Hadoop to test new hypothesis• Value Statement – Gain new insights from a variety and combination of data sources – Overcome the prohibitively high cost of converting unstructured data sources to a structured format – Extend the value of the data warehouse by bringing in new types of data and driving new types of analysis – Experiment with analysis of different data combinations to modify the analytic models in the data warehouse• Customer examples – Financial Services Regulatory Org – managed additional data types and integrated with their existing data warehouse• Get started with: InfoSphere BigInsights 12 © 2013 IBM Corporation
    • 3 – Analyze streaming data • Customer Need – Harness and process streaming data sources – Select valuable data and insights to be stored for further processing – Quickly process and analyze perishable data, and take timely action Streaming Data Streams Computing Sources • Value Statement – Significantly reduced processing time and cost – process and then store what’s ACTION valuable – React in real-time to capture opportunities before they expire • Customer examples – Ufone – Telco Call Detail Record (CDR) analytics for customer churn prevention • Get started with: InfoSphere Streams13 © 2013 IBM Corporation
    • 4 – Simplify your data warehouse • Customer Need – Business users are hampered by the poor performance of analytics of a general-purpose enterprise warehouse – queries take hours to run – Enterprise data warehouse is encumbered by too much data for too many purposes – Need to ingest huge volumes of structured data and run multiple concurrent deep analytic queries against it – IT needs to reduce the cost of maintaining the data warehouse • Value Statement – Speed and Simplicity for deep analytics – 100s to 1000s users/second for operation analytics • Customer examples – Catalina Marketing – executing 10x the amount of predictive workloads with the same staff • Get started with: IBM PureData Systems14 © 2013 IBM Corporation
    • 5 – Reduce costs with Archive and Storage Tiering • Customer Need – Reduce the overall cost to maintain data in the warehouse – often its seldom used and kept ‘just in case’ – Lower costs as data grows within the data warehouse – Reduce expensive infrastructure used for processing and transformations • Value Statement – Support existing and new workloads on the most cost effective alternative, while preserving existing access and queries – Lower storage costs – Reduce processing costs by pushing processing onto commodity hardware and parallel processing • Customer examples – Financial Services Firm – archive production data to archive files on less expensive disk and tape storage, while maintaining access to data • Get started with: IBM Infosphere Optim, Infosphere Information Server, Master Data Management (MDM)15 © 2013 IBM Corporation
    • Entry points are accelerated by products within the big data platform Analytic Applications1 – FederatedDiscovery and BI / Exploration / Functional Industry Predictive Content BI / Reporting Visualization App App Analytics AnalyticsNavigation ReportingIBM Vivisimo IBM big data platform 4 – Simplify your data warehouse Visualization Application Systems & Discovery Development Management2 – Analyze raw data IBM Warehousewith Hadoop Solutions and PureData systemsInfoSphere AcceleratorsBigInsights Hadoop Stream Data System Computing Warehouse 5 – Reduce costs through archive and storage tiering3 – Analyze streamingdata InfoSphere Optim Information ServerInfoSphere Streams Information Integration & Governance MDM16 © 2013 IBM Corporation
    • There are many use cases for a big data platform Know Everything About your Customers Innovate new Products Speed • Social media customer sentiment analysis and Scale • Promotion optimization • Social Media - Product/brand Sentiment • Segmentation analysis • Customer profitability • Brand strategy • Click-stream analysis • Market analysis • CDR processing • RFID tracking & analysis • Multi-channel interaction analysis • Transaction analysis to create insight- • Loyalty program analytics based product/service offerings • Churn predictionRun Zero Latency Operations Instant Awareness of Risk• Smart Grid/meter management and Fraud• Distribution load forecasting • Multimodal surveillance• Sales reporting • Cyber security• Inventory & merchandising optimization • Fraud modeling & detection• Options trading • Risk modeling & management• ICU patient monitoring • Regulatory reporting• Disease surveillance• Transportation network optimization• Store performance• Environmental analysis Exploit Instrumented Assets• Experimental research • Network analytics • Asset management and predictive issue resolution • Website analytics • IT log analysis17 © 2013 IBM Corporation
    • Achieve Breakthrough Outcomes with big data capabilities Analyze any With Unique To Achieve Breakthrough big data Type Capabilities Outcomes Visualization Know Everything & Discovery about your Customers Machine Data Hadoop Run Zero-latency Operations Transactional / Data Application Data warehousing Innovate new products at Speed Stream and Scale Social Media Computing Data Instant Awareness Text Analytics of Fraud and Risk Enterprise Exploit Content Integration & Instrumented governance Assets18 © 2013 IBM Corporation
    • Example Configuration for big data IBM PureData System for Analytics19 © 2013 IBM Corporation
    • The IBM big data platform advantage Analytic Applications • The platform provides benefit as you BI / Exploration / Functional Industry Predictive Content move from an entry point to a second BI / Reporting Visualization App App Analytics Analytics Reporting and third project IBM big data platform • Shared components and integration Visualization Application Systems between systems lowers deployment & Discovery Development Management costs Accelerators • Key points of leverage • Reuse text analytics across streams Hadoop Stream Data and Hadoop System Computing Warehouse • HDFS connectors between Streams and Information Integration • Common integration, metadata and governance across all engines • Accelerators built across multiple Information Integration & Governance engines – common analytics, models, and visualization20 © 2013 IBM Corporation
    • 21 © 2013 IBM Corporation
    • IBM Redbooks Available! IBM Redbooks available on deployment of big data using IBM System x and InfoSphere software Reference Architecture for different sizes22 © 2013 IBM Corporation
    • Tony Pearson 9000 S. Rita Road Bldg 9032 Room 1238About the Speaker Master Inventor, Tucson, AZ 85744 Senior Managing Consultant Mr. Tony Pearson +1 520-799-4309 (Office) Master Inventor, IBM System Storage™ tpearson@us.ibm.com Senior Managing Consultant IBM System Storage Tony Pearson is a Master Inventor and Senior managing consultant for the IBM System Storage™ product line. Tony joined IBM Corporation in 1986 in Tucson, Arizona, USA, and has lived there ever since. In his current role, Tony presents briefings on storage topics covering the entire System Storage product line, Tivoli storage software products, and topics related to Cloud Computing. He interacts with clients, speaks at conferences and events, and leads client workshops to help clients with strategic planning for IBM’s integrated set of storage management software, hardware, and virtualization products. Tony writes the “Inside System Storage” blog, which is read by hundreds of clients, IBM sales reps and IBM Business Partners every week. This blog was rated one of the top 10 blogs for the IT storage industry by “Networking World” magazine, and #1 most read IBM blog on IBM’s developerWorks. The blog has been published in series of books, Inside System Storage: Volume I through V. Over the past years, Tony has worked in development, marketing and customer care positions for various storage hardware and software products. Tony has a Bachelor of Science degree in Software Engineering, and a Master of Science degree in Electrical Engineering, both from the University of Arizona. Tony holds 19 IBM patents for inventions on storage hardware and software products.23 © 2013 IBM Corporation
    • Additional Resources Email: tpearson@us.ibm.com Twitter: http://twitter.com/az99Øtony Blog: http://ibm.co/brAeZØ Books: http://www.lulu.com/spotlight/99Ø_tony IBM Expert Network: http://www.slideshare.net/az99Øtony 242424 © 2013 IBM Corporation
    • Trademarks and disclaimersAdobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. ITInfrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Intel, Intel logo, IntelInside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or itssubsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, andthe Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Officeof Government Commerce, and is registered in the U.S. Patent and Trademark Office. UNIX is a registered trademark of The Open Group in the United States and other countries. Javaand all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Cell Broadband Engine is a trademark of Sony Computer Entertainment,Inc. in the United States, other countries, or both and is used under license therefrom. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBMCorp. and Quantum in the U.S. and other countries.Other product and service names might be trademarks of IBM or other companies. Information is provided "AS IS" without warranty of any kind.The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs andperformance characteristics may vary by customer.Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute anendorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements andvendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questionson the capability of non-IBM products should be addressed to the supplier of those products.All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or deliveryschedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBMs currentinvestment and development activities as a good faith effort to help with our customers future planning.Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experiencewill vary depending upon considerations such as the amount of multiprogramming in the users job stream, the I/O configuration, the storage configuration, and the workload processed.Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBMrepresentative or Business Partner for the most current pricing in your geography.Photographs shown may be engineering prototypes. Changes may be incorporated in production models.© IBM Corporation 2013. All rights reserved.References in this document to IBM products or services do not imply that IBM intends to make them available in every country.Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on theWorld Wide Web at http://www.ibm.com/legal/copytrade.shtml. ZSP03490-USEN-0025 © 2013 IBM Corporation