
IBM Big Data Analytics Concepts and Use Cases


Presented at IBM Systems Technical University 2015 Orlando

Published in: Data & Analytics


  1. © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM.
     What Is Big Data? Architectures and Practical Use Cases
     Tony Pearson, Master Inventor and Senior IT Specialist, IBM Corporation
  2. IBM Systems Technical University, October 5-9 | Hilton Orlando
     Abstract: Do you understand the storage implications of big data analytics? This session explains what big data is, provides some practical use cases, and then explains the IBM products that support big data.
  3. This week with Tony Pearson
     Monday
     – 10:15am Opening Session – Storage
     – 01:45pm IBM's Cloud Storage Options
     Tuesday
     – 11:30am Software Defined Storage -- Why? What? How? (repeats Friday)
     – 03:15pm The Pendulum Swings Back – Understanding Converged and Hyperconverged Environments
     – 04:30pm New Generation of Storage Tiering: Less Management, Lower Cost and Increased Performance
     Wednesday
     – 09:00am What Is Big Data? Architectures and Practical Use Cases
     – 01:45pm Data Footprint Reduction – Understanding IBM Storage Efficiency Options
     – 03:15pm IBM Spectrum Virtualize – SVC, Storwize and FlashSystem V9000 (repeats Friday)
     Thursday
     – 10:15am IBM Spectrum Scale and Elastic Storage Offerings
     – 01:45pm IBM Spectrum Scale for File and Object storage
     – 03:15pm IBM Storage Integration with OpenStack
     – 05:45pm Meet the Experts
     Friday
     – 09:00am Software Defined Storage -- Why? What? How?
     – 10:15am IBM Spectrum Virtualize – SVC, Storwize and FlashSystem V9000
  4. Agenda: What is Big Data?; Big Data Use Cases; IBM Analytics Platform; IBM Spectrum Scale
  5. What is Big Data?
     – Data sets so large and complex that they become difficult to process using relational databases
     – The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization
     – Analyzing a single large set of related data allows correlations to be found
     – These correlations can be used to identify trends, patterns and insights to make better decisions
     Source: Wikipedia
  6. Traditional Decision Making Process
     – Step 1, gather data: Database Administrators maintain the System of Record (transaction and application data), which feeds Extract Transform Load (ETL) batch processing
     – Step 2, analyze: Business Analysts produce reports and OLAP cubes
     – Step 3, make decisions: Business Executives base day-to-day operations on reports, news and intuition, and strategic planning on historical analysis and speculation
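The gather (ETL) and analyze (OLAP roll-up) steps of this traditional process can be sketched with the Python standard library. This is a minimal illustration, not how any IBM product works: the column names, the sample rows and the dictionary-based "cube" are hypothetical stand-ins for a real database and OLAP engine.

```python
import csv
import io
from collections import defaultdict

# Hypothetical transaction extract from a system of record.
RAW = """region,quarter,product,amount
East,Q1,widgets,100.50
East,Q1,gadgets,40.00
West,Q1,widgets,75.25
East,Q2,widgets,60.00
"""

def extract(text):
    """Extract: read raw rows from a CSV export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize region names and convert amounts to numbers."""
    return [
        {**row, "region": row["region"].strip().upper(), "amount": float(row["amount"])}
        for row in rows
    ]

def load(rows):
    """Load: aggregate into a tiny cube keyed by (region, quarter, product)."""
    cube = defaultdict(float)
    for row in rows:
        cube[(row["region"], row["quarter"], row["product"])] += row["amount"]
    return cube

def roll_up(cube, dims):
    """Roll up: sum over every dimension not listed in dims (0=region, 1=quarter, 2=product)."""
    totals = defaultdict(float)
    for key, value in cube.items():
        totals[tuple(key[d] for d in dims)] += value
    return dict(totals)

cube = load(transform(extract(RAW)))
print(roll_up(cube, dims=(0,)))  # totals per region
```

Rolling up over `(1,)` instead gives totals per quarter; an OLAP engine precomputes aggregates like these so analysts can slice and drill down without rescanning every transaction.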
  7. What has Changed in the Last Few Decades?
     – Analog versus digital data: the digital share grew from 6% in 1986 to 99% in 2015
     – Data sources: transaction and application data; machine data; social media and email; enterprise content
     – 20% structured data, 80% unstructured data
  8. New Sources of Data to Analyze – the Four V's of big data
     – Volume: the scale of data has grown beyond relational database capabilities
     – Variety: machine data, enterprise content, social media and email
     – Velocity: computing has advanced to receive and analyze real-time data streams
     – Veracity: how far can you trust that the data is right and accurate?
     Step 1, gather and identify sources of data: alongside the transaction and application data that Database Administrators manage in the System of Record, Storage Administrators now manage machine and log data; social media, photos, audio, video and email; and enterprise content. Systems shown: System of Record, System of Engagement, System of Insight.
  9. Data is the New Oil: in its raw form, oil has little value… once processed and refined, it helps to power the world!
  10. New Capabilities to Analyze the Data (step 2, analyze data)
     – Business Analyst: structured, repeatable, linear analysis using reports, OLAP cubes and data warehousing
     – Data Scientist: unstructured, exploratory, iterative analysis using Hadoop, stream computing, text analytics, and visualization and discovery
     – Integration and Governance spans both
  11. What does a Data Scientist do?
     "It's no longer hard to find the answer to a given question; the hard part is finding the right question. And as questions evolve, we gain better insight into our ecosystem and our business." -- Kevin Weil, Lead Analyst at Twitter
     A data scientist must have…
     – Strong business acumen
     – Modeling, statistics, analytics and math skills
     – The ability to communicate findings, telling a story from the data, to both business and IT leaders
     – An inquisitive mind: exploring, doing "what if?" analyses, and questioning existing assumptions and processes to spot trends, patterns and hidden insight
     "Computers are useless. They can only give you answers." – Pablo Picasso
     Sources: http://www-01.ibm.com/software/data/infosphere/data-scientist/ http://blog.cloudera.com/blog/2010/09/twitter-analytics-lead-kevin-weil-and-a-presenter-at-hadoop-world-interviewed/
  12. Data Information Knowledge Wisdom (DIKW)
     – Wisdom (applied): I'd better stop the car!
     – Knowledge (context): The traffic light I am driving towards has turned red
     – Information (meaning): The south-facing light at the corner of Pitt and George streets has turned red
     – Data (raw): červený (Czech for "red"), 685 nm, 421 THz, #FF0000
     Source: http://legoviews.com/2013/04/06/put-knowledge-into-action-and-enhance-organisational-wisdom-lsp-and-dikw/
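Turning the raw "data" layer into the "information" layer can take nothing more than arithmetic. The sketch below (plain Python; the function names are illustrative) converts between wavelength and frequency with f = c / λ and splits the hex color into RGB components. By this formula, 685 nm corresponds to roughly 438 THz and 421 THz to roughly 712 nm; both fall in the red part of the visible spectrum.

```python
# Speed of light in vacuum, in metres per second (exact by SI definition).
SPEED_OF_LIGHT = 299_792_458

def wavelength_nm_to_thz(wavelength_nm):
    """Frequency in terahertz for a wavelength in nanometres: f = c / wavelength."""
    return SPEED_OF_LIGHT / (wavelength_nm * 1e-9) / 1e12

def thz_to_wavelength_nm(frequency_thz):
    """Wavelength in nanometres for a frequency in terahertz: wavelength = c / f."""
    return SPEED_OF_LIGHT / (frequency_thz * 1e12) * 1e9

def hex_to_rgb(color):
    """Split a '#RRGGBB' string into (red, green, blue) integers in the range 0-255."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))

print(f"685 nm -> {wavelength_nm_to_thz(685):.1f} THz")
print(f"421 THz -> {thz_to_wavelength_nm(421):.1f} nm")
print("#FF0000 ->", hex_to_rgb("#FF0000"))
```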
  13. Better Decisions for New Business Outcomes
     – Day-to-day operations based on real-time analytics
     – Strategic planning based on science, trends, patterns and insight
     – Know everything about your customers
     – Innovate new products at speed and scale
     – Instant awareness of fraud and risk
     – Exploit instrumented assets
     – Run zero-latency operations
     Step 3, make decisions and take action: Business Executives and Empowered Employees
  14. Decision Making Process in the Era of big data
     – Step 1, gather and identify sources of data: Database Administrators and Storage Administrators feed the System of Insight
     – Step 2, analyze data: Data Scientists and Business Analysts apply statistical models and real-time analytics, presented through a dashboard
     – Step 3, make decisions and take action: Business Executives and Empowered Employees base day-to-day operations on real-time analytics, and strategic planning on science, trends, patterns and insight
  15. Agenda: What is Big Data?; Big Data Use Cases; IBM Analytics Platform; IBM Spectrum Scale
  16. Practical Use Cases – The Analytics Landscape
     Both the degree of complexity and the competitive advantage increase from standard reporting to stochastic optimization.
     Descriptive:
     – Standard reporting: What happened?
     – Ad hoc reporting: How many, how often, where?
     – Query/drill down: What exactly is the problem?
     – Alerts: What actions are needed?
     Predictive:
     – Simulation: What could happen…?
     – Forecasting: What if these trends continue?
     – Predictive modeling: What will happen next if…?
     Prescriptive:
     – Optimization: How can we achieve the best outcome?
     – Stochastic optimization: How can we achieve the best outcome including the effects of variability?
     Based on: Competing on Analytics, Davenport and Harris, 2007
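The three stages can be illustrated on a toy sales series. This is a hedged sketch with hypothetical numbers: a mean for descriptive reporting, a least-squares trend line for forecasting ("what if these trends continue?"), and a brute-force search over stock levels for optimization. It does not model variability, so it stops short of stochastic optimization.

```python
from statistics import mean

# Hypothetical monthly unit sales.
history = [120, 130, 138, 150, 161, 170]

# Descriptive: what happened?
average_sales = mean(history)

# Predictive: what if these trends continue? Least-squares line y = a + b * x.
def linear_trend(ys):
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope

intercept, slope = linear_trend(history)
forecast = intercept + slope * len(history)  # next month (x = 6)

# Prescriptive: which stock level gives the best outcome for the forecast demand?
UNIT_COST, UNIT_PRICE = 4.0, 10.0  # hypothetical economics

def profit(stock, demand):
    """Revenue from units actually sold minus the cost of every unit stocked."""
    return UNIT_PRICE * min(stock, demand) - UNIT_COST * stock

best_stock = max(range(100, 251), key=lambda s: profit(s, forecast))
print(round(average_sales, 1), round(forecast, 1), best_stock)
```

With a single point forecast the best choice is simply to stock the forecast demand; accounting for uncertainty in that demand is exactly what separates stochastic optimization from plain optimization.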
  17. Innovate New Products and Services at Speed and Scale
     Vestas, the world's largest wind energy company, used big data and IBM technology to increase wind power generation through optimal turbine placement, reducing the time to analyze petabytes of data with IBM BigInsights software and IBM Spectrum Scale.
     "Before, it could take us three weeks to get a response to some of our questions simply because we had to process a lot of data. We expect that we can get answers for the same questions now in 15 minutes." – Lars Christian Christensen
  18. If You are Not Paying for it… Then you are not the Customer… You are the Product Being Sold!
     How much is each user worth to social media companies?
     Sources: Geek & Poke comic; "Let's Talk about Data" by Neha Mehta
  19. 360 Degree View of the Customer – A Demographic of One
     – Customer: Amy Bearn, 32, married, mother of 3, accountant. Telco score: 91; CPG score: 76; Fashion score: 88
     – Retailer, using a social network and a public database: How valuable is Amy to my retail sales? Who does she influence? What do they spend?
     – Telco company, using its calling network: How valuable is Amy to my mobile phone network? How likely is she to switch carriers? How many other customers will follow?
     – Merged network: the social network and the calling network combined
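A "merged network" can be sketched as the union of two contact graphs. In this minimal Python illustration the people, edges and spend figures are hypothetical, and "influenced spend" (the total spend of a person's direct contacts) is only one simple stand-in for the influence scores on the slide.

```python
from collections import defaultdict

# Hypothetical edges: who is connected to whom in each source.
social_edges = [("amy", "ben"), ("amy", "cara"), ("ben", "dev")]
calling_edges = [("amy", "cara"), ("amy", "eli"), ("cara", "dev")]
# Hypothetical annual spend per customer, e.g. from a retailer's own records.
spend = {"amy": 900, "ben": 300, "cara": 450, "dev": 200, "eli": 650}

def merge_networks(*edge_lists):
    """Build one undirected adjacency map from several edge lists, ignoring duplicates."""
    graph = defaultdict(set)
    for edges in edge_lists:
        for a, b in edges:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def influenced_spend(graph, person):
    """Total spend of a person's direct contacts in the merged network."""
    return sum(spend.get(contact, 0) for contact in graph[person])

merged = merge_networks(social_edges, calling_edges)
print(sorted(merged["amy"]), influenced_spend(merged, "amy"))
```

Storing neighbors in sets means a contact who appears in both networks (here, cara) is counted once.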
  20. Run Zero-Latency Operations
     – Deep individual customer insight: preferences, interests, likes
     – A real-time decision can enrich the customer profile, initiate a direct response, initiate a channel response, or initiate a process or workflow
  21. How Target® Figured Out a Teen Girl Was Pregnant Before Her Father Did
     Every time you go shopping, you share intimate details about your consumption patterns with retailers. Target has figured out how to data-mine whether you have a baby on the way.
     – Target looked at historical buying data for all the ladies who had signed up for Target baby registries, for example:
       – Unscented soaps and lotions
       – Calcium, magnesium and zinc supplements
     – About 25 products help generate a "pregnancy prediction" score and an estimated "baby due date"
     – Target sends coupons timed to very specific stages of the pregnancy
     "My daughter got this in the mail. She's still in high school, and you're sending her coupons for baby clothes and cribs?" -- Angry father of teen girl
     "I had a talk with my daughter,… She's due in August. I owe you an apology." -- Same father, 3 days later
     Source: http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/
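Turning a shopping history into a single number is a classic scoring-model task. Neither the slide nor the cited article discloses Target's model, so the sketch below assumes a simple logistic (weighted-sum) score over a few of the product types mentioned; the weights, the bias and the product keys are all hypothetical. A due-date estimate would need a separate model and is not attempted here.

```python
import math

# Hypothetical weights: the article mentions about 25 products,
# but the real products' weights were never disclosed.
WEIGHTS = {
    "unscented_lotion": 1.2,
    "calcium_supplement": 0.8,
    "magnesium_supplement": 0.6,
    "zinc_supplement": 0.6,
}
BIAS = -3.0  # hypothetical baseline log-odds for a shopper who buys none of them

def prediction_score(basket):
    """Logistic score between 0 and 1 from the indicator products in a basket."""
    z = BIAS + sum(WEIGHTS.get(item, 0.0) for item in set(basket))
    return 1 / (1 + math.exp(-z))

ordinary = prediction_score(["bread", "milk"])
flagged = prediction_score(["unscented_lotion", "calcium_supplement", "zinc_supplement"])
print(round(ordinary, 3), round(flagged, 3))
```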
  22. Exploit Instrumented Assets
     Doctors from University of Ontario apply big data to neonatal infant monitoring to predict infection, detecting neonatal patient symptoms up to 24 hours sooner.
     – Signal processing and data cleansing of heart rate variability data
     – Continuously correlate data: thousands of events each second
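Heart rate variability is usually summarized from the intervals between successive heartbeats (RR intervals). The slide does not say which measures the project used; the sketch below computes two standard time-domain measures, SDNN and RMSSD, plus a rolling version of the kind a continuous monitor could update as each new beat arrives. The sample intervals are hypothetical.

```python
import math
from statistics import mean, stdev

def sdnn(rr_ms):
    """SDNN: sample standard deviation of RR (beat-to-beat) intervals, in ms."""
    return stdev(rr_ms)

def rmssd(rr_ms):
    """RMSSD: root mean square of successive RR-interval differences, in ms."""
    diffs = [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(mean(d * d for d in diffs))

def rolling_rmssd(rr_ms, window):
    """RMSSD over each run of `window` consecutive intervals (window must be >= 2),
    as a monitor might recompute it while new beats stream in."""
    return [rmssd(rr_ms[i:i + window]) for i in range(len(rr_ms) - window + 1)]

# Hypothetical RR intervals in milliseconds (about 150 beats per minute).
rr = [400, 410, 395, 405, 400, 398]
print(round(sdnn(rr), 2), round(rmssd(rr), 2), len(rolling_rmssd(rr, 4)))
```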
  23. Agenda: What is Big Data?; Big Data Use Cases; IBM Analytics Platform; IBM Spectrum Scale
  24. The IBM big data platform advantage
     – Analytic applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
     – IBM big data platform: Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance
     • The platform provides benefit as you move from an entry point to a second and third project
     • Shared components and integration between systems lower deployment costs
     • Key points of leverage:
       – Reuse text analytics across Streams and BigInsights
       – Hadoop connectors between Streams and Information Integration
       – Common integration, metadata and governance across all engines
       – Accelerators built across multiple engines: common analytics, models, and visualization
  25. Simplify your data warehouse
     Customer need:
     – Business users are hampered by the poor performance of analytics on a general-purpose enterprise warehouse: queries take hours to run
     – The enterprise data warehouse is encumbered by too much data for too many purposes
     – Need to ingest huge volumes of structured data and run multiple concurrent deep analytic queries against it
     – IT needs to reduce the cost of maintaining the data warehouse
     Value statement:
     – Speed and simplicity for deep analytics
     – 100s to 1000s of users/second for operational analytics
     Customer example:
     – Catalina Marketing: executing 10x the amount of predictive workloads with the same staff
     System for Transactions, System for Analytics, System for Operational Analytics. Get started with IBM PureData Systems!
  26. Ad-Hoc versus Operational Analytics
  27. Analyze streaming data in Real time
     Customer need:
     – Harness and process streaming data sources
     – Select valuable data and insights to be stored for further processing
     – Quickly process and analyze perishable data, and take timely action
     Value statement:
     – Significantly reduced processing time and cost: process, then store what's valuable
     – React in real time to capture opportunities before they expire
     Customer example:
     – Ufone: telco Call Detail Record (CDR) analytics for customer churn prevention
     Architecture: streaming data sources flow through source adapters, analytic operators and sink adapters in the Streams runtime, with visualization of the results; applications are built in the Streams Studio IDE, and deployment is automated and optimized.
     Get started with IBM Streams!
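The source adapter, analytic operator and sink adapter flow can be imitated with Python generators, which handle one tuple at a time as it arrives. This is a hedged sketch, not IBM Streams code (Streams applications are written in its own SPL language): the call-record format, the dropped-call rule and the threshold are hypothetical, loosely inspired by the churn-prevention example.

```python
from typing import Iterable, Iterator, List, Tuple

CallRecord = Tuple[str, bool]  # hypothetical CDR: (caller, call_was_dropped)

def source_adapter(feed: Iterable[CallRecord]) -> Iterator[CallRecord]:
    """Source adapter: turn an external feed into a stream of tuples."""
    yield from feed

def dropped_call_alerts(stream: Iterator[CallRecord], threshold: int) -> Iterator[str]:
    """Analytic operator: emit a caller once their dropped calls reach `threshold`."""
    dropped_counts = {}
    for caller, dropped in stream:
        if dropped:
            dropped_counts[caller] = dropped_counts.get(caller, 0) + 1
            if dropped_counts[caller] == threshold:
                yield caller  # emitted once, when the threshold is first reached

def sink_adapter(stream: Iterator[str], store: List[str]) -> List[str]:
    """Sink adapter: persist only the valuable results, not the raw feed."""
    for item in stream:
        store.append(item)
    return store

cdrs = [("ann", True), ("bob", False), ("ann", True), ("bob", True), ("ann", True)]
at_risk = sink_adapter(dropped_call_alerts(source_adapter(cdrs), threshold=2), [])
print(at_risk)
```

Only the alert list is stored; the raw call records pass through and are discarded, which mirrors "process, then store what's valuable".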
  28. Dominant Players vs. Contender platforms
     – OS: dominant player Microsoft Windows; contender platform Linux, supported by IBM, RedHat, SUSE, Oracle and others
     – Tape: dominant player Quantum DLT; contender platform Linear Tape Open (LTO), supported by IBM, HP, Certance and others
     – Cloud management: dominant player Amazon Web Services; contender platform OpenStack, supported by IBM, HP, Rackspace, RedHat, Dell, Cisco, VMware and others
     – Big data & analytics: dominant player Cloudera; contender platform Open Data Platform, supported by IBM, Pivotal, Hortonworks and others
  29. IBM InfoSphere BigInsights is a 100% standard Hadoop distribution: open standards first, but with freedom of choice
     – By default, open source components are always deployed: HDFS, YARN, Hive, MapReduce, Pig
     – You can elect to use proprietary capabilities depending on your needs: Spectrum Scale, Platform Symphony, Big SQL, Adaptive MapReduce, BigSheets
     – In some cases, proprietary capabilities offer significant benefits:
       – Share data with non-Hadoop applications and simplify data management
       – Re-use existing tools and expertise, and avoid additional development costs
       – Boost performance, support time-critical workloads, and do more with less
       – True multi-tenancy to boost service levels and avoid duplication of infrastructure
       – Simplify access for end users and minimize software development
  30. IBM BigInsights for Apache Hadoop: three new user-centric modules founded on an Open Data Platform
     – IBM Open Platform with Apache Hadoop is IBM's own 100% open source Apache Hadoop distribution. IBM will include the ODP common kernel when available.
     – IBM BigInsights Analyst (for the Business Analyst): Big SQL, BigSheets
     – IBM BigInsights Data Scientist (for the Data Scientist): Big SQL, BigSheets, Text Analytics, System ML on Big R, Distributed R
     – IBM BigInsights Enterprise Management (for the Administrator): Spectrum Scale, Platform Symphony
  31. Platform Symphony Integrates with Hadoop
     – YARN (open source) uses a pluggable architecture for schedulers:
       – The FIFO, Fair and Capacity schedulers are implemented this way
       – Symphony EGO is also implemented this way
     – The scheduler is therefore completely transparent to YARN applications (App1, App2, App3), so ISV certification for Platform Symphony is not required
     – As with other schedulers, queues and policies are defined in Platform Symphony EGO
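The pluggable-scheduler idea can be sketched in Python. The class names are hypothetical and this is not YARN's actual (Java) scheduler interface; it only shows why swapping the scheduler is invisible to applications: the resource manager talks to one small interface, and FIFO, fair-share or any other policy (such as Symphony EGO) lives behind it.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional

class Scheduler(ABC):
    """Plug-in interface: the resource manager calls only these two methods."""

    @abstractmethod
    def submit(self, app: str, queue: str) -> None: ...

    @abstractmethod
    def next_app(self) -> Optional[str]: ...

class FifoScheduler(Scheduler):
    """Run applications strictly in submission order, ignoring queues."""

    def __init__(self):
        self._pending = deque()

    def submit(self, app, queue):
        self._pending.append(app)

    def next_app(self):
        return self._pending.popleft() if self._pending else None

class FairScheduler(Scheduler):
    """Take turns across queues so no single queue monopolizes the cluster."""

    def __init__(self):
        self._queues = {}      # queue name -> pending applications
        self._turns = deque()  # queue names in round-robin order

    def submit(self, app, queue):
        if queue not in self._queues:
            self._queues[queue] = deque()
            self._turns.append(queue)
        self._queues[queue].append(app)

    def next_app(self):
        for _ in range(len(self._turns)):
            queue = self._turns[0]
            self._turns.rotate(-1)  # move this queue to the back of the line
            if self._queues[queue]:
                return self._queues[queue].popleft()
        return None

class ResourceManager:
    """Stand-in for YARN: it behaves the same whichever scheduler is plugged in."""

    def __init__(self, scheduler: Scheduler):
        self.scheduler = scheduler

    def run_all(self, submissions):
        for app, queue in submissions:
            self.scheduler.submit(app, queue)
        order = []
        while (app := self.scheduler.next_app()) is not None:
            order.append(app)
        return order

jobs = [("etl-1", "batch"), ("etl-2", "batch"), ("report-1", "adhoc")]
print(ResourceManager(FifoScheduler()).run_all(jobs))
print(ResourceManager(FairScheduler()).run_all(jobs))
```

The submitted jobs and the `ResourceManager` code are identical in both runs; only the plugged-in policy changes the order in which applications get resources.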
  32. Spark, a Complement to Hadoop
     • Spark complements Hadoop rather than replacing it
       – Provides distributed memory abstractions for clusters to support applications that repeatedly use a working set of data, such as iterative algorithms (machine learning) and interactive data mining tools (R, Python, …)
     • Spark programming model: Resilient Distributed Datasets (RDDs)
       – Immutable collections partitioned across the cluster that can be rebuilt if a partition is lost
       – Created by transforming data in stable storage using data flow operators (map, filter, group-by, …)
       – Can be cached across parallel operations
     • Spark uses HDFS or IBM Spectrum Scale
       – Can use any Hadoop data source
       – Uses Hadoop InputFormats and OutputFormats
     • Spark runs on YARN
       – Can run on the same cluster as MapReduce
     • Spark works with the Hadoop ecosystem: Flume, Sqoop, HBase
     • Spark architectural considerations
       – Keep the dataset in memory
       – Spark programs can be bottlenecked by any resource in the cluster: CPU, network bandwidth or memory. Most often, if the data fits in memory, the bottleneck is network bandwidth.
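The RDD idea on this slide (immutable partitions that can be rebuilt from their lineage if lost) can be illustrated in plain Python. This is a toy, hypothetical class, not Spark's API: real RDDs are distributed across a cluster and are cached only when you call `cache()` or `persist()`, whereas this sketch caches every partition it computes.

```python
class MiniRDD:
    """Toy immutable, partitioned dataset that records its lineage (how it was built)."""

    def __init__(self, partitions, ops=()):
        self._source = partitions  # stand-in for data in stable storage
        self._ops = tuple(ops)     # lineage: ordered transformations to replay
        self._cache = {}           # partition index -> computed result

    def map(self, f):
        """Return a new dataset; the original is never modified."""
        return MiniRDD(self._source, self._ops + (("map", f),))

    def filter(self, predicate):
        return MiniRDD(self._source, self._ops + (("filter", predicate),))

    def _compute(self, i):
        """Rebuild partition i from stable storage by replaying the lineage."""
        part = self._source[i]
        for kind, f in self._ops:
            if kind == "map":
                part = [f(x) for x in part]
            else:
                part = [x for x in part if f(x)]
        return part

    def partition(self, i):
        if i not in self._cache:
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def lose_partition(self, i):
        """Simulate losing a cached partition, e.g. when a worker node fails."""
        self._cache.pop(i, None)

    def collect(self):
        return [x for i in range(len(self._source)) for x in self.partition(i)]

base = MiniRDD([[1, 2, 3], [4, 5, 6]])
even_squares = base.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
before = even_squares.collect()
even_squares.lose_partition(1)
after = even_squares.collect()  # partition 1 is recomputed from its lineage
print(before, after)
```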
  33. IBM InfoSphere BigInsights – Big SQL: IBM's SQL for Hadoop
     – A SQL-based application talks to Big SQL, whose optimized SQL MPP run-time reads native Hadoop data sources: CSV, SEQ, Parquet, RC, Avro, ORC, JSON and custom formats
     – Makes Hadoop data accessible to a wider audience with familiar, widely known syntax, while leveraging native Hadoop data sources
     – Complements the data warehouse for exploratory analytics (sandbox, data lake)
     – Included in IBM BigInsights
     – Use familiar SQL tools: Cognos, SPSS, Tableau, MicroStrategy
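The appeal of SQL on Hadoop is running familiar SQL directly against data that arrived as raw files. Big SQL itself is an MPP engine reading files in HDFS; the sketch below only illustrates the idea, using Python's built-in `sqlite3` as a stand-in engine, with a hypothetical CSV extract and table name.

```python
import csv
import io
import sqlite3

# Hypothetical clickstream extract, standing in for a file landed in Hadoop.
CLICKS_CSV = """user_id,page,seconds
u1,home,12
u2,home,5
u1,cart,30
u3,cart,45
"""

def query_csv(text, sql):
    """Load CSV text into an in-memory table named clicks, then run a SQL query."""
    rows = list(csv.DictReader(io.StringIO(text)))
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE clicks (user_id TEXT, page TEXT, seconds INTEGER)")
        # Named placeholders are filled from each row's dictionary keys.
        conn.executemany("INSERT INTO clicks VALUES (:user_id, :page, :seconds)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

result = query_csv(
    CLICKS_CSV,
    "SELECT page, COUNT(*), SUM(seconds) FROM clicks GROUP BY page ORDER BY page",
)
print(result)
```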
  34. Architecture Pattern for big data Implementation
     – Data sources: application and transaction data, machine data, social media and email, enterprise content, video/audio, network/sensor
     – Information ingestion and operational information: stream processing, data integration, master data, streams
     – Real-time analytics: entity analytics, predictive
     – Data at rest, in a landing area, analytics zone and archive: raw data, structured data, text analytics, data mining, entity analytics, machine learning
     – Exploration, integrated warehouse, and mart zones: discovery, deep reflection, operational, predictive
     – Consumers: decision management, BI and predictive analytics, navigation and discovery, intelligence analysis
     – Spanning all layers: information governance, security and business continuity
  35. Agenda: What is Big Data?; Big Data Use Cases; IBM Analytics Platform; IBM Spectrum Scale
  36. Why use IBM Spectrum Scale™
     – Extreme scalability: add or remove nodes and storage without disruption or performance impact to applications
     – Universal access to data: all servers and clients have access to data through a variety of file and object protocols
     – High performance: parallel access with no hot spots
     – Proven reliability: used by over 200 of the top 500 supercomputers; survive any node or storage failure with distributed RAID and redundant components
  37. Hadoop Analytics – HDFS vs IBM Spectrum Scale™
     – Data sources: a mashup of structured and unstructured data from a variety of sources, held in file systems such as JFS2, NTFS and EXT4
     – With HDFS, you save the results and discard the rest
     – With IBM Spectrum Scale, application data stored on Spectrum Scale is readily available for analytics, and results are saved alongside it; the IBM Hadoop Connector allows MapReduce programs to process the data without application changes
     – Actionable insights: answers to the who, what, where, when, why and how
     – Business intelligence and predictive analytics: competitive advantages; new threats and fraud; changing needs and forecasting; and more!
  38. HDFS versus IBM Spectrum Scale™
     Hadoop HDFS:
     – NameNode HA added in version 2.0, in an active/passive configuration
     – Difficult to ingest data: special tools required
     – Lacking enterprise readiness
     – Large block sizes: poor support for small files
     – Compute and storage tightly coupled, leading to very low CPU utilization
     – Single-purpose: Hadoop MapReduce only
     – Non-POSIX file system: obscure commands, and no support for in-place updates
     IBM Spectrum Scale:
     – No single point of failure: distributed metadata in an active/active configuration since 1998
     – Ingest data using policies for data placement
     – Enterprise ready, with support for advanced storage features (encryption, DR, replication, software RAID, etc.)
     – Variable block sizes, suited to multiple types of data and metadata access patterns
     – Scale compute and storage independently (policy-based ILM)
     – Versatile, multi-purpose, hybrid storage (locality and shared)
     – POSIX file system: easy to use and manage
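The "in-place updates" difference can be made concrete. On a POSIX file system such as Spectrum Scale (or any local disk), a program can seek into an existing file and overwrite a few bytes; HDFS files, by contrast, can be written once and appended to, but not modified at an arbitrary offset. A minimal Python sketch, using a hypothetical record format and a temporary file as a stand-in for a Spectrum Scale mount:

```python
import os
import tempfile

def update_in_place(path, offset, data):
    """Overwrite len(data) bytes at `offset`, leaving the rest of the file untouched."""
    with open(path, "r+b") as f:  # read/write mode; unlike "wb" it does not truncate
        f.seek(offset)
        f.write(data)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "order.dat")
    with open(path, "wb") as f:
        f.write(b"status=PENDING;amount=100")
    # Same-length replacement, so no other bytes move.
    update_in_place(path, len(b"status="), b"SHIPPED")
    with open(path, "rb") as f:
        contents = f.read()

print(contents)
```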
  39. IBM Spectrum Scale™ – File Placement Optimization (FPO)
     FPO creates a "shared nothing" cluster similar to HDFS in Hadoop environments. HDFS relies on a NameNode and a Secondary NameNode; Spectrum Scale nodes can use internal, direct-attached storage or a SAN, connected over a TCP/IP or RDMA network.
     • Spectrum Scale avoids the need for a central NameNode, a common point of failure in HDFS
     • Avoids long recovery times in the event of NameNode failure
     • Spectrum Scale can intermix FPO with standard NSD server and client nodes in the same cluster
     • POSIX compliance, which is key to avoiding data islands
     • Robustness and performance at massive scale, and maturity
  40. Share-Nothing versus Shared-Disk Deployments
     – Spectrum Scale FPO can keep 1, 2 or 3 replicas of the data. Need more compute? Add another node! Need more storage capacity? Add another node!
     – Spectrum Scale and Elastic Storage Server reduce storage to one RAID-protected copy of the data (drawn as three data strips plus one parity strip), so you can scale compute and storage capacity separately
     – Raw capacity required: 3x (three replicas) versus 1.3x (one RAID-protected copy)
     – Both deployments connect nodes over TCP/IP or RDMA
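The "3x versus 1.3x" figures follow from simple arithmetic. Keeping three full replicas consumes 3 units of raw capacity per unit of usable data, while a RAID-protected copy with three data strips and one parity strip, as drawn on the slide, consumes (3 + 1) / 3 ≈ 1.33 units. A small Python helper (its name and parameters are illustrative) makes the comparison explicit; other layouts, such as 8 data + 2 parity, give 1.25x.

```python
from fractions import Fraction

def raw_per_usable(data_strips=1, parity_strips=0, replicas=1):
    """Raw capacity consumed per unit of usable capacity.

    Parity adds a (data + parity) / data overhead; each full replica multiplies it.
    """
    return Fraction(data_strips + parity_strips, data_strips) * replicas

three_way = raw_per_usable(replicas=3)                           # triple replication
raid_3_plus_1 = raw_per_usable(data_strips=3, parity_strips=1)   # 3 data + 1 parity

usable_tb = 100
print(f"3 replicas: {float(three_way):.2f}x -> {float(usable_tb * three_way):.1f} TB raw")
print(f"3+1 parity: {float(raid_3_plus_1):.2f}x -> {float(usable_tb * raid_3_plus_1):.1f} TB raw")
```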
  41. IBM Spectrum Scale™ – Software, Systems or Cloud Services
     – Software: install on your own choice of industry-standard x86 or POWER servers
     – Pre-built systems: Elastic Storage Server with distributed RAID; Storwize V7000 Unified
     – Cloud services: Spectrum Scale can be deployed on any cloud
  42. Session summary
     – Big data is being generated by everything around us:
       – Every digital process and social media exchange produces it
       – Systems, sensors and mobile devices transmit it
     – Big data is arriving from multiple sources at amazing velocities, volumes and varieties
     – To extract meaningful value from big data, you need optimal processing power, storage, analytics capabilities, and skills
     Sources: The Economist; special thanks to Dr. Bob Sutor, IBM VP, Business Solutions & Mathematical Sciences
  43. Session Evaluations: YOUR OPINION MATTERS! Submit four or more session evaluations by 5:30pm Wednesday to be eligible for drawings! *Winners will be notified Thursday morning. Prizes must be picked up at the registration desk, during operating hours, by the conclusion of the event.
  45. Big Data & Analytics: Building Big Data and Analytics Solutions in the Cloud
     http://www.redbooks.ibm.com/abstracts/redp5085.html?Open
     "Analytics is about examining data to derive interesting and relevant trends and patterns, which can be used to inform decisions, optimize processes, and even drive new business models."
     o IBM BigInsights
     o IBM PureData System for Hadoop
     o IBM PureData System for Analytics
     o IBM PureData System for Operational Analytics
     o IBM InfoSphere Warehouse
     o IBM Streams
     o IBM InfoSphere Data Explorer (Watson Explorer)
     o IBM InfoSphere Data Architect
     o IBM InfoSphere Information Analyzer
     o IBM InfoSphere Information Server
     o IBM InfoSphere Information Server for Data Quality
     o IBM InfoSphere Master Data Management Family
     o IBM InfoSphere Optim Family
     o IBM InfoSphere Guardium Family
  46. Research Paper: "In this paper, we revisit the debate on the need of a new non-POSIX storage stack for cloud analytics and argue, based on an initial evaluation, that it can be built on traditional POSIX-based cluster filesystems."
  47. Hadoop for the Enterprise
     http://www.ibm.com/software/data/infosphere/hadoop/enterprise.html
     IBM BigInsights for Apache Hadoop provides a 100% open source platform and offers analytic and enterprise capabilities for Hadoop.
  48. IBM Tucson Executive Briefing Center
     Tucson, Arizona is home to storage hardware and software design and development. The IBM Tucson Executive Briefing Center offers:
     – Technology briefings
     – Product demonstrations
     – Solution workshops
     Take a video tour! http://youtu.be/CXrpoCZAazg
  49. About the Speaker: Tony Pearson is a Master Inventor and Senior Managing Consultant for the IBM System Storage™ product line. Tony joined IBM Corporation in 1986 in Tucson, Arizona, USA, and has lived there ever since. In his current role, Tony presents briefings on storage topics covering the entire System Storage product line, Tivoli storage software products, and topics related to cloud computing. He interacts with clients, speaks at conferences and events, and leads client workshops to help clients with strategic planning for IBM’s integrated set of storage management software, hardware, and virtualization products. Tony writes the “Inside System Storage” blog, which is read by hundreds of clients, IBM sales reps, and IBM Business Partners every week. This blog was rated one of the top 10 blogs for the IT storage industry by “Networking World” magazine, and the #1 most-read IBM blog on IBM’s developerWorks. The blog has been published in a series of books, Inside System Storage: Volumes I through V. Over the past years, Tony has worked in development, marketing, and customer care positions for various storage hardware and software products. Tony has a Bachelor of Science degree in Software Engineering and a Master of Science degree in Electrical Engineering, both from the University of Arizona. Tony holds 19 IBM patents for inventions on storage hardware and software products. Contact: Tony Pearson, Master Inventor, Senior IT Specialist, IBM System Storage™, 9000 S. Rita Road, Bldg 9032, Floor 1, Tucson, AZ 85744; +1 520-799-4309 (Office); tpearson@us.ibm.com
  50. Additional Resources from Tony Pearson: Email: tpearson@us.ibm.com; Twitter: twitter.com/az990tony; Blog: ibm.co/Pearson; Books: www.lulu.com/spotlight/990_tony; IBM Expert Network on Slideshare: www.slideshare.net/az990tony; Facebook: www.facebook.com/tony.pearson.16121; LinkedIn: www.linkedin.com/profile/view?id=103718598
  51. Global Skills Initiative: Continue growing your IBM skills. ibm.com/training provides a comprehensive portfolio of skills and career accelerators that are designed to meet all your training needs. If you can’t find the training that is right for you with our Global Training Providers, we can help. Contact IBM Training at dpmc@us.ibm.com.
  52. Trademarks and Disclaimers: Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. UNIX is a registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries. Other product and service names might be trademarks of IBM or other companies. Information is provided "AS IS" without warranty of any kind.
The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products. All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here. 
Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBM representative or Business Partner for the most current pricing in your geography. Photographs shown may be engineering prototypes. Changes may be incorporated in production models. © IBM Corporation 2015. All rights reserved. References in this document to IBM products or services do not imply that IBM intends to make them available in every country. Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml. ZSP03490-USEN-00
