1524: How IBM's Big Data Solution Can Help You Gain Insight into Your Data Center (v2)


  1. 1. How IBM's Big Data Solution Can Help You Gain Insight into Your Data Center. Christophe Menichetti, Certified IT Specialist, BAO / Big Data. © 2012 IBM Corporation
  2. 2. Please note: IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
  3. 3. IBM Montpellier Client Center. Our Client Center partners with clients to meet their IT infrastructure goals and improve their overall business by demonstrating the capabilities of the IBM solutions. Activities are grouped under Talk & Teach, Design, and Prove:
     • Smarter Computing Design: Energy, Cities, Cloud, Water, Business Resilience
     • Benchmarks & Proofs of Concept
     • System Briefings: PureSystems, System z, Power Systems, HPC, System x & Blade, Storage
     • Software Briefings
     • Enterprise Architecture Design
     • Demonstrations
     • z Key Workload Initiatives
     • Advanced Technical Skills
     • Industry Showcases
     • Solution Testing
     • ISV Solution Centers: SAP, Oracle, Siebel
     • BP, ISV & CSI Support
     • WW GDPS Solution Testing
     • WW Financial Services CoE
     • Software zTEC
     • New Technology Introduction
  4. 4. Innovation Lab – Resources & Skills
     • Mission: promote and develop innovative assets around data applied to Smarter Planet/Cities issues in order to engage customer and collaborative R&D projects
     • Smarter Cities: innovation through R&D collaborative projects supported by CAS France in partnership with Labs & Clients (funded by governments or the European Commission), and client projects
     • Big Data / BAO & Smarter Cities offerings: customer briefings & workshops (architecture, design session, PoC); presales technical support (RFP, sizing, pilot support, architecture); showcases
     • Smarter Energy & Cities: innovation with a vision of improving energy consumption through the use of IT and BAO, with universities and companies
     • Montpellier Water Management COE: the use of numerical simulations (HPC) for water management, such as Deep Thunder from IBM Research, first implemented in IBM Europe by our IBM Montpellier team
     • Team: Xavier Vasques (Manager), Virginie Radisson (Business Leader), Marie Angèle Grilli (Project Manager), Olivier Hess (CTO Smarter Cities), Christophe Menichetti (BAO/Big Data Specialist), Elsa Fabres (Data Analytics & IOC Specialist), Colin Dumontier (HPC/Water Specialist), Romain Chailan (PhD Student), Saniya Ben Hassen (BAO IT Architect), Jean-Philippe Durney (BAO/Big Data IT Architect), Denis Gras (Smarter Cities IT Architect)
  5. 5. AGENDA: Big Data Challenges (why is interest growing?); Big Data Technologies (what is Big Data?); IBM Big Data Solutions (IBM BigInsights and IBM Streams); Big Data in action (our Customer Center showcase experience)
  7. 7. What is Big Data?
  8. 8. Our data rich world is exploding…
     • IT: logs & transactions
     • Text, blog, weblog
     • 4.6 billion camera phones worldwide
     • 30 billion RFID tags today (1.3B in 2005)
     • Twitter processes 7 TBs of data every day
     • Facebook processes 10 TBs of data every day
     • 900 million GPS devices sold annually
     • World Data Centre for Climate keeps 220 TBs of Web data [by 2013] and 9 PBs of auxiliary supporting data
     • 2 billion people on the Web by 2011
     • Capital market data volumes grew 1,750%, 2003-06
     • 76 million smart meters in 2009… 200M by 2014
  9. 9. The Big Data Opportunity: extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible.
     • Variety: manage the complexity of multiple relational and non-relational data types and schemas
     • Velocity: streaming data and large-volume data movement
     • Volume: scale from terabytes to zettabytes (1B TBs)
  10. 10. Bring Together a Large Volume and Variety of Data to Find New Insights
     • Multi-channel customer sentiment and experience analysis
     • Detect life-threatening conditions at hospitals in time to intervene
     • Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement
     • Make risk decisions based on real-time transactional data
     • Identify criminals and threats from disparate video, audio, and data feeds
  11. 11. AGENDA: Big Data Challenges (why is interest growing?); Big Data Technologies (what is Big Data?); IBM Big Data Solutions (IBM BigInsights and IBM Streams); Big Data in action (our Customer Center showcase experience)
  12. 12. Big Data: why is it possible now?
     • Traditional approach: data to function. The application server and the database server are separate. Data can be on multiple servers, and the analysis program can run on multiple application servers, but the network is still in the middle: the data have to go through the network. Flow: user request → application server → query data on the database server → return data → process data → send result.
     • Big Data approach: function to data. The analysis program runs where the data are, on the data nodes. Only the analysis program has to go through the network. The analysis program needs to be MapReduce aware. Highly scalable: thousands of nodes, petabytes and more. Flow: user request → master node → query and send function to the data nodes → process on the data → send consolidated result.
  13. 13. Big Data: why is it possible now? Example: for how many hours does Clint Eastwood appear across all the movies he has made? All movies need to be parsed to find Clint's face.
     • Traditional approach (data to function): all movies are going to be sent through the network from the database server to the application server, which processes the data and sends the result.
     • Big Data approach (function to data): only the analysis program and Clint's picture are sent through the network, from the master node to the data nodes. Each data node processes its own data, and the master node sends the consolidated result.
  14. 14. Merging the Traditional and Big Data Approaches
     • Traditional approach: structured & repeatable analysis. Business users determine what question to ask; IT structures the data to answer that question. Examples: monthly sales reports, profitability analysis, customer surveys.
     • Big Data approach: iterative & exploratory analysis. IT delivers a platform to enable creative discovery; business users explore what questions could be asked. Examples: brand sentiment, product strategy, maximum asset utilization.
  15. 15. AGENDA: Big Data Challenges (why is interest growing?); Big Data Technologies (what is Big Data?); IBM Big Data Solutions (IBM BigInsights and IBM Streams); Big Data in action (our Customer Center showcase experience)
  16. 16. IBM Big Data platform
     • Platform layers: Analytic Applications (BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics); Big Data Platform (Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Content Management); Information Integration & Governance; Cloub | Mobile | Security
     • Analyze unstructured Big Data: Content Analytics (index for contextual collaborative insights)
     • Analyze structured Big Data: Cognos BI Reporting (create reports on BigInsights), SPSS (analyze in Streams)
     • Simplify your warehouse: PureData Analytics, PureData Operational Analytics. Deliver deep insight with advanced in-database analytics and operational analytics
     • Unlock Big Data: InfoSphere Data Explorer. Gather, extract and explore data using best-of-breed visualization
     • Analyze raw data: InfoSphere BigInsights, InfoSphere Streams (RT)
     • Index Big Data: Data Explorer, Content Analytics. Index for contextual collaborative insights
     • Speed time to value with analytic and application accelerators
     • Reduce costs with Hadoop: Platform Computing, GPFS. Cost-effectively analyze petabytes of structured and unstructured information
     • Manage Big Data: Guardium, Information Server. Govern data quality and manage the information lifecycle
     • Analyze streaming data: InfoSphere Streams. Analyze streaming data and large data bursts for real-time insights
  17. 17. AGENDA: Big Data Challenges (why is interest growing?); Big Data Technologies (what is Big Data?); IBM Big Data Solutions (IBM BigInsights and IBM Streams), current topic: InfoSphere BigInsights
  18. 18. What’s so Special About Open Source Hadoop?
     • Building blocks: Storage (distributed, reliable, commodity gear) and MapReduce (parallel programming, fault tolerant)
     • Scalable: new nodes can be added on the fly
     • Affordable: massively parallel computing on commodity servers
     • Flexible: Hadoop is schema-less and can absorb any type of data
     • Fault tolerant: through the MapReduce software framework
  19. 19. Basic Hadoop principles: HDFS and MapReduce
     • Hadoop Distributed File System (HDFS): where Hadoop stores the data. This file system spans all the nodes in a cluster.
     • Hadoop computation model: data stored in a distributed file system spanning many inexpensive computers; bring function to the data; distribute the application to the compute resources where the data is stored; scalable to thousands of nodes and petabytes of data.
     • MapReduce application, distributed as map tasks to the Hadoop data nodes: 1. Map phase (break job into small parts); 2. Shuffle (transfer interim output for final processing); 3. Reduce phase (boil all output down to a single result set); return a single result set.
     • Code fragment from the slide (the end of the reducer is elided on the slide):

         public static class TokenizerMapper
                 extends Mapper<Object,Text,Text,IntWritable> {
             private final static IntWritable one = new IntWritable(1);
             private Text word = new Text();

             public void map(Object key, Text val, Context context)
                     throws IOException, InterruptedException {
                 StringTokenizer itr = new StringTokenizer(val.toString());
                 while (itr.hasMoreTokens()) {
                     word.set(itr.nextToken());
                     context.write(word, one); // emit (word, 1) for every token
                 }
             }
         }

         public static class IntSumReducer
                 extends Reducer<Text,IntWritable,Text,IntWritable> {
             private IntWritable result = new IntWritable();

             public void reduce(Text key, Iterable<IntWritable> val, Context context) {
                 int sum = 0;
                 for (IntWritable v : val) {
                     sum += v.get();
                 . . .
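
The three phases named on this slide (map, shuffle, reduce) can be sketched without a cluster. The following is only an illustration in plain Java: the class `WordCountSketch` and its method names are hypothetical, in-memory collections stand in for HDFS splits and data nodes, and nothing here uses the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-alone sketch: in-memory lists stand in for data nodes.
class WordCountSketch {

    // Map phase: emit one (word, 1) pair per token of a single input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(split);
        while (itr.hasMoreTokens()) {
            pairs.add(Map.entry(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle: group the interim pairs of all map tasks by key.
    static Map<String, List<Integer>> shuffle(List<List<Map.Entry<String, Integer>>> mapOutputs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            for (Map.Entry<String, Integer> pair : output) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        return grouped;
    }

    // Reduce phase: sum the values of each key into a single result set.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int v : entry.getValue()) {
                sum += v;
            }
            result.put(entry.getKey(), sum);
        }
        return result;
    }

    // Runs the three phases in order over a list of input splits.
    static Map<String, Integer> wordCount(List<String> splits) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = new ArrayList<>();
        for (String split : splits) {
            mapOutputs.add(map(split)); // on a cluster, each split is mapped on the node that stores it
        }
        return reduce(shuffle(mapOutputs));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("big data big insight", "data in motion")));
    }
}
```

In this sketch every split is mapped in the same process; the point of the Hadoop model described on the slide is that each `map` call would instead run on the node holding that split, so only the small interim pairs travel over the network.
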
  20. 20. InfoSphere BigInsights: platform for volume, variety, velocity (V3)
     • Enhanced Hadoop foundation
     • Analytics for V3: text analytics & tooling
     • Usability: Web console, integrated install, spreadsheet-style tool, ready-made “apps”
     • Enterprise class: storage, security, cluster management
     • Integration: connectivity to DB2, Netezza, JDBC databases, SPSS, Cognos, Unica, Coremetrics, Streams, DataStage
     • Enterprise Edition (licensed): business process accelerators (“Apps”), text analytics, spreadsheet-style analysis tool, RDBMS and warehouse connectivity, integrated Web-based console, flexible job scheduler, performance enhancements, Eclipse-based tooling, LDAP authentication, ....
     • Basic Edition (free download): integrated install, online InfoCenter, BigData Univ.
     • Both editions build on Apache Hadoop; the chart plots the editions by enterprise class against breadth of capabilities
  21. 21. InfoSphere BigInsights – A Full Hadoop Stack: Open Source Components and IBM Specific Components
  22. 22. Vestas optimizes capital investments based on 3 petabytes of information. Capabilities utilized: InfoSphere BigInsights, InfoSphere Warehouse.
     • Model the weather to optimize placement of turbines, maximizing power generation and longevity.
     • Reduce time required to identify placement of turbine from weeks to hours.
     • Incorporate 3 PB of structured and semi-structured information flows.
     • Data volume expected to grow to 6 PB.
  23. 23. AGENDA: Big Data Challenges (why is interest growing?); Big Data Technologies (what is Big Data?); IBM Big Data Solutions (IBM BigInsights and IBM Streams), current topic: InfoSphere Streams
  24. 24. IBM InfoSphere Streams for companies who need to…
     • Deal with terabytes of data each second
     • Work with application, sensor and internet data, video/audio
     • Deliver insight in microseconds to analytical applications
     • Support complex scenarios using C++ or Java code
     • Integrate with existing analytics & data warehousing investments
     • Diagram labels: real-time delivery; powerful analytics; millions of events per second; microsecond latency; traditional / non-traditional data sources. Example uses: ICU monitoring, environment monitoring, algo trading, telco churn prediction, cyber security, smart grid, government / law enforcement
  25. 25. Stream Computing – Analyze Data in Motion
     • Traditional computing: historical fact finding; find and analyze information stored on disk; batch paradigm, pull model; query-driven (submits queries to static data)
     • Stream computing: current fact finding; analyze data in motion, before it is stored; low latency paradigm, push model; data-driven (bring the data to the query)
  26. 26. Big Data in Real-Time with InfoSphere Streams: Filter / Sample, Modify, Annotate, Fuse, Classify
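
To make the operator names on this slide concrete, here is a minimal sketch of a push-style pipeline in plain Java. It is not InfoSphere Streams code and does not use any Streams API: the class `StreamOperatorsSketch`, the helper names, the log lines, and the `host=lab01` annotation are all hypothetical. Only Filter, Modify, Annotate and Classify are shown; Sample and Fuse are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch: each tuple is pushed through the operators as it arrives,
// instead of being stored first and queried later.
class StreamOperatorsSketch {

    // Filter: forward a tuple downstream only if it passes the predicate.
    static <T> Consumer<T> filter(Predicate<T> keep, Consumer<T> downstream) {
        return tuple -> {
            if (keep.test(tuple)) {
                downstream.accept(tuple);
            }
        };
    }

    // Modify, Annotate and Classify all transform a tuple and push the result on.
    static <T, R> Consumer<T> transform(Function<T, R> fn, Consumer<R> downstream) {
        return tuple -> downstream.accept(fn.apply(tuple));
    }

    static List<String> run(List<String> incoming) {
        List<String> sink = new ArrayList<>();
        Consumer<String> store = sink::add;
        // The pipeline is wired from the last operator back to the first.
        Consumer<String> classify = transform(
                line -> (line.contains("ERROR") ? "HIGH" : "LOW") + " | " + line, store);
        Consumer<String> annotate = transform(line -> "host=lab01 " + line, classify);
        Consumer<String> modify = transform(line -> line.trim(), annotate);
        Consumer<String> pipeline = filter(line -> !line.contains("DEBUG"), modify);

        for (String line : incoming) {
            pipeline.accept(line); // push model: data drives the computation
        }
        return sink;
    }

    public static void main(String[] args) {
        run(List.of("DEBUG heartbeat", "  ERROR db call timeout ", "WARN disk 80% full"))
                .forEach(System.out::println);
    }
}
```

The design choice worth noting is that no operator holds the full data set: each tuple is handled once and discarded or forwarded, which is the "data in motion" idea from the previous slide.
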
  27. 27. Asian Telco reduces billing costs and improves customer satisfaction. Capabilities: Stream Computing, Analytic Accelerators.
     • Real-time mediation and analysis of 5B CDRs per day
     • Data processing time reduced from 12 hrs to 1 min
     • Hardware cost reduced to 1/8th
     • Proactively address issues (e.g. dropped calls) impacting customer satisfaction
  28. 28. Most Use Cases Combine Technologies
     • Dimensions: Variety, Volume, Velocity
     • Combination of non-traditional/internet data with traditional data
     • Streams filters incoming data
     • Diagram labels: InfoSphere BigInsights, InfoSphere Streams, Traditional Warehouse, Data, Reuse analytic models, IBM Data Warehouse, Persistent Data, In-Motion Data
  29. 29. Big Data Patterns: common Big Data and Warehouse patterns. In each diagram, App / BI and Visualization / Exploration tools sit on top of BigInsights (unstructured data) and the Warehouse (structured data).
     • Separate unstructured & structured analysis
     • Common analysis of structured and unstructured data
     • Warehouse and BigInsights partitioning
     • Warehouse batch offload
  30. 30. Big Data Patterns: common Big Data and Warehouse patterns. In each diagram, a real-time App / BI and an analytic App / BI sit on top of Streams (streaming data), BigInsights (unstructured data) and the Warehouse (structured data).
     • In motion, at rest analysis with BigInsights
     • In motion and at rest applications
     • In motion, at rest analysis of structured and unstructured data
     • In motion, structured at rest analysis
  31. 31. AGENDA: Big Data Challenges (why is interest growing?); Big Data Technologies (what is Big Data?); IBM Big Data Solutions (IBM BigInsights and IBM Streams); Big Data in action (our Customer Center showcase experience)
  32. 32. Big Data Use Cases and customer outcomes. Findings from the research collaboration of IBM Institute for Business Value and Saïd Business School, University of Oxford.
     • Big data objectives: customer-centric outcomes, operational optimization, risk / financial management, new business model, employee collaboration. These are the top functional objectives identified by organizations with active big data pilots or implementations. Responses have been weighted and aggregated.
     • Big data sources: respondents were asked which data sources are currently being collected and analyzed as part of active big data efforts within their organization.
  33. 33. Operations / Performance Data is Exploding
     • A typical enterprise with 5000 servers, running 125 applications across 2 to 3 data centers, generates in excess of 1.3 TB of data per day.
     • Data ratio: only 3% of the data generated is operations-oriented metric data; 97% is made up of unstructured or semi-structured data.
     • Workloads are running on heterogeneous platforms.
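
As a quick worked example of what these figures imply, the sketch below derives daily volumes from the numbers on the slide (1.3 TB per day, 5000 servers, a 3% / 97% split). The class name `DataVolumeSketch` is hypothetical, and the conversion of 1 TB = 1000 GB = 1,000,000 MB is an assumption (decimal units); the slide does not say which convention it uses.

```java
// Hypothetical worked arithmetic based on the slide's figures.
class DataVolumeSketch {
    static final double TOTAL_TB_PER_DAY = 1.3;   // from the slide ("in excess of")
    static final int SERVERS = 5000;              // from the slide
    static final double METRIC_SHARE = 0.03;      // from the slide
    static final double GB_PER_TB = 1000.0;       // assumption: decimal units
    static final double MB_PER_GB = 1000.0;       // assumption: decimal units

    // Operations-oriented metric data per day, in GB.
    static double metricGbPerDay() {
        return TOTAL_TB_PER_DAY * GB_PER_TB * METRIC_SHARE;
    }

    // Unstructured or semi-structured data per day, in GB.
    static double unstructuredGbPerDay() {
        return TOTAL_TB_PER_DAY * GB_PER_TB * (1.0 - METRIC_SHARE);
    }

    // Average data generated per server per day, in MB.
    static double mbPerServerPerDay() {
        return TOTAL_TB_PER_DAY * GB_PER_TB * MB_PER_GB / SERVERS;
    }

    public static void main(String[] args) {
        System.out.printf("metric data:       %.0f GB/day%n", metricGbPerDay());
        System.out.printf("unstructured data: %.0f GB/day%n", unstructuredGbPerDay());
        System.out.printf("per server:        %.0f MB/day%n", mbPerServerPerDay());
    }
}
```

Under these assumptions, roughly 39 GB per day is metric data, about 1261 GB per day is unstructured or semi-structured, and each server contributes about 260 MB per day on average.
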
  34. 34. Log Analysis: Problem Characteristics
     • Several thousand log files collected daily, with data collected over several years: infrastructure (servers, networks, storage), middleware (app server, web server, database server, messaging server), apps
     • Value in collocating and co-analyzing the above data
     • Millions of files, petabytes of data in total, terabytes produced per day
     • The relationships between logs (links shown in the diagram) have to be discovered
     • A large percentage of storage in an enterprise is for log data
     • Analysis of log data has many challenges: collection and parsing of data; interpretation of logs; SMEs flooded with common bugs; lack of a joined-up view; reactive rather than proactive
     • Example (diagram with apps, load balancer, app servers and a replicated database): one replica stops responding, causing a fraction of database calls to time out, which leads to intermittent failures in the application.
  35. 35. Central Lab Platform – Before: the consequences of scattered infrastructures for hands-on classes are high costs and business transformation roadblocks.
  36. 36. Central Lab Platform – After: the scattered infrastructures were transformed into a centralized, consolidated hands-on Cloud Platform.
  37. 37. Central Lab Platform Cloud Architecture. Diagram columns: Self-Serve Portal, Service Request, Service Provisioning, Dynamic Infrastructure. Diagram labels: Class Manager; Teacher & Students; Management; CLP Cloud Management; Front-end; Internet access; Web Portal; VPN; Planning; Reporting; Invoicing; Reservation; CLP Application engine; Setup manager; Shared CLP Resources; TA; CLP; TPM; Daily repl.; Workflows & Scripts; TA DB; CLP DB
  38. 38. Process diagram for log analysis
     • 700 servers: Unix, Windows, Mainframe, HMC, BladeCenter
     • 170 storage servers: DS8000, V7000, SVC
     • 180 switches: SAN, LAN
     • Cloud management & applications: TPM, Odina, Citrix, Aventail, scripts
     • Business applications: Labs Reservation Portal, Problem Tracking
  39. 39. Big Data Project Trends & Directions: two major front-end objectives to demonstrate the benefit of Big Data
     • Navigating Enterprise Information: “Leverage Big Data Business Value”
       360° Operational View: to accelerate incident resolution
       360° Business View: to provide metrics and insight (Cloud data center utilization: data center business view; training labs: data center’s customer business view)
     • Predictive Incident Alerting: “Act Proactively on Incident”
       Create predictive models based on log history to alert before an incident occurs
       Reduce the number of incident tickets
  40. 40. How the support team works today: many applications, dispersed information
  41. 41. Navigating Enterprise Information: 360° Operational View. [Screenshot of the search portal for a query on ticket 153494. Panels shown: global status per service (Citrix, Network, Storage, Production servers, NIM, TPM) with Warn / Error / Down / Up counts; matching documentation by category (lab setup guides, course exercises, best practices, how-tos, and others); open tickets by severity over the last 24 hours; related Lotus Notes messages; and earlier tickets on course AN14 with their assignee, status and priority.]
  42. 42. Navigating Enterprise Information: 360° Business View
     • Show metrics: number of running courses versus number of logged-in students (typically extracted through Big Data log analysis); cumulative time usage per course/session; server and storage usage; electricity consumption
     • Create correlated views: electricity consumption versus number of courses running; consolidated view per Global Training Partner
     • Analyze operations: number of tickets per course brand, per course, per geo; average resolution time per incident type; top 10 incidents by frequency; top 10 incidents per geo; top 10 courses per geo
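
A metric such as "top 10 incidents by frequency" boils down to a count-and-rank over ticket records. The sketch below shows one way to compute it in plain Java; the class `TopIncidentsSketch`, the method name, and the incident type labels are hypothetical, and the real portal's data model is not described in the slides.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: rank incident types by how often they occur.
class TopIncidentsSketch {

    // Returns the n most frequent incident types with their counts,
    // most frequent first; ties are broken alphabetically.
    static List<Map.Entry<String, Long>> topByFrequency(List<String> incidentTypes, int n) {
        // Count occurrences per type; TreeMap keeps the keys in alphabetical order.
        Map<String, Long> counts = incidentTypes.stream()
                .collect(Collectors.groupingBy(type -> type, TreeMap::new, Collectors.counting()));
        // Sort by count, descending; the sort is stable, so ties stay alphabetical.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative incident types only, not taken from real ticket data.
        List<String> tickets = List.of("citrix-login", "citrix-login", "hmc-login",
                "nim-offline", "citrix-login", "hmc-login");
        topByFrequency(tickets, 2)
                .forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));
    }
}
```

The same grouping step, keyed by geography or course instead of incident type, would give the other "top 10" views listed on the slide.
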
  43. 43. To learn more: the IBM Tivoli product to monitor and analyse machine logs is IBM Log Analytics. Download the presentation from the Pulse2013 site: Session 1844, "Problem Determination and Resolution in Minutes Using Unstructured Data Analytics", by Martin O’Brien (Product Manager) and Geetha Adinarayan (Client Best Practices Lead).
  44. 44. BIG thank you for your attention
  45. 45. Acknowledgements and Disclaimers: Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results. © Copyright IBM Corporation 2013. All rights reserved. U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Please update paragraph below for the particular product or family brand trademarks you mention such as WebSphere, DB2, Maximo, Clearcase, Lotus, etc. IBM, the IBM logo, [IBM Brand, if trademarked], and [IBM Product, if trademarked] are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information”. If you have mentioned trademarks that are not from IBM, please update and add the following lines: [Insert any special 3rd party trademark names/attributions here] Other company, product, or service names may be trademarks or service marks of others.