The Future of Data Warehousing, Data Science and Machine Learning


Watch the on-demand recording here:
Evolution of Big Data and the Role of Analytics | Hybrid Data Management
IBM, Driving the future Hybrid Data Warehouse with IBM Integrated Analytics System.

Published in: Data & Analytics

  1. IBM Cloud © 2018 IBM Corporation. Modernize your Data Warehouse with IBM Integrated Analytics System. Thomas Chu, Director, Offering Management, Hybrid Data Management, IBM Analytics
  2. Agenda
     • Evolution of Big Data and the Role of Analytics
     • Hybrid Data Management
     • IBM, Driving the future Hybrid Data Warehouse with IBM Integrated Analytics System
  3. Global data growth: by 2025, 163 trillion gigabytes of data will be created.
  4. Why you need a hybrid data strategy
     • 0.5% of all data is actually analyzed (MIT Technology Review)
     • 10% increase in data accessibility will result in more than $65 M additional net income (Baseline Magazine)
     • 80% of all data is stored by corporations (Baseline Magazine)
     • 50% of large enterprises will have hybrid cloud deployments by the end of 2017 (IBM Institute for Business Value)
     Data is proliferating, often stored in different locations and formats. It’s getting more difficult to provide data access and analytics to the business.
  5. As data becomes more accessible, it provides more value. None of this is possible without the right hybrid data management strategy!
     [Diagram: value from data across three stages (Data Driven, Insight Driven, Digital Transformation), with the labels Outcomes, Capabilities and Drivers and a “Most are here” marker. Items shown: Culture Change; Breaking Silos; Discover “What”; Understand “Why”; Self Service; Reports; Business Intelligence; Prediction; Optimization; Automation; Collaboration; Models; Visualization; Applications; New Business Models; Disruptive Technology; Real-Time Decisions; Instrumentation; Orchestration; Integration; Competitive; Cost Reduction; Modernization; Market Leader.]
  6. What is your organization trying to solve?
     • Innovation: Create and own the data management strategy, leverage data virtualization and cloud.
     • New Data Types: The ability to integrate unstructured, semi-structured and structured data into a single analytic architecture. Leverage both SQL and NoSQL data sources.
     • Flexibility: Ability to choose between a flexible set of deployment and licensing models, workload types, technologies, data sources and storage tiers.
     • Efficiency: Optimize data architecture and life cycle management to reduce cost, increase performance and protect existing investments in skills, applications and ecosystem.
     • Enterprise-strong: Address data sprawl, workloads and open source technologies that can scale with the business in a highly and continuously available manner.
     • Portability: The ability to move data and insights where needed, without the requirement to re-architect applications. Write-once-run-anywhere application architecture.
  7. Agenda
     • Evolution of Big Data and Analytics
     • Hybrid Data Management
     • IBM, Driving the future Hybrid Data Warehouse with IBM Integrated Analytics System
  8. Digital transformation journey with hybrid data management
     • More intelligent analytics and insights
     • Go at the speed of your business
     • Write once, run anywhere, from any source
     Three pillars:
     • Hybrid Data Management: Deploy your data where you need it. Write once, access anywhere with a common access layer to promote application independence.
     • Unified Governance & Integration (Organize Your Data): Prepare, publish and protect your data to drive insights while mitigating compliance risks.
     • Data Science & Business Analytics (Analyze Your Data): Descriptive, predictive, prescriptive to understand the current, predict the future and change outcomes.
     Infused with Machine Learning. Seamless between On-Premises and Cloud. Powered by Common SQL Engine.
  9. IBM’s strategy is… Hybrid
     • NOT about Cloud OR On-premises. It’s about Cloud AND On-premises.
     • NOT about Traditional Relational OR Open Source. It’s about Traditional Relational AND Open Source.
     • NOT about SQL OR NoSQL. It’s about SQL AND NoSQL.
     • NOT about Structured OR Unstructured Data. It’s about Structured AND Unstructured Data.
  10. IBM Hybrid Data Management solutions: Built on a Common SQL Engine
     Anchored by a Common SQL Engine enabling true, highly scalable hybrid data warehousing solutions with portable analytics.
     • Application Agility: Write once, run anywhere. One ISV product certification for all platforms.
     • Operational compatibility: Reuse operational and housekeeping procedures.
     • Standardized analytics: Common programming model for in-DB analytics.
     • Common Skills: One skill set for all deployments. Drive higher efficiencies and portfolio rationalization.
     • Licensing: Flexible entitlements for business agility and cost-optimization.
     • Integration: Common Data Virtualization capabilities for query federation and data movement.
     Offerings:
     • Managed public Cloud DBaaS: Db2 Warehouse on Cloud
     • Software defined warehouse on-premises or in cloud: Db2 Warehouse
     • Custom deployable database: Db2
     • Open source Hadoop with Hortonworks: Big SQL
     • Dedicated analytics appliance: IBM Integrated Analytics System
  11. Agenda
     • Evolution of Big Data and Analytics
     • Hybrid Data Management
     • IBM, Driving the future Hybrid Data Warehouse with IBM Integrated Analytics System
  12. Introducing the IBM Integrated Analytics System: A Next Generation Hybrid Data Warehouse That Does Data Science Faster
     • Cloud-ready to support multiple workload deployment options
     • Built-in IBM Data Science Experience to collaboratively analyze data
     • Optimized for high performance to support the broadest array of workload options for structured and unstructured data in your hybrid data management infrastructures
     • Reliable, elastic and flexible system that reduces and simplifies management resources
     • Real time analytics with machine learning that accelerates decision making, bringing new opportunities to the business; ready for business analysts and data scientists
     • Leverages a Common SQL Engine for workload portability and skill sharing across public and private cloud
  13. Evolution of Netezza and PureData System for Analytics
     • Timeline: 2003, 2006, 2009, 2010, 2012, 2014, Sept 2017, Future
     • Products: NPS® 8000 Series; NPS® 10000 Series; TwinFin™; TwinFin™ with i-Class Advanced Analytics; PureData System for Analytics N2000; PureData System for Analytics N3000; IBM Integrated Analytics System (NEW)
     • Milestones: World’s first Data Warehouse appliance; World’s first 100 TB Data Warehouse appliance; World’s first petabyte Data Warehouse appliance; World’s first Analytic Data Warehouse appliance; World’s fastest and “greenest” analytical platform; World’s First Hybrid Data Warehouse and Data Science Platform
  14. Hardware Architecture Overview
     • 2x Mellanox 10G Ethernet switches: 48x 10G ports; 12x 40/50G ports; dual switches form resilient network
     • IBM SAN64B 32G Fibre Channel SAN: 16Gb FC Switch; 48x 32Gb/s SFP+ ports
     • Up to 3 Flash Arrays in 1 rack, containing: IBM FlashSystem 900; dual Flash controllers; Micro Latency Flash modules; 2-Dimensional RAID5 and hot swappable spares for high availability
     • 7 Compute Nodes in 1 rack, containing: IBM Power 8 S822L 24 core server 3.02GHz; 512 GB of RAM (each node); 2x 1.2TB SAS HDD; Red Hat® Linux OS
     • User data capacity: 324 TB (assumes 4x compression)
     • Power requirements: 9.4 kW
     • Cooling requirements: 32,000 BTU/hr
     • Scales from: 1/3rd Rack to 8 Racks (initial GA is 1/3rd to 4 Racks and supports Tier Storage expansion)
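The power and cooling figures on the hardware slide can be cross-checked against each other, since essentially all electrical power drawn by a rack ends up as heat. The sketch below is only a sanity check, not an IBM sizing method; the function name is hypothetical and the only outside fact it uses is the standard conversion 1 W ≈ 3.412142 BTU/hr.

```python
# Standard unit conversion: 1 watt is about 3.412142 BTU per hour.
BTU_PER_HR_PER_WATT = 3.412142


def kw_to_btu_per_hr(kilowatts: float) -> float:
    """Convert an electrical load in kW to the equivalent heat load in BTU/hr."""
    return kilowatts * 1000 * BTU_PER_HR_PER_WATT


# 9.4 kW is the power requirement quoted on the slide for one rack.
heat_load = kw_to_btu_per_hr(9.4)
print(f"9.4 kW is roughly {heat_load:,.0f} BTU/hr")
```

The result lands close to the quoted 32,000 BTU/hr, which suggests the cooling requirement on the slide is simply the rounded heat equivalent of the power draw.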
  15. IBM Integrated Analytics System Configurations
     Components: IBM Power 8 S822L 24 core server 3.02GHz; IBM FlashSystem 900; Mellanox 10G Ethernet switches; Brocade SAN switches. In-place expansion and tiered storage.

     | Model | -003 | -006 | -010 | -020 | -040 |
     |---|---|---|---|---|---|
     | Size | 1/3 Rack | 2/3 Rack | Full Rack | 2 Racks | 4 Racks |
     | Servers | 3 | 5 | 7 | 14 | 28 |
     | Cores | 72 | 120 | 168 | 336 | 672 |
     | Memory | 1.5 TB | 2.5 TB | 3.5 TB | 7 TB | 14 TB |
     | Available User Space (1) | 27 TB | 54 TB | 81 TB | 162 TB | 324 TB |
     | Optional Tier Storage (Flash + HDD), Available User Space (1, 2) | 32 TB + 166 TB | 32 TB + 299 TB | 32 TB + 432 TB | 64 TB + 831 TB | 128 TB + 1,629 TB |

     (1) Assumes up to 4x compression to calculate user data (pre-load uncompressed user data). Example: full rack user data capacity = 4 x 81 TB = 324 TB.
     (2) Example: total user data capacity for full rack Tier Storage models = 4 x 81 TB + 4 x 32 TB (Flash) + 4 x 432 TB (HDD) = 2,180 TB.
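The configuration figures above follow from a few per-node numbers, so the arithmetic can be sketched in a few lines. This is an illustrative calculation only: the `Config` class and its method names are hypothetical, the per-server values (24 cores, 512 GB RAM) come from the hardware slide, and the 4x factor is the "up to 4x" compression assumption from the footnote, so the capacities are best-case estimates.

```python
from dataclasses import dataclass

CORES_PER_SERVER = 24        # IBM Power 8 S822L, per the hardware slide
MEMORY_GB_PER_SERVER = 512   # RAM per compute node, per the hardware slide
COMPRESSION_RATIO = 4        # "up to 4x" per the footnote, so a best case


@dataclass(frozen=True)
class Config:
    model: str
    servers: int
    base_tb: int        # available user space before compression
    tier_flash_tb: int  # optional tier storage, flash portion
    tier_hdd_tb: int    # optional tier storage, HDD portion

    @property
    def cores(self) -> int:
        return self.servers * CORES_PER_SERVER

    @property
    def memory_tb(self) -> float:
        return self.servers * MEMORY_GB_PER_SERVER / 1024

    def user_data_tb(self, with_tier: bool = False) -> int:
        """Pre-load (uncompressed) user data the configuration could hold."""
        stored = self.base_tb
        if with_tier:
            stored += self.tier_flash_tb + self.tier_hdd_tb
        return stored * COMPRESSION_RATIO


# Values copied from the configuration table.
CONFIGS = {
    c.model: c
    for c in [
        Config("-003", 3, 27, 32, 166),
        Config("-006", 5, 54, 32, 299),
        Config("-010", 7, 81, 32, 432),
        Config("-020", 14, 162, 64, 831),
        Config("-040", 28, 324, 128, 1629),
    ]
}

for cfg in CONFIGS.values():
    print(cfg.model, cfg.cores, cfg.memory_tb,
          cfg.user_data_tb(), cfg.user_data_tb(with_tier=True))
```

For the full rack (`-010`) this reproduces the two footnote examples: 324 TB without tier storage and 2,180 TB with it. Actual capacity depends on how well a given data set compresses.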
  16. IBM Integrated Analytics System Console
  17. Always Available Analytics
     • Redundancy to ensure no single point of failure: fault tolerant design to ensure continued operation in the event of hardware failure
     • Hardware components with 99.999% reliability: built with IBM Power and IBM FlashSystem reliability, combined with automated failovers for application continuity
     • Single monitoring solution for all of your data: IBM Data Server Manager can easily monitor and manage all components on the systems and can be used across all of your data
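To put the 99.999% figure in perspective, the sketch below converts a percentage into minutes of downtime per year. It assumes the slide's "reliability" can be read as availability (uptime percentage), which the slide does not state explicitly, and it assumes independent failures when combining components. The function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Expected unavailable minutes per year at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def series_availability(*component_pcts: float) -> float:
    """Availability (%) of a chain where every component must be up.

    Assumes failures are independent; redundancy and failover, which the
    slide also mentions, would raise the real figure above this.
    """
    combined = 1.0
    for pct in component_pcts:
        combined *= pct / 100
    return combined * 100


print(f"{downtime_minutes_per_year(99.999):.2f} minutes per year at 99.999%")
print(f"{series_availability(99.999, 99.999):.6f}% for two such parts in series")
```

At five nines a single component would be unavailable for a little over five minutes a year, which is why the slide pairs component reliability with redundancy and automated failover rather than relying on either alone.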
  18. Expansions and Upgrades
     • In-place incremental expansion: reduce disruptions to your analytics systems as you scale out compute power
     • In-place tiered storage expansion: independently scale storage for cost effective capacity growth
     • Cloud-ready: tools to shift workloads within hybrid public/private cloud and on-premises environments based on your application requirements
     • Cost efficient multi-temperature storage: most frequently accessed data (“hot”) on faster flash storage; less frequently accessed data (“colder”) on cost efficient storage systems
  19. Data Science and Hybrid Data Management with IBM Integrated Analytics System: the new use case (Machine Learning Demo on YouTube)
     • Stock Portfolio Analytics: applications leverage in-database Machine Learning (ML) models and R analytics
     • High performance IBM Integrated Analytics System: stock and customer portfolio data (on-premises)
     • DSX Local
     • Db2 Warehouse on Cloud (structured data store)
     • IBM Big SQL on Hortonworks Data Platform (Hadoop) (unstructured data store)
     • External data sources: macroeconomic data feeds (source: FRED); news data feeds (source: NASDAQ)
     • Move data and federate queries with Common SQL Engine
  20. Time lost, unexpected costs, and limited access to data add up to missed opportunities
     Analysis of viewership data generated from fragmented audiences in this multi-platform, multi-channel business takes a lot of time, money, and resources. AMC Networks’ Business Intelligence team spent 80% of their time evaluating audience data and only 20% doing actual research, making it challenging to uncover the insights they needed, when they needed them.
     • NEED: To combine, store, and quickly analyze third-party ratings and viewer data within a logical data warehouse.
     • CHALLENGES: Requires a simple method to pull together disparate data sources. Solution must support an integrated data science and analytics platform.
  21. Do Data Science Faster: IBM Integrated Analytics System uses cognitive machine learning to assist your data scientists, all collaborating inside one unified platform.
     Support Hybrid Workloads: IBM Fluid Query federates queries across all your data repositories with a single, shared API.
     Support Hybrid Deployments: IBM Common SQL Engine enables logical data warehousing on open standards, across on-premises and cloud deployments.
     “The combination of high performance and advanced analytics – from the Data Science Experience to the open Spark platform – gives our business analysts the ability to conduct intense data investigations with ease and speed... The Integrated Analytics System is positioned as an integral component of an enterprise data architecture solution, connecting IBM Netezza Data Warehouse and IBM PureData System for Analytics, cloud-based Db2 Warehouse on Cloud clusters, and other data sources.” Vitaly Tsivin, Executive Vice President, AMC Networks
  22. Get Started Today: Learn More, Try It Out, Start Your Journey
     • Learn more: Visit: marketplace on; Read: Now is perfect time to move from Netezza to the Integrated Analytics System Solution Brief; Read: Integrated Analytics System-Do Data Science Faster Solution Brief; Visit: Integrated Analytics System content hub
     • Trials and downloads: Trial: Contact us to get started, IBM Marketplace; Try it: Proof of Technology
     • DataFirst Method: Engage the IBM DataFirst Method to build the strategy, expertise, and roadmap needed to gain the most value from data and achieve your goals
     • IBM Knowledge Center; IBM Integrated Analytics System YouTube Channel; Data Warehouse User Community
  23. Future Statements: IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Do not distribute.
     Thank you