IOD 2013 Jackman Schwenger


Scientific Research with DBaaS on IBM PureApplication System & PureData System for Transactions

Published in: Technology, Education

  1. Scientific Research with DBaaS on IBM PureApplication System & PureData System for Transactions
     Session IPT-1961A
     Tom Jackman, DRI; Maria N. Schwenger, IBM; Vikram Khatri, IBM
     © 2013 IBM Corporation
  2. Please note
     IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
  3. Acknowledgements and Disclaimers
     Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
     © Copyright IBM Corporation 2012. All rights reserved.
     • U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
     IBM, the IBM logo, WebSphere, DB2, PureSystems, PureData and PureApplication System are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both.
If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at Other company, product, or service names may be trademarks or service marks of others.
  4. Assumptions: What we expect you to know
     • You have a good understanding of cloud computing concepts
     • You have a reasonable working knowledge of relational database design, principles, and architecture
       o Some knowledge of DB2 and its features (e.g., DB2 HADR, DB2 pureScale)
     • You are familiar with the IBM PureSystems family
       o You are aware of the value of pattern-based deployments in IBM PureSystems
     • Application architecture knowledge is preferred, but not essential
     • Knowledge of DBaaS principles is highly appreciated!
  5. Agenda: What is this presentation about?
     • The Nature of Scientific Data
       o One client’s perspective
       o Scientific Data (SD) vs. Business Data (BD)
       o High reliability and availability for SD management
     • Database-as-a-Service (DBaaS)
       o Why DBaaS and why now?
       o Scientific research and DBaaS
       o DBaaS in PureSystems
  6. About Desert Research Institute (DRI)
     Applied research addressing environmental issues globally
     • Non-profit research arm of the Nevada System of Higher Education
     • More than 550 scientists, engineers and technicians
     • Campuses in Reno and Las Vegas
     • 60 specialized labs & research facilities (e.g., Virtual Reality lab)
     • Non-tenured, entrepreneurial faculty
     • 300 research projects happening on all continents
     • $459 million in sponsored research projects since 2000
  7. The Story
     • Emergence of innovation-based economy
     • Disruption by knowledge-based technology
     • Non-traditional science institute (DRI) adapting
     • Academia-Government-Industry partnerships
     • Catalyzing change with IBM PureSystems
     • New science, new engineering, new model
     [Figure: Government, Academia, Industry and Society cooperating on shared values through innovation clustering. Government: empowering, responsive, fiscally prudent. Academia: diffusive, relevant, sustainable. Industry: differentiated, competitive, profitable.]
  8. Applied Innovation Center for Advanced Analytics
     Supporting Nevada’s Economic Development with Innovation Services
     ● High Performance Computing
     ● Data Science & Engineering
     ● Cyber-physical Systems
     ● Advanced Visualization
     DATA: acquiring, computing, processing, archiving, correlating, visualizing, exploring, analyzing, mining, …
  9. Why is Scientific Data Important to You?
     • SD has the characteristics of Big Data
     • SD is your facilities data
     • Your BD will become more like SD
     • To remain competitive, you need research data
     • SD is relevant to your region/planet/solar system/galaxy/universe
     Sources: Bob Violino, “New IDC Research shows Impact of Big Data on High Performance Computing Systems,” October 28, 2013; Gary M. Johnson, “Convergence: HPC, Big Data & Enterprise Computing,” October 28, 2013
  10. The Evolution of Scientific Investigation
     • Ancient Greece: Observation
     • Renaissance – Enlightenment: Observation, Experimentation
     • Industrial Revolution – Atomic Age: Observation, Experimentation, Theory
     • Electronics Age: Observation, Experimentation, Theory, Computation
     • Data and Communications Age: Observation, Experimentation, Theory, Computation, Telemetry
  11. SD Management
     • Structured, semi-structured or unstructured
     • Heterogeneous (sources, units, types, dimensions)
     • Reliance on arrays and other complex data structures
     • Large data objects; sensitive to I/O & network performance
     • Distributed data repositories
     • Repositories are open, or not
     • Datasets are cleansed, or not
     • Many protocols, too few (persistent) standards
     • Increasing need for rigorous data provenance
  12. SD is Heterogeneous
     • Structures: raster, vector, point, relational, human-derived (documents, lab notes, social)
     • Atomic types*#: array, image, table, tuple, string, reference
     • Popular formats: HDF5, netCDF, SEG-Y, FITS, Shapefile, XML, 3DXML, JSON
     * Structures can be composed of type float, double, integer, fixed-point, categorical, binary, string
     # Data may be noisy and have associated uncertainties
  13. Sources of SD
     • In situ sensing
       o Sensor arrays
       o RFID
       o Smart meters
       o Surveillance
     • Remote sensing
       o Active
       o Passive
       o Aircraft
       o Orbital craft (satellite)
     • Computed/Simulated
       o Forecasts
       o Earth models
       o Hydro models
       o Brain simulations
     • Machine-derived
       o Seismograms
       o Tomograms
       o Gene sequencers
       o Accelerators
     • Human-derived (text, media)
     [Figure: sensor node diagram with sensors, actuators, ADC, DAC, I/O, μP, RAM, ROM, NVM, Rx and Tx]
  14. Patterns of SD Database Design
     • Design 0: File-based approaches
       o Ad hoc management system lacking high availability
     • Design 1: RDBMS
       o Data is relational or can be made relational
     • Design 2: Metadata in RDBMS
       o Only metadata abstraction is kept in relational database
     • Design 3: Metadata in RDBMS with file pointers
       o Metadata is kept in relational database
       o File pointers to non-relational data also included in RDBMS
     • Design 4: ETL subsets into a working RDBMS
       o Spatially register, temporally synchronize, and coherently fuse data extractions for use in a “working” database
     • Design 5: NoSQL DBMSs
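Design 3 is the easiest of these to show in a few lines. The sketch below is a minimal illustration under stated assumptions: SQLite (Python's built-in `sqlite3`) stands in for the RDBMS, and the `dataset` table, its columns, the instrument names and the `file:///archive/...` URIs are all hypothetical. Only metadata and a pointer to each file live in the relational store; the payload stays in its native format.

```python
import sqlite3

# Hypothetical "Design 3" schema: metadata in the RDBMS, plus a pointer
# (URI) to the non-relational payload that stays on disk as a file.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dataset (
        id          INTEGER PRIMARY KEY,
        instrument  TEXT NOT NULL,
        fmt         TEXT NOT NULL,   -- e.g. 'SEG-Y', 'netCDF', 'HDF5'
        start_time  TEXT NOT NULL,   -- ISO 8601, so string order = time order
        file_uri    TEXT NOT NULL    -- pointer to the file, not the data itself
    )
    """
)
rows = [
    ("seismometer-A", "SEG-Y", "2013-06-01T00:00:00", "file:///archive/seis/a_20130601.sgy"),
    ("radiometer-B", "netCDF", "2013-06-01T00:00:00", "file:///archive/rad/b_20130601.nc"),
    ("seismometer-A", "SEG-Y", "2013-06-02T00:00:00", "file:///archive/seis/a_20130602.sgy"),
]
conn.executemany(
    "INSERT INTO dataset (instrument, fmt, start_time, file_uri) VALUES (?, ?, ?, ?)",
    rows,
)

def find_files(conn, instrument, since):
    """Query metadata only; return the file pointers an analysis job should open."""
    cur = conn.execute(
        "SELECT file_uri FROM dataset "
        "WHERE instrument = ? AND start_time >= ? ORDER BY start_time",
        (instrument, since),
    )
    return [uri for (uri,) in cur]

print(find_files(conn, "seismometer-A", "2013-06-02T00:00:00"))
# ['file:///archive/seis/a_20130602.sgy']
```

An analysis job would then open only the returned files with a format-specific reader; the query itself never touches the payload.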
  15. Accessing Applications for SD
     SD access patterns:
     • Large and bursty
     • Coupled to data analysis applications
       o Data integration
       o Feature extraction, segmentation
       o Interpolation, regression, kriging
       o Correlation: ~O(N²) complexity
       o Pattern discovery: naively, ~O(N⁴) complexity
       o Classification
     • Access to software applications and hardware processors needs to be part of the design
     [Figure: Data and APP connected over a network. Where is each of these located? Full-service cloud: minimal data movement]
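To make the correlation cost concrete: if N is taken to be the number of data series (the slide does not define N, so this is an assumption), correlating every pair takes N(N-1)/2 comparisons, which grows as O(N²). A minimal pure-Python sketch with hypothetical sensor series:

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length series (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def all_pairs_correlation(series):
    """Correlate every pair of series: N*(N-1)/2 pairs, O(N^2) in the number of series."""
    return {
        (i, j): pearson(series[i], series[j])
        for i, j in combinations(range(len(series)), 2)
    }

# Hypothetical example: four short sensor time series
series = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 4.0],
]
result = all_pairs_correlation(series)
print(len(result))  # 6, i.e. 4 * 3 / 2 pairs
```

Doubling the number of series roughly quadruples the number of pairs, which is why such analyses are sensitive to where the data and the compute sit.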
  16. Jim Gray’s Rules for Database-centric Computing
     1) Scientific computing is increasingly data intensive
     2) The applications need a scale-out architecture
     3) Bring computations to the data, rather than the other way around
     4) Design the database environment around 20 queries
     5) Be agile, be modular, design for change
  17. Examples of SD Databases
     • Sloan Digital Sky Survey (SDSS)
       o Surveys: (1) 5-band photometric, (2) redshift
       o 5 Tpx images, 120 TB processed, 35 TB catalog
       o Public data resource with JHU as lead institution
       o Rich application portfolio
     • 1000 Genomes Project
       o Part of the Bionimbus scientific cloud (note: ~0.5 TB/genome, ~1 TB/patient)
       o Institute for Genomics & Systems Biology at UChicago
       o Human diversity project using Next Gen Sequencing (NGS)
     Both SDSS and 1000 Genomes are member projects in the Open Science Data Cloud (OSDC).
  18. Cloud-based, High-Availability, Distributed SD
     [Figure adapted from IBM GTO 2013: “The Contextual Enterprise”, with a “Scientific” label. Structured, repeatable, linear: Data Warehouse (data: transaction, client app, OLTP). Unstructured, exploratory, dynamic: Hadoop & Streams (data: sensor, RFID, text). Content accumulation and integration.]
  19. In Summary
     • SD is similar to Big Data: heterogeneous, multi-contextual
     • There is no uniform infrastructure in science
       o Solutions must be flexible and generally interoperable
     • SD needs BD reliability and accessibility
     • SD access is not generally transactional
       o More typically involves large data extractions for analysis
     • There are alternative approaches to reliable SD management
       o RDBMS can be a practical approach to reliable SD access when coupled with application delivery
     • As businesses embrace Big Data, they face similar challenges
     What is DBaaS for science? Why DBaaS for science? How can DBaaS for science be implemented?
  20. Why DBaaS for scientific research?
     Optimization & integration for delivering higher value
     Today, scientific research is starting to rethink its participation, and possible new collaborations, in the different phases of the data lifecycle: data collection, data integration, data analytics, and data presentation.
     • Scientific research is mainly based on HPC practices
       o Often deals with unstructured data & file-based processing
       o Traditionally has not embraced high-availability business solutions
       o Capital cost and funding are significant issues
     • Scientific research is just starting to adopt RDBMS processing (where feasible)
       o Process less data, and only relevant data, producing results faster
       o Improved consumability: forced to integrate with other (e.g., commercial, portal) applications to deliver value
  21. File-driven vs. data-driven processing
     [Figure: file-based processing, with VMs 1 to N each reading GB- to TB-size text files, compared with files loaded into PureData, where a grid of DB2 nodes serves the request and returns an MB-size result]
     • File-based processing: parallel or sequential (!!!) file reads
     • Data-driven processing: files loaded into PureData; a single call to the database (parallelism); only the relevant data set is returned to the user
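The contrast on this slide can be sketched in miniature. This is an illustration only, under several assumptions: SQLite stands in for DB2 on PureData, the per-station CSV files and the `temp > threshold` filter are hypothetical, and the parallelism of a real database grid is not modeled. The file-based path reads every row on the client; the data-driven path loads the files once and then issues a single query that returns only the relevant rows.

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical data: one CSV file per station with (station, day, temp) rows.
# Files are left in a temporary directory for simplicity.
tmpdir = tempfile.mkdtemp()
paths = []
for station in range(3):
    path = os.path.join(tmpdir, f"station_{station}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for day in range(100):
            writer.writerow([station, day, 10 + station + day % 7])
    paths.append(path)

def file_based(paths, threshold):
    """File-based processing: open every file, read every row, filter on the client."""
    hits, rows_read = [], 0
    for path in paths:
        with open(path, newline="") as f:
            for station, day, temp in csv.reader(f):
                rows_read += 1
                if float(temp) > threshold:
                    hits.append((int(station), int(day), float(temp)))
    return hits, rows_read

# Data-driven processing: load the files into the database once...
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (station INTEGER, day INTEGER, temp REAL)")
for path in paths:
    with open(path, newline="") as f:
        # Column affinity converts the CSV text to INTEGER/REAL on insert.
        conn.executemany("INSERT INTO obs VALUES (?, ?, ?)", csv.reader(f))

def data_driven(conn, threshold):
    """...then make a single call; only the relevant rows come back."""
    return conn.execute(
        "SELECT station, day, temp FROM obs WHERE temp > ? ORDER BY station, day",
        (threshold,),
    ).fetchall()

hits, rows_read = file_based(paths, threshold=17)
print(rows_read, len(hits), len(data_driven(conn, threshold=17)))  # 300 14 14
```

Both paths return the same 14 rows, but the file-based path moves all 300 rows to the client on every request, while the data-driven path pays the loading cost once and afterwards returns only the relevant result set.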
  22. What is Database as a Service (DBaaS)?
     On the PureSystems family (private cloud)
     • Delivery of database functionality as a service
       o Defines the architectural and operational approaches of a new service-oriented delivery
       o Often defined as “Database in a Cloud”
     • Characteristics of DBaaS architecture:
       o Self-service interaction models to reduce the complexity of database service delivery: on-demand usage, rapid self-provisioning and management of database instances
       o Multi-tenancy capabilities
       o Elasticity of workloads
       o Multiple levels of high availability
       o Automated resource management and monitoring
       o Metering of database usage (to allow charge-back functionality)
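Metering for charge-back in a multi-tenant setting generally comes down to aggregating usage samples per tenant and pricing them. The sketch below assumes a hypothetical `UsageSample` record, hypothetical tenant and instance names, and made-up rates; it does not reflect how PureApplication System actually meters or prices database usage.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class UsageSample:
    """One hypothetical metering sample for a database instance."""
    tenant: str
    instance: str
    cpu_core_hours: float
    storage_gb_hours: float

# Hypothetical rates; a real charge-back model would come from the provider's price list.
RATE_PER_CORE_HOUR = 0.05
RATE_PER_GB_HOUR = 0.0002

def charge_back(samples):
    """Sum metered usage per tenant across all of its instances and convert it to a charge."""
    totals = defaultdict(float)
    for s in samples:
        totals[s.tenant] += (
            s.cpu_core_hours * RATE_PER_CORE_HOUR
            + s.storage_gb_hours * RATE_PER_GB_HOUR
        )
    return {tenant: round(cost, 2) for tenant, cost in totals.items()}

samples = [
    UsageSample("genomics-lab", "db-01", cpu_core_hours=96.0, storage_gb_hours=24_000.0),
    UsageSample("genomics-lab", "db-02", cpu_core_hours=48.0, storage_gb_hours=12_000.0),
    UsageSample("climate-lab", "db-03", cpu_core_hours=24.0, storage_gb_hours=6_000.0),
]
print(charge_back(samples))
```

Keeping the samples per instance but charging per tenant is what lets several research groups share one platform while each sees only its own bill.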
  23. Why DBaaS? Why now?
     The 4 Vs: Volume, Variety, Velocity, Veracity
     • Database sprawl and infrastructure growth are overwhelming
       o With the growth of data, database infrastructure management has become hugely expensive and complicated, and has introduced many risks
     • Self-service technology is needed
       o Today we need “IT on demand” for fast business response, while keeping up with compliance, reducing risk, and maintaining proper security
     • Cost savings from virtualization & smart IaaS are “a must”
       o Database needs/volumes grow while IT budgets are shrinking
     • Data-driven business decisions are the only way to go
       o The business wants data delivered faster, more simply and more reliably
     • Cost-effectively scaling the data layer
       o Companies are looking to replace the traditional, expensive database/infrastructure model for scaling to enterprise-level SLAs
  24. New Technical Concepts in DBaaS
     • DB Instance: A live database instance
     • DB Image: Similar to an HV/VM image, but for databases
       o A database backup that includes the metadata to reconstitute a deployment
     • DB Clone: The act of creating a DB instance from a DB image
     • DB Pattern: A saved set of provisioning parameters to encourage standardization on the application group side
     • Workload Standard: A package that allows a level of customization for a DB under the virtual application or DB2 Service for Cloud
       o Allows configuration of the OS, DB2 instance, DB2 database
       o Linked with a workload such as OLTP, Datamart, etc.
     • DBaaS: Defines the architectural and operational approaches of a new service-oriented delivery of database functionality (as a service)
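One way to see how these concepts relate is as a small data model. The classes and functions below (`WorkloadStandard`, `DBPattern`, `DBImage`, `DBInstance`, `deploy`, `clone`) are hypothetical names chosen for illustration, not a PureSystems API. In this sketch a pattern captures provisioning parameters plus a workload standard, an image is a backup tied to the pattern it came from, and a clone is a new instance created from an image.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class WorkloadStandard:
    """Hypothetical: customization for OS, DB2 instance and database, tied to a workload type."""
    name: str
    workload: str                                   # e.g. "OLTP" or "Datamart"
    settings: dict = field(default_factory=dict)    # hypothetical key/value settings

@dataclass(frozen=True)
class DBPattern:
    """Hypothetical: a saved set of provisioning parameters."""
    name: str
    workload_standard: WorkloadStandard
    cpu_cores: int
    memory_gb: int

@dataclass(frozen=True)
class DBImage:
    """Hypothetical: a database backup plus the metadata needed to reconstitute a deployment."""
    backup_uri: str
    pattern: DBPattern

@dataclass
class DBInstance:
    """A live database instance."""
    name: str
    pattern: DBPattern
    restored_from: Optional[str] = None

def deploy(name, pattern):
    """Provision a new, empty instance from a pattern."""
    return DBInstance(name=name, pattern=pattern)

def clone(name, image):
    """DB Clone: create an instance from a DB image, reusing the pattern saved with it."""
    return DBInstance(name=name, pattern=image.pattern, restored_from=image.backup_uri)

# Usage with hypothetical names and paths
oltp = WorkloadStandard("research-oltp-standard", "OLTP")
pattern = DBPattern("research-oltp", oltp, cpu_cores=4, memory_gb=16)
prod = deploy("prod_db", pattern)
image = DBImage("file:///backups/prod_db.img", pattern)
test_db = clone("test_db", image)
```

Storing the pattern inside the image is the design choice that makes a clone reproduce the original deployment's parameters, not just its data.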
  25. New operational approaches in DBaaS
     • Single-click provisioning of databases from patterns
     • Linked with a workload such as OLTP, Data mart, etc.
     • Database can be provisioned via cloning (from backup)
     • The database might be part of an application pattern
     • A database might be provisioned from another system: integration between PureApplication System and PureData System for Transactions
       o Use a Workload Standard to enforce your best practices
     • Logs and monitoring are available directly in the console
       o Use context links to navigate for troubleshooting, management and monitoring
     • New considerations on upgrades: system and workload upgrades
     • Use the command line only when feasible
  26. Where is the database?
     [Screenshot: a Maximo deployment from a pattern]
  27. Workload standards and database patterns
     [Screenshot: single-click database deployment]
  28. DB2 HADR pattern in a virtual system on PureApplication System
     [Screenshot callouts: match editions; match versions]
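The screenshot callouts say that the primary and standby in the HADR pattern must match editions and versions. As a hedged illustration only (`DB2Node` and `hadr_pair_problems` are hypothetical, not part of the pattern editor or any DB2 API, and the edition/version strings are examples), a pre-deployment check could look like this:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DB2Node:
    """Hypothetical description of one DB2 server in an HADR pair."""
    hostname: str
    edition: str   # e.g. "Enterprise Server Edition"
    version: str   # e.g. "10.5"

def hadr_pair_problems(primary, standby):
    """Return a list of mismatches; an empty list means editions and versions agree."""
    problems = []
    if primary.edition != standby.edition:
        problems.append(f"edition mismatch: {primary.edition!r} vs {standby.edition!r}")
    if primary.version != standby.version:
        problems.append(f"version mismatch: {primary.version!r} vs {standby.version!r}")
    return problems

primary = DB2Node("db-primary", "Enterprise Server Edition", "10.5")
standby = DB2Node("db-standby", "Enterprise Server Edition", "10.1")
print(hadr_pair_problems(primary, standby))
```

Returning all mismatches at once, instead of failing on the first, lets the person editing the pattern fix both settings in one pass.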
  29. Deploy a PureData database as part of an application pattern from PureApplication
     [Screenshot: a new option is added when PureData is registered]
  30. Manage Logging (Database Service Console)
     [Screenshot: Database Service Console showing OS logs, DB2 logs and agent logs]
     Hover the cursor over a file; an arrow link pops up; click it to download the log file.
  31. Pre-integrated DB2 Monitoring
     See detailed DB2 metrics from the Workload Console
     • Launches a new browser tab/window in the context of the Database Overview page
  32. Further Drill Down: Detailed DB2 metrics
     Can drill down & focus on “popular” problems
     • Inflight Database Memory Dashboard
     • Inflight Rogue Query Dashboard
     • Inflight I/O Dashboard
     • Inflight Locking Dashboard
     • Inflight Logging Dashboard
     • Inflight Utilities Dashboard
     • Inflight Throughput Dashboard
  33. IBM PureSystems & DBaaS
     The ideal Platform as a Service (PaaS) for databases
     • DBaaS provides deep built-in integration of application and database server capabilities in a simple but powerful combination, intended to simplify the way applications and databases are designed, deployed, run and managed.
     • DBaaS offers single-click, pattern-based development and deployment via IBM-provided database patterns and workloads, which speeds up the deployment of new applications and databases and enforces the creation of reusable assets for consistent enterprise interactions.
     • The ability to create custom patterns and workloads provides an optimized way of establishing and enforcing enterprise standards.
     • Pattern-based management simplifies database development and deployment, while the built-in best practices yield optimized deployments right out of the box.
     • DBaaS simplifies database development even for complex tasks such as setting up high availability and disaster recovery (HADR) or DB2 clusters.
  34. What is new in DBaaS on PureApplication System
     DBaaS, Sept 2013
     • Added support for DB2 v10.5 (AKA Kepler) and DB2 BLU (for data mart)
       o IBM DB2 for BLU Acceleration Pattern was added
     • Added HADR for OLTP (HA in the same rack with automatic failover; not related to HADR in virtual systems)
     • Increased maximum VM size to 16 cores and 2 TB disk
     • Allow manual scale-up of an existing DBaaS VM (CPU/memory/disk)
     • DB2 versions available on IPAS:
       o DB2 10.5 FP1
       o DB2 10.1 FP2
       o DB2 9.7 FP8
     NOTE: DBaaS is available separately on Fix Central (9/26/13), from where it can be downloaded and imported as needed.
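The scale-up and maximum-size items above can be read as a validation rule. A hedged sketch: `VMSize` and `scale_up` are hypothetical, the 2 TB limit is interpreted as 2048 GB (an assumption), memory is left unchecked because the slide gives no memory limit, and the rule that resources may only grow is inferred from the term "scaling up".

```python
from dataclasses import dataclass, replace

# Limits quoted on the slide for DBaaS VMs (Sept 2013): 16 cores and 2 TB disk.
MAX_CORES = 16
MAX_DISK_GB = 2 * 1024  # assumption: 2 TB interpreted as 2048 GB

@dataclass(frozen=True)
class VMSize:
    cores: int
    memory_gb: int
    disk_gb: int

def scale_up(current, cores=None, memory_gb=None, disk_gb=None):
    """Hypothetical manual scale-up: resources may only grow and must stay within limits."""
    target = replace(
        current,
        cores=current.cores if cores is None else cores,
        memory_gb=current.memory_gb if memory_gb is None else memory_gb,
        disk_gb=current.disk_gb if disk_gb is None else disk_gb,
    )
    for name in ("cores", "memory_gb", "disk_gb"):
        if getattr(target, name) < getattr(current, name):
            raise ValueError(f"{name} can only be scaled up")
    if target.cores > MAX_CORES:
        raise ValueError(f"at most {MAX_CORES} cores per VM")
    if target.disk_gb > MAX_DISK_GB:
        raise ValueError(f"at most {MAX_DISK_GB} GB of disk per VM")
    return target

vm = VMSize(cores=4, memory_gb=16, disk_gb=500)
bigger = scale_up(vm, cores=8, disk_gb=1000)
print(bigger)
```

Validating the requested target before touching the running VM keeps a rejected request from leaving the database half-resized.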
  35. Two key takeaways: How does DBaaS apply to your business?
     1) Explore the value SD might provide to your business
        • Scientific research is motivated to collaborate more than ever
        • SD is Big Data
        • The PureSystems family provides an easy way to collaborate
     2) Explore the value of DBaaS for your organization
        • Rapid transformation in data delivery is required by businesses today and is touching every side of our society
          o Even more conservative environments like scientific research have to adapt to the new requirements to stay relevant
        • IBM PureSystems provide an ideal platform for efficient database provisioning and management
        • Use the patterns of expertise
          o They deliver real value in time and resource savings for applications and databases alike
        • Embrace the change DBaaS brings to you and your organization
          o Simplicity means automation, less risk, and more reliable and cost-effective data delivery for your business
  36. Thank You
     Your feedback is important!
     • Access the Conference Agenda Builder to complete your session surveys
       o Any web or mobile browser at
       o Any Agenda Builder kiosk onsite
     Questions?
     Thomas Jackman, DRI/AIC, Technical Lead for Analysis & Computation
     Maria Nichole Schwenger, IBM, PureSystems Technical Specialist
  37. Learn More about IBM Cloud
     Visit the EXPO: Cloud Booth, SoftLayer Booth, Connected Car
     Cloud Sessions
     • Business Leadership Forums
       o Connected Car is Mobile, Social, Cloud, Big Data: Tues, 10-11 a.m. in S. Pacific I
       o Social, Mobile, Analytics, Cloud, and Beyond for the Automotive Industry: Tues, 4:30-5:45 p.m. in S. Pacific B
     • Online Technology Forums
       o Forty unique Cloud Sessions across 72 time slots; check your event guide for details!
  38. Backup Slides
  39. DB2 deployment options in PureApplication System
     • Virtual systems using DB2 hypervisor-edition images
       o Ability to create custom patterns
       o Traditional configuration and administration model
       o Provides patterns for common topologies
       o Automated provisioning of images into patterns
     • DBaaS (Database-as-a-Service) using database patterns (virtual applications)
       o Simplified interaction model
       o Highly standardized and automated
       o Integrated life cycle management
       o Patterns are solutions derived from standardized industry best practices
       o Shared between users/teams
     • Connections to existing remote or local databases: an option for both virtual applications and virtual systems