
IBM THINK 2018 - IBM Cloud SQL Query Introduction

Introduction to serverless SQL on data in cloud object storage

Published in: Data & Analytics

  1. IBM Cloud Query Introduction and Roadmap. Session 1480. Torsten Steinbach, IBM Cloud Architect (@torsstei); Chris Glew, IBM Cloud Offering Manager
  2. Innovation in Big Data Analytics. The 1990s: Enterprise Data Warehouses (tightly integrated and optimized systems). 2000: Hadoop (introduced open data formats and easy scaling on commodity hardware). Today: Serverless Analytics-aaS • Seamless elasticity • Pay-per-query consumption • Analyze data as it sits in an object store • Disaggregated architecture • No more infrastructure headaches
  3. IBM Cloud Query • Serverless ANSI SQL queries on open data formats in cloud object storage • Pay per query • Free-of-charge beta now available to everyone via the IBM Cloud catalog. Think 2018 / DOC ID / Month XX, 2018 / © 2018 IBM Corporation
  4. IBM Cloud Query • Provision and run your first query in < 1 minute • Put your data in cloud object storage and immediately query it • No database load • Dynamic schema inference • Turns cold storage into a live workspace for big data analytics
  5. IBM Cloud Query Architecture. IBM Cloud SQL Query – Very High Level Architecture (MVP 1Q 2018). Diagram: an application submits SQL to Query (1), which reads data sets from IBM Cloud Object Storage (2) and writes the result set back to object storage (3); the application then reads the results (4). Other diagram labels: IBM Cloud Streaming (IBM Streams, Message Hub, Watson IoT); Land; IBM Cloud Databases (Db2 on Cloud); Archive / Export
  6. IBM Cloud Query – Access Patterns. Diagram labels: SQL Query Usage; Create Query; SQL Web Console; Watson Studio Notebooks; SQL REST API; SQL Cloud Function; Explore; Integrate; Deploy
  7. Introducing Query Web Console. Screenshot callouts: reference source data in Cloud Object Storage; location in Cloud Object Storage for the result set; details of the SQL execution
  8. Table Locators. Syntax: cos://<endpoint>/<bucket>/[<prefix>] • Endpoint – the endpoint of your object storage bucket or a short alias, e.g. s3.us-south.objectstorage.softlayer.net or us-south • Bucket – the bucket name in object storage • Prefix – one or multiple objects (e.g., table partitions) with the same prefix. Used in FROM clauses for input data and in the target field for result set data. Examples: cos://us-south/myBucket/myFolder/mySubFolder/myData.parquet, cos://us-geo/otherBucket/myData, cos://us-geo/otherBucket/myData/part, cos://eu-geo/newBucket/
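The locator layout on this slide can be illustrated with a small parser. This is a minimal sketch, not part of the service or its client: the names `TableLocator` and `parse_table_locator` are hypothetical, and the splitting rule (first path segment is the endpoint or alias, second is the bucket, the rest is the prefix) is read directly off the `cos://<endpoint>/<bucket>/[<prefix>]` pattern.

```python
from typing import NamedTuple


class TableLocator(NamedTuple):
    """Hypothetical container for the three parts named on the slide."""
    endpoint: str  # full endpoint host or short alias such as "us-south"
    bucket: str    # bucket name in object storage
    prefix: str    # object prefix; empty string when omitted


def parse_table_locator(uri: str) -> TableLocator:
    """Split cos://<endpoint>/<bucket>/[<prefix>] into its parts."""
    scheme = "cos://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a cos:// table locator: {uri!r}")
    # Split at most twice so that slashes inside the prefix are preserved.
    parts = uri[len(scheme):].split("/", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"endpoint and bucket are required: {uri!r}")
    prefix = parts[2] if len(parts) == 3 else ""
    return TableLocator(parts[0], parts[1], prefix)


if __name__ == "__main__":
    print(parse_table_locator(
        "cos://us-south/myBucket/myFolder/mySubFolder/myData.parquet"))
    print(parse_table_locator("cos://eu-geo/newBucket/"))
```

The prefix is kept as an opaque string because, per the slide, it may match one object or several objects (for example table partitions) that share it.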
  9. Table Formats. Syntax: <Table Locator> [STORED AS CSV | PARQUET | JSON] • Specifies the data format of the input data • Table schema is automatically inferred at SQL execution time • The clause is optional; the default is CSV
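As a sketch of how the optional STORED AS clause combines with a table locator, the helper below assembles a FROM clause as a string. The function name `from_clause` is hypothetical; the three accepted formats and the "omit the clause to get CSV" behaviour are taken from the slide.

```python
from typing import Optional

# Formats listed on the slide for the STORED AS clause.
SUPPORTED_FORMATS = ("CSV", "PARQUET", "JSON")


def from_clause(table_locator: str, data_format: Optional[str] = None) -> str:
    """Build 'FROM <Table Locator> [STORED AS <format>]'.

    When data_format is None the clause is omitted, which according to
    the slide means the input is treated as CSV.
    """
    if data_format is None:
        return f"FROM {table_locator}"
    normalized = data_format.upper()
    if normalized not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {data_format!r}")
    return f"FROM {table_locator} STORED AS {normalized}"


if __name__ == "__main__":
    print("SELECT * " + from_clause("cos://us-geo/otherBucket/myData", "parquet"))
    print("SELECT * " + from_clause("cos://us-geo/otherBucket/myData"))
```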
  10. IBM Cloud Query REST API • Submit a SQL query: POST https://sql-api.ng.bluemix.net/v2-beta/sql_jobs (runs the SQL in the background and returns a job_id) • Detailed info for a SQL query (e.g. status, result location): GET https://sql-api.ng.bluemix.net/v2-beta/sql_jobs/{job_id} (returns JSON with query execution details) • List of recent SQL query executions: GET https://sql-api.ng.bluemix.net/v2-beta/sql_jobs (returns a JSON array with the last 30 SQL submissions and outcomes)
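The three calls above can be sketched with Python's standard library. The snippet only builds the requests and does not send them. The URL and HTTP methods come from the slide; the JSON field names (`statement`, `resultset_target`) and the bearer-token `Authorization` header are assumptions that the slide does not spell out, and a real call may need further parameters.

```python
import json
import urllib.request

# Endpoint shown on the slide (beta API at the time of the talk).
BASE_URL = "https://sql-api.ng.bluemix.net/v2-beta/sql_jobs"


def _headers(token: str) -> dict:
    # Assumption: bearer-token authentication; not stated on the slide.
    return {"Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json"}


def build_submit_request(statement: str, result_target: str,
                         token: str) -> urllib.request.Request:
    """POST /sql_jobs: submit SQL to run in the background."""
    # Assumption: body field names are illustrative.
    body = json.dumps({"statement": statement,
                       "resultset_target": result_target}).encode("utf-8")
    return urllib.request.Request(BASE_URL, data=body,
                                  headers=_headers(token), method="POST")


def build_status_request(job_id: str, token: str) -> urllib.request.Request:
    """GET /sql_jobs/{job_id}: execution details for one job."""
    return urllib.request.Request(f"{BASE_URL}/{job_id}",
                                  headers=_headers(token), method="GET")


def build_list_request(token: str) -> urllib.request.Request:
    """GET /sql_jobs: recent submissions and their outcomes."""
    return urllib.request.Request(BASE_URL,
                                  headers=_headers(token), method="GET")


if __name__ == "__main__":
    req = build_submit_request(
        "SELECT * FROM cos://us-geo/otherBucket/myData STORED AS PARQUET",
        "cos://us-south/myBucket/results", token="<token>")
    print(req.get_method(), req.full_url)
```

Sending any of these with `urllib.request.urlopen(req)` would return the JSON described on the slide; since the job runs in the background, a caller would poll the status request until the job finishes.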
  11. IBM Cloud Query in Watson Studio. Open source Python client: !pip install ibmcloudsql. Convenient programmatic access • You only provide: API key, SQL query and location URL for the SQL result • Result set is written to object storage and returned as a pandas data frame • Useful methods for SQL job status and SQL history. Use a Watson Studio Notebook with Python kernel • Interactive SQL submission and result visualization using PixieDust widgets
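A notebook cell using the client might look roughly like the sketch below. It assumes the `ibmcloudsql` package is installed and, as far as I recall its interface, that the `SQLQuery` constructor also takes the service instance CRN, which the slide does not list; treat the exact signature as something to confirm against the package documentation. The wrapper name `run_query` is hypothetical.

```python
def run_query(api_key: str, instance_crn: str, sql_text: str,
              target_cos_url: str):
    """Run one SQL statement and return the result as a pandas DataFrame.

    Sketch only: instance_crn is an assumption not mentioned on the slide.
    """
    # Imported lazily so this module loads even where the client is absent.
    import ibmcloudsql  # installed with: pip install ibmcloudsql

    client = ibmcloudsql.SQLQuery(api_key, instance_crn, target_cos_url)
    client.logon()                   # exchange the API key for a session
    return client.run_sql(sql_text)  # waits for the job, fetches the result
```

In a notebook this could be called as `run_query(api_key, crn, "SELECT * FROM cos://us-geo/otherBucket/myData STORED AS PARQUET LIMIT 10", "cos://us-south/myBucket/results")`, with the returned data frame passed on to a visualization widget.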
  12. Cloud Function. IBM Cloud Functions: IBM’s function-aaS for running event-based custom logic in any language. SQL Query adds scale-out data processing functions. Server-less + Server-less = Server-lesser. Example: automated data processing pipelines. bx wsk action create mysql --docker ibmfunctions/sqlquery + bind parameters for SQL statement text and result target location. Diagram labels: Server-less & scale-out (Spark) SQL Execution Service; Server-less & event-driven function execution & orchestration
  13. Use for analyzing application logs. Diagram labels: Your Cloud Application/Solution; Logs; IBM Cloud Object Storage; Query (Transform, Compress, Aggregate, Repartition); Analyze (Anomaly Detection, User Segmentation, Customer Support, Resource Planning) • Build and run data pipelines and analytics on your log message data • Flexible log data analytics with the full power of SQL • Seamless scalability and elasticity according to your log message volume
  14. Use to explore and preprocess data for BI. Diagram labels: Acquire; Land; Query; Promote; Process; Read; Explore; Report; Applications; IoT; Streaming; Devices; IBM Cloud Object Storage; Data Warehouses & Databases; Db2 on Cloud; BI Reporting; Cleanse, Filter, Merge, Aggregate, Compress; Watson Studio, Looker, Cognos, Tableau
  15. IBM Cloud Query – Strategic Architecture. Diagram labels: Application; SQL; Query (Data Skipping, Geospatial SQL); IBM Cloud Object Storage (Result Set, Data Sets); Read; Write; Table Meta Data; IBM Cloud Databases (Db2 on Cloud); Watson Knowledge Catalog (SQL queries, meta data, data sets); Register; IBM Cloud Streaming (IBM Streams, Message Hub, Watson IoT)
  16. Roadmap: Location Analytics on IoT Data. Diagram labels: Mobile, Cars, Devices; GPS; Land; IBM Cloud Object Storage (Location Data); Query; Location Filtering; Spatial Aggregation; Location Analytics; SQL/MM. Location data is a native SQL data type • Points (e.g. current location) • Lines (e.g. GPS track, road) • Polygons (e.g. zip code area). Native SQL functions to accurately aggregate, filter and join based on location. Full geodesic globe support, with no projection incorrectness • E.g. well suited for oil & gas data in polar regions
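To illustrate why globe-aware distance matters near the poles, here is a small haversine sketch. It computes great-circle distance on a sphere, which is a simplification of a full geodesic model and is not the service's implementation; the function name and the mean Earth radius constant are illustrative choices.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def great_circle_km(lat1: float, lon1: float,
                    lat2: float, lon2: float) -> float:
    """Haversine distance in km between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # max() guards against tiny negative values from rounding.
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(max(0.0, 1.0 - a)))
    return EARTH_RADIUS_KM * c


if __name__ == "__main__":
    # Two points at 89°N on opposite meridians: 180 degrees of longitude
    # apart, yet only about 2 degrees of arc over the pole.
    print(round(great_circle_km(89.0, 0.0, 89.0, 180.0), 1), "km")
```

A flat treatment of latitude and longitude would regard those two points as half a world apart, which is the kind of projection error the slide refers to.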
  17. Roadmap: Data Skipping Index for huge potential cost and time savings. Think 2018 / March 21, 2018 / © 2018 IBM Corporation. Analytics (SQL Query) and storage (Cloud Object Storage) are independent microservices • Critical cost factors include bytes shipped and number of REST calls • Reduce both with a metadata index for the objects in a data set • Enable Spark SQL to query the metadata index to determine which objects are not relevant to a query. Example: detect SLA violations in log messages: SELECT acct, count(CASE WHEN status > 499…) AS err_cnt, … FROM cos://…-logs… /february STORED AS parquet WHERE acct IN ('3c04affe-… -bluemix') GROUP BY acct, dt, hour(timegen), minute(timegen) ... Default CSV: 41x faster with data skipping* • Optimized Parquet: 6x faster with data skipping* (*Results are data and query dependent.)
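The idea of a metadata index that lets a query skip irrelevant objects can be shown with a toy example. The sketch assumes per-object minimum and maximum values as the indexed metadata, which is one common choice; the slide does not say what the index actually stores, and all names and sample values here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ObjectStats:
    """Toy index entry: value range of one column within one object."""
    name: str
    min_value: int
    max_value: int


def build_index(objects: Dict[str, List[int]]) -> List[ObjectStats]:
    """Scan each object once and record the min/max of the column."""
    return [ObjectStats(name, min(values), max(values))
            for name, values in objects.items() if values]


def objects_to_read(index: List[ObjectStats], lower_bound: int) -> List[str]:
    """Objects that may contain rows with value > lower_bound.

    An object whose maximum is <= lower_bound cannot match, so it is
    skipped without being fetched from object storage.
    """
    return [entry.name for entry in index if entry.max_value > lower_bound]


if __name__ == "__main__":
    # Hypothetical HTTP status codes stored in three log objects.
    dataset = {"obj1": [200, 204, 301],
               "obj2": [200, 500, 503],
               "obj3": [404, 499]}
    index = build_index(dataset)
    print(objects_to_read(index, 499))  # predicate: status > 499
```

Only the objects returned by `objects_to_read` need to be fetched, which reduces both of the cost factors named on the slide: bytes shipped and number of REST calls.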
  18. Data Skipping Architecture Flow: Create Index. Diagram labels: Spark Driver; three Spark Workers; dataset with objects obj1, obj2, obj3
  19. Data Skipping Architecture Flow: Create Index. Diagram labels: Spark Driver; three Spark Workers; dataset with objects obj1, obj2, obj3; Object metadata; Indexing meta
  20. Data Skipping Architecture Flow: query. Diagram labels: Spark Driver; three Spark Workers; dataset with objects obj1, obj2, obj3; Object metadata; Indexing meta
  21. Data Skipping – Create Index
  22. Index Statistics Before Usage
  23. Using the Index
  24. Index Statistics After Usage
  25. Thank you! Torsten Steinbach torsten@de.ibm.com, Chris Glew cglew@us.ibm.com
  26. Please note: IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
  27. Notices and disclaimers. Think 2018 / January 12, 2018 / © 2018 IBM Corporation. © 2018 International Business Machines Corporation. No part of this document may be reproduced or transmitted in any form without written permission from IBM. U.S. Government Users Restricted Rights: use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM. Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. This document is distributed “as is” without any warranty, either express or implied. In no event shall IBM be liable for any damage arising from the use of this information, including but not limited to, loss of data, business interruption, loss of profit or loss of opportunity. IBM products and services are warranted per the terms and conditions of the agreements under which they are provided. IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply. Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice. Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary. References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation. It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer follows any law.
  28. Notices and disclaimers continued. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM expressly disclaims all warranties, expressed or implied, including but not limited to, the implied warranties of merchantability and fitness for a purpose. The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right. IBM, the IBM logo, ibm.com and [names of other referenced IBM products and services used in the presentation] are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.