Actionable Insights with AI - Snowflake for Data Science - Harald Erb
Talk @ ScaleUp 360° AI Infrastructures DACH, 2021: Data scientists spend 80% or more of their time searching for and preparing data. This talk explains Snowflake's platform capabilities, such as near-unlimited data storage and instant, near-infinite compute resources, and how the platform can be used to seamlessly integrate with and support the machine learning libraries and tools data scientists rely on.
New! Real-Time Data Replication to Snowflake - Precisely
Your business is adopting the Snowflake cloud data platform to rapidly deliver data insights and lower the costs of your data warehouse. But you have a problem – what happens when data changes on your mainframe and IBM i systems? How do you make sure Snowflake is always up-to-date and in sync with these systems of record?
If you can’t integrate changes occurring on your mainframe and IBM i systems to Snowflake, your business will miss the critical data it needs to drive real-time insights and decision making.
Join us to learn how the latest enhancements to Precisely Connect help your business meet its data-driven goals by sharing changes made on legacy, mainframe, and IBM systems to Snowflake in real time.
During this webinar, you will learn more about:
- How to easily support data replication from mainframe and IBM i to Snowflake
- Connect’s enhanced data replication capabilities for cloud data platforms
- How customers are using Connect to support their cloud data platform strategies
Data & Analytics - Session 2 - Introducing Amazon Redshift - Amazon Web Services
Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. This presentation will give an introduction to the service and its pricing before diving into how it delivers fast query performance on data sets ranging from hundreds of gigabytes to a petabyte or more.
Steffen Krause, Technical Evangelist, AWS
Padraic Mulligan, Architect and Lead Developer and Mike McCarthy, CTO, Skillspage
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De... - Databricks
Columbia is a data-driven enterprise, integrating data from all line-of-business-systems to manage its wholesale and retail businesses. This includes integrating real-time and batch data to better manage purchase orders and generate accurate consumer demand forecasts.
Abstract: Data preparation and modelling are the activities that take most of the time in a typical data scientist workday. In this session we’ll see how AWS services for Analytics and data management can be effectively used and integrated in AI/ML pipelines. We’ll focus on AWS Glue, AWS Glue DataBrew and AWS Data Wrangler with a bit of theory and hands-on demos.
Bio:
Francesco Marelli is a senior solutions architect at Amazon Web Services. He has lived and worked in the UK, Italy, Switzerland, and other countries in EMEA. He specializes in the design and implementation of analytics, data management, and big data systems. Francesco also has strong experience in systems integration and in the design and implementation of applications.
Topics: machine learning pipelines, AWS, cloud.
Building the Data Lake with Azure Data Factory and Data Lake Analytics - Khalid Salama
In essence, a data lake is a commodity distributed file system that acts as a repository for raw data file extracts from all the enterprise source systems, so that it can serve the data management and analytics needs of the business. A data lake system provides the means to ingest data, perform scalable big data processing, and serve information, in addition to managing, monitoring, and securing the IT environment. In these slides, we discuss building data lakes using Azure Data Factory and Data Lake Analytics. We delve into the architecture of the data lake and explore its various components. We also describe the various data ingestion scenarios and considerations. We introduce the Azure Data Lake Store, then discuss how to build an Azure Data Factory pipeline to ingest data into the data lake. After that, we move into big data processing with Data Lake Analytics and delve into U-SQL.
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud - Torsten Steinbach
Cloud is a sharing economy that reduces your spending. But does this also apply to data and analytics? Doesn't this require you to provision dedicated data warehouse systems to run analytics SQL queries on terabytes of data? With IBM Cloud, the answer is no. By using serverless analytics via IBM Cloud SQL Query, you can analyze your data directly where it sits, be it in IBM Cloud Object Storage or in your NoSQL databases. Due to the serverless nature of SQL Query, you only pay for your queries, depending on the data volume they process. There are no standing costs. You do not need to provision and wait for a data warehouse, but you can still run SQL queries on terabytes of data.
Streaming Real-time Data to Azure Data Lake Storage Gen 2 - Carole Gunst
Check out this presentation to learn the basics of using Attunity Replicate to stream real-time data to Azure Data Lake Storage Gen2 for analytics projects.
Data Con LA 2020
Description
In this session, I introduce the Amazon Redshift lake house architecture which enables you to query data across your data warehouse, data lake, and operational databases to gain faster and deeper insights. With a lake house architecture, you can store data in open file formats in your Amazon S3 data lake.
Speaker
Antje Barth, Amazon Web Services, Sr. Developer Advocate, AI and Machine Learning
Building Modern Data Platform with Microsoft Azure - Dmitry Anoshin
This presentation covers cloud history and Microsoft Azure data analytics capabilities. It also includes a real-world example of data warehouse modernization. Finally, we look at an alternative solution on Azure using Snowflake and Matillion ETL.
Databricks is a Software-as-a-Service-like experience (or Spark-as-a-service): a tool for curating and processing massive amounts of data, developing, training, and deploying models on that data, and managing the whole workflow process throughout the project. It is for those who are comfortable with Apache Spark, as it is 100% based on Spark and is extensible with support for Scala, Java, R, and Python alongside Spark SQL, GraphX, Streaming, and the Machine Learning Library (MLlib). It has built-in integration with many data sources, has a workflow scheduler, allows for real-time workspace collaboration, and has performance improvements over traditional Apache Spark.
Vivint Smart Home's journey with Snowflake and its migration from SQL Server. We describe how we have set up Snowflake from a people, process, and technology perspective.
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im... - Databricks
Join this session to hear why Smartsheet decided to transition from their entirely SQL-based system to Snowflake and Databricks, and learn how that transition has made an immediate impact on their team, company and customer experience through enabling faster, informed data decisions.
For those contemplating re-architecting or greenfields data lakes/data hubs/data warehouses in a cloud environment, talk to our Altis AWS Practice Lead - Guillaume Jaudouin about why you should be considering the "tour de force" combination of AWS and Snowflake.
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes) - DataWorks Summit
Most organizations today implement different data stores to support business operations. As a result, data ends up stored across a multitude of often heterogeneous systems, like RDBMS, NoSQL, data warehouses, data marts, Hadoop, etc., with limited interaction and/or interoperability between them. The end result is often a vast ecosystem of data stores with different "temperature" data, some level of duplication, and no effective way of bringing it all together for business analytics. With such disparate data, how can an organization exploit the wealth of information? This opens up the need for proven techniques to quickly and easily deliver the data to the people who need it. In this session, you'll see how to modernize your enterprise by making data accessible with enterprise capabilities like querying using SQL, granular security for data access, and maintaining high query performance and high concurrency.
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES - Matt Stubbs
Date: 13th November 2018
Location: Self-Service Analytics Theatre
Time: 14:30 - 15:00
Speaker: Zaf Khan
Organisation: Arcadia Data
About: The use of data lakes continues to grow, and a recent survey by Eckerson Group shows that organizations are getting real value from their deployments. However, there’s still a lot of room for improvement when it comes to giving business users access to the wealth of potential insights in the data lake.
While the data management aspect has been fairly well understood over the years, the success of business intelligence (BI) and analytics on data lakes lags behind. In fact, organizations often struggle with data lakes because they are only accessible by highly-skilled data scientists and not by business users. But BI tools have been able to access data warehouses for years, so what gives?
In this talk, we’ll discuss:
• Why traditional BI tools are architected well for data warehouses, but not data lakes.
• Why every organization should have two BI standards: one for data warehouses and one for data lakes.
• Innovative capabilities provided by BI for data lakes
Presentation from the webinar held on 10 March 2022
Presented by:
Jaroslav Malina - Senior Channel Sales Manager, Oracle
Josef Krejčí - Technology Sales Consultant, Oracle
Josef Šlahůnek - Cloud Systems Sales Consultant, Oracle
Production-Ready Environments for Kubernetes (CON307-S) - AWS re:Invent 2018Amazon Web Services
Kubernetes is taking off and being rapidly adopted both on-premises and in the AWS Cloud. Today, enterprises are struggling to build, deploy, and manage production-ready environments at scale. The Cisco Hybrid Solution for Kubernetes on AWS makes it easy for customers to run production-grade Kubernetes on-premises. This is achieved by configuring on-premises Kubernetes environments to be consistent with Amazon Elastic Container Service for Kubernetes (Amazon EKS) and by combining Cisco's networking, security, management, and monitoring software with the world-class cloud services of AWS. This enables customers to focus on building and using applications instead of being constrained by where they run.
SendGrid Improves Email Delivery with Hybrid Data Warehousing - Amazon Web Services
When you received your Uber ‘Tuesday Evening Ride Receipt’ or Spotify’s ‘This Week’s New Music’ email, did you think about how they got there?
SendGrid’s reliable email platform delivers over 20 billion transactional and marketing emails each month on behalf of many of your favorite brands, including Uber, Airbnb, Spotify, Foursquare, and NextDoor.
SendGrid was looking to evolve its data warehouse architecture in order to improve decision making and optimize customer experience. They needed a scalable and reliable architecture that would allow them to move nimbly and efficiently with a relatively small IT organization, while supporting the needs of both business and technical users at SendGrid.
SendGrid’s Director of Enterprise Data Operations will be joining architects from Amazon Web Services (AWS) and Informatica to discuss SendGrid’s journey to a hybrid cloud architecture and how a hybrid data warehousing solution is optimized to support SendGrid’s analytics initiative. Speakers will also review common technologies and use cases being deployed in hybrid cloud today, common data management challenges in hybrid cloud and best practices for addressing these challenges.
Join us to learn:
• How to evolve to a hybrid data warehouse with Amazon Redshift for scalability, agility and cost efficiency with minimal IT resources
• Hybrid cloud data management use cases
• Best practices for addressing hybrid cloud data management challenges
ICP for Data - Enterprise platform for AI, ML and Data Science - Karan Sachdeva
IBM Cloud Private for Data is the ultimate platform for all AI, ML, and data science workloads: an integrated analytics platform based on containers and microservices. It works with Kubernetes and Docker, and even with Red Hat OpenShift. It delivers a variety of business use cases across all industries - financial services, telco, retail, manufacturing, etc.
Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes... - Data Con LA
A tale of two BI standards: Data warehouses and data lakes by Shant Hovsepian, Co-Founder and CTO, Arcadia Data
Data lakes as part of the logical data warehouse (LDW) have entered the trough of disillusionment. Some failures are due to lack of value from businesses focusing on the big data challenges and not the big analytics opportunity. After all, data is just data until you analyze it. While the data management aspect has been fairly well understood over the years, the success of business intelligence (BI) and analytics on data lakes lags behind. In fact, data lakes often fail because they are only accessible by highly skilled data scientists and not by business users. But BI tools have been able to access data warehouses for years, so what gives? Shant Hovsepian explains why existing BI tools are architected well for data warehouses but not data lakes, the pros and cons of each architecture, and why every organization should have two BI standards: one for data warehouses and one for data lakes.
Liberate Legacy Data Sources with Precisely and Databricks - Precisely
Mainframe and IBM i data continues to be prevalent in several industries including financial services, insurance, and retail where critical customer information lives on legacy systems. In fact, in 2019 alone, studies show that there was a 55% increase in transaction volumes on the mainframe across all industries. To thrive in highly competitive markets, you must quickly break down legacy data silos to swiftly gain a full picture of data for insights for strategic action.
Traditional storage solutions that are mainframe proprietary struggle to scale for high data volumes and real-time analytics use cases. This results in increased costs, diminished performance, and missed SLAs. To solve this, Precisely and Databricks provide a modern approach for organizations to optimize volumes of data by leveraging the massive scalability of the cloud to power high-performance analytics, AI, and machine learning, regardless of where data lives.
In this webinar, we discuss:
- Quickly ingesting data from on-premises sources – such as mainframe and IBM i – to the cloud with the Databricks Unified Data Analytics Platform and Delta Lake
- Modernizing ETL processes and reducing development costs with visual data pipelines that use the elastic scalability of Databricks
- Empowering business users with the most up-to-date data by populating Delta Lake with real-time data changes from legacy systems
View this webinar on-demand to see a live demo of the joint solution and how it can modernize your legacy infrastructure.
IBM THINK 2019 - What? I Don't Need a Database to Do All That with SQL? - Torsten Steinbach
You don't necessarily have to set up a relational database and tables and load data in order to use a surprisingly rich set of SQL capabilities on your data in the cloud. IBM SQL Query lets you analyze terabytes of distributed data in heterogeneous formats with a complete ANSI SQL dialect in a completely serverless usage model, elegantly ETL data between formats and partitioning layouts as needed, and run complex time series transformations, analysis, and correlations with advanced built-in time series SQL algorithms that differentiate it across the entire industry. It also supports a complete PostGIS-compliant geospatial SQL function set. Come explore the stunningly advanced world of SQL without a database in IBM Cloud.
Transforming Enterprise IT - Virtual Transformation Day Feb 2019 - Amazon Web Services
Speaker: Wesley Wilks, Dan Gallivan
As more and more enterprises start down the path of their digital transformation, the pressure on their IT organizations to support innovation across the business couldn’t be higher. In this session, we will outline a number of cutting-edge technologies as well as an operating model that will allow IT to position itself as a business enabler and not a blocker. We will be sharing some mechanisms that will enable the IT organization to meet the pace of innovation that is being set by the business while giving them the flexibility to leverage existing assets.
AWS Transformation Day is designed for enterprise organizations looking to make the move to the cloud in order to become more responsive, agile and innovative, while still staying secure and compliant. Join us for this virtual event and we'll share our experiences of helping enterprise customers accelerate the pace of migration and adoption of strategic services.
We recommend this event for IT and business leaders who are looking to create sustainable benefits and a competitive advantage by using the AWS Cloud.
dashDB Enterprise MPP is a new, fully managed cloud data warehouse service with massive scale and performance. Powered by IBM's network cluster architecture, dashDB MPP is an easy-to-use, self-service solution for building standalone data warehouses, data science data marts, hybrid warehousing, development and QA environments, and analytics for NoSQL. It is available through IBM Bluemix along with IBM's other Cloud Data Services, including Cloudant and SQL DB.
Learn about data lifecycle best practices in the AWS Cloud. Discover how to optimise performance and lower the costs of data ingestion, staging, storage, cleansing, analytics, visualisation, and archiving.
IBM THINK 2019 - Cloud-Native Clickstream Analysis in IBM Cloud - Torsten Steinbach
Agile user and workload insights are one of the key elements of a cloud-native solution. When done well, this represents a real competitive advantage. In this session, we show you how to run cloud-native clickstream analysis with IBM Cloud. By combining serverless mechanisms like object storage for affordable and scalable persistency with SQL Query for serverless analysis of your clickstream data, you can establish a very cost-effective clickstream analysis pipeline easily and quickly.
IBM THINK 2019 - Self-Service Cloud Data Management with SQL - Torsten Steinbach
SQL is a powerful language to express data transformations. But did you know that you can also use IBM Cloud SQL to convert data between various data formats and layouts on disk? In this session, you will see the full power of using SQL Query to move and transform your cloud data in an entirely self-service fashion. You can specify any data format, layout, or partitioning with a simple SQL statement. See how you can move and transform terabytes of data in the cloud in a very scalable fashion while being charged only for the individual SQL movement and transformation jobs, with no standing costs.
Adjusting primitives for graph: SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, like PageRank. Compressed Sparse Row (CSR) is an adjacency-list based graph representation that ...
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
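The float-vs-bfloat16 storage comparison in the notes above can be illustrated in a few lines of Python. This is a sketch, not the report's CUDA/OpenMP code: it assumes bfloat16 can be emulated by keeping a float32's sign, exponent, and top 7 mantissa bits (zeroing the low 16 bits, i.e. truncation rather than round-to-nearest), then sums the same vector under both storage types.

```python
import numpy as np

def to_bfloat16(x):
    """Emulate bfloat16 storage: keep sign + exponent + top 7 mantissa
    bits of each float32 word by zeroing its low 16 bits (truncation)."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

# Vector element sum with float32 vs (emulated) bfloat16 storage.
x = np.linspace(0.1, 1.0, 10_000, dtype=np.float32)
sum_f32 = float(np.sum(x, dtype=np.float64))
sum_bf16 = float(np.sum(to_bfloat16(x), dtype=np.float64))

# Truncation loses at most 2^-7 relative precision per element,
# so the bfloat16 sum underestimates by at most that factor here.
assert np.all(np.abs(to_bfloat16(x) - x) <= np.abs(x) * 2.0**-7)
print(f"float32 sum: {sum_f32:.4f}, bfloat16 sum: {sum_bf16:.4f}")
```

The trade-off the experiment measures follows directly: bfloat16 halves storage (and memory bandwidth) at the cost of a bounded per-element precision loss.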
Opendatabay - Open Data Marketplace.pptx - Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. It leverages cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It comes, however, with the precondition that the input graph contains no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph whose vertices were split by component. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but considerably slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
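The idea in the abstract can be sketched in a few dozen lines of Python. The sketch below is illustrative only and is not the report's actual implementation: it uses Kosaraju's algorithm to find strongly connected components (which conveniently emits them in topological order of the condensation), groups the components into levels, and then runs PageRank power iterations one level at a time, treating ranks from earlier levels as fixed. The function name `pagerank_levelwise` and the parameter defaults are assumptions for the sketch.

```python
from collections import defaultdict

def pagerank_levelwise(graph, d=0.85, iters=50):
    """Levelwise PageRank sketch (assumed interface, not the report's code).

    `graph` maps each vertex to a list of out-neighbours. The graph is
    assumed to have no dead ends (every vertex has at least one out-edge),
    matching the precondition stated in the abstract.
    """
    verts = list(graph)
    n = len(verts)

    # Kosaraju pass 1: record DFS finish order on the original graph.
    order, seen = [], set()
    for s in verts:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(graph[s]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(graph[w])))
                    break
            else:
                order.append(v)
                stack.pop()

    # Kosaraju pass 2: sweep the reverse graph in reverse finish order.
    # Components are discovered in topological order of the condensation.
    rgraph = defaultdict(list)
    for u in graph:
        for v in graph[u]:
            rgraph[v].append(u)
    comp, cid = {}, 0
    for s in reversed(order):
        if s in comp:
            continue
        comp[s] = cid
        stack = [s]
        while stack:
            v = stack.pop()
            for w in rgraph[v]:
                if w not in comp:
                    comp[w] = cid
                    stack.append(w)
        cid += 1

    # Topological level of each component (source components are level 0).
    members = defaultdict(list)
    for u in verts:
        members[comp[u]].append(u)
    level = [0] * cid
    for c in range(cid):  # component ids are already in topological order
        for u in members[c]:
            for v in graph[u]:
                if comp[v] != c:
                    level[comp[v]] = max(level[comp[v]], level[c] + 1)

    # Power iterations, one level at a time; ranks of earlier levels stay fixed.
    outdeg = {u: len(graph[u]) for u in verts}
    rank = {u: 1.0 / n for u in verts}
    by_level = defaultdict(list)
    for u in verts:
        by_level[level[comp[u]]].append(u)
    for lv in sorted(by_level):
        active = by_level[lv]
        for _ in range(iters):
            new = {v: (1 - d) / n
                      + d * sum(rank[u] / outdeg[u] for u in rgraph[v])
                   for v in active}
            rank.update(new)
    return rank
```

Because each level reads only already-converged ranks from earlier levels, the levels could in principle be computed on separate machines with no per-iteration communication, which is the property the abstract highlights.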
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
6. IBM Cloud Data Services: empowering Cloud Data Lakes
Enterprise Grade, Open, Secure
- Data Stores (Relational, Non-Relational, Persistent Storage): PostgreSQL, Elasticsearch, MongoDB, Redis, RabbitMQ, etcd, Cloudant, Object Storage
- Data Movement and Action: Event Streams, SQL Query
- Data Services Solutions: Cloud Data Lake, etc.
Data lakes are not a completely new thing: they are a common solution in enterprises, traditionally implemented with a form factor of the past, dedicated Hadoop clusters. This creates a heavy modernization need and opportunity.
Now data lakes are evolving to cloud native.
Client Users
- Data architects: responsible for an organization's data architecture
- Business and data analysts: generate and analyze reports on specific data in the organization to provide business insight
- Data scientists and application developers: perform statistical analysis on big data to identify trends, solve business problems, and optimize performance
I think the arrow graphic on top is confusing; why is it on this slide? Is this slide showing the process flow of a data lake, or why a data lake?
Advantages over others:
- FIPS 140-2 Level 4. Others are at Level 2 or 3.
- 99.9% SLA for non-HA workloads. We offer this; AWS and Google don't.
- Integrated IaaS & PaaS SLAs. We are on par with others.
- Audit: traceability to serial number. We can do it; AWS and Azure cannot. Google does not have a bare metal offering. Oracle can.
- Kubernetes on bare metal (for heavy AI and analytics). Only we can do it; AWS, Azure, and Google cannot.
- Automated Day 2 management of container platforms with Red Hat OpenShift. AWS and Google cannot do this.
- A guarantee that your data is not used to fine-tune the vendor's AI models. AWS and Azure don't give this assurance.