Power Your Delta Lake with Streaming Transactional Changes


Organizations are digitizing their data, and data-driven decision making is at the heart of this transformation. Cloud data lakes and data warehouses provide great flexibility to prototype and roll out applications continuously at much lower cost.

Transactional databases are optimized for processing huge volumes of transactions in real time, whereas the cloud data lake needs to be optimized for analyzing huge volumes of data quickly. This makes it challenging to build a streamlined data flow that captures real-time transactions into a cloud data warehouse and drives real-time insights in a scalable and cost-effective manner.

In this session, we’ll show how organizations can overcome that challenge by adopting a robust platform built on StreamSets and Delta Lake. StreamSets provides a no-code framework to automate ingestion of transactional data and data processing on Spark, while Delta Lake provides ACID transactions and scalable metadata handling, and unifies streaming and batch data processing.
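
To make the idea of streaming transactional changes concrete, here is a minimal sketch of how an ordered stream of change events could be merged into a target table as upserts and deletes. It is plain Python over an in-memory dict, not StreamSets or Delta Lake code; the event shape (`op`, `key`, `row`) and the function name `apply_changes` are assumptions made purely for illustration.

```python
from typing import Any, Dict, Iterable

# Hypothetical change-event shape (an assumption, not a StreamSets format):
# {"op": "insert" | "update" | "delete", "key": ..., "row": {...}}
ChangeEvent = Dict[str, Any]
Table = Dict[Any, Dict[str, Any]]


def apply_changes(table: Table, events: Iterable[ChangeEvent]) -> Table:
    """Apply change events in order and return a new table snapshot."""
    # Copy rows so the input table is left untouched.
    result = {key: dict(row) for key, row in table.items()}
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Upsert: lay the new column values over any existing row.
            merged = dict(result.get(key, {}))
            merged.update(event["row"])
            result[key] = merged
        elif op == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return result


orders = {1: {"status": "new", "amount": 10}}
changes = [
    {"op": "update", "key": 1, "row": {"status": "shipped"}},
    {"op": "insert", "key": 2, "row": {"status": "new", "amount": 25}},
    {"op": "delete", "key": 1},
]
snapshot = apply_changes(orders, changes)
```

In a real pipeline the merge would run transactionally against the lake table rather than against a dict, which is where the ACID guarantees described above matter: readers see either the snapshot before the batch of changes or the one after it.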



  1. WIFI SSID: Spark+AISummit | Password: UnifiedDataAnalytics
  2. Power your Delta Lake with streaming transactional changes | Rupal Shah, Director, Cloud Services, StreamSets
  3. NEW DATA CONTINUOUSLY GENERATED: User Behavior Data, Click Streams, Sensor data (IoT), Video/Speech, Usage/Billing data, Machine Telemetry, Commerce Data, … | DATA LAKE | Reporting, Dashboards, Alerting
  4. Data Integrity | Reporting, Dashboards, Alerting | Performance | NEW DATA | DATA LAKE | Infrastructure Cost
  5. Delta Lake | 1. Data Reliability: ACID Compliant Transactions, Schema Enforcement & Evolution | Reporting, Machine Learning, Alerting, Dashboards | 2. Query Performance: Fast at Scale (10-100x Faster), Cheaper to Operate, Indexing & Caching | 3. Simplified Architecture: Unify batch & streaming, Early data availability for analytics | LOTS OF NEW DATA: User Behavior Data, Click Streams, Sensor data (IoT), Video/Speech, Usage/Billing data, Machine Telemetry, Commerce Data, …
  6. • Single tool for all data, all use cases • Built-in drift handling • Supports all data platforms to avoid lock-in | Design, Deploy, Operate, Monitoring, Automation, Smart Data Pipelines | • Continuous delivery across lifecycle • Performance and security SLAs • End-to-end views | StreamSets DataOps platform powers continuous data by operationalizing the full data flow lifecycle
  7. ML, ETL, Ingest, ELT | Data Stores, Events, Databases, Message Queues, Sensors, Logs | • Bulk • Micro-Batch • CDC | Raw, Curated, Consumer Ready | Prep, Analyze, Integrate | • Streaming • Edge Data Shipping • Micro-batch | Data Consumers | Collaborative Development, Continuous Design | Request | Deployment Flexibility: Choose, Change | Machines, APIs | Internal & External Sources | Distribute | End to End Data Integration | Data Platforms: On-premise, private cloud or public cloud
  8. DEMO
  9. Change Data Capture
  10. Slowly Changing Dimension
  11. Take Aways • No code • Powers Delta Lake with Fast Data • Handles complex data integration logic (CDC, SCD, …) with ease • Become a DataOps champion!
  12. Next Steps… • Visit booth #88 for further information • Get started with StreamSets for powering your Delta Lakes: https://streamsets.com/download/ • Get slack’ing with StreamSets ninjas https://streamsetters-slack.herokuapp.com/
  13. DON’T FORGET TO RATE AND REVIEW THE SESSIONS | SEARCH SPARK + AI SUMMIT
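
The slides name Slowly Changing Dimension handling as part of the demo without showing the logic. As a rough illustration only, the sketch below records a changed attribute in the Type 2 style, closing the current row version and appending a new one. The slides do not say which SCD variant the demo used, so Type 2 is an assumption, as are the `DimRow` fields, the `apply_scd2` name, and the sample data; this is plain Python, not StreamSets or Delta Lake code.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension record (hypothetical schema)."""
    key: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(history: List[DimRow], key: int, city: str, as_of: date) -> List[DimRow]:
    """Return a new history with the change recorded in SCD Type 2 style."""
    current = next(
        (r for r in history if r.key == key and r.valid_to is None), None
    )
    if current is not None and current.city == city:
        return list(history)  # attribute unchanged, nothing to record
    # Close the current version (if any), then append the new current version.
    updated = [replace(r, valid_to=as_of) if r is current else r for r in history]
    updated.append(DimRow(key=key, city=city, valid_from=as_of))
    return updated


history = [DimRow(key=1, city="Austin", valid_from=date(2019, 1, 1))]
history = apply_scd2(history, key=1, city="Denver", as_of=date(2019, 6, 1))
```

Keeping the old version with an end date, rather than overwriting it, is what lets later queries reconstruct the dimension as it looked at any past point in time.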

