Databricks Delta Lake and Its Benefits

Delta Lake is an open-source innovation that brings new capabilities for transactions, version control, and indexing to your data lakes. We uncover how Delta Lake helps and why it matters to you. In this session, we showcase some of its benefits and how they can improve your modern data engineering pipelines. Delta Lake provides snapshot isolation, which supports concurrent read/write operations and enables efficient inserts, updates, deletes, and rollbacks. It allows background file optimization through compaction and Z-order partitioning, which improves performance. In this presentation, we will learn about Delta Lake's benefits, how it solves common data lake challenges, and, most importantly, the new Delta time travel capability.
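
To make the snapshot and time travel ideas concrete, here is a minimal toy sketch in plain Python. It is not the Delta Lake API: the class `ToyDeltaLog`, its methods, and the file names are hypothetical, and the conflict rule (reject any commit based on a stale version) is a deliberate simplification of optimistic concurrency, assumed only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaLog:
    """Toy in-memory model of an append-only transaction log (hypothetical, not Delta Lake's API)."""

    # commits[i] holds the file actions recorded for table version i
    commits: list = field(default_factory=list)

    def commit(self, read_version, adds=(), removes=()):
        """Optimistically commit; fail if another writer committed after read_version."""
        latest = len(self.commits) - 1
        if read_version != latest:
            raise RuntimeError(
                f"conflict: writer read version {read_version}, latest is {latest}"
            )
        self.commits.append({"add": set(adds), "remove": set(removes)})
        return latest + 1  # the new table version

    def snapshot(self, version=None):
        """Replay the log up to `version` to rebuild the set of live data files."""
        if version is None:
            version = len(self.commits) - 1
        files = set()
        for actions in self.commits[: version + 1]:
            files -= actions["remove"]
            files |= actions["add"]
        return files


log = ToyDeltaLog()
v0 = log.commit(-1, adds={"part-0.parquet"})  # -1 means "table did not exist yet"
v1 = log.commit(v0, adds={"part-1.parquet"})
# An update rewrites part-0 into part-2; the old file stays on disk for older snapshots.
v2 = log.commit(v1, adds={"part-2.parquet"}, removes={"part-0.parquet"})

print(sorted(log.snapshot()))    # latest snapshot: part-1 and part-2
print(sorted(log.snapshot(v1)))  # "time travel" to version 1: part-0 and part-1
```

Because old files are only marked as removed in the log, reading an earlier version is just a replay of fewer commits. A writer that still holds `v1` after `v2` exists would get a `RuntimeError` from `commit` in this toy model and would have to re-read and retry.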

Published in: Data & Analytics

Databricks Delta Lake and Its Benefits

  1. WIFI SSID: Spark+AISummit | Password: UnifiedDataAnalytics
  2. Databricks Delta Lake and Its Benefits. Nagaraj Sengodan, HCL Technologies; Nitin Raj Soundararajan, Cognizant Worldwide Limited. #UnifiedDataAnalytics #SparkAISummit
  3. Agenda: (1) The Brief: what Delta Lake from Databricks offers, with an overview. (2) Open Source: the good news is that Delta Lake is now open source. (3) Benefits: why should I care about Delta Lake? (4) Modern Warehouse: we take the necessary steps to deliver the result. (5) Conclusion: are we ready to build a unified data platform?
  4. Who are we? Nitin Raj Soundararajan, Senior Consultant (Data, AI and Analytics), Cognizant Worldwide Limited; Nagaraj Sengodan, Senior Manager (Data and Analytics), HCL Technologies.
  5. Common Challenges with Data Lakes: data lake getting polluted; unsafe writes; orphan data; no schema evolution; no schema; hot path for streaming. Delta Lake features: ACID transactions; metadata handling; unified batch and stream; schema enforcement; time travel; audit history; full DML support.
  6. Typical Databricks Architecture (diagram labels): Security; Integration; Databricks Collaborative Workspace (APIs, Jobs, Models, Notebooks, Dashboards); Data Engineers; Data Scientists; Databricks Runtime (for Big Data, for Machine Learning); Batch & Streaming; Data Lakes & Data Warehouses; Databricks Cloud Service.
  7. Databricks Delta Lake Architecture (diagram labels): Security; Integration; Databricks Collaborative Workspace (APIs, Jobs, Models, Notebooks, Dashboards); Data Engineers; Data Scientists; Databricks Runtime (for Big Data, for Machine Learning); Batch & Streaming; Data Lakes & Data Warehouses; Databricks Cloud Service; Databricks Delta.
  8. The Brief: Apr. 2019, Delta.io goes open source: Announcing Delta Lake Open Source Project (Ali Ghodsi). Jun. 2019, Delta 0.2 (cloud storage): support for cloud storage (Amazon S3, Azure Blob Storage) and improved concurrency (append-only writes ensuring serializability). Aug. 2019, Delta 0.3 (Scala/Java API): Scala/Java APIs for DML commands, querying the commit history, and vacuuming old files. Oct. 2019, Delta 0.4 (Python APIs and Convert to Delta): Python APIs for DML and utility operations, Convert-to-Delta, and SQL for utility operations.
  9. Demo
  10. Benefits: ACID transactions on Spark; scalable metadata handling; unified batch and streaming source; schema enforcement; time travel; audit history; full DML support.
  11. ACID Transactions on Spark: (01) Multiple writes: every write is a transaction, and writes are recorded in serial order in a transaction log. (02) Optimistic concurrency: multiple writers trying to modify the same files at the same time is uncommon. (03) Serializable isolation level: writers can keep writing to a directory or table while consumers keep reading from the same directory or table.
  12. Scalable Metadata Handling: (01) Metadata in transaction log: metadata for a table or directory is stored in the transaction log instead of the metastore. (02) Efficient data read: Delta Lake can list files in large directories in constant time.
  13. Unified Batch and Streaming Sink: (01) Streaming sink: an efficient streaming sink for Apache Spark's Structured Streaming. (02) Near-real-time analytics: with ACID transactions and scalable metadata handling, the streaming sink enables many near-real-time analytics use cases without having to maintain a complicated streaming and batch pipeline.
  14. Schema Enforcement: (01) Automatic schema validation: Delta Lake automatically validates the DataFrame's schema against the schema of the table. (02) Column validation: columns that are present in the table but not in the DataFrame are set to null; an exception is thrown when the DataFrame contains an extra column that is not in the table. (03) Delta Lake has DDL to explicitly add new columns and the ability to update the schema automatically.
  15. Time Travel and Data Versioning: (01) Snapshots: users can read a previous snapshot of the table or directory. (02) Versioning: newer versions of files are created when files are modified during writes, and older versions are preserved. (03) Timestamp and transaction log: provide a timestamp or a version number to Apache Spark's read APIs to read an older version of the table or directory; Delta Lake constructs the full snapshot as of that timestamp or version from the information in the transaction log. Users can reproduce experiments and reports and can revert a table to an older version.
  16. Record Update and Deletion (Coming Soon): (01) Merge, update, delete: Delta Lake will support merge, update, and delete, so users can easily upsert and delete records in data lakes and simplify their change data capture and GDPR use cases. (02) File-level granularity: more efficient than reading and overwriting entire partitions or tables.
  17. Data Expectations (Coming Soon): (01) API to set data expectations: Delta Lake will support an API to set expectations on tables or directories. (02) Severity to handle expectations: engineers will be able to specify a Boolean condition and tune the severity used to handle data expectations.
  18. Modern Data Warehouse (diagram labels): Ingestion Tables; Refined Tables; Feature/Agg Data Store; Existing Data Lake; Azure Data Lake Storage; Analytics; Machine Learning.
  19. Conclusion (Delta Lake vs. data lake): Unification: Delta Lake is high (batch and stream data sets can be processed in the same pipeline); data lake is medium (stream processing requires a hot pipeline). Reliability: Delta Lake is high (schema enforcement and ACID operations make the data lake more reliable); data lake is lower (it accepts all data, and late binding leads to a lot of orphan data). Ease of use: Delta Lake is medium (it requires DBA-style operations such as Vacuum and Optimize); data lake is high (no schema on write, so it accepts any data). Performance: Delta Lake is high (Z-Order skips files for efficient reads); data lake is medium (sequential reads).
  20. Like to know more? https://github.com/KRSNagaraj/SparkSummit2019 | https://www.linkedin.com/in/NagarajSengodan | https://www.linkedin.com/in/NitinRajS/
  21. Don't forget to rate and review the sessions. Search: Spark + AI Summit.
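
The column validation rules from the Schema Enforcement slide (slide 14) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `enforce_schema` is a hypothetical helper, rows are plain dicts rather than Spark DataFrames, and the schema maps column names to Python types instead of Spark types.

```python
def enforce_schema(table_schema, rows):
    """Validate dict rows against table_schema (column name -> Python type).

    Hypothetical helper mirroring the slide's rules, not Delta Lake's implementation:
    - extra columns that the table does not have raise an exception
    - columns missing from a row are filled with None (null)
    - non-null values must match the declared type
    """
    checked = []
    for row in rows:
        extra = set(row) - set(table_schema)
        if extra:
            raise ValueError(f"columns not in table schema: {sorted(extra)}")
        out = {}
        for col, col_type in table_schema.items():
            value = row.get(col)  # missing column becomes None
            if value is not None and not isinstance(value, col_type):
                raise TypeError(f"column {col!r} expects {col_type.__name__}")
            out[col] = value
        checked.append(out)
    return checked


schema = {"id": int, "name": str, "country": str}

# "country" is missing from the incoming row, so it is set to None.
ok = enforce_schema(schema, [{"id": 1, "name": "Asha"}])
print(ok)

# "zip" is not a table column, so the write is rejected.
try:
    enforce_schema(schema, [{"id": 2, "name": "Ben", "zip": "600001"}])
except ValueError as err:
    print("rejected:", err)
```

Rejecting the whole batch on the first bad row keeps the sketch transactional in spirit: either every row passes validation or nothing is returned for writing.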
