Building Data Intensive Analytic Application on Top of Delta Lakes

Why build your own analytics application on top of Delta Lake?
– Every enterprise is building a data lake. However, these data lakes are plagued by low user adoption and poor data quality, which results in lower ROI.
– BI tools may not be enough for your use case, especially when you want to build a data-driven analytical web application such as Paysa.
– Delta's ACID guarantees allow you to build a real-time reporting app that displays consistent and reliable data.

In this talk we will learn:

How to build your own analytics app on top of Delta Lake.
How Delta Lake helps you build a pristine data lake, with several ways to expose data to end users.
How an analytics web application can be backed by a custom query layer that executes Spark SQL on a remote Databricks cluster.
The options for building an analytics application using various back-end technologies.
The architecture patterns, components, and frameworks that can be used to build a custom analytics platform quickly.
How to leverage machine learning to build advanced analytics applications.
Demo: an analytics application built on Play Framework (back end), React (front end), and Structured Streaming for ingesting data from a Delta table, featuring live query analytics on real-time data and ML predictions based on analytics data.
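
The custom query layer described above could hand Spark SQL to a remote Databricks cluster through the Databricks Jobs API, which slide 16 of the deck names. The following is a minimal Python sketch under stated assumptions, not the speakers' implementation (the demo back end is built on Play Framework). It assumes a pre-created job whose notebook reads a hypothetical `input_sql` parameter and runs it with Spark SQL, and an injected `transport` callable (also hypothetical) that performs the authenticated HTTP call. The `run-now` and `runs/get` endpoints and the `life_cycle_state` / `result_state` fields come from the Jobs API 2.0.

```python
import time
from typing import Callable, Optional

# Databricks Jobs API 2.0 paths.
RUN_NOW_PATH = "/api/2.0/jobs/run-now"
RUNS_GET_PATH = "/api/2.0/jobs/runs/get"

# Life-cycle states after which a run will not change any more.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}

# Assumption: the caller supplies transport(method, path, json_body) -> dict,
# which adds the workspace URL and auth token and returns the parsed JSON reply.
Transport = Callable[[str, str, Optional[dict]], dict]


def build_run_now_payload(job_id: int, input_sql: str) -> dict:
    """Build the run-now body; `input_sql` is a hypothetical notebook parameter."""
    return {"job_id": job_id, "notebook_params": {"input_sql": input_sql}}


def submit_sql(transport: Transport, job_id: int, input_sql: str) -> int:
    """Trigger the job and return the run_id that identifies this execution."""
    reply = transport("POST", RUN_NOW_PATH, build_run_now_payload(job_id, input_sql))
    return reply["run_id"]


def wait_for_run(
    transport: Transport,
    run_id: int,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> str:
    """Poll the run until it reaches a terminal state and return its outcome."""
    for _ in range(max_polls):
        run = transport("GET", f"{RUNS_GET_PATH}?run_id={run_id}", None)
        state = run["state"]
        if state["life_cycle_state"] in TERMINAL_STATES:
            # result_state is only present once the run has finished executing.
            return state.get("result_state", state["life_cycle_state"])
        time.sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

Injecting the transport keeps the polling logic independent of any particular HTTP client and lets it be exercised with a fake in tests; a web back end would typically return the `run_id` to the browser right away and expose the polling result through a status endpoint.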


Building Data Intensive Analytic Application on Top of Delta Lakes

  1. 1. Ganesh Chand, Databricks; Ravi Gawai, Databricks
  2. 2. Agenda • Delta Lake - What and Why? • Common Delta Lake use cases • Data as a Service (DaaS) • Our Approach • Use Cases • Demo • Q&A
  3. 3. What’s a Data Lake? A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. “If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.” - James Dixon
  4. 4. Why Data Lake? [Diagram: lakes, streams, warehouses, NoSQL, CSV, JSON, TXT…] Challenges with Data Warehouse • Big Data problem • Expensive (build, store and process) • Proprietary technology (processing and storage) • Vendor lock-in • Lack of ML capabilities
  5. 5. Data Lake: Aspiration. Real-time Streaming, Data Science and ML • Recommendation Engines • Risk, Fraud, & Intrusion Detection • Customer Analytics • IoT & Predictive Maintenance • Genomics & DNA Sequencing. Use AI and Machine Learning to outperform your competition, retain your customers, and boost your productivity with lower TCO, using a variety of data sources.
  6. 6. Data Lake: Reality. Real-time Streaming, Data Science and ML • Recommendation Engines • Risk, Fraud, & Intrusion Detection • Customer Analytics • IoT & Predictive Maintenance • Genomics & DNA Sequencing. The majority of these projects are failing! Unreliable, low-quality data; slow performance.
  7. 7. Why? Strengths of Data Warehouse • Full ACID Transactions • Insert, Delete, Update w/ SCD-II • Indexing for faster query response • Schema-On-Write. Strengths of Data Lake • Open Source, Open Standards • Powered By Apache Spark • Scale • Unified platform for data & AI. And: ● Unification of Batch & Streaming workloads ● Incrementally improve the quality of your data until it is ready for consumption (Multi-hop pipelines) ● Dramatically reduces legacy Spark/Hive operational burdens ● Scalable Metadata Handling
  8. 8. What’s a Delta Lake? A Data Lake Powered By Delta. [Diagram: sources (lakes, streams, warehouses, NoSQL, CSV, JSON, TXT…) feed Delta Lake through three stages: Bronze (raw ingestion), Silver (filtered, cleaned, augmented), Gold (business-level aggregates)]
  9. 9. Common Delta Lake Use Cases • Interactive Queries • BI reporting and dashboards • Train and Build Machine Learning Models • Create Data Warehouse • Create / Monetize Data Products • Sell or Share curated data to partners, vendors and internal customers • Feed data back to source systems, web applications, Mobile Apps
  10. 10. Common Delta Lake Use Cases • Interactive Queries • BI reporting and dashboards • Train and Build Machine Learning Models • Create Data Warehouse • Create / Monetize Data Products • Sell or Share curated data to partners, vendors and internal customers • Feed data back to source systems, web applications, Mobile Apps
  11. 11. Serving Data From Delta Lake [Diagram: Web app, Mobile app, ERP, Storage, Data product, Data enrichment, Data integration, Data export]
  12. 12. Serving Data From Delta Lake
  13. 13. Serving Data From Delta Lake [Diagram: Storage (S3, ADLS, HDFS), Catalog, Compute, Consumers, Serving API, Access Management, Data Service, Metadata Service]
  14. 14. Serving Data From Delta Lake. Data-as-a-Service (DaaS) • REST APIs • Read-Only • Data Format • Delivery mechanism. Challenges • Security • Latency • Throughput • SLA • Data licensing, ownership and monetization model • Managing evolving requirements • Minimizing Information Silos
  15. 15. Use Cases for Demo App • MVP features for the demo app • End-to-end ETL pipeline writing into Delta Lake • DaaS REST endpoint to export data • Front-end app to consume data and build a dashboard • UI to interact with Delta Lake • Export classified and aggregated data out of Delta Lake to be consumed by a client app
  16. 16. Our implementation [Diagram: Storage (S3), Compute, Consumers, Databricks Jobs API, Serving (REST)] Routes: /listSchemas, /listTables, /exportData
  17. 17. DaaS APIs
      GET delta-meta-service/getDbDetails
      GET delta-meta-service/previewTable?table=db.tablename
      POST delta-sql-service/exportSqlData -d { "inputSql": "select * from db.table where condition", "outputPath": "/path/", "format": "json" }
      GET delta-sql-service/getRunStatus?run_id=id
  18. 18. Demo
  19. 19. Delta ETL pipeline
  20. 20. Front-End
  21. 21. Front-End
  22. 22. Thank You. ganesh@databricks.com, ravi@databricks.com
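
As a rough illustration of the DaaS endpoints listed on slide 17, the sketch below builds the corresponding HTTP requests in Python with only the standard library. The endpoint paths and JSON field names are taken from the slide; the base URL, the function names, and the read-only check are assumptions added for illustration, and the check is deliberately naive (it rejects anything that is not a single statement beginning with `select`, including valid `WITH` queries). It is a sketch of how a client might call such a service, not the service shown in the demo.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the service is reachable here; the slides do not give a host.
BASE_URL = "http://localhost:9000"


def is_read_only(sql: str) -> bool:
    """Naive guard for the 'Read-Only' goal: a single SELECT statement only."""
    statement = sql.strip().rstrip(";").strip()
    return statement.lower().startswith("select") and ";" not in statement


def preview_table_request(table: str) -> Request:
    """GET delta-meta-service/previewTable?table=db.tablename"""
    query = urlencode({"table": table})
    return Request(f"{BASE_URL}/delta-meta-service/previewTable?{query}", method="GET")


def export_sql_request(input_sql: str, output_path: str, fmt: str = "json") -> Request:
    """POST delta-sql-service/exportSqlData with the JSON body from the slide."""
    if not is_read_only(input_sql):
        raise ValueError("only single SELECT statements may be exported")
    body = json.dumps(
        {"inputSql": input_sql, "outputPath": output_path, "format": fmt}
    ).encode("utf-8")
    return Request(
        f"{BASE_URL}/delta-sql-service/exportSqlData",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_status_request(run_id: int) -> Request:
    """GET delta-sql-service/getRunStatus?run_id=id"""
    query = urlencode({"run_id": run_id})
    return Request(f"{BASE_URL}/delta-sql-service/getRunStatus?{query}", method="GET")
```

Each function only constructs a `urllib.request.Request`; passing the result to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the request shapes easy to inspect without a running service.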
