OpenMetadata
Community Meeting
April 2024 🚀
3975✨
GitHub Stars
Open Source
Developers
238
Community
Members
5350
Community
2214
+8
+215
+267
+58
Community Stats
11 Community Contributions 💖
👤 Shin-ichi Hashiba
👤 Sonika Shah
👤 Park Jin Won
👤 Trs
👤 Lulienne Schiefelbein
👤 Kenji Nakagaki
👤 Kent Chenery
👤 Mariusz
👤 Italo Batista
👤 Antoine Balliet
👤 Thiago Pires
OpenMetadata SWAG 💃
👤 Kent Chenery - MSSQL Improvements
👤 Mariusz Górski & Dominik Choma - OpenLineage connector (& more!)
👤 Shiyang Xiao - SAS connector
👤 Sander Van Laar - NL translation
👤 Megumi Aikawa - PGSpider Connector
👤 Anatoliy Shulika - Greenplum Connector
👤 Gautham Kishore - Lightdash Connector
👤 Nguyen Huu Loc - Looker Improvements
Community Metrics
● 527 in the last 4 weeks
Release 1.4 - Some highlights…
+450 Commits vs. 1.3.0
● MetaPilot
● Metadata Actions
● Qlik Cloud
● Alerts Improvements
● Ingestion Performance
● Tables DDL
● Azure Services Identity
● Connectors SSL
● Azure Key Vault Integration
● Metadata Bulk Operations
● Custom Properties Types
● Suggest APIs
OpenMetadata - Unlocking AI
● Metadata Schemas & APIs → the foundations of OM
● Our vision: foster innovation
● Stop reinventing the wheel and build on top of strong foundations
OpenMetadata - Unlocking AI
MetaPilot - Metadata & Gen AI
In 1.4, we’ll bring three main capabilities:
● Bulk Automated Description Generation
● Natural Language to Query Generation
● Co-Pilot for your Query Optimization
MetaPilot - Demo
MetaPilot - Description Generation
MetaPilot - SQL Query Chat & Generation
● Ask questions anywhere
● Get queries to quickly solve your use cases
● Explore your data in a new way
MetaPilot - SQL Query Chat & Generation
Understand how others are querying the data
MetaPilot - SQL Query Optimization
MetaPilot - SQL Query Fixing
Ingestion Framework - Improvements
● Multithreading Extraction
● Incremental Extraction
Ingestion Framework - Recap
What is it?
It is responsible for extracting data from the different sources, mapping it to the OpenMetadata standard, and sending it to the OpenMetadata API.
It can be used directly through the UI or programmatically through the Python SDK.
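The extract, map, and send steps described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the OpenMetadata Python SDK: `RawTable`, `TableEntity`, `to_entity`, and `run_pipeline` are hypothetical names, and the "API" is replaced by any callable that accepts an entity.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class RawTable:
    """Source-specific record, as a connector might read it (hypothetical)."""
    schema: str
    name: str
    columns: List[str]


@dataclass
class TableEntity:
    """Simplified stand-in for an entity in the common metadata standard."""
    fully_qualified_name: str
    columns: List[str]


def extract(rows: Iterable[RawTable]) -> Iterator[RawTable]:
    """Step 1: read records from the source, one at a time."""
    yield from rows


def to_entity(service: str, raw: RawTable) -> TableEntity:
    """Step 2: map a source record to the common model."""
    return TableEntity(f"{service}.{raw.schema}.{raw.name}", list(raw.columns))


def run_pipeline(
    service: str,
    rows: Iterable[RawTable],
    sink: Callable[[TableEntity], None],
) -> int:
    """Step 3: send every mapped entity to the sink; return how many were sent."""
    sent = 0
    for raw in extract(rows):
        sink(to_entity(service, raw))
        sent += 1
    return sent


# Usage: a list's append method stands in for the call to the API.
received: List[TableEntity] = []
count = run_pipeline(
    "demo_service",
    [RawTable("sales", "orders", ["id", "amount"])],
    received.append,
)
```

Keeping the three steps as separate functions means the mapping can be tested without a live source or a running server.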
Ingestion Framework - Main Ideas
● Separation of Concerns
● Service Topology
Multithreading Extraction
[BETA] At Schema level for Databases
[Figure: No Multithreading vs. Multithreading]
Multithreading Extraction - Performance Tests
BigQuery
● Scenario: 4 Schemas, 200 Tables Each
● Improvement: Up to 75% faster
Redshift
● Scenario: 4 Schemas, 1000 Tables Each
● Improvement: Up to 21% faster
Snowflake
● Scenario: 4 Schemas, 1000 Tables Each
● Improvement: Up to 29% faster
Multithreading Extraction - General Considerations
● Each case is specific. More threads do not necessarily mean better performance.
● With each thread we:
○ Increase the load on the database (we open one connection per thread)
○ Increase the memory used (we hold more context in memory at any given time)
● We recommend experimenting with a small number of threads (1-8)
If unsure, or if you run into issues, leave it at 1, which falls back to the default single-threaded behavior
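A minimal sketch of schema-level multithreading, using Python's standard `concurrent.futures`. It is not the framework's implementation: `extract_schemas` and `fake_extract` are hypothetical names, and a real connector would open its own database connection inside each worker. Setting `threads` to 1 takes the plain sequential path, mirroring the fallback described above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def extract_schemas(
    schemas: List[str],
    extract_one: Callable[[str], List[str]],
    threads: int = 1,
) -> Dict[str, List[str]]:
    """Extract each schema, optionally spreading schemas across worker threads."""
    if threads <= 1:
        # Default behavior: one schema after another, no pool at all.
        return {schema: extract_one(schema) for schema in schemas}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in the same order as the input schemas.
        results = pool.map(extract_one, schemas)
        return dict(zip(schemas, results))


def fake_extract(schema: str) -> List[str]:
    """Hypothetical extractor; a real one would query the database."""
    return [f"{schema}.table_{i}" for i in range(3)]


sequential = extract_schemas(["a", "b"], fake_extract, threads=1)
threaded = extract_schemas(["a", "b"], fake_extract, threads=4)
```

Both calls return the same mapping; only the execution strategy differs, which makes it easy to compare timings for different thread counts against the same source.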
Multithreading Extraction - User Perspective
Incremental Extraction - Push vs Pull
Pull Pattern
● We go to the source and get all the data we need
● We don't know what changed and what remained the same
Push Pattern
● We receive the data
● Something external is responsible for the data we receive
[BETA] For BigQuery, Redshift and Snowflake
Incremental Extraction - Approach
1. Look for a previous successful pipeline run
2. Get the time it started to run and add a safety margin
3. Update only the assets that changed since then
With this initial step, we could potentially avoid processing a lot of data
[BETA] For BigQuery, Redshift and Snowflake
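The three steps above can be sketched as follows. This assumes that "adding a safety margin" means moving the cut-off earlier than the start of the last successful run, and that each asset exposes a last-modified timestamp; `incremental_start` and `changed_since` are hypothetical helpers, not framework functions.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def incremental_start(
    last_success_start: Optional[datetime],
    safety_margin_days: int,
) -> Optional[datetime]:
    """Return the cut-off timestamp, or None when a full extraction is needed."""
    if last_success_start is None:
        return None  # no previous successful run was found
    # Assumption: the margin widens the window by starting earlier.
    return last_success_start - timedelta(days=safety_margin_days)


def changed_since(
    last_altered: Dict[str, datetime],
    start: Optional[datetime],
) -> List[str]:
    """Pick the assets to update: all of them, or only those changed since start."""
    if start is None:
        return sorted(last_altered)
    return sorted(name for name, ts in last_altered.items() if ts >= start)


last_run = datetime(2024, 4, 3, tzinfo=timezone.utc)
start = incremental_start(last_run, safety_margin_days=1)
tables = {
    "sales.orders": datetime(2024, 4, 2, 12, 0, tzinfo=timezone.utc),
    "sales.customers": datetime(2024, 3, 1, tzinfo=timezone.utc),
}
to_update = changed_since(tables, start)
```

Here only `sales.orders` falls inside the window, so `sales.customers` would be skipped on this run.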
Incremental Extraction - Flow
[BETA] For BigQuery, Redshift and Snowflake
[Figure: No Incremental vs. Incremental]
Incremental Extraction - How does it work?
● BigQuery
Query Cloud Logging to get the table changes
● Redshift
Query the SYS_QUERY_HISTORY view to get the query text and parse it
● Snowflake
Query the SNOWFLAKE.ACCOUNT_USAGE.TABLES view to get the changes
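For the Snowflake case, a change lookup could look roughly like the sketch below. It is not the query OpenMetadata runs. It assumes the `TABLE_CATALOG`, `TABLE_SCHEMA`, `TABLE_NAME`, `LAST_ALTERED`, and `DELETED` columns of `SNOWFLAKE.ACCOUNT_USAGE.TABLES`, and `%s` placeholders for a DB-API driver that uses the pyformat parameter style. `snowflake_changes_query` is a hypothetical helper that only builds the statement and its parameters.

```python
from datetime import datetime, timezone
from typing import Tuple


def snowflake_changes_query(database: str, since: datetime) -> Tuple[str, tuple]:
    """Build a parameterized query for tables altered or dropped since `since`."""
    sql = (
        "SELECT TABLE_SCHEMA, TABLE_NAME, DELETED "
        "FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES "
        "WHERE TABLE_CATALOG = %s "
        "AND (LAST_ALTERED >= %s OR DELETED >= %s)"
    )
    # Values are passed separately so the driver handles quoting.
    return sql, (database, since, since)


sql, params = snowflake_changes_query(
    "ANALYTICS", datetime(2024, 4, 2, tzinfo=timezone.utc)
)
```

Returning the statement and parameters as a pair keeps the sketch testable without a Snowflake account; a caller would hand both to the cursor's `execute` method.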
Incremental Extraction - Performance Tests
BigQuery
● Scenario: 1 Schema, 1000 Tables
● Improvement: Up to 89% faster
Redshift
● Scenario: 5 Schemas, 1000 Tables Each
● Improvement: Up to 57% faster
Snowflake
● Scenario: 7 Schemas, 1000 Tables Each
● Improvement: Up to 28% faster
Incremental Extraction - User Perspective
Configurations
● enabled
● lookbackDays
● safetyMarginDays
Execution Time Aggregate on DEBUG
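The three configuration fields listed above could be modeled as a small typed object. The field names come from the slide; the defaults and the validation are assumptions made for illustration, and `IncrementalConfig` is a hypothetical class rather than part of the framework. Presumably `lookbackDays` bounds how far back to search for a previous successful run and `safetyMarginDays` widens the change window, but the slide does not spell this out.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class IncrementalConfig:
    """Typed view of the incremental settings (defaults are assumed)."""
    enabled: bool = False
    lookback_days: int = 7
    safety_margin_days: int = 1

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "IncrementalConfig":
        """Read the camelCase keys shown on the slide, applying assumed defaults."""
        config = cls(
            enabled=bool(raw.get("enabled", False)),
            lookback_days=int(raw.get("lookbackDays", 7)),
            safety_margin_days=int(raw.get("safetyMarginDays", 1)),
        )
        if config.lookback_days < 0 or config.safety_margin_days < 0:
            raise ValueError("day counts must not be negative")
        return config


config = IncrementalConfig.from_dict({"enabled": True, "lookbackDays": 14})
```

With this input, `safety_margin_days` keeps its assumed default while the other two fields take the supplied values.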
Star us on GitHub
https://github.com/open-metadata/OpenMetadata
Join our Slack
https://slack.open-metadata.org/
Follow us on X
@open_metadata
Discover Collate SaaS
https://www.getcollate.io/

OpenMetadata Community Meeting - 4th April, 2024
