The document describes ODPi Egeria, an open source project that provides open metadata and data governance capabilities. It enables the exchange of metadata between tools from different vendors through a distributed virtual graph. Key points include that Egeria supports automated metadata maintenance at scale, provides ubiquitous metadata management on platforms like Hadoop, uses open and remotely accessible metadata with standard interfaces, and aims to integrate metadata discovery and maintenance into all data tools.
Presenter: Kenn Knowles, Software Engineer, Google & Apache Beam (incubating) PPMC member
Apache Beam (incubating) is a programming model and library for unified batch & streaming big data processing. This talk will cover the Beam programming model broadly, including its origin story and vision for the future. We will dig into how Beam separates concerns for authors of streaming data processing pipelines, isolating what you want to compute from where your data is distributed in time and when you want to produce output. Time permitting, we might dive deeper into what goes into building a Beam runner, for example atop Apache Apex.
Presented at All Things Open 2022
Presented by Danny McCormick
Title: Streaming Data Pipelines With Apache Beam
Abstract: Handling big data presents big problems. Along with traditional concerns like scalability and performance, the increasingly common need for live streaming data processing introduces problems like late or incomplete data from flaky data sources. Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines that addresses these challenges. Using one of the open source Beam SDKs, you can build a program that defines a pipeline to be executed by one of Beam’s supported distributed processing back-ends, which include Apache Flink, Apache Spark, and Google Cloud Dataflow.
This talk will explore some problems associated with processing large datasets at scale and how you can write Apache Beam pipelines that address those issues. It will include a demo of a basic Beam streaming pipeline.
Takeaways: an understanding of some challenges associated with large datasets, the Apache Beam model, and how to write a basic Beam streaming pipeline
Audience: anyone dealing with big datasets or interested in data processing at scale.
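The "what you want to compute vs. where your data sits in event time" separation at the heart of the Beam model can be illustrated with a toy sketch. This is plain Python, not Beam's actual API; the function name and window logic are illustrative assumptions, showing only the idea of grouping out-of-order events by event time:

```python
from collections import defaultdict

def fixed_windows(events, window_size):
    """Assign each (event_time, value) pair to a fixed event-time window.

    Toy illustration of the 'where in event time' question from the
    Beam model -- not Beam's actual API.
    """
    windows = defaultdict(list)
    for event_time, value in events:
        window_start = (event_time // window_size) * window_size
        windows[window_start].append(value)
    return dict(windows)

# Events arrive out of order (a hallmark of streaming sources), but
# windowing by *event* time still groups them correctly.
events = [(1, "a"), (12, "b"), (3, "c"), (11, "d")]
result = fixed_windows(events, window_size=10)
# Window [0, 10) holds "a" and "c"; window [10, 20) holds "b" and "d".
```

In Beam itself, the same idea is expressed declaratively with windowing strategies and triggers rather than hand-rolled grouping.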
ODPi Egeria provides a framework for open metadata management, supporting many use cases in the governance of data in Data Lakes – as described in the Egeria webinar on 2nd June: “Data Lake Design with Egeria”. As described in that webinar, Egeria operates across the Data Lake, without needing centralization of metadata from the different tools into a central tool or repository.
Metadata frequently describes relationships between things like Assets, Schemas, Glossaries, and Terms – and these relationships form graphs. Egeria is distributed in nature, enabling you to see a federated view of the metadata contained in multiple tools and metadata repositories. As a result, the discrete graphs naturally federate to form a distributed graph. In this session, we’ll cover the Open Metadata Repository Services (OMRS) layer that enables Egeria to operate across this distributed graph.
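The federation idea – querying many repositories in place rather than centralizing their metadata – can be sketched in a few lines. This is a toy in the spirit of OMRS, not its real protocol (which adds cohorts, events, and type mapping); the repository shape and GUID field here are assumptions for illustration:

```python
def federated_find(repositories, predicate):
    """Run the same query against several independent metadata
    repositories and merge the results, de-duplicating by a stable
    GUID, instead of copying all metadata into one central store.
    Toy sketch only -- the real OMRS protocol is far richer.
    """
    seen = {}
    for repo in repositories:
        for entity in repo:
            if predicate(entity) and entity["guid"] not in seen:
                seen[entity["guid"]] = entity
    return list(seen.values())

catalog_a = [{"guid": "g1", "type": "Asset", "name": "sales.csv"}]
catalog_b = [
    {"guid": "g1", "type": "Asset", "name": "sales.csv"},  # same asset, known to both
    {"guid": "g2", "type": "GlossaryTerm", "name": "Revenue"},
]
assets = federated_find([catalog_a, catalog_b],
                        lambda e: e["type"] == "Asset")
# One Asset entity comes back, even though two repositories know about it.
```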
Apache Iceberg - A Table Format for Huge Analytic Datasets
Alluxio, Inc.
Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019
Apache Iceberg - A Table Format for Huge Analytic Datasets
Speaker:
Ryan Blue, Netflix
For more Alluxio events: https://www.alluxio.io/events/
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
Recently, a set of modern table formats such as Delta Lake, Hudi, and Iceberg has emerged. Together with the Hive Metastore, these table formats aim to solve long-standing problems in traditional data lakes with features like ACID transactions, schema evolution, upserts, time travel, and incremental consumption.
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
DataWorks Summit
Learn about the industry's new open metadata standard Egeria, introduced in September by ODPi, The Linux Foundation’s Open Data Platform initiative. Egeria supports the free flow of standardized metadata between different technologies and vendor platforms, enabling organizations to locate, manage and use their data resources more effectively. Explore how Egeria's set of open APIs, types and interchange protocols allows all metadata repositories to share and exchange metadata. From this common base, it adds governance, discovery and access frameworks for automating the collection, management and use of metadata across an enterprise. The result is an enterprise catalog of data resources that are transparently assessed, governed and used in order to deliver maximum value to the enterprise.
This presentation by ODPi Director John Mertic provides an introduction to Egeria and explores how the standard provides a vendor-neutral approach to data governance. Learn how a group of companies led by ING, IBM and Hortonworks came together through the open source community to re-imagine data governance and deliver Egeria – automating the collection, management and use of metadata across organizations of any size and complexity. Learn how Egeria was built on open standards and delivered under the Apache 2.0 open source license.
Data and AI Summit: Data Pipelines Observability with OpenLineage
Julien Le Dem
Presentation of Data Lineage and Observability with OpenLineage at the “Data and AI Summit” (formerly Spark Summit), with a focus on the Apache Spark integration for OpenLineage.
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Flink Forward
Flink Forward San Francisco 2022.
Probably everyone who has written stateful Apache Flink applications has used one of the fault-tolerant keyed state primitives ValueState, ListState, and MapState. With RocksDB, however, retrieving and updating items comes at an increased cost that you should be aware of. Sometimes, these costs may not be avoidable with the current API, e.g., for efficient event-time stream-sorting or streaming joins where you need to iterate one or two buffered streams in the right order. With FLIP-220, we are introducing a new state primitive: BinarySortedMultiMapState. This new form of state allows you to (a) efficiently store lists of values for a user-provided key, and (b) iterate keyed state in a well-defined sort order. Both features can be backed efficiently by RocksDB with a 2x performance improvement over the current workarounds. This talk will go into the details of the new API and its implementation, present how to use it in your application, and talk about the process of getting it into Flink.
by Nico Kruber
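The access pattern that BinarySortedMultiMapState targets – per-key lists of values iterable in sort-key order – can be sketched in plain Python. This is only a conceptual toy, not Flink's API (the class and method names below are invented for illustration); Flink's real primitive is fault-tolerant and backed by RocksDB:

```python
import bisect
from collections import defaultdict

class SortedMultiMap:
    """Toy sketch of the sorted-multimap idea: for each user key, keep
    lists of values ordered by a sort key (e.g. an event timestamp), so
    buffered streams can be iterated in order."""

    def __init__(self):
        self._keys = defaultdict(list)    # sorted sort-keys per user key
        self._values = defaultdict(dict)  # user key -> {sort key: [values]}

    def add(self, user_key, sort_key, value):
        per_key = self._values[user_key]
        if sort_key not in per_key:
            bisect.insort(self._keys[user_key], sort_key)  # keep keys sorted
            per_key[sort_key] = []
        per_key[sort_key].append(value)

    def iterate(self, user_key):
        """Yield (sort_key, values) pairs in sort-key order."""
        for sk in self._keys[user_key]:
            yield sk, self._values[user_key][sk]

m = SortedMultiMap()
m.add("sensor-1", 30, "late")
m.add("sensor-1", 10, "first")
m.add("sensor-1", 10, "dup")
ordered = list(m.iterate("sensor-1"))
# -> [(10, ["first", "dup"]), (30, ["late"])]
```

Event-time stream-sorting falls out naturally: buffer values under their timestamps, then iterate in order once the watermark passes.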
Mario Molina, Software Engineer
CDC systems are typically used to identify changes in data sources and to capture and replicate those changes to other systems. Companies use CDC to sync data across systems, migrate to the cloud, or even apply stream processing, among other use cases.
In this presentation we’ll look at CDC patterns, see how to use CDC with Apache Kafka, and run a live demo!
https://www.meetup.com/Mexico-Kafka/events/277309497/
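The replication side of CDC reduces to applying an ordered stream of change events to a replica. The sketch below is a deliberately tiny illustration of that idea, not Kafka or any connector API; real systems carry much richer event envelopes, but the operation/key/row shape assumed here captures the core:

```python
def apply_change_event(replica, event):
    """Apply a single CDC-style change event to a replica table.

    Toy sketch: each event names an operation, a key, and (for
    inserts/updates) the new row. Applying events in order keeps
    the replica in sync with the source.
    """
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        replica[key] = event["row"]
    elif op == "delete":
        replica.pop(key, None)
    return replica

replica = {}
events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada"}},
    {"op": "update", "key": 1, "row": {"name": "Ada L."}},
    {"op": "insert", "key": 2, "row": {"name": "Grace"}},
    {"op": "delete", "key": 2},
]
for e in events:
    apply_change_event(replica, e)
# replica is now {1: {"name": "Ada L."}}
```

Kafka's role in real deployments is to carry these events durably and in per-key order between the capture side and any number of consumers.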
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
Apache Iceberg: An Architectural Look Under the Covers
ScyllaDB
Data lakes have been built with a desire to democratize data – to allow more and more people, tools, and applications to make use of data. A key capability needed to achieve this is hiding the complexity of underlying data structures and physical data storage from users. The de facto standard, the Hive table format, addresses some of these problems but falls short at data, user, and application scale. So what is the answer? Apache Iceberg.
The Apache Iceberg table format is now in use and contributed to by many leading tech companies, including Netflix, Apple, Airbnb, LinkedIn, Dremio, Expedia, and AWS.
Watch Alex Merced, Developer Advocate at Dremio, as he describes the open architecture and performance-oriented capabilities of Apache Iceberg.
You will learn:
• The issues that arise when using the Hive table format at scale, and why we need a new table format
• How a straightforward, elegant change in table format structure has enormous positive effects
• The underlying architecture of an Apache Iceberg table, how a query against an Iceberg table works, and how the table’s underlying structure changes as CRUD operations are done on it
• The resulting benefits of this architectural design
This is part 3 of the series on Data Mesh, looking at the intersection of microservices architecture concepts, data integration/replication technologies, and log-based stream integration techniques. This webinar was mostly a demonstration, but several slides used to set up the demo are included here as a PDF for viewers.
Details:
• DevOps and Business Intelligence?
• CI/CD Pipelines: What are they?
• Database Deployments: State based vs Migration based
• Snowflake features for CI/CD
• Azure DevOps: Build and Release Pipelines
• Putting it all together: End to End solution
• Demo
Change Data Streaming Patterns for Microservices With Debezium
Confluent
(Gunnar Morling, RedHat) Kafka Summit SF 2018
Debezium (noun | de·be·zi·um | /dɪ:ˈbɪ:ziːəm/): secret sauce for change data capture (CDC), streaming changes from your datastore so that you can solve multiple challenges: synchronizing data between microservices, gradually extracting microservices from existing monoliths, maintaining different read models in CQRS-style architectures, updating caches and full-text indexes, and feeding operational data to your analytics tools.
Join this session to learn what CDC is about, how it can be implemented using Debezium – an open source CDC solution based on Apache Kafka – and how it can be utilized for your microservices. Find out how Debezium captures all the changes from datastores such as MySQL, PostgreSQL and MongoDB, how to react to the change events in near real time, and how Debezium is designed not to compromise on data correctness and completeness even if things go wrong. In a live demo we’ll show how to set up a change data stream out of your application’s database without any code changes needed. You’ll see how to sink the change events into other databases and how to push data changes to your clients using WebSockets.
Presentation on Data Mesh: the paradigm shift is a new type of ecosystem architecture – a shift toward a modern distributed architecture that allows domain-specific data, views “data-as-a-product,” and enables each domain to handle its own data pipelines.
The Top 5 Apache Kafka Use Cases and Architectures in 2022
Kai Wähner
I see the following topics coming up more regularly in conversations with customers, prospects, and the broader Kafka community across the globe:
Kappa Architecture: Kappa goes mainstream to replace Lambda and Batch pipelines (that does not mean that there is no batch processing anymore). Examples: Kafka-powered Kappa architectures from Uber, Disney, Shopify, and Twitter.
Hyper-personalized Omnichannel: Retail and customer communication across online and offline channels becomes the new black, including context-specific upselling, recommendations, and location-based services. Examples: Omnichannel Retail and Customer 360 in Real-Time with Apache Kafka.
Multi-Cloud Deployments: Business units and IT infrastructures span across regions, continents, and cloud providers. Linking clusters for bi-directional replication of data in real-time becomes crucial for many business models. Examples: Global Kafka deployments.
Edge Analytics: Low latency requirements, cost efficiency, or security requirements enforce the deployment of (some) event streaming use cases at the far edge (i.e., outside a data center), for instance, for predictive maintenance and quality assurance on the shop floor level in smart factories. Examples: Edge analytics with Kafka.
Real-time Cybersecurity: Situational awareness and threat intelligence need to process massive data in real-time to defend against cyberattacks successfully. The many successful ransomware attacks across the globe in 2021 were a warning for most CIOs. Examples: Cybersecurity for situational awareness and threat intelligence in real-time.
Presto At Arm Treasure Data - 2019 Updates
Taro L. Saito
Presentation at Presto Conference Tokyo 2019
- Arm Treasure Data
- Plazma DB Indexes
- Real-time, Archive Storages
- Schema-on-read data processing
- Physical partition maintenance via presto-stella plugin
Real-time Analytics with Trino and Apache Pinot
Xiang Fu
Trino summit 2021:
Overview of Trino Pinot Connector, which bridges the flexibility of Trino's full SQL support to the power of Apache Pinot's realtime analytics, giving you the best of both worlds.
Iceberg: A modern table format for big data (Strata NY 2018)
Ryan Blue
Hive tables are an integral part of the big data ecosystem, but the simple directory-based design that made them ubiquitous is increasingly problematic. Netflix uses tables backed by S3 that, like other object stores, don’t fit this directory-based model: listings are much slower, renames are not atomic, and results are eventually consistent. Even tables in HDFS are problematic at scale, and reliable query behavior requires readers to acquire locks and wait.
Owen O’Malley and Ryan Blue offer an overview of Iceberg, a new open source project that defines a table layout addressing the challenges of current Hive tables, with properties specifically designed for cloud object stores such as S3. Iceberg is an Apache-licensed open source project. It specifies a portable table format and standardizes many important features, including:
* All reads use snapshot isolation without locking.
* No directory listings are required for query planning.
* Files can be added, removed, or replaced atomically.
* Full schema evolution supports changes in the table over time.
* Partitioning evolution enables changes to the physical layout without breaking existing queries.
* Data files are stored as Avro, ORC, or Parquet.
* Support for Spark, Pig, and Presto.
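Several of the features above – snapshot isolation, atomic file replacement, and planning without directory listings – follow from one design move: a table is an immutable snapshot plus an atomically swappable pointer. The toy below illustrates only that commit model; it is not Iceberg's format (real Iceberg tracks manifests, schemas, and partition specs), and all names in it are invented for illustration:

```python
class TinyTable:
    """Toy sketch of Iceberg's commit model: writers build a new
    immutable snapshot (a tuple of data files) and swap the current
    pointer atomically; readers keep using the snapshot they started
    with, so no locks or directory listings are needed."""

    def __init__(self):
        self.snapshots = [()]  # snapshot 0: empty table
        self.current = 0

    def commit(self, add=(), remove=()):
        old = self.snapshots[self.current]
        new = tuple(f for f in old if f not in remove) + tuple(add)
        self.snapshots.append(new)
        self.current = len(self.snapshots) - 1  # the atomic pointer swap

    def scan(self, snapshot_id=None):
        """Read a pinned snapshot, or the current one by default."""
        sid = self.current if snapshot_id is None else snapshot_id
        return self.snapshots[sid]

t = TinyTable()
t.commit(add=("a.parquet", "b.parquet"))
reader_snapshot = t.current                      # a reader pins snapshot 1
t.commit(add=("c.parquet",), remove=("a.parquet",))
# The pinned reader still sees the old files (snapshot isolation),
# while new reads see the replaced layout. Reading an older snapshot
# id is also, in essence, how time travel works.
```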
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
DataStax Academy
CQRS (Command Query Responsibility Segregation) is a pattern which separates the process of querying and updating data. A query only returns data without any side effects, while a command is designed to change data. CQRS is often combined with Event Sourcing, an architecture in which all changes to an application state are stored as a sequence of events.
Because of its great capability to store time series data, Cassandra is a perfect fit for implementing the event store. But there are still a lot of open questions: What about the data modeling? What techniques will be used to process and store data in the Cassandra database? How do you access the current state of the application without replaying every event? And what about failure handling?
In this talk, I will give a brief introduction to CQRS and the Event Sourcing pattern and will then answer the questions above using a real life example of a data store for customer data.
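The "current state without replaying every event" question has a standard answer: fold the event log, optionally starting from a snapshot. The sketch below illustrates that mechanic in plain Python; the event names and state shape are invented for illustration, and a real store would keep the log in Cassandra keyed by aggregate id:

```python
def apply_event(state, event):
    """Fold one event into the current customer state."""
    kind, data = event
    if kind == "customer_created":
        state = dict(data)
    elif kind == "address_changed":
        state = {**state, "address": data}
    return state

def replay(events, snapshot=None, snapshot_version=0):
    """Rebuild current state from the event log, optionally starting
    from a snapshot so we don't replay every event."""
    state = snapshot or {}
    for event in events[snapshot_version:]:
        state = apply_event(state, event)
    return state

log = [
    ("customer_created", {"name": "Ada", "address": "London"}),
    ("address_changed", "Cambridge"),
    ("address_changed", "Paris"),
]
full = replay(log)
# Starting from a snapshot taken after event 2 gives the same result
# while reading only the tail of the log:
fast = replay(log, snapshot={"name": "Ada", "address": "Cambridge"},
              snapshot_version=2)
```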
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kai Wähner
Not all workloads allow cloud computing. Low latency, cybersecurity, and cost-efficiency require a suitable combination of edge computing and cloud integration.
This session explores architectures and design patterns for software and hardware considerations to deploy hybrid data streaming with Apache Kafka anywhere. A live demo shows data synchronization from the edge to the public cloud across continents with Kafka on Hivecell and Confluent Cloud.
ELT vs. ETL - How they’re different and why it matters
Matillion
ELT is a fundamentally better way to load and transform your data. It’s faster. It’s more efficient. And Matillion’s browser-based interface makes it easier than ever to work with your data. You’re using data to improve your world: shouldn’t the tools you use return the favor?
In this webinar:
- Explore the differences between ELT and ETL
- Learn why ELT is a better, more modern process
- Discover the latest trends in ELT and how they apply to your business
- Find out how Matillion ETL makes loading large amounts of data easier
Build Real-Time Applications with Databricks Streaming
Databricks
In this presentation, we will study a use case we implemented recently, working with a large metropolitan fire department. Our company has already created a complete analytics architecture for the department based upon Azure Data Factory, Databricks, Delta Lake, Azure SQL and SQL Server Analysis Services (SSAS). While this architecture works very well for the department, they would like to add a real-time channel to their reporting infrastructure.
This channel should serve up the following information:
• The most up-to-date locations and status of equipment (fire trucks, ambulances, ladders, etc.)
• The current locations and status of firefighters, EMT personnel and other relevant fire department employees
• The current list of active incidents within the city
The above information should be visualized through an automatically updating dashboard. The central component of the dashboard will be a map which automatically updates with the locations and incidents. This view should be as real-time as possible and will be used by the fire chiefs to assist with real-time decision-making on resource and equipment deployments.
In this presentation, we will leverage Databricks, Spark Structured Streaming, Delta Lake and the Azure platform to create this real-time delivery channel.
Lambda architecture is a popular technique where records are processed by a batch system and streaming system in parallel. The results are then combined during query time to provide a complete answer. Strict latency requirements to process old and recently generated events made this architecture popular. The key downside to this architecture is the development and operational overhead of managing two different systems.
There have been attempts to unify batch and streaming into a single system in the past, but organizations have not been very successful in those attempts. With the advent of Delta Lake, however, we are seeing a lot of engineers adopt a simple continuous data flow model to process data as it arrives. We call this architecture the Delta Architecture.
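The query-time merge that defines the Lambda architecture – combining a complete-but-stale batch view with a fresh-but-partial speed view – is simple to sketch, and the sketch also makes the operational overhead visible: two pipelines must agree on this contract. A toy illustration with word counts as the metric, not any particular framework's API:

```python
def lambda_query(batch_view, speed_view):
    """Merge a batch view (everything up to the last batch run) with a
    streaming speed view (events since then) at query time -- the
    defining move of the Lambda architecture."""
    merged = dict(batch_view)
    for key, count in speed_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged

# The batch layer processed everything up to midnight; the speed
# layer covers events since then.
batch_view = {"clicks": 100, "views": 400}
speed_view = {"clicks": 7, "signups": 2}
answer = lambda_query(batch_view, speed_view)
# -> {"clicks": 107, "views": 400, "signups": 2}
```

The Delta Architecture removes this merge step entirely: one continuous pipeline maintains a single table that is both the batch and the streaming answer.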
FROM BIG DATA TO ACTION: HOW TO BREAK OUT OF THE SILOS AND LEVERAGE DATA GOVE...
ODPi
ODPi founders Cloudera, SAS, IBM, ING and other members, are creating an open metadata and governance ecosystem that enables an organization to get the maximum value from data while managing the risks associated with data collection, storage and use.
This collaborative effort between vendors, customers, data architects and developers brings different perspectives to the complex problems of Data Governance and allows for quicker and more creative solutions to get the most out of a company’s data.
ODPi Egeria supports the free flow of standardized metadata between different technologies and vendor platforms, enabling organizations to locate, manage and use their data resources more effectively. Explore how ODPi Egeria’s set of open APIs, types and interchange protocols allows all metadata repositories to share and exchange metadata. From this common base, it adds governance, discovery and access frameworks for automating the collection, management and use of metadata across an enterprise. The result is an enterprise catalog of data resources that are transparently assessed, governed and used in order to deliver maximum value to the enterprise.
Mario Molina, Software Engineer
CDC systems are usually used to identify changes in data sources, capture and replicate those changes to other systems. Companies are using CDC to sync data across systems, cloud migration or even applying stream processing, among others.
In this presentation we’ll see CDC patterns, how to use it in Apache Kafka, and do a live demo!
https://www.meetup.com/Mexico-Kafka/events/277309497/
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
Data Lakes have been built with a desire to democratize data - to allow more and more people, tools, and applications to make use of data. A key capability needed to achieve it is hiding the complexity of underlying data structures and physical data storage from users. The de-facto standard has been the Hive table format addresses some of these problems but falls short at data, user, and application scale. So what is the answer? Apache Iceberg.
Apache Iceberg table format is now in use and contributed to by many leading tech companies like Netflix, Apple, Airbnb, LinkedIn, Dremio, Expedia, and AWS.
Watch Alex Merced, Developer Advocate at Dremio, as he describes the open architecture and performance-oriented capabilities of Apache Iceberg.
You will learn:
• The issues that arise when using the Hive table format at scale, and why we need a new table format
• How a straightforward, elegant change in table format structure has enormous positive effects
• The underlying architecture of an Apache Iceberg table, how a query against an Iceberg table works, and how the table’s underlying structure changes as CRUD operations are done on it
• The resulting benefits of this architectural design
This is part 3 of the series on Data Mesh, looking at the intersection of microservices architecture concepts, data integration / replication technologies and log-based stream integration techniques. This webinar was mostly a demonstration, but several slides used to set up the demo are included here as a PDF for viewers.
Details:
• DevOps and Business Intelligence?
• CI/CD Pipelines: What are they?
• Database Deployments: State based vs Migration based
• Snowflake features for CI/CD
• Azure DevOps: Build and Release Pipelines
• Putting it all together: End to End solution
• Demo
Change Data Streaming Patterns for Microservices With Debezium (Confluent)
(Gunnar Morling, RedHat) Kafka Summit SF 2018
Debezium (noun | de·be·zi·um | /dɪ:ˈbɪ:ziːəm/): secret sauce for change data capture (CDC), streaming changes from your datastore so that you can solve multiple challenges: synchronizing data between microservices, gradually extracting microservices from existing monoliths, maintaining different read models in CQRS-style architectures, updating caches and full-text indexes, and feeding operational data to your analytics tools.
Join this session to learn what CDC is about, how it can be implemented using Debezium, an open source CDC solution based on Apache Kafka, and how it can be utilized for your microservices. Find out how Debezium captures all the changes from datastores such as MySQL, PostgreSQL and MongoDB, how to react to the change events in near real time, and how Debezium is designed to not compromise on data correctness and completeness even if things go wrong. In a live demo we’ll show how to set up a change data stream out of your application’s database without any code changes needed. You’ll see how to sink the change events into other databases and how to push data changes to your clients using WebSockets.
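Registering such a connector with Kafka Connect is just an HTTP POST of a small JSON document. The fragment below is an illustrative sketch for a MySQL source - the hostnames, credentials and topic names are placeholders, and exact property names vary between Debezium versions:

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
```

Each captured table then appears as its own Kafka topic (e.g. dbserver1.inventory.customers), which downstream consumers can sink into other databases or push to WebSocket clients.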
Presentation on Data Mesh: the paradigm shift is a new type of ecosystem architecture - a modern distributed architecture that treats domain-specific data as “data-as-a-product,” enabling each domain to handle its own data pipelines.
The Top 5 Apache Kafka Use Cases and Architectures in 2022 (Kai Wähner)
I see the following topics coming up more regularly in conversations with customers, prospects, and the broader Kafka community across the globe:
Kappa Architecture: Kappa goes mainstream to replace Lambda and Batch pipelines (that does not mean that there is no batch processing anymore). Examples: Kafka-powered Kappa architectures from Uber, Disney, Shopify, and Twitter.
Hyper-personalized Omnichannel: Retail and customer communication across online and offline channels becomes the new black, including context-specific upselling, recommendations, and location-based services. Examples: Omnichannel Retail and Customer 360 in Real-Time with Apache Kafka.
Multi-Cloud Deployments: Business units and IT infrastructures span across regions, continents, and cloud providers. Linking clusters for bi-directional replication of data in real-time becomes crucial for many business models. Examples: Global Kafka deployments.
Edge Analytics: Low latency requirements, cost efficiency, or security requirements enforce the deployment of (some) event streaming use cases at the far edge (i.e., outside a data center), for instance, for predictive maintenance and quality assurance on the shop floor level in smart factories. Examples: Edge analytics with Kafka.
Real-time Cybersecurity: Situational awareness and threat intelligence need to process massive data in real-time to defend against cyberattacks successfully. The many successful ransomware attacks across the globe in 2021 were a warning for most CIOs. Examples: Cybersecurity for situational awareness and threat intelligence in real-time.
Presto At Arm Treasure Data - 2019 Updates (Taro L. Saito)
Presentation at Presto Conference Tokyo 2019
- Arm Treasure Data
- Plazma DB Indexes
- Real-time, Archive Storages
- Schema-on-read data processing
- Physical partition maintenance via presto-stella plugin
Real-time Analytics with Trino and Apache Pinot (Xiang Fu)
Trino summit 2021:
Overview of Trino Pinot Connector, which bridges the flexibility of Trino's full SQL support to the power of Apache Pinot's realtime analytics, giving you the best of both worlds.
Iceberg: A modern table format for big data (Strata NY 2018, Ryan Blue)
Hive tables are an integral part of the big data ecosystem, but the simple directory-based design that made them ubiquitous is increasingly problematic. Netflix uses tables backed by S3 that, like other object stores, don’t fit this directory-based model: listings are much slower, renames are not atomic, and results are eventually consistent. Even tables in HDFS are problematic at scale, and reliable query behavior requires readers to acquire locks and wait.
Owen O’Malley and Ryan Blue offer an overview of Iceberg, a new open source project that defines a table layout which addresses the challenges of current Hive tables, with properties specifically designed for cloud object stores, such as S3. Iceberg is an Apache-licensed open source project. It specifies the portable table format and standardizes many important features, including:
* All reads use snapshot isolation without locking.
* No directory listings are required for query planning.
* Files can be added, removed, or replaced atomically.
* Full schema evolution supports changes in the table over time.
* Partitioning evolution enables changes to the physical layout without breaking existing queries.
* Data files are stored as Avro, ORC, or Parquet.
* Support for Spark, Pig, and Presto.
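To make these properties concrete, here is a deliberately simplified, hypothetical Python model (not Iceberg's actual metadata classes) of how a snapshot-based format lets readers plan a query from metadata alone and lets writers commit by atomically swapping a snapshot pointer:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Snapshot:
    # A snapshot is an immutable, complete list of the table's data files.
    snapshot_id: int
    data_files: tuple

@dataclass
class Table:
    snapshots: list = field(default_factory=list)
    current: int = -1  # pointer to the current snapshot

    def commit(self, data_files):
        """Writers commit by adding a snapshot and swapping one pointer."""
        snap = Snapshot(len(self.snapshots), tuple(data_files))
        self.snapshots.append(snap)
        self.current = snap.snapshot_id  # the single atomic step

    def plan_scan(self, snapshot_id=None):
        """Readers plan entirely from metadata - no directory listings."""
        sid = self.current if snapshot_id is None else snapshot_id
        return list(self.snapshots[sid].data_files)

table = Table()
table.commit(["f1.parquet", "f2.parquet"])
pinned = table.plan_scan()                   # a reader pins snapshot 0
table.commit(["f1.parquet", "f3.parquet"])   # a writer swaps f2 for f3
# The pinned reader still sees the old file list: snapshot isolation.
```

Because readers never list directories and writers change only one pointer, files can be added, removed, or replaced atomically without locks - the behaviour the bullet points above describe.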
codecentric AG: CQRS and Event Sourcing Applications with Cassandra (DataStax Academy)
CQRS (Command Query Responsibility Segregation) is a pattern which separates the process of querying and updating data. Whereas a query only returns data without any side effects, a command is designed to change data. CQRS is often combined with Event Sourcing, an architecture in which all changes to an application's state are stored as a sequence of events.
Because of its great capability to store time series data, Cassandra is a perfect fit for implementing the event store. But there are still a lot of open questions: What about the data modeling? What techniques will be used to process and store data in the Cassandra database? How do you access the current state of the application without replaying every event? And what about failure handling?
In this talk, I will give a brief introduction to CQRS and the Event Sourcing pattern and will then answer the questions above using a real life example of a data store for customer data.
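For readers new to the pattern, the core mechanics can be sketched in a few lines of Python (a generic illustration, not tied to Cassandra or the talk's data model): state is never updated in place; commands append events, and the current state is derived by replaying them - optionally from a snapshot so that not every event has to be replayed:

```python
class EventSourcedAccount:
    """Minimal event-sourced aggregate: an append-only log plus a fold."""

    def __init__(self):
        self.events = []          # the event store (append-only)
        self.snapshot = (0, 0)    # (events_covered, balance) to limit replays

    def record(self, event_type, amount):
        # Commands never mutate state directly; they append events.
        self.events.append((event_type, amount))

    def take_snapshot(self):
        self.snapshot = (len(self.events), self.balance())

    def balance(self):
        # Replay only the events that arrived after the snapshot.
        covered, total = self.snapshot
        for event_type, amount in self.events[covered:]:
            total += amount if event_type == "deposited" else -amount
        return total

acct = EventSourcedAccount()
acct.record("deposited", 100)
acct.record("withdrawn", 30)
acct.take_snapshot()          # snapshot covers the first two events
acct.record("deposited", 5)
print(acct.balance())         # 75
```

The snapshot answers the "without replaying every event" question above: state is rebuilt from the latest snapshot plus the tail of the log, not from the full history.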
Kafka for Real-Time Replication between Edge and Hybrid Cloud (Kai Wähner)
Not all workloads allow cloud computing. Low latency, cybersecurity, and cost-efficiency require a suitable combination of edge computing and cloud integration.
This session explores architectures and design patterns for software and hardware considerations to deploy hybrid data streaming with Apache Kafka anywhere. A live demo shows data synchronization from the edge to the public cloud across continents with Kafka on Hivecell and Confluent Cloud.
ELT vs. ETL - How they’re different and why it matters (Matillion)
ELT is a fundamentally better way to load and transform your data. It’s faster. It’s more efficient. And Matillion’s browser-based interface makes it easier than ever to work with your data. You’re using data to improve your world: shouldn’t the tools you use return the favor?
In this webinar:
- Explore the differences between ELT and ETL
- Learn why ELT is a better, more modern process
- Discover the latest trends in ELT and how they apply to your business
- Find out how Matillion ETL makes loading large amounts of data easier
Build Real-Time Applications with Databricks Streaming (Databricks)
In this presentation, we will study a use case we implemented recently with a large, metropolitan fire department. Our company had already created a complete analytics architecture for the department based upon Azure Data Factory, Databricks, Delta Lake, Azure SQL and SQL Server Analysis Services (SSAS). While this architecture works very well for the department, they would like to add a real-time channel to their reporting infrastructure.
This channel should serve up the following information:
• The most up-to-date locations and status of equipment (fire trucks, ambulances, ladders etc.)
• The current locations and status of firefighters, EMT personnel and other relevant fire department employees
• The current list of active incidents within the city
The above information should be visualized through an automatically updating dashboard. The central component of the dashboard will be a map which automatically updates with the locations and incidents. This view should be as real-time as possible and will be used by the fire chiefs to assist with real-time decision-making on resource and equipment deployments.
In this presentation, we will leverage Databricks, Spark Structured Streaming, Delta Lake and the Azure platform to create this real-time delivery channel.
Lambda architecture is a popular technique where records are processed by a batch system and streaming system in parallel. The results are then combined during query time to provide a complete answer. Strict latency requirements to process old and recently generated events made this architecture popular. The key downside to this architecture is the development and operational overhead of managing two different systems.
There have been attempts to unify batch and streaming into a single system in the past. Organizations have not been that successful though in those attempts. But, with the advent of Delta Lake, we are seeing lot of engineers adopting a simple continuous data flow model to process data as it arrives. We call this architecture, The Delta Architecture.
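The query-time combination that Lambda relies on can be sketched as follows (a toy Python illustration, not Delta Lake code): a batch view covers everything up to the last batch run, the speed layer covers the recent tail, and each query merges the two while avoiding double counting:

```python
def merge_views(batch_view, speed_events, batch_horizon):
    """Combine a precomputed batch view with streaming events at query time.

    batch_view:    {key: count} computed by the batch layer up to batch_horizon
    speed_events:  [(timestamp, key)] handled by the streaming layer
    batch_horizon: events at or before this timestamp are already in batch_view
    """
    result = dict(batch_view)
    for ts, key in speed_events:
        if ts > batch_horizon:  # skip what the batch layer already counted
            result[key] = result.get(key, 0) + 1
    return result

batch = {"clicks": 1000}
stream = [(101, "clicks"), (99, "clicks"), (102, "views")]
print(merge_views(batch, stream, batch_horizon=100))
# {'clicks': 1001, 'views': 1}
```

Keeping two code paths (the batch fold and the streaming fold) consistent with each other is exactly the operational overhead the paragraph above describes, and what a single continuous data flow model avoids.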
FROM BIG DATA TO ACTION: HOW TO BREAK OUT OF THE SILOS AND LEVERAGE DATA GOVE... (ODPi)
ODPi founders Cloudera, SAS, IBM, ING and other members are creating an open metadata and governance ecosystem that enables an organization to get the maximum value from data while managing the risks associated with data collection, storage and use.
This collaborative effort between vendors, customers, data architects and developers brings different perspectives to the complex problems of data governance and allows for quicker and more creative solutions to get the most out of a company’s data.
ODPi Egeria supports the free flow of standardized metadata between different technologies and vendor platforms, enabling organizations to locate, manage and use their data resources more effectively. Explore how ODPi Egeria’s set of open APIs, types and interchange protocols allows all metadata repositories to share and exchange metadata. From this common base, it adds governance, discovery and access frameworks for automating the collection, management and use of metadata across an enterprise. The result is an enterprise catalog of data resources that are transparently assessed, governed and used in order to deliver maximum value to the enterprise.
Native Support of Prometheus Monitoring in Apache Spark 3.0 (Databricks)
All production environments require monitoring and alerting. Apache Spark also has a configurable metrics system that allows users to report Spark metrics to a variety of sinks. Prometheus is one of the popular open-source monitoring and alerting toolkits, and it is often used together with Apache Spark.
Creating a modern web application using Symfony API Platform, ReactJS and Red... (Jesus Manuel Olivas)
The API Platform framework is a set of tools to help you build API-first projects. API Platform is built on top of the Symfony framework, which means you can reuse all your Drupal 8 and Symfony skills and benefit from the incredible amount of Symfony documentation and community bundles.
During this session, you will learn how to use the API Platform project to create a modern web application using Symfony, Doctrine, ReactJS, Redux, Redux-Saga, Ant Design and DVA.
grlc is a thin server that translates SPARQL queries (and their associated metadata) from GitHub repositories into full-fledged API Swagger specifications and user interfaces.
Presented by: Mandy Chessell, IBM
Presented at All Things Open 2020
Abstract: I am one of the leaders in the open metadata and governance initiative. This initiative is seeking to develop standards and a reference implementation through an open source project called ODPi Egeria. Egeria enables organizations to manage data as an asset even when they use tools and platforms from multiple vendors. This type of problem is extremely complex and needs the collaboration of multiple organizations to make it happen. In this talk I will go through the technical challenges we face and how they are being overcome.
Do you know what your Drupal is doing? Observe it! (DrupalCon Prague 2022, sparkfabrik)
Our Drupal 8 websites are true applications, often very complex ones.
More and more workload is delegated to external systems, usually microservices, that are used for many different tasks.
Architectures are always more distributed and fragmented.
Tracing the lifecycle of a single request that originates in a client, passes through all Drupal subsystems, reaches external (micro)services and comes back is becoming mandatory to track down problems and to optimize for performance. This is often time consuming and, without the right tools, may become very difficult.
A simple unstructured log stream isn't enough anymore; we need to find a way to observe the details of what is going on.
Observability is all about this and is based on structured logs, metrics and traces. In this talk we will see how to implement these techniques in Drupal, which tools and which modules to use to trace and log all requests that reach our website and how to expose and display useful metrics.
We will integrate Drupal with OpenTelemetry, Monolog and Grafana to collect, scrape, store and visualize telemetry data.
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra (Joe Stein)
Slides for the solution we developed using Mesos, Docker, Kafka, Spark, Cassandra and Solr (DataStax Enterprise Edition), all written in Go, for doing real-time log analysis at scale. Many organizations either need or want log analysis in real time, where you can see within a second what is happening within your entire infrastructure. Today, with the hardware available and the software systems we have in place, you can develop, build and use these solutions as a service.
Developing Realtime Data Pipelines With Apache Kafka (Joe Stein)
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers. Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact. Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
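The partitioning behaviour described above can be illustrated with a small Python sketch (a simplification: Kafka's default partitioner hashes the key with murmur2, and crc32 merely stands in for it here). The point is that records with the same key always map to the same partition, which is what preserves per-key ordering across a cluster:

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Hash the record key and take it modulo the partition count.
    return zlib.crc32(key) % num_partitions

NUM_PARTITIONS = 6
records = [(b"user-42", "login"), (b"user-7", "click"), (b"user-42", "logout")]
assignments = [(partition_for(key, NUM_PARTITIONS), value)
               for key, value in records]

# Every record keyed b"user-42" lands in the same partition, so the
# broker preserves the login -> logout order for that user.
print(assignments)
```

Adding partitions spreads keys over more machines, which is how a topic can grow beyond the capacity of any single broker while co-ordinated consumers each own a subset of partitions.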
Elevating Tactical DDD Patterns Through Object Calisthenics (Dorra BARTAGUIZ)
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Key Trends Shaping the Future of Infrastructure (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Smart TV Buyer Insights Survey 2024 (91mobiles)
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
My and Rik Marselis' slides from the 30.5.2024 DASA Connect conference. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We also had a lovely workshop with the participants, trying to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Kubernetes & AI - Beauty and the Beast!?! @KCD Istanbul 2024 (Tobias Schneck)
As AI technology pushes into IT, I was wondering, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations view. Is it possible to apply our lovely cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and make it work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies that could be beneficial or limiting for your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already got working for real.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
8. https://github.com/odpi/egeria
A new manifesto for metadata and governance
The maintenance of metadata must be automated to scale to the sheer volumes and variety of data involved in modern business. Similarly, metadata should be used to drive the governance of data and to create a business-friendly logical interface to the data landscape.
The availability of metadata management must become ubiquitous in cloud platforms and large data platforms, such as Apache Hadoop, so that the processing engines on these platforms can rely on its availability and build capability around it.
Metadata access must become open and remotely accessible so that tools from different vendors can work with metadata located on different platforms. This implies unique identifiers for metadata elements, some level of standardization in the types and formats for metadata, and standard interfaces for manipulating metadata.
Wherever possible, discovery and maintenance of metadata has to be an integral part of all tools that access, change and move information.
13. A Cohort of OMAG Servers
[Diagram: several Open Metadata and Governance (OMAG) Servers, each running Open Metadata Access Services on top of the Open Metadata Repository Services, connected to one another through an OMRS Cohort that supports cohort-wide search]
14. Egeria Open Metadata Repository Services (OMRS)
The OMRS defines a protocol and a set of connectors.
The Enterprise Connector performs cohort-wide operations. This includes issuing queries to the cohort; when metadata is replicated from another server, it can use the local connector and repository to cache it for availability and performance.
The Local Connector performs local operations and provides a default Event Mapper that enables events relating to local operations to be sent to the cohort.
The Repository Connector interfaces to a specific repository and, optionally, may be accompanied by a custom Event Mapper.
Egeria provides two built-in repositories, and there are connectors to other repositories.
The interface to a repository connector is the MetadataCollection API, described on the next slide.
[Diagram: the OMRS Enterprise Connector, the OMRS Local Connector & Event Mapper, and the OMRS Repository Connector linking the local Repository to the Cohort through the MetadataCollection API]
15. The OMRSMetadataCollection interface
The interface to an Egeria repository is the OMRSMetadataCollection interface
It includes groups of operations:
Group 1: Identification of metadata repository - metadataCollectionId
Group 2: Type definitions (types, attributes) - add, find, get, remove, …
Group 3: Find instances (entities, relationships) - get, find, graph-queries, …
Group 4: Maintain instances (entities, relationships) - addEntity, deleteEntity, …
Group 5: Change control information (entities, relationships) - reIdentify, reHome, …
Group 6: Maintenance of reference (replica) copies – save, purge, refresh,…
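As a rough illustration of groups 1, 3, 4 and 6, here is a hypothetical in-memory Python sketch of a metadata collection (the real OMRSMetadataCollection is a Java interface; the names and signatures below are simplified stand-ins, not Egeria's API):

```python
import uuid

class MetadataCollection:
    """Toy metadata collection: local entities plus reference copies."""

    def __init__(self, collection_id):
        self.metadata_collection_id = collection_id   # group 1: identification
        self.entities = {}                            # guid -> (props, home)

    def add_entity(self, properties):
        # Group 4: maintain instances - new entities are homed locally.
        guid = str(uuid.uuid4())
        self.entities[guid] = (properties, self.metadata_collection_id)
        return guid

    def get_entity(self, guid):
        # Group 3: find instances.
        return self.entities[guid][0]

    def save_reference_copy(self, guid, properties, home_collection_id):
        # Group 6: store a replica whose master lives in another repository.
        self.entities[guid] = (properties, home_collection_id)

    def is_reference_copy(self, guid):
        return self.entities[guid][1] != self.metadata_collection_id

local = MetadataCollection("server-1")
guid = local.add_entity({"name": "EMPLOYEE"})
local.save_reference_copy("remote-guid", {"name": "Salary"}, "server-2")
print(local.is_reference_copy(guid), local.is_reference_copy("remote-guid"))
```

Tracking the home collection of each instance is what makes the replica-maintenance operations of group 6 (save, purge, refresh) and the change-control operations of group 5 (reHome) possible.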
16. Egeria metadata – a distributed graph
[Diagram: business metadata - glossary terms such as Employee, Employee Id, Employee Name, Job Title, Annual Salary, Hourly Pay Rate, Work Location and Manager Compensation Plan, linked by HAS-A and IS-A relationships, with a Sensitive classification - mapped onto the structural metadata for a data store: an EMPLOYEE record with columns EMPNAME, EMPNO, JOBCODE and SALARY]
The interconnected nature of metadata forms a graph
The distributed nature of Egeria leads to a distributed graph…
18. Egeria distributed graph model
[Diagram: OMAG Server 1 holds a Database Column entity; OMAG Server 2 holds a Glossary Term entity; a reference copy of the Glossary Term on OMAG Server 1 is linked to the Database Column by a ‘Meaning’ relationship]
One entity could be replicated to the other server, as a ‘reference copy’
The original Glossary Term on OMAG Server 2 is still the master
A relationship could be defined between the local DB column and the reference copy of the Glossary Term
19. Egeria distributed graph model
[Diagram: OMAG Server 1 holds a Database Column and OMAG Server 2 holds a Glossary Term; OMAG Server 3 holds reference copies of both, linked by a ‘Meaning’ relationship]
Both entities could be replicated to a third server, as reference copies
The originals are still the masters
A relationship could be defined between the local reference copies
20. Egeria distributed graph model
[Diagram: OMAG Server 3 holds entity proxies for the Database Column on OMAG Server 1 and the Glossary Term on OMAG Server 2, linked by a ‘Meaning’ relationship]
Instead of replication, the third server could relate the original entities using entity proxies
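The proxy idea can be sketched like this (a hypothetical Python illustration, not Egeria code): the third server stores only lightweight proxies - an entity's unique identifier plus its home server - and the relationship joins the proxies rather than full replicas:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EntityProxy:
    # A proxy carries just enough to identify an entity and find its master.
    guid: str
    home_server: str

@dataclass(frozen=True)
class Relationship:
    name: str
    end1: EntityProxy
    end2: EntityProxy

# OMAG Server 3 relates entities mastered on servers 1 and 2 without copying them.
column = EntityProxy("db-col-123", "OMAG Server 1")
term = EntityProxy("glossary-term-456", "OMAG Server 2")
meaning = Relationship("Meaning", column, term)

# Resolving a proxy means asking its home server for the full entity.
print(meaning.end1.home_server, meaning.end2.home_server)
```

Compared with reference copies, proxies avoid keeping replicas fresh, at the cost of a remote lookup whenever the full entity is needed.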
21. Egeria Local Graph Repository
The Egeria distribution includes a persistent repository and a non-persistent repository
The persistent repository is a graph repository built on JanusGraph
JanusGraph is an open-source project, hosted by the Linux Foundation
http://janusgraph.org
http://github.com/janusgraph/janusgraph
The built-in graph repository provides an OMAG Server with a persistent metadata store and is built using Egeria’s ‘plugin’ pattern
The graph repository can store instances of metadata owned by the local server
It can also store reference copies of metadata instances replicated to the local server
It also supports relationship instances that refer to entity proxy instances
22.
Anatomy of the local graph repository
[Diagram: inside the OMAG Server, the OMAS access services sit above the OMRS Enterprise Connector and the in/out OMRS topics that link to the cohort. The OMRS Local Connector & Event Mapper delegates to the OMRS Graph Connector, which uses Apache Tinkerpop for graph operations and the JanusGraph Management interface to reach the Graph Repository’s JanusGraph persistence and search backends.]
23.
Graph Repository components
GraphOMRSRepositoryConnector - implements the open connector framework interface
GraphOMRSRepositoryConnectorProvider – implements the mechanism for brokering a connector
GraphOMRSMetadataCollection – top level interface supporting type and instance operations
GraphOMRSMetadataStore – implements the MetadataCollection using a graph database
GraphOMRSGraphFactory – creation, schema, indexing - encapsulates JanusGraph-specifics
Mappers – convert between OMRS objects and graph vertices and edges
GraphOMRSEntityMapper
GraphOMRSRelationshipMapper
GraphOMRSClassificationMapper
Plus various utility classes – error codes, audit logging, constants and utility methods
https://github.com/odpi/egeria/
See open-metadata-implementation/adapters/open-connectors/repository-services-connectors/
open-metadata-collection-store-connectors/graph-repository-connector
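As a conceptual illustration of what the mappers do: the real GraphOMRS*Mapper classes are Java and considerably richer, but the core idea of flattening an OMRS instance into vertex properties can be sketched like this (the `vp_` key scheme is invented for the example).

```python
def entity_to_vertex_properties(entity: dict) -> dict:
    """Flatten an entity into vertex properties: core attributes use fixed
    keys; type-defined instance properties get unique prefixed keys."""
    props = {
        "guid": entity["guid"],
        "typeName": entity["typeName"],  # type referenced by name, not stored
        "metadataCollectionId": entity["metadataCollectionId"],
    }
    for name, value in entity.get("instanceProperties", {}).items():
        props["vp_" + name] = value  # hypothetical unique custom key
    return props

vertex_props = entity_to_vertex_properties({
    "guid": "term-1",
    "typeName": "GlossaryTerm",
    "metadataCollectionId": "server2-collection",
    "instanceProperties": {"displayName": "Annual Salary"},
})
```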
24.
To use the Egeria Graph Repository
Configure the OMAG Server repository-mode = ‘local-graph-repository’
e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/local-repository/mode/local-graph-repository
Subsequently, start the OMRS instance in the server
e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/instance
When OMRS starts, the graph repository auto-creates a JanusGraph database – including:
Persistence backend
Search backend
Graph schema
Search indexes
For now, the persistence backend is embedded Berkeley DB and the indexing backend is Lucene –
further options could be added
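Assuming a platform listening on the URL shown above, the two admin calls can be scripted, for example in Python with only the standard library. The username and server name below are placeholders; calling `urllib.request.urlopen(req)` would actually send each request to a running OMAG Server Platform.

```python
from urllib.parse import quote
from urllib.request import Request

PLATFORM = "http://localhost:8080/open-metadata/admin-services"

def configure_local_graph_repository(username: str, servername: str) -> Request:
    """Build the POST that sets repository-mode = 'local-graph-repository'."""
    url = (f"{PLATFORM}/users/{quote(username)}/servers/{quote(servername)}"
           "/local-repository/mode/local-graph-repository")
    return Request(url, method="POST")

def start_server_instance(username: str, servername: str) -> Request:
    """Build the POST that starts the OMRS instance in the server."""
    url = f"{PLATFORM}/users/{quote(username)}/servers/{quote(servername)}/instance"
    return Request(url, method="POST")

# urllib.request.urlopen(...) would dispatch these against a live platform.
req = configure_local_graph_repository("myuser", "myserver")
```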
25.
Graph Schema
The MetadataCollection interface is the formal interface to an Egeria repository.
Whilst it is possible to look at the graph directly (e.g. using Gremlin console):
Please don’t rely on the schema – it is likely to evolve
Type data:
The Graph Repository does not store type definitions
It delegates all type operations to the Repository Content Manager
Instance data:
The Egeria Graph Repository stores instance data, using a JanusGraph schema that has:
vertices for entities and classifications
edges for relationships and classifiers
29.
Metadata Repository API
A MetadataCollection supports a comprehensive API
Metadata collection Id
Query types
Define/maintain types
Search/query metadata instances
Maintain metadata instances
Historical (as of time) queries
Effectivity dating
Versioning metadata
Advanced maintenance
Managing reference copies
The protocol is forgiving – allowing minimal capability, e.g. metadata instance search/query only
30.
Local instances, reference copies and proxies
The graph contains one vertex per entity – whether the entity is local, a reference copy or a proxy
The graph contains one edge per relationship – whether the relationship is local or a reference copy
Reference Copies
• The metadataCollectionId core attribute is set to the ‘guid’ of the home repository
Entity Proxy objects
• Each entity instance has a vertex property of type Boolean, to indicate whether the instance is a proxy
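A toy model of this bookkeeping (property names are illustrative, not the repository's actual keys): the metadata collection id and the boolean proxy flag are enough to tell local instances, reference copies and proxies apart.

```python
LOCAL_COLLECTION_ID = "local-repo-guid"  # this repository's own collection id

def instance_kind(vertex: dict) -> str:
    """Classify an entity vertex as local, reference copy, or proxy."""
    if vertex.get("isProxy"):
        return "proxy"
    if vertex["metadataCollectionId"] == LOCAL_COLLECTION_ID:
        return "local"
    return "reference copy"  # homed in (mastered by) another repository

kinds = [instance_kind(v) for v in (
    {"guid": "e-1", "metadataCollectionId": LOCAL_COLLECTION_ID, "isProxy": False},
    {"guid": "e-2", "metadataCollectionId": "remote-repo-guid", "isProxy": False},
    {"guid": "e-3", "metadataCollectionId": "remote-repo-guid", "isProxy": True},
)]
```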
31.
The MetadataCollection ‘graph-query’ methods
There are 4 sub-graph query methods:
getRelatedEntities()
Returns the entity and its immediate neighbors
getEntityNeighborhood()
Returns the entity and its neighbors up to the depth specified by the ‘level’ parameter
getLinkingEntities()
Returns the relationships and intermediate entities that connect the specified pair of entities
getRelationshipsForEntity()
Returns relationships associated with an entity, optionally filtered by relationship type and status
level = 2
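The neighborhood query can be illustrated with a plain breadth-first traversal. This is a sketch of the getEntityNeighborhood() semantics only, not Egeria's implementation; the adjacency map stands in for entities linked by relationships.

```python
from collections import deque

def entity_neighborhood(adjacency: dict, start: str, level: int) -> set:
    """Return the entities reachable from `start` within `level` hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == level:
            continue  # do not expand beyond the requested level
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

# A - B - C - D chain: level = 2 from A reaches B (1 hop) and C (2 hops).
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
```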
32.
Graph Repository – supported functions
The GraphRepository supports most of the OMRS MetadataCollection API, including:
Save and purge of reference copies
Use of entity proxies
Delete and restore as well as purge – delete is a soft, restorable delete; purge is permanent
Re-type of instances
Re-identify of instances
Re-home of instances
The four ‘graph queries’ – described on the previous slide
The ‘find’ methods – find..ByProperty, find..ByPropertyValue, findEntityByClassification
The Graph Repository does not (yet) support:
Historic queries – find methods that specify an asOfTime parameter
Undo of previous instance updates
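The delete/restore/purge distinction above can be sketched as follows (the status values are illustrative, not Egeria's actual instance status enum): a soft delete only changes status and can be restored, while a purge removes the instance permanently.

```python
def delete_entity(instance: dict) -> None:
    instance["status"] = "DELETED"      # soft delete: instance is retained

def restore_entity(instance: dict) -> None:
    if instance["status"] == "DELETED":
        instance["status"] = "ACTIVE"   # a soft delete can be undone

def purge_entity(store: dict, guid: str) -> None:
    store.pop(guid, None)               # permanent: instance is removed

store = {"e-1": {"guid": "e-1", "status": "ACTIVE"}}
delete_entity(store["e-1"])
status_after_delete = store["e-1"]["status"]
restore_entity(store["e-1"])
status_after_restore = store["e-1"]["status"]
purge_entity(store, "e-1")
```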
33.
Further Information
Please visit us at Booth #53 on the 4th floor
Project website:
https://www.odpi.org/projects/egeria
Open source repositories:
http://github.com/odpi/egeria
http://github.com/janusgraph/janusgraph
35.
A hybrid multi-cloud world
[Diagram: data lakes, mobile apps, databases, applications and files spread across environments – some with independent metadata repositories, some with linked metadata repositories – alongside business partners sharing data, IoT devices and systems, and new applications deployed to cloud.]
36.
Open metadata ecosystem
[Diagram: the same hybrid multi-cloud landscape, joined up as an open metadata ecosystem.]
37.
The OMAG Server Platform
[Diagram: deployment options – one OMAG Server Platform per Egeria server (Egeria Servers 1–3), orchestrated with Kubernetes; a single multi-tenant OMAG Server Platform hosting Egeria Servers 1–3; and an OMAG Server Platform running a single Egeria server at the edge.]
40.
Example of a simple cohort
[Diagram: Cohort A connects servers for the Chief Data Office, the Data Lake and the Systems of Record – including a Virtualizer (Gaian), Security-Sync (Apache Ranger), a Data Bridge, Data Onboarding and several Stewardship servers.]
43.
UI: good and the not so good.
The not so good:
Confusing
Not my language (too technical or not technical enough)
Not meeting my needs
Not using my words
Mismatches my world view
The good:
Presented for my role
Logically flows to complete the tasks I do.
Underpinned by relevant (persona specific) APIs
Someone from my role was involved in creating the UI.
45.
UIs
ODPi Egeria UI types
Open Metadata Access Services
Open Metadata Repository Services
Type 1: OMAS only
Type 2: OMAS and OCF Connector (accessing the data store directly)
Type 3: OMRS
Type 4: Daemon UI (e.g. a search daemon)
46.
UIs
ODPi Egeria UI types – work in progress
Type 1 (OMAS only): IBM creating Subject Area UI
Type 2 (OMAS and OCF Connector): ING creating Asset Search
Type 3 (OMRS): IBM creating Type explorer and instance explorer
Type 4 (Daemon UI): ING creating Lineage viewer
48.
UI design – profile driven
Login
Personal Profile
A user’s roles define what UI capabilities the user should see: Subject area, Type explorer, Asset Search – many more to come.
Dealing well with potentially large amounts of data in a persona-specific way is the challenge, e.g. by paging, or limiting by neighborhood depth in graph calls.
49.
Egeria UI technology experiences
• Polymer is web component technology providing web components. It is not a framework
• + nice separation of components – hiding implementation in the shadow DOM
• + communicate with property binding
• + support for events
• + many existing paper and iron components for simple things.
David’s (Polymer newbie) experiences:
• - quirky – spent a lot of time finding the happy path to get things working, especially around web components not being initialized when you want to use them (a big frustration was trying to issue a REST call from the ready() method).
• +/- need to be rigorous with architecture; it seems best to use one-way bindings, events and a top-level controller component to drive state transitions for MVC, e.g. around a grid. Redux may make sense to hold state and define state transitions
• - there is no free smart (editable) grid of commercial quality that I can find (this seems true for other frameworks as well)
50.
The sort of architecture more complex web components require
• Controller controls all transitions
• The model allows data updates to occur on the model with simple CRUD operations
• The model changes are then reflected into the view.
Considerations:
- Operations are currently synchronous; Redux would be asynchronous
- Spinner would need to lock across the complete user interaction, not just the REST call
- Changes to the view made by the user and changes to the view from the model need to be managed
- Paging required.
54.
Using ODPi Egeria …
Eases the cost of metadata integration through:
Comprehensive standards and libraries.
Active vendor recruitment program.
Provides direct support to many governance roles, filling the gaps between functions offered through commercial tools.
Provides best practices and content packs to accelerate an organization’s journey to becoming data driven.
57.
The ODPi is a non-profit that is part of The Linux Foundation
Delivering core technology
Recruiting vendors
Assisting practitioners
[Diagram: vendors and practitioners are served through core technology, a conformance suite and best practices, delivered by the Egeria and Data Governance projects.]
61.
IBM Information Governance Catalog Integration
Egeria’s IGC integration uses the
Adapter Pattern
There are two connectors to IGC running
in the repository proxy server.
They translate IGC APIs and events into
open metadata APIs and events.
Egeria handles the interaction with the
cohort.
No need to upgrade IGC to adopt
Outbound metadata only
[Diagram: Information Governance Catalog connects to a Repository Proxy containing a Repository Connector and an Event Mapper Connector, which put it onto the ODPi Egeria open metadata highway.]
62.
Apache Atlas Integration
The Egeria community is working on a similar
integration for Apache Atlas.
Again there are two connectors in the repository
proxy server.
These connectors translate Atlas APIs and events
into open metadata APIs and events.
Egeria handles the interaction with the cohort.
No need to upgrade Atlas to adopt
Two-way exchange of native Atlas metadata
[Diagram: Apache Atlas connects to a Repository Proxy containing a Repository Connector and an Event Mapper Connector, which put it onto the ODPi Egeria open metadata highway.]
63.
Native Integration
An alternative approach is the Native Pattern
There are still two connectors. They translate
internal APIs and events into open metadata APIs
and events.
ODPi Egeria handles the interaction with the cohort.
The connectors and the ODPi Egeria libraries reside
in the metadata server.
No additional server; less network traffic; upgrade
required.
[Diagram: the Repository Connector, the Event Mapper Connector and the ODPi Egeria libraries run inside the metadata server itself, connecting it directly to the open metadata highway.]
64.
Plug-in Integration
The plug-in pattern allows different repository back-ends to be plugged into ODPi Egeria’s OMAG Server.
Egeria includes:
In-memory Repository (Testing and demos)
JanusGraph Repository (All scenarios)
Supports the full protocol and fills in the gaps left by
the proprietary tools.
[Diagram: a Repository Connector plugs the back-end repository into the Open Metadata and Governance (OMAG) Server on the open metadata highway.]
71.
Scope of metadata covered
[Diagram: areas of coverage – Glossary, Collaboration, Governance, Models and Reference Data, Metadata Discovery, Lineage, Data Assets, and Base Types, Systems and Infrastructure.]
72.
Scope of metadata covered
[Diagram: detailed areas of coverage – Basic Types, Infrastructure and Systems; Connectors; Access; Models and Schemas; Physical Asset Descriptions (data stores, APIs, models and components); Asset Collections (sets, typed sets, type-organized sets); Information Views; Classification Schemes; Classification Strategy; Subject Area Definition; Campaigns and Projects; Rollout; Business Objects and Relationships, Taxonomies and Ontologies; Business Attributes; Augmentation; Mapping; Implementation; Policy Metadata (principles, regulations, standards, approaches, rule specifications, roles and metrics); Governance Actions and Processes; Organization Teaming Metadata (people profiles, communities, projects, notebooks, …); Feedback Metadata (tags, comments, ratings, …); Rights Management; Reference Data; Discovery Metadata (profile data, technical classification, data classification, data quality assessment, …); Instrument Association; Information Process Instrumentation (design lineage).]
77.
Different personas need different services
Callie Quartile
Data Scientist
Jules Keeper
Chief Data Officer
Find data
Understand data
Manage analytics models
Build data strategy
Define governance program
Monitor progress
78.
Different personas need different services
Tanya Tidie
Clinical Trials Administrator
Ivor Padlock
Chief Security Officer
Maintain accurate patient records
Catalog clinical trials data
Demonstrate good data management practices
Understand risks to organization
Set up protection
Monitor for suspicious activity
80.
Current Open Metadata Access Services (OMASs)
Project Management
Community Profile
Asset Catalog
Stewardship Action
Information View
Governance Program
Data Process
Subject Area
Connected Asset
Discovery Engine
Governance Engine
Data Protection
Software Developer
Data Platform
Asset Owner
Digital Architecture
Data Science
DevOps
Asset Consumer
Data Infrastructure
Data Privacy
Asset Lineage
84.
Scared to share (example)
Faith Broker
Human Resources
00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 56944 045 27 Code St Harlem NY 1 3
00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 43800 215 27 Code St Harlem NY 1 3
00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 ##### ### 27 Code St Harlem NY 1 3
00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 ##### ### 27 Code St Harlem NY 1 3
00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 ##### ### 27 Code St Harlem NY 1 3
Callie Quartile
Data Scientist
Very Sensitive Data
85.
What does metadata look like?
Business
metadata
Structural
metadata for
a data store
EMPNAME EMPNO JOBCODE SALARY
EMPLOYEE
RECORD
Employee
Work Location
Annual Salary
Job Title
Employee Id
Employee Name
Hourly Pay Rate
Manager Compensation Plan
HAS-A
HAS-A
HAS-A
HAS-A
HAS-A
HAS-A
IS-A
IS-A
IS-A
Sensitive Data
00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
89.
Building governance maturity is a gradual process
Organizations may operate different
levels of maturity in different parts of
their business.
Choices determined by where the
most value lies.
Many organizations aspire to provide
all employees with the data they need
(data citizenship*)
89
https://opengovernance.odpi.org/maturity-model/
AUTOMATED – Metadata is created by the application at the same time as the data is created, in a standard manner easily consumable by all with the necessary permissions
e.g. the device that took the picture, the name of the picture, the settings the picture was taken at, the geo-tag location of the picture, etc. – all automatic, all done at data creation time
Egeria is an Open Source framework that can be used to provide a distributed, unified view of metadata from different sources, including different stores and tools from different vendors.
Egeria creates a unified view of metadata residing in those tools and stores, so users can collaborate and share metadata, without needing to visit multiple tools or stores.
Egeria does not attempt to consolidate the metadata into one repository or tool – it’s better to leave it in place - the current owners stay in control of their metadata, and it stays local to its native store or tool.
Egeria provides an open type system, plus APIs, protocols, connectors and local metadata repositories.
The internal architecture of Egeria has two distinct layers.
The Open Metadata Access Services layer supports the different types of user and use case.
The Open Metadata Repository Services layer provides the unified view of metadata across distinct systems, using protocols and repositories for access and exchange of metadata objects.
Egeria’s OMRS layer includes the ability to refer to remote objects or replicate cached copies of remote objects for performance and availability
Egeria can store this distributed model in its own local repositories, which support the storing of:
local objects,
replicas of remote objects and
proxy-references to remote objects.
This slide shows a physical embodiment of a cohort of OMAG Servers.
An OMAG Server is a deployable unit of function and each OMAG Server can be configured to either run a set of OMAS services or support a repository, or a combination of these roles.
An Egeria cohort is a collection of cooperating OMAG Servers.
An OMAG Server may belong to multiple cohorts.
The OMAS services are local to a server
Each server runs the set of OMAS services listed in its configuration – it is OK to run 0, 1 or multiple OMAS services in a server
Each OMAS is for a specific purpose or persona
The OMRS protocol layer is supported by all servers
The OMAG Servers use OMRS to access/exchange metadata across the cohort
A server shares its metadata over OMRS – sending an event each time a change occurs, or sending a query to other servers
A server may optionally maintain a local Egeria repository
A server may optionally connect to a 3rd party metadata repository
In a few slides we’ll see that the OMRS itself is composed of distinct layers that focus on cross-cohort (“Enterprise”) functions and Local functions.
The role of OMRS is to provide a location transparent, unified view of metadata within a cohort.
Cross-cohort operations are supported by the OMRS ‘Enterprise Connector’, including sending queries to the cohort and receiving the results, as well as receiving replicated metadata and saving copies via the local connector.
Meanwhile the ‘Local Connector’ handles interactions with an (optional) local repository and provides a default event mapper that sends events when the local state changes.
The OMRS protocol uses publish/subscribe over Kafka topics, but the communication/messaging system is pluggable so different transports could be used.
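The pluggable-transport idea can be sketched with a minimal topic-connector abstraction. The names below are hypothetical, not Egeria's actual interfaces; Kafka is the default transport, and a stand-in like the in-memory connector here shows how another messaging system could be slotted in.

```python
from abc import ABC, abstractmethod

class TopicConnector(ABC):
    """Abstraction over the messaging system carrying OMRS events."""
    @abstractmethod
    def publish(self, event: dict) -> None: ...
    @abstractmethod
    def subscribe(self, listener) -> None: ...

class InMemoryTopicConnector(TopicConnector):
    """Stand-in transport, e.g. for tests; Kafka would be the real default."""
    def __init__(self):
        self.listeners = []
    def publish(self, event: dict) -> None:
        for listener in self.listeners:   # fan out to every subscriber
            listener(event)
    def subscribe(self, listener) -> None:
        self.listeners.append(listener)

# A server publishes an event when its local state changes; cohort
# members that subscribed to the topic receive it.
topic = InMemoryTopicConnector()
received = []
topic.subscribe(received.append)
topic.publish({"eventType": "NEW_ENTITY", "guid": "e-1"})
```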
The interface to the repository connector is the MetadataCollection API, which is described on the next slide….
We’re not going to describe this interface in detail – but it’s worth being aware of it, especially as we’re going to talk later about the graph-queries in Group 3.
Egeria’s model of metadata is graph-oriented, both at the business layer and beneath that in the structural metadata
Business metadata describes the data that the business needs, what it means and how it should be classified and protected.
Structural metadata describes how the data is actually stored and labelled in the data store.
The linkages within and between the business and technical metadata forms a graph, that can be used to switch between these two perspectives.
One of the built-in repositories in Egeria is a graph repository; a natural fit for the metadata graph that also accommodates the distributed nature of OMRS.
The Egeria local graph repository is built on the open-source JanusGraph graph database.
It may not always be practical to replicate an instance
There are 2 occasions where using a proxy is advantageous:
1. An OMAS wants to save a relationship in a repository and the replication has not happened yet (or the set up is such that replication of that type is not enabled).
2. The repository does not support the full entity type but does support proxies (all proxies have the same storage requirement).
A key point about the distributed graph is that whether the relationship refers to a replica entity or uses an entity proxy – it is location transparent.
The Enterprise OMRS layer can select the repository into which to save an instance – based on capability and proximity.
Egeria provides a persistent graph repository
It’s built using JanusGraph and currently uses version 0.3.1
JanusGraph is an open source project hosted by the Linux Foundation that supports the Apache Tinkerpop 3.3 interface.
The Egeria graph repository is built using the Egeria ‘plugin’ repository pattern – in which the repository connector is both the connector and the implementation of the repository.
The graph repository supports instances originating locally, instances replicated from a remote server and proxy instances.
This slide shows (some of) the layers within an OMAG Server.
We talked earlier about the access services and about the Enterprise Connector and Local Connectors within OMRS.
Now we want to focus on the relationship between the Egeria graph repository connector and repository implementation (both in aqua-blue) and the JanusGraph code (in green)
As far as possible the repository uses Apache Tinkerpop for graph operations. This is simply because – while we like JanusGraph – it is probably sensible to stick as far as possible to the Tinkerpop interface for possible future portability.
There are some aspects of interacting with a graph database that are inherently implementation-specific – things like the configuration (e.g. of backends), schema and indexing. For these types of interaction it is necessary to use the JanusGraph Management interface.
Whilst you could look inside the graph for debugging or development – please don’t write code that relies on the schema as it is very likely to evolve
The graph does not contain type information – Egeria provides a repository helper that manages types.
The graph is used to store instance data – as described in more detail on the following slides…
Here is an example of a number of OMRS instance objects – there are two entities, that are connected by a relationship.
Also, one of the entities has two classifications.
All of the instances have attributes – some will be core attributes used for type or control information; others will be attributes that are specific to the instance type (known as type-defined attributes).
You don’t need to remember this picture – we’ll stick a copy of it in the top corner so we can refer back to it…..
Entities and classifications are vertices.
Relationships and classifiers are edges.
The graph schema defines labels for Entity, Relationship, Classification and Classifier.
Vertex and edge properties are used to store OMRS instance data, which includes type, control and property information:
Type is referenced by name – not linked by an edge; types are held in the repository content manager, not stored in the graph
Control information is stored in ‘core attribute’ properties
Instance properties are stored in serialized form and under unique custom keys to support search
Within Group 3 of the MDC API ….
Experts in a field with their own jargon and ways of doing things.
Search report writer interested in assets and not security policies. Security policy author not interested in assets
Goals tasks associated artifacts for a role.
1 OMAS only, e.g. Subject Area: the UI only uses the OMAS interfaces to communicate with Egeria
2 OMAS and connector, e.g. VDC: metadata is obtained from Egeria using OMAS calls; the actual data is accessed using an RDB connector
3 OMRS-oriented UIs – e.g. Tex, used to explore Egeria types
4 Daemon UIs – displaying lineage
For this to work we need to know hostnames, ports and URL structures.
Configuration for Tomcat is via application.properties
Configuration of the server is held in a file and authored via admin REST calls.
Example here is the glossary grid. A grid for authoring glossaries in the subject area UI. Work in progress
Business metadata describes the data that the business needs, what it means and how it should be classified and protected.
Structural metadata describes how the data is actually stored and labelled in the data store.
The linkage between the business and technical metadata allows our technology to switch between these two perspectives. For example,
A request for data expressed in business terminology can be translated into a query for data from a data store.
An integration engine copying data into a sand box can discover which are the fields that the business classifies as sensitive and then mask these values dynamically.