Data is our Product: Thoughts on LOD Sustainability | Robert Sanderson
Invited keynote presentation for the LINCS Project, June 23rd, 2022, at the University of Guelph, Canada. It presents thoughts on a framework for the sustainability of linked open usable data products in the cultural heritage domain.
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghani | Hosted by Confluent
Organizations have been chasing the dream of data democratization, unlocking and accessing data at scale to serve their customers and business, for over half a century, since the early days of data warehousing. They have been trying to reach this dream through multiple generations of architectures, such as the data warehouse and the data lake, through a Cambrian explosion of tools, and through large investments to build their next data platform. Despite the intentions and the investments, the results have been middling.
In this keynote, Zhamak shares her observations on the failure modes of the centralized data lake paradigm and its predecessor, the data warehouse.
She introduces Data Mesh, a paradigm shift in big data management that draws from modern distributed architecture: considering domains as the first-class concern, applying self-sovereignty to distribute the ownership of data, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
This talk introduces the principles underpinning data mesh and Zhamak's recent learnings in creating a path to bring data mesh to life in your organization.
LUX - Cross Collections Cultural Heritage at Yale | Robert Sanderson
A brief presentation, based on the CNI talk, for the Linked Data for Libraries Discovery affinity group about LUX, Linked Open Usable Data, and our discovery processes based on graphs rather than documents.
Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms | Anant Corporation
During this lunch, we’ll review open-source reverse ETL tools to uncover how to send data back to SaaS systems.
ChatGPT and not only: How to use the power of GPT-X models at scale | Maxim Salnikov
Join this session to get all the answers about how ChatGPT and other GPT-X models can be applied to your current or future project. First, we’ll put in order all the terms – OpenAI, GPT-X, ChatGPT, Codex, DALL-E, etc. – and explain why Microsoft and Azure are often mentioned in this context. Then, we’ll go through the main capabilities of Azure OpenAI and the respective use cases that might inspire you to either optimize your product or build a completely new one. During this session, we’ll keep our playground – Azure OpenAI Studio – open to illustrate these capabilities with live demos!
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices-based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products; previously, Jeff was an independent architect for the US Defense Department, VP of Technology at Cerebra, and CTO of Modulant. He has been engineering artificial-intelligence-based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and “Adaptive Information,” a frequent keynote speaker at industry conferences, an author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
With the advent of deep learning and algorithms like word2vec and doc2vec, vector-based representations are increasingly being used in search to represent anything from documents to images and products. However, search engines work with documents made of tokens, not vectors, and are typically not designed for fast vector matching out of the box. In this talk, I will give an overview of how vectors can be derived from documents to produce a semantic representation of a document that can be used to implement semantic/conceptual search without hurting performance. I will then describe a few different techniques for efficiently searching vector-based representations in an inverted index, including LSH, vector quantization and k-means trees, and compare their performance in terms of speed and relevance. Finally, I will describe how each technique can be implemented efficiently in a Lucene-based search engine such as Solr or Elasticsearch.
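To make the approach sketched above concrete, here is a minimal illustration (mine, not the speaker's; the dimensions, names, and seed are arbitrary) of locality-sensitive hashing with random hyperplanes, which turns a dense vector into a bit-signature token that an inverted index such as Lucene's can store like any other term.

```python
# Hedged sketch of random-hyperplane LSH: vectors that point in similar
# directions tend to fall on the same side of each hyperplane, so they
# share a signature and land in the same inverted-index bucket.
import numpy as np

rng = np.random.default_rng(42)
planes = rng.normal(size=(16, 300))  # 16 hyperplanes for 300-dim vectors

def lsh_token(vec: np.ndarray) -> str:
    bits = (planes @ vec > 0).astype(int)
    return "lsh_" + "".join(map(str, bits))  # indexable as an ordinary term

doc_vec = rng.normal(size=300)
print(lsh_token(doc_vec))  # e.g. "lsh_0110100111010001"
# Candidate documents from the same bucket are then re-ranked by exact
# cosine similarity, trading a little recall for inverted-index speed.
```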
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data | Sören Auer
Over the past 4 years, the Semantic Web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into a very promising candidate for addressing one of the biggest challenges of computer science: the exploitation of the Web as a platform for data and information integration. To translate this initial success into a world-scale reality, a number of research challenges need to be addressed: the performance gap between relational and RDF data management has to be closed, coherence and quality of data published on the Web have to be improved, provenance and trust on the Linked Data Web must be established, and generally the entrance barrier for data publishers and users has to be lowered. This tutorial will discuss approaches for tackling these challenges. As an example of a successful Linked Data project we will present DBpedia, which leverages Wikipedia by extracting structured information and by making this information freely accessible on the Web. The tutorial will also outline some recent advances in DBpedia, such as the Mappings Wiki, DBpedia Live, as well as the recently launched DBpedia benchmark.
Power BI Governance and Development Best Practices - Presentation at #MSBIFI ... | Jouko Nyholm
Selected slides from a presentation on Power BI Governance and Development Best Practices, held at the MS BI & Power BI User Group Finland event on 12.6.2018 at Microsoft Flux, Helsinki.
Without the animations and hands-on demos the slides do not tell the whole story, but hopefully they are valuable to some nevertheless.
Partner Enablement: Key Differentiators of Denodo Platform 6.0 for the Field | Denodo
If you’re a Denodo Partner, this presentation is for you. Learn how to gain a competitive edge in the marketplace with Denodo Platform 6.0, and leverage the new features and functionality.
This presentation is part of the Fast Data Strategy Conference; you can watch the video here: goo.gl/Qh8MeX.
The slide deck from a data and analytics workshop for HR professionals, presented at the @hrtechgroup event at Microsoft Vancouver. The workshop was built around the HR sample data set:
https://docs.microsoft.com/en-us/power-bi/sample-human-resources
Power BI Governance - Access Management, Recommendations and Best Practices | Learning SharePoint
This document outlines permissions management for Power BI Workspaces and the features of the new Admin, Member, and Contributor roles. Recommendations and best practices for sharing reports are also included. Free to download.
Data Warehouse Operational System Architecture | SlideTeam
Presenting this set of slides, named Data Warehouse Operational System Architecture. The topics discussed are Data Warehouse, Operational System, and Architecture. This is a completely editable PowerPoint presentation, available for immediate download from SlideTeam.net: https://bit.ly/3Hj8QCP
Data Lakehouse, Data Mesh, and Data Fabric (r1) | James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Building End-to-End Delta Pipelines on GCP | Databricks
Delta has been powering many production pipelines at scale in the Data and AI space since it was introduced a few years ago.
Built on open standards, Delta provides data reliability and enhances storage and query performance to support big data use cases (both batch and streaming), fast interactive queries for BI, and machine learning. Delta has matured over the past couple of years on both AWS and Azure and has become the de facto standard for organizations building their Data and AI pipelines.
In today’s talk, we will explore building end-to-end pipelines on the Google Cloud Platform (GCP). Through presentation, code examples and notebooks, we will build the Delta pipeline from ingest to consumption using our Delta Bronze-Silver-Gold architecture pattern, and show examples of consuming the Delta files using the BigQuery Connector.
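As a rough sketch of the Bronze-Silver-Gold pattern mentioned above (not the session's actual notebook; the bucket paths and column names are invented, and the cluster is assumed to have Delta Lake configured):

```python
# Hedged sketch of a medallion (Bronze-Silver-Gold) Delta pipeline on GCP.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("delta-medallion").getOrCreate()

# Bronze: land the raw events untouched.
raw = spark.read.json("gs://example-bucket/ingest/events/")
raw.write.format("delta").mode("append").save("gs://example-bucket/bronze/events")

# Silver: deduplicate and enforce basic quality rules.
bronze = spark.read.format("delta").load("gs://example-bucket/bronze/events")
silver = bronze.dropDuplicates(["event_id"]).where(F.col("event_ts").isNotNull())
silver.write.format("delta").mode("overwrite").save("gs://example-bucket/silver/events")

# Gold: business-level aggregate, ready for BI or the BigQuery connector.
gold = silver.groupBy("customer_id").agg(F.count("*").alias("event_count"))
gold.write.format("delta").mode("overwrite").save("gs://example-bucket/gold/activity")
```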
Data Engineering and the Data Science Lifecycle | Adam Doyle
Everyone wants to be a data scientist. Data modeling is the hottest thing since Tickle Me Elmo. But data scientists don’t work alone. They rely on data engineers to help with data acquisition and data shaping before their model can be developed. They rely on data engineers to deploy their model into production. Once the model is in production, the data engineer’s job isn’t done. The model must be monitored to make sure that it retains its predictive power. And when the model slips, the data engineer and the data scientist need to work together to correct it through retraining or remodeling.
Welcome to my post on ‘Architecting Modern Data Platforms’, where I discuss how to design cutting-edge data analytics platforms that meet the ever-evolving data & analytics needs of the business.
https://www.ankitrathi.com
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry | Kai Wähner
Agenda:
1) Defence, Modern Warfare, and Cybersecurity in 202X
2) Data in Motion with Apache Kafka as Defence Backbone
3) Situational Awareness
4) Threat Intelligence
5) Forensics and AI / Machine Learning
6) Air-Gapped and Zero Trust Environments
7) SIEM / SOAR Modernization
Technologies discussed in the presentation include Apache Kafka, Kafka Streams, ksqlDB, Kafka Connect, Elasticsearch, Splunk, IBM QRadar, Zeek, NetFlow, PCAP, TensorFlow, AWS, Azure, GCP, Sigma, and Confluent Cloud.
The current Microsoft Power BI governance enablement and recommendations, including the changes following the November Power BI release and the PASS conference announcements.
A work of Zhamak Dehghani, Principal Consultant, ThoughtWorks:
https://martinfowler.com/articles/data-monolith-to-mesh.html
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
Many enterprises are investing in their next generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift from the centralized paradigm of a lake, or its predecessor, the data warehouse. We need to shift to a paradigm that draws from modern distributed architecture: considering domains as the first-class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
An introduction to self-service data with Dremio. Dremio reimagines analytics for modern data. Created by veterans of open source and big data technologies, Dremio is a fundamentally new approach that dramatically simplifies and accelerates time to insight. Dremio empowers business users to curate precisely the data they need, from any data source, then accelerate analytical processing for BI tools, machine learning, data science, and SQL clients. Dremio starts to deliver value in minutes, and learns from your data and queries, making your data engineers, analysts, and data scientists more productive.
Architecting Agile Data Applications for Scale | Databricks
Data analytics and reporting platforms have historically been rigid, monolithic, hard to change, and limited in their ability to scale up or down. I can’t tell you how many times I have heard a business user ask for something as simple as an additional column in a report, and IT says it will take 6 months to add that column because it doesn’t exist in the data warehouse. As a former DBA, I can tell you the countless hours I have spent “tuning” SQL queries to hit pre-established SLAs. This talk covers how to architect modern data and analytics platforms in the cloud to support agility and scalability. We will include topics like end-to-end data pipeline flow, data mesh and data catalogs, live data and streaming, performing advanced analytics, applying agile software development practices like CI/CD and testability to data applications, and finally taking advantage of the cloud for infinite scalability both up and down.
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake... | Hosted by Confluent
"Unlike just a few years ago, today the lakehouse architecture is an established data platform embraced by all major cloud data companies such as AWS, Azure, Google, Oracle, Microsoft, Snowflake and Databricks.
This session kicks off with a technical, no-nonsense introduction to the lakehouse concept, dives deep into the lakehouse architecture and recaps how a data lakehouse is built from the ground up with streaming as a first-class citizen.
Then we focus on serverless for streaming use cases. Serverless concepts are well-known from developers triggering hundreds of thousands of AWS Lambda functions at a negligible cost. However, the same concept becomes more interesting when looking at data platforms.
We have all heard the principle “It runs best on PowerPoint”, so I decided to skip slides here and bring a serverless demo instead:
A hands-on, fun, and interactive serverless streaming use case example where we ingest live events from hundreds of mobile devices (don't miss out - bring your phone and be part of it!!). Based on this use case I will critically explore how much of a modern lakehouse is serverless and how we implemented that at Databricks (spoiler alert: serverless is everywhere from data pipelines, workflows, optimized Spark APIs, to ML).
TL;DR benefits for data practitioners:
- Recap the OSS foundation of the lakehouse architecture and understand its appeal.
- Understand the benefits of leveraging a lakehouse for streaming, and what's there beyond Spark Structured Streaming.
- The meat of the talk: the serverless lakehouse. I give you the tech bits beyond the hype. How does a serverless lakehouse differ from other serverless offers?
- A live, hands-on, interactive demo exploring serverless data engineering end-to-end. For each step we take a critical look and I explain what it means, e.g. for saving costs and removing operational overhead.
Apache Kafka in the Transportation and Logistics Industry | Kai Wähner
Event streaming with Apache Kafka in the transportation and logistics industry.
Track & Trace, Real-time Locating Systems, Customer 360, Open APIs, and more…
Examples include Swiss Post, SBB, Deutsche Bahn, Hermes, Migros, Here Technologies, Otonomo, Lyft, Uber, Free Now, Lufthansa, Air France, Singapore Airlines, Amadeus Group, and more.
Illusions of Grandeur: Trust and Belief in Cultural Heritage Linked Open Data | Robert Sanderson
What is the notion of trust when it comes to publishing linked open data in the cultural heritage sector? This presentation discusses some aspects in relation to three primary questions: how do we trust what was said, trust that the institution said it, and trust what it means?
An introduction to the linked.art LOD data model, based on a carefully selected profile of CIDOC-CRM, and expressed as JSON-LD. It focuses on developer happiness and data usability, while trying to also maintain as much of the richness of CRM as possible.
A Perspective on Wikidata: Ecosystems, Trust, and Usability | Robert Sanderson
A brief and skeptical presentation about Wikidata and its potential for use and abuse in the cultural heritage data ecosystem, presented at the PCC/LDAC forum on Wikidata, November 12th, 2021.
A walk through of the Linked Art data model, API and community processes. Presented originally at the Rijksmuseum for the 5th Linked Art face to face meeting. Linked Art is a linked open usable data specification created by the community to describe artwork, museum objects, and related bibliographic and archival content.
Link checking, the 404 problem (and the other 403) (Richard Cross, Nottingham Trent University) | Talis
How much time should libraries spend validating links from resource lists to electronic and online materials? How much effort should go into validating copyright and quality-assuring materials selected by academics from the web? This session will highlight some of the key issues raised by the business of 'link checking' at Nottingham Trent University, discuss the pros and cons of different assurance methods, and suggest some possible developments in the functionality of Aspire that could support and streamline this area of work.
Linked Art: Sustainable Cultural Knowledge through Linked Open Usable Data | Robert Sanderson
An introduction to Linked Art - why we need it, what it is, and how it works. A great starting point if you're interested in linked open usable data in cultural heritage, especially art museums.
Visually Exploring Patent Collections for Events and Patterns | Xiaoyu Wang
My talk on patent visualization at the 3rd IEEE Workshop on Interactive Visual Text Analytics. The primary focus is to introduce the scalable visual analytics research that my team is working on. The workshop paper can be found at: http://vialab.science.uoit.ca/textvis2013/papers/Ankam-TextVis2013.pdf
A non-technical introduction to Linked Data, from a Cultural Heritage organization's perspective. This presentation is from the Provenance Index workshop at the Getty in 2016, with an emphasis on why Linked Data is valuable, as well as how it works in general. [Please see speaker notes for explanations of image slides]
The Problem of Interoperability: ArchiveGrid as an Archival Discovery Platform | Amy Chen
This presentation discusses ArchiveGrid's UX problems arising from inconsistent archival metadata and ponders how we can improve the platform in the future, especially considering the increase in born digital content in collections.
Using Lucene/Solr to Build CiteSeerX and Friends | Lucene Revolution
Presented by C. Lee Giles, Pennsylvania State University. See complete conference videos at http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012
Cyberinfrastructure or e-science has become crucial in many areas of science, as data access often defines scientific progress. Open source systems have greatly facilitated the design, implementation, and support of cyberinfrastructure. However, there exists no open source integrated system for building an integrated search engine and digital library that focuses on all phases of information and knowledge extraction, such as citation extraction, automated indexing and ranking, chemical formulae search, and table indexing. We propose the open source SeerSuite architecture, a modular, extensible system built on successful OS projects such as Lucene/Solr, and discuss its uses in building enterprise search and cyberinfrastructure for the sciences and academia. We highlight application domains with examples of specialized search engines that we have built, all using Solr/Lucene: computer science (CiteSeerX), chemistry (ChemXSeer), archaeology (ArchSeer), acknowledgements (AckSeer), reference recommendation (RefSeer), collaboration recommendation (CollabSeer), and others. Because such enterprise systems require unique information extraction approaches, several different machine learning methods, such as conditional random fields, support vector machines, mutual-information-based feature selection, and sequence mining, are critical for performance.
Similar to Provenance and Uncertainty in Linked Art (20)
An introduction to Linked Open Usable Data (LOUD) through the lens of a zooming paradigm, and thoughts on how such a paradigm can help to address some grand challenges of LOUD, including search granularity, trust and reconciliation. Presented to the IDLab / Knowledge at Web Scale department of the University of Ghent in Feb '23
Invited seminar for UIUC's IS 575 class on metadata in theory and practice, about structural metadata practice in RDF/LOD. Touches on OAI-ORE, PCDM, Annotation, IIIF and Linked Art. Challenges explored are graph boundaries, APIs and context specific metadata.
Sanderson CNI 2020 Keynote - Cultural Heritage Research Data Ecosystem | Robert Sanderson
There have been, and continue to be, many initiatives to address the social, technological, financial and policy-based challenges that throw up roadblocks towards achieving this vision. However, it is hard to tell whether we are making progress, or whether we are eternally waiting for the hyperloop that will never come. If we are to ever be able to answer research questions that require a broad, international corpus of cultural data, then we need an ecosystem that can be characterized with 5 “C”s: Collaborative, Consistent, Connected, Correct and Contextualized. Each of these has implications for the sustainability, innovation, usability, timeliness and ethical considerations that must be addressed in a coherent and holistic manner. As with autonomous vehicles, technology (and perhaps even machine “intelligence”) is a necessary but insufficient component.
In this presentation, I will frame and motivate this grand challenge and propose where we can build connections between the academy, the cultural heritage sector, and industry. The discussion will explore the issues, and highlight some of the successful endeavors and more approachable opportunities where, together, progress can be made.
Tiers of Abstraction and Audience in Cultural Heritage Data Modeling | Robert Sanderson
A walk through of a framework based around the distinctions between Abstraction, Implementation and Audience for considering the value and utility of data modeling patterns and paradigms in cultural heritage information systems. In particular, a focus on CIDOC-CRM, BibFrame, RiC-CM/RiC-O, EDM, and IIIF, with the intent to demonstrate best practices and anti-patterns in modeling.
Presentation about the usability of linked data, following LODLAM 2020 at the Getty. Discusses JSON-LD 1.1, IIIF, and Linked Art, in the context of design principles for building usable APIs on top of semantically accurate models and domain-specific vocabularies.
In particular, a focus on the different abstraction layers between conceptual model, ontology, vocabulary, and application profile, and the various uses of the data.
Standards and Communities: Connected People, Consistent Data, Usable Applications | Robert Sanderson
Keynote presentation at JCDL 2019 at UIUC, on the interaction between standards (development and usage) and communities. Looking at Linked Open Data, digital library protocols, and evaluation of standards practices.
Euromed2018 Keynote: Usability over Completeness, Community over Committee | Robert Sanderson
Discussion of cultural heritage issues around usability and its prioritization relative to completeness, and a focus on bringing communities together rather than relying on small and transient committees. Focuses on Linked Open Usable Data, Annotations, JSON-LD, IIIF and Linked.Art.
Background on linked open data at the J. Paul Getty Trust, followed by a summary of Linked Open Usable Data and an initial walkthrough of the https://linked.art/ model.
Linked Open Data is great for recommendations about publishing data, but we need five more stars for the consumer: how can it be both complete and usable? Design principles for Linked Open Usable Data.
US2TS Conference position paper on publishing and retrieving not just LOD, but LOUD: Linked Open Usable Data.
APIs are the UIs of Developers (see the sketch after this list), and need:
* Correct Abstraction level for the Audience
* Few Barriers to Entry
* Comprehensible by introspection
* Thorough Documentation with copy-able examples
* Few Exceptions, instead consistent patterns
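As a hedged illustration of "comprehensible by introspection" (the record and URIs are illustrative, in the linked.art style): a LOUD response embeds human-readable labels next to its URIs, so a developer can understand it by walking it, before ever opening the documentation.

```python
# Sketch: reading a LOUD-style record by introspection alone.
record = {
    "id": "https://example.org/object/1",
    "type": "HumanMadeObject",
    "_label": "Example Painting",
    "classified_as": [
        {"id": "http://vocab.getty.edu/aat/300033618",
         "type": "Type",
         "_label": "paintings (visual works)"}
    ],
}

def introspect(node: dict, depth: int = 0) -> None:
    """Print each node's type and label; no ontology lookup required."""
    pad = "  " * depth
    print(f"{pad}{node.get('type', '?')}: {node.get('_label', node.get('id'))}")
    for value in node.values():
        for child in (value if isinstance(value, list) else []):
            if isinstance(child, dict):
                introspect(child, depth + 1)

introspect(record)
# HumanMadeObject: Example Painting
#   Type: paintings (visual works)
```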
A walkthrough of the CIDOC-CRM based, LOD data model developed and maintained at https://linked.art/ for describing cultural heritage resources and activities.
IIIF and Linked Data: A Cultural Heritage DAM Ecosystem | Robert Sanderson
Presentation at DAMLA, November 15, 2017, on the adoption of the IIIF image interoperability APIs across the cultural heritage sector for access to digital assets, and on how Linked Open Data then provides interoperable discovery solutions for that content.
Digital Share 2017 presentation about Linked Open Data at The Getty, starting from what LOD is, to why we're interested in it, and some of the practical approaches we're using to make it real.
To be useful, Linked Open Data requires shared identities and the reuse of their identifiers (URIs). This presentation argues that exact identity matching is both theoretically and practically impossible, and proposes some practical considerations for how to create an actual web of data.
Presented as an invited seminar at UC Berkeley, February 24th, 2017.
Community Challenges for Practical Linked Open Data - Linked Pasts keynote | Robert Sanderson
A call to action to discuss and agree on practical considerations around the creation, publication and discovery of linked open data about historical activities and objects.
Text of approximately what I said: http://bit.ly/usable_lod
Provenance and Uncertainty
Robert Sanderson | @azaroth42 | robert.sanderson@yale.edu

Slide 3
• Conceptual Model: an abstract way to think about the world, holistically, consistently and coherently
• Ontology: a shared set of terms to encode that thinking in a logical, machine-actionable way
• Vocabulary: a curated set of sub-domain specific terms, to make the ontology more concrete

[Diagram: the layers of abstraction standards. The Ontology encodes the Model; the Vocabulary refines the Ontology.]
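A minimal sketch of the three layers in practice (in the linked.art JSON-LD idiom; the object URI is invented, though the AAT term shown is the one commonly used for paintings): the ontology supplies the class, and the vocabulary term refines it.

```python
# Hedged sketch: model -> ontology -> vocabulary for one object.
painting = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "https://example.org/object/1",       # invented URI
    "type": "HumanMadeObject",                  # ontology: crm:E22_Human-Made_Object
    "_label": "Example Painting",
    "classified_as": [{
        "id": "http://vocab.getty.edu/aat/300033618",
        "type": "Type",
        "_label": "paintings (visual works)"    # vocabulary: AAT refines the class
    }],
}
```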
Slide 5
Linked Art Profile
• Domain: Cultural Heritage, especially Artworks
• Model: CIDOC Conceptual Reference Model
• Ontology: RDF encoding of CRM 7.1, plus extensions
• Vocabulary: Getty AAT, plus minimal extensions
• Format: JSON-LD with 10 primary document boundaries
• Target: 90% of the use cases with 10% of the effort
https://linked.art/
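A further sketch, assuming the Python rdflib library with JSON-LD support and network access to resolve the remote context: a record like the one above expands into CIDOC-CRM triples, which is what "Ontology: RDF encoding of CRM" means in practice.

```python
# Hedged sketch: expanding a Linked Art JSON-LD document into RDF triples.
import json
from rdflib import Graph

doc = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "https://example.org/object/1",   # invented URI
    "type": "HumanMadeObject",
    "_label": "Example Painting",
}
g = Graph()
g.parse(data=json.dumps(doc), format="json-ld")
for s, p, o in g:
    print(s.n3(), p.n3(), o.n3())  # properties resolve to CRM URIs via the context
```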
Slide 6
What is Data Usability?
“… usability is the degree to which [a thing] can be used by specified consumers to achieve [their] quantified objectives with effectiveness, efficiency, and satisfaction in a quantified context of use.” (https://en.wikipedia.org/wiki/Usability)

The definition names the who (consumers), the what (objectives), the how (effectiveness, efficiency, satisfaction), and the where (context of use): usability is dependent on the Audience.
Slide 10
Progressive Enhancement
• Data for Humans: Strings
  • Separate entities, with searchable textual descriptions
• Data for Machines: Structured
  • Entities with machine-processable, comparable values
• Data for the Graph: d’Stributed
  • Entities are connected (within and across systems)
• Data for Research: Stringent
  • Sufficient accuracy and comprehensiveness to answer research questions from aggregated data

[Diagram: progression from Human to Machine to Graph to Research]
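A small sketch of the four tiers applied to a single fact, "painted by Rembrandt in 1642" (all values and URIs below are illustrative):

```python
# Tier 1, Humans (Strings): searchable text.
tier1 = {"description": "Painted by Rembrandt in 1642."}

# Tier 2, Machines (Structured): comparable, machine-processable values.
tier2 = {"creator_name": "Rembrandt van Rijn", "year_created": 1642}

# Tier 3, the Graph (d'Stributed): shared identifiers connect systems.
tier3 = {"produced_by": {
    "carried_out_by": [{"id": "http://vocab.getty.edu/ulan/500011051"}],  # ULAN id
    "timespan": {"begin_of_the_begin": "1642-01-01T00:00:00Z"},
}}

# Tier 4, Research (Stringent): tier 3 plus documented accuracy and
# comprehensiveness, so aggregated data can answer research questions.
```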
Slide 20
Custody Use Cases
• Loans: e.g. for exhibitions
• Permanent Loans: for display, with no return timeframe
• Losses/Thefts: transfer of custody to no one (but not transfer of ownership … they’re still the legal owner, even if they don’t know where it is)
• Ownership vs Custody: the Museum is the owner, the Department has custody of it (and is part of the Museum)
• Multiple objects at once, by repeating the pattern
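A hedged sketch of the loan pattern (class and property names follow the linked.art context; the URIs are invented):

```python
# Sketch: a loan as a transfer of custody, ownership untouched.
loan = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "https://example.org/provenance/loan-1",
    "type": "Activity",
    "_label": "Loan of Example Painting for an exhibition",
    "part": [{
        "type": "TransferOfCustody",
        "transferred_custody_of": [{"id": "https://example.org/object/1"}],
        "transferred_custody_from": [{"id": "https://example.org/group/lender"}],
        "transferred_custody_to": [{"id": "https://example.org/group/borrower"}],
    }],
}
# A loss or theft is the same pattern with no transferred_custody_to:
# custody ends, but the legal owner does not change.
```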
Slide 22
Encounter Use Cases
• Discovery of a fossil: e.g. no production event
• Rediscovery of a lost object: e.g. a statue in the sea
• Inventory taking: e.g. the curator/collector “encountered” the object even if no state of the world changed
• Physical co-location of agent and object: e.g. an artist encountered objects at an exhibition, which then went on to affect their work
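A hedged sketch of the discovery case (the Encounter class and encountered_by property follow the linked.art context as I understand it; the URIs are invented):

```python
# Sketch: a fossil has no Production event, only an Encounter.
fossil = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "https://example.org/object/fossil-7",
    "type": "HumanMadeObject",  # a natural-object class may be more apt for fossils
    "_label": "Example Fossil",
    "encountered_by": [{
        "type": "Encounter",
        "_label": "Discovery of the fossil",
        "carried_out_by": [{"id": "https://example.org/person/finder"}],
        "took_place_at": [{"id": "https://example.org/place/quarry"}],
    }],
}
```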
Slide 23
Other Use Cases Covered
• Commissions, Promised Gifts: obligation of future action
• Transfer of Rights: e.g. performance rights, copyright
• Transfer of Partial Ownership: e.g. asymmetrical shared ownership (shares in the value of an object)
• Physical Location: movement between two places, rather than transfer of rights or currency
• Auctions: a documented structure for sale by auction
Slide 26
Certainty?
• Accuracy: does the data correctly represent the state of the real world for the things it describes? (Objective)
• Certainty: the belief of the Publisher as to the extent of the accuracy of the data. (Subjective)
• Utility: the belief of the Researcher that the data is useful for fulfilling their current information need. (Subjective, context-specific)
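A hedged sketch of how a publisher's certainty can be carried as structured data (the attribute-assignment pattern in the linked.art idiom; the certainty vocabulary URI is invented):

```python
# Sketch: an uncertain attribution recorded as meta-metadata.
attribution = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "type": "AttributeAssignment",
    "_label": "Attribution of the painting (uncertain)",
    "assigned_property": "created_by",
    "assigned": [{"id": "https://example.org/person/artist-x", "type": "Person"}],
    "classified_as": [{
        "id": "https://example.org/certainty/possibly",  # invented vocabulary
        "type": "Type",
        "_label": "possibly by",
    }],
}
```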
Slide 40
Why Is It Hard?
Consuming systems must…
• Look in multiple places for the same information
  • More processing, more code, more developer knowledge
• Understand the vocabulary of un/certainty levels
  • What processing needs to occur? How can the structure be displayed?
• Be able to merge metadata and meta-metadata
  • When appropriate, based on certainty and use cases
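A sketch of what "looking in multiple places" costs the consumer in code (property names follow the linked.art idiom; the certainty labels are whatever the publisher's vocabulary supplies):

```python
# Sketch: a consumer must merge direct assertions with uncertain,
# attribute-assigned ones, and know the certainty vocabulary to rank them.
def creators(record: dict) -> list[dict]:
    found = []
    # 1. The direct assertion, if any.
    for actor in record.get("produced_by", {}).get("carried_out_by", []):
        found.append({"who": actor.get("id"), "certainty": "asserted"})
    # 2. Meta-metadata: attributions assigned about the record.
    for aa in record.get("attributed_by", []):
        if aa.get("assigned_property") == "created_by":
            label = (aa.get("classified_as") or [{}])[0].get("_label", "unknown")
            for val in aa.get("assigned", []):
                found.append({"who": val.get("id"), "certainty": label})
    return found
```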
Slide 41
Discussion Question: Why Make It Hard?
• Data for Humans: Strings
  • Separate entities, with searchable textual descriptions
• Data for Machines: Structured
  • Entities with machine-processable, comparable values

What is the requirement for structured, rather than string, data for uncertainty?
If you are looking for hair in materials, for an analysis of human remains in a collection, then this record is very useful: high utility for that research question, but low for most others, given the uncertain (but perhaps accurate) information.
If you are looking for the oldest person in our data… you will find one who is 39 trillion years old.
When we talk about trust, we often mean confidence.