Slides of a presentation for the CLARIAH community on ideas for making controlled vocabularies sustainable and FAIR (Findable, Accessible, Interoperable, Reusable) with the help of Decentralized Identifiers (DIDs).
Decentralised identifiers and knowledge graphs - vty
Building an Operating System for Open Science: data integration challenges, Dataverse data repository and knowledge graphs. Lecture by Slava Tykhonov, DANS-KNAW, for the Journées Scientifiques de Rochebrune 2023 (JSR'23).
Building collaborative Machine Learning platform for Dataverse network. Lecture by Slava Tykhonov (DANS-KNAW, the Netherlands), DANS seminar series, 29.03.2022
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes... - Dr. Arif Wider
A talk presented by Max Schultze from Zalando and Arif Wider from ThoughtWorks at NDC Oslo 2020.
Abstract:
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
At Zalando - Europe's biggest online fashion retailer - we realised that accessibility and availability at scale can only be guaranteed when moving more responsibilities to those who pick up the data and have the respective domain knowledge - the data owners - while keeping only data governance and metadata information central. Such a decentralized and domain-focused approach has recently been coined a Data Mesh.
The Data Mesh paradigm promotes the concept of Data Products which go beyond sharing of files and towards guarantees of quality and acknowledgement of data ownership.
This talk will take you on a journey of how we went from a centralized Data Lake to a distributed Data Mesh architecture, and will outline the ongoing efforts to make the creation of data products as simple as applying a template.
The document discusses several Azure network architectures including:
1) An Azure landing zone with firewall/WAF that includes hub-spoke VNets with web, business, and data tiers separated across spokes connected to an on-premises network.
2) An Azure network architecture deployed to a primary region including production and non-production subscriptions, VNets, and resource groups separated by function and connected to an on-premises network via VPN.
3) A hub-spoke network topology with shared services and subnets in a central hub VNet and workloads separated across spoke VNets connected to the hub.
KB Kookmin Card - A Cloud-Based Analytics Platform Innovation Journey - Speakers: Park Chang-yong, Manager, Data Strategy Division, AI Innovation Department, KB Card | Kang Byung-eok, Soluti... - Amazon Web Services Korea
On-premises analytics platforms face limitations in many respects, including the cost of expanding resources, the cost of managing them, and the lead time for introducing new resources and configuring environments. KB Kookmin Card therefore designed and adopted a cloud-based analytics platform that overcomes the limitations of its existing analytics platform while creating synergy with it. This case study presents KB Kookmin Card's data innovation journey and the lessons learned.
ElasticSearch introduction talk. Overview of the API, functionality, and use cases. What can be achieved, and how to scale? What is Kibana, and how can it benefit your business?
Introduction to AWS Glue: Data Analytics Week at the SF Loft - Amazon Web Services
Introduction to AWS Glue: Data Analytics Week at the San Francisco Loft
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL. AWS Glue generates the code to execute your data transformations and data loading processes.
Level: Intermediate
Speakers:
John Mallory - Principal Business Development Manager, Storage, AWS
Asim Kumar Sasmal - Big Data Consultant, AWS Professional Services
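To make the Glue flow described above concrete, here is a minimal sketch using boto3. The crawler name, IAM role ARN, database name, and S3 path are hypothetical placeholders, not values from the talk.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Point Glue at data stored on S3; the crawler infers the schema and
# stores the table definitions in the AWS Glue Data Catalog.
glue.create_crawler(
    Name="sales-crawler",  # hypothetical
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical
    DatabaseName="analytics",
    Targets={"S3Targets": [{"Path": "s3://example-bucket/sales/"}]},
)
glue.start_crawler(Name="sales-crawler")

# Once the crawler has finished, the cataloged tables are searchable,
# queryable, and available for ETL.
tables = glue.get_tables(DatabaseName="analytics")
for t in tables["TableList"]:
    print(t["Name"])
```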
The document summarizes a meetup about NoSQL databases hosted by AWS in Sydney in 2012. It includes an agenda with presentations on Introduction to NoSQL and using EMR and DynamoDB. NoSQL is introduced as a class of databases that don't use SQL as the primary query language and are focused on scalability, availability and handling large volumes of data in real-time. Common NoSQL databases mentioned include DynamoDB, BigTable and document databases.
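In the spirit of the DynamoDB topic above, a minimal read/write sketch with boto3; the table name and attributes are invented, and the table is assumed to already exist with `event_id` as its key.

```python
import boto3

dynamodb = boto3.resource("dynamodb", region_name="ap-southeast-2")
table = dynamodb.Table("events")  # hypothetical table

# NoSQL-style access: write and read by key, no SQL involved.
table.put_item(Item={"event_id": "e-001", "source": "meetup", "count": 1})
item = table.get_item(Key={"event_id": "e-001"})["Item"]
print(item["source"])
```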
Logical Data Fabric: Architectural Components - Denodo
Watch full webinar here: https://bit.ly/39MWm7L
Is the Logical Data Fabric one monolithic technology, or does it comprise various components? If so, what are they? In this presentation, Denodo CTO Alberto Pan will elucidate what components make up the logical data fabric.
Building a Logical Data Fabric using Data Virtualization (ASEAN) - Denodo
Watch full webinar here: https://bit.ly/3FF1ubd
In the recent Building the Unified Data Warehouse and Data Lake report by the industry analyst firm TDWI, 64% of organizations stated that the objective of a unified Data Warehouse and Data Lake is to get more business value, and 84% of organizations polled felt that a unified approach to Data Warehouses and Data Lakes was either extremely or moderately important.
In this session, you will learn how your organization can apply a logical data fabric, and how the associated technologies of machine learning, artificial intelligence, and data virtualization can reduce time to value, increasing the overall business value of your data assets.
KEY TAKEAWAYS:
- How a Logical Data Fabric is the right approach to assist organizations to unify their data.
- The advanced features of a Logical Data Fabric that assist with the democratization of data, providing an agile and governed approach to business analytics and data science.
- How a Logical Data Fabric with Data Virtualization enhances your legacy data integration landscape to simplify data access and encourage self-service.
This document discusses data mesh, a distributed data management approach for microservices. It outlines the challenges of implementing microservice architecture including data decoupling, sharing data across domains, and data consistency. It then introduces data mesh as a solution, describing how to build the necessary infrastructure using technologies like Kubernetes and YAML to quickly deploy data pipelines and provision data across services and applications in a distributed manner. The document provides examples of how data mesh can be used to improve legacy system integration, batch processing efficiency, multi-source data aggregation, and cross-cloud/environment integration.
The Business Case for Semantic Web Ontology & Knowledge Graph - Cambridge Semantics
This document discusses how semantic web ontologies and knowledge graphs can help reduce high IT costs by providing a common schema and linking data across systems. It introduces AnzoGraph DB, a graph database built on semantic web standards that can perform both analytics and graph algorithms on large datasets. The document demonstrates how public flight delay data can be converted to a knowledge graph and analyzed using techniques like PageRank, shortest paths, and querying for delayed flights. Overall, it argues that semantic technologies can help address the problem of data integration costs by enabling linked and standardized data.
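Tool specifics aside, the convert-and-query idea can be sketched generically with rdflib (this is not AnzoGraph's own API); the namespace and flight record below are invented for illustration.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Generic illustration: turn a flight-delay record into RDF triples.
EX = Namespace("http://example.org/flights#")  # hypothetical namespace
g = Graph()
g.add((EX.f123, RDF.type, EX.Flight))
g.add((EX.f123, EX.origin, Literal("JFK")))
g.add((EX.f123, EX.delayMinutes, Literal(42)))

# Query for delayed flights, analogous to the queries in the talk.
q = """
PREFIX ex: <http://example.org/flights#>
SELECT ?flight ?delay WHERE {
  ?flight a ex:Flight ; ex:delayMinutes ?delay .
  FILTER(?delay > 15)
}
"""
for row in g.query(q):
    print(row.flight, row.delay)
```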
Data is our Product: Thoughts on LOD Sustainability - Robert Sanderson
The document discusses the sustainability of cultural heritage linked open data products. It defines sustainability as the state in which running costs are less than value plus shutdown costs. Running costs include technology, content, and staffing. Value includes income, benefits to mission, and intangible benefits. Building sustainability requires maximizing usage, usability, trust, and loyalty among users. Usability, trust, and loyalty develop through community engagement and ensuring the data meets user needs. Sustainability ultimately depends on having champions who build, support, and use the product.
Intuit's Data Mesh - Data Mesh Learning Community meetup 5.13.2021 - Tristan Baker
Past, present and future of data mesh at Intuit. This deck describes a vision and strategy for improving data worker productivity through a Data Mesh approach to organizing data and holding data producers accountable. Delivered at the inaugural Data Mesh Learning meetup on 5/13/2021.
Databricks: A Tool That Empowers You To Do More With Data - Databricks
In this talk we will present how Databricks has enabled the author to achieve more with data: one person can build a coherent data project with data engineering, analysis, and science components, with better collaboration, better productionalization methods, larger datasets, and faster turnaround.
The talk will include a demo that will illustrate how the multiple functionalities of Databricks help to build a coherent data project with Databricks jobs, Delta Lake and auto-loader for data engineering, SQL Analytics for Data Analysis, Spark ML and MLFlow for data science, and Projects for collaboration.
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
Azure DataBricks for Data Engineering by Eugene Polonichko - Dimko Zhluktenko
This document provides an overview of Azure Databricks, an Apache Spark-based analytics platform optimized for Microsoft Azure cloud services. It discusses key components of Azure Databricks including clusters, workspaces, notebooks, visualizations, jobs, alerts, and the Databricks File System. It also outlines how data engineers can leverage Azure Databricks for scenarios like running ETL pipelines, streaming analytics, and connecting business intelligence tools to query data.
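As a rough sketch of the ETL scenario mentioned above, a few lines of PySpark of the kind one might run in a Databricks notebook (where a `spark` session is normally provided); the paths and column names are placeholders.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read raw JSON events (hypothetical mount path).
df = spark.read.json("/mnt/raw/events/")

# Transform: drop incomplete rows and derive a partition column.
clean = (df.filter(F.col("user_id").isNotNull())
           .withColumn("day", F.to_date("timestamp")))

# Load: write curated output as Parquet.
clean.write.mode("overwrite").parquet("/mnt/curated/events/")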
Azure Data Factory is a cloud-based data integration service that orchestrates and automates the movement and transformation of data. In this session we will learn how to create data integration solutions using the Data Factory service and ingest data from various data stores, transform/process the data, and publish the result data to the data stores.
An introduction to self-service data with Dremio. Dremio reimagines analytics for modern data. Created by veterans of open source and big data technologies, Dremio is a fundamentally new approach that dramatically simplifies and accelerates time to insight. Dremio empowers business users to curate precisely the data they need, from any data source, then accelerate analytical processing for BI tools, machine learning, data science, and SQL clients. Dremio starts to deliver value in minutes, and learns from your data and queries, making your data engineers, analysts, and data scientists more productive.
Data Catalog as the Platform for Data Intelligence - Alation
Data catalogs are in wide use today across hundreds of enterprises as a means to help data scientists and business analysts find and collaboratively analyze data. Over the past several years, customers have increasingly used data catalogs in applications beyond their search & discovery roots, addressing new use cases such as data governance, cloud data migration, and digital transformation. In this session, the founder and CEO of Alation will discuss the evolution of the data catalog, the many ways in which data catalogs are being used today, the importance of machine learning in data catalogs, and discuss the future of the data catalog as a platform for a broad range of data intelligence solutions.
Presentation given at Macquarie University in support of the ARDC 'institutional role in the data commons' project on "Implementing FAIR: Standards in Research Data Management" https://ardc.edu.au/news/data-and-services-discovery-activities-successful-applicants/
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
This document provides an agenda and overview for a workshop on building a data lake on AWS. The agenda includes reviewing data lakes, modernizing data warehouses with Amazon Redshift, data processing with Amazon EMR, and event-driven processing with AWS Lambda. It discusses how data lakes extend traditional data warehousing approaches and how services like Redshift, EMR, and Lambda can be used for analytics in a data lake on AWS.
Learn to Use Databricks for Data Science - Databricks
Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever — one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data. Join us to hear how Databricks’ open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale — all on one unified platform.
Building the Data Lake with Azure Data Factory and Data Lake Analytics - Khalid Salama
In essence, a data lake is a commodity distributed file system that acts as a repository to hold raw data file extracts of all the enterprise source systems, so that it can serve the data management and analytics needs of the business. A data lake system provides means to ingest data, perform scalable big data processing, and serve information, in addition to managing, monitoring, and securing the IT environment. In these slides, we discuss building data lakes using Azure Data Factory and Data Lake Analytics. We delve into the architecture of the data lake and explore its various components. We also describe the various data ingestion scenarios and considerations. We introduce the Azure Data Lake Store, then we discuss how to build an Azure Data Factory pipeline to ingest data into the data lake. After that, we move into big data processing using Data Lake Analytics, and we delve into U-SQL.
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train... - Edureka!
( ELK Stack Training - https://www.edureka.co/elk-stack-trai... )
This Edureka Elasticsearch Tutorial will help you understand the fundamentals of Elasticsearch along with its practical usage and build a strong foundation in the ELK Stack. The video covers the following topics (a minimal Query DSL sketch follows the list):
1. What Is Elasticsearch?
2. Why Elasticsearch?
3. Elasticsearch Advantages
4. Elasticsearch Installation
5. API Conventions
6. Elasticsearch Query DSL
7. Mapping
8. Analysis
9. Modules
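Touching on topics 5 and 6 above, here is a minimal Query DSL request against the REST API of a local Elasticsearch node, sent with the requests library; the index name and field are hypothetical.

```python
import requests

# A match query against a hypothetical "articles" index on localhost.
resp = requests.post(
    "http://localhost:9200/articles/_search",
    json={"query": {"match": {"title": "elasticsearch"}}},
)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_id"], hit["_source"].get("title"))
```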
Decentralized Identifiers (DIDs) are self-sovereign identifiers for individuals, organizations, and things: persistent, dereferenceable, decentralized, and cryptographically verifiable identifiers registered in a blockchain or other decentralized network. Each DID method must have a specification and a resolver implementation. A DID resolves to a DID document containing public keys, service endpoints, and other metadata. DIDs enable verifiable credentials and authentication through challenge-response protocols using the DID document. Standards groups are working on further developing DIDs, verifiable credentials, and rebooting the web of trust through decentralized identity.
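For orientation, the shape of a DID document as described above can be sketched as a plain Python dict; the did:example method and key material follow the illustrative examples in the W3C DID specification and are not a real registered identifier.

```python
# A schematic DID document: public keys, authentication references,
# and service endpoints, keyed by the DID itself.
did_document = {
    "@context": "https://www.w3.org/ns/did/v1",
    "id": "did:example:123456789abcdefghi",
    "verificationMethod": [{
        "id": "did:example:123456789abcdefghi#keys-1",
        "type": "Ed25519VerificationKey2018",
        "controller": "did:example:123456789abcdefghi",
        "publicKeyBase58": "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV",
    }],
    "authentication": ["did:example:123456789abcdefghi#keys-1"],
    "service": [{
        "id": "did:example:123456789abcdefghi#vcs",
        "type": "VerifiableCredentialService",  # illustrative service type
        "serviceEndpoint": "https://example.com/vc/",
    }],
}
print(did_document["id"])
```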
Blockchain R&D to Decentralized Identity Deployment - Anil John
DHS S&T SVIP Presentation on the DHS work using W3C Verifiable Credentials and W3C Decentralized Identifiers at the EU/EC NGI's ESSIF-Lab Final Event in Brussels on 12 Dec, 2022.
The document provides an overview of decentralized identifiers (DIDs) and the DID universal resolver. It discusses how DIDs enable self-sovereign identity and describes their key properties. It then explains how the universal resolver allows looking up DID documents across different DID methods through configurable drivers. Finally, it outlines several potential applications of DIDs like verifiable credentials, authentication, and decentralized key management.
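A hedged sketch of what such a universal-resolver lookup can look like over HTTP, assuming the public development instance at dev.uniresolver.io and an example did:sov identifier of the kind used in its documentation; the response shape is an assumption based on the resolver's usual output.

```python
import requests

# Resolve a DID through a universal resolver instance; method-specific
# drivers produce the DID document plus resolution metadata.
DID = "did:sov:WRfXPg8dantKVubE3HX8pw"  # example identifier
resp = requests.get(f"https://dev.uniresolver.io/1.0/identifiers/{DID}")
result = resp.json()

# Assumed response layout: a "didDocument" key holding the document.
print(result.get("didDocument", result).get("id"))
```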
Part 1: Introduction to Self-Sovereign Identity (SSI), Verifiable Credentials, and standards defined by the Decentralized Identity Foundation and W3C.
Part 2: How to use it with Corda to develop scalable, decentralised applications that use smart contracts and SSI to orchestrate complex, multi-party processes.
DevDay: Extending CorDapps with Self-Sovereign Identity: Technology Deepdive ... - R3
The document discusses self-sovereign identity and decentralized identifiers (DIDs). It introduces emerging standards for decentralized identity being developed by the Decentralized Identity Foundation. Key concepts discussed include DIDs, DID documents, verifiable credentials, and implementations like uPort, Ethereum, IPFS, Blockstack, and Sovrin/Indy. The document provides examples of how DIDs can represent identities and how verifiable credentials issued to DID subjects can be verified.
OSCON 2018 Getting Started with Hyperledger Indy - Tracy Kuhrt
Presented at OSCON 2018. Hyperledger Indy is a distributed ledger built for decentralized identity and is one of the open source frameworks hosted by Hyperledger. It provides tools, libraries, and reusable components for creating and using independent digital identities rooted on blockchains or other distributed ledgers. In this presentation, I introduce The Linux Foundation and Hyperledger. We look at Decentralized Identity Concepts -- identity models, decentralized identity, zero-knowledge proofs, and verifiable credentials. We look at a demo that utilizes Hyperledger Indy and these concepts. We then look at Hyperledger Indy's software stack and roadmap and touch on how you can get involved.
The Web of Linked Open Data, or LOD, is the most relevant achievement of the Semantic Web. Initially proposed by Tim Berners-Lee in a seminal paper published in Scientific American in 2001, the Semantic Web envisions a web where software agents can interact with large volumes of structured, easy-to-process data. Users now have at their disposal the first mature results of this vision. Among them, and probably the most significant, are the different LOD initiatives and projects that publish open data in standard formats like RDF.
This presentation provides an overview and comparison of different LOD initiatives in the area of patent information, and analyses potential opportunities for building new information services based on widely available datasets of patent information. Information is based on different interviews conducted with innovation agents and on the analysis of professional bibliography and current implementations.
LOD opportunities are not restricted to information aggregators; they also extend to end users and innovation agents that need to deal with large amounts of data. In both cases, the opportunities offered by LOD need to be assessed, as LOD has become a standard, universal method to distribute, share, and access data.
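To ground the discussion, a minimal sketch of consuming published LOD with rdflib: dereferencing a resource URI (RDF obtained via content negotiation) and listing a few of its triples. The DBpedia resource chosen is an illustrative example, not one from the presentation.

```python
from rdflib import Graph

# Dereference a Linked Open Data resource; rdflib negotiates for RDF
# and parses whatever serialization the server returns.
g = Graph()
g.parse("http://dbpedia.org/resource/Patent")

# Print a few of the triples describing the resource.
for s, p, o in list(g)[:5]:
    print(s, p, o)
```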
The digital object identifier (DOI) system provides a persistent unique identifier for digital objects. A DOI name consists of a prefix assigned to a registrant and a suffix chosen by the registrant. The DOI remains permanently linked to the object even if its location changes. Resolving a DOI provides current metadata and links to access the object.
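As a rough illustration of that resolution step, the sketch below dereferences a DOI through the doi.org proxy and then requests machine-readable metadata via content negotiation; the DOI shown is a common documentation example, not one from this text.

```python
import requests

doi = "10.1038/nphys1170"  # example DOI

# Resolution: the doi.org proxy redirects to the object's current location,
# even if that location has changed since registration.
resp = requests.get(f"https://doi.org/{doi}", allow_redirects=True)
print(resp.url)

# Content negotiation: ask the resolver for citation metadata, not HTML.
meta = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
)
print(meta.json()["title"])
```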
This document provides an overview of authentication and authorization with federated identity services. It defines key concepts like authentication vs authorization, federated identity, assertions, OpenID, OAuth, Active Directory Federation Services, OpenID Connect, Security Assertion Markup Language, JSON Web Tokens, and FIDO U2F. It also discusses user experience wins, threat modeling considerations, example attacks to consider, and questions from the audience.
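Of the building blocks listed above, JSON Web Tokens are the easiest to show in a few lines. A minimal sketch with the PyJWT library, assuming a shared HMAC secret; the secret and claims are illustrative.

```python
import jwt  # PyJWT

secret = "demo-secret"  # illustrative; use a real key in practice

# Issue a token carrying a subject and audience claim.
token = jwt.encode({"sub": "alice", "aud": "example-app"}, secret,
                   algorithm="HS256")

# Verify the signature and the expected audience on the receiving side.
claims = jwt.decode(token, secret, algorithms=["HS256"],
                    audience="example-app")
print(claims["sub"])
```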
Decentralized Identifiers (DIDs): The Fundamental Building Block of Self-Sove... - SSIMeetup
In our second webinar, "Decentralized Identifiers (DIDs) - Building Block of Self-Sovereign Identity (SSI)", Drummond Reed, Chief Trust Officer at Evernym, gives us the background on how DIDs work, where they come from, and why they are important for blockchain-based digital identity.
The document discusses technical issues and opportunities for improving the Global Biodiversity Information Facility's (GBIF) registry and portals for discovering biodiversity resources. It analyzes GBIF's past use of the UDDI registry and data portal, and outlines challenges in developing a new graph-based registry model to better represent the network of institutions, collections, and relationships. The new registry aims to improve discoverability by associating automated and human-generated metadata, uniquely identifying resources, and defining services and vocabularies.
This is a talk I was asked to give at the What is Universe? event at the University of Oregon (on their Portland campus). I cover the history of the Internet Identity Workshop and talk about its core nature as a torus/bowl, a feminine form, and how this has resulted in the innovation of Self-Sovereign Identity.
Decentralized identity aims to give users control over their digital identities and data. However, decentralized identity systems also introduce new attack surfaces. Attackers could abuse protocols to access sensitive user data or present fake credentials. Successful attacks could undermine user trust and adoption of decentralized identity. Ongoing research and adoption of security best practices are needed to strengthen decentralized identity systems against current and future threats.
Introduction to Self-Sovereign Identity - Karyl Fowler
Juan Caballero from Spherity and Karyl Fowler from Transmute co-presented the Introduction to Self-Sovereign Identity (SSI) session at the 30th Internet Identity Workshop (IIW) in April 2020, demonstrating to newcomers the difference between the values associated with the "SSI movement" and the "collection of technologies" that power applications embodying some of those values.
Returning to Online Privacy - W3C/ANU Future of the Web Roadshow 20190221 - David Wood
This document discusses decentralized identifiers (DIDs) and verifiable credentials. It begins by explaining problems with current online identifiers, such as being controlled by centralized entities and not belonging to individuals. It then introduces DIDs as a new type of globally unique identifier that is owned by individuals, stored on a decentralized ledger, and cryptographically verified. DIDs resolve to DID documents containing public keys and authentication mechanisms. The document discusses the W3C efforts to standardize DIDs and verifiable credentials that can be issued and verified using DIDs. It provides examples of DID syntax and components of DID documents.
A system for distributed minting and management of persistent identifiers - Lukasz Bolikowski
The document proposes a system called Peer-Minted Persistent Identifiers (PMPIs) to decentralize the minting and management of persistent identifiers. PMPIs would allow anyone to mint and manage their own IDs by storing the full database of IDs and revisions across many copies in a distributed network, similar to Bitcoin and currencies backed by central banks. The system aims to provide long-term persistence of IDs even if managing organizations cease to exist, through properties like integrity verification of the ID database and no single party being able to shut down the system. Next steps include finding stakeholders, securing funding, and implementing a prototype to evaluate design decisions like proof-of-work calibration and authorized key roles.
Similar to Decentralised identifiers for CLARIAH infrastructure
Dataverse repository for research data in the COVID-19 Museum - vty
The COVID-19 Museum has an ambition to create a platform to deposit, consult, aggregate, and study heterogeneous data about the pandemic using features of a distributed web service. To achieve this purpose, Dataverse has been selected as a reliable FAIR data repository with a built-in search engine and functionality that allows adding computing resources to explore archived resources, both data and metadata. Presentation by Slava Tykhonov, DANS-KNAW (The Royal Netherlands Academy of Arts and Sciences). Université Paris Cité, 19 April 2022.
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN... - vty
Presentation at ISKO Knowledge Organisation Research Observatory. RESEARCH REPOSITORIES AND DATAVERSE: NEGOTIATING METADATA, VOCABULARIES AND DOMAIN NEEDS
The presentation for the W3C Semantic Web in Health Care and Life Sciences community group by Slava Tykhonov, DANS-KNAW, the Royal Netherlands Academy of Arts and Sciences (October 2020). The recording is available https://www.youtube.com/watch?v=G9oiyNM_RHc
CLARIN CMDI use case and flexible metadata schemes - vty
Presentation for the CLARIAH IG Linked Open Data on the latest developments for the Dataverse FAIR data repository. Building a SEMAF workflow with external controlled vocabularies support and a Semantic API. Using TRIZ, the theory of inventive problem solving, for further innovation in Linked Data.
Flexible metadata schemes for research data repositories - CLARIN Conference'21 - vty
The development of the Common Framework in Dataverse and the CMDI use case. Building an AI/ML-based workflow for predicting and linking concepts from external controlled vocabularies to CMDI metadata values.
Controlled vocabularies and ontologies in Dataverse data repository - vty
This document discusses supporting external controlled vocabularies in Dataverse. It proposes implementing a JavaScript interface to allow linking metadata fields to terms from external vocabularies accessed via SKOSMOS APIs. Several challenges are identified, such as applying support to any field, backward compatibility, and ensuring vocabularies come from authoritative sources. Caching concepts and linking dataset files directly to terms are also proposed to improve interoperability.
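A minimal sketch of the kind of SKOSMOS API lookup described above, using the public Finto.fi instance purely for illustration; a Dataverse installation would point at its own authoritative vocabulary service.

```python
import requests

# Search a SKOSMOS instance for concepts matching a label; each result
# carries a dereferenceable URI that a metadata field could link to.
resp = requests.get(
    "https://finto.fi/rest/v1/search",
    params={"query": "archaeology", "lang": "en"},
)
for concept in resp.json().get("results", []):
    print(concept["uri"], concept.get("prefLabel"))
```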
Automated CI/CD testing, installation and deployment of Dataverse infrastruct... - vty
This document summarizes a presentation about automating CI/CD testing, installation, and deployment of Dataverse in the European Open Science Cloud. It discusses using Docker and Kubernetes for deployment, a community-driven QA plan using pyDataverse for test automation, and providing quality assurance as a service. The presentation also covers topics like the CESSDA maturity model, integrating Dataverse on Google Cloud, and using serverless computing for some Dataverse applications and services.
Building COVID-19 Museum as Open Science Project - vty
This document discusses building a COVID-19 Museum as an open science project. It describes the speaker's background working on various data management projects. It discusses moving towards open science and sharing data according to FAIR principles. It outlines the Time Machine project for digitizing historical documents and its approach to data management. The rest of the document discusses using the Dataverse platform to build repositories, linking metadata to ontologies, using tools like Weblate for translations, and exploring the use of artificial intelligence and machine learning to enhance metadata and facilitate human-in-the-loop review processes.
External controlled vocabularies support in Dataverse - vty
This presentation discusses adding support for external controlled vocabularies to the Dataverse data repository platform. It describes how ontologies like SKOS can be used to represent vocabularies and allow linking metadata fields in Dataverse to terms. The presentation proposes developing a Semantic Gateway plugin for Dataverse that would allow browsing and linking to external vocabularies hosted in the SKOSMOS framework via its API. This could improve metadata by allowing standardized, linked terms and help make data more FAIR.
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse - vty
This presentation is about external CVs support in Dataverse, an open-source data repository. Data Archiving and Networked Services (DANS-KNAW) decided to use Dataverse as a basic technology to build Data Stations and provide FAIR data services for various Dutch research communities.
This document discusses the five-year evolution of Dataverse, an open source data repository platform. It began as a tool for collaborative data curation and sharing within research teams. Over time, features were added like dataset version control, APIs, and integration with other systems. The document outlines challenges around maintenance and sustainability. It also covers efforts to improve Dataverse's interoperability, such as integrating metadata standards and controlled vocabularies, and making datasets FAIR compliant. The goal is to establish Dataverse as a core component of the European Open Science Cloud by improving areas like software quality, integration with tools, and standardization.
Ontologies, controlled vocabularies and Dataverse - vty
Presentation on Semantic Web technologies for the Dataverse Metadata Working Group run by the Institute for Quantitative Social Science (IQSS) of Harvard University.
Dataverse can be deployed using Docker containers to improve maintainability and portability. The document discusses how Docker can isolate applications and their dependencies into portable containers. It provides an example of deploying Dataverse as a set of microservices within Docker containers. Instructions are included on building Docker images, running containers, and managing the containers and images through commands and tools like Docker Desktop, Docker Hub, and Docker Compose.
Technical integration of data repositories: status and challenges - vty
This document discusses technical integration of data repositories, including:
- Previous integration initiatives focused on metadata integration using the OAI-PMH and ResourceSync protocols, as well as aggregators like OpenAIRE (a minimal harvesting sketch follows this list).
- Challenges to integration include different levels of software/service maturity, maintenance of distributed applications, and use of common standards and vocabularies.
- Potential integration efforts could focus on improving FAIRness, metadata/data flexibility, and connections between repositories, software, and computing resources to better enable reuse of EOSC data and services.
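To make the OAI-PMH item above concrete, here is a minimal harvesting sketch over plain HTTP; the base URL is a hypothetical repository endpoint, while the verb, metadata prefix, and XML namespace follow the OAI-PMH 2.0 protocol.

```python
import requests
import xml.etree.ElementTree as ET

# Ask a repository for its records in Dublin Core via OAI-PMH.
BASE = "https://repository.example.org/oai"  # hypothetical endpoint
resp = requests.get(BASE, params={"verb": "ListRecords",
                                  "metadataPrefix": "oai_dc"})

# List the record identifiers from the response headers.
root = ET.fromstring(resp.content)
for ident in root.iter("{http://www.openarchives.org/OAI/2.0/}identifier"):
    print(ident.text)
```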
SSHOC Dataverse in the European Open Science Cloud - vty
This project summary covers the SSHOC project, which aims to create a social sciences and humanities section of the European Open Science Cloud by maximizing data reuse through open science principles. The project will interconnect existing and new infrastructures through a clustered cloud, establish governance for SSH-EOSC, and provide a research data repository service for SSH institutions through further developing the Dataverse platform on EOSC. The project involves 47 partners across 20 beneficiaries and 27 linked third parties with a budget of €14,455,594.08 over 40 months to achieve these objectives.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many features provide convenience and capability at the expense of security. This best-practices guide outlines steps users can take to better protect personal devices and information.
OpenID AuthZEN Interop Read Out - Authorization - David Brossard
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
How to Get CNIC Information System with Paksim Ga.pptx - danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
GraphRAG for Life Science to increase LLM accuracy - Tomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
UiPath Test Automation using UiPath Test Suite series, part 6 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf - Malak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications. A minimal query sketch follows below.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
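A hedged sketch of an Atlas Vector Search aggregation of the kind the deck covers; the connection string, index name, field, and query vector are placeholders, and the vector index is assumed to have been created in Atlas beforehand.

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:pass@cluster.example.mongodb.net")
coll = client["shop"]["products"]  # hypothetical database/collection

# $vectorSearch finds the nearest neighbors of the query vector; in
# practice queryVector would come from an embedding model.
pipeline = [{
    "$vectorSearch": {
        "index": "product_vectors",   # hypothetical Atlas index
        "path": "embedding",          # field holding stored vectors
        "queryVector": [0.12, -0.07, 0.33],
        "numCandidates": 100,
        "limit": 5,
    }
}]
for doc in coll.aggregate(pipeline):
    print(doc.get("name"))
```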
Monitoring and Managing Anomaly Detection on OpenShift.pdf - Tosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions (a minimal consumer sketch follows this list).
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
Van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Driving Business Innovation: Latest Generative AI Advancements & Success Story, by Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence, by IndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
TrustArc Webinar - 2024 Global Privacy Survey, by TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Generating privacy-protected synthetic data using Secludy and Milvus, by Zilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers, by akankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
2. Using Decentralized identifiers (DIDs) for any type of content
Source: Wikipedia
We are considering an experimental implementation of decentralized identifiers for controlled vocabularies, and an extension to other content types, in order to archive various kinds of content.
DIDs can be assigned to any artefact, including images, audio, and video: for example, to store and link metadata records and provenance information next to the digitized content.
3. DOI costs
The DataCite agency charges data providers a fee that depends on the number of identifiers, and it can become a significant amount starting from 1 million DOIs. What about DIDs?
4. Typical problems of “centralized” identifiers
Disambiguation and authorship issues:
● two authors with the same name are mentioned in different papers: how do you know who is who?
● it is very difficult to assign a paper to a specific person via ORCID without already knowing that they are the original author
● some people can make false (fraudulent) authorship claims
A centralized entity can be considered a single point of failure.
Typical questions:
● can an email address be considered an identifier?
● what to do when an email address changes because the domain name changes, and the identifier disappears or is no longer resolvable?
● how reliable is the ORCID database?
5. “Centralized” controlled vocabularies
The European Language Social Science Thesaurus (ELSST) is hosted in Skosmos by various data providers such as CESSDA and ODISSEI. CESSDA hosts an updated version with more language properties.
How do we handle versions of vocabularies, changes to concepts, and concept drift?
6. Decentralized identifiers as a possible solution
We envision a near future where it will be possible to create a decentralized system that does not depend on any specific registry, provider, or authority: all connections will be established in a peer-to-peer network, yet remain persistent at the same time.
The resolution of a global decentralized identifier (DID) should be cryptographically verifiable, to prove the identity and ownership of that identifier.
Core DID features are listed below:
1. A permanent (persistent) identifier (it never changes)
2. A resolvable identifier (you can look it up to discover metadata)
3. A cryptographically-verifiable identifier (with private and public keys)
4. A decentralized identifier (no centralized authority)
DIDs should bring control of all provenance and metadata back to their owners instead of giving it away. At the same time, the public part need not be very different from other persistent identifiers such as DOIs, and could even replace them for specific use cases such as sharing sensitive data.
7. The place of DIDs as a unified resource
Source: “Self-Sovereign Identity”. by Alex Preukschat, Drummond Reed
DIDs can be considered a “replacement” for domain names and DNS in the “centralized” network.
8. Example of DID with private and public key, and service endpoints
Service endpoints tell you exactly how to interact with the subject: which protocols and network endpoints are available to connect to, for example, an agent that represents the data subject, so that you can then exchange credentials or other messages.
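For illustration, a minimal DID Document can be sketched as follows (a Python sketch with a hypothetical did:example identifier and placeholder key material; the field names follow the W3C DID Core data model):

import json

# A minimal DID Document sketch (hypothetical values; W3C DID Core field names).
did_document = {
    "@context": "https://www.w3.org/ns/did/v1",
    "id": "did:example:123456789abcdefghi",
    # Public key material only; the private key never appears in the document.
    "verificationMethod": [{
        "id": "did:example:123456789abcdefghi#key-1",
        "type": "Ed25519VerificationKey2020",
        "controller": "did:example:123456789abcdefghi",
        "publicKeyMultibase": "z6Mk...",  # placeholder key, elided
    }],
    # Service endpoints describe how to interact with the DID subject.
    "service": [{
        "id": "did:example:123456789abcdefghi#agent",
        "type": "DIDCommMessaging",
        "serviceEndpoint": "https://agent.example.org/messages",
    }],
}

print(json.dumps(did_document, indent=2))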
10. DID URLs with parameters
Source: Decentralized identifiers (DIDs) fundamentals and deep dive, SSIMeetup
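Such parameters can be handled with ordinary URL tooling; a minimal Python sketch (the DID URL itself is hypothetical; versionId and service are parameters listed in the W3C DID Specification Registries):

from urllib.parse import urlsplit, parse_qs

# A hypothetical DID URL with query parameters and a fragment.
did_url = "did:example:123456789abcdefghi?versionId=4&service=agent#key-1"

parts = urlsplit(did_url)
print(parts.scheme)           # did
print(parts.path)             # example:123456789abcdefghi (method + method-specific id)
print(parse_qs(parts.query))  # {'versionId': ['4'], 'service': ['agent']}
print(parts.fragment)         # key-1 (points at a key inside the DID Document)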
11. “Decentralized” technology is not the same as “Blockchain” technology
“Blockchain is a digitally distributed database that is shared among nodes, which are computers in the blockchain network, that makes
it difficult or impossible to change, hack, or cheat the system”.
Blockchain parties:
- Holder (owner of the Verifiable Credential)
- Issuer (provides a credential to a holder and signs it with their private key)
- Verifier (checks the blockchain to ensure that the issued credential belongs to whom it was issued)
It is not necessary to use a blockchain to issue decentralized identifiers: about 100 DID methods are being developed by various companies and organizations around the world. They implement the same interface specification in different ways, with standardized input and output.
The OYDID method was developed in Vienna and provides a self-sustained environment for managing decentralized identifiers (DIDs). The did:oyd method links the identifier cryptographically to the DID Document and, through cryptographically linked provenance information in a public log, ensures resolution to the latest valid version of the DID Document.
12. Universal Resolver for DIDs
Try this! https://dev.uniresolver.io
curl https://dev.uniresolver.io/1.0/identifiers/did:oyd:zQmdQvLdpogfEf5EHK7778EM9xoxFMVFdJgRD7SdYRcCHeL
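The same lookup can be scripted; a minimal Python sketch (it assumes the resolver returns the usual Universal Resolver JSON envelope with a didDocument field):

import requests

# Resolve a DID through the public Universal Resolver instance.
did = "did:oyd:zQmdQvLdpogfEf5EHK7778EM9xoxFMVFdJgRD7SdYRcCHeL"
resp = requests.get(f"https://dev.uniresolver.io/1.0/identifiers/{did}", timeout=30)
resp.raise_for_status()

result = resp.json()
# The envelope field name is an assumption based on Universal Resolver conventions.
did_document = result.get("didDocument", result)
print(did_document.get("id"))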
13. OYDID methods explained
“OYDID (Own Your Decentralized IDentifier) takes the approach to not maintain DID and DID Document on a public ledger
but on one or more local storages (that usually are publicly available). Through cryptographically linking the DID identifier
to the DID Document, and furthermore linking the DID Document to a chained provenance trail, the same security and
validation properties as a traditional DID are maintained while avoiding highly redundant storage and general public access.”
(from OYDID docs)
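The core idea of linking the identifier cryptographically to the document can be sketched as content addressing: hash a canonical form of the DID Document and derive the identifier from the digest. The Python sketch below is illustrative only; the real did:oyd encoding uses multibase/multihash conventions rather than a plain hex digest.

import hashlib
import json

def content_address(did_document: dict) -> str:
    # Canonicalize: sorted keys and fixed separators give a stable byte string.
    canonical = json.dumps(did_document, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"did:oyd:{digest}"

doc = {"@context": "https://www.w3.org/ns/did/v1", "service": []}
print(content_address(doc))  # changing any byte of the document changes the DID

Because the identifier is derived from the document itself, any tampering with a stored DID Document becomes detectable at resolution time.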
14. DIDs for controlled vocabularies
A generic problem of CVs: most controlled vocabularies are published and distributed in an unsustainable way and often do not even have persistent identifiers resolving to their concepts.
Possible solution for CLARIAH FAIR vocabularies:
● assign a DID to every vocabulary concept and use the built-in “update” mechanism to keep all revisions in a chain of linked DIDs, each resolving to the archived version of every change (see the sketch after this list)
● metadata records can be linked in a distributed way to DID identifiers corresponding to a specific version of a concept preserved in the data ledger
● this approach is more sustainable by design and can be considered a step towards FAIR vocabularies; it should also yield high scores in FAIR assessment
● vocabulary management and updates stay in the hands of the vocabulary owner/creator; a separate private key will be generated for every concept and should be stored in a secure place
● extra properties and attributes could be added to the DID documents representing specific vocabulary concepts, such as provenance information containing the date of creation or modification, the authors, the name of the ontology, and relations to other ontologies. Concepts can even have their own labels.
● statistics on concept usage, linkages, relations, and other metrics will be available directly from the DID chains
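A minimal sketch of the revision-chain idea from the first bullet above (all names here are hypothetical; a real implementation would use the update mechanism of the chosen DID method and a real ledger instead of an in-memory dict):

from dataclasses import dataclass
from typing import Optional

@dataclass
class ConceptRevision:
    did: str                        # DID of this revision of the concept
    pref_label: str                 # SKOS prefLabel of the concept
    previous: Optional[str] = None  # DID of the previous revision, if any

ledger: dict[str, ConceptRevision] = {}  # stand-in for the distributed ledger

def publish(revision: ConceptRevision) -> None:
    ledger[revision.did] = revision

def history(did: Optional[str]) -> list[ConceptRevision]:
    # Walk the chain from a DID back to the original concept revision.
    chain = []
    while did is not None:
        revision = ledger[did]
        chain.append(revision)
        did = revision.previous
    return chain

publish(ConceptRevision("did:oyd:v1-example", "Unemployment"))
publish(ConceptRevision("did:oyd:v2-example", "Unemployment (revised)", previous="did:oyd:v1-example"))

for rev in history("did:oyd:v2-example"):
    print(rev.did, rev.pref_label)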
15. CoronaWhy Proof of Concept on DIDs
A Dataverse repository with information on the 2022 Monkeypox outbreak uses DIDs as persistent identifiers:
https://datasets.coronawhy.org
16. Graph Network Sustainability with DIDs
COVID-19 Museum Knowledge Graph. Q142 in Wikidata: France@en, Frankrijk@nl, Frankreich@de, Франція@uk, France@fr
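For illustration, the multilingual labels shown above can be fetched from the public Wikidata API (a minimal Python sketch using the standard wbgetentities action):

import requests

# Fetch multilingual labels for Q142 (France) from the public Wikidata API.
resp = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={
        "action": "wbgetentities",
        "ids": "Q142",
        "props": "labels",
        "languages": "en|nl|de|uk|fr",
        "format": "json",
    },
    timeout=30,
)
resp.raise_for_status()

labels = resp.json()["entities"]["Q142"]["labels"]
for lang, label in labels.items():
    print(f"{label['value']}@{lang}")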