Talk by Stefanie Gehrke at the workshop "TEI and Neighbouring Standards" at the DiXiT Convention Week 2015 (Huygens ING, The Hague, 15 September 2015).
CNI Fall 2009: Enhanced Publications (John Doove, SURFfoundation)
- SURF is an organization in the Netherlands that works to improve ICT infrastructure for higher education and research.
- SURF is working on projects to develop "enhanced publications" which combine traditional publications like text with additional materials like data, maps, images and annotations.
- Several projects have been funded to create enhanced publications in fields like archaeology and psychology. Challenges include presentation, identification, long-term preservation and developing tools and infrastructure to support enhanced publications.
- Moving forward, SURF will work on developing repository infrastructure to store and share enhanced publications, creating guidelines and incentivizing their creation through things like legal reports and reward systems.
This document describes two metadata enrichment micro-services developed for the LoCloud project: a background link micro-service and a vocabulary matching micro-service. The background link micro-service uses DBpedia Spotlight, a state-of-the-art tool for Named Entity Disambiguation, to automatically link cultural heritage items to Wikipedia pages. The vocabulary matching micro-service links items to relevant concepts from a provided vocabulary. Both micro-services are implemented as REST services and deployed on virtual machines. The services are used in enrichment workflows through the LoCloud Generic Enrichment Service.
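As a rough illustration of the kind of REST call a background-link micro-service makes: DBpedia Spotlight's public `/annotate` endpoint accepts `text` and `confidence` parameters and returns the spotted entities with their DBpedia/Wikipedia URIs. The sketch below builds such a request with the standard library; the endpoint URL and parameter choices reflect Spotlight's documented interface, not the actual LoCloud service code.

```python
from urllib.parse import urlencode

# Sketch only: endpoint and parameters follow DBpedia Spotlight's public
# REST interface; this is not the LoCloud micro-service's implementation.
SPOTLIGHT_ENDPOINT = "https://api.dbpedia-spotlight.org/en/annotate"

def build_annotate_url(text: str, confidence: float = 0.5) -> str:
    """Return the GET URL asking Spotlight to link named entities in `text`."""
    query = urlencode({"text": text, "confidence": confidence})
    return f"{SPOTLIGHT_ENDPOINT}?{query}"

url = build_annotate_url("Rembrandt painted The Night Watch in Amsterdam.")
# Fetching `url` with an "Accept: application/json" header would return the
# disambiguated entities and their DBpedia URIs (network access required):
# req = urllib.request.Request(url, headers={"Accept": "application/json"})
# entities = json.load(urllib.request.urlopen(req))
print(url)
```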
LoCloud - D2.1: Core Infrastructure Specifications (including Business Proces... (locloud)
The document introduces the core technical infrastructure specifications of the LoCloud project. The various technical aspects of the infrastructure are presented in detail, focusing on their advantages and limitations. The challenges of implementing this infrastructure are also described throughout this deliverable.
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the DANS EASY Research Data Repository (vty)
Presentation at ISKO Knowledge Organisation Research Observatory. RESEARCH REPOSITORIES AND DATAVERSE: NEGOTIATING METADATA, VOCABULARIES AND DOMAIN NEEDS
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the DANS EASY Research Data Repository (Andrea Scharnhorst)
Presentation given at ISKO UK: research observatory, November 24, 2021
RESEARCH REPOSITORIES AND DATAVERSE: NEGOTIATING METADATA, VOCABULARIES AND DOMAIN NEEDS
Vyacheslav Tykhonov, Jerry de Vries, Eko Indarto, Femmy Admiraal, Mike Priddy, and Andrea Scharnhorst: Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the DANS EASY Research Data Repository
Abstract:
The development of metadata schemes in data repositories (and other content providers) has always been a process of negotiation between the needs of the designated user communities and the content of the collection on the one side, and standards developed in the field on the other. Automation has both enabled and enforced the standardisation and alignment of metadata schemes (see as an example). But while designated user communities have turned from local users into global ones (due to web services), their specific needs have not vanished. Technology offers possibilities to give the aforementioned negotiation a new form. In this presentation, we present the Dataverse platform, used by many data repositories. Using the case of the CMDI metadata and the CLARIN (Common Language Resources and Technology Infrastructure) community, we show how Dataverse's common core set of metadata, called the Citation Block, can be extended with custom fields defined as a discipline-specific metadata block. In particular, we show how these custom fields can be connected to a distributed network of authoritative controlled vocabularies, so that in the end semantic search is possible. The presentation highlights opportunities and challenges, based on our own experiences. Related work has been presented at the CLARIN Annual Conference 2021 (see Proceedings).
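A minimal sketch of the core idea of connecting custom fields to controlled vocabularies: a field stores an authoritative concept URI rather than a free-text label, so records that use different labels for the same concept become comparable. The field name, vocabulary URL, and concept URIs below are invented for illustration; Dataverse's actual external-vocabulary configuration is more involved.

```python
# Illustrative sketch only: the field name, vocabulary service and concept
# URIs are invented, not Dataverse's real configuration. The point is that a
# custom metadata field stores a concept URI, so records tagged with
# different labels for the same concept match in (semantic) search.
VOCABULARY = {
    # label -> SKOS concept URI (hypothetical vocabulary service)
    "dutch": "https://vocab.example.org/lang/nld",
    "nederlands": "https://vocab.example.org/lang/nld",
    "german": "https://vocab.example.org/lang/deu",
}

def normalise_field(value: str) -> str:
    """Map a free-text field value to its concept URI, if known."""
    return VOCABULARY.get(value.strip().lower(), value)

record_a = {"cmdi:language": normalise_field("Dutch")}
record_b = {"cmdi:language": normalise_field("Nederlands")}
# Both records now carry the same URI, which is what makes concept-based
# (rather than string-based) search possible.
assert record_a["cmdi:language"] == record_b["cmdi:language"]
```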
SEMANTIC WEB SOURCES – comparison of open-source Knowledge Graphs (Matteo Belcao)
A theoretical and practical comparison of the currently most used open-source Knowledge Graphs: DBpedia, Wikidata, and YAGO.
A practical explanation of how to query each Knowledge Graph with SPARQL and the sandboxes.
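For a concrete flavour of such queries: each graph exposes a public SPARQL endpoint that accepts a `query` parameter over HTTP. The example below, assembled in Python so it can be sent with any HTTP client, is a generic DBpedia lookup written for this summary, not a query taken from the presentation; Wikidata's endpoint (https://query.wikidata.org/sparql) accepts the same parameter but uses its own `wd:`/`wdt:` vocabulary.

```python
from urllib.parse import urlencode

# Generic example query against DBpedia's public endpoint: fetch the English
# abstract of a resource. (dbo: is among the prefixes DBpedia predefines.)
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"

QUERY = """
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Amsterdam> dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
"""

def build_sparql_url(endpoint: str, query: str) -> str:
    """Return a GET URL requesting JSON-formatted results for `query`."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})

url = build_sparql_url(DBPEDIA_SPARQL, QUERY)
# urllib.request.urlopen(url) would return SPARQL JSON results (needs network).
print(url)
```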
BESOCIAL: A Knowledge Graph for Social Media Archiving (Sven Lieber)
The presentation of our paper "BESOCIAL: A Sustainable Knowledge Graph-based Workflow for Social Media Archiving", presented at the SEMANTiCS EU conference 2021 in Amsterdam.
Joint work with Dylan Van Assche, Sally Chambers, Fien Messens, Friedel Geeraert, Julie M. Birkholz and Anastasia Dimou.
The related video is available online at https://youtu.be/oYmzD3e8rBE?t=1912
For the Biodiversity Informatics workshop in Stockholm, Friday, September 13. Describing some of the tools in mx, a collaborative web-based content management system for evolutionary systematists, particularly those working on descriptive taxonomy.
Yoder, M.J., Dole, K., Seltmann, K., and Deans, A. 2006-Present. Mx, a collaborative web-based content management system for biological systematists.
Linked Open Data in Cultural Heritage: Some Projects and Case Studies (CulturaItalia)
Maria Emilia Masci, Scuola Normale Superiore. Linked Open Data (LOD): An Opportunity for Digital Cultural Heritage, Rome, ICCU, 29 November 2013.
The document discusses the Research and Education Space (RES) project, which aims to create a web-based platform called Acropolis that aggregates and interconnects cultural heritage resources from various institutions like the British Library, British Museum, BBC archive, and others. It describes Acropolis' technical approach of using crawlers, indexes, and APIs to make these resources searchable. It also outlines challenges around standardizing heterogeneous metadata, reliably linking entities, and usability issues regarding tools, licensing, and stakeholder engagement. The author is looking to provide guidance on publishing cultural data as linked open data to help address these challenges.
This document summarizes a workshop on metadata and digital libraries. It discusses the objectives of library systems and how they impact metadata. The workshop introduces Dublin Core metadata and examines how functional requirements inform system design and metadata decisions. Participants analyze sample metadata and use cases to understand these concepts. The summary highlights that system objectives guide metadata, and functional requirements defined through use cases specify required system behaviors and metadata.
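To make the Dublin Core part of the workshop concrete: a minimal record simply attaches values to elements from the fifteen-element `dc:` vocabulary (title, creator, date, and so on). The sketch below builds and serializes such a record with Python's standard library; the title and creator values are invented examples, not taken from the workshop.

```python
import xml.etree.ElementTree as ET

# The Dublin Core elements namespace (dc:title, dc:creator, ...).
DC = "http://purl.org/dc/elements/1.1/"

def make_dc_record(fields: dict) -> ET.Element:
    """Build a simple Dublin Core record: one namespaced child per field."""
    ET.register_namespace("dc", DC)
    record = ET.Element("record")
    for name, value in fields.items():
        child = ET.SubElement(record, f"{{{DC}}}{name}")
        child.text = value
    return record

# Invented example values:
record = make_dc_record({
    "title": "Metadata and Digital Libraries Workshop",
    "creator": "Example Author",
    "date": "2009",
})
print(ET.tostring(record, encoding="unicode"))
```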
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse (vty)
This presentation is about support for external controlled vocabularies in Dataverse, an open-source data repository. Data Archiving and Networked Services (DANS-KNAW) decided to use Dataverse as the basic technology to build Data Stations and provide FAIR data services for various Dutch research communities.
Infrastructure for Open Science in Europe - OpenAIRE: The Power of Repositories (Pedro Príncipe)
This document discusses the power of repositories as infrastructure for open science. It notes that individual repositories have value for their institutions, but that their true value lies in their potential for interconnection to create a unified network providing access to research results. This network requires open access content and interoperability between repositories. OpenAIRE is presented as working to realize this potential through services that support content enrichment, notifications to repositories of relevant research, and usage statistics. Funders are also integrating with OpenAIRE to help monitor open access compliance and the impact of research funding.
UNED Online Reputation Monitoring Team at RepLab 2013 (Damiano Spina)
This document summarizes the UNED NLP & IR Group's participation in the RepLab 2013 evaluation forum. It describes their approaches to the four subtasks of the reputation monitoring task: filtering, polarity classification, topic detection, and topic priority. For filtering, they found that using entity-specific training data improved performance. Their domain-adaptive affective lexicon approach was less competitive for polarity classification compared to other RepLab submissions. For topic detection, three of their approaches performed competitively with other RepLab submissions. Integrating different signals like polarity, novelty and centrality to determine topic priority remains a challenge.
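As a toy illustration of the lexicon-based idea behind polarity classification (not the UNED system itself; the words and weights below are invented), an affective lexicon reduces classification to summing per-word scores:

```python
# Toy affective lexicon: word -> polarity weight. Invented values, only to
# illustrate the general lexicon-based approach, not UNED's actual system.
LEXICON = {"great": 1.0, "love": 1.0, "poor": -1.0, "scandal": -1.5}

def polarity(text: str) -> str:
    """Classify text as positive/negative/neutral by summed word scores."""
    score = sum(
        LEXICON.get(word.strip(".,!?"), 0.0)
        for word in text.lower().split()
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("I love this brand, great service"))  # positive
print(polarity("Another scandal, poor handling"))    # negative
```

Domain adaptation, as described in the summary, would then amount to re-weighting or extending the lexicon with entity-specific training data.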
From data portal to knowledge portal: Leveraging semantic technologies to sup... (Xiaogang (Marshall) Ma)
The document discusses the Deep Carbon Virtual Observatory (DCvO), which aims to provide integrated access to scientific data and information related to deep carbon science. It describes the DCvO's architecture, which leverages semantic technologies and ontologies to link different types of data and entities. The DCvO provides visualization tools to discover information by clicking through linked concepts, facilitates potential collaborations, and acts as an integrated portal for diverse scientific content. It also discusses some of the DCvO's boundary activities, like linking datasets to publications and integrating with external data repositories.
This document discusses developing a shared vocabulary for systems engineering by migrating content from documents to a wiki environment. It addresses challenges like justifying an open approach, determining the best structure for the wiki, and establishing governance models for multi-author teams. The paper proposes an integrated architecture using techniques like text extraction, morphological analysis, and integrating different dimensions to develop and manage the shared vocabulary in a wiki.
The document discusses developing a shared vocabulary for systems engineering by migrating content from documents to a wiki environment. It addresses challenges around justifying an open approach, determining the best structure and governance model, and proposes extracting terms from text and using morphological analysis to represent concepts and their relationships in an integrated way. Future work would evaluate automated approaches against existing content and develop integrated metrics.
Semantic Tagging for the XWiki Platform with Zemanta and DBpedia (Elena-Oana Tabaranu)
Tags are a very efficient method of describing information with metadata. Adding semantic information to the keywords allows computers to comprehend what the pages are saying and to use that knowledge to offer better service to humans when interacting with them. The tagging extension for the XWiki Platform links the user-defined keywords with semantic information from the DBpedia knowledge base.
Building COVID-19 Museum as Open Science Project (vty)
This document discusses building a COVID-19 Museum as an open science project. It describes the speaker's background working on various data management projects. It discusses moving towards open science and sharing data according to FAIR principles. It outlines the Time Machine project for digitizing historical documents and its approach to data management. The rest of the document discusses using the Dataverse platform to build repositories, linking metadata to ontologies, using tools like Weblate for translations, and exploring the use of artificial intelligence and machine learning to enhance metadata and facilitate human-in-the-loop review processes.
The slideset used to conduct an introduction/tutorial on DBpedia use cases, concepts and implementation aspects, held during the DBpedia community meeting in Dublin on the 9th of February 2015. (Slide creators: M. Ackermann, M. Freudenberg; additional presenter: Ali Ismayilov.)
Seminar: OAIS Model application in digital preservation projects (Michael Day)
Presentation slides from a seminar on the Reference Model for an Open Archival Information System (OAIS) at La preservación del patrimonio digital: conceptos básicos y principales iniciativas, Ministerio de Cultura, Madrid, Spain, March 15th, 2006
- The document discusses the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) standard which allows interoperability between digital archives and repositories.
- It describes key aspects of the OAI-PMH standard including verbs, identifiers, sets, data and service providers, and harvesting metadata from multiple sources.
- The document also provides an example of implementing OAI-PMH through the CulturaItalia project in Italy which aggregates metadata about artworks in Tuscany from different source repositories.
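The protocol's verbs are plain HTTP GET requests against a repository's base URL; for instance, a harvester's ListRecords call can be built as below. The base URL and set name are made-up examples, but `verb`, `metadataPrefix`, `set`, and `resumptionToken` are standard OAI-PMH request arguments.

```python
from urllib.parse import urlencode

# Hypothetical repository base URL; parameter names are standard OAI-PMH.
BASE_URL = "https://repository.example.org/oai"

def list_records_url(metadata_prefix="oai_dc", set_spec=None, token=None):
    """Build a ListRecords request; a resumptionToken continues a harvest."""
    if token is not None:
        # Per the protocol, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return BASE_URL + "?" + urlencode(params)

print(list_records_url(set_spec="tuscany_artworks"))  # first page of a set
print(list_records_url(token="page2"))                # continue the harvest
```

An aggregator like CulturaItalia issues such requests against each source repository and merges the returned metadata records.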
One Standard to rule them all?: Descriptive Choices for Open Education (R. John Robertson)
R. John Robertson (1), Lorna Campbell (1), Phil Barker (2), Li Yuan (3), and Sheila MacNeill (1)
(1) Centre for Academic Practice and Learning Enhancement, University of Strathclyde; (2) Institute for Computer Based Learning, Heriot-Watt University; (3) Institute for Cybernetic Education, University of Bolton
Drawing on our experience of supporting a nationwide Open Educational Resources programme (the UKOER programme), this presentation will consider the diverse range of approaches to describing OERs that have emerged across the programme and their impact on resource sharing, workflows, and an aggregate view of the resources.
Due to the diverse nature of the projects in the programme, ranging from individual educators to discipline-based consortia and institutions, it was apparent that no one technical or descriptive solution would fit all. Consequently, projects were mandated to supply only a limited amount of descriptive information (programme tag, author, title, date, URL, file format, file size, rights), with some additional information suggested (language, subject classifications, keywords, tags, comments, description). Projects were free to choose how this information should be encoded (if at all), stored, and shared.
In response, the projects have taken many different approaches to the description and management of resources. These range from using traditional, highly structured and detailed metadata standards to approaches using whatever descriptions are supported by particular Web 2.0 applications. This experimental approach to resource description offers the wider OER community an opportunity to examine and assess the implications of different strategies for resource description and management.
This paper illustrates a number of examples of projects' approaches to description, noting the workflows and effort involved. We will consider the relationship of the choice of tool (repository, Web 2.0 application, VLE) to the choice of standards, and the relationship between local requirements and those of the wider community.
We will consider the impact of those choices on the dissemination and discoverability of resources. For example, the implications of resource description choices for discovery services which draw on multiple sources of OERs.
Similar to TellMeFirst - A knowledge domain discovery framework
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai... (Kaxil Naik)
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data (Kiwi Creative)
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in today's world, where data privacy and compliance are top priorities for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) they are auto-generated from declarative data annotations; (2) they respect user-level consent and preferences; (3) they are context-aware, encoding a different set of transformations for different use cases; and (4) they are portable: while the SQL logic is implemented in only one SQL dialect, it is accessible from all engines.
#SQL #Views #Privacy #Compliance #DataLake
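The "auto-generated from declarative data annotations" property can be illustrated with a toy generator that turns column-level annotations into a masking view. The annotation format, column names, and `_compliant` suffix are all invented for illustration; LinkedIn's actual ViewShift annotations and transformations are richer:

```python
# Toy declarative annotations: column -> policy (hypothetical format).
annotations = {
    "member_id": "keep",
    "email": "redact",
    "country": "keep",
}

def compliance_view_sql(table, annotations):
    """Generate a SQL view that masks columns annotated 'redact'."""
    cols = []
    for col, policy in annotations.items():
        if policy == "redact":
            cols.append(f"NULL AS {col}")  # mask the sensitive value
        else:
            cols.append(col)
    return (f"CREATE VIEW {table}_compliant AS "
            f"SELECT {', '.join(cols)} FROM {table}")

sql = compliance_view_sql("profiles", annotations)
```

Because the view is generated rather than hand-written, a catalog can transparently route a query against `profiles` to `profiles_compliant` per the slides' description.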
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens" (sameer shah)
Embark on a captivating financial journey with Financial Odyssey, our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance.
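Ratio analysis of the kind mentioned above reduces to simple arithmetic over financial-statement line items. A small illustrative sketch for two companies (all figures invented, not taken from the project):

```python
# Invented balance-sheet / income-statement figures for two companies.
companies = {
    "A": {"current_assets": 500, "current_liabilities": 250,
          "net_income": 80, "revenue": 1000},
    "B": {"current_assets": 300, "current_liabilities": 300,
          "net_income": 30, "revenue": 600},
}

def ratios(figures):
    """Compute two classic financial ratios from raw figures."""
    return {
        # liquidity: ability to cover short-term obligations
        "current_ratio": figures["current_assets"] / figures["current_liabilities"],
        # profitability: net income per unit of revenue
        "net_margin": figures["net_income"] / figures["revenue"],
    }

report = {name: ratios(f) for name, f in companies.items()}
```

Trend analysis then repeats the same computation across several periods and compares the series.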
TellMeFirst - A knowledge domain discovery framework
1. TellMeFirst
Giuseppe Futia¹, Antonio Vetrò¹, Giuseppe Rizzo²
A Knowledge Domain Discovery Framework
THE HAGUE, NETHERLANDS – Feb 12th 2016
¹ Nexa Center for Internet and Society, DAUIN, Politecnico di Torino
² Istituto Superiore Mario Boella (ISMB)
2. Nexa Center for Internet & Society
Interdisciplinary Research
Digital Culture
Support to Policy
Community
http://nexa.polito.it/
@nexacenter
3. Agenda
What is TellMeFirst and how it works
How we build a generalist training set based on DBpedia and Wikipedia
What is a domain training set (with respect to the generalist one)
How we create a domain training set using a configurable pipeline
8. TellMeFirst Classifier
TellMeFirst exploits an approach in which the training set, based on DBpedia and Wikipedia, is compared with the target document.
In the training set, each DBpedia entity (e.g., Barack Obama) is represented by all the Wikipedia paragraphs in which it appears as a wikilink (http://en.wikipedia.org/wiki/Barack_Obama).
A vector distance metric is used to measure how similar a Wikipedia paragraph is to the target document (Mendes, 2011).
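The vector distance idea can be illustrated with plain term-frequency vectors and cosine similarity; this is a common choice for this kind of comparison, though the slide does not name the exact metric TellMeFirst uses:

```python
import math
from collections import Counter

def cosine(text_a, text_b):
    """Cosine similarity between term-frequency vectors of two texts."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values())) *
            math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

# A paragraph representing an entity vs. a target document (toy texts).
paragraph = "Barack Obama served as president of the United States"
target = "The United States president Barack Obama gave a speech"
score = cosine(paragraph, target)
```

Real systems weight terms (e.g., TF-IDF) rather than using raw counts, but the geometry is the same.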
9. How we build a generalist training set based on DBpedia and Wikipedia
11. Traditional approach (based on DBpedia Spotlight)
The DBpedia Extractor
It takes as input datasets built through the DBpedia Information Extraction Framework (such as labels, redirects, and disambiguations). The output is a list of "good" URIs that effectively represent entities (avoiding disambiguation and redirect pages).
The DBpedia/Wikipedia Mapper
It maps the "good" URIs onto the Wikipedia dump and then creates a Lucene index that defines the training set.
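The extractor's filtering step can be sketched as a set operation over the DBpedia datasets. The URIs below are toy stand-ins; the real framework processes full dump files:

```python
# Toy excerpts of the DBpedia datasets (the real input is full dumps).
labels = {
    "dbpedia:Barack_Obama",
    "dbpedia:Obama",    # a redirect page
    "dbpedia:Mercury",  # a disambiguation page
    "dbpedia:Colosseum",
}
redirects = {"dbpedia:Obama"}
disambiguations = {"dbpedia:Mercury"}

def good_uris(labels, redirects, disambiguations):
    """Keep only URIs that effectively represent entities."""
    return sorted(labels - redirects - disambiguations)

uris = good_uris(labels, redirects, disambiguations)
```

Each surviving URI is then mapped onto its Wikipedia paragraphs and indexed (in TellMeFirst, via Lucene).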
14. Domain training set
It contains a subset of the DBpedia entities indexed in the generalist training set.
It is defined according to the domain of the documents that you need to classify.
It is built through a software component driven by SPARQL queries and advanced services (e.g., Linked Data Recommenders) to create a new list of "good" URIs.
15. How we build a domain training set using a configurable pipeline
21. Domain Engine - LDR
First implementation: the Linked Data Recommender (LDR), developed by the SoftEng group of the Politecnico di Torino.
Get all the DBpedia categories of a DBpedia entity.
Get the DBpedia entities related to a specific DBpedia entity and a DBpedia category.
Pipeline: get new entities with the LDR from the resources retrieved with SPARQL queries.
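The two LDR operations and the pipeline around them can be sketched with mocked lookups. The lookup tables below are hypothetical stand-ins for the LDR's live DBpedia queries:

```python
# Hypothetical stand-ins for the LDR's DBpedia lookups.
CATEGORIES = {"dbpedia:Colosseum": ["Category:Ancient_Rome"]}
RELATED = {
    ("dbpedia:Colosseum", "Category:Ancient_Rome"):
        ["dbpedia:Roman_Forum", "dbpedia:Arch_of_Titus"],
}

def expand(seed_entities):
    """Expand seed entities via their categories, as the pipeline does:
    seeds come from SPARQL queries, new entities from the LDR."""
    found = set(seed_entities)
    for entity in seed_entities:
        for category in CATEGORIES.get(entity, []):
            found.update(RELATED.get((entity, category), []))
    return sorted(found)

domain_entities = expand(["dbpedia:Colosseum"])
```

The resulting list of "good" domain URIs then feeds the indexing step that builds the domain training set.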
22. Example - Colosseum
The Colosseum or Coliseum (/kɒləˈsiːəm/ kol-ə-see-əm), also known as the Flavian Amphitheatre (Latin: Amphitheatrum Flavium; Italian: Anfiteatro Flavio [amfiteˈaːtro ˈflaːvjo] or Colosseo [kolosˈsɛːo]), is an oval amphitheatre in the centre of the city of Rome, Italy. Built of concrete and sand, it is the largest amphitheatre ever built and is considered one of the greatest works of architecture and engineering ever.
The Colosseum is situated just east of the Roman Forum. Construction began under the emperor Vespasian in 72 AD, and was completed in 80 AD under his successor and heir Titus. Further modifications were made during the reign of Domitian (81–96). These three emperors are known as the Flavian dynasty, and the amphitheatre was named in Latin for its association with their family name (Flavius).
25. Comparison of results (i)
Titus, Vespasian, and Domitian are identified through the generalist training set and are directly mentioned in the text.
Arch of Titus and Temple of Vespasian and Titus, obtained with the domain training set, are related to the emperors mentioned in the previous point, but refer to the cultural heritage of Ancient Rome.
26. Comparison of results (ii)
The Flavian dynasty and Flavia entities are mentioned in the text, but they are not particularly relevant to the cultural heritage domain.
The Great Fire of Rome is not strictly related to the entities mentioned in the text, but it is relevant from a historical point of view.
27. Wrap up
TellMeFirst is a tool for classifying and enriching textual documents using a training set based on DBpedia and Wikipedia.
We can build a training set for TellMeFirst with a configurable pipeline that selects a subset of all DBpedia entities.
By driving this configurable pipeline, we can create a training set for a specific knowledge domain (such as cultural heritage).
28. Future developments
Define a training set for classifying scientific publications available in Open Access.
Build a GUI that enables domain experts to create a domain training set without specific knowledge of Linked Data frameworks.
We are open to collaborations on TellMeFirst!
29. Acknowledgments
● Joint Open Lab of Telecom Italia (http://www.telecomitalia.com/tit/it/innovazione/i-luoghi-della-ricerca/joint-open-labs.html)
● Software Engineering Research Group (DAUIN), Politecnico di Torino (http://softeng.polito.it/)
30. Contacts
• Giuseppe Futia
– Mail: giuseppe.futia@polito.it
– Twitter: @giuseppe_futia
• Antonio Vetrò
– Mail: antonio.vetro@polito.it
– Twitter: @phisaz
• Giuseppe Rizzo
– Mail: giuseppe.rizzo@ismb.it
– Twitter: @giusepperizzo