How to create a taxonomy using a paid workforce provided by Amazon Mechanical Turk, with an evaluative comparison to an existing community of motivated students and domain experts.
Presentation held at JCDL 2010, Brisbane, Australia (http://www.jcdl2010.org).
Arts & Sciences New Student Enrollment Presentation (Mike O'Connor)
The document summarizes information presented at a College of Arts and Sciences Parent Session. It discusses the value of an Arts and Sciences degree, including the diverse majors and pre-professional programs offered. It also outlines opportunities for undergraduate research, study abroad, and student organizations. Components of the degree plan and advising services are reviewed. Student quotes provide personal perspectives on their experiences.
The document summarizes an information session for parents about the College of Arts and Sciences at UNL. It discusses the value of an Arts and Sciences degree, opportunities for undergraduate research and study abroad, requirements for graduation, and resources for academic advising. Student quotes are included that highlight their experiences in different majors and activities. Contact information is provided for the Assistant and Associate Deans to answer any additional questions.
The survey of 3,352 UNL students found that most students believe their professors treat students fairly regardless of political beliefs. A majority supported UNL joining the Big Ten for both academic and athletic reasons. While most students supported renovations to the campus recreation center, opinions were split on whether UNL should become a "wet campus" that allows alcohol. Students held a variety of views on national political issues, leaders, and 2012 elections.
Linked Open Projects (DCMI Library Community) (Kai Eckert)
This document discusses making data from research projects more reusable by publishing it as linked open data. It describes several existing research projects that have generated datasets and outlines how publishing their data as linked open data using standards like RDF and SPARQL could make the data more accessible and reusable. This would allow the datasets to be more easily combined and integrated. The document then presents the Linked Data Service developed by Mannheim University Library as a way to publish this project data as linked open data and provides examples of how the data could be queried and reused through this service.
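The integration benefit described above comes from shared URIs: once two projects describe the same resource with the same identifier, their statements can be merged mechanically. A minimal sketch in Python, with made-up URIs and fields:

```python
# Toy illustration of why shared URIs make project datasets easy to combine,
# as the summary describes. The URIs and fields are invented for the example.

project_a = {"person:schmidt": {"name": "A. Schmidt"}}
project_b = {"person:schmidt": {"affiliation": "Mannheim"}}

def merge(*datasets):
    """Integrate records from several datasets keyed by shared URIs."""
    merged = {}
    for data in datasets:
        for uri, fields in data.items():
            merged.setdefault(uri, {}).update(fields)
    return merged

# One record now combines both projects' statements about the same URI.
print(merge(project_a, project_b))
```

A real linked-data service would do this at the level of RDF triples and SPARQL, but the principle is the same: agreement on identifiers replaces ad-hoc record matching.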
Hannes Svardal - The role of environmental variance as adaptive response to f... (Seminaire MEE)
The document discusses whether fluctuating selection favors an increase in environmental or genetic variance in quantitative traits. It presents a model where the optimal phenotype varies temporally. The model finds that environmental variance (noise) will evolve to its optimal level, while additional genetic polymorphisms are only selected if the distribution of optimal phenotypes is sufficiently asymmetric or has fatter tails than a Gaussian. Simulation results support that genetic polymorphisms are often unstable, while divergence in both the genetic component and environmental variance can occur if conditions allow genetic branching. In conclusion, fluctuating selection more easily favors increased environmental rather than genetic variance, unless the optimal phenotype distribution meets certain criteria.
François Rousset - présentation MEE2013 (Seminaire MEE)
This document discusses methods for understanding social evolution, particularly in spatially structured populations. It focuses on relatedness concepts under localized dispersal, stability of kin recognition polymorphisms, and the relationship between inclusive fitness and evolutionary stability. The document seeks to reduce these problems to more basic elements in order to build an understanding of social evolution forces. It presents concepts and methods for analyzing frequency-dependent selection and population structure, including the effects of fitness costs and benefits among relatives.
Towards Interoperable Metadata Provenance (Kai Eckert)
The document discusses metadata provenance and proposes a model for tracking the provenance of metadata using named graphs and semantic web technologies. Key elements of the proposed model include using named graphs to represent different metadata sources, tracking the source and confidence value of metadata using semantic triples, and executing SPARQL queries to retrieve metadata based on provenance information like source or confidence value.
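The named-graph idea in this summary can be sketched in a few lines: each statement is stored as a quad whose graph carries provenance such as source and confidence, and queries filter on that provenance. All identifiers below are illustrative, not taken from the paper:

```python
# Sketch of the named-graph provenance model described above. Quads are
# (subject, predicate, object, graph); provenance attaches to the graph.

quads = [
    ("book:1", "dc:subject", "Physics", "graph:catalogA"),
    ("book:1", "dc:subject", "Chemistry", "graph:autoIndexer"),
]

# Provenance recorded per named graph (source and confidence, as in the model).
provenance = {
    "graph:catalogA": {"source": "manual", "confidence": 1.0},
    "graph:autoIndexer": {"source": "automatic", "confidence": 0.6},
}

def statements(min_confidence=0.0, source=None):
    """Return statements whose graph-level provenance matches the filters."""
    result = []
    for s, p, o, g in quads:
        prov = provenance[g]
        if prov["confidence"] < min_confidence:
            continue
        if source is not None and prov["source"] != source:
            continue
        result.append((s, p, o))
    return result

# Only the manually curated statement survives a confidence threshold of 0.8.
print(statements(min_confidence=0.8))
```

In the actual proposal these filters would be SPARQL queries over named graphs rather than Python list comprehensions, but the retrieval pattern is the same.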
LOHAI: Providing a baseline for KOS based automatic indexing (Kai Eckert)
Automatic KOS-based indexing – i.e., indexing based on a restricted, controlled vocabulary, a thesaurus, or a classification – can play an important role in closing the gap between intellectually indexed, high-quality publications and the mass of unindexed publications. Especially for unknown, heterogeneous publications, such as web publications, simple processes that do not rely on manually created training data are needed. With this contribution, we propose a straightforward linguistic indexer that can be used as a basis for one's own developments and for experiments and analyses to explore one's own documents and KOSs; it uses state-of-the-art information retrieval techniques and hence forms a suitable baseline for evaluations. Finally, it is free and open source.
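The vocabulary-restricted indexing that LOHAI provides a baseline for can be illustrated with a toy matcher. The thesaurus and concept IDs below are invented, and LOHAI itself applies more sophisticated linguistic processing; this only shows the basic idea of indexing against a controlled vocabulary:

```python
# Minimal sketch of KOS-based indexing: match controlled-vocabulary terms
# (a tiny invented thesaurus) against document text. Real indexers also
# handle inflection, compounds, and term weighting.
import re

thesaurus = {
    "information retrieval": "KOS:IR",
    "indexing": "KOS:INDEX",
    "thesaurus": "KOS:THES",
}

def index_document(text):
    """Return the concept IDs of all thesaurus terms found in the text."""
    found = set()
    lowered = text.lower()
    for term, concept in thesaurus.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(concept)
    return found

print(index_document("Automatic indexing with a thesaurus."))
```

Because the output is restricted to the KOS, no training data is needed, which is exactly the property the abstract highlights for heterogeneous web publications.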
Guidance, Please! Towards a Framework for RDF-based Constraint Languages (Kai Eckert)
Presentation held at the DCMI Conference 2015 in Sao Paulo.
http://dcevents.dublincore.org/IntConf/dc-2015/paper/view/386
In the context of the DCMI RDF Application Profile task group and the W3C Data Shapes Working Group, solutions for the proper formulation of constraints and for the validation of RDF data against these constraints are being developed. Several approaches and constraint languages exist, but there is no clear favorite, and none of the languages is able to meet all requirements raised by data practitioners. To support the work, a comprehensive, community-driven database has been created in which case studies, use cases, requirements, and solutions are collected. Based on this database, we have so far published 81 types of constraints that are required by various stakeholders for data applications. We are using this collection of constraint types to gain a better understanding of the expressiveness of existing solutions and of the gaps that still need to be filled. Regarding the implementation of constraint languages, we have already proposed using high-level languages to describe the constraints but mapping them to SPARQL queries in order to execute the actual validation; we have demonstrated this approach for the Web Ontology Language in its current version 2 (OWL 2) and for Description Set Profiles (DSP). In this paper, we generalize from the experience of implementing OWL 2 and DSP by introducing an abstraction layer that is able to describe constraints of any constraint type in such a way that mappings from high-level constraint languages to this intermediate representation can be created straightforwardly. We demonstrate that using another layer on top of SPARQL helps to implement validation consistently across constraint languages, simplifies the actual implementation of new languages, and supports the transformation of semantically equivalent constraints across constraint languages.
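The layering described in the abstract (a constraint expressed once in an intermediate representation, then mapped to SPARQL for execution) can be sketched for a single, simplified constraint type, minimum cardinality. The query template below is an illustration, not the paper's actual abstraction layer:

```python
# Sketch of mapping one constraint type to a violation-finding SPARQL query.
# The function plays the role of the intermediate layer: any high-level
# language that can express "class C needs at least N values of property P"
# could target it. Template and names are simplified for illustration.

def min_cardinality_to_sparql(cls, prop, minimum):
    """Map a 'minimum cardinality' constraint to a SPARQL query whose
    results are the resources violating the constraint."""
    return (
        f"SELECT ?resource WHERE {{\n"
        f"  ?resource a <{cls}> .\n"
        f"  OPTIONAL {{ ?resource <{prop}> ?value . }}\n"
        f"}}\n"
        f"GROUP BY ?resource\n"
        f"HAVING (COUNT(?value) < {minimum})"
    )

print(min_cardinality_to_sparql(
    "http://xmlns.com/foaf/0.1/Person",
    "http://xmlns.com/foaf/0.1/name",
    1,
))
```

The payoff of such a layer, as the abstract argues, is that each constraint language only needs a mapping to the intermediate form, while the SPARQL generation is implemented once.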
A Unified Approach for Representing Metametadata (Kai Eckert)
This document proposes a unified approach for representing metametadata, or statements about statements, using RDF reification. It discusses two scenarios where metametadata is needed: crosswalks between metadata schemas, and integrating metadata from different sources. For each scenario, examples are given of how metametadata could provide additional context about the origin and generation of statements to help debug errors, update rules, and assess statement quality. The approach uses RDF and SPARQL to represent and query this metametadata.
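Standard RDF reification, as mentioned in the summary, turns one triple into four triples about a statement resource, to which metametadata can then be attached. A minimal sketch with invented identifiers:

```python
# Illustration of RDF reification: the statement (s, p, o) becomes four
# triples describing a statement resource, and metametadata (here an
# invented "generatedBy" note, as one might record a crosswalk rule)
# is attached to that resource.

def reify(statement_id, s, p, o, **metametadata):
    """Expand (s, p, o) into reification triples plus metametadata triples."""
    triples = [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    ]
    for key, value in metametadata.items():
        triples.append((statement_id, key, value))
    return triples

triples = reify("stmt:1", "book:1", "dc:creator", "Doe, J.",
                generatedBy="crosswalk-rule-7")
print(len(triples))  # 5
```

Recording which crosswalk rule produced a statement is what enables the debugging and quality-assessment uses the summary lists: a bad mapping can be traced back to its rule and corrected.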
Specialising the EDM for Digitised Manuscript (SWIB13) (Kai Eckert)
The DM2E project developed a data model to standardize metadata for digitized manuscripts. It specialized the Europeana Data Model (EDM) by adding over 50 new properties and 23 classes to better represent physical and conceptual aspects of manuscripts. The DM2E model was documented in PDF and OWL formats and made available online for humans and machines. Future work includes addressing uncertain statements about timespans and creators.
The Metadata Provenance Task Group aims to define a data model that allows for making assertions about description sets. Creating a shared model of the data elements required to describe an aggregation of metadata statements makes it possible to collectively import, access, use, and publish facts about the quality, rights, timeliness, data source type, trust situation, etc. of the described statements. In this paper we describe the preliminary model created by the task group, together with first examples that demonstrate how the model is to be used.
Metadata Provenance Tutorial at SWIB 13, Part 1 (Kai Eckert)
The slides of part one of the Metadata Provenance Tutorial (Linked Data Provenance). Part 2 is here: http://de.slideshare.net/MagnusPfeffer/metadata-provenance-tutorial-part-2-modelling-provenance-in-rdf
This document discusses the need for and use of metametadata, or metadata about metadata, in two scenarios: crosswalks between metadata schemas and integrating metadata from different sources. It proposes using metametadata to record additional provenance information like the rule or source that generated a metadata statement to help with tasks like debugging crosswalks, updating rules, and improving search by weighting statements. Examples are given of how this could be implemented using RDF reification or named graphs.
Thomas Bataillon - présentation MEE 2013 (Seminaire MEE)
The document discusses using patterns of phenotypic and molecular evolution to infer properties of fitness landscapes and the dynamics of adaptation. It summarizes different models that aim to predict the distribution of fitness effects of beneficial mutations, including heuristics based on extreme value theory and explicit fitness landscape models. Experimental evolution data on the fitness effects of mutations, fitness trajectories over time, and genomic diversity can help distinguish between these models. The author examines using such data from an experimental evolution study in E. coli to infer properties of Fisher's geometric fitness landscape model, including the distribution of fitness effects and genome-wide mutation rate.
Nicolas Loeuille - présentation MEE2013 (Seminaire MEE)
This document summarizes a model exploring how local negative feedbacks can influence the evolution of species diversity within metacommunities. The model shows three possible outcomes: 1) permanent generalism with one widespread species, 2) permanent specialization with many specialist species, or 3) taxon cycles where specialist species periodically go extinct and are replaced. For diversity to emerge, parameters like colonization rate and negative feedback strength must be high, while extinction rate and environmental contamination must be low. The model reproduces classical macroecological patterns like species-area relationships and species abundance distributions. This framework provides insights into how local interactions can maintain biodiversity at larger scales.
The document discusses using fundraising data from the Association of Fundraising Professionals' Fundraising Effectiveness Project to improve a healthcare organization's fundraising efforts. It provides examples of specific fundraising goals and metrics that could be improved from a sample organization's data compared to industry averages. The document also outlines best practices from high performing organizations and next steps for analyzing an organization's own fundraising data to identify areas for increased donor retention, acquisition of new donors, and higher average gift amounts.
1. The document discusses how dispersal evolves in heterogeneous and uncertain environments, considering factors like kin competition, inbreeding avoidance, catastrophes, temporal variability, environmental heterogeneity, and the cost of dispersal.
2. It examines how dispersal patterns are influenced by these direct processes and how dispersal strategies evolve to an evolutionarily stable strategy. Dispersal polymorphisms may also arise.
3. The larger context of how dispersal relates to other ecological patterns and traits is explored, such as its association with diversity, gene flow, and other organismal traits.
This document provides a map and descriptions of the rooms and facilities at Leith Academy. The map shows the locations of classrooms, offices, gymnasiums, and outdoor spaces. Descriptions are then provided for various subject classrooms and facilities, along with the teachers for each subject. The summary highlights the key areas and subjects covered at the school.
Virginie Ravigné - Dynamique adaptative (Seminaire MEE)
Adaptive dynamics is a framework for modeling frequency-dependent selection and its effects on evolution. It models the invasion and potential success of rare mutations in a resident population using mathematical tools. Key assumptions include clonal reproduction, rare mutations of small effect, and population dynamics determining invasion fitness. The framework can identify singular strategies and classify them based on their stability and potential to lead to evolutionary branching. While useful for qualitative insights, adaptive dynamics has limitations and may not accurately predict evolution in all cases.
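The invasion analysis at the heart of adaptive dynamics can be made concrete with a textbook Lotka-Volterra competition model (not a model from this talk): the resident sits at its demographic equilibrium, and a rare mutant invades if its per-capita growth rate there is positive. All parameter values below are illustrative:

```python
# Hedged numeric sketch of invasion fitness, using Gaussian carrying
# capacity and competition kernel. A mutant trait y invades resident x
# when invasion_fitness(y, x) > 0; by construction it is zero at y = x.
import math

SIGMA_K = 1.0   # width of the resource (carrying-capacity) distribution
SIGMA_A = 2.0   # width of the competition kernel

def carrying_capacity(x):
    return math.exp(-x * x / (2 * SIGMA_K ** 2))

def competition(x, y):
    return math.exp(-(x - y) ** 2 / (2 * SIGMA_A ** 2))

def invasion_fitness(mutant, resident):
    """Per-capita growth of a rare mutant at the resident's equilibrium."""
    return 1.0 - (competition(mutant, resident)
                  * carrying_capacity(resident) / carrying_capacity(mutant))

# A mutant nearer the resource optimum (0) can invade a displaced resident:
print(invasion_fitness(0.1, 0.5) > 0)  # True
```

Repeating such invasion tests along a trait axis is how singular strategies are located and classified (e.g. as branching points) in the framework the talk describes.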
Thomas Lenormand - Génétique des populations (Seminaire MEE)
This document summarizes a population genetics model for studying the evolution of recombination rates. The model considers three loci, including a modifier locus that can alter the recombination rates between two selected loci. Using recursion equations and assumptions like weak epistasis and separation of timescales, the model analyzes when and why the frequency of a recombination-increasing allele at the modifier locus may increase over time through indirect selection. The key results are that more recombination evolves if epistasis between the selected loci is weakly negative and their association is negative.
Nils Poulicard - Relations entre histoire évolutive et capacité d'adaptation ... (Seminaire MEE)
The document discusses how ancient host adaptation of Rice yellow mottle virus (RYMV) to different rice species modulated its current ability to break plant resistance. RYMV adapted to infect Oryza glaberrima rice around 500,000 years ago. This is evidenced by a threonine residue at codon 49 of the viral genome that enhances infection of O. glaberrima but limits resistance breaking in O. sativa rice. Directed mutations showed codon 49 influences the virus's ability to overcome two major resistance genes in its hosts. Ancient adaptation to a rice species continues to impact RYMV's resistance-breaking potential today.
François Blanquart - Evolution of migration in a fluctuating environment (Seminaire MEE)
The document discusses how local adaptation and migration evolve in populations living in environments where selection fluctuates over time and space. It presents a model where migration is shown to increase local adaptation by reducing the lag time for adaptation, though more individuals are temporarily maladapted. The model finds an intermediate migration rate maximizes local adaptation across different environment shapes. It also shows migration can be selected for due to linkage disequilibrium dynamics, with the evolutionarily stable migration rate depending on the selection parameters. The document concludes migration may be an adaptive strategy for fluctuating environments.
Marco Andrello - Incongruency between model-based and genetic-based estimates... (Seminaire MEE)
This document discusses inconsistencies between estimates of effective population size (Ne) from demographic and genetic methods. Demographic methods use population census data while genetic methods use allele frequency changes. The document analyzes Ne estimates from both approaches in a plant species and finds they differ, possibly due to biases in demographic estimates from uncertain parameters and an oversimplified model. It recommends further assessing genetic methods like ONeSAMP with simulations to evaluate reliability across conditions and using multiple estimation methods when feasible to estimate Ne.
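For context on what a genetic estimate of Ne involves, here is a heavily simplified sketch of the temporal method: drift-driven allele-frequency change over t generations is converted into an Ne estimate. Real estimators, including those assessed in the talk (e.g. ONeSAMP), include sampling corrections that this toy version omits:

```python
# Didactic sketch of a temporal Ne estimate: larger frequency shifts over
# fewer generations imply stronger drift and hence a smaller effective
# population size. No sample-size correction is applied.

def temporal_ne(p_then, p_now, generations):
    """Estimate Ne from drift in one allele's frequency over t generations."""
    p_mean = (p_then + p_now) / 2
    # Standardized variance of the frequency change.
    f = (p_then - p_now) ** 2 / (p_mean * (1 - p_mean))
    return generations / (2 * f)

# Modest drift (0.50 -> 0.45) over 10 generations implies Ne of a few hundred.
print(round(temporal_ne(0.50, 0.45, 10)))
```

The demographic approach discussed in the summary instead builds Ne from census counts and life-history parameters, which is why the two estimates can diverge when the demographic model is oversimplified.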
The Linked Open Citation Database (LOC-DB) aims to create a fully linked and curated list of references as part of the cataloging process in libraries. It would include monographs, conference papers, and journal articles. The architecture involves OCR processing, linking to existing LOC-DB instances and linked open data. It reuses the Open Citations Data Model and collaborates to extend and maintain the model, with the goal of making existing citation databases obsolete by encouraging publishers to openly provide citation data.
JudaicaLink: Linked Data in the Jewish Studies FIDKai Eckert
This document discusses the JudaicaLink project, which aims to create a central portal for accessing digital Judaica collections and linking different data sources as linked open data. It describes ongoing work to contextualize collections, enrich metadata, re-transliterate text from Romanized to Hebrew script, and extract and link relevant information from sources like the YIVO Encyclopedia and Biographisches Handbuch der Rabbiner. The system uses a triple store, SPARQL endpoint, and static site generator to manage and deploy the linked data.
Slides (in German) for a talk of Magnus Pfeffer and Kai Eckert. We propose the linked data/semantic web technology as an infrastructure to publish the results of research projects for easy reuse.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
CAKE: Sharing Slices of Confidential Data on BlockchainClaudio Di Ciccio
Presented at the CAiSE 2024 Forum, Intelligent Information Systems, June 6th, Limassol, Cyprus.
Synopsis: Cooperative information systems typically involve various entities in a collaborative process within a distributed environment. Blockchain technology offers a mechanism for automating such processes, even when only partial trust exists among participants. The data stored on the blockchain is replicated across all nodes in the network, ensuring accessibility to all participants. While this aspect facilitates traceability, integrity, and persistence, it poses challenges for adopting public blockchains in enterprise settings due to confidentiality issues. In this paper, we present a software tool named Control Access via Key Encryption (CAKE), designed to ensure data confidentiality in scenarios involving public blockchains. After outlining its core components and functionalities, we showcase the application of CAKE in the context of a real-world cyber-security project within the logistics domain.
Paper: https://doi.org/10.1007/978-3-031-61000-4_16
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfTechgropse Pvt.Ltd.
In this blog post, we'll delve into the intersection of AI and app development in Saudi Arabia, focusing on the food delivery sector. We'll explore how AI is revolutionizing the way Saudi consumers order food, how restaurants manage their operations, and how delivery partners navigate the bustling streets of cities like Riyadh, Jeddah, and Dammam. Through real-world case studies, we'll showcase how leading Saudi food delivery apps are leveraging AI to redefine convenience, personalization, and efficiency.
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Crowdsourcing the Assembly of Concept Hierarchies
1. Crowdsourcing the Assembly of Concept Hierarchies
Kai Eckert¹, Mathias Niepert¹, Christof Niemann¹, Heiner Stuckenschmidt¹
Cameron Buckner², Colin Allen²
¹ University of Mannheim, Germany
² Indiana University, USA
Presentation: Kai Eckert
Wednesday, June 23, 2010
Joint Conference on Digital Libraries (JCDL), Brisbane, Australia, 2010
2. Motivation
● Various types of Concept Hierarchies:
● Thesauri
● Taxonomies
● Classifications
● Ontologies
● ...
● Manual creation is expensive.
● Automatic creation lacks quality.
3. Could the users do the work?
● Divide the work between a lot of users.
● Motivate them to be part of a community.
● Achieve quality control by means of redundancy.
● Can a concept hierarchy be created like, e.g., Wikipedia?
4. ● The Indiana Philosophy Ontology Project.
● A browsable taxonomy of philosophical ideas.
● Ideas are extracted from the Stanford Encyclopedia of Philosophy (SEP).
● Intuitive access to the SEP via the InPhO taxonomy.
● Entry point for other philosophical resources on the web.
5. From the SEP to InPhO
● Start with a hand-built formal ontology describing major topics and sub-topics.
● Extraction of new ideas and relationships.
● Gathering community feedback about ideas and relationships.
● Process feedback and infer positions in the classification tree.
10. Great stuff, but...
● What if you do not have a motivated community of expert users?
● Well,...
● Like almost everything,
you can buy it
at Amazon...
● Amazon Mechanical Turk
11. Amazon Mechanical Turk (AMT)
● Platform for placing and taking Human Intelligence Tasks (HITs).
● 100,000 – 400,000 HITs available.
● Number of workers: ??? (100,000 in 100 countries,
2007, New York Times).
12. HIT Definition
● Time allotted per assignment: maximum time a worker can work on a single task.
● Worker restrictions: approval rate, location.
● Reward per assignment: how much you pay for each HIT.
● Number of assignments per HIT: how many unique workers you want to work on each HIT.
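The parameters above can be captured in a small data structure. This is a hypothetical sketch, not the actual AMT API; field names mirror the slide, and the values are illustrative, not the experiment's real settings:

```python
from dataclasses import dataclass, field

@dataclass
class HITDefinition:
    """Parameters of a Human Intelligence Task, as listed on the slide."""
    time_allotted_seconds: int       # max time a worker may spend on one assignment
    min_approval_rate: float         # worker restriction: required approval rate
    allowed_locations: list          # worker restriction, e.g. ["US", "AU"]
    reward_per_assignment: float     # payment in US$ for each completed HIT
    assignments_per_hit: int         # number of unique workers per HIT

# Illustrative HIT for this experiment: 12 pairs per HIT, 5 workers each.
hit = HITDefinition(
    time_allotted_seconds=600,
    min_approval_rate=0.95,
    allowed_locations=["US"],
    reward_per_assignment=0.02,
    assignments_per_hit=5,
)
```

Only `assignments_per_hit=5` is grounded in the slides (each pair was evaluated by 5 workers); the other values are placeholders.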
13. HIT Result
● Answer of each worker for each HIT
● Accept time, submit time, work time in seconds
● Worker ID
14. Our questions
Can we replace the InPhO community by means
of Amazon Mechanical Turk?
How much does it cost and what is the resulting
quality?
15. Experimental Setup
● We wanted some overlap within the experts:

  Minimum overlap i    1      2      3    4    5
  Number of pairs      3,237  1,154  370  187  92

  We decided on the 1,154 pairs (minimum overlap 2).
● Each pair was evaluated by 5 different workers.
● Each worker evaluated at least 12 pairs (1 HIT).
● 87 distinct workers.
● The HITs were completed in 20 hours.
16. Measuring Agreement
● Calculation of the distance between two answers:
● Relatedness: Absolute value of the difference
● Relative Generality: Match: 0, otherwise: 1
● The evaluation deviation is the mean distance of a user
to the users in a reference group.
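The two distance measures and the evaluation deviation can be sketched directly from the definitions above (function names are ours, not from the paper):

```python
def relatedness_distance(a, b):
    """Relatedness: absolute value of the difference between two ratings."""
    return abs(a - b)

def generality_distance(a, b):
    """Relative generality: 0 on a match, 1 otherwise."""
    return 0 if a == b else 1

def evaluation_deviation(user_answers, reference_answers, distance):
    """Mean distance of one user's answers to a reference group.

    user_answers:      {pair_id: this user's answer}
    reference_answers: {pair_id: [answers of the reference group]}
    """
    distances = []
    for pair_id, answer in user_answers.items():
        for ref in reference_answers.get(pair_id, []):
            distances.append(distance(answer, ref))
    return sum(distances) / len(distances) if distances else 0.0

# A user who rated pair "p1" as 3, against a reference group that answered 1 and 3:
dev = evaluation_deviation({"p1": 3}, {"p1": [1, 3]}, relatedness_distance)
```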
17.–23. Comparison with Experts (Relative Generality)
[Histogram, built up over slides 17–23: fraction of users in % (0–30) against deviation from the experts (0.1–1.0, ranging from "Follow Experts" to "Own Opinion"), plotted for InPhO users and AMT users, with a random clicker marked for reference.]
● InPhO users are quite consistent.
● AMT users are not consistent. → Are there good ones?
● Yes, there are! → But which ones?
25. Telling the good from the bad
● First approach: filtering by working time.
● Hypothesis 1: Workers who take some time to think before answering give better answers.
● Hypothesis 2: There are probably workers who give quick random responses.
26. Filtering by working time
[Histogram: number of users by average working time for one HIT (12 pairs); counts fall from 84 users in the fastest buckets to a handful in the slowest.]
27. Filtering by working time
[The same histogram, overlaid with the mean deviation from the experts per working-time bucket: the deviation decreases only slowly with working time, from roughly 1.5 to about 0.6 for the slowest workers.]
28. Telling the good from the bad
● Second approach: Filtering by comparison with a hidden
gold standard.
● Test pairs:
● Social Epistemology – Epistemology (P1)
● Computer Ethics – Ethics (P2)
● Chinese Room Argument – Chinese Philosophy (P3)
● Dualism – Philosophy of Mind (P4)
29. Applying filters
● Test pairs:
● Social Epistemology – Epistemology (P1)
● Computer Ethics – Ethics (P2)
● Chinese Room Argument – Chinese Philosophy (P3)
● Dualism – Philosophy of Mind (P4)
● Filters:
1) P1 and P2 are correct (Common Sense)
2) Like 1), additionally P4 is correct (+Background)
3) Like 1), additionally P3 is correct (+Lexical)
4) All have to be correct (All)
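The four filters nest as described: each builds on the "Common Sense" check by requiring further test pairs. A minimal sketch, with the filters expressed as predicates over a worker's results on the hidden test pairs (the `results` mapping is our own representation; True means the pair was answered correctly):

```python
def common_sense(results):
    """Filter 1: P1 and P2 are correct."""
    return results["P1"] and results["P2"]

def plus_background(results):
    """Filter 2: like filter 1, additionally P4 is correct."""
    return common_sense(results) and results["P4"]

def plus_lexical(results):
    """Filter 3: like filter 1, additionally P3 is correct."""
    return common_sense(results) and results["P3"]

def all_correct(results):
    """Filter 4: all four test pairs have to be correct."""
    return all(results[p] for p in ("P1", "P2", "P3", "P4"))

# A worker who gets the common-sense and background pairs right,
# but misses the lexical trap (Chinese Room ≠ Chinese Philosophy):
worker = {"P1": True, "P2": True, "P3": False, "P4": True}
```

Such a worker passes "Common Sense" and "+Background" but is dropped by "+Lexical" and "All".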
30. Filter results for relatedness
Filter Users Deviation Max. Dev.
All (4) 7 0.60 1.00
+Lexical (3) 10 0.87 1.78
+Background (2) 23 0.84 1.41
Common Sense (1) 40 1.11 1.96
All AMT 87 1.39 2.96
All InPhO 25 0.77 1.75
Random --- 1.8 ---
31. Filter results for relative generality
Filter Users Deviation Max. Dev.
All (4) 7(5) 0.12 0.22
+Lexical (3) 10(8) 0.14 0.27
+Background (2) 23(20) 0.15 0.45
Common Sense (1) 40(35) 0.21 0.59
All AMT 87(78) 0.45 1.00
All InPhO 25 0.23 0.47
Random --- 0.75 ---
32. Financial considerations
Filter Pairs Evaluations Cost per Pair Cost per Evaluation
--- 1,138 5,690 US$ 0.111 US$ 0.022
Common Sense (1) 1,074 1,909 US$ 0.117 US$ 0.066
+Background (2) 1,018 1,558 US$ 0.124 US$ 0.081
+Lexical (3) 215 215 US$ 0.586 US$ 0.586
All (4) 183 183 US$ 0.689 US$ 0.689
● Overall payments: 126 US$
● Estimation for all pairs with filter "All (4)": 784 US$
● Estimation for all pairs with redundancy (5×): 3,920 US$
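The figures in the table follow from a simple back-of-the-envelope model (our reconstruction of the arithmetic, using the slide's numbers): the total payment of 126 US$ is fixed, so stricter filters divide the same money over fewer retained pairs and evaluations:

```python
TOTAL_PAID = 126.0   # overall payments in US$
ALL_PAIRS = 1138     # pairs posted to AMT

def cost_per_pair(pairs_retained):
    """Effective cost per pair that survives a filter."""
    return TOTAL_PAID / pairs_retained

def cost_per_evaluation(evals_retained):
    """Effective cost per evaluation that survives a filter."""
    return TOTAL_PAID / evals_retained

# Without filtering: 5,690 evaluations, ~0.022 US$ each.
per_eval_unfiltered = cost_per_evaluation(5690)

# Strictest filter ("All"): only 183 pairs keep an evaluation, ~0.689 US$ per pair.
per_pair_all = cost_per_pair(183)

# Extrapolation: covering all 1,138 pairs at that rate is roughly 784 US$,
# and with 5-fold redundancy per pair roughly 3,920 US$.
estimate_all = per_pair_all * ALL_PAIRS
estimate_redundant = estimate_all * 5
```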
33. Conclusion
AMT answers are of varying quality. But this is true
for many communities, too.
With moderate filtering ("Background"), we achieved a quality comparable to the InPhO community.
With 5 evaluations per pair, we still covered 89% of
all pairs with this filter.
The resulting InPhO taxonomy is online:
http://inpho.cogs.indiana.edu/amt_taxonomy
No need for existing data, gold standards, or training data (besides the filter pairs).
No need for a community?
34. Thank you
Questions?
Kai Eckert
kai@informatik.uni-mannheim.de
http://www.slideshare.net/kaiec
"Computer ethics doesn't exist. Blue is black and red is blood on the internet. Nobody cares, because they are lonely."
– Anonymous Mechanical Turk Worker