1) The document describes a workshop on research synthesis and reproducibility.
2) It discusses challenges with reproducibility in science and proposes provenance and conceptual tools like PRIMAD to help address these challenges.
3) The document presents a case study where an intern was able to reproduce results from a 2006 ecological niche modeling paper using the Whole Tale environment and MaxEnt software, demonstrating computational reproducibility.
A technology architecture for managing explicit knowledge over the entire lif... (William Hall)
This document discusses managing explicit knowledge over the entire lifecycle of large projects. It covers theories of knowledge management, including different paradigms of knowledge and how technology has revolutionized knowledge transmission. As an example, it examines issues around managing knowledge for an ANZAC ship project. It suggests content management needs to evolve to understand paradigm shifts in how knowledge itself is defined and managed.
Writing and Publishing about Applied Technologies in Tech Journals and Books (Shalin Hai-Jew)
This slideshow provides insights on how to write and publish about applied technologies in tech journals and books, including the following:
Getting started in tech publishing
Cost-benefit calculations
Parts to an article; parts to a chapter
Writing process
Collaborating
Publishing process
Acquiring readers (and citations)
Post-publishing
Next works
This document discusses several key concepts related to information architecture and understanding systems. It addresses issues like fragmentation in websites, findability of information, and the relationship between information and culture. It also discusses categories as cornerstones of cognition, connections in systems happening simultaneously in many directions, and the importance of making the invisible visible.
Andrea Scharnhorst (2016) Why do we need to model the science system? Talk at the seminar of the Eindhoven Centre for Innovation Sciences, June 2, 2016
The document discusses several key challenges for users of the Library of Congress:
1) Fragmentation across multiple sites and domains causes confusion for users about where to find different resources.
2) Users have difficulty finding what they need from the home page or when entering via search/deep links.
3) Many potential users never access the Library's resources because they are not easily findable.
The document argues that improving findability and reducing fragmentation across the Library's online presence would help more users access and utilize its resources.
Kno.e.sis is an Ohio Center of Excellence focused on knowledge-enabled computing. It was established to contribute to basic theory about computation and cognitive systems, and address problems associated with productive thinking using large amounts of data. Kno.e.sis has exceptional regional, national, and international collaborations with organizations like AFRL, Microsoft, IBM, and W3C. It is well funded with over $10 million currently and has world class students and faculty, including one of the most cited computer science authors.
From Research Objects to Reproducible Science Tales (Bertram Ludäscher)
University of Southampton. Electronics & Computer Science. Research Seminar (Invited Talk).
TITLE: From Research Objects to Reproducible Science Tales
ABSTRACT. Rumor has it that there is a reproducibility crisis in science. Or maybe there are multiple crises? What do we mean by reproducibility and replicability anyways? In this talk I will first make an attempt at sorting out some of the terminological confusion in this area, focusing on computational aspects. The PRIMAD model is another attempt to describe different aspects of reproducibility studies by focusing on the "delta" between those studies and the original study. In addition to these more theoretical investigations, I will discuss practical efforts to create more reproducible and more transparent computational platforms such as the one developed by the Whole-Tale project: here 'tales' are executable research objects that may combine data, code, runtime environments, and narratives (i.e., the traditional "science story"). I will conclude with some thoughts about the remaining challenges and opportunities to bridge the large conceptual gaps that continue to exist despite the recognition of problems of reproducibility and transparency in science.
ABOUT the Speaker. Bertram Ludäscher is a professor at the School of Information Sciences at the University of Illinois, Urbana-Champaign and a faculty affiliate with the National Center for Supercomputing Applications (NCSA) and the Department of Computer Science at Illinois. Until 2014 he was a professor at the Department of Computer Science at the University of California, Davis. His research interests range from practical questions in scientific data and workflow management, to database theory and knowledge representation and reasoning. Prior to his faculty appointments, he was a research scientist at the San Diego Supercomputer Center (SDSC) and an adjunct faculty at the CSE Department at UC San Diego. He received his M.S. (Dipl.-Inform.) in computer science from the University of Karlsruhe (now K.I.T.), and his PhD (Dr. rer. nat.) from the University of Freiburg, in Germany.
The document discusses several key concepts related to information architecture and understanding systems. It examines the challenges of fragmentation and findability on websites, and how users struggle to understand complex systems when those systems are described only with words. It emphasizes that information architecture must account for how information and culture are interconnected, and that the effective design and management of information systems requires understanding the nature of information and how it relates to categories, connections, and consequences within a cultural context.
Presentation to CRC Mental Health Early Career Researcher Workshop, Melbourne 29.11.17 for @andsdata.
Workshop title: A by-product of scientific training: We're all a little bit biased.
The Architecture of Understanding (and Happiness) (Peter Morville)
This document discusses several key topics related to information architecture and understanding systems. It addresses issues like fragmentation in websites, problems with findability of resources, and the importance of understanding the nature of information in systems. It also discusses concepts like categories, connections, consequences, and culture as they relate to information and understanding. Throughout the document there are various quotes about topics like systems thinking, planning, and the role of the information architect.
Reproducibility of computational research: methods to avoid madness (Session ...) (Mike Hucka)
Introduction on the session "Reproducibility of computational research: methods to avoid madness" held Wednesday, September 17, during ICSB 2014 in Melbourne, Australia, 2014.
Open Educational Resources (OER) are fast gaining traction amongst the academic community as a viable means of increasing access and equity in education. The concept of OER is of special significance to marginalised communities in the Global South, where distance education is prominent due to the inability of conventional brick-and-mortar institutions to cope with the growing demand. However, the wider adoption of OER by academics in the Global South has been inhibited by various social, economic, and technological factors. One of the major technological inhibitors is the current inability to search for OER which are academically useful and of an acceptable academic standard. Many technological initiatives have been proposed over the recent past to provide potential solutions to this issue. Among these are OER curation standards such as GLOBE, federated search, social semantic search, and search engines such as DiscoverEd, OCW Finder, and Pearson's Project Blue Sky. The research discussed in this paper was carried out in the form of a literature review and informal interviews with experts. The objective of the study is to document the extent to which OER search issues contribute to the slow uptake of the concept of OER. This review paper discusses the current OER search dilemma and the impact of some of the key initiatives which propose potential solutions.
Future agenda: repositories, and the research process (Martin Donnelly)
This document discusses research data management in the context of non-standard archiving of research outputs, with a focus on challenges in the arts and humanities. It notes that while data reuse has long been integral to various creative disciplines, archiving creative research data presents unique issues not present in scientific disciplines. These include the personal nature of creative works, differentiating between research and personal works, issues with non-digital materials, and the blurry boundaries of creative research processes. The document raises questions around concepts like evidence, facts, and replication in subjective creative research.
Scientific software engineering methods and their validity (Daniel Mendez)
This document summarizes a talk on scientific methods and their validity given at Technische Universität München. The talk discusses key concepts in the philosophy of science like epistemology and different views of science. It provides an overview of common scientific methods like empirical methods, case studies, and hypothesis testing. The talk delves into challenges of obtaining truth and impacts of human factors. It also discusses how scientific methods can be applied in a PhD dissertation and the importance of increasing validity. The overall document aims to discuss implications of scientific methods for everyday scientific work.
Presentation given at NUI, Galway 2019-04-11 for Open Science Week.
An overview of Early Career Researchers, their innovation and contribution towards Open Infrastructure
The document discusses inquiry as a cognitive process that allows humans to understand their surroundings through discovery, invention, and testing of solutions to problems. It describes inquiry as a process of considered thought rather than reflexive response. It then outlines different types of knowledge generation and structures of disciplinary inquiry, including proto-curiosity, curiosity, replicative, technological, informal personal learning, formal authoritative community instruction, and more.
The paper discusses distributed cognition in an airline cockpit, where the cognitive labor of flying a modern jet is distributed across the crew. It presents a case study simulation of a flight from Sacramento to Los Angeles to illustrate how information processing is distributed across representational media like checklists, displays, and standard operating procedures. The study puts forth the hypothesis that understanding human cognition requires examining how it is distributed in social and cultural systems using tools and artifacts.
Ontologies for baby animals and robots: From "baby stuff" to the world of adul... (Aaron Sloman)
In contrast with ontology developers concerned with a symbolic or digital environment (e.g. the internet), I draw attention to some features of our 3-D spatio-temporal environment that challenge young humans and other intelligent animals and will also challenge future robots. Evolution provides most animals with an ontology that suffices for life, whereas some animals, including humans, also have mechanisms for substantive ontology extension based on results of interacting with the environment. Future human-like robots will also need this. Since pre-verbal human children and many intelligent non-human animals, including hunting mammals, nest-building birds and primates can interact, often creatively, with complex structures and processes in a 3-D environment, that suggests (a) that they use ontologies that include kinds of material (stuff), kinds of structure, kinds of relationship, kinds of process (some of which are process-fragments composed of bits of stuff changing their properties, structures or relationships), and kinds of causal interaction and (b) since they don't use a human communicative language they must use information encoded in some form that existed prior to human communicative languages both in our evolutionary history and in individual development. Since evolution could not have anticipated the ontologies required for all human cultures, including advanced scientific cultures, individuals must have ways of achieving substantive ontology extension. The research reported here aims mainly to develop requirements for explanatory designs. The attempt to develop forms of representation, mechanisms and architectures that meet those requirements will be a long term research project.
Biodiversity Informatics: An Interdisciplinary Challenge (Bryan Heidorn)
"Impacto de la Informática en el Conocimiento de la Biodiversidad: Actualidad y Futuro" ("Impact of Informatics on Biodiversity Knowledge: Present and Future") at Universidad Nacional de Colombia on August 12, 2011. https://sites.google.com/site/simposioinformaticaicn/home
Emergence and Growth of Knowledge and Diversity in Hierarchically Complex Org... (Bill Hall)
Seminar presentation: University of Melbourne Department of Information Systems, 13 October, 2006. Summarises the development of a biologically based theory of knowledge that combines Karl Popper's evolutionary epistemology (as developed in his 1972 book, Objective Knowledge) with Humberto Maturana and Francisco Varela's concept of autopoiesis (as developed in their 1980 book, Autopoiesis and Cognition).
Open Data and the Social Sciences - OpenCon Community Webcast (Right to Research)
The document discusses issues with transparency and reproducibility in social science research. It notes that research influences policy and decisions that affect millions of lives. However, weak academic norms like publication bias, p-hacking, non-disclosure, and failure to replicate can distort the body of evidence. The document proposes solutions like pre-registering studies and pre-specifying analyses to address these issues. It also discusses resources and efforts like the Berkeley Initiative for Transparency in the Social Sciences to raise awareness, foster adoption of transparent practices, and identify strategies to improve reproducibility.
This document provides details of a proposed panel discussion on domain analysis at the CoLIS9 conference in Uppsala, Sweden. The panel aims to introduce emerging methodological approaches and analytical techniques for conducting domain analysis. It will feature presentations from several experts in the field, including Birger Hjørland, Sanna Talja, Isto Huvila, Eva Jansen, and Jenna Hartel. They will discuss techniques such as ethnographic studies, arts-informed research, and ecological approaches. The goal is to disrupt normative assumptions about domain analysis and represent the expanding diversity of approaches. The panel also seeks to inspire more researchers to engage with domain analysis and contribute to ongoing debates around research methods in library and information science.
Reconciling Conflicting Data Curation Actions: Transparency Through Argument... (Bertram Ludäscher)
Yilin Xia (yilinx2@illinois.edu),
Shawn Bowers (bowers@gonzaga.edu),
Lan Li (lanl2@illinois.edu), and
Bertram Ludäscher (ludaesch@illinois.edu)
Presented at IDCC-2024 in Edinburgh.
ABSTRACT. We propose a new approach for modeling and reconciling conflicting data cleaning actions. Such conflicts arise naturally in collaborative data curation settings where multiple experts work independently and then aim to put their efforts together to improve and accelerate data cleaning. The key idea of our approach is to model conflicting updates as a formal argumentation framework (AF). Such argumentation frameworks can be automatically analyzed and solved by translating them to a logic program PAF whose declarative semantics yield a transparent solution with many desirable properties, e.g., uncontroversial updates are accepted, unjustified ones are rejected, and the remaining ambiguities are exposed and presented to users for further analysis. After motivating the problem, we introduce our approach and illustrate it with a detailed running example introducing both well-founded and stable semantics to help understand the AF solutions. We have begun to develop open source tools and Jupyter notebooks that demonstrate the practicality of our approach. In future work we plan to develop a toolkit for conflict resolution that can be used in conjunction with OpenRefine, a popular interactive data cleaning tool.
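The core idea of the abstract (conflicting updates as arguments, resolved by an argumentation-framework semantics) can be illustrated with a small sketch. This is a hypothetical example, not the authors' code: the argument names (u1, u2, u3) and the attack relation are invented, and only the grounded (well-founded) extension is computed.

```python
def grounded_extension(args, attacks):
    """Compute the grounded extension of the argumentation framework
    AF = (args, attacks) by iterating the characteristic function
    from the empty set until it reaches its least fixed point."""
    # For each argument, precompute the set of arguments attacking it.
    attackers = {a: {x for (x, y) in attacks if y == a} for a in args}
    accepted = set()
    while True:
        # An argument is acceptable w.r.t. `accepted` if every one of its
        # attackers is itself attacked by some accepted argument.
        new = {a for a in args
               if all(any((d, b) in attacks for d in accepted)
                      for b in attackers[a])}
        if new == accepted:
            return accepted
        accepted = new

# Three conflicting cleaning actions: u1 and u2 contradict each other,
# and u3 (say, independent evidence) attacks u2.
args = {"u1", "u2", "u3"}
attacks = {("u1", "u2"), ("u2", "u1"), ("u3", "u2")}
print(sorted(grounded_extension(args, attacks)))  # → ['u1', 'u3']
```

Here u3 is unattacked and therefore accepted; it defeats u2, which in turn defends u1, matching the behaviour described in the abstract: uncontroversial updates are accepted and unjustified ones rejected.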
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion (Bertram Ludäscher)
Research Seminar Talk (online) at KRR@UP (Uni Potsdam) on Dec 6, 2023, loosely based on a paper with the same title at the 7th Workshop on Advances in Argumentation in Artificial Intelligence (AI3)
Similar to Dissecting Reproducibility: A case study with ecological niche models in the Whole Tale environment
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!Bertram Ludäscher
7th Workshop on Advances in Argumentation in Artificial Intelligence (AI3) at
AIxIA 2023: 22nd International Conference of the Italian Association for Artificial Intelligence.
Presentation of a paper by Bertram Ludäscher, Shawn Bowers, and Yilin Xia, given virtually on November 9, 2023.
[Flashback] Integration of Active and Deductive Database RulesBertram Ludäscher
Slides of my PhD defense at the University of Freiburg, 1998.
Statelog and similar state-oriented extensions of Datalog have seen renewed interest subsequently, e.g., see
[Hel10] Hellerstein, J.M., 2010. The declarative imperative: experiences and conjectures in distributed logic. ACM SIGMOD Record, 39(1), pp.5-19.
[AMC+11]
Alvaro, P., Marczak, W.R., Conway, N., Hellerstein, J.M., Maier, D. and Sears, R., 2011. Dedalus: Datalog in time and space. In Datalog Reloaded: First International Workshop, Datalog 2010, Oxford, UK, March 16-19, 2010. Revised Selected Papers (pp. 262-281). Springer
[Flashback] Statelog: Integration of Active & Deductive Database RulesBertram Ludäscher
This document discusses Statelog, which integrates active and deductive database rules. Statelog allows both active rules, which trigger actions and modify the database, and deductive rules, which derive new facts. It defines the semantics of different types of rules and how they interact. Statelog guarantees termination of rule evaluation at both compile-time and runtime through techniques like state-stratification and delta-monotonicity. It can express complex temporal queries and supports features like nested transactions.
Answering More Questions with Provenance and Query PatternsBertram Ludäscher
This document discusses using provenance information to improve transparency and reproducibility in research. It begins by asking questions about the input data, methods, and parameter settings used in a study in order to assess its reliability. It then provides examples of how workflow systems can capture provenance at both the design level (prospective provenance) and runtime level (retrospective provenance). These include a Kepler workflow that simulates X-ray data collection and provenance traces captured by DataONE. The document argues that provenance is a critical link between workflow modeling and runtime traces that can increase trust in research findings.
Computational Reproducibility vs. Transparency: Is It FAIR Enough?Bertram Ludäscher
Keynote at CLIR Workshop (Webinar): Torward Open, Reproducible, and Reusable Research. February 10, 2021. https://reusableresearch.com/
ABSTRACT. The “reproducibility crisis” has resulted in much interest in methods and tools to improve computational reproducibility. FAIR data principles (data should be findable, accessible, interoperable, and reusable) are also being adapted and evolved to apply to other artifacts, notably computational analyses (scientific workflows, Jupyter notebooks, etc.). The current focus on computational reproducibility of scripts and other computational workflows sometimes overshadows a somewhat neglected and arguably more important issue: transparency of data analysis, including data wrangling and cleaning. In this talk I will ask the question: What information is gained by conducting a reproducibility experiment? This leads to a simple model (PRIMAD) that aims to answer this question by sorting out different scenarios. Finally, I will present some features of Whole-Tale, a computational platform for reproducible and transparent computational experiments.
By Michael Gryk and Bertram Ludäscher. Presented at 2020 JCDL-SIGCM Workshop, August 1, 2020.
ABSTRACT. Conceptual models can serve multiple purposes: communication of information between stakeholders, information abstraction and generalization, and information organization for archival and retrieval. An ongoing research question is how to formally define the fit-for-purpose of a conceptual model as well as to define metrics or tests to determine whether a given model faithfully supports a designated purpose.
This paper summarizes preliminary investigations in this area by presenting toy problems along with different conceptual models for the system under study. It is argued that the different models are adequate in supporting a sophisticated query and yet they adopt different normalization schemes and will differ in expressiveness depending on the implied purpose of the models. As the subtitle suggests, this work is intended to be primarily exploratory as to the constraints a formal system would require in defining the “usefulness”, “expressiveness” and “equivalence” of conceptual models.
From Workflows to Transparent Research Objects and Reproducible Science TalesBertram Ludäscher
The document discusses prospective and retrospective provenance in scientific workflows. Prospective provenance involves modeling the workflow design, while retrospective provenance records the workflow execution. The YesWorkflow and noWorkflow tools demonstrate these two types of provenance. YesWorkflow annotates scripts to recreate a workflow model from the script, while noWorkflow records step-by-step runtime logs. Combining both approaches provides a more complete view of a workflow's provenance. Maintaining provenance is important for reproducibility and understanding the origins of scientific results.
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of UsBertram Ludäscher
PWE: Datalog & ASP for the Rest of Us discusses using Possible Worlds Explorer (PWE) to make Datalog and Answer Set Programming (ASP) more accessible to non-experts. It covers topics like using provenance to explain query results, capturing rule firings to track provenance, representing provenance as a graph, using states to track derivation rounds, and declarative profiling of Datalog programs. The presentation advocates for tools like PWE that wrap Datalog/ASP engines to combine them with Python ecosystems and allow interactive use in Jupyter notebooks. This makes the languages more approachable and helps users build on existing work by experimenting further.
Deduktive Datenbanken & Logische Programme: Eine kleine ZeitreiseBertram Ludäscher
Deductive Databases & Logic Programs: Back to the Future!
Colloquium talk on the occasion of the retirement of Prof. Dr. Georg Lausen, May 10th, 2019, Universität Freiburg, Germany
Incremental Recomputation: Those who cannot remember the past are condemned ...Bertram Ludäscher
Talk given at "Problems and techniques for Incremental Re-computation: provenance and beyond".
A workshop co-organized with Provenance Week 2018
King's College London, 12th and 13th July, 2018
Organizers: Paolo Missier (Newcastle University), Tanu Malik (DePaul University), Jacek Cala (Newcastle University)
Abstract: Incremental recomputation has applications, e.g., in databases and workflow systems. Methods and algorithms for recomputation depend on the underlying model of computation (MoC) and model of provenance (MoP). This relation is explored with some examples from databases and workflow systems.
Validation and Inference of Schema-Level Workflow Data-Dependency AnnotationsBertram Ludäscher
Presentation slides of paper by Shawn Bowers, Timothy McPhillips, and Bertram Ludäscher, given by Shawn at Provenance and Annotation of Data and Processes - 7th International Provenance and Annotation Workshop, IPAW 2018, King's College London, UK, July 9-10, 2018.
The paper won a the IPAW best paper award: https://twitter.com/kbelhajj/status/1017082775856467968
ABSTRACT. An advantage of scientific workflow systems is their ability to collect runtime provenance information as an execution trace. Traces include the computation steps invoked as part of the workflow run along with the corresponding data consumed and produced by each workflow step. The information captured by a trace is used to infer "lineage'' relationships among data items, which can help answer provenance queries to find workflow inputs that were involved in producing specific workflow outputs. Determining lineage relationships, however, requires an understanding of the dependency patterns that exist between each workflow step's inputs and outputs, and this information is often under-specified or generally assumed by workflow systems. For instance, most approaches assume all outputs depend on all inputs, which can lead to lineage "false positives''. In prior work, we defined annotations for specifying detailed dependency relationships between inputs and outputs of computation steps. These annotations are used to define corresponding rules for inferring fine-grained data dependencies from a trace. In this paper, we extend our previous work by considering the impact of dependency annotations on workflow specifications. In particular, we provide a reasoning framework to ensure the set of dependency annotations on a workflow specification is consistent. The framework can also infer a complete set of annotations given a partially annotated workflow. Finally, we describe an implementation of the reasoning framework using answer-set programming.
An ontology-driven framework for data transformation in scientific workflowsBertram Ludäscher
Presentation given by Bertram at the Data Integration in the Life Sciences (DILS) Workshop in Leipzig, Germany, 2004.
Reference:
Bowers, Shawn, and Bertram Ludäscher. "An ontology-driven framework for data transformation in scientific workflows." In International Workshop on Data Integration in the Life Sciences (DILS), pp. 1-16. Springer, 2004.
So this isn't new -- but still relevant :-)
ABSTRACT. Ecologists spend considerable effort integrating heterogeneous data for statistical analyses and simulations, for example, to run and test predictive models. Our research is focused on reducing this effort by providing data integration and transformation tools, allowing researchers to focus on “real science,” that is, discovering new knowledge through analysis and modeling. This paper defines a generic framework for transforming heterogeneous data within scientific workflows. Our approach relies on a formalized ontology, which serves as a simple, unstructured global schema. In the framework, inputs and outputs of services within scientific workflows can have structural types and separate seman- tic types (expressions of the target ontology). In addition, a registration mapping can be defined to relate input and output structural types to their corresponding semantic types. Using registration mappings, ap- propriate data transformations can then be generated for each desired service composition. Here, we describe our proposed framework and an initial implementation for services that consume and produce XML data.
The document describes the Whole Tale platform, which aims to facilitate reproducibility in computational research. Whole Tale allows researchers to package computational narratives, data, code, and provenance information into "tales" that can be shared and re-executed. Key features of Whole Tale include running interactive notebooks, versioning and sharing tales, and integrating provenance tracking tools to provide transparency into computational workflows. The speaker demonstrates several example tales and discusses upcoming Whole Tale features and applications in different domains like archaeology, astronomy, and materials science.
From Provenance Standards and Tools to Queries and Actionable ProvenanceBertram Ludäscher
The document discusses computational provenance and the need for tracking data lineage and workflow processes. It presents several tools and projects that aim to capture and manage provenance information, including DataONE, SKOPE, KURATOR, WHOLE-TALE, and YesWorkflow. The document argues that provenance is important for understanding what happened in computational and data-driven research in order to ensure transparency and reproducibility.
Wild Ideas at TDWG'17: Embrace multiple possible worlds; abandon techno-ligionBertram Ludäscher
The document discusses two ideas: 1) Embracing multiple possible worlds by using techniques like answer set programming to represent alternative scenarios rather than a single consensus view. 2) Abandoning strict adherence to technology stacks and standards ("techno-ligion") by focusing on simple powerful solutions, using natural language when possible, and paying a fee each time a complex technical term is used. It suggests using techniques like technology golf to explore problems through minimal programs instead of lengthy debates over formal representations.
2. All-in-One (Teaser)
• Reproducibility Crisis in Science
• A conceptual tool: Provenance
• Transparency? Explanation? Provenance!
• … why-, how-, where-, why-not-, data-, workflow- … provenance …
• Terminological Chaos Reigns
– … replicability … reproducibility … repeatability …
• A modest proposal and (evolving) conceptual tool: PRIMAD
– What's fixed? What varies? (X → X′, Y → Y′, …)
– What is the information gain when succeeding or failing to reproduce?
• Tool Tools (cf. audio-book, e-book, book-book)
– Computational Reproducibility? Whole-Tale (VMs++)!
– Modeling (Dataflow) Dependencies? YesWorkflow!
– Terminological Confusion? EulerX! ("Semantics")
• A Case Study
– Whole Tale Summer Internship (Santiago Núñez-Corrales):
– Reproducibility in Ecological Niche Models: the case of Phillips et al. (2006)
Ludäscher & Núñez-Corrales
Whole Tale
6. Computational Provenance …
• Origin and processing history of artifacts
– data products, figures, …
– also: the underlying workflow
⇒ understand methods, dataflow, and dependencies
⇒ role of computational provenance in HoH!?
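The idea of recording an artifact's origin and processing history can be sketched in a few lines of Python. This is a minimal illustration, not any particular provenance tool: each step logs digests of its inputs and output plus a timestamp, so a result can later be traced back through the steps that produced it.

```python
import hashlib
import json
import time

def run_step(name, func, inputs, trace):
    """Run one processing step and append a provenance record to the trace."""
    output = func(*inputs.values())
    record = {
        "step": name,
        # short content digests stand in for full data copies
        "inputs": {k: hashlib.sha256(repr(v).encode()).hexdigest()[:8]
                   for k, v in inputs.items()},
        "output_digest": hashlib.sha256(repr(output).encode()).hexdigest()[:8],
        "timestamp": time.time(),
    }
    trace.append(record)
    return output

trace = []
cleaned = run_step("clean", lambda xs: [x for x in xs if x is not None],
                   {"raw": [1, None, 3]}, trace)
mean = run_step("mean", lambda xs: sum(xs) / len(xs),
                {"cleaned": cleaned}, trace)
# the trace now documents which steps led to the final number
print(json.dumps([r["step"] for r in trace]))
```

Even this toy trace answers the two basic provenance questions from the slide: what data went in, and through which processing steps the result was derived.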
[Figure: report cover, "Climate Change Impacts in the United States", U.S. National Climate Assessment, U.S. Global Change Research Program]
9. Provenance in DataONE
A DataONE search (here: “grass”) yields different packages with Data Provenance
(not covered: Semantic Search)
10. Exploring Provenance in DataONE
• Let's go there ⇒ Mark Carls. 2017. Analysis of hydrocarbons following the Exxon Valdez oil spill, Gulf of Alaska, 1989–2014. Gulf of Alaska Data Portal. urn:uuid:3249ada0-afe3-4dd6-875e-0f7928a4c171.
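Searches like the "grass" example can also be issued programmatically against DataONE's coordinating-node query service. The endpoint and parameters below reflect my understanding of the DataONE Solr query API and should be verified against the current API documentation; the sketch only constructs the request URL.

```python
from urllib.parse import urlencode

# Assumed DataONE coordinating-node Solr endpoint (verify against current docs).
BASE = "https://cn.dataone.org/cn/v2/query/solr/"

def dataone_search_url(keyword, rows=10):
    """Build a Solr keyword-search URL over the DataONE metadata index."""
    params = {"q": keyword, "rows": rows, "wt": "json"}
    return BASE + "?" + urlencode(params)

url = dataone_search_url("grass")
print(url)
```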
13. Adding YesWorkflow to DataONE
• Yaxing's script with input and output products
• Christopher's YesWorkflow model
• Christopher using Yaxing's outputs as inputs for his script
• Christopher's results can be traced back all the way to Yaxing's input
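YesWorkflow recovers a dataflow model purely from structured comments (`@begin`/`@end`, `@in`/`@out`, `@uri`) embedded in an otherwise ordinary script. A minimal script in that style, with illustrative file names and a placeholder model step:

```python
# @begin main
# @in  species_occurrences @uri file:occurrences.csv
# @out prediction_map @uri file:prediction.csv

# @begin clean_data
# @in species_occurrences
# @out cleaned_occurrences
def clean_data(rows):
    # drop records that lack coordinates
    return [r for r in rows
            if r.get("lat") is not None and r.get("lon") is not None]
# @end clean_data

# @begin fit_model
# @in cleaned_occurrences
# @out prediction_map
def fit_model(rows):
    # placeholder for the actual model fit
    return {"n_records": len(rows)}
# @end fit_model

rows = [{"lat": 9.9, "lon": -84.1}, {"lat": None, "lon": None}]
result = fit_model(clean_data(rows))
print(result)
# @end main
```

Running `yw graph` over such a script yields the kind of box-and-arrow model shown on the slide, which is what lets one trace Christopher's results back to Yaxing's input.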
17. To succeed or to fail? What do we gain?
• Successful reproducibility study:
– increases trust in the prior study :)
– … but no surprises :(
• Failed reproducibility study:
– decreases trust in (or falsifies) the prior study :(
– … but a surprising failure yields new information/knowledge :)
• Learning from failures!
– not a totally new idea …
– What does a positive vs. negative result mean, anyway?
– When developing software and tools: fail early, fail often …
21. But first: Some Tools ("Cyberinfrastructure")
• SKOPE: system and tools to discover, access, analyze, and visualize paleoenvironmental data
– unprecedented ability to explore provenance (a detailed, comprehensible record of the computational derivation of results)
– for researchers, tinkerers, and modelers
• Whole Tale:
– leverage and contribute to existing CI to support the whole tale ("living paper"), from workflow run to scholarly publication
– integrate tools & CI (DataONE, Globus, iRODS, NDS, …) to simplify use and promote best practices
– driven by science WGs (Archaeology/SKOPE, materials science, astronomy, biology, …)
Ludäscher: Provenance Back & Forth
25. Project Goals (… Reproducibility in Ecological Niche Models …)
● Try to reproduce one set of results reported in the literature using maximum entropy methods (MaxEnt) within the Whole Tale environment
○ Phillips, S. J., Anderson, R. P., & Schapire, R. E. (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190(3–4), 231–259.
● Determine whether existing software tools focus more on the scientific modeling problem than on software usage, while still covering reproducibility concerns
○ Not possible with existing tools, which are either incomplete or desktop-based, and hence not comparable
● Build scientific software for ecological niche modeling that helps users diversify and trace their stories
○ Introspection-based model
26. intros-MaxEnt: view in PRIMAD++
(columns: Action | Parameter | Raw data | Platform/Stack | Implementation | Method | Research Objective | Actor | Gain)
• Re-code [(x) x]: Run MaxEnt models in the Whole Tale
• Validate [(x) (x) (x) (x) x]: Determine MaxEnt robustness factors
• Re-use [x]: Increase the user base for MaxEnt methods
• Independent [x x]: Collectively verify MaxEnt experiments
• Introspect [(x) (x) x]: Explore and adjust model contents
• Diff [(x) (x) (x) x]: Test hypotheses dependent on state change
• Trace (log) [(x) (x) (x) (x) x]: Capture time-dependent decision-modeling pathways
• Package [(x) (x) (x) (x) x]: Provide a zero cold-start entry for experiments
Freire, J., Fuhr, N., & Rauber, A. (2016). Reproducibility of data-oriented experiments in e-Science (Dagstuhl Seminar 16041). Dagstuhl Reports, 6(1). Schloss Dagstuhl–Leibniz-Zentrum für Informatik.
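The PRIMAD reading of "what varies" can be made executable. The sketch below maps a set of varied dimensions to a named reproducibility scenario; the scenario names follow common usage around PRIMAD rather than a fixed standard, and the single-dimension cases are illustrative.

```python
# PRIMAD dimensions: Platform, Research objective, Implementation,
# Method, Actor, Data (plus Parameters in the PRIMAD++ table above).
SCENARIOS = {
    frozenset(): "repetition (nothing varies)",
    frozenset({"actor"}): "independent verification",
    frozenset({"platform"}): "portability check",
    frozenset({"implementation"}): "re-code",
    frozenset({"data"}): "generalizability check",
    frozenset({"parameters"}): "robustness / sensitivity check",
}

def classify(varied):
    """Name the reproducibility scenario implied by the varied dimensions."""
    return SCENARIOS.get(frozenset(varied),
                         "mixed scenario: vary " + ", ".join(sorted(varied)))

print(classify({"actor"}))
print(classify({"data", "method"}))
```

The point of the table, and of this sketch, is that "the gain" of a reproducibility experiment depends entirely on which dimensions were held fixed and which were wiggled.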
27. Ecological niche models …
1. Positive observations (i.e., presence-only data) suffice to compute a distribution of a species
2. The likelihood of the presence of an individual depends on biologically relevant environmental factors
3. Interactions between species can be abstracted as environmental factors, and hence are not modeled explicitly
4. The distribution is stated in terms of the probability of finding a member of the species at the locations of interest
5. An exact fit is not a good fit, but rather an overfit
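Assumption 4, a per-location probability of presence, is exactly what maximum entropy modeling delivers: among all distributions over locations that match the empirical feature averages of the presence records, pick the one with maximum entropy, which is a Gibbs distribution p_i ∝ exp(λ·f_i). A toy one-feature fit via bisection on λ, as an illustrative sketch and not the Phillips et al. implementation:

```python
import math

def gibbs(lam, features):
    """Maximum-entropy (Gibbs) distribution p_i proportional to exp(lam * f_i)."""
    w = [math.exp(lam * f) for f in features]
    z = sum(w)
    return [x / z for x in w]

def fit_lambda(features, target_mean, lo=-50.0, hi=50.0, iters=100):
    """Bisection on lambda: E_p[f] is monotone in lambda, so we can
    solve E_p[f] = empirical mean of f over the presence records."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        p = gibbs(mid, features)
        mean = sum(pi * f for pi, f in zip(p, features))
        if mean < target_mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 4 grid cells, one environmental feature each; presences were observed
# at the cells with feature values 0.8 and 0.6, so the empirical mean is 0.7
features = [0.2, 0.4, 0.6, 0.8]
lam = fit_lambda(features, target_mean=0.7)
p = gibbs(lam, features)
print([round(x, 3) for x in p])  # higher probability at higher feature values
```

Assumption 5 then corresponds to the regularization in real MaxEnt: constraints are matched only up to a tolerance, precisely to avoid the "exact fit" overfitting named above.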
38. Summary of Outcomes
1. Able to execute a version of MaxEnt with the original data from Phillips et al. (2006) within the Whole Tale
a. Stated in terms of a regularized support vector machine (complex code!)
b. Discovered problems with reproducibility and how to evaluate it
2. Implemented a tool for batch georeferencing of DarwinCore records based on minimal location data
a. Helpful for assigning geolocation data after taxonomy alignment
b. Discovered the data is much less clean than expected
3. A new "introspective" software version of MaxEnt
a. Available on PyPI
b. Based on a state machine
39. … now what?
• PRIMAD++
– PRIMAD is built on the idea of keeping some things the same and "wiggling" others
– We can start from the "execution stack":
• hardware … operating system … libraries … programming languages … IDEs …
– Then move into the domain:
• … varying datasets, parameters, assumptions …
– Experimental Design++!
• PRIMAD++ HoH (v2?)
• Tools to support
– "higher-order" {data, parameter, method, …} sweeps
– automating these (workflow tools!)
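The "higher-order sweeps" idea is essentially a cross product over everything one chooses to wiggle. A minimal sketch with illustrative axis names, where each combination would become one workflow run:

```python
from itertools import product

# Each axis is one PRIMAD-style dimension we choose to vary.
datasets = ["phillips2006", "resampled"]
regularization = [0.5, 1.0, 2.0]
methods = ["maxent", "logistic"]

def run_experiment(dataset, reg, method):
    # placeholder for a real (workflow-managed) model run
    return {"dataset": dataset, "reg": reg, "method": method}

runs = [run_experiment(d, r, m)
        for d, r, m in product(datasets, regularization, methods)]
print(len(runs))  # 2 * 3 * 2 = 12 configurations
```

A workflow system adds to this skeleton exactly what the slide asks for: automated scheduling of the runs and provenance capture for each configuration.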
43. Taxonomic concept alignment, Andropogon glomeratus–virginicus complex, spanning 11 classifications authored 1889–2015
• 36 unique taxonomic names
• 88 taxonomic concept labels
⇒ name sec. author strings
• Alignment by A.S. Weakley
⇒ row position = congruence
• 1/36 names with a unique 1:1 name:meaning cardinality across all classifications
• Andropogon virginicus
• Source: Franz et al. 2016¹
¹ Franz et al. 2016. Names are not good enough: reasoning over taxonomic change in the Andropogon complex. Semantic Web Journal (IOS). doi:10.3233/SW-160220
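Alignments like Weakley's relate concepts pairwise using the five RCC-5 relations that Euler/X-style reasoning builds on. A minimal sketch, modeling each concept as a set of hypothetical specimen identifiers (the memberships below are invented for illustration):

```python
def rcc5(a, b):
    """Classify two taxonomic concepts, modeled as member sets,
    into one of the five RCC-5 relations used in concept alignment."""
    if a == b:
        return "congruent (==)"
    if a < b:                 # proper subset
        return "included in (<)"
    if a > b:                 # proper superset
        return "includes (>)"
    if a & b:                 # non-empty intersection, neither contains the other
        return "overlaps (><)"
    return "disjoint (!)"

# Hypothetical membership of two authors' concepts of "A. virginicus"
virginicus_1889 = {"s1", "s2", "s3"}
virginicus_2015 = {"s2", "s3", "s4"}
print(rcc5(virginicus_1889, virginicus_2015))
```

This is why "names are not good enough": the same name used in 1889 and 2015 can denote merely overlapping concepts, and only 1 of the 36 names in this complex keeps a congruent meaning across all classifications.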
45. Half-Smokes in DC: Typical for the Northeast? … or the South!? (A tale of two taxonomies: NDC vs. CEN)
"…in the face of incompatible information or data structures among users or among those specifying the system, attempts to create unitary knowledge categories are futile. Rather, parallel or multiple representational forms are required" [Bowker & Star, 2000, p. 159]
[Two maps: National Diversity Council (NDC) map with regions West, Southwest, Southeast, Midwest, and Northeast; US Census Bureau (CEN) map with regions West, South, Midwest, and Northeast]
Source: Yi-Yun (Jessica) Cheng (PhD student, iSchool @ Illinois)
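The "parallel representational forms" point can be made concrete: the same place maps to different regions under the two classifications. The CEN assignments below follow the Census Bureau's four-region scheme; the NDC assignments are illustrative readings of the slide's map, not authoritative data.

```python
# Two classifications of the same places (partial, illustrative)
NDC = {"DC": "Northeast", "TX": "Southwest", "GA": "Southeast", "OH": "Midwest"}
CEN = {"DC": "South",     "TX": "South",     "GA": "South",     "OH": "Midwest"}

def disagreements(t1, t2):
    """Places on which the two taxonomies assign different regions."""
    return {s for s in t1.keys() & t2.keys() if t1[s] != t2[s]}

print(sorted(disagreements(NDC, CEN)))  # is DC's half-smoke Northeast or South?
```

As with the Andropogon names, neither taxonomy is "wrong"; keeping both representations, plus an explicit mapping of their disagreements, is what Bowker & Star's argument recommends.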