Research Seminar Talk (online) at KRR@UP (Uni Potsdam) on Dec 6, 2023, loosely based on a paper with the same title at the 7th Workshop on Advances in Argumentation in Artificial Intelligence (AI3)
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion! (Bertram Ludäscher)
7th Workshop on Advances in Argumentation in Artificial Intelligence (AI3) at
AIxIA 2023: 22nd International Conference of the Italian Association for Artificial Intelligence.
Presentation of a paper by Bertram Ludäscher, Shawn Bowers, and Yilin Xia, given virtually on November 9, 2023.
Presentation given at LogicBlox, Atlanta. December 2012. See also: Köhler, Sven, Bertram Ludäscher, and Daniel Zinn. 2013. “First-Order Provenance Games.” In Search of Elegance in the Theory and Practice of Computation, edited by Val Tannen, Limsoon Wong, Leonid Libkin, Wenfei Fan, Wang-Chiew Tan, and Michael Fourman, 8000:382–99. Lecture Notes in Computer Science. Springer Berlin Heidelberg.
1. The document examines a local Nigerian game called "tsorry checkerboard" and applies group theory concepts.
2. The game is played on a 2x2 board with each player having up to 3 pieces, and the goal is to line up all three of one's pieces horizontally, vertically, or diagonally.
3. The possible moves of each piece (vertical, horizontal, diagonal, or staying in place) form a Klein four-group, satisfying the group properties of closure, associativity, identity, and inverses. Therefore, group theory can be applied to model the game.
The document discusses probability and provides examples to explain key concepts such as sample space, probability calculations, independent and dependent events, odds, and more. Probability is defined as the chance of an event occurring and is calculated by taking the number of outcomes in the event and dividing by the total number of possible outcomes. A variety of examples using coins, cards, and dice help illustrate how to determine probabilities and odds for different scenarios.
This document discusses using game theory to model provenance. It presents provenance as games where positions are won, lost, or drawn based on successor positions. Solving these provenance games determines why a position was won or lost. The document proposes that provenance games can provide a uniform approach for both why-provenance and why-not provenance. It also notes that constraints may be needed to handle domain dependencies in provenance games.
The document discusses a study that found cooperating with others activates reward centers in the brain. Researchers used brain imaging to study women playing a game where they could choose cooperation or not. Surprisingly, the women experienced the most pleasure when both chose cooperation over acting selfishly. The longer they cooperated, the stronger the brain's reward response became. This suggests humans are wired to experience joy from cooperation with others.
Reconciling Conflicting Data Curation Actions: Transparency Through Argument... (Bertram Ludäscher)
Yilin Xia (yilinx2@illinois.edu),
Shawn Bowers (bowers@gonzaga.edu),
Lan Li (lanl2@illinois.edu), and
Bertram Ludäscher (ludaesch@illinois.edu)
Presented at IDCC-2024 in Edinburgh.
ABSTRACT. We propose a new approach for modeling and reconciling conflicting data cleaning actions. Such conflicts arise naturally in collaborative data curation settings where multiple experts work independently and then aim to put their efforts together to improve and accelerate data cleaning. The key idea of our approach is to model conflicting updates as a formal argumentation framework (AF). Such argumentation frameworks can be automatically analyzed and solved by translating them to a logic program P_AF whose declarative semantics yield a transparent solution with many desirable properties, e.g., uncontroversial updates are accepted, unjustified ones are rejected, and the remaining ambiguities are exposed and presented to users for further analysis. After motivating the problem, we introduce our approach and illustrate it with a detailed running example introducing both well-founded and stable semantics to help understand the AF solutions. We have begun to develop open source tools and Jupyter notebooks that demonstrate the practicality of our approach. In future work we plan to develop a toolkit for conflict resolution that can be used in conjunction with OpenRefine, a popular interactive data cleaning tool.
[Flashback] Integration of Active and Deductive Database Rules (Bertram Ludäscher)
Slides of my PhD defense at the University of Freiburg, 1998.
Statelog and similar state-oriented extensions of Datalog have seen renewed interest subsequently, e.g., see
[Hel10] Hellerstein, J.M., 2010. The declarative imperative: experiences and conjectures in distributed logic. ACM SIGMOD Record, 39(1), pp.5-19.
[AMC+11] Alvaro, P., Marczak, W.R., Conway, N., Hellerstein, J.M., Maier, D. and Sears, R., 2011. Dedalus: Datalog in time and space. In Datalog Reloaded: First International Workshop, Datalog 2010, Oxford, UK, March 16-19, 2010. Revised Selected Papers (pp. 262-281). Springer.
[Flashback] Statelog: Integration of Active & Deductive Database Rules (Bertram Ludäscher)
This document discusses Statelog, which integrates active and deductive database rules. Statelog allows both active rules, which trigger actions and modify the database, and deductive rules, which derive new facts. It defines the semantics of different types of rules and how they interact. Statelog guarantees termination of rule evaluation at both compile-time and runtime through techniques like state-stratification and delta-monotonicity. It can express complex temporal queries and supports features like nested transactions.
Answering More Questions with Provenance and Query Patterns (Bertram Ludäscher)
This document discusses using provenance information to improve transparency and reproducibility in research. It begins by asking questions about the input data, methods, and parameter settings used in a study in order to assess its reliability. It then provides examples of how workflow systems can capture provenance at both the design level (prospective provenance) and runtime level (retrospective provenance). These include a Kepler workflow that simulates X-ray data collection and provenance traces captured by DataONE. The document argues that provenance is a critical link between workflow modeling and runtime traces that can increase trust in research findings.
Computational Reproducibility vs. Transparency: Is It FAIR Enough? (Bertram Ludäscher)
Keynote at CLIR Workshop (Webinar): Toward Open, Reproducible, and Reusable Research. February 10, 2021. https://reusableresearch.com/
ABSTRACT. The “reproducibility crisis” has resulted in much interest in methods and tools to improve computational reproducibility. FAIR data principles (data should be findable, accessible, interoperable, and reusable) are also being adapted and evolved to apply to other artifacts, notably computational analyses (scientific workflows, Jupyter notebooks, etc.). The current focus on computational reproducibility of scripts and other computational workflows sometimes overshadows a somewhat neglected and arguably more important issue: transparency of data analysis, including data wrangling and cleaning. In this talk I will ask the question: What information is gained by conducting a reproducibility experiment? This leads to a simple model (PRIMAD) that aims to answer this question by sorting out different scenarios. Finally, I will present some features of Whole-Tale, a computational platform for reproducible and transparent computational experiments.
By Michael Gryk and Bertram Ludäscher. Presented at 2020 JCDL-SIGCM Workshop, August 1, 2020.
ABSTRACT. Conceptual models can serve multiple purposes: communication of information between stakeholders, information abstraction and generalization, and information organization for archival and retrieval. An ongoing research question is how to formally define the fit-for-purpose of a conceptual model as well as to define metrics or tests to determine whether a given model faithfully supports a designated purpose.
This paper summarizes preliminary investigations in this area by presenting toy problems along with different conceptual models for the system under study. It is argued that the different models are adequate in supporting a sophisticated query and yet they adopt different normalization schemes and will differ in expressiveness depending on the implied purpose of the models. As the subtitle suggests, this work is intended to be primarily exploratory as to the constraints a formal system would require in defining the “usefulness”, “expressiveness” and “equivalence” of conceptual models.
From Workflows to Transparent Research Objects and Reproducible Science Tales (Bertram Ludäscher)
The document discusses prospective and retrospective provenance in scientific workflows. Prospective provenance involves modeling the workflow design, while retrospective provenance records the workflow execution. The YesWorkflow and noWorkflow tools demonstrate these two types of provenance. YesWorkflow annotates scripts to recreate a workflow model from the script, while noWorkflow records step-by-step runtime logs. Combining both approaches provides a more complete view of a workflow's provenance. Maintaining provenance is important for reproducibility and understanding the origins of scientific results.
From Research Objects to Reproducible Science Tales (Bertram Ludäscher)
University of Southampton. Electronics & Computer Science. Research Seminar (Invited Talk).
TITLE: From Research Objects to Reproducible Science Tales
ABSTRACT. Rumor has it that there is a reproducibility crisis in science. Or maybe there are multiple crises? What do we mean by reproducibility and replicability anyways? In this talk I will first make an attempt at sorting out some of the terminological confusion in this area, focusing on computational aspects. The PRIMAD model is another attempt to describe different aspects of reproducibility studies by focusing on the "delta" between those studies and the original study. In addition to these more theoretical investigations, I will discuss practical efforts to create more reproducible and more transparent computational platforms such as the one developed by the Whole-Tale project: here 'tales' are executable research objects that may combine data, code, runtime environments, and narratives (i.e., the traditional "science story"). I will conclude with some thoughts about the remaining challenges and opportunities to bridge the large conceptual gaps that continue to exist despite the recognition of problems of reproducibility and transparency in science.
ABOUT the Speaker. Bertram Ludäscher is a professor at the School of Information Sciences at the University of Illinois, Urbana-Champaign and a faculty affiliate with the National Center for Supercomputing Applications (NCSA) and the Department of Computer Science at Illinois. Until 2014 he was a professor at the Department of Computer Science at the University of California, Davis. His research interests range from practical questions in scientific data and workflow management, to database theory and knowledge representation and reasoning. Prior to his faculty appointments, he was a research scientist at the San Diego Supercomputer Center (SDSC) and an adjunct faculty at the CSE Department at UC San Diego. He received his M.S. (Dipl.-Inform.) in computer science from the University of Karlsruhe (now K.I.T.), and his PhD (Dr. rer. nat.) from the University of Freiburg, in Germany.
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us (Bertram Ludäscher)
PWE: Datalog & ASP for the Rest of Us discusses using Possible Worlds Explorer (PWE) to make Datalog and Answer Set Programming (ASP) more accessible to non-experts. It covers topics like using provenance to explain query results, capturing rule firings to track provenance, representing provenance as a graph, using states to track derivation rounds, and declarative profiling of Datalog programs. The presentation advocates for tools like PWE that wrap Datalog/ASP engines to combine them with Python ecosystems and allow interactive use in Jupyter notebooks. This makes the languages more approachable and helps users build on existing work by experimenting further.
Deduktive Datenbanken & Logische Programme: Eine kleine Zeitreise (Bertram Ludäscher)
Deductive Databases & Logic Programs: Back to the Future!
Colloquium talk on the occasion of the retirement of Prof. Dr. Georg Lausen, May 10th, 2019, Universität Freiburg, Germany
Dissecting Reproducibility: A case study with ecological niche models in th... (Bertram Ludäscher)
1) The document describes a workshop on research synthesis and reproducibility.
2) It discusses challenges with reproducibility in science and proposes provenance and conceptual tools like PRIMAD to help address these challenges.
3) The document presents a case study where an intern was able to reproduce results from a 2006 ecological niche modeling paper using the Whole Tale environment and MaxEnt software, demonstrating computational reproducibility.
Incremental Recomputation: Those who cannot remember the past are condemned ... (Bertram Ludäscher)
Talk given at "Problems and techniques for Incremental Re-computation: provenance and beyond".
A workshop co-organized with Provenance Week 2018
King's College London, 12th and 13th July, 2018
Organizers: Paolo Missier (Newcastle University), Tanu Malik (DePaul University), Jacek Cala (Newcastle University)
Abstract: Incremental recomputation has applications, e.g., in databases and workflow systems. Methods and algorithms for recomputation depend on the underlying model of computation (MoC) and model of provenance (MoP). This relation is explored with some examples from databases and workflow systems.
Validation and Inference of Schema-Level Workflow Data-Dependency Annotations (Bertram Ludäscher)
Presentation slides of paper by Shawn Bowers, Timothy McPhillips, and Bertram Ludäscher, given by Shawn at Provenance and Annotation of Data and Processes - 7th International Provenance and Annotation Workshop, IPAW 2018, King's College London, UK, July 9-10, 2018.
The paper won the IPAW Best Paper Award: https://twitter.com/kbelhajj/status/1017082775856467968
ABSTRACT. An advantage of scientific workflow systems is their ability to collect runtime provenance information as an execution trace. Traces include the computation steps invoked as part of the workflow run along with the corresponding data consumed and produced by each workflow step. The information captured by a trace is used to infer "lineage" relationships among data items, which can help answer provenance queries to find workflow inputs that were involved in producing specific workflow outputs. Determining lineage relationships, however, requires an understanding of the dependency patterns that exist between each workflow step's inputs and outputs, and this information is often under-specified or generally assumed by workflow systems. For instance, most approaches assume all outputs depend on all inputs, which can lead to lineage "false positives". In prior work, we defined annotations for specifying detailed dependency relationships between inputs and outputs of computation steps. These annotations are used to define corresponding rules for inferring fine-grained data dependencies from a trace. In this paper, we extend our previous work by considering the impact of dependency annotations on workflow specifications. In particular, we provide a reasoning framework to ensure the set of dependency annotations on a workflow specification is consistent. The framework can also infer a complete set of annotations given a partially annotated workflow. Finally, we describe an implementation of the reasoning framework using answer-set programming.
An ontology-driven framework for data transformation in scientific workflows (Bertram Ludäscher)
Presentation given by Bertram at the Data Integration in the Life Sciences (DILS) Workshop in Leipzig, Germany, 2004.
Reference:
Bowers, Shawn, and Bertram Ludäscher. "An ontology-driven framework for data transformation in scientific workflows." In International Workshop on Data Integration in the Life Sciences (DILS), pp. 1-16. Springer, 2004.
So this isn't new -- but still relevant :-)
ABSTRACT. Ecologists spend considerable effort integrating heterogeneous data for statistical analyses and simulations, for example, to run and test predictive models. Our research is focused on reducing this effort by providing data integration and transformation tools, allowing researchers to focus on “real science,” that is, discovering new knowledge through analysis and modeling. This paper defines a generic framework for transforming heterogeneous data within scientific workflows. Our approach relies on a formalized ontology, which serves as a simple, unstructured global schema. In the framework, inputs and outputs of services within scientific workflows can have structural types and separate semantic types (expressions of the target ontology). In addition, a registration mapping can be defined to relate input and output structural types to their corresponding semantic types. Using registration mappings, appropriate data transformations can then be generated for each desired service composition. Here, we describe our proposed framework and an initial implementation for services that consume and produce XML data.
The document describes the Whole Tale platform, which aims to facilitate reproducibility in computational research. Whole Tale allows researchers to package computational narratives, data, code, and provenance information into "tales" that can be shared and re-executed. Key features of Whole Tale include running interactive notebooks, versioning and sharing tales, and integrating provenance tracking tools to provide transparency into computational workflows. The speaker demonstrates several example tales and discusses upcoming Whole Tale features and applications in different domains like archaeology, astronomy, and materials science.
From Provenance Standards and Tools to Queries and Actionable Provenance (Bertram Ludäscher)
The document discusses computational provenance and the need for tracking data lineage and workflow processes. It presents several tools and projects that aim to capture and manage provenance information, including DataONE, SKOPE, KURATOR, WHOLE-TALE, and YesWorkflow. The document argues that provenance is important for understanding what happened in computational and data-driven research in order to ensure transparency and reproducibility.
Wild Ideas at TDWG'17: Embrace multiple possible worlds; abandon techno-ligion (Bertram Ludäscher)
The document discusses two ideas: 1) Embracing multiple possible worlds by using techniques like answer set programming to represent alternative scenarios rather than a single consensus view. 2) Abandoning strict adherence to technology stacks and standards ("techno-ligion") by focusing on simple powerful solutions, using natural language when possible, and paying a fee each time a complex technical term is used. It suggests using techniques like technology golf to explore problems through minimal programs instead of lengthy debates over formal representations.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... (sameer shah)
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
State of Artificial Intelligence Report 2023 (kuntobimo2016)
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W... (Social Samosa)
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Learn SQL from basic queries to advanced queries (manishkhaire30)
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Global Situational Awareness of A.I. and where it's headed (vikram sood)
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Analysis insight about a Flyball dog competition team's performance (roli9797)
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Unleashing the Power of Data: Choosing a Trusted Analytics Platform (Enterprise Wired)
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad, and Procure.FYI's Co-Founder.
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion
1. Games, Queries, and Argumentation Frameworks:
Time for a Family Reunion!
Bertram Ludäscher¹, Shawn Bowers², Yilin Xia¹
¹ School of Information Sciences, University of Illinois, Urbana-Champaign, IL, USA
² Department of Computer Science, Gonzaga University, WA, USA
{ludaesch,yilinx2}@illinois.edu
bowers@gonzaga.edu
7th Workshop on Advances in Argumentation in Artificial Intelligence (AI3)
AIxIA 2023: 22nd International Conference of the Italian Association for Artificial Intelligence
2. Outline
1. What’s this? (a puzzle)
2. Identical Twins & Some History
3. The Correspondence
4. Harvesting Time (translational research)
5. Family Reunion & Clingo clinic :-)
3. What's this? (an easy puzzle ..)
• q --> e, e, e.
• q(X,Y) :- e(X,A), e(A,B), e(B,Y).
• Input: digraph with edges e(X,Y)
• Output: binary answer relation q(X,Y)
• q(X,Y) iff there is a path of length=3 from X to Y in e/2.
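For readers who want to try this, here is a minimal clingo sketch of the puzzle query; the four-node chain is a made-up example, not from the slides:

    % a tiny example graph: a -> b -> c -> d
    e(a,b). e(b,c). e(c,d).
    % q(X,Y): there is a path of length 3 from X to Y
    q(X,Y) :- e(X,A), e(A,B), e(B,Y).
    #show q/2.

Running clingo on this yields the single answer q(a,d).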
4. What's (not) in a query?
• q(X,Y) :- e(X,A), e(A,B), e(B,Y).
• We can interpret e/2 differently => output q/2 is a different relation
• e/2 ≅ parent/2
• => q/2 ≅ great_grandparent/2
• e/2 ≅ one_hour_trail/2
• => q/2 ≅ three_hour_hike/2
Bonus question: How many patterns are there for hikes? For great-grandparents?
5. What's this? (a harder query puzzle ..)
• q(X) :- e(X,Y), not q(Y).
• Standard LP semantics:
• stratified
• stable models
• well-founded
6. What's this? (a harder query puzzle ..)
• q(X) :- e(X,Y), not q(Y).
• Stable models => (complement of) graph kernels of G = (V,E).
• K ⊆ V is a kernel if K is independent and dominating (aka absorbing).
• out(X) :- e(X,Y), not out(Y).
• in(X) :- not out(X).
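A runnable clingo sketch of the kernel reading, using a hypothetical 4-cycle (its two kernels {1,3} and {2,4} show up as the two stable models; a 3-cycle instead has no kernel, hence no stable model):

    % kernels of a digraph as stable models
    e(1,2). e(2,3). e(3,4). e(4,1).
    v(X) :- e(X,_).  v(X) :- e(_,X).
    out(X) :- e(X,Y), not out(Y).   % out = complement of the kernel
    in(X)  :- v(X), not out(X).     % in  = the kernel itself
    #show in/1.

Calling clingo with answer-set enumeration (clingo 0) on this file returns both answer sets, {in(1), in(3)} and {in(2), in(4)}.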
7. What's this? (a harder query puzzle ..)
• q(X) :- e(X,Y), not q(Y).
• Well-founded model => solves the game G = (Positions,Move).
• win(X) :- move(X,Y), not win(Y).
8. What's this? (a harder query puzzle ..)
• q(X) :- e(X,Y), not q(Y).
• Stable and Well-founded model
• => solves the Argumentation Framework AF = (Args, Attacks).
• defeated(X) :- attacks(Y,X), not defeated(Y).
• defeated(X) :- attacked_by(X,Y), not defeated(Y).
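The same rule at work on a small, hypothetical AF where a and b attack each other and b also attacks c; its two stable extensions {a,c} and {b} appear as the two answer sets:

    attacks(a,b). attacks(b,a). attacks(b,c).
    arg(X) :- attacks(X,_).  arg(X) :- attacks(_,X).
    defeated(X) :- attacks(Y,X), not defeated(Y).
    accepted(X) :- arg(X), not defeated(X).   % the accepted arguments form a stable extension
    #show accepted/1.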
9. Summary: One rule to rule them all …
• q(X) :- e(X,Y), not q(Y).
• win(X) :- move(X,Y), not win(Y).
• defeated(X) :- attacked_by(X,Y), not defeated(Y).
• kerC(X) :- edge(X,Y), not kerC(Y).
• Has this been known in AF?
• … or hiding in plain sight?
18. A question from the DB-Theory “bible” [AHV95]
Well-founded (WF-)Datalog queries have 3-valued models in general. Can every query Q in WF-Datalog-3 be rewritten into an equivalent Q’ in WF-Datalog-2?
=> Total WF-Datalog-2 =?= Partial WF-Datalog-3?
Example: Can we detect draws for GAME?
win(X) :- move(X,Y), not win(Y).
19. … answering the question! [FKL-ICDT’97]
All you need is GAME!
(i.e., the win-move / GAME query)
20. … answering the question!
The tricky bit!
Useful notion: Length of a position!
All you need is DRAW-FREE GAMEs!
(i.e., the win-move / GAME query, … but draws can be detected and avoided!)
21. Win-Move vs Argumentation Frameworks
% We understand this now:
• win(X) :- move(X,Y), not win(Y).
% This is the mother of AF rules:
• defeated(X) :- attacks(Y,X), not defeated(Y).
% But they are both equivalent to this:
• q(X) :- edge(X,Y), not q(Y).
• GAME: q = win, edge = move
• AF: q = defeated, edge = attacks⁻¹ (= attacked_by)
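In clingo the correspondence amounts to a one-line bridge view on each side; a sketch, made concrete here with a hypothetical mutual-attack AF:

    attacks(a,b). attacks(b,a).
    edge(X,Y) :- attacks(Y,X).      % AF bridge: edge = attacks^-1
    % (for a game instance one would use instead: edge(X,Y) :- move(X,Y).)
    q(X) :- edge(X,Y), not q(Y).    % q = defeated here; q = win for games
    #show q/1.

The two answer sets, q(a) and q(b), match the AF's two stable extensions {b} and {a}.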
27. Harvesting Time: Not all edges are created equal!
• Notions from games translate to AF via the natural correspondence!
• Length of a position (i.e., argument)
• Type of an edge (not all edges are created equal)
• winning, delaying, drawing, bad
28. Harvesting Time!
• Provenance of a position (i.e., argument)
• ... = Explanations of the labeling
• … can be computed via Regular Path Queries (RPQs):
• prov(X,Y) :- path(X, green.(red.green)*, Y)
• Question:
• What is the provenance of games?
• Answer:
• Solve the game (AF) and look!
• Provenance/Explanations for free!
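The RPQ unfolds into two Datalog rules; a sketch, assuming the solved game's edges have been labeled as facts green(X,Y) and red(X,Y):

    % prov(X,Y): Y is reachable from X along a path matching green.(red.green)*
    prov(X,Y) :- green(X,Y).
    prov(X,Y) :- prov(X,Z), red(Z,W), green(W,Y).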
34. Edge Types => New explanations for Argumentation Frameworks ... !?
Applying this to AF (coming from GAME and provenance) seems new…
35. Finally: Computing WFS with Clingo
• How do you compute the well-founded semantics with an ASP system?
• … should be easy …
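One standard workaround, sketched below: make the alternating fixpoint of win-move explicit with an iteration counter, so the program becomes locally stratified and its unique answer set is the well-founded model. The game graph and the bound n are assumptions (any even n >= 2*|positions|+2 is safe); as a bonus, this also answers the earlier question about detecting draws, since drawn positions are exactly the undefined atoms:

    % hypothetical game graph: a and b chase each other; c moves to dead end d
    move(a,b). move(b,a). move(b,c). move(c,d).
    pos(X) :- move(X,_).  pos(X) :- move(_,X).

    #const n=10.    % even iteration bound, large enough to converge here
    iter(0..n).

    % w(X,I+1): X looks won at step I+1 if some successor looked un-won at step I
    w(X,I+1) :- move(X,Y), iter(I), I < n, not w(Y,I).

    % even steps under-approximate won; odd steps over-approximate (won or drawn)
    won(X)   :- w(X,n).                 % true in the well-founded model
    drawn(X) :- w(X,n-1), not w(X,n).   % undefined in the well-founded model
    lost(X)  :- pos(X), not w(X,n-1).   % false in the well-founded model
    #show won/1. #show drawn/1. #show lost/1.

Clingo returns the single answer set won(c), drawn(a), drawn(b), lost(d), i.e., the well-founded labeling of the example game.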