This document discusses improving scholarly communication through knowledge graphs. It describes current issues with scholarly communication, such as the lack of structure, integration, and machine-readability. Knowledge graphs are proposed as a way to represent scholarly concepts, publications, and data in a structured, linked manner. This would help address issues like reproducibility and duplication, and enable new ways of exploring and querying scholarly knowledge. The document outlines a ScienceGRAPH approach using cognitive knowledge graphs to represent scholarly knowledge at different levels of granularity and to allow intuitive exploration and question answering over semantic representations.
Towards an Open Research Knowledge Graph (Sören Auer)
The document-oriented workflows in science have reached (or already exceeded) the limits of adequacy, as highlighted for example by recent discussions on the increasing proliferation of scientific literature and the reproducibility crisis. Now it is possible to rethink this dominant paradigm of document-centered knowledge exchange and transform it into knowledge-based information flows by representing and expressing knowledge through semantically rich, interlinked knowledge graphs. The core of establishing knowledge-based information flows is the creation and evolution of information models for building a common understanding of data and information between the various stakeholders, as well as the integration of these technologies into the infrastructure and processes of search and knowledge exchange in the research library of the future. By integrating these information models into existing and new research infrastructure services, the information structures that are currently still implicit and deeply hidden in documents can be made explicit and directly usable. This has the potential to revolutionize scientific work, because information and research results can be seamlessly interlinked with each other and better mapped to complex information needs. Research results also become directly comparable and easier to reuse.
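As a minimal illustration of the idea (not taken from the talk), a research result can be expressed as subject-predicate-object statements and queried directly; all identifiers and property names below are invented for the sketch:

```python
# Hypothetical sketch: a research contribution as (subject, predicate, object)
# triples in the spirit of RDF. All IRIs and property names are illustrative,
# not from an actual ORKG vocabulary.
triples = [
    ("paper:auer2018orkg", "dc:title", "Towards an Open Research Knowledge Graph"),
    ("paper:auer2018orkg", "orkg:hasContribution", "contrib:1"),
    ("contrib:1", "orkg:addressesProblem", "problem:reproducibility-crisis"),
    ("contrib:1", "orkg:usesMethod", "method:knowledge-graph"),
]

def objects_of(subject, predicate):
    """Return all objects of matching triples - the implicit link made explicit."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of("contrib:1", "orkg:addressesProblem"))
# -> ['problem:reproducibility-crisis']
```

Once statements are explicit like this, results from different papers that share a problem or method node become directly comparable, which is the point the abstract makes.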
Understanding RDF: the Resource Description Framework in Context (1999) (Dan Brickley)
Dan Brickley, 3rd European Commission Metadata Workshop, Luxembourg, April 12, 1999
Understanding RDF: the Resource Description Framework in Context
http://ilrt.org/discovery/2001/01/understanding-rdf/
- Understand what knowledge graphs are used for
- Understand the structure of knowledge graphs (and how it relates to taxonomies and ontologies)
- Understand how knowledge graphs can be created using manual, semi-automatic, and fully automatic methods
- Understand knowledge graphs as a basis for data integration in companies
- Understand knowledge graphs as tools for data governance and data quality management
- Implement and further develop knowledge graphs in companies
- Query and visualize knowledge graphs (including SPARQL and SHACL crash course)
- Use knowledge graphs and machine learning to enable information retrieval, text mining, and document classification with high precision
- Develop digital assistants and question-answering systems based on semantic knowledge graphs
- Understand how knowledge graphs can be combined with text mining and machine learning techniques
- Apply knowledge graphs in practice: Case studies and demo applications
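To give a flavor of the data-quality topic above, here is a toy sketch of the idea behind SHACL-style validation, written in plain Python rather than against a real triple store; the graph and the "shape" (every ex:Person must have an ex:name) are invented for illustration:

```python
# Toy illustration of SHACL-style shape validation: check that every node
# typed as ex:Person has an ex:name. Graph and shape are invented examples.
graph = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),   # bob has no ex:name -> violation
]

def validate_person_name(triples):
    persons = {s for s, p, o in triples if p == "rdf:type" and o == "ex:Person"}
    named = {s for s, p, o in triples if p == "ex:name"}
    return sorted(persons - named)   # nodes violating the "shape"

print(validate_person_name(graph))  # -> ['ex:bob']
```

A real SHACL engine expresses the same constraint declaratively as a shapes graph, but the mechanics (match target nodes, check required properties, report violations) are the same.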
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial (CloudxLab)
Big Data with Hadoop & Spark Training: http://bit.ly/2L4rPmM
This CloudxLab Basics of RDD tutorial helps you understand the basics of RDDs in detail. Below are the topics covered in this tutorial:
1) What is RDD - Resilient Distributed Datasets
2) Creating RDD in Scala
3) RDD Operations - Transformations & Actions
4) RDD Transformations - map() & filter()
5) RDD Actions - take() & saveAsTextFile()
6) Lazy Evaluation & Instant Evaluation
7) Lineage Graph
8) flatMap and Union
9) Scala Transformations - Union
10) Scala Actions - saveAsTextFile(), collect(), take() and count()
11) More Actions - reduce()
12) Can We Use reduce() for Computing Average?
13) Solving Problems with Spark
14) Compute Average and Standard Deviation with Spark
15) Pick Random Samples From a Dataset using Spark
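On topic 12 above: a plain reduce() over raw values cannot compute an average directly, because averaging is not associative. The usual trick, shown here in plain Python rather than on a Spark cluster (the tutorial itself uses Scala), is to reduce (sum, count) pairs and divide at the end:

```python
from functools import reduce

nums = [4, 8, 15, 16, 23, 42]

# Averaging is not associative, so reducing raw values cannot yield the mean.
# Reducing (sum, count) pairs works; the same pattern maps onto Spark's
# rdd.map(lambda x: (x, 1)).reduce(lambda a, b: (a[0]+b[0], a[1]+b[1])).
total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]),
                      ((x, 1) for x in nums))
print(total / count)  # -> 18.0
```

The (sum, count) pair is associative and commutative, which is exactly what a distributed reduce requires when partitions are combined in arbitrary order.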
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ... (Jeff Z. Pan)
Tutorial on "Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge Graphs" presented at the 4th Joint International Conference on Semantic Technologies (JIST2014)
Slides: Knowledge Graphs vs. Property Graphs (DATAVERSITY)
We are in the era of graphs. Graphs are hot. Why? Flexibility is one strong driver: Heterogeneous data, integrating new data sources, and analytics all require flexibility. Graphs deliver it in spades.
Over the last few years, a number of new graph databases came to market. As we start the next decade, dare we say “the semantic twenties,” we also see vendors that never before mentioned graphs starting to position their products and solutions as graphs or graph-based.
Graph databases are one thing, but “Knowledge Graphs” are an even hotter topic. We are often asked to explain Knowledge Graphs.
Today, there are two main graph data models:
• Property Graphs (also known as Labeled Property Graphs)
• RDF Graphs (Resource Description Framework) aka Knowledge Graphs
Other graph data models are possible as well, but over 90 percent of the implementations use one of these two models. In this webinar, we will cover the following:
I. A brief overview of each of the two main graph models noted above
II. Differences in Terminology and Capabilities of these models
III. Strengths and Limitations of each approach
IV. Why Knowledge Graphs provide a strong foundation for Enterprise Data Governance and Metadata Management
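A rough sketch of the modeling difference the webinar covers, in plain Python with invented data: a labeled property graph hangs attributes directly on an edge, while RDF has no edge properties, so the same fact is reified as its own statement node:

```python
# Illustrative only: the fact "Alice knows Bob since 2020" in both models.

# Labeled property graph: the edge itself carries properties.
lpg_edge = {"from": "alice", "type": "KNOWS", "to": "bob",
            "properties": {"since": 2020}}

# RDF: no edge properties, so the statement becomes a node of its own
# (classic RDF reification; RDF-star offers a terser alternative).
rdf_triples = [
    ("ex:stmt1", "rdf:subject", "ex:alice"),
    ("ex:stmt1", "rdf:predicate", "ex:knows"),
    ("ex:stmt1", "rdf:object", "ex:bob"),
    ("ex:stmt1", "ex:since", 2020),
]

stmt = {p: o for s, p, o in rdf_triples if s == "ex:stmt1"}
print(lpg_edge["properties"]["since"] == stmt["ex:since"])  # -> True
```

Both encodings carry the same information; the trade-off is ergonomics (property graphs) versus global identifiers, formal semantics, and standard interchange (RDF).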
Ontology for Knowledge and Data Strategies.pptx (Mike Bennett)
Ontology suffers from an adoption problem. If we are to describe the benefits of ontologies and knowledge graphs, we need to demonstrate how these can contribute to the business. That means addressing the knowledge and data management strategies of the organization.
A knowledge management strategy addresses a range of concerns, including terminology, business semantics, data provenance and quality, information availability and a rigorous treatment of context. Ontology is just one tool among many in the overall strategy for managing knowledge assets and their use.
In this seminar we will unpack the components of an organizational knowledge strategy and show in terms that both business and IT can understand, how different types of ontology fit in to the firm’s wider data management and knowledge strategies, alongside a range of other tools and techniques.
Attendees do not need any prior knowledge of ontology, knowledge graphs or semantic technology, but should ideally have an appreciation of data and knowledge management issues.
Mike Bennett's presentation on Ontology for Knowledge and Data Strategies, as presented at University of Westminster in December 2022.
This covers how ontologies may be used as part of a broader business strategy for knowledge and data management, including how different styles of ontology are needed for different parts of such a strategy.
Build an application upon Semantic Web models. Brief overview of Apache Jena and OWL-API.
Semantic Web course
e-Lite group (https://elite.polito.it)
Politecnico di Torino, 2017
Neo4j is a powerful and expressive tool for storing, querying and manipulating data. However modeling data as graphs is quite different from modeling data under a relational database. In this talk, Michael Hunger will cover modeling business domains using graphs and show how they can be persisted and queried in Neo4j. We'll contrast this approach with the relational model, and discuss the impact on complexity, flexibility and performance.
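One concrete way to see the contrast (data invented): a variable-depth "reports to" traversal needs recursive joins in SQL, but is a simple walk over a graph; in Cypher it would be roughly a variable-length pattern like `MATCH (p)-[:REPORTS_TO*]->(m)`. A plain-Python sketch of the graph-side view:

```python
# Toy org chart as a graph: employee -> manager. Invented data.
reports_to = {"carol": "bob", "bob": "alice"}

def management_chain(person):
    """Walk the REPORTS_TO edges upward until there is no manager."""
    chain = []
    while person in reports_to:
        person = reports_to[person]
        chain.append(person)
    return chain

print(management_chain("carol"))  # -> ['bob', 'alice']
```

The depth of the chain is unknown in advance, which is exactly the case where graph traversal stays simple and relational queries grow awkward.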
Here is my seminar presentation on NoSQL databases. It includes the types of NoSQL databases, merits and demerits of NoSQL databases, examples of NoSQL databases, etc.
For seminar report of NoSQL Databases please contact me: ndc@live.in
Knowledge Graph Research and Innovation Challenges (Sören Auer)
Gives an overview of some challenges regarding the combination of machine learning and knowledge graph technologies, and the vision of devising a concept of Cognitive Knowledge Graphs consisting of graphlets instead of mere entity descriptions.
Westminster Higher Education Forum policy conference Open research data in the UK: https://www.westminsterforumprojects.co.uk/conference/open-research-data-20
Scott Edmunds slides for class 8 from the HKU Data Curation (module MLIM7350 from the Faculty of Education) course covering science data, medical data and ethics, and the FAIR data principles.
IUI 2010: An Informal Summary of the International Conference on Intelligent ... (J S)
Highlights from the main track, poster/demo-session & the VISSW/UDISW/EGIHMI workshops. This is an informal compilation of personal notes from the conference & proceedings, twitter (#iui2010), Ian Ozsvald's blog (http://ianozsvald.com/), and other sources. Citations were not coherently possible, so I chose to stick with links instead. Please let me know if you'd like to see your work more thoroughly referenced.
Collections meet the researcher. Digitalization, disintegration and disillusi... (Jessica Parland-von Essen)
Presentation at the LAM3 seminar in Uppsala, 9th of October 2019. On digitalization, researchers and data in the context of cultural heritage collections. The slides mostly contain headings, but the two last slides include a list of relevant reading on the subject.
Drowning in information – the need of macroscopes for research funding (Andrea Scharnhorst)
Andrea Scharnhorst (2015) Drowning in information – the need of macroscopes for research funding. Presentation at the international conference: PLANNING, PREDICTION, SCENARIOS - Using Simulations and Maps - 2015 Annual EA Conference - 11–12 May 2015 Bonn
Towards Knowledge Graphs of Reusable Research Software Metadata (dgarijo)
Research software is a key asset for understanding, reusing and reproducing results in computational sciences. An increasing amount of software is stored in code repositories, which usually contain human-readable instructions indicating how to use it and set it up. However, developers and researchers often need to spend a significant amount of time to understand how to invoke a software component, prepare data in the required format, and use it in combination with other software. In addition, this time investment makes it challenging to discover and compare software with similar functionality. In this talk I will describe our efforts to address these issues by creating and using Open Knowledge Graphs that describe research software in a machine-readable manner. Our work includes: 1) an ontology that extends schema.org and codemeta, designed to describe software and the specific data formats it uses; 2) an approach to publish software metadata as an open knowledge graph, linked to other Web of Data objects; 3) a framework for automatically extracting metadata from software repositories; and 4) a framework to curate, query, explore and compare research software metadata in a collaborative manner. The talk will illustrate our approach with real-world examples, including a domain application for inspecting and discovering hydrology, agriculture, and economic software models; and the results of our framework when enriching the research software entries in Zenodo.org.
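To make the idea concrete, a software entry of the kind described might look like this as machine-readable metadata. The field names are illustrative, loosely in the style of schema.org/codemeta, and do not reproduce the talk's actual ontology:

```python
import json

# Illustrative metadata record, loosely in the style of schema.org / codemeta.
# Field names and values are invented for this sketch.
software = {
    "@type": "SoftwareSourceCode",
    "name": "hydro-model",
    "description": "Toy hydrology simulation component",
    "codeRepository": "https://example.org/hydro-model",
    "programmingLanguage": "Python",
    "input": [{"format": "CSV", "description": "daily rainfall series"}],
    "output": [{"format": "NetCDF", "description": "runoff grid"}],
}

# Serializing to JSON makes the record publishable and comparable: two tools
# with the same declared input format can be discovered together.
print(json.dumps(software, indent=2)[:30])
```

Describing inputs and outputs explicitly is what enables the comparison and discovery use cases the abstract mentions, since "find software that consumes CSV rainfall data" becomes a structured query.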
The slides that will accompany my live webcast for OpenCon 2014 attendees, all about open data in research. The benefits, the how to (both legally & technically), examples, pitfalls, and the future of open research data.
"Big Data" is a term heard more and more in industry – but what does it really mean? There is a vagueness to the term reminiscent of that experienced in the early days of cloud computing. This has led to a number of implications for various industries and enterprises. These range from identifying the actual skills needed to recruit talent to articulating the requirements of a "big data" project. Secondary implications include difficulties in finding solutions that are appropriate to the problems at hand – versus solutions looking for problems. This presentation will take a look at Big Data and offer the audience some considerations they may use immediately to assess the use of analytics in solving their problems.
The talk begins with an idea of how big "Big Data" can be. This leads to an appreciation of how important "Management Questions" are to assessing analytic needs. The fields of data and analysis have become extremely important and impact nearly all facets of life and business. During the talk we will look at the two pillars of Big Data – Data Warehousing and Predictive Analytics. Then we will explore the open source tools and datasets available to NATO action officers to work in this domain. Use cases relevant to NATO will be explored with the purpose of showing where analytics lies hidden within many of the day-to-day problems of enterprises. The presentation will close with a look at the future. Advances in the area of semantic technologies continue. The much acclaimed consultants at Gartner listed Big Data and Semantic Technologies as the first- and third-ranked top technology trends to modernize information management in the coming decade. They note there is an incredible value "locked inside all this ungoverned and underused information." HQ SACT can leverage this powerful analytic approach to capture requirement trends when establishing acquisition strategies, monitor Priority Shortfall Areas, prepare solicitations, and retrieve meaningful data from archives.
Thoughts on Knowledge Graphs & Deeper Provenance (Paul Groth)
Thinking about the need for deeper provenance for knowledge graphs but also using knowledge graphs to enrich provenance. Presented at https://seminariomirianandres.unirioja.es/sw19/
[DSC Croatia 22] Writing scientific papers about data science projects - Mirj... (DataScienceConferenc1)
Data science is not only about numbers and how to crunch them; it is also about how to communicate project results to various audiences. Scientific journals and conferences are an excellent venue for reaching a wider audience and gathering valuable comments. The talk will answer the questions: How do you structure a scientific paper in data science? What are relevant venues for showcasing your work to gain the most relevant reach? To demystify the process of scientific writing, a case study will be presented: Messy process: Story of the birth of one data science paper.
Slides of my talk at OSLCfest in Stockholm Nov 6, 2019
Video recording of the talk is available here:
https://www.facebook.com/oslcfest/videos/2261640397437958/
Towards digitizing scholarly communication (Sören Auer)
Slides of the VIVO 2016 Conference keynote: Despite the availability of ubiquitous connectivity and information technology, scholarly communication has not changed much in the last hundred years: research findings are still encoded in and decoded from linear, static articles and the possibilities of digitization are rarely used. In this talk, we will discuss strategies for digitizing scholarly communication. This comprises in particular: the use of machine-readable, dynamic content; the description and interlinking of research artifacts using Linked Data; the crowd-sourcing of multilingual educational and learning content. We discuss the relation of these developments to research information systems and how they could become part of an open ecosystem for scholarly communication.
Linked data for Enterprise Data Integration (Sören Auer)
The Web evolves into a Web of Data. In parallel Intranets of large companies will evolve into Data Intranets based on the Linked Data principles. Linked Data has the potential to complement the SOA paradigm with a light-weight, adaptive data integration approach.
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data (Sören Auer)
Over the past 4 years, the Semantic Web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into a very promising candidate for addressing one of the biggest challenges of computer science: the exploitation of the Web as a platform for data and information integration. To translate this initial success into a world-scale reality, a number of research challenges need to be addressed: the performance gap between relational and RDF data management has to be closed, coherence and quality of data published on the Web have to be improved, provenance and trust on the Linked Data Web must be established, and generally the entrance barrier for data publishers and users has to be lowered. This tutorial will discuss approaches for tackling these challenges. As an example of a successful Linked Data project we will present DBpedia, which leverages Wikipedia by extracting structured information and by making this information freely accessible on the Web. The tutorial will also outline some recent advances in DBpedia, such as the mappings Wiki, DBpedia Live, as well as the recently launched DBpedia benchmark.
This presentation gives a brief overview on achievements and challenges of the Data Web and describes different aspects of using the Semantic Data Wiki OntoWiki for Linked Data management.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN (Sérgio Sacani)
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4-0.9 µm) and novel JWST images with 14 filters spanning 0.8-5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3 µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3-31.0 AB mag (5σ, r = 0.1" circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5-15. These objects show compact half-light radii of R_1/2 ∼ 50-200 pc, stellar masses of M⋆ ∼ 10^7-10^8 M⊙, and star-formation rates of SFR ∼ 0.1-1 M⊙ yr^-1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. 
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Multi-source connectivity as the driver of solar wind variability in the heli... (Sérgio Sacani)
The ambient solar wind that fills the heliosphere originates from multiple sources in the solar corona and is highly structured. It is often described as high-speed, relatively homogeneous plasma streams from coronal holes and slow-speed, highly variable streams whose source regions are under debate. A key goal of ESA/NASA's Solar Orbiter mission is to identify solar wind sources and understand what drives the complexity seen in the heliosphere. By combining magnetic field modelling and spectroscopic techniques with high-resolution observations and measurements, we show that the solar wind variability detected in situ by Solar Orbiter in March 2022 is driven by spatio-temporal changes in the magnetic connectivity to multiple sources in the solar atmosphere. The magnetic field footpoints connected to the spacecraft moved from the boundaries of a coronal hole to one active region (12961) and then across to another region (12957). This is reflected in the in situ measurements, which show the transition from fast to highly Alfvénic then to slow solar wind that is disrupted by the arrival of a coronal mass ejection. Our results describe solar wind variability at 0.5 au but are applicable to near-Earth observatories.
Richard's entangled adventures in wonderland (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Introduction:
RNA interference (RNAi) or Post-Transcriptional Gene Silencing (PTGS) is an important biological process for modulating eukaryotic gene expression.
It is highly conserved process of posttranscriptional gene silencing by which double stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) is reported in a wide range of eukaryotes ranging from worms, insects, mammals and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993 Rosalind Lee (Victor Ambros lab) was studying a non- coding gene in C. elegans, lin-4, that was involved in silencing of another gene, lin-14, at the appropriate time in the
development of the worm C. elegans.
Two small transcripts of lin-4 (22nt and 61nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that are causing the silencing by RNA-RNA interactions.
Types of RNAi ( non coding RNA)
MiRNA
Length (23-25 nt)
Trans acting
Binds with target MRNA in mismatch
Translation inhibition
Si RNA
Length 21 nt.
Cis acting
Bind with target Mrna in perfect complementary sequence
Piwi-RNA
Length ; 25 to 36 nt.
Expressed in Germ Cells
Regulates trnasposomes activity
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
THE RISC COMPLEX:
RISC is large(>500kD) RNA multi- protein Binding complex which triggers MRNA degradation in response to MRNA
Unwinding of double stranded Si RNA by ATP independent Helicase
Active component of RISC is Ago proteins( ENDONUCLEASE) which cleave target MRNA.
DICER: endonuclease (RNase Family III)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN :
1.PAZ(PIWI/Argonaute/ Zwille)- Recognition of target MRNA
2.PIWI (p-element induced wimpy Testis)- breaks Phosphodiester bond of mRNA.)RNAse H activity.
MiRNA:
The Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression .
What is greenhouse gasses and how many gasses are there to affect the Earth.moosaasad1975
What are greenhouse gasses how they affect the earth and its environment what is the future of the environment and earth how the weather and the climate effects.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes
on Io’s surface have been monitored from both spacecraft and ground-based telescopes.
Here, we present the highest spatial resolution images of Io ever obtained from a groundbased telescope. These images, acquired by the SHARK-VIS instrument on the Large
Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images
show that a plume deposit from a powerful eruption at Pillan Patera has covered part
of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive
optics at visible wavelengths.
This presentation explores a brief idea about the structural and functional attributes of nucleotides, the structure and function of genetic materials along with the impact of UV rays and pH upon them.
In silico drugs analogue design: novobiocin analogues.pptx
Towards Knowledge Graph based Representation, Augmentation and Exploration of Scholarly Communications
1. Prof. Dr. Sören Auer
Faculty of Electrical Engineering & Computer Science
Leibniz University of Hannover
TIB Technische Informationsbibliothek
Towards Knowledge Graph based
Representation, Augmentation and
Exploration of Scholarly Communications
2. Page 2
Zuse Z3: the beginning of Computing – close to the hardware
Photo: Konrad Zuse Internet Archive / Deutsches Museum / DFG
4. Page 4
We can make things more intuitive
Picture: The illustrated recipes of Lucy Eldridge
http://thefoxisblack.com/2013/07/18/the-illustrated-recipes-of-lucy-eldridge/
10. Page 10
Linked Data Principles
Addressing the neglected third V (Variety)
1. Use URIs to identify the “things” in your data
2. Use http:// URIs so people (and machines) can look them up on the web
3. When a URI is looked up, return a description of the thing in the W3C Resource Description Framework (RDF)
4. Include links to related things
http://www.w3.org/DesignIssues/LinkedData.html
11. Page 11
RDF & Linked Data in a Nutshell
1. Graph-based RDF data model consisting of S-P-O statements (facts)
2. Serialised as RDF triples:
Et-Inf conf:organizes Antrittsvorlesung2019 .
Antrittsvorlesung2019 conf:starts “2019-05-20”^^xsd:date .
Antrittsvorlesung2019 conf:takesPlaceAt dbpedia:Hannover .
3. Publication under a URL on the Web, intranet or extranet
[Figure: graph of the three triples, with Subject, Predicate and Object roles labelled: Et-Inf organizes Antrittsvorlesung2019, which starts 20.05.2019 and takes place in dbpedia:Hannover]
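The S-P-O model on this slide can be sketched directly in code. The snippet below is a minimal illustration using plain Python tuples rather than a real RDF library (an actual application would use something like rdflib); the prefixed terms are taken from the slide.

```python
# Minimal sketch of the RDF data model: each statement is a
# (subject, predicate, object) triple, and a graph is a set of them.
triples = [
    ("Et-Inf", "conf:organizes", "Antrittsvorlesung2019"),
    ("Antrittsvorlesung2019", "conf:starts", "2019-05-20"),
    ("Antrittsvorlesung2019", "conf:takesPlaceAt", "dbpedia:Hannover"),
]

def objects(subject, predicate):
    """Return all objects of statements matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects("Antrittsvorlesung2019", "conf:takesPlaceAt"))  # ['dbpedia:Hannover']
```

Pattern matching over such triples is the basic operation that query languages like SPARQL generalise.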
12. Page 12
Creating Knowledge Graphs with RDF Linked Data
[Figure: two interlinked entity descriptions: DHL (labels “DHL”, “Logistik”, “物流”; full name DHL International GmbH; industry Logistics; headquarters Post Tower) and the Post Tower (height 162.5 m; located in Bonn)]
13. Page 13
Knowledge Graphs – A definition
Fabric of concepts, classes, properties, relationships and entity descriptions
Uses a knowledge representation formalism (RDF, OWL)
Holistic knowledge (multi-domain, multi-source, multiple granularities):
instance data (ground truth),
open (e.g. DBpedia, WikiData), private (e.g. supply chain data) and closed data (product models),
derived, aggregated data,
schema data (vocabularies, ontologies),
metadata (e.g. provenance, versioning, documentation, licensing),
comprehensive taxonomies to categorize entities,
links between internal and external data,
mappings to data stored in other systems and databases
15. WDAqua project vision
● Answer natural language questions
● Exploit knowledge encoded in the Web of Data
● Provide QA services to citizens, communities, and industry
[Figure: question and answer flowing through the Web of Data]
17.–21. Who is the director of Clockwork Orange?
(Slides 17–21 build up the pipeline step by step:)
Understand a spoken question → Analyse question → Find data to answer the question → Present the answer
Data source: [figure]
22.–23. Which publications and health reports are related to Alzheimer in Greece?
The same pipeline applies: Understand a spoken question → Analyse question → Find data to answer the question → Present the answer
Data sources: [figures]
24. WDAqua QA architecture
QA pipeline: Voice to text, NL to SPARQL, Disambiguator, Relation extraction, Answer generation, UI
Service infrastructure: QA pipeline configurator, Service repository, Monitoring, RESTful API, Versioning, Message dispatcher
Data management layer: Query decomposition, Data source selection, Query execution, Benchmarking, Profiling, Data quality, Data generation
Data layer
25. Who is the director of Clockwork Orange?
Understand a spoken question → Analyse question → Find data to answer the question → Present the answer
Demo: http://wdaqua.eu/qa
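The four pipeline steps can be illustrated over a toy in-memory knowledge graph. The sketch below is purely illustrative and is not the WDAqua implementation: the question template, the graph contents, and the matching heuristics are all assumptions made for the example.

```python
# Illustrative template-based QA pipeline over a tiny triple store
# (not the actual WDAqua architecture).
import re

KG = [
    ("A_Clockwork_Orange", "director", "Stanley_Kubrick"),
    ("A_Clockwork_Orange", "author", "Anthony_Burgess"),
]

def analyse(question):
    """Analyse questions of the form 'Who is the <relation> of <entity>?'."""
    m = re.match(r"Who is the (\w+) of (.+)\?", question)
    relation, entity = m.group(1), m.group(2).strip()
    return relation, entity.replace(" ", "_")

def find_answer(relation, entity):
    """Match the extracted pattern against the graph; the loose
    endswith() match tolerates a leading article in the KG label."""
    return [o for s, p, o in KG if p == relation and s.endswith(entity)]

def answer(question):
    relation, entity = analyse(question)
    return ", ".join(r.replace("_", " ") for r in find_answer(relation, entity))

print(answer("Who is the director of Clockwork Orange?"))  # Stanley Kubrick
```

Real systems replace each step with a trained component (speech recognition, entity linking, SPARQL generation), but the control flow is the same.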
26. Page 26
How did information flows
change in the digital era?
34. Page 34
The World of Publishing & Communication has profoundly changed
New means adapted to the new possibilities were developed, e.g. “zooming”, dynamics
Business models changed completely
More focus on data, interlinking of data / services and search in the data
Integration and crowdsourcing play an important role
39. Page 39
Scientific publishing today
WE HAVE [figure: example publication] BUT it:
is mainly based on PDF,
is only partially machine-readable,
does not preserve structure,
does not allow embedding of semantics,
does not facilitate interactivity / dynamicity / repurposing,
…
Source: https://www.researchgate.net/publication/264412537_AGDISTIS_-_Graph-Based_Disambiguation_of_Named_Entities_using_Linked_Data
40. Page 40
Scholarly Communication has not changed (much)
17th century – 19th century – 20th century – 21st century
Meanwhile other information-intense domains were completely disrupted: mail order catalogs, street maps, phone books, …
41. Page 41
Challenges we are facing: We need to rethink the way research is represented and communicated
Digitalisation of Science: data integration and analysis, digital collaboration
Monopolisation by commercial actors: publisher lock-in effects, maximization of profits [1]
Reproducibility Crisis: the majority of experiments are hard or impossible to reproduce [2]
Proliferation of publications: publication output doubled within a decade and continues to rise [3]
Deficiency of Peer Review: deteriorating quality [4], predatory publishing
[1] http://thecostofknowledge.com, https://www.projekt-deal.de
[2] M. Baker: 1,500 scientists lift the lid on reproducibility. Nature, 2016.
[3] Science and Engineering Publication Output Trends. National Science Foundation, 2018.
[4] J. Couzin-Frankel: Secretive and Subjective, Peer Review Proves Resistant to Study. Science, 2013.
42. Page 42
Science and engineering articles by region, country: 2004 and 2014
Proliferation of scientific literature
Source: National Science Foundation: Science and Engineering Publication Output Trends: https://www.nsf.gov/statistics/2018/nsf18300/nsf18300.pdf
44. Page 44
How can we avoid duplication if the terminology, research problems, approaches, methods,
characteristics, evaluations, … are not properly defined and identified?
How would you build an engine / building without properly defining their parts, relationships,
materials, characteristics …?
Duplication and Inefficiency
Source: https://thumbs.worthpoint.com/zoom/images2/1/0316/22/revell-
4-visible-8-engine-plastic_1_d2162f52c3fa3a6f72d2722f6c50b7b2.jpg
Source: http://xnewlook.com/cad-and-revit-3d-design.html/bill-ferguson-portfolio-computer-graphics-games-cad-related-3d-
models-cad-and-revit-design
45. Page 45
Root Cause: Deficiency of Scholarly Communication?
Lack of…
Transparency: information is hidden in text
Integrability: fitting different research results together
Machine assistance: unstructured content is hard to process
Identifiability: of concepts beyond metadata
Collaboration: the one-brain barrier
Overview: scientists look for the needle in the haystack
47. Page 47
Search for CRISPR: > 163,000 results
How good is CRISPR (wrt. precision, safety, cost)?
What is specific about genome editing in insects?
Who has applied it to butterflies?
Source: https://scholar.google.de/scholar?hl=de&as_sdt=0%2C5&q=CRISPR&btnG=, 04.2019
52. Page 52
Chemistry Example: Populating the Graph
1. Original Publication
2. Adaptive Graph Curation & Completion
Author: Robert Reed
Research Problem: Genome editing in Lepidoptera
Methods: CRISPR/cas9
Applied on: Lepidoptera
Experimental Data: https://doi.org/10.5281/zenodo.896916
3. Graph representation
CRISPR/cas9 editing in Lepidoptera (https://doi.org/10.1101/130344)
author: Robert Reed (https://orcid.org/0000-0002-6065-6728)
addresses: Genome editing in Lepidoptera
isEvaluatedWith: CRISPR/cas9
experimental data: https://doi.org/10.5281/zenodo.896916
concept: Genome editing (https://www.wikidata.org/wiki/Q24630389)
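The curation step above can be sketched as a small transformation from a structured record into graph statements. The record fields and DOIs come from the slide; the property names (hasAuthor, addresses, isEvaluatedWith, …) follow the slide's labels but the overall structure is an illustrative assumption, not the ORKG data model.

```python
# Sketch: turning the curated fields of the example paper into
# subject-predicate-object statements (illustrative, not the ORKG schema).
paper = {
    "id": "https://doi.org/10.1101/130344",
    "author": "Robert Reed",  # ORCID: https://orcid.org/0000-0002-6065-6728
    "research_problem": "Genome editing in Lepidoptera",
    "method": "CRISPR/cas9",
    "applied_on": "Lepidoptera",
    "experimental_data": "https://doi.org/10.5281/zenodo.896916",
}

def to_triples(p):
    """Flatten the curated record into graph statements about the paper."""
    s = p["id"]
    return [
        (s, "hasAuthor", p["author"]),
        (s, "addresses", p["research_problem"]),
        (s, "isEvaluatedWith", p["method"]),
        (s, "appliedOn", p["applied_on"]),
        (s, "hasExperimentalData", p["experimental_data"]),
    ]

graph = to_triples(paper)
print(len(graph))  # 5
```

Once in this form, the statements can be merged with other papers' statements and queried uniformly.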
53. Page 53
Cognitive Knowledge Graphs for scholarly knowledge
KGs are proven to capture factual knowledge [1]
Research Challenge: Manage
• Uncertainty & disagreement
• Varying semantic granularity
• Emergence, evolution & provenance
• Integrating existing domain models
But maintain flexibility and simplicity
ScienceGRAPH approach: Cognitive Knowledge Graphs
• Fabric of knowledge molecules – compact, relatively simple, structured units of knowledge
• Can be incrementally enriched, annotated, interlinked …
[1] S. Auer et al.: DBpedia: A nucleus for a web of open data. 6th Int. Semantic Web Conf. (ISWC) – 10-year best paper award.
cf. also knowledge graphs from: WikiData, BBC, Google, Bing, Thomson Reuters, AirBnB, BNY Mellon …
54.–55. From Factual to Cognitive Knowledge Graphs

                Factual (Today)               Cognitive (ScienceGRAPH)
Base entities   Real world                    Conceptual
Granularity     Atomic entities               Interlinked descriptions (molecules) with annotations (provenance)
Evolution       Addition/deletion of facts    Concept drift, varying aggregation levels
Collaboration   Fact enrichment               Emergent semantics
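A knowledge molecule, as characterised in the table, bundles statements with annotations such as provenance. The following is a minimal sketch of that idea; the field names and the enrich() method are illustrative assumptions, not the ScienceGRAPH schema.

```python
# Sketch of a "knowledge molecule": a compact unit of knowledge that can
# be incrementally enriched with statements and provenance annotations.
# (Illustrative structure, not the actual ScienceGRAPH data model.)
from dataclasses import dataclass, field

@dataclass
class Molecule:
    subject: str
    statements: dict = field(default_factory=dict)   # property -> value
    annotations: list = field(default_factory=list)  # provenance etc.

    def enrich(self, prop, value, source=None):
        """Add a statement and record where it came from."""
        self.statements[prop] = value
        if source:
            self.annotations.append((prop, "source", source))

m = Molecule("CRISPR/cas9 editing in Lepidoptera")
m.enrich("addresses", "Genome editing in Lepidoptera",
         source="https://doi.org/10.1101/130344")
print(len(m.annotations))  # 1
```

Because statements and their annotations live together in one unit, molecules can evolve (concept drift, re-aggregation) without losing track of where each claim originated.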
56. Page 56
Exploration and Question Answering
Research Challenge:
• Intuitive exploration leveraging the rich semantic representations
• Answer natural language questions
ScienceGRAPH Approach:
• KG-based QA component integration for dynamic and automated composition of QA pipelines for cognitive knowledge graphs (e.g. following [1])
• Round-trip refinement and integration of search, faceted exploration, question answering and conversational interfaces
Pipeline: Question parsing → Named Entity Recognition (NER) & Linking (NEL) → Relation extraction → Query construction → Query execution → Result rendering
Q: How do different genome editing techniques compare?
SELECT Approach, Feature WHERE {
  Approach addresses GenomeEditing .
  Approach hasFeature Feature }
[1] K. Singh, S. Auer et al.: Why Reinvent the Wheel? Let's Build Question Answering Systems Together. The Web Conference (WWW 2018).
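A query of this shape is a join of two triple patterns. The sketch below executes it as a plain Python nested-loop join over a made-up fragment of the genome-editing example (a SPARQL engine does this for real; the feature strings are invented for illustration).

```python
# Executing the comparison query over a toy graph: a join of the
# patterns (Approach addresses <problem>) and (Approach hasFeature Feature).
graph = [
    ("ZFN", "addresses", "GenomeEditing"),
    ("ZFN", "hasFeature", "site-specificity: 9-18 nt"),
    ("CRISPR/cas9", "addresses", "GenomeEditing"),
    ("CRISPR/cas9", "hasFeature", "ease-of-use: +++"),
]

def compare(g, problem):
    """SELECT Approach, Feature WHERE { Approach addresses <problem> .
    Approach hasFeature Feature } -- as a two-pattern join."""
    approaches = {s for s, p, o in g if p == "addresses" and o == problem}
    return [(s, o) for s, p, o in g if p == "hasFeature" and s in approaches]

for approach, feature in compare(graph, "GenomeEditing"):
    print(approach, "->", feature)
```

Grouping the (approach, feature) result rows by feature dimension is what yields an automatically generated comparison table like the one on the next slide.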
57. Page 57
Result: Automatic Generation of Comparisons / Surveys
Q: How do different genome editing techniques compare?

Engineered Nucleases                                       Site-specificity   Safety   Ease-of-use / cost / speed
zinc finger nucleases (ZFN)                                ++ (9-18 nt)       +        -- ($$$: screening, testing to define efficiency)
transcription activator-like effector nucleases (TALENs)   +++ (9-16 nt)      ++       ++ (easy to engineer; 1 week / a few hundred dollars)
engineered meganucleases                                   +++ (12-40 nt)     0        -- ($$$: protein engineering, high-throughput screening)
CRISPR system/cas9                                         ++ (5-12 nt)       -        +++ (easy to engineer; a few days / less than 200 dollars)
87. [Figure: annotated paper highlighting the authors, the research contribution, the research result, the paper itself, and a continuous variable value]
88. Page 88
More projects – Stay tuned
https://tib.eu
Mailing list / group: https://groups.google.com/forum/#!forum/orkg
Open Research Knowledge Graph: https://orkg.org
ERC Consolidator Grant ScienceGRAPH started in May
Transfer event on International Data Space on June 19: https://events.tib.eu/transfer/
89. Page 89
The Team (TIB/L3S Scientific Data Management and collaborators)
Group Leaders: Prof. (Univ. S. Bolivar) Dr. Maria Esther Vidal
Software Development: Kemele Endris, Farah Karim
PostDocs: Dr. Markus Stocker, Dr. Gábor Kismihók, Dr. Javad Chamanara, Dr. Jennifer D’Souza
Doctoral Researchers: Olga Lezhnina, Allard Oelen, Yaser Jaradeh, Shereif Eid, Manuel Prinz
Project Management: Alex Garatzogianni, Laura Granzow
Collaborators InfAI Leipzig / AKSW: Dr. Michael Martin, Natanael Arndt, Sarven Capadisli, Vitalis Wiens, Wazed Ali
90. Contact
Prof. Dr. Sören Auer
TIB & Leibniz University of Hannover
auer@tib.eu
https://de.linkedin.com/in/soerenauer
https://twitter.com/soerenauer
https://www.xing.com/profile/Soeren_Auer
http://www.researchgate.net/profile/Soeren_Auer
Editor's Notes
The Z3 was the first functional digital computer worldwide, built in 1941 by Konrad Zuse in Berlin in collaboration with Helmut Schreyer. It was implemented in electromagnetic relay technology, with 600 relays for the arithmetic unit and 1,400 relays for the memory unit.
Longquan stoneware incense burner, China, 12th-13th century AD. Part of the Percival David Collection of Chinese Ceramics.
Kemele M. Endris, Mikhail Galkin, Ioanna Lytra, Mohamed Nadjib Mami, Maria-Esther Vidal, Sören Auer:MULDER: Querying the Linked Data Web by Bridging RDF Molecule Templates. DEXA (1) 2017: 3-18
D. Diefenbach, K. Singh, A. Both, D. Cherix, C. Lange, S. Auer. 2017. The Qanary Ecosystem: Getting New Insights by Composing Question Answering Pipelines. Int. Conf. on Web Engineering ICWE 2017.
K. Singh, A. Sethupat, A. Both, S. Shekarpour, I. Lytra, R. Usbeck, A. Vyas, A. Khikmatullaev, D. Punjani, C. Lange, M.-E. Vidal, J. Lehmann, S. Auer: Why Reinvent the Wheel-Let's Build Question Answering Systems Together. The Web Conference (WWW 2018).
S. Shekarpour, E. Marx, S. Auer, A. P. Sheth: RQUERY: Rewriting Natural Language Queries on Knowledge Graphs to Alleviate the Vocabulary Mismatch Problem. AAAI 2017: 3936-3943
D. Lukovnikov, A. Fischer, J. Lehmann, S. Auer: Neural Network-based Question Answering over Knowledge Graphs on Word and Character Level. WWW 2017: 1211-1220
We reproduce a statistical hypothesis test published as a result in this paper, namely https://doi.org/10.1093/eurheartj/ehw333
We represent this result in a machine readable form following the concept description for a kind of statistical hypothesis test of the statistical methods ontology (STATO), namely http://purl.obolibrary.org/obo/STATO_0000304
We store the representation in the ORKG database
Specifically, we replicate, describe in machine readable form, and store in the ORKG database the statistical hypothesis test result highlighted and shown here in human readable form
Note that the relevant information is presented in multiple modalities, both text and images, and none of them is easily read and interpreted by machines
In particular, the relevant data is presented as plot in Figure 1 B (image)
Furthermore, the kind of statistical hypothesis test performed, the fact that IRE binding activity is the dependent variable, and the p-value are all implicit information.
We conduct the statistical hypothesis test in Jupyter using Python
We have IRE binding activity data for two groups, called non-failing heart (i.e., healthy individuals) and failing heart (i.e., patients)
We compute a t-test and obtain a p-value
This is the classical workflow a typical researcher would do using SPSS or similar statistical computing environment
However, in contrast to the classical workflow, here we represent and store a machine readable description of the statistical hypothesis test (one that includes the input data, the output p-value, the dependent variable, and the kind of statistical hypothesis test used) in the ORKG
Later, when the paper and its results are published we will be able to relate to this result in the overall research contribution description
Let’s look at how this is done using the ORKG User Interface
We add a paper by DOI lookup or alternatively manually (e.g., if a DOI is not available)
The bibliographic metadata about the paper (title, authors, etc.) is automatically fetched from Crossref and displayed in the user interface
Users can then classify the paper according to research field
More interesting is the possibility to describe the research contributions this paper makes
First, researchers provide a research problem description
Next, the researcher can further describe the contribution
Here we show how to link to the statistical hypothesis test result obtained earlier in data analysis and published in the paper as a result of this research contribution
We say that research contributions “yield” research results; hence, the “Yields” attribute shown here
The machine readable result has a human readable label which is shown in the user interface by simply typing some included words, here “IRE”
The user can select the correct result and save it
The research contribution can be further described, e.g. the approach used
The paper may make further contributions, which can be described as well
For the purpose here, we skip this and move to the next step
That’s it, the paper description, its research contribution, addressed problem and one result are added
Note that we did not describe the research result, the statistical hypothesis test conducted earlier. We just linked to it!
Let’s look at the paper
The paper can be browsed by research field and is shown as recently added
It can be selected here
Here we see the details, in addition to bibliographic metadata the research contributions of this paper
For the research contribution we just described, we see the problem and we can now inspect the yielded research result
In addition to a human readable label, the statistical hypothesis test description has a specific type, has three inputs and an output, namely the p-value
Let’s look at the output
The output is indeed typed as a p-value, a concept of the Ontology for Biomedical Investigations (i.e., http://purl.obolibrary.org/obo/OBI_0000175)
It has a value specification, namely the specific value computed earlier in data analysis
We can take a look at the value by expanding the value specification
Here it is, the specified numeric value of the computed p-value typed as a scalar value specification, another term of the Ontology for Biomedical Investigations (i.e., http://purl.obolibrary.org/obo/OBI_0001931)
Now, let’s go back to our research result description and take a look at the three specified inputs of the statistical hypothesis test
Here they are: The study design dependent variable and the two continuous variables for failing and non-failing hearts
Let’s take a quick look at what was the study design dependent variable
Here, it was Iron-Responsive Element (IRE) binding
Which is a term of the Gene Ontology (i.e., http://purl.obolibrary.org/obo/GO_0030350)
Finally, let’s take a look at the non-failing heart continuous variable used as specified input in the statistical hypothesis test
Each continuous variable (a term of the statistical methods ontology) has parts, namely scalar measurement data
These are the actual data values and we can explore them
Here is an example, the numeric value 105.0 of the value specification #4 in the non-failing heart continuous variable