This document discusses Real Time Semantic Data Warehousing (RETIS) technology provided by Sindice.com. RETIS allows pharmaceutical companies to integrate diverse public and private data sources in real-time to help data scientists discover new insights and connections. It provides unified search and browsing of live internal and external datasets. Sindice's semantic warehousing approach uses Linked Data clouds, semantic sandboxes, and cloud computing to easily integrate new databases with unprecedented flexibility and scale.
1. Real Time Semantic
Warehousing: Sindice.com
technology for the enterprise
Giovanni Tummarello, Ph.D
Data Intensive Infrastructure UNIT -
DERI.ie
CEO SindiceTech
2. How we started: Sindice.com
80 billion triples, 500,000,000 RDF graphs, 5 TB of data.
The Sindice Suite powers Sindice.com. Online with 99.9%+
3. Semantic Sandboxes on: Sindice.com
Data Sandboxes in Sindice.com – Powered by CloudSpaces
4. And then we met people asking:
“Can you do it for us?”
5. Example story (Pharmaceutical company)
To stay competitive, pharmaceutical companies need to leverage all the data available from inside sources as well as from the growing number of public HCLS data sources. Because this data is diverse in nature, format, and quality, integration is complex. Traditional data warehousing technology requires extensive upfront design and is handled within a company through the “go via the IT department” approach. This does not meet the needs of data scientists, who are the only ones who can do the complex cross-use-case thinking required.
Via Real Time Semantic Data Warehousing (RETIS), data scientists expect to get:
• The ability to speed up “in silico” scientific workflows (interrelation of diverse large datasets) by orders of magnitude by relying on a data warehousing approach.
• The ability to create large-scale “data maps” or “aggregated views” that let researchers see trends and gather high-level insights that would not be possible with data accessed via single lookups.
• The ability to receive recommendations and suggestions for new data connections based on an ever-evolving ecosystem of available experimental datasets.
• The ability to provide their R&D departments with superior tools for investigating their internal knowledge: search engines and data browsing tools that provide unified views of multiple, evolving, live datasets without leaking specific queries to the outside world, which would reveal internal research trends.
• The ability to leverage the ever-increasing body of public, crowd-curated open data.
6. Linked Data clouds for the Enterprise
– Strategic knowledge spaces, where new databases can be added and “leveraged” with unprecedented ease
– “Pay as you go” integration: explore now, fine-tune later.
– It's Big Data (clusters + clouds) meets RDF and Semantic Technologies
9. A Dataspace Template
Semantic Web
A typical implementation template.
Data
Dataspaces own:
• Resources
• Services
• Datasets for others to reuse
10. Dataspace Composition
Scalable cascading semantic “Dataspaces”
• Resources allocated in public/private clouds
• Allow users to take Sindice data and mix or process it for private purposes
12. Scale is only 1 dimension
Multiple dimensions of Web data integration
• RDF tool stack flexibility
• Cluster processing scalability
• “Cloud” pipeline dynamicity
13. Full Json Like Search.
On Solr.
All operators supported.
14. What is SIREn ?
• Plugin to Solr
• Built for searching and operating on semi-structured data and relational data structures
15. SIREn: Semantic IR Engine
• Extension to Enterprise Search Engine Solr
• Semantic, full-text, incremental updates,
distributed search
[Figure labels: Semantic, SIREn, Databases, Constant time]
16. Limitations of Apache Solr
• Not efficient with highly heterogeneous
structured data sources
– Limitation on the number of attributes:
Dictionary size explosion
18. Dictionary Size Explosion
Record 1
  label: Renaud Delbru
  name: Renaud Delbru
Dictionary
  label:renaud
  label:delbru
  name:renaud
  name:delbru
Dictionary construction
Concatenation of attribute name and term
N * M complexity (worst case)
2 attributes * 2 terms = 4 dictionary entries
100K attributes * 1M terms = 100B entries
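The dictionary construction described on this slide can be sketched in a few lines of Python. This is only an illustration of the "attribute name + term" concatenation idea, not Solr's or Lucene's actual indexing code; the function names `build_dictionary` and `worst_case_entries` are hypothetical, and lower-casing plus whitespace splitting is an assumed tokenization.

```python
def build_dictionary(records):
    """Return the set of 'attribute:term' entries for a list of records.

    Each record maps an attribute name to a text value. Every term is
    lower-cased and prefixed with the name of the attribute it occurs in.
    """
    entries = set()
    for record in records:
        for attribute, value in record.items():
            for term in value.lower().split():
                entries.add(f"{attribute}:{term}")
    return entries


def worst_case_entries(num_attributes, num_terms):
    """Worst case: every term occurs under every attribute (N * M)."""
    return num_attributes * num_terms


record_1 = {"label": "Renaud Delbru", "name": "Renaud Delbru"}
dictionary = build_dictionary([record_1])

print(sorted(dictionary))
# ['label:delbru', 'label:renaud', 'name:delbru', 'name:renaud']
print(worst_case_entries(2, 2))
# 4
```

Because the attribute name is part of every dictionary key, the same term is stored once per attribute it appears in, which is why the dictionary grows with the number of attributes.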
19. Limitations of Apache Solr
• Not efficient with highly heterogeneous
structured data sources
– Limitation on the number of attributes:
Dictionary size explosion
Query clause explosion when searching across all
attributes
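A rough sketch of the query clause explosion, assuming each attribute is indexed as its own field and no catch-all field is configured: a keyword search across all attributes has to be expanded into one clause per attribute. The helper `expand_query` and the attribute names are hypothetical; the output uses the ordinary `field:term` and `OR` notation of Lucene-style query strings.

```python
def expand_query(term, attributes):
    """Build one 'attribute:term' clause per attribute, joined by OR."""
    return " OR ".join(f"{attribute}:{term}" for attribute in attributes)


# Hypothetical attribute names, for illustration only.
attributes = ["label", "name", "title", "description"]

query = expand_query("renaud", attributes)
print(query)
# label:renaud OR name:renaud OR title:renaud OR description:renaud
print(query.count(" OR ") + 1)
# 4
```

The number of clauses equals the number of attributes, so a dataset with many thousands of distinct attributes yields a query with as many clauses.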
20. Limitations of Apache Solr
• Not efficient with highly heterogeneous
structured data sources
– Limitation on the number of attributes:
Dictionary size explosion
Query clause explosion when searching across all
attributes
• Limited support for structured query
– Multi-valued attributes
21. Multi-valued attributes
• No support in Solr for "all words must match
in the same value of a multi-valued field".
• A field value is a bag of words
– No distinction between multiple values
Record 1: label = “man's best friend”, “pooch”
Record 2: label = “man's worst enemy”, “friend to no one”
22. Multi-valued attributes
• No support in Solr for "all words must match
in the same value of a multi-valued field".
• A field value is a bag of words
– No distinction between multiple values
• Query example
– label : man’s friend
– Solr returns Record 1 & 2 as results
Record 1: label = “man's best friend”, “pooch”
Record 2: label = “man's worst enemy”, “friend to no one”
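The difference between the two matching behaviours can be illustrated with a small Python sketch. It models the slide's description only (a field as one bag of words versus matching inside a single value); it is not Solr or SIREn code, and the function names and whitespace tokenization are assumptions.

```python
records = {
    "Record 1": {"label": ["man's best friend", "pooch"]},
    "Record 2": {"label": ["man's worst enemy", "friend to no one"]},
}


def tokenize(text):
    """Lower-case and split on whitespace (assumed tokenization)."""
    return text.lower().split()


def match_bag_of_words(values, query):
    """All query terms must occur somewhere in the field, in any value."""
    bag = {term for value in values for term in tokenize(value)}
    return all(term in bag for term in tokenize(query))


def match_same_value(values, query):
    """All query terms must occur together inside a single value."""
    terms = tokenize(query)
    return any(all(term in tokenize(value) for term in terms) for value in values)


query = "man's friend"
bag_hits = [name for name, rec in records.items()
            if match_bag_of_words(rec["label"], query)]
value_hits = [name for name, rec in records.items()
              if match_same_value(rec["label"], query)]

print(bag_hits)    # ['Record 1', 'Record 2']
print(value_hits)  # ['Record 1']
```

Record 2 matches under the bag-of-words model only because "man's" and "friend" come from two different values of the same field.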
23. Limitations of Apache Solr
• Not efficient with highly heterogeneous
structured data sources
– Limitation on the number of attributes:
Dictionary size explosion
Query clause explosion when searching across all
attributes
• Limited support for structured query
– Multi-valued attributes
– No full-text search on attribute names
24. Full-text search on attribute names
• No support in Solr for “keyword search in
attribute names".
• Query example
– (name OR label) = “Renaud Delbru”
– Solr is unable to find the records without the exact
attribute name
Record 1: rdfs:label = Renaud Delbru
Record 2: foaf:name = Renaud Delbru
Record 3: sioc:name = Renaud Delbru
Record 4: full_name = Renaud Delbru
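One way to picture keyword search on attribute names is to tokenize the attribute names themselves, so that a keyword such as "name" matches `foaf:name`, `sioc:name`, and `full_name`. The sketch below is an illustration under that assumption, not SIREn's implementation; `attribute_terms` and `search` are hypothetical helpers.

```python
import re

records = {
    "Record 1": {"rdfs:label": "Renaud Delbru"},
    "Record 2": {"foaf:name": "Renaud Delbru"},
    "Record 3": {"sioc:name": "Renaud Delbru"},
    "Record 4": {"full_name": "Renaud Delbru"},
}


def attribute_terms(attribute):
    """Split an attribute name into lower-cased keywords.

    For example, 'foaf:name' becomes {'foaf', 'name'}.
    """
    return {t for t in re.split(r"[^A-Za-z0-9]+", attribute.lower()) if t}


def search(records, attribute_keywords, value):
    """Return records where an attribute whose name contains any of the
    keywords holds exactly the given value."""
    hits = []
    for name, record in records.items():
        for attribute, attr_value in record.items():
            if attribute_terms(attribute) & attribute_keywords and attr_value == value:
                hits.append(name)
                break
    return hits


# An exact attribute-name lookup finds nothing, since no record uses "name" as is.
exact_hits = [name for name, record in records.items() if "name" in record]
keyword_hits = search(records, {"name", "label"}, "Renaud Delbru")

print(exact_hits)    # []
print(keyword_hits)  # ['Record 1', 'Record 2', 'Record 3', 'Record 4']
```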
25. Limitations of Apache Solr
• Not efficient with highly heterogeneous
structured data sources
– Limitation on the number of attributes:
Dictionary size explosion
Query clause explosion when searching across all
attributes
• Limited support for structured query
– Multi-valued attributes
– No full-text search on attribute names
– No 1:N relationship materialisation
29. Introducing large scale RDF “Summaries”
We do it for:
• Data exploration
– How to find datasets about movies ?
• Assisted SPARQL Query Editor
– What is the data structure ?
• Dataset Quality
– How to differentiate relevant from irrelevant datasets?
30. Large Scale RDF summaries
Class Level
12M relationships
10B relationships
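A class-level summary of the kind named here can be sketched as a few counters over a list of triples: entities per class, relations per predicate, and relationships between classes. The toy triples, the `ex:` prefix, and the `summarize` function are hypothetical, and this in-memory version ignores the cluster-scale processing the real system would need.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Hypothetical toy data, for illustration only.
triples = [
    ("ex:film1", "rdf:type", "ex:Movie"),
    ("ex:film2", "rdf:type", "ex:Movie"),
    ("ex:person1", "rdf:type", "ex:Person"),
    ("ex:film1", "ex:director", "ex:person1"),
    ("ex:film2", "ex:director", "ex:person1"),
    ("ex:film1", "ex:title", "Example Film"),
]


def summarize(triples):
    """Return (entities per class, relations per predicate, class-level links)."""
    # Map each entity to the set of classes it is typed with.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    entities_per_class = Counter(c for classes in types.values() for c in classes)
    relations_per_predicate = Counter(p for _, p, _ in triples if p != RDF_TYPE)

    # Lift each entity-level relation to a (subject class, predicate, object class) link.
    class_links = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for subject_class in types.get(s, ()):
            for object_class in types.get(o, ()):
                class_links[(subject_class, p, object_class)] += 1
    return entities_per_class, relations_per_predicate, class_links


entities_per_class, relations_per_predicate, class_links = summarize(triples)
print(dict(entities_per_class))
# {'ex:Movie': 2, 'ex:Person': 1}
print(dict(relations_per_predicate))
# {'ex:director': 2, 'ex:title': 1}
print(dict(class_links))
# {('ex:Movie', 'ex:director', 'ex:Person'): 2}
```

Collapsing many entity-level relationships into a much smaller set of class-level links is what makes such a summary usable for data exploration and for assisting a SPARQL query editor.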
34. Thank you
Sindice.com team April 2012
With the contribution of
Editor's Notes
Search record (instead of entity). Record-centric indexing model.
Use case: let’s index the entire web of data. Doc/s, Lucene in action, uptime, etc.
How important is a dataset to my information need? How to help users browse and filter out irrelevant datasets? How can I measure the quality of a dataset? Data quality, objective measures.
Two datasets can overlap and provide similar information, but one dataset provides fresher information and is updated more frequently. Concrete scenarios to test such assumptions.
Data quality can also be useful for improving data acquisition, optimising resources to retrieve only top-quality data.
- Define “relationships” when introducing the graph, BEFORE talking about the numbers
Number of entities per class. Number of relations of a certain predicate. Other metadata can be added to a class, e.g., other predicates used with the entities of that class.