Parallel Datalog Reasoning in RDFox Presentation - DBOnto
Abstract:
We present a novel approach to parallel materialisation (i.e., fixpoint computation) of datalog programs in centralised, main-memory, multi-core RDF systems. Our approach comprises an algorithm that evenly distributes the workload to cores, and an RDF indexing data structure that supports efficient, ‘mostly’ lock-free parallel updates. Our empirical evaluation shows that our approach parallelises computation very well: with 16 physical cores, materialisation can be up to 13.9 times faster than with just one core.
Starr Bloom T.C.P. using Hadoop on Yahoo's M45 Cluster (20100112) - Dan Starr
Summarizes work porting our T.C.P. Noisification software to Yahoo's M45 Hadoop cluster. Discusses metrics and the dataflow of the Hadoop-based Noisification Pipeline.
Slide presentation discussing linkages between interprofessional education and health literacy, inspired by the All Together Better Health Conference in Kobe, Japan, Oct. 5-8, 2012
Presented in Second Life on November 12, 2012
Presentation on RDF Stream Processing models given at the SR4LD tutorial (ISWC 2013) -- updated version at: http://www.slideshare.net/dellaglio/rsp2014-01rspmodelsss
PubChem QC project. In this project we run quantum chemistry calculations for all molecules in the PubChem project. Currently 1,100,000 molecules are available at http://pubchemqc.riken.jp/ . The results are in the public domain.
Expressive Querying of Semantic Databases with Incremental Query Rewriting - Alexandre Riazanov
This talk briefly introduces the Incremental Query Rewriting (IQR) method (see http://link.springer.com/chapter/10.1007%2F978-1-4419-7335-1_1 ) and presents an approach for extremely expressive querying of RDF triplestores, based on IQR.
NAVER developers who attended ACL 2018 share what they learned there.
1. Overview - Lucy Park
2. Tutorials - Xiaodong Gu
3. Main conference
a. Semantic parsing - Soonmin Bae
b. Dialogue - Kyungduk Kim
c. Machine translation - Zae Myung Kim
d. Summarization - Hye-Jin Min
4. Workshops - Minjoon Seo
We explain various kinds of bad memory utilization patterns in Java applications, present a tool to efficiently detect them, and give a number of common solutions to these problems.
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark - Databricks
Deep Learning has shown tremendous success, yet it often requires a lot of effort to leverage its power. Existing Deep Learning frameworks require writing a lot of code to work with a model, let alone in a distributed manner. We’ll survey the state of Deep Learning at scale and introduce Deep Learning Pipelines, a new open-source package for Apache Spark. This package simplifies Deep Learning in three major ways:
1. It has a simple API that integrates well with enterprise Machine Learning pipelines.
2. It automatically scales out common Deep Learning patterns, thanks to Apache Spark.
3. It enables exposing Deep Learning models through the familiar Spark APIs, such as MLlib and Spark SQL.
In this talk, we will look at the complex problem of image classification using Deep Learning and Spark. Using Deep Learning Pipelines, we will show:
how to build deep learning models in a few lines of code;
how to scale common tasks like transfer learning and prediction; and how to publish models in Spark SQL.
1. Querying Cultural Heritage Data
Dr. Barry Norton, Development Manager, ResearchSpace*
* Funded by the Andrew W. Mellon Foundation
* Hosted by the Curatorial Directorate, British Museum
2. Statements and Patterns
• For one edge in a graph:
bm-obj:EOC3130 --crm:P52_has_current_owner--> bm-id:the-british-museum
3. Statements and Patterns
• For one edge in a graph:
bm-obj:EOC3130 --crm:P52_has_current_owner--> bm-id:the-british-museum
• We can declare/retrieve one (N)Triple:
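Written out as an (N)Triple with full IRIs (a sketch obtained by expanding the prefixes shown on the next slides):
<http://collection.britishmuseum.org/id/object/EOC3130> <http://erlangen-crm.org/current/P52_has_current_owner> <http://collection.britishmuseum.org/id/the-british-museum> .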
4. Statements and Patterns
• For one edge in a graph:
bm-obj:EOC3130 --crm:P52_has_current_owner--> bm-id:the-british-museum
• We can declare/retrieve one (N)Triple:
• Or write this in Turtle:
@prefix crm: <http://erlangen-crm.org/current/> .
@prefix bm-obj: <http://collection.britishmuseum.org/id/object/> .
@prefix bm-id: <http://collection.britishmuseum.org/id/> .
bm-obj:EOC3130 crm:P52_has_current_owner bm-id:the-british-museum .
5. Statements and Patterns
• For one edge in a graph:
bm-obj:EOC3130 --crm:P52_has_current_owner--> bm-id:the-british-museum
• We can write this in Turtle:
bm-obj:EOC3130 crm:P52_has_current_owner bm-id:the-british-museum .
• And check for it in SPARQL:
PREFIX crm: <http://erlangen-crm.org/current/>
PREFIX bm-obj: <http://collection.britishmuseum.org/id/object/>
PREFIX bm-id: <http://collection.britishmuseum.org/id/>
ASK {bm-obj:EOC3130 crm:P52_has_current_owner bm-id:the-british-museum}
true
6. Statements and Patterns
• For a set of edges:
bm-obj:EOC3130 --crm:P51_has_former_or_current_owner--> bm-id:the-british-museum, and two further, unknown nodes
• We can do the work on the client:
• Or have the server do it by turning the triple into a triple pattern:
bm-obj:EOC3130 crm:P51_has_former_or_current_owner ?owner
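Wrapped into a full query (a minimal sketch reusing the prefixes from the earlier slides), the pattern becomes:
PREFIX crm: <http://erlangen-crm.org/current/>
PREFIX bm-obj: <http://collection.britishmuseum.org/id/object/>
SELECT ?owner
WHERE { bm-obj:EOC3130 crm:P51_has_former_or_current_owner ?owner }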
7. Exercise
Questions:
• Why is the answer different?
• Who are the two (other) one-time owners?
8. Solutions & Exercises
• Why is the answer different?
– Reasoning, part of the work done by the server (as a triplestore), means that if two things are related by crm:P52_has_current_owner then they're also related by crm:P51_has_former_or_current_owner
• This is part of the work that the server (triplestore) can do for you
• Exercise: query for the (strictly) former owners…
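One way to write this (a sketch, not necessarily the tutorial's intended solution) is to keep the former-or-current pattern and subtract the current owner with MINUS:
PREFIX crm: <http://erlangen-crm.org/current/>
PREFIX bm-obj: <http://collection.britishmuseum.org/id/object/>
SELECT ?owner
WHERE {
  # owners related by P51 but not by P52, i.e. strictly former owners
  bm-obj:EOC3130 crm:P51_has_former_or_current_owner ?owner .
  MINUS { bm-obj:EOC3130 crm:P52_has_current_owner ?owner }
}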
11. Solutions & Exercises
Who are the two (other) one-time owners?
• Since people and institutions (and places) are treated just as concepts are, the names of the former owners are attached using skos:prefLabel
• Exercise: if you didn’t already, include the
names in your query results
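A sketch of such a query, assuming the standard SKOS namespace for skos:prefLabel:
PREFIX crm: <http://erlangen-crm.org/current/>
PREFIX bm-obj: <http://collection.britishmuseum.org/id/object/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?owner ?name
WHERE {
  bm-obj:EOC3130 crm:P51_has_former_or_current_owner ?owner .
  ?owner skos:prefLabel ?name
}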
12. Solutions & Exercises
If you didn’t already, include the names in
your query results:
Question:
Why are we back at two answers?
13. Answer
• Answer:
– Just as we can add triples together to make a graph in RDF, so we can add triple patterns together in SPARQL to make a graph pattern
– By default all triple patterns must be matched, but we can use the OPTIONAL {} pattern to allow variation
• Exercise:
– Query for the owners and their names, if they exist*
* N.B. this bug in the BM data will be fixed soon
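A sketch of the OPTIONAL version of the exercise query, under the same assumptions as above:
PREFIX crm: <http://erlangen-crm.org/current/>
PREFIX bm-obj: <http://collection.britishmuseum.org/id/object/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?owner ?name
WHERE {
  bm-obj:EOC3130 crm:P51_has_former_or_current_owner ?owner .
  # the name is returned when present, but owners without one still match
  OPTIONAL { ?owner skos:prefLabel ?name }
}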
15. Exercise
• Take a look here:
• Exercise: copy and run this query
16. CSV Exercise
• Type:
• Observe that one can now paste the query including line breaks*
• Type:
* N.B. for now you should first replace the "s with 's and change the one occurrence of ecrm: with crm: - we'll fix this
* N.B. currently the query needs to be simplified as the BBC data is not loaded – this will be available soon
17. Data Analysis
• One can import this CSV file into many tools:
– A spreadsheet can be a good way to carry out basic visualisations
– A scripting environment like (i)python/scipy or R can allow more analysis before visualisation, but:
• both languages also have libraries to encapsulate interaction via SPARQL (rdflib/sparqlwrapper and SPARQL/RCurl respectively)
• one should decide whether more analysis should first be carried out using SPARQL…
18. Exercise
• If you haven't so far, click on one of the (HotW) 100 Objects (such as number 70, Hoa Hakananai'a Easter Island Statue) having run the main query
• Choose a material and observe the query for other objects in this material
• Adapt this query to count how many BM objects are made from basalt
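A rough sketch of such a count; the property crm:P45_consists_of and the matching of the material by its skos:prefLabel are assumptions here, since the exact shape depends on the query copied from the object page:
PREFIX crm: <http://erlangen-crm.org/current/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT (COUNT(DISTINCT ?object) AS ?objects)
WHERE {
  # assumption: objects are linked to material terms with crm:P45_consists_of
  ?object crm:P45_consists_of ?material .
  ?material skos:prefLabel ?label .
  FILTER (LCASE(STR(?label)) = "basalt")
}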
19. Solution & Exercise
• Exercise: Now count the ‘top ten’ materials and the number of objects for each
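Under the same assumptions, a sketch of the ‘top ten’ aggregation:
PREFIX crm: <http://erlangen-crm.org/current/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?label (COUNT(DISTINCT ?object) AS ?objects)
WHERE {
  ?object crm:P45_consists_of ?material .
  ?material skos:prefLabel ?label
}
GROUP BY ?label
ORDER BY DESC(?objects)
LIMIT 10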
21. A Last Word
• SPARQLing a ‘native RDF’ database (often called a ‘triplestore’) is not the only option before defaulting to programming
• A ‘native graph’ database indexes the graph in a different way, supporting traversal-oriented queries