1. Big Data and semantic technologies -
Using Linked Data as a driver
for data integration
Giuseppe Futia
Nexa Center for Internet and Society, Politecnico di Torino (DAUIN)
27 July 2016
2. Outline
• Information management challenges and Big Data
• Linked Data framework (explained with examples)
• Linked Data approach for the Big Data community
• The impact of Big Structured Data
3. Enterprise/Research Information Management Challenges
• Disparate data sources and data silos
• Data sources with overlapping and inconsistent information
• Most of the knowledge is hidden in texts (unstructured data)
• Difficult to integrate and analyse structured and unstructured data
4. The 3 V’s of Big Data
• Velocity
• Volume
• Variety
5. The 3 V’s of Big Data
• Velocity
• Volume
• Variety (plus Veracity and Value)
9. Linked Data Vision (W3C)
• Extend principles of the Web from documents to data
• Data should be accessed using the general Web architecture
(e.g., URIs, HTTP, …)
• Data should be linked to each other, just as documents are
• Creation of a common framework that allows:
– Data to be shared and reused across applications
– Data to be processed automatically
– New relationships between pieces of data to be inferred
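A minimal sketch of this vision in plain Python, with no RDF library: two hypothetical sources (an HR system and a publications database; every example.org URI is invented) identify things with URIs, so merging them is a set union, and an owl:sameAs link published by one source lets a new relationship be inferred. Only a tiny slice of owl:sameAs semantics is implemented.

```python
# Statements are (subject, predicate, object) tuples of strings.
# All example.org URIs are hypothetical; the FOAF and OWL URIs are the real vocabularies.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
WROTE = "http://pubs.example.org/wrote"

EMP = "http://hr.example.org/emp/7"
AUTHOR = "http://pubs.example.org/author/ada"

hr_source = {
    (EMP, FOAF_NAME, "Ada Lovelace"),
    (EMP, "http://hr.example.org/worksIn", "http://hr.example.org/dept/rnd"),
}
pubs_source = {
    (AUTHOR, WROTE, "http://pubs.example.org/paper/1"),
    # Link published by the second source, pointing at the first source's URI
    (AUTHOR, OWL_SAME_AS, EMP),
}


def merge(*graphs):
    """Global URIs make merging a plain set union: no key mapping is needed."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def infer_same_as(graph):
    """Copy statements across owl:sameAs links (symmetric, single pass).

    Only subject positions are rewritten and chains are not followed;
    an OWL reasoner would derive much more.
    """
    pairs = {(s, o) for s, p, o in graph if p == OWL_SAME_AS}
    pairs |= {(b, a) for a, b in pairs}  # owl:sameAs is symmetric
    inferred = set(graph)
    for a, b in pairs:
        for s, p, o in graph:
            if s == a and p != OWL_SAME_AS:
                inferred.add((b, p, o))
    return inferred


graph = infer_same_as(merge(hr_source, pubs_source))
papers = sorted(o for s, p, o in graph if s == EMP and p == WROTE)
print(papers)  # ['http://pubs.example.org/paper/1']
```

Neither source states that the HR employee wrote the paper; the relationship only appears after the graphs are merged and the link is followed.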
10. Resource Description Framework
• Everything is a triple: Subject (resource), Predicate (relation), Object (resource or literal)
• An RDF (Resource Description Framework) graph is a collection of triples, each linking a subject to an object through a predicate
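The triple model can be sketched with three small immutable types and serialised in the W3C N-Triples format (one `<subject> <predicate> <object> .` line per triple). This is an illustrative assumption, not a full RDF implementation: blank nodes, datatypes and language tags are left out, and the example.org namespace is hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


Term = Union[IRI, Literal]


@dataclass(frozen=True)
class Triple:
    subject: IRI    # simplified: RDF also allows blank nodes here
    predicate: IRI
    obj: Term       # an object is either a resource (IRI) or a literal


def term_to_nt(term: Term) -> str:
    """Serialise one term in N-Triples syntax."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    escaped = (term.value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(graph: Iterable[Triple]) -> str:
    """One line per triple, sorted so the output is deterministic."""
    lines = (f"{term_to_nt(t.subject)} {term_to_nt(t.predicate)} {term_to_nt(t.obj)} ."
             for t in graph)
    return "\n".join(sorted(lines))


EX = "http://example.org/"  # hypothetical namespace
FOAF = "http://xmlns.com/foaf/0.1/"
graph = {
    Triple(IRI(EX + "alice"), IRI(FOAF + "knows"), IRI(EX + "bob")),
    Triple(IRI(EX + "alice"), IRI(FOAF + "name"), Literal("Alice")),
}
print(to_ntriples(graph))
```

Using a set for the graph mirrors the RDF definition: a graph is a set of triples, so adding the same statement twice has no effect.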
11. SPARQL
• SQL-like query language for RDF data
• Simple protocol for querying remote databases over HTTP
• Query types
– select: returns the variable bindings that match a (possibly complex) graph pattern
– ask: checks whether a pattern has any match (result is true/false)
– describe: returns all triples about a particular resource
– construct: create new triples based on query results
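To make the four query forms concrete without a triple store, here is a small pure-Python sketch over an in-memory set of triples. It is an assumption-laden toy, not a SPARQL engine: it evaluates only basic graph patterns (lists of triple patterns in which terms starting with "?" are variables), ignores FILTER, OPTIONAL and the HTTP protocol, and treats prefixed names such as ex:alice as plain strings. The SPARQL each call roughly corresponds to is shown in comments.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.get(p_term, t_term) != t_term:
                return None  # variable already bound to something else
            extended[p_term] = t_term
        elif p_term != t_term:
            return None
    return extended


def solve(graph: Set[Triple], patterns: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding that satisfies all triple patterns (a basic graph pattern)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in graph:
        extended = match(patterns[0], triple, binding)
        if extended is not None:
            yield from solve(graph, patterns[1:], extended)


def select(graph, variables, patterns):
    return sorted(tuple(b[v] for v in variables) for b in solve(graph, patterns))


def ask(graph, patterns) -> bool:
    return next(solve(graph, patterns), None) is not None


def describe(graph, resource):
    # SPARQL lets each service decide what DESCRIBE returns;
    # this sketch returns triples with the resource as subject or object.
    return {t for t in graph if resource in (t[0], t[2])}


def construct(graph, template, patterns):
    out = set()
    for b in solve(graph, patterns):
        for tpl in template:
            if all(not is_var(t) or t in b for t in tpl):  # skip unbound variables
                out.add(tuple(b[t] if is_var(t) else t for t in tpl))
    return out


graph = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", "Alice"),
}
fof = [("?a", "foaf:knows", "?b"), ("?b", "foaf:knows", "?c")]

# SELECT ?a ?c WHERE { ?a foaf:knows ?b . ?b foaf:knows ?c }
print(select(graph, ["?a", "?c"], fof))  # [('ex:alice', 'ex:carol')]
# ASK { ?x foaf:knows ex:carol }
print(ask(graph, [("?x", "foaf:knows", "ex:carol")]))  # True
# DESCRIBE ex:bob
print(sorted(describe(graph, "ex:bob")))
# CONSTRUCT { ?a ex:friendOfFriend ?c } WHERE { same pattern as above }
print(construct(graph, [("?a", "ex:friendOfFriend", "?c")], fof))
```

All four forms share the same pattern-matching core (`solve`); they differ only in what they do with the resulting bindings.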
28. “The final work of legendary director Stanley Kubrick, who died
within a week of completing the edit, is based upon a novel by
Arthur Schnitzler. Tom Cruise and Nicole Kidman play William and
Alice Harford, a physician and a gallery manager who are wealthy,
successful, and travel in a sophisticated social circle.”
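The passage above is unstructured text, yet it mentions entities that Linked Data sources already describe. A deliberately naive sketch of a first step towards triples: look up known names from a small hand-made gazetteer and emit one "mentions" triple per hit. The gazetteer, the DBpedia-style resource URIs and the ex:mentions predicate are illustrative assumptions (real entity-linking tools also disambiguate and score candidates), and the document URI is hypothetical because the passage does not name the film.

```python
import re

TEXT = (
    "The final work of legendary director Stanley Kubrick, who died within a week "
    "of completing the edit, is based upon a novel by Arthur Schnitzler. Tom Cruise "
    "and Nicole Kidman play William and Alice Harford, a physician and a gallery "
    "manager who are wealthy, successful, and travel in a sophisticated social circle."
)

# Hypothetical gazetteer: surface form -> DBpedia-style resource URI (assumed for illustration)
GAZETTEER = {
    "Stanley Kubrick": "http://dbpedia.org/resource/Stanley_Kubrick",
    "Arthur Schnitzler": "http://dbpedia.org/resource/Arthur_Schnitzler",
    "Tom Cruise": "http://dbpedia.org/resource/Tom_Cruise",
    "Nicole Kidman": "http://dbpedia.org/resource/Nicole_Kidman",
}
DOC = "http://example.org/doc/film-synopsis"     # hypothetical: the text never names the film
MENTIONS = "http://example.org/vocab/mentions"  # hypothetical predicate


def spot_entities(text, gazetteer):
    """Return (offset, surface form, URI) for every whole-word gazetteer match, in text order."""
    hits = []
    for surface, uri in gazetteer.items():
        for m in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            hits.append((m.start(), surface, uri))
    return sorted(hits)


hits = spot_entities(TEXT, GAZETTEER)
triples = {(DOC, MENTIONS, uri) for _, _, uri in hits}
for offset, surface, uri in hits:
    print(offset, surface, "->", uri)
```

Once the names are linked to URIs, everything that external datasets say about those resources can be joined with the text, which is where the structured/unstructured integration starts.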
31. Linked Data approach adopted by the Big Data community
• RDF data model for Variety
– Flexible, easy to evolve data model
– Efficiently integrate structured and unstructured data
• Enrich Big Data with metadata and semantics
– More powerful analytics on top of it
– Discover implicit links and relationships
• Interlink Big Data sets
– Information interchange across a value chain
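A small sketch of the Variety argument, on invented data: rows from a hypothetical CSV table (structured) and free-text support tickets (unstructured) are mapped into the same triple graph, after which one query spans both. Adding a new attribute later is just one more triple, with no schema migration. The example.org namespace, the column names and the naive substring matching are all assumptions for illustration.

```python
import csv
import io

EX = "http://example.org/"  # hypothetical namespace

# Structured source: a CSV export (invented data)
CUSTOMERS_CSV = "id,name,city\nc1,Acme Srl,Torino\nc2,Beta SpA,Milano\n"
# Unstructured source: free-text support tickets (invented data)
TICKETS = {
    "t1": "Acme Srl reports a delay on order 17.",
    "t2": "Invoice question from Beta SpA.",
}


def rows_to_triples(csv_text):
    """Each row becomes a resource; each non-id column becomes a predicate."""
    triples = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = EX + "customer/" + row["id"]
        for column, value in row.items():
            if column != "id":
                triples.add((subject, EX + column, value))
    return triples


def mentions_to_triples(docs, name_index):
    """Naive linking: a ticket mentions a customer if the customer's name occurs in it."""
    triples = set()
    for doc_id, text in docs.items():
        for name, uri in name_index.items():
            if name in text:
                triples.add((EX + "ticket/" + doc_id, EX + "mentions", uri))
    return triples


graph = rows_to_triples(CUSTOMERS_CSV)
name_index = {o: s for s, p, o in graph if p == EX + "name"}
graph |= mentions_to_triples(TICKETS, name_index)

# Evolving the model: a new attribute is one more triple, no ALTER TABLE needed
graph.add((EX + "customer/c1", EX + "segment", "enterprise"))

# One query across both sources: cities of customers mentioned in tickets
cities = sorted(
    city
    for _, p, customer in graph if p == EX + "mentions"
    for s, p2, city in graph if s == customer and p2 == EX + "city"
)
print(cities)  # ['Milano', 'Torino']
```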
33. Blazegraph and DASL
• Blazegraph is a high-performance graph database platform that supports RDF/SPARQL APIs
• In 2016 Blazegraph introduced a programming environment called DASL
• DASL supports the development of graph algorithms within the Apache Spark ecosystem and is specifically optimised for GPUs
• Suited to complex graph analytics, especially where relationships are not known in advance
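Finding how two entities are connected when the relationship is not known in advance is a typical graph-analytics task. The sketch below is a plain breadth-first search over an in-memory set of triples; it does not use Blazegraph's or DASL's APIs and is not GPU-accelerated, it only illustrates the kind of algorithm such platforms run at scale. The graph data is invented.

```python
from collections import deque

# Invented data; each triple is treated as an undirected edge for path finding
GRAPH = {
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:wrote", "ex:paper1"),
    ("ex:carol", "ex:cites", "ex:paper2"),
}


def find_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of triples linking start to goal, or None."""
    adjacency = {}
    for s, p, o in graph:
        adjacency.setdefault(s, []).append(((s, p, o), o))
        adjacency.setdefault(o, []).append(((s, p, o), s))
    previous = {start: None}  # node -> (triple used to reach it, predecessor)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while previous[node] is not None:
                triple, node = previous[node]
                path.append(triple)
            return path[::-1]
        for triple, neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = (triple, node)
                queue.append(neighbour)
    return None


for triple in find_path(GRAPH, "ex:alice", "ex:paper1"):
    print(triple)
```

No predicate is specified in the query: the search discovers that Alice reaches the paper through a shared employer and a colleague, which is the "relationships unknown in advance" case.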
34. EP-SPARQL
• Event processing provides on-the-fly analysis of event streams, but cannot combine streams with background knowledge or perform reasoning tasks
• Semantic tools can effectively handle background knowledge and perform reasoning tasks, but cannot deal with the rapidly changing data provided by event streams
• Event Processing SPARQL (EP-SPARQL) is proposed as a new language for complex event processing and stream reasoning
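A rough Python sketch of what combining a stream with background knowledge means. It is not EP-SPARQL syntax or semantics: hypothetical sensor events are processed in arrival order inside a sliding time window, and a "smoke followed by heat within the window, in the same building" pattern is detected only by consulting background triples that say where each sensor is. The vocabulary, sensors and window size are assumptions.

```python
from collections import deque
from dataclasses import dataclass

# Background knowledge as triples (hypothetical vocabulary and data)
BACKGROUND = {
    ("ex:s1", "ex:locatedIn", "ex:room1"),
    ("ex:s2", "ex:locatedIn", "ex:room2"),
    ("ex:s3", "ex:locatedIn", "ex:room3"),
    ("ex:room1", "ex:partOf", "ex:buildingA"),
    ("ex:room2", "ex:partOf", "ex:buildingA"),
    ("ex:room3", "ex:partOf", "ex:buildingB"),
}


@dataclass(frozen=True)
class Event:
    time: float  # seconds
    sensor: str
    kind: str    # e.g. "smoke" or "heat"


def buildings_of(sensor):
    """Two-hop lookup in the background graph: sensor -> room -> building."""
    rooms = {o for s, p, o in BACKGROUND if s == sensor and p == "ex:locatedIn"}
    return {o for s, p, o in BACKGROUND if s in rooms and p == "ex:partOf"}


def detect(stream, window):
    """Report smoke followed by heat within `window` seconds in the same building."""
    recent = deque()  # events still inside the sliding window
    alerts = []
    for event in stream:  # events are assumed to arrive in time order
        while recent and event.time - recent[0].time > window:
            recent.popleft()
        if event.kind == "heat":
            for earlier in recent:
                if earlier.kind == "smoke":
                    for building in buildings_of(earlier.sensor) & buildings_of(event.sensor):
                        alerts.append((building, earlier, event))
        recent.append(event)
    return alerts


STREAM = [
    Event(0, "ex:s1", "smoke"),
    Event(5, "ex:s3", "heat"),    # different building: no alert
    Event(8, "ex:s2", "heat"),    # same building as s1, within the window: alert
    Event(100, "ex:s2", "heat"),  # the smoke event has left the window: no alert
]
for building, first, second in detect(STREAM, window=60):
    print(building, first.sensor, "->", second.sensor)
```

Neither half alone is enough: the window logic cannot tell that s1 and s2 share a building, and the background graph alone knows nothing about timing.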