In this paper we investigate the scalable processing of complex SPARQL queries on very large RDF datasets. As the underlying platform we use Apache Hadoop, an open source implementation of Google's MapReduce for massively parallelized computations on a computer cluster. We introduce PigSPARQL, a system for processing complex SPARQL queries on a MapReduce cluster. To this end, SPARQL queries are translated into Pig Latin, a data analysis language developed by Yahoo! Research. Pig Latin programs are executed as a series of MapReduce jobs on a Hadoop cluster. We evaluate the processing of SPARQL queries with PigSPARQL using SP2Bench, a SPARQL-specific performance benchmark, and demonstrate that PigSPARQL enables scalable execution of SPARQL queries on Hadoop without any additional programming effort.
PigSPARQL - Mapping SPARQL to Pig Latin
1. PigSPARQL
Mapping SPARQL to Pig Latin
3rd International Workshop on
Semantic Web Information Management (SWIM 2011)
Alexander Schätzle
Martin Przyjaciel-Zablocki
Georg Lausen
University of Freiburg
Databases & Information Systems
12 June 2011
2. Overview
   1. Motivation
   2. Framework
   3. PigSPARQL
   4. Evaluation
   5. Summary
3. Motivation: Large RDF Graphs
Semantic Web
◦ Amount of Semantic Data increases steadily
◦ RDF is W3C standard for representing Semantic Data
Social Networks
◦ > 500 million active users in Facebook
◦ Interacting with > 900 million objects (pages, groups, events)
◦ Social Graphs can also be represented in RDF
But how can we execute SPARQL queries on very large RDF datasets with billions of triples?
Our Approach: Distributed execution of SPARQL queries using MapReduce
5. Framework: Pig Latin
Properties of Pig Latin (Yahoo!)
◦ "High-Level" Language for Data Analysis with Hadoop
◦ Automatic Translation into MapReduce-Jobs
◦ Link between User & MapReduce
Utilize Advantages of MapReduce
◦ Parallelization done by the System
◦ Good Fault Tolerance & Scalability
Avoid Drawbacks of MapReduce
◦ "Low-Level" to implement & hard to maintain
◦ No Primitives like JOIN or GROUP
6. Framework: Pig Latin (2)
Data model of Pig Latin
◦ Flexible, fully nested
◦ Main Data type: Tuple -> Sequence of fields
◦ A field of a Tuple can be of any Data type, e.g.
Atom: 'Bob' or 24
Tuple: ('John', 'Doe')
Bag: { ('Bob', 'Sarah'), ('Peter', ('likes', 'football')) }
Map: [ 'knows' -> {('Sarah')}, 'age' -> 24 ]
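The nesting rules of the data model can be made concrete with a small Python sketch. The mapping onto Python types (list for Bag, dict for Map) and the helper name `describe` are assumptions chosen for illustration; they are not part of Pig itself.

```python
# Illustrative mapping of Pig Latin's data model onto Python values.
# The choice of Python types is an assumption made for this sketch.
atom_name = "Bob"                            # Atom: a simple value such as a string
atom_age = 24                                # Atom: or a number
name_tuple = ("John", "Doe")                 # Tuple: a sequence of fields
nested_tuple = ("Peter", ("likes", "football"))  # a field may itself be a Tuple
bag = [("Bob", "Sarah"), nested_tuple]       # Bag: a collection of Tuples
pig_map = {"knows": [("Sarah",)], "age": 24} # Map: keys to values of any data type


def describe(value):
    """Return the Pig data type that a Python value stands for in this sketch."""
    if isinstance(value, dict):
        return "map"
    if isinstance(value, list):
        return "bag"
    if isinstance(value, tuple):
        return "tuple"
    return "atom"
```

The example shows the "fully nested" property: a Map value can be a Bag, and a Tuple field can be another Tuple.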
7. Framework: Pig Latin (3)
Important Operators of Pig Latin
FOREACH: Apply Processing on every Tuple of a Bag
result = FOREACH input GENERATE field1*field2 AS mul ;
FILTER: Delete unwanted Tuples of a Bag
adults = FILTER persons BY age >= 18 ;
[OUTER] JOIN: Join two or more Bags
result = JOIN left BY field1 [LEFT OUTER], right BY field2 ;
UNION: Combine two or more Bags
result = UNION bag1, bag2 ;
ORDER: Order a Bag by the specified field(s)
result = ORDER input BY field1 ;
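To make the semantics of these operators tangible, the following Python sketch emulates them on in-memory lists of tuples. It is only an approximation under stated assumptions: the function names are hypothetical, fields are addressed by position instead of by name, and the join is a simple nested loop, whereas Pig compiles these operators into MapReduce jobs.

```python
def foreach(bag, generate):
    """FOREACH ... GENERATE: apply `generate` to every tuple of a bag."""
    return [generate(t) for t in bag]


def filter_bag(bag, predicate):
    """FILTER ... BY: keep only the tuples that satisfy `predicate`."""
    return [t for t in bag if predicate(t)]


def join(left, right, left_key, right_key, left_outer=False, right_width=0):
    """JOIN on tuple positions (nested loop, for illustration only).

    With left_outer=True, unmatched left tuples are padded with
    `right_width` None fields.
    """
    result = []
    for l in left:
        matches = [r for r in right if r[right_key] == l[left_key]]
        if matches:
            result.extend(l + r for r in matches)
        elif left_outer:
            result.append(l + (None,) * right_width)
    return result


def union(*bags):
    """UNION: combine two or more bags (duplicates are kept)."""
    return [t for bag in bags for t in bag]


def order(bag, field):
    """ORDER ... BY: sort a bag by the field at position `field`."""
    return sorted(bag, key=lambda t: t[field])


# Sample data taken from the operator examples on the backup slides.
mul = foreach([(2, 3), (4, 7)], lambda t: (t[0] * t[1],))
adults = filter_bag([("Bob", 21), ("Sarah", 17)], lambda t: t[1] >= 18)
inner = join([("a",), ("b",)], [(4, "a"), (7, "a")], 0, 1)
outer = join([("a",), ("b",)], [(4, "a"), (7, "a")], 0, 1,
             left_outer=True, right_width=2)
combined = union([("a", 1)], [("b", 3)])
ordered = order([(3, "a"), (1, "b")], 0)
```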
11. PigSPARQL: Mapping to Pig Latin
Step 3
◦ Translate Algebra Tree to Pig Latin Program
Algebra Tree:
LeftJoin
  Filter (?age >= 18)
    BGP (?person knows :Peter . ?person age ?age)
  BGP (?person mbox ?mb)
Pig Latin Program:
indata = LOAD 'pathToFile' USING rdfLoader() AS (s,p,o) ;
12. PigSPARQL: Mapping to Pig Latin (2)
Step 3
◦ Translate Algebra Tree to Pig Latin Program
indata = LOAD 'pathToFile' USING rdfLoader() AS (s,p,o) ;
f1 = FILTER indata BY p=='foaf:knows' AND o==':Peter' ;
t1 = FOREACH f1 GENERATE s AS person ;
f2 = FILTER indata BY p=='foaf:age' ;
t2 = FOREACH f2 GENERATE s AS person, o AS age ;
j1 = JOIN t1 BY person, t2 BY person ;
BGP1 = FOREACH j1 GENERATE t1::person AS person, t2::age AS age ;
13. PigSPARQL: Mapping to Pig Latin (3)
Step 3
◦ Translate Algebra Tree to Pig Latin Program
indata = LOAD 'pathToFile' USING rdfLoader() AS (s,p,o) ;
f1 = FILTER indata BY p=='foaf:knows' AND o==':Peter' ;
t1 = FOREACH f1 GENERATE s AS person ;
f2 = FILTER indata BY p=='foaf:age' ;
t2 = FOREACH f2 GENERATE s AS person, o AS age ;
j1 = JOIN t1 BY person, t2 BY person ;
BGP1 = FOREACH j1 GENERATE t1::person AS person, t2::age AS age ;
F1 = FILTER BGP1 BY age >= 18 ;
14. PigSPARQL: Mapping to Pig Latin (4)
Step 3
◦ Translate Algebra Tree to Pig Latin Program
indata = LOAD 'pathToFile' USING rdfLoader() AS (s,p,o) ;
f1 = FILTER indata BY p=='foaf:knows' AND o==':Peter' ;
t1 = FOREACH f1 GENERATE s AS person ;
f2 = FILTER indata BY p=='foaf:age' ;
t2 = FOREACH f2 GENERATE s AS person, o AS age ;
j1 = JOIN t1 BY person, t2 BY person ;
BGP1 = FOREACH j1 GENERATE t1::person AS person, t2::age AS age ;
F1 = FILTER BGP1 BY age >= 18 ;
f3 = FILTER indata BY p=='foaf:mbox' ;
BGP2 = FOREACH f3 GENERATE s AS person, o AS mb ;
15. PigSPARQL: Mapping to Pig Latin (5)
Step 3
◦ Translate Algebra Tree to Pig Latin Program
indata = LOAD 'pathToFile' USING rdfLoader() AS (s,p,o) ;
f1 = FILTER indata BY p=='foaf:knows' AND o==':Peter' ;
t1 = FOREACH f1 GENERATE s AS person ;
f2 = FILTER indata BY p=='foaf:age' ;
t2 = FOREACH f2 GENERATE s AS person, o AS age ;
j1 = JOIN t1 BY person, t2 BY person ;
BGP1 = FOREACH j1 GENERATE t1::person AS person, t2::age AS age ;
F1 = FILTER BGP1 BY age >= 18 ;
f3 = FILTER indata BY p=='foaf:mbox' ;
BGP2 = FOREACH f3 GENERATE s AS person, o AS mb ;
lj = JOIN F1 BY person LEFT OUTER, BGP2 BY person ;
LJ1 = FOREACH lj GENERATE F1::person AS person, F1::age AS age, BGP2::mb AS mb ;
STORE LJ1 INTO 'pathToOutput' USING resultWriter() ;
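The complete translation can be traced on a handful of triples. The Python sketch below mirrors the translated program step by step (both BGPs, the Filter and the LeftJoin) on an in-memory list. The sample triples, the `ex:` names and the function `run_query` are invented for illustration, and ages are plain integers here, which glosses over how RDF literals would actually be typed.

```python
# Hypothetical sample data: (subject, predicate, object) triples.
triples = [
    ("ex:alice", "foaf:knows", ":Peter"),
    ("ex:alice", "foaf:age", 30),
    ("ex:alice", "foaf:mbox", "alice@example.org"),
    ("ex:bob", "foaf:knows", ":Peter"),
    ("ex:bob", "foaf:age", 16),
    ("ex:carol", "foaf:knows", ":Peter"),
    ("ex:carol", "foaf:age", 42),
    ("ex:dave", "foaf:age", 50),
    ("ex:dave", "foaf:mbox", "dave@example.org"),
]


def run_query(indata):
    """Emulate the translated program on an in-memory list of triples."""
    # BGP1: ?person knows :Peter . ?person age ?age
    t1 = [s for (s, p, o) in indata if p == "foaf:knows" and o == ":Peter"]
    t2 = [(s, o) for (s, p, o) in indata if p == "foaf:age"]
    bgp1 = [(person, age) for person in t1 for (s, age) in t2 if s == person]

    # Filter: ?age >= 18
    f1 = [(person, age) for (person, age) in bgp1 if age >= 18]

    # BGP2: ?person mbox ?mb
    bgp2 = [(s, o) for (s, p, o) in indata if p == "foaf:mbox"]

    # LeftJoin: keep every tuple of f1, with mb = None if there is no mailbox.
    lj1 = []
    for person, age in f1:
        mailboxes = [mb for (s, mb) in bgp2 if s == person]
        for mb in mailboxes or [None]:
            lj1.append((person, age, mb))
    return lj1


result = run_query(triples)
```

In this sample, `ex:bob` is removed by the Filter, `ex:dave` never matches the first triple pattern, and `ex:carol` survives the LeftJoin with an unbound `mb`, which is the behaviour OPTIONAL is meant to provide.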
16. PigSPARQL: Optimizations
SPARQL Algebra
◦ Filter Optimization (Pushing, Splitting, Substitution)
◦ Triple Pattern Reordering by Selectivity
Algebra Translation
◦ Delete unnecessary Data as early as possible
◦ Multi-Joins to reduce the number of Joins
Data Representation
◦ Vertical Partitioning of the RDF Data by Predicate
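Vertical partitioning by predicate can be sketched in a few lines of Python. The idea is that a triple pattern with a bound predicate only has to read the partition for that predicate instead of scanning all triples. The function names and sample data below are hypothetical, and the in-memory dictionary stands in for what would be separate files or directories on HDFS.

```python
from collections import defaultdict


def partition_by_predicate(triples):
    """Split (s, p, o) triples into one (s, o) table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def match_pattern(tables, predicate, obj=None):
    """Evaluate a triple pattern with a bound predicate.

    Only the partition of `predicate` is read; `obj` optionally
    restricts the object position.
    """
    rows = tables.get(predicate, [])
    return [(s, o) for (s, o) in rows if obj is None or o == obj]


# Hypothetical sample data for illustration.
sample = [
    ("ex:alice", "foaf:knows", ":Peter"),
    ("ex:alice", "foaf:age", 30),
    ("ex:bob", "foaf:knows", ":Mary"),
    ("ex:bob", "foaf:mbox", "bob@example.org"),
]
tables = partition_by_predicate(sample)
knows_peter = match_pattern(tables, "foaf:knows", ":Peter")
```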
21. Summary
PigSPARQL: SPARQL execution with MapReduce
All features of SPARQL 1.0 (not only BGP)
Evaluation with a SPARQL specific performance benchmark (more complex queries than LUBM)
Linear scaling behavior with up to 1.6 Billion triples
Future Work: SPARQL 1.1
23. Backup Slides
a) SPARQL Graph Pattern
b) Pig Latin – Operators
24. Backup Slides: SPARQL
a) SPARQL Graph Pattern
Basic Graph Pattern
◦ Finite set of Triple Patterns concatenated with AND (.)
◦ A Triple Pattern is an RDF Triple with variables
Graph Pattern
◦ A Basic Graph Pattern is a Graph Pattern
◦ If P and P' are Graph Patterns, then {P}.{P'}, {P} UNION {P'} and {P} OPTIONAL {P'} are also Graph Patterns
◦ If P is a Graph Pattern and R is a Filter Condition, then P FILTER (R) is also a Graph Pattern
◦ If P is a Graph Pattern, u a URI and ?v a variable, then GRAPH u {P} and GRAPH ?v {P} are also Graph Patterns
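The recursive definition above maps naturally onto a small tree data structure. The following Python sketch is one possible encoding, not PigSPARQL's internal representation: all class names are hypothetical, filter conditions are kept as opaque strings, and `variables` simply collects the variables that occur in the triple patterns and GRAPH clauses of a pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BGP:
    triples: tuple  # triple patterns, each an (s, p, o) tuple of strings


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Union:
    left: object
    right: object


@dataclass(frozen=True)
class Opt:  # {P} OPTIONAL {P'}
    left: object
    right: object


@dataclass(frozen=True)
class Filter:
    pattern: object
    condition: str  # kept as an opaque string in this sketch


@dataclass(frozen=True)
class Graph:
    name: str  # a URI or a variable such as "?g"
    pattern: object


def variables(pattern):
    """Collect the variables used in triple patterns and GRAPH clauses."""
    if isinstance(pattern, BGP):
        return {term for triple in pattern.triples
                for term in triple if term.startswith("?")}
    if isinstance(pattern, (And, Union, Opt)):
        return variables(pattern.left) | variables(pattern.right)
    if isinstance(pattern, Filter):
        return variables(pattern.pattern)
    if isinstance(pattern, Graph):
        own = {pattern.name} if pattern.name.startswith("?") else set()
        return own | variables(pattern.pattern)
    raise TypeError(f"not a graph pattern: {pattern!r}")


# The example query from the mapping slides, written as a graph pattern.
query = Opt(
    Filter(
        BGP((("?person", "knows", ":Peter"), ("?person", "age", "?age"))),
        "?age >= 18",
    ),
    BGP((("?person", "mbox", "?mb"),)),
)
query_vars = variables(query)
```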
25. Backup Slides: Pig Latin
b) Pig Latin – Operators
FOREACH: Apply Processing on every Tuple of a Bag
Ex: result = FOREACH input GENERATE field1*field2 AS mul ;
input (field1, field2): (2, 3), (4, 7)
result (mul): (6), (28)
FILTER: Delete unwanted Tuples of a Bag
Ex: adults = FILTER persons BY age >= 18 ;
persons (name, age): (Bob, 21), (Sarah, 17)
adults (name, age): (Bob, 21)
26. Backup Slides: Pig Latin
b) Pig Latin – Operators (2)
[OUTER] JOIN: Join two or more Bags
Ex: result = JOIN left BY field1 [LEFT OUTER], right BY field2 ;
left (field1): (a), (b)
right (field1, field2): (4, a), (7, a)
result (left::field1, right::field1, right::field2): (a, 4, a), (a, 7, a)
result with LEFT OUTER (left::field1, right::field1, right::field2): (a, 4, a), (a, 7, a), (b, null, null)
27. Backup Slides: Pig Latin
b) Pig Latin – Operators (3)
UNION: Ex: result = UNION bag1, bag2 ;
bag1 (field1, field2): (a, 1)
bag2 (field1, field2): (b, 3)
result (field1, field2): (a, 1), (b, 3)
ORDER: Ex: result = ORDER input BY field1 ;
input (field1, field2): (3, a), (1, b)
result (field1, field2): (1, b), (3, a)