1. PAXQuery
Efficient Parallel Processing of Complex XQuery
Owner: Juan A. M. Naranjo
Presenter: Katerina Tzompanaki
2. PAXQuery
• Execution engine for the XML query language (XQuery) that runs on the Apache Flink (previously known as Stratosphere) platform. It:
  • Translates XML queries to algebraic trees
  • Maps algebraic trees to PACT plans
  • Parallelizes XML queries
• Development period: January 2013 - December 2014
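The translation from an algebraic tree to a PACT plan can be pictured as a recursive walk over the tree. The sketch below is illustrative only: `AlgNode`, `contractFor` and the operator kinds are hypothetical names, and the pairing of operator kinds with PACT contracts (Map, Match, Reduce, Cross) is an assumed example, not PAXQuery's actual translation rules.

```java
import java.util.List;

final class AlgebraToPactSketch {

    /** Hypothetical algebraic operator node: a kind plus its input operators. */
    static final class AlgNode {
        final String kind;
        final List<AlgNode> children;

        AlgNode(String kind, AlgNode... children) {
            this.kind = kind;
            this.children = List.of(children);
        }
    }

    /** Assumed, illustrative choice of PACT contract for each operator kind. */
    static String contractFor(String kind) {
        switch (kind) {
            case "Scan":      return "Source";
            case "Selection": return "Map";
            case "Join":      return "Match";
            case "GroupBy":   return "Reduce";
            case "Product":   return "Cross";
            default:          return "Map";
        }
    }

    /** Recursive translation: each node becomes a contract applied to its translated inputs. */
    static String translate(AlgNode node) {
        StringBuilder sb = new StringBuilder(contractFor(node.kind));
        if (!node.children.isEmpty()) {
            sb.append('(');
            for (int i = 0; i < node.children.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(translate(node.children.get(i)));
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A join whose left input is filtered by a selection.
        AlgNode tree = new AlgNode("Join",
                new AlgNode("Selection", new AlgNode("Scan")),
                new AlgNode("Scan"));
        System.out.println(translate(tree));
    }
}
```

Operators with independent inputs, such as the two branches of the join here, are the places where a data-parallel platform can evaluate subplans in parallel.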
3. Code location
• Under Stratosphere's project GitHub page (not accessible till 12/2014)
  URL: https://github.com/stratosphere/paxquery
• In Gforge (not frequently updated, needs permission)
  URL: https://gforge.inria.fr/scm/viewvc.php/xmlstratosphere/paxquery/?root=xmlinthecloud
• Parts are based on code from the ViP2P project
  URL: https://scm.gforge.inria.fr/svn/vip2p/trunk/vip2p
7. Code structure
It is a Maven project that has:
• Input: XQuery in a text file
• Output: XML result, in a text file
• Modules:
  • paxquery-algebra: algebraic plan and algebraic operators
  • paxquery-client: old client (not to be used in the release)
  • paxquery-common: global-scope functionality (e.g. XML navigation)
  • paxquery-pact: custom PACT operators for Apache Flink
  • paxquery-translation: algebraic tree to PACT tree
  • paxquery-xparser: XQuery to algebraic tree (under construction)
8. PAXQuery workflow
[Diagram] Nodes: Flink (org.apache.flink.client.CliFrontend), paxquery-xparser (fr.inria.oak.paxquery.xparser.client.Xclient, fr.inria.oak.paxquery.XQueryVisitorImplementation), paxquery-algebra, paxquery-common, paxquery-translation, paxquery-pact.
Legend: the source of an arrow instantiates or invokes its end.
Steps, as numbered on the diagram:
1. Invocation of getPlan()
2. Instantiation of XQueryVisitorImplementation; invocation of getLogicalPlan()
3. Instantiation of:
  • LogicalPlan and BaseLogicalOperator objects (paxquery-algebra)
  • BasePredicate, NavigationTreePattern and ConstructionTreePattern objects (paxquery-common)
4. Invocation of planTranslate() (paxquery-translation)
5. Instantiation of Operator objects (paxquery-pact)
6. planTranslate() returns a Plan object (a PACT plan)
7. getLogicalPlan() returns a LogicalPlan object (an algebraic plan)
8. getPlan() returns a Plan object (a PACT plan)
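The call chain of this workflow can be sketched in a few lines of Java. Only the method names getPlan(), getLogicalPlan() and planTranslate() and the type names LogicalPlan and Plan come from the slide; the class bodies, signatures and placeholder operator names below are hypothetical stand-ins, not the real PAXQuery or Flink classes. The sketch also assumes that the algebraic plan is handed to planTranslate(), which the step numbering on the diagram does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

// All types below are hypothetical stand-ins for illustration.
final class WorkflowSketch {

    /** Stand-in for the algebraic plan (LogicalPlan on the slide). */
    static final class LogicalPlan {
        final List<String> operators = new ArrayList<>();
    }

    /** Stand-in for the PACT plan (Plan on the slide). */
    static final class Plan {
        final List<String> operators = new ArrayList<>();
    }

    /** Steps 2, 3 and 7: build an algebraic plan for the query text. */
    static LogicalPlan getLogicalPlan(String xquery) {
        if (xquery == null || xquery.trim().isEmpty()) {
            throw new IllegalArgumentException("empty query");
        }
        LogicalPlan logical = new LogicalPlan();
        // Placeholder operators; a real parser would derive them from the query.
        logical.operators.add("Navigation");
        logical.operators.add("Construction");
        return logical;
    }

    /** Steps 4 to 6: create one PACT-level operator per algebraic operator. */
    static Plan planTranslate(LogicalPlan logical) {
        Plan plan = new Plan();
        for (String op : logical.operators) {
            plan.operators.add("Pact" + op);
        }
        return plan;
    }

    /** Steps 1 and 8: the entry point called by the execution framework. */
    static Plan getPlan(String xquery) {
        return planTranslate(getLogicalPlan(xquery));
    }

    public static void main(String[] args) {
        Plan plan = getPlan("for $b in doc(\"books.xml\")//book return $b/title");
        System.out.println(plan.operators);
    }
}
```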
9. Dependencies between modules
[Diagram] Nodes: Flink (org.apache.flink.client.CliFrontend), paxquery-xparser (fr.inria.oak.paxquery.xparser.client.Xclient, fr.inria.oak.paxquery.XQueryVisitorImplementation), paxquery-algebra, paxquery-common, paxquery-pact, paxquery-translation.
Legend: the source of an arrow depends on its end (we can see it as "import" statements).
10. External code dependencies
• Apache Flink
• Log4j
• Apache Commons Configuration
• Apache Commons Lang
• Google Guava
• JUnit
• ANTLR v4: the grammar parser used in paxquery-xparser
• XMLUnit
• JSON Simple
• DOT
• To be used: Dagre-d3, for the new web interface design
11. TODO
• Finish the development of paxquery-xparser
• Web client with the following features:
  • Input query
  • XML output
  • Diagrams of algebraic and PACT plans, as well as navigation tree patterns