Adaptive Semantic Data Management Techniques for Federations of Endpoints

The tutorial will be presented on May 27, 2012, at the 9th Extended Semantic Web Conference (ESWC 2012).

Short description of the tutorial:

The tutorial describes the traditional optimize-then-execute paradigm implemented in existing RDF engines and its main drawbacks when a large volume of data needs to be accessed remotely. To overcome the limitations of current query processing approaches, we will present adaptive query processing techniques defined in the context of database management systems and discuss their applicability to the Semantic Web. We will also describe current solutions proposed in the context of the Semantic Web to access remote data. The target audience includes researchers and practitioners who develop or use query engines to consume Linked and Big Data through SPARQL endpoints. Participants will learn the limitations of existing RDF query engines, how current techniques can be extended to access remote data from Linked Data sets, and how to hide delays caused by unpredictable data transfers and dataset availability. A hands-on session will allow attendees to evaluate the performance and robustness of existing approaches.

Published in: Technology, Education

Adaptive Semantic Data Management Techniques for Federations of Endpoints - Tutorial Description

Maria-Esther Vidal (1), Edna Ruckhaus (1), Maribel Acosta (1,2), Cosmin Basca (3), Gabriela Montoya (1)

(1) Universidad Simón Bolívar, Venezuela, {mvidal, ruckhaus, macosta, gmontoya}@ldc.usb.ve
(2) Institute AIFB, Karlsruhe Institute of Technology, Germany, Maribel.Acosta@aifb.uni-karlsruhe.de
(3) Department of Informatics, University of Zurich, Switzerland, basca@ifi.uzh.ch

January 20, 2012

Abstract

Emerging technologies that support networks of sensors or mobile smartphones are making available an extremely large volume of data, or Big Data; additionally, in the context of the Cloud of Linked Data, a large number of huge RDF linked datasets have become available, and this number keeps growing. Scalable and efficient RDF engines that follow the traditional optimize-then-execute paradigm have been developed to access RDF data locally, while SPARQL endpoints have been implemented for remote query processing. Given the size of existing datasets, the lack of statistics to describe available sources, and the unpredictable conditions of remote queries, existing solutions are still insufficient. First, the most efficient RDF engines base their query processing algorithms on physical access and storage structures that are stored locally; however, because of the size of existing linked datasets, loading the data and their links is not always feasible. Second, remote linked data query processing can be extremely costly because of the lack of query planning; also, current techniques are not adaptable to unpredictable data transfers or data availability, so executions can be unsuccessful. To overcome these limitations, physical query operators and execution engines need to be able to access remote data and adapt query execution schedulers to data availability.
In this tutorial we present the basis of adaptive query processing frameworks defined in the database area, and their applicability in the Linked and Big Data context where data can be accessed through SPARQL endpoints. This tutorial targets any conference attendee who wants to know the limitations of existing RDF engines, adaptive query processing techniques, and how traditional RDF data management approaches can be made well suited to runtime conditions and extended to access a large volume of data distributed in federations of SPARQL endpoints. The first edition of this tutorial was presented at ESWC 2011.

1 Tutorial Description

1.1 Aims and Target Audience

The tutorial describes the traditional optimize-then-execute paradigm implemented in existing RDF engines and its main drawbacks when a large volume of data needs to be accessed remotely. To overcome the limitations of current query processing approaches, we will present adaptive query processing techniques defined in the context of database management systems and discuss their applicability to the Semantic Web. We will also describe current solutions proposed in the context of the Semantic Web to access remote data. The target audience includes researchers and practitioners who develop or use query engines to consume Linked and Big Data through SPARQL endpoints. Participants will learn the limitations of existing RDF query engines, how current techniques can be extended to access remote data from Linked datasets, and how to hide delays caused by unpredictable data transfers and dataset availability. A hands-on session will allow attendees to evaluate the performance and robustness of existing approaches.

1.2 Presentation Method and Technical Requirements

We propose a full-day tutorial. Theoretical issues will be presented first; then, a hands-on session will allow attendees to evaluate existing query processing approaches and determine the pros and cons of each one. The morning session will comprise a short introduction, three lectures, and one coffee break of fifteen minutes. The introduction will present the core concepts of a data management engine. Next, the first and second lectures will describe the query execution and optimization techniques of the classical optimize-then-execute paradigm, and will illustrate the limitations of existing SPARQL endpoints and of existing approaches to query Linked and Big Data. The third lecture will then present adaptive query processing techniques proposed in the context of Databases and the Semantic Web. In the afternoon session, the applicability of existing approaches to consume Linked Data will be described and an evaluation of state-of-the-art engines will be conducted.
We expect participants to have just a basic understanding of RDF and SPARQL.

2 Justification for the tutorial in ESWC 2012

In the context of the Cloud of Linked Data, a large number of diverse datasets have become available, and the published data and links have grown exponentially in recent years. Billions of triples from life science research groups, government agencies, Wikipedia, and entertainment organizations currently comprise the Cloud.

Following the guidelines to publish and link data on the Cloud, a great number of SPARQL endpoints that support remote query processing over linked data have become available, and this number keeps growing. Additionally, to scale up to the size of existing datasets, RDF engines have implemented storage and access structures and query processing techniques for local query processing. However, although the semantic data management community actively works on more suitable linked data query processing techniques, access to the Cloud of Linked datasets is still limited and insufficient, because data have to be stored locally or because some SPARQL endpoints only support very lightweight use. To successfully execute real-world queries, in addition to accessing remote data, existing query solutions have to be able to adapt query execution schedulers to data availability. This tutorial aims to illustrate the limitations of existing approaches and how they can be extended to be well suited to remote query processing and runtime conditions. We consider this tutorial ideally co-located with ESWC 2012, because research institutions that traditionally attend ESWC contribute actively to the domain of RDF data management. In particular, one of the conference research tracks is on semantic data management, and query processing of semantic data is one of its topics of interest. Thus, many of the conference attendees could see the tutorial as a place to discuss possible solutions to current semantic data management limitations.
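The requirement that schedulers adapt to data availability can be illustrated with a toy simulation. The sketch below is only an illustration under stated assumptions and is not taken from any engine named in this tutorial: the query is assumed to be a plain union of two sources, each source is modeled as a list of hypothetical `(arrival_time, tuple)` pairs sorted by arrival time, and the function names `fixed_order` and `availability_driven` are invented for this example.

```python
import heapq
from typing import List, Tuple

# (arrival time in seconds, tuple payload); both fields are simulated values.
Arrival = Tuple[float, str]


def fixed_order(sources: List[List[Arrival]]) -> List[Arrival]:
    """Read the sources one after another, blocking until each tuple arrives."""
    clock = 0.0
    output = []
    for source in sources:
        for arrival, item in source:
            clock = max(clock, arrival)  # wait if the tuple has not arrived yet
            output.append((clock, item))
    return output


def availability_driven(sources: List[List[Arrival]]) -> List[Arrival]:
    """Emit whichever tuple has arrived first, across all sources."""
    # heapq.merge lazily merges the already sorted per-source arrival lists.
    return list(heapq.merge(*sources))


# Hypothetical workload: one slow endpoint and one fast endpoint.
slow = [(5.0, "a1"), (6.0, "a2")]
fast = [(0.1, "b1"), (0.2, "b2")]

print(fixed_order([slow, fast])[0])          # (5.0, 'a1')
print(availability_driven([slow, fast])[0])  # (0.1, 'b1')
```

In this simulated workload both strategies finish at 6.0 seconds, because the slow source bounds the total time, but the availability-driven order produces its first answer at 0.1 seconds instead of 5.0. This is one simple sense in which an adaptive scheduler can hide delays.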
3 Outline of the Tutorial

The goal of the tutorial is to highlight the limitations of existing RDF query engines, and to introduce the basic concepts of existing adaptive query processing techniques and how they can be used to access SPARQL endpoints effectively and efficiently.

3.1 Content

The tutorial will cover traditional data management solutions that implement the optimize-then-execute paradigm, and their pros and cons for Linked Data query processing; novel storage and access data structures, and the query optimization and execution techniques implemented by state-of-the-art RDF engines will be described. Then, adaptive frameworks defined in the database area to manage remote query processing will be analyzed; adaptive operators such as symmetric hash joins (binary and n-ary), routing operators, and adaptive engines will be studied. Finally, the applicability of adaptive techniques will be illustrated with existing query processing engines for federations of SPARQL endpoints. Attendees will evaluate the performance and robustness of state-of-the-art approaches during a hands-on session; observed results will be discussed with the attendees.

3.2 Schedule

Morning Session

Introduction (20 minutes):
  • Traditional data management system architecture and its main components.
  • Basic terminology.

Lecture 1: The Optimize-then-Execute Paradigm (50 minutes):
  • Cost-based optimization techniques.
  • Traditional iterator model architecture.
  • Centralized data management physical operators.
  • Centralized data management query engines.

Lecture 2: Existing RDF Engines (50 minutes):
  • Query optimization and execution techniques in existing RDF engines like RDF-3X [3].
  • SPARQL endpoints and their execution model.
  • The SPARQL 1.1 Federation extension [6].
  • RDF engines for query processing against federations of SPARQL endpoints; approaches such as FedX [5] and ARQ [7] will be studied.
Coffee-Break (15 minutes)

Lecture 3: Adaptive Query Processing Techniques (100 minutes):
  • Intra-operator solutions; adaptive physical operators: symmetric hash joins, n-ary joins.
  • Inter-operator solutions; Eddy operators, query processing schedulers, and routing policies.
  • Adaptive query engines.

Lunch (120 minutes)
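As a concrete picture of the symmetric hash join listed under Lecture 3, the following is a minimal in-memory sketch of the binary case. It assumes that tuples are Python dictionaries, that both inputs share one join attribute, and that arrivals from the two inputs are presented as a single interleaved stream of `(side, row)` pairs; the function name and data are hypothetical and do not reproduce the implementation of any engine cited here.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, Tuple

Row = Dict[str, Any]


def symmetric_hash_join(arrivals: Iterable[Tuple[str, Row]], key: str) -> Iterator[Row]:
    """Join two inputs incrementally, in whatever order their tuples arrive.

    `arrivals` yields ("left" | "right", row) pairs. Each input has its own
    hash table; a new tuple is inserted into its own table and immediately
    probed against the other one, so neither input blocks the other.
    """
    tables = {"left": defaultdict(list), "right": defaultdict(list)}
    for side, row in arrivals:
        other = "right" if side == "left" else "left"
        value = row[key]
        tables[side][value].append(row)               # insert phase
        for match in tables[other].get(value, []):    # probe phase
            left, right = (row, match) if side == "left" else (match, row)
            yield {**left, **right}


# Hypothetical interleaving of tuples from two endpoints.
stream = [
    ("left", {"s": 1, "name": "a"}),
    ("right", {"s": 1, "label": "x"}),
    ("right", {"s": 2, "label": "y"}),
    ("left", {"s": 2, "name": "b"}),
]
for result in symmetric_hash_join(stream, "s"):
    print(result)
```

Because every tuple is both inserted and probed on arrival, a result is emitted as soon as its two matching tuples are present, regardless of which input delivered first. The sketch keeps both hash tables entirely in memory, which a real operator over large remote sources could not assume.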
Afternoon Session

Lecture 4: Adaptive Approaches for Federations of SPARQL Endpoints (50 minutes):
  • Requirements for query processing in federations of SPARQL endpoints.
  • Existing benchmarks for evaluating query processing engines for federations of SPARQL endpoints, e.g., FedBench [4].
  • Adaptive query processing engines for federations of endpoints; approaches such as ANAPSID [1] and Avalanche [2] will be studied.

Coffee-Break (15 minutes)

Hands-on Session: RDF Storage Systems Evaluation (100 minutes): existing benchmarks will be used to evaluate the performance and robustness of state-of-the-art solutions; ARQ, FedX, ANAPSID and Avalanche will be analyzed.

Analysis and Discussion of the Evaluation Results (30 minutes): results of the evaluation will be analyzed and discussed with the attendees.

4 Former Editions of the Tutorial

The first edition of the tutorial, named Adaptive Semantic Data Management Techniques for Linked Data, was held at ESWC 2011 (http://www.eswc2011.org/content/tutorials); it was a half-day tutorial that included neither a hands-on session nor the evaluation of state-of-the-art approaches such as Avalanche, ARQ, ANAPSID and FedX.

5 Information about the Presenters

Edna Ruckhaus has been a Full Professor of the Computer Science department at the Universidad Simón Bolívar, Venezuela, since 1998, where she has taught several Database courses at the undergraduate level. She was a visiting scholar of the research group Mindswap (Maryland Information and Network Dynamics Lab Semantic Web Agents Project) in 2004-2005. She has over 20 publications in international and national conferences and journals. She has been a reviewer and has participated in the Program Committee of several international conferences. She was a member of the Organizing Committee of the Workshop on Applications of Logic Programming to the Semantic Web and Semantic Web Services (ALPSWS2007), co-located with the International Conference on Logic Programming, and is Co-Chair of the Organizing Committee of the ESWC 2011 and 2012 Workshops on Resource Discovery. She co-organized and co-lectured the tutorial on Adaptive Semantic Data Management Techniques for Linked Data at ESWC 2011.

Maria-Esther Vidal is a Full Professor of the Computer Science department at the Universidad Simón Bolívar, Venezuela, where she has taught several Database and Semantic Web courses at the undergraduate and graduate levels. Prof. Vidal has also been a Research Associate and Visiting Researcher at the Institute of Advanced Computer Studies of the University of Maryland, and Visiting Professor at Universidad Politécnica de Catalunya, University of Laguna, Spain, and Leipzig, Germany. She has participated in several international projects supported by NSF (USA), AECI (Spain) and CNRS (France), and has advised six PhD students and more than 55 master's and undergraduate students. Professor Vidal has published more than 60 papers in international conferences and journals of the Database and Semantic Web areas. She has been a reviewer and has participated in the Program Committee of several international journals and conferences. She was Co-Chair of the Workshop on Resource Discovery (RED2010) and accompanying professor of On the Move Academy (OTMa), and is Co-Chair of the Organizing Committee of the ESWC 2011 and 2012 Workshops on Resource Discovery. She co-organized and co-lectured the tutorial on Adaptive Semantic Data Management Techniques for Linked Data at ESWC 2011.

Maribel Acosta is a PhD student at Institute AIFB, Karlsruhe Institute of Technology, Germany. She holds a Master's degree in Computer Science from the Universidad Simón Bolívar, where she was a Teaching Assistant and taught Logic, Discrete Math, and Databases labs at the undergraduate level. She has authored seven publications in international conferences and workshops. Her research interests are Adaptive Query Execution techniques for Linked and Big Data.

Cosmin Basca is a PhD student at the University of Zurich, Department of Informatics, Switzerland. He holds a Master's degree in Computer Science from "Lucian Blaga" University of Sibiu, Romania, where he did research in image processing and computer vision. Later, while part of the Digital Enterprise Research Institute in Galway, Ireland, he focused his research on the Semantic Web, specifically Semantic Data Management. His research interests include, among others, large-scale distributed graph data management systems and algorithms, and Linked Data.

Gabriela Montoya is a Lecturer of the Computer Science Department at the Universidad Simón Bolívar, where she has taught Logic, Algorithms and Programming Languages courses and labs at the undergraduate level. She holds a Master's degree in Computer Science from the Universidad Simón Bolívar and is currently a doctoral student at the same university; her research interests are Data Integration and Query Processing techniques in Emerging Infrastructures.

References

[1] M. Acosta, M.-E. Vidal, T. Lampo, J. Castillo, and E. Ruckhaus. ANAPSID: AN Adaptive query ProcesSing engIne for sparql enDpoints. In Proceedings of the International Semantic Web Conference (ISWC), 2011.
[2] C. Basca and A. Bernstein. Avalanche: Putting the Spirit of the Web back into Semantic Web Querying. In SSWS2010 Workshop, Shanghai, China, 2010.
[3] T. Neumann and G. Weikum. RDF-3X: a RISC-style engine for RDF. Proc. VLDB, 1(1), 2008.
[4] M. Schmidt, O. Görlitz, P. Haase, A. Schwarte, G. Ladwig, and T. Tran. FedBench: A benchmark suite for federated semantic data query processing. In International Semantic Web Conference, 2011.
[5] A. Schwarte, P. Haase, K. Hose, R. Schenkel, and M. Schmidt. FedX: Optimization techniques for federated query processing on linked data. In International Semantic Web Conference (1), pages 601–616, 2011.
[6] E. P. Steve Harris, Andy Seaborne. SPARQL 1.1 Query Language, June 2010.
[7] M. Stocker, A. Seaborne, A. Bernstein, C. Kiefer, and D. Reynolds. SPARQL basic graph pattern optimization using selectivity estimation. In International Semantic Web Conference (ISWC), Beijing, China, 2008. ACM.
