Characterizing Machine Agent Behavior through SPARQL Query Mining


Mining SPARQL queries to understand the behavior of automated
programs (or machine agents) is an important step in designing
systems for the semantic web. We present techniques that differ
from state-of-the-art SPARQL mining techniques in two ways:
(1) we move away from a one-query-at-a-time view to a SPARQL
user session view, and (2) we look at the results of SPARQL
queries in addition to the queries themselves. With these two
approaches, we are able to find two new patterns in SPARQL
queries that help us reason better about the underlying program
that generated them. Through a variety of experiments, we show
that the patterns found have significant support in all four
datasets provided by the USEWOD committee.

Published in: Education

Characterizing Machine Agent Behavior through SPARQL Query Mining

  1. Characterizing Machine Agent Behavior through SPARQL Query Mining. Aravindan Raghuveer, Yahoo! Inc., Bangalore.
  2. Introduction: LOD Users
     - The LOD cloud has two types of users:
       - Humans (browsers)
       - Programs / machine agents
     Yahoo! Confidential
  3. Introduction: LOD Access Methods
     - The data on the LOD cloud can be accessed in multiple ways.
     - For this work, we categorize them into two buckets:
       - SPARQL: a powerful declarative graph query language
       - Non-SPARQL: direct linked data requests
  4. Motivation: User Behavior Understanding
     - A deep understanding of client behavior can help build “better” serving systems.
     - Better: secure, scalable, available
     - Prior work:
       - Moller et al., WebSci 2010
       - Picalausa et al., SWIM 2011
       - Kirchberg et al., USEWOD 2011
       - Mario et al., USEWOD 2011
  5. Summarizing... [Diagram: users (human users, machine agents) crossed with access methods (Non-SPARQL, SPARQL); this paper's focus is machine agents issuing SPARQL queries.]
  6. What is this paper about?
     - Mining of the USEWOD query log dataset to identify:
       - Two trends in machine agent querying
       - Two patterns in machine agent querying
  7. The USEWOD dataset
     - Query logs of servers hosting a part of the LOD cloud data.

     Dataset   Type                   # records (million)   % SPARQL
     bio2rdf   Life sciences          ~0.2                  100%
     lgd       Geo                    ~1.9                  100%
     SWDF      Conference             ~16.7                 43.38%
     dbpedia   Structured wikipedia   ~36.2                 46.9%
  8. Part-1: Two Trends in Machine Agent Querying. The theme: “What are the overarching trends for SPARQL queries?”
  9. Trend-1: SPARQL is here to stay!
     - [Chart: SPARQL query volume for SWDF and dbpedia; axis label: 0.1 to 1 million]
     - Take-away: SPARQL query volume is significant.
  10. Trend-2: SPARQL is heavily used by machine agents. Took 17 million user agents from dbpedia SPARQL queries and ...
  11. Part-2: Two Patterns in Machine Agent Querying. The theme: “Looking at SPARQL query logs, can we reason about the program that generated the queries?”
  12. Salient Aspects of the Proposed Query Mining Techniques
      - Move from per-query analysis to query session analysis
      - Move from query analysis to query result analysis
  13. Pattern-1: Loops in Programs
      - Take-away:
        - Through per-user, temporal mining of logs, we discover patterns that are caused by loops in programs.
        - Significant support in all 4 datasets
  14. Per-user Temporal Mining [Diagram labels: Time; Original Logs; User-level Session Analysis (User-1, User-2, User-3, User-4); Loop]
  15. Intra Pattern Loop
      - Successive queries from the same user use the same “template”.
      - Example: two successive queries: SELECT * WHERE {} and SELECT * WHERE {}
      - Only the subject (D00332, D00333) varies.
  16. Detecting Intra Pattern Loop
      - We convert a query to its canonical form by replacing variables, URIs, and literals with “keywords”.
      - Example query: SELECT * WHERE {}
      - Canonical form of the previous queries: SELECT * WHERE { _URI_ _URI_ _URI_ }
      - Queries generated by the same template have the same canonical form.
  17. Salient Aspects of Intra Pattern Loops
      - Iterate over a dictionary of values (categorical)
      - Iterate over a numerical range (for example, the LIMIT and OFFSET parameters in SPARQL queries)
      - Multiple levels of nested loops with the same intra loop pattern
      - 4 parameters to quantify the above (in paper)
  18. Inter Pattern Loops
      - Found loops that iterate over a set of patterns: P1, P2, P3, P1, P2, P3, P1, P2, P3
      - Typically used when the output of the first query is passed as a parameter to the second query (examples in paper).
  19. Results
      - [Chart: loop support for bio2rdf, lgd, swdf, and dbpedia; value labels in extraction order: 86%, 32%, 40%, 16%]
      - Take-away: significant support for loops!
  20. Pattern-2: Querying for dbpedia Linkage
      - Take-away: by executing each query and analyzing its results, we find that a portion of queries “look” for dbpedia links.
      - Results:
        - Over 20 months of SWDF queries, an average of 8% look for dbpedia URLs.
        - Over 2 days' worth of lgd queries, 26.5% look for dbpedia URLs.
  21. Summary & Conclusions
      - Proposed 2 new ways of SPARQL query mining:
        - Session view
        - Analyze results in addition to the query
      - Showed that machine agents look for dbpedia using the owl:sameAs annotation.
      - Influence on system design:
        - Can we pre-fetch elements in a loop beforehand?
        - Prioritize dbpedia attributes for caching
      - Influence on log collection & analysis:
        - Stratified random sampling to remove the effect of loops
  22. For the great data!! For the great feedback & comments. For listening!
  23. The famous LOD Cloud... 7 billion triples and counting!!
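
Note on slides 14 to 17: the per-user grouping, the canonical form, and the detection of intra pattern loops could be sketched roughly as below. This is a minimal illustration under stated assumptions, not the author's implementation. Only the `_URI_` keyword comes from the slides; the keyword names `_LITERAL_`, `_VAR_`, and `_NUM_`, the regular expressions, the `(user, timestamp, query)` record layout, and the `example.org` URIs are all assumed for the sketch. Prefixed names such as `ex:D00332` are left untouched here.

```python
import re
from collections import defaultdict
from itertools import groupby

# Replacement rules, applied in order. _URI_ is the keyword shown on slide 16;
# _LITERAL_, _VAR_ and _NUM_ are names assumed for this sketch.
_RULES = [
    (re.compile(r"<[^<>\s]*>"), "_URI_"),             # <http://...> IRIs
    (re.compile(r'"(?:[^"\\]|\\.)*"'), "_LITERAL_"),  # double-quoted literals
    (re.compile(r"[?$]\w+"), "_VAR_"),                # ?x or $x variables
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "_NUM_"),      # numbers, e.g. LIMIT 10
]


def canonical_form(query):
    """Replace URIs, literals, variables and numbers with keywords."""
    for pattern, keyword in _RULES:
        query = pattern.sub(keyword, query)
    return " ".join(query.split())  # collapse whitespace differences


def per_user_streams(records):
    """Group (user, timestamp, query) records into time-ordered query lists."""
    streams = defaultdict(list)
    for user, _timestamp, query in sorted(records, key=lambda r: (r[0], r[1])):
        streams[user].append(query)
    return dict(streams)


def intra_pattern_loops(queries, min_length=2):
    """Return (canonical form, run length) for runs of same-template queries."""
    forms = (canonical_form(q) for q in queries)
    runs = ((form, sum(1 for _ in group)) for form, group in groupby(forms))
    return [(form, n) for form, n in runs if n >= min_length]


# Hypothetical log records: two users interleaved in time.
log = [
    ("user-1", 1, "SELECT * WHERE { <http://example.org/D00332> <http://example.org/p> ?o }"),
    ("user-2", 2, 'ASK { ?s ?p "x" }'),
    ("user-1", 3, "SELECT * WHERE { <http://example.org/D00333> <http://example.org/p> ?o }"),
]
for user, queries in per_user_streams(log).items():
    print(user, intra_pattern_loops(queries))
```

Replacing numbers as well as quoted literals is a choice made so that loops over LIMIT and OFFSET values (slide 17) collapse to a single template; in the sample log, the two queries from user-1 share one canonical form and are reported as a run of length 2.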
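
Note on slide 18: one possible way to spot inter pattern loops such as P1, P2, P3, P1, P2, P3 in a user's sequence of canonical forms is to look for a short block of patterns that repeats back to back. The sketch below is an assumed approach, not the method from the paper; the function name and the `max_period` and `min_repeats` parameters are hypothetical.

```python
def inter_pattern_loops(forms, max_period=5, min_repeats=2):
    """Find back-to-back repetitions of a block of 2..max_period patterns.

    `forms` is one user's time-ordered list of canonical forms (or any
    hashable pattern labels). Returns (start index, block, repeat count).
    """
    loops = []
    i, n = 0, len(forms)
    while i < n:
        found = None
        for period in range(2, max_period + 1):
            block = forms[i:i + period]
            # Skip blocks that run off the end, and blocks made of a single
            # template (those are intra pattern loops, not inter pattern ones).
            if len(block) < period or len(set(block)) == 1:
                continue
            repeats = 1
            while forms[i + repeats * period:i + (repeats + 1) * period] == block:
                repeats += 1
            if repeats >= min_repeats:
                found = (tuple(block), repeats)
                break  # prefer the shortest repeating block
        if found:
            block, repeats = found
            loops.append((i, block, repeats))
            i += len(block) * repeats  # continue after the loop
        else:
            i += 1
    return loops


sequence = ["P1", "P2", "P3"] * 3 + ["P4"]
print(inter_pattern_loops(sequence))
```

For the sample sequence, the block (P1, P2, P3) starting at index 0 is reported with three repetitions, and the trailing P4 is ignored.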
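
Note on slide 20: the slides do not say how a query is judged to “look” for dbpedia links. One plausible check, assumed here, is to inspect each query's result set and see whether any bound URI lives under the dbpedia.org host. The sketch takes results already fetched in the standard SPARQL JSON results layout (`results.bindings`, with `type` and `value` per term) and does not execute queries itself; the function names are hypothetical.

```python
from urllib.parse import urlparse


def binds_dbpedia_uri(result):
    """True if any URI bound in a SPARQL JSON result is hosted on dbpedia.org."""
    for row in result.get("results", {}).get("bindings", []):
        for term in row.values():
            if term.get("type") != "uri":
                continue  # literals and blank nodes are not links
            host = urlparse(term.get("value", "")).hostname or ""
            if host == "dbpedia.org" or host.endswith(".dbpedia.org"):
                return True
    return False


def dbpedia_share(results):
    """Fraction of result sets that contain at least one dbpedia URI."""
    results = list(results)
    if not results:
        return 0.0
    return sum(binds_dbpedia_uri(r) for r in results) / len(results)


# Hypothetical result sets: one with a dbpedia link, one with a plain literal.
linked = {"head": {"vars": ["same"]}, "results": {"bindings": [
    {"same": {"type": "uri", "value": "http://dbpedia.org/resource/Berlin"}}]}}
plain = {"head": {"vars": ["name"]}, "results": {"bindings": [
    {"name": {"type": "literal", "value": "Berlin"}}]}}
print(dbpedia_share([linked, plain]))
```

ASK results, which carry a `boolean` field instead of bindings, count as having no dbpedia link in this sketch.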
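
Note on slide 21: the slides suggest stratified random sampling to remove the effect of loops but do not define the strata. A minimal sketch, assuming each stratum is a (user, canonical form) pair and that a fixed number of records is drawn per stratum, could look like this. The function name, the `key` and `per_stratum` parameters, and the sample records are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, per_stratum=1, seed=0):
    """Draw up to `per_stratum` records at random from each stratum.

    `key` maps a record to its stratum, for example (user, canonical form),
    so a loop that issued thousands of same-template queries contributes at
    most `per_stratum` records to the sample.
    """
    rng = random.Random(seed)  # seeded for repeatable samples
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Hypothetical records: user-1 ran a 50-iteration loop, user-2 sent one query.
records = [("user-1", "T1")] * 50 + [("user-2", "T2")]
print(stratified_sample(records, key=lambda r: r))
```

With a fixed allocation per stratum, the 50 looped queries and the single query each contribute one record, so query counts are no longer dominated by the loop.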