Characterizing Machine Agent Behavior through SPARQL Query Mining

Mining SPARQL queries to understand the behavior of automated
programs (or machine agents) is an important step in designing
systems for the semantic web. We present techniques that differ
from state-of-the-art SPARQL mining techniques in two ways:
(1) they move away from a one-query-at-a-time view to a SPARQL
user session view, and (2) they look at the results of SPARQL
queries in addition to the queries themselves. With these two
approaches, we find two new patterns in SPARQL queries that help
us reason better about the underlying program that generated the
queries. Through a variety of experiments, we show that the
patterns found have significant support in all four datasets
provided by the USEWOD committee.




    Characterizing Machine Agent Behavior through SPARQL Query Mining: Presentation Transcript

    • Characterizing Machine Agent Behavior through SPARQL Query Mining. Aravindan Raghuveer, Yahoo! Inc., Bangalore. aravindr@yahoo-inc.com
    • Introduction: LOD Users
        - The LOD cloud has two types of users:
            - Humans (browsers)
            - Programs / machine agents
        Yahoo! Confidential
    • Introduction: LOD Access Methods
        - The data on the LOD cloud can be accessed in multiple ways.
        - For this work, we categorize them into two buckets:
            - SPARQL: a powerful declarative graph query language
            - Non-SPARQL: direct linked data requests
    • Motivation: User Behavior Understanding
        - A deep understanding of client behavior can help build “better” serving systems
        - Better: secure, scalable, available
        - Prior work:
            - Moller et al., WebSci 2010
            - Picalausa et al., SWIM 2011
            - Kirchberg et al., USEWOD 2011
            - Mario et al., USEWOD 2011
    • Summarizing... [Figure: grid of user type (Human Users, Machine Agents) against access method (Non-SPARQL, SPARQL)] This paper’s focus: machine agents using SPARQL
    • What is this paper about?
        - Mining of the USEWOD query log dataset to identify:
            - Two trends in machine agent querying
            - Two patterns in machine agent querying
    • The USEWOD dataset
        - Query logs of servers hosting a part of the LOD cloud data.
          Dataset   Type                   # records (million)   % SPARQL
          bio2rdf   Life sciences          ~0.2                  100%
          lgd       Geo                    ~1.9                  100%
          SWDF      Conference             ~16.7                 43.38%
          dbpedia   Structured wikipedia   ~36.2                 46.9%
    • Part-1: Two Trends in Machine Agent Querying. The Theme: “What are the overarching trends for SPARQL queries?”
    • Trend-1: SPARQL is here to stay! [Figure: SPARQL query volume for SWDF and dbpedia; label: 0.1 to 1 million] Take-away: SPARQL query volume is significant.
    • Trend-2: SPARQL is heavily used by machine agents. Took 17 million user agents from SPARQL queries from dbpedia and..
    • Part-2: Two Patterns in Machine Agent Querying. The Theme: “Looking at SPARQL query logs, can we reason about the program that generated the queries?”
    • Salient aspects of the proposed Query Mining Techniques
        - Move from per-query analysis to query session analysis
        - Move from query analysis to query result analysis
    • Pattern-1: Loops in Programs
        - Take-away:
            - Through per-user, temporal mining of logs, we discover patterns that are caused by loops in programs.
            - Significant support in all 4 datasets
    • Per-user Temporal Mining [Figure: original logs on a time axis are split into per-user streams (User-1, User-2, User-3, User-4) for user-level session analysis; a loop is marked]
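The per-user split on this slide can be sketched roughly as follows. This is a minimal illustration, not the method from the paper: the record layout `(user_id, timestamp, query)`, the function name `split_sessions`, and the 30-minute inactivity gap used to cut sessions are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_sessions(records, gap=timedelta(minutes=30)):
    """Group log records per user, order them by time, and cut sessions.

    records: iterable of (user_id, timestamp, query) tuples (assumed layout).
    gap: inactivity threshold that starts a new session (assumed value).
    Returns {user_id: [session, ...]}, each session a list of (timestamp, query).
    """
    by_user = defaultdict(list)
    for user, ts, query in records:
        by_user[user].append((ts, query))

    sessions = {}
    for user, entries in by_user.items():
        entries.sort(key=lambda e: e[0])  # temporal order within one user
        user_sessions = [[entries[0]]]
        for prev, cur in zip(entries, entries[1:]):
            if cur[0] - prev[0] > gap:
                user_sessions.append([])  # long pause: start a new session
            user_sessions[-1].append(cur)
        sessions[user] = user_sessions
    return sessions
```

Interleaved requests from different users are separated first, so that consecutive entries in a session really are successive queries from the same client.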
    • Intra Pattern Loop
        - Successive queries from the same user use the same “template”
        - Example: two successive queries:
            SELECT * WHERE { <http://bio2rdf.org/dr:D00332> <http://bio2rdf.org/ns/bio2rdf#xRef> <http://bio2rdf.org/cas:54-47-7> }
            SELECT * WHERE { <http://bio2rdf.org/dr:D00333> <http://bio2rdf.org/ns/bio2rdf#xRef> <http://bio2rdf.org/cas:54-47-7> }
        - Only the subject (D00332, D00333) varies
    • Detecting Intra Pattern Loop
        - We convert a query to its canonical form by replacing variables, URIs and literals with “keywords”.
        - Query: SELECT * WHERE { <http://bio2rdf.org/dr:D00332> <http://bio2rdf.org/ns/bio2rdf#xRef> <http://bio2rdf.org/cas:54-47-7> }
        - Canonical form of the previous queries: SELECT * WHERE { _URI_ _URI_ _URI_ }
        - Queries generated by the same template will have the same canonical form.
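A rough sketch of the canonical-form idea, assuming a regex-based tokenizer rather than a real SPARQL parser. Only the `_URI_` keyword appears on the slide; the `_LIT_`, `_VAR_` and `_NUM_` keywords, the function names, and the choice to treat numbers (such as LIMIT/OFFSET values) as a separate keyword are assumptions for illustration. Prefixed names such as `dbo:Person` are not handled.

```python
import re
from itertools import groupby

# One alternation per token class; IRIs must be written in angle brackets.
_TOKEN = re.compile(
    r'(?P<uri><[^<>\s]*>)'            # <http://...>
    r'|(?P<lit>"(?:[^"\\]|\\.)*")'    # "double-quoted literal"
    r'|(?P<var>[?$]\w+)'              # ?x or $x
    r'|(?P<num>\b\d+(?:\.\d+)?\b)'    # bare numbers, e.g. LIMIT 10
)
_KEYWORD = {"uri": "_URI_", "lit": "_LIT_", "var": "_VAR_", "num": "_NUM_"}


def canonical_form(query: str) -> str:
    """Replace URIs, literals, variables and numbers with keywords."""
    replaced = _TOKEN.sub(lambda m: _KEYWORD[m.lastgroup], query)
    return " ".join(replaced.split())  # normalize whitespace


def intra_loop_runs(queries, min_len=2):
    """Return (canonical_form, run_length) for runs of successive queries
    from one user session that share the same canonical form."""
    runs = []
    for form, group in groupby(queries, key=canonical_form):
        n = sum(1 for _ in group)
        if n >= min_len:
            runs.append((form, n))
    return runs
```

Applied to the two bio2rdf queries from the previous slide, `canonical_form` yields `SELECT * WHERE { _URI_ _URI_ _URI_ }` for both, so `intra_loop_runs` reports them as one run of length 2.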
    • Salient Aspects of Intra Pattern Loops
        - Iterate over a dictionary of values (categorical)
        - Iterate over a numerical range (for example, LIMIT and OFFSET parameters in SPARQL queries)
        - Multiple levels of nested loops with the same intra loop pattern
        - 4 parameters to quantify the above (in paper)
    • Inter Pattern Loops
        - Found loops that iterate over a set of patterns: P1,P2,P3, P1,P2,P3, P1,P2,P3
        - Typically used when the output of the first query goes as a parameter to the second query (examples in paper)
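One way to look for such repeating cycles, given the sequence of canonical forms in a user session, is a simple greedy scan. This is an illustrative sketch only: the function name, the `max_period` and `min_repeats` parameters, and the greedy longest-coverage choice are assumptions, not the detection procedure from the paper.

```python
def find_inter_pattern_loops(forms, max_period=5, min_repeats=2):
    """Find cycles such as P1,P2,P3,P1,P2,P3 in a list of canonical forms.

    forms: list of canonical-form strings in session order.
    Returns a list of (start_index, cycle_tuple, repeat_count).
    """
    loops = []
    i, n = 0, len(forms)
    while i < n:
        best = None  # (period, repeats) covering the most queries from i
        for period in range(2, max_period + 1):
            cycle = forms[i:i + period]
            # Need a full cycle with at least two distinct patterns;
            # a single repeated pattern is an intra pattern loop instead.
            if len(cycle) < period or len(set(cycle)) < 2:
                continue
            repeats = 1
            while forms[i + repeats * period:i + (repeats + 1) * period] == cycle:
                repeats += 1
            if repeats >= min_repeats and (
                best is None or period * repeats > best[0] * best[1]
            ):
                best = (period, repeats)
        if best:
            period, repeats = best
            loops.append((i, tuple(forms[i:i + period]), repeats))
            i += period * repeats  # skip past the detected loop
        else:
            i += 1
    return loops
```

For the sequence on the slide (P1,P2,P3 three times), the scan reports a single cycle `("P1", "P2", "P3")` starting at index 0 with 3 repetitions.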
    • Results [Figure: chart of loop support for the bio2rdf, lgd, swdf and dbpedia datasets; value labels shown: 86%, 32%, 40%, 16%] Take-away: Significant support for loops!
    • Pattern-2: Querying for dbpedia Linkage
        - Take-away: by executing each query and analyzing its results, we find that a portion of queries “look” for dbpedia links
        - Results:
            - 20 months of SWDF queries: on average, 8% look for dbpedia URLs
            - 2 days' worth of lgd queries: 26.5% look for dbpedia URLs
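The result-side check might look something like the sketch below. It assumes each query has already been executed and its result rows are available as dictionaries mapping variable names to string values; that representation, the function names, and the prefix test on `http://dbpedia.org/` are assumptions made for the example, not the paper's exact criterion.

```python
DBPEDIA_PREFIX = "http://dbpedia.org/"  # assumed marker for a dbpedia URL


def looks_for_dbpedia(result_rows):
    """True if any bound value in the result rows is a dbpedia URL.

    result_rows: list of {variable_name: value_string} dicts (assumed shape).
    """
    return any(
        isinstance(value, str) and value.startswith(DBPEDIA_PREFIX)
        for row in result_rows
        for value in row.values()
    )


def dbpedia_share(results_per_query):
    """Fraction of queries whose results contain a dbpedia URL."""
    if not results_per_query:
        return 0.0
    hits = sum(1 for rows in results_per_query if looks_for_dbpedia(rows))
    return hits / len(results_per_query)
```

Looking at results rather than query text catches queries that never mention dbpedia explicitly but still retrieve links to it.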
    • Summary & Conclusions
        - Proposed 2 new ways of SPARQL query mining:
            - Session view
            - Analyze results in addition to the query
        - Showed that machine agents look for dbpedia using the owl:sameAs annotation.
        - Influence on system design:
            - Can we pre-fetch elements in a loop beforehand?
            - Prioritize dbpedia attributes for caching
        - Influence on log collection & analysis:
            - Stratified random sampling to remove the effect of loops
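As a sketch of the last point, one possible reading of stratified sampling is to treat each (user, canonical form) pair as a stratum and draw a fixed number of queries from each, so that a loop issuing thousands of near-identical queries does not dominate the sample. The stratum definition, the function name, and the `per_stratum` and `seed` parameters are assumptions; the slide does not specify them.

```python
import random
from collections import defaultdict


def stratified_sample(records, per_stratum=1, seed=0):
    """Draw up to per_stratum queries from each (user, canonical form) stratum.

    records: iterable of (user_id, canonical_form, query) tuples (assumed layout).
    Returns a list of (user_id, canonical_form, query) tuples.
    """
    rng = random.Random(seed)  # seeded for repeatable samples
    strata = defaultdict(list)
    for user, form, query in records:
        strata[(user, form)].append(query)

    sample = []
    for user, form in sorted(strata):
        queries = strata[(user, form)]
        k = min(per_stratum, len(queries))
        sample.extend((user, form, q) for q in rng.sample(queries, k))
    return sample
```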
    • For the great data! For the great feedback & comments! For listening!
    • The famous LOD Cloud... 7 billion triples and counting!