Using Graph theory to understand Intent & Concepts - Neo4j User Group (January 2013)
 

Title: Using Graph Theory to understand User Intent

Subtitle: Graph-based Natural Language Processing applied to real-time Machine Learning

Abstract:

We are in a Graph Renaissance period. The advent of high-performance free/open-source software, combined with inexpensive Cloud computing platforms, enables graphs of information to be manipulated and utilised at scales never before seen. While use-cases like mining social and web data with graphs are commonplace, their use in Natural Language Processing has largely been overlooked. In this presentation Michael Cutler will describe how TUMRA have used graph-based NLP algorithms as a core component of their upcoming digital marketing product TUMRA Optimize.


Presenter: Michael Cutler

Bio:

Michael is the CTO and co-founder of TUMRA, a Data Science startup based in Chiswick, West London. Having first discovered Hadoop back in 2008, Michael has been following the bleeding edge of ‘Big Data’ technology since before it was called ‘Big Data’ and has applied it to solve real-world problems.

Before starting TUMRA, Michael was a senior researcher in the R&D labs for British Sky Broadcasting, inventing new technologies and solutions for everything from Satellite, Video and Network systems through to Web and Mobile-based applications.

Website: http://tumra.com http://cotdp.com
Twitter: @tumra @cotdp

Usage Rights

© All Rights Reserved

Presentation Transcript

  • Using Graph Theory to understand Intent & Concepts – January 2013   tumra.com  
  • UNDERSTANDING INTENT & CONCEPTS
      • Use case:
          - Enhancing Social TV user experience
          - Matching users to content that interests them
      • Topics we’ll cover:
          - Natural Language Processing
          - Graph Theory
          - Machine Learning
  • USE CASE: ENHANCED SOCIAL TV
      • Objectives:
          - Increase engagement with content
          - Enhance multi-channel user experience
      • We built a prototype solution:
          - Mines unstructured data in real-time
          - Understands:
              - What interests individual users
              - Entities & Concepts (People, Places, Events)
  • THE CHALLENGE
    Help users to “follow the story” regardless of the news outlet, integrated web / second-screen
    Photo Credit: byrion on Flickr (cc)
  • THE PROBLEM
    [Diagram: Unstructured Data → Magic?!?! → Awesomeness!]
  • THE PROBLEM
      • Little useful data to work with…
          - Streams of continuous live TV
          - Have to create metadata
      • Where did we start?
          - Ingest several live news channels
          - Extract whatever data was available:
              - In-video text using OCR
              - Subtitles / Closed Captions
  • STEP 1: NAMED ENTITY RECOGNITION
    We used a simple N-Gram model for exact matches, then Apache Lucene for everything else…
  • EXAMPLE N.E.R.
    “David Cameron and the German Chancellor Angela Merkel meets to discuss the debt crisis and signal their approval for greater eurozone integration.”
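A minimal sketch of the exact-match half of this step, assuming a small in-memory dictionary of known names. The `KNOWN_ENTITIES` table, the `find_entities` helper and the maximum n-gram length are illustrative assumptions, not TUMRA's implementation, and the Lucene fallback for inexact matches is not shown.

```python
import re

# Hypothetical knowledge base: lower-cased surface form -> entity type.
KNOWN_ENTITIES = {
    "david cameron": "Person",
    "angela merkel": "Person",
    "german chancellor": "Concept",
    "debt": "Concept",
    "eurozone": "Place",
}


def find_entities(text, max_n=3):
    """Return (surface form, type) pairs found by exact n-gram lookup.

    At each token the longest n-gram is tried first, so a two-word name
    is preferred over any one-word match starting at the same position.
    """
    tokens = re.findall(r"\w+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in KNOWN_ENTITIES:
                found.append((candidate, KNOWN_ENTITIES[candidate]))
                i += n  # skip past the matched n-gram
                break
        else:
            i += 1  # no known entity starts at this token
    return found


sentence = (
    "David Cameron and the German Chancellor Angela Merkel meets to "
    "discuss the debt crisis and signal their approval for greater "
    "eurozone integration."
)
entities = find_entities(sentence)
```

A lookup like this only says that the string "david cameron" is a known name; it cannot tell which David Cameron is meant.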
  • INITIAL SOLUTION
    [Diagram: Unstructured Data → NER → Awesomeness!, backed by NoSQL]
  • OH NO!!! *facepalm*   Photo Credit: cesarastudillo on Flickr (cc)
  • DISAMBIGUATION
      • Which “David Cameron”?
          - We have many in our Knowledgebase
          - Sportsmen, actors, painters & characters…
      • Our initial simplistic approach was naïve
          - Works great with unambiguous matches
          - Best-case returns top-scoring entity
      • We needed a smarter approach
  • RECAP
      • We have an effectively ‘flat’ KB of Entities
          - “David Cameron” -> Politician (Person)
          - “Angela Merkel” -> Politician (Person)
          - “German Chancellor” -> Political office (Concept)
          - “Debt” -> Economic concept (Concept)
          - “Eurozone” -> Economic area (Place)
      • We needed a way to find relationships between Entities
  • THE BIG IDEA
    Graphs allow us to store relationships between entities, and graph algorithms allow us to interrogate those connections…
  • GRAPH DATABASES
    Neo4j, GraphLab, Apache Giraph, GoldenOrb… of course there are many more open-source & proprietary ones
  • SO, WHICH ONE?
    ???… it had to be fast, scalable and under active development
  • STEP 2: BUILDING RELATIONSHIPS
    We had 250 million Nodes and 4 billion Edges… great initial results but horrendously inefficient!
    Example: “David Cameron” & “Angela Merkel”
  • INITIAL IMPROVEMENTS
      • We didn’t need everything… just:
          - People: “David Cameron”, “Angela Merkel”
          - Places: “London”, “Downing Street”, “Eurozone”
          - Concepts: “Debt”, “President”, “Eurozone”
          - Things: Companies, Products etc.
      • Pruned the graph using Map/Reduce
      • This reduced the number of Entities…
          - … but we still had billions of connections
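The pruning step can be illustrated with an in-memory map/reduce over a toy edge list. This is only a sketch of the idea: the node-type table, the edge list and the function names are made up for illustration, and a real job at this scale would run on a cluster rather than in one Python process.

```python
from functools import reduce

# Entity types worth keeping, following the slide's People / Places /
# Concepts / Things split.
KEEP_TYPES = {"Person", "Place", "Concept", "Thing"}

# Hypothetical node -> type table and edge list.
NODE_TYPES = {
    "David Cameron": "Person",
    "Downing Street": "Place",
    "Debt": "Concept",
    "Some maintenance page": "Other",
}
EDGES = [
    ("David Cameron", "Downing Street"),
    ("David Cameron", "Some maintenance page"),
    ("David Cameron", "Debt"),
]


def map_edge(edge):
    """Map phase: emit the edge only if both endpoints have a kept type."""
    src, dst = edge
    if NODE_TYPES.get(src) in KEEP_TYPES and NODE_TYPES.get(dst) in KEEP_TYPES:
        return [(src, dst)]
    return []


def reduce_edges(adjacency, edge):
    """Reduce phase: group the surviving edges by source node."""
    src, dst = edge
    adjacency.setdefault(src, set()).add(dst)
    return adjacency


surviving = [kept for edge in EDGES for kept in map_edge(edge)]
pruned = reduce(reduce_edges, surviving, {})
```

Filtering on node type removes entities, but every surviving pair of entities can still be connected, which is consistent with the slide's remark that billions of connections remained.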
  • EXAMPLE: PEOPLE, PLACES, CONCEPTS
    “David Cameron and the German Chancellor Angela Merkel meets to discuss the debt crisis and signal their approval for greater eurozone integration.”
  • EXAMPLE: PEOPLE, PLACES, CONCEPTS
    “David Cameron and the German Chancellor Angela Merkel meets to discuss the debt crisis and signal their approval for greater eurozone integration.”
    Legend: Concepts, Places, People
  • DISAMBIGUATION
    [Diagram: Angela Merkel and the candidates David Cameron (politician), David Cameron (footballer), David Cameron (actor) and David Cameron (painter), linked through the nodes Living Person, Politician and Head of State]
    Possibilities: shortest path, number of common connections etc.
  • STEP 3: SIMPLIFYING THE GRAPH
    Sure, all that extra metadata was tasty but we didn’t need it all to solve the use-case… So we used Map/Reduce to count the common connections
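Counting common connections amounts to intersecting neighbour sets. The sketch below assumes each entity's neighbours fit in a Python set; the `NEIGHBOURS` table is a guess at the kind of data behind the slides, not data taken from them.

```python
# Hypothetical neighbour sets for one known entity and four candidates.
NEIGHBOURS = {
    "Angela Merkel": {"Living Person", "Politician", "Head of State"},
    "David Cameron (politician)": {"Living Person", "Politician", "Head of State"},
    "David Cameron (footballer)": {"Living Person"},
    "David Cameron (actor)": {"Living Person"},
    "David Cameron (painter)": set(),
}


def common_connections(a, b):
    """Number of nodes directly connected to both a and b."""
    return len(NEIGHBOURS[a] & NEIGHBOURS[b])


candidates = [name for name in NEIGHBOURS if name.startswith("David Cameron")]
counts = {c: common_connections("Angela Merkel", c) for c in candidates}
best = max(counts, key=counts.get)
```

With these assumed sets the politician shares three neighbours with Angela Merkel, the footballer and the actor one each, and the painter none, so `best` is the politician.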
  • SIMPLIFIED
    [Diagram: Angela Merkel linked to the David Cameron candidates by edges labelled with the number of common connections: 1, 3 and 1]
    Woah … that looks a lot like a Least Cost Routing problem
  • LEAST COST PATH
    [Diagram: the same edges relabelled with costs: 1/1, 1/3 and 1/1]
    1 / number of common connections = cost
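A least-cost lookup over such a weighted graph can be sketched with Dijkstra's algorithm. The edge costs follow the slide's rule (cost = 1 / number of common connections), but which count belongs to which candidate is an assumption, and `least_costs` is an illustrative helper, not a Neo4j API.

```python
import heapq


def edge_cost(common):
    """Cost rule from the slide: 1 / number of common connections."""
    return 1.0 / common


# Hypothetical weighted graph. The painter shares no connections with
# Angela Merkel in this toy data, so there is no edge to that candidate.
GRAPH = {
    "Angela Merkel": {
        "David Cameron (politician)": edge_cost(3),
        "David Cameron (footballer)": edge_cost(1),
        "David Cameron (actor)": edge_cost(1),
    },
}


def least_costs(graph, source):
    """Dijkstra's algorithm: cheapest cost from source to each reachable node."""
    costs = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < costs.get(neighbour, float("inf")):
                costs[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return costs


costs = least_costs(GRAPH, "Angela Merkel")
candidates = [
    "David Cameron (politician)",
    "David Cameron (footballer)",
    "David Cameron (actor)",
    "David Cameron (painter)",
]
reachable = [c for c in candidates if c in costs]
best = min(reachable, key=costs.get)
```

Taking the reciprocal turns "many common connections" into "cheap to reach", so a standard shortest-path routine picks the candidate most strongly tied to the entities already in context.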
  • UPDATED SOLUTION
    [Diagram: the pipeline from Unstructured Data to Awesomeness! now comprises NER and Disambiguation, backed by NoSQL and Neo4j]
  • RECAP
      • Graphs allow us to interrogate relationships
          - Disambiguate when faced with multiple possibilities
          - Infer more about the context of what’s happening
      • Went through iterations of improvements
          - Kept our Entity data in NoSQL = TBs
          - Used the Graph as an index of sorts = GBs
      • Neo4j was a great fit for our needs
  • STEP 4: MAKING IT WORK REAL-TIME
    Some queries were taking ‘seconds’ and we needed to go a lot faster because TV won’t wait for us … Do we really need to check the Graph every time?
  • ENTER MACHINE LEARNING
      • We can use simple predictors to estimate the likelihood of Entities occurring
          - i.e. every time we’ve looked for “David Cameron” in the past the best match was the Politician
      • Keeping a ‘probabilistic context’ of recent Entities allows us to detect shifts in topics
          - Works especially well on News channels
          - Reduces the demand on Graph lookups
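One way to picture such a predictor is a per-mention counter that answers directly once past resolutions agree often enough, and otherwise falls back to the graph. Everything here is an assumption made for illustration: the `EntityPredictor` class, its thresholds and the `slow_graph_lookup` stand-in are not from the talk.

```python
from collections import Counter, defaultdict


class EntityPredictor:
    """Remembers which entity each mention resolved to in the past."""

    def __init__(self, min_seen=5, min_share=0.9):
        self.history = defaultdict(Counter)
        self.min_seen = min_seen    # observations needed before trusting history
        self.min_share = min_share  # share the top entity must hold

    def record(self, mention, entity):
        self.history[mention][entity] += 1

    def predict(self, mention):
        """Return the usual entity for this mention, or None if unsure."""
        seen = self.history.get(mention)
        if not seen:
            return None
        total = sum(seen.values())
        entity, count = seen.most_common(1)[0]
        if total >= self.min_seen and count / total >= self.min_share:
            return entity
        return None


def resolve(mention, predictor, graph_lookup):
    """Use the predictor when it is confident, else query the graph."""
    guess = predictor.predict(mention)
    if guess is not None:
        return guess
    entity = graph_lookup(mention)
    predictor.record(mention, entity)
    return entity


graph_calls = []


def slow_graph_lookup(mention):
    """Stand-in for an expensive graph query (hypothetical)."""
    graph_calls.append(mention)
    return "David Cameron (politician)"


predictor = EntityPredictor()
results = [resolve("David Cameron", predictor, slow_graph_lookup) for _ in range(8)]
```

In this toy run the graph is consulted for the first five mentions only; the remaining three are answered from history.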
  • BAYES’ THEOREM
    Looks complicated, but it’s basically just counting & division
    Photo Credit: mattbuck007 on Flickr (cc)
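For reference, Bayes' theorem written for this setting, with e standing for a candidate entity and m for an observed mention. The symbols and the count-based estimate on the second line are notational assumptions added here, not taken from the slides.

```latex
P(e \mid m) = \frac{P(m \mid e)\, P(e)}{P(m)}

P(e \mid m) \approx \frac{\operatorname{count}(e, m)}{\operatorname{count}(m)}
```

The second line is the "counting & division" reading: of all the times the mention m was seen, count how often it resolved to entity e, and divide.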
  • STEP 5: MAKING IT WORK WORLDWIDE
    We solved the problem for English, but what about other languages?
  • LANGUAGE
      • Our core Entities of ‘People’, ‘Places’, & ‘Concepts’ are language agnostic…
      • We needed a way to ditch ‘language’ and jump straight to entities…
          - The colour ‘Red’ means the same thing regardless of you calling it ‘Rot’, ‘Rouge’ or ‘赤’
      • Again, Graphs could solve the problem
  • LANGUAGE INDEPENDENT
    [Diagram: the words Red, Rouge, 赤, Rot, Röd, Rojo and 紅 all link to a single node, Color: Red]
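The idea of jumping straight from a word to a language-neutral node can be sketched as a lookup of alias edges. The `ALIASES` table and the `Color:Red` identifier are assumptions for illustration; in the system described here these links would live in the graph rather than in a Python dict.

```python
# Hypothetical alias edges: surface form in any language -> concept node.
ALIASES = {
    "red": "Color:Red",
    "rot": "Color:Red",
    "rouge": "Color:Red",
    "rojo": "Color:Red",
    "röd": "Color:Red",
    "赤": "Color:Red",
    "紅": "Color:Red",
}


def to_concept(word):
    """Map a word to its concept node, ignoring case; None if unknown."""
    return ALIASES.get(word.casefold())


concepts = {word: to_concept(word) for word in ["Red", "Rouge", "赤", "Röd", "blau"]}
```

Once every surface form resolves to the same node, the disambiguation and prediction steps can work on entities alone and never need to know which language the text was in.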
  • PROBLEM SOLVED
    Typical response time ~30ms … relevancy improves over time and the system learns new entities ‘online’
  • FINAL SOLUTION
    [Diagram: the pipeline from Unstructured Data to Awesomeness! now comprises a Language Model, NER, Disambiguation and Machine Learning, backed by NoSQL and Neo4j]
  • ABOUT US
      • We’ve built a product…
          - Our ‘Digital Marketing Optimization’ platform improves conversion rates & customer satisfaction for eCommerce & Marketing campaigns
          - Launches Q1 2013
      • What else do we do?
          - ‘Big Data’ & ‘Data Science’ professional services
          - Bespoke prototype & solution development
    “TUMRA” is a transliteration of the Sanskrit word for “BIG”; we thought it was a great name … (and the .COM was available)
  • THANKS FOR LISTENING
    TUMRA You?
    We’re hiring! Data Scientists & Developers: work@tumra.com
  • THANKS FOR LISTENING
    Questions? hello@tumra.com, twitter.com/tumra, tumra.com