Jetlore: Technical Overview

We describe Jetlore's platform, outline the AI challenges Jetlore's team is tackling, and give an overview of our systems architecture.

The first half of the talk is aimed at researchers and engineers interested in AI and data mining technologies. We describe how and why entity extraction and text classification are hard on short colloquial text and compare search ranking design for social content vs. web content. In addition, we touch upon some of the smaller-scale AI projects at Jetlore.

The second half of the talk is aimed at engineers building distributed systems. Our current system infrastructure includes over 20 servers able to process over 25 million posts per day. We will provide an overview of our backend architecture and discuss the design and infrastructure choices for large real-time data stream processing.

    Presentation Transcript

    • Search and Analytics Platform for Social Content
    • Agenda • Overview of what Jetlore does • Overview of data mining technology • Overview of system architecture • Lessons learned and Questions
    • Raw social data (JSON excerpt, truncated on the slide): "count": 318 }, "comments": { "count": 39 }, "is_published": true }, { "id": "100000468770980_400645419961080", "from": { "name": "Rodrigo Flausino", "id": "100000468770980" }, "message": "Apesar de ter dado RT em piadinhas do ECAD, no fundo nofundo ando meio preocupado. \u00c9 um absurdo tamb\u00e9m...", "icon": "http://photos-d.ak.fbcdn.net/photos-ak-snc1/v85006/23/2231777543/app_2_2231777543_7567.gif", "actions": [ { "name": "Comment", "link": "http://www.facebook.com/100000468770980/posts/400645419961080" }, { "name": "Like", "link": "http://www.facebook.com/100000468770980/posts/400645419961080" }, { "name": "\u0040rodrigoflausino on Twitter", "link": "http://twitter.com/rodrigoflausino?utm_source=fb&utm_medium=fb&utm_campaign=rodrigoflausino&utm_content=177545736488103936" } ], "type": "status", "application": { "name": "Twitter", "canvas_name": "twitter", "namespace": "twitter", "id": "2231777543" }, "created_time": "2012-03-08T00:06:17+0000", "updated_time": "2012-03-08T00:06:17+0000", "comments": { "count": 0 }, "is_published": true }, { "id": "1302721557_374629822555343", "from": { "name": "Rodrigo de Lucca", "id": "1302721557" }, "message": "OMFG ! That was wonderful !!!!!\n\nhttp://www.youtube.com/watch?v=Dou4Gy0p97Y", "picture": "http://external.ak.fbcdn.net/safe_image.php?d=AQC9NH0w4DRh46Y1&w=130&h=130&url=http\u00253A\u00252F\u00252Fi1.ytimg.com\u00252Fvi\u00252FDou4Gy0p97Y\u00252Fhqdefault.jpg", "link": "http://www.youtube.com/watch?v=Dou4Gy0p97Y",
    • Understanding Content → Rich Profiles of Users (example tags: m83, 80s, the cure, depeche mode, Hello Kitty, heaven, island, Tokyo, Romney, Obama, Obama)
    • Rich User Analytics: Interests of people who buy Coach Purses (Florence + The Machine, Lucky One movie, Republican Party, Neiman Marcus, Law & Order Series)
    • Agenda • Overview of what Jetlore does • Overview of data mining technology • Overview of system architecture • Lessons learned and Questions
    • Why has it not been possible?
    • Technology: Entity Extraction. Example posts: “I’ve been waiting all week for Sunday night: Thrones. Girls. Mad Men.” (tags on the slide include Girls and Mad Men); “I’m about to try a deep fried Mars …should be interesting! Lol” (tag on the slide: Mars) • Naïve algorithms: ~50% accuracy • We achieve ~90% accuracy
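To make the "naïve algorithms" baseline concrete, here is a minimal sketch of plain dictionary lookup over a post. The gazetteer `KNOWN_ENTITIES`, its labels, and the function name `naive_extract` are hypothetical and are not Jetlore's method. The sketch only illustrates why string matching is ambiguous on short colloquial text: "Girls" and "Mars" match regardless of what the author meant.

```python
import re

# Hypothetical gazetteer: surface form -> entity label (illustrative only).
KNOWN_ENTITIES = {
    "mad men": "TV show",
    "girls": "TV show",
    "mars": "planet",
}

def naive_extract(post):
    """Return sorted (surface form, label) pairs found by dictionary lookup."""
    text = post.lower()
    found = []
    for surface, label in KNOWN_ENTITIES.items():
        # Whole-word match only; no context is used to disambiguate.
        if re.search(r"\b" + re.escape(surface) + r"\b", text):
            found.append((surface, label))
    return sorted(found)

if __name__ == "__main__":
    print(naive_extract("Sunday night: Thrones. Girls. Mad Men."))
    print(naive_extract("I'm about to try a deep fried Mars"))
    print(naive_extract("The girls are coming over tonight"))
```

The second and third calls return matches that are most likely wrong (a planet for a deep-fried snack, a TV show for a casual mention of friends), which is the kind of error a context-aware extractor has to avoid.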
    • Technology: Text Classification. Why bother? “Wow, you look like James Bond on this picture. Have you been working out?” (tag on the slide: James Bond). Why hard? “ok local ham + metro west peeps, where and when do you watch fireworks?” • Naïve Bayes text classifier: ~50% accuracy • We achieve >90% accuracy
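For reference, the Naïve Bayes baseline named on this slide can be sketched in a few lines. This is a generic multinomial Naïve Bayes with add-one smoothing, not Jetlore's classifier; the class name, the labels ("events", "movies"), and the training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train("where do you watch fireworks tonight", "events")
    nb.train("fireworks show at the park", "events")
    nb.train("new james bond movie trailer", "movies")
    nb.train("best movie of the year", "movies")
    print(nb.classify("fireworks tonight"))
    print(nb.classify("you look like james bond"))
```

With this toy training set, the compliment "you look like james bond" is labeled "movies" purely because of the words "james" and "bond", which shows how a bag-of-words model is misled by short posts.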
    • Technology: Search (example terms on the slide: Gotye, piano, tracks, vocals, pop, songs, indie-pop, band, album, music)
    • Other Interesting Projects • Detecting language of the post •  We do this for 60+ languages with >97% accuracy • Finding what’s trending in one’s network •  Relevant research in epidemiology! • Finding a good image for an entity/concept •  Have a corpus of images for over 2M entities
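The slide does not say how language detection is done. One common textbook approach is to compare character trigram frequencies against per-language profiles. The sketch below assumes that approach; the function names and the tiny sample texts are hypothetical, and a real system covering 60+ languages would need far larger training corpora.

```python
from collections import Counter

def trigrams(text):
    """Count character trigrams of the lowercased, space-padded text."""
    padded = "  " + text.lower() + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def build_profiles(samples):
    """Map each language to the trigram counts of its sample texts."""
    profiles = {}
    for lang, texts in samples.items():
        counts = Counter()
        for text in texts:
            counts.update(trigrams(text))
        profiles[lang] = counts
    return profiles

def detect(text, profiles):
    """Return the language whose profile overlaps most with the text."""
    grams = trigrams(text)

    def score(profile):
        total = sum(profile.values())
        # Weight each trigram of the text by its relative frequency in the profile.
        return sum(n * profile[g] / total for g, n in grams.items())

    return max(profiles, key=lambda lang: score(profiles[lang]))

# Toy training samples (assumed for illustration only).
SAMPLES = {
    "en": ["the quick brown fox jumps over the lazy dog",
           "where and when do you watch the fireworks"],
    "pt": ["apesar de ter dado no fundo ando meio preocupado",
           "isso também é um absurdo para todos nós"],
}

if __name__ == "__main__":
    profiles = build_profiles(SAMPLES)
    print(detect("what do you think about the weather", profiles))
    print(detect("tudo isso é um absurdo", profiles))
```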
    • Agenda • Overview of what Jetlore does • Overview of data mining technology • Overview of system architecture • Lessons learned and Questions
    • How it Works (diagram labels: Social Data, Set of Users, Analysis, Index, Storage, Jetlore API)
    • Analysis (diagram labels: Buenos Aires, restaurants, Rodizio, La Brigada; Buenos Aires, Restaurant, Rodizio, La Brigada)
    • Index (diagram labels: Restaurant, Buenos Aires, La Brigada, Rodizio)
    • travelhelper (screenshot of a login form: samantha@bird.com, *********)
    • travelhelper
    • System Architecture (diagram labels): Soc. Networks; Kafka; Custom Kafka Spout; Storm Topology (Link Processing, Entity Extraction, Classification, Saving, Indexing, …); Storm DRPC; Zookeeper; Hadoop/Spark (offline processing); MongoDB [sharded]; Redis [sharded]; Elastic Search [sharded]; Rest API (Finagle); DNS; Qwhisper; Internal Consumers; Other Clients
    • Why Messaging Middleware? •  Stream may be bursty •  Reliability •  Isolation of layers: errors in one should not break the other •  Ability to replay messages in a particular layer •  Elastically add computing units to any layer •  Independently upgrade any layer
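Two of the points above, isolation of layers and the ability to replay messages, follow from keeping messages in an append-only log where each consumer tracks its own offset. The sketch below is a single-process illustration of that idea only; the class `MessageLog` and its methods are hypothetical and do not reflect Kafka's actual API.

```python
class MessageLog:
    """Append-only in-memory log; each consumer keeps its own read offset."""

    def __init__(self):
        self._messages = []
        self._offsets = {}  # consumer name -> index of next unread message

    def publish(self, message):
        """Append a message and return its offset in the log."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, consumer, max_messages=10):
        """Return the next unread batch for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._messages[start:start + max_messages]
        self._offsets[consumer] = start + len(batch)
        return batch

    def rewind(self, consumer, offset=0):
        """Move a consumer back so that earlier messages are delivered again."""
        self._offsets[consumer] = offset

if __name__ == "__main__":
    log = MessageLog()
    for post in ["post-1", "post-2", "post-3"]:
        log.publish(post)
    print(log.poll("indexer", max_messages=2))  # indexer reads at its own pace
    print(log.poll("analyzer"))                 # analyzer is unaffected
    log.rewind("indexer")                       # replay for one layer only
    print(log.poll("indexer"))
```

Because the log buffers messages until a consumer polls, a burst from the producer does not overload a slow layer, and rewinding one consumer leaves the others untouched.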
    • Why Storm? •  Developed by BackType for a similar use case (analytics) •  Pipes and Filters design using topology model •  Parallelization and elastic horizontal scaling •  Guaranteed message processing in case of a failure •  Provides distributed RPC for costly computations
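The "pipes and filters" and "guaranteed message processing" points can be illustrated with a single-process sketch: each message flows through a chain of stages, and a message whose processing fails is replayed from the first stage. This is not Storm's API; `run_pipeline`, its parameters, and the retry limit are assumptions for illustration, and real Storm tracks acknowledgements across distributed workers.

```python
def run_pipeline(messages, stages, max_attempts=3):
    """Push each message through all stages, replaying it on failure.

    Returns (results, failed), where failed holds the messages that
    still raised an exception after max_attempts tries.
    """
    results, failed = [], []
    for message in messages:
        for _ in range(max_attempts):
            try:
                value = message
                for stage in stages:      # pipes and filters: output feeds the next stage
                    value = stage(value)
                results.append(value)     # message fully processed
                break
            except Exception:
                continue                  # replay the message from the first stage
        else:
            failed.append(message)        # every attempt failed
    return results, failed

if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_upper(text):
        # Fails on the first call only, to simulate a transient error.
        calls["n"] += 1
        if calls["n"] == 1:
            raise RuntimeError("transient failure")
        return text.upper()

    print(run_pipeline([" a ", " b "], [str.strip, flaky_upper]))
```

Note that replaying gives at-least-once semantics: a stage with side effects may run more than once for the same message, so stages should be idempotent.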
    • Why Redis? •  Fast in-memory database for real-time stats •  Special data structures and operators for counters, sets, sorted sets, etc. •  Very small memory footprint •  Can be used as object cache (same as Memcached) •  In use by many companies at large scale with high performance
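As an illustration of the sorted-set operators mentioned above, the sketch below keeps a real-time "top entities" counter. The method names `zincrby` and `zrevrange` mirror the Redis commands ZINCRBY and ZREVRANGE as exposed by the redis-py client. The helper functions, the key name `mentions`, and the `InMemorySortedSets` stand-in (included only so the example runs without a server) are assumptions and not part of Jetlore's code.

```python
def record_mention(client, entity, key="mentions"):
    """Increment the entity's score in a sorted set (ZINCRBY)."""
    client.zincrby(key, 1, entity)

def top_entities(client, n, key="mentions"):
    """Return the n highest-scoring entities with scores (ZREVRANGE)."""
    return client.zrevrange(key, 0, n - 1, withscores=True)

class InMemorySortedSets:
    """Tiny stand-in for the two sorted-set calls used above.

    Supports only non-negative indices; a real Redis client would be
    passed to the functions above instead.
    """

    def __init__(self):
        self._sets = {}

    def zincrby(self, name, amount, value):
        scores = self._sets.setdefault(name, {})
        scores[value] = scores.get(value, 0.0) + amount
        return scores[value]

    def zrevrange(self, name, start, end, withscores=False):
        # Highest score first; ties broken by reverse lexicographic order.
        items = sorted(self._sets.get(name, {}).items(),
                       key=lambda kv: (kv[1], kv[0]), reverse=True)
        chosen = items[start:end + 1]  # the end index is inclusive in Redis
        return chosen if withscores else [member for member, _ in chosen]

if __name__ == "__main__":
    client = InMemorySortedSets()
    for entity in ["Gotye", "Obama", "Gotye", "Tokyo", "Gotye", "Obama"]:
        record_mention(client, entity)
    print(top_entities(client, 2))
```

Keeping the counter in a sorted set means the ranking is maintained by the data store on every increment, so reading the top entries does not require scanning all counters.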
    • Things We Plan to Open-Source •  Reliable Kafka Spout for Storm (consumer of Kafka messages for Storm) •  Scala Redis driver, almost fully rewritten to fix bugs and to support automated sharding and operation pipelining •  Search DSL (similar in spirit to Foursquare’s Rogue) •  Syntactic sugar for Finagle for building REST APIs
    • Agenda • Overview of what Jetlore does • Overview of data mining technology • Overview of system architecture • Lessons learned and Questions
    • Main Lessons Learned •  Data mining technology requires metrics and rigorous evaluation •  Decouple everything •  Make everything elastic to scale •  Connect components via messaging middleware •  Leverage open-source software
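The first lesson, metrics and rigorous evaluation, can be made concrete with a small scoring helper. The slides report "accuracy" without defining it; the sketch below uses precision, recall, and F1 against a hand-labeled gold set, which is a standard choice for extraction tasks but an assumption here. The function name and the example sets are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Score a set of predicted items against a hand-labeled gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

if __name__ == "__main__":
    # Hypothetical extractor output vs. gold labels for one post.
    predicted = {"Girls", "Mad Men", "Sunday"}
    gold = {"Thrones", "Girls", "Mad Men"}
    print(precision_recall_f1(predicted, gold))
```

Tracking both numbers matters because an extractor can trade one for the other: tagging every capitalized word raises recall while precision collapses.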