Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks

  1. October 13-16, 2016 • Austin, TX
  2. Event Processing and Data Analytics with Lucidworks Fusion • Kiran Chitturi, Software Engineer, Lucidworks
  3. Problem Statement
     • How to capture/record user events?
     • How to use events/signals for recommendations?
     • How to produce reports/analytics from user events?
     • What type of recommendations can be generated for different user types?
  4. Event collection - Snowplow JS tracker
     • Library to collect user events from the client-side tier of websites and apps (https://github.com/snowplow/snowplow-javascript-tracker)
     • Open source equivalent for enterprise analytics
     • Sends events using a tracking pixel
     • Signals API acts as a collector for Snowplow events
     • Tracks page views, page pings, links and any custom configured events
     • https://github.com/snowplow/snowplow/wiki/javascript-tracker
  5. Event collection - JSON payloads
     • Examples: page-view, query, search-click, add-to-cart, rating
     • Signals schema:
       • Required field: ‘type’
       • Additional properties can be specified in the ‘params’ map
       • Special treatment for the fields ‘docId’, ‘userId’, ‘query’, ‘filterQueries’, ‘collection’, ‘weight’, ‘count’
     • Processing logic lives in the ‘_signals_ingest’ pipeline
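The payload shape described on this slide can be sketched in Python as follows. This is only an illustration: the helper names `make_signal` and `special_params` are hypothetical and not part of Fusion, and only the field names come from the slide.

```python
import json

# Field names the slide lists as receiving special treatment.
SPECIAL_FIELDS = {"docId", "userId", "query", "filterQueries",
                  "collection", "weight", "count"}


def make_signal(signal_type, params=None, timestamp=None):
    """Build a signal dict: 'type' is required, extras go in 'params'."""
    if not signal_type:
        raise ValueError("a signal needs a non-empty 'type'")
    signal = {"type": signal_type, "params": dict(params or {})}
    if timestamp is not None:
        signal["timestamp"] = timestamp
    return signal


def special_params(signal):
    """Return the 'params' keys that the slide says are treated specially."""
    return sorted(SPECIAL_FIELDS & set(signal["params"]))


click = make_signal(
    "click",
    {"query": "Madden 12", "docId": "2375201", "position": "4"},
)
# Serialize a batch of signals as a JSON array.
print(json.dumps([click], indent=2))
```

Here `position` is an ordinary additional property, while `query` and `docId` are among the specially treated fields.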
  6. Signals - data flow (diagram): JSON payloads and Snowplow payloads are sent to the Signals Service, which indexes them into Solr. Collections shown: primary collection (test), raw signals collection (test_signals), aggregated signals collection (test_signals_aggr).
  7. Example: page-view signal
     Input signal:
       { "timestamp": "2015-09-14T10:12:13.456Z", "type": "pv", "params": { "url": "http://www.ecommerce.com/abws-mcl008-080201" } }
     Indexed signal document:
       { "type_s": "pv", "flag_s": "event", "params.url_s": "http://www.ecommerce.com/abws-mcl008-080201", "id": "62a26152-7971-406e-bf06-3df44974c220", "timestamp_tdt": "2015-09-14T10:12:13.45Z", "count_i": 1, "_version_": 1515057367743463400 }
  8. Example: page-view signal
     Input signal:
       { "timestamp": "2015-09-14T10:12:13.456Z", "type": "pv",
         "params": { "page": "Dark Gray Wool Suit", "url": "http://www.ecommerce.com/abws-mcl008-080201", "userId": "12891291", "useragent_type_name_s": "Browser", "ipAddr": "64.134.151.1", "tz": "America/NewYork" } }
     Indexed signal document:
       { "type_s": "pv", "params.tz_s": "America/NewYork", "user_id_s": "12891291", "params.page_s": "Dark Gray Wool Suit",
         "tz_timestamp_txt": [ "Mon 2015-09-14 10:12:13.456 UTC" ], "flag_s": "event", "params.ipAddr_s": "64.134.151.1",
         "params.url_s": "http://www.ecommerce.com/abws-mcl008-080201", "id": "4b993f85-67d3-4523-b2b3-cf4e3ff2f202",
         "timestamp_tdt": "2015-09-14T10:12:13.45Z", "count_i": 1, "_version_": 1515057643959353300 }
  9. Example: click signal
     Input signal:
       { "type": "click",
         "params": { "query": "Madden 12", "docId": "2375201", "userId": "abc121", "position": "4",
                     "filterQueries": [ "cat00000", "abcat0700000", "abcat0703000", "abcat0703002", "abcat0703008" ] } }
     Indexed signal document:
       { "filters_orig_ss": [ "abcat0700000", "abcat0703000", "abcat0703002", "abcat0703008", "cat00000" ],
         "user_id_s": "abc121", "query_s": "madden 12", "type_s": "click", "params.position_s": "4", "query_t": "madden 12", "doc_id_s": "2375201",
         "tz_timestamp_txt": [ "Tue 2015-10-13 18:33:04.012 UTC" ],
         "filters_s": "abcat0700000 $ abcat0703000 $ abcat0703002 $ abcat0703008 $ cat00000",
         "flag_s": "event", "query_orig_s": "Madden 12", "id": "69c609f6-a2c1-4f89-990e-88a63e68063d",
         "timestamp_tdt": "2015-10-13T18:33:04.01Z", "count_i": 1, "_version_": 1514941903557099520 }
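The field mapping visible in the click example can be approximated with a short Python sketch. It is inferred only from the input and indexed documents shown on the slide (lower-cased query, sorted filters joined with " $ "). The function name `index_click_signal` is hypothetical, and the real ‘_signals_ingest’ pipeline may do more than this, for example assigning ids and timestamps.

```python
def index_click_signal(signal):
    """Approximate the indexed fields seen in the slide's click example."""
    params = signal["params"]
    # The indexed example shows the filter values in sorted order.
    filters = sorted(params.get("filterQueries", []))
    query = params.get("query", "")
    return {
        "type_s": signal["type"],
        "user_id_s": params.get("userId"),
        "doc_id_s": params.get("docId"),
        "query_orig_s": query,          # original casing is kept here
        "query_s": query.lower(),       # normalized form used for grouping
        "filters_orig_ss": filters,
        "filters_s": " $ ".join(filters),
        "flag_s": "event",
        "count_i": 1,
    }


click = {
    "type": "click",
    "params": {
        "query": "Madden 12",
        "docId": "2375201",
        "userId": "abc121",
        "position": "4",
        "filterQueries": ["cat00000", "abcat0700000", "abcat0703000",
                          "abcat0703002", "abcat0703008"],
    },
}
doc = index_click_signal(click)
print(doc["query_s"], "|", doc["filters_s"])
```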
  10. Aggregations
      • Batch processing using Apache Spark
      • spark-solr library (https://github.com/LucidWorks/spark-solr)
      • Types: Simple, Click, EventMiner
  11. Aggregations - data flow (diagram): an aggregation job goes to the Aggregator and the Spark Agent; Spark (driver, cluster manager, workers) fetches raw signals from the raw signals collection (test_signals) for processing and stores aggregated results in the aggregated signals collection (test_signals_aggr). The primary collection is test.
  12. Aggregation examples
      • Simple aggregations: top queries, top clicked documents, most popular categories, …
      • Complex aggregations:
        • Click stream aggregations with decaying weights
        • Generate a co-occurrence matrix for (user, docId, query) tuples
  13. Example: simple aggregation
      Input signals (sent through the API to the Signals Service):
        [ { "type": "rating", "params": { "rating": "5.0", "source": "web" } },
          { "type": "rating", "params": { "rating": "1.0", "source": "web" } },
          { "type": "rating", "params": { "rating": "2.0", "source": "web" } },
          { "type": "rating", "params": { "rating": "2.0", "source": "web" } },
          { "type": "rating", "params": { "rating": "1.0", "source": "web" } } ]
      Diagram: Signals Service and Solr, with the primary collection (test), raw signals collection (test_signals) and aggregated signals collection (test_signals_aggr).
  14. Example: simple aggregation (continued)
      Diagram: the aggregation job is submitted manually or via the scheduler to the Aggregation Service; Spark fetches raw signals from test_signals for processing and stores aggregated results in test_signals_aggr.
      Aggregation definition (job submission):
        { "id": "test_simple_aggr",
          "signalTypes": [ "rating" ],
          "selectQuery": "*:*",
          "aggregator": "simple",
          "groupingFields": "params.source_s",
          "aggregates": [
            { "type": "stddev", "sourceFields": [ "params.rating_s" ], "targetField": "stddev_rating_d" },
            { "type": "topk", "sourceFields": [ "params.rating_s" ], "targetField": "topk_rating_ss" },
            { "type": "mean", "sourceFields": [ "params.rating_s" ], "targetField": "mean_position_d" } ] }
  15. Example: simple aggregation (continued)
      Aggregated document:
        { "aggr_job_id_s": "b91ffdebc44d4e128a8431c2f8a3deb7",
          "aggr_type_s": "simple@doc_id_s-query_s-filters_s",
          "flag_s": "aggr", "type_s": "rating",
          "id": "24494dba-93a6-4fc5-bb4d-5b546c3c0c5e",
          "aggr_id_s": "test_simple_aggr",
          "timestamp_tdt": "2015-10-15T02:26:17.337Z",
          "count_i": 5, "grouping_key_s": "web",
          "stddev_rating_d": 1.6431676725154982, "mean_position_d": 2.2,
          "values.topk_rating_ss": [ "2.0", "1.0", "5.0" ],
          "counts.topk_rating_ss": [ "2", "2", "1" ],
          "errors.topk_rating_ss": [ "0", "0", "0" ] }
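The numbers in the aggregated document can be checked by hand. The sketch below recomputes the three aggregates for the five rating signals using only the Python standard library; it is not Fusion's implementation. It assumes the `stddev` aggregate is the sample standard deviation (n - 1 denominator), which is the variant that agrees with the slide's value up to floating-point rounding.

```python
import statistics
from collections import Counter

# The five "rating" values from the input signals, all with source "web".
ratings = ["5.0", "1.0", "2.0", "2.0", "1.0"]
values = [float(r) for r in ratings]

mean_rating = statistics.mean(values)
# Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1)).
stddev_rating = statistics.stdev(values)
# Most frequent rating values with their counts; the order of tied
# entries may differ from the order shown on the slide.
topk = Counter(ratings).most_common(3)

print(mean_rating, stddev_rating, topk)
```

The squared deviations from the mean of 2.2 sum to 10.8, so the sample variance is 10.8 / 4 = 2.7 and its square root is about 1.643.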
  16. Example: Click aggregation
      Raw docs (signals indexed and aggregated):
        [ { "timestamp": "2014-09-01T23:44:52.533Z", "params": { "query": "Sharp", "docId": "2009324" }, "type": "click" },
          { "timestamp": "2014-09-05T12:25:37.420Z", "params": { "query": "Sharp", "docId": "2009324" }, "type": "click" },
          { "timestamp": "2014-08-24T12:56:58.910Z", "params": { "query": "Sharp TV", "docId": "1517163" }, "type": "click" },
          { "timestamp": "2015-10-25T07:18:14.722Z", "params": { "query": "rca", "docId": "2877125" }, "type": "click" } ]
      Aggregated docs:
        [ { "doc_id_s": "1517163", "query_s": "sharp tv", "weight_d": 0.000006602878329431405, "count_i": 1 },
          { "doc_id_s": "2009324", "query_s": "sharp", "weight_d": 0.000016734602468204685, "count_i": 2 },
          { "doc_id_s": "2877125", "query_s": "rca", "weight_d": 0.06324164569377899, "count_i": 1 } ]
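The idea of a click aggregation with decaying weights can be illustrated with a small Python sketch. The slides do not give the decay function Fusion uses, so the exponential half-life decay here, the `HALF_LIFE_DAYS` value and the helper names are all assumptions. The sketch does not reproduce the `weight_d` values above; it shows only the grouping by (docId, query) and the fact that older clicks contribute less.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone

HALF_LIFE_DAYS = 30.0  # hypothetical value, not taken from the slides


def parse_ts(ts):
    """Parse the ISO-8601 UTC timestamps used in the signal examples."""
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def aggregate_clicks(signals, now):
    """Group clicks by (docId, lower-cased query) and sum decayed weights."""
    groups = defaultdict(lambda: {"count_i": 0, "weight_d": 0.0})
    for s in signals:
        key = (s["params"]["docId"], s["params"]["query"].lower())
        age_days = (now - parse_ts(s["timestamp"])).total_seconds() / 86400.0
        # Assumed decay: a click loses half its weight every HALF_LIFE_DAYS.
        decayed = math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)
        groups[key]["count_i"] += 1
        groups[key]["weight_d"] += decayed
    return {key: dict(value) for key, value in groups.items()}


signals = [
    {"timestamp": "2014-09-01T23:44:52.533Z", "type": "click",
     "params": {"query": "Sharp", "docId": "2009324"}},
    {"timestamp": "2014-09-05T12:25:37.420Z", "type": "click",
     "params": {"query": "Sharp", "docId": "2009324"}},
    {"timestamp": "2015-10-25T07:18:14.722Z", "type": "click",
     "params": {"query": "rca", "docId": "2877125"}},
]
result = aggregate_clicks(signals, parse_ts("2015-10-26T00:00:00.000Z"))
for key, value in result.items():
    print(key, value)
```

With these inputs the single recent "rca" click outweighs the two year-old "sharp" clicks, which is the same ordering the aggregated docs on the slide show.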
  17. Driving search relevancy
      • How to mix signals with search results?
      • Recommendation API
      • Generic query pipeline configuration using a 3-stage approach:
        • Sub-query
        • Rollup-results
        • Advanced-boost
  18. Boosting search results using aggregated documents (diagram): the user app sends a search query through the query-pipeline stages and receives the search response. Stages shown include Set Params, the recommendation stages and Query Solr. The recommendation stages: 1. Query aggregated documents 2. Process results 3. Add parameters to the request. Collections shown: primary collection (test), raw signals collection (test_signals), aggregated signals collection (test_signals_aggr).
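The three steps in the diagram (query aggregated documents, process results, add parameters to the request) might look roughly like the Python sketch below. The slides do not say which request parameters Fusion's boost stage sets, so the use of Solr's eDisMax `bq` boost-query parameter is an assumption. The helper names, the `id` field name and the sample weights are also hypothetical.

```python
def rollup(aggregated_docs, query):
    """Steps 1-2: select aggregated docs for the query, sum weight per doc."""
    totals = {}
    for doc in aggregated_docs:
        if doc["query_s"] == query.lower():
            doc_id = doc["doc_id_s"]
            totals[doc_id] = totals.get(doc_id, 0.0) + doc["weight_d"]
    return totals


def boost_params(totals, id_field="id"):
    """Step 3: turn rolled-up weights into extra request parameters."""
    if not totals:
        return {}
    top = max(totals.values())
    if top <= 0:
        return {}
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    # Scale boosts into the range (1.0, 2.0] relative to the top document.
    clauses = [f'{id_field}:"{doc_id}"^{1.0 + weight / top:.2f}'
               for doc_id, weight in ranked]
    return {"bq": " ".join(clauses)}


# Hypothetical aggregated documents; weights chosen for readability.
aggregated = [
    {"doc_id_s": "2009324", "query_s": "sharp", "weight_d": 0.4},
    {"doc_id_s": "1517163", "query_s": "sharp", "weight_d": 0.1},
    {"doc_id_s": "2877125", "query_s": "rca", "weight_d": 0.9},
]
params = boost_params(rollup(aggregated, "Sharp"))
print(params)
```

Scaling the boosts relative to the top document keeps them in a narrow range regardless of how small the raw aggregated weights are.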
  19. [Slide without text]
  20. Event Miner aggregation
      • Calculate a co-occurrence matrix for tuples based on sessions
        • Example: (userId, query, docId)
      • Construct a DAG from the matrix data
      • Recommendations are powered by the graph at query time
      • Increases diversity in recommendations
      • See https://lucidworks.com/blog/2015/08/31/mining-events-recommendations/
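Session-based co-occurrence counting can be sketched in Python as below. This shows only the pairwise counting step. How EventMiner defines sessions, weights the counts or builds its graph is not described on the slide, so the session layout and the helper names `cooccurrence` and `related` are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(sessions):
    """Count how often two (kind, value) items appear in the same session."""
    counts = Counter()
    for events in sessions.values():
        # De-duplicate within a session and fix an order so each pair
        # is always counted under the same key.
        items = sorted(set(events))
        for a, b in combinations(items, 2):
            counts[(a, b)] += 1
    return counts


def related(counts, item):
    """Items that co-occur with `item`, most frequent first."""
    neighbours = Counter()
    for (a, b), n in counts.items():
        if a == item:
            neighbours[b] += n
        elif b == item:
            neighbours[a] += n
    return neighbours.most_common()


# Hypothetical sessions: session id -> list of (kind, value) events.
sessions = {
    "s1": [("userId", "abc121"), ("query", "sharp"), ("docId", "2009324")],
    "s2": [("userId", "xyz999"), ("query", "sharp"), ("docId", "2009324")],
    "s3": [("userId", "abc121"), ("query", "rca"), ("docId", "2877125")],
}
counts = cooccurrence(sessions)
print(related(counts, ("query", "sharp")))
```

At query time, a lookup like `related` plays the role of navigating from one node of the graph to its neighbours.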
  21. Graph Navigation - Example Query
  22. Graph Navigation - Example Query
  23. Graph Navigation - Example Query
  24. Graph Navigation - Example Query
  25. Graph Navigation - Example Query
  26. Demo
  27. Events & Signals
      Using Signals = Modifying Your Behavior in Response to Your Environment