SlideShare a Scribd company logo
1 of 29
Download to read offline
O C T O B E R 1 3 - 1 6 , 2 0 1 6 • A U S T I N , T X
Event Processing and Data Analytics with Lucidworks Fusion
Kiran Chitturi
Software Engineer, Lucidworks
3
• How to capture/record user events ?
• How to use events/signals for recommendations ?
• How to produce reports/analytics from user events ?
• What type of recommendations can be generated for different user
types?
Problem Statement
4
• Library to collect user events from client-side tier of websites and apps (https://
github.com/snowplow/snowplow-javascript-tracker)
• Open source equivalent for enterprise analytics
• Sends events using tracking pixel
• Signals API acts as a collector for Snowplow events
• Tracks page views, page pings, links and any custom configured events
• https://github.com/snowplow/snowplow/wiki/javascript-tracker
Event collection - Snowplow JS tracker
6
• Examples:
• page-view, query, search-click, add-to-cart, rating
• Signals Schema:
• required fields: type
• additional properties can be specified in ‘params’ map
• Special treatment for fields ‘docId’, ‘userId’, ‘query’, ‘filterQueries’, ‘collection’,
‘weight’, ‘count’
• Processing logic in ‘_signals_ingest’ pipeline
Event collection - JSON payloads
test
Primary
collection
Raw
signals
collection
Aggregated
signals
collection
test_signals
test_signals
_aggr
Signals
Service
JSON
payloads
Snowplow
payloads
Solr
Signals - data flow
8
Example: page-view signal
{
"timestamp": "2015-09-14T10:12:13.456Z",
"type": "pv",
"params": {
"url": "http://www.ecommerce.com/abws-mcl008-080201"
}
}
{
"type_s": "pv",
"flag_s": "event",
"params.url_s": "http://www.ecommerce.com/abws-mcl008-080201",
"id": "62a26152-7971-406e-bf06-3df44974c220",
"timestamp_tdt": "2015-09-14T10:12:13.45Z",
"count_i": 1,
"_version_": 1515057367743463400
}
Input signal Indexed signal document
9
Example: page-view signal
{
"timestamp": "2015-09-14T10:12:13.456Z",
"type": "pv",
"params": {
"page": "Dark Gray Wool Suit",
"url": "http://www.ecommerce.com/abws-mcl008-080201",
"userId": "12891291",
"useragent_type_name_s": "Browser",
"ipAddr": "64.134.151.1"
"tz": "America/NewYork"
}
}
{
"type_s": "pv",
"params.tz_s": "America/NewYork",
"user_id_s": "12891291",
"params.page_s": "Dark Gray Wool Suit",
"tz_timestamp_txt": [
"Mon 2015-09-14 10:12:13.456 UTC"
],
"flag_s": "event",
"params.ipAddr_s": "64.134.151.1",
"params.url_s": "http://www.ecommerce.com/abws-mcl008-080201",
"id": "4b993f85-67d3-4523-b2b3-cf4e3ff2f202",
"timestamp_tdt": "2015-09-14T10:12:13.45Z",
"count_i": 1,
"_version_": 1515057643959353300
}
Input signal Indexed signal document
10
Example: click signal
{
"type": "click",
"params": {
"query": "Madden 12",
"docId": "2375201",
"userId": "abc121",
"position" : "4",
"filterQueries": [
"cat00000",
"abcat0700000",
"abcat0703000",
"abcat0703002",
"abcat0703008"
]
}
}
{
"filters_orig_ss":[
"abcat0700000",
"abcat0703000",
"abcat0703002",
"abcat0703008",
"cat00000"
],
"user_id_s":"abc121",
"query_s":"madden 12",
"type_s":"click",
"params.position_s" : "4",
"query_t": "madden 12",
"doc_id_s":"2375201",
"tz_timestamp_txt":["Tue 2015-10-13 18:33:04.012 UTC"],
"filters_s":"abcat0700000 $ abcat0703000 $ abcat0703002
$ abcat0703008 $ cat00000",
"flag_s":"event",
"query_orig_s":"Madden 12",
"id":"69c609f6-a2c1-4f89-990e-88a63e68063d",
"timestamp_tdt":"2015-10-13T18:33:04.01Z",
"count_i":1,
"_version_":1514941903557099520
}
Input signal Indexed signal document
11
• Batch processing using Apache Spark
• spark-solr library (https://github.com/LucidWorks/spark-solr)
• Types
• Simple
• Click
• EventMiner
Aggregations
12
Aggregations - data flow
Aggregation job
Aggregator
Spark
Agent
test
Primary
collection
Raw signals
collection
Worker Worker Cluster Mgr.
Spark
Aggregated signals
collection
Spark
Driver
Stores
aggregated results
Fetches raw signals
for processing
test_signals
test_signals_
aggr
13
• Simple aggregations
• Top queries
• Top clicked documents
• Most popular categories
• …
• Complex aggregations
• Click stream aggregations with decaying weights
• Generate a Co-occurence matrix for (user, docId, query) tuple
Aggregation examples
14
Example: simple aggregation
{
"type": "rating",
"params": {
"rating": “5.0”,
"source": “web”
}
},
{
"type": "rating",
"params": {
"rating": “1.0”,
"source": “web”
}
},
{
"type": "rating",
"params": {
"rating": “2.0”,
"source": “web”,
}
},
{
"type": "rating",
"params": {
"rating": “2.0”,
"source": “web”,
}
},
{
"type": "rating",
"params": {
"rating": “1.0”,
"source": “web”
}
}
API
test
Primary
collection
Raw signals
collection
Aggregated
signals
collection
test_signals
test_signals
_aggr
Solr
Signals
Service
15
Example: simple aggregation (continued)
15
test
Primary
collection
Raw signals
collection
Aggregated
signals
collection
test_signals
test_signals
_aggr
Solr
Submitted
manually or
via scheduler
Aggregation
Service
Spark
Fetches raw signals
for processing
Stores
aggregated results
{
"id" : "test_simple_aggr",
"signalTypes" : [ "rating" ],
"selectQuery" : "*:*",
"aggregator" : "simple",
"groupingFields" : "params.source_s",
"aggregates" : [ {
"type" : "stddev",
"sourceFields" : [ "params.rating_s" ],
"targetField" : "stddev_rating_d"
},
{
"type": "topk",
"sourceFields": ["params.rating_s"],
"targetField": "topk_rating_ss"
},
{
"type": "mean",
"sourceFields": ["params.rating_s"],
"targetField": "mean_position_d"
}
]
}
Aggregation
definition
job
submission
16
• Aggregated document:
Example: simple aggregation (continued)
{
"aggr_job_id_s": "b91ffdebc44d4e128a8431c2f8a3deb7",
"aggr_type_s": "simple@doc_id_s-query_s-filters_s",
"flag_s": "aggr",
"type_s": "rating",
"id": "24494dba-93a6-4fc5-bb4d-5b546c3c0c5e",
"aggr_id_s": "test_simple_aggr",
"timestamp_tdt": "2015-10-15T02:26:17.337Z",
"count_i": 5,
“grouping_key_s": "web",
"stddev_rating_d": 1.6431676725154982,
"mean_position_d": 2.2,
"values.topk_rating_ss": ["2.0", "1.0", "5.0"],
"counts.topk_rating_ss": ["2", "2", "1"],
"errors.topk_rating_ss": ["0", "0", "0"]
}
17
Example: Click aggregation
[
{
"timestamp": "2014-09-01T23:44:52.533Z",
"params": {
"query": "Sharp",
"docId": "2009324"
},
"type": "click"
},
{
"timestamp": "2014-09-05T12:25:37.420Z",
"params": {
"query": "Sharp",
"docId": "2009324"
},
"type": "click"
},
{
"timestamp": "2014-08-24T12:56:58.910Z",
"params": {
"query": "Sharp TV",
"docId": "1517163"
},
"type": "click"
},
{
"timestamp": "2015-10-25T07:18:14.722Z",
"params": {
"query": "rca",
"docId": "2877125"
},
"type": "click"
}
]
Signals indexed
and aggregated
{
"doc_id_s": "1517163",
"query_s": "sharp tv",
"weight_d": 0.000006602878329431405,
"count_i": 1
},
{
"doc_id_s": "2009324",
"query_s": "sharp",
"weight_d": 0.000016734602468204685,
"count_i": 2
},
{
“doc_id_s”: "2877125",
"query_s": "rca",
"weight_d": 0.06324164569377899,
"count_i": 1
}
aggregated
docsraw docs
18
• How to mix signals with search results ?
• Recommendation API
• Generic query pipeline configuration using 3 stage approach
• Sub-query
• Rollup-results
• Advanced-boost
Driving search relevancy
19
Boosting search results using aggregated documents
User
App
Search
query
Query-pipeline
stages
Set Params Query Solr
Raw signals
collection
Aggregated
signals
collection
test_signals
test_signals
_aggr
Recommendation
Stages
test
Primary
collection
1. Query aggregated documents
2. Process results
3. Add parameters to the request
Search
response
20
21
• Calculate Co-occurence matrix for tuples based on sessions
• Example: (userId, query, docId)
• Construct DAG from matrix data
• Recommendations are powered from Graph at query time
• Increases diversity in recommendations
• See https://lucidworks.com/blog/2015/08/31/mining-events-
recommendations/
Event Miner aggregation
22
Graph Navigation - Example Query
23
Graph Navigation - Example Query
24
Graph Navigation - Example Query
25
Graph Navigation - Example Query
Graph Navigation - Example Query
27
Demo
28
Using Signals
=
Modifying Your Behavior in Response to your Environment
Events & Signals
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks

More Related Content

Viewers also liked

Roadmap to data driven advice michael goedhart 1v0
Roadmap to data driven advice michael goedhart 1v0Roadmap to data driven advice michael goedhart 1v0
Roadmap to data driven advice michael goedhart 1v0
BigDataExpo
 
D5 crazy speed web development
D5 crazy speed web developmentD5 crazy speed web development
D5 crazy speed web development
NAVER D2
 
Science ABC Book
Science ABC BookScience ABC Book
Science ABC Book
tjelk1
 

Viewers also liked (20)

Roadmap to data driven advice michael goedhart 1v0
Roadmap to data driven advice michael goedhart 1v0Roadmap to data driven advice michael goedhart 1v0
Roadmap to data driven advice michael goedhart 1v0
 
1st step LogicFlow
1st step LogicFlow1st step LogicFlow
1st step LogicFlow
 
Rapid Infrastructure Provisioning
Rapid Infrastructure ProvisioningRapid Infrastructure Provisioning
Rapid Infrastructure Provisioning
 
Introduction to QC
Introduction to QCIntroduction to QC
Introduction to QC
 
Cigniti joint webinar with Soasta - Agile DevOps: Test-driven IT Environment ...
Cigniti joint webinar with Soasta - Agile DevOps: Test-driven IT Environment ...Cigniti joint webinar with Soasta - Agile DevOps: Test-driven IT Environment ...
Cigniti joint webinar with Soasta - Agile DevOps: Test-driven IT Environment ...
 
Cloud Camp Azure概要
Cloud Camp Azure概要Cloud Camp Azure概要
Cloud Camp Azure概要
 
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
 
D5 crazy speed web development
D5 crazy speed web developmentD5 crazy speed web development
D5 crazy speed web development
 
OC Big Data Monthly Meetup #6 - Session 1 - IBM
OC Big Data Monthly Meetup #6 - Session 1 - IBMOC Big Data Monthly Meetup #6 - Session 1 - IBM
OC Big Data Monthly Meetup #6 - Session 1 - IBM
 
EventoDadosAbertos v17ago16
EventoDadosAbertos v17ago16EventoDadosAbertos v17ago16
EventoDadosAbertos v17ago16
 
GDPR. Et alors?
GDPR. Et alors?GDPR. Et alors?
GDPR. Et alors?
 
stagerapport2.3
stagerapport2.3stagerapport2.3
stagerapport2.3
 
ecdevday4
ecdevday4ecdevday4
ecdevday4
 
Architecting Security and Governance Across Multi Accounts
Architecting Security and Governance Across Multi AccountsArchitecting Security and Governance Across Multi Accounts
Architecting Security and Governance Across Multi Accounts
 
How to Collect and Process Data Under GDPR?
How to Collect and Process Data Under GDPR?How to Collect and Process Data Under GDPR?
How to Collect and Process Data Under GDPR?
 
E learning: kansen en risico's
E learning: kansen en risico'sE learning: kansen en risico's
E learning: kansen en risico's
 
Big Data Expo 2015 - Teradata Big Data : Just use it!
Big Data Expo 2015 - Teradata Big Data : Just use it!Big Data Expo 2015 - Teradata Big Data : Just use it!
Big Data Expo 2015 - Teradata Big Data : Just use it!
 
IBM CEC Big Data 2011 06-11 final
IBM CEC Big Data 2011 06-11 finalIBM CEC Big Data 2011 06-11 final
IBM CEC Big Data 2011 06-11 final
 
okspring3x
okspring3xokspring3x
okspring3x
 
Science ABC Book
Science ABC BookScience ABC Book
Science ABC Book
 

Similar to Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks

SH 1 - SES 8 - Stitch_Overview_TLV.pptx
SH 1 - SES 8 - Stitch_Overview_TLV.pptxSH 1 - SES 8 - Stitch_Overview_TLV.pptx
SH 1 - SES 8 - Stitch_Overview_TLV.pptx
MongoDB
 
Building Progressive Web Apps for Android and iOS
Building Progressive Web Apps for Android and iOSBuilding Progressive Web Apps for Android and iOS
Building Progressive Web Apps for Android and iOS
FITC
 
201410 2 fiware-orion-contextbroker
201410 2 fiware-orion-contextbroker201410 2 fiware-orion-contextbroker
201410 2 fiware-orion-contextbroker
FIWARE
 

Similar to Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks (20)

Snowplow - Evolve your analytics stack with your business
Snowplow - Evolve your analytics stack with your businessSnowplow - Evolve your analytics stack with your business
Snowplow - Evolve your analytics stack with your business
 
Snowplow: evolve your analytics stack with your business
Snowplow: evolve your analytics stack with your businessSnowplow: evolve your analytics stack with your business
Snowplow: evolve your analytics stack with your business
 
OSA Con 2022 - Building Event Collection SDKs and Data Models - Paul Boocock ...
OSA Con 2022 - Building Event Collection SDKs and Data Models - Paul Boocock ...OSA Con 2022 - Building Event Collection SDKs and Data Models - Paul Boocock ...
OSA Con 2022 - Building Event Collection SDKs and Data Models - Paul Boocock ...
 
SDKs, the good the bad the ugly - Japan
SDKs, the good the bad the ugly - JapanSDKs, the good the bad the ugly - Japan
SDKs, the good the bad the ugly - Japan
 
Why you should be using structured logs
Why you should be using structured logsWhy you should be using structured logs
Why you should be using structured logs
 
AWS re:Invent 2016: Metering Big Data at AWS: From 0 to 100 Million Records i...
AWS re:Invent 2016: Metering Big Data at AWS: From 0 to 100 Million Records i...AWS re:Invent 2016: Metering Big Data at AWS: From 0 to 100 Million Records i...
AWS re:Invent 2016: Metering Big Data at AWS: From 0 to 100 Million Records i...
 
SH 1 - SES 8 - Stitch_Overview_TLV.pptx
SH 1 - SES 8 - Stitch_Overview_TLV.pptxSH 1 - SES 8 - Stitch_Overview_TLV.pptx
SH 1 - SES 8 - Stitch_Overview_TLV.pptx
 
MongoDB Stich Overview
MongoDB Stich OverviewMongoDB Stich Overview
MongoDB Stich Overview
 
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"
 
Webinar: Event Processing & Data Analytics with Lucidworks Fusion
Webinar: Event Processing & Data Analytics with Lucidworks FusionWebinar: Event Processing & Data Analytics with Lucidworks Fusion
Webinar: Event Processing & Data Analytics with Lucidworks Fusion
 
GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...
GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...
GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...
 
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy Nguyen
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy NguyenGrokking Engineering - Data Analytics Infrastructure at Viki - Huy Nguyen
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy Nguyen
 
Insight User Conference Bootcamp - Use the Engagement Tracking and Metrics A...
Insight User Conference Bootcamp - Use the Engagement Tracking  and Metrics A...Insight User Conference Bootcamp - Use the Engagement Tracking  and Metrics A...
Insight User Conference Bootcamp - Use the Engagement Tracking and Metrics A...
 
Building Progressive Web Apps for Android and iOS
Building Progressive Web Apps for Android and iOSBuilding Progressive Web Apps for Android and iOS
Building Progressive Web Apps for Android and iOS
 
Going Serverless with Azure Functions
Going Serverless with Azure FunctionsGoing Serverless with Azure Functions
Going Serverless with Azure Functions
 
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
 
Semantic Web & TYPO3
Semantic Web & TYPO3Semantic Web & TYPO3
Semantic Web & TYPO3
 
201410 2 fiware-orion-contextbroker
201410 2 fiware-orion-contextbroker201410 2 fiware-orion-contextbroker
201410 2 fiware-orion-contextbroker
 
The Rise of NoSQL
The Rise of NoSQLThe Rise of NoSQL
The Rise of NoSQL
 
LJC Conference 2014 Cassandra for Java Developers
LJC Conference 2014 Cassandra for Java DevelopersLJC Conference 2014 Cassandra for Java Developers
LJC Conference 2014 Cassandra for Java Developers
 

More from Lucidworks

Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Lucidworks
 

More from Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks

  • 1. O C T O B E R 1 3 - 1 6 , 2 0 1 6 • A U S T I N , T X
  • 2. Event Processing and Data Analytics with Lucidworks Fusion Kiran Chitturi Software Engineer, Lucidworks
  • 3. 3 • How to capture/record user events ? • How to use events/signals for recommendations ? • How to produce reports/analytics from user events ? • What type of recommendations can be generated for different user types? Problem Statement
  • 4. 4 • Library to collect user events from client-side tier of websites and apps (https:// github.com/snowplow/snowplow-javascript-tracker) • Open source equivalent for enterprise analytics • Sends events using tracking pixel • Signals API acts as a collector for Snowplow events • Tracks page views, page pings, links and any custom configured events • https://github.com/snowplow/snowplow/wiki/javascript-tracker Event collection - Snowplow JS tracker
  • 5.
  • 6. 6 • Examples: • page-view, query, search-click, add-to-cart, rating • Signals Schema: • required fields: type • additional properties can be specified in ‘params’ map • Special treatment for fields ‘docId’, ‘userId’, ‘query’, ‘filterQueries’, ‘collection’, ‘weight’, ‘count’ • Processing logic in ‘_signals_ingest’ pipeline Event collection - JSON payloads
  • 8. 8 Example: page-view signal { "timestamp": "2015-09-14T10:12:13.456Z", "type": "pv", "params": { "url": "http://www.ecommerce.com/abws-mcl008-080201" } } { "type_s": "pv", "flag_s": "event", "params.url_s": "http://www.ecommerce.com/abws-mcl008-080201", "id": "62a26152-7971-406e-bf06-3df44974c220", "timestamp_tdt": "2015-09-14T10:12:13.45Z", "count_i": 1, "_version_": 1515057367743463400 } Input signal Indexed signal document
  • 9. 9 Example: page-view signal { "timestamp": "2015-09-14T10:12:13.456Z", "type": "pv", "params": { "page": "Dark Gray Wool Suit", "url": "http://www.ecommerce.com/abws-mcl008-080201", "userId": "12891291", "useragent_type_name_s": "Browser", "ipAddr": "64.134.151.1" "tz": "America/NewYork" } } { "type_s": "pv", "params.tz_s": "America/NewYork", "user_id_s": "12891291", "params.page_s": "Dark Gray Wool Suit", "tz_timestamp_txt": [ "Mon 2015-09-14 10:12:13.456 UTC" ], "flag_s": "event", "params.ipAddr_s": "64.134.151.1", "params.url_s": "http://www.ecommerce.com/abws-mcl008-080201", "id": "4b993f85-67d3-4523-b2b3-cf4e3ff2f202", "timestamp_tdt": "2015-09-14T10:12:13.45Z", "count_i": 1, "_version_": 1515057643959353300 } Input signal Indexed signal document
  • 10. 10 Example: click signal { "type": "click", "params": { "query": "Madden 12", "docId": "2375201", "userId": "abc121", "position" : "4", "filterQueries": [ "cat00000", "abcat0700000", "abcat0703000", "abcat0703002", "abcat0703008" ] } } { "filters_orig_ss":[ "abcat0700000", "abcat0703000", "abcat0703002", "abcat0703008", "cat00000" ], "user_id_s":"abc121", "query_s":"madden 12", "type_s":"click", "params.position_s" : "4", "query_t": "madden 12", "doc_id_s":"2375201", "tz_timestamp_txt":["Tue 2015-10-13 18:33:04.012 UTC"], "filters_s":"abcat0700000 $ abcat0703000 $ abcat0703002 $ abcat0703008 $ cat00000", "flag_s":"event", "query_orig_s":"Madden 12", "id":"69c609f6-a2c1-4f89-990e-88a63e68063d", "timestamp_tdt":"2015-10-13T18:33:04.01Z", "count_i":1, "_version_":1514941903557099520 } Input signal Indexed signal document
  • 11. 11 • Batch processing using Apache Spark • spark-solr library (https://github.com/LucidWorks/spark-solr) • Types • Simple • Click • EventMiner Aggregations
  • 12. 12 Aggregations - data flow Aggregation job Aggregator Spark Agent test Primary collection Raw signals collection Worker Worker Cluster Mgr. Spark Aggregated signals collection Spark Driver Stores aggregated results Fetches raw signals for processing test_signals test_signals_ aggr
  • 13. 13 • Simple aggregations • Top queries • Top clicked documents • Most popular categories • … • Complex aggregations • Click stream aggregations with decaying weights • Generate a Co-occurence matrix for (user, docId, query) tuple Aggregation examples
  • 14. 14 Example: simple aggregation { "type": "rating", "params": { "rating": “5.0”, "source": “web” } }, { "type": "rating", "params": { "rating": “1.0”, "source": “web” } }, { "type": "rating", "params": { "rating": “2.0”, "source": “web”, } }, { "type": "rating", "params": { "rating": “2.0”, "source": “web”, } }, { "type": "rating", "params": { "rating": “1.0”, "source": “web” } } API test Primary collection Raw signals collection Aggregated signals collection test_signals test_signals _aggr Solr Signals Service
  • 15. 15 Example: simple aggregation (continued) 15 test Primary collection Raw signals collection Aggregated signals collection test_signals test_signals _aggr Solr Submitted manually or via scheduler Aggregation Service Spark Fetches raw signals for processing Stores aggregated results { "id" : "test_simple_aggr", "signalTypes" : [ "rating" ], "selectQuery" : "*:*", "aggregator" : "simple", "groupingFields" : "params.source_s", "aggregates" : [ { "type" : "stddev", "sourceFields" : [ "params.rating_s" ], "targetField" : "stddev_rating_d" }, { "type": "topk", "sourceFields": ["params.rating_s"], "targetField": "topk_rating_ss" }, { "type": "mean", "sourceFields": ["params.rating_s"], "targetField": "mean_position_d" } ] } Aggregation definition job submission
  • 16. 16 • Aggregated document: Example: simple aggregation (continued) { "aggr_job_id_s": "b91ffdebc44d4e128a8431c2f8a3deb7", "aggr_type_s": "simple@doc_id_s-query_s-filters_s", "flag_s": "aggr", "type_s": "rating", "id": "24494dba-93a6-4fc5-bb4d-5b546c3c0c5e", "aggr_id_s": "test_simple_aggr", "timestamp_tdt": "2015-10-15T02:26:17.337Z", "count_i": 5, “grouping_key_s": "web", "stddev_rating_d": 1.6431676725154982, "mean_position_d": 2.2, "values.topk_rating_ss": ["2.0", "1.0", "5.0"], "counts.topk_rating_ss": ["2", "2", "1"], "errors.topk_rating_ss": ["0", "0", "0"] }
  • 17. 17 Example: Click aggregation [ { "timestamp": "2014-09-01T23:44:52.533Z", "params": { "query": "Sharp", "docId": "2009324" }, "type": "click" }, { "timestamp": "2014-09-05T12:25:37.420Z", "params": { "query": "Sharp", "docId": "2009324" }, "type": "click" }, { "timestamp": "2014-08-24T12:56:58.910Z", "params": { "query": "Sharp TV", "docId": "1517163" }, "type": "click" }, { "timestamp": "2015-10-25T07:18:14.722Z", "params": { "query": "rca", "docId": "2877125" }, "type": "click" } ] Signals indexed and aggregated { "doc_id_s": "1517163", "query_s": "sharp tv", "weight_d": 0.000006602878329431405, "count_i": 1 }, { "doc_id_s": "2009324", "query_s": "sharp", "weight_d": 0.000016734602468204685, "count_i": 2 }, { “doc_id_s”: "2877125", "query_s": "rca", "weight_d": 0.06324164569377899, "count_i": 1 } aggregated docsraw docs
  • 18. 18 • How to mix signals with search results ? • Recommendation API • Generic query pipeline configuration using 3 stage approach • Sub-query • Rollup-results • Advanced-boost Driving search relevancy
  • 19. 19 Boosting search results using aggregated documents User App Search query Query-pipeline stages Set Params Query Solr Raw signals collection Aggregated signals collection test_signals test_signals _aggr Recommendation Stages test Primary collection 1. Query aggregated documents 2. Process results 3. Add parameters to the request Search response
  • 20. 20
  • 21. 21 • Calculate Co-occurence matrix for tuples based on sessions • Example: (userId, query, docId) • Construct DAG from matrix data • Recommendations are powered from Graph at query time • Increases diversity in recommendations • See https://lucidworks.com/blog/2015/08/31/mining-events- recommendations/ Event Miner aggregation
  • 22. 22 Graph Navigation - Example Query
  • 23. 23 Graph Navigation - Example Query
  • 24. 24 Graph Navigation - Example Query
  • 25. 25 Graph Navigation - Example Query
  • 26. Graph Navigation - Example Query
  • 28. 28 Using Signals = Modifying Your Behavior in Response to your Environment Events & Signals