A Publish/Subscribe Model for
Top-k Matching Over
Continuous Data-streams
Author:
Y.S. Horawalavithana
10002103
Supervisor:
Dr. D.N. Ranasinghe
Outline
• Motivation
• Research Problem
• Re-cap proposal defense!
• Design & Architecture
• Related Work
• Contribution
• Scoring Algorithm
• Query Personalization
• Events Novelty
• Relevancy + Freshness
• MAXDIVREL Diversity
• Dual-Indexing mechanism
• To Do List
Motivation – “The Big Filter”
General Publish/Subscribe Model
Traditional Pub/Sub Matching
Drawbacks in Boolean Matching
Traditional Publish/Subscribe
[Figure: traditional publish/subscribe flow with the steps Publish, Subscribe and Notify]
Bob wants updates about smartphones.
He prefers to be notified about
products from Verizon & AT&T.
Ideally, however, Bob wants to be
notified about products from Verizon
only if there are not enough
notifications from AT&T.
Drawbacks in Boolean Matching (Contd.)
• All subscriptions and matching publications are
treated as equally important.
• Publications are delivered to Bob
whenever a subscription is satisfied.
• Bob may be either overloaded with
publications or receive too few
publications over time.
• Matching publications cannot be compared
with respect to Bob’s subscriptions because
no ranking functions are defined.
• Partial matching between subscriptions and
publications is not supported.
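The difference between Boolean and partial matching can be illustrated with a small sketch. It assumes a subscription is a list of (attribute, operator, value) predicates and a publication is an attribute-value dictionary. The names `boolean_match` and `partial_match`, and the "fraction of satisfied predicates" score, are illustrative assumptions and not the scoring model of this work.

```python
import operator

# Supported comparison operators (an illustrative subset).
OPS = {"=": operator.eq, "<=": operator.le, ">=": operator.ge}

def satisfied(pred, publication):
    """True if the publication carries the attribute and the predicate holds."""
    attr, op, value = pred
    return attr in publication and OPS[op](publication[attr], value)

def boolean_match(subscription, publication):
    """Traditional pub/sub: every predicate must hold."""
    return all(satisfied(p, publication) for p in subscription)

def partial_match(subscription, publication):
    """Hypothetical partial-match score: fraction of predicates satisfied."""
    hits = sum(satisfied(p, publication) for p in subscription)
    return hits / len(subscription)

bob = [("item", "=", "smartphone"), ("carrier", "=", "AT&T")]
pub = {"item": "smartphone", "carrier": "Verizon"}
print(boolean_match(bob, pub))  # False: the carrier predicate fails
print(partial_match(bob, pub))  # 0.5: one of two predicates holds
```

With Boolean matching the Verizon smartphone is simply dropped; a graded score leaves room to deliver it when better matches are scarce.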
Top-k Publish/Subscribe
• Expressive, stateful query processing systems
• Overcome the drawbacks identified in traditional pub/sub systems
• A user-defined parameter k limits the number of delivered publications
• Pub/Sub Matching?
• Top-k pub/sub scoring or ranking
• Pub/Sub Indexing?
• Indexing to support personalized subscriptions
• Indexing to support continuous Top-k publications retrieval
Research Goal
How can the information overload problem be
alleviated with a publish/subscribe
communication paradigm that is augmented
by different scoring mechanisms over
continuous information streams?
Research Problem
1. How to define an efficient scoring algorithm that integrates
query-independent & query-dependent score metrics?
- Relevance, Freshness & Diversity
2. How to adapt the indexing data structures used in state-of-the-art
publish/subscribe systems under
a) large subscription volume,
b) high event rate (velocity), and
c) the variety of subscribable attributes,
to support top-k matching queries?
Scope
Centralized Top-k Publish/Subscribe
Why not client-centered Top-k matching with a
traditional pub/sub layer on top?
• From the subscriber's point of view,
• We support partial matching between subscriptions & publications
• Personalized subscriptions
• We address the overlapping interests of many subscribers
• Experiment with system resiliency: retrieve Top-k results based on domain knowledge
• We can handle a large subscription space with a variety of attributes through an
efficient in-memory indexing mechanism
• From the publisher's point of view,
• Depends on the order of incoming matched publications
[Figure: design & architecture. Components shown: Subscription Store and Publication Store (both with Expire), Subscription Indexing, Relevance Matching, Publication Stream, Matching Publication Store, Publication (Relevance Score), Publication Indexing, Top-k Continuous Diversity (Dissimilarity, Relevancy), Personalized Subscriptions, Event Delivery, Top-k Notification Store, Notifications, Sliding window]
Related Work
• k-index (Whang2009)
• BE*Tree-index (Sadoghi2012)
• gridIndex (Pripuzic2012)
• opIndex (Zhang2014)
• MAXMIN Diversity / Cover Tree (Drosou2014)
• Pref_pub/sub (Drosou2009)
• Top-k/w pub/sub (Pripuzic2012)
• Forward_Decay (Cormode2009)
• Binary_Decisions (Campailla2001)
• Publication_Aging (Shraer2013)
• Pref_pub/sub with diversity (Pitoura2009)
• DisC_diversity (Drosou2012)
• Top-k representative Queries (Ranu2014)
Comparison: Subscription (Contd.)
Typical Pub/Sub
• A publication is simply matched
whenever a subscription is
satisfied
Top-k Pub/Sub
• A publication is scored against the
space of satisfied subscriptions
[Figure: example subscriptions built from the predicates Item = Smartphone and Carrier = AT&T, shown for both models]
Comparison: Subscription
Typical Pub/Sub
• All subscriptions are treated
equally
• No personalized subscriptions
Top-k Pub/Sub
• Subscribers can express that some
events are more important than
others by ranking subscriptions
• Can attach a degree of user interest
to the subscription space
• Limits redundancy by avoiding
results with overlapping content
• “AT&T Smartphone” is included in
“Smartphone”
• Makes rare events visible
How to assign preferences over subscriptions?
Quantitative approach
• Assign a degree of interest to each
subscription
Qualitative approach
• Specify the relative interest between two
subscriptions
[Figure: quantitative example with subscriptions over Item = Smartphone and Carrier = AT&T weighted 0.7, 0.5 and 0.9; qualitative example with the same subscriptions ordered by > and <]
Personalized subscriptions
• Explicit Global Ordering (Subscription Preferences)
• {Carrier = AT&T, OS = Android} 0.9 > {Carrier = Verizon, OS = iOS} 0.7
• Explicit Local Ordering (Attribute Preferences)
• Carrier = AT&T > Carrier = Verizon
• OS = iOS < OS = Android
• Explicit Local + Implicit Global Ordering (Attribute-Subscription Preferences)
• Carrier = AT&T (0.6), OS = Android (0.3)
• Carrier = Verizon (0.2), OS = iOS (0.5)
• Carrier = AT&T (0.3), OS = iOS (0.7), Brand = Apple (0.4)
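As a rough illustration of attribute-subscription preferences, the sketch below stores a weight per predicate and scores a publication by adding up the weights of the predicates it satisfies. The labels S1 to S3, the grouping of the weighted predicates into three subscriptions, and the additive scoring rule are assumptions made for this example; the slides do not state the scoring formula here.

```python
# Each personalized subscription maps an (attribute, value) predicate to a weight.
subscriptions = {
    "S1": {("carrier", "AT&T"): 0.6, ("os", "Android"): 0.3},
    "S2": {("carrier", "Verizon"): 0.2, ("os", "iOS"): 0.5},
    "S3": {("carrier", "AT&T"): 0.3, ("os", "iOS"): 0.7, ("brand", "Apple"): 0.4},
}

def weighted_score(subscription, publication):
    """Assumed scoring rule: add up the weights of the predicates that hold."""
    return sum(weight for (attr, value), weight in subscription.items()
               if publication.get(attr) == value)

pub = {"carrier": "AT&T", "os": "iOS", "brand": "Apple"}
scores = {name: round(weighted_score(sub, pub), 2)
          for name, sub in subscriptions.items()}
print(scores)  # {'S1': 0.6, 'S2': 0.5, 'S3': 1.4}
```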
We Propose: Relating Attributes
[Figure: three panels, a) Subscription covering, b) Subscription merging, c) Relating attributes; each panel plots subscriptions S1, S2 and S3 over the axes attribute1 and attribute2]
Relating Attributes: Demonstration
• Assume that Bob would like to be notified about products related
to the following personalized queries:
Relating Attributes: Demonstration
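[Figure: Bob's personalized queries as a diagram. Labels in order of appearance: Brand = HTC (0.3), Storage ≤ 32GB (0.6), 2; Carrier = Verizon (0.5), Storage ≤ 32GB (0.2), 2.5; Carrier = AT&T (0.4), Storage ≤ 16GB (0.7), 1.75; Brand = HTC (0.3), 1.3, 2.3]
[Figure: the same diagram without attribute weights: 2; Carrier = Verizon, Storage ≤ 32GB, 2.5; Carrier = AT&T, Storage ≤ 16GB, 1.75; Brand = HTC, 1.3, 2.3]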
Relating Attributes: Demonstration
• A seller pushes a product
[Figure: the incoming product matched against the diagram of Bob's queries: 2; Carrier = Verizon, Storage ≤ 32GB, 2.5; Carrier = AT&T, Storage ≤ 16GB, 1.75; Brand = HTC, 1.3, 2.3]
Relevancy Score
Subscription Indexing
• Can become a performance bottleneck when
• matching publications against the user's personalized subscription space.
• Extensively studied in the pub/sub community
• Don’t reinvent the wheel
• We extend an existing indexing mechanism to
• apply our personalized subscription model
Decision Making
opIndex
• Dynamically adapts to the variety of
attributes
• Two-space partitioning
• Attribute & operator
• Can support a wide range of operators
• Ex: Regular Expression
• Performs better when the subscription
space becomes larger
• index construction time,
• memory cost and,
• query processing time.
k-Index, BE* Index
• Can’t deal with the variety of attributes
• Three-space partitioning
• Subscription size, Attribute & Value
• Support only a small set of operators
• Are outperformed by opIndex
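To make the idea of partitioning by attribute and then by operator concrete, here is a toy inverted index with counting-based matching. It is only a sketch of the two-level layout, not the opIndex implementation; the class name `TwoLevelIndex`, the operator subset and the sample subscriptions are assumptions.

```python
from collections import defaultdict
import operator

OPS = {"=": operator.eq, "<=": operator.le, ">=": operator.ge}

class TwoLevelIndex:
    """Toy inverted index: attribute -> operator -> [(value, subscription id)]."""

    def __init__(self):
        self.index = defaultdict(lambda: defaultdict(list))
        self.size = {}  # subscription id -> number of predicates

    def add(self, sub_id, predicates):
        self.size[sub_id] = len(predicates)
        for attr, op, value in predicates:
            self.index[attr][op].append((value, sub_id))

    def match(self, publication):
        """Return ids of subscriptions whose predicates are all satisfied."""
        hits = defaultdict(int)
        for attr, pub_value in publication.items():
            # Only the partitions for attributes present in the publication are visited.
            for op, entries in self.index.get(attr, {}).items():
                for value, sub_id in entries:
                    if OPS[op](pub_value, value):
                        hits[sub_id] += 1
        return sorted(s for s, n in hits.items() if n == self.size[s])

idx = TwoLevelIndex()
idx.add("S1", [("carrier", "=", "Verizon"), ("storage", "<=", 32)])
idx.add("S2", [("carrier", "=", "AT&T"), ("storage", "<=", 16)])
idx.add("S3", [("brand", "=", "HTC")])
matched = idx.match({"carrier": "Verizon", "storage": 16, "brand": "HTC"})
print(matched)  # ['S1', 'S3']
```

Keeping the per-subscription hit counts, instead of discarding partial matches, is one way such an index could feed a partial-matching score.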
Events Novelty
• Motivation:
• A popular news pub/sub system such as Google News keeps publications
from the last 30 days, but most of the time produces top-k results from the last day
or two.
• Most important in Top-k computation
• Demonstration using a time policy to compute Top-k results
When to compute Top-k results?
• Our matching model deals with a continuous data stream
• It is impossible to filter an unbounded stream as a whole
• We need a time policy to compute Top-k results per subscription
I. Continuous
II. Periodic
III. Sliding Windows
Sliding Window Top-k computation
• Compute top-k results based on publications within moving windows
(time or events) e.g. w=2
[Figure: publication stream P1, P2, …, P9 on a timeline marked T, 2T, 3T, 4T, 5T; publications P1, P2 and P4 are highlighted]
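A sliding event window can be sketched with a bounded deque: each arriving publication pushes the oldest one out, and the top-k is taken over what remains. The class name `SlidingTopK`, the parameters `w` and `k`, and the sample scores are assumptions for illustration; the scores stand in for whatever relevance score the matcher produces.

```python
from collections import deque
import heapq

class SlidingTopK:
    """Keep the last w scored publications and report the k best among them."""

    def __init__(self, w, k):
        self.window = deque(maxlen=w)  # the oldest entry is dropped automatically
        self.k = k

    def push(self, pub_id, score):
        self.window.append((score, pub_id))
        return self.top_k()

    def top_k(self):
        # Re-scans the window on every call; fine for a sketch, not tuned for large w.
        return [pub_id for _, pub_id in heapq.nlargest(self.k, self.window)]

stream = [("P1", 0.9), ("P2", 0.4), ("P3", 0.7), ("P4", 0.5)]
topk = SlidingTopK(w=3, k=2)
results = [topk.push(pub_id, score) for pub_id, score in stream]
print(results[-1])  # ['P3', 'P4'] because P1 has left the window
```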
Remark: Sliding Window
• More adaptive than continuous & periodic
• when w = 1, it acts as continuous
• when w = T, it acts as periodic
• But here w is flexible
• We can dynamically change w based on the event arrival rate
• Can handle streams whose arrivals do not follow a Poisson distribution
• Without loss of generality, our model is based on sliding event windows
• But what happens when the event window becomes larger?
Freshness: Time Decaying
Problem
• Older publications may prevent newer publications from entering the
top-k results
Solution
• Lease or expire publications using a time decay function
• We combine freshness with the relevancy score
Time Decaying Function
• We use “forward decay” to compute the publication age
• So we don’t have to recompute the decay score in each window
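The reason forward decay avoids recomputation can be seen in a short sketch. Forward decay weights an item that arrived at time t_i by g(t_i - L) / g(t - L), where L is a fixed landmark time and t is the current time. The numerator is fixed on arrival and the denominator is the same for every item, so the ranking never changes as time passes. The exponential choice of g, the parameter `alpha`, and the multiplicative combination with relevance are assumptions made for this example.

```python
import math

def forward_weight(arrival, landmark, alpha):
    """Static part of forward decay: g(t_i - L) with g(n) = exp(alpha * n)."""
    return math.exp(alpha * (arrival - landmark))

def decayed_weight(arrival, now, landmark, alpha):
    """Forward-decayed weight g(t_i - L) / g(t - L); equals 1 when now == arrival."""
    return forward_weight(arrival, landmark, alpha) / forward_weight(now, landmark, alpha)

def rank(pubs, landmark, alpha):
    """Order by relevance * static forward weight (an assumed combination)."""
    return sorted(pubs,
                  key=lambda p: p["relevance"] * forward_weight(p["t"], landmark, alpha),
                  reverse=True)

pubs = [{"id": "old", "relevance": 0.9, "t": 1},
        {"id": "new", "relevance": 0.6, "t": 9}]
order = [p["id"] for p in rank(pubs, landmark=0, alpha=0.1)]
print(order)  # ['new', 'old']: freshness outweighs the higher relevance here
```

Because the common denominator g(t - L) cancels when two items are compared, the static key computed once at arrival is enough to keep the top-k ordering up to date.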
Relevancy Decaying Function
Event Diversity
• In Top-k publish/subscribe,
• getting diverse results within the Top-k publications plays a major role
• As an example, Bob would like to be notified about smartphones with
carrier=AT&T and brand=HTC.
• Without the notion of diversity, the delivered top-k publications may be very
similar to each other.
• Even though the received publications are personalized, Bob may regard
such a system as ineffective.
Define Diversity: Taxonomy
[Figure: taxonomy of result diversification with the branches Dissimilarity, Coverage and Novelty, over a discrete or continuous domain]
Dissimilarity
• Choose to deliver items that are dissimilar to each other
• P-dispersion problem
• Select k items out of n such that the average pairwise distance between
the selected items is maximized
• NP-Hard
• k-diversity problem
• Is based on the p-dispersion problem
• Relies on heuristics to solve large instances of the problem
K-diversity problem
• Let P be the set of matching publications, with |P| = n. Given a
distance metric d that expresses the dissimilarity between publication
points, find the diverse set $S^*$ of P such that
$S^* = \arg\max_{S \subseteq P,\ |S| = k} f(S, d)$
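Since the exact problem is NP-hard, large instances are usually handled with heuristics. The following is a minimal sketch of one common greedy heuristic for the case where f is the sum (equivalently, the average) of pairwise distances: seed with the farthest pair, then repeatedly add the item with the largest total distance to the items already chosen. The function name, the 2-D sample points and the Euclidean metric are assumptions; the result is not guaranteed to be optimal.

```python
import math
from itertools import combinations

def greedy_k_diverse(points, k, dist):
    """Greedy heuristic for max-sum dispersion; not guaranteed optimal."""
    if k <= 0:
        return []
    if k >= len(points):
        return list(points)
    if k == 1:
        return [points[0]]
    # Seed with the two points that are farthest apart.
    a, b = max(combinations(points, 2), key=lambda pair: dist(*pair))
    selected = [a, b]
    remaining = [p for p in points if p not in selected]
    while len(selected) < k:
        # Add the point with the largest total distance to the current selection.
        best = max(remaining, key=lambda p: sum(dist(p, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

pubs = [(0, 0), (10, 0), (5, 8), (1, 0), (9, 1)]
diverse = greedy_k_diverse(pubs, 3, math.dist)
print(diverse)  # [(0, 0), (10, 0), (5, 8)]
```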
MAXDIVREL Diversity
Addresses the continuous k-diversity problem
Does not reinvent the wheel
• Most diversity definitions are aligned with
• the p-dispersion problem
• Here we combine diversity & relevancy in a
• mono-objective formulation
• that is no longer based on p-dispersion
Beyond Diversity & Relevance
• We select a diverse set that
• increases the "global" importance of a selected publication, and
• reduces the "global" importance of a non-selected publication.
• We define the problem in a static version,
• MAXDIVREL k-diversity problem
• We define the problem in a continuous version,
• MAXDIVREL continuous k-diversity problem
Demonstration: MAXDIVREL
Definition: MAXDIVREL (static version)
MAXDIVREL k-diversity problem
• Can be mapped to the Top-k representative query problem in graph databases,
which is NP-Hard
• A specialized version of the set cover problem
• Can be proved!
MAXDIVREL k-diversity Algorithm: Greedy
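The greedy algorithm itself is not reproduced in this transcript. For orientation only, the sketch below shows the generic greedy strategy commonly used for set-cover-like selection problems: in each of k rounds, pick the item whose neighborhood covers the most not-yet-covered relevance. The objective, the `neighbors` structure, the function name and the sample values are all assumptions and should not be read as the MAXDIVREL definition.

```python
def greedy_cover(relevance, neighbors, k):
    """Pick up to k items, each time taking the one that covers the most uncovered relevance."""
    covered, selected = set(), []
    for _ in range(k):
        def gain(p):
            # Relevance of the items in p's neighborhood that nothing selected covers yet.
            return sum(relevance[q] for q in neighbors[p] - covered)

        candidates = [p for p in relevance if p not in selected]
        if not candidates:
            break
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break  # everything reachable is already covered
        selected.append(best)
        covered |= neighbors[best]
    return selected

relevance = {"p1": 0.9, "p2": 0.8, "p3": 0.4, "p4": 0.7}
neighbors = {            # hypothetical neighborhoods; each item covers itself
    "p1": {"p1", "p2"},
    "p2": {"p1", "p2", "p3"},
    "p3": {"p2", "p3"},
    "p4": {"p4"},
}
chosen = greedy_cover(relevance, neighbors, k=2)
print(chosen)  # ['p2', 'p4']
```

In this toy input p2 is chosen first because its neighborhood covers p1, p2 and p3; p4 follows because it is the only item still uncovered.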
MAXDIVREL Continuous k-diversity problem
• Continuity Requirements
• Durability
• An item selected as diverse in the i-th window may still be selected in the
(i+1)-th window if it has not expired and the other valid items in the (i+1)-th window
fail to compete with it.
• Order
• The publication stream follows chronological order
• We avoid selecting an item j as diverse later when we have already selected an item i
that is not older than j.
Definition: MAXDIVREL (continuous version)
MAXDIVREL continuous k-diversity problem
• Apply the MAXDIVREL k-diversity greedy algorithm in each window
• Time complexity
• dominated by re-calculating the neighborhood
• We propose an incremental MAXDIVREL algorithm
• Calculate the neighborhood at window i+1 using the neighborhood already calculated at
window i
• Index publications at each window
• Combine with subscription indexing
• Dual-indexing mechanism!
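The idea of reusing the previous window's neighborhoods can be sketched as follows: when the window slides, only the expired publication is removed from its neighbors' sets, and distances are computed only for the newly arrived publication. The class name, the radius-based notion of neighborhood and the 1-D sample data are assumptions for illustration, not the incremental MAXDIVREL algorithm itself.

```python
from collections import deque

class IncrementalNeighborhoods:
    """Maintain radius-based neighborhoods over a sliding event window of size w."""

    def __init__(self, w, radius, dist):
        self.w, self.radius, self.dist = w, radius, dist
        self.window = deque()   # publication ids in arrival order
        self.points = {}        # id -> feature value
        self.neighbors = {}     # id -> set of ids within radius (excluding itself)

    def push(self, pub_id, point):
        if len(self.window) == self.w:
            # Expire the oldest publication and detach it from its neighbors.
            old = self.window.popleft()
            for other in self.neighbors.pop(old):
                self.neighbors[other].discard(old)
            del self.points[old]
        # Distances are computed only between the new item and the current window.
        near = {q for q in self.window
                if self.dist(point, self.points[q]) <= self.radius}
        for q in near:
            self.neighbors[q].add(pub_id)
        self.neighbors[pub_id] = near
        self.points[pub_id] = point
        self.window.append(pub_id)

hood = IncrementalNeighborhoods(w=3, radius=1.5, dist=lambda a, b: abs(a - b))
for pub_id, value in [("a", 0), ("b", 1), ("c", 5), ("d", 6)]:
    hood.push(pub_id, value)
print(hood.neighbors)
```

Each slide of the window costs one pass over the window for the new item instead of recomputing all pairwise distances.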
To Do List: Implementation
• Indexing based on an inverted index
• Why an inverted index?
• Centralized first, then try cloud-based
• Using a message broker system, e.g. RabbitMQ, ZeroMQ, ActiveMQ
• Why RabbitMQ?
To Do List: Evaluation
• Multiple Directions
• Zipf property
• Using synthetic & real data sets (e.g. Zipf distribution tool, eBay, AOL query logs)
• Algorithm efficiency
• Experiment with,
• The volume of subscriptions
• The variety of publications
• The arrival rate of publications (e.g. dynamic sliding window model)
• Using the POIKILO evaluation tool
• Dual-Indexing Performance & Scalability
• Experiment with,
• Index construction time at each window
• Memory cost
• Query processing time (e.g. neighborhood calculation)
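A synthetic workload with the Zipf property could be generated along these lines. The sketch draws attribute values so that the value at rank r has weight 1 / r^s. The function name, the carrier list, the exponent `s` and the seed are illustrative assumptions and not the distribution tool mentioned above.

```python
import random
from collections import Counter

def zipf_sample(values, n, s=1.0, seed=None):
    """Draw n items where the value at rank r (1-based) has weight 1 / r**s."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** s) for rank in range(1, len(values) + 1)]
    return rng.choices(values, weights=weights, k=n)

carriers = ["AT&T", "Verizon", "T-Mobile", "Sprint"]  # ranked by assumed popularity
sample = zipf_sample(carriers, 10_000, s=1.0, seed=42)
counts = Counter(sample)
print(counts.most_common())
```

Raising `s` makes the head of the distribution heavier, which is one way to vary the skew of subscriptions or publications in an experiment.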
Thank You!
Your review will be Golden!
Welcome to read the design chapters!

[ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation
 
Be Elastic: Leapset Innovation session 06-08-2015
Be Elastic: Leapset Innovation session 06-08-2015Be Elastic: Leapset Innovation session 06-08-2015
Be Elastic: Leapset Innovation session 06-08-2015
 
[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...
[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...
[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...
 
Zipf distribution
Zipf distributionZipf distribution
Zipf distribution
 
Query personalization
Query personalizationQuery personalization
Query personalization
 
Dancing with publish/subscribe
Dancing with publish/subscribeDancing with publish/subscribe
Dancing with publish/subscribe
 
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand StreamingTalk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
 

Recently uploaded

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
Pixlogix Infotech
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
TIPNGVN2
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 

Recently uploaded (20)

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 

[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for Top-k Matching Over Continuous Data-streams

  • 1. A Publish/Subscribe Model for Top-k Matching Over Continuous Data-streams Author: Y.S. Horawalavithana 10002103 Supervisor: Dr. D.N. Ranasinghe
  • 2. Outline • Motivation • Research Problem • Re-cap proposal defense! • Design & Architecture • Related Work • Contribution • Scoring Algorithm • Query Personalization • Events Novelty • Relevancy + Freshness • MAXDIVREL Diversity • Dual-Indexing mechanism • To Do List
  • 3. Motivation – “The Big Filter”
  • 6. Drawbacks in Boolean Matching • Traditional publish/subscribe: Publish, Subscribe, Notify • Bob wants updates about smartphones. He prefers to be notified about products from Verizon & AT&T. • Ideally, though, Bob wants to be notified about products from Verizon only if there are not enough notifications from AT&T.
  • 7. Drawbacks in Boolean Matching (Contd.) • Subscriptions & matching publications are considered as equally important. • Publications are delivered to Bob whenever there is a satisfied subscription. • Bob may be either overloaded with publications or receive too few publications over time, • Impossible to compare different matching publications with respect to Bob’s subscriptions as ranking functions are not defined, and • Partial matching between subscriptions and publications is not supported.
  • 8. Top-k Publish/Subscribe • Expressive stateful query processing systems • to overcome the drawbacks identified in traditional pub/sub systems • User defined parameter k restricts the delivered publications • Pub/Sub Matching? • Top-k pub/sub scoring or ranking • Pub/Sub Indexing? • Indexing to support personalized subscriptions • Indexing to support continuous Top-k publications retrieval
  • 10. Research Goal How can the information-overload problem be alleviated with a publish/subscribe communication paradigm that is augmented by different scoring mechanisms over continuous information streams?
  • 11. Research Problem 1. How can an efficient scoring algorithm be defined that integrates both query-independent and query-dependent score metrics (relevance, freshness & diversity)? 2. How can the indexing data structures used in state-of-the-art publish/subscribe systems be adapted to support top-k matching queries under a) large subscription volume, b) high event rate (velocity), and c) a variety of subscribable attributes?
  • 12. Scope
  • 15. Why not client-centered top-k matching with a traditional pub/sub layer on top? • From the subscriber's point of view, • We support partial matching between subscriptions & publications • Personalized subscriptions • We address the overlapping interests of many subscribers • Experiment with system resiliency: retrieve top-k results based on domain knowledge • We can handle a large volume of subscriptions with a variety of attributes through an efficient in-memory indexing mechanism • From the publisher's point of view, • Depends on the order of incoming matched publications
  • 19. k-index (Whang2009); BE*Tree-index (Sadoghi2012); gridIndex (Pripužić2012); opIndex (Zhang2014); MAXMIN Diversity / Cover Tree (Drosou2014); Pref_pub/sub (Drosou2009); Top-k/w pub/sub (Pripužić2012); Forward_Decay (Cormode2009); Binary_Decisions (Campailla2001); Publication_Aging (Shraer2013); Pref_pub/sub with diversity (Pitoura2009); DisC_diversity (Drosou2012); Top-k representative queries (Ranu2014)
  • 21. Comparison: Subscription (Contd.) Typical Pub/Sub • A publication is simply matched whenever there is a satisfied subscription Top-k Pub/Sub • A publication is scored against the space of satisfied subscriptions (Diagram labels, shown for both models: Item = Smartphone; Carrier = AT&T.)
  • 22. Comparison: Subscription Typical Pub/Sub • All subscriptions are considered equally • No personalized subscriptions Top-k Pub/Sub • Subscribers can express that some events are more important than others by ranking subscriptions • Can have a degree of user interest over the subscription space • Limits redundancy by avoiding results with overlapping content • "AT&T Smartphone" is included in "Smartphone" • Makes rare events visible
  • 23. How to assign preference over subscriptions? Quantitative approach • Assign an interest score to each subscription Qualitative approach • Specify the relative interest between two subscriptions (Diagram: subscriptions over Item = Smartphone and Carrier = AT&T, annotated with the scores 0.7, 0.5 and 0.9 for the quantitative approach and with > and < relations for the qualitative approach.)
  • 24. Personalized subscriptions • Explicit Global Ordering (Subscription Preferences): [Carrier = AT&T, OS = Android] 0.9 > [Carrier = Verizon, OS = iOS] 0.7 • Explicit Local Ordering (Attribute Preferences): Carrier = AT&T > Carrier = Verizon; OS = iOS < OS = Android • Explicit Local + Implicit Global Ordering (Attribute-Subscription Preferences): Carrier = AT&T (0.6), OS = Android (0.3); Carrier = Verizon (0.2), OS = iOS (0.5); Carrier = AT&T (0.3), OS = iOS (0.7), Brand = Apple (0.4)
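As a rough illustration of attribute-level preferences, the sketch below scores a publication against one subscription by summing the weights of the predicates it satisfies, so a partial match still earns a score. The summing rule, the `Predicate` type and the `score` function are assumptions made for this example; they are not the scoring algorithm defined in the thesis.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Predicate:
    """One weighted predicate of a subscription (hypothetical type)."""
    attribute: str
    op: Callable[[Any, Any], bool]
    value: Any
    weight: float


def score(subscription: List[Predicate], publication: Dict[str, Any]) -> float:
    """Sum the weights of the predicates that the publication satisfies.

    Assumption: weights are simply added; unmatched predicates contribute 0.
    """
    total = 0.0
    for p in subscription:
        if p.attribute in publication and p.op(publication[p.attribute], p.value):
            total += p.weight
    return total


# Weights taken from the slide: Carrier = AT&T (0.6), OS = Android (0.3).
subscription = [
    Predicate("Carrier", operator.eq, "AT&T", 0.6),
    Predicate("OS", operator.eq, "Android", 0.3),
]

full = score(subscription, {"Carrier": "AT&T", "OS": "Android"})
partial = score(subscription, {"Carrier": "AT&T", "OS": "iOS"})
none = score(subscription, {"Carrier": "Verizon", "OS": "iOS"})
```

A Boolean matcher would reject the second publication outright; here it keeps the score of the one predicate it satisfies, which is what makes ranking possible.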
  • 25. We Propose: Relating Attributes a) Subscription covering b) Subscription merging c) Relating attributes (Figure: for each case, subscriptions S1, S2 and S3 drawn over the axes attribute1 and attribute2.)
  • 26. Relating Attributes: Demonstration • Assume that Bob would like to be notified about products related to the following personalized queries:
  • 27. Relating Attributes: Demonstration Brand = HTC (0.3) Storage ≤ 32GB (0.6) 2 Carrier = Verizon (0.5) Storage ≤ 32GB (0.2) 2.5 Carrier = AT&T (0.4) Storage ≤ 16GB (0.7) 1.75 Brand = HTC (0.3) 1.3 2.3
  • 28. 2 Carrier = Verizon Storage ≤ 32GB 2.5 Carrier = AT&T Storage ≤ 16GB 1.75 Brand = HTC 1.3 2.3
  • 29. Relating Attributes: Demonstration • A seller pushes a product
  • 30. 2 Carrier = Verizon Storage ≤ 32GB 2.5 Carrier = AT&T Storage ≤ 16GB 1.75 Brand = HTC 1.3 2.3
  • 32. Subscription Indexing • Matching publications against the personalized subscription space can become a performance bottleneck • Extensively studied in the pub/sub community • Don’t reinvent the wheel • We extend an existing indexing mechanism to apply our personalized subscription model
  • 33. Decision Making • opIndex • Dynamically adapts to the variety of attributes • Two-space partitioning: attribute & operator • Can support a wide range of operators, e.g. regular expressions • Performs better as the subscription space becomes larger, in terms of index construction time, memory cost and query processing time • k-Index, BE* Index • Can’t deal with the variety of attributes • Three-space partitioning: subscription size, attribute & value • Support only a small set of operators • Are outperformed by opIndex
  • 35. Events Novelty • Motivation: • A popular news pub/sub system like Google News maintains publications from the last 30 days, but most of the time produces top-k results from the last day or two. • Most important in top-k computation • Demonstration using a time policy to compute top-k results
  • 36. When to compute Top-k results? • Our matching model deals with a continuous data stream • Impossible to filter an unbounded stream • We need a time policy to compute top-k results per subscription: I. Continuous II. Periodic III. Sliding windows
  • 37. Sliding Window Top-k computation • Compute top-k results based on the publications within a moving window (of time or events), e.g. w = 2 (Figure: publication stream P1, P2, …, P9 over a timeline marked T, 2T, 3T, 4T, 5T; separately labelled: P1, P2, P4.)
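A minimal sketch of top-k over a sliding event window follows. It assumes each publication already carries a score, that the window holds the last `w` events and slides by one event, and that the top-k is simply recomputed on every arrival; the function name `sliding_top_k` and the `(id, score)` tuple layout are illustrative choices, not part of the deck.

```python
import heapq
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

Publication = Tuple[str, float]  # (publication id, score), assumed layout


def sliding_top_k(stream: Iterable[Publication], w: int, k: int) -> Iterator[List[Publication]]:
    """Yield the k best-scored publications among the last w events.

    Naive variant: the top-k is recomputed from the window on each arrival.
    """
    window: Deque[Publication] = deque(maxlen=w)  # oldest event drops out automatically
    for pub in stream:
        window.append(pub)
        yield heapq.nlargest(k, window, key=lambda p: p[1])


stream = [("P1", 0.4), ("P2", 0.9), ("P3", 0.1), ("P4", 0.7)]
results = list(sliding_top_k(stream, w=2, k=1))
```

With `w = 2` and `k = 1`, P2 stays the top result while it is inside the window and is replaced by P4 once it slides out, even though P4 scores lower than P2.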
  • 38. Remark: Sliding Window • More adaptive than continuous & periodic • when w = 1, acts as continuous • when w = T, acts as periodic • But here w is flexible • We can dynamically change w based on the event arrival rate • Can handle streams whose arrivals do not follow a Poisson distribution • Without loss of generality, our model is based on sliding event windows • But what happens when the event window becomes larger?
  • 39. Freshness: Time Decaying Problem • Older publications may prevent newer publications from entering the top-k results Solution • Lease or expire publications using a time decay function • We combine freshness with the relevancy score
  • 40. Time Decaying Function • We use “forward decay” to compute the publication age • So we don’t have to recompute the decay score in each window
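The reason forward decay avoids per-window recomputation can be sketched as follows. In forward decay the weight of an item that arrived at time t_i, queried at time t, is g(t_i - L) / g(t - L) for a fixed landmark L; the denominator is the same for every item, so ranking can use the numerator alone, which never changes after arrival. The exponential choice of g, the rate `alpha` and the multiplicative combination with relevance are assumptions for this example.

```python
import math


def forward_weight(timestamp: float, landmark: float, alpha: float) -> float:
    """g(timestamp - landmark) with an exponential g (assumed choice of g)."""
    return math.exp(alpha * (timestamp - landmark))


def static_score(relevance: float, arrival: float, landmark: float, alpha: float) -> float:
    """Score fixed at arrival time; sufficient for ranking within any window."""
    return relevance * forward_weight(arrival, landmark, alpha)


def decayed_score(relevance: float, arrival: float, now: float,
                  landmark: float, alpha: float) -> float:
    """Fully normalised score at query time `now` (only needed for display)."""
    return static_score(relevance, arrival, landmark, alpha) / forward_weight(now, landmark, alpha)


LANDMARK, ALPHA = 0.0, 0.1
old = static_score(relevance=0.9, arrival=1.0, landmark=LANDMARK, alpha=ALPHA)
new = static_score(relevance=0.6, arrival=10.0, landmark=LANDMARK, alpha=ALPHA)
```

Here the newer but less relevant publication outranks the older one, and the order is the same at every later query time because both scores are divided by the same factor.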
  • 44. Event Diversity • In top-k publish/subscribe, • getting diverse results within the top-k publications plays a major role • As an example, Bob would like to be notified about smartphones with carrier = AT&T and brand = HTC. • Without the notion of diversity, the delivered top-k publications may be very similar to each other. • Even though the received publications are personalized, Bob may regard such a system as ineffective.
  • 45. Define Diversity: Taxonomy • Result diversification: dissimilarity, coverage, novelty • Discrete or continuous domain
  • 46. Dissimilarity • Choosing to deliver items that are dissimilar to each other • p-dispersion problem • Selecting k items out of n such that the average pairwise distance between the selected items is maximized • NP-hard • k-diversity problem • Is based on the p-dispersion problem • Relies on heuristics to solve large instances of the problem
  • 47. k-diversity problem • Let P be the set of matching publications, |P| = n. Given a distance metric d that expresses the dissimilarity between publication points, find the diverse set $S^*$ of P such that $S^* = \arg\max_{S \subseteq P,\ |S| = k} f(S, d)$
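Because the exact problem is NP-hard, large instances are usually handled with heuristics. The sketch below is one common greedy heuristic, shown only as an illustration: it takes f(S, d) to be the sum of pairwise distances, seeds the set with the farthest pair, and then repeatedly adds the item with the largest total distance to the items already chosen. It is not the MAXDIVREL algorithm, and `greedy_k_diverse` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_k_diverse(items: Sequence[T], k: int, dist: Callable[[T, T], float]) -> List[T]:
    """Greedy heuristic for picking k mutually distant items (no optimality guarantee)."""
    n = len(items)
    if k <= 0:
        return []
    if k >= n:
        return list(items)
    if k == 1:
        return [items[0]]
    # Seed with the two items that are farthest apart.
    i, j = max(combinations(range(n), 2),
               key=lambda ij: dist(items[ij[0]], items[ij[1]]))
    selected = [i, j]
    remaining = [c for c in range(n) if c not in (i, j)]
    while len(selected) < k:
        # Add the candidate with the largest summed distance to the selected set.
        best = max(remaining, key=lambda c: sum(dist(items[c], items[s]) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return [items[s] for s in selected]


points = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0), (5.0, 1.0)]
chosen = greedy_k_diverse(points, k=3, dist=math.dist)
```

On these four points the heuristic keeps the two extremes and the point far above the line between them, and drops (5, 1), which lies close to that line.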
  • 49. Not to reinvent the wheel • Most diversity definitions are aligned with the p-dispersion problem • Here, we combine diversity & relevancy in a mono-objective formulation • Not based on p-dispersion any more
  • 50. Beyond Diversity & Relevance • We select a diverse set which • increases the "global" importance of a selected publication, and • reduces the "global" importance of a non-selected publication. • We define the problem in a static version: • MAXDIVREL k-diversity problem • We define the problem in a continuous version: • MAXDIVREL continuous k-diversity problem
  • 53. MAXDIVREL k-diversity problem • Can be mapped to the top-k representative query problem in graph databases, which is NP-hard • Specialized version of the set cover problem • Can prove!
  • 55. MAXDIVREL Continuous k-diversity problem • Continuity requirements • Durability • an item selected as diversified in the i-th window may still be selected in the (i+1)-th window if it has not expired and the other valid items in the (i+1)-th window fail to compete with it. • Order • The publication stream follows chronological order • We avoid selecting an item j as diverse later when we have already selected an item i that is not older than j.
  • 58. MAXDIVREL continuous k-diversity problem • Apply the MAXDIVREL k-diversity greedy algorithm in each window • Time complexity • When re-calculating the neighborhood • We propose an incremental MAXDIVREL algorithm • Calculate the neighborhood at the (i+1)-th window using the neighborhood already calculated at the i-th window • Indexing publications at each window • Combine with subscription indexing • Dual-indexing mechanism!
  • 60. To Do List: Implementation • Indexing based on inverted-index • Why inverted index? • Centralized, will try Cloud Based • Using message broker system E.g. RabbitMQ, ZeroMQ, ActiveMQ • Why RabbitMQ?
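To make the inverted-index idea concrete, here is a minimal sketch under strong assumptions: only equality predicates are indexed, each `(attribute, value)` pair maps to the set of subscriptions containing it, and a publication's match degree for a subscription is the fraction of that subscription's predicates it satisfies. The class `InvertedIndex` and its methods are hypothetical and are not the planned implementation.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Set, Tuple


class InvertedIndex:
    """Posting lists keyed by (attribute, value); equality predicates only."""

    def __init__(self) -> None:
        self._postings: Dict[Tuple[str, Any], Set[str]] = defaultdict(set)
        self._sizes: Dict[str, int] = {}

    def add(self, sub_id: str, predicates: Dict[str, Any]) -> None:
        """Register a subscription given as {attribute: required value}."""
        self._sizes[sub_id] = len(predicates)
        for attr, value in predicates.items():
            self._postings[(attr, value)].add(sub_id)

    def match(self, publication: Dict[str, Any]) -> Dict[str, float]:
        """Return {subscription id: fraction of its predicates satisfied}."""
        hits: Counter = Counter()
        for attr, value in publication.items():
            for sub_id in self._postings.get((attr, value), ()):
                hits[sub_id] += 1
        return {sub_id: n / self._sizes[sub_id] for sub_id, n in hits.items()}


index = InvertedIndex()
index.add("s1", {"Carrier": "AT&T", "Brand": "HTC"})
index.add("s2", {"Carrier": "Verizon"})
matches = index.match({"Carrier": "AT&T", "Brand": "Apple"})
```

Only subscriptions that share at least one predicate with the publication are touched, which is the main appeal of an inverted index when the subscription space is large.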
  • 61. To Do List: Evaluation • Multiple directions • Zipf property • Using synthetic & real data sets (e.g. Zipf distribution tool, eBay, AOL query logs) • Algorithm efficiency • Experiment with • The volume of subscriptions • The variety of publications • The arrival rate of publications (e.g. dynamic sliding window model) • Using the POIKILO evaluation tool • Dual-indexing performance & scalability • Experiment with • Index construction time at each window • Memory cost • Query processing time (e.g. neighborhood calculation)
  • 62. Thank You! Your review will be Golden! Welcome to read the design chapters!

Editor's Notes

  1. Hence, it addresses the efficient processing of top-k queries over multiple data streams which filters out irrelevant data stream objects, and delivers only top-k objects relevant to user interests.
  2. Traditional: Users can only express their interest over a set of predicates or expressions in the subscription.