
Adaptive and Reflective Middleware (ARM) is the main forum for researchers on adaptive and reflective middleware platforms and systems. It was the first ever workshop to be held with the ACM/IFIP/USENIX International Middleware Conference, dating back to the year 2000, in Palisades, NY (Middleware 2000) and has been running every year since.

Authors:

Y. S. Horawalavithana

D. N. Ranasinghe

http://dl.acm.org/citation.cfm?id=2834975

Citation:

Y. S. Horawalavithana and D. N. Ranasinghe. 2015. An Efficient Incremental Indexing Mechanism for Extracting Top-k Representative Queries Over Continuous Data-streams. In Proceedings of the 14th International Workshop on Adaptive and Reflective Middleware (ARM 2015). ACM, New York, NY, USA, Article 8. DOI: http://dx.doi.org/10.1145/2834965.2834975



dominance condition can be well served, because the "winner" publication, as the most relevant publication in each bucket, can cover its neighborhood. Also, two buckets represent two separate neighborhoods, so all "winner" publications are dissimilar from each other by at least distance d. Hence the independence condition is also satisfied.
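The note above argues that the most relevant publication in a bucket can represent that bucket's neighborhood. The batch-wise "winner and vote" step of slide 15 can be sketched in Python as follows. This is a hypothetical illustration, not the authors' code: the table layout (a list of L dicts from bucket key to publication ids), reading "majority of votes" as "most buckets won", and breaking vote ties by relevance are all assumptions.

```python
from collections import Counter

def bucket_winners(tables, relevance):
    """Yield the winner of every non-empty bucket across all hash tables.

    tables:    list of L dicts; each maps a bucket key to the ids hashed there
               (hypothetical layout chosen for this sketch).
    relevance: dict mapping a publication id to its relevance score.
    """
    for table in tables:
        for members in table.values():
            if members:
                # The most relevant publication in the bucket "covers" it.
                yield max(members, key=lambda p: relevance[p])

def top_k_by_votes(tables, relevance, k):
    """Each bucket votes for its winner; rank by votes, then by relevance."""
    votes = Counter(bucket_winners(tables, relevance))
    ranked = sorted(votes, key=lambda p: (-votes[p], -relevance[p]))
    return ranked[:k]

# Toy example (made-up data): 2 hash tables, 4 publications.
relevance = {"A": 0.9, "B": 0.4, "C": 0.7, "D": 0.8}
tables = [
    {0: ["A", "B"], 1: ["C", "D"]},
    {0: ["A", "C"], 1: ["B", "D"]},
]
print(top_k_by_votes(tables, relevance, 2))  # ['A', 'D']
```

Here A and D each win one bucket per table, so both collect two votes and A ranks first on relevance.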

- 1. An Efficient Incremental Indexing Mechanism for Extracting Top-k Representative Queries over Continuous Data Streams. Y. S. Horawalavithana, D. N. Ranasinghe, University of Colombo School of Computing, Sri Lanka. Adaptive and Reflective Middleware (ARM), ACM/IFIP/USENIX Middleware, Vancouver, BC, Canada, December 08, 2015.
- 2. Overview • Motivation • Adaptive Diversification • Incremental Top-k • Evaluation • Conclusion • Future work
- 3.
- 4. Diversity: Top-k representative set. A method to retrieve the Top-k publications from the matching publications. Representative Top-k: the drawback without diversity vs. what we want with diversity. 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
- 5. Minimum independent-dominating set. Neighborhood of a publication within radius $\alpha$: $N(p_i) = \{ p_j \mid p_i \neq p_j,\ d(p_i, p_j) \le \alpha \}$. The publication space is modeled as a graph; example selections shown: independent and dominating (three cases), and dominating but not independent.
- 6. NAÏVE greedy: at each step select $\arg\max_{p_i} \; r(p_i)^2 \sum_{p_j \in N(p_i)} r(p_j) \times d(p_i, p_j)$.
- 7. Handling streaming publications. Continuity requirements: (1) Durability: an item selected as diversified in the $i$-th window may still be selected in the $(i+1)$-th window if it has not expired and the other valid items in the $(i+1)$-th window fail to compete with it. (2) Order: the publication stream follows chronological order, so we avoid later selecting an item $j$ as diverse when we have already selected an item $i$ that is not older than $j$.
- 8. Adaptive Diversification: over the matching publication stream $P_1, P_2, \ldots, P_j, P_{j+1}, \ldots$, consecutive windows $i$ and $i+1$ yield diverse sets $S_i^*$ and $S_{i+1}^*$ that must satisfy independence, dominance, durability and order. Straightforward solution: apply the naïve greedy method at each instance. Proposal: an incremental index mechanism that avoids the curse of re-calculating neighborhoods.
- 9. Locality Sensitive Hashing (LSH). Simple idea: if two points are close together, then after a "projection" operation these two points will remain close together.
- 10. LSH in Adaptive Diversification: publications as categorical data.
- 11. LSH in Adaptive Diversification: characteristic matrix.
- 12. LSH in Adaptive Diversification: minhashing. Publications are no longer used directly; a signature represents each one. Technique: randomly permute the rows of the characteristic matrix m times, and for each permutation take the number of the first row, in the permuted order, in which the publication's column has a 1 (figure: first permutation of the rows of the characteristic matrix). Advantage: reduces the dimensions to a small minhash signature.
- 13. LSH in Adaptive Diversification: signature matrix. Fast minhashing: select m random hash functions to model the effect of m random permutations; this is mathematically proven only when the number of rows is prime.
- 14. LSH in Adaptive Diversification: LSH buckets. Take r-sized signature vectors from the m-sized minhash signature and map them into L hash tables, each with an arbitrary number b of buckets.
- 15. LSH in Adaptive Diversification: batch-wise Top-k computation. Bucket "winner": the publication with the highest relevancy score in a bucket. The winner is dominant and represents its bucket neighborhood. The Top-k "winners" are those with a majority of votes, and the k winners are independent (figure: publications $P_A, \ldots, P_H$ of the $i$-th window hashed into buckets).
- 16. LSH in Dynamic Diversification: incremental Top-k computation. For a new publication $i$: (1) update the $i$-th characteristic vector in the characteristic matrix; (2) generate the $i$-th minhash signature in the signature matrix; (3) map the $i$-th signature into L hash tables; (4) update the "winner" at each bucket the $i$-th signature maps into; (5) vote for the Top-k candidates.
- 17. LSH in Dynamic Diversification: when a new publication F arrives, only buckets $B_{13}$, $B_{23}$, $B_{32}$ and $B_{43}$ will vote, following the continuity requirements (durability, order) (figure: publications $P_A, \ldots, P_H$ across the $i$-th and $(i+1)$-th windows).
- 18. LSH in Adaptive Diversification: analysis. For two vectors $x, y$: $\mathrm{JD}(x, y) = 1 - \mathrm{JSIM}(x, y)$, where $\mathrm{JSIM}(x, y) = \frac{|x \cap y|}{|x \cup y|}$. For publications $x$ and $y$: $\mathrm{JSIM}(x, y) \propto \Pr[H(x) = H(y)]$. At a particular hash table, $x$ and $y$ map into the same bucket with probability $\mathrm{JSIM}(x, y)^{b}$ and do not with probability $1 - \mathrm{JSIM}(x, y)^{b}$. Over $L$ hash tables, $x$ and $y$ never map into the same bucket with probability $(1 - \mathrm{JSIM}(x, y)^{b})^{L}$, so they share a bucket in at least one table with probability $1 - (1 - \mathrm{JSIM}(x, y)^{b})^{L}$. True near neighbors are therefore unlikely to be unlucky in all the projections.
- 19. Evaluation: Dataset. Publication stream: Amazon online marketplace data available from 17th to 19th November 2014. Zipfian subscriptions and normalized preferences, with $zipf(k; s, N) = \frac{1/k^{s}}{\sum_{n=1}^{N} 1/n^{s}}$, where N is the number of elements in the distribution, k is the rank of an element and s is the value of the exponent. $\text{Total number of subscriber views} = \sum_{i=2}^{32} \left( {}^{48}c_{i} + {}^{42}c_{i} + {}^{54}c_{i} + {}^{66}c_{i} + {}^{57}c_{i} + {}^{67}c_{i} \right)$
- 20. Terminology: ILSH, BLSH and NAÏVE (figure: publications $P_1, \ldots, P_8$, with BLSH or NAÏVE applied to each batch and ILSH applied across the stream).
- 21. Accuracy: ILSH vs. NAÏVE. Probability of producing the optimal diverse set of results by ILSH under Jaccard similarity threshold (s).
- 22. Performance & Efficiency: ILSH vs. BLSH vs. NAÏVE. log(Top-k matching time) against the number of publications, with D = 500.
- 23. Conclusions: the Locality Sensitive Hashing (LSH) indexing method produces a diverse set of results with 70% accuracy on average relative to the naïve method, and reduces the matching time very significantly compared with the NAÏVE method. Its incremental version further refines it for handling streaming publications, avoiding the curse of re-computing neighborhoods. Top-k restricts delivery to the top publications: given a window size and delivery method, the model can produce the best diverse set of personalized results to represent the set of all matching publications at a given instance.
- 24. Future work: explore other suitable use cases for the proposed model and develop prototype applications, e.g. a personalized newspaper for every Facebook user, or adaptive resource scheduling in large-scale distributed systems; exploit the overlap among the diversified results of users who have similar interests; develop an LSH-based index over a multi-threaded distributed environment.
- 25. Q&A. THANK YOU!
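Slide 5 defines the neighborhood $N(p_i)$ and the two conditions a representative set must meet. A small Python check of both conditions is sketched below, assuming a symmetric distance function and a 1-D toy publication space; the function names and data are hypothetical.

```python
def neighborhood(i, points, dist, alpha):
    """N(p_i) = { p_j | p_i != p_j, d(p_i, p_j) <= alpha }, as indices."""
    return {j for j in range(len(points))
            if j != i and dist(points[i], points[j]) <= alpha}

def is_independent(selected, points, dist, alpha):
    """No two selected publications are neighbors of each other."""
    return all(j not in neighborhood(i, points, dist, alpha)
               for i in selected for j in selected if i != j)

def is_dominating(selected, points, dist, alpha):
    """Every publication is selected or has a selected neighbor."""
    return all(i in selected or bool(neighborhood(i, points, dist, alpha) & selected)
               for i in range(len(points)))

def dist(a, b):
    # Toy distance on a 1-D publication space.
    return abs(a - b)

points = [0.0, 1.0, 2.0, 5.0, 6.0]
print(is_independent({1, 3}, points, dist, 1.0),
      is_dominating({1, 3}, points, dist, 1.0))  # True True
```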
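The NAÏVE greedy rule on slide 6 can be sketched as below. The score follows the slide's formula; the surrounding loop (pick the best not-yet-covered publication, then mark it and its neighborhood as covered, stopping at k picks or full coverage) is an assumed greedy scheme for independent dominating sets, not necessarily the authors' exact procedure. The $O(n^2)$ neighborhood pass is the recomputation cost that the incremental index on slide 8 aims to avoid.

```python
def naive_greedy(points, relevance, dist, alpha, k=None):
    """Greedy independent dominating set using the slide-6 score:

    score(p_i) = r(p_i)^2 * sum_{p_j in N(p_i)} r(p_j) * d(p_i, p_j)
    """
    n = len(points)
    # Pre-compute every neighborhood N(p_i) within radius alpha: O(n^2) distances.
    nbrs = [[j for j in range(n) if j != i and dist(points[i], points[j]) <= alpha]
            for i in range(n)]

    def score(i):
        return relevance[i] ** 2 * sum(relevance[j] * dist(points[i], points[j])
                                       for j in nbrs[i])

    uncovered, selected = set(range(n)), []
    while uncovered and (k is None or len(selected) < k):
        # Only uncovered items are eligible, which keeps the result independent.
        best = max(sorted(uncovered), key=score)
        selected.append(best)
        # The chosen item dominates itself and its whole neighborhood.
        uncovered -= {best, *nbrs[best]}
    return selected

def dist(a, b):
    return abs(a - b)

points = [0.0, 1.0, 2.0, 5.0, 6.0]        # toy 1-D publication space
relevance = [0.5, 0.9, 0.6, 0.8, 0.3]     # made-up relevance scores r(p_i)
print(naive_greedy(points, relevance, dist, alpha=1.0))  # [1, 3]
```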
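Slide 12's minhashing with explicit row permutations might look like this in Python. Each publication's column of the characteristic matrix is represented as the set of rows holding a 1; the attribute labels are hypothetical. The fraction of agreeing signature positions is the usual estimate of Jaccard similarity.

```python
import random

def minhash_signatures(columns, rows, m, seed=0):
    """Minhash signatures via m explicit random row permutations.

    columns: one set per publication, holding the rows (attribute values)
             where its column of the characteristic matrix has a 1.
    rows:    all row labels of the characteristic matrix.
    """
    rng = random.Random(seed)
    order = sorted(rows)  # fixed starting order so results are reproducible
    signatures = [[] for _ in columns]
    for _ in range(m):
        rng.shuffle(order)
        position = {row: idx for idx, row in enumerate(order)}
        for sig, col in zip(signatures, columns):
            # Index of the first row, in permuted order, where this column has a 1.
            sig.append(min(position[row] for row in col))
    return signatures

def estimated_jaccard(sig_x, sig_y):
    """Fraction of agreeing signature positions estimates JSIM(x, y)."""
    return sum(a == b for a, b in zip(sig_x, sig_y)) / len(sig_x)

columns = [{"book", "new"}, {"book", "new"}, {"dvd"}]
rows = {"book", "new", "dvd", "used"}
sigs = minhash_signatures(columns, rows, m=20)
print(estimated_jaccard(sigs[0], sigs[1]))  # 1.0
```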
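Slide 13 replaces explicit permutations with m random hash functions. A common choice, assumed here rather than taken from the slides, is $h(x) = (a x + b) \bmod p$ with $p$ prime and $a \neq 0$: for a prime modulus this map is one-to-one on the rows, which is one way to see why the prime condition matters. These hashes only approximate truly random permutations.

```python
import math
import random

def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

def next_prime(n):
    while not is_prime(n):
        n += 1
    return n

def fast_minhash(columns, n_rows, m, seed=0):
    """Minhash signatures using m random hash functions instead of permutations.

    Rows are the integers 0..n_rows-1. Each hash is h(x) = (a*x + b) mod p with a
    prime p >= n_rows and a != 0, so h is one-to-one on the rows and acts as a
    pseudo-random reordering of them. Every column must be non-empty.
    """
    p = next_prime(n_rows)
    rng = random.Random(seed)
    hashes = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(m)]
    return [[min((a * x + b) % p for x in col) for a, b in hashes]
            for col in columns]

sigs = fast_minhash([{0, 2, 5}, {0, 2, 5}, {1, 3}], n_rows=7, m=16)
```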
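Slide 14 maps r-sized pieces of each m-sized signature into L hash tables with b buckets. The sketch below assumes the pieces are disjoint consecutive slices (so $m \ge r \cdot L$) and uses Python's built-in `hash` on a tuple of ints, modulo b, as the bucket function; both choices are illustrative assumptions.

```python
from collections import defaultdict

def lsh_tables(signatures, r, L, b):
    """Map each minhash signature into L hash tables of b buckets each.

    Table t keys on the r-sized slice signature[t*r:(t+1)*r] (assumption: the
    slices are disjoint, so m >= r*L), hashed into one of b buckets.
    """
    tables = [defaultdict(list) for _ in range(L)]
    for pub_id, sig in enumerate(signatures):
        if len(sig) < r * L:
            raise ValueError("signature shorter than r * L")
        for t in range(L):
            band = tuple(sig[t * r:(t + 1) * r])
            # hash() of a tuple of ints is stable across runs (no string salting).
            tables[t][hash(band) % b].append(pub_id)
    return tables

sigs = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 9, 9]]
tables = lsh_tables(sigs, r=2, L=2, b=1000)
```

Publications whose slices match exactly always share a bucket in that table; distinct slices can still collide when b is small.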
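The incremental steps of slides 16 and 17 (new publication, signature, L buckets, winner update, votes) can be sketched as a small class. This is a hypothetical illustration, not ILSH itself: the count-based window, re-electing a winner only inside the affected bucket when its winner expires, the strict `>` that keeps an earlier winner on ties, and the modulus $2^{31}-1$ are all assumptions. What it does show is that each arrival or expiry touches only L buckets, with no neighborhood recomputation.

```python
import random
from collections import Counter, defaultdict, deque

class IncrementalLSHIndex:
    """Hypothetical count-based sliding-window index with per-bucket winners."""

    def __init__(self, r, L, b, window, p=2_147_483_647, seed=0):
        rng = random.Random(seed)
        # p = 2^31 - 1 is prime; attribute ids are assumed to be ints below p.
        self.hashes = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(r * L)]
        self.p, self.r, self.L, self.b, self.window = p, r, L, b, window
        self.buckets = [defaultdict(set) for _ in range(L)]  # key -> live ids
        self.winner = [{} for _ in range(L)]                 # key -> winning id
        self.votes = Counter()                               # id -> buckets won
        self.live = deque()                                  # (id, keys), oldest first
        self.relevance = {}

    def _keys(self, attrs):
        # Fast-minhash signature of length r*L, then one bucket key per table.
        sig = [min((a * x + c) % self.p for x in attrs) for a, c in self.hashes]
        return [hash(tuple(sig[t * self.r:(t + 1) * self.r])) % self.b
                for t in range(self.L)]

    def _set_winner(self, t, key, new):
        old = self.winner[t].get(key)
        if old == new:
            return
        if old is not None:
            self.votes[old] -= 1
            if self.votes[old] == 0:
                del self.votes[old]
        if new is None:
            del self.winner[t][key]
        else:
            self.winner[t][key] = new
            self.votes[new] += 1

    def add(self, pub_id, attrs, relevance):
        """Index one new publication; only the L buckets it maps into change."""
        keys = self._keys(attrs)
        self.relevance[pub_id] = relevance
        self.live.append((pub_id, keys))
        for t, key in enumerate(keys):
            self.buckets[t][key].add(pub_id)
            current = self.winner[t].get(key)
            # Strict '>' keeps the earlier winner on ties.
            if current is None or relevance > self.relevance[current]:
                self._set_winner(t, key, pub_id)
        if len(self.live) > self.window:
            self._expire_oldest()

    def _expire_oldest(self):
        old_id, keys = self.live.popleft()
        for t, key in enumerate(keys):
            members = self.buckets[t][key]
            members.discard(old_id)
            if not members:
                del self.buckets[t][key]
            if self.winner[t].get(key) == old_id:
                # Re-elect only inside this bucket; no neighborhood recomputation.
                best = max(members, key=self.relevance.__getitem__, default=None)
                self._set_winner(t, key, best)
        del self.relevance[old_id]

    def top_k(self, k):
        """Live publications ranked by buckets won, then by relevance."""
        return sorted(self.votes, key=lambda p: (-self.votes[p], -self.relevance[p]))[:k]

idx = IncrementalLSHIndex(r=2, L=3, b=50, window=3)
idx.add("A", {1, 2, 3}, 0.5)
idx.add("B", {1, 2, 3}, 0.9)
print(idx.top_k(1))  # ['B']
for pid, attrs, rel in [("C", {7, 8, 9}, 0.1), ("D", {7, 8, 9}, 0.2), ("E", {4, 5}, 0.3)]:
    idx.add(pid, attrs, rel)
print(idx.top_k(3))
```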
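Slide 18's Jaccard measures and the "unlucky in all projections" argument can be evaluated numerically. The slide writes the per-table exponent as b; this sketch uses r, the number of signature rows each table keys on, as in the standard LSH banding analysis, which is an assumption about the intended notation.

```python
def jaccard_similarity(x, y):
    """JSIM(x, y) = |x ∩ y| / |x ∪ y| for two non-empty sets."""
    return len(x & y) / len(x | y)

def jaccard_distance(x, y):
    """JD(x, y) = 1 - JSIM(x, y)."""
    return 1 - jaccard_similarity(x, y)

def collision_probability(s, r, L):
    """Probability that two items with similarity s share a bucket in at least
    one of L tables, when each table keys on r minhash rows: 1 - (1 - s**r)**L."""
    return 1 - (1 - s ** r) ** L

for s in (0.2, 0.5, 0.8):
    print(s, round(collision_probability(s, r=4, L=10), 4))
```

With r = 4 and L = 10, pairs at similarity 0.8 collide somewhere with probability above 0.99, while pairs at 0.2 rarely do, which is the S-shaped filtering the slide relies on.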
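The Zipf formula on slide 19 is straightforward to compute. Below is a minimal sketch of the probability mass function, plus a hypothetical helper that draws ranks from it (for instance to assign subscriptions). How the authors mapped ranks to subscriptions is not stated on the slide, so that use is an assumption.

```python
import random

def zipf_pmf(k, s, N):
    """zipf(k; s, N) = (1 / k**s) / sum_{n=1}^{N} (1 / n**s), for rank 1 <= k <= N."""
    if not 1 <= k <= N:
        raise ValueError("rank k must lie in 1..N")
    return (1 / k ** s) / sum(1 / n ** s for n in range(1, N + 1))

def sample_ranks(s, N, count, seed=0):
    """Draw `count` ranks from the Zipf distribution (hypothetical helper)."""
    rng = random.Random(seed)
    weights = [zipf_pmf(k, s, N) for k in range(1, N + 1)]
    return rng.choices(range(1, N + 1), weights=weights, k=count)

print(zipf_pmf(1, s=1, N=2))  # 0.666..., i.e. 1 / (1 + 1/2)
```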
