
[ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation

Presentation given at the ARM workshop, ACM/IFIP/USENIX Middleware 2015.

Adaptive and Reflective Middleware (ARM) is the main forum for researchers on adaptive and reflective middleware platforms and systems. It was the first workshop ever held with the ACM/IFIP/USENIX International Middleware Conference, dating back to 2000 in Palisades, NY (Middleware 2000), and it has run every year since.

Authors:
Y.S.Horawalavithana
D.N.Ranasinghe

http://dl.acm.org/citation.cfm?id=2834975

Citation:
Y. S. Horawalavithana and D. N. Ranasinghe. 2015. An Efficient Incremental Indexing Mechanism for Extracting Top-k Representative Queries Over Continuous Data-streams. In Proceedings of the 14th International Workshop on Adaptive and Reflective Middleware (ARM 2015). ACM, New York, NY, USA, Article 8. DOI=http://dx.doi.org/10.1145/2834965.2834975



  1. An Efficient incremental indexing mechanism for extracting Top-k representative queries over continuous data streams. Y.S. Horawalavithana, D.N. Ranasinghe, University of Colombo School of Computing, Sri Lanka. Adaptive and Reflective Middleware (ARM), ACM/IFIP/USENIX Middleware, Vancouver, BC, Canada, December 08, 2015.
  2. Overview • Motivation • Adaptive Diversification • Incremental Top-k • Evaluation • Conclusion • Future work
  3. [Figure-only slide; no text recovered]
  4. Diversity: Top-k representative set. Goal: a method to retrieve the Top-k publications from the set of matching publications. (Figure: the drawback of a representative Top-k without diversity, versus what we want, with diversity.)
  5. Minimum independent-dominating set. Publications p1 to p5 in the publication space are modeled as graph vertices v1 to v5, with neighborhood radius α: $N(p_i) = \{p_j \mid p_i \neq p_j,\ d(p_i, p_j) \le \alpha\}$. (Figure: candidate vertex subsets of the graph model; three are independent and dominating, one is dominating but not independent.)
  6. NAÏVE greedy: select $\arg\max_{p_i} \frac{r(p_i)^2}{\sum_{p_j \in N(p_i)} r(p_j) \times d(p_i, p_j)}$
  7. Handling streaming publications. (Figure: a new publication p6 arrives and adds vertex v6 to the graph over p1 to p5.) Continuity requirements: (1) Durability: an item selected as diversified in the i-th window may still be selected in the (i+1)-th window if it has not expired and the other valid items in the (i+1)-th window fail to compete with it. (2) Order: the publication stream follows chronological order, so we avoid later selecting an item j as diverse when we have already selected an item i that is not older than j.
  8. Adaptive Diversification. Over the matching publication stream P1, P2, P3, P4, …, Pj, Pj+1, …, the diverse sets $S_i^*$ and $S_{i+1}^*$ of the i-th and (i+1)-th windows must satisfy independence, dominance, durability and order. The straightforward solution is to apply the naïve greedy method at each instance; instead, we propose an incremental index mechanism that avoids the curse of recalculating neighborhoods.
  9. Locality Sensitive Hashing (LSH). Simple idea: if two points are close together, then after a "projection" operation they remain close together.
  10. LSH in Adaptive Diversification: publications as categorical data.
  11. LSH in Adaptive Diversification: characteristic matrix.
  12. LSH in Adaptive Diversification: minhashing. Publications are no longer used directly; a signature represents each one. Technique: randomly permute the rows of the characteristic matrix m times, and for each permutation take the number of the first row, in the permuted order, in which the publication's column has a 1. (Figure: first permutation of the rows of the characteristic matrix.) Advantage: the dimensions are reduced to a small minhash signature.
  13. LSH in Adaptive Diversification: signature matrix. Fast minhashing: select m random hash functions to model the effect of m random permutations. This is mathematically justified only when the number of rows is prime.
  14. LSH in Adaptive Diversification: LSH buckets. Take r-sized signature vectors from the m-sized minhash signature and map them into L hash tables, each with an arbitrary number b of buckets.
  15. LSH in Adaptive Diversification: batch-wise Top-k computation. A bucket "winner" is the publication with the highest relevance score in that bucket; the winner is dominant and represents its bucket neighborhood. The Top-k are the k winners with the most votes, and these k winners are independent. (Figure: publications P_A to P_H of the i-th window hashed into buckets.)
  16. LSH in Dynamic Diversification: incremental Top-k computation. For a new publication i: update the i-th characteristic vector (characteristic matrix) → generate the i-th minhash signature (signature matrix) → map the i-th signature into the L hash tables → update the "winner" of each bucket the i-th signature maps into → vote for the Top-k candidates.
  17. LSH in Dynamic Diversification: when a new publication F arrives, only buckets B13, B23, B32 and B43 vote, following the continuity requirements (durability and order). (Figure: publications P_A to P_H across the i-th and (i+1)-th windows.)
  18. LSH in Adaptive Diversification: analysis. For two vectors x, y: $JD(x, y) = 1 - JSIM(x, y)$, where $JSIM(x, y) = \frac{|x \cap y|}{|x \cup y|}$. For publications x and y and a minhash function h built from a random permutation, $\Pr[h(x) = h(y)] = JSIM(x, y)$. At a particular hash table keyed on r signature rows, x and y map into the same bucket with probability $JSIM(x, y)^r$ and into different buckets with probability $1 - JSIM(x, y)^r$. Across L hash tables, x and y map into the same bucket in none of the tables with probability $(1 - JSIM(x, y)^r)^L$, and in at least one table with probability $1 - (1 - JSIM(x, y)^r)^L$. True near neighbors are therefore unlikely to be unlucky in all the projections.
  19. 19. Publication Stream  Zipfian subscriptions  Normalized preferences 19 Evaluation: Dataset Amazon on-line market place data available at 17th – 19th November 2014 𝑧𝑖𝑝𝑓 𝑘: 𝑠, 𝑁 = 1 𝑘 𝑠 𝑛=1 𝑁 ( 1 𝑛 𝑠) N - number of elements in distribution, k - rank of element s - value of exponent 𝑇𝑜𝑡𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑠𝑢𝑏𝑐𝑟𝑖𝑏𝑒𝑟 𝑣𝑖𝑒𝑤𝑠 = 𝑖=2 32 48 𝑐 𝑖 + 42 𝑐 𝑖 + 54 𝑐 𝑖 + 66 𝑐 𝑖 + 57 𝑐 𝑖 + 67 𝑐 𝑖
  20. Terminology: ILSH, BLSH and NAÏVE. (Figure: the publication stream P1 to P8, annotated with where BLSH or NAÏVE and ILSH are applied.)
  21. Accuracy: ILSH vs. NAÏVE. Probability of producing the optimal diverse set of results by ILSH under Jaccard similarity threshold s.
  22. Performance & efficiency: ILSH vs. BLSH vs. NAÏVE. log(Top-k matching time) versus the number of publications, with D = 500.
  23. Conclusions. The Locality Sensitive Hashing (LSH) indexing method produces a diverse set of results with 70% accuracy on average relative to the naïve method, and it reduces matching time significantly compared with the NAÏVE method. Its incremental version refines this further for handling streaming publications, avoiding the curse of recomputing neighborhoods. Top-k restricts delivery to the top publications. Given a window size and delivery method, the model can produce the best diverse set of personalized results to represent the set of all matching publications at a given instance.
  24. Future work. Explore other suitable use cases for the proposed model and develop prototype applications, e.g., a personalized newspaper for every Facebook user, or adaptive resource scheduling in large-scale distributed systems. Exploit the overlap among diversified results of users who have similar interests. Develop an LSH-based index for multi-threaded, distributed environments.
  25. Q&A. Thank you!
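Slide 5 defines the α-neighborhood and the two properties a representative set must have. They can be checked directly with a brute-force sketch like the one below. This is an illustration, not the authors' code. It assumes publications are hashable values and that the distance function d and the radius α are supplied by the caller. The helper names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, Iterable, List, Set


def neighborhoods(items: List[Hashable], dist: Callable[[Hashable, Hashable], float],
                  alpha: float) -> Dict[Hashable, Set[Hashable]]:
    """N(p_i) = {p_j | p_i != p_j, d(p_i, p_j) <= alpha}, built by brute force."""
    nbr: Dict[Hashable, Set[Hashable]] = {p: set() for p in items}
    for a, b in combinations(items, 2):
        if dist(a, b) <= alpha:
            nbr[a].add(b)
            nbr[b].add(a)
    return nbr


def is_independent(subset: Iterable[Hashable], nbr: Dict[Hashable, Set[Hashable]]) -> bool:
    """No two chosen publications are neighbors of each other."""
    return all(b not in nbr[a] for a, b in combinations(list(subset), 2))


def is_dominating(subset: Iterable[Hashable], nbr: Dict[Hashable, Set[Hashable]]) -> bool:
    """Every publication is either chosen or has a chosen neighbor."""
    chosen = set(subset)
    return all(p in chosen or bool(nbr[p] & chosen) for p in nbr)
```

For example, with toy 1-D publications `[0, 1, 2, 10, 11]`, `dist = lambda a, b: abs(a - b)` and α = 1.5, the subset `{1, 10}` is both independent and dominating. The subset `{0, 10}` is not dominating, because publication 2 has no chosen neighbor.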
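The NAÏVE greedy of slide 6 can be sketched as follows. Two points here are assumptions. First, the score reads r(p_i)^2 / Σ_{p_j∈N(p_i)} r(p_j) × d(p_i, p_j), which is our reading of a garbled slide. Second, each step picks only among publications not yet covered by an earlier pick. That rule keeps the result independent, and the loop stops once everything is covered, so the result is also dominating. `naive_greedy` and its `rel` argument (publication → relevance r(p)) are hypothetical names.

```python
import math
from itertools import combinations


def neighborhoods(items, dist, alpha):
    """N(p_i) = {p_j | p_i != p_j, d(p_i, p_j) <= alpha}, built by brute force."""
    nbr = {p: set() for p in items}
    for a, b in combinations(items, 2):
        if dist(a, b) <= alpha:
            nbr[a].add(b)
            nbr[b].add(a)
    return nbr


def naive_greedy(items, rel, dist, alpha):
    """Greedy independent-dominating set; rel maps each publication to r(p)."""
    nbr = neighborhoods(items, dist, alpha)

    def score(p):
        # Assumed slide-6 score: r(p)^2 / sum_{q in N(p)} r(q) * d(p, q)
        denom = sum(rel[q] * dist(p, q) for q in nbr[p])
        # A zero denominator (e.g. an isolated publication, which every
        # dominating set must contain) is ranked first.
        return math.inf if denom == 0 else rel[p] ** 2 / denom

    scores = {p: score(p) for p in items}
    uncovered = set(items)
    selected = []
    while uncovered:
        # Only uncovered items are eligible, so the chosen set stays independent;
        # max() keeps the earliest item in `items` on ties.
        best = max((p for p in items if p in uncovered), key=scores.__getitem__)
        selected.append(best)
        uncovered -= {best} | nbr[best]  # best dominates itself and its neighborhood
    return selected
```

With `items = [0, 1, 2, 10, 11]`, relevances `{0: 0.5, 1: 0.9, 2: 0.4, 10: 0.3, 11: 0.8}`, absolute-difference distance and α = 1.5, the sketch returns `[11, 1]`. Publication 11 is picked first because it is far more relevant than its only neighbor. Recomputing neighborhoods like this at every window is the cost that slide 8 calls the curse of recalculating neighborhoods.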
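The two continuity requirements on slide 7 can be written as simple predicates. This sketch makes several assumptions. Windows are time-based and cover (now − width, now]. Each publication carries an integer arrival time. The slide does not define what it means for an item to "compete", so the `beats` predicate is left to the caller. `Pub`, `in_window`, `respects_order` and `is_durable` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Pub:
    pid: str
    ts: int      # arrival time; the stream is in chronological order
    rel: float   # relevance score r(p)


def in_window(p: Pub, now: int, width: int) -> bool:
    """A publication is valid while it lies in the sliding window (now - width, now]."""
    return now - width < p.ts <= now


def respects_order(candidate: Pub, delivered: Iterable[Pub]) -> bool:
    """Order: do not select j once an item i that is not older than j was selected."""
    return all(i.ts < candidate.ts for i in delivered)


def is_durable(prev: Pub, window_items: Iterable[Pub], now: int, width: int,
               beats: Callable[[Pub, Pub], bool]) -> bool:
    """Durability: a previously selected item stays selected in the next window if it
    has not expired and no other valid item in that window beats it."""
    if not in_window(prev, now, width):
        return False
    return not any(beats(q, prev) for q in window_items
                   if q != prev and in_window(q, now, width))
```

Passing `beats` in as an argument keeps the window and order logic separate from the competition rule, which the slides leave unspecified.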
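Slides 10 to 12 turn categorical publications into a characteristic matrix and then into minhash signatures using explicit row permutations. In this sketch each publication is assumed to be a set of hypothetical "attribute=value" tokens. Each signature entry is the 0-based position, in permuted order, of the first row holding a 1. The function names are illustrative.

```python
import random
from typing import List, Sequence, Set, Tuple


def characteristic_matrix(pubs: Sequence[Set[str]]) -> Tuple[List[str], List[List[int]]]:
    """Rows = every token seen; columns = publications; 1 if the publication has the token."""
    universe = sorted(set().union(*pubs))
    rows = [[1 if tok in p else 0 for p in pubs] for tok in universe]
    return universe, rows


def minhash_by_permutation(rows: List[List[int]], m: int, seed: int = 0) -> List[List[int]]:
    """Signature matrix (m x n_pubs): for each of m random row permutations, record the
    position (in permuted order) of the first row with a 1 in each publication's column.
    Assumes every publication has at least one token."""
    rng = random.Random(seed)
    n_rows, n_cols = len(rows), len(rows[0])
    sig = []
    for _ in range(m):
        order = list(range(n_rows))
        rng.shuffle(order)  # order[pos] = original row placed at position pos
        sig.append([next(pos for pos, r in enumerate(order) if rows[r][c])
                    for c in range(n_cols)])
    return sig
```

Identical publications always get identical signature columns. Publications with no token in common can never agree on any entry, because their first 1-rows are always different rows.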
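Slide 13 replaces explicit permutations with m random hash functions and notes that the row count must be prime. The slide does not name a hash family. This sketch assumes the common textbook choice h(x) = (a·x + b) mod p. When p is prime and 1 ≤ a < p, this map is a bijection on {0, …, p − 1}, so it really is a permutation of row indices. The row count is padded up to the next prime. The padded rows are conceptually all zeros, so they never change a column's minimum.

```python
import random
from typing import List


def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def fast_minhash(rows: List[List[int]], m: int, seed: int = 0) -> List[List[int]]:
    """One pass over the characteristic matrix; m hash functions
    h_k(x) = (a_k * x + b_k) mod p stand in for m random row permutations."""
    n_rows, n_cols = len(rows), len(rows[0])
    p = n_rows
    while not is_prime(p):  # pad (conceptually, with all-zero rows) up to a prime row count
        p += 1
    rng = random.Random(seed)
    coeffs = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(m)]
    # p acts as "infinity" (every hash value is < p); it remains only for empty columns
    sig = [[p] * n_cols for _ in range(m)]
    for x, row in enumerate(rows):
        hashes = [(a * x + b) % p for a, b in coeffs]
        for c, bit in enumerate(row):
            if bit:
                for k, h in enumerate(hashes):
                    if h < sig[k][c]:
                        sig[k][c] = h
    return sig
```

This makes a single pass over the rows and never materializes a permutation, which is what makes it cheaper than the version on slide 12.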
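Slide 14 maps r-sized pieces of each signature into L hash tables of b buckets. This sketch makes three assumptions. The pieces are consecutive, non-overlapping slices, so m ≥ r·L. Python's built-in `hash` of an integer tuple serves as the bucket hash, since the slide does not specify one. Signatures arrive as a dict from a hypothetical publication id to its minhash list.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def build_lsh_tables(signatures: Dict[str, Sequence[int]], r: int, L: int, b: int):
    """Hash the l-th r-sized slice of each signature into one of b buckets of table l."""
    tables: List[Dict[int, List[str]]] = [defaultdict(list) for _ in range(L)]
    for pid, sig in signatures.items():
        if len(sig) < r * L:
            raise ValueError(f"signature of {pid} is shorter than r * L")
        for l in range(L):
            band = tuple(sig[l * r:(l + 1) * r])
            tables[l][hash(band) % b].append(pid)
    return tables


def candidate_neighbors(tables, pid: str) -> Set[str]:
    """Publications that share a bucket with pid in at least one table."""
    out: Set[str] = set()
    for table in tables:
        for bucket in table.values():
            if pid in bucket:
                out.update(bucket)
    out.discard(pid)
    return out
```

Two publications whose slices agree in any one table become candidates. Unrelated publications can also share a bucket when b is small, which is why b is a tuning knob.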
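Slides 15 to 17 keep one "winner" per bucket and rank winners by votes. When a publication arrives, only the buckets it maps into are touched. The class below is a hypothetical sketch of that bookkeeping and not the authors' implementation. It assumes precomputed minhash signatures, unique publication ids and the same slicing as slide 14. It treats "votes" as the number of buckets a publication currently wins. Expiring a publication (for example, when it leaves the window) likewise refreshes only its own L buckets.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


class IncrementalLSHTopK:
    """One winner (highest relevance) per bucket plus a running vote count.
    insert/expire touch only the L buckets the publication maps into."""

    def __init__(self, r: int, L: int, b: int):
        self.r, self.L, self.b = r, L, b
        self.members: List[Dict[int, Dict[str, float]]] = [defaultdict(dict) for _ in range(L)]
        self.winner: List[Dict[int, str]] = [{} for _ in range(L)]
        self.votes: Counter = Counter()         # pid -> number of buckets it wins
        self.where: Dict[str, List[int]] = {}   # pid -> its bucket id in each table

    def _bucket_ids(self, sig: Sequence[int]) -> List[int]:
        return [hash(tuple(sig[l * self.r:(l + 1) * self.r])) % self.b for l in range(self.L)]

    def _refresh(self, l: int, bucket: int) -> None:
        """Recompute one bucket's winner and move its vote if the winner changed."""
        mem = self.members[l][bucket]
        old = self.winner[l].get(bucket)
        new = max(mem, key=mem.get) if mem else None
        if not mem:
            del self.members[l][bucket]
        if new == old:
            return
        if old is not None:
            self.votes[old] -= 1
            if self.votes[old] == 0:
                del self.votes[old]
        if new is None:
            del self.winner[l][bucket]
        else:
            self.winner[l][bucket] = new
            self.votes[new] += 1

    def insert(self, pid: str, sig: Sequence[int], rel: float) -> None:
        self.where[pid] = self._bucket_ids(sig)
        for l, bucket in enumerate(self.where[pid]):
            self.members[l][bucket][pid] = rel
            self._refresh(l, bucket)

    def expire(self, pid: str) -> None:
        for l, bucket in enumerate(self.where.pop(pid)):
            del self.members[l][bucket][pid]
            self._refresh(l, bucket)

    def top_k(self, k: int) -> List[str]:
        """The k winners holding the most bucket votes."""
        return [pid for pid, _ in self.votes.most_common(k)]
```

Each insert or expire costs O(L) bucket refreshes plus the size of those buckets, rather than a fresh neighborhood computation over the whole window.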
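The collision probability on slide 18 is easy to tabulate. This sketch evaluates 1 − (1 − s^r)^L for a pair with Jaccard similarity s, using L tables that are each keyed on r signature rows. It ignores accidental collisions between different slices that hash to the same bucket. The function names are illustrative.

```python
def jaccard(x: set, y: set) -> float:
    """JSIM(x, y) = |x ∩ y| / |x ∪ y| (taken as 1.0 for two empty sets)."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def collision_probability(s: float, r: int, L: int) -> float:
    """Probability that a pair with similarity s shares a bucket in at least one
    of L tables, each keyed on r signature rows: 1 - (1 - s**r)**L."""
    return 1.0 - (1.0 - s ** r) ** L


if __name__ == "__main__":
    for s in (0.2, 0.4, 0.6, 0.8):
        print(s, round(collision_probability(s, r=5, L=20), 4))
```

The curve is S-shaped. With r = 5 and L = 20, a pair at s = 0.8 collides with probability above 0.999, while a pair at s = 0.2 collides with probability below 0.01. The steep part sits near the commonly quoted threshold (1/L)^(1/r) ≈ 0.55.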
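The Zipf distribution on slide 19 can be evaluated and sampled directly. In this sketch, the use of sampled ranks to pick which subscription a simulated subscriber holds is a hypothetical reading of "Zipfian subscriptions". The slides do not describe the generator.

```python
import random
from typing import List


def zipf_pmf(k: int, s: float, N: int) -> float:
    """zipf(k; s, N) = (1 / k**s) / sum_{n=1..N} (1 / n**s)."""
    norm = sum(1.0 / n ** s for n in range(1, N + 1))
    return (1.0 / k ** s) / norm


def sample_zipf_ranks(count: int, s: float, N: int, seed: int = 0) -> List[int]:
    """Draw `count` ranks in 1..N with Zipfian popularity."""
    rng = random.Random(seed)
    ranks = list(range(1, N + 1))
    weights = [zipf_pmf(k, s, N) for k in ranks]
    return rng.choices(ranks, weights=weights, k=count)
```

For s = 1 and N = 3, the top-ranked element has probability 1 / (1 + 1/2 + 1/3) = 6/11.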
