
DEVIEW 2014 [2B1] A Paradigm Shift in Search Engines

Published in: Technology

[2B1] A Paradigm Shift in Search Engines

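  1. A Paradigm Shift in Search Engines: The Convergence of Big Data Analytics and Search. 강재우 (Jaewoo Kang), Department of Computer Science, College of Informatics, Korea University
  2. Research background: changing user information needs. The Web 2.0 era of participation, sharing, and openness has shifted information production and consumption toward a user-centered structure. The volume of personal opinions and subjective information on the web and SNS has exploded. Demand is growing for subjective information such as "a good Korean restaurant in Bundang for a formal meeting between two families", "a thriller with a good twist", and "a handbag that is in fashion". • Demand for factual search (e.g., 'action movie') is flat or irregular, whereas subjective queries such as 'best action movie' and 'best SUV' are rising steadily. Figure: Google search trend graph for the queries 'action movie' and 'best action movie' (Google Trends, http://www.google.com/trends/)
  3. Aardvark: Large-Scale Social Search Engine (Horowitz and Kamvar, WWW2010). "64% of queries contain subjective element in Aardvark" (e.g., "Do you know of any great delis in Baltimore, MD?" "What are the things/crafts/toys your children have made that made them really proud of themselves?"). Acquired by Google in 2010 for $50,000,000 USD (about KRW 53 billion). Factual search vs. consensus search: demand for consensus search is growing.
  4. Search engine vs. consensus engine. Limits of conventional document-based search engines: objective information (e.g., 'action movies' or 'handbag prices') can be found with today's search engines, but they cannot respond adequately to subjective queries ('fun action movies', 'handbags in fashion these days'). This calls for a fundamental paradigm shift to a new search technology that identifies the objects described within documents, treats those objects as the targets of indexing, and aggregates and analyzes the user opinions scattered across many documents, object by object, in order to rank them.
  5. Products are sorted only by simple criteria: • lowest price • highest price • registration date • most product reviews. User reviews are simply listed one after another: • the content is hard to grasp • the information is hard to synthesize. Option selection is complicated. The result list carries no useful information beyond the TV's screen size in inches and its price.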
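  6. Worked example: searching TV reviews for "TV with a good price-performance ratio".
     Reviews of the LG전자 47LM6200:
     • Purchase review | 2013.04.12: I hesitated a lot about buying an expensive electronics product online. Now that it is installed, I am very satisfied. The screen is big, the picture is good, and I feel good about having bought it cheaply.
     • A TV with powerful performance for the price. | 2013.04.01: The product is an entry-level model at a low price. It has powerful features such as internet and 3D, and the product reviews I read here and there were all satisfied, so I bought it with confidence. I think I got a good product at a reasonable price. Thank you.
     • An excellent choice... LG Smart TV 47LM6200... | 2012.09.10: In particular, the remote control functions and the 3D glasses are far more convenient to use than company S's. The 3D glasses are much more comfortable than other companies' battery-powered ones, and the clip-on type, convenient for people who wear glasses, is a standout idea.
     • Clean picture quality and wall-mount installation: good. Delivery delayed by product supply. | 2012.07.02: The picture is clean, and above all I am satisfied that it was installed very well as a wall mount.
     • Not bad. | 2013.04.19: For the price, this seems decent. However, the mouse remote is a mixed blessing. A smart TV certainly needs one, but its sensitivity makes it quite awkward. The regular remote is ultra-simple, so simple that it is hard to operate. Apart from the remote system, it is not bad.
     Reviews of the 삼성전자 UN50ES6800F:
     • Truly the best product and service. | 2013.07.31: I ordered yesterday and never expected delivery this fast! The delivery technician also installed it just the way I wanted. Above all, I want to say it is the best product for the price. Satisfied with everything!
     • Satisfied with the fair price. | 2012.12.18: I am satisfied to have bought it at a very good price. As a Samsung smart TV, its performance and appearance are no different from what I saw in department stores, and I am satisfied. I have been using it for about two weeks and am satisfied with both features and appearance.
     • The best-value model for the price | 2013.03.21: I ordered in the evening and it was delivered the next morning! I bought the wall-mount version; it is big and should be great for watching movies. Good picture quality, good size, and lightning-fast delivery!
     Search results for the query:
     • LG 47LM: "The product is an entry-level model at a low price"; "A very good choice for the price."; "I think it is a 3D smart LED TV with excellent performance for the price."; "The screen is big, the picture is good, and I feel good about having bought it cheaply."
     • 삼성 UN50: "Above all, I want to say it is the best product for the price"; "I am satisfied to have bought it at a very good price"; "Good size and picture quality for the price."
     Aspect segments matched to the query terms: "bought cheaply", "best for the price", "low price", "excellent performance for the price", "good size and picture quality for the price", "very good price", "decent for the price", "a very good choice for the price".
     Segment scores: 0.5, 0.8, 0.9, 0.7, 0.5, 0.8, 0.7, 0.6.
     Samsung total: 2.9. LG total: 2.6.
     Final search ranking: 1. 삼성 UN50ES6800F 2. LG 47LM6200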
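  7. Consensus Search. Users now actively use internet search to make decisions about purchases and cultural activities. Before attending a performance or buying a product, they consult other users' reviews and write-ups. Each review is written from its author's "subjective opinion", so a user has to read as many reviews as possible for them to help with a decision. What is a consensus engine? A search system that analyzes in advance the many reviews other users have already written, then analyzes and synthesizes those reviews from the perspective (query) the user wants.
  8. Consensus Engine. Today's search engines are not enough! The top few documents may contain the desired information, but each document is only its author's opinion and cannot represent the consensus of the public. Yet the answer already exists on the Web! Many users post their opinions online in various forms (SNS, blogs, reviews). These online opinions can be treated as "latent votes". Collecting and analyzing the opinions already expressed online at query time makes consensus search possible.
  9. Uhm.. Yeah.. It is noisy, but… Online consumer posts: the 2nd most trusted form of advertising (The Nielsen Company, Q3 2011)
  10. Is consensus search ever possible…? "Best Action Movies in 2013" is not immediately answerable with conventional search engines, because the answer should be based on consensus, which cannot be found in any single one of the "top-10" documents. However, the answers are already on the Web: numerous implicit votes from people on the Web and social networks. Only if we can process them … ONLINE!
  11. CONSENTO Overview
  12. CONSENTO Overview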
  13. The Key Ideas (I): Subdocument-level Indexing. Capture the semantics of user opinions more precisely. The indexing unit is no longer a page but: • a review within a page, if more than one review exists on the page, • or a sentence within a review, • or even a clause or phrase within a sentence that discusses one aspect of the target entity. Maximal Coherent Semantic Unit (MCSU): • the finest-grained indexing unit used in CONSENTO indexing • a maximal subsequence of words within a sentence that carries a single coherent meaning. Indexing MCSUs instead of documents enables semantic analysis to be performed at indexing time, • which facilitates the online processing of consensus search at query time.
  14. The Key Ideas (II): ConsensusRank, a unique ranking method based on public sentiment. Virtually all existing ranking methods rank target objects (either documents or entities) directly, based on their relevance to the query terms. In contrast, ConsensusRank ranks entities indirectly, by aggregating the scores of the referring segments (e.g., MCSUs) that match the query context. It can be viewed as a voting process in which each reviewer casts a weighted vote on an entity with respect to a query by expressing positive or negative opinions about that entity.
  15. CONSENTO Architecture.
     (A) Indexing Subsystem: Web Documents → 1. Parsing & Preprocessing (DOM-tree Parsing, Contents Extraction) → Review Contents → 2. Contents Segmentation (Sentence Splitter, MCSU Extraction) → MCSUs → 3. Indexing (Inverted Entry Construction & Indexing) → Entity Search Index
     (B) Searching Subsystem: User Query → 4. Query Parsing (Query Preprocessing & Expansion) → Expanded Query → 5. Retrieval (Matching MCSU Retrieval) → MCSU Posting List → 6. Ranking (Segment Grouping, Score Aggregation) → Entity List
  16. I: Parsing & Preprocessing. The current working prototype of CONSENTO is built for the movie domain. CONSENTO crawled review pages from popular movie review sites such as IMDb and Metacritic. Review contents are extracted using DOM-tree parsing and XPath queries. The extracted information includes: the entity name (i.e., movie name); the review text, date, and time; and the review quality (e.g., "20 out of 30 people found the review helpful").
  17. II: Contents Segmentation. Split the review contents into MCSUs, e.g., "The storyline is ridiculous, the acting is laughable, and the camera work is terrible." becomes s1) "The storyline is ridiculous" s2) "the acting is laughable" s3) "the camera work is terrible"
  18. II: Contents Segmentation
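The segmentation step on slide 17 can be approximated with a very small rule-based splitter. This is only a sketch under a strong assumption: the slides do not say how CONSENTO detects MCSU boundaries, so the rule used here (split at commas or semicolons, dropping a following "and", "but", or "or") and the function name `split_into_segments` are hypothetical.

```python
import re

# Hypothetical boundary rule (not from the slides): a comma or semicolon,
# optionally followed by a coordinating conjunction, ends one segment.
_BOUNDARY = re.compile(r"\s*[,;]\s*(?:(?:and|but|or)\s+)?", re.IGNORECASE)


def split_into_segments(sentence):
    """Split one sentence into clause-like segments (a rough stand-in for MCSUs)."""
    body = sentence.strip().rstrip(".!?")
    return [part.strip() for part in _BOUNDARY.split(body) if part.strip()]


segments = split_into_segments(
    "The storyline is ridiculous, the acting is laughable, "
    "and the camera work is terrible."
)
for number, segment in enumerate(segments, start=1):
    print(f"s{number}) {segment}")
```

A rule this naive would also cut an enumeration such as "great cast, crew, and music" into fragments, which is presumably why a real extractor needs syntactic or semantic analysis rather than punctuation alone.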
  19. III: Indexing. CONSENTO indexes MCSUs in a conventional inverted index of the kind used in most modern search engines. Only the mapping needs to be redefined logically, from (terms → documents) to (terms → MCSUs).
  20. III: Indexing. Conventional indexing method example. Document #1, about the entity Transformer 3: "excellent visual effects, but plot was hard to follow". Feature 1 is "visual effects" (sentiment: "excellent"); Feature 2 is "plot" (sentiment: "hard to follow"). Bag of words for Doc#1: excellent, visual, effects, plot, hard, follow.
     Traditional inverted index:
     Term       Doc
     excellent  #1
     hard       #1
     follow     #1
     plot       #1
     visual     #1
     effects    #1
     Query: "excellent plot". The system returns this document.
  21. III: Indexing. Subdocument-level indexing example. "excellent visual effects, but plot was hard to follow" is split into Segment 1 ("excellent visual effects") and Segment 2 ("plot was hard to follow").
     SegmentID  ObjectName     Feature         Sentiment
     Segment 1  Transformer 3  visual effects  excellent
     Segment 2  Transformer 3  plot            hard to follow
     Sub-document level index:
     Term       SegmentID  ObjectName     Feature         Sentiment
     excellent  SID1       Transformer 3  visual effects  excellent
     visual     SID1       Transformer 3  visual effects  excellent
     effect     SID1       Transformer 3  visual effects  excellent
     plot       SID2       Transformer 3  plot            hard
     hard       SID2       Transformer 3  plot            hard
     follow     SID2       Transformer 3  plot            hard
     Query: "excellent plot" does not match any segment.
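The contrast between slides 20 and 21 can be reproduced with a toy inverted index. The sketch below is an illustration rather than CONSENTO's implementation: the tokenizer, the small stop-word set, and the Boolean AND matching in `search_all_terms` are assumptions chosen to keep the example short, and unlike the slide it does no stemming.

```python
from collections import defaultdict

# Assumed stop words, just enough for this one sentence.
STOPWORDS = {"but", "was", "to"}


def tokenize(text):
    """Lowercase, strip simple punctuation, and drop stop words."""
    tokens = (raw.strip(",.").lower() for raw in text.split())
    return [tok for tok in tokens if tok and tok not in STOPWORDS]


def build_index(units):
    """Map each term to the ids of the units (documents or MCSUs) containing it."""
    index = defaultdict(set)
    for unit_id, text in units.items():
        for term in tokenize(text):
            index[term].add(unit_id)
    return index


def search_all_terms(index, query):
    """Return the ids of units that contain every query term (Boolean AND)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


documents = {"Doc#1": "excellent visual effects, but plot was hard to follow"}
segments = {"SID1": "excellent visual effects", "SID2": "plot was hard to follow"}

doc_index = build_index(documents)
seg_index = build_index(segments)

print(search_all_terms(doc_index, "excellent plot"))  # the whole document matches
print(search_all_terms(seg_index, "excellent plot"))  # no single segment matches
```

Only the unit being indexed changes between the two calls to `build_index`, which mirrors the point on slide 19 that the mapping is redefined from (terms → documents) to (terms → MCSUs).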
  22. III: Indexing. An MCSU is simply treated as a document. Additional information is stored in each posting for use in the ranking stage. Figure: MCSU posting structure.
  23. MCSU index example.
     Reviews (rid = review ID, ts = timestamp, rq = review quality):
     rid  ts   rq
     r1   ts1  0.8
     r2   ts2  0.4
     r3   ts3  0.6
     r4   ts4  0.9
     r5   ts5  0.4
     r6   ts6  0.5
     r7   ts7  0.7
     r8   ts8  0.6
     r9   ts9  0.8
     Site Name   Source ID
     IMDb        w1
     Flixster    w2
     Metacritic  w3
     Yahoo!      w4
     Feature      id
     music        a1
     soundtrack   a2
     story        a3
     plot         a4
     performance  a5
     acting       a6
     Sentiword  id
     great      m1
     excellent  m2
     superb     m3
     tragic     m4
     Entity              id
     Titanic             e1
     Brokeback Mountain  e2
     Dark Knight         e3
     Avatar              e4
     Term: Postings
     Cameron: <s19, e4, [−], [m3], r7, w3>
     Pandora: <s16, e4, [a2], [−], r6, w3>, <s18, e4, [−], [−], r6, w3>
     tragic: <s7, e2, [a3], [m4], r3, w1>
     performance: <s5, e1, [a6], [m6], r2, w1>, <s9, e2, [a6], [m3], r3, w1>, <s11, e2, [a6], [m1], r4, w1>, <s13, e3, [a6], [−], r5, w2>, <s15, e4, [a6], [−], r5, w3>, <s20, e3, [a6], [−], r8, w4>, <s21, e3, [a6], [m6], r9, w4>
     soundtrack: <s4, e1, [a2], [−], r2, w1>, <s10, e2, [a2], [m2], r4, w1>, <s16, e4, [a2], [−], r6, w2>, <s22, e3, [a2], [m1], r9, w4>
     plot: <s14, e3, [a4], [−], r5, w2>
     acting: <s13, e4, [a6], [−], r9, w4>,
     music: <s2, e1, [a1], [m1], r1, w1>, <s8, e2, [a1], [m1], r3, w1>
     Yeston: <s2, e1, [a1], [−], r1, w1>,
     story: <s1, e1, [a3], [m1], r1, w1>, <s7, e2, [a3], [−], r3, w1>, <s12, e2, [a3], [m2], r4, w1>, <s17, e4, [a3], [−], r6, w3>
     Source reviews, with segment boundaries marked by //:
     Titanic, r1: (s1) the greatest love stories of all // (s2) and beautiful music from Yeston. // (s3) Everything about this movie was excellent...
     Titanic, r2: (s4) touching soundtrack, // (s5) and perfect handling of the known tragedy with nice performance. // (s6) This has the best love scene I have ever seen…
     Brokeback Mountain, r3: (s7) beautiful tragic love story, // (s8) with great music. // (s9) superb performances in movies ever!
     Brokeback Mountain, r4: (s10) The soundtrack is also excellent, // (s11) great performance, // (s12) excellent presentation of a love story…
     The Dark Knight, r5: (s13) The performance by Heath Ledger was outstanding // (s14) and plot is amazing too…
     Avatar, r6: (s15) Navi looks very real, good performance, // (s16) beautiful soundtrack that emphasize the vastness of the Pandora, // (s17) with love story. // (s18) The world of Pandora is stunning
     Avatar, r7: (s19) James Cameron deserves high praise for this creation…
     The Dark Knight, r8: (s20) Joker shows phenomenally awesome performance!…
     The Dark Knight, r9: (s21) nice performance // (s22) and backed up with great soundtrack. // (s23) excellent casting!
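The posting tuples on slide 23 read as <segment, entity, [features], [sentiwords], review, source>. A minimal sketch of that structure follows; the class name `Posting`, its field names, and the helper `entities_for_term` are assumptions made for illustration, and only a few postings and lookup entries are copied from the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Posting:
    """One MCSU posting; field names are assumed, the tuple order follows the slide."""
    segment_id: str
    entity_id: str
    feature_ids: Tuple[str, ...]
    sentiword_ids: Tuple[str, ...]
    review_id: str
    source_id: str


# Lookup tables copied (in part) from the slide.
ENTITY_NAMES: Dict[str, str] = {
    "e1": "Titanic",
    "e2": "Brokeback Mountain",
    "e3": "Dark Knight",
    "e4": "Avatar",
}
REVIEW_QUALITY: Dict[str, float] = {"r1": 0.8, "r3": 0.6, "r5": 0.4}

# term -> posting list, i.e. the (terms → MCSUs) mapping.
INDEX: Dict[str, List[Posting]] = {
    "tragic": [Posting("s7", "e2", ("a3",), ("m4",), "r3", "w1")],
    "plot": [Posting("s14", "e3", ("a4",), (), "r5", "w2")],
    "music": [
        Posting("s2", "e1", ("a1",), ("m1",), "r1", "w1"),
        Posting("s8", "e2", ("a1",), ("m1",), "r3", "w1"),
    ],
}


def entities_for_term(term: str) -> List[str]:
    """Names of the entities that have at least one segment containing the term."""
    return sorted({ENTITY_NAMES[p.entity_id] for p in INDEX.get(term, [])})


print(entities_for_term("music"))
```

Because each posting carries its review ID, the ranking stage can look up per-review data such as `REVIEW_QUALITY[posting.review_id]` without touching the original document.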
  24. IV: Query Parsing. CONSENTO preprocesses the query and performs query expansion: stop-word removal, polarity-only word removal, feature expansion, and stemming. Polarity-only word removal: "good action movie" and "great action movie" should be treated as the same query. Feature words are expanded for better recall: 'plot' → {plot, story}, 'music' → {music, soundtrack}.
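The four query-parsing steps on slide 24 can be chained as shown below. This is a sketch under stated assumptions: the word lists, the crude plural-stripping `naive_stem`, and the choice to stem before expanding are all hypothetical; only the two feature expansions come from the slide.

```python
# Assumed word lists; only FEATURE_EXPANSIONS is taken from the slide.
STOPWORDS = {"a", "an", "the", "of", "in", "for", "with"}
POLARITY_ONLY = {"good", "great", "best", "nice", "excellent"}
FEATURE_EXPANSIONS = {
    "plot": {"plot", "story"},
    "music": {"music", "soundtrack"},
}


def naive_stem(token):
    """Crude stand-in for a real stemmer: strip one plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def parse_query(query):
    """Return one sorted list of alternative terms per remaining query word."""
    parsed = []
    for token in query.lower().split():
        if token in STOPWORDS or token in POLARITY_ONLY:
            continue  # stop-word and polarity-only word removal
        stem = naive_stem(token)
        parsed.append(sorted(FEATURE_EXPANSIONS.get(stem, {stem})))
    return parsed


print(parse_query("good action movie"))
print(parse_query("great action movie"))
print(parse_query("movies with great plot"))
```

Dropping polarity-only words is what makes the first two queries identical after parsing, as the slide requires; the sentiment itself is presumably handled later, in the ranking stage.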
  25. V: Retrieval. Retrieve the MCSU segments that match the query terms, in the same way that conventional systems retrieve document posting lists.
  26. VI: Ranking. Group the MCSU postings by entity and aggregate the scores of the postings to compute the score of the corresponding entity.
  27. VI: Ranking
  28. VI: Ranking
  29. VI: Ranking
  30. VI: Ranking
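The grouping and aggregation described on slide 26 can be sketched as a weighted vote count. The actual ConsensusRank scoring formula was on slides whose content is not in this transcript, so the weighting below (polarity times review quality, summed per entity) is an assumption, and the input tuples are made-up illustration data rather than results from the talk.

```python
from collections import defaultdict


def consensus_rank(matched_segments):
    """Sum signed, quality-weighted votes per entity and sort by total score.

    Each item is (entity, polarity, review_quality): polarity is +1 or -1 and
    review_quality is a weight in [0, 1]. This weighting is an assumption.
    """
    scores = defaultdict(float)
    for entity, polarity, review_quality in matched_segments:
        scores[entity] += polarity * review_quality
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical segments retrieved for one query.
matched = [
    ("Titanic", +1, 0.8),
    ("Titanic", +1, 0.4),
    ("Brokeback Mountain", +1, 0.9),
    ("The Dark Knight", +1, 0.8),
    ("The Dark Knight", -1, 0.4),
    ("Avatar", +1, 0.5),
]

ranking = consensus_rank(matched)
for entity, score in ranking:
    print(f"{entity}: {score:.1f}")
```

The entity, not the segment or the document, is what gets ranked, which is the "indirect" ranking that slide 14 contrasts with relevance-based methods.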
  31. Experimental Setup: Data Set. Movie data set: sources • Amazon, IMDb, Metacritic, Flixster, Rotten Tomatoes, and Yahoo Movies; period • 2008 to 2010; more than 740 movies and 30K reviews. Hotel data set: the hotel data set from Ganesan and Zhai, with reviews of hotels in 10 major cities from TripAdvisor. The authors provided us with the corrected judgment set for our test.
  32. Experiment. Methods: Ganesan and Zhai's OE and QAM methods • OE: opinion expansion words • QAM: query aspect model. Baselines: 1) BM25 • b = 0.75 • k1 = 2 2) VSMBM (Lucene default) • Vector space model + Boolean model 3) ConsensusRank
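For reference, the BM25 baseline with the parameters listed on slide 32 is commonly written as below. The slides give only b and k1; the exact IDF variant used in the experiment is not stated, so it is left abstract here. In this standard Okapi form, f(q_i, D) is the frequency of term q_i in document D, |D| is the document length, and avgdl is the average document length in the collection.

```latex
\mathrm{score}(D, Q) = \sum_{q_i \in Q} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1\left(1 - b + b\,\dfrac{|D|}{\mathrm{avgdl}}\right)},
\qquad k_1 = 2,\quad b = 0.75
```

With b = 0.75 most of the length normalization is applied, and k1 = 2 lets repeated occurrences of a term keep adding to the score somewhat longer than smaller k1 values would.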
  33. Experimental Result: Movie
  34. Experimental Result: Hotel
  35. Hawaii, Cebu, Gold Coast
  36. Hawaii! (Figure labels: Honeymoon, Snorkeling, Whale Watching, Active Volcano)
  37. 1. Analyze and index in advance the diverse information on the web and social networks. 2. Produce real-time results for ad-hoc decision-making queries. Example queries: "Thriller movies?", "Thriller movies with a twist?", "Backpacks for college students?", "Trustworthy used-car dealers?", "Trustworthy daycare centers nearby", "Hair salons for job-interview makeup", "Good places to study near the academy", "Korean restaurants in Gangnam for a formal meeting between two families", "Accommodation for backpacking trips", "Trainers in my neighborhood who are good at PT?"
  38. best thriller with plot twist
  39. The Artist vs. Jack and Jill
  40. good pizza restaurant
  41. Click!
  42. (figure)
  43. CONSENTO Local service example
  44. CONSENTO Local service example
  45. 'Napk-In' service example
  46. 'Napk-In' service example
  47. '슝' service example
  48. The latent consensus search market: factual search and consensus search
  49. ENGINEERING KNOWLEDGE, SEARCHING WISDOM
  50. CONSENTO
  51. THANK YOU
