Mining the social web ch3

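  1. 1. Mining the Social Web. NAVER community "People Who Dream of Becoming Architects" (아키텍트를 꿈꾸는 사람들). Presenter: 김연기
  2. 2. Mail Boxes: Who sends mail? Is there a time of day when replies arrive? Who sends mail most often? What are the hot topics these days?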
  3. 3. Mbox
     From santa@northpole.example.org Fri Dec 25 00:06:42 2009
     Message-ID: <16159836.1075855377439@mail.northpole.example.org>
     References: <88364590.8837464573838@mail.northpole.example.org>
     In-Reply-To: <194756537.0293874783209@mail.northpole.example.org>
     Date: Fri, 25 Dec 2001 00:06:42 -0000 (GMT)
     From: St. Nick <santa@northpole.example.org>
     To: rudolph@northpole.example.org
     Subject: RE: FWD: Tonight
     Mime-Version: 1.0
     Content-Type: text/plain; charset=us-ascii
     Content-Transfer-Encoding: 7bit

     Sounds good. See you at the usual location.

     Thanks,
     -S

     -----Original Message-----
     From: Rudolph
     Sent: Friday, December 25, 2009 12:04 AM
     To: Claus, Santa
     Subject: FWD: Tonight

     Santa -

     Running a bit late. Will come grab you shortly. Standby.

     Rudy

     Begin forwarded message:

     > Last batch of toys was just loaded onto sleigh.
     >
     > Please proceed per the norm.
     >
     > Regards,
     > Buddy
     >
     > --
     > Buddy the Elf
     > Chief Elf
     > Workshop Operations
     > North Pole
     > buddy.the.elf@northpole.example.org

     From buddy.the.elf@northpole.example.org Fri Dec 25 00:03:34 2009
     Message-ID: <88364590.8837464573838@mail.northpole.example.org>
     Date: Fri, 25 Dec 2001 00:03:34 -0000 (GMT)
     From: Buddy <buddy.the.elf@northpole.example.org>
     To: workshop@northpole.example.org
     Subject: Tonight
     Mime-Version: 1.0
     Content-Type: text/plain; charset=us-ascii
     Content-Transfer-Encoding: 7bit

     Last batch of toys was just loaded onto sleigh.

     Please proceed per the norm.

     Regards,
     Buddy

     --
     Buddy the Elf
     Chief Elf
     Workshop Operations
     North Pole
     buddy.the.elf@northpole.example.org
  6. 6. Mbox (converted to JSON)
     [
       {
         "From": "St. Nick <santa@northpole.example.org>",
         "Content-Transfer-Encoding": "7bit",
         "To": ["rudolph@northpole.example.org"],
         "parts": [
           {
             "content": "Sounds good. See you at the usual location.\n\nThanks,...",
             "contentType": "text/plain"
           }
         ],
         "References": "<88364590.8837464573838@mail.northpole.example.org>",
         "Mime-Version": "1.0",
         "In-Reply-To": "<194756537.0293874783209@mail.northpole.example.org>",
         "Date": "Fri, 25 Dec 2001 00:06:42 -0000 (GMT)",
         "Message-ID": "<16159836.1075855377439@mail.northpole.example.org>",
         "Content-Type": "text/plain; charset=us-ascii",
         "Subject": "RE: FWD: Tonight"
       },
       {
         "From": "Buddy <buddy.the.elf@northpole.example.org>",
         "Content-Transfer-Encoding": "7bit",
         "To": ["workshop@northpole.example.org"],
         "parts": [
           {
             "content": "Last batch of toys was just loaded onto sleigh. \n\n...",
             "contentType": "text/plain"
           }
         ],
         "Mime-Version": "1.0",
         "Date": "Fri, 25 Dec 2001 00:03:34 -0000 (GMT)",
         "Message-ID": "<88364590.8837464573838@mail.northpole.example.org>",
         "Content-Type": "text/plain; charset=us-ascii",
         "Subject": "Tonight"
       }
     ]
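The slides show the mbox input and the JSON output but not the conversion step. A minimal sketch of one way to do it, assuming Python's standard-library `mailbox` module; the helper names `message_to_dict` and `mbox_to_docs` and the abridged sample data are illustrative, not taken from the slides:

```python
import json
import mailbox
import os
import tempfile

# Two short messages in mbox format, abridged from the example on the slide.
SAMPLE_MBOX = """\
From santa@northpole.example.org Fri Dec 25 00:06:42 2009
Message-ID: <16159836.1075855377439@mail.northpole.example.org>
Date: Fri, 25 Dec 2001 00:06:42 -0000 (GMT)
From: St. Nick <santa@northpole.example.org>
To: rudolph@northpole.example.org
Subject: RE: FWD: Tonight
Content-Type: text/plain; charset=us-ascii

Sounds good. See you at the usual location.

From buddy.the.elf@northpole.example.org Fri Dec 25 00:03:34 2009
Message-ID: <88364590.8837464573838@mail.northpole.example.org>
Date: Fri, 25 Dec 2001 00:03:34 -0000 (GMT)
From: Buddy <buddy.the.elf@northpole.example.org>
To: workshop@northpole.example.org
Subject: Tonight
Content-Type: text/plain; charset=us-ascii

Last batch of toys was just loaded onto sleigh.
"""


def message_to_dict(msg):
    """Flatten one message into a JSON-serializable dict (hypothetical helper)."""
    # Repeated headers would overwrite each other here; fine for a sketch.
    doc = {name: value for name, value in msg.items()}
    # Store recipients as a list, as the JSON on the slide does.
    if "To" in doc:
        doc["To"] = [addr.strip() for addr in doc["To"].split(",")]
    # Keep only the text parts of the body.
    doc["parts"] = [
        {"content": part.get_payload(), "contentType": part.get_content_type()}
        for part in msg.walk()
        if part.get_content_maintype() == "text"
    ]
    return doc


def mbox_to_docs(path):
    """Read every message in an mbox file and return a list of dicts."""
    box = mailbox.mbox(path)
    try:
        return [message_to_dict(msg) for msg in box]
    finally:
        box.close()


# mailbox.mbox reads from a path, so write the sample to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.mbox")
    with open(sample_path, "w", newline="\n") as handle:
        handle.write(SAMPLE_MBOX)
    docs = mbox_to_docs(sample_path)

print(json.dumps(docs, indent=2))
```

The resulting list can be saved with `json.dump` and used as the input file for the loading script.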
  7. 7. Mbox + CouchDB: Storing the messages in a database makes it possible to compute statistics over them. CouchDB provides a JSON API.
  8. 8. CouchDB: a document-oriented database server; provides a JSON API; views; schema-free.
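To make the "JSON API" point concrete, here is a hedged sketch that only builds the HTTP requests CouchDB expects, without sending them. It assumes a server at `http://localhost:5984`; the three helper function names are made up for illustration, while the endpoints themselves (`PUT /{db}`, `POST /{db}/_bulk_docs`, `GET /{db}/_design/{ddoc}/_view/{view}`) are part of CouchDB's HTTP interface:

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

COUCH_URL = "http://localhost:5984"  # assumed local server on the default port


def create_db_request(db_name):
    """PUT /{db} creates a database."""
    return Request(f"{COUCH_URL}/{quote(db_name, safe='')}", method="PUT")


def bulk_insert_request(db_name, docs):
    """POST /{db}/_bulk_docs stores many JSON documents in one call."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return Request(
        f"{COUCH_URL}/{quote(db_name, safe='')}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def view_query_request(db_name, design_doc, view_name, **params):
    """GET a view; query values such as startkey are JSON-encoded."""
    query = urlencode({name: json.dumps(value) for name, value in params.items()})
    url = (
        f"{COUCH_URL}/{quote(db_name, safe='')}"
        f"/_design/{quote(design_doc, safe='')}/_view/{quote(view_name, safe='')}"
    )
    return Request(url + ("?" + query if query else ""), method="GET")


# Schema-free: the two documents do not need to share the same fields.
docs = [
    {"Subject": "Tonight", "Date": "Fri, 25 Dec 2001 00:03:34 -0000 (GMT)"},
    {"Subject": "RE: FWD: Tonight", "In-Reply-To": "<194756537.0293874783209@mail.northpole.example.org>"},
]

create_req = create_db_request("enron")
insert_req = bulk_insert_request("enron", docs)
query_req = view_query_request("enron", "index", "by_date_time", startkey=[2001, 12, 25])

for req in (create_req, insert_req, query_req):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it; that step needs a running server, and a server that rejects anonymous requests would also need credentials.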
  9. 9. couchDBInstall couchdb on centOS yum install couchdb /etc/init.d/couchdb start
  10. 10. CouchDB + Python
      Install the CouchDB kit (on CentOS):
      curl -O http://peak.telecommunity.com/dist/ez_setup.py
      (see http://pypi.python.org/pypi/setuptools#rpm-based-systems)
      $ sudo python ez_setup.py -U setuptools
      Python CouchDB API: http://packages.python.org/CouchDB
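  11. 11. CouchDB + Python
      # -*- coding: utf-8 -*-
      import sys
      import os
      import couchdb
      try:
          import jsonlib2 as json
      except ImportError:
          import json

      JSON_MBOX = sys.argv[1]  # i.e. enron.mbox.json
      # Name the database after the input file, e.g. "enron"
      DB = os.path.basename(JSON_MBOX).split('.')[0]

      # Connect to the local CouchDB server and create the database
      server = couchdb.Server('http://localhost:5984')
      db = server.create(DB)

      # Load all messages and store them in a single bulk update
      docs = json.loads(open(JSON_MBOX).read())
      db.update(docs, all_or_nothing=True)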
  12. 12. CouchDB – Views
      from couchdb.design import ViewDefinition

      def dateTimeToDocMapper(doc):
          # Note that you need to include imports used by your mapper
          # inside the function definition
          from dateutil.parser import parse
          from datetime import datetime as dt
          if doc.get('Date'):
              # [year, month, day, hour, min, sec]
              _date = list(dt.timetuple(parse(doc['Date']))[:-3])
              yield (_date, doc)

      # Specify an index to back the query. Note that the index won't be
      # created until the first time the query is run
      # (db is the database handle created by the loading script)
      view = ViewDefinition('index', 'by_date_time', dateTimeToDocMapper,
                            language='python')
      view.sync(db)
  13. 13. CouchDB – Map/Reduce
      from couchdb.design import ViewDefinition

      def dateTimeCountMapper(doc):
          from dateutil.parser import parse
          from datetime import datetime as dt
          if doc.get('Date'):
              # Emit 1 per message, keyed by [year, month, day, hour, min, sec]
              _date = list(dt.timetuple(parse(doc['Date']))[:-3])
              yield (_date, 1)

      def summingReducer(keys, values, rereduce):
          # Add up the emitted 1s to count the messages per key
          return sum(values)

      view = ViewDefinition('index', 'doc_count_by_date_time',
                            dateTimeCountMapper, reduce_fun=summingReducer,
                            language='python')
      view.sync(db)
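The mapper and reducer above run inside CouchDB. To see what they compute without a server, the following sketch simulates the same map, group, and reduce steps in plain Python. It is an illustration only: `run_map_reduce` is a hypothetical helper, its `group_level` argument imitates CouchDB's query parameter of the same name, and the standard library's `parsedate_to_datetime` stands in for `dateutil` so the example has no third-party dependencies.

```python
from collections import defaultdict
from email.utils import parsedate_to_datetime


def date_time_count_mapper(doc):
    """Emit ([year, month, day, hour, min, sec], 1) for each dated document."""
    if doc.get("Date"):
        parsed = parsedate_to_datetime(doc["Date"])
        yield (list(parsed.timetuple()[:6]), 1)


def summing_reducer(keys, values, rereduce):
    """Sum the emitted counts; the same code works for reduce and rereduce."""
    return sum(values)


def run_map_reduce(docs, mapper, reducer, group_level):
    """Group emitted rows by the first `group_level` key elements, then reduce."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[tuple(key[:group_level])].append(value)
    return {
        key: reducer(None, values, False)
        for key, values in sorted(groups.items())
    }


docs = [
    {"Date": "Fri, 25 Dec 2001 00:06:42 -0000 (GMT)", "Subject": "RE: FWD: Tonight"},
    {"Date": "Fri, 25 Dec 2001 00:03:34 -0000 (GMT)", "Subject": "Tonight"},
    {"Subject": "No Date header, so the mapper skips this one"},
]

# Messages per hour: both dated messages fall in the same hour.
per_hour = run_map_reduce(docs, date_time_count_mapper, summing_reducer, group_level=4)
# Messages per second: each dated message gets its own key.
per_second = run_map_reduce(docs, date_time_count_mapper, summing_reducer, group_level=6)

print(per_hour)    # {(2001, 12, 25, 0): 2}
print(per_second)  # {(2001, 12, 25, 0, 3, 34): 1, (2001, 12, 25, 0, 6, 42): 1}
```

Because the key is a list ordered from year down to second, truncating it at different lengths gives counts per year, month, day, or hour from the same view.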
  14. 14. CouchDB – Lucene: a Java-based search engine library.
  15. 15. Look Who’s Talking: (1) Query couchdb-lucene for the IDs of the messages that match a search term. (2) Find every mail that carries one of those message IDs. (3) From those mails, extract the unique email addresses.
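The three steps can be sketched on in-memory data. Everything here is an assumption made for illustration: the `MESSAGES` list stands in for the CouchDB store, a plain substring match stands in for the couchdb-lucene query, and step 2 is read as "messages that have, or refer to, one of the matching IDs". The third sample message and all function names are hypothetical.

```python
from email.utils import parseaddr

# Hypothetical in-memory stand-in for the messages stored in CouchDB.
MESSAGES = [
    {
        "Message-ID": "<88364590.8837464573838@mail.northpole.example.org>",
        "From": "Buddy <buddy.the.elf@northpole.example.org>",
        "To": ["workshop@northpole.example.org"],
        "Subject": "Tonight",
        "body": "Last batch of toys was just loaded onto sleigh.",
    },
    {
        "Message-ID": "<16159836.1075855377439@mail.northpole.example.org>",
        "References": ["<88364590.8837464573838@mail.northpole.example.org>"],
        "From": "St. Nick <santa@northpole.example.org>",
        "To": ["rudolph@northpole.example.org"],
        "Subject": "RE: FWD: Tonight",
        "body": "Sounds good. See you at the usual location.",
    },
    {
        # Unrelated message, invented for this example.
        "Message-ID": "<1@mail.northpole.example.org>",
        "From": "comet@northpole.example.org",
        "To": ["vixen@northpole.example.org"],
        "Subject": "Lunch",
        "body": "Carrots again?",
    },
]


def search_message_ids(term, messages):
    """Step 1: IDs of messages whose subject or body contains the term."""
    term = term.lower()
    return [
        msg["Message-ID"]
        for msg in messages
        if term in (msg["Subject"] + " " + msg["body"]).lower()
    ]


def find_related_messages(message_ids, messages):
    """Step 2: messages that have, or refer to, one of the given IDs."""
    wanted = set(message_ids)
    related = []
    for msg in messages:
        ids = {msg["Message-ID"]} | set(msg.get("References", []))
        if ids & wanted:
            related.append(msg)
    return related


def unique_addresses(messages):
    """Step 3: sorted unique sender and recipient addresses."""
    addresses = set()
    for msg in messages:
        for value in [msg.get("From", "")] + msg.get("To", []):
            address = parseaddr(value)[1]
            if address:
                addresses.add(address.lower())
    return sorted(addresses)


ids = search_message_ids("sleigh", MESSAGES)
thread = find_related_messages(ids, MESSAGES)
talkers = unique_addresses(thread)
print(talkers)
```

Searching for "sleigh" matches only Buddy's message, but following the reference pulls in Santa's reply, so all four addresses from that thread are reported.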
  21. 21. Analyzing Mail Data: getmail, poplib, imaplib, Graph Your Inbox (Google Chrome extension)
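As a rough illustration of how `imaplib` could feed the mbox workflow from the earlier slides, the sketch below downloads raw messages over IMAP and appends them to an mbox file with the standard-library `mailbox` module. The function names, the host and credentials a caller would pass, and the sample messages are all assumptions; the network function is defined but not called, so the demo at the bottom runs offline.

```python
import imaplib
import mailbox
import os
import tempfile


def fetch_raw_messages(host, user, password, folder="INBOX", limit=10):
    """Download the last `limit` (> 0) raw messages over IMAP. Needs network."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        conn.select(folder, readonly=True)  # read-only: do not mark mail as seen
        status, data = conn.search(None, "ALL")
        message_numbers = data[0].split()[-limit:]
        raw_messages = []
        for num in message_numbers:
            status, parts = conn.fetch(num, "(RFC822)")
            raw_messages.append(parts[0][1])  # raw message bytes
        return raw_messages
    finally:
        conn.logout()


def append_to_mbox(raw_messages, path):
    """Append raw RFC 822 messages (bytes) to an mbox file."""
    box = mailbox.mbox(path)
    try:
        for raw in raw_messages:
            box.add(raw)
        box.flush()
    finally:
        box.close()


# Offline demo with two hand-written messages instead of a real IMAP fetch.
sample_messages = [
    b"From: Buddy <buddy.the.elf@northpole.example.org>\n"
    b"Subject: Tonight\n\nLast batch of toys was just loaded onto sleigh.\n",
    b"From: St. Nick <santa@northpole.example.org>\n"
    b"Subject: RE: FWD: Tonight\n\nSounds good.\n",
]

with tempfile.TemporaryDirectory() as tmp:
    mbox_path = os.path.join(tmp, "inbox.mbox")
    append_to_mbox(sample_messages, mbox_path)
    box = mailbox.mbox(mbox_path)
    try:
        subjects = [msg["Subject"] for msg in box]
    finally:
        box.close()

print(subjects)
```

With real credentials, `append_to_mbox(fetch_raw_messages(host, user, password), "inbox.mbox")` would produce a file that the mbox-to-JSON and CouchDB steps can consume.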
