Mining the social web ch3
 

    Mining the social web ch3 Presentation Transcript

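    • Mining The Social Web. NAVER "People Who Dream of Becoming Architects" (아키텍트를 꿈꾸는 사람들). Presenter: 김연기
    • Mail Boxes: Who sends mail? Is there a time of day when replies tend to arrive? Who sends mail most often? What are the hot topics these days?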
    • Mbox
      From santa@northpole.example.org Fri Dec 25 00:06:42 2009
      Message-ID: <16159836.1075855377439@mail.northpole.example.org>
      References: <88364590.8837464573838@mail.northpole.example.org>
      In-Reply-To: <194756537.0293874783209@mail.northpole.example.org>
      Date: Fri, 25 Dec 2001 00:06:42 -0000 (GMT)
      From: St. Nick <santa@northpole.example.org>
      To: rudolph@northpole.example.org
      Subject: RE: FWD: Tonight
      Mime-Version: 1.0
      Content-Type: text/plain; charset=us-ascii
      Content-Transfer-Encoding: 7bit

      Sounds good. See you at the usual location.

      Thanks,
      -S

      -----Original Message-----
      From: Rudolph
      Sent: Friday, December 25, 2009 12:04 AM
      To: Claus, Santa
      Subject: FWD: Tonight

      Santa -

      Running a bit late. Will come grab you shortly. Standby.

      Rudy

      Begin forwarded message:

      > Last batch of toys was just loaded onto sleigh.
      >
      > Please proceed per the norm.
      >
      > Regards,
      > Buddy
      >
      > --
      > Buddy the Elf
      > Chief Elf
      > Workshop Operations
      > North Pole
      > buddy.the.elf@northpole.example.org

      From buddy.the.elf@northpole.example.org Fri Dec 25 00:03:34 2009
      Message-ID: <88364590.8837464573838@mail.northpole.example.org>
      Date: Fri, 25 Dec 2001 00:03:34 -0000 (GMT)
      From: Buddy <buddy.the.elf@northpole.example.org>
      To: workshop@northpole.example.org
      Subject: Tonight
      Mime-Version: 1.0
      Content-Type: text/plain; charset=us-ascii
      Content-Transfer-Encoding: 7bit

      Last batch of toys was just loaded onto sleigh.

      Please proceed per the norm.

      Regards,
      Buddy

      --
      Buddy the Elf
      Chief Elf
      Workshop Operations
      North Pole
      buddy.the.elf@northpole.example.org
    • Mbox
      [
        {
          "From": "St. Nick <santa@northpole.example.org>",
          "Content-Transfer-Encoding": "7bit",
          "To": ["rudolph@northpole.example.org"],
          "parts": [
            {
              "content": "Sounds good. See you at the usual location.\n\nThanks,...",
              "contentType": "text/plain"
            }
          ],
          "References": "<88364590.8837464573838@mail.northpole.example.org>",
          "Mime-Version": "1.0",
          "In-Reply-To": "<194756537.0293874783209@mail.northpole.example.org>",
          "Date": "Fri, 25 Dec 2001 00:06:42 -0000 (GMT)",
          "Message-ID": "<16159836.1075855377439@mail.northpole.example.org>",
          "Content-Type": "text/plain; charset=us-ascii",
          "Subject": "RE: FWD: Tonight"
        },
        {
          "From": "Buddy <buddy.the.elf@northpole.example.org>",
          "Content-Transfer-Encoding": "7bit",
          "To": ["workshop@northpole.example.org"],
          "parts": [
            {
              "content": "Last batch of toys was just loaded onto sleigh. \n\n...",
              "contentType": "text/plain"
            }
          ],
          "Mime-Version": "1.0",
          "Date": "Fri, 25 Dec 2001 00:03:34 -0000 (GMT)",
          "Message-ID": "<88364590.8837464573838@mail.northpole.example.org>",
          "Content-Type": "text/plain; charset=us-ascii",
          "Subject": "Tonight"
        }
      ]
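The step from the raw mbox text to JSON documents like the ones above can be sketched with Python's standard `mailbox` module. This is a minimal sketch, not the converter used in the talk: the names `SAMPLE_MBOX`, `message_to_doc`, `mbox_to_docs`, and `demo` are made up for illustration, the sample is a shortened version of the slide's messages, and only single-part plain-text messages are handled.

```python
import json
import mailbox
import os
import tempfile

# Shortened version of the two sample messages (assumption: bodies trimmed).
SAMPLE_MBOX = """\
From santa@northpole.example.org Fri Dec 25 00:06:42 2009
Message-ID: <16159836.1075855377439@mail.northpole.example.org>
References: <88364590.8837464573838@mail.northpole.example.org>
Date: Fri, 25 Dec 2001 00:06:42 -0000 (GMT)
From: St. Nick <santa@northpole.example.org>
To: rudolph@northpole.example.org
Subject: RE: FWD: Tonight
Content-Type: text/plain; charset=us-ascii

Sounds good. See you at the usual location.

From buddy.the.elf@northpole.example.org Fri Dec 25 00:03:34 2009
Message-ID: <88364590.8837464573838@mail.northpole.example.org>
Date: Fri, 25 Dec 2001 00:03:34 -0000 (GMT)
From: Buddy <buddy.the.elf@northpole.example.org>
To: workshop@northpole.example.org
Subject: Tonight
Content-Type: text/plain; charset=us-ascii

Last batch of toys was just loaded onto sleigh.
"""


def message_to_doc(msg):
    """Turn one single-part message into a JSON-friendly dict."""
    doc = {name: value for name, value in msg.items()}
    # Store recipients as a list, as in the JSON sample.
    doc["To"] = [a.strip() for a in (msg.get("To") or "").split(",") if a.strip()]
    doc["parts"] = [
        {"content": msg.get_payload(), "contentType": msg.get_content_type()}
    ]
    return doc


def mbox_to_docs(path):
    """Read every message in the mbox file at `path` into a list of dicts."""
    box = mailbox.mbox(path, create=False)
    try:
        return [message_to_doc(msg) for msg in box]
    finally:
        box.close()


def demo():
    """Write the sample to a temporary mbox file and convert it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mbox")
        with open(path, "w", encoding="ascii", newline="\n") as f:
            f.write(SAMPLE_MBOX)
        return mbox_to_docs(path)


if __name__ == "__main__":
    print(json.dumps(demo(), indent=2))
```

Messages are split on lines that begin with `From ` (with a trailing space), which is why a body line such as `From: Rudolph` does not start a new message.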
    • Mbox + couchDBDB 에 저장하여 통계를낼수있다.Json API를 제공
    • couchDB문서 기반 DB ServerJson API를 제공ViewsSchema-Free
    • couchDBInstall couchdb on centOS yum install couchdb /etc/init.d/couchdb start
    • couchDB -+ PythonInstall Couchdb Kit (On CentOS) curl -O http://peak.telecommunity.com/dist/ez_se tup.py http://pypi.python.org/pypi/setuptools#r pm-based-systems $ sudo python ez_setup.py -U setuptoolsPython – Couchdb API http://packages.python.org/CouchDB
    • couchDB + Python
      # -*- coding: utf-8 -*-
      import sys
      import os
      import couchdb
      try:
          import jsonlib2 as json
      except ImportError:
          import json

      JSON_MBOX = sys.argv[1]  # i.e. enron.mbox.json
      # Name the database after the input file (enron.mbox.json -> enron)
      DB = os.path.basename(JSON_MBOX).split('.')[0]

      server = couchdb.Server('http://localhost:5984')
      db = server.create(DB)
      # Load all messages and store them in one bulk update
      docs = json.loads(open(JSON_MBOX).read())
      db.update(docs, all_or_nothing=True)
    • couchDB - Views
      from couchdb.design import ViewDefinition

      def dateTimeToDocMapper(doc):
          # Note that you need to include imports used by your mapper
          # inside the function definition
          from dateutil.parser import parse
          from datetime import datetime as dt
          if doc.get('Date'):
              # [year, month, day, hour, min, sec]
              _date = list(dt.timetuple(parse(doc['Date']))[:-3])
              yield (_date, doc)

      # Specify an index to back the query. Note that the index won't be
      # created until the first time the query is run
      # (db is the database handle the mail was loaded into)
      view = ViewDefinition('index', 'by_date_time',
                            dateTimeToDocMapper, language='python')
      view.sync(db)
    • couchDB – Map/Reduce
      from couchdb.design import ViewDefinition

      def dateTimeCountMapper(doc):
          from dateutil.parser import parse
          from datetime import datetime as dt
          if doc.get('Date'):
              # Emit [year, month, day, hour, min, sec] with a count of 1
              _date = list(dt.timetuple(parse(doc['Date']))[:-3])
              yield (_date, 1)

      def summingReducer(keys, values, rereduce):
          # Add up the counts emitted for each key
          return sum(values)

      view = ViewDefinition('index', 'doc_count_by_date_time',
                            dateTimeCountMapper,
                            reduce_fun=summingReducer, language='python')
      view.sync(db)
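What the mapper and reducer above compute can be tried without a server. The following is a minimal in-memory imitation, not couchDB itself: `run_view` is a hypothetical helper that groups mapper output by key and calls the reducer once per group, and its `group_level` argument only mimics the couchDB query parameter of the same name by truncating the key. The standard library's `parsedate_to_datetime` stands in for `dateutil` so the block has no third-party dependencies.

```python
from collections import defaultdict
from email.utils import parsedate_to_datetime

# Dates taken from the two sample messages; the third doc has no Date header.
DOCS = [
    {"Subject": "RE: FWD: Tonight", "Date": "Fri, 25 Dec 2001 00:06:42 -0000 (GMT)"},
    {"Subject": "Tonight", "Date": "Fri, 25 Dec 2001 00:03:34 -0000 (GMT)"},
    {"Subject": "No date header"},
]


def date_time_count_mapper(doc):
    """Emit ([year, month, day, hour, min, sec], 1) for each dated doc."""
    if doc.get("Date"):
        d = parsedate_to_datetime(doc["Date"])
        yield ([d.year, d.month, d.day, d.hour, d.minute, d.second], 1)


def summing_reducer(keys, values, rereduce):
    """Add up the counts emitted for one group of keys."""
    return sum(values)


def run_view(docs, mapper, reducer, group_level=None):
    """Group mapper output by (optionally truncated) key, then reduce."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[tuple(key[:group_level])].append(value)
    return {
        key: reducer([key] * len(values), values, False)
        for key, values in sorted(grouped.items())
    }


if __name__ == "__main__":
    # Messages per day: truncate the key to [year, month, day].
    print(run_view(DOCS, date_time_count_mapper, summing_reducer, group_level=3))
```

Emitting the date as a list is what makes this kind of roll-up possible: the same view can answer "messages per year", "per day", or "per hour" by changing only how much of the key is kept.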
    • couchDB – LuceneJAVA 기반의 검색 엔진Library
    • Look Who’s Talking 검색어에 해당하는 메시지 ID를couchdb-lucene 에 질의. 메시지 ID가 있는 모든 메일을찾는다. 메일중에서 메시지가 있는 메일의 유니크한 메일 주소를 찾아 낸다.
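The three "Look Who's Talking" steps can be sketched in memory as follows. This is only an illustration under stated assumptions: a plain substring match stands in for the couchdb-lucene query, the function names are made up, "mail that contains those message IDs" is read as a match in `Message-ID`, `References`, or `In-Reply-To`, and addresses are collected from the `From`, `To`, and `Cc` headers.

```python
from email.utils import getaddresses

# The two sample messages in the JSON form shown earlier (bodies shortened).
DOCS = [
    {
        "From": "St. Nick <santa@northpole.example.org>",
        "To": ["rudolph@northpole.example.org"],
        "Message-ID": "<16159836.1075855377439@mail.northpole.example.org>",
        "References": "<88364590.8837464573838@mail.northpole.example.org>",
        "parts": [{"content": "Sounds good. See you at the usual location."}],
    },
    {
        "From": "Buddy <buddy.the.elf@northpole.example.org>",
        "To": ["workshop@northpole.example.org"],
        "Message-ID": "<88364590.8837464573838@mail.northpole.example.org>",
        "parts": [{"content": "Last batch of toys was just loaded onto sleigh."}],
    },
]


def search_message_ids(docs, term):
    """Step 1 (stand-in for couchdb-lucene): IDs of messages matching term."""
    term = term.lower()
    return {
        doc["Message-ID"]
        for doc in docs
        if any(term in part["content"].lower() for part in doc.get("parts", []))
    }


def find_thread_docs(docs, message_ids):
    """Step 2: every message that carries one of the IDs in its headers."""
    def ids_in(doc):
        fields = (doc.get(h, "") for h in ("Message-ID", "References", "In-Reply-To"))
        return {token for field in fields for token in field.split()}
    return [doc for doc in docs if ids_in(doc) & message_ids]


def unique_addresses(docs):
    """Step 3: sorted unique addresses from the From, To, and Cc headers."""
    raw = []
    for doc in docs:
        for header in ("From", "To", "Cc"):
            value = doc.get(header, [])
            raw.extend([value] if isinstance(value, str) else value)
    return sorted({addr.lower() for _, addr in getaddresses(raw) if addr})


def who_is_talking(docs, term):
    """Run the three steps in order for one search term."""
    return unique_addresses(find_thread_docs(docs, search_message_ids(docs, term)))


if __name__ == "__main__":
    print(who_is_talking(DOCS, "sleigh"))
```

Searching for "sleigh" matches only Buddy's message directly, but Santa's reply is pulled in through its `References` header, so all four addresses in the thread are reported.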
    • Analyzing Mail DataGetmailPoplibImaplibGraph Your Inbox Google Chrome Extension