Mining The Social Web



NAVER: 아키텍트를 꿈꾸는 사람들 (People Who Dream of Becoming Architects)
Presenter: 김연기
Mail Boxes
Who sends mail?
Is there a time of day when replies arrive?
Who sends mail most often?
What are the hot topics right now?
Mbox
From santa@northpole.example.org Fri Dec 25 00:06:42 2009
Message-ID: <16159836.1075855377439@mail.northpole.example.org>
References: <88364590.8837464573838@mail.northpole.example.org>
In-Reply-To: <194756537.0293874783209@mail.northpole.example.org>
Date: Fri, 25 Dec 2001 00:06:42 -0000 (GMT)
From: St. Nick <santa@northpole.example.org>
To: rudolph@northpole.example.org
Subject: RE: FWD: Tonight
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Sounds good. See you at the usual location.

Thanks,
-S

-----Original Message-----
From: Rudolph
Sent: Friday, December 25, 2009 12:04 AM
To: Claus, Santa
Subject: FWD: Tonight

Santa -

Running a bit late. Will come grab you shortly. Standby.

Rudy

Begin forwarded message:

> Last batch of toys was just loaded onto sleigh.
>
> Please proceed per the norm.
>
> Regards,
> Buddy
>
> --
> Buddy the Elf
> Chief Elf
> Workshop Operations
> North Pole
> buddy.the.elf@northpole.example.org

From buddy.the.elf@northpole.example.org Fri Dec 25 00:03:34 2009
Message-ID: <88364590.8837464573838@mail.northpole.example.org>
Date: Fri, 25 Dec 2001 00:03:34 -0000 (GMT)
From: Buddy <buddy.the.elf@northpole.example.org>
To: workshop@northpole.example.org
Subject: Tonight
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Last batch of toys was just loaded onto sleigh.

Please proceed per the norm.

Regards,
Buddy

--
Buddy the Elf
Chief Elf
Workshop Operations
North Pole
buddy.the.elf@northpole.example.org
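
The question "who sends mail?" can be answered straight from an mbox file. The following is a minimal sketch, assuming Python's standard `mailbox` module is an acceptable reader. The sample text is an abbreviated copy of the two messages on the slide, and the helper names `count_senders` and `demo` are made up for illustration.

```python
import mailbox
import os
import tempfile
from collections import Counter
from email.utils import parseaddr

# Two abbreviated messages in mbox format (headers taken from the slide).
SAMPLE_MBOX = """\
From santa@northpole.example.org Fri Dec 25 00:06:42 2009
Message-ID: <16159836.1075855377439@mail.northpole.example.org>
Date: Fri, 25 Dec 2001 00:06:42 -0000 (GMT)
From: St. Nick <santa@northpole.example.org>
To: rudolph@northpole.example.org
Subject: RE: FWD: Tonight

Sounds good. See you at the usual location.

From buddy.the.elf@northpole.example.org Fri Dec 25 00:03:34 2009
Message-ID: <88364590.8837464573838@mail.northpole.example.org>
Date: Fri, 25 Dec 2001 00:03:34 -0000 (GMT)
From: Buddy <buddy.the.elf@northpole.example.org>
To: workshop@northpole.example.org
Subject: Tonight

Last batch of toys was just loaded onto sleigh.
"""


def count_senders(mbox_path):
    """Return a Counter mapping sender address -> number of messages."""
    counts = Counter()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # parseaddr splits "St. Nick <santa@...>" into (name, address).
            _name, address = parseaddr(message.get("From", ""))
            counts[address.lower()] += 1
    finally:
        box.close()
    return counts


def demo():
    # mailbox.mbox reads from a path, so the sample goes into a temporary file.
    with tempfile.NamedTemporaryFile("w", suffix=".mbox", delete=False) as handle:
        handle.write(SAMPLE_MBOX)
        path = handle.name
    try:
        return count_senders(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    for address, n in demo().most_common():
        print(address, n)
```

Each message begins at a line starting with `From ` (the envelope line), which is how the reader separates the two mails.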
Mbox
[
  {
    "From": "St. Nick <santa@northpole.example.org>",
    "Content-Transfer-Encoding": "7bit",
    "To": [
      "rudolph@northpole.example.org"
    ],
    "parts": [
      {
        "content": "Sounds good. See you at the usual location.\n\nThanks,...",
        "contentType": "text/plain"
      }
    ],
    "References": "<88364590.8837464573838@mail.northpole.example.org>",
    "Mime-Version": "1.0",
    "In-Reply-To": "<194756537.0293874783209@mail.northpole.example.org>",
    "Date": "Fri, 25 Dec 2001 00:06:42 -0000 (GMT)",
    "Message-ID": "<16159836.1075855377439@mail.northpole.example.org>",
    "Content-Type": "text/plain; charset=us-ascii",
    "Subject": "RE: FWD: Tonight"
  },
  {
    "From": "Buddy <buddy.the.elf@northpole.example.org>",
    "Content-Transfer-Encoding": "7bit",
    "To": [
      "workshop@northpole.example.org"
    ],
    "parts": [
      {
        "content": "Last batch of toys was just loaded onto sleigh. \n\n...",
        "contentType": "text/plain"
      }
    ],
    "Mime-Version": "1.0",
    "Date": "Fri, 25 Dec 2001 00:03:34 -0000 (GMT)",
    "Message-ID": "<88364590.8837464573838@mail.northpole.example.org>",
    "Content-Type": "text/plain; charset=us-ascii",
    "Subject": "Tonight"
  }
]
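
The JSON form above can be produced from a parsed message. This is a rough sketch, assuming the standard `email` package, and it is not necessarily the converter used for the slides. The function name `message_to_doc` is hypothetical, and only `text/plain` parts are kept.

```python
import json
from email import message_from_string
from email.utils import getaddresses

RAW_MESSAGE = """\
Message-ID: <88364590.8837464573838@mail.northpole.example.org>
Date: Fri, 25 Dec 2001 00:03:34 -0000 (GMT)
From: Buddy <buddy.the.elf@northpole.example.org>
To: workshop@northpole.example.org
Subject: Tonight
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Last batch of toys was just loaded onto sleigh.
"""


def message_to_doc(message):
    """Flatten an email.message.Message into a JSON-serializable dict."""
    # Copy every header as a plain string (a repeated header keeps its last value).
    doc = {name: value for name, value in message.items()}
    # Recipients become a list of bare addresses, as on the slide.
    doc["To"] = [addr for _name, addr in getaddresses(message.get_all("To", []))]
    # Keep every text/plain part; other content types are skipped in this sketch.
    doc["parts"] = [
        {"content": part.get_payload(), "contentType": part.get_content_type()}
        for part in message.walk()
        if part.get_content_type() == "text/plain"
    ]
    return doc


doc = message_to_doc(message_from_string(RAW_MESSAGE))

if __name__ == "__main__":
    print(json.dumps(doc, indent=2))
```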
Mbox + couchDB
Storing the messages in a DB makes it possible to compute statistics on them.
couchDB provides a JSON API.
couchDB
Document-oriented DB server
Provides a JSON API
Views
Schema-Free
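
Because the API is JSON over HTTP, a document can be stored with a plain `PUT /{db}/{docid}` request. The sketch below only builds such a request with the standard library and does not send it. The database name `mail` and document ID `msg-1` are invented for illustration, and the server address is the default local one.

```python
import json
import urllib.request

COUCH_URL = "http://localhost:5984"  # assumed default local CouchDB address


def build_put_document(db_name, doc_id, doc):
    """Build (but do not send) the HTTP request that stores one JSON document."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url="{}/{}/{}".format(COUCH_URL, db_name, doc_id),
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# "mail" and "msg-1" are hypothetical names used only for this example.
request = build_put_document("mail", "msg-1", {"Subject": "Tonight"})

if __name__ == "__main__":
    # Sending it needs a running server and an existing "mail" database:
    #   with urllib.request.urlopen(request) as response:
    #       print(response.read())
    print(request.get_method(), request.full_url)
```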
couchDB
Install couchdb on CentOS
  yum install couchdb
  /etc/init.d/couchdb start
couchDB + Python
Install Couchdb Kit (On CentOS)
  curl -O http://peak.telecommunity.com/dist/ez_setup.py
  http://pypi.python.org/pypi/setuptools#rpm-based-systems
  $ sudo python ez_setup.py -U setuptools

Python – Couchdb API
  http://packages.python.org/CouchDB
couchDB + Python
# -*- coding: utf-8 -*-
import sys
import os
import couchdb
try:
    import jsonlib2 as json  # use jsonlib2 if it is installed
except ImportError:
    import json

JSON_MBOX = sys.argv[1]  # e.g. enron.mbox.json
# Name the database after the file: enron.mbox.json -> enron
DB = os.path.basename(JSON_MBOX).split('.')[0]

server = couchdb.Server('http://localhost:5984')
db = server.create(DB)
# Load the JSON array of messages and insert it in one bulk update
with open(JSON_MBOX) as f:
    docs = json.loads(f.read())
db.update(docs, all_or_nothing=True)
couchDB - Views
from couchdb.design import ViewDefinition

def dateTimeToDocMapper(doc):
    # Note that you need to include imports used by your mapper
    # inside the function definition
    from dateutil.parser import parse
    from datetime import datetime as dt
    if doc.get('Date'):
        # [year, month, day, hour, min, sec]
        _date = list(dt.timetuple(parse(doc['Date']))[:-3])
        yield (_date, doc)

# Specify an index to back the query. Note that the index won't be
# created until the first time the query is run.
# db is the Database object created on the previous slide.
view = ViewDefinition('index', 'by_date_time', dateTimeToDocMapper,
                      language='python')
view.sync(db)
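
Once the view is synced, couchdb-python can query it by key range with `db.view('index/by_date_time', startkey=..., endkey=...)`. The sketch below wraps that call in a hypothetical helper, `mails_between`. So that it runs without a server, it is exercised against `InMemoryDB`, a made-up stand-in that only mimics key-range filtering and is not part of any library.

```python
class Row:
    """Minimal stand-in for a view result row (hypothetical, for offline runs)."""

    def __init__(self, key, value):
        self.key = key
        self.value = value


class InMemoryDB:
    """Hypothetical stand-in that mimics db.view() key-range filtering."""

    def __init__(self, rows):
        self.rows = sorted(rows, key=lambda row: row.key)

    def view(self, name, startkey=None, endkey=None):
        # Keys are [year, month, day, hour, min, sec] lists, compared element-wise.
        return [r for r in self.rows if startkey <= r.key <= endkey]


def mails_between(db, start, end):
    """Return the documents whose date key lies between start and end (inclusive)."""
    rows = db.view("index/by_date_time", startkey=start, endkey=end)
    return [row.value for row in rows]


fake_db = InMemoryDB([
    Row([2001, 12, 25, 0, 3, 34], {"Subject": "Tonight"}),
    Row([2001, 12, 25, 0, 6, 42], {"Subject": "RE: FWD: Tonight"}),
])
early = mails_between(fake_db, [2001, 12, 25, 0, 0, 0], [2001, 12, 25, 0, 5, 0])

if __name__ == "__main__":
    print(early)
```

With a real server, the same helper would be passed the `db` object returned by `couchdb.Server(...)`.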
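couchDB – Map/Reduce
from couchdb.design import ViewDefinition

def dateTimeCountMapper(doc):
    # Imports used by the mapper must live inside the function
    from dateutil.parser import parse
    from datetime import datetime as dt
    if doc.get('Date'):
        # Emit [year, month, day, hour, min, sec] with a count of 1
        _date = list(dt.timetuple(parse(doc['Date']))[:-3])
        yield (_date, 1)

def summingReducer(keys, values, rereduce):
    # Summing works for both the reduce and the rereduce step
    return sum(values)

view = ViewDefinition('index', 'doc_count_by_date_time',
                      dateTimeCountMapper,
                      reduce_fun=summingReducer, language='python')
view.sync(db)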
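
To see what the mapper and reducer compute, here is a small local simulation in plain Python. It does not talk to CouchDB. `run_view` is a hypothetical helper that imitates grouping on the first `group_level` elements of the key, and `email.utils.parsedate_to_datetime` is swapped in for `dateutil` so the sketch needs no third-party package.

```python
from collections import defaultdict
from email.utils import parsedate_to_datetime

DOCS = [
    {"Date": "Fri, 25 Dec 2001 00:06:42 -0000 (GMT)", "Subject": "RE: FWD: Tonight"},
    {"Date": "Fri, 25 Dec 2001 00:03:34 -0000 (GMT)", "Subject": "Tonight"},
    {"Subject": "no date header"},  # skipped by the mapper
]


def date_time_count_mapper(doc):
    """Same shape as dateTimeCountMapper: emit ([y, m, d, H, M, S], 1)."""
    if doc.get("Date"):
        parsed = parsedate_to_datetime(doc["Date"])  # stand-in for dateutil
        yield (list(parsed.timetuple()[:6]), 1)


def summing_reducer(keys, values, rereduce):
    return sum(values)


def run_view(docs, group_level):
    """Group mapped rows by the first group_level key elements, then reduce."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in date_time_count_mapper(doc):
            groups[tuple(key[:group_level])].append(value)
    return {
        key: summing_reducer(None, values, False)
        for key, values in sorted(groups.items())
    }


per_day = run_view(DOCS, group_level=3)      # counts per [year, month, day]
per_minute = run_view(DOCS, group_level=5)   # counts per [..., hour, min]

if __name__ == "__main__":
    print(per_day)
    print(per_minute)
```

Grouping at level 3 answers "how many mails per day", and a deeper level narrows the bucket to the hour or minute.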
couchDB – Lucene
A Java-based search engine library
Look Who’s Talking
• Query couchdb-lucene for the IDs of the messages that match the search term.
• Find every mail that contains one of those message IDs.
• From those mails, extract the unique email addresses.
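
The three steps above can be sketched on in-memory documents. This is only an illustration. Step 1 uses a plain substring search as a stand-in for the couchdb-lucene query, "contains a message ID" is read as matching `Message-ID`, `References`, or `In-Reply-To`, and all function names are hypothetical.

```python
import re

MAILS = [
    {
        "Message-ID": "<88364590.8837464573838@mail.northpole.example.org>",
        "From": "Buddy <buddy.the.elf@northpole.example.org>",
        "To": ["workshop@northpole.example.org"],
        "parts": [{"content": "Last batch of toys was just loaded onto sleigh."}],
    },
    {
        "Message-ID": "<16159836.1075855377439@mail.northpole.example.org>",
        "References": "<88364590.8837464573838@mail.northpole.example.org>",
        "From": "St. Nick <santa@northpole.example.org>",
        "To": ["rudolph@northpole.example.org"],
        "parts": [{"content": "Sounds good. See you at the usual location."}],
    },
]

ADDRESS = re.compile(r"[\w.+-]+@[\w.-]+")


def search_message_ids(mails, term):
    """Step 1 stand-in: substring search instead of a couchdb-lucene query."""
    return {
        m["Message-ID"] for m in mails
        if any(term.lower() in p["content"].lower() for p in m["parts"])
    }


def mails_mentioning(mails, message_ids):
    """Step 2: mails that are, or refer to, one of the matching messages."""
    def related(mail):
        refs = set(mail.get("References", "").split())
        refs.add(mail["Message-ID"])
        refs.add(mail.get("In-Reply-To", ""))
        return bool(refs & message_ids)
    return [m for m in mails if related(m)]


def unique_addresses(mails):
    """Step 3: every distinct address in From/To of the selected mails."""
    found = set()
    for mail in mails:
        for field in [mail.get("From", "")] + list(mail.get("To", [])):
            found.update(a.lower() for a in ADDRESS.findall(field))
    return sorted(found)


ids = search_message_ids(MAILS, "toys")
talkers = unique_addresses(mails_mentioning(MAILS, ids))

if __name__ == "__main__":
    print(talkers)
```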
Analyzing Mail Data
Getmail
Poplib
Imaplib
Graph Your Inbox
  Google Chrome Extension

Mining the social web ch3
