Data Exploration and SE
Sanghee Kim
sanghee.kim@colorodo.edu
Session 1
Science paradigms
Big data / small data
“Do you really think the data you have is big?”
Data processing flow
Collect (generate) data

Process data

Analyze data

Visualize data
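
The four stages above can be sketched as four small functions chained together. This is only a minimal illustration using the Python standard library; the sample rows, the column names `device` and `text`, and the function names are assumptions made up for this sketch, not material from the slides.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data, invented for this sketch.
RAW = """device,text
nexus 5,just ordered mine
nexus 5,battery life?
iphone 5s,new camera is great
,row with a missing device
"""


def collect(raw_text):
    """Collect: read raw rows from a CSV source (an in-memory string here)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def process(rows):
    """Process: drop rows with no device and normalize the device name."""
    return [
        {"device": row["device"].strip().lower(), "text": row["text"]}
        for row in rows
        if row["device"] and row["device"].strip()
    ]


def analyze(rows):
    """Analyze: count rows per device."""
    return Counter(row["device"] for row in rows)


def visualize(counts):
    """Visualize: render a plain-text bar chart, most frequent first."""
    return "\n".join(
        f"{name:<10} {'#' * n} {n}" for name, n in counts.most_common()
    )


counts = analyze(process(collect(RAW)))
chart = visualize(counts)
print(chart)
```

Keeping each stage as its own function makes it easy to swap one tool for another (for example, replacing the text chart with matplotlib) without touching the other stages.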
Data processing and related tools
Collect (generate) data
OpenRefine
pandas, NumPy

Process data

Data Wrangler

Google BigQuery
Apache Lucene
Many Eyes
D3
Google Chart API
matplotlib

Analyze data
NodeXL
Splunk

Visualize data

Tableau

Reference for each tool: http://goo.gl/ooYExB
“Let's analyze Twitter data.”
Just try it once
Collect (generate) data
Process data
Analyze data
Visualize data
Prepare the tools
Collect (generate) data: Twitter API, Twython
Process data: Python, Twython, IPython, Pandas
Analyze data: Splunk, Python, IPython, Pandas
Visualize data: Splunk, matplotlib, Google Chart API
Trying it with Splunk
Interesting query 1 of 3

This query shows that the most retweeted tweet about the Nexus 5 is by Google, which suggests that Google has a
strong voice when reaching its fans.
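
The slide does not show the Splunk query itself, so the following is only a rough Python sketch of the same idea: rank tweets by retweet count and look at who is on top. The rows and the column names `user`, `text`, and `retweet_count` are made up for illustration; the real columns in `nexus_5_raw.csv` may differ.

```python
from operator import itemgetter

# Hypothetical rows, invented for this sketch.
tweets = [
    {"user": "account_a", "text": "got my nexus 5", "retweet_count": 3},
    {"user": "account_b", "text": "nexus 5 announcement", "retweet_count": 120},
    {"user": "account_c", "text": "nexus 5 review", "retweet_count": 15},
]


def top_retweeted(rows, n=1):
    """Return the n rows with the highest retweet_count, highest first."""
    return sorted(rows, key=itemgetter("retweet_count"), reverse=True)[:n]


top = top_retweeted(tweets, n=2)
for row in top:
    print(row["user"], row["retweet_count"])
```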

data: https://github.com/sangheestyle/bisonsampledata
presentation: http://goo.gl/MLFf96
Analyzing with Twitter data
Interesting query 2 of 3
source="/Users/kimsanghee/Dev/datastore4bison/nexus_5_raw.csv.zip:./nexus_5_raw.csv"

This query shows that at launch time the most retweeted tweet (by RT) about the Nexus 5 is by Sundar Pichai,
a senior vice president at Google, where he oversees Android, Chrome and Google Apps. This
suggests that he has a strong voice when reaching Google's fans.
Analyzing with Twitter data
Interesting query 3 of 3
The top tweets show which organization was
most influential during the 19 days.
The second-ranked tweet is about a promotional
event for a free Nexus 5.
http://mobilesyrup.com/2013/11/02/wina-google-nexus-5/

“Analyzing with Twitter data
+ changing the tools and the thinking”
Bison: Project Overview
Objective: Analyzing tweets about mobile devices
Source & demo: https://github.com/sangheestyle/bison
How big: 789,051 tweets
Tools: Python, Pandas, Numpy, Google Chart
Members: Jacob, Sanghee
http://goo.gl/L26mmP

What happened?
http://goo.gl/1yaekZ

What happened once again?

Only two weeks!
What do they use?

http://goo.gl/OzYu0J
http://goo.gl/Y28HrQ

When do they tweet?
Where do they live?

http://goo.gl/vyi1Gy
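“Does changing the tool change only the tool?”
Things to think about
What do you think of this? (What do you like? What don't you?)
Should two graphs be shown side by side for accuracy?
If you were to extend it, how?
If more data were provided, which data, and what more could you do with it?
Would the model you built be valid elsewhere? (time period,
End of Session 1
+ mid-point retrospective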
Session 2
“40 percent of major
decisions are based not
on facts, but on the
manager’s gut”
from Software Analytics = Sharing Information by Thomas
Zimmermann http://goo.gl/WQ0BKv
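Data processing flow
Collect (generate) data

Process data

Analyze data

Visualize data
“Let's analyze the data that comes out of Git.”
Just try it once
Collect (generate) data
Process data
Analyze data
Visualize data
Prepare the tools
Collect (generate) data: Git
Process data: Python, IPython, Pandas
Analyze data: Splunk, Python, IPython, Pandas
Visualize data: Splunk, matplotlib, Google Chart API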
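
As one possible way to do the collect and process steps for Git, the sketch below reads `git log` output and counts commits per author and per day. The `%an` and `%ad` format placeholders and `--date=short` are standard `git log` options; the `|` separator, the function names, and the sample log with its author names are assumptions made up for this illustration. The standard library is used here to keep the sketch self-contained; Pandas could do the same grouping.

```python
import subprocess
from collections import Counter

# Made-up sample of what
#   git log --pretty=format:"%an|%ad" --date=short
# prints: one "author|date" line per commit.
SAMPLE_LOG = """\
dev_a|2014-01-02
dev_b|2014-01-02
dev_b|2014-01-03
dev_a|2014-01-06
dev_b|2014-01-06
"""


def read_git_log(repo_path):
    """Collect: run git log in repo_path and return its raw text output."""
    result = subprocess.run(
        ["git", "log", "--pretty=format:%an|%ad", "--date=short"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_log(text):
    """Process: turn 'author|date' lines into (author, date) tuples."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the last '|' so an author name containing '|' survives.
        author, date = line.rsplit("|", 1)
        commits.append((author, date))
    return commits


commits = parse_log(SAMPLE_LOG)
per_author = Counter(author for author, _ in commits)
per_day = Counter(date for _, date in commits)
print(per_author.most_common())
print(sorted(per_day.items()))
```

To run it against a real repository, replace `SAMPLE_LOG` with `read_git_log("path/to/repo")`.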
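“Let's look together at something prepared in advance.”
“Let's look at our group's characteristics in chronological order.”
“Who is doing well? Watch out for deceptive appearances!”
“Conflict zone! Where is the UN?”
“Let's look at something else too.”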
https://github.com/twbs/bootstrap/graphs
“Is it okay for us to do this?”
Things to think about
Do not push an immature model
Correlation
Incentives
From SE lecture by Professor Ruth Dameron (University of Colorado, Boulder)
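Extending it
Development: How can we work in a way that is less painful?
Education: What kind of training should we create?
HR: What kind of people do we need? What about the organizational structure?
Organizational culture: What are the characteristics of our organization?
Key points
Where and how will the data be collected?
Does the data sufficiently represent the group?
The data can change continuously.
The information can differ depending on the analysis method.
Form hypotheses, talk, and expand your thinking.
Make use of the experts within the group.
Try adjusting outliers rather than cutting them off.
Deliberately switch tools.
(What else?)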
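
One way to read the point about adjusting outliers instead of cutting them is to clip extreme values to a fence, so the observation stays in the data set but no longer dominates it. The sketch below uses the common interquartile-range rule; that rule, the factor `k=1.5`, and the sample numbers are assumptions chosen for illustration, not something the slides prescribe.

```python
import statistics


def clip_outliers(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] instead of dropping them."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Made-up weekly commit counts with one extreme week.
commits_per_week = [3, 4, 5, 4, 6, 5, 40]
clipped = clip_outliers(commits_per_week)
print(clipped)
```

The list keeps its length, so every week is still represented; only the extreme week is pulled back toward the rest.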
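“(Review the current system, derive improvements, apply them) × continuous repetition”
“In the end, what will you do, and why?”
Group discussion

“Can a developer's ability be judged by the number of commits?”
End of Session 2
+ final retrospective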

Data Exploration and SE - Jan 8 2014, mc lab, Seoul, South Korea