HUG August 2010:Whats the buzz_bay_area

Stefan Groschupf, the co-founder and CTO of Datameer, will discuss challenges in social media analytics and how to overcome them using big data analytics built on Hadoop, in his “Social Media: What’s Really the Buzz?” talk. Identifying true thought leaders and influencers in social media conversations is becoming increasingly important for companies that want to understand who is having an impact on their customers' buying decisions. Rather than counting mentions in limited subsets of social media data, organizations need a solution that can uncover complex relationships buried in massive volumes of social media data, and a way to bring in data from multiple online sources to determine the quality and effectiveness of user commentary.

Published in Technology

Transcript

  • 1. What’s Really the Buzz? Stefan Groschupf sg@datameer.com © Datameer, Inc 2010
  • 2. How it started...
  • 3. How it started...
  • 4. How it started...
  • 5. Agenda
  • 6. Street Cred
      Long time open source contributor
      http://github.com/sgroschupf/zkclient
      http://github.com/sgroschupf/aws-tasks
  • 7. Cubicle Cred
      Cloud Computing Architect
      Hadoop consultant at e.g.
      Co-Founder / CEO Scale Unlimited
      Co-Founder / CEO Datameer Inc.
  • 8. Hadoop vs. DB
      Hadoop (mergesort)
      DB (b-tree/index)
  • 9. Twitter
  • 10. Stack
  • 11. Stack
      AWS
        • ...
      Hadoop
        • ...
      Datameer
        • Spreadsheet compiles into MapReduce jobs, + much more
        • Commercial
      Gephi
        • Graph visualization, GPL, NetBeans based
  • 12. Expenses
      What          Price         Description
      Client EC2    $65.00        0.085/h * 24 * 31
      S3 Storage    $7.50         0.15/GB/m * 50
      EMR           $24.00        0.20 * 20 * 6
      Datameer      ~$20          private beta
      Gephi         $0.00         GPL
      Development   $8.99 + tax   6pk Pilsner Urquell
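The arithmetic in the Description column can be checked with a few lines of Python. This is only a sketch: the dictionary keys and the function name `line_costs` are made up for illustration, and the factors are copied from the slide without interpreting what each one stands for. Evaluated literally, the EC2 formula gives 63.24, slightly below the $65.00 listed; the slide does not say where the difference comes from.

```python
def line_costs():
    """Evaluate the three formulas from the Expenses slide (USD, 2010 prices)."""
    return {
        # 0.085/h * 24 * 31, as written on the slide
        "Client EC2": round(0.085 * 24 * 31, 2),
        # 0.15/GB/m * 50, as written on the slide
        "S3 Storage": round(0.15 * 50, 2),
        # 0.20 * 20 * 6, as written on the slide
        "EMR": round(0.20 * 20 * 6, 2),
    }


if __name__ == "__main__":
    for name, cost in line_costs().items():
        print(f"{name:<12} ${cost:.2f}")
```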
  • 13. Pipeline (diagram labels: Datameer, Client on EC2, EMR, Gephi, CSV/GEXF, S3)
  • 14. Twitter Client (diagram labels: Basic Auth, S3, TwitterClient, Compression, Upload, Thread, Thread, Thread, EC2 Server, 256 MB, 50 MB)
  • 15. Twitter API
      Streaming API
        curl http://stream.twitter.com/1/statuses/sample.json -ustefan:somepwd
        • Streams; interrupted once in a while
        • 5% of public statuses by default
      Search API
        curl http://search.twitter.com/search.json?q=datameer -ustefan:somepwd
        • 150 requests per hour
  • 16. Some Twitter Fields
      user_screen_name, user_followers_count, user_friends_count, user_statuses_count, user_location
      text, source, created_at
      geo_type, geo_longitude, geo_latitude
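The field names above look like a flattened view of the nested JSON statuses that the Streaming API delivered one per line. The following is a minimal sketch of such a flattening step, not the talk's actual client: the function names are hypothetical, and the nested layout (`user.screen_name`, `geo.coordinates` as latitude then longitude) is an assumption about the 2010-era payload.

```python
import json


def flatten_status(status):
    """Map one nested status dict to the flat field names from the slide."""
    user = status.get("user") or {}
    geo = status.get("geo") or {}
    # Assumption: geo.coordinates is [latitude, longitude].
    coords = geo.get("coordinates") or [None, None]
    return {
        "user_screen_name": user.get("screen_name"),
        "user_followers_count": user.get("followers_count"),
        "user_friends_count": user.get("friends_count"),
        "user_statuses_count": user.get("statuses_count"),
        "user_location": user.get("location"),
        "text": status.get("text"),
        "source": status.get("source"),
        "created_at": status.get("created_at"),
        "geo_type": geo.get("type"),
        "geo_latitude": coords[0],
        "geo_longitude": coords[1],
    }


def parse_stream(lines):
    """Yield flattened statuses from an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            status = json.loads(line)
        except ValueError:
            continue  # skip a line cut off by an interrupted stream
        if isinstance(status, dict) and "text" in status:
            yield flatten_status(status)
```

Because `parse_stream` accepts any iterable of lines, it can be fed from a saved file as well as from a live connection.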
  • 17. Simple Analytics
      Trending topics
      Tweet spread timing
      Topic reach
      Topic location reach
      Etc.
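Two of these analytics can be sketched in plain Python over flattened tweets. The definitions here are assumptions, since the slide only names the metrics: a "topic" is taken to be a hashtag, "trending" means the number of tweets mentioning it, and "reach" is the summed follower count of the distinct users who tweeted it. The function names are illustrative only.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#(\w+)")


def topics_of(tweet):
    """Return the set of lowercase hashtags in one tweet's text."""
    return {tag.lower() for tag in HASHTAG.findall(tweet["text"])}


def trending_topics(tweets, top_n=3):
    """Count tweets per hashtag and return the top_n most frequent."""
    counts = Counter()
    for tweet in tweets:
        counts.update(topics_of(tweet))
    return counts.most_common(top_n)


def topic_reach(tweets):
    """Sum follower counts of distinct users per hashtag (assumed definition)."""
    followers = defaultdict(dict)
    for tweet in tweets:
        for tag in topics_of(tweet):
            # Keyed by user so repeat tweeters are counted once.
            followers[tag][tweet["user_screen_name"]] = tweet["user_followers_count"]
    return {tag: sum(by_user.values()) for tag, by_user in followers.items()}
```

On a real data set the same grouping would run as a MapReduce job keyed by hashtag; the in-memory version only shows the logic.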
  • 18. Demo
      http://en.wikipedia.org/wiki/Social_network
      http://www.orgnet.com
  • 19. From Stream to Network
      Understand tweets as a network plus metadata (time, geo, etc.)
      ReTweet => Citation => Link => PageRank
      Twitter friends (not accessible from Streaming API)
        • Technically not realistic to get
        • Quality?
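The chain ReTweet => Citation => Link => PageRank can be sketched as follows. This is an illustration under stated assumptions, not the pipeline from the talk: retweets are detected with the informal `RT @user` text convention, each retweet becomes a directed link from the retweeter to the original author, and PageRank is computed by plain power iteration with a damping factor of 0.85. All function names are hypothetical.

```python
import re
from collections import defaultdict

RETWEET = re.compile(r"\bRT @(\w+)")


def retweet_edges(tweets):
    """Turn 'RT @author' tweets into (retweeter, author) links."""
    edges = []
    for tweet in tweets:
        match = RETWEET.search(tweet["text"])
        if match:
            edges.append((tweet["user_screen_name"].lower(), match.group(1).lower()))
    return edges


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) links."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # ignore self-retweets
            out_links[src].add(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank
```

Users who are retweeted by many others, or by users who are themselves widely retweeted, end up with the highest rank.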
  • 20. Social Network Analytics (David Krackhardt)
      Degree: number of direct connections a node has (Diane)
      Betweenness: broker, point of failure, high influence on flow (Heather)
      Closeness: shortest path to all others (Fernando, Garth)
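The names on this slide match the well-known "kite" network attributed to David Krackhardt. The sketch below computes the three measures on that network. The edge list is not on the slide; it is reproduced here as an assumption from the commonly published version of the kite graph. Betweenness uses Brandes' algorithm, and closeness is taken as (n - 1) divided by the sum of shortest-path distances, which assumes a connected graph.

```python
from collections import deque

# Assumed edge list of the Krackhardt kite network (not shown on the slide).
KITE_EDGES = [
    ("Andre", "Beverly"), ("Andre", "Carol"), ("Andre", "Diane"),
    ("Andre", "Fernando"), ("Beverly", "Diane"), ("Beverly", "Ed"),
    ("Beverly", "Garth"), ("Carol", "Diane"), ("Carol", "Fernando"),
    ("Diane", "Ed"), ("Diane", "Fernando"), ("Diane", "Garth"),
    ("Ed", "Garth"), ("Fernando", "Garth"), ("Fernando", "Heather"),
    ("Garth", "Heather"), ("Heather", "Ike"), ("Ike", "Jane"),
]


def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph):
    """Number of direct connections per node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def closeness(graph):
    """(n - 1) / sum of BFS distances to all other nodes (connected graph)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness(graph):
    """Unnormalized betweenness via Brandes' algorithm (undirected)."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        order, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):  # accumulate dependencies back to s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: total / 2 for v, total in score.items()}
```

On this edge list the top node for each measure agrees with the names the slide gives in parentheses.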
  • 21. Social Network Analytics
      Centralization: centralized network dominated by one or a few very central nodes; single point of failure
      Prestige: the term for a node's centrality
      Reach: degree to which any member of a network can reach other members
      Boundary Spanners: central in the overall network, bridging clusters; innovators, since information comes from multiple clusters
      Path Length: the distances between pairs of nodes in the network. Average path length is the average of these distances between all pairs of nodes.
      http://en.wikipedia.org/wiki/Social_network
      http://www.orgnet.com
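Two of these network-level measures are easy to illustrate on toy graphs. The sketch below computes average path length as defined on the slide and, as one common way to quantify centralization, Freeman's degree centralization; the slide does not name a formula, so that choice is an assumption. The star and ring graphs and all function names are made up for the example, and both functions assume a connected, undirected graph with at least three nodes.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distances_from(graph, source):
    """Breadth-first shortest-path distances from one node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def average_path_length(graph):
    """Mean shortest-path distance over all pairs of distinct nodes."""
    n = len(graph)
    total = sum(sum(distances_from(graph, s).values()) for s in graph)
    return total / (n * (n - 1))


def degree_centralization(graph):
    """Freeman degree centralization: 1.0 for a star, 0.0 for a ring."""
    n = len(graph)
    degrees = [len(neighbors) for neighbors in graph.values()]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


# Hypothetical five-node examples: one hub with four leaves, and a cycle.
STAR = build_graph([("hub", leaf) for leaf in "abcd"])
RING = build_graph([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")])
```

The star is fully centralized and its hub is the single point of failure the slide mentions, while the ring has no dominant node at all.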
  • 22. Demo
  • 23. Yes, we're hiring too...
  • 24. Resources...
      http://www.slideshare.net/padday/the-real-life-social-network-v2 (Paul Adams)
      http://xrime.sourceforge.net/
        • SNA Metrics and Structures
      www.datameer.com
      twitter.com/datameer
      http://github.com/sgroschupf
      sg@datameer.com