HUG August 2010: What's the Buzz? (Bay Area)

Stefan Groschupf, the co-founder and CTO of Datameer, will discuss challenges in social media analytics and how to overcome them using big data analytics built on Hadoop in his talk “Social Media: What’s Really the Buzz?”. Identifying true thought leaders and influencers in social media conversations is becoming increasingly important, so that companies can better understand who is influencing their customers' buying decisions. Rather than counting mentions in limited subsets of social media data, organizations need a solution that can uncover complex relationships buried in massive volumes of social media data, and a way to bring in data from multiple online sources to determine the quality and effectiveness of user commentary.

Published in: Technology

  1. What’s Really the Buzz? Stefan Groschupf, sg@datameer.com. © Datameer, Inc 2010
  2. How it started...
  3. How it started...
  4. How it started...
  5. Agenda
  6. Street Cred
      Long-time open source contributor
      http://github.com/sgroschupf/zkclient
      http://github.com/sgroschupf/aws-tasks
  7. Cubicle Cred
      Cloud Computing Architect
      Hadoop consultant at e.g. [company logos]
      Co-Founder/CEO Scale Unlimited
      Co-Founder/CEO Datameer Inc.
  8. Hadoop vs. DB
      Hadoop (mergesort)
      DB (b-tree/index)
  9. Twitter
  10. Stack
  11. Stack
      AWS
        • ...
      Hadoop
        • ...
      Datameer
        • Spreadsheet compiles into MapReduce jobs, + much more
        • Commercial
      Gephi
        • Graph visualization, GPL, NetBeans based
  12. Expenses
      What          Price        Description
      Client EC2    $65.00       0.085/h * 24 * 31
      S3 Storage    $7.50        0.15/GB/m * 50
      EMR           $24.00       0.20 * 20 * 6
      Datameer      ~$20         private beta
      Gephi         $0.00        GPL
      Development   $8.99 + tax  6pk Pilsner Urquell
  13. Pipeline (diagram components): Client on EC2; S3; EMR; Datameer; CSV/GEXF; Gephi
  14. Twitter Client (diagram components): TwitterClient on an EC2 server; Basic Auth; Compression; Upload; three threads; S3; size labels 256 MB and 50 MB
  15. Twitter API
      Streaming API
        curl http://stream.twitter.com/1/statuses/sample.json -ustefan:somepwd
        • Streams, interrupted once in a while
        • 5% of public statuses by default
      Search API
        curl http://search.twitter.com/search.json?q=datameer -ustefan:somepwd
        • 150 requests per hour
  16. Some Twitter Fields
      User: user_screen_name, user_followers_count, user_friends_count, user_statuses_count, user_location
      Tweet: text, source, created_at
      Geo: geo_type, geo_longitude, geo_latitude
  17. Simple Analytics
        • Trending topics
        • Tweet spread timing
        • Topic reach
        • Topic location reach
        • Etc.
  18. Demo (http://en.wikipedia.org/wiki/Social_network, http://www.orgnet.com)
  19. From Stream to Network
      Understand tweets as a network plus metadata (time, geo, etc.)
      ReTweet => Citation => Link => PageRank
      Twitter friends (not accessible from the Streaming API)
        • Technically not realistic to get
        • Quality?
  20. Social Network Analytics (David Krackhardt)
      Degree: number of direct connections a node has (Diane)
      Betweenness: broker, point of failure, high influence on flow (Heather)
      Closeness: shortest paths to all others (Fernando, Garth)
  21. Social Network Analytics
      Centralization: a centralized network is dominated by one or a few very central nodes; single point of failure
      Prestige: the term for a node's centrality
      Reach: degree to which any member of a network can reach other members
      Boundary Spanners: central in the overall network, bridging clusters; innovators, since information comes from multiple clusters
      Path Length: the distances between pairs of nodes in the network. Average path length is the average of these distances between all pairs of nodes.
      Sources: http://en.wikipedia.org/wiki/Social_network, http://www.orgnet.com
  22. Demo
  23. Yes, we're hiring too...
  24. Resources...
      http://www.slideshare.net/padday/the-real-life-social-network-v2 (Paul Adams)
      http://xrime.sourceforge.net/
        • SNA Metrics and Structures
      www.datameer.com
      twitter.com/datameer
      http://github.com/sgroschupf
      sg@datameer.com
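
Slide 16 lists flat field names such as user_screen_name and geo_latitude, while the Streaming API on slide 15 delivers one JSON status per line. The following is a minimal Python sketch of how such a line could be flattened. The nested layout (a "user" object and a "geo" object whose coordinates are ordered latitude, longitude), the flatten_status helper, and the sample status are assumptions for illustration; the slides do not show the client's actual parsing code.

```python
import json


def flatten_status(line):
    """Flatten one JSON status line into the field names listed on slide 16."""
    status = json.loads(line)
    # Assumed layout: nested "user" and "geo" objects, either of which may be null.
    user = status.get("user") or {}
    geo = status.get("geo") or {}
    # Assumed coordinate order: [latitude, longitude].
    latitude, longitude = geo.get("coordinates") or (None, None)
    return {
        "user_screen_name": user.get("screen_name"),
        "user_followers_count": user.get("followers_count"),
        "user_friends_count": user.get("friends_count"),
        "user_statuses_count": user.get("statuses_count"),
        "user_location": user.get("location"),
        "text": status.get("text"),
        "source": status.get("source"),
        "created_at": status.get("created_at"),
        "geo_type": geo.get("type"),
        "geo_latitude": latitude,
        "geo_longitude": longitude,
    }


# Hypothetical sample status, not real data.
SAMPLE_LINE = json.dumps({
    "text": "RT @datameer: HUG tonight",
    "source": "web",
    "created_at": "Wed Aug 18 19:00:00 +0000 2010",
    "user": {
        "screen_name": "example_user",
        "followers_count": 42,
        "friends_count": 7,
        "statuses_count": 100,
        "location": "Bay Area",
    },
    "geo": {"type": "Point", "coordinates": [37.4, -122.1]},
})

row = flatten_status(SAMPLE_LINE)
print(row["user_screen_name"], row["geo_latitude"], row["geo_longitude"])
```

A flat row like this maps directly onto spreadsheet columns, which fits the spreadsheet-style tooling named on slide 11.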
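
Slide 19 reads a retweet as a citation, a citation as a link, and a link as input to PageRank. The sketch below illustrates that chain in plain Python under several assumptions: retweets are recognised by the "RT @name" text convention, each retweet becomes an edge from the retweeter to the quoted user, and PageRank is computed by simple power iteration with a damping factor of 0.85. The helper names and the sample tweets are hypothetical, and the talk itself ran this kind of analysis on Hadoop, not in a single process.

```python
import re

# Assumption: retweets follow the "RT @name" text convention.
RT_PATTERN = re.compile(r"\bRT @(\w+)")


def retweet_edges(tweets):
    """Turn (screen_name, text) pairs into (retweeter, retweeted) edges."""
    edges = []
    for screen_name, text in tweets:
        match = RT_PATTERN.search(text)
        if match and match.group(1) != screen_name:
            edges.append((screen_name, match.group(1)))
    return edges


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in out_links.items():
            if not targets:
                continue
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical sample tweets as (screen_name, text) pairs.
TWEETS = [
    ("alice", "Hadoop meetup tonight"),
    ("bob", "RT @alice: Hadoop meetup tonight"),
    ("carol", "RT @alice: Hadoop meetup tonight"),
    ("dave", "RT @bob: RT @alice: Hadoop meetup tonight"),
]

ranks = pagerank(retweet_edges(TWEETS))
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(name, round(ranks[name], 3))
```

In this toy input the user who is retweeted most ends up with the highest rank, which is the intuition behind treating retweets as links.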
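
The measures on slides 20 and 21 (degree, betweenness, closeness, average path length) can be sketched on a small graph. The names on slide 20 (Diane, Heather, Fernando, Garth) match Krackhardt's well-known "kite" example network, so the code below uses the edge list commonly given for that network. The edge list is an assumption, since the transcript does not include the slide's figure, and the function names are illustrative. Betweenness uses Brandes' algorithm for unweighted graphs.

```python
from collections import deque

# Assumed edge list of Krackhardt's kite network (not in the transcript).
KITE_EDGES = [
    ("Andre", "Beverly"), ("Andre", "Carol"), ("Andre", "Diane"),
    ("Andre", "Fernando"), ("Beverly", "Diane"), ("Beverly", "Ed"),
    ("Beverly", "Garth"), ("Carol", "Diane"), ("Carol", "Fernando"),
    ("Diane", "Ed"), ("Diane", "Fernando"), ("Diane", "Garth"),
    ("Ed", "Garth"), ("Fernando", "Garth"), ("Fernando", "Heather"),
    ("Garth", "Heather"), ("Heather", "Ike"), ("Ike", "Jane"),
]


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree(adj):
    return {v: len(neighbours) for v, neighbours in adj.items()}


def closeness(adj):
    """(n - 1) / sum of shortest-path lengths; assumes a connected graph."""
    n = len(adj)
    return {v: (n - 1) / sum(bfs_distances(adj, v).values()) for v in adj}


def betweenness(adj):
    """Unnormalised betweenness via Brandes' algorithm (undirected graph)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, dist = [], {s: 0}
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}


def average_path_length(adj):
    """Mean shortest-path length over all ordered node pairs."""
    n = len(adj)
    total = sum(sum(bfs_distances(adj, v).values()) for v in adj)
    return total / (n * (n - 1))


adj = build_adjacency(KITE_EDGES)
deg, clo, btw = degree(adj), closeness(adj), betweenness(adj)
apl = average_path_length(adj)
print("highest degree:", max(deg, key=deg.get))
print("highest betweenness:", max(btw, key=btw.get))
print("highest closeness:", sorted(v for v in clo if clo[v] == max(clo.values())))
```

Under the assumed edge list, the three measures single out different people, which is the point of the slide: the best-connected node is not necessarily the broker or the node closest to everyone else.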

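
The three infrastructure rows of the expenses table on slide 12 give their price as a product of factors. A small Python check, using only the numbers printed on the slide, recomputes those products. The slide does not name every factor (for example, what 20 and 6 stand for in the EMR row), so the code treats them as opaque multipliers. The S3 and EMR products match the listed prices, while the EC2 product works out to roughly $63.24, slightly below the $65.00 listed.

```python
# Rows from slide 12 whose description is a product of factors.
# Each entry: name -> (price listed on the slide, factors as printed).
LINE_ITEMS = {
    "Client EC2": (65.00, [0.085, 24, 31]),
    "S3 Storage": (7.50, [0.15, 50]),
    "EMR": (24.00, [0.20, 20, 6]),
}


def product(factors):
    """Multiply the factors of one table row."""
    result = 1.0
    for factor in factors:
        result *= factor
    return result


computed = {
    name: round(product(factors), 2)
    for name, (_, factors) in LINE_ITEMS.items()
}

for name, (listed, _) in LINE_ITEMS.items():
    print(f"{name}: listed ${listed:.2f}, computed ${computed[name]:.2f}")
```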