Klout changing landscape of social media

In this age of Big Data, data volumes keep growing while the technical problems and business scenarios become more complex. Compounding these complexities, data consumers are demanding faster answers to the common business questions they ask of their Big Data. This session provides concrete examples of how to address this challenge. We will highlight the use of Big Data technologies, including Hadoop and Hive, with classic BI systems such as SQL Server Analysis Services.

Session takeaways:
• Understand the architectural components surrounding Hadoop, Hive, Classic BI, and the Tier-1 BI ecosystem
• Get strategies for addressing the technical issues when working with extremely large cubes
• See how to address the technical issues when working with Big Data systems from the DBA perspective

Published in: Technology, Business

  1. How Klout is changing the landscape of social media with Hadoop and BI
     Dave Mariani, VP Engineering, Klout
     Denny Lee, Principal Program Manager, Microsoft
  2. Klout uses Big Data to unify the social web
  3. Klout’s Big Data makes all this possible
     • 15 Social Networks Processed Every Day
     • 120 Terabytes of Data Storage
     • 200,000 Indexed Users Added Every Day
     • 140,000,000 Users Indexed Every Day
     • 1,000,000,000 Social Signals Processed Every Day
     • 30,000,000,000 API Calls Delivered Every Month
     • 54,000,000,000 Rows of Data In Klout Data Warehouse
  4. Scenario and Definitions
     • Project: Collection of Events
     • Event: Captured User Action
     • Category: Attribute Type
     • Property: Event Attribute
     Example: +K (Add a topic) event; Topic {Big Data, BI}, Gender {Male}, Location {Palo Alto}
  5. Klout Event Tracker
     1. Perform A|B Testing of User Flows
     2. Optimize User Registration Funnels
     3. Monitor consumer engagement & retention (DAUs & MAUs)
     4. Flexibly track and report on user-generated events
  6. Klout Event Tracker Requirements

     Requirement                                  3rd Party Web Analytics Tools  Hadoop & Hive  BI Query Engines
     Capture & store all user and visitor events  No                             Yes            No
     Integrate internal Klout Data                No                             Yes            No
     Support queries against granular data        No                             Yes            No
     Support interactive queries                  Yes                            No             Yes
     Support 3rd party BI tools                   No                             No             Yes
     “Query-able” by custom apps                  No                             No             Yes

     TODO: Make this look good and use animation to “blend” the last 2 columns
  7. Klout Data Architecture – The Best Tool for the Job
     [Architecture diagram. Stage labels: Serve, In, Store & Enhance, Analyze. Components:]
     • Signal Collectors (Java/Scala)
     • Registrations DB (MySql)
     • Data Enhancement Engine (PIG/Hive)
     • Klout.com (Node.js)
     • Profile DB (HBase)
     • Klout API (Scala)
     • Data Warehouse (Hive)
     • Search Index (Elastic Search)
     • Mobile (ObjectiveC)
     • Streams (MongoDB)
     • Monitoring (Nagios)
     • Dashboards (Tableau)
     • Analytics Cube (SSAS)
     • Perks Analytics (Scala)
     • Event Tracker (Scala)
     TO DO: Need to animate the red boxes & make this look better + add Instrument, Collect, Persist, Query, Report information
  8. TO DO: make this look better + add Instrument, Collect, Persist, Query, Report information. If possible, merge slides 7 and 8 together
  9. A Peek into Product Insights > A|B Test Example for Viral Workflow
  10. (no text)
  11. (no text)
  12. A Peek into Product Insights > Projects: Mobile iOS
  13. Projects > Mobile iOS > Scala/JavaScript API
      DB.withConnection("cube")(implicit conn => {
        var sql1 = SQL("""
          select [[Date]].[Date]].[Date]].[MEMBER_CAPTION]]] AS date, ...
                 convert(int, [[Measures]].[Counter]]]) AS cnt
          from openquery(productinsight, '
            SELECT {[Measures].[Counter]} ON COLUMNS,
            NON EMPTY CROSSJOIN (
              exists([Date].[Date].[Date].allmembers, {[Date].[Date].&[""" + dateFormat(past) + """]:...
            ) DIMENSION PROPERTIES MEMBER_CAPTION ON ROWS
            FROM [ProductInsight]
          )
        """)
        sql1().iterator.foreach(row => {
          // process row
          val event = row[String]("event")
          // ....
        })
  14. Projects > Mobile iOS > Actual MDX
      SELECT { [Measures].[Counter], [Measures].[PreviousPeriodCounter] } ON COLUMNS,
      NON EMPTY CROSSJOIN (
        exists([Date].[Date].[Date].allmembers,
               [Date].[Date].&[2012-05-19T00:00:00]:[Date].[Date].&[2012-06-02T00:00:00]),
        [Events].[Event].[Event].allmembers
      ) DIMENSION PROPERTIES MEMBER_CAPTION ON ROWS
      FROM [ProductInsight]
      WHERE ({[Projects].[Project].[mobile-ios]})
  15. Drilling down to the Events > Query Hive using Excel
  16. Drilling down to the Events > HiveQL Query
      CREATE TABLE mobile_ios_details_20120530 AS
      SELECT
        get_json_object(json_text, '$.sid') AS sid,
        get_json_object(json_text, '$.inc') AS inc,
        get_json_object(json_text, '$.status') AS status,
        event,
        json_text
      FROM bi.event_log
      WHERE project = "mobile-ios"
        AND dt = 20120530
        AND get_json_object(json_text, '$.v') != 1.5
        AND (event = 'api_error' OR event = 'api_timeout')
      DISTRIBUTE BY get_json_object(json_text, '$.sid')
      SORT BY get_json_object(json_text, '$.sid') ASC
  17. Ad hoc Analysis > Answering Questions on the Fly
  18. Summary
      • Leverage the best tool for the function or job
      • Big Data != Business Intelligence
      • Go open source wherever possible but use commercial software when needed
  19. Any Questions? What’s next
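
Slides 4 and 5 define the Event Tracker vocabulary (project, event, category, property) and list monitoring DAUs as one of its uses. The following is a minimal Python sketch of how those terms could map onto a record and a daily-active-user count. The field names `user_id` and `day`, the `klout.com` project label, and all sample records are assumptions made for illustration; none of them come from the deck, and this is not Klout's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TrackedEvent:
    """One captured user action (hypothetical shape, not Klout's schema)."""
    project: str   # collection of events, e.g. "mobile-ios"
    event: str     # captured user action, e.g. "+K"
    user_id: str   # assumed identifier for the acting user
    day: date      # assumed calendar day of the action
    # category (attribute type) -> property values (event attributes)
    properties: dict = field(default_factory=dict)


def daily_active_users(events, project):
    """Count distinct users per day for one project."""
    users_by_day = defaultdict(set)
    for e in events:
        if e.project == project:
            users_by_day[e.day].add(e.user_id)
    return {day: len(users) for day, users in sorted(users_by_day.items())}


# Made-up sample data, loosely following the +K example on slide 4.
sample_events = [
    TrackedEvent("mobile-ios", "+K", "u1", date(2012, 5, 30),
                 {"Topic": ["Big Data", "BI"], "Gender": ["Male"],
                  "Location": ["Palo Alto"]}),
    TrackedEvent("mobile-ios", "+K", "u1", date(2012, 5, 30), {"Topic": ["BI"]}),
    TrackedEvent("mobile-ios", "api_error", "u2", date(2012, 5, 30)),
    TrackedEvent("mobile-ios", "+K", "u2", date(2012, 5, 31)),
    TrackedEvent("klout.com", "+K", "u3", date(2012, 5, 30)),
]

dau = daily_active_users(sample_events, "mobile-ios")
```

A user who triggers several events on the same day is counted once, which is why a set per day is used rather than a running counter.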

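Slides 13 and 14 build an MDX date range from member keys of the form `[Date].[Date].&[2012-05-19T00:00:00]`. The sketch below shows one way such a range and the surrounding query text could be assembled in Python. The function names, the `days_back` parameter, and the project-name check are hypothetical; the deck's own `dateFormat(past)` helper is not shown on the slides, so this is not a reconstruction of it.

```python
import re
from datetime import date, timedelta


def date_member(day):
    """Format a day as a [Date] member key in the style shown on slide 14."""
    return f"[Date].[Date].&[{day.isoformat()}T00:00:00]"


def date_range(end_day, days_back):
    """Build an MDX range 'start:end' ending at end_day and starting days_back earlier."""
    start_day = end_day - timedelta(days=days_back)
    return f"{date_member(start_day)}:{date_member(end_day)}"


def product_insight_mdx(project, end_day, days_back):
    """Assemble query text shaped like the MDX on slide 14."""
    # Assumed safeguard: only allow simple project names in the query text.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", project):
        raise ValueError(f"unexpected project name: {project!r}")
    return (
        "SELECT { [Measures].[Counter], [Measures].[PreviousPeriodCounter] } ON COLUMNS,\n"
        "NON EMPTY CROSSJOIN (\n"
        f"  exists([Date].[Date].[Date].allmembers, {date_range(end_day, days_back)}),\n"
        "  [Events].[Event].[Event].allmembers\n"
        ") DIMENSION PROPERTIES MEMBER_CAPTION ON ROWS\n"
        "FROM [ProductInsight]\n"
        f"WHERE ({{[Projects].[Project].[{project}]}})"
    )


query = product_insight_mdx("mobile-ios", date(2012, 6, 2), 14)
```

With an end date of 2012-06-02 and 14 days back, the range matches the one on slide 14.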
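
The HiveQL query on slide 16 pulls fields out of a JSON column with `get_json_object` and keeps only API error and timeout events for one project and day. The Python sketch below imitates that filter over a few in-memory rows so the selection logic can be read step by step. The helper here only handles top-level keys, the sample rows and their values are invented, and the final sort is a single global sort, whereas Hive's `SORT BY` orders rows within each reducer.

```python
import json


def get_json_object(json_text, key):
    """Rough stand-in for Hive's get_json_object with a top-level '$.key' path.

    Returns None for missing keys or malformed JSON, mirroring Hive's NULL.
    """
    try:
        return json.loads(json_text).get(key)
    except (json.JSONDecodeError, AttributeError, TypeError):
        return None


def _as_number(value):
    """Convert a value to float, or None when it is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def select_error_details(rows, project="mobile-ios", dt=20120530):
    """Apply the slide's WHERE clause and projection to a list of row dicts."""
    selected = []
    for row in rows:
        if row["project"] != project or row["dt"] != dt:
            continue
        if row["event"] not in ("api_error", "api_timeout"):
            continue
        version = _as_number(get_json_object(row["json_text"], "v"))
        if version is None or version == 1.5:
            continue  # a NULL comparison drops the row in SQL as well
        selected.append({
            "sid": get_json_object(row["json_text"], "sid"),
            "inc": get_json_object(row["json_text"], "inc"),
            "status": get_json_object(row["json_text"], "status"),
            "event": row["event"],
            "json_text": row["json_text"],
        })
    return sorted(selected, key=lambda r: str(r["sid"] or ""))


# Invented sample rows; only the column names follow the slide.
rows = [
    {"project": "mobile-ios", "dt": 20120530, "event": "api_timeout",
     "json_text": '{"sid": "s2", "inc": "7", "status": "504", "v": "1.4"}'},
    {"project": "mobile-ios", "dt": 20120530, "event": "api_error",
     "json_text": '{"sid": "s1", "inc": "3", "status": "500", "v": "1.6"}'},
    {"project": "mobile-ios", "dt": 20120530, "event": "api_error",
     "json_text": '{"sid": "s3", "v": "1.5"}'},   # excluded: version is 1.5
    {"project": "mobile-ios", "dt": 20120530, "event": "login",
     "json_text": '{"sid": "s4", "v": "1.6"}'},   # excluded: other event
    {"project": "mobile-ios", "dt": 20120530, "event": "api_error",
     "json_text": "not json"},                    # excluded: malformed JSON
]

details = select_error_details(rows)
```

Rows whose JSON lacks a `v` field are dropped here because the comparison against 1.5 evaluates to NULL in SQL, which is easy to overlook when reading the original query.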