Four Problems You Run into When DIY-ing a “Big Data” Analytics System

Tech Talk at the Treasure Data and Context Logic Meetup on 1/17

  • Speaker note: We have to mention that we cannot disclose some customers’ names here, including some of the world’s largest enterprises and one of the world’s largest web companies.

Transcript

  • 1. Four Problems You Run into When DIY-ing a “Big Data” analytic system (and how to solve them. Hint: Treasure Data), by Kiyoto Tamura & Jeff Yuan
  • 2. Before we begin…
  • 3. <announcements size=“two”>
  • 4. 1. we are hiring!
  • 5. 1. WE ARE HIRING!
  • 6. We are looking for…
  • 7. Lead UI/UX Designer
  • 8. 0
  • 9. which means…
  • 10. design the entire UI/UX
  • 14. Anything that makes our customers’ experience BETTER
  • 15. super important, high-responsibility
  • 16. Face of our service
  • 17. Lead UI/UX Designer
  • 18. careers@treasure-data.com
  • 19. We are also looking for…
  • 20. Engineers
  • 22. (Hadoop) Engineers
  • 25. Distributed Systems
  • 26. specifically
  • 27. (multi-tenant) Hadoop
  • 28. Open Source!
  • 32. (code slide, repeated verbatim on slide 33)

```python
import msgpack


class MemcacheList(object):
    def __init__(self, connection):
        # Assumed: a memcached client with an append(key, value) method,
        # e.g. python-memcached's memcache.Client.
        self.connection = connection

    def push(self, key, value):
        """Add an element to the end of the list (memcached append adds to the end)."""
        packed = msgpack.packb(value)
        self.connection.append(key, packed)

    def _unpack(self, data):
        # b"\x90" is msgpack's encoding of an empty array.
        if data == b"\x90":
            return [], 0
        _unpacker = msgpack.Unpacker()
        _unpacker.feed(data)
        # The slide cuts off here; the two lines below are an assumed completion.
        items = list(_unpacker)
        return items, len(items)
```
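The idea on slides 32-33 is to keep a whole list under a single memcached key by appending one encoded element at a time, which works because each msgpack value is self-delimiting. Below is a stdlib-only sketch of the same idea. It swaps msgpack for a 4-byte length prefix and uses a hypothetical in-memory `FakeMemcache` in place of a real client. `FramedList` and `FakeMemcache` are illustrative names, not Treasure Data code.

```python
import struct


class FakeMemcache:
    """Hypothetical in-memory stand-in for a memcached client (get/append only)."""

    def __init__(self):
        self._store = {}

    def append(self, key, data):
        # Unlike real memcached, whose append fails on a missing key,
        # this stand-in creates the key on first append.
        self._store[key] = self._store.get(key, b"") + data
        return True

    def get(self, key):
        return self._store.get(key)


class FramedList:
    """Append-only list of strings kept under one key as length-prefixed frames."""

    def __init__(self, connection):
        self.connection = connection

    def push(self, key, value):
        payload = value.encode("utf-8")
        # A 4-byte big-endian length prefix makes each element self-delimiting,
        # so concatenated frames can be split apart again on read.
        self.connection.append(key, struct.pack(">I", len(payload)) + payload)

    def items(self, key):
        data = self.connection.get(key) or b""
        result, offset = [], 0
        while offset < len(data):
            (length,) = struct.unpack_from(">I", data, offset)
            offset += 4
            result.append(data[offset:offset + length].decode("utf-8"))
            offset += length
        return result


mc = FakeMemcache()
events = FramedList(mc)
events.push("events", "login")
events.push("events", "purchase")
print(events.items("events"))  # ['login', 'purchase']
```

Appends are cheap, but every read fetches and decodes the whole value. Memcached may also evict the key at any time, so this pattern suits caches rather than durable storage.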
  • 35. (more on Fluentd later)
  • 36. #OneMoreThing
  • 38. “way better than C++!”
  • 39. according to a committer
  • 40. (who works at Treasure Data)
  • 43. www.treasure-data.com/careers/
  • 44. 1. We are hiring!
  • 45. 2. Discounts for Our Service!
  • 46. (ask us for the secret coupon code)
  • 47. 30% OFF
  • 48. 6 months
  • 50. </announcements>
  • 51. Four Problems You Run into When DIY-ing a “Big Data” analytic system.
  • 53. Hadoop-as-a-Service!
  • 54. It’s a great idea
  • 55. more accessible and useful
  • 56. but also
  • 57. not so easy to implement
  • 58. e.g.
  • 60. (zoom out)
  • 62. Hadoop-as-a-Service
  • 63. good in theory, lots of work in reality
  • 64. That’s where we come in!
  • 65. Easiest (and most cost-effective) way to get answers about my data!
  • 66. Collect/Store, Query, Access, Scale
  • 67. 1. How do I collect my data, and how do I store it? Stream (access logs, standard error); bulk (historical data, sales transactions, etc.); secure and reliable storage!
  • 68. (Diagram) Client and server data sources (Apache, apps, RDBMS, other data sources) send CSV or JSON to the Treasure Data API layer.
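For the stream path on slides 67-68, a collector usually buffers records and ships them in batches instead of making one request per log line. The sketch below is a hypothetical, minimal batching buffer, not td-agent or Fluentd code. The `send` callback, the thresholds, and the newline-delimited JSON chunk format are all assumptions.

```python
import json
import time
from typing import Callable, List, Optional


class BufferedForwarder:
    """Buffer records and flush them as newline-delimited JSON chunks.

    A flush happens when the buffer holds max_records records, or when the
    oldest buffered record is at least max_age_s seconds old. The age check
    runs only when a new record arrives; call flush() on shutdown.
    """

    def __init__(self, send: Callable[[bytes], None], max_records: int = 1000,
                 max_age_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.send = send  # hypothetical uploader, e.g. an HTTP POST to an import API
        self.max_records = max_records
        self.max_age_s = max_age_s
        self.clock = clock
        self._buf: List[dict] = []
        self._first_ts: Optional[float] = None

    def emit(self, record: dict) -> None:
        if not self._buf:
            self._first_ts = self.clock()
        self._buf.append(record)
        too_big = len(self._buf) >= self.max_records
        too_old = self.clock() - self._first_ts >= self.max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        if not self._buf:
            return
        chunk = "\n".join(json.dumps(r) for r in self._buf).encode("utf-8")
        self.send(chunk)
        self._buf = []
        self._first_ts = None


sent = []
fwd = BufferedForwarder(sent.append, max_records=2)
fwd.emit({"path": "/"})
fwd.emit({"path": "/about"})
print(sent)  # [b'{"path": "/"}\n{"path": "/about"}']
```

Injecting `clock` keeps the age-based flush testable without sleeping. A production collector would also need retries and on-disk buffering, which this sketch omits.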
  • 69. 2. How do I query my data? Ad hoc queries; scheduled queries; data schema?
  • 70. (Diagram) Users and apps (command line, console; JDBC, ODBC, REST) -> Query API -> processing layer (Hive; Pig to be supported) -> MapReduce jobs on the Hadoop cluster, with data in Amazon S3.
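Slide 69 leaves “Data schema?” as an open question. One common answer for log data is schema-on-read: records are stored as loosely structured JSON, and field names and defaults are applied only when a query runs. The function below is a hypothetical, in-memory illustration of an ad hoc `SELECT status, COUNT(*) ... GROUP BY status`. It does not show how Hive or Treasure Data executes queries.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def count_by(lines: Iterable[str], field: str, default: str = "(null)") -> Dict[str, int]:
    """Count JSON records per value of `field`, parsing each line at query time.

    Roughly: SELECT field, COUNT(*) FROM logs GROUP BY field
    """
    counts: Counter = Counter()
    for line in lines:
        record = json.loads(line)
        # Schema-on-read: a record without the field is bucketed under
        # `default` instead of being rejected at load time.
        counts[str(record.get(field, default))] += 1
    return dict(counts)


logs = [
    '{"status": 200, "path": "/"}',
    '{"status": 404, "path": "/missing"}',
    '{"status": 200, "path": "/about"}',
    '{"path": "/no-status"}',
]
print(count_by(logs, "status"))  # {'200': 2, '404': 1, '(null)': 1}
```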
  • 72. 3. How do different users in my org access query results? Different roles need to access results from different interfaces: analysts -> Excel; devs -> REST, MySQL.
  • 73. (Diagram) From Treasure Data, analysts get results via Google Spreadsheet and ODBC -> Excel (coming Q1); engineers via MySQL, Postgres, JDBC, the REST API, and POST to a web server.
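Slides 72-73 describe one result set delivered differently per role. Below is a minimal sketch of that fan-out, assuming results arrive as column names plus rows. It produces CSV for spreadsheet users, JSON for developers, and a JSON POST request for a web server. The function names and the example URL are hypothetical, and the request is built but never sent.

```python
import csv
import io
import json
import urllib.request
from typing import List, Sequence


def to_csv(columns: Sequence[str], rows: List[Sequence]) -> str:
    """Render a result set as CSV text (e.g. for Excel or a spreadsheet import)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


def to_json(columns: Sequence[str], rows: List[Sequence]) -> str:
    """Render a result set as a JSON array of objects (e.g. for a REST client)."""
    return json.dumps([dict(zip(columns, row)) for row in rows])


def build_post(url: str, columns: Sequence[str], rows: List[Sequence]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST of the results to a web server."""
    return urllib.request.Request(
        url,
        data=to_json(columns, rows).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


columns = ["status", "hits"]
rows = [[200, 2], [404, 1]]
print(to_csv(columns, rows), end="")
# status,hits
# 200,2
# 404,1
req = build_post("https://example.com/hooks/query-results", columns, rows)
print(req.get_method(), req.get_header("Content-type"))  # POST application/json
```

Keeping the query result in one neutral shape (columns plus rows) means each new delivery channel is just another small renderer.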
  • 74. 4. How do I scale? More data? More queries?
  • 75. Don’t worry, we’ll take care of it!
  • 76. (Chart) Number of records in TD, in billions, Sep 2011 to Aug 2012. January 2013: now over 200 billion!
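Slides 74-76 promise that growth in data and queries is handled for the user. One widely used technique for keeping log queries fast as volume grows is time-based partitioning: a query over a time range reads only the partitions that overlap it. The sketch below illustrates that general idea with one-hour buckets held in memory. It is an assumption for illustration, not a description of Treasure Data's storage layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

HOUR = 3600  # partition width in seconds (assumed: one hour)


def partition(records: Iterable[dict]) -> Dict[int, List[dict]]:
    """Group records by hour bucket, using the Unix timestamp in record["time"]."""
    parts: Dict[int, List[dict]] = defaultdict(list)
    for record in records:
        parts[record["time"] // HOUR].append(record)
    return dict(parts)


def scan(parts: Dict[int, List[dict]], start: int, end: int) -> Tuple[List[dict], int]:
    """Return records with start <= time < end, plus how many partitions were read."""
    matches, partitions_read = [], 0
    for bucket, records in sorted(parts.items()):
        lo, hi = bucket * HOUR, (bucket + 1) * HOUR
        if hi <= start or lo >= end:
            continue  # bucket lies entirely outside the range: pruned unread
        partitions_read += 1
        matches.extend(r for r in records if start <= r["time"] < end)
    return matches, partitions_read


parts = partition([{"time": 0}, {"time": 10}, {"time": 3600}, {"time": 7300}])
print(scan(parts, 3600, 7200))  # ([{'time': 3600}], 1)
```

The query touched one of three partitions. With real data spread across many hours, pruning keeps the cost of a time-bounded query roughly proportional to the range queried rather than to total data volume.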
  • 77. (Diagram) Treasure Data high-level architecture: log data, application data, operational databases, 3rd-party data, sensor data, and web/mobile data are collected with td-agent into Treasure Data (subscription data warehouse, analytics interface), which serves spreadsheets and BI tools via SQL, JDBC, ODBC, and the CLI.
  • 78. Our Customers – Fortune Global 500 leaders and start-ups, including:
  • 79. Japan’s #1 recipe website: 15 million users, 1 million recipes
  • 80. MySQL to TD (Before)
  • 81. MySQL to TD (Before)
  • 82. MySQL to TD (After)
  • 83. Europe’s largest independent mobile ad exchange: 20 billion impressions/month, 15,000+ mobile apps
  • 84. Two Weeks From Start to Finish!