Four Problems You Run into When DIY-ing a “Big Data” Analytics System
Tech Talk at the Treasure Data and Context Logic Meetup on 1/17

  • We should add that we cannot disclose some customers’ names here, including some of the world’s largest enterprises and one of the world’s largest web companies.

Four Problems You Run into When DIY-ing a “Big Data” Analytics System Presentation Transcript

  • 1. Four Problems You Run into When DIY-ing a “Big Data” analytic system (and how to solve them. Hint: Treasure Data). Kiyoto Tamura & Jeff Yuan
  • 2. Before we begin…
  • 3. <announcements size=“two”>
  • 4. 1. we are hiring!
  • 5. 1. WE ARE HIRING!
  • 6. We are looking for…
  • 7. Lead UI/UX Designer
  • 8. 0
  • 9. which means…
  • 10. design the entire UI/UX
  • 14. Anything that makes our customer’s experience BETTER
  • 15. super important, high-responsibility
  • 16. Face of our service
  • 17. Lead UI/UX Designer
  • 18. careers@treasure-data.com
  • 19. We are also looking for…
  • 20. Engineers
  • 22. (Hadoop) Engineers
  • 25. Distributed Systems
  • 26. specifically
  • 27. (multi-tenant) Hadoop
  • 28. Open Source!
  • 32. Code excerpt:

        import msgpack

        class MemcacheList(object):
            def push(self, key, value):
                """ Add an element to the front of the list """
                # Serialize the value and append the packed bytes to the stored entry.
                packed = msgpack.packb(value)
                self.connection.append(key, packed)

            def _unpack(self, data):
                # 0x90 is the MessagePack encoding of an empty array.
                if data == b"\x90":
                    return [], 0
                _unpacker = msgpack.Unpacker()
                _unpacker.feed(data)
  • 35. (more on Fluentd later)
  • 36. #OneMoreThing
  • 38. “way better than C++!”
  • 39. according to a committer
  • 40. (who works at Treasure Data)
  • 43. www.treasure-data.com/careers/
  • 44. 1. We are hiring!
  • 45. 2. Discounts for Our Service!
  • 46. (ask us for the secret coupon code)
  • 47. 30% OFF
  • 48. 6 months
  • 50. </announcements>
  • 51. Four Problems You Run into When DIY-ing a “Big Data” analytic system.
  • 53. Hadoop as-a-Service!
  • 54. It’s a great idea
  • 55. more accessible and useful
  • 56. but also
  • 57. not so easy to implement
  • 58. e.g.
  • 60. (zoom out)
  • 62. Hadoop as-a-Service
  • 63. good in theory, lots of work in reality
  • 64. That’s where we come in!
  • 65. Easiest (and most cost-effective) way to get answers about my data!
  • 66. Collect/Store, Query, Access, Scale
  • 67. 1. How do I collect my data and how do I store it?
      • Stream (access logs, standard error)
      • Bulk (historical data, sales transactions, etc.)
      • Secure and reliable storage!
  • 68. Diagram labels: Client; Server (Apache, App, App); RDBMS; Other data sources; Treasure Data API Layer; csv; json
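A minimal sketch of the streaming half of problem 1, assuming events are buffered in memory and shipped in batches as newline-delimited JSON. `BufferedCollector`, the `sink` callable, and the `apache.access` tag are hypothetical names chosen for illustration. This is not td-agent or the Treasure Data API, and a real collector would also need retries and durable buffering.

```python
import json
import time


class BufferedCollector:
    """Buffers events in memory and flushes them in batches (hypothetical helper)."""

    def __init__(self, sink, max_events=3):
        self.sink = sink              # callable that receives one NDJSON string per batch
        self.max_events = max_events  # flush threshold
        self.buffer = []

    def emit(self, tag, record):
        """Queue one event and flush automatically once the buffer is full."""
        event = {"tag": tag, "time": int(time.time()), "record": record}
        self.buffer.append(event)
        if len(self.buffer) >= self.max_events:
            self.flush()

    def flush(self):
        """Send all buffered events as newline-delimited JSON, then clear the buffer."""
        if not self.buffer:
            return
        payload = "\n".join(json.dumps(e, sort_keys=True) for e in self.buffer)
        self.sink(payload)
        self.buffer = []


batches = []  # stands in for the remote storage endpoint (assumption)
collector = BufferedCollector(batches.append, max_events=3)
for status in (200, 200, 404, 500):
    collector.emit("apache.access", {"path": "/", "status": status})
collector.flush()  # push the final partial batch
```

Bulk loads (historical data, sales transactions) fit the same shape: read rows from a file or database and pass them through `emit`, or hand the whole file to the sink in one call.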
  • 69. 2. How do I query my data?
      • Ad hoc queries
      • Scheduled queries
      • Data schema?
  • 70. Diagram labels: User; Cmdline, console; Apps (JDBC, ODBC, REST); Query API; Hive, Pig (to be supported); Processing Layer; MapReduce jobs; Cluster; Hadoop cluster; Amazon S3
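To make the ad hoc versus scheduled distinction concrete, here is a small sketch that uses Python's built-in `sqlite3` as a stand-in for the Hive-style SQL layer. The `access_log` table, its columns, the sample rows, and the `run_query` helper are all invented for illustration; nothing here is the actual Treasure Data query API.

```python
import sqlite3

# In-memory database standing in for the hosted query engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (path TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO access_log VALUES (?, ?)",
    [("/", 200), ("/", 200), ("/login", 200), ("/login", 500), ("/cart", 404)],
)


def run_query(sql):
    """Run one SQL statement and return all result rows."""
    return conn.execute(sql).fetchall()


# An ad hoc question: which paths get the most hits?
HITS_PER_PATH = """
SELECT path, COUNT(*) AS hits
FROM access_log
GROUP BY path
ORDER BY hits DESC, path
"""

rows = run_query(HITS_PER_PATH)
for path, hits in rows:
    print(path, hits)
```

A scheduled query would be the same SQL text submitted by a timer (cron, for example) instead of by a person at a console, with the result written somewhere the next consumer can read it.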
  • 72. 3. How do different users in my org access query results? Different roles need to access results from different interfaces:
      • Analysts -> Excel
      • Devs -> REST, MySQL
  • 73. Diagram labels: Treasure Data; Analysts: Google Spreadsheet, ODBC -> Excel (Coming Q1); Engineers: MySQL, Postgres, JDBC, REST API, POST to web server
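One way to picture "different roles, different interfaces" is a single result set rendered per role: CSV that opens in Excel for analysts, JSON for a REST client for developers. The sketch below assumes exactly that; the role names, `EXPORTERS` table, and sample rows are hypothetical and do not describe how Treasure Data delivers results.

```python
import csv
import io
import json

# Sample query result (invented for illustration).
COLUMNS = ["path", "hits"]
ROWS = [("/", 2), ("/login", 2), ("/cart", 1)]


def to_csv(columns, rows):
    """Render rows as CSV text with a header line, suitable for a spreadsheet."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


def to_json(columns, rows):
    """Render rows as a JSON array of objects, suitable for a REST response."""
    return json.dumps([dict(zip(columns, row)) for row in rows])


# Map each role to the output format it prefers.
EXPORTERS = {"analyst": to_csv, "developer": to_json}


def export_for(role, columns, rows):
    """Pick the exporter for a role and apply it to the result set."""
    return EXPORTERS[role](columns, rows)


csv_text = export_for("analyst", COLUMNS, ROWS)
json_text = export_for("developer", COLUMNS, ROWS)
```

Adding another destination, such as an insert into MySQL or a POST to a web server, would mean registering one more function in `EXPORTERS`.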
  • 74. 4. How do I scale? More data? More queries?
  • 75. Don’t worry, we’ll take care of it!
  • 76. Chart: Number of records in TD (in billions), September 2011 through August 2012, y-axis from 20 to 120. January 2013: now over 200 billion!
  • 77. Treasure Data High-Level Architecture. Diagram labels: Log Data; Application Data; Web/Mobile Data; Sensor Data; Operational Databases; 3rd Party Data; td-agent; Treasure Data; Data Warehouse; Subscribe; SQL; Interface; Analytics; JDBC; ODBC; CLI; Spread Sheets; BI Tools
  • 78. Our Customers – Fortune Global 500 leaders and start-ups including:
  • 79. Japan’s #1 recipe website: 15 million users, 1 million recipes
  • 80. MySQL to TD (Before)
  • 82. MySQL to TD (After)
  • 83. Europe’s largest independent mobile ad exchange: 20 billion impressions/month, 15,000+ mobile apps
  • 84. Two Weeks From Start to Finish!