Four Problems You Run into When DIY-ing a “Big Data” Analytics System

Tech Talk at the Treasure Data and Context Logic Meetup on 1/17

Published in: Technology
  • Note: We cannot disclose some customers’ names here, including some of the world’s largest enterprises and one of the world’s largest web companies.

    1. Four Problems You Run into When DIY-ing a “Big Data” Analytic System (and how to solve them. Hint: Treasure Data). Kiyoto Tamura & Jeff Yuan
    2. Before we begin…
    3. <announcements size=“two”>
    4. 1. we are hiring!
    5. 1. WE ARE HIRING!
    6. We are looking for…
    7. Lead UI/UX Designer
    8. 0
    9. which means…
    10. design the entire UI/UX
    14. Anything that makes our customer’s experience BETTER
    15. super important, high-responsibility
    16. Face of our service
    17. Lead UI/UX Designer
    18. careers@treasure-data.com
    19. We are also looking for…
    20. Engineers
    22. (Hadoop) Engineers
    25. Distributed Systems
    26. specifically
    27. (multi-tenant) Hadoop
    28. Open Source!
    32. Code:

            import msgpack

            class MemcacheList(object):
                def push(self, key, value):
                    """ Add an element to the front of the list """
                    # Serialize with MessagePack, then append the bytes to the stored value.
                    packed = msgpack.packb(value)
                    self.connection.append(key, packed)

                def _unpack(self, data):
                    # 0x90 is MessagePack's encoding of an empty array.
                    if data == b"\x90":
                        return [], 0
                    _unpacker = msgpack.Unpacker()
                    _unpacker.feed(data)
    35. (more on Fluentd later)
    36. #OneMoreThing
    38. “way better than C++!”
    39. according to a committer
    40. (who works at Treasure Data)
    43. www.treasure-data.com/careers/
    44. 1. We are hiring!
    45. 2. Discounts for Our Service!
    46. (ask us for the secret coupon code)
    47. 30% OFF
    48. 6 months
    50. </announcements>
    51. Four Problems You Run into When DIY-ing a “Big Data” Analytic System
    53. Hadoop as-a-Service!
    54. It’s a great idea
    55. more accessible and useful
    56. but also
    57. not so easy to implement
    58. e.g.
    60. (zoom out)
    62. Hadoop as-a-Service
    63. good in theory, lots of work in reality
    64. That’s where we come in!
    65. Easiest (and most cost-effective) way to get answers about my data!
    66. Collect/Store, Query, Access, Scale
    67. 1. How do I collect my data and how do I store it?
        • Stream (access logs, standard error)
        • Bulk (historical data, sales transactions, etc.)
        • Secure and reliable storage!
    68. Diagram: data sources (Client, Server, Apache, App, App, RDBMS, other data sources) feeding the Treasure Data API Layer as csv or json
    69. 2. How do I query my data?
        • Ad hoc queries
        • Scheduled queries
        • Data schema?
    70. Diagram labels: User; Cmdline, console; Apps (JDBC, ODBC, REST); Query API; Processing Layer: HIVE, PIG (to be supported); MapReduce Jobs; Hadoop cluster; Amazon S3
    72. 3. How do different users in my org access query results? Different roles need to access results from different interfaces
        • Analysts -> Excel
        • Devs -> REST, MySQL
    73. Diagram: Treasure Data to Analysts via Google Spreadsheet and ODBC -> Excel (Coming Q1); Treasure Data to Engineers via MySQL, Postgres, JDBC, REST API, POST to web server
    74. 4. How do I scale? More data? More queries?
    75. Don’t worry, we’ll take care of it!
    76. Chart: Number of records in TD (in billions), Sep 2011 to Aug 2012, y-axis from 20 to 120. January 2013: now over 200 billion!
    77. Treasure Data High-Level Architecture. Diagram labels: Log Data, Application Data, 3rd Party Data, Sensor Data, Web/Mobile Data; td-agent; Treasure Data Data Warehouse; SQL Interface; JDBC, ODBC, CLI; Subscribe; Spread Sheets, BI Tools, Operational Analytics, Databases
    78. Our Customers: Fortune Global 500 leaders and start-ups including:
    79. Japan’s #1 recipe website: 15 million users, 1 million recipes
    80. MySQL to TD (Before)
    81. MySQL to TD (Before)
    82. MySQL to TD (After)
    83. Europe’s largest independent mobile ad exchange: 20 billion imps/month, 15,000+ mobile apps
    84. Two Weeks From Start to Finish!
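
The `MemcacheList` code on slide 32 appears to rely on one idea: serialized values can be concatenated with the store's append operation and decoded one after another later. The following is a minimal sketch of that idea only, not the original class. To stay runnable on the standard library, length-prefixed JSON stands in for MessagePack, and the in-memory `FakeStore` and the `AppendList` wrapper are hypothetical names introduced for illustration.

```python
import json
import struct


class FakeStore:
    """Hypothetical in-memory stand-in for a memcache client with append()."""

    def __init__(self):
        self._data = {}

    def append(self, key, chunk):
        # Concatenate the new bytes onto whatever is already stored under key.
        self._data[key] = self._data.get(key, b"") + chunk

    def get(self, key):
        return self._data.get(key, b"")


class AppendList:
    """A list stored as concatenated, length-prefixed JSON frames."""

    def __init__(self, store):
        self.store = store

    def push(self, key, value):
        payload = json.dumps(value).encode("utf-8")
        # 4-byte big-endian length header, then the payload itself.
        self.store.append(key, struct.pack(">I", len(payload)) + payload)

    def items(self, key):
        data = self.store.get(key)
        out, offset = [], 0
        # Walk the buffer frame by frame until every byte is consumed.
        while offset < len(data):
            (size,) = struct.unpack_from(">I", data, offset)
            offset += 4
            out.append(json.loads(data[offset:offset + size].decode("utf-8")))
            offset += size
        return out


events = AppendList(FakeStore())
events.push("clicks", {"user": 1})
events.push("clicks", {"user": 2})
result = events.items("clicks")
```

One difference to keep in mind: a real memcached `append` only succeeds on a key that already exists, so the key has to be initialized first. That may be why the slide's `_unpack` treats a lone `0x90` byte (MessagePack's empty array) as an empty list, though the slide does not say so.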

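Slide 67's stream case (access logs) and slide 68's json input suggest turning each raw log line into a structured record before upload. Here is a minimal sketch of that step, assuming Apache Common Log Format input. The regular expression, the field names, and the `parse_line` and `to_json_batch` helpers are illustrative assumptions, not Treasure Data's or td-agent's actual parser.

```python
import json
import re

# Assumed input: Apache Common Log Format, one request per line.
CLF = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict for one access-log line, or None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    # A "-" size means no response body was sent.
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec


def to_json_batch(lines):
    """Serialize all parseable lines as newline-delimited JSON."""
    records = [r for r in map(parse_line, lines) if r is not None]
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
record = parse_line(sample)
batch = to_json_batch([sample, "not a log line"])
```

Lines that do not match are silently skipped here. A real collector would more likely count or quarantine them so that malformed input is visible.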