Four Problems You Run into When DIY-ing a “Big Data” Analytics System

Tech Talk at the Treasure Data and Context Logic Meetup on 1/17

Published in: Technology

  • <<<NOTE>>> We have to add that we cannot disclose some customers’ names here, including some of the world’s largest enterprises and one of the world’s largest web companies.

    1. 1. Four Problems You Run into When DIY-ing a “Big Data” analytic system. (and how to solve them. Hint: Treasure Data) Kiyoto Tamura & Jeff Yuan
    2. 2. Before we begin… 2
    3. 3. <announcements size=“two”> 3
    4. 4. 1. we are hiring! 4
    5. 5. 1. WE ARE HIRING! 5
    6. 6. We are looking for… 6
    7. 7. Lead UI/UX Designer 7
    8. 8. 0 8
    9. 9. which means… 9
    10. 10. design the entire UI/UX 10
    11. 11. 11
    12. 12. 12
    13. 13. 13
    14. 14. Anything that makes our customer’s experience BETTER 14
    15. 15. super important, high-responsibility 15
    16. 16. Face of our service 16
    17. 17. Lead UI/UX Designer 17
    18. 18. careers@treasure-data.com 18
    19. 19. We are also looking for… 19
    20. 20. Engineers 20
    21. 21. 21
    22. 22. (Hadoop) Engineers 22
    23. 23. 23
    24. 24. 24
    25. 25. Distributed Systems 25
    26. 26. specifically 26
    27. 27. (multi-tenant) Hadoop 27
    28. 28. Open Source! 28
    29. 29. 29
    30. 30. 30
    31. 31. 31
    32. 32.

            import msgpack


            class MemcacheList(object):
                def push(self, key, value):
                    """ Add an element to the front of the list """
                    # Pack the value and append the bytes to the data stored under key.
                    packed = msgpack.packb(value)
                    self.connection.append(key, packed)

                def _unpack(self, data):
                    # b"\x90" is the msgpack encoding of an empty array.
                    if data == b"\x90":
                        return [], 0
                    # Feed the raw bytes to a streaming unpacker.
                    _unpacker = msgpack.Unpacker()
                    _unpacker.feed(data)
                    # (the slide ends here)
    34. 34. 34
    35. 35. (more on Fluentd later) 35
    36. 36. #OneMoreThing 36
    37. 37. 37
    38. 38. “way better than C++!” 38
    39. 39. according to a committer 39
    40. 40. (who works at Treasure Data) 40
    41. 41. 41
    42. 42. 42
    43. 43. www.treasure-data.com/careers/ 43
    44. 44. 1. We are hiring! 44
    45. 45. 2. Discounts for Our Service! 45
    46. 46. (ask us for the secret coupon code) 46
    47. 47. 30% OFF 47
    48. 48. 6 months 48
    49. 49. 49
    50. 50. </announcements> 50
    51. 51. Four Problems You Run into When DIY-ing a “Big Data” analytic system. 51
    52. 52. 52
    53. 53. Hadoop as-a-Service! 53
    54. 54. It’s a great idea 54
    55. 55. more accessible and useful 55
    56. 56. but also 56
    57. 57. not so easy to implement 57
    58. 58. e.g. 58
    59. 59. 59
    60. 60. (zoom out) 60
    61. 61. 61
    62. 62. Hadoop as-a-Service 62
    63. 63. good in theory, lots of work in reality 63
    64. 64. That’s where we come in! 64
    65. 65. Easiest (and most cost effective) wayto get answers about my data! 65
    66. 66.  Collect/Store Query Access Scale 66
    67. 67. 1. How do I collect my data and how do Istore them? Stream (access logs, standard error) Bulk (historical data, sales transactions, etc.) Secure and reliable storage! 67
    68. 68. Client ServerApacheAppApp RDBMSOther data sources Treasure Data API Layer csv json 68
    69. 69. 2. How do I query my data? Ad hoc queries Scheduled queries Data schema? 69
    70. 70. Cmdline, console Query API HIVE, PIG (to be supported) Processing Layer Apps (JDBC, ClusterUser ODBC, REST) MapReduce Jobs Amazon S3 Hadoop cluster 70
    71. 71. 71
    72. 72. 3. How do different users in my orgaccess query results? Different roles need to access results from different interfaces • Analysts -> Excel • Devs -> REST, MySQL 72
    73. 73. Google Spreadsheet ODBC -> Excel (Coming Q1) AnalystsTreasure Data MySQL, Postgres JDBC, REST API POST to web server Engineers 73
    74. 74. 4. How do I scale? More data? More queries? 74
    75. 75. Don’t worry, we’ll take care of it! 75
    76. 76. Number of records in TD (in billions) [chart: y-axis 20 to 120, x-axis Sep 2011 to Aug 2012]. January 2013 – Now over 200 Billion! 76
    77. 77. Treasure Data High-Level Architecture [diagram labels: Log Data, Application Data, 3rd Party Data, Sensor Data, Web/Mobile Data; td-agent; Treasure Data Data Warehouse; Subscribe; SQL Interface; JDBC, ODBC, CLI; Spread Sheets, BI Tools, Operational Analytics, Databases] 77
    78. 78. Our Customers – Fortune Global 500 leaders and start-ups including: 78
    79. 79. Japan’s #1 recipe website: 15 million users, 1 million recipes 79
    80. 80. MySQL to TD (Before) 80
    81. 81. MySQL to TD (Before) 81
    82. 82. MySQL to TD (After) 82
    83. 83. Europe’s largest independent mobile ad exchange: 20 billion impressions/month, 15,000+ mobile apps 83
    84. 84. Two Weeks From Start to Finish! 84
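
As a rough illustration of the collection step on slides 67 and 68 (streamed log events and bulk historical data arriving as csv or json), the sketch below normalizes both kinds of input into plain records before upload. It is not Treasure Data's API: the function names, the sample data, and the convention that every record carries an integer `time` field are assumptions made for the example.

```python
import csv
import io
import json
import time


def with_time(record, default_time=None):
    """Ensure the record has an integer 'time' field (Unix seconds).

    The 'time' field is an assumption of this sketch, not something
    the slides specify.
    """
    if "time" in record:
        record["time"] = int(record["time"])
    elif default_time is not None:
        record["time"] = int(default_time)
    else:
        record["time"] = int(time.time())
    return record


def records_from_csv(text, default_time=None):
    """Bulk path: yield one dict per CSV row, keyed by the header row."""
    for row in csv.DictReader(io.StringIO(text)):
        yield with_time(dict(row), default_time)


def records_from_json_lines(text, default_time=None):
    """Stream path: yield one dict per non-empty line of JSON."""
    for line in text.splitlines():
        if line.strip():
            yield with_time(json.loads(line), default_time)


# Hypothetical sample inputs: historical sales rows and access-log events.
bulk = "time,item,amount\n1358380800,book,12\n1358384400,pen,3\n"
stream = '{"path": "/index", "status": 200}\n{"path": "/buy", "status": 500}\n'

records = list(records_from_csv(bulk))
records += list(records_from_json_lines(stream, default_time=1358388000))

for record in records:
    print(json.dumps(record, sort_keys=True))
```

Normalizing both paths into the same record shape means the storage side only has to handle one format, whichever way the data arrived.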
