Four Problems You Run into When DIY-ing a “Big Data” Analytics System

Tech Talk at the Treasure Data and Context Logic Meetup on 1/17

Published in: Technology
  • Note: We cannot disclose some customers’ names here, including some of the world’s largest enterprises and one of the world’s largest web companies.
  • Transcript

    • 1. Four Problems You Run into When DIY-ing a “Big Data” Analytics System (and how to solve them. Hint: Treasure Data). Kiyoto Tamura & Jeff Yuan
    • 2. Before we begin… 2
    • 3. <announcements size=“two”> 3
    • 4. 1. we are hiring! 4
    • 5. 1. WE ARE HIRING! 5
    • 6. We are looking for… 6
    • 7. Lead UI/UX Designer 7
    • 8. 0 8
    • 9. which means… 9
    • 10. design the entire UI/UX 10
    • 11. 11
    • 12. 12
    • 13. 13
    • 14. Anything that makes our customer’s experience BETTER 14
    • 15. super important, high-responsibility 15
    • 16. Face of our service 16
    • 17. Lead UI/UX Designer 17
    • 18. careers@treasure-data.com 18
    • 19. We are also looking for… 19
    • 20. Engineers 20
    • 21. 21
    • 22. (Hadoop) Engineers 22
    • 23. 23
    • 24. 24
    • 25. Distributed Systems 25
    • 26. specifically 26
    • 27. (multi-tenant) Hadoop 27
    • 28. Open Source! 28
    • 29. 29
    • 30. 30
    • 31. 31
    • 32. class MemcacheList(object):
              def push(self, key, value):
                  """ Add an element to the front of the list """
                  packed = msgpack.packb(value)
                  self.connection.append(key, packed)
              def _unpack(self, data):
                  if data == '\x90':
                      return [], 0
                  _unpacker = msgpack.Unpacker()
                  _unpacker.feed(data)
    • 33. class MemcacheList(object):
              def push(self, key, value):
                  """ Add an element to the front of the list """
                  packed = msgpack.packb(value)
                  self.connection.append(key, packed)
              def _unpack(self, data):
                  if data == '\x90':
                      return [], 0
                  _unpacker = msgpack.Unpacker()
                  _unpacker.feed(data)
    • 34. 34
    • 35. (more on Fluentd later) 35
    • 36. #OneMoreThing 36
    • 37. 37
    • 38. “way better than C++!” 38
    • 39. according to a committer 39
    • 40. (who works at Treasure Data) 40
    • 41. 41
    • 42. 42
    • 43. www.treasure-data.com/careers/ 43
    • 44. 1. We are hiring! 44
    • 45. 2. Discounts for Our Service! 45
    • 46. (ask us for the secret coupon code) 46
    • 47. 30% OFF 47
    • 48. 6 months 48
    • 49. 49
    • 50. </announcements> 50
    • 51. Four Problems You Run into When DIY-ing a “Big Data” Analytics System 51
    • 52. 52
    • 53. Hadoop as-a-Service! 53
    • 54. It’s a great idea 54
    • 55. more accessible and useful 55
    • 56. but also 56
    • 57. not so easy to implement 57
    • 58. e.g. 58
    • 59. 59
    • 60. (zoom out) 60
    • 61. 61
    • 62. Hadoop as-a-Service 62
    • 63. good in theory, lots of work in reality 63
    • 64. That’s where we come in! 64
    • 65. Easiest (and most cost effective) way to get answers about my data! 65
    • 66. Collect/Store, Query, Access, Scale 66
    • 67. 1. How do I collect my data and how do I store them? Stream (access logs, standard error); Bulk (historical data, sales transactions, etc.); Secure and reliable storage! 67
    • 68. [Diagram labels: Client; Server; Apache; App; App; RDBMS; Other data sources; Treasure Data API Layer; csv; json] 68
    • 69. 2. How do I query my data? Ad hoc queries; Scheduled queries; Data schema? 69
    • 70. [Diagram labels: Cmdline, console; Query API; Hive, Pig (to be supported); Processing Layer; Apps (JDBC, ODBC, REST); Cluster; User; MapReduce Jobs; Amazon S3; Hadoop cluster] 70
    • 71. 71
    • 72. 3. How do different users in my org access query results? Different roles need to access results from different interfaces: Analysts -> Excel; Devs -> REST, MySQL 72
    • 73. [Diagram labels: Google Spreadsheet; ODBC -> Excel (Coming Q1); Analysts; Treasure Data; MySQL, Postgres; JDBC, REST API; POST to web server; Engineers] 73
    • 74. 4. How do I scale? More data? More queries? 74
    • 75. Don’t worry, we’ll take care of it! 75
    • 76. [Chart: Number of records in TD (in billions), Sep 2011 to Aug 2012] January 2013 – Now over 200 Billion! 76
    • 77. Treasure Data High-Level Architecture [Diagram labels: Log Data; Spread Sheets; BI Tools; Application Data; Treasure Data; Subscribe; Data Warehouse; SQL; td-agent; Operational; 3rd Party Data; Interface; Analytics; JDBC; ODBC; Databases; Sensor Data; Web/Mobile Data; CLI] 77
    • 78. Our Customers – Fortune Global 500 leaders and start-ups including: 78
    • 79. Japan’s #1 recipe website: 15 million users, 1 million recipes 79
    • 80. MySQL to TD (Before) 80
    • 81. MySQL to TD (Before) 81
    • 82. MySQL to TD (After) 82
    • 83. Europe’s largest independent mobile ad exchange: 20 billion imps/month, 15,000+ mobile apps 83
    • 84. Two Weeks From Start to Finish! 84
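
The MemcacheList snippet on slides 32 and 33 is cut off, but the idea it hints at is an append-only list kept under a single cache key: each push serializes one value and appends the bytes, and a read decodes the concatenated records, with a sentinel value standing for the empty list. The sketch below is an illustration of that idea only, not the code from the talk. It swaps msgpack for newline-delimited JSON so that it runs with the standard library alone, and `FakeStore`, `AppendOnlyList`, and `SENTINEL` are hypothetical names. It also assumes memcached-like semantics, where `append` fails on a key that does not exist yet.

```python
import json


class FakeStore:
    """In-memory stand-in for a memcached client (hypothetical, for illustration)."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        # Like memcached "add": store only if the key is absent.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def append(self, key, value):
        # Like memcached "append": fails if the key does not exist yet.
        if key not in self._data:
            return False
        self._data[key] += value
        return True

    def get(self, key):
        return self._data.get(key)


class AppendOnlyList:
    # Marks an initialized, empty list (plays the role of the empty-array
    # sentinel checked in the slide's _unpack method).
    SENTINEL = b"[]\n"

    def __init__(self, store):
        self.store = store

    def push(self, key, value):
        """Append one JSON-encoded record to the bytes stored under key."""
        packed = json.dumps(value).encode("utf-8") + b"\n"
        if not self.store.append(key, packed):
            # Key missing: create it with the sentinel, then append.
            self.store.add(key, self.SENTINEL)
            self.store.append(key, packed)

    def items(self, key):
        """Decode every record stored under key, in insertion order."""
        data = self.store.get(key)
        if data is None or data == self.SENTINEL:
            return []
        records = data[len(self.SENTINEL):].splitlines()
        return [json.loads(record) for record in records]
```

A push never has to read the existing value back, which is what makes this layout attractive for write-heavy logging; the cost is that every read decodes the whole list.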
