Datameer - Stephen Groschupf - Hadoop World 2010

Multi Channel Behavioral Analytics

Stephen Groschupf
Datameer

Published in: Technology, Business

  1. “Hello World!”
  2. My Path
  3. What we do. Administrator, Analyst, Decision Maker. Import, Dashboards, Analytics, Export, ETL. Datasource: User: Password: Submit
  4. Buy a Laptop! Social, Click, Config, Call, Shop, Deliver, Register
  5. Customer Behavior
  6. ? Star Schema
  7. Big Data
  8. Cost to scale. Chart labels: Software Licenses, ETL/Man Month, Oracle Hardware (axis $0 to $1,500,000); values shown: $1,148,908, $220,000, $0 * ?. Footnote (* Slashing Data Warehouse Costs with the Vertica® Analytic Database): Server (list price): $450,108; Storage (list price): $600,000 (2x $300,000); Data center power & cooling: $43,200 / year; Data center space: $6,600 / year; Implementation: 49,000
  9. Laws of Physics. Chart: Disk, SSD, Memory; Random vs. Sequential (log axis, 1 to 1,000,000,000); values shown: 358,200,000; 42,200,000; 53,200,000; 36,700,000; 1,924; 316. Source: Adam Jacobs, The Pathologies of Big Data
  10. Seq R/W, Linear Scale, Open, Min. Admin, Unstructured, Learning, No Tools, Security, Integration, HR, Batch
  11. Hadoop Eco System: Query/Serving, Execution, Storage; Cascading, Jaql, S3
  12. Architecture, API
  13. Join: Ref, Url, Cookie, email/phone, email, email, email
  14. Click Event Stream, N-Gram Frequency
  15. Implementation effort. Chart: Learning/POC, Hardware, Integration, Analytics (axis 0 to 20)
  16. Hadoop Cost. Chart labels: Software Licenses, Integration, Hadoop Hardware, Support, Administration, Analytics (axis $0 to $300,000); values shown: $240,000, $120,000, $100,000, $50,000, $120,000, $0?
  17. Lessons learned: Growing; Remains strong; Great market research; Generates recommendations
  18. Pull vs Push: Very slow; Local buffer = risk of lost data; Monitor many agents; Complicated / Simple; Pull as often as required; Just one system to monitor; Easier to secure
  19. SQL vs. API (vs. Spreadsheets): SQL + UDF (20%) (80%) SQL SQL + UDF+
  20. @datameer www.datameer.com sg AT datameer.com
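
Slide 14 pairs a click event stream with n-gram frequency. The slides do not show how the counting is done, so the following is only a minimal sketch of the general idea in plain Python, not Datameer's implementation: it counts how often each run of n consecutive events occurs across visitor streams. The function names and the sample streams are hypothetical; the event names are borrowed from slide 4.

```python
from collections import Counter
from typing import Iterable, Sequence


def ngram_counts(events: Sequence[str], n: int) -> Counter:
    """Count every run of n consecutive events in one visitor's stream."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Streams shorter than n yield an empty range and therefore no n-grams.
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))


def ngram_frequency(streams: Iterable[Sequence[str]], n: int) -> Counter:
    """Sum the n-gram counts over all visitor streams."""
    total = Counter()
    for events in streams:
        total.update(ngram_counts(events, n))
    return total


# Hypothetical sample data (assumption): one ordered event list per visitor.
streams = [
    ["Social", "Click", "Config", "Shop"],
    ["Click", "Config", "Call", "Shop"],
    ["Social", "Click", "Config", "Shop", "Deliver"],
]

bigrams = ngram_frequency(streams, 2)
for gram, count in bigrams.most_common(3):
    print(" > ".join(gram), count)
```

Counting within each stream separately keeps an n-gram from spanning two different visitors, which a single concatenated stream would not guarantee.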