Ramunas Balukonis. Research DWH

#BigDataBY

Published in: Software

  1. Research of technologies for Big Data Analytics (2013-2014). Ramūnas Balukonis, Adform. Visit our blog: adform.com. Twitter: adforminsider
  2. Our impressions growth
     • Now: 2 billion transactions, or 1.4 TB, per day (raw)
     • In 2012 we started researching technology to process, load and provide data for analytics
     • [Chart: impressions per year, in billions of rows, 2001 to 2014; vertical axis 0 to 500]
  3. Where we are now
  4. DWH: our needs for Big Data Analytics
     • Query response within moments
     • No downtime window
     • Short time to market
     • Near-real-time latency
     • No backups
     • Unattended scaling
     • Negligible data loss and data discrepancies
  5. [No text on this slide]
  6. How we tested
     • Testing takes up to 3 months per technology
     • Test environment: 3 × (24 cores + 96 GB RAM + 800 GB RAID 10)
     • Loaded 5 TB of uncompressed data
  7. Candidates for Big Data Analytics
  8. IBM Netezza
     • Appliance: no commodity hardware
     • No elastic scale-out
     • Global presence, sales, delivery and support
  9. HP Vertica
     • Elastic scale-out
     • Brilliant performance (load/select)
     • No stored procedures
     • No UI
     • Price per TB
  10. SAP Sybase IQ
     • Scaling using shared disk
     • Similar to MS SQL (tools, logic, stored procedures, system views and SPs, BOL)
     • Concerns about ease of implementation and use
     • Price per core
  11. Amazon Redshift
     • Price: the only player we tested that publishes prices online
     • Filters badly impact query performance
     • Cluster resize/scaling
     • Unstable connection
  12. Calpont InfiniDB
     • Shared nothing
     • MySQL as front end: tools, connectors, procedures, etc.
     • Community edition (offers prebuilt solutions) or Enterprise Edition (EE)
     • Super-fast load
     • Relatively slow query performance
     • Slow insert/update/delete
  13. Where we are now
  14. What we learned
     • The number of suitable technologies drops as the TBs increase
     • Adapt the technology to your requirements, not vice versa
     • No silver bullet:
     • Queries vs row store – 10X
     • Load speed vs row store – 4X
     • Compression vs row store – 4X
     • ... and we’ll learn much more after we run our first report
  15. Thank you. Ramūnas Balukonis
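
To put the slide 2 volume (2 billion transactions, or 1.4 TB raw, per day) into per-row and per-second terms, here is a minimal sketch of the arithmetic. It assumes decimal terabytes and a perfectly even load across the day, neither of which the slides state. The compressed figure simply applies the 4X "compression vs row store" ratio from slide 14 to the raw volume, which is an illustrative assumption rather than a measured result. The function name and its parameters are hypothetical.

```python
# Back-of-the-envelope figures for the daily load quoted on slide 2.
# Assumptions (not from the slides): decimal TB, evenly spread load,
# and the slide 14 compression ratio (4X) applied directly to raw bytes.

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12  # assumption: decimal terabytes


def daily_load_stats(rows_per_day, raw_bytes_per_day, compression_ratio=4):
    """Return average row size, sustained rates and a compressed-size estimate."""
    return {
        "avg_row_bytes": raw_bytes_per_day / rows_per_day,
        "rows_per_second": rows_per_day / SECONDS_PER_DAY,
        "megabytes_per_second": raw_bytes_per_day / SECONDS_PER_DAY / 10**6,
        "compressed_tb_per_day": raw_bytes_per_day / compression_ratio / BYTES_PER_TB,
    }


if __name__ == "__main__":
    stats = daily_load_stats(
        rows_per_day=2_000_000_000,           # 2 billion transactions (slide 2)
        raw_bytes_per_day=1_400_000_000_000,  # 1.4 TB raw (slide 2)
    )
    for name, value in stats.items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the average raw row is 700 bytes, and the sustained load is roughly 23,000 rows (about 16 MB) per second. Real traffic is unlikely to be evenly spread, so peak rates would be higher.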