Why Hadoop and SQL just want to be friends - lightning talk NoSQL Matters Dublin 2014

A lightning talk from NoSQL Matters Dublin on why we need to stop doing ETL and focus on ELT, and how the Hadoop approach helps you shortcut the model, parse, query loop when processing data.

Published in: Software

  1. Why Hadoop and SQL just want to be friends. Simon Elliston Ball @sireb
  2. ETL OLTP EDW Archive ETL
  3. ETL OLTP EDW Archive ETL
  4. ETL OLTP ETL EDW Archive
  5. ETL: More data, Shorter windows, Wider queries
  6. ETL OLTP EDW Archive ETL (Sqoop, Pig, Hive, Oozie, Falcon)
  7. ETL OLTP EDW Archive ETL (Less structured, Sqoop)
  8. ELT: saving the T for later. 2012-01-06 09:22:27 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /ustensiles - 80 Test0001 94.245.127.11 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZ WX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE 6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8=;+.ASPXAUTH=D5796612E924B60496C115914 CC8F93239E99EEF4B3D6ED74BDD5C8C38D8C115D3021AB7F3B06E563EDE612BFBCBBE756803 C85DECFACCA080E890C5DA6B4CA00A51792D812C93101F648505133C9E2C10779FA3E5AC19E E5E2B7E130C72C18F6309AEB736ABD06C87A7D636976A20534833E20160EC04B6B6617B3788 45AE627979EE54 http://site.supersimple.fr/Users/Account/LogOn?ReturnUrl=%2Fustensiles site.supersimple.fr 200 0 0 7136 849 1249
  9. ELT: saving the T for later. Schema on write: Model, Parse, Store, Query
     • Keep going back to the drawing board
     • Reprocessing all the data
  10. ELT: saving the T for later. Schema on read: Store, Query, Model, Parse
     • Only model what you need
     • Agile Data Modelling
     • Don’t move the data
  11. Cost per TB...
  12. Come for the cheap storage... The Data Lake (https://www.flickr.com/photos/msvg/5891279010)
  13. ...stay for the analytics: Machine learning libraries, Recommendation systems, Batch Big Data
  14. Summary. Hadoop can:
     • Improve your ETL processing
     • Help you with unstructured data
     • Save you money
  15. Thank you! Simon Elliston Ball @sireb
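
The schema-on-read idea from slides 8 to 10 can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the raw log lines are stored untouched, and a schema is applied at query time to just the fields a question needs. The field names assume the usual IIS W3C extended log layout, which appears to match the 22 space-separated tokens of the line on slide 8; the first sample line is that slide's line with the cookie replaced by `-` and the user agent and referrer shortened, the second line is hypothetical, and `read_with_schema` and `bytes_by_status` are made-up helper names.

```python
# Store first: raw lines are kept exactly as they arrived (no parsing on write).
RAW_LINES = [
    # Abbreviated version of the slide 8 log line (cookie replaced by "-").
    "2012-01-06 09:22:27 W3SVC1273337584 RD00155D360166 10.211.146.27 "
    "GET /ustensiles - 80 Test0001 94.245.127.11 HTTP/1.1 "
    "Mozilla/5.0+(compatible;+MSIE+9.0) - "
    "http://site.supersimple.fr/Users/Account/LogOn site.supersimple.fr "
    "200 0 0 7136 849 1249",
    # Hypothetical second request, invented for the example.
    "2012-01-06 09:22:31 W3SVC1273337584 RD00155D360166 10.211.146.27 "
    "GET /missing - 80 - 94.245.127.11 HTTP/1.1 "
    "Mozilla/5.0+(compatible;+MSIE+9.0) - "
    "- site.supersimple.fr "
    "404 0 0 512 300 15",
    # Lines that do not fit the schema stay in storage and are skipped on read.
    "not a log line",
]

# Assumed field order (IIS W3C extended format); adjust to the real #Fields header.
FIELDS = [
    "date", "time", "s-sitename", "s-computername", "s-ip", "cs-method",
    "cs-uri-stem", "cs-uri-query", "s-port", "cs-username", "c-ip",
    "cs-version", "cs(User-Agent)", "cs(Cookie)", "cs(Referer)", "cs-host",
    "sc-status", "sc-substatus", "sc-win32-status", "sc-bytes", "cs-bytes",
    "time-taken",
]


def read_with_schema(lines, wanted):
    """Apply the schema at read time and yield only the requested fields."""
    for line in lines:
        parts = line.split(" ")
        if len(parts) != len(FIELDS):
            continue  # does not match this schema; leave it for another model
        row = dict(zip(FIELDS, parts))
        yield {name: row[name] for name in wanted}


def bytes_by_status(lines):
    """Query: total response bytes per HTTP status code."""
    totals = {}
    for row in read_with_schema(lines, ["sc-status", "sc-bytes"]):
        status = row["sc-status"]
        totals[status] = totals.get(status, 0) + int(row["sc-bytes"])
    return totals


if __name__ == "__main__":
    print(bytes_by_status(RAW_LINES))
```

If a later question needs a different model (say, user agents per host), only the `wanted` list and the query function change; the stored lines are never reprocessed, which is the point the slides make against schema on write.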
