Ahmad Alkilani: Big Data at Move

Learn how Move.com uses Big Data to support Realtor.com operations and continued innovation.

  1. Big Data at Move: the past, present, and future of Big Data at Move (Realtor.com)
  2. About me: Ahmad Alkilani
     • Data Warehouse Architect, Move Inc (realtor.com)
     • Pluralsight author
     • Passion for data and technology
     • linkedin.com/in/ahmadalkilani • EASkills.com
  3. Topics…
     • History of Move’s enterprise data warehouse
     • Why Hadoop found a home at Move
     • High-level architecture
     • Where we are now
     • Where we’re heading in the future
     • Q & A
  4. Move Inc.
     • Leader in online real estate and operator of realtor.com
     • Over 410 million minutes per month spent on Move websites
     • Over 300 million user engagement events per day on realtor.com and mobile apps
     • Connecting consumers and customers requires lots of data
  5. Growth…
     • [Chart: raw events for Move Inc. (Realtor.com and Mobile), on an axis running from 1 billion to 7 billion]
  6. Proactive…
     • Transitioned from the legacy warehouse and ETLs
     • Near real-time collection
  7. Reactive…
     • Bigger servers: 8 processors, 10 cores each
     • 2 TB of RAM!
     • Solid-state drives and Fusion-io cards; 10 terabytes per server
     • Worked great! Until we realized we could only store 50 days’ worth of data
     • [Chart: raw events in billions, Move Inc. (Realtor.com and Mobile)]
  8. Proactive…
     • Started with 13 nodes at a fraction of the cost of our SSD monster servers (cost)
     • Plan to continue to scale out (ease of scalability)
     • Current capacity is ~125 TB (a good starting point)
  9. Big picture…
  10. In more detail…
     • Hive over HCatalog
     • Data is transferred to HDFS, then into the Hive warehouse
     • External tables are defined against the raw data in HDFS
     • Data moves into the Hive warehouse via dynamic partition inserts
     • Partition pruning • Snappy compression • Dynamic tables with maps and arrays (sketched below)
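A minimal HiveQL sketch of this load pattern. The raw_events and warehouse_events tables, their columns, and the event_date partition key are hypothetical, not Move's actual schema:

```sql
-- External table over raw event files already landed in HDFS
CREATE EXTERNAL TABLE raw_events (
  event_time STRING,
  event_type STRING,
  attributes MAP<STRING, STRING>    -- dynamic attributes kept as a map
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY '='
LOCATION '/data/raw/events';

-- Dynamic partition insert into the managed warehouse table
-- (warehouse_events is assumed to be partitioned by event_date)
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

INSERT OVERWRITE TABLE warehouse_events PARTITION (event_date)
SELECT event_time, event_type, attributes,
       to_date(event_time) AS event_date   -- partition value derived per row
FROM raw_events;
```

Queries that filter on event_date then read only the matching partition directories, which is the partition pruning the slide refers to.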
  11. 11. ETL & Querying Hive… Hive Warehouse Aggregates SQL Server (EDW) Multi-Inserts Single Pass Details Stats
  12. 12. ETL & Querying Hive… Separate files for different keys of a Map • Resort to MapReduce instead of Hive and use MultipleOutputs class • Dynamic Partition Inserts again & Hadoop -getmerge
  13. Some lessons learned…
     • Our ETLs are still expensive
     • Putting our data loads and cluster at the mercy of our analysts is not a very good idea
     • Use queues to guarantee room for the ETLs to do their job: the default queue is for users, a specialized queue is for ETL
     • Keep an eye on the slots available
     • Use a .hiverc file to automatically control behavior (a sketch follows)
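A .hiverc file runs automatically when the Hive CLI starts, so analyst sessions pick up safe defaults without any action on their part. A sketch, assuming the queue split above (a default queue for users, a dedicated etl queue); the queue names and settings here are illustrative:

```sql
-- .hiverc: executed on every Hive CLI startup
SET mapred.job.queue.name=default;   -- analyst sessions stay in the user queue
SET mapred.reduce.tasks=-1;          -- let Hive choose reducer counts
-- ETL scripts override the queue explicitly in their own session:
-- SET mapred.job.queue.name=etl;
```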
  14. Where we’re headed
     • Re-evaluate tool selection: Talend / Pentaho
     • Real-time analytics: Kafka / Honu / Flume / Storm / StreamInsight
     • Hive geospatial
     • Integrating different technologies is OK
  15. D3.js with ASP.NET SignalR: visualizing search activity and active listings in different states
  16. Questions?
