Big Data at Move
The past, present and future of Big Data at Move - Realtor.com
Ahmad Alkilani: Big Data at Move
Learn how Move.com uses big data to support Realtor.com operations and continued innovation.

Published in: Technology

  1. Big Data at Move
     The past, present and future of Big Data at Move - Realtor.com
  2. About me
     • Data Warehouse Architect
     • Move Inc (realtor.com)
     • Pluralsight Author
     • Passion for data and technology
     Ahmad Alkilani, linkedin.com/in/ahmadalkilani, EASkills.com
  3. Topics…
     • History of Move’s enterprise data warehouse
     • Why Hadoop found a home at Move
     • High level architecture
     • Where we are now
     • Where we’re heading in the future
     • Q & A
  4. Move Inc.
     • Leader in online real estate and operator of realtor.com
     • Over 410 million minutes per month on Move websites
     • Over 300 million user engagement events per day on realtor.com and mobile apps
     • Connecting consumers and customers requires lots of data
  5. growth…
     [Chart: Raw Events, Move Inc. (Realtor.com and Mobile); axis from 0 to 7,000,000,000]
  6. proactive...
     • Transitioned from legacy warehouse and ETLs
     • Near real-time collection
  7. reactive …
     • Bigger servers
     • 8 processors, 10 cores each
     • 2 TB of RAM!
     • Solid state drives
     • Fusion IO cards
     • 10 terabytes per server
     • Worked great!
     • Until we realized we could only store 50 days' worth of data!
     [Chart: Raw Events in billions (axis from 0 to 7), Move Inc. (Realtor.com and Mobile)]
  8. proactive …
     • Started with 13 nodes at a fraction of the cost of our SSD monster servers - Cost
     • Plan to continue to scale out - Ease of scalability
     • Current capacity is ~125 TB - Good starting point
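The two capacity figures on slides 7 and 8 can be compared with some rough arithmetic. The sketch below is illustrative only and rests on assumptions the slides do not state: that the 50-day limit applied to 10 TB of usable storage, that the ingest rate stays flat, and that the ~125 TB figure is raw HDFS capacity subject to a replication factor (3 is the HDFS default). Compression is ignored, and the function names are hypothetical.

```python
def daily_ingest_tb(capacity_tb, days_retained):
    """Implied ingest rate if capacity_tb filled up after days_retained days."""
    return capacity_tb / days_retained


def retention_days(capacity_tb, ingest_tb_per_day, replication=1):
    """Days of data that fit when every block is stored `replication` times."""
    return capacity_tb / (ingest_tb_per_day * replication)


# Assumption: 10 TB held 50 days of data on the SSD servers.
ingest = daily_ingest_tb(10, 50)

# Assumption: the same flat ingest rate applied to the ~125 TB cluster.
unreplicated_days = retention_days(125, ingest)
replicated_days = retention_days(125, ingest, replication=3)

print(f"implied ingest: {ingest:.2f} TB/day")
print(f"125 TB without replication: {unreplicated_days:.0f} days")
print(f"125 TB with 3x replication: {replicated_days:.0f} days")
```

Since the growth chart shows event volume rising, a flat ingest rate overstates retention; the point is only the order of magnitude.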
  9. Big picture…
  10. In more detail…
      Hive over HCatalog
      Transferred to HDFS and then the Hive Warehouse
      HDFS
      External Tables against data in HDFS
      Data moves to Hive Warehouse
      Dynamic Partition Inserts
      • Partition Pruning
      • Snappy Compression
      • Dynamic Tables with Maps and Arrays
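The flow on slide 10 (external tables over raw HDFS files, then dynamic partition inserts into a compressed, partitioned warehouse table) might look roughly like the HiveQL that the sketch below prints. This is not Move's actual schema: the table names, columns, delimiters, HDFS path, date and the `state` map key are all hypothetical. The `SET` properties are the Hadoop 1 era names.

```python
# Hypothetical columns shared by the raw and warehouse tables.
COLUMNS = [
    "event_time STRING",
    "event_type STRING",
    "properties MAP<STRING, STRING>",
    "tags ARRAY<STRING>",
]


def external_table_ddl(table, location):
    """External table over files that already sit in HDFS."""
    cols = ",\n  ".join(COLUMNS)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED\n"
        "  FIELDS TERMINATED BY '\\t'\n"
        "  COLLECTION ITEMS TERMINATED BY ','\n"
        "  MAP KEYS TERMINATED BY ':'\n"
        f"LOCATION '{location}';"
    )


def warehouse_table_ddl(table):
    """Managed warehouse table partitioned by day."""
    cols = ",\n  ".join(COLUMNS)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "PARTITIONED BY (event_date STRING)\n"
        "STORED AS SEQUENCEFILE;"
    )


def dynamic_partition_insert(source, target):
    """Let Hive derive partitions from the last SELECT column; compress with Snappy."""
    return "\n".join([
        "SET hive.exec.dynamic.partition=true;",
        "SET hive.exec.dynamic.partition.mode=nonstrict;",
        "SET hive.exec.compress.output=true;",
        "SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;",
        f"INSERT OVERWRITE TABLE {target} PARTITION (event_date)",
        "SELECT event_time, event_type, properties, tags,",
        "       to_date(event_time) AS event_date",
        f"FROM {source};",
    ])


def pruned_query(table, day):
    """Filtering on the partition column lets Hive read only that day's directory."""
    return (
        "SELECT properties['state'] AS state, COUNT(*) AS events\n"
        f"FROM {table}\n"
        f"WHERE event_date = '{day}'\n"
        "GROUP BY properties['state'];"
    )


statements = [
    external_table_ddl("raw_events", "/data/raw/events"),
    warehouse_table_ddl("events"),
    dynamic_partition_insert("raw_events", "events"),
    pruned_query("events", "2013-06-01"),
]
print("\n\n".join(statements))
```

The last statement illustrates both partition pruning (the `WHERE` clause on `event_date`) and map access (`properties['state']`).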
  11. ETL & Querying Hive…
      Diagram labels: Hive Warehouse; Aggregates; SQL Server (EDW); Multi-Inserts; Single Pass; Details; Stats
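Slide 11's "Multi-Inserts, Single Pass" refers to Hive's multi-insert form (`FROM source INSERT OVERWRITE TABLE a SELECT ... INSERT OVERWRITE TABLE b SELECT ...`), where one scan of the source feeds several targets. The plain-Python sketch below illustrates the idea only and is not Hive code. The event fields are hypothetical, and treating "Details" and "Stats" as the two targets is a guess based on the slide's labels.

```python
from collections import Counter


def single_pass(events):
    """Scan the source once and feed two outputs from the same scan."""
    details = []       # row-level output, one entry per event
    stats = Counter()  # aggregate output, event count per type
    for event in events:
        details.append((event["user"], event["type"]))
        stats[event["type"]] += 1
    return details, dict(stats)


# Hypothetical events; an iterator can only be consumed once,
# so both outputs must come from a single scan.
source = iter([
    {"user": "u1", "type": "search"},
    {"user": "u2", "type": "view"},
    {"user": "u1", "type": "search"},
])
details, stats = single_pass(source)
print(details)
print(stats)
```

Running two separate queries would scan the source twice; the multi-insert form avoids that, which matters when the source is a large raw-event table.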
  12. ETL & Querying Hive…
      Separate files for different keys of a Map
      • Resort to MapReduce instead of Hive and use MultipleOutputs class
      • Dynamic Partition Inserts again & Hadoop -getmerge
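The second approach on slide 12 can be pictured as: partition the output by map key so each key gets its own directory of part files, then concatenate each directory into a single file, which is what `hadoop fs -getmerge` does for an HDFS directory. The sketch below simulates this on the local filesystem in plain Python. It calls no Hadoop API, and the record contents, directory layout and function names are hypothetical.

```python
import tempfile
from pathlib import Path


def write_partitions(task_outputs, out_dir):
    """Each simulated task writes one part file per map key, under a directory per key."""
    out_dir = Path(out_dir)
    for task_id, records in enumerate(task_outputs):
        for record in records:
            for key, value in record.items():
                key_dir = out_dir / key
                key_dir.mkdir(parents=True, exist_ok=True)
                part_path = key_dir / f"part-{task_id:05d}"
                with open(part_path, "a", encoding="utf-8") as part:
                    part.write(f"{value}\n")


def get_merge(src_dir, dest_file):
    """Concatenate every part file in src_dir into one file, in name order."""
    with open(dest_file, "w", encoding="utf-8") as dest:
        for part in sorted(Path(src_dir).iterdir()):
            dest.write(part.read_text(encoding="utf-8"))


# Two simulated tasks, each emitting map-typed records (hypothetical data).
task_outputs = [
    [{"city": "Austin", "beds": "3"}],
    [{"city": "Reno"}],
]

with tempfile.TemporaryDirectory() as tmp:
    write_partitions(task_outputs, Path(tmp) / "by_key")
    key_dirs = sorted(p.name for p in (Path(tmp) / "by_key").iterdir())
    merged_path = Path(tmp) / "city.txt"
    get_merge(Path(tmp) / "by_key" / "city", merged_path)
    merged_city = merged_path.read_text(encoding="utf-8")

print(key_dirs)
print(merged_city, end="")
```

The first approach on the slide, a MapReduce job using the `MultipleOutputs` class, writes the per-key files directly from the job, so the merge step is unnecessary there.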
  13. Some lessons learned…
      • Our ETLs are still expensive
      • Putting our data loads and cluster at the mercy of our analysts. Not a very good idea
      • Use Queues to guarantee room for ETLs to do their job
      • Default queue is for users
      • Specialized queue is for ETL
      • Keep an eye on the slots available
      • Use .hiverc file to automatically control behavior
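One way to combine the queue and `.hiverc` advice on slide 13 is to give ETL accounts a `.hiverc` that routes their jobs to the specialized queue, while analysts stay on the default queue. The Hive CLI runs the statements in `$HOME/.hiverc` at startup. The sketch below only generates the file contents. The queue name `etl` and the extra settings are hypothetical, and `mapred.job.queue.name` is the Hadoop 1 era property name (`mapreduce.job.queuename` in later releases).

```python
def hiverc_text(queue, settings=None):
    """Build .hiverc contents that pin the session to a scheduler queue."""
    lines = [f"SET mapred.job.queue.name={queue};"]
    for name, value in sorted((settings or {}).items()):
        lines.append(f"SET {name}={value};")
    return "\n".join(lines) + "\n"


# Analysts: default queue, plus a convenience setting.
analyst_rc = hiverc_text("default", {"hive.cli.print.header": "true"})

# ETL accounts: hypothetical "etl" queue, with dynamic partitioning enabled.
etl_rc = hiverc_text("etl", {
    "hive.exec.dynamic.partition": "true",
    "hive.exec.dynamic.partition.mode": "nonstrict",
})

print(analyst_rc)
print(etl_rc)
```

The queues themselves must already exist in the cluster's scheduler configuration; the `.hiverc` file only selects one.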
  14. Where we’re headed
      • Re-evaluate tool selection
      • Talend/Pentaho
      • Real-time analytics
      • Kafka/Honu/Flume/Storm/StreamInsight
      • Hive Geospatial
      • Integrating different technologies is OK
  15. D3.js with Asp.Net SignalR
      Visualizing search activity and active listings in different states
  16. Questions?