Ahmad Alkilani - Big Data at Move

Learn how Move.com uses big data to support Realtor.com operations and continued innovation.


    Presentation Transcript

    • Big Data at Move: The past, present and future of Big Data at Move - Realtor.com
    • About me
      • Ahmad Alkilani – Data Warehouse Architect, Move Inc (realtor.com)
      • Pluralsight author
      • Passion for data and technology
      • linkedin.com/in/ahmadalkilani | EASkills.com
    • Topics…
      • History of Move’s enterprise data warehouse
      • Why Hadoop found a home at Move
      • High-level architecture
      • Where we are now
      • Where we’re heading in the future
      • Q & A
    • Move Inc.
      • Leader in online real estate and operator of realtor.com
      • Over 410 million minutes per month on Move websites
      • Over 300 million user engagement events per day on realtor.com and mobile apps
      • Connecting consumers and customers requires lots of data
    • growth…
      • [Chart: Raw Events, Move Inc. (Realtor.com and Mobile), scale 0 to 7 billion]
    • proactive…
      • Transitioned from legacy warehouse and ETLs
      • Near real-time collection
    • reactive…
      • Bigger servers: 8 processors, 10 cores each
      • 2 TB of RAM!
      • Solid state drives and Fusion-io cards
      • 10 terabytes per server
      • Worked great! Until we realized we could only store 50 days’ worth of data
      • [Chart: Raw Events (billions), Move Inc. (Realtor.com and Mobile)]
    • proactive…
      • Started with 13 nodes at a fraction of the cost of our SSD monster servers – cost
      • Plan to continue to scale out – ease of scalability
      • Current capacity is ~125 TB – a good starting point
    • Big picture…
    • In more detail…
      • Hive over HCatalog
      • Data is transferred to HDFS and then into the Hive warehouse
      • External tables against data in HDFS
      • Data moves to the Hive warehouse via dynamic partition inserts (sketched below)
      • Partition pruning
      • Snappy compression
      • Dynamic tables with maps and arrays
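
      A minimal HiveQL sketch of this flow, using hypothetical table and column names (raw_events_ext, events, attributes): an external table over raw files in HDFS, then a dynamic partition insert into a Snappy-compressed, day-partitioned warehouse table that carries a map column.

        -- External table over raw event files already sitting in HDFS
        CREATE EXTERNAL TABLE raw_events_ext (
          event_time STRING,
          event_type STRING,
          attributes MAP<STRING, STRING>
        )
        ROW FORMAT DELIMITED
          FIELDS TERMINATED BY '\t'
          COLLECTION ITEMS TERMINATED BY ','
          MAP KEYS TERMINATED BY ':'
        LOCATION '/data/raw/events';

        -- Managed warehouse table, partitioned by day
        CREATE TABLE events (
          event_time STRING,
          event_type STRING,
          attributes MAP<STRING, STRING>
        )
        PARTITIONED BY (dt STRING)
        STORED AS SEQUENCEFILE;

        -- Dynamic partition insert with Snappy-compressed output
        SET hive.exec.dynamic.partition=true;
        SET hive.exec.dynamic.partition.mode=nonstrict;
        SET hive.exec.compress.output=true;
        SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

        INSERT OVERWRITE TABLE events PARTITION (dt)
        SELECT event_time, event_type, attributes, to_date(event_time) AS dt
        FROM raw_events_ext;

      Queries that filter on dt then touch only the matching partition directories, which is where the partition pruning comes in.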
    • ETL & Querying Hive… Hive Warehouse Aggregates SQL Server (EDW) Multi-Inserts Single Pass Details Stats
    • ETL & Querying Hive… Separate files for different keys of a Map • Resort to MapReduce instead of Hive and use MultipleOutputs class • Dynamic Partition Inserts again & Hadoop -getmerge
    • Some lessons learned…
      • Our ETLs are still expensive
      • Putting our data loads and cluster at the mercy of our analysts is not a very good idea
      • Use queues to guarantee room for ETLs to do their job: the default queue is for users, a specialized queue is for ETL
      • Keep an eye on the slots available
      • Use a .hiverc file to automatically control behavior (sample below)
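
      A sample .hiverc along these lines (the queue name and the exact settings are illustrative): Hive runs it automatically when the ETL account opens a session, so ETL jobs land in their own queue while analyst sessions stay on the default queue.

        -- .hiverc for the ETL account (queue name assumed)
        SET mapred.job.queue.name=etl;
        SET hive.exec.dynamic.partition=true;
        SET hive.exec.dynamic.partition.mode=nonstrict;
        SET hive.exec.compress.output=true;
        SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
        -- Analyst sessions omit the queue override and fall back to the default queue.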
    • Where we’re headed
      • Re-evaluate tool selection (Talend / Pentaho)
      • Real-time analytics (Kafka / Honu / Flume / Storm / StreamInsight)
      • Hive geospatial (rough sketch below)
      • Integrating different technologies is OK
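
      To make the Hive geospatial direction concrete, here is a rough sketch using the Esri Spatial Framework for Hadoop; the jar names, UDF classes, and the states/searches tables are all assumptions, not anything from the talk.

        -- Register the spatial UDFs (jar and class names are assumptions)
        ADD JAR esri-geometry-api.jar;
        ADD JAR spatial-sdk-hive.jar;
        CREATE TEMPORARY FUNCTION ST_Point    AS 'com.esri.hadoop.hive.ST_Point';
        CREATE TEMPORARY FUNCTION ST_Contains AS 'com.esri.hadoop.hive.ST_Contains';

        -- Count search events that fall inside each state boundary
        SELECT s.state_name, COUNT(*) AS searches
        FROM states s JOIN searches e
        WHERE ST_Contains(s.boundary, ST_Point(e.longitude, e.latitude))
        GROUP BY s.state_name;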
    • D3.js with ASP.NET SignalR
      • Visualizing search activity and active listings in different states
    • Questions?