by Vincent Yates
Director of Analytic Engineering at Zillow Group
Fountain of Youth or Polluted Swamp: Is your data lake revitalizing your business or eroding the foundation?
We’ve all been promised the Shangri-La that is the data lake: more data means more insights. Synergy! But has it really panned out? The trouble is that data lakes look more like the early days of the internet than a pristine store of useful information. Anyone can publish data, and even with the best of intentions, priorities shift, people leave, and once-priceless data become worthless. Those data may have been reliable when they were first published but are now wrong. Yet, as with many stale webpages, there is no way to tell, and the business continues to rely on those wrong data to make decisions. We at Zillow faced the same problem and decided to fix it. I will describe the tools we’ve built and the tenets behind our team to help you ensure your lake rejuvenates your organization. Einstein said it best: “Whoever is careless with the truth in small matters cannot be trusted with important matters.”