Hadoop has proven to be an invaluable tool for many companies over the past few years. Yet it has its quirks, and knowing them up front can save valuable time. This session is a rundown of the recurring lessons learned from running various Hadoop clusters in production since version 0.15.
What to expect from Hadoop, and what not to expect? How do you integrate Hadoop into existing infrastructure? Which data formats should you use? Which compression? Small files or big files? Append or not? Essential configuration and operations tips. What about querying all the data? Finally, the project, the community, and pointers to interesting projects that complement the Hadoop experience.