4. As of 2009, 72% of patches going into the Hadoop source code were coming from Yahoo! Developing Hadoop at Yahoo! 8/14/10 4
5. Yahoo! provides extensive QE and QA resources to test Hadoop releases at scale.
6. The Yahoo! distribution of Hadoop, available on GitHub, is the same code we run internally on our servers. Patches important to stability and performance are applied here as well as to Apache.
Discussion of how security was not originally a high priority: file system permissions were not added until release 17.
Kerberos was chosen because it is a tested, trusted solution that was already in use at Yahoo! All Hadoop actors (users, and servers such as the NameNode, JobTracker, DataNodes and TaskTrackers) authenticate with Kerberos as principals. This allows Hadoop, for the first time, to trust the identity of its various components.
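To make "authenticate as principals" concrete: a Kerberos principal name has the general form `primary[/instance]@REALM`, where user principals usually have no instance and service principals usually carry a host name as the instance. The minimal sketch below parses such names. The example names (`alice@EXAMPLE.COM`, `nn/namenode01.example.com@EXAMPLE.COM`) and the `is_service_principal` rule are hypothetical illustrations, not the actual principal names or logic used by Hadoop, and the parser assumes no escaped `/` or `@` characters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Principal:
    primary: str             # user name or service name
    instance: Optional[str]  # typically a host name for service principals
    realm: str               # Kerberos realm, conventionally upper case


def parse_principal(name: str) -> Principal:
    """Split a Kerberos principal of the form primary[/instance]@REALM.

    Assumes no escaped '/' or '@' characters; everything after the first
    '/' is treated as the instance.
    """
    if name.count("@") != 1:
        raise ValueError(f"expected exactly one '@' in {name!r}")
    local, realm = name.split("@")
    if not local or not realm:
        raise ValueError(f"empty component in {name!r}")
    primary, sep, instance = local.partition("/")
    if not primary or (sep and not instance):
        raise ValueError(f"malformed principal {name!r}")
    return Principal(primary, instance if sep else None, realm)


def is_service_principal(p: Principal) -> bool:
    # Hypothetical rule for this sketch: an instance component marks a
    # server daemon bound to a host rather than a human user.
    return p.instance is not None


if __name__ == "__main__":
    alice = parse_principal("alice@EXAMPLE.COM")
    nn = parse_principal("nn/namenode01.example.com@EXAMPLE.COM")
    print(is_service_principal(alice), is_service_principal(nn))  # False True
```

Giving each server its own host-qualified principal is what lets one component verify which specific daemon it is talking to, rather than trusting whatever claims to be a NameNode or DataNode.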
Kerberos provides a single-sign-on service:
* kinit, kdestroy
* Can be configured to initialize automatically via PAM
* By default, tickets last 10 hours and are renewable for 7 days
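The arithmetic behind "10 hours, renewable 7 days" can be sketched as follows. This is a simplified model, assuming the standard Kerberos rule that a renewed ticket ends at the earlier of (renewal time + original lifetime) and the fixed renew-until time, and assuming the client renews right at expiry. The constant and function names are hypothetical and not part of Hadoop or any Kerberos library.

```python
from datetime import datetime, timedelta
from typing import List

# Defaults from the slide; names are hypothetical, for this sketch only.
TICKET_LIFETIME = timedelta(hours=10)
RENEW_LIFETIME = timedelta(days=7)


def renewal_schedule(issued: datetime,
                     lifetime: timedelta = TICKET_LIFETIME,
                     renew_lifetime: timedelta = RENEW_LIFETIME) -> List[datetime]:
    """Return successive ticket end times, renewing each time the ticket expires.

    A renewed ticket's end time is capped at the renew-until time, which is
    fixed when the ticket is first issued.
    """
    renew_till = issued + renew_lifetime
    end = min(issued + lifetime, renew_till)
    ends = [end]
    while end < renew_till:
        # Renewal is modelled at the moment of expiry; a real client must
        # renew shortly before the ticket expires.
        end = min(end + lifetime, renew_till)
        ends.append(end)
    return ends


if __name__ == "__main__":
    ends = renewal_schedule(datetime(2010, 8, 14))
    print(len(ends) - 1)  # 16 renewals
    print(ends[-1])       # 2010-08-21 00:00:00
```

Under these assumptions a single kinit covers a full week: 16 renewals at 10-hour intervals reach 160 hours, and the final renewal is truncated to the 168-hour cap. After that, the user must run kinit again.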
Overall, the entire ship has been tightened. We believe we have secured each of the data access points shown earlier in the big scary picture.
* Secure Distributed Cache
For the majority of jobs, no changes are necessary to run with security enabled. It was important to make the switch to security as painless as possible: thousands of different jobs were already running on our clusters, and hundreds of thousands around the world, and all of them needed to keep running. Also, user education is very difficult.