Netlib: CS? HPC at ORNL; PVM/MPI as precursors to Hadoop; not for exactly the same purposes (explain)
Messaging/CEP/Analytics: Algorithmic trading; usually predictive data mining for definition of complex events prior to CEP runtime
JTV/DHS: CEP for anti-money laundering (AML), etc.
BI: 180,000 reports daily; large-scale ETL (Pentaho and/or Talend)
Interests: Links to Scala/Hadoop, R/Hadoop
Setting Up Your First Hadoop Cluster
Chad Vawter
TriHUG Meeting: July 20, 2010
Each machine in a Hadoop cluster has a configuration script for environment settings.
Edit the hadoop-env.sh Bash script on each machine, or use a mechanism for sharing environment settings across machines; e.g., rsync.
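One way to share settings with rsync might look like the sketch below. It assumes a plain-text host list with one hostname per line and passwordless SSH to each host. The HADOOP_CONF_DIR and HOSTS_FILE defaults and the sync_env helper are hypothetical names chosen for illustration, not values from the talk.

```shell
#!/usr/bin/env bash
# Sketch: push hadoop-env.sh to every host named in a host list.
# Both default paths below are assumptions; adjust them to your install.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/usr/local/hadoop/conf}"
HOSTS_FILE="${HOSTS_FILE:-$HADOOP_CONF_DIR/slaves}"

# sync_env [prefix]
# Pass "echo" as the prefix to print the rsync commands instead of running them.
sync_env() {
  local prefix="$1" host
  # Skip comment lines and blank lines in the host list.
  for host in $(grep -v -e '^#' -e '^[[:space:]]*$' "$HOSTS_FILE"); do
    # -a preserves permissions and timestamps, -z compresses in transit.
    $prefix rsync -az "$HADOOP_CONF_DIR/hadoop-env.sh" "$host:$HADOOP_CONF_DIR/"
  done
}
```

Running `sync_env echo` first shows what would be copied; `sync_env` with no argument performs the copy.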
Values for many environment variables can be identical for all machines in the cluster. Not all machines will have the same hardware profile, though. Configure each machine’s Hadoop environment so that it best uses its resources.
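A fragment of hadoop-env.sh along these lines could keep the common values in one shared file while still allowing per-machine differences. This is only a sketch: the JDK path, log directory, heap sizes, and the "bignode" hostname pattern are all illustrative assumptions. JAVA_HOME, HADOOP_HEAPSIZE, and HADOOP_LOG_DIR are variables that hadoop-env.sh reads.

```shell
# Sketch of a shared hadoop-env.sh fragment (all values are assumptions).
export JAVA_HOME=/usr/lib/jvm/java-6-sun   # assumed JDK location
export HADOOP_HEAPSIZE=1000                # maximum daemon heap, in MB
export HADOOP_LOG_DIR=/var/log/hadoop      # assumed log directory

# Hypothetical per-machine override: hosts whose names start with
# "bignode" have more RAM, so give their daemons a larger heap.
case "$(hostname)" in
  bignode*) export HADOOP_HEAPSIZE=2000 ;;
esac
```

Keying the override on the hostname lets the same file be copied to every machine unchanged, at the cost of encoding hardware knowledge in a naming convention.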
The slaves file defines which machines will run DataNodes and/or TaskTrackers.
Note: We don’t need to specify which machine(s) will run a NameNode and/or a JobTracker. The Hadoop control scripts start the NameNode and JobTracker on whichever machine the scripts are run.
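As a rough illustration, assuming the file in question is the conf/slaves file of a Hadoop 0.20-style install, the host list is just one hostname per line. The hostnames and the CONF_DIR variable below are hypothetical, and the control-script invocations are shown as comments only.

```shell
# Sketch: write a slaves file listing three hypothetical worker hosts.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"   # stand-in for the real conf directory
cat > "$CONF_DIR/slaves" <<'EOF'
worker01.example.com
worker02.example.com
worker03.example.com
EOF

# From the Hadoop install directory on the machine chosen as NameNode:
#   bin/start-dfs.sh      # NameNode starts here; a DataNode starts on each listed host
# From the Hadoop install directory on the machine chosen as JobTracker:
#   bin/start-mapred.sh   # JobTracker starts here; a TaskTracker starts on each listed host
```

Because the master daemons start wherever the scripts are invoked, the same slaves file can be shared by every machine in the cluster.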