Saturday, March 20, 2010

Hadoop tips

1. Hadoop Core provides several mechanisms for setting the classpath for your application:

  • You can set up a fixed base classpath through the HADOOP_CLASSPATH environment variable, either by setting it in hadoop-env.sh (on all of your machines) or by setting it in the runtime environment of the user that starts the Hadoop servers.

  • You may run your jobs via the bin/hadoop jar command and supply a -libjars argument with a list of JARs.
  • The DistributedCache object provides a way to add files or archives to your runtime classpath.
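
As a rough illustration of the -libjars route, here is a minimal sketch that assembles the bin/hadoop jar command line. The class name LibJarsCommand, the JAR names and com.example.MyDriver are all hypothetical. It assumes the driver parses Hadoop's generic options (for example by running through ToolRunner), in which case -libjars takes one comma-separated list and is placed before the job's own arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the argument list for `bin/hadoop jar ... -libjars ...`.
class LibJarsCommand {

    static List<String> build(String jobJar, String mainClass,
                              List<String> libJars, String... jobArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("bin/hadoop");
        cmd.add("jar");
        cmd.add(jobJar);
        cmd.add(mainClass);
        if (!libJars.isEmpty()) {
            cmd.add("-libjars");
            // -libjars expects a single comma-separated list of JAR paths.
            cmd.add(String.join(",", libJars));
        }
        // The job's own arguments follow the generic options.
        cmd.addAll(Arrays.asList(jobArgs));
        return cmd;
    }

    public static void main(String[] args) {
        // Illustrative values only; none of these files come from the post.
        List<String> cmd = build("myjob.jar", "com.example.MyDriver",
                Arrays.asList("lib/a.jar", "lib/b.jar"), "input", "output");
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping the command as a list of arguments rather than one string makes it easy to hand to ProcessBuilder without worrying about shell quoting.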
2. --deleteOutput, which must be the last argument, causes the output directory to be deleted before the job starts. This is convenient when running the same job multiple times.
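
The behaviour described in this tip could be implemented in a driver roughly as follows. This is only a sketch under stated assumptions: the class DeleteOutputOption is hypothetical, the output directory is assumed to be the second argument, and the local filesystem (java.nio.file) stands in for HDFS so the example runs without a cluster. In a real job the deletion would go through Hadoop's FileSystem.delete(path, true) instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical driver helper; the local filesystem stands in for HDFS here.
class DeleteOutputOption {

    static final String FLAG = "--deleteOutput";

    // True only when the flag is the final argument.
    static boolean requested(String[] args) {
        return args.length > 0 && FLAG.equals(args[args.length - 1]);
    }

    // Deletes dir and everything under it; does nothing if dir does not exist.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: <input> <output> [--deleteOutput]");
            return;
        }
        Path output = Paths.get(args[1]); // assumed argument position
        if (requested(args)) {
            deleteRecursively(output);
        }
        // The job would be configured and submitted here.
    }
}
```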
3. Configuration.get("slave.host.name") returns the hostname of each TaskTracker, if that property is set. When it is unset, the hostname can be resolved through DNS instead:
   // Prefer the explicitly configured hostname, if there is one.
   if (job.get("slave.host.name") != null) {
       this.hostname = job.get("slave.host.name");
   }

   // Otherwise fall back to a DNS lookup on the configured interface.
   if (hostname == null) {
       try {
           this.hostname = DNS.getDefaultHost(
               job.get("mapred.tasktracker.dns.interface", "default"),
               job.get("mapred.tasktracker.dns.nameserver", "default"));
       } catch (UnknownHostException e) {
           // hostname stays null if the lookup fails
           e.printStackTrace();
       }
   }
