See http://wiki.apache.org/hadoop/ and http://hadoop.apache.org/
Hadoop performance analysis tool.
Currently there are two ways to analyze the performance of a Hadoop cluster and its jobs:
1. Hadoop Vaidya: a performance diagnostic tool for Hadoop jobs that executes a set of rules against the job counters and produces a report of performance improvement areas. However, Hadoop Vaidya is a separate contrib module and has a limited set of rules as of now (see the invocation sketch after this list).
2. Hadoop Metrics (used with Ganglia or Nagios): all Hadoop daemons expose runtime metrics, which can then be analyzed by a cluster monitoring system such as Ganglia (http://ganglia.sourceforge.net/) or Nagios. Hadoop Metrics has a lot of third-party library dependencies and a limited number of metrics as of now (see the configuration sketch after this list).
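For Vaidya, a minimal invocation sketch. The script path matches the 0.20-era contrib layout, but the flag names and file paths below are assumptions taken from that era's docs; verify them against your release.

    # Hedged sketch: analyze a finished job from its config XML and history log.
    # Flag names (-jobconf, -joblog, -report) are assumptions from 0.20-era docs;
    # the job file paths are hypothetical placeholders.
    $HADOOP_HOME/contrib/vaidya/bin/vaidya.sh \
      -jobconf file://<path-to-job_conf.xml> \
      -joblog  file://<path-to-job-history-log> \
      -report  /tmp/vaidya-report.xml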
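For Hadoop Metrics with Ganglia, a minimal conf/hadoop-metrics.properties sketch. GangliaContext is the metrics plugin shipped with 0.20-era Hadoop; the host name and port are hypothetical examples.

    # Hedged sketch: push daemon metrics to a Ganglia gmond endpoint every 10 seconds.
    # "ganglia-host:8649" is a placeholder for your gmond/gmetad address.
    dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
    dfs.period=10
    dfs.servers=ganglia-host:8649
    mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
    mapred.period=10
    mapred.servers=ganglia-host:8649
    jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
    jvm.period=10
    jvm.servers=ganglia-host:8649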
Number of Map tasks
Setting the number of map tasks (mapred.map.tasks or JobConf.setNumMapTasks()) does not guarantee that the job will run with that many maps; the value is only used as a hint. The number of maps is decided by your InputFormat: implement InputFormat.getSplits() to define how the input should be split, because the number of splits is equal to the number of maps.
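A small sketch against the old org.apache.hadoop.mapred API (the one JobConf.setNumMapTasks() belongs to) that makes the hint behavior concrete; the class name and input path are hypothetical.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class MapCountHint {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MapCountHint.class);
            conf.setInputFormat(TextInputFormat.class);
            FileInputFormat.setInputPaths(conf, new Path("/user/demo/input")); // hypothetical path

            conf.setNumMapTasks(10); // only a hint; it is merely passed along to getSplits()

            // The framework effectively does this: the split count, not the hint,
            // determines the actual number of map tasks.
            InputSplit[] splits =
                conf.getInputFormat().getSplits(conf, conf.getNumMapTasks());
            System.out.println("requested = 10, actual = " + splits.length);
        }
    }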
If you are using the default InputFormat (i.e. TextInputFormat), the number of maps is decided by the DFS block size. If you use NLineInputFormat, each map processes the number of input lines set by mapred.line.input.format.linespermap.
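A matching sketch for the NLineInputFormat case; the 1000-lines-per-map figure is an arbitrary example value.

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.NLineInputFormat;

    public class NLineSetup {
        // One map task per N input lines instead of one per DFS block.
        public static void configure(JobConf conf) {
            conf.setInputFormat(NLineInputFormat.class);
            conf.setInt("mapred.line.input.format.linespermap", 1000); // example value
        }
    }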
More details @ http://hadoop.apache.org/
You might also be interested in Karmasphere Studio for developing and debugging MapReduce jobs on Hadoop clusters. On the monitoring front, it includes various graphical capabilities for monitoring Hadoop clusters and file systems. We have a new release coming up with even more goodies in it too.
See: http://www.hadoopstudio.org
Cool, sounds great. Thank you for sharing.