Preface
File and Directories Used in this Exercise (P30)
In this exercise, you will implement a driver using ToolRunner.
Follow the steps below to start with the Average Word Length program you wrote in an earlier exercise, and modify the driver to use ToolRunner. Then modify the Mapper to reference a Boolean parameter called caseSensitive: if true, the Mapper should treat upper- and lowercase letters as different; if false or unset, all letters should be converted to lowercase.
Source Code
Driver
With ToolRunner, you can easily pass arguments on the command line, and the Mapper and Reducer can change their behavior based on those arguments. This time your driver class should extend Configured and implement Tool:
- solution/AvgWordLength.java
package solution;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class AvgWordLength extends Configured implements Tool {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    /*
     * Uncomment the line below to set the caseSensitive configuration
     * value for the job programmatically. Leave it commented out to set
     * the value from the command line instead.
     */
    // conf.setBoolean("caseSensitive", false);

    int exitCode = ToolRunner.run(conf, new AvgWordLength(), args);
    System.exit(exitCode);
  }

  @Override
  public int run(String[] args) throws Exception {

    /*
     * Validate that two arguments were passed from the command line.
     */
    if (args.length != 2) {
      System.out.printf("Usage: AvgWordLength <input dir> <output dir>\n");
      System.exit(-1);
    }

    /*
     * Instantiate a Job object for your job's configuration.
     */
    Job job = new Job(getConf());

    /*
     * Specify the jar file that contains your driver, mapper, and reducer.
     * Hadoop will transfer this jar file to nodes in your cluster running
     * mapper and reducer tasks.
     */
    job.setJarByClass(AvgWordLength.class);

    /*
     * Specify an easily decipherable name for the job. This job name will
     * appear in reports and logs.
     */
    job.setJobName("Average Word Length");

    /*
     * Specify the paths to the input and output data based on the
     * command-line arguments.
     */
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    /*
     * Specify the mapper and reducer classes.
     */
    job.setMapperClass(LetterMapper.class);
    job.setReducerClass(AverageReducer.class);

    /*
     * The input and output files are text files, so there is no need to
     * call the setInputFormatClass and setOutputFormatClass methods.
     */

    /*
     * The mapper's output keys and values have different data types than
     * the reducer's output keys and values. Therefore, you must call the
     * setMapOutputKeyClass and setMapOutputValueClass methods.
     */
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);

    /*
     * Specify the job's output key and value classes.
     */
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);

    /*
     * Start the MapReduce job and wait for it to finish. If it finishes
     * successfully, return 0. If not, return 1.
     */
    boolean success = job.waitForCompletion(true);
    return success ? 0 : 1;
  }
}
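Because the driver goes through ToolRunner, Hadoop's generic options are parsed before the remaining arguments reach run(). Below is a minimal submission sketch, assuming the classes are packaged in a jar named avgwordlength.jar and using the shakespeare input directory from the earlier exercise (the jar and directory names are placeholders for your environment):

  # caseSensitive is unset, so the mapper lowercases first letters
  hadoop jar avgwordlength.jar solution.AvgWordLength shakespeare wordlengths

  # -D sets caseSensitive=true for this run only; generic options must
  # appear before the input and output arguments
  hadoop jar avgwordlength.jar solution.AvgWordLength -D caseSensitive=true shakespeare wordlengths

Note that ToolRunner strips the -D option before calling run(), so args.length is still 2 inside the driver.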
The mapper reads the caseSensitive parameter in setup() and uses it to decide whether the first letter of each word is treated as case sensitive:
- solution/LetterMapper.java
package solution;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * To define a map function for your MapReduce job, subclass the Mapper
 * class and override the map method. The class definition requires four
 * type parameters:
 *   The data type of the input key (LongWritable)
 *   The data type of the input value (Text)
 *   The data type of the output key (Text)
 *   The data type of the output value (IntWritable)
 */
public class LetterMapper extends
    Mapper<LongWritable, Text, Text, IntWritable> {

  boolean caseSensitive = false;

  /**
   * The map method runs once for each line of text in the input file.
   * The method receives:
   *   A key of type LongWritable
   *   A value of type Text
   *   A Context object
   */
  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    /*
     * Convert the line, which is received as a Text object, to a String
     * object.
     */
    String line = value.toString();

    /*
     * The line.split("\\W+") call uses regular expressions to split the
     * line up by non-word characters. If you are not familiar with the
     * use of regular expressions in Java code, search the web for
     * "Java Regex Tutorial."
     */
    for (String word : line.split("\\W+")) {
      if (word.length() > 0) {

        /*
         * Obtain the first letter of the word, lower-casing it unless
         * the job is configured to be case sensitive.
         */
        String letter;
        if (caseSensitive)
          letter = word.substring(0, 1);
        else
          letter = word.substring(0, 1).toLowerCase();

        /*
         * Call the write method on the Context object to emit a key and
         * a value from the map method. The key is the letter the word
         * starts with; the value is the word's length.
         */
        context.write(new Text(letter), new IntWritable(word.length()));
      }
    }
  }

  /*
   * The setup method runs once per map task, before any calls to map. It
   * reads the caseSensitive value from the job configuration, defaulting
   * to false when the property is unset.
   */
  @Override
  public void setup(Context context) {
    Configuration conf = context.getConfiguration();
    caseSensitive = conf.getBoolean("caseSensitive", false);
  }
}
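A quick way to check both behaviors is a local MRUnit test. The sketch below assumes MRUnit and JUnit are on the classpath; the test class name is hypothetical:

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mrunit.mapreduce.MapDriver;
  import org.junit.Test;

  public class LetterMapperTest {

    @Test
    public void testCaseSensitiveMapping() throws Exception {
      MapDriver<LongWritable, Text, Text, IntWritable> driver =
          MapDriver.newMapDriver(new LetterMapper());

      // Set the property the mapper reads in setup().
      driver.getConfiguration().setBoolean("caseSensitive", true);

      // With caseSensitive=true, "Hello" keeps its uppercase H;
      // both words have length 5.
      driver.withInput(new LongWritable(0), new Text("Hello world"))
            .withOutput(new Text("H"), new IntWritable(5))
            .withOutput(new Text("w"), new IntWritable(5))
            .runTest();
    }
  }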
The reducer sums the word lengths it receives for each starting letter and outputs the average length of words beginning with that letter:
- solution/AverageReducer.java
package solution;

import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * To define a reduce function for your MapReduce job, subclass the
 * Reducer class and override the reduce method. The class definition
 * requires four type parameters:
 *   The data type of the input key (Text)
 *   The data type of the input value (IntWritable)
 *   The data type of the output key (Text)
 *   The data type of the output value (DoubleWritable)
 */
public class AverageReducer extends
    Reducer<Text, IntWritable, Text, DoubleWritable> {

  /**
   * The reduce method runs once for each key received from the shuffle
   * and sort phase of the MapReduce framework. The method receives:
   *   A key of type Text
   *   A set of values of type IntWritable
   *   A Context object
   */
  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {

    long sum = 0, count = 0;

    /*
     * For each value in the set of values passed to us by the mapper,
     * add the value to the sum and increment the count.
     */
    for (IntWritable value : values) {
      sum += value.get();
      count++;
    }

    if (count != 0) {

      /*
       * The average length is the sum of the values divided by the count.
       */
      double result = (double) sum / (double) count;

      /*
       * Call the write method on the Context object to emit a key (the
       * words' starting letter) and a value (the average length of words
       * starting with that letter) from the reduce method.
       */
      context.write(key, new DoubleWritable(result));
    }
  }
}
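The averaging logic can also be verified locally with MRUnit's ReduceDriver; as with the mapper test, the dependency and the test class name are assumptions:

  import java.util.Arrays;

  import org.apache.hadoop.io.DoubleWritable;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
  import org.junit.Test;

  public class AverageReducerTest {

    @Test
    public void testAverage() throws Exception {
      ReduceDriver<Text, IntWritable, Text, DoubleWritable> driver =
          ReduceDriver.newReduceDriver(new AverageReducer());

      // Words of length 3 and 5 starting with "a" should average 4.0.
      driver.withInput(new Text("a"),
                Arrays.asList(new IntWritable(3), new IntWritable(5)))
            .withOutput(new Text("a"), new DoubleWritable(4.0))
            .runTest();
    }
  }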
Lab Experiment
1. Build the project and run the MapReduce job (see the command sketch below).
2. Review the result.
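A minimal command sketch for the two steps, reusing the placeholder jar and directory names from the submission examples above:

  # 1. Package the classes into avgwordlength.jar with your usual build
  #    tool, then submit the job:
  hadoop jar avgwordlength.jar solution.AvgWordLength shakespeare wordlengths

  # 2. Review the result: each output line is a starting letter, a tab,
  #    and the average length of words beginning with that letter.
  hadoop fs -cat wordlengths/part-r-00000 | head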