Preface
Files and Directories Used in this Exercise
In this exercise, you will create a custom WritableComparable type that holds two strings.
Test the new type by creating a simple program that reads a list of names (first and last) and counts the number of occurrences of each name. The mapper should accept lines of the form lastname firstname other data.
The goal is to count the number of times a lastname/firstname pair occurs within the dataset. For example, for the input:
- Smith Joe 1963-08-12 Poughkeepsie, NY
- Smith Joe 1832-01-20 Sacramento, CA
- Murphy Alice 2004-06-02 Berlin, MA
We want to output:
- (Murphy,Alice) 1
- (Smith,Joe) 2
Solution Code
You need to implement a WritableComparable class that holds the two strings: implement the readFields, write, and compareTo methods required by WritableComparable, and generate the hashCode and equals methods. Here we define the StringPairWritable class to hold the pair of strings:
- Custom WritableComparable
package solution;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class StringPairWritable implements WritableComparable<StringPairWritable> {

  String left;
  String right;

  /**
   * Empty constructor - required for serialization.
   */
  public StringPairWritable() {
  }

  /**
   * Constructor with two String objects provided as input.
   */
  public StringPairWritable(String left, String right) {
    this.left = left;
    this.right = right;
  }

  /**
   * Serializes the fields of this object to out.
   */
  public void write(DataOutput out) throws IOException {
    out.writeUTF(left);
    out.writeUTF(right);
  }

  /**
   * Deserializes the fields of this object from in.
   */
  public void readFields(DataInput in) throws IOException {
    left = in.readUTF();
    right = in.readUTF();
  }

  /**
   * Compares this object to another StringPairWritable object by
   * comparing the left strings first. If the left strings are equal,
   * then the right strings are compared.
   */
  public int compareTo(StringPairWritable other) {
    int ret = left.compareTo(other.left);
    if (ret == 0) {
      return right.compareTo(other.right);
    }
    return ret;
  }

  /**
   * A custom method that returns the two strings in the
   * StringPairWritable object inside parentheses and separated by
   * a comma. For example: "(left,right)".
   */
  public String toString() {
    return "(" + left + "," + right + ")";
  }

  /**
   * The equals method compares two StringPairWritable objects for
   * equality. The equals and hashCode methods have been automatically
   * generated by Eclipse by right-clicking on an empty line, selecting
   * Source, and then selecting the Generate hashCode() and equals()
   * option.
   */
  @Override
  public boolean equals(Object obj) {
    if (this == obj)
      return true;
    if (obj == null)
      return false;
    if (getClass() != obj.getClass())
      return false;
    StringPairWritable other = (StringPairWritable) obj;
    if (left == null) {
      if (other.left != null)
        return false;
    } else if (!left.equals(other.left))
      return false;
    if (right == null) {
      if (other.right != null)
        return false;
    } else if (!right.equals(other.right))
      return false;
    return true;
  }

  /**
   * The hashCode method generates a hash code for a StringPairWritable
   * object. The equals and hashCode methods have been automatically
   * generated by Eclipse by right-clicking on an empty line, selecting
   * Source, and then selecting the Generate hashCode() and equals()
   * option.
   */
  @Override
  public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((left == null) ? 0 : left.hashCode());
    result = prime * result + ((right == null) ? 0 : right.hashCode());
    return result;
  }
}
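Before wiring the new type into a job, it can be handy to sanity-check the serialization and equality logic on its own. The following stand-alone snippet is a minimal sketch (the class name RoundTripCheck is ours, purely for illustration; it is not part of the lab code): it writes a pair to a byte stream, reads it back into a fresh instance, and confirms that equals, hashCode, and compareTo all agree. If write and readFields ever get out of sync (for example, fields written and read in different orders), this kind of round trip is usually the quickest way to notice.

package solution;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;

public class RoundTripCheck {

  public static void main(String[] args) throws Exception {
    StringPairWritable original = new StringPairWritable("Smith", "Joe");

    // Serialize the pair using the write() method we just implemented.
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    original.write(new DataOutputStream(bytes));

    // Deserialize it into a fresh, empty instance via readFields().
    StringPairWritable copy = new StringPairWritable();
    copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

    // equals()/hashCode()/compareTo() should all agree that the copy matches.
    System.out.println("equals:    " + original.equals(copy));                     // expect true
    System.out.println("hashCode:  " + (original.hashCode() == copy.hashCode()));  // expect true
    System.out.println("compareTo: " + original.compareTo(copy));                  // expect 0
    System.out.println("toString:  " + copy);                                      // expect (Smith,Joe)
  }
}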
The mapper simply extracts the last name/first name pair from each input line and uses it as the key, so that the occurrences of each name can be counted:
- Mapper
package solution;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StringPairMapper extends
    Mapper<LongWritable, Text, StringPairWritable, LongWritable> {

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    LongWritable one = new LongWritable(1);

    /*
     * Split the line into words. Create a new StringPairWritable consisting
     * of the first two strings in the line. Emit the pair as the key, and
     * '1' as the value (for later summing).
     */
    String[] words = value.toString().split("\\W+", 3);
    if (words.length > 2) {
      context.write(new StringPairWritable(words[0], words[1]), one);
    }
  }
}
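To see what the split("\W+", 3) call actually hands the mapper, here is a tiny sketch (the class name SplitDemo is ours, purely for illustration) run against the first sample line. The limit of 3 keeps the last name and first name as separate tokens and lumps everything else into the third element:

public class SplitDemo {
  public static void main(String[] args) {
    // Same split the mapper uses: at most 3 tokens, delimited by runs of non-word characters.
    String[] words = "Smith Joe 1963-08-12 Poughkeepsie, NY".split("\\W+", 3);
    for (String w : words) {
      System.out.println(w);
    }
    // Prints:
    //   Smith
    //   Joe
    //   1963-08-12 Poughkeepsie, NY
  }
}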
Hadoop provides some plug-in reducers for us to use. Here we leverage LongSumReducer to sum up the occurrences of each name. The last piece is the driver class:
- Driver
package solution;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class StringPairTestDriver extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {

    if (args.length != 2) {
      System.out.printf("Usage: " + this.getClass().getName()
          + " <input dir> <output dir>\n");
      return -1;
    }

    Job job = new Job(getConf());
    job.setJarByClass(StringPairTestDriver.class);
    job.setJobName("Custom Writable Comparable");

    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    /*
     * LongSumReducer is a Hadoop API class that sums values into
     * a LongWritable. It works with any key type, and therefore
     * supports the new StringPairWritable as a key type.
     */
    job.setReducerClass(LongSumReducer.class);
    job.setMapperClass(StringPairMapper.class);

    /*
     * Set the key output class for the job.
     */
    job.setOutputKeyClass(StringPairWritable.class);

    /*
     * Set the value output class for the job.
     */
    job.setOutputValueClass(LongWritable.class);

    boolean success = job.waitForCompletion(true);
    return success ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new Configuration(), new StringPairTestDriver(), args);
    System.exit(exitCode);
  }
}
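For reference, LongSumReducer is generic in its key type, which is why it accepts StringPairWritable keys unchanged. Conceptually it behaves like the hand-written reducer below (a simplified sketch for illustration only, not the actual Hadoop source; the class name NaiveLongSumReducer is ours):

package solution;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Reducer;

// A simplified stand-in for org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer:
// it sums the LongWritable values for each key and emits the key with the total.
public class NaiveLongSumReducer<KEY> extends
    Reducer<KEY, LongWritable, KEY, LongWritable> {

  @Override
  public void reduce(KEY key, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0;
    for (LongWritable value : values) {
      sum += value.get();
    }
    context.write(key, new LongWritable(sum));
  }
}

Because the summing logic never touches the key, any WritableComparable key, including our StringPairWritable, works without modification.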
Lab Experiment
You can use the simple test data in ~/training_materials/developer/data/nameyeartestdata to make sure your new type works as expected.
1. Build the project and execute the MapReduce job
2. Check the output result
Supplement
* [ Java Essence ] That thing in memory: how do you reference it? (object equality)