Projects and Directories Used in this Exercise
In this exercise you write a MapReduce job that reads any text input and computes the average length of all words that start with each character. For any text input, the job should report the average length of words that begin with 'a', 'b', and so forth. For example, for input:
The output would be:
The algorithm is a simple one-pass MapReduce program:
The Mapper receives a line of text for each input value. (Ignore the input key.) For each word in the line, emit the first letter of the word as the key and the length of the word as the value. See the source code below:
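The map step described above can be sketched in plain Java, without the Hadoop types, to show which pairs are emitted. This is only an illustration under stated assumptions: the class name `LetterMapperSketch`, the method `mapLine`, the sample line, and the choice to split on non-word characters are all hypothetical and not taken from the exercise's own Mapper. In the real job this logic would sit inside the Mapper's `map` method and write each pair to the context instead of collecting a list.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the map step (hypothetical names, no Hadoop dependency).
class LetterMapperSketch {

    // Returns one (first letter, word length) pair per word in the line.
    // Assumption: words are separated by runs of non-word characters (\W+).
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split("\\W+")) {
            // split() can yield an empty first token when the line starts with a separator
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word.substring(0, 1), word.length()));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical sample line, used only to show the emitted pairs
        for (Map.Entry<String, Integer> pair : mapLine("No now is definitely not the time")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Note that this sketch keeps the letter's original case, so 'N' and 'n' become different keys; whether the job should fold case is a design decision the exercise text does not state.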
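Thanks to the shuffle and sort phase built into MapReduce, the Reducer receives the keys in sorted order, and all the values for one key are grouped together. So for the Mapper output above, the Reducer receives each first letter together with the list of lengths of the words that start with it, and it emits the average of those lengths. The Reducer source code is below: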
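The reduce step can be sketched the same way in plain Java. This is a minimal illustration, not the exercise's Reducer: the class name `AverageReducerSketch`, the method `average`, and the sample values are hypothetical. In the real job the grouped values would arrive as an `Iterable` in the Reducer's `reduce` method, and the result would be written to the context with the letter as the key.

```java
import java.util.Arrays;

// Plain-Java stand-in for the reduce step (hypothetical names, no Hadoop dependency).
class AverageReducerSketch {

    // Averages all word lengths that were grouped under one key.
    static double average(Iterable<Integer> lengths) {
        long sum = 0;
        long count = 0;
        for (int length : lengths) {
            sum += length;
            count++;
        }
        // Hadoop does not call reduce for a key with no values; the guard only protects this sketch
        if (count == 0) {
            throw new IllegalArgumentException("no values for key");
        }
        // Cast before dividing so the result is not truncated by integer division
        return (double) sum / count;
    }

    public static void main(String[] args) {
        // Hypothetical grouped input for one key: three words of lengths 2, 3 and 3
        System.out.println("n\t" + average(Arrays.asList(2, 3, 3)));
    }
}
```

Accumulating a sum and a count in one pass avoids storing the values, which matters because a frequent first letter can have a very large number of values.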
The driver is almost the same as the one in WordCount. The source code is below:
Under ~/workspace/averagewordlength, you should have a build.xml file that ant can use to build the project.
1. Build the project
2. Run the MapReduce program
3. Review the results
The output file should list every letter and number that begins a word in the data set, along with the average length of the words starting with it.