Preface
Files and Directories Used in this Exercise
In this exercise, you will analyze a log file from a web server to count the number of hits made from each unique IP address.
Your task is to count the number of hits made from each IP address in the sample web server log file that you uploaded to the /user/training/weblog directory in HDFS when you completed the "Using HDFS" exercise. For example, if the address 96.7.4.14 appears on three lines of the log, the job's output should pair 96.7.4.14 with the count 3.
Source Code
Mapper
The mapper extracts the IP address field from each log line and outputs (IP address, 1) pairs:
- solution/LogFileMapper.java
package solution;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Example input line:
 * 96.7.4.14 - - [24/Apr/2011:04:20:11 -0400] "GET /cat.jpg HTTP/1.1" 200 12433
 */
public class LogFileMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    /*
     * Split the input line into space-delimited fields.
     */
    String[] fields = value.toString().split(" ");
    if (fields.length > 0) {

      /*
       * Emit the first field - the IP address - as the key
       * and the number 1 as the value.
       */
      String ip = fields[0];
      context.write(new Text(ip), new IntWritable(1));
    }
  }
}
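To sanity-check the mapper against the sample line shown in the comment above, you could drive it with a single record and assert on the emitted pair. The sketch below is an optional illustration, not part of the exercise solution: it assumes the MRUnit library (org.apache.hadoop.mrunit) and JUnit are on the classpath, and the class name LogFileMapperTest is made up for this example.

package solution;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

// Hypothetical test class, assuming MRUnit and JUnit are available.
public class LogFileMapperTest {

  @Test
  public void mapperEmitsIpAndOne() throws IOException {
    // Feed the sample log line to the mapper and check that it emits
    // (IP address, 1) for that single record.
    new MapDriver<LongWritable, Text, Text, IntWritable>()
        .withMapper(new LogFileMapper())
        .withInput(new LongWritable(0), new Text(
            "96.7.4.14 - - [24/Apr/2011:04:20:11 -0400] \"GET /cat.jpg HTTP/1.1\" 200 12433"))
        .withOutput(new Text("96.7.4.14"), new IntWritable(1))
        .runTest();
  }
}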
Reducer
The reducer sums the counts for each IP address and outputs (IP address, total hits) pairs:
- solution/SumReducer.java
package solution;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * This is the SumReducer class from the word count exercise.
 */
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {

    // Sum all the counts emitted by the mappers for this key (IP address).
    int wordCount = 0;
    for (IntWritable value : values) {
      wordCount += value.get();
    }
    context.write(key, new IntWritable(wordCount));
  }
}
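Under the same assumptions as the mapper sketch above (MRUnit and JUnit on the classpath; the test class name is hypothetical), a similar sketch checks that three 1s for one IP address collapse into a single count of 3:

package solution;

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;

// Hypothetical test class, assuming MRUnit and JUnit are available.
public class SumReducerTest {

  @Test
  public void reducerSumsCountsPerIp() throws IOException {
    // Three map outputs for the same IP should collapse into one
    // (IP address, 3) record.
    new ReduceDriver<Text, IntWritable, Text, IntWritable>()
        .withReducer(new SumReducer())
        .withInput(new Text("96.7.4.14"),
            Arrays.asList(new IntWritable(1), new IntWritable(1), new IntWritable(1)))
        .withOutput(new Text("96.7.4.14"), new IntWritable(3))
        .runTest();
  }
}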
Driver
The driver is straightforward:
- solution/ProcessLogs.java
package solution;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ProcessLogs {

  public static void main(String[] args) throws Exception {

    if (args.length != 2) {
      System.out.printf("Usage: ProcessLogs <input dir> <output dir>\n");
      System.exit(-1);
    }

    Job job = new Job();
    job.setJarByClass(ProcessLogs.class);
    job.setJobName("Process Logs");

    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(LogFileMapper.class);
    job.setReducerClass(SumReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    boolean success = job.waitForCompletion(true);
    System.exit(success ? 0 : 1);
  }
}
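One optional refinement the driver does not include: because SumReducer simply adds partial counts, it could also be registered as a combiner so that each map task pre-aggregates its own output before the shuffle. This is a suggestion rather than part of the original solution; the change would be a single extra line inside main(), for example right after the reducer is set:

    job.setMapperClass(LogFileMapper.class);
    job.setReducerClass(SumReducer.class);
    // Optional: reuse SumReducer as a combiner for map-side pre-aggregation.
    // Safe here because summing partial sums gives the same final totals.
    job.setCombinerClass(SumReducer.class);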
Lab Experiment
1. Build the project and run the MapReduce program
2. Review the results