Sunday, April 26, 2015

[ FAQ ] How to get the input file name in the mapper in a Hadoop program?

Source From Here
Question
How can I get the name of the input file within a mapper? I have multiple input files stored in the input directory, each mapper may read a different file, and I need to know which file a given mapper has read.

How-To
First, get the InputSplit object. With the MapReduce v2 API, this is done as follows:
    ...
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        // The split describes the chunk of input this map task is processing.
        InputSplit inputSplit = context.getInputSplit();
    }
    ...
To get the file path and the file name, you first need to cast the result to FileSplit (in the v2 API this is org.apache.hadoop.mapreduce.lib.input.FileSplit, not the older org.apache.hadoop.mapred.FileSplit). The input file path can then be obtained like this:
    Path filePath = ((FileSplit) context.getInputSplit()).getPath();
    String filePathString = filePath.toString();
Similarly, to get just the file name, call getName() on the path:
    String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
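To make the difference between the two results concrete, here is a small stand-alone sketch that uses plain strings instead of Hadoop classes. It relies on the documented behavior of Path.getName(), which returns the final component of the path. The class name SplitPathDemo, the helper baseName, and the sample HDFS URI are all hypothetical and serve only as an illustration. This is not Hadoop's own implementation.

```java
public class SplitPathDemo {

    // Mimics what Path.getName() is documented to return:
    // the final component of the path (everything after the last '/').
    public static String baseName(String fullPath) {
        int lastSlash = fullPath.lastIndexOf('/');
        return fullPath.substring(lastSlash + 1);
    }

    public static void main(String[] args) {
        // Hypothetical value of filePath.toString() inside a mapper.
        String filePathString =
                "hdfs://namenode:8020/user/demo/input/logs-2015-04-26.txt";

        // Full path, comparable to filePathString in the snippet above.
        System.out.println(filePathString);

        // File name only, comparable to fileName in the snippet above.
        System.out.println(baseName(filePathString)); // logs-2015-04-26.txt
    }
}
```

In a real mapper you would normally use the full path string when files in different directories can share a name, and the bare file name when the input directory is flat.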

