With HttpClient 4.x, to control how an HTTP GET handles redirects with status code 301 (Moved Permanently) or 302 (Found), override the behavior of DefaultRedirectStrategy and install it with setRedirectStrategy(). Example code:
- // Requires HttpClient 4.x (org.apache.http.*) plus java.util.List and java.util.ArrayList.
- // Assumed declaration (not shown in the original snippet): collects every redirect URL followed.
- final List<String> redirtList = new ArrayList<String>();
- DefaultHttpClient httpClient = new DefaultHttpClient();
- httpClient.setRedirectStrategy(new DefaultRedirectStrategy() {
-     @Override
-     public HttpUriRequest getRedirect(HttpRequest request,
-                                       HttpResponse response,
-                                       HttpContext context)
-             throws ProtocolException {
-         // Let the default strategy build the redirect request, then record where it points.
-         HttpUriRequest redirect = super.getRedirect(request, response, context);
-         redirtList.add(redirect.getURI().toString());
-         System.out.printf("\t[Test] Redirect to '%s'\n", redirect.getURI());
-         return redirect;
-     }
-     @Override
-     public boolean isRedirected(HttpRequest request, HttpResponse resp, HttpContext context) {
-         boolean isRedirect = false;
-         try {
-             isRedirect = super.isRedirected(request, resp, context);
-         } catch (ProtocolException e) {
-             e.printStackTrace();
-         }
-         if (!isRedirect) {
-             // Treat 301/302 as redirects even when the default strategy declines them.
-             int responseCode = resp.getStatusLine().getStatusCode();
-             if (responseCode == 301 || responseCode == 302) {
-                 isRedirect = true;
-             }
-         }
-         return isRedirect;
-     }
- });
- // Static helpers of the HttpKit class used below.
- // Returns the charset declared in the response's Content-Type, or def when none is declared.
- public static String Charset(HttpResponse resp, String def)
- {
-     ContentType contentType = ContentType.getOrDefault(resp.getEntity());
-     Charset charSet = contentType.getCharset();
-     if (charSet != null) return charSet.name();
-     else return def;
- }
- // Reads the response body as text in charset cs, decompressing it first when it is gzip-encoded.
- public static String GetPageBody(HttpResponse resp, String cs) throws IOException {
-     //String cs = Charset(resp, "utf-8");
-     Header contentEncoding = resp.getFirstHeader("Content-Encoding");
-     if (contentEncoding != null && contentEncoding.getValue().equalsIgnoreCase("gzip")) {
-         InputStream instream = new GZIPInputStream(resp.getEntity().getContent());
-         BufferedReader br = new BufferedReader(new InputStreamReader(instream, cs));
-         try {
-             StringBuilder pageBodyBuf = new StringBuilder();
-             String line;
-             while ((line = br.readLine()) != null) pageBodyBuf.append(line).append('\n');
-             return pageBodyBuf.toString().trim();
-         } finally {
-             br.close(); // also closes the underlying entity stream
-         }
-     } else {
-         return EntityUtils.toString(resp.getEntity(), cs);
-     }
- }
- HttpGet get = new HttpGet("http://pp.ceromulta.cl/");
- HttpResponse resp = httpClient.execute(get);
- String pageBody = HttpKit.GetPageBody(resp, "utf-8");
- System.out.printf("\t[Info] Page Body:\n%s\n", pageBody);
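The gzip branch of GetPageBody can be tried without a network connection. The following is a minimal sketch that uses only the JDK; the class GzipBodyDemo and its methods gzip and readBody are hypothetical names for illustration and are not part of HttpKit or HttpClient.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBodyDemo {
    /** Compresses text with gzip, standing in for a server that sends Content-Encoding: gzip. */
    static byte[] gzip(String text, Charset cs) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(cs));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Reads a body as text; decompresses first when contentEncoding is "gzip" (may be null). */
    static String readBody(InputStream in, String contentEncoding, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                "gzip".equalsIgnoreCase(contentEncoding) ? new GZIPInputStream(in) : in, cs))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("<html>\n<body>hello</body>\n</html>", StandardCharsets.UTF_8);
        System.out.println(readBody(new ByteArrayInputStream(compressed), "gzip", StandardCharsets.UTF_8));
    }
}
```

Passing the charset explicitly mirrors the cs parameter of GetPageBody; decoding with the wrong charset would not fail but would garble non-ASCII text.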
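The key point is the meta http-equiv="refresh" element in the page body returned for the GET request to http://pp.ceromulta.cl/ above. This element tells the browser to redirect automatically, and it is a kind of redirect that HttpClient does not handle. The example code below shows how to handle it.
Example code:
First, we define a Pattern that captures this syntax: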
- Pattern redirPtn = Pattern.compile("<(?:META|meta|Meta) (?:HTTP-EQUIV|http-equiv)=\"refresh\".*URL=(.*)\">");
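To see what such a pattern captures, here is a small JDK-only sketch. It uses a hypothetical, more tolerant variant of the pattern (the names MetaRefreshPattern, META_REFRESH and findRefreshUrl are assumptions for illustration): it is case-insensitive, accepts single or double quotes, and stops capturing at the first quote, whitespace or '>', whereas a greedy (.*) can swallow markup that follows on the same line. It still assumes http-equiv appears before the URL inside the tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaRefreshPattern {
    // Hypothetical variant of the meta refresh pattern; group 1 is the refresh target.
    static final Pattern META_REFRESH = Pattern.compile(
            "<meta\\s+http-equiv\\s*=\\s*[\"']?refresh[\"']?[^>]*?url\\s*=\\s*['\"]?([^'\"\\s>]+)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the refresh target found in the HTML, or null when there is none. */
    static String findRefreshUrl(String html) {
        Matcher m = META_REFRESH.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up page used only for illustration.
        String html = "<html><head><META HTTP-EQUIV=\"refresh\" "
                + "CONTENT=\"0; URL=/home/index.html\"><link rel=\"x\" href=\"y\"></head></html>";
        System.out.println(findRefreshUrl(html));
    }
}
```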
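Then, whenever a meta refresh is found, a while loop keeps extracting the refresh URL and requesting the location it points to, until the returned page no longer contains a meta refresh. The complete code is as follows: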
- // Additionally requires java.net.URI, java.util.regex.Pattern and java.util.regex.Matcher.
- Pattern redirPtn = Pattern.compile("<(?:META|meta|Meta) (?:HTTP-EQUIV|http-equiv)=\"refresh\".*URL=(.*)\">");
- HttpContext localContext = new BasicHttpContext();
- HttpGet get = new HttpGet("http://pp.ceromulta.cl/");
- HttpResponse resp = httpClient.execute(get, localContext);
- String pageBody = GetPageBody(resp, "utf-8");
- Matcher mth = redirPtn.matcher(pageBody);
- while (mth.find())
- {
-     String rurl = mth.group(1).trim();
-     if (!rurl.startsWith("http:") && !rurl.startsWith("https:")) {
-         // Relative refresh URL: resolve it against the URL of the request that was just executed.
-         HttpUriRequest currentReq = (HttpUriRequest) localContext.getAttribute(ExecutionContext.HTTP_REQUEST);
-         HttpHost currentHost = (HttpHost) localContext.getAttribute(ExecutionContext.HTTP_TARGET_HOST);
-         String rdu = (currentReq.getURI().isAbsolute()) ? currentReq.getURI().toString() : (currentHost.toURI() + currentReq.getURI());
-         rurl = URI.create(rdu).resolve(rurl).toString();
-     }
-     System.out.printf("\t[Info] Refresh URL=%s...\n", rurl);
-     // Fetch the refresh target and check the new page for another meta refresh.
-     get = new HttpGet(rurl);
-     pageBody = GetPageBody(httpClient.execute(get, localContext), "utf-8");
-     mth = redirPtn.matcher(pageBody);
- }
- System.out.printf("\t[Info] Final Page Body:\n%s\n", pageBody);
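The loop above runs until a page without a meta refresh is reached, so two pages that refresh to each other would keep it running forever. The sketch below shows the same follow-the-refresh idea with a hop limit and with relative targets resolved through java.net.URI.resolve. It is only an illustration under stated assumptions: the class MetaRefreshFollower, the method follow, the maxHops parameter and the in-memory pages map (a stand-in for httpClient.execute) are all hypothetical, and the example.com URLs are made up.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaRefreshFollower {
    // Hypothetical tolerant pattern; group 1 is the refresh target.
    static final Pattern META_REFRESH = Pattern.compile(
            "<meta\\s+http-equiv\\s*=\\s*[\"']?refresh[\"']?[^>]*?url\\s*=\\s*['\"]?([^'\"\\s>]+)",
            Pattern.CASE_INSENSITIVE);

    /**
     * Follows meta refresh targets starting at startUrl and returns the last URL reached.
     * 'pages' maps URL to page body and stands in for real HTTP requests.
     * At most maxHops refreshes are followed, so a refresh loop cannot run forever.
     */
    static String follow(Map<String, String> pages, String startUrl, int maxHops) {
        URI current = URI.create(startUrl);
        for (int hop = 0; hop < maxHops; hop++) {
            String body = pages.get(current.toString());
            if (body == null) {
                break; // unknown page: nothing more to follow
            }
            Matcher m = META_REFRESH.matcher(body);
            if (!m.find()) {
                break; // no meta refresh: this is the final page
            }
            // resolve() handles absolute, root-relative and path-relative targets.
            current = current.resolve(m.group(1).trim());
        }
        return current.toString();
    }

    public static void main(String[] args) {
        Map<String, String> pages = new HashMap<>();
        pages.put("http://example.com/a/start.html",
                "<meta http-equiv=\"refresh\" content=\"0; URL=next.html\">");
        pages.put("http://example.com/a/next.html",
                "<meta http-equiv=\"refresh\" content=\"0; URL=/b/final.html\">");
        pages.put("http://example.com/b/final.html", "<html>done</html>");
        System.out.println(follow(pages, "http://example.com/a/start.html", 5));
    }
}
```

In the HttpClient version, the same limit could be applied by counting iterations of the while loop and stopping once the count exceeds a chosen maximum.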
Supplement:
* Ten major uses of the meta tag (Meta 十大功用)