程式扎記

Preface:
Suppose you have a URL and want an image of the web page it points to. The brute-force way is:

1. Open a browser
2. Enter the URL
3. Take a screenshot

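If you want to automate this, you can use the browse(URI uri) method of Java's Desktop class to open the browser at the URL, and the createScreenCapture(Rectangle screenRect) method of the Robot class to grab a screenshot of the desktop. This only works in an environment with a UI, though. On a Linux machine running without one (runlevel <= 2), an exception is thrown: in a headless JVM, Desktop.getDesktop() throws HeadlessException and new Robot() throws AWTException. In that case you have to go to the slightly more involved second method.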
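To choose between the two methods at run time, one option is to ask AWT whether a display is available before touching Desktop or Robot. The following is a minimal sketch and not part of the original code: the class name UiCheck and the method canUseDesktopCapture are hypothetical, while the AWT calls themselves are standard JDK API.

```java
import java.awt.Desktop;
import java.awt.GraphicsEnvironment;

class UiCheck {
    /** Returns true only if a display exists and the desktop can open a browser. */
    static boolean canUseDesktopCapture() {
        // In a headless JVM, Desktop.getDesktop() and new Robot() would throw,
        // so check this first and return early.
        if (GraphicsEnvironment.isHeadless()) {
            return false;
        }
        return Desktop.isDesktopSupported()
                && Desktop.getDesktop().isSupported(Desktop.Action.BROWSE);
    }

    public static void main(String[] args) {
        System.out.println(canUseDesktopCapture()
                ? "UI available: method 1 should work"
                : "No UI: fall back to method 2");
    }
}
```

The check only says whether method 1 can be attempted; it does not guarantee that the browser finishes loading the page before the screenshot is taken.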

Method 1:
This can be done with classes that ship with Java, provided you are in an environment with a UI (otherwise the browser cannot be opened, orz). Sample code:

import java.awt.Desktop;
import java.awt.Dimension;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;
import java.io.File;
import java.net.URI;
import javax.imageio.ImageIO;

/**
 * BD: Take Image of web page from given URL.
 * REF:
 *  - http://stackoverflow.com/questions/2746289/converting-webpage-into-jpeg-image-using-java
 *  - http://stackoverflow.com/questions/602032/getting-java-gui-to-open-a-webpage-in-web-browser
 *  - http://stackoverflow.com/questions/7032556/how-to-take-a-webpage-screenshot
 * @param url URL to browse
 * @param output Output image file (.png)
 * @param st Sleep time in milliseconds; it should exceed the page load time to get a correct image.
 * @throws Exception
 */
public static void TakeImageOfWebPage(String url, File output, long st) throws Exception
{
    // Open the URL in the default browser.
    Desktop.getDesktop().browse(new URI(url));
    Robot robot = new Robot();
    Dimension screenDim = Toolkit.getDefaultToolkit().getScreenSize();
    Rectangle rect = new Rectangle(screenDim);
    // Give the page time to load before grabbing the whole screen.
    Thread.sleep(st);
    BufferedImage img = robot.createScreenCapture(rect);
    ImageIO.write(img, "png", output);
}

To capture an image of the Yahoo web page, call the API above like this:

public static void main(String[] args) throws Exception{          
    TakeImageOfWebPage("http://www.kimo.com.tw", new File("yahoo.png"), 1000);            
}  

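Your browser opens, and the output file "yahoo.png" is an image of the current Yahoo home page. One annoyance: every run opens another tab, so after several runs the browser has that many tabs open.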

Method 2:
The second method does not pile up browser tabs the way the first one does, and it works on Linux at runlevel 2 with no UI. Before using it, you need to download a few libraries:

* selenium: Selenium browser automation framework
* Apache Commons IO: Commons IO is a library of utilities to assist with developing IO functionality.
* google-collections: Guava is a fully compatible superset of the old Google Collections Library.
* java-json.jar

You also need to install xvfb. On Ubuntu, the following command takes care of it:

# apt-get install xvfb

Sample code follows:

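import java.io.File;
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxBinary;
import org.openqa.selenium.firefox.FirefoxDriver;

private static int      DISPLAY_NUMBER  = 99;
private static String   XVFB            = "/usr/bin/Xvfb";
private static String   XVFB_COMMAND    = XVFB + " :" + DISPLAY_NUMBER;

public static void TakeImageOfWebPageViaFirefox(String url, String fn) throws Exception
{
    // http://stackoverflow.com/questions/7032556/how-to-take-a-webpage-screenshot
    // https://code.google.com/p/selenium/
    // Start a virtual X server so Firefox has a display to render to.
    Process p = Runtime.getRuntime().exec(XVFB_COMMAND);
    try {
        FirefoxBinary firefox = new FirefoxBinary();
        firefox.setEnvironmentProperty("DISPLAY", ":" + DISPLAY_NUMBER);
        // (FirefoxBinary, FirefoxProfile) constructor from the Selenium 2 API.
        WebDriver driver = new FirefoxDriver(firefox, null);
        driver.get(url);
        File scrFile = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
        FileUtils.copyFile(scrFile, new File(fn));
        driver.close();
    } finally {
        // Always stop Xvfb, even if the capture fails.
        p.destroy();
    }
}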

This time we capture an image of the Google web page and save it to the file "google.png":

TakeImageOfWebPageViaFirefox("http://www.google.com.tw", "google.png");   
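
Xvfb starts with a default virtual screen. If you want the page rendered on a screen of a specific geometry, Xvfb's -screen option can be added to the command line. The sketch below illustrates this under stated assumptions: the class XvfbLauncher, its method names, and the 1280x1024x24 geometry used in the usage note are hypothetical and not part of the original code, and the /usr/bin/Xvfb path is taken from the sample above.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class XvfbLauncher {
    /** Builds the argument list for: Xvfb :<display> -screen 0 <w>x<h>x<depth> */
    static List<String> command(int display, int width, int height, int depth) {
        return Arrays.asList(
                "/usr/bin/Xvfb",
                ":" + display,
                "-screen", "0", width + "x" + height + "x" + depth);
    }

    /** Starts Xvfb; the caller is responsible for calling destroy() on the result. */
    static Process start(int display, int width, int height, int depth) throws IOException {
        return new ProcessBuilder(command(display, width, height, depth))
                .inheritIO() // let Xvfb write to this process's stdout/stderr
                .start();
    }
}
```

For example, XvfbLauncher.start(99, 1280, 1024, 24) could stand in for the Runtime.getRuntime().exec(XVFB_COMMAND) call, with DISPLAY still set to ":99" for Firefox.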

Supplement:
* stackoverflow: How to take a webpage screenshot?
* ConvertWebpage: a tool that does the conversion for you automatically
* CutyCapt

CutyCapt is a small cross-platform command-line utility to capture WebKit's rendering of a web page into a variety of vector and bitmap formats, including SVG, PDF, PS, PNG, JPEG, TIFF, GIF, and BMP...

Example of running CutyCapt without an X server:
# xvfb-run --auto-servernum ./CutyCapt --url=http://cutycapt.sourceforge.net --out=cutycapt.png
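
If you would rather drive CutyCapt from Java than from the shell, the same command line can be handed to ProcessBuilder. This is only a sketch: the class CutyCaptRunner and its methods are hypothetical names, and it assumes xvfb-run is on the PATH and that the path you pass really points at the CutyCapt binary. The flags are the ones from the command above.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class CutyCaptRunner {
    /** Builds: xvfb-run --auto-servernum <cutyCaptPath> --url=<url> --out=<out> */
    static List<String> command(String cutyCaptPath, String url, String out) {
        return Arrays.asList(
                "xvfb-run", "--auto-servernum",
                cutyCaptPath, "--url=" + url, "--out=" + out);
    }

    /** Runs the command, waits for it to finish, and returns its exit code. */
    static int capture(String cutyCaptPath, String url, String out)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(cutyCaptPath, url, out))
                .inheritIO() // show CutyCapt's own messages on the console
                .start();
        return p.waitFor();
    }
}
```

Calling CutyCaptRunner.capture("./CutyCapt", "http://cutycapt.sourceforge.net", "cutycapt.png") would then correspond to the shell command shown above.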

Thursday, September 12, 2013

[ Java Code Template ] Capturing an image of a web page
