程式扎記: [ Java FAQ ] Match multiline text using regular expression

Monday, June 24, 2013

Source: here
Question:
Consider the following code snippet:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Input contains tabs and several line breaks after the label.
    String test = "User Comments: This is \t a\ta \n test \n\n message \n";

    String pattern1 = "User Comments:\\s*(.*)";
    Pattern p = Pattern.compile(pattern1);
    Matcher mth = p.matcher(test);
    if (mth.find()) {
        // group(1) is whatever (.*) captured after the label
        System.out.printf("\t[Info] Find '%s'!\n", mth.group(1).trim());
    } else {
        System.out.printf("\t[Info] Miss!\n");
    }
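The output is (runs of tabs and spaces are shown collapsed here):
[Info] Find 'This is a a'!

In other words, by default the regular expression "." does not match "\n" (a line terminator), so (.*) stops at the first line break and captures only up to the second 'a'. What we want is for "." to match line breaks as well.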
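To make the default behavior concrete, here is a minimal sketch that checks a lone "." against a few single characters. The class name DotDefaultDemo and the helper dotMatches are made up for illustration; only Pattern.matches comes from java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class DotDefaultDemo {
    // True if a lone "." matches the entire one-character input
    // under the default flags (no DOTALL).
    static boolean dotMatches(String ch) {
        return Pattern.matches(".", ch);
    }

    public static void main(String[] args) {
        System.out.println("space:   " + dotMatches(" "));  // true
        System.out.println("tab:     " + dotMatches("\t")); // true
        System.out.println("newline: " + dotMatches("\n")); // false
        System.out.println("CR:      " + dotMatches("\r")); // false
    }
}
```

Tabs and spaces are ordinary characters to ".", which is why the capture in the snippet above runs through the tabs and only stops at the first "\n".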

Solution:
See the documentation for the Pattern class. It provides a DOTALL flag that you can pass when compiling the pattern so that "." also matches line terminators:
Enables dotall mode.
In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.
Dotall mode can also be enabled via the embedded flag expression (?s). (The s is a mnemonic for "single-line" mode, which is what this is called in Perl.)

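So the code above needs only one change. Either pass the flag when compiling the pattern:
    Pattern p = Pattern.compile(pattern1, Pattern.DOTALL);
or embed the flag in the pattern itself:
    String pattern1 = "(?s)User Comments:\\s*(.*)";
Either way the pattern runs in "dotall mode", where "." also matches line terminators, so (.*) captures everything up to the end of the input and the trimmed result runs from "This" through "message".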
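Putting the pieces together, the following is a small self-contained sketch that runs the same input through the default mode, the Pattern.DOTALL flag, and the embedded (?s) flag. The class name DotallDemo and the helper extract are assumptions made for this example, not part of the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing default mode with dotall mode.
class DotallDemo {
    static final String TEST =
        "User Comments: This is \t a\ta \n test \n\n message \n";

    // Hypothetical helper: returns the trimmed first capture group,
    // or null when the pattern does not match.
    static String extract(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String regex = "User Comments:\\s*(.*)";

        String byDefault = extract(regex, 0, TEST);              // no flags
        String withFlag  = extract(regex, Pattern.DOTALL, TEST); // flag argument
        String embedded  = extract("(?s)" + regex, 0, TEST);     // inline flag

        System.out.println(byDefault.contains("message")); // false
        System.out.println(withFlag.contains("message"));  // true
        System.out.println(withFlag.equals(embedded));     // true
    }
}
```

The flag argument and the embedded (?s) give the same result here. The embedded form can be handy when the pattern string is all you control, for example when it is passed to String.matches or read from configuration.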
