Tuesday, March 10, 2015

[ Java Article Collection ] Backreferences in Java Regular Expressions

Source From Here
Preface
Backreferences are another important feature of Java regular expressions.

To understand backreferences, we first need to understand groups. A group in a regular expression treats multiple characters as a single unit. A group is created by placing the characters to be grouped inside a pair of parentheses, "()". Each pair of parentheses corresponds to one group, and groups are numbered from 1 in the order of their opening parentheses.
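As a small illustration of groups and their numbering, here is a minimal sketch. The class name GroupDemo, the method captureGroups, and the pattern "(\d\d\d)-(\w+)" are made up for this example and are not from the original article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows how each pair of parentheses becomes a numbered group.
class GroupDemo {
    // Returns { whole match, group 1, group 2 } for the first match, or null if nothing matches.
    static String[] captureGroups(String input) {
        // Group 1: three digits. Group 2: one or more word characters after the hyphen.
        Pattern p = Pattern.compile("(\\d\\d\\d)-(\\w+)");
        Matcher m = p.matcher(input);
        if (!m.find()) {
            return null;
        }
        // group(0) is always the whole match; numbered groups start at 1.
        return new String[] { m.group(0), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] g = captureGroups("id 123-abc");
        System.out.println(g[0]); // 123-abc
        System.out.println(g[1]); // 123
        System.out.println(g[2]); // abc
    }
}
```

The number passed to m.group(n) is the same number a backreference uses, so \1 would refer to whatever group 1 captured.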

Backreferences are convenient because they let us repeat a pattern without writing it again. We can refer to a previously defined group with \# (where # is the group number). A backreference matches the same text that the group captured, not just any text that fits the group's pattern. This will make more sense after you read the following two examples.

Example 1: Finding Repeated Pattern
(\d\d\d)\1 matches 123123, but does not match 123456. This shows that the text matched by the backreference needs to be exactly the same as the text captured by the group. Below is the sample code:
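  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  // The statements below go inside a method such as main.
  String str = "123123456";
  // Group 1 captures three digits; \1 requires the same three digits again.
  Pattern p = Pattern.compile("(\\d\\d\\d)\\1");
  Matcher m = p.matcher(str);
  // groupCount() reports the number of capturing groups in the pattern.
  System.out.println(m.groupCount());
  while (m.find()) {
      String word = m.group();
      System.out.println(word + " " + m.start() + " " + m.end());
  }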
Execution result:
1
123123 0 6
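To check the claim that the backreference needs the same digits rather than any three digits, here is a minimal sketch that runs the pattern against both inputs. The class name BackrefCheck and the method hasRepeatedTriple are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: contrasts a backreference with simply repeating the sub-pattern.
class BackrefCheck {
    // Same pattern as Example 1: three digits followed by the same three digits.
    private static final Pattern REPEATED = Pattern.compile("(\\d\\d\\d)\\1");
    // For comparison: any six digits, with no backreference.
    private static final Pattern SIX_DIGITS = Pattern.compile("\\d\\d\\d\\d\\d\\d");

    static boolean hasRepeatedTriple(String s) {
        return REPEATED.matcher(s).find();
    }

    static boolean hasSixDigits(String s) {
        return SIX_DIGITS.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasRepeatedTriple("123123")); // true
        System.out.println(hasRepeatedTriple("123456")); // false
        System.out.println(hasSixDigits("123456"));      // true
    }
}
```

Writing \d\d\d twice accepts any six digits, while \1 accepts only a repeat of what group 1 actually captured.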

Example 2: Finding Duplicate Words
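The sample code below finds duplicate words in a sentence. However, this is not a good way to find duplicate words with a regular expression. As the execution result shows, the first "duplicate" is never reported: it falls inside the first match ("unique is not duplicate but unique"), and the matcher resumes searching only after the end of that match.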
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  // The statements below go inside a method such as main.
  // Group 1 captures a whole word; \1 requires the same word to appear again later.
  String pattern = "\\b(\\w+)\\b[\\w\\W]*\\b\\1\\b";
  Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
  String phrase = "unique is not duplicate but unique, Duplicate is duplicate.";
  Matcher m = p.matcher(phrase);
  while (m.find()) {
      String val = m.group();
      System.out.println("Matching subsequence is \"" + val + "\"");
      System.out.println("Duplicate word: " + m.group(1) + "\n");
  }
Execution result:
Matching subsequence is "unique is not duplicate but unique"
Duplicate word: unique

Matching subsequence is "Duplicate is duplicate"
Duplicate word: Duplicate
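
Since the regular expression above skips words that fall inside an earlier match, one possible alternative is to split the phrase into words and track the ones already seen. This is only a sketch under assumptions that are not from the original article: the class name DuplicateWords and the method findDuplicates are hypothetical, words are taken to be runs of \w characters, and comparison is case-insensitive.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical demo class: finds duplicate words with a set instead of a backreference.
class DuplicateWords {
    // Returns the lower-cased words that occur more than once, in the order
    // in which their first repetition is encountered.
    static Set<String> findDuplicates(String phrase) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        // Split on runs of non-word characters (spaces, punctuation).
        for (String word : phrase.split("\\W+")) {
            if (word.isEmpty()) {
                continue; // a leading separator produces an empty first token
            }
            String key = word.toLowerCase(Locale.ROOT);
            // add() returns false when the word was already present.
            if (!seen.add(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        String phrase = "unique is not duplicate but unique, Duplicate is duplicate.";
        System.out.println(findDuplicates(phrase)); // [unique, duplicate, is]
    }
}
```

Unlike the regex version, this also reports "is", which appears twice in the phrase but is never the first word of a match in Example 2.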

