Tuesday, June 16, 2009

Irregular Expressions

A friend sent me the following code, which did not work as he expected:

    public class IrregularExpressions {
        public static void main(String[] args) {
            String str = " status=\"The word deleted is in this sentence and no carriage return";
            String str1 = " status=\"The word deleted is in this sentence and carriage return\n";

            // Run with "java -ea" so that assert statements are checked.
            assert str.matches(".*deleted.*");
            // Fails: "." does not match the trailing \n, so matches() cannot cover the whole string.
            assert str1.matches(".*deleted.*") : "carriage return threw off the regex";
        }
    }

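If you run this code with assertions enabled (java -ea), the second assertion fails. By default "." does not match a line terminator, so ".*" stops at the trailing \n (strictly a line feed, although the strings call it a carriage return) and matches() cannot cover the whole string. A little looking around showed us that this is documented in the Javadoc for java.util.regex.Pattern: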

The regular expression . matches any character except a line terminator unless the DOTALL flag is specified

In other words, the DOTALL flag has to be switched on. One way to do that is with the rather ugly Pattern class, as follows:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class IrregularExpressions {
        public static void main(String[] args) {
            String str = "The word deleted is in this sentence and no carriage return";
            String str1 = "The word deleted is in this sentence and carriage return\n";

            // DOTALL makes "." match line terminators as well.
            Pattern p = Pattern.compile(".*deleted.*", Pattern.DOTALL);
            Matcher m = p.matcher(str);
            assert m.matches();
            m = p.matcher(str1);
            assert m.matches();
        }
    }

The key to making this work is the Pattern.DOTALL flag passed to Pattern.compile.
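
If the Pattern and Matcher boilerplate feels heavy, the same flag can also be switched on inside the pattern string itself with the embedded flag expression (?s), which the Pattern Javadoc lists as the inline form of DOTALL. A minimal sketch, where the class name EmbeddedDotAll and the helper containsDeleted are made up for illustration:

```java
public class EmbeddedDotAll {

    // Hypothetical helper: (?s) turns on DOTALL for the whole pattern,
    // so "." also matches line terminators such as \n.
    static boolean containsDeleted(String input) {
        return input.matches("(?s).*deleted.*");
    }

    public static void main(String[] args) {
        String str1 = "The word deleted is in this sentence and carriage return\n";

        // Without the flag the trailing \n prevents a full match.
        System.out.println(str1.matches(".*deleted.*"));
        // With the embedded flag the full match succeeds.
        System.out.println(containsDeleted(str1));
    }
}
```

This keeps the one-line String.matches call, at the price of a slightly more cryptic pattern.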

And how about in Groovy? The equivalent assertions pass:

    String str = " status=\"The word deleted is in this sentence and no carriage return";
    String str1 = " status=\"The word deleted is in this sentence and carriage return\n";
    assert str =~ /.*deleted.*/
    assert str1 =~ /.*deleted.*/

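As has been written about extensively, the regular expression syntax is one of the really well done features of Groovy. Note, though, that these assertions pass because =~ builds a Matcher whose truth value comes from find(), which accepts a partial match, and not because "." matches the newline. With the exact-match operator ==~, which behaves like matches(), the second assertion fails just as it does in Java unless the pattern starts with (?s).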
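
The difference between a partial and a full match can be reproduced in plain Java. A minimal sketch, assuming that Groovy's =~ check corresponds to Matcher.find() and a full match to Matcher.matches(); the class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

public class FindVersusMatches {

    // Same pattern as in the post, compiled without DOTALL.
    private static final Pattern DELETED = Pattern.compile(".*deleted.*");

    // Partial match: succeeds if any part of the input matches the pattern.
    static boolean found(String input) {
        return DELETED.matcher(input).find();
    }

    // Full match: the pattern has to consume the entire input.
    static boolean fullyMatched(String input) {
        return DELETED.matcher(input).matches();
    }

    public static void main(String[] args) {
        String str1 = "The word deleted is in this sentence and carriage return\n";
        System.out.println(found(str1));
        System.out.println(fullyMatched(str1));
    }
}
```

With find() the pattern only has to match the text before the newline, which is why the trailing \n makes no difference there.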
