|
Description
|
FULL PRODUCT VERSION :
java version "1.4.2_08"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_08-b03)
Java HotSpot(TM) Client VM (build 1.4.2_08-b03, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
customer Windows XP [Version 5.1.2600]
A DESCRIPTION OF THE PROBLEM :
In trying to match a regular expression such as, "(a|e|i|o|u)?(f|v|ff|vv)(a|e|i|o|u)?(r|rr)(a|e|i|o|u)?(s|ss)(a|e|i|o|u)?" to any given string such as "mare" produces instant results. However, trying to match a regular expression that is lengthier than the above example takes three minutes or longer.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile and execute the standalone code included.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The output should be:
false
RE match completed in 0.016 seconds.
true
RE match completed in 0.016 seconds.
ACTUAL -
false
RE match completed in 0.016 seconds.
true
RE match completed in 160.922 seconds.
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.io.*;
import java.util.*;
import java.util.regex.*;
public class Test
{
public static void main( String argv[] )
{
long startTime = System.currentTimeMillis();
String s1 = "(a|e|i|o|u)?(f|v|ff|vv)(a|e|i|o|u)?(r|rr)(a|e|i|o|u)?(s|ss)(a|e|i|o|u)?";
Pattern p = Pattern.compile( s1 );
String mare = new String( "mare" );
Matcher m = p.matcher( mare );
boolean b = m.matches();
System.out.println( b );
long finishTime = System.currentTimeMillis();
System.out.println( "RE match completed in " + (double)(finishTime - startTime)/1000 + " seconds.");
startTime = System.currentTimeMillis();
String s2 = "(a|e|i|o|u)?(f|v|ff|vv)(a|e|i|o|u)?(r|rr)(a|aa|a|null)?(n|nn)(a|e|i|o|u)?(k|kk)(a|e|i|o|u)?(f|v|ff|vv)(a|e|i|o|u)?(r|rr)(a|e|i|o|u)?(t|tt)(a|e|i|o|u)?";
p = Pattern.compile( s2 );
String frankfurt = new String( "frankfurt" );
m = p.matcher( str );
b = m.matches();
System.out.println( b );
finishTime = System.currentTimeMillis();
System.out.println( "RE match completed in " + (double)(finishTime - startTime)/1000 + " seconds.");
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
I am in the process of writing my own regular expression pattern parser and matcher. If I didn't have to use Java, Perl and Python are able to accomplish this same task in seconds.
xxxxx@xxxxx 2005-05-17 04:52:07 GMT
|
|
Comments
|
Submitted On 17-MAY-2005
STREMLER
Another workaround would be to anchor the pattern and then to loop over the substrings of the target string (e.g. match against "frankfurt", "rankfurt", "ankfurt", "nkfurt", "kfurt", "furt", "urt", "rt", "t"), as this would still result in a performance improvement.
Submitted On 11-SEP-2005
uncle_alice
This is just a duplicate of 5013651; it isn't the matching that takes so long, it's compiling the regex. If you add a '^' to the beginning of the regex, the second test goes just as quickly as the first. Also, if you write the regex more sensibly, such as using "[aeiou]" instead of "(a|e|i|o|u)" and "rr?" instead of "(r|rr)", compilation becomes virtually instantaneous even without adding the anchor.
PLEASE NOTE: JDK6 is formerly known as Project Mustang
|