Bug Database
Bug ID: 6520207
Votes 1
Synopsis Dollar/UnixDollar bad behavior shouldn't match twice in "\n"
Category java:classes_util_regex
Reported Against
Release Fixed
State 3-Accepted, bug
Priority: 4-Low
Related Bugs
Submit Date 01-FEB-2007
Description
FULL PRODUCT VERSION :
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b06)
Java HotSpot(TM) Client VM (build 1.7.0-ea-b06, mixed mode, sharing)


ADDITIONAL OS VERSION INFORMATION :
Linux helium 2.6.17-10-generic #2 SMP Tue Dec 5 22:28:26 UTC 2006 i686 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
Pattern.compile("$").matcher("a\nb\nc\n") matches twice instead of once.

http://elliotth.blogspot.com/2007/01/what-do-anchors-and-mean-in-regular.html

the first match is the final line terminator. the second match is the end-of-input.

in MULTILINE mode this is unfortunate (because it's not Perl-compatible and should be listed in the incompatibilities with Perl 5 in the documentation), but it's understandable because of the "or" in the definition of what MULTILINE causes $ to match.

but in non-MULTILINE mode, this is incorrect (in that i don't see how it's specified by the documentation).
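
To make the two cases concrete, here is a minimal sketch that lists the offset of every "$" match in default mode and in MULTILINE mode. The class name DollarOffsets and the helper offsets() are hypothetical, introduced only for illustration; the input is the string from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DollarOffsets {
    // Hypothetical helper: returns the start offset of every match of regex in input.
    static List<Integer> offsets(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        List<Integer> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "a\nb\nc\n"; // 6 characters, ends with a line terminator
        // Default mode: one match before the final line terminator, one at end-of-input.
        System.out.println("default:   " + offsets("$", 0, input));
        // MULTILINE: additionally matches before every other line terminator.
        System.out.println("MULTILINE: " + offsets("$", Pattern.MULTILINE, input));
    }
}
```

In default mode this reports offsets 5 and 6, the two matches described above; in MULTILINE mode it reports 1, 3, 5 and 6.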

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
run the supplied test case.


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
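import java.util.regex.*;

public class test {
 public static void main(String[] args) {
  // "$" compiled without MULTILINE: expected to match only once.
  Pattern p = Pattern.compile("$");
  Matcher m = p.matcher("a\nb\nc\nhello\nworld\n");
  // Count every match that find() reports.
  int count = 0;
  while (m.find()) {
   ++count;
  }
  // The reporter sees 2 here (final line terminator, then end-of-input).
  System.err.println(count);
 }
}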
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
i would have suggested using \Z, but that's broken too ;-)
Posted Date : 2007-02-01 11:13:19.0
Work Around
N/A
Evaluation
N/A
Comments
  

Submitted On 03-FEB-2007
(i'm the original poster.) it turns out that \Z is actually defined (somewhat strangely) to have this behavior. \z does what i thought \Z did.

i've updated my blog post to include this work-around too.
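
a minimal sketch of that work-around, counting the matches of "$", \Z and \z on the same input. the class name EndAnchors and the helper count() are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndAnchors {
    // Hypothetical helper: counts how many times regex matches in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "a\nb\nc\n";
        // "$" and \Z both match before the final line terminator and at end-of-input.
        System.out.println("$  matches " + count("$", input) + " time(s)");
        System.out.println("\\Z matches " + count("\\Z", input) + " time(s)");
        // \z matches only at the very end of the input.
        System.out.println("\\z matches " + count("\\z", input) + " time(s)");
    }
}
```

"$" and \Z each match twice here, while \z matches once, so \z is the anchor to use when only the true end-of-input should match.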


Submitted On 06-FEB-2007
uncle_alice
In your Perl code, you perform the match in list context (by assigning the result of the match to an array) and then count the number of matches.  However, if you iterate through the matches in scalar context, you'll see that Java's behavior is consistent with Perl's:

perl -e "print qq{\n>$`<\n} while qq{hello\nworld\n} =~ m/$/mg;"

>hello<

>hello
world<

>hello
world
<

perl -e "print qq{\n>$`<\n} while qq{hello\nworld\n} =~ m/$/g;"

>hello
world<

>hello
world
<

There is an inconsistency here, but it's within Perl (and I suspect it's deliberate).  Anyway, it's not a problem for Java, since Java has no equivalent for Perl's list-versus-scalar context (yet).
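
For comparison, a rough Java counterpart of the two Perl one-liners above might look like the sketch below. It collects the text before each match, which is what Perl's $` holds. The class name PrefixBeforeMatch and the helper prefixes() are hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixBeforeMatch {
    // Hypothetical helper: for each match of "$", returns the input text that
    // precedes the match (the analogue of Perl's $` variable).
    static List<String> prefixes(int flags, String input) {
        Matcher m = Pattern.compile("$", flags).matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(input.substring(0, m.start()));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "hello\nworld\n";
        // Counterpart of m/$/mg: prints each prefix between '>' and '<'.
        for (String prefix : prefixes(Pattern.MULTILINE, input)) {
            System.out.println("\n>" + prefix + "<");
        }
        // Counterpart of m/$/g (no MULTILINE flag).
        for (String prefix : prefixes(0, input)) {
            System.out.println("\n>" + prefix + "<");
        }
    }
}
```

With MULTILINE the helper yields three prefixes and without it two, the same sequences as the Perl output shown above.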


