Bug ID: 7173959 Jvm crashed during coherence exabus (tmb) testing

Details
Type:
Bug
Submit Date:
2012-06-04
Status:
Resolved
Updated Date:
2013-06-07
Project Name:
JDK
Resolved Date:
2012-12-17
Component:
hotspot
OS:
generic
Sub-Component:
gc
CPU:
generic
Priority:
P1
Resolution:
Fixed
Affected Versions:
7
Fixed Versions:
hs25

Related Reports
Backport:
Backport:
Backport:
Backport:
Backport:
Backport:
Duplicate:
Relates:

Sub Tasks

Description
SHORT SUMMARY:
Coherence testing saw a crash in GC once. It is not reproducible and is not
blocking any release. Please examine the hs_err file and decide whether you
want to investigate further. If not, please close the bugtraq bug and the bugdb
bug as Not Reproducible.

                                    

Comments
URL:   http://hg.openjdk.java.net/hsx/hsx25/hotspot/rev/730cc4ddd550
User:  amurillo
Date:  2012-12-21 20:28:27 +0000

                                     
2012-12-21
URL:   http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/730cc4ddd550
User:  brutisso
Date:  2012-12-17 12:34:25 +0000

                                     
2012-12-17
We crash because of how we re-map memory when we want to align it for large pages in ReservedSpace::initialize().

Here is what happens:

The reservation of memory is split into a few steps. When we want a chunk of memory with large pages, we first just reserve some memory large enough for what we need. Then we realize that we want large pages, so we want to re-map the reserved memory to use large pages. But since this requires a large-page-aligned memory chunk, we may have to adjust the recently reserved chunk a bit.

We do this in ReservedSpace::initialize() by first releasing the memory we just reserved and then requesting more memory than we actually need, so that we have enough space to do manual large page alignment. Once we have this memory we figure out a nicely aligned start address. We then release the memory again and reserve just enough memory, passing the aligned start address as a fixed address so that the memory we get is aligned.

This is done in a loop to make sure that we eventually get some memory. The interesting code looks like this:

      do {
        char* extra_base = os::reserve_memory(extra_size, NULL, alignment);
        if (extra_base == NULL) return;
        // Do manual alignment
        base = (char*) align_size_up((uintptr_t) extra_base, alignment);
        assert(base >= extra_base, "just checking");
        // Re-reserve the region at the aligned base address.
        os::release_memory(extra_base, extra_size);   // (1)
        base = os::reserve_memory(size, base);        // (2)
      } while (base == NULL);


There is a race here between releasing the memory in (1) and re-reserving it in (2). But the loop is supposed to handle this race.

The problem is that on POSIX platforms you can remap the same memory area several times. The call in (2) uses mmap with MAP_FIXED, which tells the OS that you know exactly what you are doing. So, if part of the memory has already been mapped by the process, mmap does not fail; it just goes ahead and maps over that memory.
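
The MAP_FIXED behaviour described above can be seen in a small standalone sketch. This is not HotSpot code: the function name is hypothetical, it calls mmap directly rather than going through os::reserve_memory(), and it assumes a POSIX system that supports MAP_ANONYMOUS (for example Linux). A single thread plays both roles here, first as the "other" mapper and then as the MAP_FIXED re-reservation.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper for illustration only; not part of HotSpot.
// Returns true if a MAP_FIXED mmap silently replaced an existing mapping.
bool map_fixed_clobbers_existing_mapping() {
  const size_t page = (size_t) sysconf(_SC_PAGESIZE);

  // Plays the role of the "other thread": an ordinary anonymous mapping
  // that holds live data.
  char* victim = (char*) mmap(NULL, page, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (victim == MAP_FAILED) return false;
  victim[0] = 42;

  // Plays the role of step (2): map the same address with MAP_FIXED.
  // mmap does not report a conflict; it discards the old mapping and
  // installs a fresh zero-filled one at the same address.
  char* again = (char*) mmap(victim, page, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
  if (again == MAP_FAILED) {
    munmap(victim, page);
    return false;
  }

  // The address is unchanged, but the data written earlier is gone.
  bool clobbered = (again == victim) && (victim[0] == 0);
  munmap(again, page);
  return clobbered;
}
```

In the crash scenario the first mapping would belong to a different thread, which would keep using the address without knowing that its contents had been replaced.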

This means that if multiple threads call mmap, we can end up in the following situation: we release our mapping in (1), then another thread comes in and maps part of the memory that we used to have, and then we remap over that memory again in (2) with MAP_FIXED. Now two threads in our process have mapped the same memory. If both threads try to use it, or if one of the threads unmaps part or all of the memory, we will crash.

On POSIX it is possible to unmap any part of a mapped chunk. So, our proposed solution to the race described above is to not unmap all of the memory in (1), but rather to unmap only the sections at the start and at the end of the chunk that we over-allocated to get alignment. The aligned region in the middle is never released, so no other thread can map into it. This also removes the need for the loop.
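
A minimal sketch of this trimming approach is shown below. It is an illustration and not the actual HotSpot change: reserve_aligned is a hypothetical name, the code calls mmap and munmap directly, and it assumes that alignment is a power of two and that both size and alignment are multiples of the page size. MAP_NORESERVE and MAP_ANONYMOUS are assumed to be available, as they are on Linux.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>

// Hypothetical helper for illustration only; not the HotSpot implementation.
// Reserves `size` bytes of address space starting at a multiple of
// `alignment`, without ever releasing the aligned region itself.
char* reserve_aligned(size_t size, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

  // Over-reserve so that an aligned start address must fall inside the chunk.
  const size_t extra_size = size + alignment;
  char* extra_base = (char*) mmap(NULL, extra_size, PROT_NONE,
                                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                                  -1, 0);
  if (extra_base == MAP_FAILED) return NULL;

  // Manual alignment: round the start address up to the next boundary.
  uintptr_t aligned = ((uintptr_t) extra_base + alignment - 1)
                      & ~((uintptr_t) alignment - 1);
  char* base = (char*) aligned;

  // Unmap only the unaligned head and the unused tail. The middle part
  // stays mapped the whole time, so there is no window for another thread.
  size_t head = (size_t) (base - extra_base);
  size_t tail = extra_size - head - size;
  if (head > 0) munmap(extra_base, head);
  if (tail > 0) munmap(base + size, tail);
  return base;
}
```

Because the region from base to base + size is never unmapped, no MAP_FIXED call is needed and there is nothing to retry, which is why the loop disappears in this variant.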

However, on Windows you can only unmap _all_ of the memory that you have mapped. On the other hand, Windows will not allow you to map over other mappings, so the original code is actually safe there, provided that we keep the loop.

So, our solution is to treat this differently on Windows and on POSIX platforms.
                                     
2012-12-13


