EVALUATION
We know what the race is:
A heap region's RSet comprises several tables, including a "sparse" table. A sparse table has two RSHashTables: cur and next. The two usually point to the same physical table. When we want to expand a sparse table, we create a new next RSHashTable that is larger than the old cur, and we copy the contents of cur into next. For a while, the sparse table then has two distinct RSHashTables: next, to which new entries are added, and cur, which is used for iteration. (Note: when we add new entries to an RSet during a pause, we generally have to make sure we scan those entries specially. So we only need to iterate over cur while scanning the RSet, and we can safely ignore next.)
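A minimal C++ model of the cur/next scheme may make it more concrete. It is a sketch under stated assumptions, not HotSpot's code: `HashTable` and `SparseTable` are hypothetical stand-ins for RSHashTable and the sparse table, entries are plain card indices found by linear search, and an expanded table doubles in size (the text only says the new table is larger).

```cpp
// Sketch of a sparse table with cur/next hash tables. HashTable and
// SparseTable are hypothetical stand-ins, not HotSpot's RSHashTable/SparsePRT.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for an RSHashTable: a bounded set of card indices.
// (A linear search replaces real hashing to keep the sketch short.)
struct HashTable {
  std::size_t capacity;
  std::vector<int> entries;
  explicit HashTable(std::size_t cap) : capacity(cap) {}
  bool full() const { return entries.size() >= capacity; }
  bool contains(int card) const {
    return std::find(entries.begin(), entries.end(), card) != entries.end();
  }
};

class SparseTable {
 public:
  // cur and next start out as the same physical table.
  explicit SparseTable(std::size_t cap) : cur_(new HashTable(cap)), next_(cur_) {}
  ~SparseTable() {
    if (next_ != cur_) delete next_;
    delete cur_;
  }
  SparseTable(const SparseTable&) = delete;
  SparseTable& operator=(const SparseTable&) = delete;

  bool expanded() const { return cur_ != next_; }

  // New entries always go into next; expand first if it is full.
  void add(int card) {
    if (next_->contains(card)) return;
    if (next_->full()) expand();
    next_->entries.push_back(card);
  }

  // Scanning the RSet during a pause only looks at cur.
  const std::vector<int>& cur_entries() const { return cur_->entries; }

  // What processing the expanded list does for this table:
  // free the old cur and make next the current table.
  void install_next() {
    if (!expanded()) return;
    delete cur_;
    cur_ = next_;
  }

 private:
  // Allocate a larger next (doubling is an assumption) and copy the
  // existing entries into it. On the first expansion next == cur, so this
  // copies from cur; a previously expanded next is copied, then freed.
  void expand() {
    HashTable* bigger = new HashTable(next_->capacity * 2);
    bigger->entries = next_->entries;
    if (next_ != cur_) delete next_;
    next_ = bigger;
  }

  HashTable* cur_;
  HashTable* next_;
};
```

With a capacity of 2, adding a third card expands the table: cur still holds only the two older entries until `install_next()` runs, after which cur holds all three.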
Expanded sparse tables are added to a list (the "expanded list") so that we can process them before we iterate over the RSets at the beginning of a pause. "Processing" a table involves freeing its old cur and replacing it with next.
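The expanded-list step can be sketched in the same spirit. The names here (`expand`, `make_table`, `process_expanded_list` and the global `expanded_list`) are hypothetical, copying entries into the new table is omitted, and nothing is synchronized: the sketch assumes a single thread touches the list and the tables on it.

```cpp
// Hypothetical model of the expanded list; not HotSpot code.
#include <cassert>
#include <vector>

// Entries are omitted; only the cur/next pointers matter here.
struct HashTable {};

struct SparseTable {
  HashTable* cur;
  HashTable* next;
  bool expanded() const { return cur != next; }
};

// cur and next start out as the same physical table.
SparseTable make_table() {
  HashTable* t = new HashTable();
  return SparseTable{t, t};
}

// Sparse tables whose next was replaced by a larger table since the last pause.
std::vector<SparseTable*> expanded_list;

// Install a new, larger next (entry copy omitted). A table is put on the
// list only on its first expansion; a later expansion replaces next.
void expand(SparseTable& s) {
  HashTable* bigger = new HashTable();
  if (s.expanded()) {
    delete s.next;
  } else {
    expanded_list.push_back(&s);
  }
  s.next = bigger;
}

// Run at the start of a pause, before any RSet is iterated:
// free each old cur and make next the current table.
void process_expanded_list() {
  for (SparseTable* s : expanded_list) {
    if (s->expanded()) {
      delete s->cur;
      s->cur = s->next;
    }
  }
  expanded_list.clear();
}
```

Because the list holds pointers to tables that other code can also modify, any other thread that frees or resets the same cur/next pointers while `process_expanded_list()` runs is a potential conflict.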
The race is as follows:
We reclaim several regions during cleanup that have expanded sparse tables, and those tables are on the expanded list. The regions are added to the cleanup list.
Thread 1: the concurrent cleanup thread starts processing the cleanup list and clears the RSet of every region on it, including its sparse table.
Thread 2: the VM thread processes the expanded list; it frees the old cur RSHashTable of each sparse table and replaces it with next.
Given that the concurrent cleanup operation can now proceed during a pause, Threads 1 and 2 can race and reach the same sparse table. This can result in the two failures we're seeing:
- one thread deletes the cur RSHashTable first, and the other then tries to delete it and finds that it has already been deleted (that's the guarantee failure; the destructor is the only place where _entries is set to NULL)
- both threads delete the same RSHashTable, which explains the double free.
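Both failure modes can be reproduced deterministically by interleaving the two threads' steps by hand. This is a hypothetical model, not HotSpot code: `Table`, `Releaser`, `Outcome` and `simulate` are made-up names, each thread's work on the old cur is reduced to "check that _entries is non-NULL, then free", and freeing only marks the table and counts the frees, so a double free is observable without undefined behaviour.

```cpp
// Hand-interleaved model of the cleanup/VM-thread race on one old cur table.
#include <cassert>

// Hypothetical stand-in for the old cur RSHashTable. "Freeing" it only
// records the fact, so the double free is observable without UB.
struct Table {
  bool entries_null = false;  // set only by the modelled destructor
  int frees = 0;
};

// One thread's work on the old cur: snapshot it, check it, free it.
struct Releaser {
  Table* t = nullptr;
  bool check_ok = false;
  void read(Table* cur) { t = cur; }
  void check() { check_ok = !t->entries_null; }  // the guarantee on _entries
  void free_it() {
    if (!check_ok) return;  // the guarantee already fired
    t->entries_null = true;
    t->frees++;
  }
};

struct Outcome {
  bool guarantee_failed;
  int frees;
};

// both_check_first == false: the VM thread finishes before cleanup checks.
// both_check_first == true: both threads pass the check before either frees.
Outcome simulate(bool both_check_first) {
  Table cur;
  Releaser cleanup, vm;  // thread 1 and thread 2
  cleanup.read(&cur);
  vm.read(&cur);
  if (both_check_first) {
    cleanup.check();
    vm.check();
    cleanup.free_it();
    vm.free_it();
  } else {
    vm.check();
    vm.free_it();
    cleanup.check();
    cleanup.free_it();
  }
  return {!cleanup.check_ok || !vm.check_ok, cur.frees};
}
```

`simulate(false)` ends with the guarantee firing after a single free; `simulate(true)` ends with two frees of the same table, i.e. the double free. Neither interleaving is possible if the two operations never overlap.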
The race is caused by the increased concurrency introduced by 6977804. Before that change, the concurrent cleanup operation and a pause were mutually exclusive, which is why we never hit this issue.