Currently, when worker W steals an entry E from another worker's taskqueue, it pushes E onto its own taskqueue and then calls trim_queues(), whose first action is to pop E. This works fine most of the time. However, it also opens a (very small) window in which another worker can steal E back from W, which could cause E to ping-pong between workers. It would be better if W processed E directly instead of pushing it onto its own taskqueue.
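To make the difference concrete, here is a minimal, self-contained sketch of the two strategies. It is not HotSpot code: `TaskQueue`, `Worker`, `process_entry()`, `trim_queue()`, `steal_push_then_trim()` and `steal_and_process()` are hypothetical names, entries are plain `int`s, and the queue uses a mutex for simplicity (real GC taskqueues are lock-free). The owner pops from the back and thieves steal from the front.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical stand-in for a per-worker taskqueue (not GenericTaskQueue).
class TaskQueue {
 public:
  void push(int entry) {
    std::lock_guard<std::mutex> lock(mu_);
    q_.push_back(entry);
  }
  // Owner side: LIFO pop from the back.
  std::optional<int> pop_local() {
    std::lock_guard<std::mutex> lock(mu_);
    if (q_.empty()) return std::nullopt;
    int e = q_.back();
    q_.pop_back();
    return e;
  }
  // Thief side: steal from the front.
  std::optional<int> steal() {
    std::lock_guard<std::mutex> lock(mu_);
    if (q_.empty()) return std::nullopt;
    int e = q_.front();
    q_.pop_front();
    return e;
  }

 private:
  std::mutex mu_;
  std::deque<int> q_;
};

struct Worker {
  TaskQueue queue;
  std::vector<int> processed;  // entries this worker actually handled

  // Hypothetical per-entry work; a real collector would scan the object
  // here and might push newly discovered references onto this->queue.
  void process_entry(int entry) { processed.push_back(entry); }

  // Drain the local queue (loosely analogous to trim_queues()).
  void trim_queue() {
    while (auto e = queue.pop_local()) process_entry(*e);
  }

  // Current behaviour: the stolen entry is published on W's own queue, so
  // between push() and the first pop_local() another worker can steal it.
  bool steal_push_then_trim(TaskQueue& victim) {
    auto e = victim.steal();
    if (!e) return false;
    queue.push(*e);  // E is visible to thieves again from this point
    trim_queue();
    return true;
  }

  // Suggested behaviour: E is never published, so it cannot be re-stolen.
  bool steal_and_process(TaskQueue& victim) {
    auto e = victim.steal();
    if (!e) return false;
    process_entry(*e);
    trim_queue();  // still drain anything process_entry() pushed
    return true;
  }
};
```

In the suggested version, trim_queue() is still called after processing E, because processing an entry can push new work onto W's own queue; the only change is that E itself never becomes stealable again.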
I also had a quick look at the marking code, and it seems to do the "right" thing, i.e., it processes the stolen entry directly instead of pushing it.
(thanks to Igor Veresov for spotting this)