Bug Database
Bug ID: 5049299
Votes 41
Synopsis (process) Use posix_spawn, not fork, on S10 to avoid swap exhaustion
Category java:classes_lang
Reported Against 5.0, b06, 1.4.2_04
Release Fixed
State 6-Fix Understood, request for enhancement
Priority: 2-High
Related Bugs 6381152, 6728616, 6745199, 6845504, 6850720, 4343908, 4391042
Submit Date 18-MAY-2004
Description
If you run a "small" program (e.g., a Perl script) from a "big" Java process on
a machine with "moderate" free swap space (but not as much as the big Java
process), then Runtime.exec() fails.
Work Around
1) mkfile followed by swap -a to add more swap space
2) Do Runtime.exec "early" in the application's execution, before the process
   has grown so large (i.e. while the transient swap requirement between
   Runtime.exec's fork and exec calls is still small), and cache the resulting
   Process object. Then replace the "later" Runtime.exec calls that kicked off
   perl with println or the like, directing the aforementioned process to exec
   perl with the same command line and to relay back the perl command's
   standard output and error traffic.
3) Like (2), but spawn the "exec daemon" separately from Java to avoid any use
   of Runtime.exec, and instead communicate with Java via a pipe or socket to
   initiate running the perl scripts and to relay back their output and exit
   status.

  xxxxx@xxxxx   2004-05-19
4) Implement the "small" program in pure Java in order to avoid Runtime.exec()
5) Consider using a scripting engine, see also https://scripting.dev.java.net/
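The "exec daemon" of workaround (3) can be sketched in C as below. This is a minimal, hypothetical sketch assuming a POSIX system: the names `run_command_line` and `serve`, the one-command-per-line protocol and the `exit N` reply are illustrative inventions, not part of any JDK or Solaris API. Because the daemon itself stays small, the swap it must reserve each time it starts a child stays small too.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

#define MAX_ARGS 64

/* Split LINE in place on blanks (no quoting support), spawn the
   program found on PATH, wait for it, and return its exit status.
   Returns -1 if the line is empty, the spawn fails, or the child
   did not exit normally. Tokens beyond MAX_ARGS are dropped. */
int run_command_line(char *line)
{
    char *argv[MAX_ARGS + 1];
    int argc = 0;
    char *save = NULL;
    for (char *tok = strtok_r(line, " \t\r\n", &save);
         tok != NULL && argc < MAX_ARGS;
         tok = strtok_r(NULL, " \t\r\n", &save))
        argv[argc++] = tok;
    argv[argc] = NULL;
    if (argc == 0)
        return -1;

    pid_t pid;
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);   /* retry if interrupted */
    } while (r == -1 && errno == EINTR);
    if (r != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Daemon loop: read one command per line from IN and write
   "exit N" to OUT after each command finishes. For brevity the
   child simply inherits the daemon's own stdout and stderr. */
void serve(FILE *in, FILE *out)
{
    char line[4096];
    while (fgets(line, sizeof line, in) != NULL) {
        fprintf(out, "exit %d\n", run_command_line(line));
        fflush(out);
    }
}
```

In workaround (3) the daemon would be started outside Java, and `serve` would be handed streams wrapping the pipe or socket (for example via `fdopen`) over which Java sends command lines; a fuller version would also relay each child's output back over that channel instead of letting it inherit the daemon's streams.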
Evaluation
Solaris reserves swap space conservatively, so when an X-megabyte process
forks the kernel attempts to reserve an additional X MB of swap space just in
case the child actually does touch all those pages, thereby making private
copies, and then later needs to swap them out.  (Linux, under its default
overcommit policy, doesn't do this, so this bug will not normally be
reproducible on a Linux system.)
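To put hypothetical numbers on this (illustrative only, not measurements from this report): a Java process of X = 4096 MB, 3072 MB of swap still free, and a Perl script that needs roughly 10 MB once running.

```latex
\begin{aligned}
\text{reservation needed at fork()} &= X = 4096\ \text{MB} > 3072\ \text{MB free} &&\Rightarrow\ \text{fork() fails}\\
\text{reservation needed after exec() of perl} &\approx 10\ \text{MB} \ll 3072\ \text{MB free} &&\Rightarrow\ \text{easily satisfied}
\end{aligned}
```

The request fails because of the brief window between fork() and exec(), not because of what the small program itself needs.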

Within the constraints of the existing semantics of Runtime.exec there does
not appear to be any way to avoid this in current Solaris releases.  vfork(2)
is not thread-safe and popen(3C) only provides access to one of the child's
standard streams rather than all three of them.  S10 does support the new
posix_spawn call; we should look into using that when running on S10.
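Unlike popen(3C), posix_spawn() together with file actions can connect all three of the child's standard streams. The sketch below assumes a POSIX system and is not the JDK's implementation; `spawn_with_pipes` and `close_pair` are hypothetical names. Note that the pipes are created without FD_CLOEXEC, so a spawn running at the same moment in another thread could inherit them, which is the kind of multithreading hazard discussed further down in this evaluation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>   /* the caller reaps the child with waitpid() */
#include <unistd.h>

extern char **environ;

/* Close both ends of a pipe, skipping slots still set to -1. */
static void close_pair(int p[2])
{
    if (p[0] >= 0) close(p[0]);
    if (p[1] >= 0) close(p[1]);
}

/* Start PATH with ARGV, connecting the child's stdin, stdout and
   stderr to three new pipes. On success returns 0, stores the child's
   pid, and sets fds[0] (write to the child's stdin), fds[1] (read its
   stdout) and fds[2] (read its stderr). On failure returns an errno
   value and leaves no pipe descriptors open. */
int spawn_with_pipes(const char *path, char *const argv[],
                     pid_t *pid, int fds[3])
{
    int in[2] = {-1, -1}, out[2] = {-1, -1}, err[2] = {-1, -1};
    if (pipe(in) != 0 || pipe(out) != 0 || pipe(err) != 0) {
        int saved = errno;
        close_pair(in);
        close_pair(out);
        close_pair(err);
        return saved;
    }

    int child_end[3] = { in[0], out[1], err[1] };
    int all[6] = { in[0], in[1], out[0], out[1], err[0], err[1] };

    posix_spawn_file_actions_t fa;
    int rc = posix_spawn_file_actions_init(&fa);
    if (rc == 0) {
        /* In the child: put each pipe end on fd 0, 1 or 2 ... */
        for (int i = 0; rc == 0 && i < 3; i++)
            rc = posix_spawn_file_actions_adddup2(&fa, child_end[i], i);
        /* ... then close all six original pipe descriptors. */
        for (int i = 0; rc == 0 && i < 6; i++)
            rc = posix_spawn_file_actions_addclose(&fa, all[i]);
        if (rc == 0)
            rc = posix_spawn(pid, path, &fa, NULL, argv, environ);
        posix_spawn_file_actions_destroy(&fa);
    }

    /* Parent: the child's ends are no longer needed here. */
    close(in[0]);
    close(out[1]);
    close(err[1]);
    if (rc != 0) {
        close(in[1]);
        close(out[0]);
        close(err[0]);
        return rc;
    }
    fds[0] = in[1];
    fds[1] = out[0];
    fds[2] = err[0];
    return 0;
}
```

The caller writes to `fds[0]`, drains both `fds[1]` and `fds[2]` (a full, unread stderr pipe can stall the child), closes them, and reaps the child with waitpid().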

See the comments section for additional information.

--   xxxxx@xxxxx   2004/5/19
I agree that the use of posix_spawn on S10 should be investigated.
Historically, changes to this kind of code have been extraordinarily risky
due to unforeseen race conditions, so this sort of change should be introduced
near the beginning of a release.  Therefore I am targeting this at Dolphin (JDK 7).
Hopefully, it will get addressed early in that release.
Posted Date : 2006-02-06 18:03:11.0

There are a couple of issues with posix_spawn(). 

1) It cannot perform a chdir() along with the other file descriptor operations
   it carries out after being invoked but before the target is exec'd.

2) The style of the API does not suit general-purpose multi-threaded
   environments like Java. In particular, the ability to perform actions on file
   descriptors inherited by the child does not work well if other threads in the
   VM may be opening and closing files in parallel with the call to posix_spawn().

So, here is the plan. We will use posix_spawn() in a minimal fashion, simply to efficiently
spawn a new helper binary (processhelper). This small (12k) binary cleans up the
file descriptors inherited from the parent, calls chdir() to switch to the new working directory, and then execs the actual target executable.
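The helper's job can be sketched as follows. This is a hypothetical illustration, not the actual processhelper source: the argument convention (`argv[1]` is the working directory, `argv[2]` onward is the target command line) and the names `close_descriptors_from` and `helper_main` are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Close every open descriptor numbered FROM or higher. Lists
   /proc/self/fd where available (Linux and Solaris provide it);
   otherwise falls back to looping up to sysconf(_SC_OPEN_MAX). */
void close_descriptors_from(int from)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir != NULL) {
        int self = dirfd(dir);
        struct dirent *entry;
        while ((entry = readdir(dir)) != NULL) {
            int fd = atoi(entry->d_name);   /* "." and ".." yield 0 */
            if (fd >= from && fd != self)
                close(fd);
        }
        closedir(dir);
        return;
    }
    long max = sysconf(_SC_OPEN_MAX);
    if (max < 0)
        max = 1024;                     /* assumed bound if unlimited */
    for (long fd = from; fd < max; fd++)
        close((int)fd);                 /* EBADF on unused slots is harmless */
}

/* Body of a hypothetical helper. Assumed convention: argv[1] is the
   working directory, argv[2] is the target's path, and argv[2..] is
   the target's argument vector. Returns 127 only if it fails. */
int helper_main(int argc, char *argv[])
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s dir program [args...]\n",
                argc > 0 ? argv[0] : "helper");
        return 127;
    }
    close_descriptors_from(3);          /* keep stdin, stdout, stderr */
    if (chdir(argv[1]) != 0) {
        perror(argv[1]);
        return 127;
    }
    execv(argv[2], &argv[2]);
    perror(argv[2]);                    /* reached only if exec failed */
    return 127;
}
```

In a real helper binary, `main` would simply return `helper_main(argc, argv)`. The JVM would start it with posix_spawn(), passing the working directory and target command line as arguments, so the descriptor cleanup and chdir() happen in the tiny helper rather than in the large parent process.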

The new binary will not be noticed by users/applications at runtime, since the end result is the same as before, and the processhelper itself will only run for a very short time.
Posted Date : 2009-03-10 11:02:24.0
Comments
  

Submitted On 14-OCT-2004
bavadekars
There are a few things in the bug evaluation that don't seem to
make sense:

1. Why even consider popen(3C)? It internally uses fork() to create
   the child process and hence it will not solve the problem for which
   the bug was filed.

2. If vfork() is MT-unsafe, doesn't that just mean that you have to use
   a mutex to protect the vfork() call? "MT-unsafe" does not mean don't
   ever use it in a multi-threaded application.

3. Here are some snippets from the vfork(2) man page -

       "In a multithreaded application, vfork() borrows only the
       thread of control that called vfork() in the parent; that
       is, the child contains only one thread. In that sense,
       vfork() behaves like fork()."

   Okay, so that implies you CAN use vfork in a multithreaded app.

   But then later it says -

       "The vfork() function is unsafe in multithreaded applications."

   This is ambiguous. Does it mean "use a mutex to protect the vfork()
   call" or "don't ever use vfork()". The kernel folks need to clarify
   this.

   And why is this method "unsafe" anyway? Doesn't the kernel suspend
   the entire parent process until the vfork'ed child calls exec()?

   Furthermore, the man page says -

       "This function will be eliminated in a future release. The
       memory sharing semantics of vfork() can be obtained through
       other mechanisms."

   So maybe this issue becomes moot in Solaris 10 when posix_spawn() is
   introduced, and that's fine. But most customers will stay with Solaris
   9 or even 8 for quite some time.

   But what "other mechanisms" is the man page referring to? If Solaris
   kernel engineers can suggest a different way to fix the
   Runtime.exec() problem, let's have it.


Note that this really is a major problem for server applications. We
essentially have to waste a huge amount of swap space just so that there
is headroom for an occasional Runtime.exec(). In our case, for example, we
have hardware that will easily handle -

    sizeof(parent) + N * sizeof(child)

however the transient memory usage of 2 * sizeof(parent) prevents us
from getting there. Memory is cheap, but at least let us use what we pay
for :-)

I really hope to see a better explanation here. If Sun has no plans to
fix this in the current JDK releases, give us the right information and
we will implement it ourselves using JNI.


Submitted On 14-OCT-2004
bavadekars
Related bugs : 4227230 and 4693581


Submitted On 16-MAR-2009
bculp2000
I see no reason you can't put a synchronized block around your vfork call. Also, I think vfork has about as much chance of being deprecated as this bug has of ever being fixed.


Submitted On 26-MAR-2009
bculp2000
This bug has been fixed.

http://www.forkndie.com/


Submitted On 27-MAR-2009
Michael_C_McMahon
The fix for this is currently being tested for jdk7.


Submitted On 27-MAR-2009
bculp2000
Submit date: 18-MAY-2004


Submitted On 09-APR-2009
bculp2000
Downloaded the latest JDK 7 beta source. No fix in it.


Submitted On 28-APR-2009
Michael_C_McMahon
Sorry, it hasn't been put back yet. I hope to publish the source change for review on the OpenJDK core-libs-dev mailing list soon.


Submitted On 28-JUN-2009
jvmKrash
Can guarding with a mutex make vfork MT-safe?

As I see it, vfork will suspend all the threads of the parent process until the child execs or exits. If the child wants to acquire a lock that has been taken by another thread of the parent, there will be a deadlock. The problem is that the application developer may have little control over such a deadlock, as in this example of a deadlock in the dynamic loader:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=4332595

It would be interesting if vfork could be guarded in some way in MT applications.


Submitted On 02-JUL-2009
bculp2000
No, you can't use vfork even when guarded. If you read the extensive write-up on this, the problem is with the linker, and you have no control over when or whether the JVM will call loadLibrary, so your guard does no good. It will mostly work but then deadlock the JVM; you can try it, but it will fail intermittently.

If it weren't for licensing issues, I would post the correct solution with all the C and Java code for the fix. For now,
go read http://forkndie.com and understand the fix, or wait for Java 7 and hope it's there.


