From: <Andries.Brouwer@cwi.nl>

Alan made overcommit mode 2 and it doesn't work at all.  A process that
passes the limit often does so at the moment of stack extension and is
killed by a segfault, which is no better than being OOM-killed.

Another problem is that close to the edge no other processes can be
started, so that a sysadmin has problems logging in and investigating.

Below is a patch that does three things:

(1) It reserves a reasonable amount of virtual stack space (amount
    randomly chosen, no guarantees given) when the process is started, so
    that the common utilities will not be killed by segfault on stack
    extension.

(2) It reserves a reasonable amount of virtual memory for root, so that
    root can still do things when the system is out of memory.

(3) It limits a single process to 97% of what is left, so that an
    ordinary user can still use getty, login, bash, ps, kill and similar
    things when one of her processes gets out of control.
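
As a rough illustration of points (2) and (3), the limit arithmetic can be
modelled in userspace C.  This is only a sketch: allowed_pages(),
enough_memory() and their parameters are hypothetical names, not kernel
interfaces.  The division by 32 (about 3%) mirrors what the patch adds to
the vm_enough_memory hooks; all quantities are in pages.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the limit computed by the patched
 * vm_enough_memory hooks.  base_allowed stands for the value of
 * "allowed" after the existing overcommit-ratio and swap arithmetic.
 */
long allowed_pages(long base_allowed, bool is_root, long proc_total_vm)
{
	long allowed = base_allowed;

	/* Leave the last 1/32 (about 3%) for root. */
	if (!is_root)
		allowed -= allowed / 32;

	/* Leave 1/32 of this process's own size for other processes. */
	allowed -= proc_total_vm / 32;

	return allowed;
}

/* Mirrors the final check: succeed only while committed < allowed. */
bool enough_memory(long committed, long base_allowed, bool is_root,
		   long proc_total_vm)
{
	return committed < allowed_pages(base_allowed, is_root, proc_total_vm);
}
```

For example, with a base limit of 3200 pages, an ordinary user's process of
640 pages is held to 3200 - 100 - 20 = 3080 pages, while the same process
run by root is held to 3200 - 20 = 3180 pages.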

Since the current overcommit mode 2 is not really useful, I did not give
this a new number.

The patch is just for playing, not to be applied by Linus.  But, Andrew, I
hope that you would be willing to put this in -mm so that people can
experiment.  Of course it only does something if one sets overcommit mode
to 2.

Over the past month I have pressed people for feedback, and now have
about a dozen reports, mostly positive, one very positive.

Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 25-akpm/fs/exec.c            |   19 +++++++++++--------
 25-akpm/security/commoncap.c |    8 ++++++++
 25-akpm/security/dummy.c     |    8 ++++++++
 3 files changed, 27 insertions(+), 8 deletions(-)

diff -puN fs/exec.c~mm-overcommit-updates fs/exec.c
--- 25/fs/exec.c~mm-overcommit-updates	Fri Nov 12 16:09:20 2004
+++ 25-akpm/fs/exec.c	Fri Nov 12 16:09:20 2004
@@ -341,6 +341,8 @@ out_sig:
 	force_sig(SIGKILL, current);
 }
 
+#define EXTRA_STACK_VM_PAGES	20	/* random */
+
 int setup_arg_pages(struct linux_binprm *bprm, int executable_stack)
 {
 	unsigned long stack_base;
@@ -378,15 +380,15 @@ int setup_arg_pages(struct linux_binprm 
 	memmove(to, to + offset, PAGE_SIZE - offset);
 	kunmap(bprm->page[j - 1]);
 
-	/* Adjust bprm->p to point to the end of the strings. */
-	bprm->p = PAGE_SIZE * i - offset;
-
 	/* Limit stack size to 1GB */
 	stack_base = current->signal->rlim[RLIMIT_STACK].rlim_max;
 	if (stack_base > (1 << 30))
 		stack_base = 1 << 30;
 	stack_base = PAGE_ALIGN(STACK_TOP - stack_base);
 
+	/* Adjust bprm->p to point to the end of the strings. */
+	bprm->p = stack_base + PAGE_SIZE * i - offset;
+
 	mm->arg_start = stack_base;
 	arg_size = i << PAGE_SHIFT;
 
@@ -395,11 +397,13 @@ int setup_arg_pages(struct linux_binprm 
 		bprm->page[i++] = NULL;
 #else
 	stack_base = STACK_TOP - MAX_ARG_PAGES * PAGE_SIZE;
-	mm->arg_start = bprm->p + stack_base;
+	bprm->p += stack_base;
+	mm->arg_start = bprm->p;
 	arg_size = STACK_TOP - (PAGE_MASK & (unsigned long) mm->arg_start);
 #endif
 
-	bprm->p += stack_base;
+	arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+
 	if (bprm->loader)
 		bprm->loader += stack_base;
 	bprm->exec += stack_base;
@@ -420,11 +424,10 @@ int setup_arg_pages(struct linux_binprm 
 		mpnt->vm_mm = mm;
 #ifdef CONFIG_STACK_GROWSUP
 		mpnt->vm_start = stack_base;
-		mpnt->vm_end = PAGE_MASK &
-			(PAGE_SIZE - 1 + (unsigned long) bprm->p);
+		mpnt->vm_end = stack_base + arg_size;
 #else
-		mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p;
 		mpnt->vm_end = STACK_TOP;
+		mpnt->vm_start = mpnt->vm_end - arg_size;
 #endif
 		/* Adjust stack execute permissions; explicitly enable
 		 * for EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X
diff -puN security/commoncap.c~mm-overcommit-updates security/commoncap.c
--- 25/security/commoncap.c~mm-overcommit-updates	Fri Nov 12 16:09:20 2004
+++ 25-akpm/security/commoncap.c	Fri Nov 12 16:09:20 2004
@@ -386,6 +386,14 @@ int cap_vm_enough_memory(long pages)
 		allowed -= allowed / 32;
 	allowed += total_swap_pages;
 
+	/* Leave the last 3% for root */
+	if (current->euid)
+		allowed -= allowed / 32;
+
+	/* Don't let a single process grow too big:
+	   leave 3% of the size of this process for other processes */
+	allowed -= current->mm->total_vm / 32;
+
 	if (atomic_read(&vm_committed_space) < allowed)
 		return 0;
 
diff -puN security/dummy.c~mm-overcommit-updates security/dummy.c
--- 25/security/dummy.c~mm-overcommit-updates	Fri Nov 12 16:09:20 2004
+++ 25-akpm/security/dummy.c	Fri Nov 12 16:09:20 2004
@@ -160,6 +160,14 @@ static int dummy_vm_enough_memory(long p
 		* sysctl_overcommit_ratio / 100;
 	allowed += total_swap_pages;
 
+	/* Leave the last 3% for root */
+	if (current->euid)
+		allowed -= allowed / 32;
+
+	/* Don't let a single process grow too big:
+	   leave 3% of the size of this process for other processes */
+	allowed -= current->mm->total_vm / 32;
+
 	if (atomic_read(&vm_committed_space) < allowed)
 		return 0;
 
_