Using current BK on my x86-64 workstation, it went completely nuts today killing tasks left and right with oodles of free memory available. Here's a little snippet from messages:
Out of Memory: Killed process 2847 (screen).
oom-killer: gfp_mask=0xd1
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
HighMem per-cpu: empty
Free pages: 529184kB (0kB HighMem)
Active:19127 inactive:20440 dirty:92 writeback:0 unstable:0 free:132296 slab:3827 mapped:3503 pagetables:164
DMA free:4536kB min:60kB low:72kB high:88kB active:0kB inactive:0kB present:16384kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Normal free:524648kB min:4028kB low:5032kB high:6040kB active:76508kB inactive:81760kB present:1031360kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 556*4kB 155*8kB 65*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4536kB
Normal: 29800*4kB 25115*8kB 6953*16kB 1251*32kB 326*64kB 103*128kB 31*256kB 12*512kB 3*1024kB 1*2048kB 0*4096kB = 524648kB
HighMem: empty
Swap cache: add 59864, delete 55781, find 6188/8478, race 0+0
Out of Memory: Killed process 27326 (bash).
oom-killer: gfp_mask=0xd1
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
HighMem per-cpu: empty
On Thu, Jan 20, 2005 at 01:34:06PM +0100, Jens Axboe wrote:
> Using current BK on my x86-64 workstation, it went completely nuts today
> killing tasks left and right with oodles of free memory available.
Yes, the fact that the oom-killer exists is a serious problem. People work on trying to tune it, instead of just removing it.
I am getting reports that processes are killed by the oom-killer even in overcommit mode 2 (no overcommit, so no oom-killer should ever be needed) on 2.6.10.
On Thu, Jan 20, 2005 at 02:15:56PM +0100, Andries Brouwer wrote:
> On Thu, Jan 20, 2005 at 01:34:06PM +0100, Jens Axboe wrote:
> > Using current BK on my x86-64 workstation, it went completely nuts today
> > killing tasks left and right with oodles of free memory available.
>
> Yes, the fact that the oom-killer exists is a serious problem.
> People work on trying to tune it, instead of just removing it.
I'm working on fixing it, not just tuning it. The bugs in mainline aren't about the selection algorithm (which is what people normally call the oom killer). The bugs in mainline are about being able to kill a task reliably, regardless of which task we pick, and every Linux kernel out there has always killed some task when it was OOM. So the bugs are just obvious regressions in 2.6 compared to 2.4.
But this is all fixed now; I'll start sending the first patches to Andrew very shortly (last week there was still the Oracle stuff going on). Now I can fix the rejects.
I guarantee nothing about which task will be picked (that's the old code at work, plus the recent improvement from Thomas; I didn't change a bit of what people normally call "the oom killer"), but I guarantee the VM won't kill tasks left and right like it does now (i.e. by invoking the oom killer multiple times).
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Thu, Jan 20, 2005 at 02:15:56PM +0100, Andries Brouwer wrote:
> On Thu, Jan 20, 2005 at 01:34:06PM +0100, Jens Axboe wrote:
> > Using current BK on my x86-64 workstation, it went completely nuts today
> > killing tasks left and right with oodles of free memory available.
>
> Yes, the fact that the oom-killer exists is a serious problem.
> People work on trying to tune it, instead of just removing it.
>
> I am getting reports that also in overcommit mode 2 (no overcommit,
> no oom-killer ever needed) processes are killed by the oom-killer
> (on 2.6.10).
Hi Andries,
There is a user requirement for overcommit mode, you know.
Saying "hey, there's no more overcommit mode in future v2.6 releases, you run out of memory and get -ENOMEM" is not really an option, is it?
You propose to remove the OOM killer and do what? Lockup solid?
It is _WAY_ off right now: look at the amount of free pages:
v2.4 gets it pretty much right in most cases, and it's obviously screwed up right now in v2.6.
Weren't Andrea and Thomas working on getting it fixed?
On Thu, Jan 20, 2005 at 12:00:34PM -0200, Marcelo Tosatti wrote:
> On Thu, Jan 20, 2005 at 02:15:56PM +0100, Andries Brouwer wrote:
> > Yes, the fact that the oom-killer exists is a serious problem.
> > People work on trying to tune it, instead of just removing it.
> >
> > I am getting reports that also in overcommit mode 2 (no overcommit,
> > no oom-killer ever needed) processes are killed by the oom-killer
> > (on 2.6.10).
>
> Hi Andries,
>
> There is a user requirement for overcommit mode, you know.
>
> Saying "hey, there's no more overcommit mode in future v2.6 releases, you
> run out of memory and get -ENOMEM" is not really an option is it?
>
> You propose to remove the OOM killer and do what? Lockup solid?
Right now we have three overcommit modes. They are specified by:
 0: overcommit, but keep it reasonable (the current default)
 1: overcommit, always say yes
 2: keep track of all our obligations, do not overcommit
So, one has the right to expect that no OOM situation can occur in overcommit mode 2. But in 2.6.10 it can. That is a bug. The conclusion must be that bookkeeping is done incorrectly. Perhaps also mode 0 is affected by that same bug.
Now you ask what I propose. There is no hurry to worry about that; the first thing should be to fix the bookkeeping problem.
But assume that is fixed. Then everybody can run in mode 2 and never have any problems. That is what I do.
Yes, you say, but that is an inefficient use of memory. Perhaps. That is the price I am willing to pay for the guarantee that my processes are not killed at some random moment.
But even for someone who does nothing of importance and doesn't care if his processes die at arbitrary moments, as long as things go as fast as possible and use as much of his precious memory as possible, overcommit mode 2 can be useful. It comes with the variable overcommit_ratio R: the amount of memory that can be used is Swap + Memory*(R/100). R can be larger than 100, so in overcommit mode 2 one can specify very precisely how much overcommitment is considered acceptable.
Very few people run overcommit mode 2, and lots of things are badly tested. It cannot become the default today. But I would like to see it the default at some future moment.
On Thu, Jan 20, 2005 at 06:15:44PM +0100, Andrea Arcangeli wrote:
> > Yes, the fact that the oom-killer exists is a serious problem.
> > People work on trying to tune it, instead of just removing it.
>
> I'm working on fixing it, not just tuning it. The bugs in mainline
> aren't about the selection algorithm (which is normally what people
> calls oom killer). The bugs in mainline are about being able to kill a
> task reliably, regardless of which task we pick, and every linux kernel
> out there has always killed some task when it was oom. So the bugs are
> just obvious regressions of 2.6 if compared to 2.4.
Yes, earlier one lost a job once in a great while; these days it is once in a while. The frequency has gone up.
But let me stress that I also consider the earlier situation unacceptable. It is really bad to lose a few weeks of computation.
You talk about "when it is oom", as if it would be unavoidable, an act of nature. But it can be avoided, and should be avoided, unless the sysadmin explicitly says that oom is OK for him.
(Compare allowing oom with overclocking - there is a trade-off between speed and reliability. It must be possible to choose for reliability. Indeed, reliability must be the default.)
Andries Brouwer wrote:
> But let me stress that I also consider the earlier situation
> unacceptable. It is really bad to lose a few weeks of computation.
Shouldn't the application be backing up intermediate results to disk periodically? Power outages do occur, as do bus faults, electrical glitches, dead fans, etc.
On Thu, Jan 20, 2005 at 03:57:07PM -0600, Chris Friesen wrote:
> Andries Brouwer wrote:
> > But let me stress that I also consider the earlier situation
> > unacceptable. It is really bad to lose a few weeks of computation.
>
> Shouldn't the application be backing up intermediate results to disk
> periodically? Power outages do occur, as do bus faults, electrical
> glitches, dead fans, etc.
Agreed. Plus, if you truly cannot change the app because it's binary-only, you can at least set the ulimit based on the virtual sizes; ulimit should work reliably even if overcommit doesn't.
On Thu, Jan 20 2005, Andrea Arcangeli wrote:
> On Thu, Jan 20, 2005 at 02:15:56PM +0100, Andries Brouwer wrote:
> > On Thu, Jan 20, 2005 at 01:34:06PM +0100, Jens Axboe wrote:
> > > Using current BK on my x86-64 workstation, it went completely nuts today
> > > killing tasks left and right with oodles of free memory available.
> >
> > Yes, the fact that the oom-killer exists is a serious problem.
> > People work on trying to tune it, instead of just removing it.
>
> I'm working on fixing it, not just tuning it. The bugs in mainline
> aren't about the selection algorithm (which is normally what people
> calls oom killer). The bugs in mainline are about being able to kill a
> task reliably, regardless of which task we pick, and every linux kernel
> out there has always killed some task when it was oom. So the bugs are
> just obvious regressions of 2.6 if compared to 2.4.
>
> But this is all fixed now, I'm starting sending the first patches to
> Andrew very shortly (last week there was still the oracle stuff going
> on). Now I can fix the rejects.
>
> I will guarantee nothing about which task will be picked (that's the old
> code at works, I changed not a bit in what normally people calls "the oom
> killer", plus the recent improvement from Thomas), but I guarantee the
> VM won't kill tasks right and left like it does now (i.e. by invoking the
> oom killer multiple times).
And especially not with 500MB of zone normal free, thanks :)
2.6.11-rc1-xx VM behaviour is looking a _lot_ worse than 2.6.10, btw. I haven't looked closer at what has changed yet; it's just a subjective feeling. I regularly have to run a fillmem.c hog to prune caches, or it runs like an old dog.
On Fri, Jan 21, 2005 at 08:42:08AM +0100, Jens Axboe wrote:
> And especially not with 500MB of zone normal free, thanks :)
;) Are you sure you had 500m free even before the _first_ oom killing?
I assumed what you posted was not the first of the oom-killing messages. If it was the first, then there was a regression. But if, OTOH, I didn't misunderstand your message and it wasn't the first, then what you've seen is just the brokenness of 2.6 w.r.t. oom killing (that's what made Thomas drive a few hours too), and you only have to apply the 5 patches I just posted; then everything will work correctly in terms of _not_ killing left and right anymore, even despite the 500m free ;). I tested the code before posting and at least my regression test passed, so it looked like there was no other regression. The several rejects I got while porting the code all looked to be due to no-op cleanups. So I doubt there was a regression, and I'm optimistic you've just seen the old bugs.
On Fri, Jan 21 2005, Andrea Arcangeli wrote:
> On Fri, Jan 21, 2005 at 08:42:08AM +0100, Jens Axboe wrote:
> > And especially not with 500MB of zone normal free, thanks :)
>
> ;) Are you sure you had 500m free even before the _first_ oom killing?
No, it wasn't. The first one looked like this:
Jan 20 13:22:15 wiggum kernel: oom-killer: gfp_mask=0xd1
Jan 20 13:22:15 wiggum kernel: DMA per-cpu:
Jan 20 13:22:15 wiggum kernel: cpu 0 hot: low 2, high 6, batch 1
Jan 20 13:22:15 wiggum kernel: cpu 0 cold: low 0, high 2, batch 1
Jan 20 13:22:15 wiggum kernel: Normal per-cpu:
Jan 20 13:22:15 wiggum kernel: cpu 0 hot: low 32, high 96, batch 16
Jan 20 13:22:15 wiggum kernel: cpu 0 cold: low 0, high 32, batch 16
Jan 20 13:22:15 wiggum kernel: HighMem per-cpu: empty
Jan 20 13:22:15 wiggum kernel:
Jan 20 13:22:15 wiggum kernel: Free pages: 155720kB (0kB HighMem)
Jan 20 13:22:15 wiggum kernel: Active:113367 inactive:14428 dirty:2048 writeback:0 unstable:0 free:38930 slab:6284 mapped:102966 pagetables:2010
Jan 20 13:22:15 wiggum kernel: DMA free:4080kB min:60kB low:72kB high:88kB active:16kB inactive:0kB present:16384kB pages_scanned:21 all_unreclaimable? yes
Jan 20 13:22:15 wiggum kernel: protections[]: 0 0 0
Jan 20 13:22:15 wiggum kernel: Normal free:151640kB min:4028kB low:5032kB high:6040kB active:453452kB inactive:57712kB present:1031360kB pages_scanned:0 all_unreclaimable? no
Jan 20 13:22:15 wiggum kernel: protections[]: 0 0 0
Jan 20 13:22:15 wiggum kernel: HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Jan 20 13:22:15 wiggum kernel: protections[]: 0 0 0
Jan 20 13:22:15 wiggum kernel: DMA: 520*4kB 120*8kB 65*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4080kB
Jan 20 13:22:15 wiggum kernel: Normal: 23636*4kB 6171*8kB 225*16kB 7*32kB 1*64kB 4*128kB 3*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 151640kB
Jan 20 13:22:15 wiggum kernel: HighMem: empty
Jan 20 13:22:15 wiggum kernel: Swap cache: add 35304, delete 31894, find 5456/7337, race 0+0
Jan 20 13:22:15 wiggum kernel: Out of Memory: Killed process 12786 (firefox-bin).
Jan 20 13:22:20 wiggum kernel: oom-killer: gfp_mask=0xd1
Jan 20 13:22:20 wiggum kernel: DMA per-cpu:
Jan 20 13:22:20 wiggum kernel: cpu 0 hot: low 2, high 6, batch 1
Jan 20 13:22:20 wiggum kernel: cpu 0 cold: low 0, high 2, batch 1
Jan 20 13:22:20 wiggum kernel: Normal per-cpu:
Jan 20 13:22:20 wiggum kernel: cpu 0 hot: low 32, high 96, batch 16
Jan 20 13:22:20 wiggum kernel: cpu 0 cold: low 0, high 32, batch 16
Jan 20 13:22:20 wiggum kernel: HighMem per-cpu: empty
Jan 20 13:22:20 wiggum kernel:
Jan 20 13:22:20 wiggum kernel: Free pages: 215112kB (0kB HighMem)
Jan 20 13:22:20 wiggum kernel: Active:97117 inactive:15986 dirty:2693 writeback:0 unstable:0 free:53778 slab:6223 mapped:85471 pagetables:1948
Jan 20 13:22:20 wiggum kernel: DMA free:4152kB min:60kB low:72kB high:88kB active:0kB inactive:0kB present:16384kB pages_scanned:0 all_unreclaimable? no
Jan 20 13:22:20 wiggum kernel: protections[]: 0 0 0
Jan 20 13:22:20 wiggum kernel: Normal free:210960kB min:4028kB low:5032kB high:6040kB active:388468kB inactive:63944kB present:1031360kB pages_scanned:0 all_unreclaimable? no
Jan 20 13:22:20 wiggum kernel: protections[]: 0 0 0
Jan 20 13:22:20 wiggum kernel: HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Jan 20 13:22:20 wiggum kernel: protections[]: 0 0 0
Jan 20 13:22:20 wiggum kernel: DMA: 524*4kB 125*8kB 66*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4152kB
Jan 20 13:22:20 wiggum kernel: Normal: 29382*4kB 8689*8kB 669*16kB 91*32kB 29*64kB 10*128kB 4*256kB 4*512kB 0*1024kB 2*2048kB 0*4096kB = 210960kB
Jan 20 13:22:20 wiggum kernel: HighMem: empty
Jan 20 13:22:20 wiggum kernel: Swap cache: add 35388, delete 32397, find 5465/7365, race 0+0
Jan 20 13:22:20 wiggum kernel: Out of Memory: Killed process 12909 (xmms).
I've kept the timestamps this time; as you can see, there are 5 seconds