Por que o OOM-killer invocou mesmo quando uma quantidade enorme de RAM está livre

8

Na instância xlarge ubuntu suportada pelo EC2, oom-killer está sendo chamado. Na saída / var / log / syslog abaixo, parece que ZONE_NORMAL está ficando sem memória:

Node 0 Normal free:11344kB min:11556kB low:14444kB high:17332kB active_anon:10936284kB inactive_anon:144kB active_file:688kB inactive_file:740kB

mas por que ZONE_NORMAL está alocado apenas 11 MB de 15 GB de RAM total? Ou existe algum outro motivo para ficar sem memória?

A RAM disponível na máquina (instância xlarge) é de 15 GB. A soma da coluna rss no log abaixo é 3.7GB e a da total_vm é 11.4GB.

Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456146] node invoked oom-killer: gfp_mask=0x84d0, order=0, oom_adj=0, oom_score_adj=0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456153] node cpuset=/ mems_allowed=0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456157] Pid: 639, comm: node Not tainted 2.6.38-1-virtual #28-Ubuntu
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456160] Call Trace:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456170]  [<ffffffff810b259d>] ? cpuset_print_task_mems_allowed+0x9d/0xb0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456189]  [<ffffffff81106a01>] ? dump_header+0x91/0x1e0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456196]  [<ffffffff81007e7f>] ? xen_restore_fl_direct_end+0x0/0x1
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456203]  [<ffffffff815c92de>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456208]  [<ffffffff8110705d>] ? oom_kill_process+0x8d/0x180
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456213]  [<ffffffff81107492>] ? out_of_memory+0x102/0x240
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456218]  [<ffffffff8110c209>] ? __alloc_pages_nodemask+0x709/0x7f0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456224]  [<ffffffff81105c78>] ? filemap_fault+0x398/0x490
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456230]  [<ffffffff811414bd>] ? alloc_pages_current+0x9d/0x110
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456235]  [<ffffffff8104338b>] ? pte_alloc_one+0x1b/0x50
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456239]  [<ffffffff8100530e>] ? xen_pud_val+0xe/0x10
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456245]  [<ffffffff811255c5>] ? __pte_alloc+0x35/0x100
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456250]  [<ffffffff81129559>] ? handle_mm_fault+0x129/0x250
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456257]  [<ffffffff815ccf82>] ? do_page_fault+0x1a2/0x540
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456262]  [<ffffffff8109aacb>] ? do_futex+0xbb/0x210
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456266]  [<ffffffff8109ac9b>] ? sys_futex+0x7b/0x180
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456274]  [<ffffffff814c2975>] ? sys_recvmsg+0x75/0x90
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456278]  [<ffffffff815c9a15>] ? page_fault+0x25/0x30
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456282] Mem-Info:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456284] Node 0 DMA per-cpu:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456288] CPU    0: hi:    0, btch:   1 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456290] CPU    1: hi:    0, btch:   1 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456293] CPU    2: hi:    0, btch:   1 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456295] CPU    3: hi:    0, btch:   1 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456299] Node 0 DMA32 per-cpu:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456303] CPU    0: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456307] CPU    1: hi:  186, btch:  31 usd:  81
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456310] CPU    2: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456314] CPU    3: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456317] Node 0 Normal per-cpu:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456321] CPU    0: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456323] CPU    1: hi:  186, btch:  31 usd: 175
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456326] CPU    2: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456329] CPU    3: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456334] active_anon:3726183 inactive_anon:44 isolated_anon:0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456335]  active_file:214 inactive_file:237 isolated_file:64
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456336]  unevictable:0 dirty:1 writeback:0 unstable:0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456337]  free:18955 slab_reclaimable:2811 slab_unreclaimable:7277
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456338]  mapped:256 shmem:101 pagetables:12634 bounce:0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456341] Node 0 DMA free:15880kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(
anon):0kB isolated(file):0kB present:15688kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456353] lowmem_reserve[]: 0 4024 15142 15142
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456360] Node 0 DMA32 free:48596kB min:4180kB low:5224kB high:6268kB active_anon:3968448kB inactive_anon:32kB active_file:168kB inactive_file:208kB unevic
table:0kB isolated(anon):0kB isolated(file):0kB present:4120800kB mlocked:0kB dirty:0kB writeback:0kB mapped:328kB shmem:80kB slab_reclaimable:2208kB slab_unreclaimable:8604kB kernel_stack:48kB pagetabl
es:9724kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:601 all_unreclaimable? yes
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456371] lowmem_reserve[]: 0 0 11117 11117
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456377] Node 0 Normal free:11344kB min:11556kB low:14444kB high:17332kB active_anon:10936284kB inactive_anon:144kB active_file:688kB inactive_file:740kB
unevictable:0kB isolated(anon):0kB isolated(file):256kB present:11384720kB mlocked:0kB dirty:4kB writeback:0kB mapped:696kB shmem:324kB slab_reclaimable:9036kB slab_unreclaimable:20504kB kernel_stack:20
16kB pagetables:40812kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2190 all_unreclaimable? yes
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456388] lowmem_reserve[]: 0 0 0 0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456393] Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15880kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456409] Node 0 DMA32: 715*4kB 525*8kB 343*16kB 245*32kB 147*64kB 50*128kB 16*256kB 5*512kB 2*1024kB 0*2048kB 1*4096kB = 48996kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456423] Node 0 Normal: 1919*4kB 5*8kB 2*16kB 2*32kB 1*64kB 1*128kB 2*256kB 2*512kB 0*1024kB 1*2048kB 0*4096kB = 11588kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456438] 768 total pagecache pages
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456440] 0 pages in swap cache
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456443] Swap cache stats: add 0, delete 0, find 0/0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456445] Free swap  = 0kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456447] Total swap = 0kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495285] 3934192 pages RAM
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495290] 91275 pages reserved
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495291] 1661 pages shared
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495293] 3822220 pages non-shared
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495296] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495310] [  228]     0   228     4297       71   2       0             0 upstart-udev-br
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495315] [  242]     0   242     5353      123   2     -17         -1000 udevd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495319] [  319]     0   319     5352      126   2     -17         -1000 udevd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495323] [  320]     0   320     5352      126   0     -17         -1000 udevd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495327] [  392]     0   392     1771      124   2       0             0 dhclient
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495332] [  467]     0   467    12366      140   3     -17         -1000 sshd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495336] [  474]   101   474    47476      160   2       0             0 rsyslogd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495340] [  502]   102   502     5927      108   2       0             0 dbus-daemon
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495344] [  534]     0   534     1035       27   3       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495348] [  539]     0   539     1035       27   0       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495352] [  543]     0   543    13589     2510   1       0             0 supervisord
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495356] [  544]     0   544     1035       25   2       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495360] [  545]     0   545     1035       27   2       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495364] [  547]     0   547     1035       27   2       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495368] [  552]     0   552     5302       65   1       0             0 cron
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495372] [  553]     0   553     4753       44   1       0             0 atd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495376] [  565]     0   565     3946       46   2       0             0 irqbalance
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495380] [  571]   106   571    35760     3859   1       0             0 memcached
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495384] [  637]  1000   637   222128    36427   1       0             0 node
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495388] [  639]  1000   639   189634     4396   1       0             0 node
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495392] [  696]     0   696     1035       27   3       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495396] [  755]     0   755    31312      372   2       0             0 console-kit-dae
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495401] [ 2404]     0  2404    35058     3644   1       0             0 carbon-cache.py
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495405] [ 2601]  1000  2601     5982      199   1       0             0 screen
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495409] [ 2602]  1000  2602     5576      766   0       0             0 bash
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495413] [ 2661]     0  2661     6801       67   1       0             0 sudo
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495417] [ 2663]     0  2663     7133      851   1       0             0 run-graphite-de
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495421] [ 2668]     0  2668     1056       25   1       0             0 sh
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495425] [ 2669]     0  2669    12906     3942   1       0             0 django-admin.py
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495429] [ 2670]     0  2670    50313    11546   1       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495433] [ 2785]     0  2785    15953      489   1       0             0 nginx
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495437] [ 3123]  1000  3123  1184911   266122   1       0             0 java
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495443] [18135]     0 18135  5884532     9128   1       0             0 mongod
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495449] [12574]  1000 12574   966781   932357   2       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495455] [30139]  1000 30139   759010   741642   1       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495461] [14186]    33 14186    19905     4657   1       0             0 nginx
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495465] [14187]    33 14187    19726     4527   1       0             0 nginx
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495470] [14190]  1000 14190    23767     8459   1       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495474] [14201]  1000 14201  1706261  1687237   1       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495478] [24553]     0 24553    19831      200   1       0             0 sshd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495482] [24584]  1000 24584    19831      200   1       0             0 sshd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495486] [24585]  1000 24585     6464     1655   1       0             0 bash
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495490] Out of memory: Kill process 14201 (python) score 439 or sacrifice child
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495509] Killed process 14201 (python) total-vm:6825044kB, anon-rss:6748848kB, file-rss:100kB

Obrigado pela ajuda!

Tarkeshwar
fonte
4
Porque você não desativou oom-killer?
precisa
2
Talvez você deva incluir informações sobre como fazer isso.
Mdpc
pmap -x <pid> mostra valores de rss em kb. No relatório de falha do oom, a coluna rss está no número de páginas. Multiplique pelo tamanho da página (4kB).
tarkeshwar

Respostas:

13

Parece OOM bastante legítimo para mim:

  1. Você tem um nó NUMA (ou seja, nó 0) que possui três zonas:
    1. DMA que tem 15880kB grátis
    2. DMA32 que tem 48996kB grátis
    3. Normal que tem 11588kB grátis
  2. Quase toda a sua memória 3968448kB + 10936284kBestá na active_anonlista LRU.
  3. Os tamanhos de memória na tabela estão em páginas (que na sua arquitetura provavelmente têm 4kb), portanto somam 14,8Gb, não 3,7.
SaveTheRbtz
fonte
0

Ambos estão certos 3.7Gb (4) = 14.8Gb. Cada valor de rss deve ser multiplicado pelo tamanho das páginas para que o tamanho real seja alocado na memória. No descritor de memória, o rss é apenas o número de / * páginas alocadas * /

Saudações

Samuel Paz Nuñez
fonte