Linux VM and swapping

Preface

RAM has always been a critical resource. Swapping is a mechanism that allows the system to keep operating when the working data set is larger than the available RAM.

Most people think swap is bad because it slows the system down. This opinion is both right and wrong: when the system operates near the limits of its RAM, swap actually lets it run more efficiently. Let's examine the inner workings and lifecycles of VM (virtual memory).

Alloc VM lifecycle

A process has started and is running, and now it needs more RAM. What happens here? There are two phases - committing memory and allocating memory. Committing is the action of assigning some number of virtual pages to the process; allocation happens when physical pages are assigned to those virtual pages.
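
To make the two phases visible, here is a minimal sketch (sizes are arbitrary, assuming glibc on Linux): malloc() only commits address space, and physical pages appear only when each page is first touched. Comparing VmSize and VmRSS in /proc/self/status before and after the touch loop shows the difference.

    /* Sketch: phase 1 (commit) vs phase 2 (allocate on first touch). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void) {
        size_t size = 256UL * 1024 * 1024;      /* 256 MiB, arbitrary */
        long page = sysconf(_SC_PAGESIZE);

        char *buf = malloc(size);               /* phase 1: commit virtual pages */
        if (buf == NULL) {
            perror("malloc");
            return 1;
        }
        /* Here RSS has barely grown: pages are committed, not yet backed. */

        for (size_t off = 0; off < size; off += (size_t)page)
            buf[off] = 1;                       /* phase 2: each page fault
                                                   allocates a physical page */

        puts("all pages touched");              /* RSS has grown by ~256 MiB */
        free(buf);
        return 0;
    }
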
The caveat: what happens if the kernel has NO free pages?

MMAP VM lifecycle

A process has started and is running, and now it wants to map some pages of a file and access them. There are two phases - mapping and access.

The mapping phase: the kernel creates a virtual memory area (VMA) describing which part of the file is mapped to which range of the process address space. No physical pages are allocated at this point.

The access phase: the first access to a mapped page raises a page fault; the kernel reads the corresponding file page into the page cache and maps that physical page into the process address space.
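
A minimal sketch of both phases (the file path is just an example, any readable file works): mmap() only builds the mapping, and the data is actually read on the first access.

    /* Sketch: mapping phase (mmap) vs access phase (first read faults
       the page in from the page cache / disk). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/etc/hostname", O_RDONLY);   /* example file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return 1; }

        /* Mapping phase: a VMA is created, no physical pages yet. */
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        /* Access phase: this read triggers a page fault; the kernel reads
           the file page into the page cache and maps it. */
        printf("first byte: %c\n", p[0]);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }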

The caveat: again, what happens if the kernel has NO free pages?

In-kernel allocations

In the previous sections we referred to kernel allocation in the "caveat" notes. How does the kernel deal with allocation requests? The kernel accounts for all pages and keeps lists of used and free pages.

As background activity, the kernel also flushes dirty pages to make them clean. There are special cases when dirty pages are not flushed, but they are rare enough. The kernel also tries to keep a certain number of pages on the free list so that it can satisfy specific highest-priority allocations.
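
These counters can be observed in /proc/meminfo. A small sketch that prints the fields relevant here (the selection of fields is just an illustration):

    /* Sketch: print the kernel's page accounting from /proc/meminfo
       (values are in kB). */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        FILE *f = fopen("/proc/meminfo", "r");
        if (f == NULL) { perror("fopen"); return 1; }

        char line[256];
        while (fgets(line, sizeof line, f)) {
            if (strncmp(line, "MemTotal:", 9) == 0 ||
                strncmp(line, "MemFree:", 8) == 0 ||
                strncmp(line, "MemAvailable:", 13) == 0 ||
                strncmp(line, "Dirty:", 6) == 0)
                fputs(line, stdout);
        }
        fclose(f);
        return 0;
    }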

Acting near the RAM boundaries

When almost all RAM is in use, we can get an interesting case: a program that actively uses RAM (like a browser or an image editor) has many documents (or objects) created or opened for reading. But in fact these objects (and their pages) are not modified and may not even be accessed. This means that a lot of memory is committed and allocated, but not used.

In this case an allocation cannot be satisfied by taking a new free page (there is not enough memory), and it cannot be satisfied by flushing the page cache (there are no dirty pages, because we don't modify files on disk). So the allocation can only be completed by swapping something out or by dropping a clean page-cache page.

And finally, if you have no swap enabled, the only way is to drop clean page-cache pages.

The problem is that these cached pages mostly contain valuable shared data and shared library code (shared libraries are mmapped into the process address space). When the next round starts and the code that has just been dropped from the cache must be executed again, the system has to read it back from the filesystem.
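
Whether the pages of a mapped file are still resident in the page cache (and therefore whether the next access will cost a filesystem read) can be checked with mincore(). A sketch, using an arbitrary file-backed mapping as the example:

    /* Sketch: count how many pages of a file-backed mapping are currently
       resident in the page cache. */
    #define _DEFAULT_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        const char *path = "/bin/sh";               /* example file */
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { close(fd); return 1; }

        void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        long page = sysconf(_SC_PAGESIZE);
        size_t npages = (st.st_size + page - 1) / page;
        unsigned char *vec = malloc(npages);

        if (mincore(p, st.st_size, vec) == 0) {
            size_t resident = 0;
            for (size_t i = 0; i < npages; i++)
                if (vec[i] & 1)
                    resident++;
            printf("%zu of %zu pages resident\n", resident, npages);
        }

        free(vec);
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }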

The system could swap out the least recently used data of the image editor and the browser and keep latency under control, because swapping out rarely used data is better than dropping the frequently used page cache - but you took that option away yourself when you disabled swap.

The conclusion: always keep some amount of swap space.

Overcommitting

Overcommitting allows the kernel to commit more virtual memory than it can actually serve (the kernel can really back only RAM + swap pages). But many applications try to allocate a large memory region and then use only part of it, or only use it much later. This means that, with some luck, the kernel may commit more virtual memory to applications than it has, in the hope that by the time a page must be allocated it will already be available - i.e. some applications will have finished, or other applications will have released their allocated pages.
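
A minimal sketch of how this looks from user space (sizes are arbitrary, assuming a 64-bit system with the default heuristic overcommit): a large allocation is typically committed without problems even though it may never be fully backed, and only the pages that are actually touched consume RAM.

    /* Sketch: commit far more memory than is used; with the default
       heuristic overcommit this normally succeeds. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        size_t size = 8UL * 1024 * 1024 * 1024;     /* commit 8 GiB ... */
        char *buf = malloc(size);
        if (buf == NULL) {
            puts("commit refused (strict overcommit or limit reached)");
            return 1;
        }
        memset(buf, 1, 64UL * 1024 * 1024);         /* ... touch only 64 MiB */
        puts("committed 8 GiB, allocated ~64 MiB");
        free(buf);
        return 0;
    }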

OOM killer and guaranteed allocation

But if the hope we mentioned before fails, you will meet the OOM killer (out-of-memory killer). The OOM killer kills a process when an allocation cannot be satisfied, and often the victim is not the process that requested too much RAM. Oops.

To avoid the OOM condition you can disable overcommit and limit committed memory to some size, such as "swap size + 90% of RAM size". With overcommit disabled, an application will not be allowed to commit memory if that commit could not be backed by an allocation. Applications smart enough to handle a failed allocation will flush some unused data to disk and reuse previously committed and allocated memory.

In most cases, though, an application whose memory commit (sbrk) fails simply crashes - but at least it happens to the application that tried to allocate the memory, and it has a chance to handle the failure gracefully, i.e. flush its internal caches and then die.
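
A minimal sketch of that graceful path (flush_internal_caches() is a hypothetical placeholder for whatever cleanup a real application would do):

    /* Sketch: check the allocation result and degrade gracefully instead
       of crashing later. */
    #include <stdio.h>
    #include <stdlib.h>

    static void flush_internal_caches(void) {
        /* hypothetical: write out in-memory state, drop internal caches */
        fputs("flushing internal caches before exiting\n", stderr);
    }

    int main(void) {
        size_t size = 1UL * 1024 * 1024 * 1024;     /* 1 GiB, arbitrary */
        void *buf = malloc(size);
        if (buf == NULL) {
            /* With strict overcommit the failure shows up here, instead of
               an OOM kill later at page-fault time. */
            flush_internal_caches();
            return 1;
        }
        /* ... use the memory ... */
        free(buf);
        return 0;
    }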

Tuning the system

To tune VM behaviour, you may need to adjust the following sysctl options:

vm.swappiness - controls the swapping tendency:
0 - avoid swapping as much as possible
100 - swap whenever there is a choice
vm.dirty_ratio - controls the percentage of memory that may be filled with dirty pages before flushing to disk starts
vm.overcommit_memory - controls overcommitting (0 - heuristic overcommit, the default; 1 - always overcommit; 2 - strict accounting, overcommit disabled)
vm.overcommit_ratio - controls the commit factor under strict accounting; the commit limit is SWAP_size + <factor>% * RAM_size
vm.min_free_kbytes - the minimum amount of free memory the kernel tries to keep
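
These knobs are plain files under /proc/sys/vm/ and are normally set with the sysctl tool or /etc/sysctl.conf. A small sketch that just reads the current values:

    /* Sketch: print the current values of the sysctl knobs listed above. */
    #include <stdio.h>

    static void show(const char *path) {
        char value[64];
        FILE *f = fopen(path, "r");
        if (f == NULL)
            return;
        if (fgets(value, sizeof value, f))
            printf("%s = %s", path, value);
        fclose(f);
    }

    int main(void) {
        show("/proc/sys/vm/swappiness");
        show("/proc/sys/vm/dirty_ratio");
        show("/proc/sys/vm/overcommit_memory");
        show("/proc/sys/vm/overcommit_ratio");
        show("/proc/sys/vm/min_free_kbytes");
        return 0;
    }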

Artemy Kapitula, software developer at Mail.RU Cloud Solutions
artemy.kapitula@gmail.com