exm
EXtra Memory
Mike Kane and Bryan W. Lewis, September 2018
Kyle from Progressive at our R meetup earlier in 2018:
Some folks still like SAS because many methods can work on arbitrarily large data—maybe slowly, but they work.
R is fundamentally an in-memory computing environment.
Out-of-core computing in R...
- ff - Adler and Oehlschlägel
- bigmemory/bigalgebra (matrices only) - Kane and Emerson
- biglm/bigglm - Lumley
- iotools - Urbanek
- manual chunking with Map/Reduce, foreach, etc. (see the sketch after this list)
- dplyr/sparklyr/... (external processing)
...and probably many others.
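As a concrete sketch of the "manual chunking" item above, here is one way to accumulate column sums over a CSV file too large to read at once; the file name big.csv, the chunk size, and the assumption that every column is numeric are illustrative, not taken from exm or any of the packages listed:

# Column sums over a file too big for memory, processed 100,000 rows at a time.
con <- file("big.csv", open = "r")
hdr <- readLines(con, n = 1)   # consume the header line
sums <- NULL
repeat {
  chunk <- tryCatch(read.csv(con, header = FALSE, nrows = 1e5),
                    error = function(e) NULL)   # read.csv() errors at end of file
  if (is.null(chunk) || nrow(chunk) == 0) break
  sums <- if (is.null(sums)) colSums(chunk) else sums + colSums(chunk)
}
close(con)
sums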
These specialized approaches require you to adapt to them.
The benefit is that great performance is sometimes possible.
The problem is that things don't "just work." Many analyses can be tricky and very hard to implement! (Especially with packages using compiled code.)
Out-of-core computing is getting interesting again thanks to today's very fast, low-latency SSD and Optane storage media.
Well, what about kernel swap to storage media?
- page-based, with attempts to map sequential pages together
- global in operation and can adversely affect the entire OS
- works remarkably well in general
exm - a custom memory allocator that maps large allocations to files
- allocation- and page-fault based, always maintains sequential page order
- paging behavior tunable using madvise
- local operation specific to individual programs
(doesn't muck up the rest of the OS)
- dynamic allocations (swap is too, kinda)
- transparent to applications (like swap)
Started way back in 2005, then sat idle for a long time. We're dusting it off again.
Best together:
swap for handling lots of small allocations
exm for handling large allocations
swap + exm "just works"
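For example, a minimal sketch of that split, assuming exm_threshold() takes a size in bytes (as the 1e8 call in the demo below suggests); the allocation sizes here are illustrative:

library(exm)
exm_threshold(1e8)     # allocations of roughly 100 MB and up are handled by exm
small <- numeric(1e6)  # ~8 MB: an ordinary allocation, left to RAM and kernel swap
big <- numeric(1e9)    # ~8 GB: a large allocation that exm maps to a backing file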
exm has a simple command-line, C, and R-package API:
- where to allocate: exm_path()
- allocation threshold: exm_threshold()
- control fork behavior (share/copy-on-write*/duplicate)
- tracemem()-like stuff...
*Copy-on-write copies in-core only, for now.
Linux-only for now.
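The timed session below uses the package defaults; pointing the backing files at fast storage might look like the following sketch, where passing a directory to exm_path() and the path itself are assumptions rather than documented behavior:

library(exm)
exm_path("/mnt/nvme/exm")  # assumption: directory where backing files are created
exm_threshold(1e8)         # map allocations of ~100 MB or more to files there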
exm R --quiet
> library(exm)
> exm_threshold(1e8)
> system.time({x = matrix(runif(2000000 * 1000), ncol=1000)})
   user  system elapsed
 59.968  31.284 107.004
> object.size(x)
16000000200 bytes
> system.time({q = qr(x)})
     user    system   elapsed
 5403.564  3669.676 14393.603
Tested on my old home PC with a single AMD A10-7850K with 4 cores,
16 GB DDR4 RAM, and an NVMe-attached 250 GB Samsung SSD 850 EVO drive.
(*) Figures from an HP ProLiant DL580 G7 with 1 TB RAM and OMP_NUM_THREADS=4.
Both using Ubuntu + OpenBLAS.
One more thing... this works for anything (not just R).
One more even crazier thing...
distributed virtual shared memory with shim:
https://github.com/bwlewis/shim
(This project started as a way to define and allocate into virtual DSM over
Linux clusters using PVFS2.)