Persistent memory for transient data
Current kernels treat persistent memory as a device. The memory managed by that device can host a filesystem or be mapped directly into a process's address space (or both), but it will only be used by processes that ask for it explicitly. This patch set from Dave Hansen can change that behavior, though. It creates a new device driver that takes any persistent memory assigned to it and hotplugs it into the system as a range of ordinary RAM; after that, it will be given over to processes to satisfy normal memory-allocation requests. A portion (or all) of the system's persistent memory can be dedicated to this use, as the system administrator wishes.
Persistent memory used in this mode looks like ordinary memory, but the two are still not exactly the same. In particular, while persistent memory is fast, it is still not as fast as normal RAM. So users may well want to ensure that some applications use regular memory (DRAM) while others are relegated to persistent memory that is masquerading as the regular variety. When persistent memory is added to the system in this way, it shows up under one or more special NUMA nodes, so the usual memory-policy mechanisms can be used to control which processes use it. As Hansen suggested, a cloud provider could use this mechanism to offer two classes of virtual machines, with the cheaper ones confined mostly to the slower, persistent memory.
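An application (or administrator) can steer its allocations toward one node or the other with the existing NUMA interfaces; no new API is involved. The sketch below is a minimal illustration using libnuma to place a buffer on a chosen node. The node number used here (2) is purely an assumption about where the persistent memory happened to come online; on a real system it would have to be discovered, for example by looking under /sys/devices/system/node.

```c
/*
 * Minimal sketch (not part of the patch set): placing an allocation on a
 * specific NUMA node with libnuma.  Node 0 is assumed to be ordinary DRAM
 * and node 2 the hotplugged persistent-memory node; the real node numbers
 * depend on the platform and how the memory was brought online.
 *
 * Build with: cc -o bind_demo bind_demo.c -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PMEM_NODE 2	/* hypothetical node number for the slower memory */

int main(void)
{
	size_t size = 64 << 20;		/* 64MB */
	char *buf;

	if (numa_available() < 0) {
		fprintf(stderr, "NUMA is not available on this system\n");
		return EXIT_FAILURE;
	}

	/* Ask for the allocation to be placed on the persistent-memory node. */
	buf = numa_alloc_onnode(size, PMEM_NODE);
	if (!buf) {
		fprintf(stderr, "allocation on node %d failed\n", PMEM_NODE);
		return EXIT_FAILURE;
	}

	memset(buf, 0, size);		/* fault the pages in on that node */
	printf("allocated %zu bytes on node %d\n", size, PMEM_NODE);

	numa_free(buf, size);
	return EXIT_SUCCESS;
}
```

The same effect can be had from outside the program with the usual memory-policy tools; numactl's --membind option, for example, restricts a process's allocations to a given set of nodes.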
Hansen's patches are mostly uncontroversial; the same cannot be said of a second patch set. Intel has developed a hardware feature known as "memory mode", which is another way to use persistent memory as if it were DRAM; the difference is that memory mode also takes over the system's real DRAM and uses it as a direct-mapped cache for the persistent memory. An application that exhibits good cache behavior will be able to use persistent memory at something close to DRAM speeds; things will slow down, though, in the presence of a lot of cache contention.
The fact that the cache is direct-mapped can make contention problems worse. Unlike an associative cache, a direct-mapped cache has only one slot available for any given memory address; if the data of interest is not in that particular slot, it cannot be in the cache at all. Making effective use of such an arrangement requires a memory-allocation pattern that will spread accesses across the entire cache. Otherwise, an application's memory may be mapped to a relatively small number of cache slots, and it will end up contending with itself — and running slowly.
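To make the contention problem concrete, here is a small, self-contained illustration of how a direct-mapped cache chooses its single candidate slot; the cache geometry used (a 16GB DRAM cache with 64-byte lines) is an assumption for the sake of the example, not a description of any particular product. Two addresses that differ by exactly the cache size land in the same slot and will evict each other on every alternating access.

```c
/*
 * Illustration only: how a direct-mapped cache picks a slot.  The cache
 * size and line size below are assumed values for the example.
 */
#include <stdint.h>
#include <stdio.h>

#define CACHE_SIZE	(16ULL << 30)	/* 16GB of DRAM acting as a cache */
#define LINE_SIZE	64ULL		/* bytes per cache line */
#define NR_SLOTS	(CACHE_SIZE / LINE_SIZE)

static uint64_t slot_for(uint64_t phys_addr)
{
	/* Each physical line maps to exactly one slot; there is no other choice. */
	return (phys_addr / LINE_SIZE) % NR_SLOTS;
}

int main(void)
{
	uint64_t a = 0x12345680;

	/* Two addresses exactly CACHE_SIZE apart collide in the same slot. */
	printf("slot(a)            = %llu\n", (unsigned long long)slot_for(a));
	printf("slot(a+CACHE_SIZE) = %llu\n", (unsigned long long)slot_for(a + CACHE_SIZE));
	return 0;
}
```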
The Linux memory-management system has no awareness of this kind of caching, though, and thus makes no provisions for using the cache effectively. The result is inconsistent performance at best and heavy cache contention at worst; cache utilization tends to degrade over time, leading to situations where some high-performance users end up periodically rebooting their systems to restore performance. Linux might achieve world domination even with such behavior, but parts of that world would likely be looking for a new overlord.
The proposed solution, in the form of this patch set from Dan Williams, is simple enough: randomize the order in which memory appears on the free lists so that allocations will be more widely scattered. The initial randomization is done at system boot, when memory (in relatively large blocks) is shuffled. Over time, though, the system is likely to undo that randomization; mechanisms like memory compaction are designed to clean up fragmentation messes, for example. To avoid the creation of too much order, the patch set randomizes the placement of new large blocks in the free lists as they are created, hopefully keeping access patterns scattered over the lifetime of the system.
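The kernel patches operate on the page allocator's free lists and use the kernel's own random-number infrastructure; purely as an illustration of the underlying idea, the user-space sketch below randomly permutes a list of block numbers with a Fisher-Yates pass so that consumers see them in an unpredictable order. It is a toy model of the concept, not the patch set's implementation.

```c
/*
 * A user-space sketch of the idea behind free-list shuffling: randomly
 * permute the order of large free blocks so that allocations are spread
 * across the address space (and hence across the direct-mapped cache).
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NR_BLOCKS 16	/* pretend we have 16 large free blocks */

static void shuffle(unsigned int *blocks, size_t n)
{
	/* Classic Fisher-Yates shuffle over the block list. */
	for (size_t i = n - 1; i > 0; i--) {
		size_t j = (size_t)rand() % (i + 1);
		unsigned int tmp = blocks[i];

		blocks[i] = blocks[j];
		blocks[j] = tmp;
	}
}

int main(void)
{
	unsigned int blocks[NR_BLOCKS];

	srand((unsigned int)time(NULL));
	for (unsigned int i = 0; i < NR_BLOCKS; i++)
		blocks[i] = i;	/* block numbers in their natural order */

	shuffle(blocks, NR_BLOCKS);

	printf("free-list order after shuffling:");
	for (unsigned int i = 0; i < NR_BLOCKS; i++)
		printf(" %u", blocks[i]);
	printf("\n");
	return 0;
}
```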
Williams cited some benchmarks that show performance improvements from this randomization when a direct-mapped cache is in use. Perhaps most importantly, the long-term performance levels out and remains predictable over the life of the system rather than degrading over time. Even so, this patch set has proved to be a hard sell with the memory-management developers, who fear its effects on performance in general. The shuffling only happens if the system is detected to be running in memory mode (or if it has been explicitly enabled with a command-line parameter), so it should not have any effect on most systems. Michal Hocko eventually came around to a grudging acceptance of the patches. Mel Gorman, for his part, has withheld his approval, though he has also chosen not to try to block the patches from being merged.
One other developer who does support the patch is Kees Cook, who sees some potential security benefits from the randomization. The security benefits have, in general, been even harder to sell than the performance benefits, especially since nobody has provided an example of an attack that would be blocked by the free-list shuffling. Kernel developers can be unfavorably inclined toward security patches even when clear security benefits have been demonstrated; protestations that a change might, maybe, make things better, possibly, someday, tend not to get too far.
At this point, the work is seemingly complete and has gone to Andrew Morton, who will have to make a decision on whether to accept it. He has not tipped his hand so far, so the direction he will go is not clear. In the end, though, this is a relatively focused patch set that should help some use cases while having no effect on the rest. It would not be surprising if it found its way in sometime well before we all get our persistent-memory laptops to use in our autonomous flying cars.
Index entries for this article:
Kernel: Memory management/Nonvolatile memory
Posted Jan 22, 2019 13:25 UTC (Tue) by azaghal (subscriber, #47042):

With disks that's easy to do due to the nature of their physical and software interfaces.

Not a question directly for you, but probably fits the context well if someone can chime in :)

Posted Jan 22, 2019 14:13 UTC (Tue) by jwkblades (guest, #129049):

If you want graceful reboots and shutdowns to be a use case in which the NVDIMMs retain their content, you actually have to go through a fairly wide gauntlet - in our case it required a custom BIOS and CPLD (power management) firmware, and even there it took multiple iterations to get it "right".
Posted Jan 22, 2019 18:14 UTC (Tue) by sbates (subscriber, #106518):

There are also Intel Optane DIMM enabled servers coming to market very soon from companies like SuperMicro. Also, Optane DIMM enabled servers can be rented on Google cloud through an alpha program; see their website for more details on that offering. I'd assume other public cloud vendors will do something similar. This offers NVDIMMs at a better cost and capacity point than the DRAM-based ones, apparently.

Posted Jan 22, 2019 18:21 UTC (Tue) by Cyberax (✭ supporter ✭, #52523):

I've been hearing this for the last 2 years. I tried writing to various vendors to ask for a sample, but so far they are controlled tighter than Trump's tax returns. This kinda looks like the NVDIMM is basically vaporware.
Posted Jan 22, 2019 18:43 UTC (Tue) by sbates (subscriber, #106518):

https://docs.google.com/forms/d/1IByNBv-7n9FJ1cjGvrjcwILr...

The DRAM-based NVDIMM-Ns, however, are very real and are used in production for storage and database workloads, but their cost and capacity make them less interesting (to some) than their PM-based counterparts.
Posted Jan 22, 2019 15:06 UTC (Tue) by sbates (subscriber, #106518):

1. You could trust the PM hardware vendor to flush all persistent data. However, trusting hardware vendors to do the right thing in all corner cases can lead to disappointment.

2. You can use the CPU memory controller to encrypt all data going to the NVDIMMs and then throw away the keys when requested by the user. This is related to the memory encryption patches I mentioned in my earlier comment.

3. You can use SW to encrypt your application data before you commit it to memory.

The second approach is *much* more visible to the user than the first and provides good performance. The third option probably lacks the performance needed to make the technology interesting.

With block devices, option 1 equates to self-encrypting drives, and they have been notoriously easy to hack. See [1] for a rather terrifying treatise on this topic. I suspect NVDIMMs will face similar challenges.

[1] https://www.ru.nl/publish/pages/909275/draft-paper_1.pdf
Posted Jan 23, 2019 11:04 UTC (Wed) by nim-nim (subscriber, #34454):

https://redmondmag.com/articles/2018/11/06/microsoft-ssd-...