Persistent memory for transient data
Current kernels treat persistent memory as a device. The memory managed by that device can host a filesystem or be mapped directly into a process's address space (or both), but it will only be used by processes that ask for it explicitly. This patch set from Dave Hansen can change that behavior, though. It creates a new device driver that takes any persistent memory assigned to it and hotplugs it into the system as a range of ordinary RAM; after that, it will be given over to processes to satisfy normal memory-allocation requests. A portion (or all) of the system's persistent memory can be dedicated to this use, as the system administrator wishes.
Persistent memory used in this mode looks like ordinary memory, but the two are still not exactly the same. In particular, while persistent memory is fast, it is still not as fast as normal RAM. So users may well want to ensure that some applications use regular memory (DRAM) while others are relegated to persistent memory that is masquerading as the regular variety. When persistent memory is added to the system in this way, it shows up under one or more special NUMA nodes, so the usual memory-policy mechanisms can be used to control which processes use it. As Hansen suggested, a cloud provider could use this mechanism to offer two classes of virtual machines, with the cheaper ones confined mostly to the slower, persistent memory.
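An application (or administrator) can steer its allocations toward one node or the other with the existing NUMA interfaces; no new API is involved. The sketch below is a minimal illustration using libnuma to place a buffer on a chosen node. The node number used here (2) is purely an assumption about where the persistent memory happened to come online; on a real system it would have to be discovered, for example by looking under /sys/devices/system/node.

```c
/*
 * Minimal sketch (not part of the patch set): placing an allocation on a
 * specific NUMA node with libnuma.  Node 0 is assumed to be ordinary DRAM
 * and node 2 the hotplugged persistent-memory node; the real node numbers
 * depend on the platform and how the memory was brought online.
 *
 * Build with: cc -o bind_demo bind_demo.c -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PMEM_NODE 2	/* hypothetical node number for the slower memory */

int main(void)
{
	size_t size = 64 << 20;		/* 64MB */
	char *buf;

	if (numa_available() < 0) {
		fprintf(stderr, "NUMA is not available on this system\n");
		return EXIT_FAILURE;
	}

	/* Ask for the allocation to be placed on the persistent-memory node. */
	buf = numa_alloc_onnode(size, PMEM_NODE);
	if (!buf) {
		fprintf(stderr, "allocation on node %d failed\n", PMEM_NODE);
		return EXIT_FAILURE;
	}

	memset(buf, 0, size);		/* fault the pages in on that node */
	printf("allocated %zu bytes on node %d\n", size, PMEM_NODE);

	numa_free(buf, size);
	return EXIT_SUCCESS;
}
```

The same effect can be had from outside the program with the usual memory-policy tools; numactl's --membind option, for example, restricts a process's allocations to a given set of nodes.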
Hansen's patches are mostly uncontroversial; the same cannot be said of a second patch set. Intel has developed a hardware feature known as "memory mode", which is another way to use persistent memory as if it were DRAM; the difference is that memory mode also takes over the system's real DRAM and uses it as a direct-mapped cache for the persistent memory. An application that exhibits good cache behavior will be able to use persistent memory at something close to DRAM speeds; things will slow down, though, in the presence of a lot of cache contention.
The fact that the cache is direct-mapped can make contention problems worse. Unlike an associative cache, a direct-mapped cache has only one slot available for any given memory address; if the data of interest is not in that particular slot, it cannot be in the cache at all. Making effective use of such an arrangement requires a memory-allocation pattern that will spread accesses across the entire cache. Otherwise, an application's memory may be mapped to a relatively small number of cache slots, and it will end up contending with itself — and running slowly.
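To make the contention problem concrete, here is a small, self-contained illustration of how a direct-mapped cache chooses its single candidate slot; the cache geometry used (a 16GB DRAM cache with 64-byte lines) is an assumption for the sake of the example, not a description of any particular product. Two addresses that differ by exactly the cache size land in the same slot and will evict each other on every alternating access.

```c
/*
 * Illustration only: how a direct-mapped cache picks a slot.  The cache
 * size and line size below are assumed values for the example.
 */
#include <stdint.h>
#include <stdio.h>

#define CACHE_SIZE	(16ULL << 30)	/* 16GB of DRAM acting as a cache */
#define LINE_SIZE	64ULL		/* bytes per cache line */
#define NR_SLOTS	(CACHE_SIZE / LINE_SIZE)

static uint64_t slot_for(uint64_t phys_addr)
{
	/* Each physical line maps to exactly one slot; there is no other choice. */
	return (phys_addr / LINE_SIZE) % NR_SLOTS;
}

int main(void)
{
	uint64_t a = 0x12345680;

	/* Two addresses exactly CACHE_SIZE apart collide in the same slot. */
	printf("slot(a)            = %llu\n", (unsigned long long)slot_for(a));
	printf("slot(a+CACHE_SIZE) = %llu\n", (unsigned long long)slot_for(a + CACHE_SIZE));
	return 0;
}
```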
The Linux memory-management system has no awareness of this kind of caching, though, and thus makes no provisions for using the cache effectively. The result is inconsistent performance at best and heavy cache contention at worst; cache utilization tends to degrade over time, leading to situations where some high-performance users end up periodically rebooting their systems to restore performance. Linux might achieve world domination even with such behavior, but parts of that world would likely be looking for a new overlord.
The proposed solution, in the form of this patch set from Dan Williams, is simple enough: randomize the order in which memory appears on the free lists so that allocations will be more widely scattered. The initial randomization is done at system boot, when memory (in relatively large blocks) is shuffled. Over time, though, the system is likely to undo that randomization; mechanisms like memory compaction are designed to clean up fragmentation messes, for example. To avoid the creation of too much order, the patch set randomizes the placement of new large blocks in the free lists as they are created, hopefully keeping access patterns scattered over the lifetime of the system.
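The kernel patches operate on the page allocator's free lists and use the kernel's own random-number infrastructure; purely as an illustration of the underlying idea, the user-space sketch below randomly permutes a list of block numbers with a Fisher-Yates pass so that consumers see them in an unpredictable order. It is a toy model of the concept, not the patch set's implementation.

```c
/*
 * A user-space sketch of the idea behind free-list shuffling: randomly
 * permute the order of large free blocks so that allocations are spread
 * across the address space (and hence across the direct-mapped cache).
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NR_BLOCKS 16	/* pretend we have 16 large free blocks */

static void shuffle(unsigned int *blocks, size_t n)
{
	/* Classic Fisher-Yates shuffle over the block list. */
	for (size_t i = n - 1; i > 0; i--) {
		size_t j = (size_t)rand() % (i + 1);
		unsigned int tmp = blocks[i];

		blocks[i] = blocks[j];
		blocks[j] = tmp;
	}
}

int main(void)
{
	unsigned int blocks[NR_BLOCKS];

	srand((unsigned int)time(NULL));
	for (unsigned int i = 0; i < NR_BLOCKS; i++)
		blocks[i] = i;	/* block numbers in their natural order */

	shuffle(blocks, NR_BLOCKS);

	printf("free-list order after shuffling:");
	for (unsigned int i = 0; i < NR_BLOCKS; i++)
		printf(" %u", blocks[i]);
	printf("\n");
	return 0;
}
```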
Williams cited some benchmarks that show performance improvements from this randomization when a direct-mapped cache is in use. Perhaps most importantly, the long-term performance levels out and remains predictable over the life of the system rather than degrading over time. Even so, this patch set has proved to be a hard sell with the memory-management developers, who fear its effects on performance in general. The shuffling only happens if the system is detected to be running in memory mode (or if it has been explicitly enabled with a command-line parameter), so it should not have any effect on most systems. Michal Hocko eventually came around to a grudging acceptance of the patches. Mel Gorman, for his part, has withheld his approval, though he has also chosen not to try to block the patches from being merged.
One other developer who does support the patch is Kees Cook, who sees some potential security benefits from the randomization. The security benefits have, in general, been even harder to sell than the performance benefits, especially since nobody has provided an example of an attack that would be blocked by the free-list shuffling. Kernel developers can be unfavorably inclined toward security patches even when clear security benefits have been demonstrated; protestations that a change might, maybe, make things better, possibly, someday, tend not to get too far.
At this point, the work is seemingly complete and has gone to Andrew Morton, who will have to make a decision on whether to accept it. He has not tipped his hand so far, so the direction he will go is not clear. In the end, though, this is a relatively focused patch set that should help some use cases while having no effect on the rest. It would not be surprising if it found its way in sometime well before we all get our persistent-memory laptops to use in our autonomous flying cars.
Index entries for this article:
Kernel: Memory management/Nonvolatile memory
Posted Jan 22, 2019 13:25 UTC (Tue) by azaghal (subscriber, #47042):

With disks that's easy to do due to the nature of their physical and software interfaces.

Not a question directly for you, but probably fits the context well if someone can chime in :)

Posted Jan 22, 2019 14:13 UTC (Tue) by jwkblades (guest, #129049):

If you want graceful reboots and shutdowns to be a use case in which the NVDIMMs retain their content, you actually have to go through a fairly wide gauntlet - in our case it required a custom BIOS and CPLD (power management) firmware, and even there it took multiple iterations to get it "right".
Posted Jan 22, 2019 18:14 UTC (Tue) by sbates (subscriber, #106518):

There are also Intel Optane DIMM enabled servers coming to market very soon from companies like SuperMicro. Also, Optane DIMM enabled servers can be rented on Google cloud through an alpha program; see their website for more details on that offering. I'd assume other public cloud vendors will do something similar. This offers NVDIMMs at a better cost and capacity point than the DRAM-based ones, apparently.

Posted Jan 22, 2019 18:21 UTC (Tue) by Cyberax (✭ supporter ✭, #52523):

I've been hearing this for the last 2 years. I tried writing to various vendors to ask for a sample, but so far they are controlled tighter than Trump's tax returns. This kinda looks like the NVDIMM is basically vaporware.
Posted Jan 22, 2019 18:43 UTC (Tue) by sbates (subscriber, #106518):

https://docs.google.com/forms/d/1IByNBv-7n9FJ1cjGvrjcwILr...

The DRAM-based NVDIMM-Ns, however, are very real and are used in production for storage and database workloads, but their cost and capacity make them less interesting (to some) than their PM-based counterparts.
Posted Jan 22, 2019 15:06 UTC (Tue) by sbates (subscriber, #106518):

1. You could trust the PM hardware vendor to flush all persistent data. However, trusting hardware vendors to do the right thing in all corner cases can lead to disappointment.

2. You can use the CPU memory controller to encrypt all data going to the NVDIMMs and then throw away the keys when requested by the user. This is related to the memory encryption patches I mentioned in my earlier comment.

3. You can use SW to encrypt your application data before you commit it to memory.

The second approach is *much* more visible to the user than the first and provides good performance. The third option probably lacks the performance needed to make the technology interesting.

With block devices, option 1 equates to self-encrypting drives, and they have been notoriously easy to hack. See [1] for a rather terrifying treatise on this topic. I suspect NVDIMMs will face similar challenges.

[1] https://www.ru.nl/publish/pages/909275/draft-paper_1.pdf
Posted Jan 23, 2019 11:04 UTC (Wed) by nim-nim (subscriber, #34454):

https://redmondmag.com/articles/2018/11/06/microsoft-ssd-...