From: Waiman Long
Date: Thu, 30 Jan 2025 12:41:19 -0500
Subject: Re: [RFC PATCH] mm, memcg: introduce memory.high.throttle

On 1/30/25 12:32 PM, Shakeel Butt wrote:
> On Thu, Jan 30, 2025 at 12:19:38PM -0500, Waiman Long wrote:
>> On 1/30/25 12:05 PM, Roman Gushchin wrote:
>>> On Thu, Jan 30, 2025 at 10:05:34AM -0500, Waiman Long wrote:
>>>> On 1/30/25 3:15 AM, Michal Hocko wrote:
>>>>> On Wed 29-01-25 14:12:04, Waiman Long wrote:
>>>>>> Since commit 0e4b01df8659 ("mm, memcg: throttle allocators when failing
>>>>>> reclaim over memory.high"), the amount of allocator throttling had
>>>>>> increased substantially. As a result, it could be difficult for a
>>>>>> misbehaving application that consumes increasing amount of memory from
>>>>>> being OOM-killed if memory.high is set. Instead, the application may
>>>>>> just be crawling along holding close to the allowed memory.high memory
>>>>>> for the current memory cgroup for a very long time especially those
>>>>>> that do a lot of memcg charging and uncharging operations.
>>>>>>
>>>>>> This behavior makes the upstream Kubernetes community hesitate to
>>>>>> use memory.high. Instead, they use only memory.max for memory control
>>>>>> similar to what is being done for cgroup v1 [1].
>>>>> Why is this a problem for them?
>>>> My understanding is that a misbehaving container will hold up memory.high
>>>> amount of memory for a long time instead of getting OOM killed sooner and be
>>>> more productively used elsewhere.
>>>>>> To allow better control of the amount of throttling and hence the
>>>>>> speed that a misbehaving task can be OOM killed, a new single-value
>>>>>> memory.high.throttle control file is now added. The allowable range
>>>>>> is 0-32. By default, it has a value of 0 which means maximum throttling
>>>>>> like before. Any non-zero positive value represents the corresponding
>>>>>> power of 2 reduction of throttling and makes OOM kills easier to happen.
>>>>> I do not like the interface to be honest. It exposes an implementation
>>>>> detail and casts it into a user API. If we ever need to change the way
>>>>> how the throttling is implemented this will stand in the way because
>>>>> there will be applications depending on a behavior they were carefully
>>>>> tuned to.
>>>>>
>>>>> It is also not entirely sure how is this supposed to be used in
>>>>> practice? How do people know what kind of value they should use?
>>>> Yes, I agree that a user may need to run some trial runs to find a proper
>>>> value. Perhaps a simpler binary interface of "off" and "on" may be easier to
>>>> understand and use.
>>>>>> System administrators can now use this parameter to determine how easy
>>>>>> they want OOM kills to happen for applications that tend to consume
>>>>>> a lot of memory without the need to run a special userspace memory
>>>>>> management tool to monitor memory consumption when memory.high is set.
>>>>> Why cannot they achieve the same with the existing events/metrics we
>>>>> already do provide? Most notably PSI which is properly accounted when
>>>>> a task is throttled due to memory.high throttling.
>>>> That will require the use of a userspace management agent that looks for
>>>> these stalling conditions and make the kill, if necessary. There are
>>>> certainly users out there that want to get some benefit of using memory.high
>>>> like early memory reclaim without the trouble of handling these kind of
>>>> stalling conditions.
>>> So you basically want to force the workload into some sort of a proactive
>>> reclaim but without an artificial slow down?
> I wouldn't call it a proactive reclaim as reclaim will happen
> synchronously in allocating thread.
>
>>> It makes some sense to me, but
>>> 1) Idk if it deserves a new API, because it can be relatively easily implemented
>>> in userspace by a daemon which monitors cgroups usage and reclaims the memory
>>> if necessary. No kernel changes are needed.
>>> 2) If new API is introduced, I think it's better to introduce a new limit,
>>> e.g. memory.target, keeping memory.high semantics intact.
>> Yes, you are right about that. Introducing a new "memory.target" without
>> disturbing the existing "memory.high" semantics will work for me too.
>>
> So, what happens if reclaim can not reduce usage below memory.target?
> Infinite reclaim cycles or just give up?

Just give up in this case. It is used mainly to reduce the chance of reaching memory.max and causing an OOM kill.

Cheers,
Longman
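
For illustration only, the userspace alternative raised in the thread (a daemon that watches a cgroup and reclaims when needed) could be sketched roughly as below. This is not code from the patch or from any participant. It assumes a cgroup v2 hierarchy and uses the existing memory.current, memory.pressure and memory.reclaim files; the function names and the target_bytes parameter are hypothetical, and memory.target itself is only a proposal in this discussion, not a kernel interface. The single reclaim attempt with no retry mirrors the "just give up" behavior described above.

```python
import errno
import os
from pathlib import Path


def parse_pressure(text: str) -> dict:
    """Parse cgroup v2 PSI text (e.g. memory.pressure) into nested dicts.

    Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0".
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def reclaim_toward_target(cgroup_dir, target_bytes: int) -> bool:
    """Make one attempt to bring memory.current down to target_bytes.

    target_bytes is a hypothetical, userspace-chosen soft target.
    Returns True if usage is already at or below the target, or if the
    write to memory.reclaim succeeded. Returns False if the kernel
    reports EAGAIN (it reclaimed less than requested). There is no retry
    loop: on failure the caller simply gives up.
    """
    cg = Path(cgroup_dir)
    current = int((cg / "memory.current").read_text())
    excess = current - target_bytes
    if excess <= 0:
        return True
    fd = os.open(cg / "memory.reclaim", os.O_WRONLY)
    try:
        os.write(fd, str(excess).encode())
    except OSError as exc:
        if exc.errno == errno.EAGAIN:
            return False
        raise
    finally:
        os.close(fd)
    return True
```

A daemon built on this would poll parse_pressure() on memory.pressure to notice stalls and call reclaim_toward_target() periodically; what to do after a False return (ignore, alert, or kill) is a policy decision the thread leaves open.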