
ngx threads: replace list with rbtree #1215


Closed
ZigzagAK wants to merge 1 commit from the thread-rbtree branch

Conversation

ZigzagAK

I hereby grant the copyright of the changes in this pull request
to the authors of this lua-nginx-module project.

ZigzagAK force-pushed the thread-rbtree branch 3 times, most recently from 5e2602c to 1899416 on December 22, 2017 at 12:46
@ZigzagAK
Author

All tests with uthreads pass locally:
Tests : tt/023-rewrite/on-abort.t
tt/023-rewrite/on-abort.t .. ok
All tests successful.
Files=1, Tests=134, 12 wallclock secs ( 0.04 usr 0.00 sys + 0.28 cusr 0.07 csys = 0.39 CPU)
Result: PASS
Tests : tt/023-rewrite/uthread-exec.t
tt/023-rewrite/uthread-exec.t .. ok
All tests successful.
Files=1, Tests=40, 3 wallclock secs ( 0.02 usr 0.00 sys + 0.18 cusr 0.06 csys = 0.26 CPU)
Result: PASS
Tests : tt/023-rewrite/uthread-exit.t
tt/023-rewrite/uthread-exit.t .. ok
All tests successful.
Files=1, Tests=122, 6 wallclock secs ( 0.04 usr 0.00 sys + 0.31 cusr 0.10 csys = 0.45 CPU)
Result: PASS
Tests : tt/023-rewrite/uthread-redirect.t
tt/023-rewrite/uthread-redirect.t .. ok
All tests successful.
Files=1, Tests=16, 1 wallclock secs ( 0.01 usr 0.00 sys + 0.10 cusr 0.02 csys = 0.13 CPU)
Result: PASS
Tests : tt/023-rewrite/uthread-spawn.t
tt/023-rewrite/uthread-spawn.t .. ok
All tests successful.
Files=1, Tests=218, 7 wallclock secs ( 0.03 usr 0.01 sys + 0.45 cusr 0.15 csys = 0.64 CPU)
Result: PASS
Tests : tt/024-access/on-abort.t
tt/024-access/on-abort.t .. ok
All tests successful.
Files=1, Tests=134, 11 wallclock secs ( 0.03 usr 0.00 sys + 0.24 cusr 0.07 csys = 0.34 CPU)
Result: PASS
Tests : tt/024-access/uthread-exec.t
tt/024-access/uthread-exec.t .. ok
All tests successful.
Files=1, Tests=40, 2 wallclock secs ( 0.01 usr 0.00 sys + 0.14 cusr 0.02 csys = 0.17 CPU)
Result: PASS
Tests : tt/024-access/uthread-exit.t
tt/024-access/uthread-exit.t .. ok
All tests successful.
Files=1, Tests=120, 7 wallclock secs ( 0.03 usr 0.00 sys + 0.28 cusr 0.07 csys = 0.38 CPU)
Result: PASS
Tests : tt/024-access/uthread-redirect.t
tt/024-access/uthread-redirect.t .. ok
All tests successful.
Files=1, Tests=16, 1 wallclock secs ( 0.02 usr 0.00 sys + 0.11 cusr 0.00 csys = 0.13 CPU)
Result: PASS
Tests : tt/024-access/uthread-spawn.t
tt/024-access/uthread-spawn.t .. ok
All tests successful.
Files=1, Tests=186, 6 wallclock secs ( 0.04 usr 0.00 sys + 0.41 cusr 0.10 csys = 0.55 CPU)
Result: PASS

@agentzh
Member

agentzh commented Dec 22, 2017

@ZigzagAK This change is big. Are you actually seeing performance problems with the current list implementation in real-world profiling?

@agentzh
Member

agentzh commented Dec 22, 2017

@ZigzagAK The Travis CI failures look like real problems. You need to apply the no-pool patch to the nginx core; otherwise nginx's memory pools may hide many memory bugs for small payloads like those in our existing test suite (furthermore, we should run the tests under ASan or Valgrind to make sure more memory issues can be caught, not just those that lead to segfaults).

@ZigzagAK
Author

ZigzagAK commented Dec 23, 2017

Yes, I have a performance problem in https://github.com/ZigzagAK/lua-resty-cache-redis/tree/0.2.

https://github.com/ZigzagAK/lua-resty-cache-redis/blob/0.2/lib/resty/cache/redis_with_upsync.lua

local function default_content_handler(self, resp, desc)
   ...
  local batch = self:create_batch(desc.batch_size, function(row)
      ....
  end)

  foreach(prepared, function(key, item)
    ....
    local ok, err = batch:add("set_unsafe", item, { overwrite = true, key = key })
    if not ok then
      self:err("upsync()", function()
        return "job=", desc.job_name, " ", err
      end)
    end
    ....
  end)

https://github.com/ZigzagAK/lua-resty-cache-redis/blob/0.2/lib/resty/cache/redis.lua

function batch_class:add(f, ...)
  ....

  tinsert(self.batch, { function(...)
    assert(fun(...))
  end, { ... } })

  if #self.batch < self.size then
    return true
  end

  return self:flush()
end

function batch_class:flush()
  local threads = {}

  local ok, err = pcall(foreachi, self.batch, function(opts)
    local fun, args = unpack(opts)
    local thr = thread_spawn(fun, self.cache, unpack(args))
    assert(thr, "failed to create corutine")
    tinsert(threads, thr)
  end)

  if not ok then
    self.cache:err("batch:flush()", function()
      return err
    end)
  end

  for i=1,#threads
  do
    local result = { thread_wait(threads[i]) }
    local ok = tremove(result, 1)
    local args = { ok, result, tremove(self.batch, 1)[2] }
    if self.check_fun then
      pcall(self.check_fun, args)
    else
      self.ret[1 + #self.ret] = args
    end
  end

  return ok, err
end

I have a BIG document (130k rows), and for performance many threads are created (and destroyed once they complete) to interact with Redis.

This document is loaded in ~300 seconds with 100% core utilization, spent in ngx_http_lua_get_co_ctx because the lookup is O(n).

With the rbtree, the document with 130k rows is loaded in 30 seconds with no more than 60% core utilization.

With the list implementation every batch is slower than the previous one because the list search is O(n).
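
For context, the hot path boils down to something like the following sketch (with a hypothetical store_row function standing in for the Redis call; this is not the actual lua-resty-cache-redis code). Each batch spawns a group of light threads and then waits for all of them, so a single request accumulates many coroutine contexts over its lifetime:

-- a minimal sketch of the spawn-all / wait-all pattern described above
local function store_batch(rows)
  local threads = {}
  for i = 1, #rows do
    -- every spawn registers a new coroutine context (coctx) for this request
    threads[i] = assert(ngx.thread.spawn(store_row, rows[i]))
  end
  for i = 1, #threads do
    -- every wait has to find the sub-coroutine's coctx again; with the list
    -- implementation that lookup walks all coctx entries accumulated so far
    ngx.thread.wait(threads[i])
  end
end

With 130k rows in batches of 50, one request performs thousands of such spawn/wait cycles, and the list of contexts (including the dead ones) keeps growing.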

@ZigzagAK
Author

flamegraph:
nginx.svg.gz

@ZigzagAK
Author

Please review my code. Maybe I forgot something.

@spacewander
Member

@ZigzagAK
I pulled your branch to my machine and found that some tests are broken. For example, TEST 9: simple user thread wait without I/O (in a user coroutine) in t/098-uthread-wait.t.

@ZigzagAK
Author

I applied the 'no pool' patch for nginx 1.13.6. A coredump was detected.
I will look for the mistake.

@ZigzagAK
Author

ZigzagAK commented Dec 23, 2017

I successfully ran t/098-uthread-wait.t and found the error.
It fails because I remove the coctx from the rbtree, and so does TEST 21 (waiting on a dead coroutine).

static int
ngx_http_lua_uthread_wait(lua_State *L)
...
    sub_coctx = ngx_http_lua_get_co_ctx(sub_co, ctx);
    if (sub_coctx == NULL) {
        return luaL_error(L, "no co ctx found");
    }

With the list, this dead coroutine is still present in the list. If I turn off deleting the coctx from the rbtree, the test passes.

But if the coctx stays in the rbtree, no memory is freed until the request completes: nothing is released and the tree keeps growing.
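
To make the failing scenario concrete, it is essentially the following (a minimal sketch in the spirit of TEST 21, not the actual test code):

-- the sub-thread does no I/O, so it runs to completion during spawn
local thr = ngx.thread.spawn(function()
  return "done"
end)
-- the sub-thread is already dead here, but wait() must still be able to
-- find its coctx, so the entry cannot simply be removed from the lookup
-- structure the moment the thread finishes
local ok, res = ngx.thread.wait(thr)
ngx.say(ok, " ", res)  -- true done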

ZigzagAK force-pushed the thread-rbtree branch 3 times, most recently from 06d8430 to bc01dda on December 23, 2017 at 14:26
@agentzh
Member

agentzh commented Dec 25, 2017

@ZigzagAK This looks like an unusual use case. Can I ask how many concurrent light threads you create and run at a time in a single OpenResty server instance? And how many Redis server instances are you running in the backend? I wonder if you could optimize your Lua logic to avoid creating too many light threads at the same time: create a limited number of light threads, and in each of these "worker" light threads repeatedly access Redis and consume more than one task. This might be much more efficient than even the rbtree light thread lookup approach. Creating too many concurrent light threads in a single request handler looks like a very unusual (if not very wrong) use case to me :)

@ZigzagAK
Author

ZigzagAK commented Dec 25, 2017

@agentzh The total number of threads is 130k / 50 ~ 2600. At any given time only 50 light threads are active. I know that is not ideal, but a limited number of light (green) threads makes the client code logic more complicated. If these were real OS threads I would never use this approach, but these are green coroutines, so the approach is reasonable.
In https://github.com/ZigzagAK/lua-resty-couchbase/blob/master/lib/resty/couchbase.lua I use another approach with a queue (function couchbase_session:batch(b, opts)): a limited number of threads with an input queue. But in that case the client code must pass all the data in one call.
Yes, in lua-resty-cache-redis I could do the same, but I would also have to implement a distributed lock, because many nginx instances are active at the same time on different hosts. I would have to implement a callback for the batch function to get the next request, and prolong the lock every (for example) 10 seconds. In that case I would have to create the threads at startup and stop them manually. This is not a good idea.

Creating too many concurrent light threads in a single request handler looks like a very unusual (if not very wrong) use case to me :)

At any moment only a limited number of threads are active (50-200, for example). The problem with the list is the 'dead' threads.

@ZigzagAK
Author

ZigzagAK commented Dec 25, 2017

/* FIXME: we should use rbtree here to prevent O(n) lookup overhead */

;-)

@ZigzagAK
Author

ZigzagAK commented Dec 25, 2017

This is the log for loading 134k records (each record is first read from Redis and compared with the new value):
2017/12/25 11:50:04 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync begin (last modified: Mon, 25 Dec 2017 07:36:08 GMT)
2017/12/25 11:50:05 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync fetch=0.69400000572205 seconds
2017/12/25 11:50:05 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync fetch keys at 0.78200006484985 seconds
2017/12/25 11:50:09 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync prepare at 3.1299998760223 seconds
2017/12/25 11:50:10 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 10000 batched, 10000 stored, 0 failed
2017/12/25 11:50:11 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 20000 batched, 20000 stored, 0 failed
2017/12/25 11:50:12 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 30000 batched, 30000 stored, 0 failed
2017/12/25 11:50:14 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 40000 batched, 40000 stored, 0 failed
2017/12/25 11:50:15 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 50000 batched, 50000 stored, 0 failed
2017/12/25 11:50:17 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 60000 batched, 60000 stored, 0 failed
2017/12/25 11:50:18 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 70000 batched, 70000 stored, 0 failed
2017/12/25 11:50:20 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 80000 batched, 80000 stored, 0 failed
2017/12/25 11:50:23 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 90000 batched, 90000 stored, 0 failed
2017/12/25 11:50:24 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 100000 batched, 100000 stored, 0 failed
2017/12/25 11:50:27 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 110000 batched, 110000 stored, 0 failed
2017/12/25 11:50:30 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 120000 batched, 120000 stored, 0 failed
2017/12/25 11:50:33 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 130000 batched, 130000 stored, 0 failed
2017/12/25 11:50:35 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync store at 26.300000190735 seconds, count=134538, stored=134538, failed=0
2017/12/25 11:50:35 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync cleanup at 0 seconds, count=0
2017/12/25 11:50:35 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync update last modified: Mon, 25 Dec 2017 08:49:38 GMT
2017/12/25 11:50:35 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync total=30.273999929428 seconds

With the list implementation the total time is ~300 seconds.

@agentzh
Member

agentzh commented Dec 26, 2017

@ZigzagAK True, light threads are much cheaper than OS threads, but they still do not come for free and still incur nontrivial overhead, especially for large numbers of light thread creations and destructions. The thing is that I really do not want to complicate the existing code base just to optimize for unusual use cases. That's not really worth it. I think you would see a big performance increase with a limited number of recycled light threads. As I said above, it should be even faster than the unlimited case with your rbtree optimization.

@agentzh
Member

agentzh commented Dec 26, 2017

@ZigzagAK BTW, you can use the ngx.semaphore API to feed your worker light threads with new jobs very efficiently.
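
A minimal sketch of that pattern (assuming a hypothetical handle_job function and a rows table from the surrounding code; this is not code from lua-resty-cache-redis): a fixed pool of worker light threads blocks on an ngx.semaphore while the producer pushes jobs into a Lua table and posts the semaphore once per job.

local semaphore = require "ngx.semaphore"

local queue, sema = {}, semaphore.new()
local done = false

-- fixed pool of worker light threads, created once per request
local workers = {}
for i = 1, 50 do
  workers[i] = ngx.thread.spawn(function()
    while true do
      local ok, err = sema:wait(1)      -- block until a job is posted (1s timeout)
      if ok then
        local job = table.remove(queue, 1)
        if job then
          handle_job(job)               -- hypothetical per-job work (e.g. one Redis command)
        end
      elseif err ~= "timeout" then
        return                          -- unexpected semaphore error
      end
      if done and #queue == 0 then
        return                          -- no more jobs will arrive
      end
    end
  end)
end

-- producer: one post per job, no thread created per row
for _, row in ipairs(rows) do
  queue[#queue + 1] = row
  sema:post(1)
end
done = true

for i = 1, #workers do
  ngx.thread.wait(workers[i])
end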

@ZigzagAK
Author

@agentzh Thanks for the idea with the semaphore. I have reimplemented the algorithm with a queue and a semaphore. It is twice as fast: the sync now takes 15 seconds.

@ZigzagAK
Author

What about this PR? Do you not want to replace the list with the rbtree?

@agentzh
Member

agentzh commented Dec 26, 2017

@ZigzagAK Good to know. We can leave it as is for people who are interested. I prefer simplicity here, and I'm still not convinced that we should migrate to the more complicated rbtree approach for this part.

@Unigoge

Unigoge commented Jun 19, 2018

Could searching a list of 'dead' threads (coroutines?) be somehow related to a problem that I've described over here: https://groups.google.com/d/msg/openresty-en/3BjMNEhVvN0/EEGAy4mEBgAJ ?

Modifying the lua-resty-http code to implement a simplified body_reader (no coroutines for requests that are sent using simple request_uri() calls) has improved performance by at least 10x.

@mergify

mergify bot commented Jun 26, 2020

This pull request is now in conflict :(

@agentzh
Member

agentzh commented Oct 8, 2020

Closing this one since we have a better solution in #1800, which is superior in performance (O(1) operations now).

agentzh closed this on Oct 8, 2020