
ngx threads: replace list with rbtree #1215


Closed
ZigzagAK wants to merge 1 commit from the thread-rbtree branch

Conversation

ZigzagAK

I hereby grant the copyright of the changes in this pull request
to the authors of this lua-nginx-module project.

ZigzagAK force-pushed the thread-rbtree branch 3 times, most recently from 5e2602c to 1899416 on December 22, 2017 at 12:46
@ZigzagAK
Author

All tests with uthreads pass locally:
Tests : tt/023-rewrite/on-abort.t
tt/023-rewrite/on-abort.t .. ok
All tests successful.
Files=1, Tests=134, 12 wallclock secs ( 0.04 usr 0.00 sys + 0.28 cusr 0.07 csys = 0.39 CPU)
Result: PASS
Tests : tt/023-rewrite/uthread-exec.t
tt/023-rewrite/uthread-exec.t .. ok
All tests successful.
Files=1, Tests=40, 3 wallclock secs ( 0.02 usr 0.00 sys + 0.18 cusr 0.06 csys = 0.26 CPU)
Result: PASS
Tests : tt/023-rewrite/uthread-exit.t
tt/023-rewrite/uthread-exit.t .. ok
All tests successful.
Files=1, Tests=122, 6 wallclock secs ( 0.04 usr 0.00 sys + 0.31 cusr 0.10 csys = 0.45 CPU)
Result: PASS
Tests : tt/023-rewrite/uthread-redirect.t
tt/023-rewrite/uthread-redirect.t .. ok
All tests successful.
Files=1, Tests=16, 1 wallclock secs ( 0.01 usr 0.00 sys + 0.10 cusr 0.02 csys = 0.13 CPU)
Result: PASS
Tests : tt/023-rewrite/uthread-spawn.t
tt/023-rewrite/uthread-spawn.t .. ok
All tests successful.
Files=1, Tests=218, 7 wallclock secs ( 0.03 usr 0.01 sys + 0.45 cusr 0.15 csys = 0.64 CPU)
Result: PASS
Tests : tt/024-access/on-abort.t
tt/024-access/on-abort.t .. ok
All tests successful.
Files=1, Tests=134, 11 wallclock secs ( 0.03 usr 0.00 sys + 0.24 cusr 0.07 csys = 0.34 CPU)
Result: PASS
Tests : tt/024-access/uthread-exec.t
tt/024-access/uthread-exec.t .. ok
All tests successful.
Files=1, Tests=40, 2 wallclock secs ( 0.01 usr 0.00 sys + 0.14 cusr 0.02 csys = 0.17 CPU)
Result: PASS
Tests : tt/024-access/uthread-exit.t
tt/024-access/uthread-exit.t .. ok
All tests successful.
Files=1, Tests=120, 7 wallclock secs ( 0.03 usr 0.00 sys + 0.28 cusr 0.07 csys = 0.38 CPU)
Result: PASS
Tests : tt/024-access/uthread-redirect.t
tt/024-access/uthread-redirect.t .. ok
All tests successful.
Files=1, Tests=16, 1 wallclock secs ( 0.02 usr 0.00 sys + 0.11 cusr 0.00 csys = 0.13 CPU)
Result: PASS
Tests : tt/024-access/uthread-spawn.t
tt/024-access/uthread-spawn.t .. ok
All tests successful.
Files=1, Tests=186, 6 wallclock secs ( 0.04 usr 0.00 sys + 0.41 cusr 0.10 csys = 0.55 CPU)
Result: PASS

@agentzh
Member

agentzh commented Dec 22, 2017

@ZigzagAK This change is big. Are you actually seeing performance problems with the current list implementation in real-world profiling?

@agentzh
Member

agentzh commented Dec 22, 2017

@ZigzagAK The Travis CI failures look like real problems. You need to apply the no-pool patch to the nginx core; otherwise nginx's memory pools may hide many memory bugs for small payloads like those in our existing test suite (furthermore, we should run the tests under ASan or Valgrind to make sure more memory issues can be caught, not just those that lead to segfaults).

@ZigzagAK
Author

ZigzagAK commented Dec 23, 2017

Yes, I have a performance problem in https://github.com/ZigzagAK/lua-resty-cache-redis/tree/0.2.

https://github.com/ZigzagAK/lua-resty-cache-redis/blob/0.2/lib/resty/cache/redis_with_upsync.lua

local function default_content_handler(self, resp, desc)
   ...
  local batch = self:create_batch(desc.batch_size, function(row)
      ....
  end)

  foreach(prepared, function(key, item)
    ....
    local ok, err = batch:add("set_unsafe", item, { overwrite = true, key = key })
    if not ok then
      self:err("upsync()", function()
        return "job=", desc.job_name, " ", err
      end)
    end
    ....
  end)

https://github.com/ZigzagAK/lua-resty-cache-redis/blob/0.2/lib/resty/cache/redis.lua

function batch_class:add(f, ...)
  ....

  tinsert(self.batch, { function(...)
    assert(fun(...))
  end, { ... } })

  if #self.batch < self.size then
    return true
  end

  return self:flush()
end

function batch_class:flush()
  local threads = {}

  local ok, err = pcall(foreachi, self.batch, function(opts)
    local fun, args = unpack(opts)
    local thr = thread_spawn(fun, self.cache, unpack(args))
    assert(thr, "failed to create corutine")
    tinsert(threads, thr)
  end)

  if not ok then
    self.cache:err("batch:flush()", function()
      return err
    end)
  end

  for i=1,#threads
  do
    local result = { thread_wait(threads[i]) }
    local ok = tremove(result, 1)
    local args = { ok, result, tremove(self.batch, 1)[2] }
    if self.check_fun then
      pcall(self.check_fun, args)
    else
      self.ret[1 + #self.ret] = args
    end
  end

  return ok, err
end

I have a BIG document (130k rows), and for performance many threads are created (and destroyed once they complete) to interact with Redis.

This document is loaded in ~300 seconds with 100% core utilization, spent in ngx_http_lua_get_co_ctx because the lookup is O(n).

With the rbtree, the document with 130k rows is loaded in 30 seconds with no more than 60% core utilization.

With the list implementation every batch is slower than the previous one because the list search is O(n).
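
For context, the hot path boils down to something like the following sketch (with a hypothetical store_row function standing in for the Redis call; this is not the actual lua-resty-cache-redis code). Each batch spawns a group of light threads and then waits for all of them, so a single request accumulates many coroutine contexts over its lifetime:

-- a minimal sketch of the spawn-all / wait-all pattern described above
local function store_batch(rows)
  local threads = {}
  for i = 1, #rows do
    -- every spawn registers a new coroutine context (coctx) for this request
    threads[i] = assert(ngx.thread.spawn(store_row, rows[i]))
  end
  for i = 1, #threads do
    -- every wait has to find the sub-coroutine's coctx again; with the list
    -- implementation that lookup walks all coctx entries accumulated so far
    ngx.thread.wait(threads[i])
  end
end

With 130k rows in batches of 50, one request performs thousands of such spawn/wait cycles, and the list of contexts (including the dead ones) keeps growing.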

@ZigzagAK
Author

flamegraph:
nginx.svg.gz

@ZigzagAK
Author

Please review my code. Maybe I forgot something.

@spacewander
Member

@ZigzagAK
I pulled your branch to my machine and found that some tests are broken. For example, TEST 9: simple user thread wait without I/O (in a user coroutine) in t/098-uthread-wait.t.

@ZigzagAK
Author

I applied the 'no pool' patch for nginx 1.13.6. A coredump was detected.
I will look for the mistake.

@ZigzagAK
Author

ZigzagAK commented Dec 23, 2017

I successfully ran t/098-uthread-wait.t and found the error.
It fails because I remove the coctx from the rbtree, and so does TEST 21 (waiting on a dead coroutine).

static int
ngx_http_lua_uthread_wait(lua_State *L)
...
    sub_coctx = ngx_http_lua_get_co_ctx(sub_co, ctx);
    if (sub_coctx == NULL) {
        return luaL_error(L, "no co ctx found");
    }

With the list, this dead coroutine is still present in the list. If I turn off deleting the coctx from the rbtree, the test passes.

But if the coctx stays in the rbtree, no memory is freed until the request completes: nothing is released and the tree keeps growing.
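
To make the failing scenario concrete, it is essentially the following (a minimal sketch in the spirit of TEST 21, not the actual test code):

-- the sub-thread does no I/O, so it runs to completion during spawn
local thr = ngx.thread.spawn(function()
  return "done"
end)
-- the sub-thread is already dead here, but wait() must still be able to
-- find its coctx, so the entry cannot simply be removed from the lookup
-- structure the moment the thread finishes
local ok, res = ngx.thread.wait(thr)
ngx.say(ok, " ", res)  -- true done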

ZigzagAK force-pushed the thread-rbtree branch 3 times, most recently from 06d8430 to bc01dda on December 23, 2017 at 14:26
@agentzh
Member

agentzh commented Dec 25, 2017

@ZigzagAK This looks like an unusual use case. Can I ask how many concurrent light threads you create and run at a time in a single OpenResty server instance? And how many Redis server instances are you running in the backend? I wonder if you could optimize your Lua logic to avoid creating too many light threads at the same time: create a limited number of light threads, and in each of these "worker" light threads repeatedly access Redis and consume more than one task. This might be much more efficient than even the rbtree light thread lookup approach. Creating too many concurrent light threads in a single request handler looks like a very unusual (if not very wrong) use case to me :)

@ZigzagAK
Author

ZigzagAK commented Dec 25, 2017

@agentzh The total number of threads is 130k / 50 ~ 2600. At any given time only 50 light threads are active. I know that is not ideal, but a limited number of light (green) threads makes the client code logic more complicated. If these were real OS threads I would never use this approach, but these are green coroutines, so the approach is reasonable.
In https://github.com/ZigzagAK/lua-resty-couchbase/blob/master/lib/resty/couchbase.lua I use another approach with a queue (function couchbase_session:batch(b, opts)): a limited number of threads with an input queue. But in that case the client code must pass all the data in one call.
Yes, in lua-resty-cache-redis I could do the same, but I would also have to implement a distributed lock, because many nginx instances are active at the same time on different hosts. I would have to implement a callback for the batch function to get the next request, and prolong the lock every (for example) 10 seconds. In that case I would have to create the threads at startup and stop them manually. This is not a good idea.

Creating too many concurrent light threads in a single request handler looks like a very unusual (if not very wrong) use case to me :)

At any moment only a limited number of threads are active (50-200, for example). The problem with the list is the 'dead' threads.

@ZigzagAK
Author

ZigzagAK commented Dec 25, 2017

/* FIXME: we should use rbtree here to prevent O(n) lookup overhead */

;-)

@ZigzagAK
Author

ZigzagAK commented Dec 25, 2017

This is the log for loading 134k records (each record is first read from Redis and compared with the new value):
2017/12/25 11:50:04 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync begin (last modified: Mon, 25 Dec 2017 07:36:08 GMT)
2017/12/25 11:50:05 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync fetch=0.69400000572205 seconds
2017/12/25 11:50:05 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync fetch keys at 0.78200006484985 seconds
2017/12/25 11:50:09 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync prepare at 3.1299998760223 seconds
2017/12/25 11:50:10 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 10000 batched, 10000 stored, 0 failed
2017/12/25 11:50:11 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 20000 batched, 20000 stored, 0 failed
2017/12/25 11:50:12 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 30000 batched, 30000 stored, 0 failed
2017/12/25 11:50:14 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 40000 batched, 40000 stored, 0 failed
2017/12/25 11:50:15 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 50000 batched, 50000 stored, 0 failed
2017/12/25 11:50:17 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 60000 batched, 60000 stored, 0 failed
2017/12/25 11:50:18 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 70000 batched, 70000 stored, 0 failed
2017/12/25 11:50:20 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 80000 batched, 80000 stored, 0 failed
2017/12/25 11:50:23 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 90000 batched, 90000 stored, 0 failed
2017/12/25 11:50:24 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 100000 batched, 100000 stored, 0 failed
2017/12/25 11:50:27 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 110000 batched, 110000 stored, 0 failed
2017/12/25 11:50:30 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 120000 batched, 120000 stored, 0 failed
2017/12/25 11:50:33 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync 130000 batched, 130000 stored, 0 failed
2017/12/25 11:50:35 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync store at 26.300000190735 seconds, count=134538, stored=134538, failed=0
2017/12/25 11:50:35 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync cleanup at 0 seconds, count=0
2017/12/25 11:50:35 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync update last modified: Mon, 25 Dec 2017 08:49:38 GMT
2017/12/25 11:50:35 [info] 27632#0: *214833 [lua] redis.lua:552: info(): [job] acc_roles_upsync() job=sync total=30.273999929428 seconds

With the list implementation the total time is ~300 seconds.

@agentzh
Member

agentzh commented Dec 26, 2017

@ZigzagAK True, light threads are much cheaper than OS threads, but they still do not come for free and still incur nontrivial overhead, especially for large numbers of light thread creations and destructions. The thing is that I really do not want to complicate the existing code base just to optimize for unusual use cases. That's not really worth it. I think you would see a big performance increase with a limited number of recycled light threads. As I said above, it should be even faster than the unlimited case with your rbtree optimization.

@agentzh
Member

agentzh commented Dec 26, 2017

@ZigzagAK BTW, you can use the ngx.semaphore API to feed your worker light threads with new jobs very efficiently.
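
A minimal sketch of that pattern (assuming a hypothetical handle_job function and a rows table from the surrounding code; this is not code from lua-resty-cache-redis): a fixed pool of worker light threads blocks on an ngx.semaphore while the producer pushes jobs into a Lua table and posts the semaphore once per job.

local semaphore = require "ngx.semaphore"

local queue, sema = {}, semaphore.new()
local done = false

-- fixed pool of worker light threads, created once per request
local workers = {}
for i = 1, 50 do
  workers[i] = ngx.thread.spawn(function()
    while true do
      local ok, err = sema:wait(1)      -- block until a job is posted (1s timeout)
      if ok then
        local job = table.remove(queue, 1)
        if job then
          handle_job(job)               -- hypothetical per-job work (e.g. one Redis command)
        end
      elseif err ~= "timeout" then
        return                          -- unexpected semaphore error
      end
      if done and #queue == 0 then
        return                          -- no more jobs will arrive
      end
    end
  end)
end

-- producer: one post per job, no thread created per row
for _, row in ipairs(rows) do
  queue[#queue + 1] = row
  sema:post(1)
end
done = true

for i = 1, #workers do
  ngx.thread.wait(workers[i])
end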

@ZigzagAK
Author

@agentzh Thanks for the idea with the semaphore. I have reimplemented the algorithm with a queue and a semaphore. It is twice as fast: the sync now takes 15 seconds.

@ZigzagAK
Author

What about this PR? Do you not want to replace the list with the rbtree?

@agentzh
Member

agentzh commented Dec 26, 2017

@ZigzagAK Good to know. We can leave it as is for people who are interested. I prefer simplicity here, and I'm still not convinced that we should migrate to the more complicated rbtree approach for this part.

@Unigoge

Unigoge commented Jun 19, 2018

Could searching a list of 'dead' threads (coroutines?) be somehow related to a problem that I've described over here: https://groups.google.com/d/msg/openresty-en/3BjMNEhVvN0/EEGAy4mEBgAJ ?

Modifying the lua-resty-http code to implement a simplified body_reader (no coroutines for requests that are sent using simple request_uri() calls) has improved performance by at least 10x.

@mergify

mergify bot commented Jun 26, 2020

This pull request is now in conflict :(

@agentzh
Member

agentzh commented Oct 8, 2020

Closing this one since we have a better solution in #1800, which is superior in performance (O(1) operations now).

agentzh closed this on Oct 8, 2020