You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
On 1.27.1.1 (I think this can be reproduced on the historical version as well), the following error occurs when sending a streaming response using ngx.print and ngx.flush in the access phase:
2025/04/21 10:16:36 [error] 2735#2735: *1 lua entry thread aborted: runtime error: access_by_lua(nginx.conf:18):3: API disabled in the context of body_filter_by_lua*
stack traceback:
coroutine 0:
[C]: in function 'print'
access_by_lua(nginx.conf:18):3: in main chunk, client: 127.0.0.1, server: , request: "GET / HTTP/1.1", host: "127.0.0.1:9080"
It will spit out some response, after which the connection is closed, and looking at error.log, you can find the error mentioned above.
runtime error: access_by_lua(nginx.conf:18):3: API disabled in the context of body_filter_by_lua
Debug
I use GDB for debugging.
This error is triggered by the yieldable phase checking code in ngx.print or ngx.flush, so the error is due to the phase checking code incorrectly assuming that it is in the body_filter phase instead of the access where it should be.
It uses ctx->context on the lua module ctx to make the determination.
I see that there is another branch regarding the input in, where the state is not restored when in is NULL. So I'm setting a breakpoint here to check why the input value is NULL or not and why this is the case.
I found out that in some specific cases it does have in as NULL calls, so the status can't be restored.
This is triggered by ngx_http_lua_wev_handler and causes an output_filter (body_filter) call with an in of NULL to be made because ctx->busy_bufs is not empty in ngx_http_lua_flush_pending_output, which will result in an incorrectly recovered state in ngx_http_lua_body_filter.
Since the call was initiated by ngx_http_lua_wev_handler, I suspected that this condition would occur when sending a large responses with a slow downstream resulting in data accumulating in the kernel TCP buffer or inside NGINX, so I constructed the above mentioned nginx.conf to verify this, and outputting some large data chunks during the access phase does indeed reproduce the issue.
Fix
As a fix, I think that even with the ngx_http_lua_body_filter call triggered by the fd's epoll writable event, the context should be restored rather than retaining a broken context.
Current behavior
On 1.27.1.1 (I think this can be reproduced on the historical version as well), the following error occurs when sending a streaming response using
ngx.print
andngx.flush
in the access phase:Expected behavior
There should be no error.
Steps to reproduce
127.0.0.1:9080
It will spit out some response, after which the connection is closed, and looking at error.log, you can find the error mentioned above.
Debug
I use GDB for debugging.
This error is triggered by the yieldable phase checking code in
ngx.print
orngx.flush
, so the error is due to the phase checking code incorrectly assuming that it is in thebody_filter
phase instead of the access where it should be.It uses
ctx->context
on the lua module ctx to make the determination.Checking the code in
body_filter
(https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_bodyfilterby.c#L327-L328), it does set it tobody_filter
before the Lua code enters that phase, and reverts to entering the phase it is in after the code logic is executed (https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_bodyfilterby.c#L339). Since NGINX is essentially single-threaded, this works and the state is expected to be restored correctly.I see that there is another branch regarding the input
in
, where the state is not restored whenin
is NULL. So I'm setting a breakpoint here to check why the input value is NULL or not and why this is the case.I found out that in some specific cases it does have in as NULL calls, so the status can't be restored.
Checking the call stack:
This is triggered by
ngx_http_lua_wev_handler
and causes an output_filter (body_filter) call with an in of NULL to be made becausectx->busy_bufs
is not empty inngx_http_lua_flush_pending_output
, which will result in an incorrectly recovered state inngx_http_lua_body_filter
.Since the call was initiated by
ngx_http_lua_wev_handler
, I suspected that this condition would occur when sending a large responses with a slow downstream resulting in data accumulating in the kernel TCP buffer or inside NGINX, so I constructed the above mentioned nginx.conf to verify this, and outputting some large data chunks during the access phase does indeed reproduce the issue.Fix
As a fix, I think that even with the
ngx_http_lua_body_filter
call triggered by the fd's epoll writable event, the context should be restored rather than retaining a broken context.i.e. add
ctx->context = old_context
at https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_bodyfilterby.c#L370-L372I'll launch a PR to fix it. #2411
The text was updated successfully, but these errors were encountered: