Signals
Kinds of signals
Sender→receiver | Signal | Description | Replacement | Patch |
---|---|---|---|---|
administrator→postmaster | SIGTERM | smart shutdown | keep, and add control socket? | |
administrator→postmaster | SIGQUIT | immediate shutdown | keep, and add control socket? | |
administrator→postmaster | SIGINT | fast shutdown | keep, and add control socket? | |
administrator→postmaster | SIGHUP | reload | keep, and add control socket? | |
postmaster→all | SIGQUIT | immediate shutdown, quickdie(), _exit() | keep | |
postmaster→all | SIGTERM | exit at next CFI() | SendInterrupt(INTERRUPT_DIE)? | |
postmaster→avlauncher | SIGUSR2 | start autovacuum | ? | |
postmaster→checkpointer | SIGUSR2 | shutdown | ? | |
postmaster→walsenders | SIGUSR2 | finish and shutdown | ? | |
postmaster→pgarch | SIGUSR2 | shutdown | ? | |
postmaster→startup | SIGUSR2 | promote | ? | |
postmaster→client backend | SIGINT | cancel query | SendInterrupt(INTERRUPT_CANCEL)? | |
postmaster→all | SIGKILL | timed out while waiting for shutdown | keep | |
postmaster→any | SIGUSR1 | bgworker state change notification | SetLatch() | SendInterrupt() proposal |
postmaster→all | SIGHUP | reload config | SendInterrupt(INTERRUPT_RELOAD_CONFIG)? | |
kernel→postmaster | SIGCHLD | child state change notification | keep, but refactor? | |
kernel→backend | SIGINFO | postmaster exited | keep for now (but see below for MT redesign ideas) | |
kernel→backend | SIGALRM | itimer | keep for now, but change handlers to do RaiseInterrupt(INTERRUPT_XXX) | SendInterrupt() proposal |
backend→backend | SIGURG | latch wakeup | keep for now, but later replace with ? | |
backend→backend | SIGUSR1 | SendProcSignal(pid, PROCSIG_XXX) | SendInterrupt(INTERRUPT_XXX, procno) | SendInterrupt() proposal |
backend→postmaster | SIGUSR1 | SendPostmasterSignal(PMSIGNAL_XXX) | ? | |
Thoughts on administrator→postmaster
The signals that pg_ctl and other control programs send are mostly Unix conventions, and it seems OK to keep them. But perhaps we should also have a control pipe?
On Windows, we already have a control socket for pretending to send SIGHUP etc. to the postmaster. Maybe we could stop pretending that Windows has Unix signals, and support a general control pipe instead? That way we could also implement richer communication, like "what state are you in? what is recovery progress?".
Thoughts on backend→postmaster
Idea #1: We could teach the postmaster to accept a shared latch. Some say it can't, because that would expose it to shared memory corruption risks, but in fact it already has some exposure through the PMSIGNAL_ vector. Perhaps it could have a "robust" latch mode that always takes the slow path (system call), so that it is no less robust than the current PMSIGNAL_ mechanism. Specifically, it would not be possible for one backend to trash memory in a way that prevents another backend from waking the postmaster.
Idea #2: We could use a pipe/socketpair to talk to the postmaster and send it richer messages.
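A minimal sketch of what Idea #2 might look like, assuming a datagram socketpair so that each request arrives as one whole message. The `PmMessage` layout, the `PM_MSG_*` codes and the `pm_*` function names are hypothetical and are not existing PostgreSQL APIs.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request codes, for illustration only. */
enum
{
    PM_MSG_START_WORKER = 1,
    PM_MSG_WAKE_CHECKPOINTER = 2
};

/* Hypothetical fixed-size message sent from a backend to the postmaster. */
typedef struct PmMessage
{
    int kind;           /* one of the PM_MSG_* codes */
    int sender_procno;  /* who is asking */
} PmMessage;

/* Create the channel: fds[0] for the postmaster, fds[1] for backends. */
static int
pm_channel_create(int fds[2])
{
    /* SOCK_DGRAM keeps message boundaries, so no framing is needed. */
    return socketpair(AF_UNIX, SOCK_DGRAM, 0, fds);
}

/* Backend side: send one request. Returns 0 on success, -1 on failure. */
static int
pm_send(int fd, int kind, int procno)
{
    PmMessage m;

    memset(&m, 0, sizeof(m));
    m.kind = kind;
    m.sender_procno = procno;
    return send(fd, &m, sizeof(m), 0) == (ssize_t) sizeof(m) ? 0 : -1;
}

/* Postmaster side: receive one request. Returns 0 on success, -1 on failure. */
static int
pm_receive(int fd, PmMessage *out)
{
    return recv(fd, out, sizeof(*out), 0) == (ssize_t) sizeof(*out) ? 0 : -1;
}
```

Unlike a bit in the PMSIGNAL_ vector, a message like this can carry arguments, and the postmaster only has to trust what it reads from its own descriptor rather than anything in shared memory.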
In general, a lot of messages to the postmaster would probably go away in a multithreaded model anyway, because it would no longer be in charge of starting new backends.
Thoughts on postmaster→backend
To replace the current SIGUSR1/SIGUSR2 signals, in an intermediate phase, we could decide that it is OK for the postmaster to use SetLatch(). If we are worried about shared memory corruption, we could decide that it has to use a "robust" SetLatch() in the multi-process model, meaning it doesn't check shmem, it just always uses the system call slow path.
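The difference between a normal and a "robust" SetLatch() could be sketched roughly as follows. This is a simplified model, not the real latch code: `SketchLatch`, its fields and all function names are hypothetical, and the system call is replaced by a counter so that the behaviour is easy to observe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified latch living in shared memory. */
typedef struct SketchLatch
{
    atomic_bool is_set;          /* has the latch been set? */
    atomic_bool maybe_sleeping;  /* might the owner be blocked in a wait? */
    int         wake_calls;      /* stand-in: counts wakeup "system calls" */
} SketchLatch;

static void
sketch_latch_init(SketchLatch *l)
{
    atomic_init(&l->is_set, false);
    atomic_init(&l->maybe_sleeping, false);
    l->wake_calls = 0;
}

/* Stand-in for the slow path, e.g. a kill() or a write() to a pipe. */
static void
sketch_wake_owner(SketchLatch *l)
{
    l->wake_calls++;
}

/* Normal mode: trust the flags in shared memory to skip the system call. */
static void
sketch_set_latch(SketchLatch *l)
{
    if (atomic_exchange(&l->is_set, true))
        return;                 /* already set, nothing to do */
    if (!atomic_load(&l->maybe_sleeping))
        return;                 /* owner is not sleeping, no wakeup needed */
    sketch_wake_owner(l);
}

/* Robust mode: never let shared memory contents suppress the wakeup. */
static void
sketch_set_latch_robust(SketchLatch *l)
{
    atomic_store(&l->is_set, true);
    sketch_wake_owner(l);       /* always take the slow path */
}
```

In the normal mode a corrupted `is_set` or `maybe_sleeping` flag can suppress the wakeup. The robust variant pays for a system call every time, but no flag value can stop the wakeup from being sent.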
In the multi-threaded future, some of those probably go away and are replaced with communication between backends (you don't need to ask the postmaster to start a worker). Some communication is still needed. Should there be a single socketpair connecting the postmaster to the backend container process, through which it can coordinate eg promotion, shutdown? Perhaps that implies a special monitor thread inside the backend container process that would forward such communications, ie it receives eg "PROMOTE\n" through a pipe and the monitor thread generates SendInterrupt() and/or raw SetLatch() calls as required?
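As a rough illustration of the monitor thread idea, the command decoding step could look something like this. Only "PROMOTE\n" comes from the text above; the other command names, the `MonitorCommand` enum and `monitor_parse_command()` are hypothetical. A real monitor thread would go on to call SendInterrupt() and/or SetLatch() for the decoded command.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical set of commands the postmaster might send down the pipe. */
typedef enum MonitorCommand
{
    CMD_UNKNOWN = 0,
    CMD_PROMOTE,
    CMD_RELOAD,
    CMD_SHUTDOWN_FAST
} MonitorCommand;

/* Decode one newline-terminated command line read from the control pipe. */
static MonitorCommand
monitor_parse_command(const char *line)
{
    static const struct
    {
        const char     *name;
        MonitorCommand  cmd;
    } table[] = {
        {"PROMOTE", CMD_PROMOTE},
        {"RELOAD", CMD_RELOAD},
        {"SHUTDOWN_FAST", CMD_SHUTDOWN_FAST},
    };
    size_t len = strcspn(line, "\n");   /* length without the newline */

    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
    {
        if (strlen(table[i].name) == len &&
            strncmp(line, table[i].name, len) == 0)
            return table[i].cmd;
    }
    return CMD_UNKNOWN;
}
```

For example, `monitor_parse_command("PROMOTE\n")` yields `CMD_PROMOTE`, while an unrecognised line yields `CMD_UNKNOWN` so that the monitor thread can ignore or log it.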
Thoughts on latch wakeups with threads
Idea #1: We could do pthread_kill(pthread_t, SIGURG) on the sending side. Then for WAIT_USE_POLL give each backend its own self-pipe, for WAIT_USE_EPOLL it might already work with one shared signalfd or maybe they need one each (?), and for WAIT_USE_KQUEUE no change is needed. (And Windows just works, native events.)
Idea #2: pthread_kill(), then on the receiving side, WAIT_USE_POLL could switch to ppoll() (finally standardised in POSIX 2024, atomic signal masking, Solaris has it and it is currently the only user of WAIT_USE_POLL?), for WAIT_USE_EPOLL we could switch to epoll_pwait2() (get rid of the signal pipe and use atomic signal masking). Again no change for WAIT_USE_KQUEUE. (And Windows just works, native events.)
Idea #3: We could give every backend a pipe, and write a byte to it. We could have done that already, but in a multi-process model it might create a MaxBackends^2 explosion of duplicated kernel descriptors. Should be OK for single-process multi-thread mode, and on Linux it replaces the current per-backend signalfd.
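A per-backend wakeup pipe along the lines of Idea #3 could be sketched like this, assuming plain poll() on the waiting side. The `WakeupPipe` type and the `wakeup_pipe_*` functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-backend wakeup channel. */
typedef struct WakeupPipe
{
    int read_fd;    /* owner polls this end */
    int write_fd;   /* any thread may write a byte here */
} WakeupPipe;

/* Create the pipe with both ends non-blocking. Returns 0 or -1. */
static int
wakeup_pipe_init(WakeupPipe *wp)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    for (int i = 0; i < 2; i++)
    {
        int flags = fcntl(fds[i], F_GETFL);

        if (flags < 0 || fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) != 0)
        {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    wp->read_fd = fds[0];
    wp->write_fd = fds[1];
    return 0;
}

/* Sender side: write one byte, retrying only if interrupted. */
static void
wakeup_pipe_signal(WakeupPipe *wp)
{
    char    byte = 0;
    ssize_t rc;

    do
    {
        rc = write(wp->write_fd, &byte, 1);
    } while (rc < 0 && errno == EINTR);
    /* EAGAIN means the pipe is full, so a wakeup is already pending. */
}

/* Owner side: returns 1 if woken, 0 on timeout, -1 on error. */
static int
wakeup_pipe_wait(WakeupPipe *wp, int timeout_ms)
{
    struct pollfd pfd = {.fd = wp->read_fd, .events = POLLIN};
    char          buf[64];
    int           rc = poll(&pfd, 1, timeout_ms);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    /* Drain every pending byte so that repeated wakeups coalesce. */
    while (read(wp->read_fd, buf, sizeof(buf)) > 0)
        ;
    return 1;
}
```

Inside a single process every thread can reach every `write_fd` directly, which is why the descriptor explosion mentioned above would not arise in the multi-threaded case.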
Idea #4: We could give every backend a pipe as a fallback, but use better options when available: With kqueue you can send a custom wakeup event directly to someone else's kqueue from inside the same process. For Linux we could replace the current signalfd that is in the epoll with an eventfd, which anyone can write into. For Linux we might eventually want to switch to a per-backend uring, in which case any thread could post a custom wakeup to any other backend's uring directly.
Thoughts on timers and SIGALRM
Each backend currently has its own separate itimer to manage various timeouts. In an intermediate phase that could continue, but the timer handlers could just call RaiseInterrupt(INTERRUPT_XXX), as shown in the SendInterrupt() patch. (This remaining manipulation of the interrupt bitmap from inside a signal handler is the reason why the SendInterrupt() patch relies on --disable-atomics being dropped, because it's not safe to use lock-based emulation from inside a signal handler.)
In a multi-threaded future, I think we'd probably need to invent our own timer monitor thread, that would maintain a schedule table and do SendInterrupt(INTERRUPT_XXX, target_procno) at the right times as requested, or something like that? Then SIGALRM would not be needed, but each backend could still configure its own timeout schedule separately. Right?
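One possible shape for the schedule table is sketched below, leaving out the thread itself and any locking. All names here (`TimerSchedule`, `timer_schedule_*`, `deliver_fn`) are hypothetical. The `deliver` callback stands in for SendInterrupt(INTERRUPT_XXX, target_procno), and `record_delivery()` is only a test helper.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TIMERS 64

/* One requested timeout: deliver 'interrupt' to 'procno' at 'deadline_ms'. */
typedef struct TimerEntry
{
    int64_t deadline_ms;
    int     procno;
    int     interrupt;
    int     in_use;
} TimerEntry;

typedef struct TimerSchedule
{
    TimerEntry entries[MAX_TIMERS];
} TimerSchedule;

/* Stand-in for SendInterrupt(); 'arg' is caller-supplied context. */
typedef void (*deliver_fn) (int interrupt, int procno, void *arg);

/* Register a timeout. Returns the slot index, or -1 if the table is full. */
static int
timer_schedule_add(TimerSchedule *s, int64_t deadline_ms,
                   int interrupt, int procno)
{
    for (int i = 0; i < MAX_TIMERS; i++)
    {
        if (!s->entries[i].in_use)
        {
            s->entries[i] = (TimerEntry) {deadline_ms, procno, interrupt, 1};
            return i;
        }
    }
    return -1;
}

/* Earliest pending deadline, or -1 if nothing is scheduled. */
static int64_t
timer_schedule_next_deadline(const TimerSchedule *s)
{
    int64_t best = -1;

    for (int i = 0; i < MAX_TIMERS; i++)
    {
        if (s->entries[i].in_use &&
            (best < 0 || s->entries[i].deadline_ms < best))
            best = s->entries[i].deadline_ms;
    }
    return best;
}

/* Deliver and clear every entry whose deadline has passed. Returns count. */
static int
timer_schedule_fire_due(TimerSchedule *s, int64_t now_ms,
                        deliver_fn deliver, void *arg)
{
    int fired = 0;

    for (int i = 0; i < MAX_TIMERS; i++)
    {
        if (s->entries[i].in_use && s->entries[i].deadline_ms <= now_ms)
        {
            deliver(s->entries[i].interrupt, s->entries[i].procno, arg);
            s->entries[i].in_use = 0;
            fired++;
        }
    }
    return fired;
}

/* Test helper: accumulates the procnos that were notified. */
static void
record_delivery(int interrupt, int procno, void *arg)
{
    (void) interrupt;
    *(int *) arg += procno;
}
```

The monitor thread would sleep until `timer_schedule_next_deadline()`, call `timer_schedule_fire_due()`, and repeat. A real implementation would also need locking, and probably a heap rather than a linear scan.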
Thoughts on Windows cleanup
Instead of giving every backend a named pipe to send fake signals to, we could delete all that stuff and keep just latches, and give both Unix and Windows a master pipe/control socket for the top level? Instead of generating a fake SIGCHLD, we could maybe add some way to consume WL_PROCESS_EXIT events to WaitEventSet, to abstract over Unix and Windows?
Thoughts on traces of reliance on EINTR in socket calls
There are a couple of leftover bits that still use blocking socket I/O. One I know of: RADIUS authentication. That's racy (one of the main problems latch multiplexing fixed) and we have to get rid of it. Then we can rip out most of the horrible socket wrapper code for Windows which is known to be buggy.