Add timeout support to flushSendBuffer() #575

Open
awelzel wants to merge 1 commit into
machinezone:master from
zeek:topic/awelzel/flush-send-buffer-timeout

Conversation

@awelzel
Contributor

awelzel commented May 11, 2026

When sending a lot of data to a client that does not reliably drain its own socket, two things can happen on the server: (1) the thread doing the sending blocks in flushSendBuffer(), and (2) the "receiver" thread running WebSocket::run() blocks in WebSocketTransport::poll() -> sendHeartBeat() -> flushSendBuffer(). Because WebSocket::run() is now blocked, it never reads data from the client, which can result in a deadlock scenario:

server: blocks sending to client in flushSendBuffer()
client: also blocked while sending to server (WebSocket::run() never receives)

The client could unblock the situation by receiving from its socket. If it doesn't do so, this change allows the server to close the socket after a configurable timeout that is honored during flushSendBuffer(). The callback will see "Send timeout" as the close reason and an abnormal close code.


I've run into this a few times during development/testing. Forcefully disconnecting the client after 5 seconds or so, and logging the reason in the server logs, is a better experience than observing a "freeze" and using gdb to inspect what the thread stacks look like :-)

Thread 8 (Thread 0x742faf7fe6c0 (LWP 44856) "zk/ws-reply-thr"):
#0  0x0000742fe771b4fd in __GI___poll (fds=fds@entry=0x742faf7f9e40, nfds=nfds@entry=2, timeout=timeout@entry=10) at ..<...>/poll.c:29
#1  0x000057f624993f88 in poll (__timeout=10, __nfds=2, __fds=0x742faf7f9e40) at <...>/poll2.h:39
#2  ix::poll (fds=fds@entry=0x742faf7f9e40, nfds=2, timeout=timeout@entry=10, event=event@entry=0x742faf7f9e38) at <...>/IXNetSystem.cpp:288
#3  0x000057f62498f3e9 in ix::Socket::poll (readyToRead=readyToRead@entry=false, timeoutMs=timeoutMs@entry=10, sockfd=81, selectInterrupt=...) at <...>/IXSocket.cpp:97
#4  0x000057f62498f5ba in ix::Socket::isReadyToWrite (this=<optimized out>, timeoutMs=timeoutMs@entry=10) at <...>/IXSocket.cpp:196
#5  0x000057f6249a090f in ix::WebSocketTransport::flushSendBuffer (this=this@entry=0x742f80000b80) at <...>/unique_ptr.h:199
#6  0x000057f6249a12b6 in ix::WebSocketTransport::sendData (this=this@entry=0x742f80000b80, type=type@entry=ix::WebSocketTransport::wsheader_type::TEXT_FRAME, message=..., compress=<optimized out>, onProgressCallback=...) at <...>/IXWebSocketTransport.cpp:938
#7  0x000057f6249a156f in ix::WebSocketTransport::sendText (this=this@entry=0x742f80000b80, message=..., onProgressCallback=...) at <...>/IXWebSocketTransport.cpp:1062
#8  0x000057f624996ddb in ix::WebSocket::sendMessage (this=0x742f80000b80, message=..., sendMessageKind=sendMessageKind@entry=ix::SendMessageKind::Text, onProgressCallback=...) at <...>/IXWebSocket.cpp:557
#9  0x000057f624996fc9 in ix::WebSocket::sendUtf8Text (this=<optimized out>, text=..., onProgressCallback=...) at <...>/IXWebSocket.cpp:511
#10 0x000057f6240e89d8 in zeek::cluster::websocket::detail::ixwebsocket::IxWebSocketClient::SendText (this=0x742f80009160, sv=...) at <...>/WebSocket-IXWebSocket.cc:47           
Thread 5 (Thread 0x742fadffb6c0 (LWP 45682) "Srv:ws:0"):
#0  0x0000742fe771b4fd in __GI___poll (fds=fds@entry=0x742fadff6900, nfds=nfds@entry=2, timeout=timeout@entry=10) at ..<...>/poll.c:29
#1  0x000057f624993f88 in poll (__timeout=10, __nfds=2, __fds=0x742faf7f9e40) at <...>/poll2.h:39
#2  ix::poll (fds=fds@entry=0x742fadff6900, nfds=2, timeout=timeout@entry=10, event=event@entry=0x742fadff68f8) at <...>/IXNetSystem.cpp:288
#3  0x000057f62498f3e9 in ix::Socket::poll (readyToRead=readyToRead@entry=false, timeoutMs=timeoutMs@entry=10, sockfd=81, selectInterrupt=...) at <...>/IXSocket.cpp:97
#4  0x000057f62498f5ba in ix::Socket::isReadyToWrite (this=<optimized out>, timeoutMs=timeoutMs@entry=10) at <...>/IXSocket.cpp:196
#5  0x000057f6249a090f in ix::WebSocketTransport::flushSendBuffer (this=this@entry=0x742f80000b80) at <...>/unique_ptr.h:199
#6  0x000057f6249a12b6 in ix::WebSocketTransport::sendData (this=this@entry=0x742f80000b80, type=type@entry=ix::WebSocketTransport::wsheader_type::PING, message=..., compress=compress@entry=false, onProgressCallback=...) at <...>/IXWebSocketTransport.cpp:938
#7  0x000057f6249a1435 in ix::WebSocketTransport::sendPing (this=this@entry=0x742f80000b80, message=...) at <...>/IXWebSocketTransport.cpp:1039
#8  0x000057f6249a1b4b in ix::WebSocketTransport::sendHeartBeat (this=this@entry=0x742f80000b80, pingMessage=<optimized out>) at <...>/IXWebSocketTransport.cpp:280
#9  0x000057f6249a2324 in ix::WebSocketTransport::poll (this=this@entry=0x742f80000b80) at <...>/IXWebSocketTransport.cpp:330
#10 0x000057f62499918c in ix::WebSocket::run (this=this@entry=0x742f80000b80) at <...>/IXWebSocket.cpp:398
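The mechanism this change adds can be sketched roughly as follows: keep polling the socket in short slices (the stack traces above show 10 ms poll timeouts), and check an overall deadline between slices. This is a hypothetical sketch, not the PR's actual code — `flushWithTimeout`, `FlushResult`, and the injected callbacks are stand-ins for `ix::Socket::isReadyToWrite()` and the real send buffer.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Outcome of a deadline-bounded flush attempt.
enum class FlushResult { Flushed, TimedOut };

// Sketch of a flushSendBuffer() variant that honors a timeout.
// isReadyToWrite(ms) stands in for ix::Socket::isReadyToWrite();
// pendingBytes/sendSome stand in for the transport's send buffer.
FlushResult flushWithTimeout(const std::function<bool(int)>& isReadyToWrite,
                             const std::function<size_t()>& pendingBytes,
                             const std::function<void()>& sendSome,
                             std::chrono::milliseconds timeout)
{
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + timeout;

    while (pendingBytes() > 0)
    {
        // Poll in small slices so the overall deadline can be
        // re-checked between poll() calls.
        if (isReadyToWrite(10))
        {
            sendSome();
        }
        else if (clock::now() >= deadline)
        {
            // The caller would then close the socket and report
            // "Send timeout" with an abnormal close code.
            return FlushResult::TimedOut;
        }
    }
    return FlushResult::Flushed;
}
```

With a never-writable socket the loop returns `FlushResult::TimedOut` shortly after the deadline instead of blocking forever, which is exactly the freeze the stack traces above show.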
