-sNODERAWSOCKETS DNS support #27162
Open
guybedford wants to merge 7 commits into
Adds a new NODERAWSOCKETS setting that backs the POSIX sockets API directly with Node.js's node:net and node:dgram, giving real, non-blocking TCP and UDP sockets without WebSockets, an external proxy process, or pthreads. This is the sockets counterpart to NODERAWFS: where NODERAWFS gives direct access to the host filesystem, this gives direct access to host sockets.

Unlike PROXY_POSIX_SOCKETS this is single-threaded and event-driven: socket readiness is delivered through the same emscripten_set_socket_*_callback hooks the default WebSocket backend uses, so it drops into existing readiness reactors unchanged. Under -pthread the socket syscalls are proxied to the main thread, so the backend always runs on node's event loop and a SharedArrayBuffer heap is safe.

Supported:

* TCP clients: connect, send, recv, shutdown and close, with non-blocking semantics and backpressure (send reports EAGAIN rather than buffering unboundedly).
* TCP servers: bind, listen, accept, getsockname/getpeername.
* UDP: bind, connect, sendto/recvfrom, with connected-peer filtering.
* IPv4 and IPv6 (AF_INET6): TCP and UDP over v6, including IPV6_V6ONLY.
* get/setsockopt: SO_ERROR, SO_KEEPALIVE and TCP_KEEPIDLE, TCP_NODELAY, SO_RCVBUF/SO_SNDBUF, SO_BROADCAST, IP_TTL, SO_REUSEPORT and IPV6_V6ONLY.

Options are mirrored to a cache (the getsockopt source of truth) and projected onto the live socket; we only report options we can actually honor (e.g. SO_REUSEADDR reads back as 1 since libuv forces it on, and IPV6_V6ONLY returns EINVAL if changed after bind).

Binding is eager and synchronous, so a conflict surfaces as EADDRINUSE at bind() and getsockname() reports the kernel-assigned ephemeral port immediately - there is no deferred-bind or lazy-handle promotion. A bound socket is a role-neutral handle, adopted as-is by listen() (server.listen) or connect() (net.Socket), and released by close() only if it was never adopted.

Bind-time options (ipv6Only, reusePort) are passed to the handle at construction. The bind primitive is selected once per capability:

* the public, synchronous net.BoundHandle (and dgram bindSync/connectSync) when the Node.js runtime provides them; and
* the private tcp_wrap/udp_wrap bindings as a fallback on Node.js versions that do not (bind6/send6 for IPv6).

Details:

* new node backend in src/lib/libsockfs_node.js, pulled in only under -sNODERAWSOCKETS, implementing the sock_ops contract
* __syscall_setsockopt and __syscall_shutdown now live in JS, routing to the backend under NODERAWSOCKETS (else reporting the option/feature as unsupported), avoiding a libstubs variation
* tests under test/sockets exercise TCP echo, server accept/echo (including listen-without-bind autobind), client source-port bind plus synchronous EADDRINUSE, client semantics (EISCONN, half-close, EPIPE), backpressure, connection refused, UDP echo/connect, and IPv6 TCP/UDP over ::1 (including IPV6_V6ONLY before/after bind); all build and run natively against the host stack and run under node, including PROXY_TO_PTHREAD variants
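
The eager-bind behaviour described above can be checked from plain POSIX C. The following is a minimal sketch and not code from this PR: the helper names (`bind_loopback`, `bound_port`, `check_eager_bind`) are made up for illustration, and only standard socket calls are used, so it is expected to behave the same natively and, going by the description, under -sNODERAWSOCKETS.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to 127.0.0.1:port
 * (port 0 lets the kernel pick). Returns the fd, or -1 with errno set. */
static int bind_loopback(unsigned short port) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;

  struct sockaddr_in addr;
  memset(&addr, 0, sizeof addr);
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(port);

  if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
    int saved = errno; /* close() may overwrite errno */
    close(fd);
    errno = saved;
    return -1;
  }
  return fd;
}

/* Hypothetical helper: port the socket is bound to (host byte order),
 * or 0 if getsockname() fails. */
static unsigned short bound_port(int fd) {
  struct sockaddr_in addr;
  socklen_t len = sizeof addr;
  if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0) return 0;
  return ntohs(addr.sin_port);
}

/* Returns 0 when (a) the ephemeral port is visible right after bind()
 * and (b) a second bind to the same port fails with EADDRINUSE at
 * bind() time rather than at a later listen()/connect(). */
int check_eager_bind(void) {
  int first = bind_loopback(0);
  if (first < 0) return -1;

  unsigned short port = bound_port(first);
  assert(port != 0); /* the kernel-assigned port is already known */

  int second = bind_loopback(port);
  int err = errno;
  int ok = (second < 0 && err == EADDRINUSE);

  if (second >= 0) close(second);
  close(first);
  return ok ? 0 : -1;
}
```

Calling `check_eager_bind()` from a test returns 0 when both properties hold. Neither socket sets SO_REUSEADDR or SO_REUSEPORT, which is what makes the second bind conflict.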
Under -sNODERAWSOCKETS getaddrinfo() previously fabricated fake addresses via DNS.lookup_name. This adds real resolution backed by node:dns, plus a general asynchronous getaddrinfo so clients can resolve names without blocking.

getaddrinfo() now resolves numeric addresses and /etc/hosts entries (read fresh through emscripten's FS) synchronously, and returns a full addrinfo linked list (one node per resolved address) rather than a single entry. For a real hostname:

- without JSPI it returns EAI_AGAIN (no synchronous DNS); resolve it via the async API below and read the result
- under JSPI it suspends the wasm stack on the real node:dns lookup and returns the resolved addresses directly (gated on ASYNCIFY == 2; non-JSPI unchanged)

The async API (available in all builds, not just -sNODERAWSOCKETS):

- emscripten_dns_lookup_async(node, service, hint) takes the same inputs as getaddrinfo() and returns a pollable fd that becomes readable - and delivers the emscripten_set_socket_message_callback - when resolution completes
- emscripten_dns_lookup_result(fd, struct addrinfo **res) reads the outcome: 0 on success, writing the addrinfo list head to *res (freed with freeaddrinfo, as for getaddrinfo), or an EAI_* code on failure
- with -sNODERAWSOCKETS a hostname is resolved via node:dns; otherwise (and for numeric/ /etc/hosts names) resolution is synchronous and the fd is simply readable on the next turn, so integration code need not branch on the backend

Memory is minted only when the caller takes the result, so closing the fd without reading leaks nothing; the whole addrinfo chain is owned by the caller and freed uniformly by freeaddrinfo.

Internally getaddrinfo is split into reusable stages - parse (getAddrInfo), resolve (resolveAddrInfo, node:dns), and mint (writeAddrInfoList) - threading a single descriptor through, which both the sync and async entry points share.

- freeaddrinfo now walks and frees the whole ai_next chain (previously only the head node + its ai_addr)
- adds EAI_AGAIN to the generated struct info

Tested with test_dns_async (static /etc/hosts, multi-address list, async localhost), test_dns_callback (completion via the socket message callback), test_dns_async_net (real hostname over the network), test_dns_async_default (the async API without -sNODERAWSOCKETS), and test_dns_jspi (JSPI blocking resolution), including -pthread/PROXY_TO_PTHREAD variants.
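
Because getaddrinfo() now returns a linked list and freeaddrinfo() frees the whole chain, callers are expected to walk ai_next rather than read only the head. A small sketch of that pattern for the synchronous numeric case follows; the function name `count_numeric_results` is hypothetical, and the code uses only standard POSIX calls rather than anything specific to this PR.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve a numeric IPv4 literal synchronously and
 * count the nodes in the returned addrinfo list.
 * Returns the node count, or -1 if getaddrinfo() fails. */
int count_numeric_results(const char *literal) {
  struct addrinfo hints;
  memset(&hints, 0, sizeof hints);
  hints.ai_family = AF_INET;
  hints.ai_socktype = SOCK_STREAM;
  /* Numeric host and service only: no DNS lookup is attempted. */
  hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;

  struct addrinfo *head = NULL;
  if (getaddrinfo(literal, "80", &hints, &head) != 0) return -1;

  int count = 0;
  /* Walk the whole chain, one node per resolved address. */
  for (const struct addrinfo *ai = head; ai != NULL; ai = ai->ai_next) {
    assert(ai->ai_family == AF_INET);
    assert(ai->ai_addr != NULL);
    count++;
  }

  /* A single call on the head releases every node in the chain. */
  freeaddrinfo(head);
  return count;
}
```

With these hints a literal such as "127.0.0.1" yields at least one node, while a non-numeric string fails immediately instead of triggering a lookup.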
This is a follow-on to #27080 and is based on that PR. See just the last commit of this PR for the exact diff.

This adds support for using Node.js's DNS resolver through `getaddrinfo()` and an async counterpart, supporting both sync and JSPI modes under `-sNODERAWSOCKETS`.

In order to support non-JSPI builds we introduce a new async DNS syscall, `emscripten_dns_lookup_async(node, service, hints)`, as a direct async conversion of `getaddrinfo`, which can be used as a system integration point for async DNS resolution in Emscripten. It returns a pollable socket fd that can either be polled for completion or have a listener attached via the existing socket callback `emscripten_set_socket_message_callback`. On completion, `emscripten_dns_lookup_result(fd, **addrinfo)` can be used to read the result. The direct `getaddrinfo` form shares a cache with the async form so that sync resolution can be "pre-warmed" by async resolution in environments that aren't able to upgrade to the async syscall. If no sync DNS is available, `EAI_AGAIN` is returned.

For example, when integrating this API with a runtime like Rust Tokio, it becomes possible to support full client connect lifecycles without needing JSPI by specializing the emscripten target to this async DNS API, while still also supporting JSPI builds.

Under the JSPI mode, the above continues to work, but `getaddrinfo` can also do full DNS resolution asynchronously per standard semantics, and does not use an internal cache at all.

DNS lookups first check the `/etc/hosts` file, then the internal cache (for non-JSPI builds), before doing a full async call via Node.js's DNS module.
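
The lookup order for non-JSPI builds (hosts file, then cache, then `EAI_AGAIN` until an async lookup has pre-warmed the cache) can be modelled with a toy resolver. This is only an illustrative sketch of the described semantics, not the PR's implementation: `struct resolver`, `resolve_sync` and `cache_store` are hypothetical names, and the in-memory tables stand in for `/etc/hosts` and the internal cache.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h> /* EAI_AGAIN */
#include <stddef.h>
#include <string.h>

/* Hypothetical name -> address pair, used for both tables below. */
struct name_entry {
  const char *name;
  const char *addr;
};

/* Hypothetical toy resolver: a read-only hosts table plus a small cache. */
struct resolver {
  const struct name_entry *hosts;
  size_t n_hosts;
  struct name_entry cache[8];
  size_t n_cache;
};

static const char *find(const struct name_entry *table, size_t n,
                        const char *name) {
  for (size_t i = 0; i < n; i++)
    if (strcmp(table[i].name, name) == 0) return table[i].addr;
  return NULL;
}

/* Synchronous path: hosts first, then cache. A miss reports EAI_AGAIN,
 * meaning "no synchronous answer yet, use the async API". */
int resolve_sync(const struct resolver *r, const char *name,
                 const char **addr) {
  const char *hit = find(r->hosts, r->n_hosts, name);
  if (hit == NULL) hit = find(r->cache, r->n_cache, name);
  if (hit == NULL) return EAI_AGAIN;
  *addr = hit;
  return 0;
}

/* What an async completion would do in this model: record the result so
 * that later synchronous calls succeed. Returns -1 if the cache is full. */
int cache_store(struct resolver *r, const char *name, const char *addr) {
  size_t capacity = sizeof r->cache / sizeof r->cache[0];
  if (r->n_cache == capacity) return -1;
  assert(name != NULL && addr != NULL);
  r->cache[r->n_cache].name = name;
  r->cache[r->n_cache].addr = addr;
  r->n_cache++;
  return 0;
}
```

In this model a name found in the hosts table resolves at once, an unknown name returns `EAI_AGAIN`, and the same name resolves synchronously after `cache_store` has recorded it, which mirrors the pre-warming described above.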