Track and protect WebClient outbound requests, fix private-IP SSRF regression#312
Track and protect WebClient outbound requests, fix private-IP SSRF regression#312Mishenevd wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
7 Open source vulnerabilities detected - high severity
Aikido detected 7 vulnerabilities across 1 package, it includes 2 high, 2 medium and 3 low vulnerabilities.
Details
Remediation Aikido suggests bumping the vulnerable packages to a safe version.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
| return; | ||
| } | ||
|
|
||
| try { |
There was a problem hiding this comment.
SocketChannelWrapper dynamically loads a JAR and reflectively invokes reportConnect, hiding execution via URLClassLoader/reflection and selective exception handling; this runtime indirection obscures intent and should be clarified or documented.
Details
✨ AI Reasoning
The new SocketChannelWrapper uses runtime dynamic class loading and reflection to locate and invoke DNSRecordCollector.reportConnect from an external JAR path supplied via a system property. It constructs a URLClassLoader from a string path, loads a class by name, searches methods by string name, and invokes one reflectively. It selectively rethrows InvocationTargetException causes only when the cause's toString() starts with the project's vulnerability package prefix, and otherwise swallows or prints exceptions. These behaviors introduce non-transparent control flow and runtime execution paths that can hide behavior from static review. Such patterns are commonly used to obscure intent and can be indicative of first-party malware or backdoor techniques.
🔧 How do I fix it?
Ensure code is transparent and not intentionally obfuscated. Avoid hiding functionality from code review. Focus on intent and deception, not specific patterns.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
| } | ||
| // Ignore non-aikido throwables. | ||
| } catch(Throwable e) { | ||
| System.out.println("AIKIDO: " + e.getMessage()); |
There was a problem hiding this comment.
System.out.println used for error reporting in SocketChannelWrapper. Use the project's Logger or remove this debug print.
| System.out.println("AIKIDO: " + e.getMessage()); |
Details
✨ AI Reasoning
A new ad-hoc console print was added inside an exception handler that swallows errors. Using System.out.println in library/agent code is a debug artifact: it produces unstructured output and may leak internal errors to stdout. This harms maintainability and can confuse production logs. Replacing it with the project's Logger or removing it will restore production-grade logging practices.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
| try { | ||
| URL[] urls = { new URL(jarFilePath) }; | ||
| classLoader = new URLClassLoader(urls); | ||
| } catch (MalformedURLException ignored) {} |
There was a problem hiding this comment.
Empty catch swallowing MalformedURLException; at minimum log the exception or document why it can be safely ignored to avoid hiding classloader/URL setup failures.
Show fix
| } catch (MalformedURLException ignored) {} | |
| } catch (MalformedURLException e) { | |
| // Log malformed JAR path to aid debugging of classloader setup failures | |
| System.out.println("AIKIDO: Failed to load agent API JAR from path: " + jarFilePath + " - " + e.getMessage()); | |
| } |
Details
✨ AI Reasoning
The new wrapper code performs URL/JAR classloader setup inside a try block and then silently swallows MalformedURLException with an empty catch body. Silently ignoring classloading/URL errors can hide configuration or instrumentation failures and makes debugging harder; the catch should at least log the error or document why it is intentionally ignored. The surrounding try also involves loading an external JAR and reflective invocation, which are sensitive operations where failures should not be silently dropped.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
| } | ||
| String hostname = inetSocketAddress.getHostString(); | ||
|
|
||
| String jarFilePath = System.getProperty("AIK_agent_api_jar"); |
There was a problem hiding this comment.
Creating a new URLClassLoader and doing reflective class/method lookups on every SocketChannel.connect() is expensive. Cache the loader/reflection results or use a preloaded bridge to avoid per-connection classloader and reflection overhead.
Details
✨ AI Reasoning
This code attempts to report the resolved address of a SocketChannel connection by dynamically loading the agent_api JAR and invoking DNSRecordCollector.reportConnect via reflection. It performs the following expensive operations on each connect: reads a system property, constructs a URL and new URLClassLoader, loads a class by name, iterates methods to find reportConnect, invokes reflectively, and closes the class loader. Doing this on the hot path (connect) will cause significant allocation and classloading overhead that scales with connection frequency. Caching the URLClassLoader (or, better, obtaining a static bridge or a lightweight agent-to-api interface) would avoid repeated expensive work. The issue was introduced by the new SocketChannelWrapper in this PR which added this per-invocation classloader/reflection flow.
🔧 How do I fix it?
Move constant work outside loops. Use StringBuilder instead of string concatenation in loops. Cache compiled regex patterns. Use hash-based lookups instead of nested loops. Batch database operations instead of N+1 queries.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
| process(hostname, new InetAddress[]{resolvedAddress}, PendingHostnamesStore.getPorts(hostname), SOCKET_CHANNEL_OPERATION_NAME); | ||
| } | ||
|
|
||
| private static void process(String hostname, InetAddress[] inetAddresses, Set<Integer> ports, String operationName) { |
There was a problem hiding this comment.
Method 'process' is too vague; rename to reflect its responsibilities (e.g., handleDnsLookupRecordingAndSsrfChecks) and add Javadoc describing side effects and when it throws.
Details
✨ AI Reasoning
A newly added private static method named 'process' was introduced to handle statistics registration, hostname/port recording, outbound-block checks, and SSRF detection for different invocation paths. The implementation mixes multiple responsibilities and performs important side effects (mutating stores, throwing blocking exceptions). The name 'process' doesn't convey the multi-step purpose (recording, blocking checks, SSRF evaluation). Although file comments give context, the function name is still vague and makes the code harder to understand at call sites and during maintenance.
🔧 How do I fix it?
Use descriptive verb-noun function names, add docstrings explaining the function's purpose, or provide meaningful return type hints.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
cea92b6 to
2230760
Compare
| process(hostname, new InetAddress[]{resolvedAddress}, PendingHostnamesStore.getPorts(hostname), SOCKET_CHANNEL_OPERATION_NAME); | ||
| } | ||
|
|
||
| private static void process(String hostname, InetAddress[] inetAddresses, Set<Integer> ports, String operationName) { |
There was a problem hiding this comment.
DNSRecordCollector.process() is large and mixes multiple unrelated responsibilities (stats, pending-port semantics, outbound blocking, SSRF detection and stored-SSRF checks); consider splitting into smaller focused methods.
Details
✨ AI Reasoning
1. What is this code trying to accomplish? It converts resolved InetAddress objects and pending ports into outbound-hit statistics and runs outbound-blocking and SSRF detectors. 2. Does this harm maintainability? The method combines several distinct responsibilities (stats, store mutation vs peeking semantics, blocking policy, SSRF detection, stored-SSRF detection and exception handling) making it cognitively heavy. 3. Is it appropriate? The method grew in this change to handle both InetAddress.getAllByName and SocketChannel.connect callers plus different port-consumption semantics - this increases branching and conceptual load. 4. Would fixing it meaningfully improve codebase? Yes — splitting concerns (port handling vs SSRF checks vs storage/stats) would make logic easier to reason about and reduce risk of regression. 5. Common pattern? Not particularly; existing code previously separated some behavior but this change merged additional responsibilities into one process() method. 6. Fixability within PR? Yes: extraction of smaller helper methods is feasible here.
🔧 How do I fix it?
Break down long functions into smaller helper functions. Aim for functions under 60 lines with fewer than 10 local variables.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
2230760 to
1eab8be
Compare
| } | ||
| String hostname = inetSocketAddress.getHostString(); | ||
|
|
||
| String jarFilePath = System.getProperty("AIK_agent_api_jar"); |
There was a problem hiding this comment.
Loading a JAR path from System property then runtime-classloading and reflective method invocation hides executed code; avoid dynamic loading/reflection or document and validate the JAR path and expected class/method.
Details
✨ AI Reasoning
The new wrapper loads a JAR path read from a system property, constructs a URLClassLoader at runtime, loads a class by name, iterates over its methods and invokes one by matching its name, all via reflection. This sequence (system-property-controlled JAR path + runtime classloading + reflective invocation) obscures the control flow and makes it hard to audit what code will run, which fits obfuscation/hiding patterns. It also depends on external input (System property) to locate executable code, increasing the risk that different code executes in different environments and hiding behavior from static analysis and code reviewers. These are deliberate runtime indirections introduced by this PR and therefore should be flagged.
🔧 How do I fix it?
Ensure code is transparent and not intentionally obfuscated. Avoid hiding functionality from code review. Focus on intent and deception, not specific patterns.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
| } catch (InvocationTargetException invocationTargetException) { | ||
| if(invocationTargetException.getCause().toString().startsWith("dev.aikido.agent_api.vulnerabilities")) { | ||
| throw invocationTargetException.getCause(); |
There was a problem hiding this comment.
Relying on exception.getCause().toString().startsWith(...) to decide to rethrow hides error semantics; use explicit exception type checks instead of string-matching the cause.
Show fix
| } catch (InvocationTargetException invocationTargetException) { | |
| if(invocationTargetException.getCause().toString().startsWith("dev.aikido.agent_api.vulnerabilities")) { | |
| throw invocationTargetException.getCause(); | |
| } catch (InvocationTargetException invocationTargetException) { | |
| Throwable cause = invocationTargetException.getCause(); | |
| if(cause != null && cause.getClass().getName().startsWith("dev.aikido.agent_api.vulnerabilities")) { | |
| throw cause; |
Details
✨ AI Reasoning
The added code catches InvocationTargetException and inspects invocationTargetException.getCause().toString().startsWith(...) to decide whether to throw the cause. Using string matching on the cause's toString() to classify exceptions is brittle and can be used to selectively propagate certain errors while swallowing others, masking failures or creating hidden control paths. This conditional propagation was introduced here, increasing opacity of error handling and potentially hiding malicious or unexpected behavior.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
1eab8be to
470324d
Compare
| for (Method method2: clazz.getMethods()) { | ||
| if(method2.getName().equals("reportConnect")) { | ||
| method2.invoke(null, hostname, resolvedAddress); | ||
| break; | ||
| } | ||
| } |
There was a problem hiding this comment.
Reflectively scanning/ invoking reportConnect on a runtime-loaded class obscures the actual operation performed; prefer a clearly referenced API boundary or documented explicit delegation.
Show fix
| for (Method method2: clazz.getMethods()) { | |
| if(method2.getName().equals("reportConnect")) { | |
| method2.invoke(null, hostname, resolvedAddress); | |
| break; | |
| } | |
| } | |
| Method reportConnectMethod = clazz.getMethod("reportConnect", String.class, InetAddress.class); | |
| reportConnectMethod.invoke(null, hostname, resolvedAddress); |
Details
✨ AI Reasoning
The change locates and invokes a method named reportConnect via reflection on a class loaded from the external classloader. Using reflection in this way (scanning methods by name and invoking them) obscures the actual call target and arguments from code readers and static tools, complicating understanding of what is executed when a socket connects. Reflection here is used to execute security-relevant logic (reporting/SSRF checks) and is performed conditionally at runtime, which can hide behavior and make auditing more difficult.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
…gression Follow-up to the reverted #308/#310. Customer flood was InetAddress.getAllByName() picking up Reactor Netty's own DNS-resolver bootstrap noise (0.0.0.0, ::, /etc/resolv.conf nameservers) as "new outbound connections" on port 0, and blocking them in lockdown mode. #310 fixed the flood with an early return in DNSRecordCollector.report() that also skipped the SSRF check below it - verified with a regression test that this let an attacker-supplied private-IP literal (e.g. a webhook field pointing straight at 169.254.169.254) through undetected. Investigating further found the actual root cause is bigger: Spring's WebClient was never instrumented at all, and Reactor Netty's default HTTP client bypasses InetAddress.getAllByName() entirely (it uses its own async DNS resolver). So even after wrapping WebClient to register pending ports, DNSRecordCollector was never invoked for real WebClient targets - confirmed empirically via trace logs against a live running app, with distinct markers proving InetAddressWrapper never fires for WebClient/Reactor Netty traffic in this configuration. WebClient had zero outbound-domain visibility and zero SSRF protection, independent of the original bug. - DNSRecordCollector: narrow the private-IP-literal gate to only skip recording + outbound blocking when there's no pending port (genuine infra noise). SSRF checks are unconditional again, fixing the bypass above. - SpringWebClientWrapper: register pending host+port for every WebClient request by hooking ExchangeFunction.exchange(), the interface every WebClient call goes through, same pattern as the existing OkHttp/Apache/JDK HttpClient wrappers. Uses string-based ByteBuddy matchers (hasSuperType(named(...))) instead of .class literals, since spring-webflux is compileOnly and only present on the target app's classloader - a .class reference in the matcher crashes the agent at premain with NoClassDefFoundError. - SocketChannelWrapper: hook java.nio.channels.SocketChannel.connect(), the JDK-level call every NIO-based client (including Reactor Netty) makes once it has a resolved address, regardless of which DNS resolver produced it. This is what actually closes the gap for WebClient, and it also catches literal IP targets that never go through any resolver at all. Not Netty-specific instrumentation - it's a generic JDK hook with no references to io.netty.* types. - DNSRecordCollector.reportConnect(): entry point for the new hook. Peeks the pending port instead of consuming it (report()'s getAndRemove), because a single request can trigger multiple connect() calls to the same hostname (e.g. the IPv4 then IPv6 address of a dual-stack host like localhost). Consuming on the first attempt let a blocked SSRF target succeed on the second attempt via the other address family - found live, fixed, covered by a regression test. - PendingHostnamesStore: peeking instead of consuming means entries rely on WebRequestCollector's per-incoming-request clear() for cleanup, which never fires for WebClient calls made outside any incoming-request context (e.g. a @scheduled background task). Capped the store at 1000 entries per thread, evicting the least-recently-used one once exceeded - the same bounded-LRU pattern (LinkedHashMap with accessOrder=true + removeEldestEntry()) already used by Hostnames.java for the same class of problem. Deliberately not a time-based TTL, to avoid a timing-dependent race reopening the dual-stack gap under load. - SpringWebClientRedirectWrapper: WebClient calls with followRedirect(true) never re-invoke Spring's request-adaptation layer for redirect hops (Reactor Netty resends bodiless requests internally), so a redirect to a private IP was invisible to both tracking and SSRF - same failure mode as the DNS gap above, just one layer up. Hooks HttpClientConnect$HttpClientHandler.redirect() (package-private, mirroring the same tradeoff HttpConnectionRedirectWrapper already makes for the JDK's equally-internal followRedirect0) and feeds the chain into the existing RedirectCollector/PrivateIPRedirectFinder mechanism, the same one already used for JDK HttpURLConnection redirects. - RequestController (SpringWebfluxSampleApp): /api/request endpoint (plus a followRedirect(true) variant) used to validate all of the above against a real running app end to end, and now wired into end2end/spring_webflux_postgres.py as an automated "ssrf" e2e payload. Known limitation, not fixed here: Spring WebFlux has no request-body taint tracking at all (SpringWebfluxContextObject never populates ContextObject.body), so SSRF via JSON body can't be detected for WebFlux apps regardless of this change - flagged separately, doesn't regress anything.
470324d to
b33ebce
Compare
| Class<?> clazz = classLoader.loadClass("dev.aikido.agent_api.collectors.DNSRecordCollector"); | ||
|
|
||
| // Run reportConnect with "argument" | ||
| for (Method method2: clazz.getMethods()) { |
There was a problem hiding this comment.
Iterating clazz.getMethods() to find and invoke reportConnect is verbose; use clazz.getMethod("reportConnect", String.class, InetAddress.class) and invoke it directly.
Details
✨ AI Reasoning
The code loads a class via reflection and then iterates over clazz.getMethods() to find a method named "reportConnect", invoking it when found. This is more verbose than directly calling clazz.getMethod("reportConnect", parameterTypes...). Iterating all methods is unnecessary and reduces clarity; a direct reflective lookup is standard, clearer, and concise. This change was introduced in the new SocketChannelWrapper and is a straightforward simplification opportunity that keeps identical behavior.
🔧 How do I fix it?
Rewrite the snippet in the simpler, behavior-equivalent form: return a boolean expression directly instead of if cond return true else return false, avoid using lists when they are guaranteed to contain one element, etc.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
Context
Follow-up to #308/#310 (reverted as #311). Customer flood was
InetAddress.getAllByName()picking up Reactor Netty's own DNS-resolver bootstrap noise (0.0.0.0,::,/etc/resolv.confnameservers) as "new outbound connections" on port 0, and blocking them in lockdown mode. #310 fixed the flood but did it with an earlyreturninDNSRecordCollector.report()that also skipped the SSRF check below it — verified with a live regression test that this let an attacker-supplied private-IP literal (e.g. a webhook field pointing straight at169.254.169.254) through undetected.While investigating a better fix, found the actual root cause is bigger: Spring's
WebClientwas never instrumented at all, and Reactor Netty's default HTTP client bypassesInetAddress.getAllByName()entirely (it uses its own async DNS resolver). So even after wrapping WebClient to register pending ports,DNSRecordCollectorwas never invoked for real WebClient targets — confirmed empirically via trace logs against a live running app. WebClient traffic had zero outbound-domain visibility and zero SSRF protection, independent of this bug.Review feedback addressed
recordHitAndEnforceBlocklistextraction back inline, cut comments down to what's non-obvious.isInfrastructureNoise/ports.isEmpty()reliability — was a real gap, not just a theoretical one: it's exactly what let redirect targets slip through undetected (seeSpringWebClientRedirectWrapperbelow). Fixed for the redirect case; the more general "any explicit scheduler hop breaksThreadLocalpropagation" class of problem is noted under Known limitations, out of scope for this PR.WebClientTest, real network calls, same style asHttpURLConnectionTest) and a full e2e payload inspring_webflux_postgres.py. See Tests below.Architecture note: this fits the agent's existing two-layer hook pattern, it doesn't add a new one
Every HTTP client the agent tracks is covered by two kinds of hook working together:
OkHttpWrapper(OkHttpClient.newCall),ApacheHttpClientWrapper(HttpClient.execute),HttpClientSendWrapper(JDKHttpClient.send/sendAsync), and nowSpringWebClientWrapper(ExchangeFunction.exchange, the Spring-level interface everyWebClientcall goes through, independent of whatever transport it's configured with). All of these just extract host+port and register it viaURLCollector→PendingHostnamesStore.DNSRecordCollector). Before this PR there was exactly one:InetAddressWrapper, hooking the JDK'sInetAddress.getAllByName(). It's generic — it doesn't know or care which library called it.OkHttp, Apache HttpClient, and the JDK's own
HttpClientall resolve DNS through that same standard JDK method, so layer 2 already covered them with no extra work.WebClient's underlying transport (Reactor Netty) is the one exception: it deliberately uses its own async DNS resolver instead ofInetAddress.getAllByName(), so it was invisible to layer 2 no matter how well layer 1 was instrumented.SocketChannelWrapperis not Netty-specific instrumentation — it hooksjava.nio.channels.SocketChannel.connect(SocketAddress), a standard JDK class, with zero references to anyio.netty.*type. It catches Reactor Netty as a side effect of Netty'sNioSocketChannelinternally calling this same standard method to open its sockets — the same hook would also catch any other NIO-based client, not just Netty. This is layer 2 gaining a second, complementary entry point, not a new Netty-specific layer.What changed
DNSRecordCollector.java— private-IP-literal gate narrowed toports.isEmpty() && IsPrivateIP.isPrivateIp(hostname), only skipping recording + outbound blocking (SSRF checks are unconditional again). Deliberately not Fully ignore private IP literals as outbound connections (early return) #310's "just ignore private IPs" — theports.isEmpty()half is what makes it safe: a genuine request through any instrumented client always registers a pending port before resolution/connection happens, so this can only ever match traffic with no app-level request behind it (DNS bootstrap, direct-by-IP connects), never a real request, attacker-driven or not.SpringWebClientWrapper.java— new wrapper, hooksExchangeFunction.exchange()(the interface every WebClient request goes through) to register pending host+port viaURLCollector, same pattern as the existing OkHttp/Apache/JDKHttpClientwrappers (layer 1 above).SocketChannelWrapper.java— new wrapper, hooksjava.nio.channels.SocketChannel.connect(SocketAddress)(layer 2 above). This is what actually closes the gap: it's the JDK-level call every NIO-based client (including Reactor Netty) makes once it has a resolved address, regardless of which DNS resolver produced it, and it also catches literal-IP targets that never go through any resolver at all.DNSRecordCollector.reportConnect()— new entry point used bySocketChannelWrapper. Peeks the pending port (PendingHostnamesStore.getPorts) instead of consuming it (report()'sgetAndRemove), for a reason specific to this new hook:report()(fed byInetAddress.getAllByName()) always sees every resolved address for a hostname in one call, so it's safe to consume the pending port immediately — the full picture is already known before any connection is attempted.reportConnect()(fed bySocketChannel.connect()) instead learns about addresses one at a time, exactly when each individual connection attempt happens.This matters because of Happy Eyeballs: a dual-stack hostname like
localhostresolves to both127.0.0.1and::1, and Netty tries them as separate, sequentialconnect()calls, not one combined resolution step. A hostname can resolve to more than 2 addresses in general (multiple A/AAAA records are common for load-balanced services) — Netty will try each one in turn.Found live: with a naive
getAndRemove(), the firstconnect()attempt (IPv4) correctly detected and blocked SSRF — but that also consumed the pending port. The second attempt (IPv6, same hostname) found nothing pending, so the SSRF check loop didn't run at all, and the connection succeeded — the earlier "block" was logged but had no real effect, because the client just silently retried on the other address family and got through. Fixed by peeking instead of consuming; both attempts are now checked. Covered bytestSsrfDetectedOnEveryConnectAttemptForDualStackHostname.PendingHostnamesStore.java— peeking instead of consuming (above) means entries now rely onWebRequestCollector's per-incoming-requestclear()for cleanup, which never fires forWebClientcalls made outside any incoming-request context (e.g. a@Scheduledbackground task) — those entries would otherwise sit in that thread'sThreadLocalmap forever. Capped the store at 1000 entries per thread, evicting the least-recently-used one once exceeded — the same bounded-LRU pattern (LinkedHashMapwithaccessOrder=true+removeEldestEntry()) already used byHostnames.javafor the same class of problem. Deliberately not a time-based TTL: eviction is purely by usage count, not wall-clock time, so it can't interact with the dual-stack fix above (an entry touched by a second connect attempt is safe unless ~1000 unrelated hostnames get registered on the same thread in between, which doesn't happen within one request's connect sequence) — no timing-dependent race, andadd()/getAndRemove()/getPorts()/clear()stay in their original, simple form. Covered byPendingHostnamesStoreTest.SpringWebClientRedirectWrapper.java— new wrapper closing a gap flagged in review:ports.isEmpty()isn't a fully reliable signal for "not a real request". Concretely: WebClient configured with.followRedirect(true)never re-invokes Spring's request-adaptation layer for redirect hops — Reactor Netty resends bodiless (e.g. GET) requests internally without going back throughExchangeFunction/ReactorClientHttpRequest— so a redirect to a private IP had no pending port and was invisible to both tracking and SSRF, indistinguishable from genuine infra noise even though it originated from a real, tainted request. Hooks the package-privateHttpClientConnect$HttpClientHandler.redirect()(the only place this is visible, verified by tracing Spring'ssend(BiFunction)callback — it does not re-fire per redirect for bodiless requests, contrary to its own javadoc's implication) and feeds the chain into the existingRedirectCollector/PrivateIPRedirectFindermachinery, the same one already used for JDKHttpURLConnectionredirects. MirrorsHttpConnectionRedirectWrapper's existing tradeoff of hooking an internal, undocumented method (there:followRedirect0) for this exact reason.RequestController.java(SpringWebfluxSampleApp) —/api/requestendpoint (GET withurlquery param, POST with JSON body) plus a.followRedirect(true)variant at/api/request/follow-redirects, used to validate all of the above against a real running app, and now wired intoend2end/spring_webflux_postgres.pyas an automated e2e payload.Also fixed along the way:
SpringWebClientWrapper's first version referencedExchangeFunction.classdirectly in its ByteBuddy matchers, which requires the class to resolve on the agent's own classloader at premain — butspring-webfluxiscompileOnly, only present on the target app's classloader. This crashed the JVM withNoClassDefFoundErrorat startup for any app using the agent. Fixed with string-based matchers (hasSuperType(named(...))), the same patternOkHttpWrapper/ApacheHttpClientWrapperalready use for third-party types.e2e scenarios tested
Built and ran the real agent against a real Spring Boot WebFlux app end to end (not just unit tests):
App startup didn't crash — caught the
NoClassDefFoundErrorbug this way; unit tests can't catch agent-premain class loading issues.Real domain via WebClient gets correct port tracking (not port 0):
Log:
DNSRecordCollector called with example.com & inet addresses: [example.com/104.20.23.154]→Hostname: example.com, Port: 443(previously untracked entirely).SSRF via WebClient, literal private IP, now blocked:
→ HTTP 500,
"Aikido Zen has blocked a server-side request forgery". Stack trace confirms interception point:sun.nio.ch.SocketChannelImpl.connect←NioSocketChannel.doConnect←reactor.netty.transport.TransportConnector.SSRF via WebClient, dual-stack hostname (
localhost→127.0.0.1+::1), both connect attempts blocked:→ HTTP 500. Before the
reportConnect()fix, this returned 200 with the mock server's real response — the first (IPv4) attempt was blocked and logged, but the second (IPv6) attempt silently bypassed SSRF because the first had already consumed the pending port.Private-IP noise from Netty's resolver bootstrap not recorded and not blocked in lockdown — confirmed via mock server heartbeat: after triggering the bootstrap noise (
0.0.0.0,::, local resolver IP10.2.0.1) and a real request, onlylocalhost:5000(the agent's own reporting) showed up in reported hostnames.Safe request to a real public domain still works normally, HTTP 200, both via WebClient and confirmed no regression for already-instrumented clients.
Redirect-based SSRF via WebClient, now blocked:
(mock core's
/mock/redirect-to-metadatareturns a 302 tohttp://169.254.169.254/latest/meta-data/.) → HTTP 500. Withblocktemporarily set to detect-only, confirmed the attack event fires specifically for169.254.169.254:80withsource=query, i.e. traced back through the redirect chain to the tainted origin — not just the origin itself being blocked. Before this wrapper,169.254.169.254never appeared in any DNSRecordCollector/SSRF log at all despiteSocketChannelWrapperseeing the actual connect attempt — no pending port meant no taint check.Tests
Added to
DNSRecordCollectorTest(22 tests total in the file now, all passing):testPrivateIpLiteralWithNoPendingPortNotRecorded/...NotBlockedInLockdown— infra noise filtering still works.testPrivateIpLiteralWithPendingPortStillRecordedAndBlockedInLockdown— real requests to private IPs via instrumented clients are unaffected by the noise filter.testSsrfStillDetectedForPrivateIpLiteralWithPendingPort— regression test proving the earlier early-return SSRF bypass is fixed.testReportConnectRecordsHostnameWithPendingPort,testReportConnectDoesNotConsumePendingPort,testReportConnectPrivateIpLiteralWithNoPendingPortNotRecorded,testReportConnectStoredSsrfStillRunsUnconditionally—reportConnect()behavior.testSsrfDetectedOnEveryConnectAttemptForDualStackHostname— regression test for the dual-stack SSRF bypass found during e2e testing.New
PendingHostnamesStoreTest:testUnboundedHostnamesDoNotGrowThreadLocalMapForever— adds 2000 distinct hostnames, confirms the oldest untouched ones are evicted and the store stays bounded.testReadingAnEntryProtectsItFromEvictionWhileStillInUse— confirms an entry touched by a second connect attempt survives a realistic number of unrelated hostnames being added in between.New
testSsrfDetectedForRedirectToPrivateIpinDNSRecordCollectorTest— regression test for the redirect gap, exercisingRedirectCollector+DNSRecordCollector.reportConnect()directly.New
agent_api/src/test/java/wrappers/WebClientTest.java, mirroring the existingHttpURLConnectionTestpattern (real network calls, run with the actual built agent attached via theagent_apitest task's-javaagent) — split into two tests instead of one end-to-end test, becauseSpringWebClientWrapperandSocketChannelWrapperfire on different threads for a real WebClient call (the former on the subscribing thread, the latter on Reactor Netty's own event-loop thread), andPendingHostnamesStore/ContextareThreadLocal, so they can't be observed together the way a same-thread blocking client's wrapper can:testSpringWebClientWrapperRegistersPendingPort— confirmsSpringWebClientWrapperfires and registers the right port.testSocketChannelWrapperBlocksSsrf— confirmsSocketChannelWrapper+DNSRecordCollector.reportConnect()'s SSRF logic, via a rawSocketChannel.connect()call (single-threaded, deterministic).New
end2end/spring_webflux_postgres.pypayload:ssrf, using the new/api/requestendpoint, following the same pattern asjavalin_mysql_kotlin.py's existing SSRF payload. Ran the full e2e suite (all existing SQL/command-injection/path-traversal/bot/IP-blocking/rate-limiting payloads plus this new one) — all passing, no regressions.Known limitations (not fixed in this PR)
SpringWebfluxContextObjectnever populatesContextObject.body), so SSRF/etc. via JSON body can't be detected for WebFlux apps regardless of this change — that's why the e2e scenarios above use query params instead of body. Doesn't regress anything (it was already broken before this PR), but worth a dedicated follow-up.Context/PendingHostnamesStoreareThreadLocal, which doesn't survive an explicit scheduler hop in a reactive chain (e.g..publishOn(Schedulers.boundedElastic())before aWebClientcall — a common pattern for mixing blocking and non-blocking code). This is the same underlying class of gap the redirect fix closes (Reactor Netty doing work on a thread other than the one that registered the request's context), just via a different trigger. It only breaks under an app's own explicit scheduler switch, not the default WebFlux request-handling flow (which stays on onereactor-http-niothread throughout). Not fixed here — a proper fix would mean propagating via Reactor's ownContextinstead ofThreadLocal, a much larger change affecting the whole detection pipeline, not just WebClient.🤖 Generated with Claude Code
Summary by Aikido
🚀 New Features
⚡ Enhancements
🐛 Bugfixes
More info