chore: merge upstream opengrep/main#6

Draft
mfow-nullify wants to merge 23 commits into main from upstream-merge

Conversation

@mfow-nullify

Summary

Syncs our fork with the latest opengrep/main. Pulls in 22 commits, including significant refactors to the taint analysis pipeline.

Upstream changes

  • Removed AST_to_IL.ctx — Ruby bare-identifier disambiguation now runs at parse time via Disambiguate_ruby_calls.disambiguate (called from Parse_target.ml). This removes the need for manual ctx threading through AST_to_IL.function_definition.
  • Lambda taint analysis — Added lambda-specific taint tracking. Function_id.key now uses position info for _tmp_lambda-prefixed names so nested lambdas no longer collide.
  • Functional interface invoke methods — taint: support for .run, .call, .apply, and other invoke methods.
  • Ruby/Scala do-block flattening simplified in Graph_from_AST.
  • Tree-sitter parser updates — Elixir, Clojure, etc.
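The position-based key for lambdas mentioned above can be pictured roughly as follows. This is a minimal sketch, not the real `Function_id` module: the `pos` and `fn` records, the `is_lambda_name` helper, and the `name@line:col` key format are all assumptions made for illustration. Only the `_tmp_lambda` prefix comes from the PR text.

```ocaml
(* Hypothetical stand-ins: the real Function_id types are not shown in
   this PR, so these records are assumptions for illustration. *)
type pos = { line : int; col : int }

type fn = { name : string; pos : pos }

(* The prefix is taken from the PR description. *)
let lambda_prefix = "_tmp_lambda"

let is_lambda_name (name : string) : bool =
  let n = String.length lambda_prefix in
  String.length name >= n && String.sub name 0 n = lambda_prefix

(* Named functions are keyed by name alone. Synthetic lambda names also
   carry their source position, so two lambdas that share a generated
   name still map to different keys. The "@line:col" format is made up. *)
let key (f : fn) : string =
  if is_lambda_name f.name then
    Printf.sprintf "%s@%d:%d" f.name f.pos.line f.pos.col
  else f.name

let () =
  let outer = { name = "_tmp_lambda"; pos = { line = 3; col = 10 } } in
  let inner = { name = "_tmp_lambda"; pos = { line = 4; col = 14 } } in
  Printf.printf "%s\n%s\n" (key outer) (key inner)
```

With a name-only key, `outer` and `inner` would collide; including the position keeps nested lambdas distinct.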

Preserved Nullify features

  • Interfile taint tracking: imported_global_env_of_ast, interfile_rule_context, imported_global_index, Taint_input_env.mk_file_env.
  • Multi-file taint tracking from #1 continues to work after merging upstream's refactor.

Conflict resolution

Three files had explicit conflicts (plus silent auto-merges):

  1. src/call_graph/Function_id.ml — adopted upstream's _tmp_lambda position-based key strategy.
  2. src/engine/Match_tainting_mode.ml — preserved imported_glob_env / interfile context, adopted upstream's lambda filter (keep_src_toSink_only), dropped ctx references.
  3. tests/rules/cross_function_tainting/test_hof_callback_taint_ruby.rb — adopted upstream's ruleid: annotations (Ruby lambda HOF now matches).

Test plan

  • dune build @check passes for src/
  • dune build produces Main.exe
  • GitHub Actions tests pass

maciejpirog and others added 23 commits April 13, 2026 17:38
Accept `{sym "str-key"}` forms (and mixed with keywords) in function
parameters and `let` bindings. The parser previously only handled
keyword values and raised "Invalid map binding form" for string keys.

Also extend the Clojure-specific PatKeyVal case in AST_to_IL so the
string-literal value pattern registers the PatId binding, letting
taint propagate through the destructured variable.
…estructuring

clojure: support string-key map destructuring
`foo.bar` (no parens, no args, no do-block) is map/struct field access
in Elixir -- semantically distinct from `foo.bar(...)` which is a remote
function call. The parser was collapsing both into `Call(DotAccess, [])`,
causing `taint_assume_safe_functions: true` to silently drop taint at
every field access on a tainted receiver.

The tree-sitter grammar already separates the two via distinct
non-terminals, so route each CST case to a distinct AST variant:

- Add `FieldAccess of remote_dot` alongside `DotRemote of remote_dot` in
  AST_elixir.
- In the parser, emit `FieldAccess rdot` for the no-parens/no-args/
  no-do-block case; keep `Call (mk_call_no_parens (Right rdot) ...)`
  otherwise.
- In Elixir_to_generic, translate `FieldAccess` to `G.DotAccess`, the
  same shape used by other languages for field access.

Explicit `foo.bar()` still becomes `Call(DotAccess, [])` and remains
subject to `taint_assume_safe_functions`, as intended.
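A rough sketch of the split described in this commit, under simplifying assumptions: the types below are toy versions of the Elixir AST and the generic AST (no tokens, string receivers, a hypothetical `GCall` constructor), and `keeps_taint_when_assuming_safe` is an invented stand-in for the effect of `taint_assume_safe_functions: true`, not the engine's actual logic.

```ocaml
(* Simplified, hypothetical shapes: the real AST_elixir and generic AST
   types carry tokens and many more cases. *)
type remote_dot = { receiver : string; field : string }

type elixir_expr =
  | FieldAccess of remote_dot          (* foo.bar *)
  | Call of remote_dot * string list   (* foo.bar() or foo.bar(x) *)

type generic_expr =
  | DotAccess of string * string
  | GCall of generic_expr * string list

(* Field access and remote call now translate to different shapes. *)
let to_generic (e : elixir_expr) : generic_expr =
  match e with
  | FieldAccess r -> DotAccess (r.receiver, r.field)
  | Call (r, args) -> GCall (DotAccess (r.receiver, r.field), args)

(* Toy model of the assume-safe rule: only calls drop taint, so a plain
   field access on a tainted receiver stays tainted. *)
let keeps_taint_when_assuming_safe (e : generic_expr) : bool =
  match e with
  | DotAccess _ -> true
  | GCall _ -> false
```

In this model `foo.bar` keeps its taint while `foo.bar()` does not, which matches the behaviour the commit message describes.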
Extend the `&fun/arity` ShortLambda match in the parser to include
`FieldAccess _` so `&Mod.fun/arity` is recognised as a remote capture.

In `Graph_from_AST.extract_callback_from_arg`, widen the inner-call
callee pattern to accept `DotAccess(_, _, FN(Id))` alongside `N(Id _)`.
`identify_callback` resolves the method name via `all_funcs`.

Extend `test_hof_comprehensive_elixir.ex` with `&Mod.fun/1` cases via
`Enum.map` and the custom HOF, plus a second `defmodule` for
cross-module resolution.
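The widened callee pattern might look something like the sketch below. It is illustrative only: the `expr` type is a cut-down assumption (the real `DotAccess` carries an extra token argument), and `callee_name` and this `identify_callback` signature are hypothetical rather than copied from `Graph_from_AST`.

```ocaml
(* Cut-down, hypothetical AST; the real DotAccess has a token in the
   middle position, omitted here. *)
type name = Id of string

type expr =
  | N of name
  | DotAccess of expr * field
  | Call of expr * expr list

and field = FN of name

(* Accept both a bare identifier and a qualified Mod.fun callee. *)
let callee_name (callee : expr) : string option =
  match callee with
  | N (Id f) -> Some f
  | DotAccess (_, FN (Id f)) -> Some f
  | Call _ -> None

(* Resolve the extracted name against the list of known functions. *)
let identify_callback (all_funcs : string list) (callee : expr) : string option =
  match callee_name callee with
  | Some f when List.mem f all_funcs -> Some f
  | _ -> None
```

Here `identify_callback ["fun"] (DotAccess (N (Id "Mod"), FN (Id "fun")))` resolves to `Some "fun"`, which the identifier-only pattern would have missed.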

Co-authored-by: @corneliuhoffman
Elixir: distinguish field access from zero-arity remote call
Commit a0b26ed dropped chardet without replacing it; Nuitka does not
follow the try/except import in requests/__init__.py, so Windows builds
since 1.17.0 emit a RequestsDependencyWarning on every run (opengrep#656).
Route the --version probe's stderr through a temp file and gate on
$LASTEXITCODE only, so runtime warnings surface via Write-Host instead
of aborting install.ps1 with a NativeCommandError (opengrep#656).
…normalizer

Fix Windows install: bundle charset_normalizer, don't treat stderr as fatal
Disambiguating the receiver of a DotAccess solves all the issues
in AST_to_IL
…ly, etc.)

Add invoke_methods to Lang_config for languages where lambdas are
invoked via named methods on functional interfaces (Java, Kotlin, C#,
Ruby). The call graph and taint signature lookup now recognise these
as lambda invocations so taint flows through them correctly.
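One way to picture a per-language invoke-method table is sketched below. The `lang` type, the `is_lambda_invocation` helper, and the specific method lists are assumptions for illustration; the PR only names `.run`, `.call`, and `.apply` and does not show the actual contents of `Lang_config`.

```ocaml
(* Hypothetical language tag; the real Lang_config covers many more. *)
type lang = Java | Kotlin | Csharp | Ruby | Python

(* Assumed method names, chosen from each language's standard way of
   invoking a function object. The real table may differ. *)
let invoke_methods (l : lang) : string list =
  match l with
  | Java -> [ "run"; "call"; "apply" ]
  | Kotlin -> [ "invoke" ]
  | Csharp -> [ "Invoke" ]
  | Ruby -> [ "call" ]
  | Python -> []

(* A method call counts as a lambda invocation only when the receiver is
   known to hold a lambda and the method is listed for that language. *)
let is_lambda_invocation (l : lang) ~(receiver_is_lambda : bool)
    ~(method_name : string) : bool =
  receiver_is_lambda && List.mem method_name (invoke_methods l)
```

Keeping the list per language avoids treating an unrelated `apply` method as a lambda call in languages where lambdas are invoked directly.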
Fix lambda taint tracking in intrafile mode
run_check_fundef_if_needed used to skip check_fundef for
is_lambda_assignment entries, so local source-to-sink flows inside
lambdas (e.g. record-field lambdas like
`{ h: function(d){ sink(d); } }`) were only ever stored in the
signature and lost when no call site resolved.

  Changes:
  - Match_tainting_mode.ml: always run check_fundef; for lambda
    assignments filter effects to ToSink with Src-origin taint and
    PBool true precondition. BArg/other parameterised taints still ride
    the signature unchanged.
  - test_lambda_in_object_literal.{yaml,js}: new coverage for the
    record-field case.
  - test_same_name_functions.go: reworked to exercise only same-name
    confusion (second lambda now uses safe(s), both fns are called);
    still fails pre-opengrep#617.
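The effect filter for lambda assignments could be sketched as below. All types here are simplified assumptions, and `keep_src_to_sink_only` is a hypothetical name modelled on the filter mentioned in the PR description. Whether the real filter also narrows the taint list inside each effect is not stated in the commit message; this sketch assumes it does.

```ocaml
(* Simplified, hypothetical effect model. *)
type origin = Src of string | BArg of int

type precondition = PBool of bool | POther

type eff =
  | ToSink of { taints : origin list; pre : precondition; sink : string }
  | ToReturn of origin list

let is_src (o : origin) : bool =
  match o with
  | Src _ -> true
  | BArg _ -> false

(* Keep only unconditional ToSink effects, restricted to taints that
   come from a real source. Parameter-dependent taints (BArg) are left
   to the function signature instead of being reported here. *)
let keep_src_to_sink_only (effects : eff list) : eff list =
  List.filter_map
    (fun e ->
      match e with
      | ToSink { taints; pre = PBool true; sink } -> (
          match List.filter is_src taints with
          | [] -> None
          | srcs -> Some (ToSink { taints = srcs; pre = PBool true; sink }))
      | ToSink _ | ToReturn _ -> None)
    effects
```

Under these assumptions a source-to-sink flow local to the lambda is reported even when no call site resolves, while flows that depend on an argument are not reported directly.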
Pulls in 22 upstream commits including:
- Removed AST_to_IL.ctx; Ruby bare-identifier disambiguation now handled
  at parse time via Disambiguate_ruby_calls.disambiguate
- Added lambda taint analysis and Function_id position-based keys for
  nested lambdas (prefix "_tmp_lambda")
- Added taint: functional interface invoke methods (.run, .call, .apply)
- graph-from-ast: ruby/scala do-block flattening simplified
- Various tree-sitter parser updates (elixir, clojure, etc.)

Preserved Nullify-specific features:
- Interfile taint tracking (imported_global_env_of_ast,
  interfile_rule_context, imported_global_index)
- File-level env via Taint_input_env.mk_file_env

Test updates:
- Ruby HOF lambda tests now pass (ruleid: annotations)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@mfow-nullify mfow-nullify marked this pull request as ready for review April 17, 2026 01:52
@mfow-nullify mfow-nullify marked this pull request as draft April 17, 2026 17:58
4 participants