117 changes: 117 additions & 0 deletions .github/workflows/integration-test-multinode.yml
@@ -0,0 +1,117 @@
name: Integration Test Multinode (Full)
Collaborator

[SHOULD] Sync pr-cancel.yml workflows list with the workflow changes in this PR

.github/workflows/pr-cancel.yml:20 hard-codes the workflows list as ['pr-build.yml', 'system-test.yml', 'codeql.yml']. This PR deletes system-test.yml and adds two new workflows, but does not touch pr-cancel.yml. Once this PR is merged:

  1. github.rest.actions.listWorkflowRuns({workflow_id: 'system-test.yml', ...}) will hit a 404 on the deleted workflow file. The script passed to actions/github-script@v8 has no try/catch around the loop, so the cancel job throws as soon as it reaches 'system-test.yml'. Only pr-build.yml, which comes first in the list, is processed; codeql.yml is never reached, and the two new integration-test workflows are not in the list at all, so none of those runs are cancelled on a closed, unmerged PR.
  2. The multinode timeout is 60 minutes, so a single un-cancelled run can hold a full runner slot for up to an hour.

Suggestion: in this PR, update pr-cancel.yml:20 to remove 'system-test.yml', add 'integration-test-single-node.yml' and 'integration-test-multinode.yml', and wrap each workflow iteration in a try/catch to make the job resilient to future drift.
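
The failure mode and the suggested guard can be sketched in isolation. This is a minimal illustration under stated assumptions, not the workflow's actual script: `listRuns` is a hypothetical stand-in for `github.rest.actions.listWorkflowRuns` that rejects the way a 404 would, and `collectRuns` is a hypothetical name for the per-workflow loop.

```javascript
// Hypothetical stand-in for github.rest.actions.listWorkflowRuns: it rejects
// for a workflow file that no longer exists, as a 404 from the API would.
async function listRuns(workflowId, existing) {
  if (!existing.has(workflowId)) {
    const err = new Error(`Not Found: ${workflowId}`);
    err.status = 404;
    throw err;
  }
  return [{ id: 1, workflowId }];
}

// Process every workflow; a failure in one iteration is logged and skipped
// instead of aborting the remaining iterations.
async function collectRuns(workflows, existing) {
  const found = [];
  const skipped = [];
  for (const workflowId of workflows) {
    try {
      found.push(...(await listRuns(workflowId, existing)));
    } catch (err) {
      skipped.push(workflowId);
      console.log(`Skipping ${workflowId}: ${err.message}`);
    }
  }
  return { found, skipped };
}
```

Without the try/catch, the rejection for the missing file would propagate out of the loop and every workflow listed after it would go unprocessed.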

Contributor Author

Good catch — both issues are valid. Fixed in eed9dad7d:

  • Updated the workflow list: dropped system-test.yml, added integration-test-single-node.yml and integration-test-multinode.yml.
  • Wrapped each workflow iteration in a try/catch so a missing or renamed file logs a Skipping ... line instead of taking down the whole cancel job. This makes the cancel job resilient to future drift, not just the current rename.


on:
  pull_request:
    branches: [ 'develop', 'release_**' ]
    types: [ opened, synchronize, reopened ]
    paths-ignore: [ '**/*.md', '.gitignore', '**/.gitignore', '.editorconfig',
                    '.gitattributes', 'docs/**', 'CHANGELOG', '.github/ISSUE_TEMPLATE/**',
                    '.github/PULL_REQUEST_TEMPLATE/**', '.github/CODEOWNERS' ]
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  multinode-full:
    name: Integration Test Multinode Full (JDK 8 / x86_64)
    runs-on: ubuntu-latest
    timeout-minutes: 60

    steps:
      - name: Checkout java-tron
        uses: actions/checkout@v5

      - name: Set up JDK 8
        uses: actions/setup-java@v5
        with:
          java-version: '8'
          distribution: 'temurin'

      - name: Cache Gradle packages
        uses: actions/cache@v4
        with:
          path: |
            ~/.gradle/caches
            ~/.gradle/wrapper
          key: ${{ runner.os }}-gradle-multinode-${{ hashFiles('**/*.gradle', '**/gradle-wrapper.properties') }}
          restore-keys: ${{ runner.os }}-gradle-multinode-

      - name: Build FullNode.jar
        run: ./gradlew clean build -x test --no-daemon

      - name: Build local java-tron Docker image (wraps PR-built FullNode.jar)
        run: |
          mkdir -p /tmp/tron-image
          cp build/libs/FullNode.jar /tmp/tron-image/
          cat > /tmp/tron-image/Dockerfile <<'EOF'
          FROM tronprotocol/java-tron:latest
          COPY FullNode.jar /java-tron/lib/FullNode.jar
          EOF
          docker build -t java-tron-local:pr /tmp/tron-image

      - name: Pull integration-test image
        run: docker pull troninfra/troninfra-ci:latest

      - name: Extract compose configs to host (for DinD path-alignment)
        run: |
          # start-multinode.sh builds HOST_COMPOSE_DIR as:
          #   ${HOST_WORKDIR}/docker/multi-node
          # so the files must live at $HOST_WORKDIR/docker/multi-node/ on the
          # host. Set HOST_WORKDIR to the workspace root and extract
          # /app/docker/ 1:1 into workspace/docker/ — the subdirectories
          # (multi-node/, single-node/) don't collide with java-tron's own
          # docker/ files.
          docker create --name it-extract troninfra/troninfra-ci:latest
          docker cp it-extract:/app/docker/. "${{ github.workspace }}/docker/"
          docker rm -f it-extract

      - name: Run multinode full tests
        run: |
          # --network host: multinode tests talk to nodes via 127.0.0.1:50051 etc.
          # DinD socket + HOST_WORKDIR path-alignment lets the container orchestrate
          # the 3-witness compose stack via the host daemon.
          # Don't override --workdir so the container's default /app entrypoint works.
          docker run --name integration-multinode \
            --network host \
            -v /var/run/docker.sock:/var/run/docker.sock \
            -v "${{ github.workspace }}:${{ github.workspace }}" \
            -v "${{ github.workspace }}/docker/multi-node:/app/docker/multi-node" \
            -e HOST_WORKDIR="${{ github.workspace }}" \
            -e TRON_IMAGE=java-tron-local:pr \
            -e JAVA_HOME=/usr/lib/jvm/temurin-8 \
            -e JAVA_HOME_17=/opt/java/openjdk \
            troninfra/troninfra-ci:latest \
            --multinode --clean

      - name: Extract test reports from container
        if: always()
        run: |
          mkdir -p integration-reports
          docker cp integration-multinode:/app/build/reports/. integration-reports/reports/ 2>/dev/null || true
          docker cp integration-multinode:/app/build/test-results/. integration-reports/test-results/ 2>/dev/null || true
          docker cp integration-multinode:/app/build/test-output.log integration-reports/ 2>/dev/null || true

      - name: Collect witness node logs
        if: always()
        run: |
          mkdir -p integration-reports/node-logs
          for c in tron-mn-node1 tron-mn-node2 tron-mn-node3 tron-mn-mongodb; do
            docker logs "$c" > "integration-reports/node-logs/${c}.log" 2>&1 || true
          done

      - name: Tear down compose stack
        if: always()
        run: |
          docker rm -f tron-mn-node1 tron-mn-node2 tron-mn-node3 tron-mn-mongodb 2>/dev/null || true
          docker network rm multi-node_tron-net 2>/dev/null || true
          docker rm -f integration-multinode 2>/dev/null || true

      - name: Upload test reports
        if: always()
        uses: actions/upload-artifact@v6
        with:
          name: integration-multinode-report
          path: integration-reports/
          if-no-files-found: warn
78 changes: 78 additions & 0 deletions .github/workflows/integration-test-single-node.yml
@@ -0,0 +1,78 @@
name: Integration Test Single Node (Full)

on:
  pull_request:
Collaborator

[DISCUSS] Is this workflow intended to be a hard merge gate, or an informational signal?

The PR description frames the new workflows as 'wiring it into PR CI catches integration regressions before merge instead of after' — i.e., a real gate. However, the workflow files alone do not block merge: even if Integration Test Single Node Full or Integration Test Multinode Full fails red, GitHub will still allow merge unless these check names are explicitly added to the repository's required status checks under Branches → Branch protection rules.

That configuration lives outside the repository tree and is not part of this PR, so two questions to clarify before / soon after merge:

  1. Intent: are these meant to be required checks on release_v4.8.2 / develop, or informational only for now (e.g., let the new suite bake for 1–2 release cycles before promoting to required)?
  2. Rollout: if required, who will add Integration Test Single Node Full (JDK 8 / x86_64) and Integration Test Multinode Full (JDK 8 / x86_64) to the branch protection rules, on which protected branches, and when? Without that follow-up, replacing system-test.yml (which is not currently required either, but was conceptually the gate) silently downgrades the gating posture: failures become advisory.

If gating is the goal, suggest tracking the branch-protection update as an explicit follow-up in the PR description (same vein as the existing 'two java.version assertions' and 'multinode cost' follow-ups). If informational-only is the goal, please say so in the PR description so reviewers don't assume blocking semantics.

Contributor Author

Confirming: the intent is for both workflows to be hard merge gates on release_v4.8.2 and develop, not informational signals. You're right that the workflow files alone don't enforce that; the branch protection update is a follow-up that lives outside this PR.

The reasoning: system-test.yml has historically been the on-chain-behavior gate, covering gRPC/HTTP/JSON-RPC responses, governance lifecycle, V2 staking, transaction validation, and multi-witness consensus boundaries. The new workflows cover the same surface with strictly tighter assertions (JUnit 5 + AssertJ hard assertions on exact values, instead of isNotNull/isNotEmpty checks). Replacing system-test.yml without making the new workflows required would silently downgrade the gating posture, which is exactly the risk you flagged.

The two workflows are complementary, not redundant:

  • Integration Test Single Node (Full) runs the suite against one FullNode. Covers the bulk of on-chain behavior: API correctness (gRPC / HTTP / JSON-RPC response shapes and values), transaction validation, governance proposal lifecycle, V2 staking, smart contract execution, event subscription. Anything that depends on a single node's state machine.
  • Integration Test Multinode (Full) spins up a 3-witness docker-compose stack. Covers what one node can't observe: witness rotation across maintenance boundaries, block propagation between peers, solidification dynamics with multiple confirmations, view-change behavior, isolated-land detection, multi-witness-specific config matrices (RocksDB on node2, native event queue on node3, prometheus on node1, etc.).

A regression in chain parameter logic typically shows up only in single-node; a regression in consensus / P2P / witness scheduling typically shows up only in multinode. Removing either one leaves a class of regressions undetectable at PR time — which is why both need to be required.
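
Since the gate depends on configuration outside the repository tree, the follow-up could be verified with a small check. The sketch below is illustrative only: `missingRequiredChecks` is a hypothetical helper, and the `required_status_checks.contexts` shape is an assumption about the branch-protection response, so it should be confirmed against the REST API documentation before use. The check names are the job `name:` values from the two workflows in this PR.

```javascript
// Check names taken from the job `name:` fields of the two new workflows.
const EXPECTED_CHECKS = [
  'Integration Test Single Node Full (JDK 8 / x86_64)',
  'Integration Test Multinode Full (JDK 8 / x86_64)',
];

// `protection` is assumed to look like a branch-protection response:
//   { required_status_checks: { contexts: ['check name', ...] } }
// A branch without required status checks is assumed to lack that key.
function missingRequiredChecks(protection, expected = EXPECTED_CHECKS) {
  const contexts = protection?.required_status_checks?.contexts ?? [];
  const required = new Set(contexts);
  // Keep only the expected names that are not yet required on the branch.
  return expected.filter((name) => !required.has(name));
}
```

An empty result for both develop and the release branch would indicate that a red integration run actually blocks merge; any returned name is still advisory.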

    branches: [ 'develop', 'release_**' ]
    types: [ opened, synchronize, reopened ]
    paths-ignore: [ '**/*.md', '.gitignore', '**/.gitignore', '.editorconfig',
                    '.gitattributes', 'docs/**', 'CHANGELOG', '.github/ISSUE_TEMPLATE/**',
                    '.github/PULL_REQUEST_TEMPLATE/**', '.github/CODEOWNERS' ]
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  integration:
    name: Integration Test Single Node Full (JDK 8 / x86_64)
    runs-on: ubuntu-latest
    timeout-minutes: 45

    steps:
      - name: Checkout java-tron
        uses: actions/checkout@v5

      - name: Set up JDK 8
        uses: actions/setup-java@v5
        with:
          java-version: '8'
          distribution: 'temurin'

      - name: Cache Gradle packages
        uses: actions/cache@v4
        with:
          path: |
            ~/.gradle/caches
            ~/.gradle/wrapper
          key: ${{ runner.os }}-gradle-integration-test-${{ hashFiles('**/*.gradle', '**/gradle-wrapper.properties') }}
          restore-keys: ${{ runner.os }}-gradle-integration-test-

      - name: Build FullNode.jar
        run: ./gradlew clean build -x test --no-daemon

      - name: Pull integration-test image
        run: docker pull troninfra/troninfra-ci:latest

      - name: Run integration tests
        run: |
          # JAVA_HOME=JDK 8 so FullNode runs on the same JVM family as
          # production (a few assertions check `java.version` starts with
          # "1.8"). JAVA_HOME_17 keeps Gradle on JDK 17 for the test
          # tooling, which requires Java 17.
          docker run --name integration-test \
            -e FULLNODE_JAR=/javatron/FullNode.jar \
            -e JAVA_HOME=/usr/lib/jvm/temurin-8 \
            -e JAVA_HOME_17=/opt/java/openjdk \
            -v "${{ github.workspace }}/build/libs/FullNode.jar:/javatron/FullNode.jar:ro" \
            troninfra/troninfra-ci:latest \
            --clean

      - name: Extract test reports from container
        if: always()
        run: |
          mkdir -p integration-reports
          docker cp integration-test:/app/build/reports/. integration-reports/reports/ 2>/dev/null || true
          docker cp integration-test:/app/build/test-results/. integration-reports/test-results/ 2>/dev/null || true
          docker cp integration-test:/app/build/test-output.log integration-reports/ 2>/dev/null || true
          docker cp integration-test:/app/node/node.log integration-reports/ 2>/dev/null || true
          docker cp integration-test:/app/node/data/logs/tron.log integration-reports/ 2>/dev/null || true
          docker rm -f integration-test 2>/dev/null || true

      - name: Upload test reports
        if: always()
        uses: actions/upload-artifact@v6
        with:
          name: integration-test-report
          path: integration-reports/
          if-no-files-found: warn
66 changes: 39 additions & 27 deletions .github/workflows/pr-cancel.yml
@@ -13,43 +13,55 @@ jobs:
     if: github.event.pull_request.merged == false
     runs-on: ubuntu-latest
     steps:
-      - name: Cancel PR Build and System Test
+      - name: Cancel PR Build and Integration Tests
         uses: actions/github-script@v8
         with:
           script: |
-            const workflows = ['pr-build.yml', 'system-test.yml', 'codeql.yml'];
+            const workflows = [
+              'pr-build.yml',
+              'codeql.yml',
+              'integration-test-single-node.yml',
+              'integration-test-multinode.yml',
+            ];
             const headSha = context.payload.pull_request.head.sha;
             const prNumber = context.payload.pull_request.number;

             for (const workflowId of workflows) {
-              for (const status of ['in_progress', 'queued']) {
-                const runs = await github.paginate(
-                  github.rest.actions.listWorkflowRuns,
-                  {
-                    owner: context.repo.owner,
-                    repo: context.repo.repo,
-                    workflow_id: workflowId,
-                    status,
-                    event: 'pull_request',
-                    per_page: 100,
-                  },
-                  (response) => response.data.workflow_runs
-                );
-
-                for (const run of runs) {
-                  if (!run) {
-                    continue;
-                  }
-                  const prs = Array.isArray(run.pull_requests) ? run.pull_requests : [];
-                  const isTargetPr = prs.length === 0 || prs.some((pr) => pr.number === prNumber);
-                  if (run.head_sha === headSha && isTargetPr) {
-                    await github.rest.actions.cancelWorkflowRun({
+              // Wrap each workflow iteration so a missing / renamed file
+              // doesn't take down the whole cancel job — other workflows
+              // in the list still get processed.
+              try {
+                for (const status of ['in_progress', 'queued']) {
+                  const runs = await github.paginate(
+                    github.rest.actions.listWorkflowRuns,
+                    {
                       owner: context.repo.owner,
                       repo: context.repo.repo,
-                      run_id: run.id,
-                    });
-                    console.log(`Cancelled ${workflowId} run #${run.id} (${status})`);
+                      workflow_id: workflowId,
+                      status,
+                      event: 'pull_request',
+                      per_page: 100,
+                    },
+                    (response) => response.data.workflow_runs
+                  );
+
+                  for (const run of runs) {
+                    if (!run) {
+                      continue;
+                    }
+                    const prs = Array.isArray(run.pull_requests) ? run.pull_requests : [];
+                    const isTargetPr = prs.length === 0 || prs.some((pr) => pr.number === prNumber);
+                    if (run.head_sha === headSha && isTargetPr) {
+                      await github.rest.actions.cancelWorkflowRun({
+                        owner: context.repo.owner,
+                        repo: context.repo.repo,
+                        run_id: run.id,
+                      });
+                      console.log(`Cancelled ${workflowId} run #${run.id} (${status})`);
+                    }
                   }
                 }
+              } catch (err) {
+                console.log(`Skipping ${workflowId}: ${err.message}`);
               }
             }
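
The run-matching rule used by the cancel script can be read in isolation as a pure predicate. This is a sketch for illustration, not part of the PR: `shouldCancel` is a hypothetical name, and the stated reason for the empty-list fallback (some runs may report no associated pull requests) is an assumption rather than something the diff states.

```javascript
// Decide whether a workflow run belongs to the PR that was just closed.
// Mirrors the condition in the cancel script; `shouldCancel` is a
// hypothetical name used only for this illustration.
function shouldCancel(run, headSha, prNumber) {
  if (!run) {
    return false;
  }
  const prs = Array.isArray(run.pull_requests) ? run.pull_requests : [];
  // An empty list is treated as "PR unknown", so the head SHA alone decides.
  const isTargetPr = prs.length === 0 || prs.some((pr) => pr.number === prNumber);
  return run.head_sha === headSha && isTargetPr;
}
```

A run is cancelled only when its head SHA matches the closed PR's head; the PR-number check narrows the match further when the run lists its pull requests.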
95 changes: 0 additions & 95 deletions .github/workflows/system-test.yml

This file was deleted.
