HADOOP-19901: Release checksum buffers after vectored read verification in ChecksumFileSystem #8511

Open: iemejia wants to merge 1 commit into apache:trunk from iemejia:HADOOP-19901

@iemejia (Member) commented May 23, 2026

Description

ChecksumFileSystem.readVectored() allocates buffers for both data ranges and checksum ranges through the caller's allocator (IntFunction<ByteBuffer>). Before this change, the checksum buffers were never released after checksum verification completed, because:

  1. The caller has no reference to the checksum buffers (only the data range results are visible)
  2. ChecksumFileSystem itself never called the release consumer on them

This causes buffer leaks when callers use a tracking or pooled allocator. This bug was discovered while working on Apache Parquet's TrackingByteBufferAllocator, which detected unreleased buffers accumulating during vectored reads through ChecksumFileSystem.
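
To make the leak concrete, here is a minimal sketch of how a tracking allocator can notice it. The class TrackingAllocatorSketch and the method simulateLeakyRead are hypothetical names used only for illustration; this is not Parquet's TrackingByteBufferAllocator and it does not call any Hadoop code. It only assumes the allocator/release shapes named above (IntFunction<ByteBuffer> and a Consumer<ByteBuffer>).

```java
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.IntFunction;

/** Hypothetical stand-in for a tracking allocator; not Parquet's class. */
final class TrackingAllocatorSketch {
  // Identity-based set: ByteBuffer.equals compares contents, so a plain
  // HashSet could treat two distinct buffers as the same entry.
  private final Set<ByteBuffer> outstanding =
      Collections.newSetFromMap(new IdentityHashMap<>());

  /** Allocator handed to the reader; records every buffer it creates. */
  final IntFunction<ByteBuffer> allocate = size -> {
    ByteBuffer buffer = ByteBuffer.allocate(size);
    outstanding.add(buffer);
    return buffer;
  };

  /** Release consumer; forgets a buffer once it is handed back. */
  final Consumer<ByteBuffer> release = outstanding::remove;

  int outstandingCount() {
    return outstanding.size();
  }

  /**
   * Simulates a reader that allocates one data buffer (returned to the
   * caller) and one internal checksum buffer (never released).
   */
  static int simulateLeakyRead() {
    TrackingAllocatorSketch tracker = new TrackingAllocatorSketch();
    ByteBuffer data = tracker.allocate.apply(1024);
    ByteBuffer checksums = tracker.allocate.apply(8); // internal, invisible to the caller
    tracker.release.accept(data); // the caller can only release what it received
    return tracker.outstandingCount();
  }

  public static void main(String[] args) {
    System.out.println("outstanding buffers: " + simulateLeakyRead());
  }
}
```

The caller releases every buffer it can see, yet one buffer stays outstanding, which is the symptom described above.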

Root Cause

In ChecksumFileSystem.ChecksumFSInputChecker.readVectored(), checksum ranges are read via sums.readVectored(checksumRanges, allocate, release). The checksum buffers are used in thenCombineAsync calls for verification, but after verification completes, no code released them back to the caller's pool.

A single checksum range can cover multiple data ranges, so the checksum buffer is shared across multiple thenCombineAsync calls. It must therefore be released only once, after all verifications that use it are complete.

Fix

After collecting all verification futures for a checksum range, use CompletableFuture.allOf(verifications).thenRun(...) to release the checksum buffer exactly once after all verifications complete:

CompletableFuture.allOf(verifications.toArray(new CompletableFuture[0]))
    .thenRun(() -> release.accept(checksumRange.getData().join()));

This ensures:

  • Exactly-once release (not once per data range that shares the checksum buffer)
  • Release happens only after the buffer is no longer in use
  • Works correctly with both the 2-arg API (no-op release) and the 3-arg API
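
The exactly-once property can be illustrated with plain JDK futures. The sketch below is not the patched Hadoop code: the class ReleaseOnceSketch, the placeholder "verification" lambda, and the way the futures are wired together are assumptions made for the example. Only CompletableFuture.allOf, thenRun, and thenCombineAsync are taken from the description above.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class ReleaseOnceSketch {

  /** Returns how often the release consumer ran for one shared checksum buffer. */
  static int releasesFor(int dataRangeCount) {
    AtomicInteger releases = new AtomicInteger();
    Consumer<ByteBuffer> release = buffer -> releases.incrementAndGet();

    // One checksum read whose buffer is shared by several data ranges.
    CompletableFuture<ByteBuffer> checksumRead = new CompletableFuture<>();

    List<CompletableFuture<ByteBuffer>> verifications = new ArrayList<>();
    for (int i = 0; i < dataRangeCount; i++) {
      CompletableFuture<ByteBuffer> dataRead =
          CompletableFuture.completedFuture(ByteBuffer.allocate(16));
      // Placeholder verification: a real one would compare checksums here.
      verifications.add(
          checksumRead.thenCombineAsync(dataRead, (sums, data) -> data));
    }

    // Simulate the checksum read finishing.
    checksumRead.complete(ByteBuffer.allocate(8));

    // Release once, and only after every verification has finished.
    CompletableFuture<Void> released = CompletableFuture
        .allOf(verifications.toArray(new CompletableFuture[0]))
        .thenRun(() -> release.accept(checksumRead.join()));

    released.join();
    return releases.get();
  }

  public static void main(String[] args) {
    System.out.println("releases for 3 data ranges: " + releasesFor(3));
  }
}
```

Releasing inside each verification callback instead would invoke the consumer once per data range, which is the double release that the allOf approach avoids.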

Testing

Added testChecksumBuffersReleasedAfterVectoredRead() to TestLocalFSContractVectoredRead:

  • Uses a counting allocator/release pair to track buffer lifecycle
  • Performs a vectored read through LocalFileSystem (which uses ChecksumFileSystem)
  • Asserts that the system calls the release consumer at least once for internal buffers
  • Confirmed the test fails without the fix (0 system-initiated releases) and passes with it
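
The shape of such a check might look like the following sketch. It is not the test added in this PR: CountingReleaseSketch and its read method are hypothetical stand-ins for the real stream, and the releaseInternalBuffers flag merely simulates running with and without the fix.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.IntFunction;

final class CountingReleaseSketch {

  /** Hypothetical reader standing in for the checksummed input stream. */
  static ByteBuffer read(IntFunction<ByteBuffer> allocate,
                         Consumer<ByteBuffer> release,
                         boolean releaseInternalBuffers) {
    ByteBuffer data = allocate.apply(1024);
    ByteBuffer checksums = allocate.apply(8);
    // Verification of data against checksums would happen here.
    if (releaseInternalBuffers) {
      release.accept(checksums); // behaviour with the fix
    }
    return data;
  }

  /** Counts releases performed by the reader before the caller releases anything. */
  static int systemInitiatedReleases(boolean releaseInternalBuffers) {
    AtomicInteger allocations = new AtomicInteger();
    AtomicInteger releases = new AtomicInteger();
    IntFunction<ByteBuffer> allocate = size -> {
      allocations.incrementAndGet();
      return ByteBuffer.allocate(size);
    };
    Consumer<ByteBuffer> release = buffer -> releases.incrementAndGet();

    ByteBuffer data = read(allocate, release, releaseInternalBuffers);
    int bySystem = releases.get(); // snapshot before the caller's own release
    release.accept(data);          // the caller returns the buffer it received
    return bySystem;
  }

  public static void main(String[] args) {
    System.out.println("without fix: " + systemInitiatedReleases(false));
    System.out.println("with fix:    " + systemInitiatedReleases(true));
  }
}
```

Taking the snapshot before the caller releases its own buffer is what separates system-initiated releases from caller-initiated ones.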

All 54 existing vectored read contract tests continue to pass.

JIRA

https://issues.apache.org/jira/browse/HADOOP-19901

…on in ChecksumFileSystem

ChecksumFileSystem.readVectored() allocates buffers for both data ranges
and checksum ranges through the caller's allocator. After verification,
the checksum buffers were never released because the caller has no
reference to them and the system did not release them.

This causes buffer leaks when callers use a tracking/pooled allocator
(e.g., Apache Parquet's TrackingByteBufferAllocator), as the checksum
buffers accumulate without being returned to the pool.

Fix: After all verifications for a checksum range complete (via
CompletableFuture.allOf), release the checksum buffer using the
caller-provided release consumer. This ensures exactly-once release
even when one checksum range covers multiple data ranges.

Added a regression test that verifies the release consumer is called
for checksum buffers after vectored read verification completes.
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 57s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 51m 8s trunk passed
+1 💚 compile 24m 9s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 compile 29m 54s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 checkstyle 1m 31s trunk passed
+1 💚 mvnsite 2m 2s trunk passed
+1 💚 javadoc 1m 26s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 1m 20s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 3m 18s trunk passed
+1 💚 shadedclient 37m 7s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 15s the patch passed
+1 💚 compile 16m 55s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javac 16m 55s the patch passed
+1 💚 compile 18m 6s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 javac 18m 6s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 26s the patch passed
+1 💚 mvnsite 1m 58s the patch passed
+1 💚 javadoc 1m 51s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 1m 39s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 4m 45s the patch passed
+1 💚 shadedclient 47m 28s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 32m 26s /patch-unit-hadoop-common-project_hadoop-common.txt hadoop-common in the patch passed.
+1 💚 asflicense 1m 27s The patch does not generate ASF License warnings.
Total runtime 282m 27s
Reason Tests
Failed junit tests hadoop.ipc.TestRPC
Subsystem Report/Notes
Docker ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8511/1/artifact/out/Dockerfile
GITHUB PR #8511
JIRA Issue HADOOP-19901
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux cbd57bcce7e7 5.15.0-177-generic #187-Ubuntu SMP Sat Apr 11 22:54:33 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 78376e9
Default Java Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8511/1/testReport/
Max. process+thread count 1312 (vs. ulimit of 10000)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8511/1/console
versions git=2.43.0 maven=3.9.15 spotbugs=4.9.7
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.
