
feat(benchmark): add append/pk table benchmark #302

Open
lucasfang wants to merge 3 commits into alibaba:main from lucasfang:benchmark

@lucasfang
Collaborator

Purpose

Linked issue: close #xxx

  • Add a Google Benchmark-based performance suite covering:

    1. BM_Write (append table write)
    2. BM_Read (append table read with prefetch_parallel variants)
    3. BM_PK_Write (primary key table write)
    4. BM_MOR_Read (primary key MOR read with prefetch_parallel variants)
  • Add a benchmark entrypoint and custom CLI parsing for:

    1. --paimon_source_parquet
    2. --paimon_external_table_path
    3. --paimon_file_format
    4. --paimon_pk_columns
    5. --paimon_option (repeatable)
  • Add benchmark helper utilities for:

    1. Validation and skip behavior for invalid configurations
    2. Shared read iteration execution
    3. External table read mode (skip pre-write stage)
    4. Source parquet loading and cache reuse
  • Build integration updates:

    1. Add PAIMON_BUILD_BENCHMARKS option (default OFF)
    2. Add benchmark subdirectory and benchmark target
    3. Add add_paimon_benchmark macro and benchmark labels
    4. Add benchmark dependency discovery via FindbenchmarkAlt
    5. Add PAIMON_BENCHMARK_BUILD_VERSION=1.9.1 in third_party versions

Tests

UT:

  1. CliOptionParsingTest.ConsumeCliOptionWorks
  2. CliOptionParsingTest.ParseCsvColumnsWorks
  3. CliOptionParsingTest.ParseCsvColumnsRejectsInvalidInput
  4. CliOptionParsingTest.ParseDelimitedOptionsWorks
  5. CliOptionParsingTest.ParseDelimitedOptionsRejectsInvalidInput
  6. CliOptionParsingTest.ParseStringOptionArgWorksForEqualsAndSeparatedForms
  7. CliOptionParsingTest.ParseStringOptionArgRejectsMissingValue
  8. CliOptionParsingTest.ParseCsvOptionArgAndDelimitedRepeatableOptionArgWorks
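
The test names above suggest that the string-option helper accepts both the `--flag=value` and `--flag value` forms and rejects a missing value, and that the CSV helper rejects malformed input. The following is a minimal sketch of that behavior under stated assumptions: `ReadStringOption` and `SplitCsvColumns` are hypothetical names and signatures chosen for illustration, not the helpers in benchmark/cli_option_parsing.h, and treating an empty field as invalid is only one plausible reading of "RejectsInvalidInput".

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: reads the value of option `name` at argv[*index].
// Accepts "--name=value" and "--name value". Returns std::nullopt when
// argv[*index] is a different option or the value is missing or empty.
// For the separated form, *index is advanced to the consumed value.
inline std::optional<std::string> ReadStringOption(std::string_view name, int argc,
                                                   const char* const* argv, int* index) {
    const std::string_view arg = argv[*index];
    const std::string prefix = std::string(name) + "=";
    if (arg.substr(0, prefix.size()) == prefix) {
        std::string value(arg.substr(prefix.size()));
        if (value.empty()) {
            return std::nullopt;  // "--name=" carries no value
        }
        return value;
    }
    if (arg == name && *index + 1 < argc) {
        ++*index;
        return std::string(argv[*index]);
    }
    return std::nullopt;
}

// Hypothetical helper: splits "a,b,c" into {"a", "b", "c"}.
// Returns std::nullopt if any field is empty (e.g. "", "a,,b", "a,").
inline std::optional<std::vector<std::string>> SplitCsvColumns(std::string_view csv) {
    std::vector<std::string> columns;
    std::size_t start = 0;
    while (true) {
        const std::size_t comma = csv.find(',', start);
        const std::string_view field =
            comma == std::string_view::npos ? csv.substr(start)
                                            : csv.substr(start, comma - start);
        if (field.empty()) {
            return std::nullopt;
        }
        columns.emplace_back(field);
        if (comma == std::string_view::npos) {
            break;
        }
        start = comma + 1;
    }
    return columns;
}
```

Under these assumptions, a caller could loop over argv, try `ReadStringOption("--paimon_pk_columns", ...)` on each position, and pass a returned value to `SplitCsvColumns`; a `std::nullopt` from either step would map to a usage error.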

Benchmark smoke:

  1. BM_Write
  2. BM_Read/1, BM_Read/2, BM_Read/4
  3. BM_PK_Write
  4. BM_MOR_Read/1, BM_MOR_Read/2, BM_MOR_Read/4

IT:

  1. No new integration tests in this PR (benchmark and build integration scope)

API and Format

  • No change to public API under include.
  • No storage format or protocol compatibility changes.
  • Changes are limited to benchmark executable behavior and build wiring.

Documentation

  • No user-facing documentation change required for this PR.
  • Benchmark usage documentation can be added separately if needed.

Generative AI tooling

Generated-by: GitHub Copilot (GPT-5.3-Codex)

Copilot AI review requested due to automatic review settings May 26, 2026 01:56

Copilot AI left a comment

Pull request overview

This PR introduces an optional Google Benchmark-based performance suite for Paimon (append-table write/read and PK-table write/MOR read), along with CMake/third-party wiring to build and run the benchmarks via CTest labels.

Changes:

  • Add a PAIMON_BUILD_BENCHMARKS build option and a benchmark CTest label/target integration.
  • Vendor/resolve Google Benchmark as a dependency (bundled/system) and add a FindbenchmarkAlt.cmake module.
  • Add a benchmark executable with custom CLI parsing and shared helper utilities + unit tests for CLI parsing.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| third_party/versions.txt | Adds Google Benchmark version metadata for bundled dependency download. |
| CMakeLists.txt | Adds PAIMON_BUILD_BENCHMARKS option + benchmark CTest target/labels and subdir wiring. |
| cmake_modules/ThirdpartyToolchain.cmake | Adds Benchmark dependency resolution and bundled build rule. |
| cmake_modules/FindbenchmarkAlt.cmake | Adds a “system” find module for Google Benchmark. |
| cmake_modules/DefineOptions.cmake | Adds PAIMON_BUILD_BENCHMARKS and Benchmark_SOURCE options. |
| cmake_modules/BuildUtils.cmake | Adds add_paimon_benchmark / add_benchmark_case helpers and CTest labeling. |
| benchmark/CMakeLists.txt | Defines benchmark executable target and CLI parsing unit test target. |
| benchmark/read_write_benchmark.cpp | Benchmark entrypoint with custom CLI parsing + Google Benchmark init/run. |
| benchmark/cli_option_parsing.h | Inline parsing helpers for custom benchmark CLI options. |
| benchmark/cli_option_parsing_test.cpp | GTest coverage for CLI parsing helpers. |
| benchmark/benchmark_suite.h | Declares benchmark runner functions and CLI helpers. |
| benchmark/benchmark_suite.cpp | Implements benchmark suite: table setup, write/commit, read iterations, caching, CLI handling. |
| benchmark/benchmark_helpers.h | Declares shared validation/skip and read-iteration helpers. |
| benchmark/benchmark_helpers.cpp | Implements validation/skip behavior and shared read-iteration runner. |
| benchmark/benchmark_case_write.cpp | Registers BM_Write benchmark. |
| benchmark/benchmark_case_read.cpp | Registers BM_Read benchmark variants. |
| benchmark/benchmark_case_pk_write.cpp | Registers BM_PK_Write benchmark. |
| benchmark/benchmark_case_mor_read.cpp | Registers BM_MOR_Read benchmark variants. |

Comment thread third_party/versions.txt
Comment on lines +63 to +65
PAIMON_BENCHMARK_BUILD_VERSION=1.9.1
PAIMON_BENCHMARK_PKG_NAME=benchmark-${PAIMON_BENCHMARK_BUILD_VERSION}.tar.gz

Comment on lines +1754 to +1758
externalproject_add(benchmark_ep
                    URL ${BENCHMARK_SOURCE_URL}
                    CMAKE_ARGS ${BENCHMARK_CMAKE_ARGS}
                    BUILD_BYPRODUCTS "${BENCHMARK_STATIC_LIB}" "${BENCHMARK_MAIN_STATIC_LIB}")

Comment thread cmake_modules/FindbenchmarkAlt.cmake Outdated
Comment on lines +15 to +19
set(_PAIMON_BENCHMARK_ROOTS ${Benchmark_ROOT} ${benchmark_ROOT} ${PAIMON_PACKAGE_PREFIX})
list(REMOVE_ITEM _PAIMON_BENCHMARK_ROOTS "")
if(_PAIMON_BENCHMARK_ROOTS)
    set(_PAIMON_BENCHMARK_FIND_ARGS HINTS ${_PAIMON_BENCHMARK_ROOTS} NO_DEFAULT_PATH)
endif()
Comment thread CMakeLists.txt
Comment on lines +437 to +439
if(TARGET benchmark::benchmark_main)
    list(APPEND PAIMON_BENCHMARK_LINK_TOOLCHAIN benchmark::benchmark_main)
endif()
Comment thread benchmark/CMakeLists.txt
Comment on lines +23 to +40
"-Wl,--whole-archive"
paimon_local_file_system_shared
"-Wl,--no-whole-archive"
"-Wl,--no-as-needed"
paimon_parquet_file_format_shared
paimon_blob_file_format_shared
"-Wl,--as-needed")

if(PAIMON_ENABLE_ORC)
    list(APPEND PAIMON_BENCHMARK_STATIC_LINK_LIBS "-Wl,--no-as-needed")
    list(APPEND PAIMON_BENCHMARK_STATIC_LINK_LIBS paimon_orc_file_format_shared)
    list(APPEND PAIMON_BENCHMARK_STATIC_LINK_LIBS "-Wl,--as-needed")
endif()

if(PAIMON_ENABLE_AVRO)
    list(APPEND PAIMON_BENCHMARK_STATIC_LINK_LIBS "-Wl,--no-as-needed")
    list(APPEND PAIMON_BENCHMARK_STATIC_LINK_LIBS paimon_avro_file_format_shared)
    list(APPEND PAIMON_BENCHMARK_STATIC_LINK_LIBS "-Wl,--as-needed")
Comment thread benchmark/CMakeLists.txt
Comment on lines +56 to +60
${PAIMON_BENCHMARK_STATIC_LINK_LIBS}
Threads::Threads
${CMAKE_DL_LIBS}
rt
${PAIMON_BENCHMARK_LINK_TOOLCHAIN})
Comment on lines +155 to +164
struct BenchmarkWorkspace {
    explicit BenchmarkWorkspace(const std::string& prefix) {
        root_path = "/tmp/" + prefix + "_" + std::to_string(NextId());
        EnsureDirectory(root_path);
    }

    ~BenchmarkWorkspace() {
        const std::string cleanup_cmd = "rm -rf '" + root_path + "'";
        std::system(cleanup_cmd.c_str());
    }
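
The excerpt above builds the workspace path under a hard-coded /tmp and removes it by passing an `rm -rf` command to `std::system`. For comparison, here is a hedged sketch of the same scoped-directory idea written with C++17 `std::filesystem`, which avoids shell quoting concerns and the fixed temp location. `ScopedWorkspace` and its members are hypothetical names for illustration; this is not the code in the PR, and, like the excerpt, its counter only makes names unique within a single process.

```cpp
#include <atomic>
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical sketch: a directory under the system temp location that is
// created on construction and removed recursively on destruction.
class ScopedWorkspace {
 public:
    explicit ScopedWorkspace(const std::string& prefix)
        : root_path_(std::filesystem::temp_directory_path() /
                     (prefix + "_" + std::to_string(NextId()))) {
        // Throws std::filesystem::filesystem_error if the directory cannot be created.
        std::filesystem::create_directories(root_path_);
    }

    ~ScopedWorkspace() {
        // Use the error_code overload so the destructor never throws;
        // cleanup failures are deliberately ignored in this sketch.
        std::error_code ec;
        std::filesystem::remove_all(root_path_, ec);
    }

    // Non-copyable: two owners would both try to delete the same directory.
    ScopedWorkspace(const ScopedWorkspace&) = delete;
    ScopedWorkspace& operator=(const ScopedWorkspace&) = delete;

    const std::filesystem::path& root_path() const { return root_path_; }

 private:
    // Process-local counter; not unique across concurrently running processes.
    static int NextId() {
        static std::atomic<int> counter{0};
        return counter.fetch_add(1);
    }

    std::filesystem::path root_path_;
};
```

A benchmark body could construct a `ScopedWorkspace` at the top of its setup and pass `root_path().string()` wherever a table path is needed; the directory is removed when the object goes out of scope.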
}

if (paimon::benchmark::HasHelpFlag(argc, argv)) {
    paimon::benchmark::PrintPaimonBenchmarkCliHelp();
Comment thread cmake_modules/ThirdpartyToolchain.cmake Outdated
Comment on lines +1744 to +1745
set(BENCHMARK_STATIC_LIB "${BENCHMARK_PREFIX}/lib/libbenchmark.a")
set(BENCHMARK_MAIN_STATIC_LIB "${BENCHMARK_PREFIX}/lib/libbenchmark_main.a")