[fix](statistics) full analyze not collect hot value by default #63625
Open
yujun777 wants to merge 2 commits into
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Full analyze can use excessive memory when collecting hot values for high-cardinality columns. This change adds a `collect.hot.value` analyze property: manual full analyze skips hot value collection by default, manual sample analyze keeps collecting hot values by default, and an explicit property overrides either default. Automatic analyze keeps the previous nullable internal setting, so its existing behavior does not change. The no-hot-value analyze SQL templates return a null `hot_value` directly instead of relying on optimizer simplification.
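The defaulting rules in the problem summary can be sketched as a single resolution function. This is a minimal sketch, not the Doris FE code: the class `HotValuePolicy`, its `resolve` method, and the assumption that automatic jobs carry a nullable `Boolean` setting are hypothetical; only the property name `collect.hot.value` comes from this PR.

```java
import java.util.Map;

/** Hypothetical illustration of the collect.hot.value defaults; not the Doris FE code. */
final class HotValuePolicy {
    static final String PROP = "collect.hot.value";

    private HotValuePolicy() {}

    /**
     * @param manual       true for a user-issued ANALYZE, false for an automatic job
     * @param sample       true for sample analyze, false for full analyze
     * @param props        PROPERTIES given on the statement (used for manual analyze only)
     * @param autoInternal nullable internal setting assumed to be carried by automatic jobs
     * @return whether to collect hot values, or null to keep the previous automatic behavior
     */
    static Boolean resolve(boolean manual, boolean sample,
                           Map<String, String> props, Boolean autoInternal) {
        if (!manual) {
            // Automatic analyze: pass the previous nullable setting through unchanged.
            return autoInternal;
        }
        String explicit = props.get(PROP);
        if (explicit != null) {
            // An explicit property overrides the default for both full and sample.
            // Assumes the value was already validated as "true" or "false".
            return Boolean.valueOf(explicit.trim());
        }
        // Defaults: manual full skips hot values, manual sample collects them.
        return sample;
    }
}
```

Returning `null` for automatic jobs keeps "no opinion" distinct from an explicit `false`, which matches the description's point that automatic analyze behavior is left as it was.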
### Release note
Manual `ANALYZE` supports `PROPERTIES("collect.hot.value"="true")` or `PROPERTIES("collect.hot.value"="false")` to control hot value collection.
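One reason to validate this property value strictly: `Boolean.parseBoolean` returns `false` for any string other than a case-insensitive `"true"`, so a typo such as `"ture"` would quietly disable hot value collection. The sketch below assumes strict validation is wanted; this PR does not say how the Doris FE validates the value, and `HotValueProperty.parse` is a hypothetical name.

```java
import java.util.Locale;

/** Hypothetical strict parser for the collect.hot.value property value. */
final class HotValueProperty {
    private HotValueProperty() {}

    /**
     * Accepts only "true" or "false" (case-insensitive, surrounding spaces ignored)
     * and rejects everything else instead of silently treating it as false.
     */
    static boolean parse(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("collect.hot.value must be true or false, got null");
        }
        String v = raw.trim().toLowerCase(Locale.ROOT);
        switch (v) {
            case "true":
                return true;
            case "false":
                return false;
            default:
                throw new IllegalArgumentException(
                        "collect.hot.value must be true or false, got '" + raw + "'");
        }
    }
}
```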
### Check List (For Author)
- Test: Unit Test and Regression test
    - `./build.sh --fe`
    - `bash ./run-fe-ut.sh --run OlapAnalysisTaskTest,HMSAnalysisTaskTest,AnalyzeTableCommandTest,AnalysisManagerTest`
    - `bash ./run-fe-ut.sh --run StatisticsUtilTest`
    - `sh run-regression-test.sh --run -d statistics -s test_full_analyze_hot_value,test_hot_value`
    - `sh run-regression-test.sh --run -d mv_p0/ssb/q_4_1_r1 -s q_4_1_r1`
    - `sh run-regression-test.sh --run -d nereids_rules_p0/distinct_split -s distinct_split`
- Behavior changed: Yes. Manual full analyze no longer collects hot values by default; manual sample analyze and automatic analyze keep their previous defaults unless the new property is explicitly set.
- Does this need documentation: Yes
Keep sample analyze hot value collection unchanged while making manual full analyze require an explicit `WITH HOT VALUE`. Auto full analyze continues to skip hot values, and auto sample continues to collect them.

Key changes:
- Add `WITH HOT VALUE` parsing for analyze statements.
- Reject `WITH HOT VALUE` on sample analyze, because sample analyze always collects hot values.
- Remove the sample no-hot-value SQL templates and keep the full no-hot-value SQL on the old lightweight path.
- Set auto sample jobs to collect hot values explicitly.

Unit Test:
- AnalyzeTableCommandTest
- OlapAnalysisTaskTest
- AnalysisManagerTest

Regression: test_hot_value, test_full_analyze_hot_value
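The rules in this commit message can be modelled as plain functions. This is a hypothetical sketch: `WithHotValueRules` and its methods are illustrative names, the error message is invented, and `hotValueExpr` stands in for whatever expression the real SQL template uses; only the `WITH HOT VALUE` clause and the `hot_value` column name come from this PR.

```java
/** Hypothetical model of the WITH HOT VALUE rules; not the Doris FE implementation. */
final class WithHotValueRules {
    private WithHotValueRules() {}

    /**
     * Manual ANALYZE: sample always collects hot values; full collects them
     * only when WITH HOT VALUE is given.
     */
    static boolean manual(boolean sample, boolean withHotValue) {
        if (sample && withHotValue) {
            // Sample analyze already collects hot values, so the clause is rejected.
            throw new IllegalArgumentException("WITH HOT VALUE is not allowed on sample analyze");
        }
        return sample || withHotValue;
    }

    /** Automatic analyze: full skips hot values, sample collects them explicitly. */
    static boolean auto(boolean sample) {
        return sample;
    }

    /**
     * Select-list item for hot_value. The no-hot-value path emits a literal NULL
     * instead of relying on the optimizer to simplify an expression away.
     * hotValueExpr is a caller-supplied placeholder, not the real Doris template.
     */
    static String hotValueColumn(boolean collect, String hotValueExpr) {
        return (collect ? hotValueExpr : "NULL") + " AS hot_value";
    }
}
```

Treating `WITH HOT VALUE` on sample analyze as an error rather than a no-op keeps the redundant clause from suggesting that sample analyze could otherwise skip hot values.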
TPC-H: Total hot run time: 31661 ms

TPC-DS: Total hot run time: 173783 ms

FE Regression Coverage Report: Increment line coverage
#62435 made full analyze always collect hot values, but execution may exceed the statistics SQL memory limit (2GB by default) on big tables.
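Why hot value collection can be memory-heavy on big tables: finding the most frequent values exactly needs one counter per distinct value, so memory scales with column cardinality rather than with the number of hot values kept. The generic Java illustration below shows that cost; it is not how Doris computes hot values (the statistics are gathered through SQL), and `HotValueMemoryDemo.topK` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Illustration only: exact top-k counting holds one counter per distinct value. */
final class HotValueMemoryDemo {
    private HotValueMemoryDemo() {}

    /** Returns up to k most frequent values, hottest first; ties are broken arbitrarily. */
    static List<String> topK(Iterable<String> column, int k) {
        // One entry per distinct value: this map is where the memory goes.
        Map<String, Long> counts = new HashMap<>();
        for (String v : column) {
            counts.merge(v, 1L, Long::sum);
        }
        // Min-heap by count, trimmed to k entries so only the hottest survive.
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue());
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll();
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        // The heap yields the smallest count first; reverse so the hottest comes first.
        Collections.reverse(result);
        return result;
    }
}
```

Even with `k = 1`, the `counts` map grows to hold every distinct value in the column, which is the high-cardinality cost the problem summary describes.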
Keep sample analyze hot value collection unchanged and make manual full analyze require an explicit `WITH HOT VALUE`. Auto full analyze continues to skip hot values and auto sample still collects them, so automatic analyze behavior does not change.
usage:
Tests:
Docs PR: apache/doris-website#3769