[fix](transaction) select txn insert backend from current cluster #63634

Open
sollhui wants to merge 1 commit into apache:master from sollhui:fix_txn_insert

@sollhui
Contributor

@sollhui sollhui commented May 25, 2026

What problem does this PR solve?

Problem Summary:

In cloud mode with multiple compute groups, a transactional INSERT INTO ... VALUES statement may fail with:

Cannot invoke "org.apache.doris.system.Backend.getHost()" because "backend" is null

The root cause is that InsertStreamTxnExecutor selected a backend id from all clusters through selectBackendIdsByPolicy(policy, 1), but then looked up the selected id from getBackendsByCurrentCluster(). If the selected backend belonged to another compute group, the lookup returned null and FE hit an NPE when calling backend.getHost().

This PR changes txn insert backend selection to use the current cluster backend snapshot as the candidate list, so the selected backend is always from the current compute group.
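
The mismatch described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR or from Doris: the `Backend` class, `pickFromAllThenLookup`, and `pickFromCurrentGroup` are hypothetical stand-ins, and taking the first candidate is only a placeholder for a real selection policy. It shows why choosing an id from one set and looking it up in another can yield null, and how drawing candidates from the same snapshot used for the lookup avoids that.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class TxnBackendSelectionSketch {

    // Hypothetical stand-in for a backend node; not the real Doris Backend class.
    static final class Backend {
        final long id;
        final String host;
        final String computeGroup;

        Backend(long id, String host, String computeGroup) {
            this.id = id;
            this.host = host;
            this.computeGroup = computeGroup;
        }
    }

    // Mismatched pattern: the id is chosen from every compute group, but the
    // lookup only knows the current group. Returns null for a foreign id.
    static Backend pickFromAllThenLookup(List<Backend> allBackends,
                                         Map<Long, Backend> currentGroup) {
        long selectedId = allBackends.get(0).id; // placeholder for a selection policy
        return currentGroup.get(selectedId);
    }

    // Consistent pattern: candidates come from the same snapshot that is used
    // to resolve the backend, so a selected backend is never null.
    static Optional<Backend> pickFromCurrentGroup(Map<Long, Backend> currentGroup) {
        return currentGroup.values().stream().findFirst();
    }

    public static void main(String[] args) {
        Backend a = new Backend(1L, "10.0.0.1", "group_a");
        Backend b = new Backend(2L, "10.0.0.2", "group_b");

        // The session's current compute group contains only backend b.
        Map<Long, Backend> currentGroup = new LinkedHashMap<>();
        currentGroup.put(b.id, b);

        Backend mismatched = pickFromAllThenLookup(List.of(a, b), currentGroup);
        System.out.println("mismatched lookup is null: " + (mismatched == null));

        String host = pickFromCurrentGroup(currentGroup)
                .map(x -> x.host)
                .orElseThrow(() -> new IllegalStateException(
                        "no backend in current compute group"));
        System.out.println("selected host: " + host);
    }
}
```

Returning an `Optional` in the sketch is a design choice for illustration only: it forces the caller to handle an empty compute group explicitly instead of dereferencing a null backend.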

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@sollhui
Contributor Author

sollhui commented May 25, 2026

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 31172 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3e12fdffcd59bafc8f81744a047257a1db297448, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17799	4013	3977	3977
q2	
q3	10863	1387	819	819
q4	4691	478	355	355
q5	7604	2251	2078	2078
q6	347	176	134	134
q7	988	769	640	640
q8	9429	1671	1661	1661
q9	7010	4954	4989	4954
q10	6421	2238	1875	1875
q11	432	272	240	240
q12	695	428	293	293
q13	18214	3380	2751	2751
q14	271	257	248	248
q15	
q16	836	781	713	713
q17	985	888	931	888
q18	6960	5794	5536	5536
q19	1280	1307	1138	1138
q20	499	396	260	260
q21	5844	2665	2304	2304
q22	442	350	308	308
Total cold run time: 101610 ms
Total hot run time: 31172 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4363	4271	4264	4264
q2	
q3	4538	4957	4350	4350
q4	2103	2235	1403	1403
q5	4459	4315	4898	4315
q6	247	196	151	151
q7	2052	1895	1596	1596
q8	2443	2167	2195	2167
q9	8061	7962	8039	7962
q10	4824	4899	4335	4335
q11	596	474	384	384
q12	742	767	540	540
q13	3343	3638	3008	3008
q14	306	319	273	273
q15	
q16	710	734	663	663
q17	1361	1354	1342	1342
q18	7883	7256	6906	6906
q19	1145	1121	1106	1106
q20	2219	2233	1961	1961
q21	5306	4588	4509	4509
q22	532	466	404	404
Total cold run time: 57233 ms
Total hot run time: 51639 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 172297 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3e12fdffcd59bafc8f81744a047257a1db297448, data reload: false

query5	4336	667	529	529
query6	335	221	199	199
query7	4219	563	298	298
query8	340	232	233	232
query9	8848	4116	4124	4116
query10	469	349	310	310
query11	5795	2513	2281	2281
query12	182	130	128	128
query13	1309	644	424	424
query14	6139	5528	5269	5269
query14_1	4547	4551	4529	4529
query15	217	208	181	181
query16	993	462	393	393
query17	1151	751	615	615
query18	2637	501	368	368
query19	223	209	177	177
query20	145	132	131	131
query21	219	147	122	122
query22	13602	13671	13302	13302
query23	17271	16593	16216	16216
query23_1	16474	16307	16428	16307
query24	7567	1790	1344	1344
query24_1	1340	1299	1331	1299
query25	618	503	437	437
query26	1351	333	178	178
query27	2679	577	351	351
query28	4450	2060	2016	2016
query29	1027	651	515	515
query30	313	239	194	194
query31	1144	1099	960	960
query32	93	79	74	74
query33	580	381	314	314
query34	1210	1161	659	659
query35	782	808	708	708
query36	1378	1441	1251	1251
query37	161	103	84	84
query38	3226	3140	3072	3072
query39	925	932	900	900
query39_1	876	881	867	867
query40	225	142	118	118
query41	66	64	62	62
query42	108	107	107	107
query43	330	338	296	296
query44	
query45	210	206	196	196
query46	1073	1208	762	762
query47	2466	2386	2273	2273
query48	428	426	310	310
query49	661	488	370	370
query50	1021	355	265	265
query51	4401	4340	4400	4340
query52	105	105	97	97
query53	264	287	207	207
query54	328	265	249	249
query55	94	90	85	85
query56	291	293	309	293
query57	1479	1448	1391	1391
query58	295	268	254	254
query59	1655	1774	1581	1581
query60	324	323	300	300
query61	159	155	156	155
query62	714	656	589	589
query63	249	213	220	213
query64	2457	821	668	668
query65	
query66	1995	485	390	390
query67	29741	29548	29523	29523
query68	
query69	467	337	290	290
query70	1006	1031	1020	1020
query71	298	273	268	268
query72	2993	2667	2415	2415
query73	850	721	445	445
query74	5086	4951	4795	4795
query75	2696	2599	2268	2268
query76	2319	1145	771	771
query77	412	412	345	345
query78	12271	12623	11889	11889
query79	1473	1097	711	711
query80	1284	550	456	456
query81	508	276	243	243
query82	1337	161	118	118
query83	359	290	248	248
query84	268	144	113	113
query85	934	522	472	472
query86	438	381	322	322
query87	3470	3382	3277	3277
query88	3611	2768	2711	2711
query89	455	390	340	340
query90	1818	185	179	179
query91	180	184	146	146
query92	76	78	75	75
query93	1569	1436	848	848
query94	650	363	309	309
query95	675	470	361	361
query96	1071	879	362	362
query97	2753	2785	2610	2610
query98	241	248	226	226
query99	1182	1146	1025	1025
Total cold run time: 256149 ms
Total hot run time: 172297 ms

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 75.00% (6/8) 🎉
Increment coverage report
Complete coverage report
