lib: use utf8 fast path for streaming TextDecoder by ChALkeR · Pull Request #61549 · nodejs/node

ChALkeR · 2026-01-27T18:17:16Z

Tracking: #61041

A continuation of #61409

This is based on the logic in https://github.com/ExodusOSS/bytes

Unifies the streaming and non-streaming codepaths for UTF-8, for both the intl and no-intl variants.

Previously, the streaming UTF-8 implementation used ICU on with-intl builds and string_decoder on without-intl builds.

Instead, this just does minor quick slices on the JS side and uses the single stateless native decoding API, which is fast on large chunks.

This:

Improves streaming UTF-8 TextDecoder performance
Brings streaming fatal UTF-8 TextDecoder support to without-intl builds
Unifies the logic, using a single codepath everywhere for UTF-8
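For illustration only, here is a minimal sketch of the "slice in JS, decode statelessly" idea. It is not the code from this PR: `incompleteTailLength` and `createStreamingUtf8Decoder` are hypothetical names, a non-streaming `TextDecoder` stands in for the native API, the tail check is deliberately loose, and BOM and fatal handling are left out.

```javascript
// Stand-in for the stateless native decoding API: a one-shot decode that keeps
// no state between calls. ignoreBOM: true so a U+FEFF at a chunk start survives
// (BOM handling is left out of this sketch).
const oneShot = new TextDecoder('utf-8', { ignoreBOM: true })

// Hypothetical helper: number of trailing bytes (0-3) that start a UTF-8
// sequence which has not received all of its bytes yet.
function incompleteTailLength(bytes) {
  const len = bytes.length
  for (let back = 1; back <= 3 && back <= len; back++) {
    const b = bytes[len - back]
    if ((b & 0xc0) === 0x80) continue // continuation byte: keep looking for the lead
    let need = 0 // ASCII or invalid lead byte: nothing to wait for
    if (b >= 0xc2 && b <= 0xdf) need = 2
    else if (b >= 0xe0 && b <= 0xef) need = 3
    else if (b >= 0xf0 && b <= 0xf4) need = 4
    return need > back ? back : 0
  }
  return 0
}

// Hypothetical factory (an assumption for this sketch, not the PR's API)
function createStreamingUtf8Decoder() {
  let pending = new Uint8Array(0) // at most 3 bytes carried between calls
  return function decode(chunk = new Uint8Array(0), { stream = false } = {}) {
    let bytes = chunk
    if (pending.length > 0) {
      // Simplification: copy the carried bytes and the chunk into one buffer
      bytes = new Uint8Array(pending.length + chunk.length)
      bytes.set(pending)
      bytes.set(chunk, pending.length)
    }
    // Only hold bytes back while streaming; a final call decodes everything
    const tail = stream ? incompleteTailLength(bytes) : 0
    const cut = bytes.length - tail
    pending = bytes.slice(cut) // copy, so the caller's buffer is not retained
    return oneShot.decode(bytes.subarray(0, cut))
  }
}

// Usage: 3-byte chunks split several multi-byte characters
const decode = createStreamingUtf8Decoder()
const input = new TextEncoder().encode('héllo 😀 мир')
let text = ''
for (let i = 0; i < input.length; i += 3) {
  text += decode(input.subarray(i, i + 3), { stream: true })
}
text += decode() // final call flushes whatever is still pending
console.log(text === 'héllo 😀 мир')
```

The only state carried between calls is a short tail of bytes, so each call can hand one contiguous slice to a stateless decoder. A real implementation would presumably avoid copying the whole chunk when bytes are pending.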

Benchmarks

Previously: #61131

Non-chunked, 25.5.0

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	36.69 GiB/s	0.002 ms
Arabic lipsum	79.771 KiB	2.03 GiB/s	0.037 ms
Chinese lipsum	68.203 KiB	4.10 GiB/s	0.016 ms
Arabic + 2 * ASCII	249.577 KiB	2.72 GiB/s	0.092 ms
Non-ASCII char + ASCII	84.905 KiB	3.84 GiB/s	0.024 ms

Non-chunked, PR

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	36.19 GiB/s	0.002 ms
Arabic lipsum	79.771 KiB	2.02 GiB/s	0.038 ms
Chinese lipsum	68.203 KiB	4.00 GiB/s	0.016 ms
Arabic + 2 * ASCII	249.577 KiB	3.32 GiB/s	0.075 ms
Non-ASCII char + ASCII	84.905 KiB	4.71 GiB/s	0.022 ms

(Non-chunked decoding should not be affected by this PR.)

Chunked (1000 byte chunks), 25.5.0

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	1.31 GiB/s	0.062 ms
Arabic lipsum	79.771 KiB	0.90 GiB/s	0.084 ms
Chinese lipsum	68.203 KiB	0.99 GiB/s	0.066 ms
Arabic + 2 * ASCII	249.577 KiB	1.14 GiB/s	0.209 ms
Non-ASCII char + ASCII	84.905 KiB	1.31 GiB/s	0.062 ms

Chunked (1000 byte chunks), PR

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	7.45 GiB/s	0.011 ms
Arabic lipsum	79.771 KiB	1.24 GiB/s	0.062 ms
Chinese lipsum	68.203 KiB	1.54 GiB/s	0.042 ms
Arabic + 2 * ASCII	249.577 KiB	2.67 GiB/s	0.091 ms
Non-ASCII char + ASCII	84.905 KiB	7.26 GiB/s	0.011 ms

The benchmark splits each test input into roughly 70-256 chunks.
The improvement is significant: chunked ASCII throughput, for example, rises from 1.31 GiB/s to 7.45 GiB/s.
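As a quick check of those figures, the chunk counts and relative gains can be recomputed from the chunked tables. This is a sketch that assumes 1 KiB = 1024 bytes and the 1000-byte chunk size used in the benchmark; the `rows` array just copies the sizes and chunked throughputs (GiB/s) reported above.

```javascript
const CHUNK_BYTES = 1000 // chunk size used in the chunked runs

// Sizes in KiB, chunked throughput in GiB/s: 25.5.0 (before) vs this PR (after)
const rows = [
  { test: 'Latin lipsum (ASCII)', kib: 84.902, before: 1.31, after: 7.45 },
  { test: 'Arabic lipsum', kib: 79.771, before: 0.90, after: 1.24 },
  { test: 'Chinese lipsum', kib: 68.203, before: 0.99, after: 1.54 },
  { test: 'Arabic + 2 * ASCII', kib: 249.577, before: 1.14, after: 2.67 },
  { test: 'Non-ASCII char + ASCII', kib: 84.905, before: 1.31, after: 7.26 },
]

for (const row of rows) {
  // One decode() call per started chunk of CHUNK_BYTES bytes
  row.chunks = Math.ceil((row.kib * 1024) / CHUNK_BYTES)
  row.speedup = row.after / row.before
  console.log(`${row.test}: ${row.chunks} chunks, ${row.speedup.toFixed(2)}x`)
}
```

This gives between 70 and 256 decode calls per test, and chunked speedups ranging from about 1.4x (Arabic lipsum) to about 5.7x (ASCII).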

anonrig · 2026-01-27T20:21:16Z

@ChALkeR can you run benchmark CI that applies to this pull-request?

ChALkeR · 2026-01-27T20:33:00Z

@anonrig I can't do anything until #61553

Also, we don't have a stream: true TextDecoder benchmark.
The bench results above are from https://github.com/lemire/jstextdecoderbench with this modification:

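// `decoder` is the benchmark's TextDecoder instance
function decodeChunked(bytes) {
  const chunk = 1000 // chunk size in bytes
  const max = bytes.length - chunk
  let i = 0
  // Feed every chunk except the last with stream: true so decoder state carries over
  for (; i < max; i += chunk) decoder.decode(bytes.subarray(i, i + chunk), { stream: true })
  // The final call omits stream: true, which flushes the decoder
  decoder.decode(bytes.subarray(i))
}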

decodeChunked is then called in place of decoder.decode.
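For readers less familiar with the option, here is a small standalone example of what `{ stream: true }` changes when a chunk boundary falls inside a multi-byte character. It only uses the standard `TextEncoder`/`TextDecoder` API; the sample string and the 3-byte chunk size are arbitrary choices for the demonstration, not part of the benchmark.

```javascript
const bytes = new TextEncoder().encode('naïve café') // 12 bytes: 'ï' and 'é' take 2 each
const chunkSize = 3 // small on purpose so a chunk boundary splits 'ï'

// Streaming: the decoder remembers the first byte of 'ï' until the next call
const streaming = new TextDecoder()
let streamed = ''
for (let i = 0; i < bytes.length; i += chunkSize) {
  streamed += streaming.decode(bytes.subarray(i, i + chunkSize), { stream: true })
}
streamed += streaming.decode() // flush

// Non-streaming per chunk: each call is independent, so the split character is lost
const perChunk = new TextDecoder()
let broken = ''
for (let i = 0; i < bytes.length; i += chunkSize) {
  broken += perChunk.decode(bytes.subarray(i, i + chunkSize))
}

console.log(streamed === 'naïve café') // true
console.log(broken.includes('\uFFFD')) // true
```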

anonrig

Thank you for the kind responses and amazing work.

codecov · 2026-01-27T21:08:59Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.77%. Comparing base (e155415) to head (1559954).
⚠️ Report is 38 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff            @@
##             main   #61549    +/-   ##
========================================
  Coverage   89.77%   89.77%            
========================================
  Files         672      673     +1     
  Lines      203755   203875   +120     
  Branches    39167    39190    +23     
========================================
+ Hits       182922   183038   +116     
- Misses      13164    13172     +8     
+ Partials     7669     7665     -4

Files with missing lines	Coverage Δ
lib/internal/encoding.js	`100.00% <100.00%> (ø)`
lib/internal/encoding/util.js	`100.00% <100.00%> (ø)`

... and 33 files with indirect coverage changes


nodejs-github-bot · 2026-01-27T22:48:04Z

CI: https://ci.nodejs.org/job/node-test-pull-request/71082/

ChALkeR · 2026-01-27T23:16:20Z

@anonrig I added a benchmark for streaming Unicode TextDecoder.
Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1787

nodejs-github-bot · 2026-01-28T15:17:24Z

CI: https://ci.nodejs.org/job/node-test-pull-request/71102/

ChALkeR · 2026-01-28T15:23:13Z

@gurgunday Once everything is fixed, the plan is to update WPT + bring in my extra tests. They'll fail now due to other bugs though, so it can't be done yet without sorting the tests, and that's likely not worth it if we can just fix stuff first.

nodejs-github-bot added the encoding and needs-ci labels Jan 27, 2026

ChALkeR force-pushed the chalker/decoder/unify/2 branch from a61503c to 4331d20 Compare January 27, 2026 18:18

ChALkeR mentioned this pull request Jan 27, 2026

lib: unify ICU and no-ICU TextDecoder #61409

Merged

ChALkeR force-pushed the chalker/decoder/unify/2 branch from 4331d20 to 916cd32 Compare January 27, 2026 18:25

anonrig reviewed Jan 27, 2026


Comment thread lib/internal/encoding/util.js Outdated

anonrig added the needs-benchmark-ci label Jan 27, 2026

ChALkeR force-pushed the chalker/decoder/unify/2 branch 5 times, most recently from 9ebe8ce to 111b99e Compare January 27, 2026 19:22

ChALkeR requested a review from anonrig January 27, 2026 19:23

ChALkeR marked this pull request as ready for review January 27, 2026 19:25

ChALkeR force-pushed the chalker/decoder/unify/2 branch 3 times, most recently from 1a77e7a to cd5c966 Compare January 27, 2026 20:12

lib: use utf8 fast path for streaming TextDecoder

e9e9252

ChALkeR force-pushed the chalker/decoder/unify/2 branch from cd5c966 to e9e9252 Compare January 27, 2026 20:13

anonrig approved these changes Jan 27, 2026


ChALkeR added the request-ci label Jan 27, 2026

github-actions bot removed the request-ci label Jan 27, 2026

benchmark: add streaming TextDecoder benchmark

1559954

ChALkeR added the request-ci label Jan 27, 2026

github-actions bot removed the request-ci label Jan 27, 2026

RafaelGSS added the performance label Jan 28, 2026

ChALkeR wants to merge 2 commits into nodejs:main from ChALkeR:chalker/decoder/unify/2
