lib: use utf8 fast path for streaming TextDecoder #61549
@ChALkeR can you run the benchmark CI that applies to this pull request?
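@anonrig I can't do anything until #61553. Also, we don't have a

```javascript
// Decoder shared across calls; streaming state carries over between chunks
const decoder = new TextDecoder()

function decodeChunked(bytes) {
  const chunk = 1000
  const max = bytes.length - chunk
  let i = 0
  // Feed full 1000-byte chunks with { stream: true } so partial sequences are buffered
  for (; i < max; i += chunk) decoder.decode(bytes.subarray(i, i + chunk), { stream: true })
  // Final non-streaming call decodes the tail and flushes any pending bytes
  decoder.decode(bytes.subarray(i))
}
```

And using that instead of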
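To make the call pattern of `decodeChunked` concrete, here is a hypothetical `chunkSizes` helper (not part of the PR or of Node.js) that replays the same loop bounds and records the byte length of each `decode()` call instead of decoding anything.

```javascript
// Hypothetical helper: lists the byte lengths of the decode() calls that
// decodeChunked(bytes) would make for an input of `byteLength` bytes.
function chunkSizes(byteLength, chunk = 1000) {
  const sizes = []
  const max = byteLength - chunk
  let i = 0
  // Same loop bounds as decodeChunked: streaming calls of exactly `chunk` bytes...
  for (; i < max; i += chunk) sizes.push(chunk)
  // ...then one final, non-streaming call with whatever remains
  // (between 1 and `chunk` bytes whenever byteLength > 0).
  sizes.push(byteLength - i)
  return sizes
}

console.log(chunkSizes(2500)) // [ 1000, 1000, 500 ]
console.log(chunkSizes(2000)) // [ 1000, 1000 ]
console.log(chunkSizes(600))  // [ 600 ]
```

Because the loop stops one chunk early, the final flushing call always carries data for non-empty input, and an input of N > 0 bytes results in `Math.ceil(N / 1000)` calls in total.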
anonrig left a comment:
Thank you for the kind responses and amazing work.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```diff
@@ Coverage Diff @@
## main #61549 +/- ##
========================================
Coverage 89.77% 89.77%
========================================
Files 672 673 +1
Lines 203755 203875 +120
Branches 39167 39190 +23
========================================
+ Hits 182922 183038 +116
- Misses 13164 13172 +8
+ Partials 7669 7665 -4
```
@anonrig I added a benchmark for the streaming Unicode TextDecoder.
@gurgunday Once everything is fixed, the plan is to update WPT and bring in my extra tests. They would fail now due to other bugs, though, so that can't be done yet without sorting out the tests, and that's likely not worth it if we can just fix things first.
Tracking: #61041
A continuation of #61409
This is based on the logic in https://github.com/ExodusOSS/bytes
Unifies the streaming and non-streaming code paths for UTF-8, for both the intl and no-intl variants.
Previously, the streaming UTF-8 implementation used ICU on with-intl builds and `string_decoder` on without-intl builds. Instead, this does minor, quick slices on the JS side and uses the single stateless native decoding API, which is fast on large chunks.
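A minimal sketch of the slicing idea, assuming a non-fatal decoder that behaves like `new TextDecoder('utf-8', { ignoreBOM: true })` and `Uint8Array` input. The class `StreamingUtf8Decoder` and helper `sequenceLength` are hypothetical names, and this is not the code in the PR: it holds back only an incomplete trailing sequence (at most 3 bytes), always splits at a byte that is not a continuation byte (0x80..0xBF), and hands everything before the split to one stateless `TextDecoder.prototype.decode()` call.

```javascript
// Hypothetical sketch, not Node.js's internal implementation: a streaming
// UTF-8 decoder that only ever calls a stateless decode().
// Assumes non-fatal mode, Uint8Array chunks, and behaviour matching
// new TextDecoder('utf-8', { ignoreBOM: true }).
const stateless = new TextDecoder('utf-8', { ignoreBOM: true })

// Bytes needed by a sequence whose first byte is `b`. Over-estimating for
// invalid lead bytes (0xC0, 0xC1, 0xF5..0xFF) only delays their U+FFFD by one call.
function sequenceLength(b) {
  if (b >= 0xf0) return 4
  if (b >= 0xe0) return 3
  if (b >= 0xc0) return 2
  return 1
}

class StreamingUtf8Decoder {
  // Incomplete trailing sequence from the previous call (at most 3 bytes).
  #pending = new Uint8Array(0)

  decode(chunk = new Uint8Array(0), { stream = false } = {}) {
    // Join held-back bytes with the new chunk (this copies the chunk).
    let bytes = chunk
    if (this.#pending.length > 0) {
      bytes = new Uint8Array(this.#pending.length + chunk.length)
      bytes.set(this.#pending)
      bytes.set(chunk, this.#pending.length)
    }

    if (!stream) {
      // Final call: decode everything; an incomplete tail becomes U+FFFD.
      this.#pending = new Uint8Array(0)
      return stateless.decode(bytes)
    }

    // Find the last byte among the final 3 that is not a continuation byte
    // (0x80..0xBF) and hold it back if its sequence cannot be complete yet.
    let end = bytes.length
    for (let i = bytes.length - 1; i >= Math.max(0, bytes.length - 3); i--) {
      if ((bytes[i] & 0xc0) !== 0x80) {
        if (sequenceLength(bytes[i]) > bytes.length - i) end = i
        break
      }
    }

    // Copy the tail, since the caller may reuse its buffer.
    this.#pending = bytes.slice(end)
    return stateless.decode(bytes.subarray(0, end))
  }
}
```

Splitting only at a non-continuation byte is what keeps this equivalent to a stateful decoder: a WHATWG UTF-8 decoder that meets such a byte in the middle of a sequence emits one U+FFFD and restarts at that byte, which is exactly what a stateless call does at the end of its input. The default `ignoreBOM: false` behaviour (stripping a leading BOM once per stream) and `fatal: true` would need separate handling in this sketch.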
This:
Benchmarks
Previously: #61131
Non-chunked, 25.5.0
Non-chunked, PR (should not be affected)
Chunked (1000-byte chunks), 25.5.0
Chunked (1000-byte chunks), PR
The benchmark creates roughly 70 to 256 chunks per test.
The improvement in the chunked case is significant.