Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLDR: We request the support for TextEncoder in FLEDGE bidding worklet. #961

Open
zhaozegoog opened this issue Dec 21, 2023 · 14 comments
Open

Comments

@zhaozegoog
Copy link

TLDR: We request the support for TextEncoder in FLEDGE bidding worklet.

We’re testing our WASM implementation of generateBid, and we found difficulties as FLEDGE worklet doesn’t support TextEncoder. This is essential as our implementation depends on Protobuf.

When we use polyfill instead of TextEncoder, the resulting bidding script is significantly slower than javascript, which is in contradict to what we observe in our simulation with TextEncoder (running under Chrome main thread).

@caraitto
Copy link
Collaborator

If you're using Protobufs (a binary format) and WASM (which takes binary data in formats like ArrayBuffer, where Uint8Array is a view into that ArrayBuffer), and are running into the lack of TextEncoder, it sounds like you're looking for ways to get strings (I assume encoding the binary data in things like base64) from JavaScript into WASM. In that case, TextEncoder might be the wrong API, as it takes UTF-16 code units from an input string and converts that into a UTF-8 Uint8Array -- IIUC, it can’t correctly round trip arbitrary byte sequences that aren’t valid UTF-16.

Instead, a library that does base64 -> ArrayBuffer is what you probably want instead -- I wrote a pure-JS library that seems to perform well (comparable to TextEncoder), code below. The base64 format is described in https://datatracker.ietf.org/doc/html/rfc4648.

Or is the protobuf parsing happening in a JavaScript library that itself invokes the TextEncoder library?

Here’s the pure JavaScript code I wrote that does UTF-8 -> ArrayBuffer conversion:

const b64Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

const lookup = new Uint8Array(256);
for (let i = 0; i < b64Chars.length; i++) {
  lookup[b64Chars.charCodeAt(i)] = i;
}

function b64ToArrayBuf(b64String) {
  const outputBuf = allocateBufFor(b64String);
  return b64ToArrayBufInto(b64String, outputBuf);
}

function allocateBufFor(b64String) {
  if (b64String.length % 4 !== 0) {
    throw "b64ToArrayBuf(): Expected input size divisible by 4; actual size " +
        b64String.length;
  }

  if (b64String.length === 0) {
    return new ArrayBuffer(0);
  }

  let outputLength = b64String.length / 4 * 3;
  if (b64String[b64String.length - 1] === "=") {
    outputLength--;
    if (b64String[b64String.length - 2] === "=") {
      outputLength--;
    }
  }

  return new ArrayBuffer(outputLength);
}

function b64ToArrayBufInto(b64String, outputBuf) {
  const outputUint8Array = new Uint8Array(outputBuf);
  let outPtr = 0;

  for (let i = 0; i < b64String.length; i += 4) {
    const part0 = lookup[b64String.charCodeAt(i)];
    const part1 = lookup[b64String.charCodeAt(i + 1)];
    const part2 = lookup[b64String.charCodeAt(i + 2)];
    const part3 = lookup[b64String.charCodeAt(i + 3)];

    outputUint8Array[outPtr++] = (part0 << 2) | (part1 >> 4);
    outputUint8Array[outPtr++] = ((part1 & 0x0F) << 4) | (part2 >> 2);
    outputUint8Array[outPtr++] = ((part2 & 0x03) << 6) | part3;
  }

  return outputBuf;
}

@zhaozegoog
Copy link
Author

Thanks for the suggestion!

We're testing your implementation and will report back on the efficiency.

Or is the protobuf parsing happening in a JavaScript library that itself invokes the TextEncoder library?

Unfortunately that's the case. There is a function serializeBinary (which returns Uinit8Array) in the protobuf library (JS) that is calls TextEncoder underneath, and hence your suggested implementation won't be sufficient.

Would appreciate if you have any more suggestions.

@droundy
Copy link

droundy commented Feb 2, 2024

We're running into a similar issue with TextDecoder, which is used by the glue javascript generated by wasm-bindgen for rust.

@JensenPaul
Copy link
Collaborator

@droundy, would you please add your affiliation to your GitHub profile?
I checked out wasm-bindgen and was looking around to see how they used TextDecoder. I see suspect you're running into this use of it. Do you have a rust example that generates a use of TextDecoder?

@droundy
Copy link

droundy commented Feb 16, 2024

I've added my affiliation with NextRoll to my profile.

Cargo.toml
lib.rs

You can build (once you put them in the right directories with

wasm-pack build --target web --release --no-typescript --out-dir pkg

which results in a pkg/minimal_bidding_logic.js that uses TextDecoder to convert rust strings into javascript strings. Right now we're using a polyfill to manage this, but it's not ideal.

I'm sure my example could be more minimal, and I haven't actually tested this minimal example in a browser to confirm that it uses those bindings, because I don't know of a convenient way to trigger an auction without messing with our staging deployment.

aarongable pushed a commit to chromium/chromium that referenced this issue Jan 23, 2025
This allows use from content/services/auction_worklet and
continued use from services/accessibility/features.

This is the first step in addressing
WICG/turtledove#961

AX-Relnotes: n/a
Change-Id: I72d6e3ba799a683d3d7113ae5abc4b4bc0af37f7
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6181149
Commit-Queue: Paul Jensen <pauljensen@chromium.org>
Reviewed-by: Colin Blundell <blundell@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1410488}
@morlovich
Copy link
Collaborator

So I've been assigned to look into supporting TextEncoder, but looks like there is some interest in TextDecoder as well? I'd like to share some thoughts in case people have feedback.

  1. So from TextEncoder API, encode() is easy to provide, but encodeInto doesn't seem like it admits efficient implementation via V8 API only (in blink it seems like it may be able to perform well based on using a fancy feature where blink manages the string data). Does anyone actually need encodeInto?
  2. If we are not providing encodeInto, I think we probably shouldn't name our thing TextEncoder; but have a freestanding String -> Uint8Array (of utf-8) function.
  3. Full TextDecoder API is not very plausible, but do people actually need encoding other than utf-8? What about the streaming mode? Various options? (And possibly also non-UInt8Array inputs, though I am not sure how complicated these are). Similarly, this will probably just be some sort of standalone function.

@corywalker
Copy link

From the Google Ads side,

  1. We have not encountered the urgent need for encodeInto, just TextEncoder.encode().
  2. Seems fine enough. We definitely lean towards having the APIs be compatible with and named as TextEncoder, but an API that is not proper TextEncoder could still work for us.
  3. For TextDecoder I don't think we'd need anything more than UTF8. We don't need streaming mode or any special options. We would only pass in Uint8Array inputs. For example, a super basic V8 implementation that worked for us in local testing: https://gist.github.com/corywalker/527a7b8dcca895d01a3067e1ae8d9bd5.

For us, having TextEncoder (or equivalent) supported is a higher priority than TextDecoder.

@morlovich
Copy link
Collaborator

Do you need this outside generateBid?

(WIP at https://chromium-review.googlesource.com/c/chromium/src/+/6269638 FWIW, missing the flag to turn it off/on, though, and some test)

@droundy
Copy link

droundy commented Feb 21, 2025 via email

@corywalker
Copy link

Do you need this outside generateBid?

We might need it in reportWin (not sure if the current WIP covers that already). It seems like having scoreAd/reportResult would make sense for consistency, but I can't speak for if sellers have a need for it.

chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Feb 26, 2025
In place of TextEncoder/TextEncoder which we don't have at hand and
which are more complex than what's actually needed.

(Requested in WICG/turtledove#961)

Bug: 397936915
Change-Id: Iaa9018e05119a4dc0ef1579134cb50347fb57c30
aarongable pushed a commit to chromium/chromium that referenced this issue Feb 27, 2025
In place of TextEncoder/TextEncoder which we don't have at hand and
which are more complex than what's actually needed.

(Requested in WICG/turtledove#961)

Bug: 397936915
Change-Id: Iaa9018e05119a4dc0ef1579134cb50347fb57c30
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6269638
Reviewed-by: Russ Hamilton <behamilton@google.com>
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1425747}
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Feb 27, 2025
In place of TextEncoder/TextEncoder which we don't have at hand and
which are more complex than what's actually needed.

(Requested in WICG/turtledove#961)

Bug: 397936915
Change-Id: Iaa9018e05119a4dc0ef1579134cb50347fb57c30
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6269638
Reviewed-by: Russ Hamilton <behamilton@google.com>
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1425747}
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Feb 27, 2025
In place of TextEncoder/TextEncoder which we don't have at hand and
which are more complex than what's actually needed.

(Requested in WICG/turtledove#961)

Bug: 397936915
Change-Id: Iaa9018e05119a4dc0ef1579134cb50347fb57c30
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6269638
Reviewed-by: Russ Hamilton <behamilton@google.com>
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1425747}
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Mar 4, 2025
…de,Decode} to worklets, a=testonly

Automatic update from web-platform-tests
FLEDGE: Provide browserSignals.utf8{Encode,Decode} to worklets

In place of TextEncoder/TextEncoder which we don't have at hand and
which are more complex than what's actually needed.

(Requested in WICG/turtledove#961)

Bug: 397936915
Change-Id: Iaa9018e05119a4dc0ef1579134cb50347fb57c30
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6269638
Reviewed-by: Russ Hamilton <behamilton@google.com>
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1425747}

--

wpt-commits: e57faa381e135a2e09d2d825b534efcde22fed9c
wpt-pr: 50979
i3roly pushed a commit to i3roly/firefox-dynasty that referenced this issue Mar 5, 2025
…de,Decode} to worklets, a=testonly

Automatic update from web-platform-tests
FLEDGE: Provide browserSignals.utf8{Encode,Decode} to worklets

In place of TextEncoder/TextEncoder which we don't have at hand and
which are more complex than what's actually needed.

(Requested in WICG/turtledove#961)

Bug: 397936915
Change-Id: Iaa9018e05119a4dc0ef1579134cb50347fb57c30
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6269638
Reviewed-by: Russ Hamilton <behamilton@google.com>
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1425747}

--

wpt-commits: e57faa381e135a2e09d2d825b534efcde22fed9c
wpt-pr: 50979
jamienicol pushed a commit to jamienicol/gecko that referenced this issue Mar 5, 2025
…de,Decode} to worklets, a=testonly

Automatic update from web-platform-tests
FLEDGE: Provide browserSignals.utf8{Encode,Decode} to worklets

In place of TextEncoder/TextEncoder which we don't have at hand and
which are more complex than what's actually needed.

(Requested in WICG/turtledove#961)

Bug: 397936915
Change-Id: Iaa9018e05119a4dc0ef1579134cb50347fb57c30
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6269638
Reviewed-by: Russ Hamilton <behamilton@google.com>
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1425747}

--

wpt-commits: e57faa381e135a2e09d2d825b534efcde22fed9c
wpt-pr: 50979
aarongable pushed a commit to chromium/chromium that referenced this issue Mar 6, 2025
This reverts commit eb03766.

Reason for revert: Second usage of components/text_encoder unnecessary.

Original change's description:
> This allows use from content/services/auction_worklet and
> continued use from services/accessibility/features.
>
> This is the first step in addressing
> WICG/turtledove#961

AX-Relnotes: n/a
Change-Id: I960a28e74aa591d07ffa152253a88a40d17ca13e
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6330176
Commit-Queue: Paul Jensen <pauljensen@chromium.org>
Reviewed-by: Katie Dektar <katie@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1429087}
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Mar 7, 2025
object handing of the global object, rather than browserSignals argument
to functions, as design doc feedback pointed out that it's otherwise
hard to use in frozen context mode to pollyfill TextEncoder/TextDecoder.

(Part of WICG/turtledove#961)

Bug: 397936915
Change-Id: I084b14072ae8aac8419008410d4ee713191d77e9
aarongable pushed a commit to chromium/chromium that referenced this issue Mar 7, 2025
object handing of the global object, rather than browserSignals argument
to functions, as design doc feedback pointed out that it's otherwise
hard to use in frozen context mode to pollyfill TextEncoder/TextDecoder.

(Part of WICG/turtledove#961)

Bug: 397936915
Change-Id: I084b14072ae8aac8419008410d4ee713191d77e9
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6329720
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Reviewed-by: Russ Hamilton <behamilton@google.com>
Cr-Commit-Position: refs/heads/main@{#1429678}
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Mar 7, 2025
object handing of the global object, rather than browserSignals argument
to functions, as design doc feedback pointed out that it's otherwise
hard to use in frozen context mode to pollyfill TextEncoder/TextDecoder.

(Part of WICG/turtledove#961)

Bug: 397936915
Change-Id: I084b14072ae8aac8419008410d4ee713191d77e9
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6329720
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Reviewed-by: Russ Hamilton <behamilton@google.com>
Cr-Commit-Position: refs/heads/main@{#1429678}
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Mar 7, 2025
object handing of the global object, rather than browserSignals argument
to functions, as design doc feedback pointed out that it's otherwise
hard to use in frozen context mode to pollyfill TextEncoder/TextDecoder.

(Part of WICG/turtledove#961)

Bug: 397936915
Change-Id: I084b14072ae8aac8419008410d4ee713191d77e9
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6329720
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Reviewed-by: Russ Hamilton <behamilton@google.com>
Cr-Commit-Position: refs/heads/main@{#1429678}
@morlovich
Copy link
Collaborator

Put in some (very short) explainer text in the pull request 1409. (Not approved/reviewed/etc., yet, and I have a bunch of process stuff to do before this can be turned on even in canary/dev)

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Mar 17, 2025
…otectedAudience., a=testonly

Automatic update from web-platform-tests
FLEDGE: Move encodeUtf8/decodeUtf8 to protectedAudience.

object handing of the global object, rather than browserSignals argument
to functions, as design doc feedback pointed out that it's otherwise
hard to use in frozen context mode to pollyfill TextEncoder/TextDecoder.

(Part of WICG/turtledove#961)

Bug: 397936915
Change-Id: I084b14072ae8aac8419008410d4ee713191d77e9
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6329720
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Reviewed-by: Russ Hamilton <behamilton@google.com>
Cr-Commit-Position: refs/heads/main@{#1429678}

--

wpt-commits: 56a607de4b1d7fd09314eda3f18b82ca16837fe4
wpt-pr: 51203
jamienicol pushed a commit to jamienicol/gecko that referenced this issue Mar 18, 2025
…otectedAudience., a=testonly

Automatic update from web-platform-tests
FLEDGE: Move encodeUtf8/decodeUtf8 to protectedAudience.

object handing of the global object, rather than browserSignals argument
to functions, as design doc feedback pointed out that it's otherwise
hard to use in frozen context mode to pollyfill TextEncoder/TextDecoder.

(Part of WICG/turtledove#961)

Bug: 397936915
Change-Id: I084b14072ae8aac8419008410d4ee713191d77e9
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6329720
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Reviewed-by: Russ Hamilton <behamilton@google.com>
Cr-Commit-Position: refs/heads/main@{#1429678}

--

wpt-commits: 56a607de4b1d7fd09314eda3f18b82ca16837fe4
wpt-pr: 51203
@morlovich
Copy link
Collaborator

Canary/Dev 50% config landed (it will take some time for it to actually reach these numbers).

When it's on, you'll have a protectedAudience.encodeUtf8 (and decodeUtf8) in worklets; when it's off there isn't a protectedAudience object, so please feature detect accordingly.

@morlovich
Copy link
Collaborator

Working on the spec, I realized I implemented decoding as equivalent to ignoreBOM: true (...probably since I think the name should mean the opposite of what it does); are folks OK with this, or should I go change it?

@zhaozegoog
Copy link
Author

We don't use a Byte Order Mark (BOM) in our code, so the ignoreBOM option doesn't change anything for us.

Even though the BOM is optional for UTF-8 (see https://learn.microsoft.com/en-us/globalization/encoding/unicode-standard), it can still serve as an encoding indicator.

To maintain consistency with the standard TextDecoder (which defaults ignoreBOM to false), I suggest we make our decodeUtf8 function behave similarly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants