Add implementation of `FormData` builtin #191

andreiltd · 2024-12-23T20:14:22Z

This patch adds implementation of FormaData: https://xhr.spec.whatwg.org/#interface-formdata

I will add support for Request/Response FormData in a separate PR.

Depends on #181

andreiltd · 2025-01-09T16:02:57Z

This one is pretty much ready for a review. I will keep it as a draft until dependency PRs are merged.

tschneidereit

This is looking great! There are a bunch of things still to address, but I don't think it'll take too much more work

builtins/web/form-data.cpp

tests/wpt-harness/expectations/fetch/api/request/request-init-002.any.js.json

tests/wpt-harness/expectations/fetch/api/response/response-init-002.any.js.json

tests/wpt-harness/expectations/xhr/formdata/set-blob.any.js.json

tests/wpt-harness/pre-harness.js

andreiltd · 2025-01-14T21:09:10Z

Hey! I have a question regarding encoding FormData in the body initialization function: The spec says this :

Switch on object:

FormData

Set action to this step: run the multipart/form-data encoding algorithm, with object’s entry list and UTF-8.

If action is non-null, then run these steps in parallel:

Run action.
Whenever one or more bytes are available and stream is not errored, enqueue the result of creating a Uint8Array from the available bytes into stream. When running action is done, close stream.

Does this mean that we need to implement a stream encoder for FormData? My initial idea was to simply create a Blob out of FormDatas entries, and provide a blob.stream() in a extractBody function. Something like this:

const blobParts = [];
for (const { 0: name, 1: value } of formData) {
  blobParts.push(encodeBoundary(name));
  blobParts.push(value); 
}

const blob = new Blob(blobParts, { type: "multipart/form-data"});
...
stream = blob.stream();
length = blob.size
contentType = blob.type;

I think the problem with that approach is that constructing a new blob is blocking, e.g. if value is a File object, the Blob constructor ends up reading that file and copy its content to an internal buffer. Any ideas?

guybedford · 2025-01-14T22:06:40Z

Afaict, avoiding the extra copy would involve implementing a custom FormDataStream or similar exactly as you say, similarly to what you did with the blob streaming. Perhaps there are parts of the low-level code that can be abstracted / shared? I guess it depends on the performance requirements - if users have a lot of data, copying everything would not only have the compute cost of the copy but also the memory cost of duplicating these items. Nice to avoid if we can?

tschneidereit

Getting even closer! I once again left a bunch of comments, because this is surprisingly subtle. And would like to take another look once those are addressed—but that's not on you, it's on the spec being too subtle.

builtins/web/form-data.cpp

tschneidereit · 2025-01-15T14:17:20Z

builtins/web/form-data.cpp

+// https://html.spec.whatwg.org/multipage/form-control-infrastructure.html#constructing-form-data-set
+//
+// Note: all uses of `create-an-entry` immediately append it, too, so that part is folded in here.
+bool FormData::append(JSContext *cx, HandleObject self, std::string_view name, HandleValue value,


(Leaving a comment on the overall function here, because I'm not sure were exactly inside the function body to leave it.)

The spec for this algorithm is a little bit weird, unfortunately, making it hard to understand what exactly is going on.

In particular, the sub-steps of step 3, interpreted precisely, would in some cases require first, in step 3.1, creating a new File instance with the name "blob", and then immediately, in step 3.2, creating another new File instance with filename as the name. Since creating a File instance doesn't have observable side-effects, that is technically okay, but of course wasteful. It also confuses what actually should happen, which is one of three things:

if value is a string, ensure that both name and value are scalar value strings, and append an entry of { name, value }, and ignore filename

otherwise, if value is a Blob but not a File, or if filename is given, create a new File instance and append an entry of { name, new_file }

otherwise, append an entry of { name, value }

I think it might make sense to restructure this function to follow that logic. In that case, I would still keep the spec step comments, but add a comment explaining the restructuring.

I know I'm giving you a lot of feedback on this one function; the reason is that the spec really is very subtle here, and I want to make sure that we both get it right, and put ourselves into a position where we can understand what's going on long after we've all forgotten this discussion :)

Right, but we still need to handle the filename in the second step and the logic for this should be something like this:

if value is a Blob but not a File, or if filename is given, then:

let new_file = match (is_file, maybe_filename) { (false, Some(filename)) => File(filename), (true, Some(filename)) => File(filename), (false, None) => File("blob"), (true, None) => unreachable!(), };

E.g: these are all the cases for bullets 2 and 3:

const value = new Blob(); formData.set("blob-3", value, "custom name"); const blob3 = formData.get("blob-3"); assert_equals(blob3.name, "custom name"); const value = new File([], "name"); formData.set("file-2", value, "custom name"); const file2 = formData.get("file-2"); assert_equals(file2.name, "custom name"); const value = new Blob(); formData.set("blob-1", value); const blob1 = formData.get("blob-1"); assert_equals(blob1.name, "blob"); const value = new File([], "name"); formData.set("file-1", value); const file1 = formData.get("file-1"); assert_equals(file1.name, "name");

I papered over the filename handling, yes. What I meant to highlight is that the key choice here is whether to create a new File or not, and that whether to use "blob" or filename is a nested choice in that. I.e., I'm arguing for a different factoring from you match expression, because whether or not to create a new File makes substantially more difference than what name to use.

Overall, the cleanest structure is probably

if (!Blob::is_instance(value)) { // treat `value` as a string and return early } if (File::is_instance(value) && filename.isUndefined()) { // append `value` as is and return early } RootedValue filename_val(cx); if (!filename.isUndefined()) { filename_val = // scalar version of `filename` } else { filename_val = // "blob" } // Create the new file using `filename_val`

Does that make sense?

builtins/web/form-data.cpp

tschneidereit

Very nice! Thank you for your patience with the long review cycles, and for getting this into great shape!

andreiltd marked this pull request as draft December 23, 2024 20:14

andreiltd force-pushed the form-data branch 2 times, most recently from 7b1e6a1 to 0a65a1e Compare January 9, 2025 12:06

andreiltd added 8 commits January 10, 2025 18:54

Add FormData builtin

ffe6a01

Add file create function

7b63522

Update expectactions

e4479e4

Fix appending function

cecdb83

Fix constructor throwing checks

acfd106

Implement FormData iterator

8037d2e

Export Symbol.iterator

b3eaf03

Only create a key value for iterator when needed

1a6c6e2

andreiltd force-pushed the form-data branch from 809c839 to 1a6c6e2 Compare January 10, 2025 17:58

andreiltd marked this pull request as ready for review January 10, 2025 18:00

guybedford approved these changes Jan 10, 2025

View reviewed changes

tschneidereit requested changes Jan 13, 2025

View reviewed changes

andreiltd added 3 commits January 13, 2025 16:41

Apply code review suggestions

311ce93

Fix reinstance test

28fbd24

Remove unused function

add04a4

tschneidereit requested changes Jan 15, 2025

View reviewed changes

Code cleanup

17353dd

tschneidereit reviewed Jan 15, 2025

View reviewed changes

builtins/web/form-data.cpp Outdated Show resolved Hide resolved

andreiltd and others added 5 commits January 15, 2025 18:41

Refactor FormData::append function

df35123

Add comment about name encoding

dfa2121

Apply suggestions

a1b6ee1

Code format

3048653

Merge branch 'main' into form-data

62482a1

tschneidereit approved these changes Jan 15, 2025

View reviewed changes

tschneidereit merged commit 58ae515 into bytecodealliance:main Jan 15, 2025
5 checks passed

andreiltd mentioned this pull request Jan 16, 2025

Move streaming reader out of Blob into a dedicated BufReader class #197

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add implementation of `FormData` builtin #191

Add implementation of `FormData` builtin #191

andreiltd commented Dec 23, 2024 •

edited

Loading

andreiltd commented Jan 9, 2025 •

edited

Loading

tschneidereit left a comment

andreiltd commented Jan 14, 2025 •

edited

Loading

guybedford commented Jan 14, 2025

tschneidereit left a comment

tschneidereit Jan 15, 2025

andreiltd Jan 15, 2025

andreiltd Jan 15, 2025 •

edited

Loading

tschneidereit Jan 15, 2025

tschneidereit left a comment

Add implementation of FormData builtin #191

Add implementation of FormData builtin #191

Conversation

andreiltd commented Dec 23, 2024 • edited Loading

andreiltd commented Jan 9, 2025 • edited Loading

tschneidereit left a comment

Choose a reason for hiding this comment

andreiltd commented Jan 14, 2025 • edited Loading

guybedford commented Jan 14, 2025

tschneidereit left a comment

Choose a reason for hiding this comment

tschneidereit Jan 15, 2025

Choose a reason for hiding this comment

andreiltd Jan 15, 2025

Choose a reason for hiding this comment

andreiltd Jan 15, 2025 • edited Loading

Choose a reason for hiding this comment

tschneidereit Jan 15, 2025

Choose a reason for hiding this comment

tschneidereit left a comment

Choose a reason for hiding this comment

Add implementation of `FormData` builtin #191

Add implementation of `FormData` builtin #191

andreiltd commented Dec 23, 2024 •

edited

Loading

andreiltd commented Jan 9, 2025 •

edited

Loading

andreiltd commented Jan 14, 2025 •

edited

Loading

andreiltd Jan 15, 2025 •

edited

Loading