Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add implementation of FormData builtin #191

Merged
merged 17 commits into from
Jan 15, 2025

Conversation

andreiltd
Copy link
Contributor

@andreiltd andreiltd commented Dec 23, 2024

This patch adds implementation of FormaData: https://xhr.spec.whatwg.org/#interface-formdata

I will add support for Request/Response FormData in a separate PR.

Depends on #181

@andreiltd andreiltd marked this pull request as draft December 23, 2024 20:14
@andreiltd andreiltd force-pushed the form-data branch 2 times, most recently from 7b1e6a1 to 0a65a1e Compare January 9, 2025 12:06
@andreiltd
Copy link
Contributor Author

andreiltd commented Jan 9, 2025

This one is pretty much ready for a review. I will keep it as a draft until dependency PRs are merged.

@andreiltd andreiltd marked this pull request as ready for review January 10, 2025 18:00
Copy link
Member

@tschneidereit tschneidereit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great! There are a bunch of things still to address, but I don't think it'll take too much more work

builtins/web/form-data.cpp Outdated Show resolved Hide resolved
builtins/web/form-data.cpp Outdated Show resolved Hide resolved
builtins/web/form-data.cpp Outdated Show resolved Hide resolved
builtins/web/form-data.cpp Outdated Show resolved Hide resolved
builtins/web/form-data.cpp Outdated Show resolved Hide resolved
builtins/web/form-data.cpp Outdated Show resolved Hide resolved
tests/wpt-harness/pre-harness.js Show resolved Hide resolved
@andreiltd
Copy link
Contributor Author

andreiltd commented Jan 14, 2025

Hey! I have a question regarding encoding FormData in the body initialization function: The spec says this :

  1. Switch on object:
  1. If action is non-null, then run these steps in parallel:
  • Run action.
    Whenever one or more bytes are available and stream is not errored, enqueue the result of creating a Uint8Array from the available bytes into stream. When running action is done, close stream.

Does this mean that we need to implement a stream encoder for FormData? My initial idea was to simply create a Blob out of FormDatas entries, and provide a blob.stream() in a extractBody function. Something like this:

const blobParts = [];
for (const { 0: name, 1: value } of formData) {
  blobParts.push(encodeBoundary(name));
  blobParts.push(value); 
}

const blob = new Blob(blobParts, { type: "multipart/form-data"});
...
stream = blob.stream();
length = blob.size
contentType = blob.type;

I think the problem with that approach is that constructing a new blob is blocking, e.g. if value is a File object, the Blob constructor ends up reading that file and copy its content to an internal buffer. Any ideas?

@guybedford
Copy link
Contributor

Afaict, avoiding the extra copy would involve implementing a custom FormDataStream or similar exactly as you say, similarly to what you did with the blob streaming. Perhaps there are parts of the low-level code that can be abstracted / shared? I guess it depends on the performance requirements - if users have a lot of data, copying everything would not only have the compute cost of the copy but also the memory cost of duplicating these items. Nice to avoid if we can?

Copy link
Member

@tschneidereit tschneidereit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting even closer! I once again left a bunch of comments, because this is surprisingly subtle. And would like to take another look once those are addressed—but that's not on you, it's on the spec being too subtle.

builtins/web/form-data.cpp Outdated Show resolved Hide resolved
builtins/web/form-data.cpp Outdated Show resolved Hide resolved
builtins/web/form-data.cpp Outdated Show resolved Hide resolved
builtins/web/form-data.cpp Outdated Show resolved Hide resolved
builtins/web/form-data.cpp Outdated Show resolved Hide resolved
builtins/web/form-data.cpp Outdated Show resolved Hide resolved
builtins/web/form-data.cpp Outdated Show resolved Hide resolved
builtins/web/form-data.cpp Outdated Show resolved Hide resolved
// https://html.spec.whatwg.org/multipage/form-control-infrastructure.html#constructing-form-data-set
//
// Note: all uses of `create-an-entry` immediately append it, too, so that part is folded in here.
bool FormData::append(JSContext *cx, HandleObject self, std::string_view name, HandleValue value,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Leaving a comment on the overall function here, because I'm not sure were exactly inside the function body to leave it.)

The spec for this algorithm is a little bit weird, unfortunately, making it hard to understand what exactly is going on.

In particular, the sub-steps of step 3, interpreted precisely, would in some cases require first, in step 3.1, creating a new File instance with the name "blob", and then immediately, in step 3.2, creating another new File instance with filename as the name. Since creating a File instance doesn't have observable side-effects, that is technically okay, but of course wasteful. It also confuses what actually should happen, which is one of three things:

  • if value is a string, ensure that both name and value are scalar value strings, and append an entry of { name, value }, and ignore filename
  • otherwise, if value is a Blob but not a File, or if filename is given, create a new File instance and append an entry of { name, new_file }
  • otherwise, append an entry of { name, value }

I think it might make sense to restructure this function to follow that logic. In that case, I would still keep the spec step comments, but add a comment explaining the restructuring.

I know I'm giving you a lot of feedback on this one function; the reason is that the spec really is very subtle here, and I want to make sure that we both get it right, and put ourselves into a position where we can understand what's going on long after we've all forgotten this discussion :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but we still need to handle the filename in the second step and the logic for this should be something like this:

if value is a Blob but not a File, or if filename is given, then:

let new_file = match (is_file, maybe_filename) {
   (false, Some(filename)) => File(filename),
   (true, Some(filename)) => File(filename),
   (false, None) => File("blob"),
   (true, None) => unreachable!(),
};

Copy link
Contributor Author

@andreiltd andreiltd Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g: these are all the cases for bullets 2 and 3:

  const value = new Blob();
  formData.set("blob-3", value, "custom name");
  const blob3 = formData.get("blob-3");
  assert_equals(blob3.name, "custom name");
  
  const value = new File([], "name");
  formData.set("file-2", value, "custom name");
  const file2 = formData.get("file-2");
  assert_equals(file2.name, "custom name");
  
  const value = new Blob();
  formData.set("blob-1", value);
  const blob1 = formData.get("blob-1");
  assert_equals(blob1.name, "blob");

  const value = new File([], "name");
  formData.set("file-1", value);
  const file1 = formData.get("file-1");
  assert_equals(file1.name, "name");

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I papered over the filename handling, yes. What I meant to highlight is that the key choice here is whether to create a new File or not, and that whether to use "blob" or filename is a nested choice in that. I.e., I'm arguing for a different factoring from you match expression, because whether or not to create a new File makes substantially more difference than what name to use.

Overall, the cleanest structure is probably

if (!Blob::is_instance(value)) {
 // treat `value` as a string and return early
}

if (File::is_instance(value) && filename.isUndefined()) {
  // append `value` as is and return early
}

RootedValue filename_val(cx);
if (!filename.isUndefined()) {
  filename_val = // scalar version of `filename`
} else {
  filename_val = // "blob"
}

// Create the new file using `filename_val`

Does that make sense?

builtins/web/form-data.cpp Outdated Show resolved Hide resolved
Copy link
Member

@tschneidereit tschneidereit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Thank you for your patience with the long review cycles, and for getting this into great shape!

@tschneidereit tschneidereit merged commit 58ae515 into bytecodealliance:main Jan 15, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants