Investigate reducing memory usage of chunk uploading #2291
Apparently chunk uploading's large memory usage is causing problems for the Ingest team. Thanks @IanWoodard and @asottile-sentry for bringing this to my attention.
Just so it's written down: my hypothesis is that it might not be the chunk uploading, but the preprocessing / loading of the debug files themselves. Probably the easiest way to get to the bottom of it would be to use a Rust memory profiler to better understand where the overhead is happening. In theory, a proper streaming uploader shouldn't need much more than the chunk size of total memory to perform a successful upload (on the order of MBs, ideally), though we're running out of multi-GB headroom at the moment.
With the memory profiler, I discovered that what appears to be eating memory is indeed the chunk uploading process. The reason we consume much more than the chunk size is that we send multiple 8 MiB chunks at once (we group chunks into requests of up to 32 MiB). Additionally, due to how the curl library works, we need to allocate twice that amount of memory per request (64 MiB). We also use 8 threads to upload requests concurrently, yielding a theoretical memory usage of 64 MiB * 8 = 512 MiB. In reality, we use even more than this (I observed close to 900 MB in my tests while uploading a ~500 MB file), likely due to overheads. We can greatly reduce memory usage by reducing the request and chunk sizes: setting both to 1 MiB yielded a maximum memory usage of ~130 MB in my testing (again, with a ~500 MB file), with only a slight slowdown.
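The arithmetic above can be written out directly. The constant names below are illustrative, not sentry-cli internals:

```python
# Back-of-the-envelope model of the peak memory described above.
# Names are illustrative; all sizes are in MiB.
CHUNK_SIZE = 8          # one upload chunk
MAX_REQUEST_SIZE = 32   # chunks are batched into requests up to this size
CURL_COPY_FACTOR = 2    # curl effectively buffers the request body a second time
THREADS = 8             # concurrent upload threads

peak_per_request = MAX_REQUEST_SIZE * CURL_COPY_FACTOR
theoretical_peak = peak_per_request * THREADS
print(f"{peak_per_request} MiB per request, {theoretical_peak} MiB total")
# → 64 MiB per request, 512 MiB total
```

The observed ~900 MB exceeds this 512 MiB figure, consistent with additional per-request and allocator overhead on top of the raw buffers.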
During an offline discussion with @jan-auer and @mitsuhiko, we decided that we will not adjust the chunk and request sizes from their current settings, due to the high risk associated with changing them and because the more robust solution is to simply stream the chunks, avoiding the need to ever load them into memory in the first place. We will divide this work into two stages:
We may want to consider moving the rest of the code to
Or, perhaps an easier option (thanks @mitsuhiko for pointing this out) would be to write our own logic to generate the multipart form (the logic should be trivial), which we can then stream. We can still consider changing to
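As a minimal sketch of what hand-rolled multipart streaming could look like (in Python for illustration only; sentry-cli is Rust, the `stream_multipart` helper and field names are hypothetical, and a real implementation must follow RFC 7578 and advertise the boundary in the Content-Type header):

```python
import io
import uuid

def stream_multipart(fields, chunk_size=8 * 1024 * 1024):
    """Yield a multipart/form-data body piece by piece.

    `fields` maps field names to file-like objects. At most one
    `chunk_size` read is held in memory at a time.
    """
    boundary = uuid.uuid4().hex
    for name, fileobj in fields.items():
        yield (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{name}"\r\n'
            f"Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        while True:
            piece = fileobj.read(chunk_size)
            if not piece:
                break
            yield piece
        yield b"\r\n"
    yield f"--{boundary}--\r\n".encode()

# For demonstration only: a real uploader would feed the generator to the
# HTTP client rather than joining it into a single buffer.
body = b"".join(stream_multipart({"chunk1": io.BytesIO(b"hello")}))
```

Because the generator only ever holds one read's worth of data, peak memory stays near the chunk size regardless of total upload size.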
Is there any way to log the memory usage of uploading? I've spent a ridiculous amount of time trying to figure out why some chunks of my debugging files are not being uploaded and
The memory usage doesn't seem to be related to the chunk uploading but, as @asottile-sentry guessed, it's just how the debug files are loaded. To confirm this, set up some dummy data files:
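The original setup commands were lost from this thread; a hypothetical equivalent that creates dummy files in small writes (so the generator itself stays cheap) could look like:

```python
import os

def make_dummy_file(path, size_bytes, piece=1024 * 1024):
    """Write `size_bytes` of zeros to `path`, at most `piece` bytes at a time."""
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(piece, remaining)
            f.write(b"\0" * n)
            remaining -= n

# Small sizes here; scale up to hundreds of MB to reproduce the observation
# that memory usage grows with file size.
for mb in (8, 16):
    make_dummy_file(f"dummy_{mb}mb.bin", mb * 1024 * 1024)
```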
Now run debug-files check with memory profiling:
As you can see, the memory usage grows with the file size. On my MacBook, processes have no memory limit, and sentry-cli has no problems uploading chunks. My guess is that there is a memory limit when running from a GitHub Action, so the limit is reached and this is not being caught/logged by sentry-cli (oh the irony...).
@xinsight thank you for writing in. Regarding your first comment, we do not plan to add any logging of memory usage, since there are external tools you can use to monitor it. Regarding the second comment, I see which profiler you are using; I confirmed the result with another memory profiler.
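As one example of such an external measurement (not necessarily the profiler used in this thread), a process's peak RSS can be read from the POSIX `getrusage` interface, here via Python's stdlib; note that the `ru_maxrss` unit differs by platform:

```python
import resource
import sys

# Peak resident set size of the current process.
# Unit caveat: ru_maxrss is kilobytes on Linux but bytes on macOS.
usage = resource.getrusage(resource.RUSAGE_SELF)
scale = 1 if sys.platform == "darwin" else 1024
peak_bytes = usage.ru_maxrss * scale
print(f"peak RSS: {peak_bytes / (1024 * 1024):.1f} MiB")
```

Note that RSS is what the OS has resident for the process, which can differ from the allocator's live-heap figure a dedicated memory profiler reports.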
@xinsight, if you have been experiencing an error uploading debug files that you have been having trouble debugging, please feel free to open a bug report issue. If possible, please provide a reproduction when opening the new issue.
Chunk uploading appears to have very large memory usage when uploading large files, since we appear to read the entire file contents into memory. For instance, the (currently experimental) Proguard chunk uploading feature greatly increases memory usage compared to when the feature is turned off: while profiling an upload of a 160 MB Proguard file, I observed memory usage of >300 MB with chunk uploading, vs ~90 MB with chunk uploading disabled.
As this problem likely affects all chunk uploads, not just Proguard files, we should investigate whether we can reduce memory usage, e.g. by not reading entire files into memory.
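For illustration, the streaming shape suggested above, reading and hashing one chunk at a time so peak memory stays near a single chunk, could look like the following Python sketch (sentry-cli itself is Rust, and the per-chunk SHA-1 checksum here is an assumption about the chunk-upload protocol, not a confirmed detail):

```python
import hashlib

def iter_chunks(path, chunk_size=8 * 1024 * 1024):
    """Yield (sha1_hex, chunk) pairs, holding only one chunk in memory."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield hashlib.sha1(chunk).hexdigest(), chunk
```

An uploader consuming this generator could checksum, deduplicate, and send each chunk before reading the next, instead of materializing the whole file up front.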