Increase search table size(s) #123
Hello, apologies for a late reply, I was busy on other projects the past months. This is one of the hard problems, heh. I'm not exactly happy about unconditionally expanding the fields to 32 bits, as that will significantly increase the size of the search data for projects that don't need that many symbols (for example, with Magnum I'm at over 1.3 MB with 12k symbols, and even with that the initial load times are starting to become problematic). One option could be to support both 16- and 32-bit sizes in a single format, depending on how much is needed. How many symbols do you have in your case, and what's the size of the generated search data file?
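A minimal sketch of what such a dual-width format could look like, purely illustrative and not the project's actual binary layout: record the chosen field width in a one-byte header and pick 16 or 32 bits based on the largest symbol ID.

```python
# Purely illustrative, not the project's actual search data layout: pick the
# ID width from the symbol IDs and store that choice in a one-byte header,
# so small projects keep 16-bit fields and only large ones pay for 32 bits.
import struct

def pack_ids(symbol_ids):
    width = 2 if max(symbol_ids) < 0x10000 else 4    # bytes per ID
    fmt = '<H' if width == 2 else '<I'
    out = bytearray(struct.pack('<B', width))        # header: field width
    for i in symbol_ids:
        out.extend(struct.pack(fmt, i))
    return bytes(out)

def unpack_ids(data):
    width = data[0]
    fmt = '<H' if width == 2 else '<I'
    count = (len(data) - 1) // width
    return [struct.unpack_from(fmt, data, 1 + k * width)[0] for k in range(count)]
```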
Can we use variable-length integer encoding, like https://developers.google.com/protocol-buffers/docs/encoding#varints? I will investigate sizes, I have to fish that project back out of whatever hole I left it in. Still very interested in getting this working, though.
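For reference, a minimal LEB128-style varint encoder/decoder looks like the following; this is purely illustrative and not part of the project's current format.

```python
# Minimal varint sketch, as described in the protobuf encoding docs; purely
# illustrative and not something _search.py or search.js uses today.
def encode_varint(n):
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)    # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data, offset=0):
    result, shift = 0, 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset      # decoded value and next position
        shift += 7

assert encode_varint(300) == b'\xac\x02'
assert decode_varint(encode_varint(300)) == (300, 2)
```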
Making JavaScript code robust is hard, so I'm trying to have as little JS as possible :) Variable-length encoding is possible, but the client side would need to unpack it first, which could offset the savings from faster downloads. I was personally thinking about pre-processing the data with, for example, the Burrows–Wheeler transform, but there's again the problem of having to decode it back on the client side. I'd bet more on server-side compression, which is transparent to the client; gzip already does quite a good job (the 1.3 MB file gets compressed to about 700 kB by gzip on transfer) and more advanced compression schemes (brotli, zstd) could do even better. Curious: is the project public?
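As a rough way to see what transparent server-side compression would buy, one can gzip the generated file offline and compare sizes; the file name below is just a placeholder for whatever search data file the generator emits.

```python
# Rough estimate of the transfer size with gzip compression enabled on the
# server; the file name is a placeholder, substitute the actual output file.
import gzip

path = 'searchdata.bin'                  # placeholder name
raw = open(path, 'rb').read()
packed = gzip.compress(raw, compresslevel=9)
print(f'{len(raw)} bytes raw -> {len(packed)} bytes gzipped '
      f'({100 * len(packed) / len(raw):.0f}%)')
```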
I want to generate documentation for radare2. It is quite large, but the current documentation "solution" is sub-par.
Additionally: could you help correct (or rewrite) the patch to use 32 bits, if it's not a lot of up-front work, so I can just give this a go and see if it is even worth the time?
Done in a branch. The resulting file size is about 10% larger; it's not as bad as I expected, but I'm also not entirely happy about it. Will keep this in a branch until I get an idea how to solve it differently.
Just got a chance to try it out, it's snappy and looks beautiful. Thank you for your work! The search reports: You can view it here temporarily (commit is HEAD): http://sdf.org/~keegan/radare2-89cfe05d2/index.html
Nice :) The search data download took a few more seconds than would be acceptable for me. You're just above the 65k limit, so here are a couple of ideas that could possibly trim down the search data size:
I will note that there is no gzip compression enabled on that page. Aside from that, this is good advice that I will investigate... does this exist in the FAQ or Quick Start Guide? I imagine others would find it useful.
The first point needs changes on my side (there's nothing that could filter those out currently), and for the other two — I don't recommend enabling that option.
Yeah, I understand that one. I will continue to explore the first two.
(Sorry for embarrassingly late replies, finally got a chance to get back to this project.) This is finally fixed as of b0cf44e and 0411e18. I ended up making the type sizes variable, so small projects still keep small search data binaries, while larger projects can expand the limits where needed. This can raise the default limits of 65k symbols and 16 MB files up to 4G for both. It unfortunately isn't automagic, as the data packing defaults are picked to keep file sizes implicitly small, and estimating the sizes beforehand would be overly error-prone. Instead, if the limits are hit, the doc generation aborts with a message suggesting which configuration options to modify.
Add the suggested options to your configuration and regenerate. With this being implemented, I'm going to drop the branch mentioned above.
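A hedged sketch of that general approach follows; the names are invented for illustration and are not the project's actual configuration keys. The idea is to pack values with a configurable byte width and fail loudly with a hint as soon as a value no longer fits, instead of silently overflowing a 16-bit field.

```python
# Hedged sketch only -- names are made up, not the project's actual options.
# Values are packed with a configurable byte width; when one no longer fits,
# generation raises with a hint pointing at the relevant configuration option.
import struct

def pack_value(value, nbytes=2):
    limit = 256 ** nbytes
    if value >= limit:
        raise OverflowError(
            f'value {value} exceeds the {nbytes}-byte limit of {limit}; '
            'raise the corresponding size option in the configuration')
    if nbytes == 3:                      # 24-bit width has no struct format code
        return value.to_bytes(3, 'little')
    return struct.pack({1: '<B', 2: '<H', 4: '<I'}[nbytes], value)
```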
When running this tool on a large C codebase, I was experiencing errors with values out of bounds of the size of a short, so I modified _search.py to create a table with larger bounds. It seems that the changes required for search.js are somewhat less trivial, since that code is a little more all over the place (hardcoded buffer seek offsets, etc.). I am wondering if this is the correct way to proceed with this issue, and if so, what changes are required to synchronize the search code with this expanded data format?
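The kind of change involved is roughly the following; this is illustrative only, as the real _search.py record layout and the JavaScript reader differ. Widening the struct format characters changes the record size, so every reader-side seek offset has to be derived from the same format rather than hardcoded.

```python
# Illustrative only -- not the actual _search.py record layout. Widening the
# format characters doubles the record size, so the reader must compute its
# seek offsets from the same format string instead of using fixed constants.
import struct

RESULT_FMT_16 = '<HH'   # original: two unsigned 16-bit fields (max 65535)
RESULT_FMT_32 = '<II'   # widened: two unsigned 32-bit fields

def write_result(buf, symbol_id, file_offset, fmt=RESULT_FMT_32):
    buf.extend(struct.pack(fmt, symbol_id, file_offset))

def read_result(buf, index, fmt=RESULT_FMT_32):
    record_size = struct.calcsize(fmt)   # 8 with '<II', 4 with '<HH'
    return struct.unpack_from(fmt, buf, index * record_size)
```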