Feature Request: Fine-tuned selection of including/excluding specific code block when --compress
(i.e. Function Body code block)
#561
Comments
Hi, @atjsh! I may not have fully grasped the issue, but are you proposing to introduce stages to --compress?
To make it more concrete, just as an example, you'd allow passing a compression level to that option.
I've been planning to add variations to the compression levels, so this sounds great!
As we discussed on Discord, using llmlingua for compression could raise some challenges, but let's take it one step at a time.
Idea
Yes, that is my current idea. Let's say we "compress" this source code.

function addValues(left: number, right: number): number {
  console.log(left);
  console.log(left);
  console.log(left);
  console.log(left);
  console.log(left);
  console.log(left);
  console.log(right);
  console.log(right);
  console.log(right);
  console.log(right);
  console.log(right);
  console.log(right);
  return left + right;
}

I want to tell the LLM that:
// structure - kept as-is
function addValues(left: number, right: number): number { }

// semantic - some contents are removed
console.log(left)
console.log(left)
console.log(right)
return left + right

Combined result:

function addValues(left: number, right: number): number {
  console.log(left)
  console.log(left)
  console.log(right)
  return left + right
}

Methodology for source-code (text) compression
My current idea for compressing function bodies is LLMLingua-2. Compared to, for example, Llama 3, it is small enough to run locally. Also, when summarizing text, LLMLingua-2 does not "generate" new tokens that are not present in the original input, so it avoids the hallucination problem. (source)

Should I split the issue?
It's getting quite big haha.
Or, we "could" plan and implement the new compression pipeline right away. It is kinda risky tho, so it might be treated as an experimental feature. What do you think?
Compressing a whole source-code file at once (import statements, function signatures, function implementations, everything) could be another method too.
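To make the structure/semantic split from the addValues example concrete, here is a minimal sketch. It is only an illustration: `compressFunction`, `dropRepeatedLines`, and `BodyCompressor` are hypothetical names, not Repomix or LLMLingua-2 APIs. The signature is found with naive brace matching rather than tree-sitter, and the body compressor is a simple stand-in that drops repeated lines. Like the approach described above, it is extractive: it only removes lines and never produces text that was not in the input.

```typescript
// Hypothetical names for illustration; none of these are Repomix or LLMLingua-2 APIs.
type BodyCompressor = (bodyLines: string[]) => string[];

// Extractive stand-in for a learned compressor: it only drops lines,
// so every line in the output also exists in the input.
function dropRepeatedLines(maxRepeat: number): BodyCompressor {
  return (bodyLines) => {
    const kept: string[] = [];
    let run = 0;
    for (const line of bodyLines) {
      const prev = kept[kept.length - 1];
      // Count how many times the same line has appeared in a row.
      run = prev !== undefined && prev.trim() === line.trim() ? run + 1 : 1;
      if (run <= maxRepeat) kept.push(line);
    }
    return kept;
  };
}

// Keeps the signature (structure) as-is and compresses only the body (semantic).
// Assumption: the source is a single function and the first "{" opens its body.
// A real implementation would take these ranges from tree-sitter nodes instead.
function compressFunction(source: string, compressBody: BodyCompressor): string {
  const open = source.indexOf("{");
  const close = source.lastIndexOf("}");
  if (open === -1 || close <= open) return source; // nothing to split
  const signature = source.slice(0, open + 1);
  const bodyLines = source
    .slice(open + 1, close)
    .split("\n")
    .filter((line) => line.trim() !== "");
  return [signature, ...compressBody(bodyLines), "}"].join("\n");
}

const source = [
  "function addValues(left: number, right: number): number {",
  "  console.log(left);",
  "  console.log(left);",
  "  console.log(left);",
  "  console.log(right);",
  "  console.log(right);",
  "  return left + right;",
  "}",
].join("\n");

const compressed = compressFunction(source, dropRepeatedLines(1));
console.log(compressed);
```

Swapping `dropRepeatedLines` for a model-backed compressor would only require another function of the same `BodyCompressor` shape, which is why the body step is passed in as a parameter.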
Topic: repomix --compress, LLM-based code compression
Motivation
Generate richer context
When I run --compress in Repomix, I'd like the output to capture not only symbol information but also the implementation details of functions.
My use case is to bundle an entire source-code repository into a single file, feed it to an LLM, and let the model analyze the codebase holistically.
Because many insights depend on what happens inside a function, simply exporting the symbols wasn't enough for me—I need the function bodies in the compressed output as well.
Giving users control over which code snippets get included would provide much greater flexibility.
Apply custom pre-processing to code blocks identified via tree-sitter
With tree-sitter we can pinpoint distinct AST nodes—such as a function’s name and its body. If we could:
then we’d keep the static-analysis benefits of full symbol visibility and supply the language model with far richer context.
Ideally, each code block (e.g., function name, function body) would have its own configurable processing pipeline.
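As a rough sketch of what a per-block pipeline could look like: each block kind maps to its own list of processing steps, and a kind with no configured steps passes through unchanged. All names here (`BlockKind`, `PipelineConfig`, `processBlocks`, and the two example steps) are hypothetical and are not part of Repomix's configuration or tree-sitter's API. The blocks are assumed to have already been extracted from the AST.

```typescript
// Hypothetical shapes for illustration; not Repomix's config or tree-sitter's API.
type BlockKind = "functionName" | "functionBody";
type Step = (text: string) => string;

interface CodeBlock {
  kind: BlockKind;
  text: string;
}

// Each block kind gets its own ordered list of steps; a missing kind means "keep as-is".
type PipelineConfig = Partial<Record<BlockKind, Step[]>>;

function processBlocks(blocks: CodeBlock[], config: PipelineConfig): CodeBlock[] {
  return blocks.map((block) => {
    const steps = config[block.kind] ?? [];
    // Run the steps in order, feeding each step's output into the next one.
    const text = steps.reduce((acc, step) => step(acc), block.text);
    return { ...block, text };
  });
}

// Example steps. Both are deliberately naive: stripLineComments would also
// cut a "//" that appears inside a string literal.
const stripLineComments: Step = (text) =>
  text
    .split("\n")
    .map((line) => line.replace(/\/\/.*$/, ""))
    .join("\n");

const dropBlankLines: Step = (text) =>
  text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .join("\n");

const blocks: CodeBlock[] = [
  { kind: "functionName", text: "addValues" },
  { kind: "functionBody", text: "  // log the input\n  console.log(left);\n\n  return left + right;" },
];

// Only function bodies are processed; function names have no steps configured.
const processed = processBlocks(blocks, {
  functionBody: [stripLineComments, dropBlankLines],
});
console.log(processed);
```

An LLM-based compressor could be plugged in as just another `Step` for `functionBody`, although a real one would probably need to be asynchronous, which this synchronous sketch does not cover.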
Related issues
--summary CLI option for LLM-based code summarization #511 discusses LLM-based summarization.
More Notes
I am currently developing llmlingua-2-js, a JavaScript port of LLMLingua-2. Once it is finished, developers will be able to run LLM-based code compression seamlessly in the same Node.js environment that Repomix uses.
I plan to open a separate PR for llmlingua-2-js when it is ready.