Search: truncate contents before indexing #12146

Merged
stsewd merged 1 commit into main from truncate-content-before-indexing
Apr 30, 2025

Conversation

stsewd
Member

@stsewd stsewd commented Apr 29, 2025

Our search index is growing fast, mostly because we support all types of input, not just Sphinx and MkDocs. This means that even malformed content may be indexed, so we end up indexing invalid or overly long content.
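
For illustration only, the idea in the title (truncating content before it is indexed) might look roughly like the sketch below. This is not the code from this PR: the `truncate_content` helper and the `MAX_CONTENT_LENGTH` value are hypothetical names and numbers chosen for the example, and the real limit and the place where it is applied are not stated in this conversation.

```python
# Hypothetical limit, chosen only for this example; the actual value is not given here.
MAX_CONTENT_LENGTH = 1000


def truncate_content(text: str, max_length: int = MAX_CONTENT_LENGTH) -> str:
    """Return ``text`` cut to at most ``max_length`` characters.

    When possible, the cut falls on the last space inside the limit,
    so the indexed content does not end in the middle of a word.
    """
    if len(text) <= max_length:
        return text
    cut = text[:max_length]
    # Drop the trailing partial word if there is a space to cut at.
    head, _, _ = cut.rpartition(" ")
    # Fall back to a hard cut when there is no usable space.
    return head or cut
```

A caller would apply this to each section's text just before sending it to the index, so that a single malformed or very long page cannot add an unbounded amount of content.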

@stsewd stsewd requested a review from a team as a code owner April 29, 2025 23:09
@stsewd stsewd requested a review from ericholscher April 29, 2025 23:09
Member

@humitos humitos left a comment

This looks good.

As Eric mentioned in Slack, I think we should also skip malformed documents. This could be included in this PR or in another one.

@stsewd
Member Author

stsewd commented Apr 30, 2025

The lib we are using doesn't have error detection.

@stsewd stsewd merged commit dc79f43 into main Apr 30, 2025
5 checks passed
@stsewd stsewd deleted the truncate-content-before-indexing branch April 30, 2025 15:13
stsewd added a commit that referenced this pull request Apr 30, 2025
Our search index is growing fast, mostly because we support all
types of input, not just Sphinx and MkDocs. This means that even
malformed content may be indexed, so we end up indexing invalid or
overly long content.