[Feature]: Add a way to block all cookies and site data in headless mode puppeteer/chromium #3212

Open
PapiJalopy opened this issue Feb 1, 2025 · 4 comments

Comments

@PapiJalopy

Proposed solution

I can consistently avoid getting a 403 from Newegg as long as I block all cookies and delete site data in Chromium when I first start running it.

This is a theory, so take it with a grain of salt, but I've noticed that Newegg allows you to visit their website one time from any IP. As long as you block all cookies and store none, it seems to treat every visit as your first and won't give you a captcha or a 403 on your next scrape/refresh.

The issue is that this requires headless=false, and the settings are not persistent. If you can add, or point me to, a way to make Chromium user settings persistent and have them apply in headless mode, that would be perfect.

Objective

N/A

Goals

N/A

Non-goals

N/A

Anti-goals

N/A

@andrewmackrodt
Contributor

andrewmackrodt commented Feb 1, 2025

You can try setting INCOGNITO=true so that a new browser is used for each store lookup, it shouldn't persist any cookies this way.

@PapiJalopy
Author

PapiJalopy commented Feb 1, 2025

> You can try setting INCOGNITO=true so that a new browser is used for each store lookup, it shouldn't persist any cookies this way.

Unfortunately, INCOGNITO=true does not work because it only blocks third-party cookies. Unless all cookies are blocked, the page loads but streetmerchant reports a timeout error for every listing.

With "Block all cookies" enabled, the first listing might time out, but from there it's smooth sailing.

I've been able to run it on already "banned/403'ed" IPs for days now.

If possible, I would actually suggest modifying the INCOGNITO=* option so that Chromium defaults to "Block all cookies" when enabled.

@PapiJalopy
Author

Is there an argument or a function we can add to puppeteer to do this? There has to be some kind of call to set UI settings via API.
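
One possible route, sketched here under stated assumptions, is to pre-seed a Chromium profile directory so that the "Block all cookies" content setting is already on when the browser starts, and then point Puppeteer at that directory through its `userDataDir` launch option. The preference key `profile.default_content_setting_values.cookies` with value `2` (block) is the one commonly used for this in Chrome-based automation. Whether the Chromium build used by streetmerchant honors it in headless mode has not been verified in this thread. The helper name `seedBlockCookiesProfile` is hypothetical.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Content-setting value assumed to mean "block" (1 = allow, 2 = block).
const CONTENT_SETTING_BLOCK = 2;

/**
 * Hypothetical helper: writes a Preferences file into <userDataDir>/Default
 * so the profile starts with cookies blocked. Any preferences already in the
 * file are preserved; only the cookie content setting is changed.
 */
function seedBlockCookiesProfile(userDataDir: string): string {
  const profileDir = path.join(userDataDir, "Default");
  const prefsPath = path.join(profileDir, "Preferences");
  fs.mkdirSync(profileDir, { recursive: true });

  // Start from the existing preferences if the profile was used before.
  let prefs: Record<string, any> = {};
  if (fs.existsSync(prefsPath)) {
    prefs = JSON.parse(fs.readFileSync(prefsPath, "utf8"));
  }

  prefs.profile = prefs.profile ?? {};
  prefs.profile.default_content_setting_values =
    prefs.profile.default_content_setting_values ?? {};
  prefs.profile.default_content_setting_values.cookies = CONTENT_SETTING_BLOCK;

  fs.writeFileSync(prefsPath, JSON.stringify(prefs));
  return prefsPath;
}

// Example: seed a throwaway profile directory in the system temp folder.
const userDataDir = fs.mkdtempSync(path.join(os.tmpdir(), "chromium-profile-"));
const prefsPath = seedBlockCookiesProfile(userDataDir);
```

The seeded directory could then be passed to Puppeteer as `puppeteer.launch({ headless: true, userDataDir })`. Because the setting lives in the profile on disk, it should survive restarts as long as the same directory is reused.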

@PapiJalopy PapiJalopy changed the title from "[Feature]: Add a way to block all cookies and site data in headless mode chromium" to "[Feature]: Add a way to block all cookies and site data in headless mode puppeteer/chromium" on Feb 1, 2025
@EbonyEquivocal

EbonyEquivocal commented Feb 2, 2025

I get an ERR_HTTP2_PROTOCOL_ERROR when checking Nvidia store stock. I think this would be resolved if I could clear all cookies and the cache. I have tried incognito mode, with no luck.
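
For reference, clearing cookies and the HTTP cache can be sketched with two Chrome DevTools Protocol commands, `Network.clearBrowserCookies` and `Network.clearBrowserCache`. The `CdpSender` interface and the `clearCookiesAndCache` helper below are hypothetical stand-ins for a real CDP session (for example, the one Puppeteer returns from `page.createCDPSession()`), and whether this resolves the ERR_HTTP2_PROTOCOL_ERROR is untested.

```typescript
// Hypothetical minimal shape of a CDP session: anything with a `send` method
// that takes a protocol method name and optional parameters.
interface CdpSender {
  send(method: string, params?: Record<string, unknown>): Promise<unknown>;
}

/**
 * Hypothetical helper: asks the browser to drop all cookies, then the HTTP
 * cache, via the DevTools Protocol. Commands are sent one after the other.
 */
async function clearCookiesAndCache(session: CdpSender): Promise<void> {
  await session.send("Network.clearBrowserCookies");
  await session.send("Network.clearBrowserCache");
}
```

With Puppeteer this might be called as `await clearCookiesAndCache(await page.createCDPSession())` before each store lookup. A type cast may be needed, since Puppeteer's own session type is stricter than this stand-in.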
