Unify HTTP fingerprinting across framework components #1081
Labels
debt
Code quality improvement or decrease of technical debt.
solutioning
The issue is not being implemented but only analyzed and planned.
t-tooling
Issues with this label are in the ownership of the tooling team.
Background
Currently, our approach to HTTP fingerprinting is fragmented across different components. This leads to potential inconsistencies where, for example, HTTP headers might not align with TLS fingerprints or device characteristics, making our scrapers easier to detect. Furthermore, tracking down the code responsible for various parts of the fingerprinting functionality is difficult.
Objective
Create a unified approach to HTTP fingerprinting across all Crawlee components to produce more realistic and consistent scraper behavior. This will be ported to the JavaScript version of Crawlee as part of v4.
Proposed Solution
Create a FingerprintProfile data structure that encapsulates: [...]

Integrate this structure across Crawlee components:
- [...] FingerprintProfile instance in the API responsible for handling individual requests
- FingerprintProfile should probably be included in the Session objects
- How FingerprintProfile is generated should be configurable, ideally in a way that allows adding custom code

@Pijukatel @vdusek @B4nan