Web scraper for user reviews on Alternativa Teatral
Download the binary from the Releases section and run:
./scrape <URL> -o <OUTPUT_FILE>
Example:
$ ./scrape https://www.alternativateatral.com/opiniones65140-sex-vivi-tu-experiencia
The results are saved in JSONL format.
{"date": "25/04/2025 17:08", "author": "Patricia", "rating": "5", "text": "Excelente! Súper recomendable, un espectáculo diferente!"}
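Since each line of the output is an independent JSON object, the file can be loaded with the standard library alone. The following is a minimal sketch, not part of the project: the helper names `load_reviews` and `average_rating` are hypothetical, and it assumes every record has the `rating` field as an integer string, as in the sample line above.

```python
import json


def load_reviews(path):
    """Read a JSONL file: one JSON object per line, blank lines skipped."""
    reviews = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                reviews.append(json.loads(line))
    return reviews


def average_rating(reviews):
    """Mean of the integer ratings, or None when there are no reviews."""
    # Assumption: "rating" is stored as an integer string such as "5".
    ratings = [int(review["rating"]) for review in reviews]
    return sum(ratings) / len(ratings) if ratings else None
```

For example, `average_rating(load_reviews("reviews.jsonl"))` would give the mean rating for a file written with `-o reviews.jsonl` (the file name here is only illustrative).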
Install the dependencies from requirements.txt and run:
$ python src/scrape.py <URL> -o <OUTPUT_FILE>
For development/packaging, create the Conda environment:
$ conda env create -f environment.yml
$ conda activate alternativa
- I suspect the Alternativa Teatral website has a limit of 999 pages per play. At 7 comments per page, this would represent a maximum of 6,993 reviews (999 × 7).
- Although the site allows rating a play in half-star increments, the script only captures the integer part of the rating.
- To enable packaging into a binary, SSL certificate verification was disabled. This has some security implications. An alternative would be to bundle cacert.pem alongside the binary.
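The cacert.pem alternative mentioned in the last note could look roughly like the sketch below. It is not how the script currently works. It assumes the binary is built with PyInstaller, which exposes the directory of unpacked data files as `sys._MEIPASS` at runtime; the helper names `bundled_path` and `make_ssl_context` are hypothetical. If the bundled file is missing, the sketch falls back to the system trust store instead of disabling verification.

```python
import ssl
import sys
from pathlib import Path


def bundled_path(name):
    """Locate a data file shipped with the program."""
    # Assumption: a PyInstaller build, where sys._MEIPASS points at the
    # unpacked bundle. Otherwise, look next to the running script.
    base = getattr(sys, "_MEIPASS", None)
    if base is None:
        base = Path(sys.argv[0]).resolve().parent
    return Path(base) / name


def make_ssl_context(cafile=None):
    """Build a verifying SSL context, preferring the bundled CA file."""
    cafile = Path(cafile) if cafile else bundled_path("cacert.pem")
    if cafile.is_file():
        return ssl.create_default_context(cafile=str(cafile))
    # Fallback: system CA store, with verification still enabled.
    return ssl.create_default_context()
```

The resulting context can be passed to any client that accepts one, for example `urllib.request.urlopen(url, context=make_ssl_context())`. Which HTTP library the scraper actually uses is not stated here, so the wiring would need to be adapted.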