This project implements a Content-Based Recommendation System using the MovieLens dataset. It uses embeddings from a pre-trained language model and cosine similarity to recommend movies and predict ratings, with grid search for hyperparameter tuning.
- Content-Based Recommendation: Combines movie tags and plot summaries to create a comprehensive feature set for recommendations.
- Embedding-Based Similarity: Utilizes pre-trained models to generate movie embeddings for accurate similarity computation.
- MAE Calculation: Evaluates the accuracy of predicted ratings using Mean Absolute Error.
- Top-N Recommendations: Provides top-N movie recommendations for users and calculates hit ratios.
- GPU Acceleration: Uses PyTorch for efficient computations on GPU.
- Hyperparameter Tuning: Implements grid search to optimize parameters like embedding size, learning rate, and regularization.
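
To make the embedding-based similarity feature concrete, here is a minimal sketch in plain Python. It is not the notebook code: the helper names `cosine_similarity` and `most_similar` and the toy 3-dimensional vectors are illustrative assumptions, and the project itself would run this kind of computation on PyTorch tensors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(embeddings, movie_id, k=3):
    """Return the k (movie_id, similarity) pairs closest to movie_id, best first."""
    query = embeddings[movie_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != movie_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical); real ones would come from a pre-trained model.
toy_embeddings = {
    1: [1.0, 0.0, 0.0],
    2: [0.9, 0.1, 0.0],
    3: [0.0, 1.0, 0.0],
    4: [0.0, 0.0, 1.0],
}
neighbours = most_similar(toy_embeddings, movie_id=1, k=2)
```

Here movie 2 ranks first for movie 1 because its vector points in almost the same direction; the other two movies are orthogonal to it and score zero.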
The MovieLens dataset is sourced from GroupLens. It includes files such as ratings.csv, movies.csv, tags.csv, and links.csv. The OMDb API was used to fetch the plot summaries stored in plots.csv.
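
As a rough illustration of how these files could be combined into one text feature per movie, the sketch below groups tags by movie and appends the plot. The `userId,movieId,tag,timestamp` header matches the standard MovieLens tags.csv; the `movieId` and `plot` column names for plots.csv, the cleaning rules, and the function names are assumptions made for the example, not the notebook's actual code.

```python
import csv
import io
import re
from collections import defaultdict


def clean_text(text):
    """Lowercase, replace non-alphanumeric characters, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def build_features(tags_file, plots_file):
    """Map movieId -> cleaned text made of the movie's tags followed by its plot."""
    tags = defaultdict(list)
    for row in csv.DictReader(tags_file):
        tags[row["movieId"]].append(row["tag"])

    features = {}
    for row in csv.DictReader(plots_file):
        movie_id = row["movieId"]
        combined = " ".join(tags.get(movie_id, []) + [row["plot"]])
        features[movie_id] = clean_text(combined)
    return features


# In-memory stand-ins for tags.csv and plots.csv (hypothetical rows).
tags_csv = io.StringIO(
    "userId,movieId,tag,timestamp\n"
    "1,10,Space Opera,0\n"
    "2,10,robots,0\n"
)
plots_csv = io.StringIO(
    "movieId,plot\n"
    '10,"A farm boy joins a rebellion."\n'
    '20,"A quiet drama!"\n'
)
features = build_features(tags_csv, plots_csv)
```

With real files, the two `io.StringIO` objects would be replaced by `open("data/tags.csv", newline="")` and `open("plots.csv", newline="")`. Note that this sketch only keeps movies that have a plot row; movies with tags but no plot are dropped.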
- Dataset Preparation: Merge tags and plot summaries into a single feature column.
- Cleaning: Normalize and clean text data.
- Embedding Generation: Create movie embeddings using a pre-trained language model.
- Grid Search: Optimize hyperparameters for the recommendation system.
- Rating Prediction: Predict user ratings for movies and evaluate using Mean Absolute Error (MAE).
- Recommendation Evaluation: Generate Top-N recommendations and calculate the hit ratio.
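
The rating prediction and MAE steps above can be sketched as follows. The source does not spell out the prediction rule, so this example assumes a common content-based choice: the predicted rating is the similarity-weighted average of the ratings the user has already given. The function names and the `default` fallback value are hypothetical.

```python
def predict_rating(user_ratings, similarities, default=3.0):
    """Similarity-weighted mean of a user's known ratings.

    user_ratings: {movie_id: rating given by the user}
    similarities: {movie_id: similarity between that movie and the target movie}
    default: value returned when no rated movie has positive similarity (assumption)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for movie_id, rating in user_ratings.items():
        sim = similarities.get(movie_id, 0.0)
        if sim > 0.0:  # ignore unrelated or negatively related movies
            weighted_sum += sim * rating
            weight_total += sim
    return weighted_sum / weight_total if weight_total > 0.0 else default


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted ratings."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy example: the user rated movies 1 and 2; similarities are to the target movie.
prediction = predict_rating({1: 5.0, 2: 3.0}, {1: 0.75, 2: 0.25})
mae = mean_absolute_error([4.0, 3.0], [4.5, 2.5])
```

In the toy example the prediction leans toward the rating of movie 1 because it is three times as similar to the target as movie 2.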
- Grid Search Notebook: Tune hyperparameters for optimal performance.
- Content-Based Recommendation Notebook: Train, evaluate, and generate recommendations.
jupyter notebook grid_search.ipynb
jupyter notebook Content_Based_Recommendation.ipynb
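
For readers who want a feel for what the grid search notebook does before opening it, here is a minimal sketch of an exhaustive grid search. The candidate values and the `toy_evaluate` scoring function are placeholders invented for illustration; in the project the scoring step would train the model and return a validation metric such as MAE.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and keep the lowest-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params):
    """Stand-in for 'train and return validation MAE' (hypothetical)."""
    return (
        abs(params["learning_rate"] - 0.01)
        + abs(params["regularization"] - 0.1)
        + abs(params["embedding_size"] - 64) / 1000
    )


# Hypothetical candidate values for the parameters named in this README.
param_grid = {
    "embedding_size": [32, 64, 128],
    "learning_rate": [0.001, 0.01, 0.1],
    "regularization": [0.0, 0.1],
}
best_params, best_score = grid_search(param_grid, toy_evaluate)
```

The number of evaluations is the product of the list lengths (18 here), so the grid should stay small when each evaluation involves training.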
- MAE: Achieved an average MAE of 0.66 on the rating prediction task.
- Hit Ratio: Evaluated hit ratios for Top-N recommendations (N=10) with a robust evaluation setup.
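
The exact hit-ratio protocol is not described here, so the sketch below assumes one common definition: one movie per user is held out, and a hit is counted when that movie appears in the user's top-N list. The function name and the toy data are hypothetical.

```python
def hit_ratio(recommendations, held_out, n=10):
    """Fraction of users whose held-out movie appears in their top-n list.

    recommendations: {user_id: movie ids ranked best first}
    held_out: {user_id: the single movie id hidden from the model}
    """
    if not held_out:
        return 0.0
    hits = sum(
        1
        for user_id, movie_id in held_out.items()
        if movie_id in recommendations.get(user_id, [])[:n]
    )
    return hits / len(held_out)


# Toy data: three users, three ranked recommendations each.
recommendations = {"u1": [5, 7, 9], "u2": [1, 2, 3], "u3": [4, 6, 8]}
held_out = {"u1": 7, "u2": 9, "u3": 8}
ratio_at_2 = hit_ratio(recommendations, held_out, n=2)
ratio_at_3 = hit_ratio(recommendations, held_out, n=3)
```

With N=2 only user `u1` gets a hit; widening to N=3 also catches `u3`, which shows why the ratio can only stay the same or grow as N increases.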
movielens-recommender/
├── data/ # Place MovieLens dataset files here
├── grid_search.ipynb # Hyperparameter tuning with grid search
├── Content_Based_Recommendation.ipynb # Content-based recommendation system
├── requirements.txt # Python dependencies
├── README.md # Project description
└── plots.csv # Plot summaries of all the movies
- Python 3.8+
- Pandas, NumPy
- PyTorch
- SentenceTransformers
- scikit-learn
- tqdm