flight-price-prediction
SDAIA Bootcamp project 2 - web scraping/linear regression.
This project aims to predict ticket prices for upcoming flights, helping customers choose the best time to travel and the cheapest flight to their destination. A random forest regression model, trained on flight data scraped from Kayak, is used to forecast the prices.
Project Proposal
The project proposal can be found here.
Project MVP
The project MVP can be found here.
Scraping
The Kayak Scraper Notebook can be found here.
Here's a demo of the scraper in action (played at 2x speed):

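This README doesn't describe how the notebook drives Kayak. As a rough sketch of one likely first step, here is how one search URL per route and travel date could be generated. It assumes Kayak's `/flights/<ORIGIN>-<DEST>/<YYYY-MM-DD>` URL pattern, and the airport codes, start date and helper name `build_search_urls` are all hypothetical, because the actual 4 sources and 4 destinations are not listed here.

```python
from datetime import date, timedelta
from itertools import product

# Hypothetical airport codes: the README does not name the 4 sources
# and 4 destinations actually used in the project.
SOURCES = ["RUH", "JED", "DMM", "MED"]
DESTINATIONS = ["DXB", "CAI", "IST", "LHR"]

# Assumed Kayak search-URL pattern: /flights/<ORIGIN>-<DEST>/<YYYY-MM-DD>
BASE_URL = "https://www.kayak.com/flights/{src}-{dst}/{day}"


def build_search_urls(sources, destinations, start, n_days):
    """Return one one-way search URL per (source, destination, date) triple."""
    days = [start + timedelta(days=i) for i in range(n_days)]
    return [
        BASE_URL.format(src=src, dst=dst, day=day.isoformat())
        for src, dst, day in product(sources, destinations, days)
        if src != dst  # skip routes that start and end at the same airport
    ]


urls = build_search_urls(SOURCES, DESTINATIONS, date(2022, 1, 1), n_days=30)
print(len(urls))  # 4 * 4 * 30 = 480
print(urls[0])    # https://www.kayak.com/flights/RUH-DXB/2022-01-01
```

Loading each page and parsing the result cards is left out, since that depends on Kayak's page markup.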
The scraped data can be found here.

In total, the data consists of 55,363 rows and 7 columns.
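The 7 scraped columns aren't listed here. Kayak displays flight duration and number of stops as text (for example `5h 30m` and `1 stop`). If the scraped Duration and Total Stops columns keep that form, they must be converted to numbers before modelling. A minimal sketch, assuming those text formats, with hypothetical helper names:

```python
import re


def parse_duration(text):
    """Convert a duration string such as '5h 30m' into total minutes."""
    # Both the hours part and the minutes part are optional, but not both absent.
    match = re.fullmatch(r"\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


def parse_stops(text):
    """Convert 'nonstop', '1 stop', '2 stops', ... into an integer stop count."""
    text = text.strip().lower()
    if text == "nonstop":
        return 0
    match = re.match(r"(\d+)\s+stops?", text)
    if match is None:
        raise ValueError(f"unrecognised stops value: {text!r}")
    return int(match.group(1))


print(parse_duration("5h 30m"))  # 330
print(parse_stops("2 stops"))    # 2
```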
Analysis and Results
The project notebook can be found here.
The selected features are:
- Source (4 sources were selected for this project)
- Destination (4 destinations were selected for this project)
- Total Stops
- Average Price per Airline
- Duration

The target variable is Price.
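The README doesn't say how Average Price per Airline is computed. A plausible reading is the mean ticket price of each airline. The sketch below assumes that, and computes the averages from training rows only, so that test-set prices don't leak into the feature. Unseen airlines fall back to the overall training mean. The field names and airlines `A`/`B`/`C` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_price_per_airline(train_rows):
    """Map each airline to its mean ticket price, using training rows only."""
    prices = defaultdict(list)
    for row in train_rows:
        prices[row["airline"]].append(row["price"])
    return {airline: mean(values) for airline, values in prices.items()}


def add_airline_feature(rows, airline_avg, fallback):
    """Attach the airline average to each row; unseen airlines get `fallback`."""
    return [
        {**row, "avg_price_per_airline": airline_avg.get(row["airline"], fallback)}
        for row in rows
    ]


# Toy data with hypothetical airlines (not from the scraped dataset).
train_rows = [
    {"airline": "A", "price": 300.0},
    {"airline": "A", "price": 500.0},
    {"airline": "B", "price": 200.0},
]
test_rows = [{"airline": "B"}, {"airline": "C"}]

airline_avg = average_price_per_airline(train_rows)
overall_mean = mean(row["price"] for row in train_rows)
print(airline_avg)  # {'A': 400.0, 'B': 200.0}
print(add_airline_feature(test_rows, airline_avg, fallback=overall_mean))
```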
Correlation of features:

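The README doesn't state which correlation measure was used. If it is the usual Pearson coefficient (the default of pandas `DataFrame.corr()`), the computation for numeric columns reduces to the sketch below. The column names and values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # A constant column (zero variance) would make this divide by zero.
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of named numeric columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical toy columns, not taken from the scraped dataset.
columns = {
    "duration": [60, 120, 180, 240],
    "stops": [0, 0, 1, 1],
    "price": [100, 150, 260, 300],
}
matrix = correlation_matrix(columns)
for name in columns:
    print(f"{name:>8} vs price: {matrix[(name, 'price')]:+.2f}")
```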
Experimenting with different models:

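The README doesn't list the candidate models or the evaluation protocol. The sketch below shows one common way to run such a comparison with scikit-learn: cross-validated MAE for several regressors. The model list, the 5-fold setup and the synthetic data (a price with an interaction effect that a linear model cannot capture) are all assumptions for illustration, not the project's actual experiment.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real feature matrix: 3 numeric columns and a
# "price" driven by an interaction between the first two columns plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 3))
y = 100 + 300 * (X[:, 0] > 0.5) * (X[:, 1] > 0.5) + rng.normal(0, 5, size=500)

# Hypothetical candidate models.
models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# 5-fold cross-validated MAE (scikit-learn reports it negated, so flip the sign).
scores = {
    name: -cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error").mean()
    for name, model in models.items()
}
for name, mae in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:>17}: MAE = {mae:.2f}")
```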
The selected model is the random forest regressor, which achieved the following scores:
| Metric | Score |
|---|---|
| MAE | 61.87 |
| MSE | 40409.87 |
| RMSE | 201.02 |
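Therefore, on average, the final model's price predictions are off by about $61.87 (the MAE). The RMSE is more than three times the MAE, which suggests that a small number of predictions have much larger errors.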
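The three metrics are related: RMSE is the square root of MSE, and because errors are squared before averaging, a few large misses raise RMSE far more than MAE. The sketch below shows this with made-up prices (not project data). It also confirms that the table's RMSE is consistent with its MSE.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE and RMSE for two equal-length sequences of prices."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {"MAE": mae, "MSE": mse, "RMSE": sqrt(mse)}


# Hypothetical prices: three close predictions and one large miss.
metrics = regression_metrics([200, 300, 400, 500], [210, 290, 400, 900])
print({k: round(v, 2) for k, v in metrics.items()})
# {'MAE': 105.0, 'MSE': 40050.0, 'RMSE': 200.12}

# Consistency check against the table: sqrt(MSE) should equal the reported RMSE.
print(round(sqrt(40409.87), 2))  # 201.02
```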
The final model can be found here.

Presentation
The presentation can be found here.
Mobile App
We also developed an Android app that shows the estimated average price for a selected route and month, based on our scraped data.

Here's a demo of the mobile app:

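The app's implementation isn't included in this README. Assuming it simply averages the scraped prices per route and calendar month, the lookup table it needs could be precomputed as in the sketch below. The sketch is written in Python rather than the app's Android code, and the field names, route and prices are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_route_averages(flights):
    """Average price per (source, destination, calendar month) over scraped flights."""
    buckets = defaultdict(list)
    for flight in flights:
        key = (flight["source"], flight["destination"], flight["date"].month)
        buckets[key].append(flight["price"])
    return {key: mean(prices) for key, prices in buckets.items()}


# Hypothetical scraped rows for a single route.
flights = [
    {"source": "RUH", "destination": "DXB", "date": date(2022, 3, 5), "price": 250.0},
    {"source": "RUH", "destination": "DXB", "date": date(2022, 3, 20), "price": 350.0},
    {"source": "RUH", "destination": "DXB", "date": date(2022, 4, 2), "price": 400.0},
]
averages = monthly_route_averages(flights)
print(averages[("RUH", "DXB", 3)])  # 300.0
```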