Interactive Results

Explore the model performance, feature importance, and data insights


Final Model Performance

Test R²

0.6956

Final holdout performance

Cross-Validation R²

0.686

Training performance

Model Type

LightGBM

Gradient boosting

Features Used

After preprocessing

Strong Performance

The final LightGBM model achieved an R² of 0.6956 on the held-out test set, explaining nearly 70% of the variance in Airbnb listing popularity (measured as reviews per month). The test score is close to the cross-validation R² of 0.686, which suggests the model generalizes to unseen listings rather than overfitting the training data.

Model Performance Comparison

Cross-Validation R² Scores

Comparison of different algorithms on the training set
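
The cross-validation scores compared here can be illustrated with a small standard-library sketch. It is not the notebook's code: the single-feature least-squares model, the helper names (`r2_score`, `fit_line`, `cross_val_r2`), and the toy data are all assumptions chosen to keep the example self-contained. It only shows how a mean R² over k folds is obtained.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def cross_val_r2(xs, ys, k=5):
    """Mean R² over k contiguous folds (no shuffling, to stay deterministic)."""
    n = len(xs)
    scores = []
    for fold in range(k):
        start, stop = fold * n // k, (fold + 1) * n // k
        # Train on everything outside the current fold.
        train_x = xs[:start] + xs[stop:]
        train_y = ys[:start] + ys[stop:]
        slope, intercept = fit_line(train_x, train_y)
        # Score on the held-out fold only.
        preds = [slope * x + intercept for x in xs[start:stop]]
        scores.append(r2_score(ys[start:stop], preds))
    return mean(scores)


# Toy data (made up): y = 2x + 1 exactly, so a linear fit is perfect in every fold.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
cv_score = cross_val_r2(xs, ys, k=5)
```

On real data the folds would normally be shuffled first, and each candidate model would be scored on the same folds so that the comparison is fair.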

Model Insights

LightGBM (Best)

Gradient boosting with excellent bias-variance balance

Random Forest

Strong ensemble method, slightly overfitting

Decision Tree

Good performance after hyperparameter tuning

Ridge Regression

Linear baseline, limited by feature relationships

Why LightGBM Won

• Efficient gradient boosting implementation
• Excellent handling of categorical features
• Built-in regularization helps limit overfitting
• Fast training with good generalization
• Robust to outliers and missing values

Feature Importance Analysis

Most Important Features

Features ranked by their impact on model predictions

Top Insights

Recency is King

Days since last review is the strongest predictor

Review History Matters

Total review count indicates established popularity

Flexibility Wins

Lower minimum nights increase booking appeal

Location Impact

Neighborhood significantly affects popularity
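
As a rough illustration of the recency feature named above, the sketch below derives "days since last review" from a last-review date. The field names (`id`, `last_review`), the snapshot date, and the choice to return `None` for listings that have never been reviewed are assumptions for the example, not details taken from the original notebook.

```python
from datetime import date


def days_since_last_review(last_review, reference):
    """Whole days between a listing's last review and a reference date.

    `last_review` is an ISO date string, or None when the listing has
    never been reviewed; in that case None is returned so the caller can
    decide how to impute the missing value.
    """
    if last_review is None:
        return None
    return (reference - date.fromisoformat(last_review)).days


# Hypothetical snapshot date and listings, for illustration only.
reference = date(2019, 7, 8)
listings = [
    {"id": 1, "last_review": "2019-07-01"},
    {"id": 2, "last_review": "2018-07-08"},
    {"id": 3, "last_review": None},
]

recency = {
    row["id"]: days_since_last_review(row["last_review"], reference)
    for row in listings
}
```

How the never-reviewed listings are handled (dropped, imputed with a large value, or flagged with a separate indicator) is a modeling decision that this sketch leaves open.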

Business Implications

• Maintain active listing engagement
• Encourage recent guest reviews
• Optimize minimum night requirements
• Consider location in pricing strategy

Dataset Insights

Reviews per Month Distribution

Distribution of target variable across all listings

Listings by Borough

Geographic distribution of Airbnb listings in NYC

Geographic Patterns

Manhattan and Brooklyn dominate the market, representing 85% of all listings

Review Patterns

20% of listings have zero reviews, indicating new or inactive properties

Market Dynamics

Most active listings receive 1-3 reviews per month consistently

Prediction Quality

Predicted vs Actual Values

Scatter plot showing model prediction accuracy (sample of test data)

Prediction Quality

R² Score: 0.696

Mean Absolute Error: 0.89

Root Mean Square Error: 1.42
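
For readers who want to see how these three metrics relate, here is a minimal sketch that computes R², MAE, and RMSE from paired actual and predicted values. The function name `regression_metrics` and the four data points are made up for illustration; they are not the project's test data.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return R², mean absolute error, and root mean square error."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    squared_error_sum = sum(e * e for e in errors)

    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(squared_error_sum / n)

    # R² compares the model's squared error with that of always
    # predicting the mean of the actual values.
    y_bar = sum(y_true) / n
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    r2 = 1.0 - squared_error_sum / ss_tot

    return {"r2": r2, "mae": mae, "rmse": rmse}


# Made-up values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 5.0]
metrics = regression_metrics(y_true, y_pred)
```

RMSE is never smaller than MAE because squaring weights large errors more heavily, so a wide gap between the two points to a few large misses rather than uniformly moderate ones.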

Interpretation

Points close to the diagonal line indicate accurate predictions. The model performs well across the range of review frequencies, with some scatter expected in real-world data.

Model Strengths

• Strong performance on popular listings
• Good generalization to unseen data
• Robust to outliers and edge cases
• Consistent across different price ranges

Key Takeaways

Project Success

This analysis demonstrates an end-to-end machine learning workflow whose final model explains 69.6% of the variance in Airbnb listing popularity. The insights provide actionable recommendations for hosts and for platform optimization.

For Airbnb Hosts

• Maintain active guest engagement
• Optimize minimum night requirements
• Focus on recent review generation
• Consider location in pricing strategy

For Platform Development

• Prioritize recent activity in rankings
• Encourage flexible booking options
• Develop location-based recommendations
• Support new listing visibility