Interactive Results
Explore the model performance, feature importance, and data insights
Final Model Performance
Test R²
0.6956
Final holdout performance
Cross-Validation R²
0.686
Cross-validated on the training set
Model Type
LightGBM
Gradient boosting
Features Used
15
After preprocessing
Strong Test Performance
The final LightGBM model achieved an R² of 0.6956 on the held-out test set, explaining about 70% of the variance in Airbnb listing popularity (measured as reviews per month). The test score is close to the cross-validation R² of 0.686, which suggests the model generalizes to unseen listings without substantial overfitting.
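As a reminder of what the R² figure measures, here is a minimal sketch of the coefficient of determination in plain Python. The function name `r_squared` and the sample values are illustrative assumptions; they are not taken from the original notebook.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R² is undefined when the actual values are constant")
    return 1.0 - ss_res / ss_tot


# Toy reviews-per-month values, not the project's data.
actual = [0.5, 1.0, 2.0, 3.5]
predicted = [0.7, 0.9, 2.4, 3.0]
score = r_squared(actual, predicted)
print(f"R² = {score:.4f}")
```

A score of 1 means perfect predictions, and a score of 0 means the model does no better than predicting the mean of the actual values.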
Model Performance Comparison
Cross-Validation R² Scores
Comparison of different algorithms on the training set
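The comparison procedure can be sketched as k-fold cross-validation: each candidate is fit on k-1 folds and scored on the remaining fold, and the fold scores are averaged. The sketch below is a stdlib-only illustration under stated assumptions: the two candidates (a mean baseline and a one-feature least-squares line) are simple stand-ins rather than the four models compared here, the data is synthetic, and the choice of five folds is hypothetical.

```python
import random


def r_squared(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """One-feature ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def cross_val_r2(fit, xs, ys, k=5, seed=0):
    """Mean R² over k folds; each fold is scored by a model fit on the rest."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [model(xs[i]) for i in held_out]
        scores.append(r_squared([ys[i] for i in held_out], preds))
    return sum(scores) / k


# Synthetic data: a noisy linear relationship, purely for illustration.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [0.3 * x + rng.gauss(0, 0.5) for x in xs]

results = {
    "mean baseline": cross_val_r2(fit_mean, xs, ys),
    "linear fit": cross_val_r2(fit_line, xs, ys),
}
for name, score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: CV R² = {score:.3f}")
```

Using the same seed for every candidate keeps the folds identical, so differences in score reflect the models rather than the split.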
Model Insights
LightGBM (Best)
Gradient boosting with excellent bias-variance balance
Random Forest
Strong ensemble method that slightly overfits
Decision Tree
Good performance after hyperparameter tuning
Ridge Regression
Linear baseline, limited by feature relationships
Why LightGBM Won
- Efficient gradient boosting implementation
- Native handling of categorical features
- Built-in regularization that helps limit overfitting
- Fast training with good generalization
- Robust to outliers and missing values
Feature Importance Analysis
Most Important Features
Features ranked by their impact on model predictions
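One model-agnostic way to rank features by their impact on predictions is permutation importance: shuffle one feature column at a time and measure how much the score drops. The sketch below is an assumption about how such a ranking could be computed; the page does not state which importance measure the chart uses, and LightGBM's built-in split or gain importances are a different technique. The `predict` function and the data are hypothetical stand-ins.

```python
import random


def r_squared(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Drop in R² when one feature column is shuffled, per feature."""
    rng = random.Random(seed)
    baseline = r_squared(targets, [predict(row) for row in rows])
    importances = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        # Rebuild each row with only feature j replaced by its shuffled value.
        shuffled_rows = [
            row[:j] + [value] + row[j + 1:] for row, value in zip(rows, column)
        ]
        permuted = r_squared(targets, [predict(row) for row in shuffled_rows])
        importances.append(baseline - permuted)
    return importances


# Hypothetical stand-in model: depends only on feature 0 and ignores feature 1.
def predict(row):
    return 2.0 * row[0]


rows = [[float(i), float(i % 3)] for i in range(20)]
targets = [2.0 * row[0] for row in rows]
importances = permutation_importance(predict, rows, targets, n_features=2)
ranked = sorted(enumerate(importances), key=lambda pair: -pair[1])
for index, drop in ranked:
    print(f"feature {index}: R² drop = {drop:.3f}")
```

A feature the model ignores shows a drop of zero, while a feature the model relies on shows a large drop.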
Top Insights
Recency is King
Days since last review is the strongest predictor
Review History Matters
Total review count indicates established popularity
Flexibility Wins
Lower minimum nights increase booking appeal
Location Impact
Neighborhood significantly affects popularity
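The recency feature can be derived from a listing's last review date. The sketch below assumes the date is stored as an ISO-formatted string and measures it against a fixed reference date; the function name, the reference date, and the handling of listings without reviews are illustrative assumptions, not the notebook's actual preprocessing.

```python
from datetime import date


def days_since_last_review(last_review, reference_date):
    """Days between an ISO-formatted review date and the reference date.

    Returns None when the listing has no review date.
    """
    if not last_review:
        return None
    return (reference_date - date.fromisoformat(last_review)).days


# Hypothetical reference date and listing values, for illustration only.
reference = date(2019, 7, 8)
listings = ["2019-07-01", "2018-07-08", None]
recency = [days_since_last_review(value, reference) for value in listings]
print(recency)
```

Listings with no reviews need an explicit policy, such as a sentinel value or a separate indicator feature; returning `None` here simply makes that decision visible.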
Business Implications
- Maintain active listing engagement
- Encourage recent guest reviews
- Optimize minimum night requirements
- Consider location in pricing strategy
Dataset Insights
Reviews per Month Distribution
Distribution of target variable across all listings
Listings by Borough
Geographic distribution of Airbnb listings in NYC
Geographic Patterns
Manhattan and Brooklyn dominate the market, representing 85% of all listings
Review Patterns
20% of listings have zero reviews, indicating new or inactive properties
Market Dynamics
Most active listings receive 1-3 reviews per month consistently
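Summary figures like the borough share and the zero-review share are simple proportions over the listing records. The following is a minimal sketch on toy records; the tuple layout and the values are made up for illustration and do not reproduce the reported percentages.

```python
from collections import Counter

# Toy records (borough, number_of_reviews); not the real dataset.
listings = [
    ("Manhattan", 12), ("Brooklyn", 0), ("Manhattan", 3), ("Queens", 7),
    ("Brooklyn", 25), ("Manhattan", 0), ("Brooklyn", 4), ("Bronx", 1),
    ("Manhattan", 9), ("Brooklyn", 2),
]

total = len(listings)
borough_counts = Counter(borough for borough, _ in listings)

# Share of listings in the two largest boroughs.
top_two_share = sum(count for _, count in borough_counts.most_common(2)) / total

# Share of listings that have never been reviewed.
zero_review_share = sum(1 for _, reviews in listings if reviews == 0) / total

print(f"Top two boroughs: {top_two_share:.0%}")
print(f"Zero reviews: {zero_review_share:.0%}")
```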
Prediction Quality
Predicted vs Actual Values
Scatter plot showing model prediction accuracy (sample of test data)
Interpretation
Points close to the diagonal line indicate accurate predictions. The model performs well across the range of review frequencies, with some scatter expected in real-world data.
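Distance from the diagonal is the residual, actual minus predicted, so the scatter can also be summarized numerically. The sketch below computes the mean error, mean absolute error, and root mean squared error on toy values; the function name and the data are illustrative assumptions, and these metrics are not reported on this page.

```python
import math


def residual_summary(actual, predicted):
    """Summarize how far predictions fall from the diagonal."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    return {
        # Signed average: a value far from zero indicates systematic bias.
        "mean_error": sum(residuals) / n,
        # Typical distance from the diagonal, in reviews per month.
        "mae": sum(abs(r) for r in residuals) / n,
        # Like MAE, but penalizes large misses more heavily.
        "rmse": math.sqrt(sum(r * r for r in residuals) / n),
    }


# Toy reviews-per-month values, not the project's data.
actual = [0.5, 1.0, 2.0, 3.5]
predicted = [0.7, 0.9, 2.4, 3.0]
summary = residual_summary(actual, predicted)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```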
Model Strengths
- Strong performance on popular listings
- Good generalization to unseen data
- Robust to outliers and edge cases
- Consistent across different price ranges
Key Takeaways
Project Success
This analysis demonstrates an end-to-end machine learning workflow and yields a model that explains 69.6% of the variance in Airbnb listing popularity on the test set. The insights translate into actionable recommendations for hosts and for platform optimization.
For Airbnb Hosts
- Maintain active guest engagement
- Optimize minimum night requirements
- Focus on recent review generation
- Consider location in pricing strategy
For Platform Development
- Prioritize recent activity in rankings
- Encourage flexible booking options
- Develop location-based recommendations
- Support new listing visibility