Soil Quality Prediction — Bangladesh
RMSTU CSE MID-2 ML Research Project
AUTHORS: M Abdur Rabbi Tota | Prathay Barua | Md Mynuddin
Model: Random Forest + Bayesian Optimization (Optuna)
Dataset: SPAS-Dataset-BD
Target: AP Ratio (Production / Area)
Inputs
- Location and Crop: District, Season, Crop Name
- Climate Conditions: numeric readings, including temperature and humidity
Model Metrics (test set)
| Metric | Value |
|---|---|
| R² | 0.5372 |
| MAE | 1.1596 |
| RMSE | 1.9064 |
| MAPE | 160404606.47% (not meaningful: AP Ratio values at or near zero inflate it) |
| 10-Fold CV RMSE | 1.8696 |
| p-value (paired t-test) | 0.0001 |
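The metrics in the table can be reproduced from test-set targets and predictions. The sketch below is a minimal stdlib version, assuming plain Python lists. It shows why MAPE explodes while MAE and RMSE stay small: every term is divided by the true AP Ratio, so a single near-zero target dominates the average. The helper name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², MAE, RMSE and MAPE (in %) for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # MAPE divides each error by |y_true|; a machine epsilon replaces exact
    # zeros (the same guard scikit-learn applies), so tiny targets dominate.
    eps = 2.220446049250313e-16
    mape = 100.0 * sum(abs(e) / max(abs(t), eps) for e, t in zip(errors, y_true)) / n
    return {"r2": r2, "mae": mae, "rmse": rmse, "mape": mape}


# Well-behaved targets: MAPE is modest.
print(regression_metrics([2.0, 4.0, 6.0], [2.5, 3.5, 6.0])["mape"])  # → 12.5
# One near-zero target: MAE stays below 1, but MAPE jumps above 30000%.
print(regression_metrics([2.0, 4.0, 0.001], [2.5, 3.5, 1.0]))
```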
Best Hyperparameters (Optuna, 50 trials)
| Parameter | Value |
|---|---|
| n_estimators | 251 |
| max_depth | 23 |
| min_samples_leaf | 3 |
| min_samples_split | 10 |
| max_features | sqrt |
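The hyperparameters above can be read as limits on how each tree grows. The stdlib sketch below follows scikit-learn's documented tree semantics. With `max_features="sqrt"`, each split considers `max(1, int(sqrt(n_features)))` candidate features. A node becomes a leaf once it reaches `max_depth`, has fewer than `min_samples_split` samples, or has fewer than `2 * min_samples_leaf` samples. The helper names and the feature count used in the example are hypothetical, because the document does not state how many input features the model uses.

```python
import math

# Best hyperparameters from the Optuna search. In scikit-learn these would be
# passed as RandomForestRegressor(**BEST_PARAMS).
BEST_PARAMS = {
    "n_estimators": 251,
    "max_depth": 23,
    "min_samples_leaf": 3,
    "min_samples_split": 10,
    "max_features": "sqrt",
}


def features_per_split(n_features, max_features="sqrt"):
    """Number of candidate features tried at each split (scikit-learn rule)."""
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    raise ValueError("only 'sqrt' is sketched here")


def may_split(n_node_samples, depth, params=BEST_PARAMS):
    """True if a node at `depth` (root = 0) may still attempt a split.

    A node is a leaf once it reaches max_depth, has fewer than
    min_samples_split samples, or has fewer than 2 * min_samples_leaf
    samples (each child must keep at least min_samples_leaf).
    """
    if depth >= params["max_depth"]:
        return False
    if n_node_samples < params["min_samples_split"]:
        return False
    return n_node_samples >= 2 * params["min_samples_leaf"]


# Hypothetical 16 input features: each split samples int(sqrt(16)) = 4 of them.
print(features_per_split(16))
print(may_split(9, depth=0))  # too few samples for min_samples_split=10
```

Passing the check is necessary but not sufficient: the tree still needs a split that leaves at least `min_samples_leaf` samples on each side.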
AP Ratio = Total Production ÷ Cultivated Area.
A higher ratio means more output per unit of land — a proxy for soil productivity under the given crop, season, and climate conditions.
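The target is a simple ratio. The minimal sketch below assumes production and area are already in consistent units; the document does not state the units. It rejects non-positive areas, which would otherwise make the ratio undefined. The function name `ap_ratio` is hypothetical.

```python
def ap_ratio(total_production, cultivated_area):
    """AP Ratio = total production / cultivated area (output per unit of land)."""
    if cultivated_area <= 0:
        raise ValueError("cultivated_area must be positive")
    return total_production / cultivated_area


print(ap_ratio(1200.0, 400.0))  # → 3.0
```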
Three Bangladesh-specific features are computed automatically from your inputs:
- Monsoon Moisture Index — combines humidity, temperature, and monsoon season weight (1.5× for Kharif, 0.8× for Rabi).
- Saltwater Intrusion Risk — non-zero only for the 16 coastal districts; peaks in Kharif 2 when tidal surges are worst.
- Seasonal Soil Stress — measures stress from diurnal temperature and humidity swings, highest in Kharif 2.
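The exact formulas for these features are not given here, so the sketch below is hypothetical throughout. Only the following come from the text: the 1.5 (Kharif) and 0.8 (Rabi) monsoon weights, the rule that saltwater risk is zero outside coastal districts, and the Kharif 2 peaks. Everything else is an illustrative assumption: the product form of the moisture index, the other seasonal multipliers, the swing formula, the 1.0 default weight, and the three-district coastal subset, which is not the project's list of 16.

```python
# From the text: 1.5x for Kharif, 0.8x for Rabi. Assumed: both Kharif
# seasons share 1.5, and unlisted seasons default to 1.0.
MONSOON_WEIGHT = {"Kharif 1": 1.5, "Kharif 2": 1.5, "Rabi": 0.8}

# Assumed, incomplete illustrative subset of coastal districts.
COASTAL_DISTRICTS = {"Khulna", "Satkhira", "Bagerhat"}

# Assumed seasonal multipliers; only the Kharif 2 peak comes from the text.
SALT_SEASON = {"Kharif 2": 1.0, "Kharif 1": 0.6, "Rabi": 0.4}
STRESS_SEASON = {"Kharif 2": 1.0, "Kharif 1": 0.7, "Rabi": 0.5}


def monsoon_moisture_index(humidity, temperature, season):
    """Hypothetical: humidity fraction x temperature x monsoon season weight."""
    return (humidity / 100.0) * temperature * MONSOON_WEIGHT.get(season, 1.0)


def saltwater_intrusion_risk(district, season):
    """Zero for inland districts; otherwise a season factor peaking in Kharif 2."""
    if district not in COASTAL_DISTRICTS:
        return 0.0
    return SALT_SEASON.get(season, 0.3)


def seasonal_soil_stress(temp_max, temp_min, hum_max, hum_min, season):
    """Hypothetical: diurnal temperature swing plus scaled humidity swing."""
    swing = (temp_max - temp_min) + (hum_max - hum_min) / 10.0
    return swing * STRESS_SEASON.get(season, 0.5)


print(monsoon_moisture_index(80, 30, "Kharif 1"))
print(saltwater_intrusion_risk("Khulna", "Kharif 2"))
print(seasonal_soil_stress(34, 24, 95, 65, "Kharif 2"))
```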
A SHAP analysis shows that all three features rank among the top contributors to the model's predictions.
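A common way to turn per-row SHAP values into a global ranking is to average each feature's absolute SHAP value over the rows. The stdlib sketch below assumes the values are already available as a row-by-feature table. In practice they could come from the `shap` library's `TreeExplainer(model).shap_values(X)` for a random forest. The feature names (including `district_code`) and the numbers are hypothetical.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value (global importance)."""
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical SHAP values: one row per prediction, one column per feature.
names = ["monsoon_moisture_index", "saltwater_intrusion_risk",
         "seasonal_soil_stress", "district_code"]
rows = [[0.4, -0.2, 0.3, 0.05],
        [-0.6, 0.1, -0.1, -0.05]]
for name, score in rank_by_mean_abs_shap(rows, names):
    print(f"{name}: {score:.3f}")
```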