Boruta feature selection


Boruta is an all-relevant feature selection method built on Random Forest: rather than finding a minimal feature subset, it aims to identify every feature that has a meaningful impact on the target variable. The algorithm works as follows:

1. Create "shadow features" by shuffling the values of each original feature, destroying any real relationship with the target.
2. Train a Random Forest model on both the original and shadow features.
3. Calculate importance scores for all features.
4. Compare each real feature with the most important shadow feature: if a feature scores higher, it's marked important; if lower, unimportant.
5. Re-evaluate uncertain (tentative) features over multiple iterations.

Boruta is robust and captures complex feature interactions. However, it is computationally intensive, especially with large datasets.
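The shadow-feature comparison at the heart of the steps above can be sketched in a few lines. This is a simplified, hypothetical illustration only: it uses absolute Pearson correlation as a stand-in importance score (real Boruta uses Random Forest importance and repeats the test over many iterations), and the data, feature names, and `boruta_step` helper are all made up for the example.

```python
import random

def importance(feature, target):
    """Stand-in importance score: absolute Pearson correlation.
    (Boruta uses Random Forest importance; this is just for illustration.)"""
    n = len(feature)
    mx = sum(feature) / n
    my = sum(target) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(feature, target))
    sx = sum((a - mx) ** 2 for a in feature) ** 0.5
    sy = sum((b - my) ** 2 for b in target) ** 0.5
    return abs(cov / (sx * sy))

def boruta_step(features, target, rng):
    """One Boruta-style round: build a shuffled shadow copy of each
    feature, take the best shadow score as the threshold (shadowMax),
    and label each real feature by comparing against it."""
    shadows = {name: rng.sample(col, len(col)) for name, col in features.items()}
    shadow_max = max(importance(col, target) for col in shadows.values())
    return {name: ("important" if importance(col, target) > shadow_max
                   else "unimportant")
            for name, col in features.items()}

rng = random.Random(0)
n = 60
# Toy data: three random features, but only f0 actually drives the target y.
features = {name: [rng.gauss(0, 1) for _ in range(n)] for name in ("f0", "f1", "f2")}
y = [3 * a + rng.gauss(0, 0.1) for a in features["f0"]]

print(boruta_step(features, y, rng))
```

Because the shadow copies are shuffled, their scores reflect pure chance; a real feature only counts as important when it beats the best of them, which is what makes the threshold data-driven rather than arbitrary.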

Interpreting the Boruta plot by color:

🟢 WIS, HE, AFB, DC: Marked green — these are important features with higher importance than the shadowMax.
🔴 ANCC: Marked red — consistently less important than shadow features, so it’s unimportant.
🟡 shadowMax: Sets the threshold — real features must exceed this to be considered important.
🔵 Blue features: Could be tentative or rejected, depending on their exact comparison to shadowMax.
✅ Focus on green features for modeling; consider removing red and re-evaluating blue ones.

Boruta Feature Selection in R

Here is the R code I used to perform feature selection with Boruta:


# Load the Boruta package
library(Boruta)

# Read the training data and inspect its structure
traindata <- read.csv("data.csv", header = TRUE, stringsAsFactors = FALSE)
str(traindata)

# Run Boruta with Birth_interval as the target (doTrace = 2 prints progress)
set.seed(123)
boruta_result <- Boruta(Birth_interval ~ ., data = traindata, doTrace = 2)

# Extract selected attributes, including tentative ones
selected_boruta <- getSelectedAttributes(boruta_result, withTentative = TRUE)
print(selected_boruta)
print(boruta_result)

# Plot feature importance boxplots for all attributes
plot(boruta_result, xlab = "")