Beyond performance: robustness-oriented model evaluation
2026.06.27
AI in Medicine
ablation study, AI in Medicine, model evaluation, robustness, sensitivity analysis