SR 11-7 and its 2026 successor SR 26-2: the model risk management framework

Model risk management (MRM) is the discipline of identifying, measuring, and controlling the risk that AI and machine learning models produce inaccurate outputs that lead to poor decisions. For financial services, it is governed by Federal Reserve SR 26-2 (April 2026), which replaced SR 11-7 and explicitly covers AI/ML models. For all sectors, ISO/IEC 42001 and the EU AI Act impose comparable oversight requirements.

SR 11-7 defines a model as a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates. Modern machine learning systems clearly fall within this definition. The guidance's core requirements (conceptual soundness, rigorous development, validation by independent parties, and sound governance) apply to ML models. What SR 11-7 did not anticipate is how difficult these requirements become when applied to models whose internal logic is not fully interpretable.

Where traditional model risk management breaks down with ML

The explainability gap

SR 11-7 requires that models be conceptually sound: their logic must be capable of being articulated, understood, and assessed for appropriateness. Traditional statistical models satisfy this requirement because the relationship between inputs and outputs can be expressed mathematically, examined for economic intuition, and evaluated by subject matter experts.

Deep learning models and complex ensemble methods do not satisfy this requirement in the same way. Their internal representations are not human-interpretable. A gradient-boosted model with thousands of trees, or a neural network with millions of parameters, produces outputs through processes that cannot be fully articulated even by the model's developers. Conceptual soundness assessment for these models requires different techniques: feature importance analysis, partial dependence plots, SHAP values, and adversarial testing, none of which provide the same level of assurance as being able to read a model's logic directly.
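As an illustration of the feature importance analysis mentioned above, the following is a minimal sketch of permutation importance, one common model-agnostic technique. It treats the model as a black box and measures how much accuracy falls when a single input column is shuffled. The names `toy_model`, `accuracy`, and `permutation_importance` are hypothetical and written for this example only; a real validation exercise would more likely use an established library implementation.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Predictor = Callable[[Row], int]


def accuracy(predict: Predictor, X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows where the prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict: Predictor, X: Sequence[Row], y: Sequence[int],
                           n_repeats: int = 20, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for an opaque model: it only ever looks at feature 0.
def toy_model(row: Row) -> int:
    return 1 if row[0] > 0.5 else 0


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]
importances = permutation_importance(toy_model, X, y)
print(importances)  # feature 0 shows a large drop; feature 1 shows none
```

The result shows which inputs the model relies on, but not why. That is the sense in which such techniques give weaker assurance than reading a model's logic directly.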

Distributional shift

Traditional model validation tests model performance on a hold-out sample drawn from the same distribution as the training data. This approach assumes that the distribution of inputs the model will encounter in production is similar to the distribution on which it was trained. For statistical models used in stable environments, this assumption is often reasonable.

For ML models deployed in dynamic environments, it frequently is not. Data drift (changes in the statistical properties of input data over time) and concept drift (changes in the relationship between inputs and the target variable) can degrade ML model performance rapidly and without obvious warning signs. A credit model trained on pre-2020 data encountered severe distributional shift during the pandemic. A fraud detection model trained on pre-2024 data may perform poorly against fraud patterns that emerged subsequently.

Model risk management for ML requires active monitoring for distributional shift, not just periodic performance review, and clear protocols for model refresh or replacement when drift is detected.
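One way to make drift monitoring concrete is the Population Stability Index (PSI), a widely used measure of how far a production input distribution has moved from its training baseline. The sketch below is a minimal pure-Python version under stated assumptions: equal-width bins derived from the baseline, a small floor `eps` to avoid division by zero, and synthetic data. The function name `psi` is hypothetical, and the 0.10 and 0.25 cut-offs often quoted for PSI are industry rules of thumb, not regulatory requirements.

```python
import math
from typing import List, Sequence


def psi(baseline: Sequence[float], current: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `current` relative to `baseline`."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant baseline

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range fall into the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor each share at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    expected, actual = shares(baseline), shares(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


# Synthetic example: a uniform training feature and a shifted production feed.
training = [i / 1000 for i in range(1000)]
production = [v + 0.3 for v in training]

psi_same = psi(training, training)       # identical distributions
psi_shifted = psi(training, production)  # clearly shifted distribution
print(psi_same, psi_shifted)
```

PSI only detects data drift in the inputs. Concept drift, where the input distribution is stable but the relationship to the target changes, needs performance tracking against realised outcomes as well.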

Validation infrastructure

SR 11-7 requires independent model validation: assessment by a team separate from model development. For traditional statistical models, this requires statistical expertise and access to model documentation and data. For complex ML models, independent validation requires ML expertise, access to training data and code, interpretability tooling, and the ability to conduct adversarial testing. Many financial institutions' validation functions do not yet have this capability, creating a gap between the requirement and the practice.

Adapting the SR 11-7 framework for ML

Tiered model inventory

Not all ML models require the same governance intensity. A tiered approach, classifying models by the criticality of the decisions they inform and the potential for adverse outcomes, allows governance resources to be concentrated where risk is highest. High-tier models (credit decisions, market risk, capital calculation) require full SR 11-7 treatment with ML-specific enhancements. Low-tier models (internal efficiency tools, non-consequential analytics) can operate under lighter governance.
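A tiered inventory can be sketched as a small data structure plus a classification rule. The example below is illustrative only: the `ModelRecord` fields, the `HIGH_TIER_USES` set, and the `assign_tier` rule are assumptions made for this sketch, loosely based on the high-tier and low-tier examples above. A real tiering policy would typically weigh more factors, such as exposure size and model complexity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Tier(Enum):
    HIGH = "high"
    LOW = "low"


# Assumed list of use cases that always receive full treatment.
HIGH_TIER_USES = {"credit_decision", "market_risk", "capital_calculation"}


@dataclass(frozen=True)
class ModelRecord:
    name: str
    use_case: str


def assign_tier(model: ModelRecord) -> Tier:
    """Classify a model by the criticality of the decision it informs."""
    return Tier.HIGH if model.use_case in HIGH_TIER_USES else Tier.LOW


def group_by_tier(inventory: List[ModelRecord]) -> Dict[Tier, List[str]]:
    """Return model names per tier so governance effort can be planned."""
    grouped: Dict[Tier, List[str]] = {tier: [] for tier in Tier}
    for model in inventory:
        grouped[assign_tier(model)].append(model.name)
    return grouped


# Hypothetical inventory entries.
inventory = [
    ModelRecord("retail_pd_gbm", "credit_decision"),
    ModelRecord("var_engine", "market_risk"),
    ModelRecord("ticket_router", "internal_efficiency"),
]
tiers = group_by_tier(inventory)
print(tiers[Tier.HIGH])  # ['retail_pd_gbm', 'var_engine']
print(tiers[Tier.LOW])   # ['ticket_router']
```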

Pre-deployment validation standards for ML

Validation of ML models before production deployment should include: conceptual soundness assessment using interpretability techniques; performance testing across demographic subgroups to identify fairness issues; stress testing under distributional shift scenarios; adversarial testing for robustness; and documentation of model limitations and appropriate use constraints. These requirements should be codified in a model validation policy that explicitly addresses ML characteristics.
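The subgroup performance test in that list can be sketched as follows. The example computes accuracy separately for each subgroup and reports the largest gap between groups. The function names, the group labels "A" and "B", and the data are all hypothetical, and accuracy is only one of several metrics a validation policy might specify.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def accuracy_by_group(y_true: Sequence[int], y_pred: Sequence[int],
                      groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Accuracy computed separately for each subgroup label."""
    hits: Dict[Hashable, int] = defaultdict(int)
    totals: Dict[Hashable, int] = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    return {group: hits[group] / totals[group] for group in totals}


def max_gap(metric_by_group: Dict[Hashable, float]) -> float:
    """Largest difference in the metric between any two subgroups."""
    values = metric_by_group.values()
    return max(values) - min(values)


# Hypothetical labels, predictions, and subgroup membership.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

by_group = accuracy_by_group(y_true, y_pred, groups)
gap = max_gap(by_group)
print(by_group, gap)  # {'A': 1.0, 'B': 0.5} 0.5
```

A model with acceptable aggregate accuracy can still show a large gap like this one, which is why the subgroup breakdown belongs in pre-deployment validation rather than being inferred from headline metrics.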

Ongoing monitoring infrastructure

ML model monitoring requires automated infrastructure, not manual periodic review. Production monitoring systems should track input data distributions against training baselines, output distributions against expected ranges, performance metrics against defined thresholds, and fairness metrics across relevant subgroups. Monitoring triggers (defined thresholds that prompt escalation, investigation, or model suspension) must be documented and acted upon systematically.
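Documented triggers of this kind can be expressed as data rather than prose, so that the same definitions drive both the policy document and the monitoring job. The sketch below is one possible shape, not a prescribed design: the `Trigger` fields, the metric names, and every threshold value are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List

# Higher number means a more severe response.
SEVERITY = {"investigate": 1, "escalate": 2, "suspend": 3}


@dataclass(frozen=True)
class Trigger:
    metric: str
    threshold: float
    higher_is_worse: bool
    action: str  # one of the keys in SEVERITY


def breached(trigger: Trigger, value: float) -> bool:
    """True when the reading is on the wrong side of the threshold."""
    if trigger.higher_is_worse:
        return value > trigger.threshold
    return value < trigger.threshold


def evaluate(triggers: List[Trigger], readings: Dict[str, float]) -> List[Trigger]:
    """Return the breached triggers, most severe action first."""
    fired = [t for t in triggers
             if t.metric in readings and breached(t, readings[t.metric])]
    return sorted(fired, key=lambda t: SEVERITY[t.action], reverse=True)


# Hypothetical policy: all metric names and thresholds are placeholders.
policy = [
    Trigger("input_psi", 0.10, True, "investigate"),
    Trigger("input_psi", 0.25, True, "escalate"),
    Trigger("auc", 0.70, False, "suspend"),
    Trigger("subgroup_accuracy_gap", 0.10, True, "escalate"),
]
readings = {"input_psi": 0.31, "auc": 0.74, "subgroup_accuracy_gap": 0.04}

fired = evaluate(policy, readings)
print([(t.metric, t.action) for t in fired])
# [('input_psi', 'escalate'), ('input_psi', 'investigate')]
```

Keeping the action attached to each threshold makes it easier to evidence that a breach led to the documented response.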

SR 26-2: what changed on 17 April 2026

SR 26-2 introduces a materiality construct: governance intensity should be calibrated to actual risk, with high-materiality models (credit decisions, capital calculations, stress testing) retaining full validation rigour and low-materiality models permitted lighter governance. Annual revalidation is no longer the default; risk-based monitoring tied to materiality replaces it. SR 26-2 is primarily directed at banking organisations with over $30 billion in total assets, though smaller institutions with significant model complexity remain in scope.

The most consequential aspect of SR 26-2 for AI governance is Footnote 3: generative AI and agentic AI models are explicitly excluded from scope. The guidance describes these technologies as "novel and rapidly evolving", while confirming that existing risk management and governance practices should guide controls for any tools outside the document's scope. Traditional ML models (non-generative, non-agentic) remain fully in scope under SR 26-2. LLMs, generative AI, and autonomous AI agents require separate governance frameworks that institutions must develop independently. The Federal Reserve has signalled forthcoming guidance to close this GenAI Gap.

The EU AI Act alignment

For financial services firms subject to both SR 26-2 and the EU AI Act, there is significant structural alignment. The EU AI Act's requirements for high-risk AI (a risk management system, data governance, technical documentation, human oversight, and accuracy and robustness) map directly to SR 26-2's model risk management components. Firms with mature model risk management functions can build EU AI Act compliance on that foundation rather than treating it as a parallel obligation. The incremental requirements are manageable: the Act adds transparency obligations, conformity assessment, and EU AI database registration that SR 26-2 does not require. Where SR 26-2 explicitly excludes GenAI from MRM scope, the EU AI Act's Annex III categories for credit scoring and risk assessment impose their own obligations regardless of SR 26-2 applicability.

Primary sources: Federal Reserve SR 26-2 · SR 26-2 Attachment (PDF)
