Exploring the role of simulator fidelity in the safety validation of learning-enabled autonomous systems

Ali Baheri

First published: 27 November 2023


This article presents key insights from the New Faculty Highlights talk given at AAAI 2023, focusing on the crucial role of simulator fidelity in the safety evaluation of learning-enabled components (LECs) within safety-critical systems. With the rising integration of LECs in safety-critical systems, the imperative for rigorous safety and reliability verification has intensified. Safety assurance goes beyond mere compliance, forming a foundational element in the deployment of LECs to reduce risks and ensure robust operation. In this evolving field, simulations have become an indispensable tool, and fidelity’s role as a critical parameter is increasingly recognized. Multifidelity simulations that balance accuracy and computational efficiency are opening new paths toward comprehensive safety validation. This article delves into our recent research, emphasizing the role of simulation fidelity in the validation of LECs in safety-critical systems.


The rise of learning-enabled components (LECs) in safety-critical systems, such as those found in transportation, healthcare, energy, aviation, and manufacturing sectors, underscores an urgent need for their rigorous safety and reliability verification (Seshia, Sadigh, and Sastry 2016; Dennis, Dixon, and Fisher 2022; Yu et al. 2021; Razzaghi et al. 2022; Dogan and Birant 2021). With the continuous escalation in system complexities, ensuring their safe operation becomes an ever-evolving challenge (Paoletti and Woodcock 2023). The complex interactions between various components, the unpredictability of human–machine interfaces, and the diversity of environmental conditions all contribute to this complexity. To mitigate these difficulties, simulation-driven validation has emerged as a formidable tool, providing a risk-free environment for the comprehensive evaluation of LECs across a spectrum of scenarios (Kapinski et al. 2016). This methodology allows for in-depth analysis of autonomous systems’ behavior, facilitating the identification of potential safety violations, and offering an avenue for developing proactive countermeasures.

A variety of validation tools have been developed and utilized across different communities. These tools often employ methods ranging from traditional software testing (Hynninen et al. 2018) to more advanced simulation-driven validation techniques, each catering to different aspects of safety and reliability verification. Traditional software testing methods, such as unit testing (Daka and Fraser 2014) and integration testing (Rehman et al. 2006), form the backbone of many safety validation processes, providing a structured approach to identifying and correcting errors within specific components. However, the growing complexity of learning-enabled systems often requires more sophisticated techniques. This has led to the rise of simulation-driven validation, wherein entire systems are modeled and subjected to a multitude of scenarios to gauge their response and performance. These simulations can recreate situations that might be hazardous or impractical to test in real life, allowing for comprehensive evaluation without risking actual equipment or human lives. Furthermore, techniques such as formal verification (Woodcock et al. 2009; Deshmukh and Sankaranarayanan 2019; Mitra 2021), model checking (Plaku, Kavraki, and Vardi 2013; Baier and Katoen 2008), and fault tree analysis (Ruijters and Stoelinga 2015) have found prominence in safety-critical domains. Formal verification uses mathematical logic to prove the correctness of a system, while model checking systematically explores all possible states of a model to validate specific properties. Fault tree analysis, on the other hand, helps in understanding the various failure pathways within a system, enabling proactive measures to prevent catastrophic failures.

Simulation-driven testing represents a shift towards leveraging computational power and intelligent algorithms to validate complex systems, particularly those found in safety-critical environments. Unlike traditional testing, where physical prototypes and real-world experiments may be necessary, simulation-driven testing occurs in a controlled virtual environment. This approach enables testing scenarios that would otherwise be dangerous, expensive, or impractical to create. A particularly intriguing subset of simulation-driven testing is falsification methods. At its core, falsification seeks to uncover conditions under which a system might fail. Instead of trying to prove that a system is always safe (a task that can be exceedingly complex or even impossible), falsification aims to find scenarios where safety properties are violated.
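To make the idea of falsification concrete, the following is a minimal sketch on a toy braking scenario. The simulator, the parameter ranges, and the names `simulate_gap`, `robustness`, and `random_falsify` are all hypothetical and chosen only for illustration. The search here is plain random sampling, not any of the methods discussed in this article; a negative robustness value marks a scenario in which the property "the gap to the obstacle stays positive" is violated.

```python
import random

def simulate_gap(v0, d0, decel=4.0, dt=0.01):
    """Toy braking scenario: return the gap to an obstacle at each time step."""
    v, gap, gaps = v0, d0, [d0]
    while v > 0.0:
        gap -= v * dt                      # vehicle closes the gap
        v = max(0.0, v - decel * dt)       # constant braking until standstill
        gaps.append(gap)
    return gaps

def robustness(gaps, min_gap=0.0):
    """Robustness of 'always gap > min_gap'; negative means the property is violated."""
    return min(gaps) - min_gap

def random_falsify(n_trials=200, seed=0):
    """Sample scenarios at random and keep the one with the lowest robustness."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        v0 = rng.uniform(5.0, 30.0)        # initial speed (assumed range, m/s)
        d0 = rng.uniform(20.0, 80.0)       # initial gap (assumed range, m)
        rho = robustness(simulate_gap(v0, d0))
        if best is None or rho < best[0]:
            best = (rho, v0, d0)
    return best

if __name__ == "__main__":
    rho, v0, d0 = random_falsify()
    print(f"lowest robustness {rho:.2f} at v0={v0:.1f}, d0={d0:.1f}")
```

A falsifier only needs one scenario with negative robustness to demonstrate a violation, which is why the search tracks the minimum rather than an average.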

Software tools, including S-TaLiRo, Breach, C2E2, and DryVR, have become instrumental in the field of system falsification, representing crucial advancements in simulation-driven testing (Fainekos et al. 2012; Annpureddy et al. 2011; Donzé 2010; Duggirala et al. 2015; Qi et al. 2018). These tools are part of a broader evolution that encompasses techniques ranging from search-based algorithms to machine learning and data-driven approaches, all aimed at enhancing the effectiveness of falsification methods (Ramezani et al. 2021; Zhang et al. 2021; Zhang, Arcaini, and Hasuo 2020; Deshmukh et al. 2015; Ernst et al. 2021; Mathesen, Pedrielli, and Fainekos 2021; Akazaki et al. 2018; Qin et al. 2019; Zhang, Hasuo, and Arcaini 2019). Despite these advancements, the challenges of simulation-driven falsification remain multifaceted. One critical challenge lies in the inherent complexity of modeling and simulating complex systems. A simulation’s ability to accurately mirror real-world behaviors is contingent on a multitude of factors, including the fidelity of the simulation itself. Fidelity refers to the degree of exactness with which a simulation represents the real system, and it plays a vital role in the effectiveness of falsification methods (Shahrooei, Kochenderfer, and Baheri 2023; Beard and Baheri 2022; Baheri 2023). Low-fidelity simulations, although computationally less demanding, may lack the precision needed to uncover subtle or complex failure modes. On the other hand, high-fidelity simulations offer a more accurate and detailed representation but often come at a significant computational cost, limiting their scalability and applicability in extensive testing scenarios. The choice of fidelity thus presents a nuanced challenge, necessitating a balance between computational efficiency and accuracy. Striking this balance requires a deep understanding of the system being tested, the specific safety properties in question, and the potential impact of inaccuracies on the overall validation process.

In this article, we describe our contributions to the multifidelity safety validation of learning-based control systems. In our context, LECs refer to specific segments of the control system that incorporate machine learning models to enhance the system’s adaptability and performance. These components are designed to learn from data, either through offline training or online adaptation, allowing the system to better respond to complex and dynamic environments that traditional control methods might not adequately address. Our investigation is structured around two themes that directly address the previously mentioned challenges, aiming to push the boundaries of existing methodologies.

Multifidelity simulation-driven falsification. We propose an approach that employs multifidelity Bayesian optimization for the falsification of LECs in safety-critical systems (Shahrooei, Kochenderfer, and Baheri 2023). Falsification refers to the process of identifying specific configurations or scenarios in which these systems violate safety properties, an essential step in ensuring their robustness and reliability. Our proposed method applies multifidelity Bayesian optimization to minimize the number of high-fidelity simulations necessary for uncovering such violations, thereby significantly cutting computational expenses. Through a series of comprehensive case studies, we have shown that our approach provides an efficient solution for accelerating the safety validation of LECs in control policies.

Joint falsification and fidelity settings optimization. We formalize a theoretical framework for the safety validation of LECs, integrating falsification processes and simulator fidelity optimization (Baheri and Kochenderfer 2023). Our proposed method synergistically undertakes the tasks of falsifying LECs and fine-tuning simulator fidelity settings. This joint approach allows us to efficiently pinpoint challenging configurations for a learning-enabled control system, while maintaining computational resource efficiency. This framework prioritizes high-risk areas and dynamically adjusts the simulator fidelity in response to the specific scenario being evaluated. This dynamic adjustment process facilitates a more targeted and efficient testing process. A key attribute of this approach is its potential for generalization, where the knowledge gained from tested scenarios is leveraged to accelerate and improve the accuracy of safety validation for unseen situations.


Ensuring safety in LECs is a fundamental prerequisite, especially in applications where the slightest error can lead to catastrophic failure. The complexity of LECs, coupled with the dynamic and unpredictable nature of real-world operating scenarios, amplifies the challenge of identifying potential violations of safety rules. This complexity necessitates methodologies that go beyond traditional verification techniques. Recognizing these challenges, our research proposes a multifidelity falsification approach. This method is designed to leverage simulations at different levels of fidelity, combining the efficiency of lower-fidelity simulations with the accuracy of higher-fidelity ones. Unlike conventional methods, our approach does not solely depend on high-fidelity simulations but dynamically transitions between different fidelity levels.

The task of uncovering failure modes within LECs can be formulated as an optimization problem. The goal is to systematically locate instances where a safety specification is breached. In our approach, we use Bayesian optimization to guide this search for counterexamples. Bayesian optimization is a global optimization technique known for its efficiency in optimizing expensive and noisy functions (Frazier 2018). It has been extensively used in various domains, including hyperparameter tuning in machine learning models (Victoria and Maragatham 2021), control theory (Baheri et al. 2017; Baheri, Deese, and Vermillion 2017), planning (Baheri and Vermillion 2020), robotics (Calandra et al. 2014), and materials design (Zhang, Apley, and Chen 2020), among others.
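One way to write this search down, offered as a sketch rather than the exact formulation of the cited work, is the following. All symbols are introduced here for illustration: x is a scenario in a search space X, sim(x) is the resulting trajectory, and ρ_φ is the robustness of a specification φ, negative when φ is violated. The second line shows a lower-confidence-bound acquisition rule, which is one common choice in Bayesian optimization and is an assumption here; the text above does not state which acquisition function is used.

```latex
\begin{align}
x^\star &\in \arg\min_{x \in \mathcal{X}} \; \rho_\varphi\big(\mathrm{sim}(x)\big),
\qquad \text{$x$ is a counterexample if } \rho_\varphi\big(\mathrm{sim}(x)\big) < 0, \\
x_{n+1} &= \arg\min_{x \in \mathcal{X}} \; \mu_n(x) - \beta\,\sigma_n(x), \qquad \beta \ge 0,
\end{align}
```

Here μ_n and σ_n denote the posterior mean and standard deviation of a surrogate model of the robustness after n simulations. Smaller β favors scenarios the surrogate already predicts to be unsafe, while larger β favors scenarios where the surrogate is uncertain.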

Central to our method is the integration of simulations at varying fidelity levels. This multifidelity approach is founded on correlating specification robustness values across different simulators, each operating at a distinct fidelity level. The specification robustness value provides a measure to gauge how closely a system trajectory adheres to or violates the safety specifications. The correlation is embodied in a mathematical formulation, which quantifies the relationship between various fidelity levels (Le Gratiet and Garnier 2014). We evaluated the effectiveness of multifidelity Bayesian optimization across three different learning-enabled control policies, focusing on its potential to enhance the discovery of counterexamples when compared to standard Bayesian optimization and random search (Shahrooei, Kochenderfer, and Baheri 2023). The empirical findings drawn from our case studies highlight the effectiveness of multifidelity Bayesian optimization when confronted with the falsification task. Notably, the multifidelity falsification framework reduced the number of computationally expensive experiments on high-fidelity simulators. In each of the examined case studies, the algorithm consistently demonstrated its effectiveness, achieving computational savings of up to 24% in specific scenarios. This highlights the method’s capability to adeptly harness the potential of low-fidelity simulations, thereby optimizing resource allocation and minimizing reliance on more expensive high-fidelity evaluations.
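Multifidelity models in this line of work typically link fidelity levels through an autoregressive relation of the form f_high(x) = ρ · f_low(x) + δ(x), where δ is a discrepancy term. The sketch below is a deliberate simplification of that idea: it replaces the Gaussian-process discrepancy with a constant offset and fits the scale by ordinary least squares on paired robustness values. The function names and the sample data are hypothetical.

```python
def fit_linear_link(low_vals, high_vals):
    """Least-squares fit of high ~ scale * low + offset from paired robustness values."""
    n = len(low_vals)
    mean_l = sum(low_vals) / n
    mean_h = sum(high_vals) / n
    cov = sum((l - mean_l) * (h - mean_h) for l, h in zip(low_vals, high_vals))
    var = sum((l - mean_l) ** 2 for l in low_vals)
    scale = cov / var                      # plays the role of the scaling factor rho
    offset = mean_h - scale * mean_l       # constant stand-in for the discrepancy
    return scale, offset

def predict_high(low_val, scale, offset):
    """Predict a high-fidelity robustness value from a cheap low-fidelity one."""
    return scale * low_val + offset

if __name__ == "__main__":
    # Hypothetical robustness values at the same four scenarios.
    low = [0.0, 1.0, 2.0, 3.0]
    high = [0.5, 2.5, 4.5, 6.5]
    scale, offset = fit_linear_link(low, high)
    print(scale, offset, predict_high(1.5, scale, offset))
```

When the fitted link is tight, a low-fidelity run carries most of the information a high-fidelity run would, which is what lets the search defer expensive evaluations.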

Building upon our original study, we explored the further optimization of our simulation-driven falsification process by incorporating an intermediate level of simulation fidelity. Our initial framework employed both low- and high-fidelity simulators in conjunction with Bayesian optimization, achieving a balance between computational efficiency and precision in safety validation. Recognizing the inherent trade-offs between computational cost and accuracy within the two-fidelity framework, we sought to introduce a mid-fidelity simulator into our existing setup.

Our empirical analyses across various case studies highlighted the efficacy of the tri-fidelity approach. Notably, the proposed tri-fidelity framework was consistently more effective in detecting counterexamples compared to traditional high-fidelity Bayesian optimization, achieving this at a similar computational cost. One of the primary contributions of our recent study is the in-depth analysis of costs associated with various fidelity levels. The study emphasizes the crucial role that simulator costs play in the multifidelity falsification process. These costs, which reflect the computational expense of running each simulator, determine when to transition from one fidelity level to another. A significant difference in costs between low- and high-fidelity simulations can lead to more frequent utilization of the low-fidelity simulator, ensuring computational efficiency. However, this focus on cost savings might compromise the reliability of the counterexamples identified. Consequently, understanding and accurately accounting for simulator costs is paramount in refining and optimizing the multifidelity falsification framework. Furthermore, we analyzed the trade-off between cost and accuracy. In particular, our empirical findings indicate that the multifidelity falsification process not only mitigates the computational burden related to pinpointing counterexamples but also efficiently isolates the most critical among them.
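The effect of the cost ratio on fidelity selection can be illustrated with a small sketch. The rule below, which picks the level with the highest expected gain per unit cost, is an assumed heuristic for illustration and not necessarily the rule used in the study; `choose_fidelity` and the numbers are hypothetical.

```python
def choose_fidelity(expected_gain, cost):
    """Pick the fidelity level with the highest expected gain per unit cost.

    Both arguments are dicts keyed by fidelity level name.
    """
    return max(expected_gain, key=lambda level: expected_gain[level] / cost[level])

if __name__ == "__main__":
    gain = {"low": 0.3, "high": 1.0}       # hypothetical acquisition values
    # Large cost gap: the cheap simulator wins (0.3 per unit vs 0.1 per unit).
    print(choose_fidelity(gain, {"low": 1.0, "high": 10.0}))
    # Small cost gap: the accurate simulator wins (0.5 per unit vs 0.3 per unit).
    print(choose_fidelity(gain, {"low": 1.0, "high": 2.0}))
```

With the same expected gains, widening the cost gap shifts the choice toward the low-fidelity simulator, which mirrors the behavior described above and also shows why misjudged costs can skew the search.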

The introduction of the middle-fidelity level added complexity to the optimization process. We empirically showed that, while this addition can be beneficial, its effectiveness is contingent on its similarity to either the low- or high-fidelity levels. If the middle-fidelity level is too similar to the low-fidelity level, it might not offer significant advantages. Conversely, if it aligns more closely with the high-fidelity level, it could provide high-quality results, potentially at a reduced cost. The key takeaway is the importance of carefully calibrating the middle-fidelity level for optimal performance.
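One simple way to check where a middle-fidelity simulator sits, assuming robustness values are available for the same scenarios at all three levels, is to compare correlations. Pearson correlation is used here only as an assumed similarity measure, and the data and function names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length lists of robustness values."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (std_a * std_b)

def closer_level(low, mid, high):
    """Report which neighbor the mid-fidelity values track more closely."""
    return "high" if pearson(mid, high) > pearson(mid, low) else "low"

if __name__ == "__main__":
    low = [1.0, 2.0, 3.0, 4.0]             # hypothetical robustness values
    high = [1.0, 4.0, 2.0, 3.0]
    mid = [1.1, 3.9, 2.2, 3.0]
    print(closer_level(low, mid, high))
```

A mid-fidelity level that tracks the low-fidelity one adds cost without adding much information, whereas one that tracks the high-fidelity level can stand in for it on part of the budget.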

The implications of this methodology are far-reaching. By intelligently capitalizing on the efficiency of low-fidelity simulators and marrying them with the accuracy of high-fidelity ones, this approach reduces the reliance on more computationally intensive simulations. Perhaps most compellingly, the combination of low- and high-fidelity levels in a single, coherent framework enables the extraction of more safety-critical information per simulation. Therefore, this contributes to risk mitigation in the predeployment phase of LECs used within safety-critical applications.


In our previous work, the focus was on multifidelity falsification where different fidelity settings were adjusted based on the falsification process’s requirements (Shahrooei, Kochenderfer, and Baheri 2023; Beard and Baheri 2022). This approach facilitated falsification of the LECs, uncovering failure trajectories using both low- and high-fidelity simulations. However, the adjustments were performed in a reactive manner, responding to the falsification process’s needs rather than proactively strategizing to optimize simulation fidelity settings. Although this approach yielded significant findings, it was restricted in its adaptability and efficiency. In particular, the lack of proactive planning limited our ability to systematically navigate the trade-off between simulation fidelity and computational costs. Furthermore, it left potential challenges in the exploration of failure regions unaddressed. Therefore, in our ongoing research, we have transitioned from a reactive multifidelity falsification framework to a proactive joint falsification and fidelity settings optimization. This paradigm shift brings together falsification processes and simulation fidelity optimization in a more holistic manner. The objective now is not merely to adapt fidelity settings based on the falsification process, but also to optimize these settings alongside the falsification. This approach allows for a more efficient allocation of computational resources, as well as the capability to more effectively address potential challenges in the exploration of failure regions.

The integration of fidelity settings optimization into the falsification process represents a substantial advancement in the safety validation of LECs. By refining how we approach the discovery of potential failures, this methodology enhances the efficiency of exploring a wide range of failure scenarios. The framework not only accelerates the identification of failure modes but also improves its adaptability to diverse conditions. This strategy of fusing falsification and fidelity settings optimization has the potential to reshape the simulation-driven validation process. It does so by uncovering a broader spectrum of failure modes, thus offering a more exhaustive means of validating learning-enabled systems.

Our main contribution lies in formalizing a joint approach to falsification and fidelity settings optimization, cast within a bi-level optimization framework (Sinha, Malo, and Deb 2017). This formulation merges the testing of LECs in control policies with the iterative refinement of simulator settings to minimize discrepancies between the (failure) trajectories obtained from two different simulators at varying fidelity levels. At the heart of this formulation are two loops—the inner loop, responsible for falsification, and the outer loop, responsible for fidelity settings optimization. The inner loop applies Bayesian optimization, serving as the falsification mechanism. This optimization algorithm systematically probes for potential failure trajectories across both low- and high-fidelity simulators, representing the specific conditions under which the system deviates from safety specifications.

While the inner loop focuses on pinpointing the failure trajectories, the outer loop simultaneously fine-tunes the fidelity settings of the low-fidelity simulator. Specifically, the main objective here is to minimize the discrepancy between the failure trajectories produced by both the low- and high-fidelity simulators. This is achieved by adjusting parameters within the low-fidelity simulator so that its outputs align more closely with those from the high-fidelity simulator. The extent of this discrepancy is quantified using a measure like the mean squared error, which provides a mathematical gauge of the difference between the failure trajectories from the two simulators.
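The two-loop structure can be sketched on a toy braking model. Everything below is an illustrative assumption: the simulators, the safety limit, and the parameter `theta` are hypothetical; the inner loop uses random search in place of Bayesian optimization; and the outer loop uses a grid search over one parameter. The point is only to show how worst-case inputs from the inner loop feed the mean-squared-error objective of the outer loop.

```python
import random

def rollout(v0, decel_fn, steps=400, dt=0.02):
    """Position trajectory of a braking vehicle over a fixed horizon."""
    v, x, xs = v0, 0.0, []
    for _ in range(steps):
        x += v * dt
        v = max(0.0, v - decel_fn(v) * dt)
        xs.append(x)
    return xs

def high_fidelity(v0):
    return rollout(v0, lambda v: 4.0 + 0.05 * v)    # speed-dependent braking

def low_fidelity(v0, theta):
    return rollout(v0, lambda v: theta)             # constant braking, theta is tunable

def robustness(xs, limit=60.0):
    return limit - xs[-1]                           # negative: travelled past the limit

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def inner_falsify(theta, rng, n=50, keep=5):
    """Inner loop: keep the inputs with the lowest low-fidelity robustness."""
    samples = [rng.uniform(10.0, 30.0) for _ in range(n)]
    samples.sort(key=lambda v0: robustness(low_fidelity(v0, theta)))
    return samples[:keep]

def outer_tune(inputs, candidates):
    """Outer loop: pick theta minimizing the mean trajectory MSE on those inputs."""
    def loss(theta):
        errors = [mse(low_fidelity(v0, theta), high_fidelity(v0)) for v0 in inputs]
        return sum(errors) / len(errors)
    return min(candidates, key=loss)

def joint_optimize(rounds=3, seed=0):
    rng = random.Random(seed)
    theta = 3.0                                      # initial low-fidelity setting
    candidates = [3.0 + 0.1 * i for i in range(31)]  # grid from 3.0 to 6.0
    for _ in range(rounds):
        worst_inputs = inner_falsify(theta, rng)
        theta = outer_tune(worst_inputs, candidates)
    return theta, worst_inputs

if __name__ == "__main__":
    theta, worst_inputs = joint_optimize()
    print(f"tuned theta = {theta:.1f}, worst-case speeds = {worst_inputs}")
```

Because the discrepancy is measured only on the inputs the inner loop flags, the low-fidelity simulator is tuned where it matters for safety rather than uniformly over the input space.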

In the process of joint falsification and fidelity settings optimization, meta-learning principles play a pivotal role in achieving generalization to unseen events (Finn, Abbeel, and Levine 2017). By training the model across a diverse spectrum of tasks, it learns shared patterns that enable effective adaptation to unseen situations. In our approach, fidelity tuning refines the alignment between low- and high-fidelity simulators, grounding the learning in a more realistic representation of the world. This approach, where falsification, fidelity tuning, and learning across tasks interact dynamically, creates a learning process that enables the model to extract and reinforce shared patterns that are not only task-specific but also broadly applicable, thereby fostering generalization capability to unseen events.

To provide a theoretical basis for this approach, we proposed a set of theorems that illuminate important aspects of the joint falsification and fidelity settings optimization (Baheri and Kochenderfer 2023). These encompass a sensitivity analysis, which determines the extent to which alterations in the simulator settings influence the identified failure trajectories; an investigation of sample complexity; a review of the convergence properties, ensuring our approach aligns with the conditions needed for optimal solution convergence; and an exploration of the interaction between the inner and outer optimization loops. These theoretical results lay the foundation for joint falsification and fidelity settings optimization.


As we continue to delve into the era of LECs within safety-critical applications, it becomes crucial to devise efficient validation methodologies. While simulation-driven falsification has proven its efficacy, there lies an extensive scope for exploring uncharted territories that could revolutionize this field. This section attempts to shed light on potential future research directions in (multifidelity) simulation-driven falsification, specifically focusing on how these techniques can augment the evaluation of LECs in safety-critical applications.

Adaptive fidelity selection. Multifidelity optimization techniques have gained prominence for their ability to increase the efficiency of the falsification process. These techniques leverage various simulation fidelity levels, resulting in a balance between computational efficiency, scenario diversity, and accuracy. Current approaches typically resort to predefined resource allocation among different fidelity levels. Nevertheless, future research could channel efforts towards devising adaptive techniques that dynamically allocate resources based on ongoing simulation results. This approach could lead to the development of intelligent systems that shift from low-fidelity simulations to high-fidelity ones when a potential failure mode region is identified.
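A minimal sketch of such an escalation rule follows, assuming each simulator is a function that maps a scenario to a robustness value. The threshold and the function name `adaptive_evaluate` are hypothetical, and a practical scheme would likely adapt the threshold as results accumulate.

```python
def adaptive_evaluate(x, low_sim, high_sim, threshold=1.0):
    """Run the cheap simulator first; escalate only when it signals a near-violation.

    Returns the robustness value and the fidelity level that produced it.
    """
    rho_low = low_sim(x)
    if rho_low < threshold:                # potential failure region: confirm it
        return high_sim(x), "high"
    return rho_low, "low"                  # clearly safe at low fidelity: stop here

if __name__ == "__main__":
    # Hypothetical simulators returning robustness for a scalar scenario.
    def low_sim(x):
        return x - 0.5

    def high_sim(x):
        return x - 1.0

    print(adaptive_evaluate(5.0, low_sim, high_sim))   # stays at low fidelity
    print(adaptive_evaluate(0.8, low_sim, high_sim))   # escalates to high fidelity
```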

Leveraging parallel and distributed computing. The advent of parallel and distributed computing infrastructures presents an exciting avenue to be explored. Integrating these technologies with multifidelity optimization techniques could enable simultaneous running of numerous simulations. This would drastically expedite the falsification process and facilitate a broader exploration of parameter space within a reduced time frame.
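As a small illustration of running scenarios concurrently, the sketch below uses a thread pool from the Python standard library. It assumes each simulator call spends most of its time waiting on an external process or service; for CPU-bound simulators written in Python, a process pool or a cluster scheduler would be the more appropriate choice. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_batch(simulator, scenarios, max_workers=4):
    """Evaluate all scenarios concurrently and return robustness values in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(simulator, scenarios))

def find_counterexamples(simulator, scenarios, max_workers=4):
    """Return the scenarios whose robustness is negative."""
    values = evaluate_batch(simulator, scenarios, max_workers)
    return [s for s, rho in zip(scenarios, values) if rho < 0]

if __name__ == "__main__":
    # Hypothetical stand-in for a simulator: robustness as a function of a scalar input.
    def toy_robustness(x):
        return x * x - 4

    print(find_counterexamples(toy_robustness, [0, 1, 2, 3]))
```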

Hybrid validation approaches. Hybrid methods that combine simulation-driven falsification with other validation techniques like formal verification and real-world testing could offer a comprehensive safety evaluation approach. These hybrid methodologies could maximize the strengths of each technique and mitigate their weaknesses. For instance, simulation-driven falsification could be used to identify potential failure modes, which could then be examined in greater detail using formal verification. The most critical scenarios could subsequently undergo real-world testing for validation of simulation and formal verification results.

Hybrid simulation techniques and digital twins. The development of hybrid simulation techniques can achieve an optimal balance between computational cost and environmental complexity. These techniques may employ lower fidelity simulations during the initial validation stages, followed by higher fidelity simulations that replicate complex real-world scenarios. Digital twins (Batty 2018), which are high-fidelity virtual replicas of physical systems, could offer a unique blend of the physical and simulated domains when coupled with real-time data and learning algorithms. This integration can pave the way for the development and validation of safer LECs.

Integrating mixed reality for enhanced falsification. The incorporation of mixed reality, which blends real-world data into simulation environments, can provide an enriched representation of real-world scenarios and environments. It could enhance the discovery of failure modes by factoring in the complex, unstructured data typically encountered in real-world situations. However, integrating such large-scale data into the simulator presents significant technical challenges. This area calls for extensive research to devise algorithms capable of processing this data and facilitating a more effective falsification process.

Interpretability and transparency in simulation-driven falsification. With the growing complexity of machine learning models, ensuring transparency and interpretability of decisions is becoming increasingly important in safety-critical applications (Carvalho, Pereira, and Cardoso 2019). To address this in the context of simulation-driven falsification, future work could focus on developing advanced visualization techniques for these simulations. These tools would not only illustrate how different scenarios lead to different outcomes but also show how alterations in the learning-enabled system’s behavior can mitigate failure modes. Enhanced transparency can thus guide developers and stakeholders in making more informed decisions during the safety validation process. This level of interpretability can also bolster the confidence in using LECs in safety-critical applications, as a clear understanding of why a system fails or passes a falsification test can be instrumental in evaluating its readiness for deployment.


In this article, we have explored the significant role that fidelity in simulators plays in the safety validation of LECs used in safety-critical applications. We outlined two research directions focused on deepening our understanding of the role of fidelity, each contributing to a novel perspective on how to approach this problem. The first direction proposed an approach for multifidelity simulation-driven falsification, where the goal is to strategically use different levels of fidelity. By doing so, we can significantly reduce the number of computationally expensive simulations required for the safety evaluation of LECs. In the second direction, we delved into an approach for joint falsification and fidelity settings optimization. This approach synthesizes the process of falsifying safety properties with the optimization of fidelity levels within the simulation environment. It allows for a more subtle exploration of the design space, where the adjustment of fidelity settings is aligned with the objective of minimizing discrepancies between low- and high-fidelity simulators. Beyond our primary contributions, we also provided an outlook on the future of this research area, identifying potential advancements and challenges that may arise as the field continues to evolve.


This research was supported in part by the National Science Foundation (NSF) under Award No. 2132060 and the Federal Aviation Administration (FAA) under Contract No. 692M15-21-T-00022.


The author declares that there is no conflict of interest.


  • Ali Baheri is an Assistant Professor of Mechanical Engineering at the Rochester Institute of Technology. Before joining RIT, he was a Visiting Scholar at Stanford University. Prior to that, he served as an Assistant Professor (in the research track) at West Virginia University. He received his Ph.D. from the University of North Carolina at Charlotte in 2018. His lab focuses on research at the intersection of autonomy, controls, and machine learning, with the ultimate goal being the advancement of safe, certified, and efficient autonomous systems.