PhD defense Esther Goudet: Methods and Tools for Probabilistic Analytical Reliability Analysis of Gate Netlists
Télécom Paris, 19 place Marguerite Perey, F-91120 Palaiseau, Amphi 4 and by videoconference
Jury
- Roselyne CHOTIN, Professor, Sorbonne Université (LIP6, CIAN), France – Jury president
- Lirida NAVINER, Professor, Télécom Paris (LTCI, COMELEC), France – Thesis supervisor
- Jean-Marc DAVEAU, Doctor, STMicroelectronics, Crolles, France – Industrial supervisor
- Cristina MEINHARDT, Associate Professor, Universidade Federal de Santa Catarina (INE), Brazil – Examiner
- Matteo SONZA REORDA, Professor, Politecnico di Torino (DAUIN), Italy – Reviewer
- Olivier SENTIEYS, Professor, Université de Rennes, INRIA (IRISA), France – Reviewer
Abstract
The analysis of complex Systems-on-Chip (SoCs) for ISO 26262 automotive certification requires the extraction of reliability metrics such as the logic masking factor, which are used to calculate fault propagation probabilities and assess system-level reliability. These metrics can be extracted through experimental methods (fault injection) or probabilistic analytical methods.
… calculating various reliability metrics based on the circuit’s netlist. These metrics (incorrect output probability) are used to build a probabilistic failure model of a block’s behavior, which is then used in fault modeling and reliability assessment frameworks such as Reliability/Availability/Maintainability/Safety (RAMS) analysis and Model-Based Safety Assessment (MBSA). This thesis builds on previous doctoral research, extending and complementing the Hierarchical Conditional Probability Matrix (HCPM) partitioning method, which was developed and patented for application to large netlists. The CPM clustering algorithm remains best in class but has not been used in practice because it had not been applied to large gate netlists with ill-formed nested reconvergent paths.
To continue this work, we aim to apply a partitioning algorithm to combinational netlists. The clusters of a partition are smaller than the full circuit, and the signal error rates of each cluster are analyzed separately, using the same model for each block. To calculate the failure rate of a combinational circuit, the signal error rates are propagated from the circuit’s inputs, from cluster to cluster, up to the primary output signals. The complexity of analyzing the entire circuit is thereby reduced to that of the heaviest cluster. Under specific partitioning conditions, the overall circuit analysis remains accurate while significantly reducing the time and memory required for computation. The primary focuses of this work include implementing an efficient partitioning algorithm, the distribution and spreading of faults within a circuit, and the handling of reconverging signals. A particularly problematic situation arises when a reconverging fan-out spreads across multiple distinct clusters, which leads to an incorrect estimate of the circuit error rate.
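The reconvergence problem mentioned above can be illustrated on a toy example. The sketch below is not the HCPM method from the thesis, and it uses signal probabilities rather than error rates to stay short. The three-input circuit and the function names are hypothetical, chosen only for illustration. It compares the exact probability that the output is 1 with the value obtained when the two branches of a reconverging fan-out are analyzed separately and assumed independent.

```python
from itertools import product


def circuit(a, b, c):
    """Hypothetical toy netlist: y = (a AND b) OR (a AND c).

    Input `a` fans out to both AND gates and reconverges at the OR gate.
    """
    return (a & b) | (a & c)


def exact_one_probability(p=0.5):
    """Exact P(y = 1), enumerating all input vectors.

    Inputs are assumed independent, each equal to 1 with probability p.
    """
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        weight = 1.0
        for bit in (a, b, c):
            weight *= p if bit else 1.0 - p
        total += weight * circuit(a, b, c)
    return total


def naive_one_probability(p=0.5):
    """P(y = 1) when each AND gate is analyzed on its own.

    The OR gate then treats its two inputs as independent, which ignores
    the correlation introduced by the shared input `a`.
    """
    p_g1 = p * p  # P(a AND b = 1)
    p_g2 = p * p  # P(a AND c = 1)
    return 1.0 - (1.0 - p_g1) * (1.0 - p_g2)


print(exact_one_probability())  # 0.375
print(naive_one_probability())  # 0.4375
```

With equiprobable inputs, the independent-branch estimate (0.4375) overshoots the exact value (0.375). This is the kind of discrepancy that can appear when the branches of a reconverging fan-out end up in different clusters.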
During the doctoral work, we examined whether the reliability results obtained with this approach are consistent and relevant compared with existing analytical models from the state of the art. In addition, to support a specific discussion during the review of our model, some results were verified with a fault injection tool implemented on an FPGA. Fault models for other elementary components of a silicon library (3D plug, I/Os) were also explored to link analytical reliability models to the reliability of silicon SoCs. The benefit of applying this approach to SoCs is the non-conservative extraction of fault masking probabilities for ISO 26262 certification. The generation of probabilistic failure models will enable their integration into the MBSA process.