survey
Just Accepted

Survey on Redundancy Based-Fault tolerance methods for Processors and Hardware accelerators - Trends in Quantum Computing, Heterogeneous Systems and Reliability

Online AM: 06 May 2024

Abstract

Rapid progress in CMOS technology over the past 25 years has increased the vulnerability of processors to faults. Consequently, the focus of computer architects has shifted towards designing fault-tolerance methods for processor architectures. At the same time, chip designers have encountered significant challenges in designing fault-tolerant processor architectures. For processor cores, redundancy-based fault-tolerance methods for fault detection at the core level, micro-architectural level, thread level, and software level are discussed. Similar redundancy-based fault-tolerance methods applicable to cache memory and hardware accelerators are also presented. Recent trends in fault-tolerant quantum computing and quantum error correction are discussed as well. The classification of state-of-the-art techniques presented in the survey should help researchers organize their work along established lines.

  107. [107] Kundu, S., Basu, K., Sadi, M., Titirsha, T., Song, S., Das, A. and Guin, U. 2021. Special session: Reliability analysis for ML/AI hardware. arXiv : 2103.12166. Retrieved from https://arxiv.org/abs/2103.12166.Google ScholarGoogle Scholar

  • Published in

    ACM Computing Surveys Just Accepted
    ISSN:0360-0300
    EISSN:1557-7341

    Copyright © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Online AM: 6 May 2024
    • Accepted: 14 April 2024
    • Revised: 9 April 2024
    • Received: 10 September 2022

    Qualifiers

    • survey