Electronics and Communication Engineering, Annamacharya Institute of Technology and Sciences, Kadapa
Approximate computing offers a pathway to boost hardware efficiency in error-resilient tasks such as neural networks and image processing by trading a controlled loss of accuracy for gains in speed, power, and area. This brief introduces an ultra-efficient approximate multiplier featuring error compensation. It applies a constant compensation term to the least significant half of the product, achieving a balanced hardware-accuracy trade-off, while a low-complexity error compensation module (ECM) further refines precision. In HSPICE simulations using a 7 nm tri-gate FinFET technology, the design significantly improves the energy-delay product, surpassing exact and existing approximate designs by averages of 77% and 54%, respectively. MATLAB simulations corroborate its accuracy: its results align closely with those of the exact multipliers commonly employed in neural networks, and it achieves an average peak signal-to-noise ratio (PSNR) exceeding 51 dB in image multiplication. Consequently, it emerges as a compelling substitute for exact multipliers in practical, error-resilient applications.
The digital circuit industry is evolving rapidly, driven by the escalating density and complexity of digital integrated circuits at deep-nanometer dimensions. In response, circuit designers are exploring innovative approaches across the design abstraction levels. One pragmatic route to energy-efficient nanoscale digital circuits is approximate computing. In numerous practical applications, absolute accuracy is dispensable: tasks like artificial neural networks (ANNs), speech and image recognition, and multimedia applications can tolerate a degree of inaccuracy while still yielding meaningful results. This tolerance for imprecision presents an opportunity to reduce circuit parameters such as transistor count, power consumption, delay, and area by sacrificing a degree of accuracy.

Multiplication is a cornerstone arithmetic operation in microprocessors and digital signal processing units. In neural networks especially, where the convolution layers of convolutional neural networks (CNNs) entail a multitude of multiply-accumulate (MAC) operations, minimizing the energy consumption and hardware cost of multipliers is of paramount significance. In the pursuit of hardware-efficient solutions for error-resilient applications, designing efficient approximate multipliers is therefore a promising avenue. Two primary methodologies exist for designing approximate multipliers: the first integrates approximate adders and compressors into conventional multiplier structures, while the second modifies the multiplier structure itself to achieve an approximate design. Effective error mitigation in such multipliers frequently relies on error compensation modules (ECMs); by combining efficient ECMs with appropriate truncation techniques, designers can craft efficient approximate multipliers.

This brief introduces a novel approximate multiplier endowed with an ultra-efficient error compensation module. The proposed design features an ultra-low-complexity structure that significantly reduces transistor count and energy consumption compared to its counterparts, while offering a level of accuracy apt for error-resilient applications like neural networks and image processing. The multiplier comprises three primary components: a constant-truncated region enabling a hardware-accuracy trade-off, a novel efficient error compensation module, and an exact part. To devise the ultra-efficient ECM, we exploit the fact that all error distances (EDs) stemming from the constant-truncated part are negative. Additionally, assuming that the input bits of the multiplier follow a uniform distribution, inputs with higher error distances have lower probabilities of occurrence. Consequently, we propose an error compensation module tailored to rectify errors in the scenarios with high occurrence probability. Remarkably, the ECM leverages only two four-input OR gates, requiring a total of 20 transistors. Furthermore, we examine the integration of the proposed approximate multiplier in neural network and image processing applications. Our investigations indicate that the proposed design achieves an ultra-efficient balance between hardware efficiency and accuracy in error-resilient applications.
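To make the probability argument concrete, the short Python sketch below (our illustration; the paper's analysis used MATLAB) enumerates all 8-bit operand pairs and measures how often one truncated column of the partial-product matrix shows a 4-bit pattern with a given number of ones. Using column 3, which holds exactly four partial-product bits, is our choice for illustration, not a detail taken from the paper.

```python
from collections import Counter
from itertools import product
from math import comb

# Count, over all 8-bit operand pairs, the ones in column 3 of the
# partial-product matrix (bits a0&b3, a1&b2, a2&b1, a3&b0). Each bit is
# the AND of two uniformly distributed bits, hence 1 with probability 1/4.
counts = Counter()
for a, b in product(range(256), repeat=2):
    ones = sum(((a >> i) & 1) & ((b >> (3 - i)) & 1) for i in range(4))
    counts[ones] += 1

total = 256 * 256
for k in range(5):
    per_state = counts[k] / total / comb(4, k)  # probability of one specific pattern
    print(f"pattern with {k} one(s): {100 * per_state:.1f}% per state")
# Prints roughly 31.6%, 10.5%, 3.5%, 1.2%, 0.4% -- the steeply decreasing
# probabilities that justify compensating only the most probable error cases.
```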
REVIEW OF LITERATURE
1. Arasteh et al. (2018) - Integr. VLSI J.
Introduces an energy-efficient 4:2 compressor design. Leverages FinFET device characteristics to reduce the compressor's energy consumption. Limited scalability beyond FinFET technology.
2. Ahmadinejad and Moaiyeri (2022) - IEEE Trans. Emerg. Topics Comput.
The paper presents energy- and quality-efficient approximate multipliers tailored for neural network and image processing applications. Designing multipliers optimized for specific application domains to achieve energy and quality efficiency. Potential limitations in applicability to domains outside neural networks and image processing.
3. Ansari et al. (2018) - IEEE J. Emerg. Sel. Topics Circuits Syst.
This paper discusses low-power approximate multipliers utilizing encoded partial products and approximate compressors. Leveraging encoded partial products and approximate compressors to design low-power approximate multipliers. Possible trade-offs between power efficiency and accuracy.
4. Strollo et al. (2020) - Proc. IEEE Int. Symp. Circuits Syst.
The paper proposes a low-power approximate multiplier with error recovery using a new approximate 4-2 compressor. Introducing a new approximate 4-2 compressor design with error recovery mechanisms for low-power multipliers. Potential overhead in error recovery mechanisms affecting overall performance.
5. Strollo et al. (2022) - IEEE Trans. Circuits Syst. I, Reg. Papers
Investigates approximate multipliers using static segmentation with error analysis and improvements. Employing static segmentation in approximate multipliers and analyzing errors to enhance accuracy. Potential complexity increase due to static segmentation.
6. Afzali-Kusha et al. (2020) - IEEE Trans. Very Large Scale Integr. (VLSI) Syst.
Explores the design of energy-efficient accuracy-configurable Dadda multipliers with improved lifetime based on voltage overscaling. Investigating the design of Dadda multipliers with configurable accuracy and enhanced lifetime using voltage overscaling techniques. Increased complexity in voltage overscaling techniques may impact design scalability.
7. Pei et al. (2021) - IEEE Trans. Circuits Syst. II, Exp. Briefs
Focuses on the design of ultra-low power consumption approximate 4–2 compressors based on the compensation characteristic. Developing ultra-low power 4–2 compressors by leveraging compensation characteristics. Potential limitations in achieving high accuracy with ultra-low power design constraints.
8. Pilipovic et al. (2021) - IEEE Trans. Circuits Syst. I, Reg. Papers
Proposes a two-stage operand trimming approximate logarithmic multiplier. Introduces a novel two-stage operand trimming technique for logarithmic multipliers. Increased complexity due to the two-stage trimming process may impact area efficiency.
9. Esposito et al. (2018) - IEEE Trans. Circuits Syst. I, Reg. Papers
Discusses approximate multipliers based on new approximate compressors. Introducing novel approximate compressors for the design of approximate multipliers. Potential challenges in achieving high accuracy with novel compression techniques.
10. Ha and Lee (2018) - IEEE Embedded Syst. Lett.
Presents multipliers with approximate 4–2 compressors and error recovery modules. Integrates error recovery modules with approximate 4–2 compressors in multiplier design. Increased hardware overhead due to error recovery modules may affect area efficiency.
11. Han et al. (2015) - IEEE Trans. Comput.
Discusses the design and analysis of approximate compressors for multiplication. Analyzes the performance and accuracy of various approximate compressors for multiplication operations. Limited focus on error recovery mechanisms may impact overall accuracy.
12. Kim et al. (2022) - IEEE Trans. Emerg. Topics Comput.
Investigates the effects of approximate multiplication on convolutional neural networks. Studies how approximate multiplication techniques affect the performance of convolutional neural networks. Potential degradation in network accuracy due to approximate multiplication.
13. Kumar et al. (2022) - IEEE Embedded Syst. Lett.
Proposes low-power compressor-based approximate multipliers with error correcting module. Introduces compressor-based approximate multipliers with error correction modules. Increased hardware complexity due to error correction modules may impact power efficiency.
14. Leon et al. (2018) - IEEE Trans. Very Large Scale Integr. (VLSI) Syst.
Discusses approximate hybrid high radix encoding for energy-efficient inexact multipliers. Explores hybrid encoding techniques for designing energy-efficient inexact multipliers. Increased complexity in encoding techniques may impact scalability.
15. Leon et al. (2019) - Proc. 56th Annu. Des. Autom. Conf. (DAC)
Introduces cooperative arithmetic-aware approximation techniques for energy-efficient multipliers. Presents cooperative techniques for arithmetic-aware approximation in multiplier design. Complexity in cooperative techniques may impact design scalability.
16. Liu et al. (2018) - IEEE Trans. Circuits Syst. I, Reg. Papers
Discusses the design and evaluation of approximate logarithmic multipliers for low-power error-tolerant applications. Evaluates the performance of approximate logarithmic multipliers for low-power error-tolerant applications. Potential accuracy trade-offs in error-tolerant applications.
17. Yin et al. (2021) - IEEE Trans. Sustain. Comput.
Explores the design and analysis of energy-efficient dynamic range approximate logarithmic multipliers for machine learning. Investigates the design of dynamic range approximate logarithmic multipliers optimized for machine learning applications. Potential limitations in achieving high accuracy with dynamic range approximation techniques.
18. LeCun et al. (2010) - AT&T Labs
Introduces the MNIST handwritten digit database. Provides a dataset of handwritten digits for use in machine learning and pattern recognition research. Limited to providing a dataset without discussing specific methodologies or applications.
19. Netzer et al. (2011) - Proc. NIPS Workshop Deep Learn. Unsupervised Feature Learn.
Discusses reading digits in natural images with unsupervised feature learning. Presents techniques for reading digits in natural images using unsupervised feature learning methods. May lack detailed discussion on the implementation and performance of the proposed techniques.
20. Zhou et al. (2017) - Proc. IEEE Int. Conf. Acoust. Speech Signal Process.
Focuses on the classification of distorted images with deep convolutional neural networks. Investigates the use of deep convolutional neural networks for classifying distorted images. Potential challenges in achieving high accuracy with distorted image classification.
PROPOSED APPROXIMATE MULTIPLIER
The proposed approximate multiplier offers a novel approach to partial product reduction, as illustrated in Fig. 1. Given the widespread use of low-bit multipliers in machine learning and image multiplication applications, we specifically focus on the 8-bit Dadda multiplier for illustrative purposes, although our design can be readily extended to accommodate larger bit widths.
Fig. 1: Partial Product Reduction of the Proposed Approximate Multiplier
The architecture of our proposed multiplier comprises three main components: a constant-truncated part, an error compensation module, and an exact part. In the constant-truncated region, partial products are not generated for the eight least significant columns; instead, an 8-bit constant of "00000110" is used for the lower half of the product. This region employs inaccurate compressors, which produce erroneous outputs for all input combinations except "0000", as detailed in Table I.
Table I: Error Distances for All Input Combinations.
Although only one input combination yields an accurate output, our design exhibits several noteworthy features that contribute to its ultra-efficient operation and its compromise between hardware efficiency and accuracy. First, by generating an accurate output for the most likely input combination, "0000", we prioritize accuracy where it is most needed. Second, because the error distances (EDs) of all inaccurate inputs are negative, we can design an efficient error compensation module to mitigate them.

Another significant aspect of our design is the relationship between EDs and input probabilities. Since each input's ED equals the number of ones in that input, states with higher EDs are less likely to occur. Assuming a uniform distribution of the multiplier's input bits, each partial-product bit, being the AND of two operand bits, is 1 with probability 1/4; this yields the probabilities of inputs with varying numbers of ones: the probability of an input with zero ones is 31.5%, while inputs with one, two, three, and four ones have probabilities of 10.5%, 3.5%, 1.1%, and 0.4%, respectively. Based on this probabilistic analysis, our error compensation module targets the inputs with the highest occurrence probabilities to mitigate errors effectively. Using two 4-input OR gates comprising a total of 20 transistors, we detect the errors corresponding to the specific input combinations highlighted in Table I. The outputs of these OR gates, serving as carry inputs to subsequent compressors, substantially compensate for the impact of the negative EDs. It is worth noting that having only negative EDs is well suited to neural network applications, particularly because of the ReLU activation function.

Moreover, while truncating the least significant columns enhances hardware efficiency, it introduces some accuracy loss. This loss is mitigated by replacing the three least significant bits of the product with a constant correction term of "110", computed as the average value of the input combinations in the least significant columns. Incorporating this constant correction term eliminates 28 two-input AND gates from the partial product generation stage and reduces the hardware overhead of the subsequent adder and compressor circuits.

Fig. 1 illustrates the reduction part of our proposed multiplier: the first stage uses a half adder, full adders, and exact 4:2 compressors, and the second stage uses additional full adders and compressors. Finally, a ripple-carry adder (RCA) composed of half adders and full adders generates the product. The ultra-low complexity and efficient structure of our proposed multiplier yield significant reductions in hardware overhead, delay, and power consumption, while maintaining accuracy suitable for error-resilient applications like neural networks.
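The following Python sketch captures our behavioral reading of this scheme; it is an illustration, not the authors' implementation. It assumes the eight truncated LSBs default to the constant "00000110", the exact part reduces the upper columns, and each ECM OR gate is modeled as a column-wide OR injecting a carry into the next column. Which two columns the OR gates monitor is not stated above, so monitoring columns 6 and 7 is our assumption.

```python
def approx_mul8(a: int, b: int) -> int:
    """Behavioral sketch of the proposed 8x8 approximate multiplier."""
    # Count the 1-bits in every column of the partial-product matrix.
    col = [0] * 15
    for i in range(8):
        for j in range(8):
            col[i + j] += ((a >> i) & 1) & ((b >> j) & 1)
    # Exact part: columns 8..14 are reduced exactly; carries that the
    # truncated columns 0..7 would have produced are never generated.
    result = sum(col[c] << c for c in range(8, 15))
    # Constant-truncated part: the eight LSBs default to "00000110".
    result += 0b00000110
    # ECM (assumed placement): one OR gate per monitored column, modeled as
    # an OR over the whole column, injects a carry one position up.
    for c in (6, 7):
        if col[c]:
            result += 1 << (c + 1)
    return result

# Exhaustive probe of the mean error distance over all operand pairs.
med = sum(abs(approx_mul8(a, b) - a * b)
          for a in range(256) for b in range(256)) / 65536
print(f"mean error distance: {med:.1f}")
```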
SIMULATION RESULTS AND COMPARISONS
This section presents the simulation results and comparisons of various exact and approximate multipliers, focusing on performance metrics, accuracy analysis, Pareto diagrams, and applications in neural networks and image processing.
A. Hardware Analysis
Simulation of the proposed multiplier and of other exact and approximate multipliers was conducted in HSPICE with a 7 nm FinFET model, at a supply voltage of 0.7 V and an operating frequency of 2 GHz, evaluating critical-path delay and power consumption. Table II summarizes the performance comparison, showcasing the superiority of the proposed multiplier in terms of delay, power, power-delay product (PDP), energy-delay product (EDP), and area. Notably, the proposed multiplier outperforms existing designs, particularly those with error compensation modules (ECMs), indicating its ultra-efficient structure.
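As a reminder of how the derived columns of Table II combine, the following lines show the arithmetic; the numbers are placeholders to illustrate the calculation, not measured values from the table.

```python
# Placeholder values only (NOT measurements from Table II): illustrate how
# the power-delay product (PDP) and energy-delay product (EDP) are formed.
delay = 250e-12      # critical-path delay in seconds (hypothetical)
power = 12e-6        # average power in watts (hypothetical)
pdp = power * delay  # PDP = power x delay -> energy per operation [J]
edp = pdp * delay    # EDP = PDP x delay [J*s]
print(f"PDP = {pdp:.3e} J, EDP = {edp:.3e} J*s")
```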
B. Accuracy Analysis
The accuracy of the approximate multipliers was assessed using MATLAB simulations, measuring metrics such as normalized mean error distance (NMED), mean relative error distance (MRED), and number of effective bits (NoEB). Table III presents the accuracy metrics, demonstrating the superior performance of the proposed multiplier compared to counterparts with error compensation capability. Despite maintaining high accuracy, the proposed design exhibits significant improvements in hardware metrics, as shown in Table II.
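The sketch below states common definitions of these metrics (our rendering; the paper may use equivalent formulations) and evaluates them exhaustively for any 8-bit approximate multiplier model, such as the approx_mul8 sketch above.

```python
import math

def accuracy_metrics(amul, n: int = 8):
    """NMED, MRED, and NoEB of an n-bit approximate multiplier amul."""
    n_pairs = (1 << n) ** 2
    sum_ed, sum_sq, sum_red, n_red = 0, 0.0, 0.0, 0
    for a in range(1 << n):
        for b in range(1 << n):
            ed = abs(amul(a, b) - a * b)
            sum_ed += ed
            sum_sq += float(ed) * ed
            if a * b:                    # relative error undefined at zero
                sum_red += ed / (a * b)
                n_red += 1
    max_prod = ((1 << n) - 1) ** 2
    nmed = sum_ed / n_pairs / max_prod   # mean ED normalized to max product
    mred = sum_red / n_red               # mean relative ED
    rmsed = math.sqrt(sum_sq / n_pairs)  # root-mean-square ED
    noeb = 2 * n - math.log2(1 + rmsed)  # number of effective bits
    return nmed, mred, noeb

print(accuracy_metrics(approx_mul8))     # evaluates the behavioral sketch above
```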
C. Pareto Diagrams
Pareto diagrams, depicted in Fig. 2, were constructed to visualize the trade-offs between accuracy and energy efficiency. Multipliers positioned near the bottom-left corner of the diagrams offer more effective trade-offs. The results indicate that the proposed multiplier achieves a better balance between accuracy and energy efficiency compared to alternative designs with and without ECMs.
D. Applications
The efficacy of the approximate multipliers in neural network applications was evaluated using two structures: a multilayer perceptron (MLP) for MNIST classification and a convolutional neural network (CNN) for SVHN classification. Fig. 3 shows the classification accuracies obtained with the multipliers under investigation. The proposed approximate multiplier achieves classification accuracies of 95.5% on MNIST and 81.3% on SVHN, comparable to exact designs, highlighting its effectiveness in neural network applications.

For image processing, image multiplication was considered, with performance evaluated using the peak signal-to-noise ratio (PSNR) and the mean structural similarity index (MSSIM). Table IV presents the average PSNR and MSSIM values for the approximate image multiplications. The proposed multiplier offers PSNR values exceeding 51 dB and MSSIM values approaching 1.0, indicating high-quality results suitable for image processing tasks.
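PSNR as quoted above follows the standard definition; the sketch below assumes 8-bit images, with ref the exact image-multiplication result and test the approximate one. MSSIM can be obtained with scikit-image's structural_similarity.

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio between exact and approximate images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# MSSIM (assuming scikit-image is installed):
# from skimage.metrics import structural_similarity as ssim
# mssim = ssim(ref, test, data_range=255)
```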
Table II: Performance Comparison of Various Multipliers
Table III: Accuracy Metrics of the Approximate Multipliers
Fig. 2: Pareto diagrams (a) PDP vs. NMED (b) PDP vs. MRED.
Fig. 3: Classification Accuracies Using the Multipliers Under Investigation
Table IV: Average PSNR and MSSIM Values for Approximate Image Multiplications
This comprehensive evaluation underscores the potential of the proposed approximate multiplier to meet the demands of error-resilient applications while optimizing hardware efficiency.
CONCLUSION
This brief introduced an ultra-efficient approximate multiplier comprising a constant-truncated region, an error compensation module, and an exact part. Although the truncated region yields an accurate result only for the "0000" input, this is the most common scenario, and because the error distances (EDs) of all other inputs are negative, an efficient error compensation module can correct the most probable error cases. The design's low complexity reduces the energy-delay product (EDP) by averages of 77% and 54% compared to exact and existing approximate multipliers, respectively. Moreover, the proposed multiplier exhibits high accuracy and output quality in error-resilient applications such as neural networks and image processing. It thus emerges as a compelling alternative to exact multipliers in practical error-resilient scenarios.