International Journal of Computer & Software Engineering Volume 3 (2018), Article ID 3:IJCSE-131, 15 pages
https://doi.org/10.15344/2456-4451/2018/131
Original Article
A Low-Energy Multi-Threaded Processor Design for Application Specific Embedded Systems

Mahanama Wickramasinghe and Hui Guo*

School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia
*Corresponding author: Dr. Hui Guo, School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia; E-mail: huig@cse.unsw.edu.au
Received: 22 January 2018; Accepted: 15 March 2018; Published: 17 March 2018
Wickramasinghe M, Guo H (2018) A Low-Energy Multi-Threaded Processor Design for Application Specific Embedded Systems. Int J Comput Softw Eng 3: 131. doi: https://doi.org/10.15344/2456-4451/2018/131
