Within HYMAGINE, circuits of increasing complexity have been designed, from simple non-volatile logic gates to microcontrollers and microprocessors.
Below are two examples conceived within HYMAGINE: a magnetic Look-Up Table (MLUT) and a hybrid CMOS/MTJ microprocessor.
Magnetic LUT for FPGA
Our purpose was to develop a radiation-hardened FPGA based on MRAM. An FPGA (Field-Programmable Gate Array) is a standard hardware circuit whose functionality can be changed by programming elementary logic functions and interconnects (Fig.1). It is composed of elementary logic functions called LUTs (Look-Up Tables), programmable by an operating code stored in a configuration memory. A LUT consists of a memory associated with a multiplexer and constitutes a reprogrammable logic gate: the inputs of the gate form the memory address, and the output of the gate is the content of the memory at that address. The truth table of the LUT is therefore defined by the information stored in the memory, so changing the memory content changes the functionality of the logic gate. The LUTs are interconnected by a programmable network of interconnections.
Fig.1: LUT (left), which forms the basic tile of an FPGA (right)
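The memory-plus-multiplexer principle described above can be sketched as a small behavioural model. This is only an illustration, not the HYMAGINE design: the class name `LookUpTable`, its methods, and the choice of the first input as the most significant address bit are all assumptions made for the example.

```python
class LookUpTable:
    """Behavioural model of a k-input LUT: a small memory read through a multiplexer."""

    def __init__(self, num_inputs, config_bits):
        if len(config_bits) != 2 ** num_inputs:
            raise ValueError("need one configuration bit per input combination")
        self.num_inputs = num_inputs
        self.memory = list(config_bits)  # the truth table lives in this memory

    def program(self, config_bits):
        """Rewrite the configuration memory, which changes the logic function."""
        if len(config_bits) != len(self.memory):
            raise ValueError("wrong number of configuration bits")
        self.memory = list(config_bits)

    def evaluate(self, *inputs):
        """Use the gate inputs as a memory address and return the stored bit."""
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        address = 0
        for bit in inputs:  # assumption: first input is the most significant address bit
            address = (address << 1) | (bit & 1)
        return self.memory[address]


# A 2-input LUT configured as AND, then reprogrammed as XOR without changing the hardware model.
lut = LookUpTable(2, [0, 0, 0, 1])
and_outputs = [lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)]
lut.program([0, 1, 1, 0])
xor_outputs = [lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)]
```

In this model, reprogramming only touches the stored bits; the addressing logic in `evaluate` stays the same, which mirrors the idea that the logic function is defined entirely by the memory content.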
The LUT designed within HYMAGINE uses the TAS-MRAM technology developed in collaboration with Crocus Technology. The MLUT design is shown in Fig.2.
Fig.2: MLUT design for radiation-hard FPGA
This MLUT was actually built and successfully tested in WP5 using a 130nm CMOS technology and the TAS-MRAM technology (Fig.3). Its full functionality was demonstrated (Fig.4).
Fig.3: First demonstrator of a radiation-hard and fully reprogrammable FPGA based on 2-input logic gates (non-volatile Magnetic LUT) combining CMOS transistors with Thermally Assisted MRAM
Fig.4: Demonstration of on-the-fly reprogrammability of the logic gates from the “0010” function to the “0110” function.
Design of hybrid CMOS/magnetic circuits
As the technology node shrinks, leakage current increases exponentially in deep-submicron CMOS, so new strategies are required in integrated systems to save power without limiting processing performance. One solution is to integrate non-volatile storage elements into the processing system itself, that is to say the microprocessor. Among such elements, the Spin Transfer Torque (STT) Magnetic Tunnel Junction (MTJ) is an attractive device: it is compatible with the CMOS process and promises power-efficient circuits such as Magnetic Random Access Memory (MRAM) or non-volatile (NV) registers. Flip-flop (FF) circuits, widely used in synchronous digital systems, have a direct impact on the speed and power consumption of those systems, and can be built on the STT-MTJ. Figure 5 shows a simplified non-volatile FF (NVFF) architecture. The designed cell retains the usual CMOS flip-flop functionality with almost the typical FF structure. In addition, it can store its data in the MTJs before the circuit is powered off and restore it afterwards, exploiting the non-volatility of the MTJs.
Fig.5: MTJ-based Non-volatile Flip Flop
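The store/restore behaviour of such a cell can be illustrated with a purely behavioural sketch. It is not the circuit of Figure 5: the class `NonVolatileFlipFlop`, its method names, the single stored bit standing in for the MTJ state, and the reset-to-0 value after power-up are assumptions made for the example.

```python
class NonVolatileFlipFlop:
    """Behavioural model: a volatile latch backed by a non-volatile (MTJ-like) bit."""

    def __init__(self):
        self.q = 0          # volatile CMOS state, lost at power-off
        self.mtj = 0        # non-volatile state, kept without power
        self.powered = True

    def clock(self, d):
        """Normal flip-flop operation: capture D on a clock edge."""
        if not self.powered:
            raise RuntimeError("flip-flop is powered off")
        self.q = d & 1

    def store(self):
        """Copy the volatile state into the non-volatile element before power-off."""
        if not self.powered:
            raise RuntimeError("cannot store while powered off")
        self.mtj = self.q

    def power_off(self):
        self.powered = False
        self.q = None       # the volatile state is lost

    def power_on(self):
        self.powered = True
        self.q = 0          # assumption: the latch wakes up in a reset state

    def restore(self):
        """Reload the volatile state from the non-volatile element after power-up."""
        if not self.powered:
            raise RuntimeError("cannot restore while powered off")
        self.q = self.mtj


ff = NonVolatileFlipFlop()
ff.clock(1)
ff.store()
ff.power_off()
ff.power_on()
q_after_power_on = ff.q     # volatile value before restore
ff.restore()
q_after_restore = ff.q      # value recovered from the non-volatile element
```

The point of the sketch is the ordering: `store` must happen before power is cut and `restore` after it returns, while `clock` behaves like an ordinary flip-flop in between.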
Figure 6 shows the 1T1MTJ array architecture of the MRAM. Each MTJ is connected in series with an n-type FET. A row decoder selects the word line indicated by the memory address (X address), and a column decoder selects the column designated by the memory address (Y address). The write block sets the logic state of the bit cell by passing either a positive or a negative current through the selected MTJ, the polarity of the current determining the value written. The read block determines the logic value of the bit cell by comparing the voltage of the bit line (BL) to a reference voltage.
Fig.6: STT-MRAM architecture
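The addressing, write and read steps described above can be sketched as follows. This is a simplified model under stated assumptions, not the actual design: the resistance values `R_P` and `R_AP`, the read current `I_READ`, the mapping of a positive current to the high-resistance state (read as logic 1), and the neglect of the access transistor resistance are all hypothetical choices for the example.

```python
R_P = 2_000.0     # hypothetical low (parallel) resistance in ohms
R_AP = 4_000.0    # hypothetical high (antiparallel) resistance in ohms
I_READ = 20e-6    # hypothetical read current in amperes


class SttMramArray:
    """Behavioural model of a 1T1MTJ array addressed by (x, y)."""

    def __init__(self, rows, cols):
        # Every cell starts in the low-resistance state.
        self.resistance = [[R_P] * cols for _ in range(rows)]

    def write(self, x, y, current):
        """Set the cell state from the polarity of the write current.

        Assumption: positive current -> high resistance, negative -> low resistance.
        """
        if current == 0:
            raise ValueError("write current must be non-zero")
        self.resistance[x][y] = R_AP if current > 0 else R_P

    def read(self, x, y):
        """Compare the bit-line voltage with a mid-point reference voltage."""
        v_bl = I_READ * self.resistance[x][y]
        v_ref = I_READ * (R_P + R_AP) / 2
        return 1 if v_bl > v_ref else 0


mem = SttMramArray(rows=4, cols=4)
mem.write(1, 2, +50e-6)       # x selects the word line, y the column
bit_after_positive = mem.read(1, 2)
mem.write(1, 2, -50e-6)
bit_after_negative = mem.read(1, 2)
```

Placing the reference halfway between the two possible bit-line voltages is one common way to model the comparison; the source does not specify how the reference is generated.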
When the project was launched in 2010, our goal was to demonstrate the benefits of non-volatility within processing systems, using Rohm’s non-volatile 8-bit processor, which is based on Zilog’s legendary Z80 architecture and integrates non-volatile registers by means of ferroelectric elements, showing up to 80% reduction in power consumption. However, due to the slow writing speed and the limited endurance of FeRAM technology, this kind of microprocessor is likely to have a very short lifetime and a limited range of applications.
Not only did we manage to implement a functional processor integrating MTJs instead of ferroelectric elements, but the non-volatile flip-flops designed with our MTJs also achieved such good results in terms of timing and density that we decided to raise the challenge and build a more powerful processor worthy of our STT technology.
The best candidates are found in the lower end of the 32-bit CPU class, with enough computing resources to support a broad range of applications and with a modular architecture offering a high potential for power saving. Therefore, in this work we used our own modified version of the well-known OpenRISC processor, the OR1200: a 32-bit scalar RISC with Harvard microarchitecture that includes a 5-stage integer pipeline (fetch, decode, execute, memory, write back), virtual memory support and basic DSP capabilities.
Our modified OpenRISC processor provides a fast interface to a scratchpad memory located just behind the MMU (Memory Management Unit), which makes instruction fetches and data load/stores very fast. It is not seen by the peripherals connected to the Wishbone interface and remains a protected private area for the processor. Figure 7 shows the architecture of the complete system including the OpenRISC with its fast scratchpad memory (QMEM) and the peripherals connected over a double Wishbone bus, including 1MB dense block memories (BMEM).
Fig.7: Architecture of the complete system including the OpenRISC with its fast 256kB scratchpad memories (QMEM) and the peripherals connected over a double Wishbone bus, including 1MB dense block memories (BMEM).
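The split between a private scratchpad and bus-visible block memory can be pictured with a toy address router. Only the memory sizes come from the text; the base addresses `QMEM_BASE` and `BMEM_BASE`, the function `route` and the `master` labels are hypothetical and do not describe the real memory map.

```python
QMEM_SIZE = 256 * 1024        # scratchpad size from the text
BMEM_SIZE = 1024 * 1024       # block memory size from the text
QMEM_BASE = 0x0000_0000       # hypothetical base address
BMEM_BASE = 0x1000_0000       # hypothetical base address


def route(address, master):
    """Return the memory that serves an access issued by `master`.

    `master` is "cpu" for the processor or any other string for a
    Wishbone peripheral (labels chosen for this sketch only).
    """
    if QMEM_BASE <= address < QMEM_BASE + QMEM_SIZE:
        if master != "cpu":
            # The scratchpad is private: peripherals cannot see it.
            raise PermissionError("QMEM is private to the processor")
        return "QMEM"
    if BMEM_BASE <= address < BMEM_BASE + BMEM_SIZE:
        return "BMEM"         # reachable over the Wishbone bus
    raise ValueError("unmapped address")


cpu_target = route(0x0000_0100, "cpu")
dma_target = route(0x1000_0040, "dma")
```

A peripheral access to the scratchpad range raises an error in this model, which is one way to express that the scratchpad is not visible from the Wishbone side.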
The SoC depicted in Figure 7 has been compiled and synthesized with our own STT-MRAM designs represented in Figure 6, as well as with standard SRAM blocks of the same size, in the 40nm low-power CMOS technology from STMicroelectronics. The target frequency of 400MHz is reached without significant effort, and the design is DRC and LVS clean for all memory and logic blocks.
For the benchmark, we measured the energy consumption of the CPU plus all the logic of the chip and tested the functionality of the system using four different but representative usage schemes, shown in Figure 8, each performing LZW data compression. Here, memory activity covers both read/write accesses and data retention. The four schemes are: a) short compression periods on sparse data packets, between which the system is powered off and the memory can be put into standby mode; b) the same with bigger data packets; c) a large number of small data packets that cause a long compression time, during which the memory must be kept powered on; and d) a series of larger, dependent data packets that need to be kept in memory between processing phases.
Fig.8: Memory/processor usage schemes investigated in the benchmark
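For readers unfamiliar with the benchmark workload, a textbook LZW compressor and decompressor is sketched below. It is not the benchmark code that ran on the processor; the function names and the unbounded dictionary are choices made for this illustration.

```python
def lzw_compress(data):
    """Compress a bytes object into a list of integer codes (textbook LZW)."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate          # keep extending the current phrase
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original bytes from a list of LZW codes."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Code refers to the phrase being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)


packet = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(packet)
restored = lzw_decompress(codes)
```

The dictionary that LZW builds while it runs is the kind of working data that must stay in memory during a compression phase, which is why the usage schemes distinguish between periods where the memory can be powered down and periods where it cannot.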
According to Figure 7, input and output raw data are stored in the block memories (BMEM), and the data scratchpad is used as working memory during execution. Because a standard embedded SRAM suffers from a high leakage current in order to retain its content, its energy consumption can be calculated and compared with that of our system. Even though STT-MRAM consumes 25× more energy than SRAM when reading and 100× more when writing, the influence of these access energies on the benchmark results remains negligible. Figure 9 reports the energy in joules (J) for the 4 usage cases of Figure 8. Because NVMs such as STT-MRAM have zero leakage current when not in use, they are an extremely good candidate for low-power SoCs. Our benchmark showed energy savings ranging from 50% up to 85% for applications where data arrives sparsely and must be memorized for later processing.
Fig.9: Benchmark of the processor power consumption in the 4 usage cases of Figure 8 for the STT-MRAM based non-volatile processor and the SRAM based CMOS-only processor. Energy savings ranging from 50% up to 85% are found for the STT-MRAM based non-volatile processor.
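The trade-off between higher access energy and zero standby leakage can be made concrete with a back-of-the-envelope model. Only the 25× and 100× ratios come from the text; the SRAM access energy, the SRAM leakage power, the access counts and the duration are hypothetical placeholders, and the model covers the memory alone, so it does not reproduce the measured 50% to 85% figures, which include the CPU and logic.

```python
def memory_energy(n_reads, n_writes, e_read, e_write, leak_power, powered_time):
    """Energy in joules: dynamic access energy plus leakage while powered."""
    return n_reads * e_read + n_writes * e_write + leak_power * powered_time


# Hypothetical placeholder values, not measurements from the project.
SRAM_E_ACCESS = 1e-12                  # J per SRAM read or write
MRAM_E_READ = 25 * SRAM_E_ACCESS       # 25x ratio taken from the text
MRAM_E_WRITE = 100 * SRAM_E_ACCESS     # 100x ratio taken from the text
SRAM_LEAK_POWER = 1e-3                 # W, needed to retain data

# A sparse workload: few accesses spread over a long period.
n_reads, n_writes = 1_000_000, 100_000
total_time = 10.0                      # s, SRAM must stay powered throughout

sram_energy = memory_energy(n_reads, n_writes, SRAM_E_ACCESS, SRAM_E_ACCESS,
                            SRAM_LEAK_POWER, total_time)
# Assumption: the MRAM is power-gated whenever idle and its leakage is neglected.
mram_energy = memory_energy(n_reads, n_writes, MRAM_E_READ, MRAM_E_WRITE,
                            0.0, 0.0)
saving = 1.0 - mram_energy / sram_energy

# Retention time beyond which SRAM leakage outweighs the extra MRAM access energy.
sram_dynamic = memory_energy(n_reads, n_writes, SRAM_E_ACCESS, SRAM_E_ACCESS, 0.0, 0.0)
break_even_time = (mram_energy - sram_dynamic) / SRAM_LEAK_POWER
```

With these placeholder numbers the leakage term dominates the SRAM total, which is the mechanism the text describes: the sparser the accesses and the longer the data must be retained, the less the higher MRAM access energy matters.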
Publications associated with WP4 and WP5:
Radiation-hardened MRAM-based FPGA
Gonçalves, O., G. Prenat and B. Dieny
IEEE Transactions on Magnetics 49 (2013) 4355
Non-volatile FPGAs based on spintronic devices
Gonçalves, O., G. Prenat, G. di Pendina and B. Dieny
Proceedings of the 50th Annual Design Automation Conference (2013) 126
Non-volatile runtime-reconfigurable FPGA secured through MRAM-based periodic refresh
Gonçalves, O., G. Prenat, G. di Pendina, C. Layer and B. Dieny
Proceedings of the 5th IEEE International Memory Workshop (2013) 170
Field-current phase diagrams of in-plane STTRAM cells with low effective magnetization storage layers
San Emeterio Alvarez, L., B. Lacoste, B. Rodmacq, L.E. Nistor, M. Pakala, R.C. Sousa and B. Dieny
Journal of Applied Physics 115 (2014) 17D502
A novel architecture of non-volatile magnetic arithmetic logic unit using magnetic tunnel junctions
Guo, W., G. Prenat and B. Dieny
Journal of Physics D: Applied Physics 47 (2014) 165001