# Toshiba Fields Embedded MIPS Processor

#### Processor and Core Designs Placed Between R3000, R4000 Families

#### by James L. Turley

Seeking to make a name for itself in the increasingly popular embedded RISC market, Toshiba is bringing out its own rendition of the MIPS architecture. Offered as either an ASIC core or a packaged processor, the new design is targeted at high-performance embedded applications and approaches R4000-level performance in a small low-power package. Competing head-on with existing ASIC cores from LSI Logic and embedded MIPS chips from IDT and NEC, the new Toshiba design will have a tough time finding a niche where it can flourish.

The new part, dubbed the R3900 to indicate its proximity to the R4000 sector of MIPS space, is based on the MIPS-I instructions and 32-bit registers of the R3000A but adds performance enhancements. Notable among these are register scoreboarding and a fast multiplier that performs a $32 \times 32$-bit multiply-accumulate in just two clock cycles.

## Enhancements Approach R4000 Speed

Toshiba's marketing department scores points for honesty by not giving the new design an R4000-series part number. Firmly based on an R3000A core, the chip nevertheless delivers nearly as many Dhrystone MIPS as the original R4000. Its 32-bit registers and lack of conditional traps or stores prevent it from truly laying claim to the R4000 name.

The R3900 instruction set is a hybrid lying somewhere between the MIPS-I and MIPS-II definitions. It implements all standard R3000A instructions except coprocessor load/store operations, plus it has many of the enhancements first seen on the R4000 and a few of Toshiba's own additions.

Static branch prediction is supported with the branch-likely instructions that are part of the MIPS-II architecture. Register scoreboarding, which is unique among devices in this class, enables nonblocking loads: the pipeline does not stall after a load unless a subsequent instruction actually depends on the loaded data.
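To illustrate what scoreboarding buys, here is a minimal sketch of an in-order pipeline in which a load marks its destination register busy instead of blocking. An instruction stalls only if it reads (or overwrites) a register whose load is still outstanding. The load latency, the one-cycle ALU forwarding, and the instruction tuples are illustrative assumptions, not R3900 timing.

```python
# Hypothetical pipeline model: latencies are assumptions, not R3900 data.
LOAD_LATENCY = 3  # assumed cycles until a load's data is available

def run(program, load_latency=LOAD_LATENCY):
    """Issue instructions in order, one per cycle, stalling only on scoreboard hits.

    Each instruction is (op, dest, sources). Returns (total_cycles, stall_cycles).
    """
    ready_at = {}  # the scoreboard: register -> cycle at which its value is ready
    cycle = 0
    stalls = 0
    for op, dest, sources in program:
        # Wait for every source, and for the destination too (avoids a WAW hazard).
        needed = [ready_at.get(r, 0) for r in (*sources, dest)]
        start = max([cycle, *needed])
        stalls += start - cycle
        cycle = start
        if op == "load":
            ready_at[dest] = cycle + load_latency  # mark busy, but keep issuing
        else:
            ready_at[dest] = cycle + 1  # ALU result forwarded to the next cycle
        cycle += 1
    return cycle, stalls

# Independent work after the load hides its latency; an immediate use does not.
independent = [("load", "r2", ("r1",)), ("add", "r4", ("r5", "r6")),
               ("add", "r7", ("r5", "r6")), ("add", "r3", ("r2", "r4"))]
dependent = [("load", "r2", ("r1",)), ("add", "r3", ("r2", "r4"))]
```

In this model `run(independent)` finishes without a stall, while `run(dependent)` waits two cycles for `r2`, which is the case a blocking-load design would hit on every load.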

Toshiba engineers spent most of their effort enhancing the R3900's integer multiplier. In place of the standard (and sluggish) MIPS multiply unit, the R3900 has a fast, two-cycle integer multiplier that the new multiply-accumulate and three-operand multiply instructions use to good advantage.
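A behavioral sketch makes the multiply-accumulate concrete. It assumes MIPS-style semantics in which the signed 64-bit product of two 32-bit registers is added into the HI:LO accumulator pair and, for the assumed three-operand form, the low word is also returned for a destination register. The class and function names are hypothetical, and the two-cycle timing is not modeled. The dot product at the end is the kind of inner loop a filter in a software modem would run.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

class MulUnit:
    """Hypothetical HI:LO multiply-accumulate unit (behavior only, no timing)."""

    def __init__(self):
        self.hi = 0
        self.lo = 0

    def acc(self):
        return (self.hi << 32) | self.lo

    def madd(self, rs, rt):
        """HI:LO <- HI:LO + rs * rt (signed 32x32 -> 64 bits, wrapping modulo 2**64).

        Returns LO, as an assumed three-operand form would write it to rd."""
        total = (self.acc() + to_signed32(rs) * to_signed32(rt)) & MASK64
        self.hi, self.lo = total >> 32, total & MASK32
        return self.lo

    def mult(self, rs, rt):
        """Three-operand multiply: clear the accumulator, then add one product."""
        self.hi = self.lo = 0
        return self.madd(rs, rt)

def dot(xs, ys):
    """Signed dot product built from repeated multiply-accumulates."""
    unit = MulUnit()
    for x, y in zip(xs, ys):
        unit.madd(x & MASK32, y & MASK32)  # registers hold 32-bit patterns
    acc = unit.acc()
    return acc - (1 << 64) if acc >> 63 else acc
```

Keeping the running sum in the 64-bit accumulator, rather than writing it back to a 32-bit register after every product, is what lets a filter tap cost one instruction.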

The optional 4K instruction cache is direct mapped and is not lockable. Real-time programmers wishing to force certain time-critical routines into (or out of) the cache must locate them carefully in the address space and rely on the direct-mapping algorithm to do its stuff.
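Because the instruction cache is direct mapped, whether two routines evict each other depends only on their addresses. The sketch below computes which cache lines a block of code occupies and whether two blocks collide. The 4K size comes from the text; the 16-byte line size is an assumption for illustration, and the helper names are hypothetical.

```python
ICACHE_SIZE = 4 * 1024  # 4K instruction cache, from the text
LINE_SIZE = 16          # assumed line size in bytes (illustrative only)
NUM_LINES = ICACHE_SIZE // LINE_SIZE

def cache_lines(start, length):
    """Set of direct-mapped line indices touched by code in [start, start + length)."""
    first = start // LINE_SIZE
    last = (start + length - 1) // LINE_SIZE
    return {line % NUM_LINES for line in range(first, last + 1)}

def conflicts(a, b):
    """True if two (start, length) code blocks map to any common cache line."""
    return bool(cache_lines(*a) & cache_lines(*b))

isr = (0x1000, 256)      # hypothetical time-critical routine
helper = (0x2000, 256)   # exactly 4K away: same lines, so they thrash
moved = (0x2100, 256)    # shifted by 256 bytes: no overlap
```

Any two addresses that differ by a multiple of 4K land on the same line, so keeping a critical routine resident amounts to making sure nothing else that runs alongside it sits at those offsets.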

The 1K data cache, on the other hand, is two-way set-associative and supports locking on a per-line basis. This configuration is useful for storing small lookup tables in local storage. The cache is write-through, the logical choice for most embedded systems.
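The following sketch shows why per-line locking keeps a small lookup table resident in a two-way cache. The 1K capacity and two-way organization come from the text; the 16-byte line size, LRU replacement among unlocked ways, and the class and method names are assumptions. Only tags are tracked, and stores are omitted (under write-through they would update memory in every case).

```python
CACHE_SIZE = 1024  # 1K data cache, from the text
WAYS = 2           # two-way set-associative, from the text
LINE_SIZE = 16     # assumed line size in bytes
NUM_SETS = CACHE_SIZE // (WAYS * LINE_SIZE)

class LockableCache:
    """Hypothetical tag-only model with LRU replacement among unlocked ways."""

    def __init__(self):
        # Each set holds up to WAYS entries [tag, locked], ordered LRU -> MRU.
        self.sets = [[] for _ in range(NUM_SETS)]

    def _split(self, addr):
        line = addr // LINE_SIZE
        return line % NUM_SETS, line // NUM_SETS  # (set index, tag)

    def read(self, addr, lock=False):
        """Look up addr and fill on a miss. Returns True on a hit.

        lock=True pins the line so it is never chosen as a victim."""
        index, tag = self._split(addr)
        ways = self.sets[index]
        for entry in ways:
            if entry[0] == tag:
                entry[1] = entry[1] or lock
                ways.remove(entry)
                ways.append(entry)  # move to the MRU position
                return True
        if len(ways) == WAYS:
            victims = [e for e in ways if not e[1]]
            if not victims:
                return False  # both ways locked: the access bypasses the cache
            ways.remove(victims[0])  # evict the least recently used unlocked way
        ways.append([tag, lock])
        return False
```

With 32 sets of 16-byte lines, addresses 512 bytes apart share a set. After `read(0x000, lock=True)`, traffic to 0x200 and 0x400 fights over the one remaining way, while the locked table line keeps hitting.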

## Small Design Runs at Low Voltage

What a difference 1.5 volts makes! Although the R3900 is rated for operation as low as 1.8 V, clock speed peaks at only 15 MHz at that voltage. The frequency limit ramps linearly between 1.8 V and the 3.3-V maximum, where the R3900 hits its 50-MHz limit. Regardless of voltage, the minimum frequency bottoms out at 5 MHz. The clock may be stopped to save power, but the range between DC and 5 MHz is off limits.
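The linear frequency-voltage relationship reduces to a small helper. It interpolates between the two endpoints given in the text (15 MHz at 1.8 V, 50 MHz at 3.3 V), applies the 5-MHz floor, and treats a stopped clock as the only legal setting below that floor. The function names are hypothetical, and a real design would follow the data sheet's derating curve rather than this straight line.

```python
V_MIN, V_MAX = 1.8, 3.3            # supply range in volts, from the text
F_AT_VMIN, F_AT_VMAX = 15.0, 50.0  # MHz ceilings at those voltages, from the text
F_FLOOR = 5.0                      # minimum running frequency in MHz
EPS = 1e-9                         # tolerance for floating-point endpoints

def max_freq_mhz(volts):
    """Maximum clock for a given supply voltage, by linear interpolation."""
    if not V_MIN - EPS <= volts <= V_MAX + EPS:
        raise ValueError("supply voltage outside the 1.8-3.3 V range")
    slope = (F_AT_VMAX - F_AT_VMIN) / (V_MAX - V_MIN)  # about 23.3 MHz per volt
    return F_AT_VMIN + slope * (volts - V_MIN)

def clock_ok(volts, mhz):
    """Legal if the clock is stopped, or between the floor and the voltage ceiling."""
    return mhz == 0 or F_FLOOR <= mhz <= max_freq_mhz(volts) + EPS
```

At 2.5 V, for example, the ceiling works out to about 31 MHz, so a 40-MHz clock would require raising the supply.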

The core is a diminutive 15 mm² with no cache. Adding 5K of cache nearly doubles the die size, to 26 mm². Not coincidentally, the part is fabricated in the 0.6-micron two-layer-metal process used in Toshiba's TC200 ASIC line. Although the die size is certainly not excessive, Figure 1 shows that it is considerably larger than LSI's comparable CW4001 design.

## R3901 Packages Core, MMU, Debug Unit

Toshiba has encased the new core within the R3901, a complete embedded CPU chip. The R3901, also called Southern Cross, takes the basic R3900 core with the caches and adds a write buffer, a simple MMU, a real-time trace/debug module, and a bus interface.

Figure 1. Comparing the R3900 with other embedded processor cores shows that Toshiba has a lead over other architectures but hasn't caught up with LSI Logic's CW4010.

#### MICROPROCESSOR REPORT

The external interface is exceptionally easy to use, particularly for a MIPS processor. The 32-bit address and data buses are demultiplexed, with straightforward control signals and timing referenced to a single processor/bus clock. Four byte enables, cache-snoop control, two-wire arbitration, and a byte-order select pin make the R3901 an easy tiger to tame. For cache fills, the burst length is programmable from 16 to 128 bytes. Because the CPU stalls until a burst completes, the burst length trades memory bandwidth against instruction latency: longer bursts move more data per bus transaction but keep the CPU waiting longer.
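With four byte enables and a byte-order select pin, the bus interface must decide which byte lanes a partial-word access drives. This sketch follows the usual MIPS convention (little-endian puts byte offset 0 on data bits 7:0; big-endian puts it on bits 31:24) and returns a 4-bit mask in which bit i enables the lane carrying data bits 8i+7:8i. The function name and mask encoding are illustrative assumptions, not the R3901's documented pin assignment.

```python
def byte_enables(addr, size, big_endian):
    """Hypothetical lane mask for a naturally aligned access of 1, 2, or 4 bytes.

    Bit i of the result enables the lane carrying data bits [8*i+7 : 8*i].
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2, or 4 bytes")
    offset = addr & 3
    if offset % size:
        # Ordinary MIPS loads/stores must be aligned; LWL/LWR etc. are not modeled.
        raise ValueError("unaligned access")
    mask = 0
    for b in range(offset, offset + size):
        lane = 3 - b if big_endian else b  # big-endian puts byte 0 in the top lane
        mask |= 1 << lane
    return mask
```

A halfword store to offset 2, for instance, enables the upper two lanes in little-endian mode and the lower two in big-endian mode; the byte-order pin simply flips the mapping.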

The chip has several software-selectable clocking options for power savings. The internal CPU pipeline and external bus have separately selectable speeds. The CPU clock can be divided by 1, 2, 4, or 8; the bus can run at the same rate or at half of the CPU's speed.
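The clocking options combine into a small table of operating points. This sketch enumerates them for a 50-MHz part, assuming the bus option means "equal to" or "half of" the already-divided CPU clock; the text does not say where the half-speed bus is derived, so that ordering, like the function name, is an assumption.

```python
BASE_MHZ = 50.0              # full-speed CPU clock for a 50-MHz part
CPU_DIVISORS = (1, 2, 4, 8)  # software-selectable CPU divisors, from the text
BUS_RATIOS = (1, 2)          # bus at the CPU rate or at half of it

def operating_points(base=BASE_MHZ):
    """Hypothetical (cpu_mhz, bus_mhz) pairs; assumes the bus divides the divided CPU clock."""
    return [(base / d, base / d / r) for d in CPU_DIVISORS for r in BUS_RATIOS]

for cpu, bus in operating_points():
    print(f"CPU {cpu:5.2f} MHz, bus {bus:5.3f} MHz")
```

Under this reading there are eight settings, from a 50-MHz CPU with a 50-MHz bus down to a 6.25-MHz CPU with a 3.125-MHz bus.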

Various power-saving measures keep the current consumption of the R3901 reasonable. With everything running at full speed, a 50-MHz R3901 typically consumes 165 mA at 3.3 V, or about 550 mW. The CPU frequency can be scaled, as described above; at the slowest rate, total power drops by more than half. Progressively shutting off the processor, the caches and snoop logic, and the external interface drops power to 210 mW, 200 mW, and 0.1 mW, respectively.

These numbers place the R3901 near the bottom of the energy-consumption scale, even for a low-voltage device. A large portion of that power is devoted to the caches, MMU, debug module, PLL, and interrupt logic. Unlike the CPU core, these sections can't be slowed down; they consume either 200 mW or virtually nothing.
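The power figures can be cross-checked with a simple model. Assume, for illustration only, that total power is a fixed part that cannot be slowed (the roughly 200 mW attributed above to the caches, MMU, debug module, PLL, and interrupt logic) plus a part proportional to the CPU clock; the slowest setting divides a 50-MHz clock by 8. This linear model is an assumption, not Toshiba's characterization.

$$
\begin{aligned}
P_{\text{full}} &= V I = 3.3\ \text{V} \times 165\ \text{mA} \approx 545\ \text{mW} \\
P(f) &\approx P_{\text{fixed}} + k f, \qquad P_{\text{fixed}} \approx 200\ \text{mW} \\
k &\approx \frac{(545 - 200)\ \text{mW}}{50\ \text{MHz}} \approx 6.9\ \text{mW/MHz} \\
P(6.25\ \text{MHz}) &\approx 200\ \text{mW} + 6.9\ \text{mW/MHz} \times 6.25\ \text{MHz} \approx 243\ \text{mW}
\end{aligned}
$$

That is about 45 percent of the full-speed figure, consistent with the statement that total power drops by more than half at the slowest rate.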

## Toshiba Up Against LSI and IDT

Toshiba is aiming directly at LSI's CoreWare program with the R3900. Both are nominal MIPS-I cores with extensions; both have a multiply-add instruction; both support static branch prediction. But Toshiba's faster multiplier and nonblocking loads are matched against LSI's multiply-subtract and conditional-trap instructions.

The differences are most evident in the two cores' physical makeup. The CW4001 is a tiny 3.5 mm² without a cache, about one-fourth the size of the R3900. Yet both claim similar performance of 45 and 52 Dhrystone MIPS (with caches) at similar clock rates (60 vs. 50 MHz) with similar processes (0.5 vs. 0.6 micron). Where LSI cut corners, Toshiba added on. The CW4001's three-stage pipeline and unified instruction and data bus sacrifice performance for die size vs. the five-stage dual-bus R3900.
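Normalizing these claims per clock shows where the extra area goes. This assumes the Dhrystone figures are listed in the same order as the clock rates (CW4001 first) and makes no correction for the 0.5- vs. 0.6-micron processes.

$$
\frac{45\ \text{DMIPS}}{60\ \text{MHz}} = 0.75\ \text{DMIPS/MHz}
\quad\text{vs.}\quad
\frac{52\ \text{DMIPS}}{50\ \text{MHz}} = 1.04\ \text{DMIPS/MHz},
\qquad
\frac{15\ \text{mm}^2}{3.5\ \text{mm}^2} \approx 4.3
$$

On that reading, the R3900 does roughly 40 percent more work per clock for roughly four times the cacheless core area.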

Toshiba's features can be particularly alluring to PDA designers, who might use the fast multiply-accumulate function to implement a software-only modem, for example. It wouldn't be too surprising if the ASIC agreement between Toshiba and General Magic (see 090204.PDF) turns out to have an R3900 in the middle of it.

# Price & Availability

Toshiba produced first silicon of the R3900 core and the R3901 processor in January. Limited sampling is expected to begin in April, with general sampling in July. The R3901 in a 160-lead PQFP costs \$30 in 1,000-piece quantities. For more information, contact Toshiba America Electronic Components (San Jose, Calif.) at 800.879.4963; fax 408.456.9002.

For packaged devices, the R3901 is comparable to IDT's R3051 (see MPR 10/3/90, p. 6). Although the two firms cooperate on R4600-series development, the R3900 is Toshiba's alone. IDT has a market advantage in its broad line of pin-compatible MIPS chips. At this time, the R3901 is a point product; the next step up is a big one, to Toshiba's 15-watt R4000PC chip.

The MIPS camp suffers from not having an obvious corporate cheerleader, unlike PowerPC or SPARC. So far, LSI has staked out the MIPS-in-an-ASIC business for itself, and it has the broad cell library to make it attractive. IDT has the easy-entry market, starting with \$15 chips and going up into the hundreds. Sony's surprise move to use LSI (*see 080902.PDF*) proves that Toshiba has no leg up with Japanese consumer electronics makers. The R3900's distinction is not in being the cheapest, or the first, or the smallest, but in offering just the right features. This should put Toshiba into the thick of a burgeoning ASIC core market. ◆



Figure 2. The R3901 Southern Cross MIPS processor measures $8.48 \times 8.48$ mm in a 0.65-micron two-layer-metal process.