# IDT's R3081 Adds FPU, Larger Cache to R3052

Designed for ARC Systems, High-End Embedded Applications

#### **By Michael Slater**

IDT, which last year lowered the entry-level price point for MIPS-based embedded controllers with its R3051 and R3052 "RISControllers," is introducing a high-end version that adds an on-chip, R3010-compatible FPU, has larger caches, and offers several refinements to ease system design. The new chip, the R3081, is aimed at two separate markets: high-performance embedded applications, and low-end ACE/ARC systems. Figure 1 shows the block diagram for an R3081-based ARC system.

Figure 1. Block diagram of an R3081-based ARC computer system.

The 3081 doubles the 3052's caches, providing a 16K instruction cache and a 4K data cache. The caches can also be configured as 8K each for instructions and data, so applications can select the cache partitioning that is most effective. The cache configuration can even be changed dynamically, allowing the application program to change the partitioning for selected algorithms, though it is hard to imagine many designers taking advantage of this (and the caches must, of course, be flushed when the configuration is changed).

Like the 3051 and 3052, the 3081 includes 4-word-deep read and write buffers, provides a multiplexed bus interface, is packaged in an 84-pin PGA or PLCC, and will be offered at clock rates of 20, 25, 33, and 40 MHz. Maximum operating current ranges from 650 mA at 20 MHz to 900 mA at 40 MHz. While most of the 3051 and 3052 applications run at 20 to 33 MHz to keep cost to a minimum, the 3081 is aimed at high-end applications and the 40-MHz part is expected to be the most popular; a future 50-MHz version is planned.

Also like the 3051/52, the 3081 will be offered in versions with or without the on-chip MMU; the 3081E is the version with the MMU. Both versions use the same silicon, with only a bonding option and the testing program distinguishing them. Many embedded applications don't need the MMU, and the MMU-less version provides a fixed address translation that eliminates the need to set up the MMU.

## **Design Enhancements**

The 3081 also incorporates a number of small design improvements. The data-cache refill size can now be dynamically set to either 1 or 4 words; in the 3051, the refill size is selected at reset and cannot be changed dynamically. Parity was also added to the caches. The bus interface was modified to allow an external bus master to invalidate selected cache lines, but full snooping logic is not provided; the external device must specify the cache line to invalidate. This is intended primarily for applications using external DMA controllers, which can either maintain a second set of tags for snooping or can blindly invalidate cache lines based only on the cache index.
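To make the "blind" invalidation option concrete, here is a minimal Python sketch of how an external DMA controller might work out which data-cache lines to invalidate from the address alone. It assumes a direct-mapped data cache indexed by the low-order address bits, with the one-word line size listed in Table 1; the function name and its parameters are illustrative and are not taken from IDT's documentation.

```python
WORD_BYTES = 4  # 32-bit MIPS word


def dcache_lines_to_invalidate(start_addr, length,
                               cache_bytes=4 * 1024, line_words=1):
    """Return the sorted cache indices that a DMA write of `length` bytes
    starting at `start_addr` may have made stale.

    Assumption: the cache is direct-mapped and indexed by low-order
    address bits. No tag comparison is made, so lines that currently
    hold unrelated addresses are invalidated as well.
    """
    if length <= 0:
        return []
    line_bytes = line_words * WORD_BYTES
    num_lines = cache_bytes // line_bytes
    first = start_addr // line_bytes
    last = (start_addr + length - 1) // line_bytes
    # A transfer at least as large as the cache touches every index.
    if last - first + 1 >= num_lines:
        return list(range(num_lines))
    return sorted({line % num_lines for line in range(first, last + 1)})
```

Under these assumptions, a 16-byte DMA write at address 0x1000 maps to indices 0 through 3 of the 4K data cache. Because no tags are checked, any unrelated data cached at those indices is discarded too; avoiding that is the reason a controller might keep a second set of tags.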

|                             | IDT R3081                                | Perf. Semi. PIPER     | IDT R3051/52           | LSI LR33000                                  | LSI LR33020                                   |
|-----------------------------|------------------------------------------|-----------------------|------------------------|----------------------------------------------|-----------------------------------------------|
| On-Chip FPU                 | Yes                                      | Yes                   | No                     | No                                           | No                                            |
| Special Features            | Half-speed bus option,<br>Low-power mode | Real-time tracing bus | Low Price Versions     | Interface logic<br>(see below)               | Graphics coprocessor, video & I/O controllers |
| Instruction Cache           | 16K or 8K                                | 8K or 4K              | 8K (3052)<br>4K (3051) | 8K                                           | 4K                                            |
| Data Cache                  | 4K or 8K                                 | 2K or 4K              | 2K                     | 1K                                           | 1K                                            |
| Snooping                    | With external HW                         | No                    | No                     | Yes                                          | Yes                                           |
| I-Cache Refill Size (words) | 4                                        | 4                     | 4                      | 1, 2, 4, 8, or 16                            | 1, 2, 4, 8, or 16                             |
| I-Cache Line Size (words)   | 4                                        | 4                     | 4                      | 4                                            | 4                                             |
| D-Cache Refill Size (words) | 1 or 4                                   | 1 or 4                | 1 or 4                 | 1, 2, 4, 8, or 16                            | 1, 2, 4, 8, or 16                             |
| D-Cache Line Size (words)   | 1                                        | 1                     | 1                      | 4                                            | 4                                             |
| TLB                         | Yes (E version only)                     | Yes                   | Yes (E versions only)  | No                                           | No                                            |
| Static Design               | No                                       | No                    | No                     | Yes                                          | Yes                                           |
| Write Buffer                | 4-Level                                  | 4-Level               | 4-Level                | 1-Level                                      | 4-Level                                       |
| DRAM Control                | No                                       | No                    | No                     | Yes                                          | Yes                                           |
| 8-Bit ROM Support           | No                                       | Yes                   | No                     | Yes                                          | Yes                                           |
| Programmable Wait States    | No                                       | No                    | No                     | Yes                                          | Yes                                           |
| Breakpoint Registers        | No                                       | No                    | No                     | Yes                                          | Yes                                           |
| Dedicated Trace Pins        | No                                       | Yes                   | No                     | No                                           | No                                            |
| Timers                      | None                                     | 2 General-Purpose     | None                   | 2 General-Purpose<br>1 Refresh               | 2 General-Purpose<br>1 Refresh                |
| Address/Data Buses          | Multiplexed                              | Multiplexed           | Multiplexed            | Non-Multiplexed                              | Non-Multiplexed                               |
| Max. Current (40 MHz)       | 900 mA                                   | 1.3 A                 | 600 mA                 | 600 mA                                       |                                               |
| Packages                    | 84-pin PLCC or PGA                       | 160-pin QFP           | 84-pin PLCC or PGA     | 155-pin PGA<br>160-pin PQFP<br>(25 MHz only) | 155-pin PGA<br>160-pin PQFP                   |
| Clock Speeds (MHz)          | 20, 25, 33, 40                           | 33, 40                | 20, 25, 33, 40         | 25, 33, 40                                   | 25                                            |
| Clock Input                 | 1× or 2×                                 | 1×                    | 2×                     | 1×                                           | 1×                                            |
| Price (20/25 MHz, 1000s)    | \$130                                    | n/a                   | \$45 (R3051)           | \$99                                         | \$129                                         |
| Price (40 MHz, 1000s)       | \$260                                    | \$192                 | \$150 (R3052)          | \$161                                        | n/a                                           |

Table 1. Key characteristics of R3000-based integrated processors.

Several changes have been made to simplify high-speed system operation. The clock input can be programmed as a $2\times$ input for compatibility with the 3051/52, or as a $1\times$ input to halve the frequency of the external oscillator. Separately, the system bus can be programmed to operate at one-half of the processor's clock rate. By selecting the $1\times$ clock and half-speed bus options, a 40-MHz 3081 can be plugged into an existing, 20-MHz 3051 design. Then, like Intel's forthcoming clock-doubler 486 chips, the 3081 operates at twice the speed of the processor it replaces as long as its memory access requirements are satisfied by the on-chip cache. Using the standard $2\times$ clock input option, the 3081 can be used to upgrade any 3051/52 design, increasing performance with the larger caches and on-chip FPU.
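The arithmetic behind the drop-in upgrade can be sketched in a few lines of Python. The function and parameter names are hypothetical, and the sketch models only the frequency relationships described above, not the chip's actual programming interface. It also relies on an inference from the text: a 20-MHz 3051 board, which uses a $2\times$ clock input, supplies a 40-MHz oscillator signal.

```python
def derived_clocks(input_mhz, clock_mode="2x", half_speed_bus=False):
    """Return (cpu_mhz, bus_mhz) for a given clock input and option set.

    clock_mode "2x": the input runs at twice the CPU rate (3051/52 style).
    clock_mode "1x": the input runs at the CPU rate.
    half_speed_bus:  the system bus runs at half the CPU rate.
    """
    if clock_mode == "2x":
        cpu_mhz = input_mhz / 2
    elif clock_mode == "1x":
        cpu_mhz = float(input_mhz)
    else:
        raise ValueError("clock_mode must be '1x' or '2x'")
    bus_mhz = cpu_mhz / 2 if half_speed_bus else cpu_mhz
    return cpu_mhz, bus_mhz


# Same 40-MHz oscillator on the board in both cases.
print(derived_clocks(40, "2x"))                       # 3051-compatible setting
print(derived_clocks(40, "1x", half_speed_bus=True))  # 40-MHz 3081 drop-in
```

With the same 40-MHz input, the compatible setting yields a 20-MHz CPU and bus, while the $1\times$ and half-speed options yield a 40-MHz CPU on an unchanged 20-MHz bus.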

To enable the clock output to drive more loads without a skew-inducing buffer, the SysClk output drive has been increased. To avoid bus contention problems at high clock rates, the bus protocol has also been modified to allow an idle cycle to be optionally inserted when a write cycle immediately follows a read cycle.

In systems using a standard R3000 CPU and R3010 FPU, the interrupt output from the R3010 is connected to one of the R3000's interrupt inputs to signal a floating-point exception. The 3081 allows the on-chip FPU's interrupt output to be internally connected to any of the CPU's six interrupt inputs.

The 3081, like all other devices based on the R3000, is dynamic and does not allow the clock to be stopped or arbitrarily slowed. The 3081 does include a software-selectable power-reduction mode, however, in which the processor clock is divided by 16. Enabling this mode cuts the clock rate of a 40-MHz system to 2.5 MHz and slashes the maximum operating current from 900 mA to 250 mA. The 3081 also includes a HALT instruction that stops the processor until an interrupt or reset occurs, but this is not a low-power stop mode; it is provided simply as a software convenience.
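As a quick check of the figures quoted for the power-reduction mode, the following Python sketch computes the reduced clock rate and the fractional drop in maximum current. The function names are illustrative; the divisor and the current figures are the ones given in the text.

```python
REDUCED_MODE_DIVISOR = 16  # from the text: processor clock divided by 16


def reduced_mode_clock_mhz(normal_mhz):
    """Clock rate while the power-reduction mode is enabled."""
    return normal_mhz / REDUCED_MODE_DIVISOR


def current_saving(normal_ma, reduced_ma):
    """Fraction of the maximum current saved by the reduced mode."""
    return 1.0 - reduced_ma / normal_ma


clock = reduced_mode_clock_mhz(40)   # 2.5 MHz
saving = current_saving(900, 250)    # roughly 0.72
```

The clock falls by a factor of 16 while the maximum current falls by a factor of only 3.6, a saving of about 72%.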

The configuration options in the 3081 are controlled by a new configuration register, mapped to a reserved location in coprocessor zero (CP0, which is really part of the CPU and handles exceptions and memory management). The register defaults to 3051-compatibility mode, and it includes a "lock" bit that can prevent software from accidentally modifying it.
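The behavior of the lock bit can be illustrated with a small Python model. This is a sketch only: the bit positions, field names, reset value of zero, and the rule that only a reset clears the lock are all assumptions made for illustration, since the article does not give the register layout.

```python
class ConfigRegister:
    """Toy model of a lockable configuration register.

    All bit assignments are hypothetical placeholders, not IDT's layout.
    """
    LOCK = 1 << 31            # hypothetical: blocks further writes
    ALT_CACHE = 1 << 0        # hypothetical: select the 8K/8K partitioning
    REDUCED_CLOCK = 1 << 1    # hypothetical: enable divide-by-16 mode
    DCACHE_REFILL_4 = 1 << 2  # hypothetical: 4-word data-cache refill

    RESET_VALUE = 0           # assumed to represent 3051-compatibility mode

    def __init__(self):
        self._value = self.RESET_VALUE

    def read(self):
        return self._value

    def write(self, value):
        """Store `value` unless the lock bit is set; report success."""
        if self._value & self.LOCK:
            return False      # write ignored while locked (assumed behavior)
        self._value = value
        return True

    def reset(self):
        """Assumed: a reset restores the default and clears the lock."""
        self._value = self.RESET_VALUE
```

In this model, startup code would write its chosen options together with the lock bit in a single store; any later stray write then leaves the configuration untouched.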

## **Comparing MIPS Processors**

**Price & Availability**

First silicon of the R3081 is expected in March, with limited sampling in April and general sampling in May. Production-qualified parts are expected by mid-year. Pricing for the PLCC version without the MMU is \$130 at 20 MHz, \$190 at 33 MHz, and \$260 at 40 MHz, all in quantities of 1000. In quantities of 10,000, the 20-MHz part is projected to sell for under \$100, while the 40-MHz version will sell for under \$200. The price premium for the R3081E, which includes the MMU, is about 5%.

Integrated Device Technology, 3236 Scott Blvd., Santa Clara, CA 95054-3090; 408/727-6116; fax 408/492-8674.

The 3081 is functionally similar to the R4000PC (the version without secondary cache support), in that it combines a MIPS CPU, FPU, on-chip caches, and a programmable (though much simpler) bus interface. At under 180K square mils in 0.8-micron technology, the 3081 is less than half the size of the R4000 in the same process. It also uses a much cheaper 84-pin package instead of the R4000PC's 179-pin PGA. Surprisingly, the 3081 has more on-chip cache memory than the R4000, and IDT estimates that a 40-MHz 3081 will be only 10–20% slower than a 50-MHz R4000. Note, however, that the clock rate and cache size of the R4000 will increase rapidly in the next two years.

For low-end ARC systems, the 3081 will therefore be an attractive alternative to the R4000. At under \$200 for the 40-MHz version in volume, the 3081 provides a lower-cost solution than either a discrete R3000 design or the R4000. It should be comparable in integer performance to Intel's top-of-the-line, 50-MHz 486, which costs \$610 in thousands, and its floating-point performance will be much higher, yet its price matches Intel's much slower, low-end 486SX-16. If the MIPS version of Windows NT takes off, systems based on chips such as the 3081 will have significant volume potential.

Table 1 summarizes the key features of embedded processors based on R3000 cores. The 3081's most direct competitor is Performance Semiconductor's PIPER (see $\mu$PR 9/18/91 p. 1), which is the only other device that includes an on-chip FPU. PIPER's caches are smaller, however, and it lacks refinements such as the half-speed bus option. Like the 3081, PIPER offers two cache configurations: 4K instruction/4K data or 8K instruction/2K data. PIPER's cache configuration is determined at reset, however, and cannot be changed dynamically. PIPER is in a much larger package (160 pins), due in part to its dedicated instruction tracing bus that allows instruction fetches from cache to be monitored externally. Unlike the 3081, PIPER includes two general-purpose timers and supports 8-bit program memory. At \$192 for the 40-MHz version in 1000s, PIPER is cheaper than the 3081, but negotiated volume prices are likely to be comparable.

The 3081 and PIPER are most significant because of their potential use in ARC systems; they won't dramatically expand the market for MIPS-based embedded controllers, since most embedded applications don't need floating-point. Neither PIPER nor the 3081 offers features such as programmable chip selects, automatic wait-state generation, DRAM control logic, DMA controllers, or serial ports that can significantly reduce chip count in many embedded applications, nor do they provide cache locking capability that can be useful in real-time applications. For most embedded applications, some assortment of these features would be more valuable than an on-chip FPU or a larger cache. On the other hand, different applications need different combinations of features; devices with additional features are more expensive (because of both silicon area and pin count), and tend to have narrower markets. IDT is following the strategy of providing the most cost-effective, general-purpose device possible.

## **Embedded MIPS Processors Established**

IDT plans to continue proliferating its MIPS-based processor line. A low-end version that is even less expensive than the \$30 3051 (10K units, 20 MHz) is expected to sample in the third quarter of this year, with production later in the year. IDT won't yet give details on this device, other than to say that it will be bus- and software-compatible with the 3051, but there are several obvious opportunities for cost reduction. The 3051 is made from the same silicon as the 3052, which has an instruction cache twice as large. By redesigning the chip with a smaller cache and eliminating the MMU, IDT can significantly reduce its costs.

In addition to the 30xx family devices, a low-cost derivative of the R4000 is now in the final stages of definition and is likely to appear in early '93. Another derivative that seems sensible but is not in IDT's current plans is an integer-only version of the 3081, which would provide the larger caches and other enhancements for applications that don't need floating-point hardware.

MIPS-based embedded processors are becoming a well-established market segment. Each vendor offers a different assortment of features, giving designers a range of choices while maintaining software compatibility. The differences among the IDT, LSI, and Performance Semiconductor parts mean that there is no true alternate sourcing among these vendors. For IDT's 30xx chips, however, Siemens will be an alternate source. Siemens and IDT have entered into a joint development agreement that will lead to an increasing proliferation of MIPS-based processors, all of which are expected to be available from both companies.

Since the emergence of embedded RISCs, the key battle has been between AMD's 29000 and Intel's 960. MIPS-based embedded processors now show promise of providing strong competition for those two processor families. Not only do the MIPS-based chips offer competitive price and performance, but users can enjoy the benefits of a multivendor market.  $\blacklozenge$