# SPARC Hits Low End With TI's microSPARC

High Level of Integration Lowers System Cost



By Brian Case

Texas Instruments and Sun Microsystems have formally unveiled the highly integrated microSPARC processor chip for use in low-cost SPARC workstations. Previously called Tsunami, microSPARC integrates an integer unit, a floating-point unit, separate instruction and data caches, an MMU, and glueless interfaces to DRAM and the SBus expansion bus.

Formally known as the TMS390S10, microSPARC will help Sun address new price/performance points at the low end of the SPARC workstation business. MicroSPARC dramatically reduces the number of chips required to build a processor subsystem at the SPARCstation 2 performance level by integrating the entire processor subsystem with the logic to interface the processor to DRAM and the SBus expansion bus.

A two-chip set from NCR that further improves the level of system integration is possibly as important as microSPARC itself. The 89C100 and 89C105 peripheral interface chips connect directly to the SBus and implement disk, network, and other essential peripheral interfaces. They do not, however, include a graphics display controller.

# Chip Integration

MicroSPARC is a highly integrated chip in the tradition of nearly all high-end processors: it has on-chip integer and floating-point execution units, separate instruction and data caches, and a TLB-based (SPARC reference) MMU, as shown in Figure 1. MicroSPARC goes beyond this standard level of integration to include a direct DRAM interface and glueless SBus interface. (SBus I/O addresses are translated by the main MMU.) This level of integration is similar to Intel's 386SL, which implements DRAM control, cache tags and control, and PC/AT bus control.

Figure 1. The microSPARC block diagram shows the separate small, direct-mapped caches and the direct interfaces to DRAM and the SBus.

The chip integrates about 800K transistors using a 0.8-micron (0.65-micron L-effective) CMOS process with two layers of metal. At 15 mm  $\times$  15 mm, the die size seems quite large for such a modest number of transistors; in contrast, SuperSPARC integrates over three times as many transistors in only slightly more area. The low density of microSPARC reflects the short design cycle, use of automatic tools, and low-cost two-layer metal fabrication process. At the Microprocessor Forum, Anant Agrawal of Sun stressed the benefits of the reusable microSPARC core. Still, microSPARC may be a relatively expensive chip to produce.

The chip will be offered only in a single-layer TAB (tape automated bonding) package with 288 leads (191 signal I/Os). TI says it was surprised at the large number of potential customers that were willing to use TAB packages. MicroSPARC is a 5-volt chip, and power dissipation is 3.25 W typical at 50 MHz (4.5 W worst case). TI says the TAB package gives excellent thermal dissipation characteristics.

These power requirements seem to be in conflict with plans to push microSPARC into low-power notebook and PDA applications, but since the chip is a fully static design, the clock frequency can be modulated to reduce power dissipation.
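As a rough illustration of the clock-modulation argument, the sketch below assumes that dissipation in a fully static CMOS design scales linearly with clock frequency at a fixed 5-volt supply. That linear model is an assumption, not a TI specification; only the 50-MHz figures come from the article, so the scaled values are estimates.

```python
# Rough estimate of microSPARC power at reduced clock rates.
# ASSUMPTION: power scales linearly with clock frequency (static leakage
# is ignored and the supply stays at 5 V). Only the 50-MHz figures below
# are taken from the article.

TYPICAL_WATTS_AT_50MHZ = 3.25
WORST_CASE_WATTS_AT_50MHZ = 4.5
BASE_CLOCK_MHZ = 50.0


def scaled_power(base_watts: float, clock_mhz: float) -> float:
    """Scale a 50-MHz power figure linearly to another clock rate."""
    return base_watts * clock_mhz / BASE_CLOCK_MHZ


for mhz in (50, 25, 10):
    typical = scaled_power(TYPICAL_WATTS_AT_50MHZ, mhz)
    worst = scaled_power(WORST_CASE_WATTS_AT_50MHZ, mhz)
    print(f"{mhz:>2} MHz: ~{typical:.2f} W typical, ~{worst:.2f} W worst case")
```

Under this linear model, halving the clock halves both power and peak throughput, so performance per watt stays roughly constant rather than improving.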

MicroSPARC's integer unit uses a standard five-stage pipeline that executes most integer instructions in one cycle. Stores of 32 bits or less and 64-bit loads take two cycles, while 64-bit stores take three cycles.

MicroSPARC implements the SPARC version-8 architecture, which includes integer multiply and divide; in this implementation, these instructions do not use the floating-point unit. Table 1 compares integer multiply and divide latencies for various SPARC processors.

MicroSPARC's integer register file implements seven register windows for a total of 120 registers; the 7C601, which is used in the SPARCstation 2, has eight register windows.
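The register count follows from the SPARC window scheme: eight global registers plus 16 new registers per window (eight locals and eight outs, with each window's outs doubling as the next window's ins). A minimal sketch of that arithmetic, with function and constant names chosen here for illustration:

```python
# Size of a SPARC windowed integer register file.
# The per-window layout (8 globals, 16 registers added per window) is the
# standard SPARC arrangement; the names below are illustrative only.

GLOBAL_REGISTERS = 8
REGISTERS_PER_WINDOW = 16  # 8 locals + 8 outs (shared as the next window's ins)


def register_file_size(windows: int) -> int:
    """Total physical integer registers for a given number of windows."""
    return GLOBAL_REGISTERS + REGISTERS_PER_WINDOW * windows


print(register_file_size(7))  # microSPARC: 120
print(register_file_size(8))  # eight-window part such as the 7C601: 136
```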

The floating-point unit uses a core licensed from Meiko, Ltd., a U.K. minisupercomputer company. It is a relatively simple design that achieves good, but not state-of-the-art, performance in a relatively small area. The integer unit uses 33 mm<sup>2</sup> of die area while the FPU takes a total of only 27.6 mm<sup>2</sup>. The FPU has a single functional unit, so only one floating-point operation can be executing at once. The FPU is not pipelined. Table 1 compares the floating-point latencies of various SPARC microprocessors for single- and double-precision operands.

#### MICROPROCESSOR REPORT

| Microprocessor                          | Int. mult | Int. div | FP add<br>SP/DP | FP mult<br>SP/DP | FP div<br>SP/DP | FP sq. rt.<br>SP/DP |
|-----------------------------------------|-----------|----------|-----------------|------------------|-----------------|---------------------|
| microSPARC                              | 19        | 39       | 5/5             | 5/9              | 20/35           | 37/65               |
| 7C601 + TMS390C602A<br>(SPARCstation-2) | n/a       | n/a      | 3/3             | 3/5              | 15/25           | 21/31               |
| SuperSPARC                              | 4         | 15       | 3/3             | 3/3              | 6/7             | 8/10                |
| hyperSPARC                              | 17        | 36       | 3/3             | 3/3              | 10/14           | 13/19               |

Table 1. MicroSPARC floating-point latencies (in cycles) are good but slower than SuperSPARC and hyperSPARC. Note that actual microSPARC cycle counts are data dependent; the numbers shown are averages.

The caches are small by current standards: 4K for instructions and 2K for data. While it is not reasonable to expect microSPARC to live up to the standards of its big brother SuperSPARC with 36K of cache, even the 486 and 68040 (previous-generation CISCs) outdo microSPARC. The direct-mapped organization of the caches gives the advantage of a simple implementation with the disadvantage of the lowest hit rate of any organization. On the other hand, the direct DRAM interface reduces miss cost compared to a bus-based DRAM interface. Line sizes are 32 bytes for the I-cache and 16 bytes for the D-cache. The data cache is write-through. Because of the direct DRAM interface, it is not possible to add a second-level cache to a microSPARC system.
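To make the direct-mapped trade-off concrete, the sketch below splits a physical address into tag, index, and offset using the cache and line sizes quoted above. The class and field names are illustrative, and the bit layout is the conventional one for a direct-mapped cache rather than a documented microSPARC detail.

```python
# Address decomposition for a direct-mapped cache, using the sizes quoted
# in the article. The tag/index/offset layout is the textbook one and is
# an ASSUMPTION about microSPARC, not a documented detail.
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectMappedCache:
    size_bytes: int
    line_bytes: int

    @property
    def num_lines(self) -> int:
        return self.size_bytes // self.line_bytes

    def split(self, addr: int) -> tuple[int, int, int]:
        """Return (tag, index, offset) for a physical address."""
        offset = addr % self.line_bytes
        index = (addr // self.line_bytes) % self.num_lines
        tag = addr // self.size_bytes
        return tag, index, offset


ICACHE = DirectMappedCache(size_bytes=4 * 1024, line_bytes=32)
DCACHE = DirectMappedCache(size_bytes=2 * 1024, line_bytes=16)

# Two data addresses exactly one cache size apart map to the same line
# with different tags, so they evict each other on alternating accesses.
print(DCACHE.split(0x1000))
print(DCACHE.split(0x1800))
```

Both caches come out at 128 lines. Any two addresses that differ by a multiple of the cache size compete for one line, which is why direct mapping has the lowest hit rate of the usual organizations.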

Cache misses are satisfied in four cycles with page-mode DRAM accesses or nine cycles if there is a page-mode miss. The cost of cache misses is reduced by using the critical-word-first refill order, which delivers the needed word from DRAM first regardless of its alignment with respect to the cache block, allowing the processor to continue execution immediately. A single-entry, eight-byte write buffer lets the processor proceed past a single store instruction.
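Critical-word-first refill can be sketched as follows. The article says only that the needed word arrives first; the wraparound order for the remaining words and the four-byte transfer granularity used here are assumptions for illustration.

```python
# Critical-word-first refill order for one cache line.
# ASSUMPTIONS: the remaining words follow in wraparound order, and the
# refill granularity is a 4-byte word. The article states only that the
# needed word is delivered first.

def refill_order(miss_addr: int, line_bytes: int, word_bytes: int = 4) -> list[int]:
    """Return word addresses in the order they would be fetched."""
    line_base = miss_addr - (miss_addr % line_bytes)
    words_per_line = line_bytes // word_bytes
    first_word = (miss_addr % line_bytes) // word_bytes
    return [
        line_base + ((first_word + i) % words_per_line) * word_bytes
        for i in range(words_per_line)
    ]


# A D-cache miss (16-byte line) on the last word of the line at 0x20:
print([hex(a) for a in refill_order(0x2C, line_bytes=16)])
```

In this example the word at 0x2C is fetched first, followed by 0x20, 0x24, and 0x28, so the processor need not wait for the three words that precede the one it asked for.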

Memory management conforms to the SPARC reference MMU specification, which means the MMU uses a TLB instead of Sun's venerable dual-level SRAM structure. The TLB is fully associative with 32 entries, and TLB misses are satisfied by a hardware table-walk algorithm. The MMU has a six-bit context register, and the memory page size is 4K, 256K, or 16M.
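The three page sizes correspond to the three levels of the reference MMU's page-table tree. The sketch below assumes the field widths of the SPARC version-8 reference MMU as commonly described (8-, 6-, and 6-bit table indexes above a 12-bit page offset); those widths are not stated in the article, and the function names are illustrative.

```python
# Virtual-address fields for a three-level table walk.
# ASSUMPTION: field widths follow the SPARC V8 reference MMU layout
# (8/6/6-bit indexes, 12-bit offset); the article does not give them.

def split_virtual_address(va: int) -> dict[str, int]:
    """Split a 32-bit virtual address into table indexes and page offset."""
    return {
        "index1": (va >> 24) & 0xFF,  # selects one of 256 level-1 entries
        "index2": (va >> 18) & 0x3F,  # selects one of 64 level-2 entries
        "index3": (va >> 12) & 0x3F,  # selects one of 64 level-3 entries
        "offset": va & 0xFFF,         # byte within a 4K page
    }


# A mapping that terminates at level 3, 2, or 1 covers this many bytes.
PAGE_BYTES_BY_LEVEL = {3: 1 << 12, 2: 1 << 18, 1: 1 << 24}

# A six-bit context register distinguishes this many address spaces.
NUM_CONTEXTS = 1 << 6

print(split_virtual_address(0x12345678))
print({level: size // 1024 for level, size in PAGE_BYTES_BY_LEVEL.items()})  # in KB
```

With these widths, the level-3, level-2, and level-1 mappings cover 4K, 256K, and 16M respectively, matching the page sizes listed above.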

In an implementation like microSPARC with separate, physically addressed caches, the natural organization is to have separate instruction and data TLBs to allow concurrent cache accesses. The disadvantage of two TLBs is the chip area required. The R3000 proved that it is acceptable to implement an instruction "micro-TLB" (a TLB with just a few entries) without compromising performance because instruction accesses have a high degree of locality.

Die photo of microSPARC, which includes 800,000 transistors on a 15 mm square die.

MicroSPARC implements the smallest micro-TLB possible: it has a single entry. While this may seem too small, the micro-TLB is only accessed when the main TLB is unavailable because of a load or store conflict. Also, the micro-TLB is updated whenever the main TLB is accessed, so the likelihood that it contains the needed translation is maximized. When the micro-TLB is accessed and misses, the pipeline stalls for one to three cycles.

MicroSPARC also includes JTAG boundary scan and on-chip clock generation and control. The clock controller allows the processor clock to be stopped for debugging or power management at a precise point based on internal events or the state of an external pin. Thus, microSPARC could be used in low-power notebook computers when mated with a power-management chip. The clock controller also allows the system to be single stepped, but the usefulness of stopping the clock and single stepping may be limited since stopping the clock also stops DRAM refresh.

# System Integration

MicroSPARC reduces the number of components needed to build a SPARC workstation in several ways.


Compared to existing SPARC computers, the on-chip caches and MMU eliminate several external components. While this level of integration is relatively new to SPARC microprocessors, it has, of course, been taken for granted in every other microprocessor family. MicroSPARC does innovate, however, in the areas of DRAM and expansion bus control. No other high-end microprocessor intended for workstation-class applications has such a high level of system integration.

The on-chip DRAM controller generates RAS, CAS, and WE for up to four banks of DRAM. Each bank can be up to 32M for a maximum main memory of 128M. A full-speed memory system with a 50-MHz processor frequency requires 60-ns DRAMs and delivers a peak memory bandwidth of 115 MB/s (eight bytes every three cycles assuming page-mode hits). The interface is truly glueless: no buffers are required for small DRAM arrays, but large arrays require address buffers to boost drive capability.
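The capacity and transfer-rate figures can be checked with simple arithmetic. The sketch below computes the maximum memory size and the raw transfer rate implied by "eight bytes every three cycles" at 50 MHz. The raw rate is an upper bound; the 115 MB/s quoted above is lower, and the article does not break down the difference (refresh overhead is one possible contributor, which is an assumption here).

```python
# Back-of-envelope DRAM capacity and raw page-mode transfer rate.
# Inputs come from the article; the raw rate ignores refresh and any
# other overhead, so it is an upper bound rather than a measured figure.

BANKS = 4
BANK_MBYTES = 32
CLOCK_MHZ = 50.0
BYTES_PER_TRANSFER = 8
CYCLES_PER_TRANSFER = 3


def max_memory_mbytes(banks: int = BANKS, bank_mbytes: int = BANK_MBYTES) -> int:
    """Maximum main memory with every bank fully populated."""
    return banks * bank_mbytes


def raw_transfer_rate_mbytes_per_s(
    clock_mhz: float = CLOCK_MHZ,
    bytes_per_transfer: int = BYTES_PER_TRANSFER,
    cycles_per_transfer: int = CYCLES_PER_TRANSFER,
) -> float:
    """Millions of bytes per second with back-to-back page-mode hits."""
    return clock_mhz * bytes_per_transfer / cycles_per_transfer


print(max_memory_mbytes())
print(round(raw_transfer_rate_mbytes_per_s(), 1))
```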

Support for memory parity has become a standard feature for most high-end processors. MicroSPARC implements parity, but the memory controller uses word parity instead of byte parity. Parity is optional. The upside of a larger granularity is cost savings (fewer parity bits imply fewer DRAMs and less board space), but the downside is an inability to detect double-bit errors within a word (byte parity catches them when the two bits fall in different bytes), and slower byte and half-word writes, since they require read-modify-write cycles to ensure that the parity bit is correctly set for the entire word.
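The two costs of word parity can be illustrated with a small model. The parity sense (a bit that is 1 when the number of set bits is odd) and the byte-lane numbering are assumptions made for the sketch; the article does not specify either.

```python
# Word parity versus byte parity on a 32-bit word.
# ASSUMPTIONS: the parity bit is 1 when the count of set bits is odd, and
# lane 0 is the least significant byte. Neither detail is from the article.

WORD_MASK = 0xFFFFFFFF


def parity(value: int) -> int:
    """Return 1 if the number of set bits is odd, else 0."""
    return bin(value).count("1") & 1


def word_parity(word: int) -> int:
    """One parity bit covering the whole 32-bit word."""
    return parity(word & WORD_MASK)


def byte_parities(word: int) -> list[int]:
    """Four parity bits, one per byte, as a byte-parity memory would store."""
    return [parity((word >> shift) & 0xFF) for shift in (0, 8, 16, 24)]


def write_byte(stored_word: int, lane: int, value: int) -> tuple[int, int]:
    """Byte store under word parity: read the word, merge, recompute parity."""
    shift = 8 * lane
    merged = (stored_word & ~(0xFF << shift) & WORD_MASK) | ((value & 0xFF) << shift)
    return merged, word_parity(merged)


good = 0x12345678
bad = good ^ 0x00010001  # two flipped bits in different bytes

print(word_parity(good) == word_parity(bad))      # word parity misses the error
print(byte_parities(good) != byte_parities(bad))  # byte parity would catch it
print(write_byte(good, lane=0, value=0xFF))
```

Because the single parity bit depends on all 32 bits, `write_byte` has to read the stored word before it can write, which is the read-modify-write cycle that slows byte and half-word stores.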

While the inclusion of a glueless DRAM controller on microSPARC lowers the cost and eases the implementation of a typical system, it also precludes the use of a second-level cache to increase performance. Because microSPARC is intended only for low-cost systems, this may not be much of a handicap.

Since its introduction a couple of years ago in Sun's SPARCstation family, the SBus has become the standard expansion bus in SPARC systems. MicroSPARC carries this trend to its logical extreme: SBus is essentially microSPARC's processor bus. Since microSPARC implements a complete processor subsystem on chip and has a dedicated DRAM interface, there is really no need for a traditional processor bus.

Like its DRAM interface, microSPARC's SBus interface is glueless. The on-chip controller implements the state sequencer, and it has arbitration logic for up to six SBus master slots. Since the microSPARC itself occupies one of the master slots, the maximum number of external SBus slots is actually five. With a 50-MHz microSPARC, the SBus operates at 25 MHz.

One system requirement not addressed directly by microSPARC is the set of standard system peripherals. This is an important consideration because workstations include an unusually rich set of system peripherals, at least by the standards of personal computers.

To complement microSPARC, a set of two companion chips is available from NCR. Both chips connect directly to the SBus, and despite the two-chip implementation, they occupy only one logical SBus slot address. One chip is a master/slave while the other is a slave-only device. Thus, a realistic microSPARC workstation will have four available SBus slots (one more than the current SPARCstation 2 and the same as the SPARCstation 10), although one of those slots must be used for a video interface.
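The slot accounting in the preceding paragraphs reduces to a short subtraction, sketched here with illustrative names. The counts are taken from the article; treating the video interface as consuming exactly one slot follows the text above.

```python
# SBus slot budget for a microSPARC system, using counts from the article.
# Function and parameter names are illustrative only.

ARBITER_MASTER_SLOTS = 6  # arbitration logic supports six SBus masters


def free_slots(
    arbiter_slots: int = ARBITER_MASTER_SLOTS,
    processor_slots: int = 1,   # microSPARC itself occupies one master slot
    peripheral_slots: int = 0,  # NCR chip pair shares one logical slot
    video_slots: int = 0,       # a video interface takes one slot
) -> int:
    """Slots left after the listed consumers are subtracted."""
    return arbiter_slots - processor_slots - peripheral_slots - video_slots


print(free_slots())                                   # external slots
print(free_slots(peripheral_slots=1))                 # with the NCR chip set
print(free_slots(peripheral_slots=1, video_slots=1))  # with video as well
```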

The master/slave chip, the 89C100, implements a DMA controller, a parallel port, an Ethernet interface, and a SCSI controller. The slave-only chip, the 89C105, implements an auxiliary eight-bit bus interface, some counter/timers, an interrupt controller, serial ports, mouse and keyboard interfaces, and a floppy disk controller.

The combination of the rich set of peripherals and the clean SBus interface should make these two chips popular in all SPARC workstations, although MBus disk and network interfaces might be a better match to the requirements of faster systems. Perhaps NCR has planned an MBus version of this chip set.

# Analysis & Conclusions

MicroSPARC brings to the workstation market a dramatic reduction in cost through an equally dramatic improvement in system integration. According to Sun, its previous low-end, entry-level machine—the 40-MHz Fujitsu 86903-based SPARCstation IPX—has a processor subsystem requiring 29 chips with a total cost of over \$500. The equivalent of those 29 chips is integrated into microSPARC for less than \$200. Of course, microSPARC uses much less board space and also reduces power consumption from over 20 Watts to less than 4.



"The goal was to get the die size to a target size of  $15 \times 15$  mm and then stop iterating on the design. There is a lot of room for improvement still on the chip and one could actually reduce this die size substantially."

Anant Agrawal, Sun Microsystems

| Microprocessor                                            | SPECint92 | SPECfp92  |
|-----------------------------------------------------------|-----------|-----------|
| microSPARC (50 MHz)                                       | 22.8      | 18.4      |
| 7C601 + TMS390C602A (40 MHz)<br>(SPARCstation 2)          | 21.8      | 22.8      |
| SuperSPARC (36 MHz, no L-2 cache)<br>(SPARCstation 10/30) | 44.2      | 52.9      |
| hyperSPARC (66.7 MHz)                                     | 62 (est.) | 64 (est.) |
| 486DX (33 MHz, 256 KB L-2 cache)                          | 19.5      | 8.9       |
| 486DX2 (66 MHz, 256 KB L-2 cache)                         | 32.4      | 16.1      |

Table 2. A microSPARC-based workstation will offer SPARCstation-2-level performance, but a high-end 486-based personal computer easily beats it in integer performance and nearly equals it in floating-point.

A microSPARC-based motherboard can be built with only 13 chips (including microSPARC, the two NCR chips, a bootstrap ROM, some buffers, and an NVRAM), three SBus connectors, eight SIMM slots, and five peripheral connectors for serial ports, SCSI, keyboard, mouse, and the network. This is the kind of low-chip-count system engineering previously seen only in the highly refined, low-end offerings from Apple and other PC manufacturers. Note that the NCR chip set provides an unusually high level of integration: not even PC chip sets have integrated SCSI or Ethernet interfaces.

Sun and TI claim this microSPARC system delivers performance roughly equivalent to the mid-range SPARCstation 2. (The SuperSPARC-based SPARCstation 10 is Sun's current high-end personal workstation.) Table 2 compares SPEC92 results for various SPARC and 486 microprocessors. The integer performance is, in fact, slightly better than that of the SPARCstation 2, but the floating-point performance falls a little short.

The embarrassing comparison is with current high-end PC technology. As shown in Table 2, a 66-MHz 486DX2 machine, which requires only an inexpensive 33-MHz motherboard, far surpasses the integer performance of microSPARC and nearly equals it for floating-point applications.

MicroSPARC also compares unfavorably to the 486 (and nearly every other microprocessor, for that matter) in terms of fabrication technology. MicroSPARC is over 2.5 times as big as the current 486 (2.25 cm<sup>2</sup> vs. 0.84 cm<sup>2</sup>), and the 486 implements more cache. The 486 is not as highly integrated, but it at least permits the use of a second-level cache. As an example of what a really aggressive design can yield, the PowerPC 601 (*see 061401.PDF*) implements over three times as many transistors in a little more than half the die area.

# Price & Availability

Samples of microSPARC (TMS390S10) are available from TI now for \$500. The 10K volume price will be \$179 with volume quantities available in the fourth quarter.

Texas Instruments Incorporated, Semiconductor Group (SC-92093), P.O. Box 809066, Dallas, TX, 75380-9066, 214/995-6611, ext. 3990.

Samples of NCR's 89C100 and 89C105 SBus peripheral chips are available now, with production planned for December. Pricing in quantities of 1000 is \$120 for the 89C100 and \$80 for the 89C105. Both are in 160-pin PQFP packages.

NCR, 2001 Danfield Court, Fort Collins, CO 80525; 800/334-5454; 303/226-9500.

Perhaps microSPARC should be compared to the 386SL, which has a similar level of system integration (the 386SL has on-chip cache tags instead of a complete on-chip cache). Even though the 386SL is fabricated in a 1-micron process, at 1.7 cm<sup>2</sup>, it uses 25% less chip area than microSPARC.
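The die-area comparisons with the 486 and the 386SL can be reproduced from the figures quoted in the article. The sketch below is arithmetic only; the constant and function names are illustrative.

```python
# Die-area comparisons using figures quoted in the article.
# Names are illustrative; no data beyond the article's numbers is used.

MICROSPARC_CM2 = 1.5 * 1.5        # 15 mm x 15 mm die
I486_CM2 = 0.84
I386SL_CM2 = 1.7
MICROSPARC_TRANSISTORS = 800_000  # "about 800K" per the article


def area_ratio(larger_cm2: float, smaller_cm2: float) -> float:
    """How many times larger one die is than another."""
    return larger_cm2 / smaller_cm2


def area_saving(smaller_cm2: float, larger_cm2: float) -> float:
    """Fraction of area saved by the smaller die relative to the larger."""
    return 1.0 - smaller_cm2 / larger_cm2


def transistors_per_mm2(transistors: int, area_cm2: float) -> float:
    """Average transistor density over the whole die."""
    return transistors / (area_cm2 * 100.0)


print(f"microSPARC vs. 486: {area_ratio(MICROSPARC_CM2, I486_CM2):.2f}x the area")
print(f"386SL vs. microSPARC: {area_saving(I386SL_CM2, MICROSPARC_CM2):.1%} less area")
print(f"microSPARC density: {transistors_per_mm2(MICROSPARC_TRANSISTORS, MICROSPARC_CM2):.0f} per mm^2")
```

The results agree with the text: microSPARC is roughly 2.7 times the area of the 486, and the 386SL is a little under 25% smaller than microSPARC.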

To some degree this comparison is unfair. Clearly, the most important characteristics of microSPARC for Sun and TI were that it be completed quickly and correctly. To this end, Sun used automatic design tools that resulted in somewhat lower density but enabled it to produce working chips in 18 months. The 486, on the other hand, is a high-volume chip that has undergone a comprehensive redesign in a triple-metal process. TI claims (and the die photo suggests) that microSPARC is pad limited, but this calls into question the decision to implement such a small amount of cache. The upshot is that microSPARC will probably be considerably more expensive to produce than other microprocessors of comparable performance.

Sun and TI stress the point that microSPARC was designed to meet functionality, integration, and cost goals. As soon as those goals were met, they stopped the design iteration process. For microSPARC's market, which has small volumes compared to the 486, it makes little sense to spend additional millions to develop a smaller die when that money is unlikely to be recovered from chip sales. When volume becomes significant, microSPARC can be shrunk with minimal effort: even the current pads are oversized so that they can be optically shrunk without redesign.

While the suitability of microSPARC for SPARC-based laptops seems clear, it is probably wishful thinking to expect microSPARC to compete successfully for sockets in PDA-class computers. For one thing, the performance-per-milliwatt is likely to be uncompetitive even at 50 MHz; reducing the clock rate will save power but will reduce performance below competing Hobbit (see **061403.PDF**) and ARM (see **061404.PDF**) chips. Also, the SBus interface is probably inappropriate for PDAs.

In short, TI clearly lacked a focus on performance and has left plenty of room for improvement. MicroSPARC achieves its SPARCstation 2 performance level only through a 25% higher clock rate. Even so, 50-MHz internal operation is only modest by the current standards of general-purpose processors.
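The clock-rate point can be made concrete by normalizing the Table 2 results to clock frequency. The sketch below uses only the SPEC92 figures and clock rates from Table 2; the per-MHz metric is a rough illustration, not a standard benchmark measure.

```python
# SPEC92 results from Table 2, normalized by clock rate.
# Per-MHz normalization is a rough illustrative metric, not a SPEC figure.

# name: (clock in MHz, SPECint92, SPECfp92)
TABLE_2 = {
    "microSPARC": (50.0, 22.8, 18.4),
    "SPARCstation 2 (7C601 + TMS390C602A)": (40.0, 21.8, 22.8),
}


def per_mhz(name: str) -> tuple[float, float]:
    """Return (SPECint92 per MHz, SPECfp92 per MHz) for a Table 2 entry."""
    clock_mhz, spec_int, spec_fp = TABLE_2[name]
    return spec_int / clock_mhz, spec_fp / clock_mhz


def clock_advantage(faster: str, slower: str) -> float:
    """Fractional clock-rate advantage of one entry over another."""
    return TABLE_2[faster][0] / TABLE_2[slower][0] - 1.0


for name in TABLE_2:
    int_rate, fp_rate = per_mhz(name)
    print(f"{name}: {int_rate:.3f} SPECint92/MHz, {fp_rate:.3f} SPECfp92/MHz")

advantage = clock_advantage("microSPARC", "SPARCstation 2 (7C601 + TMS390C602A)")
print(f"clock advantage: {advantage:.0%}")
```

Per megahertz, microSPARC delivers less than the SPARCstation 2 on both integer and floating-point benchmarks, so its roughly equal SPECint92 score comes from the higher clock rather than from a more efficient design.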

To Sun and TI, however, none of these complaints matters. What is important is that Sun has a low-cost, moderate-performance entry point to satisfy the needs of its customers. Workstation users buy SPARC workstations not because of their superior performance; there are many alternatives based on PA-RISC, IBM POWER, MIPS R4000, and even the 486 that are cheaper, faster, or both.

SPARC workstations are popular because of the breadth and depth of available software, the rich networking characteristics, and the wide spectrum of available systems. MicroSPARC and the NCR peripheral chips will enable Sun to build the much-needed under-\$5000 color SPARCstation, and microSPARC gives TI a good shot at the lion's share of future SPARC processor sales across the system spectrum.

MicroSPARC is the first member of a family of highly integrated chips for low-cost systems. It is likely that future members will address performance and die-size deficiencies. The most obvious improvements will be increasing the clock rate and cache sizes since, according to Sun, performance is currently cache-limited.

While it will be possible to borrow superscalar capabilities from SuperSPARC, Sun agrees with Earl Killian of QED who, at the Microprocessor Forum, said that the low end is best served by high clock rate, nonsuperscalar processors with larger caches. Sun and TI expect to evolve microSPARC in this direction.

MicroSPARC may also be signalling a trend in microprocessor design. With ever greater on-chip integration, it may be unnecessary to implement a traditional processor bus. An optimized DRAM interface, such as Rambus, and a standard expansion bus, such as Intel's PCI bus, are perhaps all that will be needed.