# **National Builds Pentium-Level NC Chip**

New 586L-Based Chips Enable NCs and Windows Browsers



by Jim Turley

National Semiconductor lifted the veil on a pair of integrated Pentium-class parts at the recent Microprocessor Forum. One is aimed squarely at Internet terminals and the other at information appliances. The two chips, which are due in about six months, build on internal x86 core designs and on National's strength in PC peripherals. Compared with other PCs-on-a-chip from AMD and SGS-Thomson, National's new NS586L and N7-lite will offer more features for less money.

Both chips further the concept of the PC as simply a convenient design macro. Increasingly, integration and ease of development are more important to many system designers than performance or cost. While there are many 32-bit microprocessors on the market that boast better performance than the new National parts, few will be as familiar to programmers or leverage the wealth of development tools that have grown up around desktop PCs. For quick time to market, National's parts can provide an important head start.

## New Core Is Souped-Up 486 at 133 MHz

National's director for integrated processors, Dan O'Neill, said his company's new Pentium-class CPU core builds on National's previous work with 486-class cores but includes a number of improvements developed over the intervening two years. It executes the entire Pentium instruction set, excluding floating-point and MMX operations. These features were omitted to reduce die size and thus manufacturing cost. O'Neill described the CPU as a Pentium-class processor, although the 586 is not superscalar and doesn't have an FPU or MMX unit.

The new core is not based on a Cyrix CPU design, given that National's acquisition of Cyrix is not yet complete (see MPR 8/4/97, p. 1). Instead, the 586 is a custom National design that's similar to its 486SXL and 'SXF parts (see MPR 9/11/95, p. 1) but with several changes that enhance performance, partly through higher clock rates. The three-stage pipeline in the 486SXF and 'SXL reduced these parts' die sizes and manufacturing costs significantly, which was National's goal, but also limited clock speeds to just 25 MHz. Partly by opening up the pipeline to five stages, the 586 is able to hit 133 MHz in National's 0.35-micron process.

Figure 1. National's NS586 core has a simple five-stage pipeline that is able to move cache accesses between the execute and writeback stages, which permits loading, modifying, and storing cached data in two consecutive cycles.

Unlike AMD's 486DX5-133 (see MPR 10/7/96, p. 4), National's 586 core is a bit more than just a faster 486. O'Neill described several improvements to the decode logic, cache and memory accesses, and instruction execution that raise the performance of the 586 more than the clock speed alone would suggest. Thus, National claims its part is equivalent to a 95-MHz Pentium, versus AMD's claim of Pentium-75 performance for its 133-MHz part.

## Some Out-of-Order Execution

One improvement the 586 core has over a basic 486 microarchitecture is in its instruction decoding, a notoriously difficult task on x86 chips. The 586 can decode one x86 instruction per cycle. Prefix bytes, which are fairly common in x86 binaries, add one extra decode cycle per prefix. To help with decoding and to deal with the uneven length of x86 instructions, the 586 core includes a 32-byte prefetch buffer, more than enough to hold the longest x86 instruction. The buffer allows the decode logic to examine the entire x86 instruction while it determines where the instruction boundaries lie. Unlike the K6 or Pentium II, the 586 does not translate instructions into intermediate ROPs; x86 instructions persist as x86 instructions throughout their lifetime.
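The decode rule described here (one cycle per instruction, plus one extra cycle per prefix byte) can be sketched as a small cost model. This is only an illustration under stated assumptions: the names `count_prefixes` and `decode_cycles` and the sample instruction stream are hypothetical, the encodings assume a 32-bit code segment, and the model ignores prefetch-buffer refills and any other stalls the real core may incur.

```python
# Hypothetical decode-cost model: 1 cycle per instruction + 1 per prefix byte.
# The prefix values below are the standard x86 prefix bytes.
X86_PREFIXES = {
    0xF0, 0xF2, 0xF3,        # LOCK, REPNE, REP
    0x2E, 0x36, 0x3E, 0x26,  # CS, SS, DS, ES segment overrides
    0x64, 0x65,              # FS, GS segment overrides
    0x66, 0x67,              # operand-size and address-size overrides
}


def count_prefixes(instruction: bytes) -> int:
    """Count the leading prefix bytes of one encoded instruction."""
    count = 0
    for byte in instruction:
        if byte not in X86_PREFIXES:
            break
        count += 1
    return count


def decode_cycles(instructions) -> int:
    """Total decode cycles for a sequence of encoded instructions."""
    return sum(1 + count_prefixes(insn) for insn in instructions)


# Illustrative stream (assumed 32-bit code segment).
stream = [
    bytes([0x89, 0xD8]),              # mov eax, ebx     -> 1 cycle
    bytes([0x66, 0x89, 0xD8]),        # mov ax, bx       -> 2 cycles
    bytes([0xF3, 0xA4]),              # rep movsb        -> 2 cycles
    bytes([0x66, 0x26, 0x8B, 0x07]),  # mov ax, es:[edi] -> 3 cycles
]

print(decode_cycles(stream))
```

Under this model, 16-bit code running in a 32-bit segment (or the reverse) pays the prefix penalty on nearly every instruction, which is why the one-cycle-per-prefix cost matters in practice.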

O'Neill also described the 586's ability to dynamically shift the cache access from the third to the fourth stage of the pipeline, as needed. For ALU operations that read from the cache, the data is accessed in the third pipe stage; for writeback accesses, the write is delayed one clock cycle, as Figure 1 shows.

| Instruction   | NS586 (cycles) | Pentium (cycles) | 486DX4 (cycles) |
|---------------|----------------|------------------|-----------------|
| ALU reg, reg  | 1              | 1                | 1               |
| ALU reg, mem  | 2              | 2                | 2               |
| ALU mem, reg  | 2              | 3                | 2               |
| MOV reg, reg  | 1              | 1                | 1               |
| MOV reg, mem  | 1              | 1                | 1               |
| MOV mem, reg  | 1              | 1                | 1               |
| PUSH reg      | 1              | 1                | 1               |
| POP reg       | 1              | 1                | 4               |
| Jcc rel       | 3              | 1                | 3               |
| CALL near rel | 3              | 1                | 3               |
| RET near      | 4              | 2                | 5               |
| REP MOVS      | 3              | 1                | 3               |
| REP CMPS      | 4              | 4                | 7               |

Table 1. National's redesigned core executes most instructions as fast as or faster than an Intel 486 or Pentium processor.

One final improvement over run-of-the-mill 486 cores is the 586's ability to dispatch memory accesses and then continue executing nondependent code. That is, the chip will not stall on memory loads or stores as long as subsequent instructions are not dependent on the data.

National's rejiggered pipeline executes all instructions as fast as or faster than a 486, and some are faster than on a Pentium. A few instructions are slower than on Pentium, as Table 1 shows. Specifically, the flow-control instructions CALL, Jcc, and near RET each take two cycles longer than on Pentium, because the 586 does not implement any branch prediction. Without the 586's short pipeline, such instructions would exact an even greater toll.
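The comparison can be checked directly against the figures in Table 1. The sketch below simply transcribes the table and reports where the NS586 differs from each rival; the names `TABLE_1` and `deltas_vs` are illustrative, and the numbers are the per-instruction cycle counts as published, not measurements.

```python
# Cycle counts transcribed from Table 1: (NS586, Pentium, 486DX4).
TABLE_1 = {
    "ALU reg, reg":  (1, 1, 1),
    "ALU reg, mem":  (2, 2, 2),
    "ALU mem, reg":  (2, 3, 2),
    "MOV reg, reg":  (1, 1, 1),
    "MOV reg, mem":  (1, 1, 1),
    "MOV mem, reg":  (1, 1, 1),
    "PUSH reg":      (1, 1, 1),
    "POP reg":       (1, 1, 4),
    "Jcc rel":       (3, 1, 3),
    "CALL near rel": (3, 1, 3),
    "RET near":      (4, 2, 5),
    "REP MOVS":      (3, 1, 3),
    "REP CMPS":      (4, 4, 7),
}


def deltas_vs(column: int) -> dict:
    """Map instruction -> (NS586 cycles - rival cycles), omitting ties.

    column 1 selects Pentium, column 2 selects the 486DX4.
    A positive delta means the NS586 is slower.
    """
    return {
        name: row[0] - row[column]
        for name, row in TABLE_1.items()
        if row[0] != row[column]
    }


vs_pentium = deltas_vs(1)
vs_486 = deltas_vs(2)

for name, delta in vs_pentium.items():
    print(f"{name:14s} {delta:+d} cycles vs. Pentium")
for name, delta in vs_486.items():
    print(f"{name:14s} {delta:+d} cycles vs. 486DX4")
```

Run against the table as printed, the NS586 never trails the 486DX4, and the only entries slower than Pentium are the three flow-control instructions plus REP MOVS, each by two cycles.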

On the other hand, logical operations that store their result to memory are one cycle quicker than on Pentium because the 586's cache/writeback access happens earlier than it does in Pentium's pipeline. This allows instructions that read from memory, manipulate the data, and write the result back to complete in just two cycles, compared with Pentium's three cycles.

National claims performance of 90 MIPS (based on Dhrystone 2.1) for the 100-MHz part and 120 MIPS for the 133-MHz version. These numbers place N7-lite's performance ahead of the fastest 486 chip to date, AMD's 486DX5-133, but well behind any Pentium in recent memory. Without superscalar execution, floating-point support, MMX instructions, or branch prediction, National's 586 core lies somewhere between a Pentium-90 and a fast 486SX.
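As a quick sanity check on those claims, the two Dhrystone figures can be normalized to clock speed. The snippet is plain arithmetic on the numbers National quoted; the variable names are illustrative.

```python
# National's claimed Dhrystone 2.1 MIPS, keyed by clock speed in MHz.
claimed_mips = {100: 90, 133: 120}

# Normalize each claim to MIPS per MHz.
mips_per_mhz = {mhz: mips / mhz for mhz, mips in claimed_mips.items()}

for mhz, ratio in mips_per_mhz.items():
    print(f"{mhz} MHz: {ratio:.3f} MIPS/MHz")
```

Both speed grades work out to roughly 0.9 Dhrystone MIPS per MHz, so the claimed performance scales essentially linearly with clock rate.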

## Updated 486 Core Still Bulky

The 3.3-V core incorporates about 930,000 transistors, slightly less than half of which (426,000) are dedicated to a pair of 4K caches. Like the 486SXF and 'SXL before it, National's newest x86 processor has smaller caches than the chip it emulates, helping to reduce die size and manufacturing cost.

Built in National's new 0.35-micron four-layer-metal process, the 586 core (including caches) measures 25.8 mm<sup>2</sup>, coincidentally about the same size as National's NS486 core in 0.65-micron technology. This is also about the same size as Motorola's 68030 but considerably larger than any modern embedded RISC core. For a scalar, FPU-less processor built in 0.35-micron technology, the 586 is a big core, but no worse than most other CISC chips.
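A little arithmetic puts the transistor and die-size figures in perspective. The sketch below rests on two assumptions that are not in National's disclosure: that "4K" means 4 Kbytes per cache, and that die area scales ideally with the square of the feature size (real shrinks rarely do). All variable names are illustrative.

```python
# Figures quoted for the 586 core.
total_transistors = 930_000
cache_transistors = 426_000
core_area_mm2 = 25.8

# Share of the transistor budget spent on the two caches.
cache_fraction = cache_transistors / total_transistors

# ASSUMPTION: each "4K" cache holds 4 Kbytes of data.
cache_data_bits = 2 * 4 * 1024 * 8
# Includes tags and control logic, so this is an upper bound per data bit.
transistors_per_bit = cache_transistors / cache_data_bits

# ASSUMPTION: ideal area scaling from 0.65 to 0.35 micron.
ideal_area_scale = (0.35 / 0.65) ** 2
# Size an NS486-sized core would shrink to under that ideal assumption.
shrunk_ns486_mm2 = core_area_mm2 * ideal_area_scale

print(f"cache share of transistors: {cache_fraction:.1%}")
print(f"transistors per cache data bit: {transistors_per_bit:.1f}")
print(f"ideal area scale factor: {ideal_area_scale:.2f}")
print(f"ideally shrunk NS486-sized core: {shrunk_ns486_mm2:.1f} mm^2")
```

Under those assumptions, the caches take a bit under 46% of the transistors, and a core the size of the NS486 would shrink to roughly 7.5 mm<sup>2</sup>, which suggests the 586 core spends on the order of three times the silicon of its predecessor at the same process node.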



Figure 2. National's NS586L and N7-lite are similar, but N7-lite includes far more logic for video and audio processing as well as a PCI interface and USB port.

## Two Chips, Two Levels of Integration

At the Forum, O'Neill described the 586 core and two integrated chips that will use it. The NS586L is only modestly integrated, with a DRAM controller, real-time clock, DMA, interrupt controller, timers, and both VL-bus and ISA interfaces. The other, dubbed N7-lite, is much more ambitious. The N7-lite has all the features of the NS586L but also includes a complete SVGA controller, a DSP similar to Texas Instruments' 'C5x, a PCI interface, a USB controller, and more. Where the NS586L is a general-purpose integrated x86 part, the N7-lite is a complete network terminal on a chip.

As Figure 2 shows, the N7-lite is not quite a superset of the NS586L. The latter device has a Pentium-compatible bus interface that runs at 66 MHz. The N7-lite drops the VL-bus in favor of PCI. It also replaces the NS586L's DRAM controller with one geared toward managing a UMA (unified memory architecture) subsystem that's shared between main memory and the frame buffer.

The two major additions to N7-lite are its graphics controller and its integrated DSP. The former handles graphics and video, while the latter is tuned for audio or modem chores.

The N7-lite is designed to use a television as its only display, so the graphics controller includes NTSC, PAL, and SECAM outputs. A 24-bit color lookup table (CLUT), scaling, and flicker filtering produce a display suitable for a TV. Only a color DAC is required for output.

### Price & Availability

National's NS586L is expected to begin sampling in 2Q98. No production schedule has been set, but pricing is expected to be about \$25 for the 100-MHz version. Pricing and availability for the N7-lite have not been announced.

For more information, please contact National (Santa Clara) at 408.721.2880 or set your browser to www.national.com/appinfo/ns486.

The audio subsystem is based on a licensed DSP core that is compatible with TI's popular 'C5x product line. The DSP has its own interface to external memory and a serial interface to an AC'97 audio codec. Local memory (4 Kwords of program memory and 1 Kword of data memory) keeps the DSP running without having to constantly fetch instructions from an external source.

## National Offers the Most for Less

Compared with SGS-Thomson's STPC Consumer (see MPR 8/4/97, p. 1), the NS586L and N7-lite are both bargains. The N7-lite is a close match to the STPC, with similar video outputs and similar performance. Both have PCI interfaces and both include a DRAM controller. The STPC Consumer has a 64-bit memory bus; the width of the bus on N7-lite was not disclosed. A wide bus like the STPC's will improve performance noticeably in a UMA system. On the other hand, the European part doesn't include any of National's audio/DSP logic, nor does it have ROM control, DMA, interrupt control, a USB interface, timers, or an RTC. Ironically, SGS-Thomson recommends National's super I/O chip as the ideal companion to STPC Consumer for many of these important functions.

O'Neill did not discuss prices for the N7-lite. SGS-Thomson, for its part, quotes prices in the \$45 range. The one advantage STPC has over N7-lite is its EIDE and ISA interfaces. These are useful for disks and legacy expansion cards, respectively. Such features might be useful for a very low end PC but could be addressed via PCI. Adding a disk drive to an N7-lite system would require a PCI disk controller, which is simple enough but costs money. The STPC Consumer also has an FPU, which is useful in rendering and printing applications, but these are not strong markets for x86 chips.

Dan O'Neill describes National Semiconductor's NS586 CPU core at the Microprocessor Forum.

The other major player in the PC-on-a-chip market, AMD, has four different parts in its Elan family. The two 486-based parts, Elan 400 and 410 (see MPR 10/28/96, p. 5), are both at least as expensive as the ST part or the National part is likely to be, but offer less integration.

AMD charges \$50-\$55 for the two Elans at 100 MHz. Neither has the video or PCI interfaces of the STPC or N7-lite, or the 64-bit bus found on the SGS-Thomson part. Instead, AMD includes keyboard, serial, and parallel interfaces and (on the Elan 400) an LCD and PCMCIA controller. The LCD controller makes the Elan 400 more suitable for handheld devices than either of the other two chips but, by the same token, makes it unsuitable for set-top video-based applications. Without the LCD and PCMCIA, the Elan 410 is closer to National's NS586L: a moderately integrated, general-purpose embedded processor with PC software compatibility. The NS586L is much less expensive, at \$25, than AMD's offering, and it has a Pentium bus in addition to its DRAM controller.

## Integration More Valuable Than Performance

Two years ago, National made a play for the low end of the embedded x86 market with its two homegrown 486 chips. Because they weren't entirely PC compatible, sales were modest but served to prove the company's intention to compete in the 32-bit market. They also showcased, somewhat humbly, the company's ability to design an x86 processor from scratch.

The 586 core pushes that design to a higher performance level. And National's acquisition of Cyrix proves the company is serious about accelerating the pace of its advances. Like Intel and AMD before it, National will now be able to push top-level x86 designs down into the embedded market when it suits the company's strategy.

That strategy revolves around integration and ease of software development rather than the best MIPS/Watt, Dhrystones/dollar, furlongs/fortnight, or other objective metric. For applications that will see only modest volumes, absolute cost is not critical; for systems without hard real-time requirements, performance is not paramount. But for small companies attacking new markets while scrambling for funding, ease of development is the determinant: time to money is the driving factor.

On those counts, National will have two very attractive platforms for quick development of systems that rely on DOS, Windows, or just PC development tools. The performance of both chips will be sufficient for Netscape to run acceptably if you don't use a lot of taxing VRML, RealVideo, or Java plugins. The NS586L can be used as the base for most any kind of system; the N7-lite is clearly intended for a WebTV-like box running a Microsoft operating system such as DOS or Windows CE. As ease of use and time-to-market become more important to a segment of the embedded development community, integrated x86 chips will find a warm welcome. National's two newest chips fit in well with that trend.



