# Hitachi, NEC Processors Take Aim at PDAs: New RISC Architectures Tuned for Low-Cost Applications

#### By Linley Gwennap

Both Hitachi America and NEC have begun sampling processors based on new RISC architectures specifically designed for general-purpose handheld computers ("personal digital assistants" or PDAs) and other consumer devices. Hitachi's SH7032 is a highly integrated processor selling for \$33 in volume, while NEC's V810 includes fewer system functions but sells for about \$20.

Both Hitachi, a licensee of PA-RISC, and NEC, a MIPS processor vendor, rejected standard 32-bit RISC instruction sets in favor of developing new architectures optimized for low cost. These designs will compete with ARM, Hobbit, the x86, and Motorola's Dragon (*see* **070803.PDF**) for a share of the increasingly crowded PDA market.

The SH7000 and V810 will also compete for embedded applications. NEC's chip has already been selected for Nintendo's forthcoming CD-ROM system, of which it expects to sell a million units per year. Other consumer applications are moving from 8-bit or 16-bit CPUs to 32-bit RISC processors, due to the lower prices of these chips as well as increased performance demands for graphics.

#### Hitachi Integrates Many PDA Functions

Hitachi claims to have built a single-chip PDA, and has come closer than any other vendor so far. Figure 1 shows an impressive array of functional units: CPU core, RAM, optional ROM, DMA engine, DRAM controller, interrupt controller, 8-channel A/D converter, two serial ports, and an array of timers and pulse generators. Most PDA systems will need an external graphics controller, although the on-chip timers and pattern generator can handle a simple LCD display. Another interface chip is needed to provide expansion slots, such as PCMCIA.

The first two chips in the SH7000 series are the 7032 and 7034. The 703x chips are identical except for the internal memory configuration. The 7032 includes 8K of on-board RAM with no ROM. The 7034 has only 4K of RAM but includes 64K of ROM. Hitachi plans to ship both mask ROM and one-time programmable (OTP) versions; reprogrammable "windowed" parts are available in small quantities to developers for evaluation. The maximum clock rate of both chips is 20 MHz.

#### Slimmed-Down RISC Architecture

By creating a new, simplified RISC architecture, Hitachi was able to design a very small CPU core, just 6.6 mm<sup>2</sup>. This is smaller than the current ARM6 core (7.2 mm<sup>2</sup>), making it the smallest RISC CPU available. ARM6 uses a previous-generation IC process, however; the ARM core would be smaller than the SH core if implemented in an equivalent process, which is likely by the end of this year.

Like ARM, the SH architecture uses sixteen 32-bit general-purpose registers and a limited number of instructions (56 in the Hitachi design) to simplify the implementation. SH takes one further step, fixing the instruction length at 16 bits, half the size of ARM (and most other RISC) instructions.

The smaller instructions have some limitations. Arithmetic instructions have only two operands; there is not enough room to encode the typical three-operand RISC instructions. Conditional branch and load/store instructions are limited to 8-bit displacements; addresses that do not fit in this format must be precalculated in a register. Constants longer than 8 bits must be taken from in-line data using short-displacement loads.
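The constant-size limit amounts to a simple range check. The Python sketch below assumes the 8-bit immediate field is sign-extended; the article says only that constants longer than 8 bits must come from in-line data, so the exact range is an assumption. The function names are hypothetical and do not correspond to any Hitachi tool.

```python
# Sketch: how a compiler for a 16-bit-instruction RISC might decide where
# a constant comes from. Assumption (not stated in the article): the 8-bit
# immediate field is sign-extended, so it covers -128..127.

IMM_BITS = 8  # width of the immediate field, per the article


def fits_immediate(value, bits=IMM_BITS):
    """Return True if value fits in a sign-extended field of `bits` bits."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high


def constant_strategy(value):
    """Pick a (hypothetical) strategy name for loading `value`."""
    if fits_immediate(value):
        return "immediate"
    # Too wide for the instruction word: keep the constant in in-line data
    # and fetch it with a short-displacement load, as the article describes.
    return "in-line data load"


if __name__ == "__main__":
    for v in (5, -128, 127, 128, 0x12345678):
        print(f"{v:>10}: {constant_strategy(v)}")
```

Under this assumption, every constant outside -128..127 costs an extra load plus a word or two of in-line data, which is part of the price paid for the compact encoding.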

In most situations, however, the shorter instruction length reduces program size. Hitachi claims that Dhrystone, when compiled for SH, is about 40% smaller than the SPARC version. More compact code increases the efficiency of both on-chip and external memory. (We will examine the SH instruction set in more detail in a future article.)

The 703x chips have an integer multiplier that can calculate a 32-bit product from two 16-bit operands in 3 cycles. Other non-dependent instructions can execute during the multiply latency. The design includes a special 42-bit accumulator register; the 32-bit product can be added to the 42-bit accumulator with the same 3-cycle latency, as the accumulation is done concurrently with the multiply using partial products. While a 150-ns multiply doesn't compare to high-performance DSP (digital signal processing) chips, Hitachi believes it is adequate for applications such as a 2400-bps modem.

Figure 1. Hitachi's SH703x includes a 32-bit RISC CPU, on-board memory, and a large set of peripherals.

#### MICROPROCESSOR REPORT

Figure 2. Die photo of Hitachi's SH7034, which measures 9.5 mm  $\times$  9.7 mm. The 7032 is very similar except that it replaces the 64K of ROM with an additional 4K of RAM.

#### Glueless Memory and I/O Support

Although the SH architecture allows up to 4G of memory space, the 703x parts provide only 22 address bits to external devices. Eight individual chip-select signals are also provided, allowing a memory space of eight 4M segments, or 32M total. Each segment can be programmed with a different set of parameters, including 8- or 16-bit access, variable wait states, fast page mode, and burst access. The memory controller provides RAS and CAS and handles DRAM refresh, so no external logic is needed. At 16 MHz, standard 70-ns DRAMs can be accessed in a single cycle on page hits.
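The address-space figures follow directly from the pin counts. The short Python sketch below reproduces the arithmetic using only numbers quoted above; the variable names are illustrative.

```python
# Sketch: external address space of the 703x, from the figures in the text.

ADDRESS_BITS = 22   # external address pins
CHIP_SELECTS = 8    # individual chip-select signals

segment_bytes = 1 << ADDRESS_BITS            # bytes reachable per chip select
total_bytes = CHIP_SELECTS * segment_bytes   # all eight segments together

MIB = 1 << 20
print(f"segment: {segment_bytes // MIB}M, total: {total_bytes // MIB}M")

# Clock period at 16 MHz, for comparison with DRAM timing.
clock_hz = 16_000_000
cycle_ns = 1e9 / clock_hz
print(f"cycle time at 16 MHz: {cycle_ns} ns")
```

The 16-MHz cycle works out to 62.5 ns, which is shorter than the 70-ns rating of the DRAM. That rating describes a full random access, whereas a page hit needs only a column access, which is presumably why it fits in one cycle.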

The chip-select signals can also be used to directly control I/O devices. The 703x provides both a 16-bit multiplexed address/data bus and a 22-bit demultiplexed address bus so the processor can easily interface to a variety of devices.

The DMA (direct memory access) controller provides up to four concurrent memory transfers without CPU intervention. Two of the channels can handle requests from external devices. The interrupt controller handles eight external signals (IRQ7...0) and NMI (nonmaskable interrupt), as well as interrupts from the on-chip peripherals and the DMA engine.

#### Many On-Chip Peripherals

The 703x has an 8-channel ADC (analog-to-digital converter) that completes a conversion with 10 bits of resolution in 6.7 µs. The ADC can be used to interface to a digitizer in a pen-based system, but Hitachi concedes that the resolution may not be adequate for all applications. The ADC can also monitor the battery voltage, allowing more precise estimates of remaining battery life.

Each of the two full-duplex serial ports can be configured in either synchronous or asynchronous mode. Baud rates are generated from the internal clock. In asynchronous mode, the maximum data rate is 625 kbps; using clocked synchronous mode, that rate increases to 5 Mbps. These ports are commonly connected to a keyboard, digitizer (if the on-chip ADC is not used), or modem chip set.
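Because the baud rates are derived from the internal clock, the quoted maxima imply a minimum number of clock cycles per bit. The Python sketch below works this out, assuming the maxima are quoted at the 20-MHz top clock rate and that the same minimum divisors apply at other clock rates; neither assumption is confirmed by Hitachi's data here.

```python
# Sketch: clock divisors implied by the quoted maximum serial rates.
# Assumption: the maxima are quoted at the 20-MHz top clock rate.

CLOCK_HZ = 20_000_000
MAX_ASYNC_BPS = 625_000
MAX_SYNC_BPS = 5_000_000

async_divisor = CLOCK_HZ // MAX_ASYNC_BPS   # clock cycles per bit, async
sync_divisor = CLOCK_HZ // MAX_SYNC_BPS     # clock cycles per bit, sync


def max_rates(clock_hz):
    """Scale the maximum (async, sync) bit rates to another clock,
    assuming the minimum divisors stay the same."""
    return clock_hz / async_divisor, clock_hz / sync_divisor


if __name__ == "__main__":
    print("divisors:", async_divisor, sync_divisor)
    print("rates at 12.5 MHz:", max_rates(12_500_000))
```

If those assumptions hold, a 3.3V system running at 12.5 MHz would top out proportionally lower on both modes.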

There are several timers available. Five have a set of input capture and output compare registers. These timers are useful in embedded control applications; for example, in an engine controller, the timers could be used to monitor the engine timing and generate precise control signals. They can also be used for both input and output of PWM (pulse-width modulated) signals. PWM outputs can be converted easily into analog signals.

An on-chip watchdog timer can be set up to reset the processor if the CPU fails to respond after a certain period of time. If not used for this purpose, this timer can be reconfigured as an interval timer.

All of the peripheral I/O pins can be used for general I/O instead of their primary function, if desired. This selection can be made on a pin-by-pin basis. If none of the special functions is used, 32 input/output signals and 8 input-only signals become available. These signals can be used to control LEDs or other subsystems, to read switch settings, or for a variety of other tasks.

#### Designed for Low Power

Since the 703x chips are intended for both battery-operated and line-powered systems, they can operate at either 3.3V or 5V. As usual, the performance is lower at 3.3V, where the chip tops out at 12.5 MHz. At 5V, it reaches 20 MHz. At the lower voltage and frequency, the 7034 typically uses 130 mW.

The 703x offers two low-power modes. In "sleep" mode, the CPU is halted but the peripherals continue to be clocked; this mode reduces power to 50 mW (at 3.3V). For maximum power savings, the chip can be placed in "standby" mode, which retains CPU and memory state but halts and initializes all peripherals. Standby mode requires less than 2 µA. These modes are entered using the SLEEP instruction; the chip resumes normal operation after reset or an interrupt.
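The effect of these modes on a battery budget can be estimated with a weighted average. The Python sketch below uses the power figures quoted above; the duty-cycle fractions in the example are hypothetical, and the standby figure assumes the 2-µA limit applies at 3.3V, which the text does not state.

```python
# Sketch: average power of a 7034 at 3.3V for a given duty cycle.
# Power figures are from the article; the duty cycle is hypothetical.

ACTIVE_MW = 130.0                # typical, 3.3V at 12.5 MHz
SLEEP_MW = 50.0                  # "sleep" mode at 3.3V
STANDBY_MW = 2e-6 * 3.3 * 1e3    # upper bound: 2 uA at an assumed 3.3V, in mW


def average_power_mw(active, sleep, standby):
    """Weighted average power; the three time fractions must sum to 1."""
    if abs(active + sleep + standby - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return active * ACTIVE_MW + sleep * SLEEP_MW + standby * STANDBY_MW


if __name__ == "__main__":
    # Hypothetical usage pattern: 10% active, 20% sleep, 70% standby.
    print(f"{average_power_mw(0.10, 0.20, 0.70):.2f} mW")
```

Under these assumptions, standby contributes almost nothing to the total, so the time spent in sleep mode, where the peripherals stay clocked, dominates the idle budget.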


The 7034, shown in Figure 2, uses 900,000 transistors on a 92 mm<sup>2</sup> die. The 7032 has the same die size but uses 593,000 transistors. Both 703x chips use a 112-pin PQFP. Hitachi also expects to sample a version in a 120-pin thin QFP (with a 0.4-mm lead pitch) by year end. The company is building these chips in a mature 0.8-micron CMOS process that it also uses for 4-Mbit DRAMs.

#### NEC Delivers Compact Processor

NEC's V800 series has been introduced in Japan but never formally announced in the US. Like Hitachi, NEC aims to provide good performance with low cost and low power. The company went in the opposite direction from Hitachi, however, choosing to implement few system functions on the processor chip. Figure 3 shows that the first family member, the V810, includes a CPU, floating-point unit (FPU), and 1K of instruction cache. The cache is direct-mapped and can be disabled for real-time applications.

The FPU handles single-precision floating-point operations. The V810 has no hardware assist for integer multiply or divide, so these operations take much longer than on the 703x. Table 1 shows the latencies for various operations on the V810.

Unlike the Hitachi chip, the V810 requires an external clock generator and extra "glue logic" to connect to memory and I/O devices. There are no peripherals or general-purpose I/O pins and no DMA controller on the V810. Sixteen interrupt sources are multiplexed onto four pins, plus a separate NMI pin.

The V810 is available in a 120-pin PQFP or a 176-pin PGA. Most of the pins are used by the non-multiplexed I/O bus with 32 bits of address and 32 bits of data. This bus will provide higher throughput for devices that need it but is more expensive to connect to than Hitachi's simpler 16-bit multiplexed bus. The V810 supports dynamic bus sizing for 16-bit (but not 8-bit) devices.

#### Mixed-Length Instruction Set

The V810 is the first implementation of NEC's V800 family architecture. Although V800 is not as complex as MIPS or other workstation RISCs, it is also not as restricted as Hitachi's SH architecture. For example, it includes twice as many general-purpose registers as SH. NEC decided to compromise on the instruction length, with some 16-bit instructions and some 32-bit instructions. The longer formats are used for loads and stores, long-displacement branches, and three-operand calculations; these formats solve many of the shortcomings of SH's 16-bit instructions.

The long instruction formats provide little apparent advantage for either code length or performance. NEC estimates that a V800 version of Dhrystone is about 40% smaller than a MIPS version, or roughly the same size as a version compiled for Hitachi's chip (or Hobbit, for that matter). Many branches, loads, and stores take two or more cycles to execute on the V810, which offsets the extra address calculations required on an SH processor.

Figure 3. A typical system using NEC's V810 processor requires external glue logic for memory and peripherals.

V800 also has a set of CISC-like bit-string operations that can copy, move, or combine data of arbitrary length. These instructions take multiple cycles to execute, as shown in Table 1. They can be used to perform bit block-transfers (BitBLTs), a common graphics operation. (We will examine the V800 instruction set in more detail in a future article.)

#### 2.2V Operation for Ultra-Low Power

The V810 is designed to operate over a wide range of supply voltages. The chip's top speed is 25 MHz when running at 5V. The maximum frequency is reduced to 16 MHz for 3.3V operation; in this configuration, the V810 typically uses 100 mW. For the most power-sensitive applications, the supply voltage can be as low as 2.2V, limiting the CPU to 10 MHz but reducing power to just 40 mW. This low-voltage operation makes the NEC chip suitable for systems powered by two AA batteries, which may provide as little as 2.7V.
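One way to read these operating points is as energy per clock cycle, since power in milliwatts divided by frequency in megahertz gives nanojoules per cycle. The Python sketch below does this for the three supply voltages; the 3.3V and 2.2V figures come from the paragraph above, the 5V power is the 500 mW listed in Table 2, and the function name is illustrative.

```python
# Sketch: energy per clock cycle for the V810 at each supply voltage.
# 3.3V and 2.2V figures come from the text; the 5V power is the Table 2 value.

OPERATING_POINTS = {
    # volts: (max clock in MHz, typical power in mW)
    5.0: (25, 500),
    3.3: (16, 100),
    2.2: (10, 40),
}


def energy_per_cycle_nj(clock_mhz, power_mw):
    """mW divided by MHz is nJ per cycle."""
    return power_mw / clock_mhz


if __name__ == "__main__":
    for volts, (mhz, mw) in OPERATING_POINTS.items():
        print(f"{volts}V: {energy_per_cycle_nj(mhz, mw):.2f} nJ/cycle")
```

By this rough measure, each cycle at 2.2V costs about a fifth of the energy of a cycle at 5V, although vendor test conditions vary and the comparison should be treated as approximate.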

The V810 uses a fully static design. Power management must be implemented in external circuitry; there is no built-in sleep mode.

| Type of Operation  | Clock Cycles | Time (at 25 MHz) |
|--------------------|--------------|------------------|
| FP Add (SP)        | 24 cycles    | 960 ns           |
| FP Subtract (SP)   | 26 cycles    | 1040 ns          |
| FP Multiply (SP)   | 27 cycles    | 1080 ns          |
| FP Divide (SP)     | 44 cycles    | 1760 ns          |
| Integer Multiply   | 13 cycles    | 520 ns           |
| Integer Divide     | 38 cycles    | 1520 ns          |
| Bit-String Search  | 3 cycles*    | 120 ns*          |
| Bit-String Logical | 6 cycles*    | 240 ns*          |

Table 1. Execution times for math and bit-string instructions on the V810. \*Execution time per word of bit-string data.
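The times in Table 1 are simply the cycle counts divided by the clock rate. The Python sketch below reproduces that conversion and, for comparison, applies it to the 3-cycle multiply of the 703x at 20 MHz described earlier. The dictionary and function names are illustrative.

```python
# Sketch: convert instruction latencies from clock cycles to nanoseconds.

def latency_ns(cycles, clock_mhz):
    """One cycle at f MHz lasts 1000/f ns."""
    return cycles * 1000.0 / clock_mhz


# Cycle counts from Table 1 (V810, full-speed 25-MHz part).
V810_CYCLES = {
    "FP Add (SP)": 24,
    "FP Subtract (SP)": 26,
    "FP Multiply (SP)": 27,
    "FP Divide (SP)": 44,
    "Integer Multiply": 13,
    "Integer Divide": 38,
}

if __name__ == "__main__":
    for name, cycles in V810_CYCLES.items():
        print(f"{name:<18} {latency_ns(cycles, 25):7.0f} ns")
    # The 703x multiplier: 3 cycles at 20 MHz (16 x 16-bit operands).
    print(f"{'703x multiply':<18} {latency_ns(3, 20):7.0f} ns")
```

The two multiplies are not strictly equivalent, since the 703x figure is for 16-bit operands, but the gap between 150 ns and 520 ns illustrates the cost of omitting multiply hardware on the V810.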




Figure 4. Die photo of NEC V810, which uses 240,000 transistors on a 7.7 mm  $\times$  7.7 mm die.

Figure 4 shows a die photo of the V810, which is 53 mm<sup>2</sup>, about a third smaller than the SH703x. Since both chips use a 0.8-micron, two-layer-metal CMOS process, the difference in die size is mainly due to the much larger on-chip memory of the 703x. This also accounts for the V810 needing 240,000 transistors, less than half as many as the Hitachi chip.

#### V800 Family to Expand in Future

The V810, expected to begin volume shipments in September, is the first in a planned series of processors implementing the V800 architecture. NEC's next step will be the V805, which reduces the external bus to 16 bits, cutting cost and package size and simplifying the interface for low-cost peripherals. Another planned product, the V820, will look more like the Hitachi processor, integrating DMA control, timers, and a serial port. NEC plans to begin sampling both of these derivatives by the end of the year.

|                   | SH7032            | V810              | Hobbit            | ARM610            |
|-------------------|-------------------|-------------------|-------------------|-------------------|
| Frequency (5V)    | 20 MHz            | 25 MHz            | 30 MHz            | 25 MHz            |
| Frequency (3.3V)  | 12.5 MHz          | 16 MHz            | 20 MHz            | n/a               |
| Power (5V)*       | 500 mW            | 500 mW            | 900 mW            | 625 mW            |
| Power (3.3V)*     | 130 mW            | 100 mW            | 250 mW            | n/a               |
| Dhrystone MIPS*   | 16 MIPS           | 18 MIPS           | 20 MIPS           | 14 MIPS           |
| MIPS/watt (5V)*   | 32 M/W            | 36 M/W            | 22 M/W            | 22 M/W            |
| Math on-chip      | MAC               | FPU               | none              | none              |
| MMU on-chip       | none              | none              | 64 entry          | 32 entry          |
| Memory on-chip    | 8K RAM            | 1K cache          | 3K cache          | 4K cache          |
| Periphs on-chip   | A/D, 2S, etc      | none              | none              | none              |
| Transistors       | 593,000           | 240,000           | 419,000           | 359,000           |
| Die Area          | 92 mm<sup>2</sup> | 53 mm<sup>2</sup> | 92 mm<sup>2</sup> | 71 mm<sup>2</sup> |
| IC Process        | 0.8µ, 2M          | 0.8µ, 2M          | 0.9µ, 2M          | 1.0µ, 2M          |
| Voltage (Vdd)     | 3.3V–5V           | 2.2V–5V           | 3.3V–5V           | 5V only           |
| List Price        | \$33              | about \$20        | \$35              | \$20              |
| in Quantities of: | 25,000            | large             | 10,000            | 10,000            |

Table 2. The V810 offers good performance with low power and a low price, but the SH7032 has more memory and features. \*Test conditions vary. (Source: vendor data)

By year end, the company also expects to offer the V810 core CPU/FPU, which is 35 mm<sup>2</sup>, as an ASIC core. It eventually plans to move the full V810 design to its half-micron CMOS process, boosting the clock rate to at least 33 MHz. This version is expected in 1994.

#### New Chips Surpass ARM, Hobbit

Table 2 compares the SH7032 and V810 to popular PDA processors from Advanced RISC Machines (ARM) and AT&T. The ARM610 has been selected for Apple's Newton PDA. Hobbit (*see 061403.PDF*) is used in EO's personal communicators.

It is difficult to compare power ratings, since vendors test under different conditions, but both the 7032 and the V810 appear to use significantly less power than their competitors. The NEC chip has the added advantage of 2.2V operation for even lower power. While the 7032 appears to use more power than the V810 at 3.3V, the 7032 includes more on-chip memory and other features that might be implemented using external devices in a V810 system. At a system level, 7032-based designs should be comparable to or better than V810 systems in conserving power.

Based on Dhrystone 1.1, the V810 offers slightly better performance than the 7032 but a bit less than Hobbit. Given Hobbit's higher power, however, the V810 has a much better MIPS/watt rating, indicating a more efficient use of power. The 7032 has slightly lower MIPS/watt than the V810, but this rating is misleading due to the higher integration of the Hitachi chip.
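The MIPS/watt row of Table 2 can be reproduced from the Dhrystone and 5V power rows. The Python sketch below does so using the vendor figures from the table; the names are illustrative, and the caveat about differing test conditions applies to every number.

```python
# Sketch: recompute the MIPS/watt row of Table 2 from its other rows.

TABLE_2 = {
    # chip: (Dhrystone MIPS, power at 5V in mW), vendor data from Table 2
    "SH7032": (16, 500),
    "V810": (18, 500),
    "Hobbit": (20, 900),
    "ARM610": (14, 625),
}


def mips_per_watt(mips, power_mw):
    """Dhrystone MIPS divided by power in watts."""
    return mips / (power_mw / 1000.0)


if __name__ == "__main__":
    for chip, (mips, mw) in TABLE_2.items():
        print(f"{chip:<7} {mips_per_watt(mips, mw):5.1f} MIPS/W")
```

The results round to the values in the table: the V810 and 7032 land in the low-to-mid 30s, while Hobbit and the ARM610 both come in just over 22.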

All of these MIPS ratings are inflated due to the use of the small Dhrystone program as a metric. Because Dhrystone fits into the fast on-chip memory of any of these chips, that program performs much better than code stored in off-chip memory. For example, Hitachi measures a MIPS rating of 8.9 when using 70-ns external ROM, compared to 16 for internal ROM. The effectiveness of the V810's 1K instruction cache versus the 7032's 8K of software-mapped memory will vary from application to application.

The 7032 has the largest die size of this group of processors, while the V810 has the smallest. This difference is based primarily on the feature set of the chips. Both the ARM and Hobbit processors include a memory-management unit (MMU) not found on the others. The 7032 has an assortment of on-chip peripherals as well as more memory than any of the others.

Die size is a key component of price, where the V810 also appears to come out ahead, although NEC's reluctance to quote a firm price makes it difficult to compare these processors. The 7032 is about the same price as Hobbit, yet Hobbit systems must add a separate system-management chip to implement the same functions as are included in the 7032. If the 7032 drops to \$25 by the end of the year, as Hitachi expects, it will be much more competitive on price.

#### Development Environments Available

Hitachi offers a set of hardware and software development tools for the SH architecture and 703x chips. Most are currently in beta test, with final products available in 3Q93. An assembler, simulator, and C compiler for SH are available on either SPARC workstation or x86 PC systems, with a future port to PA-RISC systems. The Free Software Foundation's GNU tool kit will be available by the end of the year.

For hardware debugging, Hitachi offers the E7000 in-circuit emulator (ICE). Since the 703x is not available in a PGA package, the company provides an adapter that allows its QFP devices to plug into a standard socket. A complete SH7000 evaluation board is also available.

NEC has somewhat more extensive support for the V800 series. It uses the Green Hills tool set, including compilers for C, C++, Fortran, and Pascal. These tools are available on SPARC, x86, and PA-RISC systems as well as NEC's MIPS-based workstations. NEC also supplies an ICE and an evaluation board; HP and Sophia Systems plan to support the V800 in the future.

#### A Need for PDA Partners

At first glance, both Hitachi and NEC appear to be late in pursuing PDA applications, as both ARM and Hobbit are already shipping processors. The entire PDA market, however, is still very new, and it will take a few more generations of products before it becomes clear which processors will succeed.

The V810 significantly exceeds existing offerings in both performance/dollar and performance/watt. Like ARM and Hobbit, the V810 requires an external system-logic chip set for a complete PDA design. Unlike the other vendors, however, NEC has revealed no plans to provide a standard solution for these functions, forcing PDA designers to do this work themselves. The V820, as well as the availability of an ASIC core, will simplify PDA design in the future.

The SH703x chips include many system-logic and peripheral functions on-chip, easing the burden of system design. The 703x should enable physically smaller systems, a critical factor for some handheld applications. At a system level, it should match the V810 in performance/dollar and performance/watt, surpassing other competitors on these metrics. The Hitachi chip is most appropriate for systems that take advantage of most of its on-chip peripherals and memory; for systems with different requirements, the 703x is less competitive.

The major challenge facing both NEC and Hitachi is lining up hardware and software vendors behind their new architectures. ARM, Hobbit, and Dragon all have strong system backing and at least one major PDA operating system (OS) committed to their architectures. Both NEC and Hitachi claim to have PDA design wins, and Hitachi says at least one OS is being ported to SH. The lack of an MMU will prevent the Newton OS or GO's Penpoint from running on either chip and may reduce performance in other applications.

Even if the new processors fail to gain acceptance in the general-purpose PDA market, many opportunities will be available in fixed-function handheld devices. These products ship with most or all software installed, reducing the need for a binary-compatible software market. (Sharp's Wizard is a good example.) Both the NEC and Hitachi chips are also well-suited for traditional embedded applications such as laser printers and network controllers, as well as emerging consumer video applications with high processing needs.

The PDA market remains the primary focus because of its incredible, yet unknown, potential. While five architectures seems too many for this market, it is quite possible that two or three will thrive. Today's software vendors are stressing portability; it is unlikely that new markets will follow the PC model and be restricted to a single processor architecture. Also, multiple market segments (e.g., communicators, organizers, DOS-compatible systems) may each use a different CPU. History has shown that a single big design win can "make" an architecture; Hitachi and NEC are waiting for that lightning to strike in the PDA arena. ◆

#### Price and Availability

The SH7032 and SH7034 are currently sampling; production volumes are expected in 4Q93. The 7032 (part number HD6417032F20) will cost \$33 in quantities of 25,000. The 7034 (part number HD6477034F20) will cost \$41 in the same quantity. Hitachi expects these prices to fall to \$25 and \$35, respectively, in 4Q93. For more information, request literature package #M27P001 from Hitachi America, MS-080, 2000 Sierra Point Parkway, Brisbane, CA 94005; 800/285-1601 or call your local sales office.

The V810 is currently sampling and is expected to be in production in 3Q93. The company expects high volume pricing to be about \$20 for the V810. For more information, contact Kenji Matsui of NEC Electronics at 401 Ellis Street, Mt. View, CA 94039; 415/965-6554, fax 415/965-6264.