This document was originally published in IBM's flash database as the document number indicated. This material is Copyright IBM Corporation, 1995. All rights reserved. Document ID G023542 TITLE: WSC FLASH 9522 PC SERVER 500 SYSTEM/390 PERFORMANCE PC SERVER 500 SYSTEM/390 PERFORMANCE DISCLAIMER __________ The information contained in this document has not been submitted to any formal IBM test and is distributed on an as is basis WITHOUT ANY WARRANTY EITHER EXPRESSED OR IMPLIED. The use of this information or the implementa- tion of any of these techniques is a customer responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk. Performance data contained in this document were determined in various con- trolled laboratory environments and are for reference purposes only. The results that may be obtained in other operating environments may vary signif- icantly. Users of this document should verify the applicable data for their specific environment. References in this document to IBM products, programs or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any references to an IBM licensed program in this publication is not intended to state or imply that only IBM's programs may be used. Any functionally equivalent program may be used instead. INTRODUCTION ____________ This flash contains tuning and performance details for the PC Server 500 System/390 system that augment the information provided in the document enti- tled IBM PC Server 500 S/390 ...Is it right for you? Technical Application Brief available from MKTTOOLS in PCSVR390 PACKAGE or orderable as publication number GK20-2763. The publication is not scheduled to be orderable using the publication number until general availability of the IBM PC Server 500 S/390 in July 1995. Information in this flash was developed jointly by the performance department at the Endicott Programming Lab, PC Server S/390 Competency Center in Atlanta, and S/390 Performance Design in Poughkeepsie. Questions may be directed to RHQVM15(PCSVR390) or PCSVR390@VNET.IBM.COM. Other information sources are o PC500390 Forum via TalkLink on IBMLink o PC Server S/390 Competency Center in Atlanta o PCSVR390 on RHQVM15 o Internet: PCSVR390@vnet.ibm.com o PartnerLink ID: G641530 o World Wide Web: http://www.pc.ibm.com (support) o IBM Support Family of Services The IBM PC Server S/390 system is a combination of the System/390 and PC Server architectures that provides access and use of both in a single package. While the S/390 instructions execute natively on a dedicated CMOS chip on the S/390 Microprocessor Complex, the execution of the S/390's I/O is handled by OS/2 device managers, device drivers, and S/390 channel emulation. The S/390 design point in the PC Server S/390 is unique when compared to other S/390 processors. In this implementation, S/390 devices (tapes and printers) are either channel attached (via S/370 Channel Emulator/A) or emu- lated on PC devices in a manner that is transparent to the S/390. In addi- tion to emulating the S/390 I/O, the high performance Pentium(**) processor also supports OS/2 applications and advanced local area network (LAN) functions. 
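As a rough sketch (our own simplification of the description above, not an official architecture diagram), an S/390 I/O request on this system travels a path of roughly the following shape before it reaches a physical device; the exact layering of the emulation components is an assumption on our part:

   S/390 operating system (VM/ESA, VSE/ESA, or MVS/ESA)
          |   S/390 channel program
          v
   S/390 channel emulation (OS/2 side)
          |
          v
   OS/2 device managers and device drivers
   (for example AWSCKD, AWSFBA, AWS3172, discussed later in this flash)
          |
          v
   OS/2 file system and caches, RAID adapter and its cache, LAN adapter,
   or a channel-attached S/390 device via the S/370 Channel Emulator/A

Everything below the first line runs on the Pentium processor, which is why S/390 I/O activity shows up later in this flash as Pentium utilization.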
While this mix creates an environment with many exciting options for I/O attachment and support, some S/390 capabilities are not available with this technique. For example, the I/O design point for the PC Server S/390 does not support multiple channels attached to the same physical device, a capa- bility that is exploited in other S/390 systems. Application of the PC Server S/390 to a specific customer's unique workload requires some planning to achieve good results. The contents of IBM PC Server 500 S/390 ...Is it right for you? provide the essentials for determining if there is a possible match between this system and the targeted work. This flash provides more detail that can be used to fine tune applications of this system to customer environments. HIGHLIGHTS OF PERFORMANCE AND TUNING FROM THIS FLASH ____________________________________________________ The following is a list of highlighted points that are summarized from the text of the flash: 1. Tuning and setup are important to maximizing throughput. Read the advice contained herein and do not just take defaults for everything during installation and operation. o For array models, select the stripe size that is best for your expected I/O characteristics. Base your decision on the combined OS/2 as well as S/390 read and write operations. Be aware of the func- tional differences between the AWSFBA and AWSCKD device drivers I/O. o For dedicated S/390 workloads on array models, consider your perform- ance needs as well as data integrity requirements. Write-Back caching in conjunction with an Uninterruptible Power Supply (UPS) and LAZY=OFF for OS/2 disk caches may offer the best trade-offs. 2. For VM/ESA, a key to good performance is minimizing page I/Os and reducing their impact through use of CKD paging volumes. 3. I/O capacity and system throughput can be extended a limited amount through use of additional hardware (for example additional disk drives and RAID adapters). 4. More CMS or TSO users can be supported by increasing the S/390 memory via a memory expansion card. 5. The PC Server 500 System/390 shows VSE/ESA guest (of VM/ESA) to native CPU usage ratios that are similar to those seen on mainframes. 6. OS/2 and S/390 workloads can coexist, but expect performance interactions because the workloads will share a number of server resources. 7. The PC Server 500 System/390 should be considered to be a server with many S/390 capabilities as opposed to a small mainframe. To size capacity, look at I/O rates and characteristics first, then real storage requirements, then processor capacity. SYSTEM HARDWARE USED IN PERFORMANCE MEASUREMENTS ________________________________________________ Two different server hardware configurations were used in the laboratory per- formance measurements for VM/ESA and VSE/ESA. The hardware descriptions are in Table 1. +----------------------------------------------------------------+ | Table 1. Server hardware description for performance runs. 
| +------------------------+-------------------+-------------------+ | | SYSTEM 1 | SYSTEM 2 | +------------------------+-------------------+-------------------+ | Hard Drives | 7 | 3 | +------------------------+-------------------+-------------------+ | Approx GB/Drive | 1G Fast/Wide | 2.25G Fast/Wide | +------------------------+-------------------+-------------------+ | Average seek time (ms) | 8.6 | 7.5 | +------------------------+-------------------+-------------------+ | Average latency (ms) | 5.56 | 4.17 | +------------------------+-------------------+-------------------+ | Rotational speed (RPM) | 5400 | 7200 | +------------------------+-------------------+-------------------+ | Array Stripe Width | 8 | 64 | | (KB) | | | +------------------------+-------------------+-------------------+ | Channels used on RAID | 2 | 1 | | adapter | | | +------------------------+-------------------+-------------------+ | Logical drive types | RAID-5 | RAID-5 | +------------------------+-------------------+-------------------+ | Logical drives per | 2 | 2 | | array | | | +------------------------+-------------------+-------------------+ | Partitions per logical | 1 | 1 | | drive | | | +------------------------+-------------------+-------------------+ | Format type | HPFS | HPFS | +------------------------+-------------------+-------------------+ | OS/2 level | V2.11 CSD XR06200 | V3 Warp Fullpack | | | | GA | +------------------------+-------------------+-------------------+ | LAN Adapter | LanStreamer MC 32 | AutoLAN Streamer | | | | MC 32 | +------------------------+-------------------+-------------------+ | 390 Licensed Internal | 12/94 | 3/95 | | Code level | | | +------------------------+-------------------+-------------------+ | SCSI Adapter | F/W Streaming | F/W Streaming | | | RAID Adapter/A | RAID Adapter/A | +------------------------+-------------------+-------------------+ | NOTE: Seek, latency, and RPM specifications are advertised | | values for individual drives in the array and were not meas- | | ured here. | +----------------------------------------------------------------+ TPNS CONFIGURATION FOR CMS WORKLOADS ____________________________________ TPNS measurements were obtained by utilizing the AWS3172 device driver and establishing a VTAM 3172 XCA connection across an isolated Token Ring LAN between a PC Server 500 S/390 running VM/ESA ESA Feature with VTAM and a second server running VM/ESA 370 Feature with VTAM and IBM's Teleprocessing Network Simulator product (TPNS). TPNS simulates actual VTAM cross domain logon sessions to the target system. The simulated VTAM sessions logged onto the VSCS APPL on the target system. CMS users were then logged onto on the measured system, and the users ran their scripts with TPNS measuring the end user response time through the VTAM network. Thus, the CMS workload measure- ments also stressed the LAN adapter on the server as the VTAM data streams flowed back and forth across the Token Ring. The setup is pictorially repres- ented as follows: *-----------------* | PC Server S/390 | *-------* | VM/ESA ESA |---------| 8228 | Isolated | VTAM/VSCS | | M.A.U.| Token Ring | | *-------* *-----------------* | Measured System | | | | *---------------------* | PC Server | | VM/ESA 370 Feature | | VTAM | | TPNS | *---------------------* TPNS Driver System MEASUREMENT RESULTS ___________________ This section contains a summary of the performance measurement results that have been obtained for VM/ESA, VM/ESA + LAN File Serving, VSE/ESA, and MVS/ESA. 
VM/ESA: NUMBER OF CMS USERS ___________________________ Several measurements were obtained using the FS8F workload to get an under- standing of how many CMS users can be supported in various S/390 storage sizes, while maintaining an average response time of (approximately) one second. The results are summarized in Table 2. FS8F represents a CMS program development environment. It includes a wide range of CMS and Xedit commands along with assembler, COBOL, FORTRAN, and PL/I compiles and program executions. Users are simulated as TPNS scripts. The average think time of 26 seconds and the think time distribution are intended to reflect the average characteristics of a logged-on CMS user. The minidisk-only version of FS8F was used for these measurements. See Appendix A of VM/ESA Release 2.2 Performance Report, GC24-5673-01, for further infor- _____________________________________ mation. The following conditions applied to all three measurements: o dedicated The Server 500 was dedicated to processing the CMS workload. That is, there was no other activity coming in over the LAN during the measurement period. o HPFS write caching (lazy on) o no RAID adapter write caching (write-through) o VM/ESA Version 1 Release 2.2 o emulated DASD volumes: 2 system volumes (9336) 3 page volumes (3380) 2 spool volumes (9336) 3 minidisk volumes (9336) 2 t-disk volumes (9336) o 16KB CP trace table o STORBUF 300 200 100 o LDUBUF 600 300 100 In the following table and elsewhere in this section, System 1 and System 2 refer to the two measurement configurations described in Table 1. +----------------------------------------------------------+ | Table 2. PC Server 500 S/390 Performance - FS8F CMS | | Workload | +-------------------------+----------+----------+----------+ | RUN ID | PC7E5075 | PC7E5105 | PC7E5190 | +-------------------------+----------+----------+----------+ | Variables | | | | | Configuration | System 2 | System 1 | System 2 | | Real Storage Total | 32MB | 128MB | 128MB | | Real Storage Used | 32MB | 64MB | 128MB | | Users | 70 | 100 | 190 | | MDC BIAS | 0.1 | 0.5 | 0.2 | +-------------------------+----------+----------+----------+ | Response Time (sec) | | | | | Trivial (R) | 0.50 | 0.52 | 0.30 | | Average (T) | 0.99 | 1.33 | 1.02 | +-------------------------+----------+----------+----------+ | Throughput | | | | | ETR (T) | 2.44 | 3.44 | 6.57 | +-------------------------+----------+----------+----------+ | S/390 CPU Usage (msec) | | | | | CPU/CMD (V) | 154 | 136 | 122 | | CP/CMD (V) | 52 | 49 | 36 | | EMUL/CMD (V) | 102 | 87 | 86 | +-------------------------+----------+----------+----------+ | S/390 Utilization | | | | | TOTAL (V) | 37.5 | 46.8 | 80.3 | +-------------------------+----------+----------+----------+ | Paging | | | | | PAGE IO RATE (V) | 7.3 | 6.3 | 4.8 | | PAGE/CMD | 18.1 | 13.4 | 5.6 | | PAGE IO/CMD (V) | 3.0 | 1.8 | 0.7 | | PGBLPGS/USER (R) | 100 | 150 | 163 | +-------------------------+----------+----------+----------+ | I/O | | | | | RIO RATE (V) | 23 | 26 | 39 | | MDC REAL, MB (R) | 1.8 | 10.1 | 13.7 | | MDC HIT RATIO (R) | 0.69 | 0.83 | 0.86 | +-------------------------+----------+----------+----------+ | SPM2 | | | | | Pentium Util (S) | 54.5 | 77.1 | 66.5 | | I/O Req/sec (S) | 49 | 43 | 61 | | HPFS Hit % (S) | 20 | 17 | 25 | +-------------------------+----------+----------+----------+ | NOTE: T=TPNS, V=VMPRF, S=SPM2, R=RTM | +----------------------------------------------------------+ The results show that, for this workload, the number of CMS users that can be supported is 
mostly determined by the amount of S/390 memory that is available. Contention
for the S/390 processor is not a significant factor until 128MB are made
available, at which point S/390 processor utilization rises to 80%.

VM/ESA performs better on S/390 when steps are taken to minimize the amount
of page I/O that the system has to do. Page I/Os are expensive because each
I/O typically reads or writes multiple 4K pages. In addition, VM/ESA's block
paging mechanism is optimized for traditional mainframe DASD and therefore
does not work as well with the device emulation used here. We reduced the
page I/O rate by taking the following tuning actions:

o The minidisk cache BIAS parameter was used to reduce the amount of real
  storage that was used for minidisk caching. This left more real storage
  available to reduce paging.

o A small CP trace table was used.

We also found that the response time impact of VM/ESA page I/O tends to be
reduced when emulated CKD devices are used as paging volumes and multiple
page volumes are defined. This does, however, tend to result in higher
Pentium utilizations relative to using FBA page volumes.

These measurements were done with write caching in effect. An additional
measurement (not shown) was done without write caching. The results showed
that the number of CMS users had to be reduced by 30% in order to maintain
1-second average response time. See section "VSE/ESA: CICS transactions" for
additional discussion of write caching.

The Pentium utilization arises from handling the S/390 I/O requests. As this
utilization increases, contention for the Pentium will cause I/O service
times (as seen by the S/390) to increase. The Pentium, then, is one of the
resources that can limit the S/390 I/O rate that can be sustained while still
providing acceptable response times.

Note that the OS/2 I/O rate (as reported by SPM2) is higher than the S/390
I/O rate (as reported by VMPRF). This is because the S/390 I/O emulation
code will sometimes split one S/390 I/O request into multiple OS/2 I/O
requests.

The 64MB results should be viewed as approximate because 1) the measurement
was done on a system with 128MB with VM/ESA generated to use only 64MB and
2) the measurement was done using the System 1 configuration. We expect that
a measurement done with the System 2 configuration and 64MB would be similar,
but with a lower average response time (about 1 second) and a lower Pentium
utilization.

COMBINED VM/ESA AND LAN FILE SERVER ACTIVITY
____________________________________________

The three measurements summarized in Table 3 illustrate the interactions that
can occur when VM/CMS work on the S/390 is combined with LAN file serving
work. The first two measurements show the effects of doing each type of work
alone in a dedicated manner and the third shows the effects of running the
two workloads simultaneously.

The following configuration and set-up parameters apply to all three
measurements:

o System configuration 1
o 32MB of S/390 storage
o 17MB HPFS386 cache
o LAZY writes ON specified for the HPFS386 cache
o No RAID adapter write caching (write-through (WT))
o VM set-up is the same as that used for the previous CMS measurements, as
  described in section "VM/ESA: number of CMS users."

Except for the size of the HPFS386 cache (which was changed to 17MB), OS/2
LAN Server was not tuned. The VM system was tuned in the same manner as all
other runs. These tuning parameters may not be optimal for the combined
environment.
No significance should be assigned to the relative amount of work that was being run in each of the dedicated runs. There will be different relative amounts of work for each customer environment with potential for combined work. However, since each workload contends for many of the same resources, it is expected that all environments with combined work will show some degree of interaction. The amount of interaction will be dependent on many factors such as I/O rates, cache sizes, storage size, and the specific type of and of amount work being run in each environment. The table shows that the combination of the two workloads caused the response time of the CMS users to increase by about 0.4 sec while the response time to the LAN clients increased by about 7% over their respective dedicated values. The Pentium utilization per I/O is much less for the LAN environment as com- pared to the S/390 I/O support and is mainly due to the processor cycles required for the S/390 I/O emulation. The CMS environment was impacted more than the LAN work by the relatively higher demand that the LAN serving work placed on the disk adapter/array. In this respect the arbitrary amount of LAN work chosen for the measurement was more intense than the CMS work. +----------------------------------------------------------+ | Table 3. PC Server 500 Performance: CMS + LAN Server | | Activity | +-------------------------+----------+----------+----------+ | RUN ID | LANONLY | PC7E5043 | PC7E5041 | +-------------------------+----------+----------+----------+ | Variables | | | | | CMS Users | 0 | 40 | 40 | | MDC BIAS | na | 0.2 | 0.2 | +-------------------------+----------+----------+----------+ | File Serve | | | | | seconds/MB | 1.49 | na | 1.59 | +-------------------------+----------+----------+----------+ | S/390 (CMS) | | | | | Response Time (sec) | | | | | Trivial (R) | na | 0.37 | 0.39 | | Average (T) | na | 0.77 | 1.20 | | Throughput | | | | | ETR (T) | na | 1.40 | 1.38 | | CPU Usage (msec) | | | | | CPU/CMD (V) | na | 156 | 164 | | Processor Util. | | | | | TOTAL (V) | na | 21.9 | 22.6 | +-------------------------+----------+----------+----------+ | SPM2 | | | | | Pentium Util (S) | 17.5 | 40.2 | 60.5 | | I/O Req/sec (S) | 70 | 20 | 87 | | Disk Util (S) | 39.1 | 13.5 | 54.1 | +-------------------------+----------+----------+----------+ | NOTE: T=TPNS, V=VMPRF, S=SPM2, R=RTM | +----------------------------------------------------------+ VSE/ESA: CICS TRANSACTIONS __________________________ A number of measurements were obtained for VSE/ESA native and VM/ESA guest environments using the VSECICS workload to get an idea of how much CICS on VSE transaction processing can be done on the PC Server 500 S/390 under various conditions. A subset of these results is summarized in Table 4. The VSECICS workload consists of seven CICS applications, written in COBOL and assembler, which include order entry, receiving and stock control, inven- tory tracking, production specification, banking, and hotel reservations. These applications invoke a total of 17 transactions averaging approximately 6 VSAM calls (resulting in about 3 DASD I/Os) and 2 communication calls per transaction. Terminals are simulated by an internal driver tool. See Appendix A of VM/ESA Release 2.2 Performance Report, GC24-5673-01, for further information. 
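As a rough consistency check on these per-transaction characteristics (our own arithmetic, using the native-mode figures reported in Table 4 below):

   native throughput (ETR)            approx. 11.1 transactions per second
   native DASD I/O rate (RIO RATE)    approx. 35.4 I/Os per second

   35.4 / 11.1  =  approx. 3.2 DASD I/Os per transaction

which agrees with the "about 3 DASD I/Os" per transaction described above.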
Except where noted in the results tables, the following conditions apply to all the VSECICS measurements: o dedicated o configuration: System 1 o S/390 real storage size: 32MB o 225 users o 15 second think time o VSE/ESA 1.3.2 o emulated DASD volumes for VSE: 2 system volumes (9336) 10 CICS volumes (9336) For the guest measurements: o VM/ESA Version 1 Release 2.2 o 2 system volumes (9336) o CCWTRANS OFF (V=R run) o 23MB V=R area (V=R run) +----------------------------------------------------------+ | Table 4. PC Server 500 S/390 Performance - VSECICS Work- | | load | +-------------------------+----------+----------+----------+ | RUN ID | NAT390V3 | VR1390V1 | VV1390V3 | +-------------------------+----------+----------+----------+ | Variables | | | | | Mode | native | V=R | V=V | +-------------------------+----------+----------+----------+ | Response Time, sec (C) | 0.52 | 0.53 | 0.55 | +-------------------------+----------+----------+----------+ | Throughput | | | | | ETR (C) | 11.11 | 9.87 | 8.87 | +-------------------------+----------+----------+----------+ | S/390 CPU Usage (msec) | | | | | CPU/CMD (E,V,V) | 64.9 | 71.3 | 80.2 | | Relative CPU/CMD (E,V,|) 1.00 | 1.10 | 1.24 | +-------------------------+----------+----------+----------+ | Processor Utilitzations | | | | | S/390 Util (E,V,V) | 72.1 | 70.4 | 71.1 | | Pentium Util (S) | 42.8 | 37.1 | 32.7 | +-------------------------+----------+----------+----------+ | I/O | | | | | RIO RATE (E) | 35.4 | 31.1 | 27.3 | | Service Time, msec (E)| 32 | 31 | 32 | +-------------------------+----------+----------+----------+ | NOTE: C=CICSPARS, E=EXPLORE(**), V=VMPRF, S=SPM2 | +----------------------------------------------------------+ These measurements were done with 32MB of S/390 memory. This was more than adequate for this workload and there was negligible paging. These results show guest-to-native CPU usage ratios that are similar to those observed on mainframe processors for this workload. The guest-to-native ratio is influenced by the I/O content of the workload. Workloads that are more I/O-intensive will tend to show higher (less favorable) CPU usage ratios, while workloads that are less I/O-intensive will tend to show lower CPU usage ratios. Regarding the V=R case, the PC Server 500 S/390 does not provide I/O passthru (SIE assist). It does, however, support the CCWTRANS OFF optimization. These measurements were obtained using the System 1 configuration. The System 2 configuration would have yielded somewhat lower response times (faster DASD, larger array stripe width) and lower Pentium utilizations (the 3/95 390 LIC level does more efficient S/390 I/O emulation). The default write cache settings were used for these measurements. That is, write caching was specified for OS/2's HPFS cache (lazy on) while write caching was not specified for the RAID adapter cache (write-through). An additional measurement (not shown) was obtained with write caching done in both the HPFS cache and the RAID adapter cache (lazy on, write-back). This showed little or no additional improvement relative to lazy on, write- through. Another measurement was taken with write caching done only by the RAID adapter cache (lazy off, write-back). This yielded very similar results to the lazy on, write-back results shown above. 
Taken together, these findings indicate that 1) write caching is equally effective at improving performance, whether done in the HPFS cache or the RAID adapter cache and 2) it is only necessary to do write caching in one of these two caches to achieve all or most of the write caching benefits. As discussed in the tuning section, the safest way to do write caching is to use the RAID adapter's cache configured for write-back with the system pro- tected by an uninterruptible power supply with software caching by OS/2 turned off (LAZY=OFF). The pair of measurements summarized in Table 5 illustrate the performance benefits of write caching. +----------------------------------------------------------+ | Table 5. VSECICS Workload - Write Caching Benefits | +-------------------------------+-------------+------------+ | RUN ID | VR1390V6 | VR1390V1 | +-------------------------------+-------------+------------+ | Variables | | | | Lazy Write | OFF | ON | | Users | 140 | 225 | | Mode | V=R | V=R | +-------------------------------+-------------+------------+ | Response Time, sec (C) | 0.94 | 0.53 | +-------------------------------+-------------+------------+ | Throughput | | | | ETR (C) | 6.75 | 9.87 | | ETRR (C) | 1.00 | 1.46 | +-------------------------------+-------------+------------+ | S/390 CPU Usage (msec) | | | | CPU/CMD (V) | 73.3 | 71.3 | +-------------------------------+-------------+------------+ | Processor Utilitzations | | | | S/390 Util (V) | 49.5 | 70.4 | | Pentium Util (S) | 27.0 | 37.1 | +-------------------------------+-------------+------------+ | I/O | | | | RIO RATE (E) | 24e | 31.1 | | Service Time, msec (E) | na | 31 | +-------------------------------+-------------+------------+ | NOTE: C=CICSPARS, E=EXPLORE, V=VMPRF, S=SPM2, | | e=estimated | +----------------------------------------------------------+ When write caching was used, response time improved by 0.4 seconds even though the number of users was increased from 140 to 225. The VSECICS workload has an exceptionally low read/write ratio (0.6). Work- loads having higher read/write ratios would receive correspondingly less benefit from the use of write caching. For this workload, when write caching is used, system capacity is gated by S/390 processor speed. When write caching is not used, system capacity (given the 1-second response time goal) is gated by the I/O subsystem. The write caching measurement was done using HPFS write caching (lazy on). Similar results have been observed with RAID adapter write caching (write- back). SPLITTING I/O ACROSS MULTIPLE ARRAYS ____________________________________ The total disk I/O throughput on this platform can be significantly improved by paying particular attention to the array and channel configuration of the PC Server S/390. Just as in the mainframe world, spreading I/O operations over multiple drives and S/390 channels improves potential throughput. This is accomplished on the PC Server S/390 by creating multiple arrays and spreading the data across multiple RAID adapter channels. The IBM SCSI-2 F/W Streaming-RAID Adapter/A has two SCSI channel connectors on the adapter and 4MB of cache of which more than 3MB are available for caching. The cache may be configured either Write Through or Write Back. Laboratory measurements were performed in three different I/O configurations using a VSE batch workload. 
o I/O Config 1 - Single adapter with one channel to one array o I/O Config 2 - Single adapter with two arrays (one on each channel) o I/O Config 3 - Two adapters with one channel from each having a single array. Each array contained three 2.25G drives configured into an array with a RAID-5 logical drive. All S/390 DASD volumes in all three configurations were emulated FBA devices. +----------------------------------------------------+ | Table 6. Comparison of utilizing multiple SCSI | | channels and adapters. | +-------------------+----------+----------+----------+ | | I/O | I/O | I/O | | | CONFIG 1 | CONFIG 2 | CONFIG 3 | +-------------------+----------+----------+----------+ | RAID adapters | 1 | 1 | 2 | +-------------------+----------+----------+----------+ | Channels used per | 1 | 2 | 1 | | adapter | | | | +-------------------+----------+----------+----------+ | RAID arrays | 1 | 2 | 2 | +-------------------+----------+----------+----------+ | I/Os per second | 61 | 80 | 98 | +-------------------+----------+----------+----------+ | % increase in | n/a | 31 | 61 | | I/Os from Config | | | | | 1 | | | | +-------------------+----------+----------+----------+ | Pentium utiliza- | 55 | 66 | 81 | | tion | | | | +-------------------+----------+----------+----------+ In I/O Config 2, the data volumes were spread across two arrays and two chan- nels, the I/O overlap increased and allowed an increase in I/Os per second executed by 31%. Taking this a step further, the same workload was run with two arrays and two RAID adapters (one channel each) and achieved 61% improve- ment in I/O rates over the base configuration (one array and one channel). Adding the second adapter provided not only the additional channel, but also an additional 3MB in write back cache. The OS/2 files containing the S/390 DASD volumes that were placed on the second array were chosen so that the I/O in the workload was balanced across the arrays. This action is comparable to tuning by spreading the I/Os across mainframe DASD volumes and channels. Pentium capacity limits the extent to which such extensions to the I/O sub- system can increase I/O capacity. For the I/O Config 3 case, Pentium utili- zation has risen to 81%, indicating that Pentium capacity is coming close to being the limiting resource. Caution: The I/O rates shown in this section were obtained using a batch workload. The I/O rates that can be achieved by interactive workloads, while maintaining adequate response times, will typically be much lower. It should be noted that occupying Bank D of the PC Server 500 requires installation of a DASD backplane (into which the hot-pluggable drives plug) and also the optional 220-watt power supply. The optional power supply pro- vides power for Banks D and E. MVS/ESA PERFORMANCE ___________________ The performance data in the table below provides guidance to help you deter- mine if your dedicated MVS production online workloads will fit on the PC Server 500 System/390. The table includes several online environments and their key characteristics. Due to the inherent I/O content of these work- loads, the disk I/O rate becomes a key factor to consider as you evaluate the potential use of the PC Server 500 System/390 in your business. The data in the table is based on a PC Server 500 System/390 system that has 128M of S/390 storage, 2 bays of 2GB disk drives containing 11 drives total. The drives are configured into two arrays with 1 logical drive each and RAID level 5 was specified. 
Each logical drive contained 10 3380 (various densities) equivalents, and the
3380s were loaded with the workload components so that the I/O rates to each
logical drive were no more than 60 percent of the total I/O rate. A smaller
system with a single array on a single channel will handle about half the
I/Os and users shown.

All workloads have LAZY set to off for the HPFS cache, and CICS also has the
RAID adapter cache set to write-through. IMS and DB2 have the RAID adapter
cache set to write-back, thus allowing disk writes to be cached for these two
workloads.

+---------------------------------------------------------------------------+
| Table 7. MVS Database Workloads                                            |
+-----------+------------+------------+------------+------------+-----------+
| WORKLOAD  | USERS/     | THINK TIME | RESPONSE   | I/OS PER   | # OF DISK |
| TYPE      | TERMINALS  | (SECONDS)  | TIME       | SECOND     | ARRAYS    |
|           |            |            | (SECONDS)  |            |           |
+-----------+------------+------------+------------+------------+-----------+
| IMS/DL1   | 210        | 11         | 1          | 50         | 2         |
+-----------+------------+------------+------------+------------+-----------+
| IMS/DB2   | 30         | 4          | 1          | 50         | 2         |
+-----------+------------+------------+------------+------------+-----------+
| CICS      | 140        | 12         | 1          | 40         | 2         |
+-----------+------------+------------+------------+------------+-----------+

While the internal response times achieved by the PC Server 500 System/390
are within the specified limits, other S/390 processors, due to their
standard I/O design point, typically yield lower response times. Since
internal response time is one of several factors that contribute to end user
response time, there may be some instances where end user times are longer
than those achieved by other S/390 processors. There are many instances,
particularly in remote applications, where use of the PC Server 500
System/390 can eliminate or reduce other time components to yield a net
improvement in overall response time.

Interactive users for these MVS workloads are simulated with an IBM internal
driver tool and not with TPNS.

The IBM internal IMS workload consists of light to moderate transactions
covering diverse business functions, including order entry, stock control,
inventory tracking, production specification, hotel reservations, banking,
and teller systems. These applications are similar to the CICS applications
but contain IMS functions such as logging and recovery. The IMS workload
contains sets of 17 unique transactions, each using a different database.
The workload uses both VSAM and OSAM databases with VSAM primary and
secondary indexes.

The DB2 workload consists of light to moderate transactions from two defined
and well-structured applications -- inventory tracking and stock control.
IMS/DC is used as the transaction manager. The applications are functionally
similar, but not identical, to two of the IMS/DL1 and CICS applications. The
DB2 workload contains seven unique transactions. Conversational and
wait-for-input transactions are not included in the DB2 workload.

The CICS workload consists of light to moderate transactions from many of the
same applications mentioned for the IMS work. The CICS applications are
written in COBOL or assembler and are functionally similar, but not
identical, to the applications used in the IMS workload, and they use VSAM
datasets only. There are six sets of 17 unique transactions.

For MVS/TSO performance data, please refer to IBM PC Server 500 S/390 ... Is
it right for you? Technical Application Brief, referenced in the Introduction
section of this document.
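One rough way to use Table 7 when sizing your own workload (our own back-of-the-envelope arithmetic, not part of the measured data): dividing the number of users by the sum of think time and response time approximates the transaction arrival rate, which can then be related to the disk I/O rate in the table. For the CICS row:

   arrival rate               =  140 users / (12 + 1) seconds
                              =  approx. 11 transactions per second
   disk I/O rate              =  40 I/Os per second
   implied disk I/O content   =  40 / 11  =  approx. 3 to 4 I/Os per transaction

Comparable arithmetic with your own user population, think times, and per-transaction I/O content gives a first-order check against the I/O rates in the table, consistent with the earlier advice to size by I/O rates and characteristics first.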
PERFORMANCE TUNING HINTS AND TIPS
_________________________________

This section contains performance and tuning hints that were gathered during
laboratory measurements and related experience. It is intended to give
information that may be useful in planning the installation and tuning of a
PC Server 500 System/390.

PC SERVER 500 SYSTEM/390 ARRAY CONSIDERATIONS
_____________________________________________

ARRAY STRIPE UNIT SIZE

On array models of the PC Server 500 System/390, the customer sets the stripe
unit size (the amount of data written on a given disk before writing on the
next disk). The default stripe unit size is 8K. Choices are 8K, 16K, 32K,
and 64K.

Sizes larger than 8K will probably yield better performance for S/390
workloads than the default 8K. Also consider the I/O characteristics of any
other OS/2 applications that you may run concurrently on the PC Server 500
System/390 when choosing a stripe unit size. For example, larger stripe
sizes may not be the best performing choice for LAN file serving workloads.
A compromise between larger and smaller stripe sizes might be in order,
depending on the overall system I/O characteristics.

WARNING: Once the stripe unit is chosen and data is stored in the logical
drives, the stripe unit cannot be changed without destroying data in the
logical drives.

WRITE POLICY

There are two choices of write policy for the RAID adapter. The default
write policy is write-through (WT), where the completion status is sent after
the data is written to the hard disk drive. To improve performance, you can
change this write policy to write-back (WB), where the completion status is
sent after the data is copied to the RAID adapter's cache memory, but before
the data is actually written to the storage device. There is 4MB of cache
memory, of which more than 3MB are available for caching data.

WARNING: If you lose power before the data is actually written to the
storage device, the data in cache memory is lost. See also section "LAZY
writes" for related information.

You can achieve a performance improvement by using WB, but you run a far
greater risk of data loss in the event of a power loss. An uninterruptible
power supply (UPS) can help minimize this risk and is highly recommended for
this reason and for the other power protection benefits it supplies as well.

OS/2 HPFS CACHE
_______________

BASE OS/2 SYSTEM HPFS CACHE SIZE

The HPFS.IFS device driver delivered with OS/2 has a maximum cache size of
2048K (2 megabytes). The /CACHE:NNNN parameter of the IFS device driver
specifies the size of the cache. The default is 10% of available RAM (if not
specified), with a maximum of 2048K. The value specified after an install of
OS/2 depends on the RAM installed at the time of installation. If you are
using the standard OS/2-provided IFS device driver, then specifying
/CACHE:2048 is highly recommended. Enter HELP HPFS.IFS at the OS/2 command
prompt for further explanation of the parameters.

/CRECL ON IFS HPFS CACHE

The /CRECL parameter of the HPFS IFS driver allows you to specify the size of
the largest record eligible for this cache. The OS/2 default is 4K. From an
S/390 perspective, increasing this value may increase cache read hits if the
S/390 operating system is performing repetitive I/Os of the same data in
blocks bigger than the default 4K. You can use performance analysis tools
for each S/390 operating system to understand the characteristics of the I/Os
that are being performed by the S/390 operating system and applications.
OS/2 performance tools like IBM's SPM/2 V2 can also assist in tuning the /CRECL value. Enter HELP HPFS.IFS at the OS/2 command prompt for further explanation of the parameters. OS/2 LANSERVER 4.0 HPFS386 CACHE OS/2 LanServer Advanced provides its own installable file system named HPFS386. HPFS386 provides the ability to specify caches larger than 2M. If you are installing OS/2 LanServer Advanced on the PC Server 500 System/390, then tuning this cache must be done also. Refer to the LanServer Advanced documentation for information on tuning this cache. It is similar to tuning the base OS/2 provided HPFS cache, but is done with a file named HPFS386.INI. LAZY WRITES Lazy writes are defaulted to ON with OS/2's HPFS. If lazy writes are enabled then when a write occurs for a block of data that is eligible for the HPFS cache, the application is given completion status before the data is actually written to the hard drive. The data is actually written to the hard drive during idle time or when the maximum age for the data is reached. Lazy writes are a SIGNIFICANT performance enhancement, especially on non-array models of the PC Server 500 System/390 where there may be no hardware caching on the SCSI adapter. WARNING: There is a risk to the data in the event of an OS/2 software failure or power loss before the data is written from the cache to the hard drive. See section "Write policy" for related information. You can control whether lazy writes are enabled or not with the OS/2 CACHE command (or the CACHE386 command if using HPFS386) as well as maximum age and idle times for the disk and cache buffers. Enter HELP CACHE at the OS/2 command prompt for further information. (Enter CACHE386 ? for help with CACHE386.) OS/2 FAT CACHE ______________ LAZY WRITES Lazy writes are defaulted to ON with OS/2's FAT DISKCACHE. If lazy writes are enabled then when a write occurs for a block of data that is eligible for the FAT cache, the application is given completion status before the data is actually written to the hard drive. The data is actually written to the hard drive during idle time or when the maximum age for the data is reached. Lazy writes are a SIGNIFICANT performance enhancement, especially on non-array models of the PC Server 500 System/390 where there may be no hardware caching on the SCSI adapter. WARNING: There is a risk to the data in the event of a OS/2 software failure or power loss before the data is written from the cache to the hard drive. See section "Write policy" for related information. You can control whether or not lazy writes occur for the FAT cache with parameters on the DISKCACHE= statement in CONFIG.SYS. Enter HELP DISKCACHE at the OS/2 command prompt for more information on DISKCACHE parameters. OS/2 CONFIG.SYS TUNING ______________________ MAXWAIT MAXWAIT in CONFIG.SYS defines the number of seconds that an OS/2 thread waits before being assigned a higher dispatching priority. Applications that are I/O intensive could benefit from setting MAXWAIT=1 in CONFIG.SYS. Since the S/390 operating system running on the PC Server 500 System/390 is likely to be I/O intensive, setting MAXWAIT=1 is generally recommended on the PC Server 500 System/390. The valid ranges for MAXWAIT are 1 to 255. The OS/2 default is 3 seconds. Tuning this setting may only show results when there is other OS/2 work being performed in addition to the S/390 workload. 
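Pulling the preceding items together, the following CONFIG.SYS fragment shows one way these settings might look. It is a sketch only: the cache sizes, /CRECL value, and drive letter are illustrative assumptions rather than recommendations, and the exact parameter syntax for your level of OS/2 should be confirmed with HELP HPFS.IFS, HELP DISKCACHE, and HELP CACHE (or CACHE386 ?).

   REM --- CONFIG.SYS fragment (illustrative values only) ---
   IFS=C:\OS2\HPFS.IFS /CACHE:2048 /CRECL:16 /AUTOCHECK:C
   REM   2048K is the maximum cache for the base HPFS.IFS driver; /CRECL is
   REM   raised above the 4K default on the assumption that the S/390 guest
   REM   re-reads blocks larger than 4K
   DISKCACHE=1024,LW
   REM   FAT cache with lazy writes; only relevant if FAT partitions exist
   MAXWAIT=1
   REM   boost a waiting thread's priority after 1 second instead of the
   REM   3-second default, as discussed above

Lazy writes for the HPFS caches can also be switched at run time with the commands named above, for example CACHE /LAZY:ON for the base HPFS cache or CACHE386 /LAZY:ON for HPFS386 (option spelling as we recall it; the built-in help shows the exact syntax). Keep in mind the data-integrity trade-offs discussed under "LAZY writes" and "Write policy" before enabling any form of write caching.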
FAT DISKCACHE If your PC Server 500 System/390 has no FAT formatted partitions, then the DISKCACHE= device driver can be commented out (REM) of the PC Server 500 System/390's CONFIG.SYS in order to save some memory. By default, OS/2 places this device driver in CONFIG.SYS. The size of the DISKCACHE may be tuned. Enter HELP DISKCACHE for information on the parameters that may be specified on DISKCACHE. PRIORITY_DISK_IO This command in the CONFIG.SYS file controls whether or not an application running in the foreground of the OS/2 desktop receives priority for its disk accesses over an application running in the background. Because the S/390 operating system is probably serving multiple clients accessing the system over LAN or other communication methods, you would not want users of the S/390 operating system to receive lower priority for the S/390 I/Os in the event someone opens an OS/2 application or window in the foreground. Specifying PRIORITY_DISK_IO=NO is recommended. NO specifies that all appli- cations (foreground and background) are to be treated equally with regard to disk access. The default is YES. YES specifies that applications running in the foreground are to receive priority for disk access over applications running in the background. S/390 DASD DEVICE DRIVERS _________________________ FUNCTIONAL DIFFERENCES The AWSCKD device driver has some functional differences when compared with the AWSFBA device driver. The AWSCKD device driver reads and writes a full track when an I/O is performed. The device driver has an internal cache where the track is kept until it must be flushed. As the AWSFBA device driver does not implement an internal cache, the performance characteristics between the two can be different depending upon the I/O workload. VM/ESA ESA Feature's block paging methodology seemed to benefit from the internal cache of the AWSCKD device driver in controlled laboratory experiments. You should con- sider using 3380 volumes for VM/ESA ESA Feature paging volumes for this reason. You should not generalize this observation into a statement that AWSCKD per- forms better than AWSFBA. In fact, AWSFBA DASD volumes performed extremely well in laboratory experiments and offer some benefits over AWSCKD including finer granularity on OS/2 file allocation sizes, less Pentium time to handle S/390 I/Os, and a close mapping to the underlying sectors of the dasd media. VM/ESA and VSE/ESA utilize FBA DASD in a very efficient manner. The flexi- bility of the PC Server 500 System/390 in supporting both CKD and FBA emu- lated volumes in a mixture allows you to easily have both types in your configuration. LAPS TUNING ___________ Newer technology LAN adapters such as IBM's Streamer family are highly recom- mended for maximizing the communications throughput of the PC Server 500 System/390. XMITBUFSIZE IF USING THE IBM 16/4 TOKEN RING ADAPTER/A Information in this section is specific to the named adapter and does not apply to the "Streamer" family of IBM LAN adapters. The value for XMITBUFSIZE (Transmit Buffer Size) is a tuneable value for this adapter card. The default value used for IBM's 16/4 Token Ring Adapter/A may be a poor choice if you are using VTAM for subarea communications between two VTAM subareas. When performing full screen operations such as XEDIT under VM/CMS, the buffer used by VTAM will exceed the XMITBUFSIZE size specified in PROTOCOL.INI and cause segmentation. 
For example, when using a 16/4 Token Ring Adapter/A in a laboratory environment, multi-second response time was observed while scrolling in XEDIT when logged on via a cross-domain VTAM session from one PC Server 500 System/390 to another. Increasing the value of XMITBUFSIZE so that it was more than the VTAM RUSIZE restored response time to its expected sub-second value. A rule of thumb for tuning XMITBUFSIZE: z = (VTAM RUSIZE in bytes) + 9 + 40 minimum XMITBUFSIZE = Round-to-next-highest-multiple-of-eight( z ) where the "9" is the nine bytes for the transmit header and the request header, and the "40" is some extra to give a little room for bytes that may not be accounted for at this time. Note that there are different maximums for XMITBUFSIZE depending on whether your token ring is a 4Mbit or 16Mbit ring. For example, the maximum size of XMITBUFSIZE for the IBM 16/4 Token Ring Adapter/A on a 4Mbit ring is 4456. Other older adapters have limits that are smaller still for 4Mbit rings. It should also be noted that in this particular situation, when REMOTE was set ON under CMS/XEDIT, data compression performed by CMS for fullscreen I/O also restored sub-second response time. This indicates the continued value of this virtual machine setting in tuning for VTAM use in a VM environment. VM/ESA TUNING _____________ From a VM/ESA perspective, the same kinds of VM tuning that are done on larger systems apply to VM/ESA when running on the PC Server 500 System/390. While this discussion is not meant to be an exhaustive review, some of the more important CMS tuning actions that affect system-wide performance are discussed here. Most of the discussion items concern eliminating paging I/O by sharing pages in S/390 memory whenever possible. Page counts in the bene- fits below are from VM/ESA 1.2.2. CHECKLIST OF COMMON SYSTEM PAGE REDUCTION ITEMS ITEM BENEFIT VMLIB/VMMTLIB SHARED IN A SEGMENT _________________________________ All CMS users share 208 pages in real memory. CMSQRYL/CMSQRYH IN A SHARED SEGMENT ___________________________________ All CMS users share 13 pages in real memory for modules handling CMS QUERY and SET commands. (Does not apply to CMS 12 in VM/ESA V2) HELP DISK FSTS IN A SEGMENT All CMS users who use HELP share 128 pages in ___________________________ real memory. REDUCE CP TRACE TABLE SIZE For a system that is not experiencing CP soft- __________________________ ware problems, reduce the CP trace table size from system defaults. Between four and 16 pages should be adequate for most systems. You may need to increase it in the event you experience a problem and trace entries were overwritten because the table wrapped. VM/ESA 1.2.2 and higher allows you to dynamically alter the table size with the CP SET TRACEFRAMES command. USE SHARED FSTS WHEN POSSIBLE When you use SAVEFD to save a copy of the mini- _____________________________ disk directory for a minidisk used read only by several users, you avoid private copies of directories in each user's virtual machine. USE DIRCONTROL SFS DIRECTORIES ______________________________ DIRCONTROL directories in VM Dataspaces for seldom updated directories provide shared page access to data. BIAS AGAINST MDC Use SET MDCACHE command to bias against mini- ________________ disk caching leaving more real storage avail- able to reduce paging. (VM/ESA 1.2.2 and higher) CHECKLIST OF ITEMS THAT HELP IMPROVE NETWORK RESPONSE ITEMS ITEM BENEFIT SET REMOTE ON Compresses nulls and repetitive strings of _____________ characters in displays. 
Minimizes the amount of data transmitted for fullscreen CMS and XEDIT and
shortens the buffer, thus speeding transmission.

TRADEMARKS
__________

The following terms are trademarks of the IBM Corporation in the United
States or other countries:

   IBM          LANStreamer  OS/2         VM/ESA       VSE/ESA
   MVS/ESA      ACF/VTAM     S/390        System/390

The following terms, denoted by a double asterisk (**) in this publication,
are trademarks of other companies:

   Explore is a trademark of Legent Corporation.
   Pentium is a trademark of Intel Corporation.