PowerVM/AIX Monitoring
MITEC Session W 210A-2 & 3

Steve Nasypany
[email protected]

Agenda
What This Presentation Covers
PowerVM Review
Monitoring Questions
Topology/System Aggregation
VIOS Summary
Metrics/Tools
– CPU
– Memory
– IO


What This Presentation Covers
There is a plethora of Performance Management and Monitoring products and free software. Listings and links are provided for reference later in this presentation.
This presentation does not focus on any product solution – you should review your requirements and contact the vendors that interest you. Many higher-end products have trial versions or can demonstrate their functionality in person. Many of the free products have vast capabilities – but they require study, experimentation and configuration work.
This presentation focuses on the basic metrics that should be monitored for assessing performance, and where you can find them in AIX.


PowerVM Virtualization Definitions Review
Logical Partitions (LPARs) are defined as dedicated or shared
Shared partitions use whole or fractional CPUs (smallest increment is 0.1; can be greater than 1.0)
Dedicated partitions use a whole number of CPUs, the traditional Unix environment
Dedicated-donating partitions can donate free cycles to a shared pool to allow higher capacity

Shared Processor Pools are a subset or all of the physical CPUs in a system. The desire is to virtualize all partitions to maximize exploitation of physical resources.
Shared Pool partitions run on Virtual Processors (VP). Each Virtual Processor maps to one physical processing unit in capacity (a physical core in AIX).
Capacity is expressed in the form of a number of 10% CPU units, called Entitlement
The Entitlement value for an active partition is a guarantee of processing resources
The sum of partition entitlements must be less than or equal to the physical resources in the shared pool
Desired: the size of the partition at boot time
Minimum: the partition will start with less than Desired, but won't start if Minimum is not available
Maximum: dynamic changes to Desired cannot exceed this capacity

Capped vs Uncapped
Capped: Entitlement is a hard limit, similar to a dedicated partition
Uncapped: capacity is limited by the unused capacity in the pool and the number of Virtual Processors a partition has configured

When the pool is constrained, the Variable Capacity Weight setting provides automatic load balancing of cycles over entitlement
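
A quick way to verify these settings on a running partition (a minimal check; field labels vary slightly by AIX level, and the values shown are illustrative):

# lparstat -i | egrep "Type|Mode|Entitled Capacity|Online Virtual CPUs"
Type                 : Shared-SMT
Mode                 : Uncapped
Entitled Capacity    : 0.50
Online Virtual CPUs  : 4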

Questions
How do we monitor for shared CPU pool constraints?
– AIX provides metrics to show physical and entitlement utilization on each partition
– AIX optionally provides the amount of the shared pool that is idle – when you are out of shared pool, you are constrained
What about Hypervisor metrics?
– There are really no metrics that see into the hypervisor layer
– Metrics like %hypv in AIX lparstat are dominated by idle time, as idle cycles are ceded to the hypervisor layer
This presentation will help you determine which performance metrics are important at the partition and frame level

Questions
What product is best for interactive and short-term analysis of AIX resources?
– nmon provides access to most of the important metrics required for benchmarks, proof-of-concepts and regular monitoring:
  • CPU: vmstat, sar, lparstat, mpstat
  • Memory: vmstat, svmon
  • Paging: vmstat
  • Hdisk: iostat, sar, filemon
  • Adapter: iostat, fcstat
  • Network: entstat, netstat
  • Process: ps, svmon, trace tools (tprof, curt)
  • Threads: ps, trace tools (curt)
– nmon Analyser & Consolidator provide free and simple trend reports
– Not provided in AIX/nmon:
  • Java/GC: must use Java tools
  • Transaction times: application specific
  • Database: database products


nmon On Demand Recording (ODR)
Ideal for benchmarks, proof-of-concepts and problem analysis
Allows “high-resolution” recordings to be made while in monitoring mode
– Records samples at the interactive monitoring rate
– AIX 5.3 TL12 & AIX 6.1 TL05
Usage
– Start nmon, use “[“ and “]” brackets to start and end a recording
• Records standard background recording metrics, not just what is on
screen.
• You can adjust the recorded sampling interval with -s [seconds] on startup
• Interactive options "-" and "+" (<shift> +) do NOT change the ODR interval

– Generates a standard nmon recording of format:
<host>_<YYYYMMDD>_<HHMMSS>.nmon
– Tested with nmon Analyser v33C, and works fine
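
For example, a minimal ODR session (the hostname and timestamp below are hypothetical):

# nmon -s 5                      <- interactive monitor; ODR records at this 5-second interval
  ... press "[" to start the recording, "]" to stop ...
# ls *.nmon
lpar01_20130611_141003.nmon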


nmon ODR
[Screenshot: an interactive nmon session with an On Demand Recording in progress]

Currency/Naming
The majority of commercial products supporting PowerVM/AIX monitoring provide access to all of the important virtualization metrics
Metric naming is a problem. In many cases, different implementations display the same data but have different naming conventions. Any customer wanting to evaluate these products will have to have an AIX specialist study the products' metric definitions to map apples-to-apples.
– Many products pre-package user interface views that aggregate these metrics
– But usage modes for recording, post-processing and capacity planning can all vary
– Many customers are using products in older "dedicated" modes and are not aware of the differences in monitoring virtualized systems
Customers interested in a particular solution should evaluate each product for complete support. Most, if not all, of these products are available for trial evaluation.


Currency/Naming Example (Tivoli)
AIX LPAR

KPX_memrepage_Info
KPX_vmm_pginwait_Info
KPX_vmm_pgfault_Info
KPX_vmm_pgreclm_Info
KPX_vmm_unpin_low_Warn KPX_vmm_pgout_pend_Info
KPX_Pkts_Sent_Errors_Info
KPX_Sent_Pkts_Dropped_Info
KPX_Pkts_Recv_Errors_Info KPX_Bad_Pkts_Recvd_Info
KPX_Recv_pkts_dropped_Info KPX_Qoverflow_Info
KPX_perip_InputErrs_Info
KPX_perip_InputPkts_Drop_Info
KPX_perip_OutputErrs_Info KPX_TCP_ConnInit_Info
KPX_TCP_ConnEst_Info
KPX_totproc_cs_Info KPX_totproc_runq_avg_Info
KPX_totproc_load_avg_Info KPX_totnum_procs_Info
KPX_perproc_IO_pgf_Info KPX_perproc_nonIO_pgf_Info
KPX_perproc_memres_datasz_Info
KPX_perproc_memres_textsz_Info
KPX_perproc_mem_textsz_Info KPX_perproc_vol_cs_Info
KPX_Active_Disk_Pct_Info
KPX_Avg_Read_Transfer_MS_Info
KPX_Read_Timeouts_Per_Sec_Info
KPX_Failed_Read_Per_Sec_Info
KPX_Avg_Write_Transfer_MS_Info
KPX_Write_Timeout_Per_Sec_Info
KPX_Failed_Writes_Per_Sec_Info
KPX_Avg_Req_In_WaitQ_MS_Info
KPX_ServiceQ_Full_Per_Sec_Info
KPX_perCPU_syscalls_Info KPX_perCPU_forks_Info
KPX_perCPU_execs_Info
KPX_perCPU_cs_Info
KPX_Tot_syscalls_Info
KPX_Tot_forks_Info
KPX_Tot_execs_Info
KPX_LPARBusy_pct_Warn
KPX_LPARPhyBusy_pct_Warn KPX_LPARvcs_Info
KPX_LPARfreepool_Warn KPX_LPARPhanIntrs_Info
KPX_LPARentused_Info KPX_LPARphyp_used_Info
KPX_user_acct_locked_Info KPX_user_login_retries_Info
KPX_user_idletime_Info

VIOS

KVA_memrepage_Info
KVA_vmm_pginwait_Info
KVA_vmm_pgfault_Info
KVA_vmm_pgreclm_Info
KVA_vmm_unpin_low_Warn KVA_vmm_pgout_pend_Info
KVA_Pkts_Sent_Errors_Info KVA_Sent_Pkts_Dropped_Info
KVA_Pkts_Recv_Errors_Info KVA_Bad_Pkts_Recvd_Info
KVA_Recv_pkts_dropped_Info
KVA_Qoverflow_Info
KVA_Real_Pkts_Dropped_Info KVA_Virtual_Pkts_Dropped_Info
KVA_Output_Pkts_Dropped_Info KVA_Output_Pkts_Failures_Info
KVA_Mem_Alloc_Failures_Warn
KVA_ThreadQ_Overflow_Pkts_Info KVA_HA_State_Info
KVA_Times_Primary_Per_Sec_Info KVA_perip_InputErrs_Info
KVA_perip_InputPkts_Drop_Info KVA_perip_OutputErrs_Info
KVA_TCP_ConnInit_Info
KVA_TCP_ConnEst_Info
KVA_totproc_cs_Info
KVA_totproc_runq_avg_Info KVA_totproc_load_avg_Info
KVA_totnum_procs_Info
KVA_perproc_IO_pgf_Info KVA_perproc_nonIO_pgf_Info
KVA_perproc_memres_datasz_Info
KVA_perproc_memres_textsz_Info KVA_perproc_mem_textsz_Info
KVA_perproc_vol_cs_Info
KVA_Firewall_Info

KVA_Active_Disk_Pct_Info KVA_Avg_Read_Transfer_MS_Info
KVA_Read_Timeouts_Per_Sec_Info
KVA_Failed_Read_Per_Sec_Info
KVA_Avg_Write_Transfer_MS_Info
KVA_Write_Timeout_Per_Sec_Info
KVA_Failed_Writes_Per_Sec_Info
KVA_Avg_Req_In_WaitQ_MS_Info
KVA_ServiceQ_Full_Per_Sec_Info
KVA_perCPU_syscalls_Info
KVA_perCPU_forks_Info
KVA_perCPU_execs_Info
KVA_perCPU_cs_Info
KVA_Tot_syscalls_Info KVA_Tot_forks_Info
KVA_Tot_execs_Info
KVA_LPARBusy_pct_Warn KVA_LPARPhyBusy_pct_Warn
KVA_LPARvcs_Info
KVA_LPARfreepool_Warn
KVA_LPARPhanIntrs_Info
KVA_LPARentused_Info
KVA_LPARphyp_used_Info KVA_user_acct_locked_Info
KVA_user_login_retries_Info
KVA_user_idletime_Info
HMC
KPH_Busy_CPU_Info
KPH_Paging_Space_Full_Info
KPH_Disk_Full_Warn
KPH_Runaway_Process_Info


Currency/Naming Example (HP)
Excerpts from Agent definitions:
BYLS_CPU_PHYS_TOTAL_UTIL: On AIX, this metric is equivalent to the sum of BYLS_CPU_PHYS_USER_MODE_UTIL and BYLS_CPU_PHYS_SYS_MODE_UTIL. For AIX LPARs, the metric is calculated with respect to the available physical CPUs in the pool to which this LPAR belongs.
BYLS_LS_MODE: This metric indicates whether the CPU entitlement for the logical system is Capped or Uncapped.
BYLS_LS_SHARED: This metric indicates whether the physical CPUs are dedicated to this logical system or shared.
GBL_CPU_PHYS_TOTAL_UTIL: The percentage of time the available physical CPUs were not idle for this logical system during the interval. This metric is calculated as GBL_CPU_PHYS_TOTAL_UTIL = GBL_CPU_PHYS_USER_MODE_UTIL + GBL_CPU_SYS_MODE_UTIL
GBL_POOL_CPU_AVAIL: The available physical processors in the shared processor pool during the interval.
GBL_POOL_CPU_ENTL: The number of physical processors available in the shared processor pool to which this logical system belongs. On an AIX SPLPAR, this metric is equivalent to the "Active Physical CPUs in system" field of the 'lparstat -i' command. On a standalone system, the value is "na".
GBL_POOL_TOTAL_UTIL: Percentage of time the pool CPU was not idle during the interval.


IBM AIX Links
AIX Service Tool PERFPMR: ftp://ftp.software.ibm.com/aix/tools/perftools/perfpmr/
IBM's PM for Power Systems (subscription reports for capacity planning): http://www-03.ibm.com/systems/power/support/pm/index.html
The AIX Community Wiki: http://www.ibm.com/systems/p/community/
AIX Wiki (use the links under Performance to find nmon, nmon Analyser and other tools): https://www.ibm.com/developerworks/community/wikis/home?lang=en#/wiki/Power%20Systems/page/AIX


ISV Monitoring Products
This is not an all-inclusive list; search for "VIOS Recognized" here: http://www-304.ibm.com/partnerworld/wps/pub/systems/power/solutions
ATS Group Galileo: http://www.theatsgroup.com/
BMC ProactiveNet Performance Management: http://www.bmc.com/products/product-listing/ProactiveNet-PerformanceManagement.html
CA: http://www.ca.com
HP GlancePlus: http://www.hp.com/go/software
Metron Athene: http://www.metron-athene.com/index.html
Orsyp Sysload: http://www.orsyp.com/products/software/sysload.html
Power Navigator: http://www.mpginc.com
SAP AIX CCMS Agents: http://www.sap.com
TeamQuest: http://www.teamquest.com/products-services/full-product-serviceslist/index.htm


System Topology
Processors: CPU type, frequency, count; # of processors in the shared pool, pool utilization; total CEC utilization; number of unused processors
Memory: total amount of memory in the CEC; memory allocated to partitions; unallocated memory
I/O: set of storage adapters; set of network adapters; other adapters; Virtual I/O connections
Partitions: active partitions, operating systems, names and types; inactive partitions

CEC – Central Electronics Complex (System)

System Aggregation
Virtualization allows and even encourages that a system contain multiple independent partitions, all with resources
Ideally, monitoring tools will easily determine all of the active partitions on a system and organize the data for those partitions together
– Normally, this requires "consulting" with an HMC to identify the active partitions on a system
– With mobile partitions, the tools must accommodate the movement of partitions between systems
– If automatic detection of partitions to systems isn't possible, some means of organizing them manually must be applied
Various products do automatic CEC aggregation
nmon Analyser and Consolidator allow manual aggregation
The most important metric for CPU pool monitoring is the Available Pool Processor (APP) value. It represents the amount of a pool not currently consumed, and can be retrieved from individual partitions or calculated by aggregators


VIOS Performance Overview
We do not distinguish metrics between regular shared partitions and Virtual I/O Server
partitions
– All the metrics have the same meanings
– nmon recording works on both
– We treat the VIOS just like we’d treat dedicated partitions hosting adapters
– As many VIOS host Shared Ethernet devices, we will cover this information in the
network section
Properly configured, a VIOS system can perform as well and achieve the same
throughputs as a dedicated system
– We tune the OS, adapters, queue depths exactly the same
– All IBM storage devices have Redbooks with best practices/performance sections
– See Best Practices section for VIOS recommendations
If you have IO performance issues, review the VIOS first
– Make sure you aren’t out of CPU
– Make sure you aren’t out of memory
A persistent complaint from customers is that there are no breakdowns of NPIV client activity on the VIOS, and each client must be individually monitored. We are working on it.


VIOS CPU & Memory
VIOS overhead is not a problem
– Virtual SCSI architecture between client and server is simple, a read is a read and a
write is a write. These are in-memory transfers.
– CPU overhead is a tiny fraction of the overall I/O time
• VIOS cpu entitlements are typically fractions of a physical core, more Virtual
Processors are better than running VIOS dedicated
• Always run production VIOS uncapped
– Monitor the same CPU and memory statistics covered later in this presentation
– Adjusting the VIO server’s Variable Capacity Weighting (biasing a partition’s access
to free cycles when the shared pool is constrained) is advised for heavier production
IO workloads
– The Hypervisor will allocate private memory for adapters depending on type and number. Sizing these allocations is supported by the System Planning Tool: http://www.ibm.com/systems/support/tools/systemplanningtool/


Metrics: CEC
CPU
– Number of CPUs
• Dedicated
• Shared pool
• Unallocated
– Entitlement settings
– Utilization %
• Dedicated consumed
• Pool consumed (alternatively, entitlement consumed or free)
Memory
– Allocated, in use, computational, non-computational
– Unallocated
IO
– Aggregated adapter totals – read, write, IOPS, MB/sec


HMC Performance Information
HMC -> System Management -> Server -> Operations -> Utilization Data
– Change Sampling Rate
  • 30 seconds; 1, 5, or 30 minutes; 1 hour
– View
  • Snapshot, Hourly, Daily, Monthly
  • Beginning and ending time/date
  • All, Utilization sample, Configuration change, Utility CoD usage
  • Maximum number of events
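
The same utilization samples can also be pulled from the HMC command line, which is handy for scripted collection (a sketch; option details vary by HMC level, and the managed system name "SystemA" is hypothetical):

hscroot@hmc> chlparutil -r config -s 300           <- set a 300-second sampling rate
hscroot@hmc> lslparutil -r sys -m SystemA -n 10    <- last 10 system-level samples
hscroot@hmc> lslparutil -r pool -m SystemA -n 10   <- last 10 shared-pool samples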


HMC Performance Information
System Utilization Views
– System summary
– Partition Processor/Memory
– Physical Processor, Shared Processor, or Shared Memory Pool


Metrics: Dedicated CPU Partition
Unix CPU "buckets" utilization
%User: Fraction of entitlement consumed running user programs
%System: Fraction of entitlement consumed running in the AIX kernel
• The time in the kernel could be from system calls made by user programs, interrupts, or kernel daemons
• Sometimes %system is used as an indicator of health. If %system becomes very high, it *might* mean that a problem exists. But some subsystems, such as the NFS server code, run in system mode, so this is not always a good indicator
%Idle: Fraction of entitlement during which the partition was not running any processes or threads
%I/O wait: Fraction of entitlement during which the partition was not running any processes or threads, and there was disk I/O outstanding
• %I/O wait is also sometimes used as a health indicator. However, like system time, it is not always a good indicator due to the nature of the programs that may run
%User + %System + %Idle + %I/O wait = 100% of a fixed "entitlement"
In the dedicated world, the sum of %user + %system is the utilization percentage or busy time of the dedicated partition
Physical Busy = (%User + %System) x (# of dedicated CPUs)
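
Worked example (illustrative numbers): a dedicated partition with 4 CPUs reporting 40% user and 20% system is (0.40 + 0.20) x 4 = 2.4 physical CPUs busy.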

Metrics: PowerVM Shared CPU Partition
CPU “buckets” and “cpu busy” percentages no longer tell us physical
utilization
– This is because the number of physical cores being utilized is no longer a fixed
whole number, and can always be changing
– User and System percentages are relative to consumption
• A logical CPU can be 99% “busy” on 0.01 to 1.0 physical cores

For shared partitions, you must understand new metrics
– Entitlement: physical resources guaranteed to a shared partition
• Can range from 0.1 to maximum number of physical cores
• When entitlement is not being used, it can be ceded back to the pool
– Entitlement Consumed (reported as entc% or ec%)
– Physical Consumed (reported as physc or pc in different tools)

Entitlement Consumed
– Capped partitions can only go to 100%
– Uncapped partitions can go over 100%
• How far depends on the number of Virtual Processors (VP)
• Active VP count defines # of physical cores a partition can consume
• If a partition has an entitlement of 0.5 and 4 VPs, the maximum entitlement consumed possible is 800% (4.0/0.5 = 8), and the maximum physical consumption is 4.0


POWER6 vs POWER7 SMT Utilization
("busy" = user% + system%)
[Diagram summary:]
– POWER6 SMT2: one hardware thread (Htc0) busy, one (Htc1) idle → reported 100% busy; both threads busy → 100% busy
– POWER7 SMT2: one thread busy, one idle → reported ~70% busy
– POWER7 SMT4: one thread busy, three idle → reported ~65% busy

Simulating a single-threaded process on 1 core, 1 Virtual Processor, the utilization values change. In each of these cases, physical consumption can be reported as 1.0.
Real-world production workloads will involve dozens to thousands of threads, so many users may not notice any difference at the "macro" scale
Whitepapers on POWER7 SMT and utilization
Simultaneous Multi-Threading on POWER7 Processors by Mark Funk
http://www.ibm.com/systems/resources/pwrsysperf_SMT4OnP7.pdf
Processor Utilization in AIX by Saravanan Devendran
https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en#/wiki/Power%20Systems/page/Understanding%20CPU%20utilization%20on%20AIX


SMT, Dispatch Behavior & Consumption
POWER7 processors can run in Single-thread, SMT2, or SMT4 modes
– Like POWER6, the SMT threads will dynamically adjust based on workload
– SMT threads dispatch via a Virtual Processor (VP)
– POWER7 threads start with different priorities on the Primary, Secondary and Tertiary instances, but can be equally weighted for highly parallel workloads

POWER5 and POWER6 overstate utilization, as the CPU utilization algorithm does not account for how many SMT threads are active
– One or both SMT threads can fully consume a physical core, and utilization is 100%
– On POWER7, a single thread cannot exceed ~65% utilization. Values are calibrated in hardware to provide a linear relationship between utilization and throughput
When core utilization reaches a certain threshold, a Virtual Processor is unfolded and work begins to be dispatched to another physical core

POWER6 vs POWER7 Dispatch
Another Virtual Processor is activated at the utilization values below (both systems may have a reported physical consumption of 1.0):
– POWER6 SMT2: both hardware threads busy, ~80% busy, before a Virtual Processor is activated
– POWER7 SMT4: primary thread busy, secondary and tertiary threads idle, ~50% busy, before a Virtual Processor is activated

There is a difference in how workloads are distributed across cores in POWER7 and earlier architectures
– In POWER5 & POWER6, the primary and secondary SMT threads are loaded to ~80% utilization before another Virtual Processor is unfolded
– In POWER7, all of the primary threads (defined by how many VPs are available) are loaded to at least ~50% utilization before the secondary threads are used. Once the secondary threads are loaded, only then will the tertiary threads be dispatched. This is referred to as Raw Throughput mode.
– Why? Raw Throughput provides the highest per-thread throughput and best response times, at the expense of activating more physical cores

POWER6 vs POWER7 Dispatch
[Diagram summary: POWER6 loads the primary and secondary threads of proc0–proc3 as it spreads work; POWER7 loads the primary threads of all cores first, then the secondaries, then the tertiaries]

Once a Virtual Processor is dispatched, the Physical Consumption metric will typically increase to the next whole number
Put another way, the more Virtual Processors you assign, the higher your Physical Consumption is likely to be


POWER7 Consumption: A Problem?
POWER7 may activate more cores at lower utilization levels than earlier architectures when excess Virtual Processors are present
Customers may complain that the physical consumption (physc or pc) metric is equal to, or possibly even higher than, before after migrating to POWER7 from earlier architectures. They may also note that CPU capacity planning is more difficult in POWER7 (discussion to follow)
Expect every POWER7 customer with this complaint to also have significantly higher idle% percentages than on earlier architectures
Expect that they are consolidating workloads and may also have many more VPs assigned to the POWER7 partition.


POWER7 Consumption: Capacity Planning
Because POWER5 and POWER6 SMT utilization will always be at or above 80% before another VP is activated, utilization ratios (80% or 0.8 of a core) and a physc of 1.0 core may be closer to each other than in POWER7 environments
– Physical Consumption alone was close enough for capacity planning in POWER5/POWER6, and many customers use this
– This may not be true in POWER7 environments when excess VPs are present
Under the default "Raw" throughput mode, customers that do not want to reduce VPs may want to deduct the higher idle buckets (idle + wait) from their capacity planning metric(s):
Physical Busy = (User + System)% x Reported Physical Consumption

This is reasonable presuming the workload benefits from SMT. It will not work with single-threaded "hog" processes that want to consume a full core.
AIX 6.1 TL8 & AIX 7.1 TL2 offer an alternative VP activation mechanism known as "Scaled Throughput". This can provide the option to make POWER7 behavior more "POWER6-like" – but this is a generalized statement and not a technical one.
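
Worked example (illustrative numbers): a partition reporting physc = 2.0 with 50% user + 10% system gives Physical Busy = 0.60 x 2.0 = 1.2 cores – a more conservative planning number than the raw physc of 2.0 when excess VPs inflate consumption.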


Metrics: Shared Pool Monitoring, Step 1
The most important metric in PowerVM virtualization monitoring is the Available Pool Processor (APP) value. This represents the current number of free physical core resources in the shared pool.
Only partitions with the "Allow performance information collection" setting enabled can display this pool value – it is obtained via a hypervisor mechanism
The topas -C command calculates this value automatically, because it collects utilization values from each AIX instance.
The change is dynamic and takes effect immediately for lparstat (various products may not see the value until they restart or recycle – for recording agents, typically at the beginning of the day)
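
A quick check from AIX that the partition can see the pool (illustrative output, trimmed; the pool fields may show "-" if the collection setting is disabled):

# lparstat -i | grep -i pool
Shared Pool ID        : 0
Active CPUs in Pool   : 16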


Metrics: APP in lparstat

# lparstat -h 1 4
System configuration: type=Shared mode=Capped smt=On lcpu=4 mem=4096 psize=2 ent=0.40

%user  %sys  %wait  %idle physc %entc lbusy   app  vcsw phint %hypv hcalls
----- ----- ------ ------ ----- ----- ----- ----- ----- ----- ----- ------
 84.9   2.0    0.2   12.9  0.40  99.9  27.5  1.59   521     2  13.5   2093
 86.5   0.3    0.0   13.1  0.40  99.9  25.0  1.59   518     1  13.1    490

(lcpu = logical CPUs, psize = shared pool size, ent = entitlement)

physc: shows the number of physical processors consumed. For a capped partition this number will not exceed the entitled capacity. For an uncapped partition this number could match the number of processors in the shared pool; however, it may be limited by the number of online Virtual Processors.
%entc: shows the percentage of entitled capacity consumed. For a capped partition the percentage will not exceed 100%; for uncapped partitions the percentage can exceed 100%.
app (shared mode only): shows the number of available processors in the shared pool. The shared pool 'psize' here is 2 processors. You must set 'Allow performance information collection' to see this value: view the "properties" for the partition, click the Hardware tab, then Processors and Memory.

Metrics: nmon APP and Utilization
[Screenshot: nmon 'p' display option, showing CPU and Entitlement Consumption and the Available Pool]


LPAR(s) View: nmon Analyser
Analyser can only see: Pool Size, APP, and Local Utilization
Other LPAR(s) Utilization of Pool = Pool Size – APP – Local Utilization
[Chart: Available Pool vs. Local LPAR Physical Used over time]

LPAR(s) View: nmon Consolidator
View depends on the partitions selected; breakdowns by dedicated and shared partitions

Topas Partitions & CEC View
topas -C
Upper section displays aggregated CEC information; lower section displays shared/dedicated data – closely mimics lparstat

Topas CEC Monitor                            Interval: 10   Thu Jul 28 17:04:57 2006
Partition Info     Memory (GB)        Processor
Monitored   : 6    Monitored  :24.6   Monitored  : 1.2   Shr Physical Busy: 0.30
UnMonitored : -    UnMonitored: -     UnMonitored: -     Ded Physical Busy: 2.40
Shared      : 3    Available  :24.6   Available  : -
Dedicated   : 3    UnAllocated: -     UnAllocated: -     Hypervisor
Capped      : 1    Consumed   : 2.7   Shared     : 1.5   Virt. Context Switch: 632
Uncapped    : 1                       Dedicated  : 5     Phantom Interrupts  :   7
                                      Pool Size  : 3
                                      Avail Pool : 2.7   (APP = Pool Size - Shr Physical Busy)

Host      OS  M Mem  InU Lp  Us Sy Wa Id  PhysB  Ent %EntC Vcsw PhI
-------------------------------shared-------------------------------
ptoolsl3  A53 c 4.1  0.4  2  14  1  0 84   0.08 0.50  15.0  208   0
ptoolsl2  A53 C 4.1  0.4  4  20 13  5 62   0.18 0.50  36.5  219   5
ptoolsl5  A53 U 4.1  0.4  4   5  0  0 95   0.04 0.50   7.6  205   2
------------------------------dedicated-----------------------------
ptoolsl1  A53 S 4.1  0.5  4  20 10  0 70   0.30
ptoolsl4  A53   4.1  0.5  2 100  0  0  0   2.00
ptoolsl6  A52   4.1  0.5  1   5  5 12 88   0.10

AIX 5.3 TL-08 topas supports:
– Dashes represent data not available at the OS level
– View of each shared pool, if AIX partitions
– Can be provided via command-line
– CPU cycle donations made by dedicated partitions
– topas can be configured to collect via ssh to the HMC

Metrics: Partition CPU
Run queue length is another well-known metric of CPU usage
– It refers to the number of software threads that are ready to run, but have to wait because the CPU(s) is/are busy or waiting on interrupts
– The length is sometimes used as a measure of health, and long run queues usually mean worse performance, but many workloads can vary dramatically
– It is quite possible for a pair of single-threaded workloads to contend for a single physical resource (batch, low run queue, bad performance) while dozens of multi-threaded workloads share it (OLTP, high run queue, good performance)
In a dedicated processor partition or a capped shared processor partition, the run queue length will increase with higher CPU utilization
– The longer the run queue, the more time that software threads must wait for CPU, increasing response times and generally degrading the end-user experience
In an uncapped shared processor partition, the run queue length may not increase with higher consumption
– Extra capacity of the shared processor pool can be used
– This assumes shared pool resources are available and the partition has adequate Virtual Processors assigned to fluctuate with the demand
Because run queue is not a consistent indicator across workload types, and shared partitions can vary their physical CPU resources, run queue is no longer considered a good generic indicator
Tools output (see the sketch below):
– vmstat reports the global run queue as "r"
– mpstat reports the per-logical-CPU run queue as "rq"
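
A minimal sketch of sampling the run queue with the tools above (5-second interval, 3 samples):

# vmstat 5 3     <- global run queue in the "r" column
# mpstat 5 3     <- per-logical-CPU run queue in the "rq" column
# sar -q 5 3     <- run queue size (runq-sz) and occupancy (%runocc)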


Metrics: Partition CPU
Context switches
– The number of times a running entity was stopped and replaced by another
– Collected for threads (operating system) and Virtual Processors (hypervisor)
– There are voluntary and involuntary context switches
How many context switches are too many?
– No rules of thumb exist
– Voluntary: not an issue, because it means no work for the CPU
– Involuntary: could be an issue, but generally the bottleneck will materialize in an easier-to-diagnose metric such as CPU utilization, physical consumption, entitlement consumed, or run queue
How can context switch metrics be used?
– Establish a baseline and compare when the system encounters performance problems
– When benchmarking or performing a PoC, if these values "blow up," this is indicative of software scaling issues (architecture, latches, locks, etc)
Tool outputs (see the sketch below):
– vmstat reports total context switches as "cs"
– mpstat reports total "cs" and involuntary "ics"
– lparstat reports Virtual Processor context switches as "vcsw"
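
A baseline can be captured with the same tools (a sketch; 5-second interval, 3 samples):

# vmstat 5 3       <- "cs" column: total context switches/sec
# mpstat 5 3       <- total "cs" and involuntary "ics" per logical CPU
# lparstat 5 3     <- "vcsw": Virtual Processor context switches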


Partition Memory (AIX)
Other operating systems focus only on total real used and unused
memory. This is not enough for AIX.
Because AIX memory maps files, the length of the free list (or
number of free pages in the partition) is not usually a good indicator
of memory utilization
– If software does not actually use memory pages it requested, VMM does
not allocate them
– AIX does not actively scrub memory. It scans and frees pages
depending on demand
– The free list will likely spike when a large process exits, as those pages
will become free
– Default AIX memory tunings typically result in total memory use being
reported as near 100%


Metrics: Partition Memory (AIX)
In AIX, you must understand the difference between these types:
– Computational
  • Code and process/kernel working storage
  • This metric is not visible to the HMC
– File Cache (sometimes referred to as non-computational)
  • File: legacy JFS file pages in memory
  • Client: other file pages (JFS2, NFS, etc) in memory
  • May be broken out in some tools or reported as one "file cache" value
  • Cache will not be cleared until other IO workloads write over it or the file systems are remounted
Computational% is the only memory metric that really matters in AIX. Computational memory ALWAYS has priority over file cache until you reach Computational = 97%


Metrics: Partition Memory (AIX)
When computational memory becomes short, or the memory management is mistuned, paging occurs. Paging is the worst consequence of memory problems, so it should be monitored. Metric names used in vmstat:
– pi: reads from paging devices
– po: writes to paging devices
– sr & fr: pages scanned and freed to satisfy new allocation requests. A persistently high scan rate (tens of thousands of pages per second) with a low free rate typically indicates the system is struggling to allocate memory.
– All physical paging is bad. If you are not near the computational threshold, you may need APARs specific to 4K or 64K memory page defects.
Advanced features like Active Memory Sharing and Active Memory Expansion (AME) require a more complex set of metrics
– AME adds metrics for pages in (ci) and out (co) of a compressed pool
– If a requested expansion factor is too aggressive, a "memory hole" can be created and may result in paging


Metrics: Process & inode
"How can I figure out process memory?"
– The best tool is the svmon command
– Use -O filters to view
  • Process or user breakdowns of cache, kernel, shared and private memory
  • It is most important to filter out kernel and shared segments – all sophisticated software products share kernel extensions, shared/memory-mapped areas (SGA, PGA, etc), libraries and some code text
"I've used nmon, topas or svmon and I can't account for all the used memory. Where is it?"
– Likely you have a lot of file inode & metadata cache
– This is only visible via the "proc" file system. Worst case, these areas in AIX 6.1 can take up to 20% of system memory (tuned lower in AIX 7.1)

# cat /proc/sys/fs/jfs2/memory_usage
metadata cache: 254160896
inode cache: 57212928
total: 311373824
(values in bytes)
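
A sketch of the svmon -O filtering mentioned above (the -O unit option exists on AIX 6.1 and later; output is lengthy, so it is piped to head here):

# svmon -P -O unit=MB | head     <- per-process report, values in MB
# svmon -U -O unit=MB | head     <- per-user report, values in MB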

Memory Monitoring with vmstat
Key points: Computational (avm), Paging Rates, Scanning Rates

# vmstat -I 1
System configuration: lcpu=2 mem=912MB

 kthr    memory              page                            faults          cpu
 ------ ------------- -------------------------------- ---------------- -----------
  r b p    avm    fre     fi  fo  pi  po    fr     sr    in    sy   cs  us sy id wa
  1 1 0 139893  2340  12288   0   0   0     0      0   200 25283  496  77 16  0  7
  1 1 0 139893  1087   4503   0   8 733  3260 126771   415  9291  440  82 15  0  3
  3 0 0 139893  1088   9472   0   1  95  9344 100081   191 19414  420  77 20  0  3
  1 1 0 139893  1087  12547   0   6   0 12681  13407   207 25762  584  71 21  0  7
  1 2 0 140222  1013   6110   1  39   0  6169   6833   160 15451  471  83 11  0  5
  1 2 0 139923  1087   6976   0  31   2  7062   7599   183 19306  544  79 14  0  7

b: the number of threads blocked waiting for a file system I/O operation to complete.
p: the number of threads blocked waiting for a raw device I/O operation to complete.
avm: the number of active virtual memory pages, which represents the computational memory requirement. The maximum avm number divided by the number of real memory frames equals the computational memory requirement.
fre: the number of frames of memory on the free list. Note: a frame refers to physical memory, vs. a page, which refers to virtual memory.
fi / fo: file pages in and file pages out per second, which represent I/O to and from a file system.
pi / po: page space page-ins and page space page-outs per second, which represent paging.
fr / sr: the number of pages scanned 'sr' and the number of pages stolen (or freed) 'fr'. The ratio of scanned to freed represents relative memory activity; it starts at 1 and increases as memory contention increases (examine how many pages were scanned 'sr' to steal 'fr' pages). Note: interrupts are disabled at times when 'lrud' is running.
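
Illustrative arithmetic from the sample above: the peak avm of 140,222 pages x 4 KB ≈ 548 MB of computational memory; against mem=912MB, that is roughly 60% computational.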

Where is computational memory? vmstat
# vmstat -v
1048576 memory pages                 <- Total Real Memory
 992384 lruable pages                <- memory addressable by lrud
 668618 free pages                   <- Free List
      1 memory pools                 <- one lrud per memory pool
 152370 pinned pages
   80.0 maxpin percentage
    3.0 minperm percentage
   90.0 maxperm percentage
   11.4 numperm percentage
 113546 file pages                   <- JFS-only, or reports "client" if no JFS
   11.4 numclient percentage         <- % file cache if JFS2-only
   90.0 maxclient percentage
 113546 client pages                 <- JFS2/NFS
   25.4 percentage of memory used for computational pages   <- new in AIX 6.1 TL06

Where is computational memory? svmon
# svmon -G
               size      inuse       free        pin    virtual
memory       233472     125663     107809     108785     140123
pg space     262144      54233

               work       pers       clnt      lpage
pin           67825          0          0      40960
in use        79725        536       4442          0

size: total # of memory frames (frames are in 4K units)
inuse: # of frames in use
free: # of frames on the free list
pin: # of pinned frames
virtual: computational

Working (or computational) memory = 140123
%Computational = virtual/size = 140123 / 233472 = 60%
Pers (or JFS file cache) memory = 536
Clnt (or JFS2 and NFS file cache) memory = 4442

Where is computational memory? topas
[Screenshot] Metrics related to memory monitoring:
%Comp – working memory
%Noncomp, %Client – file memory


nmon Memory ('m') + Kernel ('s')
[Screenshot highlights: Scans & Frees (Steals), Computational, Run Queue + Process Context Switches]

Where is computational memory? nmon Analyser
System + Process = Computational%
Note: nmon Analyser cannot graph computational rates over 100% (physical paging)

Metrics: CPU & Partition Memory/Paging

CPU – Dedicated:
  User/System/Idle/Wait (vmstat, lparstat, sar)
  Run Queue (vmstat, sar -q)
  Context Switches (vmstat, mpstat)
CPU – Shared:
  Physical consumed, Entitlement consumed (lparstat, vmstat, sar)
  Available Pool (lparstat)
Memory – RAM:
  Total Size (vmstat -v "memory", svmon "size")
  Computational (vmstat "avm", vmstat -v, svmon "virtual")
  Cache (vmstat -v "client", svmon "pers" + "clnt")
  Scan Rate & Free Rate (vmstat "sr" & "fr")
Memory – Paging:
  Total Size (lsps -a)
  In Use (lsps -a, svmon)
  Pages In/Out (vmstat, vmstat -s)

nmon, nmon Analyser and topas provide all of these metrics

Thresholds
The following sample threshold tables are intended to be examples
and not standard recommendations by IBM
We do not maintain or advise one set of thresholds for all environments


Sample Thresholds: CPU
– cpu busy% (user+sys)%: Dedicated > 80%; Shared: n/a (busy% is relative to physical consumption in shared partitions)
– Entitlement %: > 100% for 60 minutes, 3x a day (possible entitlement under-sizing)
– Physical consumed (physc or pc): 90% of the LPAR's Virtual Processors (#VP x 0.90 = physc_thresh)
– Physical busy (cpu busy% x physc): 80% of the LPAR's VPs (#VP x 0.80 = physb_thresh); adjusts for idle time in physical consumed – you must compute this
– Available Pool: < CEILING(#cores in pool x 0.10) for 10 minutes (assumes pool size >= 4 cores; ceiling = round-up function)
– Run Queue: workload dependent (highly variable; monitor for unusual spikes)
– Context Switches: workload dependent (highly variable; monitor for unusual spikes)

Sample Thresholds: Memory
– Computational %: Green < 80%, Yellow 80-92%, Red > 92% (assumes AIX 6.1; physical paging begins at 97%)
– Free Memory %: none (AIX 6.1 will use all free memory for file cache, so ~100% in use is normal)
– File Cache %: none (useful for determining what percentage of memory is just cache)
– Scan:Free ratio: > 8:1 with scanning > 10K pages/sec for 10 minutes (high file cache turnover, likely nearing 97% computational)
– Physical Paging in/out: any (any persistent paging indicates an over-committed computational rate or a defect)
– Physical Paging %: > 80% of page space (potential to crash)

Metrics: Disk/Storage Activity
We need to know how many I/O’s are being read/written and the total bytes
read/written to know where we are relative to bandwidth
– nmon, topas and iostat provide aggregate values, but not relative to
bandwidth
– Many tools/products support this at the hdisk layer, and can become
very complicated on large systems with hundreds or thousands of
hdisks.
– Many tools/products collect adapter statistics, but understanding their
relationship to physical adapters in a virtualized environment can get
complicated. For customers using newer features like NPIV, there is no
simple aggregated view available within AIX at the VIOS level – at this
time


Metrics: Disk/Storage Activity
%utilization (or %busy) of devices and adapters
– While device utilization is a commonly used metric, it is not always a good measure of quality of service; it can indicate performance issues for simple devices (SCSI) or merely imply them for more complex devices (e.g. FC-attached LUNs)
– Disk utilization measures the fraction of time that AIX has I/Os outstanding to a device. A very fast device (say, a RAID5 4+1P LUN on a DS4800) can process many I/Os per second. Even if a LUN is 100% busy, it may be offering good response time and be capable of processing more requests per second
– These metrics are good for one thing – sorting between hundreds or thousands of active vs. inactive disks


Metrics: Disk/Storage Activity
Response & service times
– Provide a much better view of whether I/O’s are delayed. There are
two commonly used measures of response time.
– One is the time the I/O is queued at the device. This measures the
responsiveness of the device to I/O’s.
– Another is the time the I/O is queued for service, which might
include time queued in the operating system or device driver. If a
large number of I/O’s are launched at a device, the queued time
may become an important metric.
– AIX is well instrumented for these metrics now and they are the
primary means for assessing storage performance


Queue Depths
If IO service times are reasonably good, but the queues are getting filled up, then increase queue depths until:
– You aren't filling the queues, or
– IO service times start degrading (bottleneck at disk)
For hdisks, queue_depth controls the maximum number of in-flight IOs
For FC adapters, num_cmd_elems controls the maximum number of in-flight IOs
Drivers for hdisks and adapters have service and wait queues
– When the queue is full and an IO completes, then another is issued
Tools used on partitions to identify queue issues (a change sketch follows):
SDDPCM: # pcmpath query devstats <interval> <count>
        # pcmpath query adaptstats <interval> <count>
SDD:    # datapath query devstats <interval> <count>
        # datapath query adaptstats <interval> <count>
iostat: # iostat -D <interval> <count>
fcstat: # fcstat fcs*
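
A sketch of reviewing and raising the queues (the attribute names are standard; the values are illustrative, and -P defers the change to the next reboot since the devices must otherwise be quiesced):

# lsattr -El hdisk4 -a queue_depth          <- current hdisk queue depth
# chdev -l hdisk4 -a queue_depth=32 -P      <- raise the hdisk in-flight IO limit
# chdev -l fcs0 -a num_cmd_elems=1024 -P    <- raise the adapter in-flight IO limit
# shutdown -Fr                              <- deferred (-P) changes take effect at reboot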


hdisk service times: iostat -D

hdisk1      xfer:   %tm_act      bps      tps      bread      bwrtn
                       87.7    62.5M    272.3      62.5M      823.7
            read:       rps  avgserv  minserv   maxserv   timeouts      fails
                      271.8      9.0      0.2     168.6          0          0
            write:      wps  avgserv  minserv   maxserv   timeouts      fails
                        0.5      4.0      1.9      10.4          0          0
            queue:  avgtime  mintime  maxtime   avgwqsz    avgsqsz     sqfull
                        1.1      0.0     14.1       0.2        1.2         60

Notes:
– Use the -l option for wide output
– For SAN environments, reads > 10 msec and writes > 2 msec are high service times
– All -D outputs are rates, except sqfull, which is an interval delta (recent APARs change this to a rate)
– avgsqsz can't exceed queue_depth for the disk; if sqfull is often > 0, then increase queue_depth
– Average IO sizes: read = bread/rps, write = bwrtn/wps

Virtual adapter's extended throughput report (-D)
Metrics related to transfers (xfer:)
  tps: the number of transfers per second issued to the adapter
  recv: the total number of responses received from the hosting server to this adapter
  sent: the total number of requests sent from this adapter to the hosting server
  partition id: the partition ID of the hosting server, which serves the requests sent by this adapter
Adapter read/write service metrics (read:)
  avgserv: the average time (default is in milliseconds)
  minserv: the minimum time (default is in milliseconds)
  maxserv: the maximum time (default is in milliseconds)
Adapter wait queue metrics (wait:)
  avgtime: the average time spent in the wait queue (default is in milliseconds)
  mintime: the minimum time spent in the wait queue (default is in milliseconds)
  maxtime: the maximum time spent in the wait queue (default is in milliseconds)
  avgwqsz: the average wait queue size
  avgsqsz: the average service queue size (waiting to be sent to the disk)
  sqfull: the number of times the service queue becomes full

nmon adapter ('a') + hdisk busy ('o') and detail ('D')
[Screenshots: pressing 'D' repeatedly cycles hdisk detail views – 'D', 'DDD', 'DDDD']

FC Adapter
# pcmpath query adaptstats
Adapter #: 0
=============
            Total Read   Total Write   Active Read   Active Write   Maximum
I/O:           1105909            78             3              0       200
SECTOR:        8845752             0            24              0        88

A Maximum of 200 with adapter num_cmd_elems=200 means we filled the queue

# fcstat fcs0
FC SCSI Adapter Driver Information
  No DMA Resource Count: 4490          <- Increase max_xfer_size
  No Adapter Elements Count: 105688    <- Consult with IBM
  No Command Resource Count: 133       <- Increase num_cmd_elems

• Number of Command Elements guidance is dependent on the storage system and configuration
• For IBM Storage, consult the associated Redbook
• For other vendors, consult their documentation

FC Adapter Attributes
Fibre channel adapter attributes:
# lsattr -El fcs0
bus_intr_lvl   8355        Bus interrupt level                                 False
bus_io_addr    0xffc00     Bus I/O address                                     False
bus_mem_addr   0xf8040000  Bus memory address                                  False
init_link      al          INIT Link flags                                     True
intr_priority  3           Interrupt priority                                  False
lg_term_dma    0x1000000   Long term DMA                                       True
max_xfer_size  0x100000    Maximum Transfer Size                               True
num_cmd_elems  200         Maximum number of COMMANDS to queue to the adapter  True
pref_alpa      0x1         Preferred AL_PA                                     True
sw_fc_class    2           FC Class for Fabric                                 True

The max_xfer_size attribute also controls a DMA memory area used to hold data for transfer; at the default it is 16 MB
Changing to other allowable values increases it to 128 MB and increases the adapter's bandwidth
Usually not required unless the adapter is pushing many tens of MB/sec
Change to 0x200000 based on guidance from Redbooks or tools (see the sketch below)
This can result in a problem if there isn't enough memory on the PHB chips in the IO drawer with too many adapters/devices on the PHB
Make the change and reboot – check for Defined devices or errors in the error log, and change back if necessary
For NPIV and virtual FC adapters, the DMA memory area is 128 MB at AIX 6.1 TL2 or later
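
A sketch of the change described above (the 0x200000 value follows the Redbook guidance mentioned; -P defers the change to the next reboot):

# chdev -l fcs0 -a max_xfer_size=0x200000 -P
# shutdown -Fr
# lsdev -C | grep -i defined     <- after reboot, check for Defined devices; also review errpt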


nmon FC Monitoring

Option ‘a’ for all adapters or ‘^’ for FC adapters
nmon Analyser shows IO over time


fcstat NPIV Monitoring – NEW!
New breakdown by World Wide Port Name
– fcstat -n wwpn device_name
– Displays the statistics on a virtual port level, specified by the worldwide port name (WWPN) of the virtual adapter.

# fcstat -n C050760547E90000 fcs0
FIBRE CHANNEL STATISTICS REPORT: fcs0
Device Type: 8Gb PCI Express Dual Port FC Adapter (df1000f114108a03) (adapter/pciex/df1000f114108a0)
Serial Number: 1B03205232
Option ROM Version: 02781135
ZA: U2D1.10X5
World Wide Node Name: 0xC050760547E90000
World Wide Port Name: 0xC050760547E90000
FC-4 TYPES:
  Supported: 0x0000012000000000000000000000000000000000000000000000000000000000
  Active:    0x0000010000000000000000000000000000000000000000000000000000000000
Class of Service: 3
Port Speed (supported): 8 GBIT
Port Speed (running): 8 GBIT
Port FC ID: 0x010f00
Port Type: Fabric
Seconds Since Last Reset: 431494

Transmit Statistics          Receive Statistics
-------------------          ------------------
Frames: 2145085              Frames: 1702630
Words:  758610432            Words:  187172864

http://pic.dhe.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7hcg/fcstat.htm

Adapter Performance Chart

Adapter                           Feature  IOPS (4K)  Sustained sequential bandwidth
2 Gbps FC adapter (single port)   5716     38,461     198 MB/s simplex, 385 MB/s duplex
4 Gbps FC adapter (single port)   5758     n/a        DDR slots: 400 MB/s simplex, ~750 MB/s duplex; SDR slots: 400 MB/s simplex, 500 MB/s duplex
4 Gbps FC adapter (dual)          5759     n/a        DDR slots: ~750 MB/s; SDR slots: ~500 MB/s
4 Gbps FC adapter PCI-e           5773     n/a        400 MB/s simplex, ~750 MB/s duplex
4 Gbps FC adapter (dual) PCI-e    5774     n/a        ~750 MB/s
8 Gbps FC dual port PCI-e         5735     142,000    750 MB/s per port simplex, 997 MB/s per port duplex; 1475 MB/s simplex per adapter, 2000 MB/s duplex per adapter
10 Gb FCoE PCIe Dual Port         5708     150,000    930 MB/s per port simplex, 1900 MB/s per port duplex; 1630 MB/s simplex per adapter, 2290 MB/s duplex per adapter

Metrics: Network Monitoring
Network monitoring is different from disk monitoring, as it is not a master/slave model
– Disk requests follow a start/end model
– Network requests may be all send, all receive, or any arbitrary mix
Measure bandwidth
– A 1 Gb Ethernet link can sustain at most ~125 MB/sec of bandwidth
– If a link approaches the bandwidth limit, it is likely a point of resource contention
– Knowledge of topology is important to identify whether the links are 100 Mb, 1 Gb, or 10 Gb per second


Metrics: Network Monitoring
Measure packet rates. Each adapter will be limited to some maximum
number of packets per second for small packets (though the cumulative
bandwidth may still approach the wire limits)
Related health metrics may include:
– Packets dropped (a variety of reasons exist), collision errors, timeout
errors, etc
– These health metrics may imply topology issues or logical resource limits
– VIOS Performance Advisor will report on these errors


Network Capacity
How do I know network capacity?
– You should be able to reach 70-80% of line speed with 1 Gb, but 10 Gb may require special tuning (beyond tcp send/receive space, rfc1323, etc; see the sketch after this list)
– On the VIOS, if you are not near limits:
  • Review CPU and memory; always run uncapped
  • Review netstat -s/-v and entstat for any errors
  • Review CPU, memory and network on the problem client
– On 10 Gb, if you are driving hundreds of thousands of tiny/small packets/sec at 1500 MTU, or have very short latency requirements, tuning will be required:
  • Two POWER7 cores will be required to reach 10 Gb
  • Large Send/Receive, mtu_bypass (assuming AIX clients only)
  • Virtual buffer tunings (backup slide)
  • Nodelay Ack, nagle, dog threads
  • In extreme environments, using dedicated-donating mode on the VIOS
– If tuning is exhausted and you still have issues, then likely the network environment is at fault or APARs are required
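
A sketch of the basic tunables named in the list above (illustrative values only; validate against your network, application mix and AIX level before applying):

# no -p -o rfc1323=1 -o tcp_sendspace=262144 -o tcp_recvspace=262144
# chdev -l en0 -a mtu_bypass=on     <- enables largesend on the interface (recent AIX 6.1/7.1 levels; AIX clients only)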


VIOS Network
CPU requirements for networking are the same whether the partition operates as a dedicated or shared LPAR
– Driving 10 Gb adapters to capacity can take two physical cores
– Larger packets and larger MTU sizes dramatically decrease CPU utilization
Integrated Virtual Ethernet vs Shared Ethernet
– IVE is the performance solution, but will take more memory
– Shared Ethernet 1 Gb performance is competitive with IVE, but 10 Gb performance is more limited for small-MTU receives (~5-6 Gb/sec) without tuning
Virtual Ethernet (Switch)
– Reliable < 1 ms latency times, but driving 1 Gb at normal packet sizes will consume up to two physical cores
– The virtual switch within the hypervisor is not designed to scale to 10 Gb at 1500 MTU for a single session (usually gated by core throughput for a single session)


VIOS Network: CPU Capacity = Throughput
[Chart: effect of 'capping' on VIO TCP/IP throughput – throughput (KBytes/second, 0 to 140,000) vs. packet size (1 to 16,384 bytes), for capped CPU entitlements from 0.1 to 1.0]
Throughput is a function of entitled capacity
*POWER5 1.65 GHz

Shared Ethernet Tools
Review physical adapter values
– seastat
– nmon/topas
– entstat/netstat
– VIOS Performance Advisor
Check the virtual adapter
– Check CPU utilization
– Shared Ethernet
• lsattr -El en#
• entstat
• topas (AIX 6.1)


SEA Monitoring: seastat
Accounting must first be enabled per device
chdev -dev ent* -attr accounting=enabled
Command line for seastat
seastat -d <device_name> -c [-n | -s search_criterion=value]
  <device_name>  shared adapter device
  -c             clears per-client SEA statistics
  -n             displays name resolution on the IP addresses
  -s             search values:
                 MAC address (mac), VLAN id (vlan), IP address (ip), Hostname (host),
                 greater than bytes sent (gbs), greater than bytes recv (gbr),
                 greater than packets sent (gps), greater than packets recv (gpr),
                 smaller than bytes sent (sbs), smaller than bytes recv (sbr),
                 smaller than packets sent (sps), smaller than packets recv (spr)


SEA Monitoring: seastat

$ seastat -d ent5
================================================================================
Advanced Statistics for SEA
Device Name: ent5
================================================================================
MAC: 32:43:23:7A:A3:02
----------------------
VLAN: None
VLAN Priority: None
Hostname: mob76.dfw.ibm.com
IP: 9.19.51.76
Transmit Statistics:          Receive Statistics:
--------------------          -------------------
Packets: 9253924              Packets: 11275899
Bytes:   10899446310          Bytes:   6451956041
================================================================================
MAC: 32:43:23:7A:A3:02
----------------------
VLAN: None
VLAN Priority: None
Transmit Statistics:          Receive Statistics:
--------------------          -------------------
Packets: 36787                Packets: 3492188
Bytes:   2175234              Bytes:   272207726
================================================================================
MAC: 32:43:2B:33:8A:02
----------------------
VLAN: None
VLAN Priority: None
Hostname: sharesvc1.dfw.ibm.com
IP: 9.19.51.239
Transmit Statistics:          Receive Statistics:
--------------------          -------------------
Packets: 10                   Packets: 644762
Bytes:   420                  Bytes:   484764292


SEA Monitoring: topas -E
Usage:
chdev -dev ent* -attr accounting=enabled
topas -E (or, from the topas screen, hit E)

Topas Monitor for host: P7_1_vios1   Interval: 2   Wed Dec 15 10:09:13 2010
=======================================================================
Network                KBPS  I-Pack  O-Pack  KB-In  KB-Out
ent6 (SEA PRIM)        38.7     5.0    29.0    1.8    36.9
|\--ent0 (PHYS)        19.6     4.0    14.0    1.7    17.9
|\--ent5 (VETH)        19.2     1.0    15.0    0.1    19.0
 \--ent4 (VETH CTRL)    0.1     0.0     3.5    0.0     0.1
lo0                     2.7    14.0    14.0    1.3     1.3

Note: In order for this tool to work on a Shared Ethernet Adapter, the layer-3 device (en) cannot be in the Defined state. If you are not using the layer-3 device on the SEA, the easiest way to change the state of the device is to change one of its parameters. The following command will change the state of a Shared Ethernet Adapter's layer-3 device without affecting bridging:
chdev -l <sea_en_device> -a state=down


SEA Monitoring: nmon ‘O’ on VIOS


SEA Monitoring: nmon -O recording option
nmon Analyser V34a: the SEA & SEAPACKET tabs show throughput in KB/sec and packet counts
Unfortunately, Analyser does not provide stacked graphs for SEA aggregation views

FC over Ethernet Adapter (10Gb FC5708)

                                    1500 MTU                  9000 MTU
Test         Direction   Sessions   Single port  Both ports   Single port  Both ports
TCP STREAM   send        1          870 MB/s     1311 MB/s    1076 MB/s    1647 MB/s
                         4          1068 MB/s    1402 MB/s    1111 MB/s    1668 MB/s
             receive     1          785 MB/s     1015 MB/s    1173 MB/s    1393 MB/s
                         4          925 MB/s     992 MB/s     1179 MB/s    1393 MB/s
             duplex      1          1439 MB/s    1712 MB/s    1733 MB/s    2106 MB/s
                         4          1527 MB/s    1914 MB/s    1756 MB/s    2176 MB/s
TCP_Request  1 byte      1          13324 TPS(1)              26171 TPS
& Response   message     150        182062 TPS                237415 TPS

Host: P7 750 4-way, SMT-2, 3.3 GHz, AIX 5.3 TL12, dedicated LPAR, dedicated adapter
Client: P6 570 with two single-port 10 Gb (FC 5769), point-to-point wiring (no ethernet switch)
(1) Single session; 1/TPS round-trip latency is 75 microseconds, default ISNO settings, no interrupt coalescing
AIX 6.1 should do better with SMT4; disk I/O will do better due to larger blocks/buffers

Metrics: IO

Hdisk:
  %busy, IO/sec, KB/sec (iostat, sar -d)
  Read/Write IOPS & KB (iostat)
  Avg Service Time(s) (iostat -D, sar -d)
  Service Queue Full or Wait (iostat -D, sar -d)
Storage Adapter:
  IO/sec (iostat -as)
  Read/Write Bytes (iostat -as, fcstat)
  %IO relative to bandwidth (estimate from adapter rates)
  Service Queue Counters (fcstat, MPIO pkg commands)
Network Adapter:
  Send/Receive Packets (entstat, netstat)
  Send/Receive MB (entstat, netstat)
  %IO relative to bandwidth (estimate from adapter rates)
  Packet errors, drops, timeouts (netstat)

nmon and nmon Analyser provide the metrics shown with a yellow background in the original chart


Sample Thresholds: IO
– Hdisk %busy: n/a (busy is not a reliable relative throughput metric in modern SANs, but is useful for sorting the most active hdisks)
– Hdisk IOPS: > 300 IOPS per spindle with poor service times (expectation for a 15K RPM disk; hdisks comprising an array of physical disks can go higher)
– Hdisk KB in/out: n/a (useful for benchmarking)
– Hdisk service time: read > 15 msec, write > 2 msec (assuming a tuned SAN)
– Hdisk queue full: any persistence over 3 minutes (indicates queue depth tuning should be reviewed)
– FC IOPS & bytes: 80% of nominal capacity (use IBM ATS throughput charts)
– Network packets/sec: > 100K/sec @ 10 Gb; byte rates 80% of nominal capacity (high packet rates typically require advanced tuning)

Trademarks
The following are trademarks of the International Business Machines Corporation in the United States, other countries, or both.
Not all common law marks used by IBM are listed on this page. Failure of a mark to appear does not mean that IBM does not use the mark nor does it mean that the product is not
actively marketed or is not significant within its relevant market.
Those trademarks followed by ® are registered trademarks of IBM in the United States; all others are trademarks or common law marks of IBM in the United States.

For a complete list of IBM Trademarks, see www.ibm.com/legal/copytrade.shtml:
*, AS/400®, e business(logo)®, DBE, ESCO, eServer, FICON, IBM®, IBM (logo)®, iSeries®, MVS, OS/390®, pSeries®, RS/6000®, S/30, VM/ESA®, VSE/ESA,
WebSphere®, xSeries®, z/OS®, zSeries®, z/VM®, System i, System i5, System p, System p5, System x, System z, System z9®, BladeCenter®

The following are trademarks or registered trademarks of other companies.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
HP is a registered trademark of Hewlett-Packard Development Company in the United States, other countries, or both.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel
Corporation or its subsidiaries in the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.
IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.
* All other products may be trademarks or registered trademarks of their respective companies.
Notes:
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will
experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed.
Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual
environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without
notice. Consult your local IBM business contact for information on the product or services available in your area.
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance,
compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.

