Bottlenecks Exposed

Published December 2016


Bottlenecks Exposed:
The Most Frequently Found Performance
Problems – and How to Nail Them!
Dan Downing, VP Testing Services
MENTORA
Atlanta • Boston • DC • San Jose
404.250.6515 • www.mentora.com
Copyright Mentora 2001
Objectives
• Identify common website performance bottlenecks:
– Source (the component they occur on)
– Symptom (how you know there’s a problem)
– Causes (what creates the problem)
– Measurements (how to nail it)
– Cures (how to make it go away)
• Illustrate with examples from B2C, B2B, and B2E cases
Audience: Performance engineers and load-testing experts with intermediate experience
Terms & Concepts
• Application Performance Testing: A repeatable methodology for volume simulation of real-world applications in a customer’s environment, yielding performance results that can be acted on to deliver efficient utilization of computing resources.
• Scalability: The demonstrated ability (or lack thereof) of a system (or component) to yield the same response time for a business process irrespective of the magnitude of the load applied to the system.
• Bottleneck: A hardware component, software component, or process of the system-under-test that causes performance degradation and low scalability under load.
• Resource Utilization: The quantification of a shared computing resource being consumed by an application process or component.
• Symptom: The outwardly visible but unquantifiable effect of a performance bottleneck.
• Cause: The specific and measurable factor yielding one or more symptoms.
• Cure: The specific action applied to the Cause that will measurably improve the visible symptom.
• Measurement: A numeric value of a performance-affecting factor that can be quantified by a monitoring tool and related to a specific component of the system-under-test.
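The Scalability definition above can be turned into a simple check: compare the response time of one business process at each load level against the lowest-load run. The sketch below is only an illustration; the run data and the 20% tolerance are hypothetical values, not figures from the slides.

```python
def scalability_report(runs, tolerance=0.20):
    """Compare each load level's response time with the lowest-load run.

    runs: iterable of (virtual_users, avg_response_seconds) pairs.
    tolerance: allowed fractional slowdown versus the baseline (assumed value).
    """
    ordered = sorted(runs)
    _, baseline = ordered[0]  # the lowest load level serves as the baseline
    report = []
    for users, seconds in ordered:
        degradation = (seconds - baseline) / baseline
        report.append((users, seconds, degradation, degradation <= tolerance))
    return report


# Hypothetical results from four load-test runs of one business process.
runs = [(10, 1.0), (50, 1.1), (100, 1.5), (200, 3.2)]
report = scalability_report(runs)
for users, seconds, degradation, scales in report:
    verdict = "scales" if scales else "degrades"
    print(f"{users:>4} users: {seconds:.1f}s ({degradation:+.0%}) {verdict}")
```

A process that "scales" in this sense keeps its response time roughly flat as load grows; where it stops doing so is the load level at which to start looking for a bottleneck.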
Symptoms
• “It’s too slow”
– As perceived by functional testers through slow browser response
– As measured by poor scalability during the first low-load test
– As experienced (too late!) by real production users through low productivity
• “It’s broken”
– Page ‘never returns’ after button press
– Web server errors (404, 500…)
– Application error messages in application logs
Symptoms are usually very unspecific!
3-Tier Environment
• Network
– Firewall, load balancer, routers, network interface cards, and cabling between all components
• Web Server Tier
– One or more (usually many) low-capacity computers that receive and route HTTP requests from visitors’ browsers and display the results
• Application Server Tier
– One or more (often 2) medium-to-high-capacity computers that receive the HTTP request, apply business logic to it, and return the results to the web server
• Database Server Tier
– One or more (usually one with a redundant stand-by) high-capacity computers that run the database software and access the database (often on large disk arrays) to service user data requests
[Diagram: Web Server (Sun E220), App Server (Sun E420), DB Server (Sun E4500, Oracle)]
Performance Bottleneck Sources
[Diagram: Network, Web Server, App Server, DB Server]
What in your experience* do you find as the relative distribution of bottlenecks?

How often?   Ntwk   Web Srvr
>30%         30%    16%
21-30%       16%    12%
11-20%       25%    40%
<10%         27%    29%

How often?   DB Srvr   App Srvr
>60%         9%        7%
41-60%       29%       21%
21-40%       32%       48%
11-20%       21%       11%
<10%         7%        11%

* Poll results of 56 Mercury Conference ’01 attendees of intermediate to advanced experience.
Performance Bottleneck Sources
In my experience, it’s the application! (~80% of the time)

Tier         My estimate*   Highest range from poll
Network      8%             >30% (30%)
Web Server   12%            11-20% (40%)
App Server   35%            21-40% (48%)
DB Server    45%            21-40% (32%)

Most of the application code resides on the App Server and DB Server tiers (35% + 45% = ~80%).
* % distribution is a SWAG based on experience testing dozens of apps.
Database (Simple) Anatomy
[Diagram: The App Server (e.g. Sun 420) holds a DB Connection Pool that sends SQL to, and receives data from, the DB Server (e.g. Sun 4500, quad CPU, 2 GB memory). Inside the DB Server: Client Comm Buffer, Query Parser, Query Optimizer, Query Plan Storage, Query Executor, Metadata cache, Write Buffer, and Data Cache in Shared Memory. The Data, Log, and BI files sit on a Disk Array (e.g. Sun A10000).]
Key DB Server Measurements
Measurement: Impact/Range
• Server CPU: Shows raw horsepower consumption on the server; should average 70-80%; else add CPUs!
• Server Memory: Memory available should stay constant and average below 70-80%; else add memory.
• Server Page Faults/s: Should be low and constant; else yields virtual memory disk I/O, which indicates insufficient memory allocated to DB processes.
• Cache Hit Ratio: Should be high, in the 90-95% range; else the data cache is sized too low and there is too much physical I/O.
• Deadlocks: Should be zero at target loads; if not, indicates a transaction model design problem.
• Table scan blocks/sec: Should be low for normal transactions (can be high for reporting functions); else indicates that indexes are missing or poorly designed.
• Parse-to-execute ratio: Should be low (<20%); else could indicate an under-sized query cache, old or missing optimizer statistics, or a flawed query model in an app server function.
• Transactions/second: A general indicator of DB load handling; should be compared run-to-run.
• SQL*Net bytes rcvd/sent from/to client: A measure of the data-intensiveness of queries; bytes received should be <50% of bytes sent; else indicates that complex application queries should become stored procedures.
• Open cursors: A measure of the number of open client queries; should be low, or could be an indicator of an inefficient query model.
• Physical reads/writes: Correlates with cache-hit ratio; should decrease run-to-run as the cache is tuned.
• Server I/O: Should be balanced across all drives; else indicates a ‘DB hot spot’ on large, high-access tables, which need to be striped across multiple drives; should average 20% below the disk I/O saturation level.
• DB Memory: Should be ~80% of available user memory on the server, and should average <75%; else, add!
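Several of the thresholds above are simple ratios of raw counters, so they can be checked mechanically after each test run. The sketch below assumes the counters have already been collected into a dictionary; the key names are hypothetical, and the hit-ratio formula (one minus physical reads over logical reads) is the common definition rather than one stated in the slides.

```python
def check_db_counters(c):
    """Return findings for counters that miss the thresholds listed above."""
    findings = []

    # Cache hit ratio: share of logical reads served without physical I/O.
    hit_ratio = 1.0 - c["physical_reads"] / c["logical_reads"]
    if hit_ratio < 0.90:
        findings.append(f"cache hit ratio {hit_ratio:.0%} is below 90%")

    # Parse-to-execute ratio: how often a statement is parsed per execution.
    parse_ratio = c["parses"] / c["executes"]
    if parse_ratio >= 0.20:
        findings.append(f"parse-to-execute ratio {parse_ratio:.0%} is 20% or more")

    # Deadlocks should be zero at target load.
    if c["deadlocks"] > 0:
        findings.append(f"{c['deadlocks']} deadlocks at target load")

    return findings


# Hypothetical counter snapshots from two test runs.
troubled = {"logical_reads": 100_000, "physical_reads": 18_000,
            "parses": 300, "executes": 1_000, "deadlocks": 0}
healthy = {"logical_reads": 100_000, "physical_reads": 5_000,
           "parses": 100, "executes": 1_000, "deadlocks": 0}

for finding in check_db_counters(troubled):
    print("CHECK:", finding)
```

Running the same check after every tuning change gives the run-to-run comparison that the measurements call for.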
DB Server Causes & Cures
Cause / Measurement / Cure
• Inefficient SQL statement
– Measurement: Slow page (>10 sec) which ties to a specific function, thus an SQL query; high DB CPU or I/O
– Cure: Analyze query plan, optimize query
• Inefficient SQL query model
– Measurement: Many slow pages; high ‘bytes recvd’ by DB server; low DB CPU; or: many slow queries
– Cure: Convert client SQL to stored procedures, or optimize slow queries
• Inefficient DB configuration
– Measurement: Low correlation between DB and server resource utilization; unbalanced I/O
– Cure: Reconfigure DB (add memory, write processes, threads, …)
• Overuse of row-at-a-time processing
– Measurement: High open cursors; high bytes sent from client
– Cure: Tune query prepares in app server / code
• Missing/ineffective indexes
– Measurement: High table scan blocks; slow function
– Cure: Find/add/fix table indexes
• Query plan cache too small
– Measurement: High parsed-to-executed queries ratio
– Cure: Raise size of query plan cache
• Inefficient concurrency model
– Measurement: High blocked transactions, high table locks
– Cure: Review/fix transaction logic; modify DB locking strategy
• Data cache too small
– Measurement: Low cache-hit ratio, high physical reads
– Cure: Increase cache size
• Out-of-date statistics
– Measurement: High table scan blocks; many slow functions
– Cure: Rerun optimizer statistics
• Deadlocks
– Measurement: Deadlocks non-zero; errors in error log
– Cure: Fix application transaction code
• Other
– Measurement: Inefficient access method; too many DB connections; small comm buffers; …
– Cure: Pinpoint and correct!
Database Server Causes
• Inefficient SQL statement: 24%
• Inefficient SQL query model: 17%
• Inefficient DB configuration: 14%
• High row-at-a-time logic: 12%
• Missing indexes: 9%
• Inefficient concurrency model: 7%
• Query cache too small: 7%
• Data cache too small: 5%
• Other: 5%
~60% of the time it’s bad SQL or bad indexes!
Example: B2B Supply Chain Management
[Diagram: Web Server (Sun E220, Apache), App Server (Sun E420, WebLogic), DB Server (Sun E420, Oracle)]
• Symptom:
– Transactions that return list data run very slowly; they don’t scale
• Measurement (using LR Oracle Monitor):
– High table scan blocks
– Low index fast full scans
• Cure:
– Add indexes
– Design indexes so queries can be resolved from the indexed columns without accessing the base table
– Enable the Oracle fast scan parameter
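The second cure, an index that contains every column the query needs, can be tried out on a small scale. The sketch below uses SQLite from the Python standard library as a stand-in, not the Oracle system from the example; the table, column, and index names are hypothetical. It prints the query plan for a list query before and after adding such an index.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one string (last column = detail)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_lines ("
    "id INTEGER PRIMARY KEY, supplier_id INTEGER, "
    "part_no TEXT, qty INTEGER, notes TEXT)"
)
conn.executemany(
    "INSERT INTO order_lines (supplier_id, part_no, qty, notes) "
    "VALUES (?, ?, ?, ?)",
    [(i % 50, f"P{i:05d}", i % 7, "filler") for i in range(5000)],
)

# A typical 'list' transaction: all parts and quantities for one supplier.
LIST_QUERY = "SELECT part_no, qty FROM order_lines WHERE supplier_id = ?"

plan_before = query_plan(conn, LIST_QUERY, (7,))

# The index holds the filter column plus every selected column, so the
# query can be answered from the index without touching the base table.
conn.execute(
    "CREATE INDEX idx_supplier_cover ON order_lines (supplier_id, part_no, qty)"
)
plan_after = query_plan(conn, LIST_QUERY, (7,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording differs between SQLite versions and is nothing like Oracle's plan output, but the change from a full table scan to an index-only lookup is the same effect the cure aims for.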
LR Oracle Monitor
[Screenshot: Table scan blocks average = 12; index fast full scans = 0]
App Server (Simple) Anatomy
[Diagram: The Web Server passes client requests to, and receives HTML pages from, the App Server (usually two; e.g. Sun 420, dual CPU, 1 GB memory). Inside the App Server: Connection Mgr, Presentation Manager, Presentation Logic, Business Logic, Object Cache, Security Mgr, Transaction Mgr, DB Conn. Mgr, Messaging Mgr, and Communic. Mgr. The App Server sends SQL to, and receives data from, the DB Server.]
Key App Server Measurements
Measurement: Impact/Range
• Server CPU: Shows raw horsepower consumption on the server; should average 70-80%; else add CPUs!
• Server Page Faults/s: Should be low and constant; else yields virtual memory disk I/O, which indicates insufficient memory allocated to App Server processes.
• Cache Hit Ratios: Should be high, in the 90% range; else the data/object caches are sized too low and there is too much physical I/O.
• App Server memory: Should rise as active sessions grow, shrink in the garbage collection cycle, and stabilize at target load at 70% average; else possible memory leak, or add memory.
• Active/Total Sessions: Should rise as load increases, stabilize at target load, and approximate the vendor target per instance; else, decrease the inactive session keep-alive time.
• SSL transactions/sec: Should be a relatively low ratio vs. non-secure transactions (<15%?); else, eating up CPU and bandwidth.
• Requests/second: A general indicator of app server load as evidenced by web server request volume; should be compared run-to-run and track with the load applied.
• Active/Total DB Pool Connections: Active connections should rise with load and stabilize at less than Total; if they do not stabilize, indicates insufficient processing power to keep up with the DB; if they max out, too few connections.
• Server Memory: Should track App Server memory and stabilize at target load at 70% average; else possible memory leak, or add memory.
• Application log: Should contain few or no error messages and few warnings; else indicates application problems.
• Load balancing: All app server instances should be doing a similar amount of work; else indicates a load balancing problem.
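The three outcomes described for Active/Total DB Pool Connections (stabilizes below Total, keeps rising, or maxes out) can be told apart from a series of periodic samples. This is a minimal sketch under stated assumptions: the sample lists are hypothetical, and the `window` and `tolerance` parameters are invented tuning knobs, not settings from any app server.

```python
def classify_pool(active_samples, total, window=5, tolerance=1):
    """Classify a DB connection pool from periodic 'active connections' samples.

    total: configured pool size.
    window, tolerance: assumed knobs for what counts as 'stabilized'.
    """
    tail = active_samples[-window:]  # most recent samples, taken at target load
    if max(tail) >= total:
        return "maxed out: too few connections in the pool"
    if tail[-1] - tail[0] > tolerance:
        return "still rising: app server may lack processing power"
    return "stable below total"


# Hypothetical samples taken once a minute while load ramps to target.
stable = [2, 5, 9, 12, 14, 15, 15, 14, 15, 15]
maxed = [4, 8, 12, 16, 19, 20, 20, 20, 20, 20]
rising = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

print(classify_pool(stable, total=20))
print(classify_pool(maxed, total=20))
print(classify_pool(rising, total=40))
```

The same shape of check applies to Active/Total Sessions, with the keep-alive time as the setting to revisit when the count keeps rising.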
App Server Metrics & Cures
Cause / Measurement / Cure
• Memory leak
– Measurement: Memory utilization rises steadily, doesn’t recover
– Cure: Find and fix memory-faulty application code
• Inefficient garbage collection
– Measurement: Spikes in transaction times
– Cure: Tune app server load balancing
• Sub-optimal session model
– Measurement: Steadily rising active sessions
– Cure: Tune session keep-alive setting
• Poorly configured App Server
– Measurement: Low correlation between App and HW resource utilization; overall poor performance
– Cure: Validate proper JVM-to-app server match; increase data and object caches; add HW memory
• Insufficient hardware resources
– Measurement: High CPU, memory, I/O utilization
– Cure: Add CPUs, memory; decrease number of App Server instances
• Poorly configured DB connection pool
– Measurement: Steadily rising active connections, high CPU utilization
– Cure: Raise DB connections; lower number of App Server instances
• Inefficiently coded transaction
– Measurement: Slow specific business function
– Cure: Pinpoint and diagnose longest-running business processes
• Inefficient security model
– Measurement: High calls on port 7002
– Cure: Review/relax app security
• Inefficient object access method
– Measurement: Slow object creation
– Cure: Change object access method
• Other
– Measurement: Low OS resources; erratic transaction performance
– Cure: Pinpoint and correct!
App Server Causes
• Memory leak: 15%
• Inefficient garbage collection: 12%
• Sub-optimal session model: 12%
• Poorly configured App Server: 12%
• Inefficiently coded transaction: 11%
• Insufficient hardware resources: 10%
• Poorly configured DB connection pool: 9%
• Inefficient object access method: 5%
• Inefficient DB access architecture: 4%
• Other: 10%
60% of the time: object caching, SQL, DB connection pool;
20% of the time: inefficient application server
Example: B2C Large Retail Web Store
[Diagram: Web Server (Sun E420, Apache), App Server (Sun E420, ATG Dynamo), DB Server (Sun E4500, Oracle)]
• Symptom:
– App server memory leak
• Measurement:
– Steadily increasing, non-recovering memory usage in Dynamo console
– Memory exhausted and app server dies over 8-hour run
• Solution:
– Test individual functions
– Isolate errant function not releasing memory
– Fix code!
– Re-test to validate fix (longevity test)
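"Steadily increasing, non-recovering" memory can be separated from a normal garbage-collection sawtooth by fitting a trend line to memory samples from a longevity run. The sketch below is one possible way to do that, not the method used in the example: the sample data are synthetic, and the 10 MB/hour threshold is a hypothetical value to be tuned per application.

```python
def slope_mb_per_hour(samples):
    """Least-squares slope of (hours_elapsed, used_mb) samples."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den


def looks_like_leak(samples, threshold_mb_per_hour=10.0):
    """Flag a run whose memory trend exceeds the (assumed) threshold."""
    return slope_mb_per_hour(samples) > threshold_mb_per_hour


def sawtooth(i):
    """Synthetic garbage-collection pattern: grows, then drops every 4 samples."""
    return 20 * (i % 4)


# Synthetic 8-hour runs sampled every 15 minutes (32 samples each).
healthy = [(i * 0.25, 300 + sawtooth(i)) for i in range(32)]
leaking = [(i * 0.25, 300 + 80 * (i * 0.25) + sawtooth(i)) for i in range(32)]

print(f"healthy: {slope_mb_per_hour(healthy):6.1f} MB/hour")
print(f"leaking: {slope_mb_per_hour(leaking):6.1f} MB/hour")
```

Running the same fit while exercising one function at a time is one way to carry out the "test individual functions" step and isolate the one that does not release memory.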
Web Server Metrics & Cures
Cause / Measurement / Cure
• Insufficient HW capacity
– Measurement: High CPU, memory, I/O; timeout errors
– Cure: Add CPUs, memory; add web servers; distribute content; add specialized servers (images, streaming media…)
• Poorly configured server
– Measurement: High I/O, high memory utilization, low throughput
– Cure: Tune web server configuration
• Unbalanced load across servers
– Measurement: Uneven utilization across web servers
– Cure: Review/revise load balancing policies
• High SSL transactions
– Measurement: Memory utilization >70%, low throughput; high port 443 calls
– Cure: Review/relax secure transaction model
• Inefficient transaction design
– Measurement: High IP connections per active session
– Cure: Reduce keep-alive time; correct transaction design
• Broken links
– Measurement: Broken link errors
– Cure: Diagnose / fix application
• Security too tight
– Measurement: High firewall-to-web server traffic
– Cure: Direct firewall and user traffic to different ports
• Other
– Measurement: Low OS resource utilization, overall poor throughput
– Cure: Diagnose App, DB servers
Web Server Causes
• Insufficient HW capacity: 18%
• Poorly configured server: 15%
• Unbalanced load across servers: 15%
• High SSL transactions: 13%
• Inefficient transaction design: 11%
• Broken links: 8%
• Security too tight: 8%
• Other: 12%
Major contributor: secure transactions; often: load balancing; sometimes: high-resource specialized functions (external links, email, chat)
Example: B2E Collaborating Communities
[Diagram: Cisco Load Director; Web/App Server (Dell 1550, IIS/Visual Basic); DB Server (Dell 2450, SQL Server)]
• Symptom:
– Slow overall performance
– DB server low activity
• Measurement:
– Web/App server resources maxed out
– Non-scalable transaction times
• Solution:
– Short-term: Move “Chat” function to dedicated server
– Long-term: Re-architect system in Java, separate Web and App tiers, introduce dedicated server for chat and email functions
Network Metrics & Cures
Cause / Measurement / Cure
• Poor network architecture
– Measurement: High latency values in network delay monitor; low throughput
– Cure: Review/tune configuration of NICs, routers, other devices
• Poorly configured/insufficient network interface cards
– Measurement: Low throughput between servers
– Cure: Tune NIC buffers; add 2nd NIC for failover heartbeat
• Security too tight
– Measurement: High traffic between firewall and servers
– Cure: Loosen security policies; redesign application security
• Insufficient overall bandwidth
– Measurement: Low, maxed throughput; high collision rate
– Cure: Get hoster to raise bandwidth ceiling; increase system bandwidth; add NICs for failover functions
• Load balancing ineffective
– Measurement: Uneven load at web servers
– Cure: Revise load balancing policy
• Other
– Measurement: ???
– Cure: ???
Network Causes
• Load balancing ineffective: 22%
• Poor network architecture: 20%
• Security too tight: 15%
• Insufficient overall bandwidth: 13%
• Poorly configured/insufficient NICs: 10%
• Other: 20%
No single major cause; often the problem is load balancing, security, or network architecture.
Example: B2C On-line Printing Services
[Diagram: Cisco Load Director; Web Server (Sun E420); App Server (Sun E420); DB Server (Sun E4500, Oracle)]
• Symptom:
– Low transaction performance scalability under load
– High latency across load balancer
• Measurement:
– Unbalanced load on web server tier
• Solution:
– Replace load balancer (bad hardware)
– Change load balancer policies from IP-based to server-load based
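Why an IP-based policy can unbalance the web tier is easy to see in a toy simulation: when many users arrive through one proxy address, every one of their requests hashes to the same server. The sketch below is an illustration only and does not model the Cisco device from the example; the traffic mix, the server names, and the use of CRC32 as the hash are all assumptions, and the "least loaded" policy counts assigned requests where a real balancer would use live connection counts or server load.

```python
import zlib
from collections import Counter


def ip_based(requests, servers):
    """Pin each client IP to one server by hashing the address."""
    load = Counter({s: 0 for s in servers})
    for ip in requests:
        index = zlib.crc32(ip.encode()) % len(servers)
        load[servers[index]] += 1
    return load


def least_loaded(requests, servers):
    """Send each request to the server that has handled the fewest so far."""
    load = Counter({s: 0 for s in servers})
    for _ in requests:
        target = min(servers, key=lambda s: load[s])
        load[target] += 1
    return load


servers = ["web1", "web2", "web3"]
# Hypothetical traffic: 600 requests arrive via one corporate proxy address,
# 400 more come from 100 distinct client addresses.
requests = ["10.0.0.1"] * 600 + [f"192.168.1.{i % 100 + 1}" for i in range(400)]

ip_load = ip_based(requests, servers)
ll_load = least_loaded(requests, servers)
print("IP-based:    ", dict(ip_load))
print("least loaded:", dict(ll_load))
```

With this traffic mix the IP-based policy puts at least 600 of the 1,000 requests on a single server, while the load-based policy spreads them almost evenly, which is the "uneven load at web servers" measurement and its cure in miniature.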
Monitoring Tools
• LoadRunner
– Transaction performance monitor
– Server resource monitor
– Oracle, SQL Server, and selected app server monitors
– Network delay monitor
• Database performance monitoring tools
– Quest Oracle Instance Monitor, Embarcadero, BMC DB Patrol
• App Server System Console (from app server vendor)
• Java object monitoring tools
– JProbe, Performasure (Sitraka)
• Network Analyzer (aka network sniffer)
• Operating system utilities
– Unix top, sar, vmstat, iostat
– Windows 2000/NT Perfmon
Tool Example: WebLogic Console
[Screenshot: WebLogic Console]
Lessons Learned
1. 80% of the time it is the application or system software, not the infrastructure!
2. Make friends with your app server, DB server, and hardware monitoring tools!
3. The application architect, DBA, and app server experts are indispensable and must be involved during load tests!
4. Arrive armed with the Top 10 things to check for each component!
5. Identify the measurements you need to be able to make!
6. A systems engineer with networking, firewall, and load balancer expertise is very handy!
Questions?
[email protected]
