European Journal of Operational Research 82 (1995) 145-162

Theory and Methodology

Verification and validation of simulation models

Jack P.C. Kleijnen

CentER and Department of Information Systems and Auditing, Katholieke Universiteit Brabant (Tilburg University), P.O. Box 90153, 5000 LE Tilburg, Netherlands

Received September 1992; revised November 1993
Abstract
This paper surveys verification and validation of models, especially simulation models in operations research. For
verification it discusses 1) general good programming practice (such as modular programming), 2) checking
intermediate simulation outputs through tracing and statistical testing per module, 3) statistical testing of final
simulation outputs against analytical results, and 4) animation. For validation it discusses 1) obtaining real-world
data, 2) comparing simulated and real data through simple tests such as graphical, Schruben-Turing, and t tests, 3)
testing whether simulated and real responses are positively correlated and moreover have the same mean, using two
new statistical procedures based on regression analysis, 4) sensitivity analysis based on design of experiments and
regression analysis, and risk or uncertainty analysis based on Monte Carlo sampling, and 5) white versus black box
simulation models. Both verification and validation require good documentation, and are crucial parts of assessment,
credibility, and accreditation. A bibliography with 61 references is included.
Keywords: Simulation; Statistics; Regression; Risk analysis; Modelling
1. Introduction

Terminology in the area of verification and validation or V&V is not standard; see Barlas and Carpenter (1990, p.164, footnote 2), Davis (1992a, p.4), and Murray-Smith (1992). This paper uses the definitions of V&V given in the classic simulation textbook by Law and Kelton (1991, p.299): "Verification is determining that a simulation computer program performs as intended, i.e., debugging the computer program. ... Validation is concerned with determining whether the conceptual simulation model (as opposed to the computer program) is an accurate representation of the system under study". Therefore this paper assumes that verification aims at a 'perfect' computer program, in the sense that the computer code has no programming errors left (it may be made more efficient and more user friendly). Validation, however, cannot be assumed to result in a perfect model, since the perfect model would be the real system itself (by definition, any model is a simplification of reality). The model should be 'good enough', which depends on the goal of the model. For example, some applications need only relative (not absolute) simulation responses corresponding to different scenarios; see Section 3.3.
Another well-known author on V&V in simulation discusses these issues for the various phases of modeling: Sargent (1991, p.38) states "the conceptual model is the mathematical/logical/verbal representation (mimic) of the problem entity developed for a particular study; and the computerized model is the conceptual model implemented on a computer. The conceptual model is developed through an analysis and modelling phase, the computerized model is developed through a computer programming and implementation phase, and inferences about the problem entity are obtained by conducting computer experiments on the computerized model in the experimentation phase". The conceptual model is also discussed in detail by Oral and Kettani (1993).
In practice V&V are important issues. A computer program with bugs may generate output that is sheer nonsense, or worse, it may generate subtle nonsense that goes unnoticed. A nonvalidated model may lead to wrong decisions. In practice, verification and validation are often mixed; see Davis (1992a, pp.5-6) and also Miser (1993, p.212).
The interest in V&V shows a sharp increase in the USA defense community; see Davis (1992a,b), Fossett, Harrison, Weintrob, and Gass (1991), Pace (1993), Pacheco (1988), Williams and Sikora (1991), and Youngblood (1993). In Europe and China the defense organizations also seem to take the initiative; see Kleijnen and Alink (1992) and Wang, Yin, Tang and Xu (1993). The renewed interest in V&V is also illustrated by the publication of a monograph on validation by Knepell and Arangno (1993) and the Special Issue on "Model Validation in Operational Research" of the European Journal of Operational Research; see Landry and Oral (1993).
There is no standard theory on V&V. Neither is there a standard 'box of tools' from which tools are taken in a natural order; see Davis (1992a, p.19) and Landry and Oral (1993). There does exist a plethora of philosophical theories, statistical techniques, software practices, and so on. Several classifications of V&V methods are possible; examples are provided by Davis (1992a), Fossett et al. (1991), Landry and Oral (1993), Oral and Kettani (1993), Pace (1993), and Williams and Sikora (1991). The emphasis of this article is on statistical techniques, which may yield reproducible, objective, quantitative data about the quality of simulation models. To classify these techniques, the paper stresses that in practice the quantities of data on simulation inputs and outputs may vary greatly; also see Bankes (1993), Oral and Kettani (1993, p.223) and Wang et al. (1993). The objective of this paper is to survey statistical V&V techniques. Moreover, it introduces two new statistical techniques for validation (based on familiar regression analysis).

Unfortunately, it will turn out that there are no perfect solutions for the problems of V&V in simulation. The whole process has elements of art as well as science (the title of one of the first books on simulation was The Art of Simulation; see Tocher, 1963). Taking a wider perspective than simulation, Miser (1993, p.207) states: "The nature of scientific inquiry implies that it is impossible to eliminate pitfalls entirely"; also see Majone and Quade (1980).
These problems occur in all types of models (for instance, econometric models) and in all types of computer programs (for example, bookkeeping programs), but this paper concentrates on simulation models in operations research. (Expert systems or, more generally, knowledge based systems are closely related to simulation models; their validation is discussed in Benbasat and Dhaliwal (1989); also see Davis (1992a).)

This article is organized as follows. Section 2 discusses verification. Section 3 examines validation. Section 4 briefly reviews documentation, assessment, credibility, and accreditation. Section 5 gives supplementary literature. Section 6 provides conclusions. It is followed by a list of 61 references. (To avoid dragging along a cumulative list of everything published on V&V in simulation, only those publications are included that either seem to deserve special mention or that are not mentioned in the references of this paper. This paper includes three bibliographies, namely Balci and Sargent (1984a), DeMillo, McCracken, Martin and Passafiume (1987), and Youngblood (1993).)
2. Verification
Once the simulation model has been programmed, the analysts/programmers must check whether this computer code contains any programming errors ('bugs'). Several techniques are applicable, but none is perfect. This paper discusses 1) general good programming practice such as modular programming, 2) checking of intermediate simulation outputs through tracing and statistical testing per module, 3) comparing (through statistical tests) final simulation outputs with analytical results, and 4) animation.
2.1. General good programming practice
Software engineers have developed numerous
procedures for writing good computer programs
and for verifying the resulting software, in gen-
eral (not specifically in simulation). Software en-
gineering is indeed a vast area of research. A few
key terms are: modular programming, object oriented programming, chief programmer's approach, structured walk-throughs, correctness proofs. Details are given in Adrion, Branstad and Cherniavsky (1982), Baber (1987), Dahl (1992), DeMillo et al. (1987), and Whitner and Balci (1989); also see Benbasat and Dhaliwal (1989) and Davis (1992a). A comprehensive bibliography can be found in DeMillo et al. (1987).
Modular testing will be further discussed in
the next subsections. Object orientation was al-
ready implemented in the old simulation lan-
guage Simula 67. The importance of good docu-
mentation for both verification and validation will
be discussed in Section 4.
2.2. Verification of intermediate simulation output
The analysts may calculate some intermediate
simulation results manually, and compare these
results with outputs of the simulation program.
Getting all intermediate results from a computer
program automatically is called tracing. Even if
the analysts do not wish to calculate intermediate
results by hand, they can still 'eyeball' the pro-
gram's trace and look for programming errors.
Davis (1992a, pp.21-23) seems to equate 'eyeballing' with 'face validity'. Modern simulation
software provides tracing facilities and more ad-
vanced 'debuggers'; see Pegden, Shannon and
Sadowski (1990, pp.137-148).
In practice, many simulation programs are very
big. Good programming requires that the com-
puter code be designed modularly (no 'spaghetti
programming'; see Section 2.1 and Davis, 1992a,
p.23). Then the analysts 'divide and conquer' ,
that is, they verify the total computer code, mod-
ule by module. Different members of the team
may check different modules. Some examples now
follow.
1) The analysts may test the pseudorandom
number generator separately, if they had to pro-
gram that generator themselves or they do not
trust the software supplier's expertise. By defini-
tion, random numbers are continuous statistical
variables, uniformly distributed between zero and
one, and statistically independent. The main
problem in practice is that pseudorandom num-
ber generators give outputs that are not indepen-
dent (but show a ' lattice structure'). Selecting a
new generator may result in better statistical be-
havior. Moreover the pseudorandom number
generator may be wrong because of programming
errors: many generators require either machine
programming or rather sophisticated program-
ming in a higher language.
Schriber (1991, p.317) points out that GPSS/H automatically computes chi-square statistics to test the hypothesis that the pseudorandom numbers used in a particular simulation experiment are uniformly distributed. Ripley (1988, p.58)
mentions two simulation studies that gave wrong
results because of an inferior generator. Kleijnen
and Van Groenendaal (1992) provide a detailed
discussion of different types of pseudorandom
number generators and of many tests to verify
their correctness.
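As an illustration of this kind of verification, the following sketch applies a chi-square test of uniformity to a batch of pseudorandom numbers; the generator, the number of bins, and the significance level are illustrative choices, and the test addresses uniformity only (independence, the main practical problem noted above, requires further tests).

import numpy as np
from scipy import stats

def chi_square_uniformity_test(u, bins=20, alpha=0.05):
    """Chi-square test of H0: the numbers u are uniform on (0,1)."""
    observed, _ = np.histogram(u, bins=bins, range=(0.0, 1.0))
    expected = np.full(bins, len(u) / bins)      # equal bin counts under H0
    statistic, p_value = stats.chisquare(observed, expected)
    return statistic, p_value, p_value < alpha   # True means: reject uniformity

rng = np.random.default_rng(seed=123)            # generator under test
print(chi_square_uniformity_test(rng.random(10_000)))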
2) The analysts may further test the subrou-
tines that generate samples from certain non-uni-
form distributions. Experience shows that analysts
may think that the computer gives normal vari-
ates with standard deviation (say) 10, whereas
actually the variates have a variance of 10. This
confusion is caused by the lack of standard notation: some authors and some software use the notation N(μ, σ), whereas others use N(μ; σ²). Similar confusion arises for exponential distributions: some authors use the parameter (say) λ to denote the mean interarrival time, but others use that symbol to denote the arrival rate.

The analysts may also specify the wrong unit of measurement, for instance, seconds instead of minutes. In this example the results are wrong by a factor 60.
To verify that the random variate subroutine does what it is intended to do, the analysts should first of all read the documentation of the subroutine. Next they may estimate the mean and variance of the sampled variable, and compare those statistics with the theoretical values. These values are indeed known in a simulation study; for instance, service times are sampled from an exponential distribution with a known mean, namely the mean that is input to the simulation program. Systematic deviations between the observed statistics and the theoretical values may be detected through parametric or through distribution-free tests. An example of a t test will be discussed in Eq. (4).

Random (not significant, not systematic) deviations between the sample average (say) ȳ and its expected value μ_y always occur (the sample average is a random variable). To reduce the effect of such a deviation, a variance reduction technique (VRT) called control variates can be applied. This VRT corrects x, the simulation output (for example, average waiting time), for the random deviation between the input's sample average and population mean:

x_c = x + β(μ_y − ȳ),   (1)

where a proper choice of the coefficient β means that the variance of the new estimator x_c is reduced. See Kleijnen and Van Groenendaal (1992, pp.200-201).
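A minimal sketch of how Eq. (1) might be applied across independent replications follows; the least-squares choice of β (estimated from the same replications) and the array-based interface are assumptions made for illustration, not prescriptions from the text.

import numpy as np

def control_variate_estimate(x, y, mu_y):
    """Control-variate correction x_c = x + beta*(mu_y - y), per Eq. (1).

    x    : simulation outputs, one per replication (e.g. average waiting times)
    y    : matching input sample averages for the same replications
    mu_y : known population mean of the input variable
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    beta = np.cov(x, y, ddof=1)[0, 1] / np.var(y, ddof=1)    # variance-minimizing beta
    x_c = x + beta * (mu_y - y)
    return x_c.mean(), x_c.std(ddof=1) / np.sqrt(len(x_c))   # estimate and its standard error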
Instead of testing only the mean or variance, the analysts may test the whole distribution of the random variable. Then they can apply a goodness-of-fit test such as the well-known chi-square and Kolmogorov-Smirnov tests; see the survey in Kleijnen (1987, pp.94-95).
2.3. Comparing final simulation outputs with analytical results

2.3.1. Introduction

The final output of (say) a queueing simulation program may result only after millions of customers have been processed. This is indeed the case if the steady state mean waiting time is of interest and traffic intensity is high. Another example is provided by the simulation of 'rare events' such as breakdowns of highly reliable systems. Verifying such types of simulation responses by hand or by eyeballing the trace (discussed in the preceding subsection) is practically impossible. Restricting attention to short time series is misleading.
In these situations the analysts may verify the simulation response by running a simplified version of the simulation program with a known analytical solution. This approach assumes that the analysts can indeed find a 'test case' with a known solution, but this is not an unrealistic assumption. For example, in logistics simulation the analysts often model reality as a queueing system. Then the analysts can use a textbook on queueing theory to find formulas for the steady state expectations of several types of response (mean waiting time of jobs and mean utilizations of machines). These formulas, however, assume Markovian (exponential) arrival and service times, with (say) n servers: M/M/n models. First the analysts can run the simulation program with exponential arrival and service times, only to verify the correctness of the computer program. Suppose the response of that simulation does not significantly deviate from the known mean response (see the statistical test in Eqs. (2)-(4) in Section 2.3.2). Next they run the simulation program with non-exponential input variables to simulate the responses that are of real interest to the users. The analysts must then hope that this minor change in the computer program does not introduce new bugs.
It may be asserted that in all simulation studies the analysts should be guided by knowledge of theoretical models with known solutions, when they study real systems. In many simulation studies the analysts model reality as a (complicated) queueing system. There is much literature on queueing systems. These systems comprise servers, in parallel and in sequence, and customers who can follow different paths through the queueing network. For certain queueing networks (for example, with infinite buffers for work in process) steady state solutions can be computed numerically. Besides numerous textbooks and articles there is software that gives analytical, numerical, and simulation results; see Kleijnen and Van Groenendaal (1992, p.127). Indeed much research is going on in queueing theory with applications in computer, communications, and manufacturing systems. In other areas (for example, inventory management and econometrics) there is also a substantial body of theory available; see Kleijnen and Van Groenendaal (1992). In a mine hunting case study there is an analytical model besides a simulation model; see Kleijnen and Alink (1992). The importance of 'theoretical analysis' is also discussed in Davis (1992a, pp.18-19). So a stream of publications and software can help the simulation analysts to find models that are related to their simulation models and that have analytical or numerical solutions. General systems theory emphasizes that the scope of a study can be reduced by either studying a subsystem only (say, queueing at one specific machine) or by restricting the response types (for example, financial variables only); also see Davis (1992b). In this way the analysts may find simplified models with known responses for certain modules, or they may verify certain response types of the total simulation program.

Simulating a related system with known solution may also be used to reduce the variance through control variates. Now in (1) ȳ denotes the average response of the simulated system with known response, μ_y denotes the known expected value of that response, x is the simulation response of real interest, and x_c is the better estimator; both systems are simulated with common pseudorandom numbers. The more the two systems are similar, the higher is the correlation between their responses and the lower is the variance of the new estimator for the system of real interest. Also see Kleijnen (1974, pp.162-163).

So the effort of simulating a related system with known solution may pay off, not only in debugging but also in variance reduction through control variates. But there are no guarantees!
In some situations no mathematical statistics is needed to verify the correctness of the simplified simulation model, namely if that model has only deterministic inputs (so the simplified simulation is deterministic whereas the simulation model of real interest may be random). One example is an inventory model with constant demand per period, so - under certain other assumptions - the classic 'economic order quantity' (EOQ) solution holds. A second example is a single server queueing model with constant arrival and service times (say) 1/λ and 1/μ respectively with λ/μ < 1, so it is known that the utilization rate of the server is λ/μ and that all customer waiting times are zero. Examples of economic models with deterministic inputs and known outputs are given in Kleijnen and Van Groenendaal (1992, pp.58-64). In these examples the simulation responses must be identical to the theoretical responses (except for numerical inaccuracies).
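The second example can be turned into a concrete verification check. The following sketch simulates a single server queue with constant interarrival time 1/λ and constant service time 1/μ and confirms the known results (zero waiting times, utilization λ/μ); the simple loop and the parameter values are illustrative assumptions.

def dd1_check(arrival_rate=0.5, service_rate=1.0, n_customers=10_000):
    """Deterministic single-server queue with interarrival 1/lambda, service 1/mu.

    With lambda/mu < 1 every waiting time must be exactly zero and the server
    utilization must equal lambda/mu (up to numerical round-off).
    """
    interarrival, service = 1.0 / arrival_rate, 1.0 / service_rate
    server_free_at, busy_time, max_wait = 0.0, 0.0, 0.0
    for i in range(n_customers):
        arrival = i * interarrival
        start = max(arrival, server_free_at)     # customer waits only if server is busy
        max_wait = max(max_wait, start - arrival)
        server_free_at = start + service
        busy_time += service
    return max_wait, busy_time / server_free_at

print(dd1_check())   # expected: (0.0, approximately 0.5)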
2.3.2. Statistical technique

How can analysts compare the output of the simplified simulation program with its known expected value? They should understand that in the steady state the system is still stochastic (but the probability law that governs the stochastic process no longer depends on the initial state), so mathematical statistics is needed. Hence they should use a statistical test to verify that the expected value of y, the simulation response of the simplified simulation program, is equal to the known steady state mean μ_y:

H₀: E(y) = μ_y.   (2)

The well-known Student t test assumes normally and independently distributed (NID) simulation responses y with mean μ_y and variance σ_y². To estimate this unknown variance, the analysts may partition the simulation run into (say) m subruns and compute y_i, the average of subrun i, and ȳ, the average of these m subrun averages (which is identical to the average of the whole simulation run), which yields

s_y² = Σ_{i=1}^{m} (y_i − ȳ)² / (m − 1).   (3)

Then the test statistic becomes

t_{m−1} = (ȳ − μ_y) / (s_y / √m).   (4)
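A minimal sketch of the test in Eqs. (2)-(4) follows; splitting one long run into m equal subruns and the choice of α are illustrative assumptions (the subruns must be long enough to be treated as approximately independent).

import numpy as np
from scipy import stats

def subrun_t_test(output, mu_y, m=20, alpha=0.05):
    """t test of H0: E(y) = mu_y (Eq. (2)) based on m subrun averages.

    output : individual observations from one long run of the simplified model
    mu_y   : known analytical steady-state mean
    """
    subrun_averages = np.array([s.mean() for s in np.array_split(np.asarray(output, float), m)])
    y_bar = subrun_averages.mean()
    s_y = subrun_averages.std(ddof=1)                 # Eq. (3)
    t = (y_bar - mu_y) / (s_y / np.sqrt(m))           # Eq. (4)
    critical = stats.t.ppf(1 - alpha / 2, df=m - 1)
    return t, abs(t) > critical                       # True means: reject H0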
Many simulation responses are indeed approximately normally distributed: a variation of the central limit theorem applies, when the simulation response is the average of autocorrelated waiting times of successive customers. If the simulation response is not (approximately) normal, then the t test may still be applied because this test is not very sensitive to nonnormality, especially if m is large; see Kleijnen (1987, pp.14-23).

(Kleijnen and Van Groenendaal (1992, pp.190-195) present several alternative approaches (such as renewal analysis) to the estimation of the variance of the simulation response in the steady state. Kleijnen (1987, pp.23-25) discusses several distribution-free tests.)
In practice, however, most simulation studies concern the behavior of the real system in the transient state, not the steady state. For example, the users may be interested in the total waiting time during the next day - under various scheduling algorithms (priority rules) - so the simulation run stops as soon as the end of that simulated day is reached. Such types of simulation are called 'terminating' simulations. When verifying such a simulation, there are usually no analytical or numerical solutions available: most solutions hold in the steady state only. The analysts may then first simulate a non-terminating variant of the simulation model, for verification purposes only. Next they change the simulation program, that is, they introduce the terminating event (in the example this event is the 'arrival' of the end of the working day). As pointed out (in Section 2.3.1, paragraph 2), they must then hope that this minor change in the computer program does not introduce new bugs. Again, there is no guarantee (see Section 1).
There is a statistical complication, as virtually all simulation programs have multiple responses (for example, mean waiting time of jobs and mean utilizations of machines). So the computer program transforms (say) S inputs into T outputs with S > 1 and T > 1. That transformation must be correct for all response types of the simplified simulation program with known means. Consequently the probability of rejecting a null-hypothesis like (2) increases as T (the number of responses) increases, even if the program is correct. This property follows from the definition of the type I or α error of a statistical test (different error types will be further discussed in Section 3.2). Fortunately there is a simple solution based on Bonferroni's inequality. Traditionally the t_{m−1} value in (4) is compared with t_{m−1;α/2}, which denotes the critical value taken from the table for the t statistic with m − 1 degrees of freedom, type I error probability fixed at α, in a two-sided test. Using Bonferroni's inequality, the analysts merely replace α by α/T. This implies that bigger discrepancies between the known means and the simulation responses are accepted:

t_{m−1; α/2} ≤ t_{m−1; α/(2T)}.

It can be proved that Bonferroni's inequality keeps the overall 'experimentwise' error probability below the value α. It is recommended to combine the Bonferroni inequality with a value such as α = 0.20 instead of the traditional value 0.05.
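A small sketch of this Bonferroni adjustment for T response types is given below; the interface (a list of t statistics, one per response) is an assumption for illustration.

from scipy import stats

def bonferroni_t_tests(t_values, m, alpha=0.20):
    """Compare each t statistic from Eq. (4) with the alpha/T critical value.

    t_values : t statistics, one per response type (T = len(t_values))
    m        : number of subruns behind each t statistic
    """
    T = len(t_values)
    critical = stats.t.ppf(1 - alpha / (2 * T), df=m - 1)    # two-sided, alpha replaced by alpha/T
    return [abs(t) > critical for t in t_values]             # True means: reject for that response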
(Multivariate techniques provide alternatives to this combination of univariate techniques (such as the t test in Eq. (4)) and Bonferroni's inequality. Multivariate techniques are more sophisticated, but not always more powerful; see Balci and Sargent (1984b), Barlas (1990), and Kleijnen and Van Groenendaal (1992, pp.144,155).)
2.4. Animation

To verify the computer program of a dynamic system, the analysts may use animation. The users then see dynamic displays (moving pictures, cartoons) of the simulated system. Since the users are familiar with the corresponding real system, they can detect programming errors (and conceptual errors too, but that concerns validation). Well-known examples are simulations that show how vehicles defy the laws of nature and cross through each other, and simulations that have customers who miraculously disappear during the simulation run (this was not the programmer's intention, so it concerns verification, not validation).

Most simulation researchers agree that animation may be dangerous too, as the analysts and users tend to concentrate on very short simulation runs, so the problems that occur only in long runs go unnoticed. Of course, good analysts, who are aware of this danger, will continue the run long enough to create a rare event, which is then displayed to the users.
3. Validation

Once the analysts believe that the simulation model is programmed correctly, they must face the next question: is the conceptual simulation model (as opposed to the computer program) an accurate representation of the system under study (see Section 1)?

(A very old philosophical question is: do humans have accurate knowledge of reality or do they have only flickering images of reality, as Plato stated? In this paper, however, we take the view that managers act as if their knowledge of reality were sufficient. Also see Barlas and Carpenter (1990), Landry and Oral (1993), and Naylor, Balintfy, Burdick and Chu (1966, pp.310-320).)
This section discusses 1) obtaining real-world data, which may be scarce or abundant, 2) simple tests for comparing simulated and real data (namely graphical, Schruben-Turing, and t tests), 3) two new simple statistical procedures (based on regression analysis) for testing whether simulated and real responses are positively correlated and, possibly, have the same means too, 4) sensitivity analysis (using statistical design of experiments with its concomitant regression analysis) and risk analysis (based on Monte Carlo sampling), and 5) white and black box simulations.
3.1. Obtaining real-world data

System analysts must explicitly formulate the laws that they think govern the 'system under study', which is a system that already exists or is planned to be installed in the real world. The system concept, however, implies that the analysts must subjectively decide on the boundary of that system and on the attributes to be quantified in the model.

To obtain a valid model, the analysts should try to measure the inputs and outputs of the real system, and the attributes of intermediate variables. In practice, data are available in different quantities, as the next four situations illustrate.

1) Sometimes it is difficult or impossible to obtain relevant data. For example, in simulation studies of nuclear war, it is (fortunately) impossible to get the necessary data. In the simulation of whale population dynamics, a major problem is that data on whale behavior are hard to obtain. In the latter example more effort is needed for data collection. In the former example the analysts may try to show that the exact values of the input data are not critical. These problems will be further analyzed in the subsection on sensitivity analysis (Section 3.4.1).
2) Usually, however, it is possible to get some data. Typically the analysts have data only on the existing system variant or on a few historical variants; for example, the existing manufacturing system with its current scheduling rule.

3) In the military it is common to conduct field tests in order to obtain data on future variants. Kleijnen and Alink (1992) present a case study, namely mine hunting at sea by means of sonar: mine fields are created not by the enemy but by the friendly navy, and a mine hunt is executed in this field to collect data. Davis (1992a) and Fossett et al. (1991) also discuss several field tests for military simulations. Shannon (1975, pp.231-233) briefly discusses military field tests, too. Gray and Murray-Smith (1993) and Murray-Smith (1992) consider aeronautical field tests.

4) In some applications there is an overload of input data, namely if these data are collected electronically. For example, in the simulation of the performance of computer systems, the analysts use hardware and software monitors to collect data on the system state at regular time points (say, each nanosecond) or at each system state change (event). These data can be used to drive the simulation. Another example is provided by point-of-sale (POS) systems: based on the Universal Product Code (UPC), all transactions at the supermarket check-outs are recorded electronically (real-time data collection, data capture at the source); see Little (1991). In the near future more applications will be realized; for example, the geographical positions of trucks and railroad cars will be determined and communicated electronically, and electronic data interchange (EDI) among companies will generate large quantities of data; see Geoffrion (1992) and Sussman (1992).
The further the analysts go back into the past, the more data they get and (as the next subsections will show) the more powerful the validation test will be, unless they go so far back that different laws governed the system. For example, many econometric models do not use data prior to 1945, because the economic infrastructure changed drastically during World War II. Of course, knowing when exactly different laws governed the system is itself a validation issue.

So real-world data may be either scarce or abundant. Moreover the data may show observation error, which complicates the comparison of real and simulated time series. Barlas (1989, p.72) and Kleijnen and Alink (1992) discuss observation errors in a theoretical and a practical situation respectively.

(The time series character of the model inputs and outputs, and the random noise are typical aspects of simulation. Other models - for example, inventory and econometric models - share some of these characteristics with simulation models. Validation of these other types of models does not seem to teach simulation analysts much.)
3.2. Some simple techniques for comparing simulated and real data

Suppose the analysts have succeeded in obtaining data on the real system (see the preceding subsection), and they wish to validate the simulation model. They should then feed real-world input data into the model, in historical order. In the simulation of computer systems this is called trace driven simulation. Davis (1992a, p.6) discusses the use of 'official data bases' to drive military simulations. After running the simulation program, the analysts obtain a time series of simulation output and compare that time series with the historical time series for the output of the existing system.

It is emphasized that in validation the analysts should not sample the simulation input from a (raw or smoothed) distribution of real-world input values. So they must use the historical input values in historical order. After they have validated the simulation model, they should compare different scenarios using sampled inputs, not historical inputs: it is 'certain' that history will never repeat itself exactly. As an example we consider a queueing simulation. To validate the simulation model, we use actual arrival times in historical order. Next we collect these arrival times in a frequency diagram, which we smooth formally by fitting an exponential distribution with a parameter (say) λ. From this distribution we sample arrival times, using pseudorandom numbers. In sensitivity analysis we double the parameter λ to investigate its effect on the average waiting time.

Notice that validation of individual modules with observable inputs and outputs proceeds in exactly the same way as validation of the simulation model as a whole does. Modules with unobservable inputs and outputs can be subjected to sensitivity analyses (see Section 3.4.1).
How can system analysts compare a time series of simulation model output with a historical time series of real output? Several simple techniques are available:

1) The output data of the real system and the simulated system can be plotted such that the horizontal axis denotes time and the vertical axis denotes the real and simulated values respectively. The users may eyeball the timepaths to decide whether the simulation model 'accurately' reflects the phenomena of interest. For example, do the simulation data in a business cycle study indicate an economic downturn at the time such a slump occurred in practice? Do the simulation data in a queueing study show the same saturation behavior (such as exploding queue lengths and blocking) as happened in the real system?

(Barlas (1989, p.68) gives a system dynamics example that seems to allow subjective graphical analysis only, since the time series (simulated and real) show 'highly transient, non-stationary behavior'.)
2) Another simple technique is the Schruben-Turing test. The analysts present a mixture of simulated and real time series to their clients, and challenge them to identify (say) the data that were generated by computer. Of course, these clients may correctly identify some of the data by mere chance. This coincidence, however, the analysts can test statistically.

Turing introduced such an approach to validate Artificial Intelligence computer programs: users were challenged to identify which data (say, chess moves) were generated by computer, and which data were results of human reasoning. Schruben (1980) applies this approach to the validation of simulation models. He adds several statistical tests and presents some case studies. Also see Stanislaw (1986, p.182).
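The paper does not prescribe a particular test for this coincidence; one simple possibility, sketched below under the assumption that pure guessing corresponds to a success probability of 0.5, is a one-sided binomial test of the clients' identification scores.

from scipy import stats

def turing_identification_test(correct, presented, alpha=0.05):
    """Binomial test: do clients spot the computer-generated series better than chance?

    correct   : number of series correctly identified as simulated
    presented : number of series presented
    Rejecting H0 (mere guessing) means the simulation output is distinguishable
    from reality, which argues against the model's validity.
    """
    result = stats.binomtest(correct, presented, p=0.5, alternative="greater")
    return result.pvalue, result.pvalue < alpha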
3) Instead of subjectively eyeballing the simulated and the real time series, the analysts can use mathematical statistics to obtain quantitative data about the quality of the simulation model. The problem, however, is that simulation output data form a time series, whereas practitioners are familiar with elementary statistical procedures that assume identically and independently distributed (i.i.d.) observations. Nevertheless it is easy to derive i.i.d. observations in simulation (so that elementary statistical theory can be applied), as the next example will demonstrate.

Let w_i and v_i denote the average waiting time on day i in the simulation and the real system respectively. Suppose that n days are simulated and observed in reality respectively, so i = 1,...,n. These averages, w_i and v_i, do not need to be computed from a steady state time series of individual waiting times. They may be calculated from the individual waiting times of all customers arriving between 8 a.m. and 5 p.m. Then each day includes a start-up, transient phase. Obviously the simulated averages w_i are i.i.d., and so are the real averages v_i. Suppose further that the historical arrival and service times are used to drive the simulation model. Statistically, this trace driven simulation means that there are n paired (correlated) differences d_i = w_i − v_i, which are i.i.d. Then the t statistic analogous to (4) is

t_{n−1} = (d̄ − δ) / (s_d / √n),   (5)

where d̄ denotes the average of the n d's, δ is the expected value of d, and s_d represents the estimated standard deviation of d.

(The variable d_i = w_i − v_i denotes the difference between simulated and real average waiting time on day i when using the same arrival and service times. Hence d̄ is the average of the n differences between the n average simulated and n average real waiting times per day. Other statistics of interest may be the percentage of customers waiting longer than (say) one minute, the waiting time exceeded by only 10% of the customers, etc. Testing these statistics is discussed in Kleijnen and Van Groenendaal (1992, pp.195-197).)
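A minimal sketch of this trace-driven comparison follows; the arrays of daily averages and the choice δ = 0 under the null-hypothesis are the only assumptions.

import numpy as np
from scipy import stats

def trace_driven_t_test(w, v, alpha=0.05):
    """Paired t test of H0: delta = 0 for daily averages, per Eq. (5).

    w : simulated average waiting time per day (trace-driven)
    v : real average waiting time per day (same days, same inputs)
    """
    d = np.asarray(w, dtype=float) - np.asarray(v, dtype=float)
    n = len(d)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))       # Eq. (5) with delta = 0
    critical = stats.t.ppf(1 - alpha / 2, df=n - 1)
    return t, abs(t) > critical                       # True means: reject the model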
Suppose that the null-hypothesis is H₀: δ = 0, and (5) gives a value t_{n−1} that is significant (|t_{n−1}| > t_{n−1;α/2}). Then the simulation model is rejected, since this model gives average waiting times per day that deviate significantly from reality. In case of a non-significant |t_{n−1}| the conclusion is that the simulated and the real means are 'practically' the same, so the simulation is 'valid enough'. This interpretation, however, deserves some comments.

Strictly speaking, the simulation is only a model, so δ (the expected value of d and hence the expected value of d̄) is never exactly zero. Let us consider three points.

1) The bigger the sample size is, the smaller the critical value t_{n−1;α/2} is; for example, for a fixed α = 0.05 but n = 5 and 121 respectively, t_{n−1;α/2} = 2.776 and 1.980 respectively. So, all other things being equal, a simulation model has a higher chance of being rejected as its sample size is bigger.

2) Simulating 'many' days ('large' n) gives a 'precise' estimate d̄ and hence a significant t_{n−1} (in Eq. (5), s_d/√n goes to zero because of n; in the numerator, d̄ has expected value different from 0; so the test statistic t_{n−1} goes to infinity, whereas the critical value t_{n−1;α/2} goes to z_{α/2}, which denotes the 1 − α/2 quantile of the standard normal variable). So model mis-specification would always lead to rejection if the sample size n were infinite.

3) The t statistic may be significant and yet unimportant. If the sample is very large, then the t statistic is nearly always significant for δ ≠ 0; nevertheless the simulated and the real means may be 'practically' the same, so the simulation is 'valid enough'. For example, if E(w_i) = 1000 and E(v_i) = 1001 (so δ = −1), then the simulation model is good enough for all practical purposes. Also see Fleming and Schoemaker (1992, p.472).
In general, when testing the validity of a model through statistics such as (5), the analysts can make either a 'type I' or a 'type II' error. So they may reject the model while the model is valid: type I or α error. Or they may accept the model while the model is not valid: type II or β error. The probability of a β error is the complement of the 'power' of the test, which is the probability of rejecting the model when the model is indeed wrong. The probability of a type I error in simulation is also called the model builder's risk; the type II error probability is the model user's risk. The power of the test of H₀: δ = 0 increases as the model specification error (the 'true' δ) increases. For example, as (the true) δ goes to infinity so does t_{n−1} in (5), hence the simulation model is rejected (for any n and α, which fix t_{n−1;α/2}). (This power can be computed through the 'non-central' t statistic, which is a t statistic with non-zero mean.) A significance or 'critical' level α (used in t_{n−1;α/2}) means that the type I error probability equals α. The probability of a β error increases as α decreases, given a fixed number of simulated days: as α decreases, the critical value t_{n−1;α/2} increases. To keep the type I probability fixed and to decrease the type II probability, the analysts may increase the number of simulated days: if α is kept constant and n increases, then t_{n−1;α/2} decreases.

The analysts may also make the t test more powerful by applying variance reduction techniques (VRTs), such as control variates (see Eq. (1)). If control variates work, they decrease the variance of w and hence the variance of d (= w − v). Then s_d in (5) has a smaller expected value, and the probability of a high t_{n−1} increases. The simplest and most popular VRT is common (pseudo)random numbers. Running the simulation with real-world inputs is a form of this VRT. It decreases var(d) (not var(w)).

Balci and Sargent (1984b) analyze the theoretical tradeoffs among α and β error probabilities, sample size, and so on.
The selection of a value for α is problematic. Popular values are 0.10 and 0.05. Theoretically, the analysts should determine these values by accounting for the financial consequences - or more generally, the disutilities - of making type I and type II errors respectively. Such an approach is indeed followed in decision theory and in Bayesian analysis; see Bodily (1992), Kleijnen (1980, pp.115-134) and also Davis (1992a, p.20). Because the quantification of these utility functions is extremely difficult in most simulation studies, this paper follows classic statistical theory.
3.3. Two new simple statistical tests for comparing simulated and real data

Two tests based on new interpretations of classic tests in regression analysis are discussed in this subsection.

1) Consider again the example where w_i and v_i denoted the average waiting time on day i in the simulation and the real system respectively, which use the same inputs. Suppose that on day 4 the real average waiting time is relatively high, that is, higher than expected (because service times were relatively high on that day): v_4 > E(v). Then it seems reasonable to require that on that day the simulated average (which uses the same service times) is also relatively high: w_4 > E(w). So the new test checks that v and w are positively correlated: H₀: ρ > 0, where ρ denotes their linear correlation coefficient. (They might have the same mean, so δ = 0 in Eq. (5).) So the analysts may then formulate a less stringent validation test: simulated and real responses do not necessarily have the same mean, but they are positively correlated.

To investigate this correlation, the analysts may plot the n pairs (v_i, w_i). That graphical approach can be formalized through the use of the ordinary least squares (OLS) algorithm. Testing the hypothesis of positively correlated v and w is simple if v and w are bivariate normally distributed. This is a realistic assumption in the example, because of a central limit theorem (see the comment on Eq. (4)). It can be proved that such a bivariate normal distribution implies a linear relationship between the conditional mean of one variable and the value of the other variable:

E(w | v = v) = β₀ + β₁v.   (6)

So the analysts can use OLS to estimate the intercept and slope of the straight line that passes through the 'cloud' of points (v_i, w_i). The proposed test concerns the one-sided hypothesis

H₀: β₁ ≤ 0.   (7)

To test this null-hypothesis, a t statistic can be used, as any textbook on regression analysis shows. This test means that the analysts reject the null-hypothesis and accept the simulation model if there is strong evidence that the simulated and the real responses are positively correlated.
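A sketch of how Eqs. (6)-(7) might be implemented is given below; the use of scipy's linregress and the halving of its two-sided p-value to obtain a one-sided test are implementation assumptions, not details taken from the paper.

from scipy import stats

def positive_correlation_test(v, w, alpha=0.05):
    """Fit Eq. (6) by OLS and test H0: beta_1 <= 0 against beta_1 > 0 (Eq. (7)).

    v : real daily averages, w : simulated daily averages (same days).
    Rejecting H0 supports the simulation model (positive correlation).
    """
    fit = stats.linregress(v, w)                 # w = beta_0 + beta_1 * v + error
    # linregress reports a two-sided p-value for beta_1 = 0; halve it for the
    # one-sided alternative beta_1 > 0 (only meaningful when the slope is positive).
    p_one_sided = fit.pvalue / 2 if fit.slope > 0 else 1 - fit.pvalue / 2
    return fit.intercept, fit.slope, p_one_sided, p_one_sided < alpha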
2) Sometimes simulation is meant to predict absolute responses (not relative responses corresponding to different scenarios; for example, what is the effect of adding one server to a queueing system?). For example, in the mine hunting case study (Kleijnen and Alink, 1992) one of the questions concerns the probability of detecting mines in a certain area: is that probability so high that it makes sense to do a mine sweep? The analysts may then formulate a more stringent test:
(i) the means of w (the simulated response) and v (the historical response) are identical, and
(ii) if a historical observation exceeds its mean, then the corresponding simulated observation tends to exceed its mean too.
These two conditions lead to the composite hypothesis

H₀: β₀ = 0 and β₁ = 1,   (8)

which implies E(w) = E(v) (which was also tested through Eq. (5)) and is more specific than Eq. (7) is.

(Note that β₁ = ρσ_w/σ_v. So if β₁ = 1 and ρ < 1, then σ_w > σ_v: if the model is not perfect (ρ < 1), then its variance exceeds the real variance.)

To test this composite hypothesis, the analysts should compute the Sum of Squared Errors (SSE) with and without that hypothesis (which correspond with the 'reduced' and the 'full' regression model respectively), and compare these two values. If the resulting F statistic is significantly high, the analysts should reject the hypothesis and conclude that the simulation model is not valid. Details on this F test can be found in Kleijnen and Van Groenendaal (1992, pp.209-210).
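The following sketch illustrates one way to carry out this SSE comparison; under the composite hypothesis (8) the 'reduced' model has no free parameters (w = v), and the F statistic with 2 and n − 2 degrees of freedom is a standard construction under these assumptions, not a quotation from the cited reference.

import numpy as np
from scipy import stats

def composite_f_test(v, w, alpha=0.05):
    """F test of H0: beta_0 = 0 and beta_1 = 1 (Eq. (8)) via full vs. reduced SSE.

    Full model:    w = beta_0 + beta_1 * v + error   (OLS fit, 2 parameters)
    Reduced model: w = v                              (H0 imposed, 0 parameters)
    """
    v, w = np.asarray(v, dtype=float), np.asarray(w, dtype=float)
    n = len(v)
    fit = stats.linregress(v, w)
    sse_full = np.sum((w - (fit.intercept + fit.slope * v)) ** 2)
    sse_reduced = np.sum((w - v) ** 2)
    f = ((sse_reduced - sse_full) / 2) / (sse_full / (n - 2))
    p_value = stats.f.sf(f, 2, n - 2)
    return f, p_value, p_value < alpha            # True means: the model is not valid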
Statistical tests require many observations to make them powerful. In validation, however, there are often not many observations on the real system (see Section 3.1). Sometimes, however, there are very many observations. Then not only the means of the simulated and the real time series and their (cross)correlation ρ can be compared, but also the autocorrelations corresponding with lag 1, 2, etc. Spectral analysis is a sophisticated technique that estimates the autocorrelation structure of the simulated and the historical time series respectively, and compares these two structures. Unfortunately, that analysis is rather difficult (and - as stated - requires long time series). Barlas (1989, p.61) criticizes Box-Jenkins models for the same reasons.

Note that Fleming and Schoemaker (1992) discuss the use of regression plots in case of multiple outputs.
3.4. Sensitivity analysis and risk analysis

3.4.1. Sensitivity analysis

Models and submodels (modules) with unobservable inputs and outputs cannot be subjected to the tests of Section 3.2 and Section 3.3. The analysts should then apply sensitivity analysis, in order to determine whether the model's behavior agrees with the judgments of the experts (users and analysts). In case of observable inputs and outputs sensitivity analysis is also useful, as this subsection will show. (The observability of systems is also discussed in Zeigler (1976).)

Sensitivity analysis or what-if analysis is defined in this paper as the systematic investigation of the reaction of model outputs to drastic changes in model inputs and model structure: global (not local) sensitivities. For example, what are the effects if in a queueing simulation the arrival rate doubles; what if the priority rule changes from FIFO to LIFO?
The techniques for sensitivity analysis discussed in this paper are design of experiments and regression analysis. Unfortunately, most practitioners apply an inferior design of experiments: they change one simulation input at a time. Compared with (fractional) factorial designs (such as 2^{K−p} designs), the 'one at a time' designs give estimated effects of input changes that have higher variances (less accurate). Moreover, these designs cannot estimate interactions among inputs. See Kleijnen and Van Groenendaal (1992, pp.167-179).
How can the results of experiments with simulation models be analyzed and used for interpolation and extrapolation? Practitioners often plot the simulation output (say) y versus the simulation input x_k, one plot for each input k with k = 1,...,K. (For example, if the arrival and service rates are changed in an M/M/1 simulation then K = 2.) More refined plots are conceivable, for instance, superimposed plots. Also see the 'spiderplots' and 'tornado diagrams' in Eschenbach (1992).

This practice can be formalized through regression analysis. So let y_i denote the simulation response (for example, average waiting time per day) in combination (or run) i of the K simulation inputs, with i = 1,...,n, where n denotes the total number of simulation runs. Further let x_{ik} be the value of simulation input k in combination i, β_k the main or first order effect of input k, β_{kk'} the interaction between inputs k and k', and e_i the approximation (fitting) error in run i. Then the input/output behavior of the simulation model may be approximated through the regression (meta)model

y_i = β₀ + Σ_{k=1}^{K} β_k x_{ik} + Σ_{k=1}^{K−1} Σ_{k'=k+1}^{K} β_{kk'} x_{ik} x_{ik'} + e_i.   (9)
Of course, the validity of this approximation must be tested. Cross-validation uses some simulation inputs and the concomitant output data to get estimated regression parameters β̂. Next it employs the estimated regression model to compute the forecast ŷ for some other input combinations. The comparison of forecasted output ŷ and simulated output y is used to validate the regression model. See Kleijnen and Van Groenendaal (1992, pp.156-157).
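A minimal sketch of Eq. (9) combined with a 2^K factorial design and a forecast check in the spirit of cross-validation is shown below; the toy response function standing in for the simulation program, the coded -1/+1 input levels, and the use of ordinary least squares via NumPy are all illustrative assumptions.

import itertools
import numpy as np

rng = np.random.default_rng(0)

def run_simulation(x):
    """Stand-in for the simulation program (hypothetical toy response)."""
    return 5.0 + 2.0 * x[0] - 1.5 * x[1] + 0.5 * x[0] * x[1] + rng.normal(0.0, 0.1)

def metamodel_matrix(X):
    """Regressor columns of Eq. (9): intercept, main effects, two-factor interactions."""
    n, K = X.shape
    cols = [np.ones(n)] + [X[:, k] for k in range(K)]
    cols += [X[:, k] * X[:, kk] for k in range(K) for kk in range(k + 1, K)]
    return np.column_stack(cols)

# 2^K full factorial design in coded units (-1, +1); here K = 2 inputs,
# say the arrival rate and the service rate of a queueing simulation.
X = np.array(list(itertools.product([-1.0, 1.0], repeat=2)))
y = np.array([run_simulation(x) for x in X])

beta_hat, *_ = np.linalg.lstsq(metamodel_matrix(X), y, rcond=None)   # estimated effects

# Forecast a new input combination with the metamodel and compare it with
# a fresh simulation run there (cross-validation in spirit).
x_new = np.array([[0.0, 0.0]])
print(beta_hat, metamodel_matrix(x_new) @ beta_hat, run_simulation(x_new[0]))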
Inputs may be qualitative. An example is the priority rule in a queueing simulation. Technically, binary variables (x_{ik} is zero or one) are then needed; see Kleijnen (1987).

An example of experimental design and regression analysis is provided by Kleijnen, Rotmans and Van Ham (1992). They apply these techniques to several modules of a (deterministic) simulation model of the greenhouse effect of carbon dioxide (CO2) and other gases. This approach gives estimates β̂ of the effects of the various inputs. These estimated effects should have the right signs: the users (not the statisticians) know that certain inputs increase the global temperature. Wrong signs indicate computer errors (see Section 2) or conceptual errors. Indeed Kleijnen et al. (1992, p.415) give examples of sensitivity estimates with the wrong signs, which led to correction of the simulation model. One more example is given by Kleijnen and Alink (1992). The role of experimental design in V&V of simulation models is also discussed in Gray and Murray-Smith (1993), Murray-Smith (1992), and Pacheco (1988).
Classic experimental designs (with n > K),
however, require too much computer time, when
the simulation study is still in its early (pilot)
phase. Then very many inputs may be conceivably
important. Bettonvil and Kleijnen (1991) derive a
screening technique based on sequential experi-
mentation with the simulation model. They split
up (bifurcate) the aggregated inputs as the exper-
iment proceeds, until finally the important indi-
vidual inputs are identified and their effects are
estimated. They apply this technique to the eco-
logical simulation mentioned above. In this appli-
cation there are 281 inputs. It is remarkable that
this statistical technique identifies some inputs
that were originally thought to be unimportant by
the users.
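The following is a deliberately simplified sketch of such a screening procedure, not Bettonvil and Kleijnen's actual algorithm: it assumes non-negative, additive effects and a deterministic response, switches whole groups of inputs from their low to their high levels, and keeps splitting (bifurcating) only those groups whose aggregated effect exceeds a user-chosen threshold.

```python
def screen(simulate, n_inputs, threshold):
    """Simplified sequential bifurcation under the assumptions stated above.

    simulate(high_set) must return the simulation output when the inputs whose
    indices are in high_set are at their high level and all others are low.
    Returns the indices of inputs whose individual effect exceeds threshold.
    """
    def group_effect(group):
        # Effect of switching the whole group from its low to its high level.
        return simulate(set(group)) - simulate(set())

    important = []
    groups = [list(range(n_inputs))]          # start with one aggregated group
    while groups:
        group = groups.pop()
        if group_effect(group) <= threshold:
            continue                          # whole group unimportant
        if len(group) == 1:
            important.append(group[0])        # individual important input found
        else:
            mid = len(group) // 2
            groups.append(group[:mid])        # bifurcate and continue
            groups.append(group[mid:])
    return sorted(important)

if __name__ == "__main__":
    # Hypothetical deterministic 'simulation': only inputs 3 and 17 matter.
    true_effects = {3: 5.0, 17: 2.0}
    def simulate(high_set):
        return sum(effect for k, effect in true_effects.items() if k in high_set)

    print(screen(simulate, n_inputs=32, threshold=1.0))   # -> [3, 17]
```

In this toy example only a handful of group runs are needed to find the two important inputs among 32, which illustrates why such screening is attractive in the pilot phase of a study with very many inputs.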
The magnitudes of the sensitivity estimates
show which inputs are important. For important
inputs the analysts should try to collect data on
the input values that may occur in practice. If the
analysts succeed, then the validation techniques of the preceding subsections can be applied.
(If the simulation inputs are under the decision makers' control, then these inputs should be steered in the right direction. The regression (meta)model can help the analysts determine the directions in which those inputs should be steered. For example, in the greenhouse case the governments should restrict emissions of the gases concerned.)
Before executing the experimental design (either a one at a time or a fractional factorial design), the analysts must determine the experimental domain or experimental frame. The design tells how to explore this domain, using the expertise of the statistician. Zeigler (1976, p.30) defines the experimental frame as "a limited set of circumstances under which the real system is to be observed or experimented with". He emphasizes that "a model may be valid in one experimental frame but invalid in another". This paper (Section 3.1) has already mentioned that going far back into the past may yield historical data that are not representative of the current system; that is, the old system was ruled by different laws. Similarly, a model is accurate only if the values of its input data remain within a certain area. For example, Bettonvil and Kleijnen's (1991) screening study shows that the greenhouse simulation is valid only if the simulation input values range over a relatively small area. Some authors (for example, Banks, 1989, and Barlas, 1989), however, claim that a model should remain valid under extreme conditions. This paper rejects that claim, but perhaps this disagreement is a matter of definition: what is 'extreme'?
So the simulation model is valid within a certain area of its inputs only (the area may be defined as the K-dimensional hypercube formed by the K input ranges). Within that area the simulation model's input/output behavior may vary. For example, a first order regression (meta)model (see Eq. (9) with the double summation term eliminated) is a good approximation of the input/output behavior of a simulated M/M/1 system, only if the traffic load is 'low'. When traffic is heavy, a second order regression model or a logarithmic transformation may apply.
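This point can be illustrated with a small numerical sketch that uses the analytical M/M/1 mean waiting time as a stand-in for the simulation response: a first order polynomial in the traffic load fits well over a low-traffic area but much worse over an area that includes heavy traffic.

```python
import numpy as np

def mm1_waiting_time(rho, service_rate=1.0):
    """Analytical steady-state mean waiting time in the queue of an M/M/1
    system with traffic load rho = arrival_rate / service_rate < 1."""
    return rho / (service_rate * (1.0 - rho))

def max_abs_fit_error(rho_values, degree):
    """Maximum absolute deviation of a least-squares polynomial fit."""
    y = mm1_waiting_time(rho_values)
    coef = np.polyfit(rho_values, y, degree)
    return np.max(np.abs(np.polyval(coef, rho_values) - y))

low  = np.linspace(0.05, 0.30, 10)    # 'low' traffic area
wide = np.linspace(0.05, 0.90, 10)    # area including heavy traffic

for label, rho in [("low traffic", low), ("up to heavy traffic", wide)]:
    print(label,
          "- first order error:",  round(max_abs_fit_error(rho, 1), 3),
          "second order error:", round(max_abs_fit_error(rho, 2), 3))
# Typically the first order fit is adequate only over the low-traffic area;
# over the wide area a second order model (or a transformation) is needed.
```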
Our conclusion is that sensitivity analysis should be applied to find out which inputs are really important. That information is useful, even if there are many data on the input and output of the simulated system (see the first paragraph of Section 3.4.1). Collecting information on the important inputs - if possible - is worth the effort. However, it may be impossible or impractical to collect reliable information on those inputs, as the examples of the whale and the nuclear attack simulations have already demonstrated (see Section 3.1). Then the analysts may apply the following technique.
3.4.2. Risk analysis
In risk analysis or uncertainty analysis the analysts first derive a probability distribution of input values, using the clients' expert knowledge. Next they use Monte Carlo sampling to generate input values from those distributions. These values are fed into the simulation model, which yields a probability distribution of output values. Technical details and applications are given by Bodily (1992), Kleijnen and Van Groenendaal (1992, pp. 75-78), and Krumm and Rolle (1992).
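A minimal sketch of such a risk analysis might look as follows; the input distributions and the simple model below are hypothetical stand-ins for the clients' expert knowledge and the real simulation model.

```python
import numpy as np

rng = np.random.default_rng(seed=123)

def simulation_model(arrival_rate, service_rate):
    """Hypothetical stand-in for the (expensive) simulation model."""
    rho = arrival_rate / service_rate
    return rho / (service_rate - arrival_rate)   # M/M/1 mean waiting time

n_samples = 10_000
# Input distributions elicited from the clients (assumed here for illustration).
arrival = rng.triangular(left=0.4, mode=0.5, right=0.7, size=n_samples)
service = rng.uniform(low=0.9, high=1.2, size=n_samples)

outputs = simulation_model(arrival, service)

# The result is a probability distribution of the output, summarized e.g. by
# its mean and a 90% interval, which quantifies the risk due to input uncertainty.
print("mean output:", outputs.mean())
print("5% and 95% quantiles:", np.percentile(outputs, [5, 95]))
```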
The study of the sensitivity to the input distributions assumed in the risk analysis may be called robustness analysis. The relationships among sensitivity, risk, and robustness analyses require more research; see Kleijnen (1994).
3.5. White box simulation versus black box simulation
Karplus (1983) perceives a whole spectrum of mathematical models (not only simulation models), ranging from black box (noncausal) models in the social sciences through gray box models in ecology to white box (causal) models in physics and astronomy. What does this classification scheme mean for the validation of simulation models, especially in operations research (OR)?
(This range of model types is also found in OR: examples are regression analysis (black box), linear programming (gray box), and inventory control (white box). Also see Oral and Kettani (1993).)
A typical aspect of many simulation studies is
that their conceptual models are based on common sense and on direct observation of the real system: white box simulation. For example, logistic problems in a factory may be studied through a simulation program that models the factory as a queueing network. This model can directly incorporate intuitive knowledge about the real system: a job arrives, looks for an idle machine in the first stage of the production process, leaves the machine upon completion of the required service, goes to the second stage of its fabrication sequence, and so on (if expediting of jobs is observed in the real system, then this complication can be included in the simulation). Counter-intuitive behavior of the model may indicate either programming and modeling errors or new insight (surprise value of information; see Kleijnen, 1980, pp. 115-134, and Richardson and Pugh, 1981, pp. 317-319).
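As an illustration, such a white box factory model might be sketched as a two-stage tandem queueing network with one FIFO machine per stage; all parameter values below are hypothetical.

```python
import random

def simulate_factory(n_jobs=10_000, arrival_rate=0.8,
                     service_rates=(1.0, 1.2), seed=1):
    """Toy white box model: jobs flow through a two-stage tandem queueing
    network with one FIFO machine per stage. Returns the average time a job
    spends in the factory. (A sketch only, with hypothetical parameters.)"""
    rng = random.Random(seed)
    arrival = 0.0
    done = [0.0, 0.0]          # time at which each machine becomes idle again
    total_time = 0.0
    for _ in range(n_jobs):
        arrival += rng.expovariate(arrival_rate)    # next job arrives
        t = arrival
        for stage, mu in enumerate(service_rates):
            start = max(t, done[stage])             # wait for an idle machine
            t = start + rng.expovariate(mu)         # service completion
            done[stage] = t                         # machine busy until t
        total_time += t - arrival                   # job leaves the factory
    return total_time / n_jobs

if __name__ == "__main__":
    print("average time in factory:", simulate_factory())
    # Counter-intuitive output (for example, a value that decreases when the
    # arrival rate increases) would point to programming or modeling errors.
```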
The analysts can further apply a bottom-up approach: connecting the submodels (or modules) for the individual factory departments, they develop the total simulation model. In this way the simulation grows in complexity and - hopefully - realism. (Davis (1992b) examines combining models of different resolution (aggregation) that were not originally designed to be combined. Bankes (1993) criticizes large simulation models used in policy analysis.)
Animation is a good means to obtain face validity of white box simulation models. Moreover, many white box systems have relatively many data available (so Karplus's classification is related, not orthogonal, to the classification used in this paper). Then the statistical tests discussed in Section 3.2 and Section 3.3 can be applied.
In some application areas, however, simulation models are black box models. Examples are plentiful in aggregated econometric modeling: macroeconomic consumption functions relate total national consumption to Gross National Product (GNP); see Kleijnen and Van Groenendaal (1992, pp. 57-69). The validation of black box models is more difficult, since (by definition) the analysts cannot measure the internal relationships and the internal data of these models. Maybe they can measure input and output data, and apply the tests of Section 3.2 and Section 3.3; also see Bankes (1993) and Pagan (1989). Models and submodels with unobservable inputs and outputs can be subjected to the sensitivity analysis of Section 3.4.1.
In black box models the emphasis in validation is on prediction, not explanation. Nevertheless sensitivity analysis of black box models may give estimated effects of various inputs that have wrong signs. These wrong signs indicate computer errors or conceptual errors. Prediction versus explanation in validation is discussed in more detail in Davis (1992a, pp. 7-10).
Some analysts use model calibration, that is, they adjust the simulation model's parameters (using some minimization algorithm) such that the simulated output deviates minimally from the real output. (Obviously, those latter data can not be used to validate the model.) Examples can be found in ecological modeling; see Beck (1987). Another example is provided by the mine hunting simulation in Kleijnen and Alink (1992), which uses an artificial parameter to steer the simulation response into the direction of the observed real responses. Calibration is a last resort employed in black box simulation. Davis (1992b) discusses how aggregated models can be calibrated using detailed models. Also see Bankes (1993, p.443).
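A sketch of such a calibration step, using a general-purpose minimization routine, is given below; the 'simulation model', its artificial parameter, and the observed real outputs are hypothetical placeholders.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Observed real-world outputs (hypothetical data).
real_output = np.array([2.1, 2.4, 2.0, 2.6, 2.3])

def simulation_output(theta):
    """Hypothetical simulation model whose output depends on an artificial
    calibration parameter theta (one output per observed case)."""
    return theta * np.array([1.0, 1.1, 0.9, 1.2, 1.05])

def calibration_loss(theta):
    # Squared deviation between simulated and real outputs.
    return np.sum((simulation_output(theta) - real_output) ** 2)

result = minimize_scalar(calibration_loss, bounds=(0.0, 10.0), method="bounded")
print("calibrated parameter:", result.x)
# Note: the real data used here for calibration can no longer be used to
# validate the model (see the text above).
```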
4. Documentation, assessment, credibility, and accreditation
The model's assumptions and input values determine whether the model is valid, and whether it will remain valid when the real system and its environment change: the model maintenance problem. Therefore the analysts should provide information on these assumptions and input values in the model's documentation. In practice, however, many assumptions are left implicit, deliberately or accidentally. And input data, including scenarios, are left undocumented. (Davis (1992a, p.4) distinguishes between 'bare model' and 'data base', which corresponds with the terms 'model' and 'input data' in this paper.)
V&V are important components of assessment, defined as "a process by which interested
parties (who were not involved in a model's ori-
gins, development, and implementation) can de-
termine, with some level of confidence, whether
or not a model's result can be used in decision
making" (Fossett et al., 1991, p.711). To enable
users to assess a simulation model, it is necessary
to have good documentation. Assessment is dis-
cussed at length in Davis (1992a); also see Oral
and Kettani (1993, p.229).
Credibility is "the level of confidence in [a
simulation's] results"; see Fossett et al. (1991,
p.712). These authors present a framework for
assessing this credibility. That framework com-
prises 14 inputs. These inputs have also been
discussed in this paper, explicitly or implicitly.
They apply their framework to three military
weapon simulations.
V&V are important components of accredita-
tion, which is "an official determination that a
model is acceptable for a specific purpose", see
Davis (1992a), Gass (1993), and Williams and
Sikora (1991).
The present paper shows that V&V have many
aspects, involve different parties, and require
good documentation. Gass (1984) proposes to
produce four manuals, namely for analysts, users,
programmers, and managers respectively.
(The lack of good documentation is a problem,
not only with simulation programs but also with
other types of mathematical models and with
software in general; see Section 2.1.)
5. Supplementary literature
V&V of simulation models have been dis-
cussed in many textbooks on simulation. Exam-
ples are Banks and Carson (1984), Law and Kel-
ton (1991, pp.298-324), and Pegden et al. (1990,
pp.133-162). These books give many additional references. Stanislaw (1986) gives many references
to the behavioral sciences.
Some case studies were mentioned above. In
addition, Kleijnen (1993) gives a production-plan-
ning case study, Carson (1989) presents a cigarette
fabrication study, and Davis (1992a) gives sum-
maries of several military studies.
Dekker, Groenendijk and Sliggers (1990) dis-
cuss V&V of models that are used to compute
air pollution. These models are employed to issue
permits for building new factories and the like.
Banks (1989) proposes control charts, which
are well-known from quality control. Reckhow
(1989) discusses several more statistical tech-
niques.
Hodges (1991) gives a more polemical discus-
sion of validation.
Findler and Mazur (1990) present an approach
based on Artificial Intelligence methodology, to
verify and validate simulation models.
In case no data are available, Diener, Hicks
and Long (1992) propose to compare the new
simulation model to the old well-accepted but
non-validated simulation model, assuming the lat-
ter type of simulation is available. Also see Mur-
ray-Smith (1992).
Balci and Sargent (1984a) and Youngblood
(1993) give detailed bibliographies. The refer-
ences of this paper augment those bibliographies.
6. Conclusions
This paper surveyed verification and validation
(V&V) of models, especially simulation models
in operations research. It emphasized statistical
techniques that yield reproducible, objective,
quantitative data about the quality of simulation
models.
For verification it discussed the following tech-
niques (see Section 2):
1) general good programming practice such as
modular programming;
2) checking of intermediate simulation outputs
through tracing and statistical testing per module
(for example, the module for sampling random
variables);
3) comparing final simulation outputs with
analytical results for simplified simulation mod-
els, using statistical tests;
4) animation.
For validation it discussed the following tech-
niques (see Section 3):
1) obtaining real-world data, which may be
scarce or abundant;
2) simple tests for comparing simulated and
real data: graphical, Schruben-Turing, and t tests;
3) two new simple statistical procedures for testing whether simulated and real responses are positively correlated and, possibly, have the same means too;
4) sensitivity analysis (based on design of experiments and regression analysis) and risk analysis (Monte Carlo sampling) for estimating which inputs are really important and for quantifying the risks associated with inputs for which no data can be obtained at all, respectively;
5) white and black box simulations.
Both verification and validation require good documentation. V&V are crucial parts of assessment, credibility, and accreditation. Supplementary literature on V&V is given for further study.
This essay demonstrates the usefulness of mathematical statistics in V&V. Nevertheless, analysts and users of a simulation model should be convinced of its validity, not only by statistics but also by other procedures; for example, animation (which may yield face validity).
It seems impossible to prescribe a fixed order for applying the various V&V techniques. In some applications certain techniques do not apply at all. Practice shows that V&V techniques are applied in a haphazard way. Hopefully, this paper stimulates simulation analysts and users to pay more attention to the various aspects of V&V and to apply some of the techniques presented in this paper. The taxonomy discussed in detail in this paper, and the other taxonomies referred to, may also serve as checklists for practitioners. Nevertheless, simulation will remain both an art and a science.
Acknowledgements
The reviews by three referees led to drastic reorganizations and expansions of previous versions of this paper.
References
Adrion, W.R., Branstad, M.A., and Cherniavsky, J.C. (1982),
"Validation, verification and testing of computer soft-
ware", ACM Computing Surveys 14, 159-192.
Baber, R. (1987), The Spine of Software; Designing Provable
Correct Software: Theory and Practice, Wiley, Chichester.
Balci, O., and Sargent, R.G. (1984a), "A bibliography on the
credibility, assessment and validation of simulation and
mathematical models", Simuletter 15/3, 15-27.
Balci, O., and Sargent, R.G. (1984b), "Validation of simulation models via simultaneous confidence intervals", American Journal of Mathematical and Management Science 4/3-4, 375-406.
Bankes, S. (1993), "Exploratory modeling for policy analysis", Operations Research 41/3, 435-449.
Banks, J. (1989), "Testing, understanding and validating com-
plex simulation models," in: Proceedings of the 1989 Winter
Simulation Conference.
Banks, J., and Carson, J.S. (1984), Discrete-event System Simu-
lation, Prentice-Hall, Englewood Cliffs, NJ.
Barlas, Y. (1989), "Multiple tests for validation of system dynamics type of simulation models", European Journal of Operational Research 42/1, 59-87.
Barlas, Y. (1990), "An autocorrelation function test for output
validation," Simulation 56, 7-16.
Barlas, Y., and Carpenter, S. (1990), "Philosophical roots of
model validation: Two paradigms", System Dynamics Review 6/2, 148-166.
Beck, M.B. (1987), "Water quality modeling: A review of the analysis of uncertainty", Water Resources Research 23/8, 1393-1442.
Benbasat, I., and Dhaliwal, J.S. (1989), "A framework for the validation of knowledge acquisition", Knowledge Acquisition 1, 215-233.
Bettonvil, B., and Kleijnen, J.P.C. (1991), "Identifying the
important factors in simulation models with many factors,"
Tilburg University.
Bodily, S.E. (1992), "Introduction; the practice of decision
and risk analysis", Interfaces 22/6, 1-4.
Carson, J.S. (1989), "Verification and validation: A consul-
tant' s perspective," in: Proceedings of the 1989 Winter
Simulation Conference.
Dahl, O. (1992), Verifiable Programming, Prentice-Hall, En-
glewood Cliffs, NJ.
Davis, P.K. (1992a), "Generalizing concepts of verification,
validation, and accreditation (VV&A) for military simula-
tion," RAND, October 1992a (to be published as R-4249-
ACQ).
Davis, P.K. (1992b), "An introduction to variable-resolution
modeling and cross-resolution model connection," RAND,
October 1992b (to be published as R-4252-DARPA).
Dekker, C.M., Groenendijk, A., and Sliggers, C.J. (1990),
"Kwaliteitskriteria voor modellen om luchtverontreiniging
te berekenen" (Quality criteria for models to compute air
pollution), Report 90, VROM, Leidschendam, Nether-
lands.
DeMillo, R.A., McCracken, W.M., Martin, R.J., and Passafi-
ume, J.F. (1987), Software Testing and Evaluation, Ben-
jamin/Cummings, Menlo Park, CA.
Diener, D.A. Hicks, H.R., and Long, L.L. (1992), "Compari-
son of models: Ex post facto validation/acceptance?", in:
Proceedings of the 1992 Winter Simulation Conference.
Eschenbach, T.G. (1992), "Spiderplots versus tornado dia-
grams for sensitivity analysis", Interfaces 22/6, 40-46.
Findler, N.V., and Mazur, N.M. (1990), "A system for automatic model verification and validation", Transactions of the Society for Computer Simulation 6/3, 153-172.
Fleming, R.A., and Schoemaker, C.A. (1992), "Evaluating
models for spruce budworm-forest management: Compar-
ing output with regional field data", Ecological Applica-
tions 2/4, 466-477.
Fossett, C.A., Harrison, D., Weintrob, H., and Gass, S.I.
(1991), "An assessment procedure for simulation models:
A case study", Operations Research 39/5, 710-723.
Gass, S.I. (1984), "Documenting a computer-based model", Interfaces 14/3, 84-93.
Gass, S.I. (1993), "Model accreditation: A rationale and pro-
cess for determining a numerical rating", European Jour-
nal of Operational Research 66/2, 250-258.
Geoffrion, A.M. (1992), "Forces, trends and opportunities in
MS/OR", Operations Research 40/3, 423-445.
Gray, G.J. and Murray-Smith, D.J. (1993), "The external
validation of nonlinear models for helicopter dynamics",
in: R. Pooley and R. Zobel (eds.), Proceedings of the
United Kingdom Simulation Society Conference, UKSS.
Hodges, J.S. (1991), "Six (or so) things you can do with a bad
model", Operations Research 39/3, 355-365.
Karplus, W.J. (1983), " The spectrum of mathematical models",
Perspectives in Computing 3/2, 4-13.
Kleijnen, J.P.C. (1974), Statistical Techniques in Simulation,
Part 1, Marcel Dekker, New York.
Kleijnen, J.P.C. (1980), Computers and Profits: Quantifying
Financial Benefits of Information, Addison-Wesley, Read-
ing, MA.
Kleijnen, J.P.C. (1987), Statistical Tools for Simulation Practi-
tioners, Marcel Dekker, New York.
Kleijnen, J.P.C. (1993), "Simulation and optimization in pro-
duction planning: A case study", Decision Support Systems
9, 269-280.
Kleijnen, J.P.C. (1994), "Sensitivity analysis versus uncertainty
analysis: When to use what?", in: Proceedings Predictability
and Nonlinear Modeling in Natural Sciences and Eco-
nomics, Kluwer, Dordrecht.
Kleijnen, J.P.C., and Alink, G.A. (1992), "Val i dat i on of simu-
lation models: Mine-hunting case study", Tilburg Univer-
sity.
Kleijnen, J.P.C., and Van Groenendaal, W. (1992), Simula-
tion: A Statistical Perspective, Wiley, Chichester.
Kleijnen, J.P.C., Rotmans, J., and Van Ham, G. (1992),
"Techni ques for sensitivity analysis of simulation models:
A case study of the CO 2 greenhouse effect", Simulation
58/6, 410-417.
Knepell, P.L., and Arangno, D.C. (1993), Simulation valida-
tion: A confidence assessment methodology, IEEE Com-
puter Society Press, Los Alamitos, CA.
Krumm, F.V., and Rolle, C.F. (1992), "Management and
application of decision and risk analysis in Du Pont",
Interfaces 22/6, 84-93.
Landry, M., and Oral, M. (1993), "I n search of a valid view of
model validation for operations research", European Jour-
nal of Operational Research 66/2, 161-167.
Law, A.M., and Kelton, W.D. (1991), Simulation Modeling and
Analysis, 2nd ed., McGraw-Hill, New York.
Little, J.D.C. (1991), "Operations research in industry: New opportunities in a changing world", Operations Research 39/4, 531-542.
Majone, G., and Quade, E.S. (1980), Pitfalls of Analysis,
Wiley, Chichester.
Miser, H.J. (1993), " A foundational concept of science appro-
priate for validation in operational research", European
Journal of Operational Research 66/2, 204-215.
Murray-Smith, D.J. (1992), "Problems and prospects in the
validation of dynamic models" in: A. Sydow (ed.), Compu-
tational Systems Analysis 1992, Elsevier, Amsterdam.
Naylor, T.H., Balintfy, J.L., Burdick, D.S., and Chu, K. (1966),
Computer Simulation Techniques, Wiley, New York.
Oral, M., and Kettani, O. (1993), " The facets of the modeling
and validation process in operations research", European
Journal of Operational Research 66/2, 216-234.
Pace, D.K. (1993), " A paradigm for modern modeling and
simulation, verification, validation and accreditation",
Johns Hopkins University, Laurel, MD.
Pacheco, N.S. (1988), "Session III: Simulation certification,
verification and validation", in: SDI Testing: the Road to
Success; 1988 Symposium Proceedings International Test &
Evaluation Association, ITEA, Fairfax, VA, 22033.
Pagan, A. (1989), " On the role of simulation in the statistical
evaluation of econometric models", Journal of Economet-
rics 40, 125-139.
Pegden, C.P., Shannon, R.E., and Sadowski, R.P. (1990),
Introduction to Simulation using SIMAN, McGraw-Hill,
New York.
Reckhow, K.H. (1989), "Validation of simulation models:
Philosophical and statistical methods of confirmation", in:
M.D. Singh (ed.), Systems & Control Encyclopedia, Perga-
mon Press, Oxford.
Richardson, G.H., and Pugh, A. (1981), Introduction to Sys-
tem Dynamics Modeling with DYNAMO, MIT Press, Cam-
bridge, MA.
Ripley, B.D., "Uses and abuses of statistical simulation",
Mathematical Programming 42, 53-68.
Sargent, R.G. (1991), "Simulation model verification and vali-
dation", in: Proceedings of the 1991 Winter Simulation
Conference.
Schriber, T.J. (1991), An Introduction to Simulation Using
GPSS/H, Wiley, New York.
Schruben, L.W. (1980), "Establishing the credibility of simulations", Simulation 34/3, 101-105.
Shannon, R.E. (1975), Systems Simulation: The Art and Sci-
ence, Prentice-Hall, Englewood Cliffs, NJ.
Stanislaw, H. (1986), "Tests of computer simulation validity:
What do they measure?", Simulation & Games 17/2,
173-191.
Sussman, J.M. (1992), "Intelligent vehicle highway systems: A
challenge awaits the transportation and OR/MS community", OR/MS Today 19/6, 18-23.
Tocher, K.D. (1963), The Art of Simulation, English University
Press, London.
Wang, W., Yin, H., Tang, Y., and Xu, Y. (1993), "A methodol-
ogy for validation of system and sub-system level models",
Depart ment of System Engineering and Mathematics, Na-
tional University of Defense Technology, Changsha, Hu-
nan, 410073, P.R. China.
Whitner, R.B., and Balci, O. (1989), "Guidelines for selecting
and using simulation model verification techniques", in:
Proceedings of the 1989 Winter Simulation Conference.
Williams, M.K., and Sikora, J. (1991), "SIMVAL Minisymposium - A report", Phalanx, The Bulletin of Military Operations Research 24/2, PAGES?
Youngblood, S.M. (1993), "Li t erat ure review and commentary
on the verification, validation and accreditation of models
and simulations", Johns Hopkins University, Laurel, MD.
Zeigler, B. (1976), Theory of Modelling and Simulation, Wiley
Interscience, New York.
