
European Journal of Operational Research 107 (1998) 507-529

Theory and Methodology

Multi-attribute decision making:
A simulation comparison of select methods
Stelios H. Zanakis a,*, Anthony Solomon b, Nicole Wishart a, Sandipa Dublish c

a Decision Sciences and Information Systems Department, College of Business Administration, Florida International University, Miami, FL 33199, USA
b Decision & Information Science Department, Oakland University, Rochester, MI 48309, USA
c Marketing Department, Fairleigh Dickinson University, Teaneck, NJ 07666, USA

Received 7 August 1996; accepted 18 February 1997

* Corresponding author. Fax: +1-305-348-4126; e-mail: [email protected].

Abstract
Several methods have been proposed for solving multi-attribute decision making problems (MADM). A major criticism
of MADM is that different techniques may yield different results when applied to the same problem. The problem
considered in this study consists of a decision matrix input of N criteria weights and ratings of L alternatives on each
criterion. The comparative performance of some methods has been investigated in a few, mostly field, studies. In this
simulation experiment we investigate the performance of eight methods: ELECTRE, TOPSIS, Multiplicative Exponential
Weighting (MEW), Simple Additive Weighting (SAW), and four versions of AHP (original vs. geometric scale and right
eigenvector vs. mean transformation
solution). Simulation parameters are the number of alternatives, criteria and their
distribution. The solutions are analyzed using twelve measures of similarity of performance. Similarities and differences in
the behavior of these methods are investigated. Dissimilarities in weights produced by these methods become stronger in
problems with few alternatives; however, the corresponding final rankings of the alternatives vary across methods more in
problems with many alternatives. Although less significant, the distribution of criterion weights affects the methods
differently. In general, all AHP versions behave similarly and closer to SAW than the other methods. ELECTRE is the least
similar to SAW (except for closer matching the top-ranked alternative), followed by MEW. TOPSIS behaves closer to AHP
and differently from ELECTRE and MEW, except for problems with few criteria. A similar rank-reversal experiment
produced the following performance order of methods: SAW and MEW (best), followed by TOPSIS, AHPs and ELECTRE.
It should be noted that the ELECTRE version used was adapted to the common MADM problem and therefore it did not
take advantage of the method's capabilities in handling problems with ordinal or imprecise information. © 1998 Elsevier Science B.V.
Keywords: Multiple criteria analysis; Decision theory; Utility theory; Simulation

1. Introduction

Multiple criteria decision making (MCDM) refers to making decisions in the presence of multiple, usually conflicting criteria. MCDM problems are commonly categorized as continuous or discrete, depending on the domain of alternatives. Hwang and Yoon (1981) classify them as (i) Multiple Attribute Decision Making (MADM), with a discrete, usually limited, number of prespecified alternatives, requiring inter- and intra-attribute comparisons, involving

implicit or explicit tradeoffs; and (ii) Multiple Objective Decision Making (MODM), with decision variable values to be determined in a continuous or integer domain, with an infinite or large number of choices, to best satisfy the DM's constraints, preferences or priorities. MADM methods have also been used for combining good MODM solutions based on DM preferences (Kok, 1986; Kok and Lootsma, 1985).
In this paper we focus on MADM which is used
in a finite ‘selection’ or ‘choice’ problem. In literature, the term MCDM is often used to indicate
MADM, and sometimes MODM methods. To avoid
any ambiguity we will henceforth use the term
MADM when referring to a discrete MCDM problem. Methods involving only ranking discrete alternatives with equal criteria weights, like voting
choices, will not be examined in this paper.
Churchman et al. (1957) were among the earlier
academicians
to look at the MADM problem formally using a simple additive weighting method.
Over the years different behavioral scientists, operational researchers and decision theorists have proposed a variety of methods describing how a DM
might arrive at a preference judgment when choosing
among multiple attribute alternatives. For a survey of
MCDM methods and applications see Stewart (1992)
and Zanakis et al. (1995).
Gershon and Duckstein (1983) state that the major
criticism of MADM methods is that different techniques yield different results when applied to the
same problem, apparently under the same assumptions and by a single DM. Comparing 23 cardinal
and 9 qualitative aggregation methods, Voogd (1983)
found that, at least 40% of the time, each technique
produced a different result from any other technique.
The inconsistency in such results occurs because:
(a) the techniques use weights differently in their calculations;
(b) algorithms differ in their approach to selecting the 'best' solution;
(c) many algorithms attempt to scale the objectives, which affects the weights already chosen;
(d) some algorithms introduce additional parameters that affect which solution will be chosen.
This is compounded
by the inherent differences in
experimental conditions and human information pro-

cessing between DM, even under similar preferences. Other researchers have argued the opposite;
namely that, given a type of problem, the solutions
obtained by different MADM methods are essentially the same (Belton, 1986; Timmermans
et al.,
1989; Karni et al., 1990; Goicoechea et al., 1992;
Olson et al., 1995). Schoemaker and Waid (1982)
found different additive utility models produce generally different weights, but predicted equally well
on the average. Practitioners seem to prefer simple
and transparent methods, which, however, are unlikely to represent weight trade-offs that users are
willing to make (Hobbs et al., 1992).
The wide variety of available techniques, of varying complexity and possibly solutions, confuses potential users. Several MADM methods may appear to
be suitable for a particular decision problem. Hence
the user faces the task of selecting the most appropriate method from among several alternative feasible
methods.
The need for comparing MCDM methods and the
importance of the selection problem were probably
first recognized by MacCrimmon
(1973) who suggested a taxonomy of MCDM methods. More recently several authors have outlined procedures for
the selection of an appropriate MCDM method such
as Ozernoy (1992), Hwang and Yoon (1981), Hobbs
(1986), Ozernoy (1987). These classifications
are
primarily driven by the input requirements
of the
method (type of information that the DM must provide and the form in which it must be provided).
Very often these classifications
serve more as a tool
for elimination
rather than selection of the ‘right’
method. The use of expert systems has also been
advocated for selecting MCDM methods (Jelassi and
Ozernoy, 1988).
Our literature search revealed that only a limited amount of work has been done on comparing
and integrating the different methods. Denpontin et
al. (1983) developed a comprehensive
catalogue of
the different methods, but concluded
that it was
difficult to fit the methods in a classification schema
since “decision studies varied so much in quantity,
quality and precision of information.”
Many authors
stress the validity of the method as the key criterion
for choosing it. Validity implies that the method is
likely to yield choices that accurately reflect the
values of the user (Hobbs et al., 1992). However

there is no absolute, objective standard of validity as
preferences can be contradictory when articulated in
different ways. Researchers often measure validity
by checking how well a given method predicts the
unaided decisions made independently
of judgments
used to fit the model (Schoemaker and Waid, 1982;
Currim and Sarin, 1984). Decision scientists question
the applicability of this criterion, particularly in complex problems that will cause users to adopt less
rational heuristics and to be inconsistent.
Studies in
decision making have shown that the efficiency of a
decision made has an inverted U shaped relationship
with the amount of information provided (Kok, 1986;
Gemunden and Hauschildt, 1985).
Researchers, who have attempted the task of comparing the different MADM methods have used either real life cases or formulated a real life like
problem and presented it to a selected group of users
(Currim and Sarin, 1984; Gemunden and Hauschildt,
1985; Belton, 1986; Roy and Bouyssou, 1986; Hobbs,
1986; Buchanan and Daellenbach,
1987; Lockett and
Stratford, 1987; Stillwell et al., 1987; Karni et al.,
1990; Stewart, 1992; Goicoechea et al., 1992). Such
field experiments
are valuable tools for comparing
MADM methods, based on user reactions. If properly designed,
they assess the impact of human
information
processing
and judgmental
decision
making, beyond the nature of the methods employed.
Users may compare these methods along different
dimensions,
such as perceived simplicity, trustworthiness, robustness and quality. However, field studies have the following limitations and disadvantages:
(a) The sample size and range of problems studied
is very limited.
(b) The subjects are often students, rather than real
decision makers.
(c) The way the information is elicited may influence the results more than the model used (Olson
et al., 1995).
(d) The learning effect biases outcomes, especially
when a subject employs various methods sequentially (Kok, 1986).
(e) Inherent human differences led Hobbs et al.
(1992) to conclude that “decisions
can be as or
more sensitive to the method used as to which
person applies it.” However, in a similar study,
Goicoechea et al. (1992) concluded that “rankings


are not affected significantly
by the choice of
decision maker or which of these methods is
used.” The fact that judgments were elicited from
working professionals
in one study and graduate
students in the other may explain partially the
discrepancy.
(f) It is impossible or difficult to answer questions
like:
1. Which method is more appropriate for what
type of problem?
2. What are the advantages/disadvantages
of using one method over another?
3. Does a decision change when using different
methods? If yes, why and to what extent?
The above limitations may be overcome via simulation. However, since simulations cannot capture human idiosyncrasies, their findings should supplement rather than substitute for those of the field experiments.
We have found only three simulation studies comparing solely AHP type methods.
Zahedi (1986) generated symmetric
AHP and
asymmetric matrices of size 6 and 22 from uniform,
gamma and lognormal distributions, with a multiplicative error term. Criteria weights were derived using
six methods: Right eigenvalue, row and column geometric means, harmonic mean, simple row average,
and row average of columns normalized first by their
sum (called mean transformation method). The accuracy of the corresponding weight and rank estimators
was evaluated
using MAE, MSE, Variance and
Theil’s coefficient.
She concluded that, when the
input matrix is symmetric, the mean transformation
method outperformed all other methods in accuracy,
rank preservation and robustness toward error distribution. Differences between methods were noticeable only under a gamma error distribution, where
the eigenvalue method did poorly, while the row
geometric mean exhibited better rank preservation
with large-size
matrix. All methods
performed
equally well (except simple row average) and much
better when errors had a uniform than lognormal
distribution.
Takeda et al. (1987) conducted an AHP simulation study, with multiplicative
random errors, to
evaluate different eigen-weight
vectors. They advocate using their graded eigenvector
method over
Saaty’s simpler right eigenvector approach.


Triantaphyllou
and Mann (1989) simulated random AHP matrices of 3-21 criteria and alternatives.
Each problem
was solved using four methods:
Weighted
sum model (WSM), weighted product
model (WPM), right-eigenvector
AHP and AHP revised by normalizing each column by the maximum
rather than the sum of its elements, according to Belton and Gear's (1984) suggestion for reducing rank reversals. Solutions were compared against the WSM
benchmark
and rate of change in best alternative
when a nonoptimal alternative is replaced by a worse
one. They concluded that the revised AHP appears to
perform closest to the WSM; AHP tends to behave
like WSM as the number of alternatives increases;
and that the rate of change does not depend on the
number of criteria.
The first two studies are limited to a single AHP
matrix; i.e. different methods for deriving weights
only for the criteria or only for the alternatives under
a single criterion - not simultaneously
for the entire
MADM problem. And all three are limited to variants of the AHP. A further limitation of the third
study is that it employs only two measures of performance: the percentage contradiction between a method's rankings and those of WSM, and the rate of rank
reversal of top priority. There is clearly a need for a
simulation study comparing also other MADM type
methods, using various measures of performance.
Our work in that regard is explained in the next
section. The MADM problem under consideration is
depicted by the following DM matrix of preferences
for L alternatives rated on N criteria:
Criterion          c_1    c_2    ...    c_j    ...    c_N
Alternative 1      r_11   r_12   ...    r_1j   ...    r_1N
Alternative 2      r_21   r_22   ...    r_2j   ...    r_2N
...                ...    ...    ...    ...    ...    ...
Alternative i      r_i1   r_i2   ...    r_ij   ...    r_iN
...                ...    ...    ...    ...    ...    ...
Alternative L      r_L1   r_L2   ...    r_Lj   ...    r_LN
where c_j is the importance (weight) of the jth criterion and r_ij is the rating of the ith alternative on the jth criterion. As commonly done, we will assume that the latter are column normalized, to also

add to one. Different MADM methods will be examined for eliciting these judgments and aggregating
them into an overall score S_i for each alternative. Then, the overall evaluation (weight) of each alternative will be W_i = S_i / Σ_i S_i, leading to a final ranking of all alternatives. Development of a cardinal measure of overall preference of alternatives (S_i) has
been criticized by advocates of outranking methods
as not reliably portraying true or incomplete preferences. Such methods establish measures of outranking relationships among pairs of alternatives, leading
to a complete or partial ordering of alternatives.

2. Methods compared
Of the many MADM methods available we have
chosen the following five for comparison
in our
research, when applied to solve the same problem
with the decision matrix information stated earlier:
1. Simple Additive Weighting (SAW): S_i = Σ_j c_j r_ij.
2. Multiplicative Exponential Weighting (MEW): S_i = Π_j r_ij^(c_j) (a short computational sketch of these two scores follows the list).
3. Analytic Hierarchy Process (AHP) - four versions.
4. ELECTRE.
5. TOPSIS (Technique for Order Preference by Similarity to Ideal Solution).
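As a concrete illustration of the SAW and MEW scores above, here is a minimal Python sketch (not the authors' code). It assumes a column-normalized rating matrix r of shape L x N and weights c summing to one; the function names are illustrative only.

```python
import numpy as np

def saw_scores(r: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Simple Additive Weighting: S_i = sum_j c_j * r_ij."""
    return r @ c

def mew_scores(r: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Multiplicative Exponential Weighting: S_i = prod_j r_ij ** c_j."""
    return np.prod(r ** c, axis=1)

# Toy example: 3 alternatives rated on 2 criteria (columns already normalized).
r = np.array([[0.5, 0.2],
              [0.3, 0.5],
              [0.2, 0.3]])
c = np.array([0.6, 0.4])

s_saw = saw_scores(r, c)
s_mew = mew_scores(r, c)
# Overall evaluation W_i = S_i / sum(S_i), then ranks (1 = best).
w_saw = s_saw / s_saw.sum()
rank_saw = (-s_saw).argsort().argsort() + 1
```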
The rationale for selection has been that most of
these are among the more popular and widely used
methods and each method reflects a different approach to solve MADM problems. SAW’s simplicity
makes it very popular to practitioners (Hobbs et al.,
1992; Zanakis et al., 1995). MEW is a theoretically attractive contrast to SAW. However, it has not been applied often, because practitioners find its mathematical form unattractive, despite its scale-invariance property (it depends only on the ratios of the ratings of alternatives). TOPSIS (Hwang and Yoon,
1981) is an exception in that it is not widely used;
we have included it because it is unique in the way it
approaches the problem and is intuitively appealing
and easy to understand. Its fundamental premise is
that the best alternative, say the ith, should have the shortest Euclidean distance S_i* = [Σ_j (r_ij - r_j*)^2]^(1/2) from the ideal solution (r_j*, made up of the best value for each attribute regardless of alternative) and


the farthest Euclidean distance S_i^- = [Σ_j (r_ij - r_j^-)^2]^(1/2) from the negative-ideal solution (r_j^-, made up of the worst value for each attribute). The alternative with the highest relative closeness measure S_i^-/(S_i* + S_i^-) is chosen as best. In a sense, S_i* and S_i^- correspond to
ELECTRE’s concordance and discordance indexes.
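The following is a hedged Python sketch of this TOPSIS computation under the usual textbook formulation: Euclidean-norm column normalization, an assumed weighting of the normalized ratings by c (the paper does not spell out where the weights enter), ideal and negative-ideal points, and the relative closeness S_i^-/(S_i* + S_i^-). All benefit-type criteria are assumed, as in the simulated problems.

```python
import numpy as np

def topsis_closeness(x: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Relative closeness C_i = S_i^- / (S_i^* + S_i^-); larger is better.

    x: raw decision matrix (L alternatives x N criteria), all benefit criteria.
    c: criteria weights summing to one.
    """
    r = x / np.linalg.norm(x, axis=0)      # Euclidean-norm column normalization
    v = r * c                              # weighted normalized ratings (assumed step)
    ideal = v.max(axis=0)                  # best value per criterion
    negative_ideal = v.min(axis=0)         # worst value per criterion
    s_plus = np.sqrt(((v - ideal) ** 2).sum(axis=1))             # distance to ideal
    s_minus = np.sqrt(((v - negative_ideal) ** 2).sum(axis=1))   # distance to negative-ideal
    return s_minus / (s_plus + s_minus)

x = np.random.default_rng(0).uniform(size=(5, 4))   # 5 alternatives, 4 criteria
c = np.full(4, 0.25)
closeness = topsis_closeness(x, c)
best = int(closeness.argmax())
```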
The ELECTRE method is much more popular in
Europe than in the US. Proponents argue that its
outranking concept is more relevant to practical situations than the restrictive dominance
concept. It
elicits from the DM, for each pair of alternatives, a
concordance and discordance index. The first represents the sum of weights of attributes for which
alternative A is better than B. The second denotes
the absolute difference of this pair of attributes divided by the maximum difference over all pairs. By
establishing threshold values for these two indices,
one can generate a set of alternatives
that is not
outranked by other alternatives.
In our simulation
experiments,
we set these threshold values as the
average of each index over all pairs of alternatives,
as suggested by Hwang and Yoon (1981). In order to
obtain an overall ranking of the alternatives in our
experiment, the procedure was reapplied to all alternatives in the same bracket (dominated
or nondominated).
In the case of AHP we tested four versions using
two different methods for obtaining
the relative
weights (right eigenvalue vs. mean transformation,
as in Zahedi, 1986) and two different types of scale:
AHP original scale:   1   2      3      4      5      6      7      8      9
AHP geometric scale:  1   e^0.5  e^1.0  e^1.5  e^2.0  e^2.5  e^3.0  e^3.5  e^4.0

Geometric scales have been advocated over Saaty’s
original AHP scale, because of their transitivity and
larger value span found in many physical situations,
resulting in more robust selections (Legrady et al.,
1984).
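For the two solution approaches, a short Python sketch (illustrative only; it assumes a reciprocal pairwise comparison matrix is already available): the right-eigenvector weights and the 'mean transformation' weights obtained by normalizing each column by its sum and then averaging the rows.

```python
import numpy as np

def ahp_weights_eigen(a: np.ndarray) -> np.ndarray:
    """Principal right eigenvector of a reciprocal pairwise matrix, normalized to sum to one."""
    vals, vecs = np.linalg.eig(a)
    v = np.abs(vecs[:, np.argmax(vals.real)].real)
    return v / v.sum()

def ahp_weights_mean_transform(a: np.ndarray) -> np.ndarray:
    """Mean transformation: normalize each column by its sum, then average the rows."""
    return (a / a.sum(axis=0)).mean(axis=1)

# Example 3x3 reciprocal matrix on Saaty's original 1-9 scale.
a = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
w_eigen = ahp_weights_eigen(a)
w_mtm = ahp_weights_mean_transform(a)
```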
Our choice of methods in this simulation study
may seem strange at first. They require different
input preference information
or scales and aim at
different outputs. SAW and MEW assume additive and multiplicative weighted preferences on an interval scale. AHP employs a ratio scale to elicit pairwise comparisons
of alternatives on each criterion
(even without explicitly rating each pair) and an


additive aggregation to global weights. Normalization of the decision matrix is necessary to handle
different types of attributes (e.g. benefits vs. costs) in
all methods, except ELECTRE which can also handle ordinal or descriptive (imprecise)
information
and criteria importance not adding up to one. TOPSIS uses the Euclidean norm to normalize the decision matrix, while the regular AHP normalizes
weights by dividing them by their sum. ELECTRE’s
output differs from the other methods, in that it does
not provide a global preference of alternatives, but a
partial (sometimes complete) ranking of alternatives.
In that sense, ELECTRE results can be compared to
the final ranking of alternatives
produced by the
other methods.
This common-denominator approach overlooks some of ELECTRE's advantages in dealing with different or less precise situations via binary relationships.
However, it is of
interest in building computerized evaluation and DSS
(Pomerol, 1993) for handling the common problem
defined by the earlier decision matrix; namely, a
decision matrix of explicitly rated alternatives and
criteria weights. Many MCDA methods have been
developed over the years, but little is known about
their relative merits on similar problems. Surveys of
MCDM research status point to needs of more validation studies, choice of aggregation
procedures
based on problem characteristics, as well as simple,
understandable,
and usable approaches for solving
MCDM and MAUT problems (Dyer et al., 1992;
Stewart, 1992).
The methods examined in this experiment have
been contrasted in field studies by other researchers.
Olson et al. (1995) used a single problem to examine
how a group of students used and compared software
implementing MAUT, SAW, AHP and ZAPROS - a
procedure of ordinal tradeoffs with additive value
function, whose parameters are not explicitly determined. Several other field studies (but no simulation
study) have compared ELECTRE to one or more of
the other methods. Karni et al. (1990) concluded that
ELECTRE, AHP and SAW rankings did not differ
significantly in three real life case studies. Lootsma
(1990) contrasted AHP and ELECTRE as representing the American and French schools of MCDA thought and found them to be "unexpectedly close to each other." In extensive field studies Hobbs et al. (1992)
and Goicoechea et al. (1992) had graduate students


and U.S. Army Corps Engineers
evaluate AHP,
ELECTRE, SAW and other methods on water supply
planning studies. Their results were contradictory;
the first found perceived differences across methods
and users, while the latter study did not. Finally,
Gomes (1989) compared ELECTRE to his method
TODIM
(a combination
of direct rating, AHP
weighting and dominance ordering rules) on a transportation problem and concluded that both methods
produced essentially
the same ranking of alternatives. The above findings highlight our motivation
and justification
for undertaking
this simulation
study. Our major objective was to conduct an extensive numerical comparison of several MCDA methods, contrasted in several field studies, when applied
to a common problem (a decision matrix of explicitly rated alternatives and criteria weights) and determine when and how their solutions differ.

3. Simulation experiment
According to Hobbs et al. (1992) a good experiment should satisfy the following conditions:
(a) It compares methods that are widely used, represent divergent philosophies of decision making, or are claimed to represent important methodological improvements.
(b) It addresses the questions of appropriateness, ease of use and validity.
(c) It is well controlled, uses large samples and is replicable.
(d) It compares methods across a variety of problems.
(e) The problems involved are realistic.
Our simulation experiment satisfies all of these conditions except the second one.
Computer simulation was used for the purpose of
comparing
the MADM methods. The reason for
using simulation was that it is a flexible and versatile
method which allows us to generate a range of
problems, and replicate them several times. This
provides a vast database of results from which we
can study the patterns of solutions provided by the
different methods.

The following parameters were chosen for our simulation:
1. Number of criteria N: 5, 10, 15, 20.
2. Number of alternatives L: 3, 5, 7, 9.
3. Ratings of alternatives r_ij: randomly generated from a uniform distribution in 0-1.
4. Weights of criteria c_j: set all equal (1/N), randomly generated from a uniform distribution in 0-1 (std. dev. 1/12), or generated from a 'beta' U-shaped distribution in 0-1 (std. dev. 1/24).
5. Number of replications: 100 for each combination, thus producing 4 criteria levels x 4 alternative levels x 3 weight distributions x 100 replications = 4800 problems, resulting in a total of 38,400 solutions across eight approaches - four methods plus AHP with four versions.
An explanation of these choices is in order. The
range for the number of criteria and alternatives is
typical of those found in many applications. This is
representative of a typical MADM problem, where a
few alternatives are evaluated on the basis of a wide
set of criteria, as explained below. Many empirical
studies on the size of the evoked set in the consumer
and industrial market context have shown that the
number of intensely discussed alternatives does not
exceed 4-5 (Gemunden
and Hauschildt,
1985). In
practice a simple check-list of desirable features will
rule out unacceptable alternatives early, thus leaving
for consideration
only a small number. The number
of criteria, though, can be considerably higher. Three
distributions for weights were assumed: No distribution, i.e. all weights equal to l/N (class of problems
where criteria are replaced by judges or voters of
equal impact); uniform distribution,
which may reflect an unbiased, indecisive or uninformed user; and
a U shape distribution,
which may typify a biased
user, strongly favoring some issues while rigidly
opposing others. Under group pressure, similar situation may not arise often in openly supporting pet
projects. For this reason and in order to keep this
simulation size manageable, we considered only one
distribution
(uniform) for ratings under each criterion.
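Under the parameter choices listed above, one simulated problem could be generated roughly as follows (an illustrative sketch, not the authors' code; the beta shape parameters (0.5, 0.5) are an assumption, since only a 'beta' U-shaped distribution is stated, and the dominance and ratio-cap filters described next are omitted here).

```python
import numpy as np

def generate_problem(L: int, N: int, weight_dist: str, rng: np.random.Generator):
    """One simulated MADM problem: ratings r (L x N) and criteria weights c (N,)."""
    r = rng.uniform(0.0, 1.0, size=(L, N))
    r = r / r.sum(axis=0)                      # column-normalize ratings to sum to one
    if weight_dist == "equal":
        c = np.full(N, 1.0 / N)
    elif weight_dist == "uniform":
        c = rng.uniform(0.0, 1.0, size=N)
    elif weight_dist == "beta_u":
        c = rng.beta(0.5, 0.5, size=N)         # U-shaped beta; (0.5, 0.5) is an assumed shape
    else:
        raise ValueError(weight_dist)
    return r, c / c.sum()                      # weights normalized to sum to one

rng = np.random.default_rng(1)
problems = [generate_problem(L, N, d, rng)
            for L in (3, 5, 7, 9) for N in (5, 10, 15, 20)
            for d in ("equal", "uniform", "beta_u")]
```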
Additional care was taken during the data generation phase. The ratio of any two criteria weights or
alternative ratings should not be extremely high or


extremely low; this will avoid pathological cases or
scale-induced
imbalances between methods, whose
performance then deteriorates (Zahedi, 1986). After
some experimentation,
this was set at 75 (and 1/75), one step beyond the maximum e^4 of the geometric
AHP scale. Symmetric reciprocal matrices were obtained from these ratio entries for the AHP methods.
No alternative was kept if it was dominating
all
others on every criterion, or if it was dominated by
another alternative on all criteria. For each criterion,
all weights were normalized to add up to one. Similar normalization was applied to the final weights of
the alternatives overall criteria in each problem. The
AHP pairwise comparisons a,, (> 1) were generated
by selecting the closest original (Saaty) or geometric
scale value to the ratio c,/ci for two criteria and
rrk/rlk for two alternative ratings under criterion k;
and then filling the symmetric
entries using the
reciprocal ratio condition aji = l/a;,.
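A small sketch of that pairwise-matrix construction (illustrative only): each ratio of at least one is snapped to the nearest value of the chosen scale, and the symmetric entry is set to its reciprocal.

```python
import numpy as np

SAATY_SCALE = np.arange(1.0, 10.0)                 # 1, 2, ..., 9
GEOMETRIC_SCALE = np.exp(0.5 * np.arange(0, 9))    # 1, e^0.5, ..., e^4

def pairwise_matrix(values: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reciprocal matrix a_ij ~ values[i] / values[j], snapped to the nearest scale value."""
    n = len(values)
    a = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            ratio = values[i] / values[j]
            big = max(ratio, 1.0 / ratio)
            snapped = scale[np.argmin(np.abs(scale - big))]
            a[i, j] = snapped if ratio >= 1 else 1.0 / snapped
            a[j, i] = 1.0 / a[i, j]
    return a

c = np.array([0.5, 0.3, 0.2])
a_criteria = pairwise_matrix(c, SAATY_SCALE)
```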
The generated data were also altered subsequently
to simulate rank reversal conditions, when a non-optimal new alternative is introduced. This is a primary
criticism of AHP and has created a long and intense
controversy
among researchers (Belton and Gear,
1984; Saaty, 1984; Saaty, 1990; Dyer, 1990; Harker
and Vargas, 1990; Stewart, 1992). This experimentation was applied to each method solution and initial
problem, say of L alternatives, as follows: (i) a new alternative is introduced into the problem by randomly generating N ratings, one for each criterion, from the uniform distribution; (ii) the ranks of the L + 1 alternatives in the new problem are determined; (iii) if the new (L + 1)th alternative gets the first rank, it is rejected and another alternative is generated as in step (i); (iv) if the new alternative gets any other rank, the new rank order of the old alternatives is determined after removing the new alternative's rank.
Thus an original array of ranks and a new array of
ranks are produced for each problem and method.
These two rank arrays are used in computing the
rank reversal measures.
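The four-step procedure could be sketched as follows in Python (illustrative; `solve` stands for any scoring method, such as the SAW or MEW functions sketched earlier, and whether the enlarged matrix is re-normalized exactly this way is an assumption).

```python
import numpy as np

def ranks(scores: np.ndarray) -> np.ndarray:
    """Rank vector from scores, 1 = best (highest score)."""
    return (-scores).argsort().argsort() + 1

def rank_reversal_trial(r, c, solve, rng, max_tries=1000):
    """One rank-reversal trial: old ranking and the re-compressed ranking of the
    original L alternatives after a non-optimal alternative is inserted."""
    L, N = r.shape
    old_ranks = ranks(solve(r, c))
    for _ in range(max_tries):
        candidate = rng.uniform(0.0, 1.0, size=(1, N))
        r_new = np.vstack([r, candidate])
        r_new = r_new / r_new.sum(axis=0)              # column-normalize again
        all_ranks = ranks(solve(r_new, c))
        if all_ranks[-1] == 1:                         # newcomer ranks first: reject, retry
            continue
        kept = all_ranks[:L]                           # ranks of the original alternatives
        new_ranks = kept.argsort().argsort() + 1       # compress back to 1..L
        return old_ranks, new_ranks
    raise RuntimeError("no non-optimal alternative generated")
```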
Two categories of performance
measures were
employed in our experiment: (1) measures that compare each method with the SAW method, in terms of
final weights or ranks of alternatives; and (2) rank
reversal measures for each method. In the absence of
any other objective standard, the solution provided
by SAW was used as the benchmark. The rationale


for selecting SAW as the benchmark
is that its
simplicity makes it extremely popular in practice.
For each method, the following measures of similarity were computed on its final evaluation (weights or
ranks) against those of the SAW method, averaged
over all alternatives in the problem:
1. Mean squared error of weights (MSEW) and the
same for ranks (MSER).
2. Mean Absolute error of weights (MAEW) and the
same for ranks (MAER).
3. Theil's coefficient U for weights (UW) and the same for ranks (UR).
4. Kendall's correlation Tau for weights (KWC).
5. Spearman's correlation for ranks (SRC).
6. Weighted rank crossing 1 (WRC1).
7. Weighted rank crossing 2 (WRC2).
8. Top rank matched count (TOP).
9. Number of ranks matched, as % of number of
alternatives L (MATCH%).
The reason for looking at measures for both final weights and ranks is that methods may produce different final weights for alternatives, but these can result in the same or a different rank order of alternatives. Our last four measures capture this rank disagreement (crossings of rank order), of which two give more weight to higher rank
differences:

WRC = [ Σ_{i=1}^{L} W_i |R_i,SAW - R_i,METH| ] / [ Σ_{i=1}^{L} W_i ]

where

W_i = L + 1 - i,   i = 1, 2, ..., L,   for WRC1;
W_i = 1/i,         i = 1, 2, ..., L,   for WRC2.

Perfect agreement between a method and SAW would
have zero MSEW, MAEW, MSER, MAER, UW,
UR, WRC1 and WRC2, as well as perfect correlations KWC = 1 and SRC = 1, and a perfect match
TOP = 1 and MATCH% = 1.
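As an illustration, a few of these similarity measures, including WRC1 and WRC2 as reconstructed above, could be computed as follows (a sketch under the assumption that the weights W_i are applied to the alternatives taken in SAW rank order; names are illustrative).

```python
import numpy as np
from scipy.stats import spearmanr

def wrc(rank_saw: np.ndarray, rank_meth: np.ndarray, weights: np.ndarray) -> float:
    """Weighted rank crossing: weighted mean absolute rank difference, SAW-rank order."""
    order = np.argsort(rank_saw)                   # alternatives from SAW-best to SAW-worst
    diff = np.abs(rank_saw[order] - rank_meth[order])
    return float((weights * diff).sum() / weights.sum())

def similarity_measures(rank_saw: np.ndarray, rank_meth: np.ndarray) -> dict:
    L = len(rank_saw)
    i = np.arange(1, L + 1)
    rho, _ = spearmanr(rank_saw, rank_meth)
    return {
        "MAER": float(np.abs(rank_saw - rank_meth).mean()),
        "MSER": float(((rank_saw - rank_meth) ** 2).mean()),
        "SRC": float(rho),
        "WRC1": wrc(rank_saw, rank_meth, weights=(L + 1 - i).astype(float)),
        "WRC2": wrc(rank_saw, rank_meth, weights=1.0 / i),
        "TOP": float(rank_meth[int(np.argmin(rank_saw))] == 1),
        "MATCH%": float((rank_saw == rank_meth).mean()),
    }
```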
Similar measures of rank reversal were computed
on the rank order of the L alternatives before and
after the introduction of the additional alternative, for
each method and problem: WRC1, WRC2, MSER,
MAER, and SRC. Additionally, we counted for each
method and problem, the percent of time the top
ranked alternative remained the same after the intro-


duction of a new nonoptimal alternative (TOP); and
the total number of ranks not altered as a percent of the
number of alternatives (MATCH%) for that problem.
Here we would like to clarify that the efficiency
of a method is not merely a function of the theory
supporting it or how rigorous it is mathematically
speaking. The other aspects which are also very
important, relate to its ease of using it, user understanding and faith in the results, method reliability
(consistency)
vs. variety. These are important and
have been tackled by some authors (Buchanan and
Daellenbach,
1987; Hobbs et al., 1992; Stewart,
1992). Such issues cannot be studied in a simulation
experiment.

4. Analysis of experimental results

The simulation results were analyzed using the SAS package. Each measure of performance was analyzed via parametric ANOVA and nonparametric (Kruskal-Wallis) tests. The results are summarized in Tables 1 and 3. The nonparametric tests reveal that N, L and distribution type affect all performance measures at the 95% confidence level, except by distribution type for KWC, SRC, MSER, UR, and marginally for MAER, WRC1 and WRC2. According to the parametric ANOVA, the number of alternatives, number of criteria and method, as well as most of their interactions, affect significantly all measures of performance. However, the distribution type and a few of its interactions do not influence significantly four performance measures, namely KWC and UR (as was the case with the nonparametric tests), SRC and MSER, at the 95% level.

Table 5 portrays the average performance measures for each method, along with Tukey's studentized range test of mean differences. Performance measures on weights are not given for ELECTRE, since it only rank-orders the alternatives. The four AHP methods produce indistinguishable results on all measures, and they were always closer to SAW than the other three methods. The only exception is the TOP result for ELECTRE, indicating that it matched the top-ranked alternative produced by SAW 90% of the time, vs. 82% for the AHPs.
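The analysis itself was done in SAS; purely as an illustration of the nonparametric step, a Kruskal-Wallis test of one performance measure grouped by one factor could be run in Python with SciPy as follows, assuming the simulation results are collected as per-problem records (a hypothetical layout, not the authors' data structure).

```python
from scipy.stats import kruskal

# `results` is assumed to be a list of dicts, one per solved problem, e.g.
# {"L": 5, "N": 10, "dist": "uniform", "method": "TOPSIS", "MAER": 0.41, ...}.
def kruskal_by_factor(results, measure: str, factor: str):
    groups = {}
    for row in results:
        groups.setdefault(row[factor], []).append(row[measure])
    stat, p_value = kruskal(*groups.values())
    return stat, p_value
```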

Table 1
Summary of ANOVA significance levels for factors and interactions

Rows (factors and interactions): L, V, METH, N, L*V, L*METH, N*L, V*METH, N*V, N*METH, N*L*V, N*L*METH, N*V*METH, L*V*METH, N*L*V*METH.
Columns (performance measures): KWC, MATCH%, WRC1, WRC2, SRC, MSER, MAER, MSEW, MAEW, UW, UR.
[Body of P-values not reproduced here.]

-: Indicates not significant result (P-value > 0.10).
L: Number of alternatives.
N: Number of criteria.
V: Type of distribution = 1 equal weights; 2 uniform; 3 beta U.
METH: Method = 1 Simple Additive Weighting (SAW); 2 AHP with original scale using eigenvector; 3 AHP with geometric scale using eigenvector; 4 AHP with original scale using mean transformation; 5 AHP with geometric scale using mean transformation; 6 Multiplicative Exponential Weighting; 7 TOPSIS; 8 ELECTRE.

Table 2
Summary of ANOVA significance levels for factors and interactions - rank reversal experiment

Rows (factors and interactions): L, V, METH, N, L*V, L*METH, N*L, V*METH, N*V, N*METH, N*L*V, N*L*METH, N*V*METH, L*V*METH, N*L*V*METH.
Columns (performance measures): MATCH%, WRC1, WRC2, SRC, MSER, MAER.
[Body of P-values not reproduced here.]

-: Indicates not significant result (P-value > 0.10).
L: Number of alternatives.
N: Number of criteria.
V: Type of distribution = 1 equal weights; 2 uniform; 3 beta U.
METH: Method = 1 Simple Additive Weighting (SAW); 2 AHP with original scale using eigenvector; 3 AHP with geometric scale using eigenvector; 4 AHP with original scale using mean transformation; 5 AHP with geometric scale using mean transformation; 6 Multiplicative Exponential Weighting; 7 TOPSIS; 8 ELECTRE.

Any differences among the four AHP version results are affected more by the scale (original vs. geometric) than by the solution approach (eigenvector vs. mean transformation). The latter contradicts Zahedi's (1986) study that examined single AHP matrices, possibly due to the aggregating effect of looking at criteria and alternatives together. The MAEW for each AHP version was only about 0.008, implying weights about ±0.8% away from those of SAW on the average. The most dissimilar method to SAW is ELECTRE, followed by MEW, and TOPSIS to a lesser extent. More specifically, the MEW method produces significantly different results from all AHP versions on all measures. MEW and ELECTRE behave similarly in SRC and MSER, but differ according to MAER, UR, WRC1 and WRC2. TOPSIS differs from ELECTRE and MEW on all measures, and agrees with AHP only on SRC and UR (only for the original scale). The rank-order results of all methods mostly agree with those of SAW, as indicated by their high correlations (all SRC > 0.80). In light of the prior comments, SRC gives a stronger impression of similarity than actually exists. For the large sample sizes involved, SRC should be approximately below 0.04 to imply no correlation or above 0.96 to imply perfect rank agreement, neither of which is the case here. SRC results sometimes contradicted those of the other rank performance measures. In those cases we lean towards the latter, since SRC does not consider rank importance, unlike our measures WRC1 and WRC2 (the former giving larger values than the latter by design). Comparing SRC to WRC1 or WRC2, one may observe that although TOPSIS and the four AHPs have similar SRC, the higher WRC values imply that TOPSIS differs from the AHPs more in higher-ranked than lower-ranked alternatives. Similarly, ELECTRE differs from MEW also more in higher-ranked alternatives than lower ones. An interesting finding is that although ELECTRE matches the SAW top rank more often (90%) than the other methods, its match of all SAW ranks (MATCH%) is far smaller than for any of the other methods. Many graphs were also drawn to further identify parameter value impacts, mean differences and important interactions. However, space limitations prevent showing all of them.

Table 3
Summary of Kruskal-Wallis nonparametric ANOVA significance levels

               SRC     MSER    MAER    UR      WRC1    WRC2    MAEW    MSEW    UW      KWC     MATCH%
Alternatives   0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
Criteria       0.0003  0.0006  0.0004  0.0004  0.0010  0.0005  0.0001  0.0001  0.0001  0.0002  0.0001
Distribution   0.0473  0.0151  0.0177  0.0518  0.0260  0.0464  0.0001  0.0001  0.0001  0.1021  0.0234
Method         0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001

Table 4
Summary of Kruskal-Wallis nonparametric ANOVA significance levels - rank reversal experiment

               SRC     MSER    MAER    WRC1    WRC2    MATCH%
Alternatives   0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
Criteria       0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
Distribution   0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
Method         0.0001  0.0001  0.0001  0.0001  0.0001  0.0001

Effect of number of alternatives (L): As the number of alternatives L increases, all methods tend to produce overall weights closer to SAW's (especially TOPSIS). This is reflected in higher correlations KWC (except for the insensitive method MEW) and SRC, higher Theil's UW (only for AHPs), and lower MSEW and MAEW. However, when the number of alternatives is large, rank discrepancies are amplified (to a lesser extent for TOPSIS), as evident from higher rank performance measures MAER, MSER, WRC1, WRC2 and to some extent UR. In contrast to the clear rank results of MATCH%, WRC1 and WRC2, SRC produces mixed results as L increases; this further demonstrates its inability to account for different rank importance. ELECTRE matched the SAW top (all) ranked alternatives more (less) often than any other method, resulting in larger WRCs, regardless of the number of alternatives. The change in L affects each AHP version the same way. See Figs. 1-6.

Fig. 1. KWC by number of alternatives.
Table 5
Average performance measures by method and Tukey's test on differences

                        SRC             WRC1            WRC2
Methods                 Mean    Tukey   Mean    Tukey   Mean    Tukey
AHP, Original, eigen    0.8967  A       0.3621  D       0.3253  D
AHP, Geometric, eigen   0.8992  A       0.3507  D       0.3142  D
AHP, Original, MTM      0.8969  A       0.3626  D       0.3258  D
AHP, Geometric, MTM     0.8992  A       0.3500  D       0.3138  D
MEW                     0.8045  B       0.6278  B       0.5726  B
TOPSIS                  0.8921  A       0.4047  C       0.3723  C
ELECTRE                 0.8078  B       0.7267  A       0.6861  A

                        KWC             MSEW            MAEW
Methods                 Mean    Tukey   Mean    Tukey   Mean    Tukey
AHP, Original, eigen    0.8257  A       0.00017 B       0.0085  C
AHP, Geometric, eigen   0.8280  A       0.00019 B       0.0087  C
AHP, Original, MTM      0.8257  A       0.00017 B       0.0084  C
AHP, Geometric, MTM     0.8271  A       0.00019 B       0.0087  C
MEW                     0.7329  C       0.00074 A       0.0194  A
TOPSIS                  0.7764  B       0.00077 A       0.0158  B
ELECTRE                 -               -               -

                        MSER            MAER            UW
Methods                 Mean    Tukey   Mean    Tukey   Mean    Tukey
AHP, Original, eigen    0.4972  C       0.3590  D       0.0230  C
AHP, Geometric, eigen   0.4784  C       0.3481  D       0.0236  C
AHP, Original, MTM      0.4974  C       0.3592  D       0.0232  C
AHP, Geometric, MTM     0.4779  C       0.3474  D       0.0235  C
MEW                     1.1820  A       0.6376  B       0.0565  A
TOPSIS                  0.6747  B       0.4093  C       0.0416  B
ELECTRE                 1.2132  A       0.7250  A       -

                        UR              TOP             MATCH%
Methods                 Mean    Tukey   Mean    Tukey   Mean    Tukey
AHP, Original, eigen    0.0663  CD      0.8215  B       0.6910  A
AHP, Geometric, eigen   0.0647  D       0.8246  B       0.6966  A
AHP, Original, MTM      0.0663  CD      0.8206  B       0.6908  A
AHP, Geometric, MTM     0.0646  D       0.8254  B       0.6950  A
MEW                     0.1055  B       0.7548  C       0.5671  C
TOPSIS                  0.0690  C       0.7549  C       0.6343  B
ELECTRE                 0.1168  A       0.9035  A       0.3537  D

Note: The same letter (A, B, C, D) indicates no significant average difference between methods, based on Tukey's test. Letter order A to D is from largest to smallest average value.

Fig. 2. MAEW by number of alternatives.
Fig. 3. MAER by number of alternatives.
Fig. 4. TOP by number of alternatives.
Fig. 5. MATCH% by number of alternatives.
Fig. 6. WRC1 by number of alternatives.

Effect of number of criteria (N): Most performance measures (MAER, MSER, SRC, KWC, UR, WRC1, WRC2) for most methods changed only slightly with N, but significantly according to ANOVA. This is because MEW and the four AHPs are hardly sensitive to changes in N (no change in KWC and all rank performance measures). As the number of criteria N increases, the methods (especially ELECTRE but not TOPSIS) tend to produce rankings of the alternatives that differ from those of SAW, as documented by higher MAER, MSER, UR, WRC1, WRC2 and lower SRC; and to some extent different weights of alternatives, as implied by somewhat smaller KWC. However, differences in the final weights for alternatives were larger in problems with fewer criteria, as shown by increased MAEW, MSEW, UW and lower KWC. TOPSIS behaved differently from the other methods, more so in its final rankings than in its final weights. TOPSIS rankings differ from those of SAW and the AHPs when N is large (= 20) and, to a lesser extent, when N is small (= 5), where it behaved more like ELECTRE and MEW. This is evident from its increased MAER, MSER, UR, WRC1, WRC2 and reduced TOP, MATCH% and SRC. Again, ELECTRE matched the SAW top (all) ranked alternatives more (less) often than any other method, resulting in larger WRCs, regardless of the number of criteria. The change in N affects each AHP version the same way. See Figs. 7-11.

Fig. 7. MAEW by number of criteria.
Fig. 8. MAER by number of criteria.
Fig. 9. TOP by number of criteria.
Fig. 10. MATCH% by number of criteria.
Fig. 11. WRC1 by number of criteria.

Effect of distribution of criteria weights (V): The distribution does not affect significantly several weight measures (UW, MAEW, MSEW - except for TOPSIS), while its effect is mixed according to the rank measures. As expected, equal criteria weights (V = 1) reduce alternative weight differences between methods. Surprisingly, however, final weight dissimilarities between methods were higher under the uniform than the beta distribution. In the case of AHP, the uniform distribution differentiates its final rankings and weights from SAW slightly more when using the original scale rather than the geometric scale. TOPSIS final rankings differ from those of SAW more (least) under the beta (equal constant) distribution. The ELECTRE and MEW methods differentiate their final rankings more (least) under the equal constant (uniform) distribution. See Figs. 12-15.

Fig. 12. MAEW by criterion weight distribution.
Fig. 13. MAER by criterion weight distribution.
Fig. 14. TOP by criterion weight distribution.
Fig. 15. WRC1 by criterion weight distribution.

4.1. Rank reversal results

Similar analyses were performed on the rank reversal experimental results. Here each method's results were compared to its own (not SAW's), before and after the introduction of a new (not best) alternative. The major findings are summarized in Tables 2, 4 and 6. The parametric and non-parametric ANOVAs reveal that all factors (number of alternatives, number of criteria, distribution and method), and most of their interactions, are highly significant (Tables 2 and 4).
Table 6
Average performance measures by method and Tukey's test on differences - rank reversal experiment

                        SRC             WRC1            WRC2            MSER     MAER     TOP      MATCH%
Methods                 Mean    Tukey   Mean    Tukey   Mean    Tukey   Mean     Mean     Mean     Mean
SAW                     1.0     A       0       D       0       D       0        0        1.0      1.0
AHP, Original, eigen    0.9530  C       0.1532  B       0.1361  B       0.1752   0.1522   0.9258   0.8584
AHP, Geometric, eigen   0.9499  C       0.1595  B       0.1421  B       0.1854   0.1581   0.9235   0.8544
AHP, Original, MTM      0.9560  C       0.1520  B       0.1351  B       0.1740   0.1515   0.9258   0.8590
AHP, Geometric, MTM     0.9511  C       0.1610  B       0.1446  B       0.1820   0.1568   0.9165   0.8551
MEW                     1.0     A       0       D       0       D       0        0        1.0      1.0
TOPSIS                  0.9692  B       0.1116  C       0.097   C       0.1379   0.1104   0.9531   0.9005
ELECTRE                 0.9356  D       0.2138  A       0.1996  A       0.3479   0.2347   0.4402   0.7501

Note: The same letter (A, B, C, D) indicates no significant average difference between methods, based on Tukey's test. Letter order A to D is from largest to smallest average value.

Fig. 16. Rank reversal MAER by number of alternatives.


Fig. 17. Rank reversal MATCH% by number of alternatives.

As summarized in Table 6, the MEW and SAW methods did not produce any rank reversals, which was expected. The next best method was TOPSIS, followed by the four AHPs, according to all rank reversal performance measures (larger TOP, MATCH%, SRC, and smaller RMSER, RMAER, WRC1 and WRC2). The rank reversal performance of each AHP version was statistically not different from the other three AHPs. ELECTRE exhibited the worst rank reversal performance of all the methods in this experiment, and more so in TOP than in all ranks (MATCH%). The last finding should be interpreted with caution, since it does not reflect ELECTRE's versatile capabilities when used directly by a human; it is only indicative of its restrictive ability to discriminate among several alternatives, based on prespecified threshold parameters.

Fig. 18. Rank reversal MAER by number of criteria.

Effect of number of alternatives (L) on rank reversal: In general, more rank reversals occur in problems with more alternatives. This is evident from lower MATCH% and higher MAER, WRC1 and WRC2 among the AHPs. That increase was a little faster for the AHP with original scale and MTM solution. The MTM AHP has a slight advantage over the eigenvector AHP when there are not many alternatives. Reversals of the top rank occur more often in problems with more alternatives for the AHPs, but fewer alternatives for ELECTRE. TOPSIS top rank reversals seem to be insensitive to L. See Figs. 16 and 17.

Effect of number of criteria (N) on rank reversal: The number of rank reversals was influenced less by the number of criteria than by the number of alternatives. For all AHP versions, rank reversals for the top (all) ranks remained at about 9% (14%) of L, regardless of the number of criteria. However, the geometric scale in AHP seems to reduce rank reversals when the number of criteria is small, as documented by smaller MAER and higher MATCH%. According to the SRC criterion, rank reversals for TOPSIS and the AHPs with original scale are not sensitive to N. Interestingly enough, TOPSIS exhibits its worst rank reversals when N is small, while ELECTRE does the same when N is large. See Fig. 18.

Effect of distribution of criteria weights (V) on rank reversal: In general, more rank reversals were observed under constant weights, and fewer under uniformly distributed weights. This was negligible for TOPSIS, but most profound for ELECTRE. See Fig. 19.

Fig. 19. Rank reversal MAER by criterion weight distribution.

5. Conclusion and recommendations

This simulation experiment evaluated eight MADM methods (including four variants of AHP) under different numbers of alternatives (L), criteria (N) and distributions. The final results are affected by these three factors in that order. In general, as the number of alternatives increases, the methods tend to produce similar final weights, but dissimilar rankings, and more rank reversals (fewer top rank reversals for ELECTRE). The number of criteria had little effect on AHPs, MEW and ELECTRE. TOPSIS rankings differ from those of SAW more when N is
large, when it also exhibits its fewest rank reversals.
ELECTRE produces more rank reversals in problems
with many criteria.
The distribution of criteria weights affects fewer
performance measures than does the number of alternatives or the number of criteria. However, it affects
the methods examined differently. Equal criterion weights reduce final weight differences between methods, differentiate further the rankings produced by ELECTRE and MEW, and produce more rank reversals than the other distributions.
Surprisingly, however, final weight dissimilarities
between
methods were higher under the uniform than beta
distribution,
while the latter produced the fewest
rank reversals. A uniform distribution
of criteria
weights differentiates more the AHP final rankings
from SAW when using the original scale rather than
the geometric scale. Finally, a beta distribution
of
criterion weights affects more TOPSIS, whose final
rankings differ even more from those of SAW.
In general, all AHP versions behave similarly and
closer to SAW than the other methods. ELECTRE is
the least similar to SAW (except for best matching
the top-ranked alternative),
followed by the MEW
method. TOPSIS behaves closer to AHP and differently from ELECTRE and MEW, except for problems with few criteria. In terms of rank reversals, the
four AHP versions were uniformly worse than TOPSIS, but more robust than ELECTRE.
The detailed findings of this simulation study can
provide useful insights to researchers and practitioners of MADM. A user’s interest in evaluating alternatives may be in one or more of the final output,
namely their weights, ranking or rank reversals. This
experiment reveals when a user’s results are likely to
be practically the same, regardless of the subset of
methods employed; or when and by how much the
solutions may differ, thus guiding a user in selecting
an appropriate method. SAW was selected as the
basis against which to compare the other methods, because its simplicity makes it popular with practitioners. Some researchers even argue that SAW should
be the standard for comparisons,
because “it gives
the most acceptable
results for the majority of
single-dimensional
problems”
(Triantaphyllou
and
Mann, 1989).
Some caution, however, must be used when considering our findings. They should not be extrapo-

lated beyond the type of MADM problem considered
in this study; namely a decision matrix input of N
criteria weights and explicit ratings of L alternatives
on each criterion. Therefore, method variations capable of handling different problems were not considered in this simulation. This ‘standardization’ hampers ELECTRE more than any of the other methods.
It unavoidably
did not consider the variety of features of the many versions of this method developed
to handle different problem types. It did not take
advantage of the method’s capabilities in handling
problems with ordinal or imprecise information. Even
in the form used here, ELECTRE may produce
different results for different thresholds of concordance and discordance
indexes (which of course leaves open the question of which threshold values the user should select). Finally, no MADM method can be
considered as a tool for discovering
an ‘objective
truth’. Such models should function within a DSS
context to aid the user to learn more about the
problem and solutions to reach the ultimate decision.
Such insight-gaining
methods are better termed decision aids rather than decision-making tools. MADM methods should not be considered single-pass techniques to be used without a posteriori robustness analysis. A
sensitivity (robustness) analysis is essential for any
MADM method, but this is clearly beyond the scope
of this simulation experiment.

References
Belton, V., 1986. A comparison of the analytic hierarchy process and a simple multi-attribute value function. European Journal of Operational Research 26, 7-21.
Belton, V., Gear, T., 1984. The legitimacy of rank reversal - A comment. Omega 13, 143-144.
Buchanan, J.T., Daellenbach, H.G., 1987. A comparative evaluation of interactive solution methods for multiple objective decision models. European Journal of Operational Research 29, 353-359.
Churchman, C.W., Ackoff, R.L., Arnoff, E.L., 1957. Introduction to Operations Research. Wiley, New York.
Currim, I.S., Sarin, R.K., 1984. A comparative evaluation of multiattribute consumer preference models. Management Science 30, 543-561.
Denpontin, M., Mascarola, H., Spronk, J., 1983. A user oriented listing of MCDM. Revue Belge de Recherche Operationnelle 23, 3-11.
Dyer, J., 1990. Remarks on the analytic hierarchy process. Management Science 36, 249-258.
Dyer, J., Fishburn, P., Steuer, R., Wallenius, J., Zionts, S., 1992. Multiple criteria decision making, multiattribute utility theory: The next ten years. Management Science 38, 645-654.
Gemunden, H.G., Hauschildt, J., 1985. Number of alternatives and efficiency in different types of top-management decisions. European Journal of Operational Research 22, 178-190.
Gershon, M.E., Duckstein, L., 1983. Multiobjective approaches to river basin planning. Journal of Water Resource Planning 109, 13-28.
Goicoechea, A., Stakhiv, E.Z., Li, F., 1992. Experimental evaluation of multiple criteria decision making models for application to water resources planning. Water Resources Bulletin 28, 89-102.
Gomes, L.F.A.M., 1989. Comparing two methods for multicriteria ranking of urban transportation system alternatives. Journal of Advanced Transportation 23, 217-219.
Harker, P.T., Vargas, L.G., 1990. Reply to "Remarks on the analytic hierarchy process" by J.S. Dyer. Management Science 36, 269-273.
Hobbs, B.F., 1986. What can we learn from experiments in multiobjective decision analysis. IEEE Transactions on Systems, Man, and Cybernetics 16, 384-394.
Hobbs, B.J., Chankong, V., Hamadeh, W., Stakhiv, E., 1992. Does choice of multicriteria method matter? An experiment in water resource planning. Water Resources Research 28, 1767-1779.
Hwang, C.L., Yoon, K.L., 1981. Multiple Attribute Decision Making: Methods and Applications. Springer-Verlag, New York.
Jelassi, M.T.J., Ozernoy, V.M., 1988. A framework for building an expert system for MCDM models selection. In: Lockett, A.G., Islei, G. (Eds.), Improving Decision Making in Organizations. Springer-Verlag, New York, pp. 553-562.
Karni, R., Sanchez, P., Tummala, V., 1990. A comparative study of multiattribute decision making methodologies. Theory and Decision 29, 203-222.
Kok, M., 1986. The interface with decision makers and some experimental results in interactive multiple objective programming methods. European Journal of Operational Research 26, 96-107.
Kok, M., Lootsma, F.A., 1985. Pairwise-comparison methods in multiple objective programming, with applications in a long-term energy-planning model. European Journal of Operational Research 22, 44-55.
Legrady, K., Lootsma, F.A., Meisner, J., Schellemans, F., 1984. Multicriteria decision analysis to aid budget allocation. In: Grauer, M., Wierzbicki, A.P. (Eds.), Interactive Decision Analysis. Springer-Verlag, pp. 164-174.
Lockett, G., Stratford, M., 1987. Ranking of research projects: Experiments with two methods. Omega 15, 395-400.
Lootsma, F.A., 1990. The French and American school in multicriteria decision analysis. Recherche Operationnelle 24, 263-285.
MacCrimmon, K.R., 1973. An overview of multiple objective decision making. In: Cochrane, J.L., Zeleny, M. (Eds.), Multiple Criteria Decision Making. University of South Carolina Press, Columbia.
Olson, D.L., Moshkovich, H.M., Schellenberger, R., Mechitov, A.I., 1995. Consistency and accuracy in decision aids: Experiments with four multiattribute systems. Decision Sciences 26, 723-748.
Ozernoy, V.M., 1987. A framework for choosing the most appropriate discrete alternative MCDM in decision support and expert systems. In: Sawaragi, Y., et al. (Eds.), Toward Interactive and Intelligent Decision Support Systems. Springer-Verlag, Heidelberg, pp. 56-64.
Ozernoy, V.M., 1992. Choosing the 'best' multiple criteria decision-making method. INFOR 30, 159-171.
Pomerol, J., 1993. Multicriteria DSS: State of the art and problems. Central European Journal for Operations Research and Economics 2, 197-212.
Roy, B., Bouyssou, D., 1986. Comparison of two decision-aid models applied to a nuclear power plant siting example. European Journal of Operational Research 25, 200-215.
Saaty, T.L., 1984. The legitimacy of rank reversal. Omega 12, 513-516.
Saaty, T.L., 1990. An exposition of the AHP in reply to the paper "Remarks on the analytic hierarchy process". Management Science 36, 259-268.
Schoemaker, P.J., Waid, C.C., 1982. An experimental comparison of different approaches to determining weights in additive utility models. Management Science 28, 182-196.
Stewart, T.J., 1992. A critical survey on the status of multiple criteria decision making theory and practice. Omega 20, 569-586.
Stillwell, W., Winterfeldt, D., John, R., 1987. Comparing hierarchical and nonhierarchical weighting methods for eliciting multiattribute value models. Management Science 33, 442-450.
Takeda, E., Cogger, K.O., Yu, P.L., 1987. Estimating criterion weights using eigenvectors: A comparative study. European Journal of Operational Research 29, 360-369.
Timmermans, D., Vlek, C., Hendrickx, L., 1989. An experimental study of the effectiveness of computer-programmed decision support. In: Lockett, A.G., Islei, G. (Eds.), Improving Decision Making in Organizations. Springer-Verlag, Heidelberg, pp. 13-23.
Triantaphyllou, E., Mann, S.H., 1989. An examination of the effectiveness of multi-dimensional decision-making methods: A decision-making paradox. Decision Support Systems 5, 303-312.
Voogd, H., 1983. Multicriteria Evaluation for Urban and Regional Planning. Pion, London.
Zahedi, F., 1986. A simulation study of estimation methods in the analytic hierarchy process. Socio-Economic Planning Sciences 20, 347-354.
Zanakis, S., Mandakovic, T., Gupta, S., Sahay, S., Hong, S., 1995. A review of program evaluation and fund allocation methods within the service and government sectors. Socio-Economic Planning Sciences 29, 59-79.
