Facultés Universitaires Notre-Dame de la Paix de Namur
Faculté des Sciences
Département de Mathématique
On the inverse shortest path problem
Didier Burton
Doctoral dissertation presented in 1993 under the guidance of Ph.L. Toint to obtain a Ph.D. Degree in Science
To the Memory of My Parents
Preface
In spite of the fact that their motivation and development originally occurred at different times, graph theory and optimization are fields of mathematics which nowadays have many connections. Early on, the use of graphs suggested intuitive approaches to both pure and applied problems. Optimization, and more precisely mathematical programming, has steadily grown with the size and diversity of the problems considered. Available computer hardware, and hence computer science, certainly contributed to many of these developments. Optimization techniques therefore often supplied a suitable algorithmic framework for solving problems arising from graph theory.

This doctoral thesis is about such a connection between graph theory and optimization. My purpose in this work is to analyse the inverse shortest path problem. I was introduced to this problem during a two-year research period supported by the Région Wallonne, whose aim was to model the behaviour of road network users, particularly in urban centres. Traffic modelling revealed the importance of accurate estimates of perceived travel costs in a road network. This experience motivated the present research.

I am very grateful to my advisor, Professor Philippe Toint. His invaluable guidance, availability and judicious advice were much appreciated. In addition, I enjoyed the opportunities he gave me to interact with other professors and researchers abroad. I especially want to thank Bill Pulleyblank (IBM T.J. Watson Research Center, Yorktown Heights, USA) for his collaboration during my visit to Yorktown Heights. I am also grateful to Laurence Wolsey (CORE, Louvain-la-Neuve, Belgium), Michel Minoux (Université Pierre et Marie Curie, Paris, France), Tijmen Jan Moser (Rijksuniversiteit, Utrecht, The Netherlands) and Annick Sartenaer, Michel Bierlaire and Daniel Goeleven from the Department of Mathematics (FUNDP, Namur) for very interesting discussions and suggestions for this work.

I wish to express my thanks to the members of my advisory board who kindly agreed to examine this work: F. Callier, J.J. Strodiot (both from FUNDP, Namur), L. Wolsey (CORE, Louvain-la-Neuve) and M. Minoux (Université Pierre et Marie Curie, Paris, France). I am also indebted to S. Vavasis (Cornell University, USA) and anonymous referees who contributed to improving parts of my thesis. Michel Vause (GRT, FUNDP, Namur) supplied useful tools and hints for writing and illustrating this text.

The Department of Mathematics of the Facultés Universitaires Notre-Dame de la Paix (Namur) hosted me during my thesis work, and partly supported participation in scientific meetings in London (UK) and Chicago (USA). The Transportation Research Group (FUNDP, Namur) provided the computer hardware used for the numerical experiments, and contributed to the expenses of several trips abroad. The Communauté Française de Belgique also gave financial support for my mission to London (UK). Finally and most importantly, the Belgian National Fund for Scientific Research supported me during the preparation of this thesis.

Namur, December 1992
Didier Burton
Contents

Preface                                                                  i

1 Introduction                                                           1
  1.1 The graph theory context                                           1
  1.2 Motivating examples                                                2
      1.2.1 Traffic modelling                                            2
      1.2.2 Seismic tomography                                           3
  1.3 The inverse shortest path problem                                  3
  1.4 Solving the problem                                                5
      1.4.1 A shortest path method                                       5
      1.4.2 An optimization framework                                    5
      1.4.3 Solving an instance of inverse shortest path problems        6

2 The shortest path problem                                              7
  2.1 Terminology and notations                                          7
  2.2 A specific shortest path problem                                   8
      2.2.1 The problem type                                             9
      2.2.2 The graph type                                               9
      2.2.3 The strategy type                                           10
  2.3 Shortest path tree algorithms                                     11
      2.3.1 Shortest path trees                                         11
      2.3.2 Bellman's equations                                         12
      2.3.3 Label-setting and label-correcting principles               14
      2.3.4 Search strategies                                           15
      2.3.5 Search strategies for label-setting and label-correcting
            methods                                                     15
  2.4 Label-correcting algorithms                                       16
      2.4.1 L-queue algorithm                                           16
      2.4.2 L-deque algorithm                                           17
      2.4.3 L-threshold                                                 17
  2.5 Label-setting algorithms                                          18
      2.5.1 Dijkstra's algorithm                                        18
      2.5.2 Dial's algorithm                                            19
      2.5.3 Binary heap algorithm                                       20
  2.6 An auction algorithm                                              22
      2.6.1 Basic concepts                                              22
      2.6.2 Description of Bertsekas' algorithm                         23
      2.6.3 Properties of the algorithm                                 24
      2.6.4 Algorithm's performance                                     25
  2.7 An algorithm using an updating technique                          26
      2.7.1 The shortest path method as a linear program                26
      2.7.2 Solving the problem from another root                       27
      2.7.3 Computational performance                                   28
  2.8 A shortest path method for the inverse problem                    28

3 Quadratic Programming                                                 30
  3.1 Terminology and notations                                         30
      3.1.1 Triviality and degeneracy                                   30
      3.1.2 Convexity                                                   31
  3.2 A specific quadratic problem                                      32
      3.2.1 The objective function                                      33
      3.2.2 The feasible region                                         33
      3.2.3 Searching for a strictly convex QP method                   36
  3.3 Note on the complexity of convex QP methods                       36
      3.3.1 Solving a problem                                           36
      3.3.2 Complexity classes                                          37
      3.3.3 Convex quadratic programming                                38
  3.4 Resolution strategies                                             38
      3.4.1 Primal and dual methods                                     38
      3.4.2 Simplex-type and active set methods                         41
      3.4.3 Choosing a particular method                                42
  3.5 The Goldfarb and Idnani method                                    44
      3.5.1 Basic principles and notations                              44
      3.5.2 The GI algorithm                                            44
      3.5.3 Linear independence of the constraints                      49
      3.5.4 Linear dependence of the constraints                        52
      3.5.5 Finite termination of the GI algorithm                      54

4 Solving the inverse shortest path problem                             55
  4.1 The problem                                                       55
  4.2 Algorithm design                                                  56
      4.2.1 The Goldfarb-Idnani method for convex quadratic
            programming                                                 56
      4.2.2 Constraints in the active set                               58
      4.2.3 The dual step direction                                     59
      4.2.4 Interpretation of the dual step direction                   61
      4.2.5 Determination of the weights                                61
      4.2.6 Modifying the active set                                    64
      4.2.7 The algorithm                                               65
      4.2.8 Non-oriented arcs                                           66
      4.2.9 Note                                                        66
  4.3 Preliminary numerical experience                                  66
      4.3.1 The implementation                                          66
      4.3.2 The tests                                                   67
  4.4 Complexity of the inverse shortest paths problem                  69

5 Handling correlations between arc weights                             70
  5.1 Motivation                                                        70
      5.1.1 Transportation research                                     71
      5.1.2 Seismic tomography                                          72
  5.2 The formal problem                                                73
      5.2.1 Classes and densities                                       73
      5.2.2 Shortest paths constraints                                  74
      5.2.3 Constraints on the class densities                          75
      5.2.4 The inverse problem                                         76
  5.3 The uncorrelated inverse shortest path problem                    77
  5.4 An algorithm for recovering class densities                       79
      5.4.1 Islands, dependent sets and their shores                    80
      5.4.2 The dual step direction                                     80
      5.4.3 Determination of the class densities                        83
      5.4.4 The primal step direction                                   86
      5.4.5 The maximum steplength to preserve dual feasibility         88
      5.4.6 The algorithm                                               88
  5.5 Numerical experiments                                             90
      5.5.1 Implementation remarks                                      90
      5.5.2 Correlated method versus uncorrelated method                90
      5.5.3 Selecting violated constraints                              92

6 Implicit shortest path constraints                                    95
  6.1 Motivating examples                                               95
  6.2 The problem                                                       96
  6.3 The complexity of the problem                                     97
      6.3.1 The convexity of the problem                                97
      6.3.2 The 3-SAT problem as an inverse shortest path calculation   98
  6.4 An algorithm for computing a local optimum                       101
      6.4.1 Computing a starting point                                 102
      6.4.2 Updating the explicit constraint description               102
      6.4.3 Reoptimization                                             103
      6.4.4 The algorithm                                              103
      6.4.5 Some properties of the algorithm                           105
  6.5 The reoptimization procedure                                     107
      6.5.1 Notations                                                  107
      6.5.2 How to reoptimize                                          108
  6.6 Some numerical experiments                                       109
      6.6.1 Implementation details                                     110
      6.6.2 Tests                                                      110

7 Conclusion and perspectives                                          114

A Symbol Index                                                         116

Bibliography                                                           119
List of Tables

2.1 Search strategies for labelling methods.                            15
2.2 Illustration of Bertsekas' algorithm.                               24
4.1 The inverse shortest path test examples                             67
4.2 Results obtained on the test problems                               68
5.1 Test problems involving class densities                             91
5.2 Comparative test results for the correlated and uncorrelated
    algorithms                                                          91
5.3 Test problems with equality constraints                             93
5.4 Test results on equality constraints                                93
6.1 Test examples and their characteristics                            110
6.2 Results for the test problems                                      111
List of Figures

2.1 A cycle and a weak cycle.                                            8
2.2 A weighted graph, a shortest path tree and a shortest spanning
    tree.                                                               12
2.3 A binary heap.                                                      21
2.4 A small graph as illustration for the auction algorithm.            24
2.5 A small graph with a cycle of small cost.                           26
2.6 Some shortest path tests results                                    29
3.1 A small example for proving the nonconvexity.                       35
4.1 A first example                                                     58
4.2 Iterations per problem size and shortest paths calculation          68
5.1 The first example involving correlations between arc weights        72
5.2 The graph generated from a discretization                           73
5.3 An island                                                           78
5.4 The correlated algorithm: iterations per problem size and
    shortest paths calculation                                          92
5.5 Algorithm variants: iterations per problem size and shortest
    paths calculation                                                   94
6.1 A small graph                                                       98
6.2 The representation of x_i                                           99
6.3 The subgraph associated with clause c                              100
6.4 A small example showing path combinations                          104
6.5 Solving P(A) and P(A-): one iteration                              109
1 Introduction
This chapter introduces the inverse shortest path problem. We first present the problem's context along with its motivating applications. We then state the formal problem and specify the underlying mathematical tools that we have exploited. These tools will be analysed in the chapters that follow.
1.1 The graph theory context
Graph theory was initially developed as an abstract mathematical theory, and most of its applications at first featured the solving of combinatorial puzzles like those of Euler (1736) [39] and Hamilton (1856) [61]. These researches laid the foundations of graph theory, together with the work of Kirchhoff (1847) [71] and Cayley (1857) [21], who independently developed the theory of trees. More recently, graph theory has become a favoured modelling tool for approaching questions arising in many fields of applied mathematics. In particular, graph theory is playing a significant role in computer science, where appraising algorithm complexities [29, 50] and determining efficient data structures [107], for instance, are of special interest. A partial list of applications that take significant advantage of developments in graph theory includes routing problems [13], network modelling [45], computerized tomography [91], location problems [62] and problems of society [98]. In particular, circulation problems around and within cities motivated studies aimed at understanding problems relating to traffic [101]; these studies again profit from the intuitive use of graphs. This incomplete list of references already shows the considerable contributions of graph theory to solving optimization problems.

Loosely speaking, a graph consists of "nodes" connected by "arcs" of a certain "length". One or more successive arcs form a "path". More formal definitions are given below in Section 1.3. A famous problem in graph theory is that of finding shortest paths in networks, given arc lengths. This problem, which consists of finding paths of minimum length between some origins and some destinations, naturally arises in the analysis of transportation, communication and distribution problems [45]. Shortest path techniques are also applied in fields as diverse as traffic modelling [38] and computerized tomography [86]. Consequently, the shortest paths problem has become fundamental in graph theory and is much studied in the literature. Very efficient algorithms have been proposed during the last three decades to solve the shortest paths problem (see [5, 8, 34, 43, 67, 85]). However, models based on shortest paths do not always reflect observations accurately. These inaccuracies are often caused by an inadequate knowledge of the arc weights (or lengths) used in the shortest path calculations. One way to overcome this difficulty and to improve one's knowledge of the arc weights is to consider the inverse problem. Solving an inverse shortest path problem consists of finding weights associated with the arcs of a network that are as close as possible to a priori estimated values, and that are compatible with the observations of some shortest paths in the network.
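To make the direct problem concrete, here is a minimal sketch of a label-setting (Dijkstra-type) shortest path computation in Python; the adjacency structure, node names and weights are hypothetical illustration data, not taken from this thesis.

```python
import heapq

def dijkstra(adj, source):
    """Shortest path distances from `source` in a weighted digraph.

    `adj` maps each node to a list of (neighbour, weight) pairs with
    non-negative weights, as assumed throughout this thesis.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, already improved
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# A small illustrative network with one origin "o".
adj = {
    "o": [("a", 1.0), ("b", 4.0)],
    "a": [("b", 2.0), ("d", 6.0)],
    "b": [("d", 1.0)],
    "d": [],
}
print(dijkstra(adj, "o"))  # {'o': 0.0, 'a': 1.0, 'b': 3.0, 'd': 4.0}
```

Chapter 2 compares this method with several alternatives (label-correcting, auction and updating techniques) before selecting one suited to inverse shortest path applications.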
1.2 Motivating examples
The best way to introduce the inverse shortest path problem is probably by considering applications. Thus we consider examples drawn from mathematical tra c modelling and computerized tomography.
1.2.1 Traffic modelling
In this field of applied mathematics, it is generally assumed that the users of a given network of roads optimize some criteria to choose their trip from an origin to a destination. These criteria appear to depend, on the one hand, on a priori known costs associated with the network links and, on the other hand, on individual perceived costs [32], these costs being evaluated in time, distance, money or some other more complex measure. The road network planners are obviously extremely interested in the distribution of the traffic along actual paths taken by the users. Shortest path techniques have often been used to determine these paths, for many efficient algorithms have been developed for this problem. However, these procedures fail to reflect the actual behaviour of users over a network as a result of incomplete knowledge about perceived costs. The precise assessment of the cost of a route in the user's mind is complex and often different from that used by the planners: network users choose perceived shortest routes for their journeys [13]. Hence recovering the perceived arc costs is an important step in the analysis of network users' behaviour. It is therefore very useful to know some of the routes that are actually used (and thus considered as shortest) and then to incorporate this knowledge into the model, modifying the a priori costs so as to guarantee that the given route is indeed shortest in the modified network. Care must also be exercised to avoid large changes in the costs as compared to their a priori values. Although these perceived routes may be observable, their precise description might vary with time and across individuals, and the travel cost is usually subject to some estimation. This provides bounds on the travel cost of shortest paths from an origin to a destination, the actual path between the origin and the destination remaining unknown. This is an instance of the inverse shortest path problem. One is given a network represented by a graph with oriented arcs (to account for one-way links) and corresponding travel costs on the arcs. Naturally, travel costs are required to be positive or zero. The question is to modify these costs as little as possible to ensure, on the one hand, that some given paths in the graph are shortest paths between their origin and destination and, on the other hand, that the total cost of shortest paths between given origins and destinations is bounded by given values.
1.2.2 Seismic tomography
Another interesting example is in seismic tomography (see for example [90, 91, 104, 116]). The network represents a discretization of the geologic zone of interest into a large number of "cells", and the costs of the arcs represent the transmission time of certain seismic waves from one cell to the next. According to Fermat's principle, these waves propagate along rays that follow the shortest path, as a function of time, across the earth's crust. Earthquakes are then observed and the arrival time of the resulting seismic perturbations is recorded at various observation stations on the surface. The question is to recover the geological structure by reconstructing the transmission times between the cells from the observation of shortest time waves and an a priori knowledge of the geological nature of the zone under study. Of course, the propagation time of the rays between a known source and a known receiver cannot be measured without some error, and the ray paths themselves usually remain unknown. This provides bounds on the duration of seismic rays. This is again an application of the inverse shortest path problem. The determination of the internal transmission properties of an inaccessible zone from outside measurements is a very common preoccupation in many scientific fields. However, we believe that, because of their practical importance, the two examples above are enough to motivate the study of the inverse shortest path problem.
1.3 The inverse shortest path problem
We define a weighted oriented graph as the triple $(V, A, w)$, where $(V, A)$ is an oriented graph with $n$ vertices and $m$ arcs, and where $w$ is a set of nonnegative weights $\{w_i\}_{i=1}^m$ associated with the arcs. We denote the vertices of $V$ by $\{v_k\}_{k=1}^n$ and the arcs of $A$ by $\{a_j = (v_{s_j}, v_{t_j})\}_{j=1}^m$, with $s_j$ being the index of the initial (or source) vertex of the $j$th arc and $t_j$ the index of its final vertex (or target). A path in an oriented graph is any sequence of arcs where the final vertex of one arc is the initial vertex of the next. An acyclic (or simple) path is a path which does not pass through the same vertex more than once. We assume that such a weighted oriented graph $G = (V, A, w)$ is given, together with

a set of $n_E$ "explicit"[1] acyclic paths
$$p_j = (a_{j_1}, a_{j_2}, \ldots, a_{j_{l_j}}) \qquad j = 1, \ldots, n_E, \tag{1.1}$$
where $l_j$ is the number of arcs in the $j$th path (its length), and where
$$t_{j_i} = s_{j_{i+1}} \quad \text{for } i = 1, \ldots, l_j - 1; \tag{1.2}$$

[1] In the sense that the paths are defined as explicit successions of arcs, in contrast with the implicitly defined paths in the next item.
a set of $n_I$ origin-destination pairs $(o_j, d_j)$, $j = 1, \ldots, n_I$, for defining paths "implicitly" between the origins and the destinations.

If we define $\bar w$ as the vector in the nonnegative orthant of $R^m$ whose components are the given initial arc weights $\{\bar w_i\}$, the problem is then to determine $w$, a new vector of arc weights, and hence a new weighted graph $G = (V, A, w)$, such that
$$\min_{w \in R^m} \|w - \bar w\| \tag{1.3}$$
is achieved, for some given norm $\|\cdot\|$, under the constraints that
$$w_i \geq 0 \qquad (i = 1, \ldots, m), \tag{1.4}$$
$$p_j \text{ is a shortest path in } G \qquad (j = 1, \ldots, n_E), \tag{1.5}$$
and that
$$0 \leq l_j \leq \sum_{a \in p_j^1(w)} w_a \leq u_j \qquad (j = 1, \ldots, n_I), \tag{1.6}$$
where $p_j^1(w)$ is a[2] shortest path in $G$ with respect to the weights $w$, starting at the node $o_j$ and arriving at the node $d_j$[3]. The values $l_j$ and $u_j$ are lower and upper bounds on the cost of the shortest path from $o_j$ to $d_j$, respectively. For consistency, we impose that $l_j \leq u_j$ ($j = 1, \ldots, n_I$), and we allow $l_j$ to be zero and $u_j$ to be plus infinity. Constraints (1.6) are bound constraints on the costs of shortest paths between origin-destination pairs.

The formulation (1.3)-(1.6) defines a continuous optimization problem. The decision variables, given by $w$, are chosen according to the objective (1.3), i.e. differing as little as possible from $\bar w$, among feasible values determined by the set of constraints (1.4)-(1.6). We now make some observations about the constraints and the objective function. With the exception of the nonnegativity constraints (1.4), we introduce two unusual types of constraints: the first constraints (1.5) will be referred to as explicit shortest path constraints; the second constraints (1.6) are called implicit shortest path constraints. The latter constraints define the paths $p_j^1(w)$ ($j = 1, \ldots, n_I$) implicitly, by their origin $o_j$ and destination $d_j$ and the weighted graph $G$. The actual path taken from $o_j$ to $d_j$ clearly depends on the set of weights associated with the arcs in $A$. These constraints are unfamiliar in the sense that the upper bound of one implicit shortest path constraint defines one linear constraint that is to be chosen among an exponential number of linear constraints; moreover, the chosen one is a priori unidentified. This will be clarified later. On the other hand, the explicit shortest path constraints involve the paths $p_j$ ($j = 1, \ldots, n_E$) that are explicitly defined as a succession of arcs specified by (1.1). These constraints are uncommon because one explicit shortest path constraint defines an exponential number of "fixed" linear constraints. The lower bounds of the implicit constraints also involve an exponential number of "fixed" linear constraints. Again, this will be analyzed in the chapters that follow.
[2] The shortest path is not necessarily unique.
[3] The superscript 1 in $p_j^1(w)$ indicates that the shortest path is considered, as opposed to the second shortest. This notation will be important in Chapter 6.
Our inverse problem is to reconstruct the arc weights subject to the constraints described above. It is readily observed that, as is the case in many inverse problems, the constraints do not uniquely determine the arc weights: the reconstruction problem is underdetermined. Fortunately, it often happens in applications that some additional a priori knowledge of the expected arc weights is available. This additional information then provides stability and uniqueness of the inversion (see [105], for instance). This a priori information may be obtained either from "direct" models, for which there is no problem of uniqueness, or from the solution of a previous inverse problem with different data. In order to ensure the uniqueness of the solution of our inverse shortest path problem, we force $w$ to be as close as possible to the a priori information contained in $\bar w$. As far as we are aware, the inverse shortest path problem has never been formally stated nor studied in the scientific literature.
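The constraints just stated can be checked mechanically on a toy instance. The sketch below is illustrative only (the data, arc indexing and helper names are ours; later chapters develop a far more efficient treatment): it verifies the nonnegativity constraints (1.4) by sign checks, the explicit constraints (1.5) by comparing a path's cost with a recomputed shortest distance, and the implicit constraints (1.6) by bounding shortest distances between origin-destination pairs.

```python
import heapq

def shortest_dist(arcs, w, s, t):
    """Dijkstra distance from s to t; arcs is a list of (u, v) pairs and
    w[i] is the (non-negative) weight of arc i."""
    adj = {}
    for idx, (u, v) in enumerate(arcs):
        adj.setdefault(u, []).append((v, idx))
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, idx in adj.get(u, []):
            nd = d + w[idx]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist.get(t, float("inf"))

def feasible(arcs, w, explicit_paths, implicit_bounds):
    """Check constraints (1.4)-(1.6) for a candidate weight vector w."""
    if any(wi < 0 for wi in w):                          # (1.4)
        return False
    for path in explicit_paths:                          # (1.5)
        cost = sum(w[i] for i in path)
        o, d = arcs[path[0]][0], arcs[path[-1]][1]
        if cost > shortest_dist(arcs, w, o, d):
            return False
    for (o, d), (lo, up) in implicit_bounds.items():     # (1.6)
        if not lo <= shortest_dist(arcs, w, o, d) <= up:
            return False
    return True

# Toy instance: three arcs, one explicit path [a_0, a_2] = 0 -> 1 -> 2.
arcs = [(0, 1), (0, 2), (1, 2)]
w = [1.0, 3.0, 1.0]
print(feasible(arcs, w, [[0, 2]], {(0, 2): (0.0, 2.5)}))  # True
```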
1.4 Solving the problem
Before giving a precise algorithmic approach to solving the problem stated in (1.3)-(1.6), we need to examine two fundamental tools that are to be used in our context: the first is taken from graph theory and the second from mathematical programming.
1.4.1 A shortest path method
Solving the inverse shortest path problem requires the use of a method giving solutions to the direct problem. Indeed, constraints (1.5) and (1.6) involve shortest path calculations in their description. The next chapter discusses the choice of a method for finding shortest paths in a weighted oriented graph that is suitable for inverse shortest path applications.
1.4.2 An optimization framework
A number of interesting variants of our problem's formulation can be constructed by considering various norms in (1.3). In particular the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms seem attractive. Since, for all $x \in R^m$,
$$\|x\|_\infty \leq \|x\|_2 \leq \sqrt{m}\,\|x\|_\infty \tag{1.7}$$
and
$$\frac{1}{\sqrt{m}}\,\|x\|_1 \leq \|x\|_2 \leq \|x\|_1, \tag{1.8}$$
where $\|\cdot\|_1$, $\|\cdot\|_2$ and $\|\cdot\|_\infty$ are norms on $R^m$, the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms are equivalent[4] [56]. Throughout this thesis, we will restrict ourselves to the $\ell_2$ norm, or least squares norm, mostly because it is widely used, has useful statistical properties, and leads to tractable computational methods. Note that choosing the $\ell_1$ norm would lead to linear programming, another interesting approach.

[4] This does not mean that they give identical results.
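The equivalence bounds (1.7) and (1.8) are easy to check numerically. The following sketch (illustrative only, with a small tolerance for floating point rounding) tests them on random vectors of varying dimension.

```python
import math
import random

def norm_bounds_hold(x):
    """Check inequalities (1.7) and (1.8) for a nonempty vector x in R^m."""
    m = len(x)
    n1 = sum(abs(xi) for xi in x)                 # l1 norm
    n2 = math.sqrt(sum(xi * xi for xi in x))      # l2 norm
    ninf = max(abs(xi) for xi in x)               # l-infinity norm
    eps = 1e-12
    ok_17 = ninf <= n2 + eps and n2 <= math.sqrt(m) * ninf + eps   # (1.7)
    ok_18 = n1 / math.sqrt(m) <= n2 + eps and n2 <= n1 + eps       # (1.8)
    return ok_17 and ok_18

random.seed(0)
trials = [[random.gauss(0.0, 1.0) for _ in range(random.randint(1, 20))]
          for _ in range(1000)]
print(all(norm_bounds_hold(x) for x in trials))  # True
```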
One could also modify the problem by introducing other functions of the $\{w_i\}$ to minimize. These objective functions may be linear, quadratic or generally nonlinear. Investigation of these alternatives is beyond the scope of this thesis. As a consequence, we can write the objective function (1.3) of our problem as
$$\min_{w \in R^m} \frac{1}{2} \sum_{i=1}^m (w_i - \bar w_i)^2, \tag{1.9}$$
where the factor $\frac{1}{2}$ is chosen for convenience. Solving (1.9) subject to (1.4)-(1.6) needs the algorithmic framework of quadratic programming (QP). Chapter 3 is devoted to the analysis of a particular QP method.
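As a small illustration of why the least-squares objective is computationally convenient: if one temporarily ignores the inequality constraints and merely fixes the total cost of a single explicit path, minimizing (1.9) becomes an orthogonal projection with a closed form that spreads the cost discrepancy evenly over the path's arcs. The data and function name below are hypothetical; the full problem, with all its constraints, requires the QP machinery of Chapter 3.

```python
def project_path_cost(w_bar, path, c):
    """Minimize (1/2) * sum_i (w_i - w_bar_i)^2 subject to the cost of
    `path` (a list of arc indices) being exactly c.

    The Lagrangian optimality conditions give w_i = w_bar_i + lam for
    arcs on the path and w_i = w_bar_i elsewhere, so the discrepancy is
    spread evenly over the path's arcs.
    """
    w = list(w_bar)
    gap = c - sum(w_bar[i] for i in path)
    for i in path:
        w[i] += gap / len(path)
    return w

w_bar = [3.0, 1.0, 2.0, 5.0]          # a priori arc weights
print(project_path_cost(w_bar, [0, 2], 4.0))  # [2.5, 1.0, 1.5, 5.0]
```

Only the two arcs on the observed path are modified, each by half of the total discrepancy (5.0 observed versus 4.0 prescribed), which is exactly the closest weight vector to the a priori one in the $\ell_2$ sense.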
1.4.3 Solving an instance of inverse shortest path problems
Our speci c problem 1.91.4 1.6 will be analysed in three steps. We rst consider the inverse shortest path problem with explicit shortest path constraints, that is the problem de ned by 1.9, 1.4 and 1.5. The shortest path constraints will be examined and the concept of an island" will be introduced to characterize their violation. The framework of Goldfarb and Idnani's convex QP method will be specialized to our context. An algorithm will be proposed and tested on various examples. Chapter 4 deals with this inverse shortest path problem. In many applications, one can observe correlations between arc weights. It is then interesting to generalize the inverse shortest path problem to take these correlations into account. In the uncorrelated problem, the variables are the arc weights. Correlation between these weights introduce more aggregated variables called classes or cells which partition the set of arcs. Our new variables then consist of the densities" of these classes, the weight of the arcs within each class being calculated from the corresponding class density. This new formulation reduces the number of variables but involves more restrictions, that is, more constraints. This is the subject of Chapter 5. We will establish the results that allow the handling of shortest path constraints in the space of class densities. A generalized algorithm will be proposed and its numerical performance will be compared to that of the uncorrelated algorithm of Chapter 4. Di erent strategies for handling constraints will be proposed and discussed in the correlated case. Finally, in Chapter 6, we discuss the inverse shortest path problem with implicit shortest path constraints, particularly the di cult case where upper bounds on the costs of shortest paths are considered. We will see that the inclusion of such constraints in an inverse shortest path application can give rise to nonconvexity. We will show that this problem is NPcomplete. 
An algorithm will be proposed for finding a locally optimal solution to the problem. We will also provide a stability analysis for a found solution. Finally, numerical experiments with this method will be presented. We note that Appendix A summarizes the notation that is used in Chapters 4 to 6.
2 The shortest path problem
This chapter considers methods for solving the shortest path problem. It does not cover the matter completely and thoroughly, but presents a few algorithms with variants that are applicable in the context of inverse shortest path applications. A particular method will be preferred for its appealing computational complexity.
2.1 Terminology and notations
Throughout this chapter we use the following definitions and notations, which are those of Chapter 1 but presented here in more detail. In this thesis, we are concerned with graphs arising in modelling systems. Many shortest path oriented models, such as those dedicated to best routing in transport networks, require directed graphs, that is, graphs in which edges have directions. We adopt for the most part the terminology and notation of Deo [29], which are commonly found in the literature (see [25, 26, 30, 63, 64], for instance). A directed graph or digraph consists of a finite set $V = \{v_k\}_{k=1}^{n}$ of vertices and a set $A = \{a_l\}_{l=1}^{m}$ of ordered pairs of vertices called arcs, where $a_l = (v_{s_l}, v_{t_l})$, with $s_l$ being the index of the source or initial vertex of the $l$th arc and $t_l$ the index of its terminal or final vertex. A digraph is also called an oriented graph, though some authors still make a distinction between the two terms, keeping "oriented" to qualify digraphs that have at most one arc between a pair of vertices. We define a weighted oriented graph $G$ as the triple $(V, A, w)$, where $(V, A)$ is an oriented graph with $n$ vertices and $m$ arcs, and where $w$ is a set of real numbers called weights, $\{w_l\}_{l=1}^{m}$, associated with the arcs. We assume that $G$ is simple: multiple arcs between ordered pairs of vertices (parallel arcs) and loops (arcs of the form $(v, v)$) are not allowed. In this chapter, it will be convenient to refer to arcs and weights by a double index addressing their initial and final vertices. Writing $a_{ij}$ supposes that there exists $l \in \{1, \ldots, m\}$ such that $i = s_l$ and $j = t_l$; the associated weight $w_l$ is then denoted by $w_{ij}$. If $a_{ij}$ is not defined ($a_{ij} \notin A$), then $w_{ij}$ is set to $+\infty$. Note that "the" arc $a_{ij}$ is unambiguously defined since $G$ is simple. Using these notations, we stress that the graph $G$ is not necessarily symmetric, that is, $w_{ij}$ does not necessarily equal $w_{ji}$ for all $i, j \in V$ ($i \neq j$) such that $w_{ij}$ and $w_{ji}$ are finite.
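As an illustration of this double-index convention, a weighted digraph can be sketched in Python by storing the arcs in a dictionary keyed by ordered vertex pairs, with $w_{ij} = +\infty$ whenever $a_{ij} \notin A$; the small graph and its weights below are invented for illustration only:

```python
import math

# A small weighted digraph G = (V, A, w) stored as a dict keyed by (i, j);
# since G is simple, each ordered pair identifies at most one arc.
arcs = {(1, 2): 4.0, (2, 3): 1.0, (1, 3): 7.0, (3, 1): 2.0}

def weight(i, j):
    """w_ij, with the convention that w_ij = +infinity when a_ij is absent."""
    return arcs.get((i, j), math.inf)

print(weight(1, 2))  # 4.0
print(weight(2, 1))  # inf  (the graph need not be symmetric)
```

Because the dictionary is keyed by ordered pairs, the possible asymmetry of $G$ is represented for free: `(1, 2)` and `(2, 1)` are distinct keys.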
Finally we adopt the following terminology about reachability: an oriented path¹ $p$ in an oriented graph $G$ is a sequence of vertices $v_1, v_2, \ldots, v_k$ of $V$ such that $(v_i, v_{i+1})$ is an arc in $A$ for all $i = 1, \ldots, k-1$; if the graph is weighted, the cost of a path is the sum of all its intermediate arc weights. This should not be confused with the length of $p$, denoted by $l(p)$, which is the number of arcs in $p$ (here, $l(p) = k - 1$). The path $p$ is an acyclic or simple path if $v_1, v_2, \ldots, v_k$ are distinct, and an oriented cycle if $v_1 = v_k$. A weak path and a weak cycle in an oriented graph are a path and a cycle in the corresponding undirected graph, respectively; see Figure 2.1 for an example of a weak cycle. Note that weak paths and weak cycles include² oriented paths and oriented cycles, respectively.
Figure 2.1: A cycle and a weak cycle.

Finding a shortest path in $G$ between two vertices of $V$ then consists of finding a path that minimizes $\sum_{(i,j) \in p} w_{ij}$, where $p$ is any path between the two vertices.

The complexity of an algorithm refers to the amount of resources required by its running computation. The worst-case complexity of an algorithm is the lowest upper bound on its complexity. We shall use Landau's symbols to characterize algorithms' computational complexities. A function $g(n)$ is $O(f(n))$ if there exist a scalar $c$ and an index $n_0$ such that

$$|g(n)| \le c f(n) \quad \text{for all } n \ge n_0. \qquad (2.1)$$

A function $g(h)$ is $o(f(h))$ if

$$\lim_{h \to 0} \frac{|g(h)|}{|f(h)|} = 0. \qquad (2.2)$$

Relation (2.1) defining $O(f(n))$ is weaker than that defined in (2.2). That is why algorithms' computational times are often compared in their order of magnitude via $O(f(n))$ relations.
2.2 A specific shortest path problem
The shortest path problem may be related to very different particular questions, according to the problem type, the network characteristics and the resolution strategy. It is then essential to determine the specificity of the shortest path problem we are concerned with. So far, the most
¹ In contrast with Deo [29], a path is by default oriented.
² Contrarily to Deo [29].
in-depth classification of shortest path problems is agreed to be Deo and Pang's taxonomy [30]. Other general surveys can be found in [37, 95]. The next three sections aim at locating our shortest path problem within Deo and Pang's classification.
2.2.1 The problem type
We recall that, in the context of our inverse problem, shortest paths occur in constraint descriptions (see (1.5) and (1.6) in Chapter 1). These involve shortest paths between origin–destination pairs that are not constrained to satisfy additional conditions, such as visiting intermediate vertices before reaching the destination. Our problem then refers to the one-pair shortest path problem or, more generally, to the "single-source" problem. Algorithms solving that problem are usually based on the methods proposed by Bellman [5], Dijkstra [34], Ford [44], and Moore [85]. Methods solving all-pairs problems, such as that of Floyd [43], are consequently inappropriate. The problem of determining $k$th shortest paths does not apply practically to our case, but having second shortest path costs at our disposal is of theoretical importance in Chapter 6.
2.2.2 The graph type
The traffic modelling application mentioned in Chapter 1 needs to represent road networks in graph form. In [112], Van Vliet discusses road networks and their characteristics in relation to the shortest path problem. In particular, he noticed that graphs generated from road networks are large and have a ratio of number of arcs to number of vertices of about 3. Our second motivating application of Chapter 1 was drawn from computerized tomography. In this case, grid graphs are often used to discretize a medium into cells. These graphs may be relatively large depending on the discretization refinement. Experiments on grid networks presented in [33] by Dial et al. again show small arcs to vertices ratios. Both examples witness that most large networks in real-life situations are "sparse"; a graph $G$ is said to be sparse when $m \ll n^2$, that is, when the ratio $m/n$ is small. On the other hand, as stated at the beginning of this chapter, traffic applications call for directed graphs. In contrast, grid networks are usually undirected. Since an undirected graph finds its directed match when each edge is replaced with two oppositely oriented arcs, the directed case is more general. Finally, we will consider weighted graphs with only nonnegative and fixed arc weights; this seems to be the case in most practical problems. The presence of negative arc weights allows negative cycles, that is, cycles with a negative cost. If such a negative cycle exists, a shortest path algorithm will drive the shortest distance to $-\infty$ by going round and round this cycle. The existence of a shortest path is then conditioned by the absence of negative cycles along any path between the origin–destination pair under consideration. Because of the nature of our applications, we decide to restrict arc weights to nonnegative values and hence avoid discussions about possibly costly procedures for detecting negative cycles in a graph.
If the reader is interested in detecting oriented negative cycles, an interesting label-correcting approach using a dynamic breadth-first search technique is due to Goldfarb, Hao, and Kai [54].

Summing up the above characteristics, the graphs we are interested in for determining shortest paths are large, sparse, oriented and nonnegatively weighted.
2.2.3 The strategy type
The strategy refers to the technique employed in the algorithm for calculating shortest paths. The various existing techniques use combinatorial procedures, matrix operations, updating approaches, and so on (see [30] for a complete panorama). Each technique favours precise data structures for representing graphs. The choice of a particular strategy partly depends on the problem and the graph type. Let us use this information to select some suitable techniques, or discard ill-adapted ones. The single-source problem calls for combinatorial techniques rather than algebraic ones. Indeed, the latter strategies usually use matrices as data structure and are better suited for all-pairs problems. A combinatorial method traverses the arcs of the graph and records the information so obtained. This approach is successful in solving one-pair problems and has led to some of the most efficient shortest path algorithms. These methods generally represent graphs in forward star form; this consists, for each vertex $v \in V$, of storing either all arcs whose initial vertex is $v$, or all vertices reachable from $v$ by one arc, that is, the set of successors of vertex $v$:

$$S(v) \stackrel{\mathrm{def}}{=} \{u \in V \mid a_{vu} \in A\}. \qquad (2.3)$$

The fact that our graphs are oriented, large and sparse also encourages the use of combinatorial techniques. Among combinatorial methods, two types of algorithms then seem attractive: the label-correcting algorithms and the label-setting algorithms. The former methods apply to weighted graphs with general arc weights (with or without negative weights), while the latter will not work for graphs with negative arc weights. Label-setting methods consequently appear to be in the class of appropriate strategies for solving our specific shortest path problem. The next section deals with these strategies.
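The forward star idea of (2.3) can be sketched as follows; the small arc list is invented for illustration, and Python lists stand in for the compact arrays a production implementation would use:

```python
# Forward star form: for each vertex v, store the successors S(v) reachable
# from v by one arc, together with the corresponding arc weights.
# Vertices are 1..n; arcs are (source, target, weight) triples.
arc_list = [(1, 2, 4.0), (1, 3, 7.0), (2, 3, 1.0), (3, 1, 2.0)]
n = 3

successors = {v: [] for v in range(1, n + 1)}
for s, t, w in arc_list:
    successors[s].append((t, w))

print(successors[1])  # [(2, 4.0), (3, 7.0)]
```

Grouping arcs by their initial vertex is exactly what lets a combinatorial method scan all of $S(i)$ in time proportional to the out-degree of $i$, which is what makes this representation attractive on sparse graphs.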
Updating techniques take advantage of an initial computation of shortest paths to calculate other ones, or the same ones when small changes occur in the graph. For instance, Florian, Nguyen and Pallottino [42] proposed an algorithm computing shortest paths from a given origin using the information of shortest paths from another origin. This strategy may be profitable when shortest paths between many origin–destination pairs are to be found. Another interesting updating method is due to Goto, Ohtsuki and Yoshimura [59], which seems to be efficient when the shortest path problem must be solved repeatedly with different numerical values of the arc weights. Such a situation of course arises in our inverse shortest path applications. However, methods solving this latter problem generally use matrices to store arc weights (namely to apply an LU-factorization [47, 59]). These matrices being not available in our case, we will not investigate such methods. Section 2.7 of this chapter discusses the advantages of a particular updating technique that uses the forward star representation of a graph.

We finally point out a strategy that has been recently studied by Bertsekas in [7, 8]: the auction strategy. An auction algorithm for finding shortest paths seems to be relatively efficient in many cases. A discussion of computational results obtained by this technique can be found in Section 2.6. Let us now examine some properties of algorithms solving the shortest path problem in sparse graphs using the above-mentioned techniques.
2.3 Shortest path tree algorithms
We first give an overview of algorithms using "labelling" techniques. These algorithms typically calculate a shortest path tree, which is a structure containing the shortest paths from one vertex, called source or root, to all other vertices of the graph. A proof that shortest paths build trees in the graph where they are calculated is proposed in Section 2.3.2. Let us give some properties of the tree structure and their implications for shortest paths before proving this interesting result.
2.3.1 Shortest path trees

An oriented graph $G$ is said to be strictly connected if there is at least one oriented path between every pair of vertices in $G$. Similarly, we say that $G$ is weakly connected if there exists at least one weak path between every pair of vertices in $G$. A tree is a weakly connected graph without any weak cycles (and hence, in particular, without oriented cycles). If we define the in-degree of a vertex $v$ as the number of arcs in $A$ having $v$ as final vertex, a tree with only one vertex (the root) of zero in-degree is called an arborescence or an out-tree. This is a well-known characteristic of shortest path trees. If $G$ is connected, then a shortest path tree rooted at vertex $v$ reaches every vertex in $V \setminus \{v\}$: one then speaks of a spanning tree. Consequently, a shortest path tree in a connected oriented graph is technically a spanning arborescence. It is important to note that a shortest path tree is not a shortest spanning tree (see Figure 2.2, where the arc weights are shown next to the arcs, which are represented by arrows). A shortest spanning tree is that of smallest weight, where the weight of a tree $T$ is defined as the sum of the weights of all arcs in $T$. In Figure 2.2, the weight of the shortest path tree from the root is 10 while that of the shortest spanning tree is 6.
Theorem 2.1 A tree with $n$ vertices has $n - 1$ arcs.
Proof. Let us reason by induction on the number of vertices. One directly sees that the theorem holds for $n = 1, 2$ and $3$. We assume now that the theorem holds for all trees with fewer than $n$ vertices. Consider a tree $T$ with $n$ vertices and one of its arcs $a_{ij} = (v_i, v_j)$. There is no other path between $v_i$ and $v_j$ except $a_{ij}$, because any other path would create a weak cycle in $T$, which is
Figure 2.2: A weighted graph, a shortest path tree and a shortest spanning tree.

impossible since $T$ is a tree. Then $T \setminus \{a_{ij}\}$ consists of two trees, since there are no weak cycles in $T$. Both trees have fewer than $n$ vertices each, and therefore, by the induction hypothesis, each contains one less arc than the number of vertices in it. Thus, $T \setminus \{a_{ij}\}$ consists of $n$ vertices and $n - 2$ arcs, and hence $T$ has exactly $n - 1$ arcs. □

As a consequence, being an arborescence, a shortest path tree can be stored in an $n$-array of vertex numbers, the $i$th component containing the vertex number $j$ such that the arc $(j, i)$ belongs to the shortest path tree. By convention, the component corresponding to the source vertex is set to 0 (remember that the source vertex has a zero in-degree). One also says that the $n$-array contains the predecessor of each vertex different from the root in the arborescence.
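This predecessor storage can be sketched as follows; the tree below is a hypothetical 5-vertex arborescence rooted at vertex 1, and the helper name is our own:

```python
# A shortest path tree stored as a predecessor array: pred[v] is the vertex
# preceding v in the arborescence, with pred[root] = 0 by convention.
pred = {1: 0, 2: 1, 3: 1, 4: 2, 5: 4}

def path_from_root(v):
    """Recover the tree path root -> v by walking predecessors backwards."""
    path = [v]
    while pred[path[-1]] != 0:
        path.append(pred[path[-1]])
    return list(reversed(path))

print(path_from_root(5))  # [1, 2, 4, 5]
```

Note that a single $n$-array encodes all $n - 1$ tree paths at once: each path is recovered on demand by walking towards the root.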
2.3.2 Bellman's equations
We are now interested in calculating a shortest path tree $SPT$ in $G = (V, A, w)$ from the source vertex $src \in V$. We assume that all cycles have nonnegative cost and that there exists a path from $src$ to every other vertex of $V$. Let $sc(v)$ be the shortest path cost from $src$ to $v$ in $SPT$, and $pred(v)$ the predecessor of $v$ in $SPT$, with the convention that $pred(src) = 0$. An algorithm that yields the shortest path tree $SPT$ gives the unique solution to Bellman's equations [5]:
$$sc(j) = \min_{i \mid (i,j) \in A} \{ sc(i) + w_{ij} \}, \quad j \neq src, \qquad (2.4)$$

which define the shortest path costs $sc(j)$ ($j \neq src$) recursively, with the initial condition that $sc(src) = 0$. Now let us formalize the condition that distinguishes a spanning tree from a shortest path tree.
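Before doing so, note that a candidate label vector can be checked against Bellman's equations (2.4) directly; in the following sketch the graph, the labels and the function name are invented for illustration:

```python
import math

# Checking Bellman's equations (2.4): for every vertex j other than the
# source, sc(j) must equal the minimum of sc(i) + w_ij over incoming arcs,
# and sc(src) must be 0.
arcs = {(1, 2): 4.0, (1, 3): 7.0, (2, 3): 1.0, (3, 1): 2.0}
src = 1
sc = {1: 0.0, 2: 4.0, 3: 5.0}  # candidate shortest path costs

def satisfies_bellman(arcs, sc, src):
    for j in sc:
        if j == src:
            continue
        incoming = [sc[i] + w for (i, k), w in arcs.items() if k == j]
        if not incoming or not math.isclose(sc[j], min(incoming)):
            return False
    return sc[src] == 0.0

print(satisfies_bellman(arcs, sc, src))  # True
```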
Theorem 2.2 Assume that $SPT$ is a spanning tree rooted at vertex $src$. Then $SPT$ is a shortest path tree with root $src$ if and only if there exist labels $s_i$ associated with the vertices $i$ ($i = 1, \ldots, n$) such that, for every arc $a_{ij}$ in $A$, $s_i + w_{ij} \ge s_j$, with $j \neq src$ and $s_{src} = 0$.
Proof. Suppose that $SPT$ is a spanning tree rooted at $src$. If there exists an arc $a_{ij} \in A$ such that $s_i + w_{ij} < s_j$, then the path from $src$ to $j$ in $SPT$ is not shortest. Conversely, assume that $s_i + w_{ij} \ge s_j$ holds for every arc $a_{ij} \in A$, with $j \neq src$ and $s_{src} = 0$. Let $p$ be a path from $src$ to $j$, which is of the form $src = v_1, v_2, \ldots, v_{l(p)+1} = j$. Then, by hypothesis, the cost
of $p$ is bounded below by

$$\left( s_{v_2} - s_{src} \right) + \left[ \sum_{k=2}^{l(p)-1} \left( s_{v_{k+1}} - s_{v_k} \right) \right] + \left( s_j - s_{v_{l(p)}} \right) = s_j. \qquad (2.5)$$
Hence $SPT$ is a shortest path tree. □

Note that the label vector $s$ in Theorem 2.2 does not necessarily equal the vector $sc$ defined by Bellman's equations. The value of $s_i$ will match that of $sc(i)$ if and only if $s_i + w_{ij} = s_j$ for every arc $a_{ij}$ in $SPT$ (see [99]). The equations (2.4) already suggest a basic algorithm for calculating the shortest path costs³, which is commonly viewed as a "prototype" shortest path tree algorithm [49]:
Algorithm 2.1
Step 1. Initialize a tree $SPT$ rooted at $src$ and, for each $v \in V$, set $sc(v)$ to the cost of the path from $src$ to $v$ in $SPT$;

Step 2. Let $a_{ij} \in A$ be an arc for which $sc(i) + w_{ij} - sc(j) < 0$; then adjust the vector $sc$ by setting $sc(j) = sc(i) + w_{ij}$, and update the tree $SPT$ by replacing the current arc incident into vertex $j$ by the new arc $a_{ij}$;

Step 3. Repeat Step 2 until the optimality conditions (2.4), which may be rewritten as

$$sc(i) + w_{ij} \ge sc(j) \quad \text{for all } (i,j) \in A, \qquad (2.6)$$

are satisfied.
In the course of the calculation, the value of $sc(v)$ ($v \in V$) is greater than or equal to the cost of the path from $src$ to $v$ in the current tree. One usually calls $sc(v)$ the label of vertex $v$. Algorithms that label the reached vertices with their shortest path costs from the source are called labelling algorithms. There are two conventional ways of classifying labelling algorithms: authors like Steenbrink [102], Dial et al. [33], and Deo and Pang [30] distinguish between label-correcting and label-setting methods; more recently, Gallo and Pallottino [48, 49] rather discern different search strategies (the breadth first search, the depth first search and the best first search) employing precise data structures analysed by Aho et al. in [1], Tarjan in [106] and Pallottino in [93]. In order to clarify both approaches, let us touch upon Step 2 of the above prototype algorithm. Considering the forward star representation of sparse graphs, one realizes that it is worth selecting vertices rather than arcs: once a vertex $i$ has been chosen, the operations of Step 2 are performed on all arcs $a_{ij}$ with vertex $j \in S(i)$. We suppose that vertex $i$ is selected from a set of candidate vertices $Q$. From this point of view, search strategies refer to the way $i$ is chosen from $Q$ in relation to the underlying data structure of $Q$, while label-setting or label-correcting methods rather refer to some properties of the vertices that are to be selected in $Q$. These precisions allow us to present a variant of the prototype Algorithm 2.1 that includes the updating of the shortest path information given by the vector $pred$.
³ Bellman's equations do not supply information about the shortest paths themselves.
Algorithm 2.2
Step 1: Initializations. Set $sc(src) \leftarrow 0$ and $pred(src) \leftarrow 0$. For each vertex $i \in V \setminus \{src\}$, set $sc(i) \leftarrow +\infty$ and $pred(i) \leftarrow i$. Finally, set $Q \leftarrow \{src\}$.

Step 2: Selecting and updating. Select a vertex $i$ in $Q$, and set $Q \leftarrow Q \setminus \{i\}$. For each vertex $j \in S(i)$ such that $sc(i) + w_{ij} < sc(j)$, do: set $sc(j) \leftarrow sc(i) + w_{ij}$ and $pred(j) \leftarrow i$; if $j \notin Q$, then set $Q \leftarrow Q \cup \{j\}$ and update $Q$.

Step 3: Loop or exit. If $Q \neq \emptyset$, then go to Step 2. Else exit: $sc$ and $pred$ contain the costs and the description of the shortest path tree rooted at $src$, respectively.
The updating of the set Q in Step 2 technically depends on the data structure used to store the candidate vertices.
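Algorithm 2.2 can be sketched in Python with $Q$ kept as a plain set, deliberately leaving the selection rule of Step 2 unspecified; the function name and the small test graph are our own:

```python
import math

def labelling(n, arcs, src):
    """Generic labelling scheme of Algorithm 2.2: sc holds the labels,
    pred the current tree; Q is a plain set, so the selection rule is open."""
    succ = {v: [] for v in range(1, n + 1)}   # forward star form
    for i, j, w in arcs:
        succ[i].append((j, w))
    sc = {v: math.inf for v in range(1, n + 1)}
    pred = {v: v for v in range(1, n + 1)}
    sc[src], pred[src] = 0.0, 0
    Q = {src}
    while Q:
        i = Q.pop()                           # any candidate; real methods fix this choice
        for j, w in succ[i]:
            if sc[i] + w < sc[j]:
                sc[j], pred[j] = sc[i] + w, i
                Q.add(j)
    return sc, pred

sc, pred = labelling(4, [(1, 2, 2.0), (1, 3, 5.0), (2, 3, 1.0), (3, 4, 1.0)], 1)
print(sc)  # {1: 0.0, 2: 2.0, 3: 3.0, 4: 4.0}
```

With nonnegative weights the final labels do not depend on the selection order; only the number of iterations does, which is precisely what the different search strategies below try to control.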
2.3.3 Label-setting and label-correcting principles
A label-setting method labels a vertex permanently or temporarily: the label $sc(v)$ of a vertex $v$ is made permanent only when $sc(v)$ equals the shortest path cost from $src$ to $v$. The vertex $i$ selected from $Q$ is the temporarily labelled vertex of minimum label; vertex $i$ becomes permanently labelled when removed from $Q$. It means that each vertex will enter $Q$ at most once and that one shortest path is found at each iteration; the algorithm then terminates in at most $n - 1$ iterations. As far as the graph type is concerned, remember that label-setting algorithms only work on graphs with nonnegative arc weights. On the other hand, a label-correcting method never labels vertices permanently until the algorithm ends. As a consequence, such a method cannot guarantee that any current path is shortest until termination occurs. A label-correcting algorithm therefore usually requires more than $n - 1$ iterations, but each of them demands less calculation. In contrast with label-setting methods, label-correcting strategies apply to graphs with general arc weights. These label-setting and label-correcting principles are not mutually exclusive and may "coexist", as is the case for the auction algorithm of Bertsekas [8]: the auction algorithm follows label-setting principles in that the shortest distance to a vertex is found the first time the vertex is labelled; it also follows label-correcting principles in that the label of a vertex may continue to be updated after its shortest distance is found. This will be detailed in Section 2.6.
2.3.4 Search strategies
As mentioned in Section 2.3.2, search strategies involve selection rules depending on $Q$'s data structure. Let us examine three commonly used procedures (see namely [106] for a detailed description). The breadth first search selects the oldest element in $Q$, that is, the element which was inserted first; the underlying data structure is known as the "First-In-First-Out" (FIFO) list, or queue. The depth first search chooses the newest element from $Q$, i.e. the element which was inserted last; the "Last-In-First-Out" (LIFO) list, or stack, allows such a search. Both breadth first and depth first strategies are designed to use lists for which adding and removing an element are elementary operations. The first element of a list is its head and the last element its tail. Some authors [28, 33] employ a double-ended queue or deque for sequencing vertices. A deque is a list composed of a queue and a stack, which allows additions and deletions at either list end. The linked list, which consists of linking each element of $Q$ by means of a pointer to the next one, allows efficient implementations of queues, stacks and deques. In the best first search, we make use of numerical values (labels) associated with the vertices. The element to be selected is that of minimum label in $Q$. Appropriate data structures implementing this strategy are the priority queues. A priority queue is a collection of elements, each with an associated label, on which the following operations are efficiently performed: adding a new element, removing the minimum value element and correcting the label of an element whose location is known.
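The three containers can be illustrated with Python's standard library: `collections.deque` serves as a queue (and as a deque), a plain list as a stack, and the `heapq` module as a binary-heap priority queue:

```python
import heapq
from collections import deque

# Queue (breadth first): the oldest element, inserted first, leaves first.
q = deque([1, 2, 3])
assert q.popleft() == 1

# Stack (depth first): the newest element, inserted last, leaves first.
s = [1, 2, 3]
assert s.pop() == 3

# Priority queue (best first): the element of minimum label leaves first.
pq = []                                   # (label, vertex) pairs
for label, v in [(5.0, 2), (1.0, 3), (3.0, 1)]:
    heapq.heappush(pq, (label, v))
assert heapq.heappop(pq) == (1.0, 3)
```

Note that `heapq` does not directly support correcting the label of an element in place; implementations usually push a fresh entry and skip stale ones, a simplification used again in the Dijkstra sketch below Section 2.5.1.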
2.3.5 Search strategies for label-setting and label-correcting methods
Labelling algorithms typically maintain the candidate set $Q$ with the help of some data structure that facilitates operations on this set. In Table 2.1, we show the conventional correspondence between labelling methods and search strategies.
Labelling methods   Label-correcting   Label-setting
Queue                      x
Stack                      x
Deque                      x
Linked list                                   x
Buckets                                       x
Heap                                          x
Auction                    x                  x
Table 2.1: Search strategies for labelling methods.

Breadth first searches with queues or deques often organize the selection in $Q$ for label-correcting algorithms (see [48], for instance). This is due to the fact that a breadth first search visits the vertices in concentric zones starting from the source. In the same vein, Gallo and Pallottino [48] remark that depth first searches with stacks are odd in shortest path tree algorithms, since the first updated vertex, which is in $S(src)$, will be selected last. Using a best first strategy with the shortest path costs $sc$ as labels, the selection of Step 2 in Algorithm 2.2 yields the vertex $i$ of $Q$ that is at the shortest distance from $src$. Once the forward
star $S(i)$ is updated, $i$ does not need to be updated any more until the end of the algorithm. Shortest path tree algorithms using a search strategy derived from the best first one are then label-setting algorithms, and are also called shortest first algorithms. In particular, Dijkstra [34] originally did not use any list in his shortest first algorithm; Yen [117] exploits linked lists; Denardo and Fox in [28] and Dial et al. in [33] manipulate buckets; and Johnson [67] makes use of heaps. Buckets and heaps are structures for ordering vertices with respect to their labels. They will be described in Section 2.5 with the algorithms employing them. Finally, as mentioned before, the auction technique used by Bertsekas [8] is of a special nature. This technique is defined and analysed in Section 2.6. The next sections are devoted to reviewing some shortest path tree algorithms that use the above search strategies. Computational complexities are mentioned to allow comparisons between these algorithms.
2.4 Label-correcting algorithms
The first well-known label-correcting algorithm was introduced by Ford [44], and then detailed by Bellman [5] and Moore [85]. Their method recursively solves Bellman's equations (2.4) and has a computational complexity of $O(mn)$. Proofs and comments about this method can be found in [57, 89]. We present three variants of this prototype algorithm that have been suggested since. We do not mention the depth first variant because of its high computational complexity ($O(n2^n)$) and poor practical performance. Other variants can be consulted in [49].
2.4.1 L-queue algorithm
Gallo and Pallottino [48] proposed an efficient version of the Bellman–Ford–Moore algorithm using a queue for representing the set $Q$ and the forward star form. They called their algorithm "L-queue", for list search queue. We denote the queue's head and tail by $Q_h$ and $Q_t$, respectively.
Algorithm 2.3
Step 1: Initializations. Set $sc(src) \leftarrow 0$ and $pred(src) \leftarrow 0$. For each vertex $i \in V \setminus \{src\}$, set $sc(i) \leftarrow +\infty$ and $pred(i) \leftarrow i$. Finally, set $Q \leftarrow \{src\}$, $Q_h \leftarrow src$ and $Q_t \leftarrow src$.

Step 2: Selecting and updating. Remove the vertex $i$ at $Q_h$. For each vertex $j \in S(i)$ such that $sc(i) + w_{ij} < sc(j)$, do: set $sc(j) \leftarrow sc(i) + w_{ij}$ and $pred(j) \leftarrow i$; if $j \notin Q$, then insert $j$ at $Q_t$.
Step 3: Loop or exit. If $Q \neq \emptyset$, then go to Step 2. Else exit.

On sparse graphs ($m/n$ small), this breadth first search algorithm runs in $O(mn) = O(cn^2)$ with small $c$. Experiments in [49] show that this worst-case complexity is not reached in practice. The storage requirement is $4n + 2m$: $n + 2m$ for the weighted graph, $n$ for the queue, and $2n$ for the vectors $pred$ and $sc$.
2.4.2 L-deque algorithm
Pape [94] exploited a suggestion of D'Esopo and set up an algorithm where $Q$ is a double-ended queue (deque), which allows the insertion of vertices at both ends of the list according to a predetermined strategy: a distinction is made between vertices that are in $Q$ and the others; the list is split into a queue and a stack; unlabelled vertices are inserted at the tail of $Q$ (like a queue), while vertices that have already been labelled are inserted at the head of $Q$ (like a stack). The resulting algorithm, called "L-deque", is very similar to "L-queue". We therefore present only the modified Step 2 of Algorithm 2.3, using the same notations for $Q$'s head and tail:
Algorithm 2.4
Step 2: Selecting and updating. Remove the vertex $i$ at $Q_h$. For each vertex $j \in S(i)$ such that $sc(i) + w_{ij} < sc(j)$, do: if $j \notin Q$, then insert $j$ at $Q_t$ if $sc(j) = +\infty$, and at $Q_h$ otherwise ($sc(j)$ finite); set $sc(j) \leftarrow sc(i) + w_{ij}$ and $pred(j) \leftarrow i$.
Of course, the presence of a stack implies a rather high worst-case complexity: $O(n2^n)$. According to Dial et al. [33], the L-deque algorithm is efficient when applied to sparse and almost planar⁴ graphs. The latter restriction does not necessarily meet our graph requirements. The storage need is the same as that of L-queue. Although Gallo and Pallottino [48] observe better runtimes for the L-deque algorithm on a broad variety of problems, we prefer the L-queue algorithm, which presents fewer restrictions (see namely [70], where constructed examples show limitations of L-deque).
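The modified Step 2 can be sketched as follows; note that the never-labelled test ($sc(j) = +\infty$) is taken before the label is overwritten. Names and the test graph are our own:

```python
import math
from collections import deque

def l_deque(n, arcs, src):
    """D'Esopo-Pape variant: never-scanned vertices enter at the tail,
    previously labelled vertices re-enter at the head."""
    succ = {v: [] for v in range(1, n + 1)}
    for i, j, w in arcs:
        succ[i].append((j, w))
    sc = {v: math.inf for v in range(1, n + 1)}
    pred = {v: v for v in range(1, n + 1)}
    sc[src], pred[src] = 0.0, 0
    Q, in_Q = deque([src]), {src}
    while Q:
        i = Q.popleft()
        in_Q.discard(i)
        for j, w in succ[i]:
            if sc[i] + w < sc[j]:
                first_time = math.isinf(sc[j])   # test BEFORE the update
                sc[j], pred[j] = sc[i] + w, i
                if j not in in_Q:
                    if first_time:
                        Q.append(j)       # like a queue
                    else:
                        Q.appendleft(j)   # like a stack
                    in_Q.add(j)
    return sc

print(l_deque(4, [(1, 2, 2.0), (1, 3, 5.0), (2, 3, 1.0), (3, 4, 1.0)], 1))
```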
2.4.3 L-threshold

The partitioning of $Q$ explains the efficiency of L-deque. With the same idea, Glover et al. [60] organized the list $Q$ as two separate queues $Q'$ and $Q''$ using a threshold parameter $s$. The queue $Q'$ is dedicated to vertices whose label falls below the threshold parameter $s$. The algorithm typically proceeds as follows: at each iteration, a vertex is selected and removed from $Q'$, and any vertex $j$ to be added to the candidate list is inserted in $Q''$. When $Q'$ is empty, the threshold
⁴ That is, drawable in two dimensions without arc intersections.
$s$ is adjusted and $Q$ is repartitioned according to the new threshold value. The procedure then goes on until exhaustion of the candidate list $Q$. This method becomes efficient once suitable values are chosen for $s$. As noticed by Bertsekas [9], if $s$ is taken to be equal to the current minimum label, the method behaves like Dijkstra's algorithm, which is presented in the next section; if $s$ exceeds all vertex labels, then $Q''$ is empty and the algorithm then reduces to the generic label-correcting method. Appropriate threshold values have been proposed by Glover et al. in [60], and by Gallo and Pallottino in [49]. When applied to graphs with nonnegative arc weights, the worst-case computational complexity is $O(mn)$. Although this theoretical performance equals that of the other label-correcting algorithms, the threshold algorithm achieves better practical performance than they do. The storage requirement for this method is $5n + 2m$.
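A rough sketch of the two-queue threshold scheme, where `q1` and `q2` stand for $Q'$ and $Q''$; the rule used here for adjusting $s$ (the minimum label in $Q''$) is only one of the possible choices, and all names are our own:

```python
import math
from collections import deque

def threshold_sp(n, arcs, src):
    """Two-queue threshold scheme: scan q1 (labels <= s); when it empties,
    raise s and repartition q2 according to the new threshold value."""
    succ = {v: [] for v in range(1, n + 1)}
    for i, j, w in arcs:
        succ[i].append((j, w))
    sc = {v: math.inf for v in range(1, n + 1)}
    sc[src] = 0.0
    q1, q2, in_q = deque([src]), deque(), {src}
    while q1 or q2:
        if not q1:                           # adjust s and repartition Q
            s = min(sc[v] for v in q2)
            q1 = deque(v for v in q2 if sc[v] <= s)
            q2 = deque(v for v in q2 if sc[v] > s)
        i = q1.popleft()
        in_q.discard(i)
        for j, w in succ[i]:
            if sc[i] + w < sc[j]:
                sc[j] = sc[i] + w
                if j not in in_q:            # new candidates always enter q2
                    q2.append(j)
                    in_q.add(j)
    return sc

print(threshold_sp(4, [(1, 2, 2.0), (1, 3, 5.0), (2, 3, 1.0), (3, 4, 1.0)], 1))
```

With this particular update rule the method mimics Dijkstra's behaviour, which illustrates the remark of Bertsekas quoted above.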
2.5 Label-setting algorithms
The first label-setting (or shortest first search) algorithm is due to both Dijkstra [34] and Moore [85], but it was Dijkstra who established the formal properties of this algorithm. Just as the Bellman–Ford–Moore algorithm has generated most of the label-correcting methods, label-setting algorithms can be viewed as particular implementations of Dijkstra's algorithm. Remember that the arc weights $w_{ij}$ are supposed to be nonnegative for label-setting methods.
2.5.1 Dijkstra's algorithm
Dijkstra's algorithm was initially implemented with an unordered list of temporarily labelled vertices. For notational convenience, $Q$ will represent the set of vertices marked as temporary. The algorithm presented here has been adapted to make use of the forward star representation.
Algorithm 2.5
Step 1: Initializations. Set $sc(src) \leftarrow 0$ and $pred(src) \leftarrow 0$. For each vertex $i \in V \setminus \{src\}$, set $sc(i) \leftarrow +\infty$ and $pred(i) \leftarrow i$. Finally, set $Q \leftarrow V$.

Step 2: Selecting and updating. Find a vertex $i$ in $Q$ verifying

$$sc(i) = \min_{v \in Q} sc(v). \qquad (2.7)$$

If the minimum is not unique, select any $i$ that achieves the minimum. For each vertex $j \in S(i)$ such that $sc(i) + w_{ij} < sc(j)$, set $sc(j) \leftarrow sc(i) + w_{ij}$ and $pred(j) \leftarrow i$. Set $Q \leftarrow Q \setminus \{i\}$.

Step 3: Loop or exit. If $Q \neq \emptyset$, then go to Step 2. Else exit.
The proof that this algorithm is correct can be found in [57, 89]. The complexity of the algorithm is at most $O(n^2)$, since each arc is examined only once, and its space requirement is $4n + 2m$. For complete⁵ graphs, the complexity reduces from $O(n^3)$ for L-queue down to $O(n^2)$ for Dijkstra's method: the $n$ factor is the price to pay for both considering general arc weights and detecting negative cycles. Note that, according to Johnson [66], a label-correcting variant of Dijkstra's algorithm is able to take negative arc weights into account, provided that no negative cycles occur in the graph. One can easily observe that Dijkstra's algorithm allows a relative runtime reduction when solving a one-pair $(o, d)$ shortest path problem, the amount of the reduction depending on how far $d$ is located from $o$. Indeed, one shortest path is found at each iteration of the algorithm; as a consequence, the algorithm may halt once the destination vertex $d$ has been permanently labelled. Taking the actual graph sparsity into account should also reduce the runtime of the algorithm. The critical operation in Dijkstra's algorithm is that of finding the vertex with smallest label in $Q$. Maintaining $Q$ ordered then appears to be a reasonable approach. Yen suggested a variant of Dijkstra's algorithm using an ordered linked list for sequencing vertices. The computational complexity of that variant remains $O(n^2)$ on sparse graphs. As noticed in [14], ordered linked lists do not speed up the original algorithm significantly, since inserting a new element in $Q$ or modifying a vertex label requires the complete scanning of $Q$.
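A heap-based best first variant with the early exit for the one-pair $(o, d)$ problem can be sketched as follows; stale heap entries are skipped rather than corrected (a common simplification), and the names and test graph are our own:

```python
import heapq
import math

def dijkstra_one_pair(n, arcs, o, d):
    """Best first (label-setting) search with a binary heap; the run halts
    as soon as the destination d is permanently labelled."""
    succ = {v: [] for v in range(1, n + 1)}
    for i, j, w in arcs:
        succ[i].append((j, w))
    sc = {v: math.inf for v in range(1, n + 1)}
    sc[o] = 0.0
    done = set()
    heap = [(0.0, o)]
    while heap:
        label, i = heapq.heappop(heap)
        if i in done:
            continue              # stale heap entry for an already-settled vertex
        done.add(i)               # the label of i is now permanent
        if i == d:
            return sc[d]          # early exit for the one-pair problem
        for j, w in succ[i]:
            if label + w < sc[j]:
                sc[j] = label + w
                heapq.heappush(heap, (sc[j], j))
    return sc[d]

print(dijkstra_one_pair(4, [(1, 2, 2.0), (1, 3, 5.0), (2, 3, 1.0), (3, 4, 1.0)], 1, 4))  # 4.0
```

Pushing duplicate entries and discarding stale ones avoids the "correct the label of an element" operation that `heapq` does not provide, at the cost of a slightly larger heap.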
2.5.2 Dial's algorithm
Dial proposed in [31] a different way of maintaining Q ordered. He assumes that the arc weights are nonnegative integers. Denoting the largest arc weight by W ≝ max_{(i,j)∈A} w_ij, the possible finite label values range from 0 to (n − 1)W. The idea is then to associate a slice of this range with the vertices whose label falls within that slice. The small set of vertices so built is usually referred to as a bucket. For the sake of simplicity, we will consider (n − 1)W + 1 buckets, denoted by Q(k) (k = 0, …, (n − 1)W), the kth bucket being associated with label value k. Finding the minimum label vertex then consists of retrieving the first nonempty bucket in ascending order, rather than scanning the candidate list Q. The usual operations on this data structure are very elementary. A new element is added (deleted) by inserting it into (removing it from) the appropriate bucket. Correcting the label of an element is simply achieved by moving the element into the bucket matching its new label value. Dial's algorithm then proceeds as follows.
Algorithm 2.6
Step 1: Initializations. Set sc(src) ← 0 and pred(src) ← 0. For each vertex i ∈ V \ {src}, set sc(i) ← +∞ and pred(i) ← i. For k = 1, …, (n − 1)W, set Q(k) ← ∅. Finally, set Q(0) ← {src}.
⁵ A graph is complete if there exists an arc between every pair of vertices.
Step 2: Selecting and updating. Find the smallest k such that Q(k) ≠ ∅. For each vertex i in Q(k), select each vertex j ∈ S(i) such that sc(i) + w_ij < sc(j), and do: if sc(j) ≠ +∞, then set Q(sc(j)) ← Q(sc(j)) \ {j}; set sc(j) ← sc(i) + w_ij and pred(j) ← i; set Q(sc(j)) ← Q(sc(j)) ∪ {j}. Set Q(k) ← ∅.
Step 3: Loop or exit. If Q(k) = ∅ for k = 0, …, (n − 1)W, then exit.
Else go to Step 2.
Actually, Dial's algorithm is more refined than the one above. Indeed, it is sufficient to maintain only W + 1 buckets, instead of (n − 1)W + 1: if we are currently searching bucket k, then all buckets beyond k + W are known to be empty. This can be easily checked, since the label sc(j) of a vertex j is of the form sc(i) + w_ij, where i is a vertex that has already been removed from the candidate list; we also have that sc(i) ≤ k and w_ij ≤ W; hence sc(j) ≤ k + W. Dial implemented the bucket structure with two-way linked lists. The computational complexity of Dial's algorithm is O(m + nW), and the space requirement reaches 5n + 2m + W + 1. Using buckets with nonuniform widths and splitting large ones down at the right moment can speed up the algorithm. Denardo and Fox [28] reduced the complexity bound by proposing such strategies. They also generalized Dial's algorithm to noninteger arc weights. See also [112], where buckets have bounded cardinality.
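A minimal sketch of Algorithm 2.6, assuming integer weights in 0..W; for simplicity it keeps the plain (n − 1)W + 1 bucket array rather than Dial's W + 1 circular refinement, and the encoding is mine.

```python
INF = float("inf")

def dial(n, adj, src, W):
    """Algorithm 2.6 (Dial): label setting with buckets.

    Arc weights must be integers in 0..W; adj[i] lists (j, w_ij) pairs.
    Bucket k holds exactly the candidate vertices whose label is k.
    """
    sc = [INF] * n
    pred = list(range(n))
    sc[src] = 0
    pred[src] = src
    nbuckets = (n - 1) * W + 1          # labels range over 0..(n-1)W
    buckets = [set() for _ in range(nbuckets)]
    buckets[0].add(src)
    for k in range(nbuckets):           # first nonempty bucket = min label
        while buckets[k]:
            i = buckets[k].pop()
            for j, w in adj[i]:
                if sc[i] + w < sc[j]:
                    if sc[j] != INF:    # move j out of its old bucket
                        buckets[sc[j]].discard(j)
                    sc[j] = sc[i] + w   # correct the label ...
                    pred[j] = i
                    buckets[sc[j]].add(j)   # ... and rebucket j
    return sc, pred
```

Because labels only decrease and a relabelled vertex is removed from its old bucket, no stale entries remain, and a vertex popped from bucket k keeps its (final) label k.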
2.5.3 Binary heap algorithm
Yet another data structure has been favoured for keeping a list ordered: the heap. This term was first introduced by Williams [114]. Other authors used this technique, calling the underlying data structure a priority queue [1, 72]. Later, Tarjan [107, Chapter 3] proposed a thorough analysis of the heap. A heap is a partially ordered collection H of items, the kth item being associated with a real-valued label H(k). The properties of a heap are suitably represented by an arborescence. The root or top item is that of minimum label in H. Moreover, the label of every item in H does not exceed the labels of the items that are its descendants in the arborescence. A binary heap, denoted by Q for convenience, is a heap of K items verifying the following:
Q(k) ≥ Q(⌊k/2⌋), for all k = 2, …, K,   (2.8)
where ⌊x⌋ is the greatest integer smaller than or equal to x. A binary heap with K = 6 is illustrated in Figure 2.3, where the labels (Q(k))_{k=1}^{6} = (1, 2, 5, 3, 2, 6) are encircled next to their index k. Binary heaps are very often employed for their easy implementation.
Figure 2.3: A binary heap.
The binary heap Q then has ⌈log(K + 1)⌉ levels in its arborescence, where log is the logarithm in base 2 and ⌈x⌉ is the smallest integer greater than or equal to x. As a consequence, the operations of removing the minimum label item, inserting a new item, and correcting a label have a computational complexity of O(log K). Indeed, each heap manipulation concerning one item consists of exchanging that item either with its ascendant (one level up) or with one of its descendants (one level down). The procedure of recovering the heap properties after some change to a vertex or a label will be referred to as "order the heap". The heap Q can be implemented by means of two n-vectors: one for the arborescence, and one for keeping track of the items' positions in the heap. We now describe a shortest path tree algorithm using a binary heap to manage the list of candidate vertices, which is still denoted by Q. The heap will contain at most n items or vertices. As for previous algorithms, Q(h) = Q(1) and Q(t) denote the head and the tail of the heap, respectively. Note that the index t ≤ n. For technical details concerning the heap updating, see [49, 107]. Algorithms using related techniques have been developed by D.B. Johnson [67] and E.L. Johnson [68].
Algorithm 2.7
Step 1: Initializations. Set sc(src) ← 0 and pred(src) ← 0. For each vertex i ∈ V \ {src}, set sc(i) ← +∞ and pred(i) ← i. Finally, set Q ← {src}.
Step 2: Selecting and updating. Set i ← Q(h). Replace Q(h) by Q(t) and order the heap Q. For each vertex j ∈ S(i) such that sc(i) + w_ij < sc(j), do: set sc(j) ← sc(i) + w_ij and pred(j) ← i; if j ∉ Q, then insert j as Q(t) and order the heap Q.
Step 3: Loop or exit. If Q ≠ ∅, then go to Step 2.
Else exit.
Remark that at Step 2, one need not "find" the minimum label vertex, in contrast with both Algorithm 2.5 and Algorithm 2.6. The counterpart is the need to reorder the heap. Since each arc is examined at most once, the computational complexity of the binary heap algorithm (or Johnson's algorithm) is O(m log n) when the arc weights are nonnegative. The space requirement is 5n + 2m. For sparse graphs, m = O(n) and the complexity becomes O(n log n), which is the lowest theoretical complexity bound found so far for solving the shortest path tree problem. This method performs very well in practice, since one can hardly tell its practical performance apart from O(n) [14]. Interesting variants have been proposed, namely by Denardo and Fox [28]: they used a heap for ordering buckets (see Section 2.5.2) and obtained a computational complexity of O(m log W), where W is the maximum arc weight; this bound is of course attractive when W < n.
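Algorithm 2.7's heap-managed candidate list can be sketched with Python's heapq. Since heapq offers no decrease-key operation, this variant pushes a duplicate entry on each label correction and skips stale entries on extraction, a common substitute for the position-tracking two-vector implementation described above; names and encoding are mine.

```python
import heapq

INF = float("inf")

def heap_dijkstra(n, adj, src):
    """Algorithm 2.7: Dijkstra with a binary heap for the candidate list Q.

    adj[i] is a list of (j, w_ij) pairs; weights must be nonnegative.
    """
    sc = [INF] * n
    pred = list(range(n))
    sc[src] = 0
    pred[src] = src
    Q = [(0, src)]                   # heap of (label, vertex) pairs
    while Q:
        d, i = heapq.heappop(Q)      # minimum label vertex in O(log n)
        if d > sc[i]:
            continue                 # stale entry: label was corrected
        for j, w in adj[i]:
            if sc[i] + w < sc[j]:
                sc[j] = sc[i] + w
                pred[j] = i
                heapq.heappush(Q, (sc[j], j))
    return sc, pred
```

Each arc triggers at most one push, so the heap holds O(m) entries and the overall cost is O(m log n), matching the bound quoted above.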
2.6 An auction algorithm
We now present an algorithm which shares features of both label-correcting and label-setting algorithms: the auction algorithm. The auction strategy was first studied by Bertsekas [6, 7] in the context of assignment problems. Later, Bertsekas applied this technique to the shortest path problem [8, 9].
2.6.1 Basic concepts
Bertsekas' algorithm is designed to solve the shortest path problem from several origins to a single destination in a weighted oriented graph. It thus also solves our one-pair problem. We therefore present the basic auction algorithm for the single origin case, as proposed by Bertsekas in [8, 9]. The following assumptions are made on the graph G = (V, A, w):
1. All cycles have positive length.
2. Each vertex is the source of at least one arc, except for the destination vertex of the one-pair problem.
3. The graph G is simple.
We are interested in finding a shortest path from vertex src to vertex dst. The auction procedure maintains a simple path P = (src, v₁, v₂, …, v_k) starting at src, and a vector π of prices associated with the vertices, π_v being the price of vertex v. At each iteration, the path P is either extended by adding a new vertex v_{k+1}, or contracted by deleting its current destination v_k. When the current destination becomes dst, the algorithm ends. Bertsekas gives an intuitive sense of his algorithm: it proceeds as a person trying to reach a destination in a graph-like maze, going forward and
backward along the current path; each time a backtracking occurs, the person evaluates and keeps track of the "price" or the "desirability" of revisiting and advancing from the left position. An iteration consists of updating the pair (P, π) so that it satisfies the complementarity slackness condition (CS):

π_i ≤ w_ij + π_j, for all (i, j) ∈ A,   (2.9a)
π_i = w_ij + π_j, for all (i, j) such that a_ij is an arc of P.   (2.9b)

These CS conditions are equivalent to Bellman's equations (2.4) with the labels sc(i) replaced by the negative prices −π_i. When a pair (P, π) satisfies CS, the portion of P between two of its vertices i and j is a shortest path from i to j by (2.9a), and π_i − π_j is the corresponding shortest path cost by (2.9b).
2.6.2 Description of Bertsekas' algorithm
The algorithm proceeds by extending and contracting the current path P. A degenerate case occurs when the path is reduced to the vertex src: the path P is then either extended, or left unchanged with the price π_src being strictly increased. The algorithm needs to begin with a pair (P, π) that satisfies CS. This is not a restrictive assumption when all arc weights are nonnegative: one can use the default pair

P = (src), with π_i = 0 for all i = 1, …, n.   (2.10)

Let us describe the algorithm by characterizing a typical iteration.

Algorithm 2.8
Let i be the terminal vertex of P. If

π_i < min_{(i,j)∈A} { w_ij + π_j },   (2.11)

go to Step 1; else go to Step 2.

Step 1. Contract path: Set π_i ← min_{(i,j)∈A} { w_ij + π_j }, and if i ≠ src, contract P. Go to the next iteration.

Step 2. Extend path: Extend P by vertex j_i, where

j_i = arg min_{(i,j)∈A} { w_ij + π_j }.   (2.12)

If j_i = dst, then exit: P is the desired shortest path. Else, go to the next iteration.
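Under the assumptions above (positive cycles, every non-destination vertex with at least one outgoing arc, and nonnegative weights so that zero prices satisfy CS), the iteration can be sketched as follows; the graph encoding and function name are mine.

```python
def auction_shortest_path(adj, src, dst, prices=None):
    """Algorithm 2.8 (Bertsekas): single origin-destination pair.

    adj[i] lists (j, w_ij) pairs; dst must appear as a key of adj
    (possibly with an empty list).  Starts from the default pair:
    P = (src), pi_i = 0 for all i, which satisfies CS when w >= 0.
    """
    pi = dict(prices) if prices else {v: 0 for v in adj}
    P = [src]
    while True:
        i = P[-1]                                  # terminal vertex of P
        best = min(w + pi[j] for j, w in adj[i])
        if pi[i] < best:                           # test against the minimum
            pi[i] = best                           # Step 1: raise the price ...
            if i != src:
                P.pop()                            # ... and contract P
        else:                                      # Step 2: extend P
            j_i = min(adj[i], key=lambda a: a[1] + pi[a[0]])[0]
            P.append(j_i)
            if j_i == dst:
                return P, pi                       # P is a shortest path
```

On exit, pi[src] − pi[dst] equals the shortest path cost, in line with the CS interpretation above.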
To gain an intuitive view of this procedure, see Table 2.2, which shows all the steps of Algorithm 2.8 when applied to the small graph illustrated in Figure 2.4, where the vertex numbers are encircled and the arc weights are shown next to the arcs.
Figure 2.4: A small graph as illustration for the auction algorithm.

Iteration | P before iteration | π before iteration | Iteration type
1  | (1)       | (0, 0, 0, 0) | Contraction at vertex 1
2  | (1)       | (1, 0, 0, 0) | Extension at vertex 2
3  | (1, 2)    | (1, 0, 0, 0) | Contraction at vertex 2
4  | (1)       | (1, 2, 0, 0) | Contraction at vertex 1
5  | (1)       | (2, 2, 0, 0) | Extension at vertex 3
6  | (1, 3)    | (2, 2, 0, 0) | Contraction at vertex 3
7  | (1)       | (2, 2, 2, 0) | Contraction at vertex 1
8  | (1)       | (3, 2, 2, 0) | Extension at vertex 2
9  | (1, 2)    | (3, 2, 2, 0) | Extension at vertex 4
10 | (1, 2, 4) | (3, 2, 2, 0) | Exit

Table 2.2: Illustration of Bertsekas' algorithm.
2.6.3 Properties of the algorithm
We note here some important properties of the auction algorithm presented above. The reader is referred to [8, 9] for their proofs. These properties can easily be checked on the example detailed in Table 2.2.
Property 1 The price π_i generated by Algorithm 2.8 is an underestimate of the shortest path cost from i to dst.

Property 2 For every pair of vertices i and j, and at all iterations, π_i − π_j is an underestimate of the shortest distance from i to j.

Property 3 The portion of P between vertex src and any vertex i ∈ P is a shortest path from src to i, with shortest path cost equal to π_src − π_i.
Moreover, note that the shortest path cost to a vertex i is found the first time the vertex becomes the terminal vertex of the path P, and is then equal to π_src − π_i; also, the vertices become terminal for the first time in order of their proximity to the origin. These properties allow us to state relationships between the prices π_i and the shortest path costs sc(i); remembering the convention that sc(src) = 0, Properties 2 and 3 imply that

sc(j) ≥ π_src − π_j, for all j ∈ V,   (2.13)

and

sc(i) = π_src − π_i, for all i ∈ P,   (2.14)

if the CS conditions hold. Then, we can write the following:

sc(i) + π_i − π_dst ≤ sc(j) + π_j − π_dst, for all i ∈ P and j ∈ V.   (2.15)

The price π_i being an estimate of the shortest path cost from i to dst (see Property 1), the quantity sc(j) + π_j − π_dst may be viewed as an estimate of the cost of a shortest path from src to dst using only paths passing through j. It thus makes sense to consider a vertex j as "most desirable" for inclusion in the algorithm's path if sc(j) + π_j − π_dst is minimal.
2.6.4 Algorithm's performance
The crucial operation in Bertsekas' algorithm is the calculation of min_{(i,j)∈A} { w_ij + π_j } each time vertex i becomes the terminal vertex of the path P. Bertsekas [8, 9] proposes several techniques for reducing that computational time. The computational complexity of the auction algorithm is O(mn). This bound can be reduced by considering further characteristics of the graph: if m′ bounds the number of arcs in the subgraph of vertices that are closer to the origin src than the destination dst, a more accurate estimate is O(m′n). Graphs with a small diameter improve this bound further: if h is the minimum number of arcs in a shortest path from src to dst, and W = max_{(i,j)∈A} w_ij, then the computational complexity is O(mhW) or O(m′hW). According to Bertsekas, the practical performance of auction algorithms remains to be fully investigated, particularly on parallel machines. The auction algorithm performs very well on random graphs and on problems with few destinations (more than one, but much fewer than n); it even runs faster than Johnson's algorithm. When applied to sparse graphs, however, the auction method seems less efficient: A. Sartenaer [100] compared the performance of Bertsekas' algorithm against that of Johnson's method on urban networks. She noticed that Bertsekas' method does not take advantage of sparsity as well as Johnson's algorithm does. Moreover, the complexity bound estimated for Bertsekas' algorithm depends on the shortest path cost, and there are problems for which the number of iterations of the algorithm is not polynomially bounded. See, for instance, the small graph in Figure 2.5, which involves a cycle of relatively small cost. A. Sartenaer observed that the runtimes of Bertsekas' algorithm depend closely on the value of W: examining the steps taken by Algorithm 2.8, one directly sees that the price of vertex 3 will first be increased by 1 and
Figure 2.5: A small graph with a cycle of small cost.
then by increments of 3 (the cost of the cycle) as many times as necessary for the price π_3 to reach or exceed W. If this situation is unlikely to arise in randomly generated problems, it is not so for urban networks, for instance: just think of a roundabout followed by a long road. On the other hand, Johnson's algorithm behaves the same whatever the value of W, since it terminates in n − 1 iterations. The storage requirement is 3n + 2m, or at most 5n + 2m depending on whether a data structure is used to retrieve the minimum value min_{(i,j)∈A} { w_ij + π_j }, with i the terminal vertex of P. It seems difficult to make use of a heap in the auction algorithm, since at each iteration this minimum value may be calculated on a different set of arcs. The computational effort of building a new heap "from scratch" each time P's terminal vertex changes would degrade the algorithm's performance.
2.7 An algorithm using an updating technique
When many shortest path trees must be computed, one way of doing so is to apply a shortest path tree algorithm for each different root. If some of the root vertices are "close" to each other, the shortest path trees at these roots will share many arcs. This observation, although well known, had not been exploited before Florian, Nguyen and Pallottino [42]. This is mainly due to the simplicity and efficiency of the algorithms available for solving the shortest path problem from a single origin. Some authors tackled the problem of updating the shortest path cost matrix [47, 59, 88], but the use of such methods is impractical on large networks. Florian et al. [42] set up a dual simplex strategy adapted to the forward star representation of a graph.
2.7.1 The shortest path method as a linear program
The problem of finding the shortest paths from a vertex src can be written in the minimum cost flow format as a linear program:

min_x Σ_{(i,j)∈A} w_ij x_ij   (2.16)

subject to

Σ_{j|(i,j)∈A} x_ij − Σ_{j|(j,i)∈A} x_ji = b_i, for all i ∈ V,   (2.17)

x_ij ≥ 0, for all (i, j) ∈ A,   (2.18)

where

b_i = n − 1 for i = src, and b_i = −1 for i ≠ src.   (2.19)

The variable x_ij represents the number of shortest paths going through arc a_ij. Hence, the objective function (2.16) can be translated as: minimize the sum of the costs of all shortest paths from src. The constraints (2.17)-(2.18), with (2.19), express the need for the shortest paths to form an arborescence rooted at src. The corresponding dual linear program is, letting −π_i be the dual variable associated with constraint i,

max Σ_{j∈V} (π_j − π_src)   (2.20)

subject to

w_ij + π_i − π_j ≥ 0, for all (i, j) ∈ A.   (2.21)

Note that the quantities −π_i are the prices of Bertsekas' auction algorithm presented in Section 2.6. A primal feasible solution for problem (P) (2.16)-(2.19) is a set of flows {x_ij} that satisfies (2.17) and (2.18). A dual feasible solution for problem (D) (2.20)-(2.21) is a set of dual variables {π_i} that satisfies (2.21). An optimal solution is both primal and dual feasible.
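To illustrate the primal-dual pair (2.16)-(2.21), the following sketch builds, for a small hand-made graph, the primal flow induced by a shortest path tree and the dual prices induced by the shortest path costs, then checks feasibility of both and equality of the two objective values; the instance and all numbers are mine.

```python
# Arcs (i, j, w_ij) of a small graph; src = 0, n = 4 vertices.
arcs = [(0, 1, 1), (0, 2, 4), (1, 2, 2), (1, 3, 6), (2, 3, 3)]
n, src = 4, 0

pred = {1: 0, 2: 1, 3: 2}            # shortest path tree (found by hand)
dist = {0: 0, 1: 1, 2: 3, 3: 6}      # shortest path costs sc(i)

def subtree_size(v):
    """Number of vertices served through v in the tree."""
    return 1 + sum(subtree_size(u) for u, p in pred.items() if p == v)

# Primal solution of (2.16)-(2.19): x_ij = paths through tree arc a_ij.
x = {(p, j): subtree_size(j) for j, p in pred.items()}

for i in range(n):                   # flow conservation (2.17) with (2.19)
    out = sum(x.get((i, j), 0) for (a, j, _) in arcs if a == i)
    inc = sum(x.get((j, i), 0) for (j, a, _) in arcs if a == i)
    assert out - inc == (n - 1 if i == src else -1)

# Dual solution of (2.20)-(2.21): prices pi_i = sc(i).
for (i, j, w) in arcs:               # dual feasibility (2.21)
    assert w + dist[i] - dist[j] >= 0

primal = sum(w * x.get((i, j), 0) for (i, j, w) in arcs)
dual = sum(dist[j] - dist[src] for j in range(n))
assert primal == dual                # equal objectives: both are optimal
```

The assertions pass with primal = dual = 10, confirming that the tree flow and the distance prices form an optimal primal-dual pair.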
2.7.2 Solving the problem from another root
An optimal solution of (P) corresponds to a basis of (2.17)-(2.19) whose arcs determine a spanning tree that minimizes (2.16). As the source vertex src is changed into a different vertex src′, the shortest path tree rooted at src is a dual feasible but primal infeasible solution for the new problem. That is why Florian et al. naturally use the dual simplex method for solving this new problem. Here is the framework of a dual simplex algorithm.
Algorithm 2.9
Step 0: start with a dual feasible, primal infeasible solution;
Step 1: if the primal solution is feasible, terminate;
Step 2: select the variable to exit the basis, usually that with the most negative solution value;
Step 3: select the variable to enter the basis such that dual feasibility is maintained;
Step 4: obtain a new basis, update primal and dual variables and go to Step 1.
The algorithm may be interpreted as follows: given a shortest path arborescence T, one builds another one, say T′, by transforming T. Each transformation deletes an oriented arc from T towards T′, and adds an oriented arc a⁺ from T′ towards T, which is selected so as to keep the paths shortest from the new root. Note that the current subarborescence rooted at t(a⁺) is already optimal for the new problem. The procedure ends when T′ is an arborescence.
2.7.3 Computational performance
The updating algorithm presented above has a worst-case computational complexity of O(n²) when using an implicit updating of the dual variables. Burton [14] tested an implementation of Florian's algorithm against that of Johnson for the shortest path tree problem. Both algorithms were implemented with heaps for selecting minimum labelled elements, and were applied to sparse graphs. For these tests, a first shortest path arborescence at src was generated by Johnson's algorithm. Then various new sources src′ were selected, and new shortest path arborescences were calculated from these sources by both Florian's and Johnson's methods. The experiments showed that Florian's algorithm runs faster when the new source src′ belongs to the first or second centroid of src, that is, when src′ is reachable from src by at most two arcs. This condition can easily be verified since the forward stars are available. In all other cases, Johnson's algorithm outperforms that of Florian. Florian's algorithm should have a better practical performance when applied to denser graphs. Indeed, the arc a⁺ is selected among very few arcs in sparse graphs, and the vertices s(a⁺) and t(a⁺) are usually close to each other. Florian's algorithm then proceeds by many "small" steps, because the subarborescences rooted at t(a⁺) are bonsais rather than trees. In dense graphs, Florian's algorithm should consequently be more appropriate.
2.8 A shortest path method for the inverse problem
One can observe better complexity bounds for label-setting methods. However, we must distinguish between practical performance and theoretical worst-case performance. Although label-correcting algorithms generally run in O(mn) at worst, the best of them are competitive with the best label-setting algorithms. The best practical methods are not necessarily those with the best computational complexity bounds. In Figure 2.6, several problems of increasing dimension have been solved by six shortest path algorithms. These results are taken from [14], where the author tested only variants of Dijkstra's algorithm. They consequently do not cover all the methods exposed in this chapter. Nevertheless, they reveal some trends in the use of label-setting methods. The number of nodes increases from left to right, and computation runtimes grow upward. The first series of results, in the foreground, is obtained by Johnson's algorithm; we can observe its almost linear behaviour as the number of vertices increases. The last series, in the background, reflects the quadratic behaviour of the original Dijkstra algorithm. The intermediate algorithms are personal variants of Dijkstra's method which are not recorded in the literature. For the practical efficiency of the other algorithms, we refer to the authors cited throughout this chapter. According to Gallo and Pallottino [49], the binary heap algorithm proposed by Johnson gives results similar to those obtained by Dial's algorithm. We prefer that of Johnson, since its efficiency does not depend on the maximal arc weight. The L-threshold algorithm is also efficient, but presents the following drawback: one must adjust the threshold parameter for each kind of application in order to obtain optimal efficiency. The auction algorithm proposed by
the shortest path problem
29
Figure 2.6: Some shortest path test results.
Bertsekas, although very powerful in many cases, is relatively sensitive to the presence of cycles of small length and is not best suited for sparse graphs. Finally, Florian's algorithm allows better runtimes when one can profit from an already calculated shortest path tree whose root is close to the new one. The general algorithm best suited to inverse shortest path applications then seems to be Dijkstra's method implemented with a binary heap. It has an attractive practical performance, requires little storage, and takes advantage of sparsity in a very efficient way. If available, Florian's algorithm may be helpful when combined with this choice.
3 Quadratic Programming
In Chapter 1, we introduced the inverse shortest path problem and decided on a particular algorithmic framework for solving it by choosing the ℓ₂ or least squares norm. The resulting formulation is a quadratic programming problem, or QP for short. This chapter deals with the theoretical background of such programs, and discusses the selection of a method well suited to treating our inverse problem.
3.1 Terminology and notations
Quadratic programming refers to optimization problems in which the objective function f(x) is quadratic and the constraints E_i(x) are linear. We are thus concerned with finding a solution x to the following minimization problem:

min_x f(x) = aᵀx + ½ xᵀGx   (3.1)

subject to

E_i(x) ≝ n_iᵀx − b_i ≥ 0, i = 1, …, h,   (3.2)

where x, a and {n_i}_{i=1}^{h} belong to ℝᵐ, G is an m × m symmetric¹ matrix, b is in ℝʰ, and the superscript T denotes the transpose. For the sake of simplicity, we do not explicitly mention equality constraints of the type n_iᵀx − b_i = 0 to which the vector of variables x may also be subject. In fact, these constraints can theoretically be represented by two opposite constraints of type (3.2).
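A toy instance of the standard form (3.1)-(3.2), with data chosen by me for illustration, showing how the objective and the constraint residuals n_iᵀx − b_i are evaluated:

```python
# Objective data for m = 2 variables: f(x) = a^T x + 0.5 x^T G x.
a = [-2.0, -5.0]
G = [[2.0, 0.0], [0.0, 2.0]]           # symmetric (here positive definite)

def f(x):
    """Evaluate the quadratic objective (3.1)."""
    quad = sum(x[r] * G[r][c] * x[c] for r in range(2) for c in range(2))
    return sum(a[k] * x[k] for k in range(2)) + 0.5 * quad

# Constraints (3.2) as (n_i, b_i) pairs, E_i(x) = n_i^T x - b_i >= 0:
constraints = [([1.0, 0.0], 0.0),      # x1 >= 0
               ([0.0, 1.0], 0.0),      # x2 >= 0
               ([-1.0, -1.0], -3.0)]   # x1 + x2 <= 3

def feasible(x):
    """True when all residuals E_i(x) are nonnegative."""
    return all(sum(n[k] * x[k] for k in range(2)) - b >= 0
               for n, b in constraints)
```

In this instance the unconstrained minimizer (1, 2.5) violates the third constraint, so the constrained solution lies on the boundary x₁ + x₂ = 3, a situation the active-set discussion below is concerned with.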
3.1.1 Triviality and degeneracy
A constraint E_i(x) ≥ 0 may either be amply satisfied, when the strict inequality E_i(x) > 0 holds, or binding, when the equality E_i(x) = 0 holds, or violated, when E_i(x) < 0. A constraint will be called satisfied when it is not violated. Vectors x that satisfy all constraints (3.2) are feasible vectors, and the set F ≝ {x | E_i(x) ≥ 0 for i = 1, …, h} of feasible vectors
¹ It is always possible to arrange that the matrix G is symmetric, since xᵀGx = xᵀGᵀx implies that xᵀGx = ½ xᵀ(G + Gᵀ)x, where the matrix ½(G + Gᵀ) is symmetric.
quadratic programming
31
is called the feasible region. The feasible region may be empty. As defined in [11], a constraint E_q(x) is trivial or redundant if and only if there exists no vector x ∈ ℝᵐ such that E_i(x) ≥ 0 (i ∈ {1, …, h} \ {q}) and E_q(x) < 0. Note that this definition also applies when there is no vector x such that E_i(x) ≥ 0 (i ∈ {1, …, h} \ {q}); the feasible region is then empty even without the qth constraint, which cannot limit the feasible region any further. We will say that degeneracy occurs when the solution to the QP problem is the same whether some binding inequality constraint is imposed or not. It means that we can disregard the inequality, solve the problem, and find a solution which exactly satisfies the inequality (see [12] for further details).
3.1.2 Convexity
The concept of convexity concerns both the feasible region F and the objective function f(x). The feasible region F of our quadratic program, determined by (3.2), is characterized by the following theorem, which can easily be proved since each of the h constraints (3.2) limits the feasible region to a half-space.
Theorem 3.1 The feasible region F of a quadratic program is convex, i.e.,

x, y ∈ F ⟹ λx + (1 − λ)y ∈ F, for all 0 ≤ λ ≤ 1.   (3.3)
An extreme point of the convex set F is a point that does not lie strictly within the line segment connecting two other points of the set. More formally,
Definition 1 A point x in a convex set F is said to be an extreme point of F if there are no two distinct points x₁ and x₂ in F such that

x = λx₁ + (1 − λ)x₂   (3.4)

for some 0 < λ < 1.
Note that a singleton and the entire space ℝᵐ are both convex; the first one is bounded and the second one is unbounded.
Definition 2 A function f : D ⊆ ℝᵐ → ℝ is convex if the set D is convex, and if the following inequality holds for any pair of points x₁, x₂ ∈ D and any real number 0 ≤ λ ≤ 1:

f((1 − λ)x₁ + λx₂) ≤ (1 − λ)f(x₁) + λf(x₂).   (3.5)
If the ≤ sign in (3.5) is replaced by < and λ ≠ 0, 1, the function f is said to be strictly convex. Geometrically, if f is strictly convex and continuously differentiable, a line segment drawn between any two points on its graph falls entirely above the graph. Such a function increases more rapidly (or decreases less rapidly) than a straight line:

f(x₂) > f(x₁) + ∇fᵀ(x₁)(x₂ − x₁), for all x₁ ≠ x₂ ∈ D,   (3.6)
quadratic programming
32
where ∇ designates the gradient, or first derivative, of the next argument. Now let us come back to the objective function f of our QP. Clearly, f is twice continuously differentiable, and the second derivative of f, its Hessian, is G, an m × m matrix whose components are second partial derivatives. Remember that G is symmetric.
Definition 3 An m × m matrix G is said to be positive semidefinite if xᵀGx ≥ 0 for all x ∈ ℝᵐ.

Definition 4 An m × m matrix G is said to be positive definite if xᵀGx > 0 for all x ≠ 0 ∈ ℝᵐ.
These definitions are significant in quadratic programming because of their relationship with the convexity of f, and hence with the characterization of a solution to the QP problem. The two following theorems state these connections.
Theorem 3.2 Let G be an m × m symmetric matrix. The quadratic function f(x) = aᵀx + ½ xᵀGx is convex on ℝᵐ if and only if G is positive semidefinite.
A proof of this theorem can be found in [113]. If G is positive definite, then the function f(x) = aᵀx + ½ xᵀGx is strictly convex on ℝᵐ. These properties may affect the quality of a minimizer of the QP problem (3.1)-(3.2):
Theorem 3.3 Assume that the feasible region F determined by (3.2) is not empty. If G is positive semidefinite, a solution x* of the QP problem (3.1)-(3.2) is a global solution, that is,

f(x*) ≤ f(x), for all x ∈ F.   (3.7)

Moreover, if G is positive definite, then x* is also unique, that is,

f(x*) < f(x), for all x ≠ x* ∈ F.   (3.8)

See [12] for a proof of this theorem. When the Hessian G is indefinite, local solutions which are not global can occur². The problem of minimizing a convex quadratic function f(x) on a convex feasible domain F ⊆ ℝᵐ is called convex quadratic programming.
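Positive definiteness (Definition 4) is commonly tested by attempting a Cholesky factorization, which, for a symmetric matrix, succeeds exactly when the matrix is positive definite; a small pure-Python sketch follows (the tolerance eps is my choice, not from the text).

```python
def is_positive_definite(G, eps=1e-12):
    """Attempt a Cholesky factorization G = L L^T.  For a symmetric G,
    success is equivalent to x^T G x > 0 for all x != 0 (Definition 4)."""
    m = len(G)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = G[i][i] - s
                if d <= eps:       # nonpositive pivot: not positive definite
                    return False
                L[i][i] = d ** 0.5
            else:
                L[i][j] = (G[i][j] - s) / L[j][j]
    return True
```

Note that a merely positive semidefinite matrix (a zero pivot) is rejected, in line with the strict inequality of Definition 4.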
3.2 A specific quadratic problem
Let us characterize our inverse problem (1.9), (1.4)-(1.6) with respect to the above terminology.
² It is also the case when the feasible region of the QP problem is not convex.
quadratic programming
33
3.2.1 The objective function
The objective (1.9) of our inverse shortest path problem can be rewritten in the QP standard form as

f(w) = ½ wᵀw − w̄ᵀw.   (3.9)

We directly see that the Hessian matrix of f(w) is the identity matrix on ℝᵐ, which is positive definite. This implies the strict convexity of our objective (1.9), and hence the certainty that a solution found is global and unique if the feasible region F of the problem is convex; otherwise, if F is nonconvex, only local optimality can be practically observed³.
3.2.2 The feasible region
Remember that the objective of our inverse problem is subject to the nonnegativity of w (1.4), to "explicit" shortest path constraints (1.5), and finally to "implicit" shortest path constraints (1.6). The first constraints are classical bound constraints, which are of course linear and hence determine a convex feasible region in ℝᵐ. We therefore do not examine these constraints in more detail. Let us rather interpret and clarify the status of the shortest path constraints.
Explicit constraints
An explicit shortest path constraint has been stated in Chapter 1 as follows:
p_j is a shortest path in G, j = 1, …, n_E,   (3.10)

where G is an oriented weighted graph (V, A, w), consisting of a set V of n vertices, a set A of m arcs and an m-vector w of weights associated with the arcs. The paths p_j (j = 1, …, n_E) are defined as an explicit succession of consecutive arcs in A. Note that we are using the terminology and definitions presented in Chapter 2. The formulation (3.10) requires the cost of p_j not to exceed that of any path with the same origin and destination as p_j. This may be expressed as a (possibly large) set of vectorial⁴ linear inequality constraints of the type

Σ_{k | a_k ∈ p′_j} w_k ≥ Σ_{k | a_k ∈ p_j} w_k,   (3.11)

where p′_j is any path with the same origin and destination as p_j. As a consequence, the set of feasible weights determined by (3.10) is convex, as it is the intersection of a collection of half-spaces. The problem of minimizing (1.9) subject to (3.11) for j = 1, …, n_E, and to the nonnegativity constraints (1.4), is then a classical QP problem. This QP is however quite special, because its constraint set is potentially very large, very structured, and possibly involves a nonnegligible amount of redundancy. Indeed, the number of linear constraints of the form (3.11)
³ We refer here to algorithm complexities in terms of polynomial or non-polynomial runtimes. These notions will be introduced later.
⁴ That is, 0 belongs to the subset of points verifying the inequality as an equality.
depends on the number of possible paths between two vertices in the graph, which typically grows exponentially with the density m/n of the graph. As all paths between an origin and a destination are taken into account, many constraints (3.11) are trivial once a few are suitably considered; indeed, at most m constraints are not trivial, since they are vectorial and w ≥ 0. There exist procedures that eliminate such trivial or redundant constraints, allowing one to start the problem's resolution with fewer constraints (see Boot's procedure in [11, 12], for instance). However, these checks for triviality require the enumeration of all constraints. In our case, enumerating an exponential number of constraints is of course out of the question⁵, and we will have to use a "separation procedure" to determine which of these constraints are violated for a given value of the arc weights. This separation is naturally based on the computation of the shortest paths within the graph.
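The separation procedure alluded to above can be sketched as follows: given the current weights w and a prescribed path p, compute a shortest path between p's endpoints; if it is strictly cheaper, it exhibits a violated inequality of type (3.11). The encoding (arcs as (i, j) pairs keyed in a weight dict) and the function names are mine.

```python
import heapq

def shortest_path(adj, o, d, w):
    """Shortest o-d path for nonnegative arc weights w, by Dijkstra.

    adj[u] lists (arc, v) pairs, where arc is a key of the dict w.
    Returns (list of arcs of a shortest path, its cost)."""
    INF = float("inf")
    dist = {v: INF for v in adj}
    prev = {}
    dist[o] = 0.0
    Q = [(0.0, o)]
    while Q:
        du, u = heapq.heappop(Q)
        if du > dist[u]:
            continue                       # stale heap entry
        for (arc, v) in adj[u]:
            if du + w[arc] < dist[v]:
                dist[v] = du + w[arc]
                prev[v] = (arc, u)
                heapq.heappush(Q, (dist[v], v))
    path, v = [], d
    while v != o:                          # walk back from d to o
        arc, v = prev[v]
        path.append(arc)
    return list(reversed(path)), dist[d]

def separate(adj, w, p):
    """Separation for one explicit constraint (3.10): return a violated
    inequality of type (3.11) as a pair (cheaper path, p), or None if p
    is already a shortest path for the current weights w."""
    o, d = p[0][0], p[-1][1]               # p is a list of arcs (i, j)
    cost_p = sum(w[a] for a in p)
    q, cost_q = shortest_path(adj, o, d, w)
    if cost_q < cost_p:                    # sum over q < sum over p: violated
        return (q, p)
    return None
```

In a cutting-plane loop, each call that returns a pair adds one inequality (3.11) to the QP; termination of that loop is a separate matter treated by the thesis itself.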
Implicit constraints
An implicit shortest path constraint restricts some attribute of a shortest path between an origin and a destination, without suggesting the path that has to be taken from the origin to the destination. We therefore say that the shortest path is implicitly determined by its origin, its destination, the oriented graph and the current value of the arc weights w. Typically, the restricted attribute is the cost of the shortest path, since our variables are the arc weights. We consider bound constraints on this shortest cost. We assume that n_I origin-destination pairs (o_j, d_j), j = 1, …, n_I, are concerned with this constraint type, and we formulate the corresponding implicit constraints as follows:

0 ≤ l_j ≤ Σ_{a ∈ p¹_j(w)} w_a ≤ u_j, j = 1, …, n_I,   (3.12)

where p¹_j(w) is a shortest path, with respect to the weights w, from o_j to d_j. The values of l_j and u_j are lower and upper bounds on the cost of the shortest path from o_j to d_j, respectively. For consistency, we impose that l_j ≤ u_j (j = 1, …, n_I), and we allow l_j to be chosen as zero and u_j to be infinite. One constraint (3.12) actually consists of two inequality constraints: one for the lower bound part and one for the upper bound part. Due to the meaning of the shortest path principle, lower and upper bounds on shortest path costs have very different interpretations. Let us consider the jth origin-destination pair. Since p¹_j(w) is a path of minimum cost between o_j and d_j, imposing that Σ_{a ∈ p¹_j(w)} w_a ≥ l_j means that all paths from o_j to d_j must have their cost above or equal to l_j. This can be expressed by the following:

l_j ≤ Σ_{a ∈ p′_j} w_a,   (3.13)
the path $p_j^0$ being any path from $o_j$ to $d_j$. These constraints are linear and affine(6), and their number grows exponentially with the density $m/n$ of the graph, as for the depiction of an explicit

(5) This procedure would not be polynomially bounded.
(6) That is, 0 does not belong to the manifold verifying (3.13) as an equality when $l_j \neq 0$.
quadratic programming
35
shortest path constraint. The feasible region delimited by lower bounds on shortest path costs is therefore convex. Again, much redundancy can be expected in the set of constraints (3.13). As a consequence, these constraints can be part of our inverse problem without affecting the global nature of a solution to this problem. The underlying interpretation of an upper bound on the shortest path cost is fundamentally different: the $j$th upper bound constraint, defined by
$$\sum_{a \in p_j^1(w)} w_a \le u_j, \qquad (3.14)$$
does not compel all paths from $o_j$ to $d_j$ to have a cost below $u_j$; it just imposes that there exists one path from $o_j$ to $d_j$ whose cost does not exceed the upper bound $u_j$. The a priori unknown path that must satisfy this condition has to be picked among the exponential number of paths starting from $o_j$ and arriving at $d_j$. A shortest path procedure will determine an appropriate path, completing the constraint definition, in order to check whether the constraint is violated or not. However, the path selected for evaluating the constraint violation may vary with the arc weights $w$: the path $p_j^1(w)$ remains explicitly unidentified. Consequently the constraint (3.14) cannot be expressed as one or more linear constraints, and hence cannot fit into the classical QP framework defined by (3.1)-(3.2). Moreover, the feasible region determined by constraints of the type (3.14) is nonconvex. Let us show this on a small example: consider the graph composed of 3 vertices and 3 arcs ($m = 3$) shown in Figure 3.1, and consider the constraint
[Figure 3.1: A small example proving the nonconvexity. Two paths join the origin $o$ to the destination $d$: one through an intermediate vertex, using arcs $a_1$ and $a_2$, and the direct arc $a_3$.]
$$\sum_{a \in p^1(w)} w_a \le 5, \qquad (3.15)$$
where $p^1(w)$ is the shortest path with respect to the weights $w$ from vertex $o$ to vertex $d$. It is easy to see that $w^1 = (2\ 2\ 10)^T$ and $w^2 = (10\ 10\ 4)^T$ are feasible weight vectors, while $\frac{1}{2}(w^1 + w^2) = (6\ 6\ 7)^T$ is infeasible. The feasible region is therefore nonconvex.
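This infeasibility is easy to verify numerically. The sketch below assumes the arc layout described for Figure 3.1 ($a_1: o \to v$, $a_2: v \to d$, $a_3: o \to d$) and recomputes the shortest path cost for $w^1$, $w^2$ and their midpoint:

```python
import heapq

def shortest_cost(arcs, w, src, dst, n=3):
    """Dijkstra shortest-path cost from src to dst."""
    adj = [[] for _ in range(n)]
    for i, (u, v) in enumerate(arcs):
        adj[u].append((v, w[i]))
    dist = [float("inf")] * n
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, c in adj[u]:
            if d + c < dist[v]:
                dist[v] = d + c
                heapq.heappush(heap, (d + c, v))
    return dist[dst]

# Arcs a1: o->v, a2: v->d, a3: o->d, with o=0, v=1, d=2.
arcs = [(0, 1), (1, 2), (0, 2)]
w1, w2 = [2.0, 2.0, 10.0], [10.0, 10.0, 4.0]
mid = [(a + b) / 2 for a, b in zip(w1, w2)]   # (6, 6, 7)
for w in (w1, w2, mid):
    print(shortest_cost(arcs, w, 0, 2) <= 5.0)
# prints True, True, False: the midpoint violates (3.15)
```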
The separability and sparsity
The objective function $f(w)$, as initially formulated in (1.9), has the property of being separable. A separable function can be expressed as a sum of single-variable functions. Here, $f(w)$ can be written as a sum of $m$ functions, each of which involves a single
variable:

$$f(w) = \sum_{i=1}^{m} f_i(w_i), \qquad (3.16)$$

where $f_i(w) = \frac{1}{2}(w_i - \bar{w}_i)^2$, $i = 1, \ldots, m$. This property can sometimes be used to speed up the solving procedure. One characteristic shared by all the constraints (1.4)-(1.6) is that they are sparse. Each constraint involves the weights of only very few arcs, since it translates either the nonnegativity of an arc weight or properties of a shortest path, which is not Eulerian(7) in large graphs.
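As a small illustration of separability (with hypothetical target weights $\bar{w}$ and $m = 4$), the objective indeed splits into a sum of single-variable terms:

```python
# Hypothetical target weights wbar and current weights w for m = 4 arcs.
wbar = [3.0, 1.0, 4.0, 1.5]
w = [2.5, 1.0, 5.0, 1.0]

def f(w):                       # the full objective 0.5 * ||w - wbar||^2
    return 0.5 * sum((wi - wbi) ** 2 for wi, wbi in zip(w, wbar))

def f_i(i, wi):                 # the i-th single-variable term
    return 0.5 * (wi - wbar[i]) ** 2

assert abs(f(w) - sum(f_i(i, wi) for i, wi in enumerate(w))) < 1e-12
print(f(w))                     # prints 0.75
```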
3.2.3 Searching for a strictly convex QP method
The nonconvexity induced by the presence of upper bound constraints (3.14) makes it impossible to guarantee that a solution to the inverse problem is global. The very special nature of the inverse shortest path problem with upper bounds on shortest path costs will be analysed in Chapter 6. In that chapter, local optimality will be defined more precisely and we will set up a specific method that produces a local solution to this problem. Deferring the resolution of our inverse problem with upper bound constraints to a later chapter, we now focus our attention on a method that is able to solve the "basic" inverse shortest path problem defined by (1.9), (1.4) and (1.5), that is, a strictly convex QP problem with a large number of sparse constraints involving much redundancy. Note that lower bounds on shortest path costs will be reintroduced in Chapter 5.
3.3 Note on the complexity of convex QP methods
In order to gain insight into how efficiently solutions to quadratic problems can be computed, one naturally invokes complexity theory. Vavasis [113] recently proposed a complexity analysis of convex and nonconvex QP in a very comprehensive way. Let us first define some terms commonly used in complexity theory.
3.3.1 Solving a problem

A problem is a function $F : C \to B$, where $C$ is the set of instances encoded as strings of characters. Quadratic programming is a "problem" in this sense. The set $C$ would consist of sixtuples $(m, h, G, a, N, b)$ encoded as strings of characters. In this case, $m$ and $h$ are integers, $G$ is an $m \times m$ matrix, $a$ is an $m$-vector, $N$ is an $m \times h$ matrix whose $i$th column is $n_i$, and $b$ is an $h$-vector. Each such sixtuple is an "instance" of quadratic programming. The value of $F(m, h, G, a, N, b)$ is, for instance, the minimum value of $a^T x + \frac{1}{2} x^T G x$ over all choices of $x$ satisfying $N^T x \ge b$. In this example, the set $B$ would be the real numbers. A decision problem is a problem whose output set $B$ consists of two elements, $\{yes, no\}$. For instance, consider the problem of determining whether a vector $x$ is a global solution of a QP problem or not.
(7) Using each arc of the graph once, and only once.
Computing the function $F$ requires a standard model of computation, given by the Turing machine. We do not want to go into details, which can be found in [50, 113]. We just specify that the action of a Turing machine is deterministic, that is, it cannot make choices. A Turing machine is said to compute the function $F : C \to B$ if, given an instance $x$ of $C$, it eventually yields $F(x)$ and halts.
3.3.2 Complexity classes
Complexity refers to the amount of resources required by a computation. A complexity class is the class of problems that satisfy a certain resource bound. Two well-known complexity classes are P and NP.
Definition 5. The class P of problems is defined to be those decision problems $F : C \to \{yes, no\}$ such that a Turing machine $M$ can compute $F$, and the number of steps required by $M$ is bounded by $p(\ell)$, where $p$ is a polynomial and $\ell$ is the length of the input.
Note that nontrivial reasoning is needed to allow Turing machines to handle real numbers, or more restrictively rational numbers, without losing polynomial bounds when applied to decision problems. Another difficulty comes from the fact that optimization problems are not generally stated as decision problems. The usual way to work around this difficulty is the following. Suppose that the optimization problem is that of minimizing $f(x)$ subject to $x \in D$; the associated decision problem can be: given $f$, $D$, and a rational(8) number $\lambda$, does there exist an $x$ in $D$ such that $f(x) \le \lambda$? Showing that an optimization problem lies in P is of great importance and is considered a very positive result, since it turns out that when a problem has a polynomial time algorithm, it generally has an implementation that is efficient in practice. However, there are no techniques known at present for proving that a problem is not in P. The best technique available is the theory of NP-completeness, which will be introduced in Chapter 6. For the moment, we just define the complexity class NP. The NP class contains the decision problems such that all instances yielding a yes can be verified in polynomial time. More formally, we need an alphabet, denoted by $\Sigma$, which refers to the finite set of characters that are necessary to encode all useful instances. Then $\Sigma^*$ denotes the set of all possible finite instances that can be expressed with the alphabet $\Sigma$.
Definition 6. A decision problem $F : C \to \{yes, no\}$ is said to lie in NP if there exist a polynomial $p$, a finite alphabet $\Sigma$, and a Turing machine $M$ (the certificate checker) running in polynomial time and computing a function $\chi : C \times \Sigma^* \to \{yes, no\}$ such that:

1. For every $x$ such that $F(x) = yes$, there exists an instance $\sigma \in \Sigma^*$ (the certificate for $x$) such that $\chi(x, \sigma) = yes$.
(8) See the remark just above.
2. For every $x$ such that $F(x) = no$ and for every instance $\sigma \in \Sigma^*$, $\chi(x, \sigma) = no$.
A precision is usually added indicating that, in Item 1, the length of $\sigma$ is bounded by the value of $p(\mathrm{length}(x))$. This definition means that every yes-instance has a certificate of polynomial length, and that there exists a Turing machine that can check the certificates in polynomial time. In the optimization framework, a problem is in NP if one can verify in polynomial time whether a given point is a solution of that problem or not. Many combinatorial problems are in NP, since such verification is usually simple for that kind of problem. As a direct consequence, remark that P $\subseteq$ NP.
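In the QP setting, a certificate for a yes-instance of the associated decision problem can simply be a feasible point achieving the threshold, and checking such a certificate is clearly polynomial. A sketch, with illustrative instance data and the convention $E_i(x) = n_i^T x - b_i \ge 0$:

```python
# A certificate checker for the decision version of an optimization problem:
# given the data of a convex QP (the names G, a, N, b mirror the text, but
# the instance itself is hypothetical), a threshold lam and a candidate x,
# verify in polynomial time that x is feasible with objective value <= lam.
def check_certificate(G, a, N, b, lam, x, tol=1e-9):
    m = len(x)
    fx = sum(a[i] * x[i] for i in range(m))
    fx += 0.5 * sum(x[i] * G[i][j] * x[j] for i in range(m) for j in range(m))
    feasible = all(
        sum(N[i][k] * x[i] for i in range(m)) - b[k] >= -tol
        for k in range(len(b))
    )
    return feasible and fx <= lam + tol

# Minimize 0.5*(x1^2 + x2^2) subject to x1 >= 1: optimum is (1, 0), value 0.5.
G = [[1.0, 0.0], [0.0, 1.0]]
a = [0.0, 0.0]
N = [[1.0], [0.0]]   # columns of N are the constraint normals n_i
b = [1.0]
print(check_certificate(G, a, N, b, 0.5, [1.0, 0.0]))  # prints True
```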
3.3.3 Convex quadratic programming
Vavasis showed in [113, Chapter 3] that convex QP problems are in P. The nonconvex case will be of interest in Chapter 6.
Theorem 3.4. The QP problem defined by (3.1)-(3.2) with a positive definite Hessian $G$ is solvable in polynomial time by a deterministic algorithm.
This theorem and the above comments support the view that efficient algorithms for solving convex QP problems do exist.
3.4 Resolution strategies
During the past three decades, several algorithms have been suggested to solve the QP problem (3.1)-(3.2). Classifying these methods is not simple, because methods sharing some characteristics often differ from another point of view. In quadratic and nonlinear constrained optimization, one usually distinguishes between primal and dual methods. The techniques employed for handling the constraints rather refer to simplex-type methods and active set methods. One can find both techniques in primal and dual approaches.
3.4.1 Primal and dual methods
The presentation of this section may give the impression of keeping primal and dual methods apart from each other. The two approaches are not so separate in practice, and methods mixing both exist; they are called primal-dual methods.
Primal methods
A primal method works on the original problem directly by searching through the feasible region for the optimal solution. Each point in the process is primal feasible, and the value of the objective function constantly decreases. Such methods benefit from the following advantages: most primal methods do not rely on special problem structure, such as convexity, and hence apply to a wide-ranging class of problems.
Since each generated point is feasible, if the process halts before reaching the solution, the final point is feasible and may represent an acceptable approximation to the solution of the original problem. These methods however present major drawbacks. They require an initial procedure to find a starting feasible point, and difficulties also come from the need to maintain feasibility throughout the process. As noticed by Luenberger in [81], some methods can fail to converge for problems with inequality constraints unless elaborate precautions are taken, but they generally have good convergence rates, particularly with linear constraints. A primal method solving our QP problem (3.1)-(3.2) yields a point $\bar{x}$ that satisfies the following conditions, called the Kuhn-Tucker conditions [73]: there exist real numbers $u_i \ge 0$, $i = 1, \ldots, h$, such that

$$\nabla f(\bar{x}) - \sum_{i=1}^{h} u_i \nabla E_i(\bar{x}) = 0 \qquad (3.17a)$$

and

$$u_i E_i(\bar{x}) = 0 \quad \text{for all } i = 1, \ldots, h. \qquad (3.17b)$$

The vector $u$ is called the vector of Kuhn and Tucker multipliers, or Lagrange multipliers. The conditions (3.17a)-(3.17b) geometrically mean that the gradient of $f$ at $\bar{x}$ lies in the normal cone defined at $\bar{x}$, that is, can be expressed as a linear combination of the inward normals to the binding (or active) constraints at $\bar{x}$. Note that in the case of nondegeneracy(9), the Kuhn and Tucker multipliers are unique. Just for reference, famous primal methods are the gradient projection methods and the reduced gradient method. Both basic methods can be viewed as the method of steepest descent applied on the manifold defined by the binding (or active) constraints. In linear programming, the simplex method is well known for travelling through the extremal points of the convex feasible region.

Dual methods
A dual method does not tackle the original constrained problem directly but instead considers an alternate problem, the dual problem, whose unknowns are the Lagrange multipliers of the first problem. The Lagrange multipliers, denoted for convenience by $u \in R^h$, in a sense measure the sensitivity of the constraints as they appear in the following function, the dual objective function, usually called the Lagrange function or Lagrangian:

$$L(x, u) \stackrel{def}{=} f(x) - \sum_{i=1}^{h} u_i E_i(x), \qquad (3.18)$$

where $L(x, u)$ is the Lagrange function of the QP problem defined by (3.1)-(3.2).

(9) When the $\nabla E_i(x)$, for $i$ such that $E_i(x) = 0$, are linearly independent.

For a problem with $m$ variables and $h$ constraints, dual methods thus work in the $h$-dimensional space of the
Lagrange multipliers u, and solve the dual problem which is formulated as follows. subject to where
u2Rh
max du
3:19 3:20
u 0;
du = xminm Lx; u: 3:21 2R For our convex QP problem, it can be shown that du is concave any local maximum is then global, since du = , 1 uT N T G,1 N u + uT b + N T G,1a , 1 aT G,1 a, where N is the m h 2 2 matrix whose ith column is ni . The nonnegativity of the Lagrange multipliers 3.20 is due
to the inequality constraints 3.2 of the original or primal problem; equality constraints would leave their Lagrange multiplier unconstrained in sign. Once these multipliers are known, one must determine the solution point in the space of primal variables x, such that Gx = Nu , a, in order to supply the desired solution of the QP problem. A method solving the above dual problem actually searches for a saddlepoint ; u of the Lagrange function Lx; u, that is, a x point such that L; u Lx; u for all x 2 Rm x 3:22a and L; u L; u for all u 0: x x 3:22b If such a point is found, then x is a global optimum of the primal problem. Dual methods o er the following attractive features: Lagrange multipliers have meaningful intuitive interpretations as prices associated with the constraints, in the context of practical applications. Dual methods do not require to start from an initial primal feasible point. Global convergence of dual methods is often guaranteed. The e ciency of dual methods however heavily relies on the convexity of the problem. Dual procedures also have the disadvantage of supplying a primal feasible solution only when they terminate.
Primal-dual relation
For primal and dual feasible solutions $x$ and $u$, we have that $d(u) \le f(x)$. Under differentiability conditions, optimal points of the primal and dual problems yield the same primal and dual objective function values:

$$f(\bar{x}) = d(\bar{u}). \qquad (3.23)$$

In our case, the objective function $f(x)$ and the constraints $E_i(x)$ are convex and differentiable. Then $\bar{x}$ is a global minimum if and only if the Kuhn and Tucker conditions are satisfied. This result is stated in the theorem that follows.
Theorem 3.5. Provided that there exists $x \in R^m$ such that $E_i(x) > 0$ for $i = 1, \ldots, h$ (the Slater condition(10)), the Kuhn and Tucker conditions are necessary and sufficient conditions for a global optimum $\bar{x}$ of the QP problem defined by (3.1) and (3.2), where $f(x)$ is convex: there exists $\bar{u} \ge 0$ verifying

$$\nabla_x L(\bar{x}, \bar{u}) = 0 \qquad (3.24a)$$

and

$$\bar{u}_i E_i(\bar{x}) = 0, \quad \text{for all } i = 1, \ldots, h. \qquad (3.24b)$$

The notation $\nabla_x$ indicates the partial derivative with respect to the variable $x$. A proof of this theorem can be found in [82].
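These conditions are easy to check numerically. A sketch for the special case $G = I$ (so that $\nabla f(x) = x + a$), on a hypothetical instance with the single constraint $E_1(x) = x_1 - 1 \ge 0$:

```python
# Checking the Kuhn-Tucker conditions (3.24a)-(3.24b) for a convex QP
# with G = I; the instance data below is illustrative.
a = [-2.0, 0.0]
N = [[1.0], [0.0]]           # E_1(x) = x1 - 1 >= 0
b = [1.0]

def kkt_holds(x, u, tol=1e-9):
    grad_f = [x[i] + a[i] for i in range(2)]
    # stationarity: grad f(x) = sum_i u_i n_i
    stat = all(abs(grad_f[i] - u[0] * N[i][0]) <= tol for i in range(2))
    E1 = N[0][0] * x[0] + N[1][0] * x[1] - b[0]
    comp = abs(u[0] * E1) <= tol          # complementarity u_i E_i(x) = 0
    return stat and comp and u[0] >= -tol and E1 >= -tol

print(kkt_holds([2.0, 0.0], [0.0]))   # unconstrained minimum, feasible: True
print(kkt_holds([1.0, 0.0], [0.0]))   # not stationary without a multiplier: False
```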
3.4.2 Simplex-type and active set methods
The methods used to solve QP problems can for the most part be categorized as either modified simplex-type methods or active set methods.
Simplex-type methods
Quadratic programming methods that wear the "simplex label" have one or more of the following properties: they use simplex tableaux; they perform Gauss-elimination-type pivots on basis matrices that are derived from the Kuhn-Tucker optimality conditions (3.24a)-(3.24b); finally, they may reduce to the simplex method for linear programming in the degenerate case $G = 0$. They roughly consist of a generalization of the simplex method to quadratic programming. These methods are inappropriate for our inverse problem, since the pivot operations are performed on matrices of row size $m + h$. The number of our constraints, $h = n_E$, being typically exponential, the simplex-type methods would require matrices too large to operate on. See [27] for further details concerning such methods.
Active set methods
These methods are based upon projections onto active sets of constraints and employ operators of size no larger than $m \times m$. They are consequently generally more efficient and require less storage than methods of the simplex type. In fact, as mentioned by Luenberger in [81, Chapter 14], a quadratic program with inequality constraints is almost always solved by an active set method. In particular, active set techniques handle the possible redundancy among the constraints much better. In an active set method, some inequality constraints, indexed by the active set denoted by $A$, are regarded as equalities while the remaining constraints are temporarily disregarded. The method updates this active set in order to identify the correct active constraints at the solution of the original problem (3.1)-(3.2). On iteration $k$, a point(11) $x^k$ is calculated which satisfies the
(10) This condition, along with the convexity of both the objective function and the constraints, implies that the Lagrange function has a saddle point.
(11) This point will be primal feasible if the active set strategy is used by a primal method.
active constraints as equalities, that is, $n_i^T x^k = b_i$ for $i \in A$. Moreover, apart from degenerate cases, $n_i^T x^k > b_i$ for $i \notin A$. Each iteration thus attempts to locate the solution $x^k$ of an equality constrained problem in which only the active constraints occur. To the solution $x^k$ corresponds a vector of Lagrange multipliers $u^k$ for the active constraints in $A$. The vectors $x^k$ and $u^k$ then verify the Kuhn and Tucker conditions, possibly without primal or dual feasibility, depending upon whether the active set technique is combined with a dual or primal method, respectively. Some constraints thus need to be added to and/or removed from the active set, and a new subproblem then comes under consideration. This is repeated until the Kuhn-Tucker conditions for the original problem are validated. Technical details about active sets can be found notably in [41, Chapter 10].
3.4.3 Choosing a particular method
Let us proceed to a short survey of quadratic programming methods in order to choose the one that is most appropriate for solving our inverse problem. The first known QP methods appeared in the late fifties and early sixties: Frank and Wolfe [46], Beale [4], Wolfe [115], Dantzig [27], and Van de Panne and Whinston [109] set up the first primal QP algorithms. Influenced by the efficiency of the simplex method in linear programming, these methods operate on simplex tableaux. They modify the quadratic problem using Kuhn and Tucker's developments so that the simplex method can be used. Except for Wolfe's method, these procedures were dedicated to strictly convex QP problems. Another characteristic of these methods is that they start from a primal feasible point which does not necessarily minimize the objective function. In Beale's method, inequality constraints are converted to equations, and an initial feasible solution is found for the constraint equations without considering the nature of the objective function. Further steps are attempts to improve the value of the objective function. At the same time, Theil and Van de Panne [108] proposed a first dual method for solving QP problems. They were soon followed by Lemke [75], who also proposed a dual algorithm. Note that Lemke [74] had earlier developed the dual-simplex method. Both QP methods assume the strict convexity of the objective function and follow identical steps in their progress. They were preferable to other methods if the solution lay on relatively few of the constraining hyperplanes. Theil, Van de Panne and Lemke used an active set strategy without naming it. They had the idea of starting from the unconstrained minimum, which is $-G^{-1}a$, where the gradient of $f$ vanishes. In contrast with the above primal methods, this initial point is usually primal infeasible, but it minimizes the objective function.
The dual procedure however needs the inverse of the Hessian $G$; but, as noticed in [76], the computations involved in finding a primal feasible solution are approximately equivalent to finding $G^{-1}$. Each step of the dual method then consists of solving subproblems where some inequality constraints are satisfied in equational form, as explained in the previous section. Theil and Van de Panne devised ingenious rules in order to limit the search to a small number of all possible solution combinations. At the beginning of this section, we mentioned the difficulty of classifying QP methods.
Goldfarb showed in [53] that Beale's method, which was developed as an extension of linear programming, can be viewed as an active set method. This is corroborated in [75], where Lemke finds his method very close to that of Beale. Fletcher [40] writes similar comments about Dantzig's method, and Van de Panne and Whinston [110] showed that Beale's method and Dantzig's method generate the same sequence of points if they both start from the same initial point. This brings forward the fact that QP methods apparently originate from the same basic idea, and that they differ in the point of view from which they have been developed. The relative merits of QP methods then show up through extensive computational experience. Indeed, remember that the storage and computational aspects invited us to prefer active set methods over simplex-type ones. For experimental results about the above algorithms, see those obtained by Fletcher [40] and Goldfarb [53]. Later, Gonçalves [58] proposed a primal-dual algorithm based on simplex techniques, and Stoer [103] developed a method for constrained least squares. Again, both approaches carry out Gauss elimination pivots on potentially large matrices. Gill and Murray [52] set up a primal method using an active set and QR factorizations of the matrix formed by the normals to the active constraints. Their method applies to indefinite QP problems and has the advantage of being numerically stable, but it needs an initial primal feasible point. The search for an initial feasible point can be avoided by an alternative approach proposed by Conn [22], minimizing a penalty function. This modification allows the iterates to be infeasible. According to Gill and Murray [52], results produced by Conn and Sinclair [23] do not allow firm conclusions. A more recent method for convex QP was suggested by Goldfarb and Idnani [55]. It can be viewed as a dual active set method.
They use the idea of Theil and Van de Panne of starting from the unconstrained minimizer of the quadratic function, and factorize the matrix of the normals to the active constraints by techniques similar to those employed by Gill and Murray. The factorization techniques bring numerical stability to the procedure. The Goldfarb and Idnani (GI) method thus gathers the advantages of the prior methods that are appropriate for solving large convex QP problems involving redundancy. Indeed, following Fletcher's appreciation [41], the method is most effective when there are only a few active constraints at the solution, and it is also able to take advantage of a good estimate of the active set $A$ at the solution. This latter advantage makes the GI method suitable for sequential quadratic programming methods for nonlinearly constrained optimization calculations. Powell [96] analyzed the Goldfarb and Idnani method in the special case where the Hessian $G$ is ill-conditioned due to a tiny eigenvalue. Powell's conclusions cast some doubt on the numerical stability of Goldfarb and Idnani's implementation, and Powell proposed in [97] a stable implementation of the GI method that circumvents these difficulties. Note that our Hessian $G = I$, the identity matrix, does not fall within the scope of Powell's improvements. The GI method suitably meets the requirements for solving our convex QP problem. The next section is devoted to its analysis.
3.5 The Goldfarb and Idnani method
The Goldfarb and Idnani (GI) method solves the problem of minimizing $f(x) = a^T x + \frac{1}{2} x^T G x$ over $x \in R^m$, subject to $E_i(x) \equiv n_i^T x - b_i \ge 0$ for $i = 1, \ldots, h$, where $G$ is positive definite. The matrix $G$ is $m \times m$, $a$ and the $n_i$ are in $R^m$, and $b$ is in $R^h$. This problem will be referred to as the convex quadratic program (CQP). Since the inverse shortest path algorithm presented in Chapter 4 will rely heavily on the GI method, we present here a detailed analysis whose aim is both to establish the primal and dual step formulations, and to prove the finite termination of the GI algorithm. This detailed presentation closely follows that used by Goldfarb and Idnani in their joint paper [55].
3.5.1 Basic principles and notations
The GI method uses an active set, which is a subset of the $h$ linear inequality constraints that are satisfied as equalities by the current estimate $x$ of the solution to CQP. The set $A$ will index the constraints of the active set. Since Goldfarb and Idnani use a dual approach, they must first provide an initial dual feasible point, that is, a primal optimal point for some subproblem of the original problem. By relaxing all constraints ($A = \emptyset$) and setting the Lagrange multipliers $u = 0$, the unconstrained minimum of $f(x)$, namely $-G^{-1}a$, is such a point. The dual method then iterates until primal feasibility (i.e. dual optimality) is achieved, while maintaining the primal optimality of intermediate subproblems (i.e. dual feasibility). Let us define a subproblem $P(K)$ as the QP problem of minimizing $f(x)$ subject only to the subset of constraints $E_i(x) \ge 0$, $i \in K$, where $K$ is a subset of $\{1, \ldots, h\}$. The unconstrained minimum is then the solution to $P(\emptyset)$. The solution to $P(K)$ lies on some "linearly independent active set of constraints" indexed by $A \subseteq K$; one says that this solution $x$ is part of an S-pair $(x, A)$. By linear independence of a set of constraints, we mean that the inward normals $n_i$ corresponding to these constraints are linearly independent. If the dual problem of CQP is set up, then the GI method is equivalent to a primal active set method applied to this dual problem. In [55], Goldfarb and Idnani discuss their dual method in terms of primal subproblems, finding that approach more instructive. We now propose to examine their approach. Some proofs left undetailed in [55] will be refined here.
3.5.2 The GI algorithm
Here is the basic approach followed by Goldfarb and Idnani.
Algorithm 3.1
Step a. Assume that some S-pair $(x, A)$ is given, typically $(-G^{-1}a, \emptyset)$.
Step b. Pick some $q$ such that constraint $q$ is infeasible in CQP.
Step c. If $P(A \cup \{q\})$ is infeasible, then exit: CQP is infeasible.
Step d. Else, determine a new S-pair $(\hat{x}, \bar{A} \cup \{q\})$ such that $\bar{A} \subseteq A$ and $f(\hat{x}) > f(x)$. Set $(x, A) \leftarrow (\hat{x}, \bar{A} \cup \{q\})$.
Step e. If all constraints are satisfied, then exit: $x$ is the optimal solution to CQP.
Else, go to Step b.

Note that, in Step b, the index $q$ belongs to $\{1, \ldots, h\} \setminus A$. We need some more notation to describe the algorithm more formally. Let $|A|$ be the number of constraints in $A$, and let $N$ be the $m \times |A|$ matrix whose columns are the normals $n_i$ of the constraints in the active set $A$. The algorithm will use two additional matrices when the columns of $N$ are linearly independent:
$$N^* \stackrel{def}{=} (N^T G^{-1} N)^{-1} N^T G^{-1}, \qquad (3.25)$$

which is the Moore-Penrose generalized inverse of $N$ in the space of variables under the transformation $y = G^{1/2} x$, and, if $I$ is the $m \times m$ identity matrix,

$$H \stackrel{def}{=} G^{-1} (I - N N^*), \qquad (3.26)$$

which is the reduced inverse Hessian of the quadratic objective function in the subspace of points satisfying the active constraints. Indeed, since $N N^*$ is the operator of the projection along the subspace of points satisfying the active constraints, $I - N N^*$ is a (generally nonorthogonal) projection onto the manifold verifying the active constraints. Note that $H$ is symmetric. The operators $N^*$ and $H$ satisfy the following properties.
Property 4. $Hw = 0 \Leftrightarrow w = N\lambda$, with $\lambda \in R^{|A|}$.

Proof. Since $G^{-1}$ is invertible, $Hw = 0 \Leftrightarrow (I - NN^*)w = 0 \Leftrightarrow w = N\lambda$ for some $\lambda \in R^{|A|}$, since $I - NN^*$ is a projection along the subspace spanned by $N$. $\Box$
Property 5. $HGH = H$.

Proof. This is easy to see: $HGH = G^{-1}(I - NN^*)^2 = G^{-1}(I - NN^*) = H$. $\Box$

As a consequence, $H$ is positive semidefinite, since for all $x \in R^m$ one can write

$$x^T H x = x^T HGH x = (Hx)^T G (Hx) \ge 0, \qquad (3.27)$$

by symmetry of $H$ and positive definiteness of $G$.

Property 6. $N^* G H = 0$.

Proof. The left-hand side equals $N^* - (N^* N) N^*$, where $N^* N$ equals the identity matrix. $\Box$

Let us denote the feasible region of $P(A)$ by $F(A)$, where
$$F(A) \stackrel{def}{=} \{x \in R^m \mid n_i^T x = b_i, \ i \in A\}, \qquad (3.28)$$

and the gradient of the objective $f$ at $x$ by $g(x) \equiv \nabla f(x) = Gx + a$. Then we can state the following theorem.
Theorem 3.6. Suppose that $x$ belongs to $F(A)$. Then the minimum

$$\min_{\hat{x} \in F(A)} f(\hat{x}) \qquad (3.29)$$

is attained at $\hat{x} = x - H g(x)$.

Proof. By Taylor's formula, $f(x)$ can be developed as

$$f(\hat{x}) + g(\hat{x})^T (x - \hat{x}) + \frac{1}{2} (x - \hat{x})^T G (x - \hat{x}). \qquad (3.30)$$

The solution $\hat{x}$ minimizing $f(x)$ over $F(A)$ is given by the condition that $(I - NN^*)\nabla f(\hat{x}) = 0$, i.e.

$$(I - NN^*)\left[g(x) + G(\hat{x} - x)\right] = 0. \qquad (3.31)$$

The final expression of $\hat{x}$ follows from (3.31) and (3.26), using the identities $GH = I - NN^*$ and $(I - NN^*)G = GHG$, together with the fact that $x, \hat{x} \in F(A)$. $\Box$

The expression of $\hat{x}$ in Theorem 3.6 is similar to the solution of Newton's equation, adapted to the case of a projection onto a subspace of $R^m$. Since $\bar{x}$ is the optimal solution of $P(A)$, it satisfies the Kuhn and Tucker condition (3.17a):
$$g(\bar{x}) = N \bar{u}, \qquad (3.32)$$

where $\bar{u} \ge 0$ is the vector of Lagrange multipliers associated with the active constraints at $\bar{x}$. From (3.25) we then have

$$\bar{u} = N^* g(\bar{x}) \ge 0 \qquad (3.33)$$

and

$$H g(\bar{x}) = 0, \qquad (3.34)$$

since (3.33) implies that $N\bar{u} = NN^* g(\bar{x})$, which is equivalent to $(I - NN^*) g(\bar{x}) = 0$ by (3.32). Conditions (3.33) and (3.34) reflect the dual feasibility and the primal optimality of $\bar{x}$ with respect to $P(A)$, respectively. These conditions are sufficient as well as necessary for $\bar{x}$ to be the optimal solution of $P(A)$. Let us now detail Algorithm 3.1 using the above precisions. The following algorithm makes use of another set of multipliers,

$$r = N^* n_q, \qquad (3.35)$$

which Goldfarb and Idnani call infeasibility multipliers.

Algorithm 3.2
step 0: Find the unconstrained minimum.
Set $x \leftarrow -G^{-1}a$, $f \leftarrow \frac{1}{2} a^T x$, $H \leftarrow G^{-1}$, $A \leftarrow \emptyset$ and $u^+ \leftarrow 0$.

step 1: Choose a violated constraint, if any.
Compute the constraint values $\{E_i(x)\}_{i=1}^{h}$. If all constraints are satisfied, the current $x$ is the desired solution. Otherwise, a violated constraint is chosen, that is, an index $q$ is selected in $\{1, \ldots, h\}$ such that $E_q(x) < 0$. Also set

$$u^+ \leftarrow \begin{cases} \begin{pmatrix} u \\ 0 \end{pmatrix} & \text{if } |A| > 0, \\ 0 & \text{if } |A| = 0. \end{cases} \qquad (3.36)$$
step 2: Compute the primal and dual step directions.
These directions are computed by the relations

$$s = H n_q \qquad (3.37)$$

and, if $|A| > 0$,

$$r = N^* n_q. \qquad (3.38)$$

step 3: Determine the maximum steplength preserving dual feasibility.
Define

$$S = \{ j \in \{1, \ldots, |A|\} \mid r_j > 0 \}. \qquad (3.39)$$

The maximal steplength that will preserve dual feasibility is then given by

$$t_f = \begin{cases} u_\ell^+ / r_\ell = \min_{j \in S} u_j^+ / r_j & \text{if } S \ne \emptyset, \\ +\infty & \text{otherwise.} \end{cases} \qquad (3.40)$$
step 4: Determine the steplength to satisfy the $q$th constraint.
This steplength is only defined when $s \ne 0$, and is then given by

$$t_c = -\frac{E_q(x)}{s^T n_q}. \qquad (3.41)$$
step 5: Take the step and update the active set.
If $t_f = \infty$ and $s = 0$, then the original CQP is infeasible and the algorithm stops with a suitable message. Otherwise, if $s = 0$, update the Lagrange multipliers by

$$u^+ \leftarrow u^+ + t_f \begin{pmatrix} -r \\ 1 \end{pmatrix} \qquad (3.42)$$

and drop the $\ell$th constraint, that is, $A \leftarrow A \setminus \{\ell\}$, where $\ell$ has been determined in (3.40). Then go back to step 2 after updating $H$ and $N^*$. If $s \ne 0$, $t_c$ is well defined, and one sets

$$t = \min(t_f, t_c), \qquad (3.43)$$

$$x \leftarrow x + t s, \qquad (3.44)$$
and
f f + t 1 t + u+ j+1sT nq jA 2
3:45 3:46
8
u+
If t = tc , then set u u+ , add constraint q , that is A A fq g, and go back to step 1 after updating H and N . If, on the other hand, t = tf , drop the `th constraint, that is A A n f`g and go back to step 2 after updating H and N . Note that u, the vector of Lagrange multipliers, has a dimension equal to the number of active constraints. We observe that the GI algorithm involves three types of possible iterations. 1. The rst is when the new violated constraint is linearly independent from those already in the active set, and all the active constraints remain active at the new solution of the QP subject to the augmented set of constraints. This occurs when t = tc . 2. The second is when the new violated constraint is linearly dependent on those already in the active set. This occurs when s = 0, or, equivalently, when Nr = nq . In order to preserve independence of the active set that is, linear independence of the columns of N , an old constraint the `th is dropped from the set before incorporating the new one. As a result, N is always of full column rank. 3. The third is when the solution of the QP subject to the augmented set of constraints is such that one of these constraints is not binding. This occurs when t = tf , in which case the `th constraint ceases to be binding. As one wishes to keep only binding constraints in the active set, this constraint is dropped. An e cient implementation of this algorithm does not need to explicitely compute and store the operators N and H that are used by the algorithm. One can store and update the matrices J = QT L,1 and R obtained from the Cholesky and QR factorizations G = LLT and L,1 N = Q R . We will bring precisions about these factorizations when, in Chapter 4, we specialize the 0 GI method for our inverse shortest path problem. Before examining the algorithm's properties, we need to introduce some more notations. 
The set A+ denotes the set A fq g where q 2 f1; : : :; hg n A is the index of the constraint that is to be added in the active set. Similarly, A, refers to a subset of A containing one fewer element than A. Accordingly, N + and N , are the matrices of inward normals corresponding to A+ and A, , respectively. The normal n+ indicates the normal vector nq added to N to give N + and n, is the column removed from N to give N , . In agreement, N + and H + denote the operators de ned in 3.25 and 3.26, respectively, with N + instead of N . Finally, the mvector ei is the ith column of the identity matrix I . The next two sections discuss the cases where the columns of N + are linearly independent or not.
: u+ + t
u+ + t
,r
1
!
if jAj 0; if jAj = 0:
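As a sanity check on the updates (3.44)–(3.46), one can verify them numerically on a tiny instance with G = I, a single active constraint and one entering constraint, where H = I − NN* and r = N*n_q have simple closed forms. The sketch below is purely illustrative; all numerical data and variable names are ours, not the thesis'.

```python
# Tiny numerical check of the updates (3.44)-(3.46), with G = I,
# one active constraint (normal n1) and one entering constraint (normal nq).
# All numbers are made up for illustration.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

n1 = [1.0, 0.0]                     # active constraint normal (N has one column)
nq = [1.0, 2.0]                     # entering constraint normal
r = dot(n1, nq) / dot(n1, n1)       # r = N* nq (3.56); a scalar here
s = [nq[i] - r * n1[i] for i in range(2)]   # s = H nq = nq - N r, since G = I

u1, uq = 2.0, 0.0                   # multipliers u+ at x; uq = 0 on entry
x = [0.0, 0.0]
g = [u1 * n1[i] + uq * nq[i] for i in range(2)]   # g(x) = N+ u+(x)
a = [g[i] - x[i] for i in range(2)]               # g(x) = a + x when G = I

def f(z):                            # f(z) = a^T z + (1/2) z^T z
    return dot(a, z) + 0.5 * dot(z, z)

t = 0.5                              # a steplength t = min{tf, tc}
xbar = [x[i] + t * s[i] for i in range(2)]        # (3.44)
# objective increase predicted by (3.45):
assert abs((f(xbar) - f(x)) - t * dot(s, nq) * (0.5 * t + uq)) < 1e-12
# multiplier update (3.46): u+ <- u+ + t(-r, 1)
u1_bar, uq_bar = u1 - t * r, uq + t
gbar = [a[i] + xbar[i] for i in range(2)]
# g(xbar) is again spanned by the active normals with the updated multipliers
assert all(abs(gbar[i] - u1_bar * n1[i] - uq_bar * nq[i]) < 1e-12
           for i in range(2))
```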
3.5.3 Linear independence of the constraints
According to the mechanism of Algorithm 3.2, when a new S-pair (x̄, A ∪ {q}), say, has to be determined during Steps 2–5, given the S-pair (x, A), we already know that

\[
E_q(x) < 0, \tag{3.47}
\]

and that

\[
E_i(x) = 0 \quad \text{for all } i \in A. \tag{3.48}
\]

Since x is the optimal solution of P(A), we have that H g(x) = 0 and u(x) = N* g(x) ≥ 0, according to (3.34) and (3.33). By Property 4, H g(x) = 0 is equivalent to the fact that g(x) can be written as a linear combination of the columns of N. Consequently, g(x) is a linear combination of the columns of N⁺ too, that is, g(x) = N⁺ u⁺(x). Now, if n_q is linearly independent of the n_i (i ∈ A), one can use the operators H⁺ and (N⁺)*, and the last result is equivalent to

\[
H^+ g(x) = 0, \tag{3.49}
\]

again by Property 4. Moreover, u⁺(x) is the vector of Lagrange multipliers associated with A⁺ at x, with u⁺_{|A|+1}(x) = 0. We can then assert that

\[
u^+(x) = (N^+)^* g(x) \geq 0. \tag{3.50}
\]

Let us gather these properties into a definition.

Definition 7 A triple (x, A, q), where q ∈ {1, …, h} \ A, is said to be a V-triple (violated triple) if both n_q is linearly independent of the columns of N and the equations (3.47)–(3.50) hold with A⁺ = A ∪ {q}.
The following lemma shows how to find a point x̄ minimizing f on F(A⁺) from the point x of a V-triple (x, A, q).
Lemma 3.7 Consider (x, A, q), a V-triple, and points of the form

\[
\bar{x} = x + t\,s, \tag{3.51}
\]

where the primal step direction s is defined by

\[
s = H n^+. \tag{3.52}
\]

Then we have the following:

\[
H^+ g(\bar{x}) = 0, \tag{3.53}
\]

\[
E_i(\bar{x}) = 0 \quad \text{for all } i \in A, \tag{3.54}
\]

\[
u^+(\bar{x}) = (N^+)^* g(\bar{x}) = u^+(x) + t \begin{pmatrix} -r \\ 1 \end{pmatrix}, \tag{3.55}
\]

where

\[
r = N^* n_q. \tag{3.56}
\]

Moreover,

\[
E_q(\bar{x}) = E_q(x) + t\, s^T n^+. \tag{3.57}
\]
Proof. We first establish some intermediate results before proving the above equations. From (3.51),

\[
g(\bar{x}) = G(x + t\,s) + a = g(x) + t\, G s, \tag{3.58}
\]

and, from (3.52) and (3.56), we can write Gs as

\[
G s = (I - N N^*)\, n^+ = n^+ - N r = N^+ \begin{pmatrix} -r \\ 1 \end{pmatrix}. \tag{3.59}
\]

Furthermore,

\[
H^+ N^+ = 0, \tag{3.60}
\]

because I − N⁺(N⁺)* is a projection along the subspace spanned by the columns of N⁺;

\[
(N^+)^* N^+ = I, \tag{3.61}
\]

since n_q is linearly independent of the n_i indexed by A; and, finally,

\[
n_i^T H n^+ = (H n_i)^T n^+ = 0 \quad \text{for all } i \in A, \tag{3.62}
\]

because (I − N N*) n_i = 0 for i ∈ A. In order to show (3.53), we use successively (3.51), (3.58)–(3.60) and (3.49):

\[
H^+ g(\bar{x}) = \underbrace{H^+ g(x)}_{0} + t\, H^+ G s = t \underbrace{H^+ N^+}_{0} \begin{pmatrix} -r \\ 1 \end{pmatrix} = 0. \tag{3.63}
\]

Now, let us prove that x̄ belongs to F(A):

\[
E_i(\bar{x}) = n_i^T (x + t\,s) - b_i = \underbrace{E_i(x)}_{0 \text{ by } (3.48)} + t \underbrace{n_i^T s}_{0 \text{ by } (3.62)} = 0 \quad \text{for all } i \in A. \tag{3.64}
\]

The modification of the Lagrange multipliers is as follows:

\[
u^+(\bar{x}) = (N^+)^* \big( g(x) + t\, G s \big) = (N^+)^* g(x) + t\, (N^+)^* N^+ \begin{pmatrix} -r \\ 1 \end{pmatrix} = u^+(x) + t \begin{pmatrix} -r \\ 1 \end{pmatrix}, \tag{3.65}
\]

by (3.58), (3.59), (3.50) and (3.61). Finally, the expression (3.57) for E_q(x̄) is established by a development similar to that in (3.64). □

By Lemma 3.7 we can determine the point x̄_c = x + t_c s that minimizes f over F(A⁺): it is the point such that E_q(x̄_c) = 0, which implies that t_c = −E_q(x)/sᵀn_q if sᵀn_q ≠ 0. Moreover, (x̄_c, A⁺) will be an S-pair if u⁺(x̄_c) ≥ 0. If not, then (3.55) allows a smaller value t_f of the steplength t, such that t_f < t_c and beyond which some component u⁺_i(x̄) becomes negative, where x̄_f = x + t_f s. The constraint, say ℓ ∈ A, corresponding to that i-th component is dropped from the active set, and (x̄_f, A⁻, q) satisfies the conditions to be a V-triple, where A⁻ = A \ {ℓ}. This is formally stated by the following theorem.
Theorem 3.8 Let (x, A, q) be a V-triple and let x̄ be defined as in Lemma 3.7 with

\[
t = \min\{t_c, t_f\}, \tag{3.66}
\]

where

\[
t_c = -\frac{E_q(x)}{s^T n^+} \tag{3.67}
\]

and

\[
t_f = \begin{cases} \displaystyle \min_{j \in S} \frac{u^+_j(x)}{r_j} = \frac{u^+_\ell(x)}{r_\ell} & \text{if } S \neq \emptyset, \\[2ex] +\infty & \text{otherwise}, \end{cases} \tag{3.68}
\]

where S = {j ∈ {1, …, |A|} | r_j > 0}. The multipliers u⁺(x) and r are given by (3.55) and (3.56), respectively. Then we have that

\[
E_q(\bar{x}) \geq E_q(x), \tag{3.69}
\]

and we observe the following increase of the objective function:

\[
f(\bar{x}) - f(x) = t\, s^T n^+ \left( \tfrac{1}{2}\,t + u^+_{|A|+1}(x) \right) \geq 0. \tag{3.70}
\]

If t = t_c, then (x̄, A ∪ {q}) is an S-pair, and if t = t_f, then (x̄, A \ {ℓ}, q) is a V-triple.
In the definition (3.68) of t_f, we abuse the notation of the index ℓ, which is actually the index of the constraint as defined in CQP (1 ≤ ℓ ≤ h), and not its index j(ℓ) in the vector u⁺(x) (where 1 ≤ j(ℓ) ≤ |A|). For the sake of simplicity, let ℓ refer to the dropped constraint in either case. Let us now prove the above theorem.

Proof. Let us first note that s = G⁻¹(I − N N*) n⁺ is nonzero, since the linear independence of n⁺ from the columns of N ensures that n⁺ does not belong to the null space of I − N N*, and G⁻¹ is positive definite. Then, since (x, A, q) is a V-triple, we can write

\[
s^T n^+ = (n^+)^T H n^+ = (n^+)^T H G H n^+ = s^T G s > 0 \tag{3.71}
\]

(the second equality uses Property 5 and the third uses (3.52)), because G is positive definite and s ≠ 0. As a consequence, t ≥ 0 and, by (3.57), E_q(x̄) = E_q(x) + t sᵀn⁺ ≥ E_q(x). Remark that when t = t_c, E_q(x̄) = 0 > E_q(x), since t_c > 0 from (3.47) and (3.67). On the other hand, using Taylor's formula on f with x̄ − x = t s, one has that

\[
f(\bar{x}) - f(x) = t\, s^T g(x) + \tfrac{1}{2}\, t^2 s^T G s. \tag{3.72}
\]

By Property 4, H⁺g(x) = 0 implies that g(x) = N⁺u⁺(x). It then follows that H g(x) = H N⁺ u⁺(x) = H n⁺ u⁺_{|A|+1}(x), since H projects along the manifold spanned by the columns of N (the nonzero contribution remains that of the (|A|+1)-th column of N⁺, which is n⁺). Consequently,

\[
s^T g(x) = (n^+)^T H g(x) = (n^+)^T H n^+\, u^+_{|A|+1}(x) \geq 0 \tag{3.73}
\]
by (3.71) and (3.55). Substituting (3.71) and (3.73) into (3.72) gives (3.70). Moreover, as long as t > 0, f(x̄) > f(x).

Lemma 3.7 and the definition (3.66)–(3.68) of t ensure that x̄ is primal optimal for P(A⁺) (H⁺g(x̄) = 0), primal feasible for P(A) (E_i(x̄) = 0 for i ∈ A), and that u⁺(x̄) is dual feasible (u⁺(x̄) ≥ 0). If t = t_c, then E_q(x̄) = 0, x̄ is primal feasible for P(A⁺) and (x̄, A ∪ {q}) is an S-pair. We have then performed a full step in the primal space. If t = t_f < t_c, then E_q(x̄) < 0 and u⁺_ℓ(x̄) = 0. Since H⁺g(x̄) = 0, the latter equation implies that

\[
g(\bar{x}) = N^+ u^+(\bar{x}) = \sum_{i \in A \cup \{q\} \setminus \{\ell\}} u^+_{j(i)}(\bar{x})\; n_i, \tag{3.74}
\]

where j(i) is the index of i in A⁺. Consequently, (x̄, A \ {ℓ}, q) is a V-triple, since the set of normals {n_i | i ∈ A ∪ {q} \ {ℓ}} is of course linearly independent. We have then performed a partial step in the primal space. □

The above theorem allows us to obtain an S-pair (x̄, Ā ∪ {q}) from a V-triple (x, A, q), with Ā ⊆ A, such that f(x̄) ≥ f(x). This is achieved after |A| − |Ā| partial steps (this number is at most |A|) or one full step.
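The choice between a full and a partial step made in Theorem 3.8 can be sketched as follows; this is an illustrative fragment under our own naming, not the thesis' implementation.

```python
import math

def steplengths(E_q, s_dot_nplus, u, r):
    """Full steplength t_c (3.67) and dual-feasibility bound t_f (3.68);
    l is the active-set index of the blocking constraint (or None)."""
    t_c = -E_q / s_dot_nplus               # well defined: s^T n+ > 0 by (3.71)
    S = [j for j in range(len(r)) if r[j] > 0]
    if S:
        l = min(S, key=lambda j: u[j] / r[j])
        t_f = u[l] / r[l]
    else:
        l, t_f = None, math.inf
    return t_c, t_f, l

# made-up data: E_q(x) = -4, s^T n+ = 2, two active constraints
t_c, t_f, l = steplengths(-4.0, 2.0, u=[0.5, 3.0], r=[0.5, -1.0])
t = min(t_c, t_f)                          # (3.66)
# here t_f = 1 < t_c = 2: a partial step, dropping active constraint 0
assert (t_c, t_f, l, t) == (2.0, 1.0, 0, 1.0)
```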
3.5.4 Linear dependence of the constraints

We now handle the case where the normal n⁺ is linearly dependent on the columns of the matrix N. This does not allow (x, A, q) to be a V-triple. Two situations may then occur: either the subproblem P(A ∪ {q}) is infeasible, or a constraint can be removed from the active set A so that (x, A⁻, q) is a V-triple. The first situation implies that the original problem CQP is also infeasible. The second situation involves a constraint drop which is similar to the partial step described in Theorem 3.8.

If n⁺ is a linear combination of the columns of N, then the primal step direction s defined by (3.52) becomes

\[
s = H n^+ = 0, \tag{3.75}
\]

since H projects along the subspace spanned by the columns of N. As a consequence, the steplength t_c computed to satisfy the q-th constraint in the primal space is infinite. The following theorem gives the procedure for such a case.
Theorem 3.9 Assume that (x, A) is an S-pair and that q is the index of a constraint in the set {1, …, h} \ A such that

\[
n^+ = n_q = N r \tag{3.76}
\]

and

\[
E_q(x) < 0. \tag{3.77}
\]

If r ≤ 0, then P(A ∪ {q}) is infeasible; otherwise, constraint ℓ can be dropped from the active set, where ℓ verifies

\[
\frac{u_\ell(x)}{r_\ell} = \min_{r_i > 0,\; 1 \leq i \leq |A|} \frac{u_i(x)}{r_i}, \tag{3.78}
\]

to give A⁻ = A \ {ℓ} and the V-triple (x, A⁻, q).
Again, in (3.78), we abuse the notation ℓ as we did in Theorem 3.8.

Proof. Suppose that there exists a feasible solution x̄ = x + s̄ to the problem P(A ∪ {q}). On the one hand, x̄ must satisfy E_q(x̄) = 0 to be a solution of this problem. Consequently, since (n⁺)ᵀ x̄ = b_q and (n⁺)ᵀ x < b_q by (3.77), one can write

\[
(n^+)^T \bar{s} = r^T N^T \bar{s} > 0, \tag{3.79}
\]

using (3.76). On the other hand, x̄ must be feasible, that is, E_i(x̄) ≥ 0 for all i ∈ A ∪ {q}. Since (x, A) is an S-pair, this is verified for i ∈ A, that is, E_i(x̄) = E_i(x) + n_iᵀ s̄ ≥ 0 (with E_i(x) = 0), if and only if

\[
N^T \bar{s} \geq 0. \tag{3.80}
\]

For i = q, (n⁺)ᵀ s̄ must exceed −E_q(x), which is strictly positive. One directly sees that, if r ≤ 0, the requirements (3.79) and (3.80) cannot both be satisfied; hence, in this case, problem P(A ∪ {q}) is infeasible.

If some component of r is positive, it follows from (3.78) that r_{j(ℓ)} > 0 and, from (3.76), that

\[
n_\ell = \frac{1}{r_{j(\ell)}} \left[ n^+ - \sum_{i \in A^-} r_{j(i)}\, n_i \right]. \tag{3.81}
\]

Since (x, A) is an S-pair, we have that

\[
g(x) = N u(x) = \sum_{i \in A} u_{j(i)}(x)\, n_i = \sum_{i \in A^-} u_{j(i)}(x)\, n_i + u_{j(\ell)}(x)\, n_\ell
= \sum_{i \in A^-} \left( u_{j(i)}(x) - \frac{u_{j(\ell)}(x)}{r_{j(\ell)}}\, r_{j(i)} \right) n_i + \frac{u_{j(\ell)}(x)}{r_{j(\ell)}}\, n^+, \tag{3.82}
\]

by (3.81). Note that, in (3.81)–(3.82), we distinguish between the ℓ-th constraint and its index j(ℓ) in the active set. Now, if we define Â = A⁻ ∪ {q}, then N̂ has full rank; that is,
\[
\sum_{i \in \hat{A}} \lambda_i\, n_i = 0 \tag{3.83}
\]

implies λ_i = 0 for i ∈ Â. Indeed, suppose that (3.83) holds. Then, by (3.76), we can write

\[
\sum_{i \in A^-} \lambda_i\, n_i + \lambda_q \sum_{i \in A} r_{j(i)}\, n_i = 0, \tag{3.84}
\]

that is,

\[
\sum_{i \in A^-} \left( \lambda_i + \lambda_q r_{j(i)} \right) n_i + \lambda_q r_{j(\ell)}\, n_\ell = 0. \tag{3.85}
\]

Since {n_i | i ∈ A} is linearly independent, we deduce from (3.85) that λ_i + λ_q r_{j(i)} = 0 for all i ∈ A⁻, as well as λ_q r_{j(ℓ)} = 0. We know that r_{j(ℓ)} > 0. Thus λ_q = 0, and hence λ_i = 0 for i ∈ A⁻. The matrix N̂ then has full rank.

It then follows from (3.82) and Property 4 that Ĥ g(x) = 0, and that the components of û(x) = N̂* g(x) are

\[
\hat{u}_{j(i)}(x) = u_{j(i)}(x) - \frac{u_{j(\ell)}(x)}{r_{j(\ell)}}\, r_{j(i)} \geq 0 \;\; (i \in A^-), \qquad \hat{u}_{j(q)}(x) = \frac{u_{j(\ell)}(x)}{r_{j(\ell)}} \geq 0, \tag{3.86}
\]

since N̂* n_i = e_i for i ∈ A⁻. This establishes that (x, A⁻, q) is a V-triple. We have then performed a dual step. □

Note that the change that occurs to the active set A and to the dual variables in the partial step described in Theorem 3.8 is the same as that performed in the dual step described just above. The only difference is that x is not changed in the dual step (there is no step in the primal space), while this primal modification generally occurs in a partial step. A dual step emphasizes that, when degeneracy occurs, it is possible to take nontrivial steps in the space of dual variables without changing x and f(x).
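The dual step of Theorem 3.9, either infeasibility detection when r ≤ 0 or the minimum-ratio drop (3.78), can be sketched as follows (an illustrative fragment with our own naming):

```python
def dual_step(r, u):
    """Linearly dependent case of Theorem 3.9 (s = H n+ = 0): either
    declare P(A + {q}) infeasible (r <= 0) or select the constraint l
    to drop by the minimum-ratio rule (3.78)."""
    candidates = [j for j in range(len(r)) if r[j] > 0]
    if not candidates:
        return 'infeasible', None
    l = min(candidates, key=lambda j: u[j] / r[j])
    return 'drop', l

assert dual_step([-1.0, 0.0], [2.0, 1.0]) == ('infeasible', None)
assert dual_step([0.5, 2.0], [2.0, 1.0]) == ('drop', 1)   # ratios: 4.0 vs 0.5
```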
3.5.5 Finite termination of the GI algorithm
The termination of the GI algorithm is stated in the following theorem.
Theorem 3.10 The dual Algorithm 3.2 solves CQP, or indicates that it has no feasible solution, in a finite number of steps.
Proof.
Each time Step 1 of Algorithm 3.2 is executed, the current point x solves the subproblem P(A) and (x, A) is an S-pair. If x satisfies all the constraints in CQP, then it is an optimal solution of CQP. Otherwise, a new S-pair (x̄, Ā) is obtained after one full step via at most |A| ≤ min{h, m} partial and/or dual steps, or infeasibility is detected, according to Theorems 3.8 and 3.9. If the problem is not infeasible, the algorithm then returns to Step 1 and we have that f(x̄) > f(x). This shows that an S-pair can never reoccur. Since the number of possible S-pairs is finite, Algorithm 3.2 terminates in a finite number of iterations. □

The numerical results presented by Goldfarb and Idnani in [55] and by Powell in [96, 97] encourage the use of the GI method over the methods presented in the previous sections. The GI implementation always chooses the most violated constraint to add to the active set. In [55], one can find an example showing different "solution paths" according to diverse constraint selection strategies. The example shows that the "most violated" heuristic performs well in practice. Note that primal methods cannot choose which constraint to add to the active set unless a small infeasibility is tolerated. The GI dual method proved superior to primal algorithms because it tends not to add many constraints to the active set that are not in the final active set; the number of "drops" is relatively small. Compared to other dual methods, the GI implementation is far more efficient and also more numerically stable.
4 Solving the inverse shortest path problem
In this chapter, the basic inverse shortest path problem is considered, where the constraints are given as a set of shortest paths and nonnegativity constraints on the weights. We introduce the concept of "island" in order to characterize the violation of shortest path constraints. The violation of an explicit shortest path constraint creates one or more islands: an island is made of two "shores"; the first shore is a portion of the computed shortest path, and the second shore is the succession of arcs that the path should, but does not, follow; both shores have common termination vertices. Following the framework of Goldfarb and Idnani's method, we establish specialized formulations of the primal and dual step directions, of the update of the arc weights, and of the maximum steplength preserving dual feasibility. These new formulations are "island-oriented". We also provide a way to check whether the primal step direction s is zero without computing s explicitly. A computational algorithm is then proposed. Our method is tested on practical large-scale problems with large numbers of constraints. These tests confirm the efficiency of our method, since few constraints, or islands, are added to the active set that are not active at the solution. The content of this chapter has been published in [15].
4.1 The problem
The inverse shortest path problem was motivated in Chapter 1 with examples drawn from both traffic modelling and seismic tomography. We therefore directly recall the formal basic problem together with its special nature, and find it convenient to restate the notation that will be used throughout this chapter.

A weighted oriented graph is a triple (V, A, w), where (V, A) is an oriented graph with n vertices and m arcs, and where w is a set of nonnegative weights {w_i}_{i=1}^m associated with the arcs. We denote the vertices of V by {v_k}_{k=1}^n and the arcs of A by {a_j = (v_{s_j}, v_{t_j})}_{j=1}^m, with s_j being the index of the vertex at the origin of the j-th arc and t_j the index of the vertex at its end. We assume that such a weighted oriented graph (V, A, w) is given, together with a set of acyclic paths

\[
p_j = (a_{j_1}, a_{j_2}, \ldots, a_{j_{l_j}}) \qquad (j = 1, \ldots, n_E), \tag{4.1}
\]

where l_j is the number of arcs in the j-th path (its length), and where

\[
t_{j_i} = s_{j_{i+1}} \quad \text{for } i = 1, \ldots, l_j - 1. \tag{4.2}
\]
If we define w̄ as the vector in the nonnegative orthant of Rᵐ whose components are the given initial arc weights, the problem is then to determine w, a new vector of arc weights, and hence a new weighted graph G = (V, A, w), such that

\[
\min_{w \in \mathbb{R}^m} \| w - \bar{w} \| \tag{4.3}
\]

is achieved under the constraints that

\[
w_i \geq 0 \quad (i = 1, \ldots, m) \tag{4.4}
\]

and that the paths {p_j}_{j=1}^{n_E} are shortest paths in G. Remember that we decided to restrict ourselves to the ℓ₂ norm in order to benefit from the quadratic programming framework. As a consequence, our inverse shortest path problem becomes

\[
\min_{w_i} \; \frac{1}{2} \sum_{i=1}^{m} (w_i - \bar{w}_i)^2 \tag{4.5}
\]

subject to (4.4) and the n_E shortest path constraints. In Chapter 3, we established that these last constraints may be expressed as a (possibly large) set of linear constraints of the type

\[
\sum_{k \mid a_k \in p'_j} w_k \;\geq\; \sum_{k \mid a_k \in p_j} w_k, \qquad j = 1, \ldots, n_E, \tag{4.6}
\]

where p'_j is any path with the same origin and destination as p_j. As a consequence, the set of feasible weights, F say, is convex, as it is the intersection of a collection of half-spaces.

The problem of minimizing (4.5) subject to (4.4) and (4.6) is then a classical quadratic programming (QP) problem. This QP is however quite special, because its constraint set is potentially very large¹, very structured, and possibly involves a nonnegligible amount of redundancy. Also, the problem of minimizing (4.5) on the set F of feasible weights may be considered as the computation of a projection of the unconstrained minimum onto the convex set F. Again, the special structure of F distinguishes this problem from a more general projection.
4.2 Algorithm design

4.2.1 The Goldfarb–Idnani method for convex quadratic programming

The algorithm we present below is a specialization of the dual QP method of Goldfarb and Idnani [55]. Let us recall the idea of this method, presented in Chapter 3, which is to compute a sequence of optimal solutions to quadratic programming problems involving only some of the constraints that are present in the original problem, that is, a sequence of dual feasible points.
¹ In general, the number of constraints can be exponential.
An active set of constraints is maintained by the procedure, that is, a set of constraints which are binding at the current stage of the calculation. A new violated constraint is incorporated into this set at every iteration of the procedure (some other constraint may be dropped from it), and the objective function value monotonically increases until it reaches the desired optimum.

This approach was chosen for two main reasons. First, since the Goldfarb–Idnani (GI) algorithm is a dual method, it is extremely easy to incorporate new constraints once a first solution has been computed. In our context, this means that, if a new set of prescribed shortest paths is given, modest computational effort will be required to update the solution of the problem. Second, the GI method has an excellent reputation for efficiency, especially in the case where the number of constraints is large and near-degeneracy very likely. In particular, the method avoids slow progress along very close extremal points of the constraint set F. Also, the GI method and its efficient implementation are discussed in the literature, by Goldfarb and Idnani in their original paper, but also by Powell in [96] and [97], for example.

Because our method heavily relies on the GI algorithm, we now state this method in its full generality. In this form, it is designed for solving the QP problem given by

\[
\begin{array}{ll}
\displaystyle \min_{x} & f(x) = a^T x + \tfrac{1}{2}\, x^T G x, \\[1ex]
\text{subject to} & E_i(x) \stackrel{\text{def}}{=} n_i^T x - b_i \geq 0 \quad (i = 1, \ldots, h), \\
\end{array} \tag{4.7}
\]

where x, a and {n_i}_{i=1}^h belong to Rᵐ, G is an m × m symmetric positive definite matrix, b is in Rʰ, and the superscript T denotes the transpose. As indicated above, the GI algorithm maintains a set of currently active constraints, A say, and relies on the matrix N whose columns are the normals n_i of the constraints in the active set A. The matrix N is thus of dimension m × |A|, where |A| is the number of constraints in A. The algorithm also uses two additional matrices, namely

\[
N^* \stackrel{\text{def}}{=} (N^T G^{-1} N)^{-1} N^T G^{-1}, \tag{4.8}
\]

which is the Moore–Penrose generalized inverse of N in the space of variables under the transformation y = G^{1/2} x, and

\[
H \stackrel{\text{def}}{=} G^{-1} (I - N N^*), \tag{4.9}
\]

which is the reduced inverse Hessian of the quadratic objective function in the subspace of points satisfying the active constraints. We do not restate here the GI algorithm, which is given in detail in Chapter 3 (Algorithm 3.2). We also refer the reader to Chapter 3 and [55] for further details on the general GI algorithm, and in particular for the proof that it indeed solves the QP (4.7), provided a solution exists. Our purpose, in the next paragraphs, is to specialize the GI algorithm to the inverse shortest path problem given by (4.5), (4.4) and (4.6). We will therefore examine the successive stages of the algorithm presented above, where the structure of the problem allows some refinement.
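To make the operators (4.8)–(4.9) concrete, the following minimal check works out the case G = I with a single active normal n, where N* reduces to nᵀ/(nᵀn) and H to I − nnᵀ/(nᵀn); this is a sketch under these simplifying assumptions, not the thesis' implementation.

```python
# With G = I and a single active normal n, (4.8) reduces to
# N* = n^T / (n^T n) and (4.9) to H = I - n n^T / (n^T n).
n = [3.0, 4.0]
nn = sum(c * c for c in n)                        # n^T n = 25
H = [[(1.0 if i == j else 0.0) - n[i] * n[j] / nn for j in range(2)]
     for i in range(2)]

def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

# H annihilates the active normal: H n = 0 ...
assert all(abs(c) < 1e-12 for c in apply(H, n))
# ... N* N = 1 (generalized-inverse property) ...
assert abs(sum(n[j] * n[j] for j in range(2)) / nn - 1.0) < 1e-12
# ... and, for G = I, H is the orthogonal projector onto the directions
# keeping the active constraint binding, so H (H v) = H v for any v:
v = [1.0, -2.0]
Hv = apply(H, v)
assert all(abs(a - b) < 1e-12 for a, b in zip(apply(H, Hv), Hv))
```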
4.2.2 Constraints in the active set

We first wish to analyze how to detect the violation of constraints (4.6), as required in Step 1.

Shortest path constraints

For each of the given paths p_j, we first define P_j as the set of vertices in V that are attained by this path, that is,

\[
P_j \stackrel{\text{def}}{=} \{ s_{a_{j,1}},\, t_{a_{j,1}},\, t_{a_{j,2}},\, \ldots,\, t_{a_{j,l_j}} \}. \tag{4.10}
\]

The vertex s_{a_{j,1}} is called the origin or source of the j-th path, and is denoted s_j. For every such path p_j with source s_j and for a given vector w of arc weights, it is then possible to compute all the shortest paths in (V, A, w) from the source s_j to all the other vertices of P_j. We will then detect a violated constraint if, for some vertex v ∈ P_j \ {s_j}, the predecessor of v on the shortest path from s_j to v is different from the predecessor of v in the path p_j. In this situation, it is easy to verify that there must be a vertex x ∈ P_j closest to v (possibly s_j) such that x is also on the shortest path from s_j to v. Furthermore, there exist two distinguished paths from x to v: the first one, denoted I⁺, is the shortest path, and the second one, denoted I⁻, is given as a subpath of p_j. The pair of these paths is called a violating island and is denoted by I. The path I⁺ is called its positive shore, while I⁻ is called its negative shore. Furthermore, the excess of the island, denoted by E, is defined as the cost of the positive shore minus the cost of the negative shore. The constraint associated with the island I is therefore violated when its excess is negative.
[Figure 4.1 here: a small graph with vertices v_1, …, v_8 and arcs a_1, …, a_13.]

Figure 4.1: A first example

On the small example given in Figure 4.1, we assume that the weight vector w is given by the relation w_j = j (that is, the arc a_j has a weight of j), while the constraint paths are given by

\[
p_1 = (a_1, a_5, a_{12}, a_{13}) \quad \text{and} \quad p_2 = (a_{11}, a_{12}, a_{10}). \tag{4.11}
\]

At this point, it is not difficult to verify that the shortest path from v_1 to v_8 is the path

\[
(a_1, a_2, a_3, a_7). \tag{4.12}
\]

Hence a constraint related to the path p_1 is violated at the vertex v_8, because the predecessor of v_8 on its shortest path from v_1, that is v_4, is different from its predecessor on the constraint
path, which is v_7. The vertex v above is then v_8, while inspection shows that the relevant vertex x is v_2. The corresponding violating island is then

\[
I = \big( (a_2, a_3, a_7),\; (a_5, a_{12}, a_{13}) \big), \tag{4.13}
\]

where I⁺ = (a_2, a_3, a_7) is its positive shore, I⁻ = (a_5, a_{12}, a_{13}) its negative shore, and whose associated excess E is (2 + 3 + 7) − (5 + 12 + 13) = −18. This violating island is not the only one for this example. A second one, related to the path p_2, is given for instance by

\[
I' = \big( (a_8, a_2, a_3),\; (a_{11}, a_{12}, a_{10}) \big), \tag{4.14}
\]

whose excess E' is equal to −20.

A violated constraint of the type (4.6) therefore corresponds to a violating island in (V, A, w). When it is incorporated in the active set, the constraint is enforced as an equality and the costs of its negative and positive shores are exactly balanced (see Section 4.2.5). The corresponding island is then called active.
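The island detection just described can be illustrated on a subgraph of the Figure 4.1 example: a plain Dijkstra computation from v_2 recovers the positive shore and the excess −18 of the island I. Only the arcs whose endpoints can be inferred from the text are included (a_4, a_6 and a_9 are omitted); the code and its names are ours, not the thesis'.

```python
import heapq

# Subgraph of the Figure 4.1 example, restricted to the arcs whose
# endpoints can be inferred from the text (a4, a6 and a9 are omitted);
# arc a_j, stored as j: (source, target), has weight j.
arcs = {1: (1, 2), 2: (2, 3), 3: (3, 4), 5: (2, 6), 7: (4, 8),
        8: (5, 2), 10: (7, 4), 11: (5, 6), 12: (6, 7), 13: (7, 8)}

def dijkstra(src):
    """Shortest-path distances and predecessor arcs from src."""
    dist, pred = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float('inf')):
            continue
        for j, (s, t) in arcs.items():
            if s == v and d + j < dist.get(t, float('inf')):
                dist[t], pred[t] = d + j, j
                heapq.heappush(heap, (d + j, t))
    return dist, pred

# Island I of (4.13): shores from v2 to v8.
dist, pred = dijkstra(2)
positive_shore, v = [], 8
while v != 2:                      # unwind predecessor arcs from v8 back to v2
    positive_shore.append(pred[v])
    v = arcs[pred[v]][0]
positive_shore.reverse()
negative_shore = [5, 12, 13]       # the sub-path of p1 from v2 to v8
excess = sum(positive_shore) - sum(negative_shore)
assert positive_shore == [2, 3, 7]
assert excess == -18               # negative excess: the island is violating
```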
Nonnegativity constraints and bounds on the arc weights

The nonnegativity constraints (4.4) must also be taken into account. When one of them is violated, which is easy to detect, it may also be incorporated in the active set, along with the active islands. These bounds are then also called active. They will be regarded in the sequel as active islands with only one arc in the positive shore and no negative shore. The active set at a given stage of the calculation will therefore contain a number of active islands, with or without a negative shore. It will be denoted by A = (V, Y), where V is the set of currently active islands with a negative shore and Y the set of active islands without a negative shore, that is, the set of active bounds.
4.2.3 The dual step direction

The next stage of the specialization of the Goldfarb–Idnani algorithm to our inverse shortest path problem is the computation of the dual step direction r in (3.38). As in [55] and [97], this calculation, which is equivalent to

\[
r = (N^T G^{-1} N)^{-1} N^T G^{-1} n_q, \tag{4.15}
\]

can be performed by maintaining a triangular factorization of the matrix Nᵀ G⁻¹ N. However, our problem has the very important feature that the Hessian matrix G of the quadratic objective is the identity I. This obviously induces a number of useful algorithmic simplifications, the first one being that (4.15) can be rewritten as

\[
r = (N^T N)^{-1} N^T n_q. \tag{4.16}
\]

The matrix N* is then nothing but the unweighted Moore–Penrose generalized inverse of N. Therefore, we will only maintain a triangular factorization of the form

\[
N^T N = R^T R, \tag{4.17}
\]
where R is an upper triangular matrix of dimension |A|. Since N is of full rank, this is equivalent to maintaining a QR factorization of N of the form

\[
N = (Q_1 \; Q_2) \begin{pmatrix} R \\ 0 \end{pmatrix} \stackrel{\text{def}}{=} Q U, \tag{4.18}
\]

as is the case in the numerical solution of unconstrained linear least-squares problems. Indeed, it is straightforward to verify that (4.16) is the solution of

\[
\min_r \| N r - n_q \|_2. \tag{4.19}
\]

The second useful simplification due to the special structure of the problem arises in the computation of the product Nᵀ n_q in (4.16). The resulting vector indeed contains in position i the inner product of the i-th active constraint normal with the normal of the q-th constraint. As both these constraints may be interpreted as islands, the question is then to compute the inner product of the new island, corresponding to the q-th constraint, with all already active islands. We then obtain the following simple result.

Lemma 4.1 The vector Nᵀ n_q appearing in (4.16) is given componentwise by

\[
[N^T n_q]_i = |I_j^+ \cap I_q^+| + |I_j^- \cap I_q^-| - |I_j^+ \cap I_q^-| - |I_j^- \cap I_q^+| \tag{4.20}
\]

for i = 1, …, |A| and j equal to the index of the i-th active island.

Proof. We have that

\[
[N^T n_q]_i = n_i^T n_q. \tag{4.21}
\]

It is useful to note that, because of (4.4) and (4.6),

\[
n_\ell(k) = \begin{cases} +1 & \text{if } a_k \in I_\ell^+, \\ -1 & \text{if } a_k \in I_\ell^-, \\ 0 & \text{otherwise}, \end{cases} \tag{4.22}
\]

for k = 1, …, m and ℓ ∈ A ∪ {q}. This equation holds for both types of islands (with or without a negative shore). Taking the inner product of two such vectors for ℓ = j and ℓ = q then yields (4.20). □

As a consequence, the practical computation of r may be organized as follows:

1. compute the vector y ∈ R^{|A|} whose i-th component is given by (4.20);

2. perform a forward triangular substitution to solve the equation

\[
R^T z = y \tag{4.23}
\]

for the vector z ∈ R^{|A|};

3. perform a backward triangular substitution to solve the equation

\[
R r = z \tag{4.24}
\]

for the desired vector r.

This calculation will be a very important part of the total computational effort per iteration of the algorithm.
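A quick check of Lemma 4.1: on made-up islands, the shore-intersection formula (4.20) agrees with the direct inner product of the ±1 normals of (4.22). The islands and arc indices below are invented for the illustration.

```python
# Check of Lemma 4.1 on made-up islands over m = 6 arcs: the entry
# [N^T n_q]_i equals the shore-intersection count (4.20).
m = 6
islands = [({0, 1}, {2, 3}),       # active island j = 0: (I+, I-)
           ({4}, set())]           # an active bound, seen as an island
Iq = ({1, 4}, {3, 5})              # shores of the new (q-th) island

def normal(island):
    """The +-1 normal vector of (4.22) for an island (I+, I-)."""
    plus, minus = island
    return [1 if k in plus else -1 if k in minus else 0 for k in range(m)]

nq = normal(Iq)
for island in islands:
    direct = sum(a * b for a, b in zip(normal(island), nq))
    plus, minus = island
    qplus, qminus = Iq
    formula = (len(plus & qplus) + len(minus & qminus)
               - len(plus & qminus) - len(minus & qplus))     # (4.20)
    assert direct == formula
```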
4.2.4 Interpretation of the dual step direction

The Lagrange parameter u_i in a sense represents a relative "price" to pay for relaxing the i-th active constraint at the current stage of the calculation. A null price (u_i = 0) ensures, by the complementarity condition (3.24b), that constraint i does not play any significant role at the current point. Higher values of u_i indicate that contour lines of f correspond to lower values of the objective function in the direction of the outward² normal of the i-th constraint.

Lemma 3.7 exhibits the vector r as the opposite of the step in the dual space. Constraints that should leave the active set, when needed, must consequently have a strictly positive corresponding component in r. Indeed, such constraints are meant to have lower relaxation costs in the direction along which the current solution progresses. A good relaxation choice should also be made according to the relative price decrease. Since u ≥ 0, the best relative relaxation choice must follow equation (3.68).
4.2.5 Determination of the weights

We now examine the way in which changes in the weights may be computed. In the original GI method, both primal and dual step directions are computed once a new constraint has been selected for inclusion in the active set, as described in Step 2. In our framework, the computation of the new values of the primal variables may be completely deferred until after that of the dual step, in a rather simple way, as will be shown now. This adaptation may be viewed as another consequence of the fact that G = I for our problem.

Before stating this result more precisely, we introduce some more notation. In order to complete the description of the set {1, …, m} given an active set A = (V, Y), we recall the definition of Y as

\[
Y \stackrel{\text{def}}{=} \{ i \in \{1, \ldots, m\} \mid w_i = 0 \} \tag{4.25}
\]

and we define the sets

\[
X \stackrel{\text{def}}{=} \{ i \in \{1, \ldots, m\} \setminus Y \mid \exists j \in V : a_i \in I_j \} \tag{4.26}
\]

and

\[
Z \stackrel{\text{def}}{=} \{1, \ldots, m\} \setminus (X \cup Y). \tag{4.27}
\]

The set X thus contains the indices of the arcs that are involved in one of the active islands of V but are not fixed at their lower bounds. The set Z contains the indices of the arcs that are not involved at all in the active constraints of A. For i ∈ X, we also define

\[
I^+(i) \stackrel{\text{def}}{=} \{ j \in V \mid a_i \in I_j^+ \} \quad \text{and} \quad I^-(i) \stackrel{\text{def}}{=} \{ j \in V \mid a_i \in I_j^- \}. \tag{4.28}
\]

² Towards the infeasible region.
Hence, I⁺(i) (resp. I⁻(i)) is the set of active islands of V such that the arc a_i belongs to their positive (resp. negative) shore. We finally define the logical indicator function δ(·) by

\[
\delta(\text{condition}) = \begin{cases} 1 & \text{if condition is true}, \\ 0 & \text{if condition is false}. \end{cases} \tag{4.29}
\]

We can now state our lemma.
Lemma 4.2 Consider a dual feasible solution for the problem of minimizing (4.5) subject to the constraints given by an active set A = (V, Y). Assume furthermore that, among the Lagrange multipliers {u_k}_{k=1}^{|A|}, those associated with the active islands of V are known. Then the weight vector w corresponding to this dual solution is given by

\[
w_i = \delta(i \in X \cup Z)\, c_i + \delta(i \in X) \left[ \sum_{k \in I^+(i)} u_k - \sum_{k \in I^-(i)} u_k \right] \tag{4.30}
\]

for i = 1, …, m, where c denotes the vector w̄ of given initial weights.

Proof. We first note that we can restrict our attention to the weights that are not at their bounds (i ∈ X ∪ Z), because we know, by definition, that w_i = 0 for i ∈ Y. Every active island in V thus corresponds to a constraint of the form

\[
\sum_{k \mid a_k \in I^+ \wedge k \notin Y} w_k \; - \sum_{k \mid a_k \in I^- \wedge k \notin Y} w_k = 0. \tag{4.31}
\]

The desired expression for w_i (i ∈ X ∪ Z) immediately follows from the Lagrangian equation

\[
\frac{\partial L(w, u)}{\partial w_i} = 0, \tag{4.32}
\]

where the Lagrangian function for the problem is given by

\[
\begin{aligned}
L(w, u) &= \tfrac{1}{2} \sum_{i \in X \cup Z} (w_i - c_i)^2 - \sum_{k=1}^{|A|} u_k \left[ \sum_{i \mid a_i \in I_k^+ \wedge i \notin Y} w_i - \sum_{i \mid a_i \in I_k^- \wedge i \notin Y} w_i \right] \\
&= \tfrac{1}{2} \sum_{i \in X \cup Z} (w_i - c_i)^2 - \sum_{i \in X} w_i \left[ \sum_{k \in I^+(i)} u_k - \sum_{k \in I^-(i)} u_k \right],
\end{aligned} \tag{4.33}
\]

where we restrict the last major sum to the set X because all other terms are zero. □

The lemma simply means that the i-th weight can be obtained from c_i by adding to it all Lagrange multipliers corresponding to active islands such that a_i belongs to the positive shore of the island, and by subtracting all the multipliers of active islands such that a_i belongs to the negative shore.

Consider now the computation of the primal step direction s and of the inner product sᵀn_q. Note first that, when (3.41) is reached in the algorithm, the primal step direction s is nonzero and n_q is linearly independent of the columns of N. The value of sᵀn_q is then given by the following result.
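The reconstruction of the weights in Lemma 4.2 can be illustrated as follows; the instance (five arcs, one active island, one active bound) and the multiplier value are invented for the example, chosen so that the island constraint (4.31) balances.

```python
# Made-up instance for (4.30): five arcs, one active island with
# multiplier u_0 = 3 (chosen so that (4.31) balances), arc 4 at its bound.
cbar = [2.0, 4.0, 7.0, 2.0, 6.0]    # given weights c_i
Y = {4}                              # active bound: w_4 = 0
V = [({0}, {1, 2})]                  # active island: positive shore {a_0},
u = [3.0]                            #   negative shore {a_1, a_2}

def weight(i):
    """Equation (4.30): w_i from the given weight and the multipliers."""
    if i in Y:
        return 0.0
    w = cbar[i]
    for k, (plus, minus) in enumerate(V):
        if i in plus:
            w += u[k]
        if i in minus:
            w -= u[k]
    return w

w = [weight(i) for i in range(5)]
assert w == [5.0, 1.0, 4.0, 2.0, 0.0]
assert w[0] == w[1] + w[2]           # the active island (4.31) is balanced
```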
Lemma 4.3 Assume that the GI algorithm is applied to the inverse shortest path problem under consideration, and that it has reached the point where equation (3.41) should be evaluated. Assume furthermore that A = (V, Y) is the active set at this stage of the calculation. Then the primal step direction s is given componentwise by

\[
s_i = \delta(a_i \in I_q^+) - \delta(a_i \in I_q^-) + \delta(i \in X) \left[ \sum_{k \in I^-(i)} r_k - \sum_{k \in I^+(i)} r_k \right] \tag{4.34}
\]

for i = 1, …, m. As a consequence,

\[
s^T n_q = 1 + \sum_{k \in I^-(q)} r_k - \sum_{k \in I^+(q)} r_k \tag{4.35}
\]

in the case where the q-th constraint is the lower bound on the q-th weight, and

\[
s^T n_q = \sum_{i \mid a_i \in I_q^+} \left[ 1 + \sum_{k \in I^-(i)} r_k - \sum_{k \in I^+(i)} r_k \right] + \sum_{i \mid a_i \in I_q^-} \left[ 1 + \sum_{k \in I^+(i)} r_k - \sum_{k \in I^-(i)} r_k \right] \tag{4.36}
\]

in the case where the q-th constraint is a violating island.

Proof. We first note that s, the change in the weight vector w corresponding to a unit step in the dual step direction, can be viewed as the sum of two different terms: s = n_q − N r. The first term corresponds to the incorporation of the q-th constraint in the active set, and its contribution to s_i is +1 if a_i belongs to the positive shore of the q-th island, and −1 if a_i belongs to its negative shore. This is because the (|A|+1)-th component of the dual step direction, corresponding to the q-th constraint, is equal to +1. Hence this first contribution is equal to

\[
\delta(a_i \in I_q^+) - \delta(a_i \in I_q^-) \tag{4.37}
\]

for the i-th arc. Note that only one of the indicator functions can be nonzero in (4.37). The second contribution corresponds to the modifications of w_i caused by the fact that a_i may also belong to islands that are already active. In other words, the nonzero components of −r have to be taken into account. Equation (4.30) then implies that this second contribution, from the Lagrange multipliers associated with all constraints already in the active set, must be equal to

\[
\delta(i \in X) \left[ \sum_{k \in I^-(i)} r_k - \sum_{k \in I^+(i)} r_k \right]. \tag{4.38}
\]

Summing the contributions (4.37) and (4.38) gives (4.34).

Assume now that the q-th constraint is a lower bound. In this case, one has that n_q = e_q, the q-th vector of the canonical basis of Rᵐ. Hence the product sᵀn_q is equal to s_q. Equation (4.30), the nonnegativity of the {w_i}_{i=1}^m and the fact that w_q < 0 imply that q ∈ X, and (4.35) then follows from (4.34). On the other hand, if the q-th constraint is a violating island, the normal n_q is then given componentwise by (4.22) with ℓ = q. Hence we obtain (4.36) from (4.34). □
4.2.6 Modifying the active set
The active set modifications in Step 5 of the GI algorithm finally require the updating or downdating of the triangular matrix R introduced above in (4.17). Assume first that the ℓth constraint is dropped from the active set A. This amounts to dropping a column of N in (4.18), which, in turn, is equivalent to dropping a column of the upper triangular matrix R. The resulting matrix is therefore upper Hessenberg, and a sequence of Givens plane rotations is applied to restore the upper triangular form. This technique is quite classical, and has already been used in the more general implementations of the GI method, both in [55] and [97]. The reader is referred to those papers for further details in the context of the GI algorithm, and to [56] for general information on Givens plane rotations and their practical computation. If one now wishes to add the qth constraint to the active set, then N has one more column, namely $n_q$, and the resulting matrix U in (4.18) then has the form
$$ U = \begin{pmatrix} R & Q_1^T n_q \\ 0 & Q_2^T n_q \end{pmatrix}, \qquad (4.39) $$

where $Q_1$ and $Q_2$ are defined in (4.18). Again, this matrix should be restored to triangular form, and again this can be done by premultiplying it by suitable orthogonal transformations. In fact, the only necessary modification to (4.39) is the premultiplication of the vector $Q_2^T n_q$ by an orthogonal transformation T, say, such that

$$ T Q_2^T n_q = \| Q_2^T n_q \| \, e_1, \qquad (4.40) $$

where $e_1$ is the first vector of the canonical basis of $\mathbb{R}^{m-|A|}$. Note also that

$$ Q_1^T n_q = R^{-T} N^T n_q = z, \qquad (4.41) $$

where z has already been computed in (4.24). Moreover, one has that

$$ \| n_q \|^2 = \| (Q_1 \; Q_2)^T n_q \|^2 = \| z \|^2 + \| Q_2^T n_q \|^2 = \| z \|^2 + \| T Q_2^T n_q \|^2. \qquad (4.42) $$

Hence the updated matrix R is given by

$$ R_{\rm updated} = \begin{pmatrix} R & z \\ 0 & \rho \end{pmatrix}, \qquad (4.43) $$

where $\rho = \sqrt{\| n_q \|^2 - \| z \|^2}$. The updating of the triangular factor R is therefore extremely cheap to compute, mainly because z is available from previous calculations. It is also interesting to note that, because of the equivalence between (4.17) and (4.18), the technique presented here is in fact identical to the computation of the Cholesky factor of $N_+^T N_+$ using the bordering method (see [51], for example), where $N_+ = (N \; n_q)$. A similar procedure is also used in [55] and [97]. We finally note that s, the primal step direction, is zero if and only if the residual of the problem (4.19) is zero, which, in turn, is equivalent to $\| n_q \| = \| z \|$. This last relation provides a possible way of testing the equality $s = 0$ without explicitly computing s.
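To make the cheapness of this update concrete, here is a minimal NumPy sketch of the bordering step (an illustration only; the thesis implementation is in Fortran with a sparse R). The names `R`, `N`, `nq`, `z` and `rho` mirror the symbols above.

```python
import numpy as np

def border_update(R, N, nq):
    """Update the upper triangular factor R of N^T N = R^T R when the
    column nq is appended to N, by the bordering method sketched in
    (4.41)-(4.43): solve R^T z = N^T nq, then set the new diagonal
    entry to rho = sqrt(||nq||^2 - ||z||^2)."""
    z = np.linalg.solve(R.T, N.T @ nq)   # z = R^{-T} N^T nq, cf. (4.41)
    rho = np.sqrt(nq @ nq - z @ z)       # cf. (4.42)
    k = R.shape[0]
    R_up = np.zeros((k + 1, k + 1))
    R_up[:k, :k] = R                     # old factor kept unchanged
    R_up[:k, k] = z                      # new last column, cf. (4.43)
    R_up[k, k] = rho
    return R_up

# check on random data: R_up is the triangular factor of [N nq]^T [N nq]
rng = np.random.default_rng(0)
N = rng.standard_normal((8, 3))
nq = rng.standard_normal(8)
R = np.linalg.cholesky(N.T @ N).T        # upper triangular: R^T R = N^T N
R_up = border_update(R, N, nq)
Np = np.column_stack([N, nq])
assert np.allclose(R_up.T @ R_up, Np.T @ Np)
```

Only one triangular solve and one inner product are needed, which is exactly why the update is so cheap compared with refactorizing from scratch.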
4.2.7 The algorithm
We are now in a position to describe our algorithm for solving the inverse shortest paths problem described by (4.5), (4.4) and (4.6). For this description, we use a small machine-dependent tolerance $\epsilon > 0$ to detect to what extent a real value is nonzero, and we define the integer $\sigma = |A|$.
Algorithm 4.1
step 0: Initialization. Set $w \leftarrow \bar w$, $f \leftarrow 0$, $A \leftarrow \emptyset$, $\sigma \leftarrow 0$ and $u \leftarrow 0$.

step 1: Compute the current shortest paths. For $j = 1, \ldots, n_E$, compute the shortest paths from $s_j$ to every vertex in $P_j \setminus \{s_j\}$.

step 2: Choose a violated island or exit. Select $I_q$, an island whose excess $E_q$ is negative, if any. If no such island exists, then w is optimal and the algorithm stops. Otherwise, if $\sigma = 0$, then set $\rho \leftarrow (|I_q^+| + |I_q^-|)^{1/2}$ and go to Step 5. Otherwise (that is, if $\sigma > 0$) set

$$ u \leftarrow \begin{pmatrix} u \\ 0 \end{pmatrix}. \qquad (4.44) $$

step 3: Revise the triangular factor R.

3a: Add the constraint normal $n_q$ to N. If $\sigma = 1$, then set $R = (\rho)$ and go to Step 4. Otherwise (that is, if $\sigma > 1$), update the upper triangular matrix R using (4.43) and go to Step 4.

3b: Drop $n_\ell$ from N. Remove from R the column corresponding to the ℓth island, and use Givens rotations to restore it to upper triangular form, as described in Section 4.2.6.

step 4: Compute the dual step direction. Compute the vectors z and r, using Lemma 4.1, (4.23) and (4.24). Compute also $\rho$ according to

$$ \rho = \left( \| n_q \|^2 - \| z \|^2 \right)^{1/2}. \qquad (4.45) $$

step 5: Determine the maximum steplength to preserve dual feasibility. Determine the set S according to (3.39), $t_f$ and possibly ℓ using (3.40).

step 6: Determine the steplength to satisfy the qth constraint. If $\rho \leq \epsilon$, then go to Step 7b. Otherwise, compute $t_c$ according to (3.41), and s and $s^T n_q$ as described in Lemma 4.3.

step 7: Take the step and revise the active set.

7a: Compute the steplength t as in (3.43), set $w \leftarrow w + ts$, revise f according to (3.45) and u using

$$ u \leftarrow \begin{cases} u + t \begin{pmatrix} -r \\ 1 \end{pmatrix} & \text{if } \sigma > 0, \\ u + t & \text{if } \sigma = 0. \end{cases} \qquad (4.46) $$

If $t = t_c$, set $A \leftarrow A \cup \{q\}$, $\sigma \leftarrow \sigma + 1$ and go to Step 1. Otherwise (that is, if $t = t_f$), set $A \leftarrow A \setminus \{\ell\}$, $\sigma \leftarrow \sigma - 1$ and go to Step 3b.

7b: If $t_f = +\infty$, then the problem is infeasible, and the algorithm stops with a suitable message. Otherwise, update the Lagrange multipliers according to (3.42), set $A \leftarrow A \setminus \{\ell\}$, $\sigma \leftarrow \sigma - 1$ and go to Step 3b.

Note that, in our current implementation of the algorithm's second step, we choose the current violated island as the one whose excess is most negative. This technique appears to be quite efficient in practice.
4.2.8 Nonoriented arcs
An important variant of the basic problem occurs when some arcs in the graph are undirected. In this case, it is quite inefficient to replace each of these arcs by two distinct arcs of opposite orientation, because doing so increases both the dimension of the problem and the number of constraints: one has to impose that the two new oriented arcs have the same weight. Fortunately, the algorithm described above can be applied to the case where arcs are nonoriented without any modification, provided the shortest paths method used in Step 1 can handle such arcs.
4.2.9 Note
Similar implementation techniques have been used by Calamai and Conn for solving location problems with a related structure (see [18, 19, 20]). Their technique is however different from ours, and a comparison of both approaches will be examined in future work.
4.3 Preliminary numerical experience
4.3.1 The implementation
In order to verify the feasibility of the algorithm described above, a Fortran program was written and tested on an Apollo DN3000 workstation, using the FTN compiler. The shortest paths calculations were performed by Johnson's algorithm using a binary heap (see Chapter 2). The crucial part of our implementation has been both the determination of the violated islands and the update of the matrix R. On the one hand, detecting violated islands has been achieved by comparing the explicit definition of each shortest path constraint with the shortest
path tree rooted at the origin of the path defining the constraint, proceeding backward from the destination to the origin of that path, since shortest path tree computations give the predecessor of each vertex in the tree. On the other hand, the sparsity of R has been taken into account by means of linked lists. The following operations on R then needed to be specialized: adding and deleting a column, and performing Givens plane rotations to restore the upper triangular form. Finally, storing a graph in a computer's memory naturally involves the representation of the arcs (our variables) by their terminal vertices. Care must therefore be exercised to handle vertex vs. arc representations for the graph, the constraints and, in particular, nonoriented arcs.
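The violation test just described can be sketched in a few lines. The following Python fragment (a minimal illustration under our own naming, not the thesis' Fortran code) computes a shortest path tree with Dijkstra's algorithm and reports the excess of a constraint path, i.e. the shortest distance minus the path's cost; a negative value signals a violated constraint, and walking the predecessor map backward from the destination yields the competing shortest path whose symmetric difference with the constraint path forms the island.

```python
import heapq

def dijkstra(adj, src):
    """Shortest path tree from src; adj maps u -> [(v, w), ...]."""
    dist, pred = {src: 0.0}, {src: None}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], pred[v] = nd, u
                heapq.heappush(pq, (nd, v))
    return dist, pred

def path_excess(adj, path_vertices, weight):
    """Excess of a constraint path: shortest distance minus path cost.
    Negative when the shortest-path constraint is violated."""
    s, t = path_vertices[0], path_vertices[-1]
    dist, _ = dijkstra(adj, s)
    cost = sum(weight[(u, v)] for u, v in zip(path_vertices, path_vertices[1:]))
    return dist[t] - cost

# toy graph: the direct arc (1,3) is longer than the route via vertex 2
weight = {(1, 2): 1.0, (2, 3): 1.0, (1, 3): 3.0}
adj = {1: [(2, 1.0), (3, 3.0)], 2: [(3, 1.0)]}
assert path_excess(adj, [1, 3], weight) == -1.0      # violated constraint
assert path_excess(adj, [1, 2, 3], weight) == 0.0    # satisfied (active)
```

Because arcs shared by the two paths cancel, this quantity coincides with the (most negative) island excess used by the selection rule of Step 2.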
4.3.2 The tests
We present here a set of seven typical examples extracted from a large collection of tests. The first five arise from the traffic modelling problem presented in Section 1, with graphs for two different cities. The next one is obtained on a randomly generated graph, while the last one is built from the graph of a two-dimensional rectangular grid. The problems' characteristics are reported in Table 4.1. We recall that n, m and $n_E$ are the number of vertices in the graph, the number of arcs and the number of shortest path constraints, respectively.
        n     m     n_E    Graph type   Constraint paths generation
P1      246   351    245   city 1       a tree in the graph
P2      246   351    600   city 1       all paths between a subset of the nodes
P3      246   351   6724   city 1       all paths from a node subset to another node subset
P4      822  1447    821   city 2       a tree in the graph
P5      822  1447   6806   city 2       all paths from a node subset to another node subset
P6      500  1469    100   random       randomly generated paths
P7     3600  7063    650   2D grid      all paths from one side of the grid to the other sides

Table 4.1: The inverse shortest path test examples

We summarize the results of the tests in Table 4.2, where the following symbols are used:
iter. : the number of major iterations of the algorithm, that is, the number of full steps in the primal space (adding a constraint to the active set and requiring the calculation of the shortest paths and the choice of a new violated constraint);

drops : the number of islands dropped at Step 7 of the algorithm, that is, the number of minor iterations (partial and dual steps, involving only the computation of the step directions in the primal and dual spaces);

|A| : the number of active islands at the solution.
We note that the first of these numbers is always one larger than the sum of the two others, because one iteration is required for considering the empty active set.

        iter.   drops   |A|
P1       35       2      32
P2       77      17      59
P3      167      34     132
P4      246      55     190
P5      468     238     229
P6      436      54     381
P7      171       8     162

Table 4.2: Results obtained on the test problems

The following figure illustrates results obtained by applying the inverse shortest path algorithm to a set of problems presented in Chapter 5 (Table 5.1, page 91). The left-hand histogram shows the total number of iterations, partitioned into drops and major iterations. The right-hand graphic shows the time spent in calculating shortest paths with respect to the overall algorithm runtime.
[Figure 4.2 consists of two plots over the problem sizes m = 24, 84, 220, 612, 1300 and 3280: a left-hand histogram of the iteration counts, partitioned into drops and major iterations, and a right-hand chart of the shortest paths time as a fraction of the overall runtime.]

Figure 4.2: Iterations per problem size and shortest paths calculation

Despite the limited character of these experiments, one can nevertheless observe the following points. The algorithm is relatively efficient, in the sense that it does not, at least in our examples, add many constraints that are not active at the solution, with the necessity to drop them at a later stage (see Figure 4.2). One also observes in practice that a fairly substantial part of the total computational effort is spent in calculating the shortest paths necessary to detect constraint violation (Figure 4.2). Choosing a set of constraint paths from a single tree induces significant savings in the
determination of the most violated constraint, because only one shortest path tree is needed.
4.4 Complexity of the inverse shortest paths problem
During the refereeing period of [15], which presents the matter of this chapter, an alternative formulation of the inverse shortest paths problem was communicated to the authors by S. Vavasis. Representing the cost of the shortest paths from node $v_i$ to node $v_j$ by the new variables $w_{i,j}$ for $i, j = 1, \ldots, n$, we may then add the constraints

$$ w_{i,j} \leq w_{i,k} + w_\ell \quad \text{whenever } s(a_\ell) = v_k \text{ and } t(a_\ell) = v_j, \qquad (4.47) $$

together with the equalities

$$ w_{i,i} = 0 \qquad (4.48) $$

for all $i = 1, \ldots, n$. The constraints on the shortest paths (4.6) may then be rewritten as

$$ w_{i,q} \geq w_{j_1} + \cdots + w_{j_{\lambda(j)}} \qquad (4.49) $$

for any path of the form (1) with $s(a_{j_1}) = v_i$ and $t(a_{j_{\lambda(j)}}) = v_q$. There are at most mn inequalities of type (4.47), n equalities of type (4.48) and $n_E$ inequalities of type (4.49). Hence the total number of constraints in this formulation is polynomial. As a consequence, the problem is solvable in polynomial time by an interior point algorithm. This interesting observation is clearly of theoretical importance, but the inclusion of $n^2$ additional variables could generate inefficiencies in practical implementations.
5 Handling correlations between arc weights
In many applications, modelling networks accurately requires dependences between arc weights. Consider, for instance, seismic waves propagating through the earth's crust: these waves have similar velocities as they propagate through media of similar densities. The motivation for this research also comes from applications in traffic modelling. This chapter considers the inverse shortest path problem where arc weights are subject to correlation constraints. A new method is proposed for solving this class of problems. It is constructed as a generalization of the algorithm presented in Chapter 4 for uncorrelated inverse shortest paths. In the uncorrelated case, the variables were the arc weights and there was no correlation between them. Now, we partition the arcs into cells, or classes. The weights of the arcs located in the same cell are derived from the same value, called the "cell density". The variables of our new problem become these cell densities. The advantage of such a partition is that the number of variables decreases substantially. Moreover, the refinement of each cell may increase without affecting the number of our new variables. On the other hand, the correlations involve more restrictions, and hence more constraints. Note that shortest path constraints are not expressed with our new variables, but still involve arc weights. As a consequence, the concept of island, introduced to formalize the violation of shortest path constraints, has to be revised in this new context. This chapter will establish the results allowing our new algorithm to handle such constraints in the space of the cell densities, including implicit lower bound constraints on shortest path costs. Preliminary numerical experience with the new method is presented and discussed. In particular, we propose a computational comparison between the uncorrelated method (that of Chapter 4) and the correlated one.
We also provide results obtained by using two possible strategies for handling constraints: the first considers the first violated constraint as a candidate to enter an active set of constraints, while the second privileges the most violated constraint. The matter of this chapter is to be published in [16].
5.1 Motivation
The technique proposed in Chapter 4 for solving an inverse shortest path problem is based on the solution of a particular instance of the problem's description, which is the problem of recovering
handling correlations between arc weights
71
arc weights in a weighted oriented graph, given a (usually incomplete) set of shortest paths in this graph. In this approach, the arc weights are assumed to be independent from each other. This last assumption, although reasonable in some applications, is not fulfilled in all cases of interest. Even in the areas mentioned above (transportation and tomography), interesting questions can be asked where the independence assumption is clearly violated. It is the purpose of the present chapter to propose an algorithmic approach that overcomes this limitation. We first illustrate the need for such an extension by an example drawn from transportation research. This example is presented in detail and subsequently used to motivate the specific concepts to be introduced. An additional case of interest in computerized tomography is also mentioned.
5.1.1 Transportation research
Our first example deals with the question of reconstructing the costs associated with routes in an unsaturated transportation network. As mentioned above, a first approach, using an instance of the inverse shortest path problem, has been proposed and tested in Chapter 6. The idea was to reconstruct the delays associated with the links of the network, as perceived by the users, from the observation of the paths actually taken (assuming that users choose the perceived shortest route between their origin and their destination). This method is akin to the idea of using "mental maps" [36] in the process of route planning. Technically speaking, the network under study is represented by an oriented graph in which a set of shortest paths is known; the question is then to infer the value of the time delays associated with each arc of the graph, differing as little as possible from a set of a priori known weights (derived, for instance, from the knowledge of the geometrical characteristics of the road). When applying this methodology to urban situations, it is important to explicitly consider the delays at signalized junctions, and not to restrict the analysis to the estimation of the delays on the links only. This can be achieved by using a graph that contains detailed arcs to represent the various "turns" in a junction. A small example of such a graph is given in Figure 5.1. Unfortunately, a naive application of this method results in a set of estimated weights for the "inner arcs" of a given junction that are mutually uncorrelated. This can be considered unrealistic, because the delays at a simple signalized junction all depend in some fixed way on the relevant traffic light cycle. We may then be interested in reconstructing the light cycles themselves, as they are perceived by the users, proceeding again from the observation of the routes actually chosen in the network. The estimated values for the cycles can then be fed into models that explicitly use traffic light phasing.

We now follow this approach and build the following simple model. The network is represented by a detailed graph, such as that of Figure 5.1, in which a delay, or weight, is associated with each arc according to the rule

$$ w_i = \begin{cases} d_i & \text{if the $i$th arc does not belong to any junction,} \\ \phi_i \, d_{\ell(i)} & \text{if the $i$th arc belongs to the $\ell(i)$th junction,} \end{cases} \qquad (5.1) $$

where the $d_i$ are the delays associated with links or junctions, and where the $\phi_i$ specify how the
[Figure 5.1 shows a detailed junction graph with vertices numbered 1 to 48, in which the possible turns inside each junction are represented by inner arcs.]

Figure 5.1: The first example involving correlations between arc weights

delay for a turn depends on the global delay (the red light period, for instance) of the relevant junction. We may trivially extend this definition to
$$ w_i = \phi_i \, d_{\ell(i)}, \qquad (5.2) $$

where we have defined $\phi_i \stackrel{\text{def}}{=} 1$ and $\ell(i) \stackrel{\text{def}}{=} i$ for all arcs not belonging to any junction. We then face the problem of estimating the delays $d_{\ell(i)}$ subject to the constraint that a set of a priori known paths in the graph must be the shortest ones between their origin and their destination. As in Chapter 4, this problem is usually underdetermined, and a particular solution can be chosen that minimizes the difference between the computed delays and some a priori known values. We then have an inverse shortest path problem, as defined in Chapter 4, whose variables are the delays (as opposed to the weights).
5.1.2 Seismic tomography
Our second motivating example is drawn from seismic tomography. In this research area, a possibly large geologic zone is discretized into neighbouring cells of constant material density. The arrival times of compression (shock) waves generated by earthquakes or artificial explosions are then observed by seismographs placed at known locations. The problem is then to reconstruct the cell densities from an analysis of the paths (rays) used by these waves. One possible way to proceed is to construct, in each cell, a small graph whose arcs represent the propagation of an incoming compression wave in different directions. For example, if we assume that a zone is
divided into 2 × 3 cells, we may then choose a simple cell model consisting of 6 arcs (a square with both diagonals), and then construct the resulting undirected network illustrated in Figure 5.2.
[Figure 5.2 shows the undirected network obtained from a 2 × 3 cell discretization, each cell consisting of a square with both diagonals; the vertices are numbered 1 to 12.]

Figure 5.2: The graph generated from a discretization

We consider the following simple model to describe the travel time $w_i$ of a compression wave along the ith arc within the $\ell(i)$th cell:
$$ w_i = \phi_i \, d_{\ell(i)}, \qquad (5.3) $$

where $\phi_i$ is now proportional to the length of the ith arc¹. In our example, the travel times associated with the arcs of the first cell in Figure 5.2 (whose sides are assumed to be of unit length) are given by

$$ w_i = d_1 \quad (i = 1, \ldots, 4), \qquad w_i = \sqrt{2}\, d_1 \quad (i = 5, 6). \qquad (5.4) $$

As above, we now consider the question of estimating the cell densities $d_{\ell(i)}$ from the knowledge of the wave paths and arrival times. Because of the Fresnel law, stating that waves follow shortest paths in their propagation medium, this is again a variant of the inverse shortest path problem, where the variables are no longer the weights associated with the arcs, but some more aggregated quantities (the cell densities) which determine these weights via linear relations.
5.2 The formal problem
Both examples described in the previous section are particular cases of the following formal problem specification.
5.2.1 Classes and densities
For the terminology related to graphs, we refer the reader to Chapter 2. Yet we recall here the basic notations and introduce those related to cell densities. We consider a directed weighted graph
¹ Confusion about the cell to which the inner horizontal and vertical arcs belong can be avoided by suitably defining $\ell(i)$.
$(V, A, w)$, where $(V, A)$ is an oriented graph with n vertices and m arcs, and where w is a set of nonnegative weights $\{w_i\}_{i=1}^m$ associated with the arcs. Let V be the set of vertices of the graph and $A = \{a_k = (s_k, t_k)\}_{k=1}^m$ be the set of arcs, where $s_k$ denotes the vertex at the origin of the arc $a_k$ (its "source vertex") and $t_k$ the vertex at its end (its "target vertex"). Also assume that the set of arcs A is partitioned into L disjoint classes, and that a nonnegative density is associated with each of these classes. Assume finally that the weight of every arc can be computed as an arc-dependent proportion of the density of the class to which the arc belongs, that is

$$ w_i = \phi_i \, d_{\ell(i)} \quad \text{for } i = 1, \ldots, m, \qquad (5.5) $$

where $\ell(i)$ denotes the index of the unique class containing the ith arc. We say that the ith arc is associated with the $\ell(i)$th class. In our first example, the arcs are the detailed links of the network, including the detailed links within a junction. They are partitioned into classes corresponding to roads and junctions: the densities of these classes then correspond to the delays along roads and the traffic light cycles at the junctions. In our second example, the classes correspond to the cells of the discretized geological medium, the densities to their actual physical densities, and the arcs to the possible ways in which a wave can travel across a cell. Our problem is then to determine values of the class densities that are compatible with a set of known properties of the weighted graph.
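Relation (5.5) makes the arc weights a fixed linear function of the class densities. As a small hypothetical illustration (toy numbers, not data from the thesis), the map can be applied directly or assembled as an m × L matrix M with w = M d, which makes the linear dependence of the weights on the new variables explicit:

```python
import numpy as np

# Hypothetical toy data: m = 5 arcs partitioned into L = 2 classes.
phi = np.array([1.0, 1.0, 2.0, 0.5, 1.5])   # arc-dependent proportions phi_i
cls = np.array([0, 0, 1, 1, 1])             # class index l(i) of each arc
d = np.array([3.0, 4.0])                    # class densities

# Arc weights via (5.5): w_i = phi_i * d_{l(i)}
w = phi * d[cls]

# Equivalently, w = M d where M has phi_i in row i, column l(i);
# this matrix form carries the correlation structure between weights.
M = np.zeros((phi.size, d.size))
M[np.arange(phi.size), cls] = phi
assert np.allclose(M @ d, w)
```

Note that M has a single nonzero per row, so the weights inside one class are all proportional to the same density, exactly the correlation the chapter sets out to exploit.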
5.2.2 Shortest paths constraints
The main feature of our problem is that we wish to specify that some paths are shortest between an origin and a destination. We note that this concept is only meaningful in the weighted graph, and has no direct translation in terms of classes and densities, which are our variables. We first allow imposing that a known path between two vertices is shortest between these vertices. More formally, we define a simple path $p_j$ to be an ordered set of arcs of the form

$$ p_j = \left( a_{j_1}, a_{j_2}, \ldots, a_{j_{\lambda(j)}} \right) \quad (j = 1, \ldots, n_E), \qquad (5.6) $$

where $\lambda(j)$ is the number of arcs in this path (its length), and where

$$ t_{j_i} = s_{j_{i+1}} \quad \text{for } i = 1, \ldots, \lambda(j) - 1. \qquad (5.7) $$

As detailed in Chapter 3, the constraint that a given path $p_j$ is shortest can then be expressed as a (possibly very large) set of linear inequalities of the form

$$ \sum_{k \mid a_k \in p'_j} w_k \;\geq\; \sum_{k \mid a_k \in p_j} w_k, \qquad (5.8) $$

where $p'_j$ is any path with the same origin and destination as $p_j$. For future reference, we also note that a path $p_j$ can be defined equivalently by the ordered set $P_j$ of successive vertices that are on $p_j$, i.e.

$$ P_j \stackrel{\text{def}}{=} \left( s_{j_1}, t_{j_1}, t_{j_2}, \ldots, t_{j_{\lambda(j)}} \right). \qquad (5.9) $$
In our first example, we may assume that the network users follow the path that they perceive to be shortest between their origin and destination. An observation of the paths actually chosen by these users then gives constraints of the type just described. For instance, we may know that users travelling from vertex 1 to vertex 38 use the path defined by

$$ P_1 = (1, 5, 10, 15, 24, 29, 36, 38), \qquad (5.10) $$

while those travelling from vertex 1 to 48 use that given by

$$ P_2 = (1, 5, 10, 15, 22, 43, 46, 48). \qquad (5.11) $$

We also wish to consider constraints that impose a lower bound on the cost of the shortest path between two vertices. These constraints were introduced in Chapter 1 and have not been considered in the basic inverse shortest path method proposed in Chapter 4. We saw in Chapter 3, Section 3.2, that such a constraint can be expressed by a set of constraints imposing that the weight of all paths between the two vertices, $n_o$ and $n_d$ say, is bounded below by a constant, that is

$$ \sum_{k \mid a_k \in g} w_k \;\geq\; b_{n_o, n_d}, \qquad (5.12) $$

where g is any path with origin $n_o$ and destination $n_d$. In the context of our example, we may know that the time required to reach vertex 42 from vertex 13 is clearly not smaller than 50 measure units. In this case, we wish to impose that the weight of the shortest path between these vertices is bounded below by 50. The number of linear constraints of the form (5.8) and (5.12) depends on the number of possible paths between two vertices in the graph, which grows exponentially with the density m/n of the graph. Enumerating these constraints is of course out of the question, and we will have to use a "separation procedure" to determine which of these constraints are violated for a given value of the class densities. This separation procedure is based on the computation of the shortest paths within the graph, given the weights on its arcs, which are themselves determined by the cell densities and (5.5). We could also consider imposing upper bounds (and therefore equalities) on the weight of some shortest paths. We showed in Chapter 3, Section 3.2, that this type of constraint can no longer be expressed as a set of linear inequalities as in (5.8) and (5.12); the problem is therefore of a different nature. This special case will be considered in Chapter 6.
5.2.3 Constraints on the class densities
Besides the constraints on shortest paths, we also include general linear constraints on the class densities, which are the true variables of our problem. The first constraint of this type is clearly that the class densities must be nonnegative, in order to ensure the nonnegativity of the arc weights. But we may want to specify further linear constraints of the form

$$ \sum_{l=1}^{L} \alpha_{il} \, d_l \;\geq\; \beta_i \quad (i \in I), \qquad (5.13) $$

and/or

$$ \sum_{l=1}^{L} \alpha_{il} \, d_l \;=\; \beta_i \quad (i \in E), \qquad (5.14) $$

where the $\alpha_{il}$ are general coefficients, the $\beta_i$ are specified constants, and the sets I and E index the inequality and equality constraints respectively. For instance, the network users of the first example may be aware that no traffic light cycle exceeds 5 minutes, which imposes an explicit upper bound on all class densities representing such cycles. Other a priori knowledge of the network might also indicate that a given cycle is longer than another one: this again produces a linear constraint of the type (5.13) on the relevant class densities. Observe that linear constraints on the arc weights can be expressed in the form (5.13) or (5.14), provided they involve fixed sets of arcs; the translation from arc weights to class densities is then given by (5.5).
5.2.4 The inverse problem
If we remember that our problem is to reconstruct the class densities subject to the constraints described above, we recall that, as in the basic inverse shortest path problem, the constraints do not determine the class densities uniquely: the reconstruction problem is underdetermined. Fortunately, it often happens in applications that some additional a priori knowledge of the expected class densities is available. Using this information then provides stability and uniqueness of the inversion (see [105]). This a priori information may be obtained either from "direct" models, for which there is no problem of uniqueness, or from the a posteriori information of a previous inverse problem run with different data. The application of this idea to our framework results in the question of determining the class densities that are as close as possible to their expected values. Denoting these a priori expected values by $\{\bar d_l\}_{l=1}^L$, we therefore consider the minimization problem

$$ \min_{d \in \mathbb{R}^L} \| d - \bar d \| \qquad (5.15) $$

subject to the constraints

$$ d_l \geq 0 \quad (l = 1, \ldots, L) \qquad (5.16) $$

and a selection of constraints as described above in (5.8), (5.12), (5.13) and (5.14). Of course, the constraints of type (5.8) and (5.12) should be interpreted as constraints on shortest paths in the weighted graph whose arc weights are determined by the value of d and (5.5). As decided in Chapter 1, we choose to use the $\ell_2$-norm to measure the proximity to the a priori information, so that the objective function (5.15) can now be rewritten as

$$ \min_{d \in \mathbb{R}^L} f = \frac{1}{2} \sum_{l=1}^{L} \left( d_l - \bar d_l \right)^2. \qquad (5.17) $$

This particular choice implies that, although the arc weights are correlated, the class densities are assumed to be independent.
We note that statistical correlation between weights could be handled by considering the objective

$$ \min_{w} \frac{1}{2} (w - \bar w)^T C (w - \bar w), \qquad (5.18) $$

where $C^{-1}$ is a covariance matrix on w and where the superscript T denotes the transpose (see [105], for instance). There are two main reasons why we will not follow this approach.

1. The formulation (5.18) clearly allows for statistical correlation between the densities, but does not guarantee that the equalities (5.5) hold.

2. Introducing a nondiagonal C as the Hessian of the objective function substantially complicates the algorithm, as will become clear in our later developments.

Methods based on (5.18) therefore constitute an alternative to those presented in this chapter and deserve a separate study.
5.3 The uncorrelated inverse shortest path problem
As our approach will make extensive use of the technique developed in Chapter 4, we rst recall that technique proposed therein to solve the uncorrelated inverse shortest path problem. In particular, let us illustrate again the concept of island used to characterize the violation of shortest path constraints. This is important to introduce the generalization of the concept of island. Remember that we may rewrite all the constraints of type 5.8 as
Eiw def nT w , bi 0 i = 1; : : :; h; = i
5:19
where w and fni gh=1 belong to Rm and b is in Rh . Note that constraints of type 5.8 have i no constant term, so that all bi in equation 5.19 would be zero. We then de ne the m jAj matrix N whose columns are the normals ni of the constraints of the active set A, where jAj is the number of constraints in A. Since the Hessian G of our objective function equals the identity, the MoorePenrose generalized inverse of N in the space of variables under the transformation y = G w simply becomes N def N T N ,1 N T ; = 5:20 and H def I , NN ; = 5:21 is then the reduced inverse Hessian of the quadratic objective function in the subspace of weights satisfying the active constraints. When a constraint 5.8 is violated or active, that is when a path pj has its weight greater than or equal to that of another path, g say, with the same origin and destination, pj and g determine together at least one island whose positive shore consists of the part of g that is not common with pj and whose negative shore consists of the part of pj that is not common to g see Figure 5.3. Of course a violated constraint may generate more than one such island the paths pj and g may indeed rst depart from each other, then join and depart again later, but each island necessarily
handling correlations between arc weights
[Figure 5.3: An island determined by the paths p_j and g, with positive shore I^+ (arcs of g only) and negative shore I^- (arcs of p_j only).]

must correspond to a violated constraint that is implicit in the statement of (5.8): a subpath of a shortest path is also shortest. We choose to consider each such constraint explicitly and therefore to associate one and only one island with each violated constraint. The algorithm only considers a subset of all possible islands and assigns an index, q say, to each one of them. For each such island, the sets I_q^+ and I_q^- are defined to be the sets containing the arcs of its positive and negative shores respectively, while I_q := I_q^+ ∪ I_q^-. The excess of the island, denoted E_q(w) or, more briefly, E_q, is then given by

E_q(w) := Σ_{a_i ∈ I_q^+} w_i − Σ_{a_i ∈ I_q^-} w_i.    (5.22)
We again illustrate these concepts with our first example. If we assume that the weight of the path p_1 is greater than that of g = (1, 5, 12, 27, 36, 38), the constraint that p_1 is shortest is violated, and these paths determine an island, I_1 say, whose positive shore I_1^+ contains the arcs joining vertices 5, 12, 27 and 36, while its negative shore I_1^- contains the arcs joining vertices 5, 10, 15, 24, 29 and 36. We note that both shores of an island start at the same vertex and end at the same vertex. Remember that the inverse shortest path algorithm produces a sequence of dual feasible points and keeps a set of active constraints A, where each constraint is verified as an equality. The algorithm then proceeds by successively selecting the island whose excess is most negative and by adding it to its "active set". This is achieved by increasing the weights of the arcs on the positive shore and reducing the weights on the negative shore until both shores are of equal weight. The process is continued until no violated constraint is left. In Chapter 4, the algorithm also explicitly handles the fact that the weights must remain nonnegative. This creates additional constraints that can also become active in the course of the calculation. When such a bound constraint is violated, we
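The island construction of this example can be made concrete in a few lines. The following sketch (arc labels and function names are ours, purely illustrative) computes the shores and the excess (5.22) of the single island determined by two fully diverging paths; recall that, in general, a violated constraint may give rise to several islands:

```python
def island_shores(p, g):
    """Split two arc sequences (lists of hashable arc labels) sharing their
    endpoints into the shores of their island: the positive shore holds the
    arcs of g not on p, the negative shore the arcs of p not on g."""
    pos = [a for a in g if a not in set(p)]
    neg = [a for a in p if a not in set(g)]
    return pos, neg

def excess(w, p, g):
    """Excess E = weight of the positive shore minus that of the negative
    shore; the shortest path constraint on p is violated when E < 0."""
    pos, neg = island_shores(p, g)
    return sum(w[a] for a in pos) - sum(w[a] for a in neg)
```

For instance, with weights `w = {'a': 1.0, 'b': 2.0, 'c': 0.5, 'd': 0.5}`, the paths `['a', 'b']` and `['a', 'c', 'd']` yield the excess 1.0 − 2.0 = −1.0, signalling a violated constraint.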
consider that it has a positive shore containing the arc whose weight is negative and an empty negative shore. That is why we partitioned the active set A into two subsets,

A = (V, Y),    (5.23)

where V is the set of currently active islands with a nonempty negative shore and Y the set of active islands with an empty negative shore (the set of active bounds):

Y := { i ∈ {1, ..., m} | w_i = 0 }.    (5.24)
Let us recall the definition of several sets that will be generalized in the next section, in order to work in the space of class densities:

X := { i ∈ {1, ..., m} \ Y | ∃ j ∈ V : a_i ∈ I_j }    (5.25)

and

Z := {1, ..., m} \ (X ∪ Y).    (5.26)

X contains the indices of the arcs that appear in one of the active islands of V but are not fixed at their lower bounds, while Z contains the indices of the arcs that are not involved at all in the active constraints of A. For i ∈ X, we had defined

I^+(i) := { j ∈ V | a_i ∈ I_j^+ }  and  I^-(i) := { j ∈ V | a_i ∈ I_j^- },    (5.27)

the sets of active islands of V whose positive (resp. negative) shore contains the arc a_i.
5.4 An algorithm for recovering class densities
Our purpose is now to develop a specialized variant of the Goldfarb-Idnani quadratic programming algorithm for recovering the class densities, as opposed to the arc weights. This variant will be similar in spirit to that of the previous section. However, it will clearly operate in a lower dimensional space, because the number of classes is typically much smaller than the number of arcs. From now on, we will therefore place ourselves in R^L. Then E_i depends on the class densities and can be written as

E_i(d) = n_i^T d − b_i,    i = 1, ..., h,    (5.28)

where d and n_i both now belong to R^L. N is then a matrix of dimension L × |A|, and the Hessian G of the objective function (5.17) is the identity matrix of order L. N^* and H are still defined as above. If we wish to use the same approach as that recalled in Section 5.3 for the inverse shortest path problem, we will need to reexamine successively
1. the concept of island (it will now feature both classes and arcs);
2. the computation of the dual step direction r as a function of N^T n_q, where n_q is the normal to the constraint corresponding to the newly incorporated island;
3. the update of the density values;
4. the computation of the primal step direction s when it is nonzero, and that of the inner product s^T n_q when n_q is linearly independent from the columns of N;
5. the determination of the maximum steplength to preserve dual feasibility.
This is the purpose of the next subsections.
5.4.1 Islands, dependent sets and their shores
If the concept of "islands" is natural when considering arcs and paths, as in the inverse shortest path problem, naming the same concept extended to our more general setting is much less obvious. The problem is that we again have constraints (5.8) and (5.12) that balance the weights of two paths sharing their origins and destinations (thus defining an island as above), but we must also consider general linear relations between the class densities, (5.13) and (5.14), for which such a "geographical" interpretation seems irrelevant. However, we can preserve the interpretation of a violated constraint as containing two sets of variables: one that should increase and the other that should decrease for the constraint to be satisfied. These sets will then respectively correspond, in the context of general linear constraints, to the positive and negative shores of the islands in the context of shortest paths. The fact that this "balancing" interpretation of the constraints holds both for shortest paths and for linear constraints results in substantial notational simplifications. More precisely, we define the concept of dependent sets as follows. The i-th dependent set D_i is given by

D_i := { c_l | α_{il} ≠ 0 },    (5.29)

where the c_l are classes and the α_{il} are the coefficients of their densities in (5.13) and (5.14). Strengthening the analogy between islands and dependent sets, we also define the positive and negative shores of these sets by

D_i^+ := { c_l | α_{il} > 0 }    (5.30)

and

D_i^- := { c_l | α_{il} < 0 }.    (5.31)

As for the islands, we let D_i = D_i^+ ∪ D_i^-.
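The shores of a dependent set follow directly from the signs of its coefficients. A minimal sketch (illustrative names, with the coefficients α_{il} of one constraint given as a dict keyed by class label):

```python
def dependent_set_shores(alpha):
    """Given the coefficient dict {class_label: alpha} of one linear
    constraint on the class densities, return the dependent set and its
    positive and negative shores (classes with nonzero, positive and
    negative coefficients, respectively)."""
    D = {c for c, a in alpha.items() if a != 0}
    D_pos = {c for c, a in alpha.items() if a > 0}
    D_neg = {c for c, a in alpha.items() if a < 0}
    return D, D_pos, D_neg
```

A class with a zero coefficient, such as `'c3'` in `{'c1': 1.0, 'c2': -0.5, 'c3': 0.0}`, simply does not belong to the dependent set.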
5.4.2 The dual step direction
As in Chapter 4, the formula for the dual step direction, r = N^* n_q, can be rewritten, using (5.20), as

r = (N^T N)^{-1} N^T n_q.    (5.32)

Let us recall that this calculation can be performed by maintaining a triangular factorization of the matrix N^T N; since the matrix N^* is the unweighted generalized inverse of N, it will only be
necessary to maintain a triangular factorization of the form

N^T N = R^T R,    (5.33)
where R is an upper triangular matrix of dimension |A|. Since N is of full rank, this is equivalent to maintaining a QR factorization of N of the form

N = (Q_1  Q_2) [ R ; 0 ] =: Q U,    (5.34)
as is the case in the numerical solution of unconstrained linear least squares problems. Indeed, it is straightforward to verify that (5.32) may be reformulated as

min_r || N r − n_q ||_2.    (5.35)

The second useful simplification due to the special structure of the problem arises in the computation of the product N^T n_q in (5.32). The resulting vector contains in position i the inner product of the i-th active constraint normal with the normal to the q-th constraint. As both these constraints may now be interpreted as islands or dependent sets, we may exploit this similarity in expressing the value of N^T n_q. In order to state this expression in a reasonably compact form, we define some additional notation. Φ_l is the set of the arcs located in the class c_l, i.e.

Φ_l := { a_k | ℓ_k = l }.    (5.36)

Γ_l(I_i^+) and Γ_l(I_i^-) are the "weighted" cardinalities of the positive and negative shores of I_i restricted to the arcs of Φ_l, that is

Γ_l(I_i^±) := Σ_{a_k ∈ Φ_l, a_k ∈ I_i^±} π_k,    (5.37)

where the π_k are the proportions defined by equation (5.5). Similarly, we also define Γ_l(D_i^+) and Γ_l(D_i^-) by

Γ_l(D_i^±) := |α_{il}| [c_l ∈ D_i^±].    (5.38)

Finally, we use the symbol J_i to represent either I_i, if the i-th constraint is a proper island, or D_i, if the i-th constraint is of the type (5.13) or (5.14). By convention, we set D_i = ∅ when J_i = I_i, and I_i = ∅ when J_i = D_i. Γ_l(J_i^+) and Γ_l(J_i^-) are then given, according to (5.37) and (5.38), by

Γ_l(J_i^±) := Γ_l(I_i^±) if J_i is an island, Γ_l(D_i^±) if J_i is a dependent set.    (5.39)
We can now express the inner product of the normal to the q-th active constraint with the normals of all other active ones.
Lemma 5.1 The vector N^T n_q appearing in (5.32) is given componentwise by

[N^T n_q]_i = Σ_{l ∈ B_i} ( Γ_l(J_g^+) − Γ_l(J_g^-) ) ( Γ_l(J_q^+) − Γ_l(J_q^-) )    (5.40)

for i = 1, ..., |A|, where g = g(i) is the index of the i-th active constraint, and where B_i is the set of indices of the classes that appear in constraints g and q, namely

B_i = { l | ∃ a_k ∈ Φ_l : a_k ∈ I_g ∪ I_q, or c_l ∈ D_g ∪ D_q }.    (5.41)
Proof. We first prove the result in the case where the current and active constraints are islands, that is J = I in (5.39). We then consider the shortest path constraints (5.8) and (5.12). Since the arc weights are bound to the class densities by (5.5), a constraint of type (5.8) can be written as

Σ_{i | a_i ∈ G} π_i d_{ℓ_i} ≥ Σ_{i | a_i ∈ P} π_i d_{ℓ_i},    (5.42)

where P is one of the paths p_j and G is a path with the same origin and destination as P, whose weight is less than or equal to that of P; a lower bound constraint on a shortest path (5.12) is then given by

Σ_{i | a_i ∈ G} π_i d_{ℓ_i} ≥ β_{n_o, n_d},    (5.43)

where G is a path starting at node n_o, ending at n_d, and of weight less than or equal to β_{n_o, n_d}. These formulations now express an island constraint in terms of class densities. By (5.36), our constraint (5.42) becomes

Σ_{l=1}^{L} ( Σ_{a_i ∈ Φ_l} π_i [a_i ∈ G] ) d_l ≥ Σ_{l=1}^{L} ( Σ_{a_i ∈ Φ_l} π_i [a_i ∈ P] ) d_l,    (5.44)

where [·] denotes the indicator of the enclosed condition. The normal to the constraint (5.44) is thus a vector of R^L whose l-th component is

Σ_{a_i ∈ Φ_l} ( [a_i ∈ G] − [a_i ∈ P] ) π_i.    (5.45)
When a shortest path constraint is active or violated, we may introduce I^+ and I^- in the formulation of (5.45), in place of G and P, since the arcs in G \ I^+ and in P \ I^- do not contribute to the violation excess value. Then, by (5.37), we obtain the following normal components for the k-th constraint:

n_k(l) = Γ_l(I_k^+) − Γ_l(I_k^-),    for l = 1, ..., L.    (5.46)

In the case where n_k is related to the constraint (5.43), its l-th component reduces to

n_k(l) = Γ_l(I_k^+).    (5.47)
The formulation (5.46) is therefore valid for both constraint types, because the negative shore I_k^- is empty for a lower bound constraint. Since

[N^T n_q]_i = n_i^T n_q    (5.48)

and g is the index of the i-th active island, (5.40) holds for the island constraints when observing that the class indices l involved are those for which there exists at least one arc belonging both to the class and to a shore of the island I_q or I_g; that is, we can restrict l to the set B_i. The proof is totally similar when considering the dependent set constraints (5.13) and (5.14), using (5.29), (5.30), (5.31) and (5.38). For future reference, we note that, in the general case, n_k(l) can be written as

n_k(l) = Γ_l(J_k^+) − Γ_l(J_k^-),    for l = 1, ..., L,    (5.49)
using (5.39). □

As a consequence of Lemma 5.1, the practical computation of r in (5.32) may be organized as follows:
1. compute the vector y ∈ R^{|A|} whose i-th component is given by (5.40);
2. perform a forward triangular substitution to solve the equation R^T z = y for the vector z ∈ R^{|A|};
3. perform a backward triangular substitution to solve the equation R r = z for the desired vector r.
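These three steps can be sketched in a few lines. The following minimal NumPy illustration (the function name `dual_step_direction` is ours, not the thesis code) takes the upper triangular factor R of (5.33) and the vector y = N^T n_q, and performs the two triangular substitutions without forming any explicit inverse:

```python
import numpy as np

def dual_step_direction(R, y):
    """Solve (N^T N) r = y using R^T R = N^T N: one forward substitution
    (R^T z = y, with R^T lower triangular) followed by one backward
    substitution (R r = z, with R upper triangular)."""
    n = len(y)
    z = np.zeros(n)
    for i in range(n):                        # forward substitution
        z[i] = (y[i] - R[:i, i] @ z[:i]) / R[i, i]
    r = np.zeros(n)
    for i in reversed(range(n)):              # backward substitution
        r[i] = (z[i] - R[i, i + 1:] @ r[i + 1:]) / R[i, i]
    return r
```

The result agrees with the least squares formulation (5.35), as can be checked against a generic solver such as `np.linalg.lstsq`.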
5.4.3 Determination of the class densities
Before stating this result more precisely, the definitions of the sets V and Y, X, Z in (5.24), (5.25) and (5.26) respectively have to be generalized for the correlated problem. The set {1, ..., L} is now partitioned as follows: the active set A still has the same formulation (V, Y), but

V := V_D ∪ V_I,    (5.50)

where

V_D := {currently active dependent sets},    (5.51)
V_I := {currently active islands}.    (5.52)

Observe that V_D contains the set of equality constraints. These last constraints are obviously active, provided they have been incorporated in the active set before handling other constraint violations. We now define special sets involving active bound constraints on individual class densities:

Y_0 := { l ∈ {1, ..., L} | d_l = 0 },    (5.53)
Y := { l ∈ {1, ..., L} | d_l is at a bound β_l },    (5.54)

with β_l being either the lower bound or the upper bound value at which the l-th class density is currently fixed. The value of β_l equals that of β_i / α_{il}, where i ∈ V_D is the index of the related bound constraint. Note that Y_0 ⊆ Y. To characterize the classes involved in active islands or dependent sets, we define the set of class indices appearing in active dependent sets,

X_D := { l ∈ {1, ..., L} \ Y | ∃ j ∈ V_D : c_l ∈ D_j },    (5.55)

the set of class indices with which arcs in active islands are associated,

X_I := { l ∈ {1, ..., L} \ Y | ∃ j ∈ V_I : ∃ a_k ∈ Φ_l ∩ I_j },    (5.56)

and

X := X_D ∪ X_I.    (5.57)

Note that X_D and X_I are not necessarily disjoint. The set X thus contains the indices of the classes that are involved in one of the active islands or dependent sets of V but are not fixed at a bound. The remaining class indices are the elements of the set Z,

Z := {1, ..., L} \ (X ∪ Y).    (5.58)
The set Z contains the indices of the class densities that are not involved at all in the active constraints of A. When we consider a class index l in X, definitions analogous to (5.27) can be made: for l ∈ X_D, we define the sets D^+(l) and D^-(l) as

D^±(l) := { j ∈ V_D | c_l ∈ D_j^± },    (5.59)

that is the sets of active dependent sets whose positive (resp. negative) shore involves the l-th class. By convention, we then set

I^±(l) := ∅  if l ∉ X_I.    (5.60)

Similarly, if l ∈ X_I, we define the sets I^+(l) and I^-(l) as

I^±(l) := { j ∈ V_I | ∃ a_k ∈ Φ_l ∩ I_j^± },    (5.61)

i.e. the sets of active islands whose positive (resp. negative) shore involves an arc associated with the l-th class. We then set

D^±(l) := ∅  if l ∉ X_D.    (5.62)
We finally define, for l ∈ X, F^+(l) and F^-(l) as follows:

F^±(l) := D^±(l) ∪ I^±(l).    (5.63)

The set F^+(l) (resp. F^-(l)) then contains the active constraints that are not bound constraints and that involve the class c_l in their positive (resp. negative) shore.
Lemma 5.2 Consider a dual feasible solution for the problem of minimizing (5.17) subject to the constraints in the active set A = (V, Y). Assume furthermore that, among the Lagrange multipliers {u_k}_{k=1}^{|A|}, those associated with the active islands and dependent sets of V are known. Then the class density vector d corresponding to this dual solution is given by

d_l = [l ∈ Y \ Y_0] β_l + [l ∈ X ∪ Z] \bar d_l + [l ∈ X] ( Σ_{g ∈ F^+(l)} Γ_l(J_g^+) u_g − Σ_{g ∈ F^-(l)} Γ_l(J_g^-) u_g )    (5.64)

for l = 1, ..., L.

Proof.
Define the following sets: the set of active islands related to lower bound constraints on a shortest path, i.e. constraints of type (5.12),

B_I := { q ∈ V_I | I_q^- = ∅ };    (5.65)

the set of classes that are involved in the active constraints of type (5.12),

X_{B_I} := { l ∈ {1, ..., L} \ Y | ∃ j ∈ B_I : ∃ a_k ∈ Φ_l ∩ I_j };    (5.66)

and the set of classes involved in the active constraints of type (5.8),

X_{B_I}^c := { l ∈ {1, ..., L} \ Y | ∃ j ∈ V_I \ B_I : ∃ a_k ∈ Φ_l ∩ I_j }.    (5.67)
The Lagrangian function of our problem is

L(d, u) = (1/2) Σ_{l=1}^{L} (d_l − \bar d_l)^2 − S_D − S_{I^c} − S_I,    (5.68)

where S_D is the term involving the active constraints on the class densities, that is

S_D = Σ_{g ∈ V_D} u_g [ ( Σ_{l ∈ {1,...,L} \ Y_0} α_{gl} d_l ) − β_g ],    (5.69)
S_{I^c} is the term involving the active shortest path constraints,

S_{I^c} = Σ_{g ∈ V_I \ B_I} u_g [ Σ_{l ∈ {1,...,L} \ Y_0} ( Γ_l(I_g^+) − Γ_l(I_g^-) ) d_l ],    (5.70)
(The sets X_{B_I} and X_{B_I}^c are not necessarily disjoint.)
and S_I is the term involving the active lower bound constraints on a shortest path,

S_I = Σ_{g ∈ B_I} u_g [ ( Σ_{l ∈ {1,...,L} \ Y_0} Γ_l(I_g^+) d_l ) − β_g ],    (5.71)
where β_g is the lower bound β_{n_o, n_d} defined in (5.12) and related to the g-th constraint. Since the class densities indexed by Y are fixed at their bound, (5.68) becomes

L(d, u) = (1/2) Σ_{l ∈ X ∪ Z} (d_l − \bar d_l)^2 + (1/2) Σ_{l ∈ Y \ Y_0} (β_l − \bar d_l)^2 − S_D − S_{I^c} − S_I.    (5.72)
When observing that, because of (5.38), α_{gl} = Γ_l(D_g^+) − Γ_l(D_g^-), we can rewrite S_D using (5.59) and permuting the two sums in (5.69):

S_D = Σ_{l ∈ X_D ∪ (Y \ Y_0)} d_l [ Σ_{g ∈ D^+(l)} Γ_l(D_g^+) u_g − Σ_{g ∈ D^-(l)} Γ_l(D_g^-) u_g ] − Σ_{g ∈ V_D} β_g u_g.    (5.73)
Similarly, using (5.37) and (5.61), we can modify S_{I^c} and S_I to obtain

S_{I^c} = Σ_{l ∈ X_{B_I}^c} d_l [ Σ_{g ∈ I^+(l) \ B_I} Γ_l(I_g^+) u_g − Σ_{g ∈ I^-(l) \ B_I} Γ_l(I_g^-) u_g ]    (5.74)
and

S_I = Σ_{l ∈ X_{B_I}} d_l [ Σ_{g ∈ B_I ∩ I^+(l)} Γ_l(I_g^+) u_g ] − Σ_{g ∈ B_I} β_g u_g.    (5.75)

Expressing now the condition ∂L(d, u)/∂d_l = 0 for l ∈ X ∪ Z and combining the terms from (5.74) and (5.75) yields that

d_l = [l ∈ X ∪ Z] \bar d_l + [l ∈ X_D ∪ (Y \ Y_0)] ( Σ_{g ∈ D^+(l)} Γ_l(D_g^+) u_g − Σ_{g ∈ D^-(l)} Γ_l(D_g^-) u_g ) + [l ∈ X_I] ( Σ_{g ∈ I^+(l)} Γ_l(I_g^+) u_g − Σ_{g ∈ I^-(l)} Γ_l(I_g^-) u_g ).    (5.76)

Finally, since d_l = β_l for l ∈ Y \ Y_0, we obtain the desired expression of d_l using (5.57), (5.60), (5.62) and (5.63) in (5.76). □

Of course, the multipliers u_q (q ∈ E) in (5.64) are not constrained to be nonnegative.
5.4.4 The primal step direction
So far, we have been able to calculate N^T n_q, the dual step direction r (by triangular substitutions), and the update of the class density values. We now specialize the computation of the primal step direction s, when it is nonzero, and of its inner product with n_q, when n_q is linearly independent from the columns of N.
Lemma 5.3 Assume that the inverse shortest path algorithm has reached the point where the primal step direction is to be computed, and assume that A = (V, Y) is the active set at this stage of the calculation. Then, when nonzero, the step direction s in the primal space is given by

s_l = [l ∈ X] ( n_q(l) + Σ_{k ∈ F^-(l)} Γ_l(J_k^-) r_k − Σ_{k ∈ F^+(l)} Γ_l(J_k^+) r_k )    (5.77)

for l = 1, ..., L. Moreover, if q is the index of a violated island or a violated dependent set that is not a bound on a class density, we have that

s^T n_q = Σ_{i ∈ 𝒟} ( n_q(i)^2 + n_q(i) [ Σ_{k ∈ F^-(i)} Γ_i(J_k^-) r_k − Σ_{k ∈ F^+(i)} Γ_i(J_k^+) r_k ] ),    (5.78)

where

𝒟 = { l ∈ X | ∃ a_k ∈ Φ_l : a_k ∈ I_q, or c_l ∈ D_q }.    (5.79)

If q is the index of a bound constraint on the l_q-th class density, we then have that, for a lower bound,

s^T n_q = Γ_{l_q}(J_q^+) + Σ_{k ∈ F^-(l_q)} Γ_{l_q}(J_k^-) r_k − Σ_{k ∈ F^+(l_q)} Γ_{l_q}(J_k^+) r_k,    (5.80)

and, for an upper bound,

s^T n_q = Γ_{l_q}(J_q^-) − Σ_{k ∈ F^-(l_q)} Γ_{l_q}(J_k^-) r_k + Σ_{k ∈ F^+(l_q)} Γ_{l_q}(J_k^+) r_k.    (5.81)

Proof.
From the definition of H in (5.21) and that of r = N^* n_q, the primal step direction s = H n_q may be rewritten as n_q − N r. Then, using (5.49), the l-th component of s can be expressed as

s_l = Γ_l(J_q^+) − Γ_l(J_q^-) − Σ_{k=1}^{|A|} ( Γ_l(J_k^+) − Γ_l(J_k^-) ) r_k.    (5.82)

Eliminating the null terms and using (5.63), we obtain that

s_l = Γ_l(J_q^+) − Γ_l(J_q^-) + Σ_{k ∈ F^-(l)} Γ_l(J_k^-) r_k − Σ_{k ∈ F^+(l)} Γ_l(J_k^+) r_k.    (5.83)

Moreover, if l ∉ X, s_l must be zero, since a class density at a bound cannot change as long as the bound constraint is active. We thus obtain (5.77) by using (5.46). Assume now that the q-th constraint is a lower (resp. upper) bound on a class density, the l_q-th one say. Then n_q = e_q (resp. −e_q), the q-th vector of the canonical basis. Hence the product s^T n_q is equal to s_q (resp. −s_q), and (5.80) (resp. (5.81)) follows from (5.77) and the fact that q ∈ X, since d_q violates a bound.
On the other hand, if the q-th constraint is a violated island or a violated dependent set that is not a bound constraint,

s^T n_q = Σ_{i=1}^{L} s_i ( Γ_i(J_q^+) − Γ_i(J_q^-) ),    (5.84)

where s_i has been established in (5.77). Note that both Γ_i(J_q^+) and Γ_i(J_q^-) may be nonzero, at variance with what happens in the uncorrelated inverse shortest path problem of Chapter 4. Finally, we obtain (5.78) from (5.84) by eliminating the terms whose contribution is zero. □
5.4.5 The maximum steplength to preserve dual feasibility
Since equality constraints have been added for our correlated problem, the set S containing the constraints that are candidates to leave the active set A is

S = { j ∈ {1, ..., |A|} \ V_E | r_j > 0 },    (5.85)

where

V_E := {currently active equality constraints}.    (5.86)
5.4.6 The algorithm
We are now able to describe the algorithm for solving our correlated inverse shortest path problem. In this description, we will use a small machine dependent tolerance ε > 0 to detect to what extent a real value is nonzero, and we define the integer ω = |A|. Note that the revision of the active set method is similar to that presented in Chapter 4. The following algorithm does not take the linear constraints on densities (5.13)-(5.14) into account, simply because they can be handled by the Goldfarb and Idnani method.
Algorithm 5.1
Step 0: Initialization. Set d ← \bar d, f ← 0, A ← ∅, ω ← 0 and u ← 0. Set also w ← \bar w, where \bar w is defined componentwise by

\bar w_i = π_i \bar d_{ℓ_i}    for i = 1, ..., m.    (5.87)

Step 1: Compute the current shortest paths. For j = 1, ..., n_E, compute the shortest paths from s_{j1} to every vertex in P_j \ {s_{j1}} in the graph (V, A, w).

Step 2: Choose a violated constraint or exit. Select J_q, an island whose excess E_q is negative, if any. If no such island exists, then d is optimal and the algorithm stops. Otherwise, compute the normal n_q to the violated constraint, reduced to the space of the densities R^L according to (5.46). If ω = 0, then set δ ← ||n_q|| and go to Step 4. Otherwise (that is, if ω > 0), set

u^+ ← (u, 0).    (5.88)

Step 3: Compute the dual step direction. Compute the vectors z and r by the two triangular substitutions described after Lemma 5.1. Compute also δ according to

δ = sqrt( ||n_q||^2 − ||z||^2 ).    (5.89)

Step 4: Determine the maximum steplength to preserve dual feasibility. Determine the set S by (5.85), then t_f and possibly ℓ using (3.40), where u = u^+.

Step 5: Determine the steplength to satisfy the q-th constraint. If δ ≤ ε, then go to Step 6b. Otherwise, compute s and s^T n_q as described in Lemma 5.3, and t_c according to (3.41).

Step 6: Take the step and revise the active set.
6a: Compute the steplength t = min(t_f, t_c), set d ← d + t s, update the arc weight values w by (5.5), and revise f according to (3.45), where u = u^+, and u using

u ← u^+ + t (−r, 1)  if ω > 0;    u ← u^+ + t  if ω = 0.    (5.90)

If t = t_c, set A ← A ∪ {q}, ω ← ω + 1 and go to Step 7a. Otherwise (that is, if t = t_f), set A ← A \ {ℓ}, ω ← ω − 1 and go to Step 7b.
6b: If t_f = +∞, then the problem is infeasible, and the algorithm stops with a suitable message. Otherwise, update the Lagrange multipliers according to (3.42), where u = u^+. Set A ← A \ {ℓ}, ω ← ω − 1 and go to Step 7b.

Step 7: Revise the triangular factor R.
7a: Add the constraint normal n_q to N. If ω = 1, then set R = (δ) and go to Step 1. Otherwise (that is, if ω > 1), update the upper triangular matrix R using

R ← [ R  z ; 0  δ ]    (5.91)

and go to Step 1.
7b: Drop n_ℓ from N. Remove from R the column corresponding to the ℓ-th island, and use Givens rotations to restore it to upper triangular form. Go to Step 3.
In our algorithmic framework, the computation of the new values of the primal variables may be completely deferred until after that of the dual step, in contrast with the original method of Goldfarb and Idnani. Note that in the second step of our current implementation of the algorithm, we do not specify how to select a violated constraint; we examine two possibilities in the next section. Also remark that the calculation of t_f ensures that equality constraints can never leave the active set.
5.5 Numerical experiments
In this section, we compare, on the one hand, the performance of our method solving the correlated inverse shortest path problem against that of the method solving the uncorrelated problem, and, on the other hand, the performance of two implementations of the algorithm solving the correlated problem.
5.5.1 Implementation remarks
In our implementations, we used linked lists to represent the partition of the arcs into classes. The difficulty has been to set up efficient data structures to calculate the quantities Γ_l(J_k) easily, since they must be evaluated at each stage of the algorithm. Handling at the same time both constraints on class densities and constraints expressed with arc weights has also been a challenging part of these implementations. We also developed a tool for generating various test problems: this generator is able to create random, 2D and 3D grid graphs, with or without diagonals, and to generate layered or random densities with associated correlations for the arc weights; constraints can be generated at random, or between two or more faces when applied to grid graphs, including explicit, implicit and equality constraints, as well as other linear constraints that are useful to model real practical problems. Our algorithms were implemented in Fortran 77 on a DECstation 3100, using double precision arithmetic with the Mips f77 compiler. All shortest path calculations are performed using Johnson's algorithm with a binary heap [67], which is presented in Chapter 2.
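Since the weights here are nonnegative, the core of Johnson's algorithm is a heap-based Dijkstra search from each source, which can be sketched as follows (illustrative names, not the Fortran 77 code of the thesis):

```python
import heapq

def dijkstra(adj, source):
    """Heap-based Dijkstra: adj maps a vertex to a list of (neighbour,
    weight) pairs with nonnegative weights; returns a dict of shortest
    path costs from source to every reachable vertex."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue                          # stale heap entry, skip it
        for w, cost in adj.get(v, []):
            nd = d + cost
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist
```

A binary heap gives the O((n + m) log n) complexity exploited in the experiments below; stale heap entries are simply skipped rather than decreased in place.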
5.5.2 Correlated method versus uncorrelated method
We tested Algorithm 5.1 against that presented in Chapter 4 (Algorithm 4.1) on correlated problems typically arising in traffic modelling and seismic tomography. These problems mostly contain constraints of type (5.8), with a few general linear constraints of types (5.13) and (5.14). A large number of problems of increasing dimension were generated. Each problem was determined by specifying a sparse graph of n vertices joined by m arcs. The path constraints were generated randomly. Among those generated, we selected a few representative problems whose characteristics appear in Table 5.1. We recall that L is the number of classes in the correlated problem, and n_E
handling correlations between arc weights
91
the number of known shortest paths defining the constraints of the problem. The graph of test problem 2 is illustrated in Figure 5.1; it is in fact extracted from a larger graph covering a whole city in a realistic application.
       L     n     m     n_E
  1    9     16    24    12
  2    36    49    84    24
  3    100   121   220   56
  4    289   324   612   144
  5    625   676   1300  312
  6    1600  1681  3280  650

Table 5.1: Test problems involving class densities

The results of the test runs are reported in Table 5.2. The columns labeled "CORRELATED" and "UNCORRELATED" refer to the correlated method and the uncorrelated method respectively. As already mentioned above, the number of variables is much smaller when solving a correlated problem with the former method than with the latter. The label "var" indicates this number of variables in each case. |A| is the number of active constraints at the solution. The third column for each method gives the number of dropped constraints, that is the number of minor iterations. The reader can then deduce the number of major iterations required to solve the problem as the sum of the number of minor iterations and |A| + 1. Finally, the heading "time" refers to the total cpu time in seconds needed to obtain the solution, and "sp time" is the time in seconds spent in calculating shortest path trees.
       CORRELATED                                 UNCORRELATED
       var   |A|  drops  time      sp time        var   |A|  drops  time     sp time
  1    9     3    0      0.230     0.011          24    3    0      0.851    0.011
  2    36    4    0      0.433     0.144          84    4    0      0.925    0.066
  3    100   4    0      0.894     0.425          220   5    0      1.269    0.367
  4    289   13   0      9.750     4.703          612   16   1      7.285    5.175
  5    625   41   1      130.582   49.304         1300  31   2      41.078   34.046
  6    1600  89   3      1958.714  473.363        3280  125  1      569.003  461.949

Table 5.2: Comparative test results for the correlated and uncorrelated algorithms

The following figure shows the results obtained in Table 5.2 with the correlated algorithm. The left-hand histogram illustrates the total number of iterations, partitioned into drops and major iterations. The right-hand graphic brings out the time spent in calculating shortest paths with respect to the overall algorithm runtime. The corresponding results obtained via the uncorrelated algorithm are represented in Figure 4.2 of Chapter 4. We first notice that the correlated method runs faster on the smaller problems (1-3), while this is not the case on the larger ones (4-6). But, of course, the arc weights produced by the
[Figure 5.4: The correlated algorithm: iterations per problem size (drops and major iterations) and the share of the shortest path calculations in the overall runtime.]

uncorrelated method lack the necessary correlation between their values, although these values do generally correspond in order of magnitude. The usefulness of these weights can therefore be questioned for practical applications. The new method, on the other hand, produces the desired correlations, as expected. The shortest path tree calculation requires roughly the same amount of time for both methods, even when the problem dimension increases; this time logically tends to vary in parallel with the number of major iterations. In order to explain the cpu time differences obtained for tests 4-6 (the correlated method taking much more time, despite the shortest path computations being comparable), we evaluated the time spent in each procedure of both methods using the UNIX profiler. It turns out that 40% of the time is used for the shortest path calculations in the correlated method, while this proportion increases to 80% for the uncorrelated method. The additional computation in the new method corresponds to calculating the values of Γ_l(J_k), and can take up to 50% of the total execution time. This calculation relates the smaller problem, in terms of class densities, to the larger one, in terms of arc weights.
5.5.3 Selecting violated constraints
We now provide a comparison between the performance of two implementations of our correlated method. Both implementations start by incorporating the equality constraints, because these must be active at the solution. The first variant then chooses the most violated constraint for inclusion in the active set (a greedy approach). The second simply incorporates the first violated constraint found in the list of the original problem constraints, instead of the most violated one. This latter
strategy might prove cheaper in computation time, because the procedure of selecting the next constraint to add to the active set is considerably simpler. We tested these variants on problems where the number of equality constraints is half the total number of constraints. The characteristics of the 6 test problems are given in Table 5.3.

        L     n     m     n_E
  7     9     16    42    12
  8     36    49    156   24
  9     64    81    144   24
  10    225   256   930   60
  11    256   289   544   70
  12    441   484   924   140

Table 5.3: Test problems with equality constraints

The results of these test runs are summarized in Table 5.4.
       |A|   MOST VIOLATED (MV)            FIRST VIOLATED (FV)
             drops  time     sp time       drops  time      sp time
  1    8     0      0.188    0.044         2      0.179     0.004
  2    18    0      0.457    0.093         1      0.425     0.051
  3    24    5      1.386    0.292         17     1.960     0.183
  4    76    15     60.105   7.046         76     135.450   4.281
  5    68    6      31.214   3.984         88     94.839    5.796
  6    154   17     373.980  27.191        327    1786.385  56.53

Table 5.4: Test results on equality constraints

Again, we propose in Figure 5.5 illustrations of the results obtained by these variants of the correlated algorithm. The contents of the left-hand and right-hand graphics are the same as those explained above for Figure 5.4. The upper illustrations apply to the algorithm choosing the first violated constraint, and the lower graphics illustrate the algorithm choosing the most violated constraint as candidate to enter the active set. We note that variant MV gives the smallest number of drops and also the smallest computing time for the larger problems. Variant FV only seems interesting for smaller cases. We also tried to handle the equality constraints just as the other ones, without giving them priority to enter the active set. The results obtained are sometimes better than the worst of MV and FV, but never better than the best. Of course, more experience is required before drawing extensive and definitive conclusions. But we feel that the reported tests already illustrate some major trends in the use of inverse shortest path algorithms.
[Figure 5.5: Algorithm variants: iterations per problem size and shortest paths calculation, for the variant choosing the first violated constraint (top) and the variant choosing the most violated constraint (bottom).]
6 Implicit shortest path constraints
In this chapter, we examine the computational complexity of the inverse shortest path problem with upper bounds on shortest path costs. The presence of such upper bounds makes the inverse shortest path problem harder to solve. Indeed, an upper bound constraint restricts the cost of a path that is not known explicitly, and therefore cannot be expressed as one or more linear constraints; the problem can then become nonconvex. Solving this problem has interesting implications, notably in seismic tomography, where ray paths between known locations are usually not observable and hence unknown. We will prove that obtaining a globally optimal solution to this problem is NP-complete: we show that a polynomial transformation of the well-known 3SAT problem can be viewed as an inverse shortest path problem with upper bounds on shortest path costs. An algorithm for finding a locally optimal solution is then proposed and discussed. The local optimality conditions allow us to define a "stability region" around a local solution when the shortest paths defined at that solution are unique. When the shortest paths are not unique, a combinatorial strategy is set up; this is necessary to obtain a solution for which a stability region can again be defined. We will see that the stability region of a local solution depends on the second shortest path costs: the idea is to define a region in which the explicit definition of the shortest paths does not change. Our algorithm, using an enumeration strategy, has been implemented and tested on problems arising in practical applications. These tests bring out the fact that very few problems among those not generated at random require recourse to our combinatorial strategy. They also illustrate that the combinatorial aspect of our problem may appear in practice, even though shortest path uniqueness is usually expected in double precision arithmetic. The content of this chapter is reported in [17].
6.1 Motivating examples
The number of possible and interesting variants of the inverse shortest path problem is large. Yet many applications, including tomography and some traffic modelling questions, feature a specific class of constraints in their formulation: bounds on the total weight of shortest paths between given origins and destinations. Unfortunately, only lower bounds on path costs have
implicit shortest path constraints
been considered so far. It is the purpose of this chapter to examine the more difficult case where upper bounds are present as well. We now motivate this development with two examples. The first arises from seismic tomography. In this field, one is concerned with recovering ground layer densities from observations of seismic waves [79]. According to Fermat's principle, these waves propagate along rays that follow the shortest path in time across the earth's crust. One can then measure, usually with some error, the propagation time of these rays between a known source and a known receiver. The problem is then to reconstruct the ground densities from these observations. One approach [87] uses a discretisation of the propagation medium into a network whose arcs have weights inversely proportional to the local density. In this framework, one is then faced with the problem of recovering these arc weights from the knowledge of intervals on seismic ray travel times and from a priori geological knowledge, the ray paths themselves remaining unknown. This is an inverse shortest path problem with bounds on the paths' weights. The second example is drawn from traffic modelling. In this research area, graph theory is used to create a simplified view of a road network. An elementary and often justified behavioural assumption is that network users choose perceived shortest routes for their journeys [92, 13, 84]. Although these routes might be observable, their precise description might vary across time and individuals, and their travel cost is usually subject to some estimation. This naturally provides bounds on the total time spent on shortest paths whose definition is unavailable. Recovering the perceived arc costs is an important step in the analysis of network users' behaviour. This is again a problem of the type considered in this chapter.
In the next section, we formalize the problem and explain why these bound constraints on shortest path costs cannot fit into the framework of classical convex quadratic programs, as used in Chapters 4 and 5.
6.2 The problem
We now define the problem more formally and discuss its special nature. Consider a directed weighted graph $(V, A, w)$, where $(V, A)$ is an oriented graph with $n$ vertices and $m$ arcs, and where $w$ is a set of nonnegative weights $\{w(i)\}_{i=1}^m$ associated with the arcs; we use $(i)$ to denote the $i$th component of a vector. Let $V$ be the set of vertices of the graph and $A = \{a_k\}_{k=1}^m$ be the set of arcs. A path is then defined as a set of consecutive arcs of $A$. As presented in Chapter 4, the idea of the inverse shortest path method is to determine the arc weights that are as close as possible to their expected values, subject to satisfying the shortest path constraints. Denoting these a priori expected values by $\{\bar w(i)\}_{i=1}^m$ and choosing the $\ell_2$ norm to measure the proximity between the weight vectors $w$ and $\bar w$, we therefore consider the following least squares problem
$$\min_{w \in \mathbb{R}^m} f(w) = \frac{1}{2} \sum_{i=1}^{m} \left( w(i) - \bar w(i) \right)^2 \eqno(6.1)$$
subject to the constraints
$$w(i) \geq 0, \quad i = 1, \ldots, m, \eqno(6.2)$$
and the bound constraints on the cost of shortest paths
$$\sum_{a \in p^1_q(w)} w(a) \leq u_q, \quad q = 1, \ldots, n_I, \eqno(6.3)$$
where $p^1_q(w)$ is the shortest path with respect to the weights $w$ starting at vertex $o_q$ and arriving at vertex $d_q$¹. The values $u_q$ are upper bounds on the cost of the shortest path from $o_q$ to $d_q$; we allow $u_q$ to be infinite. Note that the shortest path $p^1_q(w)$ is not necessarily unique for a given $w$. This will have important implications later in this chapter.

The method proposed in Chapter 4 is based on the quadratic programming algorithm due to Goldfarb and Idnani [55]. The idea is to compute a sequence of optimal solutions to the problem involving only a subset of the constraints present in the original problem. The method therefore maintains an active set of constraints. Starting from the unconstrained solution, each iteration incorporates a new constraint into the active set, completing what we call a major iteration. To achieve this goal, it may be necessary to drop a constraint from the active set. These drops occur in minor iterations. Incorporating upper bound constraints of the type (6.3) in the active set is complex. The difficulty is that expression (6.3) only defines the path $p^1_q(w)$ implicitly, while adding a constraint to the active set as a linear inequality on arc weights requires an explicit definition of that constraint of the form
$$\sum_{a \in p} w(a) \leq u_q, \eqno(6.4)$$
where one needs the explicit definition of the path $p$ as a succession of arcs to specify which arcs appear in the summation. When such a constraint is activated, one naturally chooses a path which is currently shortest given the value of the arc weights $w(a)$. However, as these weights are modified in the course of the optimization, the path that is shortest between a given origin and destination may vary, and therefore the explicit definition of the constraint in the form (6.4) should also vary accordingly.

An immediate consequence of this observation is that, besides adding and dropping constraints of the type (6.4) from the active set, one should also keep track of the modifications in the explicit definitions of the constraints (6.3), which might in turn modify the active set.
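To make the discussion concrete, here is a small sketch of how a constraint of type (6.3) is made explicit: a currently shortest path is computed and turned into the indicator row of the linear inequality (6.4). This is illustrative Python, not the thesis's Fortran 77/C implementation, and the function names are ours.

```python
import heapq

def dijkstra_path(n_vertices, arcs, w, origin, dest):
    """Return one shortest path, as a list of arc indices, from origin to dest.
    arcs is a list of (tail, head) pairs; w[i] is the weight of arc i."""
    adj = [[] for _ in range(n_vertices)]
    for i, (u, v) in enumerate(arcs):
        adj[u].append((v, i))
    dist = {origin: 0.0}
    pred = {}  # vertex -> index of the arc used to reach it
    heap = [(0.0, origin)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, i in adj[u]:
            nd = d + w[i]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                pred[v] = i
                heapq.heappush(heap, (nd, v))
    path, v = [], dest
    while v != origin:
        i = pred[v]
        path.append(i)
        v = arcs[i][0]
    return list(reversed(path))

def explicit_constraint(m, path):
    """Indicator row n_q of the explicit linear inequality (6.4)."""
    row = [0.0] * m
    for i in path:
        row[i] = 1.0
    return row

# Tiny example: vertices 0 (= o), 1, 2 (= d); arcs a1 = (0,1), a2 = (1,2), a3 = (0,2)
arcs = [(0, 1), (1, 2), (0, 2)]
w = [2.0, 2.0, 10.0]
p = dijkstra_path(3, arcs, w, 0, 2)      # currently shortest path: a1 then a2
n_q = explicit_constraint(len(arcs), p)  # row to hand to the active-set machinery
```

When the weights change and another path becomes shortest, the row `n_q` must be rebuilt, which is precisely the bookkeeping described above.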
6.3 The complexity of the problem
6.3.1 The convexity of the problem
The difficulty of handling constraints of type (6.3) can be partially explained by the fact that they generate a nonconvex feasible region. Our goal of finding a global minimizer of the objective
¹The superscript 1 in $p^1_q(w)$ indicates that the shortest path is considered, as opposed to the second shortest.
function (6.1) with a method of low complexity then appears much more difficult, despite the fact that the objective is strictly convex. Let us recall the small example of Chapter 3, dedicated to illustrating the nonconvex nature of our constraints. Consider the following graph, composed of 3 vertices and 3 arcs ($m = 3$), shown in Figure 6.1.
[Figure: a graph on 3 vertices in which a two-arc path ($a_1$, $a_2$) and the single arc $a_3$ both lead from vertex $o$ to vertex $d$.]
Figure 6.1: A small graph

Consider now the problem of minimizing (6.1) subject to the constraint
$$\sum_{a \in p^1(w)} w(a) \leq 5, \eqno(6.5)$$
where $p^1(w)$ is the shortest path with respect to the weights $w$ from vertex $o$ to vertex $d$. It is easy to see that $w_1 = (2\;\, 2\;\, 10)^T$ and $w_2 = (10\;\, 10\;\, 4)^T$ are feasible solutions, while $\frac{1}{2}(w_1 + w_2) = (6\;\, 6\;\, 7)^T$ is infeasible. The problem is therefore nonconvex.
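The nonconvexity of this example can be checked mechanically. In the sketch below (illustrative Python), the shortest path cost is written directly as the minimum of the two possible path costs in the graph of Figure 6.1, and the constraint (6.5) is evaluated at $w_1$, $w_2$ and their midpoint:

```python
# Shortest path cost from o to d in the graph of Figure 6.1:
# either the two-arc path (a1, a2) or the direct arc a3.
def sp_cost(w):
    return min(w[0] + w[1], w[2])

u = 5.0                                            # upper bound of constraint (6.5)
w1 = (2.0, 2.0, 10.0)
w2 = (10.0, 10.0, 4.0)
mid = tuple((a + b) / 2 for a, b in zip(w1, w2))   # (6, 6, 7)

print(sp_cost(w1) <= u)    # True:  cost 4, feasible
print(sp_cost(w2) <= u)    # True:  cost 4, feasible
print(sp_cost(mid) <= u)   # False: cost 7, the midpoint is infeasible
```

Two feasible points whose midpoint is infeasible: the feasible set defined by (6.5) is nonconvex.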
6.3.2 The 3SAT problem as an inverse shortest path calculation
In this section, we use the terminology of complexity theory introduced in Chapter 3. As mentioned in Chapter 4, the original inverse shortest path problem is solvable in polynomial time, since an equivalent formulation of the problem contains a polynomial number of convex constraints. The original problem thus belongs to the class P of decision problems solvable in polynomial time by a deterministic algorithm [50, 69, 111, 113]. A problem is NP-hard if every problem in the class NP, the problems solvable in polynomial time on a nondeterministic Turing machine, can be transformed to it (see [50]). Cook [24] proved that there exist NP-hard problems by showing that the "satisfiability" problem has the property that every other problem in NP can be polynomially reduced to it. Therefore, if the satisfiability problem can be solved with a polynomial time algorithm, then so can every problem in NP. In effect, the satisfiability problem is a "hardest" problem in NP. Many other combinatorial problems, such as the travelling salesman problem, have since been proved to have this same "universal" property. Vavasis proved in [113] that the general nonconvex quadratic problem is one of those hard problems in NP. The class of NP-complete problems consists of all such problems which themselves belong to NP. We now show that the addition of the constraints (6.3) makes the inverse shortest path problem NP-hard. This we do by reducing a known NP-hard problem to it.
A particular instance of the satisfiability problem, the 3SAT problem, is one of the best known NP-complete problems. We follow [50] for its brief description. Let $X$ be a set of Boolean variables $\{x_1, x_2, \ldots, x_l\}$. A truth assignment for $X$ is a function $t : X \to \{\text{true}, \text{false}\}$. Let $x$ be a variable in $X$; we say that $x$ is realized under $t$ if $t(x) = \text{true}$. The variable $\neg x$ is realized under $t$ if and only if $t(x) = \text{false}$. We say that $x$ and $\neg x$ are literals defined upon the variable $x$. A clause is a set of literals over $X$, such as $\{x_1, x_2, \neg x_3\}$, representing the disjunction of those literals; it is satisfied by a truth assignment if and only if at least one of its members is realized under that assignment. A set $C$ of clauses over $X$ is satisfiable if there exists some truth assignment for $X$ that simultaneously satisfies all the clauses in $C$. The 3SAT problem consists in answering the question: is there a truth assignment satisfying $C$, when the clauses in $C$ contain exactly 3 literals over $X$? Cook proved that this problem is NP-complete [24]. We can show that another problem is NP-hard by showing that 3SAT can be polynomially transformed to it. Let ISP denote the decision problem: given an inverse shortest path problem and a bound $k$, does there exist a solution with objective value at most $k$? We show that ISP is NP-complete, which implies that the inverse shortest path problem is NP-hard.
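For illustration, satisfiability of a small clause set can be decided by brute force over the $2^l$ truth assignments; this exponential enumeration is exactly what makes the problem hard, and what the polynomial reduction below sidesteps. The encoding of a literal as a pair (variable index, negated?) is our own convention for this sketch:

```python
from itertools import product

def satisfiable(n_vars, clauses):
    """Brute-force check: is some truth assignment realizing at least one
    literal of every clause?  A literal is a pair (variable index, negated?)."""
    for t in product([False, True], repeat=n_vars):
        # literal (v, neg) is realized under t iff t[v] != neg
        if all(any(t[v] != neg for v, neg in clause) for clause in clauses):
            return True
    return False

# Example with l = 3: (x1 or x2 or not x3) and (not x1 or not x2 or x3)
clauses = [[(0, False), (1, False), (2, True)],
           [(0, True), (1, True), (2, False)]]
print(satisfiable(3, clauses))  # True, e.g. under x1 = true, x2 = false
```

This checker is exponential in $l$, of course; it is only meant to fix the definitions of truth assignment, literal and clause used in the reduction.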
Theorem 6.1 ISP is NP-complete.

Proof.
We proceed as follows:
1. show that problem ISP is in NP,
2. construct a transformation from 3SAT to ISP, and
3. prove that this is a polynomial transformation.

The first requirement is easy to verify in our case: the shortest path problem itself can be solved in polynomial time, and all a nondeterministic algorithm solving ISP need do is guess a set of arc weights and verify in polynomial time that they satisfy the constraints. Let us now examine the second requirement and consider a 3SAT problem with $l$ variables and $p$ clauses. We represent each variable $x_i$ by a small subgraph with two distinct paths between a node $s_i$ and a node $d_i$ (see Figure 6.2). The variable $x_i$ will be true or false depending
[Figure: the gadget for variable $x_i$: from node $s_i$, an upper path through vertices $u^1_i$, $u^2_i$ and a lower path through vertices $l_{i1}$, $l_{i2}$ both lead to node $d_i$; each of the six arcs has cost 1.]
Figure 6.2: The representation of $x_i$

on whether the shortest path from $s_i$ to $d_i$ follows the upper path (via vertices $u^1_i$, $u^2_i$) or the lower
one (via vertices $l_{i1}$, $l_{i2}$) of its associated graph. Imposing that
$$s_i = d_{i-1} \quad \text{for } i = 2, \ldots, l, \eqno(6.6)$$
we obtain a chain-like resulting graph representing our Boolean variables. A path from vertex $s_1$ to vertex $d_l$ in this graph is therefore equivalent to a truth assignment of all Boolean variables. We assign an initial cost of 1 to each of the six arcs of the "Boolean graph" $x_i$. We now describe a representation of our $p$ clauses. A clause $c$ of the 3SAT problem is a disjunction of the type $x_i \vee x_j \vee \neg x_k$, for instance. The clause $c$ is associated with the choice among three possible paths going from a vertex named $a_c$ to a vertex named $b_c$, where $a_c$ and $b_c$ are different from the vertices of the Boolean graphs $x_i$ ($i = 1, \ldots, l$). Each of the three paths is formed by three consecutive oriented arcs. The first arc originates at vertex $a_c$ and has a zero cost, and the last one terminates at vertex $b_c$ and has a zero cost too. The middle arc is one of the arcs $(l_{i1}, l_{i2})$ or $(u^1_i, u^2_i)$, depending on whether the considered variable $x_i$ in the clause is negated or not. The subgraph associated with a clause $c$ of the type $x_i \vee x_{i+1} \vee \neg x_k$ is illustrated in Figure 6.3.
[Figure: the subgraph for a clause $c = x_i \vee x_{i+1} \vee \neg x_k$: three parallel paths from vertex $a_c$ to vertex $b_c$, each consisting of a zero-cost arc leaving $a_c$, a middle arc borrowed from the gadget of the corresponding variable, and a zero-cost arc entering $b_c$.]
Figure 6.3: The subgraph associated with clause $c$

Our representation of the variables $x_i$ and the clauses $c_j$ generates a weighted oriented graph, which we call $G$. The original cost of any path between $a_c$ and $b_c$ ($c = 1, \ldots, p$) is 1, and the cost of the shortest path from $s_1$ to $d_l$ is $3l$. The 3SAT problem is then equivalent to the question: is there a choice of nonnegative arc weights in $G$ such that the cost of the shortest path between
each pair of nodes $(a_c, b_c)$ is zero, as well as that of the shortest path from $s_1$ to $d_l$, and such that the $\ell_2$ distance of these weights to the original weights is at most $3l$? The equality constraints on the shortest paths in this formulation may be replaced by upper bound constraints, provided that we require the arc weights $w$ to be as close as possible to the original weights $\bar w$. The resulting problem is therefore
$$\min_w \| w - \bar w \|_2 \eqno(6.7)$$
subject to
$$w \geq 0, \eqno(6.8)$$
$$\mathrm{cost}(a_c, b_c) \leq 0, \quad c = 1, \ldots, p, \eqno(6.9)$$
$$\mathrm{cost}(s_1, d_l) \leq 0, \eqno(6.10)$$
where $\mathrm{cost}(n_1, n_2)$ is the cost of a shortest path from $n_1$ to $n_2$ in $G$. We recognize, in the formulation (6.7)–(6.10), an instance of our inverse shortest path problem with upper bound constraints on the cost of shortest paths. We have thus found a transformation from the 3SAT problem to ISP. Finally, it is easy to see that this transformation is polynomial, since the instance of ISP we constructed has $6l + 6p$ arcs and $5l + 1 + 2p$ nodes. This completes our proof that ISP is NP-complete. □
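The size counts used at the end of the proof can be spelled out as follows (an illustrative Python sketch of the counting only, not of the full construction of $G$; the helper name is ours):

```python
def graph_size(l, p):
    """Vertices and arcs of the graph G built from l variables and p clauses."""
    # each variable gadget contributes 6 vertices, but s_i coincides with
    # d_{i-1} for i = 2..l; each clause adds the two vertices a_c and b_c
    n_vertices = 6 * l - (l - 1) + 2 * p     # = 5l + 1 + 2p
    # each gadget has 6 arcs; each of the 3 paths of a clause adds 2 new
    # zero-cost arcs (its middle arc already belongs to a gadget)
    n_arcs = 6 * l + 2 * 3 * p               # = 6l + 6p
    return n_vertices, n_arcs

print(graph_size(3, 2))  # (20, 30)
```

Both counts are linear in $l$ and $p$, which is what makes the transformation polynomial.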
6.4 An algorithm for computing a local optimum
We saw in Chapter 3 that for convex problems, each critical point is a global optimum. In our nonconvex context, we shall be content with a local minimum, that is, a set of weights $\hat w$ such that, for all $w$ in the intersection of the feasible domain and a neighbourhood of $\hat w$, one has $f(\hat w) \leq f(w)$.

In Chapters 4 and 5, we considered a dual approach to find a global solution to the inverse shortest path problem, because of its robustness and the assurance of reaching a global optimum. Yet this approach presents a drawback in our new context: it relies heavily on convexity. On the other hand, primal methods typically generate a sequence of primal feasible iterates ensuring a monotonic decrease of the objective function. They do not rely as much on convexity and have the further advantage of giving an approximate solution satisfying the constraints when the iteration process is interrupted. This approach "from the inside" is the one we have chosen to follow. The general outline of our proposal is as follows.
1. We first compute a feasible starting point.
2. At each iterate, we revise the explicit definition of the shortest path constraints, and solve the resulting convex problem, using the algorithm proposed in Chapter 4.
3. The calculation is stopped when no further progress can be obtained in this fashion.
6.4.1 Computing a starting point
Selecting a good starting point is important in this framework. We propose to use Algorithm 4.1 to find our starting weights. This algorithm will indeed compute an optimal solution of a variant of the problem where the explicit description of the path constraints is kept fixed at that chosen for $\bar w$. This variant is
$$\min_{w \in \mathbb{R}^m} \frac{1}{2} \sum_{i=1}^{m} \left( w(i) - \bar w(i) \right)^2 \eqno(6.11)$$
subject to
$$w(i) \geq 0, \quad i = 1, \ldots, m, \eqno(6.12)$$
and the bound constraints on the cost of the a priori shortest paths
$$\sum_{a \in p^1_q(\bar w)} w(a) \leq u_q, \quad q = 1, \ldots, n_I. \eqno(6.13)$$
It is important to note that $w_1$, the solution of (6.11)–(6.13), is feasible for the original ISP, although it may not be optimal, because at $w_1$ the explicit definition of the shortest path constraints may differ from that at $\bar w$. This calculated vector is therefore a suitable starting point for a primal algorithm.
6.4.2 Updating the explicit constraint description

Given a set of weights $w$, we must choose an explicit description of the constraints associated with $w$. Together with the quadratic objective function, this new explicit description then defines a convex quadratic program. We emphasized the word "choose" above because there might be more than a single shortest path $p^1_q(w)$ between the origin $o_q$ and destination $d_q$ of the $q$th original constraint in the ISP. Denoting the number of shortest paths from $o_q$ to $d_q$ for a given $w$ by $n_q(w)$, we define $p^1_q(w; i_q)$, $i_q = 1, \ldots, n_q(w)$, as the $i_q$th shortest path from $o_q$ to $d_q$. This definition assumes that we have ordered the $n_q(w)$ shortest paths, for instance using lexicographic order. For convenience, we redefine $p^1_q(w)$ as the "first" shortest path from $o_q$ to $d_q$, that is,
$$p^1_q(w) \stackrel{\rm def}{=} p^1_q(w; 1). \eqno(6.14)$$
For future reference, the possible convex feasible regions determined, for a given $w$, by the constraints (6.2)–(6.3) will be denoted by $F(w; i_1, \ldots, i_{n_I})$, where $i_q$ ($q = 1, \ldots, n_I$) varies from 1 to $n_q(w)$. Again, for convenience, we define
$$F(w) \stackrel{\rm def}{=} F(w; \underbrace{1, \ldots, 1}_{n_I}). \eqno(6.15)$$
Using these notations, $P(w)$ and $P(w; i_1, \ldots, i_{n_I})$ respectively denote the problem of minimizing $f(w)$ subject to $w \in F(w)$, or to $w \in F(w; i_1, \ldots, i_{n_I})$. Finally, $F$ denotes the generally nonconvex feasible domain determined by (6.2) and (6.3). Updating the constraint description at $w$ therefore amounts to specifying $F(w; i_1, \ldots, i_{n_I})$ for some choice of the indices $i_1, \ldots, i_{n_I}$.
6.4.3 Reoptimization

Once $F(w; i_1, \ldots, i_{n_I})$ has been determined at the feasible point $w$, it is possible to solve the associated convex quadratic program $P(w; i_1, \ldots, i_{n_I})$. This process is called "reoptimization". Because we assume that reoptimization will always take place at a point $w$ which is the solution of another subproblem $P(w'; i'_1, \ldots, i'_{n_I})$, it is not difficult to see that the new subproblem differs from the old one in two different ways.

1. Some constraints of $P(w'; i'_1, \ldots, i'_{n_I})$ are now obsolete because the associated path, although shortest for $w'$, is no longer shortest for $w$. These constraints must be replaced by constraints whose explicit description corresponds to paths that are shortest for $w$.

2. Although $p^1_q(w'; i'_q)$ can still be shortest for $w$, another shortest path between $o_q$ and $d_q$ may be chosen to define the new subproblem. The constraint whose explicit description corresponds to $p^1_q(w; i'_q)$ must then be replaced by another constraint with explicit description corresponding to $p^1_q(w; i_q)$.

As a consequence, some linear inequalities of the form (6.4) are dropped from the subproblem and some new ones are added. Adding new linear inequalities can be handled computationally by using the Goldfarb–Idnani dual quadratic programming method, as is already the case in Chapter 4. Removing linear inequalities can be handled in much the same way, by computing the Goldfarb–Idnani step that would add them and then taking the opposite. These calculations are straightforward applications of the method presented in Chapter 4; they are detailed and illustrated in Section 6.5.
6.4.4 The algorithm
We are now in position to specify our proposal for an algorithm that computes a local solution to ISP.
Algorithm 6.1

Step 0: Initialization.
Compute $w_1$ using Algorithm 4.1, the inverse shortest paths algorithm of Chapter 4, for solving $P(\bar w)$. Set $i \leftarrow 1$ and $C_1 \leftarrow (1, \ldots, 1)$.

Step 1: Update the feasible region.
Compute $F(w_i; C_i)$.

Step 2: Reoptimization.
Compute $w_{i+1}$, the solution of $P(w_i; C_i)$, using the inverse shortest paths Algorithm 4.1. If $w_{i+1} \neq w_i$, set $i \leftarrow i + 1$ and go to Step 1 with $C_i = (1, \ldots, 1)$.

Step 3: Choose another shortest path combination.
Is there, amongst the $n^c_i = \prod_{q=1}^{n_I} n_q(w_i)$ possible shortest path combinations at $w_i$, one that has not been considered yet? If no, stop: $w_i$ is a local minimum of $P(w)$. Otherwise, redefine $C_i$ to be $(i_1, \ldots, i_{n_I})$, the $n_I$-tuple of indices corresponding to an untried combination, and go to Step 1.

The reader might wonder whether the possibly costly loop between Steps 3 and 1 is necessary. We now show that this is the case by providing a simple example in which it is not sufficient to examine the $(1, \ldots, 1)$ combination of shortest paths only, or even to consider every possible shortest path separately. Consider the small graph, composed of 9 vertices and 11 arcs, shown in Figure 6.4.
[Figure: a graph on the 9 vertices $a, \ldots, i$ with 11 arcs.]

Figure 6.4: A small example showing path combinations

Let us assume that $\bar w(i) = 10$ for $i = 1, \ldots, 11$, and consider the problem of minimizing (6.1) with $m = 11$ subject to 12 constraints of type (6.3), defined by
$$\begin{array}{lll}
o_1 = a, & d_1 = b, & u_1 = 10, \\
o_2 = a, & d_2 = c, & u_2 = 10, \\
o_3 = d, & d_3 = b, & u_3 = 5, \\
o_4 = d, & d_4 = c, & u_4 = 5, \\
o_5 = e, & d_5 = f, & u_5 = 10, \\
o_6 = f, & d_6 = g, & u_6 = 10, \\
o_7 = h, & d_7 = i, & u_7 = 10, \\
o_8 = i, & d_8 = g, & u_8 = 10, \\
o_9 = e, & d_9 = a, & u_9 = 5, \\
o_{10} = h, & d_{10} = a, & u_{10} = 5, \\
o_{11} = c, & d_{11} = g, & u_{11} = 5, \\
o_{12} = b, & d_{12} = g, & u_{12} = 5.
\end{array} \eqno(6.16)$$
We see directly that, at any solution, all arcs but $(a, d)$ will have a weight equal to 5, since $n_q = 1$ for all $q \neq 1, 2$. Suppose now that, for these latter constraints, the shortest paths have been ordered as in (6.16) and have been considered by the algorithm in that order. As a consequence, solving
the problem in the feasible region $F(\bar w; 1, \ldots, 1)$ will give the solution $\hat w(i) = 5$ ($i = 1, \ldots, 11$), since the shortest path from $a$ to $b$ and that from $a$ to $c$ both use vertex $d$. The objective function value at $\hat w$ is 137.5. Note now that, at $\hat w$, the shortest paths are not unique between the origin–destination pairs $(a, b)$ and $(a, c)$, since the paths $a - f - b$ and $a - i - c$ are also shortest. Furthermore, this set of weights can be improved by considering $P(\hat w; 2, 2, 1, \ldots, 1)$, whose solution has every arc weight equal to 5 except that of arc $(a, d)$, which is equal to 10, and where the objective function has the value 125. Moreover, examining every possible shortest path separately would not allow any progress, because successively solving $P(\hat w; 2, 1, \ldots, 1)$ and $P(\hat w; 1, 2, \ldots, 1)$ still gives the same solution $\hat w$. It is therefore crucial to consider every combination of the shortest paths that are not unique at a potential solution.
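The enumeration needed in Step 3 can be sketched as a Cartesian product over the per-constraint multiplicities $n_q(w_i)$; in the example above only $n_1 = n_2 = 2$, all other constraints having a unique shortest path. Illustrative Python, using the standard library:

```python
from itertools import product

# Multiplicities n_q for the 12 constraints of the example: two shortest paths
# for constraints 1 and 2, a unique one for the rest.
n_q = [2, 2] + [1] * 10

# All n_I-tuples (i_1, ..., i_12) with 1 <= i_q <= n_q: the candidate
# combinations C_i that Step 3 may have to try.
combinations = list(product(*(range(1, n + 1) for n in n_q)))

print(len(combinations))   # 4, the product of the n_q
print(combinations[0])     # (1, 1, ..., 1), the combination tried first
```

In the example, the winning combination $(2, 2, 1, \ldots, 1)$ is one of these four tuples, which is why trying single deviations $(2, 1, \ldots)$ and $(1, 2, \ldots)$ separately is not enough.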
6.4.5 Some properties of the algorithm
In this section, we examine some properties of the algorithm proposed above. In particular, we show its termination and analyze the "stability" of the local solution it produces.
Theorem 6.2 Algorithm 6.1 terminates in a finite number of iterations.
Proof. The number of paths between two vertices is finite, since the number of arcs $m$ is finite. As a consequence, the number of different convex polygons $F(w_i; C_i)$ computed at Step 1, and the numbers $n^c_i$ calculated at Step 3, are also finite. The algorithm consists of a sequence of convex inverse shortest path problems differing by the actual shortest paths used in the explicit description of the constraints. Furthermore, each of these subproblems is considered at most once and is solvable in a finite number of operations. The complete algorithm therefore also terminates in a finite number of steps. □

Let us consider the point $\hat w$ obtained at termination of the algorithm. We now show that $\hat w$ is a local minimum of our problem (6.1)–(6.3) and analyze the neighbourhood $V(\hat w)$ around $\hat w$ in which every other feasible point has a higher objective function value. In other words, we show that $\hat w$ is locally "stable" as a local minimum in a neighbourhood $V(\hat w)$ of $\hat w$ in which all the explicit shortest paths defining the constraints (6.3) remain unchanged when they are unique. The solution's "stability" therefore depends on "how far" the second shortest paths are from $\hat w$. Considering the $q$th shortest path constraint, we denote the cost of the "optimal" shortest path from $o_q$ to $d_q$ by $P^1_q$, that is,
$$P^1_q \stackrel{\rm def}{=} \sum_{a \in p^1_q(\hat w)} \hat w(a). \eqno(6.17)$$
We already mentioned that $p^1_q(\hat w)$ may not be unique, although $P^1_q$ is. We then define a second shortest path from $o_q$ to $d_q$ as a path whose cost is closest to, but strictly larger than, that of $p^1_q(\hat w)$, i.e. $P^1_q$. The first such second shortest path in our predefined path order is denoted, if it exists, by $p^2_q(\hat w)$ and its cost by $P^2_q$. If $p^2_q(\hat w)$ does not exist, then we set $P^2_q = \infty$ by convention. With these additional notations, we are now in position to state the next property of our algorithm.
Theorem 6.3 The point $\hat w$ computed by Algorithm 6.1 is a local optimum of $P(w)$, the original problem. Moreover, $f(\hat w) \leq f(w)$ for every $w$ in
$$V(\hat w) \stackrel{\rm def}{=} \{ w \in F \mid \| w - \hat w \|_\infty < \min_q (P^2_q - P^1_q) \}, \eqno(6.18)$$
where $\| \cdot \|_\infty$ is the usual $\ell_\infty$ norm.

Proof. Let us consider the conditions under which $p^1_q(\hat w)$ may vary around $\hat w$, and define a stability neighbourhood $V_q(\hat w)$ associated with each shortest path constraint. Four cases need to be examined.

1. $P^2_q = \infty$ and $n_q = 1$. In this situation, the path from $o_q$ to $d_q$ is unique and $p^1_q(w)$ is obviously constant for all $w \in \mathbb{R}^m$. We then define $V_q(\hat w) = \mathbb{R}^m \cap F = F$.

2. $P^2_q = \infty$ and $n_q > 1$. There is now more than one path from $o_q$ to $d_q$, but they all have the same cost $P^1_q$. In this case, an infinitesimal change in the cost $\hat w$ may cause the feasible polygon defined at $\hat w$ to change. However, since $\hat w$ is a point produced by our algorithm, choosing any of the $n_q - 1$ other possible polygons does not produce an objective function decrease. This indicates that $f(\hat w)$ may not be improved upon in the neighbourhood $V_q(\hat w) = F$.

3. $P^2_q \neq \infty$ and $n_q = 1$. In this situation, the explicit description of the shortest path $p^1_q(w)$ will not change until its cost reaches that of the second shortest path. More precisely, $p^1_q(w)$ is constant in the neighbourhood
$$V_q(\hat w) = \{ w \in F : \| w - \hat w \|_\infty < P^2_q - P^1_q \}. \eqno(6.19)$$

4. $P^2_q \neq \infty$ and $n_q > 1$. This is a combination of the two previous cases. As above, $f(\hat w)$ cannot be improved upon in the neighbourhood $V_q(\hat w) = \{ w \in F : \| w - \hat w \|_\infty < P^2_q - P^1_q \}$.

Moreover, the algorithm's mechanism implies that we cannot find a point better than $\hat w$ by considering all combinations of constraint definitions as examined above for a single constraint. As a consequence, $\hat w$ will be a "stable" solution in the neighbourhood
$$V(\hat w) = \bigcap_{q=1}^{n_I} V_q(\hat w) = \{ w \in F : \| w - \hat w \|_\infty < \min_q (P^2_q - P^1_q) \}. \eqno(6.20)$$
□
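For a graph as small as that of Figure 6.1, the quantities $P^1_q$ and $P^2_q$ entering the stability region can be obtained by enumerating all simple path costs. This is an illustrative sketch only (in practice one would use proper second-shortest-path machinery); the name `x` for the intermediate vertex is ours:

```python
def all_path_costs(adj, w, origin, dest, seen=()):
    """Yield the cost of every simple path from origin to dest.
    adj maps a vertex to a list of (successor, arc index) pairs."""
    if origin == dest:
        yield 0.0
        return
    for v, i in adj.get(origin, []):
        if v not in seen:
            for c in all_path_costs(adj, w, v, dest, seen + (origin,)):
                yield w[i] + c

# Graph of Figure 6.1: arcs a1 = (o, x), a2 = (x, d), a3 = (o, d)
adj = {"o": [("x", 0), ("d", 2)], "x": [("d", 1)]}
w = [2.0, 2.0, 10.0]

costs = sorted(all_path_costs(adj, w, "o", "d"))           # [4.0, 10.0]
P1 = costs[0]                                              # first shortest cost
P2 = next(c for c in costs if c > P1)                      # second shortest cost
radius = P2 - P1   # this constraint's contribution to min_q (P2_q - P1_q)
print(P1, P2, radius)  # 4.0 10.0 6.0
```

Within an $\ell_\infty$ ball of this radius around the current weights, the explicit definition of this shortest path constraint cannot change.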
We now examine the case where the original ISP problem also features lower bounds on the costs of the shortest paths between given origins and destinations, that is, constraints of the type
$$0 \leq l_q \leq \sum_{a \in p^1_q(w)} w(a), \eqno(6.21)$$
for $q = 1, \ldots, n_I$, where $l_q$ can be chosen as zero and $l_q \leq u_q$. These constraints are much easier to handle, because the inequality (6.21) must be satisfied for every possible path from $o_q$ to $d_q$. Of course, the number of these linear constraints is typically very high, but the situation is entirely similar to that handled in Chapters 4 and 5. As a consequence, the technique developed in those chapters is directly applicable to each convex subproblem arising in the course of the solution of problem (6.1)–(6.3), (6.21).
6.5 The reoptimization procedure
In this section, we analyse the reoptimization procedure of Step 2 of our algorithm and give some results about updating the objective function $f$ and the variables of the problem. Let us first introduce some notation.
6.5.1 Notations
Let us consider the problem $P(w; i_1, \ldots, i_{n_I})$. If we "freeze" the arc weight values $w$, we may rewrite the $i$th constraint of type (6.3) as
$$E_i(w) \stackrel{\rm def}{=} n_i^T w - b_i \geq 0, \quad i = 1, \ldots, n_I, \eqno(6.22)$$
where $n_i \in \mathbb{R}^m$ and $b_i = -u_i$ for $i = 1, \ldots, n_I$. The vector $n_i$ represents the normal to the $i$th constraint. The matrix of the normal vectors of the constraints in the active set indexed by $A$ will be denoted by $N$. $A^-$ will denote a subset of $A$ containing one fewer element than $A$, and $N^-$ will represent the matrix of normals corresponding to $A^-$. The normal $n_r$ will designate the column deleted from $N$ to give $N^-$; the index set $A^-$ then designates $A \setminus \{r\}$. Since the Hessian $G$ of the objective function (6.1) equals the identity, the Moore–Penrose generalized inverse of $N$ in the space of variables under the transformation $y = G^{1/2} w$ simply is
$$N^\dagger \stackrel{\rm def}{=} (N^T N)^{-1} N^T, \eqno(6.23)$$
and
$$H \stackrel{\rm def}{=} I - N N^\dagger, \eqno(6.24)$$
the orthogonal projection onto the nullspace of $N^T$, is then the inverse reduced Hessian of the quadratic objective function in the subspace of weights satisfying the active constraints. Denoting the gradient of $f$ by $g(w) = w - \bar w$, we designate the Lagrange multipliers at the point $w$ by $u(w)$. Let us define $P(A)$ as the problem of minimizing (6.1) subject to the subset of constraints (6.22) indexed by $A$ and considered as equalities. As proved in Chapter 3, at the optimal solution $\hat w$ of problem $P(A)$, we can write (3.33) and (3.34) as
$$u(\hat w) = N^\dagger g(\hat w) \geq 0 \eqno(6.25)$$
and
$$H g(\hat w) = 0, \eqno(6.26)$$
respectively. This formulation comes from the fact that $g(\hat w)$ is a linear combination of the columns of $N$, $g(\hat w) = N u(\hat w)$, as is implied by the first order condition (6.25). Remember that conditions (6.25) and (6.26) are also sufficient to characterize $\hat w$. Finally, $H^-$ will denote the operator (4.9) with $N$ replaced by $N^-$, and we will use similar notations for $u^-$.
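The conditions (6.25)–(6.26) can be checked numerically. The sketch below treats the simplest case of a single active constraint, where $N$ has one column and $N^\dagger$ and $H$ reduce to scalar formulas (illustrative Python; the data are ours, with $G = I$ so no change of variables is needed):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

w_bar = [2.0, 2.0, 10.0]    # a priori weights, the unconstrained minimizer
n = [1.0, 1.0, 0.0]         # normal of the single active constraint n^T w = b
b = 5.0

# Equality-constrained minimizer of (6.1): the projection of w_bar onto n^T w = b
step = (b - dot(n, w_bar)) / dot(n, n)
w_hat = [x + step * y for x, y in zip(w_bar, n)]

g = [x - y for x, y in zip(w_hat, w_bar)]     # gradient g(w_hat) = w_hat - w_bar
u = dot(n, g) / dot(n, n)                     # multiplier: u = N^dagger g  (6.25)
Hg = [gi - u * ni for gi, ni in zip(g, n)]    # H g = (I - N N^dagger) g    (6.26)

print(w_hat)  # [2.5, 2.5, 10.0]
print(u)      # 0.5
print(Hg)     # [0.0, 0.0, 0.0]: the reduced gradient vanishes at w_hat
```

Both conditions hold: the gradient is a (here nonnegative) multiple of the constraint normal, and its projection on the nullspace of $N^T$ is zero.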
6.5.2 How to reoptimize

Suppose that $\hat w$ is the solution to the problem $P(A)$ and that there exists $1 \leq i \leq n_I$ such that
$$\sum_{a \in p^1_i(\hat w)} \hat w(a) < \sum_{a \in p_i} \hat w(a), \eqno(6.27)$$
where pi is the path from oi to di that was considered as shortest between these two vertices just before reaching w as current solution. Let r refer to the index to the explicit active constraint ^ associated to the path pi which is no longer shortest this supposes that r 2 A. The needed reoptimization then consists of nding w , the optimal solution to P A, since constraint r ceases being active, that is w = w2M f w; min 6:28 where M = fw 2 Rm j nT w = bi; i 2 A, g. Since w 2 M, w is attained at the wellknown point ^ i , g w see Theorem 3.6. This solves the reoptimization in the primal space. w,H ^ ^ Now, in order to update the the Lagrange multipliers uw, we need the steplength t such ^ that w = w + ts; ^ 6:29 where s is the primal step direction solving the reoptimization starting from w. Then, ^
uw = uw + td; ^
6:30
where d is the dual step direction solving the reoptimization. We are now concerned with finding t and d. The method to find w* from ŵ is precisely the reverse of that computing ŵ from w*, that is, computing the solution of P(A^- ∪ {r}) knowing that of P(A^-). Figure 6.5 shows this feature when |A| = 2. The reverse problem is solved via the method of Goldfarb and Idnani (see Chapter 3). Since n_r is linearly independent from {n_i}_{i ∈ A^-}, we apply that method in the case where the constraint to be added (of normal n_r) is linearly independent from the active set A^-. Assuming that w* is the optimal solution to P(A^-), we obtain the optimum of P(A^- ∪ {r}) at ŵ = w* + t s' by Lemma 3.7, where

    s' = H^- n_r.        (6.31)

The corresponding Lagrange multipliers are

    u(ŵ) = ( u(w*), 0 )^T + t ( -d^-, 1 )^T,

with

    d^- = N^{-*} n_r        (6.32)

and

    t = u_r(ŵ) = - E_r(w*) / (s'^T n_r).        (6.33)

[Figure 6.5: Solving P(A) and P(A^-): one iteration. The figure shows, for |A| = 2, the feasible region, the normals n_i and n_r, and the points w̄, ŵ and w*.]

Since s = -s' and d = ( d^-, -1 )^T, we have proved the following theorem.
Theorem 6.4 If ŵ is the solution of P(A) and if s = -H^- n_r, then the weight vector w* = ŵ + t s, such that t = u_r(ŵ), verifies the optimality conditions of P(A^-), that is, the primal optimality of w*,

    H^- g(w*) = 0,        (6.34)

the primal feasibility of w*,

    E_i(w*) := n_i^T w* - b_i ≥ 0,   i ∈ A^-,        (6.35)

and the dual feasibility of w*,

    u^-(w*) = N^{-*} g(w*) ≥ 0.        (6.36)

As a consequence, note that

    f(w*) - f(ŵ) = (1/2) u_r(ŵ)^2 n_r^T s ≤ 0,        (6.37)

because n_r^T s ≤ 0, since H^- is positive semidefinite.
6.6 Some numerical experiments
We now present some results obtained with a preliminary implementation of the algorithm described above.
6.6.1 Implementation details
Our method has been implemented in double precision Fortran 77 and C and has been run on a DECstation 3100 under Ultrix. This implementation has been very challenging, especially the part related to the combinatorial strategy coming into play when the shortest paths are not unique at a potential solution. Indeed, remember that in this case we must enumerate and examine all shortest path combinations. First, we had to modify our shortest path algorithm in order to be able to compute how many shortest paths there were between two vertices, and to compute the kth shortest path when needed, because it was obviously out of the question to store them all. This has been achieved recursively in the C language. Then we set up a combinatorial procedure to determine a "not yet visited" shortest path combination (again recursively in C) without storing any further information. These C procedures were called from Fortran subroutines. A last difficult point has been to detect a change in the feasible region: this has been achieved by comparing temporary files containing active shortest path descriptions at different stages of the algorithm. Our program selects among possible active shortest paths in Step 3 by examining first the paths that differ as close to the destination as possible, ties being broken by considering vertices in their numbering order. The shortest path calculations are performed by Johnson's variant of Dijkstra's algorithm using a binary heap (see Chapter 2 for its description).
6.6.2 Tests
We have selected a few problems whose graphs, shortest path constraints and a priori weights have been generated in different ways. The problems and their characteristics are summarized in Table 6.1. In this table, the heading "vertices" refers to the number of vertices in the graph, "graph type" indicates how the network is generated, "weights" indicates how the a priori weights are chosen (a layered choice means that subsets of arcs were chosen with constant costs, corresponding to grid levels in the case of grid-like graphs), and "constraints" indicates how the shortest path constraints are chosen: either by choosing origins and destinations at random, or by choosing them along the faces of the grids, when applicable.
Problem   vertices   m     n_I   graph type   weights    constraints
Example   9          11    12    (see Section 6.4.4)
P1        100        180   125   2D grid      constant   random + faces
P2        181        504   100   2D grid      constant   faces
P3        100        180   100   2D grid      layered    random
P4        100        210   100   random       random     random
P5        100        180   70    2D grid      layered    faces
P6        181        504   100   2D grid      layered    faces
P7        500        769   26    random       layered    random
P8        300        860   400   random       random     random

Table 6.1: Test examples and their characteristics

All these problems but P1 were solved, in that a local minimum was found for all of them.
The results of applying our pilot code to these problems are reported in Table 6.2. In this table, i and n_c = \sum_{j=1}^{i} n_c^j refer to the number of iterations of the algorithm and the total number of possible active shortest paths combinations, respectively, as described in Section 6.4.4. The column "comb" indicates how many of the n_c paths combinations were effectively examined by the algorithm before termination. The symbol - means that it has not been possible to solve problem P1 in less than a week on our workstation.
Problem   i    n_c      comb
Example   2    4        4
P1        -    4·10^9   -
P2        2    48       1
P3        1    0        0
P4        1    0        0
P5        1    0        0
P6        2    0        0
P7        1    0        0
P8        1    0        0
Table 6.2: Results for the test problems

The following comments can be made on these results.

1. Many of our problems were solved with a single iteration (i = 1) of our algorithm, but not all of them. However, the number of iterations remains small on these examples.

2. As expected, problems with randomly generated weights were solved in a single iteration.

3. As shown by the behaviour of the algorithm on problem P1, the combinatorial aspect of the method may appear in practice. The development of better heuristics to improve the choice of the active paths combination therefore seems useful. Comments about P1's results are made below.

4. The detection of paths with equal costs is nontrivial in finite precision. We have chosen to consider all paths whose relative costs differ by at most one hundred times the machine precision. Further consideration should probably be given to this potentially important stability issue.

Many tests were performed on 2D grid graphs (see Table 6.1). The results of Table 6.2 indicate more than one iteration of Algorithm 6.1 only when applied to these graphs, and particularly when the constraints are generated between faces of the grid. When generated at random, the shortest paths are very likely "independent", in the sense that their explicit definitions should share very few arcs, and should be "far" from each other. As a consequence, activating such a constraint would not influence the explicit definition of other shortest path constraints. In contrast, as lots of shortest paths are defined between faces of a grid, their explicit definitions
should share many arcs and usually be "close" to each other, so that modifying arc weights on one of them may very likely modify in turn the explicit definition of proximate shortest paths. Table 6.2 also indicates that the combinatorial nature of the problem showed up again only with grid graphs. This is due to the fact that the nonuniqueness of shortest paths depends on the density of the graph m/n: an increase in that value tends to reduce the nonuniqueness of the shortest paths. This trend in 2D grids has been analysed by Moser in his Ph.D. thesis [87]. We report here one interesting result he mentioned in his report. Let us consider a 2D grid and denote its nodes by a double index (i, j) specifying their Cartesian position in the grid. The number of shortest paths from node (0, 0) to node (i, j) is designated by s(i, j). We choose the simplest 2D grid, which is squared or cross-ruled like the streets of New York, where each link or arc has the same weight. For convenience, we do not mention the grid size and will refer to the entire grid by specifying i, j ≥ 0. In such grids, s(i, j) can be recursively computed by the following relation:
    s(i+1, j+1) = s(i+1, j) + s(i, j+1),   for i, j ≥ 0,        (6.38)

with the initial condition that

    s(i, 0) = s(0, j) = 1,   for i, j ≥ 0.        (6.39)
Indeed, there are two shortest paths from (i, j) to (i+1, j+1): one going through node (i+1, j) and the other one through node (i, j+1). We can make use of an auxiliary function f(x, y) in order to solve (6.38) knowing (6.39):
    f(x, y) := \sum_{i,j ≥ 0} s(i, j) x^i y^j,        (6.40)

where

    0 ≤ x, y < 1.        (6.41)

The function f(x, y) can be developed as follows:

    f(x, y) = 1 + \sum_{i ≥ 1} s(i, 0) x^i + \sum_{j ≥ 1} s(0, j) y^j + \sum_{i,j ≥ 1} s(i, j) x^i y^j
            = 1 + x/(1-x) + y/(1-y) + \sum_{i,j ≥ 1} s(i, j-1) x^i y^j + \sum_{i,j ≥ 1} s(i-1, j) x^i y^j      [by (6.39), then (6.38)]
            = 1 + x/(1-x) + y/(1-y) + y \sum_{i ≥ 1, j ≥ 0} s(i, j) x^i y^j + x \sum_{i ≥ 0, j ≥ 1} s(i, j) x^i y^j
            = 1 + x/(1-x) + y/(1-y) + y [ f(x, y) - \sum_{j ≥ 0} s(0, j) y^j ] + x [ f(x, y) - \sum_{i ≥ 0} s(i, 0) x^i ]
            = 1 + (x + y) f(x, y),

so that

    f(x, y) = 1 / (1 - x - y).        (6.42)

We then have that

    1 / (1 - x - y) = \sum_{k ≥ 0} (x + y)^k
                    = \sum_{k ≥ 0} \sum_{i=0}^{k} \binom{k}{i} x^i y^{k-i}
                    = \sum_{i,j ≥ 0} \binom{i+j}{i} x^i y^j,        (6.43)
where \binom{b}{a} = b! / (a! (b-a)!) are the binomial coefficients, which represent the number of possible choices of a items amongst a set of b items, without distinguishing their order and without taking the same item twice. As a consequence, (6.40) and (6.43) give

    s(i, j) = \binom{i+j}{i} = (i+j)! / (i! j!).        (6.44)

It is easy to check that s(i, j) grows very rapidly, even when (i, j) lies not far from (0, 0): for instance, s(10, 10) = 184756. Moser mentioned similar results for 2D grids with higher densities. He observed that s(i, j) decreases very rapidly when each node originates an increasing number of arcs. However, this does not mean that shortest path uniqueness is attained with high densities. Moser extrapolated his observations and conjectured that shortest paths are uniquely determined only for nodes on a few straight² lines through (0, 0). This explains the potential nonuniqueness that is present in our grids, especially those built with constant arc weights. The resolution of problem P1 suffered from this potential nonuniqueness. Let us finally mention that, amongst all generated problems (not only those presented in this section), problem P1 is the only one which presented so strong a combinatorial aspect (n_c ≥ 10^9).
² Grids generated by Moser notably include square diagonals, which allow several nodes to be reached by straight lines from (0, 0).
7 Conclusion and perspectives
This thesis deals with instances of the inverse shortest path problem. Significant questions arising in applied mathematics can be formulated as instances of this inverse problem. Chapter 1 showed the pertinence of such a formulation in problems arising in traffic modelling and seismic tomography. In a typical inverse shortest path problem, we want to recover some attributes of a network, knowing information about the shortest paths between certain origin-destination pairs, for instance from observing actual flow on the network. To make the solution unique, we assume that a set of attributes is a priori known, and we would like the solution to be as close as possible to these a priori known values. Thus, the objective is to minimize the norm of the difference between the solution and the known vector of attributes. The constraints ensure that the shortest paths are the same, or verify the same properties, under the solution, and may represent other relationships between the paths or variables of the problem. Solving the problem requires an algorithm solving the direct problem (the shortest path problem) and an algorithmic framework that depends on the choice of the norm for evaluating the proximity of the solution to the a priori values. Chapter 2 discussed the choice of a shortest path algorithm with respect to the properties of the graphs representing the networks under study. Johnson's algorithm was selected to solve the direct problem in our context. This method could possibly be combined with updating techniques for efficiency. Chapter 3 examined the quadratic programming context and finally preferred the Goldfarb and Idnani approach to solve our problem. The need to handle an exponential number of linear constraints involving much redundancy has guided this choice. In the uncorrelated inverse shortest path problem, the variables are the weights on the arcs and there is no correlation between them.
In the correlated problem, however, the arcs are divided into classes, and the weights of the arcs in the same class are derived from the same value, which is referred to as the class's density. Thus, the weights within each group are correlated, and the variables are actually the densities. Correlation of course implies more restriction and hence more constraints. In Chapter 4, the uncorrelated inverse shortest paths problem has been posed and a computational algorithm has been proposed for one of the many problem specifications: the constraints are given as a set of shortest paths and nonnegativity constraints on the weights. The proposed algorithm has been programmed and run on a few examples, in order to prove the feasibility of
the approach. Chapter 5 provides a modified method for solving the inverse shortest path problem with correlated arc weights. To achieve this goal, we generalized the inverse shortest path method of Chapter 4 to take the desired correlation into account. We derived new expressions for the primal step and other quantities in the algorithm. We tested our new algorithm on a wide class of correlated problems and compared it with the original uncorrelated method. Two possible strategies for handling constraints were also considered and compared in this context. Finally, in Chapter 6, we have presented and motivated the inverse shortest path problem with upper bounds on shortest path costs. These constraints may no longer be expressed as sets of linear constraints. The resulting feasible region may therefore be nonconvex. The NP-completeness of finding a global solution of this problem has then been shown. An algorithm for local minimization has been presented, analyzed and tested on a few examples. In this thesis, we have supplied algorithms to solve the main common instances of the inverse shortest path problem. The possible extensions are many. New perspectives should concern the use of other norms in the objective function. The resulting stronger nonlinearity would then call for different approaches. Other types of constraint specifications are also of obvious interest. The problem of recovering attributes of the second, third, … shortest paths is also a challenging area: for instance, this could help in retrieving the successive shortest time waves in seismic tomography. Further research could cover heuristics for active path selection and stability analysis. We are also interested in applying the algorithms discussed in this thesis to practical cases in traffic engineering and computerized tomography. Extensions or applications of the inverse shortest path problem have been informally proposed after several conferences given on the subject. A. Lucena (London, 1991) perceived an application of our method to the update of Lagrange multipliers in an algorithm solving the time-dependent travelling salesman problem [80]. Another application, suggested by S. Boyd (August 1992), consists of finding or estimating transition probabilities in Markov chains given the maximum likelihood path. The formulation of this last problem includes an objective function that does not depend on a usual norm, but involves a "max" function. The objective function has the property of remaining convex in this particular definition. The connection between this problem and ours can be viewed when identifying the vertices with states or events of a Markov chain, arcs with possible transitions, paths with sequences of states, and arc weights with transition probabilities. We hope that numerous such extensions will still show up in future research.
A Symbol Index
This appendix is intended to help the reader in readily retrieving the meaning of a symbol used in Chapters 4 to 6.
Symbol   Purpose                                                            Definition            Page
a_k      kth arc of the oriented graph (V, A, w)                            Section 4.1           55
b_i      constant term in the linear constraint E_i(x)                      (4.7)                 57
c_l      class number l of the set C                                        Section 5.2.1         73
d_l      density value of the class c_l                                     Section 5.2.1         73
d̄_l      a priori expected value of d_l                                     Section 5.2.4         76
e_q      qth vector of the canonical basis in R^L                           (4.5), (5.17)         56, 76
f        objective function of the minimizing problem                       (4.7)                 57
h        number of constraints in the inverse shortest path problem         Section 4.1           55
m        number of arcs in A                                                Section 4.1           55
n        number of vertices in V                                            Section 4.1           55
n_c^i    number of shortest path combinations at ith iteration              Algo. 6.1             103
n_i      normal to the ith constraint E_i                                   (4.7)                 57
n_E      number of explicit shortest path constraints                       (4.1)                 55
n_I      number of implicit shortest path constraints                       (6.3)                 97
p_j      jth explicit path constraint                                       (4.1)                 55
p'_j     any path with same origin and destination as p_j                   (4.6)                 56
q        index of the current violated constraint                           (4.16)                59
r        dual step direction                                                (4.16)                59
s        primal step direction                                              (4.34), (5.77)        63, 87
s_k      index of the vertex at origin of the arc a_k (its source vertex)   Section 4.1           55
t_c      steplength to satisfy the qth constraint                           (3.41)                47
t_f      maximum steplength to preserve dual feasibility                    (3.40)                47
t_k      index of the vertex at end of the arc a_k (its target vertex)      Section 4.1           55
u        vector of the Lagrange multipliers, i.e. the dual variables        (3.33)                46
w_i      weight of the arc a_i                                              Section 4.1, (5.5)    55, 74
w̄_i      a priori expected value of w_i                                     Section 4.1, (5.87)   55, 88
y_i      ith component of the product N^T n_q                               (4.1)                 60
z        intermediate vector to compute r by triangular substitution        (4.23)                60
Symbol   Purpose                                                                                        Definition            Page
A        set of active constraints                                                                      Sect. 4.2.1 & 4.2.5   57, 61
C_i      the shortest path combination at iteration i                                                   Algo. 6.1             103
D        dependent set on the class densities                                                           (5.29)                80
D^+      the positive shore of D                                                                        (5.30)                80
D^-      the negative shore of D                                                                        (5.31)                80
D^+(l)   set of dependent constraints involving the class c_l in their positive shore                   (5.59), (5.62)        84
D^-(l)   set of dependent constraints involving the class c_l in their negative shore                   (5.59), (5.62)        84
E_i(x)   general form of a constraint                                                                   (4.7)                 57
F^+(l)   union of the sets D^+(l) and I^+(l)                                                            (5.63)                85
F^-(l)   union of the sets D^-(l) and I^-(l)                                                            (5.63)                85
G        Hessian matrix of the objective function                                                       (4.16)                59
H        the reduced inverse Hessian of f in the subspace of points satisfying the active constraints   (4.9)                 57
I        island on the arc costs, union of I^+ and I^-                                                  Section 4.2.2         58
I^+      the positive shore of I                                                                        Section 4.2.2         58
I^-      the negative shore of I                                                                        Section 4.2.2         58
I^+(l)   set of island constraints involving an arc belonging to the class c_l in their positive shore  (4.28), (5.61)        61, 84
I^-(l)   set of island constraints involving an arc belonging to the class c_l in their negative shore  (4.28), (5.61)        61, 84
J        general constraint interpreted as two shores, whether it is an island I or a dependent set D   Section 5.4.1         80
J^+      the positive shore of J                                                                        Section 5.4.1         80
J^-      the negative shore of J                                                                        Section 5.4.1         80
L        number of classes in C                                                                         Section 5.2.1         73
N        matrix whose columns are the n_i, the normals of the active constraints                        Section 4.2.1         57
N^*      Moore-Penrose generalized inverse of N                                                         (4.8)                 57
P(w)     problem of minimizing f over F(w)                                                                                    102
P_j      set of vertices attained by the path p_j                                                       (4.10)                58
Q        orthogonal matrix such that N = QU                                                             (4.18)                60
R        triangular factor matrix                                                                       (4.17)                59
S        set of non-equality constraints such that the matching r component is strictly positive        (3.39)                47
U        triangular factor matrix such that N = QU                                                      (4.18)                60
V        set of active constraints                                                                      (4.2.2)               59
V_I      set of currently active islands                                                                (5.52)                83
V_D      set of currently active dependent sets                                                         (5.51)                83
V_E      set of currently active equality constraints                                                   (5.86)                88
X        set of the indices of the classes that are involved in active constraints                      (4.26), (5.57)        61, 84
X_I      set of the indices of the classes that are involved in active islands                          (5.56)                84
X_D      set of the indices of the classes that are involved in active dependent sets                   (5.55)                84
Y        set of the indices of the classes whose density is currently at a bound                        (4.25), (5.54)        61, 84
Y_0      set of the indices of the classes whose density is currently zero                              (5.53)                83
Z        set of the indices of the classes that are not involved in the active constraints              (4.27), (5.58)        61, 84
Symbol   Purpose                                                                                              Definition       Page
ℓ        index of the constraint to drop out of the active set                                                (3.40)           47
ℓ_i      index of the class with which the arc a_i is associated                                              (5.5)            74
A        set of the arcs of the graph, numbered from 1 to m                                                   Section 4.1      55
B_i      set of class indices for computing N^T n_q                                                           (5.41)           82
C        set of classes, numbered from 1 to L                                                                 Section 5.2.1    73
D        set of class indices for computing s^T n_q                                                           (5.79)           87
F(w)     convex feasible region determined at w                                                               (6.15)           102
E        set indexing the equality constraints                                                                (5.14)           76
I        set indexing the inequality constraints                                                              (5.13)           75
V        set of vertices in the graph, numbered from 1 to n                                                   Section 4.1      55
         the bound value at which the class density d_l is currently fixed, when the bound is active          (5.54)           84
         proportion factor on the arc a_i to determine the cost w_i, when multiplied by the class density d_{ℓ_i}  (5.5)       74
         general coefficients defining the constraints on the class densities                                 (5.13), (5.14)   75
         number of arcs in the path p_j                                                                       (4.1)            55
         number of current active constraints, i.e. |A|                                                       Section 4.2.7    65
         constant term of the constraints on the class densities                                              (5.13), (5.14)   75
         lower bound value on the weight of a shortest path                                                   (5.12)           75
         1 if the class c_l belongs to the dependent shore D, 0 otherwise                                     (5.38)           81
         sum of the proportion factors of the arcs a_i of the class c_l belonging to the island shore I(l)    (5.37)           81
         the set of the arcs associated with the class c_l                                                    (5.36)           81
Bibliography
[1] A.V. Aho, J.E. Hopcroft and J.D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, MA, 1974.
[2] S.S. Anderson, Graph theory and finite combinatorics, Markham Publishing Company, Chicago, 180 pp., 1970.
[3] M. Avriel, Nonlinear Programming: Analysis and Methods, Prentice-Hall, Inc., Englewood Cliffs, NJ, 512 pp., 1976.
[4] E.M.L. Beale, "On Quadratic Programming", Naval Research Logistics Quarterly, vol. 6, pp. 227-244, 1959.
[5] R. Bellman, "On a routing problem", Quart. Appl. Math., vol. 16, pp. 87-90, 1958.
[6] D.P. Bertsekas, "A new algorithm for the assignment problem", Mathematical Programming, vol. 21, pp. 152-171, 1981.
[7] D.P. Bertsekas, "The Auction Algorithm for Assignment and Other Network Flows Problems: A Tutorial", INTERFACES, vol. 20:4, pp. 133-149, July-August 1990.
[8] D.P. Bertsekas, "An auction algorithm for shortest paths", Lab. for Information and Decision Systems Report P-2000, MIT, Cambridge, MA, 1990, revised February 1991. SIAM Journal on Optimization, to appear.
[9] D.P. Bertsekas, Linear Network Optimization: Algorithms and Codes, The MIT Press, Cambridge, Massachusetts, 359 pp., 1991.
[10] J.C.G. Boot, "Notes on quadratic programming: the Kuhn-Tucker and Theil-van de Panne conditions, degeneracy, and equality constraints", Management Science, vol. 8, No. 1, pp. 85-98, October 1961.
[11] J.C.G. Boot, "On trivial and binding constraints in programming problems", Management Science, vol. 8, pp. 419-441, 1962.
[12] J.C.G. Boot, Quadratic Programming, North-Holland Publishing Co., Amsterdam, 213 pp., 1964.
[13] P.H.L. Bovy and E. Stern, Route Choice: Wayfinding in Transport Networks, Kluwer Academic Publishers, Dordrecht, 1990.
[14] D. Burton, Analyse et implémentation de méthodes des plus courts chemins dans un réseau urbain, Master's Thesis, Facultés Universitaires Notre-Dame de la Paix, Namur, 1986.
[15] D. Burton and Ph.L. Toint, "On an instance of the inverse shortest paths problem", Mathematical Programming, vol. 53, pp. 45-61, 1992.
[16] D. Burton and Ph.L. Toint, "On the use of an inverse shortest paths algorithm for recovering linearly correlated costs", Mathematical Programming (to appear), 1993.
[17] D. Burton, B. Pulleyblank and Ph.L. Toint, "The inverse shortest path problem with upper bounds on shortest path costs", Internal Report, FUNDP, submitted to ORSA Journal on Computing, 1993.
[18] P.H. Calamai and A.R. Conn, "A stable algorithm for solving the multifacility location problem involving Euclidean distances", SIAM Journal on Scientific and Statistical Computing, vol. 4, pp. 512-525, 1980.
[19] P.H. Calamai and A.R. Conn, "A second-order method for solving the continuous multifacility location problem", in: G.A. Watson, ed., Numerical Analysis: Proceedings of the Ninth Biennial Conference, Dundee, Scotland, Lecture Notes in Mathematics 912, Springer-Verlag, Berlin, Heidelberg and New York, pp. 1-25, 1982.
[20] P.H. Calamai and A.R. Conn, "A projected Newton method for lp norm location problem", Mathematical Programming, vol. 38, pp. 75-109, 1987.
[21] A. Cayley, "On the theory of the analytical forms called trees", Philos. Mag., vol. 13, pp. 172-176, 1857. Mathematical Papers, Cambridge, vol. 3, pp. 242-246, 1891.
[22] A.R. Conn, "Constrained optimization using a nondifferentiable penalty function", SIAM Journal on Numerical Analysis, vol. 10, pp. 760-784, 1973.
[23] A.R. Conn and J.W. Sinclair, "Quadratic programming via a nondifferentiable penalty function", Department of Combinatorics and Optimization, University of Waterloo, Rep. CORR 75-15, 1975.
[24] S. Cook, "The complexity of Theorem Proving Procedures", Proc. 3rd Ann. ACM Symp. on Theory of Computing, Association for Computing Machinery, New York, pp. 151-158, 1971.
[25] W.K. Chen, Applied graph theory, North-Holland Publishing Company, 484 pp., 1971.
[26] N. Christofides, Graph Theory. An algorithmic approach, Academic Press, London, 400 pp., 1975.
[27] G.B. Dantzig, Quadratic Programming, A Variant of the Wolfe-Markowitz Algorithm, Research Report 2, Operations Research Center, University of California, Berkeley, 1961.
[28] E.V. Denardo and B.L. Fox, "Shortest-route methods: 1. reaching, pruning, and buckets", Operations Res., vol. 27, pp. 161-186, 1979.
[29] N. Deo, Graph Theory with Applications to Engineering and Computer Science, Prentice-Hall, Englewood Cliffs, NJ, 478 pp., 1974.
[30] N. Deo and C. Pang, "Shortest-Path Algorithms: Taxonomy and Annotation", Networks, vol. 14, pp. 275-323, 1984.
[31] R.B. Dial, "Algorithm 360: Shortest path forest with topological ordering", Commun. ACM, vol. 12, pp. 632-633, 1969.
[32] R.B. Dial, "A probabilistic multipath traffic assignment model which obviates path enumeration", Transportation Research, vol. 5, pp. 83-111, 1971.
[33] R.B. Dial, F. Glover, D. Karney and D. Klingman, "A computational analysis of alternative algorithms and labeling techniques for finding shortest path trees", Networks, vol. 9, pp. 215-248, 1979.
[34] E.W. Dijkstra, "A note on two problems in connexion with graphs", Numerische Mathematik, vol. 1, pp. 269-271, 1959.
[35] K.A. Dines and R.J. Lytle, "Computerized geophysical tomography", Proc. IEEE, vol. 67, pp. 1065-1073, 1979.
[36] R.M. Downs and D. Stea, Maps in minds, Harper and Row, New York, 1977.
[37] S.E. Dreyfus, "An appraisal of some shortest path algorithms", Operations Res., vol. 17, pp. 393-412, 1969.
[38] S.M. Easa, "Shortest route with movement prohibition", Transportation Research B, vol. 19, nr. 3, pp. 197-208, 1985.
[39] L. Euler, "Solutio problematis ad geometriam situs pertinentis", Comment. Academiae Sci. Imp. Petropolitanae, vol. 8, pp. 128-140, 1736. Opera Omnia Series, vol. I7, pp. 1-10, 1766. English translation in "The Königsberg bridges", Sci. Amer., vol. 189, pp. 66-70, July 1953.
[40] R. Fletcher, "A general quadratic programming algorithm", Journal of the Institute of Mathematics and its Applications, vol. 7, pp. 76-91, 1971.
[41] R. Fletcher, Practical Methods of Optimization, 2nd Edition, Wiley-Interscience, 436 pp., 1987.
[42] M. Florian, S. Nguyen and S. Pallottino, "A Dual Simplex Algorithm for Finding all Shortest Paths", Networks, vol. 11, pp. 367-378, 1981.
[43] R.W. Floyd, "Algorithm 97: shortest path", Comm. ACM, vol. 5, p. 345, 1962.
[44] L.R. Ford, "Network flow theory", Report P-923, The Rand Corporation, Santa Monica, CA, 1956.
[45] L.R. Ford and D.R. Fulkerson, Flows in networks, Princeton University Press, Princeton, NJ, 1962.
[46] M. Frank and P. Wolfe, "An algorithm for quadratic programming", Naval Research Logistics Quarterly, vol. 3, pp. 95-110, 1956.
[47] S. Fujishige, "A note on the problem of updating shortest paths", Networks, vol. 11, pp. 317-319, 1981.
[48] G. Gallo and S. Pallottino, "Shortest path methods: A unifying approach", Mathematical Programming Study, vol. 26, pp. 38-64, 1986.
[49] G. Gallo and S. Pallottino, "Shortest path algorithms", Annals of Operations Research, vol. 13, pp. 3-79, 1988.
[50] M.R. Garey and D.S. Johnson, Computers and intractability. A guide to the theory of NP-Completeness, W.H. Freeman and Company, San Francisco, 1979.
[51] J.A. George and J.W. Liu, Computer solution of large sparse positive definite systems, Prentice-Hall, Englewood Cliffs, 1981.
[52] P.E. Gill and W. Murray, "Numerically stable methods for quadratic programming", Mathematical Programming, vol. 14, pp. 349-372, 1978.
[53] D. Goldfarb, "Extension of Newton's method and simplex methods for solving quadratic programs", in: F.A. Lootsma, ed., Numerical methods for nonlinear optimization, Academic Press, London, pp. 239-254, 1972.
[54] D. Goldfarb, J. Hao and S.R. Kai, "Shortest path algorithms using dynamic breadth-first search", Networks, vol. 21, pp. 29-50, 1991.
[55] D. Goldfarb and A. Idnani, "A Numerically Stable Dual Method for Solving Strictly Convex Quadratic Programs", Mathematical Programming, vol. 27, pp. 1-33, 1983.
[56] G.H. Golub and C.F. van Loan, Matrix computations, North Oxford Academic, Oxford, 476 pp., 1983.
[57] M. Gondran and M. Minoux, Graphes et algorithmes, 2nd édition, Editions Eyrolles, Paris, 546 pp., 1990. English translation by S. Vajda, Graphs and Algorithms, Wiley-Interscience, NY, 1984.
[58] A.S. Gonçalves, "A primal-dual method for quadratic programming with bounded variables", in: F.A. Lootsma, ed., Numerical methods for nonlinear optimization, Academic Press, London, pp. 255-263, 1972.
[59] S. Goto, T. Ohtsuki and T. Yoshimura, "Sparse matrix techniques for the shortest path problem", IEEE Trans. Circuits and Systems, CAS-23, pp. 752-758, 1976.
[60] F. Glover, R. Glover and D. Klingman, "Computational study of an improved shortest path algorithm", Networks, vol. 14, pp. 25-, 1984.
[61] W.R. Hamilton, "Account of the icosian calculus", Proc. Roy. Irish Acad., vol. 6, pp. 415-416, 1853-7.
[62] D.Y. Handler and P.B. Mirchandani, Location on Networks: Theory and Algorithms, The MIT Press, Cambridge, Massachusetts, 233 pp., 1979.
[63] F. Harary, Graph theory, Addison-Wesley Publishing Company, 274 pp., 1972.
[64] F. Harary, R.Z. Norman and D. Cartwright, Structural Models: An Introduction to the Theory of Directed Graphs, John Wiley & Sons, Inc., New York, 1965.
[65] G.T. Herman, Image reconstruction from projections: the fundamentals of computerized tomography, Academic Press, New York, 1980.
[66] D.B. Johnson, "A note on Dijkstra's shortest path algorithm", J. Assoc. Comput. Mach., vol. 20, pp. 385-388, 1973.
[67] D.B. Johnson, "Efficient algorithms for shortest paths in sparse networks", J. Assoc. Comput. Mach., vol. 24, pp. 1-13, 1977.
[68] E.L. Johnson, "On shortest paths and sorting", Proceedings of the 25th ACM Annual Conference, pp. 510-517, 1972.
[69] R.M. Karp, "On the computational complexity of combinatorial problems", Networks, vol. 5, pp. 45-68, 1975.
[70] A. Kershenbaum, "A note on finding shortest path trees", Networks, vol. 11, pp. 399-400, 1981.
[71] G. Kirchhoff, "Über die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Verteilung galvanischer Ströme geführt wird", Ann. Phys. Chem., vol. 72, pp. 497-508, 1847.
[72] D.E. Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching, Addison-Wesley, Reading, MA, 1973.
[73] H.W. Kuhn and A.W. Tucker (Eds.), Linear inequalities and related systems, Princeton University Press, Princeton, NJ, 1956.
[74] C.E. Lemke, "The dual method for solving the linear programming problem", Naval Research Logistics Quarterly, vol. 1, No. 1, 1954.
[75] C.E. Lemke, "A method of solution for quadratic programs", Management Science, vol. 8, pp. 442-453, 1962.
[76] N.P. Loomba and E. Turban, Applied programming for management, Holt, Rinehart & Winston, Inc., 475 pp., 1974.
[77] F.A. Lootsma, ed., Numerical methods for nonlinear optimization, Academic Press, London, 440 pp., 1972.
[78] A.K. Louis, "Computerized tomography. I: Physical background and mathematical modelling", Extended version of a conference given in February 1984 at the Facultés Universitaires Notre-Dame de la Paix, Namur (Belgium), 1984.
[79] A.K. Louis and F. Natterer, "Mathematical problems of computerized tomography", Proc. IEEE, vol. 71, no. 3, pp. 379-389, 1983.
[80] A. Lucena, "Time-dependent traveling salesman problem - the deliveryman case", Networks, vol. 20, pp. 753-763, 1990.
[81] D.G. Luenberger, Linear and nonlinear programming, 2nd edition, Addison-Wesley, Reading, MA, 491 pp., 1984.
[82] M. Minoux, Programmation mathématique : théorie et algorithmes (tome 1), Dunod, Paris, 294 pp., 1983. English translation by S. Vajda, Mathematical Programming: Theory and Algorithms, Wiley-Interscience, NY.
[83] M. Minoux and G. Bartnik, Graphes, algorithmes, logiciels, Dunod, Paris, 428 pp., 1986.
[84] P. Mirchandani and H. Soroush, "Generalized Traffic Equilibrium with Probabilistic Travel Times and Perceptions", Transportation Science, vol. 21, no. 3, pp. 133-152, 1987.
[85] E.F. Moore, "The shortest path through a maze", in Proceedings of the International Symposium on the Theory of Switching, Part II, 1957, Harvard University, Cambridge, MA, pp. 285-292, 1959.
[86] T.J. Moser, "Shortest path calculation of seismic rays", Geophysics, vol. 56, pp. 59-67, 1991.
[87] T.J. Moser, The shortest path method for seismic ray tracing in complicated media, Ph.D. Thesis, Rijksuniversiteit Utrecht, 1992.
[88] J.D. Murchland, A fixed matrix method for all shortest distances in a directed graph and for inverse problems, Ph.D. Thesis, Karlsruhe University, 1970.
[89] G.L. Nemhauser and L.A. Wolsey, Integer and Combinatorial Optimization, Wiley-Interscience, John Wiley & Sons, 763 pp., 1988.
[90] G. Neumann-Denzau and J. Behrens, "Inversion of seismic data using tomographical reconstruction techniques for investigations of laterally inhomogeneous media", Geophys. J. R. Astr. Soc., vol. 79, pp. 305–315, 1984.
[91] G. Nolet, ed., Seismic Tomography, D. Reidel Publishing Company, Dordrecht, 387 pp., 1987.
[92] V.E. Outram and E. Thompson, "Driver's perceived cost in route choice", Proceedings of the PTRC Annual Meeting, London, pp. 226–257, 1978.
[93] S. Pallottino, "Shortest path methods: complexity, interrelations and new propositions", Networks, vol. 14, pp. 257–267, 1984.
[94] U. Pape, "Implementation and efficiency of Moore algorithms for the shortest route problem", Mathematical Programming, vol. 7, pp. 212–222, 1974.
[95] A.R. Pierce, "Bibliography on algorithms for shortest path, shortest spanning tree and related circuit routing problems", Networks, vol. 5, pp. 129–149, 1975.
[96] M.J.D. Powell, "On the quadratic programming algorithm of Goldfarb and Idnani", Mathematical Programming Study, vol. 25, pp. 45–61, 1985.
[97] M.J.D. Powell, "ZQPCVX, a Fortran subroutine for convex quadratic programming", Report DAMTP NA17, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK, 1983.
[98] F.S. Roberts, Graph Theory and Its Applications to Problems of Society, CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, Philadelphia, Pennsylvania, 122 pp., 1978.
[99] B. Roy, Algèbre moderne et théorie des graphes, Tome II, Dunod, Paris, 1970.
[100] A. Sartenaer, "On the application of the auction algorithm of Bertsekas for the search of shortest routes in an urban network", Technical Report 92 27, Département de Mathématique, Facultés Universitaires Notre-Dame de la Paix, Namur, 1991.
[101] Y. Sheffi, Urban Transportation Networks, Prentice-Hall, Englewood Cliffs, 1985.
[102] P.A. Steenbrink, Optimization of Transport Networks, Wiley, Bristol, 1974.
[103] J. Stoer, "On the numerical solution of constrained least-squares problems", SIAM Journal on Numerical Analysis, vol. 8, no. 2, pp. 382–411, 1971.
[104] A. Tarantola, Inverse Problem Theory: Methods for Data Fitting and Model Parameter Estimation, Elsevier, 1987.
[105] A. Tarantola and B. Valette, "Generalized nonlinear inverse problems solved using the least squares criterion", Reviews of Geophysics and Space Physics, vol. 20, pp. 219–232, 1982.
[106] R.E. Tarjan, "Complexity of combinatorial algorithms", SIAM Review, vol. 20, no. 3, pp. 457–491, 1978.
[107] R.E. Tarjan, Data Structures and Network Algorithms, CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, Philadelphia, 131 pp., 1983.
[108] H. Theil and C. Van de Panne, "Quadratic programming as an extension of classical quadratic maximization", Management Science, vol. 7, no. 1, pp. 1–20, October 1960.
[109] C. Van de Panne and A. Whinston, "The simplex and the dual method for quadratic programming", Operations Research Quarterly, vol. 15, pp. 355–389, 1964.
[110] C. Van de Panne and A. Whinston, "A comparison of two methods for quadratic programming", Operations Research, vol. 14, pp. 422–441, 1966.
[111] J. Van Leeuwen, ed., Algorithms and Complexity, Volume A of Handbook of Theoretical Computer Science, Elsevier, Amsterdam, and The MIT Press, Cambridge, Massachusetts, 996 pp., 1990.
[112] D. Van Vliet, "Improved shortest path algorithms for transport networks", Transportation Research, vol. 12, pp. 7–20, 1978.
[113] S.A. Vavasis, Nonlinear Optimization: Complexity Issues, Oxford University Press, Inc., NY, 165 pp., 1991.
[114] J.W.J. Williams, "Algorithm 232: Heapsort", Comm. ACM, vol. 7, pp. 347–348, 1964.
[115] P. Wolfe, "The simplex method for quadratic programming", Econometrica, vol. 27, pp. 382–398, 1959.
[116] J.H. Woodhouse and A.M. Dziewonski, "Mapping the upper mantle: three-dimensional modeling of Earth structure by inversion of seismic waveforms", Journal of Geophysical Research, vol. 89 (B7), pp. 5953–5986, 1984.
[117] J.Y. Yen, "A shortest path algorithm", Ph.D. Thesis, University of California, Berkeley, 1970.