Rotecting national infrastructure such as airports, historical


particular, DOBSS obtains a decomposition scheme



tải về 0.64 Mb.
Chế độ xem pdf
trang5/13
Chuyển đổi dữ liệu27.06.2022
Kích0.64 Mb.
#52518
1   2   3   4   5   6   7   8   9   ...   13
2173-Article-Text-2832-1-10-20090226


particular, DOBSS obtains a decomposition scheme
by exploiting the property that follower types are
independent of each other. The key to the DOBSS
decomposition is the observation that evaluat ing
the leader strategy against a Harsanyi-transformed
game matrix is equivalent to evaluating against
each of the game matrices for the individual fol-
lower types. 
We first present DOBSS in its most intuitive form
as a mixed-integer quadratic program (MIQP); we
then illus trate how it may be transformed into a
linearized equivalent mixed-integer linear program
(MILP). While a more de tailed discussion of the
MILP is available in Paruchuri et al. (2008), the cur-
rent section may at least serve to explain at a high
level the key idea of the decomposition used in this
MILP. 
The model we propose explicitly represents the
actions by the leader and the optimal actions for
the follower types in the problem solved by the
agent. We denote by the leader’s policy (mixed
strategy), which consists of a vector of prob ability
distributions over the leader’s pure strategies.
Hence, the value x
i
is the proportion of times in
which pure strategy is used in the policy. We
denote by q
l
the vector of strate gies of follower type

∈ L. We also denote by and the index sets of
leader and follower l’s pure strategies, respec tively.
We also index the payoff matrices of the leader and
each of the follower types by the matrices R
l
and
C
l
. Let be a large positive number. Given a priori
probabilities p
l
with 
∈ L, of facing each follower
type, the leader solves problem 1. 
Here for a set of leader’s actions and actions for
each follower q
l
, the objective represents the
expected reward for the agent considering the a
priori distribution over different follower types p
l
.
Constraints with free indices mean they are repeat-
ed for all values of the index. For example, the
fourth constraint means x
i
∈ [0 ... 1] for all ∈ X.
The first and the fourth constraints define the set
of feasible solu tions as a probability distribution
over the set of actions X. The second and fifth con-
straints limit the vector of actions of follower type
l, q
l
to be a pure distribution over the set (that is
each q
l
has exactly one coordinate equal to one
and the rest equal to zero). Note that we need to
consider only the reward-maximizing pure strate-
gies of the follower types, since for a given fixed
mixed strategy of the leader, each follower type
faces a problem with fixed linear rewards. If a
mixed strategy is optimal for the follower, then so
are all the pure strategies in support of that mixed
strategy. 
The two inequalities in the third constraint
ensure that q
1
j
= 1 only for a strategy that is opti-
Articles
SPRING 2009 47
cc' 
cd' 
dc' 
dd' 

2
α + (1 – α), 1 
2, 
α 
4
α + (1 – α), (1 – α) 4α + 2(1 – α), 0 

α, (1 – α) 
α + 3(1 – α), 2(1 – α) 
3
α, 2α + (1 – α) 
3, 2 
max
, ,
x q a
l
ij
l
i
j
l
j Q

tải về 0.64 Mb.

Chia sẻ với bạn bè của bạn:
1   2   3   4   5   6   7   8   9   ...   13




Cơ sở dữ liệu được bảo vệ bởi bản quyền ©hocday.com 2024
được sử dụng cho việc quản lý

    Quê hương