
Predicting Advertiser Bidding Behaviors In Sponsored Search

We study how an advertiser changes his/her bid prices in sponsored search, by modeling his/her rationality. Predicting the bid changes of advertisers in response to their campaign performances is a key capability of search engines, since it can be used to improve the offline evaluation of new advertising technologies and the forecasting of a search engine's future revenue. Previous work on advertiser behavior modeling relies heavily on the assumption of perfect advertiser rationality; however, in most cases, this assumption does not hold in practice. Advertisers may be unwilling, incapable, and/or constrained to achieve their best response. In this paper, we explicitly model these limitations on the rationality of advertisers, and build a probabilistic advertiser behavior model from the perspective of a search engine. We then use the expected payoff to define the objective function that an advertiser optimizes given his/her limited rationality. By solving the optimization problem with Monte Carlo simulation, we obtain a predicted mixed bid strategy for each advertiser in the next period of time. We examine the effectiveness of our model both directly, using real historical bids, and indirectly, using revenue prediction and click number prediction. Our experimental results, based on the sponsored search logs of a commercial search engine, show that the proposed model provides more accurate predictions of advertiser bid behaviors than several baseline methods.

Keywords: Advertiser modeling, rationality, sponsored search, bid prediction. Categories: [Information Systems]: Information Storage and Retrieval - On-line Information Services

Sponsored search has become a major means of Internet monetization and the driving power of many commercial search engines. In a sponsored search system, an advertiser creates a number of ads and bids on a set of keywords (with certain bid prices) for each ad. When a user submits a query to the search engine, and if a bid keyword can be matched to the query, the corresponding ad is selected into an auction process. Currently, the Generalized Second Price (GSP) auction [10] is the most commonly used auction mechanism; it ranks the ads according to the product of bid price and ad click probability1, and charges an advertiser if his/her ad wins the auction (i.e., his/her ad is shown on the search result page) and is clicked by users [13].

Generally, an advertiser has a goal when creating an ad campaign. For instance, the goal might be to receive 500 clicks on the ad during one week. However, the path to this goal might not be smooth. For example, it is possible that after one day the ad has received only 10 clicks. In this case, in order to improve the campaign performance, the advertiser may have to increase the bid price, so as to increase the opportunity for his/her ad to win future auctions, and thus the chance for the ad to be presented to users and clicked.2

Predicting how advertisers change their bid prices is a key capability of a search engine, since it can be used to deal with the so-called second order effect in online advertising [13] when evaluating novel advertising technologies and forecasting the future revenue of search engines. For instance, suppose the search engine wants to test a novel algorithm for bid keyword suggestion3 [7]. Given that online experiments are costly (e.g., unsuccessful online experiments will lead to revenue loss for the search engine), the algorithm will usually first be tested on the historical logs to see its effectiveness (a.k.a., an offline experiment).

This work was performed when the first and the third authors were interns at Microsoft Research Asia.

1Usually a reserve score is set and the ads whose scores are greater than the reserve score are shown.

2Note that the advertiser may also choose to revise the ad description, bid on extra keywords, and so on. However, among these actions, changing the bid price is the simplest and the most commonly used method. Please also note that since GSP is not incentive compatible, advertisers might not bid their true values, and changing bid prices is a common behavior.

3The same thing happens when we evaluate other algorithms, such as traffic estimation, ad click prediction, and auction mechanisms.

However, in many cases, even if the algorithm works quite well in the offline experiment, it may perform badly after being deployed online. One of the reasons is that advertisers might change their bid prices in response to the changes in their campaign performances caused by the newly deployed algorithm. Therefore, experiments based on historical bid prices can differ substantially from online behavior. To tackle this problem, one needs a powerful advertiser behavior model to predict bid price changes.

In the literature, there have been a number of studies [4][5][22][19][2][17][3] that model how advertisers determine their bid prices, and how their bid strategies influence the equilibrium of the sponsored search system. For example, Varian [19] assumes that, to maximize their utilities, advertisers bid the amount at which their value per click equals the incremental cost per click. The authors of [2] and [17] study how to estimate value per click, by assuming that advertisers are on a locally envy-free equilibrium and that the distributions of all the advertisers' bids are independent and identically distributed.

Most of the above studies rely heavily on the assumptions of perfect advertiser rationality and full information access4, i.e., advertisers have good knowledge about their utilities and are capable of effectively optimizing these utilities (i.e., taking the best response). However, as we argue in this paper, this is usually not true in practice. In our opinion, real-world advertisers have limited access to information about their competitors, and have different levels of rationality. In particular, an advertiser may be unwilling, incapable, or constrained to achieve his/her “best response.” As a result, some advertisers frequently adjust their bid prices according to their recent campaign performances, while others keep their bids unchanged regardless of the campaign performance; some advertisers have a good sense of choosing appropriate bid prices (possibly with the help of campaign analysis tools [14] or third-party ad agencies), while others choose bid prices at random.

To better describe the above intuition, we explicitly model the rationality of advertisers from the following three aspects:

  • Willingness represents the propensity an advertiser has to optimize his/her utility. An advertiser who cares little about his/her ad campaign and one who is very serious about the campaign performance will have different levels of willingness.
  • Capability describes the ability of an advertiser to estimate the bid strategies of his/her competitors and take the best-response action on that basis. An experienced advertiser is usually more capable than an inexperienced one; an advertiser who hires a professional ad agency is usually more capable than one who adjusts bid prices by himself/herself.
  • Constraint refers to the constraints that prevent an advertiser from adopting a bid price even if he/she knows that this bid price is the best response for him/her. The constraint usually (although not only) comes from a lack of remaining budget.

With the above notions, we propose the following model to describe how advertisers change their bid prices, from the perspective of the search engine.5 First, an advertiser has a certain probability to optimize his/her utility or not, which is modeled by the willingness function. Second, if the advertiser is willing to make changes, he/she will estimate the bid strategies of his/her competitors. Based on the estimation, he/she can compute the expected payoff (or utility) and use it as an objective function to determine his/her next bid price. This process is modeled by the capability function. By simultaneously considering the optimization processes of all the advertisers, we can effectively compute the best bid prices for every advertiser. Third, given the optimal bid price, an advertiser will check whether he/she is able to adopt it according to some constraints. This is modeled by the constraint function.
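The three-step decision process above can be sketched as a simple pipeline. The following is a minimal illustration, not the paper's implementation; all parameter names (willingness, estimate, best_response, constrain) are hypothetical stand-ins for the fitted per-advertiser functions introduced later in the paper:

```python
import random

def predict_next_bid(willingness, estimate, best_response, constrain,
                     competitor_histories, current_bid, rng=random.random):
    """Illustrative three-stage pipeline: willingness -> capability -> constraint.

    willingness -- probability that the advertiser optimizes at all (W_l)
    estimate    -- noisy estimator of a competitor's bid from its history (A_l)
    best_response -- picks the advertiser's best bid given the estimates
    constrain   -- shifts the chosen bid (C_l), e.g., under budget pressure
    """
    # Stage 1 (willingness): with probability W_l the advertiser optimizes;
    # otherwise the bid stays unchanged.
    if rng() >= willingness:
        return current_bid
    # Stage 2 (capability): estimate each competitor's bid, then best-respond.
    estimates = [estimate(h) for h in competitor_histories]
    best = best_response(estimates)
    # Stage 3 (constraint): apply the constraint shift to the best response.
    return constrain(best)
```

For example, with willingness 1.0, a max-of-history estimator, a best response that slightly outbids the highest estimate, and a budget cap as the constraint, the pipeline returns the capped outbid price.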

Please note that the willingness, capability, and constraint functions are all parametric. By fitting the output of our proposed model to the real bid change logs (obtained from commercial search engines), we will be able to learn these parameters, and then use the learned model to predict the bid behavior change in the future. We have tested the effectiveness of the proposed model using real data. The experimental results show that the proposed model can predict the bid changes of advertisers in a more accurate manner than several baseline methods.

To sum up, the contributions of our work are listed as follows. First, to the best of our knowledge, this is the first advertiser behavior model in the literature that considers different levels of advertiser rationality. Second, we model advertiser behaviors using a parametric model, and apply machine learning techniques to learn the parameters of the model. This is a good example of leveraging machine learning in game theory to avoid unreasonable assumptions. Third, our proposed model leads to very accurate bid prediction. In contrast, as far as we know, most previous research focuses on estimating value per click, rather than predicting bid prices. Therefore, our work is of more direct value to search engines, given that bid prediction is a desired capability of a search engine, as discussed above.

The rest of the paper is organized as follows. In Section 2, we introduce the notations and describe the willingness, capability, and constraint functions. We present the framework of the bid strategy prediction model in Section 3. In Section 4, we introduce an efficient numerical algorithm for the model. In Section 5, we present the experimental results on real data. We summarize the related work in Section 6, and in the end we conclude the paper and present some insights about future work in Section 7.

As mentioned in the introduction, how an advertiser adjusts his/her bid is related to his/her rationality. In our opinion, there are three aspects to be considered when modeling the rationality of an advertiser: willingness, capability, and constraint. In this section, we introduce some notations for sponsored search auctions, and then describe the models for these rationality aspects.

We consider the keyword auction in sponsored search. For simplicity, we do not consider connections between different ad campaigns, and we assume each advertiser has only one ad and bids on just one keyword for it. That is, the auction participants are keyword-ad pairs. Advertisers are assumed to be risk-neutral.6

4Please note that some of these works take a Bayesian approach; however, they still assume that the priors of the value distributions are publicly known.

5That is, the model is to be used by the search engine to predict advertisers’ behavior, but not by the advertisers to guide their bidding strategies.

6This assumption will result in a uniform definition of utility functions for all the advertisers. However, our result can be naturally extended to the case where advertisers' different risk preferences are considered.

We use i ($i = 1, \dots, I$) to index the advertisers, and consider advertiser l as the default advertiser of our interest. Suppose in one auction the advertisers compete for J ad slots. In practice, the search engine usually introduces a reserve score to optimize its revenue. Only those ads whose rank scores are above this reserve score will be shown to users. To ease our discussion, we regard the reserve score r as a virtual advertiser in the auction. We use $a_{i,j}$ to denote the click-through rate (CTR) of advertiser i's ad when it is placed at position j. Similar to the setting in [2][17], we assume $a_{i,j}$ to be separable. That is, $a_{i,j} = \gamma_i \alpha_j$, where $\gamma_i$ is the ad effect and $\alpha_j$ is the position effect. We let $\alpha_j = 0$ when j > J. The sponsored search system will predict the click probability [11] of an ad and use it as a factor to rank the ads in the auction. We use $s_i$ to denote the predicted click probability of advertiser i's ad if it is placed in the first ad slot. Note that both $a_{i,j}$ and $s_i$ are random variables [2], since they may be influenced by many dynamic factors such as the attributes of the query and the user who issues the query.

We assume all the advertisers share the same bid strategy space $\Omega$, which consists of B different discrete bid prices denoted by $b_n$, $n = 1, \cdots, B$. Furthermore, we denote the strategy of advertiser l as $\pi_l = (\pi_{l,1}, \cdots, \pi_{l,B})$, which is a mixed strategy: l will use bid $b_n$ with probability $\pi_{l,n}$, $n = 1, \cdots, B$. We assume advertiser l will estimate both the configuration of his/her competitors and their strategies in order to find his/her own best response. We use $\mathbb{S}$ (including l) to denote the set of advertisers who are regarded by advertiser l as the participants of the auction, and $\mathbb{S}_{-l}$ (excluding l) to denote the set of competitors of l. We denote by $\pi_i^{(l)}$ l's estimated bid strategy for a competitor i ($i \neq l$), and by $\pi_l^{(l)}$ l's own best-response strategy.

Note that both $\mathbb{S}$ and $\pi_i^{(l)}$ ($i \neq l$) are random: (i) $\mathbb{S}$ is a random set due to the uncertainty in the auction process: a) the participants of the auction are dynamic [17]; b) in practice l never knows exactly who is competing with him/her, since such information is not publicly available. (ii) $\pi_i^{(l)}$ ($i \neq l$) is a random vector due to l's incomplete information and our uncertainty about l's estimation. More intuition about $\pi_i^{(l)}$ will be given in the modeling of the capability function (see Section 2.3).

To ease our discussion, we now transform the uncertainty of $\mathbb{S}$ into uncertainty in bid prices, as follows. We regard all the other advertisers as competitors of l and add the zero bid price (denoted by $b_0$) to extend the bid strategy space. The extended bid strategy space is represented by $\Omega^* = \Omega \cup \{b_0\}$. If an advertiser is not a real competitor of l, we regard his/her bid price as zero. With this convention, $\mathbb{S}$ becomes the whole advertiser set, of size I. Thus, we will only consider the uncertainty of bid prices in the rest of the paper.

Willingness represents the propensity of an advertiser to optimize his/her utility, which we model as a probability. We model willingness as a logistic regression function $W_l(\boldsymbol{x}_t^{(l)})$. Here the input $\boldsymbol{x}_t^{(l)} = (x_{t,1}^{(l)}, \cdots, x_{t,H}^{(l)})$ is a feature vector (H is the number of features) extracted for advertiser l at period t, and the output is a real number in [0,1] representing the probability that l will optimize his/her utility. That is, advertiser l with feature vector $\boldsymbol{x}_t^{(l)}$ will have a probability of $W_l(\boldsymbol{x}_t^{(l)})$ of optimizing his/her utility, and a probability of $1 - W_l(\boldsymbol{x}_t^{(l)})$ of taking no action.7

In order to extract the feature vector $\boldsymbol{x}_t^{(l)}$, we split the historical auction logs into T periods (e.g., T days). For each period $t \in T$, $y_t^{(l)}$ indicates whether the bid was changed in period t+1. If the bid was changed, $y_t^{(l)} = 1$; otherwise, $y_t^{(l)} = 0$. With these data, the following features are extracted: (i) The number of bid changes before t. The intuition is that an advertiser who changed bids more frequently in the past will also have a higher probability of making changes in the next period. (ii) The number of periods for which the advertiser has kept the bid unchanged until t. Intuitively, an advertiser who has kept the bid unchanged for a long time may have a higher probability of continuing to keep the bid unchanged. (iii) The number of different bid values used before t. The intuition is that an advertiser who has tried more bid values in the past may be regarded as a more active bidder, and we may expect him/her to try more new bid values in the future. (iv) A Boolean value indicating whether there are clicks in t. The intuition is that if there is no click, the advertiser will feel unsatisfied and thus have a higher probability of making changes.
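As a concrete sketch, the four features can be computed from a per-period bid/click history as follows; the data layout (bids and clicks as parallel per-period lists) is our assumption, not the paper's:

```python
def willingness_features(bids, clicks, t):
    """Features (i)-(iv) for one advertiser at period t (0-indexed).

    bids[k]   -- the bid price in effect during period k
    clicks[k] -- number of clicks received in period k
    """
    history = bids[:t + 1]
    # (i) number of bid changes up to t
    n_changes = sum(1 for k in range(1, len(history))
                    if history[k] != history[k - 1])
    # (ii) number of consecutive periods the bid has stayed unchanged until t
    unchanged = 0
    for k in range(len(history) - 1, 0, -1):
        if history[k] == history[k - 1]:
            unchanged += 1
        else:
            break
    # (iii) number of distinct bid values used so far
    n_distinct = len(set(history))
    # (iv) whether there were clicks in period t
    has_clicks = 1 if clicks[t] > 0 else 0
    return [n_changes, unchanged, n_distinct, has_clicks]
```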

With the above features, we write the willingness function as,

$$W_l(\boldsymbol{x}_t^{(l)}) = \frac{1}{1 + e^{\beta_0^{(l)} + \sum_{n=1}^H \beta_n^{(l)} x_{t,n}^{(l)}}}, \quad (t = 1, \cdots, T).$$

Here $\boldsymbol{\beta}^{(l)} = (\beta_0^{(l)}, \cdots, \beta_H^{(l)})$ is the parameter vector for l.

To learn the parameter vector $\boldsymbol{\beta}^{(l)}$, we minimize the sum of the first-order errors $\sum_{t=1}^{T} |y_t^{(l)} - W_l(\boldsymbol{x}_t^{(l)})|$ on the historical data using the classical Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [15]. Then we apply the learned parameters $\boldsymbol{\beta}^{(l)}$ to predict l's willingness to change in the future.
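A minimal sketch of fitting the willingness function. The paper minimizes the absolute error with BFGS; for a self-contained illustration we substitute plain gradient descent on a squared-error surrogate, and use the conventional $e^{-z}$ parameterization (which simply flips the sign of $\boldsymbol{\beta}$ relative to the formula above):

```python
import math

def train_willingness(X, y, lr=0.5, epochs=2000):
    """Fit W(x) = 1 / (1 + exp(-(b0 + sum_n b_n x_n))) to 0/1 labels.

    A stand-in for the paper's BFGS fit: gradient descent on squared error.
    X is a list of feature vectors, y a list of 0/1 change indicators.
    """
    H = len(X[0])
    beta = [0.0] * (H + 1)  # beta[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (H + 1)
        for x, label in zip(X, y):
            z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
            w = 1.0 / (1.0 + math.exp(-z))      # predicted willingness
            d = 2 * (w - label) * w * (1 - w)   # d(squared error)/dz
            grad[0] += d
            for n, v in enumerate(x):
                grad[n + 1] += d * v
        beta = [b - lr * g / len(X) for b, g in zip(beta, grad)]
    return beta

def willingness(beta, x):
    """Evaluate the fitted logistic willingness for feature vector x."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))
```

On a toy dataset where bid changes follow a single feature, the fitted model assigns high willingness to the change cases and low willingness to the others.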

Capability describes the ability of an advertiser to estimate the bid strategies of his/her competitors and take the best-response action on that basis. A more experienced advertiser may have better capability in at least three aspects: information collection, utility function definition, and utility optimization. Usually, in GSP auctions, a standard utility function is used and the optimal solution is not hard to obtain. Hence, we mainly consider the capability in information collection, i.e., the ability to estimate competitors' bid strategies.

Recall that l does not have any exact information on his/her competitors' bids. It is difficult to model how advertiser l estimates his/her competitors' strategies, because different advertisers have different estimation techniques. Before introducing the detailed model for the capability function, we briefly describe our intuition. It is reasonable to assume that l's estimation of i is based on i's market performance, denoted by $Perf_i$. Then we can write l's estimation as $Est_l(Perf_i)$, which means l applies some specific estimation technique $Est_l$ to $Perf_i$. The market performance $Perf_i$ is determined by all the advertisers' bid profiles, due to the auction property. That is, $Perf_i = Perf_{\pi_{-i}^*}(\pi_i^*)$, where $\pi_i^*$ is i's historical bid histogram. Note that we use $\pi_i^*$ and $\pi_{-i}^*$ because we believe the observed market performance $Perf_i$ is based on the auctions during a previous period, rather than just one previous auction. However, we are mostly interested in profitable keywords, whose auctions usually have so many advertisers involved that $\pi_{-i}^*$ can be regarded as a constant environment factor for any i. Therefore, $Perf_i$ only depends on $\pi_i^*$, i.e., $Perf_i = Perf(\pi_i^*)$.

7Note that “willing to optimize” does not always mean a change of bid. Possibly, an advertiser attempts to optimize his/her utility, but finally finds that his/her previous bid is already the best choice. In this case, he/she will keep the bid unchanged, but we still regard it as “willing to optimize.”

Thus, we have $Est_l(Perf_i) = Est_l(Perf(\pi_i^*))$. Now the problem becomes much easier: l is blind to $\pi_i^*$, but the search engine has all the information about $\pi_i^*$. To know $Est_l(Perf_i)$, the search engine only needs to model the function $Est_l(Perf(\cdot))$, given that $\pi_i^*$ is known.

Specifically, we denote the above $Est_l(Perf(\pi_i^*))$ as our capability function $A_l(\pi_i^*)$. As described in Section 2.1, $A_l(\pi_i^*)$ is denoted by $\pi_i^{(l)}$. The reason that $A_l$ is named the capability function is clear: $Est_l$, the technique l uses for estimation, reflects his/her capability. The reason that $\pi_i^{(l)} = A_l(\pi_i^*)$ is modeled as random is also clear: the search engine does not know what $Est_l$ is; thus, inspired by the concept of “type” in Bayesian games [12], which describes incomplete-information game settings, we regard $Est_l$ as a “type” of l and model its distribution. For the same $\pi_i^*$, different advertisers may have different estimations according to their various capabilities.

To simplify our model, we make the following assumption on $\pi_i^{(l)}$: l's estimations of other advertisers' bid strategies are all pure strategies. That is, $\pi_i^{(l)}$ is a random Boolean vector with just one element equal to one.8

Given a bid $b_n$ with probability $\pi_{i,n}^*$ from the historical bid histogram $\pi_i^*$, we assume l's estimation fluctuates around $b_n$. The fluctuation can be modeled by a probability distribution such as the Binomial or Poisson distribution, and the parameters of the distribution can be used to indicate l's capability. Here we use the Binomial distribution to model the fluctuation, for the following reasons: (i) Theoretically, the Binomial distribution conveniently describes the discrete bids due to its own discrete nature. Furthermore, its two parameters reflect capability levels well: the number of trials N controls the fluctuation range (N = 0 means a perfect estimation), and the success probability $\delta \in (0,1)$ controls the bias of the estimations. Specifically, $\delta > 0.5$ means the estimation is on average larger than the true distribution, and vice versa. (ii) Experimentally, we have compared the Binomial distribution with other well-known distributions such as the Gaussian, Poisson, Beta, and Gamma distributions, and the experimental results show that the Binomial distribution performs best in our model.

For the sake of simplicity, we let the fluctuation range be an integer $2N_l \in \Omega$, and the success probability be $\delta_l \in (0,1)$. Then $(N_l, \delta_l)$ are l's capability parameters. The fluctuation on $b_n$ in $\pi_i^*$ is modeled by

$$Pr(A_l(b_n) = b_{n+m}) = \pi_{i,n}^* \binom{2N_l}{N_l + m} \delta_l^{N_l + m} (1 - \delta_l)^{N_l - m}, \quad (m = -N_l, \cdots, N_l).$$

In the above formula, $i \neq l$; the symbol “=” means equivalence of strategies; $\binom{2N_l}{N_l+m}$ is the number of $(N_l+m)$-combinations of a set with $2N_l$ elements. Therefore, by considering all the bid values in $\pi_i^*$, we have,

$$Pr(\pi_i^{(l)} = b_n) = Pr(A_l(\pi_i^*) = b_n) = \sum_{m=-N_l}^{N_l} \pi_{i,n-m}^* \binom{2N_l}{N_l+m} \delta_l^{N_l+m} (1-\delta_l)^{N_l-m}.$$
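The capability function can be sketched as a binomial smearing of the historical bid histogram. Boundary handling below (mass that would fall outside the bid space is clipped to the nearest valid index) is our assumption; the formulas above leave it implicit:

```python
from math import comb

def capability_estimate(hist, N, delta):
    """Distribution of l's estimate A_l(pi_i^*) over bid indices.

    hist  -- historical bid histogram pi_i^* (probabilities over bid indices)
    N     -- fluctuation parameter N_l (N = 0 means a perfect estimation)
    delta -- success probability delta_l in (0, 1); > 0.5 biases upward
    """
    B = len(hist)
    est = [0.0] * B
    for n, p in enumerate(hist):
        if p == 0.0:
            continue
        for m in range(-N, N + 1):
            # binomial weight for shifting b_n to b_{n+m}
            w = comb(2 * N, N + m) * delta ** (N + m) * (1 - delta) ** (N - m)
            # clip out-of-range targets to the edge of the bid space (assumption)
            target = min(max(n + m, 0), B - 1)
            est[target] += p * w
    return est
```

With N = 0 the estimate reproduces the histogram exactly; with N = 1 and an unbiased δ = 0.5, a point mass spreads symmetrically to its neighbors.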

Constraint refers to the factors that prevent an advertiser from adopting a bid price even if he/she knows that this bid price is the best response for him/her. In practice, many factors (such as a lack of remaining budget or the aggressive/conservative character of the advertiser) may impact the advertiser's eventual choice. For example, an advertiser who lacks budget or has a conservative character may prefer to bid a lower price than the best response.

We model constraint using a function $C_l$, which translates the best response (which may be a mixed strategy) to the final strategy with a step (a.k.a., difference) $c_t^{(l)}$. That is, if the best bid strategy is $\pi_l^{(l)}$ at period t, then $C_l(\pi_l^{(l)})$ will be $b_n + c_t^{(l)}$ with probability $\pi_{l,n}^{(l)}$. Similar to the willingness function, we model the step $c_t^{(l)}$ using a regression model. The difference is that this time we use linear regression, since $c_t^{(l)}$ is in nature a translation distance rather than a probability. Here we use the remaining budget as the feature $x_t^{(l)}$ and build the following function form:

$$c_t^{(l)} = \beta_{1,l} + \beta_{2,l} \frac{x_t^{(l)} - \overline{x}^{(l)}}{\overline{x}^{(l)}}, \quad \text{where } \overline{x}^{(l)} = \frac{\sum_{t \in T} x_t^{(l)}}{|T|}.$$

In the above formula, T is the set of periods for training and $x_t^{(l)}$ is l's remaining budget in period t. In the training data, we use $(\sum_{n=1}^B b_n \pi_{l,n}^{(l)}) - b_t^{(l)}$ as the label for $c_t^{(l)}$. Here $b_t^{(l)}$ is l's real bid at period t; $\beta_{1,l}$ and $\beta_{2,l}$ are the parameters of the linear regression. Note that $\beta_{1,l}$ is related only to l himself/herself. This parameter reveals l's internal character, i.e., whether he/she is aggressive or not. Intuitively, for aggressive advertisers $\beta_{1,l}$ will be positive, because such advertisers would like to overbid. Moreover, we normalize the budget in the formula because the amounts of budget vary greatly across advertisers. The normalization helps to build a uniform model for all advertisers.
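A sketch of fitting $(\beta_{1,l}, \beta_{2,l})$ by ordinary least squares on the normalized budget feature; the closed-form simple-regression solution below is a standard substitute for whatever solver one prefers:

```python
def fit_constraint(budgets, steps):
    """Fit c_t = b1 + b2 * (x_t - mean) / mean by ordinary least squares.

    budgets -- remaining budget x_t for each training period
    steps   -- observed step labels c_t (expected best-response bid
               minus the real bid, as defined in the text)
    """
    mean = sum(budgets) / len(budgets)
    z = [(x - mean) / mean for x in budgets]  # normalized budget feature
    zbar = sum(z) / len(z)
    cbar = sum(steps) / len(steps)
    # closed-form simple linear regression
    num = sum((zi - zbar) * (ci - cbar) for zi, ci in zip(z, steps))
    den = sum((zi - zbar) ** 2 for zi in z)
    b2 = num / den if den else 0.0
    b1 = cbar - b2 * zbar
    return b1, b2
```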

After explaining the advertiser rationality in terms of willingness, capability, and constraint, we introduce a new advertiser behavior model.

Suppose advertiser l has a utility function $U_l$. The inputs of $U_l$ are l's estimations of his/her competitors' bid strategies, which are given by the capability function $A_l$. The goal of advertiser l is to find a mixed strategy $\pi_l^{(l)}$ that maximizes this utility, i.e.,

$$\arg\max_{\pi_l^{(l)}} U_l(A_l(\pi_i^*), i = 1, \cdots, I) = \arg\max_{\pi_l^{(l)}} U_l(\pi_i^{(l)}, i = 1, \cdots, I, i \neq l).$$

If we further consider the probability of change $W_l$, the constraint function $C_l$, and the randomness of $A_l$, we obtain the general advertiser behavior model that explains how advertiser l may determine his/her bid strategy for the next period of time:

$$\pi_l = W_l E_{A_l}\left(C_l\left(\arg\max_{\pi_l^{(l)}} U_l(\pi_i^{(l)}, i = 1, \cdots, I, i \neq l)\right)\right) + (1 - W_l)(0, \cdots, 0, 1, 0, \cdots, 0)^T, \quad \forall l. \quad (1)$$

Here $(0, \cdots, 0, 1, 0, \cdots, 0)^T$ is the unchanged B-dimensional bid strategy, in which the index of the one (and only one) nonzero element equals n if the bid in the previous period is $b_n$. “arg max” outputs a B-dimensional mixed strategy of l; $E_{A_l}$ denotes the expectation over the randomness of $A_l(\pi_i^*)$; $W_l$ is the probability that l decides to optimize his/her utility.

We want to emphasize that equation (1) is a general expression under our rationality assumptions. Though we have provided the details about $W_l$, $A_l$, and $C_l$ in Section 2 and will introduce the details about $U_l$ in the next subsection, one can certainly propose other forms for all these functions.

8Our model can be naturally extended to the mixed strategy case, at the cost of somewhat more complicated notations and computing algorithms.

To make the above model concrete, we need to define and calculate the utility function $U_l$ for each advertiser.

Recall our assumption that $\pi_i^{(l)} = A_l(\pi_i^*)$ is a pure strategy; that is, only one element of $\pi_i^{(l)}$ is one and all the other elements are zeros. Suppose the bid value corresponding to the “one” in $\pi_i^{(l)}$ is $o_i$ ($o_i \in \Omega^*$ and $i \neq l$). In this case, the bid configuration is $\mathbf{o} = (o_1, \cdots, o_I)$, in which all the advertisers' bids are fixed. Please note that the representation in terms of $o_i$ and the original representation in terms of $\pi_i^{(l)}$ are equivalent, since they encode exactly the same information and randomness in the bid strategies of advertisers.

Then we introduce the form of $U_l$. Based on the bid prices in $\mathbf{o}$ and the ad quality scores $s_i$ ($i = 1, \cdots, I$), we can determine the ranked list in the auction according to the commonly used ranking rule (i.e., the product of bid price and ad quality score [13]) in sponsored search. Suppose l is ranked in position j and $\hat{l}$ is ranked in position j+1. According to the pricing rule of the generalized second price (GSP) auction [10], l should pay $o_{\hat{l}} s_{\hat{l}} / s_l$ for each click. As defined in Section 2.1, the probability for a user to click l's ad in position j is $a_{l,j} = \gamma_l \alpha_j$. Suppose the true value of advertiser l for a click is $v_l$ (which can be estimated using many techniques, e.g., [9]); then we have,

$$U_l = E_{\gamma_l, \alpha_j, s_{\hat{l}}, s_l}\left\{ \gamma_l \alpha_j \left(v_l - \frac{o_{\hat{l}} s_{\hat{l}}}{s_l}\right) \right\} = \overline{\gamma}_l \overline{\alpha}_j \left(v_l - \frac{o_{\hat{l}} \overline{s}_{\hat{l}}}{\overline{s}_l}\right).$$

As explained in Section 2, $\gamma_l$, $\alpha_j$, $s_{\hat{l}}$, and $s_l$ are all random variables. Here $\overline{\gamma}_l$, $\overline{\alpha}_j$, $\overline{s}_{\hat{l}}$, and $\overline{s}_l$ are their means. Since $U_l$ is linear and the above four random variables are independent of each other, the outer expectation can be moved inside and substituted by the corresponding means.
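For a fixed competitor configuration $\mathbf{o}$, the expected payoff above can be evaluated as in the following sketch. Function and parameter names are ours, and $\overline{\gamma}_l$ is omitted since it only scales the payoff:

```python
def expected_utility(my_bid, my_s, v, alpha_bar, competitor_bids, competitor_s):
    """Expected payoff of one advertiser in a GSP auction, with means plugged in.

    my_bid, my_s  -- the advertiser's bid and mean quality score
    v             -- value per click v_l
    alpha_bar     -- mean position effects alpha_bar[j] for slot j (0-indexed);
                     implicitly 0 beyond the last slot
    competitor_bids, competitor_s -- bids o_i and mean scores of the others
    """
    my_score = my_bid * my_s
    scores = sorted((b * s for b, s in zip(competitor_bids, competitor_s)),
                    reverse=True)
    # position j = number of competitors ranked strictly above me
    j = sum(1 for sc in scores if sc > my_score)
    if j >= len(alpha_bar):
        return 0.0  # not shown: no clicks, no payoff
    # GSP payment per click: next-ranked score divided by my quality score
    next_score = scores[j] if j < len(scores) else 0.0
    return alpha_bar[j] * (v - next_score / my_s)
```

For example, with two slots, a bid that lands in the second slot pays the third-ranked score divided by its own quality score, and earns the slot-2 click rate times its per-click surplus.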

With all the above discussions, we are now ready to give the final form of the advertiser model. Denoting by $\mathbf{o}_{-l} = (o_1, \cdots, o_{l-1}, o_{l+1}, \cdots, o_I)$ the bid configuration without l's bid, we get the following expression for $l = 1, \cdots, I$:

$$\pi_l = W_l E_{\mathbf{o}_{-l}}\left\{ C_l\left[\arg\max_{\pi_l^{(l)}} \left(\overline{\gamma}_l \overline{\alpha}_j \left(v_l - o_{\hat{l}} \overline{s}_{\hat{l}} / \overline{s}_l\right)\right)\right] \right\} + (1 - W_l)(0, \cdots, 0, 1, 0, \cdots, 0)^T.$$

Here the randomness of $A_l$ is expressed by the randomness of $\mathbf{o}_{-l}$.

Note that $\overline{\gamma}_l$ is a constant for l and does not affect the result of “arg max”. Therefore we can remove it from the above expression to further simplify the final model:

$$\pi_l = W_l E_{\mathbf{o}_{-l}}\left\{ C_l\left[\arg\max_{\pi_l^{(l)}} \left(\overline{\alpha}_j \left(v_l - o_{\hat{l}} \overline{s}_{\hat{l}} / \overline{s}_l\right)\right)\right] \right\} + (1 - W_l)(0, \cdots, 0, 1, 0, \cdots, 0)^T, \quad \forall l. \quad (2)$$

In this section we introduce an efficient algorithm to solve the advertiser model proposed in the previous sections. To ease our discussion, we assume that the statistics $\overline{\alpha}_j$, $\overline{s}_i$, and $\overline{s}_l$ are all known (given sufficient data and knowledge about the market). Furthermore, we assume that the search engine can effectively estimate the true value $v_l$ in (2). Considering the setting of our problem, we choose to use the model in [9] for this purpose.

initialize o = (o_1, ..., o_I) = (0, 0, ..., 0)
for i = 1, ..., I
    f = random();   // random() uniformly outputs a random float number in [0,1].
    sum = P(O_i = b_0);
    n = 0;
    while (sum < f)
        n = n + 1;
        sum = sum + P(O_i = b_n);
    end;
    o_i = b_n;   // b_n is the first bid whose cumulative probability reaches f
end;
output o;

Our discussion in this section focuses on the computational challenge of obtaining the best response for all possible bid configurations $\mathbf{o}$ (corresponding to $\mathbf{o}_{-l}$ in (2)). This is a typical combinatorial explosion problem with a complexity of $B^I$, which increases exponentially with the number of advertisers. Therefore, it is hard to solve the problem directly. Our proposal is to adopt a numerical approximation instead of computing an exact solution. We can prove that the approximation algorithm converges to the exact solution, with a small accuracy loss and much less running time.

Our approximation algorithm requires the use of an $\mathcal{O}$-simulator, which is defined as follows.

DEFINITION 1. ($\mathcal{O}$-simulator) Suppose there is a random vector $\mathcal{O} = (O_1, \cdots, O_I) \sim P(\mathbf{o})$, i.e., $P(\mathbf{o})$ is the distribution of $\mathcal{O}$. Given $P(\mathbf{o})$, an algorithm is called an $\mathcal{O}$-simulator if, for any $\mathbf{o}$, it randomly outputs the vector $\mathbf{o}$ with probability $P(\mathbf{o})$.

As described above, an $\mathcal{O}$-simulator simulates the random vector $\mathcal{O}$ and randomly outputs its samples. In general, it is difficult to simulate a random vector; however, in our case, all the $O_i$ are independent of each other and have discrete distributions, so the simulation becomes feasible. In Table 1 we give a description of the $\mathcal{O}$-simulator. Here we assume $\mathcal{O} = (O_1, \cdots, O_I)$ and $O_i \sim P_i(o_i)$, $o_i \in \Omega^*$. Furthermore, $\Omega^* = \{b_0, b_1, \cdots, b_B\}$ is a discrete space shared by all $i$ (like the bid space in our model), and all the $O_i$ are independent of each other.

Note that $f$ is a uniformly random number from [0,1]; therefore the probability that $o_i$ equals $b_n$ is exactly $P(O_i = b_n)$. Thus, the probability of outputting $\mathbf{o} = (o_1, \dots, o_I)$ is $\prod_{i=1}^{I} P(O_i = o_i)$, which is exactly what we want.
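Concretely, the inverse-CDF sampling in Table 1 can be sketched in Python as follows (the function name `o_simulator` and the list-of-lists representation of the distributions are our own choices, not from the paper):

```python
import random

def o_simulator(P, bids, rng=random):
    """One call outputs a sample o = (o_1, ..., o_I), following Table 1.

    P[i][n] is the probability that advertiser i bids bids[n]; each
    coordinate is drawn independently by inverse-CDF search.
    """
    o = []
    for p_i in P:
        f = rng.random()                  # uniform random float in [0, 1)
        cum, n = p_i[0], 0
        while cum < f and n < len(p_i) - 1:
            n += 1
            cum += p_i[n]                 # stop at the first n whose cumulative mass reaches f
        o.append(bids[n])
    return o
```

Because each coordinate is sampled independently, the probability of a full configuration factorizes exactly as in the text.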

We then give the Monte Carlo Algorithm, shown in Table 2, to calculate $E_{\mathbf{o}_{-l}}\{\arg\max_{\boldsymbol{\pi}_l^{(l)}}(\overline{\alpha}_j(v_l-o_{\hat{l}}\overline{s}_{\hat{l}}/\overline{s}_l))\}$ for a certain $l$. For simplicity, we denote $Pr(\pi_i^{(l)}=b_n)$ as $q_{i,n}^{(l)}$, so $q_{i,0}^{(l)}$ is the probability that $i$ is not in the auction. In this algorithm, the historical bid histogram $\pi_i^*$ and $q_{i,0}^{(l)}$ are calculated from the auction logs by Maximum Likelihood Estimation. Given the rationality parameters $\delta_l$ and $N_l$, and $q_{i,0}^{(l)}$, we initialize $q_{i,n}^{(l)}$ by the capability function. Then, with $\mathbf{o}_{-l}$ generated by the $\mathcal{O}$-simulator, we can calculate which ranked list is optimal for $l$ by solving $\arg\max_{\boldsymbol{\pi}_l^{(l)}}(\overline{\alpha}_j(v_l-o_{\hat{l}}\overline{s}_{\hat{l}}/\overline{s}_l))$. Note that different bids may lead to the same optimal ranked list (with the same utility). In this case, the inverse function "$\arg\max_{\boldsymbol{\pi}_l^{(l)}}$" outputs a bid set $B_{\mathbf{o}_{-l}}$ including all the equally optimal bids. By assuming that advertiser $l$ will take any bid in $B_{\mathbf{o}_{-l}}$ with uniform probability, we allocate each bid in $B_{\mathbf{o}_{-l}}$

Table 2: The Monte Carlo Algorithm

for $i = 1, \cdots, I$,
    initialize $\pi_i^*$, $q_{i,0}^{(l)}$, $\overline{s}_i$;
for $j = 1, \cdots, J$,
    initialize $\overline{\alpha}_j$;
initialize $\delta_l$, $N_l$, $v_l$, $\overline{s}_l$;
$\pi_{l,n}^{(l)} = 0$ for $n = 1, \cdots, B$;
for $i = 1, \cdots, I$ ($i \neq l$) and $n = 1, \cdots, B$,
    $q_{i,n}^{(l)} = (1 - q_{i,0}^{(l)}) \times \sum_{m=-N_l}^{N_l} \pi_{i,n-m}^* \binom{2N_l}{N_l+m} \delta_l^{(N_l+m)} (1 - \delta_l)^{(N_l-m)}$;
build an $\mathcal{O}$-simulator with $P(\mathcal{O} = \mathbf{o}_{-l}) = \prod_{i=1, i \neq l}^{I} q_{i,o_i}^{(l)}$, $\forall \mathbf{o}_{-l}$;
for $t = 1, \cdots, N$,
    the $\mathcal{O}$-simulator outputs a sample $\mathbf{o}_{-l}$;
    solve $\arg\max_{\boldsymbol{\pi}_l^{(l)}} (\overline{\alpha}_j (v_l - o_{\hat{l}}\, \overline{s}_{\hat{l}} / \overline{s}_l))$ to get $B_{\mathbf{o}_{-l}}$;
    for all $b_i \in B_{\mathbf{o}_{-l}}$,
        $\pi_{l,i}^{(l)} = \pi_{l,i}^{(l)} + 1/|B_{\mathbf{o}_{-l}}|$;
    end;
end;
for $n = 1, \cdots, B$,
    $\pi_{l,n}^{(l)} = \pi_{l,n}^{(l)} / N$;
output $\boldsymbol{\pi}_l^{(l)}$;

a weight of $\frac{1}{|B_{\mathbf{o}_{-l}}|}$. Finally, we use the number of simulations $N$ to normalize the distribution and output it.
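The main loop of Table 2 can be sketched as follows; the brute-force utility argmax and the GSP details (an unshown ad gets zero utility, the bottom slot pays nothing) are simplifying assumptions of ours, and all names are illustrative:

```python
def monte_carlo_strategy(bids, v, s_l, others_s, alpha, n_sim, sampler):
    """Monte Carlo estimate of l's mixed bid strategy, following Table 2.

    bids     : candidate bid values b_1, ..., b_B for advertiser l
    v        : l's (estimated) true value per click
    s_l      : l's mean quality score
    others_s : mean quality scores of the competitors
    alpha    : alpha[j] = mean CTR of slot j (decreasing in j)
    sampler  : an O-simulator; each call returns one competitor bid vector o_{-l}
    """
    pi = [0.0] * len(bids)
    for _ in range(n_sim):
        o = sampler()
        # competitors' GSP ranking scores, sorted from best to worst
        scores = sorted((oi * si for oi, si in zip(o, others_s)), reverse=True)
        best_u, best = None, []
        for k, b in enumerate(bids):
            j = sum(1 for s in scores if s > b * s_l)      # l's slot index (0-based)
            if j >= len(alpha):
                u = 0.0                                    # ad not shown, no clicks
            else:
                next_score = scores[j] if j < len(scores) else 0.0
                u = alpha[j] * (v - next_score / s_l)      # GSP charge: next score / s_l
            if best_u is None or u > best_u + 1e-12:
                best_u, best = u, [k]
            elif abs(u - best_u) <= 1e-12:
                best.append(k)                             # tie: equally optimal bids
        for k in best:
            pi[k] += 1.0 / len(best)                       # uniform mass over B_{o_{-l}}
    return [p / n_sim for p in pi]
```

Feeding it the $\mathcal{O}$-simulator of Table 1 as `sampler` realizes the outer expectation by simple averaging.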

For the Monte Carlo Algorithm, we can prove its convergence to the accurate solution, which is shown in the following theorem.

THEOREM 1. Given $\pi_i^*$ and $q_{i,0}^{(l)}$, the output of the Monte Carlo Algorithm converges to $E_{\mathbf{o}_{-l}}\{\arg\max_{\boldsymbol{\pi}_l^{(l)}}(\overline{\alpha}_j(v_l-o_{\hat{l}}\overline{s}_{\hat{l}}/\overline{s}_l))\}$ as the number of simulations $N$ grows.

PROOF. We assume that the exact solution is $\boldsymbol{\pi}_l^0$, and thus we need to prove that $\forall n\ (n=1,\cdots,B)$, $\pi_{l,n}^{(l)} \to \pi_{l,n}^0$ as $N\to\infty$. For a certain player $l$, we construct the following map:

$$M: \mathbf{o}_{-l} \to B_{\mathbf{o}_{-l}} = \{\text{all of } l\text{'s best bids in case } \mathbf{o}_{-l}\}, \quad \forall \mathbf{o}_{-l}.$$

According to the definition, we know that $\pi^0_{l,n}$ equals the $n^{th}$ element of $E_{\mathbf{o}_{-l}}\{\arg\max_{\boldsymbol{\pi}_l^{(l)}}(\overline{\alpha}_j(v_l-o_{\hat{l}}\overline{s}_{\hat{l}}/\overline{s}_l))\}$, and then

$$\pi_{l,n}^{0} = \sum_{\text{all } B_{\mathbf{o}_{-l}} \text{ containing } b_n} \frac{P(\mathbf{o}_{-l})}{|B_{\mathbf{o}_{-l}}|}.$$

Here $P(\mathbf{o}_{-l})$ is the probability of $\mathbf{o}_{-l}$. In the Monte Carlo Algorithm, we initialize $\pi_{l,n}^{(l)}=0$, and suppose that $\pi_{l,n}^{(l)}$ increases by $\Delta_t$ in each step of the loop "for $t=1,\cdots,N$". Therefore, the final value of $\pi_{l,n}^{(l)}$ is $(\sum_{t=1}^N \Delta_t)/N$. Moreover, in each step $t$, for a sample $\mathbf{o}_{-l}$, the expectation of $\Delta_t$ is

$$E(\Delta_t) = \sum_{\text{all } B_{\mathbf{o}_{-l}} \text{ containing } b_n} \frac{P(\mathbf{o}_{-l})}{|B_{\mathbf{o}_{-l}}|}.$$

Hence, by the law of large numbers, $(\sum_{t=1}^{N} \Delta_t)/N$ converges to the expectation of $\Delta_t$, which exactly equals $\pi_{l,n}^0$, as $N$ grows. This finishes our proof of Theorem 1. $\square$

Besides the above theorem, we can also prove some properties of the proposed model. We describe these properties in the appendix for interested readers.

In this section, we report the experimental results on the prediction accuracy of our proposed model. In particular, we first describe the data sets and the experimental setting. Then we investigate the training accuracy for the willingness, capability, and constraint functions, to show the step-wise results of the proposed method. After that, we test the performance of our model in bid prediction, which is the direct output of the advertiser behavior model. Finally, we test the performance of our model in click number prediction and revenue prediction, which are important applications of the advertiser behavior model.

In our experiments, we used the advertiser bid history data sampled from the sponsored search log of a commercial search engine. We randomly chose 160 queries from the 10,000 most profitable queries and extracted the related advertisers from the data. We sampled one auction per 30 minutes from the auction log within 90 days (from March 2012 to May 2012)$^9$, so there are in total 4,320 (90 × 24 × 2) auctions. For each auction, up to 14 ads (4 on the mainline and 10 on the sidebar) are displayed. We filtered out the advertisers whose ads were never displayed during these 4,320 auctions, and eventually kept 5,543 effective advertisers in the experiments.

For the experimental setting, we used the first 3,360 auctions (70 days) for model training, and the last 960 auctions (20 days) as test data for evaluation. In the training period, we used the first 2,400 auctions (50 days) to obtain the historical bid histogram $\pi_i^*\ (i=1,\cdots,I)$ and the true value $v_l$; we then used the remaining 960 auctions (20 days) to learn the parameters for advertiser rationality. For clarity, we list the usage of the data in Table 3. Note that the three periods in the table are abbreviated as P1, P2, and P3.

5.2 Different Aspects of Advertiser Rationality


First, we study the logistic regression model for willingness. We train the willingness function using the auctions in P2 according to the description in Section 2.2, and test its performance on the auctions in P3. In particular, for any auction $t$ in P3, we get the value of $y_t^{(l)}$ according to whether the bid was changed in the time interval $[t-1,t]$, and use it as the ground truth. For the same time period, we apply the regression model to calculate the predicted value $\hat{y}_t^{(l)} \in [0,1]$ of $y_t^{(l)}$. We find a threshold in [0,1] by which $\hat{y}_t^{(l)}$ is converted to 0 or 1, and then calculate the prediction accuracy against the ground truth. Figure 1 shows the distribution of prediction accuracies among advertisers when the threshold is set to 0.15. According to the figure, the willingness function achieves a prediction accuracy of 100% for 39% (2,170 of 5,543) of the advertisers, and a prediction accuracy over 80% for 68% (3,773 of 5,543) of the advertisers. In this regard, we say the proposed willingness model performs well in predicting whether the advertisers are willing to change their bids.

$^9$In the search engine, only the latest 90 days of data are stored. To deal with seasonal or holiday effects, one can choose seasonal or holiday data from different years instead of data from a continuous period. We only consider the general cases in our experiments.
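The thresholding and accuracy computation above can be sketched as follows (the function name is ours; 0.15 is the threshold used in the experiments):

```python
def willingness_accuracy(y_hat, y, threshold=0.15):
    """Per-advertiser accuracy of the thresholded willingness predictions.

    y_hat : predicted bid-change probabilities from the logistic model
    y     : ground truth, 1 if the bid changed in [t-1, t] and 0 otherwise
    """
    preds = [1 if p >= threshold else 0 for p in y_hat]
    return sum(int(p == t) for p, t in zip(preds, y)) / len(y)
```

Sweeping `threshold` over [0,1] on held-out data is one way to pick the operating point reported in Figure 1.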

Table 3: Data usage in the experiments

| | P1: Day 1 to Day 50 | P2: Day 51 to Day 70 | P3: Day 71 to Day 90 |
| --- | --- | --- | --- |
| Purpose | Train | Train | Test |
| #auctions | 2,400 | 960 | 960 |
| Usage | (i) Get historical bid histogram; (ii) Learn true value | Learn rationality parameters | Test model |
| Information required | bid price, ad quality score, ad position | bid price, ad quality score, click number, budget | bid price, ad quality score, click number, budget, pay per click |

Figure 1: Distribution of the prediction accuracy.

Second, we investigate the capability function. For this purpose, we set $C_l$ as an identity function and only consider $W_l$ and $A_l$. In the capability function $A_l$, we discretely pick the parameter pair $(\delta_l, N_l)$ from the set $\{0, 0.1, \cdots, 0.9, 1.0\} \times \{0, 1, \cdots, 9, 10\}$ and judge which parameter pair is the best using the data in P2, as described in Section 2.3. We call the advertiser model with the learned willingness and capability functions (without the constraint function) the Rationality-based Advertiser Behavior model with Willingness and Capability (RAB-WC for short). Its performance will be reported and discussed in Section 5.3.
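The binomial smoothing that initializes $q_{i,n}^{(l)}$ in Table 2 can be sketched as follows (the function name is ours, and dropping the out-of-range bins is one possible boundary convention):

```python
from math import comb

def capability_smooth(hist, q0, delta, N):
    """Smooth a historical bid histogram with the binomial capability kernel
    of Table 2: a bid drifts by m steps with probability
    C(2N, N+m) * delta^(N+m) * (1-delta)^(N-m), for m = -N..N.

    hist : historical bid histogram pi*_i over bins 1..B (list of length B)
    q0   : probability that advertiser i is absent from the auction
    """
    B = len(hist)
    q = [0.0] * B
    for n in range(B):
        acc = 0.0
        for m in range(-N, N + 1):
            if 0 <= n - m < B:           # bins outside 1..B contribute nothing
                acc += (hist[n - m] * comb(2 * N, N + m)
                        * delta ** (N + m) * (1 - delta) ** (N - m))
        q[n] = (1 - q0) * acc
    return q
```

With $\delta=0.5$ and $N=1$ the kernel is simply $(0.25, 0.5, 0.25)$, i.e., a symmetric one-step blur of the histogram.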

Third, the constraint function is implemented with a linear regression model trained on P2, using the remaining budget as the feature, according to the discussions in Section 2.4. By applying the constraint function, we get the complete version of the proposed model. We call it Rationality-based Advertiser Behavior model with Willingness, Capability, and Constraint (or RAB-WCC for short). Its performance will be given in Section 5.3.

In this subsection, we compare our proposed advertiser model with six baselines in the task of bid prediction. The predicted bid prices are the direct outputs of the advertiser behavior models. The baselines are listed as follows:

  • Random Bid Model (RBM) refers to the random method of bid prediction. That is, we will randomly select a bid in the bid strategy space as the prediction.
  • Most Frequent Model (MFM) refers to an intuitive method for bid prediction, which works as follows. First, we get the historical bid histogram from the bid values in the training period; then we always output the historically most frequently used bid value for the test period. If several bid prices are equally frequently used, we randomly select one of them.

  • Best Response Model (BRM) [5] refers to the model that predicts the bid strategy to be the best response by assuming the advertisers know all the competitors’ bids in the previous auction.
  • Regression Model (RM) [8] refers to the model that predicts the bid strategy using a linear regression function. In our experiments, we used the following 5 features as the input of this function: the average bid change in history, the bid change in the previous time period, click number, remaining budget, and revenue in the previous period.
  • RAB-WC refers to the model as described in the previous subsection.
  • RAB-WCC-D refers to the degenerate version of RAB-WCC. That is, we select the bid with the maximum probability in the mixed bid strategy output by RAB-WCC.

We adopt two metrics to evaluate the performances of these advertiser models.

First, we use the likelihood of the test data as the evaluation metric [9]. Specifically, we denote a probabilistic prediction model as $\mathcal{M}$,$^{10}$ which outputs a mixed strategy of advertiser $l$ in period $t$ as $\boldsymbol{\pi}_l^{[t]} = (\pi_{l,0}^{[t]}, \cdots, \pi_{l,B}^{[t]})$ in the bid strategy space $\Omega^0$. Suppose the index of the real bid strategy of $l$ in period $t$ is $\omega_l^{[t]}$. Considering a period set $\mathcal{T}$ and an advertiser set $\mathcal{I}$, we define the following likelihood:

$$P_{\mathcal{T},\mathcal{I}}(\mathcal{M}) = \prod_{t \in \mathcal{T},\, l \in \mathcal{I}} \pi_{l,\omega_l^{[t]}}^{[t]}.$$

$P_{\mathcal{T},\mathcal{I}}(\mathcal{M})$ reflects the probability that model $\mathcal{M}$ produces the real data $\omega_l^{[t]}$ for all $t \in \mathcal{T}$ and all $l \in \mathcal{I}$. To make the metric normalized and positive, we adopt the geometric average and a negative logarithmic function. As a result, we get

$$D_{\mathcal{T},\mathcal{I}}(\mathcal{M}) = -\ln\left(\sqrt[|\mathcal{T}||\mathcal{I}|]{P_{\mathcal{T},\mathcal{I}}(\mathcal{M})}\right) = \frac{-\ln P_{\mathcal{T},\mathcal{I}}(\mathcal{M})}{|\mathcal{T}||\mathcal{I}|}.$$

We call it the negative logarithmic likelihood (NLL). With the same $\mathcal{T}$ and $\mathcal{I}$, the smaller the NLL, the better the prediction given by $\mathcal{M}$.

$^{10}$Please note that some of the models under investigation are deterministic. We can still compute the likelihood for them because deterministic models are special cases of probabilistic models.

Second, we use the expected error between the predicted bid strategy and the real bid as the evaluation metric. Specifically, we define the aggregated expected error (AEE) on a period set $\mathcal{T}$ and an advertiser set $\mathcal{I}$, i.e.,

$$\mathrm{AEE} = \sum_{t \in \mathcal{T}} \sum_{l \in \mathcal{I}} \sum_{i=0}^{B} \pi_{l,i}^{[t]} \left| b_i - b_{\omega_l^{[t]}} \right|. \quad (3)$$
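Both metrics are easy to compute from the predicted strategies; a minimal sketch (function names ours, assuming no predicted probability is exactly zero for an observed bid):

```python
import math

def nll(pi, omega):
    """Negative logarithmic likelihood: pi[t][l] is the predicted mixed
    strategy of advertiser l in period t, omega[t][l] the index of the real bid."""
    T, I = len(pi), len(pi[0])
    log_p = sum(math.log(pi[t][l][omega[t][l]]) for t in range(T) for l in range(I))
    return -log_p / (T * I)

def aee(pi, omega, bids):
    """Aggregated expected error of Eq. (3): the expected absolute gap
    between each predicted bid and the real bid, summed over t and l."""
    return sum(p * abs(bids[i] - bids[omega[t][l]])
               for t in range(len(pi)) for l in range(len(pi[0]))
               for i, p in enumerate(pi[t][l]))
```

A deterministic model is handled by a one-hot `pi[t][l]`, in line with footnote 10.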

The average NLL and AEE on all the 160 queries of the above algorithms are shown in Table 4. We have the following observations from the table.

  • Our proposed RAB-WCC achieves the best performance compared with all the baseline methods.
  • RAB-WCC-D performs the second best among these methods, indicating that the bid with the maximum probability in RAB-WCC has been a very good prediction compared with most of the baselines.
  • RAB-WC performs the third best among these methods, showing that: a) the proposed rationality-based advertiser model can outperform the commonly used algorithms in bid prediction; b) the introduction of the constraint function to the rationality-based advertiser model can further improve its prediction accuracy.
  • RBM performs almost the worst, which is not surprising due to its uniform randomness.
  • BRM also performs badly. Our explanation is as follows. In BRM, we assume the advertisers know all the competitors’ bids before selecting their bids for the next auction. However, the real situation is far from this assumption, so the “best response” is not the real response in most cases.
  • MFM performs better than BRM. This is not difficult to interpret: MFM is a data-driven model without many unrealistic assumptions, so it fits the data better than BRM.
  • RM performs better than MFM but worse than RAB-WC, RAB-WCC-D, and RAB-WCC. RM is a machine learning model which leverages several features related to the advertiser behaviors, therefore it can outperform MFM which is simply based on counting. However, RM does not consider the rationality levels in its formulation, and therefore it cannot fit the data as well as our proposed model. This indicates the importance of modeling advertiser rationality when predicting their bid strategy changes.

In addition to the average results, we give some example queries and their corresponding NLL and AEE on the 960th auction in P3 in Table 5 and Table 6. The best scores are shown in bold. At first glance, we see that RAB-WCC achieves the first position on most of the example queries, while RAB-WCC-D and RAB-WC achieve the first position on the remaining ones. In most cases, RBM performs the worst, and RM performs moderately.

To sum up, we can conclude that the proposed RAB-WCC method can predict the advertisers’ bid strategies with the best accuracy among all the models under investigation.

To further test the performance of our model, we apply it to the tasks of click number prediction and revenue prediction.$^{11}$ We compare our model with two state-of-the-art models on these tasks. The first baseline is the Structural Model in Sponsored Search [2], abbreviated as SMSS-1. The second is the Stochastic Model in Sponsored Search [17], abbreviated as SMSS-2. SMSS-1 calculates the expected number of clicks and the expected expenditure for each advertiser under some uncertainty assumptions about the sponsored search marketplace. SMSS-2 assumes that all the advertisers’ bids are independent and identically distributed, and learns the distribution by mixing all the advertisers’ historical bids.

We use the relative error and the absolute error with respect to the real click numbers and revenue in the test period as the evaluation metrics. Specifically, suppose the value output by the model is $\hat{\varphi}$ and the ground truth value is $\varphi$; then the absolute error and the relative error are calculated as $|\hat{\varphi} - \varphi|$ and $|\hat{\varphi} - \varphi|/\varphi$, respectively. The performances of all the models under investigation are listed in Table 7.

According to the table, we can clearly see that RAB-WCC performs better than both SMSS-1 and SMSS-2. The absolute errors on click number and revenue made by SMSS-1 are very large compared with the other methods. The relative errors made by SMSS-1 are larger than 50% for both click number and revenue prediction, which is not good enough for practical use. The relative error made by SMSS-2 for revenue prediction is even larger than 80%. In contrast, our proposed RAB-WCC method yields relative errors of no more than 20% for both click and revenue prediction (and the absolute errors are also small). Although the results might need further improvement, a 20% prediction error already provides a quite useful reference for the search engine to make decisions.

Besides the randomized bid strategy and the strategy of selecting the most frequently used bid, there are a number of works on advertiser modeling in the literature. Early work studies some simple cases in sponsored search, such as auctions with only two advertisers and auctions in which the advertisers adjust their bids in an alternating manner [1] [21] [18]. Later on, greedy methods were used to model advertiser behaviors. For example, in the random greedy bid strategy [4], an advertiser chooses a bid for the next round of auction that maximizes his/her utility, by assuming that the bids of all the other advertisers in the next round will remain the same as in the previous round. In the locally-envy-free bid strategy [10] [16], each advertiser selects the optimal bid price that leads to a certain equilibrium called the locally-envy-free equilibrium. In [6], the advertiser bid strategies are modeled using the knapsack problem. The competitor-busting greedy bid strategy [22] assumes that an advertiser will bid as high as possible while retaining his/her desired ad slot, in order to make the competitors pay as much as possible and thus exhaust their advertising resources. Other similar work includes the low-dimensional bid strategy [20], the restricted balanced greedy bid strategy [4], and the altruistic greedy bid strategy [4]. In [5], a model that predicts the bid strategy to be the best response is proposed, by assuming the advertisers know all the competitors’ bids in the previous auction. In [8], a linear regression model is used based on a group of advertiser behavior features. In addition, a bid strategy based on incremental cost per click is discussed in [19]

$^{11}$After outputting the bid prediction, we simulated the auction process based on the predicted bids and estimated the revenue and clicks from the simulation results.

Table 4: Prediction performance

| Model | RBM | MFM | BRM | RM | RAB-WC | RAB-WCC-D | RAB-WCC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NLL | 3.939 | 1.420 | 2.154 | 1.289 | 1.135 | 1.056 | 1.018 |
| AEE | 35.392 | 34.748 | 77.526 | 40.397 | 14.616 | 10.553 | 8.876 |

Table 5: Prediction performance on some example queries (NLL)

| Model | RBM | MFM | BRM | RM | RAB-WC | RAB-WCC-D | RAB-WCC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| car insurance | 3.067 | 1.198 | 1.777 | 2.468 | 0.995 | **0.975** | **0.975** |
| disney | 2.169 | 0.541 | 2.592 | 0.300 | **0.130** | 0.140 | **0.130** |
| ipad | 4.457 | 1.288 | 2.075 | 0.747 | 0.315 | 0.325 | **0.310** |
| jcpenney | 2.089 | 0.511 | 3.213 | 0.487 | 0.263 | 0.351 | **0.262** |
| medicare | 3.649 | 1.466 | 1.750 | 2.866 | 1.125 | 1.127 | **1.121** |
| stock market | 5.068 | 1.711 | 2.100 | 1.839 | 1.373 | **1.349** | 1.362 |

[2], which proves that an advertiser’s utility is maximized when he/she bids the amount at which his/her value per click equals the incremental cost per click.$^{12}$

However, please note that most of the above works assume that the advertisers have the same rationality and intelligence in choosing the best response to optimize their utilities. Therefore they differ significantly from our work. Actually, to the best of our knowledge, there is no previous work on advertiser behavior modeling that considers different aspects of advertiser rationality.

In this work, we have proposed a novel advertiser model which explicitly considers different levels of rationality of an advertiser. We have applied the model to the real data from a commercial search engine and obtained better accuracy than the baseline methods, in bid prediction, click number prediction, and revenue prediction.

As for future work, we plan to work on the following aspects.

  • First, in Section 2.1, we have assumed that the auctions for different keywords are independent of each other. However, in practice, an advertiser may bid on multiple keywords simultaneously, and his/her strategies for these keywords may be dependent. We will study this more complex setting in the future.
  • Second, we will study the equilibrium in the auction given the new advertiser model. Most previous work on equilibrium analysis is based on the assumption of advertiser rationality. When we change this foundation, the equilibrium needs to be re-investigated.
  • Third, we will apply the advertiser model in the function modules in sponsored search, such as bid keyword suggestion, ad selection, and click prediction, to make these modules more robust against the second-order effect caused by the advertiser behavior changes.
  • Fourth, we will consider the application of the advertiser model in the auction mechanism design. That is, given the advertiser model, we may learn an optimal auction mechanism using a machine learning approach.

In the appendix, we discuss some properties of the proposed model. Firstly, we give a theorem on the relationship of true value and bid. Secondly, we give a theorem related to the estimation accuracy of the true value.

We discuss the relationship between the true value $v_l$ and our predicted bid strategy. Note that we mainly focus on the results from the capability function, because neither the willingness function nor the constraint function is affected by the true value $v_l$, according to their definitions. For this purpose, by setting $W_l=1$ and $C_l$ as the identity function in $\boldsymbol{\pi}_l$, we define:

$$\mathbf{F}(v_l) = E_{\mathbf{o}_{-l}} \left\{ \arg\max_{\boldsymbol{\pi}_l^{(l)}} \left(\overline{\alpha}_j (v_l - o_{\hat{l}}\, \overline{s}_{\hat{l}} / \overline{s}_l)\right) \right\},$$

$$E(v_l) = (b_1, b_2, \ldots, b_B)\, (\mathbf{F}(v_l))^T.$$

Here $\mathbf{F}(v_l)$ is a $B$-dimensional strategy vector and $E(v_l)$ is the average bid of the strategy $\mathbf{F}(v_l)$. Under the very common assumption that the ad position effect $\alpha_j$ decreases with the slot index $j$, Theorem 2 shows that an advertiser with a higher true value will generally set a higher bid to optimize his/her utility, which is consistent with intuition. This conclusion shows the consistency of our model in the capability part.
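The intuition behind Theorem 2 can be checked by brute force on a toy instance (all numbers and names below are illustrative assumptions, with a simplified GSP payment in which the bottom slot is free):

```python
def best_bids(v, bids, scores, alpha, s_l=1.0):
    """Brute-force the set of utility-maximizing bids for true value v,
    with fixed competitor ranking scores (sorted descending) and
    decreasing position effects alpha; the bottom slot is assumed free."""
    best_u, best = None, []
    for b in bids:
        j = sum(1 for s in scores if s > b * s_l)          # slot index of l
        pay = scores[j] / s_l if j < len(scores) else 0.0  # GSP price per click
        u = alpha[j] * (v - pay)
        if best_u is None or u > best_u + 1e-12:
            best_u, best = u, [b]
        elif abs(u - best_u) <= 1e-12:
            best.append(b)                                 # keep ties
    return best
```

Averaging the returned set for increasing values of `v` gives a nondecreasing sequence, matching the monotonicity claimed by the theorem.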

THEOREM 2. Assume $\alpha_j$ decreases in $j$; then $E(v_l)$ is monotone nondecreasing in $v_l$.

PROOF. To prove that $E(v_l)$ is monotone nondecreasing, we only need to prove that $\forall \mathbf{o}_{-l}$ and $\forall \Delta > 0$,

$$(b_{1}, b_{2}, \ldots, b_{B}) \left( \arg\max_{\boldsymbol{\pi}_{l}^{(l)}} \left(\overline{\alpha}_{j} \left(v_{l}(1 + \Delta) - o_{\hat{l}}\, \overline{s}_{\hat{l}} / \overline{s}_{l}\right)\right) \right) \geq (b_{1}, b_{2}, \ldots, b_{B}) \left( \arg\max_{\boldsymbol{\pi}_{l}^{(l)}} \left(\overline{\alpha}_{j} \left(v_{l} - o_{\hat{l}}\, \overline{s}_{\hat{l}} / \overline{s}_{l}\right)\right) \right), \quad (4)$$

and then the "$\geq$" is preserved when taking the expectation over $\mathbf{o}_{-l}$. We denote by $j^{\Delta}$ and $j^0$ the best ranks of $l$ for the cases where the true values are $v_l(1+\Delta)$ and $v_l$, respectively. Here $\mathbf{o}_{-l}$ is fixed, and "best rank" means the rank that leads to the optimal utility.

$^{12}$Incremental cost per click is defined as the advertiser’s average cost of additional clicks received at a better ad slot.

Table 6: Prediction performance on some example queries (AEE)

| Model | RBM | MFM | BRM | RM | RAB-WC | RAB-WCC-D | RAB-WCC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| car insurance | 89.459 | 89.883 | 305.703 | 107.335 | 33.207 | 22.188 | **12.760** |
| disney | 5.019 | 4.895 | 9.703 | 0.217 | 0.297 | 0.171 | **0.140** |
| ipad | 16.355 | 15.428 | 30.856 | 0.662 | 0.975 | 0.458 | **0.385** |
| jcpenney | 5.036 | 5.145 | 16.337 | 1.476 | 1.411 | 1.165 | **0.209** |
| medicare | 98.206 | 99.014 | 225.774 | 111.248 | 20.221 | 16.695 | **3.744** |
| stock market | 37.576 | 38.360 | 72.640 | 97.035 | 5.824 | 4.137 | **1.486** |

We denote by $\hat{l}_{\Delta}$ and $\hat{l}_{0}$ the advertisers who rank at $(j^{\Delta}+1)$ and $(j^{0}+1)$, respectively. Note that for a fixed $\mathbf{o}_{-l}$, $j^{\Delta}$ and $j^{0}$ can be different due to the different true values of $l$. If we can prove $j^{\Delta} \leq j^{0}$, then inequality (4) will be valid, since a best rank that does not worsen yields a best bid strategy that does not decrease.

As $j^0$ is the best rank for the true value $v_l$, we have

$$\overline{\alpha}_{j^0}(v_l - o_{\hat{l}_0} \overline{s}_{\hat{l}_0} / \overline{s}_l) \ge \overline{\alpha}_{j^{\Delta}}(v_l - o_{\hat{l}_{\Delta}} \overline{s}_{\hat{l}_{\Delta}} / \overline{s}_l). \quad (5)$$

Assume, for contradiction, that $j^{\Delta} > j^0$. Since $\alpha_j$ decreases in $j$ and $\Delta > 0$, we have

\overline{\alpha}_{j^0} v_l \Delta > \overline{\alpha}_{j^{\Delta}} v_l \Delta. \tag{6}

Adding (5) and (6), we get

$$\overline{\alpha}_{j^0}(v_l(1+\Delta) - o_{\hat{l}_0}\overline{s}_{\hat{l}_0}/\overline{s}_l) > \overline{\alpha}_{j^\Delta}(v_l(1+\Delta) - o_{\hat{l}_\Delta}\overline{s}_{\hat{l}_\Delta}/\overline{s}_l).$$

This inequality reveals that $j^0$ is a better rank than $j^{\Delta}$, so $j^{\Delta}$ cannot be the best rank for the true value $v_l(1+\Delta)$, which contradicts the definition of $j^{\Delta}$. Therefore, the assumption $j^{\Delta} > j^0$ is not valid, which finishes our proof of this theorem. $\square$

As discussed in Section 4, we choose the model in [9] for true value prediction. Usually, the estimation is not perfect and there may be some errors. Fortunately, we can prove a theorem which guarantees that the solution of our model remains accurate if the estimation errors are not very large. This holds because the payment rule of GSP is discrete, and it therefore tolerates small perturbations of the true value.

Before introducing the theorem, we first give some notations. For a fixed $\mathbf{o}_{-l}$ and true value $v_l$, $l$'s best rank is denoted as $BR_{\mathbf{o}_{-l}}$ (Best Rank), the optimal utility is denoted as $BU_{\mathbf{o}_{-l}}$, and the ranking score of $\hat{l}$ (the one ranked next to $l$) in the optimal case is denoted as $BS_{\mathbf{o}_{-l}}$. To describe the theorem, we also denote the second optimal utility as $SU_{\mathbf{o}_{-l}}$ (Second Utility), which is the largest utility less than $BU_{\mathbf{o}_{-l}}$ for the fixed $\mathbf{o}_{-l}$.

THEOREM 3. Assume that $\alpha_j$ decreases in $j$, and set $\theta = \max_{\mathbf{o}_{-l}}(\frac{SU_{\mathbf{o}_{-l}}}{BU_{\mathbf{o}_{-l}}})$, $\rho = \max_{\mathbf{o}_{-l}}(BR_{\mathbf{o}_{-l}})$, and $\omega = \max_{\mathbf{o}_{-l}}(BS_{\mathbf{o}_{-l}})$, with $v_l - \omega/\overline{s}_l > 0$. Let $v_l$ increase by $\Delta v_l$ ($\Delta \in \mathbb{R}$); then $\mathbf{F}(v_l)$ will remain unchanged if $|\Delta| \leq \frac{\alpha_{\rho}}{\alpha_{1}}(1-\theta)(1-\frac{\omega}{\overline{s}_{l}v_{l}})$, where $\alpha_{1}$ is the CTR at the first position.

To prove that this bound on $\Delta$ keeps $\mathbf{F}(v_l)$ unchanged, we prove the following lemma instead.

LEMMA 1. If $\Delta$ satisfies $|\Delta| \leq \frac{\alpha_{\rho}}{\alpha_{1}}(1-\theta)(1-\frac{\omega}{\overline{s}_{l}v_{l}})$, then $\forall \mathbf{o}_{-l}$ we have
$$\arg\max_{\boldsymbol{\pi}_{l}^{(l)}}\overline{\alpha}_{j}(v_{l}(1+\Delta)-o_{\hat{l}}\overline{s}_{\hat{l}}/\overline{s}_{l}) = \arg\max_{\boldsymbol{\pi}_{l}^{(l)}}\overline{\alpha}_{j}(v_{l}-o_{\hat{l}}\overline{s}_{\hat{l}}/\overline{s}_{l}).$$

The proof of Theorem 3 then follows at once by summing over all the cases of $\mathbf{o}_{-l}$ in Lemma 1.

Table 7: Prediction performance in applications

| | SMSS-1 | SMSS-2 | RAB-WCC |
| --- | --- | --- | --- |
| Click number (relative error) | 0.52 | 0.11 | 0.19 |
| Click number (absolute error) | 2.02 | 0.71 | 0.23 |
| Revenue (relative error) | 0.54 | 0.83 | 0.18 |
| Revenue (absolute error) | 659.06 | 124.80 | 25.75 |

PROOF. Since a change of "$\arg\max_{\boldsymbol{\pi}_l^{(l)}}$" is equivalent to a change of $BR_{\mathbf{o}_{-l}}$, we consider the critical point at which the increase of $\Delta$ makes the best rank transfer exactly from $j^0$ to $j^\Delta$ ($j^0 \neq j^\Delta$). Thus, there exists $j^\Delta$ ($j^\Delta \neq j^0$) such that $j^\Delta$ and $j^0$ simultaneously maximize $\overline{\alpha}_j(v_l(1+\Delta) - o_{\hat{l}}\overline{s}_{\hat{l}}/\overline{s}_l)$, and then we can get,

\overline{\alpha}_{j^{\Delta}}(v_l(1+\Delta) - o_{\hat{l}_{\Delta}}\overline{s}_{\hat{l}_{\Delta}}/\overline{s}_l) = \overline{\alpha}_{j^0}(v_l(1+\Delta) - o_{\hat{l}_0}\overline{s}_{\hat{l}_0}/\overline{s}_l). \tag{7}

From equation (7) we have

Δ=αj0(vlol^0sl^0/sl)αjΔ(vlol^Δsl^Δ/sl)vl(αjΔαj0).\Delta = \frac{\overline{\alpha}_{j^0} (v_l - o_{\hat{l}_0} \overline{s}_{\hat{l}_0} / \overline{s}_l) - \overline{\alpha}_{j^{\Delta}} (v_l - o_{\hat{l}_{\Delta}} \overline{s}_{\hat{l}_{\Delta}} / \overline{s}_l)}{v_l (\alpha_{j^{\Delta}} - \alpha_{j^0})}. (8)

Assume there is a θ0\theta_0 such that

$$\overline{\alpha}_{j^{\Delta}}(v_l - o_{\hat{l}_{\Delta}}\overline{s}_{\hat{l}_{\Delta}}/\overline{s}_l) = \theta_0\, \overline{\alpha}_{j^{0}}(v_l - o_{\hat{l}_0}\overline{s}_{\hat{l}_0}/\overline{s}_l). \quad (9)$$

Then equation (8) can be rewritten as

\Delta = \frac{(1 - \theta_0)\overline{\alpha}_{j^0}(v_l - o_{\hat{l}_0}\overline{s}_{\hat{l}_0}/\overline{s}_l)}{v_l(\alpha_{j^{\Delta}} - \alpha_{j^0})} = (1 - \theta_0)\frac{\alpha_{j^0}}{\alpha_{j^{\Delta}} - \alpha_{j^0}}\left(1 - \frac{o_{\hat{l}_0}\overline{s}_{\hat{l}_0}/\overline{s}_l}{v_l}\right). \tag{10}

Since $j^0$ is the best rank, from equation (9) we have

\theta_0 = \frac{\overline{\alpha}_{j^{\Delta}} (v_l - o_{\hat{l}_{\Delta}} \overline{s}_{\hat{l}_{\Delta}} / \overline{s}_l)}{\overline{\alpha}_{j^0} (v_l - o_{\hat{l}_0} \overline{s}_{\hat{l}_0} / \overline{s}_l)} \le \frac{SU_{o_{-l}}}{BU_{o_{-l}}} \le \theta < 1. \tag{11}

In addition, there holds

j^{0} \leq \max_{o_{-l}}(BR_{o_{-l}}) = \rho, \quad \alpha_{j^{0}} \geq \alpha_{\rho}, \quad \left|\frac{\alpha_{j^{0}}}{\alpha_{j^{\Delta}} - \alpha_{j^{0}}}\right| > \frac{\alpha_{\rho}}{\alpha_{1}}, \tag{12}

o_{\hat{l}_0} \overline{s}_{\hat{l}_0} = BS_{o_{-l}} \le \max_{o_{-l}} (BS_{o_{-l}}) = \omega. \tag{13}

Combining (10) with (11), (12), and (13), we finally have

|\Delta| = |1 - \theta_0| \frac{\alpha_{j^0}}{|\alpha_{j^{\Delta}} - \alpha_{j^0}|} \left(1 - \frac{o_{\hat{l}_0} \overline{s}_{\hat{l}_0}}{\overline{s}_l v_l}\right) > \frac{\alpha_{\rho}}{\alpha_1} (1 - \theta) \left(1 - \frac{\omega}{\overline{s}_l v_l}\right).

Since the above $\Delta$ is the critical point, for any fixed $o_{-l}$, if $|\Delta| \leq \frac{\alpha_{\rho}}{\alpha_{1}}(1-\theta)\left(1-\frac{\omega}{\overline{s}_l v_l}\right)$, then $BR_{o_{-l}}$ and "$argmax_{\pi_l^{(l)}}$" remain unchanged. This completes the proof of Lemma 1.
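To illustrate what Lemma 1 guarantees, the following sketch builds a small toy GSP instance (the numbers are hypothetical, not from the paper's data) and checks that every perturbation $\Delta$ within a bound of the stated form leaves the utility-maximizing rank unchanged. Here $\theta$ and $\omega$ are specialized to a single fixed $o_{-l}$, with $\overline{s}_l$ normalized to 1:

```python
# Toy instance: alpha[j] is the click probability of rank j (decreasing),
# cost[j] the per-click payment the advertiser would face at that rank.
alpha = [0.30, 0.20, 0.12, 0.05]
cost = [0.90, 0.60, 0.35, 0.10]
v = 1.0  # value v_l, with the quality score s_l normalized to 1

def best_rank(delta):
    """Rank maximizing alpha_j * (v*(1+delta) - cost_j)."""
    utils = [a * (v * (1 + delta) - c) for a, c in zip(alpha, cost)]
    return max(range(len(utils)), key=utils.__getitem__)

j0 = best_rank(0.0)
utils = [a * (v - c) for a, c in zip(alpha, cost)]
# Lemma-style bound (alpha_rho/alpha_1)(1-theta)(1-omega/v): theta is the
# second-best/best utility ratio, omega the bid-score paid at the best rank.
theta = sorted(utils)[-2] / utils[j0]
omega = cost[j0]
bound = (alpha[j0] / alpha[0]) * (1 - theta) * (1 - omega / v)

# Any |delta| <= bound must leave the best rank unchanged.
for k in range(-100, 101):
    assert best_rank(bound * k / 100.0) == j0
print("best rank stable for all |delta| <=", bound)
```

The bound is conservative: in this instance the best rank actually survives perturbations well beyond the computed value, which is consistent with the lemma providing only a sufficient condition.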

  • [1] K. Asdemir. Bidding patterns in search engine auctions. In Second Workshop on Sponsored Search Auctions, ACM Electronic Commerce, 2006.

  • [2] S. Athey and D. Nekipelov. A structural model of sponsored search advertising auctions, 2010. Available at http://groups.haas.berkeley.edu/marketing/sics/pdf_2010/paper_athey.pdf.

  • [3] A. Broder, E. Gabrilovich, V. Josifovski, G. Mavromatis, and A. Smola. Bid generation for advanced match in sponsored search. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, WSDM '11, pages 515–524, New York, NY, USA, 2011. ACM.

  • [4] M. Cary, A. Das, B. Edelman, I. Giotis, K. Heimerl, A. R. Karlin, C. Mathieu, and M. Schwarz. Greedy bidding strategies for keyword auctions. In EC '07: Proceedings of the 8th ACM Conference on Electronic Commerce. ACM, 2007.

  • [5] M. Cary, A. Das, B. G. Edelman, I. Giotis, K. Heimerl, A. R. Karlin, C. Mathieu, and M. Schwarz. On best-response bidding in GSP auctions, 2008. Available at http://www.hbs.edu/research/pdf/08-056.pdf.

  • [6] D. Chakrabarty, Y. Zhou, and R. Lukose. Budget constrained bidding in keyword auctions and online knapsack problems. In WWW '07: Proceedings of the 16th International Conference on World Wide Web. ACM, 2007.

  • [7] Y. Chen, G.-R. Xue, and Y. Yu. Advertising keyword suggestion based on concept hierarchy. In WSDM '08: Proceedings of the International Conference on Web Search and Web Data Mining. ACM, 2008.

  • [8] Y. Cui, R. Zhang, W. Li, and J. Mao. Bid landscape forecasting in online ad exchange marketplace. In KDD '11: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2011.

  • [9] Q. Duong and S. Lahaie. Discrete choice models of bidder behavior in sponsored search. In Workshop on Internet and Network Economics (WINE), 2011.

  • [10] B. Edelman, M. Ostrovsky, and M. Schwarz. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. The American Economic Review, 2007.

  • [11] T. Graepel, J. Q. Candela, T. Borchert, and R. Herbrich. Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft's Bing search engine. In Proceedings of the 27th International Conference on Machine Learning, 2010.

  • [12] J. C. Harsanyi. Games with incomplete information played by Bayesian players, I–III. Management Science, 1967/1968.

  • [13] J. Jansen. Understanding Sponsored Search: Core Elements of Keyword Advertising. Cambridge University Press, 2011.

  • [14] B. Kitts and B. Leblanc. Optimal bidding on keyword auctions. Electronic Markets, Routledge, 2004.

  • [15] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, Springer-Verlag New York, 1989.

  • [16] C. Nittala and Y. Narahari. Optimal equilibrium bidding strategies for budget constrained bidders in sponsored search auctions. Operational Research: An International Journal, Springer, 2011.

  • [17] F. Pin and P. Key. Stochastic variability in sponsored search auctions: Observations and models. In EC '11: Proceedings of the 12th ACM Conference on Electronic Commerce. ACM, 2011.

  • [18] S. S. S. Reddy and Y. Narahari. Bidding dynamics of rational advertisers in sponsored search auctions on the web. In Proceedings of the International Conference on Advances in Control and Optimization of Dynamical Systems, 2007.

  • [19] H. R. Varian. Position auctions. International Journal of Industrial Organization, 2006.

  • [20] Y. Vorobeychik. Simulation-based game theoretic analysis of keyword auctions with low-dimensional bidding strategies. In UAI '09: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2009.

  • [21] X. Zhang and J. Feng. Finding Edgeworth cycles in online advertising auctions. In Proceedings of the 26th International Conference on Information Systems (ICIS), 2006.

  • [22] Y. Zhou and R. Lukose. Vindictive bidding in keyword auctions. In ICEC '07: Proceedings of the Ninth International Conference on Electronic Commerce. ACM, 2007.