Modeling Competition And Its Impact On Paid Search Advertising
Paid search has become the mainstream platform for online advertising, further intensifying competition between advertisers. The main objective of this research is twofold. On the one hand, we want to understand, in the context of paid-search advertising, the effects of competition (measured by the number of ads on the paid-search listings) on click volume and the cost per click (CPC) of paid-search ads. On the other hand, we are interested in understanding the determinants of competition, that is, how various demand and supply factors affect the entry probability of firms and, consequently, the total number of entrants for a keyword. We regard each keyword as a market and build an integrative model consisting of three key components: (i) the realized click volume of each entrant as a function of the baseline click volume and the decay factor; (ii) the vector of realized CPCs of those entrants as a function of the decay factor and the order statistics of the value per click at an equilibrium condition; and (iii) the number of entrants, modeled as the number of potential entrants multiplied by the entry probability, where the entry probability is determined by the expected revenue (a function of expected click volume, CPC, and value per click) and the entry cost at the equilibrium condition of an incomplete information game. The proposed modeling framework entails several econometric challenges. To cope with these challenges, we develop a Bayesian estimation approach to make model inferences. Our proposed model is applied to a data set of 1,573 keywords associated with digital camera/video and their accessories with full information on competition. Our empirical analysis indicates that the number of competing ads has a significant impact on the baseline click volume, decay factor, and value per click. These findings help paid-search advertisers assess the impact of competition on their entry decisions and advertising profitability. In the counterfactual analysis, we investigate the profit implication of two policies for the paid-search host: raising the decay factor by encouraging consumers to engage in more in-depth search/click-through and providing coupons to advertisers.
Key words: paid-search advertising; competition; Internet marketing; Bayesian estimation
History: Received: January 6, 2012; accepted: July 2, 2013; Preyas Desai served as the editor-in-chief and K. Sudhir served as associate editor for this article. Published online in Articles in Advance October 24, 2013.
1. Introduction
Paid (or sponsored) search has grown rapidly during the last decade, driving the Internet to become the second-largest medium for advertising spending in the United States. This type of advertising format not only provides the largest source of revenue to traditional search engines such as Google but also has been extended to other business platforms such as online retailers (e.g., Amazon) and online market makers (e.g., eBay, Priceline) serving as a host for paid-search advertising. The paid-search host plays the role of directing users to relevant sponsored ads based on user-generated queries. When an Internet user enters a query, she receives search results containing both the organic links and paid links. If a user clicks on a paid link, she is directed to the advertiser’s site, and the advertiser pays the search host a fee (i.e., cost per click, or CPC) for sending a potential customer.
As paid search becomes the mainstream platform for online advertising today, competition further intensifies. In paid-search advertising, a keyword is often regarded as a market reflecting a unique pattern of demand and supply. The demand captures the click volume of each ad on the sponsored search listings for that keyword, whereas the supply captures advertisers’ decision of whether to enter the market (by purchasing the keyword) and how much to bid for their ads. Intuitively, the number of entrants or competing ads appearing on the paid-search listings for a given keyword affects consumer search and buying behavior, which will in turn influence advertisers’ expected click volume, value per click, and CPC for entering such a market. At the same time, the number of entrants is related to advertisers’ entry probability, which is often determined by the expected profit from paid-search advertising.
The main objective of this research is twofold. First, we want to understand, in the context of paid-search advertising, the effects of competition (the number of competing ads) on three key latent constructs that determine click volume and cost per click of paid-search ads: baseline click volume, decay factor, and value per click, where the decay factor can be interpreted as the conditional probability for consumers to click on the next ad on the paid-search listings. Second, we are interested in understanding the determinants of competition, that is, how various demand and supply factors affect the entry probability of firms and, as a result, the total number of entrants for a keyword.
The understanding of competition and its impact is important to both advertisers and the paid-search host. On the one hand, it helps advertisers more precisely evaluate the impact of competition on their expected profit and thus make better decisions about keyword choices. Furthermore, studying the separate effects of competition on the three key constructs can help improve paid-search advertisers’ bidding effectiveness in the generalized second-price (GSP) auction. For example, advertisers should adjust their bids based on the number of competitors if competition affects click volume, decay factor, and/or value per click. This is because these parameters play a major role in determining the equilibrium CPCs in the GSP auction (Edelman et al. 2007, referred to as EOS hereafter). On the other hand, this analysis helps the paid-search host better understand how competition affects its own profit. If the paid-search host does benefit from competition, what policy changes can be made to improve its profit by influencing the competition? Our research provides a general framework to help the search host address this important question.
Utilizing a unique data set with full information on competition, we propose a structural framework to characterize competition and analyze its impact on click volume and CPC. Specifically, our integrative modeling framework has three major components: (i) the realized click volume of each entrant as a function of the baseline click volume and the decay factor; (ii) the vector of realized CPCs of those entrants as a function of the decay factor and the order statistics of the value per click at an equilibrium condition of the GSP auction; and (iii) the number of entrants, modeled as the product of the number of potential entrants and the entry probability, where the entry probability is determined by the expected revenue (a function of expected click volume, CPC, and value per click) and the entry cost at the equilibrium condition of an incomplete information game.
We make several modeling contributions. Unlike most of the previous studies where click volume and CPC are modeled in a reduced-form fashion, we structurally model these two variables to allow inferences of the key underlying structural parameters. We also structurally model the number of competing ads as a reflection of both demand and supply conditions. The proposed modeling framework entails several econometric challenges. Specifically, order statistics stemming from the unobserved value per click induce cross-position correlation in the CPCs of those entrants. Furthermore, the occurrence of the decay factor in both models of click volume and CPC requires a joint estimation. Finally, the number of entrants takes an implicit functional form, requiring numerical calculation of the Jacobian to simulate the likelihood function. To cope with these challenges, we develop a Bayesian estimation approach to make model inferences.
Several key findings emerge from our analysis. These include (i) the number of entrants (ads) positively affects the baseline click volume, (ii) the number of entrants has an inverse-U relationship with the mean decay factor, (iii) the number of entrants has a negative and convex relationship with the mean value per click of a keyword, and (iv) competition generally hurts advertisers but benefits the paid-search host.
Our structural analysis provides the paid-search host with some guidelines to improve its profitability. We conduct two counterfactual analyses as a demonstration. First, we show that the paid-search host could raise the decay factor by encouraging users to engage in more in-depth search/click-through on paid-search listings; such a policy change could help the paid-search host increase profit. Second, our analysis can also help the search host determine the optimal face value of entry coupons distributed to advertisers to increase the search host’s profit.
The rest of this paper proceeds as follows. Section 2 reviews the relevant literature and positions our study in relation to previous studies. Section 3 describes the data and background information and develops the model. Section 4 provides an empirical application where we apply the model to real-world data collected from a large paid-search advertising host, and we discuss our findings. Section 5 provides managerial implications and presents two counterfactual analyses. Section 6 concludes this paper.
2. Relevant Literature
Our work is based on the growing literature on paid-search advertising. A series of papers have analytically examined advertisers’ bidding behavior in the GSP auction. EOS and Varian (2007) are the first two papers that characterize the bidding equilibrium in the GSP auction. EOS proved that the GSP auction is incentive incompatible; that is, bidding one’s true value is not optimal. By studying a corresponding generalized English auction, they found that there exists a unique envy-free Bayes Nash equilibrium. Moreover, the ex post bids corresponding to this generalized English auction also satisfy the Nash equilibrium conditions of the GSP auction. Varian (2007) independently derived a similar equilibrium condition, which shows that the vector of equilibrium bids of the GSP auction can be expressed in a recursive form. The theoretical study on the GSP auction is further developed by Katona and Sarvary (2010), who extended the model to account for the heterogeneity of click-through rate (i.e., the ratio of actual clicks to the number of impressions) across competing ads and built the link between sponsored ads and organic ads. Athey and Nekipelov (2012) recently introduced advertisers’ uncertainty in quality scores to the GSP auction and presented theoretical conditions for the existence of a unique Nash equilibrium. They also proposed a computation algorithm to infer the bounds of bidders’ valuations and applied their method to the historical data of several keywords as a demonstration.
These studies have provided important theoretical foundations for modeling advertisers’ bidding strategies in the GSP auction. However, the theoretical results regarding paid-search advertisers’ bidding behavior have not been empirically investigated. In this paper, we characterize the CPC formation based on the equilibrium condition provided by EOS and Varian (2007). We jointly estimate the CPC, click volume, and the number of entrant advertisers, and we infer the distributions of baseline click volume, mean value per click, and mean decay factor. We also identify keyword characteristics that affect these three parameters. Our proposed structural model allows us to derive insights that cannot be obtained from a reduced-form model and fits the data better.
On the empirical side, several papers have examined marketing-related issues in the context of paid-search advertising. For example, Ghose and Yang (2009) simultaneously modeled click-through, conversion, CPC, and ad position using keyword-specific data from one retailer. Yang and Ghose (2010) extended their previous work by analyzing consumer click-through and conversion on both sponsored search listings and organic search listings for the same keyword. Rutz and Bucklin (2011) explored the potential spillover effects between activities associated with generic and branded keywords in paid-search advertising, using data from a hotel chain. Goldfarb and Tucker (2010) empirically studied the price variation on paid-search ads related to a legal service on Google and found evidence of substitution of online advertising for off-line advertising. These aforementioned studies generally employed a reduced-form approach and focused on predicting paid-search ad performances.
Few studies have empirically examined the underlying competition of paid-search advertising. Two important pieces of work need to be mentioned here. Chan and Park (2013) studied the influence of sequential search behaviors of consumers on the value of click-throughs in sponsored search advertising. They modeled the position competition in the context of a first-price auction with a buy-it-now option, which allows advertisers to acquire a position without submitting a bid. Since a unique equilibrium cannot be obtained in such an auction mechanism, they used the moment-inequality estimation approach to avoid imposing restrictive assumptions on equilibrium selection and infer advertisers’ value per click from the observed ad positions. Yao and Mela (2011) also modeled the position competition in the first-price auction with a sorting/filtering function available to users. They emphasized the dynamics in forward-looking advertisers’ bidding strategies and used the Markov perfect equilibrium to characterize advertisers’ bids. They estimated the model by applying the two-step estimators developed by Bajari et al. (2007), assuming the existence and uniqueness of the equilibrium.
Our paper differs from this line of work in several ways. First, our paper focuses on the effect of competition (the number of competing ads) on click volume and CPC through three latent variables: baseline click volume, decay factor, and value per click, whereas previous studies have not examined the separate effects of competition on these variables. Second, we develop an integrative model of click volume, CPC, and number of entrants. However, the previous two studies have not considered the entry decisions of advertisers and therefore have not modeled the number of entrants for a given keyword. Third, unlike the previous two studies, which looked at paid-search advertising in the first-price auction with a small number of ad positions, we study the GSP auction without capacity constraint, which is the most popular type of paid-search mechanism. Fourth, the characterization of CPC in our paper is built on the Nash equilibrium condition, which is theoretically proved and derived by EOS. This equilibrium condition enables us to more closely examine the realized CPCs of advertisers and estimate the distribution of value per click. We develop a Bayesian estimation algorithm to cope with the econometric challenges of the CPC model based on this equilibrium condition.
Knowing that the number of entrants not only affects click volume and CPC but also reflects the demand and supply conditions associated with a keyword, we model the number of entrants as an aggregate outcome of entry decisions made by potential entrants. Since we model advertisers’ simultaneous entry decisions, our paper is related to the literature on simultaneous-move games. The pioneering work in this line of research includes Bresnahan and Reiss (1990, 1991) and Berry (1992). The first two find that the multiple-equilibrium issue is prevalent in the simultaneous-move game, and they suggest that one way to bypass the multiple equilibrium is to focus on the total number of entrants in a market rather than the vector of individual entry decisions. One important finding in Berry (1992) is that the number of entrants is uniquely determined if the profit function strictly decreases with the number of entrants. Recently, more complicated simultaneous-move entry models have been developed by endogenizing either firms’ product differentiation (Mazzeo 2002, Seim 2006) or spatial differentiation (Zhu and Singh 2009), or both (Datta and Sudhir 2011), accounting for the spillover effect within a market (Vitorino 2012) or across markets (Jia 2008), and incorporating the effect of zoning regulations into market structure (Datta and Sudhir 2013).
Because of the large number of potential entrants in our empirical context, we follow the previous literature to model the entry decisions of advertisers as a simultaneous-move game with incomplete information (Seim 2006, Zhu and Singh 2009, Datta and Sudhir 2011, Vitorino 2012). In other words, the profitability of each advertiser in a keyword is private information, and only its distribution is common knowledge among competitors. We further assume advertisers to be symmetrical for the following reasons. First, because one of our main objectives is to study the impact of number of entrants on three underlying constructs of click volume and CPC, it is reasonable to make this assumption to build an internally consistent model. Second, following Berry’s (1992) idea to assume that advertiser’s expected profit is a decreasing function of number of entrants, we can prove the existence of a unique equilibrium for number of entrants. Finally, because of the heavy computational burden, current methods can only handle a small number of heterogeneous players (e.g., Datta and Sudhir 2011, Vitorino 2012). However, because there are a large number of potential entrants in our empirical context, it is infeasible to adopt the same modeling approach.
3. Data and Proposed Model
3.1. Description of the Data
We obtain data from a leading online market maker outside the United States who hosts paid-search advertising. We regard each keyword as a market, and consequently, an advertiser is named an entrant to a keyword if she decides to advertise her product using this keyword. For this paid-search host, ad positions are auctioned in the second-price fashion and are entirely determined by the rank of bids submitted by entered advertisers. Each advertiser then pays the highest bid among all bids below hers for each click. In other words, the auction mechanism used in our data is exactly the same as the GSP auction defined in EOS.
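As an illustration of the allocation and payment rule just described, the following minimal Python sketch ranks advertisers by bid and charges each one the next-highest bid per click. The function name `gsp_outcome` and the `reserve` parameter (the price paid by the lowest-ranked advertiser, who has no bid below hers) are assumptions made for this example and are not taken from the data provider.

```python
from typing import Dict, List, Tuple


def gsp_outcome(bids: Dict[str, float], reserve: float = 0.0) -> List[Tuple[int, str, float]]:
    """Rank advertisers by bid and charge each the next-highest bid per click.

    Returns (position, advertiser, cost_per_click) tuples, top position first.
    `reserve` is a hypothetical price for the lowest-ranked advertiser.
    """
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    outcome = []
    for position, (advertiser, _own_bid) in enumerate(ranked, start=1):
        # position is 1-based, so ranked[position] is the advertiser directly below.
        cpc = ranked[position][1] if position < len(ranked) else reserve
        outcome.append((position, advertiser, cpc))
    return outcome


# Illustrative bids in cents; the advertiser names are made up.
example = gsp_outcome({"a": 30.0, "b": 50.0, "c": 10.0})
```

In this example, advertiser `b` takes the top position and pays 30 cents per click, `a` takes the second position and pays 10 cents, and `c` pays the assumed reserve of zero.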
Our data include aggregate information on 1,573 keywords of digital camera/video products and related accessories in June 2010. There are 359 advertisers that advertised through a subset of these 1,573 keywords. According to the paid-search host, advertisers in that market often review their keyword lists and make purchase decisions monthly. These 1,573 keywords are further classified into three main categories: digital camera, digital video, and accessory. In addition to this primary categorization, each keyword also belongs to one or several of 44 subcategories.1
We create several keyword attributes. First, we define three variables: DV, Accessory, and Coverage, based on the categorical information of a keyword. The variable Coverage measures the number of subcategories a keyword belongs to, which indicates the market breadth of the keyword. Second, we define several keyword attributes based on the product-related information: Brand (whether the keyword contains a brand name), General (whether the keyword includes a general feature that could apply to different products), and Specific (whether the keyword includes a specific feature such as model/series number that exclusively refers to a product). In addition, we also have the length information for each keyword. The variable Length indicates the total number of characters included in the keyword. Finally, we create a dummy variable, Promotional, if a keyword includes promotional terms. Taking the keyword “Nikon D700 HD Cheap” as an example, a brand name (Nikon), a specific word (D700), a general feature (HD, for “high definition”), and a promotional term (Cheap) are included. Table 1 reports the summary information of keyword characteristics.
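The coding of the product-related attributes can be sketched as follows. This is only an illustration: the term lists are hypothetical stand-ins for whatever dictionaries were actually used, matching is done on whitespace-separated tokens, and counting Length without spaces is an assumption, since the text does not say how spaces are treated.

```python
from typing import Dict, Set


def keyword_features(
    keyword: str,
    brands: Set[str],
    general_terms: Set[str],
    specific_terms: Set[str],
    promo_terms: Set[str],
) -> Dict[str, int]:
    """Code the dummy attributes and Length for one keyword.

    All term sets are hypothetical and must contain lower-case entries.
    """
    words = keyword.split()
    tokens = {word.lower() for word in words}
    return {
        "Brand": int(bool(tokens & brands)),
        "General": int(bool(tokens & general_terms)),
        "Specific": int(bool(tokens & specific_terms)),
        "Promotional": int(bool(tokens & promo_terms)),
        # Assumption: spaces are not counted as characters.
        "Length": sum(len(word) for word in words),
    }


features = keyword_features(
    "Nikon D700 HD Cheap",
    brands={"nikon", "canon"},
    general_terms={"hd"},
    specific_terms={"d700"},
    promo_terms={"cheap"},
)
```

For the example keyword, all four dummies equal 1. The category-based variables (DV, Accessory, Coverage) are omitted because they depend on the host's classification rather than on the keyword text.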
For each keyword, the paid-search host informed us of the composition of its competition set (i.e., the set of potential advertisers). The selection is mainly based on two criteria. First, for keywords that link to a specific product or brand, the set of potential entrants includes advertisers that carry this product or brand in their stores associated with this paid-search host. Second, for a keyword that does not link to a specific product or brand, the set of potential entrants includes those who bought other keywords that share similar subcategories (e.g., professional digital camera, single-lens reflex) with it. As shown in Table 1, the number of entrants ranges from 5 to 25, and the number of potential entrants ranges from 5 to 306. On average, each keyword belongs to two subcategories. For each keyword, we have information on the aggregate click volume, average CPC, and average positions for each entered advertiser.2 Table 2 reports the summary statistics of the average click volume and CPC across keywords.

Table 1 Summary Statistics of Keyword Characteristics

| Variable | Mean | SD | Min | Max |
|---|---|---|---|---|
| DV | 0.11 | 0.31 | 0 | 1 |
| Accessory | 0.42 | 0.49 | 0 | 1 |
| Coverage | 2.04 | 1.91 | 1 | 17 |
| Length | 5.90 | 2.40 | 1 | 18 |
| Brand | 0.47 | 0.50 | 0 | 1 |
| General | 0.22 | 0.42 | 0 | 1 |
| Specific | 0.53 | 0.49 | 0 | 1 |
| Promotional | 0.08 | 0.27 | 0 | 1 |
| n | 9.10 | 4.40 | 5 | 25 |
| N_Potential | 119.00 | 61.00 | 5 | 306 |

Notes. All keyword characteristics including n are mean centered in our empirical implementation. The variables n and N_Potential are scaled by 10 and 100, respectively, in estimation.

1 Each subcategory can be regarded as a refined classification of the main category. For example, the keyword “Canon camera” belongs to two subcategories: the “ordinary digital camera” and “professional SLR (single-lens reflex) camera.”
3.2. Model Setup
To fix the context, we have $I$ advertisers and $K$ keywords. Potential entrants/advertisers of each keyword are indexed by $i$, and keywords are indexed by $k$. Let $C_k$ stand for the set of potential entrants of keyword $k$. To be consistent with the industry practice that advertisers choose the set of keywords on a monthly basis and then optimize their bids after entry, we model advertisers’ keyword selections and bid decisions as the following two-stage sequential process.
In the first stage, potential entrants $i \in C_k$ decide whether to purchase a specific keyword $k$. We model the entry of advertisers as a simultaneous-move game with incomplete information, in which advertisers possess private information about their own profitability. Since potential entrants do not observe realized click volume, CPCs, and value per click before entry, they are assumed to form expectations on these variables as well as the entry decisions of others. Based on these expected values, each potential entrant will form expectations on the advertising revenue and entry cost. We further assume that potential entrants are symmetrical, and the entry probability is determined by the expected revenue and the entry cost in a keyword. This microlevel process determines the total number of entrants for a keyword at the aggregate level. The equilibrium number of entrants is modeled as the number of potential entrants multiplied by the entry probability.

Table 2 Summary Statistics of Click Volume and CPC Across Keywords

| Variable | Mean | SD | Min | Max |
|---|---|---|---|---|
| Total click volume | 195.79 | 464.46 | 10 | 8,246 |
| Average click volume across positions | 17.88 | 33.28 | 1.25 | 515.38 |
| Average CPC across positions (in cents) | 14.53 | 5.36 | 5.27 | 53.40 |
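The first-stage logic, in which the number of entrants equals the number of potential entrants times an entry probability that itself depends on the number of entrants, can be sketched as a fixed-point problem. The code below is a sketch under stated assumptions rather than the specification estimated in this paper: the logistic form of the entry probability, the hypothetical `revenue` function, and the treatment of the number of entrants as a continuous quantity are all choices made for illustration. When expected revenue decreases in the number of entrants, the gap between implied and conjectured entrants crosses zero only once, so bisection finds the unique solution.

```python
import math
from typing import Callable


def entry_probability(expected_revenue: float, entry_cost: float) -> float:
    """Logistic entry probability (an assumed distribution of private shocks)."""
    return 1.0 / (1.0 + math.exp(-(expected_revenue - entry_cost)))


def equilibrium_entrants(
    n_potential: int,
    revenue: Callable[[float], float],
    entry_cost: float,
    tol: float = 1e-10,
) -> float:
    """Solve n = N * p(n) by bisection, assuming revenue(n) decreases in n."""
    lo, hi = 0.0, float(n_potential)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # Entrants implied by conjecture `mid`, minus the conjecture itself.
        gap = n_potential * entry_probability(revenue(mid), entry_cost) - mid
        if gap > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def example_revenue(n: float) -> float:
    """Hypothetical expected revenue that falls as more advertisers enter."""
    return 5.0 / (1.0 + n)


n_star = equilibrium_entrants(100, example_revenue, entry_cost=2.0)
```

Raising `entry_cost` in this sketch lowers the equilibrium number of entrants, which matches the intuition behind distributing entry coupons to advertisers.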
In the second stage, after entry decisions have already been made (i.e., the number of entrants n has been realized and becomes common knowledge), entrant advertisers now determine how much to bid. The equilibrium CPCs and ad positions are then determined by realizations of value per click. The value per click of each entrant advertiser is assumed to be private information. In the same setup, EOS proved that there exists a unique Bayes Nash equilibrium. In this equilibrium, ad positions are determined by the descending order of value per click, and the associated CPCs are shown to be a recursive function of value per click. Click volume at each position is then realized as users’ responses to paid-search ads.
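The recursive structure of the second-stage equilibrium can be illustrated with a short sketch. It uses the EOS drop-out condition, under which the bid of the advertiser in position j equals her value minus the decay ratio times the difference between her value and the bid directly below, and it makes three simplifying assumptions that are not taken from this paper: a single decay factor common to all positions, a hypothetical `reserve` price paid by the lowest-ranked entrant, and as many positions as entrants.

```python
from typing import List


def eos_cpcs(values: List[float], decay: float, reserve: float = 0.0) -> List[float]:
    """Equilibrium CPCs by position (top first) from values per click.

    Positions follow the descending order of value per click. The CPC at
    position j is the bid of the advertiser at position j + 1, which gives
    cpc[j] = (1 - decay) * value[j + 1] + decay * cpc[j + 1].
    `decay` is the click-volume ratio between adjacent positions (assumed
    constant here); `reserve` is an assumed price for the last position.
    """
    ranked = sorted(values, reverse=True)
    if not ranked:
        return []
    n = len(ranked)
    cpcs = [0.0] * n
    cpcs[n - 1] = reserve
    # Work upward from the bottom position, as the recursion requires.
    for j in range(n - 2, -1, -1):
        cpcs[j] = (1.0 - decay) * ranked[j + 1] + decay * cpcs[j + 1]
    return cpcs


# Illustrative values per click in cents.
cpcs = eos_cpcs([10.0, 6.0, 2.0], decay=0.5)
```

With these illustrative numbers the CPCs are 3.5, 1.0, and 0.0 from top to bottom. Each CPC depends on the values of all lower-ranked advertisers, which is the source of the cross-position correlation in CPCs.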
We next model the three main equilibrium outcomes of this game backward. We first present the model of click volume conditional on the rank of advertisers. Then we discuss how to model advertisers’ equilibrium CPCs conditional on the number of entrants and their value per click. Finally, we present the model of number of entrants.
3.3. Modeling Click Volume
Let $n_k$ stand for the number of entrants in keyword $k$, and let $Q_{ki}$ stand for the realized click volume for advertiser $i$ displayed in the paid-search results of keyword $k$ at its realized position $j_{ki}$. A smaller $j_{ki}$ corresponds to a higher position (i.e., more toward the top). Following Feng et al. (2007), we assume that the expected click volume of an ad decreases exponentially with its position. We model the click volume as

$$Q_{ki} = \tau_k \left( \prod_{j=1}^{j_{ki}-1} \delta_{kj} \right) \exp(\varepsilon_{ki}), \tag{1}$$

where $\tau_k$ stands for the baseline click volume at the top position for keyword $k$; $\delta_{kj}$ is the decay factor, which stands for the ratio of click volume between position $j + 1$ and position $j$ for keyword $k$; and $\varepsilon_{ki}$ is the noise component distributed as normal with mean zero and variance $\sigma^2$. Here, $\varepsilon_{ki}$ is a measurement error between the expected and realized log click volume and is unknown to advertisers.3 As for $\delta_{kj}$, we assume that these position-specific decay factors are common knowledge to entrants because each entrant of a keyword can easily learn them by experimenting with bids to change positions. Furthermore, because the realized decay factors depend on consumers’ search behavior given the search listings, we assume that $\delta_{kj}$ is observed by advertisers only after their entry. After taking log on both sides of Equation (1), we obtain

$$\ln Q_{ki} = \ln \tau_k + \sum_{j=1}^{j_{ki}-1} \ln \delta_{kj} + \varepsilon_{ki}. \tag{2}$$

2 We were informed by the data provider that there is very small fluctuation on these measures within the data period.
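To make the log-linear structure of the click volume model concrete, the sketch below simulates log click volume across positions and recovers the baseline click volume and the decay factor by ordinary least squares. It is a simplified illustration, not the Bayesian estimation approach developed in this paper: it assumes one decay factor common to all positions of a keyword, and the function names and parameter values are hypothetical.

```python
import math
import random
from typing import List, Tuple


def expected_clicks(baseline: float, decay: float, position: int) -> float:
    """Expected click volume at a 1-based position under a constant decay factor."""
    return baseline * decay ** (position - 1)


def simulate_log_clicks(
    baseline: float, decay: float, n_entrants: int, sigma: float, rng: random.Random
) -> List[float]:
    """Log click volume at positions 1..n with additive normal noise."""
    return [
        math.log(expected_clicks(baseline, decay, j)) + rng.gauss(0.0, sigma)
        for j in range(1, n_entrants + 1)
    ]


def fit_baseline_and_decay(log_clicks: List[float]) -> Tuple[float, float]:
    """OLS of log clicks on (position - 1); needs at least two positions.

    The intercept estimates the log baseline and the slope the log decay factor.
    """
    x = list(range(len(log_clicks)))
    x_bar = sum(x) / len(x)
    y_bar = sum(log_clicks) / len(log_clicks)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, log_clicks))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return math.exp(intercept), math.exp(slope)


rng = random.Random(0)
log_q = simulate_log_clicks(baseline=200.0, decay=0.7, n_entrants=9, sigma=0.1, rng=rng)
baseline_hat, decay_hat = fit_baseline_and_decay(log_q)
```

Because the decay factor also enters the CPC equation, a single-equation fit like this one ignores information that a joint estimation would use.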