Applying Large Language Models To Sponsored Search

Abstract

The growing prominence of generative artificial intelligence (AI) is disrupting several well-established industries. With the increasing availability of powerful large language models (LLMs), the generation of textual content for marketing, including blog posts and emails, is becoming more accessible. In this research, we examine the potential to apply LLMs to search engine advertising (SEA). We develop and evaluate an “application layer” that sits on top of OpenAI’s GPT series to generate ad text tailored to the SEA context, as well as predict the implications for advertising costs. We experimentally test our framework in two empirical settings, demonstrating the superior performance of a human-in-the-loop generative AI approach to producing the advertising text. The improved performance holds under different budget scenarios, which offers opportunities for advertisers to benefit from AI-supported ad quality gains, particularly when using a limited budget. We also identify boundary conditions that appear to limit the benefits of using our generative AI framework, including the presence of highly optimized landing pages. Overall, our research demonstrates the performance gains afforded by developing bespoke business applications of LLMs.

Introduction

Sponsored search engine advertising (SEA) is the digital marketing workhorse. Expenditures on search advertising are expected to exceed $100 billion in 2023 and account for nearly 30% of total media ad spending in the United States (Mitchell 2022). Despite valid concerns about its effectiveness when used in isolation (see, e.g., Blake, Nosko and Tadelis 2015; Simonov, Nosko and Rao 2018; Agarwal, Hosanagar and Smith 2011, 2015), ample research demonstrates that SEA can improve a brand’s online visibility, achieve higher conversion rates and increase sales when used in conjunction with search engine optimization to support organic search listings (e.g., Ghose and Yang 2009; Yang and Ghose 2010; Berman and Katona 2013; Narayanan and Kalyanam 2015; Park and Agarwal 2018).

For a brand to gain visibility in a highly competitive SEA market (see, e.g., Choi et al. 2020 for an overview), it must achieve prominent ad rankings in the major search engines, which are typically subject to the outcomes of a real-time auction (Sayedi et al. 2018). For search engines such as Google, Baidu (Fan et al. 2019), Bing (Deng et al. 2018), and Yahoo, the amount that a brand bids for a set of keywords (which is then mapped to a user’s search query) is among the most important factors that contribute to higher search ad rankings (e.g., Skiera and Abou Nabout 2013). In addition, the advertiser’s historic clickthrough rates (CTRs), user interactions with the targeted landing page (LP), and the content of search ads are important aspects that consistently contribute to higher search ad rankings (e.g., Im et al. 2016; Deng et al. 2018).

While there is substantial research on optimizing SEA bidding strategies (e.g., Skiera and Abou Nabout 2013; Balseiro and Gur 2019; Tunuguntla and Hoban 2021), the literature on how to improve the text of the ad (i.e., the ad copy) itself is scarce. Extant studies have focused on technical features, such as the fit of sponsored advertisements with corresponding LPs, keyword integration, or semantic ad relevancy (e.g., Fan et al. 2019). Others have examined customer perceptions of paid search ads and explored how these perceptions are linked to ad performance (e.g., Rutz et al. 2017; Yang et al. 2018).

Given the importance of ad copy in sponsored search rankings, firms and their digital marketing agencies heavily invest in producing ad content. They have also started to leverage automatic keyword generation and content matching (e.g., Fujita et al. 2010), using natural language processing and text summarization techniques (e.g., Kamigaito et al. 2021; Cogalmis and Bulut 2022) to take steps toward automating the production of ad content. With recent advances in powerful large language models (LLMs) in the style of OpenAI’s GPT (Generative Pre-trained Transformer) series (Radford et al. 2019), we are on the precipice of a new era of machine-assisted content generation (Davenport and Mittal 2022; Schweidel et al. 2023). Reisenbichler et al. (2022) demonstrated the potential benefits for drafting content for website LPs. Eloundou et al. (2023) indicate the far-reaching impact that LLMs may have on the workforce, while Noy and Zhang (2023) provide experimental evidence of increased productivity and job satisfaction from using ChatGPT for professional writing tasks.

We expect similar implications for the task of ad copywriting. Yet despite the growing number of offerings supported by LLMs (e.g., ChatGPT, Copy AI, Copysmith, Jasper), there is scant research that systematically assesses how to write effective ad copy, and what the performance implications may be under varying situations such as different budget constraints. We address this gap by introducing a human-in-the-loop, semi-automated framework for generating ad content using an attribute-enriched LLM, which we describe in the next section. The proposed approach is agnostic to the specific LLM employed, making this application modular and capable of being implemented with increasingly powerful LLMs.

A Semi-Automated Workflow of SEA Content Generation and Bidding Cost Prediction

To provide an intuitive understanding of the SEA context, Figure 1 illustrates the search results returned in response to a user’s query for the phrase “IT support” (referred to as the main keyword, $KW_{main}$). In this scenario, popular search engines display both sponsored ads (upper part) and organic results (lower part) for the search query. When a user clicks on a specific advertisement, they are directed to the LP on the firm’s website that provides relevant information associated with the ad. These components serve as the essential ingredients that digital marketers commonly utilize when creating SEA content.

Figure 1: Key Components of the SEA Environment

Our proposed framework is designed to mimic the typical workflow used in creating SEA content. We summarize our approach in Figure 2.¹ The process starts by specifying $KW_{main}$ out of a set of target keywords for which the company intends to bid, and the corresponding LP for the planned SEA campaign (I). This initiates a content crawling process (II), designed to find associated (sub-)keywords and semantic language structures that are used by the top-ranked organic search results for $KW_{main}$, as well as by the target LP. The aim of this content crawling is to ensure that the generated ad content is aligned with the online searcher’s information needs and consistent with the focal LP. We extract a list of the most frequently occurring (sub-)keywords embedded in the webpages of the top 10 organic search engine results² ($KW_{top10}$) and in the firm’s LP ($KW_{LP}$). The derived lists of relevant keywords are then represented as a bag-of-words (BoW; Zhang, Jin and Zhou 2010) that is converted into attributes to guide SEA content generation (III). In addition, we inform the LLM with context-specific language (IV) and ensure that the generated ad content reflects advertiser-specific sponsored ad standards, such as style, text length limitations and typical abbreviations.

¹ We provide here a high-level overview of the entire process, with specific details appearing in Web Appendix A1 and A2.

² By default, in our framework we rely on the 10 best fitting pages because many search engines typically display 10 search results on the first search result page.
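The keyword extraction in step (II) can be illustrated with a minimal sketch. It assumes the page texts have already been crawled; the stop-word list, the sample texts, and the function names `tokenize` and `top_keywords` are hypothetical and are not taken from the framework described in the Web Appendix.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "in", "to", "for", "at", "is", "our", "your", "with"}


def tokenize(text):
    """Lowercase a text and return its alphabetic tokens without stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def top_keywords(pages, k=10):
    """Return the k most frequent tokens across a list of page texts."""
    counts = Counter()
    for page in pages:
        counts.update(tokenize(page))
    return [word for word, _ in counts.most_common(k)]


# Made-up page snippets standing in for crawled top-ranked pages and a landing page.
top10_pages = [
    "Study economics at a leading university of economics and business.",
    "Economics courses and business programs for international students.",
]
landing_page = "Our international economics program: business semester offered each term."

kw_top10 = top_keywords(top10_pages, k=3)   # stands in for KW_top10
kw_lp = top_keywords([landing_page], k=3)   # stands in for KW_LP
print(kw_top10, kw_lp)
```

The two resulting lists play the role of the bag-of-words attributes that steer content generation in step (III).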

Figure 2: A Framework for SEA Content Generation and Bidding Costs Prediction

SEA Content Generation

To produce content that is guided toward a specific application, we adopt the “plug and play language models” (PPLM; Dathathri et al. 2020) approach (V). PPLM was originally built for the GPT model series and alters the LLM’s output predictions of the next word³ by increasing the probability of generating words that appear on a list that is provided by the user. This makes it particularly suited for use by advertisers that want to emphasize specific attributes of their products or brands. In our application, we fine-tune the LLM with ad-specific language structures and semantics, and infuse the generated content with keywords from the webpages of the top-ranked search results and the target LP.

We prompt the LLM with the focal keyword and use the PPLM approach to guide the conditional content output generated by the LLM to increase the likelihood of sampling keywords that appear on the target landing page or in the top-ranking organic results. Specifically, we increase the likelihood of the LLM drawing words from $KW_{top10} \cup KW_{LP}$. To provide a concrete example, we illustrate the SEA content generation process as it would apply to a university’s graduate business programs in Figure 3. Applying our proposed approach to generate ad content for the keyword “study economics,” PPLM would increase the prevalence with which words such as “economics,” “university” and “business” (words that are prevalent in the top organic results and on the LP) appear in the generated content, thereby increasing the generated ad content’s relevancy and topical fit.⁴

³ To make our explanations more tangible, we explain the LLM in terms of generating words, while in reality, it consists of token (e.g., whole words, pieces of words, etc.) based generation.
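The effect of raising the sampling probability of listed words can be sketched with a toy next-word distribution. This is a deliberate simplification and not the PPLM algorithm itself: PPLM steers generation through gradient updates to the model’s hidden activations, whereas the sketch below simply adds a fixed bonus to the logits of bag-of-words entries. The logits, the bonus value, and the function names are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Convert a dict of token -> logit into token -> probability."""
    peak = max(logits.values())
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: val / total for tok, val in exps.items()}


def boost_keywords(logits, keywords, bonus=2.0):
    """Add a fixed bonus (hypothetical value) to the logit of every listed keyword."""
    return {tok: (val + bonus if tok in keywords else val) for tok, val in logits.items()}


# Made-up next-word logits for illustration only.
next_word_logits = {"economics": 1.0, "university": 0.5, "football": 1.2, "weather": 0.8}
bag_of_words = {"economics", "university", "business"}  # stands in for KW_top10 ∪ KW_LP

before = softmax(next_word_logits)
after = softmax(boost_keywords(next_word_logits, bag_of_words))
for tok in next_word_logits:
    print(f"{tok:12s} {before[tok]:.3f} -> {after[tok]:.3f}")
```

In this toy setting, probability mass moves from off-topic words toward the listed keywords while the distribution still sums to one, which is the qualitative behavior the paragraph above describes.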

Figure 3: Incorporating Attributes for Directed Content Generation

Ad Copy Quality Scoring

As the output of LLMs arises from a stochastic process, the generated content will vary in its appropriateness for SEA. For this reason, off-the-shelf implementations of models such as ChatGPT and LLaMA (Large Language Model Meta AI) may not be suitable for generating SEA content without additional guidance. To this end, we derive a quality score $QS_{ad}$ for each piece of generated content to assess the anticipated SEA performance of the ad copy (VI). The composition and specification of $QS_{ad}$ is based on research on technical features of high performing ad content (e.g., Schlangenotto and Kundisch 2016; Rutz et al. 2017; Yang et al. 2018; Yang et al. 2020) and research in the evaluation of SEO content (Reisenbichler et al. 2022).

Our quality score consists of the following components:

$$QS_{ad} = \frac{1}{5}\left( l_{t10}\, sim\_t10_{ad} + l_{LP}\, sim\_LP_{ad} + l_{t10}\, KW\_t10_{ad} + l_{LP}\, KW\_LP_{ad} + KW\_main_{ad} \right) \tag{1}$$

⁴ We provide additional modeling details in Web Appendix A2.

$sim\_t10_{ad}$ measures the semantic fit between the generated ad copy’s content and the top 10 webpages, computed using the average cosine similarity between the vectors of word frequency distributions (after stop words removal) of the generated ad and each of the top 10 webpages. Similarly, the semantic fit between the generated ad content and the target LP is measured by the cosine similarity, $sim\_LP_{ad}$ , between their word frequency vectors. To measure the degree of keyword integration in the generated ad copy, we calculate the proportions $KW\_t10_{ad}$ , $KW\_LP_{ad}$ and $KW\_main_{ad}$ of (sub-)keywords contained in the generated ad copy from the total number of respective words in the corresponding keyword lists $KW_{top10}$ , $KW_{LP}$ , and $KW_{main}$ .
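The two building blocks described above, cosine similarity between word frequency vectors and the proportion of list keywords found in the ad, can be sketched as follows. The word lists are made up, stop-word removal is assumed to have happened already, and the function names `cosine_similarity` and `keyword_share` are illustrative rather than the authors’ own.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine similarity between the word frequency vectors of two token lists."""
    freq_a, freq_b = Counter(words_a), Counter(words_b)
    dot = sum(freq_a[w] * freq_b[w] for w in freq_a)
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_share(ad_words, keywords):
    """Proportion of the keyword list that appears at least once in the ad."""
    if not keywords:
        return 0.0
    present = set(ad_words)
    return sum(1 for kw in keywords if kw in present) / len(keywords)


# Made-up tokens (stop words already removed) for an ad and a landing page.
ad = "study economics university business program apply now".split()
lp = "international economics business program semester".split()
kw_lp = ["program", "semester", "international", "business", "economics"]

sim_lp = cosine_similarity(ad, lp)      # plays the role of sim_LP_ad
kw_lp_share = keyword_share(ad, kw_lp)  # plays the role of KW_LP_ad
print(round(sim_lp, 3), kw_lp_share)
```

For $sim\_t10_{ad}$, the same `cosine_similarity` would be computed against each of the top 10 pages and then averaged.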

Finally, equation (1) contains two “attention focus” parameters, $l_{t10}$ and $l_{LP}$, that impose weights on the semantic fit and keyword integration components. Both parameters are logistic transformations of cosine similarity measures for the internal consistency of content reflected in the top 10 organically ranked webpages ($l_{t10}$) or the consistency between the target LP and the top 10 webpages ($l_{LP}$), respectively. We let $\overline{\cos}(t_i)$ denote the average cosine similarity between the $i^{th}$ top 10 page and all other top 10 webpages and define $l_{t10}$ as:

$$l_{t10} = \frac{\exp\!\left(\sum_{i=1}^{10} \overline{\cos}(t_i)\right)}{1 + \exp\!\left(\sum_{i=1}^{10} \overline{\cos}(t_i)\right)} \tag{2}$$

Thus, higher content similarity among the top 10 ranked pages results in a higher $l_{t10}$ score. We calculate $l_{LP}$ in a similar fashion based on the sum of the cosine similarities between the company’s LP and each of the organically ranked top 10 websites for $KW_{main}$ .
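Equations (1) and (2) can be combined in a short sketch. It assumes the pairwise cosine similarities have already been computed; the function names and the toy inputs (three pages instead of ten) are hypothetical, and the input values are invented for illustration.

```python
import math


def logistic(x):
    """Logistic transformation exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


def attention_t10(pairwise_cos):
    """l_t10 as in equation (2): logistic of the summed average similarities.

    pairwise_cos[i][j] is the cosine similarity between top pages i and j.
    """
    n = len(pairwise_cos)
    averages = [
        sum(row[j] for j in range(n) if j != i) / (n - 1)
        for i, row in enumerate(pairwise_cos)
    ]
    return logistic(sum(averages))


def attention_lp(lp_cos):
    """l_LP: logistic of the summed similarities between the LP and each top page."""
    return logistic(sum(lp_cos))


def quality_score(l_t10, l_lp, sim_t10, sim_lp, kw_t10, kw_lp, kw_main):
    """QS_ad as in equation (1)."""
    return (l_t10 * sim_t10 + l_lp * sim_lp + l_t10 * kw_t10 + l_lp * kw_lp + kw_main) / 5


# Toy example with three top pages (the paper uses ten); all values are invented.
pairwise = [[1.0, 0.2, 0.2], [0.2, 1.0, 0.2], [0.2, 0.2, 1.0]]
l_t10 = attention_t10(pairwise)
l_lp = attention_lp([0.1, 0.1, 0.1])
qs = quality_score(l_t10, l_lp, sim_t10=0.3, sim_lp=0.5, kw_t10=0.4, kw_lp=0.6, kw_main=1.0)
print(round(l_t10, 3), round(l_lp, 3), round(qs, 3))
```

Because both weights lie between 0 and 1 and every component is a proportion or a cosine similarity of non-negative frequency vectors, the resulting score is bounded between 0 and 1.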

The purpose of $l_{t10}$ and $l_{LP}$ is to make our assessment of generated ad copies sensitive to inconsistent content in both the top ranked websites and the landing page. For example, for some search queries (e.g., for the keyword “apple”), search engines might display inconsistent organic results (e.g., some pages on the fruit apple, some about Apple computers). In such cases, our model accounts for the inconsistencies by shifting the quality score’s focus more towards the brand’s LP components, assigning a higher importance score to $l_{LP}$ relative to the $l_{t10}$ score.

We extensively test and confirm the validity of the quality score and its components in three separate studies. First, we illustrate that higher $QS_{ad}$ values are associated with improved SEA performance on a set of 145,939 scraped ads for approximately 2,700 keywords. Second, we find that higher $QS_{ad}$ values are also associated with increased bidding costs (CPC) by working with companies’ internal SEA data from 987 ads, 937,914 impressions, and 57,489 clicks. Third, we show that we can predict the ads’ future positions based on our $QS_{ad}$ components and keyword competition with an average error of less than one ranking position; precisely, with a Root Mean Squared Error (RMSE) of ~.963 and an explained variance (R-squared) of ~.699 (see Web Appendix A1 for details).

Using the ad quality score, we offer a brief comparison of our PPLM-based model and recent LLMs such as GPT-3 and ChatGPT. Table 1 reports the quality scores resulting from the use of different language models in the two sectors (IT and SaaS, and education) in which we conduct our subsequent practical experiments. In addition to reporting the quality score associated with our proposed model (“PPLM revised”), we also report the performance of the base GPT-2 model and a GPT-2 model that has been fine-tuned with ad-specific language. In both empirical contexts, the PPLM model outperforms the GPT-2 variants. We also observe that our proposed PPLM model outperforms the base GPT-3 model and ChatGPT in terms of the quality score. Thus, despite the capabilities of recent LLMs, our analysis reveals that the use of larger language models per se (e.g., GPT-3 and ChatGPT versus our proposed PPLM approach that makes use of GPT-2) does not necessarily guarantee better performance for a specific task like the generation of text that is expected to perform best for SEA, as reflected by the quality score $QS_{ad}$.⁵

Table 1: Quality Score Performance of Proposed PPLM Model and other LLMs

	Median (IQR)¹ of $QS_{ad}$
Model	IT & SaaS	Education
PPLM revised	.44 (.06)	.43 (.06)
GPT-2 fine-tuned	.41 (.07)	.39 (.05)
GPT-2 base	.38 (.05)	.34 (.08)
GPT-3 base	.43 (.08)	.37 (.06)
ChatGPT base	.33 (.07)	.33 (.09)
~χ² ²	300	733
df ²	4	4
p ²	.000**	.000**

¹ Medians and interquartile ranges (IQR) for all keywords used in our experiments; best values are printed in bold and marked in grey; ² Kruskal-Wallis test for group differences in $QS_{ad}$; statistical significance levels: *p ≤ .10, **p ≤ .05.

Predicting Bid Amounts

Higher content quality (as measured by $QS_{ad}$) is typically associated with higher ad rankings and better performance (in terms of visibility and number of clicks). However, previous evidence is mixed as to the implications for cost and ad profitability. While search engine providers contend that higher quality will reduce the CPC (e.g., Google 2020), some have found that improved content results in higher CPC (e.g., Abou Nabout and Skiera 2012) or that ads ranked in middle positions are more profitable than top ranked ads (e.g., Ghose and Yang 2009). Using previous campaign data from our research partners (i.e., used ad content, bidding and ad performance), we fit a predictive CPC model (VII) to provide decision support for final ad content selection and human revision (VIII).⁶ Once historic data is available, the final step of our semi-automated procedure offers SEA campaign managers the opportunity to select content for a target range of bidding costs by predicting the expected CPC for each generated piece of ad content.

⁵ We provide additional details on this performance comparison test in Web Appendix A2.3.
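The selection step can be sketched as a simple filter: keep the candidates whose predicted CPC falls in the manager’s target range and pick the one with the highest quality score. The selection rule, the `Candidate` class, and `select_candidate` are assumptions made for illustration; the quality scores and CPC predictions mirror the two entries reported in Table 2.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    quality_score: float
    predicted_cpc: float


def select_candidate(candidates, cpc_min, cpc_max):
    """Return the highest-scoring candidate within the target CPC range, or None."""
    eligible = [c for c in candidates if cpc_min <= c.predicted_cpc <= cpc_max]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.quality_score)


# Quality scores and CPC predictions as reported for the two items in Table 2.
candidates = [
    Candidate("ad variant 1", quality_score=0.395, predicted_cpc=0.729),
    Candidate("ad variant 2", quality_score=0.387, predicted_cpc=1.069),
]

choice = select_candidate(candidates, cpc_min=0.0, cpc_max=0.80)
print(choice.text if choice else "no candidate in range")
```

The selected raw content would still pass to a human SEA manager for revision, in line with the human-in-the-loop design.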

Table 2 illustrates two of the highest scoring pieces of output from our workflow for the keyword “study economics,” along with the respective quality score ($QS_{ad}$), the expected CPC according to our predictive model, the scraped top 10 keywords ($KW_{top10}$), and the LP keywords ($KW_{LP}$). The raw content is then provided to the SEA manager for final selection and revision. Note that despite a higher overall quality score ($QS_{ad}$), the predicted CPC for item #1 is lower than for the second highest item (0.729 versus 1.069).

⁶ Web Appendix A1.4 demonstrates how historic campaign performance can be predicted with an average error of ~0.35 € in CPC (RMSE ~.346, R-squared ~.849).

Table 2: Two Examples of Generated SEA Content Output

#	$QS_{ad}$	CPC prediction	$KW_{top10}$	$KW_{LP}$	PPLM generated SEA content
1	.395	0.729	economics; study; add; favourites; item; university; business; course; schools; partners	offered; half; term; 2; program; semester; international; sem; business; economics	Study economics in your university in economics - Business economics program Study economics in your university in business economics. The program will offer you the opportunity to gain broad business knowledge in economics and business management. International environment. Bachelor, Master, MBA: 6 specializations. Apply now! International environment.
2	.387	1.069	economics; study; add; favourites; item; university; business; course; schools; partners	offered; half; term; 2; program; semester; international; sem; business; economics	Study economics at university of economics and business - Economics degree programs Study economics at university of economics and business, in Vienna Business School. Get in touch us. Choose among our programs: Economics, Management, Marketing, Political Science. International program. International internship. International Study. Specialization.

^---- = Focal keyword ($KW_{main}$), top 10 keywords ($KW_{top10}$), and LP keywords ($KW_{LP}$) integrated in the generated ad content using PPLM; ||| = our model’s trained separator between the ad’s headline and the regular ad description.

Empirical Performance from Field Tests

We demonstrate the performance of our PPLM-assisted SEA content generation in two diverse empirical settings. We conduct a series of sponsored ad campaigns in collaboration with a mid-sized local IT and SaaS provider in the business-to-business sector, and an internationally renowned business school promoting its study programs through sponsored search ads.

Improving SEA Performance with Generative AI

We first restrict our assessment to the effects of SEA optimized content only, omitting step VII (the bidding cost prediction) in Figure 2. In collaboration with our partners, we generated SEA content for 208 keywords across all settings and experimental groups, selected the top scoring piece per keyword and let human SEA experts make minor edits (“Best QS PPLM” condition). In a departure from standard practice, we exploit the capability of LLMs to efficiently produce content by generating unique ad content for each individual keyword. This stands in contrast to the business convention of using a limited number of ad variants to address a broader set of keywords. As a second condition (“Human – Keyword Specific”), a group of extensively trained study participants with access to top performing ads received instructions to produce ad content for the same set of keywords. The third condition (“Human – Conventional”) consisted of a set of 28 sponsored search ads provided by the SEA professionals from our corporate partners, designed to address the full set of ad keywords.⁷

The ads included in our experiments were placed online for approximately two months to compete in SEA bidding at a total cost of €3,452. During this period, they generated 151,089 impressions and 9,201 clicks. Consistent with prior research, applying generative AI to SEA content creation improves the efficiency of content creation by more than 60% (a time saving of 18.56 hours for producing 208 ads). This increase in productivity could also translate to increased output, with an average SEA writer potentially producing 19,521 more pieces of ad content per year in the same amount of time.⁸

Table 3 presents our results for two weeks of the education sector ad campaigns and for the first month of the IT service and SaaS ad campaigns, where a high budget constraint was in place. Our approach significantly improves ad content (with an average $QS_{ad}$ of around .43 compared to scores of ≤ .30 achieved by humans) and yields superior performance in terms of the number of impressions and clicks in both industries. Comparing the content from the two human groups, we find that crafting content specific to the target keyword performs better than the standard practice of creating a limited number of ads that broadly target the entire keyword set for the sake of efficiency. Moreover, the CPC of the keyword-specific ads, whether produced by humans or by the semi-automated procedure, is lower than the CPC of ads generated under standard practice.

⁷ We provide additional details on the experimental setup and the keywords curated by our industry partners in Web Appendix A3.1 & A3.2.

⁸ For details on efficiency calculations see Web Appendix A3.3.

Table 3: SEA Performance Across Empirical Settings and Experimental Groups⁹

Empirical Setting	Experimental Group	Impr.	Clicks	CPC	Total cost, €	$QS_{ad}$¹
Education (B2C)	Best QS PPLM	21,252	2,002	.50	952	.43
	Human – Keyword Specific	17,374	1,802	.49	845	.30
	Human – Conventional	10,147	1,418	.56	694	.20
IT & SaaS (B2B)	Best QS PPLM	34,590	1,507	.34	463	.43
	Human – Keyword Specific	12,031	530	.31	137	.29
	Human – Conventional	1,255	7	.38	3	.30

¹ Mean quality score ($QS_{ad}$) values (of human revised PPLM output or human generated ads) across all ads and keywords used.

To elaborate more on the trade-off between ad quality improvement and cost implications, we use the full framework proposed in Figure 2 in our next experiment. Working with historical ad content and performance data for a subset of keywords from our IT and SaaS partner company, we derived CPC predictions for the machine generated ad content. Rather than restricting our focus to $QS_{ad}$, we test the performance of generated ads with the lowest predicted CPC (condition “Lowest CPC PPLM”). As shown in Table 4, the semi-automated approach for content generation not only outperforms humans on the basis of impressions and clicks, but also reduces the CPC of running ads by 38% (a CPC of .24 versus .39).

⁹ To better assess the performance gains of using our approach, we conducted an additional study as part of our field experiment in the IT and SaaS setting. Comparing the performance of topically irrelevant keywords for content generation against the highly optimized ad content for a subset of keywords used in the main experiment, impressions increase by a factor ranging from 4.5 to 6.2 and clicks increase by a factor ranging from 2.9 to 3.5,

Table 4: SEA Performance in Case of Low Cost-Per-Click (CPC) Targeting

Empirical Setting	Experimental Group	Impr.	Clicks	CPC	Total cost, €	$QS_{ad}$¹
IT & SaaS	Lowest CPC PPLM	24,205	982	.24	241	.37
	Human – Keyword Specific	6,476	282	.39	72	.29

¹ Mean quality score ($QS_{ad}$) values of human revised PPLM output or human generated ads, across all ads and keywords used.
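The relative differences implied by Table 4 can be reproduced with a few lines of arithmetic. The sketch below uses only the figures reported in the table; the helper name `relative_change` is an illustrative choice.

```python
def relative_change(new, old):
    """Relative change of new versus old, e.g. -0.25 for a 25% reduction."""
    return (new - old) / old


# Figures as reported in Table 4 for the IT & SaaS setting.
pplm = {"impressions": 24205, "clicks": 982, "cpc": 0.24}
human = {"impressions": 6476, "clicks": 282, "cpc": 0.39}

cpc_change = relative_change(pplm["cpc"], human["cpc"])
impression_ratio = pplm["impressions"] / human["impressions"]
click_ratio = pplm["clicks"] / human["clicks"]

print(f"CPC change: {cpc_change:.1%}")
print(f"Impressions ratio: {impression_ratio:.2f}x")
print(f"Clicks ratio: {click_ratio:.2f}x")
```

The CPC figures correspond to the reduction of roughly 38% reported for the “Lowest CPC PPLM” condition relative to the keyword-specific human ads.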

Overall, our findings suggest that by combining a generative LLM with quality scoring and a predictive CPC model, digital marketers can achieve both superior performance and lower marketing costs.

The Role of Landing Page Content

Having demonstrated the potential for our semi-automated approach to generate content that outperforms human-generated content, we next consider factors that may moderate the impact of using generative AI in our application. One important factor is the quality of the LP, as the alignment between the LP and the target keyword is instrumental to the performance of sponsored ads (e.g., Ghose and Yang 2009; Amaldoss et al. 2015). To examine the differential effect of a well-optimized LP against improved ad content generation in boosting SEA performance, we conducted another experiment with our IT and SaaS partner company. In this study, we manipulated the target LPs for a subset of 18 keywords from the “IT Support” list by contrasting the original LPs (condition “Human LP”) with a corresponding set of highly content-optimized, but otherwise identical LPs (condition “Machine LP”) that were created using a generative AI-supported SEO tool (e.g., Reisenbichler et al. 2022) to obtain high organic search engine rankings. In both the machine- and human-generated ad content conditions, each piece of ad copy is produced for an individual keyword, resulting in a total of 72 pieces of sponsored ad content (4 conditions × 18 keywords).¹⁰ We present the results of these treatments in Figure 4.

Figure 4: Landing Page versus Ad Content Optimality and SEA Performance

Consistent with our previous findings, the machine-generated ad content (“Best QS PPLM”) yields more impressions and clicks compared to human-generated ads. At the same time, the machine-generated ads result in a higher CPC compared to human-generated ads, particularly when the ads are paired with a machine-generated landing page. Based on our analysis, machine-made sponsored ads work best (13,114 impressions and 615 clicks) with human-made landing pages, which are less likely to appear in top positions of organic search rankings.

While we replicate prior studies on the interplay between LP quality and SEA performance (e.g., Ghose and Yang 2009; Yang and Ghose 2010), our findings also point to a possible saturation effect. If both organic and sponsored ad links for the same brand become visible to users (i.e., through content optimization), they may tend to substitute clicks on ads for clicks on the organic listings (e.g., Blake et al. 2015; Agarwal et al. 2015). We replicate this pattern of results in a more comprehensive study that involved 145,939 scraped ads for a set of approximately 2,750 industry-relevant keywords. We find that the experimental effects presented above are consistent and stable across the entire SEA industry.¹¹

¹⁰ For further details on the experimental setup, see Web Appendix A3.2 and A3.5.

The Impact of Ad Budget Restrictions

We next extend our campaign evaluation to consider the effect of the advertising budget. Specifically, we study how ad content performance and CPC in our experimental groups vary under different levels of an advertiser’s maximum daily budget for paid search. In our experimental setups, the IT and SaaS sector SEA campaigns were run with a high budget (with up to €90 per day) for one month, followed by another month with a substantially lower daily SEA budget (€5 per day).

Figure 5 compares the performance of the high ad budget campaign weeks (as already presented in Table 3) with those from the subsequent low budget campaign weeks. We see that a lower budget constraint reduces both CPC and ad performance across our experimental groups. We also observe that the conventional practice of using a limited number of ads for all keywords (the “Human – Conventional” condition) yields the lowest performance in terms of both impressions and clicks. Further, the ads generated in the “Best QS PPLM” condition consistently outperform the human-generated ads in both high and low ad budget weeks based on clicks and impressions.

An interesting pattern emerges when we compare the machine-generated ads in the low budget condition with the human-generated ads in the high budget condition. The use of machine-generated ads in the low budget condition provides superior performance over human-generated ads that are conventionally generated, and offers performance close to the ads generated by humans in the keyword-specific condition. This suggests that the use of LLMs to produce ad content may be particularly beneficial for smaller organizations with a limited digital marketing budget.

¹¹ See Web Appendix A3.5 and A3.6 for more details.

Figure 5: SEA Performance Under Varying Budgetary Restrictions

Discussion

SEA is a critical tool for driving online performance. To the best of our knowledge, we are the first to provide an approach to using LLMs to generate optimized SEA content and systematically evaluate the performance of this content compared to human-generated content. We investigate the interplay of human- vs. machine-generated SEA content with the quality of the landing page and advertising budget to offer insights as to the conditions under which AI-generated SEA content will be most beneficial to increase impressions and clicks. We empirically demonstrate the use of our semi-automated LLM-based content generation procedure through field experiments conducted with two different organizations.

We find that machine written ad content can contribute to improved SEA performance and increase the efficiency of creating content. In our procedure, we combine LLMs with a context-specific “application layer” that considers the main/focal keyword, the top ranking organic search results for the keyword, and the focal organization’s landing page to evaluate the quality of the generated ad content. We show that each of these components is a relevant information source for optimizing ad content.

As our field experiments demonstrate, the creation of keyword-specific ads by humans outperforms the conventional approach to producing SEA content, and our semi-automated approach outperforms both of these methods. This suggests two means by which the application of LLMs to SEA can benefit organizations. First, as search ad content tailored to individual keywords outperforms ads designed for a broader set of keywords, our semi-automated approach offers a scalable and highly effective means of developing keyword specific ads. Second, using LLMs to tailor ad content to the landing page and top performing organic search results allows for the creation of search ad content that outperforms human-generated content.

While our results show the potential for LLMs to support SEA, and despite the growing popularity of LLMs, SEA managers should exercise caution in their use. Our field experiments reveal limits as to the benefits of using AI-generated content, particularly when both the ad content and the landing page have been optimized. Investing in optimizing the content of both the landing page and search ads may result in traffic being split between the sponsored results and organic results, consequently resulting in suboptimal SEA performance.

Finally, while a SEA manager might be capable of exerting coarse control over the CPC, which automatically diminishes performance, our approach offers a mechanism by which advertising costs can be kept in check while maintaining a relatively high level of quality in the advertising content. Consistent with prior work, we find that content optimization increases both performance and CPC. As such, it is important to accommodate CPC predictions in content production decisions. A potential benefit of our procedure is the ability to achieve performance on par with human-created content but with a much lower budget, which may enable small- and medium-sized businesses to be more competitive on limited budgets.

Our insights are valuable for practitioners and for future research on ad content design, future ad generation systems, and the effects of content optimization factors at the company level. By design, our semi-automated approach can be used with any LLM that provides adequate access, enabling it to be used with the latest generative models. We hope our analysis serves to demonstrate the potential to develop applications that sit on top of foundational models, allowing business researchers to infuse context-specific knowledge into the development of automated systems.

References

Abou Nabout NA, Skiera B (2012) Return on quality improvements in search engine marketing. Journal of Interactive Marketing. 26(3):141–154.
Agarwal A, Hosanagar K, Smith MD (2011) Location, location, location: An analysis of profitability of position in online advertising markets. Journal of Marketing Research. 48(6):1057–1073.
Agarwal A, Hosanagar K, Smith MD (2015) Do organic results help or hurt sponsored search performance? Information Systems Research. 26(4):695–713.
Amaldoss W, Desai PS, Shin W (2015) Keyword search advertising and first-page bid estimates: A strategic analysis. Management Science. 61(3):507–519.
Balseiro SR, Gur Y (2019) Learning in repeated auctions with budgets: regret minimization and equilibrium. Management Science. 65(9):3952–3968.
Berman R, Katona Z (2013) The role of search engine optimization in search marketing. Marketing Science. 32(4):644–651.
Blake T, Nosko C, Tadelis S (2015) Consumer heterogeneity and paid search effectiveness: a large‐scale field experiment. Econometrica. 83(1):155–174.
Choi H, Mela CF, Balseiro SR, Leary A (2020) Online display advertising markets: a literature review and future directions. Information Systems Research. 31(2):556–575.
Cogalmis KN, Bulut A (2022) Generating ad creatives using deep learning for search advertising. Turkish Journal of Electrical Engineering and Computer Science. 30(5):1881–1896.
Dathathri S, Madotto A, Lan J, Hung J, Frank E, Molino P, Yosinski J, Liu R (2020) Plug and play language models: a simple approach to controlled text generation. arXiv. https://arxiv.org/abs/1912.02164
Davenport TH, Mittal N (2022) How generative AI is changing creative work. Harvard Business Review. https://hbr.org/2022/11/how-generative-ai-is-changing-creative-work
Deng W, Ling X, Qi Y, Tan T, Manavoglu E, Zhang Q (2018) Ad click prediction in sequence with Long Short-Term Memory Networks: an externality-aware model. SIGIR’18, 1065–1068.
Eloundou T, Manning S, Mishkin P, Rock D (2023) GPTs are GPTs: an early look at the labor market impact potential of large language models. arXiv. https://arxiv.org/pdf/2303.10130.pdf
Fan M, Guo J, Zhu S, Miao S, Sun M, Li P (2019) MOBIUS: Towards the next generation of query-ad matching in Baidu’s sponsored search. KDD’19, 2509–2517.
Fujita A, Ikushima K, Sato S, Kamite R, Ishiyama K, Tamachi O (2010) Automatic generation of listing ads by reusing promotional texts. Proceedings of the 12th International Conference on Electronic Commerce, 191–200.
Ghose A, Yang S (2009) An empirical analysis of search engine advertising: sponsored search in electronic markets. Management Science. 55(10):1605–1622.
Google (2020) About ad position and ad rank. Accessed October 29 2020, https://support.google.com/google-ads/answer/1722122
Im I, Jun J, Oh W, Jeong SO (2016) Deal-seeking versus brand-seeking: search behaviors and purchase propensities in sponsored search platforms. MIS Quarterly. 40(1):187–204.
Kamigaito H, Zhang P, Takamura H, Okumura M (2021) An empirical study of generating texts for search engine advertising. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers, 255–262.
Mitchell E (2022) A tried and true lower-funnel tactic thrives amid uncertainty. eMarketer. https://www.insiderintelligence.com/content/us-search-ad-spending-2022
Narayanan S, Kalyanam K (2015) Position effects in search advertising and their moderators: a regression discontinuity approach. Marketing Science. 34(3):388–407.
Park CH, Agarwal MK (2018) The order effect of advertisers on consumer search behavior in sponsored search markets. Journal of Business Research. 84:24–33.
Noy S, Zhang W (2023) Experimental evidence on the productivity effects of generative artificial intelligence. SSRN. http://dx.doi.org/10.2139/ssrn.4375283
Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I (2019) Language models are unsupervised multitask learners. Accessed May 26 2023, https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
Reisenbichler M, Reutterer T, Schweidel DA, Dan D (2022) Frontiers: Supporting content marketing with natural language generation. Marketing Science. 41(3):441–452.
Rutz OJ, Sonnier GP, Trusov M (2017) A new method to aid copy testing of paid search text advertisements. Journal of Marketing Research. 54:885–900.
Sayedi A, Jerath K, Baghaie M (2018) Exclusive placement in online advertising. Marketing Science. 37(6):970–986.
Schlangenotto D, Kundisch D (2016) Read this paper! A field experiment on the role of a call-to-action in paid search. Research Papers. 63:1–15.
Schweidel DA, Reisenbichler M, Reutterer T, Zhang K (2023) Leveraging AI for content generation: a customer equity perspective. In: Sudhir K, Toubia O (Eds.) Artificial Intelligence in Marketing. Review of Marketing Research. Emerald Publishing Limited, Bingley, 20:125–145.
Simonov A, Nosko C, Rao JM (2018) Competition and crowd-out for brand keywords in sponsored search. Marketing Science. 37(2):200–215.
Skiera B, Abou Nabout NA (2013) Practice Prize Paper—PROSAD: a bidding decision support system for profit optimizing search engine advertising. Marketing Science. 32(2):213–220.
Tunuguntla S, Hoban PR (2021) A near-optimal bidding strategy for real-time display advertising auctions. Journal of Marketing Research. 58(1):1–21.
Yang S, Ghose A (2010) Analyzing the relationship between organic and sponsored search advertising: positive, negative, or zero interdependence? Marketing Science. 29(4):602–623.
Yang S, Li D, Tao Z, Li X (2018) Search engine advertising for organic food: the effectiveness of information concreteness on advertising performance. Journal of Consumer Behavior. 17(1):47–56.
Yang Z, Wu Y, Lu C, Tu Y (2020) Effects of paid search advertising on product sales: a Chinese semantic perspective. Journal of Marketing Management. 36(15-16):1481–1504.