How to scrape Google Search with Python (Step-by-step guide)

Last updated: 8 April 2025

Google Search results, also known as SERPs, contain valuable data such as keywords, ad listings, and rankings. Analyzing this data can improve marketing strategies. Python is a practical tool for accessing and extracting this information. By scraping Google Search, you can uncover competitor data, optimize keywords, and enhance campaigns.

This guide explains two ways to scrape SERP data effectively with Python:

  • Building a custom scraper with third-party libraries such as Selenium and undetected_chromedriver
  • Using a ready-made scraping solution such as a scraper API or SERP scraper

Why should you scrape Google Search?

Google Search indexes a huge amount of information. Scraping its data can provide:

  • Insights into market trends and customer behavior.
  • SEO optimization for websites and content.
  • Brand monitoring and media tracking.
  • Potential leads and competitor analysis.

General data insights and analysis

Google Search results offer a treasure trove of data on global and regional trends, customer behavior, and societal interests.

SERP data helps researchers understand trends, interests, and sentiments, leading to better decisions.

How does scraping help with SEO optimization?

Scraping Google Search results can help you improve your own Search Engine Optimization (SEO) strategies by:

  • Finding long-tail keywords, related keywords, and keyword variations.
  • Analyzing your competitors’ rankings and performance.
  • Tracking your ranking progress.

Extracting data from Google snippets

Google snippets, like featured snippets and the knowledge graph, summarize key information. Scraping snippets lets businesses answer common questions or create datasets for detailed analysis.

Some content sites even rely entirely on scraping People Also Ask questions and featured snippets for their content.

Understanding Google SERPs

The Google Search Engine Results Page (SERP) is the page that shows the results of a search query. Previously, it was a static page that displayed only links to pages in Google’s index whose content matched the query.

But today, Google Search results are more sophisticated, providing searchers with immediate answers displayed as featured snippets and AI overviews.

Google SERPs have also become highly dynamic, displaying search features based on the searcher's location, history, search intent, and available content, among others.

Some common SERP features include:

Featured snippets

[Screenshot of a featured snippet]

The featured snippet is a box at the top of a search result that provides a concise, immediate answer to a searcher’s query. The snippet is pulled from a high-ranking page for the keyword and usually links back to that page. It’s a valuable position to capture because of the traffic it can send to your website.

AI overview


With the rise of generative AI, Google has begun to show an AI overview for many keywords, often in place of the featured snippet. The AI overview presents AI-generated content as an answer to the searcher’s query. AI overviews appear on 47.4% of queries, with a 59% occurrence for informational queries and 19% for commercial queries.

Paid ads


Paid ads appear at the top and bottom of SERPs and are marked with a “Sponsored” or “Ad” label. Unlike organic listings, which are free, businesses and organizations have to pay to be featured here for their target keywords or niche.

Video carousels

[Screenshot of a video carousel on Google Search for a cake-baking query]

Some SERPs, especially those with informational intent, have a horizontal video carousel that displays a list of videos related to the keyword. These videos are sourced mostly from YouTube, but can also come from other video platforms or social media networks like TikTok.

People also ask

[Screenshot of a People Also Ask section]

The People Also Ask section often appears after a few organic listings. This SERP feature contains common questions related to the keyword a user searched. Each question is expandable with a brief answer and the URL that contains the answer.

This section can reveal the intent of a query that you can use to inform your content strategy.

Local pack

[Screenshot of a local pack showing a Google Maps result]

For location-based searches such as “best restaurant in New York”, there is usually a section that lists businesses in the region that match the query. It also appears for location-based queries that omit the location name, such as “barbershop near me.”

The local pack contains a map of the area with businesses, their contact details, and reviews.

Related searches


Related searches appear at the bottom of the SERP: a list of keywords related to the query, displayed to help searchers refine it. This is especially useful when the searcher doesn’t find the listed results helpful enough.

You can use related searches to find new ways to write about your business or product that mirror how your customers search for it.

What are the methods for scraping Google Search results?

There are a few ways to scrape Google Search results. Choose the best scraping method based on your skills and needs. The main approaches include:

  • Using Google’s Custom Search JSON API.
  • Building a DIY scraper.
  • Using a web scraper API.

Google scraping using Custom Search JSON API

The Custom Search JSON API is Google's official tool for extracting search results, which it returns in JSON format, avoiding blocks and CAPTCHAs.

To use the Custom Search JSON API, you need to get an API key and a Search Engine ID from the official Google website.

Benefits:

  • Reliable and easy to use.
  • Bypasses anti-scraping systems.

Downsides:

  • Free accounts are limited to 100 searches per day.
  • Paid plans cost $5 for each 1,000 additional searches (up to 10,000 daily).

Sample API request:

https://www.googleapis.com/customsearch/v1?key=YOUR_API_KEY&cx=YOUR_SEARCH_ENGINE_ID&q=money
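For example, here is a minimal Python sketch that calls the endpoint above with the requests library and prints each result’s title, link, and snippet (the placeholder key and Search Engine ID are yours to fill in):

import requests

API_KEY = "YOUR_API_KEY"                    # from the Google Cloud console
SEARCH_ENGINE_ID = "YOUR_SEARCH_ENGINE_ID"  # the cx value
QUERY = "money"

params = {"key": API_KEY, "cx": SEARCH_ENGINE_ID, "q": QUERY}
response = requests.get("https://www.googleapis.com/customsearch/v1", params=params)
response.raise_for_status()

# Each search result lives in the "items" list of the JSON response
for item in response.json().get("items", []):
    print(item["title"])
    print(item["link"])
    print(item["snippet"])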

This is a good option for small-scale projects but may fall short if frequent access is required.

Building a DIY scraping solution

You can scrape Google Search results by developing a scraper from scratch. This is the most challenging option, but it’s also the cheapest and gives you the most control over your scraper—provided you are technically proficient enough to build and maintain it.

You can use any programming language to build a scraper, but Python is the most popular for scraping Google results.

All you need is a workflow to download search result pages, parse the data, and store it.

Tools: Python with Selenium and undetected_chromedriver for browser automation, plus BeautifulSoup for parsing (the same stack used in the step-by-step guide below).

Challenges:

  • Frequent changes in Google’s structure can break your scraper.
  • Risk of IP blocking without proper safeguards.

Scraping Google Search with Web Scraping API

Because of the difficulty of developing your own Google scraping solution, using a scraper API is often the best option.

A scraper API is a specialized Google SERP scraper that returns Google Search results in JSON or HTML.

At SOAX, we’ve developed a SERP scraper that handles CAPTCHA, rotates IPs for you, and avoids all kinds of blocks.

It takes all the burden of scraping Google results off you, making it as simple as sending a web request. It also has geo-targeting support in 195 countries, so you can use our scraper APIs to access localized SERPs across different countries, states, and cities.

Scraping Google with Python

To start, create a Python file in your IDE so you can follow along with each of the steps described in this guide.

Step 1: Inspecting results HTML and exploring parsing options


Right-click anywhere on a Google Search results page and click the Inspect option in the menu. Using the Inspect pointer, find the organic listings container: a div element with the id rso.

This div contains the rest of the results content, including each of the ten organic listings and the People Also Ask section. Each organic listing sits in a div element with the classes N54PNb and BToiNc.


Within the div that contains each listing, the first anchor element (a) holds the URL of the page, the h3 holds the title of the listing, and the last span element holds the page’s meta description.

Step 2: Set up your development environment

We need a Python installation, an IDE, and a few Python libraries for web scraping.

Let's go through how to set them up.

Python

Python is the programming language of choice. Make sure you have at least Python 3.6 installed. If you don’t, head over to the official Python download page and install it for your operating system.

Selenium and undetected_chromedriver

We’ll use Selenium because Google Search now requires JavaScript rendering to be turned on.

To bypass CAPTCHAs, we can use undetected_chromedriver, and we’ll parse the results with BeautifulSoup and the lxml parser. Run the command below in Command Prompt or Terminal:

pip install selenium undetected-chromedriver beautifulsoup4 lxml

IDE

You should use an IDE such as PyCharm Community Edition for your scraper development. You can use an alternative IDE if you already have one, such as Visual Studio Code.

Step 3: Create a new Python project

Create a new project in your IDE. If there is no Python file created, create one and call it google_search_scraper.py. Paste the following code:

import time
from selenium.webdriver.common.keys import Keys
import undetected_chromedriver as uc
from bs4 import BeautifulSoup

# Launch a Chrome instance patched to evade bot detection
driver = uc.Chrome()
driver.get("https://www.google.com")

# Type the query into the search box (its name attribute is "q") and press Enter
search_box = driver.find_element("name", "q")
search_box.send_keys("web scraping python")
search_box.send_keys(Keys.RETURN)

# Give the results page time to finish loading, then parse it
time.sleep(5)
soup = BeautifulSoup(driver.page_source, 'lxml')
listings = soup.select('#rso > div')

# Each organic listing sits in a div with the classes N54PNb and BToiNc
for listing in listings:
    container = listing.find('div', class_="N54PNb BToiNc")
    if container:
        url = container.find('a')['href']                   # first anchor holds the URL
        title = container.find('h3').text                   # h3 holds the title
        description = container.find_all('span')[-1].text   # last span holds the description
        print(url)
        print(title)
        print(description)
        print(' ')

If you run the code above, the listings are printed to the console: each result’s URL, title, and description.

In the code above, we imported the necessary libraries and modules from Selenium, undetected_chromedriver, and BeautifulSoup.

The workflow is simple:

  1. Access the Google homepage
  2. Find the element by name (q is the name)
  3. Enter the query
  4. Simulate pressing the return key

After the results page loads, the script waits five seconds to make sure all of the page content is present. Then BeautifulSoup parses the page’s HTML.

For the parsing logic, all the div elements inside the div with the ID rso are stored in the listings variable. The code iterates through the listings and only acts on a listing that contains a div with the classes N54PNb and BToiNc.

For each matching listing, we print the following fields to the console:

  • URL
  • Title
  • Meta description

In the “Exporting scraped data” section below, you’ll see how to store the data in CSV format.

Note: When scraping, it’s important to set delays between requests to avoid overwhelming your target website. This is part of the recommendations for responsible scraping, especially for large-scale scraping.

Scraping Google with Python using a scraper API

Unlike the DIY solution, using a scraper API takes all the burden off you. You don’t need to do any parsing or worry about CAPTCHAs, IP bans, or other blocks. Below is a step-by-step guide on how to use the SOAX SERP API.

Step 1: Obtain the API key

Create an account with SOAX and sign up for a scraper API plan. To use the SERP scraper, you need an API key. Log into your user dashboard and from the menu, click on Scraper API. In the dashboard, you will see your API key. Click the Copy button next to the key.

Step 2: Install requests

You should have Python installed. If not, head over to the official download page to install it. With Python installed, you can install the requests module.

The requests module is an HTTP client for sending web requests. You can use it to send requests to the SOAX SERP API. Install it using the following command:

pip install requests

Step 3: Write a Python script to consume the SOAX SERP API

The API endpoint for SOAX SERP API is:

https://scraping.soax.com/v2/serp/google?q=

where q is the query parameter. You can refine the query with additional parameters, but for now, let’s stick with this basic example. Here’s a Python script to use the API:

import requests

SOAX_API_KEY = "YOUR_API_KEY"
SEARCH_QUERY = "how to scrape google serp"

# The API key is passed in the X-SOAX-API-Secret header
headers = {"X-SOAX-API-Secret": SOAX_API_KEY}
url = "https://scraping.soax.com/v2/serp/google"

# Passing the query via params lets requests URL-encode it for us
response = requests.get(url, headers=headers, params={"q": SEARCH_QUERY})
response.raise_for_status()
print(response.json())

If you run the code above, the JSON response for the Google SERP query will be printed to the console.

Advanced Scraping Techniques

Once you understand the basics of scraping, you can start implementing advanced techniques, such as handling pagination or improving your scraper’s efficiency.

Handling Pagination

Google Search results pages typically contain about ten organic listings per page. The script above only scrapes the first page. In some cases, you’ll need to scrape multiple pages.

The method for handling pagination depends on your approach. If you're using Selenium and undetected_chromedriver, you’ll need to find and click the "Next" page link before passing the page source to BeautifulSoup for parsing.
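For example, here’s a sketch that continues the google_search_scraper.py script from earlier. It assumes the Next link still uses the id pnnext, which Google has used historically; verify it in DevTools before relying on it:

from selenium.webdriver.common.by import By

# Continues from google_search_scraper.py: driver is the uc.Chrome() instance
for page in range(3):
    soup = BeautifulSoup(driver.page_source, 'lxml')
    # ... parse the current page with the listing logic shown earlier ...
    try:
        # "pnnext" is the id Google has historically given the Next link
        driver.find_element(By.ID, "pnnext").click()
    except Exception:
        break  # no Next link means we reached the last page
    time.sleep(5)  # let the next page load before parsing it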

If you're using the SOAX API, you can use the start parameter in the request URL. For example, to scrape the second page of results for the keyword "money," use:

https://scraping.soax.com/v2/serp/google?q=money&start=10
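In a script, you can loop over start values in steps of ten to walk through the result pages; here’s a minimal sketch using the same endpoint and header as before:

import requests

headers = {"X-SOAX-API-Secret": "YOUR_API_KEY"}

# start moves in steps of ten because each page holds about ten listings
for start in (0, 10, 20):
    url = f"https://scraping.soax.com/v2/serp/google?q=money&start={start}"
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    print(response.json())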

Extracting Different Data

Google SERPs contain multiple sections, such as ads, snippets, organic listings, "People Also Ask," and related searches. To extract data from a specific section, inspect the HTML to identify the relevant elements.

For example, in the "People Also Ask" section, each question is contained in a div with the class yEVEwb. This is inside a larger div with the class LQCGqc.
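Here’s a sketch that reuses the page source from the Selenium script above to pull those questions out. Like all of Google’s generated class names, yEVEwb changes often, so re-check it in DevTools first:

from bs4 import BeautifulSoup

# driver is the uc.Chrome() instance from google_search_scraper.py
soup = BeautifulSoup(driver.page_source, 'lxml')

# Each People Also Ask question sits in a div with the class yEVEwb
for question in soup.find_all('div', class_='yEVEwb'):
    print(question.get_text(strip=True))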


Improving Scraping Efficiency

The scrapers we’ve built so far are single-threaded. To increase efficiency, consider using:

  • Asynchronous requests (e.g., using asyncio and aiohttp); see the sketch after this list
  • Multi-threading or multiprocessing
  • Headless browsing to reduce resource usage
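Here’s a minimal asynchronous sketch that fetches five result pages concurrently through the SOAX endpoint shown earlier (install aiohttp with pip install aiohttp first):

import asyncio
import aiohttp

headers = {"X-SOAX-API-Secret": "YOUR_API_KEY"}

async def fetch_page(session, start):
    # Each request is I/O-bound, so awaiting it frees the event loop
    url = f"https://scraping.soax.com/v2/serp/google?q=money&start={start}"
    async with session.get(url, headers=headers) as response:
        return await response.json()

async def main():
    async with aiohttp.ClientSession() as session:
        # Fetch five result pages at once instead of one after another
        tasks = [fetch_page(session, start) for start in range(0, 50, 10)]
        pages = await asyncio.gather(*tasks)
        print(f"Fetched {len(pages)} pages")

asyncio.run(main())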

For large-scale scraping, use multithreading or multiprocessing to scrape multiple pages simultaneously instead of one after another. Python supports both: the threading module is great for I/O-bound tasks such as waiting on network responses, while multiprocessing suits CPU-bound tasks such as heavy parsing.
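For instance, a thread pool from the standard library can fan requests out across workers; this sketch again assumes the SOAX endpoint from earlier:

import requests
from concurrent.futures import ThreadPoolExecutor

headers = {"X-SOAX-API-Secret": "YOUR_API_KEY"}

def fetch_page(start):
    # One I/O-bound request per worker thread
    url = f"https://scraping.soax.com/v2/serp/google?q=money&start={start}"
    return requests.get(url, headers=headers).json()

# Five worker threads fetch five result pages concurrently
with ThreadPoolExecutor(max_workers=5) as executor:
    pages = list(executor.map(fetch_page, range(0, 50, 10)))
print(f"Fetched {len(pages)} pages")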

Exporting scraped data

Both the DIY scraper and the SERP Scraper API printed the results to the console. However, in most real-world web scraping cases, you’ll need to store the data either in a file or a database. Instead of just printing the data to the console, you can store it in a CSV file. Here's how to modify the google_search_scraper.py script to save the data:

# Create a CSV file and write the header row
import csv
with open('google_search_results.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['URL', 'Title', 'Description'])  # Header row
    for listing in listings:
        container = listing.find('div', class_="N54PNb BToiNc")
        if container:
            url = container.find('a')['href']
            title = container.find('h3').text
            description = container.find_all('span')[-1].text
            # Write the data to the CSV file
            writer.writerow([url, title, description])
driver.quit()  # Close the browser after scraping
print("Data saved to google_search_results.csv")

Bypass Google Search blocking

Google does not allow scraping except via their limited Custom Search JSON API. Without bypassing blocks, Google will block your IP after a few requests. In some instances, you might be prevented from sending even a single request.

Note: The best way to avoid getting blocked is to use a reputable scraper API from a company that actively develops and maintains it. This way, you can be sure to bypass all blocks by default. If you develop your scraper from scratch, use the techniques highlighted below to bypass Google’s bot detection systems.

Use proxies and rotate IPs

Your scraper, by default, uses your computer’s IP address, which Google will block after a few requests. To avoid this, rotate IPs using rotating residential proxies in Python. These proxies route your requests through the devices of regular internet users and change your IP address with each request, making it much harder for Google to detect and block you.

SOAX offers excellent rotating proxies to help you with this.
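With the requests library, routing traffic through a proxy is a one-line change. The hostname, port, and credentials below are placeholders; substitute the values from your provider’s dashboard:

import requests

# Placeholder credentials and endpoint; a rotating proxy gateway assigns
# a fresh residential IP to each request that passes through it
proxies = {
    "http": "http://USERNAME:PASSWORD@proxy.example.com:5000",
    "https": "http://USERNAME:PASSWORD@proxy.example.com:5000",
}

response = requests.get("https://www.google.com/search?q=money", proxies=proxies)
print(response.status_code)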

Set delay between requests

Don't send too many requests too quickly, even with IP rotation. It’s a good web scraping practice to set a delay between requests to avoid overwhelming your target website.
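A randomized delay looks less mechanical than a fixed one; for example:

import random
import time

for query in ["web scraping", "python tutorials", "serp api"]:
    # ... send the request for this query here ...
    # Sleep a random 2-5 seconds so requests don't arrive at a fixed interval
    time.sleep(random.uniform(2, 5))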

Set and rotate user-agent string

If you use a client like the requests module or Scrapy, the default user-agent string will make your scraper identifiable as an automated script. You should set the user-agent string to the user-agent of a popular browser.

This way, Google won’t be able to use your user-agent to identify your scraper as bot traffic. For extra protection, maintain a pool of user-agent strings and rotate between them, as sketched below.
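Here’s a minimal sketch with requests; the two strings are example values copied from common desktop browsers, so refresh them periodically:

import random
import requests

# Example user-agent strings from current desktop browsers
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]

# Pick a different user agent for each request
headers = {"User-Agent": random.choice(USER_AGENTS)}
response = requests.get("https://www.google.com/search?q=money", headers=headers)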

Use a CAPTCHA-solving service

When scraping Google, the chances of getting blocked by a CAPTCHA are high because Google is protected by reCAPTCHA. You need a CAPTCHA solver to get past this. There are many CAPTCHA solvers on the market, including 2Captcha, Death By Captcha, and CapSolver.
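As an illustration, here is a rough sketch based on 2Captcha’s Python SDK (pip install 2captcha-python); the sitekey and page URL are placeholders, and you should check the provider’s docs for the current interface:

from twocaptcha import TwoCaptcha

solver = TwoCaptcha("YOUR_2CAPTCHA_API_KEY")

# Placeholder values: read the real sitekey from the reCAPTCHA widget
# on the page that blocked you
result = solver.recaptcha(
    sitekey="SITE_KEY_FROM_THE_PAGE",
    url="https://www.google.com/search?q=money",
)
print(result["code"])  # the token you submit back with the page's form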

One of the best things about using the SOAX SERP API is that you don’t need to worry about doing any of the above, as it does it for you by default.

Is scraping Google Search legal?

Scraping publicly available data is generally legal as long as you respect the terms of service. Avoid scraping personal or sensitive information. Operate within regional laws and adhere to ethical practices.

Challenges of scraping Google Search

Scraping Google comes with challenges, especially when building your own custom scraper. These challenges are mainly due to Google's anti-spam measures and the complex structure of their HTML.

CAPTCHA

The first challenge you will deal with when scraping Google Search is CAPTCHA. If you try using Selenium alone, you will get a CAPTCHA on the first try. Google is protected by reCAPTCHA, which is effective at detecting bots.

It uses advanced techniques to monitor user interaction even after solving CAPTCHA, making it even more effective than most alternatives on the market.

IP blocking

Google tracks the IP of devices that interact with it. When it receives too many requests from a single IP address, it bans that IP and blocks further requests.

Google doesn’t publish their actual request limit, but it aligns with how a normal user accesses their site. To exceed the request limit without IP blocking, you need to use rotating proxies.

Obtaining structured data

The Google Search result page is available as an HTML document. The data is structured and stored in nested elements. However, it is not easy to access this data due to dynamic CSS selectors.

Google serves different HTML structures to break scrapers, and in most cases you can’t predict which version you’ll be shown. This, coupled with the dynamic classes and IDs it assigns to elements, makes scraping difficult.

Conclusion

Scraping Google Search can be challenging, but it’s a valuable way to gather data. If you’re not comfortable building your own scraper, consider using a scraping API like SOAX. It handles the complexities of scraping for you, so you can focus on analyzing the data.

Lisa Whelan

Lisa is a London-based tech expert and AI enthusiast. With a decade of experience, she specializes in writing about data, web scraping, and cybersecurity. She's here to share valuable insights and break down complex technical concepts for the SOAX audience.
