By the Autom Team

How To Scrape Google News using Python

Access to real-time news data opens up many practical use cases. When you can track what is being published as it happens, you can better understand how a brand is being talked about, follow market trends, study competitors, and react faster to important events. 

Google News is one of the largest sources of such information, as it collects articles from thousands of publishers across different regions and industries.

Because of this, scraping Google News data is useful for teams working on brand sentiment monitoring, market research, competitor analysis, and media tracking. 

Instead of manually checking news pages every day, structured data can be collected and analysed at scale.

Python is a popular choice for this task because it is simple to use and has strong libraries for web scraping and data handling. In this article, we will use Python to scrape Google News results and extract key details such as the news title, short description, publication date, and source. 

By the end of this guide, you will have a clear idea of how to collect Google News data in a structured and repeatable way.

Let’s Start Scraping Google News using Python

Before we begin, make sure your Python project is already set up on your system. Open the project in your preferred code editor and import the libraries we will use in this tutorial.
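If requests and BeautifulSoup are not installed yet, they can be added with pip (these are the only third-party packages this tutorial uses):

```shell
python -m pip install requests beautifulsoup4
```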

import json
import requests
from bs4 import BeautifulSoup

Next, we will create a function that sends a request to Google News and prepares the page for data extraction.

def getNewsData():
    headers = { "User-Agent": 
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36" }

    response = requests.get(
        "https://www.google.com/search?q=amazon&gl=us&tbm=nws&num=100",
        headers=headers
    )

    soup = BeautifulSoup(response.content, "html.parser")
    news_results = []

Here, we first define a User-Agent header. This helps our request look like it is coming from a real browser instead of a bot. Without this, Google may block or limit the request.

After that, we send an HTTP request to the Google News search URL using the requests library. The returned HTML content is stored in the response variable. We then pass this HTML to BeautifulSoup, which allows us to parse and search through the page easily. An empty list named news_results is created to store the extracted news data.
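To see the parsing step in isolation, here is a self-contained sketch that runs BeautifulSoup on a toy HTML fragment. The fragment and its class names mirror the structure we will target later in this tutorial; it is illustrative only, not a live Google page:

```python
from bs4 import BeautifulSoup

# Toy HTML standing in for one Google News result (illustrative only)
html = """
<div class="SoaBEf">
  <a href="https://example.com/article">
    <div class="MBeuO">Example headline</div>
  </a>
  <div class="GI74Re">A short description of the story.</div>
  <div class="LfVVr">2 hours ago</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns all matches; select_one() returns the first match (or None)
for el in soup.select("div.SoaBEf"):
    print(el.select_one("div.MBeuO").get_text(strip=True))
```

The same two methods, select() and select_one(), are all we need for the real page.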

Now, we need to locate the HTML elements that contain individual news articles. If you inspect the Google News search page, you will notice that each news result is wrapped inside a div element with the class SoaBEf.

We can loop through each of these containers and prepare our data structure as shown below:


    for el in soup.select("div.SoaBEf"):
        news_results.append({})

    print(json.dumps(news_results, indent=2))

getNewsData()

At this stage, we are only identifying each news result container. In the next step, we will locate the specific tags and classes inside these containers to extract details such as the title, description, source, and publication date.

How To Scrape the News Title

To extract the news title, we first need to inspect the Google News result in the browser. Right-click on the headline and open the developer tools. This helps us understand where the title is located in the HTML structure.

As shown in the image above, each news title is wrapped inside a div element with the class MBeuO. This is the element we will target to extract the headline text.

Since we are already looping through each news result container (div.SoaBEf), we can now search for the title inside each container.

Add the following line inside the append block to capture the news title:


"title": el.select_one("div.MBeuO").get_text(),

Scraping News Description & Date

The final pieces of information we need are the news description and the publication date. These can be found by inspecting the HTML structure of each news result.

As shown in the image above, the short news description is located inside a div element with the class GI74Re. The publication time or date appears inside a div element with the class LfVVr.

Since we are already looping through each news result container, we can directly extract both values from the current element.

Add the following lines inside the append block:

"snippet": el.select_one(".GI74Re").get_text(),
"time": el.select_one(".LfVVr").get_text(),

Here, select_one(".GI74Re") extracts the visible text of the news description, while select_one(".LfVVr") retrieves the publication time shown on Google News, such as “10 hours ago” or a specific date.
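One practical caveat: select_one() returns None when a class name changes or an element is missing, so a chained .get_text() raises an AttributeError. A small defensive helper (not part of the original script, just a hardening suggestion) avoids that:

```python
def safe_text(el, selector):
    # Return the stripped text of the first element matching `selector`,
    # or an empty string when nothing matches (e.g. after a layout change).
    node = el.select_one(selector)
    return node.get_text(strip=True) if node else ""
```

Inside the loop you would then write "snippet": safe_text(el, ".GI74Re") instead of calling get_text() directly, so one missing field no longer crashes the whole run.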

With this step, we have successfully extracted all the required fields: title, source, link, description, and date. In the next step, we can print or store this data in JSON format for further use.
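For example, persisting the collected records to a JSON file takes only a few lines; the sample record below is illustrative, shaped like the fields extracted above:

```python
import json

# Illustrative sample record with the same fields the scraper extracts
news_results = [{
    "link": "https://example.com/article",
    "title": "Example headline",
    "snippet": "A short description of the story.",
    "source": "Example News",
    "time": "2 hours ago",
}]

# ensure_ascii=False keeps non-ASCII characters readable in the file
with open("news_data.json", "w", encoding="utf-8") as f:
    json.dump(news_results, f, indent=2, ensure_ascii=False)
```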

Complete Code

So far, we have extracted all the required fields from Google News. If you want to scrape additional information from the HTML, you can extend the same logic by inspecting new elements and updating the code.

For now, below is the complete working script that collects the news title, link, source, description, and date.

import json
import requests
from bs4 import BeautifulSoup


def getNewsData():
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36"
    }

    response = requests.get(
        "https://www.google.com/search?q=us+stock+markets&gl=us&tbm=nws&num=100",
        headers=headers
    )

    soup = BeautifulSoup(response.content, "html.parser")
    news_results = []

    for el in soup.select("div.SoaBEf"):
        news_results.append({
            "link": el.find("a")["href"],
            "title": el.select_one("div.MBeuO").get_text(),
            "snippet": el.select_one(".GI74Re").get_text(),
            "source": el.select_one(".NUnG9d span").get_text(),
            "time": el.select_one(".LfVVr").get_text()
        })

    print(json.dumps(news_results, indent=2))


getNewsData()

Once you run this script in your terminal, you should see the extracted Google News data printed in JSON format.

Saving the Data to a CSV File

Printing the results to the terminal works for testing, but copying data manually every time is not practical. A better approach is to save the scraped data into a CSV file so it can be reused later or processed further.

First, import the csv module:

import csv

Next, replace the print statement with the following code to write the data into a CSV file:

with open("news_data.csv", "w", newline="", encoding="utf-8") as csv_file:
    fieldnames = ["link", "title", "snippet", "source", "time"]
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerows(news_results)

print("Data saved to news_data.csv")

This will create a CSV file with separate columns for the link, title, snippet, source, and time. You can open this file in Excel, Google Sheets, or any data analysis tool.
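To keep working with the file from another script, the standard-library csv module can read it back into dictionaries. This sketch writes a one-row sample file first so it is runnable on its own:

```python
import csv

fieldnames = ["link", "title", "snippet", "source", "time"]

# Write a small sample file (stands in for the scraper's real output)
with open("news_data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({
        "link": "https://example.com/article",
        "title": "Example headline",
        "snippet": "A short description of the story.",
        "source": "Example News",
        "time": "2 hours ago",
    })

# Read it back: each row becomes a dict keyed by the header row
with open("news_data.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(f"Loaded {len(rows)} row(s); first title: {rows[0]['title']}")
```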

At this point, we have successfully scraped and stored Google News data using Python.

Scraping Google News Using Autom Google News API (Python Example)

Instead of sending requests to Google’s HTML pages, Autom provides a dedicated Google News API that returns structured data in JSON format. This removes the need to inspect HTML, manage class names, or handle frequent layout changes.

Below is a simple Python example to fetch Google News results using Autom’s API.

import requests
import json


url = "https://api.autom.dev/googlenews"

payload = {
    "query": "us stock market",
    "gl": "us",
    "hl": "en",
    "num": 20
}

headers = {
    "Authorization": "Bearer YOUR_AUTOM_API_KEY",
    "Content-Type": "application/json"
}

response = requests.post(url, headers=headers, json=payload)

data = response.json()

print(json.dumps(data, indent=2))

This request sends a search query to Google News and returns a structured response that already includes details such as the article title, link, source, snippet, and publication time. You can directly store this data in a database, CSV file, or use it in an automated workflow.

You can also control parameters like country (gl), language (hl), and the number of results (num) without changing any scraping logic.
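To make those parameters easy to vary across queries, the example can be wrapped in a small helper. This is a sketch built on the endpoint and payload shown above; the injectable `post` argument is an addition of ours so the function can be exercised without a live API key:

```python
API_URL = "https://api.autom.dev/googlenews"  # endpoint from the example above

def fetch_news(api_key, query, gl="us", hl="en", num=20, post=None):
    # Send one Google News API request and return the parsed JSON.
    # `post` defaults to requests.post; pass a stub for offline testing.
    if post is None:
        import requests  # imported lazily so the helper is easy to stub
        post = requests.post
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {"query": query, "gl": gl, "hl": hl, "num": num}
    response = post(API_URL, headers=headers, json=payload, timeout=30)
    response.raise_for_status()  # surface HTTP errors early
    return response.json()
```

A call such as fetch_news("YOUR_AUTOM_API_KEY", "us stock market", hl="en", num=50) then covers any combination of query, country, language, and result count.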

Why Use an API Instead of Scraping Google News with Python Only?

Scraping Google News using Python and HTML parsing works for small tests, but it has clear limitations. Google frequently changes its page structure and class names, which can break your scraper without warning. Even a small layout change can cause missing data or script failures.

Using an API solves these problems.

With an API:

  • You get structured data without parsing HTML

  • There is no dependency on CSS classes or page layout

  • Requests are more stable and predictable

  • It is easier to scale and automate data collection

  • You avoid dealing with blocks, headers, and browser behaviour

For one-off experiments, HTML scraping may be enough. But for recurring tasks, dashboards, monitoring systems, or production use cases, an API-based approach is more reliable and easier to maintain.

Start today with 1000 free requests! Here is the sign-up link.

Additional Resources 

  1. Autom.dev Google Search API Is a Better Alternative to Decodo’s Scraper

  2. Serper vs SerpAPI vs Autom: Which Scraper Is Best for Extracting Structured Results?

Scraping News Source & Link

As shown in the image above, the article link is stored in the href attribute of the anchor (a) tag. The news source name, such as the publisher’s name, is located inside a div element with the class NUnG9d, and the actual source text is present inside a span tag within it.

Since we are already looping through each div.SoaBEf container, we can directly extract both the source and link from the current element.

Add the following lines inside the append block:

"source": el.select_one(".NUnG9d span").get_text(),
"link": el.find("a")["href"],

Here, select_one(".NUnG9d span") selects the span that contains the source name and extracts its text. The find("a")["href"] part retrieves the article URL from the anchor tag.

With this step, we now have the news title, source, and link for each article, completing the fields used in the full script above.
