Web Scraping with BeautifulSoup and Requests

Introduction:

Web scraping is the technique of programmatically extracting data from websites. In this tutorial, we will learn how to scrape a web page in Python using the Requests library to download the page and BeautifulSoup to parse it.

Requirements:

  1. Python 3.x
  2. Beautiful Soup 4
  3. Requests

Installation:

Before we begin, make sure you have Python 3.x installed on your system. Next, install BeautifulSoup and Requests by running the following command:

pip install beautifulsoup4 requests
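
To confirm that the install worked, a quick check like the following can help. It is only a sketch: it imports both packages and prints whatever versions happen to be installed on your machine.

```python
import bs4
import requests

# Both packages expose a __version__ string; the values depend on your environment.
print("beautifulsoup4:", bs4.__version__)
print("requests:", requests.__version__)
```

If either import fails with ModuleNotFoundError, the pip command above most likely ran against a different Python interpreter than the one you are using.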

Step 1: Import required libraries

First, let's import the necessary libraries:

import requests
from bs4 import BeautifulSoup

Step 2: Make an HTTP request

Now, we will make a GET request to the website we want to scrape using the requests.get() method:

url = "https://example.com"
response = requests.get(url)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    print("Request successful!")
else:
    print("Request failed. Status code:", response.status_code)
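
In a longer script it is often handy to wrap the request in a small helper. The sketch below is one possible way to do that, not the only one: fetch_html is a hypothetical name, and the 10-second timeout is an assumed default that you should tune for your target site. It uses raise_for_status() so that 4xx and 5xx responses are treated as failures, and it catches requests.RequestException, which also covers connection errors and timeouts.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the raw response body for `url`, or None if the request fails.

    `fetch_html` and the 10-second default timeout are illustrative choices.
    """
    try:
        # Without a timeout, requests.get() can wait indefinitely on a stalled server.
        response = requests.get(url, timeout=timeout)
        # Raises requests.HTTPError for 4xx and 5xx status codes.
        response.raise_for_status()
    except requests.RequestException as exc:
        # RequestException is the base class for HTTPError, ConnectionError, Timeout, etc.
        print(f"Request failed: {exc}")
        return None
    return response.content
```

Calling fetch_html("https://example.com") returns bytes that can be passed straight to BeautifulSoup, or None when something went wrong, so the caller can decide whether to retry or skip the page.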

Step 3: Parse the HTML content

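Once we have the HTML content, we parse it with BeautifulSoup. The second argument, "html.parser", selects Python's built-in HTML parser, so no extra dependency is needed: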

soup = BeautifulSoup(response.content, "html.parser")
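
To see what the parsed object looks like without depending on a live site, the same call can be tried on a small inline document. The markup below is made up purely for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used so the example runs without a network call.
html = """
<html>
  <head><title>Sample page</title></head>
  <body><h1>First headline</h1><p>Some text.</p></body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Tags can be reached as attributes of the soup object.
print(soup.title.string)  # Sample page
print(soup.h1.name)       # h1
```

Attribute access such as soup.title returns the first tag with that name, which is convenient for elements that appear only once in a page.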

Step 4: Extract information

Now that we have parsed the HTML, we can start extracting information from it using BeautifulSoup's methods like find(), find_all(), etc.

For example, let's say we want to extract all the headlines (in <h1> tags) from the website:

headlines = soup.find_all("h1")

for idx, headline in enumerate(headlines, start=1):
    # get_text(strip=True) trims surrounding whitespace from the tag's text
    print(f"{idx}. {headline.get_text(strip=True)}")
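
The difference between find() and find_all() matters once a tag may be missing. The sketch below, again on made-up sample markup, shows that find() returns a single tag or None, while find_all() returns a list and can filter on attributes such as href.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used so the example runs without a network call.
html = """
<html><body>
  <h1> Main headline </h1>
  <a href="/about">About</a>
  <a href="https://example.com/contact">Contact</a>
  <a>No link target</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None when nothing matches.
first_h1 = soup.find("h1")
if first_h1 is not None:
    print(first_h1.get_text(strip=True))

missing = soup.find("h2")  # the sample has no <h2>, so this is None

# find_all() returns a list; href=True keeps only <a> tags that have an href attribute.
links = [a["href"] for a in soup.find_all("a", href=True)]
print(links)
```

Checking for None before reading from the result of find() avoids an AttributeError on pages that lack the expected element.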

Conclusion:

In this tutorial, we learned how to perform web scraping in Python using the BeautifulSoup and Requests libraries. With this knowledge, you can build your own data extraction tools for various use cases like sentiment analysis, market research, or competitive analysis.

Happy scraping!