Worried about how to get started with web scraping? This quick, simplified guide will walk you through it.
Working with Requests and BeautifulSoup
Before starting any project or scripting any solution, we first create a virtual environment.
Now, what is a Virtual Environment?
If you are familiar with virtual environments and already know how to build and activate one, you can skip this part and continue with the blog.
The first step always includes making a virtual environment and working inside it. So, let's start by making an environment and activating it. Open the terminal, make an environment named "scrap", and activate it.
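On macOS/Linux with Python 3, the built-in venv module is one way to do this (the environment name "scrap" comes from the text above; on Windows the activate script lives under Scripts\ instead of bin/):

```shell
# Create a virtual environment named "scrap" using the stdlib venv module
python3 -m venv scrap

# Activate it (macOS/Linux; on Windows run scrap\Scripts\activate)
source scrap/bin/activate

# Your prompt should now show (scrap); leave the environment later with:
# deactivate
```

Anything you pip-install while the environment is active stays inside the "scrap" folder and does not touch your system Python.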
Before moving ahead, ask a question to yourself:
What is your favorite food?
Let’s say, it’s a PIZZA. Yes, crunchy, delicious, cheesy, and scrumptious pizza.
Next question, can you cook pizza without knowing its recipe?
Or, can anyone cook it for you without knowing what PIZZA is, what it looks like, and how cheesy, crunchy, and delicious it tastes?
The answer is a BIG NO!
Here, Pizza is our requests library. Before working on it, let us understand it.
Request, in English, means asking for something in order to get it. Similarly, in Python, the requests library is used to request the content of a web page. Thus, the requests library performs the elementary task of fetching web page content (just like buying the bread, veggies, cheese, and all your favorite toppings for a PIZZA).
Every external library in Python needs to be installed before use. Let us install these libraries using pip.
To install the libraries, run the following commands in the terminal:
pip3 install requests
pip3 install beautifulsoup4
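Once the installs finish, a quick sanity check is importing both libraries and printing their versions (both packages expose a `__version__` attribute):

```python
# Verify the installation by importing both libraries
import requests
import bs4

print("requests version:", requests.__version__)
print("beautifulsoup4 version:", bs4.__version__)
```

If either import raises ModuleNotFoundError, the install went into a different environment than the one you activated.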
Below is the code that shows the working of the requests library:
import requests
from bs4 import BeautifulSoup
res = requests.get('https://www.protonshub.com/')
print(res)
print("------------------")
print("The status code is", res.status_code)
Output:
<Response [200]>
------------------
The status code is 200
Code Explanation:
Code Explanation:
Before using any library, we install it and then import it into our script; the two import lines at the top achieve this. Next, requests.get() fetches the page and returns a response object. Printing res gives output like <Response [200]>. To fetch just the numeric status code, we use res.status_code.
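In practice, requests.get() can fail (no network, a bad URL, a timeout), so it is worth wrapping it with a hedge. This is a sketch, not part of the original script; the helper name fetch_status is my own:

```python
import requests

def fetch_status(url):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        res = requests.get(url, timeout=10)
        return res.status_code
    except requests.RequestException:
        # Covers connection errors, timeouts, invalid URLs, etc.
        return None

# A domain that cannot resolve yields None instead of a crash
print(fetch_status("http://nonexistent.invalid/"))
```

Catching requests.RequestException (the base class of all requests errors) keeps the script alive even when the site is unreachable.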
Now, look at the text shown as the title of the page (the text in the browser tab).
How do we fetch the text written there?
This is where BeautifulSoup comes into play. Requests is used to get the source code of the web page, but parsing that page, i.e., fetching the data residing in a particular tag, is where BeautifulSoup helps us.
The BeautifulSoup library in Python is used to extract text from markup languages like HTML and XML. Let's try to fetch the title of the page.
How do you find the tag our desired text resides in?
Through inspecting. Inspecting an element gives us its tag. We know that the title resides in the <title> tag. Let's head toward scraping it now.
import requests
from bs4 import BeautifulSoup
res = requests.get('https://www.protonshub.com/')
print(res)
print("------------------")
print("The status code is", res.status_code)
print("------------------")
soup_data = BeautifulSoup(res.text, 'html.parser')
print(soup_data.title)
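You don't need a live website to experiment with this: BeautifulSoup parses any HTML string you hand it. Here is a small self-contained sketch using a made-up snippet instead of a real page:

```python
from bs4 import BeautifulSoup

# A hard-coded HTML snippet (made up for illustration)
html = "<html><head><title>Demo Page</title></head><body><p>Hi</p></body></html>"

soup = BeautifulSoup(html, "html.parser")
print(soup.title)       # the whole <title> tag, markup included
print(soup.title.text)  # just the text inside it: Demo Page
```

The .title attribute returns the tag itself; chaining .text strips the markup and leaves only the inner text, exactly as in the protonshub example above.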
But have you ever thought about how to get the text that is on the body of the web page?
Suppose we want to extract the highlighted text in the image below.
To attain this, we first inspect it. We find that it is in an h3 tag.
So, let's parse the h3 tags and try to extract the text from them.
import requests
from bs4 import BeautifulSoup
res = requests.get('https://www.protonshub.com/')
# print(res)
# print("------------------")
# print("The status code is", res.status_code)
# print("------------------")
soup_data = BeautifulSoup(res.text, 'html.parser')
# print(soup_data.title)
h3_tags = soup_data.find_all('h3')
print(h3_tags)
print("\nh3_tag[1] :", h3_tags[1])
print("\nh3_tag[1] text :", h3_tags[1].text)
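find_all() returns a list of every matching tag, so you can loop over it instead of indexing one element at a time. A hedged, offline sketch with a made-up snippet (the headings here are placeholders, not from the real site):

```python
from bs4 import BeautifulSoup

# A made-up snippet with several h3 tags, mimicking the page structure
html = """
<h3>First Heading</h3>
<h3>Second Heading</h3>
<h3>Third Heading</h3>
"""

soup = BeautifulSoup(html, "html.parser")
h3_tags = soup.find_all("h3")

# Collect just the text of every h3 with a list comprehension
texts = [tag.text for tag in h3_tags]
print(texts)  # ['First Heading', 'Second Heading', 'Third Heading']
```

The same pattern, pointed at the real soup_data above, gives you every h3 on the page at once.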
Now, it's over to you. We are done with the simplest scraping using the requests and BeautifulSoup libraries.
How about the HAPPY CUSTOMERS of PROTONSHUB?
Do write a script that delivers the output:
Eberechi Asonye
Amine Belkhiria
(HINT: The solution lies around the smallest composite number)
Thanks for reading.
Happy scraping!