In this post you'll learn how to use proxies alongside the Python requests module.
Table of contents
- Prerequisites
- Setting Up Proxies with Python Requests
- Handling Proxy Errors and Exceptions in Python
Prerequisites
Before we delve into the nitty-gritty, we need to talk about the prerequisites. If you're already comfortable with Python, you're all set for this ride. Otherwise, you might want to brush up your Python skills first.
To kick things off, we need the Python requests library, an indispensable tool for making HTTP requests. You can grab it using Python's package installer, pip, like this:
pip install requests
With the requests library now in our toolkit, let's forge ahead and configure proxies with Python requests.
Setting Up Proxies with Python Requests
Working with proxies involves creating a dictionary of proxies for different protocols. This dictionary will come into play when we pass the proxy address into the requests.get() method. Here's an example:
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
And here's how you can utilize it with the requests.get() method:
requests.get('http://example.org', proxies=proxies)
Now, let's take a moment to differentiate between HTTP, HTTPS, and SOCKS proxies. HTTP and HTTPS are often the go-to choices, while SOCKS proxies offer more versatility at the cost of complexity.
Unfortunately, the requests module doesn't play nice with SOCKS proxies out of the box. To use SOCKS proxies in Python, we need the requests[socks] extra, which pulls in SOCKS support. You can install it using pip, just like we did with the requests library:
pip install requests[socks]
After installing requests[socks], you can configure SOCKS proxies like this:
proxies = {
    'http': 'socks5://user:pass@host:port',
    'https': 'socks5://user:pass@host:port'
}
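A detail worth knowing: with a socks5:// URL, DNS lookups happen on your own machine, while socks5h:// sends hostname resolution through the proxy as well. Here's a small helper to build either form; the host, port, and credentials below are placeholders:

```python
def socks_proxies(user, password, host, port, remote_dns=True):
    """Build a proxies dict for requests using a SOCKS5 proxy.

    socks5h:// resolves hostnames on the proxy; socks5:// resolves them locally.
    """
    scheme = "socks5h" if remote_dns else "socks5"
    url = f"{scheme}://{user}:{password}@{host}:{port}"
    return {"http": url, "https": url}

# Placeholder values; replace with your own SOCKS proxy details
proxies = socks_proxies("user", "pass", "proxy.example.com", 1080)
# requests.get("https://example.com", proxies=proxies)
```

Prefer socks5h:// when you don't want local DNS queries to reveal which hosts you're contacting.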
Managing Proxy Authentication
Proxies serve as intermediaries between your local network and the internet. Some proxy servers, in their quest to ensure top-notch security, require users to authenticate themselves with a username and password. This process, known as proxy authentication, is crucial in preventing unauthorized access.
If you're using Python's requests module to make HTTP requests, there might be times when you need to pass your proxy authentication details (username and password) along with the request. If you're unsure how to go about it, you're in the right place. Let's explore this in detail.
Imagine you have a proxy with the following details: IP 192.0.2.0, port 8080, username myuser, and password mypassword. You can send a GET request using the get() method and pass along your proxy details in this manner:
import requests
proxies = {
    # The scheme before the credentials is the proxy's own protocol
    # (plain HTTP here, even for the https key)
    "http": "http://myuser:mypassword@192.0.2.0:8080",
    "https": "http://myuser:mypassword@192.0.2.0:8080"
}
response = requests.get("http://example.com", proxies=proxies)
print(response.text)
The script you just saw sends a GET request to http://example.com via your specified proxy, using your provided username and password for authentication.
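One gotcha with this URL format: if your username or password contains characters like @, :, or /, the proxy URL becomes ambiguous. A safe habit is to percent-encode the credentials with the standard library's urllib.parse.quote; the credentials below are made up for illustration:

```python
from urllib.parse import quote

def proxy_url(user, password, host, port):
    # safe="" ensures characters like ':' and '@' inside the
    # credentials are encoded rather than left as URL delimiters
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"

url = proxy_url("myuser", "p@ss:word", "192.0.2.0", 8080)
print(url)  # http://myuser:p%40ss%3Aword@192.0.2.0:8080
```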
Leveraging Environment Variables
Hardcoding your own proxy server details into your script might not always be the best approach, especially if you're looking for a more flexible solution. That's where environment variables come into play.
Python allows you to use environment variables to configure proxies for your requests. You can store your proxy configuration details in a .env file like this:
HTTP_PROXY=http://myuser:mypassword@192.0.2.0:8080
HTTPS_PROXY=http://myuser:mypassword@192.0.2.0:8080
After exporting these variables in your shell (or loading the .env file with a package such as python-dotenv), you can access them using the os module in Python:
import os
import requests
http_proxy = os.getenv("HTTP_PROXY")
https_proxy = os.getenv("HTTPS_PROXY")
proxies = {
    "http": http_proxy,
    "https": https_proxy
}
response = requests.get("http://example.com", proxies=proxies)
print(response.text)
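As it happens, requests also picks these variables up on its own (its trust_env behavior is enabled by default), so the explicit proxies dictionary above is optional once the variables are exported. A quick way to inspect what your environment provides is the standard library's urllib.request.getproxies(), which reads the same variables:

```python
import os
import urllib.request

# Assume these were exported in the shell or loaded from a .env file
os.environ["HTTP_PROXY"] = "http://myuser:mypassword@192.0.2.0:8080"
os.environ["HTTPS_PROXY"] = "http://myuser:mypassword@192.0.2.0:8080"

proxy_map = urllib.request.getproxies()
print(proxy_map["http"])  # http://myuser:mypassword@192.0.2.0:8080

# With these variables set, a plain requests.get("http://example.com")
# would route through the proxy without an explicit proxies= argument.
```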
Using Sessions Alongside Python Requests and Proxies
Ever wondered how to persist certain parameters across multiple requests when interacting with a web application? Enter sessions! In Python, sessions are a powerful tool that can help maintain your cookies, headers, and proxies, saving you the hassle of redefining them with every request.
Think of a session as a 'hangout period' between a user and a web application. In the realm of Python requests, sessions are a godsend. Why? Because they allow you to persist parameters like cookies, headers, and proxies across requests. This comes in especially handy when you're making multiple requests to the same server.
To kick things off, you'll need to create a session object. This is done by instantiating the requests.Session class. Here's how it's done:
import requests
session = requests.Session()
Once you have a session up and running, it's time to configure your proxies. This is achieved using the .proxies attribute:
session.proxies = {
    "http": "http://myuser:mypassword@192.0.2.0:8080",
    "https": "http://myuser:mypassword@192.0.2.0:8080"
}
With the above code, all requests made using this session will automatically use the proxies you've specified.
Now that your proxies are set, let's make a GET request using this session:
response = session.get("http://example.com")
print(response.text)
Using Rotating Proxies in Python Requests
When you're neck-deep in HTTP requests using Python, you've probably bumped into roadblocks such as blocked proxy IP addresses, bans, captchas, and pesky rate limits. These can be a real pain, slowing down or even bringing your web scraping or data gathering activities to a complete standstill. That's where rotating proxies come in handy.
Rotating proxies are IP addresses that take turns or 'rotate' after a set period or a certain number of requests. This clever strategy helps to scatter your requests, making them seem more natural and less likely to trigger defenses like IP address bans or rate limits. Using proxy types like residential or mobile proxies makes your requests even harder to flag.
There's more than one way to skin a cat when it comes to rotating proxies. You can opt for free proxies, which are available on various websites. However, these tend to be unreliable and sluggish. For a more robust and efficient solution, consider using a proxy provider like SOAX. If you do go the free proxy route, you can check their performance using our free proxy checker.
Here's how you can get rotating proxies to play nice with Python requests:
1. Use a List of Proxies
You can throw a list of proxies at the requests.get() method and use the random.choice() function to cherry-pick a random proxy for each request. Here's a basic example:
import requests
import random
# List of proxies
proxies = ['proxy1', 'proxy2', 'proxy3']
# Randomly select a proxy
proxy = random.choice(proxies)
# Make a request
response = requests.get(
    'https://example.com',
    proxies={"http": proxy, "https": proxy}
)
This method is as easy as pie. However, the quality and speed of your proxies hang on the sources you've picked.
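If random selection feels too haphazard (it can pick the same proxy several times in a row), itertools.cycle from the standard library gives you an even, round-robin rotation instead. The proxy URLs below are placeholders:

```python
from itertools import cycle

# Placeholder proxy URLs; replace with your own list
proxy_pool = cycle([
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
])

def next_proxies():
    """Return the proxies dict for the next proxy in the rotation."""
    proxy = next(proxy_pool)
    return {"http": proxy, "https": proxy}

for _ in range(4):
    print(next_proxies()["http"])
# Cycles through proxy1, proxy2, proxy3, then wraps back to proxy1
```

Each call to next_proxies() advances the rotation, so you'd call it once per request.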
2. Use a Premium Proxy Service
For a smoother ride, use a premium proxy service like SOAX that takes care of proxy rotation for you. Here's how to do it:
import requests
# Using a SOAX proxy with a test username and password
proxies = {
    "http": "http://0YFEkZzfrwBX4Wfp:wifi;pl;@proxy.soax.com:9000",
    "https": "http://0YFEkZzfrwBX4Wfp:wifi;pl;@proxy.soax.com:9000"
}
# Make a request
response = requests.get('https://example.com', proxies=proxies)
print(response.text)
This method ensures you're using top-notch proxies that are less likely to get blacklisted.
Handling Proxy Errors and Exceptions in Python
Working with proxies and Python requests, you're bound to come across a few hiccups like ProxyError or ConnectionError. But don't fret! You can elegantly handle these exceptions using try-except blocks. Let's dive into how you can do this.
Here's an example of how you can use a try-except block to handle proxy errors:
import requests
import random

# List of proxies to choose from. Replace these with your own list.
PROXIES = [
    {'http': 'http://192.168.0.1:8080', 'https': 'http://192.168.0.1:8080'},
    {'http': 'http://192.168.0.2:8080', 'https': 'http://192.168.0.2:8080'},
    {'http': 'http://192.168.0.3:8080', 'https': 'http://192.168.0.3:8080'},
    # Add more proxies here...
]

# Number of retries
MAX_RETRIES = 5

def fetch_content():
    retries = 0
    while retries < MAX_RETRIES:
        # Pick a random proxy
        proxy = random.choice(PROXIES)
        try:
            # Make the request
            response = requests.get("http://www.google.com", proxies=proxy, timeout=5)
            # Check if the request was successful
            if response.status_code == 200:
                print(f"Successfully fetched content using proxy {proxy}")
                print(response.text[:100])  # Print the first 100 characters of the response
                return
            else:
                print(f"Received unexpected status code {response.status_code} using proxy {proxy}")
        except requests.RequestException as e:
            print(f"An error occurred while using proxy {proxy}: {e}")
        retries += 1
        print(f"Retrying... ({retries}/{MAX_RETRIES})")
    print("Max retries reached. Exiting.")

if __name__ == "__main__":
    fetch_content()
The fetch_content function tries to fetch the content of Google's homepage using a randomly chosen proxy. If the request fails for any reason, it catches the exception and retries with another randomly chosen proxy, up to a maximum number of retries (MAX_RETRIES).
By using this approach, your script will continue to run even if a proxy fails. This is critical when using free proxies which can often turn out to be unreliable.
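One refinement worth considering: instead of retrying immediately, you can sleep between attempts with an exponentially growing delay, which eases the load on both the proxies and the target server. A minimal sketch of such a delay schedule (the base delay and cap are arbitrary choices):

```python
def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff: base * 2**attempt seconds, capped at `cap`."""
    return min(base * (2 ** attempt), cap)

# Inside a retry loop, after a failed attempt, you could add:
#   time.sleep(backoff_delay(retries))
print([backoff_delay(n) for n in range(6)])  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```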
Wrapping Up
The use of Python requests with proxies offers several advantages, particularly for web scraping, extracting data, privacy maintenance, and bypassing geo-restrictions. The code snippets provided in this guide give you a solid foundation to experiment with different proxies.
To broaden your knowledge on this topic, consider exploring the official Python requests documentation, tutorials, and blogs. These resources will provide you with a comprehensive understanding of how to effectively use Python requests and proxies.