HTTP redirect destination verification with Python

Posted on: November 2, 2023


Introduction

When working on a web application, there are many scenarios where you want to redirect the user to another destination. Maybe the content is no longer available and you want to send visitors to a valid replacement page. Or perhaps, for SEO reasons, you want to redirect to a URL that yields more relevant results.

When you change the website's router logic, manually testing each URL pattern to confirm that it redirects to the new expected destination is both time-consuming and error-prone. This is where scripts can save the day!

In this article, I will explain how to code a straightforward implementation for verifying HTTP redirects. I will be using Python for simplicity, but any programming language capable of performing HTTP requests and reading from CSV files can also be used with the same logic.

📥 Installing dependencies

We will mostly rely on the Python standard library, plus two third-party packages. The first is termcolor, which provides the cprint function. This function prints colored logs to the console, making it easier to spot redirect failures. While not necessary, it adds a touch of style to your results. If desired, you can replace every instance of cprint in this tutorial with a regular print and get the same result without the color.

To install termcolor, run the following command:

pip install termcolor

We will also use the requests library, which you can install with:

pip install requests
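
If you would rather not make termcolor a hard requirement, one option is to fall back to plain print when the import fails. This is only a sketch of that idea, not something the rest of the script depends on. The fallback function is my own assumption; it keeps the name cprint so the later snippets work unchanged.

```python
# Optional dependency: use termcolor's cprint when it is installed,
# otherwise define a stand-in with the same call shape.
try:
    from termcolor import cprint
except ImportError:
    def cprint(text, color=None, **kwargs):
        # Hypothetical fallback: ignore the color and print plain text.
        print(text)


cprint("Verification successful!\n", "green")
cprint("Verification failed!\n", "red")
```

With termcolor installed, the two lines are printed in green and red; without it, they are printed as plain text.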

🚀 Creating the Verification Function

Let's begin by creating a function that, given a list of tuples with two strings representing origin and destination URLs, iterates through each entry. It makes an HTTP request to the origin URL, lets requests follow any redirects, and compares the final URL it lands on with the expected destination URL.

To improve autocompletion and maintain cleaner code, I'm going to use type hints (supported in recent Python versions). Feel free to remove them if you're working with older Python versions. As of this article, I am using Python 3.9.

from typing import List, NewType, Tuple

UrlTuple = NewType("UrlTuple", Tuple[str, str])


def verify_destinations(urls: List[UrlTuple]) -> bool:
    return True

Now, with the function and types defined, let's initialize the two lists that will hold the correct and incorrect URLs, so we can print the results once the loop is over. 📜

def verify_destinations(urls: List[UrlTuple]) -> bool:
    correct = [] # Initialize empty list
    failed = [] # Initialize empty list

To start the loop and iterate through the URL tuples, you can do the following:

def verify_destinations(urls: List[UrlTuple]) -> bool:
    correct = []
    failed = []

    for origin, expected_destination in urls:
        pass  # Per-URL verification goes here (next step)

    return True

With this setup, we can make an HTTP request on each iteration using the requests library we installed earlier. 🌐

import requests

def transform(url: str) -> str:
    # Implementation of url transformation
    return url

def verify_destinations(urls: List[UrlTuple]) -> bool:
    correct = []
    failed = []

    for origin, expected_destination in urls:
        response = requests.get(origin)
        actual_destination = transform(response.url)

You'll notice that the transform function currently does nothing and returns the URL unchanged. This is where you can add any normalization or processing specific to your needs. In this case, we are leaving it as a pass-through, but you can customize it as required.
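
As an illustration of what such a normalization might look like, here is a minimal sketch built on urllib.parse from the standard library. The rules it applies (lowercase scheme and host, no trailing slash, no fragment) are my own assumptions about what counts as "the same URL"; adjust them to your site.

```python
from urllib.parse import urlsplit, urlunsplit


def transform(url: str) -> str:
    """Normalize a URL so cosmetic differences do not fail the comparison."""
    parts = urlsplit(url)
    # Scheme and host are case-insensitive, so compare them in lowercase.
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    # Treat "/page/" and "/page" as the same path, but keep the root "/".
    path = parts.path.rstrip("/") or "/"
    # Keep the query string and drop the fragment, which is never sent to the server.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


print(transform("HTTPS://Example.com/New-Page/?ref=1#top"))
# https://example.com/New-Page?ref=1
```

If you normalize the actual destination this way, it probably makes sense to pass the expected destination through the same function before comparing, so both sides follow the same rules.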

Next, let's verify where the HTTP request redirected or if it didn't redirect at all. You can use the following conditional for this purpose:

def verify_destinations(urls: List[UrlTuple]) -> bool:
    correct = []
    failed = []

    for origin, expected_destination in urls:
        response = requests.get(origin)
        actual_destination = transform(response.url)

        if actual_destination == expected_destination:
            pass  # The request ended up at the expected destination
        else:
            pass  # The redirect did not happen or led somewhere else
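
One detail worth keeping in mind: requests follows redirects by default for GET requests, so response.status_code is the status of the final response (often 200), not the 301 or 302 that triggered the redirect. The intermediate responses are available in response.history. The sketch below shows one way to inspect them. The helper names redirect_chain and was_redirected are hypothetical, and the demo uses a stand-in object instead of a real network call so that it runs anywhere.

```python
from types import SimpleNamespace
from typing import List, Tuple


def redirect_chain(response) -> List[Tuple[int, str]]:
    """Return (status_code, url) for every hop, ending with the final response."""
    hops = [(hop.status_code, hop.url) for hop in response.history]
    hops.append((response.status_code, response.url))
    return hops


def was_redirected(response) -> bool:
    """True when at least one redirect was followed before the final response."""
    return len(response.history) > 0


# Stand-in for what requests.get("https://example.com/old") might return;
# the URLs and status codes are made up for illustration.
fake_response = SimpleNamespace(
    status_code=200,
    url="https://example.com/new",
    history=[SimpleNamespace(status_code=301, url="https://example.com/old")],
)

for status, url in redirect_chain(fake_response):
    print(status, url)
```

With a real response from requests.get, the same two helpers would let you report whether the origin redirected at all and which status code each hop used.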

Now, you can add the logic to print the verification results, with color using the cprint function or without it using a regular print call, as you prefer. Feel free to update the output formatting for your convenience. 📄

def verify_destinations(urls: List[UrlTuple]) -> bool:
    correct = []
    failed = []

    for origin, expected_destination in urls:
        response = requests.get(origin)
        actual_destination = transform(response.url)

        if actual_destination == expected_destination:
            print(
                f"Origin: {origin}\nExpected Destination: \n{expected_destination}\nActual Destination: \n{actual_destination}"
            )
            cprint("Verification successful!\n", "green")
            correct.append(origin)
        else:
            print(f"Status code: {response.status_code}")
            print(
                f"Origin: {origin}\nExpected Destination: \n{expected_destination}\nActual Destination: \n{actual_destination}"
            )
            cprint("Verification failed!\n", "red")
            failed.append(origin)

As mentioned before, you can replace cprint with a regular print call if you don't need color-coded output.

Now, let's display the list of URLs that failed verification and the percentage of URLs that passed:

def calculate_percentage(correct: List, incorrect: List) -> float:
    total = len(correct) + len(incorrect)
    if total == 0:
        return 0.0  # Nothing was checked, so avoid dividing by zero
    return round(len(correct) / total * 100, 2)

def verify_destinations(urls: List[UrlTuple]) -> bool:
    correct = []
    failed = []

    for origin, expected_destination in urls:
        response = requests.get(origin)
        actual_destination = transform(response.url)

        if actual_destination == expected_destination:
            print(
                f"Origin: {origin}\nExpected Destination: \n{expected_destination}\nActual Destination: \n{actual_destination}"
            )
            cprint("Verification successful!\n", "green")
            correct.append(origin)
        else:
            print(f"Status code: {response.status_code}")
            print(
                f"Origin: {origin}\nExpected Destination: \n{expected_destination}\nActual Destination: \n{actual_destination}"
            )
            cprint("Verification failed!\n", "red")
            failed.append(origin)

    print("List of failures: \n" + "\n".join(failed))
    print(f"Correct %: {calculate_percentage(correct, failed)}")

    return len(failed) == 0

The calculate_percentage function is self-explanatory: it calculates the percentage of correct URLs out of all the URLs checked.

To ensure this function returns a boolean value, we use the length of the failures list to determine success. No failures means success. ✅
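
A boolean return value is handy if you ever want to run the check from a CI job or a shell script, because it maps naturally onto a process exit status. The helper below is a small sketch of that mapping; exit_code is a hypothetical name, and the True/False arguments stand in for the result of verify_destinations.

```python
def exit_code(all_passed: bool) -> int:
    """Map the verification result to a conventional exit status (0 = success)."""
    return 0 if all_passed else 1


# Placeholder values standing in for verify_destinations(url_list)
print(exit_code(True))   # 0
print(exit_code(False))  # 1
```

At the end of the script, you could then call sys.exit(exit_code(verify_destinations(url_list))) so that a failed redirect makes the whole run fail.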

📃 Building the List of URL Tuples from Input Files

Now that we have a working function for verifying a list of URL tuples, the next question is how to pass this list. While you could hardcode the list and update it when necessary, reading it from an input file is more maintainable. We will use a CSV file as the source to build this list and take the filename from the script's command-line arguments. To achieve this, we will need to import two new modules:

import sys
import csv

Next, we read the path of the CSV file from the command-line arguments and open it:

if len(sys.argv) < 2:
    print("Usage: python script.py <file_path>")
    sys.exit(1)

file_path = sys.argv[1]

# Retrieve the (origin, destination) URL pairs from the CSV file
url_list = []
with open(file_path, "r", newline="") as file:
    reader = csv.reader(file)

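Depending on your CSV file structure, you may have a header row that you want to omit. To remove it programmatically, drop the first entry of url_list after the with block, once the rows have been read into the list (the reading loop comes in the next step):

url_list = []
with open(file_path, "r", newline="") as file:
    reader = csv.reader(file)
    # Rows are appended to url_list here (see the next step)

# Remove the header tuple, if any rows were read
if url_list:
    url_list.pop(0)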
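
An alternative to popping the first entry afterwards is to skip the header before the loop starts, by advancing the reader once with next. The sketch below shows the idea with an in-memory file so it runs on its own; the sample URLs are made up.

```python
import csv
import io

# In-memory stand-in for a CSV file with a header row; the URLs are made up.
sample = io.StringIO(
    "origin,destination\n"
    "https://example.com/old,https://example.com/new\n"
)

reader = csv.reader(sample)
# Consume the header row; the None default avoids StopIteration on an empty file.
next(reader, None)
url_list = [(row[0], row[1]) for row in reader]

print(url_list)
```

With this variant the header never enters url_list, so there is nothing to remove later.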

Now, let's start iterating through the CSV file to build our list of tuples:

url_list = []
with open(file_path, "r", newline="") as file:
    reader = csv.reader(file)
    for row in reader:
        path = row[0]
        expected_destination = row[1]
        url_list.append((path, expected_destination))

# Remove header
url_list.pop(0)

This assumes that the CSV columns are ordered as "Origin URL" followed by "Destination URL," but you can customize it to match your file structure. 🗂
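
If your file has a header row, another option is to read the columns by name instead of by position, which keeps working even if the column order changes. Here is a minimal sketch using csv.DictReader. The function name load_url_pairs is hypothetical, and the header names "Origin URL" and "Destination URL" are assumptions about your file.

```python
import csv
import io
from typing import List, Tuple


def load_url_pairs(file_obj) -> List[Tuple[str, str]]:
    """Read (origin, destination) pairs by column name rather than position."""
    reader = csv.DictReader(file_obj)
    pairs = []
    for row in reader:
        # "Origin URL" and "Destination URL" are assumed header names.
        origin = (row.get("Origin URL") or "").strip()
        destination = (row.get("Destination URL") or "").strip()
        if origin and destination:  # skip blank or incomplete rows
            pairs.append((origin, destination))
    return pairs


# In-memory sample with the columns deliberately swapped; the URLs are made up.
sample = io.StringIO(
    "Destination URL,Origin URL\n"
    "https://example.com/new,https://example.com/old\n"
    ",\n"
)
print(load_url_pairs(sample))
```

In the script itself you would pass the opened file instead of the sample, for example: with open(file_path, newline="") as file: url_list = load_url_pairs(file). Since DictReader consumes the header row itself, no header removal is needed afterwards.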

With our list of URL tuples built, we can now call the function defined in the previous section with this list as an argument:

if len(sys.argv) < 2:
    print("Usage: python script.py <file_path>")
    sys.exit(1)

file_path = sys.argv[1]

# Retrieve the (origin, destination) URL pairs from the CSV file
url_list = []
with open(file_path, "r", newline="") as file:
    reader = csv.reader(file)
    for row in reader:
        path = row[0]
        expected_destination = row[1]
        url_list.append((path, expected_destination))

# Remove header
url_list.pop(0)

verify_destinations(url_list)

🛠 Conclusion

The complete code would now look like this:

import csv
import requests
import sys

from typing import List, Tuple, NewType
from termcolor import cprint

UrlTuple = NewType("UrlTuple", Tuple[str, str])


def transform(url: str) -> str:
    # Implementation of url transformation
    return url


def calculate_percentage(correct: List, incorrect: List) -> float:
    total = len(correct) + len(incorrect)
    if total == 0:
        return 0.0  # Nothing was checked, so avoid dividing by zero
    return round(len(correct) / total * 100, 2)


def verify_destinations(urls: List[UrlTuple]) -> bool:
    correct = []
    failed = []

    for origin, expected_destination in urls:
        # requests follows redirects by default, so response.url is the final URL
        response = requests.get(origin)
        actual_destination = transform(response.url)

        if actual_destination == expected_destination:
            print(
                f"Origin: {origin}\nExpected Destination: \n{expected_destination}\nActual Destination: \n{actual_destination}"
            )
            cprint("Verification successful!\n", "green")
            correct.append(origin)
        else:
            print(f"Status code: {response.status_code}")
            print(
                f"Origin: {origin}\nExpected Destination: \n{expected_destination}\nActual Destination: \n{actual_destination}"
            )
            cprint("Verification failed!\n", "red")
            failed.append(origin)

    print("List of failures: \n" + "\n".join(failed))
    print(f"Correct %: {calculate_percentage(correct, failed)}")

    # No failures means every redirect landed where it was expected
    return len(failed) == 0


if len(sys.argv) < 2:
    print("Usage: python script.py <file_path>")
    sys.exit(1)

file_path = sys.argv[1]

# Retrieve the (origin, destination) URL pairs from the CSV file
url_list = []
with open(file_path, "r", newline="") as file:
    reader = csv.reader(file)
    for row in reader:
        path = row[0]
        expected_destination = row[1]
        url_list.append((path, expected_destination))

# Remove header
url_list.pop(0)

verify_destinations(url_list)

With all the previous code now together, you can execute it with:

python verify.py redirect.csv

Feel free to explore the full source code in my GitHub repository. You can use it as a reference and make improvements as needed.