Scrapy is one of Python's most powerful web scraping frameworks. Its middleware architecture makes it a natural fit for integrating rotating proxies into large-scale crawling jobs. Unlike browser-based tools, Scrapy operates at the HTTP level, which means lower resource usage, higher throughput, and more precise control over request routing.
This guide walks through configuring proxy middleware in Scrapy from scratch. You will learn how to build a custom middleware, rotate IPs per request, handle authentication, and integrate with ResProxy's rotating endpoint for production-grade scraping.

Why Scrapy Needs Proxy Middleware
Scrapy can send thousands of concurrent requests, which is exactly why you need proxies. Without IP rotation, your scraper will trigger rate limits and bans within seconds on most commercial websites. Scrapy's downloader middleware system lets you inject a proxy into every outgoing request transparently, so your spiders do not need any proxy-related code.
By using rotating residential proxies, each request gets routed through a different real IP address, making your scraper look like thousands of different users browsing the site organically.
Prerequisites
You need Python 3.9 or later and Scrapy installed:
```bash
pip install scrapy
```
You also need proxy credentials from your provider. If you are using ResProxy, grab your username, password, and gateway address from the dashboard.
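Before wiring anything into Scrapy, it can help to sanity-check the credential string you will be using. A minimal sketch, assuming placeholder values for the username, password, and gateway (swap in your real credentials):

```python
# Build the proxy URL in the form Scrapy expects.
# The values below are placeholders, not real credentials.
USER = "your_username"
PASSWORD = "your_password"
GATEWAY = "gate.resproxy.io:7777"

proxy_url = f"http://{USER}:{PASSWORD}@{GATEWAY}"
proxies = {"http": proxy_url, "https": proxy_url}

print(proxy_url)
# To verify end to end outside Scrapy, you could run:
#   import requests
#   print(requests.get("https://httpbin.org/ip", proxies=proxies, timeout=10).text)
```

If the httpbin check returns an IP different from your own, the credentials and gateway are good to go.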
Method 1: Simple Proxy via Meta
The quickest way to add a proxy to a Scrapy request is through the meta parameter:
```python
import scrapy


class SimpleProxySpider(scrapy.Spider):
    name = "simple_proxy"
    start_urls = ["https://httpbin.org/ip"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                meta={"proxy": "http://username:password@gate.resproxy.io:7777"},
            )

    def parse(self, response):
        self.logger.info(f"Response from IP: {response.text}")
```
This works for simple spiders, but it requires adding the meta parameter to every request. For real projects, a middleware is far more maintainable.

Method 2: Custom Proxy Middleware
A custom downloader middleware automatically injects proxy settings into every request. Add the following to the middlewares.py file in your Scrapy project (scrapy startproject creates one for you):
```python
import base64
import logging

logger = logging.getLogger(__name__)


class ResProxyMiddleware:
    """Downloader middleware that routes all requests through a rotating proxy."""

    def __init__(self, proxy_url, proxy_user, proxy_pass):
        self.proxy_url = proxy_url
        self.proxy_auth = "Basic " + base64.b64encode(
            f"{proxy_user}:{proxy_pass}".encode()
        ).decode()

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            proxy_url=crawler.settings.get("RESPROXY_URL", "http://gate.resproxy.io:7777"),
            proxy_user=crawler.settings.get("RESPROXY_USER", ""),
            proxy_pass=crawler.settings.get("RESPROXY_PASS", ""),
        )

    def process_request(self, request, spider):
        request.meta["proxy"] = self.proxy_url
        request.headers["Proxy-Authorization"] = self.proxy_auth
        logger.debug(f"Proxying {request.url} through {self.proxy_url}")

    def process_response(self, request, response, spider):
        if response.status == 407:
            logger.error("Proxy authentication failed, check credentials")
        return response

    def process_exception(self, request, exception, spider):
        logger.warning(f"Proxy error for {request.url}: {exception}")
        return None  # Let Scrapy retry the request
```
Now enable the middleware in your settings.py:
```python
# settings.py
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.ResProxyMiddleware": 350,
}

# Proxy configuration
RESPROXY_URL = "http://gate.resproxy.io:7777"
RESPROXY_USER = "your_username"
RESPROXY_PASS = "your_password"

# Recommended Scrapy settings for proxy usage
CONCURRENT_REQUESTS = 16
CONCURRENT_REQUESTS_PER_DOMAIN = 8
DOWNLOAD_DELAY = 1
DOWNLOAD_TIMEOUT = 30
RETRY_TIMES = 3
RETRY_HTTP_CODES = [407, 429, 500, 502, 503]

# Ignore robots.txt (set to True if you want to respect it)
ROBOTSTXT_OBEY = False

# Set a realistic browser user agent
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
```
With this setup, every request from every spider in your project automatically goes through the rotating proxy. No changes needed in your spider code.
Method 3: Rotating Proxy Middleware with Failover
For production workloads, you want a middleware that can handle proxy failures and rotate through multiple endpoints:
```python
import base64
import logging
import random

logger = logging.getLogger(__name__)


class RotatingProxyMiddleware:
    """Advanced middleware with multiple proxy endpoints and failover."""

    def __init__(self, proxy_list):
        self.proxy_list = proxy_list
        self.failed_proxies = set()

    @classmethod
    def from_crawler(cls, crawler):
        proxy_list = crawler.settings.get("PROXY_LIST", [
            {
                "url": "http://gate.resproxy.io:7777",
                "user": "your_username",
                "pass": "your_password",
            },
        ])
        return cls(proxy_list)

    def process_request(self, request, spider):
        available = [
            p for p in self.proxy_list if p["url"] not in self.failed_proxies
        ]
        if not available:
            # Every endpoint has failed recently; reset and try them all again
            self.failed_proxies.clear()
            available = self.proxy_list

        proxy = random.choice(available)
        request.meta["proxy"] = proxy["url"]
        credentials = base64.b64encode(
            f'{proxy["user"]}:{proxy["pass"]}'.encode()
        ).decode()
        request.headers["Proxy-Authorization"] = f"Basic {credentials}"

    def process_response(self, request, response, spider):
        if response.status in (407, 502, 503):
            proxy_url = request.meta.get("proxy", "")
            self.failed_proxies.add(proxy_url)
            logger.warning(f"Proxy {proxy_url} returned {response.status}, marking as failed")
            return request.replace(dont_filter=True)
        return response

    def process_exception(self, request, exception, spider):
        proxy_url = request.meta.get("proxy", "")
        self.failed_proxies.add(proxy_url)
        logger.warning(f"Proxy {proxy_url} raised exception: {exception}")
        return request.replace(dont_filter=True)
```
This middleware tracks failing proxies and avoids them on subsequent requests, then resets the failed list when all proxies have been tried.
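Like the simpler version, this middleware still has to be registered in settings.py. A sketch, assuming the class lives in myproject/middlewares.py and that PROXY_LIST uses the same dict shape as the default shown in from_crawler above:

```python
# settings.py
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.RotatingProxyMiddleware": 350,
}

# One entry per gateway endpoint; shape matches the middleware's default
PROXY_LIST = [
    {"url": "http://gate.resproxy.io:7777", "user": "your_username", "pass": "your_password"},
    {"url": "http://gate.resproxy.io:7778", "user": "your_username", "pass": "your_password"},
]
```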

Integrating with scrapy-rotating-proxies
If you prefer a battle-tested third-party solution, the scrapy-rotating-proxies package handles rotation, banning, and cooldown logic out of the box:
```bash
pip install scrapy-rotating-proxies
```
```python
# settings.py
DOWNLOADER_MIDDLEWARES = {
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
}

ROTATING_PROXY_LIST = [
    "http://user:pass@gate.resproxy.io:7777",
    "http://user:pass@gate.resproxy.io:7778",
    "http://user:pass@gate.resproxy.io:7779",
]

ROTATING_PROXY_PAGE_RETRY_TIMES = 5
ROTATING_PROXY_BACKOFF_BASE = 300
```
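For longer proxy lists, the package also supports loading endpoints from a file instead of an inline setting, via its ROTATING_PROXY_LIST_PATH option (one proxy URL per line):

```python
# settings.py
# proxies.txt contains one proxy URL per line, e.g.:
#   http://user:pass@gate.resproxy.io:7777
ROTATING_PROXY_LIST_PATH = "proxies.txt"
```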
Handling Common Errors
Here are the most common proxy-related errors in Scrapy and how to handle them:
| Error | Cause | Solution |
|---|---|---|
| 407 Proxy Auth Required | Wrong credentials | Check RESPROXY_USER and RESPROXY_PASS |
| 429 Too Many Requests | Rate limited | Increase DOWNLOAD_DELAY |
| 503 Service Unavailable | Proxy overloaded | Add retry logic or use multiple endpoints |
| Timeout | Slow proxy or target | Increase DOWNLOAD_TIMEOUT |
| ConnectionRefused | Proxy down | Use failover middleware |
Optimizing Performance
Scrapy's async architecture can push thousands of requests per minute through rotating proxies. Here are the key settings to tune:
```python
# settings.py — optimized for proxy usage

# Concurrent connections
CONCURRENT_REQUESTS = 32
CONCURRENT_REQUESTS_PER_DOMAIN = 16

# Delay between requests to the same domain
DOWNLOAD_DELAY = 0.5
RANDOMIZE_DOWNLOAD_DELAY = True  # Randomizes delay: 0.5x to 1.5x of DOWNLOAD_DELAY

# Timeout and retries
DOWNLOAD_TIMEOUT = 30
RETRY_TIMES = 3

# AutoThrottle (recommended with proxies)
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1
AUTOTHROTTLE_MAX_DELAY = 10
AUTOTHROTTLE_TARGET_CONCURRENCY = 8.0

# Enable HTTP caching to avoid re-fetching unchanged pages
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 3600
```
The AutoThrottle extension is particularly useful with proxies because it automatically adjusts request speed based on server response times.
Complete Spider Example
Here is a full spider that scrapes product data using the proxy middleware:
```python
import scrapy


class ProductSpider(scrapy.Spider):
    name = "products"
    start_urls = ["https://example.com/products?page=1"]

    custom_settings = {
        "CONCURRENT_REQUESTS": 16,
        "DOWNLOAD_DELAY": 1,
    }

    def parse(self, response):
        for product in response.css("div.product-card"):
            yield {
                "name": product.css("h2::text").get(),
                "price": product.css("span.price::text").get(),
                "url": response.urljoin(product.css("a::attr(href)").get()),
            }

        next_page = response.css("a.next-page::attr(href)").get()
        if next_page:
            yield response.follow(next_page, self.parse)
```
Notice how the spider has zero proxy-related code. The middleware handles everything transparently.
Best Practices
- Use middleware over meta — Keep proxy logic separate from spider logic
- Enable AutoThrottle — Let Scrapy adapt to target site speed automatically
- Set reasonable concurrency — 16-32 concurrent requests is a good starting point
- Monitor ban rates — If more than 10 percent of requests fail, reduce concurrency or increase delays
- Use residential proxies for protected sites — Rotating residential proxies have the highest success rates
- Cache responses — Avoid re-fetching pages that have not changed
Getting Started
Create your ResProxy account through the getting started guide, grab your credentials, and paste them into your Scrapy settings. For the full Scrapy documentation including middleware architecture details, visit docs.scrapy.org.
You might also want to check our Python web scraping guide for a broader overview of scraping tools and techniques.
Editorial Team
The ResProxy editorial team combines expertise in proxy technology, web scraping, network infrastructure, and online privacy to deliver actionable guides and industry insights.