📌 Key Takeaways
- Save your trained models (.save()) so you can reuse them later in production without retraining.
- AutoScraper runs on requests. We show you how to inject Thordata Residential Proxies via request_args to avoid IP bans.

Imagine scraping a website without ever pressing F12 to inspect the source code. Imagine just telling Python: “I want the data that looks like ‘iPhone 15’” and having the script figure out the CSS selectors automatically.
This is the promise of AutoScraper, a “Smart, Automatic, Fast and Lightweight Web Scraper for Python.” It uses a form of fuzzy matching to learn scraping rules from the examples you provide.
While many blogs cover the “Hello World” of AutoScraper, few show how to use it in the real world—where websites ban IPs and structures get complex. In this guide, I’ll take you from the basics to an advanced setup with proxy integration.
First, install the library. It is lightweight and depends on requests and bs4.
pip install autoscraper
Let’s say we want to scrape book titles and prices from a test site. Instead of hunting down a selector like div.product_pod h3 a, we just pick values we can see on the page, a title like “A Light in the Attic” and a price like “£51.77”, and hand them to AutoScraper.
from autoscraper import AutoScraper
url = 'https://books.toscrape.com/'
# We want to scrape titles and prices. We give one example of each.
wanted_list = ["A Light in the Attic", "£51.77"]
scraper = AutoScraper()
# The build() function learns the rules
result = scraper.build(url, wanted_list)
print(result)
# Output: ['A Light in the Attic', 'Tipping the Velvet', ..., '£51.77', '£53.74', ...]
The output above is a flat list. In a real project, you want a dictionary where titles are linked to prices. We can ask AutoScraper to group the rules and then save the model for later use.
# Get the rules learned
rules = scraper.get_result_similar(url, grouped=True)
print(rules.keys())
# Output: dict_keys(['rule_1ab2', 'rule_8x9y'])
# Alias the rules and save the model
scraper.set_rule_aliases({'rule_1ab2': 'Title', 'rule_8x9y': 'Price'})
scraper.keep_rules(['rule_1ab2', 'rule_8x9y'])
scraper.save('books_model')
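A side note: if you’d rather skip the manual aliasing step, build() also accepts a wanted_dict that names the rules at training time. A minimal sketch, assuming wanted_dict behaves as in current AutoScraper releases:
from autoscraper import AutoScraper

url = 'https://books.toscrape.com/'
# Mapping an alias straight to its example values means the learned
# rules are grouped under 'Title' and 'Price' from the start.
wanted_dict = {'Title': ['A Light in the Attic'], 'Price': ['£51.77']}

scraper = AutoScraper()
scraper.build(url, wanted_dict=wanted_dict)
print(scraper.get_result_similar(url, group_by_alias=True)['Title'][:3])
scraper.save('books_model')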
This is where most AutoScraper tutorials fail. If you try to run your saved model on Amazon or Google, you will be blocked immediately. AutoScraper uses the requests library internally, which means we can pass arguments to it via request_args.
By routing traffic through Thordata’s residential gateway, your AutoScraper script becomes virtually undetectable. The proxy handles IP rotation automatically.
from autoscraper import AutoScraper
# Load the trained model
scraper = AutoScraper()
scraper.load('books_model')
# Thordata Proxy Configuration (Username:Password)
proxies = {
"http": "http://USER:PASS@gate.thordata.com:12345",
"https": "http://USER:PASS@gate.thordata.com:12345",
}
# Scrape a new page using the proxy
target_url = 'https://books.toscrape.com/catalogue/page-2.html'
results = scraper.get_result_similar(
target_url,
group_by_alias=True,
request_args={'proxies': proxies, 'timeout': 10}
)
print(results['Title'][0])
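Note that group_by_alias=True hands you parallel lists, not row objects. A minimal sketch for stitching them into records, assuming both rules matched the same elements in the same order (true on a simple grid page like this one):
# results is {'Title': [...], 'Price': [...]}; zip the lists into row dicts.
books = [
    {'title': title, 'price': price}
    for title, price in zip(results['Title'], results['Price'])
]
print(books[0])  # the first title/price pair from page 2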
As a data engineer, picking the right tool is half the battle. Here is my verdict on when AutoScraper shines and when it falls short:
| Use AutoScraper When… | Avoid AutoScraper When… |
|---|---|
| You need a quick prototype (POC) in 5 minutes. | You are scraping a highly dynamic SPA (React/Vue). |
| The site structure is simple and consistent. | You need complex pagination logic or login handling. |
| You want to avoid learning XPath/CSS selectors. | You need high-performance concurrency (Scrapy is better). |
AutoScraper is a brilliant tool for “lazy” scraping. It lowers the barrier to entry significantly. However, for enterprise-grade data extraction, it must be paired with robust infrastructure. By adding Thordata Residential Proxies to the mix, you turn this lightweight tool into a capable scraper for moderate workloads.
Frequently asked questions
How does AutoScraper work?
AutoScraper learns parsing rules by comparing your input (e.g., a specific product title) against the page’s HTML structure. It automatically detects the XPaths/CSS selectors needed to find similar items.
Can I use proxies with AutoScraper?
Yes, AutoScraper is built on top of the ‘requests’ library. You can pass a proxy dictionary to the ‘request_args’ parameter in the ‘build’ or ‘get_result’ methods.
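For example, here is a minimal sketch of training through a proxy at build() time; the gateway address and credentials are placeholders:
from autoscraper import AutoScraper

proxies = {
    "http": "http://USER:PASS@gate.thordata.com:12345",
    "https": "http://USER:PASS@gate.thordata.com:12345",
}
scraper = AutoScraper()
# request_args is forwarded to requests, so proxies, headers, and
# timeouts work here exactly as they do in a plain requests.get() call.
scraper.build(
    'https://books.toscrape.com/',
    wanted_list=["A Light in the Attic"],
    request_args={'proxies': proxies, 'timeout': 10},
)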
Is AutoScraper better than BeautifulSoup?
It is faster for prototyping because you don’t need to inspect HTML code. However, for production-grade scraping where site structures change slightly, BeautifulSoup or Scrapy offers more reliability.
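For contrast, here is a minimal BeautifulSoup sketch of the same title scrape. The selector must be discovered by hand in the browser’s dev tools, which is exactly the step AutoScraper automates:
import requests
from bs4 import BeautifulSoup

html = requests.get('https://books.toscrape.com/').text
soup = BeautifulSoup(html, 'html.parser')
# The title attribute holds the full name; the visible link text is truncated.
titles = [a['title'] for a in soup.select('article.product_pod h3 a')]
print(titles[:3])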
About the author
Kael is a Senior Technical Copywriter at Thordata. He works closely with data engineers to document best practices for bypassing anti-bot protections. He specializes in explaining complex infrastructure concepts like residential proxies and TLS fingerprinting to developer audiences. All code examples in this article have been tested in real-world scraping scenarios.
The thordata Blog provides all of its content in its original form and for informational purposes only. We make no guarantees regarding the information found on the thordata Blog or on any external sites it links to. Before engaging in any scraping activity, seek legal counsel and thoroughly review the terms of service of the website in question, or obtain a scraping permit if required.