📌 Key Takeaways
- Use venv to prevent “dependency hell” when managing complex libraries like playwright or the Thordata Python SDK.
- Use argparse to pass target URLs, proxy credentials, and concurrency settings directly from the command line without editing code.
- Use nohup, screen, or systemd for enterprise-grade uptime.

If you are still clicking a green “Play” button in PyCharm or VS Code to run your data collection scripts, you are browsing, not engineering. Serious data collection infrastructure lives in the terminal. Servers do not have screens, and automated pipelines (CI/CD) run shell commands, not GUI interactions.
Transitioning from an IDE to the Command Line Interface (CLI) allows you to scale from one scraper to thousands. In this guide, we will go beyond python hello.py. We will cover environment management, dynamic argument parsing, security best practices, and background execution strategies required for high-volume scraping.
The first hurdle for many developers is simply invoking the Python interpreter correctly. The command varies significantly by operating system.
| OS | Standard Command | Why? |
|---|---|---|
| Windows | python script.py or py script.py | Windows uses the “py” launcher to automatically select the latest installed version. |
| Mac / Linux | python3 script.py | On Unix systems, python often refers to the obsolete Python 2.x. Always explicitly use python3 to avoid syntax errors. |
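If you are unsure which interpreter a command resolves to, ask it directly before running anything:

# Print the interpreter version (on Windows, try python --version or py --version)
python3 --version
# Show which binary the command resolves to (Mac / Linux)
which python3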
When you install libraries like requests, thordata-sdk, or pandas globally, you risk version conflicts between projects. Scraper A might need requests==2.25 while Scraper B needs requests==2.31. The solution is a Virtual Environment (venv).
# 1. Create the environment (do this once per project)
python3 -m venv venv
# 2. Activate it (Windows)
venv\Scripts\activate
# 2. Activate it (Mac / Linux)
source venv/bin/activate
Once activated, your terminal prompt will show (venv). Any pip install command now installs packages locally to this folder.
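You can verify the isolation, and leave the environment when you are finished, like this:

# pip should now report a path inside your project's venv folder
pip --version
# Drop back to the system interpreter when you are done
deactivate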
For reproducible deployments, always freeze your dependencies. This allows you to deploy your scraper to a cloud server instantly.
# Save dependencies
pip freeze > requirements.txt
# Install dependencies on a new server
pip install -r requirements.txt
Never hardcode API tokens or proxy passwords in your scripts. If you push that code to GitHub, your credentials are compromised. Use python-dotenv to load secrets from a .env file.
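python-dotenv is a third-party package, so install it into your active virtual environment first:

# Install the .env loader inside the venv
pip install python-dotenv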
# .env file content
THORDATA_SCRAPER_TOKEN=your_token_here
THORDATA_RESIDENTIAL_USERNAME=user
THORDATA_RESIDENTIAL_PASSWORD=pass
# script.py
import os
from dotenv import load_dotenv
load_dotenv() # Loads variables from .env
token = os.getenv("THORDATA_SCRAPER_TOKEN")
Hardcoding URLs makes your script rigid. To build a scalable scraping pipeline, your script should accept inputs (like target URL, page limits, or proxy settings) from the command line. This is handled by the argparse library.
Here is a production-ready example using the Thordata SDK. It allows you to specify a query and engine dynamically.
import argparse
import os
from dotenv import load_dotenv
from thordata import ThordataClient
# Load env vars first
load_dotenv()
def main():
    # 1. Setup Argparse
    parser = argparse.ArgumentParser(description="Thordata SERP Scraper CLI")
    parser.add_argument("--query", type=str, required=True, help="Search keyword")
    parser.add_argument("--engine", type=str, default="google", help="Search engine (google, bing)")
    parser.add_argument("--num", type=int, default=10, help="Number of results")
    args = parser.parse_args()

    # 2. Initialize Client
    token = os.getenv("THORDATA_SCRAPER_TOKEN")
    if not token:
        print("Error: THORDATA_SCRAPER_TOKEN missing in .env")
        return

    client = ThordataClient(scraper_token=token)

    # 3. Execute Search
    print(f"Searching for '{args.query}' on {args.engine}...")
    results = client.serp_search(
        query=args.query,
        engine=args.engine,
        num=args.num
    )
    print(f"Found {len(results.get('organic', []))} organic results.")

if __name__ == "__main__":
    main()
Now, you can run specific jobs without touching the code:
python scraper.py --query "Python SDK" --engine google --num 20
python scraper.py --query "Thordata" --engine bing
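Argparse also generates a help screen from those definitions, so anyone on the team can discover the available flags without opening the source:

# Print the auto-generated usage and flag descriptions
python scraper.py --help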
For high-volume jobs, use the AsyncThordataClient from the SDK. You can add a --concurrency argument to your CLI to control how many simultaneous requests to send, allowing you to fine-tune performance based on your system’s resources.
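Below is a rough sketch of how such a flag could be wired up. It assumes AsyncThordataClient exposes an awaitable serp_search with the same parameters as the synchronous client, which you should verify against the SDK documentation; an asyncio.Semaphore caps how many requests run at once.

import argparse
import asyncio
import os
from dotenv import load_dotenv
from thordata import AsyncThordataClient  # assumed async client name; verify in the SDK docs

load_dotenv()

async def run_queries(queries, concurrency):
    # The semaphore limits how many searches are in flight at the same time
    semaphore = asyncio.Semaphore(concurrency)
    client = AsyncThordataClient(scraper_token=os.getenv("THORDATA_SCRAPER_TOKEN"))

    async def fetch(query):
        async with semaphore:
            # Assumes an awaitable serp_search mirroring the synchronous client
            return await client.serp_search(query=query, engine="google", num=10)

    return await asyncio.gather(*(fetch(q) for q in queries))

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--concurrency", type=int, default=5, help="Max simultaneous requests")
    args = parser.parse_args()

    keywords = ["python cli", "argparse tutorial", "nohup vs screen"]
    results = asyncio.run(run_queries(keywords, args.concurrency))
    print(f"Completed {len(results)} searches")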
If you are scraping 100,000 pages, the script might run for 24 hours. If you close your terminal or lose SSH connection, the script dies. Here is how to keep it alive.
nohup (No Hang Up) tells the process to ignore the hangup signal sent when you logout. We also redirect output streams so logs are saved to a file instead of the terminal.
nohup python3 scraper.py --query "Big Data" > scraper.log 2>&1 &
- > scraper.log: Saves standard output (print statements) to this file.
- 2>&1: Redirects errors (stderr) to the same log file.
- &: Runs the process in the background immediately, giving you back control of the terminal.

screen (or tmux) creates a virtual terminal session that persists on the server. You can detach from it, go home, sleep, and reattach the next day.
# 1. Start a new session named 'scraper'
screen -S scraper
# 2. Run your script inside the session
python3 scraper.py
# 3. Detach (Press Ctrl+A, then press D)
# ... safe to close terminal window ...
# 4. Reattach later to check progress
screen -r scraper
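Whichever method you choose, you will want to check on a detached job without interrupting it. The file name below matches the redirect used in the nohup example:

# Follow the log file as the background job writes to it (Ctrl+C stops tailing, not the job)
tail -f scraper.log
# Confirm the scraper process is still alive
ps aux | grep scraper.py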
Running scripts in the terminal is the first step towards automation. However, a script running in the background is useless if it gets IP-blocked after 5 minutes.
This is where infrastructure integration is critical. By routing your scripts through the Thordata Residential Proxy network (using the credentials already loaded from your .env file), you ensure that your background tasks rotate IPs automatically. The SDK handles connection pooling and retries, so your long-running nohup jobs don’t fail due to network interruptions.
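As a minimal sketch of that integration, here is how the residential credentials from the .env file might be wired into a plain requests session. The gateway host and port below are placeholders, not real Thordata endpoints; substitute the values from your dashboard.

import os
import requests
from dotenv import load_dotenv

load_dotenv()

# Placeholder gateway address - replace with the host and port from your Thordata dashboard
PROXY_GATEWAY = "proxy.example.com:9999"

username = os.getenv("THORDATA_RESIDENTIAL_USERNAME")
password = os.getenv("THORDATA_RESIDENTIAL_PASSWORD")

proxies = {
    "http": f"http://{username}:{password}@{PROXY_GATEWAY}",
    "https": f"http://{username}:{password}@{PROXY_GATEWAY}",
}

# Each request now exits through the residential network instead of your server's IP
response = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=30)
print(response.json())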
The terminal is the control center for data engineers. By mastering virtual environments, you ensure stability. By using argparse, you make your code reusable. And by utilizing background execution tools like nohup, you turn scripts into persistent data pipelines. Combine this with the robust Thordata SDK, and you have an enterprise-grade scraping setup.
Frequently asked questions
Why is ‘python’ not recognized in my terminal?
This usually means Python is not added to your system’s PATH variable. On Windows, check ‘Add Python to PATH’ during installation. On Mac/Linux, try using ‘python3’ instead.
How do I keep my script running after I close the terminal?
Use ‘nohup python script.py &’ to run it in the background, or use ‘screen’/’tmux’ to create a detachable session. This is essential for long-running web scrapers.
Why should I use a virtual environment?
Virtual environments isolate your project’s dependencies. This prevents conflicts where Project A needs Library v1.0 but Project B needs Library v2.0. It is a best practice for stability.
About the author
Kael is a Senior Technical Copywriter at Thordata. He works closely with data engineers to document best practices for bypassing anti-bot protections. He specializes in explaining complex infrastructure concepts like residential proxies and TLS fingerprinting to developer audiences. All code examples in this article have been tested in real-world scraping scenarios.
The thordata Blog offers all its content in its original form and solely for informational purposes. We make no guarantees regarding the information found on the thordata Blog or any external sites it may direct you to. Seek legal counsel and thoroughly review the specific terms of service of any website before engaging in any scraping activity, or obtain a scraping permit if required.