📌 Key Takeaways
- Use venv to prevent “dependency hell” when managing complex libraries like playwright or the Thordata Python SDK.
- Use argparse to pass target URLs, proxy credentials, and concurrency settings directly from the command line without editing code.
- Use nohup, screen, or systemd for enterprise-grade uptime.

If you are still clicking a green “Play” button in PyCharm or VS Code to run your data collection scripts, you are browsing, not engineering. Serious data collection infrastructure lives in the terminal. Servers do not have screens, and automated pipelines (CI/CD) run shell commands, not GUI interactions.
Transitioning from an IDE to the Command Line Interface (CLI) allows you to scale from one scraper to thousands. In this guide, we will go beyond python hello.py. We will cover environment management, dynamic argument parsing, security best practices, and background execution strategies required for high-volume scraping.
The first hurdle for many developers is simply invoking the Python interpreter correctly. The command varies significantly by operating system.
| OS | Standard Command | Why? |
|---|---|---|
| Windows | python script.py or py script.py | Windows uses the “py” launcher to automatically select the latest installed version. |
| Mac / Linux | python3 script.py | On Unix systems, python often refers to the obsolete Python 2.x. Always explicitly use python3 to avoid syntax errors. |
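If you are unsure which interpreter a command resolves to, ask it directly before running anything:

# Print the interpreter version (on Windows, try python --version or py --version)
python3 --version
# Show which binary the command resolves to (Mac / Linux)
which python3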
When you install libraries like requests, thordata-sdk, or pandas globally, you risk version conflicts between projects. Scraper A might need requests==2.25 while Scraper B needs requests==2.31. The solution is a Virtual Environment (venv).
# 1. Create the environment (do this once per project)
python3 -m venv venv
# 2. Activate it (Windows)
venv\Scripts\activate
# 2. Activate it (Mac / Linux)
source venv/bin/activate
Once activated, your terminal prompt will show (venv). Any pip install command now installs packages locally to this folder.
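You can verify the isolation, and leave the environment when you are finished, like this:

# pip should now report a path inside your project's venv folder
pip --version
# Drop back to the system interpreter when you are done
deactivate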
For reproducible deployments, always freeze your dependencies. This allows you to deploy your scraper to a cloud server instantly.
# Save dependencies
pip freeze > requirements.txt
# Install dependencies on a new server
pip install -r requirements.txt
Never hardcode API tokens or proxy passwords in your scripts. If you push that code to GitHub, your credentials are compromised. Use python-dotenv to load secrets from a .env file.
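python-dotenv is a third-party package, so install it into your active virtual environment first:

# Install the .env loader inside the venv
pip install python-dotenv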
# .env file content
THORDATA_SCRAPER_TOKEN=your_token_here
THORDATA_RESIDENTIAL_USERNAME=user
THORDATA_RESIDENTIAL_PASSWORD=pass
# script.py
import os
from dotenv import load_dotenv
load_dotenv() # Loads variables from .env
token = os.getenv("THORDATA_SCRAPER_TOKEN")
Hardcoding URLs makes your script rigid. To build a scalable scraping pipeline, your script should accept inputs (like target URL, page limits, or proxy settings) from the command line. This is handled by the argparse library.
Here is a production-ready example using the Thordata SDK. It allows you to specify a query and engine dynamically.
import argparse
import os
from dotenv import load_dotenv
from thordata import ThordataClient
# Load env vars first
load_dotenv()
def main():
    # 1. Setup Argparse
    parser = argparse.ArgumentParser(description="Thordata SERP Scraper CLI")
    parser.add_argument("--query", type=str, required=True, help="Search keyword")
    parser.add_argument("--engine", type=str, default="google", help="Search engine (google, bing)")
    parser.add_argument("--num", type=int, default=10, help="Number of results")
    args = parser.parse_args()

    # 2. Initialize Client
    token = os.getenv("THORDATA_SCRAPER_TOKEN")
    if not token:
        print("Error: THORDATA_SCRAPER_TOKEN missing in .env")
        return

    client = ThordataClient(scraper_token=token)

    # 3. Execute Search
    print(f"Searching for '{args.query}' on {args.engine}...")
    results = client.serp_search(
        query=args.query,
        engine=args.engine,
        num=args.num
    )
    print(f"Found {len(results.get('organic', []))} organic results.")

if __name__ == "__main__":
    main()
Now, you can run specific jobs without touching the code:
python scraper.py --query "Python SDK" --engine google --num 20
python scraper.py --query "Thordata" --engine bing
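Argparse also generates a help screen from those definitions, so anyone on the team can discover the available flags without opening the source:

# Print the auto-generated usage and flag descriptions
python scraper.py --help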
For high-volume jobs, use the AsyncThordataClient from the SDK. You can add a --concurrency argument to your CLI to control how many simultaneous requests to send, allowing you to fine-tune performance based on your system’s resources.
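Below is a rough sketch of how such a flag could be wired up. It assumes AsyncThordataClient exposes an awaitable serp_search with the same parameters as the synchronous client, which you should verify against the SDK documentation; an asyncio.Semaphore caps how many requests run at once.

import argparse
import asyncio
import os
from dotenv import load_dotenv
from thordata import AsyncThordataClient  # assumed async client name; verify in the SDK docs

load_dotenv()

async def run_queries(queries, concurrency):
    # The semaphore limits how many searches are in flight at the same time
    semaphore = asyncio.Semaphore(concurrency)
    client = AsyncThordataClient(scraper_token=os.getenv("THORDATA_SCRAPER_TOKEN"))

    async def fetch(query):
        async with semaphore:
            # Assumes an awaitable serp_search mirroring the synchronous client
            return await client.serp_search(query=query, engine="google", num=10)

    return await asyncio.gather(*(fetch(q) for q in queries))

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--concurrency", type=int, default=5, help="Max simultaneous requests")
    args = parser.parse_args()

    keywords = ["python cli", "argparse tutorial", "nohup vs screen"]
    results = asyncio.run(run_queries(keywords, args.concurrency))
    print(f"Completed {len(results)} searches")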
If you are scraping 100,000 pages, the script might run for 24 hours. If you close your terminal or lose SSH connection, the script dies. Here is how to keep it alive.
nohup (No Hang Up) tells the process to ignore the hangup signal sent when you logout. We also redirect output streams so logs are saved to a file instead of the terminal.
nohup python3 scraper.py --query "Big Data" > scraper.log 2>&1 &
- > scraper.log: Saves standard output (print statements) to this file.
- 2>&1: Redirects errors (stderr) to the same log file.
- &: Runs the process in the background immediately, giving you back control of the terminal.

screen (or tmux) creates a virtual terminal session that persists on the server. You can detach from it, go home, sleep, and reattach the next day.
# 1. Start a new session named 'scraper'
screen -S scraper
# 2. Run your script inside the session
python3 scraper.py
# 3. Detach (Press Ctrl+A, then press D)
# ... safe to close terminal window ...
# 4. Reattach later to check progress
screen -r scraper
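Whichever method you choose, you will want to check on a detached job without interrupting it. The file name below matches the redirect used in the nohup example:

# Follow the log file as the background job writes to it (Ctrl+C stops tailing, not the job)
tail -f scraper.log
# Confirm the scraper process is still alive
ps aux | grep scraper.py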
Running scripts in the terminal is the first step towards automation. However, a script running in the background is useless if it gets IP-blocked after 5 minutes.
This is where infrastructure integration is critical. By routing your scripts through the Thordata Residential Proxy network (using the credentials already loaded from your .env file), you ensure that your background tasks rotate IPs automatically. The SDK handles connection pooling and retries, so your long-running nohup jobs don’t fail due to network interruptions.
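As a minimal sketch of that integration, here is how the residential credentials from the .env file might be wired into a plain requests session. The gateway host and port below are placeholders, not real Thordata endpoints; substitute the values from your dashboard.

import os
import requests
from dotenv import load_dotenv

load_dotenv()

# Placeholder gateway address - replace with the host and port from your Thordata dashboard
PROXY_GATEWAY = "proxy.example.com:9999"

username = os.getenv("THORDATA_RESIDENTIAL_USERNAME")
password = os.getenv("THORDATA_RESIDENTIAL_PASSWORD")

proxies = {
    "http": f"http://{username}:{password}@{PROXY_GATEWAY}",
    "https": f"http://{username}:{password}@{PROXY_GATEWAY}",
}

# Each request now exits through the residential network instead of your server's IP
response = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=30)
print(response.json())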
The terminal is the control center for data engineers. By mastering virtual environments, you ensure stability. By using argparse, you make your code reusable. And by utilizing background execution tools like nohup, you turn scripts into persistent data pipelines. Combine this with the robust Thordata SDK, and you have an enterprise-grade scraping setup.
Frequently asked questions
Why is ‘python’ not recognized in my terminal?
This usually means Python is not added to your system’s PATH variable. On Windows, check ‘Add Python to PATH’ during installation. On Mac/Linux, try using ‘python3’ instead.
How do I keep my script running after I close the terminal?
Use ‘nohup python script.py &’ to run it in the background, or use ‘screen’/’tmux’ to create a detachable session. This is essential for long-running web scrapers.
Why should I use a virtual environment?
Virtual environments isolate your project’s dependencies. This prevents conflicts where Project A needs Library v1.0 but Project B needs Library v2.0. It is a best practice for stability.
About the author
Kael is a Senior Technical Copywriter at Thordata. He works closely with data engineers to document best practices for bypassing anti-bot protections. He specializes in explaining complex infrastructure concepts like residential proxies and TLS fingerprinting to developer audiences. All code examples in this article have been tested in real-world scraping scenarios.
The thordata Blog offers all its content in its original form and solely for informational purposes. We make no guarantees regarding the information found on the thordata Blog or any external sites it may direct you to. Seek legal counsel and thoroughly review the specific terms of service of any website before engaging in any scraping activity, or obtain a scraping permit if required.