The Phase 4 report, SWE_520_Project-Phase-4.pdf, is in the root directory.
PyTestPilot is an empirical evaluation framework that generates Python unit tests using open-weight Large Language Models (LLMs) and compares them against traditional property-based and random testing baselines.
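To make the random testing baseline concrete, here is a minimal sketch. It is not PyTestPilot's actual baseline implementation. The local `first` function is a stand-in that mirrors what `toolz.itertoolz.first` does (return the first element of a sequence), and `random_inputs` and `run_random_baseline` are hypothetical helper names introduced only for this illustration.

```python
import random


def first(seq):
    """Stand-in for the function under test: return the first element."""
    return next(iter(seq))


def random_inputs(rng, trials, max_len=10):
    """Yield non-empty lists of random integers (hypothetical generator)."""
    for _ in range(trials):
        length = rng.randint(1, max_len)
        yield [rng.randint(-1000, 1000) for _ in range(length)]


def run_random_baseline(func, trials=200, seed=0):
    """Call func on random inputs and count how many satisfy the oracle."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    passed = 0
    for seq in random_inputs(rng, trials):
        # Oracle: the result must equal the element at index 0.
        if func(seq) == seq[0]:
            passed += 1
    return passed, trials


if __name__ == "__main__":
    passed, total = run_random_baseline(first)
    print(f"{passed}/{total} random trials passed")
```

A property-based baseline follows the same shape, except that a library such as Hypothesis generates and shrinks the inputs instead of a hand-written generator.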
This project is intended to be run within an isolated Python virtual environment.
# Create a virtual environment named .venv
python3 -m venv .venv
# Activate the virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
# .venv\Scripts\activate
Ensure you are in the project root (where this README is located) and your virtual environment is activated, then run:
pip install -r requirements.txt
Copy the example environment file to .env:
cp .env.example .env
By default, the .env file uses $OPENROUTER_API_KEY, which will pull the API key from your system's environment variables if you already have it exported.
Alternatively, you can edit the .env file and paste your API key directly:
OPENROUTER_API_KEY=sk-or-v1-...
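The two options above (a `$OPENROUTER_API_KEY` placeholder or a pasted key) can be sketched as follows. This README does not state how PyTestPilot actually loads the .env file, so treat this as an assumption-laden illustration using only the standard library; `parse_env_text` is a hypothetical helper, not part of the framework.

```python
import os


def parse_env_text(text, environ=None):
    """Parse KEY=VALUE lines, expanding $NAME placeholders from environ.

    Hypothetical helper: a simplified stand-in for a real .env loader.
    """
    environ = os.environ if environ is None else environ
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        value = value.strip().strip('"').strip("'")
        if value.startswith("$"):
            # "$NAME" or "${NAME}" means: read NAME from the environment.
            name = value[1:].strip("{}")
            value = environ.get(name, "")
        values[key.strip()] = value
    return values


if __name__ == "__main__":
    sample = "OPENROUTER_API_KEY=$OPENROUTER_API_KEY\n"
    key = parse_env_text(sample)["OPENROUTER_API_KEY"]
    # Report only whether a key was found; never print the key itself.
    print("key found" if key else "key missing: export it or paste it into .env")
```

If the sketch reports a missing key, either export `OPENROUTER_API_KEY` in your shell or paste the key into .env as shown above.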
PyTestPilot uses an evaluate command that runs the full end-to-end pipeline: AST-based context extraction, test generation, isolated sandbox execution, and metrics aggregation.
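As an illustration of the first step, AST-based context extraction, the sketch below pulls a function's name, parameters, docstring, and source text out of a module using Python's standard `ast` module. It is a simplified assumption of what such a step might collect, not the code in src/ast_parser/; `extract_context` and the `SOURCE` sample are hypothetical.

```python
import ast

# Sample module text; the function mirrors toolz's `first` for illustration.
SOURCE = '''
def first(seq):
    """Return the first element of a sequence."""
    return next(iter(seq))
'''


def extract_context(source, function_name):
    """Return name, parameters, docstring, and source for one function.

    Hypothetical helper: returns None when the function is not found.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == function_name:
            return {
                "name": node.name,
                "params": [arg.arg for arg in node.args.args],
                "docstring": ast.get_docstring(node),
                "source": ast.get_source_segment(source, node),
            }
    return None


if __name__ == "__main__":
    context = extract_context(SOURCE, "first")
    print(context["name"], context["params"])
    print(context["docstring"])
```

Context of this kind is what a prompt for the LLM can be assembled from, since it gives the model the signature and documented intent of the function under test.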
To run the full pipeline on one of the included repositories:
python src/cli.py evaluate repos/toolz/ --methods llm-deepseek,hypothesis,random
After the pipeline completes, you can find the aggregated metrics and a human-readable report in the evaluation_results/ directory:
evaluation_results/report.md: A summary of pass rates and coverage.
evaluation_results/project_evaluation.json: Detailed raw metrics for all functions.
To quickly verify the setup on a single function:
python src/cli.py evaluate repos/toolz/toolz/itertoolz.py --function-id first --methods llm-deepseek
Below is a tree view of the _DISTRO code layout and the role of each directory:
_DISTRO/
├── src/ # Core framework source code
│ ├── adaptive_refiner/ # Pipeline Stage 5: Test refinement
│ ├── ast_parser/ # Pipeline Stage 1: AST context extraction
│ ├── data_structures/ # Core data models
│ ├── llm_client/ # Pipeline Stage 3: LLM interaction
│ ├── metrics_aggregator/ # Pipeline Stage 6: Metrics calculation
│ ├── prompt_engineer/ # Pipeline Stage 2: Prompt assembly
│ ├── report_generator/ # Pipeline Stage 7: Reporting
│ ├── test_executor/ # Pipeline Stage 4: Test execution & baselines
│ ├── utils/ # Utilities like logging
│ └── cli.py # Command Line Interface entry point
├── repos/ # Target repositories for empirical evaluation
├── scripts/ # Auxiliary helper scripts
├── tests/ # Framework unit tests
├── evaluation_results/ # Output directory for pipeline reports/metrics
├── analyze.py # Script to analyze raw test output
└── generate_charts.py # Script to generate visual evaluation charts
src/: Contains the core logic for the PyTestPilot framework, including all seven stages of the automated LLM-based test generation pipeline.
repos/: Houses the target repositories (toolz and python_patterns) used as fixtures during empirical evaluations.
scripts/: Auxiliary scripts for parsing output, computing test coverage, aggregating metrics, and verifying AST contexts.
tests/: Unit tests verifying the functionality of PyTestPilot components (e.g., adaptive refiner, metrics aggregator).
evaluation_results/: Output directory storing execution results, including detailed raw JSON metrics and aggregated human-readable Markdown reports.
analyze.py & generate_charts.py: Scripts deployed alongside the pipeline to parse test execution output and render performance charts.
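The tree lists the src/ packages alphabetically, so the stage numbers appear out of order. The short sketch below sorts them by stage number. The directory names and numbers are copied from the tree; the `STAGES` mapping and `execution_order` helper are hypothetical and exist only for this illustration, and it is an assumption that the stage numbers reflect the order in which the pipeline runs them.

```python
# Stage numbers and directory names are copied from the tree view.
STAGES = {
    "adaptive_refiner": 5,
    "ast_parser": 1,
    "llm_client": 3,
    "metrics_aggregator": 6,
    "prompt_engineer": 2,
    "report_generator": 7,
    "test_executor": 4,
}


def execution_order(stages):
    """Return package names sorted by their pipeline stage number."""
    return sorted(stages, key=stages.get)


if __name__ == "__main__":
    for name in execution_order(STAGES):
        print(f"Stage {STAGES[name]}: src/{name}/")
```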