
RyanS974/520project


PyTestPilot

The Phase 4 report (SWE_520_Project-Phase-4.pdf) is in the root directory.

PyTestPilot is an empirical evaluation framework that generates Python unit tests using open-weight Large Language Models (LLMs) and compares them against traditional property-based and random testing baselines.

Setup Instructions

This project is intended to be run within an isolated Python virtual environment.

1. Create and Activate a Virtual Environment

# Create a virtual environment named .venv
python3 -m venv .venv

# Activate the virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
# .venv\Scripts\activate
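To confirm activation worked before installing anything, compare sys.prefix with sys.base_prefix. Inside an environment created by the standard venv module, the two differ. This helper is not part of PyTestPilot. It is only a convenience sketch:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv.

    venv points sys.prefix at the environment directory, while
    sys.base_prefix keeps pointing at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; activate .venv first.")
```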

2. Install Dependencies

Ensure you are in the project root (where this README is located) and your virtual environment is activated, then run:

pip install -r requirements.txt
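After installation, you can check that every entry in requirements.txt resolves to an installed distribution. The sketch below is hypothetical and is not shipped with the project. It uses only importlib.metadata and a deliberately simple line parser. It does not handle URLs, VCS links, or other full PEP 508 syntax.

```python
import re
from importlib import metadata
from pathlib import Path
from typing import Iterable, List

# Leading distribution name: a letter/digit, then letters, digits, ".", "_" or "-".
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def parse_requirement_names(text: str) -> List[str]:
    """Pull bare distribution names out of simple requirements-file text.

    Handles lines such as 'pkg', 'pkg==1.2' or 'pkg[extra]>=1; python_version>"3.8"'
    and skips blank lines, comments and pip options such as '-r other.txt'.
    """
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue
        match = _NAME.match(line)
        if match:
            names.append(match.group(0))
    return names


def missing_distributions(names: Iterable[str]) -> List[str]:
    """Return the names with no installed distribution in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    text = Path("requirements.txt").read_text(encoding="utf-8")
    missing = missing_distributions(parse_requirement_names(text))
    print("All requirements installed." if not missing else f"Missing: {', '.join(missing)}")
```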

3. Configure API Keys

Copy the example environment file to .env:

cp .env.example .env

By default, the .env file references $OPENROUTER_API_KEY, so the key is read from your shell environment if you have already exported it.

Alternatively, you can edit the .env file and paste your API key directly:

OPENROUTER_API_KEY=sk-or-v1-...
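Whether a $OPENROUTER_API_KEY reference resolves depends on the .env loader the project uses, and that loader is not documented here. As an illustration only, the following stdlib sketch shows the general mechanism. It parses KEY=VALUE lines and expands $VAR and ${VAR} from the environment with string.Template. The function names are hypothetical.

```python
import os
from string import Template
from typing import Dict, Mapping, Optional


def load_env_text(text: str, environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Parse KEY=VALUE lines, expanding $VAR and ${VAR} from environ (default: os.environ).

    Simplified on purpose: no quoting rules, no 'export' prefix, no multi-line values.
    References to unset variables are left as-is instead of becoming empty strings.
    """
    env = os.environ if environ is None else environ
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = Template(value.strip()).safe_substitute(env)
    return values


def require_key(values: Mapping[str, str], name: str = "OPENROUTER_API_KEY") -> str:
    """Return the key, or fail loudly if it is empty or still an unexpanded $reference."""
    key = values.get(name, "")
    if not key or key.startswith("$"):
        raise RuntimeError(f"{name} is not set: export it in your shell or paste it into .env")
    return key
```

For example, require_key(load_env_text(Path(".env").read_text())) fails early with a readable message when the variable was never exported. Without that check, the failure would only surface later as an authentication error from the API.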

Running the Pipeline

PyTestPilot uses an evaluate command that runs the full end-to-end pipeline: AST-based context extraction, test generation, isolated sandbox execution, and metrics aggregation.
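The real extraction logic lives in src/ast_parser/. The example below is only a minimal stdlib sketch of what AST-based context extraction can look like. It collects each top-level function's name, signature, docstring, and source so a prompt can be built from them. The FunctionContext record is hypothetical. The sketch needs Python 3.9+ for ast.unparse.

```python
import ast
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FunctionContext:
    """Hypothetical record of what a prompt might need about one function."""
    name: str
    signature: str
    docstring: Optional[str]
    source: str


def extract_functions(source: str) -> List[FunctionContext]:
    """Collect top-level (async) function definitions from module source text."""
    tree = ast.parse(source)
    contexts = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            contexts.append(
                FunctionContext(
                    name=node.name,
                    # ast.unparse renders the parameter list, including defaults.
                    signature=f"{node.name}({ast.unparse(node.args)})",
                    docstring=ast.get_docstring(node),
                    source=ast.get_source_segment(source, node) or "",
                )
            )
    return contexts
```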

1. Run Full Evaluation on a Repository

To run the full pipeline on one of the included repositories:

python src/cli.py evaluate repos/toolz/ --methods llm-deepseek,hypothesis,random
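The --methods list compares the LLM-generated tests against hypothesis (property-based) and random baselines. This README does not describe how PyTestPilot's random baseline chooses inputs. The sketch below only illustrates the general idea of random testing with the standard library. It feeds seeded random lists to a function and tallies exception types. The first function is a toy stand-in, and all names are hypothetical.

```python
import random
from typing import Any, Callable, Dict


def random_smoke_test(func: Callable[[Any], Any], trials: int = 200, seed: int = 0) -> Dict[str, Any]:
    """Call func on random lists of small ints and count outcomes by exception type.

    There is no oracle: this baseline can only detect crashes, not wrong results.
    A fixed seed keeps runs reproducible across evaluations.
    """
    rng = random.Random(seed)
    outcomes: Dict[str, Any] = {"ok": 0, "errors": {}}
    for _ in range(trials):
        arg = [rng.randint(-10, 10) for _ in range(rng.randint(0, 5))]
        try:
            func(arg)
            outcomes["ok"] += 1
        except Exception as exc:  # any uncaught exception counts as a finding
            name = type(exc).__name__
            outcomes["errors"][name] = outcomes["errors"].get(name, 0) + 1
    return outcomes


def first(seq):
    """Toy target: return the first element (raises StopIteration when empty)."""
    return next(iter(seq))
```

Here random_smoke_test(first) reports some StopIteration failures from empty inputs. Property-based tools such as Hypothesis go further by letting the test state an expected property and by shrinking failing inputs.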

2. View Results

After the pipeline completes, you can find the aggregated metrics and a human-readable report in the evaluation_results/ directory:

  • evaluation_results/report.md: A summary of pass rates and coverage.
  • evaluation_results/project_evaluation.json: Detailed raw metrics for all functions.
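The structure of project_evaluation.json is not documented here. The sketch below therefore assumes a hypothetical layout: a top-level list of records with method, passed, and total fields. Check the real file and adapt the field names before relying on it. The sketch computes a per-method pass rate from such records.

```python
import json
from collections import defaultdict
from typing import Dict, List


def load_records(path: str) -> List[dict]:
    """Load the results file; assumes (hypothetically) a top-level JSON list."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def pass_rates(records: List[dict]) -> Dict[str, float]:
    """Pool passed/total counts per method into one pass rate per method.

    Assumed record shape (hypothetical):
    {"function_id": str, "method": str, "passed": int, "total": int}
    """
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for rec in records:
        passed[rec["method"]] += rec["passed"]
        total[rec["method"]] += rec["total"]
    return {m: passed[m] / total[m] for m in total if total[m] > 0}
```

Pooling counts (a micro-average) gives functions with many tests more weight. Averaging each function's own pass rate (a macro-average) instead treats every function equally. Confirm which of the two report.md uses before comparing numbers.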

3. Quick Test (Single Function)

To quickly verify the setup on a single function:

python src/cli.py evaluate repos/toolz/toolz/itertoolz.py --function-id first --methods llm-deepseek
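To script several runs, such as this quick check followed by the full evaluation, you can drive the same commands from Python. The wrapper below is hypothetical. It uses only the evaluate subcommand and the --methods and --function-id flags shown in this README, and it assumes you run it from the project root.

```python
import subprocess
import sys
from typing import List, Optional, Sequence


def evaluate_cmd(target: str, methods: Sequence[str], function_id: Optional[str] = None) -> List[str]:
    """Build argv for 'python src/cli.py evaluate' with the documented flags only."""
    # sys.executable is the active interpreter, i.e. the venv's Python when activated.
    cmd = [sys.executable, "src/cli.py", "evaluate", target, "--methods", ",".join(methods)]
    if function_id is not None:
        cmd += ["--function-id", function_id]
    return cmd


if __name__ == "__main__":
    # Quick single-function check first; check=True stops on a non-zero exit code.
    subprocess.run(evaluate_cmd("repos/toolz/toolz/itertoolz.py", ["llm-deepseek"], "first"), check=True)
    subprocess.run(evaluate_cmd("repos/toolz/", ["llm-deepseek", "hypothesis", "random"]), check=True)
```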

Architecture & Code Layout

Below is a tree view of the _DISTRO code layout and the role of each directory:

_DISTRO/
├── src/                 # Core framework source code
│   ├── adaptive_refiner/  # Pipeline Stage 5: Test refinement
│   ├── ast_parser/        # Pipeline Stage 1: AST context extraction
│   ├── data_structures/   # Core data models
│   ├── llm_client/        # Pipeline Stage 3: LLM interaction
│   ├── metrics_aggregator/ # Pipeline Stage 6: Metrics calculation
│   ├── prompt_engineer/   # Pipeline Stage 2: Prompt assembly
│   ├── report_generator/  # Pipeline Stage 7: Reporting
│   ├── test_executor/     # Pipeline Stage 4: Test execution & baselines
│   ├── utils/             # Utilities like logging
│   └── cli.py             # Command Line Interface entry point
├── repos/               # Target repositories for empirical evaluation
├── scripts/             # Auxiliary helper scripts
├── tests/               # Framework unit tests
├── evaluation_results/  # Output directory for pipeline reports/metrics
├── analyze.py           # Script to analyze raw test output
└── generate_charts.py   # Script to generate visual evaluation charts
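The stage numbers in the tree define the order in which the src/ packages run. The sketch below models only that ordering: a runner applies named stages to a shared state dict. Each placeholder stands in for a real package and simply records an output key. The interfaces are hypothetical. If the adaptive refiner re-executes tests in a loop, this linear sketch does not model that.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Stage = Callable[[State], State]


def run_pipeline(stages: List[Tuple[str, Stage]], state: State) -> State:
    """Apply each stage in order and record the stage names in state["trace"]."""
    for name, stage in stages:
        state = stage(state)
        state.setdefault("trace", []).append(name)
    return state


def placeholder(key: str) -> Stage:
    """Hypothetical stand-in for a real stage: adds one output key to the state."""
    def stage(state: State) -> State:
        return {**state, key: f"<{key}>"}
    return stage


STAGES: List[Tuple[str, Stage]] = [
    ("ast_parser", placeholder("context")),               # Stage 1: AST context extraction
    ("prompt_engineer", placeholder("prompt")),           # Stage 2: prompt assembly
    ("llm_client", placeholder("generated_tests")),       # Stage 3: LLM interaction
    ("test_executor", placeholder("execution")),          # Stage 4: execution & baselines
    ("adaptive_refiner", placeholder("refined_tests")),   # Stage 5: test refinement
    ("metrics_aggregator", placeholder("metrics")),       # Stage 6: metrics calculation
    ("report_generator", placeholder("report")),          # Stage 7: reporting
]
```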

Directory Descriptions

  • src/: Contains the core logic for the PyTestPilot framework, including all seven stages of the automated LLM-based test generation pipeline.
  • repos/: Houses the target repositories (toolz and python_patterns) used as fixtures during empirical evaluations.
  • scripts/: Auxiliary scripts for parsing output, computing test coverage, aggregating metrics, and verifying AST contexts.
  • tests/: Unit tests verifying the functionality of PyTestPilot components (e.g., adaptive refiner, metrics aggregator).
  • evaluation_results/: Output directory storing execution results, including detailed raw JSON metrics and aggregated human-readable Markdown reports.
  • analyze.py & generate_charts.py: Top-level scripts, shipped alongside the pipeline, that parse test execution output and render performance charts.
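The evaluate command runs generated tests in an isolated sandbox (Stage 4, src/test_executor/). How the project implements that isolation is not described here. One common approach is sketched below under that caveat. It writes the test code into a throwaway directory, runs it in a fresh interpreter with a timeout, and classifies the result. The function and file names are hypothetical. The isolation covers only the process and working directory. The child process keeps normal file and network access.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Any, Dict


def run_isolated(test_code: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Run test code in a separate interpreter inside a temporary directory.

    A crash or infinite loop in the generated code cannot take down the caller:
    subprocess.run kills the child when the timeout expires.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "test_generated.py"
        script.write_text(test_code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return {"status": "timeout", "returncode": None, "stderr": ""}
    status = "passed" if proc.returncode == 0 else "failed"
    return {"status": status, "returncode": proc.returncode, "stderr": proc.stderr}
```

This sketch executes plain scripts whose assertions fail via a non-zero exit code. Replacing the command with [sys.executable, "-m", "pytest", str(script)] would run pytest-style tests instead, assuming pytest is installed in the environment.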

About

SWE 520 Course Project
