TestWeaveX brings test case management, execution, and AI-assisted generation into a single Git-native platform. The LLM suggests. You decide.
From install to your first gap report in under 10 minutes.
```bash
pip install testweavex
tw init --llm-provider anthropic
```
This creates testweavex.config.yaml in your project root and a testweavex/skills/ folder with the 10 built-in skill files.
```bash
tw                      # same as pytest — all flags work
tw tests/login.feature  # run a specific feature
tw -k smoke -n 4        # filter + parallel
```
Results are stored automatically in .testweavex/results.db. No configuration required.
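Because results live in plain SQLite, you can query them with any SQLite client. The snippet below is only a sketch: the table name and columns are assumptions for illustration, not the actual `results.db` schema.

```python
import sqlite3

# Hypothetical schema — the real results.db layout may differ.
conn = sqlite3.connect(":memory:")  # stand-in for .testweavex/results.db
conn.execute("""CREATE TABLE results (
    test_case_id TEXT, outcome TEXT, duration_ms INTEGER, run_at TEXT)""")
conn.execute("INSERT INTO results VALUES ('abc123', 'passed', 412, '2025-01-01')")

# Count passing runs, exactly as you would against the real file.
passed, = conn.execute(
    "SELECT COUNT(*) FROM results WHERE outcome = 'passed'"
).fetchone()
print(passed)  # 1
```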
```bash
tw gaps --limit 20
```
TestWeaveX compares your TCM against your automation suite and surfaces unautomated tests ranked by priority score. Generate automation for any gap with tw gaps --generate.
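The matching step can be pictured as string similarity between TCM case titles and automated scenario names, with `match_threshold` from the config (0.65) deciding when a case counts as automated. This is an illustrative sketch using `difflib`, not TestWeaveX's actual matcher; the function name is made up.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.65  # mirrors gap_analysis.match_threshold in the config

def is_automated(tcm_title: str, automated_names: list[str]) -> bool:
    """Treat a TCM case as automated if any scenario name is similar enough."""
    return any(
        SequenceMatcher(None, tcm_title.lower(), name.lower()).ratio()
        >= MATCH_THRESHOLD
        for name in automated_names
    )

# Cases with no sufficiently similar automated scenario are the gaps.
gaps = [case for case in ["Login with SSO", "Export report to CSV"]
        if not is_automated(case, ["login with sso"])]
print(gaps)  # ['Export report to CSV']
```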
Add --results-server https://your-server --token $TOKEN to any tw command to share results across your team. One docker-compose up starts the server.
Every pytest flag works with tw unchanged. TestWeaveX adds its own flags alongside.
| Command | Description | Key Options |
|---|---|---|
| `tw [paths]` | Run tests (wraps pytest) | `--results-server`, `--token`, `--sync-tcm`, `--gaps` |
| `tw init` | Initialise TestWeaveX in a project | `--llm-provider`, `--tcm` |
| `tw generate` | Generate tests from a feature description | `--feature`, `--skill`, `--output` |
| `tw gaps` | Run gap analysis, show ranked report | `--limit`, `--min-score`, `--generate` |
| `tw import` | Import from external TCM or CSV | `--source` (testrail/xray/csv) |
| `tw status` | Show coverage map and summary | `--format` (table/json/html) |
| `tw history` | Show execution history | `--id`, `--last-n` |
| `tw serve` | Start local Web UI | `--port` (default: 8080) |
| `tw migrate` | Migrate from external TCM | `--source`, `--dry-run` |
| `tw sync` | Push results to external TCM | `--tcm`, `--run-id` |
```yaml
- name: Run tests
  run: |
    tw run --suite regression \
      --results-server ${{ secrets.TW_SERVER }} \
      --token ${{ secrets.TW_TOKEN }} \
      --sync-tcm testrail
```
Create testweavex.config.yaml in your project root. All values support ${ENV_VAR} interpolation. Missing keys use defaults.
```yaml
# testweavex.config.yaml
llm:
  provider: anthropic          # openai | anthropic | ollama | azure
  model: claude-sonnet-4-6
  api_key: ${ANTHROPIC_API_KEY}
  temperature: 0.3
  max_retries: 3
  timeout_seconds: 30

results_server: ${TESTWEAVEX_SERVER}  # optional — team mode

tcm:
  provider: none               # testrail | xray | none

gap_analysis:
  scoring_weights:
    priority: 0.30
    test_type: 0.25
    defects: 0.20
    frequency: 0.15
    staleness: 0.10
  match_threshold: 0.65
  top_gaps_default: 10
```
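The ${ENV_VAR} interpolation can be sketched with a small regex-based expander. This is illustrative only, not TestWeaveX's actual loader; it assumes `${VAR}` resolves from `os.environ` and that unset variables are left untouched.

```python
import os
import re

_ENV_PATTERN = re.compile(r"\$\{([A-Z0-9_]+)\}")

def expand_env(value: str) -> str:
    """Replace ${VAR} with os.environ['VAR']; leave unknown vars as-is."""
    return _ENV_PATTERN.sub(
        lambda m: os.environ.get(m.group(1), m.group(0)), value
    )

os.environ["ANTHROPIC_API_KEY"] = "sk-test"
print(expand_env("${ANTHROPIC_API_KEY}"))  # sk-test
print(expand_env("${UNSET_VAR}"))          # ${UNSET_VAR}
```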
| Provider | Key Setting | Models |
|---|---|---|
| Anthropic | api_key: ${ANTHROPIC_API_KEY} | claude-sonnet-4-6, claude-opus-4-7, claude-haiku-4-5 |
| OpenAI | api_key: ${OPENAI_API_KEY} | gpt-4o, gpt-4-turbo, gpt-3.5-turbo |
| Ollama | base_url: http://localhost:11434 | llama3, mistral, phi-3, any local model |
| Azure OpenAI | azure_endpoint, deployment_name | All Azure-deployed OpenAI models |
TestWeaveX is a pytest plugin with a thin CLI wrapper. All functionality flows through one of three pipelines.
| Pipeline | Input | Output |
|---|---|---|
| Generation | Feature description + skill file | Approved Gherkin + step definitions in repo |
| Execution | Feature files + pytest config | Test results in storage + TCM updated |
| Gap Analysis | TCM test cases + automation suite | Ranked gap list + optional generated automation |
| Module | Responsibility | Phase |
|---|---|---|
| `core/models.py` | Pydantic data models — shared contract | 1 |
| `core/config.py` | YAML config loader | 1 |
| `storage/sqlite.py` | Local SQLite persistence (default) | 1 |
| `llm/` | Provider-agnostic LLM adapter layer | 2 |
| `skills/` | YAML skill files for each test type | 2 |
| `generation/` | Feature → Gherkin → step definitions | 3 |
| `execution/plugin.py` | pytest plugin hooks | 4 |
| `gap/` | Gap detection, scoring, automation trigger | 5 |
| `web/` | FastAPI + React dashboard | 6 |
| `tcm/` | TestRail + Xray connectors | 7 |
Every test case gets a deterministic 64-character ID derived from its feature file path and scenario name. This ID is stable across machines, CI runs, and environments. The algorithm is frozen — never change it after first deployment.
```python
import hashlib

def generate_stable_id(*parts: str) -> str:
    key = "|".join(parts).encode("utf-8")
    return hashlib.sha256(key).hexdigest()  # full 64 chars

# test_case_id = generate_stable_id(feature_path, scenario_name)
# feature_id = generate_stable_id(feature_path)
```
Gaps are ranked by a five-signal weighted score (0.0–1.0). Higher = automate first.
| Signal | Weight | Meaning |
|---|---|---|
| Priority | 30% | P1 tests must be automated before P4 |
| Test Type | 25% | Smoke/E2E gaps hurt most (score 1.0/0.9) |
| Defect History | 20% | Tests linked to past bugs are high value |
| Frequency | 15% | Frequently-run manual tests benefit most from automation |
| Staleness | 10% | Tests not run recently carry higher regression risk |
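Given the weights above, a gap's score is a plain weighted sum of the five normalized signals. The sketch below is illustrative — the field names are assumptions, though the weights mirror `gap_analysis.scoring_weights` from the config.

```python
WEIGHTS = {
    "priority": 0.30,
    "test_type": 0.25,
    "defects": 0.20,
    "frequency": 0.15,
    "staleness": 0.10,
}

def gap_score(signals: dict[str, float]) -> float:
    """Weighted sum of per-signal values, each normalized to 0.0–1.0."""
    return round(sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS), 4)

# A P1 smoke test with heavy defect history ranks near the top:
score = gap_score({"priority": 1.0, "test_type": 1.0,
                   "defects": 0.8, "frequency": 0.5, "staleness": 0.2})
print(score)  # 0.805
```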
| Repo | Purpose |
|---|---|
| testweavex/testweavex | Core Python library (this package) |
| testweavex/testweavex-server | Self-hosted result server (Docker) |
| testweavex/testweavex-skills | Community skill file contributions |
| testweavex/testweavex-docs | Full Docusaurus documentation site |
The easiest way to contribute is a new skill YAML file. Create `testweavex/skills/custom/your-skill.yaml`:
```yaml
name: custom/your-skill
display_name: Your Skill Name
description: What this skill generates
prompt_template: |
  You are a senior QA engineer.
  Feature: {feature_description}
  Generate {n_suggestions} test scenarios that...
  Return JSON: title, gherkin, confidence, rationale, suggested_tags
assertion_hints:
  - Verify primary outcome
tags: [custom]
priority: 3
```
```bash
git clone https://github.com/testweavex/testweavex
cd testweavex
pip install -e ".[dev]"
pytest tests/ -v
```