An experimentation harness for systematic equity strategies.
Multi-agent research, ML prediction, risk-gated execution, weekly self-tuning — instrumented end-to-end[1].
First experiment: alpha capture against the S&P 500.
Six sector teams, a portfolio decision agent, and a macro layer on LangGraph + Claude. Structured outputs, LLM-as-judge.
Stacked ensemble of gradient-boosted and linear models. 21-day market-relative return predictions with confidence-driven veto.
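As a rough illustration of the prediction target and the veto, the sketch below computes a 21-day market-relative return and applies one possible veto rule. The veto semantics are an assumption, not taken from the system: here a proposed entry is blocked when the model predicts underperformance with confidence at or above a threshold. The names `market_relative_return`, `should_veto`, and the 0.6 threshold are hypothetical.

```python
HORIZON_DAYS = 21        # prediction horizon named in the text
MIN_CONFIDENCE = 0.6     # hypothetical threshold, chosen only for illustration


def market_relative_return(stock_prices, benchmark_prices, horizon=HORIZON_DAYS):
    """Stock return minus benchmark return over the last `horizon` trading days."""
    if len(stock_prices) <= horizon or len(benchmark_prices) <= horizon:
        raise ValueError("each series needs more than `horizon` prices")
    stock_ret = stock_prices[-1] / stock_prices[-1 - horizon] - 1.0
    bench_ret = benchmark_prices[-1] / benchmark_prices[-1 - horizon] - 1.0
    return stock_ret - bench_ret


def should_veto(predicted_excess_return, confidence, min_confidence=MIN_CONFIDENCE):
    """Assumed rule: veto only a confident prediction of underperformance."""
    return predicted_excess_return < 0 and confidence >= min_confidence
```

Under this reading, a low-confidence negative prediction does not block a trade; only a confident one does.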
Weekly evaluation writes optimized parameters back to four S3 configs. Downstream modules read them on cold-start.
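A minimal sketch of the cold-start read, assuming each config is a flat JSON object and that a module falls back to built-in defaults when the tuned file is missing or unreadable (the fallback behaviour is an assumption). `load_config`, the key name, and the parameter names are hypothetical; `fetch` stands in for whatever reads the object from S3, for example a thin wrapper around boto3's `get_object`.

```python
import json


def load_config(fetch, key, defaults):
    """Overlay tuned values stored under `key` onto `defaults`.

    `fetch(key)` must return the raw JSON text or raise on failure.
    Returns (config, used_tuned) so the caller can record which source won.
    """
    try:
        tuned = json.loads(fetch(key))
    except Exception:
        # Missing or malformed config: start with defaults (assumed behaviour).
        return dict(defaults), False
    if not isinstance(tuned, dict):
        return dict(defaults), False
    merged = dict(defaults)
    # Ignore keys the module does not know about.
    merged.update({k: v for k, v in tuned.items() if k in defaults})
    return merged, True


# Usage with an in-memory stand-in for the bucket (hypothetical key and fields).
fake_bucket = {"config/example.json": '{"max_position_pct": 0.04}'}
config, used_tuned = load_config(
    fake_bucket.__getitem__,
    "config/example.json",
    {"max_position_pct": 0.05, "min_score": 60},
)
```

Returning the `used_tuned` flag keeps the fallback visible to instrumentation instead of hiding it.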
Every signal, prediction, fill, and dollar of P&L instrumented and traceable. The console is a view, not a measurement layer.
Current phase
Make every aspect of the system reliable and measurable, so Phase 3 can evaluate decisions on data, not vibes.
- Six modules wired end-to-end via S3 — research, prediction, execution, evaluation, data, dashboard.
- Multi-agent research, stacked meta-ensemble, risk-gated executor, autonomous backtester.
- Three Step Functions running unattended (Saturday weekly + weekday morning + EOD).
- Step Functions reliable end-to-end with drift detection and runtime trend alarms.
- Every decision point measurable — agent calls, predictor verdicts, fills, P&L attribution, risk events.
- Autonomous feedback loop — backtester writing four optimized configs to S3 weekly.
- Operates the autonomous feedback loop on the reliable, measurable foundation established in Phase 2.
- Broader feature breadth in inference (current 21 features → ~50-feature ArcticDB store).
- Gated on a ≥99% Step Functions success rate over 8 weeks and a complete transparency inventory.
- Paper → live capital with progressive sizing.
- Portfolio-level risk overlays beyond per-position gates.
- Gated on sustained positive alpha vs SPY over a 12-week Phase 3 window.
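The two quantitative gates in the list above could be checked along these lines. This is a sketch under stated assumptions: the success rate is taken as successful executions divided by all executions in the window, and "sustained positive alpha" is read as the portfolio's compounded return exceeding SPY's over the most recent 12 weekly observations. Other readings (for example, requiring positive alpha in every week) are equally plausible. All function names are hypothetical.

```python
def success_rate(outcomes):
    """Fraction of executions that succeeded; `outcomes` is a list of booleans."""
    if not outcomes:
        raise ValueError("no executions recorded")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def reliability_gate_passes(outcomes, min_rate=0.99):
    """True when the observed success rate meets the 99% bar."""
    return success_rate(outcomes) >= min_rate


def compound(returns):
    """Compound a sequence of simple periodic returns into one total return."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0


def alpha_gate_passes(portfolio_weekly, spy_weekly, weeks=12):
    """True when compounded portfolio return beats SPY over the last `weeks` weeks."""
    if len(portfolio_weekly) < weeks or len(spy_weekly) < weeks:
        return False  # not enough history to evaluate the gate
    alpha = compound(portfolio_weekly[-weeks:]) - compound(spy_weekly[-weeks:])
    return alpha > 0
```

The caller is expected to pass only the executions from the 8-week window to `reliability_gate_passes`; the window selection is left out of the sketch.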