Surendra Singh  ·  Staff AI Engineer  ·  SF

I build production multi-agent AI
for regulated financial services.

For 3+ years I've been shipping conversational AI platforms where the hard problems aren't model selection — they're routing layers, multi-turn state, guardrails that compliance will actually sign off on, and eval infrastructure that catches failure modes single-turn benchmarks miss. Before that, 15 years on Wall Street: Merrill Lynch building intraday risk and trade systems for institutional desks, Fidelity architecting distributed backends for the online trading platform. I know the data flows, the regulatory surface, and the operational reality of these firms from the inside.

What I Ship

Five surfaces.
Each load-bearing.

  • Multi-agent routing architectures — ML-first hybrid (DeBERTa-v3 classifiers + session-context Transformer embeddings + LLM fallback), not pure-LLM routing.
  • Multi-turn trajectory evaluation — persona-driven simulated users exercising the full N×M intent × persona matrix; trajectory scoring, not just turn-by-turn.
  • LLM evaluation pipelines — 18+ metrics, CI/CD-integrated, self-hosted for compliance; regressions block merge.
  • Memory-augmented agent systems — bitemporal, deterministic, auditable. See Attestor below.
  • Guardrail architectures for fiduciary contexts — input filtering, retrieval boundaries, response gate, full audit trail. Compliance as a co-author, not a reviewer.
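
The ML-first routing in the first bullet can be sketched as a confidence gate: a cheap classifier answers when it is sure, and the LLM is consulted only otherwise. This is a minimal illustration, not the production router. The `route` function, the `RouteDecision` type, the 0.80 threshold, and both toy callables are assumptions made for the example, and session-context embeddings are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class RouteDecision:
    agent: str         # downstream agent chosen for this turn
    source: str        # "classifier" or "llm_fallback"
    confidence: float  # top classifier score, kept for the audit trail


def route(
    text: str,
    classify: Callable[[str], Dict[str, float]],
    llm_fallback: Callable[[str], str],
    threshold: float = 0.80,  # assumed cut-off; would be tuned on held-out traffic
) -> RouteDecision:
    """Use the classifier when it is confident; otherwise defer to the LLM."""
    scores = classify(text)
    agent, confidence = max(scores.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return RouteDecision(agent, "classifier", confidence)
    return RouteDecision(llm_fallback(text), "llm_fallback", confidence)


# Hypothetical stand-ins so the sketch runs without any model weights.
def toy_classifier(text: str) -> Dict[str, float]:
    if "budget" in text.lower():
        return {"budgeting": 0.93, "investment": 0.04, "credit": 0.03}
    return {"budgeting": 0.40, "investment": 0.35, "credit": 0.25}


def toy_llm(text: str) -> str:
    return "planning"


confident = route("Help me set a monthly budget", toy_classifier, toy_llm)
uncertain = route("What should I do with my bonus?", toy_classifier, toy_llm)
```

Recording `source` and `confidence` on every decision keeps the fallback rate measurable, which is what makes "LLM fallback only" something an eval harness can check.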

Three projects, each one a public artifact.

01 · Active

Finance

MCP-native financial analytics on Claude. 25+ tools and 3 slash commands across equity research, portfolio analysis, options, cross-asset, fundamentals, and ML pipelines.

Stack: Python · MCP · sklearn · yfinance · Massive · Plaid (v1.5)
Browse 8 category pages →
02 · Active

Attestor

Auditable memory for agent teams. Deterministic. Bitemporal. Self-hosted, with no LLM in the critical path.

Stack: PostgreSQL · pgvector · Neo4j · Voyage AI embeddings
The memory layer most agent stacks pretend they don't need until the regulator asks where a number came from.
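
To make "bitemporal" and "deterministic" concrete, here is a small in-memory sketch of an as-of lookup. It is not Attestor's schema or API: the `Fact` fields, the `as_of` function, and the sample dates are all hypothetical. Each fact carries two timestamps, one for when it became true and one for when the system recorded it, and rows are only ever appended.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    key: str
    value: str
    valid_from: datetime   # when the fact became true in the world
    recorded_at: datetime  # when the system learned it


def as_of(
    facts: List[Fact], key: str, valid_at: datetime, known_at: datetime
) -> Optional[Fact]:
    """Return what the system believed at `known_at` about `key` at `valid_at`."""
    candidates = [
        f for f in facts
        if f.key == key and f.valid_from <= valid_at and f.recorded_at <= known_at
    ]
    if not candidates:
        return None
    # Latest valid_from wins; ties go to the most recent recording (a correction).
    return max(candidates, key=lambda f: (f.valid_from, f.recorded_at))


# Illustrative data only.
facts = [
    Fact("risk_tolerance", "moderate", datetime(2024, 1, 1), datetime(2024, 1, 2)),
    # A later correction for the same period; the original row is never overwritten.
    Fact("risk_tolerance", "conservative", datetime(2024, 1, 1), datetime(2024, 3, 1)),
]

before = as_of(facts, "risk_tolerance", datetime(2024, 2, 1), known_at=datetime(2024, 2, 1))
after = as_of(facts, "risk_tolerance", datetime(2024, 2, 1), known_at=datetime(2024, 4, 1))
```

The same question about February returns "moderate" or "conservative" depending only on `known_at`, so an auditor can replay what an agent could have known at the time it answered.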
I write about multi-agent architecture, LLM evaluation, and applied AI in regulated industries. The eight pages below are a deep dive into Finance specifically — open source, in production, with the math, the source, and the design notes.
25+ MCP Tools
3 Slash Commands
2 Personas
500+ Tests Passing
3 Install Methods

Each category, in depth.

Single-asset analytics, derivatives, cross-asset, filings, ML pipelines, personas, the data layer, and a portfolio surface composed from six tools — each with the math, the source, and the design notes.

01 · Portfolio
Portfolio analytics
Per-holding risk decomposition, real vs. apparent diversification, ML liquidity scoring.
02 · Markets
Single-asset analytics
Price, returns (arithmetic vs. log), realized volatility, Sharpe / drawdown / beta.
03 · Options
Derivatives & technicals
Full chain with Greeks (Δ Γ Θ ν ρ), plus RSI / MACD / SMA / Bollinger.
04 · Cross-Asset
FX · crypto · indices
Three asset classes, one Protocol. Correlation regimes shift in stress.
05 · Fundamentals
Filings & corporate actions
10-K risk factors, dividends, splits, short interest, news, market movers.
06 · Quant / ML
scikit-learn pipelines
Liquidity regression and investor classifier — split-before-fit, persisted artifacts.
07 · Personas
Three slash commands
/finance auto-router, /finance-analyst, /finance-pm. Same data, different lens.
08 · Data Layer
Protocol & providers
yfinance · Massive (57 endpoints) · Plaid (v1.5). Swap via env var.
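
"One Protocol, swap via env var" can be sketched with `typing.Protocol` and a small registry. The names here are assumptions for illustration only: `MarketDataProvider`, `get_prices`, the `FINANCE_DATA_PROVIDER` variable, and both offline providers are hypothetical and do not reflect the project's actual interface.

```python
import os
from typing import Callable, Dict, List, Mapping, Protocol


class MarketDataProvider(Protocol):
    """Anything with this method can serve as a data source."""

    def get_prices(self, ticker: str, days: int) -> List[float]: ...


class RisingProvider:
    """Offline stand-in: prices climb by 1.0 per day."""

    def get_prices(self, ticker: str, days: int) -> List[float]:
        return [100.0 + i for i in range(days)]


class FlatProvider:
    """Offline stand-in: prices never move."""

    def get_prices(self, ticker: str, days: int) -> List[float]:
        return [100.0] * days


# Registry of provider factories, keyed by the value of the env var.
PROVIDERS: Dict[str, Callable[[], MarketDataProvider]] = {
    "rising": RisingProvider,
    "flat": FlatProvider,
}


def load_provider(env: Mapping[str, str] = os.environ) -> MarketDataProvider:
    """Pick a provider from the environment, failing loudly on unknown names."""
    name = env.get("FINANCE_DATA_PROVIDER", "rising")  # hypothetical variable name
    if name not in PROVIDERS:
        raise ValueError(f"Unknown provider {name!r}; expected one of {sorted(PROVIDERS)}")
    return PROVIDERS[name]()


flat_prices = load_provider({"FINANCE_DATA_PROVIDER": "flat"}).get_prices("AAPL", 3)
default_prices = load_provider({}).get_prices("AAPL", 2)
```

Because the Protocol is structural, the providers need no shared base class, and tools written against `MarketDataProvider` do not change when the backing source does.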
SHARPE RATIO ▲ 1.42 · MAX DRAWDOWN ▼ -12.3% · BETA vs S&P 0.87 · CORRELATION MATRIX ✓ GENERATED · VOLATILITY 30D 24.1% · LIQUIDITY SCORE ▲ 0.91 · EDA COMPLETE ✓ 847 ROWS · ANNUALIZED RETURN ▲ 18.7%
Why I Built This

I've had the same conversation hundreds of times.

Over 15 years in financial services (Merrill Lynch, Fidelity, and beyond), I kept meeting the same person. A VP. An MD. A senior analyst. Brilliant at their craft. Standing by a whiteboard or hunched over a Bloomberg terminal.

And they'd always say some version of this:

"I know exactly what analysis I need. I just can't get it done fast enough."

Then they'd show me the spreadsheet. Always a spreadsheet.

A $100 trillion industry: portfolios, models, forecasts, reconciliations, and board decks, all flowing through .xlsx files held together by VLOOKUP and prayer.

The smartest people in the room weren't bottlenecked by ideas. They were bottlenecked by tooling. Waiting on engineering tickets. Waiting on the quant team. Waiting on a Python script someone wrote in 2019 that nobody remembers how to run.

So I stopped waiting and built the thing they actually needed.

MERRILL LYNCH

Risk Management Systems

Built distributed platforms for institutional risk analytics
FIDELITY INVESTMENTS

Trading Infrastructure

Order routing systems and distributed trading platforms
STEALTH FINTECH · SEP 2021 – PRESENT · SF

Staff AI Engineer · Multi-Agent AI in Production

Member-facing financial product. Multi-agent routing (planning, budgeting, investment, credit) → guardrail layer → response, multi-turn state across concurrent sessions. DeBERTa-v3 intent classifiers + session-level Transformer embeddings, LLM fallback only. 18-metric eval harness with multi-turn trajectory simulation, self-hosted for compliance. Guardrail architecture for fiduciary contexts — input filter, retrieval boundaries, response gate, full audit trail.
Read the engineering memo →

Every firm. Every desk. Same story.

VP, Equity Research

"I need a Sharpe ratio comparison across 5 tickers. IT says 3 weeks."

MD, Investment Banking

"I signed up for that Python course. Got through week 2. Then Q3 close happened."

Director, FP&A

"My team spends 40% of their time just cleaning data before they can analyze anything."

PM, Hedge Fund

"I know the analysis I want. I just can't express it in code."

What if you could express it in English?

That's Finance. You describe the analysis. Claude pulls the data, runs the computation, generates charts, and interprets the results. 30 seconds. One sentence.

The Industry's Answer Was Wrong

They said "learn Python."
I watched what happened next.

In 2023, JPMorgan told investors every new analyst would be trained in Python. Training companies charged $200+ per seat. Thousands enrolled.

I watched from the inside. The same cycle played out everywhere: enthusiasm, frustration, abandonment. Deadlines don't wait for your learning curve.

Python is powerful. I've built production systems in it. But asking finance professionals to become software engineers just to run a Sharpe ratio was always the wrong answer.

The right answer: give them tools that speak their language.

W1
Week 1
Excited. Installed Python. "This is going to change everything."
W3
Week 3
Debugging import errors. Stack Overflow tabs multiplying. Deadline looming.
W6
Week 6
Back in Excel. The model ships. The Python course gathers dust.
With Finance
30 seconds. One sentence. Full analysis with charts and interpretation.

Three categories. Zero code.

The foundation layer — market analysis, ML workflows, and environment checks.

Market Analysis (6 tools)
📈

Stock Price Analysis

analyze_stock

Price chart with trend summary. "Show me AAPL's price chart for the last 6 months"

📊

Returns Analysis

get_returns

Daily and cumulative return charts. "What are NVDA's returns since January?"

🌊

Volatility Analysis

get_volatility

Annualized and 21-day rolling volatility. "How volatile has TSLA been this quarter?"

Risk Metrics

get_risk_metrics

Sharpe ratio, max drawdown, beta vs S&P 500. "Get risk metrics for GOOGL over the last year"

📉

Ticker Comparison

compare_tickers

Normalized performance chart for 2-5 tickers. "Compare AAPL, MSFT, and GOOGL over 90 days"

🔗

Correlation Heatmap

correlation_map

Return correlation for 2-10 tickers. "Show correlation between AAPL, JPM, JNJ, and XOM"
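
For readers who want the arithmetic behind these cards, here is a standard-library sketch of the usual textbook definitions of daily returns, annualized and rolling volatility, Sharpe ratio, max drawdown, and beta. It is not the tools' source. The function names, the 252-trading-day convention, the zero default risk-free rate, and the sample numbers are assumptions for the example.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence

TRADING_DAYS = 252  # assumed annualization convention


def daily_returns(prices: Sequence[float]) -> List[float]:
    """Arithmetic (simple) returns between consecutive closes."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def annualized_volatility(returns: Sequence[float]) -> float:
    return stdev(returns) * math.sqrt(TRADING_DAYS)


def rolling_volatility(returns: Sequence[float], window: int = 21) -> List[float]:
    """Annualized volatility over each trailing `window`-day slice."""
    return [
        annualized_volatility(returns[i - window:i])
        for i in range(window, len(returns) + 1)
    ]


def sharpe_ratio(returns: Sequence[float], risk_free_annual: float = 0.0) -> float:
    excess = [r - risk_free_annual / TRADING_DAYS for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(TRADING_DAYS)


def max_drawdown(prices: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a negative fraction."""
    peak, worst = prices[0], 0.0
    for p in prices:
        peak = max(peak, p)
        worst = min(worst, p / peak - 1.0)
    return worst


def beta(asset_returns: Sequence[float], market_returns: Sequence[float]) -> float:
    """Covariance with the market divided by the market's variance."""
    ma, mm = mean(asset_returns), mean(market_returns)
    cov = sum((a - ma) * (m - mm) for a, m in zip(asset_returns, market_returns))
    var = sum((m - mm) ** 2 for m in market_returns)
    return cov / var


# Made-up numbers: the asset moves exactly twice as much as the market.
prices = [100.0, 110.0, 99.0, 121.0]
market = [0.01, -0.02, 0.03]
asset = [2 * m for m in market]
asset_beta = beta(asset, market)
drawdown = max_drawdown(prices)
```

In this toy series the only decline is from 110 to 99, so the drawdown is 10%, and the doubled market moves give a beta of 2.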

ML Workflows (3 tools)
🔍

CSV Data Ingestion

ingest_csv

Auto-profile any CSV: column detection, outlier removal, EDA charts.

🎯

Liquidity Risk Model

liquidity_predictor + predict_liquidity

Train a regression model, then score clients with LOW / MODERATE / HIGH risk ratings.

🤖

Investor Classifier

investor_classifier + classify_investor

RandomForest classification for investor segmentation by profile attributes.
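
Both ML workflows depend on splitting the data before fitting any preprocessing, so test rows never leak into training statistics. The sketch below shows that ordering with the standard library only, so the idea is visible without any model. `split_rows`, `Standardizer`, and the `balances` column are hypothetical names with made-up data, not the project's pipeline.

```python
import random
from statistics import mean, stdev
from typing import List, Tuple


def split_rows(
    rows: List[float], test_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """Shuffle reproducibly, then cut into train and test portions."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


class Standardizer:
    """Learns mean and standard deviation in fit(); applies them in transform()."""

    def fit(self, values: List[float]) -> "Standardizer":
        self.mean_ = mean(values)
        self.std_ = stdev(values)
        return self

    def transform(self, values: List[float]) -> List[float]:
        return [(v - self.mean_) / self.std_ for v in values]


balances = [float(x) for x in range(1, 21)]  # made-up feature column

train, test = split_rows(balances)       # 1. split first
scaler = Standardizer().fit(train)       # 2. fit on the training rows only
train_z = scaler.transform(train)
test_z = scaler.transform(test)          # 3. test rows are transformed, never fitted
```

With scikit-learn the same ordering applies: call `train_test_split` first, then fit a `Pipeline` on the training portion only, so scalers and encoders inside it never see the held-out rows.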

Environment (2 tools)
🏓

Ping

ping

Confirm the MCP server is running and ready.

Validate Environment

validate_environment

Check that all 7 required packages are installed and report their version numbers.

Type a slash. Get institutional output.

Three slash commands: one auto-router and two role-specific personas (analyst and PM).

/finance
General-purpose analysis, routes to any of the 25+ tools
/finance-analyst
Equity analyst lens: Sharpe first, single-stock focus
/finance-pm
Portfolio manager lens: drawdown first, portfolio risk
Private Equity

Looking for the PE workflows — DX decision diagnostic, BX cross-portco benchmarking, IC memos, DD checklists, value-creation plans? They moved to their own focused repo.

bolnet/private-equity →  ·  5-lender BX demo →  ·  Lending Club DX demo →

Same data. Different lens.

Both use the same tools. The difference is framing, priority, and audience.

/finance-analyst

Equity Analyst

Frames every ticker as a security under research coverage. Optimized for buy-side and sell-side consumers.

Leads with: Sharpe ratio (risk-adjusted return quality)
Beta means: Stock-level market sensitivity
Next step: Compare to sector peers
❯ /finance-analyst initiate coverage on NVDA
/finance-pm

Portfolio Manager

Frames every ticker as a holding in the portfolio book. Optimized for internal risk committee and LP reporting.

Leads with: Max drawdown (worst-case portfolio loss)
Beta means: Portfolio-level systematic risk exposure
Next step: Check correlation with other holdings
❯ /finance-pm review risk on my holdings: AAPL, NVDA, JPM
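
"Same data, different lens" can be sketched as one metrics dictionary presented in a persona-specific order. The `Persona` dataclass and `order_metrics` helper are hypothetical names for illustration, not how the slash commands are implemented. The lead metrics, beta readings, and next steps are taken from the two cards above, and the AAPL figures from the demo table on this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Persona:
    command: str
    lead_metric: str   # metric reported first
    beta_reading: str  # how beta is framed for this audience
    next_step: str


PERSONAS = {
    "analyst": Persona(
        "/finance-analyst", "sharpe",
        "Stock-level market sensitivity", "Compare to sector peers",
    ),
    "pm": Persona(
        "/finance-pm", "max_drawdown",
        "Portfolio-level systematic risk exposure",
        "Check correlation with other holdings",
    ),
}


def order_metrics(metrics: Dict[str, float], persona: Persona) -> List[Tuple[str, float]]:
    """Same numbers for every persona; only the ordering changes."""
    lead = [(persona.lead_metric, metrics[persona.lead_metric])]
    rest = [(k, v) for k, v in metrics.items() if k != persona.lead_metric]
    return lead + rest


aapl = {"sharpe": 1.42, "max_drawdown": -0.142, "beta": 1.12}

analyst_view = order_metrics(aapl, PERSONAS["analyst"])
pm_view = order_metrics(aapl, PERSONAS["pm"])
```

Keeping the persona as data rather than as separate code paths means both lenses are guaranteed to report identical numbers.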

You describe it. Claude delivers it.

claude-code ~/portfolio-analysis
/finance-pm check diversification across AAPL, JPM, JNJ, XOM

⠋ Running compare_tickers + correlation_map...
Retrieved 252 trading days per ticker

Portfolio Risk Summary (PM Lens)
┌──────────┬────────────┬──────────────┬──────────┐
│ Holding  │ Sharpe     │ Max Drawdown │ Beta     │
├──────────┼────────────┼──────────────┼──────────┤
│ AAPL     │ 1.42       │ -14.2%       │ 1.12     │
│ JPM      │ 1.18       │ -9.7%        │ 1.08     │
│ JNJ      │ 0.64       │ -7.1%        │ 0.55     │
│ XOM      │ 0.91       │ -11.4%       │ 0.72     │
└──────────┴────────────┴──────────────┴──────────┘

Correlation heatmap saved
Normalized performance chart saved

Real diversification detected. JNJ (beta 0.55) and XOM (0.72) provide meaningful downside protection against the tech-heavy AAPL position. Cross-correlation between JNJ and AAPL is only 0.31...

Built for how you actually work.

Role-specific workflows, because I've sat across the table from every one of these roles.

Equity Research

Coverage Initiation

analyze_stock · get_risk_metrics · get_returns

Price charts, risk metrics, and cumulative performance. The three data points that feed every research note.

Hedge Fund PM

Diversification Audit

compare_tickers · correlation_map · get_volatility

Real diversification or correlated bets in different names? Volatility-regime detection and pairs-trading signals.

Investment Banking

Comparable Analysis

compare_tickers · correlation_map

Normalized performance comparisons and relative positioning for deal pitch materials.

FP&A

Data Profiling

ingest_csv · liquidity_predictor

Automated CSV profiling, forecasting model inputs, and variance analysis prep.

Accounting

Anomaly Detection

ingest_csv

Transaction data profiling, outlier detection prep, and ERP export cleanup.

Three layers. One workflow.

AI finally came to where finance lives. Finance adds the analytical layer that was missing.

🔷

Copilot for Finance

Microsoft

Operational finance: reconciliation, variance analysis, collections, ERP integration.

Reconciliation · Variance · Collections · ERP
🟣

Claude in Excel

Anthropic

Model intelligence: formula tracing, scenario testing, error debugging with cell-level citations.

Formula Trace · Debugging · Scenarios · Skills

Three ways to install.

MCP server for Claude Code CLI, plugin for slash commands, or web connection for claude.ai in your browser.

Method 2

Claude Code Plugin

MCP server + 18 slash commands + 16 skill definitions bundled together.

cd finance
claude

# All 18 commands auto-discovered
❯ /finance analyze AAPL
Method 3

Web (claude.ai)

HTTP tunnel via ngrok or Cloudflare. Use from browser, no CLI needed.

bash scripts/start_web.sh

# Paste URL in claude.ai
# Settings > Connectors > Add