Six MCP tools compose a portfolio surface: per-holding risk decomposition, real-vs-apparent diversification, ML-based liquidity scoring, and two professional lenses on the same data. The math is from the textbook. The interface is plain English.
Notes from Surendra Singh — 15 years building portfolio and risk systems at Merrill Lynch, Fidelity, and beyond. Last revised April 2026.
There is no Portfolio class. There doesn't need to be. The smallest useful primitive is a list of tickers. Every analytic — Sharpe per holding, correlation across the book, drawdown in a stress window, allocation drift, beta to S&P — chains off that primitive and the price history each ticker carries.
```python
# every portfolio operation reduces to one of these three calls
from finance_mcp.tools import (
    get_risk_metrics,  # per-holding: Sharpe, max drawdown, beta vs S&P
    compare_tickers,   # cross-holding: normalized cumulative performance
    correlation_map,   # cross-holding: pairwise return correlation
)

# public signatures, exactly as shipped
get_risk_metrics(ticker: str, start: str, end: str = "") -> str
compare_tickers(tickers: str, start: str, end: str = "") -> str
correlation_map(tickers: str, start: str, end: str = "") -> str
```
Strings in, strings out. The "string out" is deliberate — the tool returns a written interpretation alongside the numbers, because a Sharpe of 1.42 means nothing to a director who hasn't done the math in a decade. The interpretation is part of the deliverable.
Three numbers per ticker, computed from 252 trading days of adjusted close. Sharpe annualizes daily excess return over its standard deviation. Max drawdown is the worst peak-to-trough on the cumulative wealth curve. Beta is the OLS slope of the holding's returns against the S&P 500 — covariance over benchmark variance, nothing fancy.
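In symbols, writing $r_t$ for the holding's daily excess return, $r_t^m$ for the benchmark's, and $W_t = \prod_{s \le t}(1 + r_s)$ for the cumulative wealth curve, the three metrics are:

$$
\text{Sharpe} = \sqrt{252}\,\frac{\bar{r}}{\sigma_r},
\qquad
\text{MDD} = \min_t \frac{W_t - \max_{s \le t} W_s}{\max_{s \le t} W_s},
\qquad
\beta = \frac{\operatorname{Cov}(r, r^m)}{\operatorname{Var}(r^m)}
$$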
A four-name book — AAPL (tech), JPM (financials), JNJ (healthcare), XOM (energy) — over the trailing year:
| Holding | Sharpe | Max Drawdown | Beta vs S&P | Read |
|---|---|---|---|---|
| AAPL | 1.42 | −14.2% | 1.12 | High Sharpe, tech beta > 1 |
| JPM | 1.18 | −9.7% | 1.08 | Cyclicality without tech vol |
| JNJ | 0.64 | −7.1% | 0.55 | Defensive ballast |
| XOM | 0.91 | −11.4% | 0.72 | Commodity-coupled, low β |
```python
import numpy as np
import pandas as pd

def _compute_risk_metrics(returns: pd.Series, benchmark_returns: pd.Series) -> dict:
    """Pure computation — no I/O, no side effects. Testable in isolation."""
    # annualized Sharpe: mean daily excess return over its std, scaled by sqrt(252)
    sharpe = (returns.mean() / returns.std()) * np.sqrt(252)
    # max drawdown: worst peak-to-trough on the cumulative wealth curve
    wealth = (1 + returns).cumprod()
    max_drawdown = ((wealth - wealth.cummax()) / wealth.cummax()).min()
    # beta: OLS slope vs benchmark = covariance over benchmark variance
    aligned = pd.concat([returns, benchmark_returns], axis=1).dropna()
    cov = np.cov(aligned.iloc[:, 0], aligned.iloc[:, 1])
    beta = float(cov[0, 1] / cov[1, 1])
    return {"sharpe": float(sharpe), "max_drawdown": float(max_drawdown), "beta": beta}
```
The pure-function split (_compute_risk_metrics takes Series, returns dict — no I/O) is the single most useful pattern in the codebase. It makes every metric unit-testable without touching yfinance, and means the same function works whether the data came from yfinance, Massive, or a CSV the user dropped in. Most finance tooling fails this test.
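A minimal illustration of the payoff, assuming `_compute_risk_metrics` is importable from the tool module (the seed, magnitudes, and assertions below are mine, not the shipped test suite):

```python
import numpy as np
import pandas as pd

from finance_mcp.tools import _compute_risk_metrics  # assumed import path

def test_compute_risk_metrics_is_pure():
    rng = np.random.default_rng(0)
    # synthetic daily returns: no provider, no network, just Series in, dict out
    returns = pd.Series(rng.normal(0.0005, 0.010, 252))
    benchmark = pd.Series(rng.normal(0.0004, 0.008, 252))
    out = _compute_risk_metrics(returns, benchmark)
    assert set(out) == {"sharpe", "max_drawdown", "beta"}
    assert out["max_drawdown"] <= 0   # a drawdown is peak-to-trough, never positive
    assert np.isfinite(out["beta"])
```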
The most expensive mistake in retail and prosumer portfolios is mistaking apparent diversification (six tickers, six logos) for real diversification (six low-correlation return streams). The number that matters is pairwise correlation. correlation_map renders it as a heatmap and prints the matrix.
"A 4-ticker book with average pairwise ρ = 0.31 is genuinely diversified. ρ = 0.78 is one bet, in different names."
For the AAPL / JPM / JNJ / XOM book above: AAPL ↔ JPM correlates around 0.45, AAPL ↔ JNJ at 0.31, AAPL ↔ XOM at 0.28. JNJ and XOM provide the genuine downside protection — not because they were picked to, but because their return streams happen to decouple from tech in stress windows. That's a structural fact about the book, surfaced in one tool call.
The PM-lens interpretation that /finance-pm writes underneath the heatmap reads something like: "Cross-correlation between JNJ and AAPL is 0.31 — meaningful diversification. The tech-financials block (AAPL, JPM at 0.45) carries most of your equity-beta risk. Adding more tech raises beta without raising Sharpe." A senior analyst could write that. The point is that no one has to.
Risk is not just price risk. For private wealth and FP&A use cases, the bigger question is liquidity: which clients (or which positions) can absorb a redemption shock without forcing a fire sale. liquidity_predictor fits a regression on a CSV of client attributes; predict_liquidity scores a single client against the persisted model.
```python
import joblib
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, root_mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# split BEFORE fit — leak prevention is a safety property, not a hyperparameter
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipe = Pipeline([
    ("prep", ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])),
    ("reg", LinearRegression()),
])

pipe.fit(X_train, y_train)  # fit ONLY on train
preds = pipe.predict(X_test)
rmse = root_mean_squared_error(y_test, preds)
r2 = r2_score(y_test, preds)
joblib.dump(pipe, "finance_output/models/liquidity_pipeline.joblib")
```
The target liquidity_risk is a continuous score in [0, 1] — the probability of being unable to service a near-term liquidity demand under stress. Bucketing it into LOW / MODERATE / HIGH for the UI is fine; bucketing it during training discards information. The model fits the score directly; the labels happen at presentation time. RMSE and R² report on the regression — confusion matrices would lie about a problem that isn't actually classification.
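A sketch of the presentation-time bucketing that paragraph describes; the 0.33/0.66 cut points are illustrative, not the shipped thresholds:

```python
def liquidity_label(score: float) -> str:
    """Bucket a continuous [0, 1] liquidity_risk score for the UI only.
    The model was fit on the raw score; labels never enter training."""
    if score < 0.33:
        return "LOW"
    if score < 0.66:
        return "MODERATE"
    return "HIGH"
```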
IQR cleaning happens before the train/test split, on numeric columns only. That's a defensible asymmetry: outlier removal is part of dataset hygiene, not part of the model's learned distribution. If you remove outliers post-split, your test set looks cleaner than production data and your evaluation lies. This is the kind of bug that ships a model.
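A hedged sketch of that hygiene step, using the standard 1.5×IQR Tukey fences (the helper name and signature are mine):

```python
import pandas as pd

def drop_iqr_outliers(df: pd.DataFrame, numeric_cols: list[str], k: float = 1.5) -> pd.DataFrame:
    """Remove rows outside the Tukey fences on any numeric column.
    Runs BEFORE train_test_split: hygiene on the dataset, not a fitted transform."""
    mask = pd.Series(True, index=df.index)
    for col in numeric_cols:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]
```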
Both /finance-analyst and /finance-pm call the same six tools. The difference is which number leads, what it's compared against, and what the next-step suggestion is. The persona is a re-ranking of attention, not a different model.
- /finance-analyst: frames each holding as a security under research coverage. Optimized for buy-side and sell-side note consumers.
- /finance-pm: frames each holding as a position in the book. Optimized for risk committee and LP reporting.
Both lenses run the identical math. The PM never sees a Sharpe-first table because that's not the question they were hired to answer. The analyst never sees a drawdown-first table because their reader doesn't own the book. Same data, different deliverable.
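A minimal sketch of what "a re-ranking of attention" means mechanically; the column orders are illustrative, and the shipped personas also differ in comparisons and next-step suggestions:

```python
METRIC_ORDER = {
    # same dict of metrics, different leading number per persona
    "finance-analyst": ["sharpe", "beta", "max_drawdown"],  # reader compares securities
    "finance-pm":      ["max_drawdown", "beta", "sharpe"],  # reader owns the book
}

def render_table(metrics: dict[str, dict], persona: str) -> str:
    cols = METRIC_ORDER[persona]
    header = "| Holding | " + " | ".join(cols) + " |"
    rows = [f"| {ticker} | " + " | ".join(f"{m[c]:.2f}" for c in cols) + " |"
            for ticker, m in metrics.items()]
    return "\n".join([header] + rows)
```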
Three providers sit behind a single DataProvider Protocol, sketched in code after the list below. The portfolio code does not know which one is running. Swap via the DATA_PROVIDER env var.
- yfinance: zero-config price history, returns, and the inputs to every metric on this page. Adequate for individual research, slow for cross-sectional work.
- Massive: 57 endpoints behind the same Protocol — stocks, options + Greeks, forex, crypto, indices, news, SEC filings, technicals, fundamentals, movers.
- Plaid: connect a brokerage in 20 seconds. Positions and cost basis from Fidelity, Schwab, E*TRADE, Vanguard, IBKR, Robinhood, and 12,000+ institutions.
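A hedged sketch of the Protocol seam, under assumptions: the real interface surely has more methods, and the method name and module paths here are illustrative, not the shipped ones.

```python
import os
from typing import Protocol

import pandas as pd

class DataProvider(Protocol):
    # the one capability every metric on this page actually needs
    def get_history(self, ticker: str, start: str, end: str) -> pd.DataFrame: ...

def load_provider() -> DataProvider:
    """Swap implementations via the DATA_PROVIDER env var; callers never branch."""
    name = os.environ.get("DATA_PROVIDER", "yfinance")
    if name == "yfinance":
        from finance_mcp.providers.yf import YFinanceProvider      # assumed module path
        return YFinanceProvider()
    if name == "massive":
        from finance_mcp.providers.massive import MassiveProvider  # assumed module path
        return MassiveProvider()
    from finance_mcp.providers.plaid import PlaidProvider          # assumed module path
    return PlaidProvider()
```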
The Plaid integration changes the audience. With yfinance and Massive, the user types tickers. With Plaid, the user authenticates a brokerage and the book arrives. Same downstream math; new top-of-funnel.
It's worth being explicit about what it isn't — no Portfolio class, no abstraction beyond a list of tickers and the tools that chain off it — because the absence of these things is the design, not an oversight.
What it is: a fast, inspectable, plain-English layer between a finance professional and the math they used to wait three weeks for IT to deliver.