Six MCP tools compose a portfolio surface: per-holding risk decomposition, real-vs-apparent diversification, ML-based liquidity scoring, and two professional lenses on the same data. The math is from the textbook. The interface is plain English.
Notes from Surendra Singh — 15 years building portfolio and risk systems at Merrill Lynch, Fidelity, and beyond. Last revised April 2026.
There is no Portfolio class. There doesn't need to be. The smallest useful primitive is a list of tickers. Every analytic — Sharpe per holding, correlation across the book, drawdown in a stress window, allocation drift, beta to S&P — chains off that primitive and the price history each ticker carries.
```python
# every portfolio operation reduces to one of these three calls
from finance_mcp.tools import (
    get_risk_metrics,  # per-holding: Sharpe, max drawdown, beta vs S&P
    compare_tickers,   # cross-holding: normalized cumulative performance
    correlation_map,   # cross-holding: pairwise return correlation
)

# public signatures, exactly as shipped
get_risk_metrics(ticker: str, start: str, end: str = "") -> str
compare_tickers(tickers: str, start: str, end: str = "") -> str
correlation_map(tickers: str, start: str, end: str = "") -> str
```
Strings in, strings out. The "string out" is deliberate — the tool returns a written interpretation alongside the numbers, because a Sharpe of 1.42 means nothing to a director who hasn't done the math in a decade. The interpretation is part of the deliverable.
Three numbers per ticker, computed from 252 trading days of adjusted close. Sharpe annualizes daily excess return over its standard deviation. Max drawdown is the worst peak-to-trough on the cumulative wealth curve. Beta is the OLS slope of the holding's returns against the S&P 500 — covariance over benchmark variance, nothing fancy.
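In symbols, writing $r_t$ for the holding's daily excess return, $r_t^m$ for the benchmark's, and $W_t = \prod_{s \le t}(1 + r_s)$ for the cumulative wealth curve, the three metrics are:

$$
\text{Sharpe} = \sqrt{252}\,\frac{\bar{r}}{\sigma_r},
\qquad
\text{MDD} = \min_t \frac{W_t - \max_{s \le t} W_s}{\max_{s \le t} W_s},
\qquad
\beta = \frac{\operatorname{Cov}(r, r^m)}{\operatorname{Var}(r^m)}
$$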
A four-name book — AAPL (tech), JPM (financials), JNJ (healthcare), XOM (energy) — over the trailing year:
| Holding | Sharpe | Max Drawdown | Beta vs S&P | Read |
|---|---|---|---|---|
| AAPL | 1.42 | −14.2% | 1.12 | High Sharpe, tech beta > 1 |
| JPM | 1.18 | −9.7% | 1.08 | Cyclicality without tech vol |
| JNJ | 0.64 | −7.1% | 0.55 | Defensive ballast |
| XOM | 0.91 | −11.4% | 0.72 | Commodity-coupled, low β |
```python
import numpy as np
import pandas as pd

def _compute_risk_metrics(returns: pd.Series, benchmark_returns: pd.Series) -> dict:
    """Pure computation — no I/O, no side effects. Testable in isolation."""
    # annualized Sharpe: mean daily excess return over its std, scaled by sqrt(252)
    sharpe = (returns.mean() / returns.std()) * np.sqrt(252)
    # max drawdown: worst peak-to-trough on the cumulative wealth curve
    wealth = (1 + returns).cumprod()
    max_drawdown = ((wealth - wealth.cummax()) / wealth.cummax()).min()
    # beta: OLS slope vs benchmark = covariance over benchmark variance
    aligned = pd.concat([returns, benchmark_returns], axis=1).dropna()
    cov = np.cov(aligned.iloc[:, 0], aligned.iloc[:, 1])
    beta = float(cov[0, 1] / cov[1, 1])
    return {"sharpe": float(sharpe), "max_drawdown": float(max_drawdown), "beta": beta}
```
The pure-function split (_compute_risk_metrics takes Series, returns dict — no I/O) is the single most useful pattern in the codebase. It makes every metric unit-testable without touching yfinance, and means the same function works whether the data came from yfinance, Massive, or a CSV the user dropped in. Most finance tooling fails this test.
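A minimal illustration of the payoff, assuming `_compute_risk_metrics` is importable from the tool module (the seed, magnitudes, and assertions below are mine, not the shipped test suite):

```python
import numpy as np
import pandas as pd

from finance_mcp.tools import _compute_risk_metrics  # assumed import path

def test_compute_risk_metrics_is_pure():
    rng = np.random.default_rng(0)
    # synthetic daily returns: no provider, no network, just Series in, dict out
    returns = pd.Series(rng.normal(0.0005, 0.010, 252))
    benchmark = pd.Series(rng.normal(0.0004, 0.008, 252))
    out = _compute_risk_metrics(returns, benchmark)
    assert set(out) == {"sharpe", "max_drawdown", "beta"}
    assert out["max_drawdown"] <= 0   # a drawdown is peak-to-trough, never positive
    assert np.isfinite(out["beta"])
```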
The most expensive mistake in retail and prosumer portfolios is mistaking apparent diversification (six tickers, six logos) for real diversification (six low-correlation return streams). The number that matters is pairwise correlation. correlation_map renders it as a heatmap and prints the matrix.
"A 4-ticker book with average pairwise ρ = 0.31 is genuinely diversified. ρ = 0.78 is one bet, in different names."
For the AAPL / JPM / JNJ / XOM book above: AAPL ↔ JPM correlates around 0.45, AAPL ↔ JNJ at 0.31, AAPL ↔ XOM at 0.28. JNJ and XOM provide the genuine downside protection — not because they were picked to, but because their return streams happen to decouple from tech in stress windows. That's a structural fact about the book, surfaced in one tool call.
The PM-lens interpretation that /finance-pm writes underneath the heatmap reads something like: "Cross-correlation between JNJ and AAPL is 0.31 — meaningful diversification. The tech-financials block (AAPL, JPM at 0.45) carries most of your equity-beta risk. Adding more tech raises beta without raising Sharpe." A senior analyst could write that. The point is that no one has to.
Risk is not just price risk. For private wealth and FP&A use cases, the bigger question is liquidity: which clients (or which positions) can absorb a redemption shock without forcing a fire sale. liquidity_predictor fits a regression on a CSV of client attributes; predict_liquidity scores a single client against the persisted model.
```python
import joblib
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, root_mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# split BEFORE fit — leak prevention is a safety property, not a hyperparameter
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipe = Pipeline([
    ("prep", ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])),
    ("reg", LinearRegression()),
])

pipe.fit(X_train, y_train)  # fit ONLY on train
preds = pipe.predict(X_test)
rmse = root_mean_squared_error(y_test, preds)
r2 = r2_score(y_test, preds)
joblib.dump(pipe, "finance_output/models/liquidity_pipeline.joblib")
```
The target liquidity_risk is a continuous score in [0, 1] — the probability of being unable to service a near-term liquidity demand under stress. Bucketing it into LOW / MODERATE / HIGH for the UI is fine; bucketing it during training discards information. The model fits the score directly; the labels happen at presentation time. RMSE and R² report on the regression — confusion matrices would lie about a problem that isn't actually classification.
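A sketch of the presentation-time bucketing that paragraph describes; the 0.33/0.66 cut points are illustrative, not the shipped thresholds:

```python
def liquidity_label(score: float) -> str:
    """Bucket a continuous [0, 1] liquidity_risk score for the UI only.
    The model was fit on the raw score; labels never enter training."""
    if score < 0.33:
        return "LOW"
    if score < 0.66:
        return "MODERATE"
    return "HIGH"
```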
IQR cleaning happens before the train/test split, on numeric columns only. That's a defensible asymmetry: outlier removal is part of dataset hygiene, not part of the model's learned distribution. If you remove outliers post-split, your test set looks cleaner than production data and your evaluation lies. This is the kind of bug that ships a model.
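A hedged sketch of that hygiene step, using the standard 1.5×IQR Tukey fences (the helper name and signature are mine):

```python
import pandas as pd

def drop_iqr_outliers(df: pd.DataFrame, numeric_cols: list[str], k: float = 1.5) -> pd.DataFrame:
    """Remove rows outside the Tukey fences on any numeric column.
    Runs BEFORE train_test_split: hygiene on the dataset, not a fitted transform."""
    mask = pd.Series(True, index=df.index)
    for col in numeric_cols:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]
```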
Both /finance-analyst and /finance-pm call the same six tools. The difference is which number leads, what it's compared against, and what the next-step suggestion is. The persona is a re-ranking of attention, not a different model.
- /finance-analyst: frames each holding as a security under research coverage. Optimized for buy-side and sell-side note consumers.
- /finance-pm: frames each holding as a position in the book. Optimized for risk committee and LP reporting.
Both lenses run the identical math. The PM never sees a Sharpe-first table because that's not the question they were hired to answer. The analyst never sees a drawdown-first table because their reader doesn't own the book. Same data, different deliverable.
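A minimal sketch of what "a re-ranking of attention" means mechanically; the column orders are illustrative, and the shipped personas also differ in comparisons and next-step suggestions:

```python
METRIC_ORDER = {
    # same dict of metrics, different leading number per persona
    "finance-analyst": ["sharpe", "beta", "max_drawdown"],  # reader compares securities
    "finance-pm":      ["max_drawdown", "beta", "sharpe"],  # reader owns the book
}

def render_table(metrics: dict[str, dict], persona: str) -> str:
    cols = METRIC_ORDER[persona]
    header = "| Holding | " + " | ".join(cols) + " |"
    rows = [f"| {ticker} | " + " | ".join(f"{m[c]:.2f}" for c in cols) + " |"
            for ticker, m in metrics.items()]
    return "\n".join([header] + rows)
```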
Three providers sit behind a single DataProvider Protocol, sketched in code after the list below. The portfolio code does not know which one is running. Swap via the DATA_PROVIDER env var.
- yfinance: zero-config price history, returns, and the inputs to every metric on this page. Adequate for individual research, slow for cross-sectional work.
- Massive: 57 endpoints behind the same Protocol — stocks, options + Greeks, forex, crypto, indices, news, SEC filings, technicals, fundamentals, movers.
- Plaid: connect a brokerage in 20 seconds. Positions and cost basis from Fidelity, Schwab, E*TRADE, Vanguard, IBKR, Robinhood, and 12,000+ institutions.
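A hedged sketch of the Protocol seam, under assumptions: the real interface surely has more methods, and the method name and module paths here are illustrative, not the shipped ones.

```python
import os
from typing import Protocol

import pandas as pd

class DataProvider(Protocol):
    # the one capability every metric on this page actually needs
    def get_history(self, ticker: str, start: str, end: str) -> pd.DataFrame: ...

def load_provider() -> DataProvider:
    """Swap implementations via the DATA_PROVIDER env var; callers never branch."""
    name = os.environ.get("DATA_PROVIDER", "yfinance")
    if name == "yfinance":
        from finance_mcp.providers.yf import YFinanceProvider      # assumed module path
        return YFinanceProvider()
    if name == "massive":
        from finance_mcp.providers.massive import MassiveProvider  # assumed module path
        return MassiveProvider()
    from finance_mcp.providers.plaid import PlaidProvider          # assumed module path
    return PlaidProvider()
```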
The Plaid integration changes the audience. With yfinance and Massive, the user types tickers. With Plaid, the user authenticates a brokerage and the book arrives. Same downstream math; new top-of-funnel.
It's worth being explicit about what it isn't — no Portfolio class, no abstraction beyond a list of tickers and the tools that chain off it — because the absence of these things is the design, not an oversight.
What it is: a fast, inspectable, plain-English layer between a finance professional and the math they used to wait three weeks for IT to deliver.