Data Layer — Finance

§ 01

The Protocol.

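from typing import Protocol

from pandas import DataFrame, Series

# Structural interface: providers satisfy it by having these methods, not by inheriting from it.
class DataProvider(Protocol):
    def fetch_price_history(self, ticker: str, start: str, end: str | None = None) -> DataFrame: ...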
    def get_adjusted_prices(self, df: DataFrame) -> Series: ...
    def get_options_chain(self, ticker: str, expiry: str) -> DataFrame: ...
    def get_news(self, ticker: str | None, limit: int) -> list[dict]: ...
    # ... and so on, one method per capability

Python's Protocol provides structural typing: any object with the right methods satisfies the contract, no inheritance required. The yfinance, Massive, and Plaid providers are independent classes, and none of them inherits from DataProvider. Each one satisfies it simply because it has the methods.

Design note

Inheritance would have been a mistake. yfinance and Massive have nothing in common at the implementation level: different auth, different serialization, different error semantics. Protocol-based duck typing lets each provider be optimized for its own internals while remaining substitutable. That is what a "good interface" actually means.

§ 02

The three providers, today.

✓ Default · Shipped

yfinance

Zero-config. Public Yahoo endpoints. Adequate for individual research, slow for cross-sectional work, rate-limited under load.

✓ Shipped

Massive

57 endpoints behind the same Protocol — stocks, options + Greeks, forex, crypto, indices, news, SEC filings, technicals, fundamentals, movers.

→ Roadmap v1.5

Plaid

Connect a brokerage in 20 seconds. Positions and cost basis from Fidelity, Schwab, E*TRADE, Vanguard, IBKR, Robinhood, and 12,000+ institutions.

# swap via environment variable, no code change
DATA_PROVIDER=yfinance     # default
DATA_PROVIDER=massive      # requires MASSIVE_API_KEY
DATA_PROVIDER=plaid        # v1.5, requires PLAID_CLIENT_ID + PLAID_SECRET

§ 03

yfinance — the right default for the wrong reasons.

yfinance is not the most reliable data source. It is rate-limited, occasionally serves stale data, and depends on Yahoo's web endpoints, which Yahoo can break at any time without notice. It is the right default anyway, because it needs zero config, costs nothing, and works for someone trying the tool for the first time. The bar for "default" is "does the README example work without an API key in 90 seconds?" yfinance clears it. Nothing else does.

The consequence: yfinance is for evaluation and individual analyst work. Once the workload becomes cross-sectional or moves to production, the recommendation is to switch to Massive. Because every provider implements the same Protocol, switching is a single environment variable.

§ 04

Massive — the production layer.

Massive is wrapped by a thin client (finance_mcp/providers/massive/client.py) plus a set of mappers that translate Massive's response shapes into the types the Protocol expects. The split matters:

client.py — HTTP, auth, retries, JSON parsing. Knows about Massive.
mappers.py — translates Massive's payloads into the canonical pandas shapes. Knows about both.
provider.py — implements the Protocol methods by composing client + mappers. Knows only about the Protocol.

If Massive changes their schema tomorrow, only the mappers change. The tools and the Protocol are insulated. This is what "vendor risk mitigation" looks like at the code level.
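A minimal sketch of the three-way split, with a canned client so it runs offline. The payload keys (`results`, `t`, `o`, and so on) and all class and function names are assumptions for illustration, not Massive's documented response format or the project's code. To stay dependency-free it returns a list of row dicts; a real mapper would pass those rows to `pandas.DataFrame`, which accepts a list of dicts.

```python
from dataclasses import dataclass
from typing import Any


class FakeMassiveClient:
    """Stands in for client.py (in reality: HTTP, auth, retries, JSON parsing)."""

    def get_daily_bars(self, ticker: str, start: str, end: str) -> dict[str, Any]:
        # Hypothetical payload shape; NOT Massive's documented schema.
        return {
            "ticker": ticker,
            "results": [
                {"t": "2024-01-02", "o": 10.0, "h": 11.0, "l": 9.5, "c": 10.5, "v": 1200},
                {"t": "2024-01-03", "o": 10.5, "h": 10.8, "l": 10.1, "c": 10.2, "v": 900},
            ],
        }


def map_bars(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Stands in for mappers.py: vendor keys -> canonical column names."""
    rename = {"t": "date", "o": "open", "h": "high", "l": "low", "c": "close", "v": "volume"}
    return [
        {rename[k]: v for k, v in bar.items() if k in rename}
        for bar in payload.get("results", [])
    ]


@dataclass
class MassiveProviderSketch:
    """Stands in for provider.py: composes client + mapper, sees only canonical rows."""

    client: FakeMassiveClient

    def fetch_price_history(self, ticker: str, start: str, end: str) -> list[dict[str, Any]]:
        return map_bars(self.client.get_daily_bars(ticker, start, end))


rows = MassiveProviderSketch(FakeMassiveClient()).fetch_price_history("AAPL", "2024-01-02", "2024-01-03")
print(rows[0]["close"])  # 10.5
```

If the vendor renamed `c` to something else, only the `rename` table in the mapper would change; the provider and every caller would be untouched.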

Massive provider: internal layout

File             Responsibility                                    Size
client.py        HTTP, auth, retries                               ~120 LOC
mappers.py       Massive payload → pandas / dict                   ~200 LOC
stocks.py        price history, ticker info, splits, dividends     ~150 LOC
options.py       chain + Greeks                                    ~80 LOC
currencies.py    FX rates and crypto                               ~60 LOC
indices.py       benchmark snapshots                               ~40 LOC
provider.py      Protocol composition                              ~100 LOC

§ 05

Plaid (v1.5) — the audience shift.

yfinance and Massive give the user a way to type tickers and get math. Plaid changes the top of the funnel: the user authenticates a brokerage, and the book arrives. Same downstream tools. New audience — the analyst who never typed tickers because that wasn't the bottleneck.

The Plaid integration adds two Protocol methods:

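from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class Holding:
    ticker:     str
    shares:     float
    cost_basis: float
    account:    str

class DataProvider(Protocol):
    # ... the § 01 methods, plus:
    def get_positions(self, account_id: str) -> list[Holding]: ...
    def get_cost_basis(self, account_id: str, ticker: str) -> float: ...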

Read-only. No order entry, no execution, no custody. The integration is an account-linking convenience layer — not a brokerage.
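Because Plaid only changes how the book arrives, downstream code sees plain `Holding` records. As a hypothetical example of such downstream use, the sketch below merges the same ticker held across several linked accounts. It assumes `cost_basis` is the total cost of a position rather than a per-share figure (the text above does not say which); `consolidate` and the account labels are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Holding:
    ticker: str
    shares: float
    cost_basis: float  # ASSUMPTION: total cost of the position, not per share
    account: str


def consolidate(holdings: list[Holding]) -> dict[str, dict[str, float]]:
    """Hypothetical helper: merge positions for the same ticker across accounts."""
    totals: dict[str, dict[str, float]] = defaultdict(lambda: {"shares": 0.0, "cost_basis": 0.0})
    for h in holdings:
        totals[h.ticker]["shares"] += h.shares
        totals[h.ticker]["cost_basis"] += h.cost_basis
    for row in totals.values():
        # Average cost per share; guard against a zero-share position.
        row["avg_cost"] = row["cost_basis"] / row["shares"] if row["shares"] else 0.0
    return dict(totals)


book = [
    Holding("AAPL", 10, 1500.0, "fidelity-ira"),
    Holding("AAPL", 5, 900.0, "schwab-taxable"),
    Holding("VTI", 20, 4000.0, "vanguard-401k"),
]
merged = consolidate(book)
print(merged["AAPL"])  # {'shares': 15.0, 'cost_basis': 2400.0, 'avg_cost': 160.0}
```

The helper never touches Plaid itself, which is the point: the same function would work on holdings typed in by hand.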

§ 06

What this page is not.

Not a market data vendor. The library does not republish prices. Each provider keeps its own license and rate limits.
Not a tick database. Daily bars only. Intraday is a different problem with different storage and a different user.
Not a normalization layer for everything. The Protocol covers the methods the tools actually use. Adding a method just because some provider has it is how interfaces rot.