Architecture

The repository is organized around one simple rule: generation logic stays in Python, while the Next.js app only loads and visualizes derived files.

Layers

Python pipeline

agentic_trading/ contains generation code:

pipeline_common.py: shared config, CSV, and split helpers
preprocessing.py: raw prices/news to derived dashboard and training data
finbert_sentiment.py: FinBERT scoring and cache helpers
prediction_features.py: shared lagged-price, rolling, time, calendar, and sentiment feature builder
prediction_baseline.py, prediction_arimax.py, prediction_gaussian_process.py, prediction_ridge_arx.py, prediction_lightgbm.py, prediction_lightgbm_direct.py, and prediction_lstm.py: forecast generators
training/: PPO training modules

This is the main refactor boundary for reproducibility. Forecasting and training modules should not own UI logic, plotting logic, or browser-only assumptions.

App-side loading

src/lib/data/ owns local file loading and normalization for the dashboard. The React components consume typed data objects rather than reading CSV files directly.

The main loaders are:

loaders.ts
agent-loaders.ts
prediction-loaders.ts

Analytics helpers

src/lib/analytics/ contains deterministic helpers for:

commodity normalization and metadata
chart viewport math
signal summaries
news source labels

UI

src/components/dashboard/ contains reusable chart and control components. The dashboard owns interaction state such as:

selected commodity
shared viewport range
marker and line styling
panel open/closed state

The UI should stay thin: it reads generated files and renders them.

Current modularity improvements

The main cleanup applied in this review was:

shared Python pipeline utilities moved into agentic_trading/pipeline_common.py
preprocessing config coercion moved into a typed PreprocessingConfig
duplicate prediction config removed
multi-asset loader stopped hardcoding a fixed commodity list and now discovers commodity price columns from the output schema
docs were rewritten to match the current four-chart dashboard

Non-goals

Still intentionally out of scope:

in-browser retraining
a backend API service
database-backed persistence
notebook-style plotting embedded in training code