Data
Current Files
| Path | Source | Purpose |
|---|---|---|
| | Commodity snapshot | Base LME price rows |
| | Commodity snapshot | Curated news events |
| | Generated | Joined price, news, and sentiment rows |
| | Generated | Per-commodity training data for single-asset bots |
| | Generated | PPO evaluation outputs used by the trading bots gym |
Only data/raw/prices.csv and data/raw/news.csv need to be checked in. All other paths are generated and ignored by Git.
Primary Dashboard Schema
prices_with_sentiment.csv is the dashboard’s main fact table:
date,commodity,price,news_ids,news_count,news_items,news_summary,negative,neutral,positive,sentiment_score,finbert_negative,finbert_neutral,finbert_positive,finbert_sentiment_score,finbert_label
news_events.csv is a normalized event table generated from raw news:
event_id,date,event_day,title,url,impacted_commodities,summary
impacted_commodities is a semicolon-separated list of canonical slugs. This preserves the real relationship that one news item can affect multiple assets.
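As a minimal sketch of how a consumer might read that field, the snippet below splits one impacted_commodities cell into slugs. The helper name parse_impacted_commodities and the sample row values are hypothetical, and tolerating stray whitespace around the separators is an assumption rather than documented behavior.

```python
def parse_impacted_commodities(cell: str) -> list[str]:
    """Split a semicolon-separated cell into trimmed, non-empty slugs."""
    return [slug.strip() for slug in cell.split(";") if slug.strip()]


# Hypothetical row from news_events.csv; only the column names come from the schema.
row = {"event_id": "evt-001", "impacted_commodities": "copper_lme;nickel_lme"}

slugs = parse_impacted_commodities(row["impacted_commodities"])
print(slugs)
```

Keeping the list in one cell means one event stays one row, and the fan-out to multiple assets happens only when a consumer needs it.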
news_items stores the full list of matched news objects as JSON inside the generated CSV row, so a price row with more than one relevant news item keeps all of them. news_summary remains a compact combined text field for sentiment scoring and quick chart summaries.
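The round trip of a JSON list through a CSV cell can be sketched as below. The keys inside each news object (event_id, title) and all sample values are assumptions for illustration; the real objects may carry different fields. Only a subset of the fact-table columns is written, to keep the example short.

```python
import csv
import io
import json

# Hypothetical matched news objects for one price row.
items = [
    {"event_id": "evt-001", "title": "Example headline A"},
    {"event_id": "evt-002", "title": "Example headline B"},
]

# Write one row; the csv module quotes the JSON cell so its commas and quotes survive.
buffer = io.StringIO()
fieldnames = ["date", "commodity", "price", "news_count", "news_items"]
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerow({
    "date": "2024-01-02",
    "commodity": "copper_lme",
    "price": "8500.0",
    "news_count": len(items),
    "news_items": json.dumps(items),
})

# Read it back and decode the JSON cell into a list of dicts.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
parsed = json.loads(rows[0]["news_items"])
print(len(parsed), parsed[0]["title"])
```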
The lightweight negative, neutral, positive, and sentiment_score fields are retained for MVP compatibility and existing PPO configs. The finbert_* fields are generated with ProsusAI/finbert. The pipeline scores each normalized news event once, caches those outputs in data/processed/finbert_event_sentiment.csv, and averages all linked event scores onto each price row.
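The averaging step can be sketched as follows, assuming the cache has already been loaded into a dict keyed by event_id. The function name, the sample probabilities, the choice to return None when no event is linked, and the definition of finbert_sentiment_score as positive minus negative are all assumptions, not the pipeline's confirmed behavior.

```python
from statistics import fmean

# Hypothetical cached per-event FinBERT probabilities, keyed by event_id.
event_scores = {
    "evt-001": {"negative": 0.7, "neutral": 0.2, "positive": 0.1},
    "evt-002": {"negative": 0.1, "neutral": 0.3, "positive": 0.6},
}


def average_event_scores(event_ids, cache):
    """Average the cached scores of all linked events onto one price row."""
    linked = [cache[event_id] for event_id in event_ids if event_id in cache]
    if not linked:
        return None  # assumption: rows without linked news get no FinBERT values
    averaged = {
        f"finbert_{label}": fmean(scores[label] for scores in linked)
        for label in ("negative", "neutral", "positive")
    }
    # Assumption: the combined score is positive minus negative probability.
    averaged["finbert_sentiment_score"] = (
        averaged["finbert_positive"] - averaged["finbert_negative"]
    )
    return averaged


row_scores = average_event_scores(["evt-001", "evt-002"], event_scores)
print(row_scores)
```

Scoring each event once and reusing the cached result means an event linked to several price rows never triggers a second model call.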
The loader maps commodities into canonical slugs:
copper_lme
nickel_lme
aluminium_lme
Agent Output Schema
The dashboard normalizes single-asset and multi-asset PPO outputs into one UI shape:
date, commodity, action, prob_hold, prob_buy, prob_sell, entropy, net_worth, reward
For the trading bots gym layer, opacity is derived from decision confidence:
confidence = max(prob_hold, prob_buy, prob_sell)
Lower opacity means the agent was more uncertain.
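A small sketch of that derivation follows. The confidence formula is the one given above; the mapping from confidence to opacity is an assumption, since the exact scaling is not specified here. This version rescales linearly from the lowest possible confidence for three actions (1/3) to a hypothetical minimum opacity of 0.25.

```python
def decision_confidence(prob_hold: float, prob_buy: float, prob_sell: float) -> float:
    """Confidence is the probability of the most likely action."""
    return max(prob_hold, prob_buy, prob_sell)


def confidence_to_opacity(confidence: float, min_opacity: float = 0.25) -> float:
    """Map confidence in [1/3, 1] linearly onto [min_opacity, 1] (assumed scaling)."""
    floor = 1 / 3  # a uniform policy over three actions has confidence 1/3
    scaled = (confidence - floor) / (1 - floor)
    scaled = min(1.0, max(0.0, scaled))  # clamp against rounding or bad inputs
    return min_opacity + (1 - min_opacity) * scaled


confident = confidence_to_opacity(decision_confidence(0.05, 0.90, 0.05))
uncertain = confidence_to_opacity(decision_confidence(0.34, 0.33, 0.33))
print(confident, uncertain)
```

With this scaling, a near-uniform action distribution renders close to the minimum opacity while a decisive action renders almost fully opaque.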
The agent-output loader discovers files rather than relying on a fixed split count. Supported filename patterns are:
data/agent_outputs/single_asset_ppo/evaluation_<commodity>_split_<n>.csv
data/agent_outputs/single_asset_ppo/full_dataset_predictions_<commodity>_split_<n>.csv
data/agent_outputs/multiple_asset_ppo/evaluation_split_<n>_multi_asset_<mode>.csv
data/agent_outputs/multiple_asset_ppo/evaluation_full_dataset_split_<n>_multi_asset_<mode>.csv
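One way to discover files by pattern is to classify each path with a regular expression per layout, as sketched below. The function name classify_output, the returned field names, and the sample mode value are hypothetical; the actual loader may be written differently. In a real checkout the candidate paths could come from a directory glob instead of a hard-coded list.

```python
import re
from pathlib import PurePosixPath

# (parent folder, output kind, filename pattern) for each supported layout.
PATTERNS = [
    ("single_asset_ppo", "evaluation",
     re.compile(r"^evaluation_(?P<commodity>.+)_split_(?P<split>\d+)\.csv$")),
    ("single_asset_ppo", "full_dataset",
     re.compile(r"^full_dataset_predictions_(?P<commodity>.+)_split_(?P<split>\d+)\.csv$")),
    ("multiple_asset_ppo", "evaluation",
     re.compile(r"^evaluation_split_(?P<split>\d+)_multi_asset_(?P<mode>.+)\.csv$")),
    ("multiple_asset_ppo", "full_dataset",
     re.compile(r"^evaluation_full_dataset_split_(?P<split>\d+)_multi_asset_(?P<mode>.+)\.csv$")),
]


def classify_output(path: str):
    """Return parsed metadata for a supported agent-output path, else None."""
    parsed_path = PurePosixPath(path)
    for folder, kind, pattern in PATTERNS:
        if parsed_path.parent.name != folder:
            continue
        match = pattern.match(parsed_path.name)
        if match:
            info = match.groupdict()
            info["split"] = int(info["split"])
            return {"layout": folder, "kind": kind, **info}
    return None


# Hypothetical file names; any split number is accepted, not a fixed count.
candidates = [
    "data/agent_outputs/single_asset_ppo/evaluation_copper_lme_split_3.csv",
    "data/agent_outputs/multiple_asset_ppo/evaluation_full_dataset_split_7_multi_asset_example.csv",
    "data/agent_outputs/single_asset_ppo/notes.txt",
]
discovered = [info for info in map(classify_output, candidates) if info is not None]
print(discovered)
```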
Full-dataset diagnostic outputs include a phase column. The trading bots gym plots the full series and uses that field to draw a vertical marker at the transition from training history to test period.
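Locating that transition can be sketched as a scan for the first row whose phase differs from the previous row. The phase labels "train" and "test", the helper name, and the sample rows are assumptions; the real outputs may use different values.

```python
def find_phase_transition(rows):
    """Return the date of the first row whose phase differs from the row before it."""
    for previous, current in zip(rows, rows[1:]):
        if previous["phase"] != current["phase"]:
            return current["date"]
    return None  # a single-phase series has no transition to draw


# Hypothetical full-dataset rows, already sorted by date.
series = [
    {"date": "2024-01-02", "phase": "train"},
    {"date": "2024-01-03", "phase": "train"},
    {"date": "2024-01-04", "phase": "test"},
    {"date": "2024-01-05", "phase": "test"},
]

transition_date = find_phase_transition(series)
print(transition_date)
```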
Refresh Rule
Keep raw data snapshots in data/raw/. Generate processed visualization data and bot training data with:
npm run preprocess
Keep transformation logic in agentic_trading/preprocessing.py, src/lib/data, and src/lib/analytics. Avoid embedding data assumptions directly inside React components.