Data
====

Current Files
-------------

.. list-table::
   :header-rows: 1

   * - Path
     - Source
     - Purpose
   * - ``data/raw/prices.csv``
     - Commodity snapshot
     - Base LME price rows
   * - ``data/raw/news.csv``
     - Commodity snapshot
     - Curated news events
   * - ``data/processed/*.csv``
     - Generated
     - Joined price, news, and sentiment rows
   * - ``data/training/commodity_outputs/*.csv``
     - Generated
     - Per-commodity training data for single-asset bots
   * - ``data/agent_outputs/**/*.csv``
     - Generated
     - PPO evaluation outputs used by the trading bots gym

Only ``data/raw/prices.csv`` and ``data/raw/news.csv`` are required checked-in data. The other paths are generated and ignored by Git.

Primary Dashboard Schema
------------------------

``prices_with_sentiment.csv`` is the dashboard's main fact table:

.. code-block:: text

   date,commodity,price,news_ids,news_count,news_items,news_summary,negative,neutral,positive,sentiment_score,finbert_negative,finbert_neutral,finbert_positive,finbert_sentiment_score,finbert_label

``news_events.csv`` is a normalized event table generated from raw news:

.. code-block:: text

   event_id,date,event_day,title,url,impacted_commodities,summary

``impacted_commodities`` is a semicolon-separated list of canonical slugs. This preserves the real relationship that one news item can affect multiple assets.

``news_items`` stores the full list of matched news objects as JSON inside the generated CSV row. This handles the case where one price row has more than one relevant news item, while ``news_summary`` remains a compact combined text field for sentiment scoring and quick chart summaries.

The lightweight ``negative``, ``neutral``, ``positive``, and ``sentiment_score`` fields are retained for MVP compatibility and existing PPO configs.

The ``finbert_*`` fields are generated with ``ProsusAI/finbert``. The pipeline scores each normalized news event once, caches those outputs in ``data/processed/finbert_event_sentiment.csv``, and averages all linked event scores onto each price row.
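The embedded-JSON and semicolon-list designs above can be exercised with only the standard library. A minimal sketch, using made-up field values (the ``event_id`` and ``title`` values are illustrative, not from the real snapshot, and the columns are trimmed for brevity):

```python
import csv
import io
import json

# One illustrative row shaped like prices_with_sentiment.csv (columns
# trimmed; the values are made up, not from the real data).
price_csv = (
    "date,commodity,news_items,news_summary\n"
    "2024-01-02,copper_lme,"
    '"[{""event_id"": ""e1"", ""title"": ""Smelter strike""},'
    ' {""event_id"": ""e2"", ""title"": ""Export tariff""}]",'
    '"Smelter strike; Export tariff"\n'
)

row = next(csv.DictReader(io.StringIO(price_csv)))

# news_items is a JSON array embedded in a single CSV cell, so one price
# row can carry any number of matched news objects.
items = json.loads(row["news_items"])
print(len(items))         # 2
print(items[1]["title"])  # Export tariff

# impacted_commodities in news_events.csv is a semicolon-separated list
# of canonical slugs; one event can touch several assets.
slugs = "copper_lme;aluminium_lme".split(";")
print(slugs)              # ['copper_lme', 'aluminium_lme']
```

Doubling the quotes inside the quoted CSV cell is standard CSV escaping, which is why valid JSON survives the round trip through the generated file.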
The loader maps commodities into canonical slugs:

* ``copper_lme``
* ``nickel_lme``
* ``aluminium_lme``

Agent Output Schema
-------------------

The dashboard normalizes single-asset and multi-asset PPO outputs into one UI shape:

.. code-block:: text

   date, commodity, action, prob_hold, prob_buy, prob_sell, entropy, net_worth, reward

For the trading bots gym layer, opacity is derived from decision confidence:

.. code-block:: text

   confidence = max(prob_hold, prob_buy, prob_sell)

Lower opacity means the agent was more uncertain.

The agent-output loader discovers files rather than relying on a fixed split count. Supported filename patterns are:

.. code-block:: text

   data/agent_outputs/single_asset_ppo/evaluation__split_.csv
   data/agent_outputs/single_asset_ppo/full_dataset_predictions__split_.csv
   data/agent_outputs/multiple_asset_ppo/evaluation_split__multi_asset_.csv
   data/agent_outputs/multiple_asset_ppo/evaluation_full_dataset_split__multi_asset_.csv

Full-dataset diagnostic outputs include a ``phase`` column. The trading bots gym plots the full series and uses that field to draw the vertical transition from training history to the test period.

Refresh Rule
------------

Keep raw data snapshots in ``data/raw/``. Generate processed visualization data and bot training data with:

.. code-block:: bash

   npm run preprocess

Keep transformation logic in ``agentic_trading/preprocessing.py``, ``src/lib/data``, and ``src/lib/analytics``. Avoid embedding data assumptions directly inside React components.
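As a worked example of two rules from the Agent Output Schema section — discover outputs by globbing ``data/agent_outputs/**/*.csv`` rather than assuming a fixed split count, and derive chart opacity from decision confidence — here is a Python sketch. Only ``confidence = max(prob_hold, prob_buy, prob_sell)`` and "lower confidence means lower opacity" are specified above; the linear ``opacity`` rescale and its ``0.3`` floor are illustrative assumptions, and the real transformation code belongs in ``src/lib``, not in components.

```python
import glob

def discover_agent_outputs(root: str = "data/agent_outputs") -> list[str]:
    """Find every PPO output CSV under root instead of hard-coding splits."""
    return sorted(glob.glob(f"{root}/**/*.csv", recursive=True))

def confidence(prob_hold: float, prob_buy: float, prob_sell: float) -> float:
    """Decision confidence is the largest of the three action probabilities."""
    return max(prob_hold, prob_buy, prob_sell)

def opacity(conf: float, floor: float = 0.3) -> float:
    """Map confidence in [1/3, 1] onto opacity in [floor, 1].

    Assumption for illustration: a linear rescale with a visibility floor.
    A uniform action distribution (confidence 1/3) gets the floor opacity;
    full certainty (confidence 1.0) renders fully opaque.
    """
    lo = 1.0 / 3.0  # a 3-action policy's max probability is at least 1/3
    return floor + (1.0 - floor) * (conf - lo) / (1.0 - lo)

print(confidence(0.2, 0.5, 0.3))   # 0.5
print(round(opacity(1.0), 6))      # 1.0
```

Keeping the floor above zero means even the most uncertain decisions stay visible on the chart while still reading as tentative.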