Data
Source boundary
The only required source files are:
data/raw/prices.csv
data/raw/news.csv
Everything else is either generated or a committed demo snapshot.
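A quick pre-flight check can confirm that both required inputs are present before preprocessing. The sketch below is illustrative only: the helper name `missing_sources` is hypothetical and not part of the repository, and it assumes the paths are relative to the repository root.

```python
from pathlib import Path

# Required raw inputs as listed above, relative to the repository root (assumed).
REQUIRED_SOURCES = ("data/raw/prices.csv", "data/raw/news.csv")


def missing_sources(repo_root="."):
    """Return the required source paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_SOURCES if not (root / rel).is_file()]
```

Calling `missing_sources()` from the repository root returns an empty list when both files are in place.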
Generated and committed artifacts
Some derived files are intentionally committed because the deployed static demo depends on them:
data/processed/finbert_event_sentiment.csv
data/agent_outputs/**
data/prediction_outputs/**
Other derived files are regenerated during preprocessing or local training runs.
Primary derived tables
data/processed/news_events.csv
Normalized event table generated from raw news:
event_id,date,event_day,title,url,impacted_commodities,summary
impacted_commodities is a semicolon-separated list of canonical slugs.
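Reading the impacted_commodities column back into a list is a one-line split. This is a minimal sketch, not the project's own parser: the function name is hypothetical, the slugs in the usage note are made-up examples, and dropping blank entries and surrounding whitespace is an assumption about how lenient a reader should be.

```python
def parse_impacted_commodities(cell):
    """Split a semicolon-separated cell into a list of slugs.

    Surrounding whitespace and empty entries (for example from a trailing
    semicolon) are dropped; this leniency is an assumption.
    """
    return [slug.strip() for slug in cell.split(";") if slug.strip()]
```

For example, `parse_impacted_commodities("gold;crude-oil")` yields `["gold", "crude-oil"]`, and an empty cell yields an empty list.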
data/processed/prices_with_news.csv
Joined price/news table used before sentiment enrichment:
date,commodity,price,news_ids,news_count,news_items,news_summary
news_items stores the full event payload list as JSON so the dashboard can show multiple events for a single date row without flattening them away.
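Because news_items is JSON embedded in a CSV cell, a consumer has to decode it per row. The sketch below shows one way to do that with the standard library. The helper names are hypothetical, treating an empty cell as "no events" is an assumption, and the keys inside each event object are not specified here, so the code does not rely on any of them.

```python
import csv
import json


def load_news_items(row):
    """Decode the news_items cell of one CSV row into a list of event objects.

    An empty or missing cell is treated as no events (an assumption).
    """
    raw = row.get("news_items") or "[]"
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("news_items is expected to hold a JSON list")
    return items


def iter_rows_with_events(file_obj):
    """Yield (row, events) pairs from an open prices_with_news-style CSV file."""
    for row in csv.DictReader(file_obj):
        yield row, load_news_items(row)
```

Open the file with `newline=""` as the `csv` module recommends, so quoted JSON containing line breaks is read correctly.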
data/processed/prices_with_sentiment.csv
Main dashboard fact table:
date,commodity,price,news_ids,news_count,news_items,news_summary,negative,neutral,positive,sentiment_score,finbert_negative,finbert_neutral,finbert_positive,finbert_sentiment_score,finbert_label
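With sixteen columns, a header check is a cheap way to catch a stale or partially regenerated fact table. This is a hedged sketch: the column list is copied from the header above, while the function names and the idea of validating the header are illustrative assumptions rather than something the pipeline is known to do.

```python
import csv

# Column names copied from the header documented above.
EXPECTED_COLUMNS = [
    "date", "commodity", "price",
    "news_ids", "news_count", "news_items", "news_summary",
    "negative", "neutral", "positive", "sentiment_score",
    "finbert_negative", "finbert_neutral", "finbert_positive",
    "finbert_sentiment_score", "finbert_label",
]


def read_header(file_obj):
    """Return the first CSV row of an open file as a list of column names."""
    return next(csv.reader(file_obj), [])


def missing_columns(header):
    """Return expected columns absent from header, in documented order."""
    present = set(header)
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

An empty result from `missing_columns(read_header(f))` means every documented column is present; extra columns are ignored.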
Training and model outputs
data/training/commodity_outputs/*.csv
Per-commodity training inputs for single-asset PPO and forecast modules.
data/agent_outputs/**/*.csv
Saved PPO outputs used by the Decision Chart.
data/prediction_outputs/**/*.csv
Saved baseline, Ridge ARX, ARIMAX, LightGBM, Gaussian-process, and LSTM outputs used by the Predictions Chart.
Refresh workflow
Generate preprocessing outputs with:
npm run preprocess
Optionally regenerate model outputs with:
npm run train:single
npm run train:multi
npm run predict:baseline
npm run predict:arimax
npm run predict:ridge
npm run predict:lightgbm
npm run predict:lightgbm:direct
npm run predict:gp
npm run predict:lstm