Trading Agents
Agent Configuration
The project keeps agent behavior in configuration files:
configs/agents/
├── single_asset_ppo.json
└── multiple_asset_ppo.json
Reusable training code lives under agentic_trading/training/ and reads those configs. Training modules should not contain hardcoded data paths or plotting functions.
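A trainer might consume such a config along these lines (a minimal sketch; the key names and defaults shown are illustrative assumptions, not the project's actual schema):

```python
import json
from pathlib import Path

def load_agent_config(path):
    """Load a PPO agent config from JSON; key names here are hypothetical."""
    with Path(path).open() as f:
        cfg = json.load(f)
    # Fall back to common PPO defaults for missing keys (assumed names).
    cfg.setdefault("learning_rate", 3e-4)
    cfg.setdefault("n_steps", 2048)
    return cfg
```

Keeping defaults in the loader rather than in the training loop is what lets the training modules stay free of hardcoded values.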
Training Commands
Generate derived data first:
npm run preprocess
Then run the agent modules when PPO outputs need to be refreshed:
npm run train:single
npm run train:multi
The equivalent direct Python commands are:
python3 -m agentic_trading.training.single_asset_ppo --config configs/agents/single_asset_ppo.json
python3 -m agentic_trading.training.multiple_asset_ppo --config configs/agents/multiple_asset_ppo.json
Single-Asset PPO
The single-asset agent trains one PPO policy per commodity CSV under data/training/commodity_outputs/. It uses price and sentiment features and emits per-step actions:
0: hold
1: buy
2: sell
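The mapping above can be expressed as a small enum (a sketch; the trainer's internal representation may differ):

```python
from enum import IntEnum

class Action(IntEnum):
    """Discrete per-step actions emitted by the single-asset agent."""
    HOLD = 0
    BUY = 1
    SELL = 2
```

Because IntEnum members compare equal to plain integers, raw action indices from the policy can be converted with `Action(idx)` without touching the rest of the pipeline.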
Multi-Asset PPO
The multi-asset agent uses a shared policy over aluminium, copper, and nickel. It emits one action per commodity and tracks shared portfolio state.
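One action per commodity amounts to a length-3 vector of the same discrete actions. A sketch of decoding such a vector (the commodity ordering is an assumption):

```python
COMMODITIES = ("aluminium", "copper", "nickel")
ACTION_NAMES = {0: "hold", 1: "buy", 2: "sell"}

def decode_actions(action_vector):
    """Map a length-3 vector of discrete actions (0=hold, 1=buy, 2=sell)
    to per-commodity decisions; the commodity order is assumed."""
    assert len(action_vector) == len(COMMODITIES)
    return {c: ACTION_NAMES[a] for c, a in zip(COMMODITIES, action_vector)}
```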
Both modules write test-split evaluations and full-dataset diagnostics. Full-dataset outputs are useful for visual inspection, but out-of-sample conclusions should come from the walk-forward test split.
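A walk-forward split keeps every training sample strictly earlier in time than the fold it is evaluated on, which is why its test-split results are the ones to trust. A minimal sketch (a plain implementation under that assumption, not the project's actual splitter):

```python
def walk_forward_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs where each test fold
    follows all of its training data in time, so evaluation stays
    out-of-sample."""
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = fold * k
        test_end = min(fold * (k + 1), n_samples)
        yield list(range(train_end)), list(range(train_end, test_end))
```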
Dashboard Visualization
The browser demo does not retrain PPO models. If generated PPO outputs exist under data/agent_outputs/, it reads them and visualizes:
action chosen
greedy action
probability of hold, buy, and sell
entropy and normalized entropy
net worth and reward
position when available
The dashboard discovers output files dynamically from data/agent_outputs/single_asset_ppo and data/agent_outputs/multiple_asset_ppo. Single-asset filenames include the commodity slug, so the gym can switch between available assets. Changing n_splits or switching multi-asset capital mode does not require a UI code change as long as the trainer filename conventions are preserved.
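Slug-based discovery might look like the following sketch (the `single_asset_ppo_<slug>.json` naming pattern is a hypothetical stand-in for the trainer's actual filename convention):

```python
from pathlib import Path

def discover_outputs(root):
    """Map commodity slug -> output file, assuming names like
    'single_asset_ppo_<slug>.json' (hypothetical convention)."""
    results = {}
    for p in Path(root).glob("single_asset_ppo_*.json"):
        slug = p.stem.replace("single_asset_ppo_", "", 1)
        results[slug] = p
    return results
```

Because the mapping is rebuilt from whatever files are present, adding a commodity or changing n_splits only changes the files on disk, not the discovery code.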
Decision-marker opacity is derived from confidence:
confidence = max(prob_hold, prob_buy, prob_sell)
Higher confidence produces stronger, larger markers.
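The confidence-to-opacity mapping can be sketched as follows; the minimum and maximum opacity values are illustrative assumptions, only the confidence formula comes from the text above:

```python
def marker_opacity(prob_hold, prob_buy, prob_sell,
                   min_opacity=0.2, max_opacity=1.0):
    """Scale marker opacity with confidence = max action probability.
    A uniform distribution (confidence 1/3) maps to min_opacity and
    full certainty (confidence 1.0) maps to max_opacity; the opacity
    range endpoints are assumed, not taken from the project."""
    confidence = max(prob_hold, prob_buy, prob_sell)
    t = (confidence - 1/3) / (1 - 1/3)
    return min_opacity + t * (max_opacity - min_opacity)
```

Normalizing from 1/3 rather than 0 avoids wasting the visual range, since the maximum of three probabilities can never fall below 1/3.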
Generated Output Policy
data/agent_outputs/ is ignored by Git. Outputs are generated by training commands and can be regenerated from raw data plus preprocessing.