Trading Agents

Agent Configuration

The project keeps agent behavior in configuration files:

configs/agents/
├── single_asset_ppo.json
└── multiple_asset_ppo.json

Reusable training code lives under agentic_trading/training/ and reads those configs. Training modules should not contain hardcoded data paths or plotting functions.
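
Because trainers are config-driven, a training module's entry point typically just resolves a config path and reads JSON. A minimal sketch, assuming an illustrative key name and default (the real config schema is whatever the files under configs/agents/ define):

```python
import json
from pathlib import Path


def load_agent_config(path: str) -> dict:
    """Load a JSON agent config from configs/agents/.

    The 'data_dir' key and its default below are illustrative; the
    point is that data locations come from the config, not from
    hardcoded paths inside the training module.
    """
    cfg = json.loads(Path(path).read_text())
    cfg.setdefault("data_dir", "data/training/commodity_outputs")
    return cfg
```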

Training Commands

Generate derived data first:

npm run preprocess

Then run the agent modules when PPO outputs need to be refreshed:

npm run train:single
npm run train:multi

The equivalent direct Python commands are:

python3 -m agentic_trading.training.single_asset_ppo --config configs/agents/single_asset_ppo.json
python3 -m agentic_trading.training.multiple_asset_ppo --config configs/agents/multiple_asset_ppo.json

Single-Asset PPO

The single-asset agent trains one PPO policy per commodity CSV under data/training/commodity_outputs/. It uses price and sentiment features and emits per-step actions:

  • 0: hold

  • 1: buy

  • 2: sell
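
To illustrate how this three-action encoding can drive a position, here is a sketch of a long-only position update. The flat/long position representation and the lack of sizing rules are assumptions for illustration, not the environment's actual logic:

```python
HOLD, BUY, SELL = 0, 1, 2  # action encoding from the single-asset agent


def apply_action(position: int, action: int) -> int:
    """Illustrative long-only update: position is 0 (flat) or 1 (long)."""
    if action == BUY:
        return 1          # enter or stay long
    if action == SELL:
        return 0          # exit to flat
    return position       # HOLD keeps the current position
```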

Multi-Asset PPO

The multi-asset agent uses a shared policy over aluminium, copper, and nickel. It emits one action per commodity and tracks shared portfolio state.
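
The per-commodity action vector can be sketched as a mapping from one integer per asset to an action label. The asset ordering and label names below are assumptions for illustration; the trainer defines the real action space:

```python
COMMODITIES = ["aluminium", "copper", "nickel"]
ACTION_NAMES = {0: "hold", 1: "buy", 2: "sell"}


def decode_actions(action_vector):
    """Label a shared-policy action vector, one entry per commodity."""
    return {asset: ACTION_NAMES[a]
            for asset, a in zip(COMMODITIES, action_vector)}
```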

Both modules write test-split evaluations and full-dataset diagnostics. Full-dataset outputs are useful for visual inspection, but out-of-sample conclusions should come from the walk-forward test split.
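
Walk-forward evaluation of the kind described above can be sketched with expanding-window splits: each fold trains on all data before a chronological test chunk. This is an illustrative sketch, not the project's exact split logic:

```python
def walk_forward_splits(n_rows: int, n_splits: int):
    """Yield (train_indices, test_indices) for expanding-window folds.

    The data is divided into n_splits + 1 chronological chunks; fold k
    trains on the first k chunks and tests on chunk k + 1, so every
    test window is strictly out-of-sample.
    """
    fold = n_rows // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train = range(0, k * fold)
        test = range(k * fold, min((k + 1) * fold, n_rows))
        yield train, test
```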

Dashboard Visualization

The browser demo does not retrain PPO models. When generated PPO outputs exist under data/agent_outputs/, it reads and visualizes:

  • action chosen

  • greedy action

  • probability of hold, buy, and sell

  • entropy and normalized entropy

  • net worth and reward

  • position when available
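
The entropy diagnostics in the list above can be computed from the per-step action probabilities. A sketch, normalizing by the log of the action count so that 1.0 corresponds to a uniform policy:

```python
import math


def entropy_stats(probs):
    """Shannon entropy of an action distribution, plus normalized form.

    Returns (entropy, entropy / log(n_actions)); the normalized value
    is 1.0 for a uniform policy and 0.0 for a deterministic one.
    """
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h, h / math.log(len(probs))
```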

The dashboard discovers output files dynamically from data/agent_outputs/single_asset_ppo and data/agent_outputs/multiple_asset_ppo. Single-asset filenames include the commodity slug, so the gym can switch between available assets. Changing n_splits or switching multi-asset capital mode does not require a UI code change as long as the trainer filename conventions are preserved.
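
Discovery of this kind can be sketched as a glob over an output directory, grouping files by commodity slug. The assumption that the slug is the first underscore-separated token of the filename (and that outputs are JSON) is illustrative, not the trainers' actual convention:

```python
from pathlib import Path


def discover_outputs(root: str) -> dict:
    """Group generated output filenames by commodity slug.

    Assumes filenames look like '<slug>_<rest>.json'; adjust the
    parsing to match whatever the trainers actually write.
    """
    grouped: dict = {}
    for path in sorted(Path(root).glob("*.json")):
        slug = path.stem.split("_")[0]
        grouped.setdefault(slug, []).append(path.name)
    return grouped
```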

Decision-marker opacity is derived from confidence:

confidence = max(prob_hold, prob_buy, prob_sell)

Higher confidence produces stronger, larger markers.
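
A sketch of the confidence-to-opacity mapping. Only the max-of-probabilities confidence comes from the formula above; the linear rescaling and the opacity bounds are assumptions for illustration:

```python
def marker_opacity(prob_hold: float, prob_buy: float, prob_sell: float,
                   min_opacity: float = 0.2, max_opacity: float = 1.0) -> float:
    """Map the policy's top action probability to a marker opacity.

    For a 3-action policy, confidence ranges from 1/3 (uniform) to 1.0
    (certain); this rescales that range linearly onto the illustrative
    opacity bounds.
    """
    confidence = max(prob_hold, prob_buy, prob_sell)
    t = (confidence - 1 / 3) / (1 - 1 / 3)  # 0 at uniform, 1 at certain
    return min_opacity + t * (max_opacity - min_opacity)
```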

Generated Output Policy

data/agent_outputs/ is ignored by Git. Outputs are generated by training commands and can be regenerated from raw data plus preprocessing.