Trading Agents
==============

Agent Configuration
-------------------

The project keeps agent behavior in configuration files:

.. code-block:: text

   configs/agents/
   ├── single_asset_ppo.json
   └── multiple_asset_ppo.json

Reusable training code lives under ``agentic_trading/training/`` and reads those configs. Training modules should not contain hardcoded data paths or plotting functions.

Training Commands
-----------------

Generate derived data first:

.. code-block:: bash

   npm run preprocess

Then run the agent modules when PPO outputs need to be refreshed:

.. code-block:: bash

   npm run train:single
   npm run train:multi

The equivalent direct Python commands are:

.. code-block:: bash

   python3 -m agentic_trading.training.single_asset_ppo --config configs/agents/single_asset_ppo.json
   python3 -m agentic_trading.training.multiple_asset_ppo --config configs/agents/multiple_asset_ppo.json

Single-Asset PPO
----------------

The single-asset agent trains one PPO policy per commodity CSV under ``data/training/commodity_outputs/``. It uses price and sentiment features and emits per-step actions:

* ``0``: hold
* ``1``: buy
* ``2``: sell

Multi-Asset PPO
---------------

The multi-asset agent uses a shared policy over aluminium, copper, and nickel. It emits one action per commodity and tracks shared portfolio state.

Both modules write test-split evaluations and full-dataset diagnostics. Full-dataset outputs are useful for visual inspection, but out-of-sample conclusions should come from the walk-forward test split.

Dashboard Visualization
-----------------------

The browser demo does not retrain PPO models. If generated PPO outputs exist under ``data/agent_outputs/``, it reads them and visualizes:

* action chosen
* greedy action
* probability of hold, buy, and sell
* entropy and normalized entropy
* net worth and reward
* position when available

The dashboard discovers output files dynamically from ``data/agent_outputs/single_asset_ppo`` and ``data/agent_outputs/multiple_asset_ppo``.
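
As a rough illustration of what dynamic discovery can look like, the sketch below lists whatever files are present in the two output directories. It is written in Python for illustration only; the dashboard's actual implementation may differ. The helper name ``discover_outputs`` is hypothetical, and because the file extension and naming pattern are not specified here, the sketch returns every regular file it finds.

.. code-block:: python

   from pathlib import Path

   # Directory names come from the paragraph above; everything else is illustrative.
   OUTPUT_DIRS = ("single_asset_ppo", "multiple_asset_ppo")


   def discover_outputs(root="data/agent_outputs"):
       """Map each output directory name to a sorted list of the files it holds."""
       found = {}
       for name in OUTPUT_DIRS:
           directory = Path(root) / name
           if not directory.is_dir():
               # Outputs are optional: a trainer that has not run yields an empty list.
               found[name] = []
               continue
           found[name] = sorted(p for p in directory.iterdir() if p.is_file())
       return found

Because nothing is hardcoded beyond the two directory names, adding or removing generated files changes the result without any change to the discovery code.
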
Single-asset filenames include the commodity slug, so the gym can switch between available assets. Changing ``n_splits`` or switching multi-asset capital mode does not require a UI code change as long as the trainer filename conventions are preserved.

Decision-marker opacity is derived from confidence:

.. code-block:: text

   confidence = max(prob_hold, prob_buy, prob_sell)

Higher confidence produces stronger, larger markers.

Generated Output Policy
-----------------------

``data/agent_outputs/`` is ignored by Git. Outputs are generated by training commands and can be regenerated from raw data plus preprocessing.
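
Since the outputs are not versioned, a small regeneration helper can be convenient. The following is a minimal sketch, not part of the project: it only assembles the preprocessing and training commands listed under Training Commands and optionally runs them in order. The function names ``build_commands`` and ``regenerate`` are hypothetical, and it assumes ``npm`` and the project's Python environment are available on the path.

.. code-block:: python

   import subprocess
   import sys

   # Module names and config paths are the ones documented under Training Commands.
   CONFIGS = {
       "single_asset_ppo": "configs/agents/single_asset_ppo.json",
       "multiple_asset_ppo": "configs/agents/multiple_asset_ppo.json",
   }


   def build_commands(python=sys.executable):
       """Return the preprocessing and training commands in execution order."""
       commands = [["npm", "run", "preprocess"]]
       for module, config in CONFIGS.items():
           commands.append(
               [python, "-m", f"agentic_trading.training.{module}", "--config", config]
           )
       return commands


   def regenerate(dry_run=True):
       """Print each command; execute it only when dry_run is False."""
       for command in build_commands():
           print(" ".join(command))
           if not dry_run:
               # check=True stops the sequence as soon as one step fails.
               subprocess.run(command, check=True)

Calling ``regenerate()`` with the default ``dry_run=True`` only prints the commands, which makes it easy to confirm the sequence before spending time on training.
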