Trading Agents
==============

Agent Configuration
-------------------

The project keeps agent behavior in configuration files:

.. code-block:: text

   configs/agents/
   ├── single_asset_ppo.json
   └── multiple_asset_ppo.json

Reusable training code lives under ``agentic_trading/training/`` and reads those configs. Training modules should not contain hardcoded data paths or plotting functions.
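
As a minimal sketch of how a training module might read its config rather than hardcoding paths, the snippet below parses the ``--config`` flag used by the commands in this document and loads the JSON file. The function names ``parse_args`` and ``load_agent_config`` are hypothetical, and the schema of the config files is not assumed here.

.. code-block:: python

   import argparse
   import json
   from pathlib import Path


   def parse_args(argv=None):
       """Parse the --config flag (hypothetical helper, not the project's API)."""
       parser = argparse.ArgumentParser(description="Train a PPO agent from a config file.")
       parser.add_argument("--config", required=True, help="Path to an agent config JSON file.")
       return parser.parse_args(argv)


   def load_agent_config(config_path):
       """Read a JSON agent config and return it as a plain dict."""
       with Path(config_path).open(encoding="utf-8") as handle:
           return json.load(handle)

A module entry point could then call ``load_agent_config(parse_args().config)`` and pass the resulting dict to the training code, so that data paths come from the config file only.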

Training Commands
-----------------

Generate derived data first:

.. code-block:: bash

   npm run preprocess

Then run the agent modules when PPO outputs need to be refreshed:

.. code-block:: bash

   npm run train:single
   npm run train:multi

The equivalent direct Python commands are:

.. code-block:: bash

   python3 -m agentic_trading.training.single_asset_ppo --config configs/agents/single_asset_ppo.json
   python3 -m agentic_trading.training.multiple_asset_ppo --config configs/agents/multiple_asset_ppo.json

Single-Asset PPO
----------------

The single-asset agent trains one PPO policy per commodity CSV under ``data/training/commodity_outputs/``. It uses price and sentiment features and emits per-step actions:

* ``0``: hold
* ``1``: buy
* ``2``: sell
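
The action codes above can be captured in a small enum so that downstream code does not rely on bare integers. This is an illustrative sketch; the names ``Action`` and ``describe_action`` are assumptions and may not match the project's own identifiers.

.. code-block:: python

   from enum import IntEnum


   class Action(IntEnum):
       """Discrete per-step actions, numbered as in the list above."""

       HOLD = 0
       BUY = 1
       SELL = 2


   def describe_action(code):
       """Return the lowercase action name for an integer action code."""
       return Action(code).name.lower()

For example, ``describe_action(1)`` returns ``"buy"``, and an unknown code such as ``3`` raises ``ValueError``.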

Multi-Asset PPO
---------------

The multi-asset agent uses a shared policy over aluminium, copper, and nickel. It emits one action per commodity and tracks shared portfolio state.

Both modules write test-split evaluations and full-dataset diagnostics. Full-dataset outputs are useful for visual inspection, but out-of-sample conclusions should come from the walk-forward test split.
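
To illustrate why the walk-forward test split supports out-of-sample conclusions, the sketch below generates expanding-window splits in which every test block lies strictly after its training data. The exact splitting scheme used by the trainers is not described here, so the function ``walk_forward_splits`` and its fold sizing are assumptions for illustration only.

.. code-block:: python

   def walk_forward_splits(n_samples, n_splits):
       """Yield (train_indices, test_indices) pairs in chronological order.

       Assumed scheme: an expanding training window followed by the next
       contiguous block of rows as the test set.
       """
       if n_splits < 1:
           raise ValueError("n_splits must be at least 1")
       fold_size = n_samples // (n_splits + 1)
       if fold_size < 1:
           raise ValueError("not enough samples for the requested number of splits")
       for fold in range(1, n_splits + 1):
           train_end = fold * fold_size
           # The last fold absorbs any leftover rows at the end of the series.
           test_end = n_samples if fold == n_splits else train_end + fold_size
           yield range(0, train_end), range(train_end, test_end)

Because no test index ever precedes a training index, metrics computed on these test blocks never see data the policy was trained on, unlike the full-dataset diagnostics.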

Dashboard Visualization
-----------------------

The browser demo does not retrain PPO models. If generated PPO outputs exist under ``data/agent_outputs/``, it reads them and visualizes:

* action chosen
* greedy action
* probability of hold, buy, and sell
* entropy and normalized entropy
* net worth and reward
* position when available

The dashboard discovers output files dynamically from ``data/agent_outputs/single_asset_ppo`` and ``data/agent_outputs/multiple_asset_ppo``. Single-asset filenames include the commodity slug, so the gym can switch between available assets. Changing ``n_splits`` or switching multi-asset capital mode does not require a UI code change as long as the trainer filename conventions are preserved.
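
The discovery logic can be sketched in Python as follows. The dashboard itself runs in the browser, so this only illustrates the idea. The helper names are hypothetical, and the slug list (``aluminium``, ``copper``, ``nickel``) is an assumption based on the commodities named for the multi-asset agent; the actual filename conventions are defined by the trainers.

.. code-block:: python

   from pathlib import Path

   AGENT_DIRS = ("single_asset_ppo", "multiple_asset_ppo")


   def discover_outputs(root="data/agent_outputs"):
       """Map each agent directory name to the sorted output files found in it."""
       found = {}
       for agent_dir in AGENT_DIRS:
           directory = Path(root) / agent_dir
           if directory.is_dir():
               found[agent_dir] = sorted(p for p in directory.iterdir() if p.is_file())
           else:
               # Missing directories mean no outputs have been generated yet.
               found[agent_dir] = []
       return found


   def available_commodities(files, slugs=("aluminium", "copper", "nickel")):
       """Return the assumed slugs that appear in at least one filename."""
       return [slug for slug in slugs if any(slug in f.name for f in files)]

Because the file list is rebuilt from the directory contents on each call, adding or removing output files changes what is offered without any change to the calling code.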

Decision-marker opacity and size are derived from confidence:

.. code-block:: text

   confidence = max(prob_hold, prob_buy, prob_sell)

Higher confidence produces stronger, larger markers.
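
The confidence formula, together with the greedy action and entropy values listed earlier, can be computed from the three action probabilities. The sketch below is one plausible formulation: the use of the natural logarithm and normalization by ``log(3)`` are assumptions, and ``policy_diagnostics`` is a hypothetical name.

.. code-block:: python

   import math


   def policy_diagnostics(prob_hold, prob_buy, prob_sell):
       """Summarize a three-way action distribution (assumed formulas)."""
       probs = [prob_hold, prob_buy, prob_sell]
       confidence = max(probs)
       # Index of the most probable action: 0 = hold, 1 = buy, 2 = sell.
       greedy_action = probs.index(confidence)
       # Shannon entropy in nats; zero-probability terms contribute nothing.
       entropy = -sum(p * math.log(p) for p in probs if p > 0)
       # Divide by the maximum possible entropy so the result lies in [0, 1].
       normalized_entropy = entropy / math.log(len(probs))
       return {
           "confidence": confidence,
           "greedy_action": greedy_action,
           "entropy": entropy,
           "normalized_entropy": normalized_entropy,
       }

Under these assumptions, a uniform distribution gives the lowest possible confidence (one third) and a normalized entropy of 1, while a distribution that puts all mass on one action gives a confidence of 1 and an entropy of 0.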

Generated Output Policy
-----------------------

``data/agent_outputs/`` is ignored by Git. Outputs are generated by the training commands and can be regenerated from the raw data by rerunning preprocessing and then training.