Core modules

class chatbot_eval.types.Sample(question, expected_answer)[source]

Bases: object

One evaluation row loaded from the FAQ CSV file.

Parameters:
  • question (str)

  • expected_answer (str)

question: str
expected_answer: str
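
A minimal construction sketch (the question and answer strings below are illustrative):

>>> from chatbot_eval.types import Sample
>>> sample = Sample(
...     question="What are your opening hours?",
...     expected_answer="We are open 9am-5pm, Monday to Friday.",
... )
>>> sample.question
'What are your opening hours?'
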
class chatbot_eval.types.Completion(text, thinking=None, raw=<factory>)[source]

Bases: object

Raw completion returned by a chat backend.

Parameters:
  • text (str)

  • thinking (str | None)

  • raw (Dict[str, Any])

text

Final model output shown to the user.

Type:

str

thinking

Optional reasoning trace, when the provider exposes one.

Type:

str | None

raw

Provider-native payload kept for debugging.

Type:

Dict[str, Any]

text: str
thinking: str | None
raw: Dict[str, Any]
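
A construction sketch; the raw payload below is an illustrative stand-in for whatever the provider actually returns:

>>> from chatbot_eval.types import Completion
>>> completion = Completion(
...     text="We are open 9am-5pm, Monday to Friday.",
...     thinking=None,
...     raw={"model": "example-model", "usage": {"total_tokens": 42}},
... )
>>> completion.text
'We are open 9am-5pm, Monday to Friday.'
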
class chatbot_eval.types.BotResult(answer, metadata=<factory>)[source]

Bases: object

Answer returned by a bot together with trace metadata.

Parameters:
  • answer (str)

  • metadata (Dict[str, Any])

answer: str
metadata: Dict[str, Any]
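
A construction sketch; the metadata keys are illustrative, not a documented schema:

>>> from chatbot_eval.types import BotResult
>>> result = BotResult(
...     answer="We are open 9am-5pm, Monday to Friday.",
...     metadata={"backend": "example", "latency_s": 0.8},
... )
>>> result.metadata["backend"]
'example'
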
class chatbot_eval.types.MetricResult(name, score, details=<factory>)[source]

Bases: object

Result produced by one metric for one question-answer pair.

Parameters:
  • name (str)

  • score (float)

  • details (Dict[str, Any])

name: str
score: float
details: Dict[str, Any]
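
A construction sketch; the metric name and details keys are illustrative:

>>> from chatbot_eval.types import MetricResult
>>> metric = MetricResult(
...     name="exact_match",
...     score=1.0,
...     details={"normalized": True},
... )
>>> metric.score
1.0
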
class chatbot_eval.evaluation.evaluator.Evaluator(metrics)[source]

Bases: object

Evaluate bots against samples and collect row-level outputs.

Parameters:

metrics (list[object])

metrics: list[object]

evaluate_sample(sample, bot)[source]

Evaluate one bot on one sample and return the row-level output.

Parameters:
  • sample (Sample)

  • bot (object)

Return type:

dict[str, str]

evaluate_dataset(samples, bots)[source]

Evaluate all bots against all samples.

Parameters:
  • samples (list[Sample])

  • bots (list[object])

Return type:

list[dict[str, str]]
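
A hedged end-to-end sketch. This section only types metrics and bots as list[object], so the duck-typed interface assumed below (an answer(question) method on the bot returning a BotResult, and a score(sample, answer) method on the metric returning a MetricResult) is an illustration rather than a documented contract; the exact keys of the returned row dicts likewise depend on the metrics supplied.

from chatbot_eval.evaluation.evaluator import Evaluator
from chatbot_eval.types import BotResult, MetricResult, Sample


class EchoBot:
    """Hypothetical bot; the bot interface is not documented in this section."""

    def answer(self, question):
        # Echoes the question back, tagging the trace metadata.
        return BotResult(answer=question, metadata={"backend": "echo"})


class LengthMetric:
    """Hypothetical metric; name and score() are assumed for illustration."""

    name = "answer_length"

    def score(self, sample, answer):
        # Scores an answer by its character length.
        return MetricResult(name=self.name, score=float(len(answer)), details={})


samples = [
    Sample(
        question="What are your opening hours?",
        expected_answer="We are open 9am-5pm, Monday to Friday.",
    )
]

evaluator = Evaluator(metrics=[LengthMetric()])
rows = evaluator.evaluate_dataset(samples, bots=[EchoBot()])
# rows is a list[dict[str, str]] with one row per sample/bot pairing.

Because the bot and metric protocols above are assumptions, treat this as a sketch of the call shape rather than a copy-paste recipe.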