Add structured logging and tracing so developers can understand the agent's decision-making and debug issues.
This should include:
- Detailed logging of LLM prompts, responses, and token usage
- Tool execution traces (inputs, outputs, duration)
- Decision-point logging (why the agent chose a specific tool)
- Trace export in a standard format (e.g. JSON)
- Visual/CLI tools to inspect agent runs
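Below is a minimal Python sketch of how these pieces could fit together, assuming a simple in-memory tracer for one agent run. `Span`, `Tracer`, `decision`, `to_json`, and `summarize` are hypothetical names chosen for illustration, not an existing library API. Each LLM call, tool execution, or decision point is recorded as a span with inputs, outputs, duration, and any error. The run is exported as JSON, and `summarize` stands in for a CLI inspection tool.

```python
import json
import time
import uuid
from contextlib import contextmanager
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Span:
    """One traced step: an LLM call, a tool execution, or a decision point."""
    kind: str                       # "llm", "tool", or "decision"
    name: str                       # model name, tool name, or decision label
    inputs: Dict[str, Any]
    outputs: Dict[str, Any] = field(default_factory=dict)
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    started_at: float = field(default_factory=time.time)  # wall-clock epoch seconds
    duration_ms: float = 0.0
    error: Optional[str] = None


class Tracer:
    """Hypothetical in-memory tracer collecting the spans of one agent run."""

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.spans: List[Span] = []

    @contextmanager
    def span(self, kind: str, name: str, **inputs: Any) -> Iterator[Span]:
        """Time a block of work; the caller fills span.outputs inside the block."""
        s = Span(kind=kind, name=name, inputs=inputs)
        t0 = time.perf_counter()
        try:
            yield s
        except Exception as exc:
            s.error = repr(exc)  # record the failure, then re-raise it
            raise
        finally:
            s.duration_ms = (time.perf_counter() - t0) * 1000
            self.spans.append(s)  # spans are stored in completion order

    def decision(self, chosen: str, candidates: List[str], reason: str) -> None:
        """Record which tool the agent picked, from which options, and why."""
        with self.span("decision", "choose_tool", candidates=candidates) as s:
            s.outputs = {"chosen": chosen, "reason": reason}

    def to_json(self) -> str:
        """Export the whole run as JSON (inputs/outputs must be JSON-serializable)."""
        payload = {"run_id": self.run_id, "spans": [asdict(s) for s in self.spans]}
        return json.dumps(payload, indent=2)


def summarize(trace_json: str) -> str:
    """Render a plain-text, one-line-per-span view of an exported trace."""
    trace = json.loads(trace_json)
    lines = [f"run {trace['run_id']}"]
    for s in trace["spans"]:
        status = "ERROR" if s["error"] else "ok"
        lines.append(f"  [{s['kind']:<8}] {s['name']:<12} {s['duration_ms']:8.1f} ms  {status}")
    return "\n".join(lines)


if __name__ == "__main__":
    tracer = Tracer(run_id="demo")
    tracer.decision("calculator", ["search", "calculator"], "question is pure arithmetic")
    # "example-model" is a placeholder; token counts would come from the LLM client.
    with tracer.span("llm", "example-model", prompt="What is 2+2?") as s:
        s.outputs = {"response": "Use the calculator.", "prompt_tokens": 7, "completion_tokens": 4}
    with tracer.span("tool", "calculator", expression="2+2") as s:
        s.outputs = {"result": 4}
    print(summarize(tracer.to_json()))
```

Keeping spans in memory keeps the sketch short; a production version would more likely stream each span to a log sink as it completes and redact secrets from prompts before recording them.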