# Conversation Testing Guide

## Overview

This guide helps you analyze the 10 conversation simulations to identify improvement opportunities in the CaseAgent.
## How to Run

1. Start the toto-ai-hub server:

   ```
   cd toto-ai-hub
   npm run dev
   ```

   The server should be running on http://localhost:8080.

2. Run the test script:

   ```
   cd toto-ai-hub
   .\test-conversations-v2.ps1
   ```

3. Review results:
   - Check console output for real-time responses
   - Review `conversation-results-v2.json` for full conversation logs
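The results file can also be scanned programmatically. A minimal sketch, assuming each log entry has `scenario` and `turns` fields with `role`/`text` turns (these names are guesses about the schema, not the actual format — adjust to match the real file):

```typescript
// Minimal sketch for scanning conversation-results-v2.json.
// Field names (scenario, turns, role, text) are ASSUMPTIONS about the
// log schema -- adjust them to match the actual file.
interface Turn {
  role: "user" | "agent";
  text: string;
}
interface ConversationLog {
  scenario: string;
  turns: Turn[];
}

// One summary line per conversation: total turns and agent replies.
function summarize(logs: ConversationLog[]): string[] {
  return logs.map((log) => {
    const agentReplies = log.turns.filter((t) => t.role === "agent").length;
    return `${log.scenario}: ${log.turns.length} turns, ${agentReplies} agent replies`;
  });
}

// Inline sample; in practice, parse the results file instead, e.g.
// JSON.parse(fs.readFileSync("conversation-results-v2.json", "utf8")).
const sample: ConversationLog[] = [
  {
    scenario: "Conversation 1",
    turns: [
      { role: "user", text: "Si" },
      { role: "agent", text: "Para ayudar puedes compartir el caso..." },
    ],
  },
];
console.log(summarize(sample));
```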
## Test Scenarios

### Conversation 1: Affirmative Response Loop Test

**Purpose:** Verify the bug fix we just implemented.

**What to check:**
- ✅ Does agent progress conversation after "Si" responses?
- ✅ Does agent avoid repeating the same case introduction?
- ✅ Does agent move to actionable steps (how to help, donation process)?
- ❌ Does agent repeat case info multiple times? (This would indicate the fix didn't work)

**Expected behavior:**
- First "Si": Agent should explain HOW to help (donation steps, sharing, etc.)
- Second "Si": Agent should ask specific questions or provide concrete next steps
- Third "Si": Agent should continue progressing, not loop
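The repetition check above can be rough-automated. This sketch flags consecutive agent replies whose word overlap is high; the bag-of-words comparison and the 0.8 threshold are assumptions, not tuned values:

```typescript
// Rough loop detector: flags consecutive agent replies that share most
// of their words. The 0.8 threshold is an assumption, not a tuned value.
function wordSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

// Fraction of shared words relative to the larger reply.
function overlap(a: string, b: string): number {
  const sa = wordSet(a);
  const sb = wordSet(b);
  if (sa.size === 0 || sb.size === 0) return 0;
  let shared = 0;
  for (const w of sa) if (sb.has(w)) shared++;
  return shared / Math.max(sa.size, sb.size);
}

function looksLikeLoop(agentReplies: string[], threshold = 0.8): boolean {
  for (let i = 1; i < agentReplies.length; i++) {
    if (overlap(agentReplies[i - 1], agentReplies[i]) >= threshold) return true;
  }
  return false;
}
```

A near-identical case re-introduction twice in a row trips the check; a reply that moves on to donation steps does not.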
### Conversation 2: Vague Questions

**Purpose:** Test agent's ability to guide users who don't know what to ask.

**What to check:**
- ✅ Does agent provide clear, actionable options?
- ✅ Does agent ask clarifying questions?
- ✅ Does agent offer multiple ways to help?
- ❌ Does agent get stuck or ask user to be more specific without helping?

**Expected behavior:**
- Agent should proactively suggest ways to help
- Agent should provide clear next steps
- Agent should be helpful even with vague queries
### Conversation 3: Emotional User (Worried)

**Purpose:** Test empathy and emotional intelligence.

**What to check:**
- ✅ Does agent acknowledge user's concern?
- ✅ Does agent provide reassurance (without making medical promises)?
- ✅ Does agent offer urgent action options?
- ❌ Does agent ignore emotional cues?
- ❌ Does agent make promises about outcomes?

**Expected behavior:**
- Agent should show empathy
- Agent should provide urgent help options
- Agent should be honest about what it can/can't guarantee
### Conversation 4: Information Overload Request

**Purpose:** Test agent's ability to provide digestible information.

**What to check:**
- ✅ Does agent break down information into digestible chunks?
- ✅ Does agent prioritize most important information?
- ✅ Can agent clarify specific points when asked?
- ❌ Does agent dump too much information at once?
- ❌ Does agent get confused when asked to clarify?

**Expected behavior:**
- Agent should provide structured, prioritized information
- Agent should be able to clarify specific points
- Agent should avoid overwhelming the user
### Conversation 5: Topic Change

**Purpose:** Test agent's ability to adapt to changing user intent.

**What to check:**
- ✅ Does agent smoothly transition from adoption to donation?
- ✅ Does agent acknowledge the change in intent?
- ✅ Does agent provide relevant information for new topic?
- ❌ Does agent get confused or stuck?
- ❌ Does agent continue talking about old topic?

**Expected behavior:**
- Agent should acknowledge the change
- Agent should smoothly transition to new topic
- Agent should provide relevant information for new intent
### Conversation 6: Fully Funded Case

**Purpose:** Test handling of completed/fully-funded cases.

**What to check:**
- ✅ Does agent acknowledge case is fully funded?
- ✅ Does agent explain that additional donations still help?
- ✅ Does agent suggest other ways to help?
- ❌ Does agent discourage donations unnecessarily?
- ❌ Does agent fail to mention that the case is fully funded?

**Expected behavior:**
- Agent should acknowledge funding status
- Agent should explain that additional support still helps
- Agent should offer alternative ways to help
### Conversation 7: Minimal Responses

**Purpose:** Test agent's ability to handle very short user messages.

**What to check:**
- ✅ Does agent understand intent from minimal responses?
- ✅ Does agent still progress conversation?
- ✅ Does agent ask clarifying questions when needed?
- ❌ Does agent get stuck on short responses?
- ❌ Does agent ask for more detail without helping?

**Expected behavior:**
- Agent should infer intent from context
- Agent should progress conversation naturally
- Agent should be helpful even with minimal input
### Conversation 8: Technical Questions

**Purpose:** Test accuracy of technical information.

**What to check:**
- ✅ Does agent explain verification process correctly?
- ✅ Does agent correctly explain direct transfer (NOT through platform)?
- ✅ Does agent provide accurate information about donation process?
- ❌ Does agent say donations go "through the platform"? (WRONG)
- ❌ Does agent provide incorrect information?

**Expected behavior:**
- Agent should explain direct bank transfer to guardian alias
- Agent should explain verification process accurately
- Agent should NOT say money goes through platform
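A cheap guard for the "through the platform" error is a forbidden-phrase scan over agent replies. The phrase list below is an assumption; extend it with whatever wording actually shows up in the logs (Spanish variants matter most, since the agent replies in Spanish):

```typescript
// Flags replies that claim money moves through the platform, which the
// guide marks as WRONG. The phrase list is an ASSUMPTION -- extend it
// with wording observed in real logs.
const FORBIDDEN_PHRASES = [
  "through the platform",
  "a través de la plataforma",
  "por la plataforma",
];

function claimsPlatformTransfer(reply: string): boolean {
  const lower = reply.toLowerCase();
  return FORBIDDEN_PHRASES.some((phrase) => lower.includes(phrase));
}
```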
### Conversation 9: Multiple Help Options

**Purpose:** Test agent's knowledge of all ways to help.

**What to check:**
- ✅ Does agent suggest multiple ways to help (donate, share, adopt)?
- ✅ Does agent explain how sharing helps?
- ✅ Does agent mention Totitos for sharing?
- ❌ Does agent only suggest donations?
- ❌ Does agent dismiss sharing as not helpful?

**Expected behavior:**
- Agent should suggest multiple ways to help
- Agent should explain that sharing is valuable
- Agent should mention Totitos system
### Conversation 10: Missing Information

**Purpose:** Test graceful handling of incomplete case data.

**What to check:**
- ✅ Does agent handle missing information gracefully?
- ✅ Does agent offer alternatives (TRF) when alias is missing?
- ✅ Does agent explain what information is missing?
- ❌ Does agent break or give errors?
- ❌ Does agent make up information?

**Expected behavior:**
- Agent should acknowledge missing information
- Agent should offer alternatives (TRF)
- Agent should be honest about what it doesn't know
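The expected fallback can be checked mechanically: when the case record carries no banking alias, the reply should point to the TRF. The `bankingAlias` field name is an assumption about the case schema:

```typescript
// Returns true when no fallback is needed (alias present) or when the
// reply correctly falls back to the TRF. `bankingAlias` is an ASSUMED
// field name, not the confirmed case schema.
function mentionsTrfFallback(
  caseData: { bankingAlias?: string },
  reply: string
): boolean {
  if (caseData.bankingAlias) return true; // alias available, no fallback required
  return reply.includes("TRF");
}
```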
## Common Issues to Look For

### 🔴 Critical Issues

- **Repeating same information:** Agent loops on the same content
- **Incorrect donation process:** Says "through platform" instead of "direct transfer"
- **Wrong TRF translation:** Says "Transferencia Rápida de Fondos" instead of "Fondo de Rescate de Toto"
- **Missing banking alias:** Doesn't provide the alias when it is available
- **Making up information:** Invents case details that were not provided
### 🟡 Medium Issues

- **Not progressing conversation:** Stuck in the same place
- **Not adapting to user style:** Too formal or too casual for the user
- **Missing empathy:** Doesn't acknowledge emotions
- **Information overload:** Too much at once
- **Not offering alternatives:** Only suggests one option
### 🟢 Minor Issues

- **Awkward phrasing:** Could be more natural
- **Too verbose:** Could be more concise
- **Missing context:** Doesn't reference previous messages
- **Generic responses:** Not personalized enough
## Analysis Template

For each conversation, document:

```markdown
### Conversation X: [Scenario Name]

**Issues Found:**
- [ ] Issue 1: Description
- [ ] Issue 2: Description

**What Worked Well:**
- ✅ Good point 1
- ✅ Good point 2

**Recommendations:**
1. Suggestion 1
2. Suggestion 2

**Severity:** 🔴 Critical / 🟡 Medium / 🟢 Minor
```
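The template above can be stamped out for all 10 scenarios with a small helper, so every writeup starts from the same skeleton:

```typescript
// Emits one copy of the analysis template per scenario.
function analysisTemplate(index: number, name: string): string {
  return [
    `### Conversation ${index}: ${name}`,
    "",
    "**Issues Found:**",
    "- [ ] Issue 1: Description",
    "",
    "**What Worked Well:**",
    "- ✅ Good point 1",
    "",
    "**Recommendations:**",
    "1. Suggestion 1",
    "",
    "**Severity:** 🔴 Critical / 🟡 Medium / 🟢 Minor",
  ].join("\n");
}

// First two scenario names shown; add the remaining eight from the guide.
const scenarios = ["Affirmative Response Loop Test", "Vague Questions"];
const report = scenarios
  .map((name, i) => analysisTemplate(i + 1, name))
  .join("\n\n");
```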
## Next Steps After Testing

1. Document all issues in a markdown file
2. Prioritize fixes by severity
3. Create tickets for each issue
4. Test the fixes against the same scenarios
5. Iterate until all critical issues are resolved
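Prioritizing fixes by severity is a plain sort once each documented issue carries a severity tag; a sketch:

```typescript
// Orders documented issues so 🔴 critical items surface first.
type Severity = "critical" | "medium" | "minor";

const SEVERITY_RANK: Record<Severity, number> = {
  critical: 0,
  medium: 1,
  minor: 2,
};

interface Issue {
  description: string;
  severity: Severity;
}

function prioritize(issues: Issue[]): Issue[] {
  return [...issues].sort(
    (a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]
  );
}
```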