I’ve spent plenty of time thinking about UX, but mostly the kind you can see: screens, buttons, layouts. Then I decided to build an AI agent to run a complex, multi-step process.
It was a completely different game. Designing the “UI” was easy, because there was none. The challenge was designing the conversation itself: the way the agent asks, reacts, clarifies, and guides.
On my first attempt, it looked fine in the code. The system prompt was long, detailed, and “bulletproof” on paper. But in testing, the flow felt robotic and repetitive, and it was easy to lose track of where you were. The logic was there, but the experience was broken.
Here’s what I ran into during testing, and the changes that actually made it work.
Challenge 1: The Robot Interrogator
In the first version, the agent would drop several questions at once. It felt like filling out a form.
Problem: Multiple questions at once overload the user and lead to rushed, shallow answers.
Fix: One question at a time, always. Wait for a reply before moving on.
Prompt Example: Ask questions sequentially. In steps with multiple questions, you must ask them one at a time, waiting for the user’s answer before proceeding.
Why it worked: The whole interaction feels lighter, and each answer gets the attention it needs.
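To make this concrete, here’s a minimal sketch of how that rule can sit in code. The `call_model(messages)` helper is a stand-in for whatever chat API you’re using, and the loop just enforces the one-answer-per-turn rhythm.

```python
# Minimal sketch: the rule lives in the system prompt, and the loop sends one
# user answer per model turn. `call_model(messages)` is a stand-in for whatever
# chat-completion client you use; it returns the assistant's reply as text.

SYSTEM_PROMPT = (
    "You are guiding the user through a multi-step intake process. "
    "Ask questions sequentially. In steps with multiple questions, you must ask "
    "them one at a time, waiting for the user's answer before proceeding."
)

def run_intake(call_model):
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    while True:
        reply = call_model(messages)     # the agent asks exactly one question
        print(reply)
        messages.append({"role": "assistant", "content": reply})
        answer = input("> ")             # wait for the answer before moving on
        if answer.strip().lower() in {"quit", "done"}:
            break
        messages.append({"role": "user", "content": answer})
```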
Challenge 2: The AI with Amnesia
During testing, I noticed the agent would sometimes ask for information I’d already given. It instantly made the flow feel dumb.
Problem: Forgetting context breaks the illusion of intelligence.
Fix: Bake in context awareness. If something was said before, acknowledge it and skip ahead.
Prompt Example: If the user already answered this in a previous step, acknowledge it and move on (e.g., “Great, looks like you’ve already shared that…”).
Why it worked: Even a small “as you mentioned earlier” changes the feeling from scripted to attentive.
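A rough sketch of how that can be wired in: keep everything the user has shared in one place and fold it into the system prompt each turn, so the model has no excuse to ask twice. The field names below are invented for illustration.

```python
# Sketch of context awareness: everything the user has already shared is kept
# in one structure and injected into the system prompt each turn, so the agent
# can acknowledge it instead of asking again. Field names are illustrative.

collected = {
    "business_name": "Acme Tools",
    "core_audience": None,   # not answered yet
    "main_benefit": "saves ops teams about five hours a week",
}

def build_system_prompt(collected: dict) -> str:
    known = "\n".join(f"- {key}: {value}" for key, value in collected.items() if value)
    return (
        "You are guiding the user through a multi-step intake process.\n"
        "The user has already provided:\n"
        f"{known}\n"
        "If the user already answered a question in a previous step, acknowledge it "
        "and move on (e.g., 'Great, looks like you already shared that...')."
    )
```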
Challenge 3: Garbage In, Garbage Out
In early runs, I gave vague inputs like “we help businesses grow” just to see what happened. The output was equally vague.
Problem: If the agent passively accepts poor input, the result will be poor too.
Fix: Make it push for clarity. If an answer is too broad, ask for specifics and give examples.
Prompt Example: If a user’s answer is vague, politely ask for more specific details. If they still struggle, offer an example to guide them.
Why it worked: It raised the bar for the quality of information going in, which improved the output dramatically.
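Here’s roughly what that looks like in the system prompt. The vague-versus-specific pair below is invented, but giving the model something concrete to push toward matters more than the exact wording.

```python
# Sketch of the clarification rule: it stays in the system prompt, paired with a
# concrete vague-vs-specific example so the model knows what "more specific"
# means. The example pair is invented for illustration.

CLARIFY_RULE = (
    "If a user's answer is vague, politely ask for more specific details. "
    "If they still struggle, offer an example to guide them, for instance:\n"
    "  Vague:    'We help businesses grow.'\n"
    "  Specific: 'We help B2B SaaS startups cut churn by fixing their onboarding.'"
)

SYSTEM_PROMPT = (
    "You are guiding the user through a multi-step intake process.\n"
    + CLARIFY_RULE
)
```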
Challenge 4: Lost in Conversation
This process had 9 steps. Without signposting, it was easy to lose track.
Problem: Long flows feel endless without a sense of progress.
Fix: Announce each step clearly, like “Step 2/9: Core Audience”.
Prompt Example: Begin every new step with a standalone title line in this format: ## Step [X]/[Y]: [Step Name]
Why it worked: It gave structure. Knowing where you are makes the process feel manageable.
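A small sketch of the setup, assuming the step list is known up front. The step names below are placeholders, not the original nine.

```python
# Sketch of the signposting setup: give the model the full step list up front
# and require the title-line format, so "Step 2/9" never drifts. Step names
# here are placeholders.

STEPS = ["Business Overview", "Core Audience", "Main Problem", "Offer", "Proof"]

def signposting_rules(steps: list[str]) -> str:
    total = len(steps)
    listed = "\n".join(f"{i}. {name}" for i, name in enumerate(steps, start=1))
    return (
        f"The process has {total} steps:\n{listed}\n"
        "Begin every new step with a standalone title line in this format:\n"
        "## Step [X]/[Y]: [Step Name]\n"
        f"For example: ## Step 2/{total}: {steps[1]}"
    )
```

Feeding the whole step list in once, rather than turn by turn, also makes the count in “[Y]” consistent for the entire run.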
Challenge 5: Show You Can Think
The biggest improvement came when the agent stopped waiting for me to spell everything out. Instead, it took what I had already shared and worked out the deeper points on its own. For example, it could look at the benefits I mentioned and infer the underlying problems they solved. Then it played that back and asked if it was correct.
Problem: If the agent only records what you say, you end up with gaps. Most people don’t know how to phrase the deeper layers of a problem, and they shouldn’t have to.
Fix: Add a step where the agent generates the next layer of insight (things the user didn’t explicitly provide) and then asks for confirmation.
Prompt Example: From the details provided, infer the underlying problems they point to. Present them clearly and ask: “Did I get this right?”
Why it worked: It made the agent feel smart and collaborative. The user doesn’t need to know all the frameworks or jargon. The agent brings that structure for them, but still checks if it matches reality.
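To illustrate, here’s a minimal sketch of the infer-and-confirm step, using the same hypothetical `call_model` helper; the benefits are made up.

```python
# Sketch of the infer-and-confirm move: once the benefits are collected, ask
# the model to infer the problems behind them and play the list back for
# confirmation. `call_model` is the same stand-in chat helper.

benefits = [
    "cuts weekly reporting from four hours to twenty minutes",
    "keeps all client feedback in one searchable place",
]

def infer_and_confirm(call_model, benefits: list[str]) -> str:
    listed = "\n".join(f"- {b}" for b in benefits)
    return call_model([
        {
            "role": "system",
            "content": (
                "From the details provided, infer the underlying problems they "
                "point to. Present them clearly and ask: 'Did I get this right?'"
            ),
        },
        {"role": "user", "content": f"Benefits the user mentioned:\n{listed}"},
    ])
```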
What I learned
I went in thinking prompt design was about clever instructions for the model. Now I see most of the work is designing the interaction itself. If I had to start again, I’d spend as much time mapping the flow of the conversation as I do writing the system prompt. The tech can be perfect, but if the experience is off, the whole thing fails.