AI Diagnostics & Observability Engineer

About Sage Care

Sage Care is a fast-growing Series A healthcare technology startup founded by leaders from Apple, Uber, and Carbon Health. We recently emerged from stealth with $20 million in funding led by Yosemite, with participation from investors including General Catalyst, Metrodora Ventures (co-founded by Chelsea Clinton), OVTR.VC, SV Angel, Liquid 2 Ventures, Seven Stars, Refract Ventures, AME Cloud Ventures, and Apolo Ohno.

Our founding story and vision were profiled in Forbes, highlighting Sage Care’s mission to build an “air-traffic-control system for healthcare.”

With a strong customer pipeline, Sage Care is transforming healthcare by simplifying care navigation. Our platform makes it easier for patients to find the right doctor, helps providers focus on those who need them most, and ensures faster access to care. By harnessing clinically grounded AI and real-time optimization, we improve operational efficiency, increase system capacity, and deliver better patient outcomes at scale.

The Role

Every day, our AI agents handle thousands of real patient conversations.

Those conversations contain the information needed to improve the system. Today, much of that learning process is still manual: humans review calls, identify issues, investigate failures, and work with engineers to improve agent behavior.

Your mission is to build the systems that close that loop.

As an AI Diagnostics & Observability Engineer, you will own the feedback and quality platform that helps our agents learn from production conversations. You will build systems that detect failures, analyze root causes, incorporate human feedback, and drive continuous improvements in agent performance.

This role sits at the intersection of AI evaluation, production reliability, feedback systems, and agent quality. You will work closely with AI engineers, operations teams, and product leaders to ensure our agents become more accurate, more reliable, and more capable over time.

What You'll Do

Build AI Quality and Feedback Systems

Design systems that continuously analyze production conversations and identify quality issues
Develop automated workflows that transform human feedback into actionable improvements
Build evaluation pipelines that measure agent performance across key dimensions
Create systems that detect recurring failure patterns and cluster similar issues
Design mechanisms for routing issues to the appropriate AI, engineering, or operational workflows

Improve Agent Reliability

Investigate production failures and identify root causes across transcription, reasoning, retrieval, and orchestration systems
Build tooling that helps engineers quickly understand why an agent behaved a certain way
Establish quality metrics and reliability standards for production agents
Partner with AI engineers to measure the impact of improvements and validate fixes

Automate Learning Loops

Build systems that reduce the manual effort required to improve agents
Create workflows that propose fixes, recommendations, or improvements based on production behavior
Explore approaches for automatically incorporating feedback into agent development workflows
Help define the long-term architecture for self-improving AI systems

Partner Across Teams

Work closely with AI engineers developing production agents
Collaborate with operations teams who review calls and provide quality feedback
Help define processes for identifying, prioritizing, and resolving agent issues
Mentor junior engineers working in the AI quality and evaluation space

What We're Looking For

We're looking for an engineer who has built and operated production AI systems and is excited about improving their quality over time.

You likely have experience in one or more of the following:

Building AI evaluation platforms
Developing feedback systems for ML and LLM applications
Operating production AI agents or conversational AI systems
Creating observability, reliability, or quality tooling for AI products
Designing automated workflows that improve model or agent performance

Strong candidates will have:

5+ years of software engineering experience
Experience building production systems in Python, Java, or Rust
Experience working with LLMs, AI agents, or conversational AI applications
Strong backend engineering skills and systems thinking
Experience working with ambiguous problems and defining solutions from first principles
Excellent communication and collaboration skills

Bonus points for:

Experience at AI-native companies building production AI applications
Experience with evaluation frameworks and model quality measurement
Experience with voice AI systems
Experience designing human-in-the-loop workflows
Experience building internal platforms used by engineering teams

Why This Role Matters

AI agents are only as good as their ability to learn from real-world usage.

As our systems scale, manually identifying and fixing issues cannot keep pace.

The systems you build will directly influence how quickly our agents improve, how reliably they serve patients, and how effectively healthcare organizations can trust AI in critical workflows.

This is an opportunity to work on one of the most important challenges in production AI: turning real-world interactions into continuous learning and improvement.