AI Diagnostics & Observability Engineer

Sage Care · HQ

Ashby · Posted Mar 18, 2026 · First seen May 22, 2026

About Sage Care

Sage Care is a fast-growing Series A healthcare technology startup founded by leaders from Apple, Uber, and Carbon Health. We recently emerged from stealth with $20 million in funding led by Yosemite, with participation from investors including General Catalyst, Metrodora Ventures (co-founded by Chelsea Clinton), OVTR.VC, SV Angel, Liquid 2 Ventures, Seven Stars, Refract Ventures, AME Cloud Ventures, and Apolo Ohno.

Our founding story and vision were profiled in Forbes, highlighting Sage Care’s mission to build an “air-traffic-control system for healthcare.”

With a strong customer pipeline, Sage Care is transforming healthcare by simplifying care navigation. Our platform makes it easier for patients to find the right doctor, helps providers focus on those who need them most, and ensures faster access to care. By harnessing clinically grounded AI and real-time optimization, we improve operational efficiency, increase system capacity, and deliver better patient outcomes at scale.

The Role

Every day, our AI agents handle thousands of real patient conversations.

Those conversations contain the information needed to improve the system. Today, much of that learning process is still manual: humans review calls, identify issues, investigate failures, and work with engineers to improve agent behavior.

Your mission is to build the systems that close that loop.

As a Senior AI Reliability Engineer, you will own the feedback and quality platform that helps our agents learn from production conversations. You will build systems that detect failures, analyze root causes, incorporate human feedback, and drive continuous improvements in agent performance.

This role sits at the intersection of AI evaluation, production reliability, feedback systems, and agent quality. You will work closely with AI engineers, operations teams, and product leaders to ensure our agents become more accurate, more reliable, and more capable over time.

What You'll Do

Build AI Quality and Feedback Systems

  • Design systems that continuously analyze production conversations and identify quality issues

  • Develop automated workflows that transform human feedback into actionable improvements

  • Build evaluation pipelines that measure agent performance across key dimensions

  • Create systems that detect recurring failure patterns and cluster similar issues

  • Design mechanisms for routing issues to the appropriate AI, engineering, or operational workflows

Improve Agent Reliability

  • Investigate production failures and identify root causes across transcription, reasoning, retrieval, and orchestration systems

  • Build tooling that helps engineers quickly understand why an agent behaved a certain way

  • Establish quality metrics and reliability standards for production agents

  • Partner with AI engineers to measure the impact of improvements and validate fixes

Automate Learning Loops

  • Build systems that reduce manual effort required to improve agents

  • Create workflows that propose fixes, recommendations, or improvements based on production behavior

  • Explore approaches for automatically incorporating feedback into agent development workflows

  • Help define the long-term architecture for self-improving AI systems

Partner Across Teams

  • Work closely with AI engineers developing production agents

  • Collaborate with operations teams who review calls and provide quality feedback

  • Help define processes for identifying, prioritizing, and resolving agent issues

  • Mentor junior engineers working in the AI quality and evaluation space

What We're Looking For

We're looking for an engineer who has built and operated production AI systems and is excited about improving their quality over time.

You likely have experience in one or more of the following:

  • Building AI evaluation platforms

  • Developing feedback systems for ML and LLM applications

  • Operating production AI agents or conversational AI systems

  • Creating observability, reliability, or quality tooling for AI products

  • Designing automated workflows that improve model or agent performance

Strong candidates will have:

  • 5+ years of software engineering experience

  • Experience building production systems in Python, Java, or Rust

  • Experience working with LLMs, AI agents, or conversational AI applications

  • Strong backend engineering skills and systems thinking

  • Experience working with ambiguous problems and defining solutions from first principles

  • Excellent communication and collaboration skills

Bonus points for:

  • Experience at AI-native companies building production AI applications

  • Experience with evaluation frameworks and model quality measurement

  • Experience with voice AI systems

  • Experience designing human-in-the-loop workflows

  • Experience building internal platforms used by engineering teams

Why This Role Matters

AI agents are only as good as their ability to learn from real-world usage.

As our systems scale, manually identifying and fixing issues cannot keep pace.

The systems you build will directly influence how quickly our agents improve, how reliably they serve patients, and how effectively healthcare organizations can trust AI in critical workflows.

This is an opportunity to work on one of the most important challenges in production AI: turning real-world interactions into continuous learning and improvement.