Our product is designed to help you quantify the performance of your LLM-powered app and identify any inconsistencies, hallucinations, or other mistakes that may affect your product's quality.
With that track the product performance and usage metrics over time.
All evaluators can be run programmatically using our SDK, and be viewed and acted on via our SaaS platform.
With these evaluators, you exactly know what input or output your product is not faithful, missing answer relevancy, or where the language is not consistent.
✓ Context Contains Enough Information: Does the retrieved context contain enough information to answer the user?
✓ Faithfulness: Is the response faithful to the context?
✓ Answer Relevancy: Does the response answer the user's input?
LangWatch evaluators run on your entire RAG pipeline, not just on the answers.
Create your own evaluator on LangWatch in no time. Use an our LLM boolean check, score, or a custom function such as "matches Regex" to create your evaluator. Become very creative to get the best results!
✓ PII Detection
Detect when user input or LLM response contains any personally identifiable information.
✓ Content Moderation
Determine if a response is harmful, toxic, violent, threatening or sexual.
✓Jailbreaking users (Prompt injections)
Detects if the input attempts to Jailbreak the LLM to produce answers and execute tasks that it was not supposed to.
Dive deep into your analytics and segment by any attribute. With all the metrics available you can create your own graphs.