Generative AI Performance

Our product is designed to help you quantify the performance of your LLM-powered app and identify any inconsistencies, hallucinations, or other mistakes that may affect your product's quality.

With that track the product performance and usage metrics over time.

Evaluations Guardrails Analytics

Evaluations

Ragas Evaluators

✓ Context Contains Enough Information: Does the retrieved context contain enough information to answer the user?

✓ Faithfulness: Is the response faithful to the context?

✓ Answer Relevancy: Does the response answer the user's input?
‍
LangWatch evaluators run on your entire RAG pipeline, not just on the answers.

Evaluations

Custom Evaluations for your GenAI

Create your own evaluator on LangWatch in no time. Use an our LLM boolean check, score, or a custom function such as "matches Regex" to create your evaluator. Become very creative to get the best results!

Detect non-behaving users

✓ PII Detection
Detect when user input or LLM response contains any personally identifiable information.
‍
✓ Content Moderation
Determine if a response is harmful, toxic, violent, threatening or sexual.
‍
✓Jailbreaking users (Prompt injections)
Detects if the input attempts to Jailbreak the LLM to produce answers and execute tasks that it was not supposed to.

ANalytics

Track GenAI performance over time

Dive deep into your analytics and segment by any attribute. With all the metrics available you can create your own graphs. ‍

Generative AI Performance

Langwatch offers a large libary of preset evaluators

Ragas Evaluators

Custom Evaluations for your GenAI

Detect non-behaving users

Track GenAI performance over time

Ready to get started? Create an account today