Build a Groundedness Verification Tool Using Upstage API and LangChain

Upstage’s Groundedness Check service provides a powerful API for verifying that AI-generated responses are firmly anchored in reliable source material. By submitting context–answer pairs to the Upstage endpoint, we can instantly determine whether the supplied context supports a given answer and receive a confidence assessment of that grounding. In this tutorial, we demonstrate how to utilize Upstage’s core capabilities, including single-shot verification, batch processing, and multi-domain testing, to ensure that our AI systems produce factual and trustworthy content across diverse subject areas.
!pip install -qU langchain-core langchain-upstage
import os
import json
from typing import List, Dict, Any
from langchain_upstage import UpstageGroundednessCheck
os.environ["UPSTAGE_API_KEY"] = "Use Your API Key Here"
We install the latest LangChain core and the Upstage integration package, import the necessary Python modules for data handling and typing, and set our Upstage API key in the environment to authenticate all subsequent groundedness check requests.
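Hardcoding the key works for a quick demo, but for anything shared it is safer to read the key from the environment or prompt for it at runtime. Here is a minimal sketch using the standard-library getpass module (an optional addition, not part of the original tutorial code):

import os
from getpass import getpass

# Prompt for the key only when it isn't already set in the environment
if not os.environ.get("UPSTAGE_API_KEY"):
    os.environ["UPSTAGE_API_KEY"] = getpass("Enter your Upstage API key: ")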
class AdvancedGroundednessChecker:
    """Advanced wrapper for Upstage Groundedness Check with batch processing and analysis"""

    def __init__(self):
        self.checker = UpstageGroundednessCheck()
        self.results = []

    def check_single(self, context: str, answer: str) -> Dict[str, Any]:
        """Check groundedness for a single context-answer pair"""
        request = {"context": context, "answer": answer}
        response = self.checker.invoke(request)
        result = {
            "context": context,
            "answer": answer,
            "grounded": response,
            "confidence": self._extract_confidence(response),
        }
        self.results.append(result)
        return result

    def batch_check(self, test_cases: List[Dict[str, str]]) -> List[Dict[str, Any]]:
        """Process multiple test cases sequentially"""
        batch_results = []
        for case in test_cases:
            result = self.check_single(case["context"], case["answer"])
            batch_results.append(result)
        return batch_results

    def _extract_confidence(self, response) -> str:
        """Extract a rough confidence label from the verdict"""
        verdict = str(response).lower().replace(" ", "")
        if "notgrounded" in verdict:  # test the negative verdict first, since
            return "low"              # "notgrounded" also contains "grounded"
        if "grounded" in verdict:
            return "high"
        return "medium"

    def analyze_results(self) -> Dict[str, Any]:
        """Analyze accumulated results"""
        total = len(self.results)
        # Count only exact "grounded" verdicts; a bare substring test would
        # also match "notGrounded"
        grounded = sum(
            1 for r in self.results
            if str(r["grounded"]).lower().replace(" ", "") == "grounded"
        )
        return {
            "total_checks": total,
            "grounded_count": grounded,
            "not_grounded_count": total - grounded,
            "accuracy_rate": grounded / total if total > 0 else 0,
        }
checker = AdvancedGroundednessChecker()
The AdvancedGroundednessChecker class wraps Upstage’s groundedness API into a simple, reusable interface that lets us run both single and batch context–answer checks while accumulating results. It also includes helper methods to extract a confidence label from each response and compute overall accuracy statistics across all checks.
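Before running the structured test cases, it is worth seeing what the underlying integration returns on its own. The minimal direct call below bypasses the wrapper; it uses the same {"context": ..., "answer": ...} request shape as check_single above, and the verdict comes back as a short string (e.g., grounded or notGrounded, per the LangChain Upstage integration). The sample sentence is purely illustrative:

# Direct call without the wrapper; requires UPSTAGE_API_KEY to be set
gc = UpstageGroundednessCheck()
verdict = gc.invoke({
    "context": "Seoul is the capital of South Korea.",
    "answer": "South Korea's capital is Seoul.",
})
print(verdict)  # a short verdict string, e.g. "grounded"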
print("=== Test Case 1: Height Discrepancy ===")
result1 = checker.check_single(
context="Mauna Kea is an inactive volcano on the island of Hawai'i.",
answer="Mauna Kea is 5,207.3 meters tall."
)
print(f"Result: {result1['grounded']}")
print("\n=== Test Case 2: Correct Information ===")
result2 = checker.check_single(
context="Python is a high-level programming language created by Guido van Rossum in 1991. It emphasizes code readability and simplicity.",
answer="Python was made by Guido van Rossum & focuses on code readability."
)
print(f"Result: {result2['grounded']}")
print("\n=== Test Case 3: Partial Information ===")
result3 = checker.check_single(
context="The Great Wall of China is approximately 13,000 miles long and took over 2,000 years to build.",
answer="The Great Wall of China is very long."
)
print(f"Result: {result3['grounded']}")
print("\n=== Test Case 4: Contradictory Information ===")
result4 = checker.check_single(
context="Water boils at 100 degrees Celsius at sea level atmospheric pressure.",
answer="Water boils at 90 degrees Celsius at sea level."
)
print(f"Result: {result4['grounded']}")
We run four standalone groundedness checks through the AdvancedGroundednessChecker, covering a factual error in height, a correct statement, a vague partial match, and a contradictory claim, and print each Upstage verdict to illustrate how the service flags grounded versus ungrounded answers across these scenarios.
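Each call also stores a structured record (context, answer, verdict, and derived confidence label) on the checker instance. Since we imported json earlier, we can pretty-print one record to inspect it; default=str guards against any non-serializable response objects:

# Pretty-print the stored record for the first check
print(json.dumps(result1, indent=2, default=str))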
print("\n=== Batch Processing Example ===")
test_cases = [
{
"context": "Shakespeare wrote Romeo and Juliet in the late 16th century.",
"answer": "Romeo and Juliet was written by Shakespeare."
},
{
"context": "The speed of light is approximately 299,792,458 meters per second.",
"answer": "Light travels at about 300,000 kilometers per second."
},
{
"context": "Earth has one natural satellite called the Moon.",
"answer": "Earth has two moons."
}
]
batch_results = checker.batch_check(test_cases)
for i, result in enumerate(batch_results, 1):
print(f"Batch Test {i}: {result['grounded']}")
print("\n=== Results Analysis ===")
analysis = checker.analyze_results()
print(f"Total checks performed: {analysis['total_checks']}")
print(f"Grounded responses: {analysis['grounded_count']}")
print(f"Not grounded responses: {analysis['not_grounded_count']}")
print(f"Groundedness rate: {analysis['accuracy_rate']:.2%}")
print("\n=== Multi-domain Testing ===")
domains = {
"Science": {
"context": "Photosynthesis is the process by which plants convert sunlight, carbon dioxide, & water into glucose and oxygen.",
"answer": "Plants use photosynthesis to make food from sunlight and CO2."
},
"History": {
"context": "World War II ended in 1945 after the surrender of Japan following the atomic bombings.",
"answer": "WWII ended in 1944 with Germany's surrender."
},
"Geography": {
"context": "Mount Everest is the highest mountain on Earth, located in the Himalayas at 8,848.86 meters.",
"answer": "Mount Everest is the tallest mountain and is located in the Himalayas."
}
}
for domain, test_case in domains.items():
result = checker.check_single(test_case["context"], test_case["answer"])
print(f"{domain}: {result['grounded']}")
We execute a series of batched groundedness checks on predefined test cases, print the individual Upstage judgments, and compute and display overall accuracy metrics. We then run multi-domain validations in science, history, and geography to illustrate how Upstage handles groundedness across different subject areas.
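Because each check is an independent API call, batch_check runs in simple sequence. For larger workloads, the same calls can be issued concurrently; the sketch below uses the standard-library ThreadPoolExecutor with an illustrative max_workers value, and is an optional extension rather than part of the tutorial code (Upstage-side rate limits may still apply):

from concurrent.futures import ThreadPoolExecutor

def parallel_batch_check(checker_instance, cases, max_workers=4):
    """Run check_single concurrently across test cases, preserving order"""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(checker_instance.check_single, c["context"], c["answer"])
            for c in cases
        ]
        return [f.result() for f in futures]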
def create_test_report(checker_instance):
    """Generate a detailed test report"""
    report = {
        "summary": checker_instance.analyze_results(),
        "detailed_results": checker_instance.results,
        "recommendations": []
    }
    accuracy = report["summary"]["accuracy_rate"]
    if accuracy < 0.7:
        report["recommendations"].append("Consider reviewing answer generation process")
    if accuracy > 0.9:
        report["recommendations"].append("High accuracy - system performing well")
    return report

print("\n=== Final Test Report ===")
report = create_test_report(checker)
print(f"Overall Performance: {report['summary']['accuracy_rate']:.2%}")
print("Recommendations:", report["recommendations"])

print("\n=== Tutorial Complete ===")
print("This tutorial demonstrated:")
print("• Basic groundedness checking")
print("• Batch processing capabilities")
print("• Multi-domain testing")
print("• Results analysis and reporting")
print("• Advanced wrapper implementation")
Finally, we define a create_test_report helper that compiles all accumulated groundedness checks into a summary report, complete with overall accuracy and tailored recommendations, and then print the final performance metrics along with a recap of the tutorial's key demonstrations.
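Because the report is built from plain dicts and lists, it can be written straight to disk with the json module imported at the start; the filename here is arbitrary:

# Persist the full report for later auditing; default=str handles any
# non-serializable verdict objects
with open("groundedness_report.json", "w") as f:
    json.dump(report, f, indent=2, default=str)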
In conclusion, with Upstage’s Groundedness Check at our disposal, we gain a scalable, domain-agnostic solution for real-time fact verification and confidence scoring. Whether we’re validating isolated claims or processing large batches of responses, Upstage delivers clear, grounded/not-grounded judgments and confidence metrics that enable us to monitor accuracy rates and generate actionable quality reports. By integrating this service into our workflow, we can enhance the reliability of AI-generated outputs and maintain rigorous standards of factual integrity across all applications.