How to Build a Knowledge Base AI Agent Using LangChain, LangGraph, LangSmith, Azure, AWS, and OCI
GENERATIVE AI
Subbu D
3/1/2025 · 2 min read


Artificial Intelligence agents that can understand, recall, and reason over enterprise data are at the heart of digital transformation. In this blog, we’ll walk through building a Knowledge Base AI Agent using cutting-edge open-source tools like LangChain, LangGraph, and LangSmith, integrated with cloud services from Azure, AWS, and OCI (Oracle Cloud Infrastructure).
Whether you're developing an AI assistant for internal documentation, customer support, or compliance management, this guide provides a modular, production-ready architecture for scalable, reliable solutions.
🔧 Tools & Technologies
Framework: LangChain
Workflow Engine: LangGraph
Observability: LangSmith
Embedding/LLM Services: Azure OpenAI, AWS Bedrock, OCI AI Services
Vector Store: Pinecone, ChromaDB, FAISS, Azure Cognitive Search, Amazon Kendra, or OCI Search with OpenSearch
Storage: Azure Blob Storage, Amazon S3, or OCI Object Storage
🧠 Step 1: Define the Use Case
Let’s say you're building an AI assistant that answers questions based on your company’s internal documentation stored across different formats (PDFs, DOCX, HTML, etc.).
Your objectives:
Ingest unstructured documents
Embed & store knowledge in a vector database
Use LangChain to orchestrate retrieval-augmented generation (RAG)
Build multi-step reasoning with LangGraph
Monitor pipeline and fine-tune via LangSmith
Deploy securely via Azure, AWS, or OCI
📥 Step 2: Ingest and Chunk Documents
Use LangChain’s document loaders to ingest your data.
from langchain.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Recursively load every PDF under data/
loader = DirectoryLoader('data/', glob='**/*.pdf', loader_cls=PyPDFLoader)
documents = loader.load()

# Split into overlapping chunks so retrieval keeps context across boundaries
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = splitter.split_documents(documents)
🧬 Step 3: Generate Embeddings
Option A: Azure OpenAI
from langchain.embeddings import AzureOpenAIEmbeddings
embeddings = AzureOpenAIEmbeddings(
    deployment="your-embedding-deployment",
    openai_api_key="your-key",
    openai_api_base="https://<your-resource>.openai.azure.com/",
    openai_api_version="2023-05-15"
)
Option B: AWS Bedrock (Titan Embeddings)
from langchain.embeddings import BedrockEmbeddings
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")
Option C: OCI Generative AI Embeddings
Use the OCI Generative AI service via LangChain's community integration, or wrap the OCI Generative AI SDK in a custom subclass of LangChain's Embeddings base class.
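A minimal sketch using the langchain_community integration; the model ID, region endpoint, and compartment OCID below are placeholders to replace with your own:

from langchain_community.embeddings import OCIGenAIEmbeddings

embeddings = OCIGenAIEmbeddings(
    model_id="cohere.embed-english-v3.0",  # placeholder model ID
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",  # placeholder region endpoint
    compartment_id="ocid1.compartment.oc1..example",  # placeholder compartment OCID
)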
🧠 Step 4: Store Embeddings in a Vector Store
Option A: Azure Cognitive Search
LangChain supports Azure AI Search (formerly Azure Cognitive Search) out of the box via its AzureSearch vector store.

from langchain.vectorstores.azuresearch import AzureSearch

vectorstore = AzureSearch(
    azure_search_endpoint="https://<your-service>.search.windows.net",
    azure_search_key="your-key",
    index_name="docs-index",
    embedding_function=embeddings.embed_query
)
Option B: Amazon Kendra / OpenSearch
On AWS, Amazon Kendra provides managed retrieval while OpenSearch gives you finer control; for local development, LangChain's FAISS or Chroma integrations are the quickest path.
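For example, a quick local-dev setup with FAISS, reusing the docs chunks from Step 2 and the embeddings client from Step 3:

from langchain.vectorstores import FAISS

# Build an in-memory vector index; fine for local development, not production
vectorstore = FAISS.from_documents(docs, embeddings)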
🤖 Step 5: Create a RAG Chain with LangChain
from langchain.chains import RetrievalQA
from langchain.chat_models import AzureChatOpenAI

# temperature=0 keeps answers deterministic and grounded in retrieved context
llm = AzureChatOpenAI(deployment_name="gpt-4", temperature=0)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever()
)
🔄 Step 6: Build Reasoning Workflows with LangGraph
LangGraph lets you define complex workflows as graphs of LangChain components.
from typing import TypedDict
from langgraph.graph import StateGraph, END

# Graph state: the user question plus the generated answer
class QAState(TypedDict):
    input: str
    answer: str

# Each node receives the state and returns the keys it updates
def answer(state: QAState) -> dict:
    return {"answer": qa_chain.invoke({"query": state["input"]})["result"]}

builder = StateGraph(QAState)
builder.add_node("question", answer)
builder.set_entry_point("question")
builder.add_edge("question", END)
graph = builder.compile()

result = graph.invoke({"input": "What is our refund policy?"})
For complex flows (e.g., classification → summarization → answer generation), define multiple nodes and conditional branches.
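Here is a hedged sketch of such a flow; the summarize node and the length-based routing rule are hypothetical stand-ins for real LLM-backed steps:

from typing import TypedDict
from langgraph.graph import StateGraph, END

class FlowState(TypedDict, total=False):
    input: str
    answer: str

def summarize(state: FlowState) -> dict:
    # Stub: a real node would call the LLM to condense a long question
    return {"input": state["input"][:500]}

def answer(state: FlowState) -> dict:
    return {"answer": qa_chain.invoke({"query": state["input"]})["result"]}

def route(state: FlowState) -> str:
    # Hypothetical rule: condense long questions before answering
    return "summarize" if len(state["input"]) > 500 else "answer"

flow = StateGraph(FlowState)
flow.add_node("summarize", summarize)
flow.add_node("answer", answer)
flow.set_conditional_entry_point(route, {"summarize": "summarize", "answer": "answer"})
flow.add_edge("summarize", "answer")
flow.add_edge("answer", END)
graph = flow.compile()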
📊 Step 7: Monitor and Debug with LangSmith
LangSmith captures traces, errors, and performance metrics.
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-key"
Run your pipeline and monitor everything in the LangSmith dashboard. You can tag runs, evaluate prompt quality, and visualize graphs.
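For example, you can group traces under a named project and tag individual runs; the project and tag names below are arbitrary:

os.environ["LANGCHAIN_PROJECT"] = "kb-agent"  # arbitrary project name

# Tags appear on the trace in LangSmith, making runs easy to filter
qa_chain.invoke(
    {"query": "What is our refund policy?"},
    config={"tags": ["kb-agent", "faq"]},
)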
☁️ Step 8: Deployment via Azure / AWS / OCI
Deploy the LangChain agent as a FastAPI app:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    input: str

@app.post("/query")
def query(q: Query):
    # Take the question from the JSON body rather than a query parameter
    return {"response": qa_chain.run(q.input)}
Hosting Options:
Azure: Azure App Service / Azure Kubernetes Service
AWS: ECS / Lambda / SageMaker Endpoint
OCI: Oracle Functions / OCI Data Science Notebook
Security Tip:
Use managed identity or secret vaults (Azure Key Vault, AWS Secrets Manager, OCI Vault) for API keys and credentials.
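For example, on Azure the app's managed identity can resolve the OpenAI key from Key Vault at startup; the vault URL and secret name below are placeholders:

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# DefaultAzureCredential picks up the app's managed identity when running in Azure
client = SecretClient(
    vault_url="https://<your-vault>.vault.azure.net",
    credential=DefaultAzureCredential(),
)
openai_api_key = client.get_secret("azure-openai-key").value  # placeholder secret name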
📌 Bonus: Real-time Collaboration with LangGraph Agents
Use LangGraph’s agent support to build agents that can:
Ask clarifying questions
Search internal systems
Escalate to a human when confidence is low (sketched below)
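A hedged sketch of the escalation pattern, extending the flow graph from Step 6; the confidence heuristic and escalate node are hypothetical, and the conditional edge replaces the earlier answer → END edge:

def confidence_gate(state: FlowState) -> str:
    # Hypothetical heuristic: hedging language in the answer triggers escalation
    return "escalate" if "i'm not sure" in state["answer"].lower() else END

def escalate(state: FlowState) -> dict:
    # Stub: a real node would open a ticket or notify a human reviewer
    return {"answer": state["answer"] + " (escalated for human review)"}

flow.add_node("escalate", escalate)
# Replaces flow.add_edge("answer", END) from the Step 6 sketch
flow.add_conditional_edges("answer", confidence_gate, {"escalate": "escalate", END: END})
flow.add_edge("escalate", END)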
✅ Conclusion
By combining LangChain's orchestration, LangGraph's flow control, LangSmith's observability, and the Azure, AWS, or OCI cloud ecosystems, you can build scalable Knowledge Base AI agents tailored to real-world enterprise needs.