Saturday, April 12, 2025

How to Create an AI Agent to Read and Analyze PDFs with CopilotKit, CrewAI, and Next.js


If you’ve ever spent hours combing through PDFs for one tiny piece of information, you know the struggle is real. Researchers juggle dozens of academic papers. Students wrestle with lecture notes and textbooks. Developers dig through technical docs for that one obscure parameter. It’s like searching for a needle in a haystack, except the haystack is digital, and your patience is running on fumes.

An AI-powered PDF assistant flips the script. Instead of you doing the heavy lifting, the AI does it for you. It reads your PDFs, understands them, and answers your questions in plain English. Need a summary of a 50-page report? Done. Want to know the publication date of a paper? Got it. Curious about a specific topic buried in a 200-page manual? No problem. It’s like having a 24/7 research buddy who’s lightning-fast, never complains, and always knows where to look.

Here’s the best part: we’re building this with tools that make it fun and approachable, even if you’re not a coding wizard. We’ll use:

  • CopilotKit for a slick, chatty front-end interface.
  • Next.js to wrap it all in a modern web app.
  • FastAPI for a zippy back-end to handle uploads and queries.
  • CrewAI as the brainy AI agent that reads and analyzes your PDFs.

Think of it as assembling a superhero team: each tool has its own superpower, and together, they create something unstoppable. Let’s break it down step-by-step, in plain English, so you can follow along without getting lost in tech jargon.

Before we dive into the code, let’s map out the plan. We’re creating a web app that lets you upload PDFs, ask questions about them, and get smart, context-aware answers. It’s like giving your PDFs a voice and a brain. The app will have three main parts:

  1. Front-End (Next.js + CopilotKit): This is the face of our app—a clean, user-friendly website where you can upload PDFs and chat with your AI assistant. It’s built with Next.js for speed and CopilotKit for a polished chat interface that feels like texting a friend.
  2. Back-End (FastAPI): This is the behind-the-scenes engine. It stores your PDFs, handles your questions, and passes them to the AI for processing. FastAPI keeps things lightweight and blazing fast.
  3. AI Brain (CrewAI): This is the star of the show—an AI agent that reads your PDFs, understands their content, and answers your questions with precision. CrewAI is like a super-smart reader who can summarize, search, and explain anything in your documents.

Here’s how it all comes together:

  • You upload a PDF to the web app.
  • The back-end saves it and sends it to CrewAI for analysis.
  • You ask a question in the chat interface (e.g., “What’s the main argument in this paper?”).
  • CrewAI digs through the PDF, finds the answer, and sends it back in plain English.
  • The front-end displays the response, nice and tidy, with no fuss.

This setup means no more endless scrolling, no more CTRL+F marathons, and no more guesswork. Plus, we’re keeping the code minimal by letting CrewAI handle the heavy lifting, so our front-end stays sleek and our back-end stays simple. Ready to make it happen? Let’s start with the back-end—our app’s foundation.
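To make the upload-then-query flow concrete before we touch any frameworks, here's a toy in-memory simulation of it. Everything here is a stand-in: the dict plays the role of our file storage, and the canned answer plays the role of CrewAI.

```python
# Toy simulation of the app's data flow: upload -> store -> query -> answer.
# The real app swaps the dict for FastAPI file storage and the stub for CrewAI.

store = {}  # filename -> extracted text

def upload(filename, text):
    store[filename] = text
    return {"message": f"File {filename} uploaded successfully"}

def query(filename, question):
    content = store.get(filename)
    if content is None:
        return {"error": "File not found"}
    # Stand-in for the AI agent: a canned answer built from the stored text
    return {"question": question, "answer": f"Based on {filename}: {content[:40]}..."}

upload("paper.pdf", "This paper argues that transformers generalize well.")
print(query("paper.pdf", "What's the main argument?")["answer"])
```

The real version follows exactly this shape; the rest of the tutorial just replaces each stub with production-grade pieces.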

Step 1: Building the Back-End with FastAPI

Alright, let’s give our AI a place to live. The back-end is like the engine room of our app—it handles PDF uploads, stores them, processes user questions, and talks to CrewAI to get answers. We’re using FastAPI, a Python framework that’s fast (hence the name), easy to use, and perfect for building APIs that connect our front-end to our AI.

Setting Up FastAPI

First, we need to set up a FastAPI application. If you don’t have Python installed, grab it from python.org (version 3.10 or higher, which CrewAI requires). Then, create a new project folder—let’s call it pdf-assistant—and set up a virtual environment to keep things tidy:

mkdir pdf-assistant
cd pdf-assistant
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate

Now, install FastAPI and a few dependencies we’ll need, like uvicorn (to run the server) and python-multipart (to handle file uploads):

pip install fastapi uvicorn python-multipart

Let’s create a basic FastAPI app. In your project folder, create a file called main.py and add this code:

from fastapi import FastAPI, File, UploadFile
from fastapi.responses import JSONResponse
import os

app = FastAPI()

# Directory to store uploaded PDFs
UPLOAD_DIR = "./uploads"
if not os.path.exists(UPLOAD_DIR):
    os.makedirs(UPLOAD_DIR)

@app.post("/upload")
async def upload_pdf(file: UploadFile = File(...)):
    file_path = os.path.join(UPLOAD_DIR, file.filename)
    with open(file_path, "wb") as f:
        f.write(await file.read())
    return JSONResponse(content={"message": f"File {file.filename} uploaded successfully"})

@app.get("/")
async def root():
    return {"message": "Welcome to the PDF Assistant API!"}

This code sets up a FastAPI app with two endpoints:

  • /upload: Accepts PDF files and saves them to an uploads folder.
  • /: A simple welcome message to test that the server’s running.

To run the server, use this command:

uvicorn main:app --reload

Open your browser and go to http://127.0.0.1:8000. You should see a JSON message: {"message": "Welcome to the PDF Assistant API!"}. If you want to check the API docs, head to http://127.0.0.1:8000/docs—FastAPI generates them automatically, which is super handy for testing.

Adding a Query Endpoint

Now, let’s add an endpoint to handle user questions. For now, we’ll make it simple and just echo back the query (we’ll connect it to CrewAI later). Update main.py like this:

@app.post("/query")
async def query_pdf(data: dict):
    question = data.get("question")
    return JSONResponse(content={"question": question, "answer": "This is a placeholder response!"})

This /query endpoint expects a JSON payload with a question field and returns it with a placeholder answer. We’ll flesh it out with CrewAI in a bit, but first, let’s make sure PDFs are stored properly and retrievable.

Storing and Managing PDFs

To keep track of uploaded PDFs, let’s add a simple file management system. We’ll store metadata (like filenames and upload dates) in a JSON file to make it easy to list available documents. Create a documents.json file in the project folder and update main.py to manage it:

import json
from datetime import datetime

# Load or initialize documents metadata
DOCS_FILE = "documents.json"
if not os.path.exists(DOCS_FILE):
    with open(DOCS_FILE, "w") as f:
        json.dump([], f)

def save_document_metadata(filename):
    with open(DOCS_FILE, "r") as f:
        docs = json.load(f)
    docs.append({"filename": filename, "uploaded_at": datetime.now().isoformat()})
    with open(DOCS_FILE, "w") as f:
        json.dump(docs, f)

@app.post("/upload")
async def upload_pdf(file: UploadFile = File(...)):
    file_path = os.path.join(UPLOAD_DIR, file.filename)
    with open(file_path, "wb") as f:
        f.write(await file.read())
    save_document_metadata(file.filename)
    return JSONResponse(content={"message": f"File {file.filename} uploaded successfully"})

@app.get("/documents")
async def list_documents():
    with open(DOCS_FILE, "r") as f:
        docs = json.load(f)
    return JSONResponse(content={"documents": docs})

Now, when you upload a PDF, its metadata gets saved, and the /documents endpoint lists all uploaded files. Test it by uploading a PDF using the API docs at /docs or with a tool like Postman. You’re building a solid foundation—nice work!
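After a couple of uploads, documents.json will contain something like this (filenames and timestamps here are illustrative):

```json
[
  {"filename": "research-paper.pdf", "uploaded_at": "2025-04-12T10:15:32.123456"},
  {"filename": "user-manual.pdf", "uploaded_at": "2025-04-12T10:18:07.654321"}
]
```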

Step 2: Bringing in the AI with CrewAI

Here’s where things get exciting: we’re giving our app a brain with CrewAI, an AI framework that excels at understanding and processing documents. CrewAI will read our PDFs, extract their content, and answer questions with context-aware responses. Let’s integrate it into our FastAPI back-end.

Installing CrewAI

First, install CrewAI (make sure you’re in your virtual environment):

pip install crewai

CrewAI needs a language model to power its analysis. For this tutorial, we’ll assume you’re using a model like OpenAI’s GPT-4 (via an API key) or a local model like LLaMA. If you’re using OpenAI, set your API key as an environment variable:

export OPENAI_API_KEY="your-api-key-here"

Processing PDFs with CrewAI

To analyze PDFs, we need to extract their text. Let’s use the PyPDF2 library for this. Install it:

pip install PyPDF2
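One heads-up before we wire things together: we’ll be passing the extracted text straight into the agent’s prompt, and a very long PDF can exceed a model’s context window. A simple mitigation is to split the text into overlapping chunks and send only the relevant ones. Here’s a minimal sketch—`chunk_text` is our own helper, not part of PyPDF2 or CrewAI:

```python
def chunk_text(text, size=1000, overlap=100):
    """Split text into overlapping character chunks.

    Overlap keeps sentences that straddle a boundary visible in both
    neighboring chunks, so context isn't lost at the cut points.
    """
    assert size > overlap, "overlap must be smaller than size"
    chunks = []
    start = 0
    step = size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks

pages = "x" * 2500  # stand-in for extracted PDF text
chunks = chunk_text(pages)
print(len(chunks))  # → 3
```

For this tutorial we keep things simple and send the full text, which works fine for short documents; for long ones, chunking plus a relevance filter is the usual fix.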

Now, let’s update our /query endpoint to use CrewAI. We’ll extract text from the PDF, pass it to a CrewAI agent, and get a response. Here’s the updated main.py:

from fastapi import FastAPI, File, UploadFile, HTTPException
from fastapi.responses import JSONResponse
import os
import json
from datetime import datetime
import PyPDF2
from crewai import Agent, Task, Crew

app = FastAPI()

# Directory and metadata setup
UPLOAD_DIR = "./uploads"
DOCS_FILE = "documents.json"
if not os.path.exists(UPLOAD_DIR):
    os.makedirs(UPLOAD_DIR)
if not os.path.exists(DOCS_FILE):
    with open(DOCS_FILE, "w") as f:
        json.dump([], f)

def save_document_metadata(filename):
    with open(DOCS_FILE, "r") as f:
        docs = json.load(f)
    docs.append({"filename": filename, "uploaded_at": datetime.now().isoformat()})
    with open(DOCS_FILE, "w") as f:
        json.dump(docs, f)

def extract_pdf_text(file_path):
    try:
        with open(file_path, "rb") as f:
            reader = PyPDF2.PdfReader(f)
            text = ""
            for page in reader.pages:
                text += page.extract_text() or ""
            return text
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error reading PDF: {str(e)}")

# CrewAI setup
pdf_analyst = Agent(
    role="PDF Analyst",
    goal="Analyze PDFs and answer questions accurately",
    backstory="You’re a skilled researcher who reads PDFs with laser focus, extracting insights and answering questions with clarity.",
    verbose=True,
    allow_delegation=False
)

@app.post("/upload")
async def upload_pdf(file: UploadFile = File(...)):
    file_path = os.path.join(UPLOAD_DIR, file.filename)
    with open(file_path, "wb") as f:
        f.write(await file.read())
    save_document_metadata(file.filename)
    return JSONResponse(content={"message": f"File {file.filename} uploaded successfully"})

@app.get("/documents")
async def list_documents():
    with open(DOCS_FILE, "r") as f:
        docs = json.load(f)
    return JSONResponse(content={"documents": docs})

@app.post("/query")
async def query_pdf(data: dict):
    question = data.get("question")
    filename = data.get("filename")
    if not question or not filename:
        raise HTTPException(status_code=400, detail="Question and filename are required")
    file_path = os.path.join(UPLOAD_DIR, filename)
    if not os.path.exists(file_path):
        raise HTTPException(status_code=404, detail="File not found")
    # Extract text
    pdf_text = extract_pdf_text(file_path)
    # Create a CrewAI task
    task = Task(
        description=f"Answer the following question based on this PDF content: {question}\n\nContent: {pdf_text}",
        agent=pdf_analyst,
        expected_output="A clear, concise answer to the question."
    )
    # Run the Crew (verbose is a boolean; kickoff() returns a result object,
    # so we convert it to a string before returning JSON)
    crew = Crew(agents=[pdf_analyst], tasks=[task], verbose=True)
    result = crew.kickoff()
    return JSONResponse(content={"question": question, "answer": str(result)})

This code does a lot, so let’s unpack it:

  • We define a pdf_analyst agent with CrewAI, giving it a role and goal to analyze PDFs.
  • The /query endpoint takes a question and filename, checks if the file exists, and extracts its text using PyPDF2.
  • We create a CrewAI task that feeds the PDF text and question to the agent.
  • The Crew runs the task, and we return the AI’s response as JSON.

Test it by uploading a PDF and sending a POST request to /query with a JSON body like:

{
  "question": "What is the main topic of this document?",
  "filename": "your-uploaded-file.pdf"
}

You should get a response with the AI’s answer. If the PDF is a research paper, for example, the AI might say, “The main topic is machine learning applications in healthcare.” Cool, right? Our back-end is now a PDF-reading powerhouse!

Step 3: Crafting a Sleek Front-End with Next.js and CopilotKit

Our back-end is humming along, storing PDFs and answering questions with CrewAI’s help. Now, let’s give it a friendly face—a web app where users can upload files and chat with the AI. We’re using Next.js for a modern, fast front-end and CopilotKit for a chat interface that’s smooth and intuitive.

Setting Up Next.js

First, create a new Next.js app. In a new terminal (outside your FastAPI project), run:

npx create-next-app@latest pdf-assistant-frontend
cd pdf-assistant-frontend
npm install

Choose the default options for simplicity (TypeScript, Tailwind, etc.). Start the development server to make sure it’s working:

npm run dev

Visit http://localhost:3000 to see the default Next.js page. Now, let’s install CopilotKit to add our chat interface:

npm install @copilotkit/react-ui

Building the UI

We want two main components:

  • A sidebar for uploading PDFs and listing documents.
  • A chat interface for asking questions and seeing answers.

Let’s replace the default page.tsx in app/page.tsx with this:

"use client";

import { useState, useEffect } from "react";
import { CopilotChat } from "@copilotkit/react-ui";
import styles from "./page.module.css";

export default function Home() {
  const [documents, setDocuments] = useState([]);
  const [selectedFile, setSelectedFile] = useState(null);
  const [uploading, setUploading] = useState(false);

  // Fetch list of documents
  useEffect(() => {
    fetch("http://127.0.0.1:8000/documents")
      .then((res) => res.json())
      .then((data) => setDocuments(data.documents || []))
      .catch((err) => console.error("Error fetching documents:", err));
  }, []);

  // Handle file upload
  const handleUpload = async (event) => {
    const file = event.target.files[0];
    if (!file) return;
    setUploading(true);
    const formData = new FormData();
    formData.append("file", file);
    try {
      const res = await fetch("http://127.0.0.1:8000/upload", {
        method: "POST",
        body: formData,
      });
      const data = await res.json();
      console.log(data.message);
      // Refresh document list
      const docsRes = await fetch("http://127.0.0.1:8000/documents");
      const docsData = await docsRes.json();
      setDocuments(docsData.documents || []);
    } catch (err) {
      console.error("Error uploading file:", err);
    } finally {
      setUploading(false);
    }
  };

  // Handle file selection
  const handleSelectFile = (filename) => {
    setSelectedFile(filename);
  };

  return (
    <div className={styles.container}>
      <div className={styles.sidebar}>
        <h2>PDF Assistant</h2>
        <input
          type="file"
          accept=".pdf"
          onChange={handleUpload}
          disabled={uploading}
          className={styles.uploadInput}
        />
        {uploading && <p>Uploading...</p>}
        <h3>Uploaded Documents</h3>
        <ul>
          {documents.map((doc) => (
            <li
              key={doc.filename}
              onClick={() => handleSelectFile(doc.filename)}
              className={selectedFile === doc.filename ? styles.selected : ""}
            >
              {doc.filename}
            </li>
          ))}
        </ul>
      </div>
      <div className={styles.chat}>
        {selectedFile ? (
          <CopilotChat
            endpoint="http://127.0.0.1:8000/query"
            payload={{ filename: selectedFile }}
            placeholder="Ask a question about the PDF..."
          />
        ) : (
          <p>Select a document to start chatting!</p>
        )}
      </div>
    </div>
  );
}

And update app/page.module.css to style it:

.container {
  display: flex;
  height: 100vh;
}

.sidebar {
  width: 300px;
  padding: 20px;
  background-color: #f4f4f9;
  border-right: 1px solid #ddd;
}

.sidebar h2 {
  margin-bottom: 20px;
  font-size: 1.5rem;
}

.uploadInput {
  margin-bottom: 20px;
}

.sidebar h3 {
  margin-top: 20px;
  font-size: 1.2rem;
}

.sidebar ul {
  list-style: none;
  padding: 0;
}

.sidebar li {
  padding: 10px;
  cursor: pointer;
  border-radius: 5px;
}

.sidebar li:hover {
  background-color: #e0e0e0;
}

.selected {
  background-color: #0070f3;
  color: white;
}

.chat {
  flex: 1;
  padding: 20px;
  display: flex;
  flex-direction: column;
  justify-content: center;
  align-items: center;
}

This creates a layout with:

  • A sidebar where you can upload PDFs and see a list of documents.
  • A chat area powered by CopilotKit that connects to our FastAPI /query endpoint.

When you select a document, the chat interface activates, sending the filename along with your questions to the back-end. CopilotKit handles the chat UI, complete with message history and typing indicators, so we don’t have to build it from scratch.

Connecting Front-End and Back-End

Make sure your FastAPI server is running (uvicorn main:app --reload) on http://127.0.0.1:8000. Then, start the Next.js app (npm run dev) and visit http://localhost:3000. Upload a PDF, select it from the sidebar, and ask a question in the chat box. The front-end sends the question to FastAPI, which passes it to CrewAI, and you get a response displayed in the chat. It’s like magic, but it’s just code!

Step 4: Adding Advanced Features

Our app is already awesome—it uploads PDFs, analyzes them, and answers questions. But let’s take it up a notch with some advanced features to make it even more powerful.

Result Panel for Summaries

Let’s add a panel that shows a summary of the PDF’s key points when you select it. Update app/page.tsx to fetch a summary when a file is selected:

// Add to state
const [summary, setSummary] = useState("");

// Fetch summary when file is selected
useEffect(() => {
  if (selectedFile) {
    fetch("http://127.0.0.1:8000/query", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        question: "Provide a brief summary of the document.",
        filename: selectedFile,
      }),
    })
      .then((res) => res.json())
      .then((data) => setSummary(data.answer))
      .catch((err) => console.error("Error fetching summary:", err));
  }
}, [selectedFile]);

// Update JSX
<div className={styles.chat}>
  {selectedFile ? (
    <>
      <div className={styles.summary}>
        <h3>Document Summary</h3>
        <p>{summary || "Loading summary..."}</p>
      </div>
      <CopilotChat
        endpoint="http://127.0.0.1:8000/query"
        payload={{ filename: selectedFile }}
        placeholder="Ask a question about the PDF..."
      />
    </>
  ) : (
    <p>Select a document to start chatting!</p>
  )}
</div>

And in app/page.module.css:

.summary {
  width: 100%;
  max-width: 600px;
  margin-bottom: 20px;
  padding: 15px;
  background-color: #f9f9f9;
  border-radius: 8px;
}

.summary h3 {
  margin-bottom: 10px;
}

Now, when you select a PDF, you’ll see a summary above the chat, giving you a quick overview of the document’s content.

Caching for Performance

To avoid processing the same question twice, let’s add caching to the back-end. We’ll use a simple in-memory cache for now. Update main.py:

from functools import lru_cache

@lru_cache(maxsize=100)
def cached_query(question, filename):
    file_path = os.path.join(UPLOAD_DIR, filename)
    pdf_text = extract_pdf_text(file_path)
    task = Task(
        description=f"Answer the following question based on this PDF content: {question}\n\nContent: {pdf_text}",
        agent=pdf_analyst,
        expected_output="A clear, concise answer to the question."
    )
    crew = Crew(agents=[pdf_analyst], tasks=[task], verbose=True)
    # Return a string so the result is both cacheable and JSON-serializable
    return str(crew.kickoff())

@app.post("/query")
async def query_pdf(data: dict):
    question = data.get("question")
    filename = data.get("filename")
    if not question or not filename:
        raise HTTPException(status_code=400, detail="Question and filename are required")
    file_path = os.path.join(UPLOAD_DIR, filename)
    if not os.path.exists(file_path):
        raise HTTPException(status_code=404, detail="File not found")
    answer = cached_query(question, filename)
    return JSONResponse(content={"question": question, "answer": answer})

The @lru_cache decorator stores up to 100 recent query results, so repeated questions get instant responses, saving time and resources.

Step 5: Testing and Polishing

Let’s test the whole system. Upload a PDF (try a research paper or a manual), select it, and ask questions like:

  • “What’s the main conclusion?”
  • “When was this published?”
  • “Summarize the first section.”

The AI should respond quickly and accurately, with the summary panel giving you a quick overview. If something’s off, check the console logs for errors (e.g., file not found, API key issues).

To polish the UI, consider:

  • Adding loading spinners for uploads and queries.
  • Styling the chat bubbles for better readability.
  • Including a “Clear Chat” button to reset the conversation.

Here’s a quick example for a loading spinner in app/page.tsx:

const [queryLoading, setQueryLoading] = useState(false);

// Wrap CopilotChat
<CopilotChat
  endpoint="http://127.0.0.1:8000/query"
  payload={{ filename: selectedFile }}
  placeholder="Ask a question about the PDF..."
  onSend={(message) => {
    setQueryLoading(true);
    return new Promise((resolve) => {
      setTimeout(() => {
        setQueryLoading(false);
        resolve();
      }, 1000); // Simulate async
    });
  }}
/>
{queryLoading && <p className={styles.loading}>Loading response...</p>}

And in page.module.css:

.loading {
  margin-top: 10px;
  color: #0070f3;
}

Step 6: Taking It to the Next Level

Our PDF assistant is already a rockstar, but there’s always room to grow. Here are some ideas to make it even better:

  1. Security: Add user authentication (e.g., with OAuth or Firebase) to protect private PDFs. Use role-based permissions to control who can access what.
  2. Scalability: If you’re handling thousands of PDFs, switch to cloud storage (like AWS S3) and use a database (like PostgreSQL) instead of documents.json. For faster searches, try a vector database like Pinecone.
  3. Multi-Document Analysis: Want the AI to cross-reference multiple PDFs? Use a vector database to index content and enable semantic searches across documents.
  4. Industry-Specific Features: Tailor the AI for domains like healthcare (HIPAA compliance), legal (contract analysis), or finance (report summarization).
  5. Offline Support: Add a local model option for users who want to run the AI without an internet connection.
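To make the multi-document idea (point 3) concrete, here’s a toy version of semantic search that ranks documents against a query using bag-of-words cosine similarity. A real setup would use learned embeddings and a vector database like Pinecone, but the ranking logic is the same shape:

```python
import math
from collections import Counter

def vectorize(text):
    # Bag-of-words counts; real systems use dense embedding vectors instead
    return Counter(text.lower().split())

def cosine(a, b):
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Illustrative document store: filename -> extracted text
docs = {
    "ml-paper.pdf": "machine learning models for healthcare diagnosis",
    "recipe.pdf": "chocolate cake recipe with vanilla frosting",
}

def search(query):
    """Return the filename whose content is most similar to the query."""
    qv = vectorize(query)
    return max(docs, key=lambda name: cosine(qv, vectorize(docs[name])))

print(search("healthcare machine learning"))  # → ml-paper.pdf
```

Swapping `vectorize` for an embedding model and `docs` for an indexed vector store gives you cross-document semantic search without changing the control flow.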

Conclusion

And there you have it—a fully functional AI PDF assistant that turns your documents into dynamic, chatty knowledge bases. Let’s recap what we built:

  • A FastAPI back-end that stores PDFs and handles queries.
  • A CrewAI agent that reads and analyzes PDFs with precision.
  • A Next.js + CopilotKit front-end that’s sleek, intuitive, and fun to use.
  • Advanced features like summaries and caching to make it production-ready.

This isn’t just a cool project—it’s a glimpse into the future of document management. No more digging through pages or losing your mind over misplaced info. Your PDFs are now a living resource, ready to answer your questions at a moment’s notice.

Want to keep tinkering? Try adding user accounts, scaling it for a team, or teaching the AI to handle specialized documents. The possibilities are endless, and you’ve got the foundation to make it happen.

Thanks for joining me on this adventure! If you run into any snags or have ideas to share, drop a comment—I’d love to hear how you’re using your PDF assistant.
