An intelligent, RAG-based AI chatbot built with Streamlit and GPT-4o to act as an interactive portfolio, featuring a real-time analytics dashboard.


AI-Powered Chatbot 🤖

An intelligent AI chatbot that serves as a dynamic, interactive portfolio for its creator, powered by a sophisticated RAG pipeline and advanced NLP.

Python · Streamlit · OpenAI · spaCy · Google Sheets · Plotly


📈 Live Demos

Experience the chatbot in action: ➡️ Interact with the Live Demo Here

[Nayan's AI Assistant Demo GIF]

View the live analytics of the chatbot: ➡️ Live Analytics Dashboard

[Analytics Dashboard Demo GIF]


🌟 Introduction

Nayan's AI Assistant is a full-stack chatbot application designed to be more than just an information source: it's an intelligent, interactive representation of my professional profile. It leverages a Hybrid Retrieval-Augmented Generation (RAG) pipeline to give recruiters and collaborators accurate, context-aware answers about my skills, projects, and background.

This project was built to demonstrate a deep understanding of modern AI application development, from sophisticated NLP-powered guardrails and conversational memory to a complete, cloud-based analytics pipeline for monitoring user interactions in real-time.


✨ Core Features

This project is more than a simple Q&A bot. It's an end-to-end showcase of modern AI application development.

  • 🧠 Hybrid RAG System: A multi-step retrieval strategy ensures fast, accurate, and relevant answers. The logic prioritizes responses in the following order:

    1. High-Confidence Semantic Match: Uses a BAAI/bge-small-en sentence transformer to find the most similar question from a pre-computed vector database. An answer is returned if the cosine similarity score is ≥ 0.87.
    2. Lexical Fuzzy Match: If semantic search fails, it uses fuzzywuzzy's token sort ratio to find a close match. An answer is returned if the score is ≥ 90.
    3. Generative Fallback with Context: For novel or nuanced questions, the bot uses OpenAI's GPT-4o model, providing it with the recent conversation history for context.
  • 🛡️ Intelligent Guardrails:

    • NER-Powered Scope Control: Utilizes spaCy for Named Entity Recognition (NER) to detect if a question mentions another person's name. This prevents the bot from answering questions that are outside its scope of representing Nayan Reddy Soma.
    • Sensitive Topic Filtering: A custom keyword filter deflects inappropriate or overly personal questions with professional, pre-defined responses.
  • 📊 Real-time Analytics Pipeline:

    • Every user interaction with the live chatbot is logged in real-time to a Google Sheet using the gspread API.
    • Data points captured include session_id, timestamp, user_query, final_response, response_source (e.g., fallback, llm_general), and response_time_ms.
  • 📈 Decoupled Analytics Dashboard:

    • A separate Streamlit app (analytics.py) reads the live data from Google Sheets to provide insights on:
      • KPIs: Total users, total questions, average questions per user, and average response time.
      • Performance: A pie chart showing the distribution of response sources (how often the RAG system vs. the LLM provides an answer).
      • Engagement: A bar chart of daily usage and a table of the most frequently asked questions.
  • 🗣️ Context-Aware Follow-ups: The chatbot remembers the context of the last interaction, allowing it to handle follow-up questions like "tell me more about that" or "why was that important?" with high relevance.
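
The three-tier priority above can be sketched as plain routing logic. The scorer functions here are injected stand-ins (in the real app they would be backed by the bge-small-en embeddings, fuzzywuzzy, and GPT-4o); all names are illustrative rather than the repository's actual API.

```python
# Sketch of the tiered retrieval described above. Scorers are injected
# so the routing can be shown without loading the real models.
SEMANTIC_THRESHOLD = 0.87   # cosine similarity cutoff (tier 1)
FUZZY_THRESHOLD = 90        # fuzzywuzzy token_sort_ratio cutoff (tier 2)

def answer(query, semantic_best_match, fuzzy_best_match, llm_fallback):
    """Return (answer, source) following the three-tier priority."""
    # Tier 1: high-confidence semantic match against the vector database.
    ans, score = semantic_best_match(query)
    if score >= SEMANTIC_THRESHOLD:
        return ans, "semantic"
    # Tier 2: lexical fuzzy match as a cheaper safety net.
    ans, score = fuzzy_best_match(query)
    if score >= FUZZY_THRESHOLD:
        return ans, "fuzzy"
    # Tier 3: generative fallback with recent conversation context.
    return llm_fallback(query), "llm_general"
```

Because the thresholds are checked in order, the cheap pre-computed answers always win over an LLM call when confidence is high enough.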

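A minimal sketch of the NER scope guardrail, assuming spaCy supplies the PERSON entities; the extractor is injected here so the check itself is self-contained, and `SUBJECT_NAMES` is an illustrative allow-list, not the repository's exact code.

```python
# Sketch of the scope-control guardrail. In the app, `person_entities`
# would return the PERSON entities found by spaCy's en_core_web_sm
# model; here it is injected so the check runs standalone.
SUBJECT_NAMES = {"nayan", "nayan reddy", "nayan reddy soma"}  # illustrative

def is_out_of_scope(question, person_entities):
    """True if the question names a person other than the bot's subject."""
    return any(name.lower() not in SUBJECT_NAMES
               for name in person_entities(question))
```

With spaCy loaded, the extractor could be something like `lambda t: [e.text for e in nlp(t).ents if e.label_ == "PERSON"]`.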

🛠️ Architecture & Tech Stack

This project is built with a modern, end-to-end Python stack designed for performance and scalability.

| Category | Technology / Library | Purpose |
|---|---|---|
| Web Framework | Streamlit | For building the interactive chat UI and the analytics dashboard. |
| Backend Logic | Python 3.10+ | Core application logic, data processing, and integrations. |
| NLP (Retrieval) | sentence-transformers | To generate vector embeddings for semantic search (BAAI/bge-small-en model). |
| | scikit-learn | For calculating cosine similarity between text embeddings. |
| | fuzzywuzzy | For lexical fuzzy string matching as a secondary retrieval layer. |
| NLP (Guardrails) | spaCy (en_core_web_sm) | For Named Entity Recognition (NER) to power the smart scope-control guardrail. |
| Generative AI | OpenAI GPT-4o | The final generative layer for handling novel and conversational questions. |
| Database & Logging | Google Sheets API (gspread) | A robust and free solution for real-time logging and data collection from the cloud. |
| Data Analysis | pandas, plotly | For data manipulation and creating visualizations in the analytics dashboard. |
| Deployment | Streamlit Community Cloud | For hosting the live chatbot and analytics dashboard. |
| Dependencies | joblib, numpy | For serializing/deserializing the embedding file and numerical operations. |
| Environment Mgmt | python-dotenv | To manage local environment variables. |
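
As a concrete illustration of the analytics layer, the dashboard KPIs described earlier could be computed from the logged rows with pandas along these lines. Column names follow the logging schema described above; the function name is an illustrative assumption.

```python
import pandas as pd

def compute_kpis(df: pd.DataFrame) -> dict:
    """Dashboard KPIs from the interaction log (one row per question)."""
    users = df["session_id"].nunique()
    return {
        "total_users": users,
        "total_questions": len(df),
        "avg_questions_per_user": len(df) / users,
        "avg_response_time_ms": df["response_time_ms"].mean(),
    }
```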

⚙️ Setup and Local Installation

To run this project on your local machine, follow these steps:

  1. Clone the Repository:

    git clone https://github.com/Nayan-Reddy/Nayan-chatbot.git
    cd Nayan-chatbot
  2. Set Up a Virtual Environment:

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  3. Install Dependencies: The requirements.txt file is configured to install CPU-specific versions of PyTorch and the direct spaCy model for efficiency.

    pip install -r requirements.txt
  4. Generate Embeddings: This is a crucial one-time preprocessing step. Run this script to create the fallback_embeddings.pkl file from your Q&A data.

    python generate_embeddings.py
  5. Configure Environment Variables/Secrets: Create the .env file and the .streamlit/secrets.toml file as described in the 🔑 Environment Configuration section below.

  6. Run the Apps: You can now run the chatbot and the analytics dashboard locally.

    • Run the Chatbot:
      streamlit run app.py
    • Run the Analytics Dashboard:
      streamlit run analytics.py

🔑 Environment Configuration

The application requires two separate files for credentials:

  1. For the GitHub API Token:

    • Create a file named .env in the project's root directory.
    • Add your GitHub token (used for the OpenAI proxy):
      # .env
      GITHUB_TOKEN="ghp_YOUR_TOKEN_HERE"
  2. For Google Sheets Logging:

    • Create a folder named .streamlit in the project's root directory.
    • Inside that folder, create a file named secrets.toml.
    • Paste your Google Cloud Platform service account JSON credentials here. This is used by both app.py and analytics.py.
      # .streamlit/secrets.toml

      [gcp_service_account]
      type = "service_account"
      project_id = "your-gcp-project-id"
      private_key_id = "your-private-key-id"
      private_key = "-----BEGIN PRIVATE KEY-----\nYOUR-PRIVATE-KEY\n-----END PRIVATE KEY-----\n"
      client_email = "your-client-email@your-gcp-project-id.iam.gserviceaccount.com"
      client_id = "your-client-id"
      # ... and so on for the rest of the JSON key file.
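
Once the credentials are in place, each interaction can be assembled into a row and appended with gspread. A hedged sketch follows; the column order and the helper name are assumptions, not the repository's exact code.

```python
import time
from datetime import datetime, timezone

def build_log_row(session_id, user_query, final_response,
                  response_source, started_at):
    """Assemble one interaction row for the Google Sheet.

    `started_at` is a time.monotonic() timestamp captured when the
    query arrived, so response_time_ms reflects end-to-end latency.
    """
    response_time_ms = int((time.monotonic() - started_at) * 1000)
    return [session_id,
            datetime.now(timezone.utc).isoformat(),
            user_query,
            final_response,
            response_source,
            response_time_ms]

# In app.py the row would then be appended to the sheet, e.g.:
#   worksheet.append_row(build_log_row(...), value_input_option="RAW")
```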

🔮 Future Improvements

This project has a strong foundation that can be extended with even more features:

  • User Feedback System: Add thumbs-up/down buttons to log user satisfaction with responses directly into the Google Sheet for finer-grained analysis.
  • Advanced Analytics: Use semantic clustering on the logged questions to identify common user intents that are not yet covered in the fallback_qna.json.
  • Multi-Modal Capabilities: Integrate tools to display images of projects or architecture diagrams directly in the chat when asked.

📫 Get In Touch

I'm a passionate data enthusiast actively seeking opportunities in data analytics and AI. If you're impressed by this project or have any questions, I'd love to connect!
