ChannelGPT is an AI-powered content analysis tool that lets users query and analyze any YouTube channel's content in natural language. It processes video transcripts and uses OpenAI language models to answer questions about them and surface insights.
SEE DEMO VIDEO: https://youtu.be/NSxQu9Pn-Cc
- Analyze any YouTube channel by providing its Channel ID
- Query video content using natural language
- AI-powered analysis of transcripts
- User-friendly web interface
- REST API endpoint (POST /query)
- Automatic knowledge-base updates with each channel's latest videos
- Python 3.8+
- OpenAI API key
- YouTube API key
- Clone the repository:
git clone <repository-url>
cd ChannelGPT
- Install required packages:
pip install -r requirements.txt
- Create a configuration file:
- Copy config.template.py to config.py
- Add your API keys to config.py:
OPENAI_API_KEY = "your_openai_api_key"
YOUTUBE_API_KEY = "your_youtube_api_key"
CHANNEL_ID = "UCatt7TBjfBkiJWx8khav_Gg"
The application consists of two components that need to be running simultaneously:
- Start the FastAPI backend server:
uvicorn main:app --reload --port 8001
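The --reload flag restarts the server automatically whenever the code changes. Keep --port 8001 so the backend runs at the address listed below; without it, uvicorn uses its default port, 8000.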
- In a new terminal, start the Gradio web interface:
python app.py
The application will be available at:
- FastAPI backend: http://localhost:8001
- Gradio interface: http://localhost:7860 (default)
- Access the Gradio web interface in your browser
- Enter the YouTube Channel ID you want to analyze
- You can find a channel's ID by:
- Going to the channel's page
- Right-clicking and viewing page source
- Searching for "channelId"
- Or using online tools like Comment Picker
- Enter your query in the text box
- Click "Submit" to get AI-powered analysis
- Use example queries for inspiration
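The page-source method of finding a channel ID described above can be automated. The following is a minimal sketch that assumes the channel page's HTML contains the ID either as "channelId":"UC..." in embedded JSON or in an itemprop="channelId" meta tag; YouTube's markup is not a stable interface and may change. extract_channel_id is a hypothetical helper, not part of ChannelGPT.

```python
import re
from typing import Optional

# Assumption: the ID appears as JSON ("channelId":"UC...") or as a meta tag.
# Channel IDs are "UC" followed by 22 URL-safe characters.
_CHANNEL_ID_PATTERNS = [
    re.compile(r'"channelId"\s*:\s*"(UC[A-Za-z0-9_-]{22})"'),
    re.compile(r'itemprop="channelId"\s+content="(UC[A-Za-z0-9_-]{22})"'),
]


def extract_channel_id(page_source: str) -> Optional[str]:
    """Return the first channel ID found in a channel page's HTML, or None."""
    for pattern in _CHANNEL_ID_PATTERNS:
        match = pattern.search(page_source)
        if match:
            return match.group(1)
    return None
```

Pass it the saved HTML of a channel page (for example, the text you would see with "view page source"); a None result means neither assumed pattern was present.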
- POST /query
- Endpoint for querying the knowledge base
- Request body:
{"channel_id": "channel_id_here", "text": "your query here"}
- Returns analysis based on video transcripts
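A client can call the endpoint directly instead of going through Gradio. This is a minimal sketch using only the Python standard library; it assumes the backend is running locally on port 8001 (as in the setup steps) and that the response body is JSON, since its exact shape is not documented here. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: backend started with "uvicorn main:app --port 8001".
BASE_URL = "http://localhost:8001"


def build_query_request(channel_id: str, text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /query request carrying the documented JSON body."""
    body = json.dumps({"channel_id": channel_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_channel(channel_id: str, text: str, timeout: float = 300.0):
    """Send the query and decode the (assumed JSON) response."""
    request = build_query_request(channel_id, text)
    # A generous timeout: the first query for a channel may trigger downloads and embedding.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, query_channel("UCatt7TBjfBkiJWx8khav_Gg", "What topics does this channel cover most often?") returns whatever analysis the backend produces for that channel.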
- main.py: FastAPI backend and core functionality
- app.py: Gradio web interface
- config.py: Configuration and API keys
- requirements.txt: Project dependencies
Never commit your config.py file with actual API keys. It's included in .gitignore for security.
- Initialization:
- Set API keys for OpenAI and YouTube in the environment.
- Define paths for saving the knowledge base (FAISS index and metadata).
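The initialization step can be sketched as follows. This is an illustration under assumptions, not ChannelGPT's actual code: the directory and file names (knowledge_base, faiss_index, metadata.json), the YOUTUBE_API_KEY environment variable, and init_environment itself are hypothetical. The only library behavior relied on is that the official OpenAI Python client reads OPENAI_API_KEY from the environment.

```python
import os
from pathlib import Path

# Hypothetical locations; the project's real paths may differ.
KB_DIR = Path("knowledge_base")
FAISS_INDEX_PATH = KB_DIR / "faiss_index"
METADATA_PATH = KB_DIR / "metadata.json"


def init_environment(openai_key: str, youtube_key: str, kb_dir: Path = KB_DIR) -> Path:
    """Export the API keys as environment variables and ensure the KB directory exists."""
    if not openai_key or not youtube_key:
        raise ValueError("Both the OpenAI and YouTube API keys are required")
    # The official OpenAI Python client picks this variable up automatically.
    os.environ["OPENAI_API_KEY"] = openai_key
    # Hypothetical name: the YouTube key is otherwise passed explicitly to API calls.
    os.environ["YOUTUBE_API_KEY"] = youtube_key
    kb_dir.mkdir(parents=True, exist_ok=True)
    return kb_dir
```

In a setup like the one above, the keys would come from config.py, e.g. init_environment(OPENAI_API_KEY, YOUTUBE_API_KEY) after importing them.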
- Fetch Latest Video IDs:
- Use the YouTube API to fetch the latest video IDs from a specified channel. This involves:
- Verifying that the channel exists.
- Retrieving the channel's most recent videos, sorted by date.
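One way to implement these two calls against the YouTube Data API v3 is sketched below, using channels.list to verify the channel and search.list with order=date and type=video for the newest uploads. It is a stdlib-only sketch; the function names and the default of 10 results are assumptions, and the project may use the google-api-python-client library instead.

```python
import json
import urllib.parse
import urllib.request
from typing import List

API_BASE = "https://www.googleapis.com/youtube/v3"


def _get_json(endpoint: str, params: dict) -> dict:
    """GET a YouTube Data API endpoint and decode the JSON response."""
    url = f"{API_BASE}/{endpoint}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def channel_exists(channel_id: str, api_key: str) -> bool:
    """channels.list returns no items for an unknown channel ID."""
    data = _get_json("channels", {"part": "id", "id": channel_id, "key": api_key})
    return bool(data.get("items"))


def parse_video_ids(search_response: dict) -> List[str]:
    """Extract video IDs from a search.list response, skipping non-video results."""
    return [
        item["id"]["videoId"]
        for item in search_response.get("items", [])
        if item.get("id", {}).get("kind") == "youtube#video"
    ]


def fetch_latest_video_ids(channel_id: str, api_key: str, max_results: int = 10) -> List[str]:
    """Return the channel's newest video IDs, newest first."""
    if not channel_exists(channel_id, api_key):
        raise ValueError(f"Channel not found: {channel_id}")
    data = _get_json("search", {
        "part": "id",
        "channelId": channel_id,
        "order": "date",            # newest first
        "type": "video",            # exclude playlists and channels
        "maxResults": max_results,  # the API allows at most 50 per page
        "key": api_key,
    })
    return parse_video_ids(data)
```

search.list calls consume considerably more API quota than most read calls, so caching the returned IDs between queries is worth considering.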
- Download Transcripts:
- For each video ID, download the corresponding video transcripts using yt-dlp. This is done concurrently using a thread pool to speed up the process.
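The concurrent download can be sketched with the yt-dlp command-line tool and a ThreadPoolExecutor. The flags used (--skip-download, --write-subs, --write-auto-subs, --sub-langs, --sub-format, -o) are standard yt-dlp options; the English-only language choice, the worker count, and the helper names are assumptions, and the run parameter exists only so the sketch can be tested without network access.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, List


def build_ytdlp_command(video_id: str, out_dir: Path) -> List[str]:
    """yt-dlp command that saves English subtitles as VTT without the video itself."""
    return [
        "yt-dlp",
        "--skip-download",
        "--write-subs",
        "--write-auto-subs",  # also accept auto-generated captions
        "--sub-langs", "en",
        "--sub-format", "vtt",
        "-o", str(out_dir / "%(id)s.%(ext)s"),
        f"https://www.youtube.com/watch?v={video_id}",
    ]


def download_transcripts(
    video_ids: List[str],
    out_dir: Path,
    max_workers: int = 4,
    run: Callable = subprocess.run,  # injectable for testing
) -> Dict[str, bool]:
    """Download subtitles concurrently; returns a video_id -> success map."""
    out_dir.mkdir(parents=True, exist_ok=True)

    def fetch(video_id: str) -> bool:
        result = run(build_ytdlp_command(video_id, out_dir), capture_output=True, text=True)
        return result.returncode == 0

    # Threads suit this workload: each task mostly waits on a subprocess and the network.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(video_ids, pool.map(fetch, video_ids)))
```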
- Reformat Transcripts:
- Convert the downloaded VTT files into a more usable text format. This involves:
- Reading each VTT file.
- Extracting timestamps and text, and formatting them into a cleaner structure.
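A minimal VTT-to-text conversion might look like the sketch below. It assumes the "[HH:MM:SS] text" output format, which is an illustrative choice rather than the project's documented format. It drops the WEBVTT header, cue identifiers, and inline tags, and skips the consecutive duplicate lines that YouTube's auto-generated captions typically contain.

```python
import re
from typing import List, Tuple

# Cue timing lines such as "00:01:02.500 --> 00:01:05.000 align:start"
_TIMING = re.compile(r"^(\d{2}:)?\d{2}:\d{2}\.\d{3}\s+-->")
_TAG = re.compile(r"<[^>]+>")  # inline tags like <c>, </c> or <00:00:01.500>


def parse_vtt(vtt_text: str) -> List[Tuple[str, str]]:
    """Return (start_timestamp, text) pairs from a WebVTT document."""
    cues: List[Tuple[str, str]] = []
    start = None
    last_text = None
    for raw in vtt_text.splitlines():
        line = raw.strip()
        if _TIMING.match(line):
            start = line.split("-->")[0].strip().split(".")[0]  # drop milliseconds
            continue
        if not line:
            start = None  # a blank line ends the current cue
            continue
        if start is None:
            continue  # header lines, cue identifiers, NOTE blocks
        text = _TAG.sub("", line).strip()
        # Auto-generated captions repeat each line across consecutive cues.
        if text and text != last_text:
            cues.append((start, text))
            last_text = text
    return cues


def format_transcript(cues: List[Tuple[str, str]]) -> str:
    """Render cues as one "[timestamp] text" line each."""
    return "\n".join(f"[{start}] {text}" for start, text in cues)
```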
- Chunkify Transcripts:
- Break the formatted transcripts into smaller chunks. Embedding models accept a limited amount of input text, and smaller chunks also make retrieval more precise. Each chunk includes:
- The text of the chunk.
- Metadata containing the video ID.
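A simple chunker is sketched below. It splits on transcript lines rather than characters so that each chunk starts with a timestamped line, which later makes timestamped links possible. The window size, overlap, and dictionary layout are assumptions, not the project's actual settings.

```python
from typing import Dict, List


def chunkify(
    formatted_transcript: str,
    video_id: str,
    lines_per_chunk: int = 20,
    overlap: int = 2,
) -> List[Dict]:
    """Split a formatted transcript into overlapping line windows tagged with the video ID."""
    if not 0 <= overlap < lines_per_chunk:
        raise ValueError("overlap must be >= 0 and smaller than lines_per_chunk")
    lines = [line for line in formatted_transcript.splitlines() if line.strip()]
    step = lines_per_chunk - overlap
    chunks = []
    for start in range(0, len(lines), step):
        window = lines[start:start + lines_per_chunk]
        chunks.append({"text": "\n".join(window), "metadata": {"video_id": video_id}})
        if start + lines_per_chunk >= len(lines):
            break  # the last window already reaches the end of the transcript
    return chunks
```

The overlap repeats a few lines between neighboring chunks so that a sentence cut at a boundary still appears whole in at least one chunk.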
- Knowledge Base Management:
- Load Knowledge Base: Load existing FAISS index and metadata if available.
- Update Knowledge Base: Add new chunks to the FAISS index and update the metadata. This includes:
- Filtering out chunks from videos already present in the metadata.
- Embedding the text of new chunks using OpenAI's embeddings.
- Adding new embeddings to the FAISS index.
- Updating the in-memory document store with new chunks.
- Save Knowledge Base: Save the updated FAISS index and metadata to disk.
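The load/update/save cycle can be illustrated without external dependencies. The sketch below swaps in a plain Python list of vectors and a JSON file for the FAISS index and its metadata, and takes the embedding function as a parameter. In the real pipeline, embed would call OpenAI's embeddings endpoint and the vectors would go into a FAISS index (for example, faiss.IndexFlatL2 with index.add). SimpleKnowledgeBase is hypothetical, not ChannelGPT's class.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Set

Vector = List[float]


class SimpleKnowledgeBase:
    """Stand-in for a FAISS flat index plus metadata; vectors[i] belongs to documents[i]."""

    def __init__(self) -> None:
        self.vectors: List[Vector] = []
        self.documents: List[Dict] = []  # in-memory document store
        self.video_ids: Set[str] = set()

    def update(self, chunks: List[Dict], embed: Callable[[List[str]], List[Vector]]) -> int:
        """Embed and add chunks from videos not yet indexed; returns how many were added."""
        new = [c for c in chunks if c["metadata"]["video_id"] not in self.video_ids]
        if not new:
            return 0
        vectors = embed([c["text"] for c in new])
        if len(vectors) != len(new):
            raise ValueError("embed must return one vector per chunk")
        self.vectors.extend(vectors)
        self.documents.extend(new)
        self.video_ids.update(c["metadata"]["video_id"] for c in new)
        return len(new)

    def save(self, path: Path) -> None:
        path.write_text(json.dumps({"vectors": self.vectors, "documents": self.documents}))

    @classmethod
    def load(cls, path: Path) -> "SimpleKnowledgeBase":
        """Load a saved knowledge base, or return an empty one if the file is missing."""
        kb = cls()
        if path.exists():
            data = json.loads(path.read_text())
            kb.vectors = data["vectors"]
            kb.documents = data["documents"]
            kb.video_ids = {d["metadata"]["video_id"] for d in kb.documents}
        return kb
```

Filtering by video ID, as the step above describes, means a video that is already indexed is never re-embedded, which avoids paying for the same embeddings twice.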
- Query the Knowledge Base:
- Perform a similarity search in the FAISS vector store using a query. This involves:
- Embedding the query using OpenAI's model.
- Retrieving the most similar documents.
- Formatting the results to include clickable YouTube links with timestamps.
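The retrieval and link-formatting steps can be sketched as follows. Brute-force cosine similarity in pure Python stands in for the FAISS search, and the query vector is assumed to come from the same embedding model used for the chunks. The sketch also assumes each chunk starts with a "[HH:MM:SS]" or "[MM:SS]" stamp. The links use YouTube's standard watch?v=ID&t=SECONDSs form; the function names are hypothetical.

```python
import math
import re
from typing import Dict, List, Sequence

# Leading "[HH:MM:SS]" or "[MM:SS]" stamp at the start of a chunk
_LEADING_TS = re.compile(r"^\[(?:(\d+):)?(\d{1,2}):(\d{2})\]")


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def timestamp_seconds(chunk_text: str) -> int:
    """Seconds encoded in the chunk's leading timestamp, or 0 if there is none."""
    match = _LEADING_TS.match(chunk_text)
    if not match:
        return 0
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)


def search(query_vector: Sequence[float], vectors: List[Sequence[float]],
           documents: List[Dict], k: int = 3) -> List[Dict]:
    """Return the k most similar chunks, each with a timestamped YouTube link."""
    scores = [cosine(query_vector, v) for v in vectors]
    ranked = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    results = []
    for i in ranked[:k]:
        doc = documents[i]
        video_id = doc["metadata"]["video_id"]
        seconds = timestamp_seconds(doc["text"])
        results.append({
            "text": doc["text"],
            "score": scores[i],
            "url": f"https://www.youtube.com/watch?v={video_id}&t={seconds}s",
        })
    return results
```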
- API and Web Interface:
- FastAPI Setup: Define an API endpoint that accepts queries. This endpoint handles:
- Validation of the channel ID.
- Loading or updating the knowledge base with the latest videos.
- Performing the query and returning the analysis.
- Gradio Interface: app.py provides the Gradio front end, which sends queries to this API and displays the results in a user-friendly interface.
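The endpoint's control flow (validate, refresh the knowledge base, query) can be kept framework-independent, as in the sketch below. A FastAPI route such as @app.post("/query") could call it and translate ValueError into an HTTPException with status_code=400. The regex only checks the ID's format ("UC" plus 22 URL-safe characters); whether the channel actually exists still needs an API call. handle_query and its callback parameters are hypothetical.

```python
import re
from typing import Callable, Dict, List

# Format check only: "UC" followed by 22 URL-safe characters.
CHANNEL_ID_RE = re.compile(r"^UC[A-Za-z0-9_-]{22}$")


def handle_query(
    payload: Dict[str, str],
    update_knowledge_base: Callable[[str], None],
    run_query: Callable[[str, str], List[Dict]],
) -> Dict:
    """Validate the request body, refresh the channel's knowledge base, then answer."""
    channel_id = payload.get("channel_id", "").strip()
    text = payload.get("text", "").strip()
    if not CHANNEL_ID_RE.match(channel_id):
        raise ValueError(f"Invalid channel ID: {channel_id!r}")
    if not text:
        raise ValueError("Query text must not be empty")
    # e.g. fetch latest video IDs, download, reformat, chunk and embed new transcripts
    update_knowledge_base(channel_id)
    return {"channel_id": channel_id, "results": run_query(channel_id, text)}
```

Injecting the update and query steps keeps the handler testable without network access and lets the same logic back both the API and local runs.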
- Execution:
- The main function orchestrates the above steps when run locally, using a predefined channel ID.
- When deployed as a web service, the API endpoint can dynamically accept different channel IDs and queries from users.