# YATSEE -- Yet Another Tool for Speech Extraction & Enrichment
YATSEE is a local-first, end-to-end data pipeline designed to systematically refine raw meeting audio into clean, searchable, and auditable intelligence. It automates the tedious work of downloading, transcribing, and normalizing unstructured conversations.
It's a privacy-respecting toolkit for anyone who wants to turn public noise into actionable intelligence.
Public records are often public in name only. Civic business is frequently buried in four-hour livestreams and jargon-filled transcripts that are technically accessible but functionally opaque. For an interested citizen, the barrier to entry is hours of time and a thicket of procedural language.
YATSEE solves that by using a carefully tuned local LLM to transform that wall of text into a high-signal summary. It can be configured to extract specific votes, contracts, and policy debates, so you can find what interests you fast. It's a tool for the clarity and accountability that modern civic discourse requires.
All modules are fully documented using standard Python docstrings.
To browse module documentation, use `pydoc` locally:

```bash
pydoc ./yatsee_summarize_transcripts.py
```

Follow these steps to get YATSEE running.
**Step 1: Clone the repository**

```bash
git clone https://github.com/alias454/yatsee.git
cd yatsee
```

**Step 2: Create your config**

```bash
# Copy the template to create your local config file
cp yatsee.conf yatsee.toml
```

- Open `yatsee.toml` in any text editor.
- Add at least one entity with the required fields:
  - `entity` (unique identifier)
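A minimal entity block might look like the following. This is an illustrative sketch only; the table names and layout are assumptions, so check the shipped `yatsee.conf` template for the real schema:

```toml
# Hypothetical minimal yatsee.toml entry -- verify against the template
[entities.city_council]
entity = "city_council"

[entities.city_council.sources.youtube]
youtube_path = "https://www.youtube.com/@ExampleCityCouncil/streams"
```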
**Step 3: Run setup**

```bash
chmod +x setup.sh
./setup.sh
```

- Installs Python dependencies
- Downloads NLP models (spaCy, etc.)
- Checks for GPU (CUDA/MPS) and warns if only CPU is available

Then activate the virtual environment:

```bash
source .venv/bin/activate
```

Python ≥3.10 recommended. CPU works, but GPU/MPS accelerates transcription.
**Step 4: Build pipeline configs**

```bash
python yatsee_build_config.py --create
```

- Uses the entity info in `yatsee.toml` to:
  - Create the main data directory (default `./data`) for the pipeline
  - Initialize per-entity pipeline configs
- Reads the entity's `sources.youtube.youtube_path` (YouTube channel/playlist), plus any optional data structures like titles, people, replacements, etc.
- This is the minimum viable entity needed for the downloader.
- **Important:** Run this after `setup.sh` and after adding at least one entity.
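After `--create`, you should end up with a per-entity working tree roughly like this (illustrative; the exact folder names come from the generated config):

```
data/
└── my_entity/
    ├── downloads/    # raw audio/video fetched by yt-dlp
    ├── audio/        # converted .flac/.wav files
    ├── transcripts/  # raw and normalized text
    └── summaries/    # markdown/YAML summaries
```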
**Step 5: Run the pipeline** (see Script Summary below)

- Processes audio/video in `downloads/`
- Converts to `.flac`/`.wav` in `audio/`
- Generates transcripts, normalizes text, and produces summaries
- All scripts are modular: you can run them individually or as a pipeline
**Step 6: Search the results**

```bash
streamlit run yatsee_search_demo.py -- -e entity_name_configured
```

- Provides semantic and structured search over transcripts and summaries

Notes:

- `entity` is a unique key identifier for all scripts. Keep it consistent.
- Each pipeline stage ensures its output directories exist; do not create them manually.
- Optional: you can edit additional pipeline settings (like per-entity hotwords or divisions) in the generated config.
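The "directories are created for you" guarantee is the standard `pathlib` idiom. A minimal sketch of what each stage likely does on startup (the function and folder names here are illustrative, not YATSEE's actual API):

```python
from pathlib import Path

def ensure_entity_dirs(base: str, entity: str) -> dict[str, Path]:
    """Create the per-entity output directories if missing and return them."""
    root = Path(base) / entity
    # Folder names mirror the pipeline stages described above (illustrative)
    dirs = {name: root / name for name in ("downloads", "audio", "transcripts", "summaries")}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)  # safe to call on every run
    return dirs
```

Because `mkdir(exist_ok=True)` is idempotent, every stage can call this unconditionally without clobbering earlier output.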
This pipeline was developed and tested on the following setup:
- CPU: Intel Core i7-10750H (6 cores / 12 threads, up to 5.0 GHz)
- RAM: 32 GB DDR4
- GPU: NVIDIA GeForce RTX 2060 (6 GB VRAM, CUDA 12.8)
- Storage: NVMe SSD
- OS: Fedora Linux
- Shell: Bash
- Python: 3.10 or newer
Additional testing was performed on Apple Silicon (macOS):
- Model: Mac Mini (M4 Base)
- CPU: Apple M4 (10 cores / 4 performance cores, up to 120 GB/s memory bandwidth)
- RAM: 16 GB
- Storage: NVMe SSD
- OS: macOS Sonoma / Sequoia
- Shell: ZSH
- Python: 3.9 or newer
GPU acceleration was enabled for Whisper / faster-whisper using CUDA 12.8 and NVIDIA driver 570.144 on Linux. Note, however, that faster-whisper has limited or no support for MPS.
Note: Audio transcription was much slower on the Mac than on the Linux machine; it works, just noticeably slower.
Note: The pipeline works on CPU-only systems without a GPU. However, transcription (especially with Whisper or faster-whisper) will be much slower compared to systems with CUDA-enabled GPU acceleration or MPS.
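Because faster-whisper has little or no MPS support, device selection effectively degrades to CPU on Apple Silicon. A pure-Python sketch of that decision (the helper name is ours; `"float16"` and `"int8"` are standard faster-whisper `compute_type` values):

```python
def pick_whisper_settings(has_cuda: bool, has_mps: bool) -> tuple[str, str]:
    """Map hardware availability to a (device, compute_type) pair for faster-whisper."""
    if has_cuda:
        return ("cuda", "float16")  # fp16 on GPU is the fast path
    # faster-whisper has limited/no MPS support, so Apple Silicon falls back to CPU
    return ("cpu", "int8")          # int8 quantization keeps CPU inference tolerable
```

You would then feed the result to something like `WhisperModel(model_size, device=device, compute_type=compute_type)`.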
⚠️ Not tested on Windows. Use at your own risk on Windows platforms.
Manual Installation (If not using setup.sh)
If you cannot use the setup script, ensure you have ffmpeg and yt-dlp installed via your package manager, then install the Python requirements:
- `yt-dlp` – Download livestream audio from YouTube
- `ffmpeg` – Convert audio to `.flac` or `.wav` format
- `toml` – Needed for reading the TOML config
- `requests` – Needed for interacting with the Ollama API, if installed
- `torch` – Required for Whisper and model inference (with or without CUDA)
- `pyyaml` – YAML output support (for summaries)
- `whisper` – Audio transcription (standard)
- `spacy` – Sentence segmentation + text cleanup
  - Model: `en_core_web_sm` (or larger)
- `faster-whisper` – Audio transcription (optional)
- `ollama` – Run local LLMs for summarization
macOS (Homebrew):

```bash
brew install ffmpeg
```

Fedora:

```bash
sudo dnf install ffmpeg
```

Debian/Ubuntu:

```bash
sudo apt-get update
sudo apt-get install ffmpeg
```

You can use pip to install the core requirements:

```bash
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```

On first run, Whisper will download a model (e.g., `base`, `medium`). Ensure you have enough RAM.
Ollama is used for generating Markdown or YAML summaries from transcripts.

Install:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

See https://ollama.com for supported models and system requirements.
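The summarizer talks to a locally running Ollama over its HTTP API (default port 11434). A dependency-free sketch of that interaction (the endpoint and `/api/generate` payload shape follow Ollama's documented API; the prompt wording and model name are placeholders, and this is not YATSEE's actual code — the project uses `requests`, while this sketch uses the stdlib):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(text: str, model: str = "llama3") -> dict:
    """Assemble a non-streaming generate request for one transcript chunk."""
    return {
        "model": model,
        "prompt": "Summarize this meeting transcript excerpt:\n\n" + text,
        "stream": False,  # ask for a single JSON response instead of a stream
    }

def summarize(text: str, model: str = "llama3") -> str:
    """POST the chunk to local Ollama and return the generated summary text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(text, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

With `"stream": False`, Ollama returns one JSON object whose `response` field holds the full completion, which keeps the client logic trivial.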
YATSEE is open-source software licensed under the GNU Affero General Public License v3.0 (AGPLv3).
- Freedom: You are free to use, modify, and distribute this software.
- Open Source: If you modify YATSEE and distribute it (or run it as a service over a network), you must open-source your modifications under the same AGPLv3 license.
Commercial Licensing: If you wish to use YATSEE in a proprietary product or closed-source commercial environment, please contact admin alias454 com for a commercial license.
