Audio Summary with Local LLM
This tool is designed to provide a quick and concise summary of audio and video files. It supports summarizing content either from a local file or directly from YouTube. The tool uses Whisper for transcription and a local version of Llama3 (via Ollama) for generating summaries.
Tip
It is possible to change the model you wish to use. To do this, change the
OLLAMA_MODELvariable, and download the associated model via ollama
Features
- YouTube Integration: Download and summarize content directly from YouTube.
- Local File Support: Summarize audio/video files available on your local disk.
- Transcription: Converts audio content to text using Whisper.
- Summarization: Generates a concise summary using Llama3 (Ollama).
- Transcript Only Option: Option to only transcribe the audio content without generating a summary.
- Device Optimization: Automatically uses the best available hardware (MPS for Mac, CUDA for NVIDIA GPUs, or CPU).
Prerequisites
Before you start using this tool, you need to install the following dependencies:
- Python 3.12 and lower than 3.13
- Ollama for LLM model management
ffmpeg(required for audio processing)- uv for package management
Installation
Using uv
Clone the repository and install the required Python packages using uv:
git clone https://github.com/damienarnodo/audio-summary-with-local-LLM.git
cd audio-summary-with-local-LLM
# Create and activate a virtual environment with uv
uv sync
source .venv/bin/activate # On Windows: .venv\Scripts\activate
LLM Requirement
Download and install Ollama to carry out LLM Management. More details about LLM models supported can be found on the Ollama GitHub.
Download and use the Llama3 model:
ollama pull llama3
## Test the access:
ollama run llama3 "tell me a joke"
Usage
The tool can be executed with the following command line options:
--from-youtube: To download and summarize a video from YouTube.--from-local: To load and summarize an audio or video file from the local disk.--output: Specify the output file path (default: ./summary.md)--transcript-only: To only transcribe the audio content without generating a summary.--language: Select the language to be used for the transcription (default: en)
Examples
-
Summarizing a YouTube video:
uv run python src/summary.py --from-youtube <YouTube-Video-URL> -
Summarizing a local audio file:
uv run python src/summary.py --from-local <path-to-audio-file> -
Transcribing a YouTube video without summarizing:
uv run python src/summary.py --from-youtube <YouTube-Video-URL> --transcript-only -
Transcribing a local audio file without summarizing:
uv run python src/summary.py --from-local <path-to-audio-file> --transcript-only -
Specifying a custom output file:
uv run python src/summary.py --from-youtube <YouTube-Video-URL> --output my_summary.md
The output summary will be saved in a markdown file in the specified output directory, while the transcript will be saved in the temporary directory.
Output
The summarized content is saved as a markdown file (default: summary.md) in the current working directory. This file includes a title and a concise summary of the content. The transcript is saved in the tmp/transcript.txt file.
Hardware Acceleration
The tool automatically detects and uses the best available hardware:
- MPS (Metal Performance Shaders) for Apple Silicon Macs
- CUDA for NVIDIA GPUs
- Falls back to CPU when neither is available
Handling Longer Audio Files
This tool can process audio files of any length. For files longer than 30 seconds, the script automatically:
- Chunks the audio into manageable segments
- Processes each chunk separately
- Combines the results into a single transcript
This approach allows for efficient processing of longer content while managing memory usage. However, be aware that:
- Longer files will take proportionally more time to process
- Very long files (>30 minutes) may require significant processing time, especially on CPU
- For extremely long content, consider splitting the audio file into smaller segments before processing
If you encounter memory issues with very long files, you can try:
- Using a smaller Whisper model by changing
WHISPER_MODELto "openai/whisper-base" - Reducing the
chunk_length_sparameter in thetranscribe_filefunction - Processing the file in separate parts and combining the summaries afterward
Sources
- YouTube Video Summarizer with OpenAI Whisper and GPT
- Ollama GitHub Repository
- Transformers by Hugging Face
- yt-dlp Documentation
Troubleshooting
ffmpeg not found
If you encounter this error::
yt_dlp.utils.DownloadError: ERROR: Postprocessing: ffprobe and ffmpeg not found. Please install or provide the path using --ffmpeg-location
Please refer to this post
Audio Format Issues
If you encounter this error:
ValueError: Soundfile is either not in the correct format or is malformed. Ensure that the soundfile has a valid audio file extension (e.g. wav, flac or mp3) and is not corrupted.
Try converting your file with ffmpeg:
ffmpeg -i my_file.mp4 -movflags faststart my_file_fixed.mp4
Memory Issues on CPU
If you're running on CPU and encounter memory issues during transcription, consider:
- Using a smaller Whisper model
- Processing shorter audio segments
- Ensuring you have sufficient RAM available
Slow Transcription
Transcription can be slow on CPU. For best performance:
- Use a machine with GPU or Apple Silicon (MPS)
- Keep audio files under 10 minutes when possible
- Close other resource-intensive applications
Update the Whisper or LLM Model
You can easily change the models used for transcription and summarization by modifying the variables at the top of the script:
# Default models
OLLAMA_MODEL = "llama3"
WHISPER_MODEL = "openai/whisper-large-v2"
Changing the Whisper Model
To use a different Whisper model for transcription:
-
Update the
WHISPER_MODELvariable with one of these options:"openai/whisper-tiny"(fastest, least accurate)"openai/whisper-base"(faster, less accurate)"openai/whisper-small"(balanced)"openai/whisper-medium"(slower, more accurate)"openai/whisper-large-v2"(slowest, most accurate)
-
Example:
WHISPER_MODEL = "openai/whisper-medium" # A good balance between speed and accuracy
For CPU-only systems, using a smaller model like whisper-base is recommended for better performance.
Changing the LLM Model
To use a different model for summarization:
-
First, pull the desired model with Ollama:
ollama pull mistral # or any other supported model -
Then update the
OLLAMA_MODELvariable:OLLAMA_MODEL = "mistral" # or any other model you've pulled -
Popular alternatives include:
"llama3"(default)"mistral""llama2""gemma:7b""phi"
For a complete list of available models, visit the Ollama model library.