Add support for GPU (MPS and CUDA)

Migrate to `uv`
2025-03-28 12:58:39 +01:00
parent 4eb5f586d4
commit a8005cce50
6 changed files with 1336 additions and 45 deletions
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
-# Audio Summary with local LLM
+# Audio Summary with Local LLM

-This tool is designed to provide a quick and concise summary of audio and video files. It supports summarizing content either from a local file or directly from YouTube. The tool uses Whisper for transcription and a local version of Mistral AI (Ollama) for generating summaries.
+This tool is designed to provide a quick and concise summary of audio and video files. It supports summarizing content either from a local file or directly from YouTube. The tool uses Whisper for transcription and a local version of Llama3 (via Ollama) for generating summaries.

 > [!TIP]  
 > It is possible to change the model you wish to use.
@@ -9,45 +9,47 @@ This tool is designed to provide a quick and concise summary of audio and video
 ## Features

 - **YouTube Integration**: Download and summarize content directly from YouTube.
- **Local File Support**: Summarize audio files available on your local disk.
+- **Local File Support**: Summarize audio/video files available on your local disk.
 - **Transcription**: Converts audio content to text using Whisper.
- **Summarization**: Generates a concise summary using Mistral AI (Ollama).
+- **Summarization**: Generates a concise summary using Llama3 (Ollama).
 - **Transcript Only Option**: Option to only transcribe the audio content without generating a summary.
+- **Device Optimization**: Automatically uses the best available hardware (MPS for Mac, CUDA for NVIDIA GPUs, or CPU).

 ## Prerequisites

 Before you start using this tool, you need to install the following dependencies:

- Python 3.8 or higher
- `pytube` for downloading videos from YouTube.
- `pathlib` for local file handling
- `openai-whisper` for audio transcription.
- [Ollama](https://ollama.com) for LLM model management.
- `ffmpeg` (required for whisper)
+- Python 3.12 or higher, but lower than 3.13
+- [Ollama](https://ollama.com) for LLM model management
+- `ffmpeg` (required for audio processing)
+- [uv](https://docs.astral.sh/uv/getting-started/installation/) for package management

 ## Installation

-### Python Requirements
+### Using uv

-Clone the repository and install the required Python packages:
+Clone the repository and install the required Python packages using [uv](https://github.com/astral-sh/uv):

 ```bash
 git clone https://github.com/damienarnodo/audio-summary-with-local-LLM.git
 cd audio-summary-with-local-LLM
-pip install -r src/requirements.txt
+
+# Create the virtual environment and install dependencies with uv, then activate it
+uv sync
+source .venv/bin/activate  # On Windows: .venv\Scripts\activate
 ```

 ### LLM Requirement

 [Download and install](https://ollama.com) Ollama to carry out LLM Management. More details about LLM models supported can be found on the Ollama [GitHub](https://github.com/ollama/ollama).

-Download and use the Mistral model:
+Download and use the Llama3 model:

 ```bash
-ollama pull mistral
+ollama pull llama3

 ## Test the access:
-ollama run mistral "tell me a joke"
+ollama run llama3 "tell me a joke"
 ```

 ## Usage
@@ -56,51 +58,175 @@ The tool can be executed with the following command line options:

 - `--from-youtube`: To download and summarize a video from YouTube.
 - `--from-local`: To load and summarize an audio or video file from the local disk.
- `--transcript-only`: To only transcribe the audio content without generating a summary. This option must be used with either `--from-youtube` or `--from-local`.
+- `--output`: Specify the output file path (default: ./summary.md)
+- `--transcript-only`: To only transcribe the audio content without generating a summary.
+- `--language`: Select the language to be used for the transcription (default: en)

 ### Examples

 1. **Summarizing a YouTube video:**

   ```bash
-   python src/summary.py --from-youtube <YouTube-Video-URL>
+   uv run python src/summary.py --from-youtube <YouTube-Video-URL>
   ```

 2. **Summarizing a local audio file:**

   ```bash
-   python src/summary.py --from-local <path-to-audio-file>
+   uv run python src/summary.py --from-local <path-to-audio-file>
   ```

 3. **Transcribing a YouTube video without summarizing:**

   ```bash
-   python src/summary.py --from-youtube <YouTube-Video-URL> --transcript-only
+   uv run python src/summary.py --from-youtube <YouTube-Video-URL> --transcript-only
   ```

 4. **Transcribing a local audio file without summarizing:**

   ```bash
-   python src/summary.py --from-local <path-to-audio-file> --transcript-only
+   uv run python src/summary.py --from-local <path-to-audio-file> --transcript-only
+   ```
+
+5. **Specifying a custom output file:**
+
+   ```bash
+   uv run python src/summary.py --from-youtube <YouTube-Video-URL> --output my_summary.md
   ```

 The output summary will be saved in a markdown file in the specified output directory, while the transcript will be saved in the temporary directory.

 ## Output

-The summarized content is saved as a markdown file named `summary.md` in the current working directory. This file includes the transcribed text and its corresponding summary. If `--transcript-only` is used, only the transcription will be saved in the temporary directory.
+The summarized content is saved as a markdown file (default: `summary.md`) in the current working directory. This file includes a title and a concise summary of the content. The transcript is saved in the `tmp/transcript.txt` file.
+
+## Hardware Acceleration
+
+The tool automatically detects and uses the best available hardware:
+
+- MPS (Metal Performance Shaders) for Apple Silicon Macs
+- CUDA for NVIDIA GPUs
+- Falls back to CPU when neither is available
+
+### Handling Longer Audio Files
+
+This tool can process audio files of any length. For files longer than 30 seconds, the script automatically:
+
+1. Chunks the audio into manageable segments
+2. Processes each chunk separately
+3. Combines the results into a single transcript
+
+This approach allows for efficient processing of longer content while managing memory usage. However, be aware that:
+
+- Longer files will take proportionally more time to process
+- Very long files (>30 minutes) may require significant processing time, especially on CPU
+- For extremely long content, consider splitting the audio file into smaller segments before processing
+
+If you encounter memory issues with very long files, you can try:
+
+1. Using a smaller Whisper model by changing `WHISPER_MODEL` to "openai/whisper-base"
+2. Reducing the `chunk_length_s` parameter in the `transcribe_file` function
+3. Processing the file in separate parts and combining the summaries afterward

 ## Sources

 - [YouTube Video Summarizer with OpenAI Whisper and GPT](https://github.com/mirabdullahyaser/Summarizing-Youtube-Videos-with-OpenAI-Whisper-and-GPT-3/tree/master)
- [Mistral Python Client](https://github.com/mistralai/client-python)
- [Ollama : Installez LLama 2 et Code LLama en quelques secondes !](https://www.geeek.org/tutoriel-installation-llama-2-et-code-llama/)
+- [Ollama GitHub Repository](https://github.com/ollama/ollama)
+- [Transformers by Hugging Face](https://huggingface.co/docs/transformers/index)
+- [yt-dlp Documentation](https://github.com/yt-dlp/yt-dlp)

-## Known Issues
+## Troubleshooting

-```python
-ValueError: Soundfile is either not in the correct format or is malformed. Ensure that the soundfile has a valid audio file extension (e.g. wav, flac or mp3) and is not corrupted. If reading from a remote URL, ensure that the URL is the full address to **download** the audio file.
+### ffmpeg not found
+
+If you encounter this error:
+
+```bash
+yt_dlp.utils.DownloadError: ERROR: Postprocessing: ffprobe and ffmpeg not found. Please install or provide the path using --ffmpeg-location
 ```

-To fix it :
-`ffmpeg -i my_file.mp4 -movflags faststart my_file_fixed.mp4`
+Install ffmpeg and make sure it is available on your `PATH`. For installation steps, refer to [this post](https://www.reddit.com/r/StacherIO/wiki/ffmpeg/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button).
+
+### Audio Format Issues
+
+If you encounter this error:
+
+```bash
+ValueError: Soundfile is either not in the correct format or is malformed. Ensure that the soundfile has a valid audio file extension (e.g. wav, flac or mp3) and is not corrupted.
+```
+
+Try converting your file with ffmpeg:
+
+```bash
+ffmpeg -i my_file.mp4 -movflags faststart my_file_fixed.mp4
+```
+
+### Memory Issues on CPU
+
+If you're running on CPU and encounter memory issues during transcription, consider:
+
+1. Using a smaller Whisper model
+2. Processing shorter audio segments
+3. Ensuring you have sufficient RAM available
+
+### Slow Transcription
+
+Transcription can be slow on CPU. For best performance:
+
+1. Use a machine with an NVIDIA GPU (CUDA) or Apple Silicon (MPS)
+2. Keep audio files under 10 minutes when possible
+3. Close other resource-intensive applications
+
+### Update the Whisper or LLM Model
+
+You can easily change the models used for transcription and summarization by modifying the variables at the top of the script:
+
+```python
+# Default models
+OLLAMA_MODEL = "llama3"
+WHISPER_MODEL = "openai/whisper-large-v2"
+```
+
+#### Changing the Whisper Model
+
+To use a different Whisper model for transcription:
+
+1. Update the `WHISPER_MODEL` variable with one of these options:
+   - `"openai/whisper-tiny"` (fastest, least accurate)
+   - `"openai/whisper-base"` (faster, less accurate)
+   - `"openai/whisper-small"` (balanced)
+   - `"openai/whisper-medium"` (slower, more accurate)
+   - `"openai/whisper-large-v2"` (slowest, most accurate)
+
+2. Example:
+
+   ```python
+   WHISPER_MODEL = "openai/whisper-medium"  # A good balance between speed and accuracy
+   ```
+
+For CPU-only systems, using a smaller model like `whisper-base` is recommended for better performance.
+
+#### Changing the LLM Model
+
+To use a different model for summarization:
+
+1. First, pull the desired model with Ollama:
+
+   ```bash
+   ollama pull mistral  # or any other supported model
+   ```
+
+2. Then update the `OLLAMA_MODEL` variable:
+
+   ```python
+   OLLAMA_MODEL = "mistral"  # or any other model you've pulled
+   ```
+
+3. Popular alternatives include:
+   - `"llama3"` (default)
+   - `"mistral"`
+   - `"llama2"`
+   - `"gemma:7b"`
+   - `"phi"`
+
+For a complete list of available models, visit the [Ollama model library](https://ollama.com/library).
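
Note on the "Hardware Acceleration" section added in this diff: the README states that the tool picks MPS, then CUDA, then CPU, but the diff shown here does not include the script itself. The following is a minimal sketch of how such a fallback could look, assuming the script relies on PyTorch for device handling. The function names `pick_device` and `detect_device` are hypothetical and are not taken from `src/summary.py`.

```python
def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Return the preferred device name, in the order the README lists them."""
    if mps_available:
        return "mps"   # Apple Silicon (Metal Performance Shaders)
    if cuda_available:
        return "cuda"  # NVIDIA GPU
    return "cpu"       # fallback when neither backend is available


def detect_device() -> str:
    """Probe PyTorch for available backends (assumption: PyTorch is the backend in use)."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to probe, so stay on CPU.
        return "cpu"
    return pick_device(
        mps_available=torch.backends.mps.is_available(),
        cuda_available=torch.cuda.is_available(),
    )


if __name__ == "__main__":
    print(detect_device())
```

Keeping the priority order in a pure function separates the decision from the hardware probe, so the order can be checked on a machine without a GPU.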