Add support for GPU (MPS and CUDA)
Migrate to `uv`
.gitignore (vendored) | 5

@@ -1,6 +1,9 @@
+audio_summary_with_local_LLM.egg-info/
+.ruff_cache/
 # Virtual Env
 .venv
 
 # Local data
 .DS_Store
 tmp
+summary.md
README.md | 184

@@ -1,6 +1,6 @@
-# Audio Summary with local LLM
+# Audio Summary with Local LLM
 
-This tool is designed to provide a quick and concise summary of audio and video files. It supports summarizing content either from a local file or directly from YouTube. The tool uses Whisper for transcription and a local version of Mistral AI (Ollama) for generating summaries.
+This tool is designed to provide a quick and concise summary of audio and video files. It supports summarizing content either from a local file or directly from YouTube. The tool uses Whisper for transcription and a local version of Llama3 (via Ollama) for generating summaries.
 
 > [!TIP]
 > It is possible to change the model you wish to use.
@@ -9,45 +9,47 @@ This tool is designed to provide a quick and concise summary of audio and video
 ## Features
 
 - **YouTube Integration**: Download and summarize content directly from YouTube.
-- **Local File Support**: Summarize audio files available on your local disk.
+- **Local File Support**: Summarize audio/video files available on your local disk.
 - **Transcription**: Converts audio content to text using Whisper.
-- **Summarization**: Generates a concise summary using Mistral AI (Ollama).
+- **Summarization**: Generates a concise summary using Llama3 (Ollama).
 - **Transcript Only Option**: Option to only transcribe the audio content without generating a summary.
+- **Device Optimization**: Automatically uses the best available hardware (MPS for Mac, CUDA for NVIDIA GPUs, or CPU).
 
 ## Prerequisites
 
 Before you start using this tool, you need to install the following dependencies:
 
-- Python 3.8 or higher
-- `pytube` for downloading videos from YouTube.
-- `pathlib` for local file handling
-- `openai-whisper` for audio transcription.
-- [Ollama](https://ollama.com) for LLM model management.
-- `ffmpeg` (required for whisper)
+- Python 3.12 and lower than 3.13
+- [Ollama](https://ollama.com) for LLM model management
+- `ffmpeg` (required for audio processing)
+- [uv](https://docs.astral.sh/uv/getting-started/installation/) for package management
 
 ## Installation
 
-### Python Requirements
+### Using uv
 
-Clone the repository and install the required Python packages:
+Clone the repository and install the required Python packages using [uv](https://github.com/astral-sh/uv):
 
 ```bash
 git clone https://github.com/damienarnodo/audio-summary-with-local-LLM.git
 cd audio-summary-with-local-LLM
-pip install -r src/requirements.txt
+
+# Create and activate a virtual environment with uv
+uv sync
+source .venv/bin/activate  # On Windows: .venv\Scripts\activate
 ```
 
 ### LLM Requirement
 
 [Download and install](https://ollama.com) Ollama to carry out LLM Management. More details about LLM models supported can be found on the Ollama [GitHub](https://github.com/ollama/ollama).
 
-Download and use the Mistral model:
+Download and use the Llama3 model:
 
 ```bash
-ollama pull mistral
+ollama pull llama3
 
 ## Test the access:
-ollama run mistral "tell me a joke"
+ollama run llama3 "tell me a joke"
 ```
 
 ## Usage
@@ -56,51 +58,175 @@ The tool can be executed with the following command line options:
 
 - `--from-youtube`: To download and summarize a video from YouTube.
 - `--from-local`: To load and summarize an audio or video file from the local disk.
-- `--transcript-only`: To only transcribe the audio content without generating a summary. This option must be used with either `--from-youtube` or `--from-local`.
+- `--output`: Specify the output file path (default: ./summary.md)
+- `--transcript-only`: To only transcribe the audio content without generating a summary.
+- `--language`: Select the language to be used for the transcription (default: en)
 
 ### Examples
 
 1. **Summarizing a YouTube video:**
 
 ```bash
-python src/summary.py --from-youtube <YouTube-Video-URL>
+uv run python src/summary.py --from-youtube <YouTube-Video-URL>
 ```
 
 2. **Summarizing a local audio file:**
 
 ```bash
-python src/summary.py --from-local <path-to-audio-file>
+uv run python src/summary.py --from-local <path-to-audio-file>
 ```
 
 3. **Transcribing a YouTube video without summarizing:**
 
 ```bash
-python src/summary.py --from-youtube <YouTube-Video-URL> --transcript-only
+uv run python src/summary.py --from-youtube <YouTube-Video-URL> --transcript-only
 ```
 
 4. **Transcribing a local audio file without summarizing:**
 
 ```bash
-python src/summary.py --from-local <path-to-audio-file> --transcript-only
+uv run python src/summary.py --from-local <path-to-audio-file> --transcript-only
+```
+
+5. **Specifying a custom output file:**
+
+```bash
+uv run python src/summary.py --from-youtube <YouTube-Video-URL> --output my_summary.md
 ```
 
 The output summary will be saved in a markdown file in the specified output directory, while the transcript will be saved in the temporary directory.
 
 ## Output
 
-The summarized content is saved as a markdown file named `summary.md` in the current working directory. This file includes the transcribed text and its corresponding summary. If `--transcript-only` is used, only the transcription will be saved in the temporary directory.
+The summarized content is saved as a markdown file (default: `summary.md`) in the current working directory. This file includes a title and a concise summary of the content. The transcript is saved in the `tmp/transcript.txt` file.
 
+## Hardware Acceleration
+
+The tool automatically detects and uses the best available hardware:
+
+- MPS (Metal Performance Shaders) for Apple Silicon Macs
+- CUDA for NVIDIA GPUs
+- Falls back to CPU when neither is available
+
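As an aside (not part of the commit itself), the selection order described above matches the `get_device()` helper added to `src/summary.py` further down in this diff; a self-contained sketch, assuming PyTorch and Transformers are installed, looks like this:

```python
import torch
from transformers import pipeline

# Prefer Apple's Metal backend, then CUDA, then fall back to the CPU.
if torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

# The device string is handed straight to the Hugging Face ASR pipeline.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    device=device,
)
```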
+### Handling Longer Audio Files
+
+This tool can process audio files of any length. For files longer than 30 seconds, the script automatically:
+
+1. Chunks the audio into manageable segments
+2. Processes each chunk separately
+3. Combines the results into a single transcript
+
+This approach allows for efficient processing of longer content while managing memory usage. However, be aware that:
+
+- Longer files will take proportionally more time to process
+- Very long files (>30 minutes) may require significant processing time, especially on CPU
+- For extremely long content, consider splitting the audio file into smaller segments before processing
+
+If you encounter memory issues with very long files, you can try:
+
+1. Using a smaller Whisper model by changing `WHISPER_MODEL` to "openai/whisper-base"
+2. Reducing the `chunk_length_s` parameter in the `transcribe_file` function
+3. Processing the file in separate parts and combining the summaries afterward
+
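As an illustration (not part of the commit), the chunked flow described above corresponds to the pipeline options the commit sets in `transcribe_file`; a minimal sketch, with a placeholder input path, is:

```python
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    chunk_length_s=30,       # split audio longer than 30 seconds into chunks
    return_timestamps=True,  # needed so long inputs come back as chunks
)

result = transcriber("tmp/audio.mp3")  # placeholder path

# Long inputs return timestamped chunks; join them into one transcript.
if isinstance(result, dict) and "chunks" in result:
    transcript = " ".join(chunk["text"] for chunk in result["chunks"])
else:
    transcript = result["text"]
print(transcript)
```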
 ## Sources
 
 - [YouTube Video Summarizer with OpenAI Whisper and GPT](https://github.com/mirabdullahyaser/Summarizing-Youtube-Videos-with-OpenAI-Whisper-and-GPT-3/tree/master)
-- [Mistral Python Client](https://github.com/mistralai/client-python)
-- [Ollama : Installez LLama 2 et Code LLama en quelques secondes !](https://www.geeek.org/tutoriel-installation-llama-2-et-code-llama/)
+- [Ollama GitHub Repository](https://github.com/ollama/ollama)
+- [Transformers by Hugging Face](https://huggingface.co/docs/transformers/index)
+- [yt-dlp Documentation](https://github.com/yt-dlp/yt-dlp)
 
-## Known Issues
+## Troubleshooting
 
-```python
-ValueError: Soundfile is either not in the correct format or is malformed. Ensure that the soundfile has a valid audio file extension (e.g. wav, flac or mp3) and is not corrupted. If reading from a remote URL, ensure that the URL is the full address to **download** the audio file.
+### ffmpeg not found
+
+If you encounter this error:
+
+```bash
+yt_dlp.utils.DownloadError: ERROR: Postprocessing: ffprobe and ffmpeg not found. Please install or provide the path using --ffmpeg-location
 ```
 
-To fix it :
-`ffmpeg -i my_file.mp4 -movflags faststart my_file_fixed.mp4`
+Please refer to [this post](https://www.reddit.com/r/StacherIO/wiki/ffmpeg/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button)
+
+### Audio Format Issues
+
+If you encounter this error:
+
+```bash
+ValueError: Soundfile is either not in the correct format or is malformed. Ensure that the soundfile has a valid audio file extension (e.g. wav, flac or mp3) and is not corrupted.
+```
+
+Try converting your file with ffmpeg:
+
+```bash
+ffmpeg -i my_file.mp4 -movflags faststart my_file_fixed.mp4
+```
+
+### Memory Issues on CPU
+
+If you're running on CPU and encounter memory issues during transcription, consider:
+
+1. Using a smaller Whisper model
+2. Processing shorter audio segments
+3. Ensuring you have sufficient RAM available
+
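As a sketch only (not in the commit), tips 1 and 2 above could be combined like this, with a hypothetical smaller checkpoint and a shorter chunk length:

```python
from transformers import pipeline

# Hypothetical low-memory settings: smaller Whisper checkpoint, shorter chunks.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base",  # smaller than the default whisper-large-v2
    device="cpu",
    chunk_length_s=15,            # shorter chunks keep peak memory lower
    return_timestamps=True,
)
print(transcriber("tmp/audio.mp3")["text"])  # placeholder path
```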
+### Slow Transcription
+
+Transcription can be slow on CPU. For best performance:
+
+1. Use a machine with GPU or Apple Silicon (MPS)
+2. Keep audio files under 10 minutes when possible
+3. Close other resource-intensive applications
+
+### Update the Whisper or LLM Model
+
+You can easily change the models used for transcription and summarization by modifying the variables at the top of the script:
+
+```python
+# Default models
+OLLAMA_MODEL = "llama3"
+WHISPER_MODEL = "openai/whisper-large-v2"
+```
+
+#### Changing the Whisper Model
+
+To use a different Whisper model for transcription:
+
+1. Update the `WHISPER_MODEL` variable with one of these options:
+   - "openai/whisper-tiny" (fastest, least accurate)
+   - "openai/whisper-base" (faster, less accurate)
+   - "openai/whisper-small" (balanced)
+   - "openai/whisper-medium" (slower, more accurate)
+   - "openai/whisper-large-v2" (slowest, most accurate)
+
+2. Example:
+
+```python
+WHISPER_MODEL = "openai/whisper-medium"  # A good balance between speed and accuracy
+```
+
+For CPU-only systems, using a smaller model like `whisper-base` is recommended for better performance.
+
+#### Changing the LLM Model
+
+To use a different model for summarization:
+
+1. First, pull the desired model with Ollama:
+
+```bash
+ollama pull mistral  # or any other supported model
+```
+
+2. Then update the `OLLAMA_MODEL` variable:
+
+```python
+OLLAMA_MODEL = "mistral"  # or any other model you've pulled
+```
+
+3. Popular alternatives include:
+   - "llama3" (default)
+   - "mistral"
+   - "llama2"
+   - "gemma:7b"
+   - "phi"
+
+For a complete list of available models, visit the [Ollama model library](https://ollama.com/library).
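For context (not shown in this hunk), the summary step goes through the `ollama` Python client, so changing `OLLAMA_MODEL` only changes the model name passed to it. A rough sketch, assuming the client's `chat` API and shortened prompts rather than the project's exact `summarize_text`:

```python
import ollama  # requires a running Ollama server with the model already pulled

response = ollama.chat(
    model="mistral",  # or "llama3", "gemma:7b", ... anything pulled via `ollama pull`
    messages=[
        {"role": "system", "content": "I would like for you to assume the role of a Technical Expert"},
        {"role": "user", "content": "Generate a concise summary of the text below.\nText : <transcript here>"},
    ],
)
print(response["message"]["content"])
```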
pyproject.toml (new file) | 109

@@ -0,0 +1,109 @@
+[project]
+name = "audio-summary-with-local-LLM"
+dynamic = ["version"]
+description = 'Sum up your local or remote files with a local LLM'
+keywords = ["audio", "summary", "local-llm", "ollama", "whisper"]
+readme = "README.md"
+requires-python = ">=3.12, <3.13"
+authors = [
+    { name = "darnodo", email = "sepales.pret0h@icloud.com" },
+]
+dependencies = [
+    "ffmpeg>=1.4",
+    "ollama>=0.4.7",
+    "openai-whisper>=20240930",
+    "torch>=2.6.0",
+    "torchaudio>=2.6.0",
+    "torchvision>=0.21.0",
+    "transformers>=4.50.2",
+    "yt-dlp>=2025.3.27",
+]
+[tool.setuptools]
+py-modules = []
+
+[tool.ruff]
+# Exclude a variety of commonly ignored directories.
+exclude = [
+    ".bzr",
+    ".direnv",
+    ".eggs",
+    ".git",
+    ".git-rewrite",
+    ".hg",
+    ".ipynb_checkpoints",
+    ".mypy_cache",
+    ".nox",
+    ".pants.d",
+    ".pyenv",
+    ".pytest_cache",
+    ".pytype",
+    ".ruff_cache",
+    ".svn",
+    ".tox",
+    ".venv",
+    ".vscode",
+    "__pypackages__",
+    "_build",
+    "buck-out",
+    "build",
+    "dist",
+    "node_modules",
+    "site-packages",
+    "venv",
+]
+
+# Same as Black.
+line-length = 88
+indent-width = 4
+
+# Assume Python 3.8
+target-version = "py38"
+
+[tool.ruff.lint]
+# Enable Pyflakes (`F`) and a subset of the pycodestyle (`E`) codes by default.
+# Unlike Flake8, Ruff doesn't enable pycodestyle warnings (`W`) or
+# McCabe complexity (`C901`) by default.
+select = ["E4", "E7", "E9", "F"]
+ignore = []
+
+# Allow fix for all enabled rules (when `--fix`) is provided.
+fixable = ["ALL"]
+unfixable = []
+
+# Allow unused variables when underscore-prefixed.
+dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"
+
+[tool.ruff.format]
+# Like Black, use double quotes for strings.
+quote-style = "double"
+
+# Like Black, indent with spaces, rather than tabs.
+indent-style = "space"
+
+# Like Black, respect magic trailing commas.
+skip-magic-trailing-comma = false
+
+# Like Black, automatically detect the appropriate line ending.
+line-ending = "auto"
+
+# Enable auto-formatting of code examples in docstrings. Markdown,
+# reStructuredText code/literal blocks and doctests are all supported.
+#
+# This is currently disabled by default, but it is planned for this
+# to be opt-out in the future.
+docstring-code-format = false
+
+# Set the line length limit used when formatting code snippets in
+# docstrings.
+#
+# This only has an effect when the `docstring-code-format` setting is
+# enabled.
+docstring-code-line-length = "dynamic"
+
+[dependency-groups]
+lint = [
+    "ruff>=0.0.17",
+]
+dev = [
+    "ipython>=5.10.0",
+]
src/requirements.txt (deleted)

@@ -1,6 +0,0 @@
-openai-whisper==20231117
-ollama==0.1.8
-torch==2.5.0.dev20240712
-torchaudio==2.4.0.dev20240712
-torchvision==0.20.0.dev20240712
-transformers==4.42.4
src/summary.py

@@ -3,8 +3,11 @@ import argparse
 from pathlib import Path
 from transformers import pipeline
 import yt_dlp
+import torch
 
 OLLAMA_MODEL = "llama3"
+WHISPER_MODEL = "openai/whisper-large-v2"
+WHISPER_LANGUAGE = "en"  # Set to desired language or None for auto-detection
 
 # Function to download a video from YouTube using yt-dlp
 def download_from_youtube(url: str, path: str):
@@ -20,26 +23,70 @@ def download_from_youtube(url: str, path: str):
     with yt_dlp.YoutubeDL(ydl_opts) as ydl:
         ydl.download([url])
 
+# Function to get the best available device
+def get_device():
+    if torch.backends.mps.is_available():
+        return "mps"
+    elif torch.cuda.is_available():
+        return "cuda"
+    else:
+        return "cpu"
+
 # Function to transcribe an audio file using the transformers pipeline
-def transcribe_file(file_path: str, output_file: str) -> str:
-    # Load the pipeline model for automatic speech recognition with MPS
-    transcriber_gpu = pipeline("automatic-speech-recognition", model="openai/whisper-large-v2", device="mps")
+def transcribe_file(file_path: str, output_file: str, language: str = None) -> str:
+    # Get the best available device
+    device = get_device()
+    print(f"Using device: {device} for transcription")
+
+    # Load the pipeline model for automatic speech recognition
+    transcriber = pipeline(
+        "automatic-speech-recognition",
+        model=WHISPER_MODEL,
+        device=device,
+        chunk_length_s=30,  # Process in 30-second chunks
+        return_timestamps=True  # Enable timestamp generation for longer audio
+    )
 
     # Transcribe the audio file
-    transcribe = transcriber_gpu(file_path)
+    # For CPU, we might want to use a smaller model or chunk the audio if memory is an issue
+    if device == "cpu":
+        print("Warning: Using CPU for transcription. This may be slow.")
+
+    # Set up generation keyword arguments including language
+    generate_kwargs = {}
+    if language and language.lower() != "auto":
+        generate_kwargs["language"] = language
+        print(f"Transcribing in language: {language}")
+    else:
+        print("Using automatic language detection")
+
+    # Transcribe the audio file
+    print("Starting transcription (this may take a while for longer files)...")
+    transcribe = transcriber(file_path, generate_kwargs=generate_kwargs)
+
+    # Extract the full text from the chunked transcription
+    if isinstance(transcribe, dict) and "text" in transcribe:
+        # Simple case - just one chunk
+        full_text = transcribe["text"]
+    elif isinstance(transcribe, dict) and "chunks" in transcribe:
+        # Multiple chunks with timestamps
+        full_text = " ".join([chunk["text"] for chunk in transcribe["chunks"]])
+    else:
+        # Fallback for other return formats
+        full_text = transcribe["text"] if "text" in transcribe else str(transcribe)
+
     # Save the transcribed text to the specified temporary file
     with open(output_file, 'w') as tmp_file:
-        tmp_file.write(transcribe["text"])
+        tmp_file.write(full_text)
     print(f"Transcription saved to file: {output_file}")
 
     # Return the transcribed text
-    return transcribe["text"]
+    return full_text
 
 # Function to summarize a text using the Ollama model
 def summarize_text(text: str, output_path: str) -> str:
     # Define the system prompt for the Ollama model
-    system_prompt = f"I would like for you to assume the role of a Technical Expert"
+    system_prompt = "I would like for you to assume the role of a Technical Expert"
     # Define the user prompt for the Ollama model
     user_prompt = f"""Generate a concise summary of the text below.
     Text : {text}
@@ -73,9 +120,15 @@ def main():
     group.add_argument("--from-local", type=str, help="Path to the local audio file.")
     parser.add_argument("--output", type=str, default="./summary.md", help="Output markdown file path.")
     parser.add_argument("--transcript-only", action='store_true', help="Only transcribe the file, do not summarize.")
+    parser.add_argument("--language", type=str, help="Language code for transcription (e.g., 'en', 'fr', 'es', or 'auto' for detection)")
 
     args = parser.parse_args()
 
+    # Determine language setting
+    language = args.language if args.language else WHISPER_LANGUAGE
+    if language and language.lower() == "auto":
+        language = None  # None triggers automatic language detection
+
     # Set up data directory
     data_directory = Path("tmp")
     # Check if the directory exists, if not, create it
@@ -94,7 +147,7 @@ def main():
 
     print(f"Transcribing file: {file_path}")
     # Transcribe the audio file
-    transcript = transcribe_file(str(file_path), data_directory / "transcript.txt")
+    transcript = transcribe_file(str(file_path), data_directory / "transcript.txt", language)
 
     if args.transcript_only:
         print("Transcription complete. Skipping summary generation.")
@@ -111,4 +164,4 @@ def main():
     print(f"Summary written to {args.output}")
 
 if __name__ == "__main__":
     main()