diff --git a/README.md b/README.md
index c9cc302..9b8f24f 100644
--- a/README.md
+++ b/README.md
@@ -1,212 +1,35 @@
 # Splitter
 
-Splitter is a high‑performance command line tool for cutting one or more video files into equal or fixed‑length segments using multi‑threaded FFmpeg execution.
-It supports batch input, flexible duration formats, rotation, smart face/body‑aware cropping, ETA and speed reporting, and both rich and plain‑text terminal output.
-
-![Splitter](splitter.png)
+Splitter is a high-performance command line tool for cutting one or more video files into equal or
+fixed‑length segments using multi‑threaded FFmpeg execution. It supports batch input, flexible
+duration formats, rotation, smart face/body‑aware cropping, and ETA and speed reporting, with a GUI
+as well as rich and plain‑text terminal output.
 
 ## Features
 
-- Multi‑threaded FFmpeg splitting for maximum throughput
+- Human face or body detection with smart cropping
+- Multi-threaded FFmpeg splitting for maximum throughput
 - Equal or fixed‑length segmentation
 - Batch input via file masks or list files
 - Smart cropping with face/body tracking
 - Rotation correction
 - ETA, speed, and progress display
 - FFmpeg passthrough for advanced control
-- [Potentially] Cross‑platform (.NET 10)
+- [Potentially] Cross-platform (.NET 10)
+
+## Screenshots
+
+### Command line interface
+![Splitter](splitter-cli/splitter.png)
+### Graphical user interface
+![Splitter UI](Splitter-UI/screenshot.png)
 
 ## Requirements
 
 - FFmpeg and FFprobe available in system PATH
 - .NET 10 Runtime or newer
 
-If you want to update model:
-
-- For face detection: [opencv_zoo/models/face_detection_yunet at main · opencv/opencv_zoo](https://github.com/opencv/opencv_zoo/tree/main/models/face_detection_yunet)
-- For body detection: [yolov8s.pt · Ultralytics/YOLOv8 at main](https://huggingface.co/Ultralytics/YOLOv8/blob/main/yolov8s.pt)
-
-To convert models from PyTorch to ONNX, you can use the following command:
-
-```python
-from ultralytics import YOLO
-
-model = YOLO("yolov8x.pt")
-model.export(format="onnx", opset=12, half=False) # FP32 ONNX
-```
-
-## How It Works
-
-1. Reads total duration using ffprobe
-2. Parses target duration
-3. Computes number of segments
-4. If not forced, equalizes segment lengths
-5. Runs multiple FFmpeg processes in parallel
-6. Applies rotation, crop, and tracking if enabled
-7. Displays progress, ETA, and speed
-
-## Face Tracking vs Body Tracking
-
-Face tracking and body tracking serve different purposes, and Splitter supports both because each
-excels in different recording environments. When converting horizontal footage into vertical clips,
-the choice of detector determines how stable, reliable, and natural the automated camera motion will be.
-
-![Face vs Body Tracking](tracking.png)
-
-### Face Tracking Using UltraFace 320
-
-Splitter uses the UltraFace 320 ONNX model to perform lightweight, real‑time face detection on each
-frame of the input video. The detector produces bounding boxes for visible faces, and the tracking
-system maintains a stable, smoothed target region across time. This is achieved by combining per‑frame
-detections with temporal smoothing (EMA), dropout tolerance, and camera easing. The result is a
-continuous, stable crop window that follows the performer even when the face is partially occluded,
-briefly lost, or moving rapidly.
-
-During segmentation, the crop window is recalculated for every frame, ensuring that each output
-segment inherits the same smooth camera motion. This makes the vertical clips appear as if they
-were recorded with a dedicated portrait‑oriented camera operator. The UltraFace 320 model is
-fast enough to run alongside multi‑threaded FFmpeg splitting without becoming a bottleneck,
-making it suitable for long recordings and batch processing.
-
-### Benefits of Full‑Body Detection Using YOLOv8s for Live Gig Recordings
-
-When recording concerts or live gigs, performers often move unpredictably, turn away from the
-camera, or become partially obscured by lighting, instruments, or stage effects.
-Full‑body detection using a YOLOv8s ONNX model provides a more reliable tracking anchor than
-face detection alone. Because YOLOv8s can detect the entire human silhouette, the tracker
-maintains stable framing even when the face is not visible, when the performer is far from
-the camera, or when stage lighting makes facial features hard to detect. This produces vertical
-clips that feel intentional and professionally framed, with fewer sudden jumps or lost‑tracking
-moments. For creators converting horizontal gig footage into short vertical clips for YouTube
-Shorts or TikTok, body‑based tracking significantly improves consistency, reduces manual editing,
-and preserves the energy and motion of the performance.
-
-### Automated Camera Control
-
-Splitter includes an automated camera control system that simulates the behavior of a virtual
-camera operator when generating vertical crops from horizontal footage. The goal is to maintain
-smooth, intentional framing around the tracked subject, even when detections are noisy, intermittent,
-or temporarily lost.
-
-The controller receives object detections (face or body) and converts them into a stable crop
-window using a combination of Kalman filtering, exponential smoothing, dropout tolerance,
-and a three‑state tracking model. The Kalman filter provides predictive motion smoothing,
-while the EMA factor blends the predicted position with the previous camera center to avoid jitter.
-The camera easing value controls how quickly the virtual camera follows the subject, producing
-natural‑looking motion rather than abrupt jumps.
-
-When detections disappear, the controller enters one of two fallback modes. In LostFreeze mode,
-the camera holds its last known position for a configurable number of frames, preventing sudden
-jumps during brief occlusions. If the subject remains lost beyond that threshold, the controller
-transitions to LostDrift mode, slowly drifting the camera back toward a neutral center position.
-This prevents the crop from drifting off‑screen and ensures that the output remains usable even
-when tracking fails. All positions are clamped to valid bounds, guaranteeing that the crop window
-never leaves the video frame.
-
-### Automatic rotation detection
-
-The rotation‑estimation method is based on analyzing the distribution of gradient orientations within
-a video frame. After converting the frame to grayscale, the algorithm computes horizontal and vertical
-image gradients using Sobel operators and derives per‑pixel gradient magnitudes and orientations.
-These orientations are folded into the range [0, 180) and accumulated into a fixed‑size,
-magnitude‑weighted histogram. The histogram represents the structural edge distribution of the frame,
-independent of brightness fluctuations or local lighting artifacts. By comparing the total gradient
-energy concentrated near 0 degrees (vertical edges) with the energy near 90 degrees (horizontal edges),
-the method determines whether the frame is more consistent with an upright or sideways orientation.
-
-This approach is designed for environments where brightness‑based cues are unreliable, such as
-live concerts with strobe lights, LED walls, haze, and crowd movement. It relies solely on geometric
-edge structure, which remains stable even under extreme lighting variation. The implementation is
-optimized for high‑throughput video processing: all intermediate Mats, buffers, and histograms are
-preallocated, and pixel data is accessed directly through pointers to avoid per‑frame memory
-allocation. The method is intentionally biased toward the upright orientation, returning a sideways
-classification only when the horizontal‑edge energy significantly exceeds the vertical‑edge energy.
-
-## Usage
-
-```
-splitter [ ...] [options] [--]
-```
-
-Inputs may be provided directly, via `--file=...`, or using file masks such as `videos/*.mp4`.
-
-
-## Options
-
-Below is a clean, ASCII‑only **options table** version of your content.
-All option names are preserved exactly, and descriptions are consolidated for clarity.
-
----
-
-## Options
-
-| Option | Description |
-|--------|-------------|
-| **--out=** | Output folder for generated segments. Default: `/Splitter`. |
-| **--file=** | Input file list or file mask. If omitted, the first non‑option argument is used as input. Examples: `--file=videos/*.mp4`, `--file=file_list.txt`. |
-| **--mask=** | Custom output filename pattern. Default: `[NAME]_seg[NN].[EXT]`. Supports `[NAME]`, `[N]`, `[NN]`, `[NNN]`, `[NNNN]`, `[EXT]`. Example: `--mask="[NAME]_[NNNN].mp4"`. |
-| **--duration=** | Override target segment duration. Formats: `Ns`, `NmMs`, `N`. Examples: `--duration=90s`, `--duration=2m30s`, `--duration=45`. Without `--force`: max 58 seconds, equalized across segments. |
-| **--force** | Use the duration exactly as provided. Last segment may be shorter. |
-| **--rotate=** | Rotate video by 90, 180, or 270 degrees. Useful for correcting orientation metadata. |
-| **--rotate-auto** | Use automatic rotation detection. |
-| **--estimate** | Print calculated segment information and exit. No splitting is performed. |
-| **--crop[=]** | Crop video to a target width and height with face/body tracking. Default: 607x1080. Ideal for Shorts, TikTok, Reels. |
-| **--detect=** | Object detector for tracking. Values: `face` (UltraFace), `body` (YoloOnnx, default), `none` (center crop). |
-| **--gravitate=** | Bias the crop window toward a normalized point in the frame. Example: `--gravitate=0.2:0.5`. |
-| **--text** | Use plain‑text logging instead of the rich terminal UI. |
-| **--single-thread** | Disable parallel FFmpeg execution. Useful for debugging or low‑resource systems. |
-| **--debug** | Show debug overlay during tracking. No cropping performed, but crop region shown. |
-| **-p:=** | Set custom parameters for the object detector. Example: `-p:confidence=0.5`. Defaults: DropoutToleranceFrames=20, EmaFactor=0.65, CameraEasing=0.03, LostFreezeFrames=60. |
-
-## FFmpeg Passthrough
-
-Anything after `--` is passed directly to FFmpeg.
-
-Example:
-```
-splitter video.mp4 --force --duration=45 -- -an -sn
-```
-
-## Input and Output Behavior
-
-- `input.mp4` may be a file mask (`videos/*.mp4`)
-- Output filenames follow the `--mask` pattern
-- Output folder defaults to `/Splitter` unless overridden
-
-## Examples
-
-Split into equal 60‑second segments:
-```
-splitter vertical-video.mp4
-```
-
-Split into equal 90‑second segments:
-```
-splitter vertical-video.mp4 --duration=90s
-```
-
-Custom naming:
-```
-splitter vertical-video.mp4 --duration=2m30s --mask="[NAME]_[NNNN].mp4"
-```
-
-Estimate only:
-```
-splitter vertical-video.mp4 --estimate
-```
-
-Fixed 45‑second segments with passthrough:
-```
-splitter vertical-video.mp4 --force --duration=45 -- -an -sn
-```
-
-Smart crop for Shorts:
-```
-splitter horizontal-video.mp4 --out=Cropped/ --crop
-```
-
-Batch processing with body tracking:
-```
-splitter --file=file_names.txt --out=Cropped/ --crop --detect=body
-```
+## More info
+[Command line tool](splitter-cli/README.md)
+[GUI tool](Splitter-UI/README.md)
diff --git a/Splitter-UI/README.md b/Splitter-UI/README.md
new file mode 100644
index 0000000..a0b451b
--- /dev/null
+++ b/Splitter-UI/README.md
@@ -0,0 +1,59 @@
+# Splitter-UI
+
+
+A compact, modern desktop front-end for Splitter (the high-performance FFmpeg-based video splitter). Built with Avalonia 12 and
+targeting .NET 10, this project provides a native-feeling cross-platform UI to configure splitting jobs, preview smart
+crops, and drive the Splitter CLI backend.
+
+## Overview
+
+Splitter-UI wraps the core Splitter pipeline (the referenced splitter-cli project) and exposes common workflow tasks
+through an accessible interface: input selection, output naming, duration and crop controls, rotation options, detector settings,
+and a job monitor with progress and ETA. For the full command-line feature set and the implementation rationale, see the
+repository root README (../README.md).
+
+## Screenshots
+
+![Main window with job list and settings](screenshot.png)
+
+
+## Getting started
+
+Requirements: .NET 10 runtime and FFmpeg/FFprobe available on PATH. The UI references the splitter-cli project;
+build the solution to ensure the CLI is available to the UI during development.
+
+To build and run locally:
+
+1. From the solution root run: dotnet build
+2. Start the UI project: dotnet run --project Splitter-UI
+
+## Packaging
+
+The csproj is configured for a win-x64 self-contained runtime identifier. Use dotnet publish with the desired
+configuration and runtime identifier to produce distributable artifacts.
+
+## Configuration
+
+Settings exposed in the UI map closely to the CLI options: output folder, filename mask, segment duration and
+force mode, rotation, crop and detector choices, gravitation bias and detector parameters. Advanced passthrough
+arguments can still be supplied to FFmpeg via the CLI passthrough field.
+
+## Developer notes
+
+- Project: Splitter-UI (Avalonia 12, net10.0)
+- Key packages: Avalonia, Avalonia.Controls.DataGrid, Avalonia.Desktop, Avalonia.Themes.Fluent, CommunityToolkit.Mvvm
+- The UI project references the splitter-cli project for tight integration during development.
+
+## Troubleshooting
+
+If FFmpeg or FFprobe are not found, the app will be unable to probe media or run splits. Verify tools are on the
+system PATH and that the runtime matches the built RID.
+
+## Contributing and License
+
+Contributions follow the main repository guidelines. See the root README for contributor and license information.
+
+## Contact
+
+For issues or questions, open an issue on the project repository or contact the
+maintainer listed in the main README.
diff --git a/Splitter-UI/Services/ThumbnailService.cs b/Splitter-UI/Services/ThumbnailService.cs
index d8883bf..788c03b 100644
--- a/Splitter-UI/Services/ThumbnailService.cs
+++ b/Splitter-UI/Services/ThumbnailService.cs
@@ -13,7 +13,35 @@ public sealed class ThumbnailService : IThumbnailService
     private readonly byte [] _bgrBuffer = new byte[_thumbWidth * _thumbHeight * 3];
     private readonly byte [] _bgraBuffer = new byte[_thumbWidth * _thumbHeight * 4];
+    private readonly SemaphoreSlim _lock = new(1, 1);
+    public async Task CreateThumbnailAsync(
+        string file,
+        VideoInfo probe,
+        TimeSpan? skip = null,
+        int? width = null,
+        int? height = null,
+        int? rotateDegree = null)
+    {
+        await _lock.WaitAsync();
+        try
+        {
+            return await CreateThumbnailInternal(
+                file,
+                probe,
+                skip,
+                width,
+                height,
+                rotateDegree
+            );
+        }
+        finally
+        {
+            _lock.Release();
+        }
+    }
+
+
+    private async Task CreateThumbnailInternal(
         string file,
         VideoInfo probe,
         TimeSpan? skip = null,
diff --git a/Splitter-UI/screenshot.png b/Splitter-UI/screenshot.png
new file mode 100644
index 0000000..c788c27
Binary files /dev/null and b/Splitter-UI/screenshot.png differ
diff --git a/splitter-cli/README.md b/splitter-cli/README.md
new file mode 100644
index 0000000..a74d9d0
--- /dev/null
+++ b/splitter-cli/README.md
@@ -0,0 +1,212 @@
+# Splitter
+
+Splitter is a high-performance command line tool for cutting one or more video files into equal or fixed-length segments using multi-threaded FFmpeg execution.
+It supports batch input, flexible duration formats, rotation, smart face/body-aware cropping, ETA and speed reporting, and both rich and plain-text terminal output.
+
+![Splitter](splitter.png)
+
+## Features
+
+- Multi-threaded FFmpeg splitting for maximum throughput
+- Equal or fixed-length segmentation
+- Batch input via file masks or list files
+- Smart cropping with face/body tracking
+- Rotation correction
+- ETA, speed, and progress display
+- FFmpeg passthrough for advanced control
+- [Potentially] Cross-platform (.NET 10)
+
+## Requirements
+
+- FFmpeg and FFprobe available in system PATH
+- .NET 10 Runtime or newer
+
+If you want to update model:
+
+- For face detection: [opencv_zoo/models/face_detection_yunet at main · opencv/opencv_zoo](https://github.com/opencv/opencv_zoo/tree/main/models/face_detection_yunet)
+- For body detection: [yolov8s.pt · Ultralytics/YOLOv8 at main](https://huggingface.co/Ultralytics/YOLOv8/blob/main/yolov8s.pt)
+
+To convert models from PyTorch to ONNX, you can use the following command:
+
+```python
+from ultralytics import YOLO
+
+model = YOLO("yolov8x.pt")
+model.export(format="onnx", opset=12, half=False) # FP32 ONNX
+```
+
+## How It Works
+
+1. Reads total duration using ffprobe
+2. Parses target duration
+3. Computes number of segments
+4. If not forced, equalizes segment lengths
+5. Runs multiple FFmpeg processes in parallel
+6. Applies rotation, crop, and tracking if enabled
+7. Displays progress, ETA, and speed
+
+## Face Tracking vs Body Tracking
+
+Face tracking and body tracking serve different purposes, and Splitter supports both because each
+excels in different recording environments. When converting horizontal footage into vertical clips,
+the choice of detector determines how stable, reliable, and natural the automated camera motion will be.
+
+![Face vs Body Tracking](tracking.png)
+
+### Face Tracking Using UltraFace 320
+
+Splitter uses the UltraFace 320 ONNX model to perform lightweight, real-time face detection on each
+frame of the input video. The detector produces bounding boxes for visible faces, and the tracking
+system maintains a stable, smoothed target region across time. This is achieved by combining per-frame
+detections with temporal smoothing (EMA), dropout tolerance, and camera easing. The result is a
+continuous, stable crop window that follows the performer even when the face is partially occluded,
+briefly lost, or moving rapidly.
+
+During segmentation, the crop window is recalculated for every frame, ensuring that each output
+segment inherits the same smooth camera motion. This makes the vertical clips appear as if they
+were recorded with a dedicated portrait-oriented camera operator. The UltraFace 320 model is
+fast enough to run alongside multi-threaded FFmpeg splitting without becoming a bottleneck,
+making it suitable for long recordings and batch processing.
+
+### Benefits of Full-Body Detection Using YOLOv8s for Live Gig Recordings
+
+When recording concerts or live gigs, performers often move unpredictably, turn away from the
+camera, or become partially obscured by lighting, instruments, or stage effects.
+Full-body detection using a YOLOv8s ONNX model provides a more reliable tracking anchor than
+face detection alone. Because YOLOv8s can detect the entire human silhouette, the tracker
+maintains stable framing even when the face is not visible, when the performer is far from
+the camera, or when stage lighting makes facial features hard to detect. This produces vertical
+clips that feel intentional and professionally framed, with fewer sudden jumps or lost-tracking
+moments. For creators converting horizontal gig footage into short vertical clips for YouTube
+Shorts or TikTok, body-based tracking significantly improves consistency, reduces manual editing,
+and preserves the energy and motion of the performance.
+
+### Automated Camera Control
+
+Splitter includes an automated camera control system that simulates the behavior of a virtual
+camera operator when generating vertical crops from horizontal footage. The goal is to maintain
+smooth, intentional framing around the tracked subject, even when detections are noisy, intermittent,
+or temporarily lost.
+
+The controller receives object detections (face or body) and converts them into a stable crop
+window using a combination of Kalman filtering, exponential smoothing, dropout tolerance,
+and a three-state tracking model. The Kalman filter provides predictive motion smoothing,
+while the EMA factor blends the predicted position with the previous camera center to avoid jitter.
+The camera easing value controls how quickly the virtual camera follows the subject, producing
+natural-looking motion rather than abrupt jumps.
+
+When detections disappear, the controller enters one of two fallback modes. In LostFreeze mode,
+the camera holds its last known position for a configurable number of frames, preventing sudden
+jumps during brief occlusions. If the subject remains lost beyond that threshold, the controller
+transitions to LostDrift mode, slowly drifting the camera back toward a neutral center position.
+This prevents the crop from drifting off-screen and ensures that the output remains usable even
+when tracking fails. All positions are clamped to valid bounds, guaranteeing that the crop window
+never leaves the video frame.
+
+### Automatic rotation detection
+
+The rotation-estimation method is based on analyzing the distribution of gradient orientations within
+a video frame. After converting the frame to grayscale, the algorithm computes horizontal and vertical
+image gradients using Sobel operators and derives per-pixel gradient magnitudes and orientations.
+These orientations are folded into the range [0, 180) and accumulated into a fixed-size,
+magnitude-weighted histogram. The histogram represents the structural edge distribution of the frame,
+independent of brightness fluctuations or local lighting artifacts. By comparing the total gradient
+energy concentrated near 0 degrees (vertical edges) with the energy near 90 degrees (horizontal edges),
+the method determines whether the frame is more consistent with an upright or sideways orientation.
+
+This approach is designed for environments where brightness-based cues are unreliable, such as
+live concerts with strobe lights, LED walls, haze, and crowd movement. It relies solely on geometric
+edge structure, which remains stable even under extreme lighting variation. The implementation is
+optimized for high-throughput video processing: all intermediate Mats, buffers, and histograms are
+preallocated, and pixel data is accessed directly through pointers to avoid per-frame memory
+allocation. The method is intentionally biased toward the upright orientation, returning a sideways
+classification only when the horizontal-edge energy significantly exceeds the vertical-edge energy.
+
+## Usage
+
+```
+splitter [ ...] [options] [--]
+```
+
+Inputs may be provided directly, via `--file=...`, or using file masks such as `videos/*.mp4`.
+
+
+## Options
+
+Below is a clean, ASCII-only **options table** version of your content.
+All option names are preserved exactly, and descriptions are consolidated for clarity.
+
+---
+
+## Options
+
+| Option | Description |
+|--------|-------------|
+| **--out=** | Output folder for generated segments. Default: `/Splitter`. |
+| **--file=** | Input file list or file mask. If omitted, the first non-option argument is used as input. Examples: `--file=videos/*.mp4`, `--file=file_list.txt`. |
+| **--mask=** | Custom output filename pattern. Default: `[NAME]_seg[NN].[EXT]`. Supports `[NAME]`, `[N]`, `[NN]`, `[NNN]`, `[NNNN]`, `[EXT]`. Example: `--mask="[NAME]_[NNNN].mp4"`. |
+| **--duration=** | Override target segment duration. Formats: `Ns`, `NmMs`, `N`. Examples: `--duration=90s`, `--duration=2m30s`, `--duration=45`. Without `--force`: max 58 seconds, equalized across segments. |
+| **--force** | Use the duration exactly as provided. Last segment may be shorter. |
+| **--rotate=** | Rotate video by 90, 180, or 270 degrees. Useful for correcting orientation metadata. |
+| **--rotate-auto** | Use automatic rotation detection. |
+| **--estimate** | Print calculated segment information and exit. No splitting is performed. |
+| **--crop[=]** | Crop video to a target width and height with face/body tracking. Default: 607x1080. Ideal for Shorts, TikTok, Reels. |
+| **--detect=** | Object detector for tracking. Values: `face` (UltraFace), `body` (YoloOnnx, default), `none` (center crop). |
+| **--gravitate=** | Bias the crop window toward a normalized point in the frame. Example: `--gravitate=0.2:0.5`. |
+| **--text** | Use plain-text logging instead of the rich terminal UI. |
+| **--single-thread** | Disable parallel FFmpeg execution. Useful for debugging or low-resource systems. |
+| **--debug** | Show debug overlay during tracking. No cropping performed, but crop region shown. |
+| **-p:=** | Set custom parameters for the object detector. Example: `-p:confidence=0.5`. Defaults: DropoutToleranceFrames=20, EmaFactor=0.65, CameraEasing=0.03, LostFreezeFrames=60. |
+
+## FFmpeg Passthrough
+
+Anything after `--` is passed directly to FFmpeg.
+
+Example:
+```
+splitter video.mp4 --force --duration=45 -- -an -sn
+```
+
+## Input and Output Behavior
+
+- `input.mp4` may be a file mask (`videos/*.mp4`)
+- Output filenames follow the `--mask` pattern
+- Output folder defaults to `/Splitter` unless overridden
+
+## Examples
+
+Split into equal 60-second segments:
+```
+splitter vertical-video.mp4
+```
+
+Split into equal 90-second segments:
+```
+splitter vertical-video.mp4 --duration=90s
+```
+
+Custom naming:
+```
+splitter vertical-video.mp4 --duration=2m30s --mask="[NAME]_[NNNN].mp4"
+```
+
+Estimate only:
+```
+splitter vertical-video.mp4 --estimate
+```
+
+Fixed 45-second segments with passthrough:
+```
+splitter vertical-video.mp4 --force --duration=45 -- -an -sn
+```
+
+Smart crop for Shorts:
+```
+splitter horizontal-video.mp4 --out=Cropped/ --crop
+```
+
+Batch processing with body tracking:
+```
+splitter --file=file_names.txt --out=Cropped/ --crop --detect=body
+```
+
diff --git a/splitter.png b/splitter-cli/splitter.png
similarity index 100%
rename from splitter.png
rename to splitter-cli/splitter.png
diff --git a/tracking.png b/splitter-cli/tracking.png
similarity index 100%
rename from tracking.png
rename to splitter-cli/tracking.png
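
Reviewer note on the segmentation rules in the moved CLI README ("How It Works" steps 3 and 4, and the `--duration` / `--force` rows of the options table): the sketch below is one possible reading of those rules, written in Python to match the README's existing example. It is not taken from Splitter's source. The function name `plan_segments`, the ceiling-based segment count, and the use of 58 seconds as the default target are assumptions for illustration only.

```python
import math


def plan_segments(total_seconds: float,
                  target_seconds: float = 58.0,
                  force: bool = False) -> list[float]:
    """Hypothetical helper: return segment lengths in seconds.

    Assumed behavior (not confirmed by the Splitter source):
    - force=True  -> fixed-length segments, the last one may be shorter
    - force=False -> the smallest number of segments that keeps each one
                     at or below the target, all of equal length
    """
    if total_seconds <= 0 or target_seconds <= 0:
        raise ValueError("durations must be positive")

    if force:
        # Fixed-length mode: as many full segments as fit, plus a remainder.
        full, remainder = divmod(total_seconds, target_seconds)
        segments = [target_seconds] * int(full)
        if remainder > 1e-9:
            segments.append(remainder)
        return segments

    # Equalized mode: round the segment count up, then share the duration evenly.
    count = math.ceil(total_seconds / target_seconds)
    return [total_seconds / count] * count


if __name__ == "__main__":
    print(plan_segments(100, 45, force=True))  # fixed 45 s segments plus a remainder
    print(plan_segments(120))                  # equalized under the assumed 58 s target
```

Under these assumptions, equalizing avoids a very short trailing clip, while `force=True` keeps the requested length exact at the cost of a shorter final segment, which is the trade-off the `--force` row describes.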