Overview
gsply is a high-performance Python library for Gaussian Splatting PLY file I/O. It combines zero-copy memory management with JIT-compiled compression pipelines to deliver high read and write throughput behind a clean, Pythonic API.
Core Design Philosophy
gsply focuses on three complementary workflows:

- Ultra-fast I/O: Read and write uncompressed or PlayCanvas-compressed PLY files using vectorized NumPy kernels and Numba JIT pipelines. Zero-copy views into a shared _base buffer eliminate unnecessary memory allocations.
- Optimal GPU Transfer: Data loaded with plyread() keeps its attributes in a single _base array, which is moved to the GPU as one tensor for 11x faster transfers. The single-tensor transfer avoids CPU-side memory copies, taking 1.99 ms vs. 22.78 ms for 400K Gaussians.
- Flexible Containers: GSData (CPU) and GSTensor (GPU) provide helpers for concatenation, masking, contiguity optimization, and CPU↔GPU transfers. Format state is tracked automatically in a _format dictionary for seamless in-place conversions.
Key Capabilities
Performance First
Benchmarked at a peak of 93M Gaussians/sec for reads and 57M Gaussians/sec for writes (zero-copy). gsply uses fused unpack/pack kernels and precomputed lookup tables. Zero-copy reads avoid per-attribute memory copies, and automatic consolidation speeds up writes without manual tuning.
Format Flexibility
detect_format() auto-detects PLY layout. plywrite() selects compressed output when
compressed=True or when the file extension is .compressed.ply. Supports uncompressed PLY,
PlayCanvas compressed PLY, and SOG (Splat Ordering Grid) formats.
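To illustrate what layout detection involves, the sketch below parses a PLY header with the standard library and applies the output-selection rule described above. The helper names (parse_ply_header, guess_layout, wants_compressed) are hypothetical and are not gsply's API. The check for a chunk element alongside packed_* vertex properties is an assumption about the PlayCanvas compressed layout, not a description of how detect_format() works internally.

```python
import io


def parse_ply_header(stream):
    """Return {element_name: [property names]} from a PLY header (hypothetical helper)."""
    if stream.readline().strip() != b"ply":
        raise ValueError("not a PLY file")
    elements = {}
    current = None
    for raw in stream:
        tokens = raw.decode("ascii").split()
        if not tokens:
            continue
        if tokens[0] == "end_header":
            return elements
        if tokens[0] == "element":
            current = tokens[1]
            elements[current] = []
        elif tokens[0] == "property" and current is not None:
            # The property name is always the last token, including for list properties.
            elements[current].append(tokens[-1])
    raise ValueError("header ended without end_header")


def guess_layout(elements):
    # Assumption: PlayCanvas-compressed files carry a "chunk" element and
    # packed_* vertex properties; anything else is treated as uncompressed.
    vertex_props = elements.get("vertex", [])
    if "chunk" in elements and any(p.startswith("packed_") for p in vertex_props):
        return "compressed"
    return "uncompressed"


def wants_compressed(path, compressed=False):
    # Mirrors the documented rule: explicit flag, or a .compressed.ply suffix.
    return compressed or str(path).endswith(".compressed.ply")


header = (
    b"ply\nformat binary_little_endian 1.0\n"
    b"element chunk 1\nproperty float min_x\n"
    b"element vertex 2\nproperty uint packed_position\nproperty uint packed_rotation\n"
    b"end_header\n"
)
elements = parse_ply_header(io.BytesIO(header))
print(guess_layout(elements))  # compressed
print(wants_compressed("scene.compressed.ply"))  # True
```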
Object-Oriented API
Convenient save/load methods and factory methods for cleaner code:

- data.save(file_path, compressed=False) - Instance method for saving
- GSData.load(file_path) - Classmethod for loading (auto-detects format)
- GSData.from_arrays(...) - Create from NumPy arrays with a format preset
- GSData.from_dict(data_dict) - Create from a dictionary with a format preset
- gstensor.save(file_path, compressed=True) - GPU compression by default
- GSTensor.load(file_path, device='cuda') - Direct GPU loading
- GSTensor.from_arrays(...) - Create from PyTorch tensors with a format preset
- GSTensor.from_dict(data_dict) - Create from a dictionary with a format preset
Format Conversion with In-Place Tracking
Convert between linear and PLY formats seamlessly with automatic format state tracking:

- normalize() / denormalize() - Convert scales and opacities between linear and PLY formats
  - Uses fused Numba kernels internally (~8-15x faster than individual operations)
  - Single-pass processing reduces memory overhead
  - Parallel execution for optimal performance
- apply_pre_activations() / apply_pre_deactivations() - Direct access to the fused kernels
  - Fine-grained control over activation/deactivation parameters
  - Quaternion normalization included in the activation kernel
- to_rgb() / to_sh() - Convert sh0 between SH and RGB color formats
  - Available for both GSData (CPU) and GSTensor (GPU)
  - In-place operations by default (inplace=True) for efficiency
  - Format state tracked automatically via the _format dictionary (scales, opacities, sh0, sh_order)
  - Conversion methods update _format to reflect the current data state
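The sketch below shows what these conversions compute under the standard 3D Gaussian Splatting conventions (exp activation for log-scales, sigmoid for logit opacities, and rgb = 0.5 + C0 * sh0 for the DC term), and how a format dictionary can record the current state so repeated calls are no-ops. The function names and the "ply"/"linear"/"sh"/"rgb" labels are hypothetical; gsply's actual _format values and fused Numba kernels may differ, and this plain NumPy version makes no attempt to match their speed.

```python
import numpy as np

SH_C0 = 0.28209479177387814  # 1 / (2 * sqrt(pi)), the degree-0 SH basis constant


def to_linear(data, fmt):
    """Hypothetical: activate log-scales (exp) and logit opacities (sigmoid) in place."""
    if fmt["scales"] == "ply":
        np.exp(data["scales"], out=data["scales"])
        fmt["scales"] = "linear"
    if fmt["opacities"] == "ply":
        op = data["opacities"]
        np.negative(op, out=op)    # -x
        np.exp(op, out=op)         # exp(-x)
        op += 1.0                  # 1 + exp(-x)
        np.reciprocal(op, out=op)  # sigmoid(x)
        fmt["opacities"] = "linear"


def to_ply(data, fmt):
    """Hypothetical inverse: log for scales, logit for opacities, in place."""
    if fmt["scales"] == "linear":
        np.log(data["scales"], out=data["scales"])
        fmt["scales"] = "ply"
    if fmt["opacities"] == "linear":
        op = data["opacities"]
        np.log(op / (1.0 - op), out=op)  # logit(p) = log(p / (1 - p))
        fmt["opacities"] = "ply"


def sh0_to_rgb(data, fmt):
    if fmt["sh0"] == "sh":
        data["sh0"] *= SH_C0
        data["sh0"] += 0.5
        fmt["sh0"] = "rgb"


def rgb_to_sh0(data, fmt):
    if fmt["sh0"] == "rgb":
        data["sh0"] -= 0.5
        data["sh0"] /= SH_C0
        fmt["sh0"] = "sh"


data = {
    "scales": np.log(np.full((2, 3), 0.1)),
    "opacities": np.zeros(2),
    "sh0": np.zeros((2, 3)),
}
fmt = {"scales": "ply", "opacities": "ply", "sh0": "sh"}
to_linear(data, fmt)  # scales become 0.1, opacities become 0.5
to_linear(data, fmt)  # no-op: fmt already records the linear state
```

Tracking state in a dictionary is what makes in-place conversion safe: without it, a second call would apply exp or sigmoid twice.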
Advanced Mask Management
GSData and GSTensor support multiple boolean mask layers with named entries.
Use add_mask_layer(), combine_masks(), and apply_masks() for filtering.
Masks persist through slicing, concatenation, and CPU↔GPU transfers.
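The idea behind named mask layers can be sketched in plain NumPy: each layer is a boolean array of length N, layers are combined with logical AND or OR, and the combined mask indexes every per-Gaussian array. The MaskLayers class and its add/combine/apply methods are hypothetical stand-ins for add_mask_layer(), combine_masks(), and apply_masks(), whose real signatures may differ.

```python
import numpy as np


class MaskLayers:
    """Hypothetical named boolean masks over N Gaussians (not gsply's implementation)."""

    def __init__(self, n):
        self.n = n
        self.layers = {}

    def add(self, name, mask):
        mask = np.asarray(mask, dtype=bool)
        if mask.shape != (self.n,):
            raise ValueError(f"mask {name!r} must have shape ({self.n},)")
        self.layers[name] = mask

    def combine(self, names=None, mode="and"):
        names = list(self.layers) if names is None else list(names)
        if not names:
            return np.ones(self.n, dtype=bool)  # no layers selected: keep everything
        stack = np.stack([self.layers[k] for k in names])
        if mode == "and":
            return stack.all(axis=0)
        if mode == "or":
            return stack.any(axis=0)
        raise ValueError("mode must be 'and' or 'or'")

    def apply(self, arrays, names=None, mode="and"):
        """Filter every per-Gaussian array; filter the layers too so they stay aligned."""
        keep = self.combine(names, mode)
        filtered = {k: v[keep] for k, v in arrays.items()}
        remaining = MaskLayers(int(keep.sum()))
        remaining.layers = {k: m[keep] for k, m in self.layers.items()}
        return filtered, remaining


opacities = np.array([0.9, 0.1, 0.8, 0.95])
means = np.array([[0.0, 0, 0], [1, 1, 1], [5, 5, 5], [0.5, 0, 0]])
masks = MaskLayers(len(opacities))
masks.add("opaque", opacities > 0.5)
masks.add("near", np.linalg.norm(means, axis=1) < 2.0)
kept, masks_after = masks.apply({"opacities": opacities, "means": means})
```

Filtering the layers alongside the data mirrors the documented behavior that masks persist through slicing.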
Memory Layout Optimization
GSData.make_contiguous() copies the strided _base views into independent contiguous arrays. The one-time copy pays for itself after roughly 8 subsequent operations (the break-even point); beyond that, reductions (sum, max, etc.) and point-wise transforms run substantially faster, up to 45x for certain operations.
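The sketch below shows, in plain NumPy, why recompaction helps: column slices of a row-major (N, 14) buffer are strided views, and np.ascontiguousarray gives each attribute its own densely packed copy. The 14-column layout and the compact_views helper are illustrative assumptions, not gsply's internal layout or the implementation of make_contiguous().

```python
import numpy as np


def compact_views(views):
    """Hypothetical helper: copy each strided view into its own contiguous array."""
    return {name: np.ascontiguousarray(v) for name, v in views.items()}


rng = np.random.default_rng(0)
# Illustrative (N, 14) row-major buffer: 3 means, 3 scales, 4 quats, 1 opacity, 3 sh0.
base = rng.random((100_000, 14), dtype=np.float32)
views = {
    "means": base[:, 0:3],
    "scales": base[:, 3:6],
    "quats": base[:, 6:10],
    "opacities": base[:, 10],
    "sh0": base[:, 11:14],
}
print(views["means"].flags["C_CONTIGUOUS"])    # False: rows are 56 bytes apart
compact = compact_views(views)                 # one-time copy of every attribute
print(compact["means"].flags["C_CONTIGUOUS"])  # True
```

The copy costs one pass over the data, which is why it only pays off once enough later operations benefit from the dense layout.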
Optimal GPU Transfer
GSTensor.from_gsdata() uses the _base tensor optimization for 11x faster GPU transfers:

- With _base (from plyread()): a single tensor transfer with zero CPU copy overhead (1.99 ms for 400K Gaussians)
- Without _base: falls back to stacking arrays on the CPU and then transferring (22.78 ms for 400K Gaussians)
- Mask layers persist through transfers, and GPU operations leverage PyTorch's parallelism for 100-1000x speedups over CPU
- Format state (the _format dict) is preserved during GPU transfers for seamless conversion tracking
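To make the cost difference concrete, the sketch below shows the CPU-side stacking that the fallback path implies: per-attribute arrays are concatenated into one (N, D) buffer (a full copy), and recorded column offsets let per-attribute views be recovered after the buffer is moved in a single transfer. A _base array from plyread() is already such a buffer, which is why that path can skip the stacking step. pack_attributes and unpack_attributes are hypothetical helpers; the PyTorch transfer is only indicated in a comment, and gsply's actual offsets and dtypes may differ.

```python
import numpy as np


def pack_attributes(arrays):
    """Concatenate per-attribute arrays into one (N, D) buffer; this is a full copy.

    All inputs are assumed to share one dtype; mixed dtypes would be upcast.
    """
    columns, offsets, start = [], {}, 0
    for name, arr in arrays.items():
        flat = arr.reshape(arr.shape[0], -1)  # (N,) -> (N, 1), (N, K, 3) -> (N, 3K)
        offsets[name] = (start, start + flat.shape[1], arr.shape[1:])
        columns.append(flat)
        start += flat.shape[1]
    return np.concatenate(columns, axis=1), offsets


def unpack_attributes(packed, offsets):
    """Recover per-attribute arrays as views over the packed buffer."""
    n = packed.shape[0]
    # reshape returns a view here because each column slice is contiguous within a row
    return {
        name: packed[:, lo:hi].reshape((n,) + shape)
        for name, (lo, hi, shape) in offsets.items()
    }


n = 4
arrays = {
    "means": np.zeros((n, 3), np.float32),
    "opacities": np.ones(n, np.float32),
    "shN": np.full((n, 2, 3), 0.5, np.float32),
}
packed, offsets = pack_attributes(arrays)
# With PyTorch, one host-to-device copy would follow, e.g.
# torch.from_numpy(packed).to("cuda"); it is not executed in this sketch.
restored = unpack_attributes(packed, offsets)
```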
Data Layout
Each Gaussian's attributes are stored in the following NumPy arrays, one row per Gaussian:

| Attribute | Shape | Description |
|---|---|---|
| | | XYZ world-space coordinates |
| | | Log-scale parameters |
| | | WXYZ quaternion rotations |
| | | Logit opacity values |
| | | DC spherical harmonics (RGB) |
| | | Higher-order SH bands (optional) |
| | | Boolean mask layers (optional) |
Zero-copy optimization: When reading from PLY files, these properties are arranged as
column slices of a single _base array. Each view shares storage with the base array,
and Python’s reference counting keeps the base alive automatically.
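A minimal NumPy sketch of this zero-copy arrangement follows. The 14-column ordering (3 position, 3 scale, 4 quaternion, 1 opacity, 3 DC) is an assumption chosen for illustration, not the on-disk PLY property order or gsply's internal layout, and higher-order SH and masks are omitted. It shows that writes through a view land in the shared buffer, and that the views keep the buffer alive after the last direct reference to it is dropped.

```python
import numpy as np


def make_views(base):
    """Slice an (N, 14) buffer into per-attribute views (illustrative column layout)."""
    return {
        "means": base[:, 0:3],     # XYZ positions
        "scales": base[:, 3:6],    # log-scales
        "quats": base[:, 6:10],    # WXYZ quaternions
        "opacities": base[:, 10],  # logit opacities
        "sh0": base[:, 11:14],     # DC SH coefficients
    }


def load_views(n):
    """Stand-in for a reader: the buffer's only direct reference dies on return."""
    return make_views(np.ones((n, 14), dtype=np.float32))


buf = np.zeros((2, 14), dtype=np.float32)
views = make_views(buf)
views["opacities"][:] = 1.0  # writes through the view land in the shared buffer
print(buf[:, 10])            # [1. 1.]

survivors = load_views(3)    # each view references the buffer through its .base
print(survivors["quats"].sum())  # 12.0
```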
API Selection Guide
Choose the right API for your use case:

| Scenario | Recommended API |
|---|---|
| Load a PLY file (any format) | |
| Create from external arrays/dicts | |
| Write back to disk (auto-optimized) | |
| Load SOG format files | |
| Convert linear ↔ PLY format | |
| Convert SH ↔ RGB colors | |
| Stream compressed bytes over network | |
| Batch merge hundreds of shards | |
| GPU training / rendering loops | |
| Create GPU tensors from external data | |
| GPU compression | |
| Filter data with multiple conditions | |
| Optimize for many array operations | |
Next Steps
- New users: Start with the usage guide for installation and basic examples.
- API reference: Browse the API reference (api/index) for complete function and class documentation.
- Performance tuning: See the performance notes in individual function docstrings.