Overview

gsply is a high-performance Python library for Gaussian Splatting PLY file I/O. It combines zero-copy memory management with JIT-compiled compression pipelines to deliver high read and write throughput behind a clean, Pythonic API.

Core Design Philosophy

gsply focuses on three complementary workflows:

  • Ultra-fast I/O — Read and write uncompressed or PlayCanvas-compressed PLY files using vectorized NumPy kernels and Numba JIT pipelines. Zero-copy views into a shared _base buffer eliminate unnecessary memory allocations.

  • Optimal GPU Transfer — Data loaded with plyread() keeps its shared _base buffer, so it can be moved to the GPU as a single tensor with no CPU-side stacking copy: 1.99 ms versus 22.78 ms for 400K Gaussians, roughly 11x faster.

  • Flexible Containers — GSData (CPU) and GSTensor (GPU) provide helpers for concatenation, masking, contiguity optimization, and CPU↔GPU transfers. Format state is tracked automatically via the _format dictionary for seamless in-place conversions.

Key Capabilities

Performance First

Benchmarked at 93M Gaussians/sec peak read and 57M Gaussians/sec peak write (zero-copy). Uses fused unpack/pack kernels and pre-computed lookup tables. Zero-copy reads avoid intermediate memory copies, and writes are consolidated automatically with no extra calls from the user.

Format Flexibility

detect_format() auto-detects PLY layout. plywrite() selects compressed output when compressed=True or when the file extension is .compressed.ply. Supports uncompressed PLY, PlayCanvas compressed PLY, and SOG (Splat Ordering Grid) formats.
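The output-selection rule above can be sketched in a few lines. This is only an illustration of the stated rule, not gsply's code: `wants_compressed` is a hypothetical helper name, and matching the suffix case-sensitively is an assumption.

```python
from pathlib import Path


def wants_compressed(file_path, compressed=False):
    """Hypothetical helper mirroring the rule described above."""
    # An explicit flag wins; otherwise fall back to the file-name convention.
    return bool(compressed) or Path(file_path).name.endswith(".compressed.ply")


print(wants_compressed("scene.ply"))                   # False
print(wants_compressed("scene.compressed.ply"))        # True
print(wants_compressed("scene.ply", compressed=True))  # True
```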

Object-Oriented API

Convenient save/load methods and factory methods for cleaner code:

  • data.save(file_path, compressed=False) - Instance method for saving

  • GSData.load(file_path) - Classmethod for loading (auto-detects format)

  • GSData.from_arrays(...) - Create from NumPy arrays with format preset

  • GSData.from_dict(data_dict) - Create from dictionary with format preset

  • gstensor.save(file_path, compressed=True) - GPU compression by default

  • GSTensor.load(file_path, device='cuda') - Direct GPU loading

  • GSTensor.from_arrays(...) - Create from PyTorch tensors with format preset

  • GSTensor.from_dict(data_dict) - Create from dictionary with format preset

Format Conversion with In-Place Tracking

Convert between linear and PLY formats seamlessly with automatic format state tracking:

  • normalize() / denormalize() - Convert scales/opacities between linear and PLY formats

    • Uses fused Numba kernels internally (~8-15x faster than individual operations)

    • Single-pass processing reduces memory overhead

    • Parallel execution for optimal performance

  • apply_pre_activations() / apply_pre_deactivations() - Direct access to fused kernels

    • Fine-grained control over activation/deactivation parameters

    • Quaternion normalization included in activation kernel

  • to_rgb() / to_sh() - Convert sh0 between SH and RGB color formats

  • Available for both GSData (CPU) and GSTensor (GPU)

  • In-place operations by default (inplace=True) for efficiency

  • Format state is tracked automatically via the _format dictionary (scales, opacities, sh0, sh_order)

  • Conversion methods update _format to reflect current data state
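To make the conversions above concrete, here is a minimal NumPy sketch of the underlying math, assuming the conventions common in Gaussian Splatting: PLY files store log scales and logit opacities, and sh0 maps to RGB through the zeroth-order SH constant. The function names are hypothetical and this is not gsply's fused kernel; gsply's exact clamping and parameters may differ.

```python
import numpy as np

SH_C0 = 0.28209479177387814  # zeroth-order SH basis constant, 1 / (2 * sqrt(pi))


def to_linear(scales_log, opacities_logit):
    """PLY-style values (log scale, logit opacity) -> linear values."""
    scales = np.exp(scales_log)
    opacities = 1.0 / (1.0 + np.exp(-opacities_logit))  # sigmoid
    return scales, opacities


def to_ply(scales, opacities, eps=1e-7):
    """Linear values -> PLY-style values (log scale, logit opacity)."""
    scales_log = np.log(np.maximum(scales, eps))  # avoid log(0)
    p = np.clip(opacities, eps, 1.0 - eps)        # keep the logit finite
    return scales_log, np.log(p / (1.0 - p))


def sh0_to_rgb(sh0):
    """DC spherical-harmonic coefficients -> RGB (assumed convention)."""
    return sh0 * SH_C0 + 0.5


def rgb_to_sh0(rgb):
    """RGB -> DC spherical-harmonic coefficients (inverse of the above)."""
    return (rgb - 0.5) / SH_C0


scales_log = np.array([[-2.0, -1.0, 0.0]])
opacities_logit = np.array([0.0, 2.0])

scales, opacities = to_linear(scales_log, opacities_logit)
back_scales, back_opacities = to_ply(scales, opacities)
```

The round trip recovers the original values up to floating-point error, which is why a library can convert in place and only needs a small state record to remember which representation the arrays currently hold.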

Advanced Mask Management

GSData and GSTensor support multiple boolean mask layers with named entries. Use add_mask_layer(), combine_masks(), and apply_masks() for filtering. Masks persist through slicing, concatenation, and CPU↔GPU transfers.
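The idea behind named mask layers can be illustrated with plain NumPy. The sketch below does not call gsply; the layer names and the dictionary are assumptions made for the example, and it only shows how several boolean layers of shape (N,) stack into an (N, L) array and combine with AND or OR.

```python
import numpy as np

means = np.array([[0.0, 0.0, 0.0],
                  [2.0, 0.0, 0.0],
                  [0.5, 0.5, 0.5],
                  [9.0, 9.0, 9.0]])
opacities = np.array([0.9, 0.8, 0.05, 0.7])

# Named mask layers, one boolean per Gaussian (hypothetical layer names).
layers = {
    "near_origin": np.linalg.norm(means, axis=1) < 3.0,
    "visible": opacities > 0.1,
}

stacked = np.stack(list(layers.values()), axis=1)  # shape (N, L)
keep_all = stacked.all(axis=1)  # AND across layers
keep_any = stacked.any(axis=1)  # OR across layers

filtered_means = means[keep_all]  # rows that pass every layer
```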

Memory Layout Optimization

GSData.make_contiguous() recompacts _base views into contiguous arrays. The one-time copy pays off once you run roughly 8 or more operations on the data (the break-even point), after which reductions (sum, max, etc.) and point-wise transforms can run up to 45x faster.
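The trade-off is easier to see on a toy array. The following sketch uses only NumPy and assumes that recompacting amounts to copying each strided column view into its own dense buffer; what make_contiguous() does internally may differ.

```python
import numpy as np

base = np.arange(12, dtype=np.float32).reshape(4, 3)

# A column slice shares memory with base but is strided, not dense.
col_view = base[:, 1:3]

# One copy into a dense buffer; later operations read sequential memory.
compact = np.ascontiguousarray(col_view)

print(col_view.flags["C_CONTIGUOUS"], compact.flags["C_CONTIGUOUS"])
```

The copy costs time and memory up front, which is why it is only worthwhile when several operations follow.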

Optimal GPU Transfer

GSTensor.from_gsdata() uses the _base tensor optimization for 11x faster GPU transfers:

  • With _base (from plyread()): Single tensor transfer, zero CPU copy overhead (1.99ms for 400K Gaussians)

  • Without _base: Falls back to stacking arrays on CPU then transferring (22.78ms for 400K Gaussians)

  • Mask layers persist through transfers, and GPU operations leverage PyTorch’s parallelism for 100-1000x speedups over CPU

  • Format state (_format dict) is preserved during GPU transfers for seamless conversion tracking
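As a quick check on the figures quoted above, the two timings for 400K Gaussians work out to a speedup of about 11.4x. The snippet only redoes the arithmetic from the numbers in this section; it measures nothing itself.

```python
n_gaussians = 400_000
t_base_ms = 1.99       # single-tensor transfer via _base
t_fallback_ms = 22.78  # stack on CPU, then transfer

speedup = t_fallback_ms / t_base_ms
rate_base = n_gaussians / (t_base_ms / 1000.0)          # Gaussians per second
rate_fallback = n_gaussians / (t_fallback_ms / 1000.0)  # Gaussians per second

print(f"speedup: {speedup:.1f}x")
print(f"with _base: {rate_base / 1e6:.0f}M Gaussians/sec")
print(f"fallback:   {rate_fallback / 1e6:.1f}M Gaussians/sec")
```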

Data Layout

Each Gaussian is represented by the following NumPy arrays:

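=========  ==============  ================================
Attribute  Shape           Description
=========  ==============  ================================
means      (N, 3)          XYZ world-space coordinates
scales     (N, 3)          Log-scale parameters
quats      (N, 4)          WXYZ quaternion rotations
opacities  (N,)            Logit opacity values
sh0        (N, 3)          DC spherical harmonics (RGB)
shN        (N, K, 3)       Higher-order SH bands (optional)
masks      (N,) or (N, L)  Boolean mask layers (optional)
=========  ==============  ================================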

Zero-copy optimization: When reading from PLY files, these properties are arranged as column slices of a single _base array. Each view shares storage with the base array, and Python’s reference counting keeps the base alive automatically.
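A small NumPy sketch shows what "column slices of a single base array" means in practice. The column order used here is hypothetical and chosen only for illustration; gsply's actual _base layout may differ.

```python
import numpy as np

N = 5
# Hypothetical column layout; the real _base layout may differ.
base = np.zeros((N, 14), dtype=np.float32)

means = base[:, 0:3]
sh0 = base[:, 3:6]
opacities = base[:, 6]
scales = base[:, 7:10]
quats = base[:, 10:14]

# Writing through a view changes the shared buffer; nothing is copied.
means[0] = [1.0, 2.0, 3.0]

print(means.base is base)               # True: the view references base
print(np.shares_memory(base, opacities))  # True
```

Because every view holds a reference to `base`, the buffer stays alive for as long as any attribute array is in use.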

API Selection Guide

Choose the right API for your use case:

=====================================  ===================================================================================
Scenario                               Recommended API
=====================================  ===================================================================================
Load a PLY file (any format)           GSData.load() — Object-oriented, auto-detects format
Create from external arrays/dicts      GSData.from_arrays() / GSData.from_dict() — Factory methods with format presets
Write back to disk (auto-optimized)    data.save() — Object-oriented, automatic optimization
Load SOG format files                  sogread() — Returns GSData (same API)
Convert linear ↔ PLY format            normalize() / denormalize() — Fused kernels (~8-15x faster)
Convert SH ↔ RGB colors                to_rgb() / to_sh() — In-place color conversion
Stream compressed bytes over network   compress_to_bytes() / decompress_from_bytes()
Batch merge hundreds of shards         GSData.concatenate() — Bulk merge (5.74x faster)
GPU training / rendering loops         GSTensor.load() — Direct GPU loading
Create GPU tensors from external data  GSTensor.from_arrays() / GSTensor.from_dict() — Factory methods with format presets
GPU compression                        gstensor.save() — GPU compression (default)
Filter data with multiple conditions   add_mask_layer() + combine_masks()
Optimize for many array operations     make_contiguous() — Up to 45x speedup
=====================================  ===================================================================================
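For the batch-merge scenario in the guide above, the general reason a bulk merge beats repeated pairwise merging can be shown with NumPy alone. This sketch does not use GSData.concatenate(); it assumes each shard is a plain (n, 3) array and only contrasts the two strategies.

```python
import numpy as np

shards = [np.full((n, 3), i, dtype=np.float32) for i, n in enumerate([2, 3, 4])]

# Pairwise: every step re-copies everything accumulated so far.
merged_pairwise = shards[0]
for shard in shards[1:]:
    merged_pairwise = np.concatenate([merged_pairwise, shard], axis=0)

# Bulk: one allocation, and each shard is copied exactly once.
merged_bulk = np.concatenate(shards, axis=0)
```

Both paths produce the same array, but the pairwise loop copies a growing prefix at every step, so its cost grows much faster with the number of shards.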

Next Steps

  • New users: Start with the :doc:`usage` guide for installation and basic examples

  • API reference: Browse the :doc:`api/index` for complete function and class documentation

  • Performance tuning: See the performance notes in individual function docstrings