Usage Guide¶
This guide covers the most common workflows: installation, reading/writing Gaussian splats, format conversion, color conversion, and moving between CPU and GPU containers.
Installation¶
Basic installation:
pip install gsply
Optional features:
GPU acceleration (PyTorch):
pip install torch
Enables GSTensor, plyread_gpu(), plywrite_gpu(), and GPU-accelerated format conversions.
SOG format support:
pip install "gsply[sogs]"
Enables sogread() for reading SOG format files.
Full installation:
pip install "gsply[sogs]" torch # GPU + SOG support
Development:
# Editable install with development extras
pip install -e ".[dev]"
# Install documentation extras for Sphinx
pip install -e ".[docs]"
Requirements: Python 3.10+ with NumPy and Numba (auto-installed).
Reading Gaussian Splats¶
Object-Oriented API (Recommended):
from gsply import GSData
# Auto-detects format (compressed or uncompressed)
data = GSData.load("scene.ply")
print(f"Loaded {len(data):,} Gaussians")
print(f"SH degree: {data.get_sh_degree()}")
print(f"Contiguous: {data.is_contiguous()}")
Functional API:
from gsply import plyread
# Auto-detects format (compressed or uncompressed)
data = plyread("scene.ply")
The returned GSData object exposes vector-friendly NumPy arrays. All reads use zero-copy
views into a shared _base buffer, so slicing and masking operations don’t duplicate memory.
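The view behaviour described above can be illustrated with plain NumPy. This is a minimal sketch, not gsply's implementation: the 14-column row layout and the column positions are assumptions made for the example, and the real _base layout may differ.

```python
import numpy as np

# Hypothetical layout: one row per Gaussian, 14 float32 columns.
# The actual _base layout used by gsply may differ.
n = 5
base = np.arange(n * 14, dtype=np.float32).reshape(n, 14)

means = base[:, 0:3]     # column slice: a view, no copy
quats = base[:, 10:14]   # another view into the same buffer
first_two = means[:2]    # slicing a view yields another view

# Writing through a view is visible in the shared buffer.
means[0, 0] = -1.0

shares = np.shares_memory(means, base) and np.shares_memory(first_two, base)
```

np.shares_memory() is a quick way to check whether an array you hold is a view or an independent copy.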
SOG Format Support:
from gsply import sogread
# Read SOG format (requires gsply[sogs])
data = sogread("model.sog") # Returns GSData (same API as plyread)
# In-memory reading from bytes
with open("model.sog", "rb") as f:
    sog_bytes = f.read()
data = sogread(sog_bytes) # No disk I/O
Mask Management¶
Create named mask layers and combine them with boolean logic:
# Add named mask layers
data.add_mask_layer("high_opacity", data.opacities > 0.25)
data.add_mask_layer("foreground", data.means[:, 2] < 0.0)
# Combine with AND logic (both conditions must pass)
filtered = data.apply_masks(mode="and")
# Or combine with OR logic (either condition passes)
visible = data.apply_masks(mode="or", layers=["high_opacity", "foreground"])
# Combine specific layers programmatically
combined_mask = data.combine_masks(mode="and", layers=["high_opacity", "foreground"])
Mask layers persist through slicing, concatenation, and CPU↔GPU transfers.
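The AND/OR semantics above can be sketched in plain NumPy. The helper combine_layers() is hypothetical and only illustrates the logic; it is not part of the gsply API.

```python
import numpy as np

def combine_layers(layers, mode="and"):
    """Reduce named boolean masks to one mask (illustrative helper, not gsply API)."""
    stacked = np.stack(list(layers.values()))  # shape (num_layers, N)
    if mode == "and":
        return stacked.all(axis=0)   # every layer must pass
    if mode == "or":
        return stacked.any(axis=0)   # at least one layer must pass
    raise ValueError(f"unknown mode: {mode!r}")

opacities = np.array([0.1, 0.5, 0.9, 0.3])
z = np.array([-1.0, 2.0, -0.5, -2.0])

layers = {
    "high_opacity": opacities > 0.25,  # [False, True, True, True]
    "foreground": z < 0.0,             # [True, False, True, True]
}
and_mask = combine_layers(layers, "and")  # [False, False, True, True]
or_mask = combine_layers(layers, "or")    # [True, True, True, True]
```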
Creating Data from External Sources¶
From Arrays (Recommended):
from gsply import GSData, GSTensor
import numpy as np
# Create from NumPy arrays with auto-format detection
data = GSData.from_arrays(
    means=means,          # (N, 3) xyz positions
    scales=scales,        # (N, 3) scales (auto-detects log vs linear)
    quats=quats,          # (N, 4) quaternions (wxyz order)
    opacities=opacities,  # (N,) opacities (auto-detects logit vs linear)
    sh0=sh0,              # (N, 3) DC spherical harmonics
    shN=shN,              # (N, K, 3) higher-order SH (optional, auto-detects degree)
    format="auto"         # "auto", "ply", or "linear"
)
# Explicit format specification (faster, skips detection)
data = GSData.from_arrays(means, scales, quats, opacities, sh0, format="ply")
# From dictionary
data = GSData.from_dict({
    "means": means,
    "scales": scales,
    "quats": quats,
    "opacities": opacities,
    "sh0": sh0,
    "shN": shN # optional
}, format="auto")
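With format="auto", something has to decide whether the arrays are in log/logit space or linear space. The exact rule gsply applies is not described in this guide; the sketch below is a hypothetical range-based heuristic (guess_format() is not a gsply function) that shows why such detection is possible and where it can be ambiguous.

```python
import numpy as np

def guess_format(scales, opacities):
    """Return "ply" or "linear" using simple range checks (hypothetical heuristic)."""
    scales = np.asarray(scales)
    opacities = np.asarray(opacities)
    # Linear scales must be positive and linear opacities must lie in [0, 1];
    # any value outside those ranges can only come from log/logit space.
    if (scales <= 0).any() or (opacities < 0).any() or (opacities > 1).any():
        return "ply"
    return "linear"

print(guess_format([[-4.6, -3.9, -5.0]], [2.2]))   # log-scales, logit-opacity
print(guess_format([[0.01, 0.02, 0.01]], [0.9]))   # linear values
```

Data whose log/logit values happen to fall inside the linear ranges would be misclassified by a check like this, which is one reason to pass the format explicitly when you know it.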
GPU Tensors:
import torch
# Create GSTensor from PyTorch tensors
gstensor = GSTensor.from_arrays(
    means=means_tensor,          # torch.Tensor (N, 3)
    scales=scales_tensor,        # torch.Tensor (N, 3)
    quats=quats_tensor,          # torch.Tensor (N, 4)
    opacities=opacities_tensor,  # torch.Tensor (N,)
    sh0=sh0_tensor,              # torch.Tensor (N, 3)
    shN=shN_tensor,              # torch.Tensor (N, K, 3) optional
    format="auto",
    device="cuda",               # Auto-converts to target device
    dtype=torch.float32          # Auto-converts to target dtype
)
# From dictionary
gstensor = GSTensor.from_dict({
    "means": means_tensor,
    "scales": scales_tensor,
    "quats": quats_tensor,
    "opacities": opacities_tensor,
    "sh0": sh0_tensor,
    "shN": shN_tensor
}, format="ply", device="cuda")
Format Presets:
"auto" (default): Automatically detects PLY format (log-scales/logit-opacities) vs linear format
"ply": Explicitly sets PLY format (log-scales/logit-opacities); use when data matches the PLY file spec
"linear" or "rasterizer": Explicitly sets linear format (linear scales/opacities); use for renderer compatibility
SH Degree Inference:
Automatically infers SH degree from shN.shape[1] if not specified
Valid degrees: 0 (no shN), 1 (9 bands), 2 (24 bands), 3 (45 bands)
Raises ValueError if shape doesn’t match a valid degree
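The band counts listed above (9, 24, 45) are consistent with 3 × ((d + 1)² − 1): the number of SH coefficients per channel for degree d, minus the DC term, times three color channels. The helper below is a hypothetical sketch of that mapping, not gsply's own inference code; it works on the counts as listed and makes no claim about how gsply lays them out in shN.

```python
def sh_degree_from_bands(num_bands):
    """Map a higher-order band count (9, 24, 45) to an SH degree; 0 means no shN."""
    if num_bands == 0:
        return 0
    for degree in (1, 2, 3):
        # (degree + 1)**2 coefficients per channel, minus the DC term, times 3 channels
        if num_bands == 3 * ((degree + 1) ** 2 - 1):
            return degree
    raise ValueError(f"{num_bands} does not match SH degree 0-3")

print([sh_degree_from_bands(k) for k in (0, 9, 24, 45)])
```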
Writing Data¶
Object-Oriented API (Recommended):
from gsply import GSData, GSTensor
# Save uncompressed (auto-optimized)
data.save("output.ply")
# Save compressed format
data.save("output.ply", compressed=True)
# GPU acceleration
gstensor = GSTensor.load("model.ply", device='cuda')
gstensor.save("output.compressed.ply") # GPU compression (default)
Functional API:
from gsply import plywrite
# Write uncompressed (auto-optimized)
plywrite("output.ply", data)
# Write compressed format
plywrite("output.ply", data, compressed=True)
# Or use file extension to indicate compression
plywrite("output.compressed.ply", data)
Automatic optimizations:
Zero-copy writes: When data._base exists (from plyread()), the buffer is streamed directly
Auto-consolidation: Without _base, arrays are automatically consolidated for 2.4x faster writes
Format detection: Compression is selected when compressed=True or when the file extension is .compressed.ply / .ply_compressed
Format conversion: Automatically converts to PLY format (log-scales, logit-opacities) before writing
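The format-detection rule above can be written as a small predicate. wants_compression() is a hypothetical helper used only to restate the rule; it is not a gsply function, and whether gsply matches extensions case-insensitively is not covered here.

```python
def wants_compression(path, compressed=False):
    """Illustrative helper (not gsply API) restating the documented selection rule."""
    # Compression is chosen by the explicit flag or by either documented extension.
    return compressed or str(path).endswith((".compressed.ply", ".ply_compressed"))

print(wants_compression("output.ply"))                   # flag off, plain extension
print(wants_compression("output.ply", compressed=True))  # flag on
print(wants_compression("output.compressed.ply"))        # extension-based
```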
In-Memory Compression¶
Compress and decompress data without disk I/O:
from gsply import compress_to_bytes, decompress_from_bytes
# Compress to bytes (no disk I/O)
payload = compress_to_bytes(data)
# Decompress from bytes
round_trip = decompress_from_bytes(payload)
assert round_trip.means.shape == data.means.shape
Ideal for network transport, streaming, or custom storage backends. The PlayCanvas compressed format achieves a 71-74% size reduction.
Format Conversion¶
PLY files store scales in log-space and opacities in logit-space. Convert between formats as needed:
import numpy as np
from gsply import GSData
# Load PLY file (contains log-scales and logit-opacities)
data = GSData.load("scene.ply")
# Convert to linear format for computation/visualization
data.denormalize() # Uses fused kernel (~8-15x faster)
# Converts log-scales → linear, logit-opacities → linear, normalizes quaternions
print(f"Linear opacity range: [{data.opacities.min():.3f}, {data.opacities.max():.3f}]")
# Modify in linear space
data.opacities = np.clip(data.opacities * 1.2, 0, 1)
# Convert back to PLY format before saving
data.normalize() # Uses fused kernel (~8-15x faster)
# Converts linear → log-scales, linear → logit-opacities
data.save("modified.ply")
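The math behind this round trip is exp/log for scales and sigmoid/logit for opacities. The following NumPy sketch shows that the two directions are inverses of each other; the function names and the eps clipping value are assumptions for illustration and do not describe gsply's fused kernels.

```python
import numpy as np

def to_linear(log_scales, logit_opacities):
    """PLY space -> linear space (exp and sigmoid)."""
    scales = np.exp(log_scales)
    opacities = 1.0 / (1.0 + np.exp(-logit_opacities))
    return scales, opacities

def to_ply(scales, opacities, eps=1e-6):
    """Linear space -> PLY space (log and logit)."""
    # Clip so that log() and logit() stay finite (eps is an assumed value).
    opacities = np.clip(opacities, eps, 1.0 - eps)
    return np.log(np.maximum(scales, eps)), np.log(opacities / (1.0 - opacities))

log_scales = np.array([[-4.0, -3.5, -5.0]])
logit_opacities = np.array([2.0])

scales, opacities = to_linear(log_scales, logit_opacities)
back_scales, back_opacities = to_ply(scales, opacities)
```

Clipping before the logit matters after edits such as the np.clip(..., 0, 1) step above, because an opacity of exactly 0 or 1 has no finite logit.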
Color Conversion (SH ↔ RGB):
# Convert sh0 from SH format to RGB colors
data.to_rgb() # sh0 now contains RGB colors [0, 1]
data.sh0 *= 1.5 # Make brighter (RGB space)
data.to_sh() # Convert back to SH format for PLY compatibility
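For reference, the conversion commonly used in 3D Gaussian Splatting code for the DC term is rgb = sh0 × C0 + 0.5, where C0 = 1 / (2√π) is the degree-0 SH basis constant. The sketch below assumes that convention; whether to_rgb() and to_sh() use exactly this mapping, or clamp the result, is not confirmed here.

```python
import numpy as np

SH_C0 = 0.28209479177387814  # 1 / (2 * sqrt(pi)), the degree-0 SH basis constant

def sh0_to_rgb(sh0):
    """DC SH coefficient -> RGB under the assumed convention (no clamping)."""
    return sh0 * SH_C0 + 0.5

def rgb_to_sh0(rgb):
    """Inverse of sh0_to_rgb."""
    return (rgb - 0.5) / SH_C0

sh0 = np.array([[0.0, 1.0, -1.0]])
rgb = sh0_to_rgb(sh0)          # a zero coefficient maps to mid-gray (0.5)
restored = rgb_to_sh0(rgb)
```

Under this convention, scaling in RGB space (as in the brightening step above) can push values past 1, so clipping to [0, 1] before converting back may be worthwhile.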
Advanced: Direct Fused Kernels:
from gsply import apply_pre_activations, apply_pre_deactivations
# Direct access to fused activation kernel (~8-15x faster)
# Useful for fine-grained control over activation parameters
apply_pre_activations(
    data,
    min_scale=0.01,      # Custom scale bounds
    max_scale=10.0,
    min_quat_norm=1e-6,  # Custom quaternion norm floor
    inplace=True
)
# Direct access to fused deactivation kernel (~8-15x faster)
apply_pre_deactivations(
    data,
    min_scale=1e-8,      # Custom scale floor
    min_opacity=1e-5,    # Custom opacity bounds
    max_opacity=0.999,
    inplace=True
)
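To show what bounds such as min_scale, max_scale, and min_quat_norm are for, here is an unfused NumPy sketch. It assumes the scale bounds are applied as a clamp after exp() and that the quaternion floor guards the division during normalization; the actual kernel behaviour may differ, and both function names are hypothetical.

```python
import numpy as np

def normalize_quats(quats, min_norm=1e-6):
    """Scale each quaternion to unit length, guarding against near-zero norms."""
    norms = np.linalg.norm(quats, axis=1, keepdims=True)
    return quats / np.maximum(norms, min_norm)

def activate_scales(log_scales, min_scale=0.01, max_scale=10.0):
    """exp() followed by clamping to [min_scale, max_scale]."""
    return np.clip(np.exp(log_scales), min_scale, max_scale)

quats = np.array([[2.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0]])
unit = normalize_quats(quats)   # the all-zero row stays zero instead of becoming NaN
scales = activate_scales(np.array([-10.0, 0.0, 5.0]))
```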
GPU Acceleration with PyTorch¶
Object-Oriented API (Recommended):
from gsply import GSTensor
# Direct GPU loading (auto-detects format)
gstensor = GSTensor.load("model.ply", device="cuda")
# Save with GPU compression
gstensor.save("output.compressed.ply") # GPU compression (default)
# GPU format conversion
gstensor.denormalize() # GPU-accelerated (uses torch.exp, torch.sigmoid)
gstensor.to_rgb() # GPU-accelerated SH → RGB conversion
Functional API:
from gsply import GSTensor, plyread_gpu, plywrite_gpu
# Transfer to GPU (11x faster with zero-copy base tensor)
gstensor = GSTensor.from_gsdata(data, device="cuda", requires_grad=False)
# Direct GPU I/O
gstensor = plyread_gpu("model.compressed.ply", device="cuda")
plywrite_gpu("output.compressed.ply", gstensor)
# GPU-optimized mask operations (100-1000x faster than CPU)
mask = gstensor.combine_masks(mode="and")
subset = gstensor[mask]
# Enable gradients for training
gstensor_train = GSTensor.from_gsdata(data, device="cuda", requires_grad=True)
GSTensor mirrors GSData ergonomics: .add(), .concatenate(), mask helpers,
and apply_masks(). When _base is present, transfers use a single operation for efficiency.
Performance Tips¶
Contiguity Optimization¶
For workloads with many array operations, convert to contiguous layout:
# Check if arrays are contiguous
if not data.is_contiguous():
    # Convert (one-time cost, but 2-45x faster per operation)
    data.make_contiguous(inplace=True)
# Now array operations are much faster
result = data.means.sum() + data.means.max() # Up to 45x faster!
Break-even: Convert if you will perform ≥8 operations on the data.
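The underlying effect can be seen with plain NumPy: a column view into a wide shared buffer is strided, while a packed copy is contiguous. The 14-column buffer is an assumed layout for illustration, not necessarily the one gsply uses.

```python
import numpy as np

# Hypothetical shared buffer: 14 float32 columns per Gaussian.
base = np.zeros((1000, 14), dtype=np.float32)

means_view = base[:, 0:3]                        # strided view into the shared buffer
means_packed = np.ascontiguousarray(means_view)  # one-time copy into packed memory

print(means_view.flags["C_CONTIGUOUS"])          # False
print(means_packed.flags["C_CONTIGUOUS"])        # True
print(means_view.strides, means_packed.strides)  # (56, 4) (12, 4)
```

The view steps 56 bytes between rows to read 12 useful bytes, which is why repeated reductions over it tend to be slower than over the packed copy.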
Bulk Concatenation¶
For merging multiple datasets, use bulk concatenation:
# Fast: Single allocation (5.74x faster)
combined = GSData.concatenate([data1, data2, data3, ...])
# Slower: Repeated pairwise operations
result = data1
for d in [data2, data3, ...]:
    result = result.add(d) # Creates intermediate allocations
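The cost difference comes from how often rows are copied. The NumPy sketch below uses plain arrays rather than GSData objects and counts the rows written by each approach; it illustrates the allocation pattern only and does not reproduce the benchmark figure above.

```python
import numpy as np

parts = [np.full((4, 3), i, dtype=np.float32) for i in range(5)]

# Bulk: one output allocation of the final size (20 rows written once).
bulk = np.concatenate(parts, axis=0)

# Pairwise: every step allocates a new, larger array and re-copies earlier rows.
pairwise = parts[0]
rows_copied = 0
for p in parts[1:]:
    pairwise = np.concatenate([pairwise, p], axis=0)
    rows_copied += pairwise.shape[0]   # 8 + 12 + 16 + 20

print(bulk.shape[0], rows_copied)
```

Both approaches produce the same result, but the pairwise loop writes 56 rows to build a 20-row array, and the gap widens as the number of parts grows.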
GPU Mask Operations¶
GPU mask operations are 100-1000x faster than CPU:
# CPU: ~1.43ms for 100K Gaussians, 5 layers
mask = data.combine_masks(mode="and")
# GPU: ~0.001ms for 100K Gaussians, 5 layers (1000x faster!)
gstensor = GSTensor.from_gsdata(data, device="cuda")
mask = gstensor.combine_masks(mode="and")