Changelog

All notable changes to gsply are documented in this file.

Release Notes

v0.2.10 (Code Elegance & API Cleanup)

API Changes (Breaking)

  • Removed Deprecated Aliases: Cleaned up API surface by removing deprecated function and method aliases

    • Removed create_linear_format() function - Use create_rasterizer_format() instead

    • Removed GSData.to_ply_format() method - Use normalize() instead

    • Removed GSData.from_ply_format() method - Use denormalize() instead

    • Removed GSData.to_linear() method - Use denormalize() instead

    • Removed GSTensor.to_ply_format() method - Use normalize() instead

    • Removed GSTensor.from_ply_format() method - Use denormalize() instead

    • Removed GSTensor.to_linear() method - Use denormalize() instead

    • These aliases were introduced in v0.2.5 for backward compatibility and are no longer needed

Code Quality Improvements

  • Internal Refactoring: Improved code maintainability and organization

    • Broke down monolithic _validate_and_normalize_inputs() into 4 focused helper functions:

      • _ensure_numpy_arrays() - Type conversion

      • _convert_to_float32() - dtype normalization with fast path

      • _validate_array_shapes() - Shape validation

      • _flatten_shn() - Array reshaping

    • Extracted duplicated chunk boundary calculation logic into reusable helpers:

      • _compute_chunk_boundaries() for NumPy (CPU)

      • _compute_chunk_boundaries_gpu() for PyTorch (GPU)

    • Single responsibility principle applied throughout

  • Documentation: Converted internal helper function docstrings to Sphinx/reST format

    • Explicit :param:, :type:, :return:, :rtype: annotations

    • Better IDE integration and API documentation generation

Migration Guide

# Old (v0.2.9 and earlier)
format_dict = create_linear_format(sh_degree=2)  # DEPRECATED
data_normalized = data.to_ply_format()  # DEPRECATED
data_linear = data.from_ply_format()  # DEPRECATED
data_linear = data.to_linear()  # DEPRECATED

# New (v0.2.10+)
format_dict = create_rasterizer_format(sh_degree=2)  # Use canonical name
data_normalized = data.normalize()  # Use canonical method
data_linear = data.denormalize()  # Use canonical method
data_linear = data.denormalize()  # Use canonical method

Testing

  • All 406 tests passing (down from 407 after removing deprecated alias test)

  • Zero breaking changes to canonical API

  • No performance regressions


v0.2.9 (Protocol Interfaces & Performance Optimization)

New Features

  • Protocol Interfaces: Type-safe interfaces for format management and data operations

    • FormatAware - Protocol for objects that track format state (scales, opacities, sh0, sh_order)

    • Normalizable - Protocol for objects that support format conversion (normalize/denormalize)

    • GaussianContainer - Protocol for objects containing Gaussian splat data

    • Enables type checking across GSData and GSTensor with structural typing

    • Improves IDE autocomplete and static analysis

  • Format Management API: Advanced methods for format state control

    • format_state property - Returns current format as immutable FormatState TypedDict

    • copy_format_from(other) - Copy format state from another FormatAware object

    • with_format(**kwargs) - Create shallow copy with modified format (functional style)

    • Simplifies format handling in complex data pipelines

  • In-Place Format Conversion: All format conversion methods now modify data in-place by default

    • normalize(inplace=True) - Default behavior optimized for performance

    • denormalize(inplace=True) - Default behavior optimized for performance

    • to_rgb(inplace=True) - Default behavior optimized for performance

    • to_sh(inplace=True) - Default behavior optimized for performance

    • Set inplace=False to create a copy (previous default behavior)

Performance Improvements

  • Removed Auto-Consolidate Overhead: Eliminated automatic _base array construction in plywrite()

    • Previous behavior: plywrite() automatically called consolidate() for 2.6-2.8x speedup

    • New behavior: Users can manually call data.make_contiguous() or data.consolidate() when needed

    • Reason: Auto-consolidate added unnecessary overhead for already-contiguous data

    • Maintains backward compatibility - zero-copy path still works automatically

  • In-Place Operations Default: Format conversions now modify data in-place by default

    • Reduces memory allocations and copies

    • Better performance for typical use cases where copy is not needed

    • Previous behavior available via inplace=False parameter

API Changes

  • Format Conversion Default Changed: inplace=True is now the default for all conversion methods

    • Previous default: inplace=False (created copies)

    • New default: inplace=True (modifies in-place)

    • Migration: Add inplace=False if you need a copy instead of in-place modification

  • Example Migration:

    # Old code (v0.2.8 and earlier)
    data_normalized = data.normalize()  # Created a copy
    
    # New code (v0.2.9+)
    data_normalized = data.normalize(inplace=False)  # Explicitly request copy
    # or
    data.normalize()  # Modifies in-place (new default)
    

Implementation Details

  • Protocol interfaces use structural typing (no inheritance required)

  • FormatState uses TypedDict with Required/NotRequired for partial updates

  • All existing GSData and GSTensor methods support new protocols

  • Format copying preserves format state across operations

  • with_format() enables functional-style format updates without mutation

Testing

  • Added comprehensive test coverage for protocol interfaces (11 new tests)

    • tests/test_protocols.py - Tests for FormatAware, Normalizable, GaussianContainer protocols

    • Tests cover: protocol compliance, format_state property, copy_format_from(), with_format()

    • Integration tests verify GSData and GSTensor implement all protocols correctly

  • Total test count: 406 tests (365 + 41 new)


v0.2.8 (Format Query Properties)

New Features

  • Format Query Properties: Convenient boolean properties to check current data format

    • Scale format: is_scales_ply, is_scales_linear

    • Opacity format: is_opacities_ply, is_opacities_linear

    • Color format: is_sh0_sh, is_sh0_rgb

    • SH degree: is_sh_order_0, is_sh_order_1, is_sh_order_2, is_sh_order_3

    • Available on both GSData and GSTensor classes

    • Properties update automatically during format conversions (normalize(), denormalize(), to_rgb(), to_sh())

Usage Example

data = gsply.plyread("scene.ply")

# Check format before operations
if data.is_scales_ply:
    data.denormalize()  # Convert to linear

if data.is_sh0_sh:
    data.to_rgb()  # Convert to RGB colors

# Check SH degree
if data.is_sh_order_3:
    print("High-quality SH3 data")

Implementation Details

  • Properties use safe .get() access on _format dict

  • All conversion methods properly update format tracking

  • Format is preserved through copy, slice, concatenate, and device transfer operations

  • Format validation in add() and concatenate() raises clear errors for mismatches


v0.2.7 (Fused Activation Kernels & Performance Optimization)

New Features

  • Fused Activation Kernels: Ultra-fast format conversion with parallel Numba kernels

    • apply_pre_activations(data, min_scale=1e-4, max_scale=100.0, min_quat_norm=1e-8, inplace=True) - Fused kernel for activating scales, opacities, and quaternions

      • Converts log-scales → linear scales (exp + clamp) in single pass

      • Converts logit-opacities → linear opacities (sigmoid) in single pass

      • Normalizes quaternions with safety floor

      • Performance: ~8-15x faster than individual operations

    • apply_pre_deactivations(data, min_scale=1e-9, min_opacity=1e-4, max_opacity=0.9999, inplace=True) - Fused kernel for deactivating scales and opacities

      • Converts linear scales → log-scales (log + clamp) in single pass

      • Converts linear opacities → logit-opacities (logit + clamp) in single pass

      • Performance: ~8-15x faster than individual operations

    • Both functions use parallel Numba JIT compilation for optimal performance

    • Single-pass processing reduces memory overhead and improves cache locality

Performance Improvements

  • Format Conversion Optimization: normalize() and denormalize() now use fused kernels internally

    • GSData.normalize() uses apply_pre_deactivations() for ~8-15x speedup

    • GSData.denormalize() uses apply_pre_activations() for ~8-15x speedup

    • Quaternion normalization included in activation kernel (denormalize only)

    • Scales and opacities processed together in single parallel pass

  • Memory Efficiency: Fused kernels reduce intermediate allocations

    • Single-pass processing improves cache locality

    • Lower memory overhead compared to sequential operations

Improvements

  • Internal Refactoring: Format conversion methods now use optimized fused kernels

    • normalize() replaced manual np.log() and logit() calls with apply_pre_deactivations()

    • denormalize() replaced manual np.exp() and sigmoid() calls with apply_pre_activations()

    • Maintains backward compatibility - same API, better performance

  • Code Quality: Centralized activation/deactivation logic in reusable functions

    • Consistent behavior across all format conversion operations

    • Easier to maintain and optimize

Testing

  • Added comprehensive test coverage for new activation functions (17 new tests)

    • tests/test_pre_activations.py - Full test suite for apply_pre_activations() and apply_pre_deactivations()

    • Tests cover: basic functionality, in-place vs copy, custom bounds, edge cases, validation errors, roundtrips

    • Integration tests verify normalize() and denormalize() use optimized kernels correctly

  • Total test count: 365 tests (348 + 17 new)

v0.2.6 (Format Safety & Auto-detection)

New Features

  • Convenience Factory Methods: Create GSData/GSTensor from external data with format presets

    • GSData.from_arrays(means, scales, quats, opacities, sh0, shN=None, format='auto') - Create from arrays with format preset

    • GSData.from_dict(data_dict, format='auto') - Create from dictionary with format preset

    • GSTensor.from_arrays(means, scales, quats, opacities, sh0, shN=None, format='auto', device='cuda') - Create from tensors with format preset

    • GSTensor.from_dict(data_dict, format='auto', device='cuda') - Create from dictionary with format preset

    • Format presets: "auto" (detect), "ply" (log/logit), "linear" or "rasterizer" (linear)

    • Auto-detects SH degree from shN shape when not specified

  • Automatic Format Detection: Smart heuristics to detect PLY format vs Linear format

    • Automatically detects if data uses log-scales/logit-opacities (PLY format) or linear values

    • _detect_format_from_values() uses statistical analysis of data ranges

    • Works when creating GSData or GSTensor from raw arrays

    • Ensures correct format handling without manual flag setting

  • Format Helper Functions: Clearer API for creating format dictionaries

    • create_ply_format(sh_degree) - For data matching PLY file spec

    • create_rasterizer_format(sh_degree) - For data matching renderer spec

    • create_linear_format(sh_degree) - Alias for rasterizer format

  • Strict Format Validation:

    • GSData and GSTensor now enforce format consistency during concatenation

    • Prevents accidental merging of mixed formats (e.g. linear + log-space)

    • Raises clear ValueError with helpful instructions

Improvements

  • Enhanced GSData / GSTensor:

    • _format field is now always present (never None), auto-populated if missing

    • __post_init__ automatically detects format from data values if not specified

    • All format conversion methods (normalize, denormalize, to_rgb, to_sh) correctly update format tracking

  • Writer Safety:

    • plywrite() now auto-detects format when passed raw arrays

    • plywrite() ensures data is in PLY format before writing (auto-converts linear -> PLY if needed)

    • Prevents writing linear data as if it were log-space (which would cause invalid scale/opacity values)

Testing

  • Added tests/test_format_management.py covering all new format utilities and safety checks

  • Added comprehensive edge case tests for from_arrays() and from_dict() methods (26 new tests)

    • Empty data, single Gaussian, shape mismatches, format boundary values

    • Missing/extra dictionary keys, device/dtype handling, format preset edge cases

v0.2.5 (SOG Format Support & API Improvements)

New Features

  • SOG Format Reader: Read SOG (Splat Ordering Grid) format files

    • sogread(file_path | bytes) - Read SOG files from path or bytes (requires gsply[sogs])

    • Returns GSData container (same as plyread()) for consistent API

    • Supports .sog ZIP bundles and folder formats

    • In-memory ZIP extraction: Can read directly from bytes without disk I/O

    • Uses imagecodecs (fastest WebP decoder) for optimal performance

    • Compatible with PlayCanvas splat-transform format

  • Object-Oriented I/O API: Convenient save/load methods for GSData and GSTensor

    • data.save(file_path, compressed=False) - Instance method wrapping plywrite() for object-oriented API

    • GSData.load(file_path) - Classmethod wrapping plyread() (auto-detects format)

    • gstensor.save(file_path, compressed=True) - Instance method for saving GSTensor (GPU compression by default)

    • gstensor.save_compressed(file_path) - Convenience alias for compressed saves

    • GSTensor.load(file_path, device='cuda') - Classmethod for loading GSTensor (auto-detects format, uses GPU decompression for compressed files)

    • Provides cleaner object-oriented API while maintaining backward compatibility with module-level functions

  • Format Conversion API: Elegant in-place operations for PLY format conversion

    • GSData.normalize(inplace=True) - Convert linear scales/opacities to PLY-compatible log/logit format

    • GSData.denormalize(inplace=True) - Convert PLY format back to linear scales/opacities

    • GSTensor.normalize(inplace=True) - GPU version of normalize

    • GSTensor.denormalize(inplace=True) - GPU version of denormalize

    • Supports both in-place modification and copy creation

    • Uses optimized Numba-accelerated functions (CPU) and PyTorch CUDA kernels (GPU)

  • Color Conversion API: In-place SH ↔ RGB conversion methods

    • data.to_rgb(inplace=True) - Convert sh0 from SH format to RGB colors (Numba-optimized CPU)

    • data.to_sh(inplace=True) - Convert sh0 from RGB format to SH coefficients (Numba-optimized CPU)

    • gstensor.to_rgb(inplace=True) - GPU version of to_rgb

    • gstensor.to_sh(inplace=True) - GPU version of to_sh

    • True in-place operations (modifies arrays/tensors directly without intermediate copies)

API Improvements

  • Object-Oriented I/O: Added save/load methods to GSData and GSTensor for cleaner API

    • Module-level functions (plyread, plywrite) remain available for functional style

    • Lazy imports prevent circular dependencies with writer.py and reader.py

    • GSTensor.save() uses GPU compression by default for optimal performance

  • Refactored conversion methods: to_ply_data() and from_ply_data() now use normalize()/denormalize() internally

    • More consistent API design

    • Better support for in-place operations

    • Clearer method names (normalize/denormalize vs to_ply_data/from_ply_data)

  • Format tracking: Internal _format dictionary tracks data format state

    • Uses TypedDict for type safety and IDE autocomplete

    • Tracks scales (PLY/linear), opacities (PLY/linear), sh0 (SH/RGB), and SH order

    • Automatically set during I/O operations and format conversions

  • Simplified dependencies: SOG support now requires only imagecodecs (removed fallback libraries)

  • API consistency: SOG reader returns GSData container matching plyread() behavior

Performance Improvements

  • SOG reading: In-memory reading from bytes is ~6x faster than file path reading

  • CPU utilities: Enhanced logit() and sigmoid() functions with Numba parallel JIT compilation

Code Cleanup

  • Removed redundant code: Eliminated torch/utils.py wrapper module

    • GPU operations now use PyTorch functions directly (torch.logit, torch.sigmoid)

    • Reduced code duplication

    • Simpler import structure

  • Optimized CPU utilities: Enhanced logit() and sigmoid() functions

    • Numba parallel JIT compilation for better performance

    • Both functions are now part of the public API

Documentation

  • Added comprehensive documentation for save() and load() methods in README and API reference

  • Added documentation for normalize() and denormalize() methods

  • Added documentation for to_rgb() and to_sh() color conversion methods

  • Added documentation for logit() and sigmoid() utility functions

  • Updated API reference with examples for object-oriented I/O

  • Updated AGENTS.md with implementation details for save/load methods

Dependencies

  • Added optional sogs dependency group: pip install gsply[sogs]

    • Installs imagecodecs>=2024.0.0 for WebP decoding


v0.2.4 (GPU I/O API & Performance Optimizations)

New Features

  • GPU I/O API: Direct GPU compression/decompression functions

    • plyread_gpu(file_path, device='cuda') - Read compressed PLY directly to GPU

      • 4-5x faster than CPU decompression + GPU transfer

      • Direct GPU memory allocation (no intermediate CPU copies)

      • Optimized batch memory transfer (1.71x speedup)

      • ~19ms for 365K Gaussians (19 M/s throughput)

    • plywrite_gpu(file_path, gstensor, compressed=True) - Write GSTensor using GPU compression

      • 4-5x faster compression than CPU Numba

      • GPU reduction for chunk bounds (instant)

      • Minimal CPU-GPU data transfer

      • ~18ms for 365K Gaussians (20 M/s throughput)

    • Lazy import pattern - PyTorch only loaded when functions are accessed

    • Consistent API style matching plyread()/plywrite()

Performance Improvements

  • GPU Compression: Full GPU-accelerated compression pipeline

    • Optimized memory transfers (batch transfer reduces DMA overhead)

    • Pre-computed ranges for quantization

    • Vectorized chunk bounds computation

  • CPU Compression: Pre-compute ranges optimization

    • 1.44x speedup by computing ranges once per chunk instead of per-vertex

    • Eliminates redundant calculations in packing loops

API Changes

  • New top-level functions: gsply.plyread_gpu() and gsply.plywrite_gpu()

    • Available via lazy import (PyTorch not required unless used)

    • Returns GSTensor instead of GSData for GPU operations

    • Only supports compressed format (GPU path optimized for this)

Documentation

  • Added GPU I/O API documentation to Sphinx docs

  • Updated API reference with performance metrics

  • Added examples for GPU workflows


v0.2.2 (Data Concatenation & Performance)

New Features

  • Data Concatenation: Bulk merge operations

    • GSData.concatenate([data1, data2, data3]) - 6.15x faster than loops

    • GSData.add(other) - Optimized pairwise (1.9x faster)

    • GSTensor.add(other) - GPU concatenation (18x faster than CPU)

  • Performance Optimization:

    • make_contiguous() - Fix cache locality (2-45x speedup for operations)

    • is_contiguous() - Check array layout

    • Direct masked GPU transfer (no intermediate CPU copies)

  • Mask Management:

    • Multi-layer boolean masks with named layers

    • GPU-optimized mask operations (100-1000x faster)

    • Automatic mask merging during concatenation


v0.2.0 (Breaking Changes & New Features)

Breaking Changes

  • GSData is now a regular dataclass (was NamedTuple)

  • Removed tuple unpacking compatibility - Use direct attribute access only

    • Before: means, scales, quats, opacities, sh0, shN = data[:6]

    • After: Access via data.means, data.scales, etc.

  • GSData constructor requires keyword arguments

    • Before: GSData(means, scales, quats, opacities, sh0, shN, _base)

    • After: GSData(means=means, scales=scales, ...)

New Features

  • Mutable fields: All GSData fields can now be modified after creation

    • data.means[0, 0] = 999.0 # Now works!

    • data.scales *= 2.0 # In-place operations supported

    • data.means = new_array # Complete array replacement

  • New masks attribute: Boolean mask for filtering Gaussians

    • Initialized to all True when reading files

    • data.masks[100:200] = False # Mark Gaussians for filtering

    • Use for filtering: filtered_means = data.means[data.masks]

    • Not persisted to PLY files (runtime-only)

  • len(data) returns number of Gaussians: More intuitive API

    • print(f"Loaded {len(data)} Gaussians") # Natural usage

    • Returns data.means.shape[0]

  • Efficient slicing with data[slice]: Pythonic data access

    • data[0] - Single Gaussian

    • data[100:200] - Range of Gaussians

    • data[::10] - Every 10th Gaussian

    • data[mask] - Boolean mask selection

    • Optimized with _base array slicing (up to 25x faster for masks)

Performance Improvements

  • Peak read performance: 93M Gaussians/sec (up from 78M)

  • Peak write performance: 57M Gaussians/sec (zero-copy)

  • Real-world average: 75.5M Gaussians/sec on 90 test files

  • Automatic write optimization: All writes are automatically optimized

    • Auto-consolidation: 2.6-2.8x faster writes via automatic _base construction

    • Zero-copy path: Additional 2.8x speedup for data from plyread() (total 7-8x vs baseline)

    • Works transparently - no user code changes required!

    • 400K SH0: 18-22ms (auto-optimized) or 7ms (zero-copy from file)

    • 400K SH3: 96ms (auto-optimized) or 35ms (zero-copy from file)

  • Dataclass implementation faster than NamedTuple

Migration Guide

# Old (v0.1.x)
means, scales, quats, opacities, sh0, shN = data[:6]
data = GSData(means, scales, quats, opacities, sh0, shN, _base=None)

# New (v0.2.0+)
means = data.means
scales = data.scales
# ... or use attributes directly

# Write operations - automatically optimized!
data = gsply.plyread("input.ply")
gsply.plywrite("output.ply", data)  # RECOMMENDED - zero-copy (7-8x faster)!

# Creating new data - automatically optimized via consolidation (2.6-2.8x faster)
data = GSData(means=means, scales=scales, ...)
gsply.plywrite("output.ply", data)  # Automatically consolidated internally

# Or unpack - still automatically optimized
gsply.plywrite("output.ply", *data.unpack())  # Auto-consolidated too!

# Creating new GSData
data = GSData(
    means=means,
    scales=scales,
    quats=quats,
    opacities=opacities,
    sh0=sh0,
    shN=shN,
    masks=None,  # Optional
    _base=None
)

v0.1.1 (Performance & Code Quality)

Performance Improvements

  • Peak Performance: 78M Gaussians/sec read, 29.4M Gaussians/sec write

  • Zero-copy reads: Always enabled for maximum performance

  • Fast-path dtype checks: Skip unnecessary float32 conversions

  • LRU header caching: Cache frequently generated PLY headers

  • Single file handle: Reduce file open/close syscalls

  • Lookup tables: Eliminate SH degree branching

  • Direct array operations: Optimize opacity column assignment

Code Quality Improvements

  • Type Safety: Complete type hints with mypy configuration

  • Documentation: Enhanced docstrings with algorithm details and bit-packing format

  • Error Messages: Improved validation errors with actionable context

  • Code Organization: Extracted magic numbers to named constants, eliminated code duplication

  • Development Tools: Configured ruff linter and mypy for CI/CD integration

  • Testing: Added edge case tests (92 tests passing)

Benchmarks (1M Gaussians, SH0)

  • Uncompressed Read: 12.8ms (78M/sec)

  • Compressed Write: 35.5ms (28.2M/sec)

  • Compression: 71% file size reduction

Testing

  • All 92 tests passing

  • Zero regressions

  • Full backward compatibility

v0.1.0 (Initial - Optimized Release)

Features

  • Ultra-fast Gaussian Splatting PLY I/O library

  • Pure Python + numpy + numba (no C++ compilation required)

  • Numba JIT for parallel processing and fast compressed I/O

  • Support for SH degrees 0-3 (14, 23, 38, 59 properties)

  • Auto-format detection (uncompressed vs compressed)

  • Full compressed format support (PlayCanvas compatible) - read AND write

  • Zero-copy reads for maximum performance

  • GSData namedtuple container for clean API

Performance (Real-World Benchmarks)

Tested on 90 files, 36M Gaussians total, SH degree 0:

Uncompressed I/O:

  • Read (zero-copy): 8.09ms for 400K Gaussians (49M Gaussians/sec)

  • Write: 8.72ms vs 12.18ms (plyfile) - 1.4x faster

Compressed I/O:

  • Read: 14.74ms for 400K Gaussians (27M Gaussians/sec)

  • Write: 63ms for 400K Gaussians (6.3M Gaussians/sec)

  • Compression ratio: 3.44x (1.92 GB → 558 MB)

Comparison vs Other Libraries (50K Gaussians, SH3):

  • Read: 2.89ms vs 18.23ms (plyfile) - 6.3x faster

  • Write: 8.72ms vs 12.18ms (plyfile) - 1.4x faster

Optimizations Implemented

Phase 1: Vectorization

  • Vectorized quaternion extraction (eliminated Python loops)

  • Performance: 21% improvement

Phase 2: Algorithmic Optimization

  • O(n log n) chunk bounds computation (replaced O(n*m) boolean masking)

  • Performance: 73% improvement

Phase 3: Radix Sort + Parallel Processing

  • O(n) radix sort for chunk sorting (vs O(n log n) comparison sort)

  • Parallel JIT processing with Numba for bit packing/unpacking

  • Performance: 10.4x write, 15.3x read vs baseline

Combined Impact: Production-ready performance with 60+ FPS capability

API

import gsply

# Read PLY file (auto-detects format, returns GSData dataclass)
data = gsply.plyread("scene.ply")  # Uses fast zero-copy by default
data = gsply.plyread("scene.ply", fast=False)  # Safe copies

# Access via attributes
positions = data.means
colors = data.sh0

# Or unpack if needed
means, scales, quats, opacities, sh0, shN = data.unpack()

# Write uncompressed PLY
gsply.plywrite("output.ply", data.means, data.scales, data.quats,
               data.opacities, data.sh0, data.shN)

# Write compressed PLY (auto-adjusts extension to .compressed.ply)
gsply.plywrite("output.ply", data.means, data.scales, data.quats,
               data.opacities, data.sh0, data.shN, compressed=True)

# Detect format
is_compressed, sh_degree = gsply.detect_format("scene.ply")

Documentation

  • Comprehensive README with performance benchmarks

  • OPTIMIZATION_SUMMARY.md: Detailed optimization history and analysis

  • GUIDE.md: User guide with examples

  • CONTRIBUTING.md: Contribution guidelines

  • Extensive inline code documentation

Testing

  • 65 passing tests

  • Full coverage of read/write operations

  • Compressed format tests (read + write)

  • Format detection tests

  • Round-trip verification

  • Edge case handling

Distribution

  • Universal wheel (py3-none-any)

  • Works on Linux, macOS, Windows

  • No platform-specific compilation required

  • PEP 561 type hints marker included

Known Limitations

  • Requires Python 3.10+

  • ASCII PLY format not supported (binary little-endian only)

  • Compressed format is lossy (chunk-based quantization)

Installation

# Basic installation
pip install gsply

# Development installation
pip install -e .[dev]

# All optional dependencies (dev + benchmark)
pip install -e .[all]

Dependencies

  • Required: numpy>=1.20.0, numba>=0.59.0

  • Dev: pytest, pytest-cov, build, twine

  • Benchmark: open3d, plyfile

Contributors

  • OpsiClear

License

MIT License


Previous Development Phases

Phase 1: Initial Implementation

  • Basic uncompressed PLY read/write

  • Format detection

  • SH degree support (0-3)

Phase 2: Compressed Format Support

  • PlayCanvas compressed PLY reading

  • Vectorized bit unpacking

  • 38.5x speedup over naive Python loops

Phase 3: Compressed Writing

  • Compressed PLY writing implementation

  • Vectorized quaternion extraction

  • O(n log n) chunk bounds computation

  • 78.6% speedup vs initial implementation

Phase 4: Parallel Optimization (Current)

  • O(n) radix sort for chunk sorting

  • Parallel JIT processing with Numba

  • 10.4x write, 15.3x read speedup vs baseline

  • Production-ready performance (60+ FPS capable)

For the most up-to-date information, see the CHANGELOG.md file in the repository root.