The Power of GPU-Driven Scenes - Tellusim Technologies Inc.

December 23, 2024

Tellusim Engine is not a typical 3D engine with fixed workflows. It provides full customization of the application flow and behavior. Unlike traditional engines, there is no Engine::init() function to handle all initializations. Instead, Tellusim offers modular components designed for specific tasks, providing flexibility for embedding or integrating the engine into legacy or new applications.

It may sound challenging, but we provide numerous ready-to-use examples and templates for different use cases. Tellusim Executor is an example application that performs runtime initialization; we use it for prototyping and for creating simple applications.

Below is the default SceneManager instantiation in C# (where main_async and process_async are shared thread pools):

// create scene manager
SceneManager scene_manager = new SceneManager();
if(!scene_manager.create(device, SceneManager.Flags.DefaultFlags, null, main_async)) return;

// process thread
Thread process_thread = new Thread(() => {
    while(scene_manager && !scene_manager.isTerminated()) {
        if(!scene_manager.process(process_async)) Time.sleep(1000);
    }
});
process_thread.Start();

Tellusim Core SDK is included in the Tellusim Engine and provides an API layer for graphics and platform abstraction. This layer enables developers to build their own 3D engines in various languages, including C++, C#, Rust, Swift, Java, Python, and JavaScript, ensuring broad compatibility across supported platforms. Its primary purpose is to make modification and customization easier while maintaining excellent performance and compatibility.

Tellusim Engine SDK includes essential modules and systems, such as scene management, rendering, plugins, and tools. These systems empower developers to fully leverage GPU capabilities for all aspects of an application, ranging from object representation and rendering to application logic.

In this post, we will focus on the Scene system, which integrates multiple subsystems and provides user-friendly access to GPU-driven representations. GPU-driven scene processing means that all CPU resources are available for logic and streaming. The complexity of the scene has minimal impact on performance. Below is a diagram of the Scene system:

SceneManager encapsulates all Scene* subsystems and provides access to internal algorithms such as PrefixScan, RadixSort, BitonicSort, BVH SpatialTree, and CubeFilter. These algorithms can be reused, for example, in GPU physics simulations. Access to internal temporary buffers can also reduce the GPU memory footprint. All modifications and updates from the CPU are batched into a single compute dispatch, so modifying many objects simultaneously does not degrade performance.
SceneBuffer simplifies GPU buffer management. It handles memory management and data uploads within a single heap GPU buffer. Dedicated buffers are provided for geometry (Vertex and Index) and compute operations (Vector and Scalar).
SceneAnimator manages transformations of internal Object nodes (ObjectNode), such as skeletons. It uses an animation blending tree provided by the CPU, which is constructed using basic operations like frame interpolation, blending, multiplication, and inversion:
```
// target frame
ObjectFrame frame;

// fetch animation and place it into the first cache register
ObjectFrame frame_1 = frame.cache(lerp(ObjectFrame(animation_0, t), ObjectFrame(animation_1, t), pow(sin(time * 0.5f), 2.0f)), 0);

// animation amplitude multiplier; we can reuse the same cache register
ObjectFrame frame_2 = frame.cache(lerp(mul(frame_1, frame_1 * frame_0, 0.4f), frame_1, k), 0);

// multiply frame by location transformations
frame.append(lerp(frame_2, ObjectFrame(location) * frame_2, k * 0.8f));

// set frame to the node
node.setFrame(move(frame));
```
All these operations are compiled into GPU-driven bytecode, which the SceneAnimator executes for all characters in the scene. This scales to hundreds of thousands of entities, each with its own tree or animation parameters. The result is a ready-to-use skeleton transformation. Additionally, the GPU can drive animations directly from any compute shader.
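To make the blending step concrete, here is a minimal CPU-side sketch of what a frame `lerp` does conceptually, using the same `pow(sin(time * 0.5f), 2.0f)` weight as the snippet above. It is not the Tellusim implementation: `Pose`, `blend`, `lerp_f`, and `blend_weight` are hypothetical names, and a pose is reduced to one translation per joint (rotations are omitted).

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical minimal pose: one translation per joint, no rotations (illustration only).
struct Vec3 { float x, y, z; };
constexpr std::size_t kNumJoints = 2;
using Pose = std::array<Vec3, kNumJoints>;

// Linear interpolation between two scalars: k = 0 returns a, k = 1 returns b.
inline float lerp_f(float a, float b, float k) { return a + (b - a) * k; }

// Blend two poses joint by joint with the same weight for every joint.
inline Pose blend(const Pose &a, const Pose &b, float k) {
    Pose out{};
    for (std::size_t i = 0; i < kNumJoints; i++) {
        out[i] = { lerp_f(a[i].x, b[i].x, k), lerp_f(a[i].y, b[i].y, k), lerp_f(a[i].z, b[i].z, k) };
    }
    return out;
}

// Same weight expression as the snippet above: sin squared stays within [0, 1].
inline float blend_weight(float time) {
    float s = std::sin(time * 0.5f);
    return s * s;
}
```

Because the squared sine never leaves the [0, 1] range, the blended pose always stays between the two source animations and returns to the first one every 2π seconds.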
SceneSpatial aggregates hierarchical node transformations and BVH (Bounding Volume Hierarchy) updates. Since Tellusim is a SceneGraph-based engine, it supports highly complex transformation hierarchies. Child nodes are automatically transformed when their parent nodes are moved, significantly simplifying the creation of complex or articulated structures and mechanisms. This process is entirely delegated to the GPU, ensuring no performance issues even when transforming or updating millions of objects instantly.
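
The parent-to-child propagation described above can be sketched on the CPU as follows. This is only an illustration of the idea, not the engine's GPU implementation: `Transform`, `Node`, `compose`, and `update_globals` are hypothetical names, the transform is reduced to a uniform scale plus a translation, and parents are assumed to be stored before their children.

```cpp
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// Simplified transform: uniform scale followed by translation (rotation omitted for brevity).
struct Transform {
    float scale = 1.0f;
    Vec3 translation{0.0f, 0.0f, 0.0f};
};

// Compose a parent's global transform with a child's local transform.
inline Transform compose(const Transform &parent, const Transform &local) {
    Transform out;
    out.scale = parent.scale * local.scale;
    out.translation = {
        parent.translation.x + parent.scale * local.translation.x,
        parent.translation.y + parent.scale * local.translation.y,
        parent.translation.z + parent.scale * local.translation.z,
    };
    return out;
}

constexpr std::uint32_t kNoParent = 0xffffffffu;

struct Node {
    std::uint32_t parent = kNoParent;  // index of the parent node, or kNoParent for a root
    Transform local;                   // transform relative to the parent
    Transform global;                  // resulting world-space transform
};

// Recompute all global transforms; assumes every parent precedes its children in the array.
inline void update_globals(std::vector<Node> &nodes) {
    for (Node &node : nodes) {
        node.global = (node.parent == kNoParent)
            ? node.local
            : compose(nodes[node.parent].global, node.local);
    }
}
```

Changing only the root's local transform and calling `update_globals` again moves every descendant, which is the behavior that makes articulated structures easy to build.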

Each Node has global, local, and pivot transformation matrices. Transformations can be provided by CPU logic or by any GPU shader. The CPU API operates with double-precision transformation matrices, while GPU precision depends on the engine’s build configuration.
SceneSpatial generates BVH trees based on the resulting node transformations. Multiple BVH trees are created for different types of nodes within various graphs. GPU-based BVH traversal is highly efficient and is used for rendering, ray tracing, and collision detection. Mesh objects can optionally include per-triangle BVH structures for fast compute-shader ray tracing. Additionally, SceneSpatial is responsible for the efficient creation and updating of hardware ray tracing acceleration structures.
SceneTracer is a compute-based ray tracing system for the entire scene. While it is slower than hardware (HW) ray tracing, it reuses all internal BVH trees and avoids the memory overhead of additional acceleration structures, making it ideal for scenarios where HW ray tracing is not required or not available. SceneTracer handles millions of ray queries per frame, and both the CPU and the GPU can initiate ray queries with equal efficiency. The intersection results can be returned either to a CPU callback (with a few frames of delay) or directly to a GPU shader for immediate access.
SceneCollider performs GPU-based collision detection between user-defined bounding queries and the scene. It functions similarly to SceneTracer but is specifically designed for bounding primitives.
SceneCloner is a crucial subsystem for procedural object generation. It enables any shader to clone nodes, allowing duplication of any scene node with a single shader function call: clone_graph_node(graph_index, node, index, transformation, …). This compute-based procedural object placement tool offers excellent performance with no noticeable generation delays and can be issued even from the fragment shader.
SceneStream is responsible for asynchronous scene and object I/O operations for importing and exporting. Our internal representations of scenes, graphs, and nodes can be saved in XML, JSON, or binary formats (e.g., scenex, scenej, sceneb, graphx, graphj, graphb, nodex, nodej, nodeb). This process is highly efficient and includes all Tellusim-related options and attributes. Simultaneously, we can easily import various formats out of the box, including GLB, USD, FBX, DAE, and others. Support for any custom format can be added through a Mesh interface format extension plugin. SceneStream automatically converts and creates objects, materials, lights, cameras, and animations. All operations can be performed in background threads for efficient, stall-free streaming.
ScenePhysics serves as a bridge for integrating physics simulation plugins into the Tellusim Engine. The SDK includes plugins for Box2D, Bullet, Jolt, and PhysX.
SceneScript serves as a bridge for script compilation. Our default scripting language is C++, but with a plugin extension, any language can be used for scripts, providing full access to the Tellusim Engine. Included plugins support languages such as C#, Java, Rust, Swift, and Python.
SceneRender/Renderer serves as a bridge to the rendering subsystem. A custom renderer can be integrated with the Tellusim scene system if needed.
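
For readers unfamiliar with the PrefixScan algorithm mentioned for SceneManager, the sketch below shows what an exclusive prefix scan computes and one common use for it, stream compaction (for example, packing the indices of surviving items into a dense array). It is a sequential CPU reference for illustration only, not the engine's GPU implementation, and `exclusive_scan` and `compact` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Exclusive prefix sum: out[i] is the sum of in[0] .. in[i - 1].
inline std::vector<std::uint32_t> exclusive_scan(const std::vector<std::uint32_t> &in) {
    std::vector<std::uint32_t> out(in.size());
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < in.size(); i++) {
        out[i] = sum;
        sum += in[i];
    }
    return out;
}

// Stream compaction: return the indices whose flag is 1, preserving their order.
// Flags are assumed to contain only 0 or 1.
inline std::vector<std::uint32_t> compact(const std::vector<std::uint32_t> &flags) {
    std::vector<std::uint32_t> offsets = exclusive_scan(flags);
    std::uint32_t count = flags.empty() ? 0u : offsets.back() + flags.back();
    std::vector<std::uint32_t> result(count);
    for (std::size_t i = 0; i < flags.size(); i++) {
        // the scan value is the output slot of every kept element
        if (flags[i]) result[offsets[i]] = static_cast<std::uint32_t>(i);
    }
    return result;
}
```

On a GPU the scan gives every thread its output slot without any serialization, which is why the same building block is useful for physics and other compute workloads.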

SceneManager includes high-level configuration parameters for all subsystems and simplifies CPU access to reported intersections, collisions, and memory fetches. There are some internal subsystems that don’t require API-level access, such as texture management, which includes GPU texture decompression (JPEG) and compression (BC, ASTC), environment texture convolutions, and more. However, the main purpose of all these subsystems is to provide a foundation for scene-related classes:

A Scene represents visual, logical, and physical properties. Each scene is composed of Graphs, Objects, Cameras, Materials, Lights, and Bodies. Each Graph has its own transformation and represents a spatial part of the scene, featuring unique BVH trees. There is no performance difference between plain and hierarchical Node representations. The GravityMark scene is a plain hierarchy containing thousands of NodeObjects. Each NodeObject refers to an ObjectMesh with 8 LOD levels, which are switched based on distance:

The following snippet creates 200,000 asteroids from a GraphScript:

// create asteroids
uint32_t num_asteroids = 200000;
Array<uint32_t> indices_data(num_asteroids);
for(uint32_t i = 0; i < num_asteroids; i++) {
    Object &object = asteroids[random.geti32(0, asteroids.size() - 1)];
    NodeObject node_object(*this, object);
    indices_data[i] = node_object.getIndex();
}
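
The distance-based LOD switching mentioned for the GravityMark asteroids can be illustrated with a small selection function. The engine's actual switching rule is not described in this post, so the scheme below (one coarser level each time the distance doubles) is purely an assumption, and `select_lod` and `base_distance` are hypothetical names.

```cpp
#include <cstdint>

// Pick a LOD index from the camera distance.
// Assumed scheme: level 0 below base_distance, then one coarser level per doubling of distance.
// base_distance is expected to be positive; the result is clamped to num_lods - 1.
inline std::uint32_t select_lod(float distance, float base_distance, std::uint32_t num_lods) {
    std::uint32_t lod = 0;
    float threshold = base_distance;
    while (lod + 1 < num_lods && distance >= threshold) {
        threshold *= 2.0f;
        lod++;
    }
    return lod;
}
```

With 8 levels and a base distance of 10 units, an asteroid 5 units away uses level 0, one 39 units away uses level 2, and anything beyond 640 units falls back to the coarsest level 7.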

The current set of Nodes includes:

Node classes provide spatial transformations for the corresponding scene entities, such as objects, cameras, or lights. This means that the scene acts as a library of entities, while nodes serve as spatial transformers for the referenced entities. Node classes also offer additional properties and attributes, such as unique transformations and materials for objects (each object has its own skeleton) or light colors and intensity for lights.

Each Node is a highly memory-efficient entity. However, it is still not possible to represent scenes composed of hundreds of millions of nodes due to high memory consumption. NodeInstance addresses this issue by referencing entire scene graph objects. It can also work at many hierarchical levels, with multiple NodeInstance objects nested within NodeInstance hierarchies. For example, if we create an object made up of multiple nodes representing various objects and lights, we can make multiple references to that entire object using a single NodeInstance. This approach dramatically reduces memory consumption and allows for the referencing of external graph objects for collaborative work.
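
A quick back-of-the-envelope calculation shows why nested instancing matters. The helpers below only count the node records that must be stored, under the assumption that a prototype is stored once and each reference costs a single NodeInstance; they ignore any per-instance data the engine keeps internally, and the function names are hypothetical.

```cpp
#include <cstdint>

// Node records needed when every copy of a prototype is stored explicitly.
constexpr std::uint64_t flat_node_count(std::uint64_t nodes_per_prototype, std::uint64_t copies) {
    return nodes_per_prototype * copies;
}

// Node records needed when the prototype is stored once and each copy is one instance node.
constexpr std::uint64_t instanced_node_count(std::uint64_t nodes_per_prototype, std::uint64_t copies) {
    return nodes_per_prototype + copies;
}

// Example: a tree made of 500 nodes, a forest of 1,000 trees, and 100 copies of that forest.
constexpr std::uint64_t kFlat = flat_node_count(flat_node_count(500, 1000), 100);
constexpr std::uint64_t kInstanced = instanced_node_count(instanced_node_count(500, 1000), 100);

static_assert(kFlat == 50000000, "explicit copies: 500 * 1000 * 100 records");
static_assert(kInstanced == 1600, "nested instances: 500 + 1000 + 100 records");
```

Under these assumptions, two levels of instancing shrink 50 million explicit node records to 1,600.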
NodeJoint attaches its transformation to a node (with an optional ObjectNode) from another hierarchy. It is not always possible to represent complex entities using a single transformation hierarchy, such as when attaching particles or props to different objects. The SceneManager automatically handles all joint attachments.
NodeScript, GraphScript, and MaterialScript classes are C++ (by default) script entities that can perform self-updates and communicate with other interfaces. The source code of all scene scripts can be exported as a single C++ file, compiled, and linked into the binary. This process creates no additional dependencies and provides the best possible performance.

GraphVarying and NodeVarying are GLSL-based scripts executed on the GPU. They allow access to and modification of any scene objects from the shader, as well as the ability to clone objects and perform compute shader ray tracing, collision detection, and animations:

This compute shader snippet simulates particles and updates the Node’s global transformation matrix:

compute {

    uint global_id = gl_GlobalInvocationID.x;

    uint num_spheres = NUM_SPHERES;

    // integrate spheres
    [[branch]] if(global_id < num_spheres) {

        // simulate sphere
        float ifps = min(scene_ifps, 1.0f / 60.0f);
        spheres_buffer[global_id].velocity += spheres_buffer[global_id].impulse;
        spheres_buffer[global_id].velocity += vec3(0.0f, 0.0f, -global_gravity) * ifps;
        spheres_buffer[global_id].velocity -= spheres_buffer[global_id].velocity * ifps * sphere_damping;
        spheres_buffer[global_id].position += spheres_buffer[global_id].velocity * ifps;

        // transform sphere
        mat3x4 transform = mat4x3_translate_scale(spheres_buffer[global_id].position, vec3(sphere_radius));
        set_node_global_transform(base_graph_node, spheres_buffer[global_id].index, transform);
    }
}

The geometric scene representation is provided by the Object class, which can be either an ObjectMesh or an ObjectBrep. Each object has its own hierarchy of ObjectNode (for skeletons or simple animations) and ObjectGeometry (for hierarchical LODs, classic LODs, and object variations). The number of ObjectNode and ObjectGeometry instances is not limited. The engine can easily handle cases with thousands of ObjectNode instances (skeleton joints) per object.

ObjectMesh can be fully customized in terms of rendering primitive type, meshlet configurations, hierarchical LOD structures, and vertex attributes. We use ObjectMesh for static and dynamic mesh geometry, terrains, particle systems, and SDF objects.
ObjectBrep represents all high-level CAD primitives, including planes, spheres, cones, tori, NURBS, and more. The standard method for rendering CAD geometry involves primitive triangulation and LOD generation, which results in a large number of triangles and high memory consumption. In contrast, Tellusim Engine operates with high-order primitives, maintaining a low memory budget.

All Scene Materials are hierarchical: child materials can override parent material parameters or automatically inherit parent material parameters if not overridden. Material hierarchies simplify scene creation and modification.
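
The override-or-inherit rule can be sketched as a lookup that walks up the parent chain. This is an illustration of the concept rather than the Tellusim Material API: the `Material` struct, its `overrides` map, and `find_parameter` are hypothetical, and parameters are reduced to single floats.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical material: stores only the parameters it overrides.
struct Material {
    const Material *parent = nullptr;        // nullptr for a root material
    std::map<std::string, float> overrides;  // parameters set on this material
};

// Return the nearest value for a parameter: the material's own override first,
// then the closest ancestor that defines it, or nothing if no one does.
inline std::optional<float> find_parameter(const Material &material, const std::string &name) {
    for (const Material *m = &material; m != nullptr; m = m->parent) {
        auto it = m->overrides.find(name);
        if (it != m->overrides.end()) return it->second;
    }
    return std::nullopt;
}
```

Because a child stores only its own overrides, editing a parameter on the parent is immediately visible through every child that has not overridden it.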

Tellusim SDK includes a comprehensive set of scene examples for various scenarios.

Get ready for our next post, where we will explore advanced rendering techniques that will elevate your projects even further.

Start your journey with the Tellusim Engine today: SDK Evaluation