January 20, 2025
SDK 41
New Engine features:
- C++ reflection.
- JavaScript API.
- Node-based materials editing.
- Material depth intersection blending.
- Randomized texture coordinate sampling.
- All Core SDK language examples render tori objects.
- C++ API reflection plugin has been added.
- JavaScript API plugin for JavaScript-only applications has been added.
- MaterialFlow plugin provides full node-based material editing functionality.
- WebP image format plugin has been added.
- Bound types have been added to all languages.
- Flow plugin includes base GLSL generation library.
- Flow plugin supports different variants and versions.
- Input/output attachment and create callbacks have been added to the Flow plugin.
- Primitive plugin skips zero-sized cylinder caps.
- All Explorer Material uniform sliders support a text editing mode on right mouse click.
- Explorer Material editor supports vec2 uniforms.
- Old scenes are backed up before the Explorer scene save action.
- Explorer includes a MaterialFlow plugin for node-based material editing.
- 10_hello_image manual sample.
- 11_hello_bindless manual sample.
- platform/storage test has been added.
- materials/material_foliage.scenex animation material example.
- nodes/node_geometry.scenex geometry selection example.
- render/render_refraction.scenex render refraction example.
- material_flow_animation|anisotropy|bumpmap|cell|forward|fractal|parallax|perlin|skinned|texcoord|transparent|vertex sample scenes with MaterialFlow-based materials have been added to Engine SDK.
- Each API class has reflection methods getClassName() and getClassNamePtr().
- makeQuickCompare() function has been added to enable simple inline comparison lambdas.
- Parser::expectToken|Name() functions have been added.
- Spatial::intersection() correctly handles single Node cases.
- The Control::addChild() method replaces or swaps the child control and returns the old child if it was replaced (by a new child), or null if it was swapped.
- ControlCombo can hide the current text with setTextEnabled() and act as a simple menu.
- ControlSlider has double-click and right mouse button callback handlers.
- ControlSlider has unconstrained editing mode with the Option key pressed.
- ControlEdit can handle user keyboard events (keyboard events from other controls) with updateKeyboard() method.
- HSV color conversion has been added to the Color struct.
- An issue where the String::scanf() function returned a wrong number of parameters on empty numbers has been fixed.
- Variables named “step” or with any other name reserved by external functions are automatically renamed to “step_%x” by the compiler to avoid name collisions on some APIs.
- The shader preprocessor recognizes the “quiet” pragma and hides macro debug information during errors.
- The shader compiler recognizes the “dynamic” pragma and skips shader cache creation for such shaders.
- ControlRoot::setOverlayOrder() automatically fixes the issue with overlays rendering inside ControlArea.
- ControlDialog limits the number of iterations to 8 to avoid hangs.
- A vertex shader pass has been added to the MaterialShading class. This pass is performed after vertex skinning and morphing and is useful for dynamic geometry that calculates motion vectors automatically.
- All Object shaders use the #assign macro instead of #define to avoid “macro has been redefined” warnings.
- attribute_position and attribute_normal shader inputs provide pre-skinned/morphed values, useful for procedural texture coordinate generation.
- GLTF KHR_materials_anisotropy extension support.
- HLSL access to half SSBO types.
- Global GLSL constant arrays support has been added.
- GLSL array initialization is compatible with vector types.
- MeshMaterial::Anisotropy|AnisotropyAngle parameters have been added.
- The number of SSBO buffers in Pipelines, Kernels, and Traversals has been increased to 32.
- Correct evaluation of nested shader macros without values.
- The number of barriers per barrier() call has been increased to 128.
- A random bug with incorrect parsing of some JPEG files has been fixed.
- isFinite|Normal|Inf|Nan functions are manually implemented.
- Access to SpatialTree internal buffers has been added.
- WebGPU can be initialized from a preinitialized device.
- Frame Viewport struct includes scene_time|ifps|seed|layer members.
- Correct depth bias evaluation for screen space occlusion. The banding artifact has been removed.
- HDR reflection overflow has been fixed on iOS.
- Temporal antialiasing has a better reaction to luminance changes.
- An incorrect first frame motion vector artifact has been removed.
- NodeObject has a new geometry index parameter for strict geometry selection.
- LightGlobal has a new split offset parameter for a better split distribution.
- MaterialMetallic|Specular has a PixelDepth option that randomizes the depth value for smooth depth intersections.
- MaterialMetallic|Specular has a Randomized texture coordinates option for tile-less texture sampling.
- New Apple devices have been added to the system info. The device version is also included in the system info string.
December 23, 2024
The Power of GPU-Driven Scenes
Tellusim Engine is not a typical 3D engine with fixed workflows. It provides full customization of the application flow and behavior. Unlike traditional engines, there is no Engine::init() function to handle all initializations. Instead, Tellusim offers modular components designed for specific tasks, providing flexibility for embedding or integrating the engine into legacy or new applications.
It may sound challenging, but we provide numerous ready-to-use examples and templates for different use cases. Tellusim Executor is an example runtime initialization application that we use for prototyping and simple application creation.
Below is the default SceneManager instantiation in C# (where main_async and process_async are shared thread pools):
// create scene manager
SceneManager scene_manager = new SceneManager();
if(!scene_manager.create(device, SceneManager.Flags.DefaultFlags, null, main_async)) return;

// process thread
Thread process_thread = new Thread(() => {
    while(scene_manager && !scene_manager.isTerminated()) {
        if(!scene_manager.process(process_async)) Time.sleep(1000);
    }
});
process_thread.Start();
Tellusim Core SDK is included in the Tellusim Engine and provides an API layer for graphics and platform abstraction. This layer enables developers to build their own 3D engines in various languages, including C++, C#, Rust, Swift, Java, Python, and JavaScript, ensuring broad compatibility across supported platforms. Its primary purpose is to enhance possibilities for modifications and customizations while maintaining excellent performance and compatibility.
Tellusim Engine SDK includes essential modules and systems, such as scene management, rendering, plugins, and tools. These systems empower developers to fully leverage GPU capabilities for all aspects of an application, ranging from object representation and rendering to application logic:
In this post, we will focus on the Scene system, which integrates multiple subsystems and provides user-friendly access to GPU-driven representations. GPU-driven scene processing means that all CPU resources are available for logic and streaming. The complexity of the scene has minimal impact on performance. Below is a diagram of the Scene system:
- SceneManager encapsulates all Scene* subsystems and provides access to internal algorithms such as PrefixScan, RadixSort, BitonicSort, BVH SpatialTree, and CubeFilter. These algorithms can be reused, for example, in GPU physics simulations. Access to internal temporary buffers can also reduce the GPU memory footprint. All modifications and updates from the CPU are batched into a single compute dispatch, ensuring no performance degradation when modifying many objects simultaneously.
- SceneBuffer simplifies GPU buffer management. It handles memory management and data uploads within a single heap GPU buffer. Dedicated buffers are provided for geometry (Vertex and Index) and compute operations (Vector and Scalar).
- SceneAnimator manages transformations of internal Object nodes (ObjectNode), such as skeletons. It uses an animation blending tree provided by the CPU, which is constructed using basic operations like frame interpolation, blending, multiplication, and inversion:
// target frame
ObjectFrame frame;

// fetch animation and place it into the first cache register
ObjectFrame frame_1 = frame.cache(lerp(ObjectFrame(animation_0, t), ObjectFrame(animation_1, t), pow(sin(time * 0.5f), 2.0f)), 0);

// animation amplitude multiplier; we can reuse the same cache register
ObjectFrame frame_2 = frame.cache(lerp(mul(frame_1, frame_1 * frame_0, 0.4f), frame_1, k), 0);

// multiply frame by location transformations
frame.append(lerp(frame_2, ObjectFrame(location) * frame_2, k * 0.8f));

// set frame to the node
node.setFrame(move(frame));
All these operations are compiled into GPU-driven bytecode, which the SceneAnimator executes for all characters in the scene. The performance is remarkable, allowing hundreds of thousands of entities to have unique trees or animation parameters. The result is a ready-to-use skeleton transformation. Additionally, the GPU can drive animations directly from any compute shader.
- SceneSpatial aggregates hierarchical node transformations and BVH (Bounding Volume Hierarchy) updates. Since Tellusim is a SceneGraph-based engine, it supports highly complex transformation hierarchies. Child nodes are automatically transformed when their parent nodes are moved, significantly simplifying the creation of complex or articulated structures and mechanisms. This process is entirely delegated to the GPU, ensuring no performance issues even when transforming or updating millions of objects instantly.
Each Node has global, local, and pivot transformation matrices. Transformations can be provided by CPU logic or by any GPU shader. The CPU API operates with double-precision transformation matrices, while GPU precision depends on the engine’s build configuration.
- SceneSpatial generates BVH trees based on the resulting node transformations. Multiple BVH trees are created for different types of nodes within various graphs. GPU-based BVH traversal is highly efficient and is used for rendering, ray tracing, and collision detection. Mesh objects can optionally include per-triangle BVH structures for fast compute-shader ray tracing. Additionally, SceneSpatial is responsible for the efficient creation and updating of hardware ray tracing acceleration structures.
- SceneTracer is a compute-based ray tracing system for the entire scene. While it is slower than hardware (HW) ray tracing, it reuses all internal BVH trees and avoids the memory overhead of additional acceleration structures, making it ideal for scenarios where HW ray tracing is not required or available. SceneTracer supports millions of ray queries per frame, with both the CPU and GPU able to initiate ray queries with equal efficiency. The intersection results can be returned either to a CPU callback (with a few frames of delay) or directly to a GPU shader for immediate access.
- SceneCollider performs GPU-based collision detection between user-defined bounding queries and the scene. It functions similarly to SceneTracer but is specifically designed for bounding primitives.
- SceneCloner is a crucial subsystem for procedural object generation. It enables any shader to clone nodes, allowing duplication of any scene node with a single shader function call: clone_graph_node(graph_index, node, index, transformation, …). This compute-based procedural object placement tool offers excellent performance with no noticeable generation delays and can be issued even from the fragment shader.
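As a rough GLSL sketch of that call in a compute context (only clone_graph_node() and mat4x3_translate_scale() come from this post; the graph_index, template_node, num_clones, and spacing names are hypothetical placeholders):

// clone a template node per invocation (illustrative sketch, not a complete shader)
uint i = gl_GlobalInvocationID.x;
[[branch]] if(i < num_clones) {
    // hypothetical grid placement for the clone
    vec3 position = vec3(float(i % 100u), float(i / 100u), 0.0f) * spacing;
    mat3x4 transform = mat4x3_translate_scale(position, vec3(1.0f));
    // duplicate the template node with the new transformation
    clone_graph_node(graph_index, template_node, i, transform);
}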
- SceneStream is responsible for asynchronous scene and object I/O operations for importing and exporting. Our internal representations of scenes, graphs, and nodes can be saved in XML, JSON, or binary formats (e.g., scenex, scenej, sceneb, graphx, graphj, graphb, nodex, nodej, nodeb). This process is highly efficient and includes all Tellusim-related options and attributes. Simultaneously, we can easily import various formats out of the box, including GLB, USD, FBX, DAE, and others. Support for any custom format can be added through a Mesh interface format extension plugin. SceneStream automatically converts and creates objects, materials, lights, cameras, and animations. All operations can be performed in background threads for efficient, stall-free streaming.
- ScenePhysics serves as a bridge for integrating physics simulation plugins into the Tellusim Engine. The SDK includes plugins for Box2D, Bullet, Jolt, and PhysX.
- SceneScript serves as a bridge for script compilation. Our default scripting language is C++, but with a plugin extension, any language can be used for scripts, providing full access to the Tellusim Engine. Included plugins support languages such as C#, Java, Rust, Swift, and Python.
- SceneRender/Renderer serves as a bridge to the rendering subsystem. A custom renderer can be integrated with the Tellusim scene system if needed.
SceneManager includes high-level configuration parameters for all subsystems and simplifies CPU access to reported intersections, collisions, and memory fetches. There are some internal subsystems that don’t require API-level access, such as texture management, which includes GPU texture decompression (JPEG) and compression (BC, ASTC), environment texture convolutions, and more. However, the main purpose of all these subsystems is to provide a foundation for scene-related classes:
A Scene represents visual, logical, and physical properties. Each scene is composed of Graphs, Objects, Cameras, Materials, Lights, and Bodies. Each Graph has its own transformation and represents a spatial part of the scene, featuring unique BVH trees. There is no performance difference between plain and hierarchical Node representations. The GravityMark scene is a plain hierarchy containing thousands of NodeObjects. Each NodeObject refers to an ObjectMesh with 8 LOD levels, which are switched based on distance:
How to create 200,000 asteroids from the GraphScript:
// create asteroids
uint32_t num_asteroids = 200000;
Array<uint32_t> indices_data(num_asteroids);
for(uint32_t i = 0; i < num_asteroids; i++) {
    Object &object = asteroids[random.geti32(0, asteroids.size() - 1)];
    NodeObject node_object(*this, object);
    indices_data[i] = node_object.getIndex();
}
The current set of Nodes includes:
Node classes provide spatial transformations for the corresponding scene entities, such as objects, cameras, or lights. This means that the scene acts as a library of entities, while nodes serve as spatial transformers for the referenced entities. Node classes also offer additional properties and attributes, such as unique transformations and materials for objects (each object has its own skeleton) or light colors and intensity for lights.
- Each Node is a highly memory-efficient entity. However, it is still not possible to represent scenes composed of hundreds of millions of nodes due to high memory consumption. NodeInstance addresses this issue by referencing entire scene graph objects. It can also work at many hierarchical levels, with multiple NodeInstance objects nested within NodeInstance hierarchies. For example, if we create an object made up of multiple nodes representing various objects and lights, we can make multiple references to that entire object using a single NodeInstance. This approach dramatically reduces memory consumption and allows for the referencing of external graph objects for collaborative work.
- NodeJoint attaches its transformation to a node (with an optional ObjectNode) from another hierarchy. It is not always possible to represent complex entities using a single transformation hierarchy, such as when attaching particles or props to different objects. The SceneManager automatically handles all joint attachments.
- NodeScript, GraphScript, and MaterialScript classes are C++ (by default) script entities that can perform self-updates and communicate with other interfaces. The source code of all scene scripts can be exported as a single C++ file, compiled, and linked into the binary. This process creates no additional dependencies and provides the best possible performance.
- GraphVarying and NodeVarying are GLSL-based scripts executed on the GPU. They allow access to and modification of any scene objects from the shader, as well as the ability to clone objects and perform compute shader ray tracing, collision detection, and animations:
This compute shader snippet simulates particles and updates the Node’s global transformation matrix:
compute {
    uint global_id = gl_GlobalInvocationID.x;
    uint num_spheres = NUM_SPHERES;

    // integrate spheres
    [[branch]] if(global_id < num_spheres) {

        // simulate sphere
        float ifps = min(scene_ifps, 1.0f / 60.0f);
        spheres_buffer[global_id].velocity += spheres_buffer[global_id].impulse;
        spheres_buffer[global_id].velocity += vec3(0.0f, 0.0f, -global_gravity) * ifps;
        spheres_buffer[global_id].velocity -= spheres_buffer[global_id].velocity * ifps * sphere_damping;
        spheres_buffer[global_id].position += spheres_buffer[global_id].velocity * ifps;

        // transform sphere
        mat3x4 transform = mat4x3_translate_scale(spheres_buffer[global_id].position, vec3(sphere_radius));
        set_node_global_transform(base_graph_node, spheres_buffer[global_id].index, transform);
    }
}
The geometric scene representation is provided by the Object class, which can be either an ObjectMesh or an ObjectBrep. Each object has its own hierarchy of ObjectNode (for skeletons or simple animations) and ObjectGeometry (for hierarchical LODs, classic LODs, and object variations). The number of ObjectNode and ObjectGeometry instances is not limited. The engine can easily handle cases with thousands of ObjectNode instances (skeleton joints) per object.
- ObjectMesh can be fully customized in terms of rendering primitive type, meshlet configurations, hierarchical LOD structures, and vertex attributes. We use ObjectMesh for static and dynamic mesh geometry, terrains, particle systems, and SDF objects.
- ObjectBrep represents all high-level CAD primitives, including planes, spheres, cones, tori, NURBS, and more. The standard method for rendering CAD geometry involves primitive triangulation and LOD generation, which results in a large number of triangles and high memory consumption. In contrast, Tellusim Engine operates with high-order primitives, maintaining a low memory budget.
All Scene Materials are hierarchical: child materials can override parent material parameters or automatically inherit parent material parameters if not overridden. Material hierarchies simplify scene creation and modification:
Tellusim SDK includes a comprehensive set of scene examples for various scenarios.
Get ready for our next post, where we will explore advanced rendering techniques that will elevate your projects even further.
Start your journey with the Tellusim Engine today: SDK Evaluation
December 8, 2024
11 Hello Bindless
All modern graphics APIs support bindless resources. The term “bindless” refers to the ability to skip explicit resource binding, enabling the binding of all resources with a single API call and accessing required resources by index. This binding model is crucial for ray tracing, where rendering demands access to any resource in the scene. It also provides significant benefits for rasterization, particularly in GPU-driven workflows.
However, the similarities between APIs end there, as each takes a distinct approach to achieving this functionality.
Fortunately, Tellusim Core SDK+ provides a simple abstraction for managing bindless buffers and textures across Vulkan, Direct3D12, and Metal APIs. Using the Core SDK, developers can seamlessly implement cross-platform GLTF scene ray tracing. While this example can serve as a starting point for advanced shading experiments, such as ReSTIR, the primary purpose of this tutorial is to demonstrate bindless resources. For simplicity, we use the Phong shading model with two rays per pixel.
In Tellusim Engine, bindless resources are standard Texture and Buffer objects stored within TextureTable or BufferTable containers. These table containers can manage thousands of simultaneously bound resources and provide straightforward access within shaders. For example, the following snippet iterates over a loaded GLTF scene to collect textures and geometry ranges:
Array<Texture> textures;
Array<Geometry> geometries;
Tracing model_tracing = device.createTracing();
for(uint32_t i = 0; i < model.getNumGeometries(); i++) {
    MeshGeometry mesh_geometry = mesh.getGeometry(i);
    for(uint32_t j = 0; j < model.getNumMaterials(i); j++) {
        MeshMaterial mesh_material = mesh_geometry.getMaterial(j);

        // geometry parameters
        Geometry &geometry = geometries.append();
        geometry.base_index = model.getMaterialBaseIndex(i, j);

        // load normal texture
        Texture normal_texture = create_texture(device, mesh_material, MeshMaterial::TypeNormal);
        if(normal_texture) {
            geometry.normal_index = textures.size();
            textures.append(normal_texture);
        }
        ...

        // tracing geometry
        model_tracing.addVertexBuffer(model.getNumGeometryVertices(i), model_pipeline.getAttributeFormat(0), model.getVertexBufferStride(0), vertex_buffer);
        model_tracing.addIndexBuffer(model.getNumMaterialIndices(i, j), FormatRu32, index_buffer, sizeof(uint32_t) * model.getMaterialBaseIndex(i, j));
    }
}
if(!model_tracing.create(Tracing::TypeTriangle, Tracing::FlagCompact | Tracing::FlagFastTrace)) return 1;
Next, we create a TextureTable from all textures using a single line:
// create texture table
TextureTable texture_table = device.createTextureTable(textures);
A similar process applies to Buffer resources: create a BufferTable from the generated buffers. The tests/platform/bindless SDK example demonstrates this process for both buffers and textures in a rasterization-based scenario.
Once the table is ready, binding all resources becomes straightforward. During rendering, we can sample the required texture using an integer index:
// load diffuse texture
[[branch]] if(geometry.diffuse_index != ~0u) {
    diffuse_color = textureLod(nonuniformEXT(sampler2D(in_textures[geometry.diffuse_index], in_sampler)), texcoord, 0.0f);
    #if ALPHA_TEST
        [[branch]] if(diffuse_color.w < 0.5f) {
            diffuse_color = vec4(1.0f);
            continue;
        }
    #endif
}
Note: A nonuniform decorator is necessary not only for textures but also for buffers. Without it, rendering on AMD GPUs may produce artifacts.
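For buffers, a bindless read looks roughly like this in GLSL (the in_indices array name and its block layout are hypothetical, for illustration only):

// the geometry index can diverge between invocations, so it must be
// wrapped with nonuniformEXT() just like the texture index above
uint base = geometry.base_index + i;
uint vertex_index = in_indices[nonuniformEXT(geometry.index_buffer)].indices[base];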
This tutorial utilizes Ray Queries and also demonstrates how to trace materials with binary alpha visibility. To achieve this, you need to call the rayQueryProceedEXT() shader function in a loop until the ray intersection is correct. Unfortunately, hardware ray tracing efficiency drops significantly on Apple and AMD GPUs under these conditions. As a result, hybrid ray tracing combined with rasterization for primary rays can be more efficient than full ray tracing. By default, this example does not perform alpha-test intersection, but you can enable it by uncommenting the ALPHA_TEST macro.
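A minimal GLSL ray-query loop with manual alpha confirmation looks roughly like this (the rayQuery* functions come from the GL_EXT_ray_query extension; the tracing variable and the sample_alpha() helper are hypothetical stand-ins for the acceleration structure and the alpha-sampling code):

// initialize the ray query against the scene acceleration structure
rayQueryEXT ray_query;
rayQueryInitializeEXT(ray_query, tracing, gl_RayFlagsNoneEXT, 0xffu, ray_position, 0.0f, ray_direction, 1000.0f);

// iterate candidates; non-opaque triangle hits must be confirmed manually
while(rayQueryProceedEXT(ray_query)) {
    if(rayQueryGetIntersectionTypeEXT(ray_query, false) == gl_RayQueryCandidateIntersectionTriangleEXT) {
        // sample the diffuse alpha at the candidate hit (sampling code elided)
        float alpha = sample_alpha(ray_query);
        if(alpha >= 0.5f) rayQueryConfirmIntersectionEXT(ray_query);
    }
}

// the committed intersection, if any, is then shaded as usual
if(rayQueryGetIntersectionTypeEXT(ray_query, true) == gl_RayQueryCommittedIntersectionTriangleEXT) {
    // fetch geometry and material data by the committed instance/primitive indices
}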
Stay tuned! Upcoming tutorials will explore GPU-driven rendering with Tellusim Engine.
November 11, 2024
GPU texture encoder
November 9, 2024
10 Hello Image
November 8, 2024
SDK 40
September 25, 2024
09 Hello Controls
September 1, 2024
08 Hello Canvas
September 1, 2024
SDK 39
September 18, 2023
07 Hello Splatting
August 28, 2023
06 Hello Traversal
August 13, 2023
05 Hello Tracing
August 12, 2023
04 Hello Raster
July 31, 2023
03 Hello Mesh
July 7, 2023
Scene Import
June 25, 2023
02 Hello Compute
May 15, 2023
01 Hello USDZ
May 14, 2023
00 Hello Triangle
April 24, 2023
WebGPU Update
April 4, 2023
Tellusim Upscaler Demo
February 10, 2023
DLSS 3.1.1 vs DLSS 2.4.0
January 31, 2023
Dispatch, Dispatch, Dispatch
October 28, 2022
Tellusim upscaler
October 14, 2022
Upscale SDK comparison
September 20, 2022
Improved Blue Noise
June 19, 2022
Intel Arc 370M analysis
January 16, 2022
Mesh Shader Emulation
December 16, 2021
Mesh Shader Performance
October 10, 2021
Blue Noise Generator
October 7, 2021
Ray Tracing versus Animation
September 24, 2021
Ray Tracing Performance Comparison
September 13, 2021
Compute versus Hardware
September 9, 2021
MultiDrawIndirect and Metal
September 4, 2021
Mesh Shader versus MultiDrawIndirect
June 30, 2021
Shader Pipeline