November 11, 2024
GPU texture encoder
Creating fast, real-time 3D applications always involves balancing quality and performance, especially when targeting platforms without top-tier GPUs. One of the main bottlenecks in these scenarios is memory throughput, which can significantly impact performance. The amount of texture data used directly affects memory bandwidth, and hardware texture compression helps alleviate this issue by reducing the required memory bandwidth and footprint.
All GPUs support block compression formats; however, there is no universal standard that works seamlessly across all platforms. Currently, different GPUs support three major compression formats:
- BC1-5 (also known as DXT or S3TC) – Supported by all desktop GPUs.
- BC6-7 – Supported by D3D11+ desktop GPUs.
- ASTC – Supported by modern mobile GPUs.
There are also older mobile compression formats that are still in use today, such as ETC, ETC2, EAC, ATC, and PVRTC. Unfortunately, using BC formats on mobile devices and ASTC on desktops is not feasible, necessitating different data packs for various platforms.
Compressing textures to BC1-5 formats was relatively straightforward using either the CPU or GPU, as the encoding algorithm was simple. However, the introduction of BC6-7 increased the complexity due to additional compression modes, making the encoding process significantly slower. ASTC formats further increased complexity due to the vast number of modes and the use of integer coding with trits and quints.
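The appeal of trits and quints comes down to simple counting: five base-3 digits have 3^5 = 243 combinations and fit into 8 bits, and three base-5 digits have 5^3 = 125 combinations and fit into 7 bits. The sketch below illustrates only that counting argument by packing digits as a plain base-N number. The function names are hypothetical, and the real ASTC bit layout is table-driven and differs from this simplification.

```python
def pack_digits(digits, base):
    """Pack base-`base` digits into one integer (first digit is least significant)."""
    value = 0
    for digit in reversed(digits):
        if not 0 <= digit < base:
            raise ValueError("digit out of range")
        value = value * base + digit
    return value


def unpack_digits(value, base, count):
    """Recover `count` base-`base` digits from a packed integer."""
    digits = []
    for _ in range(count):
        digits.append(value % base)
        value //= base
    return digits


# Five trits fit in 8 bits (1.6 bits each) instead of 10 bits as 2-bit fields.
trits = [2, 0, 1, 2, 2]
packed_trits = pack_digits(trits, 3)

# Three quints fit in 7 bits (about 2.33 bits each) instead of 9 bits as 3-bit fields.
quints = [4, 0, 3]
packed_quints = pack_digits(quints, 5)

print(packed_trits.bit_length(), packed_quints.bit_length())
```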
Compressing textures offline and shipping them with the project is common but often suboptimal, especially for dynamic or procedural textures. For instance, glTF and USD resources typically include embedded JPEG images to reduce asset size, while some algorithms generate procedural textures at runtime. In such cases, fast, real-time compression is necessary.
Tellusim SDK provides real-time compression for BC1-5, BC6-7, and ASTC formats on all platforms. BC1 texture compression remains a viable option for PCs because BC formats do not have variable block sizes and BC1 is twice as compact as BC7, with excellent compression speed. A practical use case for BC1 compression is in real-time applications like Google Maps or XYZ tile compression, where it helps minimize memory overhead and reduce compression stalls.
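The "twice as compact" figure follows directly from the block sizes: BC1 stores a 4×4 block in 8 bytes, while BC7 and every ASTC block size use 16 bytes per block. The following sketch computes the footprint of a single mip level; the table and function names are illustrative and not part of the SDK.

```python
# (block width, block height, bytes per block)
BLOCK_LAYOUT = {
    "BC1": (4, 4, 8),
    "BC7": (4, 4, 16),
    "ASTC 4x4": (4, 4, 16),
    "ASTC 5x5": (5, 5, 16),
}


def compressed_size(width, height, fmt):
    """Size in bytes of one compressed mip level."""
    block_w, block_h, block_bytes = BLOCK_LAYOUT[fmt]
    blocks_x = (width + block_w - 1) // block_w   # round up to whole blocks
    blocks_y = (height + block_h - 1) // block_h
    return blocks_x * blocks_y * block_bytes


def uncompressed_size(width, height, channels=4):
    """Size in bytes of an 8-bit-per-channel image."""
    return width * height * channels


for name in BLOCK_LAYOUT:
    print(name, compressed_size(1024, 512, name))
print("RGBA8", uncompressed_size(1024, 512))
```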
Our SDK provides GPU encoders via the EncoderBC15, EncoderBC67, and EncoderASTC interfaces. Each encoder has specific flags and can be initialized for required formats only since initial kernel compilation can take some time.
The encoder input is a standard texture, and the output is an integer texture in RGBAu16 or RGBAu32 format, with one pixel per compressed block. The application must copy this intermediate integer texture into the final block texture because writing directly to block-compressed formats is typically unsupported. RGBAu16 is required only for the BC1 and BC4 formats, whose blocks are 64 bits wide.
Integer textures cannot fully represent all required mipmap levels due to size truncation. This issue needs to be managed manually, either by reducing the number of mipmaps being compressed or by increasing the size of the integer texture. The truncation occurs because the final 1×1 mipmap level in the integer texture represents a 4×4 (or 5×4, 5×5) pixel block, leaving no space for the smallest mipmaps (2×2 and 1×1).
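The truncation can be checked with a little arithmetic. This sketch assumes that block counts are rounded up (which is what `udiv` is taken to mean here) and that a full mipmap chain ends at 1×1; the helper names are illustrative.

```python
def udiv(value, divisor):
    """Round-up integer division: number of blocks along one axis."""
    return (value + divisor - 1) // divisor


def mip_count(width, height):
    """Number of levels in a full mipmap chain that ends at 1x1."""
    return max(width, height).bit_length()


def encodable_mips(width, height, block_width, block_height):
    """Levels available in the intermediate integer texture (one texel per block)."""
    return mip_count(udiv(width, block_width), udiv(height, block_height))


full = mip_count(1024, 512)
for block in ((4, 4), (5, 4), (5, 5)):
    available = encodable_mips(1024, 512, *block)
    print(block, "source mips:", full, "integer texture mips:", available)
```

For a 1024×512 source and 4×4 blocks, the integer texture is 256×128 and holds 9 levels against the 11 levels of the source, so the 2×2 and 1×1 levels have no counterpart.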
Below is an example of GPU ASTC 4×4 texture compression using the Tellusim SDK:
// texture format
Format format = FormatASTC44RGBAu8n;
// create intermediate image
uint32_t width = src_texture.getWidth();
uint32_t height = src_texture.getHeight();
uint32_t block_width = getFormatBlockWidth(format);
uint32_t block_height = getFormatBlockHeight(format);
Image dest_image = Image(Image::Type2D, FormatRGBAu32, Size(udiv(width, block_width), udiv(height, block_height)));
// create intermediate texture
Texture dest_texture = device.createTexture(dest_image, Texture::FlagSurface | Texture::FlagSource);
if(!dest_texture) return 1;
// dispatch encoder
{
    Compute compute = device.createCompute();
    encoder.dispatch(compute, EncoderASTC::ModeASTC44RGBAu8n, dest_texture, src_texture);
}
// flush context
context.flush();
// get intermediate image data
if(!device.getTexture(dest_texture, dest_image)) return 1;
// copy image data
Image image = Image(Image::Type2D, format, Size(width, height));
memcpy(image.getData(), dest_image.getData(), min(image.getDataSize(), dest_image.getDataSize()));
// save encoded image
image.save("texture.astc");
Additionally, the SDK includes a fast GPU JPEG decompression interface, which significantly accelerates JPEG to BC or ASTC conversions.
Of course, real-time compression speed comes at the cost of some texture quality. The tables below list PSNR and encoding time for a 1024×512 RGB test image on an Apple M1 Max:
PSNR RGB (dB) | CPU Fast | CPU Default | CPU Best | GPU |
---|---|---|---|---|
BC1 | 39.83 | 39.85 | | 39.69 |
BC7 | 48.27 | 48.53 | | 44.85 |
ASTC 4×4 | 48.13 | 48.29 | 48.50 | 44.97 |
ASTC 5×4 | 46.18 | 46.34 | 46.50 | 42.74 |
ASTC 5×5 | 44.48 | 44.61 | 44.73 | 40.96 |
PSNR RG (dB) | CPU Fast | CPU Default | CPU Best | GPU |
---|---|---|---|---|
BC1 | 49.52 | 49.62 | | 40.11 |
BC7 | 50.14 | 50.38 | | 46.28 |
ASTC 4×4 | 51.07 | 51.18 | 51.37 | 47.43 |
ASTC 5×4 | 48.65 | 48.77 | 48.92 | 44.02 |
ASTC 5×5 | 46.41 | 46.54 | 46.67 | 41.85 |
Time RGB (ms) | CPU Fast | CPU Default | CPU Best | GPU |
---|---|---|---|---|
BC1 | 28 | 43 | | 0.4 |
BC7 | 105 | 186 | | 1.0 |
ASTC 4×4 | 100 | 157 | 386 | 2.2 |
ASTC 5×4 | 117 | 175 | 464 | 4.8 |
ASTC 5×5 | 138 | 212 | 542 | 3.1 |
Reducing the number of input texture components improves PSNR values, which is beneficial for normal maps and luminance-only textures. ASTC encoding performance can be further optimized by limiting the number of compression modes if needed. However, the current performance is satisfactory for applications using JPEG input textures.
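For reference, PSNR for 8-bit data is commonly defined as 10·log10(255² / MSE). The sketch below implements that textbook definition on flat sequences of channel values. How the Image Processing Tool computes its numbers is not described here, so treat this as an assumption rather than its exact method.

```python
import math


def psnr(reference, encoded, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized sample sequences."""
    if len(reference) != len(encoded):
        raise ValueError("inputs must have the same length")
    squared_error = sum((a - b) ** 2 for a, b in zip(reference, encoded))
    if squared_error == 0:
        return math.inf  # identical inputs
    mse = squared_error / len(reference)
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every sample is off by exactly one 8-bit level.
original = bytes([10, 20, 30, 40])
decoded = bytes([11, 21, 31, 41])
print(round(psnr(original, decoded), 2))
```

Under this definition, a uniform error of one 8-bit level gives roughly 48 dB, which helps put the values in the tables into perspective.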
The latest version of the reference astcenc compressor demonstrates excellent CPU encoding performance, whereas we stopped optimizing our CPU ASTC encoder once it reached the performance level of our BC7 encoder:
PSNR RGB (dB) | Fast | Medium | Thorough |
---|---|---|---|
ASTC 4×4 | 47.31 | 48.19 | 48.50 |
ASTC 5×4 | 45.58 | 46.34 | 46.63 |
ASTC 5×5 | 43.56 | 44.61 | 44.97 |
Time RGB (ms) | Fast | Medium | Thorough |
---|---|---|---|
ASTC 4×4 | 22 | 28 | 65 |
ASTC 5×4 | 20 | 26 | 64 |
ASTC 5×5 | 21 | 25 | 67 |
All textures and metrics were generated with the Tellusim Image Processing Tool from the Core SDK using this script:
November 9, 2024
10 Hello Image
Efficient image processing is crucial for any application, given the increasing resolutions and growing data volumes. Even small optimizations in this pipeline can save hours, days, or even weeks of processing time. Additionally, flexible access to image conversion and manipulation utilities across various languages and environments is a valuable feature.
The Image interface is a central component of image handling in Tellusim. It supports loading and saving 2D, 3D, Cube, 2D Array, and Cube Array images, as well as format/type conversion, component/region extraction, scaling, and mipmap generation. All heavy operations are optimized with SIMD and multithreading. The extension system allows for custom format and conversion operation support. The Python API ensures compatibility with popular libraries such as NumPy, PyTorch, and Pillow. The Image interface is simple to use and fully compatible with all supported programming languages.
The following Python snippets showcase basic image operations that can be useful for batch image processing.
Getting image information, including Exif metadata, without loading image content:
# fast operation without content loading
image.info("image.png")
print(image.description)
Performing basic operations with image:
# load Image from file
image.load("image.png")
print(image.description)
# swap red and blue components
image.swap(0, 2)
# rotate image by 90 degrees CCW
image = image.getRotated(-1)
# convert image to RGBA format
image = image.toFormat(FormatRGBAu8n)
# crop image
image = image.getRegion(Region(40, 150, 64, 94))
# upscale image using default Cubic filter
image = image.getResized(image.size * 4)
# create mipmap chain using default mipmap filter
image = image.getMipmapped(Image.FilterMip, Image.FlagGamma)
# save image
image.save("test_basic.dds")
print(image.description)
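The Image.FlagGamma flag above requests gamma-aware mipmap filtering. The idea can be sketched in plain Python: decode the stored values to linear light, average them, and encode the result again. The transfer function used by the SDK is not stated here, so the standard sRGB curve in this sketch is an assumption, and the function names are illustrative.

```python
def srgb_to_linear(value):
    """Decode an 8-bit sRGB value to linear light in [0, 1]."""
    c = value / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(linear):
    """Encode linear light in [0, 1] to an 8-bit sRGB value."""
    if linear <= 0.0031308:
        c = linear * 12.92
    else:
        c = 1.055 * linear ** (1.0 / 2.4) - 0.055
    return int(round(c * 255.0))


def average_gamma(samples):
    """Average samples in linear space, as a gamma-aware mipmap filter would."""
    linear = sum(srgb_to_linear(s) for s in samples) / len(samples)
    return linear_to_srgb(linear)


def average_naive(samples):
    """Average the stored values directly, ignoring the transfer curve."""
    return int(round(sum(samples) / len(samples)))


# A 2x2 black-and-white checker collapsed into one mip texel.
checker = [0, 255, 255, 0]
print(average_naive(checker), average_gamma(checker))
```

The gamma-aware average of a black-and-white checker is noticeably brighter than the naive one, which is why gamma mipmaps tend to preserve perceived brightness in smaller levels.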
The ImageSampler interface provides access to individual pixels of a specific image layer, mipmap, or face. A high-order Catmull-Rom filter is available for high-quality image resampling when needed. The following snippet demonstrates how to create a simple procedural image:
# create new image
image.create2D(FormatRGBu8n, 512, 256)
# create image sampler from the first image layer
sampler = ImageSampler(image)
# fill image
color = ImageColor(255)
for y in range(image.height):
    for x in range(image.width):
        v = ((x ^ y) & 255) / 32.0
        color.r = int(math.cos(Pi * 1.0 + v) * 127.5 + 127.5)
        color.g = int(math.cos(Pi * 0.5 + v) * 127.5 + 127.5)
        color.b = int(math.cos(Pi * 0.0 + v) * 127.5 + 127.5)
        sampler.set2D(x, y, color)
# save image
image.save("test_xor.png")
print(image.description)
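The pattern arithmetic can also be tried without the SDK. This standalone sketch reproduces the per-pixel formula with the standard library and serializes the result as a binary PPM; the helper names are hypothetical and `math.pi` stands in for the SDK's `Pi` constant.

```python
import math


def xor_pixel(x, y):
    """RGB value of the XOR pattern at (x, y), using the same formula as above."""
    v = ((x ^ y) & 255) / 32.0
    r = int(math.cos(math.pi * 1.0 + v) * 127.5 + 127.5)
    g = int(math.cos(math.pi * 0.5 + v) * 127.5 + 127.5)
    b = int(math.cos(math.pi * 0.0 + v) * 127.5 + 127.5)
    return r, g, b


def xor_pattern_ppm(width, height):
    """Return the pattern as the bytes of a binary PPM (P6) file."""
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    pixels = bytearray()
    for y in range(height):
        for x in range(width):
            pixels.extend(xor_pixel(x, y))
    return header + bytes(pixels)


data = xor_pattern_ppm(512, 256)
print(len(data), xor_pixel(0, 0))
```

Writing `data` to a file with a .ppm extension gives an image that most viewers can open.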
Conversions between panorama and cube formats are straightforward. The following example converts an RGB cube image to a panoramic projection:
# create Cube image
image.createCube(FormatRGBu8n, 128)
print(image.description)
# clear image
for face in range(0, 6, 3):
    ImageSampler(image, Slice(Face(face + 0))).clear(ImageColor(255, 0, 0))
    ImageSampler(image, Slice(Face(face + 1))).clear(ImageColor(0, 255, 0))
    ImageSampler(image, Slice(Face(face + 2))).clear(ImageColor(0, 0, 255))
# convert to 2D panorama
# it will be a horizontal cross without the Panorama flag
image = image.toType(Image.Type2D, Image.FlagPanorama)
image.save("test_panorama.png")
print(image.description)
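Conceptually, a cube-to-panorama conversion maps each panorama pixel to a direction and then looks up the cube face that direction hits. The sketch below shows that mapping under stated assumptions: an equirectangular projection, a +Y-up coordinate system, and face labels chosen for illustration. The SDK's actual projection and face ordering may differ.

```python
import math


def panorama_direction(u, v):
    """Map normalized panorama coordinates (u, v in [0, 1]) to a unit direction."""
    longitude = (u - 0.5) * 2.0 * math.pi
    latitude = (0.5 - v) * math.pi
    x = math.cos(latitude) * math.sin(longitude)
    y = math.sin(latitude)
    z = math.cos(latitude) * math.cos(longitude)
    return x, y, z


def dominant_face(direction):
    """Pick the cube face hit by a direction: the axis with the largest magnitude."""
    x, y, z = direction
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return "+x" if x > 0 else "-x"
    if ay >= az:
        return "+y" if y > 0 else "-y"
    return "+z" if z > 0 else "-z"


for u, v in ((0.5, 0.5), (0.75, 0.5), (0.5, 0.0), (0.5, 1.0)):
    print((u, v), dominant_face(panorama_direction(u, v)))
```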
Our CPU texture encoders are fast and deliver excellent compression quality. Only a single function call is needed to compress a texture to any BC or ASTC format. An Async interface can be supplied to the function for more precise thread control. By default, the compressors utilize all available CPU cores:
# load and resize Image
image.load("image.png")
image = image.getResized(image.size * 2)
# create mipmaps
image = image.getMipmapped()
# compress image to BC1 format
image_bc1 = image.toFormat(FormatBC1RGBu8n)
image_bc1.save("test_bc1.dds")
print(image_bc1.description)
# compress image to BC7 format
image_bc7 = image.toFormat(FormatBC7RGBAu8n)
image_bc7.save("test_bc7.dds")
print(image_bc7.description)
# compress image to ASTC4x4 format
image_astc44 = image.toFormat(FormatASTC44RGBAu8n)
image_astc44.save("test_astc44.ktx")
print(image_astc44.description)
# compress image to ASTC8x8 format
image_astc88 = image.toFormat(FormatASTC88RGBAu8n)
image_astc88.save("test_astc88.ktx")
print(image_astc88.description)
Python buffer protocol support simplifies data sharing between Tellusim and other Python frameworks. The following snippet demonstrates modifying image content using NumPy operations:
# load image and convert to float32 format
image.load("image.png")
image = image.toFormat(FormatRGBf32)
# create array with specified dimension and format
array = numpy.zeros(shape = ( image.width, image.height, 3 ), dtype = numpy.float32)
# copy image data into the array
image.getData(array)
# set inverted data into the image
image.setData(1.0 - array)
# save inverted image
image.save("test_numpy.dds")
print(image.description)
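The buffer protocol itself can be illustrated with the standard library alone. In this sketch, a `bytearray` stands in for image storage (an assumption made for illustration), and a `memoryview` reinterprets it as float32 values without copying, so edits through the view land in the original bytes.

```python
import struct

# Four float32 samples stored as raw bytes, standing in for image storage.
storage = bytearray(struct.pack("4f", 0.0, 0.25, 0.5, 1.0))

# memoryview relies on the buffer protocol: the bytes are shared, not copied.
view = memoryview(storage).cast("f")

# Invert the samples in place through the view.
for i in range(len(view)):
    view[i] = 1.0 - view[i]

# The underlying storage now holds the inverted values.
inverted = struct.unpack("4f", storage)
print(inverted)
```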
The following file formats can be loaded directly: ASTC, BMP, BW, CUR, DDS, DEM, EXR, HDR, HGT, ICO, IMAGE, JPEG, KTX, LA, PBM, PGM, PNG, PPM, PSD, RGB, RGBA, SGI, TGA, and TIFF. The list of supported saving formats excludes only DEM and HGT files. Any other formats can be added via a C++ plugin and will function as native formats.
While it is easy to create such scripts in Python or other supported languages, this can also be avoided by using the Tellusim Image Processing Tool. The command-line options work as a pipeline of operations on the loaded images, dramatically simplifying batch processing. For example, the following command loads all images in the directory, resizes them, creates gamma mipmaps, encodes them to ASTC55, and saves all images with an “_astc” postfix and a .ktx extension:
ts_image *.jpg -scale 0.5 -mipmaps gamma -format astc55rgbau8n -p _astc -e ktx
For GravityMark, we used the following script to convert NASA Topo maps to the appropriate dimensions and formats:
#!/bin/bash
SRC=world.topo.bathy.200412
SIZE=8192
SCALE=0.5
NAME=color
mkdir -p $NAME.$SIZE
ts_image -v -create rgbu8n 86400 43200 \
  $SRC.3x21600x21600.A1.png -insert 0 0 -remove \
  $SRC.3x21600x21600.A2.png -insert 0 21600 -remove \
  $SRC.3x21600x21600.B1.png -insert 21600 0 -remove \
  $SRC.3x21600x21600.B2.png -insert 21600 21600 -remove \
  $SRC.3x21600x21600.C1.png -insert 43200 0 -remove \
  $SRC.3x21600x21600.C2.png -insert 43200 21600 -remove \
  $SRC.3x21600x21600.D1.png -insert 64800 0 -remove \
  $SRC.3x21600x21600.D2.png -insert 64800 21600 -remove \
  -s $SCALE -cube -mipmaps gamma \
  -clone -push -face 0 -o $NAME.$SIZE/$NAME.0.jpg -remove -pop \
  -clone -push -face 1 -o $NAME.$SIZE/$NAME.1.jpg -remove -pop \
  -clone -push -face 2 -o $NAME.$SIZE/$NAME.2.jpg -remove -pop \
  -clone -push -face 3 -o $NAME.$SIZE/$NAME.3.jpg -remove -pop \
  -clone -push -face 4 -o $NAME.$SIZE/$NAME.4.jpg -remove -pop \
  -clone -push -face 5 -o $NAME.$SIZE/$NAME.5.jpg -remove -pop \
  -clone -push -format bc1rgbu8n -o $NAME.$SIZE/$NAME.bc1.ktx -remove -pop \
  -clone -push -format etc2rgbu8n -o $NAME.$SIZE/$NAME.etc2.ktx -remove -pop \
  -clone -push -format astc66rgbau8n -o $NAME.$SIZE/$NAME.astc.ktx -remove -pop
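The `-insert` offsets in the script follow a regular grid: columns A to D advance along X and rows 1 to 2 along Y, each tile being 21600 pixels square. A small sketch (with hypothetical helper names) reproduces the offsets and the 86400×43200 canvas size from that layout:

```python
TILE = 21600  # edge length of one source tile in pixels


def tile_offsets(columns="ABCD", rows=(1, 2), tile=TILE):
    """Map tile names such as 'C2' to their (x, y) insert offsets."""
    offsets = {}
    for col_index, column in enumerate(columns):
        for row_index, row in enumerate(rows):
            offsets[f"{column}{row}"] = (col_index * tile, row_index * tile)
    return offsets


def canvas_size(columns="ABCD", rows=(1, 2), tile=TILE):
    """Size of the canvas that holds all tiles."""
    return len(columns) * tile, len(rows) * tile


for name, (x, y) in tile_offsets().items():
    print(name, x, y)
print(canvas_size())
```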
The Image Processing Tool supports fast GPU compression to BC and ASTC formats. To use this, you simply need to specify the “gpu” flag for the “format” operation:
We will compare the performance and quality of the CPU and GPU encoders in the next post. Stay tuned.
November 8, 2024
SDK 40
New Engine features:
- GLSL user structure SSBO views (references).
- Real-time GPU ASTC encoder for 4×4, 5×4, and 5×5 blocks.
- Bindless buffer support has been added via the BufferTable interface.
- Bindless resource support has been added to macOS and iOS.
- Full Brep render on macOS and iOS via Mesh shader.
- ts_chword.sh, ts_echo.sh, and ts_exec.sh ask to install clang++ if it is not available.
- interface/flow plugin for block-based editors.
- interface/element plugin for the single CanvasElement insertion into Controls system.
- tests/graphics/cube_filter sample showcases real-time Cube texture diffuse convolution.
- tests/graphics/decoder_jpeg sample showcases real-time GPU JPEG texture loading.
- tests/graphics/encoder_astc sample showcases real-time GPU ASTC texture encoding.
- tests/graphics/multi_window sample showcases multi-window rendering.
- tests/platform/reference sample showcases shader references.
- tests/platform/alpha sample showcases anti-aliased alpha tests.
- tests/platform/atomic sample showcases buffer atomic operations.
- tests/platform/bindless sample showcases bindless buffer and texture tables.
- tests/platform/blending sample showcases uniform Blend color parameter.
- tests/platform/buffer sample showcases cross-API access to an array of buffers.
- tests/platform/indirect sample showcases MDI per-instance parameters.
- tests/platform/multisample sample showcases AA window target.
- tests/platform/preprocessor sample showcases macro-based function generalization.
- tests/parallel/spatial_tree sample has optional BVH tree visualization.
- tests/core/radix sample showcases the radix sort algorithm.
- The project generation tool can create new draft project files for Core and Engine SDK and convert Makefile-based projects to Xcode, VS, CMake, and Gradle projects.
- ts_image supports GPU ASTC texture compression with “gpu” command line argument flag.
- ts_shader supports texture and sample argument buffer generation with flags command line argument.
- Set and Map containers can use const char* keys for fast constant table initializations.
- radixSort algorithm has been added to TellusimSort.h.
- String::split() methods split the string by provided delimiters.
- String::tprintf() method provides type-based printf system with {0}, {1}, … argument accessors.
- The draw statistics methods have been added to the Canvas interface.
- Comparison operators for FontStyle, StrokeStyle, and GradientStyle structures have been added.
- BufferTable interface for bindless Buffer data has been added.
- Command and Compute functions that receive Array<Sampler|Texture|Buffer|Tracing|*Table> arguments have been added.
- ShaderCompiler flags for generated shader output control have been added (MTLIndirect enables argument buffers for all textures and samplers on Metal).
- BrepModel interface for low-level Brep rendering has been added.
- EncoderASTC interface for GPU ASTC texture compression has been added.
- Dedicated HDR ASTC color formats have been added.
- LDR ASTC formats have u8n and u8ns postfixes.
- Dynamic storage buffer argument has been removed from the Kernel, Pipeline, and Traversal interfaces. The more flexible replacement is the BindFlags enum.
- Improved Kernel, Pipeline, and Traversal interfaces support BufferTable and TextureTable bindings, Shader::Mask, and BindingFlags.
- Swizzled viewport rendering feature has been removed from Pipeline.
- CanvasStrip has dedicated methods for creating quadratic and cubic curves.
- ControlArea supports content scaling and absolute children transformations.
- AtomicCompareExchange operation has been fixed.
- Swizzled accessors to bindless buffers have been added to HLSL shader generator.
- HLSL and MSL shader translators support bindless arrays for buffers and textures.
- Correct structure and array access for atomic and load/store operations in HLSL.
- Fast MeshAttribute and MeshIndices allocation for addAttribute() and addIndices() methods.
- ControlSplit renders lines via the CanvasStrip element.
- WinApp Window supports all mouse buttons and wheel events.
- Emscripten Window supports mouse wheel events.
- Correct WebGPU window initialization due to the latest WebGPU API update.
- Metal private resources are Heap allocated.
- Material shader pragmas can control the pipeline type (Vertex or Mesh shader mode).
- SceneTexture supports non-uniform block compression sizes.
September 25, 2024
09 Hello Controls
September 1, 2024
08 Hello Canvas
September 1, 2024
SDK 39
September 18, 2023
07 Hello Splatting
August 28, 2023
06 Hello Traversal
August 13, 2023
05 Hello Tracing
August 12, 2023
04 Hello Raster
July 31, 2023
03 Hello Mesh
July 7, 2023
Scene Import
June 25, 2023
02 Hello Compute
May 15, 2023
01 Hello USDZ
May 14, 2023
00 Hello Triangle
April 24, 2023
WebGPU Update
April 4, 2023
Tellusim Upscaler Demo
February 10, 2023
DLSS 3.1.1 vs DLSS 2.4.0
January 31, 2023
Dispatch, Dispatch, Dispatch
October 28, 2022
Tellusim upscaler
October 14, 2022
Upscale SDK comparison
September 20, 2022
Improved Blue Noise
June 19, 2022
Intel Arc 370M analysis
January 16, 2022
Mesh Shader Emulation
December 16, 2021
Mesh Shader Performance
October 10, 2021
Blue Noise Generator
October 7, 2021
Ray Tracing versus Animation
September 24, 2021
Ray Tracing Performance Comparison
September 13, 2021
Compute versus Hardware
September 9, 2021
MultiDrawIndirect and Metal
September 4, 2021
Mesh Shader versus MultiDrawIndirect
June 30, 2021
Shader Pipeline