Core API

TeraXLang extends Triton's core language with additional primitives for shared memory operations, the Tensor Memory Accelerator (TMA), and warp-level synchronization.

Thread & Block Identity

# Thread and block IDs and dimensions
tid()       # Thread ID within block
tdim()      # Block size
bid()       # Block ID
bdim()      # Block dimensions

# Warp-level
thread0()       # First thread in block
wg_thread0()    # First thread in warpgroup
warp_id()       # Warp ID within block
warpgroup_id()  # Warpgroup ID
lane_id()       # Lane ID within warp
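
How the identity values above relate to one another can be modelled in plain Python. This is a minimal sketch, not TeraXLang code: it assumes a one-dimensional thread index, 32 threads per warp, and 4 warps (128 threads) per warpgroup, which are the usual NVIDIA conventions. The helper name `decompose_tid` and the reading of `thread0()` and `wg_thread0()` as true/false predicates are assumptions made for illustration.

```python
WARP_SIZE = 32            # assumed: threads per warp on NVIDIA GPUs
WARPS_PER_WARPGROUP = 4   # assumed: a warpgroup is 4 consecutive warps

def decompose_tid(tid):
    """Hypothetical helper: derive warp-level identities from a linear thread ID."""
    warp = tid // WARP_SIZE
    threads_per_warpgroup = WARP_SIZE * WARPS_PER_WARPGROUP
    return {
        "lane_id": tid % WARP_SIZE,                      # position inside the warp
        "warp_id": warp,                                 # warp index inside the block
        "warpgroup_id": warp // WARPS_PER_WARPGROUP,     # warpgroup index inside the block
        "is_thread0": tid == 0,                          # assumed meaning of thread0()
        "is_wg_thread0": tid % threads_per_warpgroup == 0,  # assumed meaning of wg_thread0()
    }

info = decompose_tid(130)
print(info["warp_id"], info["lane_id"], info["warpgroup_id"])
```

Under these assumptions, thread 130 sits in warp 4 at lane 2, which places it in the second warpgroup (index 1).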

Shared Memory Operations

# Shared memory allocation and access
smem_alloc()       # Allocate shared memory
smem_load()        # Load from shared memory
smem_store()       # Store to shared memory
smem_index()       # Index into shared memory
smem_slice()       # Slice shared memory
smem_trans()       # Transpose shared memory
smem_reshape()     # Reshape shared memory

# Fragment operations
frag_smem_load()   # Load with fragment layout
frag_smem_store()  # Store with fragment layout

Tensor Memory Accelerator (TMA)

# TMA operations
tma_load()         # TMA load from global to shared
tma_store()        # TMA store from shared to global
tma_gather()       # TMA gather operation
tma_load_wait()    # Wait for TMA load
tma_store_wait()   # Wait for TMA store

Mailbox (Mbar) Synchronization

# Mailbox operations
mbar_alloc()      # Allocate mailbox
mbar_expect()     # Expect mailbox value
mbar_wait()       # Wait for mailbox
mbar_arrive()     # Arrive at mailbox
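
The interplay of expect, arrive, and wait can be illustrated with a toy model in plain Python. This sketch assumes the `mbar_*` primitives behave like a hardware arrive/wait barrier with an expected-byte count, as the mbarrier object in NVIDIA PTX does; this page does not confirm that mapping. The class `MbarModel` and its method names are hypothetical and are not the TeraXLang signatures.

```python
class MbarModel:
    """Toy, single-threaded model of an arrive/expect/wait barrier (hypothetical)."""

    def __init__(self, arrive_count):
        self.arrive_count = arrive_count      # arrivals needed to complete one phase
        self.pending_arrivals = arrive_count
        self.pending_bytes = 0                # bytes still expected from async copies
        self.phase = 0                        # flips between 0 and 1 on completion

    def expect(self, nbytes):
        # Announce that nbytes of asynchronous data will land before the phase ends.
        self.pending_bytes += nbytes

    def complete_bytes(self, nbytes):
        # Models the copy engine reporting that nbytes have been written.
        self.pending_bytes -= nbytes
        self._maybe_flip()

    def arrive(self):
        # One participant signals that it has reached the barrier.
        self.pending_arrivals -= 1
        self._maybe_flip()

    def _maybe_flip(self):
        # The phase completes only when all arrivals and all bytes are accounted for.
        if self.pending_arrivals == 0 and self.pending_bytes == 0:
            self.phase ^= 1
            self.pending_arrivals = self.arrive_count

    def is_ready(self, phase):
        # A real wait would block; this model only polls whether `phase` has completed.
        return self.phase != phase


bar = MbarModel(arrive_count=1)
bar.expect(1024)                 # producer expects a 1024-byte copy
bar.arrive()                     # producer arrives; the copy is still in flight
ready_before = bar.is_ready(0)
bar.complete_bytes(1024)         # copy lands, so phase 0 completes
ready_after = bar.is_ready(0)
```

In this model a waiter on phase 0 is released only after both the arrival and the expected bytes have been recorded, which is why an expect call is typically paired with an asynchronous copy such as a TMA load.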

Warp-Level Primitives

# Warp-level reductions
warp_max()        # Warp-level max
warp_sum()        # Warp-level sum
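
A warp-level reduction combines one value per lane into a single result shared by the warp. The sketch below simulates one common way to do this, a butterfly (XOR) exchange over 32 lanes, in plain Python. It is a conceptual model only: whether `warp_max()` and `warp_sum()` use this exchange pattern is an assumption, and the helper `warp_allreduce` is hypothetical.

```python
import operator

WARP_SIZE = 32  # assumed: lanes per warp on NVIDIA GPUs

def warp_allreduce(lane_values, op):
    """Simulate a butterfly reduction; every lane ends up holding the full result."""
    assert len(lane_values) == WARP_SIZE
    vals = list(lane_values)
    offset = WARP_SIZE // 2
    while offset > 0:
        # Each lane combines its value with the value held by lane (lane XOR offset).
        vals = [op(vals[lane], vals[lane ^ offset]) for lane in range(WARP_SIZE)]
        offset //= 2
    return vals

lanes = list(range(WARP_SIZE))                 # lane i contributes the value i
sums = warp_allreduce(lanes, operator.add)     # models a warp-wide sum
maxes = warp_allreduce(lanes, max)             # models a warp-wide max
```

The loop runs five exchange steps (offsets 16, 8, 4, 2, 1), after which all 32 lanes hold the same reduced value.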

Memory & Synchronization

# Register allocation
reg_alloc()       # Allocate registers
reg_dealloc()     # Deallocate registers

# Synchronization
fence_proxy_async()  # Async fence
bar_arrive()      # Barrier arrive
bar_wait()        # Barrier wait

# Layout
relayout()        # Relayout tensor
print_layout()    # Print tensor layout

Additional Math Operations

# Dot product
dotx()           # Extended dot product
dot_wait()       # Wait for dot operation

# Async operations
async_load()     # Async load
async_load_wait() # Wait for async load

For the full Triton core API, see the Triton Language Reference.