Projects
HandClaw
One Slack = Multiple AI Coding Agents
Connect Claude Code, Codex, and OpenCode to Slack. Code from anywhere, on any device. Work from your phone. Monitor agent progress. Switch agents by renaming channels.
Key Features
- One workspace = Multiple agents
- Code from your phone, or even your watch
- Walk away, let agents work
- Supports persistent plan/build mode switching
- Notifies users when coding tasks are completed
HandClaw vs OpenClaw
| Feature | HandClaw | OpenClaw |
|---|---|---|
| Switch plan/build mode | ✓ (`!code switch plan/build`) | ✗ |
| Early stop code CLI | ✓ (`!stop`) | ✗ |
| Project management via channels | ✓ (just rename the channel) | ✗ (requires installing acpx plus complex config) |
| ACP support | ✓ Easy (rename the channel) | ✗ Complex |
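The table above is also the full interface. An illustrative channel session (channel names are hypothetical; `!code switch plan/build` and `!stop` are the commands listed above):

```
# rename the channel (e.g. to "claude-myapp") to route it to a different agent
!code switch plan    # plan mode: discuss the change before touching code
!code switch build   # build mode: let the agent start writing code
!stop                # stop the running code CLI early
```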
TeraXLang
A Triton extension for LLMs, as fast as FlashAttention
A CUDA kernel-specific DSL built on top of Triton that achieves SOTA GPU kernel performance on both Hopper (H100) and Blackwell (B200) architectures.
Why TeraXLang?
- What optimizations does Triton already perform?
- Why do so many DSLs claim they can easily outperform Triton?
- What if we added a few extra APIs that trade some of Triton's generality for superior performance?
Key Features
- Minimal Extensions: Adds only essential methods to Triton (smem, tmem, mbar, TMA operations); a baseline Triton sketch follows this list
- Warp-level Primitives: Efficient warpgroup synchronization and reduction
- TMA Support: Hardware-accelerated tensor memory operations
- Multi-Architecture: Optimized for both Hopper and Blackwell GPUs
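For context, here is what a plain Triton kernel, the layer TXL extends, looks like: a minimal tiled-matmul sketch using only upstream Triton APIs. None of the TXL-specific smem/tmem/mbar/TMA handles appear here, since their exact API is not shown in this overview, and the sketch assumes M, N, K are divisible by the block sizes so boundary masks are omitted.

```python
import triton
import triton.language as tl

# Plain Triton tiled matmul: the baseline programming model that TXL builds on.
# Assumes M % BLOCK_M == 0, N % BLOCK_N == 0, K % BLOCK_K == 0 (no masks).
@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr, M, N, K,
                  stride_am, stride_ak,
                  stride_bk, stride_bn,
                  stride_cm, stride_cn,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak
    b_ptrs = b_ptr + offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for _ in range(0, K, BLOCK_K):
        acc += tl.dot(tl.load(a_ptrs), tl.load(b_ptrs))  # tensor-core MMA on one tile
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    tl.store(c_ptrs, acc.to(tl.float16))
```

Judging from the kernel names in the tables below (e.g. `hopper_txl_ws_persistent`, `hopper_txl_ws_fa3`), the TXL variants layer warp-specialized scheduling, TMA loads, and mbarrier synchronization on top of this model.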
Performance
Matmul (H100 80GB HBM3)
M=8192, N=8192, K=1024
| Kernel | TFLOPS |
|---|---|
| cuBLAS | 710.4 |
| TXL (hopper_txl_ws_persistent) | 697.7 (~2% slower) |
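As a sanity check, a minimal sketch of how a TFLOPS figure like the cuBLAS row is obtained, assuming `triton.testing.do_bench` for timing and a float16 `torch.matmul` as the cuBLAS path; the Python entry point for the TXL kernel is not shown here, so only the baseline is measured.

```python
import torch
import triton

# Problem size from the table above.
M, N, K = 8192, 8192, 1024
a = torch.randn((M, K), device="cuda", dtype=torch.float16)
b = torch.randn((K, N), device="cuda", dtype=torch.float16)

ms = triton.testing.do_bench(lambda: torch.matmul(a, b))  # cuBLAS GEMM, runtime in ms
tflops = 2 * M * N * K / (ms * 1e-3) / 1e12               # a GEMM costs 2*M*N*K FLOPs
print(f"cuBLAS: {tflops:.1f} TFLOPS")
```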
Flash Attention (H100 80GB HBM3)
batch=16, heads=32, seq_len=16384, head_dim=128
| Kernel | TFLOPS |
|---|---|
| FlashAttention3 | 640 |
| TXL (hopper_txl_ws_fa3) | 676.26 (~6% faster) |
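For reference, the arithmetic behind the attention TFLOPS column, assuming the usual non-causal accounting of two GEMMs (QK^T and PV), i.e. 4 * batch * heads * seq_len^2 * head_dim FLOPs:

```python
# FLOP accounting for the attention benchmark above (non-causal convention assumed).
B, H, S, D = 16, 32, 16384, 128
flops = 4 * B * H * S * S * D            # ≈ 7.04e13 FLOPs total
ms_at_676 = flops / 676.26e12 * 1e3      # implied runtime at 676.26 TFLOPS ≈ 104 ms
print(f"{flops / 1e12:.1f} TFLOP, ~{ms_at_676:.0f} ms at 676.26 TFLOPS")
```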
MLA Decoding (H100 80GB HBM3)
| Kernel | Time (ms) | TFLOPS |
|---|---|---|
| HuggingFace MLA | 2.03 | 592 |
| TXL MLA | 2.22 | 754 |
NSA Prefill (H100 80GB HBM3)
| Kernel | Time (us) | TFLOPS |
|---|---|---|
| FlashNSA | 235 | 248.4 |
| TXL NSA | 219 | 266.4 |