News
Abstract: The Kernel k-Means algorithm for clustering extends the classic k-Means clustering algorithm. It uses the kernel trick to implicitly calculate distances in a higher-dimensional space, thus ...
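To illustrate the kernel trick mentioned in this abstract, here is a minimal NumPy sketch of how squared distances to cluster means can be computed from the Gram matrix alone, without ever forming the feature map. It is not taken from the paper behind the snippet: the function names (`rbf_kernel`, `kernel_distances_to_centroids`, `kernel_kmeans`), the RBF kernel, and the random initialization are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) (illustrative kernel choice)."""
    sq_norms = np.sum(X * X, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def kernel_distances_to_centroids(K, labels, n_clusters):
    """Squared feature-space distance from every point to every cluster mean.

    Uses the identity
      ||phi(x_i) - mu_c||^2 = K[i, i]
                              - (2 / |C|)   * sum_{j in C}    K[i, j]
                              + (1 / |C|^2) * sum_{j, l in C} K[j, l]
    so only kernel evaluations are needed.
    """
    n = K.shape[0]
    dist = np.full((n, n_clusters), np.inf)
    diag = np.diag(K)
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            continue  # empty cluster: its distances stay at infinity
        cross = K[:, members].mean(axis=1)            # (1/|C|) * sum_j K[i, j]
        within = K[np.ix_(members, members)].mean()   # (1/|C|^2) * sum_{j,l} K[j, l]
        dist[:, c] = diag - 2.0 * cross + within
    return dist

def kernel_kmeans(K, n_clusters, n_iter=50, seed=0):
    """Lloyd-style iterations driven purely by the Gram matrix K."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_clusters, size=K.shape[0])  # random start (assumption)
    for _ in range(n_iter):
        dist = kernel_distances_to_centroids(K, labels, n_clusters)
        new_labels = np.argmin(dist, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels
```

With a linear kernel (`K = X @ X.T`) the distances reduce to ordinary squared Euclidean distances to the cluster means, which is a convenient sanity check for the identity.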
Abstract: Kernel Principal Component Analysis (KPCA) is a nonlinear feature extraction approach, which generally needs to eigen-decompose the kernel matrix. But the size of the kernel matrix scales with ...
This repository showcases four optimized matrix multiplication kernels, each tailored to specific data characteristics (dense vs. sparse) and hardware platforms (CPU vs. GPU). The implementations ...
This script tests the model’s accuracy and loss on the MNIST test set, validating whether the model works as expected after training with the custom kernel. While the custom CUDA kernel is used ...
FLUTE: A CUDA Kernel Designed for Fused Quantized Matrix Multiplications to Accelerate LLM Inference
Large Language Models (LLMs) face deployment challenges due to latency issues caused by memory bandwidth constraints. Researchers use weight-only quantization to address this, compressing LLM ...
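As a rough picture of what weight-only quantization means, the following NumPy sketch stores a weight matrix as small integers plus per-group scales and dequantizes it before the matrix multiplication. It does not reproduce FLUTE or any GPU kernel: the function names, the symmetric rounding scheme, and the `group_size` parameter are assumptions, and the dequantize step runs separately here rather than fused into the multiply.

```python
import numpy as np

def quantize_weights(W, n_bits=4, group_size=4):
    """Symmetric per-group quantization along the input dimension (illustrative scheme)."""
    out_dim, in_dim = W.shape
    assert in_dim % group_size == 0, "in_dim must be a multiple of group_size"
    qmax = 2 ** (n_bits - 1) - 1                       # 7 when n_bits == 4
    groups = W.reshape(out_dim, in_dim // group_size, group_size)
    scales = np.abs(groups).max(axis=2, keepdims=True) / qmax
    scales = np.where(scales == 0.0, 1.0, scales)      # avoid dividing by zero
    q = np.clip(np.round(groups / scales), -qmax, qmax).astype(np.int8)
    return q, scales

def dequantize_weights(q, scales):
    """Recover an approximate floating-point weight matrix."""
    out_dim = q.shape[0]
    return (q.astype(np.float32) * scales).reshape(out_dim, -1)

def quantized_matmul(x, q, scales):
    """Unfused reference: dequantize first, then multiply activations by W^T."""
    return x @ dequantize_weights(q, scales).T
```

Only the weights are compressed; the activations `x` stay in floating point. The per-element reconstruction error is bounded by half of the group's scale, so smaller groups trade extra scale storage for accuracy.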