GPU Memory Problem - Search News

Nvidia shrinks LLM memory 20x without changing model weights

Nvidia's KV Cache Transform Coding (KVTC) compresses LLM key-value cache by 20x without model changes, cutting GPU memory costs and time-to-first-token by up to 8x for multi-turn AI applications.

20h

Nvidia BlueField-4 STX adds a context memory layer to storage to close the agentic AI throughput gap

Nvidia's BlueField-4 STX reference architecture inserts a dedicated context memory layer between GPUs and traditional storage, claiming 5x token throughput and 4x energy efficiency for agentic AI ...

New memory architecture targets AI inference bottlenecks

Lightbits Labs Ltd. today is introducing a new architecture aimed at addressing one of the most stubborn bottlenecks in large-scale artificial intelligence inference: the growing mismatch between the ...

Hosted on MSN

Intel's Arc B770 gaming graphics rumoured to be cancelled thanks to the memory crisis

Website XDA claims that the Intel Arc B770 graphics card is dead, a casualty of "financial viability." In other words, with memory chips now so expensive, launching a new 16 GB GPU at anything ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results