So Tether, the Stablecoin Company, Suddenly Did AI?

If you’re like me and follow the intersection of AI and crypto, this news probably made you stop and think for a moment.

Tether — yeah, the USDT Tether — launched something called QVAC Fabric LLM. To put it simply: you can now fine-tune large language models on your phone.

Not just running inference. Actually training. On a phone.

My first reaction was “What?” But after going through the technical details, I realized this is way more interesting than it seems at first glance.


So What Exactly Is QVAC Fabric?

Let me start with what problem it solves.

To fine-tune an LLM today, you typically need an A100 or H100 GPU and a cloud account (AWS, Azure, or GCP), and you end up uploading your data to someone else's servers.

For big companies, this isn’t a problem. But for individual developers or SMBs? High costs, big privacy risks, and you’re completely dependent on cloud service providers.

QVAC Fabric’s approach is to move the entire fine-tuning pipeline to local devices. Here’s what it does:

1. Cramming LoRA into llama.cpp

LoRA (Low-Rank Adaptation) is currently the most popular efficient fine-tuning method — it doesn’t touch the original model’s weights, just adds a small set of trainable parameters. QVAC Fabric is the first framework to integrate full LoRA fine-tuning directly into the llama.cpp runtime environment.

This means you don’t need PyTorch, don’t need CUDA — you can run fine-tuning with just llama.cpp.
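To make the idea concrete, here's a minimal numeric sketch of what LoRA does (my illustration of the general technique, not QVAC Fabric's actual code): the big weight matrix stays frozen, and only two small low-rank matrices are trained.

```python
import numpy as np

# Minimal LoRA sketch: instead of updating the frozen weight W,
# we train two small low-rank matrices A and B and add their
# product's contribution to the layer output.

d_in, d_out, rank = 512, 512, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen base weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, rank))                   # trainable, zero init

def lora_forward(x):
    # y = Wx + B(Ax): base path plus a low-rank correction
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
y = lora_forward(x)

# Compare trainable parameter counts
full = W.size
lora = A.size + B.size
print(f"full: {full:,} params, LoRA: {lora:,} params ({lora / full:.1%})")
```

With rank 8 on a 512×512 layer, you train about 3% of the parameters, which is why fine-tuning becomes feasible on weak hardware in the first place.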

2. Using Vulkan Instead of CUDA

This is one of the smartest design decisions.

CUDA only runs on NVIDIA GPUs. Vulkan, by contrast, is a cross-platform GPU compute API: it runs on NVIDIA, AMD, Intel, and Qualcomm Adreno GPUs, and on Apple Silicon via the MoltenVK translation layer.

One codebase, all hardware can run it. Phones, laptops, desktops, servers — same pipeline.

3. Dynamic Tiling Solves Mobile Memory Bottlenecks

Phone GPUs typically only have a few GB of memory — not nearly enough for full matrix operations. QVAC Fabric’s solution is Dynamic Tiling — breaking large matrix operations into small chunks, processing them sequentially, then assembling the results.

You sacrifice some speed, but the tradeoff is: phones can actually run this.
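The idea behind tiling can be shown in a few lines (a toy sketch of the general technique, not QVAC Fabric's actual kernel): split a large matmul into small tiles, compute them one at a time, and accumulate the results.

```python
import numpy as np

# Toy sketch of tiled matrix multiplication: only one small block of
# each operand needs to be "resident" at a time, trading extra passes
# over the data for a much smaller peak memory footprint.

def tiled_matmul(a, b, tile=64):
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # Partial products for each output tile accumulate
                # across the inner (p) dimension.
                out[i:i + tile, j:j + tile] += (
                    a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
                )
    return out

a = np.random.default_rng(1).standard_normal((200, 300))
b = np.random.default_rng(2).standard_normal((300, 150))
assert np.allclose(tiled_matmul(a, b), a @ b)
```

The result is bit-for-bit the same computation, just scheduled so that a few-GB phone GPU never has to hold the full matrices at once.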


How Fast Is It in Real Life?

This is the data I was most curious about:

| Device | Fine-tuning time |
| --- | --- |
| NVIDIA RTX 4090 (desktop GPU) | ~45 minutes |
| Qualcomm Adreno 830 (phone GPU) | ~13 hours |

Thirteen hours sounds like a long time, sure. But this is the first time LLM fine-tuning has been completed on a phone-grade GPU, and you can just let it run overnight while you sleep.

As for quality, their benchmark results are on par with PyTorch on industry-standard tests, and some metrics are even slightly better.


BitNet 1-bit: Making AI Models Ridiculously Thin

Besides LoRA fine-tuning, QVAC also integrates support for Microsoft’s BitNet architecture.

Traditional LLMs store each parameter using 16-bit or 32-bit floating-point numbers. BitNet compresses parameters down to just three values: -1, 0, +1.

What’s the effect? Models that originally took up several GB or even tens of GB can have their memory usage dramatically reduced to what regular phones can handle.
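Here's a back-of-envelope sketch of ternary quantization in the spirit of BitNet (a simplified illustration; the actual BitNet training procedure is more involved). Each weight is mapped to {-1, 0, +1} plus a per-tensor scale, and since 3^5 = 243 fits in a byte, roughly five ternary weights can be packed per byte:

```python
import numpy as np

# Simplified ternary quantization: snap each weight to -1, 0, or +1
# (relative to a per-tensor scale), keeping only the sign information
# for weights whose magnitude clears a small threshold.

def ternary_quantize(w, threshold=0.05):
    scale = np.abs(w).mean()          # per-tensor scale factor
    q = np.zeros_like(w, dtype=np.int8)
    q[w > threshold * scale] = 1
    q[w < -threshold * scale] = -1
    return q, scale

w = np.random.default_rng(0).standard_normal(1_000_000).astype(np.float32)
q, scale = ternary_quantize(w)

# Memory estimate: fp16 needs 2 bytes per weight; ternary needs about
# 1.6 bits per weight (5 weights per byte, since 3**5 = 243 < 256).
fp16_mb = w.size * 2 / 1e6
ternary_mb = w.size * 1.6 / 8 / 1e6
print(f"fp16: {fp16_mb:.1f} MB -> ternary: {ternary_mb:.1f} MB")
```

For a million weights that's 2 MB down to 0.2 MB, a roughly 10x reduction, which is the same arithmetic that shrinks a multi-GB model into phone territory.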

QVAC’s BitNet LoRA framework claims to be the first cross-platform implementation globally — supporting mainstream model architectures like Llama3, Qwen3, and Gemma3.


Why Tether?

You might ask: why would a stablecoin company want to build an AI framework?

Actually, Tether has been positioning themselves for this over the past year. They have a QVAC ecosystem that includes:

  • QVAC Workbench — local AI workstation app
  • QVAC Health — health data AI
  • Genesis II — 148 billion token training dataset
  • And now QVAC Fabric — fine-tuning framework

Paolo Ardoino (Tether CEO) put it plainly:

“AI shouldn’t only be controlled by large cloud platforms. QVAC Fabric lets individuals and enterprises run inference and fine-tune powerful models on their own terms.”

This is actually the same core spirit as the crypto space: decentralization, self-sovereignty, no middlemen.

It’s just that this time, it’s not about finance — it’s about AI.


Why Does This Matter? My Take

Our team works with various AI models every day (Claude, Gemini, MiniMax), and just coordinating the division of labor between them is a skill set of its own. So when I saw this news, my reaction wasn't just "cool," but rather: how will this change the entire ecosystem?

Three directions I think are worth noting:

First, privacy. Right now when you send documents to ChatGPT for analysis, that data goes to the cloud. If you can run a fine-tuned model on your own phone, the data never leaves your device — this is a massive breakthrough for medical, legal, and financial fields.

Second, cost. Fine-tuning a model on the cloud can cost anywhere from tens to hundreds of dollars. If your laptop can do this, the barrier for individual developers and small teams drops to nearly zero.

Third, personalization. Everyone can train an AI that’s entirely their own — using your own data, your writing style, your expertise. Not a generic GPT, but your GPT. This is actually what our team has been doing all along, just using API + prompt engineering + agent architecture. If we can do this locally in the future, a lot of approaches will change completely.


Can You Use It Now?

Yes. QVAC Fabric has been released under Apache 2.0 open-source license, and you can download it directly from GitHub. The fine-tuning part is in the qvac-rnd-fabric-llm-finetune repo. There are also pre-compiled binaries and adapters on Hugging Face.

Supported models include Llama3, Qwen3, and Gemma3, covering iOS, Android, Windows, macOS, and Linux.

Honestly, though, the target audience for this right now is still developers. You need a basic understanding of llama.cpp and model fine-tuning to get started. Given the speed of the open-source community, I'm guessing someone will package this into a one-click install app soon.


Conclusion

This whole “Tether does AI” thing sounds like a weird pivot, but when you think about it, it actually makes a lot of sense — stablecoins want decentralized finance, QVAC wants decentralized AI. The underlying philosophy is exactly the same.

And they’re not just making big promises — they actually built something, open-sourced it, and ran the benchmarks. The mere fact of “fine-tuning LLMs on a phone” is already a technical milestone.

The future of AI doesn’t have to be in the cloud. It might just be in your pocket.

