Running DeepSeek-R1 on refurbished blade server: 01 Introduction

"Intro"

So I started to look into ways to self-host LLMs, including bigger ones like R1.

Let's start by setting some targets for this project.

Extended goals/nice to haves:

"Let's see our options"

I decided to rate options in 6 categories:

GPU-only server

3D render of an HGX A100 server from Nvidia promotional materials (source: nvidianews.nvidia.com)

Using something like an Nvidia HGX A100 (8x 80 GB of VRAM)

Cost: Arm, leg, house, and everything you probably own x2
Total memory: 1 TB+ fits on two of those (640 GB each)
Speed: 5/5 (Parallel inference possible)
Power consumption: 3/5 (4-5 kW)
Bulkiness: 4/5 (2-4U server equivalent)
Looks: 4/5 (Nvidia servers often look cool)
Partner approval factor: 0/5 ("No, we are not selling the house")
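As a rough sanity check on the "1 TB+" figures, the sketch below estimates how much memory R1's weights alone need at a few precisions, and how many 640 GB HGX A100 boards that means. It assumes roughly 671 billion total parameters and counts only raw weights (no KV cache, activations or runtime overhead), so real requirements are higher.

```python
import math

# Assumption: DeepSeek-R1 has ~671e9 total parameters (the published figure).
# Only raw weights are counted; KV cache, activations and runtime overhead are ignored.
TOTAL_PARAMS = 671e9
HGX_A100_GB = 8 * 80  # one HGX A100 board: 8 GPUs x 80 GB of VRAM

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "4-bit": 0.5}

def weights_gb(params: float, bytes_per_param: float) -> float:
    """Size of the raw weights in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

def servers_needed(size_gb: float, per_server_gb: float = HGX_A100_GB) -> int:
    """How many servers are needed just to hold the weights."""
    return math.ceil(size_gb / per_server_gb)

for fmt, bpp in BYTES_PER_PARAM.items():
    size = weights_gb(TOTAL_PARAMS, bpp)
    print(f"{fmt:>5}: {size:7.1f} GB -> {servers_needed(size)} HGX A100 server(s)")
```

Under these assumptions FP8 weights come to about 671 GB, slightly more than one board's 640 GB, so two servers are needed before any room is left for the KV cache; 4-bit weights (about 336 GB) would fit on one.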

Use N Mac Studios with Exo to distribute the weights

Screenshot of a post by user alexocheema on Twitter (x.com): "AGI at home. Running DeepSeek R1 across my 7 M4 Pro Mac Minis and 1 M4 Max MacBook Pro. Total unified memory = 496GB. Uses exolabs distributed inference with 4-bit quantization. Next goal is fp8 (requires >700GB)." The photo shows 7 Mac Minis stacked on top of each other next to the MacBook (source: x.com).
Cost: 24k+ (for 512 GB), 48k+ (for 1024 GB)
Total memory: 1 TB+
Speed: 3/5
Power consumption: 3/5 (1.5 kW)
Bulkiness: 3/5 (unstable)
Looks: 5/5 (it's a Mac, people will like it)
Partner approval factor: 1/5 (I don't know how to explain that I spent 50k on 16 Mac Minis)
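Exo spreads the model across all the machines in the cluster. How Exo actually decides the split is up to Exo; the sketch below is only a hypothetical illustration of one simple approach: give each machine a contiguous block of layers, sized in proportion to its memory. `partition_layers` is a made-up helper, the 61-layer count is an assumption, and every layer is treated as the same size (real models are not that uniform).

```python
def partition_layers(num_layers: int, node_mem_gb: list[float]) -> list[range]:
    """Give each node a contiguous block of layers, sized roughly in
    proportion to its share of the cluster's total memory."""
    total_mem = sum(node_mem_gb)
    ranges = []
    start = 0
    cumulative = 0.0
    for i, mem in enumerate(node_mem_gb):
        cumulative += mem
        if i == len(node_mem_gb) - 1:
            end = num_layers  # last node takes the remainder, so no layer is dropped
        else:
            end = round(num_layers * cumulative / total_mem)
        ranges.append(range(start, end))
        start = end
    return ranges

# Hypothetical cluster: two 128 GB machines and two 64 GB machines.
# Assumption: 61 transformer layers, all treated as the same size.
for node, layers in enumerate(partition_layers(61, [128, 128, 64, 64])):
    print(f"node {node}: layers {layers.start}-{layers.stop - 1} ({len(layers)} layers)")
```

Computing the boundaries from the running memory total (rather than rounding each node's share separately) keeps the ranges contiguous and guarantees every layer lands on exactly one machine.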

Multiple used GPUs in a "server"

Photo of an 8x RTX 3090 rig, all connected to one motherboard (source: reddit.com)

This rig, for example, has 192 GB of VRAM (8x 24 GB); my target would need at least 3 of those.
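A tiny Python calculation of how many such rigs different VRAM targets take. It assumes 24 GB per RTX 3090 and 8 GPUs per rig, matching the rig above; the 512 GB and 1024 GB targets are illustrative.

```python
import math

RTX_3090_VRAM_GB = 24
GPUS_PER_RIG = 8
RIG_VRAM_GB = RTX_3090_VRAM_GB * GPUS_PER_RIG  # 192 GB per 8-GPU rig

def rigs_needed(target_gb: float, rig_gb: float = RIG_VRAM_GB) -> int:
    """Number of 8-GPU rigs needed to reach a given amount of pooled VRAM."""
    return math.ceil(target_gb / rig_gb)

for target_gb in (512, 1024):
    rigs = rigs_needed(target_gb)
    print(f"{target_gb} GB -> {rigs} rigs ({rigs * GPUS_PER_RIG} GPUs)")
```

With these numbers 512 GB already takes 3 rigs, and a full 1 TB takes 6 rigs (48 GPUs).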

Cost: 20k+
Total memory: 1 TB+, which will probably require multiple motherboards
Speed: 4/5 (Parallel inference possible)
Power consumption: 1/5 (Metric fuck ton)
Bulkiness: 5/5
Looks: 2/5 (Ethereum miner in 2020)
Partner approval factor: 1/5 (I asked and was forbidden from filling a whole room with GPUs)

Using a server motherboard: CPU inference

Photo of an AMD Epyc server with a water-cooling loop in a DIY metal cube case (source: digitalspaceport.com)
Cost: 2-3k
Total memory: 1 TB+
Speed: 2/5
Power consumption: 5/5 (~400 W)
Bulkiness: 2/5
Looks: 3/5 (I like this techno-grunge, but some people might find it scary)
Partner approval factor: 4/5 ("Looks like overpowered gaming PC")

Blade server motherboard

Or as I like to call it: "Great, but can we make it cheaper"

After reviewing my options, I realized that a server motherboard is my local optimum. This category also has several sub-options:

Since inference is memory-bandwidth limited, DDR3 and older are out. 1U/2U servers are still bulky and noisy, and that can't be fixed easily. Then, while browsing eBay, I saw that you can get a 1U server blade for $20 (CPU included), and blades have several additional advantages over a regular server motherboard.
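To make "memory-bandwidth limited" concrete: when generating one token at a time, every active weight has to be read from RAM once per token, so peak memory bandwidth divided by the bytes read per token gives a ceiling on decode speed. A back-of-the-envelope sketch, assuming about 37 billion active parameters per token (R1 is a mixture-of-experts model), FP8 weights, and a dual-socket board with 4 memory channels per socket; these are assumptions, not measurements.

```python
def channel_bandwidth_gbs(mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak of one memory channel in GB/s (a 64-bit channel moves 8 bytes per transfer)."""
    return mega_transfers_per_s * bus_bytes / 1000

def max_tokens_per_s(bandwidth_gbs: float, active_params: float, bytes_per_param: float) -> float:
    """Ceiling on decode speed if every token has to stream all active weights from RAM once."""
    gb_per_token = active_params * bytes_per_param / 1e9
    return bandwidth_gbs / gb_per_token

# Assumptions (not measured): ~37e9 parameters active per token in R1's
# mixture-of-experts layers, FP8 weights (1 byte each), and 2 sockets x 4 channels.
ACTIVE_PARAMS = 37e9
CHANNELS = 2 * 4

for name, mts in (("DDR3-1600", 1600), ("DDR4-2400", 2400)):
    bandwidth = CHANNELS * channel_bandwidth_gbs(mts)
    ceiling = max_tokens_per_s(bandwidth, ACTIVE_PARAMS, bytes_per_param=1.0)
    print(f"{name}: {bandwidth:.1f} GB/s peak -> at most {ceiling:.1f} tokens/s")
```

With these numbers DDR4-2400 gives a ceiling of roughly 4 tokens/s versus under 3 for DDR3-1600. Real throughput will be lower (NUMA effects, imperfect bandwidth utilization), but the ratio between generations stays the same, which is why older memory drops out.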

Cost: Memory and a PSU have to be bought separately, but the CPU is included (add $600 for memory, $200 for the PSU)
Total memory: 1 TB, theoretically 2 TB (if you can find 128 GB modules)
Bulkiness: 5/5 (it's basically half of a 1U server), around 3-4 Mac Minis in length
Looks: 4/5 (Techno-grunge, but smaller == cuter, right? Like cats and dogs)
Partner approval factor: 4/5 (If you can at least make it quiet)

Speed and power consumption stay the same as with server motherboards.
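Putting the cost figures from the ratings side by side as price per GB of memory makes the gap obvious. This is a back-of-the-envelope sketch: it uses the lower-bound prices quoted above and treats every option as ending up with about 1 TB (1024 GB); the HGX option is left out because it has no concrete price.

```python
# (option, cost in USD, memory in GB): lower-bound prices from the ratings above,
# each option treated as reaching roughly 1 TB (1024 GB) of memory.
OPTIONS = [
    ("Mac cluster with Exo", 48_000, 1024),
    ("Used RTX 3090 rigs", 20_000, 1024),
    ("Server motherboard (CPU inference)", 2_000, 1024),
    ("Blade: $20 board + $600 RAM + $200 PSU", 20 + 600 + 200, 1024),
]

def usd_per_gb(cost_usd: float, memory_gb: float) -> float:
    """Price of one GB of model-holding memory."""
    return cost_usd / memory_gb

# Cheapest memory first.
for name, cost, mem in sorted(OPTIONS, key=lambda o: usd_per_gb(o[1], o[2])):
    print(f"{usd_per_gb(cost, mem):6.2f} $/GB  {name}")
```

At about $820 in total, the blade works out to well under $1 per GB, more than twice as cheap as the lower bound for a regular server motherboard.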

So, meet the specimen:

Supermicro X10DRT-P

Photo of the Supermicro X10DRT-P server blade

A short list of its features:

So, yes, this blade has 2 obvious problems: power and cooling.

Power is provided through a proprietary Supermicro blade connector:
Photo of the backplane connector on Supermicro X10 blades
Some comments on Reddit state that the blades are impossible to use without the proper chassis because of the communication protocol between them.
At the same time, there is a post where someone boots a blade server on its own.

So at least some people are able to use a blade without the backplane.

Cooling in blade servers is usually handled by the chassis fans, but this motherboard has 2 exposed fan headers, and hopefully I can hook fans up to them.

That's the current state for now.
In the next post I will cover power, and hopefully get to a first boot.

When the next part is available, the link will be here: Running DeepSeek-R1 on refurbished blade server: 02 Power.

Bonus section of exotic (dumb) ideas

"just because you can doesn’t mean you should"

Here is a list of ideas that I came up with but decided not to pursue (for various reasons). They are sorted by how much madness it takes to implement them.

Use several NVMe drives and mmap

Plan: Get a motherboard with a lot of NVMe slots and buy N NVMe drives with the highest IOPS available.
Pros: Could be super cheap, and probably the smallest of all the options.
Cons: Even slower than Optane, issues with random access speeds
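The mmap part of this idea works roughly like this: the weights file is mapped into the process's address space, and the OS reads pages from disk only when they are touched (plus whatever readahead it decides on), so the model does not have to fit in RAM. Every page miss during inference becomes a read from the NVMe drives, which is where the random-access cost shows up. A minimal sketch with Python's standard `mmap` module; the file is a small dummy stand-in, not a real weights file, and `read_slice` is a hypothetical helper.

```python
import mmap
import os
import tempfile

def read_slice(path: str, offset: int, length: int) -> bytes:
    """Map the whole file read-only and copy out one slice.
    The OS only has to load the pages covering [offset, offset + length)."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return mm[offset:offset + length]

# Demo with a small stand-in "weights" file (dummy bytes, not a real model).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 4096)  # 1 MiB of repeating 0x00..0xff
    path = tmp.name

chunk = read_slice(path, offset=4096, length=16)
print(chunk.hex())
os.remove(path)
```

In a real setup the file would be the model weights spread across the NVMe drives, and the inference engine would touch different slices for every token, so drive IOPS and latency, not capacity, become the bottleneck.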

Optane memory server

Plan: Buy a server that supports Optane persistent memory, plus 8 Optane sticks of 128 GB each.
Pros: 128 GB Optane sticks cost like dirt
Cons: Very slow, and it's hard to get a server that supports them (I wasn't able to source one for a reasonable price)

64x 16 GB Raspberry Pis (or other SBCs)

Plan: Get three 48-port 1 Gb switches, interconnect them, struggle, cry
Pros: You can become a popular SBC blogger
Cons: Costs a lot of money, slow inference

A lot of smaller PCs and older gaming rigs (16 or 32 GB ones)

Plan: Raid every eBay listing, Facebook Marketplace and thrift shop, and build a giant cluster out of what you find.
Pros: It's your chance to meet new people, and you will see a lot of new places around the city (or even neighboring ones)
Cons: A lot of struggle to interconnect them, huge power consumption, constant hardware failures

LLM@Home

Plan: Organize a community of people who each host part of the weights on their PCs, then compute and combine the results.
Pros: Basically free (you're not the one paying for the hardware), and for batch inference it could actually work (like how SETI@home works).
Cons: Slow, unreliable, huge overhead

Tape drives

Plan: Get several older-generation tape drives, connect them in parallel, and distribute the weights between them.
Pros: Very comedic; getting 1 token per round of tape shuffling will make you appreciate the speed of modern computers.
Cons: No cons. Please do it and share your video or post with me, I want to see it.