Building an AI Hardware Agent: 1. The Idea and the Shopping List
- doctorsmonsters
Here's a crazy idea, though I'm sure many of us AI enthusiasts have thought about it at some point: what if you could carry a small device in your pocket, plug it into any computer, and have an AI take over that machine, typing, clicking, and navigating, all without installing anything on it?
That’s exactly what I’m going to try to build. And I’m going to document the whole journey here.
So What’s the Idea?
Picture this. You have two machines. The first is the controller, which I'm calling MK (a.k.a. Mind Control). The second is the target: SUB (duh!). MK connects to SUB the same way a keyboard and mouse would: through USB. As far as SUB is concerned, it just has normal peripherals plugged in. It has absolutely no idea that the keystrokes and mouse clicks are coming from an AI.
But here’s the cool part. MK also captures SUB’s display output via HDMI. So it can actually see what’s on screen, send that to a vision-capable LLM, figure out what to do next, and send the right keyboard or mouse commands back. See the screen, think, act, repeat. A fully autonomous loop.
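In Python, the loop I have in mind looks roughly like this. Treat it as a sketch, not working code: the device paths, the JSON command format, and the ask_llm_for_action() placeholder (sketched a bit further down) are all assumptions I'll be testing once the parts arrive.

```python
# Rough sketch of MK's see-think-act loop. Device paths and the command
# format are assumptions, nothing here is tested yet.
import base64
import json
import time

import cv2      # pip install opencv-python
import serial   # pip install pyserial


def capture_frame(cam):
    """Grab one frame from the capture card (it shows up as a webcam) as base64 JPEG."""
    ok, frame = cam.read()
    if not ok:
        raise RuntimeError("capture card returned no frame")
    ok, jpg = cv2.imencode(".jpg", frame)
    return base64.b64encode(jpg.tobytes()).decode("ascii")


def ask_llm_for_action(image_b64):
    """Placeholder for the vision-LLM call (sketched further down). Should return
    something like {"type": "click", "x": 540, "y": 320} or {"type": "type", "text": "hi"}."""
    raise NotImplementedError


def send_to_pico(port, action):
    """Forward the chosen action to the Pico as one newline-terminated JSON line."""
    port.write((json.dumps(action) + "\n").encode("utf-8"))


def main():
    cam = cv2.VideoCapture(0)                     # the HDMI capture card
    pico = serial.Serial("/dev/ttyUSB0", 115200)  # the Pico, via the CP2102 adapter
    while True:
        action = ask_llm_for_action(capture_frame(cam))
        send_to_pico(pico, action)
        time.sleep(1.0)  # give SUB a moment to react before looking again


if __name__ == "__main__":
    main()
```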
Why go through all this trouble instead of just using remote desktop software or something like that? A few reasons:
It works on literally anything. Windows, macOS, Linux, even at the BIOS level before any OS loads. It doesn’t care.
Nothing gets installed on the target. SUB just sees a keyboard and mouse. No software footprint, no agent running, nothing suspicious.
SUB doesn’t even need internet. Only MK needs a connection to hit the LLM API.
The dream endgame is a portable, self-contained device running a local LLM that you can just plug into any computer and let it do its thing. But let’s not get ahead of ourselves — we’re starting from scratch.

Talking It Through with AI
Before buying anything or writing any code, I spent some time chatting with Claude and ChatGPT about whether this is even feasible and what the best approach would be. Turns out, it’s very much doable. The setup breaks down into three pieces:
The input side — a microcontroller that plugs into SUB’s USB port and pretends to be a keyboard and mouse. MK tells it what to type or where to click, and it sends those signals to SUB as normal HID input.
The display side — an HDMI capture card that grabs SUB’s screen. SUB thinks it’s outputting to a monitor, but really MK is capturing the frames and feeding them to the LLM.
The brain — software running on MK that captures screenshots, sends them to a vision LLM, parses the response, and tells the microcontroller what to do next.
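For the brain piece, the call to the vision LLM could look something like this. I'm using the Anthropic Python SDK here purely as an example; the prompt, the action schema, and the model name are placeholders until I actually start testing.

```python
# Sketch of the "brain" call: send one screenshot to a vision-capable LLM and
# ask for a single JSON action back. Assumes `pip install anthropic` and an
# ANTHROPIC_API_KEY in the environment; the prompt and schema are just one idea.
import json

import anthropic

PROMPT = (
    "You are controlling a computer through its keyboard and mouse. "
    "Look at this screenshot and reply with exactly one JSON object, e.g. "
    '{"type": "click", "x": 540, "y": 320} or {"type": "type", "text": "notepad"}.'
)


def ask_llm_for_action(image_b64: str) -> dict:
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-sonnet-4-20250514",  # any vision-capable model would do
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/jpeg",
                            "data": image_b64}},
                {"type": "text", "text": PROMPT},
            ],
        }],
    )
    # A real version would validate this; for now, trust the model to return JSON.
    return json.loads(message.content[0].text)
```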
Both Claude and ChatGPT pointed me toward the same components, so I felt pretty good about the choices.
The Shopping List
Here's what I ordered based on their recommendations:
Raspberry Pi Pico 2 W — $16.31
This little guy plugs into SUB and acts as a keyboard and mouse. It runs the newer RP2350 chip and has built-in WiFi and Bluetooth, which means MK can send it commands wirelessly. It supports CircuitPython for easy HID programming, including absolute mouse positioning — which is important because the LLM will be saying things like “click at pixel 540, 320” and we need to hit that exact spot.
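To give a flavor of how simple the HID side can be, here's a minimal CircuitPython sketch for the Pico: enumerate as a USB keyboard and type whatever text arrives over the UART. It assumes the adafruit_hid library is on the board and the serial adapter is wired to GP0/GP1. Absolute mouse positioning needs a custom HID report descriptor on top of this, and a real firmware would parse proper commands instead of raw text.

```python
# code.py on the Pico: act as a USB keyboard and type whatever line of text
# arrives over UART. Pin choices and the newline protocol are assumptions.
import board
import busio
import usb_hid
from adafruit_hid.keyboard import Keyboard
from adafruit_hid.keyboard_layout_us import KeyboardLayoutUS

uart = busio.UART(board.GP0, board.GP1, baudrate=115200)  # TX, RX to the CP2102
keyboard = Keyboard(usb_hid.devices)
layout = KeyboardLayoutUS(keyboard)

buffer = b""
while True:
    data = uart.read(32)               # returns None when nothing has arrived
    if data:
        buffer += data
        while b"\n" in buffer:
            line, _, buffer = buffer.partition(b"\n")
            layout.write(line.decode("utf-8"))  # SUB just sees ordinary keystrokes
```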
UGREEN Full HD 1080P HDMI Capture Card — $18.99
This is how MK sees SUB’s screen. SUB’s HDMI plugs into this, and it connects to MK over USB. It shows up as a regular webcam, so no special drivers needed — just grab frames with OpenCV or FFmpeg. Supports 4K input, outputs at 1080p, which is plenty for what we need.
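Since the card shows up as a webcam, a quick sanity check is just a few lines of OpenCV: grab one frame and save it to disk to confirm MK really sees SUB's screen. The device index is an assumption; it may not be 0 on every machine.

```python
# One-off sanity check for the capture card: grab a single frame and save it.
import cv2

cam = cv2.VideoCapture(0)                     # the capture card, often index 0
cam.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)       # ask for the full 1080p frame
cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)
ok, frame = cam.read()
if ok:
    cv2.imwrite("sub_screen.jpg", frame)
    print("Saved a frame:", frame.shape)      # expect (1080, 1920, 3)
else:
    print("No frame - check the HDMI cable and the device index")
cam.release()
```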
HiLetgo CP2102 USB to UART Serial Adapter — $7.39
This gives us a wired serial connection between MK and the Pico. Yeah, the Pico has WiFi, but having a UART link is great for debugging and as a reliable fallback. Comes with jumper wires, which is nice.
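And here's a quick way to test that wired link once everything is plugged in: list the serial ports, pick out the CP2102, and push one test command through. The port name and the JSON format are assumptions about how I'll end up writing the firmware.

```python
# Smoke test for the MK -> Pico serial link over the CP2102 adapter.
import serial
from serial.tools import list_ports

for p in list_ports.comports():
    print(p.device, "-", p.description)   # the CP2102 should show up here

port = serial.Serial("/dev/ttyUSB0", 115200, timeout=2)  # e.g. COM3 on Windows
port.write(b'{"type": "type", "text": "hello from MK"}\n')
print("Reply:", port.readline())          # empty bytes unless the firmware echoes
port.close()
```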
Total: ~$43
Not bad at all. Under fifty bucks to get this experiment off the ground. I confirmed I was ordering the right products by uploading a screenshot of my shopping cart to Claude.
Now We Wait
Orders are placed and I’m waiting for everything to show up. In Part 2, I’ll walk through the physical setup — how to wire everything together so SUB sees a keyboard and mouse while MK captures the screen and pulls the strings.
Stay tuned.