Building a robotics research setup that lives next to my desk

D dfdxlabs.com ↗

▲ 177 points • 60 comments • by mplappert • 6d ago • HN discussion ↗

[Video: The finished setup in action, setting up a chessboard via teleoperation. Visible are the different camera feeds, the human operator, and the sensed robot state; the operator can switch between cameras.]

Robotics research has become cheap and accessible enough that small teams, and even individuals, can now do meaningful research on real hardware. There are two reasons for this. First, capable robot hardware has become dramatically more affordable: the physical setup below uses an industrial-grade arm, two cameras, and a full teleoperation setup while staying below €5,000.[1] Second, there is now a steady supply of publicly available foundation models that are suitable for robotics. Hugging Face’s LeRobot, for example, is built around the same idea of democratizing state-of-the-art robotics research.

[1] This figure excludes VAT and the cost of compute.

I have some history with this. Between 2017 and 2020, I did robotic manipulation research at OpenAI, first on a humanoid hand and then on a tabletop. The tabletop setup I worked with around 2019/2020 was roughly an order of magnitude more expensive than the one described here. The comparison is not perfect, but the fact that this version is even in the same category of usefulness at this price point is the important change. Back then, this kind of work required a team of around 20 people. If my thesis is right, a single person at a desk should be able to get surprisingly far today.

So, to test this thesis, I’ve decided to just do it: I will spend the next several months doing independent research on robotic manipulation, and I will do it in the open. I don’t expect the main output to be papers or an open-source codebase.[2] What I care about here is the research log itself: what works, what fails, and what I learn from running the system.

[2] I currently don’t plan to open-source the code described here. Maintaining an open-source project is real work, and I’d rather spend that time on research. This might change.

This note covers step one: building the full foundation for doing research. The first half is about the physical setup: an industrial-grade robot arm, two cameras, and teleoperation in a package small enough to live next to my desk. The second half is about the software stack I wrote from scratch to operate it.

The video above shows the result in action. This is an experiment and the plan might change. But I’m excited.

Requirements

From past experience, I know that robotics research should be done on actual hardware, so step one is building a setup that I can experiment on. Before buying anything, I wrote down a few requirements. They apply to the system as a whole, meaning both the physical setup and the software that operates it:

- Less than €10,000
- Small enough that I can put it on or next to my desk
- Parts readily available (no enterprise sales)
- Easy to use via Python
- Unopinionated about the software stack (since I want to build my own)

The €10,000 limit was not derived from a detailed estimate. At the time, I mostly did not know what the final system would cost. The number was useful as a ceiling: high enough that I would not have to optimize every component for price, but low enough that the setup would still be affordable enough for my scale. These five constraints explain most of the decisions in the rest of this post.

Physical setup

I decided to build a setup for tabletop manipulation with a single arm. Tabletop manipulation is nice because it offers endless tasks of varying difficulty: for example, you can start with a basic single-object pick-and-place task but gradually move towards setting up a chessboard or assembling Lego, all within the same physical setup.[3]

[3] We had the same reasoning 6 years ago on the OpenAI Robotics team. After solving the Rubik’s cube, we moved towards a tabletop setup because it can support so many different tasks, and we were interested in general-purpose robotics.

I opted for a single robot arm instead of a bimanual setup for simplicity, space, and cost reasons. This choice, however, imposes some real limitations on what types of tasks I can do: for example, folding a shirt with a single arm is probably impossible.
But a single arm still leaves plenty of interesting tabletop tasks, and it forces a useful kind of constraint: the policy has to compensate for missing hardware with behavior. It can push an object against another object or the table edge to hold it in place, reposition something before grasping it, or use the environment as part of the manipulation strategy. For now, that is exactly the regime I want to study. For vision, I use a wrist-mounted camera and a stationary camera.

A constraint I have here is space: I cannot build a fully integrated “robot cage” lab setup, which means that the positions of the cameras, the lighting conditions, and the background within the field of view will change over time. The trade-off is that the data will be messier than in a fixed lab setup. However, I think of this as a feature and not a bug: for robots to become truly useful, they must work under exactly these circumstances.

To test the setup and to record data, I use a 6-DoF space mouse to teleoperate it.

I use a simple, foldable IKEA table next to the robot to separate my own workspace (which tends to be cluttered with various objects) from the robot’s. It’s also safer. Because the robot “sits” directly next to me, using it is very low friction. While originally motivated by space constraints, I really enjoy this setup for quick iteration and development work throughout the day.

Depicted below is the full physical setup that I’ve ended up with. I’ll describe the individual components in greater detail below.

[Figure: The robotics setup (front and top view). Visible are (1) the UFACTORY xArm Lite 6, (2) the Intel RealSense D405 wrist camera, (3) the Logitech C920 table camera, (4) the 3Dconnexion SpaceMouse Wireless for teleoperation, and (5) a foldable table.]

Bill of materials

The full bill of materials is below. All prices are what I paid at the time, excluding VAT. I also include links to the places where I bought them for convenience, but all parts should be readily available from various other resellers.

| Product | Price | Purchase Link |
| --- | --- | --- |
| UFACTORY xArm Lite 6 | €3,403.32 | Reichelt |
| UFACTORY xArm Lite 6 gripper | €444.50 | Reichelt |
| UFACTORY xArm camera mount | €89.03 | Reichelt |
| Intel RealSense D405 camera | €302.51 | MyBotShop |
| Logitech C920 camera | €47.86 | Reichelt |
| USB cable for Intel RealSense (3m) | €20.25 | Reichelt |
| SmallRig Desktop Magic Arm | €28.90 | Foto Koch |
| 3Dconnexion SpaceMouse Wireless | €174.70 | Amazon |
| AGPTEK cable clips | €8.32 | Amazon |
| IKEA SUNDSÖ folding table | €50.41 | IKEA |
| Total | €4,569.80 | |

The total comes to €4,569.80, excluding VAT and compute. That is less than half of the €10,000 budget I set in my requirements. The important part is not that this is cheap in absolute terms, but that it is cheap enough for an individual or small team to iterate on real hardware.

Compute is the one caveat. You obviously need GPU compute to train policies and, eventually, serve them. I left it out because I already had compute available,[4] and I suspect the same is true for many researchers.

[4] The compute I already had is an NVIDIA DGX Spark box.

Robot arm

I picked the UFACTORY xArm Lite 6 because I wanted to have a reliable, industrial-grade robot arm. I think cheaper arms like LeRobot SO-101, OpenArm, and the Robot Learning Company are interesting and I’m glad they exist.[5] However, my past research experience has taught me that buying a precise, mature, and durable robot arm makes everything so much easier: they just work and they rarely break. The UFACTORY arm in particular is very appealing because it is surprisingly affordable and comes with a pragmatic Python SDK.

[5] In fact, I also bought a LeRobot SO-101 kit. It’s much more affordable (around €450) but also obviously much more toy-ish.

[Figure: The UFACTORY xArm Lite 6 with the first-party wrist camera mount and the xArm Lite 6 gripper. I use cheap cable clips from Amazon to route the wrist camera's USB cable.]

So far, I’m extremely happy with my choice. The robot arm comes in a nice case (easy to transport if I ever have to) and seems very well-built. It also comes with a base and two clamps to fix it to a table, an emergency stop button, and an external power supply (supports both 110V and 220V).

The setup was extremely simple. I estimate that it took me roughly 30 minutes to go from unboxing to operating it for the first time. The arm connects via Ethernet and offers a convenient web interface to operate it.

[Figure: The UFACTORY web interface, which runs directly on the robot. It's very convenient for initial testing and setup.]

Beyond the web interface, the first-party Python SDK makes operations very simple. The arm can be actuated via joint positions or velocities, but it also supports actuation in TCP space.[6] The latter works great and it’s what I use in practice.

[6] Tool center point (TCP) is the reference point or frame attached to the end effector whose Cartesian pose you care about. For a gripper, this is often somewhere between the fingers rather than the physical mounting point. Actuation in TCP space means that you command the desired pose or velocity in Cartesian space and the robot controller computes the corresponding joint motion.

The robot also already comes with several safety features: it has self-collision avoidance, configurable global speed and acceleration limits, detects and avoids joint limit violations, and senses and aborts if too much force is detected (with the sensitivity being configurable as well). It further supports a “teach mode” where a human operator can freely move the arm around.

For the gripper, I decided to use UFACTORY’s xArm Lite 6 parallel gripper. The gripper works but it’s the weakest part of the setup. It’s pneumatically actuated, so it’s quite noisy when turned on, and rather weak. There are no sensors in the gripper itself, so determining the open/closed state can only be done by reading the control signal.

[Figure: The parallel gripper width is very narrow, so you can only pick up small objects. It can be reconfigured into a "wide" configuration by flipping the fingers 180 degrees, but then it does not fully close.]

It’s also quite inflexible: the gripper is able to fully close, but then the maximum width when opened is very narrow (depicted above). It’s possible to reconfigure the gripper into a “wide” configuration. This is done by unscrewing the two fingers, swapping them, and screwing them back in. In this configuration, the gripper is much wider when opened, but it cannot fully close anymore.
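
To make two ideas from this section concrete, here is a minimal sketch of one teleoperation step: mapping 6-DoF space-mouse input to a TCP-space velocity command, and tracking the gripper state from the control signal because the gripper has no sensor. This is not the code from my stack, and it does not call the UFACTORY SDK. The names `Teleop`, `TeleopCommand`, and `apply_deadband`, the units (mm/s and deg/s), the default limits, and the assumption that the gripper starts open are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class TeleopCommand:
    """One control step: a TCP-space velocity plus the gripper command."""
    linear_mm_s: Vec3      # translation of the tool center point (assumed unit)
    angular_deg_s: Vec3    # rotation of the tool center point (assumed unit)
    gripper_closed: bool   # last commanded state, not a measurement


def apply_deadband(value: float, deadband: float) -> float:
    """Zero out small inputs, then rescale the remainder back to [-1, 1]."""
    if abs(value) <= deadband:
        return 0.0
    sign = 1.0 if value > 0 else -1.0
    return sign * (abs(value) - deadband) / (1.0 - deadband)


class Teleop:
    """Hypothetical helper; all names and default limits are illustrative."""

    def __init__(self, max_linear_mm_s: float = 100.0,
                 max_angular_deg_s: float = 30.0,
                 deadband: float = 0.1) -> None:
        if not 0.0 <= deadband < 1.0:
            raise ValueError("deadband must be in [0, 1)")
        self.max_linear_mm_s = max_linear_mm_s
        self.max_angular_deg_s = max_angular_deg_s
        self.deadband = deadband
        # No gripper sensor: remember what was commanded last.
        # Assumes the gripper is open when teleoperation starts.
        self.gripper_closed = False

    def step(self, axes: Sequence[float],
             toggle_gripper: bool = False) -> TeleopCommand:
        """Turn normalized axes (x, y, z, roll, pitch, yaw) into a command."""
        if len(axes) != 6:
            raise ValueError("expected 6 axes: x, y, z, roll, pitch, yaw")
        # Clip to [-1, 1] first so bad input cannot exceed the speed limits.
        shaped = [
            apply_deadband(max(-1.0, min(1.0, a)), self.deadband)
            for a in axes
        ]
        if toggle_gripper:
            self.gripper_closed = not self.gripper_closed
        linear = (shaped[0] * self.max_linear_mm_s,
                  shaped[1] * self.max_linear_mm_s,
                  shaped[2] * self.max_linear_mm_s)
        angular = (shaped[3] * self.max_angular_deg_s,
                   shaped[4] * self.max_angular_deg_s,
                   shaped[5] * self.max_angular_deg_s)
        return TeleopCommand(linear, angular, self.gripper_closed)
```

In a real loop, each `TeleopCommand` would be forwarded to the arm's Cartesian velocity interface and to the gripper open/close call; that part is left out here on purpose. Keeping the mapping a pure function of the input makes it easy to test without hardware, and the clamp acts as a second line of defense on top of the arm's own speed limits.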