A few years ago I started working on turning a radio-controlled toy car into an autonomous vehicle. Thanks to great resources like Berkeley’s CS 188: Introduction to Artificial Intelligence and Sebastian Thrun’s Artificial Intelligence for Robotics course on Udacity, I learned about localization and search algorithms and PID controllers. My idea was to strap a Raspberry Pi on the back, use its camera to run an ML model for distance/object detection, and then feed those outputs to a traditional localization algorithm. Then we’d go from there. If I could just get it to figure out where it was, or how much it had moved from the “origin”, I’d be happy.
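That last goal is basically dead reckoning: integrate speed and heading over time to track a pose relative to the origin. A minimal sketch, with made-up numbers standing in for real sensor readings:

```python
import math

# Dead-reckoning sketch: integrate speed and heading over time to track
# the car's pose relative to the origin. The (speed, turn_rate, dt)
# triples are made-up numbers standing in for real sensor readings.
x, y, heading = 0.0, 0.0, 0.0
for speed, turn_rate, dt in [(0.5, 0.0, 1.0), (0.5, 0.3, 1.0), (0.5, 0.3, 1.0)]:
    heading += turn_rate * dt
    x += speed * math.cos(heading) * dt
    y += speed * math.sin(heading) * dt

print(f"pose: ({x:.2f} m, {y:.2f} m), heading {heading:.2f} rad")
```

Real wheel speeds and camera estimates are noisy, of course, which is exactly where the localization algorithms from those courses come in.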
Keep in mind this is on an ARM CPU, with about 1 GB of RAM and a crappy embedded Broadcom GPU.
At the time, TensorFlow did not distribute binaries for any architecture other than x86_64, so I tried to compile it on the RPi. Oh, what a mistake that was! I burned the poor CPU for 20 minutes and was probably not even a few percent done. Then I tried to cross-compile and never got that working.
Anyways, while mired in these practical problems, I had a vision for how I wanted it to work.
I’d write a little JSON or YAML file like a package.json that’d look something like this:
```yaml
models:
  - name: obj-distance-detection
    pretrained: yolo
    datasets:
      - s3://some-open-distance-dataset
hooks:
  - name: localization
    input: boundingbox, distance
    file: localize.py
```
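To make that concrete, here’s a sketch of what a runner for this manifest might do. None of this is a real tool’s API; `load_pretrained` and `fetch_dataset` are hypothetical stubs:

```python
import yaml  # PyYAML

def load_pretrained(name: str):
    """Stub: a real tool would pull weights from a model registry."""
    print(f"fetching pretrained weights for {name}")

def fetch_dataset(uri: str):
    """Stub: a real tool would sync the dataset to local storage."""
    print(f"syncing {uri}")

def run_manifest(path: str):
    spec = yaml.safe_load(open(path))
    for model in spec.get("models", []):
        load_pretrained(model["pretrained"])
        for uri in model.get("datasets", []):
            fetch_dataset(uri)
    for hook in spec.get("hooks", []):
        # Hand the model outputs named in `input` to the user's script.
        print(f"would run {hook['file']} with inputs: {hook['input']}")

run_manifest("models.yaml")
```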
A kind of “package manager for ML.” Here are the features I wanted, along with the current ecosystem tools that provide them:
- Pretrained weights: Hugging Face
- Model code: the `transformers` library, random research repos on GitHub
- A better package manager than the mess of Python environments: mamba is now super fast and does this pretty well; PDM is interesting too
- Model data: many competitors for a “git for data” type solution, e.g. Pachyderm, DoltHub, lakeFS
- Converting model weights from one format to another: ONNX (see the sketch after this list)
- Describing/automatically doing AutoML or architecture search: MLJAR is very easy to use and focuses on tabular supervised learning; AutoGluon is fast
- And finally, wrapping this all up in a declarative format like the above. I recently found Ludwig, which does this pretty brilliantly: https://github.com/ludwig-ai/ludwig
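Here’s the ONNX step as a hedged sketch: exporting a pretrained torchvision ResNet-18 (standing in for whatever model you actually trained) so it can be run by onnxruntime on another platform, like that poor RPi:

```python
import torch
import torchvision

# ResNet-18 is a stand-in here for whatever model you actually trained.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.eval()

# Trace the model with a dummy batch of one 224x224 RGB image and
# serialize the graph to ONNX.
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",
    input_names=["image"],
    output_names=["logits"],
)
```

The exported graph no longer depends on PyTorch at inference time, which is exactly what you want on a constrained device.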
The end goal is not just to embed a single ML system inside a non-ML system; it is to compose multiple ML systems. But composing ML components is hard, because small errors in one part propagate through the whole system and drag down overall accuracy. For example, Dropbox composed classic computer vision and deep learning methods and described their challenges getting the end-to-end system to work.
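A back-of-the-envelope illustration of why that propagation hurts, assuming each stage’s errors are independent:

```python
# Three chained stages, each right 95% of the time, compound to ~86%:
stage_accuracies = [0.95, 0.95, 0.95]

end_to_end = 1.0
for accuracy in stage_accuracies:
    end_to_end *= accuracy

print(f"end-to-end accuracy: {end_to_end:.2%}")  # 85.74%
```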
Recently, researchers have used LLMs as the entry point to many other models. One paper used ChatGPT to automatically select and run models from Hugging Face to deliver specialized results. These model combinations are especially common for image-language problems.
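In miniature, that pattern looks something like this. The planner below is a stub dictionary where a real system would prompt the LLM with a model catalog and parse its choice; the model ids are illustrative Hugging Face checkpoints, not an endorsement:

```python
from transformers import pipeline

# Stub "planner": a real LLM-as-entry-point system would prompt the LLM
# with this catalog and parse its chosen model out of the response.
CATALOG = {
    "caption an image": ("image-to-text", "nlpconnect/vit-gpt2-image-captioning"),
    "detect objects": ("object-detection", "facebook/detr-resnet-50"),
}

def plan(task: str):
    """Stand-in for the LLM planner: map a task to a pipeline + model id."""
    return CATALOG[task]

task_type, model_id = plan("detect objects")
detector = pipeline(task_type, model=model_id)
print(detector("photo.jpg"))  # list of boxes, labels, and scores
```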
Could we train sub-model components in parallel and compose them properly? Could we use an ensemble of models to produce training data that could be cleaned or annotated while doing online training?
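For the ensemble idea, one hedged sketch: keep only the labels the ensemble agrees on, and route the rest to a human. The agreement threshold is a made-up knob you would tune against a held-out set:

```python
from collections import Counter

def pseudo_label(sample, models, min_agreement=0.8):
    """Keep a label only when enough of the ensemble agrees on it.

    `models` is any list of callables mapping a sample to a label.
    """
    votes = [model(sample) for model in models]
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label  # confident enough to feed back as training data
    return None  # route to a human annotator instead
```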
Efficient fine-tuning approaches like LoRA can help us build models the way we build open-source software: forking, modifying, and maybe even pushing improvements back upstream. Let’s hope open source can be a true competitor to centralized ML.
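With the `peft` library, that fork-and-modify workflow looks roughly like this; GPT-2 and the hyperparameters are just placeholders:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# "Fork" a base model by training only small low-rank adapter matrices.
base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    r=8,                        # rank of the update matrices
    lora_alpha=16,              # scaling factor for the updates
    target_modules=["c_attn"],  # GPT-2's fused attention projection
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # a tiny fraction of the base model
```

The adapter weights are small compared to the base model, so they can be shared, swapped, or merged independently, which is what makes the fork-and-push-upstream workflow plausible.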