Recursively self-improving AI is here!
And it's provably not getting any more misaligned than it would be if it wasn't recursively self-improving!
If you want to see some absolutely brilliant research by some absolutely brilliant people doing the best work that anyone’s doing on AI these days, the new paper from the DeepMind team in Nature is a must-read: https://www.nature.com/articles/s41586-022-05172-4
There’s a little bit of snake oil mixed into the introduction, which it really doesn’t need. So I’ll provide background that I think is more relevant.
More efficient matrix multiplication is a problem with a well-known, veritable vineyard of low-hanging fruit that a few people occasionally reach out and pick. But for most intents and purposes it’s so well solved that, for almost any real-world problem you’re dealing with, there’s even lower-hanging fruit closer at hand than improving how you multiply matrices.

For example, I used to work in HFT. Some of our algorithms used a decent amount of matrix multiplication, and our trades were extremely latency sensitive. We used FPGAs for reading and writing network packets. We had the fastest microwave tower network of anyone in the world for sending data from one particular colocation to a different particular colocation. And our strategies that did matrix multiplication did it on the CPU. (To be fair, we did it out of band, so the matrix multiplication itself didn’t actually add latency; but we were resource constrained on that problem just like we were resource constrained on all our other problems, and we knew of “better” trading strategies that we would have preferred to run over the ones we were running, but weren’t running, because the matrix multiplication they required was too slow.) There were so many other obvious places with easy gains to be had that we couldn’t be bothered to move our matrix multiplication to GPUs, even though that was an unambiguous win. Needless to say, we also weren’t trying to find faster ways to multiply matrices on the CPU.
Relatedly, when I was in college, I remember a professor who was trying to get people to pay more attention to real-world computational costs relative to how much they cared about what’s theoretically asymptotically optimal. I remember him handing around an article about someone winning a matrix multiplication competition with a naive N**3 algorithm whose only edge was better memory management: code he optimized on his train commute to and from work outperformed anything anyone else had written for matrices of the sizes with lots of real-world relevance. (I’m saying “I remember” because I’m not sure I got the details of this story completely correct, and I don’t think it’s worth the effort to track down the actual story. But I usually don’t misremember punchlines, and the punchline here was that the crossover N at which naive matrix multiplication begins to underperform Strassen was temporarily, embarrassingly, substantially raised by a hobbyist tinkering with memory management, because at the time all of the “optimized” algorithms pretty much disregarded real-world constraints.)
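To make that crossover concrete, here’s a minimal sketch (mine, not anything from the story above) of Strassen’s recursion with an explicit cutoff below which it falls back to a plain multiply. The function name and the cutoff value are illustrative, and NumPy’s @ stands in for whatever tuned naive kernel you’d actually be comparing against; the whole point of the anecdote is that how well that baseline uses memory is what decides where the cutoff sits.

```python
# Illustrative sketch: Strassen's recursion with an explicit crossover cutoff.
# Assumes square matrices whose side length is a power of two.
import numpy as np

def strassen(A, B, cutoff=128):
    n = A.shape[0]
    if n <= cutoff:
        return A @ B  # below the crossover, the "naive" (tuned BLAS) kernel wins
    k = n // 2
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]
    # Strassen's seven products instead of the naive eight.
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty_like(A)
    C[:k, :k] = M1 + M4 - M5 + M7
    C[:k, k:] = M3 + M5
    C[k:, :k] = M2 + M4
    C[k:, k:] = M1 - M2 + M3 + M6
    return C

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((512, 512))
    B = rng.standard_normal((512, 512))
    assert np.allclose(strassen(A, B), A @ B)
```

Raising or lowering that cutoff is exactly the knob the hobbyist’s memory tweaks were moving.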
In summary, matrix multiplication is a problem space where everyone knows there’s tons of work to be done. Everyone knows that a lot of that work is mostly trial and error, plus building up better intuitions over the course of that trial and error, and that doing it would improve basically every problem anyone works on. But there aren’t really any individual problems so constrained by matrix multiplication speed that it’s worth anybody allocating a bunch of human resources to making matrix multiplication faster, especially since a lot of that work will be made obsolete by the next generation of hardware.
The problem that comes closest to making matrix multiplication important enough that it’s almost worth hiring people to work full time on optimizing it for each generation of hardware is GPU libraries/compilers/APIs, and next after that is artificial intelligence. But honestly, it’s probably not even worth it for companies like Nvidia, AMD, and Intel to put any good people on their GPU driver or compiler teams on it. Why spend years working on a 10% improvement when you get a 2x speedup every 18 months from other improvements? (Especially since any actual algorithmic gains can probably be applied just as easily to your competitors’ chips as to yours, so even if it’s a lasting improvement to GPU performance, it’s not a lasting competitive advantage.)
So anyways, this is an absolutely perfect problem to train an AI on, and as the authors of the paper discuss, it’s a harder problem than the problems AI could conceivably tackle a decade ago, because the solution space is quite a bit bigger than the solution space for other problems that AI couldn’t outperform humans on a decade ago. The contours of the problem are also really well-defined, and it’s easy to prove correctness of solutions. As an added bonus, this is one of the fundamental operations underpinning everything AIs do, so any improvements made to this problem will make AI faster.
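To give a flavor of why correctness is so easy to check: in the paper’s framing, an algorithm for multiplying n x n matrices with R scalar multiplications is a rank-R decomposition of a fixed “matmul tensor,” so verifying a candidate just means rebuilding that tensor from its rank-one factors and comparing entry by entry, in exact integer arithmetic. Below is a minimal sketch for the 2x2 case using Strassen’s classic seven-product scheme; the row-major index convention and the U, V, W encoding are my own, not lifted from the paper.

```python
# Minimal sketch of exact verification in the tensor-decomposition framing:
# a candidate algorithm is correct iff its rank-one factors sum to the
# matrix multiplication tensor T. Encoding below is my own (row-major).
import numpy as np

def matmul_tensor(n):
    """T[a, b, c] = 1 iff vec(A)[a] * vec(B)[b] contributes to vec(C)[c]."""
    T = np.zeros((n * n, n * n, n * n), dtype=int)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                T[i * n + j, j * n + k, i * n + k] = 1
    return T

# Strassen's seven products for 2x2 matrices, one column per product.
# Rows of U index vec(A) = [a11, a12, a21, a22]; rows of V index vec(B);
# rows of W index vec(C) = [c11, c12, c21, c22].
U = np.array([[ 1,  0,  1,  0,  1, -1,  0],
              [ 0,  0,  0,  0,  1,  0,  1],
              [ 0,  1,  0,  0,  0,  1,  0],
              [ 1,  1,  0,  1,  0,  0, -1]])
V = np.array([[ 1,  1,  0, -1,  0,  1,  0],
              [ 0,  0,  1,  0,  0,  1,  0],
              [ 0,  0,  0,  1,  0,  0,  1],
              [ 1,  0, -1,  0,  1,  0,  1]])
W = np.array([[ 1,  0,  0,  1, -1,  0,  1],
              [ 0,  0,  1,  0,  1,  0,  0],
              [ 0,  1,  0,  1,  0,  0,  0],
              [ 1, -1,  1,  0,  0,  1,  0]])

# Rebuild the tensor from the rank-one factors and compare exactly.
reconstructed = np.einsum('ar,br,cr->abc', U, V, W)
assert (reconstructed == matmul_tensor(2)).all()
print("Strassen's 7-multiplication scheme exactly computes 2x2 matmul.")
```

A discovered scheme would be checked the same way, just with different factor matrices (and possibly a different base-case size).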
Of all the work anybody’s done in AI, this almost certainly has the most real-world application. It’s valuable work that it would be great to constantly throw some intelligence at; but for various economic reasons, and other reasons related to the pace of change and the availability of other problems, throwing more human intelligence at it doesn’t make much sense. AIs improving these solutions is just great!
(There are also some claims in the paper about improvements to combinatorial search, and about this being relevant to improving real math, yada, yada, yada. I don’t foresee that scaling enough to matter on conventional computing. Fifteen years from now, the stuff people are doing with quantum computing will blow everything anybody’s doing on conventional hardware so far out of the water that all of the work anyone’s doing in AI research for combinatorial search today will be completely irrelevant.)
Anyways, I continue to be blown away by the human intelligence on the DeepMind team. And with their most recent paper in Nature, they are unambiguously describing a process to build recursively self-improving AI.
I predict that this team will produce dozens more breakthrough papers over the next couple of decades, full of brilliant discoveries. And I predict that the intelligence driving those discoveries is going to remain unambiguously human, despite their AI now being self-improving, at least until they start using quantum computing for their combinatorial search.