Notes on NUX RSS Feed

NUX and GGML: Bringing AI to Kernel Space.

In the past few months, mostly pushed by friends more knowledgeable than me in this field, I started – something not exactly original – to turn my attention to the recent improvements in machine learning and AI.

My fascination with the field has come and gone over the years. The first time I took a real interest as an adult engineer was in 2010, after I watched Jeff Hawkins' 2002 TED talk. If you haven't watched it, watch it now; it's a brilliant talk!

I was living in Amsterdam at the time, and I remember spending every possible hour outside work tinkering with the idea of prediction. I remember downloading Numenta's first whitepaper about their Cortical Learning Algorithm, and I did what I usually do when I want to understand something: I reimplemented it. Twice.

Speaking of Numenta, they're definitely up to something. Their recent papers, although I have read them only lightly, look extremely promising and super-interesting. If you haven't already, check their Thousand Brains Project. Seems like a place to spend a lifetime of fun.

But of course, today all the discourse is about everything that happened since this paper. And I couldn't ignore it.

GGML to the rescue

Personal taste here, but in order to experiment with things, I need a way to do so without resorting to Python.

I have been briefly exposed to PyTorch at work, and that was enough experience for me.

I thought for some time that this meant the whole AI thing would be out of reach for me, but then a friend pointed me to GGML.

GGML is a tensor library used by projects such as llama.cpp. In the repository's examples, you can even find some simple but effective GPTs.

It was originally meant to support CPUs only (and aarch64 Macs in particular), but it now has backends for BLAS, OpenMP and various GPU and accelerator platforms.

The code – which has all the obvious signs of a fast-growing project – is a mix of C and minimal C++. At a cursory glance, it seems to be architected this way:

  • A set of tools to open, save and load models.
  • Functions that create a computational graph from the models.
  • A VM that executes the computational graphs in a thread pool.

What I liked about GGML is that the architecture makes sense and it's easily hackable, if you can stomach CMake.
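
To give a flavour of how those pieces fit together, here is a minimal sketch in the spirit of the simple examples shipped with the repository: allocate tensors in a context, describe a computation, build the graph, and let the thread pool execute it. The calls follow the GGML API as I understand it, so treat the details as approximate (recent versions split the CPU backend into its own header, for example).

    /* Minimal GGML flow: context -> tensors -> graph -> compute. */
    #include <stdio.h>
    #include "ggml.h"   /* recent versions may also need "ggml-cpu.h" */

    int main(void)
    {
        /* All tensors and the graph live in one pre-allocated arena. */
        struct ggml_init_params params = {
            .mem_size   = 16 * 1024 * 1024,
            .mem_buffer = NULL,
            .no_alloc   = false,
        };
        struct ggml_context *ctx = ggml_init(params);

        /* Describe the computation: nothing is evaluated yet. */
        struct ggml_tensor *a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 2, 2);
        struct ggml_tensor *b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 2, 2);
        ggml_set_f32(a, 2.0f);
        ggml_set_f32(b, 3.0f);
        struct ggml_tensor *c = ggml_mul_mat(ctx, a, b);

        /* Build the computational graph and run it on the CPU thread pool. */
        struct ggml_cgraph *gf = ggml_new_graph(ctx);
        ggml_build_forward_expand(gf, c);
        ggml_graph_compute_with_ctx(ctx, gf, /* n_threads = */ 4);

        printf("c[0,0] = %f\n", ggml_get_f32_1d(c, 0));
        ggml_free(ctx);
        return 0;
    }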

GGML in kernel space.

The goal of my NUX prototyping kernel framework is to be able to quickly create custom kernels. It has its own libc – libec, based on the NetBSD libc – and powerful memory management, which means that, as long as file I/O is not required, you should be able to port any C program to run in kernel mode.

Another thing that NUX offers is the ability to completely control what the hardware is doing. If I run some code in kernel space on a CPU, I can make sure that nothing will ever interrupt it.

NUX supports IPIs, so we can use those (or simple SMP barriers) to synchronize between CPUs.

I quickly realised that this works really well with GGML's architecture. You could, for example, boot a machine and assign all its secondary CPUs to the GGML thread pool, while using the bootstrap CPU for system control and drivers.
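
Just to make the idea concrete, here is a hypothetical sketch of that split. The worker-queue primitives (work_queue_pop, cpu_relax, cpu_id) are illustrative names, not real NUX or GGML APIs; main_ap is the NUX entry point for secondary processors, described in the NUX post below.

    /* Hypothetical: secondary CPUs become compute workers for the GGML
     * thread pool, the bootstrap CPU keeps running control and drivers. */
    #include <stddef.h>

    struct work {
        void (*fn)(void *arg);   /* e.g. compute one slice of a GGML graph */
        void *arg;
    };

    extern struct work *work_queue_pop(void);  /* shared, lock-protected queue */
    extern void cpu_relax(void);               /* spin politely, or wait for an IPI */
    extern unsigned cpu_id(void);

    /* Secondary CPUs enter here and never leave. */
    static void worker_loop(unsigned cpu)
    {
        (void)cpu;
        for (;;) {
            struct work *w = work_queue_pop();
            if (w == NULL) {
                cpu_relax();
                continue;
            }
            w->fn(w->arg);
        }
    }

    /* NUX calls main_ap() on every secondary processor. */
    int main_ap(void)
    {
        worker_loop(cpu_id());
        return 0;   /* never reached */
    }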

Of course, I decided to implement that. And to give a FOSDEM talk about the effort!

An early prototype

Today, I published on GitHub something that has been living, dangerously un-backed-up, on my machine for the past few months: blasbare.

It has been my workspace for various experiments in porting computing architectures to NUX.

As it stands, there's a simple kernel that runs the simple-ctx GGML example.

Despite its simplicity, it compiles the full GGML library with the CPU backend.

Work still needs to be done, and the documentation is lacking, but these are early days.

This project will be discussed in more detail at FOSDEM 2025 in Brussels later this month. Hope to see you there!

Full Article and Comments

Why would one rewrite Mach?

Over the past few months, I have been writing a microkernel that aims to be, effectively, a Mach rewrite.

The response to this, when I shared my plans with friends who share my interests, was a simple, almost desperate question: Why, Gianluca, why would you do that?

Definitely an interesting question, so I thought this would be a good time to write about it.

The many reasons behind a personal project.

In the spring of 2024, I picked up an old project of mine, NUX, and brought it to a state where I could easily build portable kernels that would run on real, modern hardware.

The first test I made was to port Murgia Hack to NUX. The port, which allowed MH to run on two new architectures, RISCV64 and AMD64, was straightforward. But the architecture of NUX is heavily influenced by the design choices of MH, so it was not expected to be difficult.

What I needed was a challenging kernel. I briefly considered a classical UNIX system, but then it occurred to me that there's always been a microkernel-shaped hole in my life: Mach.

Mach and me, a personal story.

I remember the first time I downloaded the GNU Hurd sources. It was the mid-to-late 1990s. The first thing I had heard about the Hurd was this famous Linus Torvalds quote:

In short: just say NO TO DRUGS, and maybe you won't end up like the Hurd people.

Truth is, the promise of GNU Hurd was exciting. It was considered the next big thing. It was an architecture that allowed you to play with the fundamentals of a UNIX system by providing translators.

It took me days to finally build the toolchain and the sources and to boot GNU Mach on my 486.

At that time – I was a high school student – I was able to understand some parts of the Linux kernel code, and so I decided to study Mach. With the usual youthful optimism and fearlessness, I printed the "Mach 3 Kernel Principles" on my super-slow early inkjet printer and spent nights reading it. Then I started looking at the code.

Me and the Mach source code

If you have ever looked at Mach's source code, you know where this story is going. I was unprepared for that code. I finally understood something my mediocre Italian Literature teacher was failing to explain at the time: Dante's trip to the Inferno.

There was a disconnect between the purity and clarity of the Mach principles and architecture, as explained in the documentation, and its code.

I was always interested in memory management, and the VM code seemed to me, at the time, designed to instill the worst doubts about one's own capacity to understand things.

As a reaction, I went on and decided to start writing my own kernel. But this is another story.

I stayed around the Hurd community for quite some time. Despite what Linus said about them – us? –, there were some really nice people in the project. Many of them I still consider my friends.

Sto-Mach

Over the years, I became a bit more at ease with the Mach source code, and in 2005, as a late computer engineering student at university, I presented Sto-Mach at the Hurd Meeting in Madrid. Sto-Mach was my reaction to a project called oskit-mach. Based on oskit, oskit-mach removed much of the Mach core and substituted it with oskit components. Sto-Mach did the opposite: it kept the core of Mach intact, removed the Linux 2.0 driver glue code, and used oskit components as drivers.

The result was slightly better. Now we had Linux 2.2 drivers but, more importantly, a COM-based interface – yes, oskit packaged other operating systems' source code in this Microsoft standard – to write drivers.

I have recently found the source code of Sto-Mach in my archives. I will be writing a blog post about it. For now, only the slides of that presentation are available on my talks page.

Shortly after that, I left university, and started my career as a hypervisor engineer. And I forgot many details about Mach.

MACHINA: a NUX-based Mach clone.

Now, it should be clearer why, when faced with the choice of a challenging kernel architecture to implement in NUX, I chose Mach.

By coincidence, right when I was thinking about doing it, I found, while unpacking the boxes after the move to a new flat, my old printed copy of the Mach 3 Kernel Principles that I had spent so many nights reading, decades ago.

On re-reading it, I rediscovered my fascination with that architecture, and decided that I wanted to rewrite Mach, to answer questions that now, as a seasoned system engineer, I can finally face:

  1. Does Mach have to be this complex to achieve its functionality?
  2. What would a modern Mach implementation look like?
  3. What architectural decisions of Mach would not be made today?

Is Mach easy to rewrite? No, it is not. Is this going to be a replacement for Mach? No, it will not. But sometimes, the best way to understand a system is to reimplement it.

I will introduce MACHINA at FOSDEM 2025. See you there!

Full Article and Comments

Introducing NUX, a kernel framework

History and motivation

Circa 2018, I decided that the Murgia Hack System needed a fresh start to support newer architectures.

MH's kernel is quite clean and simple, but it suffers from aging low-level support. Incredibly, some of that i386 code can be traced back to my early experiments (in 1999!) and to code that I wrote for my first SMP machine – a dual Pentium III bought in Akihabara in the early 2000s!

Unfortunately, emotional attachment to code doesn't create great engineering, and I had to start from scratch.

The driving principle behind this effort – that later became NUX – was to rationalise my kernel development.

At its core, a kernel is an executable running in privileged mode. It's special because it handles exceptions, IRQs and syscalls (essentially events), so it can be seen as an event-based program. And since it runs on multiple CPUs concurrently, we can even draw similarities with multi-threading.

The very annoying and often project-specific part of a kernel is the bootstrap. A kernel usually starts in a mode that is either very limited (think x86 legacy boot) or very different in terms of runtime (think EFI).

A kernel is thus required to set up its own data structures (and virtual memory), and then jump into that new environment (through magic pieces of assembler called trampolines).

In a nutshell, NUX can be seen as an attempt to solve all of the above problems that differentiate a kernel from a normal executable.

Solving the bootstrapping problem.

To solve the setup of the kernel executable's data structures, NUX introduces APXH, an ELF bootloader. APXH (the upper case of αρχη, Greek for "beginning") is a portable bootloader whose goal is to load an ELF executable, create page tables based on the ELF's program headers, and jump to the entry point. It attempts to be the closest thing to an exec() you can possibly have at boot.
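
For context, this is roughly what loading an ELF kernel amounts to: walk the program headers, map every PT_LOAD segment at its requested virtual address, and jump to the entry point. This is generic ELF loading written from the description above, not the actual APXH code; map_pages() and jump_to() stand in for the loader's real primitives.

    /* Generic ELF loading, in the spirit of what APXH does at boot. */
    #include <elf.h>
    #include <string.h>

    extern void *map_pages(unsigned long va, unsigned long len, unsigned flags);
    extern void jump_to(unsigned long entry);

    void load_and_start(const Elf64_Ehdr *eh)
    {
        const Elf64_Phdr *ph =
            (const Elf64_Phdr *)((const char *)eh + eh->e_phoff);

        for (unsigned i = 0; i < eh->e_phnum; i++) {
            if (ph[i].p_type != PT_LOAD)
                continue;

            /* Create page tables covering [p_vaddr, p_vaddr + p_memsz). */
            void *dst = map_pages(ph[i].p_vaddr, ph[i].p_memsz, ph[i].p_flags);

            /* Copy the file-backed part, zero the rest (e.g. .bss). */
            memcpy(dst, (const char *)eh + ph[i].p_offset, ph[i].p_filesz);
            memset((char *)dst + ph[i].p_filesz, 0,
                   ph[i].p_memsz - ph[i].p_filesz);
        }

        jump_to(eh->e_entry);   /* the "exec() at boot" moment */
    }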

APXH also supports special program header entries – such as Frame Buffer, 1:1 Physical Map, and Boot Information page – that allow the kernel to immediately use system features discoverable at boot, further reducing low-level initialisation.

APXH is extremely portable: it currently works on i386, AMD64 and RISCV64, and supports booting from multiple environments, currently EFI, GRUB's multiboot and OpenSBI.

Creating an embedded executable: the need for a small libc.

In order to create an executable in C, you have to link against a C runtime (crt) and a C library.

This is why NUX introduces libec, an embedded quasi-standard libc.

libec is based on the NetBSD libc, guaranteeing extreme portability and simplicity. It is meant to be used as a small, embedded libc.

Every binary built by NUX – whether APXH, a NUX kernel, or the example kernel's userspace program – is compiled against libec.

A kernel as a C executable.

As with any C program, the kernel has to define a main function, which is called after the C runtime has initialised. libec is complex enough to support constructors, so you can define initialisation functions that run before main.

A special NUX function, which departs from normal C programs, is main_ap. This is a main function that is called on secondary processors, that is, processors other than the bootstrap CPU.
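
A minimal outline of what that looks like, assuming simplified prototypes (the real NUX signatures may differ):

    /* Sketch of a NUX kernel as a plain C executable; the prototypes
     * are simplified assumptions, not the exact NUX declarations. */
    #include <stdio.h>

    /* Runs before main(), thanks to libec's constructor support. */
    static void __attribute__((constructor))
    early_setup(void)
    {
        printf("early init, before main\n");
    }

    /* Called on the bootstrap CPU once the C runtime is up. */
    int main(void)
    {
        printf("hello from the bootstrap CPU\n");
        return 0;
    }

    /* Called on every secondary processor. */
    int main_ap(void)
    {
        printf("hello from a secondary CPU\n");
        return 0;
    }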

Kernel entries as events.

As mentioned above, a kernel has to deal with requests from userspace and hardware events. In NUX, this is done by defining entry functions for these events.

The whole state of the running kernel can be defined by the actions of these entry functions.

A kernel entry takes a uctxt as a parameter and returns a uctxt. A uctxt is a User Context: the state of the userspace program. The kernel can modify the User Context passed as an argument and return the same one, or it can return a completely new one.

The former is how system calls return a value; the latter is how you implement thread and process switches.
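
To make the two return styles concrete, here is a hedged sketch of an entry function; entry_sysc, uctxt_set_retval and sched_pick_next are placeholder names, not the actual NUX interface.

    /* Placeholder names throughout; only the uctxt-in, uctxt-out shape
     * reflects the NUX model described above. */
    struct uctxt;                                /* opaque user context */

    extern void uctxt_set_retval(struct uctxt *u, long val);
    extern struct uctxt *sched_pick_next(void);  /* context of another thread */

    struct uctxt *
    entry_sysc(struct uctxt *u, unsigned long sysno, unsigned long arg)
    {
        (void)arg;
        switch (sysno) {
        case 0:                               /* a "getpid"-style call       */
            uctxt_set_retval(u, 42);          /* write the return value ...  */
            return u;                         /* ... and resume the caller   */
        case 1:                               /* a "yield"-style call        */
            return sched_pick_next();         /* returning a different
                                                 context switches threads    */
        default:
            uctxt_set_retval(u, -1);
            return u;
        }
    }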

The NUX library interface

Finally, NUX provides three libraries:

  1. libnux: a machine-independent library that provides the higher-level functionality you need to develop a fully functional OS kernel. The libnux interface is here.
  2. libhal: a machine-dependent layer that exports a common interface to low-level CPU functionality. The HAL interface is here.
  3. libplt: a machine-dependent layer that exports a common interface to low-level platform functionality, such as device discovery, interrupt controller configuration and timer handling. The Platform Driver interface is here.

The separation between hal and plt is possibly a unique choice of NUX and, like many other NUX design choices, allows for gradual and quick porting to new architectures.

For example, when AMD64 support was added, the ACPI platform library needed no changes: the CPU mode was different, but the platform was exactly the same.

Similarly, upcoming ACPI support for RISC-V consists mostly of expanding the ACPI libplt to support RISC-V-specific tables and the different interrupt controllers.
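
Purely to illustrate the split, the two layers can be pictured as two small sets of hooks. These prototypes are invented for illustration; the real interfaces are the libhal and libplt headers linked above.

    /* Invented prototypes, for illustration only. */

    /* libhal: machine-dependent CPU functionality. */
    void hal_cpu_halt(void);
    void hal_pmap_map(unsigned long va, unsigned long pa, unsigned prot);

    /* libplt: machine-dependent platform functionality. */
    int  plt_irq_enable(unsigned irq);
    void plt_timer_arm(unsigned long nanosecs);

    /* Porting to a new CPU on the same platform means reimplementing only
     * the hal_* side; a new platform on the same CPU touches only plt_*. */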

A useful tool for kernel prototyping.

NUX's goal is to remove the burden of bootstrapping a kernel. And to be portable.

The hope is that NUX will be useful to others in the same way it has been useful to me: for experimenting with kernel and OS architectures, while skipping the hard parts of low-level initialisation and handling.

Full Article and Comments