/Xv6 Rust 0x01/ - Starting is the hardest part
Xv6 is one of the best operating systems for teaching. It's a great way to learn how an OS works, with the basic functions implemented in relatively few lines of code.
Originally, xv6 was written in C, which is awesome for students to get hands-on experience with such a classic programming language. But now that Rust is gaining traction (especially since rust-for-linux is becoming a part of mainline Linux), wouldn't it be fun to run xv6 using Rust?
As a perfect way to kill time, I have migrated most of xv6 from C to Rust. You can check it out here. During this migration, I ran into all sorts of issues and tricky stuff, and nothing brings me more satisfaction than successfully resolving a problem!
Therefore, I believe it would be cool for me to share my experiences through a series of articles detailing how I did this, complete with a more structured approach and clear procedures.
All right, let's get started...
1. Rust on risc-v
Earlier versions of xv6 ran on the x86 arch, but xv6 has since fully migrated to risc-v.
As we are going to port xv6 to rust, we had better first take a look at how to run rust on risc-v.
Please note that in this series of articles, I will assume the reader has basic knowledge of rust and knows how to set up a local environment with rustup, cargo, and an IDE.
In the following articles, I will use my local machine as the demo env, and that includes:
- MBP 2019, Intel
- cargo 1.75.0-nightly (b4d18d4bd 2023-10-31), some features rely on nightly build
- rustup 1.26.0 (5af9b9484 2023-04-05)
- CLion 2024.1 (I didn't choose RustRover because, after giving it a try, I found it was not stable yet)
Here we go! Let's create a new rust project and name it "xv6-rust-sample".
Now there is one and only one file, main.rs, in the src/ directory. Leave it for a sec; we don't need that file right now.
Remember, we are going to run rust code on the risc-v arch, so before any coding we should deal with the toolchain. After all, I bet our code cannot run correctly if it has been compiled for x86, right?
To choose the correct toolchain, let's create a .cargo/config.toml in the project root. This is the cargo configuration file of our project, and it is where we set our build target.
Add just two lines to the toml file, like this:
[build]
target = "riscv64gc-unknown-none-elf"
This will tell our rust toolchain that in this project, we would like to have a risc-v program as the output.
So far so good.
In the next step, we go back to main.rs, where there is only one very simple function:
fn main() {
    println!("Hello, world!");
}
However, if you confidently type and execute cargo run, you will probably get this:
error[E0463]: can't find crate for `std`
Surprise! You have met the first issue of our journey: the bare-metal risc-v target doesn't ship the std lib!
Following the error hints, we had better add #![no_std] to our code. And our second challenge is right behind it: no std, no println!(). WTF?
Unfortunately, yes. Actually, we will take one whole chapter to implement a printf!() macro in the next article, which also means we are not gonna have it today.
It's a bit awkward not to be able to print a simple "hello world", so we can only take one step back and change main() so that it just computes a local variable i, which should end up with the value 1000.
Hopefully, we can at least check the value of i in the debugger: if it equals 1000, that proves we have successfully run rust code on risc-v.
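As a rough illustration of such a println-free check, here is a minimal sketch that relies only on `core`, so the same function would also build under `#![no_std]`. The function name `count_to` and the use of `black_box` are assumptions of this sketch, not necessarily what the article's program does.

```rust
use core::hint::black_box;

/// Counts from 0 up to `limit` without touching `std`.
/// `count_to` is a hypothetical name used only for this sketch.
fn count_to(limit: u32) -> u32 {
    let mut i: u32 = 0;
    while i < limit {
        // `black_box` is a best-effort hint that discourages the optimizer
        // from folding the whole loop into a constant, so `i` stays
        // observable while stepping through the loop in a debugger.
        i = black_box(i) + 1;
    }
    i
}
```

Calling `count_to(1000)` from the entry function and breaking after the loop gives a value to inspect in the debugger.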
The third monster shows up:
error: `#[panic_handler]` function required, but not found
The bare-metal risc-v toolchain doesn't even have a built-in panic handler!
Fine, let's build one for it: a function marked #[panic_handler] whose body is just a loop around core::arch::asm!("wfi").
Some weird stuff shows up in that handler. What is "wfi"? (At least I can assure you it's not wifi.)
According to section 3.2.3 of the risc-v privileged ISA spec: "The Wait for Interrupt instruction (WFI) provides a hint to the implementation that the current hart can be stalled until an interrupt might need servicing." FYI, the word "hart" means hardware thread.
Essentially, we put a "wfi" into a loop, which means that if any panic happens, instead of reporting an error we just let the cpu stall.
Besides, the macro core::arch::asm!() is a wrapper that lets us easily run assembly in rust code. Since there is no std lib here, we use core in its place (not surprisingly, it doesn't contain a println!()). For more details about core, check here.
All right, after adding the panic handler and re-running cargo run, we get our final issue:
error: requires `start` lang_item
Language items (lang_item) are a set of items, defined by the compiler, that implement special features of the language such as memory management and exception handling. The above error is a kind of "side effect" of no_std.
Generally, the std lib takes care of all the special cases related to lang_item by default. Once we set no_std, many language items need to be provided by ourselves.
The start language item defines the entry point of the program. std does a great job of linking the program entry to main(), so just like in the case above: no std, no main.
To solve the issue, we should add #![no_main] to our code. That lets the compiler know we will define our own program entry, so it will no longer report the above error.
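Up to now, maybe we could try running the code to see if everything goes well? After all, in many cases in rust, passing compilation means passing everything.
Let's recap the current code: main.rs now starts with #![no_std] and #![no_main], and contains main() plus the panic handler.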
We run it, and we get:
target/riscv64gc-unknown-none-elf/debug/xv6-rust-sample: target/riscv64gc-unknown-none-elf/debug/xv6-rust-sample: cannot execute binary file
Basically, that means we ran the binary on the wrong arch. As we are aware, the binary we built is meant to run on a risc-v platform, not on our x86 machine.
In simple terms, we need a risc-v env to run the binary, and in such circumstances a virtual machine is a great choice.
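One way to see the mismatch is in the ELF header itself, whose e_machine field names the architecture a binary was built for. The following host-side sketch reads that field from the first bytes of a file. The helper name `elf_machine` is hypothetical; the two constants are the ELF machine codes for x86-64 and RISC-V.

```rust
/// ELF machine code for x86-64.
const EM_X86_64: u16 = 62;
/// ELF machine code for RISC-V.
const EM_RISCV: u16 = 243;

/// Returns the `e_machine` field of an ELF header, or `None` if the bytes
/// do not look like an ELF file. `elf_machine` is a hypothetical helper.
fn elf_machine(header: &[u8]) -> Option<u16> {
    // e_ident takes 16 bytes, e_type takes 2, then e_machine takes 2.
    if header.len() < 20 || !header.starts_with(b"\x7fELF") {
        return None;
    }
    let raw = [header[18], header[19]];
    // e_ident[5] is the data encoding: 1 = little-endian, 2 = big-endian.
    match header[5] {
        1 => Some(u16::from_le_bytes(raw)),
        2 => Some(u16::from_be_bytes(raw)),
        _ => None,
    }
}
```

Fed with the first 20 bytes of target/riscv64gc-unknown-none-elf/debug/xv6-rust-sample, a helper like this should report `EM_RISCV`, while a native binary on the Intel host would report `EM_X86_64`.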
2. Setup risc-v platform based on QEMU
We have successfully compiled the example rust code for the risc-v target. Now we need a virtual machine to simulate the risc-v environment. Of course, you can do it on real risc-v hardware, but a virtual machine lets us set up the target platform in a second, which saves an incredible amount of time in the initial stages of development.
Here, we choose QEMU because it's very easy to use, open source, and integrates with rust seamlessly.
It's quite simple to set up QEMU with rust integration: we only need to add two lines to the previous .cargo/config.toml, namely a [target.riscv64gc-unknown-none-elf] section with a runner entry that holds the QEMU command line.
I'm not gonna describe QEMU in much detail in this article; please check here for the usage of QEMU if needed.
In short, we set the "runner" of the target "riscv64gc-unknown-none-elf" to a command line that brings up QEMU. You may have already noticed that the QEMU binary we execute is "qemu-system-riscv64", which implies there are many other binaries for other platforms.
Once we set the runner for a target, every time we execute cargo run, the target file of our program is passed as an argument to the command we put into the "runner" field. That is also why we put the -kernel param of QEMU at the end.
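To make the role of the trailing -kernel concrete, here is a simplified model of how the final command line is assembled from a runner string. The helper `runner_argv` is hypothetical, and the runner string used in the usage note is only an assumed example built from real QEMU options; the flags in the article's actual config may differ.

```rust
/// Simplified model of a cargo runner given as a single string: the string
/// is split on whitespace and the path of the built binary is appended as
/// the last argument. `runner_argv` is a hypothetical helper.
fn runner_argv(runner: &str, binary: &str) -> Vec<String> {
    let mut argv: Vec<String> = runner
        .split_whitespace()
        .map(String::from)
        .collect();
    // The binary path lands right after the last word of the runner,
    // which is why `-kernel` has to be that last word.
    argv.push(binary.to_string());
    argv
}
```

With an assumed runner of "qemu-system-riscv64 -machine virt -bios none -m 128M -nographic -kernel", the resulting argument list ends in "-kernel target/riscv64gc-unknown-none-elf/debug/xv6-rust-sample", so QEMU receives our binary as the kernel image.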
All set, let's give it a try!
Finished dev [unoptimized + debuginfo] target(s) in 0.99s
It went very well: no more "cannot execute binary file" error, and QEMU seems to be running.
Because we can't output anything, the only way to verify the correctness of our program is to run it in debug mode and check the value of "i" in memory.
3. Debugger is our closest friend
Now or later, the debugger is always a super important helper to us. Without a debugger, we cannot easily learn the current program status, and can only use a logger to print context, with many restrictions.
I'm using GDB as the debugger in this series of articles, but you can also choose other debuggers like lldb or rust-gdb; they are much the same.
So, how do we introduce gdb into our project?
Step 1: we need QEMU to accept a GDB connection and, additionally, to pause until gdb connects. That requires us to add two params, -s -S, to the runner command under [target.riscv64gc-unknown-none-elf] in .cargo/config.toml.
The -s is a shorthand for -gdb tcp::1234, which means to listen for a GDB connection on tcp port 1234. The -S asks QEMU not to start the CPU at startup, so it waits until the debugger tells it to continue.
Step 2: run GDB in remote debug mode. I use CLion as my local IDE, so I can simply create a remote debug entry in the "Run/Debug Configuration", with the remote args set to localhost:1234 and the symbol file set to target/riscv64gc-unknown-none-elf/debug/xv6-rust-sample.
After completing the above two steps, when we run cargo run again, set a breakpoint on the first line of main(), and click debug in CLion, we should see Debugger connected to localhost:1234 in the debug tab. And if we stop the debugger, QEMU will stop too and show: qemu-system-riscv64: QEMU: Terminated via GDBstub.
But nothing happens apart from the debugger connecting. Why?
Actually, our program was left incomplete when we added #![no_main] earlier. #![no_main] only tells the rust compiler "you don't need to worry about the program entry anymore, we the developers will take care of that". But in fact we didn't do anything about the program entry at all!
Hence, right now we need to let QEMU understand where to start running our code. That requires a linker script (.ld), which we call entry.ld.
A linker script basically defines the memory layout of the output binary. Since the rust risc-v cross toolchain generates the target file in ELF format, we define the layout in ELF style.
The ld file is quite simple: its four sections are basic ELF sections, and there is nothing special there. The only fields we need to keep our eyes on are the ones at the top.
OUTPUT_ARCH( "riscv" ) indicates that the target file is for the risc-v platform, and ENTRY( main ) points out that our program entry is a symbol called main, which is indeed our main function.
The . = 0x80000000 line places the entry at address 0x80000000, so that our binary starts from there. QEMU supports many different hardware architectures, and on the risc-v machine in particular, RAM starts at 0x80000000. We can execute a very simple command to prove that:
qemu-system-riscv64 -monitor stdio
We start a risc-v virtual machine with -monitor stdio, which not only runs a VM instance but also brings us into an interactive interface. There we can check the current memory regions with info mtree, and apparently the RAM begins at 0000000080000000.
Now we have our ld file, but we still need to activate the script in our program, which requires modifying .cargo/config.toml so that entry.ld is passed to the linker.
One more step is to let the entry symbol be recognized. Generally, rust mangles most symbols to make sure each has its own unique name. But as we have set ENTRY( main ), we need main to stay "main" rather than some mangled name. To achieve that, we have to change the signature of main(), marking it with #[no_mangle] and declaring it as extern "C".
#[no_mangle] forces the compiler not to mangle this function, and extern "C" is an FFI declaration that exports the function with the C ABI.
After all of the above modifications, I'm sure the program will stop at the first line of main if there is a breakpoint.
4. Things are getting complicated
In the previous chapter, I assured you the program would stop at the first line, but I bet you have tried and found that the debugger cannot step over to the second line. And if you pause the program through gdb, the current address turns out to be 0x0. Something's wrong here.
There is a CSR called mcause in risc-v that indicates the event that caused a trap. We can check the value of mcause to investigate why our program is failing.
Executing info all-registers in gdb shows the values of all registers:
(gdb) info all-registers
The mcause shows a value of 0x1. According to the risc-v document (Table 14. Machine cause register (mcause) values after trap.), 0x1 means "Instruction access fault".
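Reading the table by hand gets tedious, so here is a small host-side sketch that decodes an RV64 mcause value. The function names `split_mcause` and `exception_name` are hypothetical; the exception codes follow the mcause table of the risc-v privileged spec, where the top bit marks an interrupt and the remaining bits hold the code.

```rust
/// Splits an RV64 `mcause` value into (is_interrupt, code).
/// Bit 63 is the Interrupt flag; the lower bits are the cause code.
fn split_mcause(mcause: u64) -> (bool, u64) {
    let is_interrupt = (mcause >> 63) == 1;
    let code = mcause & !(1u64 << 63);
    (is_interrupt, code)
}

/// Maps a synchronous exception code (Interrupt flag clear) to its name.
fn exception_name(code: u64) -> &'static str {
    match code {
        0 => "Instruction address misaligned",
        1 => "Instruction access fault",
        2 => "Illegal instruction",
        3 => "Breakpoint",
        4 => "Load address misaligned",
        5 => "Load access fault",
        6 => "Store/AMO address misaligned",
        7 => "Store/AMO access fault",
        8 => "Environment call from U-mode",
        9 => "Environment call from S-mode",
        11 => "Environment call from M-mode",
        12 => "Instruction page fault",
        13 => "Load page fault",
        15 => "Store/AMO page fault",
        _ => "Reserved or custom",
    }
}
```

For the value seen in gdb, `split_mcause(0x1)` yields a synchronous exception with code 1, which `exception_name` reports as "Instruction access fault".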
But how can that be? It can't be insufficient access permission: we haven't set any privilege level, so our program is running in machine mode, which is the highest privilege mode, and we can literally do everything.
If we go one step further and disassemble the program with objdump, we can look at the assembly code:
0000000080000000 <main>:
Yes, the stack pointer! sp is initially zero, so after the instruction at 80000000, sp will be set to 0x0 - 0x10 = 0xfffffffffffffff0 (we are on a 64-bit platform).
Unfortunately, at 80000006, the value of a0 will be saved to sp + 12, which is 0xfffffffffffffffc, and obviously this address is illegal. If you remember, we only created a VM with 128MiB of memory, which means the available physical address range is 0x80000000 ~ 0x88000000.
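The wrap-around is easy to reproduce on the host with 64-bit wrapping arithmetic. In this sketch, the 16-byte frame size is an assumption inferred from the sp + 12 address quoted above, and the names `in_ram` and `store_address` are hypothetical.

```rust
/// Start of RAM on the QEMU risc-v machine, as discussed in the article.
const RAM_BASE: u64 = 0x8000_0000;
/// 128 MiB of RAM, matching the VM size mentioned in the article.
const RAM_SIZE: u64 = 128 * 1024 * 1024;

/// True if `addr` falls inside the half-open range [RAM_BASE, RAM_BASE + RAM_SIZE).
fn in_ram(addr: u64) -> bool {
    addr >= RAM_BASE && addr < RAM_BASE + RAM_SIZE
}

/// Address hit by a store to `sp + offset` after the prologue has moved
/// `sp` down by `frame` bytes, using the wrapping arithmetic of a 64-bit register.
fn store_address(initial_sp: u64, frame: u64, offset: u64) -> u64 {
    initial_sp.wrapping_sub(frame).wrapping_add(offset)
}
```

With `initial_sp = 0` the store lands at 0xfffffffffffffffc, far outside RAM, while starting from 0x80001000 it lands at 0x80000ffc, which is inside the 128MiB window.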
To make it correct, let's set sp first. We add one line of asm to set sp to 0x80001000. Since our program is quite simple and will not even grow to 0x800000ff, our code section is safe and has no chance of being overwritten.
Finally, the program runs correctly. If you like, add a panic!() at the end of the program; otherwise, when main() returns, the program will fail again because we didn't tell it what to do after main() returns.
5. What is xv6 all about?
After all the above sections, we can now get back to talking more about xv6.
To quote the title of the xv6 book: "xv6: a simple, Unix-like teaching operating system". Yes, xv6 was inspired by Unix v6. Since Unix v6 needed to run on specific hardware like the PDP-11 and carried many low-level details, in 2006 MIT decided to model a new system on Unix v6, rewriting it in ANSI C and adding multiprocessor support, which finally became xv6.
As we mentioned at the beginning of this article, xv6 ran on x86 at first and was later ported to risc-v. That's why we devoted this entire article to running rust on the risc-v platform. With the knowledge from this article, I believe we can get our local environment ready to go, and we have picked up some basic low-level information about risc-v instructions, linker scripts, and asm in rust.
Basically, although it does not contain many lines of code, xv6 is still a fully functional operating system. It virtualizes the CPU and memory as processes and virtual memory, it supports concurrency, and it contains a Unix-like file system to implement persistence. It has user space and kernel space, with a group of system calls (not POSIX-compliant, for the sake of clarity and simplicity). Like Unix, xv6 keeps the monolithic kernel concept, so it has only one kernel binary.
In the next articles, we will take a close look at the detailed component design of xv6, and then try to port each of the components to rust...