Saturday, April 30, 2022

Virgule/emulsiV: Learning RISC-V Assembly Language

 I finally took a long vacation with my psychiatrist girlfriend. We flew to Portugal--from the USA--10 1/2 hour flight. Super Fun! 

Of course we ignored the COVID-19 warnings like this one

Bad news--when we were leaving, she tested negative for COVID-19, but I didn't, so she left, and I was stuck.  

I self-isolated in a hotel--six days, although the Portuguese health authority said it would be more like 11 days--to do what? 

Ignore the shelter in place mandate and go to museums?  No. Getting the locals sick--I found them calm, polite, and often downright zen--not my thing.

Ignore the mandate and go to the beach?  Nope. See above.

Ignore the mandate and ride the ridiculously crowded tram 28?  I wouldn't do that if I was healthy, so, no.

Learn some new programming skills, of course. RISC-V assembly!! 

RISC-V based dev board from Sparkfun
 
WELL ARMED?

I've already written posts about assembly for the 6502 processor (here) and ARM (here). 

With 6502 it was pretty easy to learn assembly language basics; ARM, not so much.

Another instruction set architecture ("ISA") is RISC-V; it's open source and was designed to help teach newbies like me how processors work. 

I expect RISC-V to become increasingly popular as chip makers grow weary of giving ARM and Intel bags of money; RISC-V has no license fee.

So, here I was, stuck in Lisbon, Portugal, alone in a hotel room, only me and my ancient MacBook Pro. Can I learn a new ISA without going crazy?  

Yes. I found an amazing online simulator for learning RISC-V--"Virgule" (a simulated RISC-V processor) with "emulsiV" (an emulsifier for Virgule? No idea what the hell emulsiV means)--find them here.


Documentation for Virgule/emulsiV is a bit sparse, but I could follow it--which means you can too--it's here

A good RISC-V/Virgule introduction video--short and informative--is here. As the video correctly notes: Someone put a lot of work into this simulator. This must have been a scripting project of passion, and it's one of the best, and at the same time simplest, online simulators I've seen to date. Vive la France! 

It has an ASCII display, behind glass GPIO, about 2-3k of simulated RAM and even a memory location where you can load and view bitmap images. 

It gets better: You can animate what happens inside Virgule/emulsiV as you step through your code, and control the speed of the animation. Not only can you see if your code works, you can see how it works at a snail's pace. This greatly helped me understand RISC-V at a deep level while writing, debugging, and running my code.

There are different ways to get code into the simulator; for me, I hand-coded instructions into the memory column on the left side of the simulator, then stepped through my code to watch it work. 

If I made a really dumb data entry mistake the simulator tried to correct it and did a remarkably good job. Amazing! As a learning tool, this couldn't be more straightforward.

Enter assembly code right into memory slots....


REGISTERS: 

The simulator has 32 general purpose registers.  RISC-V's spec indicates that hardware does not have a central accumulator and that register x0 is always 0.  

Programmatically we call these registers x0, x1, etc. This is not hex, which is denoted 0x000 in the simulator as you'd expect; the registers are called out in decimal with a preceding "x".

The documentation talks about rs1 and rs2; these are the SOURCE registers. There is also rd, which is the DESTINATION register.

Any register can be used as a source or destination. x0 is the special case: it always reads as 0, and anything written to it is thrown away.



IMMEDIATE VALUES

In the RISC-V ISA, immediate values ("IMM's") don't seem to have the odd rules, complexities, and head scratching limitations that I found in the ARM ISA. 

 An easy way, I think, to get immediate values into registers is to use the addi instruction, "add immediate":

 addi dest, source, value

Source is an existing register, and value is added to whatever that register holds. The value has to fit in 12 signed bits, so -2048 to 2047. Since register x0 can only have 0 as its value, this next instruction puts 32 in x1:

addi x1, x0, 32

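However, for some instructions, like lui ("load upper immediate"), immediate values are entered like this:

0xyyyyy000

where yyyyy is a 20 bit value (five hex digits) that lands in the top 20 bits of the register; the low 12 bits are set to zero. So the lui instruction is a good way to put a large immediate value into a register: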

 lui x2, 0xc0000000  ;put c0000000 hex in reg 2

But what about the 3 least significant hex digits (the low 12 bits)? Can they never get a value?

Ha! Use addi after a lui. A trick!

addi x2, x2, 32

In this instruction x2 is both the destination and the source register: it already holds 0xc0000000 from the lui, and 32 is the value added into the low bits, leaving 0xc0000020 in x2.
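One catch with the lui + addi trick: addi sign-extends its 12 bit value, so a low chunk of 0x800 or more acts as a negative number and the lui part has to be bumped up by one to compensate. Here is a minimal Python sketch of that arithmetic. The function names are my own for illustration, not anything from Virgule.

```python
def split_lui_addi(value):
    """Split a 32 bit constant into (upper20, lower12) for a lui/addi pair."""
    value &= 0xFFFFFFFF
    lower = value & 0xFFF
    # addi sign-extends its immediate, so 0x800..0xFFF really means a negative number
    if lower >= 0x800:
        lower -= 0x1000
    # the upper part absorbs the difference when lower is negative
    upper = ((value - lower) >> 12) & 0xFFFFF
    return upper, lower


def rebuild(upper, lower):
    """Register contents after lui (upper << 12) followed by addi lower."""
    return ((upper << 12) + lower) & 0xFFFFFFFF


upper, lower = split_lui_addi(0xC0000020)
print(hex(upper), lower)
```

For 0xC0000020 the split is 0xc0000 and 32, which matches the lui/addi pair in the text. For something like 0x12345FFF the upper part comes out as 0x12346 and the lower part as -1.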

STORING VALUES

Virgule emulates a 32 bit RISC-V processor, so  you can store bytes ("SB"),  half words/16 bits ("SH"), or words--32 bits ("SW"). 

The general instruction for storing a byte (bits 7:0) is this:

SB (what to store), (where to store)

so---

SB x1, 0(x2)

SB means take bits 7:0 of what to store, in this case the value in register x1, and copy them to the memory address held in register x2, offset by IMM 0. Register x2 itself is not changed; it only supplies the address.

A store always writes to memory, never to a register. With an offset other than 0 the byte lands that many bytes away from the address in x2; see the "Extra Ram" section below.
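To make the store concrete, here is a small Python model of what sb does, following the RISC-V base spec: the low byte of the source register goes to memory at (base register + offset), and no register changes. The regs list, the mem bytearray and its 3072 byte size are assumptions for the sketch, not Virgule internals.

```python
regs = [0] * 32          # x0..x31
mem = bytearray(3072)    # assumed RAM size, taken from the 0-3071 memory map


def sb(rs2, offset, rs1):
    """sb rs2, offset(rs1): write the low 8 bits of rs2 to memory at rs1 + offset."""
    address = (regs[rs1] + offset) & 0xFFFFFFFF
    mem[address] = regs[rs2] & 0xFF
    return address


regs[1] = 0x12345678     # value to store; only 0x78 is written
regs[2] = 0x100          # base address
stored_at = sb(1, 0, 2)  # same as: sb x1, 0(x2)
print(hex(stored_at), hex(mem[stored_at]))
```

After the call, memory cell 0x100 holds 0x78 and x2 still holds 0x100.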

LOADING VALUES

Works the same way as storing but going the other direction.

LB x3, 0(x2)

Loads a byte into register x3 from the memory address found in register x2.

A load always reads from memory, never from another register. If the offset is 1 or greater, the byte comes from that many memory cells past the address in x2.
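One detail worth knowing about lb: per the RISC-V base spec it sign-extends the byte to 32 bits, while its sibling lbu zero-extends. A rough Python sketch of the difference follows; the helper names and the tiny 16 byte memory are made up for the example.

```python
def lb(mem, address):
    """lb: read one byte and sign-extend it to 32 bits."""
    byte = mem[address]
    if byte & 0x80:               # top bit set means negative
        byte -= 0x100
    return byte & 0xFFFFFFFF


def lbu(mem, address):
    """lbu: read one byte and zero-extend it to 32 bits."""
    return mem[address]


mem = bytearray(16)
mem[4] = 0x7F
mem[5] = 0x80
print(hex(lb(mem, 4)), hex(lb(mem, 5)), hex(lbu(mem, 5)))
```

So a byte of 0x80 shows up in the register as 0xffffff80 with lb, but as 0x80 with lbu.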

EXTRA RAM

Not well documented, but loads and stores give you virtual RAM to keep values in, not just the 32 registers provided.

For instance:

SB x5, 1(x1) 

stores bits 7:0 found in source register x5 into a virtual memory slot, in this case 1 added to the memory location previously stored into register x1. 

If x1 contains zero, and the offset is a 1, you are storing a byte into memory location 1.

Remember that x1 already contains a value, to which an offset is applied. So if x1 already contains the value 3, SB x5, 1(x1) will copy bits 7:0 of register x5 into memory cell 3 + 1, or virtual memory cell 4. 

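As far as how much "extra RAM" we can use in Virgule: as far as I could tell, I could use offsets from 1 to 2047 from (x1), but when I went outside that range things didn't work the way I wanted. That lines up with the RISC-V load/store offset being a 12 bit signed immediate, which tops out at 2047.

However the memory map indicates 0-3071, more like 3K locations rather than 2K, but I couldn’t make things work with offset values > 2047. The registers x0-x31 are separate and don't take up any of that memory. The likely way to reach the higher cells is to put a larger base address in the register and keep the offset small.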
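If the 2047 ceiling really is the 12 bit signed offset (my assumption; I have not confirmed how Virgule reports the error), then getting to a higher cell is just a matter of splitting the address between the base register and the offset. A quick Python sketch, with helper names I invented:

```python
OFFSET_MIN, OFFSET_MAX = -2048, 2047   # range of a 12 bit signed immediate


def fits_offset(offset):
    """True if the offset can be encoded in a load/store instruction."""
    return OFFSET_MIN <= offset <= OFFSET_MAX


def base_and_offset(address):
    """Pick a base (small enough to load with addi) and an offset that both fit."""
    if fits_offset(address):
        return 0, address
    base = OFFSET_MAX
    offset = address - base
    if not fits_offset(offset):
        raise ValueError("address too large for addi; build the base with lui")
    return base, offset


print(base_and_offset(3000))
```

For memory cell 3000 this gives base 2047 and offset 953, which would translate to addi x1, x0, 2047 followed by sb x5, 953(x1).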

It would be great if someone could show me exactly how all the memory 000 to CFF is mapped--because I am missing something here and didn't have time to fully figure it out. 

Comments?  

Unfortunately the sim does not have a way to display what is in this extra virtual RAM. Only register values are shown.

WRITING ASCII TO THE (fake) ASCII DISPLAY

Output from Virgule's virtual ASCII display.  Get the hex code, ready to load into Virgule, here.

Here's one way to do this...

lui x2, 0xc0000000  ;put c0000000 hex in reg 2

addi x1, x0, 75 ;put 75 in reg 1

sb x1, 0(x2) ;store the value in x1 (75 is ascii letter “K”) to the simulated text display

IMPORTANT! Always use offset 0 in the sb above. I assumed that to make the cursor appear at the next spot to the right you'd use offset 1, and for the next, offset 2, etc., but that crashed the simulator.

The simulator will automatically advance the cursor to the next available slot the next time you do an ASCII write. Just keep writing to the ASCII display over and over. This is not well documented, but you can figure it out from the examples.
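Since every character is the same addi/sb pair with a different number, a few lines of Python can spit out the assembly for a whole string. This is only a sketch: the function name is mine, the register choices and the 0xc0000000 display address are copied from the example above, and it assumes plain ASCII text.

```python
def print_string_asm(text, display_reg="x2", char_reg="x1"):
    """Return assembly lines that write each character of text to the text display."""
    lines = [f"lui {display_reg}, 0xc0000000"]
    for ch in text:
        # ASCII codes are below 128, so they always fit in addi's 12 bit immediate
        lines.append(f"addi {char_reg}, x0, {ord(ch)}")
        # always offset 0; the display advances the cursor by itself
        lines.append(f"sb {char_reg}, 0({display_reg})")
    return lines


for line in print_string_asm("OK"):
    print(line)
```

The output is one lui followed by an addi/sb pair per character, ready to type into the simulator.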

WRITING TO THE FAKE LEDs

GPIO's! Of course this simulator has them. Virgule has an impressive array of virtual LEDs, switches, and push buttons. They can be configured by right clicking on one of the GPIO points and choosing what you want (impressive by itself!). You need to set the direction (read vs. write) using an SB instruction, then use subsequent instructions to do other things.

I wrote some simple code to light the first row of LEDs:

First we put the memory location for our GPIO into reg x1:

lui x1, 0xd0000000 ; memory location for virtual GPIO

sb x0, 0(x1) ; this makes the LEDs outputs  

addi x3, x0, 126 ; put 126 into x3 (it can be anything from 1 to 255)

sb x3, 16(x1)

The final instruction writes the byte in x3 to the address in x1 offset by 16. 16 is the memory location for VAL, the values you want to write to the GPIO "pin".

JUMPS

These instructions sounded intimidating after watching videos and online reading, but after experimenting, they are not too difficult:

JAL: jump the program counter by the immediate value, relative to the jal instruction itself; then put the address of the instruction right after the jal (where the program would have gone without the jump) into a register. 

let's look at the instruction

jal x1, +24   

So! If jal x1, 24 is put into instruction "memory" position 00, the program counter jumps to PC 18 hex (24/4 is 6, you are jumping the program counter forward by 6 instruction memory “slots”); then, Virgule puts 04 into register x1, since 04 is the next PC slot after the jal itself. That saved value is the return address.
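The jal arithmetic is small enough to check in Python. This sketch follows the RISC-V base spec, where the jump target is relative to the jal's own address and the saved value is that address plus 4; the function name is just for illustration.

```python
def jal(pc, offset):
    """Return (new_pc, link) for a jal executed at address pc."""
    link = (pc + 4) & 0xFFFFFFFF         # address of the instruction after the jal
    new_pc = (pc + offset) & 0xFFFFFFFF  # target is relative to the jal itself
    return new_pc, link


new_pc, link = jal(0x00, 24)
print(hex(new_pc), hex(link))
```

With the jal sitting at address 00 and an offset of 24, the program counter lands on 0x18 and the destination register gets 0x4.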

JALR: a register plus an offset contain where to jump to.

jalr x3, x2, 0

x3 stores the next program counter “slot” that would have occurred if the jump didn’t happen. x2 holds the address to jump to, offset by value 0.
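Same idea for jalr, sketched in Python under the base spec's rules: the target is the source register's value plus the offset (with the lowest bit forced to 0), and the destination register gets the address of the instruction after the jalr. The function name and the example addresses are mine.

```python
def jalr(pc, rs1_value, offset):
    """Return (new_pc, link) for a jalr executed at address pc."""
    link = (pc + 4) & 0xFFFFFFFF
    # the mask keeps the result to 32 bits and clears bit 0, as the spec requires
    new_pc = (rs1_value + offset) & 0xFFFFFFFE
    return new_pc, link


# jalr x3, x2, 0 sitting at address 0x18, with x2 holding 0x04
new_pc, link = jalr(0x18, 0x04, 0)
print(hex(new_pc), hex(link))
```

Here the jump goes to 0x4 and x3 ends up with 0x1c. Pairing a jal with a later jalr through the saved register is how a simple call and return works.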

ENCODING

If you want to understand RISC-V at its deepest level, check out how assembly is turned into the 32 bit hex values sent to the processor. 

Understanding encoding would be critical if you are writing a compiler, pulling apart RISC-V hex instructions using C, and so on.  A detailed video (long, deep, and quite informative) covering RISC-V encoding is here.

THE WHOLE J, S, R, U thing: RISC-V in its 32 bit form expects all its opcodes, functions, values, etc, to fit into 32 bit words. The opcode is always in bits 6:0. But from there it varies how the values, register/memory locations etc, are encoded. In the RISC-V spec, there are a few different “formats” used, with letters like J, S, R, U to designate how it's done; in other words, each letter designates how a given 32 bit word is encoded.

Encoding instructions: I assumed for everything RISC-V the opcode contained the instruction itself. "When you assume" right? 

Um, no. In some cases the opcode alone isn’t enough. Instead, an instruction's opcode (bits 6:0) can be a general marching order, while one or two “functions” further define what needs to be done during the decode step. They are called “funct3” and “funct7” (function 3 and function 7). You knew that right?

For instance, for ADDI, the opcode is OP-IMM, and the function (funct3) is “ADDI”.  

However, for the LUI instruction, the opcode itself tells the CPU's decoder everything it needs. You knew this as well?
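Here is a rough Python sketch of how those two instructions get packed into 32 bit words, using the I-type and U-type layouts and opcode values from the RISC-V base spec. The function names are mine. Note that encode_lui takes the 20 bit field itself (0xC0000), not the full 0xc0000000 typed into the simulator.

```python
OP_IMM = 0x13        # opcode shared by addi and the other register-immediate ops
OP_LUI = 0x37        # lui has an opcode all to itself
FUNCT3_ADDI = 0b000  # funct3 picks addi out of the OP-IMM group


def encode_addi(rd, rs1, imm):
    """I-type: imm[11:0] | rs1 | funct3 | rd | opcode."""
    if not -2048 <= imm <= 2047:
        raise ValueError("addi immediate must fit in 12 signed bits")
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (FUNCT3_ADDI << 12) | (rd << 7) | OP_IMM


def encode_lui(rd, upper20):
    """U-type: imm[31:12] | rd | opcode."""
    return ((upper20 & 0xFFFFF) << 12) | (rd << 7) | OP_LUI


print(f"{encode_addi(1, 0, 32):08x}")   # addi x1, x0, 32
print(f"{encode_lui(2, 0xC0000):08x}")  # lui x2 with upper bits 0xC0000
```

The addi comes out as 02000093 and the lui as c0000137, which you can compare against the hex the simulator shows for the same instructions.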

Encoding Register values: to try to better understand this, I dug into how a 32 bit RISC-V instruction knows what registers we want to use for an instruction. 

Turns out RISC-V, in its 32 bit version, uses 5 bits for register values. Hello? That allows for 2 raised to the 5, or 32, registers.

So, how does the system encode sending the value 0xC0000000 to register x30, and not send it to x7 by mistake? 

Here is how: mask everything else in the 32 bit word that, by the aforementioned letter designation, does not represent a register, then turn the 5 bits left into a decimal number. For instance, an RD of 0 0 1 0 1 means we want to use register x5.
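The masking step looks like this in Python. The bit positions (rd in bits 11:7, rs1 in 19:15, rs2 in 24:20) are from the RISC-V base spec; the helper names are my own. Which of the three fields actually mean anything depends on the format letter: a U-type like lui only has rd, and an S-type like sb has rs1 and rs2 but no rd.

```python
def field(word, low, width):
    """Shift the word down to bit `low` and keep `width` bits."""
    return (word >> low) & ((1 << width) - 1)


def decode_registers(word):
    """Pull the three 5 bit register fields out of an instruction word."""
    return {
        "rd": field(word, 7, 5),
        "rs1": field(word, 15, 5),
        "rs2": field(word, 20, 5),
    }


lui_x30 = 0xC0000F37   # lui with rd = x30 and upper bits 0xC0000
sb_x1_x2 = 0x00110023  # sb x1, 0(x2)
print(decode_registers(lui_x30)["rd"])
print(decode_registers(sb_x1_x2))
```

For the lui word the rd field decodes to 30, so the value goes to x30 and not x7. For the sb word, rs1 is 2 and rs2 is 1.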

The entire encoding process, turning commands into bytes, opcodes + functions into instructions, numbers into signed or unsigned numbers, etc., is a bit complex, but like all things RISC-V--it's manageable if you put a bit of time into digging deeper. 

Overall I figure RISC-V is what happens when friendly, helpful, well meaning college professors and grad students--as opposed to apathetic/screw-you-I'm-tenured/screw-you-again-I'm-a-certifiably-bonkers-T.A./I-give-not-a-poop-about-you-wanting-to-learn-things-I'm-too-busy-bedding-coeds/I'm-busy-boozing-and-making-my-research-paper-deadline-so-don't-bug-me basket case profs I had back in my college days--create a cool ISA for pimply faced freshmen.  

COVID CODA  

I will spare you the puns of RISCing things by traveling abroad...too late.

Experimenting with RISC-V using this simulator helped me keep my limited sanity during the days I was locked in a hotel room, waiting for the Portuguese government to let me back on the plane. I have to say that for a vacation with a big bummer ending, this RISC part of the trip was fun! 

RISC-V is a blast to code as I see it--it's fun, interesting, and for those wanting to try something new that's even legal: highly recommended.

In the future I might create some RISC-V based projects and get my sponsors PCBWAY (had to get a plug in right?) to help with the fab. Stay tuned.

In the meantime: there are code examples included in Virgule but some of them seemed a bit complex to me, so I wrote some of my own code, attempting to make things as simple as possible. 

My trail of breadcrumbs? Sure. You can get my code examples, in hex, ready to load into Virgule/emulsiV, with a brief readme about what does what.  

Go to my github--here.

As far as COVID-19: turns out I was lucky and my symptoms were mild. In six days I got a doctor's note and was back on the plane to the good old USA. Now I am happily blogging, and the weather here is fantastic. 

COVID-19! It's what you get when you breathe the fumes.  I would not feel so all alone...Keep coding, jester. 

