Saturday, January 18, 2014

I'm back - let's write some assembly!

Hey! It's been a while. I'll go ahead and explain where I've been, and then talk about some stuff I've been working on recently. I stopped posting before the start of last summer, where I got to intern at a really cool company called Freefly, who primarily make giant camera-hauling multirotors for professional cinema, as well as more recently a sweet 3-axis handheld camera stabilizer. They're based in my hometown, Seattle, and staying there was pretty awesome because I got to rock climb every weekday evening, and then head to the dropzone and skydive every weekend! Unfortunately this left me without a lot of time for personal projects, but I did manage to pick up my skydiving license towards the start of the summer, and I'm up to 54 jumps now. Here's a pretty cool picture my friend Andrew from the MIT Skydiving Club took of me this fall:


I also have a video from early in the summer, just after I got my license. I'm wearing all black with a black helmet and a navy blue parachute with red highlights:


Anyway, enough about that. Let's move on to the really interesting stuff - x86 assembly! Last semester I took a pretty cool class called Intro to Robotics, in which we were provided with 3D printed, jointed legs, and had to build a driven exoskeleton to allow them to walk on a treadmill. I was primarily in charge of my team's programming and electronics (it was a MechE class so I had the most experience in these), and I was able to write some functional code and get it to walk:


The robot uses a gyro inside the hip to determine the hip angle, and has a control loop which speeds up the walk cycle as the hip tilts forwards, allowing it to match the speed of the treadmill. I also got to witness a bit of control system magic when the robot walked up on the front piece of the treadmill (stepping off of the belt entirely), paused for a second, and then started walking backwards back down onto the treadmill. Since walking too far forward had caused the hip to tilt backward, it actually slowed down the walking cycle until it reversed it entirely. Unfortunately this wasn't captured on video, but it was a pretty magical moment!

Despite this, I wasn't too proud of my code because to me it felt pretty hacked together. I decided I really wanted to make an effort to get better at programming - of course step one was to immediately add a linux (Ubuntu) install to my laptop, and after working on a few Python projects I decided I wanted to try and learn some assembly language. Despite having spent a fair amount of time with various microcontrollers, everything I've written for them has been in C, so I didn't have any experience writing assembly until now. I thought that it would be fun to write something that runs on my laptop, so I installed nasm, a common assembler, and started reading online tutorials. I'll try to keep this post limited to explaining what I've done, rather than making it an assembly tutorial on its own, but I'll link you a few of the tutorials/pages I found useful if you're interested in really learning this stuff on your own:

http://docs.cs.up.ac.za/programming/asm/derick_tut/ - very useful tutorial, explains procedures, the stack, jumping, and just generally how to get started.
http://leto.net/writing/nasm.php - a very similar tutorial, slightly less in-depth
https://www.cs.uaf.edu/2005/fall/cs301/support/x86/nasm.html - a cheat sheet with some common commands
http://home.myfairpoint.net/fbkotler/nasmdocc.html - the entire nasm instruction set reference
http://blog.rchapman.org/post/36801038863/linux-system-call-table-for-x86-64 - a table of system calls (more on what that is later) for 64-bit linux

Unfortunately all of those tutorials are for 32-bit linux, and since I wanted to write for my 64-bit machine I had to use different registers than the ones in their examples, as well as having entirely different system calls.

First let's go over what I wrote, and then I'll explain it a bit. I started out writing a program that takes in a command line argument (when you run a program from the command line you can give it one or more 'arguments', which are just strings that get pushed to the stack) and then prints it back out. This really isn't useful to me (more useful versions of this already exist, like echo), but it's a good place to start learning. Here's the behavior:

The battery monitor at the bottom [charge state, charge%, remaining time] is one of the Python projects I mentioned earlier
You'll see me calling the program to run it: ./args_old, and then giving it a single string: "this is a sample string!" as a command line argument. The backslashes are just escapes for the spaces and exclamation mark, so they become part of the string instead of splitting the text into separate arguments. As you can see, running it will just print the string I gave back through standard output.

Now, let's take a look under the hood:

section .text
global _start

_start:
pop rax ; get number of args
pop rbx ; get program name
cmp rax,2
jne _bad; if there isn't just 1 arg, go to _bad
pop rcx ; first argument

call _length_entry

_output:
mov rax,1
mov rdi,1
mov rsi,rcx
mov rdx,[strLen] ; length of str into rdx

syscall ; kernel halp

_newline:
mov rax,1
mov rdi,1
mov rsi,ten
mov rdx,1
syscall

_exit:
mov rax,60 ; exit
mov rdi,0 ; don't error
syscall ; laterz

_bad:
mov rax,1
mov rdi,1
mov rsi,feelbad
mov rdx,feelLen
syscall
jmp _exit

_length_entry:
mov qword [strLen],0
jmp _length_test

_length:
add qword [strLen],1
add rcx,1

_length_test:
cmp byte [rcx],0
jne _length
sub rcx,[strLen]
ret ; if null byte, ret

section .data
strLen dq 0 ; reserve some space for length
ten db 10 ; -_-
feelbad db 'Your args are bad and you should feel bad.',10
feelLen equ $-feelbad

Woo, assembly! 60+ lines just to print something back at you. In order to make a system call to write, you have to give it both a string and the length of the string, so it knows where to stop. Because of this, I had to write a routine that counts the length of the argument, conveniently named _length. Basically it uses the strLen as a counter, counting characters in the string until it hits a null byte (0), meaning there are no more characters and it's hit the end of the string. The register rcx holds a pointer to the argument string from the start. It gets shifted forward in each loop of _length to always point to the next character until finding the null byte, then it gets subtracted by strLen so that it points back to the beginning of the string. Also, if you don't have exactly one argument, it'll jump to the label _bad which prints an error message. Finally, the _output routine sets registers to make a system write call, puts the string pointer into the rsi register which is where the system looks for the string to write, and similarly puts strLen into rdx. Then syscall actually tells the kernel to run the instruction.

Once I got that working, I decided to make it a bit more interesting. How about making it take in two arguments, both a number %n and a string, and then printing the string %n times?



Because arguments are always pushed to your program as strings, this means that something like "3523" is still a string, so I had to write a routine to convert a string into an integer. Here's the full code for the new version (also a pastebin):

section .text
global _start

_start:
pop rax ; get number of args
pop rbx ; get program name
cmp rax,3 ; make sure there are 2 arguments (+1 for program name)
jne _bad ; if not, jump to error message
pop rdx ; first argument
pop rcx ; second argument

call _length_entry
mov [stra],rcx
call _atoi_entry
call _text_entry
call _exit

_output:
mov rax,1
mov rdi,1
mov rsi,[stra] ; move pointer of the string start into rsi
mov rdx,[strLen] ; length of str into rdx
syscall ; kernel halp

_newline:
mov rax,1
mov rdi,1
mov rsi,ten
mov rdx,1
syscall
ret

_exit:
mov rax,60 ; exit
xor rdi,rdi ; don't error
syscall ; peace out

_bad:
mov qword [stra],feelbad ; write feelbad to output string
mov qword [strLen],feelLen ; write feelLen to output string length
call _output
jmp _exit

_length_entry:
mov qword [strLen],0 ; zero string length
jmp _length_test

_length:
inc qword [strLen] ; increment string length
inc rcx ; string pointer to next character

_length_test:
cmp byte [rcx],0 ; check if null character
jne _length
sub rcx,[strLen] ; set rcx back to beginning of string
ret ; if null byte, ret

_atoi_entry:
xor r9,r9 ; zero accumulator
mov [asciinum],rdx ; write string pointer to [asciinum]

_atoi:
xor rax,rax ; set rax to 0
mov r10,[asciinum] ; pointer to asciinum in r10
mov al,[r10] ; first byte of string into first byte of al
cmp al,0 ; make sure we haven't hit end of string
je _atoi_exit

lea r11,[r9*8]
lea r9,[r11+r9*2] ; multiply r9 by 10

sub al,48 ; ascii -> digits
cmp al,9 ; make sure they're numbers 0-9
ja _bad

add r9,rax ; incriment accumulator
inc qword [asciinum] ; shift pointer to string up by 1 (next character)

jmp _atoi

_atoi_exit:
mov [asciiInt],r9 ; write accumulator to asciiInt
ret

_text_entry:
xor r12,r12 ; zero a counter
jmp _text_mult_cmp

_text_mult: ; text_mult is effectively a while loop that prints our
; string r9 times, since r9 is our accumulator from before
call _output
inc r12

_text_mult_cmp:
cmp r12,r9
je _exit
jmp _text_mult


section .data
stra dq 0 ; reserve space for string pointer
strLen dq 0 ; reserve some space for length
asciinum dq 0
asciiInt dq 0
ten db 10 ; sigh. set the value of ten to be 10
feelbad db 'Your args are bad and you should feel bad. Proper syntax: args <int> <str>'
feelLen equ $-feelbad ; set length of feelbad to feelLen


You can see the label _atoi which converts a string to an integer. It's similar to _length, except each time it moves to a new character, it multiplies the whole thing by 10. This makes sense if you think about how a base-10 system works, if you type in a single digit - say, 4 - then it reads the first character, adds it to the accumulator register r9, finds that the next character is a null byte, and writes 4 to asciiInt. If you give it two digits - how about 24 - then it reads the first character, adds it to the accumulator (now the accumulator = 2), finds the next character isn't a null byte, multiplies the accumulator by 10 so the accumulator = 20, reads in the next character (4), and adds it to the accumulator to get a value of 24. The algorithm works the same way as you continue increasing the length of the number you give it. My program also checks each character with cmp al,9 then ja _bad which makes sure it's 0-9 instead of some other ascii character, and if not then it goes to the _bad routine to print an error message.

Here's a quick demo of the error message:


It also still prints the error message if the number of arguments given isn't exactly 2.

That's pretty much it!


Once you write assembly code like my example above, you need to assemble it into machine code, which I do with nasm -f elf64 args.asm. This produces an object file (args.o) which is just raw machine language. There's also a program called objdump which can print out the object file and the associated instructions if you do something like objdump -d args.o, the output of which can be found here on pastebin (check it out, it's cool!). After making the object file, I then use the linker ld to convert it into an executable, for example ld -s -o args args.o which produces an executable file called args. To debug, there's a great program called gdb which allows you to step through your code one instruction at a time, and check all the register values at each step.

Unfortunately the speed of my program is pretty much limited by the overhead of the system write calls, but I still got it to write a 10 character line 10,000,000 times (into /dev/null which is a place to dump unwanted text) in about 6 seconds.

If you're running 64-bit Linux and want to try this out for yourself, you can download the executable from https://www.dropbox.com/s/7t6shbarnr1zvcw/args.

I may not have made any explicitly useful programs, but I definitely learned a lot about how my computer works by learning some assembly. In the future if I need to write a function to do some really fast math, I could definitely see myself implementing it in assembly then calling it every time I need it. I definitely recommend doing this kind of thing on your own, it's fun!

In addition to programming, I've also been doing a fair amount of electronics work recently. Next post will be a write-up of a new board I've put together!