Thinking Algorithmic in Assembly

Thinking Algorithmic in Assembly

Note: This article was first published in Hackernoon by myself. This is simply a copy of the article.

Data structures and algorithms are essential to computer science. Yes, it is true that most software engineers don’t implement them on an every day basis. However, I do strongly believe that a good foundation of data structures and algorithms can help make you a better, overall engineer.

Today I’m going to be demoing an implementation of bubble sort in assembly. To most people, the thought of this probably sounds insane. You’re probably thinking, “Why on earth would you implement an algorithm in assembly?” Well to start off, I would probably never write a production application in assembly. Compilers do a WAY better job of writing assembly than I do. However, I do think it is important to understand assembly. Most of us run commands to compile our code, and I think it’s really easy to take for granted the work the compiler is doing for us. We all use loops, conditionals, and functions on a daily basis. Have you ever taken the time to understand how these are implemented at the assembly level? Chances are, if you majored in computer science, you might have explored assembly. However, if not, keep reading!

The second question you might be asking is “Why bubble sort?” For those not familiar with algorithms, bubble sort is generally considered one of the slowest sorting algorithms out there. The only main reason I chose bubble sort is because it’s fairly easy to implement and writing these algorithms out in assembly can be pretty tricky.

The implementation below isn’t perfect. Like I said, compilers do a way better job of writing assembly than I do. The goal here was not to write a perfectly optimized algorithm in assembly. It’s simply just a demonstration and a fun weekend project if you’re a computer science geek like myself. I chose MIPS as I have the most experience writing assembly for that architecture. MIPS isn’t widely used versus things like x86 and ARM, however it’s still widely taught today.

Okay, enough talking for now. Here’s the code:

# an interative approach to bubble sort
# c/c++ implementation: https://codescracker.com/cpp/program/cpp-program-bubble-sort.htm
.data
	arr: .word 7,4,6,9,1,3
	N: .word 6
.text
	la $s0, arr # get address of array
	la $s1, N # get address of N
	lw $s1, 0($s1) # get value of N
	.globl main
	main:
		add $a0, $s0, $0 # get address of arr in param1
		add $a1, $s1, $0 # put value of N in param2
		jal bubbleSort # sort arr
		jal printValues # print the values
		j exit # exit
	bubbleSort:
		add $sp, $sp, -8 # make room for 3 ints on stack (i, j, tmp)
		sw $s0, 0($sp) # i
		sw $s1, 4($sp) # j
		sw $s2, 8($sp) # tmp
		add $s0, $0, $0 # use i in for loop and initiate to 0
	outerLoop:
		addi $t0, $a1, -1 # n -1
		slt $t0, $s0, $t0 # i < (n - 1)
		beq $t0, $0, return # if false exit
		add $s1, $0, $0 # set j = 0
	innerLoop:
		sub $t0, $a1, $s0 # n - i
		addi $t0, $t0, -1 # (n - i - 1)
		slt $t0, $s1, $t0 # j < (n - i - 1)
		beq $t0, $0, exitInner # if false go to next iteration of outer loop
		sll $t0, $s1, 2 # get ready to do arr[j]. Shift 4 bits cause we have ints
		addi $t1, $t0, 4 # get ready to do arr[j + 1]. Shift 4 bits cause we have ints
		add $t0, $t0, $a0 # add address of j and arr
		add $t1, $t1, $a0 # add address of j + 1 and arr
		lw $t2, 0($t0) # *arr[j]
		lw $t3, 0($t1) # *arr[j + 1]
		slt $t4, $t3, $t2 # *arr[j + 1] < *arr[j]
		bne $t4, $0, swap
		addi $s1, $s1, 1 # j++
		j innerLoop
	exitInner:
		addi $s0, $s0, 1 # i++ in outerLoop
		j outerLoop
	swap:
		add $s2, $t2 $0
		sw $t3, 0($t0) # arr[j] = arr[j+1]
		sw $s2, 0($t1) # arr[j + 1] = tmp
		addi $s1, $s1, 1 # j++
		j innerLoop
	return:
		lw $s0, 0($sp)
		lw $s1, 4($sp)
		lw $s2, 8($sp)
		addi $sp, $sp, 8
		jr $ra
	printValues:
		addi $sp, $sp, -8 # make room for i on stack
		sw $s0, 0($sp) # store i on stack
		sw $a0, 4($sp) # store base addr of arr
		add $s0, $0, $0 # initialize i to 0
	printLoop:
		lw $t1, 4($sp) # get base addr of arr off stac
		slt $t0, $s0, $a1 # i < N
		beq $t0, $0, returnPrintValues
		sll $t0, $s0, 2 # shift 4 bytes to get next elem
		add $t0, $t0, $t1 # get addr of arr[i]
		lw $t0, 0($t0)
		li  $v0, 1           # service 1 is print integer
		add $a0, $t0, $zero  # get ready to print val
		syscall
		addi $s0, $s0, 1 # i++
		j printLoop
	returnPrintValues:
		lw $s0, 0($sp)
		lw $a0, 4($sp)
		addi $sp, $sp, 8 # restore stack pointer
		jr $ra
	exit:
		li $v0, 10 # exit
		syscall