Data Structures I: Stack and Heap
When learning Rust, the recommended resource is the Rust book which gets you up to speed on the language. However, depending on your prior programming experience (coming from garbage collected languages) or your CS background (you know your data structures), there’s a lot you’ll be hearing for the first time when reading the book.
In this series, I’ll go over various data structures and not just regurgitate implementations of each one but take a deeper-than-theory dive into their capabilities and relevance in the programming industry. This article hopes to better your understanding of 2 data structures: The Stack, and The Heap and their final full-fledged implementations can be found here .
Each section will follow a similar format with 3 subsections:
- Theory: An introduction to the data structure (for readers with no data structures background).
- Implementation: Building the stuff.
- Deep Dive: A deep dive into an important use case.
I also won’t be discussing time and space complexities so as to keep things straightforward.
Although I intend to make this post as accessible as I can, experience with Rust is assumed. I recommend you go through the Rust book first and coming back when comfortable if you haven’t already.
The Stack
This section goes through the implementation of a Stack and is more beginner-focused, and work with concepts from Chapters 1 - 10 of the book.
Theory
Stacks are simply collections of elements, in which the last element in is the first element out — Last In First Out (LIFO). The stack is primarily defined by two operations and both of which interact with the top of the stack:
- Push: Adds an element to the top of the stack.
- Pop: Removes an element from the top of the stack.
Keep in mind that I did not say “contiguous collections” as this depends on the backing store of the Stack.
Implementation
To begin, define the Stack struct:
const SIZE: usize = 256;
struct Stack<T> {
top: usize,
data: [T; SIZE],
}The Stack contains two fields: a usize used to identify the top of the stack — for push and pop operations — and an array able to hold 256 elements of type T. As the backing store of the stack is an array, the underlying data would be stored contiguously in memory making it more cache friendly thanks to locality. There’s also the guarantee of zero memory allocations during the lifetime of the stack as sizes of arrays are fixed and known at compile times. Since the stack will own all its data, there’s also no need to worry about the borrowing.
This is a simplified illustration of how the Stack would look like in memory:
Time to implement the essential — push and pop — and non-essential methods for the Stack struct:
#[derive(Debug, PartialEq)]
enum StackError {
StackOverflow,
StackUnderflow,
}
impl<T: Default + Copy> Stack<T> {
fn new() -> Self {
Self {
top: 0,
data: [T::default(); SIZE],
}
}
fn push(&mut self, value: T) -> Result<(), StackError> {
if self.top == SIZE {
return Err(StackError::StackOverflow);
}
self.data[self.top] = value;
self.top += 1;
Ok(())
}
fn pop(&mut self) -> Result<T, StackError> {
if self.top == 0 {
return Err(StackError::StackUnderflow);
}
self.top -= 1;
Ok(self.data[self.top])
}
fn peek(&mut self) -> Option<T> {
if self.top == 0 {
return None;
}
Some(self.data[self.top - 1])
}
fn clear(&mut self) {
self.top = 0;
}
fn length(&self) -> usize {
self.top
}
fn capacity(&self) -> usize {
SIZE
}
}There you have it! A stack in all its glory.
Deep Dive (Call Stack)
If this is your first time implementing a stack, you’re probably thinking to yourself: “Is that it?”… yeah, that’s it. Despite how simple stacks are, they’re incredibly powerful and have a lot of applications like backtracking and expression evaluation but I’d argue that the most popular application of the stack is the Call Stack which is what the Rust book and other programming languages keep referring to.
The Call Stack simply keeps track of the execution of various procedures and subroutines. When you make a call to a function, the call stacks helps the program know what point to hand control back to after the called function returns. This is the main purpose of the stack. There are other usecases and you can check them out here . To see this in action, consider this C program:
#include <stdio.h>
int add(int a, int b) { return a + b; }
int square(int x) { return x * x; }
int main() {
int n = add(1, 2);
int k = square(n);
printf("Final Result: %d\n", k);
return 0;
}Compile the program using gcc and load the binary into gdb so you can debug and inspect the execution:
gcc -g main.c -o main
gdb mainRunning these commands will compile and load up the interactive debugger. The -g flag tells the compiler to include a symbol table so that the debugger can map memory addresses back to the actual source code. The important stuff happen at lines 7 and 8 so set breakpoints for those lines:
(gdb) break 7
Breakpoint 1 at 0x1164: file main.c, line 7.
(gdb) break 8
Breakpoint 2 at 0x1176: file main.c, line 8.If you disassemble the program, you can inspect the set breakpoints in the machine instructions by running: disassemble main:
(gdb) disassemble main
Dump of assembler code for function main:
0x000000000000115c <+0>: push %rbp
0x000000000000115d <+1>: mov %rsp,%rbp
0x0000000000001160 <+4>: sub $0x10,%rsp
0x0000000000001164 <+8>: mov $0x2,%esi
0x0000000000001169 <+13>: mov $0x1,%edi
0x000000000000116e <+18>: call 0x1139 <add>
0x0000000000001173 <+23>: mov %eax,-0x4(%rbp)
0x0000000000001176 <+26>: mov -0x4(%rbp),%eax
0x0000000000001179 <+29>: mov %eax,%edi
0x000000000000117b <+31>: call 0x114d <square>
0x0000000000001180 <+36>: mov %eax,-0x8(%rbp)
0x0000000000001183 <+39>: mov -0x8(%rbp),%eax
0x0000000000001186 <+42>: mov %eax,%esi
0x0000000000001188 <+44>: lea 0xe75(%rip),%rax # 0x2004
0x000000000000118f <+51>: mov %rax,%rdi
0x0000000000001192 <+54>: mov $0x0,%eax
0x0000000000001197 <+59>: call 0x1030 <printf@plt>
0x000000000000119c <+64>: mov $0x0,%eax
0x00000000000011a1 <+69>: leave
0x00000000000011a2 <+70>: ret
End of assembler dump.Depending on your system architecture and optimization levels, you might get different instructions. I’m building on an x86_64 system architecture.
GDB places the breakpoints at the first instruction of each source line, effectively pausing execution before the arguments are loaded into registers for the function call. I don’t want to get too deep into the nitty gritty but if you want to learn more about Assembly, The diagram below provides a very simplified depiction of what’s happening in memory when we run the loaded program:
(gdb) run
Starting program: /home/simon/Projects/Dummy/main
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, main () at main.c:7
7 int n = add(1, 2);