I was looking for an interesting hobby project to hone my AVR assembly skills when I found the Arduboy, a sturdy “Arduino” I could keep in my pocket.
There were a lot of fun games already and even some in 3D but no filled-polygon games? This had to be rectified.
First, what do we have to work with?
.arduboy Distribution. We have also put together a.arduboy distribution for use with game loader tools such as Arduboy Manager (Windows only). These allow you to load games onto the Arduboy without having to compile them yourself. The.arduboy archive for Evade is.
The Arduboy is a portable game console compatible with the Arduino IDE and uses the same microcontroller as the Arduino Leonardo, an ATmega32u4 but unlike the Leonardo board it comes with hardware pre-attached all in a nice sturdy casing.
The ATmega32u4 contains an 8 bits CPU, a tiny bit of RAM, some flash ROM for the program and data, a tiny EEPROM for saving data when the power goes out, and features a bunch of GPIO pins to connect to external hardware.
CPU | 8bits ATmega32u4 |
Speed | 16Mhz |
SRAM | 2.5KB 1KB taken by the display buffer 1.5KB left to work with |
Flash ROM | 28KB For the application 4KB For the boot loader 32KB Total |
EEPROM | 1KB |
Screen | 128×64 1bpp monochrome OLED |
Buttons | D-pad and 2 buttons |
Sound | 1 bit x 2 channels (or 3 levels.) |
Speaker | Piezoelectric speaker |
Software Controllable LEDs | 1 RGB, 1 Green, 1 Yellow |
External IO Ports | 1 micro USB port. Device mode only. |
The display is 128×64 1bpp which means that a full frame takes away 1KB of our memory. Unless we’re generating the display on the fly (which isn’t practical for a polygon-based 3D game, I tried. More on that later) we’re left with 1.5KB of RAM for all our working data including the CPU stack.
It is connect to the MCU via one-way SPI and we don’t get the v-sync signal from the OLED controller as that particular display manufacturer did not expose the signal on the ribbon cable. <jackie_chan_wtf.jpg>
There is no audio generation circuitry. The sound has to be generated in software through an IRQ for each PWM pulse. Eating away at our precious CPU time.
The early tests
In late May 2018 when I got my unit I started with a filled-polygon rasterizer to see if it’d be possible to get anything at a playable speed.
Then I started experimenting with different game concepts. To see which styles would fit the OLED display better, the engine’s performance, ROM space, limited controls, etc.
I settled for a concept based on Nintendo’s Star Fox on the SNES.
SNES | Arduboy |
16 bits | 8 bits |
3 CPUs (65816, SuperFX, SPC700) | 1 CPU |
ROM: 1024KB | ROM: 28KB |
Main RAM: 128KB Video RAM: 64KB Audio RAM: 64KB SuperFX RAM: 256KB Total RAM: 512KB | 2.5KB RAM |
10Mhz (SuperFX) | 16Mhz (Ha! Take that!) |
Saving time and preparing for the inevitable debugging ahead
Debugging on MCU hardware is never as convenient as on your favorite workstation in your favorite IDE with all the nice tools like boundary checkers and log files.
So I moved the project out of the Arduino IDE and into my own Makefile-based build system with a native PC port so I could do as much testing on PC without having to program the flash every time.
It also lets me do some convenient time-saving things like showing the bounding boxes and bezier paths in a level preview window and live-editing the levels and sounds.
If you’re planning on spending months of your spare time working on a game invest time in making your life easier first. Doesn’t take long to pay off.
Throw out what you don’t need
Moving out of the Arduino IDE let me throw out the USB library and a few others pieces. But that meant I didn’t have access to the USB serial port for debugging…
Remember that part about having the engine & game running on PC too?
I’ll live.
The OLED Display
By default the display controller is in “horizontal mode” which is fine for 2D blitting but isn’t convenient for rasterizing polygon.
Luckily the controller has a “Vertical Addressing Mode” which turns the memory mapping into a familiar and rasterizing-friendly 64×128 screen (sideways) of 8×128 bytes starting at the top left, going right, with the lsb on the leftmost pixel (0x01 = leftmost, 0x80 = rightmost).
How to play n64 games on mac for free. Wait a few seconds until the download process is finished.Install the new “setup Project64.exe” file onto your computer by double-clicking on it. Since the given article is devoted to installing the emulator on the Windows computer, we say that you’ll need to select the “Windows” version of the emulator, you’ll need to click the conspicuous yellow “Get Project64 for Windows” button and downloading will start. When you’re on the site’s homepage, you’ll be asked to choose between the Project64 Windows version and its Android version. Follow next simple steps to run N64 games using Project64 on your PC.
The initialisation commands used:
The great thing about making a 3D engine is you don’t really care which way the screen is. You can just fix this in the projection matrix.
Rendering
I’m not going to get into the details of rasterizing polygons as there’s already plenty of books and tutorials about this both online and in dead tree format. Pick your favorite.
As long as texturing isn’t involved the CPU can handle it just fine. With 32 registers it’s very apt at this job.
However I used a little trick for the sky box. It’s actually a dynamically generated 2×96 texture drawn over the screen in a skewed fashion according to the camera’s angle by projecting the horizon onto the edges of the screen to figure out the height and slope.
The lines representing the ground are drawn onto the background pattern before clearing the frame buffer with it. Their position is calculated according to the projection matrix.
Let’s butcher some 3D models
ROM space is at a premium here so obviously data needs to be compressed somehow. We also don’t have much RAM to work with so everything will have to be decompressed (or generated!) on the fly.
And a balance has to be struck between the saved data and the decompressor code. Each instruction on the CPU is 2 bytes and this can add up quickly.
2019 apple mac pro. I didn’t want to model 3D objects by entering vertex values by hand so I made an OBJ to C++ converter-compressor tool and modelled the objects in Modo.
The first obvious place to save bytes is to mirror your models whenever possible so the compressor looks for matching mirrored polygons on all 3 dimensions (X,Y,Z) and merge them into mirrored batches.
This means all I need is 3 bits to tell the decompressor if it has to repeat the poly batch mirrored on X, Y, and/or Z.
This all fitted nicely into a 16bit batch header VVZYXIIIIICCCCCC where
- V is the number of vertices per poly-1 (dot, line, tri, quad),
- ZYX are the mirror bits,
- I is the index offset of vertex 0 in the batch,
- C is the number of primitives to render in the batch.
The engine uses 32bits signed fixed point math in a 16.16 format (16 bits of fraction). This meant 12 bytes per vertex but this is much too wasteful.
That’s when floating points become your friend.
When using GOG galaxy right-click Transport Fever 2 in the library or select “More” next to the “Play” button, then select “Manage Installation → Show folder”. Transport fever save game location gog mac. Creating a GOG system report If you have installed the game using our.sh installer, open your default terminal emulator, and run the following commands (don't forget to adapt the first command depending on the game and its location.Use TAB to auto-complete the location and prevent typos and mistakes, and RETURN key - 'Enter' - to confirm): cd /GOG Games/.your game./start.sh -sysrep. C:UsersYOURnameAppDataRoamingTransport Fever 2 Should be the same as GOG version. It will automatically detect an existing installation and reinstall all game files, while keeping your save games and settings intact. When using GOG galaxy right-click Transport Fever in the library or select “More” next to the “Play” button. The Mac version will be available on the Mac App Store, Steam and GOG.com. A beta test will be conducted prior to the release. More information will be announced in October. We are very happy to finally welcome the Mac community and send best regards from the whole Urban Games team.
But wait! Floats are 32bits! That’s still 12 bytes!
Yes, standard floats are but I’m not using that.
The trick I used was to encode the vertices as 4bits of significand and a shared 4 bits of exponent, for a total of 2 bytes per vertex.
Arduboy Manager
Modern GPUs use a similar trick to encode HDR colors such as RGB9_E5_EXT in OpenGL in only 32bits.
This is where the 3D models get butchered by the compressor tool but with careful adjustment of the vertices I could get the models to look fine even after rounding. Beggars can’t be choosers.
Each poly has a 4bit “color” index into a 16 entries palette that maps to 9 different dithering patterns. The indirection allows me to create the pulsing brightness animation and blinking.
Animations
Usually animations in 3D engines are done using bone matrix transforms with complex bezier curves but this was going to be much too large for this project and the poor little CPU could not have dealt with multiple bone matrices in any reasonable time frame.
So animations were done using simple vertex interpolation instead. When vertices are only 2 bytes (and mirrored!) it becomes more reasonable for the few meshes that would get animated.
Bounding boxes and collisions
Ugh! more bytes, right?
Nope. Not even going to store any bounding boxes.
Arduboy Game Loader Machine
All the objects have a scaling value to reuse the meshes and the bounding box was implicitly 1x1x1 centered around the object’s origin. (-0.5 to 0.5) then scaled by the instance’s scaling.
Complex collisions were created using invisible objects.
You’d think the extra invisible objects would add more bytes than storing bounding boxes, however the extra code complexity necessary to handle multiple bounding boxes of varying sizes and positions per objects out-weights the savings.
The level
The level was encoded using spawning sections that are indexed to be reused.
The game contains a single intro level where elements are gradually introduced to let new players learn the controls.
After which a random level generator picks from 3 pools of level segments:
- Dangers
- Breaks
- Energy Refills
When the game starts only the first few segments of each category can be selected by the RNG.
As the score goes up the game:
- Accelerates from 1x at 0 points to 2x at 128K points.
- The more dangerous segments from each pool get enabled.
- More dangers get selected before breaks are picked.
- More breaks get selected before an energy refill gets picked.
The random number generator is seeded with a constant so the same level gets generated every time to keep the high scores fair.
Text rendering
The font is encoded as 5×3 pixels per glyph into a 16bit value.
The glyph is first unpacked into an 8×5 buffer (remember the screen is on its side).
A style buffer tells the text rendering routine how to render the multiple text passes:
- Expand Border
- Fill 8×5 Cell
- NAND Mask Draw (black)
- XOR Draw (invert)
Drawing white text is a mask to set to black followed by an invert.
It seems wasteful but the black outline step takes care of clearing to black so the 2nd pass only needs to invert.
This allows various styles using the same text routine and saves on code space.
Reducing the lag
To help controlling the game when the frame rate dips the inputs are read 60 times per second (same rate as the game play logic) and stored into a circular buffer so inputs are processed at their input time.
Internally the game simulation runs at 60 ticks per second regardless of frame rate.
The killer math
Doing 32bit fixed point 3D math really kills that poor 8bit CPU.
Around 50% of the time is spent doing the geometry transforms, another 30% for the clipping.
I had to write the 16.16 x 16.16 -> 16.16 fixed point multiplication code in assembly to get a decent frame rate.
The main advantage other than not doing the ridiculously overkill 32 x 32 -> 64 fixed point math in C++ was that I was able to skip some of the long-multiplication steps as 3D vertex*matrix transform has a lot of zero bytes the way the models are rounded off so the assembly code skips an entire (byte * 4 bytes) multiply-accumulate chunk of code for each zero bytes.
Step 1 and Step 4 (not pictured) can be entirely skipped in this example.
The fastest routine is the one that does nothing. This was the biggest gain in speed.
The tally
Data | 3220 Bytes |
Code | 25232 Bytes |
Frame buffer | 1024 Bytes |
Work memory | 1314 Bytes |
Stack left | 222 Bytes |
Arduboy Programming
Download the .arduboy file (98KB Turbo version) (Classic version: .arduboy file (100KB) ) to flash the game on your Arduboy using the Arduboy Game Loader from Team A.R.G. (external link) (Disclaimer: I am not associated with Team A.R.G.)
Reset Arduboy
Or the full .zip file with the printable PDF instruction booklet (600KB) and contains the .hex file to upload using avrdude or your method of choice. Macro recorder for games speed up mac.
Best Arduboy Games
Or if you need only the instruction booklet (516KB).