Minecraft worlds are procedurally generated, i.e. the world generation is determined by an algorithm. Procedurally generated worlds often use tools such as Perlin noise to make the landscape smooth and consistent, yet random. In Minecraft, this translates to biome placement being reasonable, so that a jungle biome is not generated next to a tundra biome. Since Minecraft is a voxel world, its regions are generated as \( 16 \times 16 \times 256 \) chunks, which form the building blocks of Minecraft's generation. We try to mimic this generation using tools from machine learning and computational topology.
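
To make the chunk layout concrete, here is a minimal sketch of one \( 16 \times 16 \times 256 \) chunk stored as a grid of block IDs. The block IDs, the `surface_height` function, and the sine-based terrain are assumptions made for illustration. The terrain function is only a smooth stand-in and is not Perlin noise or Minecraft's actual generator.

```python
import math

CHUNK_X, CHUNK_Z, CHUNK_Y = 16, 16, 256  # width, depth, height of one chunk
AIR, STONE = 0, 1  # hypothetical block IDs, for illustration only


def surface_height(x, z, base=64, amplitude=8):
    """Smooth stand-in for a noise function (not real Perlin noise)."""
    wave = math.sin(x / 5.0) + math.cos(z / 7.0)
    return int(base + amplitude * wave / 2.0)


def make_chunk(chunk_x=0, chunk_z=0):
    """Return chunk[x][z][y] holding a block ID for every voxel."""
    chunk = []
    for x in range(CHUNK_X):
        plane = []
        for z in range(CHUNK_Z):
            # Use world coordinates so neighboring chunks line up smoothly.
            h = surface_height(chunk_x * CHUNK_X + x, chunk_z * CHUNK_Z + z)
            column = [STONE if y < h else AIR for y in range(CHUNK_Y)]
            plane.append(column)
        chunk.append(plane)
    return chunk


chunk = make_chunk()
voxel_count = sum(len(column) for plane in chunk for column in plane)
print(voxel_count)  # 16 * 16 * 256 = 65536
```

Because the height is computed from world coordinates rather than chunk-local ones, adjacent chunks share a continuous surface, which is the property that noise-based generation is meant to provide.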

Figure: A region of a world (colored by height)
Figure: A region of a world (colored by block ID)
Figure: A village (colored by block ID)
Figure: View from a mountain (colored by height)
Figure: Underground cave systems (colored by height)
Figure: Closeup of a cave (colored by height)
Figure: Closeup of ravine (colored by height)

From machine learning, we will use a generative adversarial network (GAN) to generate the chunks. Essentially, a generator is trained in an adversarial manner against a competing discriminator, commonly a classifier. The goal of the generator is to mimic the distribution of the training data. It does this by minimizing its own objective function, which amounts to maximizing the loss of the discriminator. As training progresses, the generator produces samples that look increasingly like draws from the underlying data distribution. The discriminator, on the other hand, is trained with the opposite objective and has the sole purpose of determining whether a given sample is real or fake. This effectively allows the discriminator to gauge how well the generator is performing. This framework gives us a way to model the underlying distribution of Minecraft chunks, ideally to the point that our generation is indistinguishable from Minecraft's.
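
The two competing objectives can be sketched with plain binary cross-entropy. This is a minimal illustration and not the project's training code: the function names are hypothetical, the inputs are assumed to be the discriminator's estimated probabilities that each sample is real, and the generator uses the common non-saturating variant of the loss.

```python
import math


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: real samples are labeled 1, generated samples 0."""
    real_term = sum(-math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(-math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake, eps=1e-12):
    """Non-saturating loss: low when the discriminator rates fakes as real."""
    return sum(-math.log(p + eps) for p in d_fake) / len(d_fake)


# Discriminator outputs on generated chunks in two hypothetical situations.
fooled = [0.9, 0.8]       # discriminator believes the fakes are real
not_fooled = [0.1, 0.2]   # discriminator spots the fakes

print(generator_loss(fooled) < generator_loss(not_fooled))  # True
```

The same discriminator outputs push the two losses in opposite directions: outputs that lower the generator's loss on fake samples raise the discriminator's loss on them.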

One of the defining features of Minecraft worlds is the vast cave systems that generate underground. To aid the GAN in generating representative chunks, we will use persistent homology. This tool tracks how long \( k \)-dimensional holes in the data persist across scales. The \( 0 \)-dimensional holes correspond to the connected components of the data, the \( 1 \)-dimensional holes are loops and tunnels such as those found in caves, and so on. Using persistent homology should help the generator learn to produce more representative landscapes and structures.
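
As a small illustration of the \( 0 \)-dimensional case, the sketch below counts the connected components of a set of air voxels using a breadth-first search. It covers only the component count at a single scale and is not full persistent homology, which would track how these features appear and merge across a filtration. The function name and the example coordinates are hypothetical.

```python
from collections import deque

# Face-adjacent (6-connected) neighbor offsets in a voxel grid.
NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def count_components(voxels):
    """Count 6-connected components in a collection of (x, y, z) coordinates."""
    unvisited = set(voxels)
    components = 0
    while unvisited:
        # Each remaining voxel starts a new, previously unseen component.
        queue = deque([unvisited.pop()])
        components += 1
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBORS:
                neighbor = (x + dx, y + dy, z + dz)
                if neighbor in unvisited:
                    unvisited.remove(neighbor)
                    queue.append(neighbor)
    return components


# Two separate air pockets: a 3-voxel tunnel and one isolated voxel.
air = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (5, 5, 5)}
print(count_components(air))  # 2
```

Voxels that touch only diagonally are treated as disconnected here. That is a modeling choice, and a different adjacency rule would give different counts.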

Although this seems entirely possible in theory, the main obstacle holding the project back right now is memory: every chunk is a dense grid of \( 16 \times 16 \times 256 = 65{,}536 \) blocks. I am still looking for ways around this.
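
A rough back-of-the-envelope estimate shows how quickly dense chunks add up. The one-hot encoding, the batch size, and the number of block types below are assumptions chosen for illustration, since the encoding actually used is not stated here. The figure covers only the input batch, not network activations or gradients.

```python
VOXELS_PER_CHUNK = 16 * 16 * 256  # 65,536 voxels in one chunk


def batch_bytes(batch_size, num_block_types, bytes_per_value=4):
    """Bytes for a batch of one-hot encoded chunks stored as 32-bit floats."""
    return batch_size * VOXELS_PER_CHUNK * num_block_types * bytes_per_value


# Hypothetical settings: 32 chunks per batch, 256 distinct block types.
size = batch_bytes(batch_size=32, num_block_types=256)
print(size / 2**30)  # 2.0 (GiB)
```

Under these assumptions a single input batch already takes 2 GiB, so options such as a more compact block encoding, a smaller block vocabulary, or training on sub-chunk regions would each reduce the footprint proportionally.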

Code can be found here.