Can Computers Make Art?
Of course not, you might say. They are machines! Creativity is what makes us human. Art is the epitome of creativity, and thus the epitome of humanity. If machines can make art, then what are we?
Let me introduce you to the Portrait of Edmond Belamy, which sold for $432,500 at Christie’s art auction in October 2018. On the surface, this portrait looks like just another painting. It vaguely resembles a portrait of a man but has some quirky characteristics that one might attribute to the artist’s peculiar vision - the unfinished edges, the blurry face, the studded texture. Is it worth $432,500? I’ll leave that up to you to decide.
Alone in the corner is a series of characters where you might normally expect an artist’s signature. To the uninitiated, these characters resemble a mathematical function. What a strange thing to put on a piece of art, you might say! Those familiar with deep learning neural networks will have already figured it out: these characters spell out a network’s loss function, meaning that this work of art was produced by a machine.
I invite you to read the article on the Christie’s website for the full story behind this painting. To summarize, the Portrait of Edmond Belamy was produced by a French art collective known as Obvious that focuses on the boundaries between art and artificial intelligence. For this portrait, they used a type of network known as a Generative Adversarial Network (or GAN; more on these below) trained on 15,000 portraits painted between the 14th and 20th centuries.
With this knowledge in hand, I ask you again - can computers make art? Obvious and Christie’s seem to think so. Or at least, they believe that algorithms can aid in the creation of art. After all, did the art come in the form of the painting or of the process the artists used to train and produce the painting? How about a musician using a backing track produced by AI? Is that laziness on the behalf of the musician or digital, new-age creativity?
These are the questions that inspired me to pursue the topic of Novel 3D Object Generation for the capstone project of my Master of Information and Data Science (MIDS) program at UC Berkeley. Obvious proved the viability of AI-assisted art in the 2D space with the Portrait of Edmond Belamy, but what about 3D art? Can data science and machine learning be used to produce novel 3D shapes and designs worthy of being called “art”?
In this post, I will show you, the reader, how my team and I tackled this question. We will start by first refining the question and assessing other business applications of such a system, then I’ll introduce you to the related literature that served as inspiration for our work, and finally we’ll walk through our approach including our successes, failures, and our final results.
Before I start talking about the project, I need to introduce and give credit to my teammates, Dave and Cynthia. Dave Owen provided our team with much-needed business acumen and kept us on track as the project progressed. He also deserves credit for formulating the idea behind the project. Cynthia Hu and I represented the technical side of the team. Her real-world data science experience and willingness to really dig into the details of our research paper references proved invaluable throughout the project. My primary contributions were in data pipeline development, model development, and model training.
Dave and Cynthia were a pleasure to work with. Even in the moments of doubt where we questioned our chances of success, they hung around and listened to my crazy ideas for what we could try next. Thank you David and Cynthia!
What is a Capstone Project?
The MIDS Capstone project is part of the course W210 Synthetic Capstone (I’m not sure why it is called “synthetic” - it felt very real to me) and is taken at the end of a student’s MIDS experience. The course’s and thus project’s goal is to give students the chance to utilize the skills they’ve learned throughout the program to produce a real-world application or solution.
Refining The Question
All good ideas start with a question. You’ve just been introduced to ours: Can neural networks produce 3D objects worthy of being called art? But be warned, not all questions are good questions.
In our case, our question was good enough to get us started, but not to drive forward progress or give us real, actionable direction.
The first project pitfall identified by both us and our instructors was the vagueness and ambiguity of art. How do you measure it? How do you tell good from bad?
Related Literature
Call me crazy, but I enjoyed reading research papers for this project. The folks in this field are both insanely creative and incredibly clever. My impression from the papers I read was that this field is still wide open - no one’s truly figured out how to communicate the complexities of 3D shapes to a neural network. Most papers dealt only with voxelized objects (and so did we); a few were brave enough to venture into the triangular mesh category. In this section, I cover the papers most influential to our project as well as one extra paper that I think is worth reading.
Working with 3D Data
I’d like to start this section with a question for the reader: how would you represent data in the 3D space? If you already know the answer, then skip ahead. If you don’t, then start by thinking about how data is represented in the 2D space: an NxM grid of pixels. Each pixel can be, for example in a black-and-white photo, a single number representing its grayscale value (the shade between black and white). Each pixel location could also contain a vector containing, for example, that pixel’s four RGBA values, and so on.
If you expand the concept of the NxM grid from 2D to 3D, then you arrive at the logical data representation of the voxel grid. A voxel is a 3D pixel and it looks exactly as you would expect (think Minecraft).
These voxel grids, like 2D grids, can contain a variety of data at each location. For the purposes of this project, and for much of the existing research in this area, each cell is generally populated by either a 0 or a 1 representing whether a voxel exists in that space. This data structure is conveniently easy to feed to and train a neural network with, as many of the existing 2D algorithms can be extended and applied to the third dimension.
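As a minimal sketch of this idea (the grid size and contents here are purely illustrative, not from our project), a binary occupancy grid is just a 3D array of 0s and 1s:

```python
import numpy as np

# A toy 4x4x4 occupancy grid: 1 where a voxel is "filled", 0 where empty.
grid = np.zeros((4, 4, 4), dtype=np.uint8)

# Fill a 2x2x2 cube in one corner.
grid[0:2, 0:2, 0:2] = 1

print(grid.shape)  # (4, 4, 4)
print(grid.sum())  # 8 occupied voxels
```

Because the grid is a regular dense array, the same convolution machinery used on 2D images applies directly with 3D kernels.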
The drawback of voxel grids is the storage and thus memory space required. If a 28x28 image of an MNIST digit contains 28x28 ints (and we assume an int requires 4 bytes), then that image requires 28 * 28 * 4 = 3,136 bytes. With the third dimension, that single image becomes 28 * 28 * 28 * 4 = 87,808 bytes. Across the 70,000 digits in the MNIST dataset, that increase is pretty dramatic, as illustrated in the table below.
| | 2D (28x28) | 3D (28x28x28) |
|---|---|---|
| Single digit | 3,136 B | 87,808 B |
| Full dataset | 219.52 MB | 6,146.56 MB |
That’s a storage/memory footprint increase of 28x (logically)! But how many pictures in the real world do you encounter that are only 28x28? Probably zero unless you do machine learning and play with the MNIST dataset. As an example, my Google Pixel 2 smartphone takes pictures that come in at a resolution of 1920x1080. Imagine adding a third dimension to that!
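The arithmetic behind the table is easy to reproduce (assuming, as above, 4 bytes per int):

```python
BYTES_PER_INT = 4
side = 28
n_images = 70_000

bytes_2d = side * side * BYTES_PER_INT         # 3,136 B per image
bytes_3d = side * side * side * BYTES_PER_INT  # 87,808 B per voxel grid

print(bytes_2d, bytes_3d)                # 3136 87808
print(bytes_3d / bytes_2d)               # 28.0 -- one extra factor of 28
print(bytes_2d * n_images / 1e6, "MB")   # 219.52 MB
print(bytes_3d * n_images / 1e6, "MB")   # 6146.56 MB
```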
Lastly, I would be remiss if I did not mention that this project and many others rely heavily on the amazing work put into the free binvox application. Binvox comes as an executable packed with an astonishing number of features to support converting and manipulating voxel representations.
If you’ve ever watched a Pixar film or played a video game that isn’t Minecraft, then you’ve seen triangular meshes at work. A triangular mesh is simply a series of triangles that link together to form some type of object or plane.
With a high enough “poly count” (poly == triangle), graphics engines can produce realistic looking images by using triangular meshes to smooth over edges and create round surfaces (as opposed to the cubic nature of voxels).
Meshes also improve upon the voxel grid’s storage limitation. A mesh can be represented as an array of vertex coordinates plus a list of triangles that index into it, where (ideally) each triangle shares at least two vertices (one edge) with another triangle. Here’s a demonstration with a fabulous rendering via Microsoft Paint:
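The same idea in code: a toy indexed mesh (two triangles forming a unit square; the data here is illustrative, not from our project). Vertices are stored once and triangles reference them by index, so shared edges cost no extra storage:

```python
import numpy as np

# Four unique vertices of a unit square in the z=0 plane.
vertices = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
])

# Two triangles defined by vertex indices; they share the edge (0, 2),
# so the diagonal's vertices are not duplicated.
faces = np.array([
    [0, 1, 2],
    [0, 2, 3],
])

# A flat per-triangle vertex list would need 6 vertices instead of 4.
print(len(vertices), len(faces))  # 4 2
```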
The exact storage requirement of a given triangle mesh depends on (1) the number of triangles and (2) the file format. There are many file formats available - too many to list here - that offer various features like color or arcs. We focused on the formats used commonly in 3D printing and chose to start with the simplest available: the STL file.
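To make the STL choice concrete, here is a minimal sketch of the ASCII STL format: one facet per triangle, each with a normal and three vertices. The function name is my own, not from our project code, and writing a zero normal relies on the fact that most tools recompute normals from the vertex winding:

```python
def write_ascii_stl(path, triangles, name="part"):
    """triangles: list of triangles, each a list of three (x, y, z) tuples."""
    with open(path, "w") as f:
        f.write(f"solid {name}\n")
        for tri in triangles:
            f.write("  facet normal 0 0 0\n")  # many readers recompute normals
            f.write("    outer loop\n")
            for x, y, z in tri:
                f.write(f"      vertex {x} {y} {z}\n")
            f.write("    endloop\n")
            f.write("  endfacet\n")
        f.write(f"endsolid {name}\n")

# A single triangle in the z=0 plane.
write_ascii_stl("triangle.stl", [[(0, 0, 0), (1, 0, 0), (0, 1, 0)]])
```

The format's simplicity is exactly why it made a good starting point: no color, no curves, just a flat list of triangles.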
Model 1: STL VAE
Model 2: STL GAN
Model 3: STL VAE-GAN
Model 4: STL RNN
Model 5: Voxel VAE
Model 6: Voxel VAE-GAN
Model 7: Voxel VAE 2.0
Bump to 64x64x64: Machine Upgrade