Week 8: Generating Images with DCGAN-Tensorflow

We made it to generative models! I’ve been learning how to set up and train models on images to generate new pics. Specifically, I’ve been using a TensorFlow implementation of a Deep Convolutional Generative Adversarial Network (DCGAN). I’m still figuring out how it works, but for now it’s like playing with a new camera that is learning to make new images based on the ones I feed it. The process of “developing” photos takes a long time, many hours or even days, and it reminds me of timed waiting in a darkroom, not to mention recording the results along with my input parameters.

This machine learning model needs to train on a lot of images. But where to find thousands upon thousands of pics? Movies! As collections of fast-moving images, they are an excellent source. And since machine learning is a branch of artificial intelligence, I chose Blade Runner (1982).

Part I - Virtual Machine Setup
I returned to Paperspace and set up a GPU machine, a Quadro M4000 ($0.51/hr, $3/mo for a public IP, and $5/mo for 50 GB of storage), with Ubuntu 16.04 and their ML-in-a-Box template, which comes with CUDA and the TensorFlow deep learning libraries already installed.

With the public IP, I can log into the machine via Terminal, set up my project file structure, git clone repos, train models, and generate new images there, too. It helps to have some practice with the command line and Python beforehand.

Part II - Data Collection
The next step was to acquire a digital version of the movie and extract frames using ffmpeg.

To extract one every second I used:
$ ffmpeg -i BladeRunner.mp4 -vf fps=1 %04d.png -hide_banner

I chose PNG format over JPG because it supports lossless compression, and I wanted to preserve as much detail as possible.

This gave me 7,057 images, from which I removed the opening studio logos and the end credits (by the way, you can also set the extraction range with ffmpeg), leaving a total of 6,555 images (8.63 GB).
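For example, something like the line below should trim at extraction time by seeking past the opening logos and stopping before the end credits (the timestamps here are just placeholders, not the film’s actual cut points):
$ ffmpeg -i BladeRunner.mp4 -ss 00:01:30 -to 01:50:00 -vf fps=1 %04d.png -hide_banner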

It’s best to complete this step on the VM itself; it takes far too long to upload thousands of images with Cyberduck, through a JupyterLab notebook, or by some other route into your machine.

Part III - Data Preparation
This DCGAN-tensorflow implementation (I used Gene Kogan’s fork for a bug fix) expects square images at 128 x 128 pixels*, but my movie frames were 1920 x 800.

First, I wrote a Python script using Pillow to copy and resize the images to 307 x 128. If I recall correctly, this resize.py file was in the same directory as the original images.

from PIL import Image
import glob, os

# Target size; thumbnail() preserves aspect ratio, so 1920 x 800 frames come out at 307 x 128
size = 307, 128

def resize():
    # Run this from the directory holding the original frames; it processes every PNG it finds
    for infile in glob.glob("*.png"):
        file, ext = os.path.splitext(infile)
        im = Image.open(infile)
        im.thumbnail(size, Image.ANTIALIAS)  # resizes in place, with antialiasing
        im.save("/dir/dir/dir/dir/images_resized/" + file + "_resized.png", "PNG")

resize()
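As a quick sanity check after the resize (just a sketch, assuming you point it at wherever the images_resized/ folder lives), I can open a few results and confirm their dimensions:

from PIL import Image
import glob

# Spot-check a handful of resized frames; the path is a placeholder for the real output directory
for path in sorted(glob.glob("images_resized/*_resized.png"))[:5]:
    with Image.open(path) as im:
        print(path, im.size)  # expect something close to (307, 128)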

Then, I made center crops using dataset_utils.py from the ml4a guides (guides/utils/dataset_utils.py) by cd-ing into the utils directory and running the line below (don’t forget to install the requirements! see requirements.txt):

$ python3 dataset_utils.py --input_src /home/paperspace/Documents/data/brStills/ --output_dir /home/paperspace/Documents/projects/BladeRunner/stillsCropped --w 128 --h 128 --centered --action none --save_ext png --save_mode output_only

Along the way, I learned this handy line to count the number of files in a folder to make sure all 6,555 images were there:
$ ls -F | grep -v / | wc -l

*128 x 128 is so small! But Gene thought 256 x 256 might be too large. I learned that it isn’t, but it takes much longer to train, and I didn’t pursue it after a 5-epoch run that took about an hour.

Part IV - Train the Model
Before I began training, I created a directory with the following folders: checkpoints, samples, and my images.

Next, moving into the DCGAN folder, I ran the line below. By opening main.py beforehand, I learned which flag to set to indicate the PNG file format of my images (the default is JPG).

$ python main.py --dataset=images --data_dir /home/paperspace/Documents/projects/BladeRunner128 --input_height 128 --output_height 128 --checkpoint_dir /home/paperspace/Documents/projects/BladeRunner128/checkpoints --sample_dir /home/paperspace/Documents/projects/BladeRunner128/samples --epoch 5 --input_fname_pattern '*.png' --train

I started by running 5 epochs. Each epoch represents one pass through the entire dataset, and to do that, the dataset is broken up into batches. The default batch size for this DCGAN is 64.
My understanding from class was that other than the number of epochs, the hyperparameters for this particular DCGAN do not need tweaking, although I might revisit batch size later. How many epochs, you ask? That’s a good question, and one for which I haven’t found a definitive answer. It’s at this point that folks say it’s more an art than a science because it also depends on the diversity of your data. More epochs means more learning, but you don’t want to overfit…so to be continued… Reference
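To make the epoch/batch arithmetic concrete, here’s a quick back-of-the-envelope sketch in Python (the real training loop lives in main.py; I’m just counting batches):

num_images = 6555   # frames left after removing logos and credits
batch_size = 64     # the DCGAN-tensorflow default
epochs = 5

batches_per_epoch = num_images // batch_size   # leftover images that don't fill a batch are typically skipped
print(batches_per_epoch)                       # 102
print(batches_per_epoch * epochs)              # 510 training steps over 5 epochs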

(Also worth mentioning: I got some initial errors because the single quotes around png weren’t formatted in an acceptable way. It’s better to type them in than to copy and paste from a text editor.)

Part V - Generate Images • Train the Model More • Repeat
Generating images is as easy as running the above line in the same place but without the --train flag; the results are saved in the samples folder.
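For my setup, that’s the same line as before, minus --train:

$ python main.py --dataset=images --data_dir /home/paperspace/Documents/projects/BladeRunner128 --input_height 128 --output_height 128 --checkpoint_dir /home/paperspace/Documents/projects/BladeRunner128/checkpoints --sample_dir /home/paperspace/Documents/projects/BladeRunner128/samples --epoch 5 --input_fname_pattern '*.png'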

I learned that I can continue training from where I left off as long as I keep all the directories intact. The model will look for the checkpoint folder and load the most recent one. Similarly, if I train too much, I can remove the later checkpoints and generate from the most recent one remaining (or so I’m told…need to double-check for this DCGAN).

Here are some results:

After 5 epochs

After 25 epochs (~1 hr 15 min)

After 125 epochs (~6 hr)

After 250 epochs (~12 hours)

And here is a random sampling of movie stills from the original dataset:


Many of the DCGAN examples I’ve seen use a dataset that is much more homogeneous than mine. The output variation here doesn’t surprise me; in fact, I was curious to see what would happen if I used a mix of images. That said, the colors remind me of the film’s palette, and especially at the 250-epoch mark, I see a few faces emerging.

Part VI - Train the Model on a Linearly Ordered Image Dataset
During my image preparation, the picture order was shuffled when I cropped the center squares. A general question I have: does the order of the training data matter, especially for a dataset with a linear progression like this one? In my machine learning meanderings online, I’ve seen nods to shuffling data to improve a model’s learning outcomes. A good next step would be to train on an ordered set and see if there are any differences…so let’s do that!
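(A small aside on what “ordered” means in practice: glob doesn’t guarantee any particular order, so it has to be controlled explicitly. A tiny sketch, assuming the zero-padded frame names that ffmpeg produced, like 0001.png:)

import glob, random

files = sorted(glob.glob("*.png"))  # chronological, since the frames are named 0001.png, 0002.png, ...
# random.shuffle(files)             # or shuffle explicitly if that's what the model should see

print(files[:3])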

Here’s my own Python script for taking the center crop of the images:

from PIL import Image
import glob, os

def crop():
    # Take a 128 x 128 center crop of every PNG in the current directory
    for infile in glob.glob("*.png"):
        file, ext = os.path.splitext(infile)
        im = Image.open(infile)
        width, height = im.size

        new_width = 128
        new_height = 128

        # Integer coordinates for a crop box centered in the frame
        left = (width - new_width) // 2
        top = (height - new_height) // 2
        right = left + new_width
        bottom = top + new_height

        centerCrop = im.crop((left, top, right, bottom))

        # Keeping the original filename (plus a suffix) preserves the frame order
        centerCrop.save("/dir/dir/dir/dir/" + file + "_crop.png", "PNG")

crop()

This is the semester of never-ending blog posts, but I just keep learning new things that improve my workflow and understanding! For example, since this was going to be another long job, I learned how to use nohup mycommand > myLog.txt & disown to put the process in the background, send its output to a file, and detach it from my Terminal session, so I could close Terminal (or my computer) without interrupting the job. At any point, I can log back into the VM and cat myLog.txt to see the program’s current output. Reference.

$ nohup python main.py --dataset=stillsCropped --data_dir /home/paperspace/Documents/projects/BRlineartrain --input_height 128 --output_height 128 --checkpoint_dir /home/paperspace/Documents/projects/BRlineartrain/checkpoints --sample_dir /home/paperspace/Documents/projects/BRlineartrain/samples --epoch 40 --input_fname_pattern '*.png' --train > myLog.txt & disown

After 210 epochs (~11 hours)

So there’s a tendency toward more complete faces in the samples generated from the model trained on the linearly ordered dataset. Is that because the model likely trained on successions of frames with an actor in the center of the frame? Is the takeaway to stick with more homogeneous datasets, then?

Overall, this experiment was an exercise in getting my bearings with setup and workflow: collecting and prepping my data, setting up a remote computational space (because my own computer is not powerful enough), and learning the mechanics for this particular model. The more I play and try, the more I’ll know which questions to ask and how to use this for a particular application.