Neural Aesthetic

Week 8: Generating Images with Neural Style Transfer

A popular machine learning method for generating images is style transfer: appropriating the style of one image onto the content of another. Here is jcjohnson’s neural-style model, for which there is excellent documentation and examples.

I cloned the model into my virtual GPU machine in Paperspace (described in this post), and experimented with it in two ways: traditional style transfer and texture synthesis.

PART 1 - STYLE TRANSFER
This technique reminds me of layering images with masks in Photoshop. To run the model, you select a “style” image to train on and apply to a “content” image. There are many adjustable parameters that optimize the process and shape the look of the resulting image, all documented in the repo.

Here’s an example (command run inside of model’s directory):
$ th neural_style.lua -style_image /home/paperspace/Documents/projects/_pattern.jpg -content_image /home/paperspace/Documents/projects/_portrait.jpg -output_image /home/paperspace/Documents/projects/result/_pattern_portrait.png -style_weight 2e0 -content_weight 3e0 -backend cudnn

My style image is of a pattern, my content image (to receive the style) is a portrait, I’ve defined the location and name of the resulting image, I’ve adjusted how much to weigh the style and content inputs, and finally, made more efficient use of my GPU’s memory by using the -backend cudnn flag. (Channel Zero, anyone?)

Let’s try another example, this time with the default content and style weight settings of 5e0 and 1e2, respectively. It really is like being in an analog darkroom. No two prints are alike: even if you run the model again with the same images and parameters, you get slightly different results.

PART 2 - TEXTURE SYNTHESIS
When you run the model with the content weight equal to 0, it still picks up and applies the learned style to an empty canvas. Again the result changes with every run even if the parameters do not change.

$ th neural_style.lua -style_image /home/paperspace/Documents/projects/clouds.jpg -content_image /home/paperspace/Documents/projects/content.jpg -output_image /home/paperspace/Documents/projects/result/_clouds_texture.png -content_weight 0 -backend cudnn

You can also combine multiple style input images for a collaged-texture synthesis:

$ th neural_style.lua -style_image /home/paperspace/Documents/projects/one.jpg,/home/paperspace/Documents/projects/two.jpg -content_image /home/paperspace/Documents/projects/content.jpg -output_image /home/paperspace/Documents/projects/result/_trees_texture.png -content_weight 0 -backend cudnn

Image Credits: Clouds & Red, Blue, and White Abstract

I guess this is okay; I’m still looking for a useful application of this (because Photoshop). But I’m glad I did it. And really, I want to know how it handles many, many input style images—like all 6,555 images from my Blade Runner experiment earlier. Unfortunately, Terminal said my argument list was too long.

I then tried half that amount (having extracted 30 frames from each minute of the film) but again got the same response.

It does work with 26 pics, though! Which is what I used when I tested the process (though I already trashed that output image). My next step is to figure out how to get around this…

In the meantime, here are the steps I used to prepare to train the model on multiple style files: I concatenated all the filenames with commas into one giant string, wrote that string to a text file, stored the contents of that file in a shell variable, and then included that variable in the command to start the training process:

  1. Create an empty text file: $ touch file.txt

  2. Create a directory with your image files

  3. Start a python shell: $ python

  4. >>> import os

  5. >>> mydir = '/images'

  6. >>> style_string = ','.join([os.path.join(mydir, f) for f in os.listdir(mydir)])

  7. >>> f = open('file.txt', 'w')

  8. >>> f.write(style_string)

  9. >>> f.close()

  10. Exit python shell (Control + d)

  11. Navigate into neural style directory

  12. $ value=$(</dir/file.txt)

  13. $ echo "$value" (to check the contents of the variable)

Which then leads to this:
$ th neural_style.lua -style_image "$value" -content_image /home/paperspace/Documents/projects/content.jpg -output_image /home/paperspace/Documents/projects/result/_result.png -content_weight 0 -backend cudnn
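
As an aside, the same comma-separated list could probably be built with a single shell line instead of the Python detour (a sketch, with /images standing in for the real image directory):

$ value=$(ls -d /images/* | paste -sd, -)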

Week 8: Generating Images with DCGAN-Tensorflow

We made it to generative models! I’ve been learning how to set up and train models on images to generate new pics. Specifically, I’ve been using a TensorFlow implementation of a Deep Convolutional Generative Adversarial Network (DCGAN). I’m still wrapping my head around how it works, but for now it’s like playing with a new camera that learns to make new images based on the ones I feed it. The process of “developing” photos takes a long time (many hours, or even days), and it reminds me of the timed waiting in a darkroom, not to mention recording the results along with my input parameters.

This machine learning model needs to train on a lot of images. But where to find thousands upon thousands of pics? Movies! As collections of fast-moving images, they are an excellent source. And since machine learning is a branch of artificial intelligence, I chose Blade Runner (1982).

Part I - Virtual Machine Setup
I returned to Paperspace and set up a GPU machine, a Quadro M4000 ($0.51/hr, plus $3/mo for a public IP and $5/mo for 50 GB of storage), with Ubuntu 16.04 and their ML-in-a-Box template, which comes with CUDA and the TensorFlow deep learning libraries already set up.

With the public IP, I can log into the machine via Terminal, set up my project file structure, git clone repos, train models, and generate new images there, too. It helps to have some practice with the command line and Python beforehand.

Part II - Data Collection
The next step was to acquire a digital version of the movie and extract frames using ffmpeg.

To extract one frame every second, I used:
$ ffmpeg -i BladeRunner.mp4 -vf fps=1 %04d.png -hide_banner

I chose PNG format over JPG because it supports lossless compression, and I wanted to preserve as much detail as possible.

This gave me 7,057 images, from which I removed the opening studio logos and the credits at the end (by the way, you can also set the extraction range directly in ffmpeg), leaving a total of 6,555 images (8.63 GB).
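
For example, a seek flag before the input plus a duration flag limits extraction to a given window; something like the line below, though the timestamps here are made up, not the ones I used:

$ ffmpeg -ss 00:01:30 -i BladeRunner.mp4 -t 01:50:00 -vf fps=1 %04d.png -hide_banner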

It’s best to complete this step on the VM, as it takes far too long to upload thousands of images using Cyberduck, a Jupyter Lab notebook, or any other route into your machine.

Part III - Data Preparation
This DCGAN-tensorflow (I used Gene Kogan’s fork for a bug fix) expects square images at 128 x 128 pixels*, but my movie frames were 1920 x 800.

First, I wrote a python script using Pillow to copy and resize images to 307 x 128. If I recall correctly, this resize.py file was in the same directory as the original images.

from PIL import Image
import glob, os

# Target bounding box: the 1920 x 800 frames scale down to roughly 307 x 128
size = 307, 128

def resize():
    # Resize every PNG in the current directory, preserving aspect ratio,
    # and save a copy into the images_resized folder
    for infile in glob.glob("*.png"):
        file, ext = os.path.splitext(infile)
        im = Image.open(infile)
        im.thumbnail(size, Image.ANTIALIAS)
        im.save("/dir/dir/dir/dir/images_resized/" + file + "_resized.png", "PNG")

resize()

Then, I made center crops using the ml4a guides/utils/dataset_utils.py by cd-ing into the utils directory and running this (don’t forget to install the requirements! see requirements.txt):

$ python3 dataset_utils.py --input_src /home/paperspace/Documents/data/brStills/ --output_dir /home/paperspace/Documents/projects/BladeRunner/stillsCropped --w 128 --h 128 --centered --action none --save_ext png --save_mode output_only

Along the way, I learned this handy line to count the number of files in a folder to make sure all 6,555 images were there:
$ ls -F | grep -v / | wc -l

*128 x 128 is so small! But Gene thought 256 x 256 might be too large. I learned that it is not, but it takes a much longer time to train, and I did not pursue it after a 5-epoch run took about an hour.

Part IV - Train the Model
Before I began training, I created a project directory with the following folders: checkpoints, samples, and one for my images.
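
In other words, something along these lines (the folder names mirror the paths in the training command below; the exact layout is from memory):

$ mkdir -p BladeRunner128/{checkpoints,samples,images}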

Next, moving into the DCGAN folder, I ran the line below. By opening the main.py file beforehand, I learned which flag to set to indicate the PNG file format of my images (the default is JPG).

$ python main.py --dataset=images --data_dir /home/paperspace/Documents/projects/BladeRunner128 --input_height 128 --output_height 128 --checkpoint_dir /home/paperspace/Documents/projects/BladeRunner128/checkpoints --sample_dir /home/paperspace/Documents/projects/BladeRunner128/samples --epoch 5 --input_fname_pattern '*.png' --train

I started by running 5 epochs. Each epoch represents one pass through the entire dataset, and to do that, the dataset is broken up into batches. The default batch size for this DCGAN is 64.
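(With my 6,555 images and a batch size of 64, that works out to roughly 6,555 / 64 ≈ 102 batches per epoch.)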
My understanding from class was that other than the number of epochs, the hyperparameters for this particular DCGAN do not need tweaking, although I might revisit batch size later. How many epochs, you ask? That’s a good question, and one for which I haven’t found a definitive answer. It’s at this point that folks say it’s more an art than a science because it also depends on the diversity of your data. More epochs means more learning, but you don’t want to overfit…so to be continued… Reference

(Also worth mentioning that I got some initial errors because my single quotes around png were not formatted in an acceptable way. Better to type them in than copy and paste from a text editor.)

Part V - Generate Images • Train the Model More • Repeat
Generating images is as easy as running the above line in the same place but without the --train, and the results are saved in the samples folder.
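
That is, the same command as above without the final flag:

$ python main.py --dataset=images --data_dir /home/paperspace/Documents/projects/BladeRunner128 --input_height 128 --output_height 128 --checkpoint_dir /home/paperspace/Documents/projects/BladeRunner128/checkpoints --sample_dir /home/paperspace/Documents/projects/BladeRunner128/samples --epoch 5 --input_fname_pattern '*.png'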

I learned that I can continue training from where I left off as long as I keep all the directories intact. The model will look for the checkpoint folder and load the most recent one. Similarly, if I train too much, I can remove the later checkpoints and generate from the most recent one remaining (or so I’m told…need to double-check for this DCGAN).

Here are some results:

After 5 epochs

After 25 epochs (~1 hr 15 min)

After 125 epochs (~6 hr)

After 250 epochs (~12 hours)

And here is a random sampling of movie stills from the original dataset:

Many of the DCGAN examples I’ve seen use a dataset that is much more homogeneous than mine. The output variation here doesn’t surprise me; in fact, I was curious to see what would happen if I used a mix of images. That being said, the colors remind me of the film’s palette, and especially at the 250-epoch mark, I see a few faces emerging.

Part VI - Train the Model on a Linear Image Dataset
During my image preparation process, the picture order got shuffled when I cropped the center squares. A general question I have is: does the order of the training data matter, especially for a dataset with a linear progression like this one? In my machine learning meanderings online, I’ve seen nods to shuffling data to improve a model’s learning outcomes. A good next step would be to train on an ordered set and see if there are any differences…so let’s do that!

Here’s my own python script for taking the center crop of images:

from PIL import Image
import glob, os

def crop():
    # Center-crop every PNG in the current directory to 128 x 128
    for infile in glob.glob("*.png"):
        file, ext = os.path.splitext(infile)
        im = Image.open(infile)
        width, height = im.size

        new_width = 128
        new_height = 128

        # Integer division keeps the crop box on whole pixels
        left = (width - new_width) // 2
        top = (height - new_height) // 2
        right = (width + new_width) // 2
        bottom = (height + new_height) // 2

        centerCrop = im.crop((left, top, right, bottom))

        centerCrop.save("/dir/dir/dir/dir/" + file + "_crop.png", "PNG")

crop()

This is the semester of never-ending blog posts, but I just keep learning new things to improve my workflow and understanding! For example, since this was going to be another long job, I learned how to use nohup mycommand > myLog.txt & disown to put this process into the background, send the output to a file, and break my Terminal’s connection with it so I could close my Terminal or my computer without interrupting the job. At any point, I can log back into the VM and cat myLog.txt to see the current output of the program. Reference.

$ nohup python main.py --dataset=stillsCropped --data_dir /home/paperspace/Documents/projects/BRlineartrain --input_height 128 --output_height 128 --checkpoint_dir /home/paperspace/Documents/projects/BRlineartrain/checkpoints --sample_dir /home/paperspace/Documents/projects/BRlineartrain/samples --epoch 40 --input_fname_pattern '*.png' --train > myLog.txt & disown

After 210 epochs (~11 hours)

So there’s a tendency toward more complete faces in the samples generated from the model trained on the linear dataset. Is that because the model likely trained on successions of frames with an actor in the center of the frame? Is the takeaway to stick with more homogeneous datasets, then?

Overall, this experiment was an exercise in getting my bearings with setup and workflow: collecting and prepping my data, setting up a remote computational space (because my own computer is not powerful enough), and learning the mechanics for this particular model. The more I play and try, the more I’ll know which questions to ask and how to use this for a particular application.

Week 6: Generating Text with a LSTM Neural Network

Since my last post for Neural Aesthetic, I’ve gained a broader view of accessible tools for developing machine learning projects, such as ml4a, ML5, and Wekinator. This survey helped me understand that I’m most interested in developing projects with the potential to generate distinct, expressive output from distinct, expressive input.

GUIDING QUESTIONS
I chose to work with ML5’s LSTMGenerator(), which is a type of Recurrent Neural Network “useful for working with sequential data (like characters in text or the musical notes of a song) where the order of that sequence matters.” I applied the tool to two bodies of text with the intention of generating new text one character at a time.

My guiding questions for this project included:

  1. Could I identify and find text that is perhaps often unread or overlooked?

  2. If so, what voice might emerge if I trained a model based on the data I collected?

  3. What will I learn from the process of collecting, cleaning, and training a model?

TOPIC 1: PRIVACY POLICIES
I created a dataset of privacy policies from all the websites where I maintain an account and for every app installed on my connected devices (laptop, mobile phone, and tablet).

Data Collection & Cleaning
To my knowledge, a clearinghouse of privacy policies does not exist, so I tracked down the policies for all of my accounts and apps, 215 in total, and copied and pasted the text into separate RTF files. Of note, I did not collect the privacy policies of every website I visited during the data collection process, nor those of the third-party domains that supply content to the sites I visited or track my information there (device information, browsing history/habits, etc.). This monotonous process took me about a week, mostly during study breaks.

In order to train a machine learning model on this data, it needs to be in one file and “cleaned” as much as possible. I spent some time investigating different approaches and finally found one that worked well (and did not crash my apps in the process):

  1. Using the command-line tool textutil, I concatenated all the RTF files into one giant HTML file (see the example command after this list)

  2. I copied the text from the HTML file into an RTF file in TextEdit on my Mac

  3. With all text selected, I removed all the bullet formatting

  4. Next, I converted the text to plain text (Format > Make Plain Text)

  5. Finally, and through several rounds, I removed most of the empty lines of space throughout the document
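
For reference, step 1 boiled down to a single textutil call, something along these lines (the filenames are placeholders, not the ones I used):

$ textutil -cat html *.rtf -output policies.html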

The final file totaled 4.9MB, or about 1,600 pages if printed. This was good news, as both Gene and Cris had encouraged me to collect as much text as possible, at least 1MB.

Training the Model
Cris tipped me off to a service from Paperspace, Gradient, which offers a command-line interface for training machine learning models, and this very useful tutorial that he authored. Thanks, Cris! This process trains a model using TensorFlow on a GPU hosted by Paperspace. (If I ran the training on my 2015 MacBook Pro, it would likely take weeks.)

Of note, the ML5 tutorial on training an LSTM suggests different hyperparameters for varying sizes of datasets. For my 4.9MB file, I went with these in my run.sh file:

--rnn_size 512 \
--num_layers 2 \
--seq_length 128 \
--batch_size 64 \
--num_epochs 50 \

My first training run, on an NVIDIA Quadro P5000, took around one hour to complete (at about $0.78/hr). A week later, having added several more privacy policies from recently created accounts and finally honed my data-cleaning process (mentioned above), I used an NVIDIA Quadro M4000 (the P5000 was full) and the process took about two hours (at $0.51/hr). (I’m super curious to come back to this post in several years and compare computing power and rates.) When the training finished, it was quick to download the model from Paperspace to my local computer’s project directory.

Using the Model with ML5
ROUND 1 - ML5’s Interactive Text Generation LSTM example using P5.js was included when I cloned the repo for training an LSTM with Paperspace. With a local server running and the model loaded, I initiated text prediction by seeding the program with a few words. I chose how many characters it would predict and the “temperature,” or randomness, of the prediction on a scale of 0 to 1 (values near 0 staying closer to the most predictable text, values near 1 allowing more deviation).

This was fun to play with but extremely slooow to respond when entering seed text…painfully so…freezing-my-browser slow, especially as I increased the length of the prediction. Speaking with Cris in office hours, I learned that this example is stateless; in other words, every time a new seed character is entered, it and all of the previous characters are used to recalculate an entirely new prediction. Screen grab below, prediction in blue.

ROUND 2 - As luck would have it, about an hour before I talked to Cris, machine learning artist Memo Akten made an ML5 pull request with a stateful LSTM example: “Instead of feeding every single character every frame to predict the next character, we feed only the last character, and instruct the LSTM to remember its internal state.” Excited to see how it would work, we downloaded the entire ML5 library, integrated Memo’s example, and ran it on my computer. Much faster and even more fun!
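
To make the difference concrete, here is a rough Python-flavored sketch of the two approaches; model.step() and the other names are hypothetical stand-ins for illustration, not the actual ml5 API:

# Stateless: every time the seed text changes, the whole sequence is re-fed.
def predict_stateless(model, seed_text):
    state = model.initial_state()
    for ch in seed_text:                  # O(len(seed)) work per keystroke
        probs, state = model.step(state, ch)
    return probs                          # distribution over the next character

# Stateful: keep the internal state around and feed only the newest character.
def predict_stateful(model, state, new_char):
    probs, state = model.step(state, new_char)    # O(1) work per keystroke
    return probs, state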

TOPIC 2: THE CONGRESSIONAL RECORD OF THE UNITED STATES CONGRESS
For a variety of reasons, I’ve been thinking a lot about trust these days (ahem, see above). Perhaps that’s how I eventually found myself at the Library of Congress website reading the daily report of events and floor debates from the 115th Congress.

Data Collection & Cleaning
I found all of the documents here in digital form dating back to 1989. My initial impulse was to collect all records from January 20, 2017, to the current day, and it was quick to download all 389 PDF documents with a browser extension.

It took a couple of different approaches, but in the end I used a built-in automation function in Adobe Acrobat Pro to convert all the docs into plain-text files. It turns out that different tools convert files differently, and Adobe gave me the cleanest results and the smallest files. When I combined all the docs into one, it totaled over 16,000 pages at nearly 400MB. To my surprise, TextEdit handled my removal of empty spaces and of characters converted from graphics very well. (Next time, do this in Python; a rough sketch of what that could look like follows.)
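
Something along these lines (the filenames are placeholders, and collapsing repeated blank lines stands in for the kind of tidying TextEdit did for me; encoding is handled separately below):

# Collapse runs of blank lines in the combined plain-text file.
# Filenames are placeholders, not the ones I actually used.
with open("congress_raw.txt") as src:
    lines = src.readlines()

cleaned = []
previous_blank = False
for line in lines:
    blank = not line.strip()
    if not (blank and previous_blank):    # keep at most one blank line in a row
        cleaned.append(line)
    previous_blank = blank

with open("congress_clean.txt", "w") as dst:
    dst.writelines(cleaned)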

Training the Model
This time I used a different service, Spell.run, to train my model, in no small part because you get $100 when you sign up right now. Not only did I find an intro video on The Coding Train (the best!), but just a week earlier, Nabil Hassein had recorded a tutorial on training an LSTM using Spell. Following along, I created a new project folder, activated a virtualenv Python environment, and downloaded the ml5 repo for LSTM training.

Again it took a couple of rounds, but in the end I trained on only one month of data. Even using one of the lower mid-range machine options, the K80x8, training nearly 400MB of data would have taken around four days*. Training three months’ worth (46MB) would have taken about 12 hours. One month of data (16MB) finished in just under four hours (at $7/hr) with my hyperparameters set to:

--rnn_size 1024 \
--num_layers 2 \
--seq_length 128 \
--batch_size 128 \
--num_epochs 50 \

A new problem that I encountered was: “UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 1332: invalid continuation byte.”

By running file --mime input.txt, I learned that my input file was in fact encoded as ISO-8859-1, not UTF-8.

I performed a conversion with this, iconv -f ISO-8859-1 -t UTF-8 input.txt > input2.txt, but it did weird stuff to the apostrophes and dashes. Once again, not all TXT conversions are the same.

This worked better and ensured the UTF-8 encoding that I needed:

  1. I made a copy of the original file

  2. Opened it in TextEdit

  3. Converted it to RTF, saved, and closed

  4. Made a copy of that file (just in case)

  5. Opened that one in TextEdit

  6. Converted it to Plain Text, saved, and closed
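
For what it’s worth, the same conversion could probably be done directly in Python (a rough sketch; the source encoding is my guess based on the mangled apostrophes, so it may need adjusting):

# Re-encode the text file as UTF-8.
# SRC_ENCODING is an assumption: file(1) reported ISO-8859-1, but the odd
# apostrophes and dashes after iconv make me suspect mac_roman or cp1252.
SRC_ENCODING = "mac_roman"

with open("input.txt", "r", encoding=SRC_ENCODING) as src:
    text = src.read()

with open("input_utf8.txt", "w", encoding="utf-8") as dst:
    dst.write(text)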

*In class today, Gene suggested that I lower the number of epochs from 50 to 5, set the rnn_size to 2048, layers to 3, consider pushing seq_length to 512 at the risk of memory issues, and possibly increasing the batch_size. He also mentioned that adjusting hyperparameters is more of an art than a science.

Using the Model with ML5
I again used the newly suggested Stateful LSTM ML5 example:

TAKEAWAYS
I gained a lot of practice from this process. Most of my time was spent thinking about, finding, gathering, and cleaning the data (something that Jabril mentioned last spring during his visit to ITP). After that, it’s kinda like baking: you pop it into the “oven” to train for a couple of hours (or longer) and hope it comes out fitted juuust right. These particular projects are a bit silly but nevertheless fun to play with. I wonder about distilling any amount of information in this way—is it irresponsible? Finally, these particular networks generated sequences of individual characters, and for the most part, the resulting text makes some sense—as much sense as reading the legal-speak of the original documents. However, I’m excited to learn about opportunities for whole-word generation later this semester.

TRY IT! Personalized Privacy Policy

ADDITIONAL RESOURCES
The Unreasonable Effectiveness of Recurrent Neural Networks
Understanding LSTM Networks
ml4a guide: Recurrent Neural Networks: Character RNNs with Keras 
Training a char-rnn to Talk Like Me