Johnson, J., Alahi, A., & Fei-Fei, L. (2016, October). Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision (pp. 694–711). Springer, Cham.
path = untar_data(URLs.IMAGENETTE)
db = DataBlock(blocks=(ImageBlock, ImageBlock),
               get_items=get_image_files,
               splitter=RandomSplitter(valid_pct=0.01),
               get_x=noop, get_y=noop,
               item_tfms=Resize(256),
               batch_tfms=Normalize.from_stats(0.5*torch.ones(3), 0.5*torch.ones(3)))
dls = db.dataloaders(path, bs=4, num_workers=4)
dls.show_batch()
For style transfer we choose an image as the style target and normalize it with imagenet_stats.
def get_style_target(artist, size=256, **kwargs):
    "Download the style image for `artist` and normalize it with imagenet stats."
    r = requests.get(artists_sources[artist], stream=True)
    style_target_img = PILImage.create(r.content)
    p = Pipeline([ToTensor,
                  Resize(size, **kwargs),
                  IntToFloatTensor,
                  Normalize.from_stats(*imagenet_stats, cuda=False)])
    return p(style_target_img), p
style_target, p = get_style_target('picasso')
p.decode(style_target)[0].show(figsize=(10,10));
These are the original weights used in the paper.
!wget http://cs.stanford.edu/people/jcjohns/fast-neural-style/models/vgg16.t7 -O vgg16.t7
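The file is in Torch7's serialization format. One way to read it from Python, assuming the third-party torchfile package is installed (pip install torchfile), is sketched below; this is not the notebook's loading code, just an illustration.
import torchfile

# Parse the Torch7 archive into plain Python objects (sketch only).
vgg_t7 = torchfile.load('vgg16.t7')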
The PerceptualLoss module computes the feature loss based on feature_layer and the style loss based on style_layers_names.
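Concretely, the style term compares Gram matrices of VGG activations (Johnson et al., 2016). The following is a minimal sketch of that computation, not the PerceptualLoss implementation itself; gram_matrix and style_loss_sketch are illustrative names.
import torch
import torch.nn.functional as F

def gram_matrix(x):
    "Channel-by-channel correlations of the feature maps, normalized by their size."
    b, c, h, w = x.shape
    f = x.view(b, c, h*w)
    return f @ f.transpose(1, 2) / (c*h*w)

def style_loss_sketch(input_feats, style_feats):
    "MSE between the Gram matrices of generated and style-target activations."
    return F.mse_loss(gram_matrix(input_feats), gram_matrix(style_feats))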
style_target_test = TensorImage(torch.rand(1, 3, 256, 256)*2-1)
feature_loss = PerceptualLoss(style_target_test, renormalize=True, style_weight=1, bs=4)
input = TensorImage(torch.rand(2, 3, 256, 256)*2-1).cuda()
target = TensorImage(torch.rand(2, 3, 256, 256)*2-1).cuda()
loss = feature_loss(input, target)
loss
Test that the style image is properly renormalized: feeding the same image back in the [-1, 1] input range should give zero style loss.
style_unnorm = TensorImage(torch.rand(1, 3, 256, 256))
style_imagenet = Normalize.from_stats(*imagenet_stats, cuda=False)(style_unnorm)
style_norm = style_unnorm*2-1
feature_loss = PerceptualLoss(style_imagenet, renormalize=True, feature_weight=0, cuda=False)
target = TensorImage(torch.rand(1, 3, 256, 256)*2-1)
loss = feature_loss(style_norm, target)
test_eq(loss, 0)
Test that cuda=True works.
style_target_test = TensorImage(torch.rand(1, 3, 256, 256)*2-1)
feature_loss = PerceptualLoss(style_target_test, renormalize=True)
input = TensorImage(torch.rand(1, 3, 256, 256)*2-1).cuda()
target = TensorImage(torch.rand(1, 3, 256, 256)*2-1).cuda()
loss = feature_loss(input, target)
loss
style_target_test = TensorImage(torch.rand(1, 3, 256, 256)*2-1)
feature_loss = PerceptualLoss(style_target_test, renormalize=True, bs=4)
input = TensorImage(torch.rand(4, 3, 256, 256)*2-1).cuda()
target = TensorImage(torch.rand(4, 3, 256, 256)*2-1).cuda()
loss = feature_loss(input, target)
loss
style_target_test = TensorImage(torch.rand(1, 3, 256, 256)*2-1)
feature_loss = PerceptualLoss(style_target_test, renormalize=True, style_weight=1e5, feature_weight=1)
target = TensorImage(torch.rand(1, 3, 256, 256)*2-1).to('cuda')
loss = feature_loss(target, target)
loss
style_target_test = TensorImage(torch.rand(1, 3, 256, 256)*2-1)
feature_loss = PerceptualLoss(style_target_test, renormalize=True, style_weight=1, bs=4)
input = TensorImage(torch.rand(4, 3, 256, 256)*2-1).cuda()
target = TensorImage(torch.rand(4, 3, 256, 256)*2-1).cuda()
loss = feature_loss(input, target)
loss
We use LBFGS optimization to find the images that minimize the style loss. We can also visualize images that minimize the feature reconstruction loss at different layers.
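A minimal sketch of such an LBFGS loop, assuming feature_loss and target are defined as in the tests above, and treating the pixels of a random image as the parameters to optimize:
# Sketch only, not the notebook's exact code: optimize raw pixels with LBFGS.
opt_img = TensorImage(torch.rand(1, 3, 256, 256)*2 - 1).cuda().requires_grad_()
opt = torch.optim.LBFGS([opt_img])

def closure():
    # LBFGS re-evaluates the objective several times per step, hence the closure.
    opt.zero_grad()
    loss = feature_loss(opt_img, target)
    loss.backward()
    return loss

for _ in range(10): opt.step(closure)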
jrb = JohnsonResBlock(32)
x = torch.randn(4, 32, 16, 16)
y = jrb(x)
test_eq(y.shape, x.shape)
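For orientation, the residual blocks in Johnson et al. (2016) are two 3x3 convolutions with normalization and a shape-preserving skip connection. A minimal sketch follows; it is illustrative only, and the actual JohnsonResBlock may differ in padding, normalization, or activation.
import torch.nn as nn

class ResBlockSketch(nn.Module):
    "Two 3x3 convs with instance norm and a residual skip; preserves the input shape."
    def __init__(self, nf):
        super().__init__()
        self.body = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(nf, nf, 3), nn.InstanceNorm2d(nf), nn.ReLU(),
            nn.ReflectionPad2d(1), nn.Conv2d(nf, nf, 3), nn.InstanceNorm2d(nf))
    def forward(self, x): return x + self.body(x)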
style_transfer_generator = ResnetGenerator()
x = torch.randn(1, 3, 256, 256)
y = style_transfer_generator(x)
y.shape, y.max(), y.min()
style_target, _ = get_style_target('picasso')
sgc = ShowGraphCallback()
picasso_learn = style_learner(dls, style_target=style_target, cbs=sgc, plkwargs={'style_weight': 0.5, 'feature_weight':5})
with picasso_learn.removed_cbs(sgc):
    picasso_learn.fit(1, lr=1.e-3)
picasso_learn.fit(7, lr=1.e-3)
picasso_learn.show_results()
db = DataBlock(blocks=(ResImageBlock(72), ResImageBlock(288)),
               get_items=get_image_files,
               get_x=noop, get_y=noop,
               batch_tfms=Normalize.from_stats(0.5*torch.ones(3), 0.5*torch.ones(3)))
dls = db.dataloaders(path, bs=4, num_workers=4)
dls.show_batch()
b = dls.one_batch()
learn = superres_learner(dls)
learn.fit(16, lr=1e-3, wd=0)
learn.show_results()
db = DataBlock(blocks=(ResImageBlock(36), ResImageBlock(288)),
               get_items=get_image_files,
               get_x=noop, get_y=noop,
               batch_tfms=Normalize.from_stats(0.5*torch.ones(3), 0.5*torch.ones(3)))
dls = db.dataloaders(path, bs=4, num_workers=4)
dls.show_batch()
b = dls.one_batch()
learn = superres_learner(dls, superres_factor=8)
learn.fit(16, lr=1e-3, wd=0)
learn.show_results()