image

Some time ago, AnimeGAN, the industry’s best-known anime-style transformation filter library, released its latest v2 version, and it has been the talk of the town for a while. When it comes to the 2D (anime) world, the largest domestic user base undoubtedly belongs to the Douyin (TikTok) client, whose built-in “Transformation Comic” filter lets users convert their real appearance into a 2D “style” during live broadcasts. For anime fans, this self-indulgent way of “breaking the dimensional wall and becoming a paper person” has proven a tried and tested formula:

image

But aesthetic fatigue inevitably sets in: a thousand faces all sporting the same pointed chin and the same oversized doll eyes. The results end up looking much the same, with little individuality and not much connection to reality.

AnimeGAN, which builds on CartoonGAN, is able to keep the characteristics of the original image, retaining both the coolness of the 2D world and the realism of the 3D one; the result balances stylization with fidelity and has a light, airy feel.

image

The AnimeGAN project team has also published an online demo where you can run the model directly: https://huggingface.co/spaces/akhaliq/AnimeGANv2. However, due to bandwidth and online resource bottlenecks, the conversion is often stuck in a queue, and uploading original photos may also leak personal privacy.

So this time we will build AnimeGANv2 on the PyTorch deep learning framework under macOS Monterey with an M1 chip, and use it to convert both still pictures and video.

As of now, the CPU version of PyTorch for M1-chip Macs targets Python 3.8. This time we install it with the native installer: go to the Python website and download the Python 3.8.10 universal2 stable release: https://www.python.org/downloads/release/python-3810/

image

Just double-click the package to install it, then open a terminal and run the command to install PyTorch:

pip3.8 install torch torchvision torchaudio

Here the latest stable version, 1.10, is installed by default. Then open the Python 3.8 interpreter and import the torch library:

(base) ➜  video git:(main) ✗ python3.8
Python 3.8.10 (v3.8.10:3d8993a744, May  3 2021, 09:09:08) 
[Clang 12.0.5 (clang-1205.0.22.9)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>>

After making sure that PyTorch is available, clone the official project:

git clone https://github.com/bryandlee/animegan2-pytorch.git

AnimeGAN is also based on a Generative Adversarial Network (GAN). The idea is that we start with a set of real photos, call them 3D (real-world) pictures, whose features follow some distribution: a normal distribution, a uniform distribution, or something far more complex. The goal of the GAN is then to use a generator to produce a batch of data close to that real distribution. The generated data can be understood as a 2D (anime) rendering that still retains some features of the 3D original, such as larger eyes or a face shape closer to the drawing style the filter model was trained on. In practice the generator is usually a neural network, because it can represent more complex data distributions.
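To make the adversarial idea concrete, here is a minimal, self-contained sketch (illustration only, not the actual AnimeGAN training code): a generator G learns to turn noise into samples that mimic a simple “real” distribution, while a discriminator D learns to tell real from fake.

# Minimal GAN sketch: the generator learns to mimic a 1-D normal distribution N(3, 2).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))                # generator: noise -> fake sample
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())  # discriminator: sample -> P(real)

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(200):
    real = torch.randn(64, 1) * 2 + 3        # "real" 3D data
    fake = G(torch.randn(64, 8))             # generated "2D" data

    # 1) train the discriminator to separate real from fake
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) train the generator to fool the discriminator
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())  # drifts toward 3 as training converges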

After the download completes, you can see four different pretrained weight files in the weights folder: celeba_distill.pt and paprika.pt are used to transform landscape images, while face_paint_512_v1.pt and face_paint_512_v2.pt focus more on portrait transformation.
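If you prefer to load one of these local weight files directly instead of pulling weights through torch.hub, something like the following should work; this is a sketch that assumes you run it inside the cloned project directory and that the project’s model.py exposes a Generator class, as in the upstream repo:

# Sketch: load a local weight file into the project's Generator
# (assumes model.py from the cloned animegan2-pytorch repo is importable).
import torch
from model import Generator

net = Generator()
net.load_state_dict(torch.load("./weights/face_paint_512_v2.pt", map_location="cpu"))
net.eval()
print(sum(p.numel() for p in net.parameters()), "parameters loaded")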

First install the image processing library Pillow:

pip3.8 install Pillow

Next, create a new test_img.py file:

from PIL import Image
import torch
import ssl

# Disable SSL certificate verification so torch.hub can download the weights locally
ssl._create_default_https_context = ssl._create_unverified_context

# Load the generator with the desired pretrained weights (uncomment to switch)
model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="celeba_distill")
#model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="face_paint_512_v1")
#model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="face_paint_512_v2")
#model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="paprika")

# face2paint helper: resizes the image and runs it through the generator
face2paint = torch.hub.load("bryandlee/animegan2-pytorch:main", "face2paint", size=512)

img = Image.open("Arc.jpg").convert("RGB")
out = face2paint(model, img)

out.show()

Here we take a photo of the Arc de Triomphe as an example and apply the celeba_distill and paprika filters respectively to compare the effect. Note that SSL certificate verification needs to be disabled for the local request, and the first run downloads the model parameters from the internet.

image

Here the size parameter is the side length (in pixels) to which the input image is resized before being fed to the model. Next comes anime-style conversion of a character portrait: switch the pretrained weights loaded into the generator and change the input image to a portrait:

from PIL import Image
import torch
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

import numpy as np

#model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="celeba_distill")
#model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="face_paint_512_v1")
model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="face_paint_512_v2")
#model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="paprika")


face2paint = torch.hub.load("bryandlee/animegan2-pytorch:main", "face2paint", size=512)

img = Image.open("11.png").convert("RGB")

out = face2paint(model, img)


out.show()

image

As you can see, the v1 filter is more aggressively stylized, while v2 keeps more of the original image’s characteristics on top of the stylization: rooted in the real world without being slavish to it, stylized without feeling hollow, and a cut above Douyin’s cartoon filter.

Next, let’s look at anime-filter conversion for video. Broadly speaking, a video is just many still pictures played back in rapid succession, and how smooth it looks depends on the frame rate. Frame rate, abbreviated FPS (Frames Per Second), is the number of frames refreshed per second, i.e. how many times per second the graphics processor redraws the picture. The higher the frame rate, the smoother and more lifelike the motion appears.

To split a continuous video into individual frames we can use third-party software. On an M1 macOS system the well-known video processing tool FFmpeg is recommended; install it with the ARM-architecture build of Homebrew:

brew install ffmpeg

After successful installation, type the ffmpeg command in the terminal to view the version:

(base) ➜  animegan2-pytorch git:(main) ✗ ffmpeg   
ffmpeg version 4.4.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with Apple clang version 13.0.0 (clang-1300.0.29.3)
  configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/4.4.1_3 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-avresample --enable-videotoolbox

If the installation looks fine, prepare a video file and create a new video_img.py:

import os

# Convert the video into image frames
os.system("ffmpeg -i ./视频.mp4 -r 15 -s 1280x720 -ss 00:00:20 -to 00:00:22 ./myvideo/%03d.png")

Here we use Python 3’s built-in os module to run the ffmpeg command directly on the video in the current directory, extracting frames at 15 per second. The -s parameter sets the output resolution, -ss and -to mark the start and end of the clip, and the final argument is the naming pattern for the exported images.
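Equivalently, and a bit more robust than building a shell string, the same extraction can be issued with subprocess; this is a sketch, so adjust the input filename to your own video:

# Sketch: the same frame extraction via subprocess instead of os.system
import subprocess

subprocess.run([
    "ffmpeg",
    "-i", "./视频.mp4",      # input video (use your own filename)
    "-r", "15",              # 15 frames per second
    "-s", "1280x720",        # output resolution
    "-ss", "00:00:20",       # clip start
    "-to", "00:00:22",       # clip end
    "./myvideo/%03d.png",    # numbered output frames
], check=True)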

After running the script, enter the myvideo directory.

(base) ➜  animegan2-pytorch git:(main)cd myvideo 
(base) ➜  myvideo git:(main) ✗ ls
001.png    004.png    007.png    010.png    013.png    016.png    019.png    022.png    025.png    028.png   
002.png    005.png    008.png    011.png    014.png    017.png    020.png    023.png    026.png    029.png   
003.png    006.png    009.png    012.png    015.png    018.png    021.png    024.png    027.png    030.png   
(base) ➜  myvideo git:(main)

As you can see, the frames have been exported as images whose file names are numbered by frame.
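As a quick sanity check, the count matches the simple arithmetic frames = duration × FPS for the two-second clip we cut:

# Sanity check: expected frame count = clip duration (seconds) * FPS
fps = 15
duration_s = 22 - 20       # clip runs from 00:00:20 to 00:00:22
print(fps * duration_s)    # 30, matching 001.png ... 030.png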

Next, the images need to be batch converted using the AnimeGAN filter.

from PIL import Image
import torch
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

import numpy as np

import os

# Collect the exported frames
img_list = os.listdir("./myvideo/")

# Make sure the output directory exists before saving
os.makedirs("./myimg", exist_ok=True)

# model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="celeba_distill")
# model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="face_paint_512_v1")
model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="face_paint_512_v2")
# model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="paprika")

face2paint = torch.hub.load("bryandlee/animegan2-pytorch:main", "face2paint", size=512)

for x in img_list:

    if os.path.splitext(x)[-1] == ".png":

        print(x)

        img = Image.open("./myvideo/"+x).convert("RGB")

        out = face2paint(model, img)

        out.show()                # preview each converted frame (optional)
        out.save("./myimg/"+x)    # save the filtered frame under the same name

        # exit(-1)

For each conversion the original frame is kept and the filtered image is saved into the relative directory myimg. Then create a new img_video.py to stitch the frames back into a video:

import os

# Convert the filtered frames back into a video
os.system("ffmpeg -y -r 15 -i ./myimg/%03d.png -vcodec libx264 ./myvideo/test.mp4")

The rate is still 15 frames per second, the same as the original video.
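If you want to confirm the source frame rate rather than take it on trust, ffprobe (installed alongside ffmpeg by Homebrew) can report it; a minimal check, assuming the original clip is ./lisa.mp4 as in the audio step below:

# Sketch: query the original video's frame rate with ffprobe
import os
os.system("ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate "
          "-of default=noprint_wrappers=1:nokey=1 ./lisa.mp4")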

If the original video has an audio track, you can first extract it:

# Extract the audio track from the original video
import os
os.system("ffmpeg -y -i ./lisa.mp4 -ss 00:00:20 -to 00:00:22 -vn -acodec copy ./myvideo/3.aac")

After the anime-filter conversion, merge the converted video with the original video’s audio track:

# Merge the converted video with the extracted audio track
import os

os.system("ffmpeg -y -i ./myvideo/test.mp4 -i ./myvideo/3.aac -vcodec copy -acodec copy ./myvideo/output.mp4")

The original video used for testing:

image

Post-conversion effect.

image

With the M1 chip, the CPU version of PyTorch still performs reasonably well, but unfortunately we will have to wait a while longer for a GPU version of PyTorch adapted to the M1 chip. Last month, PyTorch core developer soumith gave this response:

So, here’s an update.

We plan to get the M1 GPU supported. @albanD, @ezyang and a few core-devs have been looking into it. I can’t confirm/deny the involvement of any other folks right now.

So, what we have so far is that we had a prototype that was just about okay. We took the wrong approach (more graph-matching-ish), and the user-experience wasn’t great – some operations were really fast, some were really slow, there wasn’t a smooth experience overall. One had to guess-work which of their workflows would be fast.

So, we’re completely re-writing it using a new approach, which I think is a lot closer to your good ole PyTorch, but it is going to take some time. I don’t think we’re going to hit a public alpha in the next ~4 months.

We will open up development of this backend as soon as we can.

It appears the team is completely rewriting the PyTorch backend for the M1 chip, so a public beta will not arrive in the near future; it may land in the second half of next year, and it is well worth looking forward to.
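For reference, once the M1 GPU backend does ship, switching to it will presumably follow the usual PyTorch device-selection pattern. This is a hypothetical sketch: the “mps” backend name is an assumption and is not present in the stable 1.10 release used here, so the code probes for it defensively and falls back to the CPU:

# Hypothetical sketch: selecting the Apple-GPU backend once it is released
# (the "mps" backend name is an assumption; it is not in stable PyTorch 1.10).
import torch

if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
    device = torch.device("mps")   # Apple GPU
else:
    device = torch.device("cpu")   # fall back to CPU

print("Using device:", device)
# model = model.to(device)  # the AnimeGAN generator could then be moved to the GPU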