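I recently settled on the direction I want to pursue in Python and deep learning: image processing. I find this area genuinely interesting, so I have been following a video course and, working from its code, built an image completion (inpainting) program. I first watched this course two years ago, but the laptop I had then could not run such a resource-hungry project. After I started working I put together a laptop with a 3060 GPU and a desktop with a 2060 GPU specifically for running programs like this. What follows is my understanding of the program.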
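1. Model Overview
This project is based on the paper Globally and Locally Consistent Image Completion, so understanding the model requires a rough understanding of the paper first. The central idea is as follows. First, cut a region out of an image and use the masked image to train the global completion network, saving the network parameters. Then, building on the trained completion network, feed the full image it produces and the local patch cropped from that image to the global discriminator and the local discriminator respectively. The overall model is shown in the figure below. Note the two image inputs to the discriminator: one is the whole, uncropped image produced by the completion network, and the other is a local patch cropped from that generated image. In this article we train two models, completion only and completion + discriminator, and then compare the results.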

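2. Network Architecture
The model contains two large networks: completion and discriminator. The discriminator is further split into a global part and a local part. The paper describes the composition of each network in detail, as shown in the figure below: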

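Here, dilated conv means dilated (atrous) convolution, whose purpose is to enlarge the receptive field without reducing resolution, and deconv is a transposed convolution (often called deconvolution), used to upsample the feature maps back to the image resolution. In TensorFlow, a dilated convolution is available as tf.nn.atrous_conv2d(x, filters, dilation, padding='SAME'), where the third argument is the dilation rate, and a transposed convolution as tf.nn.conv2d_transpose(x, filters, output_shape, [1, stride, stride, 1]), which is quite convenient. The network is defined in the code below: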
from layer import *
import tensorflow as tf


class Network:
    def __init__(self, x, mask, local_x, global_completion, local_completion, is_training, batch_size):
        self.batch_size = batch_size
        # The generator only sees the image with the hole zeroed out.
        self.imitation = self.generator(x * (1 - mask), is_training)
        # Keep generated pixels inside the hole and original pixels everywhere else.
        self.completion = self.imitation * mask + x * (1 - mask)
        # Real pair: the original image x and the patch local_x cropped from it.
        self.real = self.discriminator(x, local_x, reuse=False)
        # Fake pair: the completed image and the local patch cropped from it.
        self.fake = self.discriminator(global_completion, local_completion, reuse=True)
        self.g_loss = self.calc_g_loss(x, self.completion)
        self.d_loss = self.calc_d_loss(self.real, self.fake)
        self.g_variables = tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.TRAINABLE_VARIABLES, scope='generator')
        self.d_variables = tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.TRAINABLE_VARIABLES, scope='discriminator')

    def generator(self, x, is_training):
        with tf.compat.v1.variable_scope('generator'):
            with tf.compat.v1.variable_scope('conv1'):
                x = conv_layer(x, [5, 5, 3, 64], 1)
                x = batch_normalize(x, is_training)
                x = tf.nn.relu(x)
            with tf.compat.v1.variable_scope('conv2'):
                x = conv_layer(x, [3, 3, 64, 128], 2)
                x = batch_normalize(x, is_training)
                x = tf.nn.relu(x)
            with tf.compat.v1.variable_scope('conv3'):
                x = conv_layer(x, [3, 3, 128, 128], 1)
                x = batch_normalize(x, is_training)
                x = tf.nn.relu(x)
            with tf.compat.v1.variable_scope('conv4'):
                x = conv_layer(x, [3, 3, 128, 256], 2)
                x = batch_normalize(x, is_training)
                x = tf.nn.relu(x)
            with tf.compat.v1.variable_scope('conv5'):
                x = conv_layer(x, [3, 3, 256, 256], 1)
                x = batch_normalize(x, is_training)
                x = tf.nn.relu(x)
            with tf.compat.v1.variable_scope('conv6'):
                x = conv_layer(x, [3, 3, 256, 256], 1)
                x = batch_normalize(x, is_training)
                x = tf.nn.relu(x)
            with tf.compat.v1.variable_scope('dilated1'):
                x = dilated_conv_layer(x, [3, 3, 256, 256], 2)
                x = batch_normalize(x, is_training)
                x = tf.nn.relu(x)
            with tf.compat.v1.variable_scope('dilated2'):
                x = dilated_conv_layer(x, [3, 3, 256, 256], 4)
                x = batch_normalize(x, is_training)
                x = tf.nn.relu(x)
            with tf.compat.v1.variable_scope('dilated3'):
                x = dilated_conv_layer(x, [3, 3, 256, 256], 8)
                x = batch_normalize(x, is_training)
                x = tf.nn.relu(x)
            with tf.compat.v1.variable_scope('dilated4'):
                x = dilated_conv_layer(x, [3, 3, 256, 256], 16)
                x = batch_normalize(x, is_training)
                x = tf.nn.relu(x)
            with tf.compat.v1.variable_scope('conv7'):
                x = conv_layer(x, [3, 3, 256, 256], 1)
                x = batch_normalize(x, is_training)
                x = tf.nn.relu(x)
            with tf.compat.v1.variable_scope('conv8'):
                x = conv_layer(x, [3, 3, 256, 256], 1)
                x = batch_normalize(x, is_training)
                x = tf.nn.relu(x)
            with tf.compat.v1.variable_scope('deconv1'):
                x = deconv_layer(x, [4, 4, 128, 256], [self.batch_size, 64, 64, 128], 2)
                x = batch_normalize(x, is_training)
                x = tf.nn.relu(x)
            with tf.compat.v1.variable_scope('conv9'):
                x = conv_layer(x, [3, 3, 128, 128], 1)
                x = batch_normalize(x, is_training)
                x = tf.nn.relu(x)
            with tf.compat.v1.variable_scope('deconv2'):
                x = deconv_layer(x, [4, 4, 64, 128], [self.batch_size, 128, 128, 64], 2)
                x = batch_normalize(x, is_training)
                x = tf.nn.relu(x)
            with tf.compat.v1.variable_scope('conv10'):
                x = conv_layer(x, [3, 3, 64, 32], 1)
                x = batch_normalize(x, is_training)
                x = tf.nn.relu(x)
            with tf.compat.v1.variable_scope('conv11'):
                x = conv_layer(x, [3, 3, 32, 3], 1)
                x = tf.nn.tanh(x)
        return x

    def discriminator(self, global_x, local_x, reuse):
        def global_discriminator(x):
            is_training = tf.constant(True)
            with tf.compat.v1.variable_scope('global'):
                with tf.compat.v1.variable_scope('conv1'):
                    x = conv_layer(x, [5, 5, 3, 64], 2)
                    x = batch_normalize(x, is_training)
                    x = tf.nn.relu(x)
                with tf.compat.v1.variable_scope('conv2'):
                    x = conv_layer(x, [5, 5, 64, 128], 2)
                    x = batch_normalize(x, is_training)
                    x = tf.nn.relu(x)
                with tf.compat.v1.variable_scope('conv3'):
                    x = conv_layer(x, [5, 5, 128, 256], 2)
                    x = batch_normalize(x, is_training)
                    x = tf.nn.relu(x)
                with tf.compat.v1.variable_scope('conv4'):
                    x = conv_layer(x, [5, 5, 256, 512], 2)
                    x = batch_normalize(x, is_training)
                    x = tf.nn.relu(x)
                with tf.compat.v1.variable_scope('conv5'):
                    x = conv_layer(x, [5, 5, 512, 512], 2)
                    x = batch_normalize(x, is_training)
                    x = tf.nn.relu(x)
                with tf.compat.v1.variable_scope('fc'):
                    x = flatten_layer(x)
                    x = full_connection_layer(x, 1024)
            return x

        def local_discriminator(x):
            is_training = tf.constant(True)
            with tf.compat.v1.variable_scope('local'):
                with tf.compat.v1.variable_scope('conv1'):
                    x = conv_layer(x, [5, 5, 3, 64], 2)
                    x = batch_normalize(x, is_training)
                    x = tf.nn.relu(x)
                with tf.compat.v1.variable_scope('conv2'):
                    x = conv_layer(x, [5, 5, 64, 128], 2)
                    x = batch_normalize(x, is_training)
                    x = tf.nn.relu(x)
                with tf.compat.v1.variable_scope('conv3'):
                    x = conv_layer(x, [5, 5, 128, 256], 2)
                    x = batch_normalize(x, is_training)
                    x = tf.nn.relu(x)
                with tf.compat.v1.variable_scope('conv4'):
                    x = conv_layer(x, [5, 5, 256, 512], 2)
                    x = batch_normalize(x, is_training)
                    x = tf.nn.relu(x)
                with tf.compat.v1.variable_scope('fc'):
                    x = flatten_layer(x)
                    x = full_connection_layer(x, 1024)
            return x

        with tf.compat.v1.variable_scope('discriminator', reuse=reuse):
            global_output = global_discriminator(global_x)
            local_output = local_discriminator(local_x)
            with tf.compat.v1.variable_scope('concatenation'):
                output = tf.compat.v1.concat((global_output, local_output), 1)
                output = full_connection_layer(output, 1)
        return output

    def calc_g_loss(self, x, completion):
        # tf.nn.l2_loss returns sum(t ** 2) / 2 as a scalar,
        # so the reduce_mean below leaves the value unchanged.
        loss = tf.compat.v1.nn.l2_loss(x - completion)
        return tf.reduce_mean(loss)

    def calc_d_loss(self, real, fake):
        alpha = 4e-4
        # Real logits are pushed towards label 1, fake logits towards label 0.
        d_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=real, labels=tf.ones_like(real)))
        d_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=fake, labels=tf.zeros_like(fake)))
        return tf.add(d_loss_real, d_loss_fake) * alpha
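To make the role of the strides, dilation rates and deconv layers in the generator above more concrete, here is a small pure-Python sketch that traces the feature-map side length through the layers and computes the span covered by each dilated 3x3 kernel. It rests on two assumptions that the post does not confirm: that conv_layer and dilated_conv_layer in layer.py use 'SAME' padding, and that the input is 128x128 (inferred from the output_shape of deconv2). The helper names are made up for illustration.

```python
import math


def same_conv_out(size, stride):
    """Spatial output size of a 'SAME'-padded convolution."""
    return math.ceil(size / stride)


def effective_kernel(kernel, rate):
    """Span covered by a dilated kernel: k + (k - 1) * (rate - 1)."""
    return kernel + (kernel - 1) * (rate - 1)


def trace_generator(size=128):
    """Follow the feature-map side length through the generator layers."""
    sizes = [size]
    # conv1..conv6: strides copied from the conv_layer calls in generator()
    for stride in (1, 2, 1, 2, 1, 1):
        size = same_conv_out(size, stride)
        sizes.append(size)
    # dilated1..dilated4, conv7, conv8: resolution unchanged (assumed SAME padding)
    sizes.extend([size] * 6)
    # deconv1, conv9, deconv2, conv10, conv11: each deconv doubles the side length
    for factor in (2, 1, 2, 1, 1):
        size = size * factor
        sizes.append(size)
    return sizes


sizes = trace_generator(128)
spans = [effective_kernel(3, rate) for rate in (2, 4, 8, 16)]
print(sizes[0], min(sizes), sizes[-1])  # input, bottleneck, output side length
print(spans)                            # pixels spanned by each dilated 3x3 kernel
```

Under these assumptions the image is reduced to a quarter of its side length in the bottleneck, where the dilated layers widen the context each output pixel can see, and the two deconv layers bring it back to the input resolution.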
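Choice of loss functions: the completion loss is simple. It is a squared-error loss on the difference between the generated image and the real image. (Strictly speaking, tf.nn.l2_loss returns half the sum of squared differences rather than their mean, so it matches MSE only up to a constant factor.) The discriminator loss is a bit more involved and took me a long time to understand. The paper discriminates between fake and real inputs, and the code computes tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=fake, labels=tf.zeros_like(fake))) for the fake branch, but it is not obvious which input is fake and which is real. After analyzing the code I concluded:
The original image x together with the patch local_x cropped from it should be classified as real.
The image filled in by the completion network together with the local patch cropped from that completed image should be classified as fake.
In the program they are defined as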
self.real = self.discriminator(x, local_x, reuse=False)
self.fake = self.discriminator(global_completion, local_completion, reuse=True)
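The two losses can be checked by hand with a few lines of plain Python. The sketch below re-implements, for scalars, the numerically stable formula documented for tf.nn.sigmoid_cross_entropy_with_logits, max(x, 0) - x * z + log(1 + exp(-|x|)), and the definition of tf.nn.l2_loss, sum(t ** 2) / 2. The logits are made-up numbers, and the function d_loss is only an illustration of what calc_d_loss computes, not the author's code.

```python
import math


def sigmoid_cross_entropy_with_logits(logit, label):
    """Stable scalar form: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def l2_loss(values):
    """Half the sum of squares, the definition used by tf.nn.l2_loss."""
    return sum(v * v for v in values) / 2.0


def d_loss(real_logits, fake_logits, alpha=4e-4):
    """Mean cross-entropy of real logits against 1 plus fake logits against 0."""
    real_term = sum(sigmoid_cross_entropy_with_logits(r, 1.0) for r in real_logits) / len(real_logits)
    fake_term = sum(sigmoid_cross_entropy_with_logits(f, 0.0) for f in fake_logits) / len(fake_logits)
    return (real_term + fake_term) * alpha


# Made-up logits, for illustration only.
confident = d_loss(real_logits=[4.0, 5.0], fake_logits=[-4.0, -5.0])
undecided = d_loss(real_logits=[0.0, 0.0], fake_logits=[0.0, 0.0])
print(confident < undecided)   # a discriminator that separates the pairs has lower loss
print(l2_loss([1.0, -2.0]))    # (1 + 4) / 2
```

A discriminator that outputs a large positive logit for the real pair and a large negative logit for the fake pair drives both terms towards zero, which is why the real pair is labelled with ones and the fake pair with zeros.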
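3. Program Walkthrough
The program is organized as: data processing -> network definition -> model construction -> training -> result display.
Data processing: the images to be completed are portraits, and the full dataset contains more than 200,000 of them. Loading that many images on every run would waste a lot of time and resources, so the program first downsizes the images and converts them to npy format. To save resources further, only 5,000 images are used, of which 95% form x_train and the remaining 5% serve as x_test.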
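The subset selection and the 95/5 split can be sketched as follows. This is only an illustration in plain Python: the file names, the random seed and the function name split_train_test are hypothetical, and the resizing and conversion to npy (which in practice would go through NumPy, for example np.save) are left out.

```python
import random


def split_train_test(items, subset_size=5000, train_ratio=0.95, seed=0):
    """Pick a random subset and split it into train and test parts."""
    rng = random.Random(seed)
    subset = rng.sample(items, min(subset_size, len(items)))
    cut = round(len(subset) * train_ratio)
    return subset[:cut], subset[cut:]


# Hypothetical file names standing in for the 200,000+ portrait images.
paths = ["img_%06d.jpg" % i for i in range(200000)]
x_train_paths, x_test_paths = split_train_test(paths)
print(len(x_train_paths), len(x_test_paths))
```

Fixing the seed keeps the subset reproducible between runs, so the saved npy files and the train/test split stay consistent.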
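Network definition: covered above, so it is not repeated here.
Model construction: building the model means defining the intermediate quantities. The mask needs some explanation. It is a black-and-white image in which the pixels to be filled are set to 1 and all other pixels to 0. It is produced by the get_points function, which computes the size and coordinates of the local patch and of the hole inside it, and then sets the hole region of the mask to 1. The code is shown below.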
x = tf.compat.v1.placeholder(tf.float32,[BATCH_SIZE,IMAGE_SIZE,IMAGE_SIZE,3])
mask = tf.compat.v1.placeholder(tf.float32,[BATCH_SIZE,IMAGE_SIZE,IMAGE_SIZE,1])
local_x = tf.compat.v1.placeholder(tf.float32,[BATCH_SIZE,LOCAL_SIZE,LOCAL_SIZE,3])
global_completion = tf.compat.v1.placeholder(tf.float32,[BATCH_SIZE,IMAGE_SIZE,IMAGE_SIZE,3])
local_completion = tf.compat.v1.placeholder(tf.float32,[BATCH_SIZE,LOCAL_SIZE,LOCAL_SIZE,3])
is_training = tf.compat.v1.placeholder(tf.bool,[])
model = Network(x, mask, local_x, global_completion, local_completion, is_training, batch_size=BATCH_SIZE)
sess = tf.compat.v1.Session()
global_step = tf.compat.v1.Variable(0,name='global_step',trainable=False)
epoch = tf.compat.v1.Variable(0,name='epoch',trainable=False)
opt = tf.compat.v1.train.AdamOptimizer(learning_rate=LEARNING_RATE)
# var_list restricts each update to one sub-network; if omitted it defaults to GraphKeys.TRAINABLE_VARIABLES
g_train_op = opt.minimize(model.g_loss, global_step=global_step, var_list=model.g_variables)
d_train_op = opt.minimize(model.d_loss, global_step=global_step, var_list=model.d_variables)
init_opt = tf.compat.v1.global_variables_initializer()
sess.run(init_opt)
def get_points():
    points = []
    mask = []
    for i in range(BATCH_SIZE):
        # top-left and bottom-right corners of the local patch
        x1, y1 = np.random.randint(0, IMAGE_SIZE - LOCAL_SIZE + 1, 2)
        x2, y2 = np.array([x1, y1]) + LOCAL_SIZE
        points.append([x1, y1, x2, y2])
        # width and height of the hole (HOLE_MAX must be smaller than LOCAL_SIZE)
        w, h = np.random.randint(HOLE_MIN, HOLE_MAX + 1, 2)
        # top-left corner of the hole, chosen so the hole stays inside the patch
        p1 = x1 + np.random.randint(0, LOCAL_SIZE - w)
        q1 = y1 + np.random.randint(0, LOCAL_SIZE - h)
        p2 = p1 + w
        q2 = q1 + h
        m = np.zeros((IMAGE_SIZE, IMAGE_SIZE, 1), dtype=np.uint8)
        # the inclusive end index makes the hole (h + 1) x (w + 1) pixels
        m[q1:q2 + 1, p1:p2 + 1] = 1
        mask.append(m)
    return np.array(points), np.array(mask)
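To see how the sampled coordinates, the mask and the completion formula fit together, here is a toy pure-Python version of one iteration of get_points followed by the compositing step completion = imitation * mask + x * (1 - mask). Nested lists stand in for NumPy arrays, the sizes (16, 8, 3, 5) and pixel values are made up, and the helper names sample_hole, make_mask and composite are hypothetical.

```python
import random


def sample_hole(image_size, local_size, hole_min, hole_max, rng):
    """Pure-Python mirror of one iteration of get_points()."""
    x1 = rng.randrange(0, image_size - local_size + 1)
    y1 = rng.randrange(0, image_size - local_size + 1)
    x2, y2 = x1 + local_size, y1 + local_size
    w = rng.randint(hole_min, hole_max)  # inclusive on both ends
    h = rng.randint(hole_min, hole_max)
    p1 = x1 + rng.randrange(0, local_size - w)
    q1 = y1 + rng.randrange(0, local_size - h)
    return (x1, y1, x2, y2), (p1, q1, p1 + w, q1 + h)


def make_mask(image_size, hole):
    """1 inside the hole, 0 elsewhere, like m[q1:q2 + 1, p1:p2 + 1] = 1."""
    p1, q1, p2, q2 = hole
    return [[1 if q1 <= r <= q2 and p1 <= c <= p2 else 0
             for c in range(image_size)] for r in range(image_size)]


def composite(original, generated, mask):
    """completion = generated * mask + original * (1 - mask), per pixel."""
    return [[g * m + o * (1 - m) for o, g, m in zip(orow, grow, mrow)]
            for orow, grow, mrow in zip(original, generated, mask)]


rng = random.Random(0)
patch, hole = sample_hole(image_size=16, local_size=8, hole_min=3, hole_max=5, rng=rng)
mask = make_mask(16, hole)
original = [[0.5] * 16 for _ in range(16)]    # stand-in for the real image
generated = [[-1.0] * 16 for _ in range(16)]  # stand-in for the generator output
completion = composite(original, generated, mask)

# Crop the local patch that would be fed to the local discriminator.
x1, y1, x2, y2 = patch
local_completion = [row[x1:x2] for row in completion[y1:y2]]
print(patch, hole)
```

Only the pixels inside the hole come from the generator; everything else is copied from the original, and the hole always lies inside the local patch that is cropped for the local discriminator.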
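Training: following the model definition, we set a PRETRAIN_EPOCH value. Once epoch exceeds it, training of the completion network alone stops, the model parameters are saved, and training of the discriminator begins. The original program gives no stopping condition for this phase, so I added stop_loss = 1e-4: when the loss falls below this value, the model is saved and training exits. This is the longest-running part. If batch_size is too large the machine easily runs out of resources; on my desktop with a 6 GB 2060 GPU I could only use batch_size=16, otherwise training would not run.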
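The two-phase schedule with the added stopping condition can be sketched as a small control-flow function. This is not the training script itself: no TensorFlow ops are run, the loss values are made up, the post does not say which loss is compared against stop_loss, and the name run_schedule and the action labels are hypothetical.

```python
def run_schedule(losses_per_epoch, pretrain_epoch, stop_loss=1e-4):
    """Return the (epoch, phase, action) decisions for a list of loss values.

    losses_per_epoch holds one monitored loss value per epoch (made up here).
    """
    log = []
    for epoch, loss in enumerate(losses_per_epoch, start=1):
        if epoch <= pretrain_epoch:
            # Phase 1: only the completion network is trained.
            action = "save" if epoch == pretrain_epoch else "continue"
            log.append((epoch, "pretrain", action))
            continue
        # Phase 2: the discriminator is trained until the loss is small enough.
        if loss < stop_loss:
            log.append((epoch, "discriminator", "save_and_stop"))
            break
        log.append((epoch, "discriminator", "continue"))
    return log


log = run_schedule([0.5, 0.2, 0.1, 3e-3, 5e-5, 1e-5], pretrain_epoch=3)
for entry in log:
    print(entry)
```

With these made-up values the completion parameters are saved at the end of epoch 3, and training stops in epoch 5, the first epoch whose loss is below stop_loss.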
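Results: finally, we look at the results on x_test. There are two sets: one from the completion-only model and one from completion + discriminator.
The figure below shows the final output of the completion-only model. The result is acceptable, although the filled region looks somewhat like a mosaic.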

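The figure below compares the original images with the model output. The result is decent, and further training would give better results; the images in the paper come from several days of training: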

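In the figure below, the top image comes from completion only and the bottom one from completion + discriminator. The skin tone in the bottom image is paler than in the top one.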


Code download: python图像缺失弥补源码资源 (CSDN文库)