classification

As mentioned in the previous post, the task is to predict grid-level labels for each image (256x256) in the original data. The idea is simple: the final feature map coming out of the convolutions is 4x4, so instead of passing it through the global average pooling layer, flatten it directly into a 1x16 vector and apply a sigmoid activation (the label also becomes a 16-character string, somewhat like OCR).
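As a rough sketch of the idea (assuming a backbone whose final feature map is already 4x4; the GridHead wrapper and in_channels argument are illustrative names, not code from this project):

import torch
import torch.nn as nn

class GridHead(nn.Module):
    """Per-grid classifier: skip global average pooling and map each of
    the 16 positions of the final 4x4 feature map to one grid cell."""
    def __init__(self, backbone, in_channels):
        super().__init__()
        self.backbone = backbone                        # outputs [B, C, 4, 4]
        self.classifier = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, x):
        feat = self.backbone(x)                         # [B, C, 4, 4]
        logits = self.classifier(feat)                  # [B, 1, 4, 4]
        # row-major flatten, matching the 16-char label layout (row * 4 + col)
        return torch.sigmoid(logits).view(x.size(0), 16)

The row-major flatten matters: it has to agree with how the 16-character label is built (see the grid-id computation in the labeling code below).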

dataset

The original data comes with box annotations in JSON format, but a box does not mark a bacillus's exact location; it only indicates that a bacillus exists somewhere in that grid cell:

"frames":{"0_grid.png":[{"x1":162.42542787286064,"y1":25.34963325183374,"x2":170.24938875305622,"y2":34.11246943765281,"width":256,"height":256,"box":{"x1":162.42542787286064,"y1":25.34963325183374,"x2":170.24938875305622,"y2":34.11246943765281},"UID":"fd548124","id":0,"type":"Rectangle","tags":["TB01"],"name":1},{"x1":214.3765281173594,"y1":24.410757946210268,"x2":224.07823960880197,"y2":34.11246943765281,"width":256,"height":256,"box":{"x1":214.3765281173594,"y1":24.410757946210268,"x2":224.07823960880197,"y2":34.11246943765281},"UID":"5318dd81","id":1,"type":"Rectangle","tags":["TB01"],"name":2},...}

Part of the annotation is shown above. It mainly records which images are in the corresponding folder, whether each image contains bacilli, which grid cell each bacillus falls into (you have to work that out from the coordinates yourself), and, if an image has bacilli, how many there are in total (the "name" field).

First, find the positive images and write out their labels based on the box coordinates:

import csv
import json
import os

import cv2

def find_write_positive_imgs(src_json_path, src_imgs_path, dst_csv_path, dst_imgs_path):

    data_csv = open(dst_csv_path, 'a+', newline='')
    csv_writer = csv.writer(data_csv)
    csv_writer.writerow(["ImageName", "LabelId"])

    with open(src_json_path, 'r') as load_json:
        load_dict = json.load(load_json)

    img_names = load_dict['visitedFrames']

    for img_name in img_names:

        # n_name is the number of boxes in the image (cf. the "name" attribute in the .json file)
        n_name = len(load_dict['frames'][img_name])

        if n_name > 0:
            src_img_path = os.path.join(src_imgs_path, img_name)
            img = cv2.imread(src_img_path)
            H = img.shape[0]
            W = img.shape[1]
            dst_img_path = os.path.join(dst_imgs_path, img_name.replace('.png', '_22.png'))
            cv2.imwrite(dst_img_path, img)

            labelid = [0] * 16
            for i in range(0, n_name):
                x1 = load_dict['frames'][img_name][i]['x1']
                y1 = load_dict['frames'][img_name][i]['y1']
                area_h0 = 0
                area_w0 = 0
                # walk the 4x4 grid to find which cell (x1, y1) falls into
                for area_h1 in range(H // 4, H + 1, H // 4):
                    if y1 > area_h0 and y1 < area_h1:
                        row_id = (area_h1 * 4 / H) - 1
                        for area_w1 in range(W // 4, W + 1, W // 4):
                            if x1 > area_w0 and x1 < area_w1:
                                col_id = (area_w1 * 4 / W) - 1
                                # row-major grid id in [0, 15]
                                id = int(col_id + 4 * row_id)
                                labelid[id] = 1
                                break
                            else:
                                area_w0 = area_w1
                        break
                    else:
                        area_h0 = area_h1

            csv_writer.writerow([img_name.replace('.png', '_22.png'),
                                 ''.join(str(k) for k in labelid)])

    data_csv.close()

On top of that, since very few positive images were found in the end (only about 320, if I remember right), I augmented the data: first rotate the original through a full cycle, then flip it left-right and rotate through a cycle again, which expands the set to 8x its size. After that comes the train-val-test split; generating random numbers is enough to split however you like, and there are dedicated libraries too, so I won't post the splitting code.
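A minimal sketch of that 8x expansion (a hypothetical helper, assuming the 16-char label is row-major; note the label has to be permuted together with the pixels, since rotating or flipping the image moves grid cells around):

import numpy as np
from PIL import Image

def augment_8x(img, labelid):
    """Yield the 8 (image, label) variants: 4 rotations of the original
    plus 4 rotations of the left-right mirrored image; the 4x4 grid label
    gets the matching numpy transform so it stays aligned."""
    grid = np.array(list(labelid), dtype=int).reshape(4, 4)
    for flip in (False, True):
        im = img.transpose(Image.FLIP_LEFT_RIGHT) if flip else img
        g = np.fliplr(grid) if flip else grid
        for k in range(4):  # 0/90/180/270 degrees, counterclockwise in both PIL and numpy
            yield im.rotate(90 * k), ''.join(map(str, np.rot90(g, k).flatten()))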

For augmentation I also considered converting RGB to HSV or YCrCb, since the point would be to make the main foreground and its features stand out, but after trying one sample the result wasn't great. My data was simply too poor to work with, so I dropped the idea.
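For the record, the conversion itself is a one-liner in OpenCV (a sketch; remember that cv2.imread returns BGR):

import cv2

img = cv2.imread('0_grid.png')                      # BGR, by OpenCV convention
img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
img_ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)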

With the data ready, it needs to be loaded. I use PyTorch, so subclassing the Dataset class is enough:

import os

import pandas as pd
from torch.utils.data import Dataset
from torchvision.datasets.folder import default_loader

class SSDataset(Dataset):

    def __init__(self, imgs_path, csv_path,
                 img_transform=None, loader=default_loader):
        with open(csv_path, 'r') as f:
            # read the labels as strings here, otherwise leading zeros are lost
            # (similar to reading OCR labels)
            data_info = pd.read_csv(f, dtype=str)
        # first column is the image name
        self.img_list = list(data_info.iloc[:, 0])
        # second column is the labelid
        self.label_list = list(data_info.iloc[:, 1])
        self.img_transform = img_transform
        # the loader uses PIL.Image.open(), not cv2.imread();
        # PyTorch expects the PIL format by default
        self.loader = loader
        self.imgs_path = imgs_path

    def __getitem__(self, index):
        img_path = os.path.join(self.imgs_path, self.img_list[index])
        img = self.loader(img_path)
        label = self.label_list[index]
        if self.img_transform is not None:
            img = self.img_transform(img)
        return img, label

    def __len__(self):
        return len(self.label_list)

But labelid is defined as a str, so it still has to be converted to a tensor to compute the loss:

import torch

def labelid_switch(labels_str):
    """Convert a batch of 16-char label strings into a [B, 16] float tensor."""
    b_s = len(labels_str)
    pad_label = []
    for i in range(0, b_s):
        temp_label = [0] * 16
        temp_label[:16] = labels_str[i]        # string -> list of 16 characters
        temp_label = list(map(int, temp_label))
        pad_label.append(temp_label)
    pad_label = torch.Tensor(pad_label)
    labels_float = pad_label.view(b_s, 16)
    return labels_float
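Tying these pieces together, one training step might look like the sketch below: the DataLoader batches SSDataset's string labels into a tuple, labelid_switch turns them into a [B, 16] float tensor, and BCELoss matches the sigmoid output (net, optimizer, train_loader and device are assumed to be set up elsewhere):

import torch.nn as nn

criterion = nn.BCELoss()
for imgs, labels_str in train_loader:                # labels_str: tuple of 16-char strings
    imgs = imgs.to(device)
    labels = labelid_switch(labels_str).to(device)   # [B, 16] float tensor
    preds = net(imgs)                                # [B, 16], already through sigmoid
    loss = criterion(preds, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()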

train

The models used for training are mainly ResNet and VGG; the code can be taken straight from torchvision, then just modify the final layers.

For the loss function I tried binary cross entropy and focal loss (positive grids are, after all, rarer than negative ones overall). I also tried mixup: randomly blend pairs of images within a batch, and when computing the loss, compute it against both labels and add the two terms weighted by the mixing ratio. This is another way to combat overfitting and reduce model complexity. (There is a similar method called sample pairing that mixes only the images and ignores the labels; I tried it too, but in practice it didn't seem as effective as mixup.)

import numpy as np
import torch
import torch.nn as nn

class BFocalLoss(nn.Module):

    def __init__(self, gamma=1, alpha=0.8):
        super(BFocalLoss, self).__init__()
        self.gamma = gamma
        self.alpha = alpha

    def forward(self, inputs, targets):
        # inputs are sigmoid probabilities; the 1e-12 guards against log(0)
        p = inputs
        loss = -self.alpha * (1 - p)**self.gamma * (targets * torch.log(p + 1e-12)) - \
               (1 - self.alpha) * p**self.gamma * ((1 - targets) * torch.log(1 - p + 1e-12))
        loss = torch.sum(loss)
        return loss


def mixup_data(in_img, in_label, alpha=1.0):
    # the paper reports alpha in [0.1, 0.4] gives better gains on ImageNet;
    # for CIFAR-10 it uses 1.0
    if alpha > 0:
        lam = np.random.beta(alpha, alpha)
    else:
        lam = 1

    Batch_Size = in_img.size()[0]
    Index = torch.randperm(Batch_Size)
    mixed_x = lam * in_img + (1 - lam) * in_img[Index, :]
    y_a, y_b = in_label, in_label[Index]
    return mixed_x, y_a, y_b, lam

# computing the loss:
loss_mixup = lam * criterion(pred, labels_a) + \
             (1 - lam) * criterion(pred, labels_b)
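In context, one mixup training step would then look roughly like this (a sketch reusing the names from above; criterion is BFocalLoss or BCELoss):

# sketch of a single training step with mixup
inputs = imgs.to(device)
labels = labelid_switch(labels_str).to(device)
mixed_x, labels_a, labels_b, lam = mixup_data(inputs, labels, alpha=1.0)
pred = net(mixed_x)
loss = lam * criterion(pred, labels_a) + (1 - lam) * criterion(pred, labels_b)
optimizer.zero_grad()
loss.backward()
optimizer.step()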

What's left is hyperparameter tuning, comparison experiments, and watching the loss trends in TensorBoard. (One phenomenon here: early on there was a stretch where the loss refused to drop and just oscillated within a range; my guess is the data augmentation caused it.)

test

This part just loads the model, tests image by image, writes out the predicted csv, and reports the grid acc.

# partial code (transformations, net, device and the *_CSV / *_PATH
# constants are defined earlier in the script):
import csv
import os

import pandas as pd
import torch
from PIL import Image
from torch.autograd import Variable

file_pre = open(PRE_TEST_CSV, 'w', newline='')
pre_writer = csv.writer(file_pre)
pre_writer.writerow(["ImageName", "LabelId"])

with open(SRC_TEST_CSV, 'r') as f_test:
    test_data = pd.read_csv(f_test, dtype=str)
img_name = list(test_data.iloc[:, 0])
labelid = list(test_data.iloc[:, 1])
test_data_len = len(test_data.index)

num_right = 0
positive_num = 0
positive_num_right = 0
for i in range(0, test_data_len):
    img_path = os.path.join(TEST_DATA_PATH, img_name[i])
    img = Image.open(img_path)
    img_tensor = transformations(img).float()
    img_tensor = img_tensor.unsqueeze_(0)

    temp_label = [0] * 16
    temp_label[:16] = labelid[i]
    temp_label = list(map(int, temp_label))
    for temp in temp_label:
        if temp > 0:
            positive_num += 1
    label = torch.FloatTensor(temp_label)
    label = label.view(1, 16)

    input = Variable(img_tensor)
    input = input.to(device)
    pred = net(input).data.cpu()  # compare on the CPU
    output = pred
    pred_len = pred.size()[1]
    out = []
    # threshold each grid at 0.5, then check it against the ground truth
    for j in range(0, pred_len):
        if pred[0][j] < 0.5:
            output[0][j] = 0
            out.append(0)
            if output[0][j] == label[0][j]:
                num_right += 1
        else:
            output[0][j] = 1
            out.append(1)
            if output[0][j] == label[0][j]:
                num_right += 1
                positive_num_right += 1
    pre_writer.writerow([img_name[i], ''.join(str(k) for k in out)])

file_pre.close()
print('test acc is: ', num_right / (test_data_len * 16))
print('positive acc is: ', positive_num_right, '/', positive_num)

summary

Honestly, getting all this working took a lot of tuning, especially for someone like me who had just started with deep learning and lacked experience; I went down plenty of dead ends. But the dataset was simply too poor, and I couldn't think of any more tricks, so I ground through it for nearly two months. (For most of the first month I actually cut each grid out as a separate image and threw them all in for training. That not only created an extreme positive/negative imbalance but also destroyed the continuity of the image, so the terrible results were no surprise: the model barely trained at all, and even adding focal loss didn't help.) In the end the best I got was 90% acc.

weakly supervised semantic segmentation

At least by late August I had found a public sputum smear dataset, complete with box annotations:

After discussing it with our CTO, he felt the dataset quality was good and suggested doing weakly supervised segmentation instead. After all, object detection has been done to death by now, and the guy who open-sourced this dataset had already pushed the object detection acc quite high himself, so there was no point importing a library to repeat the same thing. I didn't really have a plan at the time, but it sounded interesting, so I took it on.

Some research then showed that weakly supervised segmentation had long been done on natural images (Kaiming He and company again...), with decent results. The only pity is that essentially nobody had open-sourced a complete implementation, but by referring to some related code on GitHub I gradually built up the whole framework.

The overall approach mainly follows two papers: Jifeng Dai's BoxSup and the Max Planck Institute's Simple Does It. The main idea is to define a few methods for generating segment proposals from bounding box annotations (chiefly GrabCut from OpenCV), use those as labels for supervised training, and finally run denseCRF as a post-process to make the boundaries silky-smooth. You can also try recursive training: let a well-performing model predict new labels for the training set, then run another round of training on those.

Since the code is fairly large and hard to show in pieces, the complete code is on my GitHub.

pre-processing

The original dataset has 1217 positive images; on top of that, 47 of the annotations inexplicably contain extra 20x20 boxes (probably a slip of the hand during labeling), so those had to be removed one by one first.

See? There are extra little boxes in both the bottom-left and top-right corners.

Next, rough masks are generated for these images. I provide three methods here (the first two are sketched just after the list):

  • Box_segments: treat every pixel inside the box as bacillus (the box coordinates must be converted to int so they line up with pixels)
  • Sbox_segments: take the inner 80% rectangle of the box and treat the pixels inside it as bacillus (again, coordinates are int)
  • GrabCut_segments: use the classic computer-vision method GrabCut to obtain the bacillus segmentation. GrabCut is generally friendlier to a single large object in an image, whereas bacilli are thin and long and also contain stained chromatin, so a GrabCut segmentation driven by color distribution either comes out somewhat too large or comes out empty. The too-large ones I leave alone; the empty ones I simply replace with Box_segments here.
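The first two are simple enough to sketch in a few lines (a hypothetical helper, where shrink=0.1 trims 10% off each side, i.e. keeps the central rectangle scaled to 80% per dimension, which is how I read the 80% above):

import numpy as np

def box_mask(img_shape, xmin, ymin, xmax, ymax, shrink=0.0):
    """Box_segments (shrink=0) and Sbox_segments (shrink=0.1);
    the coordinates are assumed to already be ints."""
    mask = np.zeros(img_shape[:2], np.uint8)
    dx = int((xmax - xmin) * shrink)
    dy = int((ymax - ymin) * shrink)
    mask[ymin + dy:ymax - dy, xmin + dx:xmax - dx] = 1
    return mask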

Part of the GrabCut code:

import cv2 as cv
import numpy as np
import scipy.misc  # needs scipy <= 0.19.0 for toimage(), see the note below

import tbvoc_info

# ANNS, img_dir, grabcut_dir, segmentation_label_dir, MINI_AREA, RECT_SHRINK,
# ITER_NUM, IOU_THRESHOLD and get_int_coor() are defined elsewhere in the repo.

def grabcut(img_name):
    masks = []
    # one image may have several objects that each need a grabcut
    for i, ann_info in enumerate(ANNS[img_name], start=1):
        img = cv.imread((img_dir + img_name).rstrip() + '.jpg')
        grab_name = ann_info[1]
        xmin = ann_info[3]
        ymin = ann_info[2]
        xmax = ann_info[5]
        ymax = ann_info[4]
        """get int box coordinates"""
        img_w = img.shape[1]
        img_h = img.shape[0]
        xmin, ymin, xmax, ymax = get_int_coor(xmin, ymin, xmax, ymax, img_w, img_h)
        box_w = xmax - xmin
        box_h = ymax - ymin
        # cv.grabCut's parameters
        mask = np.zeros(img.shape[:2], np.uint8)
        # rect is a tuple
        rect = (xmin, ymin, box_w, box_h)
        bgdModel = np.zeros((1, 65), np.float64)
        fgdModel = np.zeros((1, 65), np.float64)
        # for a small bbox, just take the whole box as foreground:
        if box_w * box_h < MINI_AREA:
            mask[ymin:ymax, xmin:xmax] = 1
            img_mask = mask
        # for a big box whose area == img.area (the bbox is just the whole image):
        elif box_w * box_h == img.shape[1] * img.shape[0]:
            rect = [RECT_SHRINK, RECT_SHRINK, box_w - RECT_SHRINK * 2, box_h - RECT_SHRINK * 2]
            cv.grabCut(img, mask, rect, bgdModel, fgdModel, ITER_NUM, cv.GC_INIT_WITH_RECT)
            # astype('uint8') keeps the pixel values in range [0, 255]
            img_mask = np.where((mask == 0) | (mask == 2), 0, 1).astype('uint8')
        # for a normal bbox:
        else:
            cv.grabCut(img, mask, rect, bgdModel, fgdModel, ITER_NUM, cv.GC_INIT_WITH_RECT)
            img_mask = np.where((mask == 0) | (mask == 2), 0, 1).astype('uint8')
        # if the grabcut output is pure background (it happens in my dataset)
        if np.sum(img_mask) == 0:
            img_mask = np.where((mask == 0), 0, 1).astype('uint8')
        # compute the IOU against the plain box mask;
        # if grabcut output too small a region, reset it to the bbox mask
        box_mask = np.zeros((img.shape[0], img.shape[1]))
        box_mask[ymin:ymax, xmin:xmax] = 1
        sum_area = box_mask + img_mask
        intersection = np.where((sum_area == 2), 1, 0).astype('uint8')
        union = np.where((sum_area == 0), 0, 1).astype('uint8')
        IOU = np.sum(intersection) / np.sum(union)
        if IOU <= IOU_THRESHOLD:
            img_mask = box_mask
        # for drawing the mask on the image later
        img = cv.cvtColor(img, cv.COLOR_BGR2RGB)
        masks.append([img_mask, grab_name, rect])

    num_object = i
    """resolve multi-object intersections and fix the labels"""
    masks.sort(key=lambda mask: np.sum(mask[0]), reverse=True)
    for j in range(num_object):
        for k in range(j + 1, num_object):
            masks[j][0] = masks[j][0] - masks[k][0]
        masks[j][0] = np.where((masks[j][0] == 1), 1, 0).astype('uint8')
        """get the class name id"""
        grab_name = masks[j][1]
        class_id = grab_name.split('_')[-1]
        class_id = int(class_id.split('.')[0])

        # set the mask values to class_id
        masks[j][0] = np.where((masks[j][0] == 1), class_id, 0).astype('uint8')
        # save grabcut_inst (one object of one image)
        scipy.misc.toimage(masks[j][0], cmin=0, cmax=255, pal=tbvoc_info.colors_map,
                           mode='P').save((grabcut_dir).rstrip() + masks[j][1])

    """merge the masks"""
    # build an array of the image's size
    mask_ = np.zeros(img.shape[:2])
    for mask in masks:
        mask_ = mask_ + mask[0]
    # save segmentation_label (every object of one image)
    scipy.misc.toimage(mask_, cmin=0, cmax=255, pal=tbvoc_info.colors_map,
                       mode='P').save((segmentation_label_dir + img_name).rstrip() + '.png')

I save the masks with scipy here; the version I use is 0.19.0, and toimage() no longer exists in later scipy versions. Supposedly PIL has a function that can replace it, but the two don't seem to behave identically, so I didn't bother with it.
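For what it's worth, a rough PIL equivalent could look like the sketch below (untested against toimage's exact cmin/cmax scaling, which is precisely the behavioral difference I was wary of; it also assumes the color map can be flattened into PIL's [r0, g0, b0, r1, g1, b1, ...] palette format):

from PIL import Image

def save_palette_mask(mask, palette, path):
    """Save a uint8 class-id array as a palettized ('P' mode) PNG."""
    im = Image.fromarray(mask.astype('uint8'), mode='P')
    im.putpalette(palette)
    im.save(path)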

Original image with box annotations

Segmentation label generated by GrabCut

The data-loading step resizes the images. The original size is 1632x1224, and 1224 is not divisible by 32, which causes feature-map dimension mismatches across the five downsampling and upsampling stages, so everything is resized to 1632x1216. Note that the original images are resized with bilinear interpolation while the masks are resized with nearest neighbor (I only realized this during training, after the masks were already generated; resizing the whole dataset up front would probably make the mask error smaller). Both PIL and cv2 have functions for this.

Data loading code:

import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

# img_dir and label_dir are module-level paths defined elsewhere

class TBDataset(Dataset):
    def __init__(self, txt_dir, width, height, transform=None):
        self.img_names = []
        with open(txt_dir, 'r') as f_txt:
            for img_name in f_txt:
                self.img_names.append(img_name)

        self.transform = transform
        self.txt_dir = txt_dir
        self.width = width
        self.height = height

    def __getitem__(self, index):
        img_name = self.img_names[index]
        img = Image.open(os.path.join(img_dir, img_name).rstrip() + '.jpg')
        # smooth resampling for the image (LANCZOS here; bilinear works too)
        img = img.resize((self.width, self.height), Image.LANCZOS)
        img = np.array(img)
        label = Image.open(os.path.join(label_dir, img_name).rstrip() + '.png')
        # nearest neighbor keeps the class ids intact
        # (they are not necessarily consecutive and were set by hand)
        label = label.resize((self.width, self.height), Image.NEAREST)
        label = np.array(label)
        if self.transform is not None:
            img = self.transform(img)
        # img = torch.FloatTensor(img)
        label = torch.FloatTensor(label)
        return img, label

    def __len__(self):
        return len(self.img_names)

train

The training models are FCN and UNet, given that this is only binary classification; DeepLab, UNet++, etc., could also be considered later. The FCN uses a VGG-16 backbone and downsamples 5 times, while the UNet downsamples 4 times, both following their papers without modification. The model's final output is a 1632x1216 feature map, which goes straight through a sigmoid activation and is then compared against the 1632x1216 mask (read in as a 2D 0-1 matrix giving each pixel's label) to compute the loss, followed by backprop and parameter updates. For the loss I again used cross entropy and focal loss.
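Schematically, one training step then looks like this (a sketch; net, train_loader, optimizer and device are assumed to exist, and the sigmoid is written outside the model here):

import torch
import torch.nn as nn

criterion = nn.BCELoss()                              # or the BFocalLoss defined earlier
for imgs, labels in train_loader:                     # labels: [B, 1216, 1632] 0-1 masks
    imgs, labels = imgs.to(device), labels.to(device)
    logits = net(imgs)                                # [B, 1, 1216, 1632]
    probs = torch.sigmoid(logits).view(labels.shape)  # per-pixel probabilities
    loss = criterion(probs, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()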

post-processing

The model's predictions are run through denseCRF, which refines the segmentation and also removes some false positives.

Part of the code:

import os

import cv2 as cv
import numpy as np
import pydensecrf.densecrf as dcrf

def run_densecrf(img_dir, img_name, masks_pro):
    height = masks_pro.shape[0]
    width = masks_pro.shape[1]

    # must use cv2.imread() here;
    # with PIL.Image.open() the algorithm breaks
    # TODO -- need to fix the image problem
    img = cv.imread(os.path.join(img_dir, img_name).rstrip() + '.jpg')
    img = cv.resize(img, (1632, 1216), interpolation=cv.INTER_LINEAR)

    # expand to [1, H, W]
    masks_pro = np.expand_dims(masks_pro, 0)
    # masks_pro = masks_pro[:, :, np.newaxis]
    # stack into shape (2, H, W):
    # one channel holds the probability of class 0, the other of class 1
    masks_pro = np.append(1 - masks_pro, masks_pro, axis=0)
    # [classes, H, W]; the unary U has to be flat
    U = masks_pro.reshape(2, -1)
    # deep copy in C order (row-major)
    U = U.copy(order='C')
    # for binary classification the post-sigmoid values can be tiny;
    # clamp them so -log() stays finite
    U = np.where((U < 1e-12), 1e-12, U)
    d = dcrf.DenseCRF2D(width, height, 2)

    # make sure the arrays are C-contiguous, which speeds up processing
    # reference: https://zhuanlan.zhihu.com/p/59767914
    U = np.ascontiguousarray(U)
    img = np.ascontiguousarray(img)

    # pydensecrf expects a float32 unary
    d.setUnaryEnergy(-np.log(U).astype(np.float32))
    d.addPairwiseGaussian(sxy=3, compat=3)
    d.addPairwiseBilateral(sxy=80, srgb=13, rgbim=img, compat=10)
    Q = d.inference(5)
    # argmax over the two class rows decides which class (0 or 1)
    # each pixel belongs to
    map = np.argmax(Q, axis=0).reshape((height, width))
    proba = np.array(map)

    return proba

What's mainly used here is the pairwise potential, which relates each pixel to every other pixel; see the original code and paper for the details.
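For reference, the energy that the fully connected CRF minimizes (Krähenbühl & Koltun's formulation, which pydensecrf implements) is

$$E(\mathbf{x})=\sum_i \psi_u(x_i)+\sum_{i<j}\psi_p(x_i,x_j),\qquad \psi_p(x_i,x_j)=\mu(x_i,x_j)\left[w^{(1)}\exp\!\left(-\frac{\|p_i-p_j\|^2}{2\theta_\alpha^2}-\frac{\|I_i-I_j\|^2}{2\theta_\beta^2}\right)+w^{(2)}\exp\!\left(-\frac{\|p_i-p_j\|^2}{2\theta_\gamma^2}\right)\right]$$

where $p$ are pixel positions and $I$ pixel colors. In the code above, addPairwiseBilateral's sxy, srgb and compat set $\theta_\alpha$, $\theta_\beta$ and $w^{(1)}$ (the appearance kernel), while addPairwiseGaussian's sxy and compat set $\theta_\gamma$ and $w^{(2)}$ (the smoothness kernel).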

On top of this I also did some iterative training. For my dataset, one round of training on GrabCut-generated labels was actually already pretty good, but to see whether updating the labels and training another round would improve things, I set the trained model to eval mode after a fixed number of epochs, predicted on the train set, then switched back to train mode and continued training. One thing to watch out for: the updated labels can contain both missed detections and false positives. I simply add the predicted mask to the Box_segments mask and keep only the pixels equal to 2, which removes the false positives; the missed regions are then patched back in with their boxes.

Judging from the experiments, about 3 label updates (one every 10 epochs) is enough for my setup; more brings no further gains. Overall this trick improves segmentation on images that contain several bacilli at once, but the improvement is nothing to write home about. Maybe my update scheme is off?

import os

import numpy as np
import scipy.misc
from PIL import Image
from torchvision import transforms

# dataset_pairs_dir, dataset_txt_dir, img_dir, label_dir, colors_map,
# get_int_coor() and run_densecrf() are defined elsewhere in the repo.

def update_label(predict_model, device):

    """load train_pairs.txt info to check for missed-diagnosis objects"""
    # ann_info: [image name, image name_num_class_id.png, bbox_ymin,
    #            bbox_xmin, bbox_ymax, bbox_xmax, class_name]
    print('start to update...')
    ANNS = {}
    with open(dataset_pairs_dir, 'r') as da_p_txt:
        for ann_info in da_p_txt:
            # split the line into a list
            ann_info = ann_info.rstrip().split('###')
            if ann_info[0].rstrip() not in ANNS:
                ANNS[ann_info[0].rstrip()] = []
            ANNS[ann_info[0].rstrip()].append(ann_info)

    predict_model.eval()

    # define the same image transformations as in training
    transformations = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])

    update_num = 0
    print('updating progress:')
    with open(dataset_txt_dir, 'r') as da_txt:
        # don't use the line below: readlines() exhausts the file iterator,
        # so the for loop afterwards gets nothing and the update silently
        # does nothing (this cost me two hours of debugging......)
        # lines = len(da_txt.readlines())
        for update_name in da_txt:
            update_num += 1
            # RGB, [W, H, depth]
            img = Image.open(os.path.join(img_dir, update_name).rstrip() + '.jpg')
            img_w = img.size[0]
            img_h = img.size[1]
            img = img.resize((1632, 1216), Image.LANCZOS)
            input_ = transformations(img).float()
            # add the batch dimension: [3, H, W] --> [1, 3, H, W]
            input_ = input_.unsqueeze_(0)
            input_ = input_.to(device)
            pred = predict_model(input_).view([1216, 1632]).data.cpu()
            # pred shape: [H, W]
            pred = np.array(pred)
            """crf-smooth the prediction"""
            crf_pred = run_densecrf(img_dir, update_name, pred)

            """start the update"""
            last_label = Image.open(os.path.join(label_dir, update_name).rstrip() + '.png')
            last_label = last_label.resize((1632, 1216), Image.NEAREST)
            last_label = np.array(last_label)

            # predicted label without the false-positive segments:
            # keep only pixels set in both the CRF prediction and the old label
            updated_label = crf_pred + last_label
            updated_label = np.where((updated_label == 2), 1, 0).astype('uint8')
            # patch missed diagnoses back in;
            # for now the box segments stand in for the missed objects
            info4check = ANNS[update_name.rstrip()]
            masks_missed = np.zeros((1216, 1632), np.uint8)
            for box4check in info4check:
                xmin = box4check[3]
                ymin = box4check[2]
                xmax = box4check[5]
                ymax = box4check[4]
                xmin, ymin, xmax, ymax = get_int_coor(xmin, ymin,
                                                      xmax, ymax, img_w, img_h)
                xmin = int(xmin * 1632 / img_w)
                xmax = int(xmax * 1632 / img_w)
                ymin = int(ymin * 1216 / img_h)
                ymax = int(ymax * 1216 / img_h)
                if np.sum(updated_label[ymin:ymax, xmin:xmax]) == 0:
                    masks_missed[ymin:ymax, xmin:xmax] = 1

            updated_label = updated_label + masks_missed
            scipy.misc.toimage(updated_label, cmin=0, cmax=255, pal=colors_map,
                               mode='P').save(os.path.join(label_dir,
                                                           update_name).rstrip() + '.png')
            print('{} / {}'.format(update_num, len(ANNS)), end='\r')

metric

Segmentation papers generally compare IoU, but this dataset has no segmentation ground truth, so I defined my own detection acc: if the predicted mask overlaps a box (np.sum over the box region != 0), that counts as one detection, and then I compute the average acc. This metric, together with the predictions on the test set, gives a rough measure of which method combinations work well. The overall conclusion is that GrabCut+FCN+FL ($\alpha=0.75$, $\gamma=1$) works best, though I didn't increase the UNet's depth or channel count; if I had, I suspect UNet might come out on top.
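A minimal sketch of this metric (a hypothetical helper; the boxes are int-coordinate (xmin, ymin, xmax, ymax) tuples already scaled to the mask's resolution):

import numpy as np

def detection_acc(pred_mask, boxes):
    """A box counts as detected if the predicted mask has any foreground
    pixel inside it; return the detected fraction."""
    hit = sum(1 for (xmin, ymin, xmax, ymax) in boxes
              if np.sum(pred_mask[ymin:ymax, xmin:xmax]) != 0)
    return hit / len(boxes)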

Space is limited, so here are a few of the better predictions:

GrabCut+FCN+FL

GrabCut+FCN+FL

GrabCut+UNet+FL; UNet's results seem a bit rounder

GrabCut+FCN+FL with 3 rounds of label updates; the result is... well, just so-so

summary

All in all, the final weakly supervised segmentation taught me a lot. My engineering skills got real exercise, and my code organization and writing improved as well. The results were written up as a paper and submitted to the ISBI conference; if it gets in, that will feel pretty good ^-^