Commit ab7d699 · 1 parent: fdaae10
Create DIS-SAM space
This view is limited to 50 files because it contains too many changes.
- .gitattributes +2 -0
- IS_Net/DIS5K/DIS5K-test/enhance_gt/1#Accessories#1#Bag#2339506821_83cf9f1d22_o_comp_1.png +0 -0
- IS_Net/DIS5K/DIS5K-test/enhance_gt/1#Accessories#1#Bag#3292738108_c51336a8be_o_comp_1.png +0 -0
- IS_Net/DIS5K/DIS5K-test/enhance_gt/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_0.png +0 -0
- IS_Net/DIS5K/DIS5K-test/enhance_gt/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_1.png +0 -0
- IS_Net/DIS5K/DIS5K-test/enhance_gt/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_2.png +0 -0
- IS_Net/DIS5K/DIS5K-test/enhance_im/1#Accessories#1#Bag#2339506821_83cf9f1d22_o_comp_1.jpg +3 -0
- IS_Net/DIS5K/DIS5K-test/enhance_im/1#Accessories#1#Bag#3292738108_c51336a8be_o_comp_1.jpg +3 -0
- IS_Net/DIS5K/DIS5K-test/enhance_im/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_0.jpg +3 -0
- IS_Net/DIS5K/DIS5K-test/enhance_im/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_1.jpg +3 -0
- IS_Net/DIS5K/DIS5K-test/enhance_im/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_2.jpg +3 -0
- IS_Net/DIS5K/DIS5K-test/enhance_sam/1#Accessories#1#Bag#2339506821_83cf9f1d22_o_comp_1.png +0 -0
- IS_Net/DIS5K/DIS5K-test/enhance_sam/1#Accessories#1#Bag#3292738108_c51336a8be_o_comp_1.png +0 -0
- IS_Net/DIS5K/DIS5K-test/enhance_sam/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_0.png +0 -0
- IS_Net/DIS5K/DIS5K-test/enhance_sam/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_1.png +0 -0
- IS_Net/DIS5K/DIS5K-test/enhance_sam/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_2.png +0 -0
- IS_Net/DIS5K/DIS5K-test/gt/1#Accessories#1#Bag#2339506821_83cf9f1d22_o.png +0 -0
- IS_Net/DIS5K/DIS5K-test/gt/1#Accessories#1#Bag#3292738108_c51336a8be_o.png +0 -0
- IS_Net/DIS5K/DIS5K-test/gt/4#Architecture#10#Pavilion#5795028920_08884db993_o.png +0 -0
- IS_Net/DIS5K/DIS5K-test/im/1#Accessories#1#Bag#2339506821_83cf9f1d22_o.jpg +3 -0
- IS_Net/DIS5K/DIS5K-test/im/1#Accessories#1#Bag#3292738108_c51336a8be_o.jpg +3 -0
- IS_Net/DIS5K/DIS5K-test/im/4#Architecture#10#Pavilion#5795028920_08884db993_o.jpg +3 -0
- IS_Net/DIS5K/DIS5K-test/mask/1#Accessories#1#Bag#2339506821_83cf9f1d22_o.png +0 -0
- IS_Net/DIS5K/DIS5K-test/mask/1#Accessories#1#Bag#3292738108_c51336a8be_o.png +0 -0
- IS_Net/DIS5K/DIS5K-test/mask/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_0.png +0 -0
- IS_Net/__pycache__/data_loader.cpython-311.pyc +0 -0
- IS_Net/basics.py +125 -0
- IS_Net/data_loader.py +542 -0
- IS_Net/datalist.py +62 -0
- IS_Net/models/__pycache__/isnet.cpython-311.pyc +0 -0
- IS_Net/models/isnet.py +640 -0
- IS_Net/saliency_toolbox.py +552 -0
- IS_Net/swd_optim/__init__.py +10 -0
- IS_Net/swd_optim/adai.py +116 -0
- IS_Net/swd_optim/adais.py +120 -0
- IS_Net/swd_optim/adams.py +137 -0
- IS_Net/swd_optim/sgds.py +82 -0
- IS_Net/train_valid_inference_main.py +729 -0
- MultiScaleDeformableAttention-1.0-py3-none-any.whl +3 -0
- README.md +4 -2
- SAM/segment_anything/__init__.py +15 -0
- SAM/segment_anything/__pycache__/__init__.cpython-311.pyc +0 -0
- SAM/segment_anything/__pycache__/automatic_mask_generator.cpython-311.pyc +0 -0
- SAM/segment_anything/__pycache__/build_sam.cpython-311.pyc +0 -0
- SAM/segment_anything/__pycache__/predictor.cpython-311.pyc +0 -0
- SAM/segment_anything/automatic_mask_generator.py +372 -0
- SAM/segment_anything/build_sam.py +111 -0
- SAM/segment_anything/modeling/__init__.py +11 -0
- SAM/segment_anything/modeling/__pycache__/__init__.cpython-311.pyc +0 -0
- SAM/segment_anything/modeling/__pycache__/common.cpython-311.pyc +0 -0
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+MultiScaleDeformableAttention-1.0-py3-none-any.whl filter=lfs diff=lfs merge=lfs -text
+*.jpg filter=lfs diff=lfs merge=lfs -text
IS_Net/DIS5K/DIS5K-test/enhance_gt/1#Accessories#1#Bag#2339506821_83cf9f1d22_o_comp_1.png
ADDED
IS_Net/DIS5K/DIS5K-test/enhance_gt/1#Accessories#1#Bag#3292738108_c51336a8be_o_comp_1.png
ADDED
IS_Net/DIS5K/DIS5K-test/enhance_gt/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_0.png
ADDED
IS_Net/DIS5K/DIS5K-test/enhance_gt/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_1.png
ADDED
IS_Net/DIS5K/DIS5K-test/enhance_gt/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_2.png
ADDED
IS_Net/DIS5K/DIS5K-test/enhance_im/1#Accessories#1#Bag#2339506821_83cf9f1d22_o_comp_1.jpg
ADDED (Git LFS)
IS_Net/DIS5K/DIS5K-test/enhance_im/1#Accessories#1#Bag#3292738108_c51336a8be_o_comp_1.jpg
ADDED (Git LFS)
IS_Net/DIS5K/DIS5K-test/enhance_im/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_0.jpg
ADDED (Git LFS)
IS_Net/DIS5K/DIS5K-test/enhance_im/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_1.jpg
ADDED (Git LFS)
IS_Net/DIS5K/DIS5K-test/enhance_im/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_2.jpg
ADDED (Git LFS)
IS_Net/DIS5K/DIS5K-test/enhance_sam/1#Accessories#1#Bag#2339506821_83cf9f1d22_o_comp_1.png
ADDED
IS_Net/DIS5K/DIS5K-test/enhance_sam/1#Accessories#1#Bag#3292738108_c51336a8be_o_comp_1.png
ADDED
IS_Net/DIS5K/DIS5K-test/enhance_sam/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_0.png
ADDED
IS_Net/DIS5K/DIS5K-test/enhance_sam/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_1.png
ADDED
IS_Net/DIS5K/DIS5K-test/enhance_sam/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_2.png
ADDED
IS_Net/DIS5K/DIS5K-test/gt/1#Accessories#1#Bag#2339506821_83cf9f1d22_o.png
ADDED
IS_Net/DIS5K/DIS5K-test/gt/1#Accessories#1#Bag#3292738108_c51336a8be_o.png
ADDED
IS_Net/DIS5K/DIS5K-test/gt/4#Architecture#10#Pavilion#5795028920_08884db993_o.png
ADDED
IS_Net/DIS5K/DIS5K-test/im/1#Accessories#1#Bag#2339506821_83cf9f1d22_o.jpg
ADDED (Git LFS)
IS_Net/DIS5K/DIS5K-test/im/1#Accessories#1#Bag#3292738108_c51336a8be_o.jpg
ADDED (Git LFS)
IS_Net/DIS5K/DIS5K-test/im/4#Architecture#10#Pavilion#5795028920_08884db993_o.jpg
ADDED (Git LFS)
IS_Net/DIS5K/DIS5K-test/mask/1#Accessories#1#Bag#2339506821_83cf9f1d22_o.png
ADDED
IS_Net/DIS5K/DIS5K-test/mask/1#Accessories#1#Bag#3292738108_c51336a8be_o.png
ADDED
IS_Net/DIS5K/DIS5K-test/mask/4#Architecture#10#Pavilion#5795028920_08884db993_o_comp_0.png
ADDED
IS_Net/__pycache__/data_loader.cpython-311.pyc
ADDED
Binary file (34.3 kB)
IS_Net/basics.py
ADDED
@@ -0,0 +1,125 @@
import os
# os.environ['CUDA_VISIBLE_DEVICES'] = '2'
from skimage import io, transform
import torch
import torchvision
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils
import torch.optim as optim
from skimage.metrics import structural_similarity as ssim
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import glob
import cv2
from scipy.stats import pearsonr


def mae_torch(pred, gt):
    h, w = gt.shape[0:2]
    sumError = torch.sum(torch.absolute(torch.sub(pred.float(), gt.float())))
    maeError = torch.divide(sumError, float(h) * float(w) * 255.0 + 1e-4)
    return maeError


def maximal_f_measure_torch(pd, gt):
    gtNum = torch.sum((gt > 128).float() * 1)  # number of ground-truth pixels with value > 128

    # split the predictions into positives and negatives
    pp = pd[gt > 128]
    nn = pd[gt <= 128]

    # histograms of the positive and negative predictions
    pp_hist = torch.histc(pp, bins=255, min=0, max=255)
    nn_hist = torch.histc(nn, bins=255, min=0, max=255)

    # flip the histograms and take cumulative sums
    pp_hist_flip = torch.flipud(pp_hist)
    nn_hist_flip = torch.flipud(nn_hist)

    pp_hist_flip_cum = torch.cumsum(pp_hist_flip, dim=0)
    nn_hist_flip_cum = torch.cumsum(nn_hist_flip, dim=0)

    # precision, recall, and F-measure per threshold
    precision = pp_hist_flip_cum / (pp_hist_flip_cum + nn_hist_flip_cum + 1e-4)
    recall = pp_hist_flip_cum / (gtNum + 1e-4)
    f_measure = (2 * precision * recall) / (precision + recall + 1e-4)

    # maximal F-measure and the threshold that attains it
    max_f_measure, threshold = torch.max(f_measure, dim=0)

    return max_f_measure.item(), threshold.item()


def calculate_meam(image1, image2):
    # histogram equalization
    image1_equalized = cv2.equalizeHist(image1)
    image2_equalized = cv2.equalizeHist(image2)

    # Pearson correlation coefficient between the equalized images
    correlation_coefficient, _ = pearsonr(image1_equalized.flatten(), image2_equalized.flatten())

    # MEAM value
    meam_value = correlation_coefficient * np.mean(np.minimum(image1_equalized, image2_equalized))

    return meam_value


def f1score_torch(pd, gt):
    gtNum = torch.sum((gt > 128).float() * 1)  ## number of ground truth pixels

    pp = pd[gt > 128]
    nn = pd[gt <= 128]

    pp_hist = torch.histc(pp, bins=255, min=0, max=255)
    nn_hist = torch.histc(nn, bins=255, min=0, max=255)

    pp_hist_flip = torch.flipud(pp_hist)
    nn_hist_flip = torch.flipud(nn_hist)

    pp_hist_flip_cum = torch.cumsum(pp_hist_flip, dim=0)
    nn_hist_flip_cum = torch.cumsum(nn_hist_flip, dim=0)

    precision = pp_hist_flip_cum / (pp_hist_flip_cum + nn_hist_flip_cum + 1e-4)
    recall = pp_hist_flip_cum / (gtNum + 1e-4)
    f1 = (1 + 0.3) * precision * recall / (0.3 * precision + recall + 1e-4)  # F-beta with beta^2 = 0.3

    return torch.reshape(precision, (1, precision.shape[0])), torch.reshape(recall, (1, recall.shape[0])), torch.reshape(f1, (1, f1.shape[0]))


def f1_mae_torch(pred, gt, valid_dataset, idx, mybins, hypar):
    if len(gt.shape) > 2:
        gt = gt[:, :, 0]

    pre, rec, f1 = f1score_torch(pred, gt)
    mae = mae_torch(pred, gt)

    if hypar["valid_out_dir"] != "":
        if not os.path.exists(hypar["valid_out_dir"]):
            os.mkdir(hypar["valid_out_dir"])
        dataset_folder = os.path.join(hypar["valid_out_dir"], valid_dataset.dataset["data_name"][idx])
        if not os.path.exists(dataset_folder):
            os.mkdir(dataset_folder)
        io.imsave(os.path.join(dataset_folder, valid_dataset.dataset["im_name"][idx] + ".png"), pred.cpu().data.numpy().astype(np.uint8))

    return pre.cpu().data.numpy(), rec.cpu().data.numpy(), f1.cpu().data.numpy(), mae.cpu().data.numpy()
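A minimal usage sketch of these metrics (not part of the commit; the import path and the dummy tensors are assumptions). Predictions and ground truth are single-channel maps on the 0-255 scale, and f1score_torch returns 255-entry precision/recall/F-beta curves, one value per binarization threshold:

# Illustrative only: assumes basics.py is importable from the working directory.
import torch
from basics import mae_torch, f1score_torch, maximal_f_measure_torch

pred = torch.randint(0, 256, (320, 320)).float()    # hypothetical model output, 0-255
gt = (torch.rand(320, 320) > 0.5).float() * 255.0   # hypothetical binary ground truth

mae = mae_torch(pred, gt)                 # scalar mean absolute error (normalized by 255)
pre, rec, f1 = f1score_torch(pred, gt)    # each of shape (1, 255), one entry per threshold
max_f, thr = maximal_f_measure_torch(pred, gt)
print(mae.item(), f1.max().item(), max_f, thr)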
IS_Net/data_loader.py
ADDED
@@ -0,0 +1,542 @@
## data loader
## Acknowledgement:
## We would like to thank Dr. Ibrahim Almakky (https://scholar.google.co.uk/citations?user=T9MTcK0AAAAJ&hl=en)
## for his help in implementing the cache mechanism of our DIS dataloader.
from __future__ import print_function, division

import numpy as np
import random
from copy import deepcopy
import json
from tqdm import tqdm
from skimage import io
import os
from glob import glob
import matplotlib.pyplot as plt
from PIL import Image, ImageOps
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils
from torchvision.transforms.functional import normalize
import torch.nn.functional as F
import cv2
from scipy.ndimage import label

def show_gray_images(images, m=4):
    """
    Display a set of grayscale images.

    Args:
        images: an array of shape (n, h, w), where n is the number of images
            and h, w are the image height and width.
        m: number of images shown per row (default 4).

    Returns:
        None
    """
    n, h, w = images.shape       # number, height, and width of the input images
    num_rows = (n + m - 1) // m  # number of rows needed
    fig, axes = plt.subplots(num_rows, m, figsize=(m*2, num_rows*2))  # create the canvas and subplots
    plt.subplots_adjust(wspace=0.05, hspace=0.05)                     # tighten the spacing between subplots
    for i in range(num_rows):
        for j in range(m):
            idx = i*m + j        # index of the current image
            if idx < n:
                axes[i, j].imshow(images[idx], cmap='gray')
                axes[i, j].axis('off')
    plt.show()

#### --------------------- DIS dataloader cache ---------------------####

def segment_connected_components(mask):
    # convert the mask to a PyTorch tensor
    mask_tensor = torch.tensor(mask)

    # find connected components with SciPy's label function
    labeled_array, num_features = label(mask_tensor.numpy())

    # store each connected component's pixels in a dict
    components = {}
    for label_idx in range(1, num_features + 1):
        component_mask = (labeled_array == label_idx)
        components[label_idx] = component_mask.astype(int)

    return components

def FillHole(im_in):
    img = np.array(im_in, dtype=np.uint8)[0]
    mask = np.zeros_like(img)
    contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for contour in contours:
        cv2.drawContours(mask, [contour], -1, 255, thickness=cv2.FILLED)
    im_out = torch.from_numpy(mask)[None, ...].float()
    return im_out

def get_im_gt_name_dict(datasets, flag='valid'):
    print("------------------------------", flag, "--------------------------------")
    name_im_gt_mid_list = []
    for i in range(len(datasets)):
        print("--->>>", flag, " dataset ", i, "/", len(datasets), " ", datasets[i]["name"], "<<<---")
        tmp_im_list, tmp_gt_list, tmp_mid_list = [], [], []
        tmp_im_list = glob(datasets[i]["im_dir"]+os.sep+'*'+datasets[i]["im_ext"])

        if datasets[i]["gt_dir"] == "":
            print('-gt-', datasets[i]["name"], datasets[i]["gt_dir"], ': ', 'No Ground Truth Found')
            tmp_gt_list = []
        else:
            tmp_gt_list = [datasets[i]["gt_dir"]+os.sep+x.split(os.sep)[-1].split(datasets[i]["im_ext"])[0]+datasets[i]["gt_ext"] for x in tmp_im_list]

        if datasets[i]["mid_dir"] == "":
            print('-mid-', datasets[i]["name"], datasets[i]["mid_dir"], ': ', 'No mid Found')
            tmp_mid_list = []
        else:
            tmp_mid_list = [datasets[i]["mid_dir"]+os.sep+x.split(os.sep)[-1].split(datasets[i]["im_ext"])[0]+datasets[i]["mid_ext"] for x in tmp_im_list]

        if flag == "train":  ## combine multiple training sets into one dataset
            if len(name_im_gt_mid_list) == 0:
                name_im_gt_mid_list.append({"dataset_name": datasets[i]["name"],
                                            "im_path": tmp_im_list,
                                            "gt_path": tmp_gt_list,
                                            "mid_path": tmp_mid_list,
                                            "im_ext": datasets[i]["im_ext"],
                                            "gt_ext": datasets[i]["gt_ext"],
                                            "mid_ext": datasets[i]["mid_ext"],
                                            "cache_dir": datasets[i]["cache_dir"]})
            else:
                name_im_gt_mid_list[0]["dataset_name"] = name_im_gt_mid_list[0]["dataset_name"] + "_" + datasets[i]["name"]
                name_im_gt_mid_list[0]["im_path"] = name_im_gt_mid_list[0]["im_path"] + tmp_im_list
                name_im_gt_mid_list[0]["gt_path"] = name_im_gt_mid_list[0]["gt_path"] + tmp_gt_list
                name_im_gt_mid_list[0]["mid_path"] = name_im_gt_mid_list[0]["mid_path"] + tmp_mid_list
                if datasets[i]["im_ext"] != ".jpg" or datasets[i]["gt_ext"] != ".png":
                    print("Error: Please make sure all your images and ground truth masks are in jpg and png format respectively !!!")
                    exit()
                name_im_gt_mid_list[0]["im_ext"] = ".jpg"
                name_im_gt_mid_list[0]["gt_ext"] = ".png"
                name_im_gt_mid_list[0]["mid_ext"] = ".png"
                name_im_gt_mid_list[0]["cache_dir"] = os.sep.join(datasets[i]["cache_dir"].split(os.sep)[0:-1])+os.sep+name_im_gt_mid_list[0]["dataset_name"]
        else:  ## keep different validation or inference datasets as separate ones
            name_im_gt_mid_list.append({"dataset_name": datasets[i]["name"],
                                        "im_path": tmp_im_list,
                                        "gt_path": tmp_gt_list,
                                        "mid_path": tmp_mid_list,
                                        "im_ext": datasets[i]["im_ext"],
                                        "gt_ext": datasets[i]["gt_ext"],
                                        "mid_ext": datasets[i]["mid_ext"],
                                        "cache_dir": datasets[i]["cache_dir"]})

    return name_im_gt_mid_list

def create_dataloaders(name_im_gt_mid_list, cache_size=[], cache_boost=True, my_transforms=[], batch_size=1, shuffle=False, is_train=True):
    ## is_train=True: return one dataloader for training
    ## is_train=False: return a list of dataloaders for validation or testing

    gos_dataloaders = []
    gos_datasets = []

    if len(name_im_gt_mid_list) == 0:
        return gos_dataloaders, gos_datasets

    num_workers_ = 0

    for i in range(0, len(name_im_gt_mid_list)):
        gos_dataset = GOSDatasetCache([name_im_gt_mid_list[i]],
                                      cache_size=cache_size,
                                      cache_path=name_im_gt_mid_list[i]["cache_dir"],
                                      cache_boost=cache_boost,
                                      transform=transforms.Compose(my_transforms),
                                      is_train=is_train)
        gos_dataloaders.append(DataLoader(gos_dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers_))
        gos_datasets.append(gos_dataset)

    return gos_dataloaders, gos_datasets

def im_reader(im_path):
    image = Image.open(im_path).convert('RGB')
    corrected_image = ImageOps.exif_transpose(image)  # honor EXIF orientation
    return np.array(corrected_image)

def im_preprocess(im, size):
    if len(im.shape) > 3:
        im = im[:, :, :3]
    if len(im.shape) < 3:
        im = im[:, :, np.newaxis]
    if im.shape[2] == 1:
        im = np.repeat(im, 3, axis=2)
    im_tensor = torch.tensor(im.copy(), dtype=torch.float32)
    im_tensor = torch.transpose(torch.transpose(im_tensor, 1, 2), 0, 1)
    if len(size) < 2:
        return im_tensor, im.shape[0:2]
    else:
        im_tensor = torch.unsqueeze(im_tensor, 0)
        im_tensor = F.upsample(im_tensor, size, mode="bilinear")
        im_tensor = torch.squeeze(im_tensor, 0)

    return im_tensor.type(torch.uint8), im.shape[0:2]

def gt_preprocess(gt, size):
    if len(gt.shape) > 2:
        gt = gt[:, :, 0]

    gt_tensor = torch.unsqueeze(torch.tensor(gt, dtype=torch.uint8), 0)

    if len(size) < 2:
        return gt_tensor.type(torch.uint8), gt.shape[0:2]
    else:
        gt_tensor = torch.unsqueeze(torch.tensor(gt_tensor, dtype=torch.float32), 0)
        gt_tensor = F.upsample(gt_tensor, size, mode="bilinear")
        gt_tensor = torch.squeeze(gt_tensor, 0)

    return gt_tensor.type(torch.uint8), gt.shape[0:2]

class GOSRandomHFlip(object):
    def __init__(self, prob=0.25):
        self.prob = prob

    def __call__(self, sample):
        imidx, image, label, shape, box, mask = sample['imidx'], sample['image'], sample['label'], sample['shape'], sample['box'], sample['mask']

        # random horizontal, vertical, or combined flip
        randomnum = random.random()
        if randomnum <= self.prob:
            image = torch.flip(image, dims=[2])
            label = torch.flip(label, dims=[2])
            box = torch.flip(box, dims=[2])
            mask = torch.flip(mask, dims=[2])
        elif randomnum <= self.prob*2:
            image = torch.flip(image, dims=[1])
            label = torch.flip(label, dims=[1])
            box = torch.flip(box, dims=[1])
            mask = torch.flip(mask, dims=[1])
        elif randomnum <= self.prob*3:
            image = torch.flip(image, dims=[2])
            label = torch.flip(label, dims=[2])
            box = torch.flip(box, dims=[2])
            mask = torch.flip(mask, dims=[2])
            image = torch.flip(image, dims=[1])
            label = torch.flip(label, dims=[1])
            box = torch.flip(box, dims=[1])
            mask = torch.flip(mask, dims=[1])

        return {'imidx': imidx, 'image': image, 'label': label, 'shape': shape, 'mask': mask, 'box': box}

class GOSResize(object):
    def __init__(self, size=[320, 320]):
        self.size = size

    def __call__(self, sample):
        imidx, image, label, shape, box, mask = sample['imidx'], sample['image'], sample['label'], sample['shape'], sample['box'], sample['mask']

        image = torch.squeeze(F.upsample(torch.unsqueeze(image, 0), self.size, mode='bilinear'), dim=0)
        label = torch.squeeze(F.upsample(torch.unsqueeze(label, 0), self.size, mode='bilinear'), dim=0)

        return {'imidx': imidx, 'image': image, 'label': label, 'shape': shape, 'mask': mask, 'box': box}

class GOSRandomCrop(object):
    def __init__(self, size=[288, 288]):
        self.size = size

    def __call__(self, sample):
        imidx, image, label, shape, box, mask = sample['imidx'], sample['image'], sample['label'], sample['shape'], sample['box'], sample['mask']

        h, w = image.shape[1:]
        new_h, new_w = self.size

        top = np.random.randint(0, h - new_h)
        left = np.random.randint(0, w - new_w)

        image = image[:, top:top+new_h, left:left+new_w]
        label = label[:, top:top+new_h, left:left+new_w]

        return {'imidx': imidx, 'image': image, 'label': label, 'shape': shape, 'mask': mask, 'box': box}

class GOSNormalize(object):
    def __init__(self, mean=[0.485, 0.456, 0.406, 0], std=[0.229, 0.224, 0.225, 1.0]):
        self.mean = mean
        self.std = std

    def __call__(self, sample):
        imidx, image, label, shape, box, mask = sample['imidx'], sample['image'], sample['label'], sample['shape'], sample['box'], sample['mask']
        image = normalize(image, self.mean, self.std)
        mask = normalize(mask, 0, 1)
        box = normalize(box, 0, 1)

        return {'imidx': imidx, 'image': image, 'label': label, 'shape': shape, 'mask': mask, 'box': box}

class GOSRandomthorw(object):
    # randomly zero out the mask and/or box guidance channels
    def __init__(self, ratio=0.25):
        self.ratio = ratio

    def __call__(self, sample):
        imidx, image, label, shape, box, mask = sample['imidx'], sample['image'], sample['label'], sample['shape'], sample['box'], sample['mask']
        randomnum = random.random()
        if randomnum < self.ratio:
            mask = torch.zeros_like(mask)
        elif randomnum < self.ratio*2:
            box = torch.zeros_like(box)
        elif randomnum < self.ratio*3:
            mask = torch.zeros_like(mask)
            box = torch.zeros_like(box)

        return {'imidx': imidx, 'image': image, 'label': label, 'shape': shape, 'mask': mask, 'box': box}

class GOSDatasetCache(Dataset):

    def __init__(self, name_im_gt_mid_list, cache_size=[], cache_path='./cache', cache_file_name='dataset.json', cache_boost=False, transform=None, is_train=True):

        self.is_train = is_train
        self.cache_size = cache_size
        self.cache_path = cache_path
        self.cache_file_name = cache_file_name
        self.cache_boost_name = ""

        self.cache_boost = cache_boost

        ## cache all the images and ground truth into a single pytorch tensor
        self.ims_pt = None
        self.gts_pt = None
        self.mid_pt = None

        ## we will cache the per-sample tensors as well regardless of cache_boost
        self.cache_boost_name = cache_file_name.split('.json')[0]

        self.transform = transform

        self.dataset = {}

        ## combine different datasets into one
        dataset_names = []
        dt_name_list = []   # dataset name per image
        im_name_list = []   # image name
        im_path_list = []   # im path
        gt_path_list = []   # gt path
        mid_path_list = []
        im_ext_list = []    # im ext
        gt_ext_list = []    # gt ext
        mid_ext_list = []
        for i in range(0, len(name_im_gt_mid_list)):
            dataset_names.append(name_im_gt_mid_list[i]["dataset_name"])
            # dataset name repeated based on the number of images in this dataset
            dt_name_list.extend([name_im_gt_mid_list[i]["dataset_name"] for x in name_im_gt_mid_list[i]["im_path"]])
            im_name_list.extend([x.split(os.sep)[-1].split(name_im_gt_mid_list[i]["im_ext"])[0] for x in name_im_gt_mid_list[i]["im_path"]])
            im_path_list.extend(name_im_gt_mid_list[i]["im_path"])
            gt_path_list.extend(name_im_gt_mid_list[i]["gt_path"])
            mid_path_list.extend(name_im_gt_mid_list[i]["mid_path"])
            im_ext_list.extend([name_im_gt_mid_list[i]["im_ext"] for x in name_im_gt_mid_list[i]["im_path"]])
            gt_ext_list.extend([name_im_gt_mid_list[i]["gt_ext"] for x in name_im_gt_mid_list[i]["gt_path"]])
            mid_ext_list.extend([name_im_gt_mid_list[i]["mid_ext"] for x in name_im_gt_mid_list[i]["mid_path"]])

        self.dataset["data_name"] = dt_name_list
        self.dataset["im_name"] = im_name_list
        self.dataset["im_path"] = im_path_list
        self.dataset["ori_im_path"] = deepcopy(im_path_list)
        self.dataset["gt_path"] = gt_path_list
        self.dataset["ori_gt_path"] = deepcopy(gt_path_list)
        self.dataset["mid_path"] = mid_path_list
        self.dataset["ori_mid_path"] = deepcopy(mid_path_list)
        self.dataset["im_shp"] = []
        self.dataset["gt_shp"] = []
        self.dataset["mid_shp"] = []
        self.dataset["im_ext"] = im_ext_list
        self.dataset["gt_ext"] = gt_ext_list
        self.dataset["mid_ext"] = mid_ext_list

        self.dataset["ims_pt_dir"] = ""
        self.dataset["gts_pt_dir"] = ""
        self.dataset["mid_pt_dir"] = ""

        self.dataset = self.manage_cache(dataset_names)

    def manage_cache(self, dataset_names):
        if not os.path.exists(self.cache_path):  # create the folder for cache
            os.makedirs(self.cache_path)
        cache_folder = os.path.join(self.cache_path, "_".join(dataset_names)+"_"+"x".join([str(x) for x in self.cache_size]))
        if not os.path.exists(cache_folder):  # check if the cache files are there, if not then cache
            return self.cache(cache_folder)
        return self.load_cache(cache_folder)

    def cache(self, cache_folder):
        os.mkdir(cache_folder)
        cached_dataset = deepcopy(self.dataset)

        ims_pt_list = []
        gts_pt_list = []
        mid_pt_list = []
        for i, im_path in tqdm(enumerate(self.dataset["im_path"]), total=len(self.dataset["im_path"])):

            im_id = cached_dataset["im_name"][i]
            im = im_reader(im_path)
            im, im_shp = im_preprocess(im, self.cache_size)
            im_cache_file = os.path.join(cache_folder, self.dataset["data_name"][i]+"_"+im_id+"_im.pt")
            torch.save(im, im_cache_file)

            cached_dataset["im_path"][i] = im_cache_file
            if self.cache_boost:
                ims_pt_list.append(torch.unsqueeze(im, 0))

            gt = np.zeros(im.shape[0:2])
            if len(self.dataset["gt_path"]) != 0:
                gt = im_reader(self.dataset["gt_path"][i])
            gt, gt_shp = gt_preprocess(gt, self.cache_size)
            gt_cache_file = os.path.join(cache_folder, self.dataset["data_name"][i]+"_"+im_id+"_gt.pt")
            torch.save(gt, gt_cache_file)
            if len(self.dataset["gt_path"]) > 0:
                cached_dataset["gt_path"][i] = gt_cache_file
            else:
                cached_dataset["gt_path"].append(gt_cache_file)
            if self.cache_boost:
                gts_pt_list.append(torch.unsqueeze(gt, 0))

            mid = np.zeros(im.shape[0:2])
            if len(self.dataset["mid_path"]) != 0:
                mid = im_reader(self.dataset["mid_path"][i])
            mid, mid_shp = gt_preprocess(mid, self.cache_size)
            mid_cache_file = os.path.join(cache_folder, self.dataset["data_name"][i]+"_"+im_id+"_mid.pt")
            torch.save(mid, mid_cache_file)
            if len(self.dataset["mid_path"]) > 0:
                cached_dataset["mid_path"][i] = mid_cache_file
            else:
                cached_dataset["mid_path"].append(mid_cache_file)
            if self.cache_boost:
                mid_pt_list.append(torch.unsqueeze(mid, 0))

            cached_dataset["im_shp"].append(im_shp)
            cached_dataset["gt_shp"].append(gt_shp)
            cached_dataset["mid_shp"].append(mid_shp)

        if self.cache_boost:
            cached_dataset["ims_pt_dir"] = os.path.join(cache_folder, self.cache_boost_name+'_ims.pt')
            cached_dataset["gts_pt_dir"] = os.path.join(cache_folder, self.cache_boost_name+'_gts.pt')
            cached_dataset["mid_pt_dir"] = os.path.join(cache_folder, self.cache_boost_name+'_mids.pt')
            self.ims_pt = torch.cat(ims_pt_list, dim=0)
            self.gts_pt = torch.cat(gts_pt_list, dim=0)
            self.mid_pt = torch.cat(mid_pt_list, dim=0)
            torch.save(torch.cat(ims_pt_list, dim=0), cached_dataset["ims_pt_dir"])
            torch.save(torch.cat(gts_pt_list, dim=0), cached_dataset["gts_pt_dir"])
            torch.save(torch.cat(mid_pt_list, dim=0), cached_dataset["mid_pt_dir"])

        try:
            json_file = open(os.path.join(cache_folder, self.cache_file_name), "w")
            json.dump(cached_dataset, json_file)
            json_file.close()
        except Exception:
            raise FileNotFoundError("Cannot create JSON")
        return cached_dataset

    def load_cache(self, cache_folder):
        print(os.path.join(cache_folder, self.cache_file_name))
        json_file = open(os.path.join(cache_folder, self.cache_file_name), "r")
        dataset = json.load(json_file)
        json_file.close()
        ## if cache_boost is true, we will load the concatenated tensors into RAM
        ## otherwise the per-sample pytorch tensors will be loaded on demand
        if self.cache_boost:
            self.ims_pt = torch.load(dataset["ims_pt_dir"], map_location='cpu')
            self.gts_pt = torch.load(dataset["gts_pt_dir"], map_location='cpu')
            self.mid_pt = torch.load(dataset["mid_pt_dir"], map_location='cpu')
        return dataset

    def __len__(self):
        return len(self.dataset["im_path"])

    def __getitem__(self, idx):

        im = None
        gt = None
        mid = None
        if self.cache_boost and self.ims_pt is not None:
            im = self.ims_pt[idx]
            gt = self.gts_pt[idx]
            mid = self.mid_pt[idx]
        else:
            im_pt_path = os.path.join(self.cache_path, os.sep.join(self.dataset["im_path"][idx].split(os.sep)[-2:]))
            im = torch.load(im_pt_path)
            gt_pt_path = os.path.join(self.cache_path, os.sep.join(self.dataset["gt_path"][idx].split(os.sep)[-2:]))
            gt = torch.load(gt_pt_path)
            mid_pt_path = os.path.join(self.cache_path, os.sep.join(self.dataset["mid_path"][idx].split(os.sep)[-2:]))
            mid = torch.load(mid_pt_path)

        im_shp = self.dataset["im_shp"][idx]

        # derive a bounding-box channel from the ground truth
        box = torch.zeros_like(gt[0]) + gt[0]
        rows, cols = torch.where(box > 0)
        left = torch.min(cols)
        top = torch.min(rows)
        right = torch.max(cols)
        bottom = torch.max(rows)
        box[top:bottom, left:right] = 255
        box[box != 255] = 0
        box = box[None, ...]
        gim = torch.cat([im, mid, box], dim=0)  # guided input: RGB + mask + box channels

        im = torch.divide(gim, 255.0)
        gt = torch.divide(gt, 255.0)
        mask = torch.divide(mid, 255.0)
        box = torch.divide(box, 255.0)

        sample = {
            "imidx": torch.from_numpy(np.array(idx)),
            "image": im,
            "label": gt,
            "mask": mask,
            'box': box,
            "shape": torch.from_numpy(np.array(im_shp)),
        }

        if self.transform:
            sample = self.transform(sample)
        return sample
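The __getitem__ above concatenates the RGB image with the mask and a box channel derived from the ground truth; this toy sketch (illustrative only, not part of the commit) replays the same min/max logic on a small mask:

# Illustrative only: how the box guidance channel is derived from a gt mask.
import torch

gt = torch.zeros(8, 8)
gt[2:5, 3:7] = 255                    # a small foreground blob

box = torch.zeros_like(gt) + gt
rows, cols = torch.where(box > 0)
top, bottom = torch.min(rows), torch.max(rows)
left, right = torch.min(cols), torch.max(cols)
box[top:bottom, left:right] = 255     # note: half-open slice, as in the code above
box[box != 255] = 0
print(box.long())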
IS_Net/datalist.py
ADDED
@@ -0,0 +1,62 @@
dataset_test = {"name": "DIS5K-test",
                "im_dir": r"DIS5K/DIS5K-test/im",
                "gt_dir": r"DIS5K/DIS5K-test/gt",
                "mid_dir": r"DIS5K/DIS5K-test/mask",
                "im_ext": ".jpg",
                "gt_ext": ".png",
                "mid_ext": ".png",
                "cache_dir": r"DIS5K-Cache/DIS-test"}

dataset_tr = {"name": "DIS5K-TR-m",
              "im_dir": r"DIS5K/DIS-TR/im",
              "gt_dir": r"DIS5K/DIS-TR/gt",
              "mid_dir": r"DIS5K-TR/mask",
              "im_ext": ".jpg",
              "gt_ext": ".png",
              "mid_ext": ".png",
              "cache_dir": r"DIS5K-Cache/DIS-TR-m"}

dataset_vd = {"name": "DIS5K-VD-m",
              "im_dir": r"DIS5K/DIS-VD/im",
              "gt_dir": r"DIS5K/DIS-VD/gt",
              "mid_dir": r"DIS5K/DIS5K-VD/mask",
              "im_ext": ".jpg",
              "gt_ext": ".png",
              "mid_ext": ".png",
              "cache_dir": r"DIS5K-Cache/DIS-VD-m"}

dataset_te1 = {"name": "DIS5K-TE1-m",
               "im_dir": r"DIS5K/DIS-TE1/im",
               "gt_dir": r"DIS5K/DIS-TE1/gt",
               "mid_dir": r"DIS5K/DIS5K-TE1/mask",
               "im_ext": ".jpg",
               "gt_ext": ".png",
               "mid_ext": ".png",
               "cache_dir": r"DIS5K-Cache/DIS-TE1-m"}

dataset_te2 = {"name": "DIS5K-TE2-m",
               "im_dir": r"DIS5K/DIS-TE2/im",
               "gt_dir": r"DIS5K/DIS-TE2/gt",
               "mid_dir": r"DIS5K/DIS5K-TE2/mask",
               "im_ext": ".jpg",
               "gt_ext": ".png",
               "mid_ext": ".png",
               "cache_dir": r"DIS5K-Cache/DIS-TE2-m"}

dataset_te3 = {"name": "DIS5K-TE3-m",
               "im_dir": r"DIS5K/DIS-TE3/im",
               "gt_dir": r"DIS5K/DIS-TE3/gt",
               "mid_dir": r"DIS5K/DIS5K-TE3/mask",
               "im_ext": ".jpg",
               "gt_ext": ".png",
               "mid_ext": ".png",
               "cache_dir": r"DIS5K-Cache/DIS-TE3-m"}

dataset_te4 = {"name": "DIS5K-TE4-m",
               "im_dir": r"DIS5K/DIS-TE4/im",
               "gt_dir": r"DIS5K/DIS-TE4/gt",
               "mid_dir": r"DIS5K/DIS5K-TE4/mask",
               "im_ext": ".jpg",
               "gt_ext": ".png",
               "mid_ext": ".png",
               "cache_dir": r"DIS5K-Cache/DIS-TE4-m"}
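A hedged sketch (not part of the commit) of how these dicts are presumably consumed by get_im_gt_name_dict and create_dataloaders from IS_Net/data_loader.py; the module import paths, cache size, and the 5-channel normalization statistics are assumptions:

# Illustrative only: wiring a datalist entry into the data_loader.py pipeline.
# Assumes it is run from inside IS_Net/ so the local modules are importable.
from data_loader import get_im_gt_name_dict, create_dataloaders, GOSNormalize
from datalist import dataset_test

name_list = get_im_gt_name_dict([dataset_test], flag="valid")
loaders, datasets = create_dataloaders(
    name_list,
    cache_size=[1024, 1024],   # assumed cached spatial size
    cache_boost=False,         # load per-sample .pt files lazily
    my_transforms=[GOSNormalize(mean=[0.485, 0.456, 0.406, 0, 0],
                                std=[0.229, 0.224, 0.225, 1.0, 1.0])],  # 5 channels
    batch_size=1, shuffle=False, is_train=False)

batch = next(iter(loaders[0]))
print(batch["image"].shape)    # (1, 5, 1024, 1024): RGB + mask + box channels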
IS_Net/models/__pycache__/isnet.cpython-311.pyc
ADDED
Binary file (33.1 kB)
IS_Net/models/isnet.py
ADDED
@@ -0,0 +1,640 @@
1 |
+
import torch
|
2 |
+
import torch.nn as nn
|
3 |
+
from torchvision import models
|
4 |
+
import torch.nn.functional as F
|
5 |
+
from timm.models.layers import trunc_normal_, DropPath
|
6 |
+
import matplotlib.pyplot as plt
|
7 |
+
import monai
|
8 |
+
|
9 |
+
def iou_loss(pred, mask):
|
10 |
+
inter = (pred * mask).sum(dim=(2, 3)) #交集
|
11 |
+
union = (pred + mask).sum(dim=(2, 3)) - inter #并集-交集
|
12 |
+
iou = 1 - (inter + 1) / (union + 1)
|
13 |
+
return iou.mean()
|
14 |
+
|
15 |
+
|
16 |
+
bce_loss = nn.BCELoss(reduction='mean')
|
17 |
+
|
18 |
+
def muti_loss_fusion(preds, target):
|
19 |
+
loss0 = 0.0
|
20 |
+
loss = 0.0
|
21 |
+
|
22 |
+
for i in range(0,len(preds)):
|
23 |
+
# print("i: ", i, preds[i].shape)
|
24 |
+
if(preds[i].shape[2]!=target.shape[2] or preds[i].shape[3]!=target.shape[3]):
|
25 |
+
# tmp_target = _upsample_like(target,preds[i])
|
26 |
+
tmp_target = F.interpolate(target, size=preds[i].size()[2:], mode='bilinear', align_corners=True)
|
27 |
+
loss = loss + 20*bce_loss(preds[i],tmp_target) + 0.5*iou_loss(preds[i],tmp_target)
|
28 |
+
# loss = loss + bce_loss(preds[i],tmp_target)+ iou_loss(preds[i],tmp_target)
|
29 |
+
# loss = loss + bce_loss(preds[i],tmp_target)
|
30 |
+
else:
|
31 |
+
loss = loss + 20*bce_loss(preds[i],target) + 0.5*iou_loss(preds[i],target)
|
32 |
+
# loss = loss + bce_loss(preds[i],target) + iou_loss(preds[i],target)
|
33 |
+
# loss = loss + bce_loss(preds[i],target)
|
34 |
+
if(i==0):
|
35 |
+
loss0 = loss
|
36 |
+
return loss0, loss
|
37 |
+
|
38 |
+
MSE_loss = nn.MSELoss(reduction='mean')
|
39 |
+
kl_loss = nn.KLDivLoss(reduction='mean')
|
40 |
+
l1_loss = nn.L1Loss(reduction='mean')
|
41 |
+
smooth_l1_loss = nn.SmoothL1Loss(reduction='mean')
|
42 |
+
def muti_loss_fusion_kl(preds, target, dfs, fs, mode='MSE'):
|
43 |
+
loss0 = 0.0
|
44 |
+
loss = 0.0
|
45 |
+
|
46 |
+
for i in range(0,len(preds)):
|
47 |
+
# print("i: ", i, preds[i].shape)
|
48 |
+
if(preds[i].shape[2]!=target.shape[2] or preds[i].shape[3]!=target.shape[3]):
|
49 |
+
# tmp_target = _upsample_like(target,preds[i])
|
50 |
+
tmp_target = F.interpolate(target, size=preds[i].size()[2:], mode='bilinear', align_corners=True)
|
51 |
+
loss = loss + 20*bce_loss(preds[i],tmp_target) + 0.5*iou_loss(preds[i],tmp_target)
|
52 |
+
# loss = loss + bce_loss(preds[i],tmp_target) + iou_loss(preds[i],tmp_target)
|
53 |
+
# loss = loss + bce_loss(preds[i],tmp_target)
|
54 |
+
else:
|
55 |
+
loss = loss + 20*bce_loss(preds[i],target) + 0.5*iou_loss(preds[i],target)
|
56 |
+
# loss = loss + bce_loss(preds[i],target) + iou_loss(preds[i],target)
|
57 |
+
# loss = loss + bce_loss(preds[i],target)
|
58 |
+
if(i==0):
|
59 |
+
loss0 = loss
|
60 |
+
|
61 |
+
for i in range(0,len(dfs)):
|
62 |
+
if(mode=='MSE'):
|
63 |
+
loss = loss + MSE_loss(dfs[i],fs[i]) ### add the mse loss of features as additional constraints
|
64 |
+
# print("fea_loss: ", fea_loss(dfs[i],fs[i]).item())
|
65 |
+
elif(mode=='KL'):
|
66 |
+
loss = loss + kl_loss(F.log_softmax(dfs[i],dim=1),F.softmax(fs[i],dim=1))
|
67 |
+
# print("kl_loss: ", kl_loss(F.log_softmax(dfs[i],dim=1),F.softmax(fs[i],dim=1)).item())
|
68 |
+
elif(mode=='MAE'):
|
69 |
+
loss = loss + l1_loss(dfs[i],fs[i])
|
70 |
+
# print("ls_loss: ", l1_loss(dfs[i],fs[i]))
|
71 |
+
elif(mode=='SmoothL1'):
|
72 |
+
loss = loss + smooth_l1_loss(dfs[i],fs[i])
|
73 |
+
# print("SmoothL1: ", smooth_l1_loss(dfs[i],fs[i]).item())
|
74 |
+
|
75 |
+
return loss0, loss
|
76 |
+
|
77 |
+
class REBNCONV(nn.Module):
|
78 |
+
def __init__(self,in_ch=3,out_ch=3,dirate=1,stride=1):
|
79 |
+
super(REBNCONV,self).__init__()
|
80 |
+
|
81 |
+
self.conv_s1 = nn.Conv2d(in_ch,out_ch,3,padding=1*dirate,dilation=1*dirate,stride=stride)
|
82 |
+
self.bn_s1 = nn.BatchNorm2d(out_ch)
|
83 |
+
self.relu_s1 = nn.ReLU(inplace=True)
|
84 |
+
|
85 |
+
def forward(self,x):
|
86 |
+
|
87 |
+
hx = x
|
88 |
+
xout = self.relu_s1(self.bn_s1(self.conv_s1(hx)))
|
89 |
+
|
90 |
+
return xout
|
91 |
+
|
92 |
+
## upsample tensor 'src' to have the same spatial size with tensor 'tar'
|
93 |
+
def _upsample_like(src,tar):
|
94 |
+
|
95 |
+
src = F.upsample(src,size=tar.shape[2:],mode='bilinear')
|
96 |
+
|
97 |
+
return src
|
98 |
+
|
99 |
+
|
100 |
+
### RSU-7 ###
|
101 |
+
class RSU7(nn.Module):
|
102 |
+
|
103 |
+
def __init__(self, in_ch=3, mid_ch=12, out_ch=3, img_size=512):
|
104 |
+
super(RSU7,self).__init__()
|
105 |
+
|
106 |
+
self.in_ch = in_ch
|
107 |
+
self.mid_ch = mid_ch
|
108 |
+
self.out_ch = out_ch
|
109 |
+
|
110 |
+
self.rebnconvin = REBNCONV(in_ch,out_ch,dirate=1) ## 1 -> 1/2
|
111 |
+
|
112 |
+
self.rebnconv1 = REBNCONV(out_ch,mid_ch,dirate=1)
|
113 |
+
self.pool1 = nn.MaxPool2d(2,stride=2,ceil_mode=True)
|
114 |
+
|
115 |
+
self.rebnconv2 = REBNCONV(mid_ch,mid_ch,dirate=1)
|
116 |
+
self.pool2 = nn.MaxPool2d(2,stride=2,ceil_mode=True)
|
117 |
+
|
118 |
+
self.rebnconv3 = REBNCONV(mid_ch,mid_ch,dirate=1)
|
119 |
+
self.pool3 = nn.MaxPool2d(2,stride=2,ceil_mode=True)
|
120 |
+
|
121 |
+
self.rebnconv4 = REBNCONV(mid_ch,mid_ch,dirate=1)
|
122 |
+
self.pool4 = nn.MaxPool2d(2,stride=2,ceil_mode=True)
|
123 |
+
|
124 |
+
self.rebnconv5 = REBNCONV(mid_ch,mid_ch,dirate=1)
|
125 |
+
self.pool5 = nn.MaxPool2d(2,stride=2,ceil_mode=True)
|
126 |
+
|
127 |
+
self.rebnconv6 = REBNCONV(mid_ch,mid_ch,dirate=1)
|
128 |
+
|
129 |
+
self.rebnconv7 = REBNCONV(mid_ch,mid_ch,dirate=2)
|
130 |
+
|
131 |
+
self.rebnconv6d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
|
132 |
+
self.rebnconv5d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
|
133 |
+
self.rebnconv4d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
|
134 |
+
self.rebnconv3d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
|
135 |
+
self.rebnconv2d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
|
136 |
+
self.rebnconv1d = REBNCONV(mid_ch*2,out_ch,dirate=1)
|
137 |
+
|
138 |
+
def forward(self,x):
|
139 |
+
b, c, h, w = x.shape
|
140 |
+
|
141 |
+
hx = x
|
142 |
+
hxin = self.rebnconvin(hx)
|
143 |
+
|
144 |
+
hx1 = self.rebnconv1(hxin)
|
145 |
+
hx = self.pool1(hx1)
|
146 |
+
|
147 |
+
hx2 = self.rebnconv2(hx)
|
148 |
+
hx = self.pool2(hx2)
|
149 |
+
|
150 |
+
hx3 = self.rebnconv3(hx)
|
151 |
+
hx = self.pool3(hx3)
|
152 |
+
|
153 |
+
hx4 = self.rebnconv4(hx)
|
154 |
+
hx = self.pool4(hx4)
|
155 |
+
|
156 |
+
hx5 = self.rebnconv5(hx)
|
157 |
+
hx = self.pool5(hx5)
|
158 |
+
|
159 |
+
hx6 = self.rebnconv6(hx)
|
160 |
+
|
161 |
+
hx7 = self.rebnconv7(hx6)
|
162 |
+
|
163 |
+
hx6d = self.rebnconv6d(torch.cat((hx7,hx6),1))
|
164 |
+
hx6dup = _upsample_like(hx6d,hx5)
|
165 |
+
|
166 |
+
hx5d = self.rebnconv5d(torch.cat((hx6dup,hx5),1))
|
167 |
+
hx5dup = _upsample_like(hx5d,hx4)
|
168 |
+
|
169 |
+
hx4d = self.rebnconv4d(torch.cat((hx5dup,hx4),1))
|
170 |
+
hx4dup = _upsample_like(hx4d,hx3)
|
171 |
+
|
172 |
+
hx3d = self.rebnconv3d(torch.cat((hx4dup,hx3),1))
|
173 |
+
hx3dup = _upsample_like(hx3d,hx2)
|
174 |
+
|
175 |
+
hx2d = self.rebnconv2d(torch.cat((hx3dup,hx2),1))
|
176 |
+
hx2dup = _upsample_like(hx2d,hx1)
|
177 |
+
|
178 |
+
hx1d = self.rebnconv1d(torch.cat((hx2dup,hx1),1))
|
179 |
+
|
180 |
+
return hx1d + hxin
|
181 |
+
|
182 |
+
|
183 |
+
### RSU-6 ###
|
184 |
+
class RSU6(nn.Module):
|
185 |
+
|
186 |
+
def __init__(self, in_ch=3, mid_ch=12, out_ch=3):
|
187 |
+
super(RSU6,self).__init__()
|
188 |
+
|
189 |
+
self.rebnconvin = REBNCONV(in_ch,out_ch,dirate=1)
|
190 |
+
|
191 |
+
self.rebnconv1 = REBNCONV(out_ch,mid_ch,dirate=1)
|
192 |
+
self.pool1 = nn.MaxPool2d(2,stride=2,ceil_mode=True)
|
193 |
+
|
194 |
+
self.rebnconv2 = REBNCONV(mid_ch,mid_ch,dirate=1)
|
195 |
+
self.pool2 = nn.MaxPool2d(2,stride=2,ceil_mode=True)
|
196 |
+
|
197 |
+
self.rebnconv3 = REBNCONV(mid_ch,mid_ch,dirate=1)
|
198 |
+
self.pool3 = nn.MaxPool2d(2,stride=2,ceil_mode=True)
|
199 |
+
|
200 |
+
self.rebnconv4 = REBNCONV(mid_ch,mid_ch,dirate=1)
|
201 |
+
self.pool4 = nn.MaxPool2d(2,stride=2,ceil_mode=True)
|
202 |
+
|
203 |
+
self.rebnconv5 = REBNCONV(mid_ch,mid_ch,dirate=1)
|
204 |
+
|
205 |
+
self.rebnconv6 = REBNCONV(mid_ch,mid_ch,dirate=2)
|
206 |
+
|
207 |
+
self.rebnconv5d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
|
208 |
+
self.rebnconv4d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
|
209 |
+
self.rebnconv3d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
|
210 |
+
self.rebnconv2d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
|
211 |
+
self.rebnconv1d = REBNCONV(mid_ch*2,out_ch,dirate=1)
|
212 |
+
|
213 |
+
def forward(self,x):
|
214 |
+
|
215 |
+
hx = x
|
216 |
+
|
217 |
+
hxin = self.rebnconvin(hx)
|
218 |
+
|
219 |
+
hx1 = self.rebnconv1(hxin)
|
220 |
+
hx = self.pool1(hx1)
|
221 |
+
|
222 |
+
hx2 = self.rebnconv2(hx)
|
223 |
+
hx = self.pool2(hx2)
|
224 |
+
|
225 |
+
hx3 = self.rebnconv3(hx)
|
226 |
+
hx = self.pool3(hx3)
|
227 |
+
|
228 |
+
hx4 = self.rebnconv4(hx)
|
229 |
+
hx = self.pool4(hx4)
|
230 |
+
|
231 |
+
hx5 = self.rebnconv5(hx)
|
232 |
+
|
233 |
+
hx6 = self.rebnconv6(hx5)
|
234 |
+
|
235 |
+
|
236 |
+
hx5d = self.rebnconv5d(torch.cat((hx6,hx5),1))
|
237 |
+
hx5dup = _upsample_like(hx5d,hx4)
|
238 |
+
|
239 |
+
hx4d = self.rebnconv4d(torch.cat((hx5dup,hx4),1))
|
240 |
+
hx4dup = _upsample_like(hx4d,hx3)
|
241 |
+
|
242 |
+
hx3d = self.rebnconv3d(torch.cat((hx4dup,hx3),1))
|
243 |
+
hx3dup = _upsample_like(hx3d,hx2)
|
244 |
+
|
245 |
+
hx2d = self.rebnconv2d(torch.cat((hx3dup,hx2),1))
|
246 |
+
hx2dup = _upsample_like(hx2d,hx1)
|
247 |
+
|
248 |
+
hx1d = self.rebnconv1d(torch.cat((hx2dup,hx1),1))
|
249 |
+
|
250 |
+
return hx1d + hxin
|
251 |
+
|
252 |
+
### RSU-5 ###
|
253 |
+
class RSU5(nn.Module):
|
254 |
+
|
255 |
+
def __init__(self, in_ch=3, mid_ch=12, out_ch=3):
|
256 |
+
super(RSU5,self).__init__()
|
257 |
+
|
258 |
+
self.rebnconvin = REBNCONV(in_ch,out_ch,dirate=1)
|
259 |
+
|
260 |
+
self.rebnconv1 = REBNCONV(out_ch,mid_ch,dirate=1)
|
261 |
+
self.pool1 = nn.MaxPool2d(2,stride=2,ceil_mode=True)
|
262 |
+
|
263 |
+
self.rebnconv2 = REBNCONV(mid_ch,mid_ch,dirate=1)
|
264 |
+
self.pool2 = nn.MaxPool2d(2,stride=2,ceil_mode=True)
|
265 |
+
|
266 |
+
self.rebnconv3 = REBNCONV(mid_ch,mid_ch,dirate=1)
|
267 |
+
self.pool3 = nn.MaxPool2d(2,stride=2,ceil_mode=True)
|
268 |
+
|
269 |
+
self.rebnconv4 = REBNCONV(mid_ch,mid_ch,dirate=1)
|
270 |
+
|
271 |
+
self.rebnconv5 = REBNCONV(mid_ch,mid_ch,dirate=2)
|
272 |
+
|
273 |
+
self.rebnconv4d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
|
274 |
+
self.rebnconv3d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
|
275 |
+
self.rebnconv2d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
|
276 |
+
self.rebnconv1d = REBNCONV(mid_ch*2,out_ch,dirate=1)
|
277 |
+
|
278 |
+
def forward(self,x):
|
279 |
+
|
280 |
+
hx = x
|
281 |
+
|
282 |
+
hxin = self.rebnconvin(hx)
|
283 |
+
|
284 |
+
hx1 = self.rebnconv1(hxin)
|
285 |
+
hx = self.pool1(hx1)
|
286 |
+
|
287 |
+
hx2 = self.rebnconv2(hx)
|
288 |
+
hx = self.pool2(hx2)
|
289 |
+
|
290 |
+
hx3 = self.rebnconv3(hx)
|
291 |
+
hx = self.pool3(hx3)
|
292 |
+
|
293 |
+
hx4 = self.rebnconv4(hx)
|
294 |
+
|
295 |
+
hx5 = self.rebnconv5(hx4)
|
296 |
+
|
297 |
+
hx4d = self.rebnconv4d(torch.cat((hx5,hx4),1))
|
298 |
+
hx4dup = _upsample_like(hx4d,hx3)
|
299 |
+
|
300 |
+
hx3d = self.rebnconv3d(torch.cat((hx4dup,hx3),1))
|
301 |
+
hx3dup = _upsample_like(hx3d,hx2)
|
302 |
+
|
303 |
+
hx2d = self.rebnconv2d(torch.cat((hx3dup,hx2),1))
|
304 |
+
hx2dup = _upsample_like(hx2d,hx1)
|
305 |
+
|
306 |
+
hx1d = self.rebnconv1d(torch.cat((hx2dup,hx1),1))
|
307 |
+
|
308 |
+
return hx1d + hxin
|
309 |
+
|
310 |
+
### RSU-4 ###
|
311 |
+
class RSU4(nn.Module):
|
312 |
+
|
313 |
+
def __init__(self, in_ch=3, mid_ch=12, out_ch=3):
|
314 |
+
super(RSU4,self).__init__()
|
315 |
+
|
316 |
+
self.rebnconvin = REBNCONV(in_ch,out_ch,dirate=1)
|
317 |
+
|
318 |
+
self.rebnconv1 = REBNCONV(out_ch,mid_ch,dirate=1)
|
319 |
+
self.pool1 = nn.MaxPool2d(2,stride=2,ceil_mode=True)
|
320 |
+
|
321 |
+
self.rebnconv2 = REBNCONV(mid_ch,mid_ch,dirate=1)
|
322 |
+
self.pool2 = nn.MaxPool2d(2,stride=2,ceil_mode=True)
|
323 |
+
|
324 |
+
self.rebnconv3 = REBNCONV(mid_ch,mid_ch,dirate=1)
|
325 |
+
|
326 |
+
self.rebnconv4 = REBNCONV(mid_ch,mid_ch,dirate=2)
|
327 |
+
|
328 |
+
self.rebnconv3d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
|
329 |
+
self.rebnconv2d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
|
330 |
+
self.rebnconv1d = REBNCONV(mid_ch*2,out_ch,dirate=1)
|
331 |
+
|
332 |
+
def forward(self,x):
|
333 |
+
|
334 |
+
hx = x
|
335 |
+
|
336 |
+
hxin = self.rebnconvin(hx)
|
337 |
+
|
338 |
+
hx1 = self.rebnconv1(hxin)
|
339 |
+
hx = self.pool1(hx1)
|
340 |
+
|
341 |
+
hx2 = self.rebnconv2(hx)
|
342 |
+
hx = self.pool2(hx2)
|
343 |
+
|
344 |
+
hx3 = self.rebnconv3(hx)
|
345 |
+
|
346 |
+
hx4 = self.rebnconv4(hx3)
|
347 |
+
|
348 |
+
hx3d = self.rebnconv3d(torch.cat((hx4,hx3),1))
|
349 |
+
hx3dup = _upsample_like(hx3d,hx2)
|
350 |
+
|
351 |
+
hx2d = self.rebnconv2d(torch.cat((hx3dup,hx2),1))
|
352 |
+
hx2dup = _upsample_like(hx2d,hx1)
|
353 |
+
|
354 |
+
hx1d = self.rebnconv1d(torch.cat((hx2dup,hx1),1))
|
355 |
+
|
356 |
+
return hx1d + hxin
|
357 |
+
|
358 |
+
### RSU-4F ###
class RSU4F(nn.Module):

    def __init__(self, in_ch=3, mid_ch=12, out_ch=3):
        super(RSU4F, self).__init__()

        self.rebnconvin = REBNCONV(in_ch, out_ch, dirate=1)

        self.rebnconv1 = REBNCONV(out_ch, mid_ch, dirate=1)
        self.rebnconv2 = REBNCONV(mid_ch, mid_ch, dirate=2)
        self.rebnconv3 = REBNCONV(mid_ch, mid_ch, dirate=4)

        self.rebnconv4 = REBNCONV(mid_ch, mid_ch, dirate=8)

        self.rebnconv3d = REBNCONV(mid_ch*2, mid_ch, dirate=4)
        self.rebnconv2d = REBNCONV(mid_ch*2, mid_ch, dirate=2)
        self.rebnconv1d = REBNCONV(mid_ch*2, out_ch, dirate=1)

    def forward(self, x):

        hx = x

        hxin = self.rebnconvin(hx)

        hx1 = self.rebnconv1(hxin)
        hx2 = self.rebnconv2(hx1)
        hx3 = self.rebnconv3(hx2)

        hx4 = self.rebnconv4(hx3)

        hx3d = self.rebnconv3d(torch.cat((hx4, hx3), 1))
        hx2d = self.rebnconv2d(torch.cat((hx3d, hx2), 1))
        hx1d = self.rebnconv1d(torch.cat((hx2d, hx1), 1))

        return hx1d + hxin

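
Unlike RSU-4 through RSU-7, RSU-4F drops the pooling/upsampling pairs and widens the receptive field purely through dilation (dirate 1, 2, 4, 8), so every intermediate tensor stays at the input resolution; this is why the networks below use it only at their deepest, lowest-resolution stages. A quick sketch of that property (illustrative only):

import torch

block = RSU4F(in_ch=512, mid_ch=256, out_ch=512)
x = torch.randn(1, 512, 16, 16)
print(block(x).shape)  # torch.Size([1, 512, 16, 16]) -- no pooling anywhere
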
class myrebnconv(nn.Module):
    def __init__(self, in_ch=3,
                 out_ch=1,
                 kernel_size=3,
                 stride=1,
                 padding=1,
                 dilation=1,
                 groups=1):
        super(myrebnconv, self).__init__()

        self.conv = nn.Conv2d(in_ch,
                              out_ch,
                              kernel_size=kernel_size,
                              stride=stride,
                              padding=padding,
                              dilation=dilation,
                              groups=groups)
        self.bn = nn.BatchNorm2d(out_ch)
        self.rl = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.rl(self.bn(self.conv(x)))

class ISNetGTEncoder(nn.Module):

    def __init__(self, in_ch=1, out_ch=1):
        super(ISNetGTEncoder, self).__init__()

        self.conv_in = myrebnconv(in_ch, 16, 3, stride=2, padding=1)  # nn.Conv2d(in_ch,64,3,stride=2,padding=1)

        self.stage1 = RSU7(16, 16, 64)
        self.pool12 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

        self.stage2 = RSU6(64, 16, 64)
        self.pool23 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

        self.stage3 = RSU5(64, 32, 128)
        self.pool34 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

        self.stage4 = RSU4(128, 32, 256)
        self.pool45 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

        self.stage5 = RSU4F(256, 64, 512)
        self.pool56 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

        self.stage6 = RSU4F(512, 64, 512)

        self.side1 = nn.Conv2d(64, out_ch, 3, padding=1)
        self.side2 = nn.Conv2d(64, out_ch, 3, padding=1)
        self.side3 = nn.Conv2d(128, out_ch, 3, padding=1)
        self.side4 = nn.Conv2d(256, out_ch, 3, padding=1)
        self.side5 = nn.Conv2d(512, out_ch, 3, padding=1)
        self.side6 = nn.Conv2d(512, out_ch, 3, padding=1)

    def compute_loss(self, preds, targets):
        return muti_loss_fusion(preds, targets)

    def forward(self, x):

        hx = x

        hxin = self.conv_in(hx)
        # hx = self.pool_in(hxin)

        # stage 1
        hx1 = self.stage1(hxin)
        hx = self.pool12(hx1)

        # stage 2
        hx2 = self.stage2(hx)
        hx = self.pool23(hx2)

        # stage 3
        hx3 = self.stage3(hx)
        hx = self.pool34(hx3)

        # stage 4
        hx4 = self.stage4(hx)
        hx = self.pool45(hx4)

        # stage 5
        hx5 = self.stage5(hx)
        hx = self.pool56(hx5)

        # stage 6
        hx6 = self.stage6(hx)

        # side outputs, upsampled to the input resolution
        d1 = self.side1(hx1)
        d1 = _upsample_like(d1, x)

        d2 = self.side2(hx2)
        d2 = _upsample_like(d2, x)

        d3 = self.side3(hx3)
        d3 = _upsample_like(d3, x)

        d4 = self.side4(hx4)
        d4 = _upsample_like(d4, x)

        d5 = self.side5(hx5)
        d5 = _upsample_like(d5, x)

        d6 = self.side6(hx6)
        d6 = _upsample_like(d6, x)

        # d0 = self.outconv(torch.cat((d1,d2,d3,d4,d5,d6),1))

        return [torch.sigmoid(d1), torch.sigmoid(d2), torch.sigmoid(d3),
                torch.sigmoid(d4), torch.sigmoid(d5), torch.sigmoid(d6)], [hx1, hx2, hx3, hx4, hx5, hx6]

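
The GT encoder consumes 1-channel ground-truth masks and returns six sigmoid side outputs plus the six stage feature maps; in IS-Net-style training the latter serve as targets for the segmentation network's decoder features (the dfs/fs arguments of compute_loss_kl below). A minimal sketch, assuming muti_loss_fusion is defined earlier in this file:

import torch

gt_net = ISNetGTEncoder(in_ch=1, out_ch=1).eval()
masks = (torch.rand(2, 1, 512, 512) > 0.5).float()  # a batch of binary masks
with torch.no_grad():
    side_preds, stage_feats = gt_net(masks)
print(len(side_preds), len(stage_feats))  # 6 6
print(side_preds[0].shape)                # torch.Size([2, 1, 512, 512])
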
class ISNetDIS(nn.Module):

    def __init__(self, in_ch=3, out_ch=1):
        super(ISNetDIS, self).__init__()

        self.conv_in = nn.Conv2d(in_ch, 64, 3, stride=2, padding=1)
        self.pool_in = nn.MaxPool2d(2, stride=2, ceil_mode=True)  # defined but not used in forward

        self.stage1 = RSU7(64, 32, 64)
        self.pool12 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

        self.stage2 = RSU6(64, 32, 128)
        self.pool23 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

        self.stage3 = RSU5(128, 64, 256)
        self.pool34 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

        self.stage4 = RSU4(256, 128, 512)
        self.pool45 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

        self.stage5 = RSU4F(512, 256, 512)
        self.pool56 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

        self.stage6 = RSU4F(512, 256, 512)

        # decoder
        self.stage5d = RSU4F(1024, 256, 512)
        self.stage4d = RSU4(1024, 128, 256)
        self.stage3d = RSU5(512, 64, 128)
        self.stage2d = RSU6(256, 32, 64)
        self.stage1d = RSU7(128, 16, 64)

        self.side1 = nn.Conv2d(64, out_ch, 3, padding=1)
        self.side2 = nn.Conv2d(64, out_ch, 3, padding=1)
        self.side3 = nn.Conv2d(128, out_ch, 3, padding=1)
        self.side4 = nn.Conv2d(256, out_ch, 3, padding=1)
        self.side5 = nn.Conv2d(512, out_ch, 3, padding=1)
        self.side6 = nn.Conv2d(512, out_ch, 3, padding=1)

        # self.outconv = nn.Conv2d(6*out_ch, out_ch, 1)

    def compute_loss_kl(self, preds, targets, dfs, fs, mode='MSE'):
        # return muti_loss_fusion(preds,targets)
        return muti_loss_fusion_kl(preds, targets, dfs, fs, mode=mode)

    def compute_loss(self, preds, targets):
        return muti_loss_fusion(preds, targets)

    def forward(self, x):

        hx = x

        hxin = self.conv_in(hx)

        # stage 1
        hx1 = self.stage1(hxin)
        hx = self.pool12(hx1)

        # stage 2
        hx2 = self.stage2(hx)
        hx = self.pool23(hx2)

        # stage 3
        hx3 = self.stage3(hx)
        hx = self.pool34(hx3)

        # stage 4
        hx4 = self.stage4(hx)
        hx = self.pool45(hx4)

        # stage 5
        hx5 = self.stage5(hx)
        hx = self.pool56(hx5)

        # stage 6
        hx6 = self.stage6(hx)

        hx6up = _upsample_like(hx6, hx5)

        # -------------------- decoder --------------------
        hx5d = self.stage5d(torch.cat([hx6up, hx5], 1))
        hx5dup = _upsample_like(hx5d, hx4)

        hx4d = self.stage4d(torch.cat([hx5dup, hx4], 1))
        hx4dup = _upsample_like(hx4d, hx3)

        hx3d = self.stage3d(torch.cat([hx4dup, hx3], 1))
        hx3dup = _upsample_like(hx3d, hx2)

        hx2d = self.stage2d(torch.cat([hx3dup, hx2], 1))
        hx2dup = _upsample_like(hx2d, hx1)

        hx1d = self.stage1d(torch.cat([hx2dup, hx1], 1))

        # side outputs, upsampled to the input resolution
        d1 = self.side1(hx1d)
        d1 = _upsample_like(d1, x)

        d2 = self.side2(hx2d)
        d2 = _upsample_like(d2, x)

        d3 = self.side3(hx3d)
        d3 = _upsample_like(d3, x)

        d4 = self.side4(hx4d)
        d4 = _upsample_like(d4, x)

        d5 = self.side5(hx5d)
        d5 = _upsample_like(d5, x)

        d6 = self.side6(hx6)
        d6 = _upsample_like(d6, x)

        # d0 = self.outconv(torch.cat((d1,d2,d3,d4,d5,d6),1))
        # plt.imshow(hx1d[0][0].cpu().detach().numpy(), cmap='gray'); plt.show()
        # plt.imshow(hx2d[0][0].cpu().detach().numpy(), cmap='gray'); plt.show()
        # plt.imshow(hx3d[0][0].cpu().detach().numpy(), cmap='gray'); plt.show()
        # plt.imshow(hx4d[0][0].cpu().detach().numpy(), cmap='gray'); plt.show()
        # plt.imshow(hx5d[0][0].cpu().detach().numpy(), cmap='gray'); plt.show()
        # plt.imshow(hx6[0][0].cpu().detach().numpy(), cmap='gray'); plt.show()
        return [torch.sigmoid(d1), torch.sigmoid(d2), torch.sigmoid(d3),
                torch.sigmoid(d4), torch.sigmoid(d5), torch.sigmoid(d6)], [hx1d, hx2d, hx3d, hx4d, hx5d, hx6]

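
A minimal end-to-end sketch of running the segmentation network (the checkpoint path and shapes are illustrative; the real pipeline also resizes and normalizes the input image):

import torch

net = ISNetDIS(in_ch=3, out_ch=1).eval()
# net.load_state_dict(torch.load("saved_models/isnet.pth", map_location="cpu"))  # hypothetical path
image = torch.randn(1, 3, 1024, 1024)  # a normalized RGB batch
with torch.no_grad():
    side_preds, decoder_feats = net(image)
mask = side_preds[0]  # d1: the finest side output, already passed through sigmoid
print(mask.shape)     # torch.Size([1, 1, 1024, 1024])
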
IS_Net/saliency_toolbox.py
ADDED
@@ -0,0 +1,552 @@
import os
import cv2
import sys
import numpy as np
from glob import glob
from tqdm import tqdm
from scipy.ndimage import correlate
from scipy.ndimage import distance_transform_edt
from joblib import Parallel, delayed

eps = sys.float_info.epsilon


def calcualte_once(gt_name, sm_dir, gt_threshold, beta, measures):
    values = dict()
    for idx in measures:
        values[idx] = list()
        if idx == 'Max-F':
            values['Precision'] = list()
            values['Recall'] = list()
    _, name = os.path.split(gt_name)
    sm_name = os.path.join(sm_dir, name)

    if os.path.exists(sm_name):

        gt, sm = read_and_normalize(gt_name, sm_name, gt_threshold)

        if 'MAE' in measures:
            values['MAE'].append(mean_square_error(gt, sm))
        if 'E-measure' in measures:
            values['E-measure'].append(e_measure(gt, sm))
        if 'S-measure' in measures:
            values['S-measure'].append(s_measure(gt, sm))
        if 'Adp-F' in measures:
            values['Adp-F'].append(adaptive_fmeasure(gt, sm, beta))
        if 'Wgt-F' in measures:
            values['Wgt-F'].append(weighted_fmeasure(gt, sm))
        if 'Max-F' in measures:
            prec, recall = prec_recall(gt, sm, 256)  # 256 thresholds between 0 and 1
            values['Precision'].append(prec)
            values['Recall'].append(recall)
    else:
        print("\n{} not found!".format(os.path.basename(sm_name)))
    print('---' * 10)
    return values


def calculate_measures(gt_dir, sm_dir, measures, save=False, beta=np.sqrt(0.3), gt_threshold=0.5, n_thread=1):
    """
    Calculates saliency measures for the given directories.

    Parameters
    ----------
    gt_dir : str
        The path to the ground truth directory
    sm_dir : str
        The path to the predicted saliency map directory
    measures : list
        List of the measure names to calculate.
        Supported measures: 'MAE'       => Mean Absolute Error
                            'E-measure' => Enhanced-alignment measure
                            'S-measure' => Structure-measure
                            'Max-F'     => Maximum F-measure
                            'Adp-F'     => Adaptive F-measure
                            'Wgt-F'     => Weighted F-measure
    save : str
        If specified, the results will be saved in the 'save' directory
    beta : float
        The beta parameter used in the F-measure formula; the default is sqrt(0.3)
    gt_threshold : float
        The threshold used to binarize the ground truth maps.

    Returns
    -------
    values : dictionary
        a dict containing the results
    """

    values = dict()
    for idx in measures:
        values[idx] = list()
        if idx == 'Max-F':
            values['Precision'] = list()
            values['Recall'] = list()

    results = Parallel(n_jobs=n_thread)(
        delayed(calcualte_once)(gt_name, sm_dir, gt_threshold, beta, measures)
        for gt_name in tqdm(glob(os.path.join(gt_dir, '*')), total=len(glob(os.path.join(gt_dir, '*')))))
    for i in results:
        if 'MAE' in measures:
            values['MAE'].append(i["MAE"])
        if 'E-measure' in measures:
            values['E-measure'].append(i["E-measure"])
        if 'S-measure' in measures:
            values['S-measure'].append(i["S-measure"])
        if 'Adp-F' in measures:
            values['Adp-F'].append(i["Adp-F"])
        if 'Wgt-F' in measures:
            values['Wgt-F'].append(i["Wgt-F"])
        if 'Max-F' in measures:  # 256 thresholds between 0 and 1
            values['Precision'].append(i["Precision"])
            values['Recall'].append(i["Recall"])

    if 'MAE' in measures:
        values['MAE'] = np.mean(values['MAE'])

    if 'E-measure' in measures:
        values['E-measure'] = np.mean(values['E-measure'])

    if 'S-measure' in measures:
        values['S-measure'] = np.mean(values['S-measure'])

    if 'Adp-F' in measures:
        values['Adp-F'] = np.mean(values['Adp-F'])

    if 'Wgt-F' in measures:
        values['Wgt-F'] = np.mean(values['Wgt-F'])

    if 'Max-F' in measures:
        values['Precision'] = np.mean(np.hstack(values['Precision'][:]), 1)
        values['Recall'] = np.mean(np.hstack(values['Recall'][:]), 1)
        f_measures = (1 + beta ** 2) * values['Precision'] * values['Recall'] / (
            beta ** 2 * values['Precision'] + values['Recall'])
        values['Fmeasure_all_thresholds'] = f_measures
        values['Max-F'] = np.max(f_measures)

    if save:
        if not os.path.isdir(save):
            os.mkdir(save)
        for key in values.keys():
            np.save(os.path.join(save, key + ".npy"), values[key])

    return values


def read_and_normalize(gt_path, sm_path, gt_threshold=0.5):
    """
    Reads, normalizes and resizes a ground truth map and a saliency map.

    Parameters
    ----------
    gt_path : str
        The path to a ground truth map
    sm_path : str
        The path to a predicted saliency map
    gt_threshold : float
        The threshold used to binarize the ground truth map.

    Returns
    -------
    gt_img, sm_img : numpy.ndarray
        The prepared arrays
    """
    gt_img = norm_img(cv2.imread(gt_path, cv2.IMREAD_GRAYSCALE))
    gt_img = (gt_img >= gt_threshold).astype(np.float32)
    sm_img = norm_img(cv2.imread(sm_path, cv2.IMREAD_GRAYSCALE))
    if sm_img.shape[0] != gt_img.shape[0] or sm_img.shape[1] != gt_img.shape[1]:
        sm_img = cv2.resize(sm_img, (gt_img.shape[1], gt_img.shape[0]))

    return gt_img, sm_img


def norm_img(im):
    return cv2.normalize(im.astype('float'),
                         None,
                         0.0, 1.0,
                         cv2.NORM_MINMAX)


# MAE (note: despite its name, this function computes the mean absolute error)
def mean_square_error(gt, sm):
    return np.mean(np.abs(sm - gt))


# E-measure
# article: https://arxiv.org/abs/1805.10421
# original code [Matlab]: https://github.com/DengPingFan/E-measure
def e_measure(gt, sm):
    """
    Computes the Enhanced-alignment Measure (E-measure) between the saliency map and the ground truth.
    article: https://arxiv.org/abs/1805.10421
    original code [Matlab]: https://github.com/DengPingFan/E-measure

    Parameters
    ----------
    gt : numpy.ndarray
        The binarized ground truth map
    sm : numpy.ndarray
        The predicted saliency map

    Returns
    -------
    value : float
        The calculated E-measure
    """
    sm = adptive_binary(sm)

    gt = gt.astype(np.bool_)
    sm = sm.astype(np.bool_)

    dgt = gt.astype(np.float32)
    dsm = sm.astype(np.float32)

    if np.sum(dgt) == 0:  # if the gt is completely black
        enhanced_matrix = 1.0 - dsm  # only calculate the black area of the intersection
    elif np.mean(dgt) == 1:  # if the gt is completely white
        enhanced_matrix = dsm  # only calculate the white area of the intersection
    else:
        # Normal case:
        # 1. compute the alignment matrix
        align_matrix = alignment_term(dsm, dgt)
        # 2. compute the enhanced alignment matrix
        enhanced_matrix = enhanced_alignment_term(align_matrix)

    height, width = gt.shape
    value = np.sum(enhanced_matrix) / (height * width - 1 + eps)
    return value


def alignment_term(dgt, dsm):
    # compute the global means
    mu_fm = np.mean(dsm)
    mu_gt = np.mean(dgt)

    # compute the bias matrices
    align_fm = dsm - mu_fm
    align_gt = dgt - mu_gt

    # compute the alignment matrix
    align_Matrix = 2 * (align_gt * align_fm) / (align_gt * align_gt + align_fm * align_fm + eps)
    return align_Matrix


def enhanced_alignment_term(align_matrix):
    enhanced = ((align_matrix + 1) ** 2) / 4
    return enhanced


def adptive_binary(sm):
    adaptive_threshold = 2 * np.mean(sm)

    if adaptive_threshold > 1:
        adaptive_threshold = 1

    binary_sm = (sm >= adaptive_threshold).astype(np.float32)

    return binary_sm


# S-measure
# article: https://www.crcv.ucf.edu/papers/iccv17/1164.pdf
# Matlab code: https://github.com/DengPingFan/S-measure
def s_measure(gt, sm):
    """
    Computes the structural similarity (S-measure) between the saliency map and the ground truth.
    article: https://www.crcv.ucf.edu/papers/iccv17/1164.pdf
    original code [Matlab]: https://github.com/DengPingFan/S-measure

    Parameters
    ----------
    gt : numpy.ndarray
        The binarized ground truth map
    sm : numpy.ndarray
        The predicted saliency map

    Returns
    -------
    value : float
        The calculated S-measure
    """
    gt_mean = np.mean(gt)

    if gt_mean == 0:  # if the GT is completely black
        sm_mean = np.mean(sm)
        measure = 1.0 - sm_mean  # only calculate the area of the intersection
    elif gt_mean == 1:  # if the GT is completely white
        sm_mean = np.mean(sm)
        measure = sm_mean.copy()  # only calculate the area of the intersection
    else:
        alpha = 0.5
        measure = alpha * s_object(sm, gt) + (1 - alpha) * s_region(sm, gt)
        if measure < 0:
            measure = 0

    return measure


def ssim(gt, sm):
    gt = gt.astype(np.float32)

    height, width = sm.shape
    num_pixels = width * height

    # Compute the means of SM and GT
    sm_mean = np.mean(sm)
    gt_mean = np.mean(gt)

    # Compute the variances of SM and GT
    sigma_x2 = np.sum(np.sum((sm - sm_mean) ** 2)) / (num_pixels - 1 + eps)
    sigma_y2 = np.sum(np.sum((gt - gt_mean) ** 2)) / (num_pixels - 1 + eps)

    # Compute the covariance
    sigma_xy = np.sum(np.sum((sm - sm_mean) * (gt - gt_mean))) / (num_pixels - 1 + eps)

    alpha = 4 * sm_mean * gt_mean * sigma_xy
    beta = (sm_mean ** 2 + gt_mean ** 2) * (sigma_x2 + sigma_y2)

    if alpha != 0:
        ssim_value = alpha / (beta + eps)
    elif alpha == 0 and beta == 0:
        ssim_value = 1.0
    else:
        ssim_value = 0

    return ssim_value


def divide_sm(sm, x, y):
    # copy the 4 regions
    lt = sm[:y, :x]
    rt = sm[:y, x:]
    lb = sm[y:, :x]
    rb = sm[y:, x:]

    return lt, rt, lb, rb


def divide_gt(gt, x, y):
    height, width = gt.shape
    area = width * height

    # copy the 4 regions
    lt = gt[:y, :x]
    rt = gt[:y, x:]
    lb = gt[y:, :x]
    rb = gt[y:, x:]

    # the weight of each block is proportional to its share of the total area
    w1 = (x * y) / area
    w2 = ((width - x) * y) / area
    w3 = (x * (height - y)) / area
    w4 = 1.0 - w1 - w2 - w3

    return lt, rt, lb, rb, w1, w2, w3, w4


def centroid(gt):
    # compute the centroid of the map; fall back to the image center if it is empty
    rows, cols = gt.shape

    if np.sum(gt) == 0:
        x = np.round(cols / 2)
        y = np.round(rows / 2)
    else:
        total = np.sum(gt)
        i = np.arange(cols).reshape(1, cols) + 1
        j = np.arange(rows).reshape(rows, 1) + 1

        x = int(np.round(np.sum(np.sum(gt, 0, keepdims=True) * i) / total))
        y = int(np.round(np.sum(np.sum(gt, 1, keepdims=True) * j) / total))

    return x, y


def s_region(gt, sm):
    x, y = centroid(gt)
    gt_1, gt_2, gt_3, gt_4, w1, w2, w3, w4 = divide_gt(gt, x, y)

    sm_1, sm_2, sm_3, sm_4 = divide_sm(sm, x, y)

    q1 = ssim(sm_1, gt_1)
    q2 = ssim(sm_2, gt_2)
    q3 = ssim(sm_3, gt_3)
    q4 = ssim(sm_4, gt_4)

    region_value = w1 * q1 + w2 * q2 + w3 * q3 + w4 * q4

    return region_value


def object(gt, sm):
    x = np.mean(sm[gt == 1])
    # compute the standard deviation of the foreground or background in sm
    sigma_x = np.std(sm[gt == 1])
    score = 2.0 * x / (x ** 2 + 1.0 + sigma_x + eps)
    return score


def s_object(gt, sm):
    # compute the similarity of the foreground at the object level

    sm_fg = sm.copy()
    sm_fg[gt == 0] = 0
    o_fg = object(sm_fg, gt)

    # compute the similarity of the background
    sm_bg = 1.0 - sm.copy()
    sm_bg[gt == 1] = 0
    o_bg = object(sm_bg, gt == 0)

    u = np.mean(gt)
    object_value = u * o_fg + (1 - u) * o_bg
    return object_value


# Weighted F-measure
# article: https://ieeexplore.ieee.org/document/6909433
# Matlab code: https://cgm.technion.ac.il/Computer-Graphics-Multimedia/Software/FGEval/
def weighted_fmeasure(gt, sm, beta2=1):
    """
    Computes the Weighted F-measure between the saliency map and the ground truth.
    article: https://ieeexplore.ieee.org/document/6909433
    original code [Matlab]: https://cgm.technion.ac.il/Computer-Graphics-Multimedia/Software/FGEval/

    Parameters
    ----------
    gt : numpy.ndarray
        The binarized ground truth map
    sm : numpy.ndarray
        The predicted saliency map

    Returns
    -------
    value : float
        The calculated Weighted F-measure
    """
    dst, idx = distance_transform_edt(1 - gt, return_indices=True)

    raw_idx = idx[0][gt == 0]
    col_idx = idx[1][gt == 0]

    e = np.abs(sm - gt).astype(np.float32)
    et = np.abs(sm - gt).astype(np.float32)

    et[gt == 0] = et[raw_idx, col_idx]

    k = matlab_style_gauss2d(shape=(7, 7), sigma=5)

    ea = correlate(et.astype(np.float32), k, mode='constant')
    min_e_ea = np.abs(sm - gt).astype(np.float32)

    min_e_ea[gt * (ea < e) == 1] = ea[gt * (ea < e) == 1]

    b = np.ones_like(gt).astype(np.float32)
    b[gt == 0] = 2 - 1 * np.exp(np.log(1 - 0.5) / 5. * dst[gt == 0])

    ew = min_e_ea * b
    tpw = np.sum(gt) - np.sum(ew[gt == 1])
    fpw = np.sum(ew[gt == 0])

    rec = 1 - np.mean(ew[gt == 1])   # weighted recall
    prec = tpw / (eps + tpw + fpw)   # weighted precision

    value = (1 + beta2) * (rec * prec) / (eps + (beta2 * rec) + prec)
    return value


def matlab_style_gauss2d(shape=(3, 3), sigma=0.5):
    """
    2D gaussian mask - should give the same result as MATLAB's
    fspecial('gaussian', [shape], [sigma])
    """
    m, n = [(ss - 1.) / 2. for ss in shape]
    y, x = np.ogrid[-m:m + 1, -n:n + 1]
    h = np.exp(-(x * x + y * y) / (2. * sigma * sigma))
    h[h < np.finfo(h.dtype).eps * h.max()] = 0
    sumh = h.sum()
    if sumh != 0:
        h /= sumh
    return h


# Adaptive F-measure

def adaptive_fmeasure(gt, sm, beta):
    """
    Computes the Adaptive F-measure between the saliency map and the ground truth,
    using the adaptive binarization proposed in:
    https://ieeexplore.ieee.org/document/5206596

    Parameters
    ----------
    gt : numpy.ndarray
        The binarized ground truth map
    sm : numpy.ndarray
        The predicted saliency map

    Returns
    -------
    value : float
        The calculated Adaptive F-measure
    """
    gt_idx = np.where(gt > 0)
    gt_cnt = np.sum(gt)

    if gt_cnt == 0:
        # NOTE: with an empty ground truth, prec/recall stay empty lists and the
        # final formula below is undefined; callers are expected to pass non-empty GTs
        prec = []
        recall = []
    else:
        adaptive_threshold = 2 * np.mean(sm)
        if adaptive_threshold > 1:
            adaptive_threshold = 1
        sm_binary = (sm >= adaptive_threshold).astype(np.float32)
        hit_cnt = np.sum(sm_binary[gt_idx])
        alg_cnt = np.sum(sm_binary)

        if hit_cnt == 0:
            prec = 0
            recall = 0
        else:
            prec = hit_cnt / (alg_cnt + eps)
            recall = hit_cnt / gt_cnt
    value = (1 + beta ** 2) * prec * recall / ((beta ** 2 * prec + recall) + eps)
    return value


def prec_recall(gt, sm, num_th):
    """
    Computes the precision and recall of the saliency map against the ground truth
    at num_th evenly spaced thresholds, following the protocol of:
    https://ieeexplore.ieee.org/document/5206596
    The results are used to calculate the Max-F measure and to plot PR and F-threshold curves.

    Parameters
    ----------
    gt : numpy.ndarray
        The binarized ground truth map
    sm : numpy.ndarray
        The predicted saliency map
    num_th : integer
        The total number of thresholds between 0 and 1

    Returns
    -------
    prec, recall: numpy.ndarray
        The calculated Precision and Recall (shape: (num_th, 1))
    """
    gt_idx = np.where(gt > 0)
    gt_cnt = np.sum(gt)

    if gt_cnt == 0:
        prec = []
        recall = []
    else:
        hit_cnt = np.zeros((num_th, 1), np.float32)
        alg_cnt = np.zeros((num_th, 1), np.float32)
        thresholds = np.linspace(0, 1, num_th)
        for k, curTh in enumerate(thresholds):
            sm_binary = (sm >= curTh).astype(np.float32)
            hit_cnt[k] = np.sum(sm_binary[gt_idx])
            alg_cnt[k] = np.sum(sm_binary)

        prec = hit_cnt / (alg_cnt + eps)
        recall = hit_cnt / gt_cnt

    return prec, recall
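
Hypothetical usage of the toolbox (directory names are placeholders; each returned measure is a scalar, except the Precision/Recall/Fmeasure_all_thresholds arrays that accompany Max-F):

if __name__ == '__main__':
    res = calculate_measures(
        gt_dir='IS_Net/DIS5K/DIS5K-test/gt',
        sm_dir='your_results_dir',  # placeholder
        measures=['MAE', 'S-measure', 'Max-F'],
        n_thread=4)
    print('MAE: %.4f  S: %.4f  maxF: %.4f' % (res['MAE'], res['S-measure'], res['Max-F']))
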
IS_Net/swd_optim/__init__.py
ADDED
@@ -0,0 +1,10 @@
from .adai import Adai
from .adais import AdaiS
from .adams import AdamS
from .sgds import SGDS

# Remove the submodule names so only the optimizer classes are exported.
del adai
del adais
del adams
del sgds
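
With the package laid out this way, callers import the optimizer classes directly from swd_optim (the import root depends on how IS_Net is placed on sys.path):

from swd_optim import Adai, AdaiS, AdamS, SGDS  # assuming the IS_Net directory is on sys.path
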
IS_Net/swd_optim/adai.py
ADDED
@@ -0,0 +1,116 @@
import torch
from torch.optim.optimizer import Optimizer, required


class Adai(Optimizer):
    r"""Implements the Adaptive Inertia Estimation (Adai) algorithm.
    It has been proposed in
    `Adai: Separating the Effects of Adaptive Learning Rate and Momentum Inertia`__.

    Arguments:
        params (iterable): iterable of parameters to optimize or dicts defining
            parameter groups
        lr (float): learning rate
        betas (Tuple[float, float], optional): beta0 and beta2 (default: (0.1, 0.99))
        eps (float, optional): the inertia bound (default: 1e-03)
        weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
    """

    def __init__(self, params, lr=required, betas=(0.1, 0.99), eps=1e-03,
                 weight_decay=0):
        if lr is not required and lr < 0.0:
            raise ValueError("Invalid learning rate: {}".format(lr))
        if not 0.0 <= eps:
            raise ValueError("Invalid epsilon value: {}".format(eps))
        if not 0.0 <= betas[0]:
            raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0]))
        if not 0.0 <= betas[1] < 1.0:
            raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1]))
        if not 0.0 <= weight_decay:
            raise ValueError("Invalid weight_decay value: {}".format(weight_decay))
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        super(Adai, self).__init__(params, defaults)

    def __setstate__(self, state):
        super(Adai, self).__setstate__(state)

    @torch.no_grad()
    def step(self, closure=None):
        """Performs a single optimization step.

        Arguments:
            closure (callable, optional): A closure that reevaluates the model
                and returns the loss.
        """
        loss = None
        if closure is not None:
            loss = closure()

        param_size = 0
        exp_avg_sq_hat_sum = 0.

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                param_size += p.numel()
                grad = p.grad.data

                state = self.state[p]

                # State initialization
                if len(state) == 0:
                    state['step'] = 0
                    # Exponential moving average of gradient values
                    state['exp_avg'] = torch.zeros_like(p.data, memory_format=torch.preserve_format)
                    # Exponential moving average of squared gradient values
                    state['exp_avg_sq'] = torch.zeros_like(p.data, memory_format=torch.preserve_format)
                    # Cumulative products of beta1
                    state['beta1_prod'] = torch.ones_like(p.data, memory_format=torch.preserve_format)

                state['step'] += 1

                exp_avg_sq = state['exp_avg_sq']
                beta0, beta2 = group['betas']

                bias_correction2 = 1 - beta2 ** state['step']

                if group['weight_decay'] != 0:
                    grad.add_(p.data, alpha=group['weight_decay'])

                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

                exp_avg_sq_hat_sum += exp_avg_sq.sum() / bias_correction2

        # Calculate the mean of all elements in exp_avg_sq_hat
        exp_avg_sq_hat_mean = exp_avg_sq_hat_sum / param_size

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.data

                state = self.state[p]

                exp_avg = state['exp_avg']
                exp_avg_sq = state['exp_avg_sq']
                beta1_prod = state['beta1_prod']
                beta0, beta2 = group['betas']

                bias_correction2 = 1 - beta2 ** state['step']

                exp_avg_sq_hat = exp_avg_sq / bias_correction2
                # Per-parameter inertia: a small second moment yields a large beta1 (more inertia)
                beta1 = (1. - (exp_avg_sq_hat / exp_avg_sq_hat_mean).mul(beta0)).clamp(0., 1 - group['eps'])

                beta1_prod.mul_(beta1)
                bias_correction1 = 1 - beta1_prod

                exp_avg.mul_(beta1).addcmul_(1 - beta1, grad)
                exp_avg_hat = exp_avg / bias_correction1

                step_size = group['lr']
                p.data.add_(exp_avg_hat, alpha=-step_size)

        return loss
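
A drop-in usage sketch (the model and hyperparameters are illustrative placeholders):

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
opt = Adai(model.parameters(), lr=0.1, betas=(0.1, 0.99), eps=1e-3, weight_decay=5e-4)
loss = model(torch.randn(8, 10)).pow(2).mean()
loss.backward()
opt.step()
opt.zero_grad()
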
IS_Net/swd_optim/adais.py
ADDED
@@ -0,0 +1,120 @@
import torch
from torch.optim.optimizer import Optimizer, required


class AdaiS(Optimizer):
    r"""Implements Adai with stable/decoupled weight decay (AdaiS/AdaiW).
    It is based on
    `Adai: Separating the Effects of Adaptive Learning Rate and Momentum Inertia`
    and
    `Stable Weight Decay Regularization`__.

    Arguments:
        params (iterable): iterable of parameters to optimize or dicts defining
            parameter groups
        lr (float, optional): learning rate
        betas (Tuple[float, float], optional): beta0 and beta2 (default: (0.1, 0.99))
        eps (float, optional): the inertia bound (default: 1e-03)
        weight_decay (float, optional): weight decay (default: 0)
    """

    def __init__(self, params, lr=required, betas=(0.1, 0.99), eps=1e-03,
                 weight_decay=0):
        if lr is not required and lr < 0.0:
            raise ValueError("Invalid learning rate: {}".format(lr))
        if not 0.0 <= eps:
            raise ValueError("Invalid epsilon value: {}".format(eps))
        if not 0.0 <= betas[0]:
            raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0]))
        if not 0.0 <= betas[1] < 1.0:
            raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1]))
        if not 0.0 <= weight_decay:
            raise ValueError("Invalid weight_decay value: {}".format(weight_decay))
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        super(AdaiS, self).__init__(params, defaults)

    def __setstate__(self, state):
        super(AdaiS, self).__setstate__(state)

    @torch.no_grad()
    def step(self, closure=None):
        """Performs a single optimization step.

        Arguments:
            closure (callable, optional): A closure that reevaluates the model
                and returns the loss.
        """
        loss = None
        if closure is not None:
            loss = closure()

        param_size = 0
        exp_avg_sq_hat_sum = 0.
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                param_size += p.numel()
                grad = p.grad.data

                state = self.state[p]

                # State initialization
                if len(state) == 0:
                    state['step'] = 0
                    # Exponential moving average of gradient values
                    state['exp_avg'] = torch.zeros_like(p.data, memory_format=torch.preserve_format)
                    # Exponential moving average of squared gradient values
                    state['exp_avg_sq'] = torch.zeros_like(p.data, memory_format=torch.preserve_format)
                    # Cumulative products of beta1
                    state['beta1_prod'] = torch.ones_like(p.data, memory_format=torch.preserve_format)

                exp_avg_sq = state['exp_avg_sq']
                beta0, beta2 = group['betas']

                state['step'] += 1
                bias_correction2 = 1 - beta2 ** state['step']

                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

                exp_avg_sq_hat = exp_avg_sq / bias_correction2

                exp_avg_sq_hat_sum += exp_avg_sq_hat.sum()

        # Calculate the mean of all elements in exp_avg_sq_hat
        exp_avg_sq_hat_mean = exp_avg_sq_hat_sum / param_size

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.data

                # Perform stable/decoupled weight decay
                if group['weight_decay'] != 0:
                    p.data.mul_(1 - group['lr'] * group['weight_decay'])

                state = self.state[p]

                exp_avg = state['exp_avg']
                exp_avg_sq = state['exp_avg_sq']
                beta0, beta2 = group['betas']
                beta1_prod = state['beta1_prod']
                bias_correction2 = 1 - beta2 ** state['step']

                exp_avg_sq_hat = exp_avg_sq / bias_correction2

                beta1 = (1. - (exp_avg_sq_hat / exp_avg_sq_hat_mean).mul(beta0)).clamp(0., 1 - group['eps'])

                beta1_prod.mul_(beta1)
                bias_correction1 = 1 - beta1_prod

                exp_avg.mul_(beta1).addcmul_(1 - beta1, grad)
                exp_avg_hat = exp_avg.div(bias_correction1)

                step_size = group['lr']
                p.data.add_(exp_avg_hat, alpha=-step_size)

        return loss
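
Functionally, AdaiS differs from Adai only in where weight decay enters the update; the per-parameter inertia (beta1) machinery is identical. A schematic comparison (construction reuses the hypothetical model from the Adai sketch above):

# Adai : grad <- grad + weight_decay * p       (L2 penalty folded into the gradient)
# AdaiS: p    <- p * (1 - lr * weight_decay)   (decoupled decay applied to the weights)
opt = AdaiS(model.parameters(), lr=0.1, betas=(0.1, 0.99), weight_decay=5e-4)
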
IS_Net/swd_optim/adams.py
ADDED
@@ -0,0 +1,137 @@
import math
import torch
from torch.optim.optimizer import Optimizer


class AdamS(Optimizer):
    r"""Implements the Adam with stable weight decay (AdamS) algorithm.
    It has been proposed in
    `Stable Weight Decay Regularization`__.

    Arguments:
        params (iterable): iterable of parameters to optimize or dicts defining
            parameter groups
        lr (float, optional): learning rate (default: 1e-3)
        betas (Tuple[float, float], optional): coefficients used for computing
            running averages of gradient and its square (default: (0.9, 0.999))
        eps (float, optional): term added to the denominator to improve
            numerical stability (default: 1e-8)
        weight_decay (float, optional): weight decay coefficient (default: 1e-4)
        amsgrad (boolean, optional): whether to use the AMSGrad variant of this
            algorithm from the paper `On the Convergence of Adam and Beyond`_
            (default: False)
    """

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 weight_decay=1e-4, amsgrad=False):
        if not 0.0 <= lr:
            raise ValueError("Invalid learning rate: {}".format(lr))
        if not 0.0 <= eps:
            raise ValueError("Invalid epsilon value: {}".format(eps))
        if not 0.0 <= betas[0] < 1.0:
            raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0]))
        if not 0.0 <= betas[1] < 1.0:
            raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1]))
        if not 0.0 <= weight_decay:
            raise ValueError("Invalid weight_decay value: {}".format(weight_decay))
        defaults = dict(lr=lr, betas=betas, eps=eps,
                        weight_decay=weight_decay, amsgrad=amsgrad)
        super(AdamS, self).__init__(params, defaults)

    def __setstate__(self, state):
        super(AdamS, self).__setstate__(state)
        for group in self.param_groups:
            group.setdefault('amsgrad', False)

    @torch.no_grad()
    def step(self, closure=None):
        """Performs a single optimization step.

        Arguments:
            closure (callable, optional): A closure that reevaluates the model
                and returns the loss.
        """
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        param_size = 0
        exp_avg_sq_hat_sum = 0.

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                param_size += p.numel()

                # Perform optimization step
                grad = p.grad
                if grad.is_sparse:
                    raise RuntimeError('AdamS does not support sparse gradients')
                amsgrad = group['amsgrad']

                state = self.state[p]

                # State initialization
                if len(state) == 0:
                    state['step'] = 0
                    # Exponential moving average of gradient values
                    state['exp_avg'] = torch.zeros_like(p, memory_format=torch.preserve_format)
                    # Exponential moving average of squared gradient values
                    state['exp_avg_sq'] = torch.zeros_like(p, memory_format=torch.preserve_format)
                    if amsgrad:
                        # Maintains max of all exp. moving avg. of sq. grad. values
                        state['max_exp_avg_sq'] = torch.zeros_like(p, memory_format=torch.preserve_format)

                beta1, beta2 = group['betas']
                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']

                state['step'] += 1
                bias_correction2 = 1 - beta2 ** state['step']

                # Decay the first and second moment running average coefficients
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
                if amsgrad:
                    max_exp_avg_sq = state['max_exp_avg_sq']
                    # Maintains the maximum of all 2nd moment running averages so far
                    torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq)
                    # Use the max. for normalizing the running avg. of the gradient
                    exp_avg_sq_hat = max_exp_avg_sq / bias_correction2
                else:
                    exp_avg_sq_hat = exp_avg_sq / bias_correction2

                exp_avg_sq_hat_sum += exp_avg_sq_hat.sum()

        # Calculate the sqrt of the mean of all elements in exp_avg_sq_hat
        exp_avg_mean_sqrt = math.sqrt(exp_avg_sq_hat_sum / param_size)

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue

                state = self.state[p]

                # Perform stable weight decay
                if group['weight_decay'] != 0:
                    p.data.mul_(1 - group['weight_decay'] * group['lr'] / exp_avg_mean_sqrt)

                beta1, beta2 = group['betas']
                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                bias_correction1 = 1 - beta1 ** state['step']
                bias_correction2 = 1 - beta2 ** state['step']

                if amsgrad:
                    max_exp_avg_sq = state['max_exp_avg_sq']
                    exp_avg_sq_hat = max_exp_avg_sq / bias_correction2
                else:
                    exp_avg_sq_hat = exp_avg_sq / bias_correction2

                denom = exp_avg_sq_hat.sqrt().add(group['eps'])

                step_size = group['lr'] / bias_correction1
                p.addcdiv_(exp_avg, denom, value=-step_size)

        return loss
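
The "stable" part of AdamS is that the decoupled decay is divided by exp_avg_mean_sqrt, the square root of the mean bias-corrected second moment over all parameters, so schematically p <- p * (1 - lr * wd / sqrt(mean(v_hat))) and the effective decay strength tracks the current gradient scale. Illustrative construction (model as in the earlier Adai sketch):

opt = AdamS(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-4)
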
IS_Net/swd_optim/sgds.py
ADDED
@@ -0,0 +1,82 @@
import torch
from torch.optim.optimizer import Optimizer, required


class SGDS(Optimizer):
    r"""Implements stochastic gradient descent with stable weight decay (SGDS).
    It has been proposed in
    `Stable Weight Decay Regularization`__.

    Args:
        params (iterable): iterable of parameters to optimize or dicts defining
            parameter groups
        lr (float): learning rate
        momentum (float, optional): momentum factor (default: 0)
        weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
        dampening (float, optional): dampening for momentum (default: 0)
        nesterov (bool, optional): enables Nesterov momentum (default: False)
    """

    def __init__(self, params, lr=required, momentum=0, dampening=0,
                 weight_decay=0, nesterov=False):
        if lr is not required and lr < 0.0:
            raise ValueError("Invalid learning rate: {}".format(lr))
        if momentum < 0.0:
            raise ValueError("Invalid momentum value: {}".format(momentum))
        if weight_decay < 0.0:
            raise ValueError("Invalid weight_decay value: {}".format(weight_decay))

        defaults = dict(lr=lr, momentum=momentum, dampening=dampening,
                        weight_decay=weight_decay, nesterov=nesterov)
        if nesterov and (momentum <= 0 or dampening != 0):
            raise ValueError("Nesterov momentum requires a momentum and zero dampening")
        super(SGDS, self).__init__(params, defaults)

    def __setstate__(self, state):
        super(SGDS, self).__setstate__(state)
        for group in self.param_groups:
            group.setdefault('nesterov', False)

    @torch.no_grad()
    def step(self, closure=None):
        """Performs a single optimization step.

        Arguments:
            closure (callable, optional): A closure that reevaluates the model
                and returns the loss.
        """
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group in self.param_groups:
            momentum = group['momentum']
            dampening = group['dampening']
            nesterov = group['nesterov']

            for p in group['params']:
                if p.grad is None:
                    continue
                d_p = p.grad

                # Perform stable weight decay
                if group['weight_decay'] != 0:
                    bias_correction = (1 - dampening) / (1 - momentum)
                    p.data.mul_(1 - bias_correction * group['lr'] * group['weight_decay'])

                if momentum != 0:
                    param_state = self.state[p]
                    if 'momentum_buffer' not in param_state:
                        buf = param_state['momentum_buffer'] = torch.clone(d_p).detach()
                    else:
                        buf = param_state['momentum_buffer']
                        buf.mul_(momentum).add_(d_p, alpha=1 - dampening)
                    if nesterov:
                        d_p = d_p.add(buf, alpha=momentum)
                    else:
                        d_p = buf

                p.add_(d_p, alpha=-group['lr'])

        return loss
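
Here the decay is rescaled by (1 - dampening) / (1 - momentum), which is the steady-state gain of the momentum buffer, so SGDS with momentum decays weights at a rate comparable to plain SGD. Illustrative construction (model as in the earlier Adai sketch):

opt = SGDS(model.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)
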
IS_Net/train_valid_inference_main.py
ADDED
@@ -0,0 +1,729 @@
1 |
+
import os
|
2 |
+
import time
|
3 |
+
import numpy as np
|
4 |
+
from skimage import io
|
5 |
+
import time
|
6 |
+
import matplotlib.pyplot as plt
|
7 |
+
import torch, gc
|
8 |
+
import torch.nn as nn
|
9 |
+
from torch.autograd import Variable
|
10 |
+
import torch.optim as optim
|
11 |
+
import torch.nn.functional as F
|
12 |
+
from data_loader import get_im_gt_name_dict, create_dataloaders, GOSRandomHFlip, GOSResize, GOSRandomCrop, GOSNormalize #GOSDatasetCache,
|
13 |
+
# from data_loader_cache import get_im_gt_name_dict, create_dataloaders, GOSRandomHFlip, GOSResize, GOSRandomCrop, GOSNormalize #GOSDatasetCache,
|
14 |
+
from basics import f1_mae_torch #normPRED, GOSPRF1ScoresCache,f1score_torch,
|
15 |
+
from models.isnet import ISNetGTEncoder, ISNetDIS
|
16 |
+
from torch.cuda.amp import autocast, GradScaler
|
17 |
+
from datalist import *
|
18 |
+
device = 'cuda' if torch.cuda.is_available() else 'cpu'
|
19 |
+
|
20 |
+
def get_gt_encoder(train_dataloaders, train_datasets, valid_dataloaders, valid_datasets, hypar, train_dataloaders_val, train_datasets_val): #model_path, model_save_fre, max_ite=1000000):
|
21 |
+
|
22 |
+
torch.manual_seed(hypar["seed"])
|
23 |
+
if torch.cuda.is_available():
|
24 |
+
torch.cuda.manual_seed(hypar["seed"])
|
25 |
+
|
26 |
+
print("define gt encoder ...")
|
27 |
+
net = ISNetGTEncoder() #UNETGTENCODERCombine()
|
28 |
+
# if(hypar["model_digit"]=="half"):
|
29 |
+
# net.half()
|
30 |
+
## load the existing model gt encoder
|
31 |
+
if(hypar["gt_encoder_model"]!=""):
|
32 |
+
model_path = hypar["model_path"]+"/"+hypar["gt_encoder_model"]
|
33 |
+
if torch.cuda.is_available():
|
34 |
+
net.load_state_dict(torch.load(model_path))
|
35 |
+
net.cuda()
|
36 |
+
else:
|
37 |
+
net.load_state_dict(torch.load(model_path,map_location="cpu"))
|
38 |
+
print("gt encoder restored from the saved weights ...")
|
39 |
+
return net ############
|
40 |
+
|
41 |
+
if torch.cuda.is_available():
|
42 |
+
net.cuda()
|
43 |
+
|
44 |
+
print("--- define optimizer for GT Encoder---")
|
45 |
+
# optimizer = lion.Lion(net.parameters(), lr=1e-4, betas=(0.9, 0.99))
|
46 |
+
optimizer = optim.AdamW(net.parameters(), lr=1e-4, betas=(0.9, 0.999), eps=1e-8, weight_decay=0)
|
47 |
+
# optimizer = optim.SGD(net.parameters(), lr=1e-4)
|
48 |
+
|
49 |
+
model_path = hypar["model_path"]
|
50 |
+
model_save_fre = hypar["model_save_fre"]
|
51 |
+
max_ite = hypar["max_ite"]
|
52 |
+
batch_size_train = hypar["batch_size_train"]
|
53 |
+
batch_size_valid = hypar["batch_size_valid"]
|
54 |
+
|
55 |
+
if(not os.path.exists(model_path)):
|
56 |
+
os.mkdir(model_path)
|
57 |
+
|
58 |
+
ite_num = hypar["start_ite"] # count the total iteration number
|
59 |
+
ite_num4val = 0 #
|
60 |
+
running_loss = 0.0 # count the toal loss
|
61 |
+
running_tar_loss = 0.0 # count the target output loss
|
62 |
+
last_f1 = [0 for x in range(len(valid_dataloaders))]
|
63 |
+
|
64 |
+
train_num = train_datasets[0].__len__()
|
65 |
+
|
66 |
+
net.train()
|
67 |
+
|
68 |
+
start_last = time.time()
|
69 |
+
gos_dataloader = train_dataloaders[0]
|
70 |
+
epoch_num = hypar["max_epoch_num"]
|
71 |
+
notgood_cnt = 0
|
72 |
+
for epoch in range(epoch_num): ## set the epoch num as 100000
|
73 |
+
|
74 |
+
for i, data in enumerate(gos_dataloader):
|
75 |
+
|
76 |
+
if(ite_num >= max_ite):
|
77 |
+
print("Training Reached the Maximal Iteration Number ", max_ite)
|
78 |
+
exit()
|
79 |
+
|
80 |
+
# start_read = time.time()
|
81 |
+
ite_num = ite_num + 1
|
82 |
+
ite_num4val = ite_num4val + 1
|
83 |
+
|
84 |
+
# get the inputs
|
85 |
+
labels = data['label']
|
86 |
+
|
87 |
+
if(hypar["model_digit"]=="full"):
|
88 |
+
labels = labels.type(torch.FloatTensor)
|
89 |
+
else:
|
90 |
+
labels = labels.type(torch.HalfTensor)
|
91 |
+
|
92 |
+
# wrap them in Variable
|
93 |
+
if torch.cuda.is_available():
|
94 |
+
labels_v = Variable(labels.cuda(), requires_grad=False)
|
95 |
+
else:
|
96 |
+
labels_v = Variable(labels, requires_grad=False)
|
97 |
+
|
98 |
+
# print("time lapse for data preparation: ", time.time()-start_read, ' s')
|
99 |
+
|
100 |
+
# zero the parameter gradients
|
101 |
+
start_inf_loss_back = time.time()
|
102 |
+
optimizer.zero_grad()
|
103 |
+
|
104 |
+
# plt.imshow(labels_v[0][0].cpu(),cmap='gray')
|
105 |
+
# plt.show()
|
106 |
+
# with autocast():
|
107 |
+
ds, fs = net(labels_v)#net(inputs_v)
|
108 |
+
loss2, loss = net.compute_loss(ds, labels_v)
|
109 |
+
# scaler.scale(loss).backward()
|
110 |
+
# loss.backward()
|
111 |
+
# scaler.step(optimizer)
|
112 |
+
# scaler.update()
|
113 |
+
#ORTHO Loss
|
114 |
+
reg = 1e-8
|
115 |
+
orth_loss = torch.zeros(1).to(device)
|
116 |
+
for name, param in net.named_parameters():
|
117 |
+
if 'bias' not in name:
|
118 |
+
param_flat = param.view(param.shape[0], -1)
|
119 |
+
sym = torch.mm(param_flat, torch.t(param_flat))
|
120 |
+
sym -= torch.eye(param_flat.shape[0]).to(param.device)
|
121 |
+
orth_loss = orth_loss + (reg * sym.abs().sum())
|
122 |
+
loss = loss + orth_loss
|
123 |
+
loss.backward()
|
124 |
+
optimizer.step()
|
125 |
+
|
126 |
+
running_loss += loss.item()
|
127 |
+
running_tar_loss += loss2.item()
|
128 |
+
|
129 |
+
# del outputs, loss
|
130 |
+
del ds, loss2, loss
|
131 |
+
end_inf_loss_back = time.time()-start_inf_loss_back
|
132 |
+
|
133 |
+
print("GT Encoder Training>>>"+model_path.split('/')[-1]+" - [epoch: %3d/%3d, batch: %5d/%5d, ite: %d] train loss: %3f, tar: %3f, time-per-iter: %3f s, time_read: %3f" % (
|
134 |
+
epoch + 1, epoch_num, (i + 1) * batch_size_train, train_num, ite_num, running_loss / ite_num4val, running_tar_loss / ite_num4val, time.time()-start_last, time.time()-start_last-end_inf_loss_back))
|
135 |
+
start_last = time.time()
|
136 |
+
|
137 |
+
if ite_num % model_save_fre == 0: # validate every model_save_fre iterations
|
138 |
+
notgood_cnt += 1
|
139 |
+
net.eval()
|
140 |
+
tmp_f1, tmp_mae, val_loss, tar_loss, i_val, tmp_time = valid_gt_encoder(net, valid_dataloaders, valid_datasets, hypar, epoch)
|
141 |
+
# tmp_f1, tmp_mae, val_loss, tar_loss, i_val, tmp_time = valid_gt_encoder(net, train_dataloaders_val, train_datasets_val, hypar, epoch)
|
142 |
+
|
143 |
+
net.train() # resume train
|
144 |
+
|
145 |
+
tmp_out = 0
|
146 |
+
print("last_f1:",last_f1,np.mean(last_f1))
|
147 |
+
print("tmp_f1:",tmp_f1,np.mean(tmp_f1))
|
148 |
+
# for fi in range(len(last_f1)):
|
149 |
+
if(np.mean(tmp_f1)>np.mean(last_f1)):
|
150 |
+
tmp_out = 1
|
151 |
+
print("tmp_out:",tmp_out)
|
152 |
+
if(tmp_out):
|
153 |
+
notgood_cnt = 0
|
154 |
+
last_f1 = tmp_f1
|
155 |
+
tmp_f1_str = [str(round(f1x,4)) for f1x in tmp_f1]
|
156 |
+
tmp_mae_str = [str(round(mx,4)) for mx in tmp_mae]
|
157 |
+
maxf1 = '_'.join(tmp_f1_str)
|
158 |
+
meanM = '_'.join(tmp_mae_str)
|
159 |
+
# .cpu().detach().numpy()
|
160 |
+
model_name = "/GTENCODER-gpu_itr_"+str(ite_num)+\
|
161 |
+
"_traLoss_"+str(np.round(running_loss / ite_num4val,4))+\
|
162 |
+
"_traTarLoss_"+str(np.round(running_tar_loss / ite_num4val,4))+\
|
163 |
+
"_valLoss_"+str(np.round(val_loss /(i_val+1),4))+\
|
164 |
+
"_valTarLoss_"+str(np.round(tar_loss /(i_val+1),4)) + \
|
165 |
+
"_maxF1_" + maxf1 + \
|
166 |
+
"_mae_" + meanM + \
|
167 |
+
"_time_" + str(np.round(np.mean(np.array(tmp_time))/batch_size_valid,6))+".pth"
|
168 |
+
torch.save(net.state_dict(), model_path + model_name)
|
169 |
+
|
170 |
+
running_loss = 0.0
|
171 |
+
running_tar_loss = 0.0
|
172 |
+
ite_num4val = 0
|
173 |
+
|
174 |
+
if(np.mean(tmp_f1)>0.99):
|
175 |
+
print("GT encoder is well-trained and obtained...")
|
176 |
+
return net
|
177 |
+
|
178 |
+
if(notgood_cnt >= hypar["early_stop"]):
|
179 |
+
print("No improvements in the last "+str(notgood_cnt)+" validation periods, so training stopped !")
|
180 |
+
exit()
|
181 |
+
print("Training Reaches The Maximum Epoch Number")
|
182 |
+
return net
|
183 |
+
|
184 |
+
def valid_gt_encoder(net, valid_dataloaders, valid_datasets, hypar, epoch=0):
|
185 |
+
net.eval()
|
186 |
+
print("Validating...")
|
187 |
+
epoch_num = hypar["max_epoch_num"]
|
188 |
+
|
189 |
+
val_loss = 0.0
|
190 |
+
tar_loss = 0.0
|
191 |
+
|
192 |
+
|
193 |
+
tmp_f1 = []
|
194 |
+
tmp_mae = []
|
195 |
+
tmp_time = []
|
196 |
+
|
197 |
+
start_valid = time.time()
|
198 |
+
for k in range(len(valid_dataloaders)):
|
199 |
+
|
200 |
+
valid_dataloader = valid_dataloaders[k]
|
201 |
+
valid_dataset = valid_datasets[k]
|
202 |
+
|
203 |
+
val_num = valid_dataset.__len__()
|
204 |
+
mybins = np.arange(0,256)
|
205 |
+
PRE = np.zeros((val_num,len(mybins)-1))
|
206 |
+
REC = np.zeros((val_num,len(mybins)-1))
|
207 |
+
F1 = np.zeros((val_num,len(mybins)-1))
|
208 |
+
MAE = np.zeros((val_num))
|
209 |
+
|
210 |
+
val_cnt = 0.0
|
211 |
+
i_val = None
|
212 |
+
|
213 |
+
for i_val, data_val in enumerate(valid_dataloader):
|
214 |
+
|
215 |
+
# imidx_val, inputs_val, labels_val, shapes_val = data_val['imidx'], data_val['image'], data_val['label'], data_val['shape']
|
216 |
+
imidx_val, labels_val, shapes_val = data_val['imidx'], data_val['label'], data_val['shape']
|
217 |
+
if(hypar["model_digit"]=="full"):
|
218 |
+
labels_val = labels_val.type(torch.FloatTensor)
|
219 |
+
else:
|
220 |
+
labels_val = labels_val.type(torch.HalfTensor)
|
221 |
+
|
222 |
+
# wrap them in Variable
|
223 |
+
if torch.cuda.is_available():
|
224 |
+
labels_val_v = Variable(labels_val.cuda(), requires_grad=False)
|
225 |
+
else:
|
226 |
+
labels_val_v = Variable(labels_val,requires_grad=False)
|
227 |
+
# with autocast():
|
228 |
+
t_start = time.time()
|
229 |
+
ds_val = net(labels_val_v)[0]
|
230 |
+
t_end = time.time()-t_start
|
231 |
+
tmp_time.append(t_end)
|
232 |
+
|
233 |
+
# loss2_val, loss_val = muti_loss_fusion(ds_val, labels_val_v)
|
234 |
+
loss2_val, loss_val = net.compute_loss(ds_val, labels_val_v)
|
235 |
+
|
236 |
+
# compute F measure
|
237 |
+
for t in range(hypar["batch_size_valid"]):
|
238 |
+
val_cnt = val_cnt + 1.0
|
239 |
+
print("num of val: ", val_cnt)
|
240 |
+
i_test = imidx_val[t].data.numpy()
|
241 |
+
|
242 |
+
pred_val = ds_val[0][t,:,:,:].float() # B x 1 x H x W
|
243 |
+
|
244 |
+
## recover the prediction spatial size to the original image size
|
245 |
+
pred_val = torch.squeeze(F.interpolate(torch.unsqueeze(pred_val,0),(shapes_val[t][0],shapes_val[t][1]),mode='bilinear')) # F.upsample is deprecated; F.interpolate is the drop-in replacement
|
246 |
+
|
247 |
+
ma = torch.max(pred_val)
|
248 |
+
mi = torch.min(pred_val)
|
249 |
+
pred_val = (pred_val-mi)/(ma-mi) # max = 1
|
250 |
+
# pred_val = normPRED(pred_val)
|
251 |
+
|
252 |
+
gt = np.squeeze(io.imread(valid_dataset.dataset["ori_gt_path"][i_test])) # max = 255
|
253 |
+
if gt.max()==1:
|
254 |
+
gt=gt*255
|
255 |
+
with torch.no_grad():
|
256 |
+
gt = torch.tensor(gt).to(device)
|
257 |
+
|
258 |
+
pre,rec,f1,mae = f1_mae_torch(pred_val*255, gt, valid_dataset, i_test, mybins, hypar)
|
259 |
+
|
260 |
+
PRE[i_test,:]=pre
|
261 |
+
REC[i_test,:] = rec
|
262 |
+
F1[i_test,:] = f1
|
263 |
+
MAE[i_test] = mae
|
264 |
+
|
265 |
+
del ds_val, gt
|
266 |
+
gc.collect()
|
267 |
+
torch.cuda.empty_cache()
|
268 |
+
|
269 |
+
# if(loss_val.data[0]>1):
|
270 |
+
val_loss += loss_val.item()#data[0]
|
271 |
+
tar_loss += loss2_val.item()#data[0]
|
272 |
+
|
273 |
+
print("[validating: %5d/%5d] val_ls:%f, tar_ls: %f, f1: %f, mae: %f, time: %f"% (i_val, val_num, val_loss / (i_val + 1), tar_loss / (i_val + 1), np.amax(F1[i_test,:]), MAE[i_test],t_end))
|
274 |
+
|
275 |
+
del loss2_val, loss_val
|
276 |
+
|
277 |
+
print('============================')
|
278 |
+
PRE_m = np.mean(PRE,0)
|
279 |
+
REC_m = np.mean(REC,0)
|
280 |
+
f1_m = (1+0.3)*PRE_m*REC_m/(0.3*PRE_m+REC_m+1e-8)
|
281 |
+
# print('--------------:', np.mean(f1_m))
|
282 |
+
tmp_f1.append(np.amax(f1_m))
|
283 |
+
tmp_mae.append(np.mean(MAE))
|
284 |
+
print("The max F1 Score: %f"%(np.max(f1_m)))
|
285 |
+
print("MAE: ", np.mean(MAE))
|
286 |
+
|
287 |
+
# print('[epoch: %3d/%3d, ite: %5d] tra_ls: %3f, val_ls: %3f, tar_ls: %3f, maxf1: %3f, val_time: %6f'% (epoch + 1, epoch_num, ite_num, running_loss / ite_num4val, val_loss/val_cnt, tar_loss/val_cnt, tmp_f1[-1], time.time()-start_valid))
|
288 |
+
|
289 |
+
return tmp_f1, tmp_mae, val_loss, tar_loss, i_val, tmp_time
|
290 |
+
|
291 |
+
def train(net, optimizer, train_dataloaders, train_datasets, valid_dataloaders, valid_datasets, hypar,train_dataloaders_val, train_datasets_val): #model_path, model_save_fre, max_ite=1000000):
|
292 |
+
|
293 |
+
if hypar["interm_sup"]:
|
294 |
+
print("Get the gt encoder ...")
|
295 |
+
featurenet = get_gt_encoder(train_dataloaders, train_datasets, valid_dataloaders, valid_datasets, hypar,train_dataloaders_val, train_datasets_val)
|
296 |
+
## freeze the weights of gt encoder
|
297 |
+
for param in featurenet.parameters():
|
298 |
+
param.requires_grad=False
|
299 |
+
|
300 |
+
# scaler = GradScaler()
|
301 |
+
model_path = hypar["model_path"]
|
302 |
+
model_save_fre = hypar["model_save_fre"]
|
303 |
+
max_ite = hypar["max_ite"]
|
304 |
+
batch_size_train = hypar["batch_size_train"]
|
305 |
+
batch_size_valid = hypar["batch_size_valid"]
|
306 |
+
|
307 |
+
if(not os.path.exists(model_path)):
|
308 |
+
os.mkdir(model_path)
|
309 |
+
|
310 |
+
ite_num = hypar["start_ite"] # count the toal iteration number
|
311 |
+
ite_num4val = 0 # iterations since the last validation
|
312 |
+
running_loss = 0.0 # count the total loss
|
313 |
+
running_tar_loss = 0.0 # count the target output loss
|
314 |
+
last_mae = [1 for x in range(len(valid_dataloaders))]
|
315 |
+
last_f1 = [0 for x in range(len(valid_dataloaders))]
|
316 |
+
|
317 |
+
train_num = train_datasets[0].__len__()
|
318 |
+
|
319 |
+
net.train()
|
320 |
+
|
321 |
+
start_last = time.time()
|
322 |
+
gos_dataloader = train_dataloaders[0]
|
323 |
+
epoch_num = hypar["max_epoch_num"]
|
324 |
+
notgood_cnt = 0
|
325 |
+
for epoch in range(epoch_num): ## set the epoch num as 100000
|
326 |
+
|
327 |
+
for i, data in enumerate(gos_dataloader):
|
328 |
+
|
329 |
+
if(ite_num >= max_ite):
|
330 |
+
print("Training Reached the Maximal Iteration Number ", max_ite)
|
331 |
+
exit()
|
332 |
+
|
333 |
+
# start_read = time.time()
|
334 |
+
ite_num = ite_num + 1
|
335 |
+
ite_num4val = ite_num4val + 1
|
336 |
+
|
337 |
+
# get the inputs
|
338 |
+
inputs, labels = data['image'], data['label']
|
339 |
+
locations = data['location_blocks']
|
340 |
+
if(hypar["model_digit"]=="full"):
|
341 |
+
inputs = inputs.type(torch.FloatTensor)
|
342 |
+
labels = labels.type(torch.FloatTensor)
|
343 |
+
locations = locations.type(torch.FloatTensor)
|
344 |
+
else:
|
345 |
+
inputs = inputs.type(torch.HalfTensor)
|
346 |
+
labels = labels.type(torch.HalfTensor)
|
347 |
+
locations = locations.type(torch.HalfTensor)
|
348 |
+
|
349 |
+
# wrap them in Variable
|
350 |
+
if torch.cuda.is_available():
|
351 |
+
inputs_v, labels_v = Variable(inputs.cuda(), requires_grad=False), Variable(labels.cuda(), requires_grad=False)
|
352 |
+
locations_v = Variable(locations.cuda(), requires_grad=False)
|
353 |
+
else:
|
354 |
+
inputs_v, labels_v = Variable(inputs, requires_grad=False), Variable(labels, requires_grad=False)
|
355 |
+
locations_v = Variable(locations, requires_grad=False)
|
356 |
+
|
357 |
+
# print("time lapse for data preparation: ", time.time()-start_read, ' s')
|
358 |
+
|
359 |
+
# zero the parameter gradients
|
360 |
+
start_inf_loss_back = time.time()
|
361 |
+
optimizer.zero_grad()
|
362 |
+
if hypar["interm_sup"]:
|
363 |
+
# with autocast():
|
364 |
+
# forward + backward + optimize
|
365 |
+
_,fs = featurenet(labels_v)
|
366 |
+
ds,dfs = net(inputs_v)
|
367 |
+
## extract the gt encodings
|
368 |
+
loss2, loss = net.compute_loss_kl(ds, labels_v, dfs, fs, mode='MSE')
|
369 |
+
# loss2, loss = net.compute_loss_kl(ds, labels_v, dfs, fs, mode='cosin')
|
370 |
+
# print(next(featurenet.parameters()).dtype,next(net.parameters()).dtype,labels_v.dtype,fs[0][0].dtype)
|
371 |
+
# print(ds[0][0].dtype,dfs[0][0].dtype)
|
372 |
+
# print(loss2.dtype,loss.dtype)
|
373 |
+
else:
|
374 |
+
# with autocast():
|
375 |
+
# forward + backward + optimize
|
376 |
+
ds,_ = net(inputs_v)
|
377 |
+
loss2, loss = net.compute_loss(ds, labels_v)
|
378 |
+
# loss.backward()
|
379 |
+
# with torch.autograd.detect_anomaly():
|
380 |
+
# scaler.scale(loss).backward()
|
381 |
+
#ORTHO Loss
|
382 |
+
reg = 1e-8
|
383 |
+
orth_loss = torch.zeros(1).to(device)
|
384 |
+
for name, param in net.named_parameters():
|
385 |
+
if 'bias' not in name:
|
386 |
+
param_flat = param.view(param.shape[0], -1)
|
387 |
+
sym = torch.mm(param_flat, torch.t(param_flat))
|
388 |
+
sym -= torch.eye(param_flat.shape[0]).to(device)
|
389 |
+
orth_loss = orth_loss + (reg * sym.abs().sum())
|
390 |
+
loss = loss + orth_loss
|
391 |
+
loss.backward()
|
392 |
+
# scaler.step(optimizer)
|
393 |
+
# scaler.update()
|
394 |
+
optimizer.step()
|
395 |
+
# torch.cuda.empty_cache()
|
396 |
+
|
397 |
+
# # print statistics
|
398 |
+
running_loss += loss.item()
|
399 |
+
running_tar_loss += loss2.item()
|
400 |
+
|
401 |
+
# del outputs, loss
|
402 |
+
del ds, loss2, loss
|
403 |
+
end_inf_loss_back = time.time()-start_inf_loss_back
|
404 |
+
|
405 |
+
print(">>>"+model_path.split('/')[-1]+" - [epoch: %3d/%3d, batch: %5d/%5d, ite: %d] train loss: %3f, tar: %3f, time-per-iter: %3f s, time_read: %3f" % (
|
406 |
+
epoch + 1, epoch_num, (i + 1) * batch_size_train, train_num, ite_num, running_loss / ite_num4val, running_tar_loss / ite_num4val, time.time()-start_last, time.time()-start_last-end_inf_loss_back))
|
407 |
+
start_last = time.time()
|
408 |
+
|
409 |
+
if ite_num % model_save_fre == 0: # validate every model_save_fre iterations
|
410 |
+
notgood_cnt += 1
|
411 |
+
net.eval()
|
412 |
+
tmp_f1, tmp_mae, val_loss, tar_loss, i_val, tmp_time = valid(net, valid_dataloaders, valid_datasets, hypar, epoch)
|
413 |
+
torch.cuda.empty_cache()
|
414 |
+
net.train() # resume train
|
415 |
+
|
416 |
+
tmp_out = 0
|
417 |
+
print("last_f1:",last_f1,np.mean(last_f1))
|
418 |
+
print("tmp_f1:",tmp_f1,np.mean(tmp_f1))
|
419 |
+
if np.mean(tmp_mae)<np.mean(last_mae):
|
420 |
+
last_mae = tmp_mae
|
421 |
+
tmp_out = 1
|
422 |
+
if np.mean(tmp_f1)>np.mean(last_f1):
|
423 |
+
last_f1 = tmp_f1
|
424 |
+
tmp_out = 1
|
425 |
+
print("tmp_out:",tmp_out)
|
426 |
+
if(tmp_out):
|
427 |
+
notgood_cnt = 0
|
428 |
+
# last_f1 = tmp_f1
|
429 |
+
tmp_f1_str = [str(round(f1x,4)) for f1x in tmp_f1]
|
430 |
+
tmp_mae_str = [str(round(mx,4)) for mx in tmp_mae]
|
431 |
+
maxf1 = '_'.join(tmp_f1_str)
|
432 |
+
meanM = '_'.join(tmp_mae_str)
|
433 |
+
# .cpu().detach().numpy()
|
434 |
+
model_name = "/gpu_itr_"+str(ite_num)+\
|
435 |
+
"_traLoss_"+str(np.round(running_loss / ite_num4val,4))+\
|
436 |
+
"_traTarLoss_"+str(np.round(running_tar_loss / ite_num4val,4))+\
|
437 |
+
"_valLoss_"+str(np.round(val_loss /(i_val+1),4))+\
|
438 |
+
"_valTarLoss_"+str(np.round(tar_loss /(i_val+1),4)) + \
|
439 |
+
"_maxF1_" + maxf1 + \
|
440 |
+
"_mae_" + meanM + \
|
441 |
+
"_time_" + str(np.round(np.mean(np.array(tmp_time))/batch_size_valid,6))+".pth"
|
442 |
+
torch.save(net.state_dict(), model_path + model_name)
|
443 |
+
|
444 |
+
running_loss = 0.0
|
445 |
+
running_tar_loss = 0.0
|
446 |
+
ite_num4val = 0
|
447 |
+
|
448 |
+
if(notgood_cnt >= hypar["early_stop"]):
|
449 |
+
print("No improvements in the last "+str(notgood_cnt)+" validation periods, so training stopped !")
|
450 |
+
exit()
|
451 |
+
|
452 |
+
print("Training Reaches The Maximum Epoch Number")
|
453 |
+
|
454 |
+
def valid(net, valid_dataloaders, valid_datasets, hypar, epoch=0):
|
455 |
+
net.eval()
|
456 |
+
print("Validating...")
|
457 |
+
epoch_num = hypar["max_epoch_num"]
|
458 |
+
|
459 |
+
val_loss = 0.0
|
460 |
+
tar_loss = 0.0
|
461 |
+
val_cnt = 0.0
|
462 |
+
|
463 |
+
tmp_f1 = []
|
464 |
+
tmp_mae = []
|
465 |
+
tmp_time = []
|
466 |
+
|
467 |
+
start_valid = time.time()
|
468 |
+
|
469 |
+
for k in range(len(valid_dataloaders)):
|
470 |
+
|
471 |
+
valid_dataloader = valid_dataloaders[k]
|
472 |
+
valid_dataset = valid_datasets[k]
|
473 |
+
|
474 |
+
val_num = valid_dataset.__len__()
|
475 |
+
mybins = np.arange(0,256)
|
476 |
+
PRE = np.zeros((val_num,len(mybins)-1))
|
477 |
+
REC = np.zeros((val_num,len(mybins)-1))
|
478 |
+
F1 = np.zeros((val_num,len(mybins)-1))
|
479 |
+
MAE = np.zeros((val_num))
|
480 |
+
|
481 |
+
for i_val, data_val in enumerate(valid_dataloader):
|
482 |
+
val_cnt = val_cnt + 1.0
|
483 |
+
imidx_val, inputs_val, labels_val, shapes_val = data_val['imidx'], data_val['image'], data_val['label'], data_val['shape']
|
484 |
+
|
485 |
+
if(hypar["model_digit"]=="full"):
|
486 |
+
inputs_val = inputs_val.type(torch.FloatTensor)
|
487 |
+
labels_val = labels_val.type(torch.FloatTensor)
|
488 |
+
else:
|
489 |
+
inputs_val = inputs_val.type(torch.HalfTensor)
|
490 |
+
labels_val = labels_val.type(torch.HalfTensor)
|
491 |
+
|
492 |
+
# wrap them in Variable
|
493 |
+
if torch.cuda.is_available():
|
494 |
+
inputs_val_v, labels_val_v = Variable(inputs_val.cuda(), requires_grad=False), Variable(labels_val.cuda(), requires_grad=False)
|
495 |
+
else:
|
496 |
+
inputs_val_v, labels_val_v = Variable(inputs_val, requires_grad=False), Variable(labels_val,requires_grad=False)
|
497 |
+
# with autocast():
|
498 |
+
t_start = time.time()
|
499 |
+
ds_val = net(inputs_val_v)[0]
|
500 |
+
# plt.imshow(inputs_val_v[0][0].cpu().detach())
|
501 |
+
# plt.show()
|
502 |
+
# print(inputs_val_v.cpu().detach().shape)
|
503 |
+
t_end = time.time()-t_start
|
504 |
+
tmp_time.append(t_end)
|
505 |
+
|
506 |
+
# loss2_val, loss_val = muti_loss_fusion(ds_val, labels_val_v)
|
507 |
+
loss2_val, loss_val = net.compute_loss(ds_val, labels_val_v)
|
508 |
+
|
509 |
+
# compute F measure
|
510 |
+
for t in range(hypar["batch_size_valid"]):
|
511 |
+
i_test = imidx_val[t].data.numpy()
|
512 |
+
|
513 |
+
pred_val = ds_val[0][t,:,:,:].float() # B x 1 x H x W
|
514 |
+
|
515 |
+
## recover the prediction spatial size to the original image size
|
516 |
+
pred_val = torch.squeeze(F.interpolate(torch.unsqueeze(pred_val,0),(shapes_val[t][0],shapes_val[t][1]),mode='bilinear')) # F.upsample is deprecated; F.interpolate is the drop-in replacement
|
517 |
+
|
518 |
+
# pred_val = normPRED(pred_val)
|
519 |
+
ma = torch.max(pred_val)
|
520 |
+
mi = torch.min(pred_val)
|
521 |
+
pred_val = (pred_val-mi)/(ma-mi) # max = 1
|
522 |
+
|
523 |
+
gt = np.squeeze(io.imread(valid_dataset.dataset["ori_gt_path"][i_test])) # max = 255
|
524 |
+
if gt.max()==1:
|
525 |
+
gt=gt*255
|
526 |
+
|
527 |
+
with torch.no_grad():
|
528 |
+
gt = torch.tensor(gt).to(device)
|
529 |
+
|
530 |
+
pre,rec,f1,mae = f1_mae_torch(pred_val*255, gt, valid_dataset, i_test, mybins, hypar)
|
531 |
+
|
532 |
+
|
533 |
+
PRE[i_test,:]=pre
|
534 |
+
REC[i_test,:] = rec
|
535 |
+
F1[i_test,:] = f1
|
536 |
+
MAE[i_test] = mae
|
537 |
+
|
538 |
+
del ds_val, gt
|
539 |
+
gc.collect()
|
540 |
+
torch.cuda.empty_cache()
|
541 |
+
|
542 |
+
# if(loss_val.data[0]>1):
|
543 |
+
val_loss += loss_val.item()#data[0]
|
544 |
+
tar_loss += loss2_val.item()#data[0]
|
545 |
+
|
546 |
+
print("[validating: %5d/%5d] val_ls:%f, tar_ls: %f, f1: %f, mae: %f, time: %f"% (i_val, val_num, val_loss / (i_val + 1), tar_loss / (i_val + 1), np.amax(F1[i_test,:]), MAE[i_test],t_end))
|
547 |
+
|
548 |
+
del loss2_val, loss_val
|
549 |
+
|
550 |
+
print('============================')
|
551 |
+
PRE_m = np.mean(PRE,0)
|
552 |
+
REC_m = np.mean(REC,0)
|
553 |
+
f1_m = (1+0.3)*PRE_m*REC_m/(0.3*PRE_m+REC_m+1e-8)
|
554 |
+
|
555 |
+
tmp_f1.append(np.amax(f1_m))
|
556 |
+
tmp_mae.append(np.mean(MAE))
|
557 |
+
|
558 |
+
return tmp_f1, tmp_mae, val_loss, tar_loss, i_val, tmp_time
|
559 |
+
|
560 |
+
def main(train_datasets,
|
561 |
+
valid_datasets,
|
562 |
+
hypar): # model: "train", "test"
|
563 |
+
|
564 |
+
### --- Step 1: Build datasets and dataloaders ---
|
565 |
+
dataloaders_train = []
|
566 |
+
dataloaders_valid = []
|
567 |
+
|
568 |
+
if(hypar["mode"]=="train"):
|
569 |
+
print("--- create training dataloader ---")
|
570 |
+
## collect training dataset
|
571 |
+
train_nm_im_gt_list = get_im_gt_name_dict(train_datasets, flag="train")
|
572 |
+
## build dataloader for training datasets
|
573 |
+
train_dataloaders, train_datasets = create_dataloaders(train_nm_im_gt_list,
|
574 |
+
cache_size = hypar["cache_size"],
|
575 |
+
cache_boost = hypar["cache_boost_train"],
|
576 |
+
my_transforms = [
|
577 |
+
GOSRandomHFlip(), ## random horizontal flip augmentation (comment out to disable)
|
578 |
+
# GOSResize(hypar["input_size"]),
|
579 |
+
# GOSRandomCrop(hypar["crop_size"]), ## this line can be uncommented for randomcrop augmentation
|
580 |
+
# GOSNormalize([0.5,0.5,0.5,0,0,0,0,0],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0]),
|
581 |
+
GOSNormalize([0.5,0.5,0.5,0,0],[1.0,1.0,1.0,1.0,1.0]),
|
582 |
+
# GOSNormalize([0.5,0.5,0.5],[1.0,1.0,1.0]),
|
583 |
+
# GOSNormalize([123.675, 116.28, 103.53],[58.395, 57.12, 57.375])
|
584 |
+
],
|
585 |
+
batch_size = hypar["batch_size_train"],
|
586 |
+
shuffle = True,
|
587 |
+
is_train=True)
|
588 |
+
train_dataloaders_val, train_datasets_val = create_dataloaders(train_nm_im_gt_list,
|
589 |
+
cache_size = hypar["cache_size"],
|
590 |
+
cache_boost = hypar["cache_boost_train"],
|
591 |
+
my_transforms = [
|
592 |
+
# GOSNormalize([0.5,0.5,0.5,0,0,0,0,0],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0]),
|
593 |
+
GOSNormalize([0.5,0.5,0.5,0,0],[1.0,1.0,1.0,1.0,1.0]),
|
594 |
+
# GOSNormalize([0.5,0.5,0.5],[1.0,1.0,1.0]),
|
595 |
+
# GOSNormalize([123.675, 116.28, 103.53],[58.395, 57.12, 57.375])
|
596 |
+
],
|
597 |
+
batch_size = hypar["batch_size_valid"],
|
598 |
+
shuffle = False,
|
599 |
+
is_train=False)
|
600 |
+
print(len(train_dataloaders), " train dataloaders created")
|
601 |
+
|
602 |
+
print("--- create valid dataloader ---")
|
603 |
+
## build dataloader for validation or testing
|
604 |
+
valid_nm_im_gt_list = get_im_gt_name_dict(valid_datasets, flag="valid")
|
605 |
+
## build dataloader for training datasets
|
606 |
+
valid_dataloaders, valid_datasets = create_dataloaders(valid_nm_im_gt_list,
|
607 |
+
cache_size = hypar["cache_size"],
|
608 |
+
cache_boost = hypar["cache_boost_valid"],
|
609 |
+
my_transforms = [
|
610 |
+
# GOSNormalize([0.5,0.5,0.5,0,0,0,0,0],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0]),
|
611 |
+
GOSNormalize([0.5,0.5,0.5,0,0],[1.0,1.0,1.0,1.0,1.0]),
|
612 |
+
# GOSNormalize([0.5,0.5,0.5],[1.0,1.0,1.0]),
|
613 |
+
# GOSNormalize([123.675, 116.28, 103.53],[58.395, 57.12, 57.375])
|
614 |
+
# GOSResize(hypar["input_size"])
|
615 |
+
],
|
616 |
+
batch_size=hypar["batch_size_valid"],
|
617 |
+
shuffle=False,
|
618 |
+
is_train=False)
|
619 |
+
print(len(valid_dataloaders), " valid dataloaders created")
|
620 |
+
# print(valid_datasets[0]["data_name"])
|
621 |
+
|
622 |
+
### --- Step 2: Build Model and Optimizer ---
|
623 |
+
print("--- build model ---")
|
624 |
+
net = hypar["model"]#GOSNETINC(3,1)
|
625 |
+
|
626 |
+
# convert to half precision
|
627 |
+
# if(hypar["model_digit"]=="half"):
|
628 |
+
# net.half()
|
629 |
+
|
630 |
+
if torch.cuda.is_available():
|
631 |
+
net.cuda()
|
632 |
+
|
633 |
+
if(hypar["restore_model"]!=""):
|
634 |
+
print("restore model from:")
|
635 |
+
print(hypar["model_path"]+"/"+hypar["restore_model"])
|
636 |
+
if torch.cuda.is_available():
|
637 |
+
net.load_state_dict(torch.load(hypar["model_path"]+"/"+hypar["restore_model"]),strict=False)
|
638 |
+
else:
|
639 |
+
net.load_state_dict(torch.load(hypar["model_path"]+"/"+hypar["restore_model"],map_location="cpu"),strict=False)
|
640 |
+
|
641 |
+
print("--- define optimizer ---")
|
642 |
+
# optimizer = optim.AdamW(net.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)
|
643 |
+
optimizer = optim.AdamW(net.parameters(), lr=4e-5, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)
|
644 |
+
### --- Step 3: Train or Valid Model ---
|
645 |
+
if(hypar["mode"]=="train"):
|
646 |
+
train(net,
|
647 |
+
optimizer,
|
648 |
+
train_dataloaders,
|
649 |
+
train_datasets,
|
650 |
+
valid_dataloaders,
|
651 |
+
valid_datasets,
|
652 |
+
hypar,
|
653 |
+
train_dataloaders_val, train_datasets_val)
|
654 |
+
else:
|
655 |
+
valid(net,
|
656 |
+
valid_dataloaders,
|
657 |
+
valid_datasets,
|
658 |
+
hypar)
|
659 |
+
|
660 |
+
|
661 |
+
if __name__ == "__main__":
|
662 |
+
|
663 |
+
### --------------- STEP 1: Configuring the Train, Valid and Test datasets ---------------
|
664 |
+
## configure the train, valid and inference datasets
|
665 |
+
train_datasets, valid_datasets = [], []
|
666 |
+
|
667 |
+
valid_datasets = [dataset_test] ## users can create multiple dictionaries to set a list of datasets as validation or inference sets
|
668 |
+
train_datasets = [dataset_test] ## users can create multiple dictionaries to set a list of datasets as the training set
|
669 |
+
|
670 |
+
|
671 |
+
### --------------- STEP 2: Configuring the hyperparameters for Training, validation and inferencing ---------------
|
672 |
+
hypar = {}
|
673 |
+
|
674 |
+
## -- 2.1. configure the model saving or restoring path --
|
675 |
+
hypar["mode"] = "train"
|
676 |
+
## "train": for training,
|
677 |
+
## "valid": for validation and inferening,
|
678 |
+
## in "valid" mode, it will calculate the accuracy as well as save the prediciton results into the "hypar["valid_out_dir"]", which shouldn't be ""
|
679 |
+
## otherwise only accuracy will be calculated and no predictions will be saved
|
680 |
+
hypar["interm_sup"] = True ## in-dicate if activate intermediate feature supervision
|
681 |
+
|
682 |
+
if hypar["mode"] == "train":
|
683 |
+
hypar["valid_out_dir"] = "" ## for "train" model leave it as "", for "valid"("inference") mode: set it according to your local directory
|
684 |
+
hypar["model_path"] ="./saved_models" ## model weights saving (or restoring) path
|
685 |
+
hypar["restore_model"] = "" ## name of the segmentation model weights .pth for resume training process from last stop or for the inferencing
|
686 |
+
hypar["start_ite"] = 0 ## start iteration for the training, can be changed to match the restored training process
|
687 |
+
hypar["gt_encoder_model"] = ""
|
688 |
+
else: ## configure the segmentation output path and the to-be-used model weights path
|
689 |
+
hypar["valid_out_dir"] = "./your-results/"##".D:/Code/Design_for_graduation/DIS-main/IS-Net/DIS5K-Results-test" ## output inferenced segmentation maps into this fold
|
690 |
+
hypar["model_path"] = "./saved_models" ## load trained weights from this path
|
691 |
+
hypar["restore_model"] = "gpu_itr_102000_traLoss_2.5701_traTarLoss_0.0248_valLoss_2.3643_valTarLoss_0.3743_maxF1_0.8063_mae_0.0825_time_0.015695.pth"##"isnet.pth" ## name of the to-be-loaded weights
|
692 |
+
|
693 |
+
# if hypar["restore_model"]!="":
|
694 |
+
# hypar["start_ite"] = int(hypar["restore_model"].split("_")[2])
|
695 |
+
|
696 |
+
## -- 2.2. choose floating point accuracy --
|
697 |
+
hypar["model_digit"] = "full" ## indicates "half" or "full" accuracy of float number
|
698 |
+
hypar["seed"] = 0
|
699 |
+
|
700 |
+
## -- 2.3. cache data spatial size --
|
701 |
+
## To handle large input images, which take a long time to load during training,
|
702 |
+
# we introduce a cache mechanism that pre-converts and resizes the jpg and png images into .pt files
|
703 |
+
hypar["cache_size"] = [1024, 1024] ## cached input spatial resolution, can be configured into different size
|
704 |
+
hypar["cache_boost_train"] = False ## "True" or "False", indicates wheather to load all the training datasets into RAM, True will greatly speed the training process while requires more RAM
|
705 |
+
hypar["cache_boost_valid"] = False ## "True" or "False", indicates wheather to load all the validation datasets into RAM, True will greatly speed the training process while requires more RAM
|
706 |
+
|
707 |
+
## --- 2.4. data augmentation parameters ---
|
708 |
+
hypar["input_size"] = [1024, 1024] ## mdoel input spatial size, usually use the same value hypar["cache_size"], which means we don't further resize the images
|
709 |
+
hypar["crop_size"] = [1024, 1024] ## random crop size from the input, it is usually set as smaller than hypar["cache_size"], e.g., [920,920] for data augmentation
|
710 |
+
hypar["random_flip_h"] = 1 ## horizontal flip, currently hard coded in the datader and it is not in use
|
711 |
+
hypar["random_flip_v"] = 1 ## vertical flip , currently not in use
|
712 |
+
|
713 |
+
## --- 2.5. define model ---
|
714 |
+
print("building model...")
|
715 |
+
hypar["model"] = ISNetDIS(in_ch=5) #U2NETFASTFEATURESUP()
|
716 |
+
hypar["early_stop"] = 20 ## stop the training when no improvement in the past 20 validation periods, smaller numbers can be used here e.g., 5 or 10.
|
717 |
+
hypar["model_save_fre"] = 3000 ## valid and save model weights every 2000 iterations
|
718 |
+
|
719 |
+
hypar["batch_size_train"] = 6 ## batch size for training
|
720 |
+
hypar["batch_size_valid"] = 1 ## batch size for validation and inferencing
|
721 |
+
print("batch size: ", hypar["batch_size_train"])
|
722 |
+
|
723 |
+
hypar["max_ite"] = 50000000 ## if early stop couldn't stop the training process, stop it by the max_ite_num
|
724 |
+
hypar["max_epoch_num"] = 500000 ## if early stop and max_ite couldn't stop the training process, stop it by the max_epoch_num
|
725 |
+
|
726 |
+
main(train_datasets,
|
727 |
+
valid_datasets,
|
728 |
+
hypar=hypar)
|
729 |
+
|
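Note on the ORTHO Loss blocks above: both get_gt_encoder() and train() add a soft orthogonality penalty, reg * sum|W Wᵀ - I|, over every non-bias parameter before backpropagation. Below is a minimal standalone sketch of that regularizer; the helper name and the nn.Linear stand-in are illustrative, not part of this repo.

    import torch
    import torch.nn as nn

    def orthogonality_penalty(model: nn.Module, reg: float = 1e-8) -> torch.Tensor:
        """Soft orthogonality penalty: reg * sum |W W^T - I| over non-bias params."""
        penalty = torch.zeros(1)
        for name, param in model.named_parameters():
            if 'bias' in name:
                continue
            w = param.view(param.shape[0], -1)                 # flatten each weight to 2-D
            gram = w @ w.t()                                   # row-wise Gram matrix
            gram = gram - torch.eye(w.shape[0], device=param.device)
            penalty = penalty.to(param.device) + reg * gram.abs().sum()
        return penalty

    # usage: add the penalty to the task loss before calling backward()
    net = nn.Linear(8, 4)                                      # illustrative stand-in
    task_loss = net(torch.randn(2, 8)).pow(2).mean()
    (task_loss + orthogonality_penalty(net)).backward()

With reg at 1e-8 the penalty acts as a gentle decorrelation pressure on the filters rather than a hard constraint.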
MultiScaleDeformableAttention-1.0-py3-none-any.whl
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:152caec7860d1f39f644ac5eed946b5a4eecfad40764396345b3d0e516921b17
|
3 |
+
size 2048806
|
README.md
CHANGED
@@ -9,6 +9,8 @@ app_file: app.py
|
9 |
pinned: false
|
10 |
license: mit
|
11 |
short_description: SAM-prompted dichotomous segmentation. No affiliation.
|
12 |
---
|
13 |
-
|
14 |
-
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
9 |
pinned: false
|
10 |
license: mit
|
11 |
short_description: SAM-prompted dichotomous segmentation. No affiliation.
|
12 |
+
python_version: 3.11
|
13 |
+
preload_from_hub:
|
14 |
+
- jwlarocque/DIS-SAM DIS-SAM-checkpoint.pth
|
15 |
+
- andzhang01/segment_anything sam_vit_l_0b3195.pth
|
16 |
---
|
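The two preload_from_hub entries added above pin the checkpoints the Space needs at build time. Outside Spaces, the same files can be fetched with huggingface_hub; a sketch, assuming both repos remain public under these names:

    from huggingface_hub import hf_hub_download

    # returns local cache paths to the same weights the Space preloads
    dis_sam_ckpt = hf_hub_download(repo_id="jwlarocque/DIS-SAM", filename="DIS-SAM-checkpoint.pth")
    sam_ckpt = hf_hub_download(repo_id="andzhang01/segment_anything", filename="sam_vit_l_0b3195.pth")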
SAM/segment_anything/__init__.py
ADDED
@@ -0,0 +1,15 @@
|
1 |
+
# Copyright (c) Meta Platforms, Inc. and affiliates.
|
2 |
+
# All rights reserved.
|
3 |
+
|
4 |
+
# This source code is licensed under the license found in the
|
5 |
+
# LICENSE file in the root directory of this source tree.
|
6 |
+
|
7 |
+
from .build_sam import (
|
8 |
+
build_sam,
|
9 |
+
build_sam_vit_h,
|
10 |
+
build_sam_vit_l,
|
11 |
+
build_sam_vit_b,
|
12 |
+
sam_model_registry,
|
13 |
+
)
|
14 |
+
from .predictor import SamPredictor
|
15 |
+
from .automatic_mask_generator import SamAutomaticMaskGenerator
|
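This __init__.py re-exports the builders, the predictor, and the automatic mask generator at the package root, so downstream code (assuming the repo root is on sys.path) only needs:

    from SAM.segment_anything import sam_model_registry, SamPredictor, SamAutomaticMaskGenerator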
SAM/segment_anything/__pycache__/__init__.cpython-311.pyc
ADDED
Binary file (549 Bytes).
|
SAM/segment_anything/__pycache__/automatic_mask_generator.cpython-311.pyc
ADDED
Binary file (18.3 kB).
|
SAM/segment_anything/__pycache__/build_sam.cpython-311.pyc
ADDED
Binary file (3.22 kB).
|
SAM/segment_anything/__pycache__/predictor.cpython-311.pyc
ADDED
Binary file (14.1 kB).
|
SAM/segment_anything/automatic_mask_generator.py
ADDED
@@ -0,0 +1,372 @@
|
1 |
+
# Copyright (c) Meta Platforms, Inc. and affiliates.
|
2 |
+
# All rights reserved.
|
3 |
+
|
4 |
+
# This source code is licensed under the license found in the
|
5 |
+
# LICENSE file in the root directory of this source tree.
|
6 |
+
|
7 |
+
import numpy as np
|
8 |
+
import torch
|
9 |
+
from torchvision.ops.boxes import batched_nms, box_area # type: ignore
|
10 |
+
|
11 |
+
from typing import Any, Dict, List, Optional, Tuple
|
12 |
+
|
13 |
+
from .modeling import Sam
|
14 |
+
from .predictor import SamPredictor
|
15 |
+
from .utils.amg import (
|
16 |
+
MaskData,
|
17 |
+
area_from_rle,
|
18 |
+
batch_iterator,
|
19 |
+
batched_mask_to_box,
|
20 |
+
box_xyxy_to_xywh,
|
21 |
+
build_all_layer_point_grids,
|
22 |
+
calculate_stability_score,
|
23 |
+
coco_encode_rle,
|
24 |
+
generate_crop_boxes,
|
25 |
+
is_box_near_crop_edge,
|
26 |
+
mask_to_rle_pytorch,
|
27 |
+
remove_small_regions,
|
28 |
+
rle_to_mask,
|
29 |
+
uncrop_boxes_xyxy,
|
30 |
+
uncrop_masks,
|
31 |
+
uncrop_points,
|
32 |
+
)
|
33 |
+
|
34 |
+
|
35 |
+
class SamAutomaticMaskGenerator:
|
36 |
+
def __init__(
|
37 |
+
self,
|
38 |
+
model: Sam,
|
39 |
+
points_per_side: Optional[int] = 32,
|
40 |
+
points_per_batch: int = 64,
|
41 |
+
pred_iou_thresh: float = 0.88,
|
42 |
+
stability_score_thresh: float = 0.95,
|
43 |
+
stability_score_offset: float = 1.0,
|
44 |
+
box_nms_thresh: float = 0.7,
|
45 |
+
crop_n_layers: int = 0,
|
46 |
+
crop_nms_thresh: float = 0.7,
|
47 |
+
crop_overlap_ratio: float = 512 / 1500,
|
48 |
+
crop_n_points_downscale_factor: int = 1,
|
49 |
+
point_grids: Optional[List[np.ndarray]] = None,
|
50 |
+
min_mask_region_area: int = 0,
|
51 |
+
output_mode: str = "binary_mask",
|
52 |
+
) -> None:
|
53 |
+
"""
|
54 |
+
Using a SAM model, generates masks for the entire image.
|
55 |
+
Generates a grid of point prompts over the image, then filters
|
56 |
+
low quality and duplicate masks. The default settings are chosen
|
57 |
+
for SAM with a ViT-H backbone.
|
58 |
+
|
59 |
+
Arguments:
|
60 |
+
model (Sam): The SAM model to use for mask prediction.
|
61 |
+
points_per_side (int or None): The number of points to be sampled
|
62 |
+
along one side of the image. The total number of points is
|
63 |
+
points_per_side**2. If None, 'point_grids' must provide explicit
|
64 |
+
point sampling.
|
65 |
+
points_per_batch (int): Sets the number of points run simultaneously
|
66 |
+
by the model. Higher numbers may be faster but use more GPU memory.
|
67 |
+
pred_iou_thresh (float): A filtering threshold in [0,1], using the
|
68 |
+
model's predicted mask quality.
|
69 |
+
stability_score_thresh (float): A filtering threshold in [0,1], using
|
70 |
+
the stability of the mask under changes to the cutoff used to binarize
|
71 |
+
the model's mask predictions.
|
72 |
+
stability_score_offset (float): The amount to shift the cutoff when
|
73 |
+
calculating the stability score.
|
74 |
+
box_nms_thresh (float): The box IoU cutoff used by non-maximal
|
75 |
+
suppression to filter duplicate masks.
|
76 |
+
crop_n_layers (int): If >0, mask prediction will be run again on
|
77 |
+
crops of the image. Sets the number of layers to run, where each
|
78 |
+
layer has 2**i_layer number of image crops.
|
79 |
+
crop_nms_thresh (float): The box IoU cutoff used by non-maximal
|
80 |
+
suppression to filter duplicate masks between different crops.
|
81 |
+
crop_overlap_ratio (float): Sets the degree to which crops overlap.
|
82 |
+
In the first crop layer, crops will overlap by this fraction of
|
83 |
+
the image length. Later layers with more crops scale down this overlap.
|
84 |
+
crop_n_points_downscale_factor (int): The number of points-per-side
|
85 |
+
sampled in layer n is scaled down by crop_n_points_downscale_factor**n.
|
86 |
+
point_grids (list(np.ndarray) or None): A list over explicit grids
|
87 |
+
of points used for sampling, normalized to [0,1]. The nth grid in the
|
88 |
+
list is used in the nth crop layer. Exclusive with points_per_side.
|
89 |
+
min_mask_region_area (int): If >0, postprocessing will be applied
|
90 |
+
to remove disconnected regions and holes in masks with area smaller
|
91 |
+
than min_mask_region_area. Requires opencv.
|
92 |
+
output_mode (str): The form masks are returned in. Can be 'binary_mask',
|
93 |
+
'uncompressed_rle', or 'coco_rle'. 'coco_rle' requires pycocotools.
|
94 |
+
For large resolutions, 'binary_mask' may consume large amounts of
|
95 |
+
memory.
|
96 |
+
"""
|
97 |
+
|
98 |
+
assert (points_per_side is None) != (
|
99 |
+
point_grids is None
|
100 |
+
), "Exactly one of points_per_side or point_grid must be provided."
|
101 |
+
if points_per_side is not None:
|
102 |
+
self.point_grids = build_all_layer_point_grids(
|
103 |
+
points_per_side,
|
104 |
+
crop_n_layers,
|
105 |
+
crop_n_points_downscale_factor,
|
106 |
+
)
|
107 |
+
elif point_grids is not None:
|
108 |
+
self.point_grids = point_grids
|
109 |
+
else:
|
110 |
+
raise ValueError("Can't have both points_per_side and point_grid be None.")
|
111 |
+
|
112 |
+
assert output_mode in [
|
113 |
+
"binary_mask",
|
114 |
+
"uncompressed_rle",
|
115 |
+
"coco_rle",
|
116 |
+
], f"Unknown output_mode {output_mode}."
|
117 |
+
if output_mode == "coco_rle":
|
118 |
+
from pycocotools import mask as mask_utils # type: ignore # noqa: F401
|
119 |
+
|
120 |
+
if min_mask_region_area > 0:
|
121 |
+
import cv2 # type: ignore # noqa: F401
|
122 |
+
|
123 |
+
self.predictor = SamPredictor(model)
|
124 |
+
self.points_per_batch = points_per_batch
|
125 |
+
self.pred_iou_thresh = pred_iou_thresh
|
126 |
+
self.stability_score_thresh = stability_score_thresh
|
127 |
+
self.stability_score_offset = stability_score_offset
|
128 |
+
self.box_nms_thresh = box_nms_thresh
|
129 |
+
self.crop_n_layers = crop_n_layers
|
130 |
+
self.crop_nms_thresh = crop_nms_thresh
|
131 |
+
self.crop_overlap_ratio = crop_overlap_ratio
|
132 |
+
self.crop_n_points_downscale_factor = crop_n_points_downscale_factor
|
133 |
+
self.min_mask_region_area = min_mask_region_area
|
134 |
+
self.output_mode = output_mode
|
135 |
+
|
136 |
+
@torch.no_grad()
|
137 |
+
def generate(self, image: np.ndarray) -> List[Dict[str, Any]]:
|
138 |
+
"""
|
139 |
+
Generates masks for the given image.
|
140 |
+
|
141 |
+
Arguments:
|
142 |
+
image (np.ndarray): The image to generate masks for, in HWC uint8 format.
|
143 |
+
|
144 |
+
Returns:
|
145 |
+
list(dict(str, any)): A list over records for masks. Each record is
|
146 |
+
a dict containing the following keys:
|
147 |
+
segmentation (dict(str, any) or np.ndarray): The mask. If
|
148 |
+
output_mode='binary_mask', is an array of shape HW. Otherwise,
|
149 |
+
is a dictionary containing the RLE.
|
150 |
+
bbox (list(float)): The box around the mask, in XYWH format.
|
151 |
+
area (int): The area in pixels of the mask.
|
152 |
+
predicted_iou (float): The model's own prediction of the mask's
|
153 |
+
quality. This is filtered by the pred_iou_thresh parameter.
|
154 |
+
point_coords (list(list(float))): The point coordinates input
|
155 |
+
to the model to generate this mask.
|
156 |
+
stability_score (float): A measure of the mask's quality. This
|
157 |
+
is filtered on using the stability_score_thresh parameter.
|
158 |
+
crop_box (list(float)): The crop of the image used to generate
|
159 |
+
the mask, given in XYWH format.
|
160 |
+
"""
|
161 |
+
|
162 |
+
# Generate masks
|
163 |
+
mask_data = self._generate_masks(image)
|
164 |
+
|
165 |
+
# Filter small disconnected regions and holes in masks
|
166 |
+
if self.min_mask_region_area > 0:
|
167 |
+
mask_data = self.postprocess_small_regions(
|
168 |
+
mask_data,
|
169 |
+
self.min_mask_region_area,
|
170 |
+
max(self.box_nms_thresh, self.crop_nms_thresh),
|
171 |
+
)
|
172 |
+
|
173 |
+
# Encode masks
|
174 |
+
if self.output_mode == "coco_rle":
|
175 |
+
mask_data["segmentations"] = [coco_encode_rle(rle) for rle in mask_data["rles"]]
|
176 |
+
elif self.output_mode == "binary_mask":
|
177 |
+
mask_data["segmentations"] = [rle_to_mask(rle) for rle in mask_data["rles"]]
|
178 |
+
else:
|
179 |
+
mask_data["segmentations"] = mask_data["rles"]
|
180 |
+
|
181 |
+
# Write mask records
|
182 |
+
curr_anns = []
|
183 |
+
for idx in range(len(mask_data["segmentations"])):
|
184 |
+
ann = {
|
185 |
+
"segmentation": mask_data["segmentations"][idx],
|
186 |
+
"area": area_from_rle(mask_data["rles"][idx]),
|
187 |
+
"bbox": box_xyxy_to_xywh(mask_data["boxes"][idx]).tolist(),
|
188 |
+
"predicted_iou": mask_data["iou_preds"][idx].item(),
|
189 |
+
"point_coords": [mask_data["points"][idx].tolist()],
|
190 |
+
"stability_score": mask_data["stability_score"][idx].item(),
|
191 |
+
"crop_box": box_xyxy_to_xywh(mask_data["crop_boxes"][idx]).tolist(),
|
192 |
+
}
|
193 |
+
curr_anns.append(ann)
|
194 |
+
|
195 |
+
return curr_anns
|
196 |
+
|
197 |
+
def _generate_masks(self, image: np.ndarray) -> MaskData:
|
198 |
+
orig_size = image.shape[:2]
|
199 |
+
crop_boxes, layer_idxs = generate_crop_boxes(
|
200 |
+
orig_size, self.crop_n_layers, self.crop_overlap_ratio
|
201 |
+
)
|
202 |
+
|
203 |
+
# Iterate over image crops
|
204 |
+
data = MaskData()
|
205 |
+
for crop_box, layer_idx in zip(crop_boxes, layer_idxs):
|
206 |
+
crop_data = self._process_crop(image, crop_box, layer_idx, orig_size)
|
207 |
+
data.cat(crop_data)
|
208 |
+
|
209 |
+
# Remove duplicate masks between crops
|
210 |
+
if len(crop_boxes) > 1:
|
211 |
+
# Prefer masks from smaller crops
|
212 |
+
scores = 1 / box_area(data["crop_boxes"])
|
213 |
+
scores = scores.to(data["boxes"].device)
|
214 |
+
keep_by_nms = batched_nms(
|
215 |
+
data["boxes"].float(),
|
216 |
+
scores,
|
217 |
+
torch.zeros_like(data["boxes"][:, 0]), # categories
|
218 |
+
iou_threshold=self.crop_nms_thresh,
|
219 |
+
)
|
220 |
+
data.filter(keep_by_nms)
|
221 |
+
|
222 |
+
data.to_numpy()
|
223 |
+
return data
|
224 |
+
|
225 |
+
def _process_crop(
|
226 |
+
self,
|
227 |
+
image: np.ndarray,
|
228 |
+
crop_box: List[int],
|
229 |
+
crop_layer_idx: int,
|
230 |
+
orig_size: Tuple[int, ...],
|
231 |
+
) -> MaskData:
|
232 |
+
# Crop the image and calculate embeddings
|
233 |
+
x0, y0, x1, y1 = crop_box
|
234 |
+
cropped_im = image[y0:y1, x0:x1, :]
|
235 |
+
cropped_im_size = cropped_im.shape[:2]
|
236 |
+
self.predictor.set_image(cropped_im)
|
237 |
+
|
238 |
+
# Get points for this crop
|
239 |
+
points_scale = np.array(cropped_im_size)[None, ::-1]
|
240 |
+
points_for_image = self.point_grids[crop_layer_idx] * points_scale
|
241 |
+
|
242 |
+
# Generate masks for this crop in batches
|
243 |
+
data = MaskData()
|
244 |
+
for (points,) in batch_iterator(self.points_per_batch, points_for_image):
|
245 |
+
batch_data = self._process_batch(points, cropped_im_size, crop_box, orig_size)
|
246 |
+
data.cat(batch_data)
|
247 |
+
del batch_data
|
248 |
+
self.predictor.reset_image()
|
249 |
+
|
250 |
+
# Remove duplicates within this crop.
|
251 |
+
keep_by_nms = batched_nms(
|
252 |
+
data["boxes"].float(),
|
253 |
+
data["iou_preds"],
|
254 |
+
torch.zeros_like(data["boxes"][:, 0]), # categories
|
255 |
+
iou_threshold=self.box_nms_thresh,
|
256 |
+
)
|
257 |
+
data.filter(keep_by_nms)
|
258 |
+
|
259 |
+
# Return to the original image frame
|
260 |
+
data["boxes"] = uncrop_boxes_xyxy(data["boxes"], crop_box)
|
261 |
+
data["points"] = uncrop_points(data["points"], crop_box)
|
262 |
+
data["crop_boxes"] = torch.tensor([crop_box for _ in range(len(data["rles"]))])
|
263 |
+
|
264 |
+
return data
|
265 |
+
|
266 |
+
def _process_batch(
|
267 |
+
self,
|
268 |
+
points: np.ndarray,
|
269 |
+
im_size: Tuple[int, ...],
|
270 |
+
crop_box: List[int],
|
271 |
+
orig_size: Tuple[int, ...],
|
272 |
+
) -> MaskData:
|
273 |
+
orig_h, orig_w = orig_size
|
274 |
+
|
275 |
+
# Run model on this batch
|
276 |
+
transformed_points = self.predictor.transform.apply_coords(points, im_size)
|
277 |
+
in_points = torch.as_tensor(transformed_points, device=self.predictor.device)
|
278 |
+
in_labels = torch.ones(in_points.shape[0], dtype=torch.int, device=in_points.device)
|
279 |
+
masks, iou_preds, _ = self.predictor.predict_torch(
|
280 |
+
in_points[:, None, :],
|
281 |
+
in_labels[:, None],
|
282 |
+
multimask_output=True,
|
283 |
+
return_logits=True,
|
284 |
+
)
|
285 |
+
|
286 |
+
# Serialize predictions and store in MaskData
|
287 |
+
data = MaskData(
|
288 |
+
masks=masks.flatten(0, 1),
|
289 |
+
iou_preds=iou_preds.flatten(0, 1),
|
290 |
+
points=torch.as_tensor(points.repeat(masks.shape[1], axis=0)),
|
291 |
+
)
|
292 |
+
del masks
|
293 |
+
|
294 |
+
# Filter by predicted IoU
|
295 |
+
if self.pred_iou_thresh > 0.0:
|
296 |
+
keep_mask = data["iou_preds"] > self.pred_iou_thresh
|
297 |
+
data.filter(keep_mask)
|
298 |
+
|
299 |
+
# Calculate stability score
|
300 |
+
data["stability_score"] = calculate_stability_score(
|
301 |
+
data["masks"], self.predictor.model.mask_threshold, self.stability_score_offset
|
302 |
+
)
|
303 |
+
if self.stability_score_thresh > 0.0:
|
304 |
+
keep_mask = data["stability_score"] >= self.stability_score_thresh
|
305 |
+
data.filter(keep_mask)
|
306 |
+
|
307 |
+
# Threshold masks and calculate boxes
|
308 |
+
data["masks"] = data["masks"] > self.predictor.model.mask_threshold
|
309 |
+
data["boxes"] = batched_mask_to_box(data["masks"])
|
310 |
+
|
311 |
+
# Filter boxes that touch crop boundaries
|
312 |
+
keep_mask = ~is_box_near_crop_edge(data["boxes"], crop_box, [0, 0, orig_w, orig_h])
|
313 |
+
if not torch.all(keep_mask):
|
314 |
+
data.filter(keep_mask)
|
315 |
+
|
316 |
+
# Compress to RLE
|
317 |
+
data["masks"] = uncrop_masks(data["masks"], crop_box, orig_h, orig_w)
|
318 |
+
data["rles"] = mask_to_rle_pytorch(data["masks"])
|
319 |
+
del data["masks"]
|
320 |
+
|
321 |
+
return data
|
322 |
+
|
323 |
+
@staticmethod
|
324 |
+
def postprocess_small_regions(
|
325 |
+
mask_data: MaskData, min_area: int, nms_thresh: float
|
326 |
+
) -> MaskData:
|
327 |
+
"""
|
328 |
+
Removes small disconnected regions and holes in masks, then reruns
|
329 |
+
box NMS to remove any new duplicates.
|
330 |
+
|
331 |
+
Edits mask_data in place.
|
332 |
+
|
333 |
+
Requires open-cv as a dependency.
|
334 |
+
"""
|
335 |
+
if len(mask_data["rles"]) == 0:
|
336 |
+
return mask_data
|
337 |
+
|
338 |
+
# Filter small disconnected regions and holes
|
339 |
+
new_masks = []
|
340 |
+
scores = []
|
341 |
+
for rle in mask_data["rles"]:
|
342 |
+
mask = rle_to_mask(rle)
|
343 |
+
|
344 |
+
mask, changed = remove_small_regions(mask, min_area, mode="holes")
|
345 |
+
unchanged = not changed
|
346 |
+
mask, changed = remove_small_regions(mask, min_area, mode="islands")
|
347 |
+
unchanged = unchanged and not changed
|
348 |
+
|
349 |
+
new_masks.append(torch.as_tensor(mask).unsqueeze(0))
|
350 |
+
# Give score=0 to changed masks and score=1 to unchanged masks
|
351 |
+
# so NMS will prefer ones that didn't need postprocessing
|
352 |
+
scores.append(float(unchanged))
|
353 |
+
|
354 |
+
# Recalculate boxes and remove any new duplicates
|
355 |
+
masks = torch.cat(new_masks, dim=0)
|
356 |
+
boxes = batched_mask_to_box(masks)
|
357 |
+
keep_by_nms = batched_nms(
|
358 |
+
boxes.float(),
|
359 |
+
torch.as_tensor(scores),
|
360 |
+
torch.zeros_like(boxes[:, 0]), # categories
|
361 |
+
iou_threshold=nms_thresh,
|
362 |
+
)
|
363 |
+
|
364 |
+
# Only recalculate RLEs for masks that have changed
|
365 |
+
for i_mask in keep_by_nms:
|
366 |
+
if scores[i_mask] == 0.0:
|
367 |
+
mask_torch = masks[i_mask].unsqueeze(0)
|
368 |
+
mask_data["rles"][i_mask] = mask_to_rle_pytorch(mask_torch)[0]
|
369 |
+
mask_data["boxes"][i_mask] = boxes[i_mask] # update res directly
|
370 |
+
mask_data.filter(keep_by_nms)
|
371 |
+
|
372 |
+
return mask_data
|
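A minimal usage sketch for the generator above, matching the generate() docstring; the checkpoint filename mirrors the weight preloaded in README.md, and the image path is a placeholder:

    from skimage import io
    from SAM.segment_anything import sam_model_registry, SamAutomaticMaskGenerator

    sam = sam_model_registry["vit_l"](checkpoint="sam_vit_l_0b3195.pth")  # assumed local path
    generator = SamAutomaticMaskGenerator(sam, points_per_side=32, pred_iou_thresh=0.88)
    image = io.imread("example.jpg")          # HWC uint8, as generate() expects
    masks = generator.generate(image)         # list of dicts: segmentation, bbox, predicted_iou, ...
    print(len(masks), masks[0]["bbox"], masks[0]["predicted_iou"])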
SAM/segment_anything/build_sam.py
ADDED
@@ -0,0 +1,111 @@
|
1 |
+
# Copyright (c) Meta Platforms, Inc. and affiliates.
|
2 |
+
# All rights reserved.
|
3 |
+
|
4 |
+
# This source code is licensed under the license found in the
|
5 |
+
# LICENSE file in the root directory of this source tree.
|
6 |
+
|
7 |
+
import torch
|
8 |
+
|
9 |
+
from functools import partial
|
10 |
+
|
11 |
+
from .modeling import ImageEncoderViT, MaskDecoder, PromptEncoder, Sam, TwoWayTransformer
|
12 |
+
|
13 |
+
|
14 |
+
def build_sam_vit_h(checkpoint=None, device="cpu"):
|
15 |
+
return _build_sam(
|
16 |
+
encoder_embed_dim=1280,
|
17 |
+
encoder_depth=32,
|
18 |
+
encoder_num_heads=16,
|
19 |
+
encoder_global_attn_indexes=[7, 15, 23, 31],
|
20 |
+
checkpoint=checkpoint,
|
21 |
+
device=device,
|
22 |
+
)
|
23 |
+
|
24 |
+
|
25 |
+
build_sam = build_sam_vit_h
|
26 |
+
|
27 |
+
|
28 |
+
def build_sam_vit_l(checkpoint=None, device="cpu"):
|
29 |
+
return _build_sam(
|
30 |
+
encoder_embed_dim=1024,
|
31 |
+
encoder_depth=24,
|
32 |
+
encoder_num_heads=16,
|
33 |
+
encoder_global_attn_indexes=[5, 11, 17, 23],
|
34 |
+
checkpoint=checkpoint,
|
35 |
+
device=device,
|
36 |
+
)
|
37 |
+
|
38 |
+
|
39 |
+
def build_sam_vit_b(checkpoint=None, device="cpu"):
|
40 |
+
return _build_sam(
|
41 |
+
encoder_embed_dim=768,
|
42 |
+
encoder_depth=12,
|
43 |
+
encoder_num_heads=12,
|
44 |
+
encoder_global_attn_indexes=[2, 5, 8, 11],
|
45 |
+
checkpoint=checkpoint,
|
46 |
+
device=device,
|
47 |
+
)
|
48 |
+
|
49 |
+
|
50 |
+
sam_model_registry = {
|
51 |
+
"default": build_sam_vit_h,
|
52 |
+
"vit_h": build_sam_vit_h,
|
53 |
+
"vit_l": build_sam_vit_l,
|
54 |
+
"vit_b": build_sam_vit_b,
|
55 |
+
}
|
56 |
+
|
57 |
+
|
58 |
+
def _build_sam(
|
59 |
+
encoder_embed_dim,
|
60 |
+
encoder_depth,
|
61 |
+
encoder_num_heads,
|
62 |
+
encoder_global_attn_indexes,
|
63 |
+
checkpoint=None,
|
64 |
+
device="cpu"
|
65 |
+
):
|
66 |
+
prompt_embed_dim = 256
|
67 |
+
image_size = 1024
|
68 |
+
vit_patch_size = 16
|
69 |
+
image_embedding_size = image_size // vit_patch_size
|
70 |
+
sam = Sam(
|
71 |
+
image_encoder=ImageEncoderViT(
|
72 |
+
depth=encoder_depth,
|
73 |
+
embed_dim=encoder_embed_dim,
|
74 |
+
img_size=image_size,
|
75 |
+
mlp_ratio=4,
|
76 |
+
norm_layer=partial(torch.nn.LayerNorm, eps=1e-6),
|
77 |
+
num_heads=encoder_num_heads,
|
78 |
+
patch_size=vit_patch_size,
|
79 |
+
qkv_bias=True,
|
80 |
+
use_rel_pos=True,
|
81 |
+
global_attn_indexes=encoder_global_attn_indexes,
|
82 |
+
window_size=14,
|
83 |
+
out_chans=prompt_embed_dim,
|
84 |
+
),
|
85 |
+
prompt_encoder=PromptEncoder(
|
86 |
+
embed_dim=prompt_embed_dim,
|
87 |
+
image_embedding_size=(image_embedding_size, image_embedding_size),
|
88 |
+
input_image_size=(image_size, image_size),
|
89 |
+
mask_in_chans=16,
|
90 |
+
),
|
91 |
+
mask_decoder=MaskDecoder(
|
92 |
+
num_multimask_outputs=3,
|
93 |
+
transformer=TwoWayTransformer(
|
94 |
+
depth=2,
|
95 |
+
embedding_dim=prompt_embed_dim,
|
96 |
+
mlp_dim=2048,
|
97 |
+
num_heads=8,
|
98 |
+
),
|
99 |
+
transformer_dim=prompt_embed_dim,
|
100 |
+
iou_head_depth=3,
|
101 |
+
iou_head_hidden_dim=256,
|
102 |
+
),
|
103 |
+
pixel_mean=[123.675, 116.28, 103.53],
|
104 |
+
pixel_std=[58.395, 57.12, 57.375],
|
105 |
+
)
|
106 |
+
sam.eval()
|
107 |
+
if checkpoint is not None:
|
108 |
+
with open(checkpoint, "rb") as f:
|
109 |
+
state_dict = torch.load(f, map_location=device)
|
110 |
+
sam.load_state_dict(state_dict)
|
111 |
+
return sam
|
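Unlike upstream segment-anything, the builders in this file take a device argument, but it is only used as map_location when reading the checkpoint; the module itself is still constructed on CPU, so move it explicitly. A sketch:

    import torch
    from SAM.segment_anything.build_sam import sam_model_registry

    device = "cuda" if torch.cuda.is_available() else "cpu"
    sam = sam_model_registry["vit_l"](checkpoint="sam_vit_l_0b3195.pth", device=device)
    sam.to(device)   # load_state_dict copies into CPU-built params, so move the module too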
SAM/segment_anything/modeling/__init__.py
ADDED
@@ -0,0 +1,11 @@
|
1 |
+
# Copyright (c) Meta Platforms, Inc. and affiliates.
|
2 |
+
# All rights reserved.
|
3 |
+
|
4 |
+
# This source code is licensed under the license found in the
|
5 |
+
# LICENSE file in the root directory of this source tree.
|
6 |
+
|
7 |
+
from .sam import Sam
|
8 |
+
from .image_encoder import ImageEncoderViT
|
9 |
+
from .mask_decoder import MaskDecoder
|
10 |
+
from .prompt_encoder import PromptEncoder
|
11 |
+
from .transformer import TwoWayTransformer
|
SAM/segment_anything/modeling/__pycache__/__init__.cpython-311.pyc
ADDED
Binary file (516 Bytes).
|
SAM/segment_anything/modeling/__pycache__/common.cpython-311.pyc
ADDED
Binary file (3.24 kB).
|