---
license: Non-commercial use
tags:
- vision
- deep-stereo
datasets:
- flyingthings-3d
- kitti
---

# MADNet Keras

MADNet is a deep stereo depth estimation model. Its key defining features are:
1. It has a lightweight architecture, which gives it low latency.
2. It supports self-supervised training, so it can be conveniently adapted in the field with no training data.
3. It is a stereo depth model, which makes it capable of much higher accuracy than monocular depth techniques.

The MADNet weights in this repository were trained using a TensorFlow 2 / Keras implementation of the original code. The model was created using the Keras Functional API, which enables the following features:
1. Good optimization out of the box.
2. High-level Keras methods (.fit, .predict and .evaluate).
3. Less boilerplate code.
4. Decent support from external packages (like Weights and Biases).
5. Callbacks.

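To illustrate those high-level methods, here is a minimal Keras Functional API sketch. It is a toy two-input model, not the actual MADNet architecture; the layer choices and shapes are purely illustrative:

```python
# A minimal Keras Functional API sketch (NOT the real MADNet
# architecture): two image inputs, a shared conv layer, and a
# single-channel "disparity"-style output, showing the high-level
# .predict workflow that functional models get for free.
import numpy as np
import tensorflow as tf
from tensorflow import keras

left = keras.Input(shape=(480, 640, 3), name="left")
right = keras.Input(shape=(480, 640, 3), name="right")

# One conv layer shared between both views (weight sharing).
conv = keras.layers.Conv2D(8, 3, padding="same", activation="relu")
features = keras.layers.Concatenate()([conv(left), conv(right)])
disparity = keras.layers.Conv2D(1, 3, padding="same", name="disparity")(features)

model = keras.Model(inputs=[left, right], outputs=disparity)

# .predict works out of the box on a list of input batches.
batch = [np.zeros((1, 480, 640, 3), np.float32)] * 2
pred = model.predict(batch, verbose=0)
print(pred.shape)  # (1, 480, 640, 1)
```

The same model object would also support `.fit` with a `(inputs, targets)` dataset and `.evaluate`, plus the callbacks mentioned above.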
The weights provided were trained on either the 2012 / 2015 KITTI stereo datasets or the flyingthings-3d dataset. The weights of the pretrained models from the original paper (tf1_conversion_kitti.h5 and tf1_conversion_synthetic.h5) are provided in TensorFlow 2 format. The TF1 weights help speed up fine-tuning, but it is recommended to use either synthetic.h5 (trained on flyingthings-3d) or kitti.h5 (trained on the 2012 and 2015 KITTI stereo datasets).

**Abstract**:

Deep convolutional neural networks trained end-to-end are the undisputed state-of-the-art methods to regress dense disparity maps directly from stereo pairs. However, such methods suffer from notable accuracy drops when exposed to scenarios significantly different from those seen in the training phase (e.g. real vs synthetic images, indoor vs outdoor, etc.). As it is unlikely to be able to gather enough samples to achieve effective training/tuning in any target domain, we propose to perform unsupervised and continuous online adaptation of a deep stereo network in order to preserve its accuracy independently of the sensed environment. However, such a strategy can be extremely demanding regarding computational resources and thus not enabling real-time performance. Therefore, we address this side effect by introducing a new lightweight, yet effective, deep stereo architecture, Modularly ADaptive Network (MADNet), and by developing Modular ADaptation (MAD), an algorithm to train independently only sub-portions of our model. By deploying MADNet together with MAD we propose the first ever real-time self-adaptive deep stereo system.

## Usage Instructions
See the accompanying code's readme for details on how to perform training and inference with the model: [madnet-deep-stereo-with-keras](https://github.com/ChristianOrr/madnet-deep-stereo-with-keras).

## Training
### TF1 Kitti and TF1 Synthetic
Training details for the TF1 weights are available in the supplementary material (at the end) of this paper: [Real-time self-adaptive deep stereo](https://arxiv.org/abs/1810.05424)

### Synthetic
The synthetic model was fine-tuned from the TF1 synthetic weights. It was trained on the flyingthings-3d dataset with the following parameters:
- Steps: 1.5 million
- Learning Rate: 0.0001
- Decay Rate: 0.999
- Minimum Learning Rate Cap: 0.000001
- Batch Size: 1
- Optimizer: Adam
- Image Height: 480
- Image Width: 640

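The learning-rate parameters above describe an exponentially decaying schedule with a lower bound. A minimal sketch of such a schedule is below; note that the interval at which the decay rate is applied is not stated here, so the once-per-1000-steps choice is an assumption, not taken from the training code:

```python
# Sketch of an exponentially decaying learning rate with a floor,
# using the parameters listed above. The decay interval (once per
# 1000 steps) is an assumption for illustration only.
def decayed_lr(step, initial_lr=0.0001, decay_rate=0.999,
               min_lr=0.000001, decay_every=1000):
    """Return the learning rate for a given training step."""
    decays = step // decay_every
    return max(initial_lr * decay_rate ** decays, min_lr)

print(decayed_lr(0))          # 0.0001 (initial learning rate)
print(decayed_lr(1_500_000))  # decayed rate after 1.5 million steps
```

The minimum-cap behaves as a floor: once the decayed value would drop below it, the schedule holds at the cap instead.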
### Kitti
The KITTI model was fine-tuned from the synthetic weights. A TensorBoard events file is available in the logs directory. It was trained on the 2012 and 2015 KITTI stereo datasets with the following parameters:
- Steps: 0.5 million
- Learning Rate: 0.0001
- Decay Rate: 0.999
- Minimum Learning Rate Cap: 0.0000001
- Batch Size: 1
- Optimizer: Adam
- Image Height: 480
- Image Width: 640

### BibTeX entry and citation info

```bibtex
@InProceedings{Tonioni_2019_CVPR,
  author = {Tonioni, Alessio and Tosi, Fabio and Poggi, Matteo and Mattoccia, Stefano and Di Stefano, Luigi},
  title = {Real-time self-adaptive deep stereo},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2019}
}
```

```bibtex
@article{Poggi2021continual,
  author = {Poggi, Matteo and Tonioni, Alessio and Tosi, Fabio and Mattoccia, Stefano and Di Stefano, Luigi},
  title = {Continual Adaptation for Deep Stereo},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year = {2021}
}
```

```bibtex
@InProceedings{MIFDB16,
  author = "N. Mayer and E. Ilg and P. Hausser and P. Fischer and D. Cremers and A. Dosovitskiy and T. Brox",
  title = "A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation",
  booktitle = "IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)",
  year = "2016",
  note = "arXiv:1512.02134",
  url = "http://lmb.informatik.uni-freiburg.de/Publications/2016/MIFDB16"
}
```

```bibtex
@INPROCEEDINGS{Geiger2012CVPR,
  author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
  title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2012}
}
```

```bibtex
@INPROCEEDINGS{Menze2015CVPR,
  author = {Moritz Menze and Andreas Geiger},
  title = {Object Scene Flow for Autonomous Vehicles},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2015}
}
```