timm/lamhalobotnet50ts_256.a1h_in1k
rwightman committed · Commit 978d3c4 · 1 Parent(s): 5221417
Files changed (4)
  1. README.md +191 -0
  2. config.json +40 -0
  3. model.safetensors +3 -0
  4. pytorch_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,191 @@
+ ---
+ tags:
+ - image-classification
+ - timm
+ library_name: timm
+ license: apache-2.0
+ datasets:
+ - imagenet-1k
+ ---
+ # Model card for lamhalobotnet50ts_256.a1h_in1k
+
+ A Lambda+Halo+BoTNet image classification model (based on the ResNet architecture). Trained on ImageNet-1k in `timm` by Ross Wightman.
+
+ NOTE: this model does not adhere to any specific paper configuration; it was tuned for reasonable training times and a reduced frequency of self-attention blocks.
+
+ Recipe details (see the sketch after this list):
+ * Based on the [ResNet Strikes Back](https://arxiv.org/abs/2110.00476) `A1` recipe
+ * LAMB optimizer
+ * Stronger dropout, stochastic depth, and RandAugment than the paper's `A1` recipe
+ * Cosine LR schedule with warmup
+
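+ As a rough illustration, these recipe ingredients map onto `timm`'s optimizer and scheduler factories as below. A minimal sketch only: the hyperparameter values are placeholders, not the published `a1h` settings.
+
+ ```python
+ import timm
+ from timm.optim import create_optimizer_v2
+ from timm.scheduler import CosineLRScheduler
+
+ # stochastic depth is set via drop_path_rate at model creation
+ model = timm.create_model('lamhalobotnet50ts_256', drop_path_rate=0.05)
+
+ # LAMB optimizer via timm's optimizer factory
+ optimizer = create_optimizer_v2(model, opt='lamb', lr=8e-3, weight_decay=0.02)
+
+ # cosine LR schedule with warmup (epoch counts are illustrative)
+ scheduler = CosineLRScheduler(optimizer, t_initial=300, warmup_t=5, warmup_lr_init=1e-5)
+ ```
+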
+ This model architecture is implemented using `timm`'s flexible [BYOBNet (Bring-Your-Own-Blocks Network)](https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/byobnet.py).
+
+ BYOB (with BYOANet attention-specific blocks) allows configuration of:
+ * block / stage layout
+ * block-type interleaving
+ * stem layout
+ * output stride (dilation)
+ * activation and norm layers
+ * channel and spatial / self-attention layers
+
+ ...and also includes `timm` features common to many other architectures (see the sketch after this list), including:
+ * stochastic depth
+ * gradient checkpointing
+ * layer-wise LR decay
+ * per-stage feature extraction
+
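+ A minimal sketch of how a few of these features are exposed; argument values are illustrative, not this checkpoint's training configuration:
+
+ ```python
+ import timm
+
+ # stochastic depth: drop_path_rate is a create_model argument
+ model = timm.create_model('lamhalobotnet50ts_256', drop_path_rate=0.1)
+
+ # gradient checkpointing: trade recompute for activation memory
+ model.set_grad_checkpointing(True)
+
+ # per-stage feature extraction: return selected intermediate feature maps
+ backbone = timm.create_model('lamhalobotnet50ts_256', features_only=True, out_indices=(2, 3, 4))
+ print(backbone.feature_info.channels())  # channel counts for the selected stages
+ ```
+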
+ ## Model Details
+ - **Model Type:** Image classification / feature backbone
+ - **Model Stats:**
+   - Params (M): 22.6
+   - GMACs: 5.0
+   - Activations (M): 18.4
+   - Image size: 256 x 256
+ - **Papers:**
+   - LambdaNetworks: Modeling Long-Range Interactions Without Attention: https://arxiv.org/abs/2102.08602
+   - Scaling Local Self-Attention for Parameter Efficient Visual Backbones: https://arxiv.org/abs/2103.12731
+   - Bottleneck Transformers for Visual Recognition: https://arxiv.org/abs/2101.11605
+   - ResNet strikes back: An improved training procedure in timm: https://arxiv.org/abs/2110.00476
+ - **Dataset:** ImageNet-1k
+
+ ## Model Usage
+ ### Image Classification
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+ import torch
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model('lamhalobotnet50ts_256.a1h_in1k', pretrained=True)
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
+
+ top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
+ ```
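+
+ The two tensors returned by `torch.topk` can be read out directly; a small follow-on sketch:
+
+ ```python
+ # report the top-5 predictions for the single image in the batch
+ for prob, idx in zip(top5_probabilities[0], top5_class_indices[0]):
+     print(f'class index {idx.item()}: {prob.item():.2f}%')
+ ```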
+
+ ### Feature Map Extraction
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model(
+     'lamhalobotnet50ts_256.a1h_in1k',
+     pretrained=True,
+     features_only=True,
+ )
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
+
+ for o in output:
+     # print shape of each feature map in output
+     # e.g.:
+     #  torch.Size([1, 32, 128, 128])
+     #  torch.Size([1, 256, 64, 64])
+     #  torch.Size([1, 512, 32, 32])
+     #  torch.Size([1, 1024, 16, 16])
+     #  torch.Size([1, 2048, 8, 8])
+     print(o.shape)
+ ```
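+
+ With `features_only=True`, the same stage signature can be queried without a forward pass; a minimal sketch using the model created above (values match the shapes printed there):
+
+ ```python
+ # feature_info describes each returned feature map
+ print(model.feature_info.channels())   # [32, 256, 512, 1024, 2048]
+ print(model.feature_info.reduction())  # [2, 4, 8, 16, 32]
+ ```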
+
+ ### Image Embeddings
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model(
+     'lamhalobotnet50ts_256.a1h_in1k',
+     pretrained=True,
+     num_classes=0,  # remove classifier nn.Linear
+ )
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor
+
+ # or equivalently (without needing to set num_classes=0)
+ output = model.forward_features(transforms(img).unsqueeze(0))
+ # output is unpooled, a (1, 2048, 8, 8) shaped tensor
+
+ output = model.forward_head(output, pre_logits=True)
+ # output is a (1, num_features) shaped tensor
+ ```
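+
+ The pooled embedding can be used directly as an image descriptor. A small usage sketch, assuming `emb_a` and `emb_b` are hypothetical (1, num_features) embeddings produced as above for two images:
+
+ ```python
+ import torch.nn.functional as F
+
+ # cosine similarity between two pooled embeddings (emb_a, emb_b are placeholders)
+ similarity = F.cosine_similarity(emb_a, emb_b)
+ print(similarity.item())
+ ```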
+
+ ## Model Comparison
+ Explore the dataset and runtime metrics of this model in timm [model results](https://github.com/huggingface/pytorch-image-models/tree/main/results).
+
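+ Related attention-backbone checkpoints can also be enumerated programmatically; a small sketch:
+
+ ```python
+ import timm
+
+ # list pretrained variants with halo or bottleneck self-attention blocks
+ print(timm.list_models('*halo*', pretrained=True))
+ print(timm.list_models('*botnet*', pretrained=True))
+ ```
+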
+ ## Citation
+ ```bibtex
+ @misc{rw2019timm,
+   author = {Ross Wightman},
+   title = {PyTorch Image Models},
+   year = {2019},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   doi = {10.5281/zenodo.4414861},
+   howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
+ }
+ ```
+ ```bibtex
+ @article{Bello2021LambdaNetworksML,
+   title={LambdaNetworks: Modeling Long-Range Interactions Without Attention},
+   author={Irwan Bello},
+   journal={ArXiv},
+   year={2021},
+   volume={abs/2102.08602}
+ }
+ ```
+ ```bibtex
+ @article{Vaswani2021ScalingLS,
+   title={Scaling Local Self-Attention for Parameter Efficient Visual Backbones},
+   author={Ashish Vaswani and Prajit Ramachandran and A. Srinivas and Niki Parmar and Blake A. Hechtman and Jonathon Shlens},
+   journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+   year={2021},
+   pages={12889-12899}
+ }
+ ```
+ ```bibtex
+ @article{Srinivas2021BottleneckTF,
+   title={Bottleneck Transformers for Visual Recognition},
+   author={A. Srinivas and Tsung-Yi Lin and Niki Parmar and Jonathon Shlens and P. Abbeel and Ashish Vaswani},
+   journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+   year={2021},
+   pages={16514-16524}
+ }
+ ```
+ ```bibtex
+ @inproceedings{wightman2021resnet,
+   title={ResNet strikes back: An improved training procedure in timm},
+   author={Wightman, Ross and Touvron, Hugo and Jegou, Herve},
+   booktitle={NeurIPS 2021 Workshop on ImageNet: Past, Present, and Future}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,40 @@
+ {
+     "architecture": "lamhalobotnet50ts_256",
+     "num_classes": 1000,
+     "num_features": 2048,
+     "pretrained_cfg": {
+         "tag": "a1h_in1k",
+         "custom_load": false,
+         "input_size": [
+             3,
+             256,
+             256
+         ],
+         "min_input_size": [
+             3,
+             224,
+             224
+         ],
+         "fixed_input_size": true,
+         "interpolation": "bicubic",
+         "crop_pct": 0.95,
+         "crop_mode": "center",
+         "mean": [
+             0.485,
+             0.456,
+             0.406
+         ],
+         "std": [
+             0.229,
+             0.224,
+             0.225
+         ],
+         "num_classes": 1000,
+         "pool_size": [
+             8,
+             8
+         ],
+         "first_conv": "stem.conv1.conv",
+         "classifier": "head.fc"
+     }
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a9a8238bba6caee41459062af11610701b0c0d61be195e059a0df24b0b765277
+ size 90531036
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4fd62f985cdbb7bb40e075d0d2dd458b3a350adb47812dc9039f9f3cf64e82cd
+ size 90626633