yuwenz committed on
Commit
fff851d
·
1 Parent(s): 4a111ae

upload int8 onnx model


Signed-off-by: yuwenzho <[email protected]>

Files changed (2)
  1. README.md +29 -3
  2. model.onnx +3 -0
README.md CHANGED

@@ -4,6 +4,7 @@ tags:
 - int8
 - Intel® Neural Compressor
 - PostTrainingStatic
+- onnx
 datasets:
 - squad
 metrics:
@@ -12,7 +13,9 @@ metrics:
 
 # INT8 DistilBERT base uncased finetuned on Squad
 
-### Post-training static quantization
+## Post-training static quantization
+
+### PyTorch
 
 This is an INT8 PyTorch model quantized with [huggingface/optimum-intel](https://github.com/huggingface/optimum-intel) through the usage of [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
 
@@ -22,14 +25,14 @@ The calibration dataloader is the train dataloader. The default calibration samp
 
 The linear module **distilbert.transformer.layer.1.ffn.lin2** falls back to fp32 to meet the 1% relative accuracy loss.
 
-### Test result
+#### Test result
 
 | |INT8|FP32|
 |---|:---:|:---:|
 | **Accuracy (eval-f1)** |86.1069|86.8374|
 | **Model size (MB)** |74.7|265|
 
-### Load with optimum:
+#### Load with optimum:
 
 ```python
 from optimum.intel.neural_compressor.quantization import IncQuantizedModelForQuestionAnswering
@@ -37,3 +40,26 @@ int8_model = IncQuantizedModelForQuestionAnswering(
 'Intel/distilbert-base-uncased-distilled-squad-int8-static',
 )
 ```
+
+### ONNX
+
+This is an INT8 ONNX model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
+
+The original fp32 model comes from the fine-tuned model [distilbert-base-uncased-distilled-squad](https://huggingface.co/distilbert-base-uncased-distilled-squad).
+
+The calibration dataloader is the eval dataloader. The default calibration sampling size is 100.
+
+#### Test result
+
+| |INT8|FP32|
+|---|:---:|:---:|
+| **Accuracy (eval-f1)** |0.8626|0.8687|
+| **Model size (MB)** |153|254|
+
+
+#### Load ONNX model:
+
+```python
+from optimum.onnxruntime import ORTModelForQuestionAnswering
+model = ORTModelForQuestionAnswering.from_pretrained('Intel/distilbert-base-uncased-distilled-squad-int8-static')
+```
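The README's fallback criterion (the **lin2** module kept in fp32 to stay within a 1% relative accuracy loss) can be sanity-checked against the eval-f1 numbers reported in the two tables above. A minimal sketch, using only the figures from the diff:

```python
# Check that both quantized models stay within the 1% relative
# accuracy-loss target, using the eval-f1 numbers from the README tables.

def relative_loss(fp32: float, int8: float) -> float:
    """Relative accuracy drop of the INT8 model vs. the FP32 baseline."""
    return (fp32 - int8) / fp32

pytorch_loss = relative_loss(86.8374, 86.1069)  # PyTorch table (eval-f1 in %)
onnx_loss = relative_loss(0.8687, 0.8626)       # ONNX table (eval-f1 as fraction)

print(f"PyTorch INT8 relative loss: {pytorch_loss:.2%}")
print(f"ONNX INT8 relative loss:    {onnx_loss:.2%}")
```

Both drops come out well under the 1% budget (roughly 0.84% for the PyTorch model and 0.70% for the ONNX model), consistent with the stated accuracy criterion.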
model.onnx ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:28d8b3b532576ce491b7b94a4a5df32e2a763d1e03b04e6c2af32e7494cb5d41
+size 159403455
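The blob committed for model.onnx is a Git LFS pointer (the three `key value` lines above), not the raw ONNX bytes; the actual weights live in LFS storage keyed by the `oid`. The pointer format can be parsed with a short sketch (the helper name here is illustrative, not part of the repo):

```python
# Parse a Git LFS pointer file, i.e. the three-line blob stored for
# model.onnx, into its key/value fields. Helper name is hypothetical.

def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:28d8b3b532576ce491b7b94a4a5df32e2a763d1e03b04e6c2af32e7494cb5d41
size 159403455
"""

info = parse_lfs_pointer(pointer)
print(info["oid"])                    # sha256 digest of the stored file
print(int(info["size"]) / 1e6, "MB")  # ~159.4 MB on disk
```

The `size` field (159,403,455 bytes) matches the ~153 MB INT8 model size reported in the README's ONNX table.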