Mungert committed on · Commit e2edd48 · verified · 1 Parent(s): 7b33e53

Upload README.md with huggingface_hub

README.md ADDED
@@ -0,0 +1,882 @@
1
+ ---
2
+ pipeline_tag: text-generation
3
+ inference: true
4
+ widget:
5
+ - text: 'def print_hello_world():'
6
+ example_title: Hello world
7
+ group: Python
8
+ license: bigscience-openrail-m
9
+ pretrain-datasets:
10
+ - books
11
+ - arxiv
12
+ - c4
13
+ - falcon-refinedweb
14
+ - wiki
15
+ - github-issues
16
+ - stack_markdown
17
+ - self-made dataset of permissive github code
18
+ datasets:
19
+ - bigcode/the-stack-dedup
20
+ - rombodawg/2XUNCENSORED_MegaCodeTraining188k
21
+ - bigcode/commitpackft
22
+ metrics:
23
+ - code_eval
24
+ library_name: transformers
25
+ tags:
26
+ - code
27
+ model-index:
28
+ - name: Refact-1.6B
29
+ results:
30
+ - task:
31
+ type: text-generation
32
+ dataset:
33
+ type: openai_humaneval
34
+ name: HumanEval
35
+ metrics:
36
+ - name: pass@1 (T=0.01)
37
+ type: pass@1
38
+ value: 32.0
39
+ verified: false
40
+ - name: pass@1 (T=0.2)
41
+ type: pass@1
42
+ value: 31.5
43
+ verified: false
44
+ - name: pass@10 (T=0.8)
45
+ type: pass@10
46
+ value: 53.0
47
+ verified: false
48
+ - name: pass@100 (T=0.8)
49
+ type: pass@100
50
+ value: 76.9
51
+ verified: false
52
+ - task:
53
+ type: text-generation
54
+ dataset:
55
+ type: bigcode/humanevalpack
56
+ name: HumanEvalSynthesize Python
57
+ metrics:
58
+ - name: pass@1 (T=0.2)
59
+ type: pass@1
60
+ value: 35.8
61
+ verified: false
62
+ - task:
63
+ type: text-generation
64
+ dataset:
65
+ type: bigcode/humanevalpack
66
+ name: HumanEvalSynthesize JavaScript
67
+ metrics:
68
+ - name: pass@1 (T=0.2)
69
+ type: pass@1
70
+ value: 31.6
71
+ verified: false
72
+ - task:
73
+ type: text-generation
74
+ dataset:
75
+ type: bigcode/humanevalpack
76
+ name: HumanEvalSynthesize Java
77
+ metrics:
78
+ - name: pass@1 (T=0.2)
79
+ type: pass@1
80
+ value: 29.1
81
+ verified: false
82
+ - task:
83
+ type: text-generation
84
+ dataset:
85
+ type: bigcode/humanevalpack
86
+ name: HumanEvalSynthesize Go
87
+ metrics:
88
+ - name: pass@1 (T=0.2)
89
+ type: pass@1
90
+ value: -1
91
+ verified: false
92
+ - task:
93
+ type: text-generation
94
+ dataset:
95
+ type: bigcode/humanevalpack
96
+ name: HumanEvalSynthesize C++
97
+ metrics:
98
+ - name: pass@1 (T=0.2)
99
+ type: pass@1
100
+ value: 26.3
101
+ verified: false
102
+ - task:
103
+ type: text-generation
104
+ dataset:
105
+ type: bigcode/humanevalpack
106
+ name: HumanEvalSynthesize Rust
107
+ metrics:
108
+ - name: pass@1 (T=0.2)
109
+ type: pass@1
110
+ value: -1
111
+ verified: false
112
+ - task:
113
+ type: text-generation
114
+ dataset:
115
+ type: bigcode/humanevalpack
116
+ name: HumanEvalSynthesize Average
117
+ metrics:
118
+ - name: pass@1 (T=0.2)
119
+ type: pass@1
120
+ value: -1
121
+ verified: false
122
+
123
+
124
+
125
+
126
+
127
+ - task:
128
+ type: text-generation
129
+ dataset:
130
+ type: bigcode/humanevalpack
131
+ name: HumanEvalFixTests Python
132
+ metrics:
133
+ - name: pass@1 (T=0.2)
134
+ type: pass@1
135
+ value: 18.38
136
+ verified: false
137
+ - task:
138
+ type: text-generation
139
+ dataset:
140
+ type: bigcode/humanevalpack
141
+ name: HumanEvalFixTests JavaScript
142
+ metrics:
143
+ - name: pass@1 (T=0.2)
144
+ type: pass@1
145
+ value: 12.28
146
+ verified: false
147
+ - task:
148
+ type: text-generation
149
+ dataset:
150
+ type: bigcode/humanevalpack
151
+ name: HumanEvalFixTests Java
152
+ metrics:
153
+ - name: pass@1 (T=0.2)
154
+ type: pass@1
155
+ value: 15.12
156
+ verified: false
157
+ - task:
158
+ type: text-generation
159
+ dataset:
160
+ type: bigcode/humanevalpack
161
+ name: HumanEvalFixTests Go
162
+ metrics:
163
+ - name: pass@1 (T=0.2)
164
+ type: pass@1
165
+ value: -1
166
+ verified: false
167
+ - task:
168
+ type: text-generation
169
+ dataset:
170
+ type: bigcode/humanevalpack
171
+ name: HumanEvalFixTests C++
172
+ metrics:
173
+ - name: pass@1 (T=0.2)
174
+ type: pass@1
175
+ value: 13.17
176
+ verified: false
177
+ - task:
178
+ type: text-generation
179
+ dataset:
180
+ type: bigcode/humanevalpack
181
+ name: HumanEvalFixTests Rust
182
+ metrics:
183
+ - name: pass@1 (T=0.2)
184
+ type: pass@1
185
+ value: 2.8
186
+ verified: false
187
+ - task:
188
+ type: text-generation
189
+ dataset:
190
+ type: bigcode/humanevalpack
191
+ name: HumanEvalFixTests Average
192
+ metrics:
193
+ - name: pass@1 (T=0.2)
194
+ type: pass@1
195
+ value: -1
196
+ verified: false
197
+
198
+
199
+
200
+
201
+
202
+
203
+ - task:
204
+ type: text-generation
205
+ dataset:
206
+ type: bigcode/humanevalpack
207
+ name: HumanEvalFixDocs Python
208
+ metrics:
209
+ - name: pass@1 (T=0.2)
210
+ type: pass@1
211
+ value: 26.92
212
+ verified: false
213
+ - task:
214
+ type: text-generation
215
+ dataset:
216
+ type: bigcode/humanevalpack
217
+ name: HumanEvalFixDocs JavaScript
218
+ metrics:
219
+ - name: pass@1 (T=0.2)
220
+ type: pass@1
221
+ value: 26.85
222
+ verified: false
223
+ - task:
224
+ type: text-generation
225
+ dataset:
226
+ type: bigcode/humanevalpack
227
+ name: HumanEvalFixDocs Java
228
+ metrics:
229
+ - name: pass@1 (T=0.2)
230
+ type: pass@1
231
+ value: 30.76
232
+ verified: false
233
+ - task:
234
+ type: text-generation
235
+ dataset:
236
+ type: bigcode/humanevalpack
237
+ name: HumanEvalFixDocs Go
238
+ metrics:
239
+ - name: pass@1 (T=0.2)
240
+ type: pass@1
241
+ value: -1
242
+ verified: false
243
+ - task:
244
+ type: text-generation
245
+ dataset:
246
+ type: bigcode/humanevalpack
247
+ name: HumanEvalFixDocs C++
248
+ metrics:
249
+ - name: pass@1 (T=0.2)
250
+ type: pass@1
251
+ value: 25.94
252
+ verified: false
253
+ - task:
254
+ type: text-generation
255
+ dataset:
256
+ type: bigcode/humanevalpack
257
+ name: HumanEvalFixDocs Rust
258
+ metrics:
259
+ - name: pass@1 (T=0.2)
260
+ type: pass@1
261
+ value: 8.44
262
+ verified: false
263
+ - task:
264
+ type: text-generation
265
+ dataset:
266
+ type: bigcode/humanevalpack
267
+ name: HumanEvalFixDocs Average
268
+ metrics:
269
+ - name: pass@1 (T=0.2)
270
+ type: pass@1
271
+ value: -1
272
+ verified: false
273
+
274
+
275
+
276
+
277
+ - task:
278
+ type: text-generation
279
+ dataset:
280
+ type: bigcode/humanevalpack
281
+ name: HumanEvalExplain Python
282
+ metrics:
283
+ - name: pass@1 (T=0.2)
284
+ type: pass@1
285
+ value: 26.46
286
+ verified: false
287
+ - task:
288
+ type: text-generation
289
+ dataset:
290
+ type: bigcode/humanevalpack
291
+ name: HumanEvalExplain JavaScript
292
+ metrics:
293
+ - name: pass@1 (T=0.2)
294
+ type: pass@1
295
+ value: 17.86
296
+ verified: false
297
+ - task:
298
+ type: text-generation
299
+ dataset:
300
+ type: bigcode/humanevalpack
301
+ name: HumanEvalExplain Java
302
+ metrics:
303
+ - name: pass@1 (T=0.2)
304
+ type: pass@1
305
+ value: 20.94
306
+ verified: false
307
+ - task:
308
+ type: text-generation
309
+ dataset:
310
+ type: bigcode/humanevalpack
311
+ name: HumanEvalExplain Go
312
+ metrics:
313
+ - name: pass@1 (T=0.2)
314
+ type: pass@1
315
+ value: -1
316
+ verified: false
317
+ - task:
318
+ type: text-generation
319
+ dataset:
320
+ type: bigcode/humanevalpack
321
+ name: HumanEvalExplain C++
322
+ metrics:
323
+ - name: pass@1 (T=0.2)
324
+ type: pass@1
325
+ value: 18.78
326
+ verified: false
327
+ - task:
328
+ type: text-generation
329
+ dataset:
330
+ type: bigcode/humanevalpack
331
+ name: HumanEvalExplain Rust
332
+ metrics:
333
+ - name: pass@1 (T=0.2)
334
+ type: pass@1
335
+ value: -1
336
+ verified: false
337
+ - task:
338
+ type: text-generation
339
+ dataset:
340
+ type: bigcode/humanevalpack
341
+ name: HumanEvalExplain Average
342
+ metrics:
343
+ - name: pass@1 (T=0.2)
344
+ type: pass@1
345
+ value: -1
346
+ verified: false
347
+
348
+
349
+ - task:
350
+ type: text-generation
351
+ dataset:
352
+ type: mbpp
353
+ name: MBPP
354
+ metrics:
355
+ - name: pass@1 (T=0.01)
356
+ type: pass@1
357
+ value: 31.15
358
+ verified: false
359
+ - task:
360
+ type: text-generation
361
+ dataset:
362
+ type: ds1000
363
+ name: DS-1000 (Overall Completion)
364
+ metrics:
365
+ - name: pass@1 (T=0.2)
366
+ type: pass@1
367
+ value: 10.1
368
+ verified: false
369
+ - task:
370
+ type: text-generation
371
+ dataset:
372
+ type: nuprl/MultiPL-E
373
+ name: MultiPL-HumanEval (C++)
374
+ metrics:
375
+ - name: pass@1 (T=0.2)
376
+ type: pass@1
377
+ value: 21.61
378
+ verified: false
379
+ - task:
380
+ type: text-generation
381
+ dataset:
382
+ type: nuprl/MultiPL-E
383
+ name: MultiPL-HumanEval (C#)
384
+ metrics:
385
+ - name: pass@1 (T=0.2)
386
+ type: pass@1
387
+ value: 13.91
388
+ verified: false
389
+ - task:
390
+ type: text-generation
391
+ dataset:
392
+ type: nuprl/MultiPL-E
393
+ name: MultiPL-HumanEval (D)
394
+ metrics:
395
+ - name: pass@1 (T=0.2)
396
+ type: pass@1
397
+ value: 9.5
398
+ verified: false
399
+ - task:
400
+ type: text-generation
401
+ dataset:
402
+ type: nuprl/MultiPL-E
403
+ name: MultiPL-HumanEval (Go)
404
+ metrics:
405
+ - name: pass@1 (T=0.2)
406
+ type: pass@1
407
+ value: 53.57
408
+ verified: false
409
+ - task:
410
+ type: text-generation
411
+ dataset:
412
+ type: nuprl/MultiPL-E
413
+ name: MultiPL-HumanEval (Java)
414
+ metrics:
415
+ - name: pass@1 (T=0.2)
416
+ type: pass@1
417
+ value: 21.58
418
+ verified: false
419
+ - task:
420
+ type: text-generation
421
+ dataset:
422
+ type: nuprl/MultiPL-E
423
+ name: MultiPL-HumanEval (Julia)
424
+ metrics:
425
+ - name: pass@1 (T=0.2)
426
+ type: pass@1
427
+ value: 13.75
428
+ verified: false
429
+ - task:
430
+ type: text-generation
431
+ dataset:
432
+ type: nuprl/MultiPL-E
433
+ name: MultiPL-HumanEval (JavaScript)
434
+ metrics:
435
+ - name: pass@1 (T=0.2)
436
+ type: pass@1
437
+ value: 26.88
438
+ verified: false
439
+ - task:
440
+ type: text-generation
441
+ dataset:
442
+ type: nuprl/MultiPL-E
443
+ name: MultiPL-HumanEval (Lua)
444
+ metrics:
445
+ - name: pass@1 (T=0.2)
446
+ type: pass@1
447
+ value: 15.26
448
+ verified: false
449
+ - task:
450
+ type: text-generation
451
+ dataset:
452
+ type: nuprl/MultiPL-E
453
+ name: MultiPL-HumanEval (PHP)
454
+ metrics:
455
+ - name: pass@1 (T=0.2)
456
+ type: pass@1
457
+ value: 23.04
458
+ verified: false
459
+ - task:
460
+ type: text-generation
461
+ dataset:
462
+ type: nuprl/MultiPL-E
463
+ name: MultiPL-HumanEval (Perl)
464
+ metrics:
465
+ - name: pass@1 (T=0.2)
466
+ type: pass@1
467
+ value: 12.1
468
+ verified: false
469
+ - task:
470
+ type: text-generation
471
+ dataset:
472
+ type: nuprl/MultiPL-E
473
+ name: MultiPL-HumanEval (Python)
474
+ metrics:
475
+ - name: pass@1 (T=0.2)
476
+ type: pass@1
477
+ value: 29.6
478
+ verified: false
479
+ - task:
480
+ type: text-generation
481
+ dataset:
482
+ type: nuprl/MultiPL-E
483
+ name: MultiPL-HumanEval (R)
484
+ metrics:
485
+ - name: pass@1 (T=0.2)
486
+ type: pass@1
487
+ value: 13.77
488
+ verified: false
489
+ - task:
490
+ type: text-generation
491
+ dataset:
492
+ type: nuprl/MultiPL-E
493
+ name: MultiPL-HumanEval (Ruby)
494
+ metrics:
495
+ - name: pass@1 (T=0.2)
496
+ type: pass@1
497
+ value: 12.68
498
+ verified: false
499
+ - task:
500
+ type: text-generation
501
+ dataset:
502
+ type: nuprl/MultiPL-E
503
+ name: MultiPL-HumanEval (Racket)
504
+ metrics:
505
+ - name: pass@1 (T=0.2)
506
+ type: pass@1
507
+ value: 4.29
508
+ verified: false
509
+ - task:
510
+ type: text-generation
511
+ dataset:
512
+ type: nuprl/MultiPL-E
513
+ name: MultiPL-HumanEval (Rust)
514
+ metrics:
515
+ - name: pass@1 (T=0.2)
516
+ type: pass@1
517
+ value: 19.54
518
+ verified: false
519
+ - task:
520
+ type: text-generation
521
+ dataset:
522
+ type: nuprl/MultiPL-E
523
+ name: MultiPL-HumanEval (Scala)
524
+ metrics:
525
+ - name: pass@1 (T=0.2)
526
+ type: pass@1
527
+ value: 18.33
528
+ verified: false
529
+ - task:
530
+ type: text-generation
531
+ dataset:
532
+ type: nuprl/MultiPL-E
533
+ name: MultiPL-HumanEval (Bash)
534
+ metrics:
535
+ - name: pass@1 (T=0.2)
536
+ type: pass@1
537
+ value: 5.7
538
+ verified: false
539
+ - task:
540
+ type: text-generation
541
+ dataset:
542
+ type: nuprl/MultiPL-E
543
+ name: MultiPL-HumanEval (Swift)
544
+ metrics:
545
+ - name: pass@1 (T=0.2)
546
+ type: pass@1
547
+ value: 17.68
548
+ verified: false
549
+ - task:
550
+ type: text-generation
551
+ dataset:
552
+ type: nuprl/MultiPL-E
553
+ name: MultiPL-HumanEval (TypeScript)
554
+ metrics:
555
+ - name: pass@1 (T=0.2)
556
+ type: pass@1
557
+ value: 25
558
+ verified: false
559
+
560
+ language:
561
+ - en
562
+ ---
563
+
564
+ # <span style="color: #7FFF7F;">Refact-1_6B-fim GGUF Models</span>
565
+
566
+ ## **Choosing the Right Model Format**
567
+
568
+ Selecting the correct model format depends on your **hardware capabilities** and **memory constraints**.
569
+
570
+ ### **BF16 (Brain Float 16) – Use if BF16 acceleration is available**
571
+ - A 16-bit floating-point format designed for **faster computation** while retaining good precision.
572
+ - Provides **similar dynamic range** as FP32 but with **lower memory usage**.
573
+ - Recommended if your hardware supports **BF16 acceleration** (check your device’s specs).
574
+ - Ideal for **high-performance inference** with **reduced memory footprint** compared to FP32.
575
+
576
+ 📌 **Use BF16 if:**
577
+ ✔ Your hardware has native **BF16 support** (e.g., newer GPUs, TPUs).
578
+ ✔ You want **higher precision** while saving memory.
579
+ ✔ You plan to **requantize** the model into another format.
580
+
581
+ 📌 **Avoid BF16 if:**
582
+ ❌ Your hardware does **not** support BF16 (it may fall back to FP32 and run slower).
583
+ ❌ You need compatibility with older devices that lack BF16 optimization.
584
+
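+ A quick, optional way to decide between the BF16 and F16 files is to query your hardware. This is a minimal sketch, assuming PyTorch is installed and using its BF16 capability check; any framework with a similar query works:
+ 
+ ```python
+ # Hypothetical helper: pick a GGUF filename based on BF16 support.
+ # File names match the "Included Files & Details" section below.
+ import torch
+ 
+ def pick_gguf_file() -> str:
+     # True on GPUs with native BF16 support (e.g., NVIDIA Ampere and newer).
+     if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
+         return "Refact-1_6B-fim-bf16.gguf"
+     return "Refact-1_6B-fim-f16.gguf"
+ 
+ print(pick_gguf_file())
+ ```
+ 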
585
+ ---
586
+
587
+ ### **F16 (Float 16) – More widely supported than BF16**
588
+ - A 16-bit floating-point format with **high precision**, but a smaller range of values than BF16.
589
+ - Works on most devices with **FP16 acceleration support** (including many GPUs and some CPUs).
590
+ - Slightly lower numerical precision than BF16 but generally sufficient for inference.
591
+
592
+ 📌 **Use F16 if:**
593
+ ✔ Your hardware supports **FP16** but **not BF16**.
594
+ ✔ You need a **balance between speed, memory usage, and accuracy**.
595
+ ✔ You are running on a **GPU** or another device optimized for FP16 computations.
596
+
597
+ 📌 **Avoid F16 if:**
598
+ ❌ Your device lacks **native FP16 support** (it may run slower than expected).
599
+ ❌ You have memory limitations.
600
+
601
+ ---
602
+
603
+ ### **Quantized Models (Q4_K, Q6_K, Q8, etc.) – For CPU & Low-VRAM Inference**
604
+ Quantization reduces model size and memory usage while maintaining as much accuracy as possible.
605
+ - **Lower-bit models (Q4_K)** → **Best for minimal memory usage**, may have lower precision.
606
+ - **Higher-bit models (Q6_K, Q8_0)** → **Better accuracy**, requires more memory.
607
+
608
+ 📌 **Use Quantized Models if:**
609
+ ✔ You are running inference on a **CPU** and need an optimized model.
610
+ ✔ Your device has **low VRAM** and cannot load full-precision models.
611
+ ✔ You want to reduce **memory footprint** while keeping reasonable accuracy.
612
+
613
+ 📌 **Avoid Quantized Models if:**
614
+ ❌ You need **maximum accuracy** (full-precision models are better for this).
615
+ ❌ Your hardware has enough VRAM for higher-precision formats (BF16/F16).
616
+
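+ For a rough sense of scale, here is a back-of-the-envelope estimate of weight storage for a ~1.6B-parameter model. The bits-per-weight figures are approximations for these quantization schemes and exclude runtime overhead such as the KV cache:
+ 
+ ```python
+ # Rough, assumption-laden size estimate: parameters * bits-per-weight / 8.
+ params = 1.6e9
+ for name, bits in [("F16/BF16", 16), ("Q8_0", 8.5), ("Q6_K", 6.6), ("Q4_K", 4.5)]:
+     gib = params * bits / 8 / 2**30
+     print(f"{name:9s} ~{gib:.1f} GiB")
+ ```
+ 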
617
+ ---
618
+
619
+ ### **Very Low-Bit Quantization (IQ3_XS, IQ3_S, IQ3_M, Q4_K, Q4_0)**
620
+ These models are optimized for **extreme memory efficiency**, making them ideal for **low-power devices** or **large-scale deployments** where memory is a critical constraint.
621
+
622
+ - **IQ3_XS**: Ultra-low-bit quantization (3-bit) with **extreme memory efficiency**.
623
+ - **Use case**: Best for **ultra-low-memory devices** where even Q4_K is too large.
624
+ - **Trade-off**: Lower accuracy compared to higher-bit quantizations.
625
+
626
+ - **IQ3_S**: Small block size for **maximum memory efficiency**.
627
+ - **Use case**: Best for **low-memory devices** where **IQ3_XS** is too aggressive.
628
+
629
+ - **IQ3_M**: Medium block size for better accuracy than **IQ3_S**.
630
+ - **Use case**: Suitable for **low-memory devices** where **IQ3_S** is too limiting.
631
+
632
+ - **Q4_K**: 4-bit quantization with **block-wise optimization** for better accuracy.
633
+ - **Use case**: Best for **low-memory devices** where **Q6_K** is too large.
634
+
635
+ - **Q4_0**: Pure 4-bit quantization, optimized for **ARM devices**.
636
+ - **Use case**: Best for **ARM-based devices** or **low-memory environments**.
637
+
638
+ ---
639
+
640
+ ### **Summary Table: Model Format Selection**
641
+
642
+ | Model Format | Precision | Memory Usage | Device Requirements | Best Use Case |
643
+ |--------------|------------|---------------|----------------------|---------------|
644
+ | **BF16** | Highest | High | BF16-supported GPU/CPUs | High-speed inference with reduced memory |
645
+ | **F16** | High | High | FP16-supported devices | GPU inference when BF16 isn’t available |
646
+ | **Q4_K** | Medium Low | Low | CPU or Low-VRAM devices | Best for memory-constrained environments |
647
+ | **Q6_K** | Medium | Moderate | CPU with more memory | Better accuracy while still being quantized |
648
+ | **Q8_0** | High | Moderate | CPU or GPU with enough VRAM | Best accuracy among quantized models |
649
+ | **IQ3_XS** | Very Low | Very Low | Ultra-low-memory devices | Extreme memory efficiency and low accuracy |
650
+ | **Q4_0** | Low | Low | ARM or low-memory devices | llama.cpp can optimize for ARM devices |
651
+
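+ As a minimal sketch of CPU inference with one of the quantized files, assuming `llama-cpp-python` is installed and the Q4_K file has already been downloaded (the local path is illustrative):
+ 
+ ```python
+ # pip install llama-cpp-python
+ from llama_cpp import Llama
+ 
+ llm = Llama(
+     model_path="./Refact-1_6B-fim-q4_k.gguf",  # adjust to wherever you saved the file
+     n_ctx=4096,   # the model card lists a 4096-token context
+     n_threads=8,  # tune to your CPU
+ )
+ 
+ out = llm("def print_hello_world():", max_tokens=64, temperature=0.2)
+ print(out["choices"][0]["text"])
+ ```
+ 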
652
+ ---
653
+
654
+ ## **Included Files & Details**
655
+
656
+ ### `Refact-1_6B-fim-bf16.gguf`
657
+ - Model weights preserved in **BF16**.
658
+ - Use this if you want to **requantize** the model into a different format.
659
+ - Best if your device supports **BF16 acceleration**.
660
+
661
+ ### `Refact-1_6B-fim-f16.gguf`
662
+ - Model weights stored in **F16**.
663
+ - Use if your device supports **FP16**, especially if BF16 is not available.
664
+
665
+ ### `Refact-1_6B-fim-bf16-q8_0.gguf`
666
+ - **Output & embeddings** remain in **BF16**.
667
+ - All other layers quantized to **Q8_0**.
668
+ - Use if your device supports **BF16** and you want a quantized version.
669
+
670
+ ### `Refact-1_6B-fim-f16-q8_0.gguf`
671
+ - **Output & embeddings** remain in **F16**.
672
+ - All other layers quantized to **Q8_0**.
673
+
674
+ ### `Refact-1_6B-fim-q4_k.gguf`
675
+ - **Output & embeddings** quantized to **Q8_0**.
676
+ - All other layers quantized to **Q4_K**.
677
+ - Good for **CPU inference** with limited memory.
678
+
679
+ ### `Refact-1_6B-fim-q4_k_s.gguf`
680
+ - Smallest **Q4_K** variant, using less memory at the cost of accuracy.
681
+ - Best for **very low-memory setups**.
682
+
683
+ ### `Refact-1_6B-fim-q6_k.gguf`
684
+ - **Output & embeddings** quantized to **Q8_0**.
685
+ - All other layers quantized to **Q6_K**.
686
+
687
+ ### `Refact-1_6B-fim-q8_0.gguf`
688
+ - Fully **Q8** quantized model for better accuracy.
689
+ - Requires **more memory** but offers higher precision.
690
+
691
+ ### `Refact-1_6B-fim-iq3_xs.gguf`
692
+ - **IQ3_XS** quantization, optimized for **extreme memory efficiency**.
693
+ - Best for **ultra-low-memory devices**.
694
+
695
+ ### `Refact-1_6B-fim-iq3_m.gguf`
696
+ - **IQ3_M** quantization, offering a **medium block size** for better accuracy.
697
+ - Suitable for **low-memory devices**.
698
+
699
+ ### `Refact-1_6B-fim-q4_0.gguf`
700
+ - Pure **Q4_0** quantization, optimized for **ARM devices**.
701
+ - Best for **low-memory environments**.
702
+ - Prefer IQ4_NL for better accuracy.
703
+
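+ To fetch just one of these files instead of cloning the whole repository, `hf_hub_download` from `huggingface_hub` works. The `repo_id` below is an assumption; point it at the repository you are actually viewing:
+ 
+ ```python
+ # pip install huggingface_hub
+ from huggingface_hub import hf_hub_download
+ 
+ path = hf_hub_download(
+     repo_id="Mungert/Refact-1_6B-fim-GGUF",   # assumed repo id -- replace if it differs
+     filename="Refact-1_6B-fim-q4_k.gguf",
+ )
+ print(path)  # local cache path of the downloaded GGUF file
+ ```
+ 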
704
+ # <span id="testllm" style="color: #7F7FFF;">🚀 If you find these models useful</span>
705
+
706
+ Please click like ❤. Also, I'd really appreciate it if you could test my Network Monitor Assistant at 👉 [Network Monitor Assistant](https://freenetworkmonitor.click/dashboard).
707
+
708
+ 💬 Click the **chat icon** (bottom right of the main and dashboard pages), choose an LLM, and toggle between the LLM types TurboLLM -> FreeLLM -> TestLLM.
709
+
710
+ ### What I'm Testing
711
+
712
+ I'm experimenting with **function calling** against my network monitoring service, using small open-source models. I'm focused on the question: how small can a model go and still function?
713
+
714
+ 🟡 **TestLLM** – Runs the current testing model using llama.cpp on 6 threads of a CPU VM (it should take about 15s to load; inference is quite slow and it only processes one user prompt at a time, so I'm still working on scaling). If you're curious, I'd be happy to share how it works!
715
+ 
716
+ ### The Other Available AI Assistants
717
+ 
718
+ 🟢 **TurboLLM** – Uses **gpt-4o-mini**. Fast! Note: tokens are limited since OpenAI models are pricey, but you can [Login](https://freenetworkmonitor.click) or [Download](https://freenetworkmonitor.click/download) the Free Network Monitor agent to get more tokens; alternatively, use the FreeLLM.
719
+ 
720
+ 🔵 **FreeLLM** – Runs **open-source Hugging Face models** at medium speed (unlimited, subject to Hugging Face API availability).
721
+
722
+
723
+
724
+
725
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/643a9dd0c5f633a7fa7e804a/HkB0QYV0BbmB3ktMugbZy.png)
726
+
727
+
728
+ # Refact-1.6B
729
+
730
+ Finally, the model we started training with our [blog post](https://refact.ai/blog/2023/applying-recent-innovations-to-train-model/) is ready 🎉
731
+
732
+ After fine-tuning on generated data, it beats Replit 3b, Stability Code 3b, and many other models. It almost beats
733
+ StarCoder, which is ten times its size!
734
+
735
+
736
+ Model | Size | HumanEval pass@1 | HumanEval pass@10 |
737
+ ----------------------|---------------|--------------------|--------------------|
738
+ DeciCoder-1b | 1b | 19.1% | |
739
+ <b>Refact-1.6-fim</b> | <b>1.6b</b> | <b>32.0%</b> | <b>53.0%</b> |
740
+ StableCode | 3b | 20.2% | 33.8% |
741
+ ReplitCode v1 | 3b | 21.9% | |
742
+ CodeGen2.5-multi | 7b | 28.4% | 47.5% |
743
+ CodeLlama | 7b | 33.5% | 59.6% |
744
+ StarCoder | 15b | 33.6% | |
745
+
746
+ It's likely the best model for practical code completion in your IDE, because it's smart and fast!
747
+ You can start using it right now by downloading the
748
+ [Refact plugin](https://refact.ai/). You can host the model yourself, too, using the
749
+ [open source docker container](https://github.com/smallcloudai/refact).
750
+
751
+ And it's multi-language (see MultiPL-HumanEval and other metrics below) and it works as a chat (see the section below).
752
+
753
+ # It Works As a Chat
754
+
755
+ The primary application of this model is code completion (infill) in multiple programming languages.
756
+ But it works as a chat quite well.
757
+
758
+ HumanEval results using instruction following (chat) format, against models specialized for chat only:
759
+
760
+ Model | Size | pass@1 | pass@10 |
761
+ -----------------------|--------|----------|----------|
762
+ <b>Refact-1.6-fim</b> | 1.6b | 38.4% | 55.6% |
763
+ StableCode-instruct | 3b | 26.9% | 36.2% |
764
+ OctoGeeX | 6b | 44.7% | |
765
+ CodeLlama-instruct | 7b | 34.8% | 64.3% |
766
+ CodeGen2.5-instruct | 7b | 36.2% | 60.87% |
767
+ CodeLlama-instruct | 13b | 42.7% | 71.6% |
768
+ StarChat-β | 15b | 33.5% | |
769
+ OctoCoder | 15b | 46.2% | |
770
+
771
+
772
+ # Example
773
+
774
+ Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix part of the input and output:
775
+
776
+ ```python
777
+ # pip install -q transformers
778
+ from transformers import AutoModelForCausalLM, AutoTokenizer
779
+
780
+ checkpoint = "smallcloudai/Refact-1_6B-fim"
781
+ device = "cuda" # for GPU usage or "cpu" for CPU usage
782
+
783
+ tokenizer = AutoTokenizer.from_pretrained(checkpoint)
784
+ model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)
785
+
786
+ prompt = '<fim_prefix>def print_hello_world():\n """<fim_suffix>\n print("Hello world!")<fim_middle>'
787
+
788
+ inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
789
+ outputs = model.generate(inputs, max_length=100, temperature=0.2)
790
+ print("-"*80)
791
+ print(tokenizer.decode(outputs[0]))
792
+ ```
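+ 
+ For the GGUF builds in this repo, the same fill-in-the-middle prompt can be run through llama.cpp bindings. A minimal sketch with `llama-cpp-python` (the model path is an assumption):
+ 
+ ```python
+ from llama_cpp import Llama
+ 
+ llm = Llama(model_path="./Refact-1_6B-fim-q4_k.gguf", n_ctx=4096)
+ 
+ # Same FIM special tokens as in the transformers example above.
+ fim_prompt = '<fim_prefix>def print_hello_world():\n    """<fim_suffix>\n    print("Hello world!")<fim_middle>'
+ out = llm(fim_prompt, max_tokens=64, temperature=0.2)
+ print(out["choices"][0]["text"])
+ ```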
793
+
794
+ # Chat Format
795
+
796
+ The same model works as chat (experimental).
797
+
798
+ ```python
799
+ prompt_template = "<empty_output>SYSTEM {system}\n" \
800
+ "<empty_output>USER {query}\n" \
801
+ "<empty_output>ASSISTANT"
802
+ prompt = prompt_template.format(system="You are a programming assistant",
803
+ query="How do I sort a list in Python?")
804
+ ```
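+ 
+ The snippet above only builds the prompt; to actually generate a reply, you can reuse the `model`, `tokenizer`, and `device` from the fill-in-the-middle example, roughly like this (the sampling settings are illustrative):
+ 
+ ```python
+ # Continues from the fill-in-the-middle example above (model, tokenizer, device).
+ inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
+ outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.2)
+ print(tokenizer.decode(outputs[0]))
+ ```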
805
+
806
+ # Architecture
807
+
808
+ As described in more detail in the blog post, we used:
809
+
810
+ - [ALiBi](https://arxiv.org/abs/2108.12409) based attention
811
+ - [LayerNorm](https://arxiv.org/abs/1607.06450v1) instead of [RMSNorm](https://arxiv.org/pdf/1910.07467.pdf)
812
+ - [Multi Query Attention](https://arxiv.org/abs/1911.02150)
813
+
814
+ We also used LiON, flash attention, and early dropout. None of this is so exotic that you can't run the model yourself; in fact you can -- see the Example section above and the small attention-bias sketch below.
815
+
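+ To make the ALiBi part concrete, here is a tiny illustrative sketch (not the model's actual implementation) of how ALiBi-style linear position biases can be added to attention logits:
+ 
+ ```python
+ import torch
+ 
+ def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
+     """Illustrative ALiBi bias: each head gets a fixed slope times the key-query distance."""
+     # Geometric slopes, as in the ALiBi paper, assuming n_heads is a power of two.
+     slopes = torch.tensor([2 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)])
+     pos = torch.arange(seq_len)
+     distance = pos[None, :] - pos[:, None]          # distance[i, j] = j - i
+     # Shape (n_heads, seq_len, seq_len); added to attention logits before the softmax.
+     # Only the causal part (j <= i, where the bias is <= 0) matters in practice.
+     return slopes[:, None, None] * distance[None, :, :]
+ 
+ print(alibi_bias(n_heads=4, seq_len=8).shape)  # torch.Size([4, 8, 8])
+ ```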
816
+
817
+ # Pretraining
818
+
819
+ For the base model, we used our own dataset that contains code with permissive licenses only, and open text datasets.
820
+ Filtering is the key to the success of this model:
821
+
822
+ - We only used text in English
823
+ - Only topics related to computer science
824
+ - Applied heavy deduplication
825
+
826
+ The text-to-code proportion was 50:50, and the model was trained for 1.2T tokens.
827
+
828
+ We don't release the base model, because its Fill-in-the-Middle (FIM) capability likes to repeat itself too much, so
829
+ its practical use is limited. But if you still want it, write us a message on Discord.
830
+
831
+
832
+ # Finetuning
833
+
834
+ We tested our hypothesis that chat data should boost base model performance in FIM and
835
+ regular left-to-right code completion. We found that just 15% of open
836
+ [code](https://huggingface.co/datasets/bigcode/commitpackft)
837
+ [instruction-following](https://huggingface.co/datasets/rombodawg/2XUNCENSORED_MegaCodeTraining188k) datasets,
838
+ which we filtered for quality, improves almost all metrics.
839
+
840
+ Additionally, to improve FIM, we observed common failure modes, and prepared a synthetic dataset based on
841
+ [The Stack dedup v1.1](https://huggingface.co/datasets/bigcode/the-stack-dedup) to address them.
842
+
843
+ There is a distribution shift between typical code on the internet and the code you write in your IDE.
844
+ The former is likely finished, so the model tries to come up with a suggestion that completes the code.
845
+ As you work in your IDE, your code is likely half-written, and there is no single addition that can repair it
846
+ fully.
847
+
848
+ In practice, the model needs a tendency to stop after a couple of lines are added, and sometimes to write
849
+ nothing at all. We found that giving it empty completions, single-line completions, and multiline
850
+ completions that end with a smaller text indent or at least a newline makes it much more usable (see the illustrative sketch below). This data
851
+ was used as the remaining 85% of the finetune dataset.
852
+
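+ As an illustration of the idea rather than the actual pipeline, a hypothetical helper that turns a source file into FIM training strings biased toward empty or short middles might look like this:
+ 
+ ```python
+ import random
+ 
+ FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"
+ 
+ def make_fim_example(code: str, rng: random.Random) -> str:
+     """Hypothetical sketch: sample a short (possibly empty) middle span from `code`."""
+     lines = code.splitlines(keepends=True)
+     start = rng.randrange(len(lines) + 1)
+     # Bias toward empty or single-line middles, with occasional multi-line spans.
+     middle_len = rng.choice([0, 0, 1, 1, rng.randrange(0, 4)])
+     end = min(start + middle_len, len(lines))
+     prefix = "".join(lines[:start])
+     middle = "".join(lines[start:end])
+     suffix = "".join(lines[end:])
+     return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"
+ 
+ rng = random.Random(0)
+ print(make_fim_example("def add(a, b):\n    return a + b\n", rng))
+ ```
+ 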
853
+ The final model is the result of several attempts to make it work as well as possible for code completion,
854
+ and to perform well on a wide range of metrics. The best attempt took 40B tokens.
855
+
856
+
857
+ # Limitations and Bias
858
+
859
+ The Refact-1.6B model was trained on text in English, but it has seen many more languages in
860
+ code comments. Its performance on non-English languages is certainly lower.
861
+
862
+
863
+ # Model Stats
864
+
865
+ - **Architecture:** LLAMA-like model with multi-query attention
866
+ - **Objectives:** Fill-in-the-Middle, Chat
867
+ - **Tokens context:** 4096
868
+ - **Pretraining tokens:** 1.2T
869
+ - **Finetuning tokens:** 40B
870
+ - **Precision:** bfloat16
871
+ - **GPUs:** 64 NVIDIA A5000
872
+ - **Training time:** 28 days
873
+
874
+
875
+ # License
876
+
877
+ The model is licensed under the BigScience OpenRAIL-M v1 license agreement.
878
+
879
+
880
+ # Citation
881
+
882
+ If you are using this model, please give a link to this page.