Hi,
Thanks for the notebook.
I tried running this notebook on my M2 Mac, but it fails due to a bitsandbytes issue. The error I’m getting is:
AssertionError: Torch not compiled with CUDA enabled
Since bitsandbytes doesn’t support Apple Silicon yet (as far as I can tell), it crashes when trying to call torch.cuda.current_device(). My setup confirms that MPS is available, but the notebook still fails:
import torch
print(torch.backends.mps.is_available())  # Returns True
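For reference, here is the device-selection logic I'm using in my own code while debugging. This is just my own helper (pick_device is a name I made up, not something from the notebook), separated from torch so the fallback order is easy to see:

```python
# Minimal device-selection helper (my own sketch, not from the notebook):
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Return the best available torch device string, preferring CUDA, then MPS."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"

print(pick_device(False, True))  # -> mps

# In practice I call it like this (requires torch):
# device = pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
```

On my machine this resolves to "mps", which is why I expected the notebook to run.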
I also checked my installed package versions with:
uv pip freeze | grep -E "bitsandbytes|transformers|trl|peft|torch"
>>>
bitsandbytes==0.42.0
peft @ git+https://github.com/huggingface/peft.git@7320bb94a04f32dd576c8952cb4d4f59f8fc5a1b
torch==2.8.0.dev20250314
torchaudio==2.6.0.dev20250314
torchvision==0.22.0.dev20250314
transformers @ git+https://github.com/huggingface/transformers@46350f5eae87ac1d168ddfdc57a0b39b64b9a029
trl @ git+https://github.com/huggingface/trl.git@5cb390cd306b6b6aedf6403e6572c62bc33f96db
I believe the issue comes from bitsandbytes, which doesn’t appear to be compatible with MPS. Is there a recommended workaround for running this on a Mac without CUDA?
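One workaround I'm experimenting with, in case it helps: gating the bitsandbytes-dependent pieces on CUDA availability and falling back to a plain optimizer otherwise. I haven't confirmed this is the recommended approach; the optim strings are the standard transformers TrainingArguments values, but the helper names here are my own:

```python
# Hypothetical helpers I'm trying, to avoid bitsandbytes code paths on Mac.
# The optim names ("paged_adamw_8bit", "adamw_torch") are standard
# transformers TrainingArguments values; everything else is my own naming.

def quantization_supported(has_cuda: bool) -> bool:
    # bitsandbytes 0.42 assumes CUDA, so gate all 8-bit features on it
    return has_cuda

def pick_optim(has_cuda: bool) -> str:
    # fall back from the 8-bit paged optimizer to plain AdamW without CUDA
    return "paged_adamw_8bit" if has_cuda else "adamw_torch"

print(pick_optim(False))  # -> adamw_torch
```

I then pass the result to TrainingArguments(optim=...) and skip the BitsAndBytesConfig entirely when quantization_supported() is False, but I'd appreciate confirmation that this is the right direction.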
Thanks.
Truncated error message:
File ~/playgrounds/.venv/lib/python3.12/site-packages/bitsandbytes/functional.py:1395, in optimizer_update_8bit_blockwise(optimizer_name, g, p, state1, state2, beta1, beta2, eps, step, lr, qmap1, qmap2, absmax1, absmax2, weight_decay, gnorm_scale, skip_zeros)
1374 def optimizer_update_8bit_blockwise(
1375 optimizer_name: str,
1376 g: Tensor,
(...)
1391 skip_zeros=False,
1392 ) -> None:
1394 optim_func = None
-> 1395 prev_device = pre_call(g.device)
1396 is_on_gpu([g, p, state1, state2, qmap1, qmap2, absmax1, absmax2])
1397 if g.dtype == torch.float32 and state1.dtype == torch.uint8:
File ~/playgrounds/.venv/lib/python3.12/site-packages/bitsandbytes/functional.py:416, in pre_call(device)
415 def pre_call(device):
--> 416 prev_device = torch.cuda.current_device()
417 torch.cuda.set_device(device)
418 return prev_device
File ~/playgrounds/.venv/lib/python3.12/site-packages/torch/cuda/__init__.py:1026, in current_device()
1024 def current_device() -> int:
1025 r"""Return the index of a currently selected device."""
-> 1026 _lazy_init()
1027 return torch._C._cuda_getDevice()
File ~/playgrounds/.venv/lib/python3.12/site-packages/torch/cuda/__init__.py:363, in _lazy_init()
358 raise RuntimeError(
359 "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "
360 "multiprocessing, you must use the 'spawn' start method"
361 )
362 if not hasattr(torch._C, "_cuda_getDeviceCount"):
--> 363 raise AssertionError("Torch not compiled with CUDA enabled")
364 if _cudart is None:
365 raise AssertionError(
366 "libcudart functions unavailable. It looks like you have a broken build?"
367 )
AssertionError: Torch not compiled with CUDA enabled