🌟 Easy Fine-Tuning with Hugging Face SQL Console, Notebook Creator, and SFT

Community Article Published September 24, 2024

In this tutorial, we'll take you through an end-to-end process of creating a new dataset, fine-tuning a model with it, and sharing it on Hugging Face. By the end, you'll have a model that can respond in a lovely poetic way! 💖

What We'll Use:

  • Hugging Face Dataset Viewer SQL Console
  • Dataset Notebook Creator
  • Google Colab

For this example, we'll work with a poetry dataset and filter only the poems in the 'Love' category. This will allow us to fine-tune a model to generate answers filled with love and emotion. 💌

1. Getting the data

Let's start by getting our data. We'll use the Georgii/poetry-genre dataset, which contains poems across various topics.

We only need the 'Love' poems, and we'll drop any that are 150 characters or shorter. To do this, we'll use the SQL Console in the Dataset Viewer. Click on SQL Console.

And now, write the following SQL query:

SELECT text AS poem FROM train WHERE genre = 'Love' AND len(text) > 150
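To see exactly which rows this filter keeps, here is a minimal sketch that runs the same condition on a few made-up rows with Python's built-in sqlite3 module. It is only a stand-in for the SQL Console, not a copy of it: the sample rows are hypothetical, and SQLite spells the string-length function length() where the console query uses len().

```python
import sqlite3

# Hypothetical sample rows standing in for the real dataset.
rows = [
    ("Love", "x" * 200),     # kept: 'Love' and longer than 150 characters
    ("Love", "short poem"),  # dropped: too short
    ("Nature", "y" * 300),   # dropped: wrong genre
    ("Love", "z" * 150),     # dropped: exactly 150 is not > 150
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train (genre TEXT, text TEXT)")
conn.executemany("INSERT INTO train VALUES (?, ?)", rows)

# Same filter as the SQL Console query, with length() in place of len().
query = "SELECT text AS poem FROM train WHERE genre = 'Love' AND length(text) > 150"
poems = [row[0] for row in conn.execute(query)]
conn.close()

print(len(poems))  # number of rows that pass the filter
```

Note that a poem of exactly 150 characters is excluded, because the comparison is strict.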

💑 Tip: For more advanced techniques and examples on using the SQL Console, check out this guide.

Now, click on Download to save the filtered dataset as a Parquet file. We'll use this file in the next steps.

2. Uploading the Dataset to Hugging Face

Create a new repository on Hugging Face for your dataset. You can upload the Parquet file manually, or use the following Python snippet to upload it programmatically:

from datasets import load_dataset

# Load the Parquet file into a dataset
dataset = load_dataset('parquet', data_files='query_result.parquet')

# Push the dataset to your Hugging Face repository
# (requires being logged in, e.g. via `huggingface-cli login`)
dataset.push_to_hub('your_dataset_name')

Or follow these steps to create your dataset.
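Before uploading, it can be worth a quick sanity check that every row really has a poem column and passes the length filter. The helper below is a small sketch, not part of any Hugging Face API: check_poems and the sample rows are hypothetical names introduced for illustration. It works on any iterable of dicts, which is also what you get when iterating over a split of a loaded dataset.

```python
def check_poems(rows, column="poem", min_chars=150):
    """Return the indices of rows that lack `column` or are not longer than `min_chars`."""
    bad = []
    for i, row in enumerate(rows):
        value = row.get(column)
        if not isinstance(value, str) or len(value) <= min_chars:
            bad.append(i)
    return bad


# Hypothetical rows; with the real data you could pass dataset['train'] instead.
sample = [{"poem": "a" * 151}, {"poem": "too short"}, {"text": "wrong column"}]
print(check_poems(sample))  # indices of the rows that fail the check
```

An empty list means every row looks as expected and the file is ready to push.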

In my case, I created the asoria/love-poems dataset used in the next step.

3. Generating the Training Code

Next, we'll use the Notebook Creator app to generate the training code for our dataset:

  1. Select asoria/love-poems as the dataset name.
  2. Choose the Supervised fine-tuning (SFT) notebook type.
  3. Click Generate Notebook and open it in Google Colab.

4. Fine-Tuning the Model

Now it's time to run the cells in the generated notebook. We'll use the dataset to fine-tune a pre-trained model like facebook/opt-350m to create a new, more love-inspired version.
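The generated notebook is not reproduced here, and its exact code may differ. As a rough sketch of what supervised fine-tuning with the TRL library can look like, the snippet below assumes the trl and datasets packages are installed, that the dataset has a poem column (as produced by the SQL query), and uses a hypothetical output folder name. The SETTINGS dictionary and the train helper are names made up for this illustration.

```python
# Settings assumed for illustration; the generated notebook may differ.
SETTINGS = {
    "dataset_name": "asoria/love-poems",  # dataset selected in step 3
    "model_name": "facebook/opt-350m",    # base model named in this tutorial
    "text_column": "poem",                # column produced by the SQL query
    "output_dir": "opt-350m-love-poems",  # hypothetical output folder
}


def train(settings=SETTINGS):
    """Fine-tune the base model on the poem column with TRL's SFTTrainer."""
    # Imported here so the settings can be inspected without the heavy dependencies.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset(settings["dataset_name"], split="train")
    args = SFTConfig(
        output_dir=settings["output_dir"],
        dataset_text_field=settings["text_column"],
    )
    trainer = SFTTrainer(
        model=settings["model_name"],
        train_dataset=dataset,
        args=args,
    )
    trainer.train()
    trainer.save_model(settings["output_dir"])
    return trainer
```

Calling train() in a GPU-backed Colab session starts the run; all other training options are left at the library defaults here.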

Follow the instructions in the notebook to train the model. Once training is complete, you'll have a model that responds in a lovelier way! 🌹✨
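To try the result, one option is the text-generation pipeline from the transformers library. This is a minimal sketch under assumptions: generate_poem is a hypothetical helper, and the default model_path is a placeholder for wherever your fine-tuned model ended up (a local folder or a Hub repository id).

```python
def generate_poem(prompt, model_path="opt-350m-love-poems", max_new_tokens=60):
    """Generate a continuation of `prompt` with the fine-tuned model."""
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_path)
    outputs = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True)
    # The pipeline returns a list of dicts with a "generated_text" key.
    return outputs[0]["generated_text"]
```

For example, generate_poem("My heart") should return the prompt followed by the model's continuation. Sampling is enabled, so each call can give a different text.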

Conclusion

With just a few simple steps, we've created a new version of a dataset using the Hugging Face SQL Console, generated the necessary code with the Notebook Creator, and fine-tuned a model to answer with more love and poetry.

Now, your model is ready to spread love in every response! 💕🎉
