This is a lightweight version of BERT, specifically fine-tuned for classifying URLs into four categories: benign, phishing, malware, and defacement.
The model was evaluated on a test set with the following classification metrics:
Metric | Model V3 | Model V4 (this model) |
---|---|---|
Overall Accuracy | 0.9837 | 0.9922 |
F1-score (Benign) | 0.9907 | 0.9955 |
F1-score (Defacement) | 0.9937 | 0.9984 |
F1-score (Malware) | 0.9741 | 0.9845 |
F1-score (Phishing) | 0.9444 | 0.9734 |
Weighted Average F1-score | 0.9836 | 0.9922 |
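Assuming the weighted average follows the usual definition (each class weighted by its number of test samples), it can be compared with the unweighted (macro) average, which is computable from the table alone. A minimal sketch; the names `F1_V3`, `F1_V4`, and `macro_f1` are illustrative and not part of the model's code:

```python
# Per-class F1 scores copied from the table above.
F1_V3 = {"benign": 0.9907, "defacement": 0.9937, "malware": 0.9741, "phishing": 0.9444}
F1_V4 = {"benign": 0.9955, "defacement": 0.9984, "malware": 0.9845, "phishing": 0.9734}


def macro_f1(per_class_f1):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1.values()) / len(per_class_f1)


print(f"Macro F1, V3: {macro_f1(F1_V3):.4f}")
print(f"Macro F1, V4: {macro_f1(F1_V4):.4f}")
```

With the table's values this gives roughly 0.976 for V3 and 0.988 for V4, both slightly below the reported weighted averages. That gap is consistent with the lower-scoring classes (malware, phishing) making up a smaller share of the test set, although the class counts are not stated here.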
The following example shows how to classify URLs with the model using the Hugging Face transformers library:
from transformers import BertTokenizerFast, BertForSequenceClassification, pipeline
import torch

# Select the device (GPU or CPU)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Load the model and tokenizer
model_name = "CrabInHoney/urlbert-tiny-v4-malicious-url-classifier"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.to(device)

# Create the classification pipeline
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
    # Return scores for all four classes; recent transformers releases
    # mark this argument as deprecated in favor of top_k=None
    return_all_scores=True
)

# Example URLs to test
test_urls = [
    "wikiobits.com/Obits/TonyProudfoot",
    "http://www.824555.com/app/member/SportOption.php?uid=guest&langx=gb",
]

# Map the raw labels to readable class names
label_mapping = {
    "LABEL_0": "benign",
    "LABEL_1": "defacement",
    "LABEL_2": "malware",
    "LABEL_3": "phishing"
}

# Classify each URL and print the score of every class
for url in test_urls:
    results = classifier(url)
    print(f"\nURL: {url}")
    for result in results[0]:
        label = result['label']
        score = result['score']  # probability in [0, 1], not a percentage
        friendly_label = label_mapping.get(label, label)
        print(f"{friendly_label}, %: {score:.4f}")
Example output:

URL: wikiobits.com/Obits/TonyProudfoot
benign, %: 0.9996
defacement, %: 0.0000
malware, %: 0.0000
phishing, %: 0.0003
URL: http://www.824555.com/app/member/SportOption.php?uid=guest&langx=gb
benign, %: 0.0000
defacement, %: 0.0001
malware, %: 0.9998
phishing, %: 0.0001
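In practice a single verdict per URL is often more convenient than four scores. The sketch below picks the highest-scoring class from one URL's score list (the `results[0]` structure in the example above) and reports "uncertain" when that score falls under a cutoff. The function name `top_prediction`, the `min_confidence` parameter, and its default of 0.9 are assumptions made for illustration, not part of the model or the transformers API:

```python
# Label names follow the mapping used in the example above.
LABEL_NAMES = {
    "LABEL_0": "benign",
    "LABEL_1": "defacement",
    "LABEL_2": "malware",
    "LABEL_3": "phishing",
}


def top_prediction(scores, min_confidence=0.9):
    """Return (class_name, score) for the highest-scoring class.

    `scores` is a list of {"label": ..., "score": ...} dicts for one URL.
    If the best score is below `min_confidence` (an arbitrary, assumed
    cutoff), the class name is reported as "uncertain".
    """
    best = max(scores, key=lambda item: item["score"])
    if best["score"] < min_confidence:
        return "uncertain", best["score"]
    return LABEL_NAMES.get(best["label"], best["label"]), best["score"]


# Scores copied from the second URL in the example output above.
example_scores = [
    {"label": "LABEL_0", "score": 0.0000},
    {"label": "LABEL_1", "score": 0.0001},
    {"label": "LABEL_2", "score": 0.9998},
    {"label": "LABEL_3", "score": 0.0001},
]

print(top_prediction(example_scores))  # ('malware', 0.9998)
```

A suitable cutoff depends on how costly false alarms and missed detections are for the application, so the 0.9 default should be tuned on held-out data rather than taken as a recommendation.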
Base model: CrabInHoney/urlbert-tiny-base-v4