---
license: unknown
language:
- en
metrics:
- accuracy
- precision
- f1
- recall
tags:
- art
base_model: google/vit-base-patch16-224
datasets:
- DataScienceProject/Art_Images_Ai_And_Real_
pipeline_tag: image-classification
library_name: transformers
---

### Model Card for Model ID

This model classifies images as either 'real' or 'fake' (AI-generated) using a Vision Transformer (ViT). Our goal is to identify the source of an image with at least 85% accuracy and at least 80% recall.

### Model Description

This model leverages the Vision Transformer (ViT) architecture, which applies self-attention to a sequence of image patches. The model classifies images into two categories: 'real' and 'fake' (AI-generated). It captures intricate patterns and features that help distinguish between the two categories without the need for Convolutional Neural Networks (CNNs).

### Direct Use

This model can be used to classify images as 'real art' or 'fake art' based on visual features learned by the Vision Transformer.

### Out-of-Scope Use

The model may not perform well on images outside the domain of art, or on images whose visual characteristics differ drastically from those in the training dataset.

### Recommendations

Run the training code on a PC with an NVIDIA GPU better than an RTX 3060 and a CPU with at least 6 cores, or use Google Colab.

## How to Get Started with the Model

Prepare data: organize your images into appropriate folders and run the code.

## Model Architecture

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66d6f1b3b50e35e1709bfdf7/RhONF2ZsQi_aVqyyk17yK.png)

## Training Details

- Dataset: DataScienceProject/Art_Images_Ai_And_Real_
- Preprocessing: Images are resized, converted to RGB, transformed into tensors, and stored in a custom torch dataset.
#### Training Hyperparameters

- `optimizer = optim.Adam(model.parameters(), lr=0.001)`
- `num_epochs = 10`
- `criterion = nn.CrossEntropyLoss()`

## Evaluation

The model takes 15-20 minutes to run on our dataset with the following PC hardware:

- CPU: i9-13900
- RAM: 32 GB
- GPU: RTX 3080

Your mileage may vary.

### Testing Data, Factors & Metrics

- precision
- recall
- f1
- confusion_matrix
- accuracy

### Results

- test accuracy = 0.92
- precision = 0.893
- recall = 0.957
- f1 = 0.924

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66d6f1b3b50e35e1709bfdf7/UYTV1X3AqFM50EFojMbn9.png)

#### Summary

This model performed by far the best of the approaches we tried; the others were CNN, ResNet, and CNN + ELA.
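As a quick consistency check on the figures reported under Results, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The helper name `f1_from` is ours, introduced only for this illustration.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the Results section of this card.
reported_precision = 0.893
reported_recall = 0.957

f1 = f1_from(reported_precision, reported_recall)
print(round(f1, 3))  # 0.924, matching the reported f1
```

The high recall relative to precision suggests the model misses few images of the positive class at the cost of some false positives. Which class is treated as positive is not stated in this card.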