wise-water committed
Commit 13aa528 · 1 Parent(s): a56ad34

init commit

Files changed (36)
  1. LICENSE +61 -0
  2. README.md +72 -5
  3. app.py +459 -0
  4. requirements.txt +23 -0
  5. sample_img/sample_danbooru_dragonball.png +3 -0
  6. scripts/__init__.py +0 -0
  7. scripts/__pycache__/__init__.cpython-312.pyc +0 -0
  8. scripts/__pycache__/parse_cut_from_page.cpython-312.pyc +0 -0
  9. scripts/convert_psd_to_png.py +106 -0
  10. scripts/parse_cut_from_page.py +248 -0
  11. scripts/run_tag_filter.py +147 -0
  12. src/__init__.py +0 -0
  13. src/__pycache__/__init__.cpython-312.pyc +0 -0
  14. src/detectors/__init__.py +3 -0
  15. src/detectors/__pycache__/__init__.cpython-312.pyc +0 -0
  16. src/detectors/__pycache__/imgutils_detector.cpython-312.pyc +0 -0
  17. src/detectors/imgutils_detector.py +170 -0
  18. src/oskar_crop/__pycache__/detect_and_crop.cpython-312.pyc +0 -0
  19. src/oskar_crop/detect_and_crop.py +56 -0
  20. src/pipelines/__init__.py +3 -0
  21. src/pipelines/__pycache__/__init__.cpython-312.pyc +0 -0
  22. src/pipelines/__pycache__/pipeline_single_character_filtering.cpython-312.pyc +0 -0
  23. src/pipelines/pipeline_single_character_filtering.py +175 -0
  24. src/taggers/__init__.py +4 -0
  25. src/taggers/__pycache__/__init__.cpython-312.pyc +0 -0
  26. src/taggers/__pycache__/order.cpython-312.pyc +0 -0
  27. src/taggers/__pycache__/tagger.cpython-312.pyc +0 -0
  28. src/taggers/filter.py +113 -0
  29. src/taggers/order.py +85 -0
  30. src/taggers/tagger.py +215 -0
  31. src/utils/__pycache__/device.cpython-312.pyc +0 -0
  32. src/utils/__pycache__/timer.cpython-312.pyc +0 -0
  33. src/utils/device.py +18 -0
  34. src/utils/timer.py +51 -0
  35. src/wise_crop/__pycache__/detect_and_crop.cpython-312.pyc +0 -0
  36. src/wise_crop/detect_and_crop.py +84 -0
LICENSE ADDED
@@ -0,0 +1,61 @@
+ This software project ("the Software") comprises original code and incorporates third-party components as detailed below. Each component is subject to its respective license terms.
+
+ 1. Original Code
+
+ The original code within this repository, excluding third-party components, is licensed under the MIT License:
+
+ MIT License
+
+ Copyright (c) 2025 AI Lab, Kakao Entertainment
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
+
+ 2. Third-Party Components
+
+ a. Google Gemma 3 - 12B Instruction-Tuned Model (google/gemma-3-12b-it)
+
+ This project utilizes the "google/gemma-3-12b-it" model, which is governed by Google's Gemma Terms of Use. Access to and use of this model require agreement to these terms, which include specific restrictions on distribution, modification, and usage. For detailed information, please refer to the Gemma Terms of Use.
+
+ Note: Ensure compliance with Google's terms when using, distributing, or modifying this model.
+
+ b. imgutils Python Package
+
+ The imgutils package is employed for image processing tasks within this project. As of the latest available information, imgutils does not explicitly specify a license. In the absence of a declared license, the usage rights are undefined, and caution is advised. It is recommended to contact the package maintainers or consult the source repository for clarification before using or distributing this package.
+
+ 3. Additional Dependencies
+
+ This project also relies on several other open-source packages, each with its own licensing terms:
+
+ Gradio: Licensed under the Apache License 2.0.
+
+ Pillow (PIL): Licensed under the Historical Permission Notice and Disclaimer (HPND).
+
+ psd-tools: Licensed under the MIT License.
+
+ OpenCV: Licensed under the Apache License 2.0.
+
+ Please refer to each package's documentation for detailed license information.
+
+ 4. Usage Guidelines
+
+ Users of this software must ensure compliance with all applicable licenses, especially when distributing or modifying the software. This includes adhering to the terms set forth by third-party components. Failure to comply with these terms may result in legal consequences.
+
+ 5. Disclaimer
+
+ This software is provided "as is," without warranty of any kind, express or implied. The authors are not liable for any damages or legal issues arising from the use of this software. Users are responsible for ensuring that their use of the software complies with all applicable laws and regulations.
+
+ For any questions or concerns regarding this license, please contact `[email protected]`.
README.md CHANGED
@@ -1,13 +1,80 @@
  ---
- title: Webtoon Cropper
- emoji:
- colorFrom: blue
  colorTo: blue
  sdk: gradio
  sdk_version: 5.25.2
  app_file: app.py
  pinned: false
- short_description: Webtoon Cropper
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: Webtoon Lineart Cropper
+ emoji: 🤗
+ colorFrom: yellow
  colorTo: blue
  sdk: gradio
  sdk_version: 5.25.2
  app_file: app.py
+ python_version: 3.12.10
  pinned: false
  ---
+ # Helix-Painting Data Tool
+
+ A tool that converts webtoon PSD files into image files (PNG) and extracts data through steps such as cut extraction and character detection.
+
+ ## Prerequisites
+
+ [Prerequisites](docs/PREREQUISITES.md)
+
+ ## Setup a project
+
+ [Setup a project](docs/SETUP.md)
+
+ ## Features
+
+ - **PSD → PNG conversion**
+ Converts webtoon PSD files into high-resolution PNG images.
+ - **Cut extraction and filtering**
+ Extracts cut boxes based on the white regions of a page image and generates per-cut data using character detection (a face detector) and tagging (a tagger); a minimal sketch of the cut-box step follows below.
+ - **Parallel processing**
+ Provides parallel image processing via Python's multiprocessing to speed up the pipeline.
+
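+ For reference, the cut-box step scans the page for rows that contain any non-white pixel, groups consecutive rows into horizontal bands, and then tightens each band to its leftmost and rightmost non-white columns. Below is a condensed, self-contained sketch of the logic implemented in `extract_cutbox_coordinates` in `scripts/parse_cut_from_page.py` (the input file name is a placeholder):
+
+ ```python
+ import numpy as np
+ from PIL import Image
+
+ def cut_boxes(image, threshold=255):
+     """Return (left, top, right, bottom) boxes for non-white horizontal bands."""
+     rgb = np.array(image.convert("RGB"))
+     non_white = np.any(rgb < threshold, axis=-1)  # True where any channel is below the threshold
+     rows = np.where(non_white.any(axis=1))[0]     # row indices that contain content
+     if len(rows) == 0:
+         return []
+     # Split the row indices wherever they stop being consecutive.
+     bands = np.split(rows, np.where(np.diff(rows) != 1)[0] + 1)
+     boxes = []
+     for band in bands:
+         top, bottom = band[0], band[-1]
+         cols = np.where(non_white[top:bottom + 1].any(axis=0))[0]
+         boxes.append((int(cols[0]), int(top), int(cols[-1]) + 1, int(bottom) + 1))
+     return boxes
+
+ print(cut_boxes(Image.open("page.png")))  # "page.png" is a placeholder path
+ ```
+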
+ ## Usage
+
+ ### 1. Convert PSD files to PNG images
+ Use the `convert_psd_to_png.py` script to convert PSD files into PNG images (see the psd-tools sketch after the argument list below).
+
+ ```shell
+ python scripts/convert_psd_to_png.py --directory <PSD_directory> --output <output_directory> [--visible_layers layer1 layer2 ...] [--invisible_layers layer3 layer4 ...]
+ ```
+
+ - `--directory` : Directory to search for PSD files
+ - `--output` : Directory to save the converted PNG files
+ - `--visible_layers` / `--invisible_layers` : Layer names to show or hide
+
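+ Internally, the script opens each PSD with psd-tools, toggles layer visibility by name, and composites the result. A minimal sketch using the same psd-tools calls as `scripts/convert_psd_to_png.py` (the file and layer names here are placeholders):
+
+ ```python
+ from psd_tools import PSDImage
+
+ psd = PSDImage.open("page.psd")               # placeholder input path
+ for layer in psd.descendants():
+     if layer.name == "sketch":                # hypothetical layer name to hide
+         layer.visible = False
+ # force=True re-composites the layers so the visibility changes take effect
+ psd.composite(force=True).save("page.png")    # placeholder output path
+ ```
+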
+ ### 2. Extract and filter image cuts
+ Use the `parse_cut_from_page.py` script to extract cut boxes and generate filtered data for each cut.
+
+ ```shell
+ python scripts/parse_cut_from_page.py --lineart <lineart_directory> --flat <flat_directory> --segmentation <segmentation_directory> --color <color_directory> --output <output_directory> [--num_process <number_of_processes>]
+ ```
+
+ - `--lineart` : Directory containing the lineart images
+ - `--flat` : Directory containing the flat (pre-coloring) images
+ - `--segmentation` : Directory containing the segmentation images
+ - `--color` : Directory containing the color images
+ - `--output` : Directory to save the cropped cut images
+ - `--num_process` : Number of processes for parallel processing (optional)
+
+ ### 3. Run tagging and filtering
+ Use the `run_tag_filter.py` script to tag and filter images and save the results; a sketch of driving the same pipeline from Python follows after the argument list below.
+
+ ```shell
+ python scripts/run_tag_filter.py --input_dir <input_directory> --output_dir <output_directory> [--ext png jpg jpeg]
+ ```
+
+ - `--input_dir` : Directory containing the images to filter
+ - `--output_dir` : Directory to save the filtered images and caption files
+ - `--ext` : List of image extensions to process (default: png)
+
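+ The tagging/filtering pipeline can also be driven directly from Python. This is a condensed sketch of what `scripts/run_tag_filter.py` does internally, using the classes shipped in this repository (the input directory is a placeholder):
+
+ ```python
+ from pathlib import Path
+ from PIL import Image
+
+ from src.detectors import AnimeDetector
+ from src.pipelines import TagAndFilteringPipeline
+ from src.taggers import WaifuDiffusionTagger
+ from src.utils.device import determine_accelerator
+
+ detector = AnimeDetector(repo_id="deepghs/anime_face_detection",
+                          model_name="face_detect_v1.4_s", hf_token=None)
+ tagger = WaifuDiffusionTagger(device=determine_accelerator())
+ pipeline = TagAndFilteringPipeline(tagger=tagger, detector=detector)
+
+ paths = sorted(Path("cuts").glob("*.png"))    # placeholder input directory
+ images = [Image.open(p).convert("RGB") for p in paths]
+ out = pipeline(images, batch_size=32, tag_threshold=0.3,
+                conf_threshold=0.3, iou_threshold=0.7)
+ kept = [p for p, flag in zip(paths, out.filter_flags) if flag]  # paths that pass the filter
+ ```
+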
72
+
73
+ 이와 같이 각 스크립트는 별도의 인자들을 받아서 작업을 수행하며, 스크립트 내부의 로직에 따라 PSD 변환, 컷 추출, 인물 검출/태깅, 캡션 파일 생성 등의 기능을 제공합니다.
74
+
75
+ ### 4. Gradio 데모 페이지
76
+ 아래 명렁어를 실행하여 Gradio 데모 페이지로 원하는 PNG파일을 업로드하여 필터링된 이미지 컷 추출 결과를 ZIP파일 혹은 PSD파일로 확인할 수 있습니다.
77
+
78
+ ```shell
79
+ python app.py
80
+ ```
app.py ADDED
@@ -0,0 +1,459 @@
+ import os
+ import sys
+ import tempfile
+ import zipfile
+ import shutil  # For make_archive
+ import uuid
+ from PIL import Image, ImageDraw
+ from psd_tools import PSDImage
+ from psd_tools.api.layers import PixelLayer
+ import gradio as gr
+ import traceback  # For printing stack traces
+ import subprocess
+
+
+ def install(package):
+     subprocess.check_call([sys.executable, "-m", "pip", "install", package])
+
+
+ install("timm")
+ install("pydantic==2.10.6")
+ install("dghs-imgutils==0.15.0")
+ install("onnxruntime >= 1.17.0")
+ install("psd_tools==1.10.7")
+ # os.environ["no_proxy"] = "localhost,127.0.0.1,::1"
+
+ # --- Attempt to import the actual function (Detector 1) ---
+ # Let's keep the original import name as requested in the previous version
+ try:
+     # Assuming this is the intended "Detector 1"
+     from src.wise_crop.detect_and_crop import crop_and_mask_characters_gradio
+     detector_1_available = True
+     print("Successfully imported 'crop_and_mask_characters_gradio' as Detector 1.")
+ except ImportError:
+     detector_1_available = False
+     print("Warning: Could not import 'crop_and_mask_characters_gradio'. Using dummy function for Detector 1.")
+
+     # Define a dummy version for Detector 1 if import fails
+     def crop_and_mask_characters_gradio(image_pil: Image.Image):
+         """Dummy function 1 if import fails."""
+         print("Using DUMMY Detector 1.")
+         if image_pil is None:
+             return []
+         width, height = image_pil.size
+         boxes = [
+             (0, (int(width * 0.1), int(height * 0.1), int(width * 0.3), int(height * 0.4))),
+             (1, (int(width * 0.6), int(height * 0.5), int(width * 0.25), int(height * 0.35))),
+         ]
+         valid_boxes = []
+         for i, (x, y, w, h) in boxes:
+             x1, y1, x2, y2 = max(0, x), max(0, y), min(width, x + w), min(height, y + h)
+             if x2 - x1 > 0 and y2 - y1 > 0:
+                 valid_boxes.append((i, (x1, y1, x2 - x1, y2 - y1)))
+         return valid_boxes
+
+ # from src.oskar_crop.detect_and_crop import process_single_image as detector_2_function
+ try:
+     # Assuming this is the intended "Detector 2"
+     # Note: the import is aliased to avoid conflict if both imports succeed.
+     # The call inside process_lineart still uses crop_and_mask_characters_gradio_2
+     from src.oskar_crop.detect_and_crop import process_single_image as detector_2_function
+     detector_2_available = True
+     print("Successfully imported 'process_single_image' as Detector 2.")
+
+     # Define the function name used in process_lineart
+     def crop_and_mask_characters_gradio_2(image_pil: Image.Image):
+         return detector_2_function(image_pil)
+
+ except ImportError:
+     detector_2_available = False
+     print("Warning: Could not import 'process_single_image'. Using dummy function for Detector 2.")
+
+     # --- Define the SECOND dummy detection function (Detector 2 fallback) ---
+     def crop_and_mask_characters_gradio_2(image_pil: Image.Image):
+         """
+         SECOND dummy function to simulate detecting objects and returning bounding boxes.
+         Returns different results than the first function.
+         """
+         print("Using DUMMY Detector 2.")
+         if image_pil is None:
+             return []
+
+         width, height = image_pil.size
+         print(f"Dummy detection 2 running on image size: {width}x{height}")
+
+         # Define DIFFERENT fixed bounding boxes for demonstration
+         boxes = [
+             (0, (int(width * 0.05), int(height * 0.6), int(width * 0.4), int(height * 0.3))),   # Bottom-leftish, wider
+             (1, (int(width * 0.7), int(height * 0.1), int(width * 0.20), int(height * 0.25))),  # Top-rightish, smaller
+             (2, (int(width * 0.4), int(height * 0.4), int(width * 0.15), int(height * 0.15))),  # Center-ish, very small
+         ]
+
+         # Basic validation
+         valid_boxes = []
+         for i, (x, y, w, h) in boxes:
+             x1 = max(0, x)
+             y1 = max(0, y)
+             x2 = min(width, x + w)
+             y2 = min(height, y + h)
+             new_w = x2 - x1
+             new_h = y2 - y1
+             if new_w > 0 and new_h > 0:
+                 valid_boxes.append((i, (x1, y1, new_w, new_h)))
+
+         print(f"Dummy detection 2 found {len(valid_boxes)} boxes.")
+         return valid_boxes
+
+
+ # --- Helper Function (make_lineart_transparent - unchanged) ---
+ def make_lineart_transparent(lineart_path, threshold=200):
+     """Converts a lineart image file to a transparent RGBA PIL Image."""
+     try:
+         # Ensure we handle potential pathlib objects if Gradio passes them
+         lineart_gray = Image.open(str(lineart_path)).convert('L')
+         w, h = lineart_gray.size
+         lineart_rgba = Image.new('RGBA', (w, h), (0, 0, 0, 0))
+         gray_pixels = lineart_gray.load()
+         rgba_pixels = lineart_rgba.load()
+         for y in range(h):
+             for x in range(w):
+                 gray_val = gray_pixels[x, y]
+                 alpha = 255 - gray_val
+                 if gray_val < threshold:
+                     rgba_pixels[x, y] = (0, 0, 0, alpha)
+                 else:
+                     rgba_pixels[x, y] = (0, 0, 0, 0)
+         return lineart_rgba
+     except FileNotFoundError:
+         print(f"Helper Error: Image file not found at {lineart_path}")
+         # Return a blank transparent image or None? Returning None is clearer.
+         return None
+     except Exception as e:
+         print(f"Helper Error processing image {lineart_path}: {e}")
+         return None
+
+
+ # --- Main Processing Function (modified for better error handling with PIL) ---
+ def process_lineart(input_pil_or_path, detector_choice):  # Input can be PIL or path from examples
+     """
+     Processes the input lineart image using the selected detector.
+     Detects objects (e.g., characters based on head/face), crops them,
+     provides a gallery of crops, a ZIP file of crops, and a PSD file
+     with the original lineart (made transparent) and bounding boxes.
+     """
+     # --- Initialize variables ---
+     input_pil_image = None
+     temp_input_path = None
+     using_temp_input_path = False
+     status_updates = ["Status: Initializing..."]
+     psd_output_path = None  # Initialize to None
+     zip_output_path = None  # Initialize to None
+     cropped_images_for_gallery = []  # Initialize to empty list
+
+     try:
+         # --- Handle Input ---
+         if input_pil_or_path is None:
+             gr.Warning("Please upload a PNG image or select an example.")
+             return [], None, None, "Status: No image provided."
+
+         print(f"Input type: {type(input_pil_or_path)}")
+         print(f"Input value: {input_pil_or_path}")
+         # Check if input is already a PIL image (from upload) or a path (from examples)
+         if isinstance(input_pil_or_path, Image.Image):
+             input_pil_image = input_pil_or_path
+             print("Processing PIL image from upload.")
+             # Create a temporary path for make_lineart_transparent if needed later
+             temp_input_fd, temp_input_path = tempfile.mkstemp(suffix=".png")
+             os.close(temp_input_fd)
+             input_pil_image.save(temp_input_path, "PNG")
+             using_temp_input_path = True
+         elif isinstance(input_pil_or_path, str) and os.path.exists(input_pil_or_path):
+             print(f"Processing image from file path: {input_pil_or_path}")
+             try:
+                 input_pil_image = Image.open(input_pil_or_path)
+                 # Use the example path directly for make_lineart_transparent
+                 temp_input_path = input_pil_or_path
+                 using_temp_input_path = False  # Don't delete the example file later
+             except Exception as e:
+                 status_updates.append(f"ERROR: Could not open image file from path '{input_pil_or_path}': {e}")
+                 print(status_updates[-1])
+                 return [], None, None, "\n".join(status_updates)  # Return error status
+         else:
+             status_updates.append(f"ERROR: Invalid input type received: {type(input_pil_or_path)}. Expected PIL image or file path.")
+             print(status_updates[-1])
+             return [], None, None, "\n".join(status_updates)  # Return error status
+
+         # --- Ensure RGBA and get dimensions ---
+         try:
+             input_pil_image = input_pil_image.convert("RGBA")
+             width, height = input_pil_image.size
+         except Exception as e:
+             status_updates.append(f"ERROR: Could not process input image (convert/get size): {e}")
+             print(status_updates[-1])
+             # Clean up temp file if created before error
+             if using_temp_input_path and temp_input_path and os.path.exists(temp_input_path):
+                 try: os.remove(temp_input_path)
+                 except Exception as e_rem: print(f"Warning: Could not remove temp input file {temp_input_path}: {e_rem}")
+             return [], None, None, "\n".join(status_updates)  # Return error status
+
+         status_updates = [f"Status: Processing started using {detector_choice}."]  # Reset status
+         print("Starting processing...")
+
+         # --- 1. Detect Objects (Conditional) ---
+         print(f"Selected detector: {detector_choice}")
+         if detector_choice == "Detector 1":
+             if not detector_1_available:
+                 status_updates.append("Warning: Using DUMMY Detector 1.")
+             boxes_info = crop_and_mask_characters_gradio(input_pil_image)
+         elif detector_choice == "Detector 2":
+             if not detector_2_available:
+                 status_updates.append("Warning: Using DUMMY Detector 2.")
+             boxes_info = crop_and_mask_characters_gradio_2(input_pil_image)
+         else:
+             # This case should ideally not happen with Radio buttons, but good for safety
+             status_updates.append(f"ERROR: Invalid detector choice received: {detector_choice}")
+             print(status_updates[-1])
+             # Clean up temp file if created before error
+             if using_temp_input_path and temp_input_path and os.path.exists(temp_input_path):
+                 try: os.remove(temp_input_path)
+                 except Exception as e_rem: print(f"Warning: Could not remove temp input file {temp_input_path}: {e_rem}")
+             return [], None, None, "\n".join(status_updates)  # Return error status
+
+         if not boxes_info:
+             gr.Warning("No objects detected.")
+             status_updates.append("No objects detected.")
+             # Clean up temp file if created
+             if using_temp_input_path and temp_input_path and os.path.exists(temp_input_path):
+                 try: os.remove(temp_input_path)
+                 except Exception as e_rem: print(f"Warning: Could not remove temp input file {temp_input_path}: {e_rem}")
+             return [], None, None, "\n".join(status_updates)
+
+         status_updates.append(f"Detected {len(boxes_info)} objects.")
+         print(f"Detected boxes: {boxes_info}")
+
+         # --- Temporary file paths (partially adjusted) ---
+         temp_dir_for_outputs = tempfile.gettempdir()
+         unique_id = uuid.uuid4().hex[:8]
+         zip_base_name = os.path.join(temp_dir_for_outputs, f"cropped_images_{unique_id}")
+         zip_output_path = f"{zip_base_name}.zip"  # Path for the final zip file
+         psd_output_path = os.path.join(temp_dir_for_outputs, f"lineart_boxes_{unique_id}.psd")
+         # temp_input_path is already handled above based on input source
+
+         # --- 2. Crop Images and Prepare for ZIP ---
+         with tempfile.TemporaryDirectory() as temp_crop_dir:
+             print(f"Saving cropped images to temporary directory: {temp_crop_dir}")
+             for i, (x, y, w, h) in boxes_info:
+                 # Ensure box coordinates are within image bounds
+                 x1, y1 = max(0, x), max(0, y)
+                 x2, y2 = min(width, x + w), min(height, y + h)
+                 box = (x1, y1, x2, y2)
+                 if box[2] > box[0] and box[3] > box[1]:  # Check if width and height are positive
+                     try:
+                         cropped_img = input_pil_image.crop(box)
+                         cropped_images_for_gallery.append(cropped_img)
+                         crop_filename = os.path.join(temp_crop_dir, f"cropped_{i}.png")
+                         cropped_img.save(crop_filename, "PNG")
+                     except Exception as e:
+                         print(f"Error cropping or saving box {i} with coords {box}: {e}")
+                         status_updates.append(f"Warning: Error processing crop {i}.")
+                 else:
+                     print(f"Skipping invalid box {i} with coords {box}")
+                     status_updates.append(f"Warning: Skipped invalid crop dimensions for box {i}.")
+
+             # --- 3. Create ZIP File ---
+             # Check if any PNG files were actually created in the temp dir
+             if any(f.endswith(".png") for f in os.listdir(temp_crop_dir)):
+                 print(f"Creating ZIP file: {zip_output_path} from {temp_crop_dir}")
+                 try:
+                     shutil.make_archive(zip_base_name, 'zip', temp_crop_dir)
+                     status_updates.append("Cropped images ZIP created.")
+                     # zip_output_path is already correctly set
+                 except Exception as e:
+                     print(f"Error creating ZIP file: {e}")
+                     status_updates.append("Error: Failed to create ZIP file.")
+                     zip_output_path = None  # Indicate failure
+             else:
+                 print("No valid cropped images were saved, skipping ZIP creation.")
+                 status_updates.append("Skipping ZIP creation (no valid crops).")
+                 zip_output_path = None  # No zip file to provide
+
+         # --- 4. Prepare PSD Layers ---
+         # a) Line Layer (Use the temp_input_path which is either the original example path or a temp copy)
+         print(f"Using image path for transparent layer: {temp_input_path}")
+         line_layer_pil = make_lineart_transparent(temp_input_path)
+         if line_layer_pil is None:
+             status_updates.append("Error: Failed to create transparent lineart layer.")
+             print(status_updates[-1])
+             # Don't create PSD if lineart failed, return current results
+             # Clean up temp file if created
+             if using_temp_input_path and temp_input_path and os.path.exists(temp_input_path):
+                 try: os.remove(temp_input_path)
+                 except Exception as e_rem: print(f"Warning: Could not remove temp input file {temp_input_path}: {e_rem}")
+             return cropped_images_for_gallery, zip_output_path, None, "\n".join(status_updates)  # Return None for PSD
+
+         status_updates.append("Transparent lineart layer created.")
+
+         # b) Box Layer
+         box_layer_pil = Image.new('RGBA', (width, height), (255, 255, 255, 255))  # White background
+         draw = ImageDraw.Draw(box_layer_pil)
+         for i, (x, y, w, h) in boxes_info:
+             # Use validated coords again, ensure they are within bounds
+             x1, y1 = max(0, x), max(0, y)
+             x2, y2 = min(width, x + w), min(height, y + h)
+             if x2 > x1 and y2 > y1:  # Check validity again just in case
+                 rect = [(x1, y1), (x2, y2)]
+                 # Changed to fill for solid boxes, yellow fill, semi-transparent
+                 draw.rectangle(rect, fill=(255, 255, 0, 128))
+         status_updates.append("Bounding box layer created.")
+
+         # --- 5. Create PSD File ---
+         print(f"Creating PSD file: {psd_output_path}")
+         # Double check layer sizes before creating PSD object
+         if line_layer_pil.size != (width, height) or box_layer_pil.size != (width, height):
+             size_error_msg = (f"Error: Layer size mismatch during PSD creation. "
+                               f"Line: {line_layer_pil.size}, Box: {box_layer_pil.size}, "
+                               f"Expected: {(width, height)}")
+             status_updates.append(size_error_msg)
+             print(size_error_msg)
+             # Clean up temp file if created
+             if using_temp_input_path and temp_input_path and os.path.exists(temp_input_path):
+                 try: os.remove(temp_input_path)
+                 except Exception as e_rem: print(f"Warning: Could not remove temp input file {temp_input_path}: {e_rem}")
+             return cropped_images_for_gallery, zip_output_path, None, "\n".join(status_updates)  # No PSD
+
+         try:
+             psd = PSDImage.new(mode='RGBA', size=(width, height))
+             # Add layers (order matters for visibility in PSD viewers)
+             # Base layer is transparent by default with RGBA
+             psd.append(PixelLayer.frompil(line_layer_pil, layer_name='line', top=0, left=0))
+             psd.append(PixelLayer.frompil(box_layer_pil, layer_name='box', top=0, left=0))
+             psd.save(psd_output_path)
+             status_updates.append("PSD file created.")
+         except Exception as e:
+             print(f"Error saving PSD file: {e}")
+             traceback.print_exc()
+             status_updates.append("Error: Failed to save PSD file.")
+             psd_output_path = None  # Indicate failure
+
+         print("Processing finished.")
+         status_updates.append("Success!")
+         final_status = "\n".join(status_updates)
+
+         # Return all paths, even if None (Gradio handles None for File output)
+         return cropped_images_for_gallery, zip_output_path, psd_output_path, final_status
+
+     except Exception as e:
+         print(f"An unexpected error occurred in process_lineart: {e}")
+         traceback.print_exc()
+         status_updates.append(f"FATAL ERROR: {e}")
+         final_status = "\n".join(status_updates)
+         # Return empty/None outputs and the error status
+         # Ensure cleanup happens even on fatal error
+         if using_temp_input_path and temp_input_path and os.path.exists(temp_input_path):
+             try:
+                 os.remove(temp_input_path)
+                 print(f"Cleaned up temporary input file due to error: {temp_input_path}")
+             except Exception as e_rem:
+                 print(f"Warning: Could not remove temp input file {temp_input_path} during error handling: {e_rem}")
+         return [], None, None, final_status  # Return safe defaults
+
+     finally:
+         # --- Final Cleanup (Only removes temp input if created from upload) ---
+         if using_temp_input_path and temp_input_path and os.path.exists(temp_input_path):
+             try:
+                 os.remove(temp_input_path)
+                 print(f"Cleaned up temporary input file: {temp_input_path}")
+             except Exception as e_rem:
+                 # This might happen if the file was already removed in an error block
+                 print(f"Notice: Could not remove temp input file {temp_input_path} in finally block (may already be removed): {e_rem}")
+
+
+ # --- Gradio Interface Definition (modified) ---
+ css = '''
+ .custom-gallery {
+     height: 500px !important;
+     width: 100%;
+     margin: 10px auto;
+     padding: 0px;
+     overflow-y: auto !important;
+ }
+ '''
+ with gr.Blocks(theme=gr.themes.Soft(), css=css) as demo:
+     gr.Markdown("# Webtoon Lineart Cropper with Filtering by Head-or-Face Detection")
+     gr.Markdown("Upload a PNG lineart image of your webtoon and automatically crop the regions that include a character's face or head. "
+                 "This demo leverages several detectors to precisely detect and isolate characters. "
+                 "The app will display cropped objects, provide a ZIP of cropped PNGs, "
+                 "and a PSD file with transparent lineart and half-transparent yellow-filled box layers. "
+                 "We provide two detectors to choose from, each with different filtering methods. ")
+     gr.Markdown("- **Detector 1**: Uses [`imgutils.detect`](https://github.com/deepghs/imgutils/tree/main/imgutils/detect) and VLM-based filtering with [`google/gemma-3-12b-it`](https://huggingface.co/google/gemma-3-12b-it)")
+     gr.Markdown("- **Detector 2**: Uses [`imgutils.detect`](https://github.com/deepghs/imgutils/tree/main/imgutils/detect) and tag-based filtering with [`SmilingWolf/wd-eva02-large-tagger-v3`](https://huggingface.co/SmilingWolf/wd-eva02-large-tagger-v3)")
+     gr.Markdown("**Note 1:** The app may take a few seconds to process the image, depending on the size and number of characters detected. The example image below is a lineart PNG file created synthetically from images on [Danbooru](https://danbooru.donmai.us/posts?page=1&tags=dragon_ball_z) after [lineart extraction](https://huggingface.co/spaces/carolineec/informativedrawings).")
+     gr.Markdown("**Note 2:** This demo is developed by [Kakao Entertainment](https://kakaoent.com/)'s AI Lab for research purposes, specifically to preprocess webtoon image data; it is not intended for production use. It is a research prototype and may not be suitable for all use cases. Please use it at your own risk.")
+
+     with gr.Row():
+         with gr.Column(scale=1):
+             # Input type remains 'filepath' to handle examples cleanly.
+             image_input = gr.Image(type="filepath", label="Upload Lineart PNG or Select Example", image_mode='RGBA', height=400)
+
+             detector_choice_radio = gr.Radio(
+                 choices=["Detector 1", "Detector 2"],
+                 label="Choose Detection Function",
+                 value="Detector 1"  # Default value
+             )
+             process_button = gr.Button("Process Uploaded/Modified Image", variant="primary")
+             status_output = gr.Textbox(label="Status", interactive=False, lines=8)  # Increased lines slightly more
+
+         with gr.Column(scale=3):
+             gr.Markdown("### Cropped Objects")
+             # Setting height explicitly can sometimes help layout.
+             gallery_output = gr.Gallery(label="Detected Objects (Cropped)", elem_id="gallery_crops", columns=4, height=500, interactive=False, elem_classes="custom-gallery")  # object_fit="contain")
+             with gr.Row():
+                 zip_output = gr.File(label="Download Cropped Images (ZIP)")
+                 psd_output = gr.File(label="Download PSD (Lineart + Boxes)")
+
+     # --- Add Examples ---
+     # IMPORTANT: Make sure the sample image exists at the path below,
+     # or provide the correct relative/absolute path.
+     # Also ensure the image is a valid PNG.
+     example_image_path = "./sample_img/sample_danbooru_dragonball.png"
+     if os.path.exists(example_image_path):
+         gr.Examples(
+             examples=[
+                 [example_image_path, "Detector 1"],
+                 [example_image_path, "Detector 2"]  # Add example for detector 2 as well
+             ],
+             # Inputs that the examples populate
+             inputs=[image_input, detector_choice_radio],
+             # Outputs that are updated when an example is clicked AND run_on_click=True
+             outputs=[gallery_output, zip_output, psd_output, status_output],
+             # The function to call when an example is clicked
+             fn=process_lineart,
+             # Make clicking an example automatically run the function
+             run_on_click=True,
+             label="Click Example to Run Automatically",  # Updated label
+             cache_examples=True,  # Cache example outputs (lazily) so repeat clicks are fast
+             cache_mode="lazy",
+         )
+     else:
+         gr.Markdown(f"**(Note:** Could not find `{example_image_path}` for examples. Please create it or ensure it's in the correct directory.)")
+
+     # --- Button Click Handler (for manual uploads/changes) ---
+     process_button.click(
+         fn=process_lineart,
+         inputs=[image_input, detector_choice_radio],
+         outputs=[gallery_output, zip_output, psd_output, status_output]
+     )
+
+ # --- Launch the Gradio App ---
+ if __name__ == "__main__":
+     # Create a dummy sample image if it doesn't exist for testing
+     if not os.path.exists("./sample_img/sample_danbooru_dragonball.png"):
+         print("Creating a dummy 'sample_danbooru_dragonball.png' for demonstration.")
+         try:
+             os.makedirs("./sample_img", exist_ok=True)  # Make sure the target directory exists before saving
+             img = Image.new('L', (300, 200), color=255)  # White background (grayscale)
+             draw = ImageDraw.Draw(img)
+             # Draw some black lines/shapes
+             draw.line((30, 30, 270, 30), fill=0, width=2)
+             draw.rectangle((50, 50, 150, 150), outline=0, width=3)
+             draw.ellipse((180, 70, 250, 130), outline=0, width=3)
+             img.save("./sample_img/sample_danbooru_dragonball.png", "PNG")
+             print("Dummy 'sample_danbooru_dragonball.png' created.")
+         except Exception as e:
+             print(f"Warning: Failed to create dummy sample image: {e}")
+
+     demo.launch()
+     # ssr_mode=False
requirements.txt ADDED
@@ -0,0 +1,23 @@
+ --extra-index-url https://download.pytorch.org/whl/cpu
+ torch
+ --extra-index-url https://download.pytorch.org/whl/cpu
+ torchvision
+ --extra-index-url https://download.pytorch.org/whl/cpu
+ torchaudio
+ pydantic==2.10.6
+ timm
+ psd_tools==1.10.7
+ accelerate
+ diffusers
+ transformers
+ xformers
+ opencv-python
+ dghs-imgutils==0.15.0
+ pillow
+ numpy
+ scikit-learn
+ huggingface_hub
+ tqdm
+ opencv-contrib-python
+ pandas
+ scipy
sample_img/sample_danbooru_dragonball.png ADDED

Git LFS Details

  • SHA256: 203064708dc908c07b95c1e8e53302635d260b6c2b238f29f6580e2af597373c
  • Pointer size: 132 Bytes
  • Size of remote file: 5.56 MB
scripts/__init__.py ADDED
File without changes
scripts/__pycache__/__init__.cpython-312.pyc ADDED
Binary file (204 Bytes).
 
scripts/__pycache__/parse_cut_from_page.cpython-312.pyc ADDED
Binary file (14 kB).
 
scripts/convert_psd_to_png.py ADDED
@@ -0,0 +1,106 @@
+ import argparse
+ import logging
+ import multiprocessing
+ import os
+ from typing import Iterable
+
+ from psd_tools import PSDImage
+ from tqdm import tqdm
+
+ logger = logging.getLogger(__name__)
+ logging.basicConfig(level=logging.INFO)
+
+
+ def parse_args():
+     parser = argparse.ArgumentParser(description="Convert PSD files to PNG.")
+     parser.add_argument(
+         "-d",
+         "--directory",
+         type=str,
+         default="./",
+         help="Directory to search for PSD files.",
+     )
+     parser.add_argument("-o", "--output", type=str, default="./", help="Directory to save PNG files.")
+     parser.add_argument(
+         "--visible_layers",
+         default=[],
+         nargs="+",
+         type=str,
+         help="List of layer names to make visible.",
+     )
+     parser.add_argument(
+         "--invisible_layers",
+         default=[],
+         nargs="+",
+         type=str,
+         help="List of layer names to make invisible.",
+     )
+     parser.add_argument("--num_processes", "-n", default=None, type=int, help="Number of processes to use.")
+     return parser.parse_args()
+
+
+ def find_psd_files(directory):
+     psd_files = []
+     for root, dirs, files in os.walk(directory):
+         for file in files:
+             if file.endswith(".psd"):
+                 psd_files.append(os.path.join(root, file))
+     return psd_files
+
+
+ def set_layer_visibility(layer, visible_layers, invisible_layers):
+     if layer.name in visible_layers:
+         layer.visible = True
+     if layer.name in invisible_layers:
+         layer.visible = False
+
+     # Recurse into layer groups so nested layers are also toggled
+     if isinstance(layer, Iterable):
+         for child in layer:
+             set_layer_visibility(child, visible_layers, invisible_layers)
+
+
+ def process_psd_file(task):
+     """
+     Worker function that processes a single PSD file.
+     Opens the PSD, sets layer visibility, composites the image and saves it as PNG.
+     """
+     psd_file, output, visible_layers, invisible_layers, force = task
+     try:
+         psd = PSDImage.open(psd_file)
+         if force:
+             for layer in psd:
+                 set_layer_visibility(layer, visible_layers, invisible_layers)
+         image = psd.composite(force=force)
+         fname = os.path.basename(psd_file).replace(".psd", ".png")
+         output_file = os.path.join(output, fname)
+         image.save(output_file)
+     except Exception as e:
+         logger.error("Error processing file %s: %s", psd_file, e)
+
+
+ def main(args):
+     # Create output directory if it doesn't exist
+     if not os.path.exists(args.output):
+         os.makedirs(args.output)
+
+     psd_files = find_psd_files(args.directory)
+     # force=True when any layer visibility override is provided
+     force = bool(args.visible_layers or args.invisible_layers)
+
+     tasks = [(psd_file, args.output, args.visible_layers, args.invisible_layers, force) for psd_file in psd_files]
+
+     num_processes = args.num_processes if args.num_processes else multiprocessing.cpu_count() // 2
+     # Use multiprocessing to process PSD files in parallel
+     with multiprocessing.Pool(processes=num_processes) as pool:
+         list(
+             tqdm(
+                 pool.imap_unordered(process_psd_file, tasks),
+                 total=len(tasks),
+                 desc="Convert PSD to PNG files",
+             )
+         )
+
+
+ if __name__ == "__main__":
+     args = parse_args()
+     main(args)
scripts/parse_cut_from_page.py ADDED
@@ -0,0 +1,248 @@
+ import argparse
+ import logging
+ import multiprocessing
+ import re
+ import sys
+ from pathlib import Path, PurePath
+ from typing import List, Tuple
+
+ import numpy as np
+ from PIL import Image
+ from tqdm import tqdm
+
+ current_file_path = Path(__file__).resolve()
+ sys.path.insert(0, str(current_file_path.parent.parent))
+
+ logger = logging.getLogger(__name__)
+ logging.basicConfig(level=logging.INFO)
+
+
+ def parse_args():
+     parser = argparse.ArgumentParser(description="Parse cut from page script.")
+
+     parser.add_argument("--lineart", "-l", type=str, required=True, help="Directory of lineart images.")
+     parser.add_argument("--flat", "-f", type=str, required=True, help="Directory of flat images.")
+     parser.add_argument(
+         "--segmentation",
+         "-s",
+         type=str,
+         required=True,
+         help="Directory of segmentation images.",
+     )
+     parser.add_argument("--color", "-c", type=str, required=True, help="Directory of color images.")
+     parser.add_argument("--output", "-o", type=str, required=True, help="Output directory for parsed images.")
+     parser.add_argument("--num_process", "-n", type=int, default=None, help="Number of processes to use.")
+
+     return parser.parse_args()
+
+
+ def get_image_list(input_dir: str, ext: List[str]):
+     """
+     Get a list of images from the input directory with the specified extensions.
+     Args:
+         input_dir (str): Directory containing images to filter.
+         ext (list): List of file extensions to filter by.
+     Returns:
+         list: List of image file paths.
+     """
+     image_list = []
+     for e in ext:
+         image_list.extend(Path(input_dir).glob(f"*.{e}"))
+     return image_list
+
+
+ def check_image_pair_validity(
+     lineart_list: List[PurePath],
+     flat_list: List[PurePath],
+     segmentation_list: List[PurePath],
+     color_list: List[PurePath],
+     pattern: str = r"\d+_\d+",
+ ) -> Tuple[List[PurePath], List[PurePath], List[PurePath], List[PurePath]]:
+     r"""
+     Validates and filters lists of image file paths to ensure they correspond to the same IDs
+     based on a given naming pattern. If the lengths of the input lists are mismatched, the
+     function filters the lists to include only matching IDs.
+
+     Args:
+         lineart_list (List[PurePath]): List of file paths for lineart images.
+         flat_list (List[PurePath]): List of file paths for flat images.
+         segmentation_list (List[PurePath]): List of file paths for segmentation images.
+         color_list (List[PurePath]): List of file paths for color images.
+         pattern (str, optional): Regular expression pattern to extract IDs from file names.
+             Defaults to r"\d+_\d+".
+
+     Returns:
+         Tuple[List[PurePath], List[PurePath], List[PurePath], List[PurePath]]:
+             A tuple containing four lists of file paths (lineart, flat, segmentation, color)
+             that have been filtered to ensure matching IDs.
+     """
+     pattern = re.compile(pattern)
+
+     # Sort the lists based on the pattern
+     lineart_list = sorted(lineart_list, key=lambda x: pattern.match(x.name).group(0))
+     flat_list = sorted(flat_list, key=lambda x: pattern.match(x.name).group(0))
+     segmentation_list = sorted(segmentation_list, key=lambda x: pattern.match(x.name).group(0))
+     color_list = sorted(color_list, key=lambda x: pattern.match(x.name).group(0))
+
+     # Check if the lengths of the lists are equal
+     if (
+         len(lineart_list) != len(flat_list)
+         or len(lineart_list) != len(segmentation_list)
+         or len(lineart_list) != len(color_list)
+     ):
+         # If the lengths are not equal, we need to filter the lists based on the pattern
+         logger.warning(
+             f"Length mismatch: lineart({len(lineart_list)}), flat({len(flat_list)}), segmentation({len(segmentation_list)}), color({len(color_list)})"
+         )
+         new_lineart_list = []
+         new_flat_list = []
+         new_segmentation_list = []
+         new_color_list = []
+         for lineart_path in lineart_list:
+             lineart_name = lineart_path.name
+             lineart_match = pattern.match(lineart_name)
+
+             if lineart_match:
+                 file_id = lineart_match.group(0)
+                 corresponding_flat_files = [p for p in flat_list if file_id in p.name]
+                 corresponding_segmentation_files = [p for p in segmentation_list if file_id in p.name]
+                 corresponding_color_paths = [p for p in color_list if file_id in p.name]
+
+                 if corresponding_flat_files and corresponding_segmentation_files and corresponding_color_paths:
+                     new_lineart_list.append(lineart_path)
+                     new_flat_list.append(corresponding_flat_files[0])
+                     new_segmentation_list.append(corresponding_segmentation_files[0])
+                     new_color_list.append(corresponding_color_paths[0])
+
+         return new_lineart_list, new_flat_list, new_segmentation_list, new_color_list
+     else:
+         return lineart_list, flat_list, segmentation_list, color_list
+
+
+ def extract_cutbox_coordinates(image: Image.Image) -> List[Tuple[int, int, int, int]]:
+     """
+     Extracts bounding box coordinates for non-white regions in an image.
+
+     This function identifies regions in the given image that contain non-white pixels
+     and calculates the bounding box coordinates for each region. The bounding boxes
+     are represented as tuples of (left, top, right, bottom).
+
+     Args:
+         image (Image.Image): The input image as a PIL Image object.
+
+     Returns:
+         List[Tuple[int, int, int, int]]: A list of bounding box coordinates for non-white regions.
+             Each tuple contains four integers representing the left, top, right, and bottom
+             coordinates of a bounding box.
+     """
+
+     # We'll now detect the bounding box for non-white pixels instead of relying on the alpha channel.
+     # Convert the image to RGB and get the numpy array
+     image_rgb = image.convert("RGB")
+     image_np_rgb = np.array(image_rgb)
+
+     # Define the white threshold: with 255, only pure-white pixels count as white;
+     # lower this value to also treat near-white pixels as white.
+     threshold = 255
+     non_white_mask = np.any(image_np_rgb < threshold, axis=-1)  # Any channel below threshold is considered non-white
+     non_white_mask = non_white_mask.astype(np.uint8) * 255
+     # Image.fromarray(non_white_mask).save("non_white_mask.png")
+
+     # Find rows containing non-white pixels
+     non_white_rows = np.where(non_white_mask.any(axis=1))[0]
+     if len(non_white_rows) == 0:
+         return []
+
+     # Group continuous non-white rows
+     horizontal_line = np.where(np.diff(non_white_rows) != 1)[0] + 1
+     non_white_rows = np.split(non_white_rows, horizontal_line)
+     top_bottom_pairs = [(group[0], group[-1]) for group in non_white_rows]
+
+     # Iterate through each cut and find the left and right bounds
+     bounding_boxes = []
+     for top, bottom in top_bottom_pairs:
+         cut = image_np_rgb[top : bottom + 1]
+
+         non_white_mask = np.any(cut < threshold, axis=-1)
+         non_white_cols = np.where(non_white_mask.any(axis=0))[0]
+         left = non_white_cols[0]
+         right = non_white_cols[-1]
+
+         bounding_boxes.append((left, top, right + 1, bottom + 1))
+
+     return bounding_boxes
+
+
+ def process_single_image(task: Tuple[Path, Path, Path, Path, str]):
+     """
+     Worker function to process a single set of images.
+     Opens images, extracts bounding boxes, crops and saves the cut images.
+     """
+     line_path, flat_path, seg_path, color_path, output_str = task
+     output_dir = Path(output_str)
+     try:
+         line_img = Image.open(line_path).convert("RGB")
+         flat_img = Image.open(flat_path).convert("RGB")
+         seg_img = Image.open(seg_path).convert("RGB")
+         color_img = Image.open(color_path).convert("RGB")
+     except Exception as e:
+         logger.error(f"Error opening images for {line_path}: {e}")
+         return
+
+     bounding_boxes = extract_cutbox_coordinates(line_img)
+     fname = line_path.stem
+     match = re.compile(r"\d+_\d+").match(fname)
+     if not match:
+         logger.warning(f"Filename pattern not matched for {line_path.name}")
+         return
+     ep_page_str = match.group(0)
+
+     for i, (left, top, right, bottom) in enumerate(bounding_boxes):
+         try:
+             cut_line = line_img.crop((left, top, right, bottom))
+             cut_flat = flat_img.crop((left, top, right, bottom))
+             cut_seg = seg_img.crop((left, top, right, bottom))
+             cut_color = color_img.crop((left, top, right, bottom))
+
+             cut_line.save(output_dir / "line" / f"{ep_page_str}_{i}_line.png")
+             cut_flat.save(output_dir / "flat" / f"{ep_page_str}_{i}_flat.png")
+             cut_seg.save(output_dir / "segmentation" / f"{ep_page_str}_{i}_segmentation.png")
+             cut_color.save(output_dir / "fullcolor" / f"{ep_page_str}_{i}_fullcolor.png")
+         except Exception as e:
+             logger.error(f"Error processing crop for {line_path.name} at box {i}: {e}")
+
+
+ def main(args):
+     # Prepare output directory
+     output_dir = Path(args.output)
+     if not output_dir.exists():
+         output_dir.mkdir(parents=True, exist_ok=True)
+     (output_dir / "line").mkdir(parents=True, exist_ok=True)
+     (output_dir / "flat").mkdir(parents=True, exist_ok=True)
+     (output_dir / "segmentation").mkdir(parents=True, exist_ok=True)
+     (output_dir / "fullcolor").mkdir(parents=True, exist_ok=True)
+
+     # Prepare input images
+     lineart_list = get_image_list(args.lineart, ["png", "jpg", "jpeg"])
+     flat_list = get_image_list(args.flat, ["png", "jpg", "jpeg"])
+     segmentation_list = get_image_list(args.segmentation, ["png", "jpg", "jpeg"])
+     color_list = get_image_list(args.color, ["png", "jpg", "jpeg"])
+
+     # Check image pair validity
+     lineart_list, flat_list, segmentation_list, color_list = check_image_pair_validity(
+         lineart_list, flat_list, segmentation_list, color_list
+     )
+
+     # Prepare tasks for multiprocessing
+     tasks = []
+     for l, f, s, c in zip(lineart_list, flat_list, segmentation_list, color_list):
+         tasks.append((l, f, s, c, str(output_dir)))
+
+     # Use multiprocessing to process images in parallel
+     num_processes = args.num_process if args.num_process else multiprocessing.cpu_count() // 2
+     with multiprocessing.Pool(processes=num_processes) as pool:
+         list(tqdm(pool.imap_unordered(process_single_image, tasks), total=len(tasks), desc="Processing images"))
+
+
+ if __name__ == "__main__":
+     args = parse_args()
+     main(args)
scripts/run_tag_filter.py ADDED
@@ -0,0 +1,147 @@
+ import argparse
+ import logging
+ import os
+ import shutil
+ import sys
+ from pathlib import Path, PurePath
+ from typing import List
+
+ from PIL import Image
+
+ current_file_path = Path(__file__).resolve()
+ sys.path.insert(0, str(current_file_path.parent.parent))
+
+ from src.detectors import AnimeDetector
+ from src.pipelines import TagAndFilteringPipeline
+ from src.taggers import WaifuDiffusionTagger
+ from src.utils.device import determine_accelerator
+
+ logger = logging.getLogger(__name__)
+ logging.basicConfig(level=logging.INFO)
+
+
+ def parse_args():
+     parser = argparse.ArgumentParser(description="Filtering script")
+
+     parser.add_argument(
+         "--input_dir",
+         "-i",
+         type=str,
+         required=True,
+         help="Directory containing images to filter",
+     )
+     parser.add_argument(
+         "--output_dir",
+         "-o",
+         type=str,
+         required=True,
+         help="Directory to save filtered images",
+     )
+     parser.add_argument(
+         "--ext",
+         "-e",
+         type=str,
+         nargs="+",
+         default=["png"],
+         help="File extensions of images to filter (default: png)",
+     )
+
+     return parser.parse_args()
+
+
+ def get_image_list(input_dir: str, ext: List[str]):
+     """
+     Get a list of images from the input directory with the specified extensions.
+     Args:
+         input_dir (str): Directory containing images to filter.
+         ext (list): List of file extensions to filter by.
+     Returns:
+         list: List of image file paths.
+     """
+     image_list = []
+     for e in ext:
+         image_list.extend(Path(input_dir).glob(f"*.{e}"))
+     return image_list
+
+
+ def write_image_caption_file(image_list: List[PurePath], captions: List[str], output_dir: str):
+     """
+     Writes a caption file.
+
+     This function generates a text file named "captions.txt" in the specified output directory.
+     Each line in the file contains the image name (without extension) followed by its caption,
+     separated by a colon.
+
+     Args:
+         image_list (List[PurePath]): A list of image file paths. Each path should be a PurePath object.
+         captions (List[str]): A list of captions corresponding to the images in `image_list`.
+         output_dir (str): The directory where the "captions.txt" file will be created.
+
+     Example:
+         image_list = [PurePath("image1.jpg"), PurePath("image2.jpg")]
+         captions = ["A beautiful sunset.", "A serene mountain view."]
+         output_dir = "/path/to/output"
+         write_image_caption_file(image_list, captions, output_dir)
+     """
+     caption_file = Path(output_dir) / "captions.txt"
+     lines = []
+     for img_path, caption in zip(image_list, captions):
+         img_name = img_path.stem
+         line = f"{img_name}: {caption}\n"
+         lines.append(line)
+
+     with open(caption_file, "w") as f:
+         f.writelines(lines)
+
+
+ def main(args):
+     os.makedirs(args.output_dir, exist_ok=True)
+
+     # 1. Initialize the filtering pipeline
+     device = determine_accelerator()
+     logger.info(f"Using device: {device}")
+
+     logger.info("Initializing filtering pipeline...")
+     detector = AnimeDetector(
+         repo_id="deepghs/anime_face_detection",
+         model_name="face_detect_v1.4_s",
+         hf_token=None,
+     )
+
+     tagger = WaifuDiffusionTagger(device=device)
+
+     filtering_pipeline = TagAndFilteringPipeline(tagger=tagger, detector=detector)
+
+     # 2. Load images from the input directory
+     logger.info(f"Loading images from {args.input_dir}...")
+     image_list = get_image_list(args.input_dir, args.ext)
+     images = [Image.open(img_path).convert("RGB") for img_path in image_list]
+     logger.info(f"Found {len(images)} images.")
+
+     # 3. Filter images using the filtering pipeline
+     logger.info("Filtering images...")
+     filter_output = filtering_pipeline(images, batch_size=32, tag_threshold=0.3, conf_threshold=0.3, iou_threshold=0.7)
+
+     filter_flags = filter_output.filter_flags
+     tags = filter_output.tags
+     captions = [",".join(tag) for tag in tags]
+     logger.info(f"Kept {sum(filter_flags)} of {len(images)} images after filtering.")
+
+     # 4. Save filtered images and captions
+     write_image_caption_file(image_list, captions, args.input_dir)  # Write captions for all images to input_dir
+
+     logger.info(f"Copying filtered images to {args.output_dir}...")
+     filtered_images = [img for img, flag in zip(image_list, filter_flags) if flag]
+     filtered_captions = [caption for caption, flag in zip(captions, filter_flags) if flag]
+
+     for img_path in filtered_images:
+         img_name = img_path.stem
+         output_path = Path(args.output_dir) / f"{img_name}.png"
+         shutil.copy(img_path, output_path)
+
+     write_image_caption_file(filtered_images, filtered_captions, args.output_dir)
+
+
+ if __name__ == "__main__":
+     args = parse_args()
+     main(args)
src/__init__.py ADDED
File without changes
src/__pycache__/__init__.cpython-312.pyc ADDED
Binary file (200 Bytes).
 
src/detectors/__init__.py ADDED
@@ -0,0 +1,3 @@
+ from .imgutils_detector import AnimeDetector
+
+ __all__ = ["AnimeDetector"]
src/detectors/__pycache__/__init__.cpython-312.pyc ADDED
Binary file (296 Bytes).
 
src/detectors/__pycache__/imgutils_detector.cpython-312.pyc ADDED
Binary file (7.3 kB).
 
src/detectors/imgutils_detector.py ADDED
@@ -0,0 +1,170 @@
1
+ from dataclasses import dataclass, field
2
+ from typing import Generic, Optional, TypeVar
3
+
4
+ import cv2
5
+ import imgutils
6
+ import numpy as np
7
+ from imgutils.generic.yolo import (
8
+ _image_preprocess,
9
+ _rtdetr_postprocess,
10
+ _yolo_postprocess,
11
+ rgb_encode,
12
+ )
13
+ from PIL import Image, ImageDraw
14
+
15
+ T = TypeVar("T", int, float)
16
+
17
+ REPO_IDS = {
18
+ "head": "deepghs/anime_head_detection",
19
+ "face": "deepghs/anime_face_detection",
20
+ "eye": "deepghs/anime_eye_detection",
21
+ }
22
+
23
+
24
+ @dataclass
25
+ class DetectorOutput(Generic[T]):
26
+ bboxes: list[list[T]] = field(default_factory=list)
27
+ masks: list[Image.Image] = field(default_factory=list)
28
+ confidences: list[float] = field(default_factory=list)
29
+ previews: Optional[Image.Image] = None
30
+
31
+
32
+ class AnimeDetector:
33
+ """
34
+ A class used to perform object detection on anime images.
35
+ Please refer to the `imgutils` documentation for more information on the available models.
36
+ """
37
+
38
+ def __init__(self, repo_id: str, model_name: str, hf_token: Optional[str] = None):
39
+ model_manager = imgutils.generic.yolo._open_models_for_repo_id(
40
+ repo_id, hf_token=hf_token
41
+ )
42
+ model, max_infer_size, labels = model_manager._open_model(model_name)
43
+
44
+ self.model = model
45
+
46
+ self.max_infer_size = max_infer_size
47
+ self.labels = labels
48
+ self.model_type = model_manager._get_model_type(model_name)
49
+
50
+ def __call__(
51
+ self,
52
+ image: Image.Image,
53
+ conf_threshold: float = 0.3,
54
+ iou_threshold: float = 0.7,
55
+ allow_dynamic: bool = False,
56
+ ) -> DetectorOutput[float]:
57
+ """
58
+ Perform object detection on the given image.
59
+
60
+ Args:
61
+ image (Image.Image): The input image on which to perform detection.
62
+ conf_threshold (float, optional): Confidence threshold for detection. Defaults to 0.3.
63
+ iou_threshold (float, optional): Intersection over Union (IoU) threshold for detection. Defaults to 0.7.
64
+ allow_dynamic (bool, optional): Whether to allow dynamic resizing of the image. Defaults to False.
65
+
66
+ Returns:
67
+ DetectorOutput[float]: The detection results, including bounding boxes, masks, confidences, and a preview image.
68
+
69
+ Raises:
70
+ ValueError: If the model type is unknown.
71
+ """
72
+ # Preprocessing
73
+ new_image, old_size, new_size = _image_preprocess(
74
+ image, self.max_infer_size, allow_dynamic=allow_dynamic
75
+ )
76
+ data = rgb_encode(new_image)[None, ...]
77
+
78
+ # Start detection
79
+ (output,) = self.model.run(["output0"], {"images": data})
80
+
81
+ # Postprocessing
82
+ if self.model_type == "yolo":
83
+ output = _yolo_postprocess(
84
+ output=output[0],
85
+ conf_threshold=conf_threshold,
86
+ iou_threshold=iou_threshold,
87
+ old_size=old_size,
88
+ new_size=new_size,
89
+ labels=self.labels,
90
+ )
91
+ elif self.model_type == "rtdetr":
92
+ output = _rtdetr_postprocess(
93
+ output=output[0],
94
+ conf_threshold=conf_threshold,
95
+ iou_threshold=iou_threshold,
96
+ old_size=old_size,
97
+ new_size=new_size,
98
+ labels=self.labels,
99
+ )
100
+ else:
101
+ raise ValueError(
102
+ f"Unknown object detection model type - {self.model_type!r}."
103
+ ) # pragma: no cover
104
+
105
+ if len(output) == 0:
106
+ return DetectorOutput()
107
+
108
+ bboxes = [x[0] for x in output] # [x0, y0, x1, y1]
109
+ masks = create_mask_from_bbox(bboxes, image.size)
110
+ confidences = [x[2] for x in output]
111
+
112
+ # Create a preview image
113
+ previews = []
114
+ for mask in masks:
115
+ np_image = np.array(image)
116
+ np_mask = np.array(mask)
117
+ preview = cv2.bitwise_and(
118
+ np_image, cv2.cvtColor(np_mask, cv2.COLOR_GRAY2BGR)
119
+ )
120
+ preview = Image.fromarray(preview)
121
+ previews.append(preview)
122
+
123
+ return DetectorOutput(
124
+ bboxes=bboxes, masks=masks, confidences=confidences, previews=previews
125
+ )
126
+
127
+
128
+ def create_mask_from_bbox(
129
+ bboxes: list[list[float]], shape: tuple[int, int]
130
+ ) -> list[Image.Image]:
131
+ """
132
+ Creates a list of binary masks from bounding boxes.
133
+
134
+ Args:
135
+ bboxes (list[list[float]]): A list of bounding boxes, where each bounding box is represented
136
+ by a list of four float values [x_min, y_min, x_max, y_max].
137
+ shape (tuple[int, int]): The shape of the mask (height, width).
138
+
139
+ Returns:
140
+ list[Image.Image]: A list of PIL Image objects representing the binary masks.
141
+ """
142
+ masks = []
143
+ for bbox in bboxes:
144
+ mask = Image.new("L", shape, 0)
145
+ mask_draw = ImageDraw.Draw(mask)
146
+ mask_draw.rectangle(bbox, fill=255)
147
+ masks.append(mask)
148
+ return masks
149
+
150
+
151
+ def create_bbox_from_mask(
152
+ masks: list[Image.Image], shape: tuple[int, int]
153
+ ) -> list[list[int]]:
154
+ """
155
+ Create bounding boxes from a list of mask images.
156
+
157
+ Args:
158
+ masks (list[Image.Image]): A list of PIL Image objects representing the masks.
159
+ shape (tuple[int, int]): A tuple representing the desired shape (width, height) to resize the masks.
160
+
161
+ Returns:
162
+ list[list[int]]: A list of bounding boxes, where each bounding box is represented as a list of four integers [left, upper, right, lower].
163
+ """
164
+ bboxes = []
165
+ for mask in masks:
166
+ mask = mask.resize(shape)
167
+ bbox = mask.getbbox()
168
+ if bbox is not None:
169
+ bboxes.append(list(bbox))
170
+ return bboxes
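For orientation, a minimal usage sketch of AnimeDetector follows; the repo and model names mirror the ones used in src/oskar_crop/detect_and_crop.py, and the image path is the bundled sample:

from PIL import Image

from src.detectors import AnimeDetector

detector = AnimeDetector(
    repo_id="deepghs/anime_face_detection",
    model_name="face_detect_v1.4_s",
)
image = Image.open("sample_img/sample_danbooru_dragonball.png").convert("RGB")
output = detector(image, conf_threshold=0.3, iou_threshold=0.7)
for bbox, conf in zip(output.bboxes, output.confidences):
    print(f"face at {[round(v) for v in bbox]} (confidence {conf:.2f})")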
src/oskar_crop/__pycache__/detect_and_crop.cpython-312.pyc ADDED
Binary file (3.37 kB).
src/oskar_crop/detect_and_crop.py ADDED
@@ -0,0 +1,56 @@
+ import logging
+ from typing import List, Tuple
+
+ import numpy as np
+ from PIL import Image
+ from torchvision.transforms.v2 import ToPILImage
+
+ from scripts.parse_cut_from_page import extract_cutbox_coordinates
+ from src.detectors import AnimeDetector
+ from src.pipelines import TagAndFilteringPipeline
+ from src.taggers import WaifuDiffusionTagger
+ from src.utils.device import determine_accelerator
+
+ topil = ToPILImage()
+ logger = logging.getLogger(__name__)
+ logging.basicConfig(level=logging.INFO)
+
+ # 1. Initialize the filtering pipeline
+ device = determine_accelerator()
+ logger.info(f"Using device: {device}")
+
+ logger.info("Initializing filtering pipeline...")
+ detector = AnimeDetector(
+     repo_id="deepghs/anime_face_detection",
+     model_name="face_detect_v1.4_s",
+     hf_token=None,
+ )
+
+ tagger = WaifuDiffusionTagger(device=device)
+
+ filtering_pipeline = TagAndFilteringPipeline(tagger=tagger, detector=detector)
+
+
+ def process_single_image(lineart_pil_img: Image.Image) -> List[Tuple[int, Tuple[int, int, int, int]]]:
+     """
+     Worker function to process a single line-art page image.
+     Extracts cut bounding boxes, crops each cut, and filters the cropped images.
+     """
+     try:
+         line_img = lineart_pil_img.convert("RGB")
+     except Exception as e:
+         logger.error(f"Error converting input image to RGB: {e}")
+         return []
+
+     # 2. Extract cut bounding boxes from the page and crop each region
+     bounding_boxes = extract_cutbox_coordinates(line_img)
+     images = [topil(np.array(line_img)[top:bottom, left:right]) for (left, top, right, bottom) in bounding_boxes]
+
+     # 3. Filter images using the filtering pipeline
+     logger.info("Filtering images...")
+     filter_output = filtering_pipeline(images, batch_size=32, tag_threshold=0.3, conf_threshold=0.3, iou_threshold=0.7)
+     filter_flags = filter_output.filter_flags
+     logger.info(f"Filtered {sum(filter_flags)} images out of {len(images)}.")
+
+     filtered_bboxes = [bb for bb, flag in zip(bounding_boxes, filter_flags) if flag]
+     index_added_filtered_bboxes = [(i + 1, (left, top, right - left, bottom - top)) for i, (left, top, right, bottom) in enumerate(filtered_bboxes)]
+
+     return index_added_filtered_bboxes
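A sketch of how this worker might be called on a single page; note that importing the module initializes the detector and tagger, and the page path here is illustrative:

from PIL import Image

from src.oskar_crop.detect_and_crop import process_single_image

# Any black-and-white line-art page works; the bundled sample is used for illustration.
page = Image.open("sample_img/sample_danbooru_dragonball.png")
for index, (left, top, width, height) in process_single_image(page):
    print(f"cut {index}: x={left}, y={top}, w={width}, h={height}")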
src/pipelines/__init__.py ADDED
@@ -0,0 +1,3 @@
+ from .pipeline_single_character_filtering import TagAndFilteringPipeline
+
+ __all__ = ["TagAndFilteringPipeline"]
src/pipelines/__pycache__/__init__.cpython-312.pyc ADDED
Binary file (324 Bytes).
src/pipelines/__pycache__/pipeline_single_character_filtering.cpython-312.pyc ADDED
Binary file (9.88 kB).
src/pipelines/pipeline_single_character_filtering.py ADDED
@@ -0,0 +1,175 @@
+ import logging
+ from dataclasses import dataclass
+ from typing import List
+
+ from PIL import Image
+ from tqdm import tqdm
+
+ from ..detectors.imgutils_detector import AnimeDetector
+ from ..taggers import WaifuDiffusionTagger, sort_tags
+ from ..utils.timer import ElapsedTimer
+
+ logger = logging.getLogger(__name__)
+
+
+ @dataclass
+ class TagAndFilteringOutput:
+     filter_flags: List[bool]
+     tags: List[str]
+
+
+ class TagAndFilteringPipeline:
+     """
+     TagAndFilteringPipeline is a pipeline for processing images by tagging them and filtering based on tags and face detection.
+
+     Attributes:
+         tagger (WaifuDiffusionTagger): An instance of the WaifuDiffusionTagger used for tagging images.
+         detector (AnimeDetector): An instance of the AnimeDetector used for detecting faces in images.
+     """
+
+     def __init__(self, tagger: WaifuDiffusionTagger, detector: AnimeDetector):
+         self.tagger = tagger
+         self.detector = detector
+
+     def __call__(self, images: List[Image.Image], *args, **kwargs) -> TagAndFilteringOutput:
+         """
+         Processes a list of images by tagging and filtering them based on resolution, tags, and face detection.
+         Args:
+             images (List[Image.Image]): A list of images to process.
+             batch_size (int, optional): The batch size for processing images. Default is 32.
+             tag_threshold (float, optional): The threshold for tag confidence. Default is 0.3.
+             sort_mode (str, optional): The mode for sorting tags. Default is "score".
+             include_tags (List[str], optional): Tags to include during filtering. Default is ["solo"].
+             exclude_tags (List[str], optional): Tags to exclude during filtering. Default is ["head_out_of_frame", "out_of_frame", "chibi", "negative_space"].
+             conf_threshold (float, optional): Confidence threshold for face detection. Default is 0.3.
+             iou_threshold (float, optional): IOU threshold for face detection. Default is 0.7.
+             minimum_resolution (int, optional): Minimum shorter-side resolution in pixels. Default is 512.
+             filter_by_resolution (bool, optional): Whether to filter images based on resolution. Default is True.
+             filter_by_tags (bool, optional): Whether to filter images based on tags. Default is True.
+             filter_by_faces (bool, optional): Whether to filter images based on face detection. Default is True.
+         Returns:
+             TagAndFilteringOutput: An object containing filter flags and tags for the processed images.
+         """
+         if not isinstance(images, list):
+             images = [images]
+
+         # Tagging parameters
+         batch_size = kwargs.pop("batch_size", 32)
+         tag_threshold = kwargs.pop("tag_threshold", 0.3)
+         tag_sort_mode = kwargs.pop("sort_mode", "score")
+         include_tags = kwargs.pop("include_tags", ["solo"])
+         exclude_tags = kwargs.pop("exclude_tags", ["head_out_of_frame", "out_of_frame", "chibi", "negative_space"])
+         # Face detection parameters
+         conf_threshold = kwargs.pop("conf_threshold", 0.3)
+         iou_threshold = kwargs.pop("iou_threshold", 0.7)
+         # Etc.
+         minimum_resolution = kwargs.pop("minimum_resolution", 512)
+
+         filter_flags = [True] * len(images)
+
+         if kwargs.pop("filter_by_resolution", True):
+             with ElapsedTimer("Resolution-based Filtering", logger=logger):
+                 for idx, image in enumerate(images):
+                     if min(image.size) < minimum_resolution:
+                         filter_flags[idx] = False
+
+             logger.info(f"{sum(filter_flags)} of {len(images)} images passed resolution-based filtering ({minimum_resolution}px).")
+
+         with ElapsedTimer("Tagging", logger=logger):
+             tags = self.tagging(images, threshold=tag_threshold, sort_mode=tag_sort_mode, batch_size=batch_size)
+
+         if kwargs.pop("filter_by_tags", True):
+             with ElapsedTimer("Tag-based Filtering", logger=logger):
+                 filter_flags = self.tag_based_filtering(
+                     tags,
+                     filter_flags,
+                     include_tags=include_tags,
+                     exclude_tags=exclude_tags,
+                 )
+
+         if kwargs.pop("filter_by_faces", True):
+             with ElapsedTimer("Face-based Filtering", logger=logger):
+                 filter_flags = self.face_based_filtering(
+                     images,
+                     filter_flags=filter_flags,
+                     conf_threshold=conf_threshold,
+                     iou_threshold=iou_threshold,
+                 )
+
+         return TagAndFilteringOutput(filter_flags=filter_flags, tags=tags)
+
+     def tagging(
+         self,
+         images,
+         threshold: float = 0.3,
+         sort_mode: str = "score",
+         batch_size: int = 32,
+     ) -> List[List[str]]:
+         """
+         Tags a list of images and returns their sorted tag lists.
+         Parameters:
+             images (List[Image.Image]): A list of images to tag.
+             threshold (float, optional): The threshold for tag confidence. Default is 0.3.
+             sort_mode (str, optional): The mode for sorting tags. Default is "score".
+             batch_size (int, optional): The batch size for tagging images. Default is 32.
+         Returns:
+             List[List[str]]: A list of sorted tag lists, one per image.
+         """
+         tags = []
+         for i in tqdm(range(0, len(images), batch_size), desc="Tagging"):
+             batch = images[i : i + batch_size]
+             tagger_output = self.tagger(batch, threshold=threshold)
+
+             tags.extend([sort_tags(image_tags, mode=sort_mode) for image_tags in tagger_output])
+
+         return tags
+
+     def tag_based_filtering(
+         self,
+         tags: List[List[str]],
+         filter_flags: List[bool],
+         include_tags=["solo"],
+         exclude_tags=["head_out_of_frame", "out_of_frame"],
+     ) -> List[bool]:
+         """
+         Filters images based on their tags.
+         Parameters:
+             tags (List[List[str]]): A list of tags for the images.
+             filter_flags (List[bool]): A list of boolean flags indicating whether each image passes filtering.
+             include_tags (List[str], optional): Tags to include during filtering. Default is ["solo"].
+             exclude_tags (List[str], optional): Tags to exclude during filtering. Default is ["head_out_of_frame", "out_of_frame"].
+         Returns:
+             List[bool]: Updated filter flags after tag-based filtering.
+         """
+         for idx, tag in tqdm(enumerate(tags), desc="Tag-based Filtering", total=len(tags)):
+             if not filter_flags[idx]:
+                 continue  # keep the result of earlier filtering stages
+             if any(include_tag in tag for include_tag in include_tags) and all(
+                 exclude_tag not in tag for exclude_tag in exclude_tags
+             ):
+                 filter_flags[idx] = True
+             else:
+                 filter_flags[idx] = False
+
+         return filter_flags
+
+     def face_based_filtering(
+         self, images: List[Image.Image], filter_flags: List[bool], conf_threshold: float = 0.3, iou_threshold=0.7
+     ) -> List[bool]:
+         """
+         Filters images based on face detection, keeping only images with exactly one detected face.
+         Parameters:
+             images (List[Image.Image]): A list of images to filter.
+             filter_flags (List[bool]): A list of boolean flags indicating whether each image passes filtering.
+             conf_threshold (float, optional): Confidence threshold for face detection. Default is 0.3.
+             iou_threshold (float, optional): IOU threshold for face detection. Default is 0.7.
+         Returns:
+             List[bool]: Updated filter flags after face-based filtering.
+         """
+         for idx, image in tqdm(enumerate(images), desc="Face-based Filtering", total=len(images)):
+             if not filter_flags[idx]:
+                 continue
+
+             detector_output = self.detector(image, conf_threshold=conf_threshold, iou_threshold=iou_threshold)
+             filter_flags[idx] = len(detector_output.bboxes) == 1
+
+         return filter_flags
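A hedged end-to-end sketch of the pipeline; the input directory "cuts" is hypothetical, and the detector/tagger choices mirror the rest of the repo:

from pathlib import Path

from PIL import Image

from src.detectors import AnimeDetector
from src.pipelines import TagAndFilteringPipeline
from src.taggers import WaifuDiffusionTagger

detector = AnimeDetector(repo_id="deepghs/anime_face_detection", model_name="face_detect_v1.4_s")
tagger = WaifuDiffusionTagger(device="cpu")
pipeline = TagAndFilteringPipeline(tagger=tagger, detector=detector)

# Hypothetical input directory of cropped cut images.
images = [Image.open(p) for p in sorted(Path("cuts").glob("*.png"))]
output = pipeline(images, batch_size=8, tag_threshold=0.3, include_tags=["solo"])
kept = [img for img, flag in zip(images, output.filter_flags) if flag]
print(f"kept {len(kept)} of {len(images)} images")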
src/taggers/__init__.py ADDED
@@ -0,0 +1,4 @@
+ from .order import sort_tags
+ from .tagger import WaifuDiffusionTagger
+
+ __all__ = ["WaifuDiffusionTagger", "sort_tags"]
src/taggers/__pycache__/__init__.cpython-312.pyc ADDED
Binary file (339 Bytes).
src/taggers/__pycache__/order.cpython-312.pyc ADDED
Binary file (3.78 kB).
src/taggers/__pycache__/tagger.cpython-312.pyc ADDED
Binary file (12.2 kB).
src/taggers/filter.py ADDED
@@ -0,0 +1,113 @@
+ from typing import Dict, Tuple
+
+ TAG_GROUP = {
+     "frame": [
+         "portrait",
+         "upper_body",
+         "lower_body",
+         "cowboy_shot",
+         "feet_out_of_frame",
+         "full_body",
+         "wide_shot",
+         "very_wide_shot",
+     ],
+     "frame_2": [
+         "close-up",
+         "cut-in",
+         "cropped",
+     ],
+     "view_angle": [
+         "dutch_angle",
+         "from_above",
+         "from_behind",
+         "from_below",
+         "from_side",
+         # "multiple_views",
+         "sideways",
+         "straight-on",
+         "three_quarter_view",
+         "upside-down",
+     ],
+     "focus": ["eye_focus"],
+     "lip_action": [
+         "parted_lips",
+         "biting_own_lip",
+         "pursed_lips",
+         "spread_lips",
+         "open_mouth",
+         "closed_mouth",
+     ],
+     "eye": ["closed_eyes", "one_eye_closed"],
+     "gaze": [
+         "eye_contact",
+         "looking_afar",
+         "looking_around",
+         "looking_at_another",
+         "looking_at_hand",
+         "looking_at_hands",
+         "looking_at_mirror",
+         "looking_at_self",
+         "looking_at_viewer",
+         "looking_away",
+         "looking_back",
+         "looking_down",
+         "looking_outside",
+         "looking_over_eyewear",
+         "looking_through_own_legs",
+         "looking_to_the_side",
+         "looking_up",
+     ],
+     "emotion": [
+         "smile",
+         "angry",
+         "anger_vein",
+         "annoyed",
+         "clenched_teeth",
+         "scowl",
+         "blush",
+         "embarrassed",
+         "bored",
+         "confused",
+         "crazy",
+         "despair",
+         "disappointed",
+         "disgust",
+         "envy",
+         "excited",
+         "exhausted",
+         "expressionless",
+         "furrowed_brow",
+         "happy",
+         "sad",
+         "depressed",
+         "frown",
+         "tears",
+         "scared",
+         "serious",
+         "sleepy",
+         "surprised",
+         "thinking",
+         "pain",
+     ],
+ }
+
+
+ def parse_valid_tags(
+     input_tags: Dict[str, float], valid_tags=TAG_GROUP
+ ) -> Dict[str, Tuple[str, float]]:
+     """
+     Parses valid tags from the input tags based on predefined tag groups.
+     Args:
+         input_tags (Dict[str, float]): A dictionary of tags with their confidence scores, sorted by confidence in descending order.
+         valid_tags (dict, optional): A dictionary where keys are tag groups and values are lists of valid tags. Defaults to TAG_GROUP.
+     Returns:
+         dict: A dictionary mapping each tag group to the first valid (tag, confidence) pair found in the input tags.
+     """
+     output_tags = {}
+     for tag_group, tags in valid_tags.items():
+         for tag in tags:
+             key = tag.replace(" ", "_")
+             if key in input_tags:
+                 output_tags[tag_group] = (tag, input_tags[key])
+                 break  # parse only one tag from each tag group
+
+     return output_tags
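A small sketch of how parse_valid_tags behaves on tagger output; the scores here are made up:

from src.taggers.filter import parse_valid_tags

# Hypothetical tagger output: tag -> confidence, sorted descending.
scores = {"solo": 0.98, "upper_body": 0.91, "smile": 0.88, "looking_at_viewer": 0.85}
print(parse_valid_tags(scores))
# {'frame': ('upper_body', 0.91), 'gaze': ('looking_at_viewer', 0.85), 'emotion': ('smile', 0.88)}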
src/taggers/order.py ADDED
@@ -0,0 +1,85 @@
+ # Adapted from https://github.com/deepghs/imgutils/blob/main/imgutils/tagging/order.py
+ import random
+ import re
+ from typing import List, Literal, Mapping, Union
+
+
+ def sort_tags(
+     tags: Union[List[str], Mapping[str, float]], mode: Literal["original", "shuffle", "score"] = "score"
+ ) -> List[str]:
+     """
+     Sort the input list or mapping of tags by the specified mode.
+
+     Tags can represent people counts (e.g., '1girl', '2boys') and the 'solo' tag,
+     which are always placed first.
+
+     :param tags: List or mapping of tags to be sorted.
+     :type tags: Union[List[str], Mapping[str, float]]
+     :param mode: The mode for sorting the tags. Options: 'original' (original order),
+         'shuffle' (random shuffle), 'score' (sorted by score if available).
+     :type mode: Literal['original', 'shuffle', 'score']
+     :return: Sorted list of tags based on the specified mode.
+     :rtype: List[str]
+     :raises ValueError: If an unknown sort mode is provided.
+     :raises TypeError: If mode is 'score' and the input is a list
+         (as a plain list does not carry scores).
+
+     Examples:
+         Sorting tags in original order:
+
+         >>> from imgutils.tagging import sort_tags
+         >>>
+         >>> tags = ['1girls', 'solo', 'red_hair', 'cat ears']
+         >>> sort_tags(tags, mode='original')
+         ['solo', '1girls', 'red_hair', 'cat ears']
+         >>>
+         >>> tags = {'1girls': 0.9, 'solo': 0.95, 'red_hair': 1.0, 'cat_ears': 0.92}
+         >>> sort_tags(tags, mode='original')
+         ['solo', '1girls', 'red_hair', 'cat_ears']
+
+         Sorting tags by score (for a mapping of tags with scores):
+
+         >>> from imgutils.tagging import sort_tags
+         >>>
+         >>> tags = {'1girls': 0.9, 'solo': 0.95, 'red_hair': 1.0, 'cat_ears': 0.92}
+         >>> sort_tags(tags)
+         ['solo', '1girls', 'red_hair', 'cat_ears']
+
+         Shuffling tags (output is not deterministic):
+
+         >>> from imgutils.tagging import sort_tags
+         >>>
+         >>> tags = ['1girls', 'solo', 'red_hair', 'cat ears']
+         >>> sort_tags(tags, mode='shuffle')
+         ['solo', '1girls', 'red_hair', 'cat ears']
+         >>>
+         >>> tags = {'1girls': 0.9, 'solo': 0.95, 'red_hair': 1.0, 'cat_ears': 0.92}
+         >>> sort_tags(tags, mode='shuffle')
+         ['solo', '1girls', 'cat_ears', 'red_hair']
+     """
+     if mode not in {"original", "shuffle", "score"}:
+         raise ValueError(f"Unknown sort mode: 'original', 'shuffle' or 'score' expected but {mode!r} found.")
+     npeople_tags = []
+     remaining_tags = []
+
+     if "solo" in tags:
+         npeople_tags.append("solo")
+
+     for tag in tags:
+         if tag == "solo":
+             continue
+         if re.fullmatch(r"\d+\+?(boy|girl)s?", tag):  # 1girl, 1boy, 2girls, 3boys, 9+girls
+             npeople_tags.append(tag)
+         else:
+             remaining_tags.append(tag)
+
+     if mode == "score":
+         if isinstance(tags, dict):
+             remaining_tags = sorted(remaining_tags, key=lambda x: -tags[x])
+         else:
+             raise TypeError(f"Sort mode {mode!r} is not supported for lists, as they do not have scores.")
+     elif mode == "shuffle":
+         random.shuffle(remaining_tags)
+     else:
+         pass
+
+     return npeople_tags + remaining_tags
src/taggers/tagger.py ADDED
@@ -0,0 +1,215 @@
+ import logging
+ from dataclasses import dataclass
+ from pathlib import Path
+ from typing import Dict, List, Literal, Optional, Union
+
+ import numpy as np
+ import pandas as pd
+ import timm
+ import torch
+ from huggingface_hub import hf_hub_download
+ from huggingface_hub.utils import HfHubHTTPError
+ from PIL import Image
+ from timm.data import create_transform, resolve_data_config
+
+ logger = logging.getLogger(__name__)
+
+ MODEL_REPO_MAP = {
+     "vit": "SmilingWolf/wd-vit-tagger-v3",
+     "swinv2": "SmilingWolf/wd-swinv2-tagger-v3",
+     "convnext": "SmilingWolf/wd-convnext-tagger-v3",
+     "eva-02": "SmilingWolf/wd-eva02-large-tagger-v3",
+ }
+
+
+ @dataclass
+ class LabelData:
+     names: List[str]
+     rating: List[np.int64]
+     general: List[np.int64]
+     character: List[np.int64]
+
+
+ def pil_ensure_rgb(image: Image.Image) -> Image.Image:
+     # convert to RGB/RGBA if not already (deals with palette images etc.)
+     if image.mode not in ["RGB", "RGBA"]:
+         image = image.convert("RGBA") if "transparency" in image.info else image.convert("RGB")
+     # convert RGBA to RGB with white background
+     if image.mode == "RGBA":
+         canvas = Image.new("RGBA", image.size, (255, 255, 255))
+         canvas.alpha_composite(image)
+         image = canvas.convert("RGB")
+     return image
+
+
+ def pil_pad_square(image: Image.Image) -> Image.Image:
+     w, h = image.size
+     # get the largest dimension so we can pad to a square
+     px = max(image.size)
+     # pad to square with white background
+     canvas = Image.new("RGB", (px, px), (255, 255, 255))
+     canvas.paste(image, ((px - w) // 2, (px - h) // 2))
+     return canvas
+
+
+ class WaifuDiffusionTagger:
+     def __init__(
+         self,
+         model_name: Literal["vit", "swinv2", "convnext", "eva-02"] = "eva-02",
+         device: str = "cpu",
+     ):
+         if model_name not in MODEL_REPO_MAP.keys():
+             raise ValueError(f"Model {model_name} not found. Available models: {MODEL_REPO_MAP.keys()}")
+
+         repo_id = MODEL_REPO_MAP[model_name]
+
+         self.init_model(repo_id, device)
+         self.transform = create_transform(**resolve_data_config(self.model.pretrained_cfg, model=self.model))
+
+         self.labels = self.load_labels_from_hf(repo_id)
+
+     def init_model(self, repo_id: str, device: str = "cpu"):
+         logger.info(f"Loading tagging model from {repo_id}")
+         self.model = timm.create_model("hf-hub:" + repo_id, pretrained=True)
+
+         state_dict = timm.models.load_state_dict_from_hf(repo_id)
+         self.model.load_state_dict(state_dict)
+         self.model.to(device).eval()
+
+     def load_labels_from_hf(self, repo_id: str, revision: Optional[str] = None, token: Optional[str] = None):
+         try:
+             csv_path = hf_hub_download(repo_id, filename="selected_tags.csv", revision=revision, token=token)
+             csv_path = Path(csv_path).resolve()
+         except HfHubHTTPError as e:
+             raise FileNotFoundError(f"Failed to download labels from {repo_id}") from e
+
+         df = pd.read_csv(csv_path, usecols=["name", "category"])
+         tag_data = LabelData(
+             names=df["name"].tolist(),
+             rating=np.where(df["category"] == 9)[0],
+             general=np.where(df["category"] == 0)[0],
+             character=np.where(df["category"] == 4)[0],
+         )
+         return tag_data
+
+     def prepare_inputs(self, images: List[Image.Image]):
+         inputs = []
+         for image in images:
+             image = pil_ensure_rgb(image)
+             image = pil_pad_square(image)
+             inputs += [self.transform(image)]
+
+         inputs = torch.stack(inputs, dim=0)
+         inputs = inputs[:, [2, 1, 0]]  # RGB to BGR
+
+         return inputs.to(self.device, dtype=self.dtype)
+
+     def get_tags(self, probs: torch.Tensor, gen_threshold: float) -> List[Dict[str, float]]:
+         """
+         Generate tags based on prediction probabilities and a confidence threshold.
+
+         Args:
+             probs (torch.Tensor): A tensor of shape [B, num_labels] containing
+                 prediction probabilities for each label, where B is the batch size.
+             gen_threshold (float): The confidence threshold for selecting labels.
+                 Only labels with probabilities greater than this threshold will be included.
+
+         Returns:
+             List[Dict[str, float]]: A list of dictionaries, where each dictionary
+                 corresponds to a batch element and contains label names as keys and
+                 their associated probabilities as values. The labels are sorted in
+                 descending order of probability.
+         """
+         # probs: [B, num_labels]
+         gen_labels = []
+         for prob in probs:
+             # Convert indices+probs to labels
+             prob = list(zip(self.labels.names, prob.cpu().numpy()))
+
+             # General labels, pick any where prediction confidence > threshold
+             gen_label = [prob[i] for i in self.labels.general]
+             gen_label = dict([x for x in gen_label if x[1] > gen_threshold])
+             gen_label = dict(sorted(gen_label.items(), key=lambda item: item[1], reverse=True))
+
+             gen_labels += [gen_label]
+
+         return gen_labels
+
+     @torch.inference_mode()
+     def __call__(
+         self,
+         images: Union[Image.Image, List[Image.Image]],
+         threshold: float = 0.3,
+     ):
+         """
+         Processes input images through the model and returns predicted labels based on a threshold.
+
+         Args:
+             images (Union[Image.Image, List[Image.Image]]): A single image or a list of images to be processed.
+             threshold (float, optional): The threshold value for determining labels. Defaults to 0.3.
+
+         Returns:
+             List[Dict[str, float]]: A list of label-to-confidence mappings, one per input image.
+         """
+         if not isinstance(images, list):
+             images = [images]
+
+         inputs = self.prepare_inputs(images)
+
+         outputs = self.model(inputs)
+         outputs = torch.sigmoid(outputs)
+
+         labels = self.get_tags(outputs, threshold)
+         return labels
+
+     @torch.inference_mode()
+     def get_image_features(self, images: Union[Image.Image, List[Image.Image]], global_pool: bool = True):
+         """
+         Extracts features from one or more images using the model.
+
+         Args:
+             images (Union[Image.Image, List[Image.Image]]): A single PIL Image or a list of PIL Images
+                 from which features are to be extracted.
+             global_pool (bool, optional): If True, applies global pooling to the extracted features
+                 by averaging across all spatial dimensions. If False, returns only the features
+                 corresponding to the first token. Defaults to True.
+
+         Returns:
+             torch.Tensor: A tensor containing the extracted features. If `global_pool` is True,
+                 the features are averaged across spatial dimensions. Otherwise, the features
+                 corresponding to the first token are returned.
+         """
+         if not isinstance(images, list):
+             images = [images]
+
+         inputs = self.prepare_inputs(images)
+
+         features = self.model.forward_features(inputs)
+
+         if global_pool:
+             return features[:, self.model.num_prefix_tokens :].mean(dim=1)
+         else:
+             return features[:, 0]
+
+     @property
+     def device(self):
+         return next(self.model.parameters()).device
+
+     @property
+     def dtype(self):
+         return next(self.model.parameters()).dtype
+
+
+ def show_result_with_confidence(image, tag_result, ax):
+     """Render an image on a matplotlib axis with a table of tag confidences below it."""
+     ax.imshow(image)
+
+     confidence = [[x] for x in tag_result.values()]
+     rowLabels = list(tag_result.keys())
+     ax.table(
+         cellText=confidence,
+         loc="bottom",
+         rowLabels=rowLabels,
+         cellLoc="center",
+         colLabels=["Confidence"],
+     )
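A minimal tagging sketch; the threshold mirrors the defaults used elsewhere in the repo, and the image path is the bundled sample:

from PIL import Image

from src.taggers import WaifuDiffusionTagger, sort_tags

tagger = WaifuDiffusionTagger(model_name="eva-02", device="cpu")
image = Image.open("sample_img/sample_danbooru_dragonball.png")
(tag_scores,) = tagger(image, threshold=0.3)  # dict: tag -> confidence
print(sort_tags(tag_scores, mode="score")[:10])  # people tags first, then by confidence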
src/utils/__pycache__/device.cpython-312.pyc ADDED
Binary file (659 Bytes).
src/utils/__pycache__/timer.cpython-312.pyc ADDED
Binary file (2.46 kB).
src/utils/device.py ADDED
@@ -0,0 +1,18 @@
+ import torch
+
+
+ def determine_accelerator():
+     """
+     Determine the accelerator to be used based on the environment.
+     """
+
+     # Check for CUDA availability
+     if torch.cuda.is_available():
+         return "cuda"
+
+     # Check for MPS (Metal Performance Shaders) availability on macOS
+     if torch.backends.mps.is_available():
+         return "mps"
+
+     # Default to CPU if no accelerators are available
+     return "cpu"
src/utils/timer.py ADDED
@@ -0,0 +1,51 @@
+ import logging
+ import time
+
+
+ def msec2human(ms) -> str:
+     """
+     Converts milliseconds to a human-readable string representation.
+
+     Args:
+         ms (int): The input number of milliseconds.
+
+     Returns:
+         str: The formatted string representing the milliseconds in a human-readable format.
+     """
+     s = ms // 1000  # Calculate the number of seconds
+     m = s // 60  # Calculate the number of minutes
+     h = m // 60  # Calculate the number of hours
+
+     m %= 60  # Get the remaining minutes after calculating hours
+     s %= 60  # Get the remaining seconds after calculating minutes
+     ms %= 1000  # Get the remaining milliseconds after calculating seconds
+
+     if h:
+         return f"{h} hour {m:2d} min"  # Formatted string with hours and minutes
+     if m:
+         return f"{m} min {s:2d} sec"  # Formatted string with minutes and seconds
+     if s:
+         return f"{s} sec {ms:3d} msec"  # Formatted string with seconds and milliseconds
+     return f"{ms} msec"  # Formatted string with milliseconds only
+
+
+ class ElapsedTimer:
+     def __init__(self, name, logger=None):
+         self.name = name
+         self.logger = logger or logging.getLogger(__name__)
+
+     def __enter__(self):
+         self.start_time = time.perf_counter()
+         self.logger.info(f"<{self.name}>: start")
+         return self
+
+     def __exit__(self, exc_type, exc_val, exc_tb):
+         elapsed_time = time.perf_counter() - self.start_time
+         elapsed_time = msec2human(int(elapsed_time * 1000))
+
+         if exc_type:
+             self.logger.warning(f"<{self.name}> raised {exc_type}, {elapsed_time}")
+         else:
+             self.logger.info(f"<{self.name}>: {elapsed_time}")
src/wise_crop/__pycache__/detect_and_crop.cpython-312.pyc ADDED
Binary file (10.5 kB).
src/wise_crop/detect_and_crop.py ADDED
@@ -0,0 +1,84 @@
+ import cv2
+ import numpy as np
+ import torch
+ from imgutils.detect import detect_heads
+ from torchvision.transforms.v2 import ToPILImage
+ from transformers import pipeline
+
+ from src.utils.device import determine_accelerator
+
+ topil = ToPILImage()
+
+ # 1. Initialize the filtering pipeline
+ device = determine_accelerator()
+
+ print("Loading AI Model...")
+ pipe = pipeline(
+     "image-text-to-text",
+     model="google/gemma-3-12b-it",
+     device=device,
+     torch_dtype=torch.bfloat16,
+ )
+
+
+ def crop_and_mask_characters_gradio(pil_img):
+     """
+     Crops character regions from an image and returns their coordinates,
+     keeping only regions that contain most of a head or face according to
+     a head detector and the Gemma 3 12B instruction-tuned model.
+
+     Args:
+         pil_img (Image.Image): The input image.
+
+     Returns:
+         list[tuple[int, tuple[int, int, int, int]]]: (index, (x, y, w, h)) pairs
+         for each kept region.
+     """
+     img = np.array(pil_img)
+
+     # Convert the image to grayscale (arrays from PIL are RGB-ordered)
+     gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
+
+     # Apply thresholding to create a binary image
+     _, thresh = cv2.threshold(gray, 253, 255, cv2.THRESH_BINARY_INV)
+
+     # Find contours in the binary image
+     contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
+
+     coord_info_list = []
+     i = 0
+     # Iterate through the contours and crop the regions
+     for contour in contours:
+         # Get the bounding box of the contour
+         x, y, w, h = cv2.boundingRect(contour)
+         if w < 256 or h < 256:  # Skip small contours
+             continue
+
+         # Crop the region
+         cropped_img = img[y:y+h, x:x+w]
+
+         messages = [
+             {
+                 "role": "system",
+                 "content": [{"type": "text", "text": "You are a helpful assistant."}]
+             },
+             {
+                 "role": "user",
+                 "content": [
+                     {"type": "image", "image": topil(cropped_img)},
+                     {"type": "text", "text": "You are given a black-and-white line drawing as input. Please analyze the image carefully. If the drawing contains the majority of a head or face—meaning most key facial features or the overall shape of the head are visible—respond with 'True'. Otherwise, respond with 'False'. Do not contain any punctuation or extra spaces in your answer. Just respond with 'True' or 'False'"}
+                 ]
+             }
+         ]
+         # Cheap pre-filter: skip regions where the head detector finds nothing
+         result = detect_heads(topil(cropped_img))
+         if len(result) == 0:
+             continue
+
+         # Ask the VLM whether the crop contains the majority of a head or face
+         output = pipe(text=messages, max_new_tokens=200)
+         if output[0]["generated_text"][-1]["content"].strip() == "False":
+             continue
+         i += 1
+         # Append the coordinates to the list
+         coord_info_list.append((i, (x, y, w, h)))
+     return coord_info_list
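A hedged usage sketch; importing the module loads the Gemma 3 pipeline, and the page path is the bundled sample, used here purely for illustration:

from PIL import Image

from src.wise_crop.detect_and_crop import crop_and_mask_characters_gradio

page = Image.open("sample_img/sample_danbooru_dragonball.png").convert("RGB")
for index, (x, y, w, h) in crop_and_mask_characters_gradio(page):
    page.crop((x, y, x + w, y + h)).save(f"character_{index:02d}.png")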