Papers
arxiv:2412.16473

"ScatSpotter" 2024 -- A Distributed Dog Poop Detection Dataset

Published on Dec 21, 2024
Authors:

Abstract

We introduce a new -- currently 42 gigabyte -- ``living'' dataset of phone images of dog feces, annotated with manually drawn or AI-assisted polygon labels. There are 6k full resolution images and 4k detailed polygon annotations. The collection and annotation of images started in late 2020 and the dataset grows by roughly 1GB a month. We train VIT and MaskRCNN baseline models to explore the difficulty of the dataset. The best model achieves a pixelwise average precision of 0.858 on a 691-image validation set and 0.847 on a small independently captured 30-image contributor test set. The most recent snapshot of dataset is made publicly available through three different distribution methods: one centralized (Girder) and two decentralized (IPFS and BitTorrent). We study of the trade-offs between distribution methods and discuss the feasibility of each with respect to reliably sharing open scientific data. The code to reproduce the experiments is hosted on GitHub, and the data is published under the Creative Commons Attribution 4.0 International license. Model weights are made publicly available with the dataset. Experimental hardware, time, energy, and emissions are quantified.

Community

Your need to confirm your account before you can post a new comment.

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2412.16473 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2412.16473 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.