Marcin Kluczek's picture
6 31

Marcin Kluczek

mkluczek

AI & ML interests

Earth Observation (EO), Remote sensing

Recent Activity

published a dataset 5 days ago
Major-TOM/Core-S2L1C-DeCUR
updated a dataset 5 days ago
Major-TOM/Core-S2L1C-DeCUR
View all activity

Organizations

ONNXConfig for all's profile picture Gradio-Themes-Party's profile picture scikit-learn's profile picture Open-Source AI Meetup's profile picture Multi🤖Transformers's profile picture Major TOM's profile picture MLX Community's profile picture CloudFerro S.A.'s profile picture Hugging Face Discord Community's profile picture

Posts 2

view post
Post
230
Expansion of Global and Dense Open Embeddings Dataset of Earth 🌍

We updated our previous embeddings release with three models MMEarth and DeCUR-S2, DeCUR-S1 of the Major TOM embeddings dataset, developed in collaboration with CloudFerro S.A. asterisk labs and Φ-lab, European Space Agency - ESA. Together with @mikonvergence , Jędrzej S. Bojanowski, we extend the open-access collection of open dataset of Copernicus embeddings built at global scale, providing dense coverage across the entire acquisition area of Sentinel-1 and Sentinel-2 sensors.

Total embedding resources after the update:
- 51 TB of AI-embeddings generated from processed Sentinel data,
- over 40 billion embedding vectors,
- processing of 147 TB of raw satellite data,
- analysis covering more than 15 million Sentinel-1 and Sentinel-2 scenes and more than 16 trillion pixels.

This project delivers open and free vectorized expansions of Major TOM datasets available on CREODIAS and Hugging Face, setting a new standard for embedding releases and enabling lightweight, scalable ingestion of Earth Observation (EO) data for countless applications.

Datasets:
Major-TOM/Core-S2L2A-MMEarth
Major-TOM/Core-S2L1C-DeCUR
Major-TOM/Core-S1RTC-DeCUR


#EarthObservation #AI #CloudFerro #asterisklabs #ESA
view post
Post
1770
First Global and Dense Open Embedding Dataset of Earth! 🌍 🤗

Introducing the Major TOM embeddings dataset, created in collaboration with CloudFerro S.A. 🔶 and Φ-lab at the European Space Agency (ESA) 🛰️. Together with @mikonvergence and Jędrzej S. Bojanowski, we present the first open-access dataset of Copernicus embeddings, offering dense, global coverage across the full acquisition areas of Sentinel-1 and Sentinel-2 sensors.

💡 Highlights:
📊 Data: Over 8 million Sentinel-1 & Sentinel-2 images processed, distilling insights from 9.368 trillion pixels of raw data.
🧠 Models: Foundation models include SigLIP, DINOv2, and SSL4EO.
📦 Scale: 62 TB of raw satellite data processed into 170M+ embeddings.

This project delivers open and free vectorized expansions of Major-TOM/README datasets, setting a new standard for embedding releases and enabling lightweight, scalable ingestion of Earth Observation (EO) data for countless applications.

🤗 Explore the datasets:
Major-TOM/Core-S2L1C-SSL4EO
Major-TOM/Core-S1RTC-SSL4EO
Major-TOM/Core-S2RGB-DINOv2
Major-TOM/Core-S2RGB-SigLIP

📖 Check paper: Global and Dense Embeddings of Earth: Major TOM Floating in the Latent Space (2412.05600)
💻 Code notebook: https://github.com/ESA-PhiLab/Major-TOM/blob/main/05-Generate-Major-TOM-Embeddings.ipynb

models 0

None public yet

datasets 0

None public yet