Commit 726b00a
Parent(s): c6d1dbd

Update ReadMe

Files changed:
- README.md (+39 -28)
- assets/Logo IBM.jpg (added)
- assets/Logo IBM.png (added)
- assets/Logo_FZ_Juelich.jpg (added)
- assets/NASA_Worm_logo.png (added)
- assets/modal_architecture.jpg (added)

README.md CHANGED
@@ -2,56 +2,67 @@
Before:

license: apache-2.0
---

Prithvi is a first-of-its-kind temporal Vision transformer pre-trained by the IBM and NASA team on contiguous US Harmonised Landsat Sentinel 2 (HLS) data. The model adopts a self-supervised encoder developed with a ViT architecture and Masked AutoEncoder (MAE) learning strategy, with an MSE loss function. The model includes spatial attention across multiple patches and also temporal attention for each patch.

The model was pre-trained with NASA's HLS V2 L30 product (30m granularity) from the contiguous United States. The bands that were used are the following:

1. Blue
2. Green
3. Red
4. Narrow NIR
5. SWIR 1
6. SWIR 2

The model follows the [original MAE repo](https://github.com/facebookresearch/mae) with some modifications including:

1. replace 2D patch embed with 3D patch embed;
2. replace 2D positional embed with 3D positional embed;
3. replace 2D patchify and unpatchify with 3D;
4. adding infrared bands besides RGB.

```
python …
```

Examples of finetuning the model for image segmentation using the mmsegmentation library are available through Hugging Face (e.g. [burn scars segmentation](https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M-burn-scar), [flood mapping](https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M-sen1floods11), and [multi temporal crop classification](https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M-multi-temporal-crop-classification)), with the code used for the experiments available on [github](https://github.com/NASA-IMPACT/hls-foundation-os/tree/main/fine-tuning-examples). This also contains instructions to finetune the model for flood detection on the popular open access [sen1floods11 dataset](https://github.com/cloudtostreet/Sen1Floods11).

### Feedback

Your feedback is invaluable to us. If you have any feedback about the model, please feel free to share it with us. You can do this by …

### Citation

If this model helped your research, please cite `Prithvi-…`

```
@article{Prithvi-2-preprint,
  author = {},
  title = {{Title}},
  journal = {…},
  year = {2024}
}
```
After:

license: apache-2.0
---

# Prithvi-EO-2.0

Prithvi-EO-2.0 is the second-generation Earth observation (EO) foundation model jointly developed by IBM, NASA, and the Jülich Supercomputing Centre (JSC).

## Architecture Overview

Prithvi-EO-2.0 is based on the ViT architecture, pre-trained using a masked autoencoder (MAE) approach, with two major modifications, as shown in the figure below. First, we introduce a random dropout mechanism that completely removes individual bands before the patch embedding, with the aim of making the model more robust to missing data. Second, we modify the architecture to support inputs with temporal and multi-spectral characteristics.

![model architecture](assets/model_architecture.png)

Our main modifications to the ViT architecture are a 3D positional embedding and a 3D patch embedding, which are required to handle spatiotemporal data. We also process metadata about the geolocation (e.g. latitude and longitude) and date (i.e. year and day-of-year, ranging from 1 to 365) of each input. This is done by computing biases via 2D sine-cosine positional encoding and adding them to the 3D positional embeddings and 3D patch embeddings via a weighted sum whose weight is a parameter learned during pre-training. Since this metadata is often unavailable, we pre-trained Prithvi-EO-2.0 with a metadata dropout so that the model also works when it is absent.
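As a rough sketch of how such metadata biases can be computed and merged (illustrative only: `sincos_encode`, `location_bias`, and the scalar `weight` are hypothetical names, not the released API):

```python
import math
import torch

def sincos_encode(values: torch.Tensor, dim: int) -> torch.Tensor:
    """Sine-cosine encoding of a scalar metadata value (e.g. latitude or day-of-year)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    angles = values[:, None] * freqs[None, :]                          # (batch, half)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)  # (batch, dim)

def location_bias(lat: torch.Tensor, lon: torch.Tensor, dim: int) -> torch.Tensor:
    """2D encoding: half the embedding dim for latitude, half for longitude."""
    return torch.cat([sincos_encode(lat, dim // 2), sincos_encode(lon, dim // 2)], dim=-1)

embed_dim = 1024
weight = torch.nn.Parameter(torch.tensor(0.5))  # weighted-sum weight, learned in pre-training

# tokens: (batch, num_tokens, embed_dim) after the 3D patch and positional embeddings
tokens = torch.zeros(2, 196, embed_dim)
lat = torch.tensor([40.7, 34.0])
lon = torch.tensor([-74.0, -118.2])
tokens = tokens + weight * location_bias(lat, lon, embed_dim)[:, None, :]
# A date bias from (year, day-of-year) would be built the same way.
```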

## Pre-trained Models

| Model | Details | Weights |
| ------------- | ------------- | ------------- |
| Prithvi-EO-2.0-300M | Pretrained 300M parameter model | [https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-300M](https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-300M) |
| Prithvi-EO-2.0-300M-TL | Pretrained 300M parameter model with temporal and location embeddings | [https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL](https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL) |
| Prithvi-EO-2.0-600M | Pretrained 600M parameter model | [https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M](https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M) |
| Prithvi-EO-2.0-600M-TL | Pretrained 600M parameter model with temporal and location embeddings | [https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M-TL](https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M-TL) |

The models are pre-trained with NASA's HLS V2 product (30m granularity) using 4.2M samples with six bands in the following order: Blue, Green, Red, Narrow NIR, SWIR 1, SWIR 2.

## Demo and inference

We provide a **demo** running Prithvi-EO-2.0-300M-TL [here](https://huggingface.co/spaces/ibm-nasa-geospatial/Prithvi-EO-2.0-Demo).

There is also an inference script (`inference.py`) that lets you run image reconstruction on a set of HLS images assumed to be from the same location at different timestamps (see the example below). These should be provided in chronological order in GeoTIFF format, including the channels described above (Blue, Green, Red, Narrow NIR, SWIR 1, SWIR 2) in reflectance units.

```
python inference.py --data_files t1.tif t2.tif t3.tif t4.tif --output_dir output/ --input_indices <space separated 0-based indices of channels to select from input>
```
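For instance, if each GeoTIFF contains exactly the six bands above in that order, selecting all of them would look like this (the file names follow the example above; the index list is purely illustrative):

```
python inference.py --data_files t1.tif t2.tif t3.tif t4.tif --output_dir output/ --input_indices 0 1 2 3 4 5
```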

## Finetuning

You can finetune the model using [TerraTorch](https://github.com/IBM/terratorch).
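As a minimal sketch of what a TerraTorch-based finetuning task can look like (assumed names that should be checked against the current TerraTorch documentation: `SemanticSegmentationTask`, `EncoderDecoderFactory`, the `prithvi_eo_v2_300` backbone identifier, and `FCNDecoder`):

```python
from terratorch.tasks import SemanticSegmentationTask

# Hypothetical minimal finetuning task; datamodule and Trainer setup are omitted.
task = SemanticSegmentationTask(
    model_factory="EncoderDecoderFactory",
    model_args={
        "backbone": "prithvi_eo_v2_300",  # assumed registry name for the 300M model
        "backbone_pretrained": True,      # load the pre-trained Prithvi weights
        "decoder": "FCNDecoder",
        "num_classes": 2,                 # e.g. burn scar vs. background
    },
    loss="ce",
)
# `task` is a LightningModule and can be trained with lightning.pytorch.Trainer
# on a datamodule that yields the six HLS bands described above.
```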

### Feedback

Your feedback is invaluable to us. If you have any feedback about the model, please feel free to share it with us. You can do this by starting a discussion in this HF repository or submitting an issue to [TerraTorch](https://github.com/IBM/terratorch) on GitHub.

### Citation

If this model helped your research, please cite `Prithvi-EO-2.0` in your publications. Here is a BibTeX entry as an example:

```
@article{Prithvi-EO-2-preprint,
  author = {},
  title = {{Title}},
  journal = {arxiv},
  year = {2024}
}
```

### Partners

<p align="center" float="left">
  <img src="/assets/Logo IBM.png" height="50" />
  <img src="/assets/NASA_Worm_logo.png" height="50" />
  <img src="/assets/Logo_FZ_Juelich.jpg" height="50" />
</p>
assets/Logo IBM.jpg ADDED
assets/Logo IBM.png ADDED
assets/Logo_FZ_Juelich.jpg ADDED
assets/NASA_Worm_logo.png ADDED
assets/modal_architecture.jpg ADDED