How to access EODATA using boto3 on CODE-DE

In this article you will learn how to access the Earth observation data repository

  • using a Python library called boto3,

  • running on a Linux or Windows virtual machine

  • within the EO-Lab cloud.

What Are We Going To Cover

  • The S3 protocol

  • Installing boto3

  • How to execute scripts found in this article

  • Browsing Earth observation data

  • Downloading a single file from Earth observation data repository

Prerequisites

No. 1 Account

You need an EO-Lab hosting account with access to the Horizon interface: https://cloud.fra1-1.cloudferro.com/auth/login/?next=/.

No. 2 A virtual machine

You need a virtual machine running on EO-Lab cloud. This article is written for Ubuntu 22.04 and for Windows Server 2022.

You can create a Linux virtual machine by following one of these articles:

Other operating systems might also work, but they are outside the scope of this article and the commands provided here might need to be adjusted.

No. 3 Python

You need Python installed on your virtual machine.

If you are using Linux, it is likely that Python is already installed. To verify, execute:

which python3

If the output contains the path to Python like /usr/bin/python3, you should be good to go.

If you want to install Python, or virtualenvwrapper which allows you to create an environment with its own set of packages, see How to install Python virtualenv or virtualenvwrapper on EO-Lab

On Windows, you can follow this article: How to install Python in Windows on EO-Lab

No. 4 Basic knowledge about Python

boto3 is a Python library so you have to know your way around Python.

The S3 protocol

The Earth observation data repository contains satellite products and is available as an object storage container that follows the S3 standard. The repository is read-only: you can access the data, but you cannot rearrange, delete, or rename them.

Apart from that, the EO-Lab cloud allows its users to create object storage containers to store their own files. These containers use the same S3 standard as the Earth observation data repository. Users can, however, modify data within the containers they created.

The EO-Lab cloud enables you to access both services through S3-compliant software, such as the boto3 library used in this article. To access data over S3, you must have an access key and a secret key. Because both services use the same standard, they require the same type of credentials, but each service uses its own key pair. In Tenant Manager, your dashboard for EO-Lab, there are two options:

S3 Credentials

to access object storage containers (not covered in this article)

S3 VM Credentials

to access Earth observation data repository

The credentials in those two options are separate and not interchangeable. Even if you already have a key pair used to access object storage containers, to access the Earth observation data repository you will still need to obtain a different key pair.

Obtaining the access and secret keys

On EO-Lab cloud, S3 VM Credentials are created automatically, approximately one minute after the basic VM has been created. You should be able to see their presence in Tenant Manager:

../_images/typical_s3_vm_credentials.png

However, to obtain the actual keys, read the file /etc/passwd-s3fs on your virtual machine. Viewing the contents of that file requires sudo privileges:

sudo cat /etc/passwd-s3fs

The output of this command should contain your access key and your secret key, separated by a colon. For example, if 1234 is your access key and 4321 is your secret key, the contents of that file will look like this:

1234:4321
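As an illustration, the following minimal sketch splits such a line into its two parts. The function name parse_s3fs_line is hypothetical, and the sketch assumes the two-field format shown above, with no colon inside the access key.

```python
def parse_s3fs_line(line):
    """Split one 'ACCESS_KEY:SECRET_KEY' line into its two parts."""
    # Split on the first colon only (assumes the access key contains no colon).
    access_key, secret_key = line.strip().split(":", 1)
    return access_key, secret_key


# Example with the placeholder values used above.
access_key, secret_key = parse_s3fs_line("1234:4321\n")
print(access_key)  # 1234
print(secret_key)  # 4321
```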

The code examples in this article contain lines like these:

import boto3

access_key='YOUR_ACCESS_KEY'
secret_key='YOUR_SECRET_KEY'

Be sure to replace YOUR_ACCESS_KEY and YOUR_SECRET_KEY with the proper values from file /etc/passwd-s3fs.

Installing boto3

Install boto3 using one of the following methods.

If you are using a Python environment such as virtualenv, activate the environment in which you wish to install boto3. In it, execute the following command:

pip3 install boto3

On Ubuntu, you can also install the package globally:

sudo apt install python3-boto3
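To check that the installation succeeded, you can run a short script like the sketch below with the same interpreter you will use for the other scripts. The helper name boto3_available is hypothetical; it relies only on the standard library to look for the package.

```python
import importlib.util


def boto3_available():
    """Return True if the boto3 package can be found by this interpreter."""
    return importlib.util.find_spec("boto3") is not None


if boto3_available():
    import boto3
    print("boto3 version:", boto3.__version__)
else:
    print("boto3 is not installed for this Python interpreter")
```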

How to execute scripts found in this article

To execute a script from this article, copy the relevant code into a file using a text editor of your choice and edit the variables as needed. Save the file, navigate to the directory where it is located, and execute it:

Editor: nano or vim

File extension: .py

Environment to execute with: Terminal

Command to execute:

python3 browse.py

Browsing Earth observation data

Here is how to use boto3 to browse the Earth observation data repository. Create a file called eodictionary.py and enter the following code:

eodictionary.py

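import boto3

# Replace with the values from /etc/passwd-s3fs
access_key = 'YOUR_ACCESS_KEY'
secret_key = 'YOUR_SECRET_KEY'

# Directory to list; it must end with a slash
directory = 'Sentinel-5P/TROPOMI/L1B/2018/05/30/'

host = 'http://data.fra1-1.cloudferro.com'
container = 'CODEDE'

s3 = boto3.client(
    's3',
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    endpoint_url=host,
)

# With Delimiter='/', subdirectories are returned under 'CommonPrefixes'
response = s3.list_objects(Delimiter='/', Bucket=container, Prefix=directory, MaxKeys=30000)
for i in response.get('CommonPrefixes', []):
    print(i['Prefix'])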

Explanation of variables

Variable directory denotes the directory within the data repository that you want to explore. When defining it, obey the following rules:

  • Use slashes / as separators between directories

  • Do not start the path with a slash /

  • Since the element you are exploring is a directory, finish the path with a slash /

  • Start the path with the name of a folder found within the root directory of the Earth observation data repository (for example Sentinel-2 or Sentinel-5P)

For the root directory of the Earth observation data repository, assign an empty string:

directory=''
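The rules above can be sketched as a small helper that turns a loosely written path into a valid value for directory. The function name normalize_prefix is hypothetical and not part of boto3.

```python
def normalize_prefix(path):
    """Return path as a directory prefix that follows the rules above."""
    # Drop surrounding whitespace and any leading or trailing slashes.
    cleaned = path.strip().strip("/")
    # The root directory is represented by an empty string.
    if not cleaned:
        return ""
    # A directory prefix must end with exactly one slash.
    return cleaned + "/"


print(normalize_prefix("/Sentinel-5P/TROPOMI/L1B/2018/05/30"))
# Sentinel-5P/TROPOMI/L1B/2018/05/30/
```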

Variables host and container contain the Earth observation data endpoint and the name of the container used, respectively. You do not need to modify them.

Execute with

python3 eodictionary.py

This code will list the products found in the Sentinel-5P/TROPOMI/L1B/2018/05/30 directory. Each path is printed on its own line:

../_images/eodictionary_code-de.png

There are 67 lines of output in this case; the image above shows only part of it.
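For larger directories, a single list_objects call may not return every entry: on AWS S3 one response holds at most 1000 entries regardless of MaxKeys, and this article does not state whether the Earth observation data endpoint behaves the same way. The sketch below uses a boto3 paginator to collect the results page by page. The function name list_subdirectories is hypothetical, and the code assumes the endpoint supports paginated listing.

```python
def list_subdirectories(client, bucket, prefix):
    """Collect all subdirectory prefixes directly under prefix, page by page."""
    paginator = client.get_paginator("list_objects")
    found = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        # A page without subdirectories has no 'CommonPrefixes' key at all.
        for entry in page.get("CommonPrefixes", []):
            found.append(entry["Prefix"])
    return found


# Usage, assuming s3, container and directory are defined as in eodictionary.py:
# for path in list_subdirectories(s3, container, directory):
#     print(path)
```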

Downloading a single file from Earth observation data repository

The script downloadfile.py downloads a file to the directory from which the script is executed. If a file of that name already exists there, it will be overwritten without a prompt for confirmation.

downloadfile.py

import boto3

# Replace with the values from /etc/passwd-s3fs
access_key = 'YOUR_ACCESS_KEY'
secret_key = 'YOUR_SECRET_KEY'
key = 'Sentinel-5P/TROPOMI/L1B/2018/05/30/S5P_RPRO_L1B_RA_BD4_20180530T005923_20180530T024053_03244_03_020100_20220701T194150/S5P_RPRO_L1B_RA_BD4_20180530T005923_20180530T024053_03244_03_020100_20220701T194150.cdl'
host = 'http://data.fra1-1.cloudferro.com'
container = 'CODEDE'

s3 = boto3.resource(
    's3',
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    endpoint_url=host,
)

bucket = s3.Bucket(container)

# Save the file under its own name (the last element of the key)
filename = key.split('/')[-1]

bucket.download_file(key, filename)

Explanation of variables

Variable key is the full path (including folders) of the file you want to download from the Earth observation data repository. When defining it, obey the following rules:

  • Use slashes / as separators between directories and files

  • Do not start or finish the path with a slash /

  • Start path with the name of the folder found within the root directory of the Earth observation data repository (for example Sentinel-2 or Sentinel-5P)

Again, variables host and container contain the Earth observation data endpoint and the name of the container being used, respectively. You do not need to modify them.
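Because downloadfile.py overwrites an existing file without asking, you may want to check the target first. The sketch below applies the rules for key and refuses to return a path that already exists. The function name local_target is hypothetical and not part of boto3.

```python
import os


def local_target(key, directory="."):
    """Return the local path for key, refusing to overwrite an existing file."""
    if key.startswith("/") or key.endswith("/"):
        raise ValueError("key must not start or finish with a slash")
    # Same rule as in downloadfile.py: the file name is the last path element.
    target = os.path.join(directory, key.split("/")[-1])
    if os.path.exists(target):
        raise FileExistsError(target + " already exists")
    return target


# Usage, assuming bucket and key are defined as in downloadfile.py:
# bucket.download_file(key, local_target(key))
```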

Execute the code with

python3 downloadfile.py

The following file should be downloaded:

S5P_RPRO_L1B_RA_BD4_20180530T005923_20180530T024053_03244_03_020100_20220701T194150.cdl

This file is located within the root directory of the product

S5P_RPRO_L1B_RA_BD4_20180530T005923_20180530T024053_03244_03_020100_20220701T194150

The script should produce no output. The downloaded file should nevertheless be visible in the directory from which the script was executed. For example, this is what it will look like on Linux:

../_images/downloadedfile_code-de.png

What To Do Next

boto3 can also be used to access object storage containers from EO-Lab cloud: /eodata/How-to-access-object-storage-from-EO-Lab-using-boto3