How to access EODATA using boto3 on EO-Lab
In this article, you will learn how to access the Earth observation data repository using the Python library boto3, running on a Linux or Windows virtual machine within EO-Lab cloud.
What Are We Going To Cover
The S3 protocol
Installing boto3
How to execute scripts found in this article
Browsing Earth observation data
Downloading a single file from the Earth observation data repository
Prerequisites
No. 1 Account
You need an EO-Lab hosting account with access to the Horizon interface: https://cloud.fra1-1.cloudferro.com/auth/login/?next=/.
No. 2 A virtual machine
You need a virtual machine running on EO-Lab cloud. This article is written for Ubuntu 22.04 and for Windows Server 2022.
You can create a Linux virtual machine by following one of these articles:
Other operating systems might also work, but they are outside the scope of this article and might require adjusting the commands provided here.
No. 3 Python
You need Python installed on your virtual machine.
If you are using Linux, it is likely that Python is already installed. To verify, execute:
which python3
If the output contains the path to Python like /usr/bin/python3, you should be good to go.
If you want to install Python, or virtualenvwrapper (which allows you to create an environment with its own set of packages), see How to install Python virtualenv or virtualenvwrapper on EO-Lab.
On Windows, follow this article: How to install Python in Windows on EO-Lab
No. 4 Basic knowledge about Python
boto3 is a Python library, so you need a working knowledge of Python.
The S3 protocol
The Earth observation data repository contains satellite products and is available as an object storage container compatible with the S3 standard. The repository is read-only: you can access the data, but you cannot rearrange, delete, or rename them.
Apart from that, EO-Lab cloud allows its users to create object storage containers to store their own files. These containers use the same S3 standard as the Earth observation data repository; however, users can modify the data within containers they created.
EO-Lab cloud enables you to access both services through S3-compliant software, such as the boto3 library used in this article. To access data over S3, you need an access key and a secret key. Because both services use the same standard, they require the same type of credentials; however, each service uses its own key pair. In Tenant Manager, your dashboard for EO-Lab, there are two options:
- S3 Credentials: to access object storage containers (not covered in this article)
- S3 VM Credentials: to access the Earth observation data repository
The credentials in those two options are separate and not interchangeable. Even if you already have a key pair used to access object storage containers, to access the Earth observation data repository you will still need to obtain a different key pair.
Obtaining the access and secret keys
On EO-Lab cloud, S3 VM Credentials are created automatically, approximately one minute after the virtual machine has been created. You should be able to see them listed in Tenant Manager.
However, to retrieve the keys themselves, read the file /etc/passwd-s3fs on your virtual machine. Viewing that file requires sudo privileges:
sudo cat /etc/passwd-s3fs
The output of this command should give you your access key and your secret key, separated by a colon. For example, if 1234 is your access key and 4321 is your secret key, the file will look like this:
1234:4321
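Instead of copying the keys by hand, a script could read them from that file. The following is a minimal sketch, assuming the file contains a single ACCESS_KEY:SECRET_KEY line as in the example above; the function name read_s3fs_credentials is hypothetical and not part of boto3 or s3fs.

```python
from pathlib import Path


def read_s3fs_credentials(path="/etc/passwd-s3fs"):
    """Return (access_key, secret_key) read from an s3fs password file.

    Assumes the ACCESS_KEY:SECRET_KEY format shown above, on the first
    non-empty line of the file.
    """
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only: access key before it, secret key after it.
        access_key, sep, secret_key = line.partition(":")
        if not sep:
            raise ValueError(f"No colon separator found in {path}")
        return access_key, secret_key
    raise ValueError(f"No credentials found in {path}")
```

Because viewing /etc/passwd-s3fs requires sudo privileges, a script calling read_s3fs_credentials() would have to be run with them (or the keys copied to a file your user can read). The returned values can then be passed as aws_access_key_id and aws_secret_access_key.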
The code examples in this article contain lines like these:
import boto3
access_key='YOUR_ACCESS_KEY'
secret_key='YOUR_SECRET_KEY'
Be sure to replace YOUR_ACCESS_KEY and YOUR_SECRET_KEY with the actual values from the file /etc/passwd-s3fs.
Installing boto3
Follow the procedure for your operating system to install boto3:
Installing boto3 on Linux
If you are using a Python environment such as virtualenv, first activate the environment in which you wish to install boto3. Then execute the following command:
pip3 install boto3
Alternatively, you can install the package globally from the Ubuntu repositories:
sudo apt install python3-boto3
Installing boto3 on Windows
Follow this article to install boto3 on Windows: How to Install Boto3 in Windows on EO-Lab
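Whichever method you used, it can help to confirm that boto3 is importable from the interpreter (and environment, if any) that will run the scripts. A small sketch using the standard library's importlib and boto3's __version__ attribute; the function name boto3_version is hypothetical:

```python
import importlib.util


def boto3_version():
    """Return the installed boto3 version string, or None if boto3 is missing."""
    if importlib.util.find_spec("boto3") is None:
        return None
    import boto3  # imported only after confirming the package can be found
    return boto3.__version__


version = boto3_version()
print(f"boto3 {version} is installed" if version else "boto3 is not installed")
```

Run it with the same command you will use for the other scripts (for example python3 on Linux), so that the check reflects the same interpreter.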
How to execute scripts found in this article
To execute the scripts from this article, copy the relevant code into a file using a text editor of your choice, then edit the variables as needed. Save the file, navigate to the directory where the .py file is located, and execute it:
Executing script on Linux
Editor: nano or vim
File extension: .py
Environment to execute with: Terminal
Command to execute:
python3 browse.py
Executing script on Windows
Editor: Notepad
File extension: .py (be careful: Notepad often appends the .txt extension to file names)
Environment to execute with: command prompt, cmd.exe.
Command to execute:
python3 browse.py
Browsing Earth observation data
Here is how to use boto3 to browse the Earth observation data repository. Create a file named eodictionary.py and enter the following code:
eodictionary.py
import boto3
access_key='YOUR_ACCESS_KEY'
secret_key='YOUR_SECRET_KEY'
directory='Sentinel-5P/TROPOMI/L1B/2018/05/30/'
host='http://data.fra1-1.cloudferro.com'
container='CODEDE'
s3=boto3.client('s3',aws_access_key_id=access_key, aws_secret_access_key=secret_key,endpoint_url=host)
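response=s3.list_objects(Delimiter='/',Bucket=container,Prefix=directory,MaxKeys=30000)
# Print each "subdirectory" (common prefix); .get() avoids a KeyError if there are none
for i in response.get('CommonPrefixes', []):
    print(i['Prefix'])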
Explanation of variables
Variable directory denotes the directory within the data repository that you want to explore. When defining it, obey the following rules:
Use slashes / as separators between directories
Do not start the path with a slash /
Since the element you are exploring is a directory, finish the path with a slash /
Start the path with a folder name found within the root directory of the Earth observation data repository (for example Sentinel-2 or Sentinel-5P)
To browse the root directory of the Earth observation data repository, assign an empty string:
directory=''
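The rules above are easy to break when typing a path by hand. A hypothetical helper such as normalize_directory, sketched below, could enforce them before the value is used as Prefix. It assumes paths never contain empty components (so repeated slashes are collapsed), and it also converts Windows-style backslashes, which is an extra convenience rather than one of the rules above.

```python
def normalize_directory(path):
    """Turn a user-supplied directory path into a prefix that follows the rules above.

    - forward slashes as separators (backslashes are converted, an assumption)
    - no leading slash
    - exactly one trailing slash, except for the root, which becomes ''
    """
    # Drop empty components, which removes leading, trailing and doubled slashes.
    parts = [p for p in path.replace("\\", "/").split("/") if p]
    return "/".join(parts) + "/" if parts else ""
```

For example, normalize_directory('/Sentinel-5P/TROPOMI/L1B/2018/05/30') returns 'Sentinel-5P/TROPOMI/L1B/2018/05/30/', and normalize_directory('/') returns the empty string used for the root directory.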
Variables host and container contain the Earth observation data endpoint and the name of the container used, respectively. You do not need to modify them.
Execute with
python3 eodictionary.py
This code lists the products found in the Sentinel-5P/TROPOMI/L1B/2018/05/30 directory, printing each path on its own line.
In this case, the output contains 67 lines.
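eodictionary.py makes a single list_objects call and prints only CommonPrefixes, the "directories". Amazon S3 returns at most 1,000 keys per response regardless of MaxKeys, and S3-compatible servers may apply a similar limit, so very large directories may need paginated listing. The sketch below uses boto3's paginator for list_objects and also collects object keys (files) from Contents. It assumes the EODATA endpoint honours standard S3 pagination; the function names make_eodata_client and list_directory are hypothetical.

```python
def make_eodata_client(access_key, secret_key, host="http://data.fra1-1.cloudferro.com"):
    """Create an S3 client for the Earth observation data endpoint used in this article."""
    import boto3  # imported here so list_directory can be used with any client object
    return boto3.client(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        endpoint_url=host,
    )


def list_directory(s3, container, directory):
    """Return (subdirectories, files) found directly under `directory`.

    The list_objects paginator follows the listing across response pages,
    so results are not truncated at the per-request limit.
    """
    subdirectories, files = [], []
    paginator = s3.get_paginator("list_objects")
    for page in paginator.paginate(Bucket=container, Prefix=directory, Delimiter="/"):
        # 'CommonPrefixes' holds "directories", 'Contents' holds objects (files);
        # either key may be missing from a page, hence .get(..., []).
        subdirectories.extend(p["Prefix"] for p in page.get("CommonPrefixes", []))
        files.extend(obj["Key"] for obj in page.get("Contents", []))
    return subdirectories, files


# Example usage (requires valid credentials and network access):
# s3 = make_eodata_client('YOUR_ACCESS_KEY', 'YOUR_SECRET_KEY')
# dirs, files = list_directory(s3, 'CODEDE', 'Sentinel-5P/TROPOMI/L1B/2018/05/30/')
```

Keeping the client construction separate from the listing logic means list_directory works unchanged with a client configured in any other way.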
Downloading a single file from the Earth observation data repository
The script downloadfile.py downloads a file to the directory from which it is executed. If a file with that name already exists, it is overwritten without a prompt for confirmation.
downloadfile.py
import boto3
access_key='YOUR_ACCESS_KEY'
secret_key='YOUR_SECRET_KEY'
key='Sentinel-5P/TROPOMI/L1B/2018/05/30/S5P_RPRO_L1B_RA_BD4_20180530T005923_20180530T024053_03244_03_020100_20220701T194150/S5P_RPRO_L1B_RA_BD4_20180530T005923_20180530T024053_03244_03_020100_20220701T194150.cdl'
host='http://data.fra1-1.cloudferro.com'
container='CODEDE'
s3=boto3.resource('s3',aws_access_key_id=access_key,
                  aws_secret_access_key=secret_key, endpoint_url=host)
bucket=s3.Bucket(container)
# Save the file under its own name (the last component of the key) in the current directory
filename=key.split("/")[-1]
bucket.download_file(key, filename)
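If overwriting an existing file is undesirable, the download step can check for the file first. A minimal sketch, where download_if_absent is a hypothetical helper that takes the bucket object created in downloadfile.py; the optional dest_dir parameter is also an assumption, since the original script always writes to the current directory.

```python
import os


def download_if_absent(bucket, key, dest_dir="."):
    """Download `key` from a boto3 Bucket resource unless the local file already exists.

    Returns the local path if the file was downloaded, or None if a file with
    that name was already present (downloadfile.py would overwrite it instead).
    """
    # Same naming rule as downloadfile.py: keep only the last component of the key.
    local_path = os.path.join(dest_dir, key.split("/")[-1])
    if os.path.exists(local_path):
        return None
    bucket.download_file(key, local_path)
    return local_path
```

In downloadfile.py, the last two lines could then be replaced by a call such as download_if_absent(bucket, key).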
Explanation of variables
Variable key is the full path (including folders) of the file you want to download from the Earth observation data repository. When defining it, obey the following rules:
Use slashes / as separators between directories and files
Do not start or finish the path with a slash /
Start the path with the name of a folder found within the root directory of the Earth observation data repository (for example Sentinel-2 or Sentinel-5P)
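Most of these rules can be checked before any request is sent. A hypothetical check_key helper, sketched below, rejects keys that break them and returns the file name that downloadfile.py would save to. It cannot verify the last rule (that the path starts with an existing top-level folder) without listing the repository.

```python
def check_key(key):
    """Validate an object key against the rules above and return its file name.

    Raises ValueError for empty keys, keys that start or end with a slash,
    and keys that use backslashes as separators.
    """
    if not key or key.startswith("/") or key.endswith("/"):
        raise ValueError(f"Key must not be empty or start/end with '/': {key!r}")
    if "\\" in key:
        raise ValueError(f"Use '/' rather than '\\' as the separator: {key!r}")
    # The last path component is what downloadfile.py uses as the local file name.
    return key.rsplit("/", 1)[-1]
```

For the key used in downloadfile.py, check_key(key) returns the .cdl file name listed below.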
Again, variables host and container contain the Earth observation data endpoint and the name of the container being used, respectively. You do not need to modify them.
Execute the code with
python3 downloadfile.py
The following file should be downloaded:
S5P_RPRO_L1B_RA_BD4_20180530T005923_20180530T024053_03244_03_020100_20220701T194150.cdl
This file is located in the root directory of the product.
After you execute the script, it should print nothing. The downloaded file should then be visible in the directory from which the script was executed.
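Since a successful run prints nothing, you may want a quick sanity check that the file arrived complete. One option, sketched below under the assumption that the endpoint returns ContentLength for head_object as Amazon S3 does, compares the local file size with the size of the remote object. It expects a boto3 client; with the resource from downloadfile.py, s3.meta.client provides one. The function name download_matches_remote is hypothetical.

```python
import os


def download_matches_remote(s3, container, key, local_path):
    """Return True if the local file has the same size as the remote object.

    A size match is only a basic sanity check, not a checksum verification.
    """
    remote_size = s3.head_object(Bucket=container, Key=key)["ContentLength"]
    return os.path.getsize(local_path) == remote_size
```

For example, after running downloadfile.py, download_matches_remote(s3.meta.client, container, key, filename) should return True.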
What To Do Next
boto3 can also be used to access object storage containers from EO-Lab cloud: How to access object storage from EO-Lab using boto3