Remote Data Sets

A remote data set is a data set that is stored on one or more remote servers. It may be a single grid file or a collection of subset tiles making up a larger grid. Remote data sets are not distributed with GMT or installed during the installation procedure. GMT offers several remote global data grids that you can access via our remote file mechanism. The first time you access one of these files, GMT will download the file (or a subset tile) from the selected GMT server and save it to the server directory under your GMT user directory [~/.gmt]. From then on we read the local copy.

If you use the remote file mechanism, you should know that the files on the server will change from time to time (e.g., new versions are released, a problem in one file is fixed, or a dataset becomes obsolete), and GMT will take action accordingly. It is our policy to only supply the latest version of any dataset that undergoes revisions. If you require previous versions for your work, you will need to get those data from the data provider separately. Unless you deactivate the remote data service, GMT will do the following when you request a remote file in a GMT command:

  1. We check if the locally cached catalog with information about the data available from the server is up-to-date or if it needs to be refreshed. If the catalog is older than the GMT_DATA_UPDATE_INTERVAL limit then we refresh it.

  2. When the catalog is refreshed, we determine the publication date for each dataset on the server, and if any local copies you may have are now obsolete we will remove them to force a re-download from the server.
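The age test in step 1 can be pictured with a small shell function. This is only a sketch of the idea, not GMT's internal code: the function name catalog_is_stale is hypothetical, the interval is assumed to be given in whole days, and the use of find with -mmin assumes a GNU or BSD find.

```shell
# Illustrative only: GMT performs this check internally.
# usage: catalog_is_stale FILE MAX_DAYS
# Succeeds (status 0) when FILE is missing or was last modified
# more than MAX_DAYS days ago; fails (status 1) otherwise.
catalog_is_stale() (
    file=$1
    max_days=$2
    # A missing catalog always needs to be fetched.
    [ -f "$file" ] || exit 0
    # find prints the path only if the file is older than the limit,
    # here expressed in minutes (1440 minutes per day).
    [ -n "$(find "$file" -mmin +"$((max_days * 1440))" -print)" ]
)
```

Calling catalog_is_stale with the path of a locally cached catalog and an interval of 1 would report whether a refresh is due under a one-day limit.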

Currently Available Remote Data Sets

Documentation for the individual remote datasets available through the GMT server and its mirrors can be found at Remote Datasets.

Usage

We have processed and reformatted publicly available global data sets (grids and images) and standardized their file names. In GMT, you may access such data (or only a subset, by using the -R option) by specifying the special name

@remote_name_rru[_reg]

where the leading @ symbol identifies the file as a remote data set, remote_name is specific to the dataset, and rr is a 2-digit integer specifying the grid/image resolution in the unit u, where u is either d, m or s for arc degree, arc minute or arc second, respectively. The codes for rru and the optional reg that are supported will be listed in the sections below describing each of the available data sets.
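To make the naming pattern concrete, here is a minimal shell sketch that assembles such a name from its parts. The function remote_name is a hypothetical helper and not part of GMT; it only checks the shape of the name, not whether the resulting data set actually exists on the server.

```shell
# Hypothetical helper (not a GMT command): build @remote_name_rru[_reg].
# usage: remote_name NAME RR UNIT [REG]
remote_name() (
    name=$1        # data set name, e.g. earth_relief
    rr=$2          # 2-digit resolution, e.g. 02
    u=$3           # unit: d, m or s
    reg=${4:-}     # optional registration: g or p
    case $rr in
        [0-9][0-9]) ;;
        *) echo "resolution must be two digits: $rr" >&2; exit 1 ;;
    esac
    case $u in
        d|m|s) ;;
        *) echo "unit must be d, m or s: $u" >&2; exit 1 ;;
    esac
    case $reg in
        ""|g|p) ;;
        *) echo "registration must be g or p: $reg" >&2; exit 1 ;;
    esac
    # Append _reg only when a registration was given.
    printf '@%s_%s%s%s\n' "$name" "$rr" "$u" "${reg:+_$reg}"
)

remote_name earth_relief 02 m g   # prints @earth_relief_02m_g
```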

When used in plots (i.e., when both a region and a map projection are selected to make an image) the data resolution is optional. If it is not given, we determine the data set resolution that results in a final plot image dots-per-unit resolution closest to the GMT_GRAPHICS_DPU default setting. This eliminates the need for the user to determine what grid resolution will give a nice-looking image without creating a bloated file that exceeds what the eye (or printers) can discern. Use grdcut with the -D option to inquire about the automatic resolution. Note: Grid processing tools require the data resolution to be specified since no plot is being generated.

Details about the remote datasets currently provided by GMT can be found at Remote Datasets.

Many of the remote datasets have a preferred, default color table that will be used unless you override that default by giving your desired CPT information.

Data Registration

Optionally, you can append _g or _p to specifically get the gridline-registered or pixel-registered version (if they both exist). If reg is not specified then the behavior depends on whether you are making a plot or processing/extracting a subset of the data:

  • For plots we will return the pixel-registered version unless only the gridline-registered file is available.

  • For grid processing modules we will return the gridline-registered version unless only the pixel-registered file is available. We will also issue a warning since for calculations you should ideally know and specify exactly what you want.

If you request a specific registration and that version is not available, you will get an error message.
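The registration rules can be summarized as a small decision function. This is a sketch of the logic as described here, not GMT's own code: pick_registration and its arguments are hypothetical, and the warning that grid processing modules issue is left out.

```shell
# Hypothetical helper mirroring the rules above; GMT applies them internally.
# usage: pick_registration CONTEXT AVAILABLE [REQUESTED]
#   CONTEXT    plot | process
#   AVAILABLE  registrations that exist for the data set: g, p or gp
#   REQUESTED  optional explicit choice: g or p
pick_registration() (
    context=$1
    available=$2
    requested=${3:-}
    if [ -n "$requested" ]; then
        # An explicit request is honored or rejected, never substituted.
        case $available in
            *"$requested"*) echo "$requested"; exit 0 ;;
            *) echo "registration $requested is not available" >&2; exit 1 ;;
        esac
    fi
    case $context in
        plot)    preferred=p; fallback=g ;;
        process) preferred=g; fallback=p ;;
        *) echo "unknown context: $context" >&2; exit 1 ;;
    esac
    # Use the preferred registration if it exists, else the other one.
    case $available in
        *"$preferred"*) echo "$preferred" ;;
        *) echo "$fallback" ;;
    esac
)

pick_registration plot gp      # prints p
pick_registration process gp   # prints g
```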

Controlling the Process

There are several ways you can control the remote data process and the amount of space taken up by your own server directory:

  1. You can select the GMT data server closest to you to minimize download time [GMT_DATA_SERVER].

  2. You can set an upper limit on the file sizes that may be downloaded [GMT_DATA_SERVER_LIMIT].

  3. You can turn off the automatic download temporarily [GMT_DATA_UPDATE_INTERVAL].

  4. You can control how often GMT will refresh the catalog of information on your computer [GMT_DATA_UPDATE_INTERVAL].

  5. You can clear the server directory, or perhaps just some subsets, any time via gmt clear.

Offline Usage

If you anticipate being without an Internet connection (or having a very slow one), you can use the module gmtget to download all (or some) of the remote files before you lose your connection. You can choose which data to download and limit it to node spacings larger than or equal to a given limit, and you can minimize space on your computer by requesting that any JPEG2000 tiles not be converted until GMT is accessing them. Here are some examples of usage. Download the entire cache directory used in examples and tests:

gmt get -Dcache

Get all the data for Earth but only for 1 arc minute and coarser, and leave tiles in JPEG2000 format:

gmt get -Ddata=earth -I1m -N

As shown in the tables below, the largest datasets may take some time to download from the GMT server, so be patient!

File Compression

Typically, a dataset is released by the data provider in a single, high-resolution format. To optimize use of these data in GMT and to prevent download bottlenecks we have downsampled them via Cartesian Gaussian filtering to prevent aliasing while preserving the latitude-dependent resolution in the original grid or image. To improve responsiveness, the larger files (i.e., currently for node spacings 05m and smaller) have been split into smaller tiles. When the 06m or lower resolution files are accessed the first time we download the entire file, regardless of your selected region (-R). However, for the tiled data sets we only download the tiles that intersect your selected region the first time they are referenced. Note: The mask grids are not tiled as they are very small even for 15s resolution (due to byte format and effective compression), and neither are images (at least for as long as GMT does not have the capability of blending image tiles - this may change in the future).
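As a rough illustration of the tiling rule, the following shell sketch converts a resolution code to arc seconds and reports whether a grid at that spacing would be tiled. The helpers res_to_arcsec and is_tiled are hypothetical and not part of GMT, the 05m threshold is the one quoted above and may change, and the sketch ignores the mask grids and images that are never tiled.

```shell
# Hypothetical helpers (not GMT commands).
# Convert a resolution code such as 05m or 15s to arc seconds.
res_to_arcsec() (
    code=$1
    case $code in
        [0-9][0-9][dms]) ;;
        *) echo "bad resolution code: $code" >&2; exit 1 ;;
    esac
    unit=${code#??}      # what remains after the two digits: d, m or s
    value=${code%?}      # the two digits
    value=${value#0}     # drop a leading zero so 08 is not read as octal
    case $unit in
        d) echo $((value * 3600)) ;;
        m) echo $((value * 60)) ;;
        s) echo "$value" ;;
    esac
)

# Succeed when the node spacing is 05m (300 arc seconds) or smaller.
is_tiled() (
    secs=$(res_to_arcsec "$1") || exit 2
    [ "$secs" -le 300 ]
)

for code in 01d 06m 05m 15s; do
    if is_tiled "$code"; then
        echo "$code: tiled, only intersecting tiles are downloaded"
    else
        echo "$code: single file, downloaded in full"
    fi
done
```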

Single grids are provided as netCDF-4 maximum-lossless compressed short int grids, making the files much smaller than their original source files without any loss of precision. To minimize download time, the dataset tiles are all stored as JPEG2000 images on the GMT server due to superior compression, but once downloaded to your server directory they are converted to the same short int compressed netCDF-4 format for easier access. This step uses our GDAL bridge and requires that your GDAL distribution was built with openjpeg support.

Figure (srtm1.png): Histogram of compression rates for the SRTM 1x1 arc second tiles. 100% reflects the full short integer size of an uncompressed tile (~25 MB). As can be seen, on average a JPEG2000 tile is only half the size of the corresponding fully compressed (level 9) netCDF short int grid. This is why we have chosen the JP2 format for tiles on the server.

Cache File Updates

Remote cache files are our collection of miscellaneous files that are used throughout the GMT examples, man pages, and test suite. There is neither a system nor a catalog, and files come and go as we need them. The cache files are subject to similar rules as the remote data sets when it comes to refreshing or deleting them. If any of these files is precious to you, we suggest you make a copy somewhere.

Getting a single grid

Should you need a single grid from any of our tiled datasets, e.g., to feed into other programs that do not depend on GMT, you can create that via grdcut. For instance, to make a global grid from the eight tiles that make up the 2m x 2m gridline-registered data, try:

gmt grdcut @earth_relief_02m_g -Gearth_at_2m.grd -Rg

Finally, if you wish to determine the grid resolution most suitable for making a map given a region and projection, you can inquire about this information by passing -D, e.g.:

gmt grdcut @earth_relief -R270/20/305/25+r -JOc280/25.5/22/69/24c -D -V > info.txt

or obtain the required subset grid directly via:

gmt grdcut @earth_relief -R270/20/305/25+r -JOc280/25.5/22/69/24c -Gsubset.grd -V