select

Select data table subsets based on multiple spatial criteria

Synopsis

gmt select [ table ] [ -Amin_area[/min_level/max_level][+a[g|i][s|S]][+l|r][+ppercent] ] [ -Cpointfile|lon/lat+ddist ] [ -Dresolution[+f] ] [ -E[f][n] ] [ -Fpolygonfile ] [ -Ggridmask ] [ -I[cfglrsz] ] [ -Jparameters ] [ -Llinefile+ddist[+p] ] [ -Nmaskvalues ] [ -Rregion ] [ -V[level] ] [ -Zmin[/max][+a][+ccol][+h[k|s]][+i] ] [ -aflags ] [ -bbinary ] [ -dnodata[+ccol] ] [ -eregexp ] [ -fflags ] [ -ggaps ] [ -hheaders ] [ -iflags ] [ -oflags ] [ -qflags ] [ -sflags ] [ -wflags ] [ -:[i|o] ] [ --PAR=value ]

Note: No space is allowed between the option flag and the associated arguments.

Description

select is a filter that reads (x, y) or (longitude, latitude) positions from the first 2 columns of table [or standard input] and uses a combination of 1-7 criteria to pass or reject the records. Records can be selected based on whether or not they 1) are inside a rectangular region (-R [and -J]), 2) are within dist km of any point in pointfile, 3) are within dist km of any line in linefile, 4) are inside one of the polygons in the polygonfile, 5) are inside geographical features (based on coastlines), 6) have z-values within a given range, or 7) are inside bins of a grid mask whose nodes are non-zero. The sense of the tests can be reversed for each of these 7 criteria by using the -I option. See option -: on how to read (y, x) or (lat, lon) files (this option affects all module input data). Note: If no projection information is used then you must supply -fg to tell select that your data are geographical.

Required Arguments

table: One or more ASCII (or binary, see -bi[ncols][type]) data table file(s) holding a number of data columns. If no tables are given then we read from standard input.

Optional Arguments

-Amin_area[/min_level/max_level][+a[g|i][s|S]][+l|r][+ppercent]

Features with an area smaller than min_area in km² or of hierarchical level that is lower than min_level or higher than max_level will be ignored [Default is 0/0/4 (all features)]. Level 2 (lakes) contains regular lakes and wide river bodies which we normally include as lakes. Several modifiers provide further control:

+a - Control special aspects of the Antarctica coastline. Append one of g|i|s|S:
- g - Selects the Antarctica ice grounding line as the coastline.
- i - Selects the ice shelf boundary as the coastline for Antarctica [Default].
- s - Skip all GSHHG features below 60S (For users who wish to utilize their own Antarctica (with islands) coastline).
- S - Like s but skip instead all features north of 60S.
+l - Only regular lakes and exclude river-lakes.
+p - Append percent to exclude polygons whose percentage area of the corresponding full-resolution feature is less than percent.
+r - Only select river-lakes and exclude regular lakes.

See GSHHG Information below for more details. Ignored unless -N is set.

-Cpointfile|lon/lat+ddist: Pass all records whose location is within dist of any of the points in the ASCII file pointfile. If dist is zero then the 3rd column of pointfile must have each point’s individual radius of influence. If you only have a single point then you can specify lon/lat instead of pointfile. Distances are Cartesian and in user units; specify -fg to indicate spherical distances and append a distance unit, even if the distance specified is 0 (see Units). Alternatively, if -R and -J are used then geographic coordinates are projected to map coordinates (in cm, inch, or points, as determined by PROJ_LENGTH_UNIT) before Cartesian distances are compared to dist.
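
As a minimal sketch of the two ways to give dist, the commands below first use a single lon/lat point with a fixed radius and then a point file with per-point radii. The file names obs.txt and centers.txt and their contents are made up for illustration, and the gmt calls are skipped when GMT is not installed.

```shell
# Hypothetical observations: lon lat value
cat > obs.txt << 'EOF'
10.5 50.0 1
25.0 40.0 2
-30.0 10.0 3
EOF

# Hypothetical centers: lon lat radius (the radius column is used when dist is 0)
cat > centers.txt << 'EOF'
10 50 100
20 40 250
EOF

# Run only when GMT is installed
if command -v gmt >/dev/null 2>&1; then
    # Single point given as lon/lat with a fixed 100 km radius
    gmt select obs.txt -fg -C10/50+d100k > near_point.txt
    # Per-point radii from column 3; a unit (k) is still appended to the 0
    gmt select obs.txt -fg -Ccenters.txt+d0k > near_centers.txt
fi
```

With -fg in effect the radii are spherical distances in the appended unit; without -fg the same numbers would be treated as Cartesian user units.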

-Dresolution[+f]: Ignored unless -N is set. Selects the resolution of the coastline data set to use ((f)ull, (h)igh, (i)ntermediate, (l)ow, or (c)rude). The resolution drops off by ~80% between data sets. [Default is l]. Append +f to automatically select a lower resolution should the one requested not be available [abort if not found]. Note that because the coastlines differ in details it is not guaranteed that a point will remain inside [or outside] when a different resolution is selected.

-E[f][n]: Specify how points exactly on a polygon boundary should be considered. By default, such points are considered to be inside the polygon. Append f and/or n to change this behavior for the -F and/or -N options, respectively, so that boundary points are considered to be outside.

-Fpolygonfile: Pass all records whose location is within one of the closed polygons in the multiple-segment file polygonfile. For spherical polygons (lon, lat), make sure no consecutive points are separated by 180 degrees or more in longitude. Note that polygonfile must be in ASCII regardless of whether -bi is used.
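
The interplay of -F and -E can be sketched with a small Cartesian case. The files square.txt and boundary_pts.txt are hypothetical: one closed square polygon and three points lying inside, exactly on the boundary, and outside. The gmt calls are skipped when GMT is not installed.

```shell
# Hypothetical closed polygon (a 10 x 10 square) in a multiple-segment file
cat > square.txt << 'EOF'
> square
0 0
10 0
10 10
0 10
0 0
EOF

# Hypothetical test points: inside, on the boundary, outside
cat > boundary_pts.txt << 'EOF'
5 5
10 5
15 5
EOF

# Run only when GMT is installed
if command -v gmt >/dev/null 2>&1; then
    # Default: points on the boundary count as inside
    gmt select boundary_pts.txt -Fsquare.txt > inside_or_on.txt
    # -Ef: points on the boundary of an -F polygon count as outside
    gmt select boundary_pts.txt -Fsquare.txt -Ef > strictly_inside.txt
fi
```

Going by the option descriptions, the first command should keep the first two points and the second command only the first one.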

-Ggridmask: Pass all locations that are inside the valid data area of the grid gridmask. Nodes that are outside are either NaN or zero.

-I[cfglrsz]

Reverses the sense of the test for each of the criteria specified:

c - select records not inside any point’s circle of influence.
f - select records not inside any of the polygons.
g - pass records inside the cells with z equal zero of the grid mask in -G.
l - select records not within the specified distance of any line.
r - select records not inside the specified rectangular region.
s - select records not considered inside as specified by -N (and -A, -D).
z - select records not within the range specified by -Z.

-Jparameters: Specify the projection. (See full description) (See technical reference) (See projections table).

-Llinefile+ddist[+p]: Pass all records whose location is within dist of any of the line segments in the ASCII multiple-segment file linefile. If dist is zero then we will scan each sub-header in the linefile for an embedded -Ddist setting that sets each line’s individual distance value. Distances are Cartesian and in user units; specify -fg to indicate spherical distances and append a distance unit (see Units). Alternatively, if -R and -J are used then geographic coordinates are projected to map coordinates (in cm, inch, or points, as determined by PROJ_LENGTH_UNIT) before Cartesian distances are compared to dist. Append +p to ensure only points whose orthogonal projections onto the nearest line-segment fall within the segment’s endpoints [Default considers points “beyond” the line’s endpoints].
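
The effect of +p can be sketched with one short line and two nearby points. The files track.txt and track_pts.txt are hypothetical; the first point lies beside the line while the second lies just past its eastern end. The gmt calls are skipped when GMT is not installed.

```shell
# Hypothetical line: a single segment along the equator from 0E to 10E
cat > track.txt << 'EOF'
> track 1
0 0
10 0
EOF

# Hypothetical points: beside the line, and beyond its eastern endpoint
cat > track_pts.txt << 'EOF'
5 0.1
10.1 0
EOF

# Run only when GMT is installed
if command -v gmt >/dev/null 2>&1; then
    # Within 25 km of the line, including points beyond the endpoints
    gmt select track_pts.txt -fg -Ltrack.txt+d25k > near_line.txt
    # As above, but require the orthogonal projection to fall on the segment
    gmt select track_pts.txt -fg -Ltrack.txt+d25k+p > near_line_strict.txt
fi
```

Both points are roughly 11 km from the line, so only the +p variant should be able to tell them apart.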

-Nmaskvalues

Pass all records whose location is inside specified geographical features. Specify if records should be skipped (s) or kept (k) using 1 of 2 formats:

-Nwet/dry.

-Nocean/land/lake/island/pond.

[Default is s/k/s/k/s (i.e., s/k), which passes all points on dry land].
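
A hedged sketch of the two -N formats follows. The file sites.txt is hypothetical, with points chosen to fall roughly on land, in the open ocean, and in a large lake; -fg is added because no projection is given. The gmt calls are skipped when GMT is not installed, and they need the GSHHG coastline data to be available.

```shell
# Hypothetical sites: lon lat (roughly: continental land, open ocean, a large lake)
cat > sites.txt << 'EOF'
-100 40
-30 30
-87 47.5
EOF

# Run only when GMT is installed
if command -v gmt >/dev/null 2>&1; then
    # Two-level form: skip wet, keep dry (same as the default s/k/s/k/s)
    gmt select sites.txt -fg -Dl -Ns/k > dry.txt
    # Two-level form reversed: keep wet, skip dry (same as k/s/k/s/k)
    gmt select sites.txt -fg -Dl -Nk/s > wet.txt
    # Five-level form: keep only points inside lakes of at least 1000 km^2,
    # falling back to a lower resolution if intermediate is not installed
    gmt select sites.txt -fg -Di+f -A1000 -Ns/s/k/s/s > lakes.txt
fi
```

Because coastlines differ between resolutions, a point close to a shore may change status when -D is changed.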

-Rxmin/xmax/ymin/ymax[+r][+uunit]: Specify the region of interest. If no map projection is supplied we implicitly set -Jx1. (See full description) (See technical reference).

-V[level]: Select verbosity level [w]. (See full description) (See technical reference).

-Zmin[/max][+a][+ccol][+h[k|s]][+i]

Control passing or skipping records (or entire segments) given the selections set via the arguments. Pass all records whose 3rd column (z; col = 2) lies within the given range or is NaN (use -s to skip NaN records). If max is omitted then we test if z equals min instead. This means equality within 5 ULPs (unit of least precision; http://en.wikipedia.org/wiki/Unit_in_the_last_place). To indicate no limit on min or max, specify a hyphen (-). Notes: (1) If your 3rd column is absolute time then remember to supply -f2T. (2) To specify several tests just repeat the -Z option as many times as you have columns to test. (3) To use -Z, the input file must have at least three columns. (4) When more than one -Z option is given then the -Iz option cannot be used. Several modifiers are available:

+a - In the case of multiple tests, output any record that passes at least one of your z tests [Default is all tests must pass].
+c - To specify another z-column, append +ccol. If +c is not used then it is automatically incremented for each new -Z option, starting from 2.
+h - Instead of obtaining z from the data column(s), extract z from the segment header -Zz string. If no such entry is found we skip [Default, or +hs] the entire segment (or we keep the entire segment if +hk was given), otherwise it is subject to the test(s) using the constant z for each segment.
+i - Reverses the tests to pass records with a z value not in the given range.
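
The -Z modifiers can be sketched as follows. The files zdata.txt (five columns) and zsegs.txt (segments carrying -Zz in their headers) are hypothetical. Repeating +a on every -Z is an assumption made here to stay on the safe side; the text above does not say where it must be placed. The gmt calls are skipped when GMT is not installed.

```shell
# Hypothetical table: x y z c3 c4 (columns are counted from 0)
cat > zdata.txt << 'EOF'
0 0 20 1 -3
1 1 20 1 4
2 2 5 1 -3
3 3 5 1 4
EOF

# Hypothetical segments; the third one has no -Zz entry in its header
cat > zsegs.txt << 'EOF'
> -Z5
0 0 1
1 1 1
> -Z20
2 2 1
3 3 1
> no z value here
4 4 1
EOF

# Run only when GMT is installed
if command -v gmt >/dev/null 2>&1; then
    # Both tests must pass: column 2 within 10-50 and column 4 with no lower limit up to 0
    gmt select zdata.txt -Z10/50 -Z-/0+c4 > z_all.txt
    # Any one test may pass (+a given on each -Z; placement is an assumption)
    gmt select zdata.txt -Z10/50+a -Z-/0+c4+a > z_any.txt
    # Lower limit only: column 2 from 10 upward
    gmt select zdata.txt -Z10/- > z_min.txt
    # Take z from segment headers; segments without -Zz are skipped (+hs is the default)
    gmt select zsegs.txt -Z0/10+h > segs_skip.txt
    # Same test, but keep segments whose header has no -Zz entry
    gmt select zsegs.txt -Z0/10+hk > segs_keep.txt
fi
```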

-a[[col=]name[,…]] (more …): Set aspatial column associations col=name.

-birecord[+b|l] (more …): Select native binary format for primary table input. [Default is 2 input columns].

-borecord[+b|l] (more …): Select native binary format for table output. [Default is same as input].

-d[i|o][+ccol]nodata (more …): Replace input columns that equal nodata with NaN and do the reverse on output.

-e[~]“pattern” | -e[~]/regexp/[i] (more …): Only accept data records that match the given pattern.

-f[i|o]colinfo (more …): Specify data types of input and/or output columns.

-gx|y|z|d|X|Y|Dgap[u][+a][+ccol][+n|p] (more …): Determine data gaps and line breaks.

-h[i|o][n][+c][+d][+msegheader][+rremark][+ttitle] (more …): Skip or produce header record(s).

-icols[+l][+ddivisor][+sscale|d|k][+ooffset][,…][,t[word]] (more …): Select input columns and transformations (0 is first column, t is trailing text, append word to read one word only).

-ocols[+l][+ddivisor][+sscale|d|k][+ooffset][,…][,t[word]] (more …): Select output columns and transformations (0 is first column, t is trailing text, append word to write one word only).

-q[i|o][~]rows|limits[+ccol][+a|t|s] (more …): Select input or output rows or data limit(s) [all].

-s[cols][+a][+r] (more …): Set handling of NaN records for output.

-wy|a|w|d|h|m|s|cperiod[/phase][+ccol] (more …): Convert an input coordinate to a cyclical coordinate.

-:[i|o] (more …): Swap 1st and 2nd column on input and/or output.

-^ or just -: Print a short message about the syntax of the command, then exit (Note: on Windows just use -).
-+ or just +: Print an extensive usage (help) message, including the explanation of any module-specific option (but not the GMT common options), then exit.
-? or no arguments: Print a complete usage (help) message, including the explanation of all options, then exit.
--PAR=value: Temporarily override a GMT default setting; repeatable. See gmt.conf for parameters.

Units

For map distance unit, append unit d for arc degree, m for arc minute, and s for arc second, or e for meter [Default unless stated otherwise], f for foot, k for km, M for statute mile, n for nautical mile, and u for US survey foot. By default we compute such distances using a spherical approximation with great circles (-jg) using the authalic radius (see PROJ_MEAN_RADIUS). You can use -jf to perform “Flat Earth” calculations (quicker but less accurate) or -je to perform exact geodesic calculations (slower but more accurate; see PROJ_GEODESIC for method used).
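
As a small illustration of the unit codes, the sketch below applies a similar search radius around one point using different units. The file unit_pts.txt is hypothetical, and the gmt calls are skipped when GMT is not installed.

```shell
# Hypothetical points: lon lat
cat > unit_pts.txt << 'EOF'
10.5 50.0
12.0 50.0
EOF

# Run only when GMT is installed
if command -v gmt >/dev/null 2>&1; then
    # 100 km
    gmt select unit_pts.txt -fg -C10/50+d100k > r_km.txt
    # The same radius in meters
    gmt select unit_pts.txt -fg -C10/50+d100000e > r_m.txt
    # About 100 km in nautical miles (1 nautical mile = 1.852 km)
    gmt select unit_pts.txt -fg -C10/50+d54n > r_nmi.txt
    # One arc degree
    gmt select unit_pts.txt -fg -C10/50+d1d > r_deg.txt
fi
```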

ASCII Format Precision

The ASCII output formats of numerical data are controlled by parameters in your gmt.conf file. Longitude and latitude are formatted according to FORMAT_GEO_OUT, absolute time is under the control of FORMAT_DATE_OUT and FORMAT_CLOCK_OUT, whereas general floating point values are formatted according to FORMAT_FLOAT_OUT. Be aware that the format in effect can lead to loss of precision in ASCII output, which can lead to various problems downstream. If you find the output is not written with enough precision, consider switching to binary output (-bo if available) or specify more decimals using the FORMAT_FLOAT_OUT setting.

This note applies to ASCII output only in combination with binary or netCDF input or the -: option. See also the note below.

Note On Processing ASCII Input Records

Unless you are using the -: option, selected ASCII input records are copied verbatim to output. That means that options like -foT and settings like FORMAT_FLOAT_OUT and FORMAT_GEO_OUT will not have any effect on the output. On the other hand, it allows selecting records with diverse content, including character strings, quoted or not, comments, and other non-numerical content.

Note On Distances

If options -C or -L are selected then distances are Cartesian and in user units; use -fg to imply spherical distances in km and geographical (lon, lat) coordinates. Alternatively, specify -R and -J to measure projected Cartesian distances in map units (cm, inch, or points, as determined by PROJ_LENGTH_UNIT).

This program has evolved over the years. Originally, the -R and -J were mandatory in order to handle geographic data, but now there is full support for spherical calculations. Thus, -J should only be used if you want the tests to be applied on projected data and not the original coordinates. If -J is used the distances given via -C and -L are projected distances.

Note On Segments

Segment headers in the input files are copied to output if one or more records from a segment pass the test. Selection is always done point by point, not by segment. That means only points from a segment that pass the test will be included in the output. If you wish to clip the lines and include the new boundary points at the segment ends you must use spatial instead.
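
The point-by-point behavior can be sketched with a two-segment file. The file seg_lines.txt is hypothetical: its first segment crosses the region boundary and its second lies entirely outside. The gmt call is skipped when GMT is not installed.

```shell
# Hypothetical Cartesian lines in a multiple-segment file
cat > seg_lines.txt << 'EOF'
> line A crosses the region boundary
-2 1
1 1
3 1
7 1
> line B lies entirely outside
8 8
9 9
EOF

# Run only when GMT is installed
if command -v gmt >/dev/null 2>&1; then
    # Keep only the points inside 0 <= x <= 5, 0 <= y <= 5
    gmt select seg_lines.txt -R0/5/0/5 > seg_subset.txt
fi
```

Going by the note above, the output should hold the header of line A followed by its two interior points, with no new points added where the line crosses the boundary, and nothing from line B.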

Examples

Note: Below are some examples of valid syntax for this module. The examples that use remote files (file names starting with @) can be cut and pasted into your terminal for testing. Other commands requiring input files are just dummy examples of the types of uses that are common but cannot be run verbatim as written.

To only return the data points from the remote file @ship_15.txt that lie within the region between longitudes 246 and 247 and latitudes 20 and 21, try:

gmt select @ship_15.txt -R246/247/20/21

To return all the points except those inside that square, use:

gmt select @ship_15.txt -R246/247/20/21 -Ir

To extract the subset of a data set that is within 300 km of any of the points in pts.txt but more than 100 km away from the lines in lines.txt, run:

gmt select lonlatfile -fg -Cpts.txt+d300k -Llines.txt+d100k -Il > subset.txt

Here, you must specify -fg so the program knows you are processing geographical data.

To keep all points in data.txt within the specified region, except the points on land (as determined by the high-resolution coastlines), use:

gmt select data.txt -R120/121/22/24 -Dh -Nk/s > subset.txt

To return all points in quakes.txt that are inside or on the spherical polygon lonlatpath.txt, try:

gmt select quakes.txt -Flonlatpath.txt -fg > subset1.txt

To return all points in stations.txt that are within 5 cm of the point in origin.txt for a certain projection, try:

gmt select stations.txt -Corigin.txt+d5 -R20/50/-10/20 -JM20c --PROJ_LENGTH_UNIT=cm > subset2.txt

To return all points in quakes.txt that are inside the grid topo.nc where the values are nonzero, try:

gmt select quakes.txt -Gtopo.nc > subset3.txt

To pass all records whose 3rd column values fall in the range 10-50 and whose 5th column values are all negative, try:

gmt select dataset.txt -Z10/50 -Z-/0+c4 > subset4.txt

GSHHG Information

The coastline database is GSHHG (formerly GSHHS) which is compiled from three sources: World Vector Shorelines (WVS, not including Antarctica), CIA World Data Bank II (WDBII), and Atlas of the Cryosphere (AC, for Antarctica only). Apart from Antarctica, all level-1 polygons (ocean-land boundary) are derived from the more accurate WVS while all higher level polygons (level 2-4, representing land/lake, lake/island-in-lake, and island-in-lake/lake-in-island-in-lake boundaries) are taken from WDBII. The Antarctica coastlines come in two flavors: ice-front or grounding line, selectable via the -A option. Much processing has taken place to convert WVS, WDBII, and AC data into usable form for GMT: assembling closed polygons from line segments, checking for duplicates, and correcting for crossings between polygons. The area of each polygon has been determined so that the user may choose not to draw features smaller than a minimum area (see -A); one may also limit the highest hierarchical level of polygons to be included (4 is the maximum). The 4 lower-resolution databases were derived from the full resolution database using the Douglas-Peucker line-simplification algorithm. The classification of rivers and borders follows that of the WDBII. See The Global Self-consistent, Hierarchical, High-resolution Geography Database (GSHHG) for further details.

Inside/outside Status

To determine if a point is inside, outside, or exactly on the boundary of a polygon we need to balance the complexity (and execution time) of the algorithm with the type of data and shape of the polygons. For any Cartesian data we use a non-zero winding algorithm, which is quite fast. For geographic data we will also use this algorithm as long as (1) the polygons do not include a geographic pole, and (2) the longitude extent of the polygons is less than 360 degrees. If this is the situation we also carefully adjust the test point longitude for any 360 degree offsets, if appropriate. Otherwise, we employ a full spherical ray-shooting method to determine a point’s status.