trend1d

Fit [weighted] [robust] polynomial/Fourier model for y = f(x) to xy[w] data

Synopsis

gmt trend1d [ table ] -Fxymrw|p|P|c -N[p|P|f|F|c|C|s|S|x]n[,…][+llength][+oorigin][+r] [ -Ccondition_number ] [ -I[confidence_level] ] [ -T[min/max/]inc[+i|n] |-Tfile|list ] [ -V[level] ] [ -W[+s] ] [ -bbinary ] [ -dnodata[+ccol] ] [ -eregexp ] [ -fflags ] [ -hheaders ] [ -iflags ] [ -qflags ] [ -sflags ] [ -wflags ] [ -:[i|o] ] [ --PAR=value ]

Note: No space is allowed between the option flag and the associated arguments.

Description

trend1d reads x, y [and w] values from the first two [three] columns on standard input [or file] and fits a regression model y = f(x) + e by [weighted] least squares [Menke, 1989]. The functional form of f(x) may be chosen as polynomial or Fourier or a mix of the two, and the fit may be made robust by iterative reweighting of the data. The user may also search for the number of terms in f(x) which significantly reduce the variance in y.

Required Arguments

table: One or more ASCII [or binary, see -bi] files containing x, y[, w] values in the first 2 [3] columns. If no files are specified, trend1d will read from standard input.

-Fxymrw|p|P|c: Specify up to five letters from the set {x y m r w} in any order to create columns of ASCII [or binary] output. x = x, y = y, m = model f(x), r = residual y - m, w = weight used in fitting. Alternatively, choose just the single selection p to output a record with the polynomial model coefficients, P for the normalized polynomial model coefficients, or c for the normalized Chebyshev model coefficients. Note: If m is included then we sort the output on increasing x (whether x is selected or not).
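As a rough illustration of how the -F letters map to output columns, the Python sketch below assembles rows from model values that have already been computed. The function assemble_output and its arguments are hypothetical and not part of GMT. It only mirrors the rules stated above: r = y - m, and the output is sorted on x whenever m is requested.

```python
def assemble_output(x, y, model, weights, letters):
    """Return output rows for a -F style column string such as "xymrw" (hypothetical helper)."""
    if not letters or len(letters) > 5 or not set(letters) <= set("xymrw"):
        raise ValueError("letters must be 1 to 5 characters from 'xymrw'")
    rows = []
    for xi, yi, mi, wi in zip(x, y, model, weights):
        values = {"x": xi, "y": yi, "m": mi, "r": yi - mi, "w": wi}
        # Keep x alongside each row so we can sort on it even when x is not output.
        rows.append((xi, [values[c] for c in letters]))
    if "m" in letters:
        # Including the model column implies sorting on increasing x.
        rows.sort(key=lambda pair: pair[0])
    return [cols for _, cols in rows]
```

For example, passing "xr" as letters yields the same two columns as -Fxr, in input order.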

-N[p|P|f|F|c|C|s|S|x]n[,…][+llength][+oorigin][+r]

Specify the components of the (possibly mixed) model by appending one or more comma-separated components. Each component is a directive of the form Tn, where T selects the basis function and n is the polynomial degree, the number of terms in the series, or the order of a single component, depending on T. Choose each directive T from this list:

p - Polynomial with intercept and powers of x up to degree n.
P - Just include the single term \(x^n\).
f - Fourier series with n terms.
c - Cosine series with n terms.
s - Sine series with n terms.
F - Single Fourier component of order n.
C - Single cosine component of order n.
S - Single sine component of order n.

By default, the x-origin and the fundamental period length are set to the mid-point and the range of the x data, respectively. Use these modifiers to change them or to request a robust fit:

+l - Append a custom length value.
+o - Append a custom x-origin.
+r - Seek a robust solution [Default gives a least squares fit].

Notes: (1) Using origin and length, we normalize x before evaluating the basis functions: the trigonometric bases all use the normalized \(x' = 2\pi(x-\mbox{origin})/\mbox{length}\), while the polynomials use \(x' = 2(x-\mbox{origin})/\mbox{length}\) for numerical stability. (2) Use -V to see a plain-text representation of the requested y(x) model.
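The normalization in note (1), together with the default origin and length, can be sketched in Python as follows. The helper names are hypothetical, and the exact term layout of each directive is an assumption made for illustration: that f contributes a cosine and a sine for each order 1..n, that F contributes one cosine/sine pair of order n (as in the F3 example further down), and that p uses plain powers of x' rather than the Chebyshev polynomials discussed under Remarks.

```python
import math


def default_origin_length(xs):
    """Defaults described above: origin = mid-point, length = range of the x data."""
    lo, hi = min(xs), max(xs)
    return 0.5 * (lo + hi), hi - lo


def normalize(x, origin, length, trig):
    """Normalized coordinate: 2*pi*(x - origin)/length for trig bases, 2*(x - origin)/length otherwise."""
    if trig:
        return 2.0 * math.pi * (x - origin) / length
    return 2.0 * (x - origin) / length


def basis_row(x, directives, origin, length):
    """Evaluate the basis terms for a list of (letter, n) directives (assumed layout)."""
    xp = normalize(x, origin, length, trig=False)   # polynomial coordinate
    xt = normalize(x, origin, length, trig=True)    # trigonometric coordinate
    row = []
    for kind, n in directives:
        if kind == "p":      # intercept plus powers up to degree n
            row += [xp ** k for k in range(n + 1)]
        elif kind == "P":    # the single power x'^n
            row.append(xp ** n)
        elif kind == "f":    # assumed: cosine and sine for each order 1..n
            for k in range(1, n + 1):
                row += [math.cos(k * xt), math.sin(k * xt)]
        elif kind == "c":    # cosines of orders 1..n
            row += [math.cos(k * xt) for k in range(1, n + 1)]
        elif kind == "s":    # sines of orders 1..n
            row += [math.sin(k * xt) for k in range(1, n + 1)]
        elif kind == "F":    # one cosine/sine pair of order n
            row += [math.cos(n * xt), math.sin(n * xt)]
        elif kind == "C":
            row.append(math.cos(n * xt))
        elif kind == "S":
            row.append(math.sin(n * xt))
        else:
            raise ValueError(f"unknown directive {kind!r}")
    return row
```

With data spanning 0 to 10, the defaults give origin 5 and length 10, so the polynomial coordinate runs from -1 to +1 across the data.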

Optional Arguments

-Ccondition_number: Set the maximum allowed condition number for the matrix solution. trend1d fits a damped least squares model, retaining only the part of the eigenvalue spectrum for which the ratio of the largest eigenvalue to the smallest retained eigenvalue does not exceed condition_number [Default is 1.0e06].
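The truncation rule can be illustrated on a 2×2 symmetric system solved in its eigenvector basis. This is a minimal sketch of the idea behind -C, assuming a symmetric normal-equation matrix; truncated_solve_2x2 is a hypothetical helper, not trend1d's internal routine, which handles systems of any size.

```python
import math


def truncated_solve_2x2(a, b, d, g, condition_number=1.0e6):
    """Solve [[a, b], [b, d]] m = g, dropping eigen-directions with lam_max/|lam| > condition_number."""
    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
    mean = 0.5 * (a + d)
    rad = math.hypot(0.5 * (a - d), b)
    lams = [mean + rad, mean - rad]
    # Matching eigenvectors; (lam - d, b) satisfies the second row when b != 0.
    if b != 0.0:
        vecs = [(lam - d, b) for lam in lams]
    elif a >= d:
        vecs = [(1.0, 0.0), (0.0, 1.0)]
    else:
        vecs = [(0.0, 1.0), (1.0, 0.0)]
    lam_max = max(abs(lam) for lam in lams)
    m = [0.0, 0.0]
    for lam, (v1, v2) in zip(lams, vecs):
        # Skip eigenpairs that would push the condition number past the limit.
        if lam == 0.0 or lam_max / abs(lam) > condition_number:
            continue
        norm = math.hypot(v1, v2)
        u1, u2 = v1 / norm, v2 / norm
        coeff = (u1 * g[0] + u2 * g[1]) / lam   # projection of g, divided by the eigenvalue
        m[0] += coeff * u1
        m[1] += coeff * u2
    return m
```

With eigenvalues 1 and 1e-8 and the default limit of 1e6, the second direction is discarded and its model component stays at zero instead of being amplified by 1/1e-8.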

-I[confidence_level]: Iteratively increase the number of model parameters, starting at one, until either all terms specified with -N are included or the reduction in variance of the model is no longer significant at the chosen confidence_level. If -I is given without an attached number, the fit is iterative with the default confidence level of 0.51; otherwise append your own level between 0 and 1. See the Remarks section. Note that the model terms are added in the order they are given in -N, so you should place the most important terms first.
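The stopping rule of -I can be sketched as a loop over models with 1, 2, … terms taken in -N order. Here chi2 is a hypothetical list of misfits already computed for each model, and critical_ratio is a hypothetical callback returning the F-test threshold. Testing the ratio of successive chi-squared values is an assumption consistent with the Remarks section, not a transcription of GMT's code.

```python
def select_terms(chi2, n_data, critical_ratio):
    """Return how many leading model terms survive an -I style search.

    chi2[k] is the misfit of the model built from the first k + 1 terms (in -N order).
    critical_ratio(nu_old, nu_new) gives the value that chi2_old / chi2_new must
    exceed for the added term to count as significant (hypothetical callback).
    """
    kept = 1
    for k in range(1, len(chi2)):
        nu_old = n_data - k          # degrees of freedom of the model with k terms
        nu_new = n_data - (k + 1)    # degrees of freedom after adding one more term
        if chi2[k] <= 0.0:
            return k + 1             # exact fit: accept the term and stop
        if chi2[k - 1] / chi2[k] <= critical_ratio(nu_old, nu_new):
            break                    # first non-significant improvement ends the search
        kept = k + 1
    return kept
```

In practice critical_ratio would come from an F-distribution quantile at the chosen confidence_level; a constant threshold is enough to show the control flow.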

-T[min/max/]inc[+i|n] |-Tfile|list: Evaluate the best-fit regression model at the equidistant points implied by the arguments. If only -Tinc is given, min and max are reset to the extreme x-values of each segment. To skip the model evaluation entirely, give -T0. For details on array creation, see Generate 1-D Array.
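For the plain -T[min/max/]inc form, the evaluation points can be generated as sketched below. The sketch handles a single segment only and ignores the +i|n modifiers and the file/list forms; evaluation_points is a hypothetical name.

```python
import math


def evaluation_points(xs, inc, tmin=None, tmax=None):
    """Equidistant points from tmin to tmax with step inc; min/max default to the data extremes."""
    if inc == 0:
        return []                    # like -T0: no model evaluation
    lo = min(xs) if tmin is None else tmin
    hi = max(xs) if tmax is None else tmax
    # Small tolerance so round-off does not drop the end point.
    n = int(math.floor((hi - lo) / inc + 1e-9)) + 1
    return [lo + i * inc for i in range(n)]
```

For x data spanning 0 to 3 and an increment of 0.5, this gives the seven points 0, 0.5, …, 3.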

-V[level]: Select verbosity level [w].

-W[+s]: Weights are supplied in input column 3. Do a weighted least squares fit [or start with these weights when doing the iterative robust fit]. Append +s to instead read data uncertainties (one sigma) and create weights as 1/sigma² [Default reads only the first 2 columns].
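For the simplest model, -Np1, a weighted fit reduces to solving 2×2 normal equations. The sketch below uses the hypothetical helper weighted_line_fit, with weights of 1 when no uncertainties are given and 1/sigma² otherwise, as -W+s describes. trend1d itself handles general models; this only shows how the weights enter the sums.

```python
def weighted_line_fit(x, y, sigma=None):
    """Fit y = a + b*x by weighted least squares with weights 1/sigma**2 (or 1 if sigma is None)."""
    w = [1.0] * len(x) if sigma is None else [1.0 / s ** 2 for s in sigma]
    # Weighted sums that make up the normal equations [[sw, sx], [sx, sxx]] [a, b] = [sy, sxy].
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * sxx - sx * sx
    if det == 0:
        raise ValueError("need at least two distinct x values with nonzero weight")
    a = (sxx * sy - sx * sxy) / det
    b = (sw * sxy - sx * sy) / det
    return a, b, w
```

Giving a point a very large sigma makes its weight tiny, so it barely influences the fitted line.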

-birecord[+b|l]: Select native binary format for primary table input. [Default is 2 (or 3 if -W is set) columns].

-borecord[+b|l]: Select native binary format for table output. [Default is 1-5 columns as given by -F].

-d[i|o][+ccol]nodata: Replace input columns that equal nodata with NaN and do the reverse on output.

-e[~]“pattern” | -e[~]/regexp/[i]: Only accept data records that match the given pattern.

-f[i|o]colinfo: Specify data types of input and/or output columns.

-h[i|o][n][+c][+d][+msegheader][+rremark][+ttitle]: Skip or produce header record(s).

-icols[+l][+ddivisor][+sscale|d|k][+ooffset][,…][,t[word]]: Select input columns and transformations (0 is first column, t is trailing text, append word to read one word only).

-q[i|o][~]rows|limits[+ccol][+a|t|s]: Select input or output rows or data limit(s) [all].

-s[cols][+a][+r]: Set handling of NaN records for output.

-wy|a|w|d|h|m|s|cperiod[/phase][+ccol]: Convert an input coordinate to a cyclical coordinate.

-:[i|o]: Swap 1st and 2nd column on input and/or output.

-^ or just -: Print a short message about the syntax of the command, then exit (Note: on Windows just use -).
-+ or just +: Print an extensive usage (help) message, including the explanation of any module-specific option (but not the GMT common options), then exit.
-? or no arguments: Print a complete usage (help) message, including the explanation of all options, then exit.
--PAR=value: Temporarily override a GMT default setting; repeatable. See gmt.conf for parameters.

ASCII Format Precision

The ASCII output formats of numerical data are controlled by parameters in your gmt.conf file. Longitude and latitude are formatted according to FORMAT_GEO_OUT, absolute time is under the control of FORMAT_DATE_OUT and FORMAT_CLOCK_OUT, whereas general floating point values are formatted according to FORMAT_FLOAT_OUT. Be aware that the format in effect can lead to loss of precision in ASCII output, which can lead to various problems downstream. If you find the output is not written with enough precision, consider switching to binary output (-bo if available) or specify more decimals using the FORMAT_FLOAT_OUT setting.
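The loss is easy to reproduce with C-style %g formats, which is the kind of specification FORMAT_FLOAT_OUT takes. The two format strings below are illustrative only and are not claimed to be GMT's defaults.

```python
value = 1234567.891

short = "%.6g" % value     # six significant digits
long_ = "%.12g" % value    # twelve significant digits

print(short)   # 1.23457e+06
print(long_)   # 1234567.891

# Reading the short form back does not recover the original number.
assert float(short) != value
assert float(long_) == value
```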

Remarks

If a polynomial model is included, then the domain of x will be shifted and scaled to [-1, 1] and the basis functions will be Chebyshev polynomials, provided the polynomial is of full order (otherwise we stay with powers of x). The Chebyshev polynomials give a better-conditioned matrix to invert and therefore allow more accurate solutions. The Chebyshev polynomial of degree n has n+1 extrema in [-1, 1], at all of which its value is either -1 or +1. Therefore the magnitudes of the polynomial model coefficients can be compared directly. Note: The stable model coefficients are the Chebyshev coefficients. The corresponding polynomial coefficients in a + bx + cx² + … are also given in verbose mode, but users must realize that they are not stable beyond degree 7 or 8. See Numerical Recipes for more discussion. For evaluating Chebyshev polynomials, see math.
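The shift to [-1, 1] and the Chebyshev basis can be sketched in Python. The helpers below are hypothetical illustrations, not GMT functions: a three-term recurrence for T_n, Clenshaw's recurrence for evaluating a series of Chebyshev coefficients, and a check of the extremal property mentioned above (|T_n| = 1 at the points cos(kπ/n), k = 0..n).

```python
import math


def to_unit_interval(x, xmin, xmax):
    """Shift and scale x from [xmin, xmax] to [-1, 1]."""
    return (2.0 * x - (xmin + xmax)) / (xmax - xmin)


def chebyshev(n, t):
    """T_n(t) via the recurrence T_{k+1} = 2 t T_k - T_{k-1}."""
    if n == 0:
        return 1.0
    t_prev, t_curr = 1.0, t
    for _ in range(1, n):
        t_prev, t_curr = t_curr, 2.0 * t * t_curr - t_prev
    return t_curr


def chebyshev_series(coeffs, t):
    """Evaluate sum_k coeffs[k] * T_k(t) with Clenshaw's recurrence."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + c, b1
    return coeffs[0] + t * b1 - b2


# The n+1 extrema of T_4 in [-1, 1] all have magnitude 1.
extrema = [chebyshev(4, math.cos(k * math.pi / 4)) for k in range(5)]
```

Because every basis function is bounded by 1 in magnitude on [-1, 1], a coefficient's size directly reflects its term's contribution to the model.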

The -N…+r (robust) and -I (iterative) options evaluate the significance of the improvement in model misfit Chi-Squared by an F-test. The default confidence limit is set at 0.51; it can be changed with the -I option. The user may be surprised to find that in most cases the reduction in variance achieved by increasing the number of terms in a model is not significant at a very high degree of confidence. For example, with 120 degrees of freedom, Chi-Squared must decrease by 26% or more to be significant at the 95% confidence level. If you want to keep iterating as long as Chi-Squared is decreasing, set confidence_level to zero.
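To check the 26% figure, the sketch below computes the F-test threshold from scratch with the regularized incomplete beta function, using the continued fraction described in Numerical Recipes. It assumes the test compares the ratio chi2_old/chi2_new against an F distribution with the same number of degrees of freedom for both models. That reading reproduces the numbers quoted above but is an assumption, not GMT's documented formula; all function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3.0e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1.0e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, nu1, nu2):
    """P(F <= f) for an F distribution with (nu1, nu2) degrees of freedom."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(0.5 * nu1, 0.5 * nu2, nu1 * f / (nu1 * f + nu2))


def f_critical(confidence, nu1, nu2):
    """F value whose cumulative probability equals confidence, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, nu1, nu2) < confidence:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, nu1, nu2) < confidence:
            lo = mid
        else:
            hi = mid
    return hi


def required_reduction(confidence, nu):
    """Fractional drop in chi-squared needed when chi2_old / chi2_new is tested as F(nu, nu)."""
    return 1.0 - 1.0 / f_critical(confidence, nu, nu)
```

Under these assumptions, required_reduction(0.95, 120) is about 0.26, while required_reduction(0.51, 120) is below 1%, which is why a level of 0.51 accepts almost any decrease in Chi-Squared.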

A low confidence limit (such as the default value of 0.51) is needed to make the robust method work. This method iteratively reweights the data to reduce the influence of outliers. The weight is based on the Median Absolute Deviation and a formula from Huber [1964], and is 95% efficient when the model residuals have an outlier-free normal distribution. This means that the influence of outliers is reduced only slightly at each iteration; consequently the reduction in Chi-Squared is not very significant. If the procedure needs a few iterations to successfully attenuate their effect, the significance level of the F-test must be kept low.
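One Huber-type reweighting step can be sketched as follows. The tuning constant 1.345 (which gives about 95% efficiency for normally distributed errors) and the 1.4826 MAD-to-sigma factor are standard textbook values assumed here; GMT's exact weight formula may differ, and huber_weights is a hypothetical helper.

```python
import statistics

HUBER_K = 1.345  # assumed tuning constant for ~95% efficiency under normal errors


def huber_weights(residuals, k=HUBER_K):
    """Huber weights for model residuals, scaled by a MAD-based estimate of sigma."""
    med = statistics.median(residuals)
    mad = statistics.median(abs(r - med) for r in residuals)
    scale = 1.4826 * mad                 # converts MAD to sigma for normal data
    if scale == 0.0:
        return [1.0] * len(residuals)    # no spread: leave all weights at 1
    weights = []
    for r in residuals:
        u = abs(r) / scale
        # Full weight inside k scaled units, down-weighted as k/u outside.
        weights.append(1.0 if u <= k else k / u)
    return weights
```

Residuals within about 1.3 scaled units keep full weight, so a clean normal sample is almost untouched, which matches the slow, gentle attenuation of outliers described above.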

Examples

Note: Below are some examples of valid syntax for this module. The examples that use remote files (file names starting with @) can be cut and pasted into your terminal for testing. Other commands requiring input files are just dummy examples of the types of uses that are common but cannot be run verbatim as written.

To remove a linear trend from data.xy by ordinary least squares, use:

gmt trend1d data.xy -Fxr -Np1 > detrended_data.xy

To make the above linear trend robust with respect to outliers, use:

gmt trend1d data.xy -Fxr -Np1+r > detrended_data.xy

To fit the model y(x) = a + bx² + c * cos(2*pi*3*(x/l)) + d * sin(2*pi*3*(x/l)), with l the fundamental period (here l = 15), try:

gmt trend1d data.xy -Fxm -NP0,P2,F3+l15 > model.xy

To find out how many terms (up to 20, say) in a robust Fourier model are significant in fitting data.xy, use:

gmt trend1d data.xy -Nf20+r -I -V

References

Huber, P. J., 1964, Robust estimation of a location parameter, Ann. Math. Stat., 35, 73-101.

Menke, W., 1989, Geophysical Data Analysis: Discrete Inverse Theory, Revised Edition, Academic Press, San Diego.