Michael Thomas Flanagan's Java Scientific Library

Normality Class:     Normality

     

Last update: 30 March 2015

This class contains methods for examining a data set for deviation from normality: the Shapiro-Wilk W test and normal probability plots.
See the ProbabilityPlot class documentation for further details of normal probability plots and see Outliers for outlier detection.

import statement:      import flanagan.analysis.Normality;

SUMMARY OF CONSTRUCTORS AND METHODS

See class ArrayMaths for recasting arrays if the array argument below is not of the type that you require.

Constructors public Normality()
public Normality(double[] data)
public Normality(float[] data)
public Normality(long[] data)
public Normality(int[] data)
public Normality(BigDecimal[] data)
public Normality(BigInteger[] data)
Read data in from a text file public void readDataFromTextFile()
Significance Reset the significance level public void resetSignificance(double significance)
Get the significance level public double getSignificance()
Full analysis public void fullAnalysis()
Shapiro-Wilk W Test
Returns Shapiro-Wilk W value public double shapiroWilkWvalue()
Returns Shapiro-Wilk critical W value Default significance level public double shapiroWilkCriticalW()
Entered significance level public double shapiroWilkCriticalW(double significance)
Entered significance level and number of observations public double shapiroWilkCriticalW(double significance, int nObservations)
Reset number of iterations public void resetNsimulation(int n)
Get number of iterations public int getNsimulation()
Returns Shapiro-Wilk p-value public double shapiroWilkPvalue()
Returns the Shapiro-Wilk coefficients For entered data public double[] shapiroWilkCoeff()
For n observations public double[] shapiroWilkCoeff(int n)
Normal Probability Plot
(Gaussian Probability Plot)

See below for the Standard Normal Probability Plot
Calculate and display plot public void normalProbabilityPlot()
User supplied initial estimates public void normalUserSuppliedInitialEstimates(double mu, double sigma)
Remove user supplied initial estimates public void removeNormalUserSuppliedInitialEstimates()
Correlation coefficient public double normalCorrelationCoefficient()
Gradient value public double normalGradient()
error public double normalGradientError()
Intercept value public double normalIntercept()
error public double normalInterceptError()
μ value public double normalMu()
error public double normalMuError()
σ value public double normalSigma()
error public double normalSigmaError()
Sum of squares public double normalSumOfSquares()
Order Statistic Medians public double[] normalOrderStatisticMedians()
Standard Normal Probability Plot
(Gaussian Probability Plot)

See above for the Two Parameter Normal Probability Plot
Calculate and display plot public void normalStandardProbabilityPlot()
Correlation coefficient public double normalStandardCorrelationCoefficient()
Gradient value public double normalStandardGradient()
error public double normalStandardGradientError()
Intercept value public double normalStandardIntercept()
error public double normalStandardInterceptError()
Sum of squares public double normalStandardSumOfSquares()
Order Statistic Medians public double[] normalStandardOrderStatisticMedians()
Data Get entered data as doubles public double[] getData()
Ordered data public double[] getOrderedData()



CONSTRUCTOR
public Normality(double[] data)
public Normality(float[] data)
public Normality(long[] data)
public Normality(int[] data)
public Normality(BigDecimal[] data)
public Normality(BigInteger[] data)
public Normality()
Usage:                      Normality norm = new Normality(data);
Creates an instance of Normality. The data to be analysed may be entered via the constructor argument data.
Data of types float, long, int, BigDecimal and BigInteger will be converted to double. The conversion of the latter two will trigger a warning message on possible loss of precision. BigDecimal will trigger an exception if it is too large to convert to double. Data may alternatively be read in from a text file; see the constructor immediately below and also readDataFromTextFile().

Usage:                      Normality norm = new Normality();
This constructor creates an instance of Normality but requires that the data be read in from a text file; see readDataFromTextFile().
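
The conversion of BigDecimal data described above can be illustrated with a minimal standalone sketch. It is not the library's code: the class name ToDoubleSketch and both method names are hypothetical, and the use of ArithmeticException for out-of-range values is an assumption, since the exception type thrown by the library is not stated here.

```java
import java.math.BigDecimal;
import java.util.Arrays;

class ToDoubleSketch {

    // Convert each BigDecimal to double; refuse values outside the double range.
    // (Assumption: the exception type used by the library may differ.)
    static double[] toDoubles(BigDecimal[] values) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            double d = values[i].doubleValue();
            if (Double.isInfinite(d)) {
                throw new ArithmeticException("value too large for double at index " + i);
            }
            out[i] = d;
        }
        return out;
    }

    // True if the nearest double does not represent the value exactly.
    static boolean losesPrecision(BigDecimal value) {
        double d = value.doubleValue();
        if (Double.isInfinite(d)) {
            return true;
        }
        return new BigDecimal(d).compareTo(value) != 0;
    }

    public static void main(String[] args) {
        BigDecimal[] data = { new BigDecimal("0.5"), new BigDecimal("0.1") };
        System.out.println(Arrays.toString(toDoubles(data)));
        System.out.println(losesPrecision(data[0])); // 0.5 is exactly representable in binary
        System.out.println(losesPrecision(data[1])); // 0.1 is not
    }
}
```

A check of this kind is one way to decide when a loss-of-precision warning is worth printing.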



READ IN THE DATA FROM A TEXT FILE
public void readDataFromTextFile()
Usage:                      norm.readDataFromTextFile();
This method allows the data to be analysed to be read in from a text file. On calling this method an Open File dialogue window will be displayed allowing the directories to be searched for the relevant file and for the selection of that file.
The file must be a text file, i.e. a .txt file. The data points must be separated by either one or more spaces, a comma, a semicolon, a colon, a tab or a combination of these. The data is read sequentially along each line of the file. The data may be contained in one or more lines or in a single column. Multiple lines may contain different numbers of points.
This method is best used with the constructor with no data argument (see immediately above). If data has already been entered this method will overwrite the previously entered data.
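
The file format described above (values separated by spaces, commas, semicolons, colons, tabs or a combination, on one or more lines) can be parsed along the following lines. This is only a sketch of the format, not the library's reader: the class DelimitedDataSketch and its parse method are hypothetical names, and the text is passed in as a String rather than chosen through a file dialogue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DelimitedDataSketch {

    // Split on any run of whitespace (including line breaks), commas, semicolons or colons.
    static double[] parse(String text) {
        List<Double> values = new ArrayList<>();
        for (String token : text.split("[\\s,;:]+")) {
            if (!token.isEmpty()) {
                values.add(Double.parseDouble(token));
            }
        }
        double[] out = new double[values.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = values.get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two lines with different numbers of points and mixed separators.
        String text = "1.5, 2.5;3\n4:5\t6";
        System.out.println(Arrays.toString(parse(text)));
    }
}
```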



SIGNIFICANCE
The default value of the significance level used by the normality tests in this class is 0.05 [5%].
This value may be reset to a different user chosen value.

Resetting the significance level
public void resetSignificance(double significance)
Usage:                      norm.resetSignificance(significance);
This method resets the significance level to a user supplied value significance.

Getting the significance level
public double getSignificance()
Usage:                      significance = norm.getSignificance();
This method returns the current significance level.



FULL ANALYSIS

public void fullAnalysis()
Usage:                      norm.fullAnalysis();
This method performs a normality check using both a Shapiro-Wilk W test and a normal probability plot and prints the analysis to a text file.
A typical output is shown below:





SHAPIRO-WILK W TEST

This section describes the application of the Shapiro-Wilk W Test in checking for departures from normality in a data set.
The Shapiro-Wilk W test statistic is defined as

$$W = \frac{\left(\sum_{i=1}^{n} a_i y_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}$$

where ȳ is the sample data mean, the yi are the entered data sorted into ascending order, n is the number of data points and the Shapiro-Wilk coefficients, ai, are described below.
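
As a rough illustration, W is the squared weighted sum of the ordered data divided by the sum of squared deviations from the mean. The sketch below assumes the coefficients are supplied as a full array of n antisymmetric values; the layout of the array returned by shapiroWilkCoeff() is not specified here, and the class and method names are hypothetical. For n = 3 the coefficients are known exactly, (-1/√2, 0, 1/√2), which gives a small check.

```java
import java.util.Arrays;

class ShapiroWilkWSketch {

    // W = (sum of a[i] * y[i])^2 / sum of (y[i] - mean)^2, with y sorted ascending.
    // Assumption: a holds one coefficient per ordered data point.
    static double w(double[] data, double[] a) {
        if (data.length != a.length) {
            throw new IllegalArgumentException("one coefficient per data point is required");
        }
        double[] y = data.clone();
        Arrays.sort(y);
        int n = y.length;
        double mean = 0.0;
        for (double v : y) {
            mean += v;
        }
        mean /= n;
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < n; i++) {
            numerator += a[i] * y[i];
            denominator += (y[i] - mean) * (y[i] - mean);
        }
        return numerator * numerator / denominator;
    }

    public static void main(String[] args) {
        double r = Math.sqrt(0.5);
        double[] a3 = { -r, 0.0, r };           // exact coefficients for n = 3
        System.out.println(w(new double[] { 3.0, 1.0, 2.0 }, a3)); // evenly spaced points
        System.out.println(w(new double[] { 1.0, 2.0, 4.0 }, a3)); // unevenly spaced points
    }
}
```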

Shapiro-Wilk W value
public double shapiroWilkWvalue()
Usage:                      w = norm.shapiroWilkWvalue();
This method returns the Shapiro-Wilk W test statistic, W, for the data either entered via a constructor or read in from a text file.

Shapiro-Wilk W critical value
Default significance level
public double shapiroWilkCriticalW()
Usage:                      wcrit = norm.shapiroWilkCriticalW();
This method returns the critical value of the Shapiro-Wilk W test statistic, Wcrit, at the default significance level for the data either entered via a constructor or read in from a text file.
The null hypothesis of the Shapiro-Wilk test is that the samples are taken from a normal distribution. Small values of W indicate a departure from normality, so if
      W < Wcrit
the null hypothesis may be rejected.
Wcrit is calculated using q simulations of the calculation of W, in which each of the q data sets is generated as n random normal deviates. The q simulated values of W are sorted and Wcrit is taken as the ordered value corresponding to the significance level probability. The default value of q is 10000 but this may be reset (see below).
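
The simulation described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the class and method names are hypothetical, the simulated values are sorted in ascending order and the value at position floor(αq) is taken, and the demonstration is restricted to n = 3, where the Shapiro-Wilk coefficients are known exactly (-1/√2, 0, 1/√2). Other sample sizes need the Royston coefficients described later in this page.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.ToDoubleFunction;

class CriticalValueSketch {

    // W for exactly three observations, using the exact n = 3 coefficients.
    static double w3(double[] data) {
        double[] y = data.clone();
        Arrays.sort(y);
        double mean = (y[0] + y[1] + y[2]) / 3.0;
        double numerator = (y[2] - y[0]) / Math.sqrt(2.0);
        double denominator = 0.0;
        for (double v : y) {
            denominator += (v - mean) * (v - mean);
        }
        return numerator * numerator / denominator;
    }

    // Simulate q samples of n standard normal deviates, sort the statistic values
    // and return the one at the position given by the significance level alpha.
    static double criticalValue(ToDoubleFunction<double[]> statistic,
                                int n, int q, double alpha, long seed) {
        Random rng = new Random(seed);
        double[] simulated = new double[q];
        for (int s = 0; s < q; s++) {
            double[] sample = new double[n];
            for (int i = 0; i < n; i++) {
                sample[i] = rng.nextGaussian();
            }
            simulated[s] = statistic.applyAsDouble(sample);
        }
        Arrays.sort(simulated);
        int index = (int) Math.floor(alpha * q);
        index = Math.min(Math.max(index, 0), q - 1);
        return simulated[index];
    }

    public static void main(String[] args) {
        double wCrit = criticalValue(CriticalValueSketch::w3, 3, 10000, 0.05, 1L);
        System.out.println("Simulated Wcrit (n = 3, alpha = 0.05): " + wCrit);
    }
}
```

Because the value comes from random simulation, it changes slightly with the seed and with q; a larger q reduces that scatter at the cost of run time.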

Entered significance level
public double shapiroWilkCriticalW(double significance)
Usage:                      wcrit = norm.shapiroWilkCriticalW(significance);
This method returns the critical value of the Shapiro-Wilk W test statistic, Wcrit, at the significance level entered as the argument significance, for the data either entered via a constructor or read in from a text file. Otherwise the method is as described above for shapiroWilkCriticalW().

Entered significance level and number of data points
public double shapiroWilkCriticalW(double significance, int nPoints)
Usage:                      wcrit = norm.shapiroWilkCriticalW(significance, nPoints);
This method returns the critical value of the Shapiro-Wilk W test statistic, Wcrit, for nPoints data points at the significance level entered as the argument significance. Otherwise the method is as described above for shapiroWilkCriticalW().

Reset number of iterations
public void resetNsimulation(int n)
Usage:                      norm.resetNsimulation(n);
This method allows the default number of iterations, used in the calculation of Wcrit and of the Shapiro-Wilk p-value, to be reset to the argument n. The preset default value is 10000.

Return the number of iterations
public int getNsimulation()
Usage:                      n = norm.getNsimulation();
This method returns the number of iterations used in the calculation of Wcrit and of the Shapiro-Wilk p-value.

Shapiro-Wilk p-value
public double shapiroWilkPvalue()
Usage:                      p = norm.shapiroWilkPvalue();
This method returns the p-value, p, of the Shapiro-Wilk W test statistic for the data either entered via a constructor or read in from a text file.
The null hypothesis of the Shapiro-Wilk test is that the samples are taken from a normal distribution. If
      p < α
where α is the significance level, the null hypothesis may be rejected.
The p-value is calculated from the simulation values described in the calculation of Wcrit above.
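
One plausible reading of this calculation, given as a sketch only, is that the p-value is the fraction of simulated W values that are no larger than the observed W. The exact counting rule used by the library is not stated here, so the rule below is an assumption, and the class and method names are hypothetical.

```java
class SimulatedPValueSketch {

    // Fraction of simulated statistic values less than or equal to the observed one.
    // (Assumption: small W counts as evidence against normality.)
    static double pValue(double[] simulated, double observed) {
        int count = 0;
        for (double s : simulated) {
            if (s <= observed) {
                count++;
            }
        }
        return (double) count / simulated.length;
    }

    public static void main(String[] args) {
        double[] simulated = { 0.80, 0.85, 0.90, 0.95 }; // illustrative values, not real output
        System.out.println(pValue(simulated, 0.86));
    }
}
```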

Shapiro-Wilk coefficients
The Shapiro-Wilk coefficients, ai, are calculated by the method of Patrick Royston [Statistics and Computing (1992) 2, 117-119].
The algorithm used in this library is a translation into Java of the coefficient calculation section of the FORTRAN Algorithm AS R94 Appl. Statist. (1995) Vol.44, No.4.

Shapiro-Wilk coefficients for the entered data
public double[] shapiroWilkCoeff()
Usage:                      coeff = norm.shapiroWilkCoeff();
This method returns the Shapiro-Wilk coefficients for n points where n is the number of data points in the data entered via a constructor or read in from a text file.

Shapiro-Wilk coefficients for n observations
public double[] shapiroWilkCoeff(int nPoints)
Usage:                      coeff = norm.shapiroWilkCoeff(nPoints);
This method returns the Shapiro-Wilk coefficients for n points where n is entered as the argument nPoints.




NORMAL PROBABILITY PLOTS (GAUSSIAN PROBABILITY PLOTS)

This section describes methods for plotting and analysing Normal Probability Plots where the Normal probability density function is defined as

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Naming note: normal may be replaced by gaussian in the names of all these methods, e.g. gaussianProbabilityPlot() performs the same function as normalProbabilityPlot()

Calculate and Display a Normal Probability Plot
public void normalProbabilityPlot()
Usage:                      norm.normalProbabilityPlot();
This method calculates and displays a Normal probability plot, i.e. a plot of the data entered via the Constructor (ordinate) against the corresponding Normal order statistic medians (abscissa), and the best fit straight line. The data is first sorted into ascending order.

User supplied initial estimates
public void normalUserSuppliedInitialEstimates(double mu, double sigma)
Usage:                      norm.normalUserSuppliedInitialEstimates(mu, sigma);
This method allows the user to override the above method's [normalProbabilityPlot()] automatic calculation of the initial estimates of μ and σ and enter their own initial estimates of μ [mu] and σ [sigma]. If this method is required it must be called before normalProbabilityPlot() is called. See the warning note 1 for an explanation of the possible need for this method.

Remove user supplied initial estimates
public void removeNormalUserSuppliedInitialEstimates()
Usage:                      norm.removeNormalUserSuppliedInitialEstimates();
This method removes initial estimates supplied by the user via the method normalUserSuppliedInitialEstimates(mu, sigma) [see immediately above]. The automatic calculation of the initial estimates of μ and σ by normalProbabilityPlot() is restored on calling this method.

Return the Correlation Coefficient
public double normalCorrelationCoefficient()
Usage:                      rho = norm.normalCorrelationCoefficient();
This method returns the correlation coefficient of the data entered via the Constructor and the corresponding Normal order statistic medians. The data is first sorted into an ascending order.
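
A minimal sketch of such a correlation coefficient is given below, assuming the usual Pearson product-moment formula and assuming the order statistic medians are already available as an ascending array. The class and method names are hypothetical.

```java
import java.util.Arrays;

class PlotCorrelationSketch {

    // Pearson correlation between the sorted data and the supplied medians.
    static double correlation(double[] data, double[] medians) {
        if (data.length != medians.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double[] y = data.clone();
        Arrays.sort(y);                       // the data is sorted first
        int n = y.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += medians[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxx = 0.0;
        double syy = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = medians[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] medians = { -1.0, 0.0, 1.0 };   // illustrative abscissa values
        double[] data = { 6.0, 2.0, 4.0 };       // unsorted on purpose
        System.out.println(correlation(data, medians));
    }
}
```

A value close to 1 indicates that the ordered data lie close to a straight line when plotted against the medians.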

Return the Gradient
public double normalGradient()
Usage:                      gradient = norm.normalGradient();
This method returns the gradient of the best straight line fit to the Normal probability plot, i.e. a plot of the data entered via the Constructor (ordinate) against the corresponding Normal order statistic medians (abscissa). The data is first sorted into an ascending order.

Return the Estimated Error of the Gradient
public double normalGradientError()
Usage:                      gradientError = norm.normalGradientError();
This method returns the estimated error of the gradient of the best straight line fit to the Normal probability plot, i.e. a plot of the data entered via the Constructor (ordinate) against the corresponding Normal order statistic medians (abscissa). The data is first sorted into an ascending order. The error is obtained from the linear regression.

Return the Intercept
public double normalIntercept()
Usage:                      intercept = norm.normalIntercept();
This method returns the intercept of the best straight line fit to the Normal probability plot, i.e. a plot of the data entered via the Constructor (ordinate) against the corresponding Normal order statistic medians (abscissa). The data is first sorted into an ascending order.

Return the Estimated Error of the Intercept
public double normalInterceptError()
Usage:                      interceptError = norm.normalInterceptError();
This method returns the estimated error of the intercept of the best straight line fit to the Normal probability plot, i.e. a plot of the data entered via the Constructor (ordinate) against the corresponding Normal order statistic medians (abscissa). The data is first sorted into an ascending order. The error is obtained from the linear regression.
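
The gradient, the intercept and their estimated errors can be illustrated with the textbook formulas for an unweighted least squares straight line. This is a sketch under that assumption; the regression code actually used by the library is not shown here, and the class and method names are hypothetical.

```java
class LineFitSketch {

    // Returns {gradient, intercept, gradientError, interceptError} for y = intercept + gradient * x.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        if (n < 3 || y.length != n) {
            throw new IllegalArgumentException("need at least 3 points and equal array lengths");
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxx = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - meanX) * (x[i] - meanX);
            sxy += (x[i] - meanX) * (y[i] - meanY);
        }
        double gradient = sxy / sxx;
        double intercept = meanY - gradient * meanX;
        // Residual variance with n - 2 degrees of freedom.
        double ssr = 0.0;
        for (int i = 0; i < n; i++) {
            double residual = y[i] - (intercept + gradient * x[i]);
            ssr += residual * residual;
        }
        double s2 = ssr / (n - 2);
        double gradientError = Math.sqrt(s2 / sxx);
        double interceptError = Math.sqrt(s2 * (1.0 / n + meanX * meanX / sxx));
        return new double[] { gradient, intercept, gradientError, interceptError };
    }

    public static void main(String[] args) {
        double[] result = fit(new double[] { 0.0, 1.0, 2.0 }, new double[] { 0.0, 1.0, 1.0 });
        System.out.println("gradient = " + result[0] + " +/- " + result[2]);
        System.out.println("intercept = " + result[1] + " +/- " + result[3]);
    }
}
```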

Return normal parameter μ
public double normalMu()
Usage:                      mu = norm.normalMu();
This method returns the Normal parameter, μ, obtained from the best fit Probability Plot, calculated as the one with the minimum sum of squares of the differences between the data values entered via the Constructor and the corresponding Normal order statistic median values. The data is first sorted into ascending order. See also Warning.


Return the Estimated Error of the normal parameter μ
public double normalMuError()
Usage:                      muError = norm.normalMuError();
This method returns the estimated error of the Normal parameter, μ. The error is obtained as the square root of the appropriate diagonal of a covariance matrix obtained as the inverse of the numerically calculated matrix of second derivatives of the sum of squares with respect to μ and σ. See also Warning.

Return normal parameter σ
public double normalSigma()
Usage:                      sigma = norm.normalSigma();
This method returns the Normal parameter, σ, obtained from the best fit Probability Plot, calculated as the one with the minimum sum of squares of the differences between the data values entered via the Constructor and the corresponding Normal order statistic median values. The data is first sorted into ascending order. See also Warning.


Return the Estimated Error of the normal parameter σ
public double normalSigmaError()
Usage:                      sigmaError = norm.normalSigmaError();
This method returns the estimated error of the Normal parameter, σ. The error is obtained as the square root of the appropriate diagonal of a covariance matrix obtained as the inverse of the numerically calculated matrix of second derivatives of the sum of squares with respect to μ and σ. See also Warning.
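
The error estimates for μ and σ are described above as square roots of the diagonal elements of the inverse of a numerically calculated matrix of second derivatives. The sketch below shows that calculation for a generic two parameter function, using central finite differences. It is an illustration only: the step size h, the class and method names, and the absence of any extra scaling factor (such as a residual variance) are assumptions, since the library's own details are not given here.

```java
import java.util.function.DoubleBinaryOperator;

class HessianErrorSketch {

    // Returns {error of mu, error of sigma} from the inverse of the 2 x 2 matrix
    // of second derivatives of f, evaluated at (mu, sigma) by central differences.
    static double[] parameterErrors(DoubleBinaryOperator f, double mu, double sigma, double h) {
        double f0 = f.applyAsDouble(mu, sigma);
        double hMuMu = (f.applyAsDouble(mu + h, sigma) - 2.0 * f0
                + f.applyAsDouble(mu - h, sigma)) / (h * h);
        double hSigSig = (f.applyAsDouble(mu, sigma + h) - 2.0 * f0
                + f.applyAsDouble(mu, sigma - h)) / (h * h);
        double hMuSig = (f.applyAsDouble(mu + h, sigma + h) - f.applyAsDouble(mu + h, sigma - h)
                - f.applyAsDouble(mu - h, sigma + h) + f.applyAsDouble(mu - h, sigma - h))
                / (4.0 * h * h);
        double det = hMuMu * hSigSig - hMuSig * hMuSig;
        if (det <= 0.0 || hMuMu <= 0.0) {
            throw new ArithmeticException("second derivative matrix is not positive definite");
        }
        // Diagonal of the inverse of a symmetric 2 x 2 matrix.
        double covMuMu = hSigSig / det;
        double covSigSig = hMuMu / det;
        return new double[] { Math.sqrt(covMuMu), Math.sqrt(covSigSig) };
    }

    public static void main(String[] args) {
        // Illustrative quadratic with its minimum at mu = 1, sigma = 2.
        DoubleBinaryOperator f = (m, s) -> 2.0 * (m - 1.0) * (m - 1.0) + 8.0 * (s - 2.0) * (s - 2.0);
        double[] errors = parameterErrors(f, 1.0, 2.0, 1e-3);
        System.out.println(errors[0] + " " + errors[1]);
    }
}
```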

Return the Sum of Squares
public double normalSumOfSquares()
Usage:                      ss = norm.normalSumOfSquares();
This method returns the unweighted sum of squares of the differences between the data values entered via the Constructor and the corresponding Normal order statistic median values. The data is first sorted into ascending order.

Return the Normal Order Statistic Medians
public double[] normalOrderStatisticMedians()
Usage:                      gsom = norm.normalOrderStatisticMedians();
This method returns the Normal order statistic medians used in the Probability Plot.
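
For orientation, one common way of obtaining order statistic medians is Filliben's approximation to the uniform order statistic medians, followed by the inverse of the Normal cumulative distribution function. Whether the library uses this formula is not stated here, so the sketch below is an assumption throughout. The class and method names are hypothetical, the cumulative distribution function uses the Abramowitz and Stegun polynomial approximation 26.2.17, and the inverse is found by simple bisection.

```java
import java.util.Arrays;

class OrderStatisticMediansSketch {

    // Filliben's approximation to the medians of the n uniform order statistics.
    static double[] uniformMedians(int n) {
        double[] m = new double[n];
        for (int i = 1; i <= n; i++) {
            m[i - 1] = (i - 0.3175) / (n + 0.365);
        }
        m[n - 1] = Math.pow(0.5, 1.0 / n);
        m[0] = 1.0 - m[n - 1];
        return m;
    }

    // Standard normal CDF, Abramowitz and Stegun 26.2.17 (absolute error below 1e-7).
    static double normalCdf(double x) {
        double t = 1.0 / (1.0 + 0.2316419 * Math.abs(x));
        double poly = t * (0.319381530 + t * (-0.356563782
                + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
        double upperTail = Math.exp(-0.5 * x * x) / Math.sqrt(2.0 * Math.PI) * poly;
        return x >= 0.0 ? 1.0 - upperTail : upperTail;
    }

    // Inverse of the Normal CDF with mean mu and standard deviation sigma, by bisection.
    static double normalQuantile(double p, double mu, double sigma) {
        double lo = -10.0;
        double hi = 10.0;
        for (int k = 0; k < 80; k++) {
            double mid = 0.5 * (lo + hi);
            if (normalCdf(mid) < p) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return mu + sigma * 0.5 * (lo + hi);
    }

    // Normal order statistic medians for n points.
    static double[] normalMedians(int n, double mu, double sigma) {
        double[] u = uniformMedians(n);
        double[] z = new double[n];
        for (int i = 0; i < n; i++) {
            z[i] = normalQuantile(u[i], mu, sigma);
        }
        return z;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalMedians(5, 0.0, 1.0)));
    }
}
```

Setting mu = 0 and sigma = 1 gives the medians for the standard Normal case.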



STANDARD NORMAL PROBABILITY PLOT
See above for the Two Parameter Normal Probability Plot

This section describes methods for plotting and analysing Standard Normal Probability Plots where the Standard Normal probability density function is defined as

$$f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$$

Naming note: normalStandard may be replaced by gaussianStandard in the names of all these methods, e.g. gaussianStandardProbabilityPlot() performs the same function as normalStandardProbabilityPlot().

Calculate and Display a Standard Normal Probability Plot
public void normalStandardProbabilityPlot()
Usage:                      norm.normalStandardProbabilityPlot();
This method calculates and displays a Standard Normal probability plot, i.e. a plot of the data entered via the Constructor (ordinate) against the corresponding Standard Normal order statistic medians (abscissa), and the best fit straight line. The data is first sorted into ascending order.

Return the Correlation Coefficient
public double normalStandardCorrelationCoefficient()
Usage:                      rho = norm.normalStandardCorrelationCoefficient();
This method returns the correlation coefficient of the data entered via the Constructor and the corresponding Standard Normal order statistic medians. The data is first sorted into an ascending order.

Return the Gradient
public double normalStandardGradient()
Usage:                      gradient = norm.normalStandardGradient();
This method returns the gradient of the best straight line fit to the Standard Normal probability plot, i.e. a plot of the data entered via the Constructor (ordinate) against the corresponding Standard Normal order statistic medians (abscissa). The data is first sorted into an ascending order.

Return the Estimated Error of the Gradient
public double normalStandardGradientError()
Usage:                      gradientError = norm.normalStandardGradientError();
This method returns the estimated error of the gradient of the best straight line fit to the Standard Normal probability plot, i.e. a plot of the data entered via the Constructor (ordinate) against the corresponding Standard Normal order statistic medians (abscissa). The data is first sorted into an ascending order. The error is obtained from the linear regression.

Return the Intercept
public double normalStandardIntercept()
Usage:                      intercept = norm.normalStandardIntercept();
This method returns the intercept of the best straight line fit to the Standard Normal probability plot, i.e. a plot of the data entered via the Constructor (ordinate) against the corresponding Standard Normal order statistic medians (abscissa). The data is first sorted into an ascending order.

Return the Estimated Error of the Intercept
public double normalStandardInterceptError()
Usage:                      interceptError = norm.normalStandardInterceptError();
This method returns the estimated error of the intercept of the best straight line fit to the Standard Normal probability plot, i.e. a plot of the data entered via the Constructor (ordinate) against the corresponding Standard Normal order statistic medians (abscissa). The data is first sorted into an ascending order. The error is obtained from the linear regression.

Return the Sum of Squares
public double normalStandardSumOfSquares()
Usage:                      ss = norm.normalStandardSumOfSquares();
This method returns the unweighted sum of squares of the differences between the data values entered via the Constructor and the corresponding Standard Normal order statistic median values. The data is first sorted into ascending order.

Return the Standard Normal Order Statistic Medians
public double[] normalStandardOrderStatisticMedians()
Usage:                      gsom = norm.normalStandardOrderStatisticMedians();
This method returns the Standard Normal order statistic medians used in the Probability Plot.





DATA

Original data
public double[] getData()
Usage:                      data = norm.getData();
This method returns the entered data as a double[] array in the order in which the data was entered.

Ordered data
public double[] getOrderedData()
Usage:                      orddata = norm.getOrderedData();
This method returns the entered data as a double[] array sorted into ascending order.



OTHER CLASSES USED BY THIS CLASS

This class uses the following classes in this library:


PERMISSION TO USE

Permission to use, copy and modify this software and its documentation for NON-COMMERCIAL purposes is granted, without fee, provided that an acknowledgement to the author, Dr Michael Thomas Flanagan at www.ee.ucl.ac.uk/~mflanaga, appears in all copies and associated documentation or publications.

Public listing of the source codes on the internet is not permitted.

Redistribution of the source codes or of the flanagan.jar file is not permitted.

Redistribution in binary form of all or parts of these classes is not permitted.

Dr Michael Thomas Flanagan makes no representations about the suitability or fitness of the software for any or for a particular purpose. Dr Michael Thomas Flanagan shall not be liable for any damages suffered as a result of using, modifying or distributing this software or its derivatives.



This page was prepared by Dr Michael Thomas Flanagan