13. MTfit.inversion: Inversion object¶
The main MTfit
inversion object is used to handle the forward model evaluation and process the results. The object initialisation and forward functions are documented here, although there are additional (private and special) functions that are documented in the source code.
-
class
MTfit.inversion.
Inversion
(data={}, data_file=False, location_pdf_file_path=False, algorithm='iterate', parallel=True, n=0, phy_mem=8, dc=False, **kwargs)[source]¶ Main Inversion object
Runs the MT inversion, follwing a parameterisation set on initialisation. Parameters can be set for the algorithm to use in the kwargs.
Algorithm options are:
- Time - Monte Carlo random sampling - runs until time limit reached.
- Iterate - Monte Carlo random sampling - runs until sample limit reached.
- McMC - Markov chain Monte Carlo sampling.
- TransDMcMC - Markov chain Monte Carlo sampling.
These are discussed in more detail in the MTfit.algorithms documentation.
The inversion is run by calling the forward function (MTfit.inversion.Inversion.forward)
Data Format
The inversion expects a python dictionary of the data in the format:
>>data={'PPolarity':{'Measured':numpy.matrix([[-1],[-1]...]), 'Error':numpy.matrix([[0.01],[0.02],...]), 'Stations':{'Name':[Station1,Station2,...] 'Azimuth':numpy.matrix([[248.0],[122.3]...]), 'TakeOffAngle':numpy.matrix([[24.5],[2.8]...]) } }, 'PSHAmplitudeRatio':{...}, ... 'UID':'Event1' }
The initial key arguments correspond to the data types that can be used in the inversion. The inversion uses polarity observations and amplitude ratios, and can also use relative amplitudes. This means that the useable data types are:
Polarity
- PPolarity
- SHPolarity
- SVPolarity
Polarity observations made manually or automatically. The corresponding data dictionary for polarities needs the following keys:
Measured: numpy matrix of polarity observations
Error: numpy matrix of fractional uncertainty in the polarity observations.
Stations: dictionary of station information with keys:
- Name: list of station names
- Azimuth: numpy matrix of azimuth values in degrees
- TakeOffAngle: numpy matrix of take off angle values in degrees - 0 down (NED coordinate system).
As with all of these dictionaries, the indexes of the observations and errors must correspond to the stations, i.e. data['Measured'][0,0] -> from data['Stations']['Name'][0] with error data['Error'][0,:] etc.
If polarity probabilities are being used, the keys are:
- PPolarityProbability
- SHPolarityProbability
- SVPolarityProbability
With a similar structure to the Polarity data, except the measured matrix has an additional dimension, i.e. data['Measured'][0,0] is the positive polarity probability and data['Measured'][0,1] is the negative polarity probability.
Amplitude ratios
- P/SHRMSAmplitudeRatio
- P/SVRMSAmplitudeRatio
- SH/SVRMSAmplitudeRatio
- P/SHQRMSAmplitudeRatio
- P/SVQRMSAmplitudeRatio
- SH/SVQRMSAmplitudeRatio
- P/SHAmplitudeRatio
- P/SVAmplitudeRatio
- SH/SVAmplitudeRatio
- P/SHQAmplitudeRatio
- P/SVQAmplitudeRatio
- SH/SVQAmplitudeRatio
Amplitude ratio observations made manually or automatically. The Q is not necessary but is useful to label the amplitudes with a Q correction. The corresponding data dictionary for amplitude ratios needs the following keys:
Measured: numpy matrix of corrected numerator and denominator amplitude ratio observations, needs to have two columns, one for the numerator and one for the denominator.
Error: numpy matrix of uncertainty (standard deviation) in the amplitude ratio observations, needs to have two columns, one for the numerator and one for the denominator.
Stations: dictionary of station information with keys:
- Name: list of station names
- Azimuth: numpy matrix of azimuth values in degrees
- TakeOffAngle: numpy matrix of take off angle values in degrees - 0 down (NED coordinate system).
As with all of these dictionaries, the indexes of the observations and errors must correspond to the stations, i.e. data['Measured'][0,0] -> from data['Stations']['Name'][0] with error data['Error'][0,:] etc.
Relative Amplitude Ratios
- PAmplitude
- SHAmplitude
- SVAmplitude
- PQAmplitude
- SHQAmplitude
- SVQAmplitude
- PRMSAmplitude
- SHRMSAmplitude
- SVRMSAmplitude
- PQRMSAmplitude
- SHQRMSAmplitude
- SVQRMSAmplitude
Relative Amplitude ratios use amplitude observations for different events made manually or automatically. The Q is not necessary but is useful to label the amplitudes with a Q correction. The corresponding data dictionary for amplitude ratios needs the following keys:
Measured: numpy matrix of amplitude observations for the event.
Error: numpy matrix of uncertainty (standard deviation) in the amplitude observations.
Stations: dictionary of station information with keys:
- Name: list of station names
- Azimuth: numpy matrix of azimuth values in degrees
- TakeOffAngle: numpy matrix of take off angle values in degrees - 0 down (NED coordinate system).
As with all of these dictionaries, the indexes of the observations and errors must correspond to the stations, i.e. data['Measured'][0,0] -> from data['Stations']['Name'][0] with error data['Error'][0,:] etc.
Angle Scatter Format The angle scatter files can be generated using a utility Scat2Angle based on the NonLinLoc angle code. The angle scatter file is a text file with samples separated by blank lines. The expected format is for samples from the location PDF that have been converted into take off and azimuth angles (in degrees) for the stations, along with a probability value. It is important that the samples are drawn from the location PDF as a Monte Carlo based integration approach is used for marginalising over the uncertainty.
It is possible to use XYZ2Angle to samples drawn from a location PDF using the NonLinLoc angle approach.
Expected format is:
Probability StationName Azimuth TakeOffAngle StationName Azimuth TakeOffAngle Probability . . .
With TakeOffAngle as 0 down (NED coordinate system). e.g.:
504.7 S0271 231.1 154.7 S0649 42.9 109.7 S0484 21.2 145.4 S0263 256.4 122.7 S0142 197.4 137.6 S0244 229.7 148.1 S0415 75.6 122.8 S0065 187.5 126.1 S0362 85.3 128.2 S0450 307.5 137.7 S0534 355.8 138.2 S0641 14.7 120.2 S0155 123.5 117 S0162 231.8 127.5 S0650 45.9 108.2 S0195 193.8 147.3 S0517 53.7 124.2 S0004 218.4 109.8 S0588 12.9 128.6 S0377 325.5 165.3 S0618 29.4 120.5 S0347 278.9 149.5 S0529 326.1 131.7 S0083 223.7 118.2 S0595 42.6 117.8 S0236 253.6 118.6 502.7 S0271 233.1 152.7 S0649 45.9 101.7 S0484 25.2 141.4 S0263 258.4 120.7 . . .
Initialisation of inversion object
Parameters: - data (dict/list) -- Dictionary or list of dictionaries containing data for inversion. Can be ignored if a data_file is passed as an argument (for data format, see below).
- data_file (str) -- Path or list of file paths containing (binary) pickled data dictionaries.
- location_pdf_file_path (str) -- Path or list of file paths to angle scatter files (for file format, see above - other format extensions can be added using setuptools entry points).
- algorithm (str) -- ['iterate'] algorithm selector
- parallel (bool) -- [True] Run the inversion in parallel using multiprocessing
- n (int) -- [0] Number of workers, default is to use as many as returned by multiprocessing.cpu_count
- phy_mem (int) -- [8] Estimated physical memory to use (used for determining array sizes, it is likely that more memory will be used, and if so no errors are forced). On python versions <2.7.4 there is a bug (http://bugs.python.org/issue13555) with pickle that limits the total number of samples when running in parallel, so large (>20GB) phy_mem allocations per process are ignored.
- dc (bool) -- [False] Boolean flag as to run inversion constrained to double-couple or allowed to explore the full moment tensor space.
Keyword Arguments: - number_stations (int) -- [0] Used for estimating sample sizes in the Monte Carlo random sampling algorithms (Time,Iterate) if set.
- number_location_samples (int) -- [0] Used for estimating sample sizes in the Monte Carlo random sampling algorithms (Time,Iterate) if set.
- path (str) -- File path for output. Default is current working dir (interactive) or PBS workdir.
- fid (str) -- File name root for output, default is to use MTfitOutput or the event UID if set in the data dictionary.
- inversion_options (list) -- List of data types to be used in the iversion, if not set, the inversion uses all the data types in the data dictionary, irrespective of independence.
- diagnostic_output (bool) -- [False] Boolean flag to output diagnostic information in MATLAB save file.
- marginalise_relative (bool) -- [False] Boolean flag to marginalise over location/model uncertainty during relative amplitude inversion.
- mpi (bool) -- [False] Boolean flag to run using MPI from mpi4py (useful on cluster).
- multiple_events (bool) -- [False] Boolean flag to run all the events in the inversion in one single joint inversion.
- relative_amplitude (bool) -- [False] Boolean flag to run multiple_events including relative amplitude inversion.
- output_format (str) -- ['matlab'] str format style.
- minimum_number_intersections (int) -- [2] Integer minimum number of station intersections between events in relative amplitude inversion.
- quality_check (bool) -- [False] Boolean flag/Float value for maximum non-zero percentage check (stops inversion if event quality is poor)
- recover (bool) -- [False] Tries to recover pre-existing inversions, and does not re-run if an output file exists and is readable.
- file_sample (bool) -- [False] Saves samples to a mat file (allowing for easier recovery and reduced memory requirements but can slow the inversion down as requires disk writing every non-zero iteration. Only works well for recovery with random sampling methods)
- results_format (str) -- ['full_pdf'] Data format for the output
- marginalise_relative -- [False] Boolean flag to marginalise the location uncertainty between absolute and relative data.
- file_sample -- [False] save samples to file rather than keeping in memory (not necessary in normal use).
- normalise (bool) -- [True] normalise output Probability (doesn't affect ln_pdf output).
- convert (bool) -- Convert output moment tensors to Tape parameters, Hudson u&v coordinates and strike-dip-rake triples.
- discard (bool) -- [False] Probability cut-off for discarding samples Discarding samples - samples less than 1/(discard*n_samples) of the maximum likelihood value are discarded as negligeable. False means no samples are discarded.
- c_generate (bool) -- [False] Generate samples in the probability calculation when using Cython.
- generate_cutoff (int) -- Set number of samples to cut-off at when using c_generate (Default is the value of max_samples)
- relative_loop (bool) -- [False] Loop over non-zero samples when using relative amplitudes.
- bin_angle_coefficient_samples (int) -- [0] Bin size in degrees when binning angle coefficients (All station angle differences must be within this range for samples to fall in the same bin)
- no_station_distribution (bool) -- [True] Boolean flag to output station distribution or not.
- max_samples (int) -- [6000000] Max number of samples when using the iterate algorithm.
- max_time (int) -- [600] Max time when using the time algorithm.
- verbosity (int) -- [0] Set verbosity level (0-4) high numbers mean more logging output and verbosity==4 means the debugger will be called on errors.
- debug (bool) -- [False] Sets debug on or off (True means verbosity set to 4).
Other kwargs are passed to the algorithm - see MTfit.algorithms documentation for help on those.