SPATIAL ANALYSIS OF EARTHQUAKES IN IRAQ USING STATISTICAL AND DATA MINING TECHNIQUES

Statistical and data mining techniques (DMTs) are applied to an earthquake catalogue of Iraq to study the spatial distribution pattern of earthquakes over the period from 1900 to 2010. The employed techniques are Quadrant Count Analysis (QCA), tree clustering, k-means clustering, association rules, and linear regression. The QCA results showed that the pattern of earthquake occurrence beneath Iraq was spatially clustered. Tree clustering grouped the earthquakes into nine clusters according to the degree of similarity between events, and k-means clustering confirmed this result. The association rules technique failed to generate any rules between the earthquake parameters (location, depth, magnitude, etc.). Linear regression yielded only a weak relationship between depth and magnitude.


INTRODUCTION
Earthquakes are ground motions caused by the sudden release of elastic energy stored in rocks over a long period of time. Earthquakes have a very complex spatiotemporal distribution (Turcott, 1993; Sornett, 1999; Bak et al., 2002; Vecchio et al., 2008). The spatial, temporal and energy distributions of earthquakes have been investigated in recent decades in terms of the geodynamical characterization of the seismic process and seismic hazard analysis (Main, 1995; De Rubeis et al., 1997). Three empirical statistical laws form the basis for the development of earthquake models: 1. the Omori law; 2. the Gutenberg-Richter law; 3. the fractal distribution (Vecchio et al., 2008).

Seismic Data
The complete and homogeneous-magnitude earthquake catalogue compiled by Al-Heety (2014) was used as the seismic data source for this work. This catalogue spans the time interval from 1900 to 2010 and covers the area 29° to 37.5° N and 39° to 48° E. It includes 726 earthquakes. The catalogue was compiled from previously published catalogues (Fahmi and Al-Abbasi, 1989; Ameer et al., 2005) and from seismological bulletins, including those of the International Seismological Center (ISC), the National Earthquake Information Center (NEIC), and the European-Mediterranean Seismological Center (EMSC).
The method employed to investigate the spatial distribution of earthquakes in Iraq is Quadrant Count Analysis (QCA) (Cressie and Wikle, 2010; Rogers and Gomar, 2010). QCA involves dividing the region into a grid of equal-size cells, called quadrants. The number of points in each cell is counted. A regular point process generates a large number of quadrants containing only a single point, some empty quadrants, and very few quadrants with more than one point.
Conversely, a clustered point process produces a very large number of empty quadrants, a few quadrants with one or two points, and several quadrants with many points. To evaluate the distribution pattern, we use the variance-to-mean ratio (VTMR), or index of dispersion (Alhamdi et al., 2013):
$$\mathrm{VTMR} = \frac{\sigma^2}{\mu}$$
where $\sigma^2$ is the variance and $\mu$ is the mean. If the VTMR is greater than 1 ($\sigma^2 > \mu$), the pattern is clustered (negative binomial distribution); if the VTMR equals 1 ($\sigma^2 = \mu$), the pattern is random (Poisson distribution); and if the VTMR is less than 1 ($\sigma^2 < \mu$), the pattern is regular (binomial distribution). The whole area is divided into a grid of equal-size cells (quadrants of 1.0° longitude by 1.0° latitude), and the pattern is shown in Figure 2. The number of points in each cell within the study region is counted and sample statistics are calculated.
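The quadrant count procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the STATISTICA workflow used in the study: the function names, the toy coordinates and the 2° by 2° region are hypothetical, and the sample variance (divisor n - 1) is assumed because the text refers to sample statistics.

```python
import math
from collections import Counter
from statistics import mean, variance


def quadrant_counts(points, lon_min, lon_max, lat_min, lat_max, cell=1.0):
    """Return the number of points in every grid cell, empty cells included."""
    n_cols = math.ceil((lon_max - lon_min) / cell)
    n_rows = math.ceil((lat_max - lat_min) / cell)
    counts = Counter()
    for lon, lat in points:
        if not (lon_min <= lon <= lon_max and lat_min <= lat <= lat_max):
            continue  # ignore events outside the study region
        # Points lying on the upper boundary are placed in the last cell.
        col = min(int((lon - lon_min) / cell), n_cols - 1)
        row = min(int((lat - lat_min) / cell), n_rows - 1)
        counts[(row, col)] += 1
    return [counts[(r, c)] for r in range(n_rows) for c in range(n_cols)]


def vtmr(counts):
    """Variance-to-mean ratio of the cell counts (sample variance assumed)."""
    return variance(counts) / mean(counts)


def classify_pattern(ratio, tol=1e-9):
    """Map a VTMR value to the pattern type named in the text."""
    if math.isclose(ratio, 1.0, abs_tol=tol):
        return "random"
    return "clustered" if ratio > 1.0 else "regular"


# Hypothetical epicentres (longitude, latitude) in a 2 x 2 degree toy region.
clustered_points = [(0.1, 0.1), (0.2, 0.3), (0.5, 0.5), (0.9, 0.8)]
regular_points = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]

for name, pts in [("clustered", clustered_points), ("regular", regular_points)]:
    cell_counts = quadrant_counts(pts, 0.0, 2.0, 0.0, 2.0)
    print(name, cell_counts, classify_pattern(vtmr(cell_counts)))
```

Empty cells are kept in the count list on purpose, since they are what drives the variance up for a clustered pattern.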

DATA MINING TECHNIQUES (DMTS)
The following DMTs are employed to investigate the seismicity of Iraq:

Cluster Analysis
The purpose of cluster analysis is to join similar objects into subgroups (called clusters) so that objects (observations) in the same cluster are similar in some sense. There are three classes of cluster analysis techniques: joining (tree clustering), two-way joining (block clustering), and k-means clustering. In this study, tree clustering and k-means clustering were employed.

Tree Clustering Method
Hierarchical cluster analysis, the most common approach to tree clustering, starts with each case in a separate cluster and joins clusters together step by step until only one cluster remains. The resulting clusters of objects (observations) should exhibit high internal (within-cluster) homogeneity and high external (between-cluster) heterogeneity (McGarial et al., 2000). The Euclidean distance is usually used to measure the similarity between two observations; it is computed from the differences between the observed (analytical) values of the samples or observations (Otto, 1998).
The squared Euclidean distance ($D^2$) is computed as follows:
$$D_{ij}^2 = \sum_{m=1}^{p} (x_{im} - x_{jm})^2$$
where $x_{im}$ and $x_{jm}$ are the values of the $m$-th variable for observations $i$ and $j$, and $p$ is the number of variables. The results of tree clustering are best presented as a dendrogram, or binary tree. The dendrogram gives a visual summary of the clustering process, showing the groups and their proximity, with a marked reduction in the dimensionality of the original data (Tabachnick and Fidell, 1996).
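The joining procedure can be illustrated with a short, naive Python sketch. The linkage rule is not stated in the text, so single linkage (the smallest squared Euclidean distance between members of two clusters) is assumed here purely for illustration; the function names and the toy observations are hypothetical as well. The recorded merge list is the information a dendrogram would display.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two observations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(observations, n_clusters=1):
    """Naive single-linkage joining (assumed linkage rule).

    Starts with each observation in its own cluster and repeatedly merges
    the two closest clusters until n_clusters remain.
    Returns (clusters, merges); merges holds (members_a, members_b, distance).
    """
    clusters = [[i] for i in range(len(observations))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    squared_euclidean(observations[p], observations[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, merges


# Hypothetical two-variable observations (for example, scaled lon/lat).
toy = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
groups, history = agglomerate(toy, n_clusters=3)
print(groups)
for left, right, dist in history:
    print(left, "+", right, "at squared distance", dist)
```

In practice the variables would normally be standardized first, since depth, magnitude and coordinates are on very different scales.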

k-Means Clustering Method
This clustering technique is quite different from joining (tree clustering). It aims to partition n objects (observations) into k clusters so that the resulting intra-cluster similarity is high but the inter-cluster similarity is low.
Cluster similarity is measured with respect to the mean value of the objects in a cluster, which can be viewed as the cluster's center of gravity (Sriniv Asamurthy et al., 2014).
The method begins by randomly selecting k objects, each of which initially represents a cluster mean or center. Each remaining object is assigned to the cluster to which it is most similar, based on the distance between the object and the cluster mean. The new mean of every cluster is then computed. The performance of a clustering algorithm may be affected by the chosen value of k.
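The assignment and update steps described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function names, the seed, the iteration cap and the toy data are assumptions, and the loop simply stops once the assignments no longer change.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Basic k-means: random initial centers, then assign/update until stable."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to the nearest cluster mean.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: recompute the mean of every non-empty cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical observations forming two well-separated groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_means(data, k=2)
print(labels)
print(centers)
```

Because the starting centers are random, the cluster numbering (and, on harder data, the partition itself) can change with the seed, which is one reason the choice of k and of the starting points matters.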

Association Rules
The objective of the method described in this section is to identify relationships or associations between specific values of categorical variables in large data sets. Since it was proposed by Agrawal et al. (1993), the task of association rule mining has received a great deal of attention. Briefly, an association rule is an expression X ⇒ Y, where X and Y are sets of items. The meaning of such a rule is quite intuitive: given a database D of transactions, where every transaction T ∈ D is a set of items, X ⇒ Y expresses that whenever a transaction T contains X, then T probably contains Y as well. The probability, or rule confidence, is defined as "the percentage of transactions containing Y in addition to X with regard to the overall number of transactions containing X" (Hipp et al., 2000). Association rules can reveal features that may not be visible at first glance in a large data set, since they are easy to understand and time-efficient at finding interesting relationships. Applying the association rules technique to earthquake data allows us to search for significant relationships between earthquake parameters such as epicentral location, depth and magnitude.
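The confidence of a rule X ⇒ Y, as defined above, can be computed directly from a list of transactions. The sketch below is illustrative only: the function names are hypothetical, and the binned earthquake attributes (such as "depth:shallow") are invented labels, not values from the catalogue.

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Share of the transactions containing X that also contain Y."""
    x = set(antecedent)
    xy = x | set(consequent)
    n_x = sum(1 for t in transactions if x <= set(t))
    if n_x == 0:
        return 0.0  # the rule never applies
    n_xy = sum(1 for t in transactions if xy <= set(t))
    return n_xy / n_x


# Each hypothetical transaction is one event described by binned attributes.
events = [
    {"depth:shallow", "mag:4-5", "lat:35-36"},
    {"depth:shallow", "mag:4-5", "lat:33-34"},
    {"depth:shallow", "mag:5-6", "lat:35-36"},
    {"depth:deep", "mag:4-5", "lat:35-36"},
]

print(support(events, {"depth:shallow", "mag:4-5"}))
print(confidence(events, {"depth:shallow"}, {"mag:4-5"}))
```

A rule-finding algorithm typically keeps only rules whose support and confidence exceed user-chosen thresholds, so a data set can legitimately yield no rules at all.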

Linear Regression
Linear regression predicts the numerical value of a variable from the known values of others. The purpose of this method is somewhat similar to classification, with the difference that in regression there are predictor variables and a numeric class variable (Somodevilla et al., 2012). Simple linear regression is distinguished from multivariate linear regression by the number of dependent variables: in simple regression there is only one dependent variable, whereas multivariate regression employs more than one. This method can be used to predict an earthquake parameter, for example earthquake depth, from other parameters such as location or magnitude.
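A simple least-squares fit with one predictor can be written without any external library. The sketch below is only an illustration under stated assumptions: the function name and the numbers are hypothetical and are not taken from the catalogue. It returns the slope, the intercept and the coefficient of determination, which is the usual way to judge whether a fitted relationship is weak or strong.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With constant y there is no variance to explain; report 0.0 in that case.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 0.0
    return slope, intercept, r_squared


# Hypothetical magnitudes (x) and depths in km (y), for illustration only.
magnitude = [4.0, 4.5, 5.0, 5.5, 6.0]
depth_km = [12.0, 30.0, 18.0, 25.0, 33.0]

slope, intercept, r2 = fit_line(magnitude, depth_km)
print(slope, intercept, r2)
```

An r_squared value close to 0 means the predictor explains little of the variation, which is the situation in which a fitted equation should be used with caution.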

Statistical Analysis
The data were statistically analyzed using the STATISTICA software (StatSoft Inc., 2007). The descriptive statistics, graphs and data mining techniques were all produced with this software.

RESULTS AND DISCUSSION
The descriptive statistics of earthquake data are summarized in  (Alsinawi and Ghalib, 1975; Alsinawi and Issa, 1986; Alsinawi and Al-Qasrabi, 2003; Al-Abbasi and Fahmi, 1985; Fahmi and Al-Abbasi, 1989; Abd Alridha and Jasem, 2013). The majority of earthquakes in this period were recorded between 2000 and 2010 (Fig. 6). This result can be interpreted in terms of the increased deployment of regional seismological observatories and the rehabilitation of the Iraqi Seismic Network, and/or an increase in seismic activity. The results of QCA, as a spatial statistical technique, are listed in Table 2. The VTMR was greater than 1 for all earthquake magnitude categories, which indicates that the pattern of earthquake occurrence beneath Iraq from 1900 to 2010 was spatially clustered.

Tree-Clustering
The result of applying hierarchical cluster analysis, the most common approach to tree clustering, is presented as the dendrogram in Figure (7). According to the dendrogram, the earthquakes were grouped into nine statistically significant clusters. This result is in good agreement with the spatial distribution of events illustrated by the bivariate histogram (Fig. 8). It is also consistent with the QCA result, which indicates a tendency towards spatial clustering of earthquakes beneath Iraq.

k-Means Clustering
k-Means clustering has been referred to as "analysis of variance (ANOVA) in reverse".
In an ANOVA, the between-groups variance is compared to the within-groups variance in order to decide whether the means of a particular variable differ significantly between the groups. According to the ANOVA results (Table 3), at a significance level of p ≤ 0.05, the parameters depth, moment magnitude (Mw), longitude and latitude are the major criteria for assigning objects to clusters. We adopted the results of the k-means algorithm with k = 9 because they are consistent with the results of the hierarchical cluster analysis. Most of the earthquakes have a depth of less than 50 km, indicating that they are shallow earthquakes. Table 4 shows the centroids of each cluster. According to Figure (9), the results can be verified in terms of earthquake depth, in that events of similar depth are also similar in latitude, longitude and magnitude.
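The comparison of between-groups and within-groups variance can be made concrete with a one-way ANOVA F statistic computed for one variable across the clusters. The sketch below is an illustration under stated assumptions: the function name and the depth values are hypothetical, and no p-value is computed, since that would require the F distribution, which the Python standard library does not provide.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a single variable split into groups.

    F = (between-groups mean square) / (within-groups mean square).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (value - m) ** 2 for g, m in zip(groups, group_means) for value in g
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical depths (km) of events assigned to three clusters.
depth_by_cluster = [[5.0, 8.0, 11.0], [20.0, 23.0, 26.0], [40.0, 43.0, 46.0]]
print(anova_f(depth_by_cluster))
```

A large F value means the cluster means of that variable differ much more than the spread inside each cluster, which is what makes the variable a useful criterion for assigning objects to clusters.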

Association Rules
To illustrate the application of association rules and the interpretation of the results, only the latitude, longitude, depth and magnitude of each event were analyzed. Latitude and longitude were entered as categorical variables. The other variables (magnitude and depth) were entered as multiple response variables. Applying this data mining technique did not generate any rules, indicating that the data did not contain any associations between their variables, given the current specifications for the rule-finding algorithm.

Linear Regression
The application of linear regression analysis to the earthquake database of Iraq showed a weak relationship between earthquake depth and magnitude (Fig. 10). The equation of this relationship should be used with extreme caution. The analysis did not show any relationship between depth and either latitude or longitude, nor between magnitude and either latitude or longitude.
Figure (1) shows the epicentral map of the earthquakes during the period 1900 to 2010.

Fig. 2: The quadrants (1° latitude by 1° longitude) and pattern.

We adopted the technique proposed by Pham et al. (2005) to select the value of k for the k-means algorithm. This procedure can suggest several values of k to users for cases where different clustering results could be obtained at different required levels of detail. The validity of the clustering result is assessed only visually, without applying any formal performance measures. In this work, we ran the k-means algorithm with k taken as 3, 5, 7, 9, 10, 15 and 19.