SPE 166449 New Approach to Identify Analogue Reservoirs H. Martin Rodriguez, E. Escobar, S. Embid, N. Rodriguez, and M. Hegazy, Repsol, and Larry W. Lake, SPE, The University of Texas at Austin
Copyright 2013, Society of Petroleum Engineers This paper was prepared for presentation at the SPE Annual Technical Conference and Exhibition held in New Orleans, Louisiana, USA, 30 September –2 October 2013. This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.
Abstract
The identification of analogous reservoirs is an important step in planning the development of a new field, because the information available about new areas is usually limited or even nonexistent. Traditionally, the search for analogous reservoirs has been made by experienced geoscientists, but this practice depends on the availability of that experience, and the results are heavily dominated by geology. In this paper we present a systematic and unbiased procedure to search for analogous reservoirs, based on information contained in a large, validated database of reservoir parameters, both engineering and geologic. Each reservoir has its own “fingerprint”, characterized by its set of properties, which differs from one reservoir to another. The method uses multivariate statistical techniques to find a unique and reproducible list of reservoirs with fingerprints that are most similar to the selected target. The flexibility of the method allows variation of the similarity function (weights) and evaluation of different scenarios (static, dynamic, PVT behavior, etc.). Our method consists of four steps: Data Preprocessing, Key Parameter Selection, Multivariate Analysis, and Similarity Ranking. The first step consists of the analysis and preprocessing of the available database. In the second step, Key Parameter Selection, the variables with the largest impact on the case to be evaluated are identified. The third step, Multivariate Analysis, applies several multivariate techniques such as principal component analysis (PCA) and cluster analysis. Finally, in the Ranking step, we apply a similarity function to the group of previously selected “analogous reservoirs”, generating a similarity ranking of analogous reservoirs. To validate this new method we use the Casablanca oil field as a target reservoir. Casablanca is a mature carbonate reservoir very well known by Repsol, whose experts identified four analogues for this target.
The newly developed method was independently applied in this case to obtain 19 analogous reservoirs sorted by similarity criteria. The maximum similarity found was 85 % for the Amposta Marino reservoir, one of the analogous reservoirs independently identified by the business unit team. Moreover, the four analogous reservoirs previously identified by the business unit team were among the first ten positions in the similarity ranking. These results are highly encouraging, as the method captures the know-how of the experts and ensures a reproducible response regardless of the user's expertise. The most relevant advantage of this new method is that it is based on a similarity function that takes into account all the weighted key parameters simultaneously, instead of the sequential filters used by some commercial software. As a result, the procedure we present in this work will support the predictive search of missing properties for the target reservoir, reducing the uncertainty for decision making.
Introduction
In Oil & Gas Exploration and Production, analogous reservoirs are used in many ways to study reservoirs that lack critical knowledge. In this paper, we call a “target reservoir” any reservoir with a deficit of information for which we want to identify a ranked list of analogous reservoirs. Analogous reservoirs, in turn, are those with the fingerprints most similar to the selected target based on the selected key parameters; they are not necessarily geographically close.
Often, relatively similar neighboring reservoirs are used as a first approximation of analogous reservoirs to provide reservoir data such as PVT properties and petrophysical information. Although this is not always the best decision, it is done because of the lack of a simple method for identifying analogs in the short term. In other cases, the development plan and production behavior of analogous reservoirs are used as a reference for preliminary forecasting of less mature projects. The screening of EOR processes is another example, where analogous reservoir concepts are used to pre-select the potential recovery processes for application to a target reservoir. One of the best-known uses of analogous reservoirs is the estimation of reserves (Harrel, 2004; Hodgin, 2006; Sidle, 2010). However, a literature review of the processes used in the oil industry to systematically identify analogous reservoirs showed that this is an area with limited technical references. Also, some major oil companies consider the skill of identifying analogous reservoirs a differentiating capability for reservoir assessments. There are many reasons to develop a systematic method to identify analogous reservoirs. From a reservoir development point of view, one of the most important is to support the assessment of new business opportunities, which are normally restricted by short evaluation times and a limited amount of specific information about a particular or target reservoir. The objective of this paper is to describe a statistical method for the systematic identification of analogous reservoirs. It includes the procedure followed to integrate a reservoir database to be used by the statistical analysis. The developed method was successfully validated with a well-known Repsol carbonate reservoir, whose analogous reservoirs had previously been identified by a qualified geoscience team.
Among the advantages of the method presented in this paper are: (1) It generates a list of analogous reservoirs ranked quantitatively by similarity criteria. (2) It helps to reduce evaluator error caused by prejudice, mistakes, or lack of experience. (3) It supports the predictive estimation of missing properties for the target reservoir. (4) It provides flexibility to improve or adapt it to different needs. (5) It is very easy to use for the final user. (6) It needs a minimum number of reservoir key parameters.
Method The method proposed in this work basically consists of four steps: Data Preprocessing, Key Parameters Selection, Multivariate Analysis, and Similarity Ranking. Fig. 1 shows the flow diagram of this new method.
DATA PREPROCESSING
KEY PARAMETER SELECTION
MULTIVARIATE ANALYSIS
SIMILARITY RANKING
Figure 1. Flow diagram of the proposed method.
The first step corresponds to the analysis and preprocessing of the available database. It includes data validation, normalization, outlier identification, missing values imputation and standardization of the properties. During the Key Parameter Selection, experts identify the variables with the largest impact on the specific use case to be evaluated. In the third step we apply several multivariate techniques, such as principal component analysis (PCA) (Aminzadeh, 2005; Sharma et al., 2010) and cluster analysis (Sharma et al., 2010). PCA is used to reduce the dimension of the problem and to avoid collinearity between the properties describing reservoirs. Cluster Analysis is then applied to the extracted principal components to identify those “analogous reservoirs” lying in the same cluster as the target reservoir. Finally, in the Ranking step, we apply a similarity function to the group of previously selected “analogous reservoirs” and generate a similarity ranking based on the normalized similarity function.
Data Preprocessing The main source of information to be used is a large database of reservoirs, created by merging existing databases from different business units inside the company.
This large database, from now on the master database (MDB), is organized by cases (rows), which will be the different reservoirs, and variables (columns), that will be the different properties or characteristics taken into account for each reservoir. Table 1 shows a fragment of the MDB.
Table 1. Snapshot of the first few lines and columns of the Master Database
The data preprocessing consists of five consecutive steps (Tan et al., 2006): (1) data validation, (2) normalization, (3) outlier identification (and elimination), (4) missing values analysis (and imputation), and (5) standardization. All five steps must be completed before the multivariate analysis. The data preprocessing needs to be repeated only if the MDB is updated.
Data validation
Depending on the type of variable (string or numeric; continuous, nominal, or ordinal), the treatment of the data and the applicable statistical methods will be different. In our case the original data are numeric (continuous) and string nominal (string categorical variables). So, the first step is to define correctly the type of each variable that we will be dealing with.
One issue specific to hierarchical categorical variables is an excessive number of categories and subcategories for a given property, which makes analysis and interpretation very difficult. Some of these subcategories contain very few cases (< 5 %), so they are not statistically representative. To solve this problem, the less representative subcategories have been grouped into their respective higher level, reducing the number of levels in the hierarchical variable. An example is given in Table 2, which shows a fragment of Klemme's basin classification (Klemme, 1980): Type II, which refers to Continental Multicyclic Basins. In this case, all items in the third level (2Ca, 2Cb and 2Cc) are merged into category 2C in the second level.
Table 2. Type II of Klemme's basin classification.
2. Continental Multicyclic Basins
   2A. Craton Margin - Composite
   2B. Craton Accreted Margin - Complex
   2C. Crustal Collision Zone - Convergent plate margin
      2Ca. Closed
      2Cb. Trough
      2Cc. Open
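The grouping of sparsely populated subcategories into their parent level can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `merge_rare_subcategories`, the rule that a three-character code such as `2Ca` has the parent `2C`, and the sample counts are all hypothetical and not taken from the MDB.

```python
from collections import Counter

def merge_rare_subcategories(codes, min_share=0.05):
    """Replace third-level codes (e.g. '2Ca') whose share of cases is below
    min_share by their second-level parent (e.g. '2C')."""
    counts = Counter(codes)
    total = len(codes)
    merged = []
    for code in codes:
        # Assumption: a three-character code is a third-level subcategory
        # and its first two characters identify the parent category.
        is_third_level = len(code) == 3
        if is_third_level and counts[code] / total < min_share:
            merged.append(code[:2])
        else:
            merged.append(code)
    return merged

# Hypothetical sample: each third-level subcategory holds fewer than 5 % of cases.
codes = ["2A"] * 50 + ["2B"] * 40 + ["2Ca"] * 4 + ["2Cb"] * 3 + ["2Cc"] * 3
merged = merge_rare_subcategories(codes)
print(Counter(merged))
```

With these sample counts the three subcategories of 2C collapse into a single category of ten cases, mirroring the merge described for Table 2.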
On the other hand, some categorical variables in the MDB have different categories in the same cell (sorted by importance or age), giving rise to many possible combinations of categories. This excess of combinations makes the interpretation unfeasible. To avoid this problem we take into account only the most important category or the most recent one in every cell. Finally, to allow using some needed statistical methods later in the multivariate analysis (without losing any information), all the string nominal data are changed into numeric nominal data. So, the exact correspondence between the original string values and the new numeric values must be described and preserved. This change has been done for all the categorical variables.
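A minimal sketch of the string-to-numeric recoding follows. It assumes a simple alphabetical assignment of integer codes; the helper name `build_code_table` and the lithology values are hypothetical, and the actual coding scheme used for the MDB is not described in the text.

```python
def build_code_table(values):
    """Assign an integer code to every distinct string category."""
    categories = sorted(set(values))
    return {category: index + 1 for index, category in enumerate(categories)}

# Hypothetical string nominal variable.
lithology = ["LIMESTONE", "DOLOMITE", "SANDSTONE", "DOLOMITE"]

code_of = build_code_table(lithology)                 # string -> number
label_of = {code: cat for cat, code in code_of.items()}  # number -> string

encoded = [code_of[value] for value in lithology]
decoded = [label_of[code] for code in encoded]
print(code_of, encoded)
```

Keeping both dictionaries preserves the exact correspondence between the original strings and the numeric codes. The codes remain nominal labels: their numeric order carries no meaning.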
Normalization
One of the hypotheses of many of the statistical analyses used in this method is that the variables follow a normal probability distribution. To check this hypothesis we make histograms and normality analyses for each variable in the MDB.
As a result, most of the variables were found to follow a normal probability distribution; those that do not follow a lognormal probability distribution. An example of the latter can be seen in Fig. 2. To normalize these variables, a transformation must be applied (Box-Cox power transformation); for the case shown the transformation is equivalent to taking the logarithm of the variable. An example of the result of applying this transformation is in Fig. 3. From now on, when we refer to these variables it will be understood that we mean their logarithms. For comparison purposes, the following figures show the effect of taking logarithms of the variable Dip Angle.
Figure 2. Example of a variable following a lognormal probability distribution (original variable).
Figure 3. Example of variable in Figure 2 after applying a logarithm transformation.
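The effect of the transformation can be sketched in a few lines of Python. The sketch uses the standard one-parameter Box-Cox form, which reduces to the natural logarithm when the parameter is zero; the dip angle values and the use of the moment coefficient of skewness as a symmetry check are illustrative assumptions, not data or diagnostics from this work.

```python
import math

def box_cox(x, lam):
    """One-parameter Box-Cox transform; lam = 0 is the log transform."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def skewness(values):
    """Moment coefficient of skewness (0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed dip angles (degrees).
dip_angle = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 40.0, 100.0]
log_dip_angle = [box_cox(v, 0) for v in dip_angle]

print(round(skewness(dip_angle), 2), round(skewness(log_dip_angle), 2))
```

For this sample the skewness drops markedly after taking logarithms, which is the qualitative change shown between Fig. 2 and Fig. 3.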
Outlier identification
Outliers can have a disastrous effect on the results of the statistical analyses used in this method. To avoid this, a step is added to identify and delete the outliers found. To detect the presence of outliers we use a box-plot analysis for every variable in the MDB, a very useful graphical tool for this purpose (Walpole et al., 2012). Fig. 4 shows a box plot for the variable Average Matrix Porosity (%), where one reservoir with an anomalous value for this property appears (square symbol on the right of the figure). The shaded area contains 50 % of the reported values, and the extended arms 95 %. The isolated points are outliers. The line in the center of the shaded box is the median.
Figure 4. Box-plot for the variable average matrix porosity (%).
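A box-plot screening can be approximated numerically. The sketch below assumes the common Tukey convention, in which values farther than 1.5 times the interquartile range from the box are flagged; the exact whisker rule behind Fig. 4 may differ, and the porosity values are hypothetical.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical average matrix porosity values (%), one of them anomalous.
porosity = [8, 10, 11, 12, 12, 13, 14, 15, 16, 45]

outliers = tukey_outliers(porosity)
cleaned = [v for v in porosity if v not in outliers]
print(outliers)
```

Here only the value 45 falls outside the fences (6.0 and 20.0 for this sample), so a single case would be removed before the next preprocessing step.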
Missing values analysis and imputation
Missing values raise an important challenge because typical statistical modelling procedures discard the affected cases from the analysis. When there are few missing values (≤ 10 % of the total number of cases) and those values can be considered to be missing at random, the typical method of list-wise deletion (if any of the variables has a missing value, the whole case is omitted from the computations) is a “good” alternative. This “quick solution” cannot be applied if there are many cases (> 10 %) with missing values, because we would then lose a great amount of information.
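The 10 % rule of thumb for list-wise deletion can be written as a short check. This is a hedged sketch: the helper names and the two-variable cases are hypothetical, and missing values are represented here by `None`.

```python
def incomplete_case_share(cases):
    """Fraction of cases that have at least one missing (None) value."""
    incomplete = sum(1 for case in cases if any(v is None for v in case.values()))
    return incomplete / len(cases)

def listwise_delete(cases):
    """Keep only the cases in which every variable is present."""
    return [case for case in cases if all(v is not None for v in case.values())]

# Hypothetical database fragment: 10 reservoirs, one with a missing net pay.
cases = [{"porosity": 10.0 + i, "net_pay": 20.0 + i} for i in range(10)]
cases[3]["net_pay"] = None

share = incomplete_case_share(cases)
if share <= 0.10:
    usable = listwise_delete(cases)   # few missing values: drop the cases
else:
    usable = cases                    # too many: imputation would be needed
print(share, len(usable))
```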
When there are too many cases with missing values, an accurate estimation of these missing values is needed. The method used in this work for imputation of missing numeric scale data is the univariate model Multiple Linear Regression, while Logistic Regression is used as the univariate model for dichotomous categorical variables. When the variable has more than two outcome categories we use Discriminant Analysis and Multinomial Logistic Regression (César Pérez, 2004). During imputation, each variable in the MDB may or may not be selected as an independent variable, and the range of imputed values of a numeric scale variable can be restricted so that they are plausible. In addition, imputation is restricted to variables with less than a maximum percentage of missing values; in this work we use a 30 % cutoff. After imputation, the user should check the results. In our case the imputed values are inside the original range for every property, and the mean, median and standard deviation of the whole data set (imputed values included) do not change significantly from the original ones. An important issue related to missing values imputation is that a new imputation must be performed every time the MDB is updated with new reservoirs or new data. This is necessary because the missing data pattern and the ranges of the properties may both change, which means that the imputed values could be quite different.
Standardization
Additionally, all the variables should be standardized so that their ranges have about the same order of magnitude. Having variables with very different orders of magnitude could influence the results of the multivariate analysis. So, every variable (column in the MDB) is standardized by subtracting its mean and dividing by its standard deviation. Thus, in the following analysis all the variables are standardized unless explicitly stated otherwise.
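The standardization step is a column-wise z-score. The following sketch assumes the sample standard deviation (the text does not say whether the sample or the population form is used), and the depth values are hypothetical.

```python
import statistics

def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # assumption: sample (n - 1) form
    return [(v - mean) / std for v in values]

# Hypothetical column of the database: top reservoir depth (m).
depth_m = [1200.0, 2500.0, 3100.0, 1800.0, 4000.0]

z_depth = standardize(depth_m)
z_mean = statistics.fmean(z_depth)
z_std = statistics.stdev(z_depth)
print([round(z, 2) for z in z_depth])
```

After the transformation every column has zero mean and unit standard deviation, so a depth in metres and a porosity in percent contribute on a comparable scale.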
Selection of Key Parameters
The use of analogs is a very common practice in the E&P industry. Identifying reservoirs with similar features and characteristics is a reliable way to infer unknown parameters of the target reservoir under evaluation and to analyze production strategies. Although it is almost a routine activity inside companies, little has been said about procedures to identify this similarity among reservoirs. However, it is well accepted that the validity of the analogs depends on their intended purpose. In this sense, we may define different kinds of needs or “type problems”: static (not related to production), dynamic (related to production), PVT behaviour (related to fluid properties), etc. The parameters used to establish the analogy (commonly referred to as Key Parameters, KP) change depending on the future use of the analogs. Therefore, it is crucial to dedicate some time to defining the role of the desired analogs and to selecting the appropriate set of Key Parameters, to minimize the uncertainty entailed in assuming reservoir characteristics. Most of the references that deal with finding analogous reservoirs (Harrel, 2004; Hodgin, 2006; Sidle, 2010) mention several categories of data or KP used to find analogies: geological, petrophysical, engineering and operational. In this work we use the KP shown in Table 3.
Table 3. List of Key Parameters used in this work.
BASIN CODE (BALLY)
BASIN CODE (KLEMME)
PRESENT TECTONICS CODE
FLUID TYPE CODE
PRINCIPAL STRUCTURAL CODE
PRIMARY TRAP TYPE CODE
log_NUMBER OF STRUCTURAL COMPARTMENTS
TOP RESERVOIR DEPTH (m)
log_DIP ANGLE (º)
log_AREA (km2)
log_ORIGINAL TOTAL HC COLUMN HEIGHT (m)
PRIMARY SEDIMENTARY SYSTEM CODE
PRIMARY SEDIMENTARY ENVIRONMENT CODE
log_NUMBER OF STRATIGRAPHIC COMPARTMENTS
PRIMARY LITHOLOGY CODE
PRIMARY POROSITY TYPE CODE
SECONDARY POROSITY TYPE CODE
log_AVERAGE GROSS THICKNESS (m)
log_AVERAGE NET PAY (m)
AVERAGE MATRIX POROSITY (%)
log_AVERAGE AIR PERMEABILITY (mD)
AVERAGE WATER SATURATION (%)
DIAGENETIC PROCESS CODE
FRACTURE RESERVOIR CLASSIFICATION CODE
AVERAGE API GRAVITY (ºAPI)
ORIGINAL TEMPERATURE GRADIENT (ºC/m)
TEMPERATURE AT TOP RESERVOIR DEPTH (ºC)
ORIGINAL PRESSURE GRADIENT (KPa/m)
PRESSURE AT TOP RESERVOIR DEPTH (KPa)
PRIMARY DRIVE MECHANISM CODE
Multivariate Analysis
The core of the method is a multivariate statistical analysis of the preprocessed KP database (KPDB). The statistical techniques applied are Principal Components Analysis (PCA) and Cluster Analysis.
Principal Components Analysis
PCA is often used in dimension reduction, to identify a small number of new latent variables (principal components, PC) that explain most of the variance observed in a much larger number of original variables. This is very important for conceptual clarification and simplification of the later analysis.
The new variables (principal components) are computed as linear combinations of the original variables. They have the desirable property of being uncorrelated with each other, which makes the results easy to attribute back to their source in Cluster Analysis. In this study we use the PCA method. This method is used when there is more emphasis on data reduction and less on interpretation, whereas Factor Analysis is used when there is interest in studying the relationships among the variables. Another advantage of the PCA method is that it is insensitive to multicollinearity among the variables (Daniel Peña, 2002). Because the variables may have very different orders of magnitude, we apply PCA to the standardized variables. PCA tries to account for the maximum amount of variation in the set of variables. Each eigenvalue of the correlation matrix represents the standardized variance in the original variables that is accounted for by its respective component, and the associated eigenvector defines the coefficients of every original variable in that component. The maximum amount of standardized variance contained in a single variable is 1. So, if an eigenvalue is greater than 1, it must account for variation in several variables. The first component selected has the maximum variance (the greater the variance, the more information it carries), and successive principal components explain progressively smaller portions of the variance. The result of applying PCA to our data is shown in Table 4.
Table 4. Absolute, relative and cumulative variance obtained from applying PCA to the KPDB.
Component   Eigenvalue   Variance (%)   Cumulative Variance (%)
1           4.64         15.5           15.5
2           2.99         10.0           25.5
3           2.72         9.1            34.6
4           2.06         6.9            41.5
5           1.80         6.0            47.5
6           1.62         5.4            52.9
7           1.39         4.6            57.5
8           1.23         4.1            61.6
9           1.12         3.7            65.3
10          1.05         3.5            68.8
11          0.98         3.3            72.1
12          0.92         3.1            75.2
13          0.82         2.7            77.9
14          0.76         2.6            80.5
15          0.69         2.3            82.8
16          0.67         2.2            85.0
17          0.63         2.1            87.1
18          0.58         1.9            89.0
19          0.53         1.8            90.8
…           …            …              …
29          0.04         0.2            99.9
30          0.01         0.1            100.00
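The computation behind a table of this kind can be sketched with NumPy: standardize the variables, form the correlation matrix, and take its eigenvalues and eigenvectors. This is a minimal illustration on synthetic data, not the software or data set used in this work; the function name `pca_correlation` and the three synthetic variables are assumptions.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = cases, columns = variables).
    Returns eigenvalues (descending), eigenvectors (columns) and scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized variables
    R = np.corrcoef(Z, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                               # component scores
    return eigvals, eigvecs, scores

# Synthetic data: two nearly collinear variables plus one independent variable.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0],
                     base[:, 0] + 0.1 * rng.normal(size=200),
                     base[:, 1]])

eigvals, eigvecs, scores = pca_correlation(X)
variance_pct = 100.0 * eigvals / eigvals.sum()
cumulative_pct = np.cumsum(variance_pct)

# The component scores should be mutually uncorrelated.
score_corr = np.corrcoef(scores, rowvar=False)
max_offdiag = float(np.abs(score_corr - np.eye(3)).max())
print(np.round(eigvals, 2), np.round(cumulative_pct, 1))
```

The eigenvalues sum to the number of variables, which is why the mean variance per component equals 1 when the correlation matrix is used.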
It is customary to select the first principal components that have a cumulative variance of 90 %. To fulfil this criterion, according to Table 4, we should select the first 19 principal components. If we decided to relax this criterion and retain only 80 % of the original variance, we should select the first 14 principal components. Another criterion is to take only those principal components with a variance greater than the mean variance. When the correlation matrix is used to compute the principal components, the mean value of the variance is 1, so we would select only those PCs with an absolute variance greater than 1. In this case, according to Table 4, we would select the first 10 principal components (retaining only 69 % of the variance in the original data). Depending on the criterion used, the reduction in the number of variables may be significant. With the last criterion, the number of variables is reduced by two-thirds, from 30 original variables to 10 PCs. Component loadings for the first 10 PCs are in Table 5, which shows the weights of the KP for each PC.
Table 5. Component loadings for the first 10 PCs.
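The three selection criteria can be checked directly against the eigenvalues listed in Table 4. The sketch below uses only the first 19 eigenvalues, since the table elides components 20 to 28, and takes the total variance as 30, the number of standardized variables; the function names are hypothetical.

```python
from itertools import accumulate

# First 19 eigenvalues as listed in Table 4 (later ones are elided there).
EIGENVALUES = [4.64, 2.99, 2.72, 2.06, 1.80, 1.62, 1.39, 1.23, 1.12, 1.05,
               0.98, 0.92, 0.82, 0.76, 0.69, 0.67, 0.63, 0.58, 0.53]
TOTAL_VARIANCE = 30.0  # trace of the correlation matrix = number of variables

def components_for_cumulative_variance(eigenvalues, total, threshold):
    """Smallest number of leading components reaching the variance threshold."""
    for k, cumulative in enumerate(accumulate(eigenvalues), start=1):
        if cumulative / total >= threshold:
            return k
    return None  # threshold not reached with the eigenvalues supplied

def components_above_mean_variance(eigenvalues, mean_variance=1.0):
    """Number of components whose variance exceeds the mean variance."""
    return sum(1 for value in eigenvalues if value > mean_variance)

n_90 = components_for_cumulative_variance(EIGENVALUES, TOTAL_VARIANCE, 0.90)
n_80 = components_for_cumulative_variance(EIGENVALUES, TOTAL_VARIANCE, 0.80)
n_mean = components_above_mean_variance(EIGENVALUES)
print(n_90, n_80, n_mean)
```

The three counts reproduce the 19, 14 and 10 components quoted for the 90 %, 80 % and mean-variance criteria.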
Cluster analysis
Cluster Analysis is a multivariate statistical method for automatic or unsupervised data classification. In this work Cluster Analysis is used to find groups (clusters) of similar reservoirs based on the variables examined (in this case the PCs), such that the similarity between members of the same group is high and the similarity between members of different groups is low. The results can be used to identify associations that would otherwise not be apparent.
Applying Cluster Analysis to the component scores resulting from the PCA, instead of applying it directly to the original variables, has several advantages: (i) it simplifies the later interpretation of results, and (ii) it avoids giving excessive weight to concepts that are represented at the same time by several similar properties in the original variables. We use Cluster Analysis with the aim of identifying the cluster to which the target reservoir belongs. All the reservoirs belonging to that cluster are then considered analogous reservoirs. This concept is illustrated in Fig. 5; in this particular case the target reservoir belongs to cluster number 2.
Figure 5. Idealization of the application of cluster analysis for the identification of analogous reservoirs.
We use two different methods of cluster analysis: the Hierarchical method with Ward's grouping rule and the Two Step method (César Pérez, 2004).
Hierarchical cluster analysis
This procedure attempts to identify relatively homogeneous groups of reservoirs based on selected characteristics (principal components in this work). Using an algorithm that starts with each reservoir in a separate cluster, it combines clusters until only one is left (to which all reservoirs belong). Ward's grouping rule is selected as an internal parameter in the hierarchical cluster analysis. At each step this rule merges the pair of clusters that produces the minimum increase in the residual variance, so in 3D it tends to create spherical clusters. One of the main outputs of this method is the dendrogram, a graphical representation used to assess the cohesiveness of the clusters formed, which at the same time provides information about the appropriate number of clusters to keep. The dendrogram has a hierarchical structure, classifying all the reservoirs into clusters at different levels, where the levels represent the distances between clusters.
Two Step cluster analysis
The two steps in this algorithm refer to the pre-clustering and clustering steps. During the pre-clustering step, individual cases are grouped into pre-clusters in a single data pass. Next, in the clustering step, an agglomerative hierarchical cluster analysis algorithm is applied to the pre-clusters. Statistical criteria are recorded as clustering is performed and are used to determine the optimal number of clusters (within a user-specified range). As internal parameters in the two step cluster analysis we have selected the following choices:
Distance measure: Log-likelihood. This selection determines how the similarity between two clusters is computed. The likelihood measure places a probability distribution on the variables. The two step procedure deals with both categorical and continuous variables, and their distinct properties are taken into account by the log-likelihood distance measure.
Clustering criterion: This selection determines how the automatic clustering algorithm determines the number of clusters. Either Schwarz's Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC) can be specified. These two criteria are used to choose the optimal number of clusters by evaluating the goodness-of-fit of the models.
The Two Step Cluster Analysis procedure has several potential advantages compared to other clustering methods, which make it the better choice for our kind of data:
• It can automatically select the number of clusters based on statistical criteria.
• It works with both continuous and categorical variables, taking account of their different properties.
• It is recommended for large data sets.
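The hierarchical procedure with Ward's grouping rule described earlier in this section can be illustrated with a naive pure-Python sketch. It merges, at each step, the two clusters whose union gives the smallest increase in within-cluster sum of squares, and then reads off the cluster that contains the target. The points, the target index and the chosen number of clusters are hypothetical; production work would normally rely on a statistical package rather than this quadratic-time loop.

```python
def ward_clusters(points, n_clusters):
    """Naive agglomerative clustering with Ward's rule.
    Returns a list of clusters, each a sorted list of point indices."""
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                na, nb = len(ca["members"]), len(cb["members"])
                dist2 = sum((x - y) ** 2
                            for x, y in zip(ca["centroid"], cb["centroid"]))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ca, cb = clusters[a], clusters[b]
        na, nb = len(ca["members"]), len(cb["members"])
        merged = {
            "members": ca["members"] + cb["members"],
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(ca["centroid"], cb["centroid"])],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [sorted(c["members"]) for c in clusters]

# Hypothetical scores on two principal components for seven reservoirs.
pc_scores = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
             (5.0, 5.1), (5.2, 4.9),
             (9.9, 0.1), (10.1, 0.0)]
target = 3  # index of the target reservoir

clusters = ward_clusters(pc_scores, n_clusters=3)
target_cluster = next(c for c in clusters if target in c)
analogues = [i for i in target_cluster if i != target]
print(clusters, analogues)
```

Every reservoir that shares a cluster with the target is treated as an a priori analogue, which is the idea idealized in Fig. 5.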
Similarity Ranking
Once the analogous reservoirs have been identified (those reservoirs that belong to the same cluster as the target reservoir), we apply a similarity function to these a priori analogous reservoirs to rank them in order of decreasing similarity to the target reservoir. The aim is to measure quantitatively which of the analogous reservoirs are most similar to the target reservoir, and how strong this similarity is. This concept is illustrated in Fig. 6.
Figure 6. Idealization of the similarity ranking of analogous reservoirs.
In this step we work with only a reduced set of reservoirs: those that belong to the same cluster as the target reservoir. These reservoirs are considered a priori analogous reservoirs. All the information needed to apply the similarity function is represented in the schematic database in Fig. 7:
                          KP 1   KP 2   …   KP (j)   …   KP p
Analogous Reservoir 1
Analogous Reservoir 2
…
Analogous Reservoir (i)
…
Analogous Reservoir n
Target Reservoir (t)
Figure 7. Schematic database containing all the information needed to apply the similarity function.
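If the variable is continuous, for the j-th KP we define the following similarity function between one analogous reservoir (i) and the target reservoir (t):
$$ s_j(i,t) = 1 - \frac{|x_{ij} - x_{tj}|}{R_j} $$
Where:
$$ R_j = \max_k \{x_{kj}\} - \min_k \{x_{kj}\} $$
is the range of the j-th KP, and $x_{ij}$ and $x_{tj}$ are the values of the j-th KP for reservoirs (i) and (t).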
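If the variable is categorical, for the j-th KP we define in this work the following similarity function between one analogous reservoir (i) and the target reservoir (t):
$$ s_j(i,t) = \begin{cases} 1 & \text{if } x_{ij} = x_{tj} \\ 0 & \text{otherwise} \end{cases} $$
where $x_{ij}$ and $x_{tj}$ are the categories of the j-th KP for reservoirs (i) and (t).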
Now we define the following global similarity function between one analogous reservoir (i) and the target reservoir (t). This expression is known as Gower´s General Similarity Coefficient (Gower, 1971):
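$$ S(i,t) = \frac{\sum_{j=1}^{p} w_j(i,t)\, s_j(i,t)}{\sum_{j=1}^{p} w_j(i,t)} $$
Where $w_j(i,t)$ are binary weights, in the following sense:
$$ w_j(i,t) = \begin{cases} 1 & \text{if reservoirs (i) and (t) can be compared on the j-th KP (both values are available)} \\ 0 & \text{otherwise} \end{cases} $$
In this work all the weights are equal to one, but they may be different depending on the kind of KP and purpose of the analogs.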
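A minimal Python sketch of a Gower-type similarity ranking follows. It assumes the standard form of Gower's coefficient: one minus the range-scaled absolute difference for continuous KP, a match indicator for categorical KP, and a zero weight whenever a value is missing. The reservoir names, KP values, ranges and the optional `weights` argument are hypothetical illustrations, not data from this study.

```python
def gower_similarity(candidate, target, ranges, categorical, weights=None):
    """Gower-type similarity between a candidate and the target reservoir.
    ranges: range (max - min) of each continuous KP over the database.
    categorical: set of KP names treated as categorical."""
    numerator = denominator = 0.0
    for kp, t_value in target.items():
        c_value = candidate.get(kp)
        if c_value is None or t_value is None:
            continue  # binary weight 0: the KP cannot be compared
        w = 1.0 if weights is None else weights.get(kp, 1.0)
        if kp in categorical:
            s = 1.0 if c_value == t_value else 0.0
        else:
            s = 1.0 - abs(c_value - t_value) / ranges[kp]
        numerator += w * s
        denominator += w
    return numerator / denominator if denominator else 0.0

# Hypothetical target, candidates and KP ranges.
target = {"porosity": 10.0, "depth": 2500.0, "lithology": "carbonate"}
candidates = {
    "A": {"porosity": 12.0, "depth": 2700.0, "lithology": "carbonate"},
    "B": {"porosity": 20.0, "depth": 500.0, "lithology": "sandstone"},
    "C": {"porosity": None, "depth": 3300.0, "lithology": "carbonate"},
}
ranges = {"porosity": 20.0, "depth": 4000.0}
categorical = {"lithology"}

similarity = {name: gower_similarity(kp_values, target, ranges, categorical)
              for name, kp_values in candidates.items()}
ranking = sorted(similarity, key=similarity.get, reverse=True)
print(ranking, {k: round(v, 3) for k, v in similarity.items()})
```

Because all KP enter a single weighted sum, a candidate is never discarded for failing one criterion, in contrast to sequential filtering; changing the entries of `weights` would shift the ranking toward the KP most relevant to a given purpose.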
Validation of the Method Using a Well-known Target
The Casablanca oil field is one of the main producing fields in the Spanish Mediterranean Sea. Discovered in 1975, it is still in operation with Repsol as main operator. It is located 45 km offshore, south-east of the city of Tarragona (Spain). The main reservoir rocks are Upper Jurassic and Lower Cretaceous shallow marine carbonates (Orlopp, 1988; Lomando et al., 1993). Once the method has been developed, it should be checked using a well-known case. For this purpose we use the Casablanca field (henceforth Casablanca) as the target reservoir, because some of its analogous reservoirs are known. In this example, for confidentiality reasons, all the data used come from the C&C Reservoirs DAKS commercial database (C&C Reservoirs Limited 2013). In the first step of the multivariate analysis we apply PCA to the KPDB, finally selecting 19 PCs, following the criterion of keeping the first PCs that have a cumulative variance of 90 % (losing as little information as possible). Then we apply both clustering techniques, Hierarchical Ward and Two Step, to these 19 PCs. The Hierarchical Ward method results in a dendrogram. Fig. 8 below shows a fragment of this dendrogram, where we have outlined in red the cluster of Casablanca's analogous reservoirs. This cluster consists of 17 reservoirs, including, of course, Casablanca itself.
Figure 8. Fragment of the dendrogram obtained by applying hierarchical cluster analysis. Casablanca's analogous reservoirs are shown in red outline.
In this dendrogram, the cluster of Casablanca's analogous reservoirs can be interpreted as the merger of two smaller clusters with great similarity between them (this similarity is indicated by the short length of the horizontal blue lines that merge them). The next possible grouping would need a big jump (a horizontal yellow line three times longer) to merge this cluster with others. So, we interpret that the “natural” classification of Casablanca's analogous reservoirs consists of 17 reservoirs. We also applied the Two Step method with different numbers of clusters, trying to optimize this parameter to obtain a “natural” classification of reservoirs. The final number of clusters selected is 25. With this configuration, the analogous cluster consists of 21 reservoirs, including Casablanca itself. The analogous reservoirs obtained with the Two Step method are in Table 6.
Table 6. Casablanca's analogous reservoirs obtained with the two-step method.
AMPOSTA MARINO [MONTSIA]
ARDMORE [ZECHSTEIN (HALIBUT-TURBOT)]
AUK [ZECHSTEIN (HALIBUT)]
BLACKBURN [NEVADA]
CASABLANCA
EAGLE SPRINGS [SHEEP PASS]
GELA [TAORMINA]
GRANT CANYON [GUILMETTE]
LIUBEI [WUMISHAN]
MARKOVO [OSA]
MATZEN [HAUPTDOLOMITE-BOCKFLIESS BEDS]
RAGUSA [TAORMINA]
RECHITSA [SEMILUKI]
RECHITSA [VORONEZH]
RECHITSA [ZADONSK]
RENQIU [WUMISHAN]
VEGA [SIRACUSA]
VERKHNECHONA [DANILOV (PREOBRAZHEN HZ)]
VERKHNEVILYUY [YURYAKH]
YANLING [WUMISHAN]
YIHEZHUANG [MAJIAGOU-BADOU]
Comparing the results obtained with both methods, we find that all the analogous reservoirs identified by the Hierarchical Ward method are also identified by the Two Step method, except reservoir NAGYLENGYEL. Moreover, the Two Step method identifies five new analogous reservoirs that were not identified by the Hierarchical method: ARDMORE, AUK, MARKOVO, VERKHNECHONA and VERKHNEVILYUY. In this study, the criterion to define Casablanca's analogous reservoirs after cluster analysis has been to choose “all those analogous reservoirs identified in at least one of the two clustering methods used”. The final list of Casablanca's analogous reservoirs after cluster analysis is in Table 7.
Table 7. Casablanca's analogous reservoirs obtained after cluster analysis.
AMPOSTA MARINO [MONTSIA]
ARDMORE [ZECHSTEIN (HALIBUT-TURBOT)]
AUK [ZECHSTEIN (HALIBUT)]
BLACKBURN [NEVADA]
CASABLANCA
EAGLE SPRINGS [SHEEP PASS]
GELA [TAORMINA]
GRANT CANYON [GUILMETTE]
LIUBEI [WUMISHAN]
MARKOVO [OSA]
MATZEN [HAUPTDOLOMITE-BOCKFLIESS BEDS]
NAGYLENGYEL [MAIN DOLOMITE (HAUPTDOLOMITE)]
RAGUSA [TAORMINA]
RECHITSA [SEMILUKI]
RECHITSA [VORONEZH]
RECHITSA [ZADONSK]
RENQIU [WUMISHAN]
VEGA [SIRACUSA]
VERKHNECHONA [DANILOV (PREOBRAZHEN HZ)]
VERKHNEVILYUY [YURYAKH]
YANLING [WUMISHAN]
YIHEZHUANG [MAJIAGOU-BADOU]
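The “at least one of the two clustering methods” criterion is a set union. The sketch below reproduces it with the reservoir names of Table 6 (formation names are shortened for readability). The Ward membership is not listed here as a table, so it is reconstructed from the comparison stated in the text (the Two Step list minus its five new reservoirs, plus NAGYLENGYEL); that reconstruction is an assumption of this example.

```python
# Two Step cluster containing Casablanca (Table 6, formation names shortened).
two_step = {
    "AMPOSTA MARINO", "ARDMORE", "AUK", "BLACKBURN", "CASABLANCA",
    "EAGLE SPRINGS", "GELA", "GRANT CANYON", "LIUBEI", "MARKOVO", "MATZEN",
    "RAGUSA", "RECHITSA [SEMILUKI]", "RECHITSA [VORONEZH]",
    "RECHITSA [ZADONSK]", "RENQIU", "VEGA", "VERKHNECHONA",
    "VERKHNEVILYUY", "YANLING", "YIHEZHUANG",
}

# Differences between the two methods, as stated in the text.
only_two_step = {"ARDMORE", "AUK", "MARKOVO", "VERKHNECHONA", "VERKHNEVILYUY"}
only_ward = {"NAGYLENGYEL"}

# Assumption: Ward cluster reconstructed from the stated differences.
ward = (two_step - only_two_step) | only_ward

# Keep every reservoir found by at least one method (Table 7).
combined = ward | two_step

# Candidate analogues exclude the target reservoir itself.
analogues = sorted(combined - {"CASABLANCA"})

print(len(ward), len(two_step), len(combined), len(analogues))
```

The sizes agree with the text: 17 reservoirs for Ward, 21 for Two Step, and 22 entries in Table 7, of which 21 are candidates other than Casablanca.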
To check whether the method developed in this study is valid, the results obtained are compared with the company's previously known analogous reservoirs for Casablanca:
• Amposta Marino
• Nagylengyel
• Yanling
• Renqiu
As can be seen, the four known analogous reservoirs have also been identified as analogous reservoirs by the method developed in this study. In addition, a similarity function was applied to all of Casablanca's analogous reservoirs obtained after cluster analysis, to rank them by similarity to the Casablanca reservoir. The similarity index and similarity ranking of Casablanca's analogous reservoirs are given in Table 8, and Fig. 9 shows their geographical distribution.
Table 8. Similarity index and ranking of Casablanca's analogous reservoirs obtained with the method proposed in this work.
RESERVOIR                                         SIMILARITY INDEX    SIMILARITY RANKING
AMPOSTA MARINO [MONTSIA] *                        0.85                1
RECHITSA [VORONEZH]                               0.75                2
YIHEZHUANG [MAJIAGOU-BADOU]                       0.75                2
RECHITSA [SEMILUKI]                               0.74                3
LIUBEI [WUMISHAN]                                 0.74                3
MATZEN [HAUPTDOLOMITE-BOCKFLIESS BEDS]            0.73                4
RENQIU [WUMISHAN] *                               0.72                5
YANLING [WUMISHAN] *                              0.72                5
RECHITSA [ZADONSK]                                0.71                6
AUK [ZECHSTEIN (HALIBUT)]                         0.68                7
BLACKBURN [NEVADA]                                0.68                7
GRANT CANYON [GUILMETTE]                          0.67                8
NAGYLENGYEL [MAIN DOLOMITE (HAUPTDOLOMITE)] *     0.66                9
ARDMORE [ZECHSTEIN (HALIBUT-TURBOT)]              0.66                9
VEGA [SIRACUSA]                                   0.63                10
EAGLE SPRINGS [SHEEP PASS]                        0.61                11
RAGUSA [TAORMINA]                                 0.59                12
GELA [TAORMINA]                                   0.57                13
MARKOVO [OSA]                                     0.57                13
* Previously known analogue of Casablanca (set in bold in the original table).
Figure 9. Geographical distribution of the Casablanca reservoir and its analogous reservoirs.
The method proposed in this work yields the following results:
• 19 analogous reservoirs were finally obtained.
• The 4 previously known reservoirs analogous to Casablanca were also identified.
• Among the 4 previously known analogous reservoirs, AMPOSTA MARINO is the most similar (first in the ranking), RENQIU and YANLING are 5th, and NAGYLENGYEL is 9th. These are in bold in Table 8 above.
• The similarity index ranges from 0.57 to 0.85.
• Only those reservoirs with a similarity index greater than 0.5 are ultimately identified as analogous reservoirs (note that 2 of the 21 originally selected reservoirs were left out because they had a similarity index of only 0.4: VERKHNECHONA and VERKHNEVILYUY).
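The filtering and ranking step summarized above can be reproduced from the similarity indices of Table 8. The sketch below applies the 0.5 cut-off and then assigns a dense ranking, in which tied reservoirs share a rank and the next distinct value takes the next rank. Formation names are shortened, and the two excluded reservoirs are entered with the value 0.4 quoted in the text. The variable names are illustrative.

```python
# Similarity to Casablanca (Table 8, plus the two reservoirs quoted at 0.4).
similarity = {
    "AMPOSTA MARINO": 0.85, "RECHITSA [VORONEZH]": 0.75, "YIHEZHUANG": 0.75,
    "RECHITSA [SEMILUKI]": 0.74, "LIUBEI": 0.74, "MATZEN": 0.73,
    "RENQIU": 0.72, "YANLING": 0.72, "RECHITSA [ZADONSK]": 0.71,
    "AUK": 0.68, "BLACKBURN": 0.68, "GRANT CANYON": 0.67,
    "NAGYLENGYEL": 0.66, "ARDMORE": 0.66, "VEGA": 0.63,
    "EAGLE SPRINGS": 0.61, "RAGUSA": 0.59, "GELA": 0.57, "MARKOVO": 0.57,
    "VERKHNECHONA": 0.4, "VERKHNEVILYUY": 0.4,
}

THRESHOLD = 0.5  # only reservoirs above this value are kept as analogues

retained = {name: s for name, s in similarity.items() if s > THRESHOLD}

# Dense ranking: one rank per distinct similarity value, highest first.
levels = sorted(set(retained.values()), reverse=True)
rank_of_level = {level: pos for pos, level in enumerate(levels, start=1)}
rank = {name: rank_of_level[s] for name, s in retained.items()}

for name in sorted(retained, key=lambda n: (rank[n], n)):
    print(f"{rank[name]:>2}  {retained[name]:.2f}  {name}")
```

With these inputs, 19 reservoirs are retained and the ranks match the last column of Table 8.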
Summary and Conclusions
Development of the statistical method for the identification of analogous reservoirs
Using statistical criteria, we evaluated the existing data in the master database. In this way, outliers were detected, analyzed and corrected. Using a statistical imputation procedure, all the missing data in the master database were estimated, which allowed us to work with a 100% complete database.
We developed a systematic statistical method for the identification of analogous reservoirs. This method is based on principal component analysis, using as input data the KP in the carbonate database, followed by cluster generation using as input data the PCs obtained in the previous step. The clustering methods used are Hierarchical Ward and Two Step. With each method and for every target reservoir, we generated the necessary clusters to obtain a “natural” grouping of reservoirs.
Additionally, a similarity index was calculated for every analogous reservoir, based on the mean distance between the set of properties of that reservoir and those of the target reservoir. This similarity index allows us to generate a similarity ranking of all the analogous reservoirs inside the cluster to which the target reservoir belongs.
Validation of the developed method
We validated the new method using the Casablanca reservoir as the target, because it is a carbonate reservoir that is very well known inside the company and because a set of 4 analogous reservoirs had previously been identified by the business unit team after an independent analysis of the properties of Casablanca.
The developed method was successfully applied in this case, obtaining 19 analogous reservoirs ranked by their similarity to the Casablanca reservoir. The maximum similarity found was 85% for the Amposta Marino reservoir.
This new method provides a systematic procedure for the identification of analogous reservoirs, with the following advantages:
• Results are obtained in an objective manner. They are unaffected by personal criteria or lack of previous experience.
• The method allows the simultaneous application of several similarity concepts. This is an improvement over tools that are based on the successive application of filters.
• The method allows estimation of missing parameters in the target reservoir, based on the information provided by the identified analogous reservoirs.
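As an illustration of the last advantage, one simple way to estimate a missing target parameter is a similarity-weighted mean of the values reported for the analogues. This is only a sketch under that assumption; the paper does not state which estimator is used, and the function name and all numerical values below are hypothetical.

```python
def estimate_from_analogues(analogues):
    """Similarity-weighted mean of a property over the analogous reservoirs.

    analogues: list of (similarity_index, value) pairs; pairs whose value is
    None (property not reported for that analogue) are ignored.
    """
    known = [(s, v) for s, v in analogues if v is not None]
    if not known:
        raise ValueError("no analogue reports this property")
    total_weight = sum(s for s, _ in known)
    return sum(s * v for s, v in known) / total_weight


# Hypothetical (similarity index, porosity in %) pairs for three analogues.
porosity_of_analogues = [(0.8, 10.0), (0.6, 17.0), (0.7, None)]
estimated_porosity = estimate_from_analogues(porosity_of_analogues)
print(round(estimated_porosity, 2))
```

More similar analogues pull the estimate toward their own value, which is the intended effect of weighting by the similarity index.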
The analogous reservoirs found with this method depend on the Key Parameters (KP) used: their number, their type and their weighting. A different set of KP, or different weights, will inevitably give a different classification. A very important step in this kind of analysis is therefore the proper definition of the KP to be included and of their weights.
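The sensitivity to weighting can be illustrated with a small sketch. It assumes a Gower-type weighted similarity (Gower 1971): numeric properties score one minus the absolute difference divided by the property range, categorical properties score one for a match and zero otherwise, and the scores are averaged with the KP weights. This is a stand-in for illustration, not necessarily the exact similarity function used in this work, and the reservoirs, ranges and weights below are hypothetical.

```python
def weighted_similarity(target, candidate, ranges, weights):
    """Gower-type similarity in [0, 1] between a candidate and the target."""
    total = 0.0
    weight_sum = 0.0
    for key, weight in weights.items():
        a, b = target[key], candidate[key]
        if key in ranges:
            # Numeric property: 1 minus the range-normalized difference.
            score = max(0.0, 1.0 - abs(a - b) / ranges[key])
        else:
            # Categorical property: exact match or no match.
            score = 1.0 if a == b else 0.0
        total += weight * score
        weight_sum += weight
    return total / weight_sum


# Hypothetical reservoirs and database-wide ranges of the numeric properties.
target = {"porosity": 10.0, "depth": 2500.0, "lithology": "dolomite"}
res_a = {"porosity": 12.0, "depth": 2600.0, "lithology": "limestone"}
res_b = {"porosity": 18.0, "depth": 3400.0, "lithology": "dolomite"}
ranges = {"porosity": 20.0, "depth": 2000.0}

equal = {"porosity": 1.0, "depth": 1.0, "lithology": 1.0}
numeric_heavy = {"porosity": 3.0, "depth": 3.0, "lithology": 1.0}

for label, w in (("equal", equal), ("numeric-heavy", numeric_heavy)):
    sim_a = weighted_similarity(target, res_a, ranges, w)
    sim_b = weighted_similarity(target, res_b, ranges, w)
    print(f"{label}: A={sim_a:.2f}  B={sim_b:.2f}")
```

With equal weights the hypothetical reservoir B ranks above A because its lithology matches the target; when porosity and depth are weighted more heavily, the order is reversed.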
Acknowledgements
The authors would like to thank C&C Reservoirs for granting permission to publish some of the results obtained in the example shown in this paper.
References
1. Aminzadeh, F. 2005. Applications of AI and soft computing for challenging problems in the oil industry. Journal of Petroleum Science and Engineering.
2. César Pérez. 2004. Técnicas de Análisis Multivariante de Datos. Prentice Hall.
3. C&C Reservoirs Limited. 2013. C&C Reservoirs DAKS, http://www.ccreservoirs.com/ (accessed 5 April 2013).
4. Daniel Peña. 2002. Análisis de Datos Multivariantes. McGraw-Hill.
5. Harrell, D.R. and Hodgin, J.E. 2004. Oil and gas estimates: recurring mistakes and errors. SPE-91069.
6. Hodgin, J.E. and Harrell, D.R. 2006. The selection, application, and misapplication of reservoir analogs for the estimation of petroleum reserves. SPE-102505.
7. Gower, J.C. 1971. A general coefficient of similarity and some of its properties. Biometrics 27, 857-74.
8. Klemme, H.D. 1980. Petroleum basins: classification and characteristics. Journal of Petroleum Geology, Vol. 3, No. 2, 187-207.
9. Orlopp, D.E. 1988. Casablanca Oilfield, Spain: a karsted carbonate trap at the shelf edge. Proceedings of the Offshore Technology Conference, OTC 5734, 441-448.
10. Lomando, A.J., Harris, P.M., Orlopp, D.E. 1993. Casablanca Field, Tarragona Basin, Offshore Spain: A karsted carbonate reservoir. In: Fritz, R.D., Wilson, J.L., Yurewicz, D.A. (eds.). Paleokarst related hydrocarbon reservoirs, SEPM Core Workshop, 18, 201-225.
11. Sharma, S. Srinivasan and Larry W. Lake. 2010. Classification of oil and gas reservoirs based on recovery factor: a data-mining approach. SPE-130257.
12. Sidle, R.E. and Lee, W.J. 2010. An update on the use of reservoir analogs for the estimation of oil and gas reserves. SPE Economics & Management.
13. Tan, P., Steinbach, M. and Kumar, V. 2006. Introduction to Data Mining. Addison-Wesley.
14. Walpole, Myers & Ye. 2012. Probability and Statistics for Engineers and Scientists. Prentice-Hall.