Agreement Between Two Datasets

Bland and Altman first proposed the limits of agreement (LoA) method more than 30 years ago in their 1986 paper [5] as an alternative to correlation-based methods, which they argued did not accurately characterize agreement [19]. The 95% limits of agreement are calculated simply as m ± 2 SD, where m is the mean of the paired differences (e.g., differences in respiratory rate measured simultaneously on the same participant by two different devices) and SD is the standard deviation of those differences. The limits of agreement are intended to quantify the dispersion of the between-device differences: the wider the limits, the more the devices' measurements are expected to differ, indicating poorer agreement between devices. To formally assess agreement, the limits are compared with a clinically acceptable difference (CAD): a range within which differences are considered practically negligible. If the limits of agreement fall within the CAD range, it is concluded that the devices agree and can be used interchangeably. The CAD should be specified before the data are analyzed to avoid biasing the decision, although the statistical validity of the method does not require this.
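As a concrete illustration of the m ± 2 SD formula, the following sketch computes the limits of agreement for a small set of hypothetical paired respiratory-rate readings (the device values are invented for illustration; note that 1.96, rather than 2, is the exact 95% normal quantile):

```python
import numpy as np

def limits_of_agreement(x, y, k=2.0):
    """Bland-Altman limits of agreement for paired measurements x, y.

    Returns (mean difference m, lower limit, upper limit) using
    m +/- k * SD, with k = 2 as in the text (k = 1.96 is exact).
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    m = d.mean()
    sd = d.std(ddof=1)  # sample SD of the paired differences
    return m, m - k * sd, m + k * sd

# Hypothetical paired respiratory-rate readings from two devices
dev1 = [16, 18, 20, 22, 17, 19, 21, 18]
dev2 = [15, 19, 21, 21, 18, 18, 22, 17]
m, lo, hi = limits_of_agreement(dev1, dev2)
```

Plotting the differences against the pairwise means, with horizontal lines at m, lo, and hi, yields the familiar Bland-Altman plot.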

The limits of agreement are usually displayed on a Bland-Altman plot of the paired differences against the means of the paired measurements. As mentioned above, correlation is not synonymous with agreement. Correlation refers to the existence of a relationship between two variables, whereas agreement assesses the concordance between two measurements of the same variable. Two sets of observations may be strongly correlated yet agree poorly; however, if two sets of values agree, they will necessarily be strongly correlated. For example, in the hemoglobin example the correlation coefficient between the values of the two methods is high (r = 0.98), although the agreement is poor [Figure 2]. Another way of looking at it is that, although the data points lie close to the dotted line (the least-squares line [2]), indicating good correlation, they are quite far from the solid black line representing the line of perfect agreement (Figure 2, solid black line). If there were good agreement, the points would fall on or near this line. For our COPD example, to model the difference between the two devices' measurements when subject i performs activity l at time t, i.e., D_ilt = Y_i2lt − Y_i1lt, we model these differences by a linear mixed-effects model [6].

[6] Lin L, Hedayat AS, Sinha B, Yang M. Statistical methods in assessing agreement: models, issues, and tools. J Am Stat Assoc. 2002;97(457):257–70.
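The distinction between correlation and agreement can be demonstrated numerically. In this sketch (with entirely synthetic data), method B reads systematically higher than method A, so the correlation is near perfect while the mean difference, and hence the agreement, is poor:

```python
import numpy as np

# Synthetic example: method B has proportional and constant bias
# relative to method A, so r is near 1 but agreement is poor.
rng = np.random.default_rng(42)
a = rng.uniform(8.0, 15.0, size=50)           # e.g. hemoglobin, g/dL
b = 1.1 * a + 2.0 + rng.normal(0, 0.2, 50)    # biased second method

r = np.corrcoef(a, b)[0, 1]                   # correlation: very high
mean_diff = np.mean(b - a)                    # bias: far from zero
```

On a scatter plot, these points would hug their least-squares line (good correlation) while sitting well away from the identity line b = a (poor agreement), exactly the pattern described above.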

The limits of agreement approach asks whether the differences between the devices are, on average, small enough to be considered clinically acceptable. This is determined by assessing whether the limits of agreement fall within the range of clinically acceptable differences. The coverage probability (CP) proposed by Lin et al. [6] answers the same question more directly by calculating the probability that the difference between the devices lies within a tolerance interval, which is what Bland and Altman call the range of clinically acceptable differences. Higher coverage probabilities indicate closer agreement. In practice, the researcher must decide whether the CP value is large enough for the two devices to be used interchangeably. Applying the limits of agreement method to the COPD data using the mixed-effects model (3), we estimated a mean bias of −1.60 (95% LoA: −11.57 to 8.38).
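Under an assumed normal model for the between-device differences (an assumption added here for illustration), the coverage probability has a closed form in the mean bias and SD. In this sketch the SD is back-calculated from the reported 95% LoA as (8.38 − (−11.57))/4 ≈ 4.99, and the tolerance δ = 10 is a hypothetical CAD, not a value from the text:

```python
from math import erf, sqrt

def coverage_probability(mean_diff, sd_diff, delta):
    """P(|D| < delta) for D ~ N(mean_diff, sd_diff**2): the probability
    that a between-device difference falls inside the clinically
    acceptable range (-delta, delta)."""
    def phi(z):  # standard normal CDF
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))
    return phi((delta - mean_diff) / sd_diff) - phi((-delta - mean_diff) / sd_diff)

# COPD example: mean bias -1.60; SD ~4.99 back-calculated from the
# 95% LoA (-11.57, 8.38); delta = 10 is a hypothetical CAD.
cp = coverage_probability(-1.60, 4.99, 10.0)
```

The researcher would then judge whether a CP of this size is large enough to treat the two devices as interchangeable.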