Methods of Establishing Reliability in Psychology
Table 1. Methods of Establishing Reliability in Psychology with Brief Descriptions.
| Method of Establishing Reliability | Description |
| --- | --- |
| Alternate forms reliability | This approach checks the consistency of the results of alternate versions of one instrument. High consistency supports the conclusion that the instrument and its variants are reliable (Royal & Hecker, 2016). |
| Inter-rater reliability | This technique involves two or more well-trained raters (judges) who apply the same precise rating criteria. The observations or other accounts of the raters are compared to check their accuracy. They can be considered accurate (or reliable) if the studied phenomenon is rated similarly by different raters. The method allows for establishing internal reliability (Eysenck & Banyard, 2017). An index that is typically connected to inter-rater reliability is Cohen's kappa (Carr & McNulty, 2016). |
| Internal consistency reliability | This approach to determining internal consistency checks whether the items of an instrument are intercorrelated and connected to the same studied phenomenon. A common index used with this method is Cronbach's alpha (Carr & McNulty, 2016; Katsogridaki et al., 2018). |
| Split-half technique | This technique tests internal reliability by comparing the scores of one half of an instrument's items to those of the other half. If the scores of the two halves are highly similar, the internal consistency of the instrument is supported (Eysenck & Banyard, 2017). |
| Test-retest reliability | This technique checks external reliability by administering an instrument more than once. If the results of subsequent administrations are consistent with those of the first, the reliability of the instrument is supported (Royal & Hecker, 2016). |
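Two of the indices named in the table can be illustrated numerically. The following is a minimal sketch in Python with NumPy, using invented Likert-type scores (not data from any of the cited studies), that computes Cronbach's alpha and a Spearman-Brown-corrected split-half coefficient:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def split_half(items):
    """Split-half reliability: correlate odd- and even-item half scores,
    then apply the Spearman-Brown correction for full test length."""
    items = np.asarray(items, dtype=float)
    half1 = items[:, ::2].sum(axis=1)   # odd-numbered items
    half2 = items[:, 1::2].sum(axis=1)  # even-numbered items
    r = np.corrcoef(half1, half2)[0, 1]
    return 2 * r / (1 + r)              # Spearman-Brown prophecy formula

# Hypothetical data: 6 respondents x 4 Likert items (scores 1-5)
scores = [[4, 5, 4, 5],
          [2, 2, 3, 2],
          [5, 4, 5, 4],
          [3, 3, 3, 4],
          [1, 2, 1, 2],
          [4, 4, 5, 5]]
print(round(cronbach_alpha(scores), 3))
print(round(split_half(scores), 3))
```

Because the invented items move together across respondents, both coefficients come out high; with inconsistent items, both would drop toward zero.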
Types of Validity Used for Boholst’s Life Position Scale
Boholst’s (2002) Life Position Scale (LPS) was initially constructed because no alternative measure of the studied phenomenon could be found. The author tested the scale for reliability and validity, but more research was required to establish LPS as a valid instrument. LPS did not attract much attention from researchers, but it was still reviewed several times. The present paper will study the articles related to LPS to analyze the types of validity that have been of interest to the scientists who developed or reviewed LPS.
To put it simply, validity refers to the ability of a psychological instrument to actually measure the phenomenon that it is supposed to measure. There are a number of different types of validity, including content, construct, and criterion validity (Goodwin & Goodwin, 2016; MacIntire & Miller, 2015; Newton & Shaw, 2014). In addition, there is the so-called face validity, which refers to the ability of an instrument to appear valid; according to Goodwin and Goodwin (2016), this approach to validity is needed to ensure that the respondents perceive the instrument as appropriate and take the task of completing it seriously. The present paper will focus on the types of validity that were employed to test LPS and will discuss them in detail.
The following information about LPS is noteworthy. LPS is a scale that consists of 20 items that represent a person’s life position. The construct is measured by the combinations of the categories “I am OK, I am not OK” and “You are OK, you are not OK.” Basically, these categories represent a person’s assessment of oneself and the people around him or her (Boholst, 2002). The items of LPS consist of specific statements, for example, “I like myself” or “I distrust people.” The statements are paired with a five-point Likert scale ranging from “all of the time” to “never.” The scores for I-statements and You-statements are to be computed separately. The scale was tested with healthy participants, but Boholst (2002) encouraged retesting the scale on different populations as well. Indeed, in the first article dedicated to LPS, Boholst (2002) proposed plans for future research that could help to establish the validity of the scale, including the testing of LPS with populations that can be theorized to have a particular life position. However, this type of testing was not brought up in the rest of the articles that are studied in this paper.
The First Article
The first article introduced LPS and discussed the process of its development. In particular, Boholst (2002) analyzed the construct of life position, demonstrating the reasoning behind the creation of LPS. LPS was, as a result, developed to reflect the four major elements of life positions, and its content was matched to the specifics of the phenomenon that was being analyzed. From the perspective of validity, it can be suggested that content validity could be established this way: as described by Goodwin and Goodwin (2016), this type of validity is usually associated with the creation of instruments because it presupposes choosing the wording and content of items to correspond to the construct being measured. Thus, while Boholst (2002) did not explicitly mention content validity, there is some evidence of it being considered during the development of LPS.
Furthermore, in the same article, Boholst (2002) performed a factor analysis. As pointed out by Boyacı and Atalay (2016) and Isgor, Kaygusuz, and Ozpolat (2012), factor analysis is an approach to establishing validity, including construct validity. This type of validity refers to the ability of an instrument to accurately measure a construct, which in the present case is the construct of life position as measured by LPS (Goodwin & Goodwin, 2016). Boholst’s (2002) factor analysis grouped the I-statements and You-statements into two separate factors, indicating that LPS was indeed capable of measuring life positions, which suggests that construct validity was also touched upon by the author. However, Boholst (2002) did not mention this type of validity explicitly.
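The logic of such a factor analysis can be sketched with simulated data (Python with NumPy; the latent attitudes, item loadings, and sample below are invented for illustration and do not reproduce Boholst's data). Items written to reflect two latent attitudes should yield two dominant factors in the eigendecomposition of their correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
self_factor = rng.normal(size=n)    # hypothetical latent "I am OK" attitude
other_factor = rng.normal(size=n)   # hypothetical latent "You are OK" attitude
noise = lambda: rng.normal(scale=0.5, size=n)

# Four invented items: two load on each latent attitude
items = np.column_stack([
    self_factor + noise(),   # an I-statement
    self_factor + noise(),   # another I-statement
    other_factor + noise(),  # a You-statement
    other_factor + noise(),  # another You-statement
])

# Eigenvalues of the item correlation matrix: two large values
# signal two factors, mirroring the I/You two-factor structure
corr = np.corrcoef(items, rowvar=False)
eigvals = np.linalg.eigh(corr)[0][::-1]  # sorted descending
print(np.round(eigvals, 2))
```

With this setup the first two eigenvalues each come out well above 1 and the remaining two fall near 0.2, so a conventional eigenvalue-greater-than-one rule would retain exactly two factors.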
The primary approach to the scale’s validation that Boholst (2002) proposed consisted of testing it and comparing it to phenomenological reports. According to Boholst (2002), upon the practical use of LPS during workshops, the results “seem to have been validated by workshop participants’ phenomenological reports” (p. 31). This approach seems to correspond to concurrent validity, which is a type of criterion validity (Goodwin & Goodwin, 2016). Criterion validity can be described as the ability of an instrument’s results to correspond to a specific behavior or well-established criterion. In the case of the first article, Boholst (2002) compared LPS to another measure (phenomenological reports), which implies that concurrent validity was employed by the author. Since Boholst (2002) explicitly named this comparison as a method of validating the scale, it can be suggested that the use of concurrent validity was intentional.
It should also be noted that the reliability of LPS was also tested in the first article with the help of Pearson’s r (Boholst, 2002), which is one of the methods of establishing reliability (Royal & Hecker, 2016). However, the present paper focuses on validity, which is why the conclusions drawn here concern the validity of LPS. Thus, the first article considered concurrent validity explicitly while also touching upon content and construct validity. Overall, Boholst (2002) worked to establish LPS as a valid and reliable instrument.
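As a brief illustration of the test-retest logic mentioned above, Pearson's r can be computed over two administrations of the same instrument. The sketch below uses invented total scores for eight respondents (not data from Boholst's study):

```python
import numpy as np

# Hypothetical total scores from a first test and a later retest
test1 = np.array([72, 65, 80, 55, 60, 75, 68, 58], dtype=float)
test2 = np.array([70, 66, 78, 57, 62, 77, 65, 60], dtype=float)

# Pearson's r: the covariance of the two administrations divided by
# the product of their standard deviations
r = np.corrcoef(test1, test2)[0, 1]
print(round(r, 3))
```

Because each respondent's retest score stays close to the original, r comes out near 1, which is the pattern a test-retest reliability check looks for.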
The Second Article
Boholst, Boholst, and Mende (2005) revisited LPS by comparing life positions with another construct: attachment prototypes. The authors described the attachment theory as a well-researched one, and the attachment prototypes as observable in children, adolescents, and adults. As a result, Boholst et al. (2005) intended to compare the two constructs to gain a better understanding of the construct of life positions, which they considered to be understudied. According to Boholst et al. (2005), attachment prototypes could be viewed as similar to life positions in that they described positive and negative beliefs about oneself and others. The methodology of Boholst et al. (2005) employed LPS, as well as the relevant instruments for attachment prototypes. The correlational analysis demonstrated that each attachment prototype, with the exception of preoccupied attachment, correlated with relevant life positions.
Since the article compares two constructs that are shown to be either similar or mostly the same, the type of validity that is being considered is convergent validity, which is a form of construct validity. Basically, the approach consists of comparing the scores of the tests for two theoretically comparable or related constructs (Goodwin & Goodwin, 2016), which is what Boholst et al. (2005) did. Thus, Boholst et al. (2005) tested the validity of LPS by comparing its scores to those measuring another, similar phenomenon.
Furthermore, it appears that the article can be viewed as another attempt at establishing concurrent validity (Goodwin & Goodwin, 2016; MacIntire & Miller, 2015; Newton & Shaw, 2014). Indeed, LPS’s results were compared to those of the instruments that were supposed to measure attachment prototypes, which constitute a construct comparable to life positions. In addition, the measures used by the authors for attachment prototypes were well-established, which is also a requirement for establishing concurrent validity (Goodwin & Goodwin, 2016). Thus, the effort can be argued to resemble a test for criterion validity. As pointed out by Goodwin and Goodwin (2016), criterion and construct validity are generally connected, and the former can help to establish the latter, which the present example supports. Thus, the second article tested both construct and criterion validity.
The Third Article
Several years later, Isgor et al. (2012) revisited LPS to analyze its reliability, validity, and language. It should be pointed out that the authors considered the English and Turkish variants of the instrument because they developed the latter. The need to validate the new version of LPS explains their effort to consider the language of LPS and justifies their intent to revisit the scale (Özsoy, Rauthmann, Jonason, & Ardıç, 2017). The article establishes the Turkish LPS as a valid and reliable instrument.
It can be concluded that the Turkish version of LPS was studied very thoroughly; in general, the third article contains more evidence of deliberate testing of validity and reliability than the other articles. The development of the Turkish LPS involved face validity testing both with participants (students) and the developer of the English version (Boholst). Furthermore, the scale was tested for item distinctiveness and internal consistency; also, it was retested several times to establish its reliability. As for validity, construct and concurrent validity were considered by Isgor et al. (2012). The present paper will focus on validity testing.
As pointed out by Isgor et al. (2012), the primary aim of the face validity procedures was to ensure that the form was comprehensible and compatible with LPS while also being adjusted to the cultural norms of Turkish people. Additionally, the authors wanted their translation to be approved by the scale’s developer, and they received a positive response from Boholst. According to Goodwin and Goodwin (2016), this type of validity is not as valuable as other approaches since it does not involve any actual testing. However, it is appropriate for achieving the goals mentioned by Isgor et al. (2012). Furthermore, it can be suggested that in checking the new version of the scale with Boholst, the authors attempted to verify the content of the translation, which can be connected to content validity testing.
Factor analyses (exploratory and confirmatory ones) were employed by Isgor et al. (2012) to determine the construct validity of the scale. The results indicated that all the items of the scale could be gathered into four factors that measured their respective subscales (I- or U-scales and OK- or Not-OK-scales) and had “acceptable concurrent statistic” (p. 289). Thus, the authors managed to demonstrate the ability of the scale to measure the construct of interest. As for concurrent validity, the authors used the original LPS and compared its results to those of their Turkish version, demonstrating that the scores were similar. According to the authors, the variation was explained by the differences in the presentation of the items. Thus, the authors of the third article explicitly intended to test the criterion validity of their version of LPS and implicitly considered its content validity.
LPS is a scale that was developed to measure the construct of life position, which has not received much attention from researchers. As a result, the scale itself was not tested very extensively either. However, its author, as well as other researchers, have established it as a valid instrument over the years. The first article dedicated to LPS did not cover the validity of the scale explicitly, with the exception of a mention of concurrent validity. However, the article does bear evidence of checks of content and construct validity. The second work about LPS was dedicated to comparing its measures to those of a similar (if not identical) construct, which helped establish the construct validity of the instrument. Finally, the third work was dedicated to testing the construct validity, including concurrent validity, of a Turkish version of LPS.
As a result, the Turkish version of the scale received better coverage and proof of its validity, as well as reliability, than the initial variant. Still, the results of all the above-described tests support LPS as a valid instrument. Additionally, Boholst (2002) offered several ideas on how to establish the validity of the scale by recruiting different populations for its testing. Thus, future research may proceed to review LPS and its validity and reliability.
Boholst, F. (2002). A life position scale. Transactional Analysis Journal, 32(1), 28-32. Web.
Boholst, F., Boholst, G., & Mende, M. (2005). Life positions and attachment styles: A canonical correlation analysis. Transactional Analysis Journal, 35(1), 62-67. Web.
Boyacı, Ş., & Atalay, N. (2016). A scale development for 21st century skills of primary school students: A validity and reliability study. International Journal of Instruction, 9(1), 133-148. Web.
Carr, A., & McNulty, M. (2016). The handbook of adult clinical psychology (2nd ed.). New York, NY: Routledge.
Eysenck, M., & Banyard, P. (2017). A2 level psychology. New York, NY: Routledge.
Goodwin, C., & Goodwin, K. (2016). Research in psychology. New York, NY: John Wiley & Sons.
Isgor, I., Kaygusuz, C., & Ozpolat, A. (2012). Life positions scale language equivalence, reliability and validity analysis. Procedia – Social and Behavioral Sciences, 47, 284-291. Web.
Katsogridaki, G., Zacharoulis, D., Galanos, A., Sioka, E., Zachari, E., & Tzovaras, G. (2018). Validation of the suter questionnaire after laparoscopic sleeve gastrectomy in the Greek population. Clinical Nutrition ESPEN, 28, 153-157. Web.
MacIntire, S., & Miller, L. (2015). Foundations of psychological testing (5th ed.). Boston, MA: McGraw-Hill.
Newton, P., & Shaw, S. (2014). Validity in educational and psychological assessment. New York, NY: SAGE.
Özsoy, E., Rauthmann, J., Jonason, P., & Ardıç, K. (2017). Reliability and validity of the Turkish versions of Dark Triad Dirty Dozen (DTDD-T), Short Dark Triad (SD3-T), and Single Item Narcissism Scale (SINS-T). Personality and Individual Differences, 117, 11-14. Web.
Royal, K., & Hecker, K. (2016). Understanding reliability: A review for veterinary educators. Journal of Veterinary Medical Education, 43(1), 1-4. Web.