Board Report: 2015-MO-B-006 March 31, 2015

The Board Can Enhance Its Diversity and Inclusion Efforts

Appendix E: External Consulting Firm's Statistical Analysis of the Board's FY 2011, FY 2012, and FY 2013 Performance Ratings

An Analysis of Gender, Race, and Age Differences in Performance Ratings of FRB Employees: 2011-2013

October 20, 2014

 

Prepared By:

DCI CONSULTING GROUP
1920 I ST NW,
WASHINGTON, DC 20006
(202) 828-6900

Prepared For:

OFFICE OF INSPECTOR GENERAL
BOARD OF GOVERNORS OF THE FEDERAL RESERVE SYSTEM
CONSUMER FINANCIAL PROTECTION BUREAU
20TH STREET AND CONSTITUTION AVENUE NW
MAIL STOP K-300
WASHINGTON, DC 20551

Executive Summary

On March 24, 2014, members of the United States House of Representatives Committee on Financial Services sent letters requesting that the Offices of Inspector General (OIGs) for seven financial regulatory agencies perform work to determine whether agency internal operations and personnel practices are systematically disadvantaging minorities and women in obtaining senior management positions. The Federal Reserve Board (FRB) was one of these agencies.

The OIGs initiated individual assignments with the overall objective of assessing agency personnel operations and other efforts to provide for equal employment opportunities, including equal opportunity for minorities and women to obtain senior management positions, and increase racial, ethnic, and gender diversity in the workforce. One element of the work was for each OIG to assemble agency wide performance appraisal data to identify performance rating distributions by gender, race/ethnicity, age, and bargaining unit status (where applicable). This report presents the methodology and results of the analyses conducted for the OIG for the Board of Governors of the Federal Reserve System (FRB).

Separate analyses were conducted on overall performance ratings administered in 2011, 2012, and 2013. These analyses were conducted to detect potential performance rating differences based on gender, race/ethnicity and age. Analyses were conducted at a number of different job levels. Both statistical significance tests (e.g., t-tests) and effect sizes (e.g., d-scores) were evaluated to determine whether differences were meaningful. Standard social science criteria (e.g., alpha = .05) were used to interpret statistical significance, and effect sizes were compared to typical results found in the personnel selection research literature.

For gender data, males and females did not differ significantly in performance ratings at any level of analysis in any of the three years.

For the agency wide race/ethnicity data, Whites were rated significantly higher than African Americans and Asians in all three years and were rated significantly higher than Hispanics in 2012 but not in 2011 or 2013.

Lastly, for age data, younger workers were rated significantly higher than older workers at the mid-level jobs for all three years.

By themselves, statistically significant group differences do not necessarily indicate discrimination. Differences in performance ratings could be due to a wide variety of factors. This report concludes with a number of measures that an agency can take to assess performance rating system content and process.

Introduction

Project Background

On March 24, 2014, members of the United States House of Representatives Committee on Financial Services sent letters requesting that the Offices of Inspector General (OIGs) for seven financial regulatory agencies perform work to determine whether agency internal operations and personnel practices are systematically disadvantaging minorities and women in obtaining senior management positions.1 The agencies include the following:

  • Federal Deposit Insurance Corporation (FDIC)
  • Board of Governors of the Federal Reserve System (FRB)
  • Consumer Financial Protection Bureau (CFPB)
  • Office of the Comptroller of the Currency (OCC)
  • Federal Housing Finance Agency (FHFA)
  • National Credit Union Administration (NCUA)
  • Securities and Exchange Commission (SEC)

The OIGs initiated individual assignments with the overall objective of assessing agency personnel operations and other efforts to provide for equal employment opportunities, including equal opportunity for minorities and women to obtain senior management positions, and increase racial, ethnic, and gender diversity in the workplace. One element of the work was for each OIG to assemble agency wide performance appraisal data to identify performance rating distributions by gender, race/ethnicity, age, and bargaining unit status (applicable to all agencies except the FRB and FHFA). The FDIC Office of Inspector General (FDIC OIG) offered to engage and fund an independent contractor to perform statistical analyses of the performance appraisal results for each agency to determine whether there are statistically significant differences between groups of interest. DCI Consulting Group was selected to conduct these analyses for each of the agencies except the SEC.

This report presents the methodology and results of the analyses conducted for the OIG for the Board of Governors of the Federal Reserve System (FRB).2

The FRB Performance Rating System

The performance management program at FRB serves as the basis for determining “pay-for-performance” amounts provided to employees. These amounts take the form of merit increases, which affect employees’ base salary and its growth over time. Performance ratings may also be considered when determining variable pay or eligibility for additional incentive programs. Employees who receive lower ratings (i.e., unsatisfactory or marginal) may not be eligible for such increases.

The distribution of performance ratings for 2011, 2012, and 2013 is depicted in Table 1. As presented, the lower the rating, the better the performance. For purposes of exposition and consistency across agencies, these codes were reverse ordered so that higher ratings reflected better performance in the results section below. For example, a rating of 5 represents extraordinary performance in the results summarized in this report. (A brief illustrative sketch of this recoding follows Table 1.)

Table 1. Distribution of Performance Ratings
Rating   Count (2011 / 2012 / 2013)   Percent (2011 / 2012 / 2013)
1 - Extraordinary 378 429 480 19.13 20.27 22.39
2 - Outstanding 712 848 956 36.03 40.08 44.59
3 - Commendable 882 829 696 44.64 39.18 32.46
4 - Marginal 4 10 12 0.20 0.47 0.56
5 - Unsatisfactory 0 0 0 0.00 0.00 0.00
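
As an illustration only, the short Python sketch below shows this kind of reverse coding, assuming the ratings are stored as the integer codes in Table 1; the column names are hypothetical and do not correspond to the actual dataset fields.

```python
# Minimal sketch of the reverse coding described above (column names are hypothetical).
# Original codes: 1 = Extraordinary ... 5 = Unsatisfactory; after recoding, 5 = Extraordinary.
import pandas as pd

ratings = pd.DataFrame({"overall_rating": [1, 2, 3, 4, 2]})  # made-up example data
ratings["rating_recoded"] = 6 - ratings["overall_rating"]    # maps 1<->5, 2<->4, 3->3
print(ratings)
```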

Method

Initial Dataset

FRB OIG provided DCI with data for 2011, 2012, and 2013. The performance period covered for each year ran from October 1 through September 30. Relevant information for each year included:

  • Performance Year
  • EEO1 Category
  • Job Function
  • Job Level
  • Salary Plan
  • Overall Rating
  • Rating Description
  • Age
  • Race/National origin
  • Gender
  • Whether the employee was 40 years of age or older

The dataset for each year included all employees who were eligible for performance ratings. Although OIG employees are rated using the FRB system, they were not included in the dataset. Neither employee name nor employee number was included in the dataset.

Data Cleaning

The first step in the data cleaning process was to remove employees in the dataset who had not been with the agency long enough (90 days) to receive a performance rating. As it turned out, no employees were removed in any of the three years.
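
For illustration, a tenure screen of this kind might look like the following sketch; the field names and example values are assumptions, and, as noted above, no FRB records were actually removed by this rule.

```python
# Hypothetical sketch of the 90-day eligibility screen (field names and values are
# assumptions; in the actual FRB data, no employees were removed by this rule).
import pandas as pd

df = pd.DataFrame({
    "days_of_service": [45, 200, 400],   # made-up tenure values
    "overall_rating":  [3, 4, 5],
})
eligible = df[df["days_of_service"] >= 90]  # keep only employees rated after 90+ days
print(eligible)
```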

Race/Ethnicity Grouping

FRB OIG provided race/ethnicity grouping for analysis. The coding scheme is presented in Table 2. If employees listed only one race/ethnicity (e.g., White, Asian), they were placed into that race/ethnicity category. Employees who listed more than one race/ethnicity (e.g., Asian and White) were placed into the category of “Two or More.”3 Employees who did not identify their race/ethnicity were included in the gender and age analyses but were omitted from the race/ethnicity analyses. (An illustrative sketch of this grouping logic follows Table 2.)

Table 2. Race/Ethnicity From Dataset and Race/Ethnicity Analysis Groups
Analysis Grouping (the race/ethnicity categories from the dataset are listed beneath each group)
White, Non-Hispanic (White)
  • White
  • White, not of Hispanic origin
  • Not Hispanic in Puerto Rico
Asian (Asian)
  • Asian
Black or African American (African American)
  • Black or African American
  • Black, not of Hispanic Origin
Hispanic or Latino (Hispanic)
  • Hispanic
  • Hispanic or Latino
  • Hispanic or Latino, American Indian or Alaska Native
  • Hispanic or Latino, Black or African American
  • Hispanic or Latino, Black or African American, White
  • Hispanic or Latino, White
Native Hawaiian or Other Pacific Islander (Native Hawaiian)
  • Native Hawaiian
  • Other Pacific Islander
American Indian or Alaska Native (American Indian)
  • American Indian
  • Alaska Native
  • American Indian/ Alaska Native
Other
  • Unknown
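
The short sketch below illustrates, in Python, the grouping logic described above and in footnote 3; the function name and the treatment of the categories as a list are simplifying assumptions and do not represent the agency's actual coding.

```python
# Illustrative sketch of the grouping rules in Table 2 and footnote 3
# (function name and exact category strings are assumptions, not the agency's code).

MAPPING = {
    "White": "White",
    "White, not of Hispanic origin": "White",
    "Not Hispanic in Puerto Rico": "White",
    "Asian": "Asian",
    "Black or African American": "African American",
    "Black, not of Hispanic Origin": "African American",
    "Native Hawaiian": "Native Hawaiian",
    "Other Pacific Islander": "Native Hawaiian",
    "American Indian": "American Indian",
    "Alaska Native": "American Indian",
    "American Indian/ Alaska Native": "American Indian",
}

def analysis_group(categories):
    """Map the race/ethnicity categories an employee listed to one analysis group."""
    if not categories or categories == ["Unknown"]:
        return "Other"
    if any(c.startswith("Hispanic") for c in categories):
        return "Hispanic"       # per footnote 3, Hispanic takes precedence over other races listed
    if len(categories) > 1:
        return "Two or More"    # no FRB employees actually fell into this group
    return MAPPING.get(categories[0], "Other")

print(analysis_group(["Asian", "White"]))               # Two or More
print(analysis_group(["Hispanic or Latino", "White"]))  # Hispanic
```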

Age Grouping

FRB OIG also provided age groupings for analysis. Employees were placed into one of two categories: under 40 or 40+. These categories were chosen to be consistent with the Age Discrimination in Employment Act (ADEA). The category placement was based on the employee’s age on the first day of the performance period for each of the three years. Table 3 depicts the race/ethnicity, gender, and age breakdown for each of the three years.
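
A minimal sketch of this under-40/40+ grouping appears below; the date-of-birth input is an illustrative assumption, since the dataset itself contained only age and a 40-or-older indicator. Table 3 then follows.

```python
# Illustrative sketch of the ADEA-based age grouping (inputs are assumptions;
# the provided dataset already contained age and a 40-or-older indicator).
from datetime import date

def age_group(date_of_birth: date, period_start: date) -> str:
    """Return 'Under 40' or '40+' based on age on the first day of the rating period."""
    age = period_start.year - date_of_birth.year - (
        (period_start.month, period_start.day)
        < (date_of_birth.month, date_of_birth.day)
    )
    return "40+" if age >= 40 else "Under 40"

# The FY 2013 performance period began October 1, 2012.
print(age_group(date(1970, 3, 15), date(2012, 10, 1)))   # 40+
print(age_group(date(1975, 12, 1), date(2012, 10, 1)))   # Under 40
```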

Table 3. Number of Employees by Gender, Race/Ethnicity, and Age
Demographic Group   2011   2012   2013
TOTAL 1,976 2,116 2,144
Gender
Female 917 960 973
Male 1,059 1,156 1,171
Race/Ethnicity
White 1,087 1,184 1,185
Black or African American 546 550 554
Asian 229 259 280
Not Specified 3 2 2
Hispanic/Latino 79 87 88
American Indian/Alaskan Native 2 2 2
Native Hawaiian/Pacific Islander 1 2 0
Other (Unknown) 29 30 33
Age
Under 40 778 881 893
40+ 1,198 1,235 1,251

Data Integrity

To ensure the integrity of the data, two consultants reviewed the initial dataset. To ensure the accuracy of the statistical analyses, the analyses were conducted twice by separate consultants using different analysis programs (i.e., SAS, SPSS, Excel, HR Equator). These separate analyses yielded identical results.

Data Analysis Methodology

The OIGs for each agency agreed that the analyses would be conducted at two levels for all agencies: overall and by bargaining unit status (where applicable). However, bargaining unit status was not a factor in the FRB data. Each agency then determined other levels of analysis that made sense for the agency. FRB OIG asked that analyses also be conducted by job level (senior managers, mid-level employees, and all other employees).

To compare the differences in the mean performance ratings across gender, race/ethnicity and age groups, tests of both statistical significance and practical significance were used.4 Tests of statistical significance indicate the probability that the group difference could have been due to chance. A statistically significant result does not imply that a difference is good or bad or that it is large or small. Instead it simply indicates that the observed difference is probably not due to chance. In contrast, measures of practical significance provide an indication of the size of the difference.

To determine if the group differences were statistically significant, t-tests were used.5 To assess statistical significance, DCI used two-tailed tests, which assess rating differences in both directions (e.g., differences that favor males as well as differences that favor females), and an alpha level of .05. Both standards are common in social science research. An alpha level of .05 indicates that the probability of a false positive (i.e., a statistically significant result when no true difference exists) is 5 percent. This threshold for identifying a statistically significant difference generally corresponds to a t-value of 1.96 (although this value may vary slightly depending on sample size). Any t-value highlighted in the results tables was statistically significant at an alpha level of .05.
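
For readers who want to see how such a test is computed, the sketch below performs a two-tailed comparison at alpha = .05 in Python with SciPy, including the equal-variance check and Welch fallback described in footnote 5. The data are fabricated for illustration, and SciPy was not among the programs DCI actually used.

```python
# Hedged sketch of the two-tailed significance test described above and in footnote 5.
# The data are fabricated; the actual analyses used SAS, SPSS, Excel, and HR Equator.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=3.9, scale=0.7, size=1185)  # e.g., ratings for one group
group_b = rng.normal(loc=3.8, scale=0.7, size=554)   # e.g., ratings for a comparison group

# Test the equal-variance assumption; if it is rejected, fall back to Welch's t-test.
equal_var = stats.levene(group_a, group_b).pvalue >= 0.05
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=equal_var)

ALPHA = 0.05  # two-tailed test at the conventional social-science threshold
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, significant: {p_value < ALPHA}")
```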

To determine practical significance, two measures were used: the percent difference between the two groups’ average ratings and d-scores. A d-score indicates the size of the difference in terms of standard deviations. That is, a d of 1.0 indicates that the two groups differed by a full standard deviation (a large effect), whereas a d of 0.10 indicates that the two groups differed by a tenth of a standard deviation (a small effect).
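
The following sketch illustrates how these two practical-significance measures can be computed; the pooled-standard-deviation formula for the d-score and the choice of denominator for the percent difference are assumptions, since the report does not specify those details.

```python
# Illustrative computation of the practical-significance measures (d-score and
# percent difference). The pooled-SD formula and the denominator choice for the
# percent difference are assumptions; the report does not spell them out.
import numpy as np

def d_score(x, y):
    """Standardized mean difference (Cohen's d) using the pooled standard deviation."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

def percent_difference(x, y):
    """Difference in mean ratings expressed as a percentage of the comparison group's mean."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return 100.0 * (x.mean() - y.mean()) / y.mean()

# Made-up example ratings (not FRB data).
a = [4, 4, 3, 5, 4, 3, 4]
b = [4, 3, 3, 4, 4, 3, 3]
print(round(d_score(a, b), 2), round(percent_difference(a, b), 1))
```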

Table 4 will be helpful in interpreting the d-scores observed for FRB. The table summarizes a combination of d-scores obtained in a meta-analysis6 by Roth, Huffcutt, and Bobko (2003)7 on racial differences, a meta-analysis by McKay and McDaniel (2006)8 on Black-White differences, a meta-analysis by Roth, Purvis, and Bobko (2012)9 on gender differences, as well as internal research conducted by DCI. Thus, Table 4 represents the gender and race/ethnicity differences that are “typically found” in studies of performance appraisal differences. There have been no meta-analyses comparing performance ratings of employees over and under 40.

Table 4. "Typical" D-Scores Found in Performance Rating Studies
Comparison   Company Wide   By Title
Male - Female -0.07 -0.08
White - Black 0.34 0.22
White - Hispanic 0.14 0.07
White - Asian 0.08 0.00

Note: Negative d-scores indicate that females have higher ratings than males. D-scores computed by title reflect average performance differences between protected-class subgroups within specific titles, rather than company wide. Thus, analyses conducted by title are more refined than analyses conducted company wide, in that employees are more similar to one another in each cross-section of employees analyzed.

Analysis Results

Gender

Table 5 presents the results of gender analyses. There were no statistically significant gender differences in average performance ratings either agency wide or within the three job levels. This pattern was observed for all three years.

Race/Ethnicity

White to African-American Comparison

As depicted in Table 6, for 2013 the average performance ratings for Whites were higher than the average performance ratings for African Americans, at a statistically significant level, when evaluating ratings agency wide. However, when the data were analyzed by job level, there were no statistically significant White-African American differences in ratings. The effect size for the agency wide difference in 2013 (d= 0.23) was smaller than the value normally found for company-wide White-African American comparisons (which is d= 0.34).

In terms of statistically significant findings at the agency wide level, the pattern for 2012 and 2011 was identical to that of 2013: the average performance ratings for Whites were higher than the average performance ratings for African Americans, at a statistically significant level. The effect sizes were d = 0.32 for 2012 and d = 0.27 for 2011, similar in magnitude to the differences reported in the research literature. The results for 2012 were also similar to those for 2013 in that there were no statistically significant differences once the data were analyzed by job level. In 2011, however, there was a statistically significant difference between the average ratings of Whites and African Americans at the all other employees job level (d = 0.15).10

White to Hispanic Comparison

As depicted in Table 7, there were two statistically significant differences in the average performance ratings of Whites and Hispanics across all years and comparisons. In 2013, there was a statistically significant difference in favor of Hispanics at the senior manager level. This effect was in the opposite direction of what is typically found in the literature and was large (d = -0.77). However, this comparison included 247 Whites and only 7 Hispanics, so the result should be interpreted with caution.

Table 5. Analysis Results - Gender Comparison
Year/Unit of Analysis   Count (M / F)   Avg Rating (M / F)   t-value   % diff   d
2013
  Overall 1171 973 3.89 3.88 0.47 0.4 0.02
  Level
  Sr Mgmt 186 127 4.32 4.42 -1.36 -2.3 -0.16
  Mid-Level 529 399 3.91 3.90 0.26 0.3 0.02
  Other 456 447 3.70 3.71 -0.15 -0.2 -0.01
2012
  Overall 1156 960 3.82 3.78 1.12 1.0 0.05
  Level
  Sr Mgmt 178 115 4.38 4.33 0.58 1.1 0.07
  Mid-Level 519 394 3.84 3.80 0.79 1.0 0.05
  Other 459 451 3.58 3.62 -0.95 -1.3 -0.06
2011
  Overall 1059 917 3.74 3.74 0.20 0.2 0.01
  Level
  Sr Mgmt 161 108 4.26 4.34 -0.96 -1.9 -0.12
  Mid-Level 462 351 3.68 3.77 -1.85 -2.5 -0.13
  Other 436 458 3.62 3.57 1.14 1.6 0.08
  • Note: Negative t-values indicate women received higher ratings than men
  • t-values highlighted in orange indicate that the t-value is statistically significant favoring women 
  • t-values highlighted in gray indicate that the t-value is statistically significant favoring men
Table 6. Analysis Results - Race: White to African American Comparison
Year/Unit of Analysis   Count (W / AA)   Avg Rating (W / AA)   t-value   % diff   d
2013
  Overall 1185 554 3.96 3.79 4.48 4.6 0.23
  Level
    Sr Mgmt 247 37 4.37 4.27 0.86 2.3 0.15
    Mid-Level 584 129 3.95 3.86 1.20 2.2 0.12
    Other 354 388 3.71 3.72 -0.23 -0.3 -0.02
2012
  Overall 1184 550 3.89 3.65 6.21 6.6 0.32
  Level
    Sr Mgmt 243 27 4.37 4.37 0.03 0.1 0.01
    Mid-Level 584 119 3.84 3.79 0.73 1.4 0.07
    Other 357 404 3.65 3.56 1.66 2.5 0.12
2011
  Overall 1087 546 3.82 3.62 5.13 5.6 0.27
  Level
    Sr Mgmt 223 26 4.30 4.31 -0.08 -0.3 -0.02
    Mid-Level 533 103 3.73 3.75 -0.25 -0.5 -0.03
    Other 331 417 3.66 3.54 2.09 3.2 0.15
  • Note: Negative t-values indicate African Americans received higher ratings than Whites
  • t-values highlighted in orange indicate that the t-value is statistically significant favoring African Americans
  • t-values highlighted in gray indicate that the t-value is statistically significant favoring Whites
Table 7. Analysis Results - Race: White to Hispanic Comparison
Year/Unit of Analysis   Count (W / H)   Avg Rating (W / H)   t-value   % diff   d
2013
  Overall 1185 88 3.96 3.81 1.87 4.1 0.21
  Level
    Sr Mgmt 247 7 4.37 4.86 -2.00 -10.1 -0.77
    Mid-Level 584 43 3.95 3.77 1.54 4.8 0.24
    Other 354 38 3.71 3.66 0.40 1.4 0.07
2012
  Overall 1184 87 3.89 3.71 2.15 4.9 0.24
  Level
    Sr Mgmt 243 6 4.37 4.50 -0.45 -2.8 -0.19
    Mid-Level 584 45 3.84 3.69 1.37 4.2 0.21
    Other 357 36 3.65 3.61 0.32 1.2 0.06
2011
  Overall 1087 79 3.82 3.67 1.69 4.2 0.20
  Level
    Sr Mgmt 223 4 4.30 n/a n/a n/a n/a
    Mid-Level 533 39 3.73 3.54 1.57 5.4 0.26
    Other 331 36 3.66 3.67 -0.06 -0.2 -0.01
  • Note: Negative t-values indicate Hispanics received higher ratings than Whites
  • t-values highlighted in orange indicate that the t-value is statistically significant favoring Hispanics 
  • t-values highlighted in gray indicate that the t-value is statistically significant favoring Whites

At the agency wide level in 2012, Whites received significantly higher performance ratings than Hispanics and the effect size exceeded what would be expected based on the research literature (d=0.24). However, there were no significant differences in performance ratings within any of the three job levels.

For 2011, there were no significant differences in performance ratings for Whites versus Hispanics at an agency wide level, and for either mid-level employees or all other employees. There were too few Hispanic senior managers (N=4) to make a comparison between White and Hispanic at the senior manager level.

White to Asian Comparison

As depicted in Table 8, Whites received significantly higher performance ratings than Asians at the agency wide level in 2013, and the effect size (d=0.22) was larger than the value normally found for company-wide White-Asian differences (d= 0.08). However, there were no significant differences in performance ratings within any of the three job levels.

This pattern was repeated for both 2012 and 2011, where Whites received significantly higher performance ratings than Asians (d= 0.22). However, as in 2013, there were no significant differences in performance ratings within any of the three job levels in either 2012 or 2011.

Age

As depicted in Table 9, there were no statistically significant differences in performance ratings between older and younger employees in 2013 at an agency wide level, for senior managers, or for all other employees. However, there was a significant difference in favor of younger employees for mid-level employees (d= 0.21).

In 2012, there was a statistically significant overall difference in performance ratings favoring older employees (d = -0.11). However, there were no significant differences in ratings between younger and older employees at the senior manager or all other employees job level. Younger employees received significantly higher ratings than older employees at the mid-level job level (d = 0.13). This flip in the direction of the difference across units of analysis is likely an instance of what statisticians refer to as Simpson’s paradox. This phenomenon occurs when aggregating data across levels, while ignoring the distributions within particular levels, produces misleading results (i.e., here, that older workers are significantly favored in the aggregate). In this case, a larger percentage of older workers than younger workers held senior management roles, where ratings were much higher than at other job levels; this concentration of older workers at the highest-rated level may be driving the aggregate result.
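
To make this pattern concrete, the sketch below uses made-up numbers (not the FRB data) to show how one group can have the higher average within every job level yet the lower average overall when its members are concentrated at lower-rated levels.

```python
# Toy numbers (not the FRB data) illustrating the Simpson's-paradox pattern described
# above: the younger group has the higher mean within each level, but the older group
# has the higher mean overall because more of its members sit at the highly rated
# senior level.

# (mean rating, headcount) by job level and age group
levels = {
    "Sr Mgmt":   {"younger": (4.40, 30),  "older": (4.35, 270)},
    "Mid-Level": {"younger": (3.90, 400), "older": (3.80, 520)},
}

def overall_mean(group):
    total_n = sum(n for mean, n in (levels[lvl][group] for lvl in levels))
    total_sum = sum(mean * n for mean, n in (levels[lvl][group] for lvl in levels))
    return total_sum / total_n

print(f"Overall mean, younger: {overall_mean('younger'):.2f}")  # 3.93
print(f"Overall mean, older:   {overall_mean('older'):.2f}")    # 3.99
```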

For 2011, there were no significant differences in performance ratings between older and younger employees at an agency wide level, and the same was true for senior managers and all other employees. However, there was a significant difference in favor of the younger employees for mid-level employees (d= 0.20).

Table 8. Analysis Results - Race: White to Asian Comparison
Year/Unit of Analysis   Count (W / A)   Avg Rating (W / A)   t-value   % diff   d
2013
  Overall 1185 280 3.96 3.80 3.36 4.4 0.22
  Level
    Sr Mgmt 247 20 4.37 4.25 0.80 2.8 0.19
    Mid-Level 584 156 3.95 3.83 1.73 3.0 0.16
    Other 354 104 3.71 3.65 0.67 1.5 0.07
2012
  Overall 1184 259 3.89 3.73 3.24 4.5 0.22
  Level
    Sr Mgmt 243 15 4.37 4.07 1.73 7.6 0.46
    Mid-Level 584 148 3.84 3.80 0.57 1.0 0.05
    Other 357 96 3.65 3.55 1.19 2.8 0.14
2011
  Overall 1087 229 3.82 3.66 3.02 4.6 0.22
  Level
    Sr Mgmt 223 13 4.30 4.15 0.73 3.4 0.21
    Mid-Level 533 122 3.73 3.70 0.31 0.6 0.03
    Other 331 94 3.66 3.52 1.56 3.9 0.18
  • Note: Negative t-values indicate Asians received higher ratings than Whites
  • t-values highlighted in orange indicate that the t-value is statistically significant favoring Asians 
  • t-values highlighted in gray indicate that the t-value is statistically significant favoring Whites

 

Table 9. Analysis Results - Age Comparison
Year/Unit of Analysis   Count (<40 / 40+)   Avg Rating (<40 / 40+)   t-value   % diff   d
2013
  Overall 893 1251 3.86 3.90 -1.23 -1.0 -0.05
  Level
    Sr Mgmt 34 279 4.38 4.35 0.24 0.6 0.04
    Mid-Level 394 534 4.00 3.84 3.18 4.0 0.21
    Other 465 438 3.71 3.69 0.46 0.6 0.03
2012
  Overall 881 1235 3.75 3.84 -2.57 -2.2 -0.11
  Level
    Sr Mgmt 34 259 4.47 4.34 1.05 2.9 0.19
    Mid-Level 388 525 3.88 3.78 1.99 2.5 0.13
    Other 459 451 3.59 3.61 -0.40 -0.5 -0.03
2011
  Overall 778 1198 3.72 3.75 -0.87 -0.8 -0.04
  Level
    Sr Mgmt 34 235 4.32 4.29 0.27 0.8 0.05
    Mid-Level 304 509 3.81 3.67 2.79 4.0 0.20
    Other 440 454 3.61 3.57 0.83 1.1 0.06
  • Note: Negative t-values indicate those 40 years of age or older received higher ratings than those younger than 40 years of age
  • t-values highlighted in orange indicate that the t-value is statistically significant favoring those 40 years of age or older
  • t-values highlighted in gray indicate that the t-value is statistically significant favoring those younger than 40 years of age

Conclusions and Discussion

This report summarized the methodology and results of analyses related to subgroup differences on overall performance ratings administered in 2011, 2012, and 2013 at FRB. These analyses were conducted to detect potential performance rating differences based on gender, race/ethnicity and age. Analyses were conducted at a variety of levels of analysis. Both statistical significance tests (e.g., t-tests) and effect sizes (e.g., d-scores) were evaluated to determine whether differences were meaningful. Standard social science criteria (e.g., alpha = .05) were used to interpret statistical significance, and effect sizes were compared to typical results found in the personnel selection research literature.

The agency wide results across years indicate no pattern of statistically significant differences in average performance ratings between (a) women and men, (b) Hispanics and Whites, or (c) those age 40 or older and those younger than 40. In fact, there were no statistically significant gender differences, regardless of the level of analysis (i.e., overall or by organizational level). There were two statistically significant differences in average performance ratings between Hispanics and Whites, but they were not (a) at the same level of analysis (one was at the agency wide level and the other at the senior management level) or (b) in the same direction (one indicated higher average ratings for Hispanics and the other for Whites). Thus, in general, the overall results indicated no systematic differences in performance ratings for gender or White-Hispanic comparisons. With respect to age, a consistent pattern across years emerged at the mid-level jobs. The average performance ratings for employees younger than 40 were higher than those for employees age 40 or older, at a statistically significant level, but the effect sizes were not large.

With respect to agency wide performance differences between White employees and both Asian and African American employees, there is a trend of statistically significant differences in average ratings. In all three years, the average performance ratings for Whites were higher than those for Asians, at a statistically significant level. Similarly, in all three years, the average performance ratings for Whites were higher than those for African Americans, at a statistically significant level. The White-Asian differences were larger than those typically reported in the research literature, whereas the White-African American differences were smaller than those typically reported. It is notable that for both the White-Asian and White-African American comparisons, there is not a trend of statistically significant differences in performance ratings once the data are evaluated by job level.

Interpreting Statistically Significant Findings

It is important to understand that a statistically significant difference in ratings based on gender, race/ethnicity, or age does not necessarily indicate that discrimination is occurring. Such group differences could be due to actual differences in performance, regional differences in ratings, job family differences in ratings (i.e., supervisors in certain fields are stricter or more lenient than supervisors in other fields), or some combination of these factors.

To investigate whether any group differences are due to actual differences in performance or other factors rather than to discrimination, a number of measures could be taken to assess an agency’s performance rating system process and content. These include verification that:

  • The performance appraisal dimensions are job related;
  • The performance appraisal system is adequately structured;
  • Supervisors making the performance evaluations receive training;
  • There is a system in place for management to review supervisors’ performance ratings to determine whether there are any patterns (e.g., racial or gender differences) that warrant further review;
  • There is an appeal process for employees who believe their performance ratings are not accurate;
  • There is a standardized, objective system for making employment decisions (e.g., merit increases, promotions) on the basis of the performance ratings;
  • There is a well-developed feedback system through which employees can receive information about their performance that will promote their future development and enable them to improve job performance.

Potential Future Analyses

In cases where statistically significant differences exist, we generally recommend that the performance appraisal system be evaluated along the dimensions described above. In addition, a number of follow-up analyses may be useful for interpreting results and gaining a clearer understanding of the factors that may be driving those findings.

First, the analyses for this report were conducted at three job levels: senior managers, mid-level employees, and all other employees. It might be useful to conduct further analyses by such strata as salary band, region or location, and job title. In some instances, job level results may be further explained by more nuanced analyses at more granular levels.

Second, examining the interaction between the race/ethnicity and gender of the employee and the race/ethnicity and gender of the supervisor might also provide some insight into the statistically significant group differences. In some instances rater-ratee interactions may further explain results.

Third, because the analyses in this report focused on the overall rating, it might be informative to look at group differences in the initial element ratings, to determine whether a particular element could be driving results.

Fourth, it may be useful to analyze tangible employment outcomes that are directly or indirectly linked to performance ratings. For example, merit raises, bonuses and promotion decisions could all be analyzed across the protected groups discussed in this report. This set of analyses could provide a broader perspective on equal employment opportunity outcomes across groups.

Note: We did not include appendix I of the external consultant’s report, which is a copy of the congressional request letter. We include that letter as appendix A of this report.

  • 1 See the Appendix for a copy of this letter.
  • 2 DCI staff conducted all analyses and authored this report. Nothing in the report should be construed as representing the views of FRB OIG.
  • 3 As shown in Table 2, the exception to this was that any employees identifying themselves as Hispanic, regardless of whether they listed any other races, were counted as Hispanic rather than “Two or More.” There were no employees categorized as “Two or More” races. Note that employees self-identifying as “two or more” races or “other” are typically not included in any analysis, because those classifications could mean many different things, particularly due to the number of possible race combinations.
  • 4 Statistical analyses were conducted only when comparisons included 5 or more employees in each group. This decision was based on professional judgment. Small-sample results are often non-representative and unstable and can change substantially with small changes in the data. Samples too small for analysis are labeled n/a in the results tables.
  • 5 For each comparison, we tested the assumption of equal variances between the two groups. If this test indicated unequal variances, a t-test for unequal variances (Welch’s t-test) was used. If Welch’s t-test changed the significance interpretation from that of the initial Student’s t-test, the Welch’s t-test value was listed in the table.
  • 6 A meta-analysis is a study that statistically combines the results of all previous studies conducted on a topic. These studies combine data over time (e.g., some source studies date back to the 1960s) and from a variety of jobs (e.g., blue collar and white collar) in different settings (e.g., private, public, and military) to identify “typical” findings. In this context, the results of a meta-analysis are a series of effect sizes (d-scores) that provide a single-source summary of previous research. Interested readers should refer to the references below for more information related to specific studies.
  • 7 Roth, P. L., Huffcutt, A. I., & Bobko, P. (2003). Ethnic group differences in measures of job performance: A meta-analysis. Journal of Applied Psychology, 88(4), 694-706.
  • 8 McKay, P. F., & McDaniel, M. A. (2006). A reexamination of Black-White mean differences in work performance: More data, more moderators. Journal of Applied Psychology, 91(3), 538-554.
  • 9 Roth, P. L., Purvis, K. L., & Bobko, P. (2012). A meta-analysis of gender group differences for measures of job performance in field studies. Journal of Management, 38(2), 719-739.
  • 10 One pattern that we were not asked to formally evaluate using statistics, but which is clear simply by evaluating the average ratings across the different job levels, is that employees at higher levels tend to receive higher performance ratings.