
CFPB Report: 2015-MO-C-002 March 4, 2015

The CFPB Can Enhance Its Diversity and Inclusion Efforts


Appendix E: External Consulting Firm’s Statistical Analysis of the CFPB’s FY 2012 and FY 2013 Performance Ratings

An Analysis of Gender, Race, and Age Differences in Performance Ratings of CFPB Employees: 2012-2013

October 20, 2014

Prepared By:

DCI CONSULTING GROUP
1920 I ST NW,
WASHINGTON, DC 20006
(202) 828-6900

Prepared For:

OFFICE OF INSPECTOR GENERAL
BOARD OF GOVERNORS OF THE FEDERAL RESERVE SYSTEM
CONSUMER FINANCIAL PROTECTION BUREAU
20TH STREET AND CONSTITUTION AVENUE NW
MAIL STOP K-300
WASHINGTON, DC 20551


Executive Summary

On March 24, 2014, members of the United States House of Representatives Committee on Financial Services sent letters requesting that the Offices of Inspector General (OIGs) for seven financial regulatory agencies perform work to determine whether agency internal operations and personnel practices are systematically disadvantaging minorities and women from obtaining senior management positions. The Consumer Financial Protection Bureau (CFPB) was one of these agencies.

The OIGs initiated individual assignments with a general overall objective to assess agency personnel operations and other efforts to provide for equal employment opportunities, including equal opportunity for minorities and women to obtain senior management positions, and increase racial, ethnic and gender diversity in the workforce. One element of the work was for each OIG to assemble agency wide performance appraisal data to identify performance ratings distributions by gender, race/ethnicity, age and bargaining unit status (where applicable). This report presents the methodology and results of the analyses conducted for CFPB OIG.

Separate analyses were conducted on overall performance ratings administered in 2012 and 2013. These analyses were conducted to detect potential performance rating differences based on gender, race/ethnicity, and age. Analyses were conducted at a number of different levels, including overall, by job level, and by bargaining unit status. Both statistical significance tests (e.g., t-tests) and effect sizes (e.g., d-scores) were evaluated to determine whether differences were meaningful. Standard social science criteria (e.g., alpha = .05) were used to interpret statistical significance, and effect sizes were compared to typical results found in the personnel selection research literature.

For gender, the performance ratings for males and females did not differ significantly in 2013. In 2012, females were rated significantly higher than males among employees in the "all other employees" job level, but not agency wide or in the other two job levels.

For race/ethnicity, Whites were rated significantly higher than African Americans in 2012 and 2013. These differences were larger than those typically found in the research literature. Additionally, Whites were rated significantly higher than Hispanics in 2013 but not in 2012. Once again, the 2013 differences were larger than those found in the research literature. There were no significant differences between Whites and American Indian/Alaskan Natives or between Whites and Asians in either year.

For age, younger employees received significantly higher ratings than older employees in 2012 and 2013. This was the case across most units of analysis. 

Statistically significant group differences do not necessarily indicate discrimination by themselves. Differences in performance ratings could be due to a wide variety of explanations. This report concludes with a number of measures that an agency can take to assess performance rating system content and process.

Introduction

Project Background

On March 24, 2014, members of the United States House of Representatives Committee on Financial Services sent letters requesting that the Offices of Inspector General (OIGs) for seven financial regulatory agencies perform work to determine whether agency internal operations and personnel practices are systematically disadvantaging minorities and women from obtaining senior management positions.1 The agencies include the following:

  • Federal Deposit Insurance Corporation (FDIC)
  • Board of Governors of the Federal Reserve System (FRB)
  • Consumer Financial Protection Bureau (CFPB)
  • Office of the Comptroller of the Currency (OCC)
  • Federal Housing Finance Agency (FHFA)
  • National Credit Union Administration (NCUA)
  • Securities and Exchange Commission (SEC)

The OIGs initiated individual assignments with a general overall objective to assess agency personnel operations and other efforts to provide for equal employment opportunities, including equal opportunity for minorities and women to obtain senior management positions, and increase racial, ethnic and gender diversity in the workforce. One element of the work was for each OIG to assemble agency wide performance appraisal data to identify performance ratings distributions by gender, race/ethnicity, age and bargaining unit status (applicable to all agencies except the FRB and FHFA). The FDIC Office of Inspector General (FDIC OIG) offered to engage and fund an independent contractor to perform statistical analyses of the performance appraisal results for each agency to determine whether there are statistically significant differences between groups of interest. DCI Consulting Group was selected to conduct these analyses for each of the agencies except for the Securities and Exchange Commission (SEC).

This report presents the methodology and results of the analyses conducted for CFPB OIG.2

The CFPB Performance Rating System

The performance management program at CFPB serves as the basis for determining "pay-for-performance" amounts provided to employees. These increases take two forms: merit increases, which affect employees' base salary and growth over time, and supplemental lump sum payments. Both of these annual compensation programs are in part dependent upon an employee's performance rating.

The distribution of performance ratings for 2012 and 2013 is shown in Table 1. Higher ratings indicate better performance.

Table 1. Distribution of Performance Ratings3

| Rating | Count, 2012 | Count, 2013 | Percent, 2012 | Percent, 2013 |
|---|---|---|---|---|
| 1 - Unacceptable | 6 | 1 | 0.7 | 0.1 |
| 2 - Between 1 and 3 | n/a | 11 | n/a | 0.9 |
| 3 - Fully Successful | 269 | 332 | 32.1 | 28.0 |
| 4 - Between 3 and 5 | 429 | 629 | 51.3 | 53.1 |
| 5 - Outstanding | 133 | 212 | 15.9 | 17.9 |

Method

Initial Dataset

CFPB OIG provided DCI with data for 2012 and 2013. The performance time period covered for each year was from October 1st through September 30th. Relevant information for each year included:

  • Performance Year
  • Date of Evaluation
  • Duty Location (City and State)
  • Division
  • Office
  • Pay Grade
  • Supervisory Code
  • Occupational Category
  • Pattern Code
  • Rating Pattern Summary
  • Rating Code
  • Rating Description
  • Race/National Origin (ERI Description)
  • Gender
  • Age
  • Whether the employee was 40 years of age or older

The dataset for each year included all employees who were eligible for performance ratings for that year. Neither employee name nor employee number was included in the dataset.

Data Cleaning

The first step in the data cleaning process was to remove employees in the dataset who had not been with the agency long enough (90 days) to receive a performance rating. As it turned out, no employees were removed in either of the two years.

Race/Ethnicity Grouping

CFPB OIG provided the race/ethnicity groupings used for analysis. The coding scheme is presented in Table 2. Employees who listed only one race/ethnicity (e.g., White, Asian) were placed into that race/ethnicity category. Employees who listed more than one race/ethnicity (e.g., Asian and White) were placed into the category of "Two or more".4 Employees who did not identify their race/ethnicity were included in the gender and age analyses but were omitted from the race/ethnicity analyses.

Table 2. Race/Ethnicity From Dataset and Race/Ethnicity Analysis Groups
Analysis Grouping Race/Ethnicity Categories in Dataset
White, Non-Hispanic (White)
  • White
  • White, not of Hispanic origin
  • Not Hispanic in Puerto Rico
Asian
  • Asian
Black or African American (African American)
  • Black or African American
  • Black, not of Hispanic Origin
Hispanic or Latino (Hispanic)
  • Hispanic
  • Hispanic or Latino
  • Hispanic or Latino, American Indian or Alaska Native
  • Hispanic or Latino, Black or African American
  • Hispanic or Latino, Black or African American, White
  • Hispanic or Latino, White
Native Hawaiian or Other Pacific Islander (Native Hawaiian)
  • Native Hawaiian
  • Other Pacific Islander
American Indian or Alaska Native (American Indian)
  • American Indian
  • Alaska Native
  • American Indian/Alaska Native
Two or More Races
  • Unknown
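
The grouping rule described above, together with the Hispanic exception stated in the report's footnotes, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name and the simplified category labels are hypothetical, and the actual dataset used the longer category strings listed in Table 2.

```python
# Simplified, HYPOTHETICAL labels; the real dataset used the strings in Table 2.
HISPANIC = "Hispanic or Latino"
TWO_OR_MORE = "Two or More"

def analysis_group(selected):
    """Map the categories one employee selected to a single analysis group.

    Returns None for employees who did not identify a race/ethnicity;
    the report omitted them from the race/ethnicity analyses.
    """
    selected = set(selected)
    if not selected:
        return None
    # Anyone identifying as Hispanic is counted as Hispanic,
    # regardless of any other races listed.
    if HISPANIC in selected:
        return HISPANIC
    # More than one non-Hispanic category falls into "Two or More".
    if len(selected) > 1:
        return TWO_OR_MORE
    # Exactly one category: use it directly.
    return next(iter(selected))

print(analysis_group(["Asian"]))
print(analysis_group(["Asian", "White"]))
print(analysis_group([HISPANIC, "White"]))
print(analysis_group([]))
```

Checking the Hispanic rule before the multiple-category rule is what keeps an employee who lists Hispanic plus another race out of the "Two or More" group.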

Bargaining Unit

Bargaining unit status had only two categories, labeled bargaining and non-bargaining. CFPB OIG provided these classifications for 2013 employees. It is important to keep in mind that we did not have data regarding which employees were actually union members, only whether they were covered under a bargaining unit.

Age Grouping

CFPB OIG also provided employee age groupings to DCI. Employees were placed into one of two categories: under 40 or 40+. These categories were chosen to be consistent with the Age Discrimination in Employment Act (ADEA). The category placement was based on the employee's age on the first day of the performance period for each of the two years. Table 3 depicts the race/ethnicity, gender, and age breakdown for each of the two years.

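Table 3. Number of Employees by Gender, Race/Ethnicity, and Age

| Demographic Group | 2012 | 2013 |
|---|---|---|
| TOTAL | 837 | 1,185 |
| Gender: Female | 417 | 560 |
| Gender: Male | 420 | 625 |
| Race/Ethnicity: White | 549 | 779 |
| Race/Ethnicity: Black or African American | 156 | 204 |
| Race/Ethnicity: Asian | 71 | 117 |
| Race/Ethnicity: Hispanic/Latino | 41 | 67 |
| Race/Ethnicity: American Indian/Alaskan Native | 9 | 9 |
| Race/Ethnicity: Native Hawaiian/Pacific Islander | 2 | 1 |
| Race/Ethnicity: Two or More | 9 | 8 |
| Age: Under 40 | 431 | 629 |
| Age: 40+ | 406 | 556 |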

Data Integrity

To ensure the integrity of the data classifications, two consultants reviewed the initial dataset. To ensure the accuracy of the statistical analyses, the analyses were conducted twice by separate consultants using different analysis programs (i.e., SAS, SPSS, Excel, HR Equator). These separate analyses yielded identical results.

Data Analysis Methodology

The OIGs for each agency agreed that the analyses would be conducted at two levels for all agencies: Overall and by bargaining unit status (where applicable). Each agency then determined other levels of analysis that made sense for that agency. CFPB OIG asked that analyses also be conducted by job level (senior managers, mid-level employees, and all other employees).

To compare the differences in the mean performance ratings across gender, race/ethnicity and age, tests of both statistical significance and practical significance were used.5 Tests of statistical significance indicate the probability that the group difference could have been due to chance. A statistically significant result does not imply that a difference is good or bad or that it is large or small. Instead it simply indicates that the observed difference is probably not due to chance. In contrast, measures of practical significance provide an indication of the size of the difference.

To determine if the group differences were statistically significant, t-tests were used.6 To assess statistical significance, DCI used two-tailed tests, which assess rating differences in both directions (e.g., differences that favor males as well as differences that favor females) and an alpha level of .05. Both standards are common in social science research. An alpha level of .05 indicates that the probability of a false positive (i.e., a statistically significant result that is incorrect) is 5 percent. This threshold for identifying a statistically significant difference generally corresponds to a t-value of 1.96 (although this value may vary slightly depending on sample size). Any t-value highlighted in the results tables was statistically significant at an alpha level of .05.
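
As a rough illustration of the test described above, the Python sketch below computes a pooled-variance (Student's) t statistic and a Welch t statistic from group means, standard deviations, and counts, and compares the result with the large-sample two-tailed cutoff of 1.96. It is not DCI's actual code. The function names are illustrative, and the standard deviation of 0.70 is a hypothetical value, because the report publishes group means and counts but not standard deviations. The five-employee minimum follows the rule stated in the report's footnotes.

```python
import math

MIN_GROUP_SIZE = 5  # the report ran no test when either group had fewer than 5

def can_compare(n1, n2):
    """True when both groups are large enough to be analyzed."""
    return n1 >= MIN_GROUP_SIZE and n2 >= MIN_GROUP_SIZE

def student_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance (Student's) t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (mean1 - mean2) / se, df

def is_significant(t, critical=1.96):
    """Two-tailed check against the large-sample critical value for alpha = .05."""
    return abs(t) >= critical

# Means and counts from the 2013 overall White / African American comparison.
# The standard deviation 0.70 is a HYPOTHETICAL value, not taken from the report.
t_s, df_s = student_t(3.96, 0.70, 779, 3.65, 0.70, 204)
t_w, df_w = welch_t(3.96, 0.70, 779, 3.65, 0.70, 204)
print(f"Student t = {t_s:.2f} (df = {df_s})")
print(f"Welch   t = {t_w:.2f} (df = {df_w:.1f})")
print("significant at the 1.96 cutoff:", is_significant(t_s))
```

For small groups the exact critical value comes from the t distribution and is somewhat larger than 1.96, so the fixed cutoff used here is only appropriate for the larger comparisons.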

To determine practical significance, two measures were used: the percent differences between the two groups and d-scores. A d-score indicates the size of the difference in terms of standard deviations. That is, a d of 1.0 indicates that the two groups differed by a full standard deviation (a large effect) whereas a d of 0.10 indicates that the two groups differed by a tenth of a standard deviation (a small effect).
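
The two practical-significance measures can be sketched as follows. This is an illustration rather than DCI's implementation: the function names are ours, the d-score is assumed to use the pooled standard deviation (the usual Cohen's d), and the percent difference is assumed to be taken relative to the second group's mean, which is the convention that approximately reproduces the published values. The standard deviation of 0.70 is hypothetical, since the report does not publish standard deviations.

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Mean difference in pooled-standard-deviation units (assumed definition)."""
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    )
    return (mean1 - mean2) / pooled_sd

def percent_difference(mean1, mean2):
    """Difference as a percentage of the second group's mean (assumed convention)."""
    return 100 * (mean1 - mean2) / mean2

def t_from_d(d, n1, n2):
    """Student's t implied by a pooled-SD d-score and the two group sizes."""
    return d / math.sqrt(1 / n1 + 1 / n2)

# 2013 overall White / African American figures: means 3.96 and 3.65,
# counts 779 and 204. The SD of 0.70 is HYPOTHETICAL.
print(f"d      = {cohens_d(3.96, 0.70, 779, 3.65, 0.70, 204):.2f}")
print(f"% diff = {percent_difference(3.96, 3.65):.1f}")
print(f"t from published d of 0.44 = {t_from_d(0.44, 779, 204):.2f}")
```

Because the published means and d-scores are rounded, values recomputed this way match the tables only approximately. The last function also shows why a given d-score reaches statistical significance more easily in large groups: the same d yields a larger t as the group sizes grow.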

Table 4 will be helpful in interpreting the d-scores observed for CFPB. The table summarizes a combination of d-scores obtained in a meta-analysis7 by Roth, Huffcutt, and Bobko (2003)8 on racial differences, a meta-analysis by McKay and McDaniel (2006)9 on Black-White differences, a meta-analysis by Roth, Purvis, and Bobko (2012)10 on gender differences, as well as internal research conducted by DCI. Thus, Table 4 represents the gender and race differences that are "typically found" in studies of performance appraisal differences. There have been no meta-analyses comparing performance ratings of employees over and under 40 or of employees in different bargaining statuses.

Table 4. "Typical" D-Scores Found in Performance Rating Studies

| Comparison | Company Wide | By Title |
|---|---|---|
| Male - Female | -0.07 | -0.08 |
| White - Black | 0.34 | 0.22 |
| White - Hispanic | 0.14 | 0.07 |
| White - Asian | 0.08 | 0.00 |

Note: Negative d-scores indicate females have higher ratings than men. D-scores computed by title reflect average performance differences between protected class subgroups within specific titles, rather than company-wide. Thus, analyses conducted by title are conducted at a finer level of analysis than are analyses conducted company wide, such that employees are more similar to one another in each cross-section of employees that are analyzed.
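
A minimal Python sketch of how an observed d-score might be checked against these benchmarks is shown below. The dictionary holds the company-wide column of Table 4. The function name is illustrative, and comparing absolute values is our assumption about what "larger than typical" means. The observed values in the usage lines are agency wide d-scores taken from the results tables later in this report.

```python
# Company-wide "typical" d-scores from Table 4.
TYPICAL_D = {
    "Male - Female": -0.07,
    "White - Black": 0.34,
    "White - Hispanic": 0.14,
    "White - Asian": 0.08,
}

def exceeds_typical(comparison, observed_d):
    """True when the observed effect is larger in magnitude than the benchmark.

    Using absolute values is an ASSUMPTION; the report does not state
    how the comparison was formalized.
    """
    return abs(observed_d) > abs(TYPICAL_D[comparison])

print(exceeds_typical("White - Black", 0.44))     # 2013 agency wide d
print(exceeds_typical("White - Hispanic", 0.40))  # 2013 agency wide d
print(exceeds_typical("White - Asian", 0.00))     # 2012 agency wide d
```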

Analysis Results

Gender

Table 5 presents the results of the gender analyses. As the table shows, there were no statistically significant gender differences in performance ratings in 2013, either overall or within any of the three job levels or two bargaining statuses. For 2012, there were no statistically significant differences in performance ratings agency wide or in two of the job levels (senior managers and mid-level employees). Females received significantly higher ratings than males among all other employees.11

Race/Ethnicity

White to African-American Comparison

As depicted in Table 6, for 2013, Whites received significantly higher performance ratings than African Americans at the agency wide level. No significant difference was found for mid-level employees, and senior managers could not be analyzed because the sample was too small, but Whites received significantly higher ratings among all other employees and in both bargaining and non-bargaining units. Additionally, the effect sizes for the overall difference (d= 0.44), all other employees (d= 0.45), bargaining (d= 0.45) and non-bargaining (d= 0.37) levels were larger than the value normally found for White-African American comparisons (which is d= 0.34).

For 2012, Whites received significantly higher performance ratings than African Americans both agency wide (d= 0.43) and among all other employees (d= 0.49), but not for senior managers (where sample sizes were too small for analysis) or mid-level employees. Again, these effect sizes were larger than the value normally found in the research literature for White-African American comparisons.

White to Hispanic Comparison

As depicted in Table 7, Whites received significantly higher performance ratings than Hispanics agency wide for 2013. No significant difference was found for mid-level employees, and senior managers could not be analyzed because the sample was too small, but Whites received significantly higher ratings among all other employees and in both bargaining and non-bargaining units. Additionally, the effect sizes for the agency wide difference (d= 0.40), all other employees (d= 0.32), bargaining (d= 0.29) and non-bargaining (d= 0.70) levels were larger than the value normally found in the research literature for White-Hispanic comparisons (which is d= 0.14).

Table 5. Analysis Results - Gender Comparison

| Year | Unit of Analysis | Count (M) | Count (F) | Avg Rating (M) | Avg Rating (F) | t-value | % diff | d |
|---|---|---|---|---|---|---|---|---|
| 2013 | Overall | 625 | 560 | 3.85 | 3.91 | -1.63 | -1.7 | -0.09 |
| 2013 | Level: Sr Mgmt | 26 | 15 | 3.85 | 4.13 | -1.34 | -6.9 | -0.43 |
| 2013 | Level: Mid-Level | 89 | 66 | 3.99 | 4.11 | -1.07 | -2.9 | -0.17 |
| 2013 | Level: Other | 510 | 479 | 3.82 | 3.88 | -1.28 | -1.5 | -0.08 |
| 2013 | Bargaining Unit: Yes | 456 | 399 | 3.80 | 3.85 | -1.12 | -1.4 | -0.08 |
| 2013 | Bargaining Unit: No | 169 | 161 | 3.98 | 4.06 | -1.16 | -2.11 | -0.13 |
| 2012 | Overall | 420 | 417 | 3.77 | 3.86 | -1.81 | -2.3 | -0.13 |
| 2012 | Level: Sr Mgmt | 32 | 14 | 4.00 | 4.21 | -0.98 | -5.1 | -0.32 |
| 2012 | Level: Mid-Level | 64 | 43 | 3.91 | 3.91 | -0.01 | 0.0 | 0.00 |
| 2012 | Level: Other | 324 | 360 | 3.72 | 3.84 | -2.18 | -3.1 | -0.17 |

Note: Negative t-values indicate women received higher ratings than men. t-values highlighted in orange indicate that the t-value is statistically significant favoring women. t-values highlighted in gray indicate that the t-value is statistically significant favoring men.

Table 6. Analysis Results - Race: White to African American Comparison

| Year | Unit of Analysis | Count (W) | Count (AA) | Avg Rating (W) | Avg Rating (AA) | t-value | % diff | d |
|---|---|---|---|---|---|---|---|---|
| 2013 | Overall | 779 | 204 | 3.96 | 3.65 | 5.65 | 8.4 | 0.44 |
| 2013 | Level: Sr Mgmt | 32 | 2 | 4.03 | n/a | n/a | n/a | n/a |
| 2013 | Level: Mid-Level | 114 | 25 | 4.07 | 3.84 | 1.59 | 6.0 | 0.35 |
| 2013 | Level: Other | 633 | 177 | 3.94 | 3.62 | 5.28 | 8.7 | 0.45 |
| 2013 | Bargaining Unit: Yes | 547 | 153 | 3.91 | 3.59 | 4.97 | 8.9 | 0.45 |
| 2013 | Bargaining Unit: No | 232 | 51 | 4.08 | 3.84 | 2.38 | 6.2 | 0.37 |
| 2012 | Overall | 549 | 156 | 3.89 | 3.58 | 4.72 | 8.5 | 0.43 |
| 2012 | Level: Sr Mgmt | 35 | 3 | 4.14 | n/a | n/a | n/a | n/a |
| 2012 | Level: Mid-Level | 76 | 19 | 3.87 | 4.00 | -0.74 | -3.3 | -0.19 |
| 2012 | Level: Other | 438 | 134 | 3.87 | 3.52 | 4.94 | 9.9 | 0.49 |

Note: Negative t-values indicate African Americans received higher ratings than Whites. t-values highlighted in orange indicate that the t-value is statistically significant favoring African Americans. t-values highlighted in gray indicate that the t-value is statistically significant favoring Whites.

Table 7. Analysis Results - Race: White to Hispanic Comparison

| Year | Unit of Analysis | Count (W) | Count (H) | Avg Rating (W) | Avg Rating (H) | t-value | % diff | d |
|---|---|---|---|---|---|---|---|---|
| 2013 | Overall | 779 | 67 | 3.96 | 3.69 | 3.11 | 7.4 | 0.40 |
| 2013 | Level: Sr Mgmt | 32 | 3 | 4.03 | n/a | n/a | n/a | n/a |
| 2013 | Level: Mid-Level | 114 | 5 | 4.07 | 3.60 | 1.70 | 13.1 | 0.78 |
| 2013 | Level: Other | 633 | 59 | 3.94 | 3.71 | 2.34 | 6.1 | 0.32 |
| 2013 | Bargaining Unit: Yes | 547 | 50 | 3.91 | 3.70 | 1.98 | 5.6 | 0.29 |
| 2013 | Bargaining Unit: No | 232 | 17 | 4.08 | 3.65 | 2.79 | 11.9 | 0.70 |
| 2012 | Overall | 549 | 41 | 3.89 | 3.68 | 1.81 | 5.5 | 0.29 |
| 2012 | Level: Sr Mgmt | 35 | 2 | 4.14 | n/a | n/a | n/a | n/a |
| 2012 | Level: Mid-Level | 76 | 5 | 3.87 | 3.60 | 0.92 | 7.5 | 0.42 |
| 2012 | Level: Other | 438 | 34 | 3.87 | 3.68 | 1.54 | 5.3 | 0.27 |

Note: Negative t-values indicate Hispanics received higher ratings than Whites. t-values highlighted in orange indicate that the t-value is statistically significant favoring Hispanics. t-values highlighted in gray indicate that the t-value is statistically significant favoring Whites.

For 2012, there were no significant differences in performance ratings for Whites and Hispanics in the agency wide ratings or within each of the three job levels.

White to Asian Comparison

As depicted in Table 8, for 2013, there were no significant differences in performance ratings between Whites and Asians at any level of analysis. Note that there were too few Asian senior managers (n=4) to conduct an analysis for this job level. The pattern of results for 2012 was the same: there were no significant differences in performance ratings between Whites and Asians at any level of analysis.

White to American Indian/ Alaskan Native Comparison

As depicted in Table 9, there were only 9 American Indian/Alaskan Native employees in both 2013 and 2012, and too few American Indian/Alaskan Natives for senior manager and mid-level employee comparisons for both years. Analyses conducted at the level of all other employees revealed no significant differences between Whites and American Indian/Alaskan Natives for either year.

Age

As depicted in Table 10, younger employees received significantly higher performance ratings than older employees at the agency wide level (d= 0.26), for mid-level employees (d= 0.34), all other employees (d= 0.31), and in both bargaining status categories (d= 0.32 and d= 0.30) for 2013. There were only two senior managers under the age of 40, and for this reason no analysis could be conducted at this job level.

In 2012, there was a statistically significant difference in performance ratings favoring younger employees at the agency wide level (d= 0.35), for mid-level employees (d= 0.39), and for all other employees (d= 0.44). No statistically significant differences were found between older and younger employees at the senior manager level in 2012.

Table 8. Analysis Results - Race: White to Asian Comparison

| Year | Unit of Analysis | Count (W) | Count (A) | Avg Rating (W) | Avg Rating (A) | t-value | % diff | d |
|---|---|---|---|---|---|---|---|---|
| 2013 | Overall | 779 | 117 | 3.96 | 3.83 | 1.92 | 3.4 | 0.19 |
| 2013 | Level: Sr Mgmt | 32 | 4 | 4.03 | n/a | n/a | n/a | n/a |
| 2013 | Level: Mid-Level | 114 | 9 | 4.07 | 4.22 | -0.70 | -3.6 | -0.24 |
| 2013 | Level: Other | 633 | 104 | 3.94 | 3.80 | 1.87 | 3.7 | 0.20 |
| 2013 | Bargaining Unit: Yes | 547 | 90 | 3.91 | 3.79 | 1.49 | 3.2 | 0.17 |
| 2013 | Bargaining Unit: No | 232 | 27 | 4.08 | 3.96 | 0.93 | 3.0 | 0.19 |
| 2012 | Overall | 549 | 71 | 3.89 | 3.89 | 0.00 | -0.01 | 0.00 |
| 2012 | Level: Sr Mgmt | 35 | 6 | 4.14 | 3.83 | 1.06 | 8.1 | 0.47 |
| 2012 | Level: Mid-Level | 76 | 6 | 3.87 | 4.33 | -1.68 | -10.7 | -0.71 |
| 2012 | Level: Other | 438 | 59 | 3.87 | 3.85 | 0.23 | 0.6 | 0.03 |

Note: Negative t-values indicate Asians received higher ratings than Whites. t-values highlighted in orange indicate that the t-value is statistically significant favoring Asians. t-values highlighted in gray indicate that the t-value is statistically significant favoring Whites.

Table 9. Analysis Results - Race: White to American Indian

| Year | Unit of Analysis | Count (W) | Count (AI) | Avg Rating (W) | Avg Rating (AI) | t-value | % diff | d |
|---|---|---|---|---|---|---|---|---|
| 2013 | Overall | 779 | 9 | 3.96 | 3.89 | 0.31 | 1.8 | 0.10 |
| 2013 | Level: Sr Mgmt | 32 | 0 | 4.03 | n/a | n/a | n/a | n/a |
| 2013 | Level: Mid-Level | 114 | 1 | 4.07 | n/a | n/a | n/a | n/a |
| 2013 | Level: Other | 633 | 8 | 3.94 | 3.75 | 0.74 | 5.0 | 0.26 |
| 2013 | Bargaining Unit: Yes | 547 | 7 | 3.91 | 3.71 | 0.71 | 5.2 | 0.27 |
| 2013 | Bargaining Unit: No | 232 | 2 | 4.08 | n/a | n/a | n/a | n/a |
| 2012 | Overall | 549 | 9 | 3.89 | 3.89 | -0.01 | -0.05 | 0.00 |
| 2012 | Level: Sr Mgmt | 35 | 0 | 4.14 | n/a | n/a | n/a | n/a |
| 2012 | Level: Mid-Level | 76 | 0 | 3.87 | n/a | n/a | n/a | n/a |
| 2012 | Level: Other | 438 | 9 | 3.87 | 3.89 | -0.08 | -0.5 | -0.03 |

Note: Negative t-values indicate American Indians received higher ratings than Whites. t-values highlighted in orange indicate that the t-value is statistically significant favoring American Indians. t-values highlighted in gray indicate that the t-value is statistically significant favoring Whites.

Table 10. Analysis Results - Age Comparison

| Year | Unit of Analysis | Count (<40) | Count (40+) | Avg Rating (<40) | Avg Rating (40+) | t-value | % diff | d |
|---|---|---|---|---|---|---|---|---|
| 2013 | Overall | 629 | 556 | 3.96 | 3.78 | 4.44 | 4.7 | 0.26 |
| 2013 | Level: Sr Mgmt | 2 | 39 | n/a | 3.95 | n/a | n/a | n/a |
| 2013 | Level: Mid-Level | 53 | 102 | 4.19 | 3.96 | 2.02 | 5.8 | 0.34 |
| 2013 | Level: Other | 574 | 415 | 3.94 | 3.72 | 4.87 | 5.9 | 0.31 |
| 2013 | Bargaining Unit: Yes | 497 | 358 | 3.92 | 3.70 | 4.56 | 6.0 | 0.32 |
| 2013 | Bargaining Unit: No | 132 | 198 | 4.14 | 3.94 | 2.64 | 5.00 | 0.30 |
| 2012 | Overall | 431 | 406 | 3.94 | 3.69 | 5.03 | 6.7 | 0.35 |
| 2012 | Level: Sr Mgmt | 5 | 41 | 4.20 | 4.05 | 0.47 | 3.7 | 0.22 |
| 2012 | Level: Mid-Level | 37 | 70 | 4.08 | 3.81 | 1.91 | 7.0 | 0.39 |
| 2012 | Level: Other | 389 | 295 | 3.92 | 3.61 | 5.68 | 8.5 | 0.44 |

Note: Negative t-values indicate those 40 years of age or older received higher ratings than those younger than 40 years of age. t-values highlighted in orange indicate that the t-value is statistically significant favoring those 40 years of age or older. t-values highlighted in gray indicate that the t-value is statistically significant favoring those younger than 40 years of age.

Conclusions and Discussion

This report summarized the methodology and results of analyses related to subgroup differences on overall performance ratings administered in 2012 and 2013 at CFPB. These analyses were conducted to detect potential performance rating differences based on gender, race/ethnicity, age and bargaining status. Analyses were conducted at a variety of levels of analysis. Both statistical significance tests (e.g., t-tests) and effect sizes (e.g., d-scores) were evaluated to determine whether differences were meaningful. Standard social science criteria (e.g., alpha = .05) were used to interpret statistical significance, and effect sizes were compared to typical results found in the personnel selection research literature.

The agency wide results across years indicate no pattern of statistically significant differences in average performance ratings between (a) women and men, (b) Asians and Whites, or (c) American Indian/Alaskan Natives and Whites. In fact, there were no statistically significant White-Asian or White-American Indian/Alaskan Native differences, regardless of the level of analysis. There was one statistically significant difference in average performance ratings between women and men, and this finding was in favor of females. In general, the overall results indicate no systematic differences in performance ratings for gender, White-Asian, or White-American Indian/Alaskan Native comparisons.

With respect to agency wide performance differences between White employees and African American employees, there is a trend of statistically significant differences in average ratings. In both years, the average performance ratings for Whites were higher than those for African Americans at multiple levels of analysis, and the effect sizes were larger than the values normally found in the research literature. In addition, Whites were rated higher than Hispanics at a statistically significant level for 2013 at multiple levels of analysis, again with effect sizes larger than the values normally found in the research literature. This pattern did not hold for 2012, when no statistically significant differences were found between Whites and Hispanics at any level of analysis.

With respect to age, a consistent pattern across years emerged agency wide, and at the mid-level employee and other employee job levels of analysis. The average performance ratings for employees younger than 40 were higher than those for employees age 40 or older, at a statistically significant level. This was true for bargaining and non-bargaining units in 2013.

Interpreting Statistically Significant Findings

It is important to understand that a statistically significant difference in ratings based on gender, race/ethnicity, age, or bargaining unit does not necessarily indicate that discrimination is occurring. Such group differences could be due to actual differences in performance, regional differences in ratings, job family differences in ratings (i.e., supervisors in certain fields are more strict or lenient than supervisors in other fields) or some combination of all these factors.

To investigate whether any group differences are due to actual differences in performance or other factors rather than to discrimination, a number of measures could be taken to assess an agency's performance rating system process and content. These include verification that:

  • The performance appraisal dimensions are job related;
  • The performance appraisal system is adequately structured;
  • Supervisors making the performance evaluations receive training;
  • There is a system in place for management to review supervisors' performance ratings to determine if there are any patterns (e.g., racial or gender differences) that need to be reviewed;
  • There is an appeal process for employees who believe their performance ratings are not accurate;
  • There is a standardized, objective system for making employment decisions (e.g., merit increases, promotions) on the basis of the performance ratings;
  • There is a well-developed feedback system through which employees can receive information about their performance that will promote their future development and enable them to improve job performance.

Potential Future Analyses

As described above, in cases where statistically significant disparities exist, we generally recommend that the performance appraisal system be evaluated along the dimensions described above. In addition, a number of follow up analyses may be useful for interpreting results and gaining a clearer understanding of what factors may be driving those findings.

First, the analyses for this report were conducted at three job levels: senior managers, mid-level employees, and all other employees. It might be useful to conduct further analyses by such strata as salary band, region or location, and job title. In some instances, job level results may be further explained by more nuanced analyses and more granular levels.

Second, examining the interaction between the race/ethnicity and gender of the employee and the race/ethnicity and gender of the supervisor might also provide some insight into the statistically significant group differences. In some instances rater-ratee interactions may further explain results.

Third, because the analyses in this report focused on the overall rating, it might be informative to look at group differences in the initial element ratings, to determine whether a particular element could be driving results.

Fourth, it may be useful to analyze tangible employment outcomes that are directly or indirectly linked to performance ratings. For example, merit raises, bonuses and promotion decisions could all be analyzed across the protected groups discussed in this report. This set of analyses could provide a broader perspective on equal employment opportunity outcomes across groups.

Note: We did not include appendix I of the external consultant’s report, which is a copy of the congressional request letter. We include that letter as appendix A of this report.

  • 1. See the Appendix for a copy of this letter.
  • 2. DCI staff conducted all analyses and authored this report. Nothing in the report should be construed as representing the views of CFPB OIG.
  • 3. It should be noted that there were some unique challenges regarding CFPB's rating scale and its treatment in analysis. For example, in 2012, the operational scale included only options 1, 3, 4 and 5 from Table 1. CFPB OIG requested that DCI maintain this scaling, treat the data as a 1-5 scale, and assume that there were no values of 2 in the data. DCI analyzed and presented these results in this report. Analyses were also conducted on a transformed 1 to 4 scale, and conclusions were identical. In 2013, CFPB OIG reported that a full 1-5 scale was used, although it was unclear exactly when this new scale was implemented. CFPB OIG requested that DCI analyze all 2013 employees on a 1-5 scale, and those results are found in this report.
  • 4. As shown in Table 2, the exception to this was that any employees identifying themselves as Hispanic, regardless of whether they listed any other races, were counted as Hispanic rather than "Two or More." Note that employees self-identifying as "two or more" races were not included in any analysis, because that classification could mean many different things due to the number of possible race combinations.
  • 5. Statistical analyses were only conducted when comparisons included 5 or more employees in each group. This decision was based on professional judgment. Small sample results are often non-representative and unstable, and can change substantially with small changes in the data. Samples too small for analyses are labeled n/a in results tables.
  • 6. For each comparison, we tested the assumption of equal variances between the two groups. If this test indicated unequal variances, a t-test for unequal variances was used (Welch's t-test). If the Welch's t-test changed the significance interpretation from that of the initial Student's t-test, the Welch's t-test value was listed in the table.
  • 7. A meta-analysis is a study that statistically combines the results of all previous studies conducted on a topic. These studies combine data over time (e.g., some source studies date back to the 1960s) and from a variety of jobs (e.g., blue collar and white collar) in different settings (e.g., private, public and military) to identify "typical" findings. In this context, the results of a meta-analysis are a series of effect sizes (d-scores) that provide a single source summary of previous research. Interested readers should refer to the references below for more information related to specific studies.
  • 8. Roth, P. L., Huffcutt, A. I., & Bobko, P. (2003). Ethnic group differences in measures of job performance: A meta-analysis. Journal of Applied Psychology, 88(4), 694-706.
  • 9. McKay, P. F., & McDaniel, M. A. (2006). A reexamination of Black-White mean differences in work performance: More data, more moderators. Journal of Applied Psychology, 91(3), 538-554.
  • 10. Roth, P. L., Purvis, K. L., & Bobko, P. (2012). A meta-analysis of gender group differences for measures of job performance in field studies. Journal of Management, 38(2), 719-739.
  • 11. One pattern that we were not asked to formally evaluate using statistics, but which is clear simply by evaluating the average ratings across the different organizational levels, is that employees at higher organizational levels tend to receive higher performance ratings.