DCI CONSULTING GROUP
1920 I ST NW,
WASHINGTON, DC 20006
OFFICE OF INSPECTOR GENERAL
BOARD OF GOVERNORS OF THE FEDERAL RESERVE SYSTEM
CONSUMER FINANCIAL PROTECTION BUREAU
20TH STREET AND CONSTITUTION AVENUE NW
MAIL STOP K-300
WASHINGTON, DC 20551
On March 24, 2014, members of the United States House of Representatives Committee on Financial Services sent letters requesting that the Offices of Inspector General (OIGs) for seven financial regulatory agencies perform work to determine whether agency internal operations and personnel practices systematically disadvantage minorities and women in obtaining senior management positions. The Federal Reserve Board (FRB) was one of these agencies.
The OIGs initiated individual assignments with a general overall objective to assess agency personnel operations and other efforts to provide for equal employment opportunities, including equal opportunity for minorities and women to obtain senior management positions, and to increase racial, ethnic and gender diversity in the workforce. One element of the work was for each OIG to assemble agency wide performance appraisal data to identify performance rating distributions by gender, race/ethnicity, age and bargaining unit status (where applicable). This report presents the methodology and results of the analyses conducted for the OIG for the Board of Governors of the Federal Reserve System (FRB).
Separate analyses were conducted on overall performance ratings administered in 2011, 2012, and 2013. These analyses were conducted to detect potential performance rating differences based on gender, race/ethnicity and age. Analyses were conducted at a number of different job levels. Both statistical significance tests (e.g., t-tests) and effect sizes (e.g., d-scores) were evaluated to determine whether differences were meaningful. Standard social science criteria (e.g., alpha = .05) were used to interpret statistical significance, and effect sizes were compared to typical results found in the personnel selection research literature.
For gender data, males and females did not differ significantly in performance ratings at any level of analysis in any of the three years.
For the agency wide race/ethnicity data, Whites were rated significantly higher than African Americans and Asians in all three years, and were rated significantly higher than Hispanics in 2012 but not in 2011 or 2013.
Lastly, for age data, younger workers (under 40) were rated significantly higher than older workers (40+) in mid-level jobs in all three years.
Statistically significant group differences do not, by themselves, indicate discrimination. Differences in performance ratings could have a wide variety of explanations. This report concludes with a number of measures that an agency can take to assess the content and process of its performance rating system.
On March 24, 2014, members of the United States House of Representatives Committee on Financial Services sent letters requesting that the Offices of Inspector General (OIGs) for seven financial regulatory agencies perform work to determine whether agency internal operations and personnel practices systematically disadvantage minorities and women in obtaining senior management positions.1 The agencies include the following:
The OIGs initiated individual assignments with a general overall objective to assess agency personnel operations and other efforts to provide for equal employment opportunities, including equal opportunity for minorities and women to obtain senior management positions, and to increase racial, ethnic and gender diversity in the workplace. One element of the work was for each OIG to assemble agency wide performance appraisal data to identify performance rating distributions by gender, race/ethnicity, age and bargaining unit status (applicable to all agencies except the FRB and FHFA). The FDIC Office of Inspector General (FDIC OIG) offered to engage and fund an independent contractor to perform statistical analyses of the performance appraisal results for each agency to determine whether there are statistically significant differences between groups of interest. DCI Consulting Group was selected to conduct these analyses for each of the agencies except for the Securities and Exchange Commission (SEC).
This report presents the methodology and results of the analyses conducted for the OIG for the Board of Governors of the Federal Reserve System (FRB).2
The performance management program at FRB serves as the basis for determining “pay-for-performance” amounts provided to employees. These amounts take the form of merit increases, which affect employees’ base salaries and salary growth over time. Performance ratings may also be considered when determining variable pay or eligibility for additional incentive programs. Employees who receive lower ratings (i.e., unsatisfactory or marginal) may not be eligible for such increases.
The distribution of performance ratings for 2011, 2012, and 2013 is depicted in Table 1. On the FRB scale, the lower the rating code, the better the performance. For purposes of exposition and consistency across agencies, these codes were reverse ordered in the results section below so that higher ratings reflect better performance. For example, in the results summarized in this report, a rating of 5 represents extraordinary performance.
|Rating (FRB code)||2011 Count||2012 Count||2013 Count||2011 %||2012 %||2013 %|
|1 - Extraordinary||378||429||480||19.13||20.27||22.39|
|2 - Outstanding||712||848||956||36.03||40.08||44.59|
|3 - Commendable||882||829||696||44.64||39.18||32.46|
|4 - Marginal||4||10||12||0.20||0.47||0.56|
|5 - Unsatisfactory||0||0||0||0.00||0.00||0.00|
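As a quick check on Table 1 and on the reverse ordering described before it, the sketch below recomputes the percentage distributions from the counts and maps FRB codes onto the 5-point analysis scale (5 = Extraordinary). The mapping `6 - code` is our reading of "reverse ordered" for a five-level scale; the function names are hypothetical, and only the counts come from Table 1.

```python
# Counts from Table 1, keyed by FRB rating code (1 = Extraordinary ... 5 = Unsatisfactory).
TABLE_1_COUNTS = {
    2011: {1: 378, 2: 712, 3: 882, 4: 4, 5: 0},
    2012: {1: 429, 2: 848, 3: 829, 4: 10, 5: 0},
    2013: {1: 480, 2: 956, 3: 696, 4: 12, 5: 0},
}

SCALE_MAX = 5  # five rating levels


def reverse_code(frb_code: int) -> int:
    """Map an FRB code (1 = best) to the analysis scale (5 = best)."""
    if not 1 <= frb_code <= SCALE_MAX:
        raise ValueError(f"rating code out of range: {frb_code}")
    return SCALE_MAX + 1 - frb_code


def percent_distribution(counts: dict[int, int]) -> dict[int, float]:
    """Percentage of rated employees at each code, rounded to two decimals."""
    total = sum(counts.values())
    return {code: round(100 * n / total, 2) for code, n in counts.items()}


def mean_analysis_rating(counts: dict[int, int]) -> float:
    """Mean rating after reverse coding (higher = better performance)."""
    total = sum(counts.values())
    return sum(reverse_code(code) * n for code, n in counts.items()) / total


if __name__ == "__main__":
    for year, counts in TABLE_1_COUNTS.items():
        print(year, percent_distribution(counts), round(mean_analysis_rating(counts), 2))
```

The recomputed percentages match Table 1, and the mean reverse-coded rating rises each year, consistent with the growing share of Extraordinary and Outstanding ratings.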
FRB OIG provided DCI with data for 2011, 2012 and 2013. The performance time period covered for each year was from October 1st through September 30th. Relevant information for each year included:
The dataset for each year included all employees who were eligible for performance ratings. Although OIG employees are rated using the FRB system, they were not included in the dataset. Neither employee name nor employee number was included in the dataset.
The first step in the data cleaning process was to remove employees who had not been with the agency long enough (90 days) to receive a performance rating. No employees met this criterion, so none were removed in any of the three years.
FRB OIG provided the race/ethnicity groupings for analysis. The coding scheme is presented in Table 2. If employees listed only one race/ethnicity (e.g., White, Asian), they were placed into that race/ethnicity category. If employees listed more than one race/ethnicity (e.g., Asian and White), they were placed into the category of “Two or more”.3 Employees who did not identify their race/ethnicity were included in the gender and age analyses but were omitted from the race/ethnicity analyses.
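A minimal sketch of this screening step follows. It assumes each record carries a hire date and that tenure is measured up to the last day of the performance period (September 30). The report does not say which reference date was used, and the field name `hire_date` is hypothetical.

```python
from datetime import date, timedelta

MIN_TENURE = timedelta(days=90)  # tenure threshold stated in the report


def has_min_tenure(hire_date: date, period_end: date) -> bool:
    """True if the employee had at least 90 days with the agency by period_end.

    Assumption: tenure is counted up to the last day of the performance period.
    """
    return period_end - hire_date >= MIN_TENURE


def screen_for_tenure(records: list[dict], period_end: date) -> list[dict]:
    """Keep only records whose (hypothetical) 'hire_date' meets the threshold."""
    return [r for r in records if has_min_tenure(r["hire_date"], period_end)]


if __name__ == "__main__":
    end = date(2013, 9, 30)
    staff = [  # hypothetical records; no names or employee numbers
        {"row": 1, "hire_date": date(2005, 3, 14)},
        {"row": 2, "hire_date": date(2013, 7, 2)},  # exactly 90 days: kept
        {"row": 3, "hire_date": date(2013, 7, 3)},  # 89 days: screened out
    ]
    print([r["row"] for r in screen_for_tenure(staff, end)])
```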
|Analysis Grouping||Race/Ethnicity Categories in Dataset|
|White, Non-Hispanic (White)||
|Black or African American (African American)||
|Hispanic or Latino (Hispanic)||
|Native Hawaiian or Other Pacific Islander (Native Hawaiian)||
|American Indian or Alaska Native (American Indian)||
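The grouping rule described before Table 2 can be sketched as follows. The category labels are illustrative (the dataset's own values in Table 2 are not reproduced here), and treating each reported category, including Hispanic or Latino, as one selectable value is an assumption about how the source data were coded. The field name `races` is hypothetical.

```python
from typing import Iterable, Optional

TWO_OR_MORE = "Two or more"


def analysis_group(reported: Iterable[str]) -> Optional[str]:
    """Assign an employee to a race/ethnicity analysis group.

    - exactly one distinct category reported -> that category
    - more than one distinct category        -> "Two or more"
    - nothing reported                       -> None (kept for gender and age
      analyses, omitted from race/ethnicity analyses)
    """
    categories = {c.strip() for c in reported if c and c.strip()}
    if not categories:
        return None
    if len(categories) == 1:
        return categories.pop()
    return TWO_OR_MORE


def race_ethnicity_sample(employees: list[dict]) -> list[dict]:
    """Drop employees without an identified race/ethnicity (hypothetical 'races' field)."""
    return [e for e in employees if analysis_group(e["races"]) is not None]
```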
FRB OIG also provided age groupings for analysis. Employees were placed into one of two categories: under 40 or 40+. These categories were chosen to be consistent with the Age Discrimination in Employment Act (ADEA), which protects individuals who are 40 years of age or older. The category placement was based on the employee’s age on the first day of the performance period for each of the three years. Table 3 depicts the race/ethnicity, gender, and age breakdown for each of the three years.
|Group||2011||2012||2013|
|Black or African American||546||550||554|
|American Indian/Alaskan Native||2||2||2|
|Native Hawaiian/Pacific Islander||1||2||0|
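The age classification described before Table 3 depends only on an employee's age on the first day of the performance period (October 1). A sketch, assuming a birth date is available; the report does not say whether FRB supplied birth dates or precomputed ages, and the function names are hypothetical.

```python
from datetime import date


def age_on(birth_date: date, as_of: date) -> int:
    """Completed years of age on the as_of date."""
    years = as_of.year - birth_date.year
    # Not yet had this year's birthday by as_of: subtract one year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_group(birth_date: date, period_start: date) -> str:
    """ADEA-aligned grouping: '40+' or 'Under 40' on the period's first day."""
    return "40+" if age_on(birth_date, period_start) >= 40 else "Under 40"


if __name__ == "__main__":
    start = date(2012, 10, 1)  # first day of an October-September period
    print(age_group(date(1972, 10, 1), start))  # turns 40 that day: "40+"
    print(age_group(date(1972, 10, 2), start))  # turns 40 the next day: "Under 40"
```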
To ensure the integrity of the data, two consultants reviewed the initial dataset. To ensure the accuracy of the statistical analyses, the analyses were conducted twice by separate consultants using different analysis programs (i.e., SAS, SPSS, Excel, HR Equator). These separate analyses yielded identical results.
The OIGs for each agency agreed that the analyses would be conducted at two levels for all agencies: overall and by bargaining unit status (where applicable). However, bargaining unit status was not a factor in the FRB data. Each agency then determined other levels of analysis that made sense for it. FRB OIG asked that analyses also be conducted by job level (senior managers, mid-level employees, and all other employees).
To compare the differences in mean performance ratings across gender, race/ethnicity and age groups, tests of both statistical significance and practical significance were used.4 Tests of statistical significance indicate how likely it is that a group difference at least as large as the one observed would arise by chance if the groups did not truly differ. A statistically significant result does not imply that a difference is good or bad or that it is large or small. Instead, it simply indicates that the observed difference is probably not due to chance. In contrast, measures of practical significance indicate the size of the difference.
To determine whether the group differences were statistically significant, t-tests were used.5 DCI used two-tailed tests, which assess rating differences in both directions (e.g., differences that favor males as well as differences that favor females), and an alpha level of .05. Both standards are common in social science research. An alpha level of .05 means that, when there is no true difference between groups, the probability of a false positive (i.e., a statistically significant result that is incorrect) is 5 percent. For large samples, this threshold corresponds to an absolute t-value of about 1.96; the exact critical value is somewhat larger for small samples. Any t-value highlighted in the results tables was statistically significant at an alpha level of .05.
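For illustration, the sketch below computes an independent-samples t statistic and applies the two-tailed .05 rule described above. The report does not specify which t-test variant was used; this sketch assumes Student's pooled-variance test (Welch's unequal-variance test is a common alternative) and approximates the two-tailed p-value with the standard normal distribution, which is reasonable only for large groups. Function names are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(group_a: list[float], group_b: list[float]) -> float:
    """Student's independent-samples t statistic (pooled variance).

    Positive values mean group_a has the higher mean rating.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two ratings")
    # statistics.variance uses the n - 1 (sample) denominator.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("no variance in ratings; t is undefined")
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err


def two_tailed_p_normal(t: float) -> float:
    """Large-sample approximation of P(|T| >= |t|) using the standard normal."""
    return math.erfc(abs(t) / math.sqrt(2))


def is_significant(t: float, critical: float = 1.96) -> bool:
    """Two-tailed test at alpha = .05 (large-sample critical value)."""
    return abs(t) >= critical
```

The 1.96 cutoff and the normal approximation suit agency wide comparisons. For very small combined samples, the exact critical value from the t distribution is noticeably larger (about 2.45 with 6 degrees of freedom), so a statistics library's t distribution should be used there.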
To determine practical significance, two measures were used: the percent differences between the two groups and d-scores. A d-score indicates the size of the difference in terms of standard deviations. That is, a d of 1.0 indicates that the two groups differed by a full standard deviation (a large effect) whereas a d of 0.10 indicates that the two groups differed by a tenth of a standard deviation (a small effect).
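A d-score can be computed as the difference in group means divided by a pooled standard deviation. The report does not state which standard deviation DCI used; the sketch below assumes the pooled within-group standard deviation (Cohen's d), with the same sign convention as Table 4 (positive values favor the first group listed, such as Whites or males). The function name is hypothetical.

```python
import math
from statistics import mean, variance


def d_score(group_a: list[float], group_b: list[float]) -> float:
    """Standardized mean difference (Cohen's d with pooled SD).

    Positive values mean group_a is rated higher than group_b.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two ratings")
    pooled_sd = math.sqrt(
        ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b))
        / (n_a + n_b - 2)
    )
    if pooled_sd == 0:
        raise ValueError("no variance in ratings; d is undefined")
    return (mean(group_a) - mean(group_b)) / pooled_sd
```

Unlike t, d does not grow with sample size. With a pooled standard deviation, d = t × sqrt(1/n_a + 1/n_b), which is why a small d can still be statistically significant in agency wide comparisons involving roughly two thousand ratings.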
Table 4 is helpful in interpreting the d-scores observed for FRB. The table summarizes d-scores obtained in a meta-analysis6 by Roth, Huffcutt, and Bobko (2003)7 on racial differences, a meta-analysis by McKay and McDaniel (2006)8 on Black-White differences, and a meta-analysis by Roth, Purvis, and Bobko (2012)9 on gender differences, as well as internal research conducted by DCI. Thus, Table 4 represents the gender and race/ethnicity differences that are “typically found” in studies of performance appraisal differences. There have been no meta-analyses comparing performance ratings of employees over and under 40.
|Comparison||Company Wide||By Title|
|Male - Female||-0.07||-0.08|
|White - Black||0.34||0.22|
|White - Hispanic||0.14||0.07|
|White - Asian||0.08||0.00|
Note: Negative d-scores indicate that females have higher ratings than males. D-scores computed by title reflect average performance differences between protected class subgroups within specific titles, rather than company wide. Thus, by-title analyses compare employees who are more similar to one another than the employees compared in company-wide analyses.
Table 5 presents the results of gender analyses. There were no statistically significant gender differences in average performance ratings either agency wide or within the three job levels. This pattern was observed for all three years.
White to African-American Comparison
As depicted in Table 6, for 2013 the average performance ratings for Whites were higher than the average performance ratings for African Americans, at a statistically significant level, when evaluating ratings agency wide. However, when the data were analyzed by job level, there were no statistically significant White-African American differences in ratings. The effect size for the agency wide difference in 2013 (d= 0.23) was smaller than the value normally found for company-wide White-African American comparisons (which is d= 0.34).
In terms of statistically significant findings at the agency wide level, the pattern for 2012 and 2011 was identical to that of 2013: the average performance ratings for Whites were higher than the average performance ratings for African Americans, at a statistically significant level. The effect size for 2012 was d= 0.32 and for 2011 was d= 0.27, which is similar to the magnitude of differences reported in the research literature. The results for 2012 were also similar to those for 2013 in that there were no statistically significant differences once the data were analyzed by job level. In 2011, however, there was a statistically significant difference between average ratings of Whites and African Americans for the all other employee job level (d= 0.15).10
White to Hispanic Comparison
As depicted in Table 7, there were two statistically significant differences in the average performance ratings of Whites and Hispanics across all years and comparisons. In 2013, there was a statistically significant difference in favor of Hispanics at the senior manager level. This effect was in the opposite direction of what is typically found in the literature and was large (d = -0.77). However, this comparison included 247 Whites and only 7 Hispanics, so the result should be interpreted with caution.
At the agency wide level in 2012, Whites received significantly higher performance ratings than Hispanics, and the effect size (d = 0.24) exceeded the typical company-wide value reported in the research literature (d = 0.14). However, there were no significant differences in performance ratings within any of the three job levels.
For 2011, there were no significant differences in performance ratings for Whites versus Hispanics at an agency wide level, and for either mid-level employees or all other employees. There were too few Hispanic senior managers (N=4) to make a comparison between White and Hispanic at the senior manager level.
White to Asian Comparison
As depicted in Table 8, Whites received significantly higher performance ratings than Asians at the agency wide level in 2013, and the effect size (d=0.22) was larger than the value normally found for company-wide White-Asian differences (d= 0.08). However, there were no significant differences in performance ratings within any of the three job levels.
This pattern was repeated for both 2012 and 2011, where Whites received significantly higher performance ratings than Asians (d= 0.22). However, as in 2013, there were no significant differences in performance ratings within any of the three job levels in either 2012 or 2011.
As depicted in Table 9, there were no statistically significant differences in performance ratings between older and younger employees in 2013 at an agency wide level, for senior managers, or for all other employees. However, there was a significant difference in favor of younger employees for mid-level employees (d= 0.21).
In 2012, there was a statistically significant overall difference in performance ratings favoring older employees (d = -0.11). However, there were no significant differences in ratings between younger and older employees at the senior manager or all other employee job levels. Younger employees received significantly higher ratings than older employees at the mid-level (d = 0.13). This reversal in the direction of the difference across units of analysis is likely an instance of what statisticians call Simpson’s paradox. This phenomenon occurs when aggregating data across levels, while ignoring the distributions within particular levels, produces misleading results (here, that older workers are significantly favored in the aggregate). In this case, a larger percentage of older workers than younger workers were in senior manager roles, where ratings were much higher than at other job levels. As such, the larger percentage of older workers at this level may be driving the aggregate results.
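The reversal described for the 2012 age results can be reproduced with a few lines of arithmetic. The numbers below are invented purely for illustration and are not FRB data: younger employees have the higher mean rating within each job level, yet older employees have the higher mean overall because more of them sit in the highly rated senior level.

```python
# Hypothetical (headcount, mean rating) by job level and age group; not FRB data.
LEVELS = {
    "senior manager": {"older": (100, 4.5), "younger": (10, 4.6)},
    "mid-level": {"older": (100, 3.6), "younger": (200, 3.7)},
}


def pooled_mean(levels: dict, group: str) -> float:
    """Headcount-weighted mean rating for one age group across all levels."""
    total_n = sum(cells[group][0] for cells in levels.values())
    total_points = sum(n * m for n, m in (cells[group] for cells in levels.values()))
    return total_points / total_n


if __name__ == "__main__":
    for level, cells in LEVELS.items():
        print(f"{level}: older {cells['older'][1]:.2f}, younger {cells['younger'][1]:.2f}")
    print(
        f"all levels: older {pooled_mean(LEVELS, 'older'):.2f}, "
        f"younger {pooled_mean(LEVELS, 'younger'):.2f}"
    )
```

Here the overall means are 4.05 for older and about 3.74 for younger employees, even though younger employees lead by 0.1 at each level. Comparing within job levels avoids this distortion.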
For 2011, there were no significant differences in performance ratings between older and younger employees at an agency wide level, and the same was true for senior managers and all other employees. However, there was a significant difference in favor of the younger employees for mid-level employees (d= 0.20).
This report summarized the methodology and results of analyses of subgroup differences in overall performance ratings administered in 2011, 2012, and 2013 at FRB. These analyses were conducted to detect potential performance rating differences based on gender, race/ethnicity and age. Analyses were conducted agency wide and by job level. Both statistical significance tests (e.g., t-tests) and effect sizes (e.g., d-scores) were evaluated to determine whether differences were meaningful. Standard social science criteria (e.g., alpha = .05) were used to interpret statistical significance, and effect sizes were compared to typical results found in the personnel selection research literature.
The agency wide results across years indicate no pattern of statistically significant differences in average performance ratings between (a) women and men, (b) Hispanics and Whites, or (c) those age 40 or older and those younger than 40. In fact, there were no statistically significant gender differences, regardless of the level of analysis (i.e., overall or by organizational level). There were two statistically significant differences in average performance ratings between Hispanics and Whites, but they were not (a) at the same level of analysis (one was at the agency wide level and the other at the senior management level) or (b) in the same direction (one indicated higher average ratings for Hispanics and the other for Whites). Thus, in general, the overall results indicated no systematic differences in performance ratings for gender or White-Hispanic comparisons. With respect to age, a consistent pattern across years emerged at the mid-level jobs. The average performance ratings for employees younger than 40 were higher than those for employees age 40 or older, at a statistically significant level, but the effect sizes were not large.
With respect to agency wide performance differences between White employees and both Asian and African American employees, there is a consistent pattern of statistically significant differences in average ratings. In all three years, the average performance ratings for Whites were higher than those for Asians, at a statistically significant level. Similarly, in all three years, the average performance ratings for Whites were higher than those for African Americans, at a statistically significant level. The White-Asian differences were larger than those typically reported in the research literature, whereas the White-African American differences were smaller than those typically reported. Notably, for both White-Asian and White-African American comparisons, there is no pattern of statistically significant differences in performance ratings once the data are evaluated by job level.
It is important to understand that a statistically significant difference in ratings based on gender, race/ethnicity, or age does not necessarily indicate that discrimination is occurring. Such group differences could be due to actual differences in performance, regional differences in ratings, job family differences in ratings (i.e., supervisors in certain fields are stricter or more lenient than supervisors in other fields), or some combination of these factors.
To investigate whether any group differences are due to actual differences in performance or other factors rather than to discrimination, a number of measures could be taken to assess an agency’s performance rating system process and content. These include verification that:
In cases where statistically significant differences exist, we generally recommend that the performance appraisal system be evaluated along the dimensions described above. In addition, a number of follow-up analyses may be useful for interpreting results and gaining a clearer understanding of what factors may be driving those findings.
First, the analyses for this report were conducted at three job levels: senior managers, mid-level employees, and all other employees. It might be useful to conduct further analyses by such strata as salary band, region or location, and job title. In some instances, job level results may be further explained by more granular analyses.
Second, examining the interaction between the race/ethnicity and gender of the employee and the race/ethnicity and gender of the supervisor might also provide some insight into the statistically significant group differences. In some instances rater-ratee interactions may further explain results.
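One way to begin such an analysis is a cross-tabulation of mean ratings by supervisor group and employee group. The sketch assumes hypothetical records of (supervisor group, employee group, rating); the report does not indicate that supervisor demographics were part of the dataset. Cell counts are returned alongside means so that small cells can be flagged before any comparison is made.

```python
from collections import defaultdict
from statistics import mean


def rater_ratee_means(records):
    """Count and mean rating for each (supervisor group, employee group) pair.

    records: iterable of (supervisor_group, employee_group, rating) tuples.
    """
    cells = defaultdict(list)
    for supervisor_group, employee_group, rating in records:
        cells[(supervisor_group, employee_group)].append(rating)
    return {pair: (len(r), mean(r)) for pair, r in sorted(cells.items())}


if __name__ == "__main__":
    sample = [  # hypothetical records, not FRB data
        ("Female", "Female", 4), ("Female", "Male", 4), ("Male", "Female", 3),
        ("Male", "Male", 4), ("Male", "Female", 4), ("Female", "Male", 5),
    ]
    for pair, (n, avg) in rater_ratee_means(sample).items():
        print(pair, n, round(avg, 2))
```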
Third, because the analyses in this report focused on the overall rating, it might be informative to look at group differences in the initial element ratings, to determine whether a particular element could be driving results.
Fourth, it may be useful to analyze tangible employment outcomes that are directly or indirectly linked to performance ratings. For example, merit raises, bonuses and promotion decisions could all be analyzed across the protected groups discussed in this report. This set of analyses could provide a broader perspective on equal employment opportunity outcomes across groups.
Note: We did not include appendix I of the external consultant’s report, which is a copy of the congressional request letter. We include that letter as appendix A of this report.