Board Report: 2015-MO-B-006 March 31, 2015

The Board Can Enhance Its Diversity and Inclusion Efforts

Appendix E: External Consulting Firm's Statistical Analysis of the Board's FY 2011, FY 2012, and FY 2013 Performance Ratings

An Analysis of Gender, Race, and Age Differences in Performance Ratings of FRB Employees: 2011-2013

October 20, 2014

Prepared By:

DCI CONSULTING GROUP
1920 I ST NW,
WASHINGTON, DC 20006
(202) 828-6900

Prepared For:

OFFICE OF INSPECTOR GENERAL
BOARD OF GOVERNORS OF THE FEDERAL RESERVE SYSTEM
CONSUMER FINANCIAL PROTECTION BUREAU
20TH STREET AND CONSTITUTION AVENUE NW
MAIL STOP K-300
WASHINGTON, DC 20551

Executive Summary
Introduction
- Project Background
- The FRB Performance Rating System
  - Table 1. Distribution of Performance Ratings
Method
Analysis Results
Conclusions and Discussion
- Interpreting Statistically Significant Findings
- Potential Future Analyses

Executive Summary

On March 24, 2014, members of the United States House of Representatives Committee on Financial Services sent letters requesting that the Offices of Inspector General (OIGs) for seven financial regulatory agencies perform work to determine whether agency internal operations and personnel practices are systematically disadvantaging minorities and women from obtaining senior management positions. The Federal Reserve Board (FRB) was one of these agencies.

Separate analyses were conducted on overall performance ratings administered in 2011, 2012, and 2013. These analyses were conducted to detect potential performance rating differences based on gender, race/ethnicity and age. Analyses were conducted at a number of different job levels. Both statistical significance tests (e.g., t-tests) and effect sizes (e.g., d-scores) were evaluated to determine whether differences were meaningful. Standard social science criteria (e.g., alpha = .05) were used to interpret statistical significance, and effect sizes were compared to typical results found in the personnel selection research literature.

For gender data, males and females did not differ significantly in performance ratings at any level of analysis in any of the three years.

For the agency wide race/ethnicity data, Whites were rated significantly higher than African Americans and Asians in all three years, and were rated significantly higher than Hispanics in 2012 but not 2011 and 2013.

Lastly, for age data, younger workers were rated significantly higher than older workers at the mid-level jobs for all three years.

Statistically significant group differences do not necessarily indicate discrimination by themselves. Differences in performance ratings could be due to a wide variety of explanations. This report concludes with a number of measures that an agency can take to assess performance rating system content and process.

Introduction

Project Background

The request covered the following seven financial regulatory agencies:

Federal Deposit Insurance Corporation (FDIC)
Board of Governors of the Federal Reserve System (FRB)
Consumer Financial Protection Bureau (CFPB)
Office of the Comptroller of the Currency (OCC)
Federal Housing Finance Agency (FHFA)
National Credit Union Administration (NCUA)
Securities and Exchange Commission (SEC)

The OIGs initiated individual assignments with a general overall objective to assess agency personnel operations and other efforts to provide for equal employment opportunities, including equal opportunity for minorities and women to obtain senior management positions, and increase racial, ethnic and gender diversity in the workplace. One element of the work was for each OIG to assemble agency wide performance appraisal data to identify performance ratings distributions by gender, race/ethnicity, age and bargaining unit status (applicable to all agencies except the FRB and FHFA). The FDIC Office of Inspector General (FDIC OIG) offered to engage and fund an independent contractor to perform statistical analyses of the performance appraisal results for each agency to determine whether there are statistically significant differences between groups of interest. DCI Consulting Group was selected to conduct these analyses for each of the agencies except for the Securities and Exchange Commission (SEC).

This report presents the methodology and results of the analyses conducted for the OIG for the Board of Governors of the Federal Reserve System (FRB).²

The FRB Performance Rating System

The performance management program at FRB serves as the basis for determining “pay-for-performance” amounts provided to employees. These increases take the form of merit increases, which affect employees’ base salary and growth over time. Performance ratings may also be considered when determining variable pay or eligibility for additional incentive programs. Employees who receive lower ratings (i.e., unsatisfactory or marginal) may not be eligible for such increases.

The distribution of performance ratings for 2011, 2012, and 2013 is depicted in Table 1. In the FRB coding shown there, the lower the rating code, the better the performance. For purposes of exposition and consistency across agencies, these codes were reverse ordered so that higher ratings reflect better performance in the results section below. For example, a rating of 5 represents extraordinary performance in the results summarized in this report.

Table 1. Distribution of Performance Ratings

Rating	Count 2011	Count 2012	Count 2013	Percent 2011	Percent 2012	Percent 2013
1 - Extraordinary	378	429	480	19.13	20.27	22.39
2 - Outstanding	712	848	956	36.03	40.08	44.59
3 - Commendable	882	829	696	44.64	39.18	32.46
4 - Marginal	4	10	12	0.20	0.47	0.56
5 - Unsatisfactory	0	0	0	0.00	0.00	0.00
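
The reverse ordering described above can be illustrated with a minimal sketch. The counts are copied from Table 1; the function names are hypothetical, and the rule `6 - code` is an assumption about how the reverse ordering was done (the report only states that 5 represents extraordinary performance after recoding).

```python
# Counts from Table 1, keyed by the original FRB rating code
# (1 = Extraordinary ... 5 = Unsatisfactory).
COUNTS = {
    2011: {1: 378, 2: 712, 3: 882, 4: 4, 5: 0},
    2012: {1: 429, 2: 848, 3: 829, 4: 10, 5: 0},
    2013: {1: 480, 2: 956, 3: 696, 4: 12, 5: 0},
}


def reverse_code(rating, scale_max=5):
    """Flip the scale so that higher values mean better performance (assumed rule)."""
    return scale_max + 1 - rating


def percent_distribution(counts):
    """Percentage of employees at each original rating code."""
    total = sum(counts.values())
    return {code: 100 * n / total for code, n in counts.items()}


def mean_reversed_rating(counts):
    """Average rating on the reverse-ordered scale used in the results tables."""
    total = sum(counts.values())
    return sum(reverse_code(code) * n for code, n in counts.items()) / total


if __name__ == "__main__":
    for year, counts in COUNTS.items():
        print(year, round(mean_reversed_rating(counts), 2))
```

Under these assumptions the agency wide averages come out at roughly 3.74, 3.80, and 3.89 for 2011, 2012, and 2013, which is in line with the overall averages reported in the results tables.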

Method

Initial Dataset

FRB OIG provided DCI with data for 2011, 2012 and 2013. The performance time period covered for each year was from October 1st through September 30th. Relevant information for each year included:

Performance Year
EEO1 Category
Job Function
Job Level
Salary Plan
Overall Rating
Rating Description
Age
Race/National origin
Gender
Whether the employee was 40 years of age or older

The dataset for each year included all employees who were eligible for performance ratings. Although OIG employees are rated using the FRB system, they were not included in the dataset. Neither employee name nor employee number was included in the dataset.

Data Cleaning

The first step in the data cleaning process was to remove employees in the dataset who had not been with the agency long enough (90 days) to receive a performance rating. As it turned out, no employees were removed in any of the three years.

Race/Ethnicity Grouping

FRB OIG provided race/ethnicity grouping for analysis. Their coding scheme is presented in Table 2. Employees who listed only one race/ethnicity (e.g., White, Asian) were placed into that race/ethnicity category. Employees who listed more than one race/ethnicity (e.g., Asian and White) were placed into the category of “Two or more”.³ Employees who did not identify their race/ethnicity were included in the gender and age analyses but were omitted from the race/ethnicity analyses.

Table 2. Race/Ethnicity From Dataset and Race/Ethnicity Analysis Groups

Analysis Grouping	Race/Ethnicity Categories in Dataset
White, Non-Hispanic (White)	White; White, not of Hispanic origin; Not Hispanic in Puerto Rico
Asian (Asian)	Asian
Black or African American (African American)	Black or African American; Black, not of Hispanic Origin
Hispanic or Latino (Hispanic)	Hispanic; Hispanic or Latino; Hispanic or Latino, American Indian or Alaska Native; Hispanic or Latino, Black or African American; Hispanic or Latino, Black or African American, White; Hispanic or Latino, White
Native Hawaiian or Other Pacific Islander (Native Hawaiian)	Native Hawaiian Other Pacific Islander
American Indian or Alaska Native (American Indian)	American Indian Alaska Native American Indian/ Alaska Native
Other	Unknown
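
The grouping rule described in this section, together with the Hispanic exception noted in footnote 3, can be sketched as follows. This is only an illustration: the function name and the input format (a list of already-normalized labels per employee) are hypothetical, since the report does not describe how the dataset was processed.

```python
from typing import List, Optional


def analysis_group(races: List[str]) -> Optional[str]:
    """Assign one analysis group from an employee's self-identified labels.

    `races` is assumed to hold normalized labels such as "White", "Asian",
    "African American", or "Hispanic" (a hypothetical input format).
    """
    if not races:
        # No race/ethnicity identified: omitted from the race/ethnicity analyses.
        return None
    if "Hispanic" in races:
        # Footnote 3: anyone identifying as Hispanic is counted as Hispanic,
        # regardless of any other races listed.
        return "Hispanic"
    if len(set(races)) == 1:
        return races[0]
    return "Two or more"


if __name__ == "__main__":
    print(analysis_group(["Asian"]))              # single category
    print(analysis_group(["Asian", "White"]))     # more than one category
    print(analysis_group(["Hispanic", "White"]))  # Hispanic exception
```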

Age Grouping

FRB OIG also provided age groupings for analysis. Employees were placed into one of two categories: under 40 or 40+. These categories were chosen to be consistent with the Age Discrimination in Employment Act (ADEA). The category placement was based on the employee’s age on the first day of the performance period for each of the three years. Table 3 depicts the race/ethnicity, gender, and age breakdown for each of the three years.

Table 3. Number of Employees by Gender, Race/Ethnicity, and Age

Demographic Group	2011	2012	2013
TOTAL	1,976	2,116	2,144
Gender
Female	917	960	973
Male	1,059	1,156	1,171
Race/Ethnicity
White	1,087	1,184	1,185
Black or African American	546	550	554
Asian	229	259	280
Not Specified	3	2	2
Hispanic/Latino	79	87	88
American Indian/Alaskan Native	2	2	2
Native Hawaiian/Pacific Islander	1	2	0
Other (Unknown)	29	30	33
Age
Under 40	778	881	893
40+	1,198	1,235	1,251
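
The age grouping described above (age on the first day of the performance period, split at 40) can be sketched as below. The dataset itself carried an age field and a 40-or-older flag rather than birth dates, so the birth-date input, the function names, and the example dates are all hypothetical.

```python
from datetime import date


def age_on(birth_date: date, as_of: date) -> int:
    """Whole years of age on a given day."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)


def age_group(birth_date: date, period_start: date) -> str:
    """Place an employee in the under-40 or 40+ analysis group."""
    return "40+" if age_on(birth_date, period_start) >= 40 else "Under 40"


if __name__ == "__main__":
    period_start = date(2012, 10, 1)  # hypothetical first day of a performance period
    print(age_group(date(1972, 10, 1), period_start))  # turns 40 on the first day
    print(age_group(date(1972, 10, 2), period_start))  # still 39 on the first day
```

Fixing the age at the start of the period means an employee who turns 40 partway through the year stays in the under-40 group for that year's analysis.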

Data Integrity

To ensure the integrity of the data, two consultants reviewed the initial dataset. To ensure the accuracy of the statistical analyses, the analyses were conducted twice by separate consultants using different analysis programs (i.e., SAS, SPSS, Excel, HR Equator). These separate analyses yielded identical results.

Data Analysis Methodology

The OIGs for each agency agreed that the analyses would be conducted at two levels for all agencies: Overall and by bargaining unit status (where applicable). However, bargaining status was not a factor in the FRB data. Each agency then determined other levels of analysis that made sense for the agency. FRB OIG asked that analyses also be conducted by job level (senior managers, mid-level employees, and all other employees).

To compare the differences in the mean performance ratings across gender, race/ethnicity and age groups, tests of both statistical significance and practical significance were used.⁴ Tests of statistical significance indicate the probability that the group difference could have been due to chance. A statistically significant result does not imply that a difference is good or bad or that it is large or small. Instead it simply indicates that the observed difference is probably not due to chance. In contrast, measures of practical significance provide an indication of the size of the difference.

To determine if the group differences were statistically significant, t-tests were used.⁵ To assess statistical significance, DCI used two-tailed tests, which assess rating differences in both directions (e.g., differences that favor males as well as differences that favor females) and an alpha level of .05. Both standards are common in social science research. An alpha level of .05 indicates that, when there is no true difference between the groups, the probability of a false positive (i.e., a statistically significant result that is incorrect) is 5 percent. This threshold for identifying a statistically significant difference generally corresponds to a t-value of 1.96 (although this value may vary slightly depending on sample size). Any t-value highlighted in the results tables was statistically significant at an alpha level of .05.

To determine practical significance, two measures were used: the percent differences between the two groups and d-scores. A d-score indicates the size of the difference in terms of standard deviations. That is, a d of 1.0 indicates that the two groups differed by a full standard deviation (a large effect) whereas a d of 0.10 indicates that the two groups differed by a tenth of a standard deviation (a small effect).
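
The statistics described in this section can be sketched in a few lines. This is an illustration under stated assumptions, not DCI's implementation: the report does not give its exact formulas, so the pooled-standard-deviation form of d and the use of the second group's mean as the denominator of the percent difference are assumptions (the latter inferred from the values in the results tables). All function names and the sample ratings are hypothetical.

```python
import math
from statistics import mean, variance

MIN_GROUP_SIZE = 5  # footnote 4: comparisons need 5 or more employees per group


def cohens_d(a, b):
    """Mean difference in pooled-standard-deviation units (assumed d formula)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def student_t(a, b):
    """Student's t-statistic (equal variances assumed)."""
    return cohens_d(a, b) / math.sqrt(1 / len(a) + 1 / len(b))


def welch_t(a, b):
    """Welch's t-statistic (unequal variances allowed), as in footnote 5."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def percent_diff(a, b):
    """Difference in means as a percentage of the second group's mean (assumed)."""
    return (mean(a) - mean(b)) / mean(b) * 100


def t_from_d(d, n1, n2):
    """Student's t implied by a d-score and two group sizes."""
    return d / math.sqrt(1 / n1 + 1 / n2)


def compare(a, b):
    """Return the three statistics, or None when a group is too small (n/a)."""
    if min(len(a), len(b)) < MIN_GROUP_SIZE:
        return None
    return {"t": student_t(a, b), "pct_diff": percent_diff(a, b), "d": cohens_d(a, b)}


if __name__ == "__main__":
    group_a = [5, 4, 4, 3, 5, 4]  # hypothetical reverse-coded ratings
    group_b = [4, 3, 3, 4, 3, 3]
    print(compare(group_a, group_b))
    # Rough cross-check against the 2013 agency wide White to African American row.
    print(t_from_d(0.23, 1185, 554))
```

Because t grows with sample size for a fixed d, the same effect size can be significant agency wide yet not significant within a small job level. The cross-check at the end gives a value close to the reported t-value of 4.48; the small gap is expected because the published d is rounded.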

Table 4 will be helpful in interpreting the d-scores observed for FRB. The table summarizes a combination of d-scores obtained in a meta-analysis⁶ by Roth, Huffcutt, and Bobko (2003)⁷ on racial differences, a meta-analysis by McKay and McDaniel (2006)⁸ on Black-White differences, a meta-analysis by Roth, Purvis, and Bobko (2012)⁹ on gender differences, as well as internal research conducted by DCI. Thus, Table 4 represents the gender and race/ethnicity differences that are “typically found” in studies of performance appraisal differences. There have been no meta-analyses comparing performance ratings of employees over and under 40.

Table 4. "Typical" D-Scores Found in Performance Rating Studies

Comparison	Company Wide	By Title
Male - Female	-0.07	-0.08
White - Black	0.34	0.22
White - Hispanic	0.14	0.07
White - Asian	0.08	0.00

Note: Negative d-scores indicate females have higher ratings than males. D-scores computed by title reflect average performance differences between protected class subgroups within specific titles, rather than company-wide. Thus, compared with analyses conducted company wide, analyses by title compare employees who are more similar to one another in each cross-section of employees that is analyzed.

Analysis Results

Gender

Table 5 presents the results of gender analyses. There were no statistically significant gender differences in average performance ratings either agency wide or within the three job levels. This pattern was observed for all three years.

Race/Ethnicity

White to African-American Comparison

As depicted in Table 6, for 2013 the average performance ratings for Whites were higher than the average performance ratings for African Americans, at a statistically significant level, when evaluating ratings agency wide. However, when the data were analyzed by job level, there were no statistically significant White-African American differences in ratings. The effect size for the agency wide difference in 2013 (d= 0.23) was smaller than the value normally found for company-wide White-African American comparisons (which is d= 0.34).

In terms of statistically significant findings at the agency wide level, the pattern for 2012 and 2011 was identical to that of 2013: the average performance ratings for Whites were higher than the average performance ratings for African Americans, at a statistically significant level. The effect size for 2012 was d= 0.32 and for 2011 was d= 0.27, which is similar to the magnitude of differences reported in the research literature. The results for 2012 were also similar to those for 2013 in that there were no statistically significant differences once the data were analyzed by job level. In 2011, however, there was a statistically significant difference between average ratings of Whites and African Americans for the all other employee job level (d= 0.15).¹⁰

White to Hispanic Comparison

As depicted in Table 7, there were two statistically significant differences in the average performance ratings of Whites and Hispanics across all years and comparisons. In 2013, there was a statistically significant difference in favor of Hispanics at the senior manager level. This effect was in the opposite direction of what is typically found in the literature and was large (d=-0.77). However, it should be noted that this comparison included 247 Whites and only 7 Hispanics, and results should be interpreted with caution.

Table 5. Analysis Results - Gender Comparison

Year/Unit of Analysis			Count (M)	Count (F)	Avg Rating (M)	Avg Rating (F)	t-value	% diff	d
2013
	Overall		1171	973	3.89	3.88	0.47	0.4	0.02
	Level
		Sr Mgmt	186	127	4.32	4.42	-1.36	-2.3	-0.16
		Mid-Level	529	399	3.91	3.90	0.26	0.3	0.02
		Other	456	447	3.70	3.71	-0.15	-0.2	-0.01
2012
	Overall		1156	960	3.82	3.78	1.12	1.0	0.05
	Level
		Sr Mgmt	178	115	4.38	4.33	0.58	1.1	0.07
		Mid-Level	519	394	3.84	3.80	0.79	1.0	0.05
		Other	459	451	3.58	3.62	-0.95	-1.3	-0.06
2011
	Overall		1059	917	3.74	3.74	0.20	0.2	0.01
	Level
		Sr Mgmt	161	108	4.26	4.34	-0.96	-1.9	-0.12
		Mid-Level	462	351	3.68	3.77	-1.85	-2.5	-0.13
		Other	436	458	3.62	3.57	1.14	1.6	0.08

Note: Negative t-values indicate women received higher ratings than men
t-values highlighted in orange indicate that the t-value is statistically significant favoring women
t-values highlighted in gray indicate that the t-value is statistically significant favoring men

Table 6. Analysis Results - Race: White to African American Comparison

Year/Unit of Analysis			Count (W)	Count (AA)	Avg Rating (W)	Avg Rating (AA)	t-value	% diff	d
2013
	Overall		1185	554	3.96	3.79	4.48	4.6	0.23
	Level
		Sr Mgmt	247	37	4.37	4.27	0.86	2.3	0.15
		Mid-Level	584	129	3.95	3.86	1.20	2.2	0.12
		Other	354	388	3.71	3.72	-0.23	-0.3	-0.02
2012
	Overall		1184	550	3.89	3.65	6.21	6.6	0.32
	Level
		Sr Mgmt	243	27	4.37	4.37	0.03	0.1	0.01
		Mid-Level	584	119	3.84	3.79	0.73	1.4	0.07
		Other	357	404	3.65	3.56	1.66	2.5	0.12
2011
	Overall		1087	546	3.82	3.62	5.13	5.6	0.27
	Level
		Sr Mgmt	223	26	4.30	4.31	-0.08	-0.3	-0.02
		Mid-Level	533	103	3.73	3.75	-0.25	-0.5	-0.03
		Other	331	417	3.66	3.54	2.09	3.2	0.15

Note: Negative t-values indicate African Americans received higher ratings than Whites
t-values highlighted in orange indicate that the t-value is statistically significant favoring African Americans
t-values highlighted in gray indicate that the t-value is statistically significant favoring Whites

Table 7. Analysis Results - Race: White to Hispanic Comparison

Year/Unit of Analysis			Count (W)	Count (H)	Avg Rating (W)	Avg Rating (H)	t-value	% diff	d
2013
	Overall		1185	88	3.96	3.81	1.87	4.1	0.21
	Level
		Sr Mgmt	247	7	4.37	4.86	-2.00	-10.1	-0.77
		Mid-Level	584	43	3.95	3.77	1.54	4.8	0.24
		Other	354	38	3.71	3.66	0.40	1.4	0.07
2012
	Overall		1184	87	3.89	3.71	2.15	4.9	0.24
	Level
		Sr Mgmt	243	6	4.37	4.50	-0.45	-2.8	-0.19
		Mid-Level	584	45	3.84	3.69	1.37	4.2	0.21
		Other	357	36	3.65	3.61	0.32	1.2	0.06
2011
	Overall		1087	79	3.82	3.67	1.69	4.2	0.20
	Level
		Sr Mgmt	223	4	4.30	n/a	n/a	n/a	n/a
		Mid-Level	533	39	3.73	3.54	1.57	5.4	0.26
		Other	331	36	3.66	3.67	-0.06	-0.2	-0.01

Note: Negative t-values indicate Hispanics received higher ratings than Whites
t-values highlighted in orange indicate that the t-value is statistically significant favoring Hispanics
t-values highlighted in gray indicate that the t-value is statistically significant favoring Whites

At the agency wide level in 2012, Whites received significantly higher performance ratings than Hispanics, and the effect size (d=0.24) exceeded the value normally found for company-wide White-Hispanic differences (d=0.14). However, there were no significant differences in performance ratings within any of the three job levels.

For 2011, there were no significant differences in performance ratings for Whites versus Hispanics at an agency wide level, and for either mid-level employees or all other employees. There were too few Hispanic senior managers (N=4) to make a comparison between White and Hispanic at the senior manager level.

White to Asian Comparison

As depicted in Table 8, Whites received significantly higher performance ratings than Asians at the agency wide level in 2013, and the effect size (d=0.22) was larger than the value normally found for company-wide White-Asian differences (d= 0.08). However, there were no significant differences in performance ratings within any of the three job levels.

This pattern was repeated for both 2012 and 2011, where Whites received significantly higher performance ratings than Asians (d= 0.22). However, as in 2013, there were no significant differences in performance ratings within any of the three job levels in either 2012 or 2011.

Age

As depicted in Table 9, there were no statistically significant differences in performance ratings between older and younger employees in 2013 at an agency wide level, for senior managers, or for all other employees. However, there was a significant difference in favor of younger employees for mid-level employees (d= 0.21).

In 2012, there was a statistically significant overall difference in performance ratings favoring older employees (d= -0.11). However, there were no significant differences in ratings for younger and older employees when looking at the senior manager or all other employee job level. Younger employees received significantly higher ratings than older employees for mid-level employees (d =0.13). This flip in the direction of the difference across unit of analysis is likely what statisticians refer to as Simpson’s paradox. This phenomenon occurs when aggregating data across levels while ignoring the distributions at particular levels produces misleading results (i.e., that older workers are significantly favored in the aggregate). In this case, a larger percentage of older workers than younger workers were in senior executive roles, where ratings were much higher than at other job levels. As such, the larger percentage of older workers at this level may be driving aggregate results.
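
The aggregation effect described above can be reproduced approximately from the published figures. The sketch below copies the 2012 counts and average ratings by job level from Table 9 and recombines them as count-weighted averages; the function and variable names are hypothetical, and because the published averages are rounded to two decimals the results are only approximate.

```python
# (count, average rating) by job level, copied from the 2012 rows of Table 9.
TABLE_9_2012 = {
    "Sr Mgmt":   {"under_40": (34, 4.47),  "40_plus": (259, 4.34)},
    "Mid-Level": {"under_40": (388, 3.88), "40_plus": (525, 3.78)},
    "Other":     {"under_40": (459, 3.59), "40_plus": (451, 3.61)},
}


def overall_mean(table, group):
    """Count-weighted average rating across job levels for one age group."""
    total = sum(level[group][0] for level in table.values())
    weighted = sum(level[group][0] * level[group][1] for level in table.values())
    return weighted / total


def share_in_level(table, group, level_name):
    """Fraction of an age group's employees who sit in the given job level."""
    total = sum(level[group][0] for level in table.values())
    return table[level_name][group][0] / total


if __name__ == "__main__":
    for group in ("under_40", "40_plus"):
        print(
            group,
            "overall:", round(overall_mean(TABLE_9_2012, group), 2),
            "share in Sr Mgmt:", round(share_in_level(TABLE_9_2012, group, "Sr Mgmt"), 3),
        )
```

With these inputs, younger employees average higher than older employees at the senior manager and mid-level jobs, yet the weighted overall averages come out at about 3.75 for the under-40 group and 3.84 for the 40+ group. The reversal follows from the mix: roughly 4 percent of the under-40 group is in senior management, compared with about 21 percent of the 40+ group.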

For 2011, there were no significant differences in performance ratings between older and younger employees at an agency wide level, and the same was true for senior managers and all other employees. However, there was a significant difference in favor of the younger employees for mid-level employees (d= 0.20).

Table 8. Analysis Results - Race: White to Asian Comparison

Year/Unit of Analysis			Count (W)	Count (A)	Avg Rating (W)	Avg Rating (A)	t-value	% diff	d
2013
	Overall		1185	280	3.96	3.80	3.36	4.4	0.22
	Level
		Sr Mgmt	247	20	4.37	4.25	0.80	2.8	0.19
		Mid-Level	584	156	3.95	3.83	1.73	3.0	0.16
		Other	354	104	3.71	3.65	0.67	1.5	0.07
2012
	Overall		1184	259	3.89	3.73	3.24	4.5	0.22
	Level
		Sr Mgmt	243	15	4.37	4.07	1.73	7.6	0.46
		Mid-Level	584	148	3.84	3.80	0.57	1.0	0.05
		Other	357	96	3.65	3.55	1.19	2.8	0.14
2011
	Overall		1087	229	3.82	3.66	3.02	4.6	0.22
	Level
		Sr Mgmt	223	13	4.30	4.15	0.73	3.4	0.21
		Mid-Level	533	122	3.73	3.70	0.31	0.6	0.03
		Other	331	94	3.66	3.52	1.56	3.9	0.18

Note: Negative t-values indicate Asians received higher ratings than Whites
t-values highlighted in orange indicate that the t-value is statistically significant favoring Asians
t-values highlighted in gray indicate that the t-value is statistically significant favoring Whites

Table 9. Analysis Results - Age Comparison

Year/Unit of Analysis			Count (<40)	Count (40+)	Avg Rating (<40)	Avg Rating (40+)	t-value	% diff	d
2013
	Overall		893	1251	3.86	3.90	-1.23	-1.0	-0.05
	Level
		Sr Mgmt	34	279	4.38	4.35	0.24	0.6	0.04
		Mid-Level	394	534	4.00	3.84	3.18	4.0	0.21
		Other	465	438	3.71	3.69	0.46	0.6	0.03
2012
	Overall		881	1235	3.75	3.84	-2.57	-2.2	-0.11
	Level
		Sr Mgmt	34	259	4.47	4.34	1.05	2.9	0.19
		Mid-Level	388	525	3.88	3.78	1.99	2.5	0.13
		Other	459	451	3.59	3.61	-0.40	-0.5	-0.03
2011
	Overall		778	1198	3.72	3.75	-0.87	-0.8	-0.04
	Level
		Sr Mgmt	34	235	4.32	4.29	0.27	0.8	0.05
		Mid-Level	304	509	3.81	3.67	2.79	4.0	0.20
		Other	440	454	3.61	3.57	0.83	1.1	0.06

Note: Negative t-values indicate those 40 years of age or older received higher ratings than those younger than 40 years of age
t-values highlighted in orange indicate that the t-value is statistically significant favoring those 40 years of age or older
t-values highlighted in gray indicate that the t-value is statistically significant favoring those younger than 40 years of age

Conclusions and Discussion

This report summarized the methodology and results of analyses related to subgroup differences on overall performance ratings administered in 2011, 2012, and 2013 at FRB. These analyses were conducted to detect potential performance rating differences based on gender, race/ethnicity and age. Analyses were conducted at a variety of levels of analysis. Both statistical significance tests (e.g., t-tests) and effect sizes (e.g., d-scores) were evaluated to determine whether differences were meaningful. Standard social science criteria (e.g., alpha = .05) were used to interpret statistical significance, and effect sizes were compared to typical results found in the personnel selection research literature.

The agency wide results across years indicate no pattern of statistically significant differences in average performance ratings between (a) women and men, (b) Hispanics and Whites, or (c) those age 40 or older and those younger than 40. In fact, there were no statistically significant gender differences, regardless of the level of analysis (i.e., overall or by organizational level). There were two statistically significant differences in average performance ratings between Hispanics and Whites, but they were not (a) at the same level of analysis (one was at the agency wide level and the other at the senior management level) or (b) in the same direction (one indicated higher average ratings for Hispanics and the other for Whites). Thus, in general, the overall results indicated no systematic differences in performance ratings for gender or White-Hispanic comparisons. With respect to age, a consistent pattern across years emerged at the mid-level jobs. The average performance ratings for employees younger than 40 were higher than those for employees age 40 or older, at a statistically significant level, but the effect sizes were not large.

With respect to agency wide performance differences between White employees and both Asian and African American employees, there is a trend of statistically significant differences in average ratings. In all three years, the average performance ratings for Whites were higher than those for Asians, at a statistically significant level. Similarly, in all three years, the average performance ratings for Whites were higher than those for African Americans, at a statistically significant level. The White-Asian differences were higher than that which is typically reported in the research literature, whereas the White-African American differences were lower than that which is typically reported in the research literature. It is notable that in the case of both White-Asian comparisons and White-African American comparisons, there is not a trend of statistically significant differences in performance ratings once the data are evaluated by job level.

Interpreting Statistically Significant Findings

It is important to understand that a statistically significant difference in ratings based on gender, race/ethnicity, or age does not necessarily indicate that discrimination is occurring. Such group differences could be due to actual differences in performance, regional differences in ratings, job family differences in ratings (i.e., supervisors in certain fields are more strict or lenient than supervisors in other fields) or some combination of all these factors.

To investigate whether any group differences are due to actual differences in performance or other factors rather than to discrimination, a number of measures could be taken to assess an agency’s performance rating system process and content. These include verification that:

The performance appraisal dimensions are job related;
The performance appraisal system is adequately structured;
Supervisors making the performance evaluations receive training;
There is a system in place for management to review supervisors’ performance ratings to determine if there are any patterns (e.g., racial or gender differences) that need to be reviewed;
There is an appeal process for employees who believe their performance ratings are not accurate;
There is a standardized, objective system for making employment decisions (e.g., merit increases, promotions) on the basis of the performance ratings;
There is a well-developed feedback system through which employees can receive information about their performance that will promote their future development and enable them to improve job performance.

Potential Future Analyses

As described above, in cases where statistically significant differences exist, we generally recommend that the performance appraisal system be evaluated along the dimensions described above. In addition, a number of follow up analyses may be useful for interpreting results and gaining a clearer understanding of what factors may be driving those findings.

First, the analyses for this report were conducted at three job levels: senior managers, mid-level employees, and all other employees. It might be useful to conduct further analyses by such strata as salary band, region or location, and job title. In some instances, job level results may be further explained by more nuanced analyses and more granular levels.

Second, examining the interaction between the race/ethnicity and gender of the employee and the race/ethnicity and gender of the supervisor might also provide some insight into the statistically significant group differences. In some instances rater-ratee interactions may further explain results.

Third, because the analyses in this report focused on the overall rating, it might be informative to look at group differences in the initial element ratings, to determine whether a particular element could be driving results.

Fourth, it may be useful to analyze tangible employment outcomes that are directly or indirectly linked to performance ratings. For example, merit raises, bonuses and promotion decisions could all be analyzed across the protected groups discussed in this report. This set of analyses could provide a broader perspective on equal employment opportunity outcomes across groups.

Note: We did not include appendix I of the external consultant’s report, which is a copy of the congressional request letter. We include that letter as appendix A of this report.

¹ See the Appendix for a copy of this letter.
² DCI staff conducted all analyses and authored this report. Nothing in the report should be construed as representing the views of FRB OIG.
³ As shown in Table 2, the exception to this was that any employees identifying themselves as Hispanic, regardless of whether they listed any other races, were counted as Hispanic rather than “Two or More.” There were no employees categorized as “Two or more” races. Note that employees self-identifying as “two or more” races or “other” are typically not included in any analysis, because those classifications could mean many different things, particularly due to the number of possible race combinations.
⁴ Statistical analyses were only conducted when comparisons included 5 or more employees in each group. This decision was based on professional judgment. Small sample results are often non-representative, unstable and can change substantially with small changes in the data. Samples too small for analyses are labeled n/a in results tables.
⁵ For each comparison, we tested the assumption of equal variances between the two groups. If this test indicated unequal variances, a t-test for unequal variances was used (Welch’s t-test). If the Welch’s t-test changed the significance interpretation from that of the initial Student’s t-test, the Welch’s t-test value was listed in the table.
⁶ A meta-analysis is a study that statistically combines the results of all previous studies conducted on a topic. These studies combine data over time (e.g., some source studies date back to the 1960s) and from a variety of jobs (e.g., blue collar and white collar) in different settings (e.g., private, public and military) to identify “typical” findings. In this context, the results of a meta-analysis are a series of effect sizes (d-scores) that provide a single source summary of previous research. Interested readers should refer to the references below for more information related to specific studies.
⁷ Roth, P. L., Huffcutt, A. I., & Bobko, P. (2003). Ethnic group differences in measures of job performance: A meta-analysis. Journal of Applied Psychology, 88(4), 694-706.
⁸ McKay, P. F., & McDaniel, M. A. (2006). A reexamination of Black-White mean differences in work performance: More data, more moderators. Journal of Applied Psychology, 91(3), 538-554.
⁹ Roth, P. L., Purvis, K. L., & Bobko, P. (2012). A meta-analysis of gender group differences for measures of job performance in field studies. Journal of Management, 38(2), 719-739.
¹⁰ One pattern that we were not asked to formally evaluate using statistics, but which is clear simply by evaluating the average ratings across the different job levels, is that employees at higher levels tend to receive higher performance ratings.
