Redesigning the Foreign Service Exam
The Foreign Service Exam is one of the most selective of its kind in the United States. Of the thousands who take it every year, fewer than 3% will ultimately succeed in becoming Foreign Service Officers (FSOs). The evaluation process has three parts: the Foreign Service Officer Test (FSOT), an exam consisting of multiple-choice and essay questions; a personal narrative submitted for review by the Qualifications Evaluation Panel (QEP); and the Foreign Service Oral Assessment (FSOA), a day-long assessment comprising a written assessment, a structured oral interview, and a structured group exercise.
The exam has undergone extensive changes over time. In 1989, a court order found that the Department of State had discriminated against women in the written portion of the Foreign Service Officer Test, which led to initial changes in the exam. In 2006, then-Director General of the Foreign Service George Staples proposed that the written exam be removed from the process because of its negative impact on minority hiring. This prompted another review of the evaluation process. Management consultants McKinsey & Company proposed that the process include a “total candidate review,” which led to the creation of personal narrative questions (PNQs).
Margaret Dean, Staff Director for the Board of Examiners (BEX) from 2004 through 2007, explains the challenges that she and her team faced in trying to design an exam which would not be biased against women and minorities and which would continue to yield the high quality of candidates needed by the Foreign Service.
During her tenure as Director of BEX, alterations to the exam included changing from five written exams, one for each track, to one exam; transitioning the exam from bubble-sheet and blue book to an online process; and adding the PNQs so that reviewers could get a better sense of a candidate’s background. Dean was interviewed by Charles Stuart Kennedy beginning in January 2010.
Read this companion piece on redesigning the FSOT to increase diversity.
“You have to have the written exam. It is your best screen, and people who do well there are going to do well in the Foreign Service.”
DEAN: While my colleagues were strengthening bilateral relations, forging international agreements and sweltering in the heat of a Baghdad summer, I was leading the search for their successors. Not to take over today or tomorrow, as many candidates seem to think, but eventually.
Under the Foreign Service Act of 1980 and the Bureau’s Program Plan, the Board of Examiners (BEX) seeks “to strengthen and improve the Foreign Service of the United States by … assuring, in accordance with merit principles, admission through impartial and rigorous examination…”
What I was doing may have been in the weeds, but the goal was always clear: a vigorous, representative, and capable Foreign Service for the future. The first year it was basically overseeing the process of interviewing candidates, setting the standards, moving the people through the selection process….
When I became the Staff Director, we had five different written exams – one for each career track. I set the cut score for each written exam/career track to make sure that we were only bringing forward — that we were only feeding into the pipeline — as many people as we could reasonably expect to hire in the future….
One day then-Director General Staples decided that he wanted to do away with the written exam because of the negative impact on minority hiring. There was this sharp intake of breath by everybody — What are we going to do?
Fortunately, the Cox Foundation was willing to fund a study by McKinsey and Company on how the Department might change the selection process for hiring FS generalists, focusing specifically on the FSOT.…
McKinsey analysts came in September for about three months. McKinsey was chosen as a sole source contractor because they had done the War for Talent [the underpinning for Secretary [Colin] Powell’s effort to increase FS manpower] previously. McKinsey understood how the Foreign Service worked and how it differs from the Civil Service.
Three McKinsey analysts interviewed everybody.…They got the process all straightened out and then they said a couple of useful things, one of which was, if you can only have one test, one screening for choosing new Foreign Service officers, the written exam is the single best tool. You have to have the written exam. It is your best screen, and people who do well there are going to do well in the Foreign Service.
So how about the oral assessment, the all-day oral assessment? McKinsey’s conclusion: The Department’s oral assessment is the gold standard of interview processes. Really the Department is at the cutting edge; the Foreign Service sets a standard for anybody else wanting to conduct one of these kinds of screens.
So we were patting ourselves on the back, very nice, very nice. It took a little effort but the McKinsey analysts did convince the Director General that the Department should not get rid of the written exam and that he should keep this gold standard oral assessment.
“Your selection process is blindfolded, and that’s really stupid”
But then McKinsey said, “Oh, and by the way, one thing that you could do, however, is everything that you have been doing BUT add a ‘total candidate review.’ Your selection process is blindfolded, and that’s really stupid. No university would accept a student without knowing something about them; no international corporation would hire somebody without knowing a little bit about them; no non-profit would hire someone without knowing their educational background and work history.
“How can you hire people who are going to be directing and implementing U.S. foreign policy without knowing anything about them?”
You may remember back to the days of the women’s class action suit [the Alison Palmer suit] that in order to meet the legal requirements of the court order the Department decided to keep any personal information out of the hands of the assessors. For a long time assessors knew nothing about the background of the candidate. By the 2000s the system had relaxed enough that the two assessors who conducted the personal interview component of the oral assessment had some information, generally that disclosed by the candidate in their one-page Statement of Interest.
McKinsey then advised: “What you need is a ‘total candidate’ review that includes a review of the applicant’s work history and education, and you need to do it after the written test and before the oral assessment. Use that review process to decide who’s going to be invited to the oral assessment.”
We say we can do this, yes. And so now it was December, January, and we mapped a plan of how we want to do this. We estimate it is going to take about two years.
Now, it didn’t take us two years. The Director General unilaterally cancelled the annual written exam in April 2007 and said, “Install the new system as quickly as you can.” Facing the loss of an entire interview year, i.e., no new conditional offers, and a draining of the registers, we completed the process by September 2007, astonishing ourselves.
I’m good at putting teams together, and we pulled together a team. We created a completely new “total candidate” process; we ensured the software was built, created especially for this operation; we trained the assessors on how to be good panelists in reviewing files. We got that cranked up and working by September 2007. It was absolutely amazing.
The core of the team was the then-REE [Office of Recruitment, Examination and Employment] Director, Ambassador Marianne Myles, with Dick Christensen, Kerry Weiner, and me. Others helped, particularly the HR [Human Resources] software team who designed, tested and implemented all the supporting software, and the lawyers from the legal office who had responsibility for overseeing the Department’s compliance with the rulings from the Palmer class action suit.
The result was that we retained the written exam as the best cognitive test, but we renamed it the Foreign Service Officer Test (FSOT). We eliminated the separate written exams for each of the five career tracks [political, economic, public diplomacy, consular, and administrative], in part because we didn’t have time to create five written exams and in part because we did not have the funding to design and pilot test five separate exams. (Each question costs about $1000 from concept to test.)
We have one written exam. The legal basis, when you have one test form, is that you can only have one standard, one cut score. You cannot differentiate between political officers and econ officers by score. If there’s one test you have to have one cut score; you cannot change the cut scores for different groups.
We worked with our industrial psychologist to set the lowest cut score that we could have and still defend the FSOT as generating the quality we needed. The cut score for the overall T-score [used to tell individuals how far their score is from the mean] for the combined three elements (English Expression, Job Knowledge and the Biographic section) was 154.
The cut score for each of the individual components of the test was set at a T-score of 50 percentile, meaning that if half the applicants scored below 94 (on a 100-point scale) and half scored 95 and above, we would only invite the top 50%. Actually a candidate could score below 50 on an individual test component, but the combined score of the three multiple choice components has to be more than 154. Only those candidates who score 154 have their essay graded….
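The T-score arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the conventional definition of a T-score (mean 50, standard deviation 10); the interview does not say how the Department's psychologists actually computed it. The function names and sample numbers are hypothetical, and only the composite cut score of 154 comes from the text.

```python
from statistics import mean, pstdev

COMPOSITE_CUT = 154  # combined cut score quoted in the interview


def t_scores(raw_scores):
    """Convert raw scores to T-scores (assumed convention: mean 50, SD 10)."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)
    if sigma == 0:
        # Everyone scored the same, so everyone sits at the mean.
        return [50.0 for _ in raw_scores]
    return [50 + 10 * (x - mu) / sigma for x in raw_scores]


def passes_composite(component_t_scores, cut=COMPOSITE_CUT):
    """True if the three component T-scores sum to at least the cut.

    Assumption: a total exactly equal to the cut passes; the interview
    is ambiguous on that point.
    """
    return sum(component_t_scores) >= cut


# A candidate below 50 on one component can still clear the combined cut.
print(passes_composite([48, 55, 52]))
# Three exactly average components total 150, which falls short of 154.
print(passes_composite([50, 50, 50]))
```

Under these assumptions, a candidate who is exactly average on all three components does not reach the combined cut, which matches the point that the composite matters more than any single component.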
“We bring as many people forward from the FSOT as we can”
This second step is called the Qualifications Evaluation Process (QEP). In this section every applicant in a specific career track is rank ordered and those above the cut score are invited to the Oral Assessment (FSOA). Then the third and last step (the Oral Assessment) remains the same: the group exercise, the personal interview and the case management writing exercise. In addition we have targeted hiring programs, like the Pickering and Rangel programs, but these people must succeed in passing the Oral Assessment.
We bring as many people forward from the FSOT as we can. At the QEP we use different panels for each career track.
We know the algorithm of how many people we’re going to hire, how many people are not going to accept the job offer, how many people are not going to pass the suitability, medical and security clearances, how many people aren’t going to even complete their part of the pre-hiring process, how many people aren’t going to attend the oral assessment, what the pass rate is for their career track on the oral assessment, how many seats are reserved for Pickering and Rangel interns.
Using the algorithm we back up to how many people in that career track we need to have come from the QEP process. The variation between career tracks is very high.
In the management career track we may invite 60 percent of the candidates. For the political candidates, we have to factor in the Pickerings and the Rangels. They principally choose the political or public diplomacy career track.
This means that we have to subtract a certain number of positions for the Pickerings and Rangels in their desired career tracks because once Pickering and Rangel interns complete their graduate studies they go into the next A-100 [entry class].
Hence we may invite only 30 percent of the political candidates. So you have to be in the very top of the heap, very cream of the crop, to be chosen to be invited to the oral assessment if you’re a political officer but you can be very good, average even, in some of the other career tracks….
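The back-calculation Dean describes, working from a hiring target to the number of oral assessment invitations, can be sketched as follows. This is only an illustration of the reasoning, not the Department's actual model: the function name, the parameter names, and every rate in the example are hypothetical.

```python
import math


def invitations_needed(hires, reserved_seats, accept_rate, clearance_rate,
                       completion_rate, attend_rate, oral_pass_rate):
    """Estimate how many candidates in one career track to invite.

    Seats reserved for fellowship programs are subtracted first, then the
    remaining seats are divided by the fraction of invitees expected to
    survive every later stage. All rates are fractions between 0 and 1.
    """
    open_seats = hires - reserved_seats
    if open_seats <= 0:
        return 0
    # Fraction of invitees who attend, pass, finish paperwork, clear, and accept.
    yield_rate = (attend_rate * oral_pass_rate * completion_rate
                  * clearance_rate * accept_rate)
    if yield_rate <= 0:
        raise ValueError("every rate must be greater than zero")
    # Round up; the small tolerance guards against floating-point noise.
    return math.ceil(open_seats / yield_rate - 1e-9)


# Hypothetical track: 50 hires planned, 10 seats reserved for fellows.
needed = invitations_needed(hires=50, reserved_seats=10, accept_rate=0.8,
                            clearance_rate=0.5, completion_rate=1.0,
                            attend_rate=1.0, oral_pass_rate=0.5)
print(needed)
```

Dividing the result by the number of candidates in the track gives the invitation percentage, which is why a track with more reserved seats ends up with a lower share invited.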
Creating the QEP software generated a lot of messy problems. In each career track we have more candidates than one group of three can process in three weeks (the maximum time we can establish and still complete three cycles/year). This raises the question of how we can compare 300 political officers evaluated by Group A with the 300 evaluated by Group B, and the 300 evaluated by Group C. Even with close adherence to using the anchors to score, there is still some variation between groups.
To rationalize scores across different assessing groups we use a statistical process called T-scores …. This way we are always able to compare groups across the different assessing teams. Getting the new process in place was an amazing piece of work.
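One way to picture how T-scores make panels comparable is to standardize each panel's scores against that panel's own mean and spread. The sketch below assumes the conventional T-score formula (mean 50, standard deviation 10) applied within each group; the interview elides the details of the actual procedure, so the function name and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def standardize_by_panel(scores_by_panel):
    """Return {candidate: T-score}, computed separately within each panel."""
    result = {}
    for scores in scores_by_panel.values():
        values = list(scores.values())
        mu = mean(values)
        sigma = pstdev(values)
        for candidate, raw in scores.items():
            # A panel with no spread puts every candidate at the mean.
            z = 0.0 if sigma == 0 else (raw - mu) / sigma
            result[candidate] = 50 + 10 * z
    return result


# Hypothetical data: panel B scores everyone 10 points higher than panel A.
panels = {
    "A": {"a1": 60, "a2": 70, "a3": 80},
    "B": {"b1": 70, "b2": 80, "b3": 90},
}
t = standardize_by_panel(panels)
# The top candidate of each panel ends up with the same standardized score.
print(round(t["a3"], 2), round(t["b3"], 2))
```

After standardization the more generous panel no longer gives its candidates an advantage, so the two groups can be merged into a single rank order.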
We didn’t change the oral assessment at all; that was fine. The FSOA works. If people get a passing score of 5.25 they go through the whole clearance process. When we started this in September 2007, we were trying to do it five times a year. There were just not enough days in a year to do the process on a regular annual cycle. We could only run the complete cycle three times within a year. So now we conduct the FSOT in February, June and October. That’s the beginning for the cohort cycle.
The FSOT itself now is an entirely online process. We moved from a pen and paper test, a blue book situation, to a process where everything, including the application, is online; there is no paper generated any place along the way. When I think back about how much we did in as short a time as we did I’m still amazed.
“Our applications went from about 2,500 for each test window to about 8,000 applications per window”
Initially we had required the application and what we call the Personal Narrative questions at the application period. We discovered that the volume of information and effort required was a barrier to application. We moved the Personal Narrative questions out of the first stage of the application.
We decided to ask for the mini-personal narratives only after the applicant had some buy-in. We thought that once people had passed the FSOT, they would be more inclined to submit their personal essays for the second stage, the QEP. Those who did not pass would not have to write the essays.
What we discovered is that our applications went from about 2,500 for each test window to about 8,000 applications per window. The numbers ramped up quite a bit, probably due to a push-pull causation since the economy was also declining and people were losing their jobs.
Of the 8,000 we brought about 40 percent or 3,200 people to a total candidate review (QEP). Then we evaluated all those people and decided which ones were going to be invited to an oral assessment. And, as I said, we set different pass rates there.
We got everything working fine and then we rather suddenly got this tripling of applications. The flood of applications was a bit of a problem because management decided that in addition to increased minority hiring, we should focus on “just-in-time hiring.”
The process for selecting each candidate who goes on a register is extremely expensive, for the Department and for the candidate, especially if he or she is overseas, as many of our candidates are. We don’t want people dying on the register. We want to tighten the selection mechanism. We can do this through the QEP but there is a lag of 6-9 months before there are results.
Just about the time that we saw the results of our “just-in-time” processing, management changed its needs. Management wanted to increase the hiring threefold. Secretary [Hillary] Clinton was successful in getting additional FS positions.
When you increase your hiring threefold after a period of austerity, there are not enough people on the hiring registers to meet the new need. Opening the testing process three times a year gives BEX greater flexibility than the previous once-a-year test, but it is still like turning an aircraft carrier at sea. Change is going to take time.
After about six months, the QEP spigot was producing candidates. The problem now is CDA’s [Career Development and Assignments] as it tries to find jobs for all the new people, and FSI’s [the Foreign Service Institute] as they try to find space and teachers for all the new hires…. There are not enough jobs. For the first time in a long time [CDA is] having to assign people to one-year tours in Washington because they don’t have enough jobs overseas. The bureaus are creating jobs as fast as they can.
But applications to create new positions overseas have to go through the International Cooperative Administrative Support Services (ICASS) Council at post. A bureau cannot just create a job; rather the bureau has got to obtain the approval of every participating agency at post. All interagency cooperation is expressed through the ICASS Council because the other agencies pay some of the supporting costs of running the Mission. That whole process takes time. We got the new generalist [FSO] selection process all up and running smoothly….
Redesigning the exam for Specialists
When we finished with the generalists we turned to doing the same thing with the specialists. The specialist hiring process had always been an application process and if the applicant had the credentials the vacancy announcement required, 100 percent of those applicants were invited to a specialist oral assessment.
In building the new system we had to realize that we don’t want to invite 200 human resources candidates to an interview when we’re only going to hire seven people. So what we did was link the requirements in the vacancy announcement to the specific specialist career track job analysis; that was done about two years ago (c.2009).
We outlined what applicants actually need to be able to do in the vacancy announcement. Then we created a Qualifications Evaluation process for screening and evaluating the specialists. The new process strengthens the parallels between the generalist and specialist selection processes.
We started with the Office Management Specialists (OMS) because they are one of the largest specialist groups [roughly 25% of all specialist applications, excluding Diplomatic Security applications, which have a different process]. We moved to eliminate the paper application process and just use an online application. We wanted to have only one application system.
Once we close a vacancy announcement and screen for those applicants who do not meet the minimum qualifications, we send the file electronically to three subject matter experts (SMEs). With the new software system, the SMEs can be here in Washington or they can be abroad. The SMEs use the standards that are linked to the promotion precepts.
The three SMEs give the file a numerical score from 60 to 100. Initially it was one to 100; however, we decided that passing the minimum qualifications was worth 60 points. So now the SMEs give a score based on the anchors for that precept of 60-100; the three scores are averaged to reach a final score.
Even though 80 is considered a passing score, we might decide to interview only those scoring above 96 or 93 because we know that we’re not going to hire that many people. Why should these people pay to come to Washington and spend a day on an interview if we are not serious about hiring many people? If we are only going to hire 10 General Service Officers (GSOs), we do not need to interview 200.
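The specialist scoring described above, three subject matter expert scores between 60 and 100 that are averaged and then compared with an interview cutoff, can be sketched like this. It is a minimal illustration only: the function names and the sample files are hypothetical, and only the 60 to 100 range, the averaging of three scores, and the example cutoffs come from the text.

```python
def final_score(sme_scores):
    """Average the three SME scores for one file (each between 60 and 100)."""
    if len(sme_scores) != 3:
        raise ValueError("expected exactly three SME scores")
    if any(s < 60 or s > 100 for s in sme_scores):
        raise ValueError("SME scores must fall between 60 and 100")
    return sum(sme_scores) / 3


def invited(files, cutoff):
    """Return the applicants whose averaged score is above the cutoff."""
    averages = {name: final_score(scores) for name, scores in files.items()}
    ranked = sorted(averages, key=averages.get, reverse=True)
    return [name for name in ranked if averages[name] > cutoff]


# Hypothetical files: applicant name mapped to the three SME scores.
files = {
    "x": (95, 97, 99),
    "y": (80, 85, 90),
    "z": (60, 70, 80),
}
# Raising the cutoff well above the passing score shrinks the interview pool.
print(invited(files, cutoff=96))
print(invited(files, cutoff=80))
```

Choosing the cutoff from the expected number of hires, rather than from the passing score alone, is what keeps the interview pool proportionate to the vacancies.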
Once we finished with the OMS, we moved on through all the 15 specialties, except the four Diplomatic Security ones. The way the new process works is the oral assessment consists of three sections: one is an on-line multiple-choice computer test or competency test.
Generally there are some situational judgment questions and some job knowledge questions. The candidate is asked to write a two-page essay (or several mini essays) addressing (a) management problem(s) in their area of expertise for about an hour. So for the IT people it’s an IT management kind of problem.
The last element is a 70-plus-minute interview. Previously, if you were an Office Management Specialist (OMS), you would also have had to take a proofreading test of six minutes. This test did not produce candidates who could meet the same high standards that we needed. Now we use the essay as a grammar/spelling check.
The online competency test is where the candidates come into the test center, sign on to a computer and go through 70 questions that are basically technical in nature. Then finally they do the structured interview, which instead of being 45 minutes will be more like an hour to an hour and a half.
The interview has the same format that the generalist processing has, which is three parts: the motivation and experience kinds of questions, the hypothetical kinds of questions and also past behavior questions which are essentially tell us about a time when you had to deal with problem X, Y or Z. And again, these are all related back to the promotion precepts and the dimensions of the precepts….
“The whole idea of preparing, being persistent in pursuing their dream of joining the Foreign Service, there’s a lot of value in that. I think that’s a worthy goal.”
Q: You’ve been serving for a long time, looking at people that you’ve dealt with in the Foreign Service, what do you think about the incoming new officers?…
DEAN: The system is too new to know how new people are advancing through the system. Are we making better choices? Does being able to look at the total file, with its educational information and work experience, make a positive difference? We think so.
When candidates passed directly from the written exam to the oral assessment, the pass rate on the oral assessment was about 18 to 20 percent. Now, with the additional filter, the pass rate is about 40%. We are sending forward the most competitive candidates. We talk about the elephant and the blind men, where the blind men each have a little picture in their mind of what an elephant looks like, based on their feeling a different part of the elephant.
With the QEP, we have got one more snapshot to look at to get that whole picture of the elephant. We think we’re weeding out a lot of people who wouldn’t have been competitive and only sending forward the most competitive applicants.
In the personal narratives in the QEP, you get some sense of the person, although maybe not an appreciation of any of their expeditionary skills that the Department keeps talking about.
You might see whether a person is a desk officer, a person who sits behind his or her desk, or whether the person is more a street officer, inclined to get out and mix it up more. You get a livelier sense of who that person is as a personality. So it is possible that we’re bringing in more engaged, more proactive officers but you’re not going to see that for another half a generation, until these people start becoming supervisors….
You have to have persistent people. I read the Yahoo accounts on trying to enter the Foreign Service. There is a Yahoo group for every stage of the Foreign Service selection process and I subscribe to them all; I must read hundreds of emails every day about people working on the process.
Some of these people have gone through the testing process more than once, some more than twice, but the whole idea of preparing, working hard in preparing, being persistent in pursuing their dream of joining the Foreign Service, there’s a lot of value in that. I think that’s a worthy goal.
If the U.S. interest is to do X, Y or Z, you need to persist until you achieve that goal. I like to think that in my 11 years with the Board of Examiners I have had a forceful impact on the shape of the Foreign Service over the next 20 years. No Foreign Service employee enters the Service today except that he or she passes through a process with my imprint.