Case summary from Descrybe.ai.
Gulino v. Board of Education
Citations: 113 F. Supp. 3d 663; 2015 U.S. Dist. LEXIS 73136; 2015 WL 3536694
Docket: No. 96-CV-8414 (KMW)
Court: District Court, S.D. New York; June 5, 2015; Federal District Court
From 1993 to 2012, New York City's Board of Education (BOE) required applicants for public school teaching positions to pass the Liberal Arts and Sciences Test (LAST), which existed in two versions: the LAST-1 (1993-2004) and the LAST-2 (2004-2012). Both exams assessed knowledge of the liberal arts and sciences, not teaching skills or subject mastery. In a previous ruling, the Court determined that the LAST-1 discriminated against African-American and Latino applicants in violation of Title VII of the Civil Rights Act, because these groups passed at significantly lower rates. Title VII allows plaintiffs to prove discrimination through evidence of disparate impact, a showing the plaintiffs made in 2003. The BOE failed to show that the LAST-1 had been validated as job related, because the exam's development lacked sufficient procedures for defining the liberal arts knowledge necessary for competent teaching.
In 2012, the Court appointed Dr. James Outtz to assess whether the LAST-2 also disproportionately affected African-American and Latino test takers and whether it had been validated as job related. Although the BOE's expert, Dr. Chad Buckendahl, maintained that the LAST-2 was properly validated, Dr. Outtz found that it had a disparate impact on minority applicants and was not properly validated. The Court concluded that, like the LAST-1, the LAST-2 unfairly discriminated against African-American and Latino applicants under Title VII, because its developers failed to identify the specific knowledge areas required for competent teaching.
The Court acknowledges the potential benefits of testing applicants' knowledge of the liberal arts and sciences through a validated exam, but emphasizes that validation must be established through proper procedures rather than assumptions. Both the LAST-1 and the LAST-2 failed in this regard, making them indefensible under Title VII.
The New York State Education Department (SED) mandates that only certified teachers may be hired; noncompliance would put at risk $7.5 billion in state funding for New York City. The LAST-1, developed by National Evaluation Systems (NES), was introduced in 1993 and covered various subjects, including science and the humanities. It was replaced by the LAST-2 in 2004. Prospective teachers also had to pass the Assessment of Teaching Skills, Written (ATS-W), and a Content Specialty Test (CST). Each candidate had to pass all three exams; a high score on one could not offset a failing score on another.
As to the procedural history, African-American and Latino applicants alleged that the LAST-1 disproportionately affected them and was not job related. In 2003, Judge Constance Baker Motley found that the plaintiffs had established a prima facie case of disparate impact but upheld the BOE's requirement, concluding that the LAST-1 was job related and therefore not unlawfully discriminatory. The Second Circuit affirmed in part and reversed in part, concluding that Judge Motley had failed to apply the correct job-relatedness standard from Guardians Association. On remand, the court ruled that the LAST-1 was not job related because it had not been properly validated, and that requiring it therefore violated Title VII.
By that time, the SED had replaced the LAST-1 with the LAST-2. The court, exercising its authority under Title VII, sought to ensure that the LAST-2 was not discriminatory. The court-appointed expert, Dr. Outtz, reported that the LAST-2 had a disparate impact on African-American and Latino test takers and lacked proper validation. The BOE submitted Dr. Buckendahl's report, which did not address disparate impact but claimed that the LAST-2 was properly validated. Both experts were questioned at a hearing about the exam's validity.
Under Title VII, a plaintiff establishes a prima facie case of discrimination by showing that an exam has a disparate impact on minority candidates. The defendant can rebut that case by proving that the exam is job related, which requires proper validation: a demonstration that the exam correlates significantly with job-related behaviors. A five-part test from Guardians determines whether an exam was properly validated and emphasizes the quality of the test's development.
Validation is a specialized field that typically lies outside the courts' expertise. In deciding whether an employment exam was validated, courts must therefore draw on the expertise of test validation professionals, relying primarily on expert testimony and the Equal Employment Opportunity Commission's Uniform Guidelines on Employee Selection Procedures (Guidelines). Although the Guidelines are not legally binding, the Supreme Court affords them significant deference because they represent the administrative interpretation of Title VII. The Second Circuit likewise treats the Guidelines as the main standard for validating tests like the LAST-2, given their established use.
The Guidelines outline two key validation methods: content validation and construct validation. Content validation suits tests that measure specific job-related knowledge, skills, or abilities (KSAs), while construct validation addresses broader mental traits or processes, such as intelligence or judgment. The distinction is crucial. Content validation requires showing that the test content reflects important aspects of job performance, which can be linked fairly directly to job tasks. Construct validation requires extensive empirical evidence that the test predicts job performance, which is often difficult to assemble and can lead to invalidation of the test. The distinction therefore plays a significant role in litigation outcomes, because content validation is typically achievable while the requirements of construct validation are often insurmountable.
Although the Guidelines draw a strict line between these two types of validation, they have been criticized as overly rigid. The court observed that "content" and "constructs" lie on a continuum of abilities that bear on a person's capacity to perform various tasks. General KSAs, such as general intelligence, apply broadly across jobs, while job-specific KSAs apply narrowly, as with the unique skills required of a major league baseball player. Guardians established that construct validation is not always necessary for tests measuring general abilities; rigorous standards are nonetheless required when a test measures general qualities, like intelligence, that are not specifically relevant to the job. Content validation suffices if the test measures the most observable and significant abilities for the job in question, but the need for validation increases as the tested abilities become more abstract. The court therefore adopts a sliding scale: the more general the abilities tested, the more thorough the validation process must be.
In 1988, a New York State task force recommended that teachers possess a fundamental understanding of the liberal arts, which led to the requirement that candidates pass a liberal arts exam for certification. The SED began implementing this requirement in 1990 by contracting with NES to develop the LAST-1, which was first administered in 1993. Following new regulations from the Board of Regents, the SED redeveloped the exam between 2000 and 2004. The result was the LAST-2, first administered on February 14, 2004.
Development of the LAST-2 began with a test framework detailing the exam's structure and content. The framework comprised five subareas: Scientific, Mathematical and Technical Processes; Historical and Social Scientific Awareness; Artistic Expression and the Humanities; Communication and Research Skills; and Written Analysis and Expression.
NES established "objectives" for the LAST-2, which Ms. Clayton described as broad statements of the knowledge and skills essential for public school teachers in New York State. Each objective, such as the use of mathematical reasoning in problem-solving, is elaborated by "focus statements" that outline the specific content areas covered on the test. NES developed the LAST-2 framework by revising the LAST-1 framework and consulting documents on liberal arts and science course requirements at New York State colleges. Two committees reviewed the framework: the Bias Review Committee (BRC), which assessed it for potential bias and fairness, and the Content Advisory Committee (CAC), which evaluated the accuracy and appropriateness of its content.
NES then conducted two surveys to gauge the relevance of the objectives to the teaching profession. The first went to 500 certified public school teachers and achieved a 64% response rate; the second went to 181 faculty members and achieved a 25% response rate. The second survey drew no responses from African-American faculty and very few from Latino faculty. Respondents rated every objective as having at least some importance, and rated many as having "great importance." The small sample sizes and the lack of diversity among respondents, however, raise concerns about representativeness.
After the SED approved the framework, NES began item development for the LAST-2, drafting and refining test questions, some of which came from the existing LAST-1 item bank. Ms. Clayton noted that LAST-1 questions received preliminary designations for continued use, revision, or deletion based on their relevance to the new framework.
Newly drafted questions were reviewed by both the BRC and the CAC, while LAST-1 questions slated for continued use were reviewed only by the CAC, for alignment with the revised objectives and for job-relatedness. Both new and revised questions then underwent pilot testing: some were included as non-scorable items on LAST-1 exams, and others were administered to volunteers for independent analysis. The BRC and the CAC also reviewed the pilot testing results. A Passing Score Review Panel, composed of New York educators, was established to assist the New York Commissioner of Education in setting the passing score for the LAST-2. The Panel estimated what a minimally competent candidate would score on the test, a process consistent with standard practice for setting passing scores.
The Court found that Plaintiffs established a prima facie case of discrimination by showing that the exam results in a disparate impact based on race, color, religion, sex, or national origin. The BOE did not rebut this showing, because it failed to demonstrate that the LAST-2 was properly validated under the critical exam validation factors. To establish a prima facie case, a party must identify a policy, demonstrate a disparity, and establish a causal link between the two. The "80% rule" may be used to indicate adverse impact: if the selection rate for any group is less than 80% of the rate for the group with the highest rate, a disparate impact is generally inferred.
Plaintiffs met these requirements through Dr. Outtz's report, which identifies the policy at issue: the BOE's requirement, mandated by the SED, that prospective teachers pass the LAST-2. Dr. Outtz found that the pass rates for African-American and Latino applicants ranged from 54% to 75% of the pass rates for Caucasian applicants, demonstrating a disparate impact.
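The 80% rule described above reduces to a single ratio, which the following minimal Python sketch illustrates. The function names and the sample pass rates are hypothetical and chosen only for illustration; the summary reports only the resulting ratios (54% to 75%), not the underlying pass rates.

```python
def impact_ratio(group_rate: float, reference_rate: float) -> float:
    """Ratio of a group's pass rate to the highest group's pass rate."""
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return group_rate / reference_rate


def shows_adverse_impact(ratio: float, threshold: float = 0.80) -> bool:
    """Apply the 80% rule: a ratio below the threshold indicates adverse impact."""
    return ratio < threshold


# Hypothetical pass rates (NOT from the opinion), for illustration only.
hypothetical_ratio = impact_ratio(group_rate=0.60, reference_rate=0.90)
print(f"hypothetical ratio: {hypothetical_ratio:.2f}")

# The ratios reported in the summary both fall below the 80% threshold.
for reported in (0.54, 0.75):
    print(f"reported ratio {reported:.2f}: adverse impact = "
          f"{shows_adverse_impact(reported)}")
```

Under this sketch, any ratio in the reported 54% to 75% range is flagged, which matches the inference of disparate impact described above. The rule is a rule of thumb for inferring impact, not a complete statistical test.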
The SED disputes Dr. Outtz's adverse-impact findings, suggesting that he should have considered candidates' best attempts rather than their first attempts. Dr. Outtz responds that first attempts are the correct metric: a pass on a later attempt does not undo the harm of an initial failure, which can disadvantage candidates in seniority and promotion. The Court agrees with Dr. Outtz that adverse impact should be assessed on first attempts. The Court therefore finds that Plaintiffs have established a prima facie case of disparate impact, shifting the burden to the test's proponent to show that the LAST-2 is job related and consistent with business necessity. The BOE has failed to make that showing.
To establish job-relatedness, the proponent must satisfy five criteria set out in the relevant case law. The analysis here centers on the adequacy of the job analysis conducted by NES, which used a content validation methodology. The Court notes that although the LAST-2 nominally covers various academic subjects, its questions primarily evaluate general abilities, such as reading comprehension and problem-solving, rather than specific content knowledge. The Court will therefore examine the content validity of the LAST-2 rigorously, starting with NES's job analysis.
NES's job analysis was inadequate because it did not identify the job tasks essential to effective performance. A job analysis should assess key work behaviors and their importance, so that the exam adequately tests the knowledge, skills, and abilities (KSAs) needed for the job. Prior case law indicates that a proper job analysis involves identifying job tasks, surveying the importance of those tasks, and defining the competency level required for each skill. Dr. Outtz's report highlighted that NES failed to identify any job tasks, which left it unable to assess task importance or required competency levels.
Instead, NES based its LAST examinations on a liberal arts framework, influenced by the 1988 report recommending such a requirement, rather than on the tasks New York teachers actually perform. The resulting job analysis was deficient: NES relied solely on educational documents to create a test framework and did not investigate teachers' specific job responsibilities. NES's approach to determining the KSAs that teachers need is fundamentally flawed because it begins with an unproven assumption that specific liberal arts and science knowledge is important. NES did not conduct an open-ended investigation into the actual job tasks of successful teachers in New York, which is essential for validating the relevance of the identified KSAs. The Guidelines stipulate that a valid test must be grounded in data that reflect essential aspects of job performance.
Dr. Buckendahl, the BOE's expert, argues that NES's failure to identify job tasks does not invalidate its analysis, citing a survey in which teachers affirmed the importance of the identified KSAs. The court finds this argument unconvincing given the stringent validation requirements for tests, like the LAST-2, that assess broad abilities. NES assumed the significance of certain KSAs without empirical evidence and asked teachers only to rank those pre-selected KSAs, ignoring any potentially critical KSAs omitted from the survey. A valid job analysis must assess the relative importance of the identified work behaviors. NES's failure to link KSAs to actual teaching tasks undermines confidence in its choice of what to measure. For example, if NES surveyed teachers about reading comprehension and logical reasoning but left out leadership, the survey could overstate the importance of the first two, because leadership might in fact matter more.
A proper method would determine job tasks first, which would lead to a more accurate assessment of the KSAs essential to effective teaching. NES's survey therefore did not succeed in identifying the most important KSAs for the teaching profession. Surveys can validate the findings of a job task investigation or assess the importance of known job tasks, but they are inadequate for identifying KSAs in the first instance.
Dr. Buckendahl's reliance on NES's educator survey is further undermined by the survey's insufficient sample size and its lack of appropriate subgroup representation. NES failed to ensure that key subgroups, such as kindergarten, special education, African-American, and Latino teachers, were adequately represented, which biased the assessment of the LAST-2. The SED contends that the survey demographics were appropriate, noting that 10% of respondents were African-American or Latino, in line with their share of the overall New York State teacher population. The actual number of responses, however (only 24 from African-American teachers and 10 from Latino teachers), was too low to support statistically significant conclusions about differences in perception between minority and majority teachers. Case law supports the need for a larger, more representative sample to obtain reliable results. NES should have considered oversampling underrepresented groups to ensure meaningful representation.
Sheldon Zedeck emphasizes the necessity of representing diverse perspectives, including those of minority groups and women, in employment strategies. He critiques common misconceptions about how to determine sample size, asserting that the variation in the data matters more than the size of the population. In addressing NES's failure to conduct a lawful job analysis for employment exams, he outlines a structured approach for identifying essential job tasks for New York public school teachers.
This includes conducting teacher interviews, observing educators in their roles, and gathering input through open-ended surveys. He argues that relying solely on educational curricula is inadequate for identifying job tasks and that the data must come directly from teachers. NES should then analyze these tasks to determine the KSAs required for effective teaching, and should document how those KSAs relate to the identified job tasks; that documentation forms the basis for the test framework. Because all teachers in New York must be licensed, NES faces the challenge of accurately defining tasks that vary across teaching levels. In addition, the LAST-2 must assess abilities not covered by the other required exams, the ATS-W and the CST, so that licensing candidates are evaluated comprehensively.
NES's inadequate job analysis undermined the entire validation process for the LAST-2, which consequently failed to meet the remaining Guardians criteria of reasonable competence, content relatedness, representativeness, and scoring validity. A proper job analysis is the foundation of the validation procedure. Because NES neglected to identify job tasks, it is impossible to evaluate whether the exam content was relevant to the job of teaching or whether its scoring effectively distinguished competent candidates. The LAST-2 was therefore deemed unvalidated and not job related, and the Court concluded that the BOE violated Title VII by requiring the exam for permanent teaching licenses. The Court ordered the parties to submit a joint status letter by June 29, 2015, outlining the further steps required under this ruling.
The case has a history of prior assignments and rulings, as detailed in earlier opinions. NES was acquired by NCS Pearson, Inc. in April 2006, marking Pearson's entry into the teacher certification market. In Gulino v. Bd. of Educ. of the City Sch. Dist. of the City of N.Y., the BOE was found subject to liability under Title VII even though it was following state requirements for teacher certification in using the LAST. The Second Circuit clarified that Title VII preempts conflicting state laws, so compliance with state mandates does not shield the BOE from liability. The LAST-2 had not yet been implemented at the time of Judge Motley's initial ruling, which addressed the earlier LAST-1. Although the plaintiffs initially sued both the SED and the BOE, the SED was later dismissed from the case, so the procedural history here focuses on the BOE.
The court distinguishes between the relevant sections of the Guidelines for clarity, noting that both types of validation concern tests of abilities and skills and that construct validation serves to prevent tests from perpetuating discrimination. Guardians emphasized the need for flexibility in validating exams that assess general abilities, rejecting requirements so stringent that they would render all such tests invalid under Title VII. On the choice between content and construct validation, most employment tests should be assessed under content validation, because only a few exams measure abilities so broadly applicable that construct validation is warranted. Guardians uses the term “abstract,” but the court prefers “general” to describe abilities closer to the construct end of the spectrum. The court emphasizes that tests must be validated in a way that avoids cultural disadvantages.
The court regards the expert reports of Dr. Outtz and Jeanne Clayton as credible. Ms. Clayton asserts that NES consulted various educational materials for job definitions relating to New York State teachers, but the court finds no evidence beyond her statement to substantiate that claim. Dr. Outtz criticized the selection of BRC members, who lacked clear expertise for the assessments they made. The court acknowledges Dr. Outtz's reliability as a neutral expert and gives his conclusions considerable weight. As recognized in United States v. Mosley, a court-appointed expert is considered more impartial than a party-appointed expert: court-appointed experts are expected to provide unbiased opinions based on a thorough consideration of the evidence, whereas the objectivity of party experts may be compromised. The substantive complexity of test validation makes reliance on professionals in the field necessary.
The Commission's report contains no evidence of a job analysis addressing the tasks New York teachers perform. Dr. Outtz and Dr. Buckendahl also differ significantly in methodology: Dr. Outtz relies on the Guidelines, while Dr. Buckendahl cites the Standards for Educational and Psychological Testing, which are not endorsed by the executive branch, and this weakens Dr. Buckendahl's conclusions. Experts emphasize that sample sizes must account for the perspectives of various subgroups, with larger samples needed as the number of variables grows. The SED acknowledged that its survey database did not include race or ethnicity data, which compromised its ability to ensure that its sample was demographically appropriate. Future surveys must either draw on such demographic data or supplement the initial survey to achieve proper representation.
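To give a rough sense of why subgroup counts as small as those reported for the educator survey (24 African-American and 10 Latino respondents) support only imprecise estimates, the Python sketch below computes an approximate 95% margin of error for a proportion. This illustration is not part of the court's analysis. It assumes simple random sampling, the worst-case proportion p = 0.5, and the normal approximation, which is itself crude at such small sample sizes; the function name is hypothetical.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion estimated from n responses.

    Assumes simple random sampling and the normal approximation;
    p = 0.5 is the worst case, and z = 1.96 corresponds to ~95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1 - p) / n)


# Subgroup response counts reported in the summary.
for label, n in (("African-American respondents", 24),
                 ("Latino respondents", 10)):
    print(f"{label}: n={n}, margin of error about +/-{margin_of_error(n):.0%}")
```

Under these assumptions the margins come out near 20 and 31 percentage points, respectively, which is consistent with the concern that such subgroups were too small to reveal reliable differences in perception. Note that the formula depends on the variability of the responses and the sample size, not on the size of the overall teacher population.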