
Sampling and Quality Assurance

Herbert L. Roitblat, Ph.D., Principal, OrcaTec LLC

In two recent cases, courts have provided strong guidance on the need for transparency, sampling, and quality assurance in eDiscovery. In February, Judge Facciola ruled in United States v. O’Keefe, No. 06-249 (D.D.C. Feb. 18, 2008), that disagreements over the adequacy of search terms, like those over the adequacy of discovery in general, would have to be argued on a factual basis.

The government accused Michael John O’Keefe, Sr., a consular official in Toronto, of accepting gifts as a quid pro quo for expediting visa requests.

Among their complaints about the results of the government’s discovery process, the defendants argued that “there are inexplicable deficiencies in the government’s production of electronically stored information.” Judge Facciola ruled that “vague notions that there should have been more than what was produced are speculative and are an insufficient premise for judicial action.” The defendants also complained that the query terms the government used, “early or expedite* or appointment or early & interview or expedite* & interview,” were inadequate. Here, too, Judge Facciola ruled that such speculation would have to be supported by evidence, writing:

Whether search terms or “keywords” will yield the information sought is a complicated question involving the interplay, at least, of the sciences of computer technology, statistics and linguistics…. Given this complexity, for lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread. This topic is clearly beyond the ken of a layman and requires that any such conclusion be based on evidence that, for example, meets the criteria of Rule 702 of the Federal Rules of Evidence.

It may be a complicated question whether search terms will yield the responsive or privileged documents, but it is much easier to determine after the fact whether they actually did. The best and most principled evidence that a search has been effective comes from sampling its results to measure how well the process performed. This is the view expressed in the second case.
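As a rough illustration of why such sampling is tractable, the textbook normal-approximation formula for estimating a proportion gives the number of documents to review for a chosen margin of error and confidence level. This is a minimal sketch assuming simple random sampling from the collection; `sample_size` is a hypothetical helper, and the formula is standard survey statistics rather than anything the courts or the author prescribe.

```python
import math
from statistics import NormalDist


def sample_size(population: int, margin: float = 0.05,
                confidence: float = 0.95, p: float = 0.5) -> int:
    """Hypothetical helper: documents to review so that an estimated
    proportion lies within +/- margin of the true value at the given
    confidence level.

    Assumes simple random sampling. Uses the normal approximation with a
    finite population correction; p = 0.5 is the most conservative
    (largest-sample) assumption about the true proportion.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z * z * p * (1 - p) / (margin * margin)
    # Finite population correction: smaller collections need fewer samples.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)
```

Under these assumptions, a 5% margin at 95% confidence calls for 383 documents from a collection of 100,000 and 278 from a collection of 1,000, so the required sample grows only slowly with collection size. A tighter margin, and thus a larger sample, is needed when the rate being measured is small, such as the share of privileged documents a screen missed.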

Victor Stanley, Inc. is suing Creative Pipe, Inc. for allegedly stealing its product designs. In the course of this case, Creative Pipe produced a number of electronic documents, among them 165 documents that the defendants later claimed were privileged and had been inadvertently produced. On May 29, Judge Grimm ruled that the documents had been voluntarily produced and that privilege was therefore waived.

Judge Grimm determined that the defendants had not made a reasonable effort to identify these privileged documents before producing them to the plaintiff. As evidence that their efforts were reasonable, the defendants stated that they had given their forensics expert a set of about 70 keywords with which to search for potentially privileged documents. The documents retrieved with these keywords were then examined by an attorney or by one of the parties. Additional documents, which the forensics expert said were not searchable, were reviewed only by title, because the defendants contended that any further review would have been impractical and burdensome. The defendants provided no information about whether the 165 documents had been flagged by the keyword search or came from the part of the collection that was not keyword searchable.

Judge Grimm noted that:

[T]he Defendants are regrettably vague in their description of the seventy keywords used for the text-searchable ESI privilege review, how they were developed, how the search was conducted, and what quality controls were employed to assess their reliability and accuracy…. nothing is known from the affidavits provided to the court regarding their qualifications for designing a search and information retrieval strategy that could be expected to produce an effective and reliable privilege review. …

The only prudent way to test the reliability of the keyword search is to perform some appropriate sampling of the documents determined to be privileged and those determined not to be in order to arrive at a comfort level that the categories are neither over-inclusive nor under-inclusive. There is no evidence on the record that the Defendants did so in this case.

The plaintiff asserted that the supposedly unsearchable files were, in fact, searchable, but that they consisted largely of image files and related materials that are not typically found to be privileged. Moreover, the 165 documents at issue were all text and therefore readily searchable. The plaintiff thus argued that the defendants had failed to take the reasonable steps that would have allowed them to identify the privileged documents, such as searching PDF files or running OCR on nonsearchable PDFs.

Judge Grimm was not persuaded that the burden of the privilege review prevented the defendants from doing a more thorough job. He concluded by noting that:

Use of search and information retrieval methodology, for the purpose of identifying and withholding privileged or work-product protected information from production, requires the utmost care in selecting methodology that is appropriate for the task because the consequence of failing to do so, as in this case, may be the disclosure of privileged/protected information to an adverse party, resulting in a determination by the court that the privilege/protection has been waived. Selection of the appropriate search and information retrieval technique requires careful advance planning by persons qualified to design effective search methodology. The implementation of the methodology selected should be tested for quality assurance; and the party selecting the methodology must be prepared to explain the rationale for the method chosen to the court, demonstrate that it is appropriate for the task, and show that it was properly implemented.

Taken together with other recent rulings, these cases make it fairly clear that eDiscovery practice needs to be more transparent and empirical. Is information being properly classified? How can you demonstrate that you have done a reasonable job of identifying documents for privilege or responsiveness? Judge Grimm also suggests the solution to this problem: use sampling to examine both the documents that have been accepted and those that have been rejected. This approach need not be expensive, but it must be done carefully to be of value.
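The two-sided check Judge Grimm describes, sampling both what a privilege screen accepted and what it rejected to judge whether it is over-inclusive or under-inclusive, can be sketched in Python. This is a minimal illustration under stated assumptions, not OrcaTec’s method: `audit`, `wilson`, `Estimate`, and `is_privileged` are hypothetical names, `is_privileged` stands in for an attorney’s judgment on each sampled document, and the Wilson score interval is one standard choice for putting confidence bounds on an estimated proportion.

```python
import math
import random
from dataclasses import dataclass
from statistics import NormalDist
from typing import Callable, Sequence, TypeVar

Doc = TypeVar("Doc")


@dataclass
class Estimate:
    rate: float  # observed error rate in the sample
    low: float   # lower bound of the Wilson score interval
    high: float  # upper bound of the Wilson score interval
    n: int       # number of sampled documents reviewed


def wilson(hits: int, n: int, confidence: float = 0.95) -> Estimate:
    """Point estimate and Wilson score interval for the proportion hits/n."""
    if n <= 0:
        raise ValueError("sample must contain at least one document")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return Estimate(p, max(0.0, center - half), min(1.0, center + half), n)


def audit(accepted: Sequence[Doc], rejected: Sequence[Doc],
          is_privileged: Callable[[Doc], bool], n_each: int,
          seed: int = 0) -> dict[str, Estimate]:
    """Sample both sides of a privilege screen and estimate its error rates.

    over_inclusion:  share of flagged documents judged NOT privileged
    under_inclusion: share of unflagged documents judged privileged
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    acc = rng.sample(list(accepted), min(n_each, len(accepted)))
    rej = rng.sample(list(rejected), min(n_each, len(rejected)))
    return {
        "over_inclusion": wilson(sum(not is_privileged(d) for d in acc), len(acc)),
        "under_inclusion": wilson(sum(is_privileged(d) for d in rej), len(rej)),
    }
```

The upper bound of `under_inclusion` is the figure most relevant to a waiver dispute like Victor Stanley: if it is still uncomfortably high, the screen may have let privileged material through, and the sample itself documents the quality control a court might ask about.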

Contact OrcaTec for more information.

About the Author:
Herbert L. Roitblat, Ph.D. is a Principal and co-founder of OrcaTec LLC, which provides consulting and software for electronic discovery, intelligence analysis, and knowledge management. Before starting OrcaTec, Dr. Roitblat was Chief Scientist and a co-founder of DolphinSearch. He is Chairman and co-founder of the Electronic Discovery Institute, a member of the Sedona working group on Electronic Document Retention and Production, and a member of the Advisory Panel for the Georgetown Advanced E-Discovery Institute. Dr. Roitblat has long been a thought leader in electronic discovery, writing extensively about the problems of dealing with massive amounts of electronically stored information and the emerging standards for dealing with those problems, such as sampling and quality control.
