Mindseye - Make Your Keywords Work Smarter, Not Harder

Published on June 2016

Getting the right set of keywords is important for maximizing the efficiency of the discovery process and ensuring a high-quality production set. And while you want to develop the optimal set of keywords up front, new technologies are emerging that let you use keywords iteratively throughout the process to refine, improve, and even expedite discovery. This article discusses how keywords can be removed, expanded, stemmed, and modified iteratively to better manage data volumes and improve the overall quality of the final production set.


FOCUSED DISCOVERY

Making Your Keywords Work Smarter, Not Harder
By Jeffery C. Fehrman, Chief Strategy Officer of Mindseye

Have you ever seen A&E’s reality TV show, “Storage Wars”? The concept is simple. After viewing the contents of a repossessed storage unit for five minutes from the unit’s door, professional buyers take a guess at what’s inside and bid to win the contents of the unit with the hope of discovering valuable goods buried deep within. Sometimes their arbitrary bids pay off, sometimes they don’t. The way most companies conduct keyword searches during eDiscovery is the legal equivalent of “Storage Wars.” Based on little information, a preliminary list of keywords is used to launch a costly discovery process that ends with a review set that includes more data than is needed and may or may not contain the most relevant information.

Now imagine this: the “Storage Wars” bidder gets unlimited access to thoroughly explore the storage unit and conduct preliminary searches to gauge the value of the items within, and – using that information – comes up with a well-researched, informed bid. The bidder hits pay dirt every time.

Is there a keyword search equivalent? Absolutely. The ability to see and analyze the data early in the process, and to make the informed, cost-effective decisions described in the second scenario, is now possible. Advances in technology over the last few years – which allow you to actually look at your data before you run any search terms and to make cost and resource estimates – enable you to uncover smaller, more relevant data sets, save money, and protect yourself from potential legal exposure.

Keyword Search isn’t Dead; it Just Needs a Fresh Perspective
Keyword searching has been around almost as long as computers, and Boolean searching has its roots in 1854, when George Boole published the binary algebra now called Boolean logic. And though it’s still the most common technique used in legal discovery to locate potentially relevant data, numerous studies and articles by both the bench and experts in the industry have questioned its effectiveness. As early as 1985, the Blair and Maron study[1] found that, though reviewers believed that their keywords had identified 75 percent of the relevant documents, in reality, they had uncovered only 20 percent. In a more recent 2009 Text REtrieval Conference (TREC) study[2], keyword searches were found to be even less effective, with only 9 percent of the documents deemed relevant. There have been some prominent judicial opinions that take a negative view of “blind” keyword searches. But many eDiscovery experts agree that keyword searches that follow The Sedona Conference® Best Practices Commentary on the Use of Search and Retrieval Methods in E-Discovery[3] (The Sedona Conference® Best Practices) are effective when they involve some combination of testing, sampling, and iterative feedback.[4] The root of the problem isn’t the tool itself, but the carpenter. Keyword search has been given a bad rap because it’s been applied with a black-box mentality and has been misused and misunderstood by service providers and consultants. Search is a process, and keyword searching is just one tool within this process, best used in combination with such things as concept search, clustering, email threading, and categorization – all of which are most effective with human involvement. These missteps generate unnecessary expenses and are starting to come under the scrutiny of the courts.

Excess, Incomplete Data
A traditional keyword search is a great starting point for finding things that exist, but is ineffective at finding things that deviate slightly and things that are unknown. Search results are often bloated by false positives, while false negatives are left behind and typically go unreviewed. Because a typical keyword search returns every exact match to the search criteria, regardless of context, many of the documents returned are completely irrelevant. A search for the keyword “apple” could produce documents pertaining to the company, the fruit, or perhaps Chris Martin and Gwyneth Paltrow’s firstborn. Just as problematic, searches can also miss key data when a keyword has no exact match in a relevant document. This is the case with misspellings, variations of search terms, or different wording for concepts relevant to the case.
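To make the exact-match problem concrete, here is a minimal Python sketch. The document IDs, texts, and the helper names `exact_search` and `expanded_search` are hypothetical; prefix matching and `difflib` similarity are crude stand-ins for the stemming and fuzzy matching a real eDiscovery platform would provide, not any product's actual behavior.

```python
import re
from difflib import get_close_matches

# Hypothetical sample collection; IDs and text are invented for illustration.
DOCS = {
    "D1": "Apple reported quarterly earnings above forecasts.",
    "D2": "Pack the apples and pears for the picnic.",
    "D3": "Please review the aple supplier contract before Friday.",
    "D4": "The pricing agreement was signed by both parties.",
}

def tokenize(text):
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"[a-z]+", text.lower())

def exact_search(term, docs):
    """IDs of documents containing `term` as an exact token."""
    term = term.lower()
    return sorted(d for d, text in docs.items() if term in tokenize(text))

def expanded_search(term, docs, cutoff=0.8):
    """Also accept tokens that start with the term (a crude stemming/wildcard
    stand-in) or are close spellings of it (a crude fuzzy-match stand-in)."""
    term = term.lower()
    hits = []
    for d, text in docs.items():
        for token in tokenize(text):
            if token.startswith(term) or get_close_matches(term, [token], n=1, cutoff=cutoff):
                hits.append(d)
                break
    return sorted(hits)

print(exact_search("apple", DOCS))     # ['D1']
print(expanded_search("apple", DOCS))  # ['D1', 'D2', 'D3']
```

If the matter concerns, say, a fruit supplier, the expanded search now catches the plural in D2 and the misspelling in D3, but it still returns D1, where “Apple” is the company. Widening a term raises recall without removing false positives, which is why the hits still need human review.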

Expensive
Not only are current-day searches ineffective, they’re also costly. According to a 2012 Rand Corporation study[5], the review stage of eDiscovery consumes about 73 cents of every dollar spent on electronically stored information (ESI) production, while collection and processing consume about 8 cents and 19 cents, respectively. Reducing the amount of data that’s moved to review would greatly reduce the high costs associated with review. The typical back-and-forth process of developing and evaluating keyword lists over and over again also wastes time and money. Each iteration and new search can ratchet up the price and lengthen the timeline. In addition, false positives inflate the number of documents that move to review, and overlooked false negatives leave relevant material unreviewed. What isn’t known could really cost you.

CASE IN POINT:
Victor Stanley v. Creative Pipe Highlights Current Limitations of Keyword Searches
The Victor Stanley v. Creative Pipe* case calls out some of the ways in which keyword searches are deficient and outlines approaches that attorneys can employ to use keyword search defensibly given those limitations. The Victor Stanley court noted that designing search protocols “involves technical, if not scientific, knowledge.” The court also observed that designing a computer-assisted privilege review requires (1) careful advance planning by persons qualified to design an effective search methodology; (2) collaboration on search terms; and (3) testing for quality assurance. Jason R. Baron, the director of litigation at the U.S. National Archives and Records Administration, said in response to the findings that “what Judge Grimm has done is give a road-map to lawyers in the United States on how to present to a court how they went about searching for relevant documents.”
* Victor Stanley v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008)

Courts Want Transparency
More and more, the courts are paying attention to how search terms are developed and used. As evidenced by case law (see Case in Point), the courts have growing concerns about the integrity and accuracy of keywords because the selection process, as it stands, is manual and based on limited insight into and understanding of the documents in question. Ralph Losey, lawyer and author of the e-Discovery Team blog, refers to this process as “Go Fish”[6] information retrieval. The plaintiffs in Kleen Products v. Packaging Corp. of America[7] (No. 10-5711) recently withdrew their demand that defendants apply a predictive coding strategy and agreed that defendants would apply their own ESI search methodology, based on an iterative keyword strategy. In addition, alternative search methodologies may be revisited for any document productions after October 1, 2013. There are currently three other matters in which predictive coding or technology-assisted review is in the spotlight and that could further shape the expectations of the courts. Aside from the courts, drafting search terms and agreeing to them with opposing counsel before ever looking at the data does law firms and clients a disservice.

What’s Gone Wrong
Keyword searching can be effective for identifying a starting point for finding things that exist within the data set. But it’s failing because it’s being used as a stand-alone tool to identify what to move from processing to review without validating results.

Putting the Cart before the Horse
The typical scenario goes like this: attorneys develop a list of keywords that they think might be contained within the collection developed through the custodian interview process, they run the terms against the ingested data, and – instead of looking at the documents – they review the limited information provided by search reports to evaluate which documents are responsive. Like a volley, the keywords are then refined and lobbed back over to be run again, back and forth, iteration after iteration, until everyone is comfortable with the counts. The problem is that review is focused on what is already known or believed to be known, and the whole process doesn’t actually involve the content of the data. Decisions are made by looking at percentages on a spreadsheet.

To put this in perspective, imagine running a Google search and, instead of pulling up results with snippets, hit highlights, and Web page addresses, you simply receive the total number of pages that hit on the keyword searches that were run. This is essentially what happens with the current discovery search process. It provides no context, no details as to why the searches are ineffective, no understanding of how to refine the searches, and no indication of other areas that should be explored.

Technology exists today that makes it affordable and possible to dive into the data earlier in the process and then develop keywords to carve up the data and prioritize what gets looked at, and when. Preliminary searches around the basics of the case and the key timeframes and participants should then drive keyword development, not the other way around. When a subset of the data is investigated, building and refining a list of keywords is easier and a much more fruitful exercise.

The process of developing keywords first and searching second is also problematic in the case of custodians. As the Blair and Maron study showed, reviewers who believed they had found 75 percent of the relevant documents had actually found only 20 percent. An effective use of keywords would start by interviewing key personnel, loading data while developing searches from information gathered in interviews, investigating results to find additional points of interest and areas of low recall, refining, and then re-interviewing as needed to understand and uncover more – then repeating if necessary. Using the data to direct custodian interviews, as opposed to letting custodians point you to where the data might live, will help you uncover more valuable and useful information and help to limit surprises during the review phase.
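The Google comparison above can be made concrete with a short Python sketch contrasting a count-only search report with a keyword-in-context report. The helper names, the sample e-mails, and the use of whole-word, case-insensitive regular-expression matching are assumptions for illustration, not a description of any particular review tool.

```python
import re

# Hypothetical e-mails; IDs and text are invented for illustration.
DOCS = {
    "E1": "Draft settlement terms attached for your review.",
    "E2": "Payment settlement for the March invoice has cleared.",
    "E3": "Let's discuss settlement with opposing counsel on Monday.",
}

def term_pattern(term):
    """Whole-word, case-insensitive pattern for a literal search term."""
    return re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)

def hit_counts(terms, docs):
    """Count-only report: how many documents hit on each term."""
    return {t: sum(1 for text in docs.values() if term_pattern(t).search(text))
            for t in terms}

def kwic(term, docs, width=25):
    """Keyword-in-context report: every hit with up to `width` characters on
    each side and the hit itself in brackets, so a reviewer can see why it matched."""
    rows = []
    for doc_id, text in docs.items():
        for m in term_pattern(term).finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            rows.append((doc_id, f"...{left}[{m.group(0)}]{right}..."))
    return rows

print(hit_counts(["settlement"], DOCS))  # {'settlement': 3}
for doc_id, snippet in kwic("settlement", DOCS):
    print(doc_id, snippet)
```

The count alone says three documents hit; the snippets show that E2 concerns an invoice payment rather than a legal settlement, which is exactly the kind of false positive a spreadsheet of percentages cannot reveal.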

A New Approach
In addition to fixing some of the problems that are currently hindering search effectiveness, there are other changes and technological solutions that would greatly improve the ways in which keyword searches are conducted and employed.

Treating eDiscovery Search as Enterprise Search
As noted by Kamal Shah in his article “Enterprise Search vs. E-Discovery Search: Same or Different?”[8], enterprise search is built for speed and simplicity: it is designed to process a single search query and deliver results for that particular search. As in the case of Google, the search engine executes the query and the user isn’t privy to what the engine has actually searched for. In eDiscovery, speed and simplicity are also important, but in addition – as Judge Grimm stated in the Victor Stanley case – users have to be able to show how their searches were executed.

Search Needs Technology and a Human Touch
Keyword searching should be a blended, iterative approach that combines both humans and technology – never only one or the other. As the Blair and Maron study demonstrates, a computer searching on full text alone can’t reliably distinguish relevant documents from irrelevant ones. Once a person understands the case at hand and the language in the documents, then technology can be applied. A computer can’t interpret the case.

Searching with Blinders on
In discovery, people tend to begin a search thinking they know what they’re searching for and oftentimes don’t deviate from their original assumptions. Keyword searching has traditionally been a culling process to remove nonresponsive documents and has been treated as a linear progression, with search terms being used to narrow down what is relevant to the case. With these approaches, key themes, documents, and even potential custodians can get overlooked. Search terms should be used to target what you know, and investigation of those results should be used to uncover what you don’t know. Knowledge is power, and it’s hard to expand your knowledge if you don’t research the things you don’t know. Keywords should guide the direction of your investigation, not limit it. Broadening the scope helps identify things such as sidebar conversations that might not get picked up by blind searches. The themes contained within a data set can go in many directions that all get to the same conclusion – how can you reasonably find the themes when you are only chasing one of them?

Investigate Data Early and Often
Spending more time at the beginning of the process to fully understand the data helps ensure that the data you’re moving into review is most likely to be relevant. Search is not just about which documents should be produced – early analysis can provide you with insight into the case itself. The more you investigate the data on the front end, the more you can uncover, understand, plan, and make informed decisions throughout the process. Testing search terms early gives you a feel for what’s relevant and what’s not and will inform your direction. You’ll start to uncover documents and individuals who weren’t necessarily on your radar screen, and common challenges, such as uncovering a key custodian at the eleventh hour of review, can potentially be avoided.

Start with what is known and expand out from there. There’s always information with which to start – use it to point you in the right direction and help develop the search terms. Take a few established things (key time periods of interest, custodians who would have had direct involvement in the matter, the basics of what the litigation is about) to set the direction of the case and go where the data points you. Pick a term to use as a starting point and search through the metadata: file names, email subjects, and other areas where custodians are more likely to use standard language to talk about the subject of interest. From there, you can look into the documents to determine people who were referenced in the document and other relevant materials.

In the Victor Stanley case, Judge Grimm cites The Sedona Conference® Best Practices as a source of best practices, stating, “In this regard, compliance with The Sedona Conference® Best Practices for use of search and information retrieval will go a long way toward convincing the court that the method chosen was reasonable and reliable.” The Sedona Conference® Best Practices document suggests that key components of an effective search methodology include:
- Testing searches to identify whether they are producing over- or under-inclusive results.
- Sampling documents determined to be privileged or not privileged to arrive at a comfort level that the categories are neither over-inclusive nor under-inclusive.
- Getting iterative feedback by refining searches based on the testing results and validating the refinements.[9]
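The testing and sampling components listed above can be illustrated with a short Python sketch that estimates how over- and under-inclusive a keyword search is. Precision on a sample of hits speaks to over-inclusiveness; the relevance rate in a sample of non-hits (often called elusion) speaks to under-inclusiveness; together they give a rough recall estimate. The function names and the point-estimate approach are assumptions of this sketch, which omits the confidence intervals a defensible protocol would normally report, and `reviewer` is a hypothetical stand-in for a human reviewer’s relevance call.

```python
import random

def draw_sample(ids, k, seed):
    """Simple random sample without replacement; a fixed seed makes it repeatable."""
    population = sorted(ids)
    return random.Random(seed).sample(population, min(k, len(population)))

def rate(sample, is_relevant):
    """Fraction of sampled documents the reviewer marks relevant."""
    return sum(1 for d in sample if is_relevant(d)) / len(sample) if sample else 0.0

def estimate_search_quality(hit_ids, miss_ids, is_relevant, k=400, seed=0):
    """Point estimates of precision, elusion, and recall for one keyword search.

    hit_ids:     documents the search returned
    miss_ids:    documents it did not return (the null set)
    is_relevant: relevance judgment for a document ID
    """
    precision = rate(draw_sample(hit_ids, k, seed), is_relevant)
    elusion = rate(draw_sample(miss_ids, k, seed + 1), is_relevant)
    est_found = precision * len(hit_ids)   # relevant documents the search caught
    est_missed = elusion * len(miss_ids)   # relevant documents left behind
    total = est_found + est_missed
    recall = est_found / total if total else 0.0
    return {"precision": precision, "elusion": elusion, "recall": recall}

# Hypothetical collection: IDs 0-199 were hits, 200-999 were not.
def reviewer(doc_id):
    return doc_id % 2 == 0 if doc_id < 200 else doc_id % 4 == 0

print(estimate_search_quality(range(200), range(200, 1000), reviewer, k=1000))
# {'precision': 0.5, 'elusion': 0.25, 'recall': 0.3333333333333333}
```

Because the sample size here covers every document, the figures are exact rather than estimated: half of the hits are relevant (over-inclusive), and the search leaves behind twice as many relevant documents as it finds (under-inclusive), so the terms would go back for another round of refinement and validation.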






Find the Right Solution
Look for a solution that:
- Allows you to weed out false positives – To avoid including irrelevant files, you need the ability to detect variations in the data. A solution that identifies uniqueness and outliers will help pinpoint issues with particular terms.
- Lets you establish workflows and replicate exercises – Data tells a story. As you add more data, that story can veer in many directions. You need a solution that can help you retrace your steps, easily apply what you learned in a dynamic fashion, and sample areas of the population that have gone unexplored. Define an iterative process and continue to revisit it until there is comfort that a reasonable effort has been made to uncover the documents relevant to the request, with documentation to show what was done and what steps were taken to validate.
- Has flexible and faceted search capabilities – This allows keywords to be removed, expanded, stemmed, and modified iteratively during processing. It provides the ability to apply, evaluate, and organize searches, as well as group and classify sets of documents that are isolated and comment on why they’re important.
- Provides detailed reporting throughout the process – Instead of producing one final search report, look for a tool that generates details about the data early and often.
- Gives you the ability to transfer knowledge and background to other involved parties – Make search term development and history transparent to people further downstream in the process, including opposing counsel and the courts, if needed.
- Offers users Web-based access to the data – This gives users the ability to receive reports and dive deeper into data directly from their browsers.
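The “retrace your steps” and transparency points in the list above do not need an elaborate system to illustrate. The Python sketch below records every iteration of a term list, the reason for the change, and the resulting hits, can replay an earlier step, and exports the log as JSON. `SearchSession`, its methods, and the sample documents are hypothetical; terms are treated as regular expressions so that `\w*` can act as a simple stemming wildcard, which is an assumption of this sketch, not a standard eDiscovery query syntax.

```python
import json
import re
from dataclasses import asdict, dataclass, field

@dataclass
class SearchRun:
    step: int
    terms: list
    note: str
    hits: list

@dataclass
class SearchSession:
    """Hypothetical helper that logs each search iteration so it can be replayed
    and shown to others downstream (reviewers, opposing counsel, the court)."""
    docs: dict
    history: list = field(default_factory=list)

    def run(self, terms, note=""):
        # Each term is a regular expression matched as a whole word, ignoring case.
        patterns = [re.compile(rf"\b{t}\b", re.IGNORECASE) for t in terms]
        hits = sorted(doc_id for doc_id, text in self.docs.items()
                      if any(p.search(text) for p in patterns))
        self.history.append(SearchRun(len(self.history) + 1, list(terms), note, hits))
        return hits

    def replay(self, step):
        """Re-run the terms from an earlier step; the replay is logged as a new step."""
        earlier = self.history[step - 1]
        return self.run(earlier.terms, note=f"replay of step {step}")

    def export(self):
        """Serialize the full search history as JSON."""
        return json.dumps([asdict(r) for r in self.history], indent=2)

docs = {
    "A": "Contract draft v2 attached.",
    "B": "Contracts were signed today.",
    "C": "Negotiating the renewal terms.",
    "D": "Team lunch menu for Friday.",
}
session = SearchSession(docs)
print(session.run(["contract"], "initial term from custodian interview"))  # ['A']
print(session.run([r"contract\w*", r"negotiat\w*"], "stemmed and expanded after reviewing hits"))  # ['A', 'B', 'C']
print(session.export())
```

Keeping the note alongside each change is the design choice that matters here: the exported log documents not only which terms were run, but why they were changed, which is the kind of record the transparency point calls for.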





The Net Net
Keyword searching is currently a powerful but misused tool. Properly used, it can produce less – but more applicable – data in a more cost-effective way. Re-examining how keywords are applied and choosing the right tool for the job are vital to making that happen. Keyword search is not just a tool to whittle down data, but a way to investigate data to discover more and review less. Taking the time up front to assemble and examine the data, and to develop and iterate on keywords, saves time and money further downstream. This measure-twice-cut-once strategy is a reliable way for legal departments and law firms to develop more effective and predictable budgets. And providing transparency into the process of developing keywords will minimize legal exposure and meet the expectations of the courts.
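The budget argument can be made concrete with the 2012 Rand figures cited earlier in the article (about 8 cents of each ESI-production dollar for collection, 19 for processing, and 73 for review). The Python sketch below assumes, as a simplification, that review cost scales linearly with the number of documents reviewed and that the other stages are unchanged; the 30 percent reduction is a hypothetical input, not a figure from the article.

```python
# Share of each ESI-production dollar by stage, per the 2012 Rand figures
# cited in the article (collection 8, processing 19, review 73 cents).
STAGE_SHARE = {"collection": 0.08, "processing": 0.19, "review": 0.73}

def total_cost_after_reduction(review_volume_cut, shares=STAGE_SHARE):
    """Relative total cost (1.0 = original spend) if review volume falls by
    `review_volume_cut` (a fraction from 0 to 1). ASSUMES review cost is linear
    in the number of documents reviewed and other stages cost the same."""
    if not 0.0 <= review_volume_cut <= 1.0:
        raise ValueError("review_volume_cut must be between 0 and 1")
    other = shares["collection"] + shares["processing"]
    return other + shares["review"] * (1 - review_volume_cut)

print(round(total_cost_after_reduction(0.30), 3))  # 0.781
```

Under those assumptions, cutting review volume by 30 percent trims total spend by roughly 22 percent (0.73 × 0.30 ≈ 0.22). In practice, extra up-front analysis adds some processing cost, so the real saving would be somewhat smaller.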






TUNNELVISION ALLOWS KEYWORDS TO WORK SMARTER
TunnelVision is a powerful and comprehensive eDiscovery solution that uncovers more relevant and smaller data sets, saves money, and protects you from potential legal exposure by:
- Giving you the ability to iterate on keyword searches and investigate data throughout processing, so that searches produce the most pertinent data and weed out false positives and nonresponsive or duplicate data;
- Providing a dynamic, flexible environment that offers faceted search capabilities, allows search criteria and history to be replicated, and has the ability to detect variations in the data; and
- Making the keyword development process transparent to all groups invested in the discovery process, including clients and the courts.

Endnotes
[1] David Blair & M.E. Maron, An Evaluation of Retrieval Effectiveness for a Full-Text Document Retrieval System (1985)
[2] F. Zhao, D. Oard & J. Baron, Improving Search Effectiveness in the Legal E-Discovery Process Using Relevance Feedback (June 2009)
[3] The Sedona Conference®, Best Practices Commentary on Search and Retrieval Methods (August 2007)
[4] Philip Favro, Mission Impossible? The eDiscovery Implications of the ABA’s New Ethics Rules (August 2012)
[5] Rand Corporation, Where the Money Goes (2012)
[6] Ralph Losey, Child’s Game of “Go Fish” is a Poor Model for e-Discovery Search (2009)
[7] Milton I. Shadur, Kleen Products, LLC, et al. v. Packaging Corporation of America (April 2011)
[8] Kamal Shah, Enterprise Search vs. E-Discovery Search: Same or Different?
[9] Ibid.





For more information about TunnelVision or to request a demo, please visit www.mindseyesolutions.com.

Jeff Fehrman has worked for more than 15 years in the electronic evidence and information technology fields. He consults with clients on a variety of topics related to eDiscovery, including business processes, data reduction strategies, litigation preparedness, and workflow design. A subject matter expert on electronic evidence, Jeff frequently speaks on innovations and obstacles facing corporations and law firms today, and is also on the Board of Governors for the Organization of Legal Professionals (OLP) and is cofounder of EDD Blog Online. Contact Jeff at [email protected] or (571) 483-0639.

Discover More. Review Less.™

2301 Columbia Pike | Suite 121 | Arlington, VA 22204 | www.mindseyesolutions.com | 1.888.770.3876
