Search is simple, right? Just type what you are looking for into a Google-like interface, click, and you see the results you were seeking. If only it were that easy. You see, under the hood of search there is far more to understand than meets the eye.

Within the legal profession, attorneys have historically relied upon keyword searches to find the needle in the haystack within the exabyte flood of available information. But how do they know the keywords are right? How do they know they have performed the due diligence needed to serve the client in the most efficient and effective manner? For example, if the keyword is "car," the search would effectively exclude "wheels," "ride," "vehicle," "beemer," or any other linguistic term that expresses the same concept.

I am not an attorney. However, Jeffrey Ritter, an attorney for whom I have tremendous respect, pointed out to me a case called Hooper, which demonstrates that a prevailing-practices defense does not apply when affordable technology is readily available that can avert disaster. Furthermore, I believe publicly traded firms engaged in frequent litigation have a fiduciary responsibility to their shareholders to leverage the technology advances that will ultimately have a positive impact on the bottom line. In a presentation I attended on eDiscovery, one of the speakers stated that approximately 25% of litigation spend can be attributed to eDiscovery. The largest chunk of that expense is the document review performed by attorneys, who have provided keywords to fish out the documents that they will then review for relevancy. There are a few things about this model that stink like rotten fish on a hot summer day.
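To make the "car" example concrete, here is a toy sketch of a literal keyword search. The corpus and keyword are hypothetical, invented purely for illustration; the point is simply that a literal match retrieves only the documents using that exact word, while documents expressing the same concept in other terms slip through.

```python
# Hypothetical mini-corpus: three of these documents discuss the same
# concept (a car), but only one uses the literal keyword.
documents = {
    "doc1": "The car was seen leaving the garage at noon.",
    "doc2": "He asked for a ride in the new beemer.",
    "doc3": "The vehicle's wheels were replaced last week.",
    "doc4": "Quarterly sales figures for the sedan line.",
}

def keyword_search(docs, keyword):
    """Return the ids of documents containing the literal keyword."""
    return [doc_id for doc_id, text in docs.items()
            if keyword.lower() in text.lower()]

print(keyword_search(documents, "car"))  # only doc1 is retrieved
```

Documents doc2 and doc3 are just as responsive to a human reader, but the keyword never sees them — which is the whole problem.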
1) Keywords are not an efficient means of finding all responsive and relevant information. As Judge Facciola stated in O'Keefe: "Whether search terms or 'keywords' will yield the information sought is a complicated question involving the interplay, at least, of the sciences of computer technology, statistics and linguistics.... Given this complexity, for lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread. This topic is clearly beyond the ken of a layman." Judge Facciola went on to state that experts are needed to speak to this complex area.
The problem is that there are numerous search technologies available, and the technology keeps evolving. In addition, experts on the comprehensive topic are in short supply, and those who are truly regarded as experts will be the first to admit that they do not know everything about this still-evolving field of computer science.
2) Attorneys work on the billable hour. Efficiency is not to their economic advantage; they do not make partner by billing fewer hours. Consequently, they have no true vested interest in becoming more efficient, having less information to review, working fewer hours, and billing the client less money. Hmmmmm, is that smell getting to you yet?
In all fairness, there are a number of attorneys who see this ethical dilemma and have emerged as thought leaders in advocating better protocols and tools for executing search. At http://edrm.net/projects/search a framework for search is defined. It is an excellent starting point for anybody who still believes search is simple.
It is important to understand that electronically stored information is dynamic. It travels at the speed of light and can be nested within other formats that indexes and parsers will not see. To that point, it is important to clearly understand the search tool in use and what it will and will not do. It is equally important to document, test, record, and audit in order to establish a reasonable protocol. Such efforts should be shared with all stakeholders in the litigation, so that the litigation is fought on the merits of the case itself.
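The point about content nested inside other formats is easy to demonstrate. In this hypothetical sketch, a responsive phrase sits inside a ZIP archive: a naive byte-level scan of the file typically finds nothing, because the text is compressed, while a container-aware tool that opens the archive first finds it immediately. The file name and contents are invented for illustration.

```python
import io
import zipfile

# Build an in-memory ZIP archive holding a document that mentions
# our (hypothetical) keyword several times.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr(
        "memo.txt",
        "The contract terms and the contract schedule were signed "
        "on Friday. The contract is final.",
    )
raw_bytes = buffer.getvalue()

# A naive scan of the raw file bytes: the literal text has been
# compressed away, so this search will almost certainly come up empty.
naive_hit = b"contract" in raw_bytes

# A container-aware search extracts each member first, then looks inside.
with zipfile.ZipFile(io.BytesIO(raw_bytes)) as archive:
    aware_hit = any(b"contract" in archive.read(name)
                    for name in archive.namelist())

print(naive_hit, aware_hit)
```

The same gap appears with email attachments, embedded objects, and proprietary formats: if the indexer does not know how to open the container, the content inside it simply does not exist as far as the search is concerned.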
Lastly, it is valuable to understand that computers function based on rules, and rules can also be defined with respect to search in the form of concepts. Probabilistic Latent Semantic Analysis (PLSA) is a statistical technique for the analysis of two-mode and co-occurrence data. PLSA evolved from Latent Semantic Analysis, adding a sounder probabilistic model, and it is related to non-negative matrix factorization. Introduced by Thomas Hofmann in 1999, it has applications in information retrieval and filtering, natural language processing, machine learning from text, and related areas. In plain English, it allows the machine to learn concepts using statistics and pull back the responsive information being sought, without complete reliance on a list of keywords that may not be worth the bytes they occupy.
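A real PLSA implementation fits latent topics with an EM algorithm, which is beyond the scope of a blog post. But the underlying intuition — that the machine can learn which terms belong to the same concept from co-occurrence statistics, rather than from a human-supplied keyword list — can be sketched with a far cruder statistic. This is not PLSA; it is a deliberately simple co-occurrence count over a hypothetical corpus, with an invented stopword list, shown only to illustrate the idea of statistical query expansion.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus.
documents = [
    "the car is a vehicle",
    "a car is a vehicle with wheels",
    "new wheels for the car",
    "quarterly earnings report",
    "the earnings report shows profit",
]

# Tiny invented stopword list, just to keep the toy example clean.
STOP = {"the", "a", "is", "with", "for", "shows"}

# Count how often each pair of terms appears in the same document.
cooccur = Counter()
for doc in documents:
    terms = set(doc.split()) - STOP
    for a, b in combinations(sorted(terms), 2):
        cooccur[(a, b)] += 1

def expand_query(keyword, min_count=2):
    """Expand a keyword with terms that co-occur with it often enough."""
    related = {a if b == keyword else b
               for (a, b), n in cooccur.items()
               if keyword in (a, b) and n >= min_count}
    return {keyword} | related

print(expand_query("car"))  # "vehicle" and "wheels" are learned, not listed
```

The statistics alone discover that "vehicle" and "wheels" travel with "car", while "earnings" and "report" cluster separately. PLSA does this far more rigorously, with proper probabilities over latent concepts, but the payoff is the same: responsive documents are found even when the attorney's keyword list never contained the words they use.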
Bottom line: companies that are engaged in frequent litigation will benefit financially by bringing in concept search and using it to cull down the amount of data that needs to be reviewed. Taking this a step further, companies can also save money by adding technology that deduplicates the data and produces the search results in the review-platform format utilized by their counsel. However, if the company is large enough, it may want to host its own review platform, so that it can monitor metrics on attorney performance as well as maintain the security of its information assets. Companies that engage in the practice of sending large volumes of data to counsel are also risking the loss of control of their information. Laptops get stolen. Attorneys move on to other firms, and documents get printed and stored in banker boxes, where any paralegal or secretary can typically access a box of settled smoking guns that were never meant to see the light of day. But that is an orphaned-data topic which will be covered in another posting. Happy searching, and may the force be with you always.
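Postscript: the deduplication step mentioned above is, at its core, just content hashing — keep one copy per distinct hash and cull the rest before review. Here is a minimal sketch with hypothetical file paths and contents; production eDiscovery tools also normalize metadata and detect near-duplicates, which this does not attempt.

```python
import hashlib

# Hypothetical collection: two custodians hold byte-identical copies.
documents = [
    ("custodian_a/report.txt", b"Q3 revenue summary"),
    ("custodian_b/report_copy.txt", b"Q3 revenue summary"),  # exact duplicate
    ("custodian_a/memo.txt", b"Meeting notes from Tuesday"),
]

def deduplicate(docs):
    """Keep the first document seen for each distinct content hash."""
    seen = set()
    unique = []
    for path, content in docs:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(path)
    return unique

print(deduplicate(documents))  # the byte-identical copy is culled
```

Every duplicate culled here is a document an attorney never bills to review, which is exactly where the savings come from.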