Corporate America Needs Help With Collecting Electronically Stored Information:
ASCLD/LAB-International Should Be Its Guide
In December of 2006, several amendments and additions were made to the Federal Rules of Civil Procedure. These changes put companies on notice that electronically stored information is considered evidence and must be handled in a systematic and non-destructive manner. For corporate America, these changes translated into significant legal and financial risk. This paper examines those risks and provides guidelines corporations can use to mitigate their exposure.
Russell David Nomer
This paper would not have been possible without the assistance of some very special people. As such, I would like to thank: Professor John Kostanoski, SUNY Farmingdale, John Dowd, Esq. of Akin Gump, Gideon Schor, Esq., John Jablonski, Esq., Glen Kaplan, Esq. of Akin Gump, Jeffrey Ritter of Water’s Edge Consulting, Mr. Jim Christy of DC3, Karen Schuler of Onsite 3, and Eric Seggerbruch, Esq. of Guidance Software. I would also like to thank my wife Anne Nomer and my two children Zachary and Zoe.
What is ESI, eDiscovery and Forensics?
We live in a litigious society, and at the heart of every legal matter is evidence. The information age has done much to streamline operations within corporate America; however, it has also created an overwhelming amount of electronically stored information spanning servers, hard drives, network arrays, storage area networks, voice over IP systems, personal digital assistants, cell phones, web mail systems, USB flash drives, iPods, jump drives, copy machines, scanners, printers, and any other device capable of storing digital information.
Electronic Discovery is the preparation, review and production of electronic documents from computer storage media and memory (Nelson, 2006). This electronically stored information is evidence and must be treated as such when used in litigation. Computer Forensics is the use of tightly controlled procedures to collect and preserve data from a single computer or an enterprise system. Forensic collection methods preserve files in exact, unchanged form: file attributes such as the file create date, file modified date, and archive bit remain unchanged. Forensic collection software circumvents file access control measures employed by the operating system, and forensic collection methods can retrieve file fragments and some deleted files from computer systems and handheld devices. Forensic collection experts are trained to carefully document each step followed in the data collection (Nelson, 2006), which is useful if the technician performing the collection later needs to testify in court.
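In practice, the "exact, unchanged form" guarantee is demonstrated with cryptographic hashes: the collected file is hashed at acquisition, and any subsequent change to its contents produces a different hash value. The following Python sketch illustrates the principle (the file name and contents are hypothetical, and real collections rely on validated forensic tools rather than ad hoc scripts):

```python
import hashlib
import os

def fingerprint(path):
    """Record a file's size, last-modified time, and SHA-256 hash so any
    later change to the file's content can be detected."""
    st = os.stat(path)
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return {
        "path": path,
        "size": st.st_size,
        "modified": st.st_mtime,  # the "file modified date" attribute
        "sha256": digest.hexdigest(),
    }

# Demo: hash a file at collection time, then re-hash after a change.
with open("evidence.txt", "w") as f:
    f.write("original contents")
before = fingerprint("evidence.txt")

with open("evidence.txt", "a") as f:
    f.write(" -- altered after collection")
after = fingerprint("evidence.txt")

print(before["sha256"] != after["sha256"])  # True: the alteration is detectable
```

A collection tool records the acquisition-time hash in its documentation; matching that hash later is how the technician vouches, under oath if necessary, that the evidence is unchanged.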
Dec 2006 Changes to the Federal Rules of Civil Procedure
In December of 2006, changes to the Federal Rules of Civil Procedure made it clear that electronically stored information is to be treated as discoverable evidence. Consequently, corporations were placed on notice that tampering with or losing evidence would not be acceptable. (Federal Rules of Civil Procedure, 2006) As a result of these changes, many corporations looked to internal and outside counsel for guidance in handling data collection. Unfortunately, attorneys lacked the technical expertise to consult effectively on best practices for collection efforts and were forced to engage IT professionals. IT professionals, being analytical by nature, saw opportunity, and the eDiscovery industry launched very quickly. Historically, launches of this magnitude lack standards, and the eDiscovery launch is no different. To date, the lack of a standard has produced a tremendous outpouring of technology vendors claiming to be experts in eDiscovery. As one might imagine, this outpouring of experts has resulted in conflicting opinions and viewpoints regarding what will work best for businesses engaged in eDiscovery efforts. For example, some experts tell companies to fight tooth and nail against producing native data, while others claim that producing native data will reduce processing costs. To bring some order to the gambler mentality prevalent throughout the eDiscovery world, this article proposes that the lack of a standard for ESI collection in civil litigation can be bridged by examining, and adopting, how ESI collection is handled within crime laboratories. ASCLD/LAB is the accrediting body for crime laboratories, and this paper will examine how the international standard ASCLD/LAB uses to accredit crime labs can be leveraged as a standard for ESI collection within corporate America. (ASCLD/LAB, 2006)
American Society of Crime Laboratory Directors / Laboratory Accreditation Board History
“In the fall of 1973, a group of forty-seven (47) crime laboratory directors from around the United States were invited to meet with FBI Director Clarence Kelly, FBI Assistant Director Briggs White and other FBI personnel in Quantico, Virginia. The purpose of the meeting was to open channels of communication between crime laboratories around the country and the FBI. The meeting was well received and led to an agreement that an association of crime laboratory directors should be created. In the spring of 1974, a smaller group of individuals, who attended the initial meeting, met and began working on an organizational proposal. (ASCLD/LAB, 2006)
In the fall of 1974, a second meeting of laboratory directors was held at Quantico. The participants at this second meeting officially formed the American Society of Crime Laboratory Directors (ASCLD).
During the same time period that ASCLD was being born, a national voluntary proficiency testing program was initiated and carried out by the Forensic Science Foundation with funding from the Law Enforcement Assistance Administration (LEAA). The reported results of this voluntary proficiency testing soon made front page headlines in most newspapers around the country. The results reported from the voluntary testing implied that there were serious concerns about the quality of work performed in some of the nation’s crime laboratories.
The newly formed ASCLD recognized that action must be taken to establish standards of operation for crime laboratories and to take appropriate steps to restore public confidence in the work performed by the nation’s crime laboratories.
As a result, one of the early committees appointed by ASCLD was the Committee on Laboratory Evaluation and Standards. Members of that committee were Tony Longhetti, Jack Cadman, George Ishii, Carlos Rabren, Travis Owen and Ralph Keaton. The committee was chaired by George Ishii, Tony Longhetti and Jack Cadman at various times. For approximately four years, the committee considered and worked on various programs that could be used to evaluate and improve the quality of laboratory operations. The committee considered individual certification, a self-assessment program and an accreditation program based on external peer review as possible means of achieving the goal.
Each year the committee presented its work and proposals to the ASCLD membership at its annual meeting for input and approval. The committee eventually became the ASCLD Committee on Laboratory Accreditation and a program of laboratory accreditation was approved in concept by the ASCLD membership in the fall of 1980. On June 11, 1981, the committee held an organizational meeting in Quantico. At that meeting the committee which consisted of the original members was expanded by adding Joe Gormly from the National Association of Chiefs of Police and Ron Myers from the National District Attorneys Association. The first Board of Directors of the American Society of Crime Laboratory Directors/Laboratory Accreditation Board (ASCLD/LAB) met and elected Carlos Rabren from Alabama Department of Forensic Science as the first chairman and Travis Owen from the Acadiana Crime Laboratory in Louisiana as the first executive secretary.
In February 1982, an informal meeting of the Board was held at the AAFS meeting in Orlando. At that meeting Chairman Rabren announced receipt of the first applications for accreditation from the eight laboratories of the Illinois State Police. He announced that he had appointed an inspection team consisting of Tom Nasser, James Buttram, Daniel Dowd and Stanley Sobel. An inspection fee of $5,250.00 was approved.
In May 1982, the inspection reports for the eight (8) laboratories were considered by the Board and the eight laboratories from the Illinois State Police became the first eight (8) laboratories accredited by ASCLD/LAB. In September 1982, three (3) laboratories from the Arizona Department of Public Safety were accredited. The next laboratories accredited included six (6) laboratories from the Washington State Patrol, the Oakland Police Department, the Kansas City Police Department, the Burlington County, New Jersey Forensic Laboratory, the Bureau of ATF San Francisco and Rockville laboratories, three (3) laboratories of the Missouri State Highway Patrol and the University of Tennessee Toxicology and Chemical Pathology Laboratory.
In September 1984, four (4) laboratories from the Michigan State Police were accredited bringing the total of accredited laboratories to thirty (30) representing ten (10) laboratory systems. This number of accredited laboratories and accredited organizations met the pre-determined minimum numbers of twenty-five (25) laboratories and ten (10) laboratory systems required by the bylaw to form a Delegate Assembly as the new governing body for ASCLD/LAB.
In November 1984, ASCLD/LAB Chair Thomas Nasser sent a notice to all Delegate Assembly members that the Delegate Assembly would meet for the first time in September 1985 during the annual meeting of ASCLD at Quantico. At the September 1985 meeting of the ASCLD/LAB Board, the US Army Criminal Investigation Laboratory at Fort Gillem, Georgia and seven (7) laboratories of the Oregon State Police were accredited bringing the total number of laboratories eligible for the first Delegate Assembly meeting to thirty-eight (38). The Delegate Assembly held its first meeting and became the official governing body of the ASCLD/LAB Accreditation Program in September 1985.
On February 4, 1988, ASCLD/LAB was incorporated as a non-profit corporation in the state of Missouri. ASCLD/LAB continues to be incorporated in Missouri.
In the five years following the first Delegate Assembly meeting, fifty-four (54) laboratories from sixteen (16) different governmental organizations were granted accreditation. During the next five years from June 1987 to June 1992, thirty-six additional laboratories representing seventeen (17) organizations were accredited. Included in the number of labs accredited during the second five years was the South Australia State Forensic Science Laboratory at Adelaide which in February 1990 became the first international laboratory accredited by ASCLD/LAB.
During the five-year period of June 1992 through June 1997, seventy-two additional laboratories representing thirty-two (32) organizations were accredited. In September 1993, the Centre of Forensic Science in Toronto was accredited becoming the second international laboratory in ASCLD/LAB. In October 1994, Chair Paul Ferrara signed a memorandum of understanding (MOU) with NATA of Australia for joint inspections and accreditations of Australian crime laboratories. Seven (7) additional Australian laboratories were accredited in Australia as a result of the MOU. Accreditation of other international laboratories followed with the accreditation of the New Zealand Police Document Examination Section, the Singapore Centre of Forensic Sciences, the Hong Kong Government Laboratory Forensic Science Division, three (3) New Zealand Environmental Science and Research laboratories, the Centre of Forensic Sciences laboratory in Sault Ste. Marie, Canada and the Hong Kong Police Force Forensic Firearms Examination Bureau.
At the fall 1994 meeting of the Delegate Assembly, Board Chair Paul Ferrara made a presentation and a plea concerning the need for creating a paid position to handle the rapidly increasing workload associated with receiving applications for accreditation and the processing of inspection reports. Dr. Ferrara made it clear that volunteers could no longer effectively manage the day-to-day work required to run a quality accreditation program. On September 1, 1995, Ralph Keaton, who had retired from the state of North Carolina, began working as a part-time Executive Secretary and established an ASCLD/LAB office in his home.
The workload and interest in accreditation continued to grow at an ever increasing rate. In January 2000, office space was leased at 139 J Technology Drive in Garner, North Carolina. The position of Executive Secretary had become a full-time position which was changed to Executive Director. In the spring of 2000, Tara Dolin was employed as a full-time Administrative Assistant and Amy Chalk was employed as a part-time Bookkeeper. In September 2000, ASCLD/LAB employed Richard Frank, Michael Johnston and William Smith as the first three Staff Inspectors. In May 2002, John Neuner was employed as the first Quality Manager for ASCLD/LAB. The number of Staff Inspectors has increased to nine. A listing of all ASCLD/LAB staff with contact information is available on [the ASCLD/LAB web site].
The accreditation program is and has been a very dynamic program, constantly making upgrades to improve the quality of the accreditation process and to ensure that the stated objectives of the accreditation program are being met. Some of the significant changes made over the years are listed below.
In 1992, a requirement for laboratories to conduct an annual review of the laboratory and to submit a report to ASCLD/LAB was added.
In the 1993 Manual, the following changes were made:
Proficiency Review Committees (PRCs) were established and laboratories were required to participate in ASCLD/LAB approved proficiency testing programs which were external to the laboratory.
A requirement for individual competency testing prior to assuming casework responsibility was added.
Four (4) Quality System criteria calling for a Quality Manual, a Quality Manager, Annual Review of the laboratory and an annual audit of the quality system were added to the program as important criteria.
The discipline of DNA was added to the program and laboratories performing DNA analysis were required to follow the SWGDAM “Guidelines for RFLP Typing of DNA.”
In 1997, an Essential requirement was added that each examiner must participate in an annual proficiency test in each discipline in which casework is performed.
In 1999, the following changes were made:
Compliance with Important criteria was elevated from 70% to 75% in order to achieve accreditation.
Criterion 188.8.131.52 was elevated from Important to Essential and the wording was changed to require a “documented” training program in each functional area of the laboratory.
Quality System criteria 184.108.40.206-220.127.116.11 were elevated from Important to Essential.
Individual proficiency testing requirements were upgraded to require successful completion of tests. Successful completion was defined.
Important criterion 18.104.22.168 was added calling for sub discipline proficiency testing.
Criterion 3.3.5, concerning laboratory security was elevated to Essential.
In 2000, the discipline of Crime Scene was added, becoming the first and only optional discipline for accreditation.
In 2001, the following changes were made:
An applicant laboratory was required for the first time to have a criteria file.
Applicant laboratories were required to provide inspection teams with technical procedure manuals prior to the on-site inspection.
The disciplines of Serology and DNA became sub disciplines of the combined Biology discipline.
Auditing of inspection reports was implemented to bring greater consistency to the inspection process.
In 2003, the following changes were made:
Digital Evidence was added as an accredited discipline.
Individual characteristic databases such as CODIS, NIBIN and AFIS were included in the inspection process.
Interim inspections were established as a third form of compliance monitoring, along with proficiency testing and annual audit reports.
The 2003 Delegate Assembly approved the implementation of a dual-track accreditation program which is being implemented effective April 1, 2004. In addition to the ongoing accreditation program which is now referred to as the Legacy Program, ASCLD/LAB has initiated the ASCLD/LAB-International Accreditation Program which is based on the ISO 17025 Standard and Supplemental Requirements which include the essential elements of the Legacy Program and relevant requirements of ILAC G19. Details on the ASCLD/LAB-International Program are available on [the ASCLD/LAB web site].
As of March 2004, there are 259 laboratories accredited by ASCLD/LAB under the Legacy Program. Included in the accredited laboratories are 159 State laboratories, 62 Local laboratories, 20 Federal laboratories, 9 Private laboratories and 9 International laboratories. As of this date, there are 24 additional laboratories seeking accreditation. (ASCLD/LAB, 2006)
Clearly, the experience of ASCLD-accredited laboratories can be instrumental in providing corporate America with a standard for ESI collection. However, before we examine the standard, let us first obtain a better understanding of the business drivers behind making sure corporations perform ESI collections in a forensically sound manner. To accomplish this understanding, we must first look to the recent changes in the Federal Rules of Civil Procedure.
The Federal Committee on Rules of Practice and Procedure met on June 15, 2005 and considered several changes to the FRCP in relation to electronically stored information (ESI). The rules were approved by the Supreme Court on April 12, 2006 and became effective on December 1, 2006.
Amendments to Rule 16 specified that Parties must discuss electronic discovery issues as part of Rule 16(b) scheduling and planning. In addition, Parties may craft a customized scheme to handle privilege issues. According to the Federal Committee, “The parties may agree to various arrangements. For example, they may agree to initial provision of requested materials without waiver of privilege or protection to enable the party seeking production to designate the materials desired or protection for actual production, with the privilege review of only those materials to follow. Alternatively, they may agree that if privileged or protected information is inadvertently produced, the producing party may by timely notice assert the privilege or protection and obtain return of the materials without waiver. Other arrangements are possible. In most circumstances, a party who receives information under such an arrangement cannot assert that production of the information waived a claim of privilege or of protection as trial-preparation material.” (Federal Rules of Civil Procedure, 2006)
Amendments to Rule 26 state parties must now discuss methods of preservation of discoverable information as well as what format to use for production. The Federal Committee states, “It may be important for the parties to discuss those systems and accordingly important for counsel to become familiar with those systems before the conference. With that information, the parties can develop a discovery plan that takes into account the capabilities of their computer systems.”
New Rule 37(f) provides safe harbor for good faith destruction or modification of information “due to the routine operation of an electronic information system.” The Federal Committee comments, “Many steps essential to computer operation may alter or destroy information, for reasons that have nothing to do with how that information might relate to litigation. As a result, the ordinary operation of computer systems creates a risk that a party may lose potentially discoverable information without culpable conduct on its part. Under Rule 37(f), absent exceptional circumstances, sanctions cannot be imposed for loss of electronically stored information resulting from the routine, good-faith operation of an electronic information system.” (Federal Rules of Civil Procedure, 2006)
An amendment to Rule 34(a)(1) specifies that data sampling is now allowed either by agreement or by motion. Sampling may be appropriate where the parties do not know the true sources of relevant data and specific sources of data cannot otherwise be identified in a request to produce. The Federal Committee states, “As with any other form of discovery, issues of burden and intrusiveness raised by requests to test or sample can be addressed.” Sampling “is not meant to create a routine right of direct access to a party’s electronic information system, although such access might be justified in some circumstances. Courts should guard against undue intrusiveness resulting from inspecting or testing such systems.” (Federal Rules of Civil Procedure, 2006)
An amendment to Rule 34(b) provides new rules governing the format of production of ESI. Production of ESI should be in “a form or forms in which [ESI] is ordinarily maintained or in a form or forms that are reasonably usable.” A producing party must produce ESI in the requested form or make an objection supported by reasons why it will not produce in the form, along with the proposed alternate form of delivery. The Federal Committee comments, “If the responding party ordinarily maintains the information it is producing in a way that makes it searchable by electronic means, the information should not be produced in a form that removes or significantly degrades this feature.” . . . “The amendment to Rule 34(b) permits the requesting party to designate the form or forms in which it wants [ESI] produced. The form of production is more important to the exchange of [ESI] than of hard-copy materials . . .” (Federal Rules of Civil Procedure, 2006)
eDiscovery Business Drivers
From a corporate perspective, the term risk mitigation is an apt label for the collective risks that drive eDiscovery efforts. In most scenarios, massive volumes of electronic content of every type pile up. Often the content is distributed across numerous areas, many of which are walled-off silos. Costs further exacerbate these risks because the content is operationally expensive and difficult to manage. In addition, corporations are often uncertain whether the content should be handled via backup or archive solutions. The lack of eDiscovery standards within corporate America has added to this confusion. Furthermore, the threat of inadvertent destruction of data in violation of a litigation hold presents additional risks, including reputation risk to the enterprise, exception handling within backup policies, and the high stakes of spoliation, as clearly noted in the Arthur Andersen, Zubulake, Morgan Stanley & Co. and Sarbanes-Oxley litigations.
Corporate eDiscovery efforts should span the business in an attempt to manage records, identify what’s relevant to the litigation, preserve relevant electronically stored information, and collect relevant electronically stored information in a forensically sound manner. Unfortunately, lack of standards and differing opinions from various judges have only added to the confusion surrounding eDiscovery efforts. Much of the confusion surrounds metadata and embedded data.
Applicable Case Law
For example, in EEOC v. Lexus of Serramonte et al., No. 05-0962, 2006 WL 2567878 (N.D. Cal. Sept. 5, 2006), the court ordered production in the form maintained by the defendant: "If Defendants do not maintain the requested information in the format specified by Plaintiff, then they shall produce in whatever form they do maintain it." In Smith v. Clark, 2006 U.S. Dist. LEXIS 38804 (S.D. Ga. Jun. 12, 2006), the court compelled the defendant to produce QuickBooks accounting data in native format, as the plaintiff requested. In Nova Measuring Instruments Ltd. v. Nanometrics, Inc., 417 F. Supp. 2d 1121 (N.D. Cal. 2006), the court ordered native file production with metadata when Nanometrics failed to provide a reason why production should not be in native format. And in Williams v. Sprint/United Management Co., 230 F.R.D. 640 (D. Kan. 2005), metadata scrubbing was not allowed: the defendant "should have been aware that the spreadsheets' meta data was encompassed within the Court's directive that it produces the electronic Excel spreadsheets as they are maintained in the regular course of business" and "failed to show sufficient cause for its unannounced and unilateral actions in locking certain data and cells" in the Excel spreadsheets. (Applied Discovery: Case Summaries)
Recent Case Law, Metadata & Embedded Data
Everybody has heard about metadata. The classic definition is “data about data.” It is descriptive data contained in or associated with documents and computer files. Embedded Data is different from metadata. This is actual data obscured within files rather than data about the files. Some embedded data can be exposed by changing display settings in the native application. Examples include tracked changes, comments, and hidden cells. Other embedded data is obscured and cannot be viewed at all using the native application. However, obscured embedded data can be viewed by using alternate methods. Examples of this include author history, printer history, Outlook email information, and fast save data in spreadsheets.
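To make the distinction concrete, consider how author metadata travels inside an Office file. A .docx file is simply a ZIP archive, and its docProps/core.xml part records properties such as the creator and the last person to modify the document. The Python sketch below builds a toy container and then reads that metadata back without ever opening the native application (all names and values are hypothetical):

```python
import zipfile
import xml.etree.ElementTree as ET

# A .docx file is a ZIP archive; its docProps/core.xml part carries
# document metadata such as author and last-modified-by. Build a toy
# container (names and values are hypothetical) and read that metadata
# back without the native application.
CORE_XML = (
    '<?xml version="1.0"?>'
    '<cp:coreProperties'
    ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:creator>jsmith</dc:creator>'
    '<cp:lastModifiedBy>mjones</cp:lastModifiedBy>'
    '</cp:coreProperties>'
)

with zipfile.ZipFile("sample.docx", "w") as z:
    z.writestr("docProps/core.xml", CORE_XML)

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def read_core_properties(path):
    """Extract author metadata from an OOXML container's core.xml part."""
    with zipfile.ZipFile(path) as z:
        root = ET.fromstring(z.read("docProps/core.xml"))
    return {
        "creator": root.find("dc:creator", NS).text,
        "last_modified_by": root.find("cp:lastModifiedBy", NS).text,
    }

meta = read_core_properties("sample.docx")
print(meta)  # {'creator': 'jsmith', 'last_modified_by': 'mjones'}
```

Because this metadata lives inside the file itself, it survives copying of the file but can be scrubbed, deliberately or inadvertently, before production, which is exactly what the case law above turns on.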
Electronic Discovery efforts are going to differ with each litigation; however, according to The Electronic Discovery Reference Model project, eDiscovery efforts can be broken down into the following components: Records Management, Identification, Preservation, Collection, Processing, Review, Analysis, Production and Presentation.
According to ANSI/ARMA 9-2004, “A record is information created, received, and maintained by an organization or person that is evidence of its activities or operations, and has value requiring its retention for a specific period of time. It can be used in pursuance of legal and regulatory obligations.” Records management, in turn, is “the field responsible for efficient and systematic … control of records, including processes for maintaining evidence of … business activities” (ISO 15489-1). (Ritter, Jeffrey., Worstell, Karen., Evaluating the Electronic Discovery Capabilities of Outside Counsel)
For the purposes of this paper, we are specifically interested in convincing the corporate world that the American Society of Crime Laboratory Directors international standards presently used to certify crime labs provide a tried and true risk mitigation strategy for data collection and evidence handling. In light of the scope and order of magnitude of the data most companies possess, it is clear that perfection cannot be achieved. As such, our best risk mitigation strategy is a repeatable process; a lack of control over information leads to inefficiency and risk. ASCLD/LAB-International provides us with a scientific, repeatable process built upon ISO/IEC 17025:2005. (ASCLD/LAB, 2006)
ISO/IEC 17025:2005 specifies the general requirements for the competence to carry out tests and/or calibrations, including sampling. It covers testing and calibration performed using standard methods, non-standard methods, and laboratory-developed methods. It is applicable to all organizations performing tests and/or calibrations, including, for example, first-, second- and third-party laboratories, and laboratories where testing and/or calibration forms part of inspection and product certification. ISO/IEC 17025:2005 applies to all laboratories regardless of the number of personnel or the extent of the scope of testing and/or calibration activities. It is for use by laboratories in developing their management system for quality, administrative and technical operations; laboratory customers, regulatory authorities and accreditation bodies may also use it in confirming or recognizing the competence of laboratories. The ASCLD/LAB-International certification is an ISO-PLUS program of crime laboratory accreditation: for our purposes, ISO/IEC 17025 is further enhanced by the ASCLD/LAB-International Supplemental Requirements. Essentially, the eDiscovery group within a corporation would first obtain the ISO/IEC 17025 standard; proof of ownership of the standard must then be presented to ASCLD/LAB-International in order to obtain the supplemental requirements. (ASCLD/LAB, 2006)
The eDiscovery Process
Data collection is the acquisition of electronic information (data) marked as potentially relevant in a litigation. This paper assumes collection of electronic information by the owner of that information which is intended to be reviewed before production to opposing parties. The exigencies of litigation generally require that electronic information should be collected in a manner that is comprehensive, maintains its content integrity and preserves its form. Increasingly, metadata is required to be collected and maintained during this process and information regarding the chain of custody and authentication is required. Also, today the presumption is that this information will be producible in its native file format whenever possible. The process of collecting electronic information will generally provide feedback to the identification function of eDiscovery which may impact and expand identified content. (EDRM: The Electronic Discovery Reference Model)
Based on the results of the Identification phase, adequate planning of the search strategy is a major key to the overall effectiveness of the collection effort. The first question to ask is: where is the data? A good strategy is to work from a topology of the network and then create a map of the types of data, their locations and their custodians. Search terms, phrases and concepts need to be documented at the outset, along with a list of key company events and timeframes, as well as custodians of interest and relevance.
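A documented plan of this kind can be represented quite simply. The sketch below (custodians, sources and terms are all hypothetical) expands a topology map and term list into an auditable list of search tasks:

```python
# A collection plan ties the planning steps together: custodians, the
# data sources mapped from the network topology, and the agreed search
# terms and timeframe. All names, sources, and terms are hypothetical.
collection_plan = {
    "search_terms": ["project falcon", "restatement", "write-down"],
    "date_range": ("2005-01-01", "2006-12-31"),
    "custodians": {
        "jsmith": ["exchange-server-01", r"\\corp\finance", "laptop LT-0042"],
        "mjones": ["exchange-server-01", "backup-tape-set-7"],
    },
}

def plan_searches(plan):
    """Expand the plan into one documented search task per
    (custodian, data source, search term) combination."""
    tasks = []
    for custodian, sources in plan["custodians"].items():
        for source in sources:
            for term in plan["search_terms"]:
                tasks.append((custodian, source, term))
    return tasks

tasks = plan_searches(collection_plan)
print(len(tasks))  # 15 tasks: (3 + 2) sources x 3 terms
```

Enumerating the tasks up front makes the collection both repeatable and defensible: counsel can see exactly which source was searched for which term, and nothing is left to an operator's memory.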
It is imperative that all knowledgeable persons are included in the planning process. The IT and legal staffs of an organization should be integral to this process. The IT staff will be the best source of information regarding the locations and systems used to house data, and will also be invaluable in all technical aspects of the data collection effort. It is equally imperative that the legal department be consulted, as ultimately it will have to vouch to the court for the completeness and integrity of the data collected.
At the outset of a collection effort, appropriate steps must be taken to preserve the content of the electronically stored information (ESI) and its metadata. This includes ensuring that procedures are in place to preserve privileged work product from the other data collected and produced.
As part of the requirement for chain of custody, all data collected needs to be secured by the collection agent in a manner that both (a) prohibits unauthorized access to the data and (b) tracks all attempts to access the data. As discussed in the previous section, this will also ideally include the ability to audit all access activity.
To ensure access is controlled, all collection operators should have a network logon account that is associated with their identity. A comprehensive list should also be maintained mapping the individuals involved in collection to any and all accounts that they have used in the collection process.
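These chain-of-custody requirements can be sketched as a thin wrapper around the collected data: every operator action, whether permitted or denied, leaves an audit entry tied to the operator's logon account. The following is an illustrative sketch with hypothetical operator names, not a production design:

```python
from datetime import datetime, timezone

class CollectionStore:
    """Toy secured store for collected data: only authorized operator
    accounts may read or write, and every access attempt -- allowed or
    denied -- is recorded in an audit log."""

    def __init__(self, authorized_operators):
        self.authorized = set(authorized_operators)
        self.audit_log = []
        self._items = {}

    def _record(self, operator, action, item_id, allowed):
        self.audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "operator": operator,
            "action": action,
            "item": item_id,
            "allowed": allowed,
        })

    def put(self, operator, item_id, data):
        allowed = operator in self.authorized
        self._record(operator, "put", item_id, allowed)
        if not allowed:
            raise PermissionError(operator)
        self._items[item_id] = data

    def get(self, operator, item_id):
        allowed = operator in self.authorized
        self._record(operator, "get", item_id, allowed)
        if not allowed:
            raise PermissionError(operator)
        return self._items[item_id]

store = CollectionStore({"corp\\jsmith"})
store.put("corp\\jsmith", "HD-0001", b"disk image bytes")
try:
    store.get("corp\\intruder", "HD-0001")   # denied, but still logged
except PermissionError:
    pass
print(len(store.audit_log))  # 2 entries: one permitted put, one denied get
```

The important property is that denied attempts are logged as faithfully as permitted ones; an audit trail that records only successes cannot support the chain-of-custody testimony described above.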
A secure computer database for collected data ordinarily requires that (a) a user interface exists to manage security and (b) administrative rights exist to manage all operator access rights. Operators with administrative rights should be extremely limited and all use should be auditable. Understanding the scope of data that is available is critical to executing a series of searches spanning all available sources of evidence. Searches need to span each of the following "silos" of data:
Active Data consists of data in currently running production systems, including e-mail, databases, commercial off-the-shelf (COTS) applications, or other active company records.
Offline Data consists of files stored on network file shares, on local desktop or laptop file systems, on portable media such as CDs or DVDs, on portable storage devices or external hard drives, or in personal storage files such as a PST file.
Archive Data consists of files stored in a corporate records management system or within an archive including e-mail and instant messaging data.
Backup Data consists of files stored on backup media of any sort, including tapes, snapshots, file-based backups, backups of portable storage devices in any location (onsite, offsite, in transit, at employees' homes, or awaiting disposal/re-use).
Understanding the type of data identified is important to determine the kinds of searches to use to collect data in a complete manner. For example, searching for files from a given custodian could require finding all offline files with a "last modified by" or "created by" account of "company\jsmith." However, finding data for this same custodian in the active e-mail system could require searching for all items "from," "to" or "cc" that contain the email address "email@example.com" in addition to the name of the custodian. In addition, the same custodian may have multiple usernames in different mailboxes or in other databases. All of this information about a custodian must be gathered to ensure that a complete search result is obtained.
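Because a single custodian can appear under several identifiers, a collection search must match any of them across every silo. The following minimal sketch illustrates the idea; all account names, addresses, and field names are hypothetical stand-ins for what a real collection tool would read from file and e-mail metadata:

```python
# Hypothetical custodian profile: one person, several identifiers.
custodian = {
    "accounts": {"company\\jsmith", "company\\smithj"},
    "emails": {"jsmith@company.com"},
}

def matches_custodian(item: dict) -> bool:
    """Return True if a collected item is attributable to the custodian."""
    # Offline files: match on "created by" / "last modified by" accounts.
    if item.get("created_by") in custodian["accounts"]:
        return True
    if item.get("modified_by") in custodian["accounts"]:
        return True
    # E-mail items: match on from/to/cc addresses.
    participants = (set(item.get("from", [])) | set(item.get("to", []))
                    | set(item.get("cc", [])))
    return bool(participants & custodian["emails"])

offline_file = {"created_by": "company\\jsmith"}
email_item = {"from": ["jsmith@company.com"], "to": ["other@company.com"]}
assert matches_custodian(offline_file) and matches_custodian(email_item)
```

A search that checked only one identifier would silently miss responsive items, which is why the complete identifier list must be assembled before searching begins.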
Due to the nature of the complex systems in use in most companies today, files typically undergo numerous transformations throughout their lifecycle. These transformations occur both at the hands of end users and automatically by the operating system or other software in use. Operating system and file-specific metadata are added or modified; file formats are transformed, encoded, decoded, and encrypted; and many other potential changes make it difficult to assess whether the evidence was actually created, modified or viewed by a particular custodian. (Michael Arkfeld: Electronic Discovery & Evidence)
When a Word document is created, the data is in a binary file, typically with a .doc extension. When that same document is attached to an e-mail and sent across the internet, the attachment is encoded, typically through Base64 Multipurpose Internet Mail Extensions (MIME). Other types of MIME encoding, such as 7-bit and "quoted printable," are still used, along with another form of encoding called UUEncode. The receiving e-mail system then decodes the MIME encoding and rebuilds the .doc binary file for the intended recipient's use.
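This encode/decode round trip can be illustrated with a short Python sketch. The bytes below are a made-up stand-in for a binary attachment; real mail systems wrap the encoded payload in full MIME headers, which are omitted here:

```python
import base64

# Simulated binary contents of a .doc attachment (hypothetical bytes).
original = b"\xd0\xcf\x11\xe0 binary contents of report.doc"

# The sending mail system Base64-encodes the attachment for transport.
encoded = base64.b64encode(original)

# The receiving system decodes the encoding and rebuilds the binary file.
decoded = base64.b64decode(encoded)

# A faithful round trip leaves the bytes unchanged.
assert decoded == original
```

When encoding or decoding is interrupted or performed incorrectly, this equality no longer holds, which is the source of the decoding errors discussed below.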
While the majority of encoding and decoding occurs properly, sometimes a file will need to be manually decoded using native tools and procedures (i.e., from a software vendor such as Microsoft) or by using a 3rd party forensics solution. Regardless of the system used there is always a risk that decoding errors occur.
In the end it is important that those involved in collection recognize the fact that an element of risk is always involved whenever data translation/conversion is undertaken, and that these transformations need to be understood and mitigated in whatever way is appropriate.
Scalability of the collection mechanisms is paramount. Performing a comprehensive series of searches across any company's infrastructure can involve searching potentially large amounts of data, often resulting in tremendous volumes of data to be de-duplicated and culled, reviewed and redacted.
Because of the uncertainty of many data search results, in many situations it is simpler and less-risky to break a large search into several smaller searches. This will help avoid running out of memory or other glitches during a search. Of course breaking things into bite-sized chunks requires careful management of the overall search process to ensure nothing is overlooked or otherwise left out.
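The batching approach described above can be sketched simply. The custodian names here are placeholders; the point is that the smaller searches, taken together, must provably cover the original search with nothing overlooked:

```python
def batches(items, size):
    """Split one large search into several smaller ones of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical list of custodians for one large search.
custodian_list = [f"custodian_{n}" for n in range(10)]
runs = list(batches(custodian_list, 4))

# Careful management: verify the smaller searches cover everything exactly once.
assert [c for run in runs for c in run] == custodian_list
```

The final assertion is the "careful management" step: a reconciliation that confirms the union of the smaller searches equals the original scope.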
Regardless of the collection method employed, strict chain of custody records must be maintained for all documents, data, and objects collected so that their authenticity can be assured. Without this assurance the data may not be reliable as evidence in litigation. Every collector, whether a third party vendor, an internal corporate representative or outside counsel representative, should document procedures for accepting, storing, and retrieving documents, in the event that he or she may be called upon to testify. (Cricket Technologies: Case Index)
Chain of custody records should be maintained for every "touch" of each item by a search operator. Because the volume of audit history that a large-scale collection project generates can be enormous, selecting tools and processes with automated audit history and the scalability to handle all the audit data is extremely important. Technologies such as Windows Event Logs or Syslog have been proven to scale adequately. Numerous native and third-party solutions exist to parse through, analyze, summarize and report on those types of data logs.
Audit history logs should ideally include a simple means of reporting on: Searches by custodian; Searches by operator; Search list; Searches by keyword/phrase/concept; and Searches by project. The chain of custody of actual data collected is an outgrowth of the tracking method employed through the identification of custodians. Correct identification record keeping coupled with correct chain of custody should create a seamless link from the targeted organizations through possible custodians, actual custodians, and finally, data collected and preserved.
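The reporting requirements above can be sketched with a few lines of Python. The log entries and their field names are invented for illustration; in practice they would be parsed from Windows Event Logs, Syslog, or a collection tool's own audit trail:

```python
from collections import Counter

# Hypothetical audit entries: one record per search a collection operator runs.
audit_log = [
    {"operator": "company\\alee", "custodian": "jsmith", "keyword": "merger",      "project": "P-001"},
    {"operator": "company\\alee", "custodian": "jsmith", "keyword": "acquisition", "project": "P-001"},
    {"operator": "company\\bkim", "custodian": "rdoe",   "keyword": "merger",      "project": "P-001"},
]

def report_by(field: str) -> Counter:
    """Summarize search counts by custodian, operator, keyword, or project."""
    return Counter(entry[field] for entry in audit_log)

searches_by_operator = report_by("operator")
searches_by_keyword = report_by("keyword")
```

Each of the reports named in the text (by custodian, operator, keyword/phrase, or project) reduces to the same grouping operation over the audit history.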
Any collection, whatever the method, should be accompanied by detailed documentation. Different collecting organizations have different chain of custody methods and tracking forms. For example, a hard drive computer forensic expert will normally have chain of custody forms that resemble law enforcement documents whereas a company in the act of shipping thousands of tapes will have documentation resembling a spreadsheet.
Chain of custody for original media, such as hard drives, backup tapes, CDs, DVDs, etc., should include at a minimum: A unique media ID - this becomes the core tracking number for all of the information extracted from the media; Date and time of receipt or collection of the evidence; The name of the person(s) collecting and/or taking possession of the evidence; A description of the type of evidence (8mm tape, hard drive, etc.); A description of what the evidence represents (Exchange server-Chicago); Any label information (exact); Serial numbers; Description of the physical location at the time of possession; Areas for transferring possession of the media within the collecting organization or to a vendor; Description of collection methodology; Detailed description of data harvested on site; and Check lists related to any on-site filtering of data during collection. (The Sedona Conference: Electronic Document Retention and Production)
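The minimum fields listed above map naturally onto a record structure. The following sketch is a hypothetical illustration of such a record, not a prescribed schema; every field name and sample value is invented:

```python
from dataclasses import dataclass, field

@dataclass
class MediaRecord:
    """Minimal chain-of-custody record for one piece of original media."""
    media_id: str       # unique media ID - the core tracking number
    received: str       # date and time of receipt or collection
    collector: str      # person collecting / taking possession
    media_type: str     # e.g. "8mm tape", "hard drive"
    represents: str     # e.g. "Exchange server - Chicago"
    label: str          # exact label information
    serial_number: str
    location: str       # physical location at time of possession
    transfers: list = field(default_factory=list)  # possession hand-offs

    def transfer(self, to_whom: str, when: str) -> None:
        """Record a transfer of possession within the organization or to a vendor."""
        self.transfers.append({"to": to_whom, "when": when})

rec = MediaRecord("MED-0001", "2007-03-01 09:15", "A. Lee", "hard drive",
                  "Exchange server - Chicago", "HR-EXCH-01", "SN123456",
                  "Server room B")
rec.transfer("Vendor X", "2007-03-02 14:00")
```

Every hand-off appends to the record rather than overwriting it, so the full possession history remains reconstructible.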
Large collection projects that are conducted over long periods of time with many custodians often call for a database application of some kind to track information about the collected data. This database can be part of the identification tracking system, which would ensure dynamic integration between identified custodians/key players and their collected data.
A best practice during the collection process is to assign a single point of contact ("SPOC") to control the chain of custody record keeping. A SPOC assures that media IDs will be unique and that other information will be maintained consistently.
Correct chain of custody must show consistent application of the directions of the identification team through the collection process. Chain of custody records should include the following: Detailed and descriptive check lists for any filtering, either manual or automated, performed on site or at a processing facility; Any logs or print-outs of the contents of a custodian's data storage showing files collected and not collected - these logs should be maintained with the chain of custody forms; Reports detailing the progress of any automated collection application should be maintained; Any refusal on the part of a custodian to release data should be documented; and Information as to selection of particular records or objects from systems such as databases should be documented in full. (Pike & Fischer: Digital Discovery and e-Evidence)
Particular care must be taken regarding chain of custody when collecting on-site: The collector should verify the information collected by the identification team with the custodian - corrections should be documented; Any partial collecting of data must be documented, in full, as detailed above; Chain of custody documentation should include a form for the custodian to sign and date showing his or her participation, if any; and In the event that the custodian's work has been identified for on-going collection, the collector must be able to identify what was collected on each visit and not collect the same, unchanged data, multiple times.
In the case where search terms or methodologies are re-used for more than one investigation, additional metadata, including at a minimum a "project identifier," should be stored with all search audits. The project identifier needs to be included on all reports, and reports filtered by project should also ideally be available.
During the collection process, relevant data must be collected from each organization or custodian for the collection to be deemed appropriate and defensible. Costs and benefits must be weighed to determine what is appropriate and reasonable for the litigation. Courts are increasingly being asked to rule on what must be produced when the parties have difficulty agreeing on the appropriate balance. As technology advances, the amount of data that can reasonably be reviewed is increasing; so too is the amount of data that must be collected.
The collector must be careful to employ all specifications from the identification team in order to collect all the targeted data and only the targeted data. Identifiers often target data by: user/owner or location; date; type of systems or files; and keywords contained in files or systems. Regardless of the identification method, however, nearly all data must be collected from media or a network.
There are three primary categories of data capable of being collected: fixed storage, portable storage, and third-party hosted storage environments not under the direct control of the data owner.
Hard Drives are a primary source of fixed electronic storage. Hard drives exist in a number of computing environments. These include: Network servers; Backup systems such as RAID devices; Computer workstations; Desktop computers; and Home computers.
Portable storage includes a broad category of electronic equipment using a wide variety of electronic storage media. Traditionally, most portable data was contained on some type of electronic storage disk, such as a CD, DVD, floppy disk or Zip disk. Backup tapes are another traditional portable storage medium. More recently, specialized portable storage devices have been created for a variety of uses. The number of portable storage devices has grown tremendously with the advent of PDAs (Personal Digital Assistants) such as Palm Pilots and Blackberry devices. Cell phones and iPods are portable electronic devices that can contain relevant data. Newer devices incorporate the functions of both PDAs and cell phones. These devices typically store data on a USB drive or other flash memory device. Other types of portable storage devices function much like a portable hard drive, using a variety of storage technologies.
Because portable storage disks and devices are by definition portable and because they are typically not connected to a fixed data storage facility, the collection of data from these devices is laborious and often requires the original device from the custodian.
The electronic data of many organizations and individuals are housed by third parties. Traditional third-party storage sites include accountants and law firms who house many records of an organization. Increasingly many organizations use internet-based services to host a portion of their data. These can include sales information, customer service records and a variety of other online data services. In addition, many individuals utilize online email services such as Gmail and Hotmail. Many organizations utilize instant messaging services such as AIM, ICQ, Yahoo and MSN that are hosted online. In addition, many companies provide automatic back-up services for the networks of large and small organizations.
One of the biggest challenges facing a collection and preservation exercise is determining what collection methods are required or advisable. Each type of data storage requires a different strategy and approach. For example, in fixed storage a manual copy is often utilized. This is sometimes referred to as the "drag & drop" or Windows method for copying data. A manual copy is done by copying selected files and/or directories, then pasting them to a network folder and/or CD. The copying is either done by the computer user or the corporate IT staff. It is unavoidable for directory level meta-data to be modified during the "drag and drop" process. This is the meta-data maintained by the operating system (e.g. Microsoft Windows) such as "Create Date," "Last Modification Date" and "Last Access Date." These are external meta-data fields and will not alter the internal meta-data unique to the file type (unless the file is physically opened). The primary concerns with the manual copy method are that the quality of the data is completely dependent on the operator performing the function and the ability to verify/authenticate the process is limited. (Federal Judicial Center: Materials on Electronic Discovery) Opposing counsel may attack the collection process and the integrity of the data as a litigation strategy. Improper collection methods may alter the content of electronic discovery collected and, in the worst case, jeopardize the admissibility or reliability of the evidence collected. By following the strict collection methods specified within ASCLD-LABS International standards, corporations can significantly mitigate the risk of conducting an improper collection.
The Active Data Copy process is designed to capture all of the "active data" on a media. Active data is the information readily available as normally seen by an operating system, including "hidden" operating system files. Traditionally, tools such as Norton Ghost are used to make active data copies when transferring files from one computer to another. This is a generally accepted collection methodology when the primary concern is the content of the data and not the activity of the user. Deleted files are not active data, as they are not seen by the operating system. The active data copy process alters the directory level meta-data while maintaining the internal file meta-data. A significant limitation of this collection method is that it fails to capture so-called "inactive data," generally data that has been deleted or modified. If there are any allegations in a case regarding a data custodian's deliberate or inadvertent destruction of data, the active data copy process will not provide evidence to confirm or refute such allegations. Using non-forensic tools such as Zcopy or Ghost through a network can be acceptable. The important issue is to define the methodology for the collection and use it consistently throughout the collection with the appropriate checks and balances. (ISO/IEC 17025:2005 General Requirements for the competence of testing and calibration laboratories) Once again, using the standards specified by ASCLD as a guideline will help to significantly mitigate this risk.
The "forensic image process" is the process of creating a "mirror image" copy of a media so that both active and inactive data sets are maintained. This collection methodology is generally required when the activity of a user is as important as the content itself or when concerns regarding the destruction of data may be raised. The forensic image process is the process utilized by law enforcement, and it has been utilized in court by ASCLD certified crime labs. Although the financial and manpower cost of obtaining ASCLD certification may appear overly extreme, the hefty sanctions in cases such as Zubulake v. UBS Warburg should make corporations think twice about dismissing the idea. (Xact: Case Law)
In the forensic image process, court accepted forensic tools are used to capture every bit of electronic information stored on a hard drive. The captured information is stored in a "forensic image" which is generally encrypted and can be password protected. Authentication of the data can be verified using a "hash value", otherwise known as a digital thumbprint, to ensure that no alteration has taken place from the time of the acquisition. Hard copy "chain of custody" documentation captures important logistical information related to the acquisition of the data and is typically used to corroborate the time, place, and personnel involved in the data collection.
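The "hash value" verification described above can be sketched in a few lines of Python. This is a minimal illustration using SHA-256 on in-memory bytes; real forensic tools compute and record these hashes automatically during acquisition, and the image contents here are hypothetical:

```python
import hashlib

def hash_image(data: bytes) -> str:
    """Return the digital thumbprint (here SHA-256) of an acquired image."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical acquired image; hash recorded at the time of acquisition.
acquired = b"raw bytes of the forensic image"
acquisition_hash = hash_image(acquired)

# Re-hash later, e.g. before producing the evidence in court.
verification_hash = hash_image(acquired)

# Matching hashes show no alteration has taken place since acquisition.
assert acquisition_hash == verification_hash
```

Because any single-bit change to the image produces a completely different hash, a matching value is strong evidence that the data is unaltered since acquisition.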
The forensic image process is highly detailed and generally requires that a trained forensic specialist perform this function. Substantial documentation is created and maintained so that the party performing this task may be prepared to testify concerning the adequacy and reliability of the process. As a result, it is generally recommended as a best practice to have the forensic image process performed by an objective third party to avoid any process attacks or spoliation concerns.
Organizations generally build their server architecture around redundancy and disaster recovery, so they employ technologies such as software and/or hardware RAID (shorthand for Redundant Array of Independent Disks) to prevent data loss. RAID systems use multiple hard drives and write data to those disks so that in the event of a hard drive crash, the bad drive can be replaced with no data loss to the whole system. The primary concern with creating forensic images from RAID systems is that the original RAID configuration can be very complex and re-building it can be very complicated, especially if there is a combination hardware/software RAID.
Using a network forensic package may be a good methodology for retrieving data from a network. This process will capture the unallocated and slack areas of the network. These software packages are generally expensive and difficult to implement due to network firewalls and other pre-existing security protocols. These tools are very beneficial for doing a hybrid collection methodology as you can then use a forensic tool to selectively identify targeted types of information from workstations and network volumes, such as email files, Microsoft Office documents, Adobe Acrobat files, etc. (EDRM: The Electronic Discovery Reference Model)
Collection and preservation from network servers can be problematic due to the complex hardware architecture of an organization. In the event of dealing with complex server systems, which can be costly to capture and extract data, a best practice process can be implemented to manage the time and cost of collecting and preserving information from servers. The Supervised Tape Archive process entails plaintiff and defense counsels' agreement on the selection of a mutually acceptable expert or experts who observe the full archival of a computer system to a backup tape. The expert does not implement this process, but only looks over the shoulder of the organization's IT personnel responsible for the normal back-up procedures. When the process is completed the expert then takes immediate custody of the backup tapes with the appropriate chain of custody documentation.
A limitation to this process is that deleted data resident on the hard drives of the network servers will not be captured. This limitation generally does not affect the recovery of email system servers, file servers, and database servers, but certain information, such as partially overwritten data, may not be preserved. To overcome this limitation, it may be possible, depending on the hardware and software configuration of the servers, to create forensic images of the hard drives used in the network.
Some of this discussion overlaps with the above; for instance, laptop computers have hard drives, so the discussion above on hard drives applies, except that the method of securing the hard drive may be different. The methods for handling PDAs, cell phones and DVDs, however, differ from the discussion above. The backup tape discussion below appears in the Portable Storage section because backup tapes can be easily shipped.
Backup tapes have historically been used for disaster recovery purposes and not as a primary repository of electronic evidence. Some notable cases assumed that backup tapes were beyond the scope of normal discovery because of their traditional use and because of the cost and difficulty in retrieving specific information from them. However, as technology has reduced the cost and difficulty of accessing data from backup tapes, they have increasingly been subject to discovery. If important questions remain involving historical data that is no longer available on a live system, or if the tapes fall under the definition of "ordinary course of business," they are now more likely to be discoverable as evidence. (Preston Gates & Ellis LLP: Electronic Discovery Law)
When discoverable electronic information is contained on backup tapes, understanding the backup system and its operating protocol is essential to collect all relevant evidence and to help control costs and time. When balancing the relevance of the potential information contained on backup tapes against the costs of processing those tapes, the type of backup software and tape type need to be identified to understand the complexity of the exercise.
"Simple backup" types are backup sessions that are linear in nature and are most commonly associated with one tape drive per computer. Examples of simple backup types are ARCserve, NT native backup, and Backup Exec. Simple backup sessions start archiving the selected data in sequential order beginning with the primary root directory (e.g. C:\), and this information is then written to the tape in that order.
"Complex backup" types are backup sessions using enterprise level backup software, such as Legato NetWorker, CommVault Galaxy, Tivoli Storage Manager (TSM), OmniBack, and NetBackup. These systems employ a server-based backup system connected to a tape library. The backup server pulls packets (called "threads" or "streams") of information from the other servers connected to the backup system. The threads are numerically associated to the original server and stored onto the tape media in the tape library. Directly readable data is not stored on the tapes in these complex backup types; the data is contained within the threads on the tapes.
There are two methods for restoring data from a tape backup system: native environment restoration and non-native environment restoration.
The most common way internal IT departments extract data from backup tapes is using the native environment restoration ("NE restoration") process, which requires the original software (including patches), passwords, hardware and network configuration. The NE restoration process is most appropriate when dealing with a relatively small restoration and when the restoration will not affect the current active data. However, the NE restoration process can be costly and time-consuming if a large data environment needs to be recreated, especially if the time line for the restoration reaches far into the past. (Nelson Sharon., Olson, Bruce., Simek John., The Electronic Evidence & Discovery Handbook. American Bar Association. 2006)
The backup process known as non-native environment extraction ("NNE extraction") is the most common collection method employed by third-party vendors specializing in backup tape processing. The NNE extraction process may be used, to varying degrees, to extract the data from the simple backup as well as complex backup tape types.
The primary benefit of the NE restoration process is that the data is restored into its original operating environment, so it emerges fully functional and faithful to its original form. The primary benefit of the NNE extraction process is that it is relatively quick and less expensive to perform than the NE restoration process.
Collection - Cost Drivers
As many companies manage terabytes of potentially relevant information, it is essential to use a culling methodology to reduce the data to a responsive set of information. A company can have thousands of computers and hundreds of complex servers under its control, and not all of those systems contain data that is relevant to a pending litigation. Efficiently separating the relevant from the irrelevant is the key to controlling the costs of an electronic discovery project.
This process may be illustrated by the following example. A corporation generates three backup tape sessions each month for its email and file server information. Each backup contains approximately one terabyte (1 billion pages), much of which is duplicative information. There are 20 target users whose data is being requested for the period January 1, 2002 to December 31, 2007. The request is for their documents and correspondence related to an employee matter.
The first step is to extract the data using one of the tape extraction methodologies described above (NE restoration vs. NNE extraction).
The next step is to process the data so that it is searchable and reviewable by target user.
One methodology is to ingest all of the extracted data into a search engine and then apply search criteria to limit the data to the target users.
The other methodology is to locate the Target Users' data in its raw form and then extract that data for further processing.
In this example, a date filter should be applied to minimize the responsive data set. There are several things to consider when deciding on the date issues. The meta-data fields used for email date filtering should be "Sent Date" AND "Received Date." The meta-data fields used for user files should be "Last Modification Date" and "Create Date." (Note: these dates may be modified or not valid depending on the collection methodology.)
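A minimal sketch of this date filter follows. The field names are hypothetical stand-ins for the extracted metadata, and an item is kept here if either of its relevant date fields falls in the window, a conservative reading of the fields named above:

```python
from datetime import date

# Date window from the example request.
START, END = date(2002, 1, 1), date(2007, 12, 31)

def email_in_range(item: dict) -> bool:
    """Responsive if the Sent Date or Received Date falls within the window."""
    return any(START <= item[f] <= END for f in ("sent", "received") if f in item)

def file_in_range(item: dict) -> bool:
    """Responsive if the Create Date or Last Modification Date falls within the window."""
    return any(START <= item[f] <= END for f in ("created", "modified") if f in item)

assert email_in_range({"sent": date(2003, 5, 1)})
assert not file_in_range({"created": date(2001, 6, 1), "modified": date(2001, 7, 1)})
```

As the note above warns, a filter like this is only as reliable as the collection method that preserved the underlying dates.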
Keyword searching for the employment subject matter can then be applied by the processor of the data. It is recommended that some statistical sampling be done at this stage to determine whether the keywords may over-produce. Most vendors and tools can provide statistics prior to finalizing the keyword list so that you can be sure the information will be relevant.
Attorney review - After narrowing the dataset using the above technologies the quantity of data needing attorney review should be quite limited. This is important as attorney review can often be the most expensive part of preparing electronic documents for production. Typically the attorney review will identify relevant and non-relevant documents, identify documents that are privileged and therefore should not be produced and sometimes classify the documents by the issues pertinent to the case. There are a number of different choices available for the review stage including review of extracted text, native files or TIFF or PDF images. Regardless of the review approach used, it is important that the reviewers have the ability to examine the native file with the native application if necessary. (EDRM: The Electronic Discovery Reference Model)
Once the reviewers have selected the documents to be produced, the responsive data can be produced in numerous ways, such as paper, TIFF/PDF, load file with TIFF/PDF, native file, etc.
These steps are illustrated below:
1. Extraction (Tapes, Hard Drives, etc.)
2. Target User Filtering
3. Date Filtering
4. Keyword Filtering
5. Output #1: responsive data to Online Review Tool/Litigation Support Database
6. Privilege Review
7. Final Output
8. TIFF, PDF, Paper
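The culling steps above can be sketched as a simple pipeline. Every function and document field in this sketch is a hypothetical stand-in for the corresponding vendor tool or review stage:

```python
def cull(documents, custodians, start, end, keywords):
    """Apply the filtering steps in order, shrinking the set at each stage."""
    docs = [d for d in documents if d["custodian"] in custodians]      # target users
    docs = [d for d in docs if start <= d["date"] <= end]              # date filter
    docs = [d for d in docs if any(k in d["text"] for k in keywords)]  # keywords
    return docs  # -> online review tool for privilege review and final output

# Hypothetical three-document corpus standing in for the extracted tapes.
corpus = [
    {"custodian": "jsmith", "date": 2003, "text": "re: employee matter"},
    {"custodian": "jsmith", "date": 1999, "text": "employee matter"},
    {"custodian": "rdoe",   "date": 2003, "text": "lunch plans"},
]
responsive = cull(corpus, {"jsmith"}, 2002, 2007, ["employee"])
assert len(responsive) == 1  # only the first document survives every filter
```

Each stage removes documents the next, more expensive stage would otherwise have to handle, which is how the pipeline controls attorney review costs.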
Location of Data
The location of the data is an important factor in determining the strategy and costs affecting a collection methodology. Items that are not in daily use, such as CDs/DVDs, backup tapes, and removable hard drives, can often be sent to the law firm or the vendor for processing. However, for security, legal and other reasons, there are some things that simply cannot be sent. In many cases computer forensic vendors are asked to go into an organization and create forensic images of the target computers' hard drives so that the users either do not know that their computers were captured or so that their business day is not affected by the capture process. Backup tapes create special issues because they can be placed anywhere, including offsite locations. Accurate information about the organization's disaster recovery plan, and about any deviations or exceptions to that plan, is therefore essential. In a large organization, where a single disaster recovery system can use hundreds of backup tapes, it may be difficult to locate and process all of the backup tapes.
Other portable storage devices create a similar priority: to quickly identify, locate and determine the best approach to restore relevant data. Additionally, offsite data housed by third parties or at the home of a custodian presents unique challenges. Therefore, identifying the location and type of information to be collected is one of the most important steps in controlling costs in the electronic-discovery process.
Keyword searching, done properly, is truly an art form. It is recommended that an electronic evidence expert (with court experience) work closely with the legal team throughout this process. The process is designed to help the client design the most efficient search methodology in order to minimize the producible records (email, attachments, and user files) in their native format.
Minimizing the documents significantly reduces the cost of generating a review database (Concordance, Summation, an online repository), which is generally priced on the gigabytes input as well as the pages output. Also, by focusing on case-specific search requirements, documents not relevant to the discovery request will be excluded from the privilege review process, decreasing review time and minimizing the production of non-responsive records. (Ritter, Jeffrey., Worstell, Karen., Evaluating the Electronic Discovery Capabilities of Outside Counsel: A Model RFI. BNA Publishing 2006)
The use of an expert in this process allows the producing party to have a third party resource able to create affidavits and protocols justifying the methodology.
This also supports the use of statistical sampling. Suppose, for example, that one of the requested keywords is "cat", but "cat" returns 1,000,000 documents that upon closer inspection appear to have no relevancy. The legal team can then try different variations of keyword strings, such as (cat W/10 dog), which would limit the resulting documents contextually down to a reasonable 10,000 documents.
The general best practices process to vet a keyword list is:
1. Client provides first iteration of keyword requests (“Request List”) in Microsoft Word format;
2. Expert returns red-line version of Request List and discusses the logic requirements with the Client;
3. Client returns comments in red-line format;
4. Expert runs test and provides statistics on results;
5. Client signs off (via email) on final Request List;
6. Processing begins.
This process can be combined with the use of contextual searches as well in order to reduce the reviewable population.
There are many different types of search engines in use, with varying degrees of sophistication. Boolean searching is the most common way of querying data. Boolean logic refers to the logical relationship among search terms, and is named for the mathematician George Boole.
The most typical Boolean search operators are:
AND = cat AND dog = the document must have both "cat" and "dog"
OR = cat OR dog = the document can have either word or both words
“” = “catalog” = the exact word “catalog” must be in the document. Another example is “cat a log” = the exact phrase “cat a log” must be in the document, including the spacing.
NOT = cat NOT dog = the document must have "cat", but if the word "dog" is in the document, it will not be a responsive document
( ) = (cat AND dog) NOT (bird OR mouse) = grouping of words = the document must have both words "cat" and "dog", but if the words "bird" or "mouse" are in the document, it will not be responsive.
Wildcard = * = cat* = the document must have a word that starts with “cat” but can have any ending – i.e. “cat”, “catalog”, “cats”, “catastrophe”, etc.
W/ = proximity search = the terms must occur within a specific number of words of each other – i.e. (cat W/5 dog) = “cat” needs to be within 5 words of “dog”, either before or after – e.g. “the cat ate the dog” or “the dog crept up on the cat”.
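The AND and W/ operators above can be sketched in a few lines of code. This is a simplified illustration of the logic (whitespace tokenization, no punctuation handling), not a production search engine:

```python
def matches_and(doc, a, b):
    """Boolean AND: the document must contain both terms."""
    words = doc.lower().split()
    return a in words and b in words

def matches_proximity(doc, a, b, n):
    """Proximity (a W/n b): the terms occur within n words of each other,
    in either order."""
    words = doc.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

doc = "the cat ate the dog"
matches_and(doc, "cat", "dog")           # True
matches_proximity(doc, "cat", "dog", 5)  # True: the terms are 3 words apart
matches_proximity(doc, "cat", "dog", 2)  # False: 3 words apart exceeds W/2
```

The proximity check shows why (cat W/10 dog) narrows results so sharply compared with (cat AND dog): both terms must not only appear, but appear near each other.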
Other Search Options
The stemming process compares the root forms of search words and returns documents that contain words deriving from a common stem. For example, if a STEM search were applied to the word “instructional”, documents containing the words “instruct”, “instructs”, and “instruction” would also be returned. Stemming is significantly different from a wildcard (*) search, since the wildcard search is based on a group of characters rather than the linguistic analysis performed in a STEM search.
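A crude suffix-stripping stemmer conveys the idea; real engines use proper linguistic stemmers (such as the Porter algorithm), so the suffix list and length guard below are illustrative assumptions only:

```python
def simple_stem(word):
    """Crude suffix stripper -- a sketch of stemming, not a real stemmer."""
    for suffix in ("ional", "ions", "ion", "ing", "s"):
        # Only strip when a reasonable root (4+ characters) remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: len(word) - len(suffix)]
    return word

# "instructional", "instructs", and "instruction" all reduce to "instruct",
# so a STEM search on any one of them finds the others.
# Note the contrast with wildcards: cat* matches "catastrophe" on characters
# alone, while stemming would not relate "cat" and "catastrophe".
```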
Fuzzy searching allows users to find documents even if the word being searched is misspelled. A fuzzy search is done by means of fuzzy matching software that returns a list of results based on likely relevance even though the search string doesn’t exactly match. Fuzzy searches generally can be fine-tuned and ranked depending on the search engine being used. Fuzzy searches will be familiar to anyone who has worked with Optical Character Recognition (“OCR”) output from paper documents.
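Python's standard library includes a ratio-based similarity matcher that demonstrates the principle; the vocabulary and misspelling below are hypothetical, and commercial engines use their own tunable algorithms:

```python
import difflib

def fuzzy_matches(term, vocabulary, cutoff=0.8):
    """Return vocabulary words ranked by similarity to the (possibly
    misspelled) term; cutoff is the tunable similarity threshold."""
    return difflib.get_close_matches(term, vocabulary, n=5, cutoff=cutoff)

vocab = ["settlement", "statement", "sediment", "agreement"]
# The misspelled query "setlement" still surfaces "settlement" first.
results = fuzzy_matches("setlement", vocab)
```

Lowering the cutoff widens the net (useful for noisy OCR text) at the cost of more false positives, which is exactly the ranking/tuning trade-off described above.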
Depending on the search engine being used, Noise Words may become an issue. Noise Words are certain common words that are ignored by indexing or search engines. The most typical Noise Words include, but are not limited to: “the”, “and”, “of”, “his”, “my”, “when”, “there”, “is”, “are”, “or”, and “it”. The search strategy must take Noise Words into account prior to finalization of the term list, especially if opposing counsel is involved in the process. It may be very difficult to re-negotiate keywords based on a perceived technological deficiency. Each vendor and software package has different methodologies for dealing with Noise Words. It is important to discuss Noise Words before creating an index of the documents’ words.
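The effect of a noise-word list on indexing can be shown in a few lines. The list below mirrors the examples given above; actual vendor stop-word lists vary, which is precisely why the topic must be raised before indexing:

```python
# Noise words from the discussion above; every vendor's list differs.
NOISE_WORDS = {"the", "and", "of", "his", "my", "when",
               "there", "is", "are", "or", "it"}

def indexable_terms(text):
    """Tokens that would actually make it into the search index."""
    return [w for w in text.lower().split() if w not in NOISE_WORDS]

indexable_terms("the cat and the dog")  # only "cat" and "dog" are indexed
```

One practical consequence: a negotiated phrase search such as "cat and dog" may silently behave as "cat dog" on an engine that drops "and", which can look like a deficiency if not disclosed up front.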
Asian (or Double-Byte) character searches are another challenging piece of the keyword search puzzle. In general it is imperative that the environment be set up properly to maximize the searching of international Unicode information. Asian and foreign language projects are generally handled by specialists due to the complexity of the process. Most vendors can index MS Outlook mail as well as the typical user file types (i.e. Office, HTML, PDF, etc.). Once the data has been indexed by the appropriate search engine, the provided keywords can be applied. The most common litigation support output for Asian language cases is to create a load file with the associated TIFF/PDF and meta-data. Depending on the data set, vendors may be able to provide the OCR text of the English characters and possibly the Asian characters.
If necessary, Summary Translation services can be provided. This is the process where native speakers review the document and code information, including a summary of the document. This is generally a fairly expensive process, but works well when the language can't be indexed or searched via the normal lit support review tools.
As for keywords, the search terms should be provided in exactly the form in which they need to be searched, in a Microsoft Word document (Word supports Unicode characters that can be exported to the search tool). One of the challenges of dealing with Asian languages is that spaces are often not used between words, so words run together. That makes complex searches such as "exact phrase" or "cat AND dog" very difficult. The general recommendation, depending on the search criteria, is to create single keywords wrapped in wildcards, such as "*Johnny*", so that iterations like "johnnysmith" or "johnnysaidhelikestosearchdata" will be found. Search experts will work with the legal team to refine the terms once the data is indexed. (Discovery Resources: Case Law)
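The "*Johnny*" recommendation reduces, in effect, to a substring match, which works even when word boundaries are absent. A minimal sketch, using the run-together examples from the text:

```python
def wildcard_contains(term, documents):
    """*term*: case-insensitive substring match -- no word boundaries
    required, which is why it survives run-together Asian-language text."""
    t = term.lower()
    return [d for d in documents if t in d.lower()]

docs = ["johnnysmith", "johnnysaidhelikestosearchdata", "janedoe"]
wildcard_contains("johnny", docs)  # matches the first two documents
```

The trade-off is over-inclusiveness: a single-character or very short term wrapped in wildcards can match enormous portions of the population, so expert refinement after indexing remains essential.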
Old Technology/Legacy Systems and Databases
Another critical factor affecting the cost of a collection effort is the currency of the technology involved. Many cases involve legacy databases that use outdated or obsolete technology, including outdated operating systems and hardware. When this is the case, unique, customized solutions are often required to collect potentially relevant data. This normally requires extensive reliance on the company’s IT staff, who would typically have access to the legacy operating systems and hardware.
Even when legacy data can be read and used, a database by definition contains a large quantity of data that is typically unformatted and becomes useful only when it is put into a report. Database entries and complex table structures appear nonsensical if only the raw data is provided. What is usually relevant are the reports and queries populated by the information in the databases. Many database systems do not permit the creation of customized reports containing the information in the form that is deemed potentially relevant in the litigation. Therefore, a third party is often needed to write customized software to extract data from various locations in the database and create a formatted document that can be reviewed.
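The gap between raw database rows and a reviewable report can be sketched briefly. The table, column names, and report format below are entirely hypothetical (a real legacy system would need a custom connector rather than SQLite), but they show the kind of extraction-and-formatting step a third party is retained to perform:

```python
import sqlite3

# Hypothetical stand-in for a legacy table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "Acme", 1200.0), (2, "Globex", 830.5)])

# Raw rows mean little to a legal reviewer; the formatted report is
# what actually gets reviewed and produced.
rows = conn.execute(
    "SELECT id, customer, amount FROM orders ORDER BY id").fetchall()
report = "\n".join(f"Order {i}: {c} -- ${a:,.2f}" for i, c, a in rows)
print(report)
```

Even this toy example makes the cost driver visible: someone must understand the schema well enough to write the query and the formatting logic before a single document can be reviewed.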
Because of the costs involved in a customized approach to databases, an organization may be tempted to make portions of a database available to the opposing party, so that the opposing party bears the costs of securing the data it believes may be potentially relevant in the litigation. Great caution should be taken in agreeing to turn over non-reviewed databases to opposing counsel. Whatever cost may be saved with this approach could be overwhelmed by the ultimate outcome of the case. (Applied Discovery: Case Summaries)
When dealing with older technology, legacy systems or databases, it is important to identify the complexity of the collection and production of that information as early as possible. It may be necessary to secure outside experts who can provide testimony regarding the complexity, timeliness or cost of securing legacy information. If the parties to the case cannot agree on a reasonable approach to this problem, a court may need to render judgment on the appropriateness and limits of collecting and producing legacy data. The courts are split over granting the requesting party access to run searches against the producing party’s databases. See, for example, In re Honeywell Int’l, Inc. Securities Litigation, 2003 WL 22722961 (S.D.N.Y.) (for providing access); In re Ford Motor Company, 345 F.3d 1315 (11th Cir. 2003) (against providing access).
Spoliation Risk Mitigation by applying ASCLD-LABS International Standards
In closing, although every litigation is different, the process of collecting, preserving and presenting information in a forensically sound manner can best be achieved by following tried-and-true standards. The standards developed by ASCLD-LABS International have withstood the scrutiny of the criminal justice system for several decades. Given their test of time, these standards can effectively serve as a guide for corporations seeking to mitigate the risks associated with eDiscovery.
1. Nelson, Sharon, Olson, Bruce, and Simek, John. The Electronic Evidence & Discovery Handbook. American Bar Association, 2006.
2. ASCLD/LAB-International Supplemental Requirements
3. ASCLD/LAB-International Field Assessment Guide
4. ASCLD/LAB-International Conformance File
5. ASCLD/LAB-International Program Overview Document
6. ISO/IEC 17025:2005 General requirements for the competence of testing and calibration laboratories
7. Applied Discovery: Case Summaries
8. Cricket Technologies: Case Index
9. Discovery Resources: Case Law
10. Federal Judicial Center: Materials on Electronic Discovery
11. Kroll Ontrack: Case Law List
12. Michael Arkfeld: Electronic Discovery and Evidence
13. Pike & Fischer: Digital Discovery & e-Evidence
14. Preston Gates & Ellis LLP: Electronic Discovery Law
15. The Sedona Conference: Electronic Document Retention and Production
16. The Sedona Conference: International Electronic Information Management, Discovery and Disclosure
17. Xact: Case Law
18. Ritter, Jeffrey and Worstell, Karen. Evaluating the Electronic Discovery Capabilities of Outside Counsel: A Model RFI. BNA Publishing, 2006.
19. EDRM: The Electronic Discovery Reference Model