Michael E. Byczek
Software Engineer

Discovery of Electronically Stored Information (ESI)

Parties to litigation acquire information about the case through a process called discovery, in which specific documents or data are requested pursuant to rules of court procedure. When the requested material is electronically stored information (ESI), the process is referred to as e-discovery. ESI includes resources such as e-mail, ISP records, hard drives, word processor files, databases, cell phones, removable USB drives, video, photographs, voice recordings, and access logs.

Attorneys and courts rely upon technical consultants to serve as forensic examiners (e.g. retrieve deleted files from a hard drive), I.T. analysts (e.g. identify key components of an opponent’s technological infrastructure and data formats), and review specialists (e.g. develop search strategies that scan millions of files and corresponding metadata to locate the “smoking gun” that will win a lawsuit).

The e-discovery process consists of nine stages, as outlined by the Electronic Discovery Reference Model.

Information Management: Corporate e-retention policies anticipate litigation by mitigating the risks and expenses of complying with court-ordered discovery.

Identification: Potential sources of ESI are located and described.

Preservation: Prevent inappropriate modification or deletion of data during the discovery process without interfering with business or personal activities.

Collection: Acquire ESI for review while maintaining data integrity and original formats.

Processing: Filter through ESI to reduce the volume of data in accordance with attorney instructions and review requirements.

Review: Evaluate data for relevance and privilege.

Analysis: Search for key patterns, topics, individuals, content, and context.

Production: Deliver ESI to required parties in requested formats.

Presentation: Display results in native or near-native form at trial, court hearings, or depositions.
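The Collection stage above calls for maintaining data integrity. One common way to demonstrate that collected files have not changed is to record a cryptographic hash of each file at collection time and recompute it later. The following is a minimal sketch of that idea, not a description of any particular product; the function names build_manifest and verify_manifest are hypothetical, and a real collection would also record timestamps, custodians, and chain-of-custody details.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (as a relative POSIX path) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return the paths that were added, removed, or modified since
    the manifest was built (an empty list means nothing changed)."""
    current = build_manifest(root)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

A manifest built at collection time can be stored with the case file and checked again before production; any path returned by verify_manifest would need to be explained.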

The Sedona Conference Working Group Series is another source of model e-discovery and e-retention policies that address the challenges posed by ESI during litigation.

eDiscovery Software Tools

Relativity (kCura)

Designed from a data analysis point-of-view to reduce risks and help control costs associated with identifying, collecting, and analyzing electronic data. This includes litigation, internal investigations, and government requests (FOIA). One issue is how to restrict access to personally identifiable information (PII) during litigation and compliance requests.

Six steps are outlined: legal hold; collection; processing; review and analysis; production; and case construction. An investigative approach includes looking at data in layers, uncovering data patterns, and taking immediate action. Relativity uses a NoSQL data store so that searching can begin while data is still being loaded into the system. Folder structure, file names, and system metadata on a custodian machine can be viewed quickly and remotely. Collection can take place remotely via email, within Relativity on a machine, or with a USB drive.

Features: full metadata and container extraction; domain parsing; native application imaging; live details about files being processed; computer-assisted review; email threading; data grouped by concept; custom workflows that can be created and automated; and assessment and collection of custodian data with minimal disruption to the client. A chronological timeline shows how the facts and evidence are connected as documents are located. These software-based visualization tools provide the ability to spot trends, identify gaps in analysis, and understand complex decisions.
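Domain parsing, listed among the features above, generally means extracting the e-mail domains of senders and recipients so that reviewers can see which organizations took part in a conversation. The sketch below illustrates the general idea with Python's standard email package; it is not Relativity's implementation, and the names participant_domains, domain_counts, and SAMPLE_MESSAGE (along with the addresses in it) are made up for illustration.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Hypothetical message used only to illustrate the functions below.
SAMPLE_MESSAGE = """From: Alice Smith <alice@Example.com>
To: bob@vendor.example, "Carol Jones" <carol@example.com>
Cc: dave@lawfirm.example
Subject: Q3 contract

See attached.
"""


def participant_domains(raw_message):
    """Return the set of lower-cased domains found in the
    From, To, Cc, and Bcc headers of one raw message."""
    msg = message_from_string(raw_message)
    header_values = []
    for name in ("From", "To", "Cc", "Bcc"):
        header_values.extend(msg.get_all(name, []))
    domains = set()
    for _display_name, address in getaddresses(header_values):
        if "@" in address:
            domains.add(address.rsplit("@", 1)[1].lower())
    return domains


def domain_counts(raw_messages):
    """Count how many messages each domain participates in."""
    counts = Counter()
    for raw in raw_messages:
        counts.update(participant_domains(raw))
    return counts
```

Calling domain_counts over a mailbox gives a quick view of which outside organizations appear most often, which can help prioritize review.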

Concordance Desktop (LexisNexis)

Designed to manage an entire litigation portfolio to collect, sort, organize, search, review, redact, index, and prepare all eDiscovery. The import engine allows work to be performed while data is still being imported. Remote connectivity gives database access to co-counsel, expert witnesses, and others who do not have their own license for the software.

Features: manage and review litigation documents (e.g. email, PDFs, and scans); foreign-language searches; optical character recognition to scan text documents; and share data with other applications.

EnCase (Guidance Software)

Designed to give legal and IT teams technology-based solutions to avoid reinventing the wheel for every notice of litigation. Estimate costs and scope of potential litigation through continuous data assessment before any data is actually collected. Collect data from on-site and the cloud from email servers and document repositories.

Features used for legal hold issues include: notify custodians of being potential parties; reduce risk of spoliation; avoid missing production deadlines; automate legal notices and manager escalations; and meet preservation obligations.

Prior to litigation: identify potential custodians and data sources; understand proportionality of production requests; find out how much data needs to be collected; and obtain insight within minutes.

Avoid preservation sanctions by collecting data in a forensically sound manner with a strict chain of custody. Collect data from Amazon S3, Microsoft Office 365, Google Drive, Box, and Dropbox.

Data is processed with the ability to programmatically eliminate 90% of non-responsive data, remove redundancies, and distribute workloads across several dozen machines.
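Removing redundancies during processing is often done by hashing document content and keeping one copy per distinct hash. The sketch below shows that general technique only; the source does not describe how EnCase does it, and the function name deduplicate and the (doc_id, content) input shape are assumptions made for illustration.

```python
import hashlib


def deduplicate(documents):
    """Keep the first document seen for each distinct content hash.

    `documents` is an iterable of (doc_id, content_bytes) pairs.
    Returns (unique, duplicates): `unique` is the list of kept pairs,
    and `duplicates` maps each dropped doc_id to the doc_id that was kept.
    """
    kept_by_digest = {}
    unique = []
    duplicates = {}
    for doc_id, content in documents:
        digest = hashlib.sha256(content).hexdigest()
        if digest in kept_by_digest:
            duplicates[doc_id] = kept_by_digest[digest]
        else:
            kept_by_digest[digest] = doc_id
            unique.append((doc_id, content))
    return unique, duplicates
```

Recording which copy each duplicate maps to, rather than silently discarding it, keeps the record of which custodians held the document.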

Ringtail (FTI Technology)

Designed to provide a visual approach to document review and predictive coding for projects of any size. Built as a case management system to improve productivity and the creation, configuration, and tracking of all eDiscovery projects.

Visual analytics incorporate pivot tables and data mining to determine what is important, visualize trends, summarize data, see multiple decision points, and identify key facts. Predictive coding is used to improve speed, quality, and consistency of review projects. Multidimensional metadata analysis identifies the intersection of data to prioritize review. Features include: keyword search across millions of documents, attachments, metadata, and coding; concept clustering; visual review; export transcripts to Microsoft Word and Excel; and avoid near misses while searching for key or hot documents.

Eliminate import and export tasks using a single database for easy administration, tracking, and reporting for machine learning capabilities. The one-click validation report promotes the sharing of all coding results, including recall and precision statistics, to document and explain the predictive coding process. Predictive coding eliminates complex mathematics to answer basic legal questions, such as how many documents are needed for a particular project. New enhancements include populations and samples listed in a collapsible tree grid; populations are created from binders and saved searches; and populations used to perform searches.
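The recall and precision statistics mentioned above, and the question of how many documents to sample, rest on standard formulas. The sketch below shows the textbook arithmetic under stated assumptions; it is not Ringtail's computation, and the function names precision_recall and sample_size, along with the default 95% confidence level (z = 1.96) and 5% margin of error, are choices made for illustration.

```python
import math


def precision_recall(predicted, actual):
    """Compute precision and recall from parallel lists of booleans,
    where True means 'responsive'. `predicted` holds the model's calls
    and `actual` holds the human reviewer's calls on the same documents."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


def sample_size(population, margin=0.05, z=1.96, proportion=0.5):
    """Sample size for estimating a proportion, using the normal
    approximation with a finite population correction."""
    n0 = z * z * proportion * (1 - proportion) / (margin * margin)
    corrected = n0 / (1 + (n0 - 1) / population)
    return math.ceil(corrected)
```

Precision answers "of the documents the model flagged, how many were actually responsive?" and recall answers "of the responsive documents, how many did the model find?". With these defaults, sample_size(10000) returns 370.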

Hewlett Packard eDiscovery

Designed to cover the entire electronic discovery reference model (EDRM) to avoid switching between different applications at different stages. Provides features for data processing, early case assessment (ECA), clustering, and visual analytics. Apply policy-driven information governance to know exactly what content is available and whether it is being retained for the right amount of time and for the right reasons.

Predictive technology to expedite analysis and accuracy of review. Analytical and early case assessment strategies to focus on relevant documents. Access all data across hundreds of file formats and hundreds of languages.

HP Legal Hold is a separate software package for case management, custodian lists, sending and tracking legal hold notices, interviews, data steward actions, automated collections, and in-place holds on data.


ZyLAB

First eDiscovery software offered in the Microsoft Azure Marketplace to identify, preserve, process, analyze, and review data stored on the Azure platform and Office 365. Installation in Azure includes connectivity to several data sources. The Azure cloud offers the ability to add CPUs, RAM, and hard disk space to existing nodes in the ZyLAB environment to accommodate large cases, or to scale back for small projects.
