As an active digital forensic practitioner for over 10 years, I have attended training from many different companies and resources, read many white papers published by scientific and academic entities, and worked hundreds of active cases for plaintiffs, defendants and law enforcement covering PC, Mac and mobile device forensics. One aspect that crosses all of these areas, and that has waned slightly in the last few years but still rears its ugly head, is the set of theoretical questions surrounding digital forensics. Among them are three we have all heard at one point or another: hash collisions, data cross-contamination and the reverse-engineering of hash values into viewable data files. While we can Google these theories and findings to death, the work of “everyday forensics” is reality-based, not theoretical.
The topic of hash collisions generally comes up when working independent analysis in criminal defense cases. This digital version of the “some other dude did it” (or SODDI) defense is based upon the theory that two digital files containing completely different data can be run through a hashing algorithm and produce the same result. Hash calculation is a big part of forensics. In cases dealing with child exploitation images in particular, hash values are used to locate those sharing illicit images on peer-to-peer file-sharing networks. However, we also use hash values to validate evidence files as identical to the original, to filter out irrelevant or known system files and to validate the authenticity of files across a system or multiple pieces of evidence. Hashing algorithms such as MD5 and SHA-1 have been “broken” for years, in the sense that researchers have published collisions for both, but they are still in ubiquitous use in digital forensics. Why? Because the practical application of these collisions is so minimal that it is not even worth mentioning in a court of law. But rest assured, it still gets mentioned! The only real application these collisions have is to attempt to obfuscate the facts and/or confuse the finder of fact in a legal proceeding. Simply put, there are no documented cases where someone was falsely accused of downloading or sharing illicit images because innocuous files they were attempting to download or share possessed the same hash value as illicit images. Consider the statistical likelihood that someone downloaded or shared an innocuous file which happened to share the same hash value as an illicit file, and that the same person was also on a police watch list where a search warrant was executed. All of those factors being in place at once is very unlikely.
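To put a rough number on the accidental scenario described above, the standard birthday-bound approximation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the figure of one billion distinct files are hypothetical, the hash is treated as if it behaved like a uniformly random function, and the estimate covers only accidental matches between unrelated files, not collisions deliberately constructed in a lab.

```python
import math


def accidental_collision_probability(num_files: int, hash_bits: int) -> float:
    """Approximate chance that any two of num_files distinct files share a hash.

    Uses the birthday-bound approximation p ~= 1 - exp(-n(n-1) / 2^(b+1)),
    which assumes hash outputs are uniformly distributed (an idealization).
    """
    exponent = -num_files * (num_files - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    # Hypothetical pool size chosen for illustration only.
    pool = 10 ** 9
    print("MD5   (128-bit):", accidental_collision_probability(pool, 128))
    print("SHA-1 (160-bit):", accidental_collision_probability(pool, 160))
```

Under these assumptions, a pool of one billion files gives a probability on the order of 10^-21 that any two of them collide by chance under MD5, and smaller still under SHA-1.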
While we are constantly testing, honing and refining our knowledge in the field of digital forensics and we may even work in a “lab”, the fact remains that at a practical level, none of us have the ability to re-create these collisions, nor have we seen them “in the wild”, so to speak. They are reserved for a theoretical lab environment where the sole purpose is to find and publish the collision, not to find and report the truth in the evidence.
Before I discuss the practicality of data cross-contamination, I’ll insert a disclaimer: I understand that using sterilized media to store forensic data and conduct analysis is listed as a best practice in the Scientific Working Group on Digital Evidence (SWGDE) Best Practices for Computer Forensic Acquisitions (v. 1.0). One of the reasons for this is to avoid data cross-contamination. What is that? It is the theory that if you store data to be analyzed on a piece of media in a forensically sound environment, and you do not sterilize the media first (i.e., wipe and validate it before placing the data on it), some data from a previous or unrelated case could become part of the current case analysis data, thus potentially contaminating the results with unrelated data. This is a viable theory when dealing with physical evidence such as DNA samples or fingerprints, but it has very little, if any, practical application in digital forensics. Consider this: if you create a forensic data file such as an .e01, raw or .zip file, by what mechanism, and with what likelihood, would copying that file onto a piece of non-sterilized media cause it to mix or commingle with pre-existing data? I’ve heard one claim of data cross-contamination from another examiner, but anecdotes are not data, nor was the claim ever validated. We sterilize the media, not because we’ve ever seen it affect any cases, but to avoid questions about it when testifying.
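The reasoning above can be checked directly: a forensic image is a self-contained file, and hashing it before and after it is copied to working media shows whether its contents changed, whatever the media previously held. Below is a minimal Python sketch, assuming SHA-256 as the algorithm. The function names and file paths are hypothetical, and real casework would normally rely on validated forensic tools rather than a script like this.

```python
import hashlib
import shutil


def hash_file(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination, algorithm="sha256"):
    """Copy source to destination and report whether the two hashes match."""
    expected = hash_file(source, algorithm)
    shutil.copyfile(source, destination)
    return hash_file(destination, algorithm) == expected


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    matched = copy_and_verify("evidence/image.e01", "/mnt/working/image.e01")
    print("Copy verified" if matched else "Hash mismatch: investigate")
```

If stray data on the destination media had somehow become part of the copied file, the second hash would not match the first.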
Hash Value Reverse-Engineering
Having obtained much of my initial training in law enforcement and, as such, having worked a majority of cases involving illicit images, I can recall being trained that catalogs of illicit image hash values are law enforcement sensitive and not to be disseminated to independent examiners or to the general public. Why? Because someone could potentially and theoretically reverse-engineer the hash value to re-create the file, which would be illegal. This came up again in a case I worked independently in 2019. I thought this theory and explanation were long gone, but they are not.
The problem with the theory of reverse-engineering a hash value is that I’m not sure it’s ever been done, at least not at a practical level. It is a theory. Scientists, academics and lab-rats may have done it, but I don’t know anyone who actively practices digital forensics who either 1) has the knowledge, skills and abilities to do it or 2) has the desire to do it. So why is it still mentioned as a consideration in cases? (Hint: see the above note about obfuscation and confusion.)
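One way to see why this remains theoretical: a hash value has a fixed length no matter how large the input is, so vastly more possible files exist than possible hash values, and the value alone cannot single out one file. The Python sketch below illustrates that counting argument with the standard hashlib module. The helper names and the 100 KB file size are assumptions chosen for illustration only.

```python
import hashlib


def digest_size_bits(algorithm: str) -> int:
    """Return the fixed output length, in bits, of a hashlib algorithm."""
    return hashlib.new(algorithm).digest_size * 8


def log2_files_per_digest(file_size_bytes: int, algorithm: str) -> int:
    """Return log2 of the average number of possible files per hash value.

    There are 2^(8 * size) possible files of a given size but only
    2^(digest bits) possible hash values, so the average is the ratio.
    """
    return file_size_bytes * 8 - digest_size_bits(algorithm)


if __name__ == "__main__":
    # The digest length is the same for a 1-byte and a 1,000,000-byte input.
    small = hashlib.md5(b"a").hexdigest()
    large = hashlib.md5(b"a" * 1_000_000).hexdigest()
    print(len(small), len(large))

    # Hypothetical 100 KB image file hashed with MD5.
    print(log2_files_per_digest(100 * 1024, "md5"))
```

For a 100 KB file and MD5, the second figure is 819,072, meaning that on average roughly 2^819072 different possible files of that size share each hash value.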
Wrapping It Up
I’m not an academic or a lab-rat. I’m just an old(ish) retired investigator with some skill sets that can often be of benefit to parties involved in litigation. Because of that, I’m concerned with the practicality of digital forensics: What is the best way to get the case analyzed? What evidence is relevant? Where do I need to look for the evidence? What am I missing that could potentially answer important questions? Theoretical considerations like those mentioned here are not worthy of much calorie-burning when trying to answer these questions. In the pragmatic world of digital forensics, we have to consider what is, not what could be, because the truth lies in the facts of the case and the data that is part of the case, not in theories about what could or may have happened, and likely did not!
Patrick J. Siewert
Professional Digital Forensic Consulting, LLC
Virginia DCJS #11-14869
Based in Richmond, Virginia
Available Wherever You Need Us!
We Find the Truth for a Living!
Computer Forensics — Mobile Forensics — Specialized Investigation
About the Author:
Patrick Siewert is the Principal Consultant of Pro Digital Forensic Consulting, based in Richmond, Virginia. In 15 years of law enforcement, he investigated hundreds of high-tech crimes, incorporating digital forensics into the investigations, and was responsible for investigating some of the highest jury and plea bargain child exploitation investigations in Virginia court history. Patrick is a graduate of SCERS, BCERT, the Reid School of Interview & Interrogation and multiple online investigation schools (among others). He is a Cellebrite Certified Operator and Physical Analyst as well as certified in cellular call detail analysis and mapping. He continues to hone his digital forensic expertise in the private sector while growing his consulting & investigation business marketed toward litigators, professional investigators and corporations, while keeping in touch with the public safety community as a Law Enforcement Instructor.
LinkedIn: https://www.linkedin.com/company/professional-digital-forensic-consulting-llc