Information Security News
Any engineer or physicist will tell you that entropy is like gravity - there's no fighting it, it's the law! However, both can be used to advantage in lots of situations.
In the IT industry, a file's entropy refers to a specific measure of randomness called Shannon Entropy, named for Claude Shannon. This value is essentially a measure of how unpredictable any given byte in the file is, based on how often each byte value occurs across the file (full details and math here: http://rosettacode.org/wiki/Entropy). In other words, it's a measure of the randomness of the data in a file - measured on a scale of 0 to 8 bits per byte (8 bits in a byte), where typical text files will have a low value, and encrypted or compressed files will have a high value.
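As a rough illustration of that definition, here is a minimal Python sketch of a per-byte Shannon entropy calculation. The function name shannon_entropy is my own (hypothetical) choice and this is not the code used later in this story; it simply applies H = sum(p * log2(1/p)) to the frequency p of each byte value.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value (0-255) occurs
    # H = sum(p * log2(1/p)) over the byte values that actually occur
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy(b"aaaaaaaa"))         # one repeated byte -> 0.0
    print(shannon_entropy(bytes(range(256))))   # every byte value once -> 8.0
    print(shannon_entropy(b"plain English text sits somewhere in between"))
```

A single repeated byte is perfectly predictable and scores 0, while data that uses all 256 byte values equally often scores the maximum of 8.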
How can we use this concept? On its own, it's not all that useful when you consider that many data file types (MS Office for one) are highly compressed, and so already have a high entropy value. So using entropy alone, there's no telling a good MS Office file from one encrypted by ransomware.
However, most data files have a specific file header, which includes a set of identification bytes (called "magic bytes") that identify what the data file is. For instance, those bytes for PKZIP files are "PK", and PE32 executable files use "MZ". You can identify these files by using the "file" program. While this command is native to Linux, there are nice ports of it for Windows.
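To make the magic-byte idea concrete, here is a small Python sketch that checks the first bytes of a file against a signature table. The table and the function name identify_magic are assumptions for illustration only; "PK" and "MZ" come from the paragraph above, "%PDF" is added as one more well-known signature, and the real "file" program relies on a far larger signature database.

```python
from typing import Optional

# Illustrative signature table (an assumption, nowhere near complete).
MAGIC_SIGNATURES = {
    b"PK": "PKZIP archive",
    b"MZ": "PE executable",
    b"%PDF": "PDF document",
}


def identify_magic(header: bytes) -> Optional[str]:
    """Return a type name if `header` starts with a known signature, else None."""
    for magic, name in MAGIC_SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    print(identify_magic(b"PK\x03\x04 rest of a zip file"))  # PKZIP archive
    print(identify_magic(b"\x8f\x1b\xd2\x07 no known header"))  # None
```

In practice you would pass in the first few bytes read from each file, and treat a None result as "type unknown".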
If "file" can't identify a file type, we can then use the entropy value as a second check. If a file is encrypted, it will have a higher entropy value than one that isn't. In this case you are looking for a file where the character distribution is random, or at least much more random than a normal data file.
You need both of these checks to identify suspect files - as mentioned, today's complex data files are becoming much more compressed. Office files are actually PKZIP compressed these days, so they will identify as PKZIP when checked with "file".
Using these two checks to identify suspect files depends on a couple of things:
Using these two checks, I started with a simple PowerShell script that copies a subdirectory from one location to another, but leaves all of the suspected infected files behind. A log file is created that lists all files copied and all files that are suspect and should be looked at. I used Sigcheck (from Sysinternals) to compute the entropy value. If the entropy is above 6 (8 is the maximum) and the magic bytes are unknown, I flag the file as suspect.
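The decision rule itself is tiny. Here is a hedged Python sketch of it, assuming you already have an entropy value and an identified file type (or None) for each file; is_suspect and the sample values are hypothetical and are not taken from the PowerShell script.

```python
from typing import Optional

ENTROPY_THRESHOLD = 6.0  # from the text: above 6 (of a maximum of 8) counts as high


def is_suspect(entropy: float, file_type: Optional[str],
               threshold: float = ENTROPY_THRESHOLD) -> bool:
    """Flag a file only when BOTH checks fail: unknown type AND high entropy."""
    return file_type is None and entropy > threshold


if __name__ == "__main__":
    # (name, entropy, identified type) - values are made up for illustration
    samples = [
        ("report.docx", 7.6, "PKZIP archive"),  # compressed but identifiable
        ("sensor.dat", 4.3, None),              # unidentified but low entropy
        ("encrypted.bin", 7.9, None),           # unidentified and high entropy
    ]
    for name, entropy, file_type in samples:
        verdict = "SUSPECT" if is_suspect(entropy, file_type) else "ok"
        print(f"{name}: {verdict}")
```

Only the last sample is flagged, which is the point of combining the checks: a compressed Office file passes on its magic bytes, and an unidentified low-entropy file passes on its entropy.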
And for a test ransomed file, I'm using Didier Stevens' sample file: ransomed.bin (see the bottom of this story for links).
I used PowerShell because most ransomware-affected shops that I've worked with have been infected in MS Office files and other Windows data files, so PowerShell seemed simpler than adding a Python install to a customer's crash IR situation. You could certainly take this same logic and code an equivalent Python script that would cover Windows, Linux and OS X.
After finishing my replication script and taking a step back, I realized a few things:
So I re-wrote the script to be more single-minded. The final script simply lists suspect files rather than copying them. And I wrote a short C program to compute the entropy value (thanks to Rosettacode for the starting point on this!), which simply spits out the numeric value, rather than lots of other stats I don't need for this job.
The final output list of suspect files can be used in a few ways:
int makehist(FILE *fh,int *hist,int len)
/* define a reasonable buffer to read the file - 1 byte at a time is too slow */
buflen = fread(
if ((fh = fopen(argv[1], "rb")) == NULL)
printf("Error opening file %s\n", argv[1]);
// hist now has no order (known to the program) but that doesn't matter
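Since only fragments of the C listing are visible here, the following Python sketch shows the same general approach under my own assumptions: read the file through a buffer rather than one byte at a time, build a 256-entry histogram, then compute the entropy from the histogram. The names make_histogram, entropy_from_histogram and the 64 KB BUFFER_SIZE are hypothetical, not taken from the C program.

```python
import math
import sys

BUFFER_SIZE = 64 * 1024  # assumed chunk size; the original buffer length is not shown


def make_histogram(path: str) -> list:
    """Count how often each byte value (0-255) occurs, reading in chunks."""
    hist = [0] * 256
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(BUFFER_SIZE)
            if not chunk:
                break
            for byte in chunk:
                hist[byte] += 1
    return hist


def entropy_from_histogram(hist) -> float:
    """Shannon entropy in bits per byte; the order of the counts is irrelevant."""
    total = sum(hist)
    if total == 0:
        return 0.0
    return sum((n / total) * math.log2(total / n) for n in hist if n)


if __name__ == "__main__":
    for name in sys.argv[1:]:
        try:
            print(f"{entropy_from_histogram(make_histogram(name)):.4f}  {name}")
        except OSError as err:
            print(f"Error opening file {name}: {err}", file=sys.stderr)
```

Run with one or more file names as arguments, it prints just the numeric entropy value next to each name.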
From this script, you can see that cleaning ransomware-infected files isn't an insurmountable problem. A simple script like this feeding rsync can be used to create a clean copy of a datastore, and identify suspect files. Just be SURE to keep up with evaluating that suspect file list - as noted, depending on your data store there might be lots of clean files in that list at the moment (give me a few weeks to improve this). If you run the script as-is to blindly feed rsync, you won't have a complete copy of your datastore.
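One way to feed a suspect list to rsync is through its --exclude-from option. The Python sketch below is an assumption about how that hand-off could look, not part of the published script: write_rsync_excludes, suspects.txt and the /data and /clean-copy paths are all hypothetical.

```python
import os


def write_rsync_excludes(suspects, source_root, exclude_path):
    """Write suspect files as exclude patterns anchored at the transfer root."""
    lines = []
    for path in suspects:
        # Make the path relative to the source root and use forward slashes.
        rel = os.path.relpath(path, source_root).replace(os.sep, "/")
        lines.append("/" + rel)  # a leading "/" anchors the pattern at the root
    with open(exclude_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + ("\n" if lines else ""))
    return lines


if __name__ == "__main__":
    root = os.path.join(os.sep, "data")
    suspects = [os.path.join(root, "finance", "q3.xlsx.locked")]
    print(write_rsync_excludes(suspects, root, "suspects.txt"))
    # Then, for example:  rsync -a --exclude-from=suspects.txt /data/ /clean-copy/
```

One caveat: rsync treats characters such as *, ? and [ in exclude patterns as wildcards, so file names containing them would need escaping before being used this way.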
As always, this was a one-evening coding effort, so I'm sure that there is more elegant PowerShell syntax for one thing or another. Also, you'll see that my C code chooses readability over efficiency in a few spots. For either piece of code, if you find any errors, or if you identify better syntax to get the job done, please do use our comment form and let me know. Of more interest, if you find this code useful in your environment and you want to see a version 2 - let me know in the comments also!
As I work on this code, you'll find the most up-to-date version at: https://github.com/robvandenbrink/Ransomware-Scan-and-Replicate
Didier's diaries on Ransomware and Entropy can be found here:
Earlier today, I posted a diary protesting an overall trend of calling ransomware infections "ransomware attacks". Unfortunately, that previous diary didn't include information on attacks that actually have involved ransomware.
Some tweets about my original write-up got me thinking about it some more...
Shown above: Commenting on the first diary, @fwosar discusses RDP attacks.
Distribution: both large-scale and targeted
As previously stated, I frequently find ransomware during daily investigations of exploit kit (EK) traffic and malicious spam (malspam) campaigns. However, my visibility is limited. I rarely, if ever, run across activity I consider a targeted attack. That field of view doesn't include ransomware infections seen after brute force attacks using Microsoft's Remote Desktop Protocol (RDP). Examples of brute force RDP attacks resulting in ransomware infections have been published as recently as May [2, 3] and June 2016.
Other sources have reported targeted attacks involving ransomware known as Samas, SamSa, or SamSam [5, 6, 7, 8, 9 to name a few]. Most of these write-ups say organizations in the health industry (as well as other industries) have been targeted. These reports document a trend where an attacker first gains unauthorized access to an organization.
Shown above: Diagram of a Samas infection chain from the Microsoft report.
That's certainly an attack.
I'd be crazy not to include this information when discussing my disdain for the term "ransomware attack." And it's something I foolishly omitted in my previous diary on the subject. Ransomware is, indeed, distributed in both large-scale and targeted campaigns.
Large-scale does not equal targeted
Most reports of ransomware infections, especially in the health care industry, imply some sort of targeted attack. But that's not always the case.
For example, in March 2016 we saw reports that a Kentucky-based Methodist Hospital was infected with Locky ransomware through malspam. The malspam contained a Word document with malicious macros masquerading as an invoice. The press played it up as an "attack," but malspam is a common tactic of large-scale campaigns distributing Locky, where some messages occasionally slip through spam filters. Even Krebs called it an "opportunistic attack" when reporting on the incident.
However, "opportunistic" is not "targeted."
In March 2016, Wired published an in-depth write-up on why hospitals are perfect targets for ransomware. In that article, the author discusses Methodist Hospital and other Locky incidents while including targeted attacks by criminals spreading Samsa ransomware. Although the author notes Locky involves "spray-and-pray" phishing campaigns involving mass emails, this method is still described as a "Locky attack."
Wired's article is well-written and worth a read. It includes plenty of detail on the reasons why health care organizations are at risk. But readers who only skim the article will miss some key points, and they could easily confuse large-scale Locky distribution with a targeted attack. In cases like this, I think authors should use "Locky campaign" instead of "Locky attack."
Even considering targeted attacks involving ransomware, I still feel we're putting too much emphasis on the attackers and not enough focus on fixing our own vulnerabilities.
Furthermore, I believe media reporting leads some people to confuse large-scale ransomware campaigns with targeted attacks.
The number of ransomware samples found in large-scale campaigns far outweighs the number of ransomware samples reported from targeted attacks. I still believe that, odds are, any given ransomware attack is probably the result of a large-scale campaign.
I'd rather see people use "ransomware incident" instead of "ransomware attack."
brad [at] malware-traffic-analysis.net