Now Mirai Has DGA Feature Built in
- 2016-12-09 first version
- 2016-12-12 fig-0 update, fix a TLD choosing error in our DGA implementation
Nearly two weeks ago, two new infection vectors (TCP ports 7547 and 5555) were found being used to spread MIRAI malware.
My colleague Genshen quickly set up some honeypots for these vectors and soon had his harvest: 11 samples were captured on Nov 28th. So far, 53 unique samples have been captured by our honeypots from 6 hosting servers.
When analyzing one of the new samples, my colleague Wenji found some DGA-like code and suspected that the sample had a DGA feature. The suspicion was soon confirmed by evidence collected from our sandboxes. Detailed RE work shows that a DGA feature does exist in the newly distributed MIRAI samples spread through TCP ports 7547 and 5555. In this blog I would like to introduce our findings. As a quick summary, the attributes of the DGA are as follows:
- 3 TLDs are used: online/tech/support.
- the L2 domain has a fixed length of 12 bytes, with each char randomly chosen from 'a'~'y'.
- the generated domain is determined by year, month, day and a hardcoded seed string.
- only one domain is generated per day, so the maximum DGA domain number is 365.
- the DGA domains are only used when the hardcoded C2 domains fail to resolve.
With this knowledge, we re-implemented the DGA in our own program and used it to predict all 365 possible DGA domains. When looking up their registration information, we found that some of them have already been registered by the MIRAI author. They are:
It is also worth noting that the author email@example.com has already registered another MIRAI C2 domain:
- zugzwang.me email firstname.lastname@example.org
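The prediction step described above can be sketched as follows. This is only a minimal illustration of enumerating one candidate per calendar day: `predict_domains` and `placeholder_generate` are hypothetical names, and the placeholder generator is not the MIRAI algorithm. Any re-implemented DGA with the same call shape could be passed in instead.

```python
from datetime import date, timedelta
from typing import Callable, Dict


def predict_domains(generate: Callable[[date], str], year: int) -> Dict[date, str]:
    """Map every calendar day of `year` to the domain `generate` returns for it."""
    predicted = {}
    day = date(year, 1, 1)
    while day.year == year:
        predicted[day] = generate(day)
        day += timedelta(days=1)
    return predicted


def placeholder_generate(day: date) -> str:
    # Hypothetical stand-in so the sketch runs; NOT the MIRAI DGA.
    return f"day-{day.month:02d}-{day.day:02d}.example"


if __name__ == "__main__":
    table = predict_domains(placeholder_generate, 2017)
    print(len(table), "candidate domains")
    print(table[date(2017, 12, 4)])
```

The resulting table can then be checked against registration records one domain at a time. For a non-leap year the table holds 365 entries.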
Sample and Analysis
The sample used as illustration in this blog is as follows:
- MD5: bf136fb3b350a96fd1003b8557bb758a
- SHA256: 971156ec3dca4fa5c53723863966ed165d546a184f3c8ded008b029fd59d6a5a
- File type: ELF 32-bit LSB executable, MIPS, MIPS-I version 1 (SYSV), statically linked, stripped
The sample is stripped but not packed. Based on our experience with previously found samples, we soon identified its main modules. Code comparison showed that its resolv_cnc_addr function has a very different CFG (control flow graph) from that of the previously found samples. The new CFG is shown in Fig-1.
At the beginning of the function, since as many as 3 C2 servers are hardcoded in the sample, a random number is generated to select one of the first two, as shown in Fig-2.
If the selected C2 domain fails to resolve, the bot does not go on to try the unselected one or the 3rd one directly. Instead, it checks the current date to decide whether to take the DGA branch or to resolve the 3rd C2 domain, as shown in Fig-3.
From the code snippets we can see that if the current date is between Nov 1st and Dec 3rd, the 3rd C2 domain will be used. Otherwise the DGA branch will be executed. This indicates that the author does not want the DGA domains to be used before Dec 4th, which is confirmed by the fact that the first registered MIRAI DGA domain corresponds exactly to Dec 4th.
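The decision flow of resolv_cnc_addr described above can be summarized in a short sketch. This is our reading of the control flow, not decompiled code: the function and parameter names are hypothetical, the date window is assumed to be inclusive on both ends, and what the bot does when the fallback also fails is not covered here.

```python
import random
import socket
from datetime import date
from typing import Callable, Optional, Sequence


def in_fallback_window(today: date) -> bool:
    """True from Nov 1st through Dec 3rd (assumed inclusive)."""
    return today.month == 11 or (today.month == 12 and today.day <= 3)


def try_resolve(domain: str) -> Optional[str]:
    """Return an IPv4 address for `domain`, or None if resolution fails."""
    try:
        return socket.gethostbyname(domain)
    except OSError:
        return None


def resolve_cnc(
    hardcoded: Sequence[str],
    today: date,
    dga: Callable[[date], str],
    resolve: Callable[[str], Optional[str]] = try_resolve,
) -> Optional[str]:
    # Step 1: pick the 1st or 2nd hardcoded C2 at random and try it.
    addr = resolve(hardcoded[random.randrange(2)])
    if addr is not None:
        return addr
    # Step 2: on failure, the date decides between the 3rd C2 and the DGA.
    fallback = hardcoded[2] if in_fallback_window(today) else dga(today)
    return resolve(fallback)
```

Passing the resolver and the DGA in as callables keeps the sketch testable without touching real DNS.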
The main DGA function is named dga_gen_domain. The domain is generated from a seed number and the current date. The seed is converted from a hardcoded hex-format string by calling strtol(). It seems that a wrong string, “\x90\x91\x80\x90\x90\x91\x80\x90”, was configured, which causes strtol() to always return 0.
The local date is obtained by calling the C library functions time() and localtime(). Only month and day are used here, as shown in Fig-4.
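To see why such a string converts to 0, the sketch below mimics the prefix-parsing behaviour of C's strtol() in Python. `strtol_like` is a hypothetical helper written for illustration, base 16 is our assumption based on the "hex-format" seed, and overflow handling is left out.

```python
import string


def strtol_like(text: bytes, base: int = 16) -> int:
    """Mimic C strtol() prefix parsing for bases 2..36 (no overflow handling)."""
    digits = (string.digits + string.ascii_lowercase)[:base].encode()
    i, n = 0, len(text)
    # Skip leading whitespace, as strtol() does in the C locale.
    while i < n and text[i:i + 1] in (b" ", b"\t", b"\n", b"\v", b"\f", b"\r"):
        i += 1
    # Optional sign.
    sign = 1
    if i < n and text[i:i + 1] in (b"+", b"-"):
        sign = -1 if text[i:i + 1] == b"-" else 1
        i += 1
    # Optional "0x" prefix for base 16.
    if base == 16 and text[i:i + 2].lower() == b"0x":
        i += 2
    # Consume digits until the first byte that is not valid in this base.
    value = 0
    while i < n:
        digit = digits.find(text[i:i + 1].lower())
        if digit < 0:
            break
        value = value * base + digit
        i += 1
    return sign * value


if __name__ == "__main__":
    # The first byte, 0x90, is not a hex digit, so nothing is consumed.
    print(strtol_like(b"\x90\x91\x80\x90\x90\x91\x80\x90"))
    print(hex(strtol_like(b"deadbeef")))
```

Because parsing stops at the first invalid byte, the configured string yields 0 whether it is stored as raw bytes or as literal backslash-escaped text.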
The L2 domain is generated by repeatedly executing the code block shown in Fig-5. Its length is determined by $t5 and $t2, which are set in Fig-4; from them we can tell that the L2 domain length is 12.
The TLD is determined by the remainder held in register $s0, as shown in Fig-6. We can see that 3 TLDs are used here.
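To make the overall shape concrete, here is a stand-in generator with the same observable attributes: a 12-character L2 label, one of three TLDs picked by a remainder, and output that depends only on the seed and the date fields. It is emphatically not the arithmetic used in the sample. The SHA-256 mixing, the function name `standin_dga`, the TLD order, and the 'a'~'y' alphabet taken from the summary are all assumptions made for illustration.

```python
import hashlib
import string

TLDS = ("online", "tech", "support")    # order is an assumption
ALPHABET = string.ascii_lowercase[:25]  # 'a'..'y', per the summary above


def standin_dga(seed: int, *date_fields: int, length: int = 12) -> str:
    """Deterministic stand-in: same (seed, date fields) always gives the same domain."""
    material = ",".join(str(v) for v in (seed, *date_fields)).encode()
    # SHA-256 replaces the sample's own state update; it is NOT what MIRAI does.
    state = int.from_bytes(hashlib.sha256(material).digest(), "big")
    label = []
    for _ in range(length):
        # One character per loop iteration, like the repeated block in Fig-5.
        state, index = divmod(state, len(ALPHABET))
        label.append(ALPHABET[index])
    # The leftover state picks the TLD by remainder, in the spirit of Fig-6.
    tld = TLDS[state % len(TLDS)]
    return "".join(label) + "." + tld


if __name__ == "__main__":
    # Seed 0 mirrors the strtol() result; the date fields are chosen by the caller.
    print(standin_dga(0, 12, 4))
```

The date fields are passed in by the caller, so the same sketch works whichever fields a re-implementation feeds in.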
Currently the DGA feature is found in the following samples.
They all share the same DGA in terms of seed string and algorithm.
The hardcoded C2 domains in the samples are as follows:
We will keep an eye on the progress of this DGA variant. Stay tuned for future updates.