Now Mirai Has DGA Feature Built in

Update History

  • 2016-12-09 first version
  • 2016-12-12 Fig-0 updated; fixed a TLD selection error in our DGA implementation


Nearly two weeks ago, two new infection vectors (TCP ports 7547 and 5555) were found being used to spread MIRAI malware.

My colleague Genshen quickly set up honeypots for these vectors and soon had his harvest: 11 samples were captured on Nov 28th. So far, 53 unique samples have been captured by our honeypots from 6 hosting servers.

When analyzing one of the new samples, my colleague Wenji found some DGA-like code and suspected that a DGA feature was present. The suspicion was soon confirmed by evidence collected from our sandboxes. Detailed RE work shows that a DGA feature does exist in the newly distributed MIRAI samples spread through TCP ports 7547 and 5555. In this blog I would like to introduce our findings. In brief, the attributes of the DGA are summarized as follows:

  1. 3 TLDs are used: online/tech/support.
  2. the L2 domain has a fixed length of 12 characters, each chosen from 'a'~'y'.
  3. the generated domain is determined by year, month, day and the hardcoded seed string.
  4. only one domain is generated per day, so the maximum number of DGA domains is 365.
  5. the DGA domains are only used when the hardcoded C2 domains fail to resolve.
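
The attributes above are enough to write a simple shape filter for DNS logs. The following is a minimal sketch, assuming the 'a'~'y' character range and the three TLDs listed; the function name `looks_like_mirai_dga` is ours and purely illustrative. It checks only the shape of a name, so unrelated domains that happen to fit the pattern will also match.

```python
import re

# Shape taken from the summary: a 12-character label drawn from 'a'..'y',
# followed by one of the three TLDs.
DGA_SHAPE = re.compile(r"[a-y]{12}\.(?:online|tech|support)")


def looks_like_mirai_dga(domain: str) -> bool:
    """Return True if the domain matches the summarized DGA shape."""
    # Normalize case and drop an optional trailing root dot before matching.
    name = domain.strip().rstrip(".").lower()
    return DGA_SHAPE.fullmatch(name) is not None
```

A filter like this is best used to shortlist candidates, which can then be compared against the predicted domain list.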

With this knowledge, we re-implemented the DGA in our own program and used it to predict all 365 possible DGA domains. When looking up their registration information, we found that some of them had already been registered by the MIRAI author. They are:

Fig-0, registered DGA domains

It is also worth noting that the author has already registered other MIRAI C2 domains:

Sample and Analysis

The sample used as illustration in this blog is as follows:

  • MD5: bf136fb3b350a96fd1003b8557bb758a
  • SHA256: 971156ec3dca4fa5c53723863966ed165d546a184f3c8ded008b029fd59d6a5a
  • File type: ELF 32-bit LSB executable, MIPS, MIPS-I version 1 (SYSV), statically linked, stripped

The sample is stripped but not packed. Based on our experience with previously found samples, we soon identified its main modules. Code comparison showed that its resolv_cnc_addr function has a very different CFG (control flow graph) from that of the previous samples. The new CFG is shown in Fig-1.

Fig-1, resolv_cnc_addr CFG

At the beginning of the function, since as many as 3 C2 domains are hardcoded in the sample, a random number is generated to select a C2 server from the first and second ones, as shown in Fig-2.

Fig-2, resolv_cnc_addr block 1

If the selected C2 domain fails to resolve, the bot does not fall back to the unselected one, nor does it go straight to the 3rd one. Instead, it checks the current date to decide whether to take the DGA branch or to resolve the 3rd C2 domain, as shown in Fig-3.

Fig-3, DGA determination

From the code snippet we can see that if the current date is between Nov 1st and Dec 3rd, the 3rd C2 domain will be used; otherwise the DGA branch will be executed. This indicates that the author does not want the DGA domains to be used before Dec 4th, which is supported by the fact that the first registered MIRAI DGA domain corresponds exactly to Dec 4th.
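
The fallback logic described so far can be sketched as follows. This is an illustration only, not the decompiled routine: the function names, the `resolve` callable and the placeholder domain names are hypothetical, and we assume the date window is inclusive and ignores the year.

```python
import random
from datetime import date


def use_third_c2(today: date) -> bool:
    """True when the date falls within Nov 1 .. Dec 3 (assumed inclusive, year ignored)."""
    return (11, 1) <= (today.month, today.day) <= (12, 3)


def resolve_cnc(resolve, c2_domains, dga_domain, today, rng=random):
    """Sketch of the described flow.

    resolve     -- hypothetical callable: domain -> address, or None on failure
    c2_domains  -- the three hardcoded C2 domains, in order
    dga_domain  -- the DGA domain for `today`, computed elsewhere
    """
    # Step 1: pick randomly between the first and second hardcoded C2 only.
    first_choice = c2_domains[rng.randrange(2)]
    addr = resolve(first_choice)
    if addr is not None:
        return addr
    # Step 2: on failure, the date decides between the 3rd C2 and the DGA branch.
    fallback = c2_domains[2] if use_third_c2(today) else dga_domain
    return resolve(fallback)
```

For example, `use_third_c2(date(2016, 12, 3))` is true, while `use_third_c2(date(2016, 12, 4))` is false, so the DGA branch is taken from Dec 4th onwards.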

The main DGA function is named dga_gen_domain. The domain is generated from a seed number and the current date. The seed is converted from a hardcoded hex-format string by calling strtol(). It seems that a wrong string, “\x90\x91\x80\x90\x90\x91\x80\x90”, was configured, which causes strtol() to always return 0.
The local date is obtained by calling the C library functions time() and localtime(). Only month and day are used here, as shown in Fig-4.
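
To see why that seed string yields 0, recall that strtol() parses leading digit characters and returns 0 when it finds none. The raw bytes 0x90, 0x91 and 0x80 are not ASCII digit characters. Below is a rough Python emulation of that behaviour, assuming base 16 (the base actually passed in the sample is not stated here). `strtol_like` is our own helper and ignores the overflow clamping of the real function.

```python
import string


def strtol_like(data: bytes, base: int = 16) -> int:
    """Rough emulation of C strtol(): parse leading digits, return 0 if there are none."""
    digits = (string.digits + string.ascii_lowercase)[:base].encode()
    i, n = 0, len(data)
    # Skip leading whitespace, as strtol() does.
    while i < n and data[i:i + 1].isspace():
        i += 1
    # Optional sign.
    sign = 1
    if i < n and data[i:i + 1] in (b"+", b"-"):
        sign = -1 if data[i:i + 1] == b"-" else 1
        i += 1
    # Optional "0x" prefix for base 16.
    if base == 16 and data[i:i + 2].lower() == b"0x":
        i += 2
    value = 0
    while i < n:
        d = digits.find(data[i:i + 1].lower())
        if d < 0:  # first non-digit byte ends the parse
            break
        value = value * base + d
        i += 1
    return sign * value


# The configured seed consists of raw bytes, none of which is a hex digit character.
seed = strtol_like(b"\x90\x91\x80\x90\x90\x91\x80\x90")
```

Here `seed` is 0, whereas an ASCII string of hex digits such as `b"90918090"` would parse to a non-zero value.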

Fig-4, dga_gen_domain entry

The L2 domain is generated by repeatedly executing the code block shown in Fig-5. Its length is determined by $t5 and $t2. They are set in Fig-4, from which we can tell that the L2 domain length is 12.

Fig-5, L2 domain generation loop

The TLD is determined by the residual value in register $S0 as shown in Fig-6. We can see that 3 TLDs are used here.

Fig-6, TLD determination
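
The overall structure of the generator (a 12-character label followed by a TLD picked by a remainder) can be sketched as below. Please note that this is NOT the sample's algorithm: the state update `placeholder_step`, the way seed and date are mixed, and the TLD order are all placeholders we made up for illustration, so the output will not match the real DGA domains. Only the shape (12 characters from 'a'~'y', three TLDs, one domain per date) follows the analysis.

```python
from datetime import date

TLDS = ("online", "tech", "support")     # order is an assumption
ALPHABET = "abcdefghijklmnopqrstuvwxy"   # 'a'..'y', 25 letters


def placeholder_step(state: int) -> int:
    """HYPOTHETICAL state update (a plain LCG), standing in for the real routine."""
    return (state * 1103515245 + 12345) & 0xFFFFFFFF


def sketch_domain(seed: int, today: date) -> str:
    """Produce one domain per date with the documented shape (illustrative only)."""
    # Hypothetical mixing of the seed with month and day.
    state = (seed ^ (today.month << 8) ^ today.day) & 0xFFFFFFFF
    label = []
    for _ in range(12):                   # fixed L2 length of 12
        state = placeholder_step(state)
        label.append(ALPHABET[(state >> 16) % len(ALPHABET)])
    # The TLD is picked by a remainder, mirroring the description of Fig-6.
    return "".join(label) + "." + TLDS[state % len(TLDS)]
```

Calling `sketch_domain(0, date(2016, 12, 4))` returns a name of the documented shape, and the same inputs always give the same name.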


Currently the DGA feature is found in the following samples.

  • 005241cf76d31673a752a76bb0ba7118
  • 05891dbabc42a36f33c30535f0931555
  • 0eb51d584712485300ad8e8126773941
  • 15b35cfff4129b26c0f07bd4be462ba0
  • 2da64ae2f8b1e8b75063760abfc94ecf
  • 41ba9f3d13ce33526da52407e2f0589d
  • 4a8145ae760385c1c000113a9ea00a3a
  • 551380681560849cee3de36329ba4ed3
  • 72bbfc1ff6621a278e16cfc91906109f
  • 73f4312cc6f5067e505bc54c3b02b569
  • 7d490eedc5b46aff00ffaaec7004e2a8
  • 863dcf82883c885b0686dce747dcf502
  • bf136fb3b350a96fd1003b8557bb758a
  • bf650d39eb603d92973052ca80a4fdda
  • d89b1be09de36e326611a2abbedb8751
  • dbd92b08cbff8455ff76c453ff704dc6
  • eba670256b816e2d11f107f629d08494

They all share the same DGA in terms of seed string and algorithm.
The hardcoded C2 domains in the samples are as follows:


We will keep an eye on the progress of this DGA variant. Stay tuned for future updates.