Automatic URL Signature Construction and Impact Assessment

  • Shota Fujii (Hitachi, Ltd.)
  • Nobutaka Kawaguchi (Hitachi, Ltd.)
  • Tomoya Suzuki (Hitachi, Ltd.)
  • Toshihiro Yamauchi (Okayama University)
Keywords: Malware, Malicious URL, Signature


In recent cyberattacks and malware campaigns, attacker-controlled servers (e.g., C2 servers) play an important role. Blocking malicious communication with network-based signatures is therefore an important way to reduce the impact of an attack. However, signatures must not block the harmless communication that occurs during normal business operations. As a result, signature generation requires a deep understanding of the business and depends heavily on individual skill. Generated signatures must also be tested to ensure that they do not interfere with benign communication, which results in high operational costs. We propose SIGMA, a system that automatically generates signatures that block malicious communication without interfering with benign communication and then automatically evaluates the impact of those signatures. SIGMA clusters malware communication destinations, extracts the parts common to each cluster, and generates multiple candidate signatures. It then calculates the impact of each candidate on normal communication from sources such as business logs, and presents to the analyst the final signature: the candidate that best combines blocking malicious communication with not blocking normal communication. Our aim is to reduce the dependence of signature generation on individual skill, reduce the cost of impact evaluation, and support the decision of whether to apply a signature.
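The pipeline described above (cluster destinations, derive candidate signatures from their common parts, score each candidate against benign traffic, keep the best) can be illustrated with a minimal Python sketch. It is not SIGMA's implementation: grouping by host name stands in for clustering, candidates are limited to shared leading path segments, and the score (fraction of cluster URLs blocked minus fraction of benign URLs blocked) is an assumption chosen for illustration. All function names and example URLs are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit


def cluster_by_host(urls):
    """Group URLs by host name (an assumed, simplistic stand-in for clustering)."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[urlsplit(url).netloc].append(url)
    return dict(clusters)


def path_segments(url):
    """Return the non-empty path segments of a URL."""
    return [s for s in urlsplit(url).path.split("/") if s]


def candidate_signatures(cluster_urls):
    """Build regex candidates, ordered from host only to host plus all shared segments."""
    host = urlsplit(cluster_urls[0]).netloc
    common = []
    for segments in zip(*(path_segments(u) for u in cluster_urls)):
        if len(set(segments)) != 1:
            break
        common.append(segments[0])
    candidates = []
    for depth in range(len(common) + 1):
        prefix = "/".join([host] + common[:depth])
        # The trailing group keeps "/x" from also matching "/xyz".
        candidates.append(r"^https?://" + re.escape(prefix) + r"(?:[/?#]|$)")
    return candidates


def hit_rate(pattern, urls):
    """Fraction of urls matched by the regex pattern (0.0 for an empty list)."""
    if not urls:
        return 0.0
    regex = re.compile(pattern)
    return sum(1 for u in urls if regex.search(u)) / len(urls)


def select_signature(cluster_urls, benign_urls):
    """Pick the candidate with the best assumed score: blocked minus collateral."""
    def score(pattern):
        return hit_rate(pattern, cluster_urls) - hit_rate(pattern, benign_urls)
    # On ties, max() keeps the first candidate, i.e. the most general one.
    return max(candidate_signatures(cluster_urls), key=score)


# Hypothetical inputs: URLs from malware analysis and URLs from business logs.
malicious_urls = [
    "http://evil.example/gate/a.php?id=1",
    "http://evil.example/gate/b.php",
    "http://shared.example/files/x/payload.bin",
    "http://shared.example/files/x/stage2.bin",
]
benign_urls = [
    "http://shared.example/files/docs/manual.pdf",
    "http://shared.example/index.html",
    "http://intranet.example/portal",
]

clusters = cluster_by_host(malicious_urls)
signatures = {
    host: select_signature(urls, benign_urls) for host, urls in clusters.items()
}
for host, pattern in signatures.items():
    print(host, pattern, hit_rate(pattern, benign_urls))
```

In this toy data, the host that also serves benign content forces the selection toward a more specific candidate, while the purely malicious host can be covered by a host-level signature. This is the trade-off the impact assessment is meant to surface for the analyst.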

In our evaluation, which used the results of 14,238 malware analyses and actual business logs, SIGMA automatically generated a set of signatures that detected 100% of suspicious URLs with an over-detection rate of only 0.87%. This result suggests that SIGMA can reduce the cost of generating signatures and of evaluating their impact on business operations, both of which are time-consuming and labor-intensive processes.
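As a rough illustration of how the two reported figures could be computed for a set of signatures, the sketch below assumes that the detection rate is the fraction of suspicious URLs matched by at least one signature and that the over-detection rate is the fraction of business-log URLs matched by at least one signature. These definitions, the function name `match_rate`, and all sample data are assumptions for illustration; the values bear no relation to the results reported above.

```python
import re


def match_rate(signatures, urls):
    """Fraction of urls matched by at least one regex signature."""
    if not urls:
        return 0.0
    compiled = [re.compile(s) for s in signatures]
    hits = sum(1 for u in urls if any(c.search(u) for c in compiled))
    return hits / len(urls)


# Hypothetical signature set and inputs.
signature_set = [r"^https?://c2\.example/", r"^https?://cdn\.example/upd/"]
suspicious_urls = [
    "http://c2.example/gate.php",
    "https://cdn.example/upd/a.bin",
]
business_log_urls = [
    "https://cdn.example/upd/readme.txt",
    "https://cdn.example/lib/app.js",
    "https://portal.example/login",
    "https://mail.example/inbox",
]

detection_rate = match_rate(signature_set, suspicious_urls)
over_detection_rate = match_rate(signature_set, business_log_urls)
print(f"detection: {detection_rate:.2%}, over-detection: {over_detection_rate:.2%}")
```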



Technical Papers (Information and Communication Technology)