Among studies that rely on URL-based features, some have treated all malicious URLs as a single group sharing the same attributes. However, malicious URLs fall into two different types, each of which produces entirely different results. Depending on their intent, adversaries therefore leave slightly different behavioral traces in the URLs they craft. This paper presents an in-depth empirical study of 1,529,433 malicious URLs collected over the past two years.
In particular, we analyze how attackers tactically construct URLs and extract the features these URLs have in common. We then divide these features into three feature pools, which are used to determine the level of compromise of unknown URLs. To improve detection rates, we employ a similarity matching technique. We believe that new malicious URLs can be identified through attackers' habitual URL manipulation behaviors. As a result, the approach covers a large set of malicious URLs with a small feature set. The proposed approach achieves reasonable accuracy (up to 70%) while examining only the attributes of the URLs themselves. The model can be used during preprocessing to determine whether input URLs are benign, or as a web filter or risk-level scaler to estimate whether a URL is malicious.
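The abstract does not specify which features are extracted, which similarity measure is used, or how the three feature pools are defined. The following Python sketch only illustrates the general idea of matching an unknown URL against pools of known malicious URLs. It rests on assumptions that are not taken from the paper: each URL is reduced to a set of lowercase lexical tokens, similarity is Jaccard similarity, and the pool names (`pool_1` to `pool_3`), the example URLs, and the 0.5 threshold are hypothetical placeholders.

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit


def url_tokens(url: str) -> set[str]:
    """Reduce a URL to lowercase alphanumeric tokens from host, path, and query.

    Assumption: lexical tokens stand in for the paper's (unspecified) URL features.
    """
    parts = urlsplit(url.lower())
    text = " ".join((parts.netloc, parts.path, parts.query))
    return {tok for tok in re.split(r"[^a-z0-9]+", text) if tok}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(url: str, pools: dict[str, list[set[str]]],
               threshold: float = 0.5) -> tuple[str | None, float]:
    """Return the pool holding the most similar known URL, or None below the threshold.

    Assumption: a URL matches a pool if it is similar enough to any single sample in it.
    """
    tokens = url_tokens(url)
    best_pool, best_score = None, 0.0
    for name, samples in pools.items():
        for sample in samples:
            score = jaccard(tokens, sample)
            if score > best_score:
                best_pool, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_pool, best_score


# Hypothetical pools, each seeded with one illustrative malicious-looking URL.
POOLS = {
    "pool_1": [url_tokens("http://paypal.com.secure-login.example.net/verify/account.php")],
    "pool_2": [url_tokens("http://198.51.100.7/bins/update.exe")],
    "pool_3": [url_tokens("http://example.org/wp-includes/js/c.php?id=1")],
}

if __name__ == "__main__":
    print(best_match("http://paypal.com.secure-login.example.net/verify/login.php", POOLS))
    print(best_match("https://docs.python.org/3/library/", POOLS))
```

In this sketch, a URL that reuses an attacker's habitual host and path pattern (only `account` changed to `login`) scores 8/9 against `pool_1`, while an unrelated URL stays below the threshold and is left unmatched. The best similarity score could also serve as a rough risk level rather than a hard decision.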