The Internet is an open network that also serves as a diffusion network, so its architecture makes it easy for malware to spread and infect hosts. Prior studies of malware propagation have focused on detection and protection for the Web and email, for instance static and dynamic malware detection and exploit kit detection. Others have concentrated on online social networks such as Facebook and Twitter, through which malware diffuses widely, and have analyzed the correlations between followers and followees in malware delivery behavior. However, these approaches are not sufficient to explain malware distribution networks. The goal of this study is to find the high-degree nodes that play core roles in spreading malware on malware distribution networks, and to build a fundamental understanding of those networks. To that end, we carried out the following studies.
First, we need a malware detection and collection system to gather the data generated by web malware attacks. This system should collect every type of malicious URL that participates in malware distribution. Second, we should understand the properties of malicious URLs. Attackers assign different roles to malicious URLs when distributing malware. For instance, attackers place landing sites, which Internet users access, and exploit sites, from which attacks are launched, in different geolocations. This difference in geolocation makes it difficult for defenders to locate core nodes and block them, and thus hampers response. Hence, defenders need to find other significant nodes, in locations they can control, whose removal contributes to blocking the core nodes. Third, to find significant nodes, we need a comprehensive understanding of malware distribution networks. That is, we need to understand the properties that distinguish malicious networks from benign ones, and the various nodes (i.e., malicious URLs, malware, C2 servers) used in malware distribution networks. In addition, we should recognize that these nodes carry different degrees of risk. For instance, the infection rate of malware propagated via a site with a high Alexa rank, such as Naver, differs markedly from that of malware propagated via a low-ranked website. Thus, we need to assign a reasonable magnitude of risk to each node engaged in malware distribution. Lastly, we built new risk assessment methodologies to observe the traits of malware distribution networks and to estimate each network's overall risk. With these approaches, we rank malware distribution networks by risk magnitude. In particular, this model helps find the significant nodes that strongly influence malware diffusion within the ranked networks, which offers a way to curb malware proliferation.
In this dissertation, we provide comprehensive insights into malware distribution networks.
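
As an illustration only, the idea of ranking distribution networks by risk and locating their high-degree nodes can be sketched as follows. This is a minimal sketch under stated assumptions, not the risk model developed in the dissertation: the node names, the per-node risk scores, and the choice to sum node risks into a network risk are all hypothetical.

```python
from collections import defaultdict

def degrees(edges):
    """Count how many edges touch each node (undirected degree)."""
    deg = defaultdict(int)
    for src, dst in edges:
        deg[src] += 1
        deg[dst] += 1
    return dict(deg)

def network_risk(edges, node_risk):
    """Assumed aggregation: sum the risk of every distinct node in the network."""
    nodes = {n for edge in edges for n in edge}
    return sum(node_risk.get(n, 0.0) for n in nodes)

def top_nodes(edges, k=1):
    """Return the k highest-degree nodes, ties broken by name."""
    deg = degrees(edges)
    return sorted(deg, key=lambda n: (-deg[n], n))[:k]

def rank_networks(networks, node_risk):
    """List network names from highest to lowest aggregated risk."""
    return sorted(networks, key=lambda name: -network_risk(networks[name], node_risk))

# Hypothetical data: two small distribution networks given as edge lists.
networks = {
    "net_a": [("landing1", "exploit1"), ("landing2", "exploit1"), ("exploit1", "malware1")],
    "net_b": [("landing3", "exploit2")],
}
# Hypothetical per-node risk scores (e.g. a popular landing site scores higher).
node_risk = {
    "landing1": 0.9, "landing2": 0.2, "exploit1": 0.5,
    "malware1": 0.7, "landing3": 0.1, "exploit2": 0.5,
}

ranking = rank_networks(networks, node_risk)
core = top_nodes(networks[ranking[0]])
print(ranking, core)
```

In this toy data, the exploit site shared by two landing sites is the highest-degree node of the riskier network, which is the kind of node a defender would prioritize for blocking.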