The internet is flooded with malicious and undesirable content that comes in various forms, such as drive-by downloads and phishing attacks, and leads to information theft and monetary losses. Many security systems, from the internet service provider to the browser itself, act to defend the user from such content. However, most systems have at least one of three major limitations: (1) they are not personalized and do not account for differences between users, (2) their defense mechanisms are reactive rather than proactive, leaving them unable to predict upcoming attacks, and (3) they track and use the user’s activity to a large extent, thereby invading her privacy. We use browsing data from over 20,000 users to build a behavioral, network-based framework that predicts upcoming internet infections. Our framework analyzes only the users’ previous infections and disregards their general browsing habits. Its architecture accounts for three factors: the user’s personal infection history, her similarity to other users based on their previous infections in a conceptual network, and the way this network evolves over time. We combine these components with machine learning algorithms to assess each user’s personal level of risk. The framework achieves accurate results on the infection-prone portion of the population, surpassing the existing method while invading privacy substantially less.
A depiction of the conceptual user network at the end of the first two months. Each node represents a user, and two users are connected by an edge if they were previously infected on the same domain. Red nodes mark users who became infected during the third month; nodes that belonged to denser and more central areas of the network in the first and second months had a higher risk of future infection.
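The network construction described in the caption can be illustrated with a minimal sketch. It assumes the infection log is available as (user, domain) pairs; the user identifiers, the domain names, and the use of node degree as a stand-in for "denser and more central" are illustrative assumptions, not the features or data used in this work.

```python
from collections import defaultdict
from itertools import combinations


def build_coinfection_network(infections):
    """Return {user: set of neighbors} from (user, domain) infection records.

    Two users are linked if they were infected on the same domain.
    """
    users_by_domain = defaultdict(set)
    network = {}
    for user, domain in infections:
        users_by_domain[domain].add(user)
        network.setdefault(user, set())  # isolated users are still nodes
    for users in users_by_domain.values():
        for a, b in combinations(users, 2):
            network[a].add(b)
            network[b].add(a)
    return network


def degree(network):
    """Number of neighbors per user; a simple proxy for local density (assumption)."""
    return {user: len(neighbors) for user, neighbors in network.items()}


# Hypothetical records for an observation window (made-up users and domains).
records = [
    ("u1", "bad.example"),
    ("u2", "bad.example"),
    ("u3", "bad.example"),
    ("u3", "evil.example"),
    ("u4", "evil.example"),
    ("u5", "lonely.example"),
]

network = build_coinfection_network(records)
degrees = degree(network)
```

In this toy input, `u3` shares a domain with three other users and therefore has the highest degree, while `u5` remains an isolated node. A per-user value of this kind could serve as one input to a risk model, alongside the user's own infection history.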