Prediction Methods for Crime

There’s a new sheriff in town and he’s riding the horse of “predictive policing”. Back in July the Santa Cruz Police Department began deploying police officers to places where crime is likely to occur in the future, making use of new predictive modeling programs that are designed to provide daily forecasts of crime hotspots, thereby allowing the Department to preempt more serious crimes before they occur. You can find a story describing how Santa Cruz is sending in the police before there’s a crime in The New York Times.

In essence, this is another physical-world application of machine learning and clustering technologies—applied to preempting a criminal problem. In the cyber-world we’ve been applying these techniques for a number of years with great success. In fact many of the most important advances in dealing with cybercrime revolve around the replacement of legacy IP reputation systems and domain filtering technologies with dynamic reputation systems—systems easily capable of scaling with both the threat and an ever-expanding Internet (e.g. IPv6).

Just last week Manos Antonakakis (a principal scientist here at Damballa Labs) presented at the USENIX Security 2011 conference in San Francisco about a new generation of technology capable of identifying domain names being used for malicious purposes weeks, if not months, in advance of malware samples being intercepted, analyzed and “protected” against by legacy anti-virus approaches.

The patent-pending technology utilizes passive DNS observations within the upper DNS hierarchy, and the first generation of the research (along with its cybercrime proof-points) is described in the paper “Detecting Malware Domains at the Upper DNS Hierarchy” [PDF]. The system running here within Damballa Labs is affectionately known as “Kopis” and has proved its worth time and again by preemptively identifying new botnets and cybercrime campaigns, keeping our Threat Analyst team busy enumerating the real-world criminals behind the domain abuse.

The Kopis system extends many of the principles and research we learnt and formulated when developing the Notos technology [PDF]—a next generation dynamic reputation system for DNS.

In several ways the Santa Cruz Police Department’s modeling system approximates an early generation of such a dynamic reputation system: it uses a mix of long-term observations and historical information, combined with real-time crime updates, and its output is a daily forecast of crime hotspots.
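To make the idea of blending long-term history with real-time updates concrete, here is a minimal sketch in Python. It is not the model used in Santa Cruz, whose internals are not described here; the function name `forecast_hotspots`, the grid-cell labels, and the simple weighted blend are all assumptions chosen purely for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def forecast_hotspots(
    historical_daily_rate: Dict[str, float],
    recent_incidents: Iterable[str],
    weight_recent: float = 0.5,
    top_n: int = 3,
) -> List[str]:
    """Rank map cells by a blend of long-term rate and recent activity.

    historical_daily_rate: hypothetical long-term average incidents per day,
        keyed by map cell.
    recent_incidents: one cell label per incident reported in the latest
        real-time feed.
    """
    recent = Counter(recent_incidents)
    cells = set(historical_daily_rate) | set(recent)
    scores = {
        cell: (1.0 - weight_recent) * historical_daily_rate.get(cell, 0.0)
        + weight_recent * recent.get(cell, 0)
        for cell in cells
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(cells, key=lambda cell: (-scores[cell], cell))[:top_n]


if __name__ == "__main__":
    history = {"A": 2.0, "B": 0.5, "C": 1.0}
    todays_reports = ["B", "B", "B", "C"]
    print(forecast_hotspots(history, todays_reports, top_n=2))
```

A cell with a modest history but a burst of recent reports can outrank a cell with a high long-term average, which is the behaviour a purely historical model would miss.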

Damballa Labs uses Notos and its derivatives in a number of ways. For example, we’re able to take any observed DNS record (e.g. domain name and resolved IP address) and provide a real-time score of its reputation, even if this is the first time anyone on the Internet has ever tried to resolve that particular domain name. In practice this means that we can predict, with a stated level of confidence, whether connecting to a device using that particular domain name (or IP) is malicious or benign, and the nature of the threat it represents. All of this is done through passive means, without having to have observed any maliciousness directly associated with the device at any time in the past.
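One way to see how a never-before-seen domain can still receive a score is to look at the company it keeps. The sketch below is a toy illustration only and is not the Notos algorithm: the function `score_record`, the choice of a /24 neighbourhood, and the `history` and `known_bad` inputs are all assumptions made for the example.

```python
import ipaddress
from typing import Dict, Optional, Set


def score_record(
    domain: str,
    ip: str,
    history: Dict[str, Set[str]],
    known_bad: Set[str],
) -> Optional[float]:
    """Return the fraction of known-bad domains seen in the same /24 as `ip`.

    history: hypothetical passive DNS history, mapping an IPv4 address to the
        set of domains previously observed resolving to it.
    known_bad: hypothetical set of domains already labelled malicious.
    Returns None when there is no neighbouring evidence to score from.
    """
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    neighbours: Set[str] = set()
    for seen_ip, domains in history.items():
        if ipaddress.ip_address(seen_ip) in network:
            neighbours |= domains
    # The domain being scored must not count as evidence about itself.
    neighbours.discard(domain)
    if not neighbours:
        return None
    return len(neighbours & known_bad) / len(neighbours)


if __name__ == "__main__":
    history = {
        "203.0.113.5": {"bad1.example", "bad2.example"},
        "203.0.113.9": {"ok.example"},
        "198.51.100.1": {"other.example"},
    }
    known_bad = {"bad1.example", "bad2.example"}
    print(score_record("brand-new.example", "203.0.113.7", history, known_bad))
```

The new domain has no history of its own, yet it inherits a score from the addresses around it. A production system would combine many more signals and would report a confidence alongside the score.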

Systems like Notos make use of big data (i.e. colossal volumes of historical and streaming data) gathered from a global array of sensors. The mix of historical observations and real-time data feeds means that prediction models can be dynamic enough to keep pace with truly agile threats (and threat operators), and can yield new approaches to unveiling advanced and sophisticated threats. For example, a possible query could be “provide me a list of domain names that are pointing to residential DSL IP addresses within Villanstan, that have never been looked up by any hosts within the country of Villanstan, that have only been looked up by hosts located within Fortune-100 companies in the USA, and for which the number of Fortune-100 companies doing so is less than 5 over the last 12 months.” The result of the query would be a (long) list of domain names that are strong candidates for use in APT attacks, which then drives specialist counter-intelligence analysts and law enforcement to uncover the nature of the threat.
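The example query above can be sketched as a filter over passive DNS observations. This is a minimal sketch under stated assumptions: the `Lookup` record, its field names, the country codes, and the function `suspicious_domains` are hypothetical, and the input is assumed to have already been restricted to the last 12 months.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Lookup:
    """One passive DNS observation (all fields are hypothetical)."""
    domain: str                # name that was queried
    resolved_country: str      # country of the resolved IP address
    resolved_is_dsl: bool      # True if the IP is a residential DSL address
    client_country: str        # country of the host that made the query
    client_org: str            # organisation owning that host, "" if unknown


def suspicious_domains(
    lookups: Iterable[Lookup],
    target_country: str,
    watched_orgs: Set[str],
    max_orgs: int = 5,
) -> List[str]:
    """Return domains matching every condition of the example query."""
    by_domain: Dict[str, List[Lookup]] = defaultdict(list)
    for rec in lookups:
        by_domain[rec.domain].append(rec)

    hits = []
    for domain, recs in by_domain.items():
        # Points to residential DSL addresses inside the target country.
        on_dsl = all(
            r.resolved_is_dsl and r.resolved_country == target_country
            for r in recs
        )
        # Never looked up from inside the target country.
        no_local = all(r.client_country != target_country for r in recs)
        # Only looked up from watched organisations located in the USA.
        only_watched = all(
            r.client_country == "US" and r.client_org in watched_orgs
            for r in recs
        )
        # Fewer than max_orgs distinct organisations did the looking up.
        few_orgs = len({r.client_org for r in recs}) < max_orgs
        if on_dsl and no_local and only_watched and few_orgs:
            hits.append(domain)
    return sorted(hits)
```

In a real deployment a query like this would run against a data store built for that volume rather than an in-memory list, but the four conditions translate directly.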

In the meantime I’ll be watching with keen interest the successes of the Santa Cruz Police Department and their new modeling programs. Here at Damballa we’ve had phenomenal success in using machine learning and advanced clustering techniques in unveiling and forecasting new threats.

By Gunter Ollmann, CTO, Security (Cloud and Enterprise) at Microsoft
