Characterizing the Friction and Incompatibility Between IoC and AI


	By Gunter Ollmann, CTO, Security (Cloud and Enterprise) at Microsoft
	January 25, 2017

Many organizations are struggling to overcome key conceptual differences between today’s AI-powered threat detection systems and legacy signature detection systems. A key friction area, in both perception and delivery capability, lies with the inertia of Indicator of Compromise (IoC) sharing, a practice that is increasingly incompatible with the machine learning approaches incorporated into the new breed of advanced detection products.

In recent years, government intelligence agencies, law enforcement agencies, threat feed companies, and global companies regularly targeted by sophisticated attackers have incorporated shared IoC detection data into their multi-layered defensive systems. Standards led by the DHS and US-CERT, such as Structured Threat Information eXpression (STIX), Trusted Automated eXchange of Indicator Information (TAXII), and Cyber Observable eXpression (CybOX), provide agreed messaging and data structures for describing a threat’s properties and the attributes that can be used to detect it.

The crux of the incompatibility problem lies in the two-dimensional nature of the IoC data. These standards were driven by the need to share data that could be rapidly incorporated into signature-based detection or threat mitigation systems. The new AI-empowered detection technologies require considerably richer and more varied data sources to feed the machine learning and mathematical data processing going on within the product.

As described in an earlier publication on Machine Learning in Security, legacy signature-based approaches to threat detection can use one- or two-dimensional data to reach a binary conclusion about whether a threat exists. Meanwhile, machine-learning-based systems take in multiple streams of data (typically in real time) and apply mathematical techniques to score and ultimately classify the observation based on statistical and probabilistic scoring. In addition, each class of threat may be classified by completely different trained models and optimized mathematical constructs, which requires an n-dimensional structure to the data inputs.
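The contrast between the two approaches can be sketched in a few lines of Python. This is a toy illustration, not any product's method: the indicator list, the feature names, the weights, and the bias in the logistic model are all made-up assumptions, chosen only to show a set-membership test next to a probabilistic score computed over several inputs.

```python
import math

# Hypothetical IoC list: flat indicator values and nothing more.
KNOWN_BAD_DOMAINS = {"bad.example", "c2.example"}


def signature_verdict(domain):
    """Binary result: the indicator either matches or it does not."""
    return domain in KNOWN_BAD_DOMAINS


# Made-up weights for a toy logistic model. A real system would learn
# these from training data; the feature names are assumptions too.
WEIGHTS = {
    "kb_sent_per_min": 0.004,
    "distinct_ports": 0.30,
    "failed_logins": 0.25,
    "off_hours": 1.20,
}
BIAS = -4.0


def ml_score(features):
    """Combine several observed dimensions into a probability-like score."""
    z = BIAS + sum(weight * features[name] for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def classify(features, threshold=0.5):
    """Turn the continuous score into a label at a chosen threshold."""
    return "suspicious" if ml_score(features) >= threshold else "benign"


observation = {
    "kb_sent_per_min": 500.0,
    "distinct_ports": 8.0,
    "failed_logins": 4.0,
    "off_hours": 1.0,
}

print(signature_verdict("bad.example"))   # membership test only
print(round(ml_score(observation), 3), classify(observation))
```

Note that `ml_score` cannot run at all unless every named dimension is supplied, whereas `signature_verdict` needs only a single value.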

These fundamental differences are difficult to understand and are consequently a cause of frustration and friction when it comes to requests for IoC incorporation into AI-powered detection platforms.

One way of conceptualizing the request for two-dimensional IoC data to be incorporated into the new generation of threat detection platforms is to think of an artist constructing an oil painting. Our virtual Rembrandt has been commissioned to produce a family portrait. Over the course of several months, he has compiled multiple sketches and watercolor drawings of each family member. He agrees, crafts, and confirms what the background for the portrait will be (perhaps a countryside panorama with grazing lambs). One day he physically assembles the whole family in the studio and strategically places the members, modifies postures, and generally guides the poses, whereupon he makes multiple sketches, taking special care to note light patterns and the fall of fabrics. Over the following month, he gradually adds each family member to the canvas using his own carefully manufactured palette of oil colors, often asking individual family members to sit several times, as he compiles the family portrait. At the end of the process, a masterpiece oil painting of the family is presented.

Just as the painting is about to be finished, the family patriarch tells Rembrandt that he’d really like to include Uncle Humphrey from Portugal too. He’s about 5’6” high, a little porky, scruffy brown hair, billowy beard, green eyes, and likes to wear black—and no, he’ll not be able to visit, and we don’t have any other portraits of him either.

The task of parsing and incorporating IoC data (regardless of formatting standards) into a machine learning system is a lot like asking Rembrandt to add the mysterious Uncle Humphrey to the painting. Yes, it can be forced, but it’s not going to be pretty, and there’s a high probability that the results aren’t going to be what you thought they would be, because many of the dimensions of the data are either missing or inappropriate.
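To make the missing-dimensions problem concrete, here is a minimal sketch that projects a flat IoC record onto the inputs a trained model might expect. The feature names and the `ioc_to_features` and `coverage` helpers are hypothetical, invented for illustration; the IoC record is a simplified dictionary, not a real STIX or CybOX object.

```python
import math

# Hypothetical inputs a trained model might expect for one threat class.
MODEL_FEATURES = [
    "dst_ip_listed",
    "beacon_interval_s",
    "bytes_out_per_min",
    "payload_entropy",
    "tls_fingerprint_rarity",
]


def ioc_to_features(ioc):
    """Project a flat IoC onto the model's inputs; unknown dimensions become NaN."""
    derived = {}
    # The only thing an IP indicator tells us is that the address is listed.
    if ioc.get("type") == "ipv4-addr" and ioc.get("value"):
        derived["dst_ip_listed"] = 1.0
    return [derived.get(name, math.nan) for name in MODEL_FEATURES]


def coverage(vector):
    """Fraction of the model's dimensions the IoC could actually fill."""
    present = sum(1 for value in vector if not math.isnan(value))
    return present / len(vector)


ioc = {"type": "ipv4-addr", "value": "203.0.113.7"}
vector = ioc_to_features(ioc)
print(vector)
print(coverage(vector))
```

In this toy case the indicator fills one of five dimensions. Anything done to fill the rest (defaults, imputation, guesses) is the equivalent of painting Uncle Humphrey from a verbal description.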

While STIX, TAXII, or CybOX data may not be compatible with the new AI-based threat detection technologies, many of these detection solutions do have their own “raw” data repositories—consisting of accumulated metadata, statistical objects, packet captures, and other evidence packages—which can often be mined for data artifacts that may overlap some data elements within a supplied IoC package. Caution is warranted, though—as both the structure and richness of the data used for machine learning is often very different.
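Mining a raw repository for overlap with a supplied IoC package could look something like the sketch below. The record layout, the field names (`dst_ip`, `sni`), and the `find_overlaps` helper are assumptions made for illustration; real evidence stores differ by product and will need their own field mapping.

```python
# Hypothetical stored evidence: metadata records kept by a detection platform.
RAW_RECORDS = [
    {"ts": "2017-01-10T02:14:00Z", "src": "10.0.0.5",
     "dst_ip": "203.0.113.7", "sni": "cdn.example", "bytes_out": 48213},
    {"ts": "2017-01-10T09:30:12Z", "src": "10.0.0.9",
     "dst_ip": "198.51.100.20", "sni": "c2.example", "bytes_out": 912},
    {"ts": "2017-01-11T11:02:45Z", "src": "10.0.0.5",
     "dst_ip": "192.0.2.44", "sni": "news.example", "bytes_out": 15004},
]


def find_overlaps(records, ioc_values, fields=("dst_ip", "sni")):
    """Return (record, matched_fields) pairs where a field equals an IoC value."""
    hits = []
    for record in records:
        matched = [name for name in fields if record.get(name) in ioc_values]
        if matched:
            hits.append((record, matched))
    return hits


# Flat indicator values taken from a supplied IoC package (illustrative).
ioc_values = {"203.0.113.7", "c2.example"}

for record, matched in find_overlaps(RAW_RECORDS, ioc_values):
    print(record["ts"], record["src"], matched)
```

A lookup like this only reports that an indicator value appeared in stored evidence. It is a retrospective search, separate from the model's own scoring, and the matched records still carry far more context than the IoC that found them.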

The first discussions on how to fabricate an IoC package that will be compatible with the current generation of AI-powered threat detection systems are still to take place. It’ll likely be several years before the technologies settle on richer n-dimensional data arrays that can be shared amongst multiple technology providers and appropriately encapsulate a threat. That poseable 3D hologram of Uncle Humphrey will have to wait.


