Normalize inflated iOS 15 Email Open Rates with Unsupervised ML

Happy Holidays. In September 2021, Apple rolled out iOS 15. For high-volume email senders, Apple Mail now essentially registers every email sent as an “open.”

If you are a campaign engineer, high-volume sender or ESP, this change can affect 25%-45% of your email subscriber base, whether B2B, B2C or D2C.

Inside your email analytics platform, you should see the specific percentage of Apple Mail users in your database, and from that the size of the change (delta) in your inflated “open rates.”

Not surprisingly, Apple’s change means that unique and total open rates will artificially go up, not down, because Apple does not block the tracking pixels used to see who opens an email and who does not. Instead, it pre-loads all tracking pixels before a subscriber sees the email.

Let’s assume your average unique open rate is 25% on prior sends and that 40% of your readers use Apple Mail. After iOS 15 launched in September, your reported unique open rate would artificially rise from 25% to 55%: the 40% on Apple Mail all register as opens, while the remaining 60% still open at 25% (40% + 60% × 25% = 55%). This privacy change only affects actual Apple Mail users. It does not affect people who use Gmail, Outlook or another mail app on their iOS device. Keep in mind that many people use a mail application like Gmail or Outlook on desktop but use Apple Mail on their mobile device.
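The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration under the simplifying assumption used above: every Apple Mail send is reported as an open, and everyone else opens at the historical rate. The function names are hypothetical and not part of any analytics platform.

```python
def reported_open_rate(true_rate: float, apple_share: float) -> float:
    """Open rate a dashboard would show if every Apple Mail send counts as an open."""
    if not (0.0 <= true_rate <= 1.0 and 0.0 <= apple_share <= 1.0):
        raise ValueError("rates and shares must be between 0 and 1")
    # Apple Mail readers all register as opens; everyone else opens at true_rate.
    return apple_share * 1.0 + (1.0 - apple_share) * true_rate


def underlying_open_rate(reported_rate: float, apple_share: float) -> float:
    """Invert the blend to recover the open rate of non-Apple Mail readers."""
    if apple_share >= 1.0:
        raise ValueError("cannot recover a rate when every reader uses Apple Mail")
    return (reported_rate - apple_share) / (1.0 - apple_share)


inflated = reported_open_rate(true_rate=0.25, apple_share=0.40)
recovered = underlying_open_rate(inflated, apple_share=0.40)
print(f"reported: {inflated:.0%}, recovered: {recovered:.0%}")
```

The second function is the simplest form of correction: if you know your Apple Mail share, you can back out the open rate of everyone else from the inflated figure. It says nothing about which individual Apple Mail subscribers actually opened.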

Given these inflated open rates, a sender may want to wrangle some data to get a more accurate read on open rates, since the goal is a more engaged subscriber. To do that, you don’t need to depend solely on your email analytics data.

Consider two options:

Plan A: Omit the iOS 15 feature from the dataset altogether

Plan B: Use unsupervised learning to vacate the iOS 15 variable and let a robust algorithm transform the data to a more accurate open rate. Below, we will attempt to describe Plan B in detail:

Plan B: Using Unsupervised Learning to Track iOS 15 Open Rates

Note to an ESP:

“We had a chance to think about how we want to track actual engagement “if” we were to optimize for the target variable of “open-rate.” Essentially we see two paths forward:

Path A: The most straightforward path is to merely omit the iOS 15 feature from the dataset and track open rates that way, given that iOS 15 provides inflated open rates.

Path B: A path that might prove more comprehensive is to use unsupervised learning or exploratory data analysis to impute missing data for more accurate open rates in the iOS 15 column/feature/variable.

Since Apple now tracks every email sent as an open, every row in that feature would have a “1” signifying an open (the column is coded “1” for opens and “0” for non-opens). If we were to use unsupervised learning, a robust concept in ML for finding and imputing missing data in fields, we would approach it in the following manner:

Instead of omitting the variable altogether, as in Plan A, let us say we vacate the data in the entire column. We then apply an unsupervised learning algorithm to impute the missing data in the vacated feature with “1s” or “0s.” This way, we can identify a more accurate open rate, even prior to sends, given the benchmarks you provided. Because the open rate is a “continuous” variable, determining a more accurate one can also be treated as a regression problem. Imputing missing data can also be achieved with exploratory data analysis (EDA) or several robust regression models. It doesn’t necessarily have to be unsupervised.

However, when data is imputed with unsupervised learning, the algorithm finds correlations with other, perhaps more obscure variables, and does not depend solely on CTR. Indeed, a “1” in that column will be correlated with CTR, but a “1” might also correlate with other transformed dataset variables.”

A central application of unsupervised learning is to find these hidden correlations within the tagged data to impute untagged data in a missing feature. Essentially the model will use an unsupervised approach to reconcile who has indeed opened and who has not.
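The vacate-and-impute idea can be sketched as follows. The article does not name a specific algorithm, so this sketch swaps in a plain k-nearest-neighbour vote as one simple stand-in. The subscriber rows, the two features (clicks and purchases) and the function name are all hypothetical, and real data would need feature scaling and far more rows.

```python
from math import dist

# Hypothetical rows: (uses_apple_mail, clicks_last_10_sends, purchases, opened)
rows = [
    (False, 4, 1, 1),
    (False, 3, 0, 1),
    (False, 0, 0, 0),
    (False, 1, 0, 0),
    (False, 0, 1, 0),
    (False, 5, 2, 1),
    (True, 4, 2, 1),  # Apple Mail rows all report an open
    (True, 0, 0, 1),
    (True, 1, 0, 1),
    (True, 3, 1, 1),
]


def impute_opens(rows, k=3):
    """Vacate 'opened' for Apple Mail rows and refill it from the k nearest other rows."""
    # Only non-Apple Mail rows carry an open flag we can trust.
    trusted = [(r[1:3], r[3]) for r in rows if not r[0]]
    imputed = []
    for uses_apple, clicks, purchases, opened in rows:
        if uses_apple:
            # Rank trusted rows by distance in (clicks, purchases) space.
            nearest = sorted(trusted, key=lambda t: dist(t[0], (clicks, purchases)))[:k]
            votes = sum(label for _, label in nearest)
            opened = 1 if 2 * votes > k else 0  # majority vote
        imputed.append(opened)
    return imputed


reported = sum(r[3] for r in rows) / len(rows)
adjusted_flags = impute_opens(rows)
adjusted = sum(adjusted_flags) / len(rows)
print(f"reported open rate: {reported:.0%}, adjusted open rate: {adjusted:.0%}")
```

With these toy rows, the reported open rate is 70% and the adjusted rate is 50%: the two Apple Mail subscribers with almost no clicks or purchases are re-labeled as non-opens, while the two who resemble engaged readers keep their “1.”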

If we do not vacate the column for Apple Mail customers, every row in this variable stays populated with a “1.” Rather than relying on inflated open rates, we find correlations with potentially unknown or obscure variables, and allow the algorithm to populate the “1.” As we begin working with “wider” datasets for email, we’re likely to find hidden features that are directly tied to a target variable like “open-rate,” such as “5-star” reviews.
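One way to look for such features is to rank candidate columns by how strongly they correlate with opens among subscribers whose open flag can still be trusted. The sketch below uses a hand-rolled Pearson correlation. The feature names and values are hypothetical, and correlation is only a first screening step, not the full model.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


# Hypothetical non-Apple Mail subscribers, where the open flag can still be trusted.
opened = [1, 1, 0, 0, 0, 1]
features = {
    "clicks": [4, 3, 0, 1, 0, 5],
    "five_star_reviews": [2, 1, 0, 0, 1, 3],
    "days_since_signup": [30, 400, 35, 410, 200, 190],
}

# Rank features by the absolute strength of their correlation with opens.
ranked = sorted(
    ((name, pearson(values, opened)) for name, values in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")

# An all-ones column, like the inflated Apple Mail flag, has zero variance.
inflated_flag = [1] * len(opened)
print(pearson(features["clicks"], inflated_flag))
```

The last line illustrates why the un-vacated column is of little use: a flag that is “1” on every row has no variance, so it cannot correlate with anything.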

For savvy campaign engineers, this can be an important way to find your most engaged clients without relying on the fired iOS 15 pixel that triggered the open in the first place. In this case, if we want to estimate a more accurate open rate, we will likely use an unsupervised machine learning model and blend in a send time optimization model for the potential lift.

Although unsupervised learning encompasses many other areas, including summarizing and explaining the characteristics of the untagged data, there are ways we can get a more accurate score through the correlation of other dependent variables in the dataset.

Unsupervised and semi-supervised learning may be more attractive alternatives because relying on domain expertise to properly label data for supervised learning can be highly time-consuming and expensive. Unlike supervised machine learning, where the data is structured and properly labeled, unsupervised machine learning methods cannot be applied directly to a regression or classification problem because you do not know what the values for the output might be, making it impossible to train the algorithm as usual. While Apple Mail might be a small subset of your entire subscriber list, it is still quite significant, and the ramifications of sending mail to a subscriber that is unlikely to engage could result in dissatisfied subscribers.

By Fred Tabsharani, Founder and CEO at Loxz Digital Group

Fred Tabsharani is Founder and CEO of Loxz Digital Group, a Machine Learning Collective with an 18-member team. He has spent the last 15 years as a globally recognized digital growth leader. He holds an MBA from John F. Kennedy University and has added five AI/ML certifications, two from the UC Berkeley (SOI) Google, and two from IBM. Fred is a 10-year veteran of M3AAWG and an Armenian General Benevolent Union (AGBU) Olympic Basketball Champion.
