
Email Recommendation Engine for ESPs – Text Length Optimization (Part II)

Popular email editors today have no way to optimize for text length. An email marketer may build her content with no idea whether its word count is optimized for a specific industry or client. In terms of text length alone: does the email have too many words, or too few? Currently, no built-in predictive model informs her. Well, until now. Last month we described an evolutionary, real-time, data-driven process that email campaign builders could have at their disposal: the ability to optimize their campaigns in real time through an in-session recommendation engine that works on a per-module basis. Let’s call it a per-widget basis for now, although that term may soon be challenged.

In Part I, we analyzed a general concept whereby an email campaign builder can select and optimize a specific “template” type at the outset of building a campaign by using 3-5 classifiers.

In this series, we describe how to use machine learning to optimize two very widely used widgets in an email editor: the text widget and the image widget. The editor offers several widgets that the marketer needs to build a campaign; this post focuses on text, and Part III covers images.

The text optimization model provides predictive analytics on the body of the email, specifically its word count. Given a chosen target variable (for example, CTR), the model recommends the word-count bin with the highest predicted probability of success for the selected target variable, campaign type, and industry. Keep in mind that a marketer must choose the target variable at the outset of the campaign to use this model optimally.
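To make the recommendation step concrete, here is a minimal Python sketch, assuming a trained classifier already exists. `score` is a hypothetical callable standing in for that model's predicted probability of the target (e.g. a click) for one feature row; the feature names and lowercase bin labels are our own illustrative choices. The sketch holds campaign type and industry fixed, scores every length bin, and returns the best one.

```python
from typing import Callable, Dict, List

# Word-count bins used in this post (labels are illustrative).
LENGTH_BINS: List[str] = ["small", "medium", "long", "longest"]

# Hypothetical model interface: maps one feature row to P(target = 1).
Scorer = Callable[[Dict[str, str]], float]


def recommend_length_bin(score: Scorer, campaign_type: str, industry: str) -> str:
    """Return the length bin with the highest predicted probability of the
    target, holding campaign type and industry fixed."""

    def row_for(length_bin: str) -> Dict[str, str]:
        return {
            "campaign_type": campaign_type,
            "industry": industry,
            "length_binned": length_bin,
        }

    # max() keeps the earliest bin in LENGTH_BINS if two bins tie.
    return max(LENGTH_BINS, key=lambda b: score(row_for(b)))


# Usage with a toy scorer (made-up values) in place of a real model.
toy_probs = {"small": 0.08, "medium": 0.11, "long": 0.12, "longest": 0.10}
print(recommend_length_bin(lambda row: toy_probs[row["length_binned"]],
                           "Promotional", "Real_Estate"))  # long
```

Because the scorer is just a callable, the same loop works whether the probabilities come from a Random Forest, a gradient-boosted model, or a neural network.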

The Dataset used in this Model

The dataset used in this analysis comes from the UCI Machine Learning Repository. In our Google Colab notebook (images below), we created synthetic data for many additional features typical of an email campaign: open rate, CTR, unsubscribes, campaign type, industry, deliverability rate, etc. For brevity, we created only a few campaign types. Within the email industry, however, you could use more specific categories, such as webinar emails, special-offer emails, or welcome emails as an opener.
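The post does not say how the synthetic values were drawn. The sketch below assumes uniform random sampling within the ranges given in the metadata list further down, and reuses the campaign types and industries listed there; deliverability rate is left out for brevity, and the field names are our own.

```python
import random
from typing import Dict, List

# Category values taken from the feature list in this post.
CAMPAIGN_TYPES = ["Survey", "Promotional", "No_Opener", "Revenue_Based",
                  "engagement_campaign", "abandoned_cart"]
INDUSTRIES = ["Automotive", "Industrial", "Real_Estate", "Hospitality", "Medical"]


def add_synthetic_features(emails: List[str], seed: int = 0) -> List[Dict[str, object]]:
    """Attach uniformly random synthetic campaign features to each email body.

    Uniform sampling is an assumption; these values carry no real-world signal.
    """
    rng = random.Random(seed)  # seeded so the synthetic data is reproducible
    rows: List[Dict[str, object]] = []
    for text in emails:
        num_videos = rng.randint(0, 3)
        rows.append({
            "text": text,
            "campaign_type": rng.choice(CAMPAIGN_TYPES),
            "industry": rng.choice(INDUSTRIES),
            "ctr": rng.randint(0, 1),             # binary
            "open_rate": rng.randint(0, 1),       # binary
            "unsubscribed": rng.randint(0, 1),    # binary
            "abandoned_cart": rng.randint(0, 1),  # binary
            "num_pics": rng.randint(0, 6),
            "num_videos": num_videos,
            # Assumption: an email with no videos has zero video length.
            "video_length": rng.randint(0, 240) if num_videos else 0,
        })
    return rows
```

Any pattern a model "learns" from features generated this way is noise, so results on such data demonstrate the pipeline rather than real subscriber behavior.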

The Model Association: Using multiple variables and a target variable

In most studies, building multiple regression models is the final stage of data analysis. These models can contain many variables that operate independently or in concert to explain the variation in the target variable. As a simple example, both gender and education level can help predict when a person has a child. Our model likewise uses several variables for increased accuracy: we predict the target variable (CTR) of an email for a particular industry (real estate) using a third variable, which we call “campaign type” (or, in some terminology, “email type”). The model can then better understand and predict the association between these variables.
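Many common classifier implementations expect numeric inputs, so categorical variables such as campaign type, industry, and length bin are often one-hot encoded before they are combined in a single model. The post does not describe its encoding step; this is a minimal sketch with field names of our own choosing.

```python
from typing import Dict, List, Sequence


def one_hot(row: Dict[str, str], vocab: Dict[str, Sequence[str]]) -> List[int]:
    """Encode categorical fields as a flat 0/1 vector.

    `vocab` fixes both the feature order and the allowed categories; a value
    missing from the vocabulary encodes as all zeros for that feature.
    """
    vector: List[int] = []
    for feature, categories in vocab.items():
        value = row[feature]
        vector.extend(1 if value == category else 0 for category in categories)
    return vector


vocab = {
    "campaign_type": ["Promotional", "Survey"],
    "industry": ["Real_Estate", "Medical"],
    "length_binned": ["small", "medium", "long", "longest"],
}
print(one_hot({"campaign_type": "Promotional", "industry": "Real_Estate",
               "length_binned": "long"}, vocab))  # [1, 0, 1, 0, 0, 0, 1, 0]
```

Keeping one fixed `vocab` for training and for in-session scoring ensures every row maps to the same column layout.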

The algorithm used in this model is a Random Forest Classifier, a popular ensemble method; other recommended models include Gradient Boosted Classifiers and Neural Networks. We will dive into Convolutional Neural Networks in Part III of this series, where we optimize images in an email. For increased accuracy, we also experimented with other ensembles and corrected imbalances in the dataset; no additional datasets were introduced. As a reminder, the model optimizes a single widget using multiple variables; in most high-volume email editors, this is called the “text” widget, where you compose the text content. We will get into optimizing buttons and CTAs later in this series.
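The post does not say how the class imbalances were corrected. Random oversampling of the minority class (for example, if clicked emails are rarer than non-clicked ones) is one common option; here is a stdlib-only sketch of that technique, not necessarily the one used in our notebook.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(rows: Sequence[T], labels: Sequence[Hashable],
                      seed: int = 0) -> Tuple[List[T], List[Hashable]]:
    """Duplicate randomly chosen rows of each smaller class until every
    class is as large as the biggest one."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[T]] = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max((len(members) for members in by_class.values()), default=0)
    out_rows: List[T] = []
    out_labels: List[Hashable] = []
    for label, members in by_class.items():
        # Sample with replacement from the existing rows of this class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels
```

Oversampling should be applied to the training split only; duplicating rows before the train/test split would let copies of the same email appear on both sides and inflate measured accuracy.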

Moreover, our AI assistant for email campaigns, “Marlowe,” can be switched “on” for every widget or only for some. The end user has full control over each widget and decides whether to allow AI optimization between widgets. Your client can choose to build a campaign with the AI assistant on or off.

It is safe to assume that substantial compute power is needed to run these types of algorithms, especially the deep learning models planned for later parts of this series. You should also assume that thousands of clients may use the engine simultaneously, whether on one widget or across every widget they need.

The Meta Data Used for Context in this Model

Here is some of the metadata used to create this model. As models mature, metadata helps explain the reasoning behind them, and a new company just starting out does not yet have the reputation of an Airbnb. So you need to use the metadata to explore relationships within the dataset and provide explainability. Why did the model choose an “olive” call-to-action button when I chose red? Here is the metadata we used in this model for text optimization:

  • ‘Email label’: the spam classification from the original dataset
  • ‘Email text’: the body of the email from the original dataset
  • ‘Email Text length’: the word count of the email body
  • ‘Email Title’: the first sentence of the email
  • ‘Email title_length’: the word count of the title sentence
  • ‘Email CTR’: synthetic binary value
  • ‘abandoned_cart’: synthetic binary value
  • ‘Email unsubscribed’: synthetic binary value
  • ‘Email open_rate’: synthetic binary value
  • ‘Email num_pics’: synthetic integer, 0-6
  • ‘Email num_videos’: synthetic integer, 0-3
  • ‘Email video_length’: synthetic integer, 0-240 (length in seconds)
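The three text-derived fields above (text length, title, and title length) can be computed directly from the email body. The post does not say how words or sentences were split, so this sketch assumes whitespace tokenization and treats the first run of text ending in ".", "!" or "?" followed by whitespace (or the end of the text) as the title; that simple rule will cut titles short at abbreviations such as "Dr.".

```python
import re
from typing import Dict, Union

# First sentence: shortest prefix ending in ., ! or ? that is followed by
# whitespace or the end of the text.
_FIRST_SENTENCE = re.compile(r"\s*(.+?[.!?])(?=\s|$)", re.DOTALL)


def text_features(body: str) -> Dict[str, Union[int, str]]:
    """Derive 'Email Text length', 'Email Title' and 'Email title_length'."""
    match = _FIRST_SENTENCE.match(body)
    # With no sentence-ending punctuation, fall back to the whole body.
    title = match.group(1) if match else body.strip()
    return {
        "text_length": len(body.split()),  # whitespace-delimited word count
        "title": title,
        "title_length": len(title.split()),
    }


print(text_features("Big sale today! Homes near you are 10% off. Call now."))
```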

Some feature engineering was applied to add synthetic data for demonstrating the recommendation engine. The engineered features include:

  • ‘Email campaign_type’: ‘Survey’, ‘Promotional’, ‘No_Opener’, ‘Revenue_Based’, ‘engagement_campaign’, ‘abandoned_cart’
  • ‘Industry Type’: ‘Automotive’, ‘Industrial’, ‘Real_Estate’, ‘Hospitality’, ‘Medical’
  • ‘Length_binned’: this variable represents length bins of the “text” body

For additional explainability in the recommender system, we wanted to explain to campaign builders why the model chose a specific length. The word count of each email was therefore binned into four categories:

  • Small: 0-10 words
  • Medium: 11-20 words
  • Long: 21-40 words
  • Longest: 41-200 words
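The binning rule above is small enough to sketch directly. The bins are inclusive on both ends as listed; the post defines no bin above 200 words, so this sketch returns `None` there rather than guessing.

```python
from typing import Optional

# (inclusive upper bound, label) pairs matching the four bins listed above.
_BINS = [(10, "small"), (20, "medium"), (40, "long"), (200, "longest")]


def length_bin(word_count: int) -> Optional[str]:
    """Map a word count to its length bin, or None above 200 words
    (no bin is defined for longer emails)."""
    if word_count < 0:
        raise ValueError("word count cannot be negative")
    for upper, label in _BINS:
        if word_count <= upper:
            return label
    return None


print([length_bin(n) for n in (8, 15, 30, 120)])
```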

The target variable and predictors used in this example are:

1: Target variable = CTR
2: Campaign type = Promotional
3: Industry = Real Estate
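A simple baseline to set beside the classifier's recommendation for this configuration is the observed CTR per length bin among past emails with the same campaign type and industry. This is a frequency count, not the Random Forest used in this post, and the field names and toy rows below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def ctr_by_bin(rows: Iterable[Mapping[str, object]],
               campaign_type: str, industry: str) -> Dict[str, float]:
    """Observed click-through rate per length bin, restricted to one
    campaign type and one industry. Expects a binary 'ctr' field per email."""
    sends: Dict[str, int] = defaultdict(int)
    clicks: Dict[str, int] = defaultdict(int)
    for row in rows:
        if row["campaign_type"] == campaign_type and row["industry"] == industry:
            length_bin = str(row["length_binned"])
            sends[length_bin] += 1
            clicks[length_bin] += int(row["ctr"])
    return {b: clicks[b] / sends[b] for b in sends}


rows = [
    {"campaign_type": "Promotional", "industry": "Real_Estate", "length_binned": "long", "ctr": 1},
    {"campaign_type": "Promotional", "industry": "Real_Estate", "length_binned": "long", "ctr": 0},
    {"campaign_type": "Promotional", "industry": "Real_Estate", "length_binned": "medium", "ctr": 0},
    {"campaign_type": "Promotional", "industry": "Automotive", "length_binned": "medium", "ctr": 1},
]
rates = ctr_by_bin(rows, "Promotional", "Real_Estate")
print(rates, max(rates, key=rates.get))
```

If no past emails match the chosen campaign type and industry, `rates` is empty and `max` raises `ValueError`; a model-based recommendation is most useful exactly in those sparse cases.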

Once these parameters are selected at the outset of the campaign, the campaign builder can go about fine-tuning and optimizing the text length.

Model Outcome:

Based on this dataset from UC Irvine, Real Estate professionals seem to prefer longer emails, although 21-40 words isn’t especially long (and the result may reflect bias in the data). In this example, increasing the word count to the long bin yields a 12% CTR, as opposed to an 11% CTR for the medium bin (11-20 words). A one-percentage-point difference in CTR might not sound like much, but with 5-20M subscribers it makes quite a difference to your business approach and strategy for promotional campaigns, or for any other reasonable tactic in email or other channels.
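The arithmetic behind that claim is simple. This sketch assumes CTR is measured as clicks per recipient and that every subscriber receives the email; the post does not state the CTR denominator.

```python
def incremental_clicks(recipients: int, baseline_ctr: float, new_ctr: float) -> int:
    """Extra clicks from raising CTR from baseline_ctr to new_ctr, rounded
    to a whole number to absorb floating-point error."""
    return round(recipients * (new_ctr - baseline_ctr))


for subscribers in (5_000_000, 20_000_000):
    print(subscribers, incremental_clicks(subscribers, 0.11, 0.12))
```

Under those assumptions, moving from 11% to 12% CTR means roughly 50,000 extra clicks per send at 5M subscribers and 200,000 at 20M.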

Pain Points Resolved

The additional insights gleaned from just this one widget say so much about your customer and the industry. But before diving in too deep, let’s figure out what we just solved. No longer will you need to A/B split test an email; the model already does this for you. Imagine the time savings. No more waiting for the content team to approve your content; the model also saves you and the company loads of time. What else? How about if I told you that you’d never have to worry about list segmentation again? Let’s wait till Part III for that.

So, sticking with CTR as the target variable: if you are building email campaigns for a client in the Real Estate industry, the model recommends a longer text body of 21-40 words. Keep in mind that the recommendation could differ greatly if the campaign builder chose a different target variable.

In Part III of this five-part series, I will discuss optimizing images using a convolutional neural network trained on a Stanford dataset of over 16,000 vehicle images. Optimizing images in an email is a blast. You can do all sorts of fun things in real time. If it’s a product-based email, have you ever wondered whether a 3D image would convert better than a 2D image? How about augmented images of the product? Do they convert better? Maybe?

See you next time. In Part III we’ll optimize images, in Part IV we’ll optimize CTAs, and in Part V we will introduce sentiment analysis for email.

By Fred Tabsharani, Founder and CEO at Loxz Digital Group

Fred Tabsharani is Founder and CEO of Loxz Digital Group, a machine learning collective with an 18-member team. He has spent the last 15 years as a globally recognized digital growth leader. He holds an MBA from John F. Kennedy University and has earned five AI/ML certifications: two from UC Berkeley (SOI), one from Google, and two from IBM. Fred is a 10-year veteran of M3AAWG and an Armenian General Benevolent Union (AGBU) Olympic Basketball Champion.
