From segmentation to hyper-personalization… the use of machine learning in marketing communications

The modern consumer is constantly bombarded with messages. Competition for ever-shorter attention spans is fierce, and every possible advantage counts. How can data science help get a brand’s message through to consumers? Learn the details of using machine learning in marketing communications.

What did marketing communications used to look like?

In the early years of the commercial Internet, when the use of e-mail in marketing communications was just beginning, the task was fairly simple. There was not yet much competition, the novelty effect worked, and consumers were more willing to open and read messages. Simply having an address base and executing any mailing at all was enough for success. It was sufficient to follow a simple model, somewhat jokingly referred to as “Everything to Everyone Always” (EEA). The name reflects the tendency to send every prepared communication to the entire available addressee base.

The strategy was based on the assumption that every message sent increases the likelihood of a positive response to some degree, and that the cost of sending each additional message is close to zero. In the case of text messages, cost could of course limit the volume of messages sent. In a great many situations, however, this was dealt with simply by proportionally reducing the volume so as to exhaust, but not exceed, the allocated budget.

From no segmentation to hyper-personalization

The situation began to change as the number of brands using the email and SMS channels grew. Consumers’ inboxes began to “burst at the seams,” and the number of messages received exceeded what anyone could take in. In many cases marketing communications came to be perceived as unwanted (“spam”). The widespread use of the EEA model mentioned earlier also contributed to this, above all the lack of customization of content to the specific consumer and overly frequent mailings. The seemingly zero cost of a mailing (“at most, the customer won’t open it”) did not encourage investing time and resources in targeting precision. What the calculations did not take into account, however, was that the consumer’s reaction to an overabundance of messages would be a gradual resistance to the message and declining interest in opening it.

The first step on the way out was to acknowledge that the consumers in the database are not all the same. They have different needs and characteristics, so the content sent to them should be tailored accordingly. Segmentation of the base began, at first manual, based on expert knowledge and predefined segments. In this approach, the number of usable consumer characteristics and segments was limited by human capabilities.

Growing volumes of data

As the volume and scope of consumer data collected in databases increased, new opportunities opened up. The new data, combined with the growing computing power and sophistication of machine learning algorithms, made it possible to increase the number of features used in segmentation. Segmentation took on a behavioral character. With a large number of analyzed dimensions, it was able to take into account more complex aspects of consumer behavior and their relationship with the brand. The use of machine learning also made it possible to distinguish and analyze more segments. This, in turn, translated into greater consistency of the groups selected and allowed for better tailoring of communications to the audience.

The next step in the described evolution was the use of predictive models to tailor communications at the level of the individual consumer. Previously, segmentation allowed operating at the level of groups of consumers. Different segments may have received different messages at different times, but all consumers belonging to a segment received the same thing. As segmentation grew more sophisticated, the number of segments could increase and consumers could be assigned to them more accurately. Still, the level of generalization remained high, and there was ample room to improve precision. The use of predictive scoring models was a significant step forward: the model predicted the probability of each individual consumer’s interest in the subject of a given communication, which could be, for example, a specific product, a product group or a promotional offer.
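
To make the idea concrete, below is a minimal sketch of such a scoring model in Python. The data file, column names and target label are hypothetical, chosen only for illustration; a real project would select features and a model appropriate to its own data.

```python
# Minimal sketch of a per-consumer propensity ("scoring") model, assuming a
# hypothetical table of behavioral features plus a historical response label.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical columns: past opens/clicks, recency, purchase counts, etc.
features = ["opens_90d", "clicks_90d", "days_since_last_purchase", "orders_12m"]
df = pd.read_csv("consumer_history.csv")  # assumed export from the marketing database

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["responded_to_offer"], test_size=0.2, random_state=42
)

model = GradientBoostingClassifier().fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Probability that each consumer responds positively to this type of communication
df["propensity"] = model.predict_proba(df[features])[:, 1]
```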

Machine learning

Based on consumer behavior data collected in the database, the algorithm learned patterns. Based on these, it was able to predict which of the palette of available messages would most positively influence a particular customer. This ensured that every customer in the database, regardless of their assigned segment, could receive the most appropriate content.

If the model did not forecast sufficiently high consumer interest for any of the planned options, the consumer could be excluded from a given mailing. This allowed a large-scale implementation of the principle that if we don’t have anything interesting to say to you at the moment, we’d better keep quiet. The consumer could then feel that they were getting only news relevant to them. There is no shortage of projects where such predictive models have been implemented, and as a result the recipients’ perception has changed from “stop sending me this spam” to “when will I get the next newsletter”.
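
A hedged sketch of how this selection and exclusion step might look, assuming one propensity score per consumer per content variant (for example, produced by models like the one sketched above); the scores and the threshold are purely illustrative.

```python
# Sketch: pick the most promising content variant per consumer, or skip the
# consumer entirely when no variant clears a minimum predicted interest.
import pandas as pd

# Assumed: one propensity score per consumer per content variant.
scores = pd.DataFrame({
    "consumer_id": [1, 2, 3],
    "variant_a": [0.12, 0.41, 0.03],
    "variant_b": [0.25, 0.18, 0.02],
    "variant_c": [0.08, 0.33, 0.04],
}).set_index("consumer_id")

MIN_INTEREST = 0.10  # hypothetical threshold below which we stay silent

best_variant = scores.idxmax(axis=1)  # most promising content per consumer
best_score = scores.max(axis=1)

plan = pd.DataFrame({"variant": best_variant, "score": best_score})
plan.loc[plan["score"] < MIN_INTEREST, "variant"] = None  # exclude from this mailing
print(plan)
```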

Using machine learning in marketing communications

The scoring models described, combined with a set of additional, even more advanced models, allow personalization to be taken another step further towards so-called hyperpersonalization. This is characterized by, among other things:

  • Fine-tuning the moment of dispatch. Each consumer can be assigned an ideal moment (in terms of time, day of the week, time since the previous mailing, time since the previous site visit, etc.).
  • Selection of the best communication channel in conjunction with the ideal moment of dispatch.
  • Selection of the best combination of channels used (a consumer may, for example, respond best to the combination of an email and a text message sent two days later).
  • The ability to individualize content elements (e.g., choosing the right words in the subject line, selecting the best photo and other graphic elements).
  • Dynamic adaptation of the message to the context of the consumer in real time. A different message and channel will be appropriate when the consumer is in a shopping mall, and another when on the bus on the way to work.

Firstly, it is worth emphasizing that all the above-mentioned personalization takes place on the basis of behavioral patterns that the models have “learned” by observing consumers, not on consumer declarations. A customer may declare that he would like to receive mailings on weekday mornings, while in reality mailings sent to him on Saturday evenings prove most effective.

Secondly, the maximum power of the described solution comes from the synergy of the individual sub-models of the system. The optimal moment for a given consumer will vary from channel to channel. The optimal sequence and combination of channels may depend on the nature of the promotion being communicated. Finally, the optimal message text may depend on the channel and the location of the user.
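
A minimal sketch of that synergy, assuming hypothetical channel- and time-level sub-models whose scores are combined to pick one (channel, send time) pair for a consumer; the function and the numbers inside it are placeholders, not a real system.

```python
# Sketch: combining hypothetical sub-model scores to pick the best
# (channel, send time) pair for one consumer, since the optimum in one
# dimension depends on the other.
import itertools

channels = ["email", "sms", "push"]
send_hours = [8, 12, 18, 21]

def predicted_response(channel, hour):
    # Stand-in for the real sub-models; in practice this would query
    # channel- and time-specific predictive models for a given consumer.
    base = {"email": 0.20, "sms": 0.15, "push": 0.10}[channel]
    time_lift = {8: 1.0, 12: 0.9, 18: 1.2, 21: 1.1}[hour]
    return base * time_lift

best = max(itertools.product(channels, send_hours),
           key=lambda pair: predicted_response(*pair))
print("Best plan:", best)
```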

Summary

Competition for consumers’ attention will increase as more channels of contact emerge and as marketers become more aware of the tools at their disposal. It’s worth asking at what stage of the described evolution our organization is, and what we can do to take the next step, gain an additional advantage and not let the competition pull ahead.

What problems can be solved with Data Science?

Many companies have already successfully implemented smart solutions in their marketing departments, using data and advanced analytics. The use of data science solutions increases the efficiency of operations, reduces costs, optimizes budgets and improves ROI. To remain competitive, other companies need to catch up with the leaders as soon as possible. Where to start? How do you effectively use the data available in your organization? Learn the proven approach of Data Science Logic’s experts.

Customer segmentation

Traditional segmentation approaches are limited in the number of variables they can consider. For example, a typical RFM analysis (recency, frequency, monetary value) uses only three. Machine learning methods can segment consumers along a virtually unlimited number of dimensions, taking into account not only demographic data but also behavioral data related to purchasing behavior and to consumer interactions at various touchpoints (e.g., web, mailings, app). The result of segmentation is a set of “personae” – typical representatives of each segment, whose characteristics make it possible to differentiate the approach and plan optimal actions for them.
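
A minimal sketch of such multi-dimensional segmentation, assuming a hypothetical per-customer feature table; k-means is used here as one possible clustering method, and the column names are illustrative only.

```python
# Sketch: behavioral segmentation with k-means on standardized features,
# going beyond the three classic RFM variables.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("customers.csv")  # assumed per-customer feature table
features = ["recency_days", "frequency_12m", "monetary_12m",
            "web_visits_90d", "email_open_rate", "app_sessions_30d"]

X = StandardScaler().fit_transform(df[features])
df["segment"] = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)

# Segment profiles ("personae"): average feature values per segment
print(df.groupby("segment")[features].mean())
```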

Prediction of customer value

Machine learning models can predict with a high degree of accuracy the value of a customer over the entire lifecycle. This is possible even from the very first, fragmentary data about the consumer’s relationship with the company. Of course, the more data, the more accurate the prediction. Even the initial prediction, however, allows you to decide how much it pays to invest in the relationship with a given customer, so you can focus attention and budget on the most profitable customers.
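
A hedged sketch of such a value prediction, assuming a hypothetical table of early-relationship features and a 12-month revenue target; a random forest regressor stands in here for whatever model a real project would select.

```python
# Sketch: predicting future customer value from early relationship data.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("early_customer_data.csv")  # assumed: features from the first weeks
features = ["first_order_value", "orders_first_30d", "acquired_via_paid_channel",
            "web_visits_first_30d", "signed_up_for_newsletter"]

model = RandomForestRegressor(n_estimators=300, random_state=0)
print(cross_val_score(model, df[features], df["revenue_next_12m"], cv=5).mean())

model.fit(df[features], df["revenue_next_12m"])
df["predicted_clv"] = model.predict(df[features])  # guides how much to invest per customer
```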

Anti-churn measures

Machine learning makes it possible to pinpoint customers at risk of leaving, and thus to identify and prioritize the customers for whom action needs to be taken. Combined with predictive models of customer value, it is possible to make an optimal decision on how much to invest in retaining a given customer (e.g., in the form of a discount). Using direct communication models, it is possible to find the optimal timing, channel and content of an anti-churn message. The model can also identify an offer that is just good enough for a given customer: if a customer decides to stay after receiving a 5% discount, there is no need to offer a 15% discount. The predictive model can thus contribute to significant budget savings.
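
A simplified sketch of combining the two models to size the retention offer; the churn probabilities, customer values, retention uplifts and decision rule below are all illustrative assumptions rather than figures from any real project.

```python
# Sketch: choosing a retention discount by expected net benefit, combining a
# churn model, a customer-value model, and assumed retention uplifts per offer.
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "churn_probability": [0.70, 0.35, 0.05],    # from a churn model
    "predicted_value": [1200.0, 300.0, 900.0],  # from a customer-value model
})

# Hypothetical: how much each discount reduces the probability of leaving.
uplift = {0.00: 0.00, 0.05: 0.15, 0.10: 0.22, 0.15: 0.26}

def best_offer(row):
    def net_benefit(d):
        retained_value = min(uplift[d], row["churn_probability"]) * row["predicted_value"]
        return retained_value - d * row["predicted_value"]  # value saved minus discount cost
    return max(uplift, key=net_benefit)

customers["discount"] = customers.apply(best_offer, axis=1)
print(customers)  # low-risk customers end up with no discount at all
```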

Direct communication planning

Predictive modeling significantly supports the process of preparing and planning marketing communications – not only anti-churn. On the basis of data on consumer interactions with the company, it is possible to predict a given customer’s positive reaction to specific content, an offer, a moment and a dispatch channel. This allows the budget to be optimized (for example, by choosing a cheaper channel if the expected effect is similar) and the consumer experience to be improved: less spam, more tailored content, the most convenient communication channel.

Content analysis and recommendations

Advanced predictive models based on so-called deep learning can process not only numerical data, but also images, text, sound and video. This makes it possible to predict the effect that the content sent in a communication will have on a particular customer, and thus to choose the right content and subject line for an email, along with the optimal layout and graphics.
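
As a hedged illustration, the sketch below scores candidate email subject lines by predicted open probability. A simple bag-of-words model stands in for the deep learning models mentioned above, and the training data columns are assumptions.

```python
# Sketch: scoring candidate subject lines by predicted open probability.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

history = pd.read_csv("past_campaigns.csv")  # assumed: subject line + opened flag
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(history["subject_line"], history["opened"])

candidates = ["Your weekend offer is here", "Last chance: 20% off today"]
for subject, p in zip(candidates, model.predict_proba(candidates)[:, 1]):
    print(f"{p:.2f}  {subject}")
```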

Analysis of the incremental effect of promotion

Promotions and price reductions are an important budget item, so it is not surprising that questions arise about their effectiveness and impact on sales. A simple analysis is often not enough. Reducing the price of a product almost always increases its sales, so just by observing sales dynamics during the promotional period one could conclude that the promotion worked. Meanwhile, it is necessary to answer the question of what sales would have been if the promotion had not taken place. Only by comparing these two values (actual sales and hypothetical sales without the promotion) can the effect of the promotion be assessed. Advanced data science models can estimate baseline sales with a high degree of accuracy, taking into account factors such as seasonality, cannibalization, weather variability and calendar effects.

The examples presented are just some of the problems already being solved with data science, and using the full potential of data can significantly affect a company’s position in the market.

Predictive modeling vs. customer satisfaction measurement

The traditional survey-based approach to measuring customer satisfaction is no longer sufficient. Using the wealth of available data, companies are already breaking through the limitations of available methods. Data-driven customer satisfaction prediction and proactive actions taken based on it are the future of this area.

Customer satisfaction is crucial to maintaining customer loyalty and is the foundation for business growth in a competitive market. Not surprisingly, this topic has long been of interest to marketing professionals. The fruits of this interest are the many methodologies for measuring customer satisfaction described in the literature and used in practice. Among the most popular are the Customer Satisfaction Score, the Customer Effort Score and the Net Promoter Score (NPS). A common feature of the most popular approaches is their reliance on surveys, in which the consumer is asked carefully prepared questions (or even just one question). It should be noted that measuring and analyzing customer satisfaction with such surveys has proven successful in many cases. They have been and continue to be an important element in the market success of many companies. However, these methods are not without their drawbacks. According to a survey conducted by McKinsey & Company, those most frequently mentioned by CS/CX professionals include:

  • limited coverage
  • information delay
  • ambiguity of responses, making it difficult to act on them

Limited coverage

According to the study cited earlier, a typical satisfaction survey collects responses from no more than 7% of a company’s customers. The reasons are multiple: budget constraints (the cost of the survey), the lack of an available channel of communication with the customer or of consent to communication, and low customer interest in answering questions. Importantly, the propensity to respond to a survey may vary depending on certain characteristics or customer experience (e.g., dissatisfied customers may respond more readily). This puts a big question mark over the representativeness of the results obtained and their generalizability to the company’s entire consumer base.

Information delay

Surveys by their very nature lag behind the phenomenon they are investigating, which makes it impossible to take pre-emptive action. We can only act on a dissatisfied customer’s case after he or she has completed the survey. Practitioners in the field of customer satisfaction often emphasize how big a role reaction time plays in problematic experiences. An excellent satisfaction measurement system should operate in near real time. This makes it possible to take immediate action to solve a customer’s problem or erase a bad impression formed in the customer’s mind. Ideally, it should be able to predict a customer’s growing dissatisfaction before it manifests itself in the form of a negative rating in a survey or a nerve-wracking contact with customer service. Such a capability does not exist in survey-only measurement systems.

Additional delays in collecting information result from the limited frequency with which the consumer can be asked questions. Typically, surveys are conducted after a certain stage of the consumer journey, such as after a transaction. Often there is no opportunity to ask questions at earlier steps, where issues can also arise that negatively impact the customer experience. In extreme cases, a negative experience, e.g. when placing an order, may mean the transaction does not happen at all. If we survey only after the transaction, we will get no information in such a situation, because the consumer will never meet the criterion (a completed transaction) for receiving the survey.

Of course, there are companies where survey processes are not subject to such restrictions. These companies conduct surveys after every stage of the customer journey and every time the customer interacts with the company. However, it is important to remember that a survey can be perceived by the customer as an intrusive tool, so an excessive number of surveys can by itself negatively affect the customer’s experience. Hence, it is necessary to maintain a reasonable balance between the frequency of surveys and consumers’ willingness to respond. Better still is a slightly different solution, which we present later in the article.

Ambiguity of responses, making it difficult to act on them

This limitation derives from the desire to maintain a balance between the company’s information needs and consumer satisfaction. It stems both from survey coverage of individual steps in the consumer journey (described above) and from limitations in survey length. Short surveys (one or a few questions) are not very burdensome for the consumer and may also contribute to better response rates. However, reducing a survey to a single question can make it difficult to understand what factors actually led the consumer to rate the product the way they did. If we don’t know what caused that consumer’s negative evaluation, it is difficult to take any action based on the result to correct the bad impression.

Limiting the number of moments in which questions are asked works similarly. The rating given by the consumer (for example, “seven”) after the transaction is completed is difficult to attribute to the different stages of the consumer’s path. Determining which stages caused the consumer to deduct points from the maximum score would require asking a series of in-depth questions, which increases the time it takes to complete the survey and reduces the chance of getting an answer. On the other hand, more frequent surveying (e.g., after each step of the consumer journey) raises the problems described above.

Predictive modeling as a solution

Using machine learning, it is possible to build models capable of predicting consumer satisfaction at any stage of the consumer’s journey toward finalizing a transaction. This, of course, requires integrating data from the many different sources usually present in a company: sales data, loyalty program data, customer service desk and hotline records, website data, financial data and, finally, the satisfaction surveys conducted to date. This data is needed at the individual consumer level. It also requires expertise in data science – a team capable of building and implementing predictive models. It is worth noting that in such a solution we do not abandon surveys completely. However, they change their character: they become a purely research tool and cease to serve as the instrument of ongoing satisfaction measurement.

The system works in such a way that, on the basis of data collected on an ongoing basis about the consumer, it predicts their current satisfaction level at a given moment. Moreover, it also indicates which factors influence this result positively and which negatively. This makes it possible, firstly, to identify the customers for whom action needs to be taken and, secondly, to recommend specific actions to take toward them. Together, this makes it possible to consistently predict customer satisfaction in near real time and take effective action.
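
A minimal sketch of such a system, assuming a hypothetical integrated per-customer table and historical survey scores as the training target; permutation importance is used here as one simple way to indicate which factors drive the prediction.

```python
# Sketch: a satisfaction-prediction model trained on historical survey scores,
# with permutation importance used to indicate the driving factors.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

df = pd.read_csv("integrated_customer_view.csv")  # assumed join of sales, service, web data
features = ["complaints_90d", "delivery_delays_90d", "hotline_wait_minutes",
            "web_errors_30d", "orders_12m", "days_since_last_purchase"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["last_survey_score"], test_size=0.2, random_state=0
)
model = GradientBoostingRegressor().fit(X_train, y_train)

# Predicted "current" satisfaction for every customer, surveyed or not
df["predicted_satisfaction"] = model.predict(df[features])

# Which factors move the prediction the most
imp = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
print(pd.Series(imp.importances_mean, index=features).sort_values(ascending=False))
```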

A predictive approach to customer satisfaction is certainly the future. Companies that are the first in their market to implement this type of solution will win the battle for the customer. Even if some service “mishap” happens (and in large organizations this is inevitable), they will be able to respond to it appropriately and quickly. Thanks to an efficient system based on prediction, the reaction can be so fast that the consumer won’t even have time to think about looking for a competing supplier.

Growth or cannibalization? Success or failure of promotion?

Promoting a product, especially when it involves a price reduction, almost always results in an increase in sales. But does every increase mean that the promotion was profitable? How often does the promoted product take customers away from other, substitutable products? How can the actual incremental effect of the action be calculated?

The fundamental question for solving the problems identified in the introduction is: what if… Or more precisely: what if there had been no promotion? What would sales of the promoted product have been? How much would other products (especially substitutes for the promoted one) have sold? On the surface, this seems impossible to determine. After all, we are asking about an alternative reality that we cannot observe: it is impossible to introduce a promotion and not introduce it at the same time. However, it turns out that, based on advances in statistics, data science and artificial intelligence research in recent years, we are able to estimate these effects in a scientific, methodical and rigorous manner. The method used is based on so-called synthetic control groups – that is, to put it somewhat simply, comparison groups created by a special algorithm on the basis of available observations of sales of similar products.
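
A simplified sketch of the idea is shown below, assuming a hypothetical daily sales table with the promoted product and a set of similar, non-promoted “control” products; real synthetic-control implementations add weight constraints and diagnostics beyond this.

```python
# Simplified sketch of the synthetic-control idea: learn non-negative weights on
# similar (non-promoted) products during the calibration period, then use them
# to project the promoted product's "no promotion" baseline.
import pandas as pd
from sklearn.linear_model import LinearRegression

sales = pd.read_csv("daily_sales.csv", parse_dates=["date"])  # assumed wide table
promo_start = pd.Timestamp("2021-03-15")                      # hypothetical start date

control_cols = [c for c in sales.columns if c.startswith("control_")]
pre = sales["date"] < promo_start

# Fit weights on the calibration (pre-promotion) period only
model = LinearRegression(positive=True, fit_intercept=False)
model.fit(sales.loc[pre, control_cols], sales.loc[pre, "promoted_product"])

# Counterfactual baseline for the whole period, and the daily incremental effect
sales["baseline"] = model.predict(sales[control_cols])
sales["incremental"] = sales["promoted_product"] - sales["baseline"]
print(sales.loc[~pre, "incremental"].sum())  # total incremental units during the promotion
```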

We will analyze this using the example illustrated in the chart below. The red line shows the actual sales of the product (in units). Before March, sales of the product were marginal. A clear weekly cycle is visible, with peaks on Saturdays and clear declines on Sundays (related to the trade ban and the limited number of outlets allowed to sell). An upward trend in sales since the beginning of March can also be seen. The black vertical dashed line marks the day the promotion started, when the price of the product was significantly reduced. A clear increase in sales can be seen.

[Chart: actual sales of the promoted product vs. the baseline estimated without the promotion]

The light blue dashed line shows the algorithm’s estimate of how the promoted product would have sold had there been no promotion (the alternative reality). It can be seen that even without the promotion there would have been an increase in sales (in line with the upward trend visible since the beginning of March), but it would not have been as large. It can therefore be concluded that the promotion generated additional sales of the product.

The next chart shows a summary of incremental sales for each day. As in the earlier chart, the black vertical dashed line marks the beginning of the promotion period. Most of the bars are clearly above zero, indicating an estimated increase in sales relative to the base scenario (i.e., no promotion). The period preceding the promotion, to the left of the dashed line, is the calibration period. Based on this period, the algorithm learns the best combination of products that make up the comparison group (the so-called synthetic control group). The closer the bars in the calibration period are to zero, the better the comparison group is matched. Of course, in real-world examples (such as the one presented in this article) it is difficult to find a perfect match, hence the bars deviate slightly from zero. What is important, however, is that these deviations are much smaller in the calibration period than during the promotion. This lends credence to the conclusion of a real, positive effect of the promotion on sales.

[Chart: estimated daily incremental sales (promotion effect)]

At this point, one could close the analysis and congratulate those responsible for the promotion. The question arises, however, to what extent the promotion attracted new customers or increased demand from existing ones, and to what extent it merely shifted demand from other, substitutable products that were not promoted during the period. In other words, to what extent did the promotion cannibalize sales of other products?

To answer this question, we will conduct a similar analysis to the one presented above. This time, however, the red line will represent sales of a substitute product to the promoted product. For this particular product, we want to estimate the cannibalization effect.

[Chart: sales of the substitute product vs. its estimated no-promotion baseline]

As before, the vertical black dashed line marks the date of the start of the action. After the start of the promotion, the red line is lower than the light blue dashed line, which means that the substitute product sells less than it would have in the baseline scenario with no promotion. It is also worth noting that in the period before the promotion (the calibration period) the two lines are very close to each other, which means that the algorithm has calibrated the comparison group correctly.

The chart below summarizes the effect of cannibalization by day. You can see that on each day of the promotional period the product sold less than it would have had the promotion not taken place.

In the case analyzed, the incremental sales of the promoted product amounted to 407 units. However, when evaluating a promotional action, one must take into account the effect of cannibalization. Here, the loss of sales of a product substitutable for the promoted one amounted to 326 units during the promotional period. Without taking this factor into account, we could significantly overestimate the financial effect of the action and draw incorrect conclusions about its profitability. This, in turn, could translate into suboptimal decisions on organizing similar promotions in the future.
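
In unit terms, and ignoring differences in price and margin between the two products, the net effect in this example works out to roughly 407 − 326 = 81 incremental units.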

The best way to measure effects is to conduct a randomized experiment. However, this is not always possible. It is difficult to imagine how to run a promotion and not run it at the same time and for the same group of consumers. In such situations, modern analytical methods based on synthetic control groups, among others, such as the one presented in today’s article, can be invaluable in marketing analysis.