Attribution modeling – the key to understanding the effectiveness of marketing activities

As the number of communication channels and brand touchpoints grows, correctly identifying the importance and impact of each channel matters more and more. Answering the question "to what extent did a given message and channel contribute to achieving a goal?" is crucial for optimizing activities and maximizing the return on the marketing budget. The problem is as important as it is difficult, and this is where attribution modeling and data science methods come to the rescue.

What is attribution modeling?

Attribution modeling is the process of building a model that assigns value to each of the touchpoints along a customer’s conversion path. It aims to understand which marketing channels and activities contribute to achieving business goals, such as making a sale, acquiring a new customer, activating dormant customers, recruiting new loyalty program participants or increasing brand awareness. The term attribution model covers many different constructs, from very (too) simple to very complex. In general, models can be divided into single-point, rule-based multi-point and algorithmic multi-point models.

Single-point modeling

Single-point models allocate the entire value of a conversion (or, more broadly, goal achievement) to only one point of contact. Typical approaches are first-click and last-click. These are simplistic models: they take into account neither the entire customer conversion path, nor the interactions between different points of contact, nor the context in which those contacts take place. Their advantage is simplicity and ease of application. However, in the complex world of today’s marketing, they are too simple to reliably reflect reality.
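To make the limitation concrete, here is a minimal sketch of the two single-point rules. The function names and the example path are illustrative assumptions, not part of any specific tool.

```python
def first_click(path, conversion_value):
    """Give the whole conversion value to the first touchpoint on the path."""
    return {path[0]: conversion_value}


def last_click(path, conversion_value):
    """Give the whole conversion value to the last touchpoint on the path."""
    return {path[-1]: conversion_value}


# Hypothetical conversion path, in the order the customer met the touchpoints.
path = ["banner", "search", "email"]
print(first_click(path, 100.0))  # all credit goes to "banner"
print(last_click(path, 100.0))   # all credit goes to "email"
```

The same path yields two contradictory answers, and the middle touchpoint gets nothing in either case.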

Rule-based multi-point models

Multi-point models distribute value among different touch points along the customer path. At the same time, they are divided into rule-based and algorithmic models. The former assign value to individual contacts based on predefined rules. For example:

  • linear model – assigns equal value to each contact point encountered by the consumer on the path to conversion;
  • U-shaped model – assigns the greatest value to the first and last points of contact; intermediate points are of lesser (though non-zero) importance in this model;
  • time-decay model (based on time to conversion) – assigns greater value the closer the contact was to the moment of conversion. In this model, the greatest weight is assigned to the last contact immediately preceding the conversion.

The advantages of rule-based models are their clarity, their relative simplicity, and the fact that they do not omit any touchpoints on the path to conversion. However, their weights are set by arbitrary rules. A justification can be found for each of them, but it is impossible to say which one is best. As with single-point models, they also take into account neither the interactions between different points of contact nor the context.
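The three rules listed above can be sketched in a few lines. The 40% share for the first and last contact and the 7-day half-life are assumptions chosen for illustration; as the text notes, any such parameter is arbitrary.

```python
def linear_weights(n_touchpoints):
    """Equal weight for every touchpoint."""
    return [1.0 / n_touchpoints] * n_touchpoints


def u_shaped_weights(n_touchpoints, end_share=0.4):
    """First and last touchpoints get `end_share` each (assumed 40%);
    the remainder is split evenly among the intermediate touchpoints."""
    if n_touchpoints == 1:
        return [1.0]
    if n_touchpoints == 2:
        return [0.5, 0.5]
    middle = (1.0 - 2 * end_share) / (n_touchpoints - 2)
    return [end_share] + [middle] * (n_touchpoints - 2) + [end_share]


def time_decay_weights(days_before_conversion, half_life_days=7.0):
    """Weight halves for every `half_life_days` (assumed 7) before conversion."""
    raw = [0.5 ** (d / half_life_days) for d in days_before_conversion]
    total = sum(raw)
    return [r / total for r in raw]


def attribute(path, weights, conversion_value):
    """Split the conversion value across channels according to the weights."""
    credit = {}
    for channel, weight in zip(path, weights):
        credit[channel] = credit.get(channel, 0.0) + weight * conversion_value
    return credit


path = ["banner", "search", "email"]          # hypothetical path
print(attribute(path, linear_weights(3), 100.0))
print(attribute(path, u_shaped_weights(3), 100.0))
print(attribute(path, time_decay_weights([14, 7, 0]), 100.0))
```

Running the three rules on the same path gives three different splits of the same conversion value, which illustrates why none of them can be called the correct one.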

Algorithmic multi-point models

Algorithmic models, like rule-based models, assign a weight to each touchpoint along the customer path. However, instead of arbitrary rules, they use sophisticated statistical methods to determine these weights. So instead of adopting predefined rules, these models “learn rules” from real data (using machine learning methods). Such models take into account the order of contact points and interactions between them. For example, the impact of an email on conversion may be greater when it was preceded by a banner display. They also take into account context, e.g. time of year, weather, media activity of competitors, pricing. They can operate at a very detailed level, e.g. distinguish the impact of individual creative variants or where and when they are displayed.

It is hard not to agree with the statement that today these types of models are the “gold standard”. Only they make it possible to take into account the entire complexity of consumer-brand contact paths. However, the accuracy and benefits of algorithmic models come with challenges, in particular regarding the quantity, quality and scope of the data and the analytical competence required to create them. They also have the disadvantage of limited transparency, because the rules that govern reality and are identified by the model are complex. On the other hand, algorithmic models allow for advanced “what if?” simulation of various scenarios, e.g. what if we dropped channel A altogether? What if we reduced the budget for channel B? What if we switched the order of messages in the sequence? This in turn allows you to optimize your budget and activities. The investment in this type of model can therefore more than pay for itself.
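The “what if we dropped channel A altogether?” question can be illustrated with a deliberately crude removal-effect heuristic. This is a sketch on made-up journeys, not a trained algorithmic model: it simply assumes that every converting journey that touched the removed channel would have been lost, and it ignores order, interactions and context, which a real model would estimate from data.

```python
# Hypothetical journeys: (ordered touchpoints, did the customer convert?)
JOURNEYS = [
    (["banner", "email"], True),
    (["email"], False),
    (["banner", "search", "email"], True),
    (["search"], True),
    (["banner"], False),
    (["banner", "email"], True),
]


def conversions(journeys, removed=None):
    """Count conversions, assuming journeys that touched `removed` are lost."""
    return sum(1 for path, converted in journeys
               if converted and removed not in path)


def removal_effects(journeys):
    """Share of conversions lost when each channel is dropped in turn."""
    base = conversions(journeys)
    channels = sorted({channel for path, _ in journeys for channel in path})
    return {c: (base - conversions(journeys, c)) / base for c in channels}


def attribution_weights(journeys):
    """Normalize removal effects so that the channel weights sum to 1."""
    effects = removal_effects(journeys)
    total = sum(effects.values())
    return {c: effect / total for c, effect in effects.items()}


print(removal_effects(JOURNEYS))
print(attribution_weights(JOURNEYS))
```

Unlike the rule-based weights, these numbers come from the data: change the journeys and the weights change with them.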

Summary

Marketing attribution models have undergone a long evolution from simple single-point models to multi-point models based on complex machine learning algorithms and statistical methods (including those based on deep artificial neural networks). Attribution remains an area of intensive research and experimentation, both in the scientific community and among practitioners. Despite the complexities and challenges of creating and applying them, these models are increasingly accessible thanks to the falling costs of data collection and processing. Thus, we are entering an era where we should not ask “whether” they are worth using, but “how” to build and use them effectively.

Predicting sales in unpredictable times. Why is it important to forecast not only sales but also demand?

The unstable economic situation is making it increasingly difficult to keep a retail business profitable. To do so, retailers must be able to predict the future with some accuracy. Sales forecasting and demand forecasting are therefore becoming two key aspects of business planning.

Sales prediction or demand prediction – which to choose?

The terms sales prediction and demand prediction are sometimes used interchangeably. However, there is a fundamental difference between them. What does this difference refer to and which prediction should we particularly focus on? That’s what we’ll discuss in today’s article.

To begin with, it is worth taking a moment to recall the relationship between the key terms demand, sales and supply. Demand is the amount of products or services that customers would like to purchase in a given period. Sales, on the other hand, is the amount of products or services that were actually sold during that period. Supply is the amount of products and services available during a given period. For sales to occur, there must be a supply capable of meeting demand. There are no sales when there is no demand, but there are also no sales when there is demand and no supply to meet it. Generally, therefore, we can deal with three situations:

  1. Demand = supply
    Ideal situation: customers are satisfied with the ability to meet their needs, and the company is satisfied because it sells all available inventory.
  2. Demand > supply
    Not all customers are able to satisfy their needs, while the company bears the cost of lost potential sales. Such a situation arises, for example, when there is a shortage of a particular commodity in the warehouse or on the store shelf at the time when the consumer would like to purchase it. In a competitive market, the customer can then buy a substitute product/service from a competitor.
  3. Demand < supply
    An unfavorable situation for a company that has frozen money in merchandise lingering on the shelves, loses the ability to use store space and logistical resources to supply products in demand, and runs the risk of losing the value of the product altogether (e.g., as a result of exceeding the expiration date).
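The three situations above follow from one simple relationship: sales cannot exceed either demand or supply. A minimal sketch, with illustrative function and field names:

```python
def period_outcome(demand, supply):
    """Sales are capped by both demand and supply in a given period."""
    sales = min(demand, supply)
    return {
        "sales": sales,
        "lost_sales": demand - sales,    # unmet demand (situation 2)
        "excess_stock": supply - sales,  # unsold goods (situation 3)
    }


print(period_outcome(demand=100, supply=100))  # situation 1
print(period_outcome(demand=100, supply=70))   # situation 2
print(period_outcome(demand=100, supply=130))  # situation 3
```

Note that recorded sales are identical to demand only in situations 1 and 3; in situation 2 the sales figure hides part of the demand.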

Accurate demand prediction avoids situations 2 and 3, or at least minimizes their scale and associated costs. At the same time, we can identify 5 areas where demand prediction brings benefits.

Benefits of demand forecasting

  • Optimization of production and inventory
    With accurate demand prediction, a company can better predict how much product it will need for a given period. This allows it to optimize production processes and control inventory levels.
  • Increase sales
    Ensuring the right amount of products in stock allows the company to increase its sales and customer satisfaction.
  • Better planning of marketing campaigns
    By having an accurate prediction of demand, a company can better plan which products (or product categories) and during what period it pays to promote.
  • Optimization of prices
    With demand prediction and knowledge of inventory, a company can optimize the price of a product to balance demand with supply and maximize profit.
  • Cost reduction
    With accurate demand prediction, a company can avoid the costs of excess inventory and unnecessary logistics costs.

Sales prediction vs. demand prediction – differences

However, what if we prepare a sales prediction instead of a demand prediction? In such a situation, we risk underestimating demand. As we have already noted, sales occur when demand meets supply. When supply is insufficient (lack of goods), demand will not be met and sales will be lower than they could be. In the extreme case of a total lack of goods on the shelf, sales will be 0. A predictive sales model can correctly predict the lack of sales in such a case. However, using such a model to decide on the right product inventory will result in underestimation and loss of potential sales. To make matters worse, the measured accuracy of such a model can be very high. This is because we may be dealing with a self-fulfilling prophecy:

No goods → zero sales → model predicts no sales in the next period →
decision to not supply the product (since no sales are assumed) → no goods.

And the circle closes.
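The loop can be reproduced with a toy simulation. The assumptions here are purely illustrative: true demand is constant, the “model” simply predicts that next period's sales equal the last observed sales, and stock is replenished to exactly the forecast level.

```python
def simulate(true_demand, initial_stock, periods):
    """Return a list of (sales, forecast) pairs for consecutive periods."""
    stock = initial_stock
    history = []
    for _ in range(periods):
        sales = min(true_demand, stock)  # sales are capped by available stock
        forecast = sales                 # naive sales model: repeat last sales
        stock = forecast                 # replenish to the forecast level
        history.append((sales, forecast))
    return history


# No goods on the shelf: the forecast of zero is "perfectly accurate" forever.
print(simulate(true_demand=10, initial_stock=0, periods=3))
# Partial stock: sales stay stuck at 6 although customers want 10 units.
print(simulate(true_demand=10, initial_stock=6, periods=3))
```

In both runs the forecast matches realized sales exactly, so the error metrics look excellent, while the unmet demand never shows up in the data.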

This is a potentially costly mistake at the model conception stage and a trap into which companies sometimes fall. Meanwhile, machine learning methods make it possible to build and train predictive models capable of predicting demand (and not just sales). Such models take into account a number of different factors influencing demand (including seasonality, price, weather, promotions) and can operate at any level of aggregation (product group/single product, region/store group/single store, etc.).

Summary

Accurate demand prediction is the key to success. It allows you to reduce costs, increase sales and improve customer satisfaction. However, these benefits can only be provided by the right selection of data science methods suitable for solving this kind of problem.

Where should I look for customers?

A customer base is an important asset for any business. The data collected about customers allows better targeting of communications and preparation of more tailored offers. However, a healthy business needs a steady stream of new customers, and there is usually little or no data on them. Where should we look for them? And can data science help in reaching them?

The question posed above is best answered with an example. Some time ago, one company wanted to significantly expand the customer base buying its flagship product. Experience suggested that this product appealed to a completely different group of consumers than the company’s typical customer. An advertising campaign using billboards and flyers was planned. With a limited budget, however, the company did not want to “flood” the entire city and surrounding areas in which it operates with materials. It intended to focus its efforts and budget in locations with the highest probability of high return on investment.

The first idea on how to use data to solve this problem was to see where the current customers purchasing the product were coming from. Their address data was in the database thanks to the loyalty program in place. An analysis of the demographic and behavioral profile of customers buying the flagship item, the subject of the campaign, was conducted. Compared to typical customers, this group was characterized by an overrepresentation of the 30-35 age group by more than 10 percentage points, a higher proportion of men and higher income. It was assumed that areas with an above-average share of residents with such characteristics would be particularly attractive for the planned campaign. Therefore, areas (neighborhoods, districts, municipalities) were selected on the basis of several data sources, including information made publicly available by the Central Statistical Office and data offered commercially by various private providers.

Simply identifying the locations of customers with higher interest was not considered sufficient, so a more precise estimate of the sales potential of individual locations was needed. In short, an answer was sought to the question: how many sales can we count on? For this purpose, a predictive model was built that could indicate the expected future sales for each area in any defined period. The model used variables such as the age and gender structure of each district, income per household, travel time to the service point, and the purchasing behavior of existing customers in the area, among others. The average prediction error of the model was within +/- 6%. To illustrate the level of detail with which the model was able to pinpoint locations, the following table contains the definition of the top two recommendations of the predictive model.

The areas with the greatest potential identified by the model were also visualized on maps (example of one below).

In order to assess the relevance of the model’s recommendations, the effects of the activities carried out in the group of the top 10 locations identified by the model were compared with the 10 locations from places 11-20 of the ranking. The return on investment in the group recommended by the model was more than 21% higher compared to the comparison group.

Data science applied in the right way, combining internal and external data sources with different levels of detail (individual customer data with aggregated data describing entire areas), can help solve various problems faced by business and thus contribute to increasing return on investment.

How to optimize marketing efforts with data science?

Email communication is still the leading channel for direct contact with consumers. There are many examples of companies using this channel to successfully increase consumer engagement and turnover. However, running a campaign too aggressively can reduce the open rate and the positive effects of emails, while an overly conservative approach to mailings can prevent the consumer base from realizing its full purchasing potential. As the number of products being sold increases, customer expectations become more demanding, and competition intensifies, optimal campaign planning becomes even more challenging. Fortunately, data science can help.

The recipe for success is simple. You have to send the consumer the maximum number of emails, but not more and not more often than he can bear. You have to send messages about the products that are potentially the most profitable, but not ones that don’t interest him at all. Finally, you need to send messages at the right times, both in terms of the consumer’s preferences and in terms of the campaign calendar and the need to boost sales. And it would be good not to inform the customer about products he intends to buy anyway. The principles are brutally simple. So why is their implementation in practice sometimes so difficult?

How to optimize marketing activities – an example

In seeking an answer to this question, we will use the following example, which is necessarily very simplified. A company sells three products: A, B and C. Each of them will be the subject of campaigns run during the 4-week period under consideration.

Product A will be promoted in week 23, product B will be promoted in all four weeks, and product C will be promoted only in the last week, 24. Product C is the most profitable for the company: each unit sold brings a margin of 250 PLN. For products A and B, on the other hand, it is 100 and 50 PLN, respectively. The above information is summarized in the table.

Product / Campaign week | 21 | 22 | 23 | 24 | Unit income (PLN)
A                       |    |    | X  |    | 100
B                       | X  | X  | X  | X  | 50
C                       |    |    |    | X  | 250

Let’s also assume that the company has only 100 customers and each of them has a different preference for mailing frequency. For example, consumer No. 1 accepts a frequency of no more than once every 2 weeks. So a mailing to him can be made either in weeks 21 and 23 or in weeks 22 and 24. Consumer No. 2 is a bit more sensitive about the frequency of communication and already regards mailings more frequent than once every 4 weeks as excessive spamming. A mailing to him, therefore, can only be made in one of the 4 weeks, and so on.

Finally, let’s assume that our goal is to maximize the effect for the entire four-week period. And by effect, we mean the total additional sales generated by the campaign. We also assume that no more than one message can be sent to a given recipient in a single week. Thus, in weeks with more than one campaign, we must choose which one we will send. We also do not send more than one message regarding a given campaign to the same customer.

Even with such a modest base (3 products, 4 weeks, 100 consumers), we have 2^4 x 3 x 100, or 4800, zero-one decision variants to consider when deciding whether or not to send a given customer a given campaign in a given week. We only need to add one more product and extend the period to 5 weeks, and the number grows to 12800. If we wanted to optimize the calendar for a quarter (12 weeks) with the same number of products and customers, we would already reach 1638400 combinations. And yet hardly any company has only 100 customers and sells only 4 products...
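The figures quoted above are consistent with the formula 2^weeks x products x customers. That reading is inferred from the numbers in the text, and the function name below is only illustrative.

```python
def decision_variants(weeks, products, customers):
    """2^weeks send/don't-send patterns per product and customer."""
    return 2 ** weeks * products * customers


print(decision_variants(weeks=4, products=3, customers=100))   # 4800
print(decision_variants(weeks=5, products=4, customers=100))   # 12800
print(decision_variants(weeks=12, products=4, customers=100))  # 1638400
```

The exponent is what drives the growth: every extra week doubles the count, while extra products or customers only scale it linearly.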

So we see that the number of decisions that need to be made is growing very rapidly. Even for such a small base, it is impossible to make so many decisions manually.

So what do we do in such a situation?

  1. Firstly, we can abandon optimization and send everyone everything always or choose randomly.
  2. Secondly, we can adopt an approximate method, which does not guarantee finding the optimal solution, but gives a chance to find a solution better than random. For example, for each consumer, let’s assign the most profitable campaigns sequentially. However, this intuitive way does not at all guarantee the maximum global effect. Perhaps, given all the constraints we have, for a given consumer it would be better to forgo sending the most profitable campaign and thus have the opportunity to send him two slightly less profitable ones, which, however, will bring higher profit overall.
  3. Thirdly, we can use mathematical programming and advanced optimization engines to find the best possible combination of mailings. It is worth knowing that about 85% of Fortune 500 companies use such methods in their business operations. Examples of results achieved by implementing such solutions in email campaign optimization:
  • more than a 30% increase in open rate (thanks to optimized frequency),
  • more than a 6% increase in the value of sales in the group targeted in an optimal way, compared to the group targeted with simplified decision-making methods,
  • nearly 50% savings in time and human resources when planning campaigns.
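For the toy example, the best plan for a single customer can still be found by brute force, which gives a feel for what an optimization engine does at scale. This sketch is a stand-in, not a mathematical programming model. It assumes the effect of a message equals the product's unit margin (a real model would use the expected additional sales per customer), and it treats customers independently because the example has no constraints linking them.

```python
from itertools import product

WEEKS = [21, 22, 23, 24]
# Campaign calendar and unit margins (PLN) from the example.
CAMPAIGNS = {
    "A": {"weeks": {23}, "margin": 100},
    "B": {"weeks": {21, 22, 23, 24}, "margin": 50},
    "C": {"weeks": {24}, "margin": 250},
}


def best_plan(min_gap_weeks):
    """Try every weekly assignment for one customer and keep the best valid one."""
    best_value, best = 0, {}
    options = [None] + list(CAMPAIGNS)  # None means "send nothing this week"
    for choice in product(options, repeat=len(WEEKS)):
        plan = {w: c for w, c in zip(WEEKS, choice) if c is not None}
        sent = list(plan.values())
        if len(sent) != len(set(sent)):
            continue  # each campaign at most once per customer
        if any(w not in CAMPAIGNS[c]["weeks"] for w, c in plan.items()):
            continue  # campaign is not running in that week
        used = sorted(plan)
        if any(b - a < min_gap_weeks for a, b in zip(used, used[1:])):
            continue  # violates the customer's frequency preference
        value = sum(CAMPAIGNS[c]["margin"] for c in sent)
        if value > best_value:
            best_value, best = value, plan
    return best_value, best


print(best_plan(min_gap_weeks=2))  # consumer No. 1: at most once every 2 weeks
print(best_plan(min_gap_weeks=4))  # consumer No. 2: at most once every 4 weeks
```

Enumeration works here only because one customer over 4 weeks has a few hundred candidate plans. With longer horizons, more products and constraints shared across customers (for example, a cap on message volume), the search space explodes and a dedicated solver becomes necessary.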

Finally, it is worth mentioning that in a world where we send text messages in addition to mailings and contact customers through other channels, the number of possible combinations of decisions is even greater, and so are the benefits of mathematical optimization. We will also avoid a situation in which a manually arranged plan (which is not optimal anyway) goes awry when one campaign needs to be postponed “just by a week” and, due to problems with the supply of a certain product, the volume of messages related to it needs to be reduced by 50%. The human planner will fail to modify the plan in time, or will quit overnight. A properly written mathematical optimization program will handle this task in minutes or hours. At the same time, it will make it possible to achieve the maximum campaign profit possible under the given conditions and constraints.