A friend sent me an article on market basket analysis for SMEs. I looked around for more tutorials and courses on the topic and found that they were equally bad. In general, this problem is very similar to building recommender systems. Therefore, I think it is time to comment on it.
How affinity analysis is taught (and done)
How is affinity analysis taught? Well, it usually involves computing correlations between product sales and all other features, plus a lot of dimensionality reduction. (This is a very condensed statement. However, people tell me that it really is done this way, either because they don't know better or because they assume that this is what everybody does, so it must be correct: swarm-intelligence stupidity.) Let's have a look at what is recorded and used for the analysis:
fundamental features:
- what?
- how many items?
- at what price?
- when (timestamp: date, time, day of the week)
other features:
- salesman ID
- customer ID
- materials the product is made of (especially ingredients, when it comes to food products)
- advertising campaign properties
- some more
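The "as taught" approach can be sketched in a few lines of Python. This is a minimal sketch, assuming that sale lines are grouped into baskets by a hypothetical `basket_id` (think receipt number); the `Sale` record and its field names are likewise illustrative. It computes the classic pairwise market basket measures (support, confidence, lift) with the standard library only; in practice one would more likely reach for an Apriori or FP-growth implementation from a library.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class Sale:
    """One sale line: the 'fundamental' features plus two of the 'other' ones.

    All field names are hypothetical; basket_id stands in for a receipt number.
    """
    basket_id: str
    product: str          # what?
    quantity: int         # how many items?
    unit_price: float     # at what price?
    timestamp: datetime   # when?
    salesman_id: str
    customer_id: str


def pair_affinity(sales):
    """Return {(a, b): (support, confidence of a -> b, lift)} for every co-bought pair."""
    baskets = defaultdict(set)
    for s in sales:
        baskets[s.basket_id].add(s.product)

    n_baskets = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for items in baskets.values():
        item_counts.update(items)
        # Sorting makes (a, b) and (b, a) count as the same unordered pair.
        pair_counts.update(combinations(sorted(items), 2))

    result = {}
    for (a, b), together in pair_counts.items():
        support = together / n_baskets                    # P(a and b)
        confidence = together / item_counts[a]            # P(b | a)
        lift = confidence / (item_counts[b] / n_baskets)  # P(b | a) / P(b)
        result[(a, b)] = (support, confidence, lift)
    return result


if __name__ == "__main__":
    t = datetime(2024, 5, 3, 9, 30)
    demo = [
        Sale("r1", "bread", 1, 2.50, t, "s1", "c1"),
        Sale("r1", "butter", 1, 1.80, t, "s1", "c1"),
        Sale("r2", "bread", 2, 2.50, t, "s2", "c2"),
        Sale("r3", "butter", 1, 1.80, t, "s1", "c3"),
        Sale("r3", "milk", 1, 1.10, t, "s1", "c3"),
    ]
    for pair, (support, confidence, lift) in sorted(pair_affinity(demo).items()):
        print(pair, f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Note that every input here except the timestamp is recorded inside the shop.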
These features have one thing in common when it comes to affinity analysis: except for the timestamp, they all describe things that happen inside an SME. The timestamp, however, captures seasonal, day-of-week and time-of-day variability.
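Since the timestamp is the one recorded feature that carries this kind of variability, it helps to split it into explicit components. A minimal sketch using Python's standard library, assuming a sine/cosine (cyclical) encoding, which is one common choice among several; the function names `cyclic` and `time_features` are hypothetical. The encoding keeps 31 December next to 1 January and 23:59 next to 00:00, which a raw day or minute number would not.

```python
import math
from datetime import datetime


def cyclic(value, period):
    """Map a periodic value onto the unit circle so the end of a cycle sits next to its start."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def time_features(ts: datetime) -> dict:
    """Split one timestamp into seasonal, day-of-week and time-of-day components."""
    day_of_year = ts.timetuple().tm_yday - 1   # 0..365
    weekday = ts.weekday()                     # Monday = 0 .. Sunday = 6
    minute_of_day = ts.hour * 60 + ts.minute   # 0..1439

    season_sin, season_cos = cyclic(day_of_year, 365.25)
    week_sin, week_cos = cyclic(weekday, 7)
    day_sin, day_cos = cyclic(minute_of_day, 24 * 60)
    return {
        "season_sin": season_sin, "season_cos": season_cos,
        "week_sin": week_sin, "week_cos": week_cos,
        "day_sin": day_sin, "day_cos": day_cos,
        "is_weekend": weekday >= 5,
    }
```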
The broader picture
Dimensionality reduction? Well, not with me. I'm going to introduce a few more dimensions ;). First, I would consider affinity analysis a geospatial time-series problem. No matter what products we analyze, our customers come from some specific place on this planet (let's assume that there are no alien customers yet ;)). Some of the things influenced by location are:
- transportation/logistics infrastructure
- for a physical shop (e.g. anything selling food: butcher, bakery, supermarket, etc.): distance to the shop as a function of roads, rail, airports and commuter paths
temporary and/or seasonal features:
- in general: what happens around a certain location
- seasonal influx of tourists and, if there is one, where they come from
- weather/seasons
- distance to construction sites (very important if you sell food)
- temporary roadworks, closures and diversions (very important if you sell food; similar to construction sites)
- changes in product quality (changes of vendors, suppliers, etc. throughout the year)
- many more
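Location-based features can be attached to each sale in the same way. A minimal sketch, assuming that the shop and each customer have known (latitude, longitude) coordinates and that construction sites and road closures are available as dated point records; the `Disruption` type, `location_features` and the 1 km default radius are all hypothetical. It uses the haversine great-circle distance, which is only a crude stand-in for travel distance over roads, rail and commuter paths; the real thing would need a routing engine.

```python
import math
from dataclasses import dataclass
from datetime import date

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


@dataclass(frozen=True)
class Disruption:
    """Hypothetical record: a construction site or road closure active from start to end (inclusive)."""
    lat: float
    lon: float
    start: date
    end: date


def location_features(shop, customer, day, disruptions, radius_km=1.0):
    """Straight-line customer distance plus the number of disruptions active near the shop on `day`."""
    shop_lat, shop_lon = shop
    cust_lat, cust_lon = customer
    active_nearby = sum(
        1
        for d in disruptions
        if d.start <= day <= d.end
        and haversine_km(shop_lat, shop_lon, d.lat, d.lon) <= radius_km
    )
    return {
        "customer_distance_km": haversine_km(shop_lat, shop_lon, cust_lat, cust_lon),
        "active_disruptions_nearby": active_nearby,
    }
```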
How can we use this additional data? Well, we can link our observations (correlations) to a cause, and that cause is more likely to be found outside a business than inside it. No matter which business we analyze, far more outside factors than inside factors influence it.