April 15, 2012

Estimating Pinterest revenue.


The ascent of Pinterest has attracted a lot of attention. Recently, Alexis Madrigal of The Atlantic made a back-of-the-envelope estimate of Pinterest's revenue [1]. This was in turn commented on by Rags Srinivasan on GigaOM, who proposed a more elaborate evaluation [2].

In both estimations, revenue is decomposed as a function of the following key variables:

– N, the number of “active” users

– T, the number of transactions per “active” user per year

– S, the sales per transaction

– C, the affiliate fee or commission

Annual revenue is then the product of those terms: R = N × T × S × C.

In Madrigal’s calculation, this becomes:

R = 1e7 × 12 × 10 × 3.75% = $45 million.

Easy and simple.
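For those following along in R (the language of the scripts below), the arithmetic is a one-liner:

# Madrigal's point estimate: R = N x T x S x C
nUsers <- 1e7; nTransPerYear <- 12; salesPerTrans <- 10; commission <- 0.0375
nUsers * nTransPerYear * salesPerTrans * commission  # 4.5e7, i.e. $45 million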

In Rags’ case, it is a bit more involved: the Monte Carlo method is used to play out many scenarios in which the variables are drawn at random from probability distributions. This actually means we are adding new assumptions to the game, namely the probability distribution of each of those variables.

I have to admit I was very enthusiastic when reading Rags; applying Monte Carlo seemed very clever. Then I started to play a bit with the numbers and assumptions (R scripts below).

Though Rags does not describe the distributions he used in any detail, I obtain the same results from the simulation using normal distributions (a.k.a. Gaussian or bell curves).

My first surprise was to see negative values (I had not looked carefully enough at Rags’ graph), which do not make sense. They account for approximately 6% of the random draws; in particular, N, T and S can all go negative under a normal distribution.

Here are the quantiles of revenue, expressed in m$:

5% 10% 20% 30% 40% 50% 60% 70% 80% 90%
-0.13 0.68 2.10 3.48 4.91 6.44 8.33 10.6 13.8 19.2
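For reference, a minimal sketch of this first simulation. The conversion of each 90% confidence interval into a normal distribution (mean at the midpoint, 3.29 standard deviations across the interval) is my assumption about how Rags proceeded, not something he states:

# Each 90% CI becomes a normal with mean at the midpoint
# and sd = interval width / 3.29
ci2norm <- function(n, ci) rnorm(n, mean = mean(ci), sd = diff(ci) / 3.29)
nSims <- 100000
revenue <- ci2norm(nSims, c(1e6, 10e6)) *  # N, active users
  ci2norm(nSims, c(1, 18)) *               # T, transactions per user per year
  ci2norm(nSims, c(2, 18)) *               # S, sales per transaction
  ci2norm(nSims, c(0.02, 0.04))            # C, affiliate commission
mean(revenue < 0)                # roughly 6% of the draws come out negative
quantile(revenue / 1e6, probs = seq(0.1, 0.9, by = 0.1))  # quantiles in m$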

 

Defining new probability distributions. Clearly, we need to work out sensible probability distributions for each of our random variables (a condensed R version follows the list):

– N, the number of “active” users, should always be positive; let’s assume it to be log-normal.

– T, the number of transactions per “active” user, has been studied by Ehrenberg [3]. As a first approximation, we assume that purchases made by each potential buyer are random over time and independent of each other; in other words, a simple Poisson process.

– S, the sales value per transaction, is assumed to be log-normal; the main reasoning behind this assumption is that sales values are positive.

– C, in this case, let’s model it with a uniform distribution.
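In condensed form, the draws look as follows (the parameter values match the full script at the end of the post):

nSims <- 100000
N <- mean(c(1e6, 10e6)) * rlnorm(nSims, meanlog = 0, sdlog = 0.6)  # log-normal users
transactions <- rpois(nSims, lambda = mean(c(1, 18)))              # Poisson transactions
S <- 8 * rlnorm(nSims, meanlog = 0, sdlog = 0.7)                   # log-normal basket, median $8
C <- runif(nSims, min = 0.02, max = 0.04)                          # uniform commission
revenue <- N * transactions * S * C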

Once this is done, the quantiles change dramatically.

5% 10% 20% 30% 40% 50% 60% 70% 80% 90%
2.19 3.20 5.01 6.92 9.10 11.8 15.2 19.9 27.2 42.1

Basically, Pinterest’s estimated revenue doubles with this model.

Distribution of revenue in m$ with new distributions

Cumulative density of revenue in m$ with new distributions.

Rediscussing the assumptions. So far we have only adjusted the distributions a bit. A natural next step would be to gather some market data to test these distributions and to set better bounds.

In particular, it would be interesting to check the number of Pinterest transactions per “active” user per year as well as the amount of sales per transaction. I would appreciate any comment or source on the former, which seems high to me. The latter, on the other hand, seems too low to me; I would rather have set a much higher average. A Quora discussion indicates that the average basket on Cyber Monday was around $200 [4], I assume mostly for electronics. BizReport puts the average annual e-commerce spend in the UK at about $5k [5]. Given that more than 60% of shoppers buy more than 3 times a month, the average basket could be somewhere between $100 and $200. We would need to remove the travel category and work this out a bit more. In any case, moving the average to $20 does not seem unlikely, and it would significantly change the revenue estimates (it actually triples them). As a side note, we are assuming the variables are independent; this could also be challenged…
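Because revenue is proportional to S, such a change is easy to explore by rescaling the draws. A sketch, reusing the revenue vector from the condensed code above (the exact factor depends on how the new distribution is parameterised):

# Revenue is linear in S, so a larger average basket simply rescales
# every simulated figure; e.g. pushing the $8 median towards $20:
revenueBigBasket <- revenue * (20 / 8)
quantile(revenueBigBasket / 1e6, probs = c(0.1, 0.5, 0.9))  # quantiles in m$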

Pinterest’s user base is growing very strongly, in large part due to the hype and media coverage around it, but in any event we should expect the number of active users to grow a lot. The current model accounts for this growth with a rather large estimate of “really” active users, but at this pace Pinterest could well be at 40-50 million unique visitors in the US by the end of the year, with perhaps 10-20 million for the rest of the world. In any case, it would be interesting to plug Pinterest’s growth into the model.

Conclusion. Irrespective of whether the method is “back of the envelope” or more “statistically involved”, I think the discussion of the assumptions and of the revenue model is once again the key question. The Monte Carlo method is a good tool to play with once those have been clarified. Now, aside from its revenues, the question is whether Pinterest will have the time to post any revenue figure this year before it is acquired. It could certainly help Google populate its social strategy.

References :

[1] Madrigal A., “Why Pinterest Is Playing Dumb About Making Money”, The Atlantic (Feb 17, 2012), http://www.theatlantic.com/technology/archive/2012/02/why-pinterest-is-playing-dumb-about-making-money/253273/

[2] Srinivasan R., “How much does Pinterest actually make?”, GigaOM (Mar 30, 2012), http://gigaom.com/2012/03/30/how-much-does-pinterest-actually-make/

[3] Ehrenberg A., “Repeat Buying”, Journal of Empirical Generalisations in Marketing Science, Vol. 5, No. 2, http://www.empgens.com/ArticlesHome/Volume5/RepeatBuying.html

[4] http://www.quora.com/Massimo-Arrigoni/Posts/How-many-visits-turn-into-an-order-Coremetrics-tells-you

[5] http://www.bizreport.com/2012/03/uk-online-shoppers-spent-average-5293-online-in-2011.html#

[6] Scott F., “A simple Monte Carlo simulation with R”, http://blog.0x10.co.uk/2011/02/simple-monte-carlo-simulation-with-r.html

 

R Code

R code derived from [6] using new distributions.

 

##################### Variables ##################
# Set this to TRUE to save the plots as PNGs, with the
# file names and dimensions given below
bDoPNG <- TRUE
sFilePdf <- "~/pinterest-hyp2-pdf.png"
sFileCum <- "~/pinterest-hyp2-cum.png"
iWidth <- 1024
iHeight <- 768
# The following values represent our 90% confidence interval
# (CI) ranges for the various inputs to our simulation
vNumberTransactionsPerUserPerYear <- c(1, 18)
vNumberActiveUsers <- c(1e6, 10e6)
vSalesPerTransaction <- c(2, 18)   # kept for reference; S is drawn directly below
vAffiliateCommission <- c(0.02, 0.04)
# Quick cheat from [6]: 3.29 standard deviations span a 90% CI
# (unused here, since the log-normal/Poisson parameters are set directly)
iStdDevCheat <- 3.29
# Number of simulations to run
iNumberOfSims <- 100000
##################### Generate the basic data ###################
# A new data frame initiated to have iNumberOfSims rows in it
dData <- data.frame(seq(1, iNumberOfSims))
# T: Poisson draws, with the midpoint of the CI range as the rate
dData$Transactions <- rpois(iNumberOfSims, mean(vNumberTransactionsPerUserPerYear))
# N: log-normal draws, scaled to the midpoint of the CI range
dData$ActiveUsers <- mean(vNumberActiveUsers) * rlnorm(iNumberOfSims, meanlog = 0, sdlog = 0.6)
# S: log-normal draws with a median of $8
dData$SalesTransaction <- rlnorm(iNumberOfSims, meanlog = 0, sdlog = 0.7) * 8
# C: uniform draws, positive by construction
dData$Commission <- runif(iNumberOfSims, vAffiliateCommission[1], vAffiliateCommission[2])
# Total revenue per simulation. Because R is a vector language,
# the operation below is applied to each row automatically.
dData$Revenue <- dData$Transactions *
  dData$ActiveUsers *
  dData$SalesTransaction *
  dData$Commission
# Histogram of the results: buckets (aka breaks) on the X axis,
# number of simulations falling in each bucket on the Y axis
if (bDoPNG) png(sFilePdf, width = iWidth, height = iHeight)
hHist <- hist(dData$Revenue / 1e6, breaks = 10000, xlim = c(0, 100),
              plot = TRUE, main = "revenue frequency")
if (bDoPNG) dev.off()
# Cumulative frequency graph
hHist <- hist(dData$Revenue / 1e6, breaks = 250, xlim = c(0, 100), plot = FALSE)
if (bDoPNG) png(sFileCum, width = iWidth, height = iHeight)
# Drop the last break so that breaks and counts have the same length
dHistData <- data.frame(breaks = hHist$breaks[-length(hHist$breaks)],
                        count = hHist$counts)
hHist$counts <- cumsum(hHist$counts)
hHist$density <- cumsum(hHist$density)
hHist$intensities <- hHist$density
plot(hHist, freq = TRUE, xlim = c(0, 100),
     main = "(Cumulative) histogram of revenue frequency", axes = FALSE)
# Draw the X axis explicitly, placing the ticks at the breaks
axis(1, at = dHistData$breaks)
# Draw the Y axis using default parameters
axis(2)
box()
# That's it; turn off output if saving PNG
if (bDoPNG) dev.off()
# Show the quantiles quoted in the post, in m$
quantile(dData$Revenue / 1e6,
         probs = c(0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9))