Estimating Pinterest revenue.

The ascent of Pinterest has attracted a lot of attention. Recently, Alexis Madrigal of The Atlantic made a back-of-the-envelope estimate of Pinterest’s revenue. Rags Srinivasan then commented on it on GigaOM and proposed a more elaborate evaluation.

In both estimates, revenue is decomposed into the following key variables:

– N, the number of “active” users

– T, the number of transactions per “active” user per year.

– S, the sales per transaction

– C, the affiliate fee or commission.

Annual revenue is then the product of those terms: R = N × T × S × C.

In Madrigal’s calculation, this becomes:

R = 1e7 × 12 × 10 × 3.75% = \$45 million.

Easy and simple.
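As a quick check, the product above can be reproduced in a few lines of R. This is only a sketch of the arithmetic; the variable names are mine, and the input values are the ones quoted from Madrigal’s estimate.

```r
# Inputs as quoted above from Madrigal's estimate (variable names are illustrative)
n_users      <- 1e7     # N: number of "active" users
transactions <- 12      # T: transactions per active user per year
sale_value   <- 10      # S: sales per transaction, in dollars
commission   <- 0.0375  # C: affiliate commission (3.75%)

# Annual revenue is the plain product of the four terms
revenue <- n_users * transactions * sale_value * commission
revenue / 1e6  # annual revenue in millions of dollars
```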

Rags’ approach is a bit more complex: the Monte Carlo method is used to play out many scenarios in which the variables are randomly drawn from distributions. This actually means adding new assumptions to the game, namely the probability distribution of each of those variables.

I have to admit I was very enthusiastic when reading Rags’ post: applying Monte Carlo seemed very clever. Then I started to play a bit with the numbers and assumptions (R script below).

Although Rags does not describe the distributions he used, I obtain the same results from the simulation using normal distributions (a.k.a. Gaussian or bell curves).

My first surprise was to see negative values, which do not make sense (I had not looked carefully enough at Rags’ graph). They account for approx. 6% of the random draws; in particular, N, T and S can each be negative.

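The quantiles of revenue, expressed in m\$, are:

| Quantile | 5% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% |
|---|---|---|---|---|---|---|---|---|---|---|
| Revenue (m\$) | -0.13 | 0.68 | 2.10 | 3.48 | 4.91 | 6.44 | 8.33 | 10.6 | 13.8 | 19.2 |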
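To see where the negative values come from, here is a minimal sketch of a Monte Carlo run with normal distributions. It assumes the 90% ranges listed in the script at the end of this post and the "3.29 standard deviations in a 90% interval" shortcut; these are not necessarily the inputs Rags used, so the quantiles it prints will not match the table above exactly. The helper `draw_normal` and the seed are mine.

```r
set.seed(42)  # arbitrary seed, only so that the run is repeatable
n_sims <- 100000

# Hypothetical helper: normal draws whose central 90% covers the given range.
# A 90% interval spans 2 * 1.645 = 3.29 standard deviations.
draw_normal <- function(n, ci90) {
  rnorm(n, mean = mean(ci90), sd = diff(ci90) / 3.29)
}

# Assumed 90% ranges, taken from the script at the end of this post
transactions <- draw_normal(n_sims, c(1, 18))       # T
users        <- draw_normal(n_sims, c(1e6, 10e6))   # N
sale_value   <- draw_normal(n_sims, c(2, 18))       # S
commission   <- draw_normal(n_sims, c(0.02, 0.04))  # C

revenue <- transactions * users * sale_value * commission

mean(revenue < 0)  # share of scenarios with a negative revenue
quantile(revenue / 1e6, probs = c(0.05, seq(0.1, 0.9, by = 0.1)))  # in m$
```

Because a normal distribution has no lower bound, some draws of N, T and S fall below zero, and any scenario with an odd number of negative factors ends up with a negative revenue.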

Defining new probability distributions. Clearly, we need to work out sensible probability distributions for each of our random variables:

– N, the number of “active” users, should always be positive; let’s assume it to be log-normal.

– T, the number of transactions per “active” user, has been studied by Ehrenberg. As a first approximation, we assume that the purchases made by each potential buyer are random over time and independent of each other; in other words, a simple Poisson process.

– S, the sales value per transaction, is assumed to be log-normal; the main reasoning behind this assumption is that sales values are positive.

– C, the commission, is assumed to follow a uniform distribution.
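The four choices above can be sketched in a few lines of R. The parameters (Poisson mean 9.5, log-normal spreads of 0.6 and 0.7, scale factors of 5.5 million users and \$8, commission between 2% and 4%) are the ones used in the script at the end of this post; the variable names and the seed are mine, and none of these values has been fitted to market data.

```r
set.seed(1)  # arbitrary seed, only so that the run is repeatable
n_sims <- 100000

users        <- 5.5e6 * rlnorm(n_sims, meanlog = 0, sdlog = 0.6)  # N: log-normal
transactions <- rpois(n_sims, lambda = 9.5)                       # T: Poisson
sale_value   <- 8 * rlnorm(n_sims, meanlog = 0, sdlog = 0.7)      # S: log-normal
commission   <- runif(n_sims, min = 0.02, max = 0.04)             # C: uniform

revenue <- users * transactions * sale_value * commission

min(revenue)  # never below zero: every factor is non-negative by construction
quantile(revenue / 1e6, probs = c(0.05, seq(0.1, 0.9, by = 0.1)))  # in m$
```

Note that a Poisson draw can be exactly zero, so a scenario with zero revenue is possible, but a negative one is not.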

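With these distributions, the revenue quantiles (in m\$) change dramatically:

| Quantile | 5% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% |
|---|---|---|---|---|---|---|---|---|---|---|
| Revenue (m\$) | 2.19 | 3.20 | 5.01 | 6.92 | 9.10 | 11.8 | 15.2 | 19.9 | 27.2 | 42.1 |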

In other words, estimated revenue roughly doubles under this model: the median goes from \$6.44 million to \$11.8 million.

Rediscussing the assumptions. So far we have only slightly adjusted the distributions. A natural next step would be to gather some market data to test these distributions and to set better bounds.

In particular, it would be interesting to check the number of Pinterest transactions per “active” user per year as well as the sales value per transaction. I would appreciate any comment or source for the former, which seems high to me. The latter, on the other hand, seems too low to me, and I would rather have set a much higher average. A Quora discussion indicates that the average basket on Cyber Monday was around \$200, I assume mostly for electronics. InternetRetailing reports an average e-commerce spend of \$5k in the UK. Given that more than 60% of shoppers buy more than 3 times a month, the average basket could be somewhere between \$100 and \$200. We would need to remove the travel category and refine this a bit. In any case, moving the average to \$20 does not seem unreasonable. This would significantly change the revenue estimates (it actually triples them). As a side note, we are assuming that the variables are independent, which could also be challenged.

Pinterest’s user base is growing very strongly, in large part thanks to the hype and media coverage around it, but nevertheless we should expect the number of active users to keep growing a lot. The current model accounts for this growth through a very large estimate of “really” active users, but at this pace Pinterest could well reach 40-50 million unique visitors (UV) in the US by the end of the year, with maybe 10-20 million UV in the rest of the world. In any case, it would be interesting to plug Pinterest’s growth into the model.

Conclusion. Whether the method is “back of the envelope” or more “statistically involved”, I think once again that the assumptions and the revenue model are the key questions to discuss. The Monte Carlo method is a good tool to play with once those have been clarified. Revenue aside, the question is whether Pinterest will have time to post any revenue figure this year before it is acquired. It could certainly help Google fill out its social strategy.
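The \$100 to \$200 basket range quoted above comes from simple arithmetic, sketched here in R. It rests on my reading of the figures: I assume the \$5k is an annual spend per online shopper and I take 3 purchases a month as the frequency. Neither is stated this precisely in the sources, so treat the result as an order of magnitude only.

```r
# Assumptions (my reading of the quoted figures, not stated as such in the sources)
annual_spend        <- 5000  # dollars spent online per shopper per year
purchases_per_month <- 3     # frequency quoted for the most active shoppers

# Average basket = annual spend divided by the number of purchases per year
avg_basket <- annual_spend / (purchases_per_month * 12)
avg_basket  # average dollars per purchase under these assumptions
```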

References:

Madrigal A., Why Pinterest Is Playing Dumb About Making Money, The Atlantic, http://www.theatlantic.com/technology/archive/2012/02/why-pinterest-is-playing-dumb-about-making-money/253273/ (Feb 17, 2012).

Srinivasan R., How much does Pinterest actually make?, GigaOM, http://gigaom.com/2012/03/30/how-much-does-pinterest-actually-make/ (Mar 30, 2012).

Ehrenberg A., Repeat Buying, Journal of Empirical Generalisations in Marketing Science, Vol. 5, No. 2, http://www.empgens.com/ArticlesHome/Volume5/RepeatBuying.html

Scott F., A simple Monte Carlo simulation with R, http://blog.0x10.co.uk/2011/02/simple-monte-carlo-simulation-with-r.html

R Code

R code derived from Scott’s Monte Carlo script (see References), using the new distributions.

```r
##################### Variables ##################
# Set this to TRUE to save the plots as PNG files, and choose
# the output files and dimensions
bDoPNG <- TRUE
sFilePdf <- "~/pinterest-hyp2-pdf.png"
sFileCum <- "~/pinterest-hyp2-cum.png"
iWidth <- 1024
iHeight <- 768

# The following values represent our 90% confidence interval
# (CI) ranges for the various inputs to our simulation.
vNumberTransactionsPerUserPerYear <- c(1, 18)
vNumberActiveUsers <- c(1e6, 10e6)
vSalesPerTransaction <- c(2, 18)
vAffiliateCommission <- c(0.02, 0.04)

# A 90% confidence interval spans 3.29 standard deviations.
# Only needed by the normal-distribution version; unused below.
iStdDevCheat <- 3.29

# Number of simulations to run
iNumberOfSims <- 100000

##################### Generate the basic data ###################
# A new data frame with iNumberOfSims rows
dData <- data.frame(Sim = seq_len(iNumberOfSims))

# Transactions: Poisson, with the midpoint of the range as mean
dData$Transactions <- rpois(iNumberOfSims, mean(vNumberTransactionsPerUserPerYear))
# Active users: log-normal (meanlog 0, sdlog 0.6) scaled by the midpoint of the range
dData$ActiveUsers <- mean(vNumberActiveUsers) * rlnorm(iNumberOfSims, 0, 0.6)
# Sales per transaction: log-normal (meanlog 0, sdlog 0.7) scaled by 8
dData$SalesTransaction <- rlnorm(iNumberOfSims, 0, 0.7) * 8
# Commission: uniform over its range, hence always positive
dData$Commission <- runif(iNumberOfSims, vAffiliateCommission[1], vAffiliateCommission[2])

# Total revenue column. Because R is a vector language, the
# operation below is applied to each row automatically.
dData$Revenue <- dData$Transactions *
                 dData$ActiveUsers *
                 dData$SalesTransaction *
                 dData$Commission

##################### Plots ###################
# Histogram of revenue in m$: buckets (aka breaks) on the X axis,
# number of simulations falling in each bucket on the Y axis
if (bDoPNG) png(sFilePdf, width = iWidth, height = iHeight)
hHist <- hist(dData$Revenue / 1e6, breaks = 10000, xlim = c(0, 100),
              plot = TRUE, main = "revenue frequency")
if (bDoPNG) invisible(dev.off())

# Cumulative frequency graph
hHist <- hist(dData$Revenue / 1e6, breaks = 250, plot = FALSE)
if (bDoPNG) png(sFileCum, width = iWidth, height = iHeight)
# Left edge of each bucket (all breaks except the last one)
dHistData <- data.frame(breaks = head(hHist$breaks, -1), count = hHist$counts)
hHist$counts <- cumsum(hHist$counts)
hHist$density <- cumsum(hHist$density)
plot(hHist, freq = TRUE, xlim = c(0, 100),
     main = "(Cumulative) histogram of revenue frequency", axes = FALSE)
# Draw the X axis, explicitly setting the ticks at the breaks
axis(1, at = dHistData$breaks)
# Draw the Y axis using default parameters
axis(2)
box()
# That's it, turn off output if saving PNG
if (bDoPNG) invisible(dev.off())

# Show quantiles of revenue (in dollars)
quantile(dData$Revenue, probs = c(0.5, 0.6, 0.7, 0.75, 0.8, 0.9, 0.95, 0.99))
```