Logarithmic Transformations

Examining Predictors of Wine Price

Investigating the Relationship

The literature has suggested that the price of a wine is predictive of its rating, a score on a 100-point scale that attempts to quantify a wine's quality (Snipes & Taylor, 2014). Therefore, we evaluated the relationship between wine rating and price. The following scatterplot shows this relationship for 200 wines made in seven geographic regions.

Figure 1: Scatterplot of the relationship between 200 wine prices and their ratings.

This figure shows a non-linear relationship between wine rating and price. To re-express the data so that they meet the linearity assumption of regression analysis, either the x-variable (rating) or the y-variable (price) can be transformed.

Creating Models

In accordance with the Rule of the Bulge, we applied a downward power transformation to the y-variable by taking the natural log of wine price. The natural log of wine price was then regressed on wine rating. The equation for this model is shown below:

\[ \begin{split} \mathrm{Model~1}: \widehat{\ln(\mathrm{Price}_i)} &= -19.06 + 0.25(\mathrm{Rating}_i) \end{split} \]
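The log-transform-then-regress step can be sketched in standard-library Python. This is an illustration under stated assumptions, not the analysis code behind Model 1: the helper `fit_log_price` is hypothetical, and the six ratings and prices are made-up values constructed so that ln(price) is exactly linear in rating with Model 1's rounded coefficients, so the fit should recover them.

```python
import math

def fit_log_price(ratings, prices):
    """Regress ln(price) on rating by ordinary least squares.

    Returns (intercept, slope) for ln(Price) = b0 + b1 * Rating.
    """
    y = [math.log(p) for p in prices]  # natural-log transform of the response
    n = len(ratings)
    x_bar = sum(ratings) / n
    y_bar = sum(y) / n
    # Closed-form simple-regression estimates: slope = Sxy / Sxx.
    sxy = sum((x - x_bar) * (yi - y_bar) for x, yi in zip(ratings, y))
    sxx = sum((x - x_bar) ** 2 for x in ratings)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data built so that ln(price) = -19.06 + 0.25 * rating exactly.
ratings = [84, 86, 88, 90, 92, 94]
prices = [math.exp(-19.06 + 0.25 * r) for r in ratings]

b0, b1 = fit_log_price(ratings, prices)
print(round(b0, 2), round(b1, 2))  # -19.06 0.25
```

With real data the points scatter around the fitted line and a statistics package would normally be used; the closed form is shown only to make the computation on the log scale visible.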

Two additional models were fitted to examine other possible predictors of price: one that included only a dummy variable indicating whether a wine was made in California (California = 1), and another that included both wine rating and this California indicator:

\[ \begin{split} \mathrm{Model~2}: \widehat{\ln(\mathrm{Price}_i)} &= 3.46 + 0.015(\mathrm{California}_i) \\[1em] \mathrm{Model~3}: \widehat{\ln(\mathrm{Price}_i)} &= -19.48 + 0.25(\mathrm{Rating}_i) + 0.16(\mathrm{California}_i) \end{split} \]
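Models 2 and 3 enter region as a 0/1 dummy column alongside (or instead of) rating. A minimal sketch of fitting such models with a small normal-equations solver, in standard-library Python; `ols` is a hypothetical helper, and the six wines are made-up values chosen so that ln(price) follows Model 3's rounded coefficients exactly. In practice a statistics package would do this fit.

```python
import math

def ols(rows, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    rows: one list of predictor values per observation; an intercept column is
    prepended here. Solves the normal equations (X'X) b = X'y by Gauss-Jordan
    elimination with partial pivoting, which is adequate for a few predictors.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(xr[i] * xr[j] for xr in X) for j in range(p)]
         + [sum(xr[i] * yi for xr, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical wines: rating and a 0/1 California dummy.
ratings = [84, 86, 88, 90, 92, 94]
california = [0, 1, 0, 1, 1, 0]
# Prices built so that ln(price) follows Model 3's rounded coefficients exactly.
prices = [math.exp(-19.48 + 0.25 * r + 0.16 * c) for r, c in zip(ratings, california)]
log_prices = [math.log(p) for p in prices]

model2 = ols([[c] for c in california], log_prices)       # California only
model3 = ols(list(zip(ratings, california)), log_prices)  # Rating + California
print([round(b, 2) for b in model3])  # [-19.48, 0.25, 0.16]
```

The dummy coefficient is the shift in ln(Price) for California wines at a given rating, which is why it only has a "fold change" interpretation after back-transformation.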

Comparing Models

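Using AICc and \(R^2\) to compare the models, we determined that Model 3 was the most appropriate for the data: it had the lowest AICc (281.7, versus 284.73 for Model 1 and 461.84 for Model 2) and the highest \(R^2\) (0.598). The following table shows this comparison.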


Table 1. Unstandardized Coefficients & Confidence Intervals for a Series of OLS Regression Models Fitted to Estimate Variation in the Natural Log of Wine Price
=================================================================================
                                 Model 1           Model 2          Model 3      
Rating                            0.245                              0.249       
                              (0.216, 0.273)                     (0.220, 0.277)  
                                                                                 
Region = California (dummy)                         0.015            0.164       
                                               (-0.207, 0.236)   (0.022, 0.305)  
                                                                                 
Intercept                        -19.059            3.456           -19.478      
                            (-21.688, -16.429) (3.326, 3.586)  (-22.106, -16.851)
                                                                                 
---------------------------------------------------------------------------------
AICc                              284.73           461.84            281.7       
R²                                0.588            0.0001            0.598       
Residual Std. Error               0.488             0.760            0.483       
=================================================================================
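The AICc row can be related to the residual standard errors in Table 1. The sketch below assumes a Gaussian likelihood with the maximum-likelihood variance RSS/n and counts k = (number of coefficients + 1) parameters, the extra one being the error variance; this convention is an assumption, since the text does not say which software computed AICc. The function name `aicc_from_rse` is hypothetical.

```python
import math

def aicc_from_rse(rse, n, p):
    """AICc for a Gaussian OLS model, reconstructed from its residual standard error.

    n: number of observations; p: number of regression coefficients (incl. intercept).
    Assumes k = p + 1 estimated parameters (coefficients plus error variance).
    """
    rss = rse ** 2 * (n - p)  # since RSE = sqrt(RSS / (n - p))
    log_lik = -n / 2 * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    k = p + 1
    aic = -2 * log_lik + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction

print(round(aicc_from_rse(0.488, 200, 2), 1))  # Model 1: 284.7
print(round(aicc_from_rse(0.483, 200, 3), 1))  # Model 3: 281.7
```

Under this convention the recomputed values agree with Table 1 to within the rounding of the reported residual standard errors, which is consistent with (but does not prove) how the table was produced. Lower AICc is preferred, so Model 3 ranks first.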
                                                                                 

Interpreting Model 3

To make a coherent, useful interpretation of Model 3, it is necessary to back-transform the model by exponentiating both sides of the equation (base e), which turns the additive model for ln(Price) into a multiplicative model for Price. Rating = 1 and California = 1 were then substituted so that each exponentiated term shows the multiplicative change associated with a one-unit increase in that predictor, and the equation was simplified. The steps of this process are shown below.

\[ \begin{split} \widehat{\ln(\mathrm{Price}_i)} &= -19.48 + 0.25(\mathrm{Rating}_i) + 0.16(\mathrm{California}_i) \\[1em] \widehat{\mathrm{Price}}_i &= e^{-19.48} \times e^{0.25(\mathrm{Rating}_i)} \times e^{0.16(\mathrm{California}_i)} \\[1em] \widehat{\mathrm{Price}}_i &= e^{-19.48} \times e^{0.25(1)} \times e^{0.16(1)} \\[1em] \widehat{\mathrm{Price}}_i &= (3.47 \times 10^{-9}) \times 1.28 \times 1.17 \end{split} \]
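The exponentiation step can be checked numerically. In a log-linear model each coefficient b corresponds to a multiplicative factor e^b on the original scale, and (e^b - 1) × 100 is the matching percent change. A minimal sketch using the rounded Model 3 coefficients reported above:

```python
import math

# Rounded Model 3 coefficients from the text (outcome: ln(Price)).
b0, b_rating, b_ca = -19.48, 0.25, 0.16

baseline = math.exp(b0)              # e^{-19.48}
rating_factor = math.exp(b_rating)   # multiplicative change per rating point
ca_factor = math.exp(b_ca)           # multiplicative change for California = 1

print(f"{baseline:.3g} {rating_factor:.2f} {ca_factor:.2f}")
print(f"{(rating_factor - 1) * 100:.0f}% per rating point, "
      f"{(ca_factor - 1) * 100:.0f}% for California")
```

Using the unrounded estimates from Table 1 (0.249 and 0.164) would shift these factors slightly, so the two-decimal values are best read as approximations.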

Drawing from this simplified model, we can conclude that, on average, each one-unit increase in wine rating (e.g., from 86 to 87) is associated with a 1.28-fold increase (about 28%) in the price of a wine, holding region constant. Likewise, being made in California (California = 1) is associated with a 1.17-fold increase (about 17%) in price relative to wines of the same rating from other regions. Finally, because the back-transformed model is multiplicative, the difference in predicted price between wines from California and wines from other regions grows as rating increases, even though Model 3 contains no explicit rating × California interaction term. The figure below demonstrates these findings.
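The widening California gap follows directly from the multiplicative form: the predicted gap is e^(b0 + b1·Rating) × (e^(b2) - 1), which grows with Rating while the ratio stays fixed at e^(b2). A minimal sketch with Model 3's rounded coefficients; the helper `predicted_price` and the three example ratings are hypothetical choices for illustration.

```python
import math

def predicted_price(rating, california, b0=-19.48, b_rating=0.25, b_ca=0.16):
    """Back-transformed Model 3 prediction (rounded coefficients from the text)."""
    return math.exp(b0 + b_rating * rating + b_ca * california)

rows = []
for rating in (85, 90, 95):
    ca = predicted_price(rating, 1)
    other = predicted_price(rating, 0)
    rows.append((rating, ca / other, ca - other))  # (rating, ratio, absolute gap)
    print(f"rating {rating}: ratio {ca / other:.2f}, gap {ca - other:.2f}")
```

The ratio stays at e^0.16 ≈ 1.17 at every rating, while the absolute gap grows by a factor of e^(0.25 × 5) = e^1.25 ≈ 3.5 for each 5-point step in rating.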

Figure 2: Best-fitting curves for wines from California and wines from other regions, according to Model 3.