In the last lesson we guessed at a trendline for some data points. Here, let's rigorously compute the best possible trendline.
It turns out that it's mathematically possible to solve for the slope $m$ and y-intercept $b$ that form the best possible trendline for some data points. (How this actually works is beyond our scope here, but in short, if one minimizes the sum of the squared vertical distances between each data point and the trendline, the following equations will result.)
To start, you'll need a parameter called $c$, which is defined as
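To make "vertical distance" concrete, here is a minimal Python sketch that scores a candidate trendline by adding up the squared vertical gaps between the line and each data point. The function name `sum_squared_error` and the tiny data set are made up for illustration, and squaring the gaps is the usual least-squares convention, which is assumed here.

```python
def sum_squared_error(xs, ys, m, b):
    """Add up the squared vertical gaps between each point and the line y = m*x + b."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = m * x + b          # height of the trendline at this x
        total += (y - predicted) ** 2  # squared vertical gap for this point
    return total

# Hypothetical data, not the data from the lesson
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

perfect = sum_squared_error(xs, ys, 2, 1)  # this line passes through every point
worse = sum_squared_error(xs, ys, 1, 3)    # a poorer guess at the trendline
print(perfect, worse)
```

A smaller score means a better fit, so the "best possible" trendline is the one whose $m$ and $b$ make this score as small as it can be.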
$$c=N\Sigma x_i^2 - (\Sigma x_i)^2.$$
Here $N$ is the number of data points you have (6 here) and $x_i$ is the x-value of the $i^{th}$ data point. So, $\Sigma x_i^2$ means to take each x-value one by one, square it, then add all of the squared values together. In contrast, $(\Sigma x_i)^2$ means to add all of the x-values together first, then square the whole sum.
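The difference between "square, then add" and "add, then square" is easy to mix up, so here is a small Python sketch that computes $c$ step by step. The four x-values are hypothetical and chosen only to keep the arithmetic easy to follow; the variable names are likewise just for illustration.

```python
# Hypothetical x-values, not the data from the lesson
xs = [1, 2, 3, 4]

N = len(xs)                              # number of data points: 4
sum_x_squared = sum(x ** 2 for x in xs)  # square each value, then add: 1 + 4 + 9 + 16
sum_x = sum(xs)                          # add the values first: 1 + 2 + 3 + 4

c = N * sum_x_squared - sum_x ** 2       # 4*30 - 10**2
print(c)
```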
Next, we can compute the slope for the best trendline from this equation:
$$m=\frac{N\Sigma x_i y_i-\Sigma x_i \Sigma y_i}{c}.$$
The y-intercept comes from:
$$b=\frac{\Sigma x_i^2 \Sigma y_i - \Sigma x_i \Sigma x_i y_i}{c}.$$
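One possible way to put the three equations together in Python is sketched below. The function name `best_trendline` and the sample points are assumptions made for illustration, not the code from this lesson. The guard against $c=0$ is an addition: $c$ is zero when every x-value is the same, and the equations would then divide by zero.

```python
def best_trendline(xs, ys):
    """Return (m, b) for the best trendline through the points, using the equations above."""
    N = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)               # sum of x_i squared
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of x_i times y_i

    c = N * sum_xx - sum_x ** 2
    if c == 0:
        raise ValueError("all x-values are the same, so no slope can be computed")

    m = (N * sum_xy - sum_x * sum_y) / c
    b = (sum_xx * sum_y - sum_x * sum_xy) / c
    return m, b

# Hypothetical points that lie exactly on the line y = 2x + 1
m, b = best_trendline([1, 2, 3, 4], [3, 5, 7, 9])
print(m, b)
```

Because the sample points sit exactly on a line, the computed slope and y-intercept recover that line, which makes this a handy sanity check before trying real data.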
Now you try. Fill in the ?? spots to compute the slope and y-intercept of the best possible trendline through the data points.
We got $m=0.224$ and $b=37.58$ for our final slope and y-intercept.
Here's another data set you can try. The x-data is the height of women between 30 and 39 years of age (in meters*10). The y-data is their weight (in kg).
x={14.7, 15.0, 15.2, 15.5, 15.7, 16.0, 16.3, 16.5, 16.8, 17.0, 17.3, 17.5, 17.8, 18.0, 18.3}
y={52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29, 63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46}
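If you'd like something to compare your own answer against, the sketch below applies the same three equations to this data set in Python. The data is copied from the lists above into Python lists; the variable names are chosen for illustration only.

```python
# Data copied from the lists above
x = [14.7, 15.0, 15.2, 15.5, 15.7, 16.0, 16.3, 16.5,
     16.8, 17.0, 17.3, 17.5, 17.8, 18.0, 18.3]
y = [52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29,
     63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46]

N = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xx = sum(xi * xi for xi in x)               # sum of x_i squared
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # sum of x_i times y_i

c = N * sum_xx - sum_x ** 2
m = (N * sum_xy - sum_x * sum_y) / c
b = (sum_xx * sum_y - sum_x * sum_xy) / c

print(round(m, 3), round(b, 2))
```

With the x-values as listed, the slope should come out near 6.13 and the y-intercept near -39.06.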