Sort yourself out

Categories: Journal, Machine Learning
Published on: August 18, 2018

Achievement: HAL 8999 – 1/100

  • Set up virtualenv for HAL8999
  • Installed sklearn, pandas, numpy, matplotlib
  • Unable to install TensorFlow since I’m on Python 3.7 and the pip packages only work for 3.6 and earlier. I can sort that out later.
  • Read up through Example 1-1 in Hands-On Machine Learning
    • Author is a little fast and loose with the example code and imports

I ended up burning close to an hour figuring out why my plot and model didn’t match the author’s even though we were using the same data. It turns out I’d left out the line that presorts the values when massaging / mangling the data.

It’s not immediately clear to me why presorting the values would make any difference, but the unsorted DataFrame included values well outside the range used in the author’s Jupyter notebook. My guess is that by using the unsorted data I was pairing GDP values with the wrong countries, so some outlier data made it into the model. Clearly my pandas-fu is weak. Po would be sad.
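
One way row order could matter, sketched under assumptions: if the data prep drops or keeps rows by position (for example with `.iloc`), then the same position list picks different countries depending on whether the frame was sorted first. The country names, numbers, column names, and `keep_positions` list below are all made up for illustration and are not the book’s data or code.

```python
import pandas as pd

# Made-up numbers purely for illustration (not the book's dataset).
stats = pd.DataFrame(
    {
        "GDP per capita": [50000, 9000, 37000, 1500],
        "Life satisfaction": [7.2, 6.0, 6.5, 4.9],
    },
    index=["Country A", "Country B", "Country C", "Country D"],
)

# Hypothetical positional filter: keep rows 1..3, i.e. drop whatever is in position 0.
keep_positions = [1, 2, 3]

# Without sorting, position 0 is simply the first row as loaded (Country A).
unsorted_subset = stats.iloc[keep_positions]

# After sorting by GDP, position 0 is the lowest-GDP row (Country D).
sorted_subset = stats.sort_values(by="GDP per capita").iloc[keep_positions]

print(list(unsorted_subset.index))  # ['Country B', 'Country C', 'Country D']
print(list(sorted_subset.index))    # ['Country B', 'Country C', 'Country A']
```

The same position list keeps a different set of countries in each case, so a filter meant to drop specific outliers would miss them on unsorted data. Selecting by label (`.loc` with country names) would not depend on row order.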

Also, Visual Studio Code is oddly pickier about the import of sklearn.linear_model: it refused to initialize the model unless I spelled out the whole sklearn.linear_model.LinearRegression(), whereas Jupyter was fine with linear_model.LinearRegression().
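
One plausible explanation, which is an assumption and not something verified here, is that the two sessions ran different import statements and the editor had little to do with it. In Python, the form of the import decides which name is bound, and that determines how the class has to be referenced. A minimal sketch of the two spellings:

```python
# Form 1: binds the top-level name `sklearn` and loads the submodule,
# so the class must be reached through the full dotted path.
import sklearn.linear_model

model_a = sklearn.linear_model.LinearRegression()

# Form 2: binds the name `linear_model` directly,
# so the shorter spelling works.
from sklearn import linear_model

model_b = linear_model.LinearRegression()

# Both spellings refer to the same class.
print(type(model_a) is type(model_b))  # True
```

A long-lived Jupyter kernel keeps names from earlier cells, so `linear_model` may already have been bound there by an earlier `from sklearn import linear_model`, while a fresh script run only has the names its own imports bind.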
