A little more work on that map generator while dinner was cooking and now I have a hideous Voronoi tessellation of what was intended to be tectonic plates. I’m clearly going to have to rethink how I’m selecting the plate “centers” because I’m not getting the effect I want. Next up, Lloyd’s relaxation.
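(For future me: Lloyd’s relaxation just nudges each plate “center” toward the centroid of its Voronoi cell and repeats until the cells even out. Here’s a minimal Python/NumPy sketch of the idea, approximating each cell’s centroid with a cloud of random sample points; the function name and the Monte Carlo shortcut are illustrative, not anything from the actual generator.)

```python
import numpy as np

def lloyd_relax(seeds, iterations=3, n_samples=20000, size=1.0):
    """Monte Carlo Lloyd's relaxation: move each seed toward the centroid of
    its Voronoi cell, with centroids approximated by assigning random sample
    points to their nearest seed."""
    seeds = np.asarray(seeds, dtype=float).copy()
    for _ in range(iterations):
        samples = np.random.uniform(0.0, size, (n_samples, seeds.shape[1]))
        # nearest seed for every sample point == which Voronoi cell it falls in
        dists = np.linalg.norm(samples[:, None, :] - seeds[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        for i in range(len(seeds)):
            cell = samples[nearest == i]
            if len(cell):
                seeds[i] = cell.mean(axis=0)  # centroid of the cell
    return seeds
```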
Doing It Wrong
NOT THE BEEEES!!!
So after a couple weeks of futzing around with Voronoi tessellations (locating implementations, rewriting chunks of them to fit my needs, not being happy with the results, ripping it all out and starting over, etc.) I decided I’d stick with what I know and use a radial hex map.
One cup of coffee, about an hour of coding (about half of which was sorting out an error in mapping hex coordinates to Vector3s), and voilà: one randomly colored hex grid (green tinted because, omg, random color selection is an aesthetic nightmare).
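For posterity, the math I kept getting wrong is just the standard axial-hex-to-world-position conversion. A quick Python sketch of the math only (the actual Unity code isn’t shown here; this assumes pointy-top hexes of a given outer radius with the grid lying flat in the (x, z) plane):

```python
import math

def axial_to_world(q, r, radius=1.0):
    """Axial hex coordinates (q, r) to an (x, y, z) world position,
    pointy-top orientation, grid lying flat in the (x, z) plane."""
    x = radius * math.sqrt(3.0) * (q + r / 2.0)
    z = radius * 1.5 * r
    return (x, 0.0, z)
```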
The mesh isn’t optimal but I’ll deal with that if it becomes a problem later.
Next up, plate tectonics and heightmaps.
It moved
It’s been a long week so I only have the basic WASD controller set up with a proper state machine. Now I can run around the featureless checkerboard plain with a few amenities like
- walking around a featureless checkerboard plain
- autorunning around a featureless checkerboard plain
- autowalking around a featureless checkerboard plain
I’ll add jumping on a featureless checkerboard plain a little later, once I decide how to pass collision detection into the state machine, along with a few other movement states:
- free fall
- sliding down slopes
No point in either of those yet since there’s nothing to fall off of or slide down, which is just as well because the “player” has no rigidbody and only translates along the (x,z) plane. Baby steps.
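For what it’s worth, the state machine itself is nothing exotic: a current-state handle plus a per-state update that can swap in a new state. A toy Python sketch of the pattern (illustrative only; the input handling is hand-waved and none of this is the actual controller code):

```python
class MovementState:
    """Base state: handles per-frame updates and decides transitions."""
    def update(self, player, dt, inputs):
        pass

class Walking(MovementState):
    def update(self, player, dt, inputs):
        player.position[0] += inputs.get("x", 0.0) * player.walk_speed * dt
        player.position[2] += inputs.get("z", 0.0) * player.walk_speed * dt
        if inputs.get("autorun"):
            player.state = Autorunning()

class Autorunning(MovementState):
    def update(self, player, dt, inputs):
        player.position[2] += player.run_speed * dt  # keep moving "forward"
        if inputs.get("x") or inputs.get("z"):
            player.state = Walking()

class Player:
    def __init__(self):
        self.position = [0.0, 0.0, 0.0]   # translate only in the (x, z) plane
        self.walk_speed, self.run_speed = 2.0, 5.0
        self.state = Walking()

    def update(self, dt, inputs):
        self.state.update(self, dt, inputs)

# usage: feed it per-frame input dictionaries
player = Player()
player.update(0.016, {"x": 1.0})          # walk
player.update(0.016, {"autorun": True})   # switch to autorun
player.update(0.016, {})                  # keep autorunning
print(player.position)
```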
Since running around a featureless plain isn’t particularly fun, it’s time to add some features to the environment. I could hand-edit the Unity Terrain assets, but since the goal is to go full Dwarf Fortress on the world generation (well, maybe half-assed Dwarf Fortress) we’ll be starting with plate tectonics and building the terrain up from geologic time.
Back in the Saddle Again
I spend a lot of time tinkering with various software projects, going at problems the wrong way, and generally making stupid decisions that a professional or more formally trained person would cringe to look at. Herein begins my cautionary tale as I learn painful lessons the hard way so you, dear reader, don’t have to. This is not a self promotion journal intended to demonstrate success or competency but rather the dev blog version of Ow, My Balls!
What’s Up First?
I’ve been fooling around with Unity3D for a while now so what’s the first project any burgeoning game developer should start with? That’s right, time to build my own MMORPG! I mean, a few orcs, a little hack and slash, how hard could it be?
In honor of my introduction to role-playing games, and as an homage to the first RPG sandbox I ever played in, I’ve named the project Hommlet and set up source control for it because I’m not a complete savage. I don’t really feel like mucking about with the network code or data storage model, so in the spirit of Doing It Wrong I’ll be starting with a basic character controller and will move forward from there based on the immortal words of Alex DeLarge:
Thinking was for the gloopy ones and that the oomny ones use like inspiration and what Bog sends
So tonight I’ll raise a glass to old Bog and see what inspiration comes.
Precepts… Five, no eight, better make it sixteen
Coming away from the first class on the Bodhisattva Precepts at Buddha Eye Temple I have the following to roll around in my head for a while.
The Bodhisattva Precepts used in the Soto Zen tradition encompass and expand on the ethics (sila) found in the Noble Eightfold Path. The sixteen Bodhisattva Precepts comprise three groups of precepts, following the Buddhist tradition of lists of lists. For the householder rather than the monastic, sila, i.e. morality or virtue, should be based on an inquiry into what advances oneself and others toward realization rather than simple adherence to a set of rules.
Call me Sisyphus
I don’t know why I continue to use Windows. Every single god damned thing feels like pushing a giant rock up a hill. I’ll admit that the fact that I’ve been doing most of my work on Linux for the past twenty years inclines me to use the Linux idiom for getting things done, and I’m likely just not thinking like a Windows user, but my god does MSFT make doing things other than what they want you to do, in the way they want you to do it, a massive pain in the ass.
I started using computers with nothing but a command prompt, and while I appreciate some of the amenities of modern computing, I’d still like a marginally functional command prompt with a reasonable set of tools. I swear it takes me ten times as long to get basic work done in Windows as it does in Linux.
I think the only things keeping me from just installing Ubuntu on my desktop are my recollection of how clunky desktop Linux was a decade ago and my continuing vice of playing MMOs.
Also, I’m back from training people in what can only be described as a suburban Hellscape. How do you politely tell someone that you’d rather put a gun in your mouth than live in the same community they’ve chosen to make their home?
Cheating Irises
HAL8999 7/100
I was sick yesterday but did spend some time looking over some “cheat sheets” that people had put together for various machine learning topics. Some were good, some were just stupid (I’m looking at you, Machine Learning in Emoji). Also went through a very simple classifier based on the iris data set.
```python
from sklearn import neighbors, datasets, preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# load the data into training and test sets
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=13)

# scale the data
scaler = preprocessing.StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# train a k-nearest neighbors model
knn = neighbors.KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# test the model
y_pred = knn.predict(X_test)
print(accuracy_score(y_test, y_pred))
```
```
0.8157894736842105
```
[Image: Model Selection]
Transformers, more than meets the eye
HAL8999 6/100
- Watched the “Learn how to Learn” Google talk on YouTube
- Updated the jupyter notebooks for handson-ml from github and read through the Ch2 notebook to address the CategoricalEncoder issue from yesterday
- Looked at a basic transformer
Part of building a data pipeline is likely to include the creation of custom transformer classes to perform operations specific to the project or data source. For example, one of the products I work on stores XML data in a database with the newlines encoded as ‘\n’. When the data is pulled from the database, those ‘\n’ sequences are converted back to newline characters before the XML is passed to the parser. It’s a very simple operation, but without it the data would fail XML validation.
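In code that step is about as small as it sounds; something like this sketch (the function name and the literal two-character ‘\n’ convention here are illustrative, not the actual product code):

```python
def decode_newlines(stored_xml):
    """Replace literal backslash-n two-character sequences from the
    database with real newline characters before XML parsing."""
    return stored_xml.replace("\\n", "\n")

# e.g. decode_newlines("<record>\\n  <field>42</field>\\n</record>")
```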
The scikit-learn package provides a structure for building transformers for a data pipeline that is based on duck typing, i.e. “looks like a duck, walks like a duck, etc.”, rather than on object inheritance. Essentially, if your class has fit(X) and transform(X) methods, it counts as a transformer.
```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

# column indices in the housing data array
rooms_idx, bedrooms_idx, population_idx, household_idx = 3, 4, 5, 6

class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    """Adds combined attributes rooms_per_household,
    population_per_household, and (optionally) bedrooms_per_room"""

    def __init__(self, add_bedrooms_per_room=True):
        self.add_bedrooms_per_room = add_bedrooms_per_room

    def fit(self, X, y=None):
        """Fit is required even though it does nothing

        Parameters
        ----------
        X : array-like, shape [n_samples, n_features]

        Returns
        -------
        self
        """
        return self

    def transform(self, X, y=None):
        """Transform X, adding the combined attribute columns.

        Parameters
        ----------
        X : array-like, shape [n_samples, n_features]
            The data to encode.

        Returns
        -------
        np.c_ : concatenation of X and the additional attribute columns
        """
        rooms_per_household = X[:, rooms_idx] / X[:, household_idx]
        population_per_household = X[:, population_idx] / X[:, household_idx]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_idx] / X[:, rooms_idx]
            return np.c_[X, rooms_per_household, population_per_household,
                         bedrooms_per_room]
        else:
            return np.c_[X, rooms_per_household, population_per_household]

# X here is the housing data as a plain NumPy array
attr_adder = CombinedAttributesAdder()
housing_xtra_attrs = attr_adder.transform(X)
```
from handson-ml import BrokeAsFuck
HAL8999 – 5/100
Today, while going back through Chapter 2 of the Hands-On Machine Learning book, I learned that the CategoricalEncoder referenced in the section on handling categorical attributes still isn’t in scikit-learn. I checked the requirements.txt, which shows scikit-learn==0.19.1. Checking my virtualenv, I should be good.
```
$ grep scikit-learn handson-ml/requirements.txt
scikit-learn==0.19.1
$ pip freeze | grep scikit-learn
scikit-learn==0.19.2
```
Turns out that the CategoricalEncoder isn’t going to be in scikit-learn until 0.20, so to get it you have to grab 0.20 from GitHub rather than just use pip.
```
$ pip install Cython
...
$ pip install https://github.com/scikit-learn/scikit-learn/archive/master.zip
...
Successfully built scikit-learn
Installing collected packages: scikit-learn
Successfully installed scikit-learn-0.20.dev0
$ python
>>> from sklearn.preprocessing import CategoricalEncoder
>>> encoder = CategoricalEncoder()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Projects\HAL8999\lib\site-packages\sklearn\preprocessing\data.py", line 2917, in __init__
    "CategoricalEncoder briefly existed in 0.20dev. Its functionality "
RuntimeError: CategoricalEncoder briefly existed in 0.20dev. Its functionality has been rolled into the OneHotEncoder and OrdinalEncoder. This stub will be removed in version 0.21.
```
Fucking hell…
So, if you’re going to write a book, it’s probably a good idea to use the stable branch of your libraries rather than the bleeding edge dev branch.
It will be a good exercise to convert the book’s example code to work with the standard OneHotEncoder but I’ve always been a fan of “just works” as a design principle.
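For what it’s worth, the conversion looks painless enough. A sketch, assuming a scikit-learn new enough (0.20+) that OneHotEncoder accepts string categories directly, and using the book’s housing DataFrame with its ocean_proximity column:

```python
from sklearn.preprocessing import OneHotEncoder

# ocean_proximity is the lone categorical column in the book's housing data;
# OneHotEncoder wants a 2-D input, hence the double brackets
encoder = OneHotEncoder()
housing_cat_1hot = encoder.fit_transform(housing[["ocean_proximity"]])
print(encoder.categories_)      # the category labels it found
print(housing_cat_1hot.shape)   # sparse matrix, one column per category
```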
Long days, no blog post
HAL8999 – [3,4]/100
- Chapter 2 of Hands on ML continues
- Creation of test sets
- Stratified sampling
- sklearn’s StratifiedShuffleSplit
- Visualizing data with matplotlib
- Correlation coefficients
Getting a good train-test split
Since you can’t train a model and just expect it to work well right out of the box, it’s standard practice to split off about 20% of the data set to test the model against. The naive way to do this is to just grab 20% of the data at random, but that runs into a number of issues:
- depending on how you do it, you may grab different train/test sets every time the model runs
- grabbing data points at random can let sampling bias creep in if you happen to get an unrepresentative sample
Solution?
Stratified sampling
Rather than just grabbing data points at random, we can ensure a more representative distribution of sampled data points for certain attributes (sex, income, ethnic background, etc.) so that random selection doesn’t introduce bias into the training and test sets.
In this example we can be pretty certain that median income correlates strongly with median housing price, and we want to be sure we get a representative distribution of districts with respect to median income. The way to do this is to add a column to the data set that groups median income into categories; we can then sample based on that category. This improves our chances of getting a representative sampling of the underlying median income attribute.
Once the test set has been split off, we work entirely with the training set so as not to introduce bias based on knowledge of the test data.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import StratifiedShuffleSplit

# add a column for the median_income stratified sample
housing["income_cat"] = np.ceil(housing["median_income"] / 1.5)
housing["income_cat"].where(housing["income_cat"] < 5, 5, inplace=True)

# histogram of the income categories over the full data set
housing["income_cat"].hist()
plt.title("Full Data Set")
plt.show()

# split 80/20, stratifying on the income category
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=1337)
for train_index, test_index in split.split(housing, housing["income_cat"]):
    stratified_train = housing.loc[train_index]
    stratified_test = housing.loc[test_index]

# stacked histograms of the income categories in the two sets
plt.title("Training and Test Sets")
plt.hist([stratified_test["income_cat"], stratified_train["income_cat"]], stacked=True)
plt.show()

# drop the helper column now that the split is done
stratified_train.drop("income_cat", axis=1, inplace=True)
stratified_test.drop("income_cat", axis=1, inplace=True)

housing = stratified_train.copy()
```
[Figure: Full Data Set income_cat histogram]
[Figure: Training and Test Sets stacked income_cat histograms]