Transformers, more than meets the eye

Categories: Journal, Machine Learning
Published on: August 26, 2018

HAL8999 6/100

  • Watched the “Learn how to Learn” Google talk on YouTube
  • Updated the handson-ml Jupyter notebooks from GitHub and read through the Chapter 2 notebook to address yesterday’s CategoricalEncoder issue
  • Looked at building a basic custom transformer

Building a data pipeline will usually involve writing custom transformer classes for operations specific to the project or data source. For example, one of the products I work on stores XML data in a database with its newlines encoded as the literal sequence ‘\n’. When the data is pulled from the database, those ‘\n’ sequences are converted back to newline characters before the XML is passed to the parser. It’s a very simple operation, but without it the data would fail XML validation.
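The conversion itself is a one-line string replacement. Below is a minimal sketch, assuming the escapes come back from the database as the two literal characters `\` and `n`. The `stored` value and the `decode_newlines` name are hypothetical, and the standard library’s `xml.etree.ElementTree` stands in for whatever parser the real product uses. In this particular sample the raw value is not even well-formed, because the literal `\n` sits between the XML declaration and the root element.

```python
import xml.etree.ElementTree as ET


def decode_newlines(raw: str) -> str:
    """Turn each literal backslash-n (two characters) into a real newline."""
    return raw.replace("\\n", "\n")


# Hypothetical value as it might come back from the database,
# with each newline stored as the two characters '\' and 'n'.
stored = '<?xml version="1.0"?>\\n<note>\\n<to>Ops</to>\\n</note>'

# Parsing `stored` as-is fails: the literal "\n" after the XML declaration
# is not whitespace, so it is not allowed outside the root element.
try:
    ET.fromstring(stored)
except ET.ParseError as err:
    print("raw value rejected:", err)

# After decoding, the same text parses cleanly.
root = ET.fromstring(decode_newlines(stored))
print(root.find("to").text)  # Ops
```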

The scikit-learn package builds its data-pipeline transformers around duck typing (“if it looks like a duck and walks like a duck, it’s a duck”) rather than object inheritance. Essentially, if your class has fit(X, y=None) and transform(X) methods, with fit() returning self, it counts as a transformer.
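To plug that replacement into a scikit-learn pipeline, it can be wrapped in a small class. This is a sketch that assumes scikit-learn is installed; `NewlineDecoder` and `XmlParser` are hypothetical names, not part of any library. Duck typing only requires `fit()` and `transform()`. Inheriting from `BaseEstimator` and `TransformerMixin` is optional, but it supplies `get_params()`/`set_params()` and a `fit_transform()` that simply chains the two methods.

```python
import xml.etree.ElementTree as ET

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class NewlineDecoder(BaseEstimator, TransformerMixin):
    """Hypothetical step: replace literal backslash-n with a real newline."""

    def fit(self, X, y=None):
        # Stateless: nothing to learn, but fit() must still return self.
        return self

    def transform(self, X):
        # X is assumed to be an iterable of raw XML strings.
        return [s.replace("\\n", "\n") for s in X]


class XmlParser(BaseEstimator, TransformerMixin):
    """Hypothetical second step: parse each decoded string into an Element."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [ET.fromstring(s) for s in X]


pipeline = Pipeline([
    ("decode_newlines", NewlineDecoder()),
    ("parse_xml", XmlParser()),
])

rows = ['<?xml version="1.0"?>\\n<note>\\n<to>Ops</to>\\n</note>']
roots = pipeline.fit_transform(rows)
print(roots[0].find("to").text)  # Ops
```

Because neither step learns anything from the data, fit() just returns self, which is what lets calls such as fit(X).transform(X) chain and lets Pipeline feed each step’s output into the next.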
