DBT MySQL producing boolean columns as INT, should be TINYINT

Issue

I’m trying to change how we make some transformations in our tables on RDS MySql. This table have 20 million records and 200 columns. We have a pipeline executed monthly where we download the table to an EC2, use python to do the transformation, then it is reuploaded.

Upon presenting dbt, boss wants to see it working because of the benefits: everything will stay on SQL (I am the only python person in our small 20 people company), will have documentation, automated tests and version control [all this is really needed at the moment]. I made it happen, wrote SQL on dbt that produces the same results of the python script and runs directly on the mysql database using this https://pypi.org/project/dbt-mysql/ adapter.

There are some problems and the one of them i think will start helping me most is about the boolean in mysql. I already know all that thing about boolean, tinyint(1), etc, etc. But all columns intended to be "boolean" are going to the tables as INT, and I want them as tinyint, because it is taking 4 times the space it should.

Edit: added more information thanks to feedback

My raw table comes with all columns as str, i’m trying to cast the correct types. As this one should be boolean, i expected it to be converted to tinyint(1). When I create a table via pandas and there is a bool column, the table column is tinyint(1). But when I try to do something like this in SQL, the column becomes int.

The code is really just that:

SELECT IF(myStrColumn = '1', TRUE, FALSE)
FROM myRawTable

The resulting column is given as int, but i wanted it to be tinyint(1) to represent boolean.

tinyint is not a valid type to be passed to cast as per documentation https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_cast so it doesn’t work

Solution

After looking at the MySQL docs, I think you have two options:

  1. Create a new, custom table materialization that allows you to leverage the MySQL syntax:
create table my_table (my_col tinyint) as select ...
  1. Add a post-hook that narrows the column after you’ve created the table:
config(
    materialized="table",
    post_hook="alter table {{ this }} modify my_col tinyint"
)

For #1, there is a guide to creating materializations in the dbt docs, but it is a complex and advanced topic. I think the dbt-mysql adapter uses the vanilla/default table materialization in the global project. You may want to check out the MySQL incremental materialization macro, which is here.

Answered By – tconbeer

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply

(*) Required, Your email will not be published