DEV Community

Charlotte Towell
Charlotte Towell

Posted on

MySQL to BigQuery Migration (Syntax & Other Conversions)

I recently completed the final stage of a data migration project for the start-up I work at by rewriting a bunch of MySQL queries into BigQuery queries.

With all the different flavours of SQL (T-SQL, MySQL, Google SQL...) and the move from on-premise to cloud (or cloud to other cloud), I feel as if data migrations and the subsequent reworking of all its dependencies are extremely common right now. This is the 2nd migration project I have been involved in since starting my data career and perhaps the most finicky thing is the slight syntax differences between each version of SQL.

In doing this MySQL to BigQuery mass rewrite, I put together a list of all of the translations I encountered and little tricks for working with BigQuery.

I plan to write another post on using clustering and partioning with BigQuery, coming from a relational database design into BigQuery's preferred denormalised structure.

Let's get into it.

Functions

MySQL BigQuery
NOW CURRENT_DATETIME
CONCAT_WS STRING_AGG or ARRAY_TO_STRING
STR_TO_DATE('31/01/24','%d,%m,%y') PARSE_DATE(’%d,%m,%y’ , '31/01/24’)
GROUP_CONCAT ARRAY_AGG
LOCATE(substring, string) INSTR(string, substring)
MD5() TO_HEX(MD5())
SUBSTRING_INDEX(substring, delim, index) SPLIT(substring, delim)[SAFE_OFFSET(index-1)]*

* indexing in MySQL is 1-based, BigQuery is 0-based

Key Words

MySQL BigQuery
UNION UNION ALL
SET @variable = value; DECLARE variable {TYPE} DEFAULT (value)
Something AS 'column' Something AS column (no quotes)

Data Types

MySQL BigQuery
SIGNED INT64
BOOL INT64
DECIMAL NUMERIC

Other Tips & Tricks

  • In MySQL ORDER BY NULL which is used for performance improvement with use of GROUP BY has no BigQuery equivalent
  • MySQL allows subqueries in join predicates unlike BigQuery. Instead you should define the subquery as the joined table instead.
-- MYSQL:
JOIN database.joined_table b
    ON (
        a.column1 = b.column1
        AND a.column2 = (
            SELECT
                c.column2
            FROM
                database.inner_joined_table c
            WHERE
                c.column3 = b.column3
            )
    )
Enter fullscreen mode Exit fullscreen mode
-- BigQuery:
JOIN (
    SELECT
        b.column1, c.column3
    FROM
        database.joined_table b
    INNER JOIN
        database.inner_joined_table c
    ON
        c.column3 = b.column3
) b
ON
    a.column1 = b.column1 AND b.column3 = a.column2
Enter fullscreen mode Exit fullscreen mode
  • GROUP BY in MySQL does not need to specify all non-aggregate columns and instead selects a random value. In Bigquery you must either add non-aggregate columns to the GROUP BY clause or aggregate them.
  • Using JSON_VALUE with an iterator in MySQL does not work in BigQuery. Instead you must use CTEs and the UNNEST function to achieve the same behaviour.
-- BigQuery:
WITH json_row_data AS (
    SELECT 
        id
        ,JSON_EXTRACT_ARRAY(json_data, '$') AS r
    FROM
        database.json_data
),
unnested_rows AS (
    SELECT
        id
        ,unnested_row AS json_data
    FROM json_row_data, UNNEST(r) unnested_row
)
SELECT
    id,
    JSON_VALUE(ajd.json_data, '$.myColumn') AS `myColumn`
FROM unnested_rows ajd
GROUP BY id
Enter fullscreen mode Exit fullscreen mode
  • In general, using BigQuery window functions is preferred over recreating the login in a custom GROUP BY as the window functions are better optimised.
  • The above point also applies for cases when tables are joined to themselves to access previous rows. Look into if using LEAD and ROW_NUMBER could satisfy your requirements instead. This also results in more cost-efficient queries as BigQuery costs are directly related to how much data is read from each table (no point to access the same table twice if unneeded!)

This is definitely not a comphrensive list of all translations between the two different syntaxes but is a bunch of the things I came across in my migration task so hopefully it may help someone else who is similarly going through a process of rewriting queries into BigQuery!

Top comments (0)