Throwback Thursday had gotten boring. So from now on, Both Ollie and I will write a post with the same title. You decide who wrote it better. Today’s title is
(Ollie is the hands-down winner this week, as i had to ask him for the technical details part of my oost).
‘There’s PII in My Data but There’s Data in My PII
Often when i work with clients, I find they are accidentally (or some intentionally) capturing data in their Google Analytics. The most common data points i see are email addresses and birthday (occasionally I even see names and addresses).
There are two primary ways the data is getting into Google Analytics. The first is accidentally. When a form uses the GET method. When a form is submitted using the GET method, the form data is appended to the URL as a query string. The query string contains the name/value pairs of the form data, which gets captured by Google Analytics as the Page Page + Query String, Page Location and Referrer.
Some clients, especially with Google Universal Analytics create Custom Dimensions to store name or email. Most have done this not knowing that it is a violation of (at minimum) Google Analytics Terms and Conditions.
But, you say, I need that data in my analytics for retargeting and building audiences. You can accomplish this using anonymized data. Also, most forms that capture this type of data also write this data to a database, where it is First Party Data (your data – not Third Party data – Google’s data).
This is even a better option if you are using GA4, since you get a free BigQuery account with your GA4 account. You can link your GA4 account to BigQuery (full instructions below, thanks, Ollie!) and write the form data to a different table in BigQuery. Then, as long as you have a common key, you can join the data in BigQuery. Which leads me to my favorite data science joke…A BigQuery query walks into a bar, walks up to two tables, and asks, “Can I join you?”
Full instructions (thanks, Ollie!)
Writing Form Data to BigQuery
To write form data to a BigQuery table, you can use the BigQuery Python client library. Here’s some sample code that demonstrates how to write form data to a BigQuery table: python from google.cloud import bigquery # Set up a client object to interact with BigQuery client = bigquery.Client() # Get the data from the form form_data = { 'name': 'John Smith', 'email': 'johnsmith@example.com', 'phone': '555-555-5555' } # Set up the BigQuery table and schema table_id = "my-project.my_dataset.my_table" schema = [ bigquery.SchemaField("name", "STRING"), bigquery.SchemaField("email", "STRING"), bigquery.SchemaField("phone", "STRING") ] table = bigquery.Table(table_id, schema=schema) # Insert the form data into the BigQuery table rows_to_insert = [(form_data['name'], form_data['email'], form_data['phone'])] result = client.insert_rows(table, rows_to_insert) print("Inserted {} rows into BigQuery table {}".format(len(rows_to_insert), table_id))
In this example, the form_data
variable represents the data collected from the form, and the table_id
variable represents the ID of the BigQuery table you want to insert the data into. The schema
variable defines the structure of the table, and the rows_to_insert
variable contains the data to be inserted. Note that you’ll need to set up authentication for the client object to interact with BigQuery. You can find more information on how to do that in the BigQuery documentation: https://cloud.google.com/bigquery/docs/authentication/
Linking your GA4 data to BigQuery
You can link your GA4 account to BigQuery by following these steps: 1. Select your GA4 property in your Google Analytics account. 2. Click on the Admin icon in the bottom left-hand corner of the page. 3. Under the “Property” column on the right-hand side, click on “Data Streams.” 4. Select the data stream that you want to link to BigQuery. 5. Toggle the switch for “BigQuery Link” to ON. 6. Click on the “Link to BigQuery” button. 7. Choose the BigQuery project and dataset that you want to use. 8. Click on the “Enable” button. Once you have completed these steps, your GA4 data stream will be linked to the selected BigQuery project and dataset. You can then use BigQuery to analyze your GA4 data and run custom queries.
Joining Two Tables in BigQuery
To join two tables in BigQuery, you can use the SQL `JOIN` clause. Here’s an example query that demonstrates how to join two tables: “`sql SELECT table1.column1, table1.column2, table2.column1 FROM table1 JOIN table2 ON table1.column1 = table2.column1 “` The above example joins `table1` and `table2` on the column `column1`, and selects columns `column1` and `column2` from `table1`, as well as `column1` from `table2`. You can modify this query to join the tables on different columns and select different columns as per your requirements. Note that when joining two tables, you need to ensure that both tables have the same data type for the columns being joined.