Smells Like Data Science – A ‘Heartwarming’ Case Study

Hearts & Science (H&S) wanted to ‘democratize’ data science within their organization. Yes, they have a very strong data science team, but they wanted all of their media activation and analyst teams to be able to speak credibly about what was possible with data science. (To be fair, they did also cross-train the data scientists on all things possible in-tool with GA360 & GMP. But this case study is about the data science training.)

H&S partnered with Google and turned to Napkyn to deliver this training. Recognizing that it takes many years of education and experience to become a data scientist, and that the amount of material to cover was vast, Napkyn realized the first challenge would be selecting the relevant content. The larger challenge would be making it interesting, informative and digestible.

After a very productive discovery session with H&S, Napkyn chose a cross-functional team to conquer this challenge. The approach was a boot-camp-style, six-hour session. That is a LOT of data science in one day.

Yet another challenge was that the content would have to build upon itself, which meant starting out with the basics and building from there. How would it be possible to keep people interested through the whole day? Napkyn’s team thought backwards.

Knowing that the major challenge in the media world is the impending extinction of the third-party cookie (and all the other browser restrictions), the team made tackling that problem with data science the hook that would keep the attendees’ attention.

The agenda was agreed upon. Napkyn went away and created the materials. The training was scheduled. Because there would be people in every time zone, the best available time for the H&S team was 12pm to 6pm Eastern on a Friday.

Finally, the day was upon us. Everyone joined without incident, which was amazing given that it was an online meeting with a triple-digit invitee list. I settled in and told my dog, Mikey, it would be a long day. Luckily there were well-timed breaks, so he’d at least get to go out.

The first session of the day, ‘GCP and its role in Marketing Analytics’, started with the basics: ‘What is Google Cloud Platform (GCP)?’ From there, ‘Marketing Analytics and Machine Learning Components’ led into an introduction to BigQuery using a Google Analytics dataset and then moved on to other ways to ingest data into BigQuery.

After a break (Mikey said, ‘Thanks!’), we moved on to BigQuery Machine Learning (BQML), starting with an overview and then moving to the different models BQML supports and their intended uses, everything from linear regression to matrix factorization.

From there we learned about some different ways of building and testing BQML models.

With the basics under our belts, we continued on to exploring and preparing the raw data exported from Google Analytics using SQL and BigQuery Dataprep. The goal of this session was to learn how to access and explore the raw table data in BigQuery and transform it into the best format for training machine learning models. It definitely built on the previous sessions. And, yes, it was comprehensive.

At this point, I thought my non-data-scientist brain was going to explode. Lucky for me (and Mikey), it was time for another break.

At 4:30pm on a Friday, after drinking from the data science information firehose, it was on to the final session, ‘Use Cases’. Starting with a basic review of the attribution models available ‘in-tool’ with Google Analytics, we moved on to why you would create a behavior-based machine learning attribution model and the advantages of using BQML to do it.

Although some of the day’s training had been slides, a lot of it had been practical, doing as we were learning. Now it got really real: feature mining, extracting and generating useful features, and creating new tables for training ML models. We even got into feature importance and dimensionality reduction. Heady stuff!

We created a classification model using the BQML features and used the trained model to predict outcomes. Then we implemented a behavior-based attribution model and evaluated its performance. The Holy Grail!
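For readers who want a feel for the idea, here is a minimal sketch in plain Python. It is not the model built in the training (that work was done in BQML on Google Analytics data); the channel names, the toy journeys, the hand-rolled logistic regression, and the "removal effect" credit rule are all assumptions chosen for illustration. It trains a small classifier to predict conversion from the channels a user touched, then gives each channel fractional credit based on how much the predicted conversion probability drops when that channel is removed.

```python
import math

# Hypothetical channel names and toy journeys, invented for illustration only.
CHANNELS = ["search", "social", "email"]
JOURNEYS = [  # (channels touched, converted? 1 or 0)
    ({"search", "email"}, 1),
    ({"search"}, 1),
    ({"social"}, 0),
    ({"social", "email"}, 1),
    ({"search", "social"}, 1),
    ({"email"}, 0),
    ({"social"}, 0),
    ({"search", "social", "email"}, 1),
]


def encode(touched):
    """One-hot encode the set of touched channels."""
    return [1.0 if c in touched else 0.0 for c in CHANNELS]


def predict(weights, bias, x):
    """Logistic model: predicted probability of conversion."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(journeys, epochs=2000, lr=0.1):
    """Fit the classifier with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(CHANNELS), 0.0
    for _ in range(epochs):
        for touched, label in journeys:
            x = encode(touched)
            error = predict(weights, bias, x) - label
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def attribute(journeys, weights, bias):
    """Split each conversion across its channels by removal effect."""
    credit = {c: 0.0 for c in CHANNELS}
    for touched, label in journeys:
        if not label or not touched:
            continue
        base = predict(weights, bias, encode(touched))
        # Drop in predicted probability when one channel is taken away.
        drops = {
            c: max(base - predict(weights, bias, encode(touched - {c})), 0.0)
            for c in touched
        }
        total = sum(drops.values())
        for c in touched:
            # Fall back to an equal split if no channel shows a positive effect.
            credit[c] += drops[c] / total if total > 0 else 1.0 / len(touched)
    return credit


weights, bias = train(JOURNEYS)
credit = attribute(JOURNEYS, weights, bias)
for channel, value in credit.items():
    print(f"{channel}: {value:.2f} conversions credited")
```

Because each conversion is split into shares that sum to one, the credits add up to the total number of conversions. That makes the output easy to compare with rule-based models such as last click.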

After reviewing the results, just to keep it real, we discussed the pros and cons of machine learning attribution.

This (one specific pro and future con: the fate of cookies) dovetailed nicely into the final topic, ‘The Impact of Third-party Cookies Extinction on Attribution Modeling’, which, of course, led to the importance and possibilities of Conversion Modeling and of introducing First Party Data (a la Ads Data Hub and other first-party data sources) into your models.

There had been questions throughout the day, and at 6pm on a Friday, after a day chock-full of learning, they were still coming in. Finally, when asked, ‘Does anyone else have a question?’ there was silence. A perfect ending to a perfectly productive session.

Not only was the training a success for both teams, it also led Napkyn to pursue a few other important projects. Realizing how integral machine learning and artificial intelligence are to the insights GA4 offers, we have published ebooks and blogs and hosted webinars on the cookie-less world and how both GA4 and data science projects can not just fill in the gaps that the cookie-void leaves, but really leapfrog your analytics to the next dimension.

We also shifted the focus of our attribution modeling projects beyond traditional models to incremental, fractional and behavior-based approaches.

So we say, ‘Thank you, Hearts & Science, not only for the opportunity to work with you and our evolving partnership, but for opening our eyes to some important work we needed to do.’

And Mikey, the data scientist, says, ‘Thank you both. I learned a lot, not the least of which is my favorite model for segmentation, K-9 means clustering.’