BigQuery vs. Redshift: Which cloud data warehouse is right for you?
The data warehouse is the source of truth from your business's data set. Choosing the right solution is critical. This article explains how BigQuery and Redshift compare in factors such as performance, security, and cost so that you can select the right warehouse for your needs.
Data-driven marketing campaigns can help you find and activate the right prospects with relevant content.
However, organizational silos, stale customer data, and insufficient skills can handicap even the most sophisticated marketing campaigns. Fully managed cloud data warehouses like Google BigQuery and Amazon Redshift can overcome these challenges to a large extent. Both of these platforms provide a real-time, scalable, and cost-effective repository for the data you want to house for the long term. And offers a platform for conducting all of your marketing analytics and business intelligence use cases.
Comparing BigQuery vs. Redshift with respect to performance, usability, security, and costs will help you choose the most suitable data warehouse for your needs. For more data warehouse comparisons, you can also check out our breakdowns of Snowflake vs. Redshift and Snowflake vs. BigQuery.
The better your query performance, the faster and more cost effectively you’ll be able to find insights such as which user action led to the most conversions or which common user attributes correlate with better lifetime value.
According to a benchmark study by research firm GigaOm, Redshift outperformed BigQuery by nearly five times with respect to query execution time and cost. But it’s worth doing a benchmark study with your own production environment data sets to find out which data warehouse delivers the best performance for your needs.
The GigaOm benchmark study used an industry-standard TPC-DS data set that amounted to 30TB of data. A total of 99 different queries were executed on this data set, and each query was executed three times. The best runtime out of the three executions was used in the final result.
To ensure a level playing field, both data warehouses were set with comparable configurations and costs.
- Redshift ran on an AWS server that had 38 ra3.4xlarge nodes — costing $123.88 per hour.
- BigQuery ran on a GCP server with 3,000 slots — costing $120.00 per hour.
Executing all 99 test queries on Redshift cost $110.73, while it cost $511.09 on BigQuery.
However, don’t let this benchmark result alone influence your final decision — it may not accurately reflect your needs for two reasons. Firstly, each warehouse has a different architecture that suits different workloads better than others. And secondly, your business workloads may differ from the TPC-DS one used in this study.
On top of that, factors like warehouse configuration, number of joins used, internal data schema, and data size can all impact the result.
To get a clear picture of what it will cost to run queries on your own datasets, it’s best to test both these platforms in your production environment. Use real data sets and queries from your marketing analytic operations and benchmark both warehouses to truly understand which is better for you.
Benchmarking your live workloads is also one of the most useful ways to utilize the $300 of free credits that BigQuery offers and the 750 free hours per month offered by Redshift.
In general, however, Redshift is more cost effective for running regular queries or API calls used in daily marketing reports, while BigQuery is better for processing low-frequency workloads that deal with more complex schemas and resource-intensive queries made up of multiple joins or aggregates.
Before you can load any customer data or write a single query on Redshift, you’ll first need to define node type, create clusters, configure databases, and manage cluster permissions. Redshift provides documentation for each step, but you may still require some engineering support to get these steps right.
So, if you have no or very little engineering support to lean on, then BigQuery or Redshift serverless are better options. Both these options can help you focus on the core marketing analytics without getting distracted by server management tasks like adding compute power or allocating more nodes.
Both BigQuery and Redshift are valuable in different ways. BigQuery is a ready-made, pay-per-use solution that prioritizes time over flexibility, while Redshift is a tailor-made solution that can be optimized for your specific needs. However, the right choice here boils down to how much engineering support is available to you.
Both BigQuery and Redshift are built around the industry standard SQL query interface.
BigQuery uses Google Cloud console as its web UI. It allows you to easily load and select data through a graphical user interface. You can use this web editor to run queries that filter, join, and aggregate your data sets. The editor also automatically highlights syntax errors and provides upfront query information. In addition to this web interface, you can also query via the command-line tool, REST API, or Client Libraries.
Redshift doesn’t have a web UI like BigQuery. It only has a command line interface (CLI) and an API reference. However, an SQL-like user interface allows anyone familiar with data operations to run queries effortlessly on the CLI. Additionally, there is a Redshift Advisor to provide smart recommendations on designing tables and optimizing queries.
So, you should have no trouble querying either of these warehouses for your marketing analytics. The APIs offered by both these warehouses can help you build real-time dashboards that constantly show important marketing metrics like conversion rate, churn, CLTV, and performance of a campaign. However, choose BigQuery if you prefer a graphical user interface over a CLI.
Both warehouses use identity and access management (IAM) and provide single sign-on (SSO) to help you define granular access controls and prevent login breaches across your entire marketing team.
For identity and access management, BigQuery uses Google Cloud IAM, and RedShift uses Amazon IAM. These IAM solutions help you authorize which team member can take what action. This is helpful while delegating specific marketing tasks. For example, you can prevent your social media team from accessing any email or CRM data and grant them access only to relevant data sets. The IAMs also verify logins. So, you can securely delegate a marketing task to a remotely located team member without any security risks.
There is very little to choose between both these data warehouses on the security front. However, one difference is that BigQuery encrypts all data by default, while Redshift has the option to encrypt data. So, remember to turn on the encryption option while sending data from Redshift to another tool.
BigQuery and Redshift are both designed to take maximum advantage of their respective Google and Amazon ecosystems. They both also provide APIs to integrate with all your marketing stack CRM, payment, social media, email marketing, advertising, and third-party tools.
Because both data warehouses are part of a larger ecosystem, you won’t need an external data storage, ETL streaming, or analytic platform for most of your marketing use cases.
For example, BigQuery uses Google Cloud Platform services and may use Cloud Bigtable to capture raw data. It can also leverage other Google solutions like Cloud Datastore, Cloud Spanner, and Cloud Dataflow.
Similarly, Redshift runs on Amazon Elastic Compute Cloud (EC2) instances, uses AWS parallel processing to read and load data, and stores data on Amazon Simple Storage Service (S3). Redshift can also make use of Amazon Glue to ingest data and analyze real-time data streams using Kinesis.
In addition to this, both data warehouses provide API integration options to sync data from all your marketing tools. For example, you can send customer payment information into BigQuery using its API. Or you can send information from your social media and email campaigns to Redshift using its API.
However, building individual pipelines that integrate all your tools will eventually result in more engineering effort, bloated schemas, and maintenance problems. To avoid such a problem, both warehouses allow you to route customer data across multiple sources and destinations from a single Customer Data Platform.
Both data warehouses have competitive on-demand pricing plans and reserved long-term discounts.
BigQuery’s on-demand compute resources are priced at $5 per TB of queried data. The compute power in this plan automatically scales up to 2,000 BigQuery CPUs, according to workload requirements. Teams with more predictable workloads can also avail a flat rate discount by reserving BigQuery slots in advance.
Storage costs $0.01 or $0.02 per GB based on when it was last modified.
Redshift offers 750 free hours every month. This roughly translates into running a single DC2 large node with 160GB of compressed SSD storage. Beyond this limit, you’ll need to choose a suitable, per hourly basis, on-demand plan. The DC2 large plan costs $0.25 per hour, and the DC2 8X large costs $4.80 per hour. Redshift-managed storage RA3 plans come in three variants — costing $1.08, $3.26, and $13.04 per hour.
Redshift serverless pricing starts at $0.36 per RPU hour, and storage costs $0.024 per GB per month. Redshift also offers long-term reserved pricing discounts that range from 20% to 63%, depending on the duration and upfront payment terms you choose.
In general, BigQuery’s on-demand compute prices are more cost effective for smaller and more frequent workloads, while Redshift’s pay-per-node pricing makes processing hundreds of ad hoc queries more cost effective. So, choose the right plan depending on how frequently you need to process large workloads.
BigQuery is better suited to process simple, everyday queries like finding purchase or ad click trends. Redshift is more suited to finding user patterns and insights across several thousands of customer data points, in an ad-hoc manner.
BigQuery vs. Redshift: Load customer data without extensive development
mParticle’s ready-made integrations with BigQuery and Redshift can help you eliminate the cumbersome task of building ETL pipelines for all of your different customer data sources—web, mobile app, payment, OTT and social media touchpoints. You can use any of mParticle's 300+ integrations to stream customer data sources in real time without building custom data pipelines, and leverage native SDKs and APIs to collect data across all your customer web, mobile, payment, and social media touchpoints. You can also manage data quality by enforcing a consistent schema on incoming data from all sources, and unify cross-channel events into a single view of the customer. For an example, check out this article detailing how you can use mParticle and Redshift to improve data uploads and power custom analytics.