Snowflake Architecture

Snowflake is an analytic data warehouse provided as Software-as-a-Service (SaaS). It aims to be faster, easier to use, and more flexible than traditional data warehouse offerings. Snowflake is a true SaaS offering, available on the major cloud platforms (AWS, Azure, and GCP), and it runs entirely in the cloud.

Snowflake does not require:

  1. Hardware procurement, installation, or configuration
  2. Ongoing maintenance or tuning, which Snowflake handles in the background

In other words, you create a Snowflake account, load your data, and start using it. Snowflake supports several data formats, such as CSV, JSON, and Avro.

Shared disk architecture vs Shared nothing architecture

Distributed computing has two common kinds of architecture. In a shared-disk architecture, storage is common to all compute nodes or servers, as in Oracle RAC. In a shared-nothing architecture, each node or server has its own storage, and the data is distributed across all the nodes.

Snowflake combines both in its architecture. Storage is common to all compute nodes, which resembles a shared-disk architecture, and each virtual warehouse has multiple nodes that process a query in parallel (MPP), which resembles a shared-nothing architecture.
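The combination can be pictured with a small toy model. This is only a sketch to illustrate the idea, not how Snowflake is implemented: the partition names, the data, and the functions scan_partition and run_query are all made up for the example. One shared store holds the data, and several workers (standing in for the nodes of a warehouse) each process a different partition in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Shared storage: a single copy of the data, visible to every compute node.
# (Hypothetical partitions and values, for illustration only.)
SHARED_STORAGE = {
    "part-0": [3, 8, 1],
    "part-1": [7, 2],
    "part-2": [5, 9, 4],
}


def scan_partition(name):
    """One node's share of the work: sum the values in a single partition."""
    return sum(SHARED_STORAGE[name])


def run_query(node_count):
    """Fan the partitions out across the nodes, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=node_count) as nodes:
        partials = list(nodes.map(scan_partition, sorted(SHARED_STORAGE)))
    return sum(partials)


total = run_query(node_count=3)
```

The answer is the same whether one node or three nodes do the work; only the degree of parallelism changes, because every node reads from the same storage.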

Below are the key components of the Snowflake architecture.

Database Storage (Storage resources)

When data is loaded into Snowflake, Snowflake reorganizes it into its internal optimized, compressed, columnar format and stores it in cloud storage. Users can access this data with SQL queries, as in other RDBMS databases. Database storage includes storage for user data, Time Travel, and Fail-safe.

Snowflake saves data in the storage service of the underlying cloud provider. On AWS, for example, it uses S3, and the storage is distributed across three AZs (data centers). It charges $40/TB per month for on-demand storage and $23/TB per month for upfront storage.
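As a rough worked example of the two rates, the sketch below estimates a storage bill. The function name monthly_storage_cost is made up for illustration, and it assumes the quoted prices apply per TB per month; actual prices can differ by region and contract.

```python
# Rates quoted in the text, assumed here to be per TB per month.
ON_DEMAND_USD_PER_TB = 40.0
UPFRONT_USD_PER_TB = 23.0


def monthly_storage_cost(terabytes, upfront=False):
    """Estimate the monthly storage charge in USD for a given volume."""
    if terabytes < 0:
        raise ValueError("terabytes must be non-negative")
    rate = UPFRONT_USD_PER_TB if upfront else ON_DEMAND_USD_PER_TB
    return terabytes * rate


on_demand = monthly_storage_cost(10)             # 10 TB at $40/TB
prepaid = monthly_storage_cost(10, upfront=True)  # 10 TB at $23/TB
```

For 10 TB this gives $400 on demand versus $230 upfront.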

Query processing (compute)

Query processing is done by virtual warehouses in Snowflake. A virtual warehouse is a cluster of machines that processes queries. Warehouses are independent of each other; they don't share memory or compute.

What is a virtual warehouse?

A virtual warehouse is a collection of machines used to execute queries (DDL commands don't need a warehouse to execute).

The cost of a virtual warehouse depends on its size and how long it runs. Warehouse sizes are named like T-shirt sizes: Small, Medium, Large, and so on. Users pay for what they use: billing is per second, except that the first minute is always charged in full. Snowflake measures warehouse usage in credits per hour, and the price of a credit in dollars depends on the Snowflake edition. For the Business Critical edition, 1 credit = $4. Below are the different warehouses with credit usage information.
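The billing rule can be sketched as a short calculation. The function warehouse_cost_usd is hypothetical, and the credits-per-hour figures in the table are an assumption for this example (each size doubling the previous one); check them against Snowflake's current pricing before relying on them. The $4 default is the Business Critical figure quoted above.

```python
# Assumed credits per hour by warehouse size (illustrative; verify against
# Snowflake's published rates).
CREDITS_PER_HOUR = {
    "X-Small": 1,
    "Small": 2,
    "Medium": 4,
    "Large": 8,
    "X-Large": 16,
}

MIN_BILLED_SECONDS = 60  # the first minute is always charged in full


def warehouse_cost_usd(size, running_seconds, usd_per_credit=4.0):
    """Estimate the cost of one warehouse run, billed per second."""
    if running_seconds <= 0:
        return 0.0
    billed_seconds = max(running_seconds, MIN_BILLED_SECONDS)
    credits = CREDITS_PER_HOUR[size] * billed_seconds / 3600
    return credits * usd_per_credit


one_hour_small = warehouse_cost_usd("Small", 3600)   # 2 credits
short_query = warehouse_cost_usd("Small", 10)        # billed as 60 seconds
```

Under these assumptions, a Small warehouse running for one hour uses 2 credits, and a 10-second run costs the same as a 60-second run.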

Cloud services

Cloud services are a collection of services that coordinate access to Snowflake, spanning everything from user authentication to dispatching query results back to the user. Some of the major services are listed below.

  • Authentication – user authentication
  • Metadata management – metadata to plan and optimize the query
  • Query parsing and optimization – query parser, plan generator and optimizer
  • Result cache – cache to store the previous query result
  • Access control – user access permissions
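The result cache idea from the list above can be illustrated with a toy class. This is a simplified sketch, not Snowflake's actual mechanism: ResultCache, its run method, and the fake_execute helper are all hypothetical, and the sketch ignores details a real cache must handle, such as invalidating results when the underlying data changes.

```python
class ResultCache:
    """Toy cache that returns a stored result when the same query text repeats."""

    def __init__(self):
        self._results = {}
        self.hits = 0

    def run(self, sql, execute):
        """Return the cached result for sql, or call execute(sql) and store it."""
        key = sql.strip()
        if key in self._results:
            self.hits += 1
            return self._results[key]
        result = execute(sql)
        self._results[key] = result
        return result


executed = []  # records every query that actually reached "compute"


def fake_execute(sql):
    """Stand-in for real query execution on a warehouse."""
    executed.append(sql)
    return [("result of", sql)]


cache = ResultCache()
first = cache.run("SELECT 1", fake_execute)
second = cache.run("SELECT 1", fake_execute)  # served from the cache
```

The second call returns the stored result without invoking fake_execute again, which is the saving a result cache is meant to provide.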

For cloud services, Snowflake uses its own internal virtual warehouses and charges the customer based on usage.
