Built the single source of truth for customer data across multiple source systems. Every customer record is deduplicated, enriched, and available for analytics, reporting, and downstream applications within a 2-hour SLA.
Customer data was siloed across three source systems with no unified view. The same customer appeared with different IDs in each system, making cross-system analytics impossible. Compliance reporting required manual data reconciliation that took days.
There was no canonical customer entity model. Source systems had different schemas, different key structures, and different update frequencies. ETL jobs had no observability — failures were discovered by downstream users, not by the data team.
Data from multiple source systems (SQL Server, PostgreSQL, MySQL) is ingested via Fivetran into the LANDING schema. Raw tables are preserved as-is.
The ETL_CONTROL table defines which tables are managed by the framework. Each stored procedure writes a start and end record to ETL_LOG with row counts and status. ETL_REFRESH_HISTORY tracks the last successful load time per table. This enables SLA monitoring via simple SQL queries.
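The three control tables might take roughly the following shape. This is a sketch only; column names and types here are illustrative, and the actual DDL lives in the versioned migration scripts.

```sql
-- Illustrative shapes only; the real DDL is in the versioned migrations.
CREATE TABLE ETL_CONTROL (
    TABLE_NAME      VARCHAR NOT NULL,   -- target table managed by the framework
    SOURCE_SCHEMA   VARCHAR NOT NULL,
    LOAD_ORDER      NUMBER,             -- dependency order used by SP_RUN_ALL
    IS_ENABLED      BOOLEAN DEFAULT TRUE
);

CREATE TABLE ETL_LOG (
    LOG_ID          NUMBER IDENTITY,
    TABLE_NAME      VARCHAR NOT NULL,
    STATUS          VARCHAR,            -- 'STARTED' | 'SUCCEEDED' | 'FAILED'
    ROWS_INSERTED   NUMBER,
    ROWS_UPDATED    NUMBER,
    ERROR_MESSAGE   VARCHAR,
    LOGGED_AT       TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

CREATE TABLE ETL_REFRESH_HISTORY (
    TABLE_NAME      VARCHAR NOT NULL,
    LAST_SUCCESS_AT TIMESTAMP_NTZ
);
```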
MERGE statements with hash keys ensure safe re-runs. Repeatable scripts (R__ prefix) are re-applied on every deployment.
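The idempotent load pattern can be sketched as follows. This assumes a HASH_KEY helper over the source system and business key; the table and column names are illustrative, not the actual model.

```sql
-- Illustrative MERGE; HASH_KEY() stands in for the project's hash key
-- utility, and all table/column names are assumptions.
MERGE INTO CLIENT_REPOSITORY.PERSON AS tgt
USING (
    SELECT
        HASH_KEY(SOURCE_SYSTEM, SOURCE_PERSON_ID) AS PERSON_HK,
        FIRST_NAME,
        LAST_NAME,
        UPDATED_AT
    FROM LANDING.PERSON_RAW
) AS src
ON tgt.PERSON_HK = src.PERSON_HK
WHEN MATCHED AND src.UPDATED_AT > tgt.UPDATED_AT THEN UPDATE SET
    FIRST_NAME = src.FIRST_NAME,
    LAST_NAME  = src.LAST_NAME,
    UPDATED_AT = src.UPDATED_AT
WHEN NOT MATCHED THEN INSERT (PERSON_HK, FIRST_NAME, LAST_NAME, UPDATED_AT)
    VALUES (src.PERSON_HK, src.FIRST_NAME, src.LAST_NAME, src.UPDATED_AT);
```

Because the join key is a deterministic hash of the business key, re-running the same load matches the same rows and produces no duplicates.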
SP_RUN_ALL catches exceptions and logs errors without crashing the pipeline. Snowflake Tasks retry on transient errors.
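The catch-and-log pattern can be sketched in Snowflake Scripting. The entity procedure names and ETL_LOG columns below are illustrative, not the actual implementation.

```sql
-- Sketch of the orchestration pattern; procedure and column names are assumptions.
CREATE OR REPLACE PROCEDURE SP_RUN_ALL()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
    -- Entity SPs run in dependency order.
    CALL SP_LOAD_PERSON();
    CALL SP_LOAD_CLIENT();
    CALL SP_LOAD_POLICY();
    RETURN 'SUCCEEDED';
EXCEPTION
    WHEN OTHER THEN
        -- Log the failure instead of letting it propagate as an unhandled error.
        LET err VARCHAR := SQLERRM;
        INSERT INTO ETL_LOG (TABLE_NAME, STATUS, ERROR_MESSAGE)
        VALUES ('SP_RUN_ALL', 'FAILED', :err);
        RETURN 'FAILED: ' || err;
END;
$$;
```

Stopping at the first failed entity prevents dependent loads (e.g. Policy after Client) from running against incomplete data.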
ETL_LOG table captures every load event. ETL_REFRESH_HISTORY tracks last successful load per table for SLA monitoring.
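With ETL_REFRESH_HISTORY in place, the 2-hour SLA check reduces to a plain query along these lines (column names illustrative):

```sql
-- Tables whose last successful load is older than the 2-hour SLA.
SELECT TABLE_NAME,
       LAST_SUCCESS_AT,
       DATEDIFF('minute', LAST_SUCCESS_AT, CURRENT_TIMESTAMP()) AS MINUTES_STALE
FROM ETL_REFRESH_HISTORY
WHERE LAST_SUCCESS_AT < DATEADD('hour', -2, CURRENT_TIMESTAMP())
ORDER BY LAST_SUCCESS_AT;
```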
Separate LOADER, TRANSFORMER, and REPORTER roles with object-level grants. No shared credentials.
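The grant pattern might look like the sketch below. The role names follow the ones mentioned here; the schema names and exact privilege lists are assumptions.

```sql
-- Illustrative role separation; schema names and privilege lists are assumptions.
-- LOADER writes raw data only.
GRANT USAGE ON SCHEMA CLIENT_REPOSITORY_RAW TO ROLE LOADER;
GRANT INSERT, SELECT ON ALL TABLES IN SCHEMA CLIENT_REPOSITORY_RAW TO ROLE LOADER;

-- TRANSFORMER reads raw, writes curated.
GRANT USAGE ON SCHEMA CLIENT_REPOSITORY_RAW TO ROLE TRANSFORMER;
GRANT SELECT ON ALL TABLES IN SCHEMA CLIENT_REPOSITORY_RAW TO ROLE TRANSFORMER;
GRANT USAGE ON SCHEMA CLIENT_REPOSITORY TO ROLE TRANSFORMER;
GRANT INSERT, UPDATE, DELETE, SELECT ON ALL TABLES IN SCHEMA CLIENT_REPOSITORY TO ROLE TRANSFORMER;

-- REPORTER reads curated data only.
GRANT USAGE ON SCHEMA CLIENT_REPOSITORY TO ROLE REPORTER;
GRANT SELECT ON ALL TABLES IN SCHEMA CLIENT_REPOSITORY TO ROLE REPORTER;
```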
Azure DevOps runs schemachange migrations on merge. Separate pipelines for dev, staging, and production environments.
schemachange migration scripts are environment-agnostic. Environment-specific variables injected by Azure DevOps pipeline.
migrations/client_repository_raw/tables/V01.1.3__etl_log.sql: DDL for the ETL observability logging table that captures every load event.
migrations/client_repository_raw/functions/V00.1.2__hash_key.sql: Utility function for computing surrogate hash keys from business key fields.
migrations/client_repository/procedures/R__sp_run_all.sql: Master orchestration stored procedure that executes all entity SPs in dependency order.
pipelines/azure-pipelines-dev.yml: CI/CD pipeline for running schemachange migrations in the development environment.
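As a sketch, the hash key utility could be a SQL UDF over concatenated business key fields. The real implementation lives in the migration listed above; this version, including its signature, is illustrative.

```sql
-- Illustrative only; the actual function is in V00.1.2__hash_key.sql.
-- A delimiter plus COALESCE keeps ('A', NULL) distinct from ('A', '').
CREATE OR REPLACE FUNCTION HASH_KEY(source_system VARCHAR, business_key VARCHAR)
RETURNS VARCHAR
AS
$$
    MD5(CONCAT_WS('||', COALESCE(source_system, ''), COALESCE(business_key, '')))
$$;
```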
"This is the Client Repository — a Snowflake-native enterprise data warehouse I built to create a single source of truth for customer data. The business problem: customer data was siloed across three source systems. The same customer had different IDs in each system, making cross-system analytics impossible. Compliance reporting required days of manual reconciliation.

The technical architecture starts with Fivetran ingesting raw data from SQL Server, PostgreSQL, and MySQL into a LANDING schema. I preserve the raw data as-is — no transformations at ingestion.

The transformation layer is built on Snowflake stored procedures. I built an ETL control framework with three key tables: ETL_CONTROL defines which tables to process, ETL_LOG captures every load event with row counts and durations, and ETL_REFRESH_HISTORY tracks the last successful load per table. This gives the team full observability via simple SQL queries.

For deduplication, I built hash key utility functions that compute a surrogate key from business key fields. Every MERGE statement uses this hash key, which means the pipeline is fully idempotent and handles source system key changes gracefully.

The orchestration is a master stored procedure — SP_RUN_ALL — that executes each entity SP in dependency order: Person first, then Client, then Policy, and so on. If any SP fails, the error is logged and execution stops to prevent cascading issues.

For schema management, I used schemachange with a strict versioning convention. Every DDL change has a version number and is tracked in a CHANGE_HISTORY table. Azure DevOps runs schemachange on every deployment.

The result: three source systems unified, data freshness under 2 hours, and compliance reporting reduced from days to under an hour."
"I built a Snowflake-native client repository that unified customer data from three source systems. Key design decisions: an ETL observability framework with structured logging tables, hash-based deduplication for idempotent loads, and a master stored procedure for dependency-ordered orchestration. Compliance reporting went from days to under an hour."
How this solution demonstrates enterprise data platform delivery values:
Designed the full data model, ETL framework, observability layer, and CI/CD pipeline end-to-end.
Researched schemachange as an alternative to Flyway for Snowflake-native schema management — a better fit for the team's existing tooling.
Unifying three source systems with different schemas and key structures required weeks of data profiling and iterative model refinement.
The ETL observability framework was designed so any team member could diagnose a pipeline failure without needing to ask me.