Built the single source of truth for customer data across multiple source systems. Every customer record is deduplicated, enriched, and available for analytics, reporting, and downstream applications within a 2-hour SLA.
Customer data was siloed across three source systems with no unified view. The same customer appeared with different IDs in each system, making cross-system analytics impossible. Compliance reporting required manual data reconciliation that took days.
There was no canonical customer entity model. Source systems had different schemas, different key structures, and different update frequencies. ETL jobs had no observability — failures were discovered by downstream users, not by the data team.
Data from multiple source systems (SQL Server, PostgreSQL, MySQL) is ingested via Fivetran into the LANDING schema. Raw tables are preserved as-is.
The ETL_CONTROL table defines which tables are managed by the framework. Each stored procedure writes a start and end record to ETL_LOG with row counts and status. ETL_REFRESH_HISTORY tracks the last successful load time per table. This enables SLA monitoring via simple SQL queries.
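The three control tables might take roughly the following shape. This is a sketch only; column names and types here are illustrative, and the actual DDL lives in the versioned migration scripts.

```sql
-- Illustrative shapes only; the real DDL is in the versioned migrations.
CREATE TABLE ETL_CONTROL (
    TABLE_NAME      VARCHAR NOT NULL,   -- target table managed by the framework
    SOURCE_SCHEMA   VARCHAR NOT NULL,
    LOAD_ORDER      NUMBER,             -- dependency order used by SP_RUN_ALL
    IS_ENABLED      BOOLEAN DEFAULT TRUE
);

CREATE TABLE ETL_LOG (
    LOG_ID          NUMBER IDENTITY,
    TABLE_NAME      VARCHAR NOT NULL,
    STATUS          VARCHAR,            -- 'STARTED' | 'SUCCEEDED' | 'FAILED'
    ROWS_INSERTED   NUMBER,
    ROWS_UPDATED    NUMBER,
    ERROR_MESSAGE   VARCHAR,
    LOGGED_AT       TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

CREATE TABLE ETL_REFRESH_HISTORY (
    TABLE_NAME      VARCHAR NOT NULL,
    LAST_SUCCESS_AT TIMESTAMP_NTZ
);
```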
MERGE statements with hash keys ensure safe re-runs. Repeatable scripts (R__ prefix) are re-applied on every deployment.
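The idempotent load pattern can be sketched as follows. This assumes a HASH_KEY helper over the source system and business key; the table and column names are illustrative, not the actual model.

```sql
-- Illustrative MERGE; HASH_KEY() stands in for the project's hash key
-- utility, and all table/column names are assumptions.
MERGE INTO CLIENT_REPOSITORY.PERSON AS tgt
USING (
    SELECT
        HASH_KEY(SOURCE_SYSTEM, SOURCE_PERSON_ID) AS PERSON_HK,
        FIRST_NAME,
        LAST_NAME,
        UPDATED_AT
    FROM LANDING.PERSON_RAW
) AS src
ON tgt.PERSON_HK = src.PERSON_HK
WHEN MATCHED AND src.UPDATED_AT > tgt.UPDATED_AT THEN UPDATE SET
    FIRST_NAME = src.FIRST_NAME,
    LAST_NAME  = src.LAST_NAME,
    UPDATED_AT = src.UPDATED_AT
WHEN NOT MATCHED THEN INSERT (PERSON_HK, FIRST_NAME, LAST_NAME, UPDATED_AT)
    VALUES (src.PERSON_HK, src.FIRST_NAME, src.LAST_NAME, src.UPDATED_AT);
```

Because the join key is a deterministic hash of the business key, re-running the same load matches the same rows and produces no duplicates.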
SP_RUN_ALL catches exceptions and logs errors without crashing the pipeline. Snowflake Tasks retry on transient errors.
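The catch-and-log pattern can be sketched in Snowflake Scripting. The entity procedure names and ETL_LOG columns below are illustrative, not the actual implementation.

```sql
-- Sketch of the orchestration pattern; procedure and column names are assumptions.
CREATE OR REPLACE PROCEDURE SP_RUN_ALL()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
    -- Entity SPs run in dependency order.
    CALL SP_LOAD_PERSON();
    CALL SP_LOAD_CLIENT();
    CALL SP_LOAD_POLICY();
    RETURN 'SUCCEEDED';
EXCEPTION
    WHEN OTHER THEN
        -- Log the failure instead of letting it propagate as an unhandled error.
        LET err VARCHAR := SQLERRM;
        INSERT INTO ETL_LOG (TABLE_NAME, STATUS, ERROR_MESSAGE)
        VALUES ('SP_RUN_ALL', 'FAILED', :err);
        RETURN 'FAILED: ' || err;
END;
$$;
```

Stopping at the first failed entity prevents dependent loads (e.g. Policy after Client) from running against incomplete data.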
ETL_LOG table captures every load event. ETL_REFRESH_HISTORY tracks last successful load per table for SLA monitoring.
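With ETL_REFRESH_HISTORY in place, the 2-hour SLA check reduces to a plain query along these lines (column names illustrative):

```sql
-- Tables whose last successful load is older than the 2-hour SLA.
SELECT TABLE_NAME,
       LAST_SUCCESS_AT,
       DATEDIFF('minute', LAST_SUCCESS_AT, CURRENT_TIMESTAMP()) AS MINUTES_STALE
FROM ETL_REFRESH_HISTORY
WHERE LAST_SUCCESS_AT < DATEADD('hour', -2, CURRENT_TIMESTAMP())
ORDER BY LAST_SUCCESS_AT;
```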
Separate LOADER, TRANSFORMER, and REPORTER roles with object-level grants. No shared credentials.
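The grant pattern might look like the sketch below. The role names follow the ones mentioned here; the schema names and exact privilege lists are assumptions.

```sql
-- Illustrative role separation; schema names and privilege lists are assumptions.
-- LOADER writes raw data only.
GRANT USAGE ON SCHEMA CLIENT_REPOSITORY_RAW TO ROLE LOADER;
GRANT INSERT, SELECT ON ALL TABLES IN SCHEMA CLIENT_REPOSITORY_RAW TO ROLE LOADER;

-- TRANSFORMER reads raw, writes curated.
GRANT USAGE ON SCHEMA CLIENT_REPOSITORY_RAW TO ROLE TRANSFORMER;
GRANT SELECT ON ALL TABLES IN SCHEMA CLIENT_REPOSITORY_RAW TO ROLE TRANSFORMER;
GRANT USAGE ON SCHEMA CLIENT_REPOSITORY TO ROLE TRANSFORMER;
GRANT INSERT, UPDATE, DELETE, SELECT ON ALL TABLES IN SCHEMA CLIENT_REPOSITORY TO ROLE TRANSFORMER;

-- REPORTER reads curated data only.
GRANT USAGE ON SCHEMA CLIENT_REPOSITORY TO ROLE REPORTER;
GRANT SELECT ON ALL TABLES IN SCHEMA CLIENT_REPOSITORY TO ROLE REPORTER;
```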
Azure DevOps runs schemachange migrations on merge. Separate pipelines for dev, staging, and production environments.
schemachange migration scripts are environment-agnostic. Environment-specific variables injected by Azure DevOps pipeline.
migrations/client_repository_raw/tables/V01.1.3__etl_log.sql: DDL for the ETL observability logging table that captures every load event.
migrations/client_repository_raw/functions/V00.1.2__hash_key.sql: Utility function for computing surrogate hash keys from business key fields.
migrations/client_repository/procedures/R__sp_run_all.sql: Master orchestration stored procedure that executes all entity SPs in dependency order.
pipelines/azure-pipelines-dev.yml: CI/CD pipeline for running schemachange migrations in the development environment.
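As a sketch, the hash key utility could be a SQL UDF over concatenated business key fields. The real implementation lives in the migration listed above; this version, including its signature, is illustrative.

```sql
-- Illustrative only; the actual function is in V00.1.2__hash_key.sql.
-- A delimiter plus COALESCE keeps ('A', NULL) distinct from ('A', '').
CREATE OR REPLACE FUNCTION HASH_KEY(source_system VARCHAR, business_key VARCHAR)
RETURNS VARCHAR
AS
$$
    MD5(CONCAT_WS('||', COALESCE(source_system, ''), COALESCE(business_key, '')))
$$;
```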
"This is the Client Repository — a Snowflake-native enterprise data warehouse I built to create a single source of truth for customer data. The business problem: customer data was siloed across three source systems. The same customer had different IDs in each system, making cross-system analytics impossible. Compliance reporting required days of manual reconciliation.

The technical architecture starts with Fivetran ingesting raw data from SQL Server, PostgreSQL, and MySQL into a LANDING schema. I preserve the raw data as-is — no transformations at ingestion.

The transformation layer is built on Snowflake stored procedures. I built an ETL control framework with three key tables: ETL_CONTROL defines which tables to process, ETL_LOG captures every load event with row counts and durations, and ETL_REFRESH_HISTORY tracks the last successful load per table. This gives the team full observability via simple SQL queries.

For deduplication, I built hash key utility functions that compute a surrogate key from business key fields. Every MERGE statement uses this hash key, which means the pipeline is fully idempotent and handles source system key changes gracefully.

The orchestration is a master stored procedure — SP_RUN_ALL — that executes each entity SP in dependency order: Person first, then Client, then Policy, and so on. If any SP fails, the error is logged and execution stops to prevent cascading issues.

For schema management, I used schemachange with a strict versioning convention. Every DDL change has a version number and is tracked in a CHANGE_HISTORY table. Azure DevOps runs schemachange on every deployment.

The result: three source systems unified, data freshness under 2 hours, and compliance reporting reduced from days to under an hour."
"I built a Snowflake-native client repository that unified customer data from three source systems. Key design decisions: an ETL observability framework with structured logging tables, hash-based deduplication for idempotent loads, and a master stored procedure for dependency-ordered orchestration. Compliance reporting went from days to under an hour."
How this solution demonstrates enterprise data platform delivery values:
Designed the full data model, ETL framework, observability layer, and CI/CD pipeline end-to-end.
Researched schemachange as an alternative to Flyway for Snowflake-native schema management — a better fit for the team's existing tooling.
Unifying three source systems with different schemas and key structures required weeks of data profiling and iterative model refinement.
The ETL observability framework was designed so any team member could diagnose a pipeline failure without needing to ask me.