From Hours to Minutes: The Databricks Orchestration Overhaul Nobody Talks About

A real-world story of parallelism, metadata-driven orchestration, and not crying at 4 AM waiting for pipelines to finish.

The Setup (a.k.a. “everything was fine until it wasn’t”)

Picture this: a classic Databricks Medallion Architecture (Bronze, Silver, Gold). Data lands in Bronze via Fivetran (client-managed, not my problem 🙏). My job starts at Bronze and ends at a clean, analytics-ready Gold layer. Simple enough, right?

Here’s what the stack looked like:

For the Bronze → Silver leg, I had three specialized notebooks handling different load patterns:

  • nb_silver_dim_loader: handles SCD1 & SCD2 dimension loads
  • nb_silver_fact_loader: full overwrite for fact tables
  • nb_silver_fact_incremental_loader: incremental merge for high-volume facts

Sitting above all three? An orchestrator notebook, the conductor. It reads a metadata table, figures out what needs to run, and calls the right notebook for each table.

Sounds clean. And for a while, it was.

Then the table count grew and the pipeline that used to wrap up in 45 minutes started clocking in at 2.5 hours. Every Single Day 🙁

That’s when I knew we had a problem.


The Problem: Sequential Execution is a Silent Killer

Here’s what was actually happening under the hood.

The original orchestrator was doing this (conceptually):

Run Table A → wait → Run Table B → wait → Run Table C → wait → ...

One table at a time. Fully sequential. Every notebook call was blocking the next one. If Table A took 8 minutes, nothing else could start until it finished. This is the equivalent of cooking a full meal by finishing the pasta, then starting the sauce, then boiling the water. Nobody does that. But that’s exactly what our pipeline was doing. So, I dug in. And what I found was embarrassingly simple.

The original logic looked roughly like this (simplified version of what we had):

# Old approach - clean but painfully sequential
results = [run_silver_task(task) for task in tasks_to_run]

One list comprehension. One task at a time. For 140+ tables across Bronze → Silver → Gold, you can do the math on why we were bleeding time.

Now, sequential made sense initially, and the reason was legit: dimension tables need to load before fact tables. You can’t have a fact row referencing a dim key that doesn’t exist yet. Real dependency, real constraint. But that constraint doesn’t mean everything has to wait for everything else. That’s where we were going wrong.
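To see why, here’s a back-of-the-envelope sketch. The per-table runtimes are made up for illustration (they’re not our pipeline’s actual numbers), but the arithmetic is the whole argument: a fully sequential run pays for every table, while a batch-parallel run only pays for the slowest table in each dependency group.

```python
# Hypothetical per-table runtimes in minutes, keyed by dependency batch:
# batch 1 = dims (independent of each other), batch 2 = facts that need those dims.
batches = {
    1: [8, 6, 7],
    2: [12, 9],
}

# Fully sequential: every table waits for the previous one.
sequential_minutes = sum(t for group in batches.values() for t in group)

# Parallel within a batch, sequential across batches:
# each batch costs only its slowest table.
batched_minutes = sum(max(group) for group in batches.values())

print(sequential_minutes, batched_minutes)  # 42 20
```

Five tables, and the wall-clock time already drops by more than half. Scale that to 140+ tables and the gap becomes hours.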


The Fix, Part 1: Metadata-Driven Execution Orders

Before I could introduce parallelism, I needed a way to encode dependencies safely. The answer was already sitting in my metadata table, just unused.

Here’s the DDL for the execution table:

CREATE TABLE dev_silver.metadata.execution_table (
  src_system        	  STRING,
  src_catalog       	  STRING,
  src_table_name    	  STRING,
  src_schema        	  STRING,
  load_type         	  STRING,
  dst_catalog       	  STRING,
  dst_table_name    	  STRING,
  dst_schema        	  STRING,
  layer             	  STRING,
  is_active_flag    	  BOOLEAN,
  execution_order   	  INT,           -- 👈 the whole game lives here
  execution_pipeline 	  STRING,
  business_key_columns  ARRAY<STRING>,
  primary_key_columns   ARRAY<STRING>,
  sql_file_path     	  STRING,
  cdc_source_config 	  STRING,
  comment           	  STRING
)

The magic column is execution_order. Here’s how I used it:

  • Order 1 → All dimension tables (SCD1/SCD2). No dependencies, run them first.
  • Order 2 → Fact tables that depend only on dims from Order 1.
  • Order 3 → Fact tables that depend on other facts from Order 2.

The rule: within the same execution order, tables are independent and can run in parallel. Across orders, we wait for the previous batch to complete before starting the next.

This is basically topological sorting, but implemented in a metadata table that any data engineer on the team can update; no code changes needed. Add a new table? Just assign it an execution_order and flip is_active_flag to True. That’s it.
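In plain Python terms, the grouping the metadata encodes looks like this. It’s a minimal stand-in with hypothetical table names; the field names mirror the execution table.

```python
# Minimal stand-in for rows from the execution table.
rows = [
    {"dst_table_name": "dim_customer", "execution_order": 1},
    {"dst_table_name": "dim_product",  "execution_order": 1},
    {"dst_table_name": "fact_orders",  "execution_order": 2},
    {"dst_table_name": "fact_margin",  "execution_order": 3},
]

# Group into batches: same order = independent, can run together;
# the batches themselves run in ascending order.
batches = {}
for row in rows:
    batches.setdefault(row["execution_order"], []).append(row["dst_table_name"])

for order in sorted(batches):
    print(f"Batch {order}: {batches[order]}")
# Batch 1: ['dim_customer', 'dim_product']
# Batch 2: ['fact_orders']
# Batch 3: ['fact_margin']
```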


The Fix, Part 2: Parallelism with ThreadPoolExecutor

Now that I had a safe execution order model, I could rewrite the orchestrator to run tasks in parallel within each batch.

Here’s the full updated Bronze → Silver orchestrator:

Step 1: Identify the Tasks

layer_name = 'bronze_to_silver'

tasks_df = METADATA_DF.filter(
    (col('layer') == layer_name) & 
    (col('is_active_flag') == True)
).select('dst_table_name', 'execution_pipeline', 'execution_order').distinct()

task_count = tasks_df.count()
if task_count == 0:
    dbutils.notebook.exit('No active Silver Layer tasks found.')

unique_orders = [
    row.execution_order
    for row in tasks_df.select('execution_order').distinct().orderBy('execution_order').collect()
]

print(f"Found {task_count} tasks across {len(unique_orders)} execution batches.")
display(tasks_df.orderBy('execution_order'))

This reads from the metadata table, pulls only active tasks for the bronze_to_silver layer, and extracts the distinct execution orders. These orders become our batch sequence.

Step 2: Define the Task Runner

def run_silver_task(task):
    dst_table = task['dst_table_name']
    pipeline_type = task['execution_pipeline']

    runner_notebook_map = {
        'scd1': '../silver/nb_silver_dim_loader',
        'scd2': '../silver/nb_silver_dim_loader',
        'fact_overwrite': '../silver/nb_silver_fact_loader',
        'fact_incremental_merge': '../silver/nb_silver_fact_incremental_loader'
    }

    notebook_to_run = runner_notebook_map.get(pipeline_type)
    if not notebook_to_run:
        return (dst_table, 'SKIPPED', f"No runner notebook found for pipeline type: {pipeline_type}")

    print(f"[STARTING] Task: {dst_table} using pipeline: {pipeline_type}")
    try:
        result = dbutils.notebook.run(
            path=notebook_to_run,
            timeout_seconds=7200,  # 2 hour timeout per task
            arguments={'dst_table_name': dst_table}
        )
        return (dst_table, 'SUCCESS', result)
    except Exception as e:
        error_message = str(e)
        return (dst_table, 'FAILED', error_message)

The runner_notebook_map is a clean dispatch pattern: the orchestrator doesn’t need to know how SCD2 loading works. It just knows which notebook handles it and passes the table name as an argument. Each notebook is self-contained and idempotent.

The 2-hour timeout per task is non-negotiable. Without it, a hung notebook can block your entire thread pool indefinitely. Don’t skip this.
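Here’s the same idea in miniature, using a plain future with result(timeout=...) instead of dbutils.notebook.run (the sleep is a stand-in for a hung notebook, and this is a toy sketch, not the Databricks API; note that result(timeout=...) stops waiting but doesn’t kill the worker, whereas timeout_seconds bounds the notebook run itself):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as TaskTimeout
import time

def hung_task():
    time.sleep(0.8)  # stands in for a notebook that never returns
    return "done"

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(hung_task)
    try:
        # Bound how long we're willing to wait, like timeout_seconds above.
        outcome = ("SUCCESS", future.result(timeout=0.1))
    except TaskTimeout:
        outcome = ("FAILED", "task exceeded timeout")

print(outcome)  # ('FAILED', 'task exceeded timeout')
```

Without that bound, the orchestrator would sit on the hung task forever, and every table queued behind it would never start.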

Step 3: The Actual Parallelism

from concurrent.futures import ThreadPoolExecutor

results = []
all_tasks = [row.asDict() for row in tasks_df.collect()]

for order in unique_orders:
    tasks_in_group = [t for t in all_tasks if t['execution_order'] == order]
    print(f"\n>>> Processing Execution Order: {order} ({len(tasks_in_group)} tasks in parallel)")

    with ThreadPoolExecutor(max_workers=5) as executor:
        group_results = list(executor.map(run_silver_task, tasks_in_group))
        results.extend(group_results)

This is the heart of the change. For each execution order:

  1. We collect all tables assigned to that order.
  2. We fire them all simultaneously using ThreadPoolExecutor with up to 5 parallel workers.
  3. We wait for all tasks in the batch to complete before moving to the next order.

The max_workers=5 limit is intentional. It prevents overwhelming the cluster with too many concurrent notebook calls. Tune this based on your cluster size and the nature of your workloads.
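You can see the worker cap in action with a toy workload: sleeps standing in for notebook runs, and a hypothetical fake_task in place of run_silver_task.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_task(name):
    time.sleep(0.2)  # stands in for a notebook run
    return name

start = time.monotonic()
with ThreadPoolExecutor(max_workers=5) as executor:
    done = list(executor.map(fake_task, [f"table_{i}" for i in range(10)]))
elapsed = time.monotonic() - start

# 10 tasks / 5 workers -> two "waves" of ~0.2s each, not 10 x 0.2s.
print(f"{len(done)} tasks in ~{elapsed:.1f}s")
```

With max_workers=5, ten 0.2-second tasks finish in roughly two waves instead of ten; the pool never runs more than five notebook calls at once, which is exactly the throttle you want against a shared cluster.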

Step 4: Summarize Results and Fail Loudly

successful_tasks = [res for res in results if res[1] == 'SUCCESS']
failed_tasks     = [res for res in results if res[1] == 'FAILED']
skipped_tasks    = [res for res in results if res[1] == 'SKIPPED']

print("\n--- SILVER LAYER SUMMARY ---")
print(f"Total tasks processed: {len(results)}")
print(f"Successful: {len(successful_tasks)}")
print(f"Failed:     {len(failed_tasks)}")
print(f"Skipped:    {len(skipped_tasks)}")

if failed_tasks:
    print("\n--- FAILED TASK DETAILS ---")
    for task, status, message in failed_tasks:
        print(f'Task: {task}\n  Error: {message}\n')
    raise Exception("One or more Silver Layer tasks failed. Please check the logs.")

print("\nSilver Layer run completed.")

This part matters more than people think. If any task fails, we:

  • Print every failed task with its full error message.
  • Raise an exception to fail the orchestrator job visibly.

Silent failures in data pipelines are genuinely dangerous. Your downstream Gold layer will just quietly produce wrong numbers. Fail loudly, fix fast.


The Results

The pipeline that used to take 2.5 hours now wraps up in under 30 minutes. That’s a 60%+ reduction in runtime without touching the underlying notebooks, without changing the cluster size, and without spending a single extra dollar on compute.

The first time the pipeline finished in under 30 minutes I just sat there staring at the clock in disbelief. No more doom-scrolling through job logs at 4 AM. No more that sinking feeling when the run time kept climbing. It was just… done. Clean, fast, and on time. The kind of quiet win that makes you exhale for the first time in months.

The metadata-driven approach also means the team can onboard new tables by just inserting a row into the execution table. No PR needed for the orchestrator. No code review for adding a new dim or fact. Just update the metadata and it picks up on the next run.
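As a plain-Python stand-in (field names mirror the execution table; the table names and values are made up for illustration), onboarding really is just one new row:

```python
# Stand-in for the metadata table.
metadata = [
    {"dst_table_name": "dim_customer", "layer": "bronze_to_silver",
     "execution_order": 1, "is_active_flag": True},
    {"dst_table_name": "fact_orders", "layer": "bronze_to_silver",
     "execution_order": 2, "is_active_flag": True},
]

# Onboarding a new fact table: append a row, touch no orchestrator code.
metadata.append({"dst_table_name": "fact_refunds", "layer": "bronze_to_silver",
                 "execution_order": 2, "is_active_flag": True})

active = [r["dst_table_name"] for r in metadata
          if r["is_active_flag"] and r["layer"] == "bronze_to_silver"]
print(active)  # fact_refunds is picked up on the next run automatically
```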

But the real win wasn’t just the speed. It was how the whole system finally felt.


The Lowkey Most Important Part (Don’t Skip This)

If you skimmed the whole post (no judgment), at least lock these in:

Sequential pipelines don’t hit different at scale

What works for 10 tables will genuinely humble you at 140. Design for parallelism from the start; future you will send a thank-you note.

Execution order is your safety net

You don’t need to parallelize everything; that’s the trap. You parallelize within dependency boundaries. Execution orders are what keep your dims loading before your facts without you babysitting the pipeline.

Metadata-driven orchestration is main character energy

When adding a new table requires zero code changes, your pipeline becomes self-service. That’s real leverage.

Fail loudly. Always. Non-negotiable

Silent failures are your 6am on-call alert waiting to happen. Build exception handling that surfaces errors immediately, clearly, and without mercy.


What’s Next

The Bronze → Silver leg was a parallelism problem. Clean diagnosis, clean fix.

The Silver → Gold leg? That one had a different kind of villain: stale views, undetected SQL changes, and materialized views quietly serving wrong data to dashboards. Plus its own parallelism story.

That’s Part 2. Coming soon.


If this was helpful, follow me for more real-world data engineering content. I write about the stuff that actually happens at work, not just the happy path from the docs.

© 2025 — Byte Donor. All Rights Reserved.
