AzQuack

You can have a full DuckLake implementation on Azure without running a PostgreSQL server for metadata.

It is not serverless in this sample: the public query app and the internal catalog app both stay warm with one replica each, because Quack serves a live DuckDB process and the catalog app owns a single DuckDB catalog file.

The capacity is intentionally small, and Azure Container Apps Consumption billing uses vCPU/GiB seconds and requests, with active/idle behavior for warm replicas. See the official Azure Container Apps pricing page and billing guide to do the exact math for your region and subscription.

This repo deploys:

App	Replicas	CPU	Memory	Role
`query`	1	`0.5` vCPU	`1Gi`	public Quack endpoint for local DuckDB clients
`catalog`	1	`0.5` vCPU	`1Gi`	internal Quack endpoint that owns DuckLake metadata
Total warm ACA capacity	2	1.0 vCPU	2Gi	always running while the experiment is up

That gives you the raw allocation inputs for estimating the Container Apps part of the bill:

monthly vCPU-seconds = 1.0 * seconds_in_month
monthly GiB-seconds  = 2.0 * seconds_in_month

Add Storage, Azure Files, Key Vault, ACR, and Log Analytics for the full estimate.

Warning

This is a beta experiment. Quack is under active development, and DuckLake-with-Quack catalog support is new in DuckDB v1.5.3. Use this repo to experiment, not as a production lakehouse architecture.

The Idea

DuckLake needs:

a metadata catalog
a place for data files
a DuckDB process that can read and write both

This repo makes the metadata catalog another DuckDB process, reached through Quack. That removes PostgreSQL from the experiment while keeping the local client experience simple:

local DuckDB -> public Quack endpoint -> DuckLake -> internal Quack catalog

One Deployment

After you create an AZD environment, azd up deploys the PostgreSQL-free stack:

DuckLake data files in Azure Blob / ADLS-style storage
DuckLake metadata in a DuckDB file on Azure Files
an internal Container App that exposes the metadata DuckDB file over Quack
a public Container App that local DuckDB clients attach to over Quack
separate public and internal Quack tokens in Key Vault
managed identities for image pull and Blob data access

No Azure PostgreSQL Flexible Server is created.

Azure Deployment

flowchart LR
    subgraph local["Local machine"]
        duckdb["DuckDB CLI v1.5.3"]
    end

    subgraph azure["Azure resource group"]
        kv["Key Vault\nquack-token\ncatalog-quack-token"]
        acr["Azure Container Registry\nAzQuack image"]
        law["Log Analytics"]

        subgraph aca["Container Apps Environment"]
            query["Public query Container App\n0.5 vCPU / 1Gi\nmin=1 max=1"]
            catalog["Internal catalog Container App\n0.5 vCPU / 1Gi\nmin=1 max=1"]
        end

        blob["Storage Account\nlakehouse container\naz://lakehouse/data/"]
        files["Catalog Storage Account\nAzure Files share\n/catalog/catalog.duckdb"]
    end

    duckdb -- "Quack over HTTPS\npublic token" --> query
    query -- "internal Quack\ncatalog token" --> catalog
    query -- "DuckLake data files" --> blob
    catalog -- "DuckDB catalog file" --> files
    query -.->|"ACA secret refs"| kv
    catalog -.->|"ACA secret refs"| kv
    query -.->|"pull image"| acr
    catalog -.->|"pull image"| acr
    query -.->|"logs"| law
    catalog -.->|"logs"| law

Before You Start

Install:

Azure Developer CLI: azd
Azure CLI: az
DuckDB CLI v1.5.3
Docker
curl
Python 3

Your Azure principal must be able to create resources and role assignments. In practice, use Owner, or Contributor plus User Access Administrator, at the target subscription or resource-group scope.

Log in:

az login
az account set --subscription <subscription-id>
azd auth login

Check DuckDB:

duckdb -csv -c 'SELECT version();'

It must print:

v1.5.3

Run the local build validation:

./scripts/check.sh

That runs Python compile checks, Bicep build, and Docker build.

Deploy It

Create a fresh environment:

azd env new <unique-env-name> --location westus --subscription <subscription-id>

For example:

azd env new azquack-quackcat --location westus --subscription <subscription-id>

Then deploy:

azd up

The environment name becomes the resource group name:

<unique-env-name> -> rg-<unique-env-name>

Use a new environment name for each experiment. That keeps each deployment isolated from previous Azure resources.

Check It

Make sure the intended environment is selected:

azd env list
azd env select <unique-env-name>

Run the live validation:

./scripts/validate-deployment.sh

Expected ending:

Deployment validation passed.

The validator runs the main live checks:

Azure shape: no PostgreSQL, two ACR-backed Container Apps, internal catalog ingress
rollback shape: one active healthy revision per app, query scale 1/1, sticky sessions disabled, public /readyz metadata hidden
Quack auth: valid token works, wrong token fails
DuckLake writes: metadata and Blob data files are created
persistence: a query restart preserves DuckLake rows and the catalog remains healthy
hygiene: recent logs do not contain token values available to the validator

The script reads values from the active AZD environment.

Test Query Replicas

The default deployment keeps the public query app at one replica. To test whether Azure Container Apps sticky sessions work with DuckDB Quack, deploy the query app with two replicas and keep the catalog app at one replica:

azd env set QUERY_MIN_REPLICAS 2
azd env set QUERY_MAX_REPLICAS 2
azd env set QUERY_STICKY_SESSIONS sticky
azd env set QUERY_EXPOSE_PLATFORM_METADATA true
azd up
./scripts/validate-sticky-sessions.py

The sticky-session validator checks the live Azure settings, confirms two query replicas are running, confirms the catalog is still single-replica, then runs repeated Quack calls from multiple local DuckDB clients to see whether each attached session stays on one query replica.

The current result is negative: ACA affinity works for normal cookie-aware HTTP calls, but the DuckDB Quack client does not appear to replay ACA's affinity cookie. Repeated Quack calls after ATTACH can fail with Invalid connection id when the query app has more than one replica. Keep the default 1/1 query replica shape unless Quack adds a cookie-aware or otherwise replica-safe client path.

Query It From DuckDB

After validation:

./scripts/connect-local.sh

The script:

checks the local DuckDB CLI is v1.5.3,
reads QUACK_URI and KEY_VAULT_NAME from the active AZD environment,
reads the public Quack token from Key Vault,
attaches the public query app,
inserts one smoke-test row,
reads the result back through Quack.

Equivalent SQL:

INSTALL quack;
LOAD quack;

CREATE OR REPLACE SECRET azquack_remote (
    TYPE quack,
    SCOPE 'quack:<query-app-fqdn>:443',
    TOKEN '<token-from-key-vault>'
);

ATTACH 'quack:<query-app-fqdn>:443' AS remote (TYPE quack);

FROM remote.query('FROM whoami()');
FROM remote.query(
    'INSERT INTO azquack.demo.events
     SELECT 2, ''local-client-smoke'', now()
     WHERE NOT EXISTS (
       SELECT 1 FROM azquack.demo.events WHERE event_id = 2
     )'
);
FROM remote.query('SELECT * FROM azquack.demo.events ORDER BY event_id');

Warning

The public Quack token is a write credential. A holder can run SQL against objects visible to the query app. That includes transitive write access to DuckLake through the internal catalog app. Quack authorization callbacks are not implemented in this prototype.

You are done when:

./scripts/check.sh                  passes
azd up                              completes
./scripts/validate-deployment.sh    prints "Deployment validation passed."
./scripts/connect-local.sh          attaches, writes, and reads back data

How It Works

Query Flow

Your laptop does not attach to Azure Storage or Azure Files directly. It attaches to the public query app:

sequenceDiagram
    participant Local as Local DuckDB
    participant KV as Key Vault
    participant Query as Public query app
    participant Catalog as Internal catalog app
    participant Blob as Blob / ADLS data files
    participant Files as Azure Files catalog.duckdb

    Local->>KV: read quack-token
    Local->>Query: ATTACH quack:query-app-fqdn:443
    Local->>Query: remote.query('SELECT ...')
    Query->>Catalog: DuckLake metadata over quack:catalog-app-fqdn:443
    Catalog->>Files: read/write catalog.duckdb
    Query->>Blob: read/write DuckLake data files
    Query-->>Local: result set

The important boundary:

local users get quack-token
the query app gets catalog-quack-token
the catalog app is internal-only
the query app is the only public ingress

Catalog Flow

DuckLake metadata is stored through Quack, not PostgreSQL.

sequenceDiagram
    participant ACA as Container Apps platform
    participant Query as Query app DuckDB
    participant Catalog as Catalog app DuckDB
    participant Files as Azure Files
    participant Blob as Blob / ADLS

    ACA->>Query: provide catalog-quack-token secret ref
    Query->>Catalog: ATTACH ducklake:quack:catalog-app-fqdn:443
    Query->>Catalog: catalog metadata operations
    Catalog->>Files: open /catalog/catalog.duckdb
    Query->>Blob: write/read DuckLake data files

The query app creates an internal Quack secret:

CREATE OR REPLACE SECRET azquack_catalog_quack (
    TYPE quack,
    SCOPE 'quack:<internal-catalog-app-fqdn>:443',
    TOKEN '<internal-token-from-key-vault>'
);

Then it attaches DuckLake:

ATTACH 'ducklake:quack:<internal-catalog-app-fqdn>:443' AS azquack (
    DATA_PATH 'az://lakehouse/data/',
    AUTOMATIC_MIGRATION true
);

USE azquack;

The catalog app is the only process that opens:

/catalog/catalog.duckdb

Storage And Identity

flowchart TB
    qid["Query app\nuser-assigned identity"]
    cid["Catalog app\nuser-assigned identity"]
    op["Operator principal\noptional for local tests"]
    acaStorage["Container Apps Environment\nAzure Files storage mount"]

    kvPublic["Key Vault secret\nquack-token"]
    kvCatalog["Key Vault secret\ncatalog-quack-token"]
    data["Data Storage Account\nlakehouse container\naz://lakehouse/data/"]
    acr["Azure Container Registry\nAcrPull"]
    share["Catalog Azure Files\nmounted with storage account key"]

    qid -->|"Key Vault Secrets User"| kvPublic
    qid -->|"Key Vault Secrets User"| kvCatalog
    qid -->|"Storage Blob Data Contributor"| data
    qid -->|"AcrPull"| acr

    cid -->|"Key Vault Secrets User"| kvCatalog
    cid -->|"AcrPull"| acr
    acaStorage -->|"storage account key"| share
    cid -.->|"uses mounted /catalog path"| share

    op -.->|"Key Vault Secrets User"| kvPublic
    op -.->|"Storage Blob Data Reader"| data

The local DuckDB client does not talk to Blob Storage or Azure Files directly.

What Gets Deployed

Resource	Purpose
Storage Account	DuckLake Parquet/data files under `az://lakehouse/data/`
Catalog Storage Account	Azure Files share for `/catalog/catalog.duckdb`
Internal Container App	DuckDB catalog file exposed only inside ACA through Quack
Public Container App	DuckDB query endpoint exposed to local clients through Quack
Azure Container Registry	Stores the container image
Container Apps Environment	Hosts both Container Apps
Key Vault	Stores public and internal Quack tokens
Log Analytics workspace	Container App logs
Managed identities	Separate identities for query and catalog apps

Security Boundaries

This deployment is intentionally small and easy to inspect. It is not hardened.

Boundary	Current behavior
Public ingress	query app only
Catalog ingress	internal ACA ingress only
Public token	write access to the query app
Internal token	query app to catalog app
Data storage	Blob public access disabled
Data auth	managed identity, shared-key disabled
Catalog file storage	Azure Files mount with storage account key
ACR	admin credentials disabled
Key Vault	Azure RBAC, public network access enabled

A hardened version should add private networking, stricter Quack authorization, network restrictions in front of the query app, and backup/restore automation for the catalog DuckDB file.

Cost Model

This deployment removes the PostgreSQL Flexible Server cost.

flowchart LR
    cost["Billable surfaces"]

    ca["Container Apps\n2 apps\n0.5 vCPU / 1Gi each\nminReplicas=1\nmaxReplicas=1"]
    storage["Storage\nBlob / ADLS data account\nAzure Files catalog account\n16 GiB share quota"]
    acr["ACR Basic\nadmin disabled"]
    kv["Key Vault Standard"]
    logs["Log Analytics\nPerGB2018\n30-day retention"]

    cost --> ca
    cost --> storage
    cost --> acr
    cost --> kv
    cost --> logs

It still creates billable resources:

two always-on Container App replicas
two Storage Accounts
Azure Files
Azure Container Registry Basic
Key Vault
Log Analytics

For a 730-hour month, the raw allocation inputs are:

seconds in month = 730 * 60 * 60 = 2,628,000
vCPU-seconds     = 1.0 * 2,628,000 = 2,628,000
GiB-seconds      = 2.0 * 2,628,000 = 5,256,000

Azure documents current rates and any applicable included grants or discounts. Verify them for your subscription and region:

Generated Environment Values

The AZD preprovision hook creates these values when missing:

Value	Meaning
`QUACK_TOKEN`	token for local DuckDB clients
`CATALOG_QUACK_TOKEN`	token for query app -> internal catalog app
`DUCKLAKE_DATA_PATH`	default `az://lakehouse/data/` path
`OPERATOR_PRINCIPAL_ID`	principal allowed to read the public Quack token
`OPERATOR_PRINCIPAL_TYPE`	defaults to `User`

Warning

.azure/*/.env contains generated secret values. The repo ignores it, but treat the active AZD environment as sensitive local state.

Cleanup

Remove the whole experiment:

azd down --purge --force --no-prompt

Key Vault soft delete is enabled. Purge protection is disabled for this prototype so cleanup can fully remove resources when your account has purge permission.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.codex/skills/azquack-deploy-samples		.codex/skills/azquack-deploy-samples
.github/workflows		.github/workflows
deploy		deploy
docs		docs
infra		infra
scripts		scripts
src/azquack_server		src/azquack_server
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
azure.yaml		azure.yaml
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AzQuack

The Idea

One Deployment

Azure Deployment

Before You Start

Deploy It

Check It

Test Query Replicas

Query It From DuckDB

How It Works

Query Flow

Catalog Flow

Storage And Identity

What Gets Deployed

Security Boundaries

Cost Model

Generated Environment Values

Cleanup

Learn More

About

Uh oh!

Releases 3

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AzQuack

The Idea

One Deployment

Azure Deployment

Before You Start

Deploy It

Check It

Test Query Replicas

Query It From DuckDB

How It Works

Query Flow

Catalog Flow

Storage And Identity

What Gets Deployed

Security Boundaries

Cost Model

Generated Environment Values

Cleanup

Learn More

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages