Project dependencies
For a long time, dbt has supported code reuse and extension by installing other projects as packages. When you install another project as a package, you are pulling in its full source code, and adding it to your own. This enables you to call macros and run models defined in that other project.
While this is a great way to reuse code, share utility macros, and establish a starting point for common transformations, it's not a great way to enable collaboration across teams and at scale, especially in larger organizations.
This year, dbt Labs is introducing an expanded notion of dependencies across multiple dbt projects:
- Packages — Familiar and pre-existing type of dependency. You take this dependency by installing the package's full source code (like a software library).
- Projects — A new way to take a dependency on another project. Using a metadata service that runs behind the scenes, dbt Cloud resolves references on-the-fly to public models defined in other projects. You don't need to parse or run those upstream models yourself. Instead, you treat your dependency on those models as an API that returns a dataset. The maintainer of the public model is responsible for guaranteeing its quality and stability.
Prerequisites
- Available in dbt Cloud Enterprise. If you have an Enterprise account, you can unlock these features by designating a public model and adding a cross-project ref.
- Use a supported version of dbt (v1.6, v1.7, or "Versionless") for both the upstream ("producer") project and the downstream ("consumer") project.
- Define models in an upstream ("producer") project that are configured with access: public. You need at least one successful job run after defining their access.
- Define a deployment environment in the upstream ("producer") project that is set to be your Production environment, and ensure it has at least one successful job run in that environment.
- Each project name must be unique in your dbt Cloud account. For example, if you have a dbt project (codebase) for the jaffle_marketing team, you should not create separate projects for Jaffle Marketing - Dev and Jaffle Marketing - Prod. That isolation should instead be handled at the environment level.
  - We are adding support for environment-level permissions and data warehouse connections; please contact your dbt Labs account team for beta access.
- The dbt_project.yml file is case-sensitive, which means the project name must exactly match the name in your dependencies.yml. For example, if your project name is jaffle_marketing, you should use jaffle_marketing (not JAFFLE_MARKETING) in all related files.
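As a rough sketch of the public-model prerequisite above, the producer project might mark a model as public in one of its properties files. The file path, and the reuse of the monthly_revenue model name that appears later on this page, are assumptions for illustration:

```yaml
# models/schema.yml in the upstream ("producer") project (hypothetical path)
models:
  - name: monthly_revenue   # illustrative model name
    access: public          # designates this model as public
```

After a change like this, the producer project still needs at least one successful job run in its Production environment, as the prerequisites above state.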
Use cases
The following setup will work for every dbt project:
- Add any package dependencies to packages.yml
- Add any project dependencies to dependencies.yml
However, you may be able to consolidate both into a single dependencies.yml file. Read the following section to learn more.
About packages.yml and dependencies.yml
The dependencies.yml file can contain both types of dependencies: "package" and "project" dependencies.
- Package dependencies let you add source code from someone else's dbt project into your own, like a library.
- Project dependencies provide a different way to build on top of someone else's work in dbt.
If your dbt project doesn't require the use of Jinja within the package specifications, you can simply rename your existing packages.yml to dependencies.yml. However, if your project's package specifications use Jinja, particularly for scenarios like adding an environment variable or a Git token method in a private Git package specification, you should continue using the packages.yml file name.
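For instance, a private Git package specification that reads a token from an environment variable relies on Jinja, so it would stay in packages.yml. This is a minimal sketch; the organization, repository, branch, and environment variable names are hypothetical placeholders:

```yaml
# packages.yml (kept under this name because the git URL is rendered with Jinja)
packages:
  - git: "https://{{ env_var('DBT_ENV_SECRET_GIT_CREDENTIAL') }}@github.com/example-org/private_package.git"
    revision: main  # hypothetical branch name
```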
To determine when to use dependencies.yml or packages.yml (or both), refer to the FAQs.
Example
As an example, let's say you work on the Marketing team at the Jaffle Shop. The name of your team's project is jaffle_marketing:
name: jaffle_marketing
As part of your modeling of marketing data, you need to take a dependency on two other projects:
- dbt_utils as a package: A collection of utility macros you can use while writing the SQL for your own models. This package is open source, public, and maintained by dbt Labs.
- jaffle_finance as a project use-case: Data models about the Jaffle Shop's revenue. This project is private and maintained by your colleagues on the Finance team. You want to select from some of this project's final models, as a starting point for your own work.
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1

projects:
  - name: jaffle_finance  # case sensitive and matches the 'name' in the 'dbt_project.yml'
What's happening here?
The dbt_utils package: When you run dbt deps, dbt will pull down this package's full contents (100+ macros) as source code and add them to your environment. You can then call any macro from the package, just as you can call macros defined in your own project.
The jaffle_finance project: This is a new scenario. Unlike installing a package, the models in the jaffle_finance project will not be pulled down as source code and parsed into your project. Instead, dbt Cloud provides a metadata service that resolves references to public models defined in the jaffle_finance project.
Advantages
When you're building on top of another team's work, resolving the references in this way has several advantages:
- You're using an intentional interface designated by the model's maintainer with access: public.
- You're keeping the scope of your project narrow, and avoiding unnecessary resources and complexity. This is faster for you and faster for dbt.
- You don't need to mirror any conditional configuration of the upstream project such as vars, environment variables, or target.name. You can reference them directly wherever the Finance team is building their models in production. Even if the Finance team makes changes like renaming the model, changing the name of its schema, or bumping its version, your ref would still resolve successfully.
- You eliminate the risk of accidentally building those models with dbt run or dbt build. While you can select those models, you can't actually build them. This prevents unexpected warehouse costs and permissions issues. This also ensures proper ownership and cost allocation for each team's models.
How to write cross-project ref
Writing ref: Models referenced from a project-type dependency must use two-argument ref, including the project name:
with monthly_revenue as (
    select * from {{ ref('jaffle_finance', 'monthly_revenue') }}
),
...
Cycle detection
Currently, the default behavior for "project" dependencies enforces that these relationships only go in one direction, meaning that the jaffle_finance project could not add a new model that depends on any public models produced by the jaffle_marketing project. dbt will check for cycles across projects and raise errors if any are detected.
However, many teams may want to be able to share data assets back and forth between teams. We've added support for enabling bidirectional dependencies across projects, currently in beta.
To enable this in your account, set the environment variable DBT_CLOUD_PROJECT_CYCLES_ALLOWED to TRUE in all your dbt Cloud environments. This allows you to create bidirectional dependencies between projects, so long as the new dependency does not introduce any node-level cycles.
When setting up projects that depend on each other, it's important to do so in a stepwise fashion. Each project must run and produce public models before the original producer project can take a dependency on the original consumer project. For example, the order of operations would be as follows for a simple two-project setup:
- The project_a project runs in a deployment environment and produces public models.
- The project_b project adds project_a as a dependency.
- The project_b project runs in a deployment environment and produces public models.
- The project_a project adds project_b as a dependency.
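The steps above could be sketched as the following pair of dependencies.yml files, assuming two projects named project_a and project_b as in the list. The directory prefixes in the comments are illustrative only. Each file would be added only after the other project has run and produced its public models:

```yaml
# project_b/dependencies.yml, added in the second step (after project_a has run)
projects:
  - name: project_a
```

```yaml
# project_a/dependencies.yml, added in the fourth step (after project_b has run)
projects:
  - name: project_b
```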
If you enable this feature and experience any issues, please reach out to dbt Cloud support.
For more guidance on how to use dbt Mesh, refer to the dedicated dbt Mesh guide and also our freely available dbt Mesh learning course.
Safeguarding production data with staging environments
When working in a Development environment, cross-project refs normally resolve to the Production environment of the project. However, to protect production data, set up a Staging deployment environment within your projects.
With a staging environment integrated into the project, dbt Mesh automatically fetches public model information from the producer’s staging environment if the consumer is also in staging. Similarly, dbt Mesh fetches from the producer’s production environment if the consumer is in production. This ensures consistency between environments and adds a layer of security by preventing access to production data during development workflows.
Read Why use a staging environment for more information about the benefits.
Staging with downstream dependencies
dbt Cloud begins using the Staging environment to resolve cross-project references from downstream projects as soon as it exists in a project, without "fail-over" to Production. To avoid causing downtime for downstream developers, you should define and trigger a job before marking the environment as Staging:
- Create a new environment, but do NOT mark it as Staging.
- Define a job in that environment.
- Trigger the job to run, and ensure it completes successfully.
- Update the environment to mark it as Staging.
Comparison
If you were to instead install the jaffle_finance project as a package dependency, you would be pulling down its full source code and adding it to your runtime environment. This means:
- dbt needs to parse and resolve more inputs (which is slower)
- dbt expects you to configure these models as if they were your own (with vars, env vars, etc)
- dbt will run these models as your own unless you explicitly --exclude them
- You could be using the project's models in a way that their maintainer (the Finance team) hasn't intended
There are a few cases where installing another internal project as a package can be a useful pattern:
- Unified deployments: In a production environment, if the central data platform team of Jaffle Shop wanted to schedule the deployment of models across both jaffle_finance and jaffle_marketing, they could use dbt's selection syntax to create a new "passthrough" project that installed both projects as packages.
- Coordinated changes: In development, if you wanted to test the effects of a change to a public model in an upstream project (jaffle_finance.monthly_revenue) on a downstream model (jaffle_marketing.roi_by_channel) before introducing changes to a staging or production environment, you can install the jaffle_finance project as a package within jaffle_marketing. The installation can point to a specific git branch. However, if you find yourself frequently needing to perform end-to-end testing across both projects, we recommend you re-examine whether this represents a stable interface boundary.
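As an illustration of the coordinated-changes case, the package entry in jaffle_marketing might pin the upstream repository to a branch. This is a sketch only; the repository URL and branch name are hypothetical:

```yaml
# packages.yml in jaffle_marketing, used only while testing the upstream change
packages:
  - git: "https://github.com/example-org/jaffle_finance.git"  # hypothetical URL
    revision: my-feature-branch                                # hypothetical branch
```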
These are the exceptions, rather than the rule. Installing another team's project as a package adds complexity, latency, and risk of unnecessary costs. By defining clear interface boundaries across teams, by serving one team's public models as "APIs" to another, and by enabling practitioners to develop with a more narrowly defined scope, we can enable more people to contribute, with more confidence, while requiring less context upfront.
FAQs
Can I define private packages in the dependencies.yml file?
Related docs
- Refer to the dbt Mesh guide for more guidance on how to use dbt Mesh.
- Quickstart with dbt Mesh