CLI Reference

How to Use

The basic syntax for executing a command is:

$ pb <command> <subcommand> [parameters]

cleanup

Display and remove materials older than a retention period specified by the user (default: 180 days).

Command

pb cleanup materials -r <number of days>

Optional Parameters

-r

Retention time in number of days. For example, if you pass 1 as the argument, all materials created more than one day (24 hours) ago are listed. You are then prompted for confirmation, after which you can view the material names and delete them.
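For example, to list materials older than 30 days and be prompted before deleting them (30 is an illustrative value):

$ pb cleanup materials -r 30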

compile

Generates SQL queries from models.

Command

pb compile

This will create SQL queries from the model YAML files, storing the generated results in the output subfolder of the project folder. Each run creates a new folder inside it, named with the sequence number of that run. You can manually execute these SQL files on the warehouse.

Optional Parameters

clean_output

This will empty the output folder(s) before executing the command.

-c

Use a site config file other than the one in the ~/.pb/ directory. For example:

$ pb compile -c MyOtherConnection/siteconfig.yaml

-t

You can use it to specify the target name (as mentioned in siteconfig.yaml).

Say your siteconfig.yaml has two targets, dev and test, and you want to use the test instance:

$ pb compile -t test

--begin_time || --end_time

The timestamp range to run your models for.

When you want to run the model as of now (the default if the flag is not specified):

$ pb compile

When you want to utilize all the data in source tables from 1 June 2022:

$ pb compile --begin_time 2022-06-01T12:00:00.0Z

To utilize data until a user-defined timestamp (epoch), say 6 August 2022:

$ pb compile --end_time 1659794654

-p

To use a project file other than the one in the current directory:

$ pb compile -p MyOtherProject

Fetch a project from a URL such as GitHub:

$ pb compile -p git@github.com:<orgname>/<repo>

Extending the previous command, you can also fetch a specific tag:

$ pb compile -p git@github.com:<orgname>/<repo>/tag/<tag_version>/<folderpath>

--rebase-incremental

Any incremental models will be rebased, that is, built afresh from their inputs instead of starting from a previous run. This is intended as an occasional operation to account for the stray delete or update in inputs that are otherwise only appended to. It is also useful if there was a major migration or cleanup of an input table.
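For example, to rebuild all incremental models from scratch during compilation:

$ pb compile --rebase-incremental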

--seq_no

Continue a previous run by specifying its sequence number. Models that already exist will not be rebuilt unless --force is also specified. Check the discover command or run logs for existing sequence numbers.
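For example, to continue the run with sequence number 12 (an illustrative value):

$ pb compile --seq_no 12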

--migrate_on_load

By default, this flag creates a folder called migrations inside the project folder, containing a version of your project migrated to the latest schema version, and then executes the desired operation. This lets you upgrade your project to the latest schema version without changing the source files in your original project folder.

--migrated_folder_path

Used together with the previous flag, this changes the folder location of the migrated project.
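For example, to migrate the project on load and store the migrated copy in a folder of your choice (the folder name here is illustrative):

$ pb compile --migrate_on_load --migrated_folder_path MigratedProject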

discover

Discovers elements in the warehouse, such as models, entities, features and sources.

Command

pb discover

This allows you to discover all the registered elements in the warehouse.

Subcommands

You can give a subcommand to discover all the entities, features, materials, models and sources in the warehouse.

$ pb discover entities

$ pb discover features

$ pb discover materials

$ pb discover models

$ pb discover sources

Optional Parameters

-e

Discover specific entities matching the given name.

$ pb discover -e 'TheName'

-m

To discover a specific model.

$ pb discover -m 'MY_DATABASE.PROD_SCHEMA.CREATED_MODEL'

-c

To use a site config other than the default one. Extending the previous example to show a combination of both:

$ pb discover -m 'MY_DATABASE.PROD_SCHEMA.CREATED_MODEL' -c siteconfig.yaml

-s

To discover entities in a specified schema.

-s "*"

To discover entities across all schemas (note: case sensitive).
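For example, to discover entities in the PROD_SCHEMA schema (the schema name is illustrative), or across all schemas:

$ pb discover -s PROD_SCHEMA

$ pb discover -s "*"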

-u

Discover entities having matching source URLs. For example, to discover all the entities coming from GitHub:

$ pb discover -u %github%

-t

Select a target (as mentioned in siteconfig.yaml).
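For example, to use the test target defined in your siteconfig.yaml:

$ pb discover -t test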

-p

Use a project folder other than the one in the current directory.

$ pb discover -p ThisFolder/ThatSubFolder/SomeOtherProject/

-f

Specify this flag with a file path to dump the discovery output into a CSV file.

$ pb discover -f path/to/csv_file.csv

-k

Restricts discovery to the specified model keys.

$ pb discover -k entity_key:model_type:model_name

help

Provides information on any command.

Command

$ pb help

Get a list of all the commands.

Subcommand

$ pb help <command name>

Get usage information for a specific command, with subcommands and optional parameters.
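For example, to see usage information for the compile command:

$ pb help compile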

init

Create connection and initialize projects.

Subcommands

pb init connection

This prompts you for the values of a warehouse connection and stores them in the siteconfig.yaml file in the .pb folder of your home directory.

pb init pb-project

This will generate files with sample data in a folder named HelloPbProject, which you can modify with your project information, models, etc.

Optional Parameters

pb-project -o

To create a pb-project with a different name, specify it as an additional parameter at the end of the command.

$ pb init pb-project -o SomeOtherProject

This will create pb-project in the folder SomeOtherProject.

connection -c

To create the siteconfig at a location other than the .pb folder inside your home directory.

$ pb init connection -c myconfig.yaml

This will create myconfig.yaml in the current folder.

insert

This command allows you to store the test dataset in your own warehouse (Snowflake). It creates the tables sample_rs_demo_identifies and sample_rs_demo_tracks in the warehouse schema specified in the test connection.

# Select the first connection named test having target and output as dev, of type Snowflake.
$ pb insert

# By default it'll pick up connection named test. To use connection named red:
$ pb insert -n red

# To pick up connection named red, with target test.
$ pb insert -n red -t test

migrate

Using this command, you can migrate your project to the latest schema. Say your project is on schema version 9 and you want to migrate to 18. You can do that using one of the two subcommands of migrate:

Command

pb migrate <subcommand>

Subcommands

pb migrate manual

Based on the current schema version of your project, it will list all the steps needed to migrate it to the latest one.

pb migrate auto

Automatically migrate from one version to another.

Optional Parameters

-p

To use a project file other than the one in the current directory.

-c

To use a siteconfig file other than the one in the user's home directory.

-t

Target name to be used. Defaults to the one specified in the siteconfig file.

-v

Version to which the project is to be migrated. Defaults to max version.
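For example, assuming you want to migrate to schema version 18 (an illustrative version number):

Usage: pb migrate auto -v 18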

-d

(auto) Destination folder where migrated project files are to be stored.

Usage: pb migrate auto -d FolderName

--force

(auto) Any warnings are ignored and the project is migrated.

--inplace

(auto) Overwrite the source folder and store migrated project files in place of original.

Usage: pb migrate auto --inplace

Note

Also, you may refer to Migrate your existing project.

run

Creates ID stitched or feature tables on the warehouse.

Command

pb run

This will generate the SQL from the model files and also execute it on the warehouse. After the command execution is complete, you can see the names of the output tables on the screen, which you can access from the warehouse.

Optional Parameters

Same as in the compile command, with these additional parameters:

--force

It will do a force run, even if the material already exists.
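For example, to force a rerun even if the material already exists:

$ pb run --force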

--write_output_csv

Write all the generated tables to CSV files in the specified directory.

$ pb run --write_output_csv WriteOutputHere

--model_args

Use this to customise behavior of any individual model by passing configuration params to it. Different model types support different params. The only argType supported currently is breakpoint for models of type feature_table_model.

The breakpoint parameter allows you to generate and run SQL only up to a specific feature/tablevar. It is specified in the format modelName:argType:argName, where argName is the name of the feature/tablevar. For example:

$ pb run --model_args domain_profile:breakpoint:salesforceEvents

--model_refs

Restricts the operation to a specified model. You can specify model references, such as pb run --model_refs models/user_id_stitcher.

--ignore_model_errors

Use this to let the project continue running even if a model fails, so that execution does not stop due to a single bad model.
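For example, to keep the run going past any failing models:

$ pb run --ignore_model_errors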

--grep_var_dependencies

It uses regex pattern matching over fields from vars to find the references to other vars and set dependencies. By default, it is set to true.

show

This command provides a comprehensive overview of the models, id_clusters, packages, and more in a project. It is particularly useful when you are looking for specific details, such as all the models in your project.

Command

$ pb show

Subcommands

$ pb show models

This subcommand allows you to view information about the models in your project.

The output includes the following information about each model:

  • Warehouse name: Name of the table/view to be created in the warehouse.

  • Model type: Whether the model is an id_stitcher / feature_table_model / sql_template etc.

  • Output type: Whether the output created is Ephemeral / Table / View.

  • Run type: Whether the model's run type is Discrete / Incremental.

  • SQL type: Whether the SQL type of the model is Single select / Multiple statements.

$ pb show dependencies

This subcommand generates a graph file (dependencies.png) showing dependencies of all the models in your project.

$ pb show dataflow

This subcommand generates a graph file (dataflow.png) showing dataflow of all the models in your project.

$ pb show idstitcher-report --id_stitcher_model models/<ModelName> --migrate_on_load

This subcommand creates a detailed report about ID stitcher model runs. By default it picks up the last run, which can be changed using the --seq_no flag. The display output consists of:

  • ModelRef: The model reference name.

  • Seq No: Sequence number of the run you are creating the report for.

  • Material Name: The name of the output as created in the warehouse.

  • Creation Time: Time when the material object was created.

  • Model Converged: If true then it indicates a successful run.

  • Pre Stitched IDs before run: Count of all the IDs before stitching.

  • Post Stitched IDs after run: Count of unique IDs after stitching.

An HTML report is also generated with relevant results and graphics, including the largest cluster, ID graph, etc. It is saved in the output folder; its exact path is shown on screen when you execute the command.

$ pb show user-lookup -v '<trait value (email, user id, etc.) you are trying to discover>'

This subcommand lists all the features associated with a user by using any of the traits (flag -v) as ID Types.

Flags

  • --include_disabled - include disabled models in the generated graph image (applicable to dataflow and dependencies).

  • --seq_no - specify a particular run of an ID stitcher model to report on (applicable to idstitcher-report).

Optional Parameters

  • -h: Displays help information for the command.

  • -p: Specifies the project path for which to list the models. If not specified, the project in the current directory is used.

  • -c: File location of the siteconfig to be used. If not specified, defaults to the one in the user's home directory.

  • -t: Target name to be used. Defaults to the target specified in the siteconfig file.
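For example, to list models from a project in another folder using the test target (the folder name is illustrative):

$ pb show models -p SomeOtherProject -t test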

query

This command executes a SQL query on the warehouse and prints the output on screen (default: 10 rows).

Command

$ pb query <query>

For instance, to print the output of a specific table/view named user_id_stitcher: pb query "select * from user_id_stitcher".

To reference a model named user_id_stitcher: pb query "select * from {{this.DeRef("models/user_id_stitcher")}}".

Optional Parameters

  • -f: Export output to a CSV file.

  • --max_rows: Maximum number of rows that will be printed (default 10).

  • --seq_no: Sequence number for the run.
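For example, to export up to 50 rows of the query output to a CSV file (the file name and row count are illustrative):

$ pb query "select * from user_id_stitcher" --max_rows 50 -f output.csv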

validate

Validates aspects of the project and configuration.

Command

$ pb validate

This command allows you to run various tests on project-related configurations and validate them. This includes, but is not limited to, validating the project configuration, the privileges associated with the role specified in the site configuration of the project's connection, etc.

Subcommands

$ pb validate access

This will run tests on the role specified in the site configuration of the project's connection and validate whether the role has the privileges to access all the related objects in the warehouse. It will throw an error if the role does not have privileges to access the input tables or does not have permissions to write the material output to the output schema.

version

Shows the current PB version along with its GitHash.

Command

pb version