CLI Reference
cleanup
Displays and removes materials older than a retention period specified by the user (default: 180 days).
Command
pb cleanup materials -r <number of days>
Optional Parameters
-r
Retention time in days. For example, if you pass 1 as the argument, all materials created more than one day (24 hours) ago are listed. You are then prompted for confirmation, after which you can view the material names and delete them.
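For example, a hypothetical invocation that lists, and offers to delete, materials older than 30 days (the value is illustrative):

```shell
$ pb cleanup materials -r 30
```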
compile
Generates SQL queries from models.
Command
pb compile
This creates SQL queries from the model YAML files and stores the generated results in the output subfolder of the project folder. Each run creates a new folder inside it, named with the sequence number of that run. You can manually execute these SQL files on the warehouse.
Optional Parameters
clean_output
This will empty the output folder(s) before executing the command.
-c
In case you want to use a site config file other than the one in the .pb directory of your home directory. An example of how to use:
$ pb compile -c MyOtherConnection/siteconfig.yaml
-t
Use it to define the target name (as mentioned in siteconfig.yaml).
Say your siteconfig.yaml has two targets, dev and test, and you want to use the test instance:
$ pb compile -t test
--begin_time / --end_time
The timestamp range for which to run your models.
When you want to run the model as of now (the default if the flag is not specified):
$ pb compile
When you want to utilize all the data in source tables from 1 June 2022:
$ pb compile --begin_time 2022-06-01T12:00:00.0Z
To utilize data only up to a user-defined timestamp (specified as a Unix epoch), say 6 August 2022:
$ pb compile --end_time 1659794654
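If you need to compute such an epoch value, one sketch uses GNU date (assumed available on Linux; the timestamp below is illustrative):

```shell
# --end_time takes a Unix epoch (seconds since 1970-01-01 UTC).
# Compute it for 6 August 2022 at midnight UTC with GNU date:
end_time=$(date -u -d '2022-08-06 00:00:00' +%s)
echo "$end_time"   # 1659744000
# Then pass it to pb (illustrative):
# pb compile --end_time "$end_time"
```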
-p
In case you want to use a project file other than the one in the current directory:
$ pb compile -p MyOtherProject
Fetch a project from a URL such as GitHub:
$ pb compile -p git@github.com:<orgname>/<repo>
Extending the previous command, you can also fetch a specific tag:
$ pb compile -p git@github.com:<orgname>/<repo>/tag/<tag_version>/<folderpath>
--rebase-incremental
Any incremental models will be rebased, that is, built afresh from their inputs instead of continuing from a previous run. This is intended as an occasional operation to account for stray deletes or updates in inputs that are normally append-only. It is also useful after a major migration or cleanup of an input table.
--seq_no
Continue a previous run by specifying its sequence number. Models that already exist are not rebuilt unless --force is also specified. Check the discover command or run logs for existing sequence numbers.
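For instance, to continue a hypothetical earlier run (the sequence number 42 is illustrative) while force-rebuilding models that already exist:

```shell
# Continue run 42; existing models are skipped unless --force is given
$ pb run --seq_no 42 --force
```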
--migrate_on_load
This flag creates a folder called migrations inside the project folder, containing a version of your project migrated to the latest schema version, and executes the desired operation against it. This simplifies upgrading your project to the latest schema version without changing the source files in your original project folder.
--migrated_folder_path
Used together with --migrate_on_load, this flag changes the folder location of the migrated project.
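For example, to compile against a migrated copy of the project stored in a folder of your choosing (the folder name is illustrative):

```shell
$ pb compile --migrate_on_load --migrated_folder_path MigratedProject
```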
discover
Discovers elements in the warehouse, such as models, entities, features and sources.
Command
pb discover
This allows you to discover all the registered elements in the warehouse.
Subcommands
You can give a subcommand to discover all the entities, features, materials, models, and sources in the warehouse.
$ pb discover entities
$ pb discover features
$ pb discover materials
$ pb discover models
$ pb discover sources
Optional Parameters
-e
If you want to discover entities with a specific name.
$ pb discover -e 'TheName'
-m
To discover a specific model.
$ pb discover -m 'MY_DATABASE.PROD_SCHEMA.CREATED_MODEL'
-c
To use a site config other than the default one. Extending the previous example to show a combination of both:
$ pb discover -m 'MY_DATABASE.PROD_SCHEMA.CREATED_MODEL' -c siteconfig.yaml
-s
To discover entities in a specified schema. Pass "*" to discover entities across all schemas (note: schema names are case sensitive):
$ pb discover -s "*"
-u
Discover the entities having these source URLs. Say you want to discover all the entities coming from GitHub:
$ pb discover -u %github%
-t
Select a target (as mentioned in siteconfig.yaml).
-p
Use a project folder other than the one in the current directory.
$ pb discover -p ThisFolder/ThatSubFolder/SomeOtherProject/
-f
Specify this flag with a file path to dump the discovery output into a CSV file.
$ pb discover -f path/to/csv_file.csv
-k
Restricts discovery to the specified model keys.
$ pb discover -k entity_key:model_type:model_name
help
Provides information on any command.
Command
$ pb help
Get a list of all the commands.
Subcommand
$ pb help <command name>
Get usage information for a specific command, with subcommands and optional parameters.
init
Creates connections and initializes projects.
Subcommands
pb init connection
This prompts you for warehouse connection values and stores them in the siteconfig.yaml file in the .pb folder of your home directory.
pb init pb-project
This generates files in a folder named HelloPbProject with sample data, which you can modify with your own project information, models, etc.
Optional Parameters
pb-project -o
To create a pb-project with a different name, specify it as an additional parameter at the end of the command.
$ pb init pb-project -o SomeOtherProject
This will create the pb-project in the folder SomeOtherProject.
connection -c
To create the siteconfig at a location other than the .pb folder inside your home directory.
$ pb init connection -c myconfig.yaml
This will create myconfig.yaml in the current folder.
insert
This command allows you to store the test dataset in your own warehouse (Snowflake).
It creates the tables sample_rs_demo_identifies and sample_rs_demo_tracks in the warehouse schema specified in the test connection.
# Select the first connection named test having target and output as dev, of type Snowflake.
$ pb insert
# By default it'll pick up connection named test. To use connection named red:
$ pb insert -n red
# To use the connection named red, with target test:
$ pb insert -n red -t test
migrate
This command migrates your project to the latest schema.
Say your project is on schema version 9 and you want to migrate it to version 18.
You can do that using one of the two subcommands of migrate:
Command
pb migrate <subcommand>
Subcommands
pb migrate manual
Based on the current schema version of your project, it lists all the steps needed to migrate it to the latest one.
pb migrate auto
Automatically migrates the project from one version to another.
Optional Parameters
-p
To use a project file other than the one in the current directory.
-c
To use a siteconfig file other than the one in the user's home directory.
-t
Target name to be used. Defaults to the one specified in the siteconfig file.
-v
Version to which the project is to be migrated. Defaults to max version.
-d
(auto) Destination folder where migrated project files are to be stored.
Usage:
pb migrate auto -d FolderName
--force
(auto) Ignores any warnings and migrates the project.
--inplace
(auto) Overwrites the source folder, storing the migrated project files in place of the originals.
Usage:
pb migrate auto --inplace
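Combining the flags above, a hypothetical invocation that migrates a project to schema version 18 and writes the result to a separate folder (the version number and folder name are illustrative):

```shell
$ pb migrate auto -v 18 -d MigratedProject
```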
Note
You may also refer to Migrate your existing project.
run
Creates an ID stitched table or feature table on the warehouse.
Command
pb run
This generates the SQL from the model files and also executes it on the warehouse. After the command completes, the names of the output tables are shown on screen, and you can access them in the warehouse.
Optional Parameters
Same as in the compile command, with these additional parameters:
--force
Does a force run, even if the material already exists.
--write_output_csv
Write all the generated tables to CSV files in the specified directory.
$ pb run --write_output_csv WriteOutputHere
--model_args
Use this to customize the behavior of any individual model by passing configuration params to it. Different model types support different params. The only argType currently supported is breakpoint, for models of type feature_table_model.
The breakpoint parameter allows you to generate and run SQL only up to a specific feature/tablevar. It is specified in the format modelName:argType:argName, where argName is the name of the feature/tablevar. For example:
$ pb run --model_args domain_profile:breakpoint:salesforceEvents
--model_refs
Restricts the operation to the specified models. You can pass model references, such as pb run --model_refs models/user_id_stitcher.
--ignore_model_errors
Use this to let the project continue running when a model errors, so execution does not stop because of a single bad model.
--grep_var_dependencies
Uses regex pattern matching over fields from vars to find references to other vars and set dependencies. Defaults to true.
show
This command provides a comprehensive overview of the models, id_clusters, packages, and more in a project. It is particularly useful when searching for specific details, such as all the models in your project.
Command
$ pb show
Subcommands
$ pb show models
This subcommand allows you to view information about the models in your project.
The output includes the following information about each model:
Warehouse name: Name of the table/view to be created in the warehouse.
Model type: Whether the model is an id_stitcher, feature_table_model, sql_template, etc.
Output type: Whether the output created is Ephemeral, Table, or View.
Run type: Whether the model's run type is Discrete or Incremental.
SQL type: Whether the SQL type of the model is Single select or Multiple statements.
$ pb show dependencies
This subcommand generates a graph file (dependencies.png) showing dependencies of all the models in your project.
$ pb show dataflow
This subcommand generates a graph file (dataflow.png) showing dataflow of all the models in your project.
$ pb show idstitcher-report --id_stitcher_model models/<ModelName> --migrate_on_load
This subcommand creates a detailed report about ID Stitcher model runs. By default, it picks up the last run, which can be changed using the flag -l. The display output consists of:
ModelRef: The model reference name.
Seq No: Sequence Number of the run you’re creating report for.
Material Name: The name of output as created in warehouse.
Creation Time: Time when material object was created.
Model Converged: If true then it indicates a successful run.
Pre Stitched IDs before run: Count of all the IDs before stitching.
Post Stitched IDs after run: Count of unique IDs after stitching.
An HTML report is also generated with relevant results and graphics including largest cluster, ID graph, etc.
It is saved in the output folder, whose exact path is shown on screen when you execute the command.
$ pb show user-lookup -v '<trait value (email, user id, etc.) you are trying to discover>'
This subcommand lists all the features associated with a user, using any of the traits (flag -v) as ID Types.
Flags
--include_disabled
Includes disabled models in the generated graph image (applicable to dataflow and dependencies).
--seq_no
Specifies a particular run of an ID stitcher model (applicable to idstitcher-report).
Optional Parameters
-h: Displays help information for the command.
-p: Specifies the project path for which to list the models. If not specified, the project in the current directory is used.
-c: File location of the siteconfig to be used. If not specified, defaults to the one in user’s home directory.
-t: Target name to be used. Defaults to the target specified in siteconfig file.
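Combining these, a hypothetical invocation that lists the models of a project in another folder against a specific target (the names are illustrative):

```shell
$ pb show models -p MyOtherProject -t dev
```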
query
This command executes a SQL query on the warehouse and prints the output on screen (default: 10 rows).
Command
$ pb query <query>
For instance, to print the output of a specific table/view named user_id_stitcher: pb query "select * from user_id_stitcher".
To reference a model named user_id_stitcher: pb query "select * from {{this.DeRef("models/user_id_stitcher")}}".
Optional Parameters
-f: Export output to a CSV file.
--max_rows: Maximum number of rows that will be printed (default: 10).
--seq_no: Sequence number for the run.
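Combining these, a hypothetical invocation that prints up to 100 rows and exports the result to a CSV file (the file name is illustrative):

```shell
$ pb query "select * from user_id_stitcher" --max_rows 100 -f results.csv
```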
validate
Validates aspects of the project and configuration.
Command
$ pb validate
This command runs various tests on the project-related configurations and validates them. This includes, but is not limited to, validating the project configuration, the privileges associated with the role specified in the site configuration of the project's connection, etc.
Subcommands
$ pb validate access
This runs tests on the role specified in the site configuration of the project's connection and validates that the role has privileges to access all the related objects in the warehouse. It throws an error if the role does not have privileges to access the input tables, or does not have permission to write the material output to the output schema.