Introduction

What is PB?

PB stands for Profile Builder. It allows you to transform the data in your warehouse, without writing complex SQL queries! Each PB model represents a transformation which takes an existing table or another model’s output as its input and uses the inputs to create an output table or view. SQL transformations are run within the warehouse while Python transformations may use ML Inference in notebooks running outside the warehouse.

../_images/IntroBlockDiagram.png

Current app version is v 0.10.6.

How can I use PB?

You can create a PB Project, which can then create ID stitcher or Entity Features using CLI or web app.

CLI

Following is the output of pb showcasing its various functionalities.

$ pb <command> <subcommand> [parameters]
Usage:
pb [command]

Available Commands:
cleanup     cleanup materials
compile     compile models
discover    discover models, sources, entities, features, materials
help        Help about any command
init        Create an initial pb project
insert      load sample data in warehouse
migrate     migrate a project schema to the latest version
query       run SQL query
run         run models
show        Show models in the project
validate    validate aspects of the project and configuration
version     print the version

Flags:
-h, --help   help for pb

Use "pb [command] --help" for more information about a command.

PB Project

Models contained in PB Project can be run using the command line tool to create initial projects, run the project, query the warehouse for sample outputs, discover features in the warehouse, etc.

PB Project is a collection of interdependant warehouse transformations. Each transformation is captured in a PB model. Various types of models are supported like ID stitching, entity features, custom models using Python. External transformations in Python are also on the roadmap.

Scheduling via web app

Using the command line tool, you can run PB projects any time you want. However, once your models are stable, you might want to schedule their invocations periodically(eg. everyday, every 12 hours, etc). That can be done by scheduling your PB run tasks on the RudderStack web app.

../_images/RS-WebApp.png

What can I achieve using PB?

ID Stitching

With ID Stitching, you can map the same ID across different platforms. For example, a website has multiple features and 3rd party tools like live chat, google analytics, SalesForce CRM etc. Let’s assume a user is not logged into live chat but is on SalesForce where they interacted with your sales team. After the data from all of these sources is loaded into your warehouse, the ID Stitching feature can tie together all of these different IDs to a single user.

Note

Anonymous ID is when you haven’t logged in, so a random number is generated in the Database. Non-anonymous ID is what is known like an email, phone number etc. So, after a user browses privately and then finally logs in, then PB will stitch that this is same ID. For more details, refer the page Identity Stitching.

Entity Features

Entity Features are generated based on events, user attributes, and other defined criteria.

One can define models that can create entity features reflecting a complete 360 data for users with id stitching, external sources, etc.

Model files are defined in YAML.

Note

Check out RudderStack’s YAML Refresher for a quick guide on base concepts, syntax, and best practices for writing code in YAML.

Materializations

The models described above, when run, will produce materials - that is, tables / views in the database that contain the results of that model run. There are two main run types:

  • discrete: The model result is calculated from it’s inputs whenever run (the default)

  • incremental: In this mode, model run reads updates from input sources and result from previous run. This makes incremental runs efficient. A good example is ID Stitcher, which builds upon graph from its previous run by incorporating recent data.

Data Sources

You can transform data from these sources:

  • Rudder EventStreams: These are loaded from Events.

  • ETL Extract: They are loaded from Cloud Extract.

  • External Tables: Your existing tables on the Warehouse, generated by other tools.

Supported Warehouses

Right now, we are supporting Snowflake, Redshift and Databricks, with more data warehouses in the pipeline.

Contact Us

In case you’re facing any issues with PB or want to talk to our team, feel free to contact us.