Introduction

What is WHT?

WHT stands for WareHouse Transformations. It lets you transform the data in your warehouse without writing complex SQL queries! Each WHT model represents a transformation that takes existing tables or other models' outputs as inputs and uses them to create an output table or view. SQL transformations run within the warehouse, while Python transformations may perform ML inference in notebooks running outside the warehouse.

../_images/IntroBlockDiagram.png

How can I use WHT?

WHT Project

A WHT Project is a collection of interdependent warehouse transformations. Each transformation is captured in a WHT model. Several model types are supported, such as ID stitching, feature tables, and sessionisation (TBD). External transformations in Python are also on the roadmap.
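Because models consume each other's outputs, a project has to run them in dependency order. The following is a minimal sketch of that idea, not WHT's actual scheduler: the model names and the dependency map are hypothetical, and it simply uses Python's standard graphlib to order models so that each one runs after every model it reads from.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each model maps to the models whose output it reads.
# Existing warehouse tables are not models, so they are not listed as dependencies.
MODEL_DEPENDENCIES: dict[str, set[str]] = {
    "id_stitcher": set(),                   # reads only existing warehouse tables
    "user_features": {"id_stitcher"},       # reads the stitched ID table
    "churn_scores": {"user_features"},      # e.g. an ML inference step
    "customer_360": {"user_features", "churn_scores"},
}


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return model names so that every model comes after its dependencies.

    Raises graphlib.CycleError (a ValueError subclass) if models depend on
    each other in a loop, since such a project could never be run.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for model in run_order(MODEL_DEPENDENCIES):
        print(f"running {model}")
```

For this hypothetical project the only valid order is id_stitcher, user_features, churn_scores, customer_360.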

CLI

The wht command-line tool lets you create initial projects, run the models in a project, query the warehouse for sample outputs, discover features in the warehouse, and more.

$ wht <command> <subcommand> [parameters]
Usage:
wht [command]

Available Commands:
compile     compile models
discover    discover models, sources, entities, features
help        Help about any command
init        Create an initial WHT project or ML project
run         run models

Flags:
-h, --help   help for wht

Use "wht [command] --help" for more information about a command.

Scheduling via web app

Using the command-line tool, you can run WHT projects whenever you want. However, once your models are stable, you may want to run them on a schedule, for example daily. You can do that by scheduling your WHT run tasks in the RudderStack web app (TBD).

../_images/RS-WebApp.png

ID Stitching

ID Stitching lets you map the different IDs that a single user has across platforms to that one user. For example, a website may use live chat, Google Analytics, Salesforce CRM, and so on. A user may not be logged in to the live chat but may still appear in Salesforce because they talked to your sales team. Once data from all these sources is loaded into the warehouse, ID Stitching can tie all these different IDs to a single user.

You set up ID Stitching by creating model files in YAML, in which you specify the fields that are the same across different sources and the conditions for joining them.

Note

An anonymous ID is used when a user hasn't logged in: a random identifier is generated for them and stored in the database. A non-anonymous ID is a known identifier, such as an email address or phone number. So if a user browses anonymously and later logs in, WHT stitches the anonymous and known IDs together as the same user.
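Conceptually, ID stitching can be seen as finding connected groups of identifiers: whenever two IDs are observed together (for example, an anonymous ID and an email in the same login event), they belong to the same user, and this holds transitively. The sketch below illustrates that idea with a union-find structure on made-up sample IDs. It is not WHT's implementation, and the class name, method names and ID values are all hypothetical.

```python
class IdStitcher:
    """Union-find over identifiers: IDs observed together end up in one group."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, node: str) -> str:
        # Register an unseen ID as its own single-member group.
        self._parent.setdefault(node, node)
        root = node
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every ID on the path directly at the root.
        while self._parent[node] != root:
            next_node = self._parent[node]
            self._parent[node] = root
            node = next_node
        return root

    def link(self, a: str, b: str) -> None:
        """Record that two identifiers were seen together for the same user."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def groups(self) -> list[set[str]]:
        """Return each stitched user as the set of all their identifiers."""
        clusters: dict[str, set[str]] = {}
        for node in list(self._parent):
            clusters.setdefault(self._find(node), set()).add(node)
        return list(clusters.values())


if __name__ == "__main__":
    stitcher = IdStitcher()
    stitcher.link("anon-7f3a", "alice@example.com")    # anonymous browsing, then login
    stitcher.link("alice@example.com", "sf-lead-001")  # same email found in the CRM
    stitcher.link("anon-91c2", "bob@example.com")
    for user_ids in stitcher.groups():
        print(sorted(user_ids))
```

Here the anonymous ID, the email and the CRM ID collapse into one user even though the anonymous ID and the CRM ID were never seen together directly.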

Customer 360

Customer 360 is an application that brings together the traits of each user in a single table. WHT allows you to build a customer 360 view from various sources, including feature table YAML, ML notebooks, and external sources.
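To make "a single table" concrete, here is a minimal in-memory sketch that merges per-user traits from several sources into one row per user. It assumes every source already keys its rows by a stitched user ID. The function name, source names, trait names and the rule that a later source wins when two sources set the same trait are hypothetical choices for illustration, not documented WHT behaviour.

```python
from typing import Any

Row = dict[str, Any]


def build_customer_360(sources: dict[str, dict[str, Row]]) -> dict[str, Row]:
    """Merge per-user traits from several sources into one row per user.

    `sources` maps a source name to {user_id: {trait: value}}. Sources are
    applied in insertion order, so a later source overwrites an earlier one
    when both set the same trait (an assumed conflict rule).
    """
    table: dict[str, Row] = {}
    for rows in sources.values():
        for user_id, traits in rows.items():
            # Start each user's row with their ID, then layer traits on top.
            table.setdefault(user_id, {"user_id": user_id}).update(traits)
    return table


if __name__ == "__main__":
    c360 = build_customer_360({
        "feature_table": {"u1": {"total_orders": 3}, "u2": {"total_orders": 1}},
        "ml_notebook": {"u1": {"churn_score": 0.12}},
        "external_crm": {"u2": {"plan": "enterprise"}},
    })
    for row in c360.values():
        print(row)
```

In a real warehouse this merge would be a join on the stitched ID rather than Python dictionaries; the sketch only shows the shape of the result.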

AI/ML using PyWHT

PyWHT is a Python package that brings WHT's feature table and feature discovery capabilities into Python notebooks. It lets you discover and retrieve features and their metadata in ML notebooks. ML notebooks can train on these features and run inference, then use PyWHT to write the results back to the warehouse, where they in turn become discoverable as part of customer 360.

Data Sources

You can run transformations on the following data sources:

  • Rudder EventStreams: tables loaded from event data.

  • ETL Extract: tables loaded by Cloud Extract.

  • External Tables: existing tables in your warehouse generated by other tools, such as dbt and Airbyte.

Supported Warehouses

WHT currently supports Snowflake, with support for more data warehouses planned.

Note

The same WHT project can also be loaded into an ML notebook. This feature is currently a work in progress.