Introduction
What is WHT?
WHT stands for WareHouse Transformations. It lets you transform the data in your warehouse without writing complex SQL queries. Each WHT model represents a transformation that takes an existing table or another model’s output as its input and produces an output table or view. SQL transformations run inside the warehouse, while Python transformations may use ML inference in notebooks running outside the warehouse.
![../_images/IntroBlockDiagram.png](../_images/IntroBlockDiagram.png)
How can I use WHT?
WHT Project
A WHT project is a collection of interdependent warehouse transformations. Each transformation is captured in a WHT model. Supported model types include ID stitching, feature tables, and sessionisation (TBD). External transformations in Python are also on the roadmap.
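Because a model can take another model's output as its input, the models in a project form a dependency graph, and each model can only run once its inputs exist. The snippet below is a minimal sketch of that idea using Python's standard library. It is not WHT's scheduler; the model and table names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical project: each model maps to the set of inputs it reads.
# Raw warehouse tables ("tracks", "identifies") have no inputs of their own.
dependencies = {
    "id_stitcher": {"tracks", "identifies"},
    "feature_table": {"id_stitcher", "tracks"},
}


def run_order(deps):
    """Return one valid order in which every input comes before its consumer."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Several orders can be valid for the same graph. The only guarantee is that an input always appears before any model that depends on it.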
CLI
The models in a WHT project are run with a command-line tool. The same tool can create initial projects, run a project, query the warehouse for sample outputs, discover features in the warehouse, and so on.
$ wht <command> <subcommand> [parameters]

Usage:
  wht [command]

Available Commands:
  compile     compile models
  discover    discover models, sources, entities, features
  help        Help about any command
  init        Create an initial WHT project or ML project
  run         run models

Flags:
  -h, --help   help for wht

Use "wht [command] --help" for more information about a command.
Scheduling via web app
Using the command line tool, you can run WHT projects any time you want. However, once your models are stable, you might want to schedule their invocations periodically, say every day. That can be done by scheduling your WHT run tasks on the RudderStack web app (TBD).
![../_images/RS-WebApp.png](../_images/RS-WebApp.png)
ID Stitching
With ID stitching, you can map the same user's IDs across different platforms. For example, a website might use live chat, Google Analytics, Salesforce CRM, and so on. A user may not be logged in to live chat but may exist in Salesforce because they talked to your sales team. Once data from all these sources is loaded into the warehouse, ID stitching can tie these different IDs together to a single user.
ID stitching is configured by creating model files in YAML, in which you specify the fields that are the same across different sources and the conditions for joining them.
Note
An anonymous ID is used when a user has not logged in: a random identifier is generated and stored in the database. A non-anonymous ID is a known identifier such as an email address or phone number. So when a user browses without logging in and later logs in, WHT stitches the two IDs together as the same user.
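Conceptually, stitching amounts to grouping IDs that are linked directly or through a chain of other IDs. The snippet below is a minimal sketch of that idea using a union-find structure in plain Python. It is an illustration only, not how WHT implements stitching (which runs in the warehouse from YAML models); the function name, ID prefixes, and sample values are hypothetical.

```python
def stitch_ids(edges):
    """Group IDs that are linked, directly or transitively, into identities.

    `edges` is an iterable of (id_a, id_b) pairs seen together, for example
    an anonymous ID and the email address from the same login event.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        # Path halving: point each visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical observations from three sources.
edges = [
    ("anon:1a2b", "email:ann@example.com"),  # website visitor logs in
    ("email:ann@example.com", "sf:0031"),    # same email on a Salesforce contact
    ("anon:9z8y", "chat:777"),               # live chat visitor, never logged in
]
identities = stitch_ids(edges)
```

Here the first two pairs share an email address, so the anonymous ID, the email, and the Salesforce ID end up in one group, while the live chat visitor stays a separate identity.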
Customer 360
Customer 360 is an application that brings the traits of each user together in a single table. WHT lets you build a customer 360 from various sources, including feature table YAML, ML notebooks, and external sources.
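As a rough illustration of "one row per user", the sketch below merges per-user traits from several sources into a single table keyed by user ID. It uses plain Python dictionaries rather than warehouse tables, and the function name, trait names, and the rule that later sources win on conflicts are all assumptions made for this example.

```python
def build_customer_360(*sources):
    """Merge per-user trait dicts from several sources into one row per user.

    Each source maps user_id -> {trait_name: value}. If two sources define
    the same trait, the later source wins (an assumption of this sketch).
    """
    table = {}
    for source in sources:
        for user_id, traits in source.items():
            table.setdefault(user_id, {}).update(traits)
    return table


# Hypothetical traits, keyed by an already-stitched user ID.
feature_table = {"u1": {"days_active": 12}, "u2": {"days_active": 3}}
ml_scores = {"u1": {"churn_score": 0.18}}

customer_360 = build_customer_360(feature_table, ml_scores)
```

A user that appears in only one source still gets a row; it simply has fewer traits.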
AI/ML using PyWHT
PyWHT is a Python package that brings the feature table and feature discovery capabilities of WHT into Python notebooks. This includes discovering and retrieving features and their metadata in ML notebooks. ML notebooks can train on the input features and also run inference; they can then write the results back to the warehouse using PyWHT, where the results become discoverable as part of customer 360.
Data Sources
You can run transformations on these data sources:
Rudder EventStreams: These are loaded from Events.
ETL Extract: These are loaded from Cloud Extract.
External Tables: Your existing tables in the warehouse, generated by other tools such as dbt and Airbyte.
Supported Warehouses
Currently, only Snowflake is supported; support for more data warehouses is planned.
Note
The same WHT project can be loaded by an ML notebook. This feature is currently a work in progress.