Identity Stitching
Each model YAML is a dictionary with a list of resources, which maps resource names to resource configurations. Identity stitching is a YAML object with the following keys:
![../_images/ID-Stitcher-Diagram.png](../_images/ID-Stitcher-Diagram.png)
1. resources
The key with which you start the YAML model.
Type: object
2. <<output table name>>
Name of the ID stitched table created on the DW. Say if you define this as …
Type: object
3. entity_key
Set the value of this key to …
Type: string
4. resource_type
As we are doing ID stitching, set the value to …
Type: string
5. resource_spec
Give specifications for fetching data.
Type: object
Properties:
validity_time
Fetch data that was loaded up to this time duration. For example, to fetch all data that has been loaded in the last 1 day, set this value to 24h.
Type: time
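To make the validity_time window concrete, here is a minimal Python sketch of how a value such as 24h could be turned into a cutoff timestamp. It is only an illustration, not the tool's implementation: the function names `parse_duration` and `validity_cutoff` are hypothetical, and support for the `s` and `m` suffixes is an assumption (the text above only shows `24h`).

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes; only "h" appears in the documentation above.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text):
    """Convert a string such as '24h' into a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def validity_cutoff(validity_time, now):
    """Return the earliest load time that still falls inside the window."""
    return now - parse_duration(validity_time)


now = datetime(2023, 1, 2, 12, 0, tzinfo=timezone.utc)
cutoff = validity_cutoff("24h", now)
```

Under this reading, rows whose load timestamp is at or after `cutoff` fall within the last 1 day.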
6. id_stitching
The key … If you’re creating an ID Stitched table then remove this key.
Type: object
Properties: two entries, one of type string and one of type list.
Note
If this key is set, the next two key-value pairs (id_types and inputs) are listed in it.
7. id_types
types
This is a list of data types that will be fetched during ID stitching.
For instance, if you want to stitch together Salesforce ID, RudderStack anonymous ID, Email and Domain:
types:
  salesforce_id:
  rudder_anon_id:
  email:
  domain:
Note
This is not the same as database column type, such as char or int.
filters
For each type, you can add filters that define which data should be included in or excluded from the set of values.
For instance, to exclude empty values and the placeholder "na" from salesforce_id, write:
types:
  salesforce_id:
    filters:
      - type: exclude
        value: ""
      - type: exclude
        value: "na"
If you want email addresses in a proper format:
types:
  email:
    filters:
      - type: include
        regex: "[A-Za-z0-9+_.-]+@(.+)"
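The effect of the two filter examples above can be sketched in Python. This is an illustration only, not the product's code: the function `apply_filters` is hypothetical, and it assumes that `value` filters compare by exact equality and that `regex` filters must match the whole value.

```python
import re


def apply_filters(values, filters):
    """Keep only the values that pass every include/exclude filter."""
    kept = []
    for value in values:
        passes = True
        for flt in filters:
            if "regex" in flt:
                # Assumption: the regex has to match the entire value.
                matched = re.fullmatch(flt["regex"], value) is not None
            else:
                # Assumption: plain values are compared for exact equality.
                matched = value == flt["value"]
            if flt["type"] == "include" and not matched:
                passes = False
            elif flt["type"] == "exclude" and matched:
                passes = False
        if passes:
            kept.append(value)
    return kept


salesforce_filters = [
    {"type": "exclude", "value": ""},
    {"type": "exclude", "value": "na"},
]
email_filters = [{"type": "include", "regex": "[A-Za-z0-9+_.-]+@(.+)"}]

salesforce_ids = apply_filters(["0035e00000", "", "na"], salesforce_filters)
emails = apply_filters(["jane@example.com", "not-an-email"], email_filters)
```

With these sample inputs, the empty string and "na" are dropped from the Salesforce IDs, and only the value containing an "@" with text on both sides survives the email filter.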
8. inputs
The list of all inputs adds edges to known user ID signals (such as anon_id, email, etc.). Every input defines the expression that gives the identity of each row in the column.
Properties:

Give a name for the input model.
Type: string

Mention the table in the DW that you are referring to, along with the timestamp column. Usage: …
Type: string

List of all the IDs from the source table that are to be stitched, along with the column expression. You can add a list of all the same fields from this source (… conditions for joining them (…
Type: list

If the data is to be fetched as is, simply mention the column name. If an SQL expression is to be applied to the fetched data, write that expression.
Type: string

The kind of data it belongs to, as defined in types under id_types.
Type: string
Note
You can also refer to a table from another database or schema in the same DW. For example, ref: { table: AnotherDatabase.ThisSchema.ThatTable, timestampCol: activityDate }
Warning
The resultant expression or data in sql must evaluate to type string only.
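As a rough illustration of how edges from input rows can be stitched into identities, the following Python sketch groups IDs with a union-find structure. It is a generic connected-components technique used here as an assumption, not a description of how the ID stitcher is actually implemented; the function `stitch` and the sample rows are hypothetical. Every ID value is cast to a string, in line with the warning above.

```python
from collections import defaultdict


def stitch(rows):
    """Group IDs that co-occur in any row into clusters.

    rows: iterable of lists of (id_value, id_type) pairs found in one input row.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for ids in rows:
        # Cast every value to a string before adding it to the graph.
        ids = [(str(value), id_type) for value, id_type in ids]
        if not ids:
            continue
        first = ids[0]
        find(first)  # register the node even if the row has a single ID
        for other in ids[1:]:
            union(first, other)  # one edge per pair of IDs seen together

    clusters = defaultdict(set)
    for node in parent:
        clusters[find(node)].add(node)
    return list(clusters.values())


rows = [
    [("anon-1", "rudder_anon_id"), ("jane@example.com", "email")],
    [("jane@example.com", "email"), ("0035e", "salesforce_id")],
    [("anon-2", "rudder_anon_id")],
]
clusters = stitch(rows)
```

In this sample, the first two rows share an email, so their three IDs end up in one cluster, while anon-2 stays in a cluster of its own.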
Code Examples
See ID stitcher sample.