Identity Stitching

Each model YAML is a dictionary whose resources key maps resource names to resource configurations. An identity stitching resource is a YAML object with the following keys:

Figure: ID Stitcher diagram

1. resources

The key with which you start the YAML model.

Type: object

2. <<output table name>>

Name of the ID stitched table created on the DW. For example, if you name it final_id_stitcher, the output table will be named something like Material_final_id_stitcher_<rest of generated table name>.

Type: object

3. entity_key

Set the value of this key to user.

Type: string

4. resource_type

For ID stitching, set the value to id_stitcher.

Type: string
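Putting keys 1 to 4 together, the top of an ID stitcher model might look like the sketch below. It assumes the nesting implied by the list above (resources maps the output table name to its configuration); final_id_stitcher is the example name from key 2, and resource_spec is left empty as a placeholder for the keys described next.

```yaml
# Sketch only: assumed top-level layout of an ID stitcher model
resources:
  final_id_stitcher:             # output table name (key 2)
    entity_key: user
    resource_type: id_stitcher
    resource_spec: {}            # placeholder, filled in with the keys below
```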

5. resource_spec

Specifications for fetching the data.

Type: object

Properties:

  • validity_time (type: time)
  • id_stitching (type: object)

validity_time

Only data loaded within this duration is fetched. For example, to fetch all data loaded in the last day, set this value to 24h.

Type: time
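As a sketch, a resource_spec that fetches only the last day of data would begin like this; its other keys are omitted here.

```yaml
resource_spec:
  validity_time: 24h   # fetch only data loaded in the last 24 hours
```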

6. id_stitching

Include the id_stitching key only if you are creating a feature table (described in the next section). If you are creating an ID stitched table, remove this key.

Type: object

Properties:

  • id_types (type: string)
  • inputs (type: list)

Note

If this key is set, the next two keys (id_types and inputs) are nested inside it.
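Following the note above, a sketch of the nesting when id_stitching is set is shown below. It assumes id_stitching sits under resource_spec, as listed in that key's properties; the email type and the empty inputs list are placeholders for the contents described in the next two sections.

```yaml
resource_spec:
  validity_time: 24h
  id_stitching:
    id_types:
      types:
        email:        # placeholder ID type
    inputs: []        # placeholder; input entries are described under inputs
```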

7. id_types

types

This is a list of the ID types that will be fetched during ID stitching.

For instance, if you want to stitch together Salesforce ID, RudderStack anonymous ID, Email and Domain:

types:
 salesforce_id:
 rudder_anon_id:
 email:
 domain:

Note

This is not the same as a database column type, such as char or int.

filters

For each type, you can add filters that define which values should be included in or excluded from the set of values. For instance, to exclude empty values and the placeholder value "na" from salesforce_id, write:

types:
  salesforce_id:
    filters:
      - type: exclude
        value: ""
      - type: exclude
        value: "na"

To include only values that look like properly formatted email addresses:

types:
  email:
    filters:
      - type: include
        regex: "[A-Za-z0-9+_.-]+@(.+)"
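Filters for several types can live in the same types map. The sketch below simply merges the two filter examples above with the four types from the earlier example, assuming filters are nested under each type name; types without filters are left as bare keys.

```yaml
types:
  salesforce_id:
    filters:
      - type: exclude      # drop empty values
        value: ""
      - type: exclude      # drop the "na" placeholder
        value: "na"
  rudder_anon_id:
  email:
    filters:
      - type: include      # keep only email-shaped values
        regex: "[A-Za-z0-9+_.-]+@(.+)"
  domain:
```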

8. inputs

inputs lists every input from which edges are added between known user ID signals (such as anon_id and email). Each input defines the expressions that give the identity of each of its rows.

properties

  • name

Give a name for the input model.

Type: string

  • ref

The table in the DW that the input refers to, along with its timestamp column.

Usage: ref: { table: <name of source table>, timestampCol: <name of timestamp column in the table> }

Type: string

  • ids

List of all the IDs from the source table that are to be stitched, each with its column expression. For each ID, give its kind (type) and the expression that produces it (sql).

Type: list

  • sql

If the data is to be fetched as is, give just the column name. If an SQL expression is to be applied to the fetched data, write the expression instead.

Type: string

  • type

The ID type the value belongs to, as defined in types under id_types.

Type: string
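A single entry in inputs, combining the properties above, might look like the sketch below. The input name, table and column names (identifies_input, identifies, timestamp, anonymous_id, email) are hypothetical; the type values reuse ID types from the earlier types example, and lower(...) illustrates an SQL expression applied to a column rather than a column fetched as is.

```yaml
inputs:
  - name: identifies_input                              # hypothetical input name
    ref: { table: identifies, timestampCol: timestamp } # hypothetical table and column
    ids:
      - sql: anonymous_id       # column fetched as is
        type: rudder_anon_id
      - sql: lower(email)       # SQL expression applied to the column
        type: email
```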

Note

You can also refer to a table in another database or schema in the same DW. For example, ref: { table: AnotherDatabase.ThisSchema.ThatTable, timestampCol: activityDate }

Warning

The column or expression given in sql must evaluate to a string.
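For instance, if a source column holds numeric IDs, casting it in sql keeps the result a string. In this sketch, sf_id_number is a hypothetical numeric column; CAST(... AS VARCHAR) works in warehouses such as Snowflake and Redshift, while BigQuery uses STRING as the target type, so adjust it for your DW.

```yaml
ids:
  - sql: CAST(sf_id_number AS VARCHAR)   # hypothetical numeric column cast to a string
    type: salesforce_id
```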

Code Examples

See ID stitcher sample.