Identity Stitching

Before starting with Identity Stitching, let’s understand what an Identity is. An Identity is a trait or feature that defines a user. When a user visits a website, the website owner tracks the user’s activity with various analytics tools, such as Google Analytics. The data these tools produce is crucial for product enhancement and user engagement. However, it is stored in various warehouses, and clubbing it together into a single entity is tedious and hard.

Fig. 1 A single identity created from multiple different identities

Identity Stitching addresses this problem by tying these different identities together, whenever possible, in a privacy-preserving way. To define an ID Stitcher for an entity, we first need to understand the YAML snippet below. Its fields and values define the configuration for creating the ID Stitcher view.

models:
  - name: domain_profile_id_stitcher
    model_type: id_stitcher
    model_spec:
      validity_time: 24h # 1 day
      entity_key: user
      main_id_type: main_id
      edge_sources:
        - input: salesforceTasks
        - input: salesforceContact
        - input: websitePageVisits
        - input: webhookSource
        - input: websiteSource

Let’s look at each field in turn.

name

The name of the view that the ID Stitcher will create. For example, if you set it to final_id_stitcher, the output view will be named something like Material_final_id_stitcher_<rest of generated table name>.

type

string
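Because only the prefix of the generated view name is predictable, locating the view in a warehouse comes down to a prefix match. The helpers below are a hypothetical sketch that assumes the Material_<name>_<generated suffix> pattern described above; they do not interpret the generated suffix, and the sample table names are made up.

```python
def is_stitcher_view(table_name: str, model_name: str) -> bool:
    """Heuristically check whether table_name is the view generated for model_name.

    Assumes the pattern Material_<model name>_<generated suffix>. The comparison
    is case-insensitive because some warehouses (Snowflake, for example) store
    unquoted identifiers in upper case.
    """
    prefix = f"material_{model_name}_".lower()
    name = table_name.lower()
    # Require a non-empty generated suffix after the prefix.
    return name.startswith(prefix) and len(name) > len(prefix)


def find_stitcher_views(table_names: list[str], model_name: str) -> list[str]:
    """Keep only the table names that look like views generated for model_name."""
    return [t for t in table_names if is_stitcher_view(t, model_name)]


tables = ["Material_final_id_stitcher_x1", "Material_user_features_x2", "orders"]
matches = find_stitcher_views(tables, "final_id_stitcher")
```

A prefix match cannot tell final_id_stitcher apart from a model whose name merely starts with it (say, final_id_stitcher_v2), so giving models distinct names keeps this kind of lookup unambiguous.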

model_type

For an ID Stitcher, we set the model type to id_stitcher.

This field distinguishes an ID Stitcher from a feature table.

type

string

model_spec

This section depends on the input and wht_project YAML files.

Some fields have the type InputRef or WhtProjectRef, which indicates that they reference definitions in the input YAML or the wht_project YAML, respectively.

Here we specify the entity, validity time, primary ID, and input sources for the ID Stitcher view.

type

object
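Because entity_key (a WhtProjectRef) and each edge source input (an InputRef) point into other files, a model definition is only valid if those names resolve. The function below is a hypothetical sketch of that cross-file check. It assumes the model has already been parsed into a dict mirroring the YAML snippet, and uses plain sets to stand in for the entities of the wht_project yaml and the inputs of the input yaml; how the real tool resolves these references is not covered here.

```python
from typing import Any


def check_stitcher_refs(
    model: dict[str, Any],
    project_entities: set[str],
    input_names: set[str],
) -> list[str]:
    """Return human-readable problems found in an id_stitcher model definition."""
    problems: list[str] = []
    if model.get("model_type") != "id_stitcher":
        problems.append(f"model_type is {model.get('model_type')!r}, expected 'id_stitcher'")
    spec = model.get("model_spec", {})

    # entity_key is a WhtProjectRef: it must name an entity from the project file.
    entity = spec.get("entity_key")
    if entity not in project_entities:
        problems.append(f"entity_key {entity!r} is not defined in the wht_project yaml")

    # Each edge source is an InputRef: it must name an input from the input yaml.
    for source in spec.get("edge_sources", []):
        name = source.get("input")
        if name not in input_names:
            problems.append(f"edge source {name!r} is not defined in the input yaml")
    return problems


# A trimmed copy of the YAML snippet above, written as a Python dict.
model = {
    "name": "domain_profile_id_stitcher",
    "model_type": "id_stitcher",
    "model_spec": {
        "validity_time": "24h",
        "entity_key": "user",
        "main_id_type": "main_id",
        "edge_sources": [{"input": "salesforceTasks"}, {"input": "websitePageVisits"}],
    },
}
```

Collecting every problem instead of stopping at the first one lets a single run report all missing references at once.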

properties

  • entity_key

The name of the entity, as defined in the wht_project yaml.

type

WhtProjectRef

  • validity_time

Fetch data that was loaded within this time duration.

For example, to fetch all data loaded within the last day, set this value to 24h.

type

time

  • main_id_type

Name of the primary ID for the ID Stitcher view.

type

string

  • edge_sources

The input yaml specifies the different sources of input tables. The ID Stitcher uses these sources to create its view.

Here we list all the inputs whose identities need to be clubbed together.

properties:

  • input: The name of a single input source defined in input.yaml.

type

InputRef
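To make the validity_time setting concrete, the sketch below turns a duration string such as 24h into a time window and keeps only rows loaded inside it, following the description of validity_time above. The accepted unit suffixes (s, m, h), the loaded_at column, and the sample rows are assumptions for illustration; the real parser may accept other formats.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes. Only "h" appears in the YAML example.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_validity_time(value: str) -> timedelta:
    """Turn a duration string such as "24h" into a timedelta."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip())
    if match is None:
        raise ValueError(f"unsupported validity_time: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def loaded_within(rows: list[dict], validity_time: str, now: datetime) -> list[dict]:
    """Keep rows whose (hypothetical) loaded_at lies inside the window ending at now."""
    cutoff = now - parse_validity_time(validity_time)
    return [row for row in rows if row["loaded_at"] >= cutoff]


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    {"id": "a", "loaded_at": datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc)},  # 12 hours earlier
    {"id": "b", "loaded_at": datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)},  # 30 hours earlier
]
recent = loaded_within(rows, "24h", now)  # keeps only row "a"
```

Using timezone-aware datetimes avoids comparing naive and aware timestamps, which Python rejects with a TypeError.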