Feature Tables

The key with which you assign the values for creating a feature table. A list of features can be defined, with each feature having multiple tablevar and/or macro.:

- name: domain_profile
  model_type: feature_table_model
  model_spec:
    validity_time: 24h # 1 day
    id_stitching:
      model: domain_profile_id_stitcher
    features:
      - tablefeature:
          name: is_mql
          ref:
            input: salesForceLeadsTable
          value: max(case when salesForceLeadsTable.mql__c == 'True' then 1 else 0 end)
          description: Whether a domain is mql or not
      - tablevar:
          name: domainSummary_account_type
          ref:
            input: domainSummary
          value: domainSummary.account_type
      - macro:
          name: call_conversion
          value: type = 'Call' or tasksubtype='Call' or type like '%demo%') and \
                (rlike(lower(subject), '.*\\bcall\\b.*') or rlike(lower(subject), '.*\\bdemo\\b.*'
      - tablevar:
          name: salesCallEventTime
          ref:
            input: salesforceTasks
          value: min(activitydate)
          filter: macro(call_conversion)
      - feature:
          name: sales_conversion_timestamp
          value: salesCallEventTime
      - tablevar:
          name: salesEvents
          ref:
            input: salesforceTasks
          value: json_agg(activitydate, case when (type='Email' or tasksubtype = 'Email') then case when lower(subject) like '%[in]%' then 'sf_inbound_email' \
                else 'sf_outbound_email' end when macro(call_conversion) then 'sf_call' else null end as event)
          description: Salesforce touches are converted to one of following events - sf_inbound_email, sf_outbound_email, sf_call, null
      - tablevar:
          name: page_events_var
          ref:
            input: websitePageVisits
          filter: timestamp < sales_conversion_timestamp and timestamp > var('start_date')
          value: json_agg(timestamp, case when split_part(trim(context_page_path, '/'), '/', 1) in var('valid_page_paths') then split_part(trim(context_page_path, '/'), '/', 1) \
                else var('default_page_path') end as event)
      - feature:
          name: page_events
          value: page_events_var
      - tablevar:
          name: webhookFormSubmit
          ref:
            input: webhookSource
          filter: variable_1 is null and timestamp < sales_conversion_timestamp and timestamp > var('start_date')
          value:  min(timestamp)
      - feature:
          name: formSubmit
          value: min(webhookFormSubmit, websiteFormSubmit)

Let’s look at each field to know more about it.

name

Name of the feature table created on the DW. Say if you define this as feature_table_model

then the output table will be named something like Material_final_feature_table_<rest of generated table name>.

type

string

model_type

Define the types of models.

type

string

model_spec

Give specifications of ID stitching

type: list

properties

  • id_stitching

Defines the ID stitcher to be used.

type

string

  • validity_time

Fetch data which was loaded upto this time duration.

Say you want to fetch all data that’s been loaded upto last 1 day,

so please set this value to 24h.

type

time

  • features

The key with which you assign the values for creating a feature table.

A list of features can be defined, with each feature having multiple tablevar and/or macro.

properties:

type

list

tablevar and tablefeature

Here you define the source data, along with any conditions.tablevars are effectively helper

columns which ARE NOT included in the feature table tablefeatures ARE included in the output feature table.

properties

  • name

This is the name in which you’ll store the retrieved data.

type

string

  • ref

properties:

  • input: A reference to the source table from which data is to be fetched.

type

list

  • filter

Optional, in case you want to apply any conditions for fetching data (same as where clause in SQL).

type

string

  • value

The data that will be stored in the name.

type

string

  • default

Optional, a default value for any entities for which the calculated value would otherwise be NULL.

type

string

  • description

Optional, a textual description of what it does.

type

string

  • window

Optional, for specifying a window over which the value should be calculated.Note: You can have more than one tablevar, for defining a feature.

type

object

properties

  • order_by

Optional, list of sql expressions to use in ordering the data.

type

list

  • partition_by

Not allowed in tablevars / tablefeatures. These are already effectively partitioned by id stitched entity, and can not be partitioned any other way.

type

list

inputvar

With syntax identical to tablevar, the difference is that instead of each value being associated a row of the feature table,

it’s associated with a row of the specified input. Effectively, while a tablevar can be thought of as adding a helper column to the feature table,

an inputvar can be thought of as adding a helper column to the input.

properties

  • name

This is the name in which you’ll store the retrieved data.

type

string

  • ref

properties:

  • input: A reference to the source table from which data is to be fetched.

type

list

  • filter

Optional, in case you want to apply any conditions for fetching data (same as where clause in SQL).

type

string

  • value

The data that will be stored in the name.

type

string

  • default

Optional, a default value for any entities for which the calculated value would otherwise be NULL.

type

string

  • description

Optional, a textual description of what it does.

type

string

  • window

Optional, for specifying a window over which the value should be calculated.

type

object

properties

  • order_by

Optional, list of sql expressions to use in ordering the data.

type

list

  • partition_by

Optional, list of sql expressions to use in partitioning the data.

type

list

macro

An optional section, in case you wish to perform

a computation and store in a variable, to be referenced by a tablevar.

properties

  • name

The name in which you will store the calculated data.

type

string

  • value

Regular expression or SQL clause to calculate the value.

This section, if defined, proceedes tablevar.

type

string

feature

This is the computed feature.

properties

  • name

Define a name for the feature.

type

string

  • value

The value that will be stored in the feature.

It can be same as a ``tablevar` or any aggregation applied on it.

You can refer multiple tablevar or a `macro`.

type

string

  • filter

Optional, in case you want to apply any conditions for fetching data (same as where clause in SQL).

type

string

  • default

Optional, a default value for any entities for which the calculated value would otherwise be NULL.

type

string

  • description

Optional, a textual description of what it does.

type

string

  • window

Optional, for specifying a window over which the value should be calculated.

type

object

properties

  • order_by

Optional, list of sql expressions to use in ordering the data.

type

list

  • partition_by

Optional, list of sql expressions to use in partitioning the data.

type

list

Other changes

As we are creating a Feature Table, set the value of resource_type to feature_table_model.

resource_type: feature_table_model

Feature Table Code Examples

See Feature table sample.

Feature Tables ML

Models from ML Notebooks

You can take ML Notebooks in Jupyter and add ML/AI capabilities to it.

Feature Tables External

Integration with DBT

WHT supports DBT models so that customers on that platform can quickly start using the tool. Though WHT also does transformation like DBT, it empowers users who are already more comfortable in that platform.