Getting Started#

This page shows you how to get up and running with tdprepview in minutes.


Get a Free Demo Environment#

To try tdprepview, you’ll need:

  • A Python client
  • A Teradata Vantage instance

Options:

  • Run locally with Vantage Express
  • Use any hosted Teradata environment (≥ 17.20.x works)
  • Easiest option: sign up for the free Clearscape Analytics Experience
    → You’ll get both a pre-configured Python environment and a cloud Vantage instance.

Installation#

Install tdprepview from PyPI:

pip install tdprepview --upgrade

After installation, restart your Jupyter notebook kernel or Python session.
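
To confirm that the restarted session picks up the package, a quick check like the following can help. This is a minimal sketch using only the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this example and is not part of tdprepview.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string after a successful install, otherwise None.
print("tdprepview:", installed_version("tdprepview"))
```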

Links:
- tdprepview on PyPI


Code Example#

Understand the power of tdprepview in 7 steps
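
The example that follows connects with `tdml.create_context(host=..., username=..., password=...)` and leaves the actual values elided. One way to avoid hard-coding credentials in a notebook is to read them from environment variables. The sketch below assumes three hypothetical variable names (`TD_HOST`, `TD_USERNAME`, `TD_PASSWORD`); the helper `connection_kwargs` is illustrative and not part of teradataml or tdprepview.

```python
import os


def connection_kwargs(env=os.environ):
    """Collect create_context arguments from environment variables.

    TD_HOST, TD_USERNAME and TD_PASSWORD are hypothetical names chosen
    for this sketch; use whatever your environment defines.
    """
    names = {"host": "TD_HOST", "username": "TD_USERNAME", "password": "TD_PASSWORD"}
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in names.items()}


# Usage (requires teradataml and a reachable Vantage instance):
# import teradataml as tdml
# tdml.create_context(**connection_kwargs())
```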

import teradataml as tdml
tdml.create_context(host=..., username=..., password=...)

DF_train_raw = tdml.DataFrame("churn_raw_trainig")
DF_train_raw

Output:

customerid exited surname creditscore geography gender age tenure balance hascrcard isactivemember estimatedsalary bank_products
15768104 0 Wright 788 Spain Male 37.0 8 141541.25 0 0 66013.27 RetirementAccount
15809826 1 Craigie 728 France Female 46.0 2 109705.52 1 0 20276.87 PersonalLoan
15717736 0 Dr. Shen 639 Germany Female 46.0 10 110031.09 1 1 133995.59 CertificateOfDeposit,CheckingAccount
15748589 0 Winter 736 France Female 30.0 9 0.0 1 0 34180.33 CreditCard,HomeEquityLoan
15704053 1 T'ang 710 Spain Male 62.0 3 131078.42 1 0 119348.76 RetirementAccount,PersonalLoan
15806808 1 Hope 834 Germany Female None 8 112281.60 1 0 140225.14 CheckingAccount,AutoLoan,RetirementAccount
15694530 0 Porter 672 France Male 28.0 4 167268.98 1 1 169469.30 HomeEquityLoan
15712903 0 Diaz 499 France Female 21.0 3 176511.08 1 1 153920.22 InvestmentFund
15791045 0 Boni 568 France Female 38.0 3 132951.92 0 1 124486.28 RetirementAccount
15803790 0 Allen 638 Germany Male 37.0 2 89728.86 1 1 37294.88 CertificateOfDeposit,PersonalLoan

import tdprepview

# each step pairs a column selection with the preprocessor applied to it
steps = [
    (['creditscore', 'age', 'tenure', 'balance', 'estimatedsalary'],
        tdprepview.SimpleImputer(strategy='mean')),

    (['geography', 'gender', 'bank_products'],
        tdprepview.ImputeText(kind='mode')),

    (['hascrcard', 'isactivemember'],
        tdprepview.SimpleImputer(strategy='most_frequent')),

    (['creditscore', 'age', 'tenure', 'balance', 'estimatedsalary'],
        tdprepview.MinMaxScaler()),

    (['age'],
        tdprepview.PowerTransformer(method='yeo-johnson')),

    (['geography'],
        tdprepview.OneHotEncoder(max_categories=20)),

    (['gender'],
        tdprepview.LabelEncoder(elements='TOP1')),

    (['bank_products'],
        tdprepview.MultiLabelBinarizer(delimiter=',', max_categories=20)),

    ({'columns_exclude': ['customerid', 'exited', 'surname']},
        tdprepview.Cast(new_type='FLOAT'))
]

pl = tdprepview.Pipeline(steps)
pl.fit(DF_train_raw)

Output:

Fitting started.
--------------------------------
Step 1 / 9 completed: Impute on ['creditscore', 'age', 'tenure', 'balance', 'estimatedsalary']
Step 2 / 9 completed: ImputeText on ['geography', 'gender', 'bank_products']
Step 3 / 9 completed: Impute on ['hascrcard', 'isactivemember']
Step 4 / 9 completed: Scale on ['creditscore', 'age', 'tenure', 'balance', 'estimatedsalary']
Step 5 / 9 completed: PowerTransformer on ['age']
Step 6 / 9 completed: OneHotEncoder on ['geography']
Step 7 / 9 completed: LabelEncoder on ['gender']
Step 8 / 9 completed: MultiLabelBinarizer on ['bank_products']
Step 9 / 9 completed: Cast on {'columns_exclude': ['customerid', 'exited', 'surname']}
--------------------------------
Fitting completed.
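
Fitting computes the statistics each step needs (means, modes, minima, maxima), and these later show up as constants in the generated SQL. As a rough illustration of that idea, and not tdprepview's actual implementation, mean imputation and min-max scaling can each be reduced to a SQL expression built from fitted values. The function names are invented for this sketch; the output format loosely mirrors the `COALESCE` and `NULLIF` expressions in the query printed further down this page.

```python
def impute_sql(column: str, fill_value) -> str:
    """SQL expression replacing NULLs in `column` with a fitted constant."""
    if isinstance(fill_value, str):
        # Quote text values and escape embedded single quotes.
        literal = "'" + fill_value.replace("'", "''") + "'"
    else:
        literal = str(fill_value)
    return f"COALESCE( {column} , {literal} )"


def min_max_sql(column: str, minimum: float, maximum: float) -> str:
    """SQL expression scaling `column` to [0, 1]; NULLIF guards a zero range."""
    return f"( ( {column} ) - {minimum} ) / NULLIF( {maximum - minimum} , 0)"


# Constants taken from the generated query on this page; the maximum of
# 850.0 is implied by the minimum of 350.0 and the range of 500.0.
print(impute_sql("c_i_3", 650.7018))
print(impute_sql("c_i_4", "France"))
print(min_max_sql("c_i_13", 350.0, 850.0))
```

Because every fitted value ends up as a literal in the query, the transformation can run entirely in the database without calling back into Python.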

fig = pl.plot_sankey()
fig

Output: [Figure: Sankey diagram of the fitted pipeline]

# 3. inspect the transformed training dataset
DF_train_transformed = pl.transform(DF_train_raw)  # <- note the similarity to sklearn!
DF_train_transformed

Output:

customerid exited surname creditscore geography__OHE_1_France geography__OHE_2_Germany geography__OHE_3_Spain geography__OHE_0_otherwise gender age tenure balance hascrcard isactivemember estimatedsalary bank_products__MLB_1_MortgageLoan bank_products__MLB_2_RetirementAccount bank_products__MLB_3_CreditCard bank_products__MLB_4_InvestmentFund bank_products__MLB_5_HomeEquityLoan bank_products__MLB_6_PersonalLoan bank_products__MLB_7_CheckingAccount bank_products__MLB_8_SavingsAccount bank_products__MLB_9_CertificateOfDeposit bank_products__MLB_10_AutoLoan
15717736 0 Dr. Shen 0.578 0.0 1.0 0.0 0.0 0.0 0.23790555341066777 1.0 0.4385489168710325 1.0 1.0 0.6701150534805628 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0
15791045 0 Boni 0.436 1.0 0.0 0.0 0.0 0.0 0.19079254093235418 0.3 0.5299040526811483 0.0 1.0 0.6225546634070515 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
15704053 1 T'ang 0.72 0.0 0.0 1.0 0.0 1.0 0.30525780332841196 0.3 0.5224368777603339 1.0 0.0 0.5968595861395666 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
15806808 1 Hope 0.968 0.0 1.0 0.0 0.0 0.0 0.19651612931765508 0.8 0.4475187337010524 1.0 0.0 0.7012718701142033 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0
15712903 0 Diaz 0.298 1.0 0.0 0.0 0.0 0.0 0.03822566855778497 0.3 0.703517005509408 1.0 1.0 0.7697672022558565 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
15706602 0 Bates 0.82 0.0 0.0 1.0 0.0 0.0 0.15475860151334106 0.1 0.47076594043557923 0.0 1.0 0.7834711401017695 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0
15797081 1 Ajuluchukwu 0.522 0.0 1.0 0.0 0.0 0.0 0.2528863151567174 0.9 0.4603004964963864 1.0 1.0 0.6934279375298211 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0
15768104 0 Wright 0.876 0.0 0.0 1.0 0.0 1.0 0.18404331957205763 0.8 0.5641383892504567 0.0 0.0 0.3301045104125301 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
15694530 0 Porter 0.644 1.0 0.0 0.0 0.0 1.0 0.11217587124389981 0.4 0.6666809354076416 1.0 1.0 0.847535232752731 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
15748589 0 Winter 0.772 1.0 0.0 0.0 0.0 0.0 0.13009228671990822 0.9 0.0 1.0 0.0 0.1708934800026808 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
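
The `bank_products__MLB_*` columns above turn a comma-delimited string into one indicator per product. The generated query does this with a `POSITION` test on the delimiter-wrapped value, which matches whole items only. A plain-Python equivalent of that test, with a function name made up for this sketch:

```python
def has_label(value: str, label: str, delimiter: str = ",") -> int:
    """Return 1 if `label` is one of the delimited items in `value`, else 0.

    Wrapping both sides in the delimiter makes the substring test match
    whole items only.
    """
    return int(f"{delimiter}{label}{delimiter}" in f"{delimiter}{value}{delimiter}")


row = "CertificateOfDeposit,CheckingAccount"
print(has_label(row, "CheckingAccount"))  # 1
print(has_label(row, "Account"))          # 0, partial names do not match
```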

# inspect the SQL query behind the transformed DataFrame
DF_train_transformed.show_query()

Output:

WITH preprocessing_steps AS
    (
        SELECT
        customerid AS c_i_0,
        exited AS c_i_1,
        surname AS c_i_2,
        creditscore AS c_i_3,
        geography AS c_i_4,
        gender AS c_i_5,
        age AS c_i_6,
        tenure AS c_i_7,
        balance AS c_i_8,
        hascrcard AS c_i_9,
        isactivemember AS c_i_10,
        estimatedsalary AS c_i_11,
        bank_products AS c_i_12,
        COALESCE( c_i_3 , 650.7018 ) AS c_i_13,
        COALESCE( c_i_4 , 'France' ) AS c_i_18,
        COALESCE( c_i_5 , 'Male' ) AS c_i_19,
        COALESCE( c_i_6 , 38.87352 ) AS c_i_14,
        COALESCE( c_i_7 , 4.9726 ) AS c_i_15,
        COALESCE( c_i_8 , 75582.44 ) AS c_i_16,
        COALESCE( c_i_9 , 1.0 ) AS c_i_21,
        COALESCE( c_i_10 , 1.0 ) AS c_i_22,
        COALESCE( c_i_11 , 100690.6 ) AS c_i_17,
        COALESCE( c_i_12 , 'RetirementAccount' ) AS c_i_20,
         ( ( c_i_13 )  - 350.0 ) / NULLIF( 500.0 , 0)  AS c_i_23,
        CASE c_i_18 WHEN 'France' THEN 1 ELSE 0 END  AS c_i_29,
        CASE c_i_18 WHEN 'Germany' THEN 1 ELSE 0 END  AS c_i_30,
        CASE c_i_18 WHEN 'Spain' THEN 1 ELSE 0 END  AS c_i_31,
        CASE WHEN (c_i_18) IS NOT IN ('France', 'Germany', 'Spain') THEN 1 ELSE 0 END  AS c_i_32,
        CASE c_i_19 WHEN 'Male' THEN 1 ELSE 0 END  AS c_i_33,
         ( ( c_i_14 )  - 18.0 ) / NULLIF( 74.0 , 0)  AS c_i_24,
         ( ( c_i_15 )  - 0.0 ) / NULLIF( 10.0 , 0)  AS c_i_25,
         ( ( c_i_16 )  - 0.0 ) / NULLIF( 250898.1 , 0)  AS c_i_26,
        CAST( (c_i_21) AS FLOAT ) AS c_i_53,
        CAST( (c_i_22) AS FLOAT ) AS c_i_54,
         ( ( c_i_17 )  - 11.58 ) / NULLIF( 199941.8 , 0)  AS c_i_27,
        CASE WHEN (POSITION(',MortgageLoan,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_34,
        CASE WHEN (POSITION(',RetirementAccount,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_35,
        CASE WHEN (POSITION(',CreditCard,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_36,
        CASE WHEN (POSITION(',InvestmentFund,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_37,
        CASE WHEN (POSITION(',HomeEquityLoan,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_38,
        CASE WHEN (POSITION(',PersonalLoan,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_39,
        CASE WHEN (POSITION(',CheckingAccount,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_40,
        CASE WHEN (POSITION(',SavingsAccount,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_41,
        CASE WHEN (POSITION(',CertificateOfDeposit,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_42,
        CASE WHEN (POSITION(',AutoLoan,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_43,
        CAST( (c_i_23) AS FLOAT ) AS c_i_44,
        CAST( (c_i_29) AS FLOAT ) AS c_i_45,
        CAST( (c_i_30) AS FLOAT ) AS c_i_46,
        CAST( (c_i_31) AS FLOAT ) AS c_i_47,
        CAST( (c_i_32) AS FLOAT ) AS c_i_48,
        CAST( (c_i_33) AS FLOAT ) AS c_i_49,
        CASE WHEN c_i_24 >= 0.0 THEN (POWER(c_i_24 + 1 ,-1.968491)-1)/(-1.968491) ELSE -(POWER(-c_i_24 + 1,2-(-1.968491))-1)/(2-(-1.968491)) END AS c_i_28,
        CAST( (c_i_25) AS FLOAT ) AS c_i_51,
        CAST( (c_i_26) AS FLOAT ) AS c_i_52,
        CAST( (c_i_27) AS FLOAT ) AS c_i_55,
        CAST( (c_i_34) AS FLOAT ) AS c_i_56,
        CAST( (c_i_35) AS FLOAT ) AS c_i_57,
        CAST( (c_i_36) AS FLOAT ) AS c_i_58,
        CAST( (c_i_37) AS FLOAT ) AS c_i_59,
        CAST( (c_i_38) AS FLOAT ) AS c_i_60,
        CAST( (c_i_39) AS FLOAT ) AS c_i_61,
        CAST( (c_i_40) AS FLOAT ) AS c_i_62,
        CAST( (c_i_41) AS FLOAT ) AS c_i_63,
        CAST( (c_i_42) AS FLOAT ) AS c_i_64,
        CAST( (c_i_43) AS FLOAT ) AS c_i_65,
        CAST( (c_i_28) AS FLOAT ) AS c_i_50
        FROM
            demo_user.order_raw_trainig t
    )

    SELECT
        c_i_0 AS customerid,
        c_i_1 AS exited,
        c_i_2 AS surname,
        c_i_44 AS creditscore,
        c_i_45 AS geography__OHE_1_France,
        c_i_46 AS geography__OHE_2_Germany,
        c_i_47 AS geography__OHE_3_Spain,
        c_i_48 AS geography__OHE_0_otherwise,
        c_i_49 AS gender,
        c_i_50 AS age,
        c_i_51 AS tenure,
        c_i_52 AS balance,
        c_i_53 AS hascrcard,
        c_i_54 AS isactivemember,
        c_i_55 AS estimatedsalary,
        c_i_56 AS bank_products__MLB_1_MortgageLoan,
        c_i_57 AS bank_products__MLB_2_RetirementAccount,
        c_i_58 AS bank_products__MLB_3_CreditCard,
        c_i_59 AS bank_products__MLB_4_InvestmentFund,
        c_i_60 AS bank_products__MLB_5_HomeEquityLoan,
        c_i_61 AS bank_products__MLB_6_PersonalLoan,
        c_i_62 AS bank_products__MLB_7_CheckingAccount,
        c_i_63 AS bank_products__MLB_8_SavingsAccount,
        c_i_64 AS bank_products__MLB_9_CertificateOfDeposit,
        c_i_65 AS bank_products__MLB_10_AutoLoan
    FROM
    preprocessing_steps t
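
The `CASE` expression that produces `c_i_28` in the query above is the Yeo-Johnson power transform with a fitted exponent of -1.968491. The sketch below restates it in Python to make the arithmetic easier to follow; like the SQL, it leaves out the special cases where the exponent is exactly 0 or 2. The function name is our own.

```python
def yeo_johnson(x: float, lam: float) -> float:
    """Yeo-Johnson transform, mirroring the CASE expression in the query.

    Like that expression, this sketch omits the special cases
    lam == 0 and lam == 2.
    """
    if x >= 0.0:
        return ((x + 1.0) ** lam - 1.0) / lam
    return -((-x + 1.0) ** (2.0 - lam) - 1.0) / (2.0 - lam)


LAMBDA_AGE = -1.968491             # fitted constant from the query
scaled_age = (46.0 - 18.0) / 74.0  # min-max scaling from the query, for age 46
print(round(yeo_johnson(scaled_age, LAMBDA_AGE), 4))  # 0.2379
```

The result agrees with the transformed `age` of customer 15717736 (raw age 46.0) in the table shown earlier.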
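
input_schema = "input_db"
output_schema = "production_db"
view_raw_training = "churn_raw_trainig"  # raw training table, name as used earlier on this page
view_raw_scoring = "churn_raw_scoring"   # assumed name of the raw scoring table; adjust to yours
view_ADS_training = "churn_ADS_trainig"
view_ADS_scoring = "churn_ADS_scoring"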

# training
pl.transform(
    create_replace_view=True,  # key parameter: issues a REPLACE VIEW statement
    schema_name=input_schema,
    table_name=view_raw_training,
    return_type=None,
    output_schema_name=output_schema,
    output_view_name=view_ADS_training)

# The pipeline fitted on the training data can be reused for the scoring data.
# tdprepview automatically handles columns that were present at training but
# are missing at scoring, or vice versa (e.g. the target column).
pl.transform(
    create_replace_view=True,  # key parameter: issues a REPLACE VIEW statement
    schema_name=input_schema,
    table_name=view_raw_scoring,
    return_type=None,
    output_schema_name=output_schema,
    output_view_name=view_ADS_scoring)

# let's inspect the inference view 
# this can now be accessed from any SQL interface
DF_ADS_scoring = tdml.DataFrame.from_query(
    "SELECT * FROM production_db.churn_ADS_scoring"
)
DF_ADS_scoring

Output:

customerid surname creditscore geography__OHE_1_France geography__OHE_2_Germany geography__OHE_3_Spain geography__OHE_0_otherwise gender age tenure balance hascrcard isactivemember estimatedsalary bank_products__MLB_1_MortgageLoan bank_products__MLB_2_RetirementAccount bank_products__MLB_3_CreditCard bank_products__MLB_4_InvestmentFund bank_products__MLB_5_HomeEquityLoan bank_products__MLB_6_PersonalLoan bank_products__MLB_7_CheckingAccount bank_products__MLB_8_SavingsAccount bank_products__MLB_9_CertificateOfDeposit bank_products__MLB_10_AutoLoan
15602909 Dickson 0.508 0.0 0.0 1.0 0.0 0.0 0.20981537732382088 1.0 0.0 1.0 1.0 0.8313059600343701 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0
15618203 Tien 0.846 0.0 1.0 0.0 0.0 1.0 0.2621881681708609 0.8 0.46312686305715345 1.0 1.0 0.43357527040368743 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0
15674811 Kellway 0.778 0.0 1.0 0.0 0.0 1.0 0.12129049768892528 0.3 0.23669362183292741 1.0 1.0 0.5277654797546086 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
15583863 Chimaobim 0.662 0.0 1.0 0.0 0.0 1.0 0.2528863151567174 0.8 0.5697379932331094 0.0 0.0 0.9366172056068316 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
15668775 Pendred 0.814 1.0 0.0 0.0 0.0 1.0 0.19651612931765508 0.3 0.5211163416542414 1.0 0.0 0.7192991160427685 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
15679909 Dr. Pugliesi 0.63 0.0 0.0 1.0 0.0 1.0 0.20981537732382088 0.8 0.0 1.0 0.0 0.660896020742036 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0
15609618 Fanucci 0.742 0.0 1.0 0.0 0.0 1.0 0.11217587124389981 0.9 0.6156903539723896 0.0 1.0 0.5065942189177051 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
15576216 Chienezie 0.61 0.0 1.0 0.0 0.0 0.0 0.18404331957205763 0.4 0.4338923252109123 1.0 0.0 0.3978332694814191 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
15685476 Tseng 0.616 1.0 0.0 0.0 0.0 1.0 0.13859542861642782 0.5 0.39889556756308636 0.0 1.0 0.24906397761748666 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
15603582 Robertson 0.438 0.0 0.0 1.0 0.0 0.0 0.16244323778498831 0.3 0.0 1.0 0.0 0.6701247563040845 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
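
Comparing the two outputs, the scoring view carries the same feature columns as the training view, minus the target `exited`. A small check along these lines can help catch unexpected differences between training and scoring inputs. It is a sketch that works on plain lists of column names, however you obtain them, and uses abbreviated lists here for brevity; `column_diff` is a hypothetical helper, not a tdprepview function.

```python
def column_diff(training_cols, scoring_cols):
    """Return (only_in_training, only_in_scoring), preserving input order."""
    train, score = set(training_cols), set(scoring_cols)
    only_train = [c for c in training_cols if c not in score]
    only_score = [c for c in scoring_cols if c not in train]
    return only_train, only_score


# Abbreviated column lists from the training and scoring outputs on this page.
training_cols = ["customerid", "exited", "surname", "creditscore", "gender", "age"]
scoring_cols = ["customerid", "surname", "creditscore", "gender", "age"]

print(column_diff(training_cols, scoring_cols))  # (['exited'], [])
```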

Deploy with Bring Your Own Model (BYOM) and ONNXPredict#

You can seamlessly integrate tdprepview preprocessing steps into an ONNXPredict pipeline,
allowing you to deploy machine learning models directly in Vantage — no Python container needed.

Example walkthrough:
Clearscape Cookbook: LightGBM + ONNX + tdprepview

Workflow