Getting Started#
This page shows you how to get up and running with tdprepview in minutes.
Get a Free Demo Environment#
To try tdprepview, you’ll need:
- A Python client
- A Teradata Vantage instance
Options:
- Run locally with Vantage Express
- Use any hosted Teradata environment (≥ 17.20.x works)
- Easiest option: sign up for the free Clearscape Analytics Experience
→ You’ll get both a pre-configured Python environment and a cloud Vantage instance.
Installation#
Install tdprepview from PyPI:
pip install tdprepview
After installation, restart your Jupyter notebook kernel or Python session.
Links:
- tdprepview on PyPI
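After restarting the kernel, it can be useful to confirm that the packages are visible to the interpreter you are actually using. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of tdprepview.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report on the two packages used throughout this page.
for name in ("tdprepview", "teradataml"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If a package is reported as not installed, the notebook kernel is probably bound to a different environment than the one `pip` installed into.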
Code Example#
Understand the power of tdprepview in 7 steps
import teradataml as tdml
tdml.create_context(host=..., username=..., password=...)
DF_train_raw = tdml.DataFrame("churn_raw_trainig")
DF_train_raw
Output:
customerid | exited | surname | creditscore | geography | gender | age | tenure | balance | hascrcard | isactivemember | estimatedsalary | bank_products |
---|---|---|---|---|---|---|---|---|---|---|---|---|
15768104 | 0 | Wright | 788 | Spain | Male | 37.0 | 8 | 141541.25 | 0 | 0 | 66013.27 | RetirementAccount |
15809826 | 1 | Craigie | 728 | France | Female | 46.0 | 2 | 109705.52 | 1 | 0 | 20276.87 | PersonalLoan |
15717736 | 0 | Dr. Shen | 639 | Germany | Female | 46.0 | 10 | 110031.09 | 1 | 1 | 133995.59 | CertificateOfDeposit,CheckingAccount |
15748589 | 0 | Winter | 736 | France | Female | 30.0 | 9 | 0.0 | 1 | 0 | 34180.33 | CreditCard,HomeEquityLoan |
15704053 | 1 | T'ang | 710 | Spain | Male | 62.0 | 3 | 131078.42 | 1 | 0 | 119348.76 | RetirementAccount,PersonalLoan |
15806808 | 1 | Hope | 834 | Germany | Female | None | 8 | 112281.60 | 1 | 0 | 140225.14 | CheckingAccount,AutoLoan,RetirementAccount |
15694530 | 0 | Porter | 672 | France | Male | 28.0 | 4 | 167268.98 | 1 | 1 | 169469.30 | HomeEquityLoan |
15712903 | 0 | Diaz | 499 | France | Female | 21.0 | 3 | 176511.08 | 1 | 1 | 153920.22 | InvestmentFund |
15791045 | 0 | Boni | 568 | France | Female | 38.0 | 3 | 132951.92 | 0 | 1 | 124486.28 | RetirementAccount |
15803790 | 0 | Allen | 638 | Germany | Male | 37.0 | 2 | 89728.86 | 1 | 1 | 37294.88 | CertificateOfDeposit,PersonalLoan |
import tdprepview
steps = [
    (['creditscore', 'age', 'tenure', 'balance', 'estimatedsalary'],
     tdprepview.SimpleImputer(strategy='mean')),
    (['geography', 'gender', 'bank_products'],
     tdprepview.ImputeText(kind='mode')),
    (['hascrcard', 'isactivemember'],
     tdprepview.SimpleImputer(strategy='most_frequent')),
    (['creditscore', 'age', 'tenure', 'balance', 'estimatedsalary'],
     tdprepview.MinMaxScaler()),
    (['age'],
     tdprepview.PowerTransformer(method='yeo-johnson')),
    (['geography'],
     tdprepview.OneHotEncoder(max_categories=20)),
    (['gender'],
     tdprepview.LabelEncoder(elements='TOP1')),
    (['bank_products'],
     tdprepview.MultiLabelBinarizer(delimiter=',', max_categories=20)),
    ({'columns_exclude': ['customerid', 'exited', 'surname']},
     tdprepview.Cast(new_type='FLOAT'))
]
pl = tdprepview.Pipeline(steps)
pl.fit(DF=DF_train_raw)  # fit all steps on the training DataFrame
Output:
Fitting started.
--------------------------------
Step 1 / 9 completed: Impute on ['creditscore', 'age', 'tenure', 'balance', 'estimatedsalary']
Step 2 / 9 completed: ImputeText on ['geography', 'gender', 'bank_products']
Step 3 / 9 completed: Impute on ['hascrcard', 'isactivemember']
Step 4 / 9 completed: Scale on ['creditscore', 'age', 'tenure', 'balance', 'estimatedsalary']
Step 5 / 9 completed: PowerTransformer on ['age']
Step 6 / 9 completed: OneHotEncoder on ['geography']
Step 7 / 9 completed: LabelEncoder on ['gender']
Step 8 / 9 completed: MultiLabelBinarizer on ['bank_products']
Step 9 / 9 completed: Cast on {'columns_exclude': ['customerid', 'exited', 'surname']}
--------------------------------
Fitting completed.
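Conceptually, fitting a step means learning its parameters from the training data so that they can later be written into SQL as constants. The sketch below illustrates the idea for a one-hot encoder in plain Python. It is not tdprepview's implementation: the helpers `fit_categories`, `quote`, and `render_one_hot` are hypothetical, and the alphabetical category ordering is an assumption.

```python
from typing import Iterable, List, Optional


def fit_categories(values: Iterable[Optional[str]], max_categories: int = 20) -> List[str]:
    """Learn the distinct non-null categories (alphabetical order is an assumption)."""
    distinct = sorted({v for v in values if v is not None})
    return distinct[:max_categories]


def quote(value: str) -> str:
    """Render a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def render_one_hot(column: str, categories: List[str]) -> List[str]:
    """Render one CASE expression per category plus a catch-all 'otherwise' column."""
    exprs = [
        f"CASE {column} WHEN {quote(c)} THEN 1 ELSE 0 END AS {column}__OHE_{i}_{c}"
        for i, c in enumerate(categories, start=1)
    ]
    in_list = ", ".join(quote(c) for c in categories)
    exprs.append(
        f"CASE WHEN {column} NOT IN ({in_list}) THEN 1 ELSE 0 END AS {column}__OHE_0_otherwise"
    )
    return exprs


# "Fit" on a toy column, then render the SQL expressions.
categories = fit_categories(["Spain", "France", "Germany", "France", None])
expressions = render_one_hot("geography", categories)
for expr in expressions:
    print(expr)
```

Separating the fit step from the rendering step is what allows the same learned categories to be applied unchanged to a scoring data set.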
# 3. inspect the transformed training dataset
DF_train_transformed = pl.transform(DF_train_raw) # <- note the similarity to sklearn!
DF_train_transformed
Output:
customerid | exited | surname | creditscore | geography__OHE_1_France | geography__OHE_2_Germany | geography__OHE_3_Spain | geography__OHE_0_otherwise | gender | age | tenure | balance | hascrcard | isactivemember | estimatedsalary | bank_products__MLB_1_MortgageLoan | bank_products__MLB_2_RetirementAccount | bank_products__MLB_3_CreditCard | bank_products__MLB_4_InvestmentFund | bank_products__MLB_5_HomeEquityLoan | bank_products__MLB_6_PersonalLoan | bank_products__MLB_7_CheckingAccount | bank_products__MLB_8_SavingsAccount | bank_products__MLB_9_CertificateOfDeposit | bank_products__MLB_10_AutoLoan |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15717736 | 0 | Dr. Shen | 0.578 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.23790555341066777 | 1.0 | 0.4385489168710325 | 1.0 | 1.0 | 0.6701150534805628 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
15791045 | 0 | Boni | 0.436 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.19079254093235418 | 0.3 | 0.5299040526811483 | 0.0 | 1.0 | 0.6225546634070515 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
15704053 | 1 | T'ang | 0.72 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.30525780332841196 | 0.3 | 0.5224368777603339 | 1.0 | 0.0 | 0.5968595861395666 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
15806808 | 1 | Hope | 0.968 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.19651612931765508 | 0.8 | 0.4475187337010524 | 1.0 | 0.0 | 0.7012718701142033 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 |
15712903 | 0 | Diaz | 0.298 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.03822566855778497 | 0.3 | 0.703517005509408 | 1.0 | 1.0 | 0.7697672022558565 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
15706602 | 0 | Bates | 0.82 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.15475860151334106 | 0.1 | 0.47076594043557923 | 0.0 | 1.0 | 0.7834711401017695 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
15797081 | 1 | Ajuluchukwu | 0.522 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.2528863151567174 | 0.9 | 0.4603004964963864 | 1.0 | 1.0 | 0.6934279375298211 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 |
15768104 | 0 | Wright | 0.876 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.18404331957205763 | 0.8 | 0.5641383892504567 | 0.0 | 0.0 | 0.3301045104125301 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
15694530 | 0 | Porter | 0.644 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.11217587124389981 | 0.4 | 0.6666809354076416 | 1.0 | 1.0 | 0.847535232752731 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
15748589 | 0 | Winter | 0.772 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.13009228671990822 | 0.9 | 0.0 | 1.0 | 0.0 | 0.1708934800026808 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
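Each `bank_products__MLB_*` column above flags whether one product appears in the comma-separated `bank_products` string. One way to express that membership test is to wrap both the cell and the label in delimiters before searching, so that a label never matches as a substring of a longer one. The following plain-Python sketch shows the idea; `has_label` is a hypothetical helper, not a tdprepview function.

```python
from typing import Optional


def has_label(cell: Optional[str], label: str, delimiter: str = ",") -> int:
    """Return 1 if `label` is one of the delimiter-separated items in `cell`, else 0."""
    if cell is None:
        return 0
    wrapped_cell = f"{delimiter}{cell}{delimiter}"
    wrapped_label = f"{delimiter}{label}{delimiter}"
    return 1 if wrapped_label in wrapped_cell else 0


# Raw value for customer 15717736 (Dr. Shen) in the training table.
cell = "CertificateOfDeposit,CheckingAccount"
labels = ["RetirementAccount", "CheckingAccount", "CertificateOfDeposit", "AutoLoan"]
flags = {label: has_label(cell, label) for label in labels}
print(flags)
```

For this row the sketch sets only `CheckingAccount` and `CertificateOfDeposit`, which agrees with the transformed output shown above.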
Output (the SQL query generated by the fitted pipeline):
WITH preprocessing_steps AS
(
SELECT
customerid AS c_i_0,
exited AS c_i_1,
surname AS c_i_2,
creditscore AS c_i_3,
geography AS c_i_4,
gender AS c_i_5,
age AS c_i_6,
tenure AS c_i_7,
balance AS c_i_8,
hascrcard AS c_i_9,
isactivemember AS c_i_10,
estimatedsalary AS c_i_11,
bank_products AS c_i_12,
COALESCE( c_i_3 , 650.7018 ) AS c_i_13,
COALESCE( c_i_4 , 'France' ) AS c_i_18,
COALESCE( c_i_5 , 'Male' ) AS c_i_19,
COALESCE( c_i_6 , 38.87352 ) AS c_i_14,
COALESCE( c_i_7 , 4.9726 ) AS c_i_15,
COALESCE( c_i_8 , 75582.44 ) AS c_i_16,
COALESCE( c_i_9 , 1.0 ) AS c_i_21,
COALESCE( c_i_10 , 1.0 ) AS c_i_22,
COALESCE( c_i_11 , 100690.6 ) AS c_i_17,
COALESCE( c_i_12 , 'RetirementAccount' ) AS c_i_20,
( ( c_i_13 ) - 350.0 ) / NULLIF( 500.0 , 0) AS c_i_23,
CASE c_i_18 WHEN 'France' THEN 1 ELSE 0 END AS c_i_29,
CASE c_i_18 WHEN 'Germany' THEN 1 ELSE 0 END AS c_i_30,
CASE c_i_18 WHEN 'Spain' THEN 1 ELSE 0 END AS c_i_31,
CASE WHEN (c_i_18) IS NOT IN ('France', 'Germany', 'Spain') THEN 1 ELSE 0 END AS c_i_32,
CASE c_i_19 WHEN 'Male' THEN 1 ELSE 0 END AS c_i_33,
( ( c_i_14 ) - 18.0 ) / NULLIF( 74.0 , 0) AS c_i_24,
( ( c_i_15 ) - 0.0 ) / NULLIF( 10.0 , 0) AS c_i_25,
( ( c_i_16 ) - 0.0 ) / NULLIF( 250898.1 , 0) AS c_i_26,
CAST( (c_i_21) AS FLOAT ) AS c_i_53,
CAST( (c_i_22) AS FLOAT ) AS c_i_54,
( ( c_i_17 ) - 11.58 ) / NULLIF( 199941.8 , 0) AS c_i_27,
CASE WHEN (POSITION(',MortgageLoan,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_34,
CASE WHEN (POSITION(',RetirementAccount,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_35,
CASE WHEN (POSITION(',CreditCard,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_36,
CASE WHEN (POSITION(',InvestmentFund,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_37,
CASE WHEN (POSITION(',HomeEquityLoan,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_38,
CASE WHEN (POSITION(',PersonalLoan,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_39,
CASE WHEN (POSITION(',CheckingAccount,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_40,
CASE WHEN (POSITION(',SavingsAccount,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_41,
CASE WHEN (POSITION(',CertificateOfDeposit,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_42,
CASE WHEN (POSITION(',AutoLoan,' IN ','||c_i_20||','))>0 THEN 1 ELSE 0 END AS c_i_43,
CAST( (c_i_23) AS FLOAT ) AS c_i_44,
CAST( (c_i_29) AS FLOAT ) AS c_i_45,
CAST( (c_i_30) AS FLOAT ) AS c_i_46,
CAST( (c_i_31) AS FLOAT ) AS c_i_47,
CAST( (c_i_32) AS FLOAT ) AS c_i_48,
CAST( (c_i_33) AS FLOAT ) AS c_i_49,
CASE WHEN c_i_24 >= 0.0 THEN (POWER(c_i_24 + 1 ,-1.968491)-1)/(-1.968491) ELSE -(POWER(-c_i_24 + 1,2-(-1.968491))-1)/(2-(-1.968491)) END AS c_i_28,
CAST( (c_i_25) AS FLOAT ) AS c_i_51,
CAST( (c_i_26) AS FLOAT ) AS c_i_52,
CAST( (c_i_27) AS FLOAT ) AS c_i_55,
CAST( (c_i_34) AS FLOAT ) AS c_i_56,
CAST( (c_i_35) AS FLOAT ) AS c_i_57,
CAST( (c_i_36) AS FLOAT ) AS c_i_58,
CAST( (c_i_37) AS FLOAT ) AS c_i_59,
CAST( (c_i_38) AS FLOAT ) AS c_i_60,
CAST( (c_i_39) AS FLOAT ) AS c_i_61,
CAST( (c_i_40) AS FLOAT ) AS c_i_62,
CAST( (c_i_41) AS FLOAT ) AS c_i_63,
CAST( (c_i_42) AS FLOAT ) AS c_i_64,
CAST( (c_i_43) AS FLOAT ) AS c_i_65,
CAST( (c_i_28) AS FLOAT ) AS c_i_50
FROM
demo_user.order_raw_trainig t
)
SELECT
c_i_0 AS customerid,
c_i_1 AS exited,
c_i_2 AS surname,
c_i_44 AS creditscore,
c_i_45 AS geography__OHE_1_France,
c_i_46 AS geography__OHE_2_Germany,
c_i_47 AS geography__OHE_3_Spain,
c_i_48 AS geography__OHE_0_otherwise,
c_i_49 AS gender,
c_i_50 AS age,
c_i_51 AS tenure,
c_i_52 AS balance,
c_i_53 AS hascrcard,
c_i_54 AS isactivemember,
c_i_55 AS estimatedsalary,
c_i_56 AS bank_products__MLB_1_MortgageLoan,
c_i_57 AS bank_products__MLB_2_RetirementAccount,
c_i_58 AS bank_products__MLB_3_CreditCard,
c_i_59 AS bank_products__MLB_4_InvestmentFund,
c_i_60 AS bank_products__MLB_5_HomeEquityLoan,
c_i_61 AS bank_products__MLB_6_PersonalLoan,
c_i_62 AS bank_products__MLB_7_CheckingAccount,
c_i_63 AS bank_products__MLB_8_SavingsAccount,
c_i_64 AS bank_products__MLB_9_CertificateOfDeposit,
c_i_65 AS bank_products__MLB_10_AutoLoan
FROM
preprocessing_steps t
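The arithmetic in the generated query can be checked by hand. The sketch below re-implements the min-max scaling and Yeo-Johnson expressions from the query in plain Python, using the constants that appear in the SQL, and applies them to the raw values of customer 15768104 (Wright: creditscore 788, tenure 8, age 37.0). The function names are hypothetical, and the code is only a verification aid, not how tdprepview computes anything.

```python
import math
from typing import Optional


def min_max(x: float, minimum: float, value_range: float) -> Optional[float]:
    """Mirror `(x - min) / NULLIF(range, 0)`: return None when the range is zero."""
    if value_range == 0:
        return None
    return (x - minimum) / value_range


def yeo_johnson(x: float, lmbda: float) -> float:
    """Mirror the CASE expression in the query (assumes lmbda is neither 0 nor 2)."""
    if x >= 0:
        return (math.pow(x + 1, lmbda) - 1) / lmbda
    return -(math.pow(-x + 1, 2 - lmbda) - 1) / (2 - lmbda)


creditscore = min_max(788, 350.0, 500.0)
tenure = min_max(8, 0.0, 10.0)
age_scaled = min_max(37.0, 18.0, 74.0)
age = yeo_johnson(age_scaled, -1.968491)

print(creditscore, tenure, age)
```

The results agree with the transformed row for Wright (creditscore 0.876, tenure 0.8, age roughly 0.18404), up to the rounding of the lambda constant printed in the query.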
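input_schema = "input_db"
output_schema = "production_db"
view_raw_training = "churn_raw_trainig"  # raw training table loaded above
view_raw_scoring = "churn_raw_scoring"  # assumed name of the raw scoring table; adjust to your environment
view_ADS_training = "churn_ADS_trainig"
view_ADS_scoring = "churn_ADS_scoring"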
# training
pl.transform(
    create_replace_view=True,  # <- this parameter is key: it issues a REPLACE VIEW statement
    schema_name=input_schema,
    table_name=view_raw_training,
    return_type=None,
    output_schema_name=output_schema,
    output_view_name=view_ADS_training)
# Note that the pipeline fitted on the training data set can be reused for the scoring data set.
# tdprepview automatically handles columns that were not present at training or at scoring (e.g. the target column).
pl.transform(
    create_replace_view=True,  # <- this parameter is key: it issues a REPLACE VIEW statement
    schema_name=input_schema,
    table_name=view_raw_scoring,
    return_type=None,
    output_schema_name=output_schema,
    output_view_name=view_ADS_scoring)
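To make the effect of `create_replace_view=True` more concrete, the sketch below shows how a SELECT query could be wrapped in a Teradata `REPLACE VIEW` statement. This is only an illustration of the statement shape: `build_replace_view` is a hypothetical helper, the stand-in query and the table name `churn_raw_scoring` are placeholders, and the statement tdprepview actually emits may differ.

```python
def build_replace_view(schema: str, view: str, select_query: str) -> str:
    """Wrap a SELECT query in a REPLACE VIEW statement for the given schema and view."""
    body = select_query.strip().rstrip(";")  # avoid a doubled trailing semicolon
    return f"REPLACE VIEW {schema}.{view} AS\n{body};"


# Placeholder query standing in for the generated preprocessing SQL.
stand_in_query = "SELECT customerid, creditscore FROM input_db.churn_raw_scoring;"
statement = build_replace_view("production_db", "churn_ADS_scoring", stand_in_query)
print(statement)
```

Because the preprocessing lives in a view, the transformed columns are computed whenever the view is queried, so new rows in the underlying table need no separate preprocessing run.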
# let's inspect the inference view
# this can now be accessed from any SQL interface
DF_ADS_scoring = tdml.DataFrame.from_query(
"SELECT * FROM production_db.churn_ADS_scoring"
)
DF_ADS_scoring
Output:
customerid | surname | creditscore | geography__OHE_1_France | geography__OHE_2_Germany | geography__OHE_3_Spain | geography__OHE_0_otherwise | gender | age | tenure | balance | hascrcard | isactivemember | estimatedsalary | bank_products__MLB_1_MortgageLoan | bank_products__MLB_2_RetirementAccount | bank_products__MLB_3_CreditCard | bank_products__MLB_4_InvestmentFund | bank_products__MLB_5_HomeEquityLoan | bank_products__MLB_6_PersonalLoan | bank_products__MLB_7_CheckingAccount | bank_products__MLB_8_SavingsAccount | bank_products__MLB_9_CertificateOfDeposit | bank_products__MLB_10_AutoLoan |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15602909 | Dickson | 0.508 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.20981537732382088 | 1.0 | 0.0 | 1.0 | 1.0 | 0.8313059600343701 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 |
15618203 | Tien | 0.846 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.2621881681708609 | 0.8 | 0.46312686305715345 | 1.0 | 1.0 | 0.43357527040368743 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
15674811 | Kellway | 0.778 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.12129049768892528 | 0.3 | 0.23669362183292741 | 1.0 | 1.0 | 0.5277654797546086 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
15583863 | Chimaobim | 0.662 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.2528863151567174 | 0.8 | 0.5697379932331094 | 0.0 | 0.0 | 0.9366172056068316 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
15668775 | Pendred | 0.814 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.19651612931765508 | 0.3 | 0.5211163416542414 | 1.0 | 0.0 | 0.7192991160427685 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
15679909 | Dr. Pugliesi | 0.63 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.20981537732382088 | 0.8 | 0.0 | 1.0 | 0.0 | 0.660896020742036 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 |
15609618 | Fanucci | 0.742 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.11217587124389981 | 0.9 | 0.6156903539723896 | 0.0 | 1.0 | 0.5065942189177051 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
15576216 | Chienezie | 0.61 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.18404331957205763 | 0.4 | 0.4338923252109123 | 1.0 | 0.0 | 0.3978332694814191 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
15685476 | Tseng | 0.616 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.13859542861642782 | 0.5 | 0.39889556756308636 | 0.0 | 1.0 | 0.24906397761748666 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
15603582 | Robertson | 0.438 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.16244323778498831 | 0.3 | 0.0 | 1.0 | 0.0 | 0.6701247563040845 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
Deploy with Bring Your Own Model (BYOM) and ONNXPredict#
You can integrate tdprepview preprocessing steps into an ONNXPredict pipeline, which lets you deploy machine learning models directly in Vantage without a Python container.
Example walkthrough:
Clearscape Cookbook: LightGBM + ONNX + tdprepview