tdprepview
Python package for building in-database data preparation pipelines as Teradata SQL views.
Why tdprepview
Lightning Fast in development, deployment and execution
tdprepview generates one SQL view with a WITH clause for all preprocessing and a final SELECT that exposes the desired columns with correct names: clean, auditable, and fast. All without moving data to a Python client.
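To make the "one view, one WITH clause, one final SELECT" shape concrete, here is a minimal sketch in plain Python. It is not tdprepview's code generator: the helper `build_view_sql`, the step names, the table and view names, and the constants are all hypothetical and chosen only for illustration.

```python
def build_view_sql(view_name, source_table, cte_steps, final_columns):
    """Compose view DDL: one CTE per preprocessing step, then a final SELECT.

    cte_steps: list of (cte_name, select_list); each CTE reads from the previous one.
    final_columns: list of (expression, output_name) for the final SELECT.
    """
    ctes = []
    previous = source_table
    for cte_name, select_list in cte_steps:
        ctes.append(
            f"{cte_name} AS (\n    SELECT {select_list}\n    FROM {previous}\n)"
        )
        previous = cte_name  # the next step reads from this CTE

    final_select = ",\n    ".join(
        f"{expr} AS {name}" for expr, name in final_columns
    )
    return (
        f"REPLACE VIEW {view_name} AS\n"
        "WITH " + ",\n".join(ctes) + "\n"
        f"SELECT\n    {final_select}\nFROM {previous}"
    )


# Hypothetical example: impute age, then min-max scale it, then rename it.
sql = build_view_sql(
    view_name="demo_db.customers_prepared",
    source_table="demo_db.customers",
    cte_steps=[
        ("step_1", "customer_id, COALESCE(age, 41.5) AS age"),
        ("step_2", "customer_id, (age - 18.0) / (90.0 - 18.0) AS age"),
    ],
    final_columns=[
        ("customer_id", "customer_id"),
        ("age", "age_imputed_scaled"),
    ],
)
print(sql)
```

The constants (41.5, 18.0, 90.0) stand in for statistics that a fitting step would have learned beforehand; once they are baked into the view text, querying the view needs no Python at all.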
- In-DB Pipelines: Define preprocessing once and execute it inside Teradata as a single optimized query in a single view.
- Rich Preprocessors: From imputers, scalers, binning, and encoders to PCA and more.
- sklearn-like API: Familiar Pipeline(steps=...), fit, transform, and consistent options.
- Portable Pipelines: Save / load as dict or JSON with to_dict(), to_json(), from_dict(), from_json().
- Auto Pipeline: Generate pipelines from a tdml.DataFrame via heuristics with Pipeline.from_DataFrame(...) or auto_code(...).
- Pipeline DAG: Visualize the flow with plot_sankey(); supports schema-changing transforms.
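The fit / transform / save / load cycle described in the list above can be illustrated with a toy stand-in. The classes `ToyMeanImputer` and `ToyPipeline` below are hypothetical and are not part of tdprepview; they borrow the method names from the list only to show the idea. As a simplifying assumption, this toy fits on an in-memory list of rows and its `transform` returns SQL fragments, whereas the real package works against Teradata.

```python
import json


class ToyMeanImputer:
    """Toy step: learns a column mean, then emits a SQL expression using it."""

    def __init__(self, column, mean=None):
        self.column = column
        self.mean = mean

    def fit(self, rows):
        # Ignore missing values when computing the mean.
        values = [r[self.column] for r in rows if r[self.column] is not None]
        self.mean = sum(values) / len(values)
        return self

    def to_sql(self):
        return f"COALESCE({self.column}, {self.mean}) AS {self.column}"


class ToyPipeline:
    """Toy pipeline: a list of steps plus dict/JSON round-tripping."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, rows):
        for step in self.steps:
            step.fit(rows)
        return self

    def transform(self):
        # Returns SQL fragments only; no data is touched here.
        return [step.to_sql() for step in self.steps]

    def to_dict(self):
        return {"steps": [{"column": s.column, "mean": s.mean} for s in self.steps]}

    def to_json(self):
        return json.dumps(self.to_dict())

    @classmethod
    def from_dict(cls, data):
        return cls([ToyMeanImputer(d["column"], d["mean"]) for d in data["steps"]])

    @classmethod
    def from_json(cls, text):
        return cls.from_dict(json.loads(text))


rows = [{"age": 30.0}, {"age": None}, {"age": 50.0}]
pipeline = ToyPipeline([ToyMeanImputer("age")]).fit(rows)

# Serialize the fitted state, then rebuild the pipeline without refitting.
restored = ToyPipeline.from_json(pipeline.to_json())
print(restored.transform())
```

Because the fitted state is only plain values (here a column name and a mean), it serializes cleanly to JSON, which is what makes a pipeline portable between sessions or environments.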
Supported Preprocessors
(as of v1.5.0)
- Impute: Fill missing values with classic and sklearn-compatible imputers.
- Transform: Normalize, scale, cut, and apply custom transformations.
- Discretize: Turn continuous variables into bins or binary features.
- Feature Engineering: Expand your feature space with engineered variables.
- Dimensionality Reduction & Misc: Reduce dimensions or perform type conversions.
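As a rough illustration of how preprocessors in these families can be expressed as SQL column expressions, here is a small sketch. The helper functions are hypothetical and the generated SQL is an assumption about what such expressions could look like, not the text tdprepview actually emits.

```python
def impute_sql(column, fill_value):
    """Impute: replace NULL with a constant learned beforehand (e.g. a mean)."""
    return f"COALESCE({column}, {fill_value})"


def min_max_scale_sql(expr, col_min, col_max):
    """Transform: map values from [col_min, col_max] onto [0, 1]."""
    return f"(({expr}) - {col_min}) / ({col_max} - {col_min})"


def binarize_sql(expr, threshold):
    """Discretize: 1 above the threshold, 0 otherwise (NULL falls into ELSE)."""
    return f"CASE WHEN {expr} > {threshold} THEN 1 ELSE 0 END"


# Expressions nest, so chained steps on one column become one SQL expression.
scaled_income = min_max_scale_sql(impute_sql("income", 52000.0), 0.0, 200000.0)
is_senior = binarize_sql("age", 65)

print(scaled_income)
print(is_senior)
```

Each helper only formats text, so the learned constants (fill value, minimum, maximum, threshold) are the only state a fitted step would need to keep.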