tdprepview

Python package for building in-database data preparation pipelines as Teradata SQL views.

Why `tdprepview`

Lightning fast in development, deployment, and execution

tdprepview generates a single SQL view: a WITH clause contains all preprocessing steps, and a final SELECT exposes the desired columns under their correct names. The result is clean, auditable, and fast, all without moving data to a Python client.
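The idea of compiling every step into one view can be illustrated with a small, self-contained sketch. It does not use tdprepview or its API: `build_view_sql` and the `step_N` CTE names are hypothetical, and each step is assumed to be a mapping from output column to a SQL expression over the previous step's columns. Columns a step does not touch are passed through unchanged, so later steps can still reference them.

```python
# Hypothetical sketch, not tdprepview's implementation: chain per-column SQL
# expressions into one CREATE VIEW statement built from a WITH clause.

def build_view_sql(view_name, source_table, source_columns, steps, final_columns):
    """Compose preprocessing steps into a single view definition.

    steps: list of dicts mapping output column -> SQL expression; each
           expression may reference any column produced by the previous step.
    final_columns: columns exposed by the final SELECT.
    """
    columns = list(source_columns)
    ctes = []
    previous = source_table
    for i, step in enumerate(steps):
        # Start from a pass-through of every current column, then override
        # or add the columns this step defines.
        exprs = {col: col for col in columns}
        exprs.update(step)
        columns = list(exprs)
        select_list = ", ".join(f"{expr} AS {col}" for col, expr in exprs.items())
        name = f"step_{i}"
        ctes.append(f"{name} AS (SELECT {select_list} FROM {previous})")
        previous = name
    with_clause = "WITH " + ",\n     ".join(ctes)
    final_select = ", ".join(final_columns)
    return (
        f"CREATE VIEW {view_name} AS\n"
        f"{with_clause}\n"
        f"SELECT {final_select} FROM {previous};"
    )


if __name__ == "__main__":
    # Illustrative values only: impute age with a constant, then scale income.
    print(build_view_sql(
        "prep_view", "my_db.customers", ["id", "age", "income"],
        steps=[
            {"age": "COALESCE(age, 42)"},
            {"income_scaled": "(income - 100) / 50.0"},
        ],
        final_columns=["id", "age", "income_scaled"],
    ))
```

The generated text has one CTE per step and a single final SELECT that picks and names the output columns. A production generator would additionally need identifier quoting and expression validation, which this sketch omits.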

- In-DB Pipelines: Define preprocessing once and execute it inside Teradata as one optimized query within a single view.
- Rich Preprocessors: Imputers, scalers, binning, encoders, PCA, and more.
- sklearn-like API: Familiar `Pipeline(steps=...)`, `fit`, `transform`, and consistent options.
- Portable Pipelines: Save and load pipelines as a dict or JSON with `to_dict()`, `to_json()`, `from_dict()`, and `from_json()`.
- Auto Pipeline: Generate pipelines from a `tdml.DataFrame` via heuristics with `Pipeline.from_DataFrame(...)` or `auto_code(...)`.
- Pipeline DAG: Visualize the flow with `plot_sankey()`; schema-changing transforms are supported.
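The fit/transform and save/load pattern can be pictured with a toy class. `ToyPipeline` is hypothetical and only borrows the method names listed above; it is not tdprepview's `Pipeline`, and its `steps` format, `params` attribute, in-memory `rows` input, and a `transform()` that returns SQL expressions instead of transformed data are all assumptions of this sketch. What it shows is that once statistics are fitted, a pipeline is plain data, so it round-trips through a dict or JSON and reproduces the same SQL.

```python
import json


class ToyPipeline:
    """Hypothetical stand-in, not tdprepview's Pipeline.

    steps: list of (column, operation) pairs, operation in {"impute_mean", "center"}.
    """

    def __init__(self, steps):
        self.steps = [tuple(step) for step in steps]
        self.params = {}  # column -> fitted mean, filled by fit()

    def fit(self, rows):
        """Learn the per-column mean of non-null values from a list of dicts."""
        for column, _ in self.steps:
            values = [row[column] for row in rows if row[column] is not None]
            if not values:
                raise ValueError(f"no non-null values for {column!r}")
            self.params[column] = sum(values) / len(values)
        return self

    def transform(self):
        """Return column -> SQL expression that applies the fitted step."""
        exprs = {}
        for column, op in self.steps:
            mean = self.params[column]
            if op == "impute_mean":
                exprs[column] = f"COALESCE({column}, {mean!r})"
            elif op == "center":
                exprs[column] = f"({column} - {mean!r})"
            else:
                raise ValueError(f"unknown operation: {op}")
        return exprs

    def to_dict(self):
        return {"steps": [list(step) for step in self.steps], "params": dict(self.params)}

    @classmethod
    def from_dict(cls, data):
        pipeline = cls(data["steps"])
        pipeline.params = dict(data["params"])
        return pipeline

    def to_json(self):
        return json.dumps(self.to_dict())

    @classmethod
    def from_json(cls, text):
        return cls.from_dict(json.loads(text))


if __name__ == "__main__":
    rows = [{"age": 30, "income": 100.0}, {"age": None, "income": 200.0}]
    pipe = ToyPipeline([("age", "impute_mean"), ("income", "center")]).fit(rows)
    restored = ToyPipeline.from_json(pipe.to_json())
    print(restored.transform())  # same SQL as pipe.transform()
```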

Supported Preprocessors

(as of v1.5.0)

Impute

Fill missing values with classic and sklearn-compatible imputers.
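Imputation maps naturally onto SQL: a fill value is computed once at fit time, and missing values are replaced with `COALESCE`. The sketch below assumes the sample values are already available in Python; `impute_sql` is hypothetical, and its strategy names mirror scikit-learn's `SimpleImputer` (`mean`, `median`, `most_frequent`) rather than any confirmed tdprepview option.

```python
import statistics


def impute_sql(column, values, strategy="mean"):
    """Hypothetical helper, not tdprepview's implementation: compute a fill
    value from sample values and return a COALESCE expression.

    Strategy names mirror scikit-learn's SimpleImputer. repr() yields valid SQL
    literals for numbers and for simple strings that contain no quotes.
    """
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "most_frequent":
        # On ties, Python 3.8+ returns the first mode encountered
        # (scikit-learn picks the smallest value instead).
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return f"COALESCE({column}, {fill!r})"


if __name__ == "__main__":
    print(impute_sql("age", [34, None, 51, 29], strategy="median"))
    print(impute_sql("city", ["Berlin", None, "Berlin", "Paris"], strategy="most_frequent"))
```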

Transform

Normalize, scale, cut, and apply custom transformations.
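Scaling follows the same pattern: statistics are fitted on a sample and baked into a SQL expression as literals. In the hypothetical `min_max_scale_sql` helper below, the column is cast to `FLOAT` so that integer columns are not divided with integer arithmetic; that cast, like the helper itself, is an assumption of this sketch rather than tdprepview's generated SQL.

```python
def min_max_scale_sql(column, values):
    """Hypothetical helper, not tdprepview's implementation: fit min and max
    on sample values and return a SQL expression mapping the column to [0, 1]
    (for values inside the fitted range)."""
    observed = [v for v in values if v is not None]
    lo, hi = min(observed), max(observed)
    span = hi - lo
    if span == 0:
        # Constant sample: avoid division by zero and map non-null values to 0.
        return f"CASE WHEN {column} IS NULL THEN NULL ELSE 0.0 END"
    # CAST to FLOAT so integer columns are not divided with integer arithmetic.
    return f"(CAST({column} AS FLOAT) - {lo!r}) / {float(span)!r}"


if __name__ == "__main__":
    print(min_max_scale_sql("income", [1200, None, 3400, 5600]))
```

Values outside the fitted range map outside [0, 1]; clipping them would need an extra `CASE` around the expression.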

Discretize

Turn continuous variables into bins or binary features.
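Binning can be expressed as a `CASE` expression over sorted cut points, and a binary feature is the single-threshold special case. The helpers `equal_width_edges`, `bin_sql`, and `binarize_sql` are hypothetical; equal-width edges are just one possible binning rule, chosen here for brevity, and are not a statement about which rules tdprepview offers.

```python
def equal_width_edges(values, n_bins):
    """Interior cut points splitting [min, max] of the non-null values
    into n_bins equal-width bins."""
    observed = [v for v in values if v is not None]
    lo, hi = min(observed), max(observed)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def bin_sql(column, edges):
    """Map a column to a bin index 0..len(edges); NULL stays NULL."""
    if list(edges) != sorted(edges):
        raise ValueError("edges must be sorted ascending")
    # A NULL would fail every comparison and fall into ELSE, so test it first.
    whens = " ".join(f"WHEN {column} < {e!r} THEN {i}" for i, e in enumerate(edges))
    return f"CASE WHEN {column} IS NULL THEN NULL {whens} ELSE {len(edges)} END"


def binarize_sql(column, threshold):
    """1 if column > threshold, else 0; NULL stays NULL."""
    return (f"CASE WHEN {column} IS NULL THEN NULL "
            f"WHEN {column} > {threshold!r} THEN 1 ELSE 0 END")


if __name__ == "__main__":
    edges = equal_width_edges([0, None, 10], n_bins=4)
    print(bin_sql("score", edges))
    print(binarize_sql("score", 5))
```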

Feature Engineering

Expand your feature space with engineered variables.

Dimensionality Reduction & Misc

Reduce dimensions or perform type conversions.
- PCA
- TryCast
- Cast
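PCA also reduces to plain SQL once it is fitted: each component is a weighted sum of the centered input columns. The sketch below fits the components with NumPy's SVD on an in-memory sample and emits one expression per component. `pca_sql` and the `pc_k` output names are hypothetical, and SQLite stands in for Teradata only to check that the generated expressions evaluate as intended.

```python
import sqlite3

import numpy as np


def pca_sql(columns, X, n_components):
    """Hypothetical sketch, not tdprepview's implementation: fit PCA on the
    sample matrix X (rows x columns) and return one SQL expression per component."""
    X = np.asarray(X, dtype=float)
    means = X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - means, full_matrices=False)
    exprs = []
    for k in range(n_components):
        # float() keeps NumPy 2's "np.float64(...)" repr out of the SQL;
        # parentheses keep negative literals unambiguous.
        terms = " + ".join(
            f"({col} - ({float(means[j])!r})) * ({float(vt[k, j])!r})"
            for j, col in enumerate(columns)
        )
        exprs.append(f"{terms} AS pc_{k + 1}")
    return exprs


if __name__ == "__main__":
    sample = [[1.0, 2.0], [2.0, 4.1], [3.0, 6.2], [4.0, 7.9]]
    exprs = pca_sql(["a", "b"], sample, n_components=1)
    print(exprs[0])

    # Evaluate the generated expression in an in-memory SQLite database.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (a REAL, b REAL)")
    con.executemany("INSERT INTO t VALUES (?, ?)", sample)
    print([row[0] for row in con.execute(f"SELECT {exprs[0]} FROM t ORDER BY rowid")])
```

The sign of each principal axis returned by an SVD is arbitrary, so component values may be negated relative to another PCA implementation while remaining equally valid.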
