tdprepview

Python package for building in-database data preparation pipelines as Teradata SQL views.


Why tdprepview

Lightning fast in development, deployment, and execution

tdprepview generates a single SQL view: a WITH clause holds all preprocessing steps, and a final SELECT exposes the desired columns under their correct names. The result is clean, auditable, and fast, and no data is moved to a Python client.

  • In-DB Pipelines: Define preprocessing once and execute it inside Teradata as a single optimized query in a single view.

  • Rich Preprocessors: Imputers, scalers, binning, encoders, PCA, and more.

  • sklearn-like API: Familiar Pipeline(steps=...), fit, transform, and consistent options.

  • Portable Pipelines: Save and load as dict or JSON with to_dict(), to_json(), from_dict(), from_json().

  • Auto Pipeline: Generate pipelines from a tdml.DataFrame via heuristics with Pipeline.from_DataFrame(...) or auto_code(...).

  • Pipeline DAG: Visualize the flow with plot_sankey(); supports schema-changing transforms.
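
To make the single-view idea concrete, here is a minimal pure-Python sketch of how chained preprocessing steps can be rendered as one view with a WITH clause and a final renaming SELECT. It is an illustration only and not tdprepview's implementation or API: the function build_view_sql, its parameters, and the table, view, and column names are all hypothetical, and the generated SQL is not executed against a database here.

```python
from typing import Dict, List, Tuple

# Hypothetical step format: (CTE name, {column: SQL expression replacing that column}).
Step = Tuple[str, Dict[str, str]]


def build_view_sql(
    view: str,
    source: str,
    columns: List[str],
    steps: List[Step],
    final_names: Dict[str, str],
) -> str:
    """Render chained steps as one view: a WITH clause plus a final SELECT."""
    if not steps:
        raise ValueError("at least one preprocessing step is required")

    ctes = []
    previous = source
    for cte_name, expressions in steps:
        # Columns without an expression pass through unchanged.
        select_list = ",\n        ".join(
            f"{expressions[c]} AS {c}" if c in expressions else c
            for c in columns
        )
        ctes.append(
            f"{cte_name} AS (\n    SELECT\n        {select_list}\n    FROM {previous}\n)"
        )
        # The next step reads from the CTE that was just defined.
        previous = cte_name

    # The final SELECT exposes every column under its output name.
    final_list = ",\n    ".join(
        f"{c} AS {final_names.get(c, c)}" for c in columns
    )
    return (
        f"CREATE VIEW {view} AS\n"
        + "WITH " + ",\n".join(ctes) + "\n"
        + f"SELECT\n    {final_list}\nFROM {previous}"
    )


# Illustrative input: impute missing amounts, then rescale them.
sql = build_view_sql(
    view="sales_prepared",
    source="sales_raw",
    columns=["customer_id", "amount"],
    steps=[
        ("imputed", {"amount": "COALESCE(amount, 0.0)"}),
        ("scaled", {"amount": "(amount - 10.0) / 90.0"}),
    ],
    final_names={"amount": "amount_scaled"},
)
print(sql)
```

Because every step becomes a named subquery inside one statement, the whole pipeline can be read, reviewed, and deployed as a single database object. The constants in the scaling expression stand in for values a fitted pipeline would have learned from training data.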


Supported Preprocessors

(as of v1.5.0)