tdprepview docs

Changelog

All notable changes to this project will be documented here.
Entries are listed from latest to oldest.

v1.5.0 (2024-09-19)

Added

Preprocessing Function
- TargetEncoder: Encodes categories based on a shrunk estimate of target means, mixing global and conditional means.
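
The "shrunk estimate" idea can be illustrated outside the database. The sketch below is not tdprepview's implementation (TargetEncoder generates SQL); it assumes a common m-estimate form of shrinkage, and the function names and the `smoothing` parameter are hypothetical.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=10.0):
    """Learn a shrunk target mean per category (m-estimate smoothing).

    Assumed formula, for illustration only:
        encoded = (sum_of_targets + smoothing * global_mean) / (count + smoothing)
    Rare categories are pulled towards the global mean; frequent ones
    stay close to their own conditional mean.
    """
    if len(categories) != len(targets) or len(targets) == 0:
        raise ValueError("categories and targets must be non-empty and equally long")

    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1

    mapping = {
        category: (sums[category] + smoothing * global_mean) / (counts[category] + smoothing)
        for category in counts
    }
    return mapping, global_mean


def apply_target_encoding(values, mapping, global_mean):
    """Replace each category by its learned value; unseen categories get the global mean."""
    return [mapping.get(value, global_mean) for value in values]
```

With `smoothing=0` the encoding reduces to the plain per-category mean; larger values mix in more of the global mean.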

v1.4.0 (2024-04-08)

Added

Automatic Pipeline Creation
Automatically builds a pipeline from heuristics based on column data types and distributions:
- Pipeline.from_DataFrame(...)
- auto_code(...)

v1.3.2 (2024-03-28)

Added

Pipeline Persistence
- Serialize with:
  - mypipeline.to_dict()
  - mypipeline.to_json("mypipeline.json")
- Restore with:
  - Pipeline.from_dict(mydict)
  - Pipeline.from_json("mypipeline.json")

v1.3.1 (2024-03-26)

Added

Preprocessing Function
- PowerTransformer (analogous to sklearn PowerTransformer)

v1.3.0 (2024-03-06)

Added

Preprocessing Function
- Cast (analogous to SQL, useful to convert all but target and key to float in the final step)

Changed

Pipeline step inputs now support a column_exclude option.

v1.2.0 (2024-03-06)

Added

Preprocessing Function
- MultiLabelBinarizer: Encode multi-value text columns (delimited) into binary variables.
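
To show what this encoding produces, here is a minimal pure-Python sketch. It is not the tdprepview implementation, which runs in-database; the function name, the default delimiter, and the handling of missing values are assumptions made for illustration.

```python
def multilabel_binarize(values, delimiter=","):
    """Turn delimited multi-value strings into one 0/1 indicator column per label.

    Returns (labels, rows): labels are sorted alphabetically and each row
    holds one indicator per label, in the same order.
    Assumption: None or empty strings yield an all-zero row.
    """
    split_rows = []
    for value in values:
        if value is None:
            split_rows.append(set())
        else:
            # Strip whitespace around each label and drop empty fragments.
            split_rows.append({part.strip() for part in value.split(delimiter) if part.strip()})

    labels = sorted(set().union(*split_rows)) if split_rows else []
    rows = [[1 if label in row else 0 for label in labels] for row in split_rows]
    return labels, rows
```

For example, the inputs "red,blue" and "blue" produce the labels blue and red, with rows [1, 1] and [1, 0].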

v1.1.0 (2024-03-01)

Bugfixes, compatibility updates, optional requirements, feature hashing.

Added

Preprocessing Function
- SimpleHashEncoder: Hash-encode text columns using in-DB HASHROW.

Changed

- Smarter SQL execution: uses tdml.get_context().execute(q) or tdml.execute_sql(q), depending on the teradataml version.
- Bugfix: IterativeImputer query generation.
- plotly and seaborn are now optional dependencies.
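
The version-dependent SQL execution can be sketched as a small dispatch helper. This is an illustration only: it uses feature detection rather than whatever check tdprepview actually performs, and the helper name `run_sql` is hypothetical. The module is passed in as an argument so the sketch runs without a database connection.

```python
def run_sql(tdml, query):
    """Execute `query` through whichever entry point the given module offers.

    `tdml` is expected to be the imported teradataml module (or any object
    with the same two entry points named in the changelog).
    Assumption: prefer execute_sql when it exists, otherwise fall back to
    the context object's execute method.
    """
    execute_sql = getattr(tdml, "execute_sql", None)
    if callable(execute_sql):
        return execute_sql(query)
    return tdml.get_context().execute(query)
```

In real use this would be called as `run_sql(tdml, q)` after `import teradataml as tdml` and creating a connection context.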

v1.0.2 (2023-03-06)

Major overhaul: tdprepview now supports schema-changing transformations (e.g. OneHotEncoding), based on a directed acyclic graph (DAG).

Added

plot_sankey() for pipeline DAG visualization.
New preprocessing functions (adapted from sklearn):
- SimpleImputer, IterativeImputer, StandardScaler, MaxAbsScaler, MinMaxScaler, RobustScaler
- Normalizer, QuantileTransformer, Binarizer, PolynomialFeatures
- OneHotEncoder, PCA

Changed

- All preprocessing is now generated as a single optimized query using WITH ... AS clauses.
- Flexible Pipeline steps with new input filtering via dictionaries:
  - {'pattern': ...}
  - {'prefix': ...}
  - {'suffix': ...}
  - {'dtype_include': [...]}
  - {'dtype_exclude': [...]}
- Options for column renaming in steps:
  - {'prefix': ...}
  - {'suffix': ...}
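
The dictionary-based input filters listed above could behave roughly as in the following sketch. The selector keys come from the changelog, but the matching semantics (for example, treating 'pattern' as a regular expression searched anywhere in the column name) and the helper `select_columns` are assumptions, not the library's documented behavior.

```python
import re


def select_columns(columns, selector):
    """Pick column names from `columns` (name -> dtype string) using a one-key selector dict."""
    if len(selector) != 1:
        raise ValueError("selector must contain exactly one key")
    kind, value = next(iter(selector.items()))

    if kind == "pattern":
        # Assumption: the pattern is a regex matched anywhere in the name.
        return [name for name in columns if re.search(value, name)]
    if kind == "prefix":
        return [name for name in columns if name.startswith(value)]
    if kind == "suffix":
        return [name for name in columns if name.endswith(value)]
    if kind == "dtype_include":
        return [name for name, dtype in columns.items() if dtype in value]
    if kind == "dtype_exclude":
        return [name for name, dtype in columns.items() if dtype not in value]
    raise ValueError(f"unknown selector key: {kind!r}")
```

A call such as `select_columns({"amt_total": "float", "cust_id": "int"}, {"prefix": "amt_"})` would select only amt_total.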

v0.1.4 (2023-02-17)

Added

Preprocessing Function
- DecisionTreeBinning

v0.1.3 (2023-02-16)

Added

Quickstart guide in README.

v0.1.2 (2023-02-15)

Fixed

Added *.sql to MANIFEST.in so SQL templates are included in the distribution.

Changed

Converted HISTORY and README from reStructuredText to Markdown.

v0.1.0 (2023-02-15)

Added

First release on PyPI.
Pipeline with fit and transform functions.
Preprocessing functions:
- Impute, ImputeText, TryCast, Scale, CutOff, FixedWidthBinning, ThresholdBinarizer
- ListBinarizer, VariableWidthBinning, LabelEncoder, CustomTransformer
Example notebooks & demo notebook.