Changelog#
All notable changes to this project will be documented here.
Entries are listed from latest to oldest.
v1.5.0 (2024-09-19)#
Added#
- Preprocessing Function `TargetEncoder`: Encodes categories based on a shrunk estimate of target means, mixing global and conditional means.
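To illustrate what a "shrunk estimate" of target means can look like, here is a minimal pure-Python sketch of additive smoothing: each category mean is pulled toward the global mean, more strongly for rare categories. The changelog does not state the exact formula `TargetEncoder` uses, so the function name, the `smoothing` parameter, and the formula below are assumptions for illustration only.

```python
from collections import defaultdict


def shrunk_target_means(categories, targets, smoothing=10.0):
    """Return {category: shrunk mean} blending per-category and global means.

    Hypothetical helper, not part of tdprepview. Uses the common form
        (sum_c + smoothing * global_mean) / (n_c + smoothing)
    so a category with few rows stays close to the global mean.
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    if not targets:
        raise ValueError("at least one row is required")

    global_mean = sum(targets) / len(targets)

    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1

    return {
        category: (sums[category] + smoothing * global_mean)
        / (counts[category] + smoothing)
        for category in counts
    }


# Example call; "b" has a single row, so its estimate moves toward the global mean.
encoding = shrunk_target_means(["a", "a", "b"], [1, 1, 0], smoothing=1.0)
```

With `smoothing=0` the result reduces to the plain per-category mean; larger values weight the global mean more heavily.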
v1.4.0 (2024-04-08)#
Added#
- Automatic Pipeline Creation
  - `Pipeline.from_DataFrame(...)`, `auto_code(...)`
  - Automatically builds a pipeline based on heuristics using data types and distributions.
v1.3.2 (2024-03-28)#
Added#
- Pipeline Persistence
  - Serialize with: `mypipeline.to_dict()`, `mypipeline.to_json("mypipeline.json")`
  - Restore with: `Pipeline.from_dict(mydict)`, `Pipeline.from_json("mypipeline.json")`
v1.3.1 (2024-03-26)#
Added#
- Preprocessing Function `PowerTransformer` (analogous to sklearn `PowerTransformer`)
v1.3.0 (2024-03-06)#
Added#
- Preprocessing Function `Cast` (analogous to SQL `CAST`; useful for converting all columns except target and key to float in the final step)
Changed#
- `column_exclude` option for pipeline step inputs.
v1.2.0 (2024-03-06)#
Added#
- Preprocessing Function `MultiLabelBinarizer`: Encodes delimited multi-value text columns into binary variables.
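The idea of turning a delimited multi-value text column into binary variables can be sketched in plain Python as follows. This is not the tdprepview implementation (which generates SQL); the function name and the `delimiter` parameter are hypothetical.

```python
def multilabel_binarize(values, delimiter=","):
    """Split each delimited string and return (labels, 0/1 matrix).

    Hypothetical helper for illustration. Labels are stripped of
    surrounding whitespace, empty tokens are ignored, and output
    columns are ordered alphabetically.
    """
    split_rows = [
        {token.strip() for token in value.split(delimiter) if token.strip()}
        for value in values
    ]
    labels = sorted(set().union(*split_rows))
    matrix = [
        [1 if label in row else 0 for label in labels]
        for row in split_rows
    ]
    return labels, matrix


labels, matrix = multilabel_binarize(["red, blue", "blue", ""])
# labels -> ['blue', 'red']
# matrix -> [[1, 1], [1, 0], [0, 0]]
```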
v1.1.0 (2024-03-01)#
Bugfixes, compatibility updates, optional requirements, feature hashing.
Added#
- Preprocessing Function `SimpleHashEncoder`: Hash-encodes text columns using the in-database `HASHROW` function.
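As a rough illustration of hash encoding, the sketch below maps a text value to one of a fixed number of buckets. It uses SHA-256 from the Python standard library as a stand-in for the database's `HASHROW`, so the resulting bucket numbers will not match what `SimpleHashEncoder` produces; the function name and `num_buckets` parameter are assumptions.

```python
import hashlib


def hash_bucket(text, num_buckets=8):
    """Map a string to a stable bucket index in [0, num_buckets).

    Hypothetical helper: SHA-256 stands in for the in-database hash.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Interpret the first 4 bytes as an unsigned integer, then fold into range.
    return int.from_bytes(digest[:4], "big") % num_buckets


# The same input always lands in the same bucket.
buckets = [hash_bucket(city) for city in ["Berlin", "Paris", "Berlin"]]
```

Distinct values can collide in the same bucket; that is the usual trade-off of hash encoding against a fixed output width.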
Changed#
- Smarter SQL execution:
  - Uses `tdml.get_context().execute(q)` or `tdml.execute_sql(q)` depending on version.
- Bugfix: `IterativeImputer` query generation.
- `plotly` & `seaborn` are now optional dependencies.
v1.0.2 (2023-03-06)#
Major overhaul: `tdprepview` now supports schema-changing transformations (e.g. OneHotEncoding), based on a directed acyclic graph (DAG).
Added#
- `plot_sankey()` for pipeline DAG visualization.
- New preprocessing functions (adapted from sklearn):
  - SimpleImputer, IterativeImputer, StandardScaler, MaxAbsScaler, MinMaxScaler, RobustScaler
  - Normalizer, QuantileTransformer, Binarizer, PolynomialFeatures
  - OneHotEncoder, PCA
Changed#
- Single optimized query with `WITH AS` for all preprocessing.
- Flexible `Pipeline` steps with new input filtering via dictionaries:
  - `{'pattern': ...}`
  - `{'prefix': ...}`
  - `{'suffix': ...}`
  - `{'dtype_include': [...]}`
  - `{'dtype_exclude': [...]}`
- Options for column renaming in steps:
  - `{'prefix': ...}`
  - `{'suffix': ...}`
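The dictionary-based input filters listed above can be pictured with a small pure-Python sketch. The dictionary keys come from the changelog; the helper `select_columns`, the name-to-dtype mapping, and the exact matching rules (for instance, `re.search` for `pattern`) are assumptions and may differ from how tdprepview resolves columns.

```python
import re


def select_columns(columns, selector):
    """Return the column names in `columns` matched by a selector dict.

    Hypothetical helper. `columns` maps column name -> dtype string;
    `selector` holds exactly one of the keys handled below.
    """
    if "pattern" in selector:
        regex = re.compile(selector["pattern"])
        return [name for name in columns if regex.search(name)]
    if "prefix" in selector:
        return [name for name in columns if name.startswith(selector["prefix"])]
    if "suffix" in selector:
        return [name for name in columns if name.endswith(selector["suffix"])]
    if "dtype_include" in selector:
        wanted = set(selector["dtype_include"])
        return [name for name, dtype in columns.items() if dtype in wanted]
    if "dtype_exclude" in selector:
        unwanted = set(selector["dtype_exclude"])
        return [name for name, dtype in columns.items() if dtype not in unwanted]
    raise ValueError(f"unsupported selector: {selector!r}")


columns = {"age": "int", "age_bucket": "str", "income": "float", "id": "int"}

select_columns(columns, {"prefix": "age"})          # -> ['age', 'age_bucket']
select_columns(columns, {"dtype_exclude": ["int"]})  # -> ['age_bucket', 'income']
```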
v0.1.4 (2023-02-17)#
Added#
- Preprocessing Function `DecisionTreeBinning`
v0.1.3 (2023-02-16)#
Added#
- Quickstart guide in README.
v0.1.2 (2023-02-15)#
Fixed#
- Added `*.sql` to MANIFEST.in so SQL templates are included in the distribution.
Changed#
- Converted HISTORY and README from reStructuredText to Markdown.
v0.1.0 (2023-02-15)#
Added#
- First release on PyPI.
- `Pipeline` with `fit` and `transform` functions.
- Preprocessing functions:
  - Impute, ImputeText, TryCast, Scale, CutOff, FixedWidthBinning, ThresholdBinarizer
  - ListBinarizer, VariableWidthBinning, LabelEncoder, CustomTransformer
- Example notebooks & demo notebook.