Changelog

All notable changes to this project will be documented here.
Entries are listed from latest to oldest.


v1.5.0 (2024-09-19)

Added

  • Preprocessing Function
    • TargetEncoder: Encodes categories based on a shrunk estimate of target means, mixing global and conditional means.
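
The shrinkage idea behind TargetEncoder can be sketched in plain Python. This is not tdprepview's implementation or signature: the function name `target_encode`, the `smoothing` parameter, and the weighting `n / (n + smoothing)` are assumptions chosen only to show how a category mean and the global mean might be mixed.

```python
from collections import defaultdict


def target_encode(categories, targets, smoothing=10.0):
    """Map each category to a blend of its own target mean and the global mean.

    Hypothetical helper for illustration only; the weighting scheme
    n / (n + smoothing) is an assumption, not tdprepview's formula.
    """
    global_mean = sum(targets) / len(targets)

    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1

    encoding = {}
    for cat, n in counts.items():
        cat_mean = sums[cat] / n
        weight = n / (n + smoothing)  # more rows: trust the category mean more
        encoding[cat] = weight * cat_mean + (1.0 - weight) * global_mean
    return encoding


cats = ["a", "a", "a", "b"]
ys = [1.0, 1.0, 0.0, 0.0]
enc = target_encode(cats, ys, smoothing=1.0)
```

With this weighting, a rare category such as "b" is pulled strongly toward the global mean, while a frequent category stays close to its own mean.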

v1.4.0 (2024-04-08)

Added

  • Automatic Pipeline Creation
    • Pipeline.from_DataFrame(...)
    • auto_code(...)
    • Automatically builds a pipeline using heuristics based on data types and distributions.

v1.3.2 (2024-03-28)

Added

  • Pipeline Persistence
    • Serialize with:
      • mypipeline.to_dict()
      • mypipeline.to_json("mypipeline.json")
    • Restore with:
      • Pipeline.from_dict(mydict)
      • Pipeline.from_json("mypipeline.json")
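
The round trip implied by these methods can be sketched with a toy stand-in class. `ToyPipeline` and its `steps` attribute are hypothetical and only mirror the method names listed above; the actual structure that tdprepview serializes is not described in this entry.

```python
import json
import os
import tempfile


class ToyPipeline:
    """Hypothetical stand-in that mirrors the method names in this entry."""

    def __init__(self, steps):
        # Assumed: a list of JSON-serializable step descriptions.
        self.steps = steps

    def to_dict(self):
        return {"steps": self.steps}

    def to_json(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.to_dict(), f)

    @classmethod
    def from_dict(cls, d):
        return cls(d["steps"])

    @classmethod
    def from_json(cls, path):
        with open(path, "r", encoding="utf-8") as f:
            return cls.from_dict(json.load(f))


original = ToyPipeline([{"input": "age", "function": "Scale"}])

# Write to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mypipeline.json")
    original.to_json(path)
    restored = ToyPipeline.from_json(path)
```

Keeping `to_json` and `from_json` as thin wrappers around `to_dict` and `from_dict` means there is only one place where the serialized layout is defined.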

v1.3.1 (2024-03-26)

Added

  • Preprocessing Function
    • PowerTransformer (analogous to sklearn PowerTransformer)
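
As background, sklearn's PowerTransformer defaults to the Yeo-Johnson method. The sketch below applies that transform to a single value for a fixed λ. It is an illustration only: the function name is made up, and how tdprepview estimates λ or expresses the transform in SQL is not covered by this entry.

```python
import math


def yeo_johnson(x, lmbda):
    """Yeo-Johnson power transform of one value for a given lambda.

    Illustrative helper; not part of tdprepview's API.
    """
    if x >= 0:
        if abs(lmbda) < 1e-12:
            return math.log1p(x)  # limit case lambda == 0
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if abs(lmbda - 2.0) < 1e-12:
        return -math.log1p(-x)  # limit case lambda == 2
    return -(((-x + 1.0) ** (2.0 - lmbda)) - 1.0) / (2.0 - lmbda)


# lambda == 1 leaves values unchanged, which makes a handy sanity check.
identity_pos = yeo_johnson(3.0, 1.0)
identity_neg = yeo_johnson(-3.0, 1.0)
```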

v1.3.0 (2024-03-06)

Added

  • Preprocessing Function
    • Cast (analogous to SQL, useful to convert all but target and key to float in the final step)

Changed

  • column_exclude option for pipeline step inputs.

v1.2.0 (2024-03-06)

Added

  • Preprocessing Function
    • MultiLabelBinarizer: Encode multi-value text columns (delimited) into binary variables.
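
What this encoding produces can be shown with a small local sketch. The function name, the `delimiter` parameter, and the alphabetical column order are assumptions for illustration; tdprepview performs the transformation in the database and its exact options are not listed in this entry.

```python
def multilabel_binarize(values, delimiter=","):
    """Turn delimited strings into one 0/1 column per distinct label.

    Hypothetical helper for illustration; returns (labels, rows).
    """
    # Split each value, trim whitespace, and drop empty fragments.
    split_rows = [
        {part.strip() for part in v.split(delimiter) if part.strip()}
        for v in values
    ]
    # One output column per distinct label, in sorted order.
    labels = sorted(set().union(*split_rows))
    rows = [
        [1 if label in present else 0 for label in labels]
        for present in split_rows
    ]
    return labels, rows


labels, rows = multilabel_binarize(["red,blue", "blue", ""])
```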

v1.1.0 (2024-03-01)

Bugfixes, compatibility updates, optional requirements, feature hashing.

Added

  • Preprocessing Function
    • SimpleHashEncoder: Hash-encode text columns using in-DB HASHROW.
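
The general idea of hash encoding, mapping arbitrary text to a fixed number of integer buckets, can be sketched locally. `zlib.crc32` is only a stand-in here and will not reproduce HASHROW values; the function name and the `num_buckets` parameter are likewise assumptions.

```python
import zlib


def hash_encode(value, num_buckets=1024):
    """Map a string to a bucket index in [0, num_buckets).

    Illustrative only: uses CRC32 locally instead of in-database HASHROW.
    """
    digest = zlib.crc32(value.encode("utf-8"))
    return digest % num_buckets


buckets = [hash_encode(v, num_buckets=16) for v in ["Berlin", "Paris", "Berlin"]]
```

Equal inputs always land in the same bucket, while distinct inputs may collide, which is the usual trade-off of feature hashing.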

Changed

  • Smarter SQL execution:
    • Uses tdml.get_context().execute(q) or tdml.execute_sql(q), depending on the installed teradataml version.
  • Bugfix: IterativeImputer query generation.
  • plotly and seaborn are now optional dependencies.
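
One way to choose between the two execution paths is sketched below. The entry only says the choice depends on the version; this sketch uses feature detection with `hasattr` instead of comparing version strings, which is an assumption. `run_query` is a hypothetical helper, and the two `SimpleNamespace` objects are fakes standing in for the module so the example runs without a database.

```python
from types import SimpleNamespace


def run_query(tdml, q):
    """Run q through whichever entry point the given module object offers.

    Hypothetical helper; prefers execute_sql when it exists.
    """
    if hasattr(tdml, "execute_sql"):
        return tdml.execute_sql(q)
    return tdml.get_context().execute(q)


# Fake module objects for demonstration; they just report which path ran.
newer = SimpleNamespace(execute_sql=lambda q: ("execute_sql", q))
older = SimpleNamespace(
    get_context=lambda: SimpleNamespace(execute=lambda q: ("context.execute", q))
)

result_newer = run_query(newer, "SELECT 1")
result_older = run_query(older, "SELECT 1")
```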

v1.0.2 (2023-03-06)

Major overhaul:
tdprepview now supports schema-changing transformations (e.g. OneHotEncoding), based on a directed acyclic graph (DAG).

Added

  • plot_sankey() for pipeline DAG visualization.
  • New preprocessing functions (adapted from sklearn):
    • SimpleImputer, IterativeImputer, StandardScaler, MaxAbsScaler, MinMaxScaler, RobustScaler
    • Normalizer, QuantileTransformer, Binarizer, PolynomialFeatures
    • OneHotEncoder, PCA

Changed

  • All preprocessing now runs as a single optimized query built from WITH ... AS clauses (common table expressions).
  • Flexible Pipeline steps with new input filtering via dictionaries:
    • {'pattern': ...}
    • {'prefix': ...}
    • {'suffix': ...}
    • {'dtype_include': [...]}
    • {'dtype_exclude': [...]}
  • Options for column renaming in steps:
    • {'prefix': ...}
    • {'suffix': ...}
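
How the input filter dictionaries above might select columns can be sketched as follows. The semantics are assumptions: 'pattern' is treated as a regular expression, the dtype names are placeholders, and `select_columns` is a hypothetical helper rather than a tdprepview function.

```python
import re


def select_columns(columns, spec):
    """Return the column names matched by one filter dictionary.

    columns: dict mapping column name -> dtype string (illustrative dtypes).
    spec: a dict with exactly one of the keys handled below.
    """
    names = list(columns)
    if "pattern" in spec:
        # Assumed: the pattern is a regular expression searched in the name.
        return [c for c in names if re.search(spec["pattern"], c)]
    if "prefix" in spec:
        return [c for c in names if c.startswith(spec["prefix"])]
    if "suffix" in spec:
        return [c for c in names if c.endswith(spec["suffix"])]
    if "dtype_include" in spec:
        return [c for c in names if columns[c] in spec["dtype_include"]]
    if "dtype_exclude" in spec:
        return [c for c in names if columns[c] not in spec["dtype_exclude"]]
    raise ValueError(f"unsupported filter: {spec!r}")


schema = {"cust_id": "int", "amt_jan": "float", "amt_feb": "float", "city_name": "str"}

by_prefix = select_columns(schema, {"prefix": "amt_"})
by_dtype = select_columns(schema, {"dtype_exclude": ["float"]})
```

Filters like these let a step keep working when columns are added or removed upstream, which matters once transformations can change the schema.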

v0.1.4 (2023-02-17)

Added

  • Preprocessing Function
    • DecisionTreeBinning

v0.1.3 (2023-02-16)

Added

  • Quickstart guide in README.

v0.1.2 (2023-02-15)

Fixed

  • Added *.sql to MANIFEST.in so SQL templates are included in the distribution.

Changed

  • Converted HISTORY and README from reStructuredText → Markdown.

v0.1.0 (2023-02-15)

Added

  • First release on PyPI.
  • Pipeline with fit and transform functions.
  • Preprocessing functions:
    • Impute, ImputeText, TryCast, Scale, CutOff, FixedWidthBinning, ThresholdBinarizer
    • ListBinarizer, VariableWidthBinning, LabelEncoder, CustomTransformer
  • Example notebooks & demo notebook.