Removed the deprecated MatplotlibWriter dataset. Use MatplotlibDataset to handle Matplotlib objects.
Grouped the datasets documentation by dependency to clean up the navigation bar.
Added a mode save argument to ibis.TableDataset, supporting the "append", "overwrite", "error"/"errorifexists", and "ignore" save modes. The deprecated overwrite save argument is mapped to mode for backward compatibility and will be removed in a future release. Specifying both mode and overwrite results in an error.
Added credentials support to ibis.TableDataset.
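The backward-compatible handling of the mode and overwrite save arguments can be sketched as follows. This is a hypothetical helper (resolve_save_mode is not part of kedro-datasets), and it assumes that overwrite=True maps to "overwrite", overwrite=False maps to "error", and "overwrite" is the default when neither is given; the real ibis.TableDataset mapping and default may differ.

```python
import warnings

# Hypothetical sketch, NOT the kedro-datasets implementation.
# Assumed mapping: overwrite=True -> "overwrite", overwrite=False -> "error".
_VALID_MODES = {"append", "overwrite", "error", "errorifexists", "ignore"}


def resolve_save_mode(save_args: dict, default: str = "overwrite") -> str:
    """Return the effective save mode from user-supplied save_args."""
    has_mode = "mode" in save_args
    has_overwrite = "overwrite" in save_args
    if has_mode and has_overwrite:
        # Both arguments together are ambiguous, so reject them.
        raise ValueError("Cannot specify both 'mode' and deprecated 'overwrite'.")
    if has_overwrite:
        warnings.warn(
            "'overwrite' is deprecated; use 'mode' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return "overwrite" if save_args["overwrite"] else "error"
    mode = str(save_args.get("mode", default)).lower()
    if mode not in _VALID_MODES:
        raise ValueError(f"Unsupported save mode: {mode!r}")
    # Treat "errorifexists" as an alias of "error".
    return "error" if mode == "errorifexists" else mode
```

For example, resolve_save_mode({"mode": "append"}) returns "append", while resolve_save_mode({"overwrite": False}) returns "error" and emits a DeprecationWarning.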
Added the following new datasets:
| Type | Description | Location |
| --- | --- | --- |
| openxml.PptxDataset | A dataset for loading and saving .pptx files (Microsoft PowerPoint) using python-pptx | kedro_datasets.openxml |
Graduated the following experimental datasets to core:
| Type | Description | Location |
| --- | --- | --- |
| langchain.ChatOpenAIDataset | Kedro dataset for loading a ChatOpenAI LangChain model. | kedro_datasets.langchain |
| langchain.OpenAIEmbeddingsDataset | Kedro dataset for loading an OpenAIEmbeddings model. | kedro_datasets.langchain |
| langchain.ChatAnthropicDataset | A dataset for loading a ChatAnthropic LangChain model. | kedro_datasets.langchain |
| langchain.ChatCohereDataset | A dataset for loading a ChatCohere LangChain model. | kedro_datasets.langchain |
Added the following new experimental datasets:
| Type | Description | Location |
| --- | --- | --- |
| langfuse.LangfuseTraceDataset | Kedro dataset to provide Langfuse tracing clients and callbacks | kedro_datasets_experimental.langfuse |
| langchain.LangChainPromptDataset | Kedro dataset for loading LangChain prompts | kedro_datasets_experimental.langchain |
| pypdf.PDFDataset | Kedro dataset to read PDF files and extract text using pypdf | kedro_datasets_experimental.pypdf |
| langfuse.LangfusePromptDataset | Kedro dataset for managing Langfuse prompts | kedro_datasets_experimental.langfuse |
| opik.OpikPromptDataset | A dataset to provide Opik integration for handling prompts | kedro_datasets_experimental.opik |
| opik.OpikTraceDataset | Kedro dataset to provide Opik tracing clients and callbacks | kedro_datasets_experimental.opik |
Bug fixes and other changes
Added the HTMLPreview type.
Fixed StudyDataset to properly propagate an RDB password through the dataset's credentials.
Community contributions
Many thanks to the following Kedroids for contributing PRs to this release:
A dataset for loading and saving .docx files (Microsoft Word) using python-docx
kedro_datasets.openxml
Bug fixes and other changes
Fixed PartitionedDataset to reliably load newly created partitions, particularly with ParallelRunner, by ensuring load() always re-scans the filesystem.
Added an encoding parameter to SQLQueryDataset to choose the encoding used for the query.
Corrected the APIDataset docstring to clarify that request parameters should be passed via load_args, not as top-level arguments.
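Why re-scanning on every load() matters for the PartitionedDataset fix can be illustrated with two hypothetical listers (CachedPartitionLister and RescanningPartitionLister are illustrative names, not kedro-datasets classes). This is a minimal sketch using a local directory; the real dataset lists partitions through fsspec and has its own caching logic.

```python
import os
import tempfile
from typing import List, Optional


class CachedPartitionLister:
    """Lists partitions once and reuses the result (the problematic pattern)."""

    def __init__(self, path: str):
        self._path = path
        self._cache: Optional[List[str]] = None

    def load(self) -> List[str]:
        if self._cache is None:
            self._cache = sorted(os.listdir(self._path))
        return self._cache


class RescanningPartitionLister:
    """Re-lists the directory on every load, so new partitions are visible."""

    def __init__(self, path: str):
        self._path = path

    def load(self) -> List[str]:
        return sorted(os.listdir(self._path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "part-1.csv"), "w").close()
        cached, fresh = CachedPartitionLister(tmp), RescanningPartitionLister(tmp)
        cached.load()
        fresh.load()
        # A new partition appears after the first load.
        open(os.path.join(tmp, "part-2.csv"), "w").close()
        print(cached.load())  # ['part-1.csv']
        print(fresh.load())  # ['part-1.csv', 'part-2.csv']
```

The cached variant misses partitions written after its first load, which is the situation that arises when another process (for example a ParallelRunner worker) creates partitions between loads.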
Breaking changes
kedro-datasets now requires Kedro 1.0.0 or higher.
Community contributions
Many thanks to the following Kedroids for contributing PRs to this release:
Renamed the CLI option --group-in-memory to --group-by, which accepts the values memory or namespace. Grouping by memory behaves as before.
Added a parameter to enable/disable lazy saving for PartitionedDataset.
Added ibis-athena and ibis-databricks extras for the backends added in Ibis 10.0.
Renamed MatplotlibWriter to MatplotlibDataset for consistency with other dataset naming conventions. MatplotlibWriter is deprecated and will be removed in a future release.
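The lazy-saving switch for PartitionedDataset can be sketched as below, assuming the usual Kedro convention that a callable partition value is invoked at save time to produce the data. The function materialise_partitions and its save_lazily flag are hypothetical names for illustration; check the PartitionedDataset API reference for the actual parameter name and behaviour.

```python
from typing import Any, Dict


def materialise_partitions(
    partitions: Dict[str, Any], save_lazily: bool = True
) -> Dict[str, Any]:
    """Resolve the object that would be written for each partition (hypothetical).

    With save_lazily=True, callable values are invoked at save time, so the data
    is only produced when it is written. With save_lazily=False, values are
    written as-is, which matters when the data itself is a callable object.
    """
    resolved = {}
    for partition_id, value in partitions.items():
        if save_lazily and callable(value):
            value = value()
        resolved[partition_id] = value
    return resolved
```

Disabling lazy saving is useful when a partition's payload is itself callable (for instance a model object), since calling it would save the wrong thing.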
Added the following new experimental datasets:
| Type | Description | Location |
| --- | --- | --- |
| optuna.StudyDataset | A dataset for saving and loading Optuna studies. | kedro_datasets_experimental.optuna |
| darts.DartsTorchModelDataset | A dataset for securely saving and loading Darts Torch Forecasting Models. | kedro_datasets_experimental.darts |
Bug fixes and other changes
Fixed the polars.CSVDataset save method on Windows by using UTF-8 as the default encoding.
Made table_name a keyword argument in the ibis.FileDataset implementation to be compatible with Ibis 10.0.
Fixed how sessions are handled in the snowflake.SnowflakeTableDataset implementation.
Fixed credentials handling in pandas.GBQQueryDataset and pandas.GBQTableDataset.
Breaking changes
Removed tracking.MetricsDataset and tracking.JSONDataset.
Community contributions
Many thanks to the following Kedroids for contributing PRs to this release:
Added support for passing database to ibis.TableDataset for load and save operations.
Added functionality to save pandas DataFrames directly to Snowflake, facilitating seamless .csv ingestion.
Added Python 3.9, 3.10 and 3.11 support for snowflake.SnowflakeTableDataset.
Enabled connection sharing between ibis.FileDataset and ibis.TableDataset instances, thereby allowing nodes to save data loaded by one to the other (as long as they share the same connection configuration).
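Connection sharing between dataset classes can be sketched with a class-level cache keyed on the connection configuration, so two instances with identical configuration reuse one connection. All names here (FakeConnection, ConnectionMixin, FileDatasetSketch, TableDatasetSketch) are hypothetical stand-ins, and the sketch assumes configuration values are hashable scalars; it is not the ibis dataset code.

```python
from typing import Any, ClassVar, Dict, Tuple


class FakeConnection:
    """Stand-in for a backend connection (hypothetical)."""

    def __init__(self, config: Dict[str, Any]):
        self.config = dict(config)


class ConnectionMixin:
    """Shares one connection per distinct configuration across all subclasses."""

    # Defined once on the mixin, so every subclass mutates the same dict.
    _connections: ClassVar[Dict[Tuple, FakeConnection]] = {}

    def __init__(self, connection_config: Dict[str, Any]):
        self._connection_config = connection_config

    @property
    def connection(self) -> FakeConnection:
        # Hashable key built from the sorted key/value pairs of the config.
        key = tuple(sorted(self._connection_config.items()))
        if key not in self._connections:
            self._connections[key] = FakeConnection(self._connection_config)
        return self._connections[key]


class FileDatasetSketch(ConnectionMixin):
    pass


class TableDatasetSketch(ConnectionMixin):
    pass
```

Because the cache lives on the shared base class, a file-backed dataset and a table-backed dataset configured with the same backend settings resolve to the same connection object, which is what lets one save data the other loaded.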
Added the following new experimental datasets:
| Type | Description | Location |
| --- | --- | --- |
| databricks.ExternalTableDataset | A dataset for accessing external tables in Databricks. | kedro_datasets_experimental.databricks |
| safetensors.SafetensorsDataset | A dataset for securely saving and loading files in the SafeTensors format. | kedro_datasets_experimental.safetensors |
Bug fixes and other changes
Delayed backend connection for pandas.GBQTableDataset. In practice, this means that a dataset's connection details aren't used (or validated) until the dataset is accessed. On the plus side, the cost of connecting isn't incurred until the dataset is actually used, and never if it isn't. Furthermore, this makes the dataset object serializable (e.g. for use with ParallelRunner), because the unserializable client isn't part of it.
Removed the unused BigQuery client created in pandas.GBQQueryDataset. This makes the dataset object serializable (e.g. for use with ParallelRunner) by removing the unserializable object.
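The serializability gain from delaying the connection can be sketched with two hypothetical dataset classes. A threading.Lock stands in for the BigQuery client because, like many client objects, it cannot be pickled; EagerDataset, LazyDataset and make_client are illustrative names, not the GBQ dataset code.

```python
import pickle
import threading


def make_client():
    """Stand-in for a backend client; a lock, like many clients, can't be pickled."""
    return threading.Lock()


class EagerDataset:
    """Creates the client in __init__, so the client is part of the object."""

    def __init__(self, project: str):
        self.project = project
        self._client = make_client()


class LazyDataset:
    """Defers client creation until first access and keeps it out of pickles."""

    def __init__(self, project: str):
        self.project = project
        self._client = None

    @property
    def client(self):
        if self._client is None:
            self._client = make_client()
        return self._client

    def __getstate__(self):
        # Drop the client; it is recreated lazily after unpickling.
        state = self.__dict__.copy()
        state["_client"] = None
        return state
```

Pickling an EagerDataset raises TypeError because of the embedded lock, whereas a LazyDataset pickles cleanly even after its client has been created, which is the property a multiprocessing runner relies on.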