

Editable Datasets in Dataiku

Author: Robert Rouse

Effective data management is crucial in data analytics. Dataiku's editable datasets let you edit data directly within the platform. This article (and companion video) is a quick guide to using editable datasets to streamline your data workflows.

Why Use Editable Datasets in Dataiku?

Editable datasets offer several advantages:

  • Direct Editing: Make changes directly within the platform.
  • Metadata Customization: Add, group, or rename values efficiently.
  • Spreadsheet-Like Interface: Familiar functionality similar to Excel.
  • Workflow Integration: Join datasets with any data source, including APIs and databases.

Getting Started

  1. Log in to Dataiku and navigate to the project where you want to create an editable table.
  2. Click on “Datasets” on the left panel and select “New Dataset.”

Importing Data

Figure: Selecting “Editable Datasets.”

  1. Select “Editable” as the dataset type.
  2. Choose your starting point:
     • From Scratch: Create a new, empty dataset.
     • From Existing Dataset: Use an existing dataset.
     • From File: Import data from an external file (e.g., an Excel spreadsheet).
  3. For this example, we’ll import an Excel spreadsheet:
     • Click on “From File.”
     • Drag and drop your Excel file into Dataiku or browse to upload it.
     • Preview and adjust the schema as needed.
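
To make the "preview and adjust the schema" step concrete, here is a minimal sketch of what checking a schema amounts to: guessing a type for each column from its values. It is not how Dataiku infers types. The sample data, the column names (name, url, visits), and the infer_type helper are all hypothetical, and a small CSV stands in for the Excel file because the Python standard library cannot read Excel.

```python
import csv
import io


def infer_type(values):
    """Guess a column type ("int", "float", or "string") from string values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "string"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)  # raises ValueError if the value does not fit the type
            return name
        except ValueError:
            continue
    return "string"


# Hypothetical sample standing in for the uploaded spreadsheet.
sample = io.StringIO(
    "name,url,visits\n"
    "Docs,htp://example.com/docs,120\n"
    "Blog,http://example.com/blog,45\n"
)
reader = csv.DictReader(sample)
rows = list(reader)

# One guessed type per column; this is the part you would review and adjust.
schema = {col: infer_type([row[col] for row in rows]) for col in reader.fieldnames}
print(schema)
```

A guess like this is only a starting point, which is why the preview step matters: a column of IDs that happens to look numeric may still be better stored as a string.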

Editing Datasets

  1. Start Editing Directly: Click “Edit” to unlock the dataset for editing. Modify column values, add metadata, or rename fields as required. For example, in our case we’re fixing URL errors by correcting “http://” prefixes.
  2. Change Data Types: Identify incorrect data types and change them. Click on the column header, then select the appropriate data type (e.g., “String”).
  3. Save Changes: Click “Save” after editing to store changes.

Figure: Editing URL typos found in a dataset.
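
If the same URL typo appears in many rows, fixing each cell by hand gets tedious. The sketch below shows one way to express the same correction as a rule in Python. The article does not list the actual typos, so the patterns handled here (such as "htp://" or "http:/"), the sample rows, and the fix_url_prefix helper are assumptions for illustration only.

```python
import re

# Assumed typo patterns: a misspelled scheme ("htp", "htttp") followed by a
# malformed separator (":/", "//", ";//"). Requiring a colon or slash keeps
# hostnames such as "httpbin.org" from being mistaken for a prefix.
_BROKEN_PREFIX = re.compile(r"^h+t+p+(s?)(?:[:;]/{0,3}|/{1,3})", re.IGNORECASE)


def fix_url_prefix(value):
    """Return the URL with a well-formed "http://" or "https://" prefix."""
    text = value.strip()
    match = _BROKEN_PREFIX.match(text)
    if match:
        scheme = "https" if match.group(1) else "http"
        return f"{scheme}://{text[match.end():]}"
    # Assumption: a value with no recognizable prefix gets plain "http://".
    return f"http://{text}"


# Hypothetical rows resembling the dataset being edited.
rows = [
    {"name": "Docs", "url": "htp://example.com/docs"},
    {"name": "Blog", "url": "http:/example.com/blog"},
    {"name": "Shop", "url": "https://example.com/shop"},
]
cleaned = [{**row, "url": fix_url_prefix(row["url"])} for row in rows]

for row in cleaned:
    print(row["name"], row["url"])
```

Manual editing remains the better fit for one-off fixes; a rule like this pays off mainly when the same mistake repeats across the file.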

Joining and Integrating with Other Sources

  1. Join with Other Sources: Click on “Flow” to view the data workflow. Add a “Join” step to combine your edited dataset with other data sources, such as databases or APIs.
  2. Integrate with Workflows: Incorporate the edited dataset into your data pipeline. Analyze or visualize the data using notebooks, dashboards, or custom scripts.
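
For readers who want to see what the join step does conceptually, here is a rough sketch of a left join in plain Python. In Dataiku the join is configured visually in the Flow; this code only illustrates the logic. The site_id key, the sample rows, and the left_join helper are hypothetical.

```python
def left_join(left, right, key):
    """Left-join two lists of dict rows on a shared key column."""
    # Index the right-hand rows by key so each lookup is a dictionary access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    # Columns that exist only on the right side, used to pad unmatched rows.
    right_cols = {col for row in right for col in row} - {key}

    joined = []
    for row in left:
        matches = index.get(row[key])
        if matches:
            for match in matches:
                joined.append({**match, **row})  # left values win on conflicts
        else:
            joined.append({**{col: None for col in right_cols}, **row})
    return joined


# Hypothetical edited dataset and a second source (e.g., rows from a database).
edited = [
    {"site_id": 1, "url": "http://example.com/docs"},
    {"site_id": 2, "url": "http://example.com/blog"},
]
visits = [
    {"site_id": 1, "visits": 120},
    {"site_id": 3, "visits": 7},
]

joined = left_join(edited, visits, "site_id")
for row in joined:
    print(row)
```

A left join keeps every row of the edited dataset even when the other source has no match, which is usually what you want when the hand-curated data is the reference.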

Editable datasets in Dataiku provide a powerful solution for managing data directly within the platform. This feature not only improves data accuracy, but also enhances workflow efficiency by eliminating the need for external tools.
