

Streamlining Data Projects with Custom Python Environments in Dataiku

Robert Rouse

Every analytics project deals with different data sources, and ingesting or structuring that data sometimes requires custom code. Dataiku is well suited to that need: it lets you manage Python packages and environments for cases where the standard database connections are insufficient. In the latest episode of Dataiku Delights, “Easy Python Package Management,” I walk through this feature and demonstrate how custom Python environments integrate into Dataiku.

Embracing External Libraries for Enhanced Functionality

Dataiku’s default installation includes many of the Python libraries that data scientists, engineers, and analysts rely on for day-to-day tasks. However, specialized tasks often demand specialized tools, and Dataiku makes it easy to extend its capabilities to fit your needs.

For example, interacting with an API to retrieve QuickBooks data can be simpler if we add a Python library designed for that task. The default toolkit may fall short when dealing with the nuanced requirements of such specialized data sources. This is where external libraries save us time and effort: they offer prebuilt functionality that makes tasks like authentication and data retrieval much easier.

The Simplified Process of Package Management

Adding Python libraries often involves command-line interfaces or extra management tools like Anaconda. Dataiku’s integrated, GUI-based approach makes the process accessible to users at varying skill levels. Creating a custom Python environment within Dataiku is as simple as navigating to the code environments section and specifying the required packages.

Navigating to the code environments section and specifying the packages you require.
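
Once an environment has been built, a quick check from a notebook or recipe that uses it can confirm the requested packages are actually available. The snippet below is a minimal sketch using only the Python standard library. The helper name `installed_versions` is hypothetical, and `python-quickbooks` is assumed to be the distribution name entered in the environment's package list.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the environment running this code.
            versions[name] = None
    return versions


# Assumption: "python-quickbooks" is the name that was requested in the code environment.
for name, version in installed_versions(["python-quickbooks"]).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If a package reports as not installed, the notebook or recipe is probably still running on the default environment rather than the custom one.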

A Case in Point: OAuth Simplification

The tutorial demonstrates that process by adding the Python QuickBooks library to interact with the QuickBooks API. I chose this example because it’s a common data source with a common problem: How do I write code to authenticate using OAuth and then communicate with API endpoints? In my experience, authentication is the hardest part of API-based data ingestion. OAuth methods vary from one provider to another and are sometimes a mystery to a visualization specialist like me who must find a way to obtain the data with limited coding.
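
To give a sense of the work a dedicated library takes off your hands, here is a minimal standard-library sketch of the first two steps of a generic OAuth 2.0 authorization-code flow: building the consent URL and reading the code back from the redirect. It is not the QuickBooks library's own API. The function names, the endpoint, the client ID, the redirect URI, and the scope are all hypothetical placeholders, and a real provider will document its own values.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorization_url(authorize_endpoint, client_id, redirect_uri, scope, state=None):
    """Return (url, state) for sending a user to the provider's consent page."""
    # A random state value lets us match the later redirect to this request.
    state = state or secrets.token_urlsafe(16)
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",  # requests the authorization-code flow
        "scope": scope,
        "redirect_uri": redirect_uri,
        "state": state,
    })
    return f"{authorize_endpoint}?{query}", state


def extract_authorization_code(callback_url, expected_state):
    """Pull the one-time code out of the redirect, rejecting a mismatched state."""
    params = parse_qs(urlparse(callback_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: the redirect does not belong to this request")
    return params["code"][0]


# Placeholder values for illustration only.
url, state = build_authorization_url(
    "https://auth.example.com/oauth2/authorize",
    client_id="my-client-id",
    redirect_uri="https://localhost/callback",
    scope="accounting",
)
print(url)
```

The code returned by the redirect still has to be exchanged for access and refresh tokens, and those tokens have to be refreshed over time, which is the part that varies most between providers and where a purpose-built library helps most.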

Don’t forget to select your custom code environment.

More Efficient Data Handling

The ability to manage custom Python environments and integrate external libraries makes Dataiku a flexible platform for data professionals. It extends what the platform can do out of the box and lets users tailor each environment to their project’s specific needs, so that a specialized data source no longer has to mean a specialized toolchain outside the platform.