Dataiku overhauls Data Science Studio, bringing data teams the product they need

May 19, 2015

At the forefront of the data scientist movement, a Zuckerbergian Florian Douetteau is poised to do for big data teams what Salesforce did for sales teams. At only two years old, Dataiku has already built up a powerful team of experienced data professionals (Douetteau himself is a member of the Exalead mafia) and is currently seeing 5.5% user growth week over week. In France, the company has established a dominant presence, with clients ranging from Blablacar to Vente-Privée, and it currently counts a handful of American clients, including one in the pharmaceutical industry.

Today, Dataiku is announcing a large overhaul of its product – Douetteau himself hesitates to call it a V2 – that will bring collaboration to the forefront. I sat down with Douetteau to talk about the product and the startup’s development:

What do you see as the biggest change between V1 & V2? What did you learn from V1 users that influenced this?

First, I would say that version naming is becoming quite arbitrary in modern software management. Typically, we’ve released a new version of DSS every quarter. So technically speaking, V2 is actually the sixth version of DSS. But sometimes it’s good to take a pause, set up a landmark, and say: we’re happy.

To build DSS V2, we focused on refining the design and user flows in order to provide DSS users with a faster iterative process. That’s because we believe that building a data analytics process is very similar to building a brand new product: our users need room for quick iterations. Some data will lead to a successful data project, some data won’t provide any value. To figure this out quickly, users need to iterate fast in order to FAIL fast. Hence the need for a streamlined user experience and enhanced productivity.

Your mission has always been to “clean” dirty data: do you see V2 as a push in this direction?

Data cleansing is a complex problem. It won’t be solved in a day.

Here is my take on the situation: when a data team spends 1000 hours on a data project (and believe me, that’s a nice estimation), they spend approximately 600 hours on data cleansing, 200 hours on documenting and packaging, 100 hours on modeling, and another 100 hours on algorithms and optimization.

Our goal with DSS was to reduce the combined time spent on data cleansing and project packaging from 800 hours to approximately 500 hours. I’m pretty confident that DSS V2 will reduce that time to 400 hours. We’re getting closer and closer to a situation where data professionals are able to spend more time doing fun and valuable thinking and problem solving, and much less time doing repetitive, error-prone cleansing tasks.

What are the biggest problems users face today? What types of users are using V1 today?

As Big Data solutions get more mature and provide more ROI, lots of our customers are moving from the “1 or 2 data guys in a corner” to the “let’s build a data team” paradigm. So what was previously a geeky/techy problem has turned into a team/organization challenge. As I’ve always had a passion for helping business-minded people and technology-minded people work together in a collaborative manner, I welcomed this change in paradigm wholeheartedly.

When my colleagues and I came together to build DSS, solving this team/organization challenge became one of our key missions. Looking at V1 users, I think we’ve done a good job so far. Today, DSS is actively used in a collaborative manner by very different profiles with diverse skill sets – business analysts (people crunching Excel reports), data scientists (people building models), and data engineers (people in charge of the database and data integration) – to create value from data.

What’s next? What’s missing? Where would you like DSS to be in 12 months?

A wise friend told me that it’s normal if I’m never satisfied with the product. Indeed, it won’t ever be finished, but maybe that’s what is so exciting about building software.

Part of our goal with the next versions is to always keep DSS compatible with the most up-to-date big data and machine learning technologies. We also want to make sure that our users are always free to choose and combine the languages and technologies they want to use in their projects. In light of this, we will continue to add support in subsequent versions for 2015’s super trendy Spark (Databricks, the company behind Spark, raised $33M last year) as well as the older but still very popular MATLAB (which was created in 1984, a time when big data was still merely an idea straight out of science fiction novels).

In the next 12 months, we believe that companies that want to quickly turn raw data into business-impacting predictions will recognize DSS as the must-have tool for their data teams.