Episode 151 – Do you only need 6 principles?

A little while ago, Dave came across an article by Francesca Lazzeri titled “The Data Science Mindset: Six Principles to Build Healthy Data-Driven Organizations”. In this episode we give our view on those principles and expand on them.

Is it really possible to define a successful data science organization following 6 concrete principles?

Are these principles a step-by-step plan you can follow, one after the other, on the road to success, or are they something you need to keep in mind from the start until the end of days?

1. Understand the Business and Decision-Making Process 

We pretty much agree with this one and, expanding on it, we talk about the benefits this exercise has for streamlining the organization and for security. However, to achieve the C-level support which we agree is needed, some free-form experimentation needs to take place first, so that you actually have something that can be shown to said C-level in a clear and concise way.

Once the step to production is made, though, higher management approval, or rather real, active support, will be a primary requirement for the future health of your project.

2. Establish Performance Metrics 

Basically, this principle seems to go back to making your project S.M.A.R.T.: make sure you have a set goal in mind and a way to measure your success or failure. Going through this exercise is probably mandatory before you go to the C-level and ask for their support, since a good CEO or CFO isn’t going to give you a pile of cash if you cannot state your success criteria and intended goals… Think R.O.I. here, people!

As a little bonus, the article contains a nice “make machine learning algorithms understandable via simple questions” list. Take a look at the article, principle 2 under paragraph 3, “Define the success metrics”.

3. Architect the End-to-End Solution

Dave has questions around the amount of detail (down to the product level) that is required at this step in the process. But the choice of products will have a large impact on the financial picture of the environment, and therefore your budget will influence your end-to-end architecture.

Going back to our initial question of whether these principles are a step-by-step guide to follow or a bunch of guidelines to keep in mind, we feel this principle needs to be revisited a couple of times along the road…

As a bonus, we discuss in a little more depth how the choice of big data tools and products can and will influence your spend.

4. Build Your Toolbox of Data Science Tricks

This is where you could see a refinement of the broad strokes that are defined in the previous principle. On the other hand, since this talks specifically of “data science” tricks, it could also be seen as attached to the Data Scientist role specifically.

In this section we also cover how you should avoid “reinventing the wheel” over and over again and how standardizing on a set of technologies can really help and accelerate your project. It is important to understand that, in this modern age, standardizing does not mean things are set in stone forever. Quite the opposite: agile methodology includes updating and adapting your standards to new realities all the time! The DevOps practices around CI/CD will almost always require some forethought when new tools are introduced, and this can work well with a level of standard enforcement.

5. Unify Your Organization’s Data Science Vision

We both agreed here that this step comes way too late in the 6 steps, if the steps are supposed to be gone through in sequence. More likely, this principle should be well defined from the start and be a primary principle across the whole thing.

Apart from that, we completely support the idea that you need a common vision well established in order to make your data environment successful.

This is also a principle that we feel is really not a good fit for the “step by step” approach but should really be part of every step in your data journey, from the earliest start until, again, the end of time.

6. Keep Humans in the Loop

Again, a principle that really seems out of place in a step-by-step approach (you mean people were not involved in the previous 5 steps?), but very important nevertheless. There is always the danger that products and tools become the focus of a big data science project or environment. It should never be forgotten that it is people who create value; tools help those people do so more quickly or more precisely, improving the value that is generated.

Apart from that, becoming a data-driven company, and the whole concept of digital transformation, in most cases requires a fundamental culture shift across the company. Getting the people on board is one of the most important things to get right!

Also, v-team definitely means “Vendetta-Team” since V is for Vendetta!

Conclusion

We believe that the six principles are definitely good to keep in mind, and overall the article does a good job of describing them. However, while we like the idea of adding a practical example to the theory, in this case those parts really were too generic and became too much of an Azure advertorial.

We also miss a discussion on what can go wrong. More lessons are learned from failure than from success, and the content of the article would be much improved if the practical part also included the sour with the sweet, because without the sour, the sweet ain’t so sweet.


Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Author: Jhon Masschelein

Tackler of advanced Cloud and Hadoop challenges in a world of open-source technologies. – Impossible is merely a matter of time and effort. –