Open Data Science

Everyone is talking about data science. One study found that 96% of companies believe that data science is integral to the success of their business. Yet most of these organizations (70%) are not realizing its full potential. They cite such factors as poor data quality, a lack of talent, and a lack of access to proper tools and technology[1]. Data science is on the verge of changing the nature of business, but to get there we must overcome these barriers. This is the point of Open Data Science. Let’s begin with an example of openness.

The effectiveness of the open software movement is demonstrated by the success of Linux. In 1982, during the “Greed is Good” era, Richard Stallman came along championing the free software movement. As Stallman explains: “When we call software ‘free’ we mean that it respects the users’ essential freedoms: the freedom to run it, to study and change it, and to redistribute copies with or without changes…This is a matter of freedom, not price, so think ‘free speech’, not ‘free beer’”[2]. To make clear that the movement was about liberty and not price, the free software movement was recast as the open software movement.

Under the open software philosophy, Stallman wrote and distributed GNU, which was the beginning of a Unix-compatible operating system. Unfortunately, Stallman ran into issues with completing the kernel. At about this time, Linus Torvalds, a University of Helsinki student, created a Unix-compatible kernel which he named Linux. Torvalds saw himself as following in the footsteps of scientists and academics who built their work on the foundation of others, so he did not feel that it made sense to charge people. Linux became the combination of the editor, compiler, and tools written by Stallman with the operating system kernel written by Torvalds. I should note here that Stallman prefers the term GNU/Linux, since Linux is really just the kernel. Most people, however, simply use the term Linux. Within a year, the Linux user community numbered in the tens of thousands. If there was an error in the software, it was corrected by the user community, while others added such functionality as a graphical user interface and networking capabilities. This success, the ability of a community to work together to build and maintain a robust operating system, demonstrates the very essence of open systems, which leads us to Open Data Science.

A key characteristic of data science is that it is an interdisciplinary field, meaning that it draws on more than one branch of knowledge. As Conway’s data science Venn diagram shows us, data science exists in the overlap between statistics, hacking skills, and subject matter expertise. It is an inclusive process with each of these disciplines working together. This inclusion is the very essence of open systems. Open Data Science is the creation of a more connected environment through the application of open software. This environment in turn drives availability, innovation, interoperability, and transparency. This is what we saw in the case of Linux. In coming together in an open environment, the Linux community maintained and enhanced an operating system that is now a standard in the industry.

In the proprietary world, profit margin limits the resources corporations can apply to a particular product or technology. Even if we were to eliminate this constraint, the direction they take with their technology is driven by their own particular vision, their own understanding and philosophy of data science. In an open environment, it is the data science community that drives the direction of technology. Through access to source code, innovators are able to experiment with new methodologies, making them accessible to the data science community. In turn, the community can perfect these methodologies just as the Linux community is perfecting Linux.

There is another important lesson we can draw from the Linux example. Torvalds said: “Computers were actually better for kids when they were less sophisticated when dweebie youngsters like me could get under the hood and tinker”[3]. They were better because it was fun to get inside the guts of a thing and muck with it. You were excited to experiment. “Folks do their best work when they are driven by passion. When they are having fun. This is as true for playwrights and sculptors and entrepreneurs as it is for software engineers.”[4] If you are passionate about data science, you want to do more than access some cloud-based application. You want to get into the guts of the thing, to experiment. Open Data Science tools and technologies give you the freedom to get inside the engine, to get grease under your fingernails.

Now consider the issues preventing companies from fully realizing the benefits of data science: poor data quality, lack of talent, and lack of access to proper tools and technology. The very nature of Open Data Science makes the tools and technology available, and as the community expands, so too will the scope and diversity of data science technology. As participants in the community work together, they will define better methodologies to address issues such as data quality. Improved software access and openness, along with the cross-fertilization of ideas, will also increase the number and quality of talented data scientists.

Open Data Science is democratic in the very best sense of the word. In Democracy in America, Alexis de Tocqueville wrote, “In no country in the world has the principle of association been more successfully used, or more unsparingly applied to a multitude of different objects than in America”.[5] Just as democracy has transformed the world, so too will the democratic principles of Open Data Science transform data science.

 

Additional Sources of Information:

The Open Data Science Initiative: http://opendsi.cc/

Journey to Open Data Science: https://www.slideshare.net/continuumio/journey-to-open-data-science

The Open Data Science Continuum: https://www.continuum.io/open-data-science

 

[1] Open Data Science Survey, September-October 2016

[2] Isaacson, Walter, The Innovators, pg. 272

[3] Torvalds & Diamond, Just for Fun, pg. 74, 4, 17

[4] Isaacson, Walter, The Innovators, pg. 272

[5] Alexis de Tocqueville, Democracy in America

 
