Studying Covid-19 with Network Analysis
The first article in this series discussed the interconnectedness of people in social networks. The second discussed modeling techniques for visualizing these networks. This post goes beyond visualization, presenting several analyses for understanding how a virus spreads within social networks. Through these analyses, we can get a clearer understanding of the characteristics of those …
Data Science
Covid-19, Network Analysis, and Network Models
In part 1 of this post, I discussed the interconnectivity of social networks, highlighting the closeness we all share. I also noted the impact this has on the spread of a virus such as Covid-19. In this post, I explore how to visualize networks to better understand the virus's transmission, enabling us to combat its spread. …
Kevin Bacon, Covid-19, & Social Networks (Part 1)
Let me start by noting that I am not suggesting that Kevin Bacon has Covid-19 or is spreading the virus. To the best of my knowledge, the guy isn’t even infected. So why bring him up? Do you remember the game Six Degrees of Kevin Bacon? There is a good lesson to learn about Covid-19 …
Tidy Data
By some estimates, between 50% and 80% of a data scientist's work is spent collecting and preparing data, what the New York Times calls janitor work[1]. When we consider the iterative nature of the data science process (refer to The Data Science Process), we see that each cycle typically repeats the data preparation step. As our understanding of the data evolves and the model is refined, we often find ourselves going back to further develop the data. While data preparation has never been an easy process, in a big data world the greater variety of data and data sources makes it all the more difficult. These sources rarely store or present data in a structure that facilitates analysis. To address this issue, we need to tidy the data. Let me explain…
The Data Science Process
We live in a world where larger and larger volumes of varied data types are coming at us at ever-increasing speeds; that is, we live in a world of big data. To make sense of big data, we have turned to data science. Data science is a tool employed by the transliterate to transform data into information.
From Data Literacy to Transliteracy
Understanding the skills necessary for data democratization.
Overfit / Underfit – Shaving with Occam’s Razor
Using the principle of Occam's razor to optimize your supervised learning model.
Overfitting / Underfitting – How Well Does Your Model Fit?
Supervised machine learning infers a function that maps input variables to an output variable. Let’s unpack this definition a bit with an example. Say that we are a bank that wants to determine to whom we should give a loan. The objective, therefore, is to infer a function that examines a set …
They Want to Get Rid of Me!! – Rise of the Citizen Data Scientist
They want to get rid of me. Wait, let me rephrase that: they want to get rid of us!! I can’t blame them; if I were them, I would want to get rid of us too. Our friends at Gartner started all this talk about citizen data scientists.
Open Data Science
Everyone is talking about data science. One study found that 96% of companies believe that data science is integral to the success of their business. Yet most of these organizations (70%) are not realizing its full potential. They cite such factors as poor data quality, lack of talent, and lack of access to proper tools and technology[1]. …