Glowing Python
04 June 07:32
The Central Limit Theorem, a hands-on introduction
   The central limit theorem can be informally summarized in a few words: the sum of n samples x1, x2, ..., xn from the same distribution is approximately normally distributed, provided that n is big enough and that the distribution has a finite variance. To show this experimentally, let’s define a function...
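The function from the original post is cut off in this excerpt, so what follows is only a minimal sketch of one way to run such an experiment. The helper name `sample_sums`, the choice of a uniform distribution, and the values of `n` and `trials` are all assumptions made for illustration.

```python
import numpy as np

def sample_sums(draw, n, trials, rng):
    """Return `trials` sums, each taken over `n` independent samples from `draw`."""
    return draw(rng, (trials, n)).sum(axis=1)

rng = np.random.default_rng(0)

# Uniform(0, 1) has mean 1/2 and variance 1/12, so the sum of n samples
# has mean n/2 and variance n/12.
n, trials = 50, 20_000
sums = sample_sums(lambda r, size: r.uniform(0.0, 1.0, size), n, trials, rng)

# Standardize the sums; if the theorem applies they should resemble N(0, 1).
z = (sums - n / 2) / np.sqrt(n / 12)

# For a standard normal, about 68% of the mass lies within one standard deviation.
within_one_sigma = np.mean(np.abs(z) < 1)
```

Swapping the lambda for another finite-variance distribution (for example an exponential one, with the mean and variance adjusted accordingly) should give a similar picture.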
07 April 11:50
A Simple model that earned a Silver medal in predicting the results of the NCAAW tournament
   This year I decided to join the March Machine Learning Mania NCAAW challenge on Kaggle. It asks participants to predict the outcome of each game in the NCAAW basketball tournament, a college-level tournament for women. Participants can assign a probability to each outcome and they’re...
11 November 09:22
Visualize the Dictionary of Obscure Words with T-SNE
   I recently published a Python wrapper around The Dictionary of Obscure Words (originally from this website: http://phrontistery.info), and in this post we’ll see how to create a visualization that highlights a few entries from the dictionary using the dimensionality reduction technique called t-SNE...
29 June 11:01
Solving the Travelling Salesman Problem with MiniSom
   Have you ever heard of the Travelling Salesman Problem? I’m pretty sure you have, but let’s refresh our minds by looking at its formulation: given a list of points and the distances between each pair of points, what is the shortest possible path that visits each point and returns to the starting point? ...
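To make the formulation concrete, here is a small sketch that measures the length of a closed tour and finds the shortest one by exhaustive search. This is not the MiniSom approach of the post: brute force is a swapped-in technique that is only feasible for a handful of points, and the function names and sample points are assumptions for illustration.

```python
from itertools import permutations

import numpy as np

def tour_length(points, order):
    """Length of the closed path visiting `points` in `order` and returning to the start."""
    path = points[list(order)]
    # np.roll pairs each point with its successor, wrapping the last back to the first.
    return float(np.linalg.norm(path - np.roll(path, -1, axis=0), axis=1).sum())

def brute_force_tsp(points):
    """Try every tour that starts at point 0 and return the shortest visiting order."""
    n = len(points)
    best = min(permutations(range(1, n)),
               key=lambda rest: tour_length(points, (0, *rest)))
    return (0, *best)

# Hypothetical example: the four corners of a unit square, listed out of order.
points = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
best_order = brute_force_tsp(points)
best_length = tour_length(points, best_order)  # the perimeter of the square
```

The number of candidate tours grows factorially with the number of points, which is why heuristic methods are used for anything beyond toy instances.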
17 May 18:07
Neural Networks Regularization made easy with sklearn and matplotlib
   Using regularization has many benefits; the most common are reducing overfitting and solving multicollinearity issues. All of this is covered very well in the literature, especially in Hastie et al. However, without going into too many details, we can have a very straightforward interpretation of...
01 May 08:50
Tornado plots with matplotlib
   Lately there has been some attention on charts where the values of a time series are plotted against their point-by-point change, thanks to this rather colorful and cluttered Tornado plot. In this post we will see how to make one of those charts with our favorite plotting library, matplotlib,...
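As a rough sketch of the data preparation behind such a chart, the snippet below pairs each value of a series with its change from the previous point. The function name `value_vs_change` and the sample series are assumptions, and the plotting call is only suggested in a comment.

```python
import numpy as np

def value_vs_change(series):
    """Pair each value of a series with its change from the previous point."""
    series = np.asarray(series, dtype=float)
    change = np.diff(series)       # change[i] = series[i + 1] - series[i]
    return change, series[1:]      # x: change, y: value reached after that change

# Hypothetical series, used only for illustration.
series = [1.0, 3.0, 2.0, 5.0]
change, value = value_vs_change(series)

# Drawing these pairs as a connected line, for example with matplotlib's
# plt.plot(change, value, "o-"), traces the path of the series over time.
```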
14 April 10:20
Recoloring NoIR images on the Raspberry Pi with OpenCV
   Not too long ago I was gifted a Raspberry Pi camera. After taking some pictures I realized that it produced very weird colors, and I discovered that it was a NoIR camera. This means that it has no infrared filter and that it can take pictures in the dark using an infrared LED. Since I never...
05 April 06:40
What makes a word beautiful?
   What makes a word beautiful? Answering this question is not easy because of the inherent complexity and ambiguity in defining what it means to be beautiful. Let’s tackle the question with a quantitative approach by introducing the Aesthetic Potential, a metric that aims to quantify the beauty of a word...
17 March 10:12
Ridgeline plots in pure matplotlib
   A Ridgeline plot (also called Joyplot) allows us to compare several statistical distributions. In this plot each distribution is shown with a density plot, and all the distributions are aligned to the same horizontal axis and, sometimes, presented with a slight overlap. There are many options to...
11 September 09:58
Organizing movie covers with Neural Networks
   In this post we will see how to organize a set of movie covers by similarity on a 2D grid using a particular type of Neural Network called Self-Organizing Map (SOM). First, let’s load the movie covers of the top movies according to IMDB (the files can be downloaded here) and convert the images...
11 August 10:28
Visualizing distributions with scatter plots in matplotlib
   Let’s say that we want to study the time between the end of a marked point and the next serve in a tennis game. After gathering our data, the first thing that we can do is to draw a histogram of the variable that we are interested in: import pandas as pd import matplotlib.pyplot as plt url = ’https...
07 June 07:45
Exporting Decision Trees in textual format with sklearn
   In the past we have covered Decision Trees, showing how interpretable these models can be (see the tutorials here). In the previous tutorials we have exported the rules of the models using the function export_graphviz from sklearn and visualized the output of this function in a graphical way with an...
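The excerpt is cut off before it names the function used for the textual export. The sketch below assumes it is `export_text` from `sklearn.tree`, which is one function scikit-learn provides for this purpose; the iris dataset and the tree depth are choices made here for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed rules short.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# export_text returns the decision rules as an indented plain-text string.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Each line of the output is either a split condition on a feature or a leaf with its predicted class, indented according to its depth in the tree.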
17 May 07:29
Feelings toward immigration of people from other EU Member States in November 2018
   In this post we will see a snippet that plots part of the results of the Eurobarometer survey released last March. In particular, we will focus on the responses to the following question: Please tell me whether the following statement evokes a positive or negative feeling for you...
17 April 10:08
Visualizing atmospheric carbon dioxide
   Let’s have a look at how to create a visualization that shows how CO2 concentrations evolved in the atmosphere. First, we fetch the data from the Earth System Research Laboratory website as follows: import pandas as pd data_url = ’ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_weekly_mlo.txt’ co ...
28 March 14:09
Speeding up the Sieve of Eratosthenes with Numba
   Lately, at the invitation of my right honourable friend Michal, I’ve been trying to solve some problems from Project Euler and felt the need for a good way to find prime numbers. So I implemented the Sieve of Eratosthenes. The algorithm is simple and efficient. It creates a list of all...
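For reference, here is a minimal sketch of the sieve written with plain NumPy. It is not the author's implementation and it does not use Numba, so it illustrates only the algorithm, not the speed-up discussed in the post. The function name `primes_up_to` is an assumption.

```python
import numpy as np

def primes_up_to(n):
    """Return all primes <= n using the Sieve of Eratosthenes."""
    if n < 2:
        return np.array([], dtype=int)
    is_prime = np.ones(n + 1, dtype=bool)
    is_prime[:2] = False                     # 0 and 1 are not prime
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            # Cross out the multiples of i; smaller multiples were already
            # removed by smaller primes, so we can start at i * i.
            is_prime[i * i :: i] = False
    return np.flatnonzero(is_prime)

primes = primes_up_to(30)
```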
23 March 07:38
Visualizing the trend of a time series with Pandas
   The trend of a time series is the general direction in which its values change. In this post we will focus on how to use rolling windows to isolate it. Let’s download from Google Trends the interest in the search term Pancakes and see what we can do with it: import pandas as pd import matplotlib...
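The rolling-window idea can be sketched on synthetic data. The series below is an assumption standing in for the Google Trends download, which is not reproduced in this excerpt: it combines a linear trend with a yearly seasonal component, and the window length of 52 weeks is chosen to match that season.

```python
import numpy as np
import pandas as pd

# Synthetic weekly series (an assumption, not the Google Trends data):
# a linear trend plus a seasonal component with a 52-week period.
idx = pd.date_range("2015-01-04", periods=156, freq="W")
t = np.arange(len(idx))
series = pd.Series(0.1 * t + 5 * np.sin(2 * np.pi * t / 52), index=idx)

# A centered rolling mean over one full season averages the seasonal
# component out and leaves an estimate of the trend.
trend = series.rolling(window=52, center=True).mean()
```

The first and last values of `trend` are NaN because a full window is not available there; a shorter window reduces that loss at the cost of leaving more of the seasonal pattern in the estimate.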
20 March 13:59
Ravel and unravel with numpy
   Raveling and unraveling are common operations when working with matrices. With a ravel operation we go from matrix coordinates to a flat index, while with an unravel operation we go the opposite way. In this post we will see through an example how they can be done with numpy in a very easy way....
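A short sketch of both directions, assuming the NumPy functions `ravel_multi_index` and `unravel_index` are the ones meant; the matrix shape and the coordinates are arbitrary values picked for illustration.

```python
import numpy as np

shape = (3, 4)   # a matrix with 3 rows and 4 columns

# Ravel: (row, column) coordinates -> position in the flattened array.
flat = np.ravel_multi_index((1, 2), shape)    # 1 * 4 + 2 = 6

# Unravel: flat position -> (row, column) coordinates.
row, col = np.unravel_index(flat, shape)      # back to (1, 2)

# Sanity check against an actual matrix: both ways address the same element.
m = np.arange(12).reshape(shape)
same_element = m[row, col] == m.ravel()[flat]
```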
22 January 15:22
A visual introduction to the Gap Statistics
   We have previously seen how to implement KMeans. However, the results of this algorithm strongly rely on the choice of the parameter K. According to statistical folklore, the best K is located at the ’elbow’ of the cluster inertia curve as K increases. This heuristic has been translated into a more...
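The elbow heuristic mentioned above can be sketched as follows. Note that this shows only the inertia curve, not the Gap Statistic itself, and that the synthetic data with three clusters and all parameter values are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 well-separated clusters (an assumption for illustration).
centers = [(-5, -5), (0, 5), (5, -5)]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)

# Inertia = sum of squared distances of the samples to their closest centroid.
ks = range(1, 7)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in ks]

# The drop in inertia is large up to the true number of clusters and small
# after it; the 'elbow' is where the drops flatten out.
drops = -np.diff(inertias)
```

On data like this the elbow is easy to spot by eye, but on less clearly separated clusters the curve bends gradually, which is the motivation for a more formal criterion.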
29 June 11:59
Plotting a calendar in matplotlib
   And here’s a function to plot a compact calendar with matplotlib: import calendar import numpy as np from matplotlib.patches import Rectangle import matplotlib.pyplot as plt def plot_calendar(days, months): plt.figure(figsize=(...)) # non days are grayed ax = plt.gca().axes ax...
30 May 09:27
Visualizing UK Carbon Emissions
   Have you ever wanted to check carbon emissions in the UK and never had an easy way to do it? Now you can use the Official Carbon Intensity API developed by the National Grid. Let’s see an example of how to use the API to summarize the emissions in the month of May. First, we download the data with...
09 October 12:00
Spotting outliers with Isolation Forest using sklearn
   Isolation Forest is an algorithm to detect outliers. It partitions the data using a set of trees and provides an anomaly score based on how isolated each point is in the structure found. The anomaly score is then used to tell outliers apart from normal observations. In this post we will see an...
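A minimal sketch of this workflow with scikit-learn's `IsolationForest` is given below. The synthetic data, the three hand-placed outliers, and the `contamination` value are assumptions chosen for illustration, not the example from the post.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -8.0]])   # far from the cloud
X = np.vstack([inliers, outliers])

# contamination is the expected share of outliers; 5% is an assumption here.
forest = IsolationForest(n_estimators=100, contamination=0.05, random_state=0).fit(X)

scores = forest.score_samples(X)   # lower score = more isolated = more anomalous
labels = forest.predict(X)         # -1 for outliers, 1 for normal observations
```

Because `contamination` fixes the share of points flagged, some points on the edge of the main cloud are labeled as outliers too; the score itself is often more informative than the hard label.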
13 July 12:42
Dates in Pandas Cheatsheet
   Lately I’ve been working a lot with dates in Pandas, so I decided to make this little cheatsheet with the commands I use the most. Importing a csv using a custom function to parse dates: import pandas as pd def parse_month(month): Converts a string from the format M to datetime format...
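The snippet in the excerpt is truncated, so the following is only a sketch of how a custom parser might be applied while importing a CSV. The month format (strings such as "2015M03"), the CSV content, and the use of the `converters` argument are all assumptions; the original format and loading code are not visible here.

```python
import io

import pandas as pd

# Hypothetical CSV content; the "YYYYMmm" month format is an assumption.
csv_text = "month,value\n2015M01,10\n2015M02,12\n2015M03,9\n"

def parse_month(month):
    """Convert a string such as '2015M03' to a Timestamp (first day of the month)."""
    return pd.to_datetime(month, format="%YM%m")

# `converters` applies parse_month to every raw string in the 'month' column.
df = pd.read_csv(io.StringIO(csv_text), converters={"month": parse_month})
```

An alternative is to read the column as plain strings and call `pd.to_datetime` on the whole column afterwards, which is usually faster on large files.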
16 June 15:27
A heatmap of male to female ratios with Seaborn
   In this post we will see how to create a heatmap with seaborn. We’ll use a dataset from the Wittgenstein Centre Data Explorer. The data extracted is also reported here in csv format. It contains the ratio of males to females in the population by age for to data reported after this period...
27 April 08:45
Solving the Two Spirals problem with Keras
   In this post we will see how to create a Multi-Layer Perceptron (MLP), one of the most common Neural Network architectures, with Keras. Then, we’ll train the MLP to tell apart points from two different spirals in the same space. To get a sense of the problem, let’s first generate the data to...
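The data generation step is cut off in this excerpt. One common way to build such a dataset is sketched below with NumPy only; the function name `two_spirals`, the number of turns, and the noise level are assumptions, and the Keras model itself is not shown.

```python
import numpy as np

def two_spirals(n_points, noise=0.5, seed=0):
    """Return (X, y): points from two interleaved spirals and their 0/1 labels."""
    rng = np.random.default_rng(seed)
    # Taking the square root spreads the points roughly evenly along the arm.
    theta = np.sqrt(rng.uniform(size=n_points)) * 3 * np.pi
    base = np.column_stack([theta * np.cos(theta), theta * np.sin(theta)])
    first = base + rng.normal(scale=noise, size=base.shape)
    # The second spiral is the first one reflected through the origin,
    # which is the same as rotating it by 180 degrees.
    second = -base + rng.normal(scale=noise, size=base.shape)
    X = np.vstack([first, second])
    y = np.concatenate([np.zeros(n_points), np.ones(n_points)])
    return X, y

X, y = two_spirals(500)
```

The two classes cannot be separated by a straight line, which is what makes this a useful test case for a network with hidden layers.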
21 May 06:25
An intro to Regression Analysis with Decision Trees
   It’s been a while since the last post on this blog, but the Glowing Python is still active and strong! I just decided to publish some of my posts on the Cambridge Coding Academy blog. Here are the links to a series of two posts about Regression Analysis with Decision Trees: 1. Getting started...