Glowing Python
04 June 07:32
The Central Limit Theorem, a hands-on introduction
   The central limit theorem can be informally summarized in a few words: the sum of $n$ independent samples $x_1, x_2, \dots, x_n$ from the same distribution is approximately normally distributed, provided that $n$ is big enough and that the distribution has a finite variance. To show this in...
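The statement above can be checked numerically. This is a minimal sketch, not the post's code: it assumes i.i.d. Uniform(0, 1) samples (any distribution with finite variance would do), and the names `sums_of_samples`, `n` and `trials` are illustrative. If the theorem holds, the standardized sums should behave like a standard normal variable.

```python
import numpy as np

def sums_of_samples(n, trials, rng):
    """Sum n i.i.d. Uniform(0, 1) samples, repeated `trials` times."""
    return rng.uniform(0.0, 1.0, size=(trials, n)).sum(axis=1)

rng = np.random.default_rng(0)
n, trials = 50, 20_000
sums = sums_of_samples(n, trials, rng)

# Uniform(0, 1) has mean 1/2 and variance 1/12, so the CLT predicts
# that the sums are approximately Normal(n / 2, n / 12)
z = (sums - n / 2) / np.sqrt(n / 12)
print(z.mean(), z.std())         # close to 0 and 1
print(np.mean(np.abs(z) < 1))    # close to 0.683, as for a standard normal
```

Swapping `rng.uniform` for another finite-variance distribution (and adjusting the mean and variance used to standardize) should give the same picture.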
07 April 11:50
A Simple model that earned a Silver medal in predicting the results of the NCAAW tournament
   This year I decided to join March Machine Learning Mania 2021 - NCAAW (https://www.kaggle.com/c/ncaaw-march-mania-2021), a challenge on Kaggle. It asks participants to predict the outcome of each game in the NCAAW basketball tournament, which is a tournament for...
11 November 09:22
Visualize the Dictionary of Obscure Words with T-SNE
   I recently published a Python wrapper around The Dictionary of Obscure Words (originally from http://phrontistery.info), and in this post we’ll see how to create a visualization that highlights a few entries from the dictionary using the...
29 June 11:01
Solving the Travelling Salesman Problem with MiniSom
   Have you ever heard of the Travelling Salesman Problem? I’m pretty sure you have, but let’s refresh our memory by looking at its formulation: given a list of points and the distances between each pair of points, what is the shortest possible path that visits each point once and returns to the starting point? ...
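The post tackles the problem with MiniSom. The sketch below does something different on purpose, to make the formulation concrete: a brute-force exact solver (the helpers `tour_length` and `brute_force_tsp` are hypothetical) that tries every ordering of the points. It only scales to a handful of points, which is exactly why heuristics such as the SOM-based approach are interesting.

```python
import itertools
import math

def tour_length(points, order):
    """Length of the closed tour that visits `points` in `order` and returns to the start."""
    total = 0.0
    for i in range(len(order)):
        a = points[order[i]]
        b = points[order[(i + 1) % len(order)]]  # wrap around to close the loop
        total += math.dist(a, b)
    return total

def brute_force_tsp(points):
    """Exact solution by trying every permutation; feasible only for a few points."""
    n = len(points)
    best_order, best_len = None, math.inf
    # point 0 is fixed as the start: rotating a tour does not change its length
    for perm in itertools.permutations(range(1, n)):
        order = (0,) + perm
        length = tour_length(points, order)
        if length < best_len:
            best_order, best_len = order, length
    return best_order, best_len

# the four corners of a unit square: the optimal tour is the perimeter
square = [(0, 0), (1, 1), (1, 0), (0, 1)]
order, length = brute_force_tsp(square)
print(order, length)  # (0, 2, 1, 3) 4.0
```

With $n$ points this loop evaluates $(n-1)!$ tours, so it is useful for checking a heuristic on tiny inputs rather than for real instances.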
17 May 18:07
Neural Networks Regularization made easy with sklearn and matplotlib
   Using regularization has many benefits; the most common are reducing overfitting and solving multicollinearity issues. All of this is covered very well in the literature, especially in The Elements of Statistical Learning (Hastie et al., https://web.stanford.edu/~hastie/ElemStatLearn/). However, without touching too many...
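To make the multicollinearity point concrete, here is a minimal sketch that uses plain linear ridge regression (L2 regularization) in numpy, not the neural network setup of the post. With two perfectly collinear features, $X^T X$ is singular; adding the penalty term $\alpha I$ makes the system solvable. The helper `ridge_coefficients` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def ridge_coefficients(X, y, alpha):
    """Closed-form ridge regression: w = (X^T X + alpha * I)^(-1) X^T y."""
    A = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
# the second column is an exact copy of the first (perfect multicollinearity),
# so X^T X is singular and ordinary least squares has no unique solution
X = np.column_stack([x1, x1])
y = 3 * x1 + rng.normal(scale=0.1, size=100)

print(np.linalg.matrix_rank(X.T @ X))  # 1: the 2x2 matrix cannot be inverted
for alpha in [0.01, 1.0, 100.0]:
    w = ridge_coefficients(X, y, alpha)
    # the penalty splits the effect evenly between the twin columns,
    # and a larger alpha shrinks the coefficients toward zero
    print(alpha, w, np.linalg.norm(w))
```

The two coefficients always sum to roughly the true slope of 3 for small `alpha`. The penalty simply picks the smallest-norm way of sharing that slope between the duplicated columns.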
01 May 08:50
Tornado plots with matplotlib
   Lately, charts where the values of a time series are plotted against their point-by-point change have been getting a bit of attention, thanks to this rather colorful and cluttered Tornado plot (https://pbs.twimg.com/media/EWoCUylXsAcZn_p?format=jpg&name=large). In this post we...
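Here is a minimal sketch of the idea. It assumes that "change" means the difference between consecutive values and that the change goes on the x axis. The series is synthetic, and `tornado_points` is a hypothetical helper, not the post's code.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: the script also runs without a display
import matplotlib.pyplot as plt

def tornado_points(values):
    """Pair each value (from the second one on) with its change from the previous value."""
    values = np.asarray(values, dtype=float)
    change = np.diff(values)  # change[i] = values[i + 1] - values[i]
    return change, values[1:]

t = np.linspace(0, 4 * np.pi, 200)
series = np.sin(t) * np.exp(-t / 10)  # a synthetic damped oscillation

change, value = tornado_points(series)
fig, ax = plt.subplots(figsize=(5, 5))
ax.plot(change, value, '-o', markersize=2, linewidth=0.8)
ax.axvline(0, color='gray', linewidth=0.5)  # left of this line the series is falling
ax.set_xlabel('change from the previous point')
ax.set_ylabel('value')
fig.savefig('tornado.png')
```

For this damped oscillation the points trace a spiral that shrinks toward the origin as the amplitude decays.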
14 April 10:20
Recoloring NoIR images on the Raspberry Pi with OpenCV
   Not too long ago I was gifted a Raspberry Pi camera. After taking some pictures, I realized that it produced very weird colors, and I discovered that it was a NoIR camera: it has no infrared filter, so it can take pictures in the dark using an infrared LED. Since I never...
05 April 06:40
What makes a word beautiful?
   What makes a word beautiful? Answering this question is not easy because of the inherent complexity and ambiguity in defining what it means to be beautiful. Let’s tackle the question with a quantitative approach, introducing the Aesthetic Potential, a metric that aims to quantify the beauty of...
17 March 10:12
Ridgeline plots in pure matplotlib
   A Ridgeline plot (also called a Joyplot) allows us to compare several statistical distributions. In this plot each distribution is shown with a density plot, and all the distributions are aligned to the same horizontal axis and, sometimes, presented with a slight overlap. There are many...
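The description above can be sketched in matplotlib plus numpy. This sketch assumes a hand-rolled Gaussian kernel density estimate (`gaussian_density` is a hypothetical helper with a fixed bandwidth) and synthetic groups, so the post's own implementation may differ. Each density is drawn on its own baseline, and the baselines are spaced closer together than the peak height so that the ridges overlap slightly.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: the script also runs without a display
import matplotlib.pyplot as plt

def gaussian_density(samples, grid, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated at each point of `grid`."""
    u = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * u ** 2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
# five synthetic groups whose means drift to the right
groups = [rng.normal(loc=mu, scale=1.0, size=300) for mu in range(5)]
grid = np.linspace(-4, 9, 400)

spacing = 0.3  # distance between baselines, smaller than each density's peak, so ridges overlap
fig, ax = plt.subplots(figsize=(6, 4))
for i, samples in enumerate(groups):
    density = gaussian_density(samples, grid, bandwidth=0.4)
    baseline = i * spacing
    z = len(groups) - i  # lower ridges are drawn in front of the ones above them
    ax.fill_between(grid, baseline, baseline + density, alpha=0.8, zorder=z)
    ax.plot(grid, baseline + density, color='black', linewidth=0.8, zorder=z)
ax.set_yticks([i * spacing for i in range(len(groups))])
ax.set_yticklabels([f'group {i}' for i in range(len(groups))])
ax.set_xlabel('value')
fig.savefig('ridgeline.png')
```

The `spacing` value controls how much the ridges overlap. The `bandwidth` value trades smoothness against detail in each density.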
11 September 09:58
Organizing movie covers with Neural Networks
   In this post we will see how to organize a set of movie covers by similarity on a 2D grid using a particular type of Neural Network called a Self Organizing Map (SOM, https://glowingpython.blogspot.com/2013/09/self-organizing-maps.html). First, let’s load the movie covers of the top 100...
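For readers new to SOMs, here is a from-scratch numpy sketch of the training loop. It picks a random sample, finds the best matching neuron, and pulls that neuron and its grid neighbors toward the sample with a shrinking Gaussian neighborhood. This is not MiniSom's API: `train_som`, `find_winner` and the linear decay schedule are our own assumptions. In the movie cover setting, each row of `data` would be a feature vector derived from one cover (an assumption about the preprocessing).

```python
import numpy as np

def find_winner(weights, x):
    """Grid position (row, col) of the neuron whose weight vector is closest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return tuple(int(i) for i in np.unravel_index(np.argmin(dists), dists.shape))

def train_som(data, rows, cols, n_iter=1000, lr=0.5, sigma=1.0, seed=0):
    """Train a minimal SOM; learning rate and neighborhood radius decay linearly."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # grid coordinates of every neuron, used to measure distances on the map
    grid_r, grid_c = np.meshgrid(np.arange(rows), np.arange(cols), indexing='ij')
    for t in range(n_iter):
        x = data[rng.integers(len(data))]
        r, c = find_winner(weights, x)
        decay = 1.0 - t / n_iter
        radius = sigma * decay + 1e-3  # small floor avoids dividing by zero
        h = np.exp(-((grid_r - r) ** 2 + (grid_c - c) ** 2) / (2 * radius ** 2))
        # pull the winner and, more weakly, its grid neighbors toward the sample
        weights += lr * decay * h[..., None] * (x - weights)
    return weights

rng = np.random.default_rng(1)
# two synthetic groups of 3-dimensional vectors (e.g. average RGB colors)
data = np.vstack([0.1 + 0.05 * rng.standard_normal((100, 3)),
                  0.9 + 0.05 * rng.standard_normal((100, 3))])
weights = train_som(data, rows=5, cols=5)
print(find_winner(weights, data[0]), find_winner(weights, data[-1]))
```

After training, similar inputs map to the same or nearby cells. Placing each cover at its winning cell is what produces the similarity grid described in the post.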
11 August 10:28
Visualizing distributions with scatter plots in matplotlib
   Let’s say that we want to study the time between the end of a point and the next serve in a tennis game. After gathering our data, the first thing that we can do is to draw a histogram of the variable that we are interested in:
      import pandas as pd
      ...
07 June 07:45
Exporting Decision Trees in textual format with sklearn
   In the past we have covered Decision Trees, showing how interpretable these models can be (see the tutorials here: https://glowingpython.blogspot.com/2016/05/an-intro-to-regression-analysis-with.html). In the previous tutorials we exported the rules of the models using the function...
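Recent versions of scikit-learn (0.21 and later) include `sklearn.tree.export_text`, which prints the rules of a fitted tree as indented text. Here is a minimal sketch on the iris dataset. Whether the post uses exactly these parameters is an assumption, and the depth limit is only there to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# a shallow tree keeps the exported rules short enough to read
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(iris.data, iris.target)

# feature_names replaces the default feature_0, feature_1, ... labels
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
```

Each line of the output is one split or leaf. The indentation (`|---`) shows the depth, so the rules can be read top-down without any plotting.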
17 May 07:29
Feelings toward immigration of people from other EU Member States in November 2018
   In this post we will see a snippet showing how to plot part of the results of the Eurobarometer survey (https://en.wikipedia.org/wiki/Eurobarometer) released last March (http://data.europa.eu/euodp/en/data/dataset/S2215_90_3_STD90_ENG). In particular, we will focus on the...
17 April 10:08
Visualizing atmospheric carbon dioxide
   Let’s have a look at how to create a visualization that shows how CO2 concentrations in the atmosphere have evolved. First, we fetch the data from the Earth System Research Laboratory website (https://www.esrl.noaa.gov/gmd/ccgg/trends/data.html) as follows:
      import...
28 March 14:09
Speeding up the Sieve of Eratosthenes with Numba
   Lately, at the invitation of my right honourable friend Michal (https://twitter.com/mikel87), I’ve been trying to solve some problems from Project Euler (https://projecteuler.net) and felt the need for a good way to find prime numbers. So I implemented the...
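Here is a minimal sketch of the sieve. It assumes the Numba part is a plain `@njit` on the loops. The fallback decorator is our addition and lets the code run unchanged (just slower) when Numba is not installed. `sieve_mask` and `primes_up_to` are hypothetical names, not the post's.

```python
import numpy as np

try:
    from numba import njit  # compile the loops to machine code when Numba is available
except ImportError:
    def njit(func):  # fallback: run the same function as plain Python
        return func

@njit
def sieve_mask(n):
    """Boolean array where entry k is True iff k is prime, for 0 <= k <= n."""
    is_prime = np.ones(n + 1, dtype=np.bool_)
    is_prime[:2] = False  # 0 and 1 are not prime
    i = 2
    while i * i <= n:
        if is_prime[i]:
            # multiples of i below i * i were already crossed out by smaller primes
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
        i += 1
    return is_prime

def primes_up_to(n):
    """Indices of the True entries of the mask, i.e. the primes up to n."""
    return np.nonzero(sieve_mask(n))[0]

print(primes_up_to(30).tolist())  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

With Numba installed, the first call includes compilation time, so timings are only meaningful from the second call on.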
23 March 07:38
Visualizing the trend of a time series with Pandas
   The trend of a time series is the general direction in which its values change. In this post we will focus on how to use rolling windows to isolate it. Let’s download from Google Trends the interest in the search term "Pancakes" and see what we can do with it: ...
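Here is a minimal sketch of the rolling-window idea. It uses a synthetic weekly series (a linear trend plus a yearly cycle) in place of the Google Trends data, which is not reproduced here. The window length of 52 weeks is an assumption chosen to match the period of the synthetic cycle.

```python
import numpy as np
import pandas as pd

# synthetic weekly series standing in for the Google Trends data:
# a linear trend plus a yearly (52-week) cycle
index = pd.date_range('2015-01-04', periods=260, freq='W')
t = np.arange(len(index))
series = pd.Series(0.1 * t + 10 * np.sin(2 * np.pi * t / 52), index=index)

# a centered window covering one full cycle averages the seasonal swing away
trend = series.rolling(window=52, center=True).mean()
print(trend.dropna().head())
```

The window spans exactly one period, so the seasonal part averages out and the rolling mean grows by the trend's slope (0.1) at each step. With real data the window length is a trade-off: too short leaves seasonality in, and too long smooths away genuine changes in direction.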
20 March 13:59
Ravel and unravel with numpy
   Raveling and unraveling are common operations when working with matrices. With a ravel operation we go from matrix coordinates to a flat index, while with an unravel operation we go the opposite way. In this post we will see, through an example, how they can be done with numpy in a very easy way....
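In numpy the two operations are `np.ravel_multi_index` and `np.unravel_index`. Here is a minimal example with a 3x4 matrix in the default row-major (C) order, where the flat index of `(row, col)` is `row * 4 + col`.

```python
import numpy as np

shape = (3, 4)  # a 3x4 matrix, stored row by row (C order)

# ravel: matrix coordinates (row, col) -> flat index row * 4 + col
flat = np.ravel_multi_index((1, 2), shape)
print(flat)                             # 6

# unravel: flat index -> matrix coordinates
row, col = np.unravel_index(6, shape)
print(row, col)                         # 1 2

# both functions also accept arrays of coordinates
rows, cols = np.unravel_index([0, 5, 11], shape)
print(rows.tolist(), cols.tolist())     # [0, 1, 2] [0, 1, 3]
```

Both functions also take an `order='F'` argument for column-major layouts, which changes the mapping to `col * 3 + row` for this shape.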
22 January 15:22
A visual introduction to the Gap Statistics
   We have previously seen how to implement KMeans (https://glowingpython.blogspot.com/2012/04/k-means-clustering-with-scipy.html). However, the results of this algorithm strongly rely on the choice of the parameter K. According to statistical folklore, the best K is located at the...
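Here is a compact sketch of the gap statistic as defined by Tibshirani, Walther and Hastie. It compares $\log W_k$ on the data with its average over reference datasets drawn uniformly from the data's bounding box, then picks the smallest $k$ with $\mathrm{Gap}(k) \ge \mathrm{Gap}(k+1) - s_{k+1}$. The k-means routine is a plain numpy implementation of Lloyd's algorithm, swapped in for the scipy version used in the linked post. All function names and the three-blob test data are hypothetical.

```python
import numpy as np

def within_cluster_ss(X, labels):
    """W_k: pooled sum of squared distances of the points to their cluster mean."""
    return sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
               for j in np.unique(labels))

def kmeans(X, k, rng, n_iter=50, n_init=5):
    """Lloyd's algorithm with random restarts; returns the labels of the best run."""
    best_labels, best_sse = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
            labels = sq_dists.argmin(axis=1)
            # move each center to the mean of its points (keep it if the cluster is empty)
            centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        sse = within_cluster_ss(X, labels)
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels

def gap_statistic(X, k_max, n_refs=10, seed=0):
    """Return Gap(k) and its standard error s_k for k = 1..k_max."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # reference sets: uniform samples from the bounding box of the data
    refs = [rng.uniform(lo, hi, size=X.shape) for _ in range(n_refs)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        ref_logs = np.array([np.log(within_cluster_ss(R, kmeans(R, k, rng))) for R in refs])
        gaps.append(ref_logs.mean() - np.log(within_cluster_ss(X, kmeans(X, k, rng))))
        errs.append(ref_logs.std() * np.sqrt(1 + 1 / n_refs))
    return np.array(gaps), np.array(errs)

def choose_k(gaps, errs):
    """Smallest k such that Gap(k) >= Gap(k + 1) - s_(k + 1)."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - errs[i + 1]:
            return i + 1
    return len(gaps)

rng = np.random.default_rng(1)
blob_centers = np.array([[0, 0], [10, 0], [0, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in blob_centers])
gaps, errs = gap_statistic(X, k_max=5)
print(gaps.round(2), choose_k(gaps, errs))
```

Unlike the elbow heuristic, the reference sets give each $\log W_k$ a baseline. A clear peak in the gap therefore signals structure beyond what uniform noise would produce.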
29 June 11:59
Plotting a calendar in matplotlib
   And here’s a function to plot a compact calendar with matplotlib:
      import calendar
      import numpy as np
      from matplotlib.patches import Rectangle
      import matplotlib.pyplot as plt

      def plot_calendar(days, months):
          plt.figure(figsize=(9, 3))...
30 May 09:27
Visualizing UK Carbon Emissions
   Have you ever wanted to check carbon emissions in the UK but never had an easy way to do it? Now you can use the Official Carbon Intensity API (https://api.carbonintensity.org.uk) developed by the National Grid. Let’s see an example of how to use the API to summarize the emissions in...
09 October 12:00
Spotting outliers with Isolation Forest using sklearn
   Isolation Forest is an algorithm to detect outliers. It partitions the data using a set of trees and assigns each point an anomaly score based on how isolated the point is in the structure found; the anomaly score is then used to tell outliers apart from normal observations. In this post we will see an...
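Here is a minimal sketch with scikit-learn's `IsolationForest` on synthetic data: a Gaussian cloud plus two far-away points. `fit_predict` labels each point +1 (inlier) or -1 (outlier). `score_samples` returns a score that is lower for points isolated in fewer splits. The `contamination` value is our assumption about the share of outliers, not something the algorithm learns.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
inliers = rng.normal(size=(200, 2))             # a synthetic Gaussian cloud
outliers = np.array([[6.0, 6.0], [-6.0, 5.0]])  # two points far away from it
X = np.vstack([inliers, outliers])

# contamination: assumed share of outliers, used to set the decision threshold
forest = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
labels = forest.fit_predict(X)    # +1 for inliers, -1 for outliers
scores = forest.score_samples(X)  # the lower the score, the more isolated the point

print(labels[-2:])                # labels of the two injected outliers
print(np.argsort(scores)[:2])     # indices of the two most anomalous points
```

The threshold is derived from `contamination`, so a few borderline inliers can also be flagged. Ranking by `score_samples` is often more informative than the binary labels.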
13 July 12:42
Dates in Pandas Cheatsheet
   Lately I’ve been working a lot with dates in Pandas, so I decided to make this little cheatsheet with the commands I use the most.
   Importing a csv using a custom function to parse dates
      import pandas as pd

      def parse_month(month):
      ...
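Here is a minimal sketch of the same idea with a made-up three-row CSV. It reads the file, then applies a custom parser. `parse_month` here is hypothetical and assumes months are stored as 'YYYY-MM'. Applying the parser after `read_csv` avoids that function's `date_parser` argument, which is deprecated from pandas 2.0 onward.

```python
import io
import pandas as pd

# a tiny made-up CSV with months stored as 'YYYY-MM' strings
csv_text = """month,sales
2018-01,120
2018-02,95
2018-03,143
"""

def parse_month(month):
    """Hypothetical custom parser: 'YYYY-MM' -> Timestamp on the first day of that month."""
    return pd.to_datetime(month, format='%Y-%m')

df = pd.read_csv(io.StringIO(csv_text))
df['month'] = df['month'].apply(parse_month)  # parse after reading
df = df.set_index('month')                   # the index is now a DatetimeIndex

print(df.loc['2018-02'])  # partial string indexing selects February 2018
```

When every value uses the same format, `pd.to_datetime(df['month'], format='%Y-%m')` on the whole column does the same job without a Python-level function call per row.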
16 June 15:27
A heatmap of male to female ratios with Seaborn
   In this post we will see how to create a heatmap with seaborn. We’ll use a dataset from the Wittgenstein Centre Data Explorer (http://www.wittgensteincentre.org/dataexplorer). The data extracted is also reported at https://gist.github.com/JustGlowing...
27 April 08:45
Solving the Two Spirals problem with Keras
   In this post we will see how to create a Multi Layer Perceptron (MLP), one of the most common Neural Network architectures, with Keras. Then, we’ll train the MLP to tell apart points from two different spirals in the same space. To get a sense of the problem, let’s first generate the data to...
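Here is one common way to generate the two spirals, as a sketch. It samples angles along an Archimedean spiral, adds Gaussian noise, and obtains the second spiral by rotating the first by 180 degrees. The generator `two_spirals`, the two-turn length and the noise level are assumptions, and the post's own generator may use different constants. The Keras model is not shown here.

```python
import numpy as np

def two_spirals(n_points, noise=0.5, seed=0):
    """Generate two interleaved 2D spirals with n_points each.

    Returns X with shape (2 * n_points, 2) and labels y (0 = first spiral, 1 = second).
    """
    rng = np.random.default_rng(seed)
    # square root of a uniform draw spreads the points more evenly along the arm
    theta = np.sqrt(rng.random(n_points)) * 4 * np.pi  # up to two full turns
    arm = np.column_stack([theta * np.cos(theta), theta * np.sin(theta)])  # radius grows with angle
    first = arm + rng.normal(scale=noise, size=arm.shape)
    second = -arm + rng.normal(scale=noise, size=arm.shape)  # same arm rotated by 180 degrees
    X = np.vstack([first, second])
    y = np.concatenate([np.zeros(n_points, dtype=int), np.ones(n_points, dtype=int)])
    return X, y

X, y = two_spirals(500)
print(X.shape, np.bincount(y))  # (1000, 2) [500 500]
```

The two classes wrap around each other, so no straight line separates them. This is what makes the problem a classic test for an MLP's ability to learn non-linear boundaries.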
21 May 06:25
An intro to Regression Analysis with Decision Trees
   It’s been a while since the last post on this blog, but the Glowing Python is still active and strong: I just decided to publish some of my posts on the Cambridge Coding Academy blog (https://blog.cambridgecoding.com). Here are the links to a series of two posts about Regression Analysis...