In my first article on Time Series, I hope to introduce the basic ideas and definitions required to understand time series analysis. We will start with the essential mathematical definitions, which are required to implement more advanced models. The information will be introduced in a similar manner as it was in a McGill graduate course on the subject, and following the style of the textbook by *Brockwell and Davis*.

A ‘**Time Series**’ is a collection of observations indexed by time. …
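To make the definition concrete, here is a minimal sketch in plain Python (the series and function names are made up for illustration, not from the course material): a time series is just a sequence of observations, and two of the first quantities we compute from one are the sample mean and the sample autocovariance.

```python
# Toy illustration: a time series is a sequence x_1, ..., x_n indexed
# by time. The sample autocovariance at lag h measures how observations
# h steps apart co-vary around the sample mean.

def sample_mean(x):
    return sum(x) / len(x)

def sample_autocovariance(x, h):
    """gamma_hat(h) = (1/n) * sum_{t=1}^{n-h} (x_{t+h} - mean)(x_t - mean)"""
    n = len(x)
    m = sample_mean(x)
    return sum((x[t + h] - m) * (x[t] - m) for t in range(n - h)) / n

series = [2.0, 4.0, 6.0, 4.0, 2.0, 4.0, 6.0, 4.0]  # made-up data
print(sample_autocovariance(series, 0))  # lag 0: the sample variance
print(sample_autocovariance(series, 2))  # negative: peaks line up with troughs
```

Lag 0 recovers the sample variance; a negative value at lag 2 reflects the series' alternating pattern.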

Suppose we wish to perform supervised learning on a classification problem to determine whether an incoming email is spam or not. The *spam dataset* consists of 4601 emails, each labelled as either not spam (0) or spam (1). The data also contain a large number of predictors (57), each of which is either a character count or the frequency of occurrence of a certain word or symbol. In this short article, we will briefly cover the main concepts in tree-based classification and compare and contrast the most popular methods.

This dataset and several worked examples are covered in detail in *The Elements of Statistical Learning*, 2nd edition. …
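As a rough illustration of the tree idea (using made-up numbers, not the actual spam dataset), here is a minimal ‘decision stump’, a tree with a single split, fit by minimizing misclassification error on one feature:

```python
# A decision stump is a one-split tree: pick the threshold on a single
# feature that best separates the classes. Real tree methods split on
# many features recursively; this toy version uses one invented feature
# (say, frequency of the '$' symbol) with labels 0 (not spam), 1 (spam).

def best_stump(xs, ys):
    best = (None, float("inf"))  # (threshold, misclassification count)
    for thr in sorted(set(xs)):
        # rule: predict spam (1) when x >= thr
        errors = sum((x >= thr) != y for x, y in zip(xs, ys))
        if errors < best[1]:
            best = (thr, errors)
    return best

xs = [0.0, 0.1, 0.2, 0.9, 1.1, 1.5]   # '$' frequency (made up)
ys = [0,   0,   0,   1,   1,   1  ]   # 0 = not spam, 1 = spam
thr, err = best_stump(xs, ys)
print(thr, err)
```

On this tiny separable example the stump finds a threshold with zero training error; on real data one would grow a full tree and prune it.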

In this article, we will demonstrate how combinatorial optimization (in particular, *Genetic Algorithms*) can be used for model selection.

One popular and rising use of data science techniques is in the field of *Sports Analytics*. Baseball-related statistics in particular are so popular that they have their own name, *Sabermetrics*. We will use the popular sport as a chance to learn about metaheuristics and combinatorial optimization. In particular, we will analyze a baseball salary dataset in R.

Before we dive into the dataset, we need to introduce the general problem we are facing.

In combinatorial problems, we have a large set of predictors…
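To give a flavour of the idea before the baseball example, here is a toy sketch of a genetic algorithm for subset selection in Python (the fitness function and all numbers are invented for illustration; in a real model-selection problem the fitness would be a criterion such as AIC computed on a fitted model):

```python
import random

# Each candidate model is a bitstring: bit j says whether predictor j
# is included. A GA evolves a population of such bitstrings through
# selection, crossover, and mutation. Here the "fitness" simply rewards
# matching a pretend "good" subset, standing in for a model criterion.

random.seed(0)
N_PREDICTORS = 8
GOOD = [1, 0, 1, 0, 0, 1, 0, 0]  # pretend these predictors matter

def fitness(bits):
    return sum(b == g for b, g in zip(bits, GOOD))  # higher is better

def crossover(a, b):
    cut = random.randrange(1, len(a))   # single-point crossover
    return a[:cut] + b[cut:]

def mutate(bits, rate=0.05):
    return [1 - b if random.random() < rate else b for b in bits]

population = [[random.randint(0, 1) for _ in range(N_PREDICTORS)]
              for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                     # selection (elitist)
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(10)]
    population = parents + children               # next generation

best = max(population, key=fitness)
print(best, fitness(best))
```

Because the top half of each generation is carried over unchanged, the best fitness never decreases; on this easy landscape the GA quickly homes in on the good subset.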

*Support Vector Machines* are a popular tool used in several branches of Machine Learning. In particular, they are extremely useful for binary classification. Support Vector Machines are built on the concept of separating hyperplanes, so it is useful to be introduced to this concept first.

In this article, I introduce the method of classification via separating hyperplanes. We start off simple and describe how even linear regression can be used to make simple binary classifications. We then move on to separating hyperplanes and *optimal* separating hyperplanes. Finally, with this background, we are in a position to introduce support vectors and the powerful *Support Vector Machine*. …
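To make the hyperplane idea concrete, here is a small sketch (not the article's code) using the classic perceptron rule, which finds *a* separating hyperplane on linearly separable toy data; an SVM goes further and finds the *optimal*, maximum-margin one:

```python
# The perceptron rule: whenever a point falls on the wrong side of the
# current hyperplane w.x + b = 0, nudge (w, b) toward classifying it
# correctly. On linearly separable data this converges to a separator.

def perceptron(points, labels, epochs=100):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):   # y in {-1, +1}
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:  # misclassified
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:       # every point on the correct side
            break
    return w, b

points = [(2.0, 2.0), (3.0, 3.0), (-2.0, -1.0), (-3.0, -2.0)]  # toy data
labels = [1, 1, -1, -1]
w, b = perceptron(points, labels)
print(w, b)
```

Any hyperplane the perceptron returns separates the classes, but it is not unique; the SVM's maximum-margin criterion is what pins down a single ‘optimal’ choice.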

**Are you serious about Machine Learning? Do you want to review the gritty details behind the algorithms?**

You may have forgotten that each Machine Learning algorithm has an in-depth history, complete with a theoretical basis and justification involving analysis, linear algebra and even proofs.

Well, the good news is that two of the most comprehensive books on the subject are provided by their authors *free of charge*.

**Let's get right to it**

In my opinion, one book on statistical and Machine learning stands above all the rest in terms of providing a comprehensive analysis.

Logistic Regression is a common method used for fitting a binary or categorical response variable. But did you know that if you are not careful, logistic regression can miss out on important features?

In this in-depth article, we will use the South African Heart Disease data, in combination with Chapters 4 and 5 of *The Elements of Statistical Learning*, 2nd edition, to demonstrate why you need to be careful when applying logistic regression. We will also use this example as a starting point to introduce basis functions and splines.

In particular, we will learn about non-linearity and how non-linear features can be incorporated into models. It will be fun, so if you want to learn more intermediate statistical techniques, or are just looking for a review, please read along. …
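As a tiny preview of basis functions (with made-up knots and inputs, not the heart-disease data), here is a piecewise-linear spline basis in Python: instead of feeding the raw feature x to logistic regression, we feed transformed versions of it.

```python
# A truncated-power basis for a piecewise-linear spline with knots
# k1, k2 is [x, (x - k1)_+, (x - k2)_+], where (u)_+ = max(u, 0).
# Fitting a linear model on these columns yields a fit that can change
# slope at each knot -- a simple way to capture non-linearity.

def linear_spline_basis(x, knots):
    return [x] + [max(x - k, 0.0) for k in knots]

knots = [2.0, 5.0]                       # invented knot locations
print(linear_spline_basis(1.0, knots))   # [1.0, 0.0, 0.0]
print(linear_spline_basis(6.0, knots))   # [6.0, 4.0, 1.0]
```

Below the first knot only the raw feature is active; past each knot an extra term switches on, letting the fitted slope bend there.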

**Let us start easy.** What is the square root of 100? In other words, what number times itself is equal to 100? Easy, the answer is 10.

What is the square root of -1?

Well, that isn’t so obvious. Since any real number times itself is non-negative, how is it possible that we can solve for the square root of -1?

The answer is: **we define it**. And this isn’t just a trivial definition. Assigning this answer a symbol, **i**, the ‘imaginary unit’, is the first step into a beautiful and deep field known as *complex analysis*.

So, why should you care about complex analysis? It turns out that, despite the name ‘imaginary’, the impact of the field is most certainly real. From engineering to applied and pure mathematics, physics and more, complex analysis continues to show up. Moreover, we will cover several undeniable examples that demonstrate that complex analysis is indeed a useful and important field. …
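You can experiment with this definition directly: Python has complex numbers built in, with `1j` playing the role of **i**.

```python
import cmath

# The defining property of the imaginary unit: i squared is -1.
i = 1j
print(i ** 2)            # (-1+0j)

# With complex numbers available, the square root of -1 exists:
print(cmath.sqrt(-1))    # 1j
```

Note that `math.sqrt(-1)` would raise an error; it is only after extending the reals to the complex plane that the equation x² = -1 gains a solution.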

In this article we will briefly introduce the Expectation-Maximization (‘EM’) algorithm and walk through several numerical examples. By the end of this article, you should have a better idea of what the EM algorithm is, why it is useful, and how it can be implemented. The EM algorithm is often said to be used when there is ‘missing’ data, but more generally it can be viewed as an algorithm for problems with ‘latent’ (unobserved) data.

**General Idea:**

- We maximise the likelihood of the parameters given the observed data; that is, we want to find the parameter values under which the observed data are generated with the highest probability. …
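The idea above can be sketched on the classic two-coin toy problem (all numbers made up): two biased coins with unknown heads probabilities, where the *latent* variable for each trial is which coin was flipped.

```python
import math

# Each trial: pick one of two biased coins (we never observe which),
# flip it 10 times, record the number of heads. EM alternates:
#   E-step: given current bias estimates, compute the probability each
#           trial came from coin A vs coin B (soft assignments),
#   M-step: re-estimate each coin's bias from those soft counts.

def binom_pmf(heads, n, p):
    return math.comb(n, heads) * p**heads * (1 - p)**(n - heads)

def em(heads_counts, n_flips, theta_a, theta_b, iters=20):
    for _ in range(iters):
        ha = ta = hb = tb = 0.0
        for h in heads_counts:
            # E-step: responsibility of coin A for this trial
            pa = binom_pmf(h, n_flips, theta_a)
            pb = binom_pmf(h, n_flips, theta_b)
            r = pa / (pa + pb)
            ha += r * h;       ta += r * (n_flips - h)
            hb += (1 - r) * h; tb += (1 - r) * (n_flips - h)
        # M-step: update biases from expected heads/tails counts
        theta_a = ha / (ha + ta)
        theta_b = hb / (hb + tb)
    return theta_a, theta_b

# Trials with ~8-9 heads vs ~1-2 heads suggest two different coins.
data = [8, 9, 2, 8, 1, 2, 9, 1]
theta_a, theta_b = em(data, 10, theta_a=0.6, theta_b=0.4)
print(theta_a, theta_b)
```

Starting from rough guesses of 0.6 and 0.4, the estimates separate toward roughly 0.85 and 0.15, matching the two clusters of head counts, even though no trial is ever labelled with its coin.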

Ordinary Least Squares (‘*OLS*’) is one of the oldest and simplest algorithms used for regression. However, there are now several variants that were invented to address some of the weaknesses encountered when using ordinary least squares regression.

Despite being one of the oldest algorithms, linear models are still very useful. In fact, they can often outperform fancier, more sophisticated models. They are particularly useful when there are not many observations, or when the data are noisy (a low signal-to-noise ratio).

In this article, we will first review the basic formulation of linear regression, discuss how we solve for the parameters (weights) using gradient descent, and then introduce Ridge Regression. We will then discuss the Lasso, and finally the Elastic Net. This article also belongs to my series on building Machine Learning algorithms from scratch (mostly). So far, I have discussed logistic regression from scratch, deriving principal components from the singular value decomposition, and genetic algorithms. …
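As a preview of the from-scratch approach, here is a minimal sketch of ridge regression fit by gradient descent in plain Python (toy one-feature data, invented numbers; with the penalty set to zero it reduces to OLS):

```python
# Ridge objective for one feature:
#   L(w, b) = (1/n) * sum (y_i - (w*x_i + b))^2 + lam * w^2
# Gradient descent repeatedly steps (w, b) against the gradient of L.
# The penalty lam * w^2 shrinks the slope toward zero; lam = 0 is OLS.

def ridge_gd(xs, ys, lam=0.1, lr=0.05, steps=2000):
    n = len(xs)
    w = b = 0.0
    for _ in range(steps):
        grad_w = (sum(-2 * x * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
                  + 2 * lam * w)
        grad_b = sum(-2 * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]       # exactly y = 2x + 1 (made up)
w, b = ridge_gd(xs, ys, lam=0.0)     # lam = 0 recovers OLS
print(w, b)
```

With `lam=0.0` the fit recovers the true slope and intercept; increasing `lam` would pull the slope below 2, which is exactly the bias-for-variance trade the article explores.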

In this article we will briefly introduce five ‘non-standard’ programming languages that are making noise in the Data Science world. These languages also have the potential to be the ‘next big thing’. In particular, by ‘standard’ we refer to the usual Data Science languages: R, Python, Java and JavaScript. By ‘next big thing’, we refer to the position that Python currently holds in the Data Science world.

Will one of these non-standard languages take over the Machine Learning world? Will Julia replace Python as the go-to language for Machine Learning? Will Python’s lack of speed eventually lead to its slow demise? …
