Supervised learning

  • B.4.2 – Outline the structure of neural networks.
  • B.4.3 – Compare applications that use neural network modelling.
  • B.4.4 – Compare different ways in which neural networks can be used to recognize patterns.

There are two primary types of machine learning: supervised and unsupervised. Supervised learning is the process by which an algorithm (we'll shortly see that we're talking about artificial neural networks, or ANNs) improves how it selects answers over time. It does this through a rather complex process, but at its most basic it attempts to answer a question, and then, given the real answer, updates the method by which it makes its decision.

For example, imagine you write a program which, given a photo, measures different attributes of an animal and then determines whether it is a cat or a dog. At first your program is terrible and is really just guessing. To help the program get better you gather 1,000 photos of cats and dogs and sort them into two piles of 500 each. You then show the photos one at a time to your algorithm, and whenever it gets one wrong you tweak the way it weighs the attributes. You do this multiple times until you reach a much improved accuracy of around 98%. The process can be described in a more general manner by the image below.
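This guess-check-adjust loop can be sketched in a few lines of code. Everything here is invented for illustration: the two features (ear pointiness and snout length), the four hand-made examples, and the simple perceptron-style update rule, which stands in for the real neural network training covered later.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical labelled examples: (ear_pointiness, snout_length) -> 1 = cat, 0 = dog.
# The features and numbers are made up purely to illustrate the training loop.
data = [((0.9, 0.2), 1), ((0.8, 0.3), 1),   # cats: pointy ears, short snouts
        ((0.2, 0.9), 0), ((0.1, 0.8), 0)]   # dogs: the reverse

weights = [random.uniform(-1, 1), random.uniform(-1, 1)]
bias = 0.0
learning_rate = 0.1

def predict(features):
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0

# Show each example repeatedly; when the guess is wrong, nudge the weights
# in the direction that would have made the guess right.
for epoch in range(100):
    for features, label in data:
        error = label - predict(features)      # 0 when correct, +1/-1 when wrong
        for i, f in enumerate(features):
            weights[i] += learning_rate * error * f
        bias += learning_rate * error

accuracy = sum(predict(f) == y for f, y in data) / len(data)
```

After training, the program classifies all four toy examples correctly; with real photos and thousands of examples the same idea applies, just with far more features and a far more capable model.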

Writing task: Features and issues

In the example above, of trying to determine whether a photo shows a cat or a dog, you're likely to hit many obstacles. For example, look at this photo and try to decide if it's a cat or dog.

  1. What features should the algorithm look for in order to determine if a photo is a cat or dog? 
  2. Will it ever be able to get 100% accuracy? Why or why not? 
  3. Why would having more examples to teach the algorithm be better?
  4. How is the way the algorithm learns similar to the way people learn?

Neural networks

In most cases in this topic, the algorithm mentioned above will be a neural network. It can be difficult to understand what a neural network is at first, but thankfully some lovely people on the internet have taken the time to explain it in simpler terms. The video below does so quite nicely, while the presentation afterwards is what we will cover in class. Below that are a number of links which can help you gain a better understanding. 

Neural network steps

To summarize the sections, video, and slideshow above, a feed forward neural network performs the steps below. You should be familiar with the overall process and the highlighted words. 

  1. Network layout is chosen, with the number of inputs, hidden layers, and outputs decided.
  2. Weights and biases are randomly initialized.
  3. Training data and test data are selected. This is labeled data.
  4. Training data is sent into the feed forward network. 
  5. The output is checked against the expected output.
  6. Backpropagation using gradient descent is used to adjust the weights to get the output closer to the expected output. 
  7. Steps 4 to 6 are repeated until it's determined that enough training has been done. 
  8. The trained neural network is given the test data to see how well it performs. 
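The eight steps above can be sketched in plain Python for a tiny 2–3–1 network. Everything here is a stand-in: the layout, the learning rate, and the training data (the XOR truth table) are arbitrary choices for illustration, and real networks use libraries rather than hand-written loops.

```python
import math
import random

random.seed(1)

# Step 1: network layout – 2 inputs, 3 hidden neurons, 1 output.
# Step 2: weights are randomly initialised, biases start at zero.
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
b1 = [0.0] * 3
W2 = [random.uniform(-1, 1) for _ in range(3)]
b2 = 0.0

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Step 3: a tiny labelled training set (the XOR truth table as a stand-in).
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def forward(x):
    # Step 4: feed the inputs forward through the network.
    h = [sigmoid(sum(x[i] * W1[i][j] for i in range(2)) + b1[j]) for j in range(3)]
    out = sigmoid(sum(h[j] * W2[j] for j in range(3)) + b2)
    return h, out

def loss():
    # Step 5: compare outputs with expected outputs (mean squared error).
    return sum((forward(x)[1] - y) ** 2 for x, y in data) / len(data)

first_loss = loss()
lr = 0.5
for epoch in range(5000):
    for x, y in data:
        h, out = forward(x)
        # Step 6: backpropagation with gradient descent.
        d_out = (out - y) * out * (1 - out)
        for j in range(3):
            d_h = d_out * W2[j] * h[j] * (1 - h[j])   # use W2 before updating it
            W2[j] -= lr * d_out * h[j]
            for i in range(2):
                W1[i][j] -= lr * d_h * x[i]
            b1[j] -= lr * d_h
        b2 -= lr * d_out
# Step 7/8: in practice you'd stop once the loss flattens out,
# then evaluate the trained network on held-out test data.
final_loss = loss()
```

The loss after training is lower than at the start, which is exactly what steps 4–7 are designed to achieve.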


Making a Simple Neural Network

A Gentle Introduction to Neural Networks

Natural language processing

  • B.4.5 – Identify the key structures of natural language
  • B.4.6 – Discuss the differences between human and machine learning when related to language.
  • B.4.7 – Outline the evolution of modern machine translators.
  • B.4.8 – Describe the role of chatbots to simulate conversation
  • B.4.9 – Discuss the latest advances in natural language processing.

Natural language processing (NLP) encompasses a number of areas, such as those below, and uses all the techniques we’ve discussed thus far. 

NLP and linguistics overlap heavily, and the techniques linguists use to analyze sentences, grammar, syntax and semantics are used to extract features to learn from. This feature extraction is easy enough for a computer, but determining the meaning of a sentence or paragraph is difficult. People have the advantage of context and culture to understand what someone is saying, while a computer relies purely on the mechanics of speech. Below are the steps a speech recognition system might go through to linguistically break apart and determine what is being said. 
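To make "breaking a sentence apart" concrete, here is a toy sketch of the first couple of linguistic steps: tokenisation, followed by tagging each word with its part of speech. The lexicon and tag names are invented for the example; real systems learn their taggers from large annotated corpora rather than using a hand-made word list.

```python
# A tiny hand-made lexicon mapping words to part-of-speech tags.
# Both the words and the tags are made up for illustration.
lexicon = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "dog": "NOUN", "shop": "NOUN",
    "finds": "VERB", "sees": "VERB",
    "nearest": "ADJ",
}

def analyse(sentence):
    tokens = sentence.lower().rstrip(".!?").split()      # 1. tokenisation
    return [(t, lexicon.get(t, "UNK")) for t in tokens]  # 2. part-of-speech tagging

result = analyse("The dog sees the nearest shop.")
```

From tags like these a system can start extracting grammatical features (which word is the verb, which noun it acts on), which is where determining meaning, the genuinely hard part, begins.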

NLP and neural networks

There are countless uses for neural networks, including in NLP areas like speech recognition and OCR. Other interesting areas, such as word embeddings and recurrent neural networks, also make use of them. 

Speech recognition

Recognizing speech has become increasingly important as more and more we find ourselves talking to our phones, computers, and other devices, such as the navigation systems in our cars. At the heart of speech recognition algorithms is often some form of neural network. This neural network will learn as you use it. For example, let's say you ask your phone, "Find me the nearest coffee shop," and then it gives you a list of toffee shops. That's obviously wrong, so you repeat it more slowly or perhaps just type it in. The app will recognize that you didn't like its results and then update its algorithm to try to better handle your particular accent or way of speaking. 


Optical character recognition

Optical character recognition (OCR), of both handwritten and typed text, is an important area of research, with uses in reading documents, self-driving vehicles, translation tools, and a number of other areas. The MNIST collection of handwritten digits is perhaps the most well known OCR learning example, and it is also used in the 3Blue1Brown video above. 

Unsupervised learning

  • B.4.4 – Compare different ways in which neural networks can be used to recognize patterns.

Where supervised learning uses labeled data sets to train a network, unsupervised learning works with data that has no labels. It tries to group similar information together using some sort of general rules. The process resembles the image below. Take a look at the corresponding image in the supervised section above, and compare. 

Unsupervised learning often works by clustering outputs with similar features together, with one of the most common clustering algorithms being k-means (not to be confused with k-nearest neighbors, which is a supervised classifier, though it groups points by feature similarity in a similar spirit). If objects have two features, and you graph them on a 2D plane, you will get one or more groups. The following is an example of grouping points where the value of k = 3.  

Stanford has two visualizations for seeing this in practice, both here and here.

Aside from those mentioned above, there are other clustering algorithms which allow us to find common traits within a set of data. You don't need to know any of these for a test, but you do need a basic understanding that clustering is used to group together sets of similar data. 
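To see what clustering actually does, here is a minimal sketch of k-means, one of the standard clustering algorithms: each point is assigned to its nearest centre, then each centre moves to the mean of its cluster, and the two steps repeat. The two "blobs" of 2D points are randomly generated stand-ins for objects with two measured features.

```python
import random

random.seed(2)

# Two made-up blobs of 2-D points (two feature measurements per object).
points = ([(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(20)] +
          [(random.gauss(5, 0.5), random.gauss(5, 0.5)) for _ in range(20)])

def kmeans(points, k, steps=10):
    centres = random.sample(points, k)   # start from k random points
    clusters = []
    for _ in range(steps):
        # Assign each point to its nearest centre...
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda i: (p[0] - centres[i][0]) ** 2
                                + (p[1] - centres[i][1]) ** 2)
            clusters[i].append(p)
        # ...then move each centre to the mean of its cluster.
        for i, c in enumerate(clusters):
            if c:
                centres[i] = (sum(p[0] for p in c) / len(c),
                              sum(p[1] for p in c) / len(c))
    return centres, clusters

centres, clusters = kmeans(points, k=2)
```

Note that the algorithm never sees a label; it discovers the two groups purely from the feature values, which is the essence of unsupervised learning.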

Applying unsupervised learning

You will need to understand unsupervised learning in the context of speech recognition, optical character recognition, and natural language processing. In each of these cases you’ll be looking for common traits among whatever data samples you are given, and then attempting to group them.  


Written task: Unsupervised learning

Think about the clustering of information, and how it might apply to the features of languages. Let's say you are given thousands of articles on different topics, ranging from engineering to art to cooking. 

  1. What sort of features would you extract from the texts? (Think in terms of vocabulary and grammar.)
  2. How would these features help you cluster the texts into different groups? Give an example. 

The slideshow to the right gives a more thorough understanding of NLP, and covers a number of areas that you may encounter on an IB exam. It's important that you understand how grammatical features, and features of the structure of the text, can be used in order to solve problems using NLP techniques. 

Genetic Algorithms

  • B.4.1 – Outline the use of genetic algorithms.

Genetic algorithms are a strategy for training learning systems using ideas from evolution. With neural networks we focused on a single network and then provided it examples in order to learn. With genetic algorithms we create many learning systems, each with their own specific attributes, and then set them free without any samples to learn from. After a while we look at their progress, take the ones that have done the best, and then combine them and simulate mutations in order to get an improved solution. 

In biology this seems fairly straightforward. We can picture what a generation looks like, and we have an understanding of what DNA and mutations are, but with learning systems it becomes somewhat less intuitive. The questions that need to be answered are as follows. 

  • What is meant by a generation?
  • How can a learning system have DNA?
  • How can we reproduce code?
  • What does selection mean, and how do we determine which is best?
  • How does one mutate a program?

In order to get a better understanding of how genetic algorithms work, we’ll look at the classic knapsack problem. 

The knapsack problem concerns a knapsack that can only carry so much weight, and a set of possible items it can hold. The goal is to fill the pack with the combination of items that is worth the most money. With a small number of items this is trivial to brute force, but when the number of items becomes quite large, finding the answer can be rather difficult. In order to solve this, we'll use a genetic algorithm by breaking the problem down into corresponding biological components, starting with the DNA. 


Let’s say that our knapsack is like the one above and it can store 15kg, but the items it can store are listed below. 

That means each knapsack can hold at most the 7 items A through G, which means we can encode each knapsack using a series of ones and zeros to indicate whether or not it holds each item. Essentially, it will be a list like the following, which is in fact our DNA. 

[1, 0, 0, 1, 0, 0, 1]

This particular knapsack would hold items A, D, and G, making its weight 13 kg with items valued at 11 ducats (or whatever useless paper currency people use these days). 
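Decoding a chromosome back into items is straightforward. Since the item table above is shown as an image, the weights and values below are hypothetical, chosen only so that the A/D/G example works out to the 13 kg and 11 ducats mentioned in the text.

```python
# Hypothetical item table: name -> (weight in kg, value in ducats).
# These numbers are made up; the real table is in the image above.
items = {
    "A": (5, 3), "B": (6, 6), "C": (7, 7),
    "D": (4, 4), "E": (3, 2), "F": (8, 9), "G": (4, 4),
}
names = list(items)  # ["A", ..., "G"] – fixed order matching the DNA positions

def decode(dna):
    """Return the chosen items, total weight, and total value of a chromosome."""
    chosen = [names[i] for i, bit in enumerate(dna) if bit]
    weight = sum(items[n][0] for n in chosen)
    value = sum(items[n][1] for n in chosen)
    return chosen, weight, value

chosen, weight, value = decode([1, 0, 0, 1, 0, 0, 1])  # items A, D and G
```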


To create a generation, or the offspring, you can randomly make a bunch of lists of ones and zeros. Each of these lists stores the genetic code of one knapsack. 

[1, 0, 0, 1, 0, 0, 1]


[0, 1, 0, 1, 1, 0, 0]

[0, 0, 0, 1, 1, 0, 0]
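Creating the initial generation is then just a matter of rolling random bits; the population size of six here is an arbitrary choice.

```python
import random

random.seed(3)

ITEMS = 7      # genes per chromosome – one bit per possible item
POP_SIZE = 6   # how many knapsacks in each generation (arbitrary)

def random_generation():
    # Each individual is a list of 0/1 genes, one per item.
    return [[random.randint(0, 1) for _ in range(ITEMS)]
            for _ in range(POP_SIZE)]

population = random_generation()
```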

Selection or Fitness

Every genetic algorithm, and really every learning algorithm, has what's called a fitness function, which decides whether a member of the new generation is a viable one. In the case of the knapsack, no knapsack can be more than 15 kg in weight, so the fitness function should probably include something restricting those over 15 kg from continuing into the next generation. 

Furthermore, the fitness function should prioritize those which are at or close to the maximum weight and which hold the highest value. When moving on to the reproduction step, you'll want these in order to breed the next, and best possible, generation. 
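A knapsack fitness function might look like the following sketch. The item weights and values are made up (the real table is in the image earlier); the key idea is simply that overweight knapsacks score zero, so they can't survive into the next generation, while viable ones score their total value.

```python
# Hypothetical item weights (kg) and values, in A–G order.
WEIGHTS = [5, 6, 7, 4, 3, 8, 4]
VALUES  = [3, 6, 7, 4, 2, 9, 4]
LIMIT = 15  # the knapsack holds at most 15 kg

def fitness(dna):
    weight = sum(w for w, bit in zip(WEIGHTS, dna) if bit)
    value = sum(v for v, bit in zip(VALUES, dna) if bit)
    # Overweight knapsacks are non-viable: zero fitness means
    # they won't be selected for the next generation.
    return 0 if weight > LIMIT else value

f1 = fitness([1, 0, 0, 1, 0, 0, 1])   # A, D, G: 13 kg, viable
f2 = fitness([1, 1, 1, 0, 0, 0, 0])   # A, B, C: 18 kg, overweight
```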


The reproduction step can be done in a number of different ways, but it's important that it prioritizes those which best meet the fitness function. If we follow the model of reproduction in people, that would involve defining a crossover point along the chromosomes and then swapping the bits after that point between the two parents. So let's say we take the two best from our current generation (meaning those that best met the criteria of our fitness function) and then swap their bits. It may look something like below. 

We do a bunch of these crossovers to create enough lists for the next generation, with everything playing out just like you learned in biology class, of course. 
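A single-point crossover like the one described can be sketched as follows. The crossover point of 3 is fixed here for clarity; in practice it is usually chosen at random for each pairing.

```python
def crossover(parent_a, parent_b, point=3):
    # Swap everything after the crossover point between the two parents,
    # producing two children that each mix both parents' genes.
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

c1, c2 = crossover([1, 0, 0, 1, 0, 0, 1],
                   [0, 1, 0, 1, 1, 0, 0])
```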


Mutations are what make sure that enough diversity remains in the population. Within the knapsack problem, a mutation would occur by switching a 1 to a 0 or vice versa. Generally, mutation rates start out low, but they can be adjusted however one likes in order to add more variety to each generation. 
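Mutation is then just a small chance of flipping each bit; the 5% rate below is an arbitrary choice, and raising it adds more variety to each generation.

```python
import random

random.seed(4)

def mutate(dna, rate=0.05):
    # Flip each bit with a small probability to keep diversity in the population.
    return [1 - bit if random.random() < rate else bit for bit in dna]

mutant = mutate([1, 0, 0, 1, 0, 0, 1])
```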

Practical activity: Solving the knapsack problem

Your task is to solve the knapsack problem using a genetic algorithm. To do so you can either attempt it on your own, using the above concepts as guidance, or you can follow the directions in this base code. 

With neural networks

Aside from supervised learning using labeled data, and unsupervised learning using unlabeled data, you can train neural networks using genetic algorithms with an appropriate fitness function. In this case, the DNA can be represented as the weights of the network, and you'll need to extract them and perform some sort of crossover each generation. It will resemble the following: 

The overall process functions like this: 

  • Generate different neural networks with random weights.
  • Run the networks for their prescribed task. 
  • Whichever networks do best, extract their weights. 
  • Crossover
  • Mutate
  • Repeat the process. 
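The loop above can be sketched as follows. To keep the sketch self-contained, each "network" is represented only by a flat list of weights, and the task is a stand-in (weights score higher the closer they are to a fixed target vector); in a real system you would flatten and restore the weight matrices of an actual network and score it on its real task.

```python
import random

random.seed(5)

N_WEIGHTS = 6   # pretend each tiny network is fully described by six weights

def random_network():
    # Stand-in: a "network" here is just its flat weight list,
    # which is all the genetic algorithm ever sees.
    return [random.uniform(-1, 1) for _ in range(N_WEIGHTS)]

def fitness(weights):
    # Stand-in task: closeness to a fixed target vector.
    # A real fitness function would run the network on its task.
    target = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

def crossover(a, b):
    point = random.randrange(1, N_WEIGHTS)
    return a[:point] + b[point:]

def mutate(weights, rate=0.1):
    # Occasionally nudge a weight to keep diversity in the population.
    return [w + random.gauss(0, 0.1) if random.random() < rate else w
            for w in weights]

population = [random_network() for _ in range(10)]
initial_best = max(fitness(n) for n in population)

for generation in range(50):
    # Keep the two fittest networks unchanged, then breed and mutate to refill.
    best = sorted(population, key=fitness, reverse=True)[:2]
    population = best + [mutate(crossover(best[0], best[1])) for _ in range(8)]

final_best = max(fitness(n) for n in population)
```

Because the two fittest networks are carried over unchanged each generation (so-called elitism), the best fitness can never get worse from one generation to the next.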

The example and videos in the next section provide a good look at how this process works. 

Genetic algorithm examples

If you’d like to get a more in-depth overview, check out this page. If you’d like to see genetic algorithms in action, here is an example where a bot learns to walk, while below are some videos which show training over time.