Supervised Learning

Chat with newcomers to the exciting world of machine learning, and it won’t be long before they start dispensing the term, “supervised learning,” like candy.

  • “Have you tried solving it with supervised learning?”
  • “Using supervised learning, I was able to come up with these great insights.”
  • “Those data issues were causing all kinds of problems until I used supervised learning to fix them.”
  • “A drop of supervised learning will get that stain right out!”

I’m not exactly sure why they do this.  My guess is that they’re trying very hard to sound cool.  Supervised learning is probably one of the first concepts they learn that they think they understand.  Unfortunately, they just end up sounding dorky.

In this post, I’ll explain what supervised learning really is.  By the time you finish reading, you’ll truly understand the concept.  More importantly, you’ll become an authentically cool kid — and not a dork who only thinks he is.

Components Of A Supervised Learning Problem

The first time I heard “supervised learning”, I imagined an intimidating nun lording over me, ready to rap my knuckles with her menacing slide rule whenever I lost focus on my studies.  Thankfully, supervised learning doesn’t involve nuns.  The supervision actually comes from the training data itself.

Before defining the term, though, we need to look at the components that make up a supervised learning problem.  Let’s begin with a list of the data points that the learning model will be training on.

Data Point
dp1
dp2
dp3
dp4
dp5
dp6
etc.

The data set contains a target variable, and each data point has its own target value, which describes the data point in some way.  Depending on what the data is about, the target variable can be pretty much anything: temperature, distance, gradient, price, amplitude of a sound, number of people in a given area, whatever.

Data Point | Target Variable
dp1 | tv1
dp2 | tv2
dp3 | tv3
dp4 | tv4
dp5 | tv5
dp6 | tv6
etc. | etc.

For some problems, the data points may also include labels, which are a way to partition the training set into categories.

Data Point | Target Variable | Label
dp1 | tv1 | lbl1
dp2 | tv2 | lbl2
dp3 | tv3 | lbl3
dp4 | tv4 | lbl2
dp5 | tv5 | lbl4
dp6 | tv6 | lbl3
etc. | etc. | etc.

Label definitions describe the conditions that must be satisfied in order for a data point to receive a particular label.

lbl1 → target variable is less than the value a
lbl2 → target variable is between the values a and b
lbl3 → target variable is between the values b and c
lbl4 → target variable is greater than the value c
Data Point | Target Variable | Label
dp1 | tv1 | lbl1
dp2 | tv2 | lbl2
dp3 | tv3 | lbl3
dp4 | tv4 | lbl2
dp5 | tv5 | lbl4
dp6 | tv6 | lbl3
etc. | etc. | etc.
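If you like seeing definitions as code, here is a minimal Python sketch of the four label definitions above. The cut-offs a, b, and c are plain parameters, the function name assign_label is made up for illustration, and since the text doesn't say which label a value sitting exactly on a cut-off gets, this sketch assumes it goes to the higher label.

```python
import bisect


def assign_label(target_value: float, a: float, b: float, c: float) -> str:
    """Return lbl1..lbl4 for a target value, given cut-offs a < b < c.

    Assumption: a value exactly equal to a cut-off falls into the higher
    label (for example, target_value == a gives lbl2).
    """
    if not a < b < c:
        raise ValueError("cut-offs must satisfy a < b < c")
    # bisect_right counts how many cut-offs are <= target_value (0 to 3)
    bucket = bisect.bisect_right([a, b, c], target_value)
    return f"lbl{bucket + 1}"


# Hypothetical cut-offs a=10, b=20, c=30 applied to a few target values
print([assign_label(tv, 10, 20, 30) for tv in (5, 15, 25, 35)])
# ['lbl1', 'lbl2', 'lbl3', 'lbl4']
```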

Often, the labels are colored to make them easier to distinguish from one another.  Red is a popular choice.  Blue, green, and yellow are good choices as well.

The training set also has additional variables, called features, and each data point has its own feature values.  These feature values describe the data point in more detail.  Like the target variable, a feature can be anything, depending on what the training set is about: temperature, distance, gradient, price, amplitude of a sound, number of people in a given area, whatever.  The only thing that matters is that each feature affects the target variable in some way.

lbl1 → target variable is less than the value a
lbl2 → target variable is between the values a and b
lbl3 → target variable is between the values b and c
lbl4 → target variable is greater than the value c
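Data Point | Feature 1 | Feature 2 | Feature 3 | Target Variable | Label
dp1 | f1-1 | f2-1 | f3-1 | tv1 | lbl1
dp2 | f1-2 | f2-2 | f3-2 | tv2 | lbl2
dp3 | f1-3 | f2-3 | f3-3 | tv3 | lbl3
dp4 | f1-4 | f2-4 | f3-4 | tv4 | lbl2
dp5 | f1-5 | f2-5 | f3-5 | tv5 | lbl4
dp6 | f1-6 | f2-6 | f3-6 | tv6 | lbl3
etc. | etc. | etc. | etc. | etc. | etc.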

When we plot the data on a graph, the data points are usually drawn as dots, colored by the labels they’re given.

Here’s an example of a two-dimensional feature-space representation of the data points, based on their Feature 1 and Feature 2 values:

It's training data confetti!

The Textbook Definition

Okay, now that we have the components, onward to the definition!  Supervised learning basically boils down to a task where the learning model needs to correctly replicate each data point’s target value ( tv1 for dp1; tv2 for dp2, etc. ), but do it with only the information provided by the data point’s features ( f1-1, f2-1, f3-1, … for dp1; f1-2, f2-2, f3-2, … for dp2, etc. ).

The target variable is the sole reason why this type of problem is called “supervised learning”.  It singularly determines whether the learning model is correct or not.  For example, if the data set represents all the homework problems in a textbook, then the target variable represents the solutions in the back.  If the student’s answers do not match the solutions, then he has to continue studying until he can derive the right answers.  The target variable dictates the entire learning process — thus, it’s the supervisor.

The formal definition is this — the learning model must find a trained model that best describes the relationship between each data point’s feature values and its target value.  Let’s represent the trained model with the function, TM(  ), whose parameters are the feature variables, and whose output is the target variable:  TM( f1, f2, … ) = tv.  So,

TM( f1-1, f2-1, f3-1, … ) = tv1
TM( f1-2, f2-2, f3-2, … ) = tv2
TM( f1-3, f2-3, f3-3, … ) = tv3
TM( f1-4, f2-4, f3-4, … ) = tv4
TM( f1-5, f2-5, f3-5, … ) = tv5
TM( f1-6, f2-6, f3-6, … ) = tv6
etc.

Now, it’s nice that we have a neat, organized lookup table, but this doesn’t really help the learning model much.  The whole point of learning is to describe the relationship in such a way that the learning model can plug in an unfamiliar data point which isn’t in the training set and still be able to correctly identify the target value.

TM( f1-new, f2-new, f3-new, … ) = tvnew

This means it’s better if we can somehow represent the trained model with an equation of some sort, like:

( 2 * ( f1 ) + √f2 ) / f3 + … = tv
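Here's a tiny Python sketch of why an equation beats a lookup table. The three training rows and the rule f1 + 2 * f2 are made up purely for illustration; the rule happens to reproduce every row, which is exactly what we want a trained model to do.

```python
# Hypothetical training set: (f1, f2) -> tv
training_set = {(1.0, 2.0): 5.0, (2.0, 1.0): 4.0, (3.0, 3.0): 9.0}


def tm_lookup(f1: float, f2: float):
    """A 'trained model' that only memorizes the training set."""
    return training_set.get((f1, f2))  # None for any unseen data point


def tm_equation(f1: float, f2: float) -> float:
    """A 'trained model' written as an equation (hypothetical rule)."""
    return f1 + 2 * f2


# Both models reproduce every training row ...
assert all(tm_lookup(*fs) == tv == tm_equation(*fs) for fs, tv in training_set.items())

# ... but only the equation has an answer for a brand-new data point
print(tm_lookup(4.0, 0.5))    # None
print(tm_equation(4.0, 0.5))  # 5.0
```

The lookup table is perfect on the training set and useless everywhere else; the equation at least has a shot at unfamiliar data points.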

Unfortunately, finding the right equation is not easy.  This is where all the fancy-schmancy statistics come in — to make things just a tad easier.

The learning model usually begins with some random equation, TM1( ).  Actually, oftentimes we can extract clues from the training data and from the details of the particular problem to “mentor” the learning model and provide “hints” about what the correct equation should look like.  So it’s not entirely random; it’s often more like an educated guess.

The learning model plugs each data point into TM1( ) and gets predicted target values back.

TM1( f1-1, f2-1, f3-1, … ) = ptv1-1
TM1( f1-2, f2-2, f3-2, … ) = ptv1-2
TM1( f1-3, f2-3, f3-3, … ) = ptv1-3
TM1( f1-4, f2-4, f3-4, … ) = ptv1-4
TM1( f1-5, f2-5, f3-5, … ) = ptv1-5
TM1( f1-6, f2-6, f3-6, … ) = ptv1-6
etc.

And if the data points have labels, the trained model also derives predicted labels.

TM1( f1-1, f2-1, f3-1, … ) = ptv1-1, plbl1
TM1( f1-2, f2-2, f3-2, … ) = ptv1-2, plbl1
TM1( f1-3, f2-3, f3-3, … ) = ptv1-3, plbl2
TM1( f1-4, f2-4, f3-4, … ) = ptv1-4, plbl3
TM1( f1-5, f2-5, f3-5, … ) = ptv1-5, plbl2
TM1( f1-6, f2-6, f3-6, … ) = ptv1-6, plbl4
etc.

Our table now looks like this:

TM( ) = ?

plbl1, lbl1 → ( predicted ) target variable is less than the value a
plbl2, lbl2 → ( predicted ) target variable is between the values a and b
plbl3, lbl3 → ( predicted ) target variable is between the values b and c
plbl4, lbl4 → ( predicted ) target variable is greater than the value c
Data Point | Feature 1 | Feature 2 | Feature 3 | TM( ) Used | Predicted Target Variable | Predicted Label | Target Variable | Label
dp1 | f1-1 | f2-1 | f3-1 | TM1() | ptv1-1 | plbl1 | tv1 | lbl1
dp2 | f1-2 | f2-2 | f3-2 | TM1() | ptv1-2 | plbl1 | tv2 | lbl2
dp3 | f1-3 | f2-3 | f3-3 | TM1() | ptv1-3 | plbl2 | tv3 | lbl3
dp4 | f1-4 | f2-4 | f3-4 | TM1() | ptv1-4 | plbl3 | tv4 | lbl2
dp5 | f1-5 | f2-5 | f3-5 | TM1() | ptv1-5 | plbl2 | tv5 | lbl4
dp6 | f1-6 | f2-6 | f3-6 | TM1() | ptv1-6 | plbl4 | tv6 | lbl3
etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc.

The learning model then compares the predicted target variable with the actual target variable to determine how far off it is, which is called the error.

TM( ) = ?

plbl1, lbl1 → ( predicted ) target variable is less than the value a
plbl2, lbl2 → ( predicted ) target variable is between the values a and b
plbl3, lbl3 → ( predicted ) target variable is between the values b and c
plbl4, lbl4 → ( predicted ) target variable is greater than the value c
Data Point | Feature 1 | Feature 2 | Feature 3 | TM( ) Used | Predicted Target Variable | Predicted Label | Target Variable | Label | Error
dp1 | f1-1 | f2-1 | f3-1 | TM1() | ptv1-1 | plbl1 | tv1 | lbl1 | err1-1
dp2 | f1-2 | f2-2 | f3-2 | TM1() | ptv1-2 | plbl1 | tv2 | lbl2 | err1-2
dp3 | f1-3 | f2-3 | f3-3 | TM1() | ptv1-3 | plbl2 | tv3 | lbl3 | err1-3
dp4 | f1-4 | f2-4 | f3-4 | TM1() | ptv1-4 | plbl3 | tv4 | lbl2 | err1-4
dp5 | f1-5 | f2-5 | f3-5 | TM1() | ptv1-5 | plbl2 | tv5 | lbl4 | err1-5
dp6 | f1-6 | f2-6 | f3-6 | TM1() | ptv1-6 | plbl4 | tv6 | lbl3 | err1-6
etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc.

It then plugs those error values into some fancy-schmancy statistical equations to make adjustments to its trained model.  Let’s call this new, adjusted model, TM2( ).  Plugging in the same data points returns a new set of predicted target values and predicted labels:

TM2( f1-1, f2-1, f3-1, … ) = ptv2-1, plbl2
TM2( f1-2, f2-2, f3-2, … ) = ptv2-2, plbl2
TM2( f1-3, f2-3, f3-3, … ) = ptv2-3, plbl3
TM2( f1-4, f2-4, f3-4, … ) = ptv2-4, plbl3
TM2( f1-5, f2-5, f3-5, … ) = ptv2-5, plbl4
TM2( f1-6, f2-6, f3-6, … ) = ptv2-6, plbl1
etc.

This also produces a new set of errors, err2-1, err2-2, err2-3, etc. So, our table now looks like this:

TM( ) = ?

plbl1, lbl1 → ( predicted ) target variable is less than the value a
plbl2, lbl2 → ( predicted ) target variable is between the values a and b
plbl3, lbl3 → ( predicted ) target variable is between the values b and c
plbl4, lbl4 → ( predicted ) target variable is greater than the value c
Data Point | Feature 1 | Feature 2 | Feature 3 | TM( ) Used | Predicted Target Variable | Predicted Label | Target Variable | Label | Error
dp1 | f1-1 | f2-1 | f3-1 | TM2() | ptv2-1 | plbl2 | tv1 | lbl1 | err2-1
dp2 | f1-2 | f2-2 | f3-2 | TM2() | ptv2-2 | plbl2 | tv2 | lbl2 | err2-2
dp3 | f1-3 | f2-3 | f3-3 | TM2() | ptv2-3 | plbl3 | tv3 | lbl3 | err2-3
dp4 | f1-4 | f2-4 | f3-4 | TM2() | ptv2-4 | plbl3 | tv4 | lbl2 | err2-4
dp5 | f1-5 | f2-5 | f3-5 | TM2() | ptv2-5 | plbl4 | tv5 | lbl4 | err2-5
dp6 | f1-6 | f2-6 | f3-6 | TM2() | ptv2-6 | plbl1 | tv6 | lbl3 | err2-6
etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc.

The learning model takes the new error numbers and plugs them into the same fancy-schmancy statistical equations again in order to update its trained model once more.  Let’s call this one, TM3( ).  Plugging in the same data points produces a new set of predicted target values and predicted labels:

TM3( f1-1, f2-1, f3-1, … ) = ptv3-1, plbl2
TM3( f1-2, f2-2, f3-2, … ) = ptv3-2, plbl2
TM3( f1-3, f2-3, f3-3, … ) = ptv3-3, plbl3
TM3( f1-4, f2-4, f3-4, … ) = ptv3-4, plbl3
TM3( f1-5, f2-5, f3-5, … ) = ptv3-5, plbl4
TM3( f1-6, f2-6, f3-6, … ) = ptv3-6, plbl1
etc.

The learning model keeps repeating this process over and over, until:

  1. the final predicted target values are pretty close to the actual target values
  2. the final predicted labels match the actual labels
  3. the final error values are minimal
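The repeat-until-good-enough loop can be sketched in a few lines of Python. To keep it small, this sketch assumes the simplest possible trained model, a straight line TM( f1 ) = w * f1 + b, and uses plain gradient descent on the mean squared error as its "fancy-schmancy statistical equations". That's one common choice among many, not the only way to update a model, and the data is made up.

```python
def train(features, targets, learning_rate=0.05, tolerance=1e-6, max_rounds=10_000):
    """Repeatedly adjust TM(f1) = w * f1 + b until the mean squared error is small."""
    w, b = 0.0, 0.0  # TM1: an uninformed first guess
    n = len(features)
    mse = float("inf")
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        # Plug every data point into the current trained model
        predictions = [w * f + b for f in features]
        errors = [p - t for p, t in zip(predictions, targets)]
        mse = sum(e * e for e in errors) / n
        if mse < tolerance:  # predictions are close enough: stop
            break
        # Gradient of the mean squared error with respect to w and b
        grad_w = 2 * sum(e * f for e, f in zip(errors, features)) / n
        grad_b = 2 * sum(errors) / n
        # Nudge both parameters downhill: TMk( ) becomes TMk+1( )
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, rounds, mse


# Hypothetical data generated from tv = 2 * f1 + 1
w, b, rounds, mse = train([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(f"w={w:.3f}, b={b:.3f} after {rounds} rounds (MSE={mse:.2e})")
```

Each pass through the loop corresponds to one TMk( ) in the tables: predict, measure the errors, adjust. The learning_rate controls how big each adjustment is; too large and the updates overshoot, too small and training crawls.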

It ends up with the following:

TM( ) = TMfinal( )

plbl1, lbl1 → ( predicted ) target variable is less than the value a
plbl2, lbl2 → ( predicted ) target variable is between the values a and b
plbl3, lbl3 → ( predicted ) target variable is between the values b and c
plbl4, lbl4 → ( predicted ) target variable is greater than the value c
Data Point | Feature 1 | Feature 2 | Feature 3 | TM( ) Used | Predicted Target Variable | Predicted Label | Target Variable | Label | Error
dp1 | f1-1 | f2-1 | f3-1 | TMfinal() | ptvfinal-1 | plbl1 | tv1 | lbl1 | errfinal-1
dp2 | f1-2 | f2-2 | f3-2 | TMfinal() | ptvfinal-2 | plbl2 | tv2 | lbl2 | errfinal-2
dp3 | f1-3 | f2-3 | f3-3 | TMfinal() | ptvfinal-3 | plbl3 | tv3 | lbl3 | errfinal-3
dp4 | f1-4 | f2-4 | f3-4 | TMfinal() | ptvfinal-4 | plbl2 | tv4 | lbl2 | errfinal-4
dp5 | f1-5 | f2-5 | f3-5 | TMfinal() | ptvfinal-5 | plbl4 | tv5 | lbl4 | errfinal-5
dp6 | f1-6 | f2-6 | f3-6 | TMfinal() | ptvfinal-6 | plbl3 | tv6 | lbl3 | errfinal-6
etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc.

The learning model has settled on TMfinal(  ).

And that’s pretty much supervised learning in a nutshell!  Eazy-peezy, right?

I've Got Tone!

Okay, I’m a machine learning expert, but even I find what I’ve written to be complete gobbledygook!

So, let’s instead look at this from a different angle.  What do we normally call supervised learning in our everyday conversations?  You get a ginormous hint from the fact that a target variable does the supervising:

Yep, thaat’s riigghhtt!  Supervised learning is just a really, really awkward way of saying target practice!  Here’s an example.

You’re hanging out with your friends at a local bar.  They introduce you to the game of darts.  Since you’ve never played before, you do poorly.  But your friends are very supportive, and you find yourself having a lot of fun.

You decide to get better at this game, so you purchase some darts and a dartboard to practice at home.

(image by Nina Garman from Pixabay)

Once you’ve set everything up, you’re ready to begin!  Each data point represents a single practice throw.

Practice Throw
pt1
pt2
pt3
pt4
pt5
pt6
etc.

For each throw, the dart has to land somewhere.  The target variable represents the landing spot, and each throw gets its own target value.

Practice Throw | Where Dart Lands (Target Variable)
pt1 | ?
pt2 | ?
pt3 | ?
pt4 | ?
pt5 | ?
pt6 | ?
etc. | etc.

Let’s define this landing spot as the distance from the center of the dartboard.  So, if the dart hits dead on center, the target value is zero.  If the dart lands three inches from the center, then the target value equals three, and so on.

In addition, each throw has a label.  Because you’re a beginner, you decide to keep things simple and use only two labels — BULLSEYE and NOT_BULLSEYE:

  • BULLSEYE is defined as the dart hitting any spot within ¾ inch of the center
  • NOT_BULLSEYE is for any location beyond ¾ inch.

These labels are colored as well. The dartboard conveniently color-codes the BULLSEYE region in red for us:

We’ll color the NOT_BULLSEYE area in blue.

So, we have:

BULLSEYE → target variable is less than or equal to ¾ inch
NOT_BULLSEYE → target variable is greater than ¾ inch
Practice Throw | Where Dart Lands (Target Variable) | Label
pt1 | ? | ?
pt2 | ? | ?
pt3 | ? | ?
pt4 | ? | ?
pt5 | ? | ?
pt6 | ? | ?
etc. | etc. | etc.

Next, we need to look at all the hundreds of different factors that affect whether you hit the BULLSEYE or not, including:

  • the horizontal angle of your shoulder
  • the vertical angle of your shoulder
  • the angle of your elbow
  • the angle of your wrist
  • how you hold the dart
  • where you position each of your fingers
  • the shape of the dart
  • the weight of the dart
  • how hard you throw the dart
  • etc.
BULLSEYE → target variable is less than or equal to ¾ inch
NOT_BULLSEYE → target variable is greater than ¾ inch
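Practice Throw | Horizontal Shoulder Angle | Vertical Shoulder Angle | Elbow Angle | Wrist Angle | How You Hold the Dart | Where Your Fingers Are Positioned | Dart Shape | Dart Weight | How Hard You Throw the Dart | Where Dart Lands (Target Variable) | Label
pt1 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ?
pt2 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ?
pt3 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ?
pt4 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ?
pt5 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ?
pt6 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ?
etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc.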

Oh, my, that’s a lot of information to deal with!  But you know what?  Your amazing brain can effortlessly handle it all!

Unfortunately, because I’m explaining this with a pen-n-paper system ( more accurately, keyboard-n-screen ), I have to keep things simple and limit everything to just two factors:  the horizontal angle of your shoulder and the vertical angle of your shoulder.  We’ll assume that you’re able to keep all the other factors the same every single time, and that they don’t affect your throws in any way.

There’s a reason why you never hear of pen-n-paper learning systems.  The processing power of a pen and a piece of paper is so limited that it can only handle easy toy problems.

BULLSEYE → target variable is less than or equal to ¾ inch
NOT_BULLSEYE → target variable is greater than ¾ inch
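Practice Throw | Horizontal Shoulder Angle (°) | Vertical Shoulder Angle (°) | Where Dart Lands (Target Variable, inches) | Label
pt1 | ? | ? | ? | ?
pt2 | ? | ? | ? | ?
pt3 | ? | ? | ? | ?
pt4 | ? | ? | ? | ?
pt5 | ? | ? | ? | ?
pt6 | ? | ? | ? | ?
etc. | etc. | etc. | etc. | etc.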

Let’s define:

  • A 0° on both the horizontal and the vertical is your arm extending out directly in front of you.
  • A 90° on the vertical is you raising your arm directly over your head.
  • A -90° on the vertical is your arm by your side.
  • A 90° along the horizontal is your arm extended out parallel to your shoulders.
  • A -90° along the horizontal is your arm extended across your chest.  ( I know, I know, this is an impossible angle for your shoulder — but bear with me and pretend you’re as flexible as Gumby. )

Here are five spittin’ images of you holding your arm in those positions:

If you happen to be right-handed, then just hold your computer screen up to a mirror.

Okay, this particular supervised learning problem essentially boils down to the task where you, the learning model, need to find a trained model that best describes the relationship between your two shoulder angles and where the dart will land.  Figuring out this relationship is what you’re doing when you develop your dart throwing skill.

Let’s represent your skill with the function TM(  ).  Its parameters are your two shoulder angles, and its output is the dart’s landing spot:  TM( horzn_angle, vert_angle ) = where_dart_lands

Mad Dart Throwing Skill, Yo! → TM( ) = ?

BULLSEYE → target variable is less than or equal to ¾ inch
NOT_BULLSEYE → target variable is greater than ¾ inch
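Practice Throw | Horizontal Shoulder Angle (°) | Vertical Shoulder Angle (°) | Where Dart Lands (Target Variable, inches) | Label
pt1 | ? | ? | ? | ?
pt2 | ? | ? | ? | ?
pt3 | ? | ? | ? | ?
pt4 | ? | ? | ? | ?
pt5 | ? | ? | ? | ?
pt6 | ? | ? | ? | ?
etc. | etc. | etc. | etc. | etc.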

You begin with a random equation, TM1(  ).  And it truly is random, since you have no prior knowledge as to what TM(  ) might look like.

Let’s say TM1(  ) assumes that every single one of your throws will hit the center of the dartboard no matter what.  So, the predicted target value of TM1(  ) will always equal zero, and the predicted label will always be BULLSEYE:

TM1( horzn_angle, vert_angle ) = 0, BULLSEYE
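Here's TM1( ) written as a small Python sketch. The names BULLSEYE_RADIUS, label_for, and tm1 are made up for this illustration; the ¾-inch threshold and the always-zero prediction come straight from the definitions above.

```python
BULLSEYE_RADIUS = 0.75  # inches from the center, per the label definitions


def label_for(distance: float) -> str:
    """BULLSEYE if the distance from the center is at most 3/4 inch."""
    return "BULLSEYE" if distance <= BULLSEYE_RADIUS else "NOT_BULLSEYE"


def tm1(horzn_angle: float, vert_angle: float) -> tuple[float, str]:
    """TM1 ignores both shoulder angles and always predicts a dead-center hit."""
    predicted_distance = 0.0
    return predicted_distance, label_for(predicted_distance)


print(tm1(0, 90))   # (0.0, 'BULLSEYE')
print(tm1(-90, 0))  # (0.0, 'BULLSEYE')
```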

You step up to the oche and ready your first throw.  You set your horizontal shoulder angle to 0° and rotate your vertical shoulder angle to 90°.  You let the dart fly, and … it hits the ceiling!  It’s about eight feet ( 96 inches ) from the center of the dartboard.  Definitely NOT_BULLSEYE.

Mad Dart Throwing Skill, Yo! → TM( ) = ?
TM1( horzn_angle, vert_angle ) = 0, BULLSEYE always


BULLSEYE → ( predicted ) target variable is less than or equal to ¾ inch
NOT_BULLSEYE → ( predicted ) target variable is greater than ¾ inch
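Practice Throw | Horizontal Shoulder Angle (°) | Vertical Shoulder Angle (°) | TM( ) Used | Predicted Landing (Predicted Target Variable, inches) | Predicted Label | Where Dart Lands (Target Variable, inches) | Label
pt1 | 0 | 90 | TM1() | 0 | BULLSEYE | 96 | NOT_BULLSEYE
pt2 | ? | ? | ? | ? | ? | ? | ?
pt3 | ? | ? | ? | ? | ? | ? | ?
pt4 | ? | ? | ? | ? | ? | ? | ?
pt5 | ? | ? | ? | ? | ? | ? | ?
pt6 | ? | ? | ? | ? | ? | ? | ?
etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc.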

You then rotate your arm down to -90° on the vertical, placing it by your side.  You release the dart, and … it hits the floor — narrowly missing your pinky toe!  Again, it’s about eight feet ( 96 inches ) from the center.  Still NOT_BULLSEYE.

You next set your arm vertically at 0°, but rotate it out horizontally to 90°.  You throw, and … the dart hits the side wall.  Also about eight feet ( 96 inches ) away, and yet again NOT_BULLSEYE.

You still believe in TM1(  ), so you give it one more try.  You rotate your arm horizontally across your chest to -90°, let the dart go, and … it hits the other side wall, about eight feet ( 96 inches ) away from the center.  NOT_BULLSEYE once more.

This is what you have so far:

Mad Dart Throwing Skill, Yo! → TM( ) = ?
TM1( horzn_angle, vert_angle ) = 0, BULLSEYE always


BULLSEYE → ( predicted ) target variable is less than or equal to ¾ inch
NOT_BULLSEYE → ( predicted ) target variable is greater than ¾ inch
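Practice Throw | Horizontal Shoulder Angle (°) | Vertical Shoulder Angle (°) | TM( ) Used | Predicted Landing (Predicted Target Variable, inches) | Predicted Label | Where Dart Lands (Target Variable, inches) | Label
pt1 | 0 | 90 | TM1() | 0 | BULLSEYE | 96 | NOT_BULLSEYE
pt2 | 0 | -90 | TM1() | 0 | BULLSEYE | 96 | NOT_BULLSEYE
pt3 | 90 | 0 | TM1() | 0 | BULLSEYE | 96 | NOT_BULLSEYE
pt4 | -90 | 0 | TM1() | 0 | BULLSEYE | 96 | NOT_BULLSEYE
pt5 | ? | ? | ? | ? | ? | ? | ?
pt6 | ? | ? | ? | ? | ? | ? | ?
etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc.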

You’re forced to admit that TM1(  ) is not accurate at all.  You analyze how far off the mark you’ve been by calculating the error.  For this problem, the error is simply how far your predicted distance from the center was from the actual distance.  In mathspeak, it’s the absolute value of the predicted target variable minus the actual target variable: | ptv - tv |.
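Here's the same error calculation for the first four throws as a minimal Python sketch. The function names are hypothetical; the angles and the 96-inch landings come from the throws described above.

```python
def tm1(horzn_angle: float, vert_angle: float) -> float:
    """TM1's predicted landing distance: always dead center."""
    return 0.0


def error(predicted: float, actual: float) -> float:
    """| ptv - tv |: gap between the predicted and actual distances from the center."""
    return abs(predicted - actual)


# Practice throws pt1-pt4: (horizontal angle, vertical angle, actual distance in inches)
throws = [(0, 90, 96), (0, -90, 96), (90, 0, 96), (-90, 0, 96)]
errors = [error(tm1(h, v), actual) for h, v, actual in throws]
print(errors)  # [96.0, 96.0, 96.0, 96.0]
```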

Mad Dart Throwing Skill, Yo! → TM( ) = ?
TM1( horzn_angle, vert_angle ) = 0, BULLSEYE always


BULLSEYE → ( predicted ) target variable is less than or equal to ¾ inch
NOT_BULLSEYE → ( predicted ) target variable is greater than ¾ inch
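Practice Throw | Horizontal Shoulder Angle (°) | Vertical Shoulder Angle (°) | TM( ) Used | Predicted Landing (Predicted Target Variable, inches) | Predicted Label | Where Dart Lands (Target Variable, inches) | Label | Error (inches)
pt1 | 0 | 90 | TM1() | 0 | BULLSEYE | 96 | NOT_BULLSEYE | 96
pt2 | 0 | -90 | TM1() | 0 | BULLSEYE | 96 | NOT_BULLSEYE | 96
pt3 | 90 | 0 | TM1() | 0 | BULLSEYE | 96 | NOT_BULLSEYE | 96
pt4 | -90 | 0 | TM1() | 0 | BULLSEYE | 96 | NOT_BULLSEYE | 96
pt5 | ? | ? | ? | ? | ? | ? | ? | ?
pt6 | ? | ? | ? | ? | ? | ? | ? | ?
etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc.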

At this point, the machine learning algorithm in your head adjusts TM1( ) by feeding the errors into a couple of fancy-schmancy statistical equations that came pre-natally installed in your brain.  Let’s call your updated skill, TM2( ), and let’s say it assumes that you’ll hit dead on center when the magnitudes of both of your shoulder angles are less than 45°, but that you’ll miss by two inches when the magnitude of either angle is 45° or greater.
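TM2( ) can be sketched the same way. This Python version is illustrative only: it follows the rule just described, treats an angle of exactly 45° as a miss (matching the "≥ 45°" wording in the TM2( ) summary), and recomputes the errors for practice throws five through twelve. The names are hypothetical.

```python
BULLSEYE_RADIUS = 0.75  # inches


def tm2(horzn_angle: float, vert_angle: float) -> tuple[float, str]:
    """TM2: dead center if both angles are within 45 degrees, else a 2-inch miss."""
    if abs(horzn_angle) < 45 and abs(vert_angle) < 45:
        predicted = 0.0
    else:
        predicted = 2.0  # includes exactly 45 degrees
    label = "BULLSEYE" if predicted <= BULLSEYE_RADIUS else "NOT_BULLSEYE"
    return predicted, label


# Practice throws pt5-pt12: (horizontal angle, vertical angle, actual distance in inches)
throws = [(0, 60, 37), (0, -60, 37), (60, 0, 37), (-60, 0, 37),
          (0, 30, 9), (0, -30, 9), (30, 0, 9), (-30, 0, 9)]
errors = [abs(tm2(h, v)[0] - actual) for h, v, actual in throws]
print(errors)  # [35.0, 35.0, 35.0, 35.0, 9.0, 9.0, 9.0, 9.0]
```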

Here are the results of your next few throws:

Mad Dart Throwing Skill, Yo! → TM( ) = ?
TM2( horzn_angle, vert_angle ) = 0, BULLSEYE when both |horzn_angle| and |vert_angle| < 45°,
                                 2, NOT_BULLSEYE when either |horzn_angle| or |vert_angle| ≥ 45°


BULLSEYE → ( predicted ) target variable is less than or equal to ¾ inch
NOT_BULLSEYE → ( predicted ) target variable is greater than ¾ inch
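Practice Throw | Horizontal Shoulder Angle (°) | Vertical Shoulder Angle (°) | TM( ) Used | Predicted Landing (Predicted Target Variable, inches) | Predicted Label | Where Dart Lands (Target Variable, inches) | Label | Error (inches)
pt1 | 0 | 90 | TM1() | 0 | BULLSEYE | 96 | NOT_BULLSEYE | 96
pt2 | 0 | -90 | TM1() | 0 | BULLSEYE | 96 | NOT_BULLSEYE | 96
pt3 | 90 | 0 | TM1() | 0 | BULLSEYE | 96 | NOT_BULLSEYE | 96
pt4 | -90 | 0 | TM1() | 0 | BULLSEYE | 96 | NOT_BULLSEYE | 96
pt5 | 0 | 60 | TM2() | 2 | NOT_BULLSEYE | 37 | NOT_BULLSEYE | 35
pt6 | 0 | -60 | TM2() | 2 | NOT_BULLSEYE | 37 | NOT_BULLSEYE | 35
pt7 | 60 | 0 | TM2() | 2 | NOT_BULLSEYE | 37 | NOT_BULLSEYE | 35
pt8 | -60 | 0 | TM2() | 2 | NOT_BULLSEYE | 37 | NOT_BULLSEYE | 35
pt9 | 0 | 30 | TM2() | 0 | BULLSEYE | 9 | NOT_BULLSEYE | 9
pt10 | 0 | -30 | TM2() | 0 | BULLSEYE | 9 | NOT_BULLSEYE | 9
pt11 | 30 | 0 | TM2() | 0 | BULLSEYE | 9 | NOT_BULLSEYE | 9
pt12 | -30 | 0 | TM2() | 0 | BULLSEYE | 9 | NOT_BULLSEYE | 9
pt13 | ? | ? | ? | ? | ? | ? | ? | ?
pt14 | ? | ? | ? | ? | ? | ? | ? | ?
etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc.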

The errors are smaller, but there’s still a lot of room for improvement.  The good news, though, is that for practice throws five through eight, you did not expect to hit the bullseye, which matched what actually happened.  So, a tiny bit of progress!

You repeat the process over and over, improving your trained model every few throws, until by the 100th throw you’re using TM14(  ), the 14th version of your skill, and you get the following:

Mad Dart Throwing Skill, Yo! → TM( ) = TM14( )

BULLSEYE → ( predicted ) target variable is less than or equal to ¾ inch
NOT_BULLSEYE → ( predicted ) target variable is greater than ¾ inch
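Practice Throw | Horizontal Shoulder Angle (°) | Vertical Shoulder Angle (°) | TM( ) Used | Predicted Landing (Predicted Target Variable, inches) | Predicted Label | Where Dart Lands (Target Variable, inches) | Label | Error (inches)
pt1 | 0 | 90 | TM1() | 0 | BULLSEYE | 96 | NOT_BULLSEYE | 96
pt2 | 0 | -90 | TM1() | 0 | BULLSEYE | 96 | NOT_BULLSEYE | 96
pt3 | 90 | 0 | TM1() | 0 | BULLSEYE | 96 | NOT_BULLSEYE | 96
pt4 | -90 | 0 | TM1() | 0 | BULLSEYE | 96 | NOT_BULLSEYE | 96
pt5 | 0 | 60 | TM2() | 2 | NOT_BULLSEYE | 37 | NOT_BULLSEYE | 35
pt6 | 0 | -60 | TM2() | 2 | NOT_BULLSEYE | 37 | NOT_BULLSEYE | 35
pt7 | 60 | 0 | TM2() | 2 | NOT_BULLSEYE | 37 | NOT_BULLSEYE | 35
pt8 | -60 | 0 | TM2() | 2 | NOT_BULLSEYE | 37 | NOT_BULLSEYE | 35
pt9 | 0 | 30 | TM2() | 0 | BULLSEYE | 9 | NOT_BULLSEYE | 9
pt10 | 0 | -30 | TM2() | 0 | BULLSEYE | 9 | NOT_BULLSEYE | 9
pt11 | 30 | 0 | TM2() | 0 | BULLSEYE | 9 | NOT_BULLSEYE | 9
pt12 | -30 | 0 | TM2() | 0 | BULLSEYE | 9 | NOT_BULLSEYE | 9
pt27 | 14 | 3 | TM6() | 0.2 | BULLSEYE | 7 | NOT_BULLSEYE | 6.8
pt28 | 12 | 25 | TM6() | 5 | NOT_BULLSEYE | 13 | NOT_BULLSEYE | 8
pt29 | -20 | 14 | TM6() | 0.2 | BULLSEYE | 5 | NOT_BULLSEYE | 4.8
pt30 | -16 | -2 | TM6() | 0.1 | BULLSEYE | 6 | NOT_BULLSEYE | 5.9
pt58 | -2 | 3 | TM10() | 0.3 | BULLSEYE | 1 | NOT_BULLSEYE | 0.7
pt59 | 8 | -4 | TM10() | 0.5 | BULLSEYE | 0.15 | BULLSEYE | 0.35
pt60 | 1 | 2 | TM10() | 0.1 | BULLSEYE | 0.2 | BULLSEYE | 0.1
pt61 | 0 | -3 | TM10() | 3 | NOT_BULLSEYE | 4 | NOT_BULLSEYE | 1
pt96 | 1 | -1 | TM14() | 0.1 | BULLSEYE | 0.2 | BULLSEYE | 0.1
pt97 | -2 | 1 | TM14() | 0.1 | BULLSEYE | 0.7 | BULLSEYE | 0.6
pt98 | 4 | -5 | TM14() | 0.2 | BULLSEYE | 0.15 | BULLSEYE | 0.05
pt99 | 8 | 7 | TM14() | 0.8 | NOT_BULLSEYE | 0.6 | BULLSEYE | 0.2
pt100 | -6 | 2 | TM14() | 0.2 | BULLSEYE | 0.1 | BULLSEYE | 0.1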

It’s starting to look pretty good.  You’ve managed to hit the bullseye a number of times, and for the times you didn’t, you expected to miss.  You decide that TM14(  ) is accurate enough.  You could continue, but your arm is starting to feel like spaghetti.

Your brain stores a representation of TM14(  ) in a biochemical format, in a way that lets it quickly and easily plug in new shoulder angles to accurately predict where the dart will land.  Unfortunately, because we’re once again using a pen-n-paper system, we need to write out TM14(  ) in a much less efficient form: as a mathematical equation.  Let’s get a rough idea of what the equation looks like by plotting the data and eyeballing it on a graph.

3D-graph generated at Geogebra.org

Seems like your throws are tracing out a paraboloid pattern.

3D-graph generated at Geogebra.org
Paraboloids are represented by equations of the form ax² + by² + c = z, where a, b, and c are constants.  If we plug in our feature variables, TM14(  ) probably looks something like:
a * ( horzn_angle )² + b * ( vert_angle )² + c = where_dart_lands

You would use some fancy math to determine what a, b, and c are.
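One common way to do that fancy math is ordinary least squares, which picks the a, b, and c that make the squared errors as small as possible. Because the equation is linear in a, b, and c, NumPy's numpy.linalg.lstsq can solve for them directly. The sketch below uses a subset of the practice throws from the table (pt1 through pt12 and pt96 through pt100); with this few hand-made throws, treat the fitted numbers as illustrative, not as your actual skill.

```python
import numpy as np

# (horizontal angle in degrees, vertical angle in degrees, where the dart landed in inches)
throws = np.array([
    [0, 90, 96], [0, -90, 96], [90, 0, 96], [-90, 0, 96],
    [0, 60, 37], [0, -60, 37], [60, 0, 37], [-60, 0, 37],
    [0, 30, 9], [0, -30, 9], [30, 0, 9], [-30, 0, 9],
    [1, -1, 0.2], [-2, 1, 0.7], [4, -5, 0.15], [8, 7, 0.6], [-6, 2, 0.1],
], dtype=float)
h, v, where_dart_lands = throws[:, 0], throws[:, 1], throws[:, 2]

# The model is linear in a, b and c, so each throw contributes one row [h², v², 1]
design = np.column_stack([h**2, v**2, np.ones_like(h)])
(a, b, c), *_ = np.linalg.lstsq(design, where_dart_lands, rcond=None)


def tm14(horzn_angle, vert_angle):
    """Fitted paraboloid: a * horzn_angle² + b * vert_angle² + c."""
    return a * horzn_angle**2 + b * vert_angle**2 + c


print(f"a={a:.4f}, b={b:.4f}, c={c:.2f}")
```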

Basically, the equation is saying that the wider the angles your shoulder takes, the farther the dart will land from the center.  Which makes sense.

We can even use the graph to predict labels.  We’ll represent the label definitions like this:

3D-graph generated at Geogebra.org

A flat plane ( colored brown in the above image ) slices through the paraboloid where the target variable equals 0.75 inches.  Any point below this plane is labeled BULLSEYE, and any point above it is labeled NOT_BULLSEYE.
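The plane-slicing idea translates directly to code. In this Python sketch the coefficients A, B, and C are hypothetical values picked for illustration, not fitted from the data; a throw is predicted BULLSEYE when the paraboloid sits on or below the 0.75-inch plane at those shoulder angles.

```python
import math

BULLSEYE_RADIUS = 0.75  # inches: height of the slicing plane

# Hypothetical paraboloid coefficients, for illustration only
A, B, C = 0.012, 0.012, 0.0


def predicted_label(horzn_angle: float, vert_angle: float) -> str:
    """BULLSEYE if the paraboloid is on or below the 0.75-inch plane at these angles."""
    where_dart_lands = A * horzn_angle**2 + B * vert_angle**2 + C
    return "BULLSEYE" if where_dart_lands <= BULLSEYE_RADIUS else "NOT_BULLSEYE"


# With the vertical angle at 0°, the largest horizontal angle still predicted as BULLSEYE
max_horzn_angle = math.sqrt((BULLSEYE_RADIUS - C) / A)
print(round(max_horzn_angle, 2))                      # 7.91
print(predicted_label(5, 5), predicted_label(10, 0))  # BULLSEYE NOT_BULLSEYE
```

Because a horizontal slice through this paraboloid is an ellipse (a circle here, since A equals B), the BULLSEYE region in angle space is a small oval around 0° on both angles.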

This tells us that if we want to hit the BULLSEYE, we need to keep our shoulder angles as close to 0° as possible in order to maximize our chances.  Which also makes sense.

For comparison, this is what TM1(  ) looks like on a graph:

3D-graph generated at Geogebra.org

It’s also a flat plane, but it lies in the xy-plane, where the target variable always equals zero.

Here’s TM2(  ):

3D-graph generated at Geogebra.org

TM3( ) maybe took this form:

3D-graph generated at Geogebra.org

TM4( ) may have looked like this:

3D-graph generated at Geogebra.org

As you can see, when you update your trained model, you are essentially molding it like clay until it wraps as tightly around the data points as possible.

And that’s pretty much supervised learning in a nutshell!  Eazy-peezy, right?

The next time you meet your friends, you decide to impress them with your newly minted trained model.  And boy, are they impressed!  You manage to hit the bullseye in eight of ten throws.  They especially like how you exponentiate your shoulder angles — it really accentuates your eyes.

Features To Right Of You,
Features To Left Of You,
Features In Front Of You!

One final note: in the real world, machine learning systems have to deal with hundreds, thousands, even millions of features.  Many may seem at first glance to affect the target variable, but on closer inspection turn out to have no effect at all.  For example, wind could potentially affect your dart throws, but since you’re practicing indoors, it probably doesn’t.  While your brain is very good at weeding out irrelevant details, machine learning algorithms need a little help.

Fortunately, there are a whole bunch of tools, like feature selection, principal component analysis ( PCA ), and linear discriminant analysis ( LDA ), that can help separate the truly impactful features from the useless ones, or condense many features into a handful of informative ones.
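For a taste of how feature selection can work, here's a minimal Python sketch of one simple filter-style approach: keep a feature only if its correlation with the target is strong enough. The wind_speed column, the 0.3 threshold, and the function names are assumptions for illustration; the angles and landing distances are throws pt1, pt2, pt5, pt6, pt9, and pt10 from the dart example. PCA and LDA work differently and aren't shown here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column can't track changes in the target
    return cov / (spread_x * spread_y)


def select_features(columns, target, threshold=0.3):
    """Keep the features whose |correlation| with the target reaches the threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]


vert_angles = [90, -90, 60, -60, 30, -30]   # pt1, pt2, pt5, pt6, pt9, pt10
where_dart_lands = [96, 96, 37, 37, 9, 9]   # inches from the center
columns = {
    "vert_angle_squared": [v * v for v in vert_angles],
    "wind_speed": [0] * 6,  # hypothetical: always calm when practicing indoors
}
print(select_features(columns, where_dart_lands))  # ['vert_angle_squared']
```

One caveat worth knowing: correlation only picks up straight-line relationships. Fed the raw vertical angle, this filter computes a correlation of zero, because aiming too high and aiming too low miss by the same amount, even though the angle clearly matters. Squaring the angle first, which matches the paraboloid shape, is what lets the filter see it.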

Congratulations!  Now that you understand supervised learning, you graduate from the Cool Kidz Academy School for Cool Kidz!  As a graduation gift, here is a pair of shades and some ice cubes to complete your look.

(sunglasses image by Paweł Ludziński, ice cubes image by Bruno / Germany, both from Pixabay)