Unsupervised Learning

(image by harshal07 from Pixabay)

If supervised learning is target practice, then what is unsupervised learning?  Well, these are learning problems where the training data have neither target values nor labels, which is why they’re called unsupervised.

Using the dart throwing example, it’d be like throwing darts without a dartboard.  Wait, what?  If you have nothing to aim at, then what’s the point?  Exactly!  There is no point!  Unsupervised learning is completely useless.

Just like art.

Ha, ha, just kidding!  It’s a joke, it’s a joke!  For heaven’s sake, all you art majors put your pitchforks down!  Jeez-louize!  What a sensitive crowd!

School's Out For The Summer!

Because the training data have no target values, it is up to you as the learning model to provide the values.  And if the situation warrants, the labels as well.  Whereas supervised learning allows for only one particular set of answers, unsupervised learning pretty much accepts any answer you give.

Mad Dart Throwing Skill, Yo! → TM( ) = ?

BULLSEYE → target variable is ?
NOT_BULLSEYE → target variable is ?
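Practice Throw | Horizontal Shoulder Angle | Vertical Shoulder Angle | TM( ) Used | Predicted Landing (Predicted Target Variable) | Predicted Label | Where Dart Lands (Target Variable) | Label | Error
pt1 | 14 | 3 | TM1() | n/a | n/a | ? | ? | n/a
pt2 | 12 | 25 | TM1() | n/a | n/a | ? | ? | n/a
pt3 | -20 | 14 | TM1() | n/a | n/a | ? | ? | n/a
pt4 | -16 | -2 | TM1() | n/a | n/a | ? | ? | n/a
pt5 | -2 | 3 | TM1() | n/a | n/a | ? | ? | n/a
pt6 | 9 | -30 | TM1() | n/a | n/a | ? | ? | n/a
pt7 | -17 | 2 | TM1() | n/a | n/a | ? | ? | n/a
pt8 | 1 | 0 | TM1() | n/a | n/a | ? | ? | n/a
pt9 | 1 | -1 | TM1() | n/a | n/a | ? | ? | n/a
pt10 | -2 | 1 | TM1() | n/a | n/a | ? | ? | n/a
pt11 | 4 | -5 | TM1() | n/a | n/a | ? | ? | n/a
pt12 | 8 | 7 | TM1() | n/a | n/a | ? | ? | n/a
etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc.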

Hmm, let’s see here … .  How about we label the darts this way?

Mad Dart Throwing Skill, Yo! → TM( ) = TM1( )

BULLSEYE → target variable is less than or equal to the size of the room
NOT_BULLSEYE → target variable is greater than the size of the room
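Practice Throw | Horizontal Shoulder Angle | Vertical Shoulder Angle | TM( ) Used | Predicted Landing (Predicted Target Variable) | Predicted Label | Where Dart Lands (Target Variable) | Label | Error
pt1 | 14 | 3 | TM1() | n/a | n/a | 7 | BULLSEYE | n/a
pt2 | 12 | 25 | TM1() | n/a | n/a | 3 | BULLSEYE | n/a
pt3 | -20 | 14 | TM1() | n/a | n/a | 12 | BULLSEYE | n/a
pt4 | -16 | -2 | TM1() | n/a | n/a | 5 | BULLSEYE | n/a
pt5 | -2 | 3 | TM1() | n/a | n/a | 8 | BULLSEYE | n/a
pt6 | 9 | -30 | TM1() | n/a | n/a | 11 | BULLSEYE | n/a
pt7 | -17 | 2 | TM1() | n/a | n/a | 1 | BULLSEYE | n/a
pt8 | 1 | 0 | TM1() | n/a | n/a | 0.4 | BULLSEYE | n/a
pt9 | 1 | -1 | TM1() | n/a | n/a | 3 | BULLSEYE | n/a
pt10 | -2 | 1 | TM1() | n/a | n/a | 6 | BULLSEYE | n/a
pt11 | 4 | -5 | TM1() | n/a | n/a | 9 | BULLSEYE | n/a
pt12 | 8 | 7 | TM1() | n/a | n/a | 7 | BULLSEYE | n/a
etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc.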

Wow, you’re a dart throwing champion!  This is literally the definition of a participation trophy!

But, hold on!  Before you jump for joy and start chanting, “Woo-hoo!  No more pencils, no more books, no more teachers’ dirty looks! … ,” you need to understand that this task isn’t as easy as it seems.

You’re puzzled.  “How is this not easy?  If any answer is acceptable, then for each data point I can simply assign an arbitrary target value, slap on a random label, and then call it a day!”

Well, technically, yes.  While there are no right or wrong answers, the point of machine learning is to detect patterns.  Arbitrary values and random labels are not patterns.  Finding patterns means that you need to learn what the relationships among the data points are and then assign target values and labels that explicitly describe those relationships.

You become dejected.  “So, even when there’s nothing to learn, we still have to learn?”

Yes.

It's Complicated

Unfortunately, you will rarely find simple datasets where each data point has only one type of relationship with other data points.  Most data come with a whole host of features, traits, and characteristics.  Data points will almost always have a diverse variety of relationships with one another.  For example, let’s take a look at the demographic data of the people living in your neighborhood.  We can break them down by age, gender, race, and occupation:

Person | Age | Gender | Race | Occupation
person1 | 17 | male | white | high school student
person2 | 26 | female | brown | artist
person3 | 11 | female | black | elementary school student
person4 | 56 | male | yellow | neurosurgeon
person5 | 35 | female | sunshine | salesperson
person6 | 162 | male | gold | youth counselor
person7 | 32 | male | green | farmer
person8 | 87 | female | purple | amethyst collector
person9 | 45 | female | wisteria | botanist
person10 | 28 | it | chrome | cybernetics engineer
etc. | etc. | etc. | etc. | etc.

And from their social networking profiles, we also have data on who’s friends with whom:

Person | Age | Gender | Race | Occupation | Friends
person1 | 17 | male | white | high school student | group1
person2 | 26 | female | brown | artist | group2
person3 | 11 | female | black | elementary school student | group1
person4 | 56 | male | yellow | neurosurgeon | group2
person5 | 35 | female | sunshine | salesperson | group2
person6 | 162 | male | gold | youth counselor | group3
person7 | 32 | male | green | farmer | group4
person8 | 87 | female | purple | amethyst collector | group5
person9 | 45 | female | wisteria | botanist | group3
person10 | 28 | it | chrome | cybernetics engineer | group6
etc. | etc. | etc. | etc. | etc. | etc.

Say you want to build a recommender system that suggests movies to each person. You can use the demographic information to determine what people of similar age, race, gender, and occupation often watch and make recommendations based on that:

Person | Age | Gender | Race | Occupation | Friends | Label
person1 | 17 | male | white | high school student | group1 | Friday Night Lights
person2 | 26 | female | brown | artist | group2 | Sisterhood of the Traveling Pants
person3 | 11 | female | black | elementary school student | group1 | My Little Pony: The Movie
person4 | 56 | male | yellow | neurosurgeon | group2 | The Theory of Everything
person5 | 35 | female | sunshine | salesperson | group2 | Rachel Ray Presents
person6 | 162 | male | gold | youth counselor | group3 | Methusaleh
person7 | 32 | male | green | farmer | group4 | Star Trek: To Andromeda!
person8 | 87 | female | purple | amethyst collector | group5 | Barney
person9 | 45 | female | wisteria | botanist | group3 | Desperate Housewives: More Desperate Than Ever
person10 | 28 | it | chrome | cybernetics engineer | group6 | Pinocchio
etc. | etc. | etc. | etc. | etc. | etc. | etc.

Or, you can make suggestions based on what their friends have watched.

Person | Age | Gender | Race | Occupation | Friends | Label
person1 | 17 | male | white | high school student | group1 | Diary of a Wimpy Kid
person2 | 26 | female | brown | artist | group2 | Shaun of the Dead
person3 | 11 | female | black | elementary school student | group1 | Diary of a Wimpy Kid
person4 | 56 | male | yellow | neurosurgeon | group2 | Shaun of the Dead
person5 | 35 | female | sunshine | salesperson | group2 | Shaun of the Dead
person6 | 162 | male | gold | youth counselor | group3 | Fast Times at Richmond High: The New Class
person7 | 32 | male | green | farmer | group4 | A Christmas Carol
person8 | 87 | female | purple | amethyst collector | group5 | Searching for Bobby Fisher
person9 | 45 | female | wisteria | botanist | group3 | Fast Times at Richmond High: The New Class
person10 | 28 | it | chrome | cybernetics engineer | group6 | Bridges of Madison County
etc. | etc. | etc. | etc. | etc. | etc. | etc.

As you can see, by using different criteria, you end up with a completely different set of recommendations.  The more relationships you unearth, the more labels you can potentially assign.  While unsupervised learning isn’t random, it’s still very subjective.
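
If you’d like to see that in code, here is a minimal Python sketch using six of the people from the tables above.  The `group_by` helper is a made-up name for illustration, not part of any library, and gender simply stands in for “some demographic criterion.”  It shows the same people landing in different groups depending on which relationship you decide matters.

```python
from collections import defaultdict

# A few rows from the tables above: (person, age, gender, friend group).
people = [
    ("person1", 17, "male", "group1"),
    ("person2", 26, "female", "group2"),
    ("person3", 11, "female", "group1"),
    ("person5", 35, "female", "group2"),
    ("person6", 162, "male", "group3"),
    ("person9", 45, "female", "group3"),
]

def group_by(rows, key):
    """Hypothetical helper: bucket person names by whatever `key` says matters."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[0])
    return dict(groups)

# Criterion 1: who is friends with whom.
by_friends = group_by(people, key=lambda row: row[3])

# Criterion 2: one demographic feature (gender, chosen arbitrarily here).
by_gender = group_by(people, key=lambda row: row[2])

print(by_friends)
print(by_gender)
```

Neither grouping is wrong.  They just answer different questions, which is why the two recommender systems disagree.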

Whereas supervised learning can use the target values to filter out unnecessary features, with tools such as supervised feature selection and linear discriminant analysis ( LDA ), unsupervised learning has no targets to tell it which features matter.  Techniques like principal component analysis ( PCA ) can compress the features, but they can’t tell you which ones are relevant to your goal.  What features you emphasize depends ultimately on what you consider important.  The second recommender system, for example, considers friendships more important than identity.

A Beautiful Mind

Here’s a new dataset and its corresponding graph.

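Data Point | Feature 1 | Feature 2
dp1 | 3 | 2
dp2 | 6 | 9
dp3 | 2 | 19
dp4 | 18 | 19
dp5 | 17 | 2
dp6 | 14 | 9
dp7 | 10 | 9
dp8 | 10 | 6
dp9 | 10 | 11
dp10 | 8 | 3
dp11 | 9 | 5
dp12 | 11 | 5
dp13 | 12 | 3
dp14 | 7 | 9
dp15 | 13 | 9
dp16 | 7 | 18
dp17 | 13 | 18
dp18 | 2 | 13
dp19 | 18 | 13
dp20 | 3 | 4
dp21 | 17 | 4
dp22 | 9 | 14
dp23 | 11 | 14
dp24 | 2 | 16
dp25 | 18 | 16
dp26 | 4 | 7
dp27 | 16 | 7
dp28 | 3 | 11
dp29 | 17 | 11
dp30 | 15 | 19
dp31 | 5 | 19
dp32 | 10 | 13
dp33 | 6 | 2
dp34 | 14 | 2
dp35 | 8 | 16
dp36 | 12 | 16
dp37 | 8 | 11
dp38 | 12 | 11
dp39 | 8 | 7
dp40 | 12 | 7
dp41 | 5 | 17
dp42 | 8 | 13
dp43 | 12 | 13
dp44 | 15 | 17
dp45 | 3 | 15
dp46 | 6 | 12
dp47 | 13 | 16
dp48 | 17 | 15
dp49 | 3 | 17
dp50 | 16 | 13
dp51 | 4 | 13
dp52 | 7 | 16
dp53 | 17 | 17
dp54 | 14 | 12
dp55 | 6 | 5
dp56 | 14 | 5
dp57 | 5 | 4
dp58 | 15 | 4
dp59 | 2 | 9
dp60 | 18 | 9
dp61 | 10 | 17
dp62 | 10 | 2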

Do you notice the pattern?

How about if I label the data points this way:

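Data Point | Feature 1 | Feature 2 | Label
dp1 | 3 | 2 | 1
dp2 | 6 | 9 | 1
dp3 | 2 | 19 | 1
dp4 | 18 | 19 | 1
dp5 | 17 | 2 | 1
dp6 | 14 | 9 | 1
dp7 | 10 | 9 | 1
dp8 | 10 | 6 | 1
dp9 | 10 | 11 | 1
dp10 | 8 | 3 | 1
dp11 | 9 | 5 | 1
dp12 | 11 | 5 | 1
dp13 | 12 | 3 | 1
dp14 | 7 | 9 | 1
dp15 | 13 | 9 | 1
dp16 | 7 | 18 | 1
dp17 | 13 | 18 | 1
dp18 | 2 | 13 | 1
dp19 | 18 | 13 | 1
dp20 | 3 | 4 | 1
dp21 | 17 | 4 | 1
dp22 | 9 | 14 | 1
dp23 | 11 | 14 | 1
dp24 | 2 | 16 | 1
dp25 | 18 | 16 | 1
dp26 | 4 | 7 | 1
dp27 | 16 | 7 | 1
dp28 | 3 | 11 | 1
dp29 | 17 | 11 | 1
dp30 | 15 | 19 | 1
dp31 | 5 | 19 | 1
dp32 | 10 | 13 | 1
dp33 | 6 | 2 | 1
dp34 | 14 | 2 | 1
dp35 | 8 | 16 | 1
dp36 | 12 | 16 | 1
dp37 | 8 | 11 | -1
dp38 | 12 | 11 | -1
dp39 | 8 | 7 | -1
dp40 | 12 | 7 | -1
dp41 | 5 | 17 | -1
dp42 | 8 | 13 | -1
dp43 | 12 | 13 | -1
dp44 | 15 | 17 | -1
dp45 | 3 | 15 | -1
dp46 | 6 | 12 | -1
dp47 | 13 | 16 | -1
dp48 | 17 | 15 | -1
dp49 | 3 | 17 | -1
dp50 | 16 | 13 | -1
dp51 | 4 | 13 | -1
dp52 | 7 | 16 | -1
dp53 | 17 | 17 | -1
dp54 | 14 | 12 | -1
dp55 | 6 | 5 | -1
dp56 | 14 | 5 | -1
dp57 | 5 | 4 | -1
dp58 | 15 | 4 | -1
dp59 | 2 | 9 | -1
dp60 | 18 | 9 | -1
dp61 | 10 | 17 | -1
dp62 | 10 | 2 | -1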

Pretty, innit?  I call it the butterfly algorithm.

But, wait!  There’s actually another pattern hidden in the data.  Can you see it?

How about if I label the data points like this:

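Data Point | Feature 1 | Feature 2 | Label
dp1 | 3 | 2 | -1
dp2 | 6 | 9 | -1
dp3 | 2 | 19 | -1
dp4 | 18 | 19 | -1
dp5 | 17 | 2 | -1
dp6 | 14 | 9 | -1
dp7 | 10 | 9 | -1
dp8 | 10 | 6 | 1
dp9 | 10 | 11 | 1
dp10 | 8 | 3 | -1
dp11 | 9 | 5 | -1
dp12 | 11 | 5 | -1
dp13 | 12 | 3 | -1
dp14 | 7 | 9 | 1
dp15 | 13 | 9 | 1
dp16 | 7 | 18 | -1
dp17 | 13 | 18 | -1
dp18 | 2 | 13 | -1
dp19 | 18 | 13 | -1
dp20 | 3 | 4 | -1
dp21 | 17 | 4 | -1
dp22 | 9 | 14 | -1
dp23 | 11 | 14 | -1
dp24 | 2 | 16 | -1
dp25 | 18 | 16 | -1
dp26 | 4 | 7 | -1
dp27 | 16 | 7 | -1
dp28 | 3 | 11 | -1
dp29 | 17 | 11 | -1
dp30 | 15 | 19 | -1
dp31 | 5 | 19 | -1
dp32 | 10 | 13 | -1
dp33 | 6 | 2 | -1
dp34 | 14 | 2 | -1
dp35 | 8 | 16 | -1
dp36 | 12 | 16 | -1
dp37 | 8 | 11 | 1
dp38 | 12 | 11 | 1
dp39 | 8 | 7 | 1
dp40 | 12 | 7 | 1
dp41 | 5 | 17 | 1
dp42 | 8 | 13 | 1
dp43 | 12 | 13 | 1
dp44 | 15 | 17 | 1
dp45 | 3 | 15 | 1
dp46 | 6 | 12 | 1
dp47 | 13 | 16 | 1
dp48 | 17 | 15 | 1
dp49 | 3 | 17 | 1
dp50 | 16 | 13 | 1
dp51 | 4 | 13 | 1
dp52 | 7 | 16 | 1
dp53 | 17 | 17 | 1
dp54 | 14 | 12 | 1
dp55 | 6 | 5 | -1
dp56 | 14 | 5 | -1
dp57 | 5 | 4 | -1
dp58 | 15 | 4 | -1
dp59 | 2 | 9 | -1
dp60 | 18 | 9 | -1
dp61 | 10 | 17 | -1
dp62 | 10 | 2 | -1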

Now do you see it?  I call it the bunny rabbit algorithm.

What do we normally call this process of “connecting the dots” in a way that no one else has before — to reveal previously hidden relationships?  Creativity.  And boy, does unsupervised learning offer plenty of room for that!

With just this one dataset, where we have only two labels and 62 data points, we still get 2^62 = 4,611,686,018,427,387,904 different possible sets of labels!  That’s four quintillion!  Granted, most of them are random labelings.  But even if we’re able to eliminate, say, 99% of these sets, that still leaves about 46,116,860,184,273,879, or 46 quadrillion, possibilities showing some kind of pattern.
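
If you want to check the arithmetic yourself, here is a tiny Python sketch.  Python integers don’t overflow, so the values come out exact.  The variable names are mine.

```python
n_points = 62
n_labels = 2

# Every data point independently gets one of two labels.
total = n_labels ** n_points

# What is left if we throw away 99% of the labelings (rounded down).
remaining = total // 100

print(f"{total:,}")      # 4,611,686,018,427,387,904
print(f"{remaining:,}")  # 46,116,860,184,273,879
```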

We do have to be careful, though, that the patterns we perceive indeed describe real relationships intrinsic to the data themselves, and not extrinsic patterns that exist only from the observer’s perspective.  Like, when we see shapes in cloud formations, stellar constellations, or Rorschach tests.

Or when we engage in conspiracy theories.  And, yes, machine learning systems are susceptible to conspiracy theories, too.  Oh, sure, the machines may scoff at us for thinking there are UFOs in Area 51.  But that’s only because they know the military actually stores them in Area 53.

Birds of a Feather

Sadly, if you read a machine learning textbook or take a machine learning course, you will never see any mention of beautiful butterfly algorithms, nor adorable bunny rabbit algorithms.  That’s because no one has been successful in utilizing them for anything.

YET.  I still hold out hope that we’ll find a use for them someday.

What you will see a lot of, however, are clusters:

Heavens to Betsy!  This looks like a frightful rash!  The sheer ugliness of clusters would offend the sensibilities of artists everywhere!

Well, maybe not Jackson Pollock.

Clustering algorithms group data points based on how similar they are to one another.  The idea is that data points that have many features, traits, and characteristics in common tend to “congregate” near one another when plotted in a feature space.  It subscribes to the idea that “birds of a feather flock together.”  In other words, if it walks like a duck, quacks like a duck, and swims like a duck, then chances are it’s chain-smoking in the bathroom and hanging out with the bad kids.  You’ve repeatedly told the duck to stay away from those kids, but it just won’t listen.

What constitutes similar is up for debate.  Like everything else in unsupervised learning, you have a dizzying array of options to choose from.  And, there are a zillion different clustering algorithms that utilize these similarity measures in different ways.
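
To give a taste of how much “similar” depends on the measure you pick, here is a small Python sketch of three common ones: Euclidean distance, Manhattan distance, and cosine similarity.  The three points are made up for illustration.  Notice that the measures disagree about which point is the nearest neighbor of `p`.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Distance if you can only travel along the axes."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """1.0 means the points lie in the same direction from the origin."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up points: q is far from p but in the same direction; r is close but off at an angle.
p, q, r = (1, 1), (10, 10), (2, 1)

print(euclidean_distance(p, q), euclidean_distance(p, r))  # r is much closer to p
print(manhattan_distance(p, q), manhattan_distance(p, r))  # r is much closer to p
print(cosine_similarity(p, q), cosine_similarity(p, r))    # q is more similar to p
```

By distance, `r` is the better match for `p`.  By cosine similarity, `q` is.  Which one is right depends on what you care about.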

The reason why these algorithms are so popular — despite being ugly AF — is because they’ve proven to be very useful.  Here’s a marketing example.  The graph above represents the heights and weights of various Marvel comic characters:

Data based on the following sources:
» Superhero Database
» Height Scale for Marvel Characters
» 27 Marvel Comics Characters Who've Gained The Hulk's Powers
» Who Are Marvel's Smallest and Tallest Characters?

Cluster 1 → Monstrosity
Cluster 2 → Above Average Build
Cluster 3 → Average Build
Cluster 4 → Petite Build
Character | F1 Weight (in lbs) | F2 Height (in inches) | Label
Incredible Hulk ( Bruce Banner ) | 700 | 89 | Monstrosity
Sasquatch ( Walter Langowski ) | 640 | 94 | Monstrosity
Hemingway | 480 | 94 | Monstrosity
Juggernaut ( Cain Marko ) | 650 | 99 | Monstrosity
Colosus ( Peter Rasputin ) | 510 | 89 | Monstrosity
Man-Thing ( Dr. Theodore Stills ) | 505 | 85 | Monstrosity
Red Hulk ( General Thaddeus Ross ) | 680 | 84 | Monstrosity
Apocalypse ( En Sabah Nur ) | 330 | 84 | Monstrosity
Omega Red ( Arkady Rossovich ) | 425 | 83 | Monstrosity
Hellboy ( Anung Un Rama ) | 395 | 83 | Monstrosity
Abomination ( Emil Blonsky ) | 445 | 80 | Monstrosity
A-Bomb ( Rick Jones ) | 445 | 80 | Monstrosity
Red She-Hulk / Betsy Ross | 480 | 80 | Monstrosity
Thanos | 447 | 79 | Monstrosity
She-Hulk ( Jennifer Walters ) | 360 | 79 | Monstrosity
Doc Samson ( Dr. Leonard Skivorski, Jr. ) | 380 | 78 | Monstrosity
Deathlok ( Luther Manning ) | 395 | 76 | Monstrosity
Totally Awesome Hulk ( Amadeus Cho ) | 435 | 76 | Monstrosity
Thing ( Ben Grimm ) | 540 | 72 | Monstrosity
Yondu Udonta | 210 | 86 | Above Average Build
Cable ( Nathan Summers ) | 330 | 80 | Above Average Build
Thor Odinson | 290 | 78 | Above Average Build
Rhino ( Aleksei Sytsevich ) | 320 | 77 | Above Average Build
Mister Sinister ( Nathaniel Essex ) | 285 | 77 | Above Average Build
Iron Man ( Tony Stark ) | 225 | 77 | Above Average Build
Silver Surfer ( Norrin Radd ) | 240 | 77 | Above Average Build
Hawkeye ( Clint Barton ) | 230 | 75 | Above Average Build
Cyclops ( Scott Summers ) | 195 | 75 | Above Average Build
Venom ( Eddie Brock ) | 260 | 75 | Above Average Build
Captain America ( Steve Rogers ) | 220 | 74 | Above Average Build
Adam Warlock | 240 | 74 | Above Average Build
Dr. Stephen Strange | 180 | 74 | Average Build
Sandman ( William Baker ) | 205 | 73 | Average Build
Mr. Fantastic ( Reed Richards ) | 180 | 73 | Average Build
Black Panther ( T’Challa ) | 210 | 72 | Average Build
Professor X ( Charles Francis Xavier ) | 190 | 72 | Average Build
Archangel ( Warren Kenneth Worthington III ) | 150 | 72 | Average Build
Green Goblin ( Norman Osborn ) | 185 | 71 | Average Build
Spiderman ( Peter Parker ) | 165 | 70 | Average Build
Agent Phillip J. Coulson | 205 | 69 | Average Build
Rogue ( Anna Marie ) | 120 | 69 | Average Build
Iceman ( Bobby Drake ) | 145 | 68 | Average Build
Black Widow ( Natasha Romanoff ) | 130 | 67 | Average Build
Vanisher | 175 | 65 | Petite Build
Jubilee ( Jubilation Lee ) | 115 | 65 | Petite Build
Quill ( Max Jordan ) | 120 | 64 | Petite Build
Wolverine ( Logan ) | 280 | 63 | Petite Build
Firestar ( Angelica Jones ) | 125 | 62 | Petite Build
Franklin Richards ( Powerhouse ) | 100 | 56 | Petite Build
Batwing ( James Santini ) | 110 | 50 | Petite Build
Rocket Raccoon | 55 | 48 | Petite Build
Fusion ( Hubert & Pinky Fusser ) | 160 | 48 | Petite Build
Gargouille ( Lavina LeBlanc ) | 75 | 44 | Petite Build
Puck ( Eugene Judd ) | 225 | 42 | Petite Build
Fader | 45 | 40 | Petite Build

If we group together characters of similar build, we can tailor our ads to particular individuals.  The hope is that the more similar the people in a group are, the more likely it is that they have to deal with similar problems — and hence be interested in the same products / solutions.  For example, the Hulk is always needing new pants.  One can target the following ad to him:

Had another “freakout” and need new pants again?

Why not try our stretchy pants instead?  No matter what bizarre changes your body goes thru, these pants will never rip, tear, nor break apart!  Buy one pair, and it’ll last you a lifetime!

Available in purple, dark purple, royal purple, deep violet, and lavender.

(image by kleefeld from Pixabay)

Bruce Banner, upon seeing this, would exclaim, “Oh my god, yes! It’s like this ad is speaking to me!”  He would immediately place an order for 100 pairs of stretchy pants.  But if we show it to Black Widow, on the other hand, she would probably think, “Why would I need stretchy pants?  I’m, like, always a size 4!”  The ad may not be effective on her, but other Marvel characters in situations similar to Banner’s may find this ad very helpful.

Behold!  Beauty!

Okay, you’ve slogged thru all the many, many options.  You’ve narrowed down the features to just the ones you want.  You’ve researched your data thoroughly and are certain that there are real intrinsic relationships, not just pseudo-patterns.  You’ve investigated all the different clustering algorithms and have settled on the one that seems to best fit your needs.  And now, after all that, you finally end up with a bunch of clusters.  But how do you know whether these clusters have correctly grouped your data set?  To determine that, you would need to use a validation measure.

Unfortunately, as you should expect by now, there are a mind-boggling number of such measures.

Even more unfortunate, these measures don’t actually tell you whether your clusters are correct or not.  They can only tell you how closely your clusters fit a particular set of criteria.  In other words, they can only determine how “pretty” your clusters look.  Validation measures are aesthetics metrics.  Remember, there are no right or wrong answers in unsupervised learning.  Aesthetics measures are about the best we can do.

Each measure defines “beauty” differently.  That there are so many different definitions simply proves that beauty truly is in the eye of the beholder!

Most measures use one or both of these metrics — compactness and separateness.  Compactness measures how varied the data points within each cluster are.  The less varied, the higher the compactness.  Separateness measures how “tangled” your clusters are.  If they’re intertwined together, your clusters have low separateness.  If they’re far apart, they get a high separateness score.
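
Here is one rough way to put numbers on those two ideas in Python.  These particular definitions are my own simplification, not a standard named measure: compactness is reported as the average distance from each point to its cluster’s center (so a lower number means a more compact cluster), and separateness as the smallest distance between any two cluster centers.  The toy clusters are made up.

```python
import math

def centroid(points):
    """The mean of a list of points, one coordinate at a time."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def compactness(cluster):
    """Average distance to the cluster's center. Lower = more compact."""
    center = centroid(cluster)
    return sum(math.dist(p, center) for p in cluster) / len(cluster)

def separateness(clusters):
    """Smallest distance between any two cluster centers. Higher = further apart."""
    centers = [centroid(c) for c in clusters]
    return min(
        math.dist(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )

# The same eight made-up points, grouped two different ways.
tidy = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(10, 10), (10, 11), (11, 10), (11, 11)],
]
tangled = [
    [(0, 0), (10, 10), (1, 1), (11, 11)],
    [(0, 1), (10, 11), (1, 0), (11, 10)],
]

for name, clusters in [("tidy", tidy), ("tangled", tangled)]:
    spread = [round(compactness(c), 2) for c in clusters]
    print(name, "spread per cluster:", spread, "separateness:", round(separateness(clusters), 2))
```

The tidy grouping gets small spreads and a large separateness.  The tangled grouping gets large spreads, and its two centers land on the same spot.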

The Ugly Duckling

Unfortunately, separateness and compactness tend to bias towards clusters that are “clumpy” and ball-shaped.  They consider these types of clusters to be more beautiful than, say, clusters that are long and stringy.  Which means they tend to favor algorithms like k-Means.

K-Means is an iterative algorithm that places each data point into the cluster whose center ( a. k. a. the mean ) is closest to it.  Once all the data points are assigned, the centers are recalculated — because when new data points are added to a cluster, its center necessarily shifts.  The algorithm  then checks every data point again to see if a different cluster is now closer to it.  If so, the data point gets reassigned.  The algorithm repeats this process over and over until no more data points get reassigned.
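
The loop described above can be sketched in a few lines of Python.  This is a bare-bones version for illustration, not a production implementation.  The text doesn’t say how the starting centers get picked, so here they are simply handed in by the caller, which is an assumption on my part.  The sample data are six (weight, height) pairs taken from the Marvel table.

```python
import math

def k_means(points, initial_centers, max_rounds=100):
    """Assign each point to its nearest center, recompute centers, repeat."""
    centers = list(initial_centers)  # copy so the caller's list is untouched
    assignment = [None] * len(points)
    for _ in range(max_rounds):
        # Step 1: put each point in the cluster whose center is closest.
        new_assignment = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Stop once no point has changed clusters.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 2: move each center to the mean of its current members.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its old center
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centers

# (weight in lbs, height in inches) for Hulk, Juggernaut, Red Hulk,
# Rocket Raccoon, Gargouille, and Fader.
characters = [(700, 89), (650, 99), (680, 84), (55, 48), (75, 44), (45, 40)]

# Starting centers chosen by hand (an assumption): the Hulk and Fader.
labels, final_centers = k_means(characters, [(700, 89), (45, 40)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

One caveat: weight spans a much wider range than height in this data, so weight dominates the distance.  In practice you would probably want to rescale the features first.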

This type of algorithm likes to produce clumpy, ball-shaped clusters that would score highly on most validation measures.  Unfortunately, most datasets are not clumpy and ball-shaped.  Here are two examples:

K-Means would cluster like so:

There are a bunch of algorithms, such as DBSCAN, that attempt to organize data similar to the way we humans do.  They’re density-based, which means they subscribe to the idea that data points within clusters are more tightly packed together than data points in the areas between clusters.  So, DBSCAN would cluster the two data sets like so:
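
For the curious, here is a simplified Python sketch of the density-based idea behind DBSCAN.  It is meant to show the logic, not to be fast or complete.  The parameter names `eps` (how close counts as a neighbor) and `min_pts` (how many neighbors make a spot “dense”) follow common usage, and the sample points are made up: one long stringy cluster, one small blob, and one loner.

```python
import math

def dbscan(points, eps, min_pts):
    """Simplified DBSCAN sketch: one label per point, -1 meaning noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Every point (including i itself) within eps of point i.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster += 1  # point i sits in a dense area: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but doesn't expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is dense too: keep growing
    return labels

line = [(x, 0) for x in range(10)]             # long and stringy
blob = [(20, 20), (20, 21), (21, 20), (21, 21)]
loner = [(50, 50)]

labels = dbscan(line + blob + loner, eps=1.5, min_pts=3)
print(labels)  # ten 0s, four 1s, then -1 for the loner
```

The stringy line comes out as a single cluster because each point is chained to the next, which is exactly the kind of shape that a center-based method tends to chop up.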

Unfortunately, it’s difficult to measure how “beautiful” these kinds of clusters are.  Most validation measures would rate them very low — close to falling-out-of-an-ugly-tree-and-hitting-all-the-branches-on-the-way-down low.

To get a better measure, you would need to “roll”  your own.  First, you would create synthetic data that represents what your ideal “perfect 10”  cluster would look like — a.k.a. the ground truth.  Then, you would use an external validation measure, like mutual information, F-measure, etc., that compares your clusters to the ground truth to see how close they come to your standards.
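
As a sketch of what such a comparison might look like, here is mutual information computed by hand in Python.  The labelings are made up, and the function reports plain, unnormalized mutual information in bits.  Real projects often use a normalized or adjusted variant instead.

```python
import math
from collections import Counter

def mutual_information(labels_a, labels_b):
    """Mutual information (in bits) between two labelings of the same points."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) )
        mi += (n_ab / n) * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi

ground_truth = [0, 0, 0, 0, 1, 1, 1, 1]  # your hand-made "perfect 10"
good_match   = [1, 1, 1, 1, 0, 0, 0, 0]  # same grouping, different cluster names
no_match     = [0, 0, 1, 1, 0, 0, 1, 1]  # cuts straight across the truth

print(mutual_information(ground_truth, good_match))  # 1.0
print(mutual_information(ground_truth, no_match))    # 0.0
```

Notice that the measure doesn’t care what the clusters are called, only whether the same points end up together.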

This would be like you creating a checklist before going on a date, and then seeing how many items your date checks off:

  • “You have a pet?  How wonderful, I love pets!  Check!”
    • “Oh, it’s a rabbit?  I’m allergic to rabbits.  Uncheck!”
  • “You graduated from an Ivy League?  That’s fantastic!  Check!”
    • “Wait, what?  It’s Dartmouth?!  Double uncheck!”
  • “You’re a lawyer?  Brilliant!  Triple check!”
    • “Excuse me?  You became a public defender to help poor people?!  Quadruple uncheck!”

Best Way to Tackle an Unsupervised Learning Problem

All this is to say, there are no easy answers in unsupervised learning.  To keep from getting overwhelmed by the myriad choices, the best way to begin an unsupervised learning problem is to look within.  You need to first do some serious soul-searching to determine what actually matters to you, and how you would define success and failure.  Then, let that insight guide you towards the features, the clustering algorithm, the validation measure, and so on that reflect what’s important to you.

Machines require much more hand-holding when doing unsupervised learning problems than when doing supervised learning problems.  This is because machines don’t do soul-searching.  They are only capable of carrying out instructions given to them.

With supervised learning, you can simply unleash the machine.  Like a rabid dog, it will relentlessly hunt down the right answers, or die trying.  With unsupervised learning, however, it will act more like an overly enthusiastic puppy with attention deficit disorder, chasing beautiful butterflies, adorable bunny rabbits, and whatever else crosses its path, unless you give it direction.  YOU have to do the self-analysis, and then design the machine to reflect YOUR values.

Blogging Improves Your Skin and Removes Wrinkles!

And finally, writing this very blog is an unsupervised learning task.  While there is no right way or wrong way to produce a blog, I also didn’t randomly throw words together to create gibberish.  Before doing the actual writing, I spent a lot of time trying to figure out what I wanted to accomplish.  I eventually settled on writing in such a way that would enable anyone reading my posts to be able to understand the broad strokes of machine learning without needing to wade through four years of computer science study.

This objective has guided me in every decision regarding the design, the tone, and the topics that I write about:

  • I limit the number of fancy-schmancy equations.
  • I maintain a conversational tone.
  • I try to keep my posts light-hearted.
  • And, most importantly, I write about you — in the hopes that you find yourself interesting and will stay engaged — eager to read on about your latest and greatest adventures.

As for determining how “pretty” my blog looks, my validation measure will be based on the comments you leave – letting me know if I’ve succeeded in helping you acquire a better understanding of this field.

(image by Gerd Altmann from Pixabay)

“Doc, I have this recurring dream where a mean-ass looking clown is beating me senseless with his giant red shoes.  What does it mean?”

“Well, I can only think of two possibilities.  Either, A) you once really were beaten senseless by a mean-ass looking clown with his giant red shoes.  Or, B) you were obsessed with scaling Mt. Everest.  The clown in your dream represents the trauma your body had had to endure while you were in training and then when you were making the actual climb.”

“Oh my gosh, how did you know?!  Yes, a while back, I was deeply inspired to climb Mt. Everest.  My friends and family all thought I was crazy, but I was determined.

“I knew my body wasn’t ready to take on such an arduous climb, so I spent the next two years training to improve my stamina and conditioning.  Oh man, the pain!  The torn muscles, the broken bones.  Numerous times I seriously thought about quitting, but I persevered.  When I finally felt ready, I flew over there and tackled the challenge.

“God, it was the hardest thing I’d ever done.  I frequently wanted to head back down.  But somehow, I just kept pressing on.  Until, finally, with my last ounce of strength, I pulled myself up to the very top!  It was so exhilarating!

“Unfortunately, that feeling was short-lived.  I turned around and ended up face-to-face with a mean-ass looking clown — who proceeded to beat me senseless with his giant red shoes.  How did you get all that from just my dream, Doc?”
