Unsupervised Learning

(image by harshal07 from Pixabay)

If supervised learning is target practice, then what is unsupervised learning?  Well, these are learning problems where the training data have no target values or labels — hence they’re unsupervised.

Using the dart throwing example, it’d be like throwing darts without a dartboard.  Wait, what?  If you have nothing to aim at, then what’s the point?  Exactly!  There is no point!  Unsupervised learning is completely useless.

Just like art.

Ha, ha, just kidding!  It’s a joke, it’s a joke!  For heaven’s sake, all you art majors put your pitchforks down!  Jeez-louize!  What a sensitive crowd!

School's Out For The Summer!

Because the training data have no target values, it is up to you as the learning model to provide the values.  And if the situation warrants, the labels as well.  Whereas supervised learning allows for only one particular set of answers, unsupervised learning pretty much accepts any answer you give.

Mad Dart Throwing Skill, Yo! → TM( ) = ?

BULLSEYE → target variable is ?
NOT_BULLSEYE → target variable is ?
Practice Throw | Horizontal Shoulder Angle | Vertical Shoulder Angle | TM( ) Used | Predicted Landing (Predicted Target Variable) | Predicted Label | Where Dart Lands (Target Variable) | Label | Error
pt1 | 14 | 3 | TM1() | n/a | n/a | ? | ? | n/a
pt2 | 12 | 25 | TM1() | n/a | n/a | ? | ? | n/a
pt3 | -20 | 14 | TM1() | n/a | n/a | ? | ? | n/a
pt4 | -16 | -2 | TM1() | n/a | n/a | ? | ? | n/a
pt5 | -2 | 3 | TM1() | n/a | n/a | ? | ? | n/a
pt6 | 9 | -30 | TM1() | n/a | n/a | ? | ? | n/a
pt7 | -17 | 2 | TM1() | n/a | n/a | ? | ? | n/a
pt8 | 1 | 0 | TM1() | n/a | n/a | ? | ? | n/a
pt9 | 1 | -1 | TM1() | n/a | n/a | ? | ? | n/a
pt10 | -2 | 1 | TM1() | n/a | n/a | ? | ? | n/a
pt11 | 4 | -5 | TM1() | n/a | n/a | ? | ? | n/a
pt12 | 8 | 7 | TM1() | n/a | n/a | ? | ? | n/a
etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc.

Hmm, let’s see here … .  How about we label the darts this way?

Mad Dart Throwing Skill, Yo! → TM( ) = TM1( )

BULLSEYE → target variable is less than or equal to the size of the room
NOT_BULLSEYE → target variable is greater than the size of the room
Practice Throw | Horizontal Shoulder Angle | Vertical Shoulder Angle | TM( ) Used | Predicted Landing (Predicted Target Variable) | Predicted Label | Where Dart Lands (Target Variable) | Label | Error
pt1 | 14 | 3 | TM1() | n/a | n/a | 7 | BULLSEYE | n/a
pt2 | 12 | 25 | TM1() | n/a | n/a | 3 | BULLSEYE | n/a
pt3 | -20 | 14 | TM1() | n/a | n/a | 12 | BULLSEYE | n/a
pt4 | -16 | -2 | TM1() | n/a | n/a | 5 | BULLSEYE | n/a
pt5 | -2 | 3 | TM1() | n/a | n/a | 8 | BULLSEYE | n/a
pt6 | 9 | -30 | TM1() | n/a | n/a | 11 | BULLSEYE | n/a
pt7 | -17 | 2 | TM1() | n/a | n/a | 1 | BULLSEYE | n/a
pt8 | 1 | 0 | TM1() | n/a | n/a | 0.4 | BULLSEYE | n/a
pt9 | 1 | -1 | TM1() | n/a | n/a | 3 | BULLSEYE | n/a
pt10 | -2 | 1 | TM1() | n/a | n/a | 6 | BULLSEYE | n/a
pt11 | 4 | -5 | TM1() | n/a | n/a | 9 | BULLSEYE | n/a
pt12 | 8 | 7 | TM1() | n/a | n/a | 7 | BULLSEYE | n/a
etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc.

Wow, you’re a dart throwing champion!  This is literally the definition of a participation trophy!

But, hold on!  Before you jump for joy and start chanting, “Woo-hoo!  No more pencils, no more books, no more teachers’ dirty looks! … ,” you need to understand that this task isn’t as easy as it seems.

You’re puzzled.  “How is this not easy?  If any answer is acceptable, then for each data point I can simply assign an arbitrary target value, slap on a random label, and then call it a day!”

Well, technically, yes.  While there are no right or wrong answers, the point of machine learning is to detect patterns.  Arbitrary values and random labels are not patterns.  Finding patterns means that you need to learn what the relationships among the data points are and then assign target values and labels that explicitly describe those relationships.

You become dejected.  “So, even when there’s nothing to learn, we still have to learn?”

Yes.

It's Complicated

Unfortunately, rarely will you ever find simple datasets where each data point has only one type of relationship with other data points.  Most data come with a whole host of features, traits, and characteristics.  Data points will almost always have a diverse variety of relationships with one another.  For example, let’s take a look at the demographic data of the people living in your neighborhood.  We can break them down by age, gender, race, and occupation:

Person | Age | Gender | Race | Occupation
person1 | 17 | male | white | high school student
person2 | 26 | female | brown | artist
person3 | 11 | female | black | elementary school student
person4 | 56 | male | yellow | neurosurgeon
person5 | 35 | female | sunshine | salesperson
person6 | 162 | male | gold | youth counselor
person7 | 32 | male | green | farmer
person8 | 87 | female | purple | amethyst collector
person9 | 45 | female | wisteria | botanist
person10 | 28 | it | chrome | cybernetics engineer
etc. | etc. | etc. | etc. | etc.

And from their social networking profiles, we also have data on who’s friends with whom:

Person | Age | Gender | Race | Occupation | Friends
person1 | 17 | male | white | high school student | group1
person2 | 26 | female | brown | artist | group2
person3 | 11 | female | black | elementary school student | group1
person4 | 56 | male | yellow | neurosurgeon | group2
person5 | 35 | female | sunshine | salesperson | group2
person6 | 162 | male | gold | youth counselor | group3
person7 | 32 | male | green | farmer | group4
person8 | 87 | female | purple | amethyst collector | group5
person9 | 45 | female | wisteria | botanist | group3
person10 | 28 | it | chrome | cybernetics engineer | group6
etc. | etc. | etc. | etc. | etc. | etc.

Say you want to build a recommender system that suggests movies to each person. You can use the demographic information to determine what people of similar age, race, gender, and occupation often watch and make recommendations based on that:

Person | Age | Gender | Race | Occupation | Friends | Label
person1 | 17 | male | white | high school student | group1 | Friday Night Lights
person2 | 26 | female | brown | artist | group2 | Sisterhood of the Traveling Pants
person3 | 11 | female | black | elementary school student | group1 | My Little Pony: The Movie
person4 | 56 | male | yellow | neurosurgeon | group2 | The Theory of Everything
person5 | 35 | female | sunshine | salesperson | group2 | Rachael Ray Presents
person6 | 162 | male | gold | youth counselor | group3 | Methuselah
person7 | 32 | male | green | farmer | group4 | Star Trek: To Andromeda!
person8 | 87 | female | purple | amethyst collector | group5 | Barney
person9 | 45 | female | wisteria | botanist | group3 | Desperate Housewives: More Desperate Than Ever
person10 | 28 | it | chrome | cybernetics engineer | group6 | Pinocchio
etc. | etc. | etc. | etc. | etc. | etc. | etc.

Or, you can make suggestions based on what their friends have watched.

Person | Age | Gender | Race | Occupation | Friends | Label
person1 | 17 | male | white | high school student | group1 | Diary of a Wimpy Kid
person2 | 26 | female | brown | artist | group2 | Shaun of the Dead
person3 | 11 | female | black | elementary school student | group1 | Diary of a Wimpy Kid
person4 | 56 | male | yellow | neurosurgeon | group2 | Shaun of the Dead
person5 | 35 | female | sunshine | salesperson | group2 | Shaun of the Dead
person6 | 162 | male | gold | youth counselor | group3 | Fast Times at Richmond High: The New Class
person7 | 32 | male | green | farmer | group4 | A Christmas Carol
person8 | 87 | female | purple | amethyst collector | group5 | Searching for Bobby Fischer
person9 | 45 | female | wisteria | botanist | group3 | Fast Times at Richmond High: The New Class
person10 | 28 | it | chrome | cybernetics engineer | group6 | Bridges of Madison County
etc. | etc. | etc. | etc. | etc. | etc. | etc.

As you can see, by using different criteria, you end up with a completely different set of recommendations.  The more relationships you unearth, the more labels you can potentially assign.  While unsupervised learning isn’t random, it’s still very subjective.
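
The friend-based recommender can be sketched in a few lines of Python.  This is a toy illustration with made-up watch-history data ( the people and friend groups echo the tables above, but the movies each group has watched are invented here ): pool what each friend group watched and suggest the group's favorite.

```python
from collections import Counter, defaultdict

# Hypothetical watch history: (person, friend_group, movie_watched)
watch_history = [
    ("person1", "group1", "Diary of a Wimpy Kid"),
    ("person3", "group1", "Diary of a Wimpy Kid"),
    ("person2", "group2", "Shaun of the Dead"),
    ("person4", "group2", "Shaun of the Dead"),
    ("person5", "group2", "The Theory of Everything"),
]

def recommend_by_friends(history):
    """Suggest each friend group's most-watched movie to its members."""
    by_group = defaultdict(list)
    for _, group, movie in history:
        by_group[group].append(movie)
    return {group: Counter(movies).most_common(1)[0][0]
            for group, movies in by_group.items()}

print(recommend_by_friends(watch_history))
# → {'group1': 'Diary of a Wimpy Kid', 'group2': 'Shaun of the Dead'}
```

Swap the grouping key from `group` to, say, an (age bracket, occupation) tuple and you get the demographic recommender instead — same code, completely different recommendations.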

Whereas supervised learning can lean on its labels — with tools such as feature selection and linear discriminant analysis ( LDA ) — to filter out unnecessary features, unsupervised learning offers no such options.  What features you emphasize depends ultimately on what you consider important.  The second recommender system, for example, considers friendships more important than identity.

A Beautiful Mind

Here’s a new dataset and its corresponding graph.

Data Point | Feature 1 | Feature 2
dp1 | 3 | 2
dp2 | 6 | 9
dp3 | 2 | 19
dp4 | 18 | 19
dp5 | 17 | 2
dp6 | 14 | 9
dp7 | 10 | 9
dp8 | 10 | 6
dp9 | 10 | 11
dp10 | 8 | 3
dp11 | 9 | 5
dp12 | 11 | 5
dp13 | 12 | 3
dp14 | 7 | 9
dp15 | 13 | 9
dp16 | 7 | 18
dp17 | 13 | 18
dp18 | 2 | 13
dp19 | 18 | 13
dp20 | 3 | 4
dp21 | 17 | 4
dp22 | 9 | 14
dp23 | 11 | 14
dp24 | 2 | 16
dp25 | 18 | 16
dp26 | 4 | 7
dp27 | 16 | 7
dp28 | 3 | 11
dp29 | 17 | 11
dp30 | 15 | 19
dp31 | 5 | 19
dp32 | 10 | 13
dp33 | 6 | 2
dp34 | 14 | 2
dp35 | 8 | 16
dp36 | 12 | 16
dp37 | 8 | 11
dp38 | 12 | 11
dp39 | 8 | 7
dp40 | 12 | 7
dp41 | 5 | 17
dp42 | 8 | 13
dp43 | 12 | 13
dp44 | 15 | 17
dp45 | 3 | 15
dp46 | 6 | 12
dp47 | 13 | 16
dp48 | 17 | 15
dp49 | 3 | 17
dp50 | 16 | 13
dp51 | 4 | 13
dp52 | 7 | 16
dp53 | 17 | 17
dp54 | 14 | 12
dp55 | 6 | 5
dp56 | 14 | 5
dp57 | 5 | 4
dp58 | 15 | 4
dp59 | 2 | 9
dp60 | 18 | 9
dp61 | 10 | 17
dp62 | 10 | 2

Do you notice the pattern?

How about if I label the data points this way:

Data Point | Feature 1 | Feature 2 | Label
dp1 | 3 | 2 | 1
dp2 | 6 | 9 | 1
dp3 | 2 | 19 | 1
dp4 | 18 | 19 | 1
dp5 | 17 | 2 | 1
dp6 | 14 | 9 | 1
dp7 | 10 | 9 | 1
dp8 | 10 | 6 | 1
dp9 | 10 | 11 | 1
dp10 | 8 | 3 | 1
dp11 | 9 | 5 | 1
dp12 | 11 | 5 | 1
dp13 | 12 | 3 | 1
dp14 | 7 | 9 | 1
dp15 | 13 | 9 | 1
dp16 | 7 | 18 | 1
dp17 | 13 | 18 | 1
dp18 | 2 | 13 | 1
dp19 | 18 | 13 | 1
dp20 | 3 | 4 | 1
dp21 | 17 | 4 | 1
dp22 | 9 | 14 | 1
dp23 | 11 | 14 | 1
dp24 | 2 | 16 | 1
dp25 | 18 | 16 | 1
dp26 | 4 | 7 | 1
dp27 | 16 | 7 | 1
dp28 | 3 | 11 | 1
dp29 | 17 | 11 | 1
dp30 | 15 | 19 | 1
dp31 | 5 | 19 | 1
dp32 | 10 | 13 | 1
dp33 | 6 | 2 | 1
dp34 | 14 | 2 | 1
dp35 | 8 | 16 | 1
dp36 | 12 | 16 | 1
dp37 | 8 | 11 | -1
dp38 | 12 | 11 | -1
dp39 | 8 | 7 | -1
dp40 | 12 | 7 | -1
dp41 | 5 | 17 | -1
dp42 | 8 | 13 | -1
dp43 | 12 | 13 | -1
dp44 | 15 | 17 | -1
dp45 | 3 | 15 | -1
dp46 | 6 | 12 | -1
dp47 | 13 | 16 | -1
dp48 | 17 | 15 | -1
dp49 | 3 | 17 | -1
dp50 | 16 | 13 | -1
dp51 | 4 | 13 | -1
dp52 | 7 | 16 | -1
dp53 | 17 | 17 | -1
dp54 | 14 | 12 | -1
dp55 | 6 | 5 | -1
dp56 | 14 | 5 | -1
dp57 | 5 | 4 | -1
dp58 | 15 | 4 | -1
dp59 | 2 | 9 | -1
dp60 | 18 | 9 | -1
dp61 | 10 | 17 | -1
dp62 | 10 | 2 | -1

Pretty, innit?  I call it the butterfly algorithm.

But, wait!  There’s actually another pattern hidden in the data.  Can you see it?

How about if I label the data points like this:

Data Point | Feature 1 | Feature 2 | Label
dp1 | 3 | 2 | -1
dp2 | 6 | 9 | -1
dp3 | 2 | 19 | -1
dp4 | 18 | 19 | -1
dp5 | 17 | 2 | -1
dp6 | 14 | 9 | -1
dp7 | 10 | 9 | -1
dp8 | 10 | 6 | 1
dp9 | 10 | 11 | 1
dp10 | 8 | 3 | -1
dp11 | 9 | 5 | -1
dp12 | 11 | 5 | -1
dp13 | 12 | 3 | -1
dp14 | 7 | 9 | 1
dp15 | 13 | 9 | 1
dp16 | 7 | 18 | -1
dp17 | 13 | 18 | -1
dp18 | 2 | 13 | -1
dp19 | 18 | 13 | -1
dp20 | 3 | 4 | -1
dp21 | 17 | 4 | -1
dp22 | 9 | 14 | -1
dp23 | 11 | 14 | -1
dp24 | 2 | 16 | -1
dp25 | 18 | 16 | -1
dp26 | 4 | 7 | -1
dp27 | 16 | 7 | -1
dp28 | 3 | 11 | -1
dp29 | 17 | 11 | -1
dp30 | 15 | 19 | -1
dp31 | 5 | 19 | -1
dp32 | 10 | 13 | -1
dp33 | 6 | 2 | -1
dp34 | 14 | 2 | -1
dp35 | 8 | 16 | -1
dp36 | 12 | 16 | -1
dp37 | 8 | 11 | 1
dp38 | 12 | 11 | 1
dp39 | 8 | 7 | 1
dp40 | 12 | 7 | 1
dp41 | 5 | 17 | 1
dp42 | 8 | 13 | 1
dp43 | 12 | 13 | 1
dp44 | 15 | 17 | 1
dp45 | 3 | 15 | 1
dp46 | 6 | 12 | 1
dp47 | 13 | 16 | 1
dp48 | 17 | 15 | 1
dp49 | 3 | 17 | 1
dp50 | 16 | 13 | 1
dp51 | 4 | 13 | 1
dp52 | 7 | 16 | 1
dp53 | 17 | 17 | 1
dp54 | 14 | 12 | 1
dp55 | 6 | 5 | -1
dp56 | 14 | 5 | -1
dp57 | 5 | 4 | -1
dp58 | 15 | 4 | -1
dp59 | 2 | 9 | -1
dp60 | 18 | 9 | -1
dp61 | 10 | 17 | -1
dp62 | 10 | 2 | -1

Now do you see it?  I call it the bunny rabbit algorithm.

What do we normally call this process of “connecting the dots” in a way that no one else has before — to reveal previously hidden relationships?  Creativity.  And boy, does unsupervised learning offer plenty of room for that!

Just this dataset alone, where we have only two labels and 62 data points, still gives us 2^62 = 4,611,686,018,427,387,904 different possible sets of labels!  That’s over four quintillion!  Granted, most of them are random labelings — but even if we’re able to eliminate, say, 99% of these sets, that still leaves about 46,116,860,184,273,879, or 46 quadrillion, possibilities showing some kind of pattern.
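
If you want to check that arithmetic yourself: each of the 62 data points independently takes one of two labels, so the count is just 2 multiplied by itself 62 times.

```python
# Each of the 62 data points independently gets one of two labels,
# so the number of possible labelings is 2 to the 62nd power.
total = 2 ** 62
print(total)         # → 4611686018427387904 (about 4.6 quintillion)
print(total // 100)  # keep just 1% of them: still ~46 quadrillion
```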

We do have to be careful, though, that the patterns we perceive indeed describe real relationships intrinsic to the data themselves, and not extrinsic patterns that exist only from the observer’s perspective.  Like, when we see shapes in cloud formations, stellar constellations, or Rorschach tests.

Or when we engage in conspiracy theories.  And, yes, machine learning systems are susceptible to conspiracy theories, too.  Oh, sure, the machines may scoff at us for thinking there are UFOs in Area 51.  But that’s only because they know the military actually stores them in Area 53.

Birds of a Feather

Sadly, if you read a machine learning textbook or take a machine learning course, you will never see any mention of beautiful butterfly algorithms, nor adorable bunny rabbit algorithms.  That’s because no one has been successful in utilizing them for anything.

YET.  I still hold out hope that we’ll find a use for them someday.

What you will see a lot of, however, are clusters:

Heavens to Betsy!  This looks like a frightful rash!  The sheer ugliness of clusters would offend the sensibilities of artists everywhere!

Well, maybe not Jackson Pollock.

Clustering algorithms group data points based on how similar they are with one another.  The idea behind this is that data points that have many features, traits, and characteristics in common tend to “congregate”  near one another if plotted out in a featurespace.  It subscribes to the idea that “birds of a feather, flock together.”  In other words, if it walks like a duck, quacks like a duck, and swims like a duck — then chances are, it’s chain-smoking in the bathroom and hanging out with the bad kids.  You’ve repeatedly told the duck to stay away from those kids, but it just won’t listen.

What constitutes similar is up for debate.  Like everything else in unsupervised learning, you have a dizzying array of options to choose from.  And, there are a zillion different clustering algorithms that utilize these similarity measures in different ways.

The reason why these algorithms are so popular — despite being ugly AF — is because they’ve proven to be very useful.  Here’s a marketing example.  The graph above represents the heights and weights of various Marvel comic characters:

Data based on the following sources:
» Superhero Database
» Height Scale for Marvel Characters
» 27 Marvel Comics Characters Who've Gained The Hulk's Powers
» Who Are Marvel's Smallest and Tallest Characters?

Cluster 1 → Monstrosity
Cluster 2 → Above Average Build
Cluster 3 → Average Build
Cluster 4 → Petite Build
Character | F1 Weight (in lbs) | F2 Height (in inches) | Label
Incredible Hulk ( Bruce Banner ) | 700 | 89 | Monstrosity
Sasquatch ( Walter Langowski ) | 640 | 94 | Monstrosity
Hemingway | 480 | 94 | Monstrosity
Juggernaut ( Cain Marko ) | 650 | 99 | Monstrosity
Colossus ( Peter Rasputin ) | 510 | 89 | Monstrosity
Man-Thing ( Dr. Theodore Stills ) | 505 | 85 | Monstrosity
Red Hulk ( General Thaddeus Ross ) | 680 | 84 | Monstrosity
Apocalypse ( En Sabah Nur ) | 330 | 84 | Monstrosity
Omega Red ( Arkady Rossovich ) | 425 | 83 | Monstrosity
Hellboy ( Anung Un Rama ) | 395 | 83 | Monstrosity
Abomination ( Emil Blonsky ) | 445 | 80 | Monstrosity
A-Bomb ( Rick Jones ) | 445 | 80 | Monstrosity
Red She-Hulk / Betsy Ross | 480 | 80 | Monstrosity
Thanos | 447 | 79 | Monstrosity
She-Hulk ( Jennifer Walters ) | 360 | 79 | Monstrosity
Doc Samson ( Dr. Leonard Skivorski, Jr. ) | 380 | 78 | Monstrosity
Deathlok ( Luther Manning ) | 395 | 76 | Monstrosity
Totally Awesome Hulk ( Amadeus Cho ) | 435 | 76 | Monstrosity
Thing ( Ben Grimm ) | 540 | 72 | Monstrosity
Yondu Udonta | 210 | 86 | Above Average Build
Cable ( Nathan Summers ) | 330 | 80 | Above Average Build
Thor Odinson | 290 | 78 | Above Average Build
Rhino ( Aleksei Sytsevich ) | 320 | 77 | Above Average Build
Mister Sinister ( Nathaniel Essex ) | 285 | 77 | Above Average Build
Iron Man ( Tony Stark ) | 225 | 77 | Above Average Build
Silver Surfer ( Norrin Radd ) | 240 | 77 | Above Average Build
Hawkeye ( Clint Barton ) | 230 | 75 | Above Average Build
Cyclops ( Scott Summers ) | 195 | 75 | Above Average Build
Venom ( Eddie Brock ) | 260 | 75 | Above Average Build
Captain America ( Steve Rogers ) | 220 | 74 | Above Average Build
Adam Warlock | 240 | 74 | Above Average Build
Dr. Stephen Strange | 180 | 74 | Average Build
Sandman ( William Baker ) | 205 | 73 | Average Build
Mr. Fantastic ( Reed Richards ) | 180 | 73 | Average Build
Black Panther ( T’Challa ) | 210 | 72 | Average Build
Professor X ( Charles Francis Xavier ) | 190 | 72 | Average Build
Archangel ( Warren Kenneth Worthington III ) | 150 | 72 | Average Build
Green Goblin ( Norman Osborn ) | 185 | 71 | Average Build
Spider-Man ( Peter Parker ) | 165 | 70 | Average Build
Agent Phillip J. Coulson | 205 | 69 | Average Build
Rogue ( Anna Marie ) | 120 | 69 | Average Build
Iceman ( Bobby Drake ) | 145 | 68 | Average Build
Black Widow ( Natasha Romanoff ) | 130 | 67 | Average Build
Vanisher | 175 | 65 | Petite Build
Jubilee ( Jubilation Lee ) | 115 | 65 | Petite Build
Quill ( Max Jordan ) | 120 | 64 | Petite Build
Wolverine ( Logan ) | 280 | 63 | Petite Build
Firestar ( Angelica Jones ) | 125 | 62 | Petite Build
Franklin Richards ( Powerhouse ) | 100 | 56 | Petite Build
Batwing ( James Santini ) | 110 | 50 | Petite Build
Rocket Raccoon | 55 | 48 | Petite Build
Fusion ( Hubert & Pinky Fusser ) | 160 | 48 | Petite Build
Gargouille ( Lavina LeBlanc ) | 75 | 44 | Petite Build
Puck ( Eugene Judd ) | 225 | 42 | Petite Build
Fader | 45 | 40 | Petite Build

If we group together characters of similar build, we can tailor our ads to particular individuals.  The hope is that the more similar the people in a group are, the more likely it is that they deal with similar problems, and hence will be interested in the same products / solutions.  For example, the Hulk always needs new pants.  One can target the following ad to him:

Had another “freakout” and need new pants again?

Why not try our stretchy pants instead?  No matter what bizarre changes your body goes through, these pants will never rip, tear, or break apart!  Buy one pair, and it’ll last you a lifetime!

Available in purple, dark purple, royal purple, deep violet, and lavender.

(image by kleefeld from Pixabay)

Bruce Banner, upon seeing this, would exclaim, “Oh my god, yes!  It’s like this ad is speaking to me!”  He would immediately place an order for 100 pairs of stretchy pants.  If we show it to Black Widow, on the other hand, she would probably think, “Why would I need stretchy pants?  I’m, like, always a size 4!”  The ad may not be effective on her, but other Marvel characters in situations similar to Banner’s may find it very helpful.

Behold!  Beauty!

Okay, you’ve slogged through all the many, many options.  You’ve narrowed down the features to just the ones you want.  You’ve researched your data thoroughly and are certain that there are real intrinsic relationships, not just pseudo-patterns.  You’ve investigated all the different clustering algorithms and have settled on the one that seems to best fit your needs.  And now, after all that, you finally end up with a bunch of clusters.  But how do you know whether these clusters have correctly classified your dataset?  To determine that, you would need to use a validation measure.

Unfortunately, as you should expect by now, there are a mind-boggling number of such measures.

Even more unfortunate, these measures don’t actually tell you whether your clusters are correct or not.  They can only tell you how closely your clusters fit a particular set of criteria.  In other words, they can only determine how “pretty” your clusters look.  Validation measures are aesthetics metrics.  Remember, there are no right or wrong answers in unsupervised learning.  Aesthetics measures are about the best we can do.

Each measure defines “beauty” differently.  That there are so many different definitions simply proves that beauty truly is in the eye of the beholder!

Most measures use one or both of these metrics — compactness and separateness.  Compactness measures how varied the data points within each cluster are.  The less varied, the higher the compactness.  Separateness measures how “tangled” your clusters are.  If they’re intertwined together, your clusters have low separateness.  If they’re far apart, they get a high separateness score.
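
Here’s one way those two metrics might be computed — a minimal sketch using plain Euclidean distances on made-up points.  Real validation measures ( silhouette scores and friends ) combine compactness and separateness in fancier ways, but the intuition is the same.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def centroid(points):
    """The center (mean) of a set of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def compactness(cluster):
    """Average distance from each point to its cluster's center.
    Lower means a tighter, less varied cluster."""
    c = centroid(cluster)
    return sum(dist(p, c) for p in cluster) / len(cluster)

def separateness(cluster_a, cluster_b):
    """Distance between the two cluster centers.
    Higher means better-separated clusters."""
    return dist(centroid(cluster_a), centroid(cluster_b))

tight   = [(1, 1), (1, 2), (2, 1)]
far_off = [(9, 9), (9, 10), (10, 9)]
print(compactness(tight))            # small: the points huddle together
print(separateness(tight, far_off))  # large: the centers sit far apart
```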

The Ugly Duckling

Unfortunately, separateness and compactness tend to be biased toward clusters that are “clumpy” and ball-shaped.  They consider these types of clusters to be more beautiful than, say, clusters that are long and stringy.  This means they tend to favor algorithms like k-Means.

K-Means is an iterative algorithm that places each data point into the cluster whose center ( a.k.a. the mean ) is closest to it.  Once all the data points are assigned, the centers are recalculated — because when new data points are added to a cluster, its center necessarily shifts.  The algorithm then checks every data point again to see if a different cluster is now closer to it.  If so, the data point gets reassigned.  The algorithm repeats this process over and over until no more data points get reassigned.
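
Those steps fit in a few lines of Python.  This is a bare-bones sketch, not a production implementation — no smart initialization, and an empty cluster simply keeps its old center:

```python
import random
from math import dist

def k_means(points, k, iterations=100, seed=0):
    """Minimal k-means sketch: assign each point to its nearest
    center, recompute the centers, repeat until nothing moves."""
    random.seed(seed)
    centers = random.sample(points, k)   # naive initialization
    for _ in range(iterations):
        # Assignment step: each point joins its closest center's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:       # no reassignments left: done
            break
        centers = new_centers
    return centers, clusters

points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
centers, clusters = k_means(points, k=2)
print(sorted(sorted(cl) for cl in clusters))
# → [[(1, 1), (1, 2), (2, 1)], [(9, 9), (9, 10), (10, 9)]]
```

On these two clumpy little blobs it lands exactly where you’d expect — which is the point: k-means shines when the data really are clumpy and ball-shaped.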

This type of algorithm likes to produce clumpy, ball-shaped clusters that would score highly on most validation measures.  Unfortunately, most datasets are not clumpy and ball-shaped.  Here are two examples:

K-Means would cluster like so:

There are a bunch of algorithms, such as DBSCAN, that attempt to organize data similar to the way we humans do.  They’re density-based, which means they subscribe to the idea that data points within clusters are more tightly packed together than those in the areas between clusters.  So, DBSCAN would cluster the two datasets like so:
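
Here’s a heavily simplified, pure-Python sketch of the density-based idea: grow a cluster outward from any point that has enough close neighbors, and mark the stragglers as noise.  The eps and min_pts values are arbitrary choices for this toy data, and the real DBSCAN has a few more wrinkles ( careful border-point handling, efficient neighbor lookup ):

```python
from math import dist

def dbscan(points, eps=1.5, min_pts=3):
    """Simplified DBSCAN sketch: expand clusters from 'dense' points
    (those with at least min_pts neighbors within distance eps)."""
    def neighbors(i):
        return [j for j in range(len(points))
                if dist(points[i], points[j]) <= eps]

    labels = {}        # point index -> cluster id (-1 means noise)
    cluster_id = 0
    for i in range(len(points)):
        if i in labels:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:      # not dense enough: call it noise
            labels[i] = -1
            continue
        labels[i] = cluster_id        # start a new cluster and expand it
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels.get(j, -1) == -1:
                labels[j] = cluster_id            # claim this point
                more = neighbors(j)
                if len(more) >= min_pts:          # dense: keep growing
                    queue.extend(m for m in more if m not in labels)
        cluster_id += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),          # dense patch A
          (10, 10), (10, 11), (11, 10), (11, 11),  # dense patch B
          (5, 5)]                                  # lone straggler
labels = dbscan(points)
print(labels[0], labels[4], labels[8])  # → 0 1 -1
```

Because clusters grow by chaining through dense neighborhoods, this approach happily follows long, stringy shapes that k-means would chop in half.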

Unfortunately, it’s difficult to measure how “beautiful” these kinds of clusters are.  Most validation measures would rate them very low — close to falling-out-of-an-ugly-tree-and-hitting-all-the-branches-on-the-way-down low.

To get a better measure, you would need to “roll” your own.  First, you would create synthetic data that represents what your ideal “perfect 10” cluster would look like — a.k.a. the ground truth.  Then, you would use an external validation measure, like mutual information, F-measure, etc., that compares your clusters to the ground truth to see how close they come to your standards.
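
As a sketch of what such an external measure might look like, here’s a pair-counting F-measure: for every pair of points, check whether your clustering and the ground truth agree on “same cluster or not.”  The labels below are invented purely for illustration.

```python
from itertools import combinations

def pair_f_measure(predicted, truth):
    """Pair-counting F-measure sketch: over every pair of points,
    compare 'same cluster?' in the prediction vs. the ground truth."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1        # pair correctly kept together
        elif same_pred:
            fp += 1        # pair wrongly lumped together
        elif same_true:
            fn += 1        # pair wrongly split apart
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

truth     = ["butterfly"] * 4 + ["background"] * 4   # your "perfect 10"
predicted = ["A", "A", "A", "B", "B", "B", "B", "B"]  # one point misplaced
print(round(pair_f_measure(predicted, truth), 3))     # → 0.72
```

A perfect match scores 1.0; every misplaced point drags the score down, exactly like unchecked items on that dating checklist.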

This would be like you creating a checklist before going on a date, and then seeing how many items your date checks off:

  • “You have a pet?  How wonderful, I love pets!  Check!”
    • “Oh, it’s a rabbit?  I’m allergic to rabbits.  Uncheck!”
  • “You graduated from an Ivy League?  That’s fantastic!  Check!”
    • “Wait, what?  It’s Dartmouth?!  Double uncheck!”
  • “You’re a lawyer?  Brilliant!  Triple check!”
    • “Excuse me?  You became a public defender to help poor people?!  Quadruple uncheck!”

Best Way to Tackle an Unsupervised Learning Problem

All this is to say, there are no easy answers in unsupervised learning.  To keep from getting overwhelmed by the myriad of choices, the best way to begin an unsupervised learning problem is to look within.  You need to first do some serious soul-searching to determine what actually matters to you, and how you would define success and failure.  Then, let that insight guide you towards the right features, the right clustering algorithm, the right measure, etc. that reflects what’s important to you.

Machines require much more hand-holding when doing unsupervised learning problems than when doing supervised learning problems.  This is because machines don’t do soul-searching.  They are only capable of carrying out instructions given to them.

With supervised learning, you can simply unleash the machine.  Like a rabid dog, it will relentlessly hunt down the right answers — or die trying.  With unsupervised learning, however, it will act more like an overly enthusiastic puppy with attention deficit disorder — chasing beautiful butterflies, adorable bunny rabbits, and whatever else crosses its path — unless you give it direction.  YOU have to do the self-analysis, and then design the machine to reflect YOUR values.

Blogging Improves Your Skin and Removes Wrinkles!

And finally, writing this very blog is an unsupervised learning task.  While there is no right way or wrong way to produce a blog, I also didn’t randomly throw words together to create gibberish.  Before doing the actual writing, I spent a lot of time trying to figure out what I wanted to accomplish.  I eventually settled on writing in such a way that would enable anyone reading my posts to be able to understand the broad strokes of machine learning without needing to wade through four years of computer science study.

This objective has guided me in every decision regarding the design, the tone, and the topics that I write about:

  • I limit the number of fancy-schmancy equations.
  • I maintain a conversational tone.
  • I try to keep my posts light-hearted.
  • And, most importantly, I write about you — in the hopes that you find yourself interesting and will stay engaged — eager to read on about your latest and greatest adventures.

As for determining how “pretty” my blog looks, my validation measure will be based on the comments you leave – letting me know if I’ve succeeded in helping you acquire a better understanding of this field.

(image by Gerd Altmann from Pixabay)

“Doc, I have this recurring dream where a mean-ass looking clown is beating me senseless with his giant red shoes.  What does it mean?”

“Well, I can only think of two possibilities.  Either, A) you once really were beaten senseless by a mean-ass looking clown with his giant red shoes.  Or, B) you were obsessed with scaling Mt. Everest.  The clown in your dream represents the trauma your body had had to endure while you were in training and then when you were making the actual climb.”

“Oh my gosh, how did you know?!  Yes, a while back, I was deeply inspired to climb Mt. Everest.  My friends and family all thought I was crazy, but I was determined.

“I knew my body wasn’t ready to take on such an arduous climb, so I spent the next two years training to improve my stamina and conditioning.  Oh man, the pain!  The torn muscles, the broken bones.  Numerous times I seriously thought about quitting, but I persevered.  When I finally felt ready, I flew over there and tackled the challenge.

“God, it was the hardest thing I’d ever done.  I frequently wanted to head back down.  But somehow, I just kept pressing on.  Until, finally, with my last ounce of strength, I pulled myself up to the very top!  It was so exhilarating!

“Unfortunately, that feeling was short-lived.  I turned around and ended up face-to-face with a mean-ass looking clown — who proceeded to beat me senseless with his giant red shoes.  How did you get all that from just my dream, Doc?”
