The best way, in my view, to understand a field is to understand why the field exists in the first place. Why do we need a field like machine learning? In short, what problems does it solve, and why?
Let’s start with an analogy, something you do practically every morning: you wake up and get ready to go to work. What problems do you need to solve? For one, you need to put on clothes to protect your body from the weather, and shoes to protect your feet from the rough surfaces you might encounter. You may need to cover your head with a hat or a scarf, and shield your eyes from the harsh rays of the sun with sunglasses. These are the problems you need to solve in getting dressed.
Continuing the analogy, algorithms are like clothes and shoes, hats and scarves and sunglasses. You could wear sneakers, dress shoes, or high heels. You could wear a T-shirt, a dress shirt, a full-length skirt, and so on. Clothes and shoes are ways to solve the problem of dressing for work, and which clothes you wear and which shoes you put on may vary with the occasion and the weather. Similarly, which machine learning algorithm you use may depend on the problem, the data, the distribution of instances, and so on. The lesson from the fashion industry is apt and worth remembering: problems never change (you always need something to cover your feet), but algorithms change all the time (new styles of clothes and shoes appear every week). Don’t waste time chasing fashionable solutions; they quickly become yesterday’s newspaper. Problems last, algorithms don’t!
There’s an unfortunate tendency these days to recommend universal solutions to machine learning (e.g., learn TensorFlow and code up every algorithm as stochastic gradient descent on a deep neural net). To me, this makes about as much sense as wrapping yourself in your bedsheets to go to work. Sure, it covers most of your body, and it could probably do the job, but it’s a one-size-fits-all approach that shows neither style nor taste, nor any understanding of the machine learning (or dressing) problem.
The machine learning community has spent over four decades trying to understand how to pose the problem of machine learning. Start by understanding a few of these formulations, and resist the temptation to view every machine learning problem through a single simplified lens (supervised learning is just one of dozens of ways of posing ML problems). The major categories include unsupervised learning, the most important; reinforcement learning (learning by trial and error, the most prevalent in children after unsupervised learning); and finally supervised learning (which arrives rather late, because it requires labels and language, which young children largely lack in their early years). Transfer learning is growing in importance, as labeled data is expensive and hard to collect for every new problem. There’s lifelong learning, online learning, and so on. One of the deepest and most interesting areas of machine learning is the theory of probably approximately correct (PAC) learning, a fascinating area that asks how we can guarantee that a learning algorithm will work reliably and produce a sufficiently accurate answer. Whether you understand PAC learning tells me whether you are an ML scientist or an ML engineer.
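To give a taste of the flavor of PAC learning, here is the classic sample-complexity bound for a finite hypothesis class (this is the standard textbook statement for the realizable case; the notation is mine, not anything specific to this essay):

```latex
% Classic PAC bound: finite hypothesis class H, realizable case.
% If a learner outputs any hypothesis h consistent with at least m
% i.i.d. labeled examples, then with probability at least 1 - delta
% the true error of h is at most epsilon.
m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
\quad\Longrightarrow\quad
\Pr\!\left[\operatorname{err}(h) \le \epsilon\right] \;\ge\; 1 - \delta
```

Notice how the guarantee is hedged on both sides: “probably” (the 1 − δ) and “approximately correct” (the ε), which is exactly what the name promises.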
The most basic formulation of machine learning, and the one that gets short shrift in many popular expositions, is learning a “representation”. What does this even mean? Take the number “three”. I could write it using three strokes (III), as 11, or as 3. These are the unary, binary, and decimal representations, respectively. The decimal system was invented in India roughly two thousand years ago. Remarkably, the Greeks, for all their wisdom, never discovered the use of zero and never invented decimal numbers. Claude Shannon, the famed founder of information theory, popularized binary representations for computing in his celebrated 1937 MS thesis at MIT.
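To make the three representations concrete, here is a toy sketch (the function names are my own illustrative choices):

```python
# The same number "three" in three representations.
def to_unary(n: int) -> str:
    """One stroke per unit: 3 -> 'III'."""
    return "I" * n

def to_binary(n: int) -> str:
    """Base-2 positional notation: 3 -> '11'."""
    return format(n, "b")

n = 3
print(to_unary(n))   # III  (unary)
print(to_binary(n))  # 11   (binary)
print(str(n))        # 3    (decimal)
```

Same number, three encodings; which one is “best” depends entirely on what you want to do with it.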
What does it mean for a computer to “learn” a representation? Take a selfie, and imagine writing a program to identify you (or your spouse, or your pet) from the image. The phone stores the image in one representation (usually JPEG, which is built on the discrete cosine transform, a close relative of the Fourier basis). It turns out this basis is a terrible representation for machine learning. There are many better representations, and new ones get invented all the time. A representation is like the material that makes up your clothes. There’s cotton and polyester and wool and nylon, and each has its strengths and weaknesses. Similarly, different representations of the input data have their pros and cons. Resist the temptation to view one representation as superior to all the others.
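As a hedged illustration of how much the representation matters, here is a small sketch (assuming scikit-learn is installed; the dataset, the 16-component PCA, and the classifier are my own choices, not a recommendation) that trains the same classifier on two different representations of the same images:

```python
# Same data, same classifier, two representations: raw pixels vs. PCA features.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)   # 8x8 grayscale digit images, flattened
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Representation 1: the raw 64-dimensional pixel vectors.
raw = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
print("raw pixels:       ", raw.score(X_te, y_te))

# Representation 2: the top 16 principal components of those pixels.
pca = make_pipeline(PCA(n_components=16), LogisticRegression(max_iter=5000))
pca.fit(X_tr, y_tr)
print("16 PCA components:", pca.score(X_te, y_te))
```

The accuracies will differ, and neither representation wins on every dataset; that is precisely the point.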
Humans spend most of their day on sequential tasks (driving, eating, typing, walking, and so on). All of these require making a sequence of decisions, and learning such tasks calls for reinforcement learning. Without RL, we would not get very far. Sadly, most introductory ML textbooks ignore this basic and important area, to their discredit. Fortunately, there are excellent specialized books that cover it.
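For readers who want the flavor of RL, here is a minimal tabular Q-learning sketch on a made-up five-state corridor (the environment, rewards, and hyperparameters are all illustrative choices of mine):

```python
# Tabular Q-learning on a toy corridor: start at state 0;
# stepping right from state 4 yields reward 1 and ends the episode.
import random

n_states = 5
Q = [[0.0, 0.0] for _ in range(n_states)]   # Q[state][action]; 0 = left, 1 = right
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(1000):
    s = 0
    for step in range(100):                  # cap episode length for safety
        # Epsilon-greedy action selection, with random tie-breaking.
        if random.random() < epsilon or Q[s][0] == Q[s][1]:
            a = random.randint(0, 1)
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        done = (s == n_states - 1 and a == 1)
        r = 1.0 if done else 0.0
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        # Q-learning update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a').
        target = r + (0.0 if done else gamma * max(Q[s_next]))
        Q[s][a] += alpha * (target - Q[s][a])
        if done:
            break
        s = s_next

print([round(max(q), 2) for q in Q])   # learned values rise toward the goal state
```

No labels anywhere: the agent improves purely from trial, error, and delayed reward, which is what makes RL a different formulation from supervised learning.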
Let me end with two famous maxims about learning a topic from the legendary physicist Richard Feynman. First: “What I cannot create, I do not understand.” What he meant was that unless you can recreate an idea or an algorithm yourself, you probably haven’t understood it well enough. Second: “Know how to solve every problem that has already been solved.” This maxim is about making sure you understand what has been done before. For most of us, these are hard principles to follow, but to the extent you can follow them, you will find your way to mastery of any field, including machine learning. Good luck!
I discovered AI in early 1982, when I chanced upon Doug Hofstadter’s Pulitzer-winning first book (still his best), “Gödel, Escher, Bach: An Eternal Golden Braid”, at a book fair in New Delhi, India. It changed my life. I’ve spent the better part of the past four decades doing research in ML and AI, watching the field emerge from the shadows, where it was studied by a small band of enthusiasts like me, to today, when it seems to rule the tech industry as a trillion-dollar technology.
Then, as now, the biggest problem with AI is making accurate predictions about AI itself. Herbert Simon, one of the founders of the field, made some wild predictions back in the 1960s about where the field would be in ten years. He was off by about 50 years. But, as predictions go, that’s not all that bad.
Most sci-fi flicks of that era predicted flying cars by now (recall the opening scene of Blade Runner), and other than in Chitty Chitty Bang Bang and some James Bond movies, we don’t seem remotely close to getting flying cars. Compared to such predictions, AI has done rather well. Thanks to a whole host of related inventions, from the smartphone to the internet and cloud computing, the reach of AI is more pervasive than ever. Where will AI be in the next 50 years?
This gets to my answer to the question. Forty years ago, I found machine learning the most fascinating field I could possibly study. I no longer think that. The strengths and weaknesses of machine learning have become apparent in the ensuing decades. It’s best explained by an analogy, and I love analogies (as does Doug Hofstadter, whose recent book “Surfaces and Essences” is subtitled “Analogy as the Fuel and Fire of Thinking”).
Imagine you are fascinated, as the ancients were, by the possibility of human-powered flight. Every culture I know of has humans soaring through the air like birds in its mythology. In Greek mythology, Daedalus fashioned wings of feathers and wax to help him and his son, Icarus, escape from imprisonment. Sadly, Icarus flew too close to the sun, not heeding his father’s warning, and perished. Indian mythology is riddled with stories of flying machines.
We now have flying machines that whisk us across continents at nearly the speed of sound. But we need huge airports, mile-long runways, jet fuel, seat belts, and all the paraphernalia of modern air travel (don’t get me started on TSA security checks). Where’s our dream of human-powered flight, soaring like the birds? Gone into mythology, where it shall remain.
ML is in a similar state. Many of us dreamed, 40 years ago, that our machines would learn as we do: like children, curious about the world, becoming fluent in many languages, helping us in our old age, becoming our intellectual companions. Alas, that remains largely a pipe dream.
Modern ML, like modern air travel, is a completely different enterprise. It needs huge labeled datasets (now reaching petabytes). It’s notoriously brittle: recent single-pixel attacks have shown that deep learning, our best ML technology, can be fooled by altering a single pixel of an image. If you cater to its every whim, it can be successful, but it is no match for human learning, just as a modern 747 is no match for the garden sparrows that flit about my backyard.
So I am curious whether AI will ever give us truly intelligent machines that soar in the sky as birds do, without all the trappings of modern airliners, or whether it will forever be consigned to the same fate: mile-long runways, jet fuel, seat belts, and TSA security checks.
So, like the great MLK, I too have a dream. I dream of the day when machine learners become like human learners: like children, eternally curious about the world, not dependent on terabyte-sized labeled datasets or on careful human tweaking of parameters and architectures. Is this a pipe dream? Will we reach that promised land? Or, as with modern air travel, is it our fate to endure intrusive TSA checks whenever we feel like soaring with the birds?