What is the purpose of using a decision tree?
I don't understand the purpose of a decision tree. The way I see it is, it is a series of if-else. Why don't I just use if-else instead of using a decision tree? Is it because it decreases the complexity of my code?
I am still spared from calculating entropy and information gain because there are prebuilt algorithms (like ID3) where I just plug in the rules, right?
Why do we use it with machine learning now? Is it because we no longer have to come up with the rules ourselves, as we did before? The machine learns from the training data and, based on the attributes, can predict a result?
Does implementing ML in my code decrease overhead more and it makes my code less complex, more effective, faster?
machine-learning
asked Mar 19 at 12:26
57913
6
It's not about the code, it's about the model.
– Sycorax
Mar 19 at 13:28
6
"Does implementing ML in my code decrease overhead more and it makes my code less complex, more effective, faster?" More effective, depending on what your code does, but otherwise no. ML doesn't exist to make your code less complex or more performant (it tends to have the opposite effect). ML exists to automate creation of algorithms based on sample data. Usually this isn't necessary because programmers can just write effective algorithms, but sometimes that's way too hard to do, which is where ML comes in.
– DarthFennec
Mar 19 at 17:54
Please do not cross-post. That is against SE policy for just this reason; it wastes a lot of people's time.
– gung♦
Mar 19 at 18:59
@DarthFennec Quotable!
– Jim
Mar 19 at 20:33
3 Answers
A decision tree is a partitioning of the problem domain into subsets by means of conditions. It is usually implemented as cascaded if-then-else statements. You can see it as a term that describes complex decision logic.
Decision trees are neither more efficient nor more "supportive" of machine learning than logical tests. They are logical tests.
Also keep in mind that any algorithm is nothing more than a combination of arithmetic computations and tests, i.e. a (usually huge) decision tree.
For completeness, let us mention that in some contexts, such as machine learning, complex decision trees are built automatically by algorithms. But this doesn't change their nature.
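To illustrate the point that a decision tree and cascaded if-then-else tests are the same logic, here is a small Python sketch. The features (`temperature`, `humidity`), the thresholds and the labels are invented for this example; the only real difference between the two versions is that the second stores the tests as data, which is what lets an algorithm build them.

```python
def classify_hardcoded(temperature, humidity):
    """Hand-written cascade of tests (hypothetical rules)."""
    if temperature < 20:
        return "stay in"
    else:
        if humidity < 70:
            return "go out"
        else:
            return "stay in"


# The same logic stored as data. An inner node is a tuple
# (feature, threshold, subtree_if_below, subtree_otherwise); a leaf is a label.
tree = ("temperature", 20, "stay in", ("humidity", 70, "go out", "stay in"))


def classify_tree(node, sample):
    """Walk the tree until a leaf (a plain label) is reached."""
    while isinstance(node, tuple):
        feature, threshold, below, otherwise = node
        node = below if sample[feature] < threshold else otherwise
    return node


print(classify_hardcoded(25, 40))
print(classify_tree(tree, {"temperature": 25, "humidity": 40}))
```

Both functions partition the (temperature, humidity) plane into the same three regions.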
The way I see it is, it is a series of if-else. Why don't I just use if-else instead of using a decision tree?
You are absolutely right. A decision tree is nothing else but a series of if-else statements. However, it is the way we interpret these statements as a tree that lets us build these rules automatically. That is, given a set of input examples $(x_1, y_1), \dots, (x_N, y_N)$, what is the best set of rules for predicting the value $y$ of a new input $x$? ID3 and similar algorithms let us create these rules automatically. It is not really about the tree once built; it is about how we created it.
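As a rough illustration of how an ID3-style learner picks a rule from data instead of having a programmer write it, here is a minimal pure-Python sketch. The toy weather rows, the attribute names and the helper names (`entropy`, `information_gain`, `best_attribute`) are made up for this example; a full ID3 implementation would go on to recurse on each resulting subset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """The attribute an ID3-style learner would test at the root."""
    return max(rows[0].keys(), key=lambda a: information_gain(rows, labels, a))


# Hypothetical training data: the label depends only on "outlook".
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["yes", "yes", "no", "no"]

root = best_attribute(rows, labels)
print("first test in the tree:", root)
```

Nobody wrote the rule "test the outlook first"; it falls out of the data because that split removes the most uncertainty about the label.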
Apart from that, one hardly ever uses a decision tree alone, the reason being precisely what you say: it is a pretty simplistic model that lacks expressiveness. However, it has one big advantage over other models: a single decision tree can be computed quite fast. That means we can come up with algorithms that train many, many decision trees on big datasets (boosting, e.g. AdaBoost and GradientBoosting). Such a collection of simplistic models (often several hundred of them, called a forest) can then express much more complicated shapes.
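To make the "many simple trees" idea concrete, the following is a hedged sketch of boosting for regression with squared loss, where every tree is a one-split stump fitted to the current residuals. It is not the AdaBoost or GradientBoosting implementation of any particular library; the function names, the learning rate of 0.5 and the sampled sine data are assumptions chosen for illustration.

```python
import math

LEARNING_RATE = 0.5  # assumed shrinkage factor for this sketch


def fit_stump(xs, residuals):
    """Best one-split regression tree (least squared error) for the residuals."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict(stumps, x):
    """Sum of the (shrunken) predictions of all stumps; each stump is one if-else."""
    return sum(LEARNING_RATE * (lo if x < t else hi) for t, lo, hi in stumps)


def boost(xs, ys, rounds):
    """Repeatedly fit a stump to whatever the current ensemble still gets wrong."""
    stumps = []
    for _ in range(rounds):
        residuals = [y - predict(stumps, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return stumps


def mean_squared_error(stumps, xs, ys):
    return sum((y - predict(stumps, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [2 * math.pi * i / 40 for i in range(40)]
ys = [math.sin(x) for x in xs]

errors = {rounds: mean_squared_error(boost(xs, ys, rounds), xs, ys)
          for rounds in (0, 1, 50)}
for rounds, err in errors.items():
    print(rounds, "stumps -> training MSE", err)
```

Each stump on its own is a very crude model, but the training error keeps shrinking as more of them are added.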
You could also imagine it like this: given a 'nice' (i.e. continuous) but complicated function $f : [a,b] \to \mathbb{R}$, we could try to approximate it with a single line. If the function is complicated (like $\sin(x)$ or so), this produces a big error. However, we could combine lines: divide the interval $[a,b]$ into smaller parts $a = a_0 < a_1 < \dots < a_M = b$ and, on each $(a_i, a_{i+1})$, approximate $f|_{(a_i, a_{i+1})}$ (that is, $f$ restricted to this interval) by a line. By basic analysis, we can then approximate the function arbitrarily closely (i.e. make the error arbitrarily small) if we take enough lines. Hence, we have built a complicated but accurate model from very simple ones. That is exactly the same idea that (for example) GradientBoosting uses: it builds a forest from very 'stupid' single decision trees.
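The line-by-line approximation described above can be tried out numerically. This is only a sketch under some assumptions: the subintervals are of equal width, each line simply interpolates $f$ at the two endpoints of its subinterval, and the helper names `piecewise_linear` and `max_error` are invented here.

```python
import math


def piecewise_linear(f, a, b, pieces):
    """Return g, which interpolates f linearly on `pieces` equal parts of [a, b]."""
    knots = [a + (b - a) * i / pieces for i in range(pieces + 1)]
    values = [f(k) for k in knots]

    def g(x):
        # Find the subinterval that contains x, then interpolate inside it.
        i = min(int((x - a) / (b - a) * pieces), pieces - 1)
        t = (x - knots[i]) / (knots[i + 1] - knots[i])
        return (1 - t) * values[i] + t * values[i + 1]

    return g


def max_error(f, g, a, b, samples=1000):
    """Largest absolute difference between f and g on an evenly spaced grid."""
    grid = (a + (b - a) * k / samples for k in range(samples + 1))
    return max(abs(f(x) - g(x)) for x in grid)


a, b = 0.0, 2 * math.pi
errors = []
for pieces in (1, 4, 16, 64):
    g = piecewise_linear(math.sin, a, b, pieces)
    errors.append(max_error(math.sin, g, a, b))
    print(pieces, "lines -> max error", errors[-1])
```

A single line is useless for a full period of the sine, while a few dozen short lines follow it closely.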
answered Mar 19 at 13:25
Fabian Werner
2
The other big plus is being accessible for human inspection ("aaah, so that's why!").
– dedObed
Mar 19 at 21:46
1
Yeah, decision trees are great for explaining to people without a stats background because they are very intuitive.
– qwr
Mar 19 at 21:57
Just adding to @Fabian Werner’s answer - do you remember doing Riemann Sums rule in an intro to integration? Well that too was a set of evenly partitioned if statements which you use to calculate the area under the function.
If you draw a 1D function and draw the partitions evenly what you will find is that in areas where the function has little gradient, neighboring partitions can be merged together without a great loss in accuracy. Equally, in partitions with high gradient adding more partitions will significantly improve the approximation.
Any set of partitions will approximate the function but some are clearly better than others.
Now, moving to CART models - we see data in the form of noisy points from this function and we are asked to approximate the function. By adding too many partitions we can overfit and essentially perform a nearest neighbor type model. To avoid this we limit the number of partitions our model can use (usually in the form of max depth and min samples per split). So now where should we place these splits? That is the question addressed by the splitting criteria. Areas with higher “complexity” should receive more splits as a rule of thumb and that is what gini, entropy, etc. endeavour to do.
Making predictions are just if-else statements but in the context of machine learning that is not where the power of the model comes from. The power comes from the model's ability to trade off over and under fit in a scalable manner and can be derived in a consistent probabilistic framework with theoretical guarantees in the limit of data. Finally, if we take a similar abstracted view of ML models we can say neural networks, kernel methods, Monte Carlo approaches and many more are simply addition and multiplication. Unfortunately, that is not a very useful view of the literature.
edited Mar 22 at 6:51
answered Mar 21 at 21:12
j__
$begingroup$
A decision tree is a partitioning of the problem domain into subsets, by means of conditions. It is usually implemented as cascaded if-then-elses. You can see it as a term that describes a complex decision logic.
Decision trees are neither more efficient nor more "supportive" of machine learning than logical tests. They are logical tests.
Also keep in mind that any algorithm is nothing more than a combination of arithmetic computations and tests, i.e. a (usually huge) decision tree.
For completeness, let us mention that in some contexts, such as machine learning, complex decision trees are built automatically, by algorithms. But this doesn't change their nature.
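As an illustration of the point that a tree is the same thing as cascaded tests, here is a small sketch with a hand-written rule and an equivalent tree stored as data. The feature names, thresholds and outcomes are hypothetical; a learning algorithm would produce the data structure, but evaluating it is still just a chain of comparisons.

```python
def decide_by_hand(temperature, humidity):
    """A decision tree written directly as cascaded if-then-else tests."""
    if temperature < 15:
        return "stay in"
    if humidity < 70:
        return "go out"
    return "stay in"


# The same tree as data.
# Internal node: (feature_name, threshold, subtree_if_below, subtree_otherwise).
# Leaf: a plain string holding the decision.
TREE = ("temperature", 15, "stay in", ("humidity", 70, "go out", "stay in"))


def decide(tree, sample):
    """Walk the tree from the root until a leaf (a string) is reached."""
    while not isinstance(tree, str):
        feature, threshold, below, otherwise = tree
        tree = below if sample[feature] < threshold else otherwise
    return tree


sample = {"temperature": 20, "humidity": 50}
print(decide_by_hand(**sample), decide(TREE, sample))  # go out go out
```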
$endgroup$
answered Mar 19 at 13:54
Yves Daoust
6
$begingroup$
It's not about the code, it's about the model.
$endgroup$
– Sycorax
Mar 19 at 13:28
6
$begingroup$
"Does implementing ML in my code decrease overhead more and it makes my code less complex, more effective, faster?" More effective, depending on what your code does, but otherwise no. ML doesn't exist to make your code less complex or more performant (it tends to have the opposite effect). ML exists to automate creation of algorithms based on sample data. Usually this isn't necessary because programmers can just write effective algorithms, but sometimes that's way too hard to do, which is where ML comes in.
$endgroup$
– DarthFennec
Mar 19 at 17:54
$begingroup$
Please do not cross-post. That is against SE policy for just this reason; it wastes a lot of people's time.
$endgroup$
– gung♦
Mar 19 at 18:59
$begingroup$
@DarthFennec Quotable!
$endgroup$
– Jim
Mar 19 at 20:33