What is the purpose of using a decision tree?




I don't understand the purpose of a decision tree. The way I see it, it is a series of if-else statements. Why don't I just use if-else instead of a decision tree? Is it because it decreases the complexity of my code?



Am I still spared from calculating entropy and information gain, because there are prebuilt algorithms for them where I just plug in the rules? (Like ID3)



Why do we use it with machine learning now? Is it because we no longer have to come up with the rules ourselves, whereas before we had to? The machine learns from the training data and, based on the attributes, it can predict a result?



Does implementing ML in my code reduce overhead and make my code less complex, more effective, and faster?










machine-learning
















asked Mar 19 at 12:26 by 57913







  • It's not about the code, it's about the model.
    – Sycorax
    Mar 19 at 13:28

  • "Does implementing ML in my code decrease overhead more and it makes my code less complex, more effective, faster?" More effective, depending on what your code does, but otherwise no. ML doesn't exist to make your code less complex or more performant (it tends to have the opposite effect). ML exists to automate creation of algorithms based on sample data. Usually this isn't necessary because programmers can just write effective algorithms, but sometimes that's way too hard to do, which is where ML comes in.
    – DarthFennec
    Mar 19 at 17:54

  • Please do not cross-post. That is against SE policy for just this reason; it wastes a lot of people's time.
    – gung
    Mar 19 at 18:59

  • @DarthFennec Quotable!
    – Jim
    Mar 19 at 20:33












3 Answers


> The way I see it is, it is a series of if-else. Why don't I just use if-else instead of using a decision tree?




You are absolutely right: a decision tree is nothing more than a series of if-else statements. However, interpreting these statements as a tree is what lets us build the rules automatically. That is, given a set of examples $(x_1, y_1), \dots, (x_N, y_N)$, what is the best set of rules describing the value of $y$ for a new input $x$? ID3 and similar algorithms let us create these rules automatically. It is not really about the tree once it is built; it is about how we built it.
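Here is a minimal sketch of how ID3-style learning produces such rules. It assumes a tiny made-up dataset: the attributes `outlook` and `windy` and the labels `play`/`stay` are hypothetical. At each node the code picks the attribute with the highest information gain and recurses. At the end it prints the learned tree as nested if-else statements. It handles categorical attributes only and does no pruning.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the categorical attribute `attr`."""
    n = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Return a leaf label (str) or a node (attr, {value: subtree})."""
    if len(set(labels)) == 1:
        return labels[0]  # pure node: stop
    if not attrs:
        return Counter(labels).most_common(1)[0][0]  # no tests left: majority vote
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in sorted(set(row[best] for row in rows)):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = id3([rows[i] for i in idx],
                              [labels[i] for i in idx],
                              [a for a in attrs if a != best])
    return (best, branches)


def to_if_else(tree, indent=0):
    """Render a learned tree as nested if/elif lines."""
    pad = "    " * indent
    if isinstance(tree, str):
        return [f"{pad}return {tree!r}"]
    attr, branches = tree
    lines = []
    for k, (value, subtree) in enumerate(branches.items()):
        keyword = "if" if k == 0 else "elif"
        lines.append(f"{pad}{keyword} x[{attr!r}] == {value!r}:")
        lines.extend(to_if_else(subtree, indent + 1))
    return lines


def predict(tree, x):
    """Follow the branches matching x until a leaf label is reached."""
    while not isinstance(tree, str):
        attr, branches = tree
        tree = branches[x[attr]]
    return tree


# Hypothetical toy data: should we play outside?
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
    {"outlook": "overcast", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "overcast", "windy": "yes"},
]
labels = ["play", "play", "play", "play", "play", "stay", "stay", "play", "play"]

tree = id3(rows, labels, ["outlook", "windy"])
print("\n".join(to_if_else(tree)))
# Prints:
# if x['outlook'] == 'overcast':
#     return 'play'
# elif x['outlook'] == 'rainy':
#     if x['windy'] == 'no':
#         return 'play'
#     elif x['windy'] == 'yes':
#         return 'stay'
# elif x['outlook'] == 'sunny':
#     return 'play'
```

Nobody wrote the printed rules by hand; they were derived from the nine example rows. A different dataset would yield different rules with no code changes.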



Apart from that, a single decision tree is hardly ever used on its own, precisely for the reason you hint at: it is a rather simplistic model that lacks expressiveness. However, it has one big advantage over other models: a single decision tree can be computed quite fast. That means we can design algorithms that train many decision trees on big datasets, e.g. boosting methods such as AdaBoost and gradient boosting. Such a collection of (often hundreds of) these simple models, called an ensemble or forest, can then express much more complicated shapes.
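To make "many simple trees" concrete, here is a from-scratch sketch of gradient boosting with squared loss. Every weak learner is a depth-1 regression tree (a "stump", i.e. a single if-else). It is a simplified illustration, not the AdaBoost algorithm and not any library's implementation. The target $\sin(x)$, the learning rate `0.1` and the 200 rounds are arbitrary choices.

```python
import math


def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: one constant left of a threshold and another
    right of it, choosing the threshold that minimises the squared error
    (running sums make each candidate threshold O(1) to score)."""
    pairs = sorted(zip(xs, residuals))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    total_sq = sum(r * r for _, r in pairs)
    best = None
    s = sq = 0.0  # running sum and sum of squares of the left part
    for i in range(1, n):
        r = pairs[i - 1][1]
        s += r
        sq += r * r
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal x values
        left_n, right_n = i, n - i
        sse = (sq - s * s / left_n) + ((total_sq - sq) - (total - s) ** 2 / right_n)
        if best is None or sse < best[0]:
            t = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, t, s / left_n, (total - s) / right_n)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean  # literally one if-else


def gradient_boost(xs, ys, n_rounds, learning_rate):
    """Each stump is fit to the current residuals y - p, which are the negative
    gradient of the loss (y - p)**2 / 2, and added with shrinkage."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(h(x) for h in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [2 * math.pi * i / 99 for i in range(100)]
ys = [math.sin(x) for x in xs]

single = gradient_boost(xs, ys, n_rounds=1, learning_rate=1.0)     # one stump alone
ensemble = gradient_boost(xs, ys, n_rounds=200, learning_rate=0.1)  # 200 shrunken stumps
single_mse = mse(single, xs, ys)
ensemble_mse = mse(ensemble, xs, ys)
print(f"1 stump:    training MSE = {single_mse:.4f}")
print(f"200 stumps: training MSE = {ensemble_mse:.4f}")
```

Each round fits a stump to what the current model still gets wrong and adds a shrunken copy of it. The sum of many one-split trees therefore becomes a fine-grained step function that tracks $\sin$ far better than any single stump can.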



You could also picture it like this. Take a 'nice' (i.e. continuous) but complicated function $f : [a,b] \to \mathbb{R}$ and try to approximate it by a single line. If the function is complicated (like $\sin(x)$), this produces a big error. However, we can combine lines: divide the interval $[a,b]$ into smaller parts $a = a_0 < a_1 < \dots < a_M = b$ and, on each $[a_i, a_{i+1}]$, approximate $f|_{[a_i, a_{i+1}]}$ (that is, $f$ restricted to this interval) by a line. Basic analysis tells us that, with enough lines, we can approximate the function arbitrarily closely (i.e. make an arbitrarily small error). Hence, we have built a complicated but accurate model from very simple ones. Gradient boosting, for example, uses exactly this idea: it builds a forest from very 'stupid' single decision trees. Strictly speaking, a regression tree is constant rather than linear on each piece, but the principle is the same.
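The approximation argument can be checked numerically. This sketch is illustrative only; `piecewise_linear` and `max_error` are names made up here. It splits $[0, 2\pi]$ into $M$ equal pieces, replaces $\sin$ on each piece by the line through its endpoints, and reports the largest error on a fine grid.

```python
import bisect
import math


def piecewise_linear(f, a, b, m):
    """Approximate f on [a, b] by m line segments joining the points (a_i, f(a_i))
    for equally spaced a = a_0 < a_1 < ... < a_m = b."""
    knots = [a + (b - a) * i / m for i in range(m + 1)]
    values = [f(t) for t in knots]

    def g(x):
        # Find the piece [a_i, a_{i+1}] containing x, then evaluate its line.
        i = min(max(bisect.bisect_right(knots, x) - 1, 0), m - 1)
        x0, x1 = knots[i], knots[i + 1]
        y0, y1 = values[i], values[i + 1]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return g


def max_error(f, g, a, b, samples=10_000):
    """Largest |f(x) - g(x)| over an evenly spaced grid of sample points."""
    grid = (a + (b - a) * k / samples for k in range(samples + 1))
    return max(abs(f(x) - g(x)) for x in grid)


errors = {}
for m in (1, 2, 4, 8, 16, 32):
    g = piecewise_linear(math.sin, 0.0, 2 * math.pi, m)
    errors[m] = max_error(math.sin, g, 0.0, 2 * math.pi)
    print(f"M = {m:2d}: max error = {errors[m]:.5f}")
```

For $M = 1$ and $M = 2$ every knot falls on a zero of $\sin$, so the approximation is essentially the zero function and the error is about 1. From then on the error drops quickly. For a twice-differentiable $f$, linear interpolation on a piece of width $h$ is off by at most $h^2 \max|f''| / 8$, so doubling $M$ roughly quarters the error.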






answered Mar 19 at 13:25 by Fabian Werner

  • The other big plus is being accessible for human inspection ("aaah, so that's why!").
    – dedObed
    Mar 19 at 21:46

  • Yeah, decision trees are great for explaining to people without a stats background, because they are very intuitive.
    – qwr
    Mar 19 at 21:57



















Just adding to @Fabian Werner's answer: do you remember Riemann sums from an introduction to integration? They too were a set of evenly partitioned if-statements, which you used to calculate the area under the function.
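Here is a small sketch of that view; the helper names are hypothetical. A left Riemann sum replaces $f$ on $[a, b]$ by a step function defined by "if $x$ lies in the $i$-th interval, return $f(a_i)$". The area under that step function is exactly the Riemann sum.

```python
import math


def riemann_step(f, a, b, n):
    """Step function for a left Riemann sum with n equal intervals:
    if x lies in [a_i, a_{i+1}), return f(a_i)."""
    width = (b - a) / n

    def step(x):
        i = min(int((x - a) // width), n - 1)  # which interval contains x?
        return f(a + i * width)

    return step


def area_under_step(step, a, b, n):
    """Exact area under a step function that is constant on each of n equal intervals."""
    width = (b - a) / n
    # evaluate the step function once per interval (at its midpoint) times the width
    return sum(step(a + (i + 0.5) * width) * width for i in range(n))


areas = {}
for n in (4, 16, 64, 256):
    step = riemann_step(math.sin, 0.0, math.pi, n)
    areas[n] = area_under_step(step, 0.0, math.pi, n)
    print(f"n = {n:3d}: area = {areas[n]:.6f}  (exact integral of sin on [0, pi] is 2)")
```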



If you draw a 1D function and partition it evenly, you will find that in regions where the function has a small gradient, neighboring partitions can be merged without much loss of accuracy. Conversely, in regions with a high gradient, adding more partitions significantly improves the approximation.



Any set of partitions will approximate the function, but some are clearly better than others.
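To see that some partitions are better than others, the sketch below compares two piecewise-constant fits with the same number of pieces. One uses an even partition. The other is built greedily: at each step it adds the single cut that lowers the squared error most. That greedy procedure is a simplified, noise-free version of how a regression tree picks splits. The test function $\tanh(20(x - 0.3))$ on $[0, 1]$ is an arbitrary choice that is nearly flat except for a steep step around $x = 0.3$.

```python
import math


def sse(values):
    """Squared error of approximating `values` by their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def total_error(ys, cuts):
    """SSE of the piecewise-constant fit whose pieces are ys[cuts[j]:cuts[j + 1]]."""
    return sum(sse(ys[lo:hi]) for lo, hi in zip(cuts, cuts[1:]))


def even_cuts(n, pieces):
    """Indices splitting n points into `pieces` (nearly) equal-width parts."""
    return [round(n * j / pieces) for j in range(pieces + 1)]


def greedy_cuts(ys, pieces):
    """Top-down splitting: at every step, add the single cut that lowers the SSE most."""
    cuts = [0, len(ys)]
    while len(cuts) - 1 < pieces:
        best = None  # (error reduction, cut index)
        for lo, hi in zip(cuts, cuts[1:]):
            before = sse(ys[lo:hi])
            for s in range(lo + 1, hi):
                gain = before - sse(ys[lo:s]) - sse(ys[s:hi])
                if best is None or gain > best[0]:
                    best = (gain, s)
        cuts = sorted(cuts + [best[1]])
    return cuts


# Noise-free samples of a function that is flat except for a steep step near x = 0.3.
xs = [k / 199 for k in range(200)]
ys = [math.tanh(20 * (x - 0.3)) for x in xs]

even = even_cuts(len(ys), 8)
greedy = greedy_cuts(ys, 8)
print("even   partition SSE:", round(total_error(ys, even), 4))
print("greedy partition SSE:", round(total_error(ys, greedy), 4))
print("greedy cut points:", [round(xs[c], 3) for c in greedy[1:-1]])
```

The even partition spends most of its pieces on flat regions. The greedy one concentrates its cuts around the steep part, where they buy the most accuracy.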



Now, moving to CART models: we observe data in the form of noisy points from this function, and we are asked to approximate the function. By adding too many partitions we can overfit and essentially end up with a nearest-neighbor-type model. To avoid this, we limit the number of partitions the model can use (usually via a maximum depth and a minimum number of samples per split). So where should we place these splits? That is the question the splitting criterion addresses. As a rule of thumb, regions with higher “complexity” should receive more splits, and that is what criteria such as Gini impurity, entropy, etc. endeavour to achieve.
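The role of the splitting criterion and of the size limits can be sketched for a single numeric feature with class labels. The code scores every candidate threshold by the size-weighted Gini impurity or entropy of the two children and keeps the best one. It grows the tree recursively until `max_depth` is reached or a split would leave fewer than `min_samples_leaf` points on a side. Those parameter names mirror the kind of limits mentioned above, but this is a hand-written illustration, not a library's API. The eight data points are made up.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: chance that two labels drawn with replacement disagree."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(xs, ys, criterion, min_samples_leaf):
    """(threshold, weighted impurity) of the best split, or None if no split
    leaves at least `min_samples_leaf` points on each side."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = None
    for i in range(min_samples_leaf, n - min_samples_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * criterion(left) + len(right) * criterion(right)) / n
        if best is None or score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


def build(xs, ys, criterion=gini, max_depth=3, min_samples_leaf=1, depth=0):
    """Grow a tree greedily; max_depth and min_samples_leaf cap the number of partitions."""
    majority = Counter(ys).most_common(1)[0][0]
    if depth == max_depth or len(set(ys)) == 1:
        return majority
    split = best_split(xs, ys, criterion, min_samples_leaf)
    if split is None:
        return majority
    t = split[0]
    left_xs = [x for x in xs if x <= t]
    left_ys = [y for x, y in zip(xs, ys) if x <= t]
    right_xs = [x for x in xs if x > t]
    right_ys = [y for x, y in zip(xs, ys) if x > t]
    return {
        "threshold": t,
        "left": build(left_xs, left_ys, criterion, max_depth, min_samples_leaf, depth + 1),
        "right": build(right_xs, right_ys, criterion, max_depth, min_samples_leaf, depth + 1),
    }


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["a", "a", "a", "b", "b", "b", "a", "a"]
print(build(xs, ys, gini, max_depth=2))
# {'threshold': 3.5, 'left': 'a', 'right': {'threshold': 6.5, 'left': 'b', 'right': 'a'}}
print(build(xs, ys, gini, max_depth=3, min_samples_leaf=3))
# {'threshold': 3.5, 'left': 'a', 'right': 'b'}  (the right part is too small to split)
```

With `min_samples_leaf=3` the five points on the right cannot be split into two children of at least three, so that region stays a single, impure leaf. This is the kind of trade-off between fit and simplicity described above.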



Making predictions is just a matter of if-else statements, but in the context of machine learning that is not where the power of the model comes from. The power comes from two things. First, the model can trade off overfitting and underfitting in a scalable manner. Second, it can be derived in a consistent probabilistic framework with theoretical guarantees in the large-data limit. Finally, taking a similarly abstract view of other ML models, we could say that neural networks, kernel methods, Monte Carlo approaches and many more are simply addition and multiplication. Unfortunately, that is not a very useful view of the literature.







A decision tree is a partitioning of the problem domain into subsets by means of conditions. It is usually implemented as cascaded if-then-else statements. You can see it as a term that describes complex decision logic.
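Here is a minimal sketch of that equivalence, using a hypothetical hand-made tree over two made-up features (`age`, `income`) with made-up labels. The same partition of the input space can be stored as data, i.e. nested nodes like those a learning algorithm produces. It can also be written out as cascaded if-then-else statements. Both forms give identical answers.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    feature: str      # which input to test
    threshold: float  # go left if x[feature] <= threshold
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def predict(tree: Tree, x: dict) -> str:
    """Walk from the root to a leaf; each Node is one if-then-else."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.label


# The decision logic stored as data ...
tree = Node("age", 30,
            Leaf("A"),
            Node("income", 50_000, Leaf("B"), Leaf("C")))


# ... and the same logic written by hand as cascaded if-then-else.
def predict_by_hand(x: dict) -> str:
    if x["age"] <= 30:
        return "A"
    elif x["income"] <= 50_000:
        return "B"
    else:
        return "C"


samples = [{"age": a, "income": i} for a in (20, 30, 31, 60) for i in (10_000, 50_000, 50_001)]
for s in samples:
    print(s, predict(tree, s), predict_by_hand(s))
```

The data form is what makes automatic construction possible: an algorithm can create and rearrange `Node` objects, whereas hand-written `if` statements are fixed in the source code.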



Decision trees are neither more efficient nor more "supportive" of machine learning than logical tests. They are logical tests.

Also keep in mind that any algorithm is nothing more than a combination of arithmetic computations and tests, i.e. a (usually huge) decision tree.

For completeness: in some contexts, such as machine learning, complex decision trees are built automatically by algorithms. But this doesn't change their nature.






    share|cite|improve this answer









    $endgroup$












      Your Answer





      StackExchange.ifUsing("editor", function ()
      return StackExchange.using("mathjaxEditing", function ()
      StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
      StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
      );
      );
      , "mathjax-editing");

      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "65"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader:
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      ,
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );













      draft saved

      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f398322%2fwhat-is-the-purpose-of-using-a-decision-tree%23new-answer', 'question_page');

      );

      Post as a guest















      Required, but never shown

























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      21












      $begingroup$


      The way I see it is, it is a series of if-else. Why don't I just use if-else instead of using a decision tree?




      You are absolutely right. A decision tree is nothing else but a series of if-else statements. However, it is the way we interpret these statements as a tree that lets us build these rules automatically... I.e. given some input example set $(x_1, y_1), ..., (x_N, y_N)$... what is the best set of rules that describes what value $y$ has given a new input $x$? ID3 and alike lets us automatically create these rules. It is not really about the tree once built, it is about how we created it.



      Apart from that one hardly ever uses a decision tree alone, the reason being precisely what you say: it is a pretty simplistic model that lacks expressiveness. However, it has one big advantage over other models: One can compute a single decision tree quite fast. That means that we can come up with algorithms that train many many decision trees (boosting, aka AdaBoost and GradientBoosting) on big datasets. These collection of (usually more than 500) of these simplistic models (called forest) can then express much more complicated shapes.



      You could also imagine it like this: Given a 'nice' (i.e. continuous) but complicated function $f : [a,b] to mathbbR$ we could try to approximate this function using lines. If the function is complicated (like $sin(x)$ or so) then we produce a big error. However, we could combine lines in the way that we divide the interval $[a,b]$ into smaller parts $a = a_0 < a_1 < ... < a_M = b$ and on each $a_i, a_i+1$ we try to approximize $f|_(a_i, a_i+1)$ (that is, $f$ restricted to this interval) by a line. By basic math (analysis) we can then approximate the function arbitrarily close (i.e. make an arbitrarily small error) if we take enough lines. Hence, we built up a complicated but accurate model from very simple ones. That is exactly the same idea that (for example) GradientBoosting uses: It builds a forest from very 'stupid' single decision trees.






      share|cite|improve this answer









      $endgroup$








      • 2




        $begingroup$
        The other big plus is being accessible for human inspection ("aaah, so that's why!").
        $endgroup$
        – dedObed
        Mar 19 at 21:46






      • 1




        $begingroup$
        yeah decision trees are great for explaining to people without stats background because they are very intuitive.
        $endgroup$
        – qwr
        Mar 19 at 21:57















      21












      $begingroup$


      The way I see it is, it is a series of if-else. Why don't I just use if-else instead of using a decision tree?




      You are absolutely right. A decision tree is nothing else but a series of if-else statements. However, it is the way we interpret these statements as a tree that lets us build these rules automatically... I.e. given some input example set $(x_1, y_1), ..., (x_N, y_N)$... what is the best set of rules that describes what value $y$ has given a new input $x$? ID3 and alike lets us automatically create these rules. It is not really about the tree once built, it is about how we created it.



      Apart from that one hardly ever uses a decision tree alone, the reason being precisely what you say: it is a pretty simplistic model that lacks expressiveness. However, it has one big advantage over other models: One can compute a single decision tree quite fast. That means that we can come up with algorithms that train many many decision trees (boosting, aka AdaBoost and GradientBoosting) on big datasets. These collection of (usually more than 500) of these simplistic models (called forest) can then express much more complicated shapes.



      You could also imagine it like this: Given a 'nice' (i.e. continuous) but complicated function $f : [a,b] to mathbbR$ we could try to approximate this function using lines. If the function is complicated (like $sin(x)$ or so) then we produce a big error. However, we could combine lines in the way that we divide the interval $[a,b]$ into smaller parts $a = a_0 < a_1 < ... < a_M = b$ and on each $a_i, a_i+1$ we try to approximize $f|_(a_i, a_i+1)$ (that is, $f$ restricted to this interval) by a line. By basic math (analysis) we can then approximate the function arbitrarily close (i.e. make an arbitrarily small error) if we take enough lines. Hence, we built up a complicated but accurate model from very simple ones. That is exactly the same idea that (for example) GradientBoosting uses: It builds a forest from very 'stupid' single decision trees.






      share|cite|improve this answer









      $endgroup$








      • 2




        $begingroup$
        The other big plus is being accessible for human inspection ("aaah, so that's why!").
        $endgroup$
        – dedObed
        Mar 19 at 21:46






      • 1




        $begingroup$
        yeah decision trees are great for explaining to people without stats background because they are very intuitive.
        $endgroup$
        – qwr
        Mar 19 at 21:57













      21












      21








      21





      $begingroup$


      The way I see it is, it is a series of if-else. Why don't I just use if-else instead of using a decision tree?




      You are absolutely right. A decision tree is nothing else but a series of if-else statements. However, it is the way we interpret these statements as a tree that lets us build these rules automatically... I.e. given some input example set $(x_1, y_1), ..., (x_N, y_N)$... what is the best set of rules that describes what value $y$ has given a new input $x$? ID3 and alike lets us automatically create these rules. It is not really about the tree once built, it is about how we created it.



      Apart from that one hardly ever uses a decision tree alone, the reason being precisely what you say: it is a pretty simplistic model that lacks expressiveness. However, it has one big advantage over other models: One can compute a single decision tree quite fast. That means that we can come up with algorithms that train many many decision trees (boosting, aka AdaBoost and GradientBoosting) on big datasets. These collection of (usually more than 500) of these simplistic models (called forest) can then express much more complicated shapes.



      You could also imagine it like this: Given a 'nice' (i.e. continuous) but complicated function $f : [a,b] to mathbbR$ we could try to approximate this function using lines. If the function is complicated (like $sin(x)$ or so) then we produce a big error. However, we could combine lines in the way that we divide the interval $[a,b]$ into smaller parts $a = a_0 < a_1 < ... < a_M = b$ and on each $a_i, a_i+1$ we try to approximize $f|_(a_i, a_i+1)$ (that is, $f$ restricted to this interval) by a line. By basic math (analysis) we can then approximate the function arbitrarily close (i.e. make an arbitrarily small error) if we take enough lines. Hence, we built up a complicated but accurate model from very simple ones. That is exactly the same idea that (for example) GradientBoosting uses: It builds a forest from very 'stupid' single decision trees.






      share|cite|improve this answer









      $endgroup$




      The way I see it is, it is a series of if-else. Why don't I just use if-else instead of using a decision tree?




      You are absolutely right. A decision tree is nothing else but a series of if-else statements. However, it is the way we interpret these statements as a tree that lets us build these rules automatically... I.e. given some input example set $(x_1, y_1), ..., (x_N, y_N)$... what is the best set of rules that describes what value $y$ has given a new input $x$? ID3 and alike lets us automatically create these rules. It is not really about the tree once built, it is about how we created it.



      Apart from that one hardly ever uses a decision tree alone, the reason being precisely what you say: it is a pretty simplistic model that lacks expressiveness. However, it has one big advantage over other models: One can compute a single decision tree quite fast. That means that we can come up with algorithms that train many many decision trees (boosting, aka AdaBoost and GradientBoosting) on big datasets. These collection of (usually more than 500) of these simplistic models (called forest) can then express much more complicated shapes.



      You could also imagine it like this: Given a 'nice' (i.e. continuous) but complicated function $f : [a,b] to mathbbR$ we could try to approximate this function using lines. If the function is complicated (like $sin(x)$ or so) then we produce a big error. However, we could combine lines in the way that we divide the interval $[a,b]$ into smaller parts $a = a_0 < a_1 < ... < a_M = b$ and on each $a_i, a_i+1$ we try to approximize $f|_(a_i, a_i+1)$ (that is, $f$ restricted to this interval) by a line. By basic math (analysis) we can then approximate the function arbitrarily close (i.e. make an arbitrarily small error) if we take enough lines. Hence, we built up a complicated but accurate model from very simple ones. That is exactly the same idea that (for example) GradientBoosting uses: It builds a forest from very 'stupid' single decision trees.







      share|cite|improve this answer












      share|cite|improve this answer



      share|cite|improve this answer










      answered Mar 19 at 13:25









      Fabian WernerFabian Werner

      1,621516




      1,621516







      • 2




        $begingroup$
        The other big plus is being accessible for human inspection ("aaah, so that's why!").
        $endgroup$
        – dedObed
        Mar 19 at 21:46






      • 1




        $begingroup$
        yeah decision trees are great for explaining to people without stats background because they are very intuitive.
        $endgroup$
        – qwr
        Mar 19 at 21:57












      • 2




        $begingroup$
        The other big plus is being accessible for human inspection ("aaah, so that's why!").
        $endgroup$
        – dedObed
        Mar 19 at 21:46






      • 1




        $begingroup$
        yeah decision trees are great for explaining to people without stats background because they are very intuitive.
        $endgroup$
        – qwr
        Mar 19 at 21:57







      2




      2




      $begingroup$
      The other big plus is being accessible for human inspection ("aaah, so that's why!").
      $endgroup$
      – dedObed
      Mar 19 at 21:46




      $begingroup$
      The other big plus is being accessible for human inspection ("aaah, so that's why!").
      $endgroup$
      – dedObed
      Mar 19 at 21:46




      1




      1




      $begingroup$
      yeah decision trees are great for explaining to people without stats background because they are very intuitive.
      $endgroup$
      – qwr
      Mar 19 at 21:57




      $begingroup$
      yeah decision trees are great for explaining to people without stats background because they are very intuitive.
      $endgroup$
      – qwr
      Mar 19 at 21:57













      1












      $begingroup$

      Just adding to @Fabian Werner’s answer - do you remember doing Riemann Sums rule in an intro to integration? Well that too was a set of evenly partitioned if statements which you use to calculate the area under the function.



      If you draw a 1D function and draw the partitions evenly what you will find is that in areas where the function has little gradient, neighboring partitions can be merged together without a great loss in accuracy. Equally, in partitions with high gradient adding more partitions will significantly improve the approximation.



      Any set of partitions will approximate the function but some are clearly better than others.



      Now, moving to CART models - we see data in the form of noisy points from this function and we are asked to approximate the function. By adding too many partitions we can overfit and essentially perform a nearest neighbor type model. To avoid this we limit the number of partitions our model can use (usually in the form of max depth and min samples per split). So now where should we place these splits? That is the question addressed by the splitting criteria. Areas with higher “complexity” should receive more splits as a rule of thumb and that is what gini, entropy, etc. endeavour to do.



      Making predictions is just a matter of if-else statements, but in the context of machine learning that is not where the power of the model comes from. The power comes from the model's ability to trade off over- and underfitting in a scalable manner, and from the fact that it can be derived in a consistent probabilistic framework with theoretical guarantees in the large-data limit. Finally, if we take a similarly abstract view of other ML models, we can say that neural networks, kernel methods, Monte Carlo approaches and many more are simply addition and multiplication. Unfortunately, that is not a very useful view of the literature.
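To see both halves of this argument in code, the sketch below grows a toy 1-D regression tree. It is written for illustration and is not a library API. Trees are grown greedily with the squared-error criterion under a `max_depth` limit, and prediction walks nested if/else tests. The data are synthetic: a sine curve plus Gaussian noise.

```python
import math
import random


def _sse(values):
    """Sum of squared deviations from the mean (the squared-error criterion)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(xs, ys, max_depth, min_samples_split=2):
    """Greedy 1-D regression tree. Leaves are {"value": v};
    internal nodes are {"threshold": t, "left": ..., "right": ...}."""
    mean = sum(ys) / len(ys)
    if max_depth == 0 or len(ys) < min_samples_split:
        return {"value": mean}
    pairs = sorted(zip(xs, ys))
    best = None  # (sse, split index)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        sse = _sse([y for _, y in pairs[:i]]) + _sse([y for _, y in pairs[i:]])
        if best is None or sse < best[0]:
            best = (sse, i)
    if best is None:
        return {"value": mean}
    i = best[1]
    lx, ly = zip(*pairs[:i])
    rx, ry = zip(*pairs[i:])
    return {
        "threshold": (pairs[i - 1][0] + pairs[i][0]) / 2,
        "left": fit_tree(lx, ly, max_depth - 1, min_samples_split),
        "right": fit_tree(rx, ry, max_depth - 1, min_samples_split),
    }


def predict(node, x):
    # Once fitted, prediction is nothing but a cascade of if/else tests.
    while "value" not in node:
        node = node["left"] if x <= node["threshold"] else node["right"]
    return node["value"]


def mse(tree, xs, ys):
    return sum((predict(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


def truth(x):
    return math.sin(2 * math.pi * x)


random.seed(0)
x_train = [i / 40 for i in range(40)]
y_train = [truth(x) + random.gauss(0, 0.3) for x in x_train]
x_test = [(i + 0.5) / 40 for i in range(40)]
y_test = [truth(x) + random.gauss(0, 0.3) for x in x_test]

for depth in (1, 2, 3, 4, 50):
    tree = fit_tree(x_train, y_train, max_depth=depth)
    print(f"max_depth={depth:>2}  train MSE={mse(tree, x_train, y_train):.3f}  "
          f"test MSE={mse(tree, x_test, y_test):.3f}")
```

Training error can only stay the same or go down as `max_depth` grows. With `max_depth=50`, every training point ends up in its own leaf and the training MSE is zero, which is the nearest-neighbour-like overfitting described above. The held-out error is where the trade-off shows: it is usually lowest at a moderate depth, though the exact numbers depend on the seed.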

















      $endgroup$

























          edited Mar 22 at 6:51

























          answered Mar 21 at 21:12









          j__j__


































              $begingroup$

              A decision tree is a partitioning of the problem domain into subsets by means of conditions. It is usually implemented as cascaded if-then-elses. You can see it as a term that describes complex decision logic.
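Here is a small hand-written example of both views in this answer. The features `temperature_c` and `humidity_pct`, the thresholds and the labels are all hypothetical. The first view is the tree as cascaded if-then-else. The second is the same tree as a partition of the input domain, where each leaf owns the inputs that satisfy the conditions on its root-to-leaf path.

```python
def classify_weather(temperature_c, humidity_pct):
    """A hypothetical decision tree written as cascaded if-then-else."""
    if temperature_c <= 15:
        return "cold"
    elif humidity_pct <= 60:
        return "warm-dry"
    else:
        return "warm-humid"


# The same tree viewed as a partition: one subset of the domain per leaf.
REGIONS = {
    "cold": lambda t, h: t <= 15,
    "warm-dry": lambda t, h: t > 15 and h <= 60,
    "warm-humid": lambda t, h: t > 15 and h > 60,
}

# On a grid of inputs, exactly one region matches each point,
# and it is the leaf the if-then-else chain returns.
for t in range(-10, 40):
    for h in range(0, 101, 5):
        matches = [name for name, cond in REGIONS.items() if cond(t, h)]
        assert matches == [classify_weather(t, h)]
```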



              Decision trees are neither more efficient nor more "supportive" of machine learning than logical tests. They are logical tests.



              Also keep in mind that any algorithm is nothing more than a combination of arithmetic computations and tests, i.e. a (usually huge) decision tree.
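One way to see this claim for a fixed input size is to look at sorting. Sorting three values with comparisons is exactly a decision tree: the internal nodes are comparisons, and the 3! = 6 leaves are the possible orderings. This is a hand-unrolled illustration. For inputs of unbounded size, the corresponding tree grows with the input.

```python
def sort3(a, b, c):
    """Sort three values using only comparisons, written as an explicit decision tree.

    Each `if` is an internal node; each `return` is one of the 3! = 6 leaves.
    """
    if a <= b:
        if b <= c:
            return (a, b, c)
        elif a <= c:
            return (a, c, b)
        else:
            return (c, a, b)
    else:
        if a <= c:
            return (b, a, c)
        elif b <= c:
            return (b, c, a)
        else:
            return (c, b, a)


print(sort3(3, 1, 2))  # (1, 2, 3)
```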




              For completeness, let us mention that in some contexts, such as machine learning, complex decision trees are built automatically by algorithms. But this doesn't change their nature.
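The sketch below illustrates that an automatically built tree is still just logical tests. It takes a tree stored as nested dicts and generates the equivalent plain Python `if`/`else` function. The dict format and the example tree are hypothetical; they stand in for the output of a learning algorithm.

```python
def compile_tree(node, features):
    """Turn a nested-dict tree into an ordinary Python function made of if/else."""
    lines = [f"def predict({', '.join(features)}):"]

    def emit(n, depth):
        pad = "    " * depth
        if "value" in n:
            lines.append(f"{pad}return {n['value']!r}")
        else:
            lines.append(f"{pad}if {n['feature']} <= {n['threshold']!r}:")
            emit(n["left"], depth + 1)
            lines.append(f"{pad}else:")
            emit(n["right"], depth + 1)

    emit(node, 1)
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)  # only do this with trees you built yourself
    return namespace["predict"], source


# A tree in the shape a learning algorithm might produce (values are made up).
learned = {
    "feature": "x", "threshold": 0.5,
    "left": {"value": "A"},
    "right": {
        "feature": "y", "threshold": 2.0,
        "left": {"value": "B"},
        "right": {"value": "C"},
    },
}

predict, source = compile_tree(learned, ["x", "y"])
print(source)
```

The printed source is a plain nested `if x <= 0.5: ... else: if y <= 2.0: ...` function. Nothing about it reveals whether a person or an algorithm chose the thresholds.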















              $endgroup$



























                  answered Mar 19 at 13:54









                  Yves Daoust




























