How to correctly apply the same data transformation, used on the training dataset, to real data in a...














Let's say I used MinMaxScaler while creating my model.
Now I'm loading that model via Pickle in a Flask app. When a request containing a data point arrives, I would like to apply to it the same transformations that I applied to my training dataset before calling the predict() method. How do I transfer that set of transformations from one file to a web service?










Tags: machine-learning, data














edited 3 hours ago by Ethan; asked 16 hours ago by Blenzus






















2 Answers





















Rather than storing and loading many files, create a scikit-learn Pipeline containing all of your transformations and the model, fit it once, and save it as a single pickle or joblib file.



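    import joblib  # sklearn.externals.joblib was removed in newer scikit-learn
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler

    pipeline = Pipeline([
        ('normalization', MinMaxScaler()),
        ('classifier', RandomForestClassifier())
    ])

    # Fit scaler and classifier together (X_train, y_train: your training data)
    pipeline.fit(X_train, y_train)

    # Save the fitted pipeline (scaler + model) to a single file
    joblib.dump(pipeline, 'transform_predict.joblib')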


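In the Flask app, you then only need to load that one file and call predict() on the incoming data. The pipeline runs the data through the already-fitted scaler's transform() before the classifier; do not call fit() or fit_transform() on new data, since that would refit the scaler to it: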



    import joblib
    pipeline = joblib.load('transform_predict.joblib')
    predictions = pipeline.predict(new_data)  # new_data: 2-D array with the training columns
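The whole round trip can be sketched end to end as below. This is a minimal sketch under some assumptions: it uses the standard-library pickle module (the question mentions Pickle) instead of joblib, toy numbers in place of a real training set, and a hypothetical FEATURES list and predict_from_payload helper standing in for whatever the Flask view receives as JSON.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature names; use the columns your model was trained on
FEATURES = ["age", "income"]

# --- Training script ---
X_train = np.array([[25, 30000], [40, 80000], [35, 50000], [60, 120000]], dtype=float)
y_train = np.array([0, 1, 0, 1])

pipeline = Pipeline([
    ("normalization", MinMaxScaler()),
    ("classifier", RandomForestClassifier(n_estimators=10, random_state=0)),
])
pipeline.fit(X_train, y_train)  # the scaler learns min/max from X_train only

path = os.path.join(tempfile.mkdtemp(), "transform_predict.pkl")
with open(path, "wb") as f:
    pickle.dump(pipeline, f)

# --- Web service: load once, e.g. when the Flask app starts ---
with open(path, "rb") as f:
    loaded = pickle.load(f)


def predict_from_payload(payload):
    """Build a 1-row 2-D array from a JSON-like dict and predict.

    predict() runs the fitted scaler's transform() first, so the point is
    scaled with the training min/max and nothing is refitted.
    """
    row = np.array([[float(payload[name]) for name in FEATURES]])
    return int(loaded.predict(row)[0])


print(predict_from_payload({"age": 30, "income": 45000}))
```

Inside a Flask view the payload would typically come from request.get_json(). Loading the file once at start-up, rather than on every request, avoids re-reading it per call; and only unpickle files you trust, since unpickling can execute arbitrary code.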














edited 15 hours ago; answered 15 hours ago by Dan Carter








• Thanks, this is what I was looking for. – Blenzus, 15 hours ago

• Does this apply to dummy variables? – Blenzus, 15 hours ago

• If you're using scikit-learn's OneHotEncoder, then yes. Any scikit-learn 'transformer' can be used in a pipeline, i.e. anything that implements TransformerMixin and BaseEstimator: github.com/scikit-learn/scikit-learn/blob/7b136e9/sklearn/… This also means you can create your own custom transformers to add to a pipeline by implementing these in the same way. – Dan Carter, 14 hours ago
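For the dummy-variable case, one possible sketch combines both points from this thread: a ColumnTransformer that one-hot encodes a categorical column with OneHotEncoder and scales a numeric one, plus a small custom transformer built on TransformerMixin and BaseEstimator. The column names, toy data, the Log1pTransformer class and the choice of LogisticRegression are all hypothetical, and pickle is used for the round trip.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder


# Hypothetical custom step; the mixin is listed before BaseEstimator
class Log1pTransformer(TransformerMixin, BaseEstimator):
    """Applies log(1 + x) to numeric columns; it has nothing to learn."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.log1p(np.asarray(X, dtype=float))


# Toy training data with one categorical column (hypothetical names)
train = pd.DataFrame({
    "city": ["paris", "london", "paris", "berlin"],
    "income": [30000.0, 80000.0, 50000.0, 120000.0],
})
y = [0, 1, 0, 1]

preprocess = ColumnTransformer([
    # Dummy variables: categories are learned at fit time; unseen ones encode as all zeros
    ("dummies", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ("numeric", Pipeline([("log", Log1pTransformer()), ("scale", MinMaxScaler())]), ["income"]),
])

model = Pipeline([("preprocess", preprocess), ("classifier", LogisticRegression())])
model.fit(train, y)

# One serialized object holds the encoder, the custom step, the scaler and the classifier
restored = pickle.loads(pickle.dumps(model))

request_row = pd.DataFrame({"city": ["madrid"], "income": [45000.0]})
print(restored.predict(request_row))
```

With handle_unknown="ignore", a category never seen in training (here "madrid") is encoded as all zeros instead of raising an error. Because pickle stores only a reference to Log1pTransformer, the Flask app must be able to import that class under the same module path it had when the model was saved.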

















You need to save the fitted MinMaxScaler (along with the model). In the Flask app, you can:




1. Load the fitted scaler from the file.

2. Use that instance to scale the input values with transform(), not fit_transform().



While training:



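    import joblib  # sklearn.externals.joblib was removed in newer scikit-learn

    # scaler: the MinMaxScaler already fitted on the training data
    scaler_filename = "saved_scaler"
    joblib.dump(scaler, scaler_filename)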


In the Flask app:



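    import joblib
    scaler_filename = "saved_scaler"
    scaler = joblib.load(scaler_filename)
    scaled_values = scaler.transform(input_values)  # input_values (hypothetical): 2-D array of request features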
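To see why step 2 matters, here is a small sketch with toy one-feature data (the values and the temporary path are made up for illustration): the restored scaler maps a new point using the training min and max, whereas fitting a fresh MinMaxScaler on a single request collapses it to 0.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# While training: fit the scaler on the training features only (toy data)
X_train = np.array([[0.0], [10.0]])
scaler = MinMaxScaler().fit(X_train)
scaler_filename = os.path.join(tempfile.mkdtemp(), "saved_scaler")
joblib.dump(scaler, scaler_filename)

# In the Flask app: restore the fitted scaler instead of creating a new one
restored = joblib.load(scaler_filename)
datapoint = np.array([[5.0]])  # one incoming request, shaped (1, n_features)

print(restored.transform(datapoint))            # [[0.5]]: uses the training min/max
print(MinMaxScaler().fit_transform(datapoint))  # [[0.]]: refitting on one point loses the scale
```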

















answered 16 hours ago by Shamit Verma












• Do I need to do this for every normalization library I use? I'll be loading many files into memory. Isn't there a way to load something that contains every step of the data transformations? – Blenzus, 16 hours ago

• You can save and load all scalers at the same time. Example: stackoverflow.com/questions/33497314/… – Shamit Verma, 15 hours ago
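A minimal sketch of saving several scalers at once, assuming joblib and a plain dict as the container (the linked post may structure it differently, and the per-column scalers and key names here are hypothetical): everything fitted during training goes into one object, which is dumped to and loaded from one file.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X_train = np.array([[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]])

# Hypothetical setup: a different scaler per column, each fitted on training data
artifacts = {
    "minmax_col0": MinMaxScaler().fit(X_train[:, [0]]),
    "standard_col1": StandardScaler().fit(X_train[:, [1]]),
}

path = os.path.join(tempfile.mkdtemp(), "preprocessing.joblib")
joblib.dump(artifacts, path)  # one file for every fitted transformer

# In the Flask app: a single load restores all of them
loaded = joblib.load(path)
x = np.array([[4.0, 500.0]])  # one incoming data point
scaled = np.hstack([
    loaded["minmax_col0"].transform(x[:, [0]]),
    loaded["standard_col1"].transform(x[:, [1]]),
])
print(scaled)
```

The Flask app still has to apply each object to the right columns in the right order by hand; the Pipeline approach in the other answer stores that ordering inside the saved object as well.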

















