How to correctly apply the same data transformation, used on the training dataset, to real data in a webservice?
Let's say I used MinMaxScaler while creating my model.
Now I'm loading that model via pickle in a Flask app. Upon receiving a request containing a data point, I would like to apply to it the same transformations that I applied to my training dataset before calling the predict() method. How do I transfer that set of transformations from one file to a webservice?
machine-learning data
asked 16 hours ago by Blenzus (new contributor); edited 3 hours ago by Ethan
2 Answers
Rather than storing and loading many separate files, create a scikit-learn transformation Pipeline that contains all of your transformations together with the final estimator, and save that single pipeline as a pickle or joblib file.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.externals import joblib  # on scikit-learn >= 0.23, use `import joblib` directly

pipeline = Pipeline([
    ('normalization', MinMaxScaler()),
    ('classifier', RandomForestClassifier())
])
pipeline.fit(X_train, y_train)  # X_train, y_train: your training data; fit scaler and classifier together
joblib.dump(pipeline, 'transform_predict.joblib')
You can then load just that one pipeline object and call predict(); the pipeline applies the already-fitted transformations to the input data before the classifier produces its prediction:
pipeline = joblib.load('transform_predict.joblib')
predictions = pipeline.predict(new_data)
answered 15 hours ago, edited 15 hours ago – Dan Carter
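For completeness, a minimal sketch of how this could look inside the Flask app from the question. This is an added illustration, not part of the answer above; the route, JSON field name and array shape are assumptions.

# hypothetical Flask endpoint; route and payload layout are illustrative assumptions
import joblib
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)
pipeline = joblib.load('transform_predict.joblib')  # load the fitted pipeline once at startup

@app.route('/predict', methods=['POST'])
def predict():
    payload = request.get_json()                              # e.g. {"features": [0.1, 2.3, 4.5]}
    features = np.array(payload['features']).reshape(1, -1)   # one data point -> 2D array
    prediction = pipeline.predict(features)                   # scaling + classification in one call
    return jsonify({'prediction': prediction.tolist()})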
Thanks, this is what I was looking for. – Blenzus, 15 hours ago
Does this apply to dummy variables? – Blenzus, 15 hours ago
If you're using scikit-learn's OneHotEncoder, then yes. Any scikit-learn 'transformer' can be used in a pipeline, i.e. anything that implements TransformerMixin and BaseEstimator: github.com/scikit-learn/scikit-learn/blob/7b136e9/sklearn/… This also means you can create your own custom 'transformers' to add to a pipeline by implementing these in the same way (a minimal sketch follows below). – Dan Carter, 14 hours ago
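To illustrate that last point, here is one way such a custom transformer might look. This is an added sketch: the class and its log-scaling step are made up; only the TransformerMixin/BaseEstimator pattern comes from the comment.

# illustrative custom transformer; the log1p step is an arbitrary example
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class LogTransformer(BaseEstimator, TransformerMixin):
    """Applies log1p to every feature; stateless, so fit() just returns self."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.log1p(X)

# it can then be added to the pipeline like any built-in step, e.g.
# Pipeline([('log', LogTransformer()), ('normalization', MinMaxScaler()), ('classifier', RandomForestClassifier())])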
You need to save the MinMaxScaler along with the model. In the Flask app, you can then:
- load the scaler from its file, and
- use that fitted scaler instance to scale the incoming input values.
# while training
from sklearn.externals import joblib  # on scikit-learn >= 0.23, use `import joblib` directly

scaler_filename = "saved_scaler"
joblib.dump(scaler, scaler_filename)   # scaler is the MinMaxScaler fitted on the training set

# in the Flask app
scaler_filename = "saved_scaler"
scaler = joblib.load(scaler_filename)
answered 16 hours ago – Shamit Verma
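A brief added sketch of how the reloaded scaler and the separately loaded model might then be used together on an incoming data point; the function name and JSON field are assumptions, not part of the answer above.

# hypothetical request handling; 'model' is the separately unpickled estimator
import numpy as np

def predict_from_request(payload, scaler, model):
    features = np.array(payload['features']).reshape(1, -1)  # single data point -> 2D array
    scaled = scaler.transform(features)                      # reuse the scaler fitted at training time
    return model.predict(scaled)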
Do I need to do this for every normalization library I use? I'll be loading many files into memory; isn't there a way to load something that contains every step of the data transformations? – Blenzus, 16 hours ago
You can save and load all scalers at the same time. Example: stackoverflow.com/questions/33497314/… (a sketch of the idea follows below). – Shamit Verma, 15 hours ago
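Roughly what that looks like, as an added sketch: bundle every fitted transformer into one container and dump that container once. The dictionary keys and variable names are assumptions.

# illustrative: persist several fitted transformers in a single joblib file
import joblib

transformers = {'minmax': minmax_scaler, 'encoder': one_hot_encoder}  # placeholders for objects fitted during training
joblib.dump(transformers, 'transformers.joblib')

# later, in the web service
transformers = joblib.load('transformers.joblib')
scaled = transformers['minmax'].transform(new_data)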