retrieve food groups from food item list
I have a dataframe of food items, shown below. I need to create a food_group list that gives the food group each item belongs to; for example, all types of yogurt should be in one group called yogurt.
I used the snippet below to take the first segment of each comma-separated name, but it does not give the result I want, such as putting all yogurts in one group:
food_group_0 = [i.split(',') for i in data['name']]
food_group = [item[0] for item in food_group_0]
# Count how many times each entry occurs in the list, using the Counter class from the collections module:
from collections import Counter
c = Counter(food_group)
print(c)
The dataframe:
0 4-Grain Flakes
1 4-Grain Flakes, Gluten Free
2 4-Grain Flakes, Riihikosken Vehnämylly
3 Almond
4 Almond Drink, Sweetened, Alrpo
5 Almond Drink, Unsweetened, Alrpo
6 Amaranth Flakes
7 Anchovy
8 Apple, Average, With Skin
9 Apple, Domestic, Without Skin
10 Apple, Domestic, With Skin
11 Apple, Dried
12 Apple, Imported, Without Skin
13 Apple, Imported, With Skin
14 Apple Chips
15 Apple Crisp Delight, Apple, Oat Flakes
16 Apple Jam
17 Apple Juice, Unsweetened, Vitamin C
18 Apple Kissel, Apple Soup, Dried Apples
19 Apple Kissel, Apple Soup, Fresh Apples
20 Apple Pie, Basic Sweet Dough, Gluten-Free, Con...
21 Apple Pie, Basic Sweet Dough, Low-Fat Milk
22 Apple Pie, Basic Sweet Dough, Naturally Gluten...
23 Apple Pie, Basic Sweet Dough, Whole Milk
24 Apple Pie, Shortbread Crust
25 Apple Pie, Shortbread Crust, Gluten-Free, Cont...
26 Apple Pie, Shortbread Crust, Naturally Gluten-...
27 Apple Pie, Shortbread Crust With Sour Milk
28 Apple Pie, Soft, Low-Fat Milk
29 Apple Pie With Quark Filling, Shortbread Crust
...
4068 Yoghurt, Plain, A+, Fat 2.5%, 1 Ug Vitamin D, ...
4069 Yoghurt, Plain, A+, Fat 2.5%, Lactose-Free, 1 ...
4070 Yoghurt, Plain, A+, Fat 4%, 1 Ug Vitamin D, La...
4071 Yoghurt, Plain, A+, Fatfree, 1 Ug Vitamin D, L...
4072 Yoghurt, Plain, A+ Greek, 2 % Fat, Lactose-Fre...
4073 Yoghurt, Plain, Ab, 0.2% Fat, Probiotics
4074 Yoghurt, Plain, Ab, 2.5% Fat, Probiotics
4075 Yoghurt, Plain, Activia, 3.4% Fat
4076 Yoghurt, Plain, Arla Protein, 1% Fat, Lactose-...
4077 Yoghurt, Plain, Bulgarian, 9% Fat
4078 Yoghurt, Plain, Fat-Free
4079 Yoghurt, Plain, Fat-Free, Lactose-Free, 1 Ug V...
4080 Yoghurt, Plain, Fat-Free, Low-Lactose, 0.5 Ug ...
4081 Yoghurt, Plain, Greek, 7% Fat, Lactose-Free
4082 Yoghurt, Plain, Organic, 3% Fat
4083 Yoghurt, Plain, Pirkka Reducol, 2.5% Fat, Low-...
4084 Yoghurt, Turkish/Greek, 10% Fat
4085 Yoghurt, Turkish/Greek, 10% Fat, Lactose-Free
4086 Yoghurt Sauce
4087 Yoghurt With Jam, Fat-Free
4088 Yoghurt With Muesli, A+, Fat 3.5%, Low-Lactose
4089 Yoghurt With Quark, Flavoured, Arla, 1.4% Fat,...
4090 Yoghurt With Quark, Flavoured, Luonto+, 1.2% F...
4091 Yoghurt With Quark, Flavoured, Valio, 1.7% Fat...
4092 Zander, Pike-Perch
4093 Zucchini, Boiled Without Salt
4094 Zucchini, Summer Squash
4095 Zucchini Filled With Minced Meat
4096 Zucchini Filled With Soya And Rice
4097 Zucchini Filled With Vegetables
python
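For reference, here is a minimal sketch of what the comma-split approach in the question yields on a handful of the listed names. The `names` list is a hand-copied sample standing in for `data['name']`, and the `.strip()` call is an addition that is not in the original snippet.

```python
from collections import Counter

# Hand-copied sample standing in for data['name'] (an assumption for illustration).
names = [
    "4-Grain Flakes",
    "4-Grain Flakes, Gluten Free",
    "Almond Drink, Sweetened, Alrpo",
    "Yoghurt, Plain, Fat-Free",
    "Yoghurt, Turkish/Greek, 10% Fat",
    "Yoghurt Sauce",
    "Yoghurt With Jam, Fat-Free",
]

# Same idea as the snippet: keep the text before the first comma.
food_group = [name.split(",")[0].strip() for name in names]

# Count how many items fall into each group.
counts = Counter(food_group)
for group, n in counts.most_common():
    print(group, n)
```

On this sample the comma split keeps `4-Grain Flakes` whole, while `Yoghurt Sauce` and `Yoghurt With Jam` end up in their own groups rather than under `Yoghurt`.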
I cannot just extract the first word, because that causes complications: for the first item in the food list I would get 4-Grain instead of 4-Grain Flakes.
– KHAN irfan, 55 mins ago
Are you able to share the data? And why doesn't splitting on the first comma (,) give the result you expect? It looks like it would work, according to your example data. Perhaps, like in your other question, you could create a multi-index: Yogurt would be the first level, then Plain and e.g. Flavoured would be the second level.
– n1k31t4, 52 mins ago
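The multi-index idea in this comment could be sketched roughly as follows with pandas. The sample rows, the column names `food_group` and `sub_cat`, and the `"unspecified"` placeholder for names without a comma are illustrative assumptions, not something taken from the asker's data.

```python
import pandas as pd

# Small hand-copied sample; the real dataframe has about 4000 rows.
df = pd.DataFrame({"name": [
    "Yoghurt, Plain, Organic, 3% Fat",
    "Yoghurt, Plain, Fat-Free",
    "Yoghurt, Turkish/Greek, 10% Fat",
    "Yoghurt Sauce",
]})

# Split every name on commas into separate columns (0, 1, 2, ...).
parts = df["name"].str.split(",", expand=True)
df["food_group"] = parts[0].str.strip()
# Names without a comma have no second field; "unspecified" is a placeholder assumption.
df["sub_cat"] = parts[1].str.strip().fillna("unspecified")

# First level: food group, second level: sub-category.
indexed = df.set_index(["food_group", "sub_cat"]).sort_index()

print(indexed.loc["Yoghurt"])                 # every row in the Yoghurt group
print(indexed.loc[("Yoghurt", "Plain"), :])   # only the Plain sub-category
```

Sorting the index after `set_index` keeps label-based lookups on the two levels efficient.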
@n1k31t4 but 4-Grain would be the first level and Grain would be the second level. Yes, I can share the data.
– KHAN irfan, 39 mins ago
asked 58 mins ago by KHAN irfan
1 Answer
You can actually do the string-splitting and indexing on the columns themselves; there is no need to extract the column and use list comprehensions.
Below I take whatever is before the first comma and put it in a column called food_group, and then take the first field after that comma and put it in a new column called sub_cat (sub-category):
df["food_group"] = df.name.str.split(",").str[0]
df["sub_cat"] = df.name.str.split(",").str[1]
Here is example output for some Yoghurt data:
    id    name                                               food_group     sub_cat
44  4082  Yoghurt, Plain, Organic, 3% Fat                    Yoghurt        Plain
45  4083  Yoghurt, Plain, Pirkka Reducol, 2.5% Fat, Low-...  Yoghurt        Plain
46  4084  Yoghurt, Turkish/Greek, 10% Fat                    Yoghurt        Turkish/Greek
47  4085  Yoghurt, Turkish/Greek, 10% Fat, Lactose-Free      Yoghurt        Turkish/Greek
48  4086  Yoghurt Sauce                                      Yoghurt Sauce  NaN
Notice that any fields that are empty are filled with NaN. This will happen when your name column only contains a single field (i.e. no commas).
EDIT
Here is the top of my dataframe, after the operation above:
In [13]: df.head(10)
Out[13]:
   id  name                                    food_group       sub_cat
0  0   4-Grain Flakes                          4-Grain Flakes   NaN
1  1   4-Grain Flakes, Gluten Free             4-Grain Flakes   Gluten Free
2  2   4-Grain Flakes, Riihikosken Vehnämylly  4-Grain Flakes   Riihikosken Vehnämylly
3  3   Almond                                  Almond           NaN
4  4   Almond Drink, Sweetened, Alrpo          Almond Drink     Sweetened
5  5   Almond Drink, Unsweetened, Alrpo        Almond Drink     Unsweetened
6  6   Amaranth Flakes                         Amaranth Flakes  NaN
7  7   Anchovy                                 Anchovy          NaN
8  8   Apple, Average, With Skin               Apple            Average
9  9   Apple, Domestic, Without Skin           Apple            Domestic
You could continue to make a multi-index from these two new columns, but it might not be necessary; it depends on what you want to do afterwards with the data.
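As a self-contained sketch of the same column-wise approach, the example below builds a tiny dataframe by hand, derives the two columns, and then counts the groups. The sample rows are hand-copied, and the `.str.strip()` calls are an addition that removes the space left after each comma; they are not part of the answer's code.

```python
import pandas as pd

# Hand-copied sample rows (an assumption; the real data is much larger).
df = pd.DataFrame({"name": [
    "4-Grain Flakes",
    "4-Grain Flakes, Gluten Free",
    "Apple, Average, With Skin",
    "Apple, Domestic, Without Skin",
    "Yoghurt Sauce",
]})

# Split once, then pick fields by position with the .str accessor.
fields = df["name"].str.split(",")
df["food_group"] = fields.str[0].str.strip()
df["sub_cat"] = fields.str[1].str.strip()  # NaN where the name has no comma

# Number of items per food group.
group_sizes = df["food_group"].value_counts()
# Rows whose name consisted of a single field.
missing_sub_cat = df["sub_cat"].isna()

print(group_sizes)
print(df[missing_sub_cat])
```

`value_counts()` plays the role that `Counter` has in the question, without leaving pandas.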
The first name is 4-Grain Flakes, and I will only get 4-Grain. How can I handle that?
– KHAN irfan, 28 mins ago
@KHANirfan - I am splitting on the comma (,), meaning I do indeed get 4-Grain Flakes. See the top of my dataframe, added to my answer.
– n1k31t4, 20 mins ago
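The difference between the two kinds of split discussed here can be checked on a single name with plain Python; the variable names are illustrative only.

```python
name = "4-Grain Flakes, Gluten Free"

# Splitting on whitespace cuts the name at the first space.
first_word = name.split()[0]

# Splitting on the comma keeps everything before the first comma.
first_field = name.split(",")[0]

print(first_word)
print(first_field)
```

Only the whitespace split truncates the name to `4-Grain`; the comma split returns `4-Grain Flakes`.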
Will Yoghurt and Yoghurt With Quark be separate food categories?
– KHAN irfan, 18 mins ago
Yes. Everything to the left of the first comma is taken. If you want to be more specific with your categories, you probably can't do it in a straightforward manner, as I have above. If each row might have its own rules, you will probably have to fix the strange cases by hand, or generate a new input file that reflects your ideas about what a food category is.
– n1k31t4, 11 mins ago
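One possible way to "fix the strange cases by hand" is a small override table applied after the comma split. This is only a sketch: the `overrides` dictionary is hypothetical, and which groups should be merged is a judgement call that the thread leaves open.

```python
import pandas as pd

# Hand-copied sample rows (an assumption for illustration).
df = pd.DataFrame({"name": [
    "Yoghurt, Plain, Fat-Free",
    "Yoghurt Sauce",
    "Yoghurt With Jam, Fat-Free",
    "Yoghurt With Muesli, A+, Fat 3.5%, Low-Lactose",
    "Zucchini Filled With Vegetables",
]})

# Automatic first pass: everything to the left of the first comma.
df["food_group"] = df["name"].str.split(",").str[0].str.strip()

# Hypothetical hand-maintained overrides: automatic group -> desired group.
overrides = {
    "Yoghurt With Jam": "Yoghurt",
    "Yoghurt With Muesli": "Yoghurt",
}

# Exact-match replacement; groups that are not listed stay unchanged.
df["food_group"] = df["food_group"].replace(overrides)

print(df[["name", "food_group"]])
```

Keeping the overrides in a plain dictionary (or a separate file) makes the manual decisions easy to review and extend.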
Thanks for your input. Please try my snippet, it does the same thing. :)
– KHAN irfan, 7 mins ago
edited 19 mins ago
answered 33 mins ago
n1k31t4
The first name is 4-Grain Flakes, but I will only get 4-Grain. How can I handle that?
– KHAN irfan
28 mins ago
@KHANirfan - I am splitting on the comma (,), meaning I do indeed get 4-Grain Flakes. See the top of my dataframe, added to my answer.
– n1k31t4
20 mins ago
Will Yoghurt and Yoghurt With Quark be separate food categories?
– KHAN irfan
18 mins ago
Yes. Everything to the left of the first comma is taken. If you want to be more specific with your categories, you probably can't do it in a straightforward manner, as I have above. If each row might have its own rules, you will probably have to fix the strange cases by hand, or generate a new input file that reflects your ideas about what a food category is.
– n1k31t4
11 mins ago
Thanks for your input. Please try my snippet, it does the same thing. :)
– KHAN irfan
7 mins ago
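One possible way to handle the "fix the strange cases by hand" suggestion from the comments is a small override table applied after the comma split. This is only a sketch under assumptions: the `overrides` mapping is hypothetical, the row "Yoghurt With Quark, Vanilla" is invented for illustration, and deciding which groups should be merged remains a manual judgement.

```python
import pandas as pd

# Hypothetical sample; "Yoghurt With Quark, Vanilla" is an invented row.
df = pd.DataFrame({"name": [
    "Yoghurt, Plain, Organic, 3% Fat",
    "Yoghurt With Quark, Vanilla",
    "Yoghurt Sauce",
    "Almond",
]})

# Default rule: everything to the left of the first comma.
df["food_group"] = df["name"].str.split(",").str[0].str.strip()

# Hand-written exceptions (assumed): groups to fold into a broader one.
overrides = {
    "Yoghurt With Quark": "Yoghurt",
    "Yoghurt Sauce": "Yoghurt",
}

# replace() with a dict swaps exact matches and leaves other values alone.
df["food_group"] = df["food_group"].replace(overrides)

print(df)
```

Keeping the exceptions in one dictionary makes the manual decisions easy to review and extend without touching the splitting rule.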
I cannot just extract the first word, because there will be complications: for example, I would get 4-Grain instead of 4-Grain Flakes for the first item in the food list.
– KHAN irfan
55 mins ago
Are you able to share the data? And why doesn't splitting on the first comma (,) give the result you expect? It looks like it would work, according to your example data. Perhaps, like in your other question, you could create a multi-index. Yogurt would be the first level, then Plain and e.g. Flavoured would be the second level.
– n1k31t4
52 mins ago
@n1k31t4 but 4-Grain would be first level and Grain would be second level. Yes, I can share the data.
– KHAN irfan
39 mins ago