How to delete the two lines after every third line in a file with a very large number of lines? [duplicate]





















This question already has an answer here:




  • How to print lines number 15 and 25 out of each 50 lines?







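That is, I want to keep three lines, delete the next two, and repeat this through the whole file. For example, if I have: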



1st line (keep)  
2nd line (keep)
3rd line (keep)
4rth lines (delete)
5th (del)
6th (keep)
7nth (keep)
8th lines (keep)
9th (del)
10th (del)
11th (keep)
12th (keep)
13th (keep)
14th (del)
15th (del)


and so on.

























marked as duplicate by Sundeep, jimmij, Prvt_Yadv, forcefsck, BitsOfNix Apr 1 at 6:34





















    Increment a line counter (zero-indexed) for each line read, and print the line when (line counter modulo 5) < 3.

    – ChuckCottrill
    Mar 30 at 4:04











  • Can you please clarify further?

    – Jaguar Jom
    Mar 30 at 4:10











    The duplicate is worded slightly differently, but it is the same problem seen from a different angle. This question amounts to printing lines 1, 2 and 3 out of every 5 lines, for example: seq 15 | awk 'BEGIN { a[1] a[2] a[3] }; NR % 5 in a' and seq 15 | sed -n 'p;n;p;n;p;n;n'

    – Sundeep
    Mar 30 at 7:27













  • Also, the sed version above might be faster than the awk one for large files.

    – Sundeep
    Mar 30 at 7:47
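The sed command from the comment above can be read step by step as below. Wrapping it in a function named keep3of5_sed is only an illustration (the name is hypothetical, not from the thread); input comes from standard input or from file arguments.

```shell
#!/bin/sh
# keep3of5_sed (hypothetical name): print lines 1-3 of every group of 5.
# -n disables automatic printing, so only the explicit p commands print.
#   p  print the 1st line of the group
#   n  replace it with the 2nd line; p prints that
#   n  replace it with the 3rd line; p prints that
#   n  read the 4th line (never printed)
#   n  read the 5th line (never printed); the next cycle starts at line 6
keep3of5_sed() {
    sed -n 'p;n;p;n;p;n;n' "$@"
}

# Example: prints 1 2 3 6 7 8 11 12 13, one per line.
seq 15 | keep3of5_sed
```

Because each cycle consumes exactly five lines, the script works in a single pass and never needs to know the total number of lines.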


















bash shell awk sed






edited Mar 31 at 4:25 by Prvt_Yadv

asked Mar 30 at 3:57 by Jaguar Jom




6 Answers














Try:



awk '(NR-1)%5<3' file


For example:



$ awk '(NR-1)%5<3' file
1st line (keep)
2nd line (keep)
3rd line (keep)
6th (keep)
7nth (keep)
8th lines (keep)
11th (keep)
12th (keep)
13th (keep)


How it works



The condition (NR-1)%5<3 is an awk pattern without an action, so awk prints every line for which it is true. NR is the current line number, with the first line counting as 1, so (NR-1)%5 cycles through 0, 1, 2, 3, 4. Within every group of five lines, the condition is therefore true for the first three.
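The same condition can be parameterised so that neither the group size nor the number of kept lines is hard-coded; this also covers the six-line variant discussed in the comments. This is a sketch only: keep_first is a hypothetical helper name, and the awk variables keep and period are passed in with -v.

```shell
#!/bin/sh
# keep_first KEEP PERIOD [FILE...] (hypothetical helper):
# print the first KEEP lines of every group of PERIOD lines.
keep_first() {
    _keep=$1
    _period=$2
    shift 2
    # (NR - 1) % period runs 0, 1, ..., period-1 and then wraps around,
    # so the comparison is true for the first KEEP lines of each group.
    awk -v keep="$_keep" -v period="$_period" '(NR - 1) % period < keep' "$@"
}

# Keep 3 of every 5 lines (the case in the question).
seq 15 | keep_first 3 5
# Keep 4 of every 6 lines, i.e. delete the last two of every six.
seq 12 | keep_first 4 6
```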






edited Mar 30 at 8:44 by Kusalananda
answered Mar 30 at 4:38 by John1024


























  • Thank you, but I found that this script starts deleting after the 3rd line, which is okay, but on the next round it counts again from the beginning, where the line order has changed and shrunk by two lines, so I can't delete the lines I want. Any suggestions?

    – Jaguar Jom
    Apr 1 at 0:38











  • @JaguarJom OK. I showed in the answer the output from your sample input data. Is that not the output that you wanted? Or, is it that, when you run the code, you get something different?

    – John1024
    Apr 1 at 0:47













  • Yes, actually I got a different result.

    – Jaguar Jom
    Apr 1 at 5:00











  • To check, I just copied and pasted your input and my command, ran it, and got the same result as shown in the answer. Are you copying and pasting the same things? Have you modified the code? Are you testing the code on different input data? Can you use pastebin.com or similar to show me exactly what you are seeing?

    – John1024
    Apr 1 at 5:08











  • @JaguarJom In another comment, you hinted that your pattern is six lines long, not five, and you want to delete the last two of every six lines. If that is the case, use awk '(NR-1)%6<4' file.

    – John1024
    Apr 1 at 5:51

































A simple command is:



awk '{if((NR-1) % 5<=2){print $0}}' file


It prints only the first 3 lines of every group of 5: (NR-1)%5 cycles through 0, 1, 2, 3, 4, and only the first three of those values are less than or equal to 2, so only those lines are printed.
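To see the values the condition is tested against, this sketch prints each line number, the value of (NR-1)%5, and whether the answer's test (NR-1)%5<=2 keeps the line. The helper name show_residues is hypothetical, used only for illustration.

```shell
#!/bin/sh
# show_residues (hypothetical helper): for each input line print
# NR, (NR-1)%5, and whether (NR-1)%5 <= 2 would keep the line.
show_residues() {
    awk '{
        r = (NR - 1) % 5
        print NR, r, (r <= 2 ? "keep" : "delete")
    }' "$@"
}

seq 7 | show_residues
# 1 0 keep
# 2 1 keep
# 3 2 keep
# 4 3 delete
# 5 4 delete
# 6 0 keep
# 7 1 keep
```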



I have a file with these contents:



1
2
3
4
5
6
7
8
9
10
11
12
13
14
15


The output is:



1
2
3
6
7
8
11
12
13


Or, as suggested in the comments, you can use:



awk '(NR - 1) % 5 <= 2' file





edited Mar 30 at 9:42
answered Mar 30 at 4:36 by Prvt_Yadv


























    Or, with idiomatic use of awk syntax: awk '(NR - 1) % 5 <= 2' file

    – Kusalananda
    Mar 30 at 8:39













  • Thanks, I didn't know that.

    – Prvt_Yadv
    Mar 30 at 9:33











  • Thank you, awk '{if((NR-1) % 5<=2){print $0}}' file works very well for me, but I increased the period by 1: awk '{if((NR-1) % 6<=2){print $0}}' file

    – Jaguar Jom
    Apr 1 at 0:42



































Basically, you want something like FizzBuzz in awk: keep a running line counter and decide what to do from its value modulo 5.



awk '{ if (i++%5 < 3) print $0;}'


To show that this works:



for x in 1 2 3 4 5 6 7 8 9 10 ; do echo $x; done |
awk '{ if (i++%5 < 3) print $0;}'


If your file is named mybigfile.csv:



awk '{ if (i++%5 < 3) print $0;}' < mybigfile.csv > mybigfile-123.csv
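The counter i in this answer keeps counting across all input files, just like NR. If several files are given and the keep-3-of-5 pattern should restart at the top of each file, one option (a sketch, not part of the original answer; keep3of5_per_file is a hypothetical name) is awk's FNR, which is reset to 1 for every input file.

```shell
#!/bin/sh
# keep3of5_per_file (hypothetical helper): apply the keep-3-of-5 pattern
# to each input file separately. FNR restarts at 1 for each file,
# whereas NR and a hand-rolled i++ counter continue across files.
keep3of5_per_file() {
    awk '(FNR - 1) % 5 < 3' "$@"
}

# Two 4-line files: with i++ (or NR) the second file would start mid-group,
# with FNR each file starts a fresh group of five.
tmp=$(mktemp -d)
seq 4 > "$tmp/a"
seq 4 > "$tmp/b"
keep3of5_per_file "$tmp/a" "$tmp/b"   # prints 1 2 3 1 2 3, one per line
rm -r "$tmp"
```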





answered Mar 30 at 4:37 by ChuckCottrill
























  • You could use NR, or just rely on i defaulting to zero :-) (code golf)

    – ChuckCottrill
    Mar 30 at 4:38

































This can be solved using GNU sed:



sed '4~5,5~5d' file


Note that this uses a GNU-specific extension to the sed standard, and thus doesn't work with e.g. BSD sed on macOS. However, GNU sed can be installed on macOS using brew, after which it can be used as gsed. On most Linux distributions, GNU sed is the default.



This deletes the fourth and fifth lines of every group of five and prints the rest. For a clearer example, sed '3~10,6~10d' will select lines 1, 2, 7, 8, 9 and 10 of every group of 10 lines by deleting lines 3 through 6.



The top-voted answer suggests using awk '(NR-1)%5<3'. On my machine, on a file containing the numbers 1 to 2 million, this takes about 0.6 seconds, while the sed solution in this answer takes about 0.35 seconds. This is plausible, since sed is generally a simpler tool and can therefore be faster than the more complex, but more full-featured, awk.
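Since first~step addresses are a GNU extension, here is a sketch of one way to get the same result using only POSIX sed commands, for example with BSD sed on macOS. It is not from the answer, its speed has not been measured here, and keep3of5_posix is a hypothetical name.

```shell
#!/bin/sh
# keep3of5_posix (hypothetical helper): keep lines 1-3 of every 5 with POSIX sed.
#   n    print line 1 (auto-print) and read line 2
#   n    print line 2 and read line 3
#   n    print line 3 and read line 4
#   $!N  append line 5 to line 4, unless line 4 is the last line
#   d    delete lines 4 and 5; the next cycle starts at line 6
# If input runs out during an n, sed prints the current line and exits,
# which is correct because that line belongs to the kept part of the group.
keep3of5_posix() {
    sed 'n;n;n;$!N;d' "$@"
}

seq 15 | keep3of5_posix
```

The $! guard matters because GNU sed, by default, prints the pattern space when N is run on the last line; without it, a file ending on the 4th line of a group would keep that line.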






























    +1 ... or 4~5{N;d;}

    – steeldriver
    Mar 30 at 15:03
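The GNU sed variant in the comment above works like this: 4~5 selects lines 4, 9, 14, ..., N appends the following line, and d deletes both. One caveat, based on the GNU sed manual: by default, when N runs on the last line, GNU sed prints the pattern space and exits, so a file whose last line is the 4th of a group would keep that line. Guarding N with $! avoids this; the guarded form and the helper name keep3of5_gnu below are a suggested adjustment, not part of the comment.

```shell
#!/bin/sh
# GNU sed only: first~step addressing is a GNU extension.
#   4~5   match lines 4, 9, 14, ...
#   $!N   append the next line (5, 10, 15, ...) unless this is the last line
#   d     delete the pattern space, i.e. both lines
keep3of5_gnu() {
    sed '4~5{$!N;d;}' "$@"
}

# Line 14 is the 4th line of its group and the last line; it is dropped.
seq 14 | keep3of5_gnu
```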

































A generic solution for masking out a particular pattern of lines from a file:



#!/bin/sh

# The pattern is given on the command line.
pattern=$1

# The period is simply the length of the pattern.
period=${#pattern}

# Use bc to convert the binary pattern to an integer.
mask=$( printf 'ibase=2; %s\n' "$pattern" | bc )

awk -v mask="$mask" -v period="$period" '
BEGIN { p = lshift(1, period-1) }
and(rshift(p, (FNR-1) % period), mask)'


This relies on awk implementing the non-standard functions and() (bitwise AND), rshift() and lshift() (bitwise right and left shift), which GNU awk and some BSD implementations of awk provide, but mawk does not.



This takes a pattern, which is a binary number representing both the cyclic period and what lines within each period should be kept or masked out. A 1 means "keep" and a 0 means "delete".



For example, the pattern of lines to apply in your question is 11100, which means "for each set of five lines, keep the first three and delete the others".



Using 01001000 would delete all but the 2nd and 5th lines in every 8 lines.



The awk program could also be written without the BEGIN block as



and(lshift(1, (period-1) - (FNR-1) % period), mask)


Left-shifting 1 by (period-1) - (FNR-1) % period positions is the same as calculating 2 to that power, but I'm using lshift() since awk does its arithmetic using floating-point operations rather than exact integer arithmetic.



Since the code relies on the binary representation of the pattern, very long patterns (too long to be represented exactly as an awk number) may not work correctly.



Testing:



Removing the lines you want to remove:



$ sh script.sh 11100 <file
1st line (keep)
2nd line (keep)
3rd line (keep)
6th (keep)
7nth (keep)
8th lines (keep)
11th (keep)
12th (keep)
13th (keep)


Inverting the pattern:



$ sh script.sh 00011 <file
4rth lines (delete)
5th (del)
9th (del)
10th (del)
14th (del)
15th (del)
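If bc or awk's bitwise functions are unavailable (the answer notes that mawk lacks and(), rshift() and lshift()), one possible alternative, not part of this answer, is to look up each line's character directly in the pattern string with substr(), which is standard awk. The helper name mask_lines is hypothetical; the pattern syntax (1 = keep, 0 = delete) is the same as in the script above.

```shell
#!/bin/sh
# mask_lines PATTERN [FILE...] (hypothetical helper):
# line n of each period is kept when character n of PATTERN is "1".
# Only POSIX awk features (length, substr) are used, and there is no
# binary-to-integer conversion, so pattern length is not limited by
# integer precision.
mask_lines() {
    _pattern=$1
    shift
    awk -v pattern="$_pattern" '
        BEGIN { period = length(pattern) }
        substr(pattern, (FNR - 1) % period + 1, 1) == "1"
    ' "$@"
}

seq 15 | mask_lines 11100       # 1 2 3 6 7 8 11 12 13
seq 8  | mask_lines 01001000    # 2 5
```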




















































    I tried the command below and it worked:



    # bash only; the loop bound 20 must be at least the number of lines in filename
    for ((i = 1; i <= 20; i++)); do j=$((i + 2)); sed -n "${i},${j}p" filename; i=$((j + 2)); done


    Output:



    1st line (keep)
    2nd line (keep)
    3rd line (keep)
    6th (keep)
    7nth (keep)
    8th lines (keep)
    11th (keep)
    12th (keep)
    13th (keep)





























      That is nice, but you have to know how many lines you have in advance, and each iteration reads the file from the beginning again. It cannot be used on a stream, and it gets less efficient the bigger the data gets, so since the OP says the number of lines is very large, this is not the best solution.

      – Law29
      Mar 30 at 11:45


















    6 Answers
    6






    active

    oldest

    votes








    6 Answers
    6






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    13














    Try:



    awk '(NR-1)%5<3' file


    For example:



    $ awk '(NR-1)%5<3' file
    1st line (keep)
    2nd line (keep)
    3rd line (keep)
    6th (keep)
    7nth (keep)
    8th lines (keep)
    11th (keep)
    12th (keep)
    13th (keep)


    How it works



    The command (NR-1)%5<3 tells awk to print any line for which (NR-1)%5<3 is true. In awk, NR is the line number with the first line counting as 1. For every five lines in the file, that statement will be true for the first three.






    share|improve this answer


























    • Thank you, but I found this script will start to delete after the 3rd line which is okay, but for the next turn it will count again from the beginning where the lines order are changed and decreased by two lines so I can't delete the lines I want . any suggestion

      – Jaguar Jom
      Apr 1 at 0:38











    • @JaguarJom OK. I showed in the answer the output from your sample input data. Is that not the output that you wanted? Or, is it that, when you run the code, you get something different?

      – John1024
      Apr 1 at 0:47













    • yes actually i got different result actually,

      – Jaguar Jom
      Apr 1 at 5:00











    • To check, I just copied-and-pasted your input and copied-and-pasted my command and run it and I get the same result as shown in the answer. Are you copying-and-pasting the same things? Have you modified the code? Are you testing the code on different input data? Can you use pastebin.com or similar to show me exactly what you are seeing?

      – John1024
      Apr 1 at 5:08











    • @JaguarJom In another comment, you hinted that your pattern is six lines long, not five, and you want to delete the last two of every six lines. If that is the case, use awk '(NR-1)%6<4' file.

      – John1024
      Apr 1 at 5:51
















    13














    Try:



    awk '(NR-1)%5<3' file


    For example:



    $ awk '(NR-1)%5<3' file
    1st line (keep)
    2nd line (keep)
    3rd line (keep)
    6th (keep)
    7nth (keep)
    8th lines (keep)
    11th (keep)
    12th (keep)
    13th (keep)


    How it works



    The command (NR-1)%5<3 tells awk to print any line for which (NR-1)%5<3 is true. In awk, NR is the line number with the first line counting as 1. For every five lines in the file, that statement will be true for the first three.






    share|improve this answer


























    • Thank you, but I found this script will start to delete after the 3rd line which is okay, but for the next turn it will count again from the beginning where the lines order are changed and decreased by two lines so I can't delete the lines I want . any suggestion

      – Jaguar Jom
      Apr 1 at 0:38











    • @JaguarJom OK. I showed in the answer the output from your sample input data. Is that not the output that you wanted? Or, is it that, when you run the code, you get something different?

      – John1024
      Apr 1 at 0:47













    • yes actually i got different result actually,

      – Jaguar Jom
      Apr 1 at 5:00











    • To check, I just copied-and-pasted your input and copied-and-pasted my command and run it and I get the same result as shown in the answer. Are you copying-and-pasting the same things? Have you modified the code? Are you testing the code on different input data? Can you use pastebin.com or similar to show me exactly what you are seeing?

      – John1024
      Apr 1 at 5:08











    • @JaguarJom In another comment, you hinted that your pattern is six lines long, not five, and you want to delete the last two of every six lines. If that is the case, use awk '(NR-1)%6<4' file.

      – John1024
      Apr 1 at 5:51














    13












    13








    13







    Try:



    awk '(NR-1)%5<3' file


    For example:



    $ awk '(NR-1)%5<3' file
    1st line (keep)
    2nd line (keep)
    3rd line (keep)
    6th (keep)
    7nth (keep)
    8th lines (keep)
    11th (keep)
    12th (keep)
    13th (keep)


    How it works



    The command (NR-1)%5<3 tells awk to print any line for which (NR-1)%5<3 is true. In awk, NR is the line number with the first line counting as 1. For every five lines in the file, that statement will be true for the first three.






    share|improve this answer















    Try:



    awk '(NR-1)%5<3' file


    For example:



    $ awk '(NR-1)%5<3' file
    1st line (keep)
    2nd line (keep)
    3rd line (keep)
    6th (keep)
    7nth (keep)
    8th lines (keep)
    11th (keep)
    12th (keep)
    13th (keep)


    How it works



    The command (NR-1)%5<3 tells awk to print any line for which (NR-1)%5<3 is true. In awk, NR is the line number with the first line counting as 1. For every five lines in the file, that statement will be true for the first three.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Mar 30 at 8:44









    Kusalananda

    140k17261435




    140k17261435










    answered Mar 30 at 4:38









    John1024John1024

    48.4k5113128




    48.4k5113128













    • Thank you, but I found this script will start to delete after the 3rd line which is okay, but for the next turn it will count again from the beginning where the lines order are changed and decreased by two lines so I can't delete the lines I want . any suggestion

      – Jaguar Jom
      Apr 1 at 0:38











    • @JaguarJom OK. I showed in the answer the output from your sample input data. Is that not the output that you wanted? Or, is it that, when you run the code, you get something different?

      – John1024
      Apr 1 at 0:47













    • yes actually i got different result actually,

      – Jaguar Jom
      Apr 1 at 5:00











    • To check, I just copied-and-pasted your input and copied-and-pasted my command and run it and I get the same result as shown in the answer. Are you copying-and-pasting the same things? Have you modified the code? Are you testing the code on different input data? Can you use pastebin.com or similar to show me exactly what you are seeing?

      – John1024
      Apr 1 at 5:08











    • @JaguarJom In another comment, you hinted that your pattern is six lines long, not five, and you want to delete the last two of every six lines. If that is the case, use awk '(NR-1)%6<4' file.

      – John1024
      Apr 1 at 5:51



















    • Thank you, but I found this script will start to delete after the 3rd line which is okay, but for the next turn it will count again from the beginning where the lines order are changed and decreased by two lines so I can't delete the lines I want . any suggestion

      – Jaguar Jom
      Apr 1 at 0:38











    • @JaguarJom OK. I showed in the answer the output from your sample input data. Is that not the output that you wanted? Or, is it that, when you run the code, you get something different?

      – John1024
      Apr 1 at 0:47













    • yes actually i got different result actually,

      – Jaguar Jom
      Apr 1 at 5:00











    • To check, I just copied-and-pasted your input and copied-and-pasted my command and run it and I get the same result as shown in the answer. Are you copying-and-pasting the same things? Have you modified the code? Are you testing the code on different input data? Can you use pastebin.com or similar to show me exactly what you are seeing?

      – John1024
      Apr 1 at 5:08











    • @JaguarJom In another comment, you hinted that your pattern is six lines long, not five, and you want to delete the last two of every six lines. If that is the case, use awk '(NR-1)%6<4' file.

      – John1024
      Apr 1 at 5:51

















    Thank you, but I found this script will start to delete after the 3rd line which is okay, but for the next turn it will count again from the beginning where the lines order are changed and decreased by two lines so I can't delete the lines I want . any suggestion

    – Jaguar Jom
    Apr 1 at 0:38





    Thank you, but I found this script will start to delete after the 3rd line which is okay, but for the next turn it will count again from the beginning where the lines order are changed and decreased by two lines so I can't delete the lines I want . any suggestion

    – Jaguar Jom
    Apr 1 at 0:38













    @JaguarJom OK. I showed in the answer the output from your sample input data. Is that not the output that you wanted? Or, is it that, when you run the code, you get something different?

    – John1024
    Apr 1 at 0:47







    @JaguarJom OK. I showed in the answer the output from your sample input data. Is that not the output that you wanted? Or, is it that, when you run the code, you get something different?

    – John1024
    Apr 1 at 0:47















    yes actually i got different result actually,

    – Jaguar Jom
    Apr 1 at 5:00





    yes actually i got different result actually,

    – Jaguar Jom
    Apr 1 at 5:00













    To check, I just copied-and-pasted your input and copied-and-pasted my command and run it and I get the same result as shown in the answer. Are you copying-and-pasting the same things? Have you modified the code? Are you testing the code on different input data? Can you use pastebin.com or similar to show me exactly what you are seeing?

    – John1024
    Apr 1 at 5:08





    To check, I just copied-and-pasted your input and copied-and-pasted my command and run it and I get the same result as shown in the answer. Are you copying-and-pasting the same things? Have you modified the code? Are you testing the code on different input data? Can you use pastebin.com or similar to show me exactly what you are seeing?

    – John1024
    Apr 1 at 5:08













    @JaguarJom In another comment, you hinted that your pattern is six lines long, not five, and you want to delete the last two of every six lines. If that is the case, use awk '(NR-1)%6<4' file.

    – John1024
    Apr 1 at 5:51





    @JaguarJom In another comment, you hinted that your pattern is six lines long, not five, and you want to delete the last two of every six lines. If that is the case, use awk '(NR-1)%6<4' file.

    – John1024
    Apr 1 at 5:51













    6














    A simple command is:



    awk '{if((NR-1) % 5<=2){print $0}}' file


    It will only print first 3 lines in sequence of 5 lines. Because (NR-1)%5 will give output like 0 1 2 3 4, and first 3 lines are less than equal to 2. So it will only print them.



    I have file with contents:



    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15


    The output is:



    1
    2
    3
    6
    7
    8
    11
    12
    13


    Or as suggested in comments you can use:



    awk '(NR - 1) % 5 <= 2' file





    share|improve this answer





















    • 3





      Or, with idiomatic use of awk syntax: awk '(NR - 1) % 5 <= 2' file

      – Kusalananda
      Mar 30 at 8:39













    • Thanks I didnt know it.

      – Prvt_Yadv
      Mar 30 at 9:33











    • awk '{if((NR-1) % 5<=2){print $0}}' file Thank you, this work very good for me but increasing 1 to line awk '{if((NR-1) % 6<=2){print $0}}' file

      – Jaguar Jom
      Apr 1 at 0:42


















    6














    A simple command is:



    awk '{if((NR-1) % 5<=2){print $0}}' file


    It will only print first 3 lines in sequence of 5 lines. Because (NR-1)%5 will give output like 0 1 2 3 4, and first 3 lines are less than equal to 2. So it will only print them.



    I have file with contents:



    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15


    The output is:



    1
    2
    3
    6
    7
    8
    11
    12
    13


    Or as suggested in comments you can use:



    awk '(NR - 1) % 5 <= 2' file





    share|improve this answer





















    • 3





      Or, with idiomatic use of awk syntax: awk '(NR - 1) % 5 <= 2' file

      – Kusalananda
      Mar 30 at 8:39













    • Thanks I didnt know it.

      – Prvt_Yadv
      Mar 30 at 9:33











    • awk '{if((NR-1) % 5<=2){print $0}}' file Thank you, this work very good for me but increasing 1 to line awk '{if((NR-1) % 6<=2){print $0}}' file

      – Jaguar Jom
      Apr 1 at 0:42
















    6












    6








    6







    A simple command is:



    awk '{if((NR-1) % 5<=2){print $0}}' file


    It will only print first 3 lines in sequence of 5 lines. Because (NR-1)%5 will give output like 0 1 2 3 4, and first 3 lines are less than equal to 2. So it will only print them.



    I have file with contents:



    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15


    The output is:



    1
    2
    3
    6
    7
    8
    11
    12
    13


    Or as suggested in comments you can use:



    awk '(NR - 1) % 5 <= 2' file





    share|improve this answer















    A simple command is:



    awk '{if((NR-1) % 5<=2){print $0}}' file


    It will only print first 3 lines in sequence of 5 lines. Because (NR-1)%5 will give output like 0 1 2 3 4, and first 3 lines are less than equal to 2. So it will only print them.



    I have file with contents:



    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15


    The output is:



    1
    2
    3
    6
    7
    8
    11
    12
    13


    Or as suggested in comments you can use:



    awk '(NR - 1) % 5 <= 2' file






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Mar 30 at 9:42

























    answered Mar 30 at 4:36









    Prvt_YadvPrvt_Yadv

    3,16131330




    3,16131330








    • 3





      Or, with idiomatic use of awk syntax: awk '(NR - 1) % 5 <= 2' file

      – Kusalananda
      Mar 30 at 8:39













    • Thanks I didnt know it.

      – Prvt_Yadv
      Mar 30 at 9:33











    • awk '{if((NR-1) % 5<=2){print $0}}' file Thank you, this work very good for me but increasing 1 to line awk '{if((NR-1) % 6<=2){print $0}}' file

      – Jaguar Jom
      Apr 1 at 0:42
















    • 3





      Or, with idiomatic use of awk syntax: awk '(NR - 1) % 5 <= 2' file

      – Kusalananda
      Mar 30 at 8:39













    • Thanks I didnt know it.

      – Prvt_Yadv
      Mar 30 at 9:33











    • awk '{if((NR-1) % 5<=2){print $0}}' file Thank you, this work very good for me but increasing 1 to line awk '{if((NR-1) % 6<=2){print $0}}' file

      – Jaguar Jom
      Apr 1 at 0:42










    3




    3





    Or, with idiomatic use of awk syntax: awk '(NR - 1) % 5 <= 2' file

    – Kusalananda
    Mar 30 at 8:39







    Or, with idiomatic use of awk syntax: awk '(NR - 1) % 5 <= 2' file

    – Kusalananda
    Mar 30 at 8:39















    Thanks I didnt know it.

    – Prvt_Yadv
    Mar 30 at 9:33





    Thanks I didnt know it.

    – Prvt_Yadv
    Mar 30 at 9:33













    awk '{if((NR-1) % 5<=2){print $0}}' file Thank you, this work very good for me but increasing 1 to line awk '{if((NR-1) % 6<=2){print $0}}' file

    – Jaguar Jom
    Apr 1 at 0:42







    awk '{if((NR-1) % 5<=2){print $0}}' file Thank you, this work very good for me but increasing 1 to line awk '{if((NR-1) % 6<=2){print $0}}' file

    – Jaguar Jom
    Apr 1 at 0:42













    5














    Basically, you want something like 'Fizz-Buzz' in awk ...



    awk '{ if (i++%5 < 3) print $0;}'


    To show this works...



    for x in 1 2 3 4 5 6 7 8 9 10 ; do echo $x; done |
    awk '{ if (i++%5 < 3) print $0;}'


    When your file is named, 'mybigfile.csv',



    awk '{ if (i++%5 < 3) print $0;}' < mybigfile.csv > mybigfile-123.csv





    share|improve this answer
























    • You could use NR, or just rely on i defaulting to zero :-) (code golf)

      – ChuckCottrill
      Mar 30 at 4:38


















    answered Mar 30 at 4:37









    ChuckCottrill









    5














    This can be solved using GNU sed:



    sed '4~5,5~5d' file


    Note that this uses a GNU-specific extension to POSIX sed (the first~step address), and thus doesn't work with, e.g., BSD sed on macOS. However, GNU sed can be installed on macOS with Homebrew, after which it is available as gsed. On Linux, GNU sed is usually the default.
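On a sed without the first~step extension, one possible POSIX-only sketch is to step through each group of five lines explicitly with the p (print) and n (next line) commands. This is a swapped-in technique, not the command from this answer:

```shell
# -n turns off automatic printing. One pass through the script covers 5 lines:
#   p;n;p;n;p  prints lines 1-3 of the group (n reads the next line in between),
#   n;n        reads past lines 4 and 5 without printing them.
# When n finds no more input, sed stops, so a short final group is handled.
seq 15 | sed -n 'p;n;p;n;p;n;n'
```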



    The command sed '4~5,5~5d' prints every line except the fourth and fifth line of every group of five lines. For a clearer example, sed '3~10,6~10d' selects lines 1, 2, 7, 8, 9 and 10 of every group of 10 lines by deleting lines 3 through 6.
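A quick way to check the ten-line example, assuming GNU sed (first~step is a GNU extension). Each range starts on a line matched by 3~10 and ends on the next line matched by 6~10:

```shell
# Lines 3-6 and 13-16 are deleted; 1-2, 7-12 and 17-20 survive.
seq 20 | sed '3~10,6~10d'
```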



    The top-voted answer suggests using awk '(NR-1)%5<3'. On my machine, on a file containing the numbers 1 through 2 million, this takes about 0.6 seconds, while the sed solution in this answer takes about 0.35 seconds. This is plausible, since sed is generally a simpler tool and can therefore run faster than the more complex, but more full-featured, awk.
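Before comparing speed, it is worth confirming that the two commands keep exactly the same lines. Here is a sketch that builds a similar 2-million-line input and compares the outputs. It assumes GNU sed and the standard mktemp and cmp utilities, and filters_agree is a hypothetical helper name:

```shell
# filters_agree FILE: succeed if the awk and GNU sed filters give identical output.
filters_agree() {
    awk_out=$(mktemp)
    sed_out=$(mktemp)
    awk '(NR-1)%5<3' "$1" > "$awk_out"
    sed '4~5,5~5d' "$1" > "$sed_out"
    cmp -s "$awk_out" "$sed_out"
    status=$?
    rm -f "$awk_out" "$sed_out"
    return "$status"
}

# Same kind of input as in the measurement above: the numbers 1 to 2,000,000.
input=$(mktemp)
seq 2000000 > "$input"
if filters_agree "$input"; then echo "outputs match"; else echo "outputs differ"; fi
rm -f "$input"
```

To reproduce the timings, prefix each filter with the time keyword of bash, ksh or zsh. Expect the numbers to vary with the machine and with which awk and sed implementations are installed.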








    • 3





      +1 ... or 4~5{N;d;}

      – steeldriver
      Mar 30 at 15:03
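steeldriver's variant can be read as follows: on lines 4, 9, 14, ... (the GNU 4~5 address), N appends the following line to the pattern space, and d then deletes both lines. A quick check, again assuming GNU sed:

```shell
# 4~5 selects lines 4, 9, 14, ...; N appends the next line, d deletes the pair.
seq 15 | sed '4~5{N;d;}'

# GNU sed's N prints the pattern space when there is no next line, so if the
# input ends on the 4th line of a group (line 14 here), that line survives.
# Guarding N with $! lets d delete that last line as well.
seq 14 | sed '4~5{$!N;d;}'
```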


















    answered Mar 30 at 13:47









    tomsmeding








    4














    A generic solution for masking out a particular pattern of lines from a file:



    #!/bin/sh

    # The pattern is given on the command line.
    pattern=$1

    # The period is simply the length of the pattern.
    period=${#pattern}

    # Use bc to convert the binary pattern to an integer.
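    mask=$( printf 'ibase=2; %s\n' "$pattern" | bc )

    # For line FNR, shift the period's top bit right by the line's 0-based
    # position within the period; the line is printed if that bit is set in mask.
    awk -v mask="$mask" -v period="$period" '
    BEGIN { p = lshift(1, period-1) }
    and(rshift(p, (FNR-1) % period), mask)'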


    This relies on awk implementing the non-standard functions and() (bitwise AND), rshift() and lshift() (bitwise right and left shift), which GNU awk and some BSD implementations of awk provide, but mawk does not.
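If your awk lacks and(), rshift() and lshift(), one possible workaround is to skip the conversion to an integer entirely and look up each line's pattern character with the standard substr() function. This is a swapped-in technique rather than the answer's method, and mask_lines is a hypothetical name:

```shell
# mask_lines PATTERN [FILE...]
# PATTERN is a string of 1s (keep) and 0s (delete); its length is the period.
mask_lines() {
    pattern=$1
    shift
    awk -v pattern="$pattern" '
    BEGIN { period = length(pattern) }
    # (FNR - 1) % period is 0-based, while substr() positions start at 1.
    substr(pattern, (FNR - 1) % period + 1, 1) == "1"' "$@"
}

seq 15 | mask_lines 11100
```

Because the pattern is never turned into a number, its length is not limited by awk's numeric precision.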



    The script takes a pattern: a binary number whose length gives the cyclic period and whose digits say which lines within each period should be kept or masked out. A 1 means "keep" and a 0 means "delete".



    For example, the pattern that applies to your question is 11100, which means "for each set of five lines, keep the first three and delete the others".



    Using 01001000 would delete all but the 2nd and 5th lines in every 8 lines.
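To see how the mask selects lines, here is a sketch that repeats the script's bit arithmetic with POSIX shell arithmetic instead of awk. The function name kept_lines and the digit-by-digit conversion (used here in place of bc) are assumptions for illustration:

```shell
# kept_lines PATTERN COUNT: print the line numbers in 1..COUNT that PATTERN keeps.
kept_lines() {
    pattern=$1
    count=$2
    period=${#pattern}

    # Binary string to integer, one digit at a time (what bc does with ibase=2).
    mask=0
    rest=$pattern
    while [ -n "$rest" ]; do
        digit=${rest%"${rest#?}"}    # first remaining character
        rest=${rest#?}               # drop that character
        mask=$(( mask * 2 + digit ))
    done

    line=1
    while [ "$line" -le "$count" ]; do
        # Line 1 of each period tests the leftmost (highest) bit of the mask.
        shift_by=$(( period - 1 - (line - 1) % period ))
        if [ $(( (mask >> shift_by) & 1 )) -eq 1 ]; then
            echo "$line"
        fi
        line=$(( line + 1 ))
    done
}

kept_lines 01001000 16
```

For 01001000 the mask is 72 (binary 1001000), and the loop reports lines 2, 5, 10 and 13.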



    The awk program could also be written without the BEGIN block as



    and(lshift(1, (period-1) - (FNR-1) % period), mask)


    Left-shifting 1 by (period-1) - (FNR-1) % period positions is the same as calculating 2 to that power, but I'm using lshift() since awk does its arithmetic in floating point rather than in exact integer arithmetic.



    Since the code relies on the binary representation of the pattern, very long patterns may not work well.
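One concrete limit applies if your awk stores numbers as IEEE double-precision floats, as gawk does without -M and as mawk does. In that case integers are exact only up to 2^53, so a mask built from a pattern longer than 53 digits may not be represented exactly. A quick check:

```shell
# 2^53 - 1 is still exact, but 2^53 + 1 rounds to 2^53 in double precision.
awk 'BEGIN { printf "%.0f\n%.0f\n", 2^53 - 1, 2^53 + 1 }'
```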



    Testing:



    Removing the lines you want to remove:



    $ sh script.sh 11100 <file
    1st line (keep)
    2nd line (keep)
    3rd line (keep)
    6th (keep)
    7nth (keep)
    8th lines (keep)
    11th (keep)
    12th (keep)
    13th (keep)


    Inverting the pattern:



    $ sh script.sh 00011 <file
    4rth lines (delete)
    5th (del)
    9th (del)
    10th (del)
    14th (del)
    15th (del)








        edited Mar 30 at 13:07

























        answered Mar 30 at 9:55









        Kusalananda























            1














            I tried the command below and it worked fine:



            for ((i=1; i<=20; i++)); do j=$((i+2)); sed -n "${i},${j}p" filename; i=$((j+2)); done


            Output:



            1st line (keep)
            2nd line (keep)
            3rd line (keep)
            6th (keep)
            7nth (keep)
            8th lines (keep)
            11th (keep)
            12th (keep)
            13th (keep)







            • 1





              That is nice, but you have to know how many lines you have in advance, and you're looping back from the beginning each round. It cannot be used on a stream, and it gets more inefficient the bigger the data gets, so since OP says the number of lines is very large, this is not the best solution.

              – Law29
              Mar 30 at 11:45
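To address the comment's concern, here is a sketch of a single-pass shell loop that needs no line count and reads each line only once, so it also works on a stream. keep_three_of_five is a hypothetical name. A pure shell read loop is still much slower than awk or sed on large files, so this mainly illustrates the counting logic:

```shell
# Read standard input once. n counts lines from 0, so n % 5 is 0, 1 or 2
# for the three lines to keep in each group of five.
keep_three_of_five() {
    n=0
    # The -n test also handles a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ $(( n % 5 )) -lt 3 ]; then
            printf '%s\n' "$line"
        fi
        n=$(( n + 1 ))
    done
}

# Usage, with the file name from the answer:
#   keep_three_of_five < filename
seq 15 | keep_three_of_five
```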


















            answered Mar 30 at 7:52









            Praveen Kumar BS
