Split a text file in half (or any percentage) on Ubuntu Linux
If you have an unwieldy text file that you are trying to process,
splitting it into sections can sometimes cut processing time, especially
if you are going to import the file into a spreadsheet. Or you might want
to retrieve just a particular set of lines from a file.
Enter split, wc, tail, cat, and grep (and don’t forget sed and awk). Linux contains a rich set of utilities for working with text files on the command line. For our task today we will use split and wc.
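For the other case mentioned above, retrieving a particular set of lines, sed or a head/tail pipeline is one common option. The following is a minimal sketch rather than part of the walkthrough; the sample file and the line range 3 to 5 are made up for illustration.

```shell
# Build a small sample file with 10 numbered lines (a stand-in for a real log).
sample=$(mktemp)
seq 1 10 > "$sample"

# Print only lines 3 through 5; -n suppresses sed's default output of every line.
sed -n '3,5p' "$sample"

# The same range with head and tail: keep the first 5 lines, then the last 3 of those.
head -n 5 "$sample" | tail -n 3
```

On a very large file, the sed form still reads to the end of the file unless you tell it to quit early, so the head/tail pipeline may finish sooner when the range is near the top.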
First, let’s take a look at our log file:
> ls -l
-rw-r--r-- 1 thegeek ggroup 42046520 2006-09-19 11:42 access.log
We see that the file size is 42MB. That’s kinda big… but how many lines are we dealing with? If we wanted to import this into Excel, we would need to keep it under 65,536 rows, the limit in older versions of Excel.
Let’s check the number of lines in the file using the wc utility, which stands for “word count”. Its -l option counts lines instead of words.
> wc -l access.log
146330 access.log
We’re way over our limit. We’ll need to split this into 3 segments. We’ll use the split utility to do this.
> split -l 60000 access.log
> ls -l
total 79124
-rw-rw-r-- 1 thegeek ggroup 40465200 2006-09-19 12:00 access.log
-rw-rw-r-- 1 thegeek ggroup 16598163 2006-09-19 12:05 xaa
-rw-rw-r-- 1 thegeek ggroup 16596545 2006-09-19 12:05 xab
-rw-rw-r-- 1 thegeek ggroup 7270492 2006-09-19 12:05 xac
We’ve now split our text file into 3 separate files, each containing at most 60000 lines, which seemed like a good number to choose. The last file contains the leftover lines. If you were going to cut this particular file in half, you’d have used half the line count (146330 / 2 = 73165):
> split -l 73165 access.log
And that’s all there is to it.
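The same arithmetic works for any percentage, not just half. Here is a rough sketch of how it could be scripted, assuming GNU coreutils. The file name sample.log, the part_ prefix, and the pct variable are made up for illustration and do not come from the steps above.

```shell
# Hypothetical stand-in for access.log: 1000 numbered lines in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 1000 > sample.log

pct=25                                     # size of each piece, as a percentage of the file
total=$(wc -l < sample.log)                # reading from stdin makes wc print only the number
per_file=$(( (total * pct + 99) / 100 ))   # round up, so 50% of an odd count still gives 2 pieces

# Split into pieces of $per_file lines, named part_aa, part_ab, and so on.
split -l "$per_file" sample.log part_

# Show the line count of each piece.
wc -l part_*

# Joining the pieces in name order should reproduce the original byte for byte.
cat part_* | cmp - sample.log && echo "pieces match the original"
```

With pct=50 and the 146330-line access.log from this article, the same formula gives 73165 lines per piece, matching the command shown earlier.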
Reviewed by Pakainfo on August 08, 2018