[Recommended] datamash: a Swiss Army knife for processing text data

GNU datamash is a command-line program that performs basic numeric, textual and statistical operations (e.g. count, sum, min, max, mean, 1st quartile, median, 3rd quartile, IQR, stdev, string coalescing) on input textual data files.
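To make the listed operations concrete, here is a minimal Python sketch (not datamash's own implementation) of what a command such as `datamash sum 2 mean 2 median 2` reports for one column. The helpers `summarize` and `collapse` are hypothetical names. The sketch assumes TAB-separated input with 1-based field numbers, as in datamash's default mode. The quartiles come from Python's `statistics.quantiles` with `method="inclusive"`, which may not match datamash's exact quantile convention. `sstdev` here is the sample standard deviation.

```python
import statistics


def summarize(lines, field):
    """Datamash-style summary statistics for one TAB-separated column.

    Hypothetical helper: `field` is 1-based. No input validation is done here,
    so a non-numeric value simply raises Python's own ValueError.
    """
    values = [float(line.rstrip("\n").split("\t")[field - 1]) for line in lines]
    # Three cut points: 1st quartile, median, 3rd quartile.
    # method="inclusive" is an assumption; datamash may interpolate differently.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "sstdev": statistics.stdev(values),  # sample standard deviation
    }


def collapse(lines, field):
    """String coalescing: join one column's values with commas (hypothetical helper)."""
    return ",".join(line.rstrip("\n").split("\t")[field - 1] for line in lines)


data = ["a\t1\n", "b\t2\n", "c\t3\n", "d\t4\n", "e\t5\n"]
stats = summarize(data, field=2)
print(stats["sum"], stats["median"], stats["iqr"])  # 15.0 3.0 2.0
print(collapse(data, field=1))  # a,b,c,d,e
```

With datamash itself, the same kind of result would be a single command line rather than a script, which is the convenience the rest of this post describes.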

GNU datamash is designed for ease of use, strict input validation, and robust operation. If datamash is not available, some operations could be performed using existing software (such as awk, Perl, R). Using Datamash has the following advantages over simple one-liners:

  • Datamash performs strict validation on its input and provides informative error messages when invalid input is found.
  • Datamash operations are simpler to type and less error-prone than hand-written one-liners.
  • Datamash supports header lines (-H/--headers) on all operations.
  • Datamash supports printing the entire line (-f/--full), not just the field being processed.
  • Datamash’s output is suitable both for interactive command-line use and for scripting, automation and downstream processing by other tools.
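The header-line and strict-validation points can be illustrated with a hypothetical Python helper, `grouped_sum`, that roughly mirrors `datamash -s -H -g 1 sum 2`: sort, treat the first line as a header, group by field 1, and sum field 2. This is a sketch under those assumptions, not datamash's code, and its error messages are made up. It stops at the first bad value and names the offending line and field. By contrast, an awk one-liner such as `awk '{s[$1] += $2}'` silently treats a non-numeric string as 0.

```python
from itertools import groupby


def grouped_sum(lines, key_field=1, value_field=2, header=True):
    """Sum one column per group, in the spirit of `datamash -s -H -g 1 sum 2`.

    Hypothetical helper: fields are 1-based and TAB-separated. Input is sorted
    by key before grouping, which is what datamash's -s option asks for.
    """
    rows = [line.rstrip("\n").split("\t") for line in lines]
    # Treat the first line as a header instead of data (like -H).
    header_row = rows.pop(0) if header and rows else None
    first_lineno = 2 if header_row is not None else 1
    needed = max(key_field, value_field)
    parsed = []
    for lineno, cols in enumerate(rows, start=first_lineno):
        # Strict validation: report missing fields and non-numeric values.
        if len(cols) < needed:
            raise ValueError(f"line {lineno}: expected at least {needed} fields, got {len(cols)}")
        raw = cols[value_field - 1]
        try:
            value = float(raw)
        except ValueError:
            raise ValueError(
                f"line {lineno} field {value_field}: invalid numeric value {raw!r}"
            ) from None
        parsed.append((cols[key_field - 1], value))
    # groupby only merges adjacent rows, so sort by key first.
    parsed.sort(key=lambda kv: kv[0])
    sums = [(key, sum(v for _, v in grp)) for key, grp in groupby(parsed, key=lambda kv: kv[0])]
    return header_row, sums


lines = ["name\tscore\n", "bob\t2\n", "alice\t3\n", "bob\t5\n"]
header_row, sums = grouped_sum(lines)
print(header_row)  # ['name', 'score']
print(sums)  # [('alice', 3.0), ('bob', 7.0)]
```

Passing a line such as `"bob\tx\n"` after the header makes the sketch raise a `ValueError` that mentions line 2 and field 2 instead of producing a wrong total.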

GNU datamash runs on a wide variety of UNIX platforms, Windows, and macOS.

Further links: