COMP 2314 Linux Programming for Data Analysis

The Linux operating system provides the infrastructure that powers the vast majority of the Internet. This course introduces Linux, with a particular focus on writing code and algorithms on the command line, in scripts, and in programming languages to manipulate massive amounts of data and achieve analytic objectives. The course covers the following topics: the Linux command line (also known as the shell) and its built-in features; permissions and processes; regular expressions on the command line and within code; the 'gawk' programming language; version control and automated application building; advanced Python features (such as generators) and Python libraries; and visualization packages such as matplotlib. There is special emphasis on using Hadoop and Spark to build algorithms for massively scalable computations on clusters of Linux computers. Prerequisite(s): COMP 1300.

Credits

4