Modern Programming for Data Analytics - Course Overview

Instructor: Shawn T. Brown, PhD.

The course syllabus is on GitHub at:

https://github.com/CMU-MS-DAS-Modern-Programming-Mini/Syllabus

The Goals of the Course

  • Given a task in data analytics, be able to pick the right algorithm and apply Python tools to solve it.
  • Gain insight into the process of how Python and Python-based tools work and can be applied properly for data analytics.
  • Be able to apply the techniques learned in the class to manipulate, process, and manage data sets ranging from small and simple to large and complex.
  • Be able to present your software and develop effective software packages in Python within a team development environment.

Topics we will try to cover:

  • Brief introduction to Python3 and the development ecosystem
  • Introduction to object-oriented and procedural programming models and basic software architecture principles
  • Professional programming techniques for modern software development: version control and team development (Git and GitHub), coding standards, unit and regression testing (PyTest) and continuous integration (TravisCI)
  • Introduction to R and RStudio
  • Developing reusable, sharable, and interactive electronic notebooks with Jupyter
  • Python environment management: Virtualenv and Anaconda
  • Fundamentals of data structures and their implementation in Python
  • Python packages for science and data science: NumPy, SciPy, Pandas, StatsModels
  • Data processing techniques for small, medium and large datasets
  • Manual and programmatic metadata standards
  • Data Analytics with Python: Optimization, Linear and Non-Linear Regression, Mathematical Modeling, Monte Carlo Sampling, Distributions, and Clustering

Assignments and Grading

There will be 3 programming assignments throughout the mini semester and a final project.

  • Each programming assignment will be 20% of the grade
    • All assignments will be submitted through GitHub; specific instructions will be given with each assignment
    • All assignments will be due 7 days after the assignment is given; 3 percentage points will be deducted for every day that the assignment is late
  • Final Project will be 40% of the grade
    • The final project will include a code walkthrough in-class presentation
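
As a rough illustration of how the weights and the late deduction combine, here is a minimal Python sketch. The function names are hypothetical. It also assumes that the 3-point deduction is taken from the individual assignment's score (on a 0-100 scale) and that a score cannot drop below zero; neither detail is stated in the policy, so treat this as one possible reading and not the official grading formula.

```python
# Weights taken from the grading breakdown in this overview.
ASSIGNMENT_WEIGHT = 0.20     # each of the 3 programming assignments
PROJECT_WEIGHT = 0.40        # final project
LATE_PENALTY_PER_DAY = 3.0   # percentage points deducted per late day


def apply_late_penalty(score, days_late):
    """Return an assignment score (0-100) after the late deduction.

    Assumption: the deduction comes off the assignment's own score
    and the result never drops below zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    return max(0.0, score - LATE_PENALTY_PER_DAY * days_late)


def course_grade(assignments, project_score):
    """Combine three (score, days_late) pairs with the project score."""
    if len(assignments) != 3:
        raise ValueError("expected exactly 3 programming assignments")
    total = PROJECT_WEIGHT * project_score
    for score, days_late in assignments:
        total += ASSIGNMENT_WEIGHT * apply_late_penalty(score, days_late)
    return total


if __name__ == "__main__":
    # Hypothetical student: second assignment turned in 2 days late.
    print(course_grade([(90, 0), (85, 2), (100, 0)], project_score=95))
```

The three assignment weights and the project weight sum to 100%, so a student with perfect, on-time work receives a grade of 100 under this sketch.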

How will we interact?

We will attempt to do as much of the class interaction as possible through GitHub

GitHub is a web platform that:

  • Provides software developers a place to interact
  • Provides a place to share version-controlled code
  • Provides a place to manage software release and distribution

Getting a GitHub account (if you don't already have a GitHub account):

  1. Go to github.com
  2. Click "Sign Up" at the top right
  3. Go through the process
  4. Send your GitHub user name to stbrown@andrew.cmu.edu

GitHub Organization

We have created an organization on GitHub called: CMU-MS-DAS-Modern-Programming-Mini

The organization allows us to group all of the materials and work done for the class

GitHub Team

We can add teams to the GitHub organization, which allow us to control groups of developers.

You can see our current teams at https://github.com/orgs/CMU-MS-DAS-Modern-Programming-Mini/teams

We currently have a team called "Class-2021", and I will make each of you a member once you send me your GitHub user name.

General Help with the course

Once you are in our GitHub organization, you will gain access to the Syllabus repository

You can ask questions through the **Issues** page

Software Development Cycles

  • Requirements gathering
  • Functional Specification
  • Initial Development Phase
  • Release Cycle
    • Testing
    • Addressing Issues
    • Release Software
    • Repeat

A few things to keep in mind when developing software

  • Elegant is the enemy of good
  • Testing, testing, testing, testing...
  • Performance is the second step, not the first
  • Planning is a good thing
  • Pick the right tool for the job
  • Don't be afraid to use other people's work
  • Invest early in version control (Git)
  • Just because you are doing scientific software development doesn't mean you get to write crappy code
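
To make the "testing, testing, testing" point concrete, here is a minimal sketch of what a unit test can look like in the style that pytest (listed in the course topics) collects. The `mean` function and the test names are hypothetical examples and not part of any course assignment. Only plain `assert` statements are used, so the file also runs without pytest installed.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if len(values) == 0:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


# pytest collects functions whose names start with "test_" and reports
# a failure whenever an assert statement inside them is false.
def test_mean_of_known_values():
    assert mean([1, 2, 3, 4]) == 2.5


def test_mean_of_single_value():
    assert mean([7]) == 7


def test_mean_rejects_empty_input():
    # The function should refuse empty input rather than divide by zero.
    try:
        mean([])
    except ValueError:
        return
    assert False, "expected ValueError for empty input"
```

Saved under a name such as `test_stats.py` (a hypothetical file name), the tests could be run with `pytest test_stats.py`. Writing small tests like these before tuning anything for speed fits the advice above that performance is the second step, not the first.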

A few things to think about when developing as a team

  • Establish rules for group development based on best practices
    • Code reviews
    • Testing procedures (automated testing)
    • Coding standards (that are enforced)
    • Documentation requirements
  • Spend time up front on setting up the collaborative environment
  • Spend time creating a contribution plan and software philosophy
  • Adopt a software development project management methodology (Scrum, Agile, DevOps)

Assignments for next class (not graded)