Modern Programming for Data Analytics - pandas

What is pandas?

  • a Python package that provides fast, flexible and expressive data structures
  • designed to work with relational and labeled data
  • it is suited to work with:
    • tabular data with heterogeneous types
    • ordered and unordered time series data
    • arbitrary matrix data
    • observational/statistical data sets
  • pandas is fast, with many high-performance algorithms under the hood

What pandas does well:

  • handling of missing data
  • mutable data structures (inserting and deleting data)
  • automatic and explicit data alignment
  • flexible groupby functionality and advanced data querying
  • high interoperability with NumPy and SciPy
  • easy and intelligent slicing, indexing and subsetting of large data sets
  • intuitive merging and joining of data
  • hierarchical and multilevel datasets
  • robust i/o operations to read and write multiple data formats
  • time series-specific functionality
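
A minimal sketch of three items from the list above: automatic data alignment, handling of missing data, and groupby. The labels, values, and column names (team, score) are made up for illustration and are not taken from the course material.

```python
import pandas as pd

# Two Series with partially overlapping labels (illustrative values).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Automatic alignment: values are matched by label, not by position.
# Label "x" has no partner in b, so the result there is NaN (missing).
total = a + b

# Handling missing data: count it, then either fill it or drop it.
n_missing = int(total.isna().sum())
filled = total.fillna(0.0)
dropped = total.dropna()

# groupby: split rows by a key column and aggregate each group.
df = pd.DataFrame(
    {"team": ["red", "blue", "red", "blue"], "score": [1, 2, 3, 4]}
)
per_team = df.groupby("team")["score"].sum()

print(total)
print(per_team)
```

Because the addition is aligned on labels, no index bookkeeping is needed before combining the two Series; the missing entry simply surfaces as NaN and can be filled or dropped afterwards.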

pandas Data Structures

  • Series - 1-D labeled homogeneously-typed array (that type may be object)
  • DataFrame - general 2-D labeled, size-mutable tabular structure with potentially heterogeneously-typed columns
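
The two structures can be sketched as follows. This is only an illustration: the names, values, and column labels (name, age, height_m) are hypothetical.

```python
import pandas as pd

# Series: a 1-D labeled array holding a single dtype.
s = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"])

# Mixing types in one Series is allowed; the common dtype becomes object.
mixed = pd.Series([1, "two", 3.0])

# DataFrame: a 2-D labeled table whose columns may have different dtypes.
df = pd.DataFrame(
    {"name": ["Ann", "Bob"], "age": [31, 45], "height_m": [1.70, 1.82]}
)

# Size-mutable: columns can be added and removed after construction.
df["age_next_year"] = df["age"] + 1
df = df.drop(columns=["height_m"])

# Each column of a DataFrame is itself a Series.
age = df["age"]

print(s.dtype, mixed.dtype)
print(df)
```

Selecting a single column returns a Series, so everything that works on a Series (arithmetic, missing-data handling, aggregation) also applies column by column to a DataFrame.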

pandas DataFrame