Modern Programming for Data Analytics - pandas
What is pandas?
- a Python package that provides fast, flexible and expressive data structures
- designed to work with relational and labeled data
- it is well suited for:
  - tabular data with heterogeneously-typed columns
  - ordered and unordered time series data
  - arbitrary matrix data with row and column labels
  - observational/statistical data sets
- pandas is fast: many of its low-level algorithms are heavily optimized, largely in Cython code
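As a rough illustration of the kinds of data listed above, here is a minimal sketch. The column names, dates, and values are hypothetical and chosen only to show a mixed-type table, a date-indexed time series, and a labeled matrix.

```python
import numpy as np
import pandas as pd

# Tabular data with heterogeneously-typed columns (hypothetical values)
people = pd.DataFrame({
    "name": ["Ada", "Grace", "Alan"],
    "age": [36, 45, 41],
    "height_m": [1.65, 1.52, 1.78],
})
print(people.dtypes)  # one dtype per column

# Ordered time series data: a Series indexed by daily dates
ts = pd.Series([10.0, 12.5, 11.0],
               index=pd.date_range("2024-01-01", periods=3, freq="D"))
print(ts)

# Arbitrary matrix data: a NumPy array wrapped with row and column labels
matrix = pd.DataFrame(np.arange(6).reshape(2, 3),
                      index=["r1", "r2"], columns=["a", "b", "c"])
print(matrix)
```

Each column of `people` keeps its own type, while `ts` and `matrix` attach labels (dates, row and column names) to otherwise plain numeric data.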
What pandas does well:
- easy handling of missing data (represented as NaN, NA, or NaT)
- size-mutable data structures (data can be inserted and deleted)
- automatic and explicit data alignment
- flexible groupby functionality and advanced data querying
- high interoperability with NumPy and SciPy
- easy and intelligent slicing, indexing and subsetting of large data sets
- intuitive merging and joining of data
- hierarchical (multilevel) indexing of axes
- robust I/O tools for reading and writing many data formats (e.g. CSV, Excel, SQL databases, HDF5)
- time-series-specific functionality
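Several of these strengths can be seen together in one short sketch. The sales and store tables are hypothetical; the code uses only standard pandas calls (`isna`, `fillna`, label-aligned arithmetic, `groupby`, `merge`, `to_csv`/`read_csv`). Filling missing values with 0 is just one illustrative choice, not a general recommendation.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sales data with one missing value (NaN)
sales = pd.DataFrame({
    "store": ["A", "B", "A", "B"],
    "units": [10, np.nan, 7, 5],
})

# Missing data: count it, then fill it (here with 0)
print(sales["units"].isna().sum())  # 1
sales["units"] = sales["units"].fillna(0)

# Automatic alignment: arithmetic matches index labels, not positions;
# labels present in only one Series produce NaN
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])
aligned = s1 + s2  # a: NaN, b: 12.0, c: 23.0, d: NaN

# Group-by: split rows by store, then sum units within each group
totals = sales.groupby("store")["units"].sum()  # A: 17.0, B: 5.0

# Merging: attach a city to each store via a shared key column
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Oslo", "Lima"]})
merged = totals.reset_index().merge(stores, on="store", how="left")
print(merged)

# I/O: write to CSV and read it back (an in-memory buffer stands in for a file)
buf = io.StringIO()
merged.to_csv(buf, index=False)
buf.seek(0)
roundtrip = pd.read_csv(buf)
```

Because of alignment, `s1 + s2` never silently pairs mismatched rows: unmatched labels become NaN, which the missing-data tools can then handle.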
pandas Data Structures
- Series - 1-D labeled homogeneously-typed array (that type may be object)
- DataFrame - general 2-D labeled, size-mutable tabular structure with potentially heterogeneously-typed columns
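The difference between the two structures can be sketched as follows. The labels and values are hypothetical: a Series carries one dtype (falling back to `object` when the values are mixed), while a DataFrame holds one dtype per column and can grow or shrink.

```python
import pandas as pd

# A Series holds a single dtype; mixed values fall back to the object dtype
nums = pd.Series([1.5, 2.0, 3.25], index=["x", "y", "z"])
mixed = pd.Series([1, "two", 3.0])
print(nums.dtype, mixed.dtype)  # float64 object

# A DataFrame's columns may each have a different dtype
df = pd.DataFrame({"id": [1, 2], "label": ["a", "b"], "score": [0.5, 0.75]})
print(df.dtypes)

# Size mutability: insert a new column, then delete an existing one
df["passed"] = df["score"] > 0.6
df = df.drop(columns="label")
print(df.columns.tolist())  # ['id', 'score', 'passed']
```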
pandas DataFrame