Pandas Basics
Installation
pyenv activate guild # guild is the environment name
pip3 install numpy pandas # install numpy and pandas in the 'guild' environment
jupyter notebook # start up jupyter
Cheatsheet
tranx (loaded from transactions.csv in the overview below) is a pandas DataFrame that looks like this:
| | order_id | product | quantity | price | customer_id | transaction_date |
|---|---|---|---|---|---|---|
| 0 | 1 | Pasta | 43 | $11.39 | 4 | 17/6/19 |
| 1 | 2 | Lamb | 71 | $3.72 | 5 | 4/9/19 |
| 2 | 3 | Cake | 81 | $3.09 | 76 | 24/10/19 |
| 3 | 4 | Wine | 69 | $7.09 | 20 | 29/4/19 |
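If you do not have transactions.csv at hand, the four rows above can be rebuilt in memory to try the commands below. This is only a sketch: the name `sample` is made up for illustration, and the date format is an assumption (day/month/two-digit year, inferred from values such as 17/6/19).

```python
import pandas as pd

# The four sample rows from the table above; `sample` is a hypothetical name.
sample = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "product": ["Pasta", "Lamb", "Cake", "Wine"],
    "quantity": [43, 71, 81, 69],
    "price": ["$11.39", "$3.72", "$3.09", "$7.09"],  # still text at this point
    "customer_id": [4, 5, 76, 20],
    "transaction_date": ["17/6/19", "4/9/19", "24/10/19", "29/4/19"],
})

# Assumption: dates are day/month/year, so 4/9/19 means 4 September 2019.
sample["transaction_date"] = pd.to_datetime(
    sample["transaction_date"], format="%d/%m/%y"
)

print(sample.shape)
print(sample.dtypes)
```

Note that price is stored as text because of the dollar sign, so numeric operations on it need a conversion first.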
# overview
import pandas as pd
tranx = pd.read_csv('transactions.csv')
tranx.info()
tranx.shape
tranx.head(10) # first 10 rows
tranx["price"].describe()
tranx['price'].dtype
# null
tranx.isnull()
tranx.notnull()
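tranx.dropna() # returns a new DataFrame without rows that contain any null
tranx.dropna(subset=["price"]) # drop rows only where 'price' is null
tranx.fillna(0) # fillna needs a value; here every null becomes 0
tranx.fillna(value={"price":0.0, "quantity":0}) # per-column fill values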
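The null helpers above return new objects and leave the original DataFrame untouched, which is easy to miss. A small sketch on made-up data (the frame `df` and its values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing price and one missing quantity.
df = pd.DataFrame({
    "price": [11.39, np.nan, 3.09],
    "quantity": [43, 71, np.nan],
})

missing_per_column = df.isnull().sum()    # count of nulls in each column
cleaned = df.dropna(subset=["price"])     # new frame, rows with null price removed
filled = df.fillna(value={"price": 0.0, "quantity": 0})  # new frame, nulls replaced

print(missing_per_column)
print(len(df), len(cleaned))              # df itself is unchanged
```

To keep the result, assign it back, for example `df = df.dropna(subset=["price"])`.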
# selecting
tranx["price"] # column 'price'
tranx.iloc[0] # first row
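tranx.nlargest(2,columns=["price"]) # 2 rows with the highest price (needs a numeric column)
tranx.nsmallest(2,columns=["price"]) # 2 rows with the lowest price (needs a numeric column)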
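A short sketch of the selection commands on a small frame, assuming price has already been converted to a number. The frame `df` is hypothetical and reuses the products and prices from the table above.

```python
import pandas as pd

# Hypothetical frame with a numeric price column.
df = pd.DataFrame({
    "product": ["Pasta", "Lamb", "Cake", "Wine"],
    "price": [11.39, 3.72, 3.09, 7.09],
})

first_row = df.iloc[0]                  # select by position: first row as a Series
prices = df["price"]                    # select one column as a Series
top = df.nlargest(2, columns=["price"])      # two most expensive rows
bottom = df.nsmallest(2, columns=["price"])  # two cheapest rows

print(top)
print(bottom)
```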
# sorting & grouping
tranx.sort_values(["price"], ascending=False)
tranx.groupby("customer_id")["price"].sum() # total price per customer (price must be numeric)
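Grouping is easier to follow with a concrete result. The sketch below uses made-up orders (the frame `orders` and the output column names `total_price` and `n_orders` are hypothetical) and shows both a single-column sum and a named aggregation.

```python
import pandas as pd

# Hypothetical orders: customers 4 and 5 each appear twice.
orders = pd.DataFrame({
    "customer_id": [4, 5, 4, 5, 76],
    "price": [10.0, 2.5, 1.5, 7.5, 1.0],
})

# Sum of 'price' for each customer, returned as a Series indexed by customer_id.
per_customer = orders.groupby("customer_id")["price"].sum()

# Several statistics at once, with explicit output column names.
summary = orders.groupby("customer_id").agg(
    total_price=("price", "sum"),
    n_orders=("price", "count"),
)

# Highest spenders first.
ranked = per_customer.sort_values(ascending=False)
print(summary)
```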
# filter row
new_df = tranx[(tranx['product']=='Honey') & (tranx['price']==0.0)] # wrap each condition in parentheses and combine with &
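Row filters combine boolean Series with `&` (and), `|` (or), and `~` (not), and each condition needs its own parentheses. A minimal sketch on hypothetical data, with the equivalent `query` form for comparison:

```python
import pandas as pd

# Hypothetical data: only the first row is Honey with a price of 0.0.
df = pd.DataFrame({
    "product": ["Honey", "Honey", "Wine"],
    "price": [0.0, 4.5, 0.0],
})

# Boolean-mask form: both conditions must hold.
free_honey = df[(df["product"] == "Honey") & (df["price"] == 0.0)]

# Same filter written as a query string.
same_rows = df.query("product == 'Honey' and price == 0.0")

# Either condition may hold.
honey_or_free = df[(df["product"] == "Honey") | (df["price"] == 0.0)]
print(free_honey)
```

Python's plain `and` and `or` do not work on Series here, which is why the element-wise operators are used.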
# column operations
# strip_off_dollar is a user-defined function that takes one value (e.g. '$11.39') and returns it as a number
tranx['price'] = tranx['price'].apply(strip_off_dollar)
tranx['total'] = tranx['price'] * tranx["quantity"]
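strip_off_dollar is not defined anywhere in this cheatsheet, so the version below is only a guess at what it might look like, assuming prices are strings such as '$11.39'. The frame `df` is hypothetical sample data.

```python
import pandas as pd

def strip_off_dollar(value):
    """Hypothetical helper: turn a string such as '$11.39' into the float 11.39."""
    return float(str(value).replace("$", "").replace(",", ""))

# Two rows from the sample table, with price still stored as text.
df = pd.DataFrame({"price": ["$11.39", "$3.72"], "quantity": [43, 71]})

df["price"] = df["price"].apply(strip_off_dollar)  # text -> float, one value at a time
df["total"] = df["price"] * df["quantity"]         # element-wise column arithmetic
print(df)
```

For larger files, the same conversion can usually be done without `apply`, for example with `df["price"].str.replace("$", "", regex=False).astype(float)`.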