All you need to handle dates and timestamps in Pandas! Many examples provided.
6 min read · Mar 4, 2020
- Dates in general
More specifically, handling operations with…
- Days
- Weeks
- Months
- Years
Extra: Generating date ranges
- Generate Sequential date ranges
- Generate Random date ranges
(*) Before running any code, please import pandas!
import pandas as pd
Now, let’s get started.
Changing the date format of a DataFrame column
You can choose any format you like by following the strftime documentation.
So for example, starting from this DataFrame:
Change the Date Format, with:
df['date'] = df['date'].apply(lambda x: pd.Timestamp(x).strftime('%Y-%m-%d'))
Or, we can go a bit more exotic and do:
df['date'] = df['date'].apply(lambda x: pd.Timestamp(x).strftime('%B-%d-%Y %I:%M %p'))
Remember: every format code you can use for timestamps is listed in the strftime documentation.
Try it yourself! Starting from this example-built DataFrame:
df = pd.DataFrame({'date': ['2018-08-09 11:10:55','2019-03-02 13:15:21']})
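As a possible alternative to the `apply` calls above, the column can be parsed once and then formatted with the `.dt` accessor. This is only a sketch: the column names `date_short` and `date_long` are made up for illustration, and the month name and AM/PM marker depend on the active locale.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2018-08-09 11:10:55', '2019-03-02 13:15:21']})

# Parse the whole column once instead of building a Timestamp per row
parsed = pd.to_datetime(df['date'])

# Format the parsed column with the same strftime codes used above
df['date_short'] = parsed.dt.strftime('%Y-%m-%d')
df['date_long'] = parsed.dt.strftime('%B-%d-%Y %I:%M %p')
print(df)
```

Both approaches produce strings, so the resulting columns are of object/string type again.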
Change column type — from Object/String to DateTime
# 4 possible options
from dateutil.parser import parse  # only needed for option 3
# 1
df['date'] = pd.to_datetime(df['date'])
# 2
df['date'] = df['date'].astype('datetime64[ns]')
# 3
df['date'] = df['date'].apply(lambda x: parse(x))
# 4
df['date'] = df['date'].apply(lambda x: pd.Timestamp(x))
Example (we try only one of the 4 options, but all of them should work)
df = pd.DataFrame({'date': ['2018-08-09 11:10:55','2019-01-02 13:15:21']})
df
df.dtypes
df['date'] = pd.to_datetime(df['date'])
df
df.dtypes
Output:
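When the strings are not all clean, it may help to pass an explicit format and decide what happens to values that cannot be parsed. The sketch below assumes a made-up Series called `raw` containing one invalid entry; `format` and `errors='coerce'` are regular `pd.to_datetime` parameters.

```python
import pandas as pd

# Hypothetical input: one valid timestamp and one value that is not a date
raw = pd.Series(['2018-08-09 11:10:55', 'not a date'])

# An explicit format avoids guessing; errors='coerce' turns bad values into NaT
parsed = pd.to_datetime(raw, format='%Y-%m-%d %H:%M:%S', errors='coerce')

print(parsed)
print(parsed.isna())  # True marks the rows that could not be converted
```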
Going from String to Date format, and from Date format to String
Example with isolated variables:
from dateutil.parser import parse
str_date = '2018-05-01'
# String to Date:
date_1 = parse(str_date)
print('date_1: ', date_1, type(date_1))
# Date to String:
date_2 = date_1.strftime('%Y-%m-%d')
print('date_2: ', date_2, type(date_2))
Output:
From Unix/Epoch time to Readable date format
df['date'] = pd.to_datetime(df['date'],unit='s')
Example:
df = pd.DataFrame({'date': [1349720105,1349806505]})
df
df['date'] = pd.to_datetime(df['date'],unit='s')
df
Output (before and after):
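The opposite direction, from a datetime column back to Unix seconds, can be sketched as follows. The column name `epoch_s` is hypothetical, and the timestamps are assumed to be timezone-naive UTC values.

```python
import pandas as pd

df = pd.DataFrame({'date': [1349720105, 1349806505]})
df['date'] = pd.to_datetime(df['date'], unit='s')

# Seconds elapsed since 1970-01-01, computed as a difference so the result
# does not depend on the column's internal resolution
epoch = pd.Timestamp('1970-01-01')
df['epoch_s'] = (df['date'] - epoch) // pd.Timedelta(seconds=1)
print(df)
```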
Add and subtract dates
Use Timedelta! Example:
from datetime import datetime, timedelta
from dateutil.parser import parse
parse('2019-04-07') - timedelta(days=3)
# or, to get it as a string
(parse('2019-04-07') - timedelta(days=3)).strftime('%Y-%m-%d')
Output:
# with date format
datetime.datetime(2019, 4, 4, 0, 0)
# with string format
'2019-04-04'
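The same arithmetic also works on a whole DataFrame column with `pd.Timedelta`. A minimal sketch, with made-up column names `minus_3d` and `plus_12h`:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2019-04-07', '2019-01-01'])})

# Subtract 3 days from, and add 12 hours to, every row at once
df['minus_3d'] = df['date'] - pd.Timedelta(days=3)
df['plus_12h'] = df['date'] + pd.Timedelta(hours=12)
print(df)
```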
Get the difference between two dates
Convert both strings into date format, and then do the calculation. Example:
from dateutil.parser import parse
d1 = parse('2018-12-01')
d2 = parse('2018-12-08')
abs((d2 - d1).days)
Output:
7
# 7 days
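For two datetime columns, the difference can be taken row by row in a similar way. The sketch below assumes hypothetical columns `start` and `end`; subtracting them gives timedeltas, and `.dt.days` extracts the whole days.

```python
import pandas as pd

df = pd.DataFrame({
    'start': pd.to_datetime(['2018-12-01', '2019-02-27']),
    'end': pd.to_datetime(['2018-12-08', '2019-03-01']),
})

# Subtracting two datetime columns yields a timedelta column
df['days_between'] = (df['end'] - df['start']).dt.days.abs()
print(df)
```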
Get the day from a Date
# for a column in a DataFrame
# (the column must already be of datetime type; .dt is the pandas accessor, no import needed)
df['day'] = df['date'].dt.day
# for a single value
from dateutil.parser import parse
parse('2018-08-09').day
Output:
9
Get week number of the year
Example:
df = pd.DataFrame({'date': ['2018-08-09 11:10:55','2019-01-02 13:15:21']})
# if date column type is a string
# (.week was removed in pandas 2.0; isocalendar().week replaces it)
df['week'] = pd.to_datetime(df['date']).dt.isocalendar().week
# if date column type is a datetime
# df['week'] = df['date'].dt.isocalendar().week
Output:
(*) To create a Week column, in the format yyyy-ww, use:
df = pd.DataFrame({'date': ['2018-08-09 11:10:55','2019-03-02 13:15:21']})
# if column type is a string/object
df['yyyy_ww'] = pd.DatetimeIndex(df['date']).strftime('%Y-%U')
# if column type is a datetime
# df['yyyy_ww'] = df['date'].dt.strftime('%Y-%U')
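One thing to keep in mind: `%U` counts weeks starting on Sunday (days before the first Sunday fall in week 00), so it does not always match the ISO week number returned by `isocalendar()`. The sketch below compares both for a single date; the variable names are just for illustration.

```python
import pandas as pd

ts = pd.Timestamp('2018-08-09')

# %U: week of the year with Sunday as the first day of the week
sunday_based = ts.strftime('%Y-%U')

# ISO 8601: weeks start on Monday, and the ISO year can differ near New Year
iso = ts.isocalendar()
iso_label = f'{iso[0]}-{iso[1]:02d}'

print(sunday_based, iso_label)
```

For this date the two labels differ by one week, so it is worth picking one convention and using it consistently.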
And for an isolated variable:
from dateutil.parser import parse
date_1 = '2018-02-06'
parse(date_1).isocalendar()[1]
Output:
6
# 6th week of the year
Get weekday
Example:
df['weekday'] = df['date'].apply(lambda x: parse(str(x)).strftime("%A"))
Output:
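If the column is already a datetime, pandas also offers `dt.day_name()` and `dt.weekday`, which avoid parsing each value again. A short sketch (the column `weekday_num` is a made-up name):

```python
import pandas as pd

df = pd.DataFrame({'date': ['2018-08-09 11:10:55', '2019-01-02 13:15:21']})
df['date'] = pd.to_datetime(df['date'])

# Weekday name, and weekday number where Monday is 0 and Sunday is 6
df['weekday'] = df['date'].dt.day_name()
df['weekday_num'] = df['date'].dt.weekday
print(df)
```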
Go from Year-Week format to yyyy-mm-dd format (getting the first and last day of the week)
Example: you want to know the start and end dates of week number 37 in the year 2018:
# define this function
from datetime import timedelta
from dateutil.parser import parse
def get_start_end_dates(yyyyww):
    year = yyyyww[:4]
    week = yyyyww[-2:]
    first_day_year = str(year) + '-' + '01' + '-' + '01'
    d = parse(first_day_year)
    # move to the Monday that starts week 1 of that year
    if(d.weekday()<= 3):
        d = d - timedelta(d.weekday())
    else:
        d = d + timedelta(7-d.weekday())
    dlt = timedelta(days = (int(week)-1)*7)
    return (d + dlt).strftime('%Y-%m-%d'), (d + dlt + timedelta(days=6)).strftime('%Y-%m-%d')
# run it
get_start_end_dates('201837')
Output (a tuple with the start and the end of the week):
('2018-09-10', '2018-09-16')
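On Python 3.8 or later, a shorter variant is possible with `date.fromisocalendar`, assuming the week numbers follow the ISO 8601 convention (weeks start on Monday). The function name `iso_week_bounds` is hypothetical.

```python
from datetime import date, timedelta

def iso_week_bounds(yyyyww):
    """Return the Monday and Sunday of an ISO week given as 'yyyyww'."""
    year, week = int(yyyyww[:4]), int(yyyyww[-2:])
    monday = date.fromisocalendar(year, week, 1)  # 1 = Monday
    sunday = monday + timedelta(days=6)
    return monday.isoformat(), sunday.isoformat()

print(iso_week_bounds('201837'))
```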
Get month number of the year
Example:
df = pd.DataFrame({'date': ['2018-08-09 11:10:55','2019-03-02 13:15:21']})
# if date column type is a string/object
df['month'] = pd.DatetimeIndex(df['date']).month
# if date column type is a datetime
# df['month'] = df['date'].dt.month
Output:
And for an isolated variable:
from dateutil.parser import parse
date_1 = '2018-02-06'
parse(date_1).month
Output:
2
# 2nd month of the year
(*) To create a month column, in the format YYYY-MM, use:
df = pd.DataFrame({'date': ['2018-08-09 11:10:55','2019-03-02 13:15:21']})
# if column type is a string/object
df['yyyy_mm'] = pd.DatetimeIndex(df['date']).strftime('%Y-%m')
# if column type is a datetime
# df['yyyy_mm'] = df['date'].dt.strftime('%Y-%m')
Add or subtract months (go X months back or forward)
Use this function:
def monthdelta(date, delta):
    # target month (1-12) and year after shifting by delta months
    m, y = (date.month+delta) % 12, date.year + ((date.month)+delta-1) // 12
    if not m: m = 12
    # clip the day to the length of the target month (leap years included)
    d = min(date.day, [31,
        29 if y%4==0 and (y%100!=0 or y%400==0) else 28,31,30,31,30,31,31,30,31,30,31][m-1])
    new_date = (date.replace(day=d,month=m, year=y))
    return new_date.strftime('%Y-%m-%d')
Example (subtracting 4 months from a given date):
monthdelta(parse('2019-11-09'), -4)
Output (shows the same date, but 4 months before):
'2019-07-09'
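pandas can do a comparable shift with `pd.DateOffset`, which also clips the day to the end of shorter months. This is a sketch of that alternative, not a drop-in replacement for the function above: it returns Timestamps rather than strings, and the variable and column names are made up.

```python
import pandas as pd

# Single value: go 4 months back
ts = pd.Timestamp('2019-11-09')
four_months_back = ts + pd.DateOffset(months=-4)

# Month ends are clipped: March 31 minus one month lands on February 28
clipped = pd.Timestamp('2019-03-31') - pd.DateOffset(months=1)

# Whole column: go 1 month forward
df = pd.DataFrame({'date': pd.to_datetime(['2019-11-09', '2020-01-31'])})
df['next_month'] = df['date'] + pd.DateOffset(months=1)

print(four_months_back, clipped)
print(df)
```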
Get the year
Example:
df = pd.DataFrame({'date': ['2018-08-09 11:10:55','2019-03-02 13:15:21']})
# if date column type is a string/object
df['year'] = pd.DatetimeIndex(df['date']).year
# if date column type is a datetime
# df['year'] = df['date'].dt.year
And for an isolated variable:
from dateutil.parser import parse
date_1 = '2018-02-06'
parse(date_1).year
Output:
2018
Generate Sequential date ranges
Example: generating a date range from 01/01/2019 to 01/02/2019, with hourly frequency.
# lowercase 'h' is the current hourly alias; uppercase 'H' is deprecated since pandas 2.2
date_range = pd.date_range(start='01/01/2019', end='01/02/2019', freq='h')
See the different frequency options here.
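As a rough illustration of how `start`, `end`, `periods` and `freq` interact, here is a small sketch. The variable names and the `timestamp` column are assumptions for the example.

```python
import pandas as pd

# Hourly range between two dates; both endpoints are included
hourly = pd.date_range(start='2019-01-01', end='2019-01-02', freq='h')

# Alternatively, give a start and a number of periods instead of an end
daily = pd.date_range(start='2019-01-01', periods=7, freq='D')

# A range can be used directly as a DataFrame column
df = pd.DataFrame({'timestamp': hourly})

print(len(hourly), daily[-1])
```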
Generate Random date ranges
import random
import time
from dateutil.parser import parse
def str_time_prop(start, end, format, prop):
    stime = time.mktime(time.strptime(start, format))
    etime = time.mktime(time.strptime(end, format))
    ptime = stime + prop * (etime - stime)
    return time.strftime(format, time.localtime(ptime))
selected_format = '%Y-%m-%d %H:%M:%S'
def random_date(start, end, prop):
    return parse(str_time_prop(start, end, selected_format, prop)).strftime(selected_format)
print(random_date("2020-01-01 13:40:00", "2020-01-01 14:10:00", random.random()))
def make_date(x):
    return random_date("2012-12-01 13:40:00", "2012-12-24 14:50:00", random.random())
Here’s the source for this function.
From this, we can generate random dates. For example, let’s generate a list of 10 random timestamps between Christmas and New Year:
def make_date(x):
    return random_date("2012-12-24 00:00:00", "2012-12-31 23:59:59", random.random())
[make_date(x) for x in range(10)]
We can also add it to any DataFrame, like this:
df = pd.DataFrame({'number': [1,2,3,4,5]})
df['time'] = df['number'].apply(make_date)
df
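For larger tables, a vectorized variant might be more convenient. The sketch below is an alternative technique, not the function from the source above: it draws random second offsets with NumPy and adds them to the start timestamp. `random_timestamps` and its `seed` parameter are hypothetical names.

```python
import numpy as np
import pandas as pd

def random_timestamps(start, end, n, seed=None):
    """Return n timestamps drawn uniformly (to the second) from [start, end]."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    rng = np.random.default_rng(seed)
    span_seconds = int((end - start).total_seconds())
    # Random whole-second offsets from the start, upper bound included
    offsets = rng.integers(0, span_seconds + 1, size=n)
    return start + pd.to_timedelta(offsets, unit='s')

stamps = random_timestamps('2012-12-24 00:00:00', '2012-12-31 23:59:59', 10, seed=0)
df = pd.DataFrame({'number': range(10), 'time': stamps})
print(df)
```

Passing a fixed `seed` makes the result reproducible, which can be handy in tests.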
This is the end of the article. Hope you enjoy it and that you can make good use of it!
Send me a message or leave a reply if you have any questions.
Follow me if you want to get informed about articles like this one in the future!