import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from numpy import nan as NaN  # the NaN alias was removed in NumPy 2.0
from glob import glob
import re
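# Use fully qualified option names; the short forms are ambiguous in recent pandas versions
pd.set_option('display.max_columns', 200)
pd.set_option('display.max_rows', 300)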
pd.set_option('display.expand_frame_repr', True)
stocks1 = 'data/intro_to_python_for_finance/stock_data.csv'
stocks2 = 'data/intro_to_python_for_finance/stock_data2.csv'
sector_100 = 'data/intro_to_python_for_finance/sector.txt'
exercises = 'data/intro_to_python_for_finance/exercise_data.csv'
Course Description
The financial industry is increasingly adopting Python for general-purpose programming and quantitative analysis, ranging from understanding trading dynamics to risk management systems. This course focuses specifically on introducing Python for financial analysis. Using practical examples, you will learn the fundamentals of Python data structures such as lists and arrays and learn powerful ways to store and manipulate financial data to identify trends.
This chapter introduces the basics of Python, including how to name variables and the main data types.
Using variables to evaluate stock trends
$\text{Price to earnings ratio} = \frac{\text{Market Price}}{\text{Earnings per share}}$
price = 200
earnings = 5
pe_ratio = price/earnings
pe_ratio
Booleans are used to represent True or False statements in Python. Boolean comparisons include:
Operators | Descriptions |
---|---|
> | greater than |
>= | greater than or equal |
< | less than |
<= | less than or equal |
== | equal (compare) |
!= | does not equal |
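As a minimal sketch of how these comparison operators behave, the snippet below reuses the price and earnings values from the P/E example above. The threshold of 30 is a hypothetical value chosen only for illustration.

```python
# Values reused from the P/E example above
price = 200
earnings = 5
pe_ratio = price / earnings  # 40.0

# Hypothetical threshold, chosen only for illustration
threshold = 30

is_above = pe_ratio > threshold        # True
is_equal = pe_ratio == 40              # True, since 40.0 == 40
is_different = pe_ratio != threshold   # True
print(is_above, is_equal, is_different)
```

Each comparison evaluates to a Boolean, so the result can be stored in a variable and reused later.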
This chapter introduces lists in Python and how they can be used to work with data.
Methods | Functions |
---|---|
All methods are functions | Not all functions are methods |
List methods are functions attached to list objects | Built-in functions are available without an object |
Used on an object | Requires an input of an object |
prices.sort() | type(prices) |
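The difference in the table can be sketched with a short example. The list of prices below is hypothetical and only serves to show the two call styles.

```python
# Hypothetical prices, chosen only for illustration
prices = [159.54, 37.13, 71.17]

# A method is called on the object; sort() reorders the list in place
prices.sort()
print(prices)        # [37.13, 71.17, 159.54]

# A function takes the object as an argument and leaves it unchanged
print(type(prices))  # <class 'list'>
print(max(prices))   # 159.54
```

Note that `prices.sort()` returns `None` and changes the list itself, whereas `type(prices)` and `max(prices)` return a value.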
This chapter introduces packages in Python, specifically the NumPy package and how it can be efficiently used to manipulate arrays.
Why use an array for financial analysis?
# Arrays - element-wise sum
array_A = np.array([1, 2, 3])
array_B = np.array([4, 5, 6])
array_A + array_B
# Lists - list concatenation
list_A = [1, 2, 3]
list_B = [4, 5, 6]
list_A + list_B
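The same contrast shows up with multiplication. As a small sketch using the values from the cells above, multiplying an array by a scalar scales every element, while multiplying a list repeats it.

```python
import numpy as np

array_A = np.array([1, 2, 3])
list_A = [1, 2, 3]

# Arrays: element-wise multiplication
doubled_array = array_A * 2
print(doubled_array)   # [2 4 6]

# Lists: repetition
repeated_list = list_A * 2
print(repeated_list)   # [1, 2, 3, 1, 2, 3]
```

Element-wise behaviour is what makes arrays convenient for calculations such as dividing every price by its matching earnings figure.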
In this chapter, you will be introduced to the Matplotlib package for creating line plots, scatter plots, and histograms.
df = pd.read_csv(stocks1)
df.head()
plt.plot(df.Day, df.Price, color='red', linestyle='--')
# Add x and y labels
plt.xlabel('Days')
plt.ylabel('Prices, $')
# Add plot title
plt.title('Company Stock Prices Over Time')
df = pd.read_csv(stocks2)
df.head()
# Plot two lines of varying colors
plt.plot(df.day, df.company1, color='red')
plt.plot(df.day, df.company2, color='green')
# Add labels
plt.xlabel('Days')
plt.ylabel('Prices, $')
plt.title('Stock Prices Over Time')
df[['company1', 'company2']].plot()
plt.scatter(df.day, df.company1, color='green', s=0.1)
plt.hist(x=prices, bins=3)
plt.show()
plt.hist(x=prices, bins=6, density=True)
plt.show()
plt.hist(x=prices, bins=6, density=True)
plt.hist(x=prices2, bins=6, density=True)
plt.show()
plt.hist(x=prices, bins=6, density=True, alpha=0.5)
plt.hist(x=prices2, bins=6, density=True, alpha=0.5)
plt.show()
plt.hist(x=prices, bins=6, density=True, alpha=0.5, label='Prices 1')
plt.hist(x=prices2, bins=6, density=True, alpha=0.5, label='Prices New')
plt.legend()
plt.show()
plt.hist(df.company2, bins=100, ec='black')
plt.show()
df_exercises = pd.read_csv(exercises)
df_exercises.head()
df_exercises.hist(bins=100, alpha=0.4, ec='black')
plt.show()
# Plot histogram of stocks_A
plt.hist(df_exercises.stock_A, bins=100, alpha=0.4, label='Stock A')
# Plot histogram of stocks_B
plt.hist(df_exercises.stock_B, bins=100, alpha=0.4, label='Stock B')
# Add the legend
plt.legend()
# Display plot
plt.show()
In this chapter, you will get a chance to apply all the techniques you learned in the course on the S&P 100 data.
Standard and Poor's S&P 100:
df = pd.read_csv(sector_100)
df.head()
df.tail()
$\text{Price to earnings ratio} = \frac{\text{Market Price}}{\text{Earnings per share}}$
Given
Objective Part I
names = df.Name.values
prices = df.Price.values
earnings = df.EPS.values
sectors = df.Sector.values
type(names)
Stocks in the S&P 100 are selected to represent sector balance and market capitalization. To begin, let's take a look at what data we have associated with S&P companies.
Four lists, names, prices, earnings, and sectors, are available in your workspace.
Instructions
# First four items of names
print(names[:4])
# Print information on last company
print(names[-1])
print(prices[-1])
print(earnings[-1])
print(sectors[-1])
NumPy is a scientific computing package in Python that helps you work with arrays. Let's use array operations to calculate the price to earnings ratios of the S&P 100 stocks.
The S&P 100 data is available as the lists: prices (stock prices per share) and earnings (earnings per share).
Instructions
# Convert lists to arrays
prices_array = np.array(prices)
earnings_array = np.array(earnings)
# Calculate P/E ratio
pe = prices_array / earnings_array
pe[:10]
Given
Objective Part II
In this lesson, you will focus on two sectors: Information Technology and Consumer Staples.
numpy is imported as np and S&P 100 data is stored as arrays: names, sectors, and pe (price to earnings ratio).
Instructions 1/2
# Create boolean array
boolean_array = (sectors == 'Information Technology')
# Subset sector-specific data
it_names = names[boolean_array]
it_pe = pe[boolean_array]
# Display sector names
print(it_names)
print(it_pe)
Instructions 2/2
# Create boolean array
boolean_array = (sectors == 'Consumer Staples')
# Subset sector-specific data
cs_names = names[boolean_array]
cs_pe = pe[boolean_array]
# Display sector names
print(cs_names)
print(cs_pe)
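The subsetting used in both steps is Boolean indexing: comparing an array to a value yields an array of True/False values, which then selects the matching positions in any array of the same length. Here is a minimal sketch with hypothetical tickers and P/E values (not taken from the S&P 100 file).

```python
import numpy as np

# Hypothetical data, chosen only for illustration
sample_names = np.array(['AAA', 'BBB', 'CCC', 'DDD'])
sample_sectors = np.array(['Information Technology', 'Consumer Staples',
                           'Information Technology', 'Energy'])
sample_pe = np.array([20.0, 25.0, 30.0, 15.0])

# Element-wise comparison produces one Boolean per company
mask = (sample_sectors == 'Information Technology')

# The mask keeps only the positions where it is True
selected_names = sample_names[mask]
selected_pe = sample_pe[mask]
print(selected_names)  # ['AAA' 'CCC']
print(selected_pe)     # [20. 30.]
```

Because the same mask is applied to both arrays, the selected names and P/E ratios stay aligned.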
In this exercise, you will calculate the mean and standard deviation of P/E ratios for Information Technology and Consumer Staples sectors. numpy is imported as np and the it_pe and cs_pe arrays from the previous exercise are available in your workspace.
Instructions 1/2
Calculate the mean and standard deviation of the P/E ratios (it_pe) for the Information Technology sector.
# Calculate mean and standard deviation
it_pe_mean = np.mean(it_pe)
it_pe_std = np.std(it_pe)
print(it_pe_mean)
print(it_pe_std)
Instructions 2/2
# Calculate mean and standard deviation
cs_pe_mean = np.mean(cs_pe)
cs_pe_std = np.std(cs_pe)
print(cs_pe_mean)
print(cs_pe_std)
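One detail worth keeping in mind: by default `np.std` computes the population standard deviation (it divides by n). If a sector's companies are treated as a sample, passing `ddof=1` divides by n - 1 instead. The sketch below uses hypothetical P/E values to show the difference.

```python
import numpy as np

# Hypothetical P/E ratios, chosen only for illustration
sample_pe = np.array([10.0, 20.0, 30.0])

mean_pe = np.mean(sample_pe)             # 20.0
population_std = np.std(sample_pe)       # divides by n (default, ddof=0)
sample_std = np.std(sample_pe, ddof=1)   # divides by n - 1

print(mean_pe)
print(population_std)
print(sample_std)
```

With only a dozen or so companies per sector, the two versions can differ noticeably, so it helps to state which one is reported.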
Let's take a closer look at the P/E ratios using a scatter plot for each company in these two sectors.
The arrays it_pe and cs_pe from the previous exercise are available in your workspace. Also, each company name has been assigned a numeric ID contained in the arrays it_id and cs_id.
Instructions
# Assign one numeric ID per company, sized from the data rather than hard-coded
it_id = np.arange(len(it_pe))
cs_id = np.arange(len(cs_pe))
# Make a scatterplot
plt.scatter(it_id, it_pe, color='red', label='IT')
plt.scatter(cs_id, cs_pe, color='green', label='CS')
# Add legend
plt.legend()
# Add labels
plt.xlabel('Company ID')
plt.ylabel('P/E Ratio')
plt.show()
Notice that there is one company in the IT sector with an unusually high P/E ratio.
To visualize and understand the distribution of the P/E ratios in the IT sector, you can use a histogram.
The array it_pe from the previous exercise is available in your workspace.
Instructions
# Plot histogram
plt.hist(it_pe, bins=8, ec='black')
# Add x-label
plt.xlabel('P/E ratio')
# Add y-label
plt.ylabel('Frequency')
# Show plot
plt.show()
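To read the bin counts as numbers rather than off the chart, `np.histogram` returns the counts and bin edges without drawing anything. The values below are hypothetical and only illustrate how a single large value ends up isolated in the last bin.

```python
import numpy as np

# Hypothetical P/E ratios with one large value, chosen only for illustration
values = np.array([10.0, 12.0, 14.0, 18.0, 22.0, 60.0])

# Five equal-width bins spanning the minimum to the maximum
counts, bin_edges = np.histogram(values, bins=5)
print(counts)     # [4 1 0 0 1]
print(bin_edges)  # [10. 20. 30. 40. 50. 60.]
```

The empty bins between the bulk of the data and the last bin are the numeric counterpart of the gap visible in the plotted histogram.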
A stock with P/E ratio > 50.
You've identified that a company in the Information Technology sector has a P/E ratio greater than 50. Let's identify this company.
numpy is imported as np, and arrays it_pe (P/E ratios of Information Technology companies) and it_names (names of Information Technology companies) are available in your workspace.
Instructions
# Identify P/E ratio within it_pe that is > 50
outlier_price = it_pe[it_pe > 50]
# Identify the company with PE ratio > 50 (same mask, so it also works with several outliers)
outlier_name = it_names[it_pe > 50]
# Display results
print(f'In 2017 {outlier_name[0]} had an abnormally high P/E ratio of {round(outlier_price[0], 2)}.')
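An alternative, when only the single largest value is of interest and no threshold is known in advance, is to look up its position with `np.argmax`. This is a sketch with hypothetical names and values, not the course's solution.

```python
import numpy as np

# Hypothetical data, chosen only for illustration
sample_names = np.array(['AAA', 'BBB', 'CCC'])
sample_pe = np.array([18.5, 72.3, 24.1])

# Position of the largest P/E ratio
idx = np.argmax(sample_pe)

top_name = sample_names[idx]
top_pe = sample_pe[idx]
print(f'{top_name} has the highest P/E ratio: {round(top_pe, 2)}')
```

Unlike the threshold approach, this always returns exactly one company, even if no value exceeds 50.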