Data Visualization¶
Stock Market Data Analysis Using Yahoo Finance API¶
This project analyzes stock market data by fetching intraday trading data from Yahoo Finance through the yfinance library. It applies several data visualization techniques to interpret stock price movements and trading volumes. The main objectives are:
- Fetching stock market data for specific companies based on date and interval.
- Analyzing volume trends, price fluctuations, and statistical distributions.
- Utilizing data visualization to extract meaningful insights.
- Handling missing data and ensuring robust error management.
Technologies Used¶
- Python: Primary programming language.
- yfinance: Fetches real-time and historical stock data.
- NumPy & Pandas: Data processing and transformation.
- Matplotlib & Seaborn: Data visualization and analysis.
- datetime & timedelta (from the datetime module): Handling date-based operations.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
import matplotlib.ticker as mticker
import seaborn as sns
DATE_FORMAT = "%Y-%m-%d"
TEST_CASES = [
    {"ticker": "AAPL", "date": "2025-01-15", "interval": "5m"},
    {"ticker": "GOOG", "date": "2025-01-17", "interval": "15m"},
    {"ticker": "MSFT", "date": "2025-01-15", "interval": "5m"},
    {"ticker": "TSLA", "date": "2025-01-15", "interval": "30m"},
    {"ticker": "AMZN", "date": "2025-01-15", "interval": "5m"},
]
# NOTE: which style sits at index 11 depends on the installed Matplotlib version.
plt.style.use(plt.style.available[11])
Task 1 A:¶
Write a function called data_download. This function takes three parameters (ticker, date, interval) and uses yfinance to download stock data according to the parameters. The downloaded data is returned by the function.
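ticker is the code for a company, such as “AAPL” or “GOOG”.
date is the trading day to fetch, given as a string in YYYY-MM-DD format (for example “2025-01-15”).
interval is the granularity of the data, such as “1m”, “5m”, or “15m”.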
def data_download(ticker: str, date: str, interval: str):
    # Intraday intervals accepted by this function.
    ALLOWED_INTERVALS = ["1m", "2m", "5m", "15m", "30m", "60m", "90m", "1h"]
    assert interval in ALLOWED_INTERVALS, (
        f"Irregular Interval Provided \n Select From {', '.join(ALLOWED_INTERVALS)}"
    )
    try:
        tk = yf.Ticker(ticker)
        # Parsing the date raises ValueError when it does not match DATE_FORMAT.
        date_ = datetime.strptime(date, DATE_FORMAT)
        # End the window one day after the start so it covers a single day.
        data = tk.history(start=date, end=date_ + timedelta(days=1), interval=interval)
        return data
    except ValueError as error:
        print(f"ValueError: {error}")
    except Exception as error:
        print(f"Failed: {error}")
    # Reached only when an exception was caught above.
    return None
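The date handling inside data_download can be checked without any network access. The sketch below uses a hypothetical helper name, day_window, that is not part of the original notebook. It shows how a single YYYY-MM-DD string becomes the start/end pair handed to history, with the end placed one day later.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y-%m-%d"


def day_window(date: str) -> tuple[datetime, datetime]:
    """Return (start, end) datetimes spanning the single calendar day `date`.

    Hypothetical helper for illustration; raises ValueError on a malformed date.
    """
    start = datetime.strptime(date, DATE_FORMAT)
    return start, start + timedelta(days=1)


start, end = day_window("2025-01-15")
print(start.date(), end.date())  # 2025-01-15 2025-01-16
```

A string such as "15/01/2025" does not match DATE_FORMAT, so strptime raises ValueError, which is the case the first except branch of data_download reports.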
Task 1 B:¶
Write code to test data_download using two tickers, two dates and two intervals. You can accomplish this using two test cases. For each test, print the first 5 rows of the downloaded data.
for i, test in enumerate(TEST_CASES):
    print(f'TEST CASE {i+1}:\t{test["ticker"]}')
    data = data_download(test["ticker"], test["date"], test["interval"])
    print(data.head(5))
    print("\n")
TEST CASE 1: AAPL
                                 Open        High         Low       Close  \
Datetime
2025-01-15 09:30:00-05:00  234.639999  236.919998  234.429993  236.789993
2025-01-15 09:35:00-05:00  236.796799  237.190002  236.610001  236.929001
2025-01-15 09:40:00-05:00  237.128906  237.520004  236.949997  237.350006
2025-01-15 09:45:00-05:00  237.339996  237.910004  237.339996  237.869995
2025-01-15 09:50:00-05:00  237.895004  238.121002  237.410004  237.431396

                            Volume  Dividends  Stock Splits
Datetime
2025-01-15 09:30:00-05:00  2223901        0.0           0.0
2025-01-15 09:35:00-05:00   884610        0.0           0.0
2025-01-15 09:40:00-05:00   537723        0.0           0.0
2025-01-15 09:45:00-05:00   547889        0.0           0.0
2025-01-15 09:50:00-05:00   808161        0.0           0.0

TEST CASE 2: GOOG
                                 Open        High         Low       Close  \
Datetime
2025-01-17 09:30:00-05:00  198.110001  198.350006  195.714996  195.929993
2025-01-17 09:45:00-05:00  196.009995  196.350006  195.440002  195.589996
2025-01-17 10:00:00-05:00  195.600006  196.470001  195.509995  196.410004
2025-01-17 10:15:00-05:00  196.380005  196.490005  195.309998  196.177002
2025-01-17 10:30:00-05:00  196.139999  196.500000  195.840103  196.365005

                            Volume  Dividends  Stock Splits
Datetime
2025-01-17 09:30:00-05:00  4810042        0.0           0.0
2025-01-17 09:45:00-05:00   601725        0.0           0.0
2025-01-17 10:00:00-05:00   427181        0.0           0.0
2025-01-17 10:15:00-05:00   994941        0.0           0.0
2025-01-17 10:30:00-05:00   323794        0.0           0.0
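Because data_download returns None when an exception is caught, calling head on the result can itself fail. A request may also come back with no rows, for example (as an assumption about typical behavior) when the chosen date is not a trading day. The sketch below uses a hypothetical helper, preview, and a synthetic DataFrame so that it runs offline.

```python
import pandas as pd


def preview(data, rows: int = 5) -> str:
    """Return a printable preview, or a short notice when there is nothing to show.

    Hypothetical helper for illustration, not part of the original notebook.
    """
    if data is None:
        return "download failed (None returned)"
    if data.empty:
        return "no rows returned for this date/interval"
    return data.head(rows).to_string()


# Synthetic stand-in for a downloaded frame, so the sketch runs offline.
sample = pd.DataFrame({"Close": [1.0, 2.0, 3.0]})
print(preview(None))
print(preview(pd.DataFrame()))
print(preview(sample, rows=2))
```

In the test loop, print(preview(data)) could stand in for print(data.head(5)) if a failed download should not stop the remaining test cases.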
Task 2 A:¶
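Write a function called volume_analysis. This function takes data, ticker, date and interval as parameters and displays the Volume column of the data as a bar chart. In the implementation below, a bar is green when its volume is at least that of the previous bar and red otherwise.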
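# Fraction of x-axis positions that receive a tick label.
percent_of_labels = 0.1


def get_bar_colors(data):
    # The first bar has no predecessor, so it defaults to green.
    colors = ["green"]
    for i in range(1, len(data)):
        # Green when volume is at least the previous bar's volume, red otherwise.
        if data["Volume"].iloc[i] >= data["Volume"].iloc[i - 1]:
            colors.append("green")
        else:
            colors.append("red")
    return colors


def get_ticker_name(ticker):
    # Fall back to the ticker symbol when no long name is available.
    return yf.Ticker(ticker).info.get("longName", ticker)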
def volume_analysis(data: pd.DataFrame, ticker: str, date: str, interval: str):
    plt.figure(figsize=(12, 6), dpi=300)
    # On a datetime axis, bar width is measured in days; divide by the bar count.
    width = 0.265 / len(data)
    plt.bar(
        data.index,
        data["Volume"],
        edgecolor="black",
        alpha=0.6,
        color=get_bar_colors(data),
        width=width,
        label=ticker,
    )
    plt.title(
        f"Volume Analysis For {get_ticker_name(ticker)} on {date} (Interval:{interval})",
        color="black",
        fontsize=12,
        fontweight="bold",
    )
    plt.xlabel("Time Period", fontsize=10)
    plt.ylabel("Volume", fontsize=10)
    # Label a fraction of the bars; max(1, ...) avoids a zero slice step on short frames.
    step = max(1, int(len(data) * percent_of_labels))
    ticks = data.index[::step]
    plt.xticks(
        ticks,
        [d.strftime("%H:%M") for d in ticks],
        rotation=45,
        fontsize=10,
        ha="right",
    )
    # FORMAT Y TICKS (volume shown in millions)
    plt.yticks(fontsize=10)
    plt.gca().yaxis.set_major_formatter(
        mticker.FuncFormatter(lambda x, _: f"{x/1e6:.1f}M")
    )
    legend = plt.legend(
        title="Ticker",
        facecolor="white",
        frameon=True,
        fancybox=True,
        framealpha=0.8,
        edgecolor="gray",
    )
    plt.setp(legend.get_title(), color="black")
    plt.tight_layout()
    plt.show()
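The green/red rule used for the bars compares each volume with the one before it. As an alternative to the explicit loop, the same rule can be sketched with pandas diff and NumPy where. The function name bar_colors and the sample volumes are made up for illustration.

```python
import numpy as np
import pandas as pd


def bar_colors(volume: pd.Series) -> list[str]:
    """Green when volume is >= the previous bar (or is the first bar), else red."""
    # The first bar has no predecessor; filling its NaN with 0 makes it green.
    change = volume.diff().fillna(0)
    return np.where(change >= 0, "green", "red").tolist()


# Synthetic volumes for illustration only.
volume = pd.Series([100, 150, 150, 90, 120])
print(bar_colors(volume))  # ['green', 'green', 'green', 'red', 'green']
```

For a non-empty column, the result could be passed to the color argument of plt.bar in the same way as the list built by get_bar_colors.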
Task 2 B:¶
Write code to test volume_analysis using two tickers, two dates and two intervals. First download data using a call to data_download and then use the function to plot.
for i, test in enumerate(TEST_CASES):
    print(f'TEST CASE {i+1}:\t{test["ticker"]}')
    data = data_download(test["ticker"], test["date"], test["interval"])
    volume_analysis(data, test["ticker"], test["date"], test["interval"])
TEST CASE 1: AAPL
TEST CASE 2: GOOG
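The y-axis formatter in volume_analysis divides every tick by one million, which suits heavily traded tickers but loses resolution when volumes sit well below that. The sketch below, with the hypothetical name format_volume, picks a suffix based on magnitude instead.

```python
def format_volume(value: float) -> str:
    """Format a share count compactly, e.g. 2223901 -> '2.2M'.

    Hypothetical helper for illustration, not part of the original notebook.
    """
    if abs(value) >= 1e6:
        return f"{value / 1e6:.1f}M"
    if abs(value) >= 1e3:
        return f"{value / 1e3:.0f}K"
    return f"{value:.0f}"


for v in (2223901, 884610, 500):
    print(format_volume(v))  # 2.2M, then 885K, then 500
```

It could be plugged into the existing plot through mticker.FuncFormatter(lambda x, _: format_volume(x)).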
Task 3 A:¶
Write a function called price_analysis which takes data, ticker, date and interval as parameters. It calculates the mean of the prices (Close, High, Low, and Open) for each row and displays that as a line chart. It also displays Close price as a separate line in the same graph. This function will plot two lines in a single graph.
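To make the row-wise mean concrete, here is a small worked example on two made-up bars. With axis=1, pandas averages across the four price columns within each row rather than down each column.

```python
import pandas as pd

# Two synthetic bars; the values are made up for illustration.
bars = pd.DataFrame(
    {
        "Open": [10.0, 11.0],
        "High": [12.0, 13.0],
        "Low": [9.0, 10.0],
        "Close": [11.0, 12.0],
    }
)

# axis=1 averages across the four price columns within each row.
mean_price = bars[["Open", "High", "Low", "Close"]].mean(axis=1)
print(mean_price.tolist())  # [10.5, 11.5]
```

The first value is (10 + 12 + 9 + 11) / 4 = 10.5, and the second is (11 + 13 + 10 + 12) / 4 = 11.5.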
def price_analysis(data, ticker, date, interval):
    plt.figure(figsize=(12, 6), dpi=300)
    # Row-wise mean of the four price columns.
    mean_price = data[["Open", "High", "Low", "Close"]].mean(axis=1)
    plt.plot(data.index, data["Close"], label="Closing Price", alpha=0.6, color="green")
    plt.plot(
        mean_price.index,
        mean_price,
        color="red",
        label="Mean Price (Close, High, Low, and Open)",
        linestyle="dashed",
        linewidth=1.5,
        alpha=0.9,
    )
    plt.title(
        f"Price Analysis For {get_ticker_name(ticker)} on {date} (Interval:{interval})",
        color="black",
        fontsize=12,
        fontweight="bold",
    )
    plt.xlabel("Time Period", fontsize=10)
    plt.ylabel("Price", fontsize=10)
    # Label a fraction of the points; max(1, ...) avoids a zero slice step on short frames.
    step = max(1, int(len(data) * percent_of_labels))
    ticks = data.index[::step]
    plt.xticks(
        ticks,
        [d.strftime("%H:%M") for d in ticks],
        rotation=45,
        fontsize=10,
        ha="right",
    )
    # FORMAT Y TICKS
    plt.yticks(fontsize=10)
    plt.legend(
        facecolor="white",
        frameon=True,
        fancybox=True,
        framealpha=0.8,
        edgecolor="gray",
    )
    plt.show()
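Tick selection in both plotting functions takes every k-th timestamp, where k is about 10% of the row count. With fewer than ten rows, int(len(data) * 0.1) is 0, and a slice step of 0 raises ValueError, so a floor of 1 is needed. A standalone sketch, using the hypothetical helper name tick_positions and synthetic timestamps:

```python
import pandas as pd


def tick_positions(index: pd.DatetimeIndex, fraction: float = 0.1) -> pd.DatetimeIndex:
    """Pick roughly `fraction` of the timestamps as x-tick positions.

    Hypothetical helper; the max(1, ...) guard avoids a zero slice step
    when the frame has fewer than 1 / fraction rows.
    """
    step = max(1, int(len(index) * fraction))
    return index[::step]


# 78 five-minute bars (09:30 to 15:55) versus 7 hourly bars, both synthetic.
five_min = pd.date_range("2025-01-15 09:30", periods=78, freq="5min")
hourly = pd.date_range("2025-01-15 09:30", periods=7, freq="60min")
print(len(tick_positions(five_min)))  # 12
print(len(tick_positions(hourly)))    # 7
```

The labels then follow from the positions, for example [d.strftime("%H:%M") for d in tick_positions(data.index)].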
Task 3 B:¶
Write code to test price_analysis using two tickers, two dates and two intervals. First download data using a call to data_download and then use the function to plot.
for i, test in enumerate(TEST_CASES):
    print(f'TEST CASE {i+1}:\t{test["ticker"]}')
    data = data_download(test["ticker"], test["date"], test["interval"])
    price_analysis(data, test["ticker"], test["date"], test["interval"])
TEST CASE 1: AAPL
TEST CASE 2: GOOG
Task 4 A:¶
Write a function called violin_plots which takes data, ticker, date and interval as parameters. It draws a violin plot for each of the following columns in your data: Close, Open, Low, and High.
def violin_plots(data, ticker, date, interval):
    # Keep only the four price columns; seaborn draws one violin per column.
    df = data[["Open", "High", "Low", "Close"]]
    plt.figure(figsize=(10, 6), dpi=300)
    sns.set_theme(style="whitegrid")
    sns.violinplot(data=df)
    plt.xlabel("Parameters", fontsize=8)
    plt.ylabel("Price", fontsize=8)
    plt.title(
        f"Violin Plot For {get_ticker_name(ticker)} on {date} (Interval:{interval})",
        color="black",
        fontsize=12,
        fontweight="bold",
    )
    plt.show()
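The function passes the four price columns to seaborn in wide form, one column per violin. For reference, the same data can be reshaped to long form with pandas melt, which also makes a numeric summary per column easy to compute. The values below are synthetic and chosen only for illustration.

```python
import pandas as pd

# Synthetic prices for illustration only.
prices = pd.DataFrame(
    {
        "Open": [10.0, 11.0, 12.0],
        "High": [11.0, 12.0, 13.0],
        "Low": [9.0, 10.0, 11.0],
        "Close": [10.5, 11.5, 12.5],
    }
)

# Wide -> long: one row per (column name, price) pair.
long_form = prices.melt(var_name="Parameters", value_name="Price")
print(long_form.shape)  # (12, 2)

# Median price per column, a numeric companion to the plotted distributions.
medians = long_form.groupby("Parameters")["Price"].median()
print(medians["Close"])  # 11.5
```

Long-form data of this shape can be plotted with sns.violinplot(data=long_form, x="Parameters", y="Price"), which should give one violin per original column.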
Task 4 B:¶
Write code to test violin_plots using two tickers, two dates and two intervals. First download data using a call to data_download and then use the function to plot.
for i, test in enumerate(TEST_CASES):
    print(f'TEST CASE {i+1}:\t{test["ticker"]}')
    data = data_download(test["ticker"], test["date"], test["interval"])
    violin_plots(data, test["ticker"], test["date"], test["interval"])
TEST CASE 1: AAPL
TEST CASE 2: GOOG