Data Visualization

Stock Market Data Analysis Using the Yahoo Finance API

This project analyzes stock market data by fetching intraday trading data from Yahoo Finance through the yfinance library. It applies several data visualization techniques to interpret stock price movements and trading volumes. The main objectives are:

  • Fetching stock market data for specific companies based on date and interval.
  • Analyzing volume trends, price fluctuations, and statistical distributions.
  • Utilizing data visualization to extract meaningful insights.
  • Handling missing data and ensuring robust error management.
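The notebook does not show a dedicated cleaning step for the "handling missing data" objective. The sketch below is one way it could look, assuming a frame with the Open, High, Low, Close and Volume columns that yfinance returns; `clean_ohlcv` and `PRICE_COLUMNS` are hypothetical names, not part of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical column list, matching the OHLCV columns shown in the outputs below.
PRICE_COLUMNS = ["Open", "High", "Low", "Close"]


def clean_ohlcv(data):
    """Hypothetical helper: drop bars with any missing price or volume.

    Returns None when there is nothing left to plot, so callers can skip the case.
    """
    if data is None or data.empty:
        return None
    required = PRICE_COLUMNS + ["Volume"]
    missing = [c for c in required if c not in data.columns]
    if missing:
        raise KeyError(f"Missing columns: {missing}")
    cleaned = data.dropna(subset=required)
    return cleaned if not cleaned.empty else None


frame = pd.DataFrame({
    "Open": [1.0, np.nan, 3.0],
    "High": [1.5, 2.5, 3.5],
    "Low": [0.5, 1.5, 2.5],
    "Close": [1.2, 2.2, 3.2],
    "Volume": [100, 200, 300],
})
print(len(clean_ohlcv(frame)))  # 2
```

Returning None rather than an empty frame lets each test loop use a single `is None` check before plotting.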

Technologies Used

  • Python: Primary programming language.
  • yfinance: Downloads historical and recent intraday market data from Yahoo Finance.
  • NumPy & Pandas: Data processing and transformation.
  • Matplotlib & Seaborn: Data visualization and analysis.
  • datetime & timedelta (standard library): Date parsing and date arithmetic.
In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
import matplotlib.ticker as mticker
import seaborn as sns

DATE_FORMAT = "%Y-%m-%d"

TEST_CASES = [
    {"ticker": "AAPL", "date": "2025-01-15", "interval": "5m"},
    {"ticker": "GOOG", "date": "2025-01-17", "interval": "15m"},
    {"ticker": "MSFT", "date": "2025-01-15", "interval": "5m"},
    {"ticker": "TSLA", "date": "2025-01-15", "interval": "30m"},
    {"ticker": "AMZN", "date": "2025-01-15", "interval": "5m"},
]
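# The style at index 11 depends on the installed Matplotlib version;
# a named style (e.g. plt.style.use("ggplot")) is more reproducible.
plt.style.use(plt.style.available[11])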

Task 1 A:

Write a function called data_download. It takes three parameters (ticker, date, interval), uses yfinance to download the stock data they describe, and returns the downloaded data.

ticker is the symbol of a company, such as “AAPL” or “GOOG”.

interval is the granularity of the data, for example “1m”, “5m”, or “15m”.

In [75]:
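def data_download(ticker: str, date: str, interval: str):
    """Download intraday data for `ticker` on `date`; return None on failure."""
    ALLOWED_INTERVALS = ["1m", "2m", "5m", "15m", "30m", "60m", "90m", "1h"]
    # Validate explicitly: an assert is skipped when Python runs with -O.
    if interval not in ALLOWED_INTERVALS:
        raise ValueError(
            f"Irregular interval provided. Select from {', '.join(ALLOWED_INTERVALS)}"
        )
    try:
        tk = yf.Ticker(ticker)
        start = datetime.strptime(date, DATE_FORMAT)
        # `end` is exclusive, so start + 1 day covers exactly one calendar day.
        data = tk.history(start=start, end=start + timedelta(days=1), interval=interval)
        return data
    except ValueError as error:  # e.g. a malformed date string
        print(f"ValueError: {error}")
    except Exception as error:
        print(f"Failed: {error}")
    return None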
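The validation and date arithmetic inside data_download can be isolated as a pure function, which makes them testable without a network connection. This is a sketch; `day_window` is a hypothetical name. It relies on yfinance treating `end` as exclusive, as its `history` docstring describes. Note also that Yahoo generally serves sub-daily bars only for recent dates (the yfinance docs mention roughly the last 60 days), so re-running the January 2025 test cases later may yield empty frames.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y-%m-%d"
ALLOWED_INTERVALS = ["1m", "2m", "5m", "15m", "30m", "60m", "90m", "1h"]


def day_window(date: str, interval: str) -> tuple[datetime, datetime]:
    """Hypothetical helper: return the [start, end) window for one calendar day.

    Raises ValueError for an unsupported interval or a malformed date.
    """
    if interval not in ALLOWED_INTERVALS:
        raise ValueError(
            f"Irregular interval {interval!r}; select from {', '.join(ALLOWED_INTERVALS)}"
        )
    start = datetime.strptime(date, DATE_FORMAT)  # raises ValueError on bad input
    end = start + timedelta(days=1)               # exclusive upper bound
    return start, end


start, end = day_window("2025-01-15", "5m")
print(start, end)  # 2025-01-15 00:00:00 2025-01-16 00:00:00
```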

Task 1 B:

Write code to test data_download using two tickers, two dates and two intervals. You can accomplish this using two test cases. For each test, print the first 5 rows of the downloaded data.

In [76]:
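for i, test in enumerate(TEST_CASES):
    print(f'TEST CASE {i+1}:\t{test["ticker"]}')
    data = data_download(test["ticker"], test["date"], test["interval"])
    # data_download returns None on failure; yfinance may also return an empty frame.
    if data is None or data.empty:
        print("No data returned.\n")
        continue
    print(data.head(5))
    print("\n")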
TEST CASE 1:	AAPL
                                 Open        High         Low       Close  \
Datetime                                                                    
2025-01-15 09:30:00-05:00  234.639999  236.919998  234.429993  236.789993   
2025-01-15 09:35:00-05:00  236.796799  237.190002  236.610001  236.929001   
2025-01-15 09:40:00-05:00  237.128906  237.520004  236.949997  237.350006   
2025-01-15 09:45:00-05:00  237.339996  237.910004  237.339996  237.869995   
2025-01-15 09:50:00-05:00  237.895004  238.121002  237.410004  237.431396   

                            Volume  Dividends  Stock Splits  
Datetime                                                     
2025-01-15 09:30:00-05:00  2223901        0.0           0.0  
2025-01-15 09:35:00-05:00   884610        0.0           0.0  
2025-01-15 09:40:00-05:00   537723        0.0           0.0  
2025-01-15 09:45:00-05:00   547889        0.0           0.0  
2025-01-15 09:50:00-05:00   808161        0.0           0.0  


TEST CASE 2:	GOOG
                                 Open        High         Low       Close  \
Datetime                                                                    
2025-01-17 09:30:00-05:00  198.110001  198.350006  195.714996  195.929993   
2025-01-17 09:45:00-05:00  196.009995  196.350006  195.440002  195.589996   
2025-01-17 10:00:00-05:00  195.600006  196.470001  195.509995  196.410004   
2025-01-17 10:15:00-05:00  196.380005  196.490005  195.309998  196.177002   
2025-01-17 10:30:00-05:00  196.139999  196.500000  195.840103  196.365005   

                            Volume  Dividends  Stock Splits  
Datetime                                                     
2025-01-17 09:30:00-05:00  4810042        0.0           0.0  
2025-01-17 09:45:00-05:00   601725        0.0           0.0  
2025-01-17 10:00:00-05:00   427181        0.0           0.0  
2025-01-17 10:15:00-05:00   994941        0.0           0.0  
2025-01-17 10:30:00-05:00   323794        0.0           0.0  
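The test loop can be exercised offline by passing the downloader in as a parameter. In this sketch, `fake_download` and `run_tests` are hypothetical: the stub mimics the three outcomes a caller may see from data_download (a populated frame, an empty frame, or None), and in the notebook the real data_download would be passed as `download`.

```python
import pandas as pd


def fake_download(ticker: str, date: str, interval: str):
    """Hypothetical stand-in for data_download so the loop runs without a network."""
    if ticker == "EMPTY":
        return pd.DataFrame()  # e.g. a date outside the available intraday range
    if ticker == "FAIL":
        return None            # what data_download returns after an exception
    index = pd.date_range(f"{date} 09:30", periods=3, freq="5min")
    return pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=index)


def run_tests(cases, download=fake_download):
    """Print the first rows for each case, skipping cases with no data."""
    succeeded = []
    for i, test in enumerate(cases, start=1):
        print(f'TEST CASE {i}:\t{test["ticker"]}')
        data = download(test["ticker"], test["date"], test["interval"])
        if data is None or data.empty:
            print("No data returned; skipping.\n")
            continue
        print(data.head(5), "\n")
        succeeded.append(test["ticker"])
    return succeeded


cases = [
    {"ticker": "AAPL", "date": "2025-01-15", "interval": "5m"},
    {"ticker": "EMPTY", "date": "2025-01-15", "interval": "5m"},
    {"ticker": "FAIL", "date": "2025-01-15", "interval": "5m"},
]
print(run_tests(cases))  # ['AAPL']
```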


Task 2 A:

Write a function called volume_analysis. This function takes data, ticker, date and interval as parameters and displays the Volume column of the data as a bar chart.

In [77]:
percent_of_labels = .1

def get_bar_colors(data):
    colors = ["green"]
    for i in range(1, len(data)):
        if data["Volume"].iloc[i] >= data["Volume"].iloc[i - 1]:
            colors.append("green")
        else:
            colors.append("red")
    return colors

def get_ticker_name(ticker):
    return yf.Ticker(ticker).info.get("longName", ticker)

def volume_analysis(data:pd.DataFrame, ticker:str, date:str, interval:str):
    plt.figure(figsize=(12,6),dpi=300)
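    # On a date axis the bar width is in days: 0.265 day (about 6.4 h, close to
    # a 6.5 h trading session) is split evenly across the bars.
    width = 0.265 / len(data)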
    plt.bar(data.index, data["Volume"], edgecolor="black",alpha=.6, color=get_bar_colors(data),width=width,label=ticker)

    plt.title(
        f"Volume Analysis For {get_ticker_name(ticker)} on {date} (Interval:{interval})",
        color="black",
        fontsize=12,
        fontweight="bold",
    )
    plt.xlabel("Time Period", fontsize=10)
    plt.ylabel("Volume", fontsize=10)
    # max(1, ...) avoids a zero slice step when there are fewer than 10 bars.
    step = max(1, int(len(data) * percent_of_labels))
    plt.xticks(
        data.index[::step],
        [d.strftime("%H:%M") for d in data.index[::step]],
        rotation=45,
        fontsize=10,
        ha="right",
    )
    # FORMAT Y TICKS
    plt.yticks(fontsize=10)
    plt.gca().yaxis.set_major_formatter(
        mticker.FuncFormatter(lambda x, _: f"{x/1e6:.1f}M")
    )

    legend = plt.legend(
        title="Ticker",
        facecolor="white",
        frameon=True,
        fancybox=True,
        framealpha=0.8,
        edgecolor="gray",
    )
    plt.setp(legend.get_title(), color="black")
    plt.tight_layout()
    plt.show()
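get_bar_colors walks the rows in a Python loop. The same coloring can be derived from `Series.diff()`, which subtracts each bar's volume from the previous one. This is a sketch; `get_bar_colors_vectorized` is a hypothetical name. Comparisons involving NaN evaluate to False, so bars next to a missing volume come out red, as they do in the loop version.

```python
import pandas as pd


def get_bar_colors_vectorized(data: pd.DataFrame) -> list:
    """Green when volume is >= the previous bar's volume, red otherwise."""
    is_up = (data["Volume"].diff() >= 0).tolist()
    if is_up:
        is_up[0] = True  # the first bar has no predecessor; the loop colors it green
    return ["green" if up else "red" for up in is_up]


demo = pd.DataFrame({"Volume": [100, 80, 80, 120]})
print(get_bar_colors_vectorized(demo))  # ['green', 'red', 'green', 'green']
```

The result can be passed to `plt.bar(..., color=...)` exactly like the output of get_bar_colors.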

Task 2 B:

Write code to test volume_analysis using two tickers, two dates and two intervals. First download data using a call to data_download and then use the function to plot.

In [78]:
for i, test in enumerate(TEST_CASES):
    print(f'TEST CASE {i+1}:\t{test["ticker"]}')
    data = data_download(test["ticker"], test["date"], test["interval"])
    if data is None or data.empty:
        print("No data returned.")
        continue
    volume_analysis(data, test["ticker"], test["date"], test["interval"])
TEST CASE 1:	AAPL
[Figure: Volume Analysis for AAPL on 2025-01-15 (Interval: 5m)]
TEST CASE 2:	GOOG
[Figure: Volume Analysis for GOOG on 2025-01-17 (Interval: 15m)]
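Both plotting functions label roughly every tenth bar by slicing the index with a step of `int(len(data) * percent_of_labels)`. For fewer than 10 bars (a 60m or 90m session, for instance) that step is 0, and slicing with a zero step raises ValueError. The sketch below clamps the step to at least 1; `tick_positions` is a hypothetical helper.

```python
import pandas as pd

percent_of_labels = 0.1


def tick_positions(index: pd.DatetimeIndex, fraction: float = percent_of_labels):
    """Hypothetical helper: return (positions, labels) for about `fraction` of the bars."""
    step = max(1, int(len(index) * fraction))  # never a zero slice step
    positions = index[::step]
    labels = [ts.strftime("%H:%M") for ts in positions]
    return positions, labels


idx = pd.date_range("2025-01-15 09:30", periods=7, freq="60min")
pos, labels = tick_positions(idx)
print(labels)  # ['09:30', '10:30', '11:30', '12:30', '13:30', '14:30', '15:30']
```

Inside a plotting function it could be used as `plt.xticks(*tick_positions(data.index), rotation=45, fontsize=10, ha="right")`.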

Task 3 A:

Write a function called price_analysis which takes data, ticker, date and interval as parameters. It calculates the mean of the prices (Close, High, Low, and Open) for each row and displays it as a line chart. It also displays the Close price as a separate line, so the function plots two lines in a single graph.

In [79]:
def price_analysis(data,ticker,date,interval):
    plt.figure(figsize=(12, 6), dpi=300)
    mean_price = data[["Open","High","Low","Close"]].mean(axis=1)
    plt.plot(data.index,data["Close"],label="Closing Price",alpha=.6,color="green")
    plt.plot(
        mean_price.index,
        mean_price,
        color="red",
        label="Mean Price (Close, High, Low, and Open)",
        linestyle="dashed",
        linewidth=1.5,
        alpha=0.9,
    )

    plt.title(
        f"Price Analysis For {get_ticker_name(ticker)} on {date} (Interval:{interval})",
        color="black",
        fontsize=12,
        fontweight="bold",
    )

    plt.xlabel("Time Period", fontsize=10)
    plt.ylabel("Price", fontsize=10)
    # max(1, ...) avoids a zero slice step when there are fewer than 10 bars.
    step = max(1, int(len(data) * percent_of_labels))
    plt.xticks(
        data.index[::step],
        [d.strftime("%H:%M") for d in data.index[::step]],
        rotation=45,
        fontsize=10,
        ha="right",
    )
    plt.yticks(fontsize=10)

    plt.legend(
        facecolor="white",
        frameon=True,
        fancybox=True,
        framealpha=0.8,
        edgecolor="gray",
    )
    plt.tight_layout()
    plt.show()

    return None
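The mean line in price_analysis comes from `DataFrame.mean(axis=1)`, which averages across the four price columns of each row. One detail worth knowing: `mean` skips NaN by default, so a bar with a missing price is averaged over the remaining values. The small worked example below uses made-up prices to show both behaviors.

```python
import numpy as np
import pandas as pd

# Made-up bars; the third row is missing its Open price.
bars = pd.DataFrame({
    "Open":  [10.0, 11.0, np.nan],
    "High":  [12.0, 13.0, 14.0],
    "Low":   [9.0, 10.0, 11.0],
    "Close": [11.0, 12.0, 13.0],
})
cols = ["Open", "High", "Low", "Close"]

row_mean = bars[cols].mean(axis=1)              # skipna=True: row 3 averages 3 values
strict = bars[cols].mean(axis=1, skipna=False)  # NaN unless all four prices exist

print(row_mean.round(3).tolist())  # [10.5, 11.5, 12.667]
print(strict.tolist())             # [10.5, 11.5, nan]
```

For the first bar, (10 + 12 + 9 + 11) / 4 = 10.5. Passing `skipna=False` makes gaps visible as breaks in the plotted line instead of silently shifting the average.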

Task 3 B:

Write code to test price_analysis using two tickers, two dates and two intervals. First download data using a call to data_download and then use the function to plot.

In [80]:
for i, test in enumerate(TEST_CASES):
    print(f'TEST CASE {i+1}:\t{test["ticker"]}')
    data = data_download(test["ticker"], test["date"], test["interval"])
    if data is None or data.empty:
        print("No data returned.")
        continue
    price_analysis(data, test["ticker"], test["date"], test["interval"])
TEST CASE 1:	AAPL
[Figure: Price Analysis for AAPL on 2025-01-15 (Interval: 5m)]
TEST CASE 2:	GOOG
[Figure: Price Analysis for GOOG on 2025-01-17 (Interval: 15m)]

Task 4 A:

Write a function called violin_plots which takes data, ticker, date and interval as parameters. It draws a violin plot for each of the following columns in the data: Close, Open, Low, High.

In [81]:
def violin_plots(data,ticker,date,interval):
    df = data[["Open", "High", "Low", "Close"]]
    plt.figure(figsize=(10, 6),dpi=300)
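    # set_theme changes global rcParams, so later plots in the session use this theme too.
    sns.set_theme(style="whitegrid")
    sns.violinplot(data=df)  # wide-form input: one violin per column
    plt.xlabel("Price Column", fontsize=8)
    plt.ylabel("Price", fontsize=8)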
    plt.title(
        f"Violin Plot For {get_ticker_name(ticker)} on {date} (Interval:{interval})",
        color="black",
        fontsize=12,
        fontweight="bold",
    )
    plt.show()
    return None
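Each violin shows the distribution of one price column, and with seaborn's default `inner="box"` it also draws a miniature box plot of the quartiles. The sketch below computes those quartiles and the interquartile range directly with `DataFrame.quantile`, which can be handy for checking a plot numerically; the prices are made up for illustration.

```python
import pandas as pd

# Made-up prices standing in for data[["Open", "High", "Low", "Close"]].
prices = pd.DataFrame({
    "Open":  [1.0, 2.0, 3.0, 4.0, 5.0],
    "High":  [2.0, 3.0, 4.0, 5.0, 6.0],
    "Low":   [0.5, 1.5, 2.5, 3.5, 4.5],
    "Close": [1.5, 2.5, 3.5, 4.5, 5.5],
})

# Rows become price columns; columns become the three quartiles.
summary = prices.quantile([0.25, 0.5, 0.75]).T
summary.columns = ["q1", "median", "q3"]
summary["iqr"] = summary["q3"] - summary["q1"]
print(summary)
```

A wide violin with a small IQR indicates prices clustered in a narrow band for most of the session.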

Task 4 B:

Write code to test violin_plots using two tickers, two dates and two intervals. First download data using a call to data_download and then use the function to plot.

In [82]:
for i, test in enumerate(TEST_CASES):
    print(f'TEST CASE {i+1}:\t{test["ticker"]}')
    data = data_download(test["ticker"], test["date"], test["interval"])
    if data is None or data.empty:
        print("No data returned.")
        continue
    violin_plots(data, test["ticker"], test["date"], test["interval"])
TEST CASE 1:	AAPL
[Figure: Violin Plot for AAPL on 2025-01-15 (Interval: 5m)]
TEST CASE 2:	GOOG
[Figure: Violin Plot for GOOG on 2025-01-17 (Interval: 15m)]