Programming in Python
Chapter Five: Introduction to Data Visualization & Practical Programs
Comprehensive study guide exploring data visualization (Matplotlib, Seaborn, Plotly) and a full catalogue of 18 practical Python programs.
1. Introduction to Data Visualization
5.1 What is Data Visualization?
Data visualization means showing data using visual elements like charts, graphs, or maps to understand it better. Instead of reading through endless rows of numbers, a visual graph helps our brains instantly see patterns, trends, and connections in a simple and intuitive way.
Importance of Data Visualization
5.2 Popular Python Libraries for Visualization
Python is famous for its data science capabilities, and it has many useful libraries designed specifically to make charts. Each library has its own unique features:
5.3 Deep Dive: Matplotlib
Matplotlib is the foundational Python plotting library. It uses a “low-level” interface, which means it offers you a massive amount of freedom and customization, but it might require you to write a bit more code compared to newer libraries.
Installation: pip install matplotlib or conda install matplotlib
Import: import matplotlib.pyplot as plt
1. Scatter Plot
Scatter plots use individual dots to represent values for two different numeric variables. They are perfect for observing relationships or correlations between data points. We use the plt.scatter() method.
import pandas as pd
import matplotlib.pyplot as plt
# Reading the dataset
dataset = pd.read_csv("Stu_data.csv")
# Plotting the scatter chart
plt.scatter(dataset['Name'], dataset['Marks'])
plt.title("Scatter Plot")
plt.xlabel('Name')
plt.ylabel('Marks')
plt.show()
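The example above places student names on the X-axis. As a minimal sketch of the two-numeric-variable case this section describes, the code below plots a small made-up dataset: hours_studied and marks are hypothetical values, not taken from Stu_data.csv. It selects Matplotlib's Agg backend and saves the chart to an image file so that it can run without a display; with the default backend, plt.show() can be used instead.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt

# Hypothetical sample data: hours studied and marks obtained by six students
hours_studied = [1, 2, 3, 4, 5, 6]
marks = [35, 48, 52, 66, 71, 85]

plt.figure()  # start a fresh figure
plt.scatter(hours_studied, marks, color="teal", marker="o")
plt.title("Hours Studied vs Marks")
plt.xlabel("Hours Studied")
plt.ylabel("Marks")
plt.grid(True)  # light grid lines make individual points easier to read
plt.savefig("hours_vs_marks.png")  # write the chart to an image file
```

If the dots rise from left to right, as they do for this sample, the two variables tend to increase together.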
2. Bar Chart
A bar chart represents data categories using rectangular bars. The height or length of the bar directly corresponds to the data value it represents. We use the plt.bar() method.
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("tips.csv")
# First argument: the categories on the X-axis; second argument: the bar heights
plt.bar(data['day'], data['tip'])
plt.title("Bar Chart")
plt.xlabel('Day')
plt.ylabel('Tip')
plt.show()
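A dataset such as tips.csv usually has many rows for each day, so passing the raw columns to plt.bar() draws many bars on top of one another. One possible approach, sketched below, is to summarise the data first and draw one bar per day. The rows are made up for illustration and only assume that the file has day and tip columns; the Agg backend and the output file name are likewise assumptions so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows shaped like tips.csv (one row per restaurant bill)
data = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Sat", "Sat", "Sun"],
    "tip": [2.0, 3.0, 3.5, 2.5, 4.5, 5.0],
})

# Average tip for each day; sort=False keeps the days in order of first appearance
avg_tip = data.groupby("day", sort=False)["tip"].mean()

plt.figure()  # start a fresh figure
plt.bar(list(avg_tip.index), list(avg_tip.values))
plt.title("Average Tip by Day")
plt.xlabel("Day")
plt.ylabel("Average Tip")
plt.savefig("avg_tip_by_day.png")
```

Replacing .mean() with .sum() would show the total tips per day instead of the average.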
5.4 Deep Dive: Seaborn
Seaborn is a library designed to make attractive, informative statistical charts with little code. It is built on top of Matplotlib and works directly with Pandas DataFrames. It is especially useful for showing complex patterns through line plots, bar plots, and heatmaps.
Installation: pip install seaborn
Line Plot
To plot a line graph in Seaborn, we use the lineplot() method.
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
dataset = pd.read_csv("Stu_data.csv")
# Creating the line plot
sn.lineplot(x='Name', y='Marks', data=dataset)
plt.show()
5.5 Deep Dive: Plotly
Plotly is an open-source Python library used to make interactive charts. Unlike static images, Plotly graphs let you click, zoom, hover over data points for exact numbers, and edit charts directly in your web browser or Jupyter notebook. It is perfect for 3D charts, scientific graphs, and financial dashboards.
Installation: pip install plotly
1. Interactive Scatter Plot
Plotly Express makes it easy to generate interactive scatter plots using plotly.express.scatter().
import pandas as pd
import plotly.express as px
dataset = pd.read_csv("tips.csv")
# color='smoker' automatically color-codes the dots by that category
graph = px.scatter(dataset, x="total_bill", y="size", color='smoker')
graph.show()
2. Interactive Line Chart
Line plots in Plotly are highly accessible and easy to style. We use px.line().
import plotly.express as px
import pandas as pd
data = pd.read_csv("Stu_data.csv")
fig = px.line(data, x='Name', y='Marks', color='Gender')
fig.show()
3. Interactive Bar Chart
With Plotly Express, you can create interactive bar charts effortlessly using the px.bar() method.
import plotly.express as px
import pandas as pd
data = pd.read_csv("Stu_data.csv")
fig = px.bar(data, x='Name', y='Marks', color='Gender')
fig.show()
Exercise 1: Choose the correct answer.
Justification: Data visualization turns raw, hard-to-read numbers into charts and graphs, making it much easier for our brains to understand patterns, trends, and context at a glance.
Justification: While Seaborn and Plotly are great for advanced or interactive charts, the text specifies that Matplotlib is the foundational library specifically suited for creating basic graphs like line charts and bar charts.
Justification: A scatter plot places individual dots on an X and Y axis to show how two different variables relate to or correlate with each other.
Justification: A bar chart uses rectangular bars where the length or height of the bar directly corresponds to the numeric value of the data category it represents.
Justification: Seaborn is explicitly built on top of Matplotlib. It simplifies the code needed to create beautiful, complex statistical charts and works seamlessly with Pandas.
Justification: In the Plotly Express module, the scatter() method (e.g., plotly.express.scatter()) is called to generate an interactive scatter plot.
Justification: If Plotly Express is imported as px, the function used to generate a line chart is px.line().
Justification: In Matplotlib (imported as plt), the plt.pie() function takes an array of sizes and turns them into a circular pie chart representing percentages.
Justification: In the Plotly Express module (px), the px.bar() function is explicitly used to construct interactive bar charts.
Exercise 2: Write short answers to these questions.
The plt.xlabel() function adds a descriptive text label to the horizontal X-axis of a graph, while plt.ylabel() adds one to the vertical Y-axis, helping viewers understand what the graph is measuring.
The color argument automatically groups and color-codes the data points on the graph based on a category column in your dataset (e.g., color='Gender' colors male and female data points differently and generates a legend).
Exercise 3: Long Answer Questions.
In the process of data analysis, analysts must comb through thousands or even millions of rows of raw data. Human brains are not naturally equipped to find mathematical patterns in giant spreadsheets. Data visualization acts as a translation tool, converting complex numerical datasets into visual context (shapes, colors, lines). Its importance lies in three areas:
Matplotlib is a foundational, low-level plotting library.
Plotly is a modern, high-level, open-source library.
# 1. Import the library
import matplotlib.pyplot as plt
# Example Data
months = ['Jan', 'Feb', 'Mar']
sales = [150, 200, 180]
# --- A. Line Chart ---
# Uses plt.plot() to draw a line connecting the data points
plt.plot(months, sales)
plt.title("Monthly Sales Trend")
plt.show()
# --- B. Bar Graph ---
# Uses plt.bar() to create rectangular bars for categorical comparison
plt.bar(months, sales)
plt.title("Monthly Sales Comparison")
plt.show()
# --- C. Pie Chart ---
# Uses plt.pie() to show parts of a whole (percentages)
# 'labels' assigns names to the slices, 'autopct' displays the percentages
plt.pie(sales, labels=months, autopct='%1.1f%%')
plt.title("Sales Distribution")
plt.show()
To accomplish this, we first import pandas and plotly.express. We then read the dataset into a DataFrame and pass that DataFrame into the specific Plotly Express functions (px.line, px.bar, px.pie).
import pandas as pd
import plotly.express as px
# Step 1: Assume the data is loaded into a Pandas DataFrame
# (In a real scenario, this would be: df = pd.read_csv("sales.csv"))
data = {
    'Month': ['Jan', 'Feb', 'Mar', 'Jan', 'Feb', 'Mar'],
    'Category': ['Tech', 'Tech', 'Tech', 'Home', 'Home', 'Home'],
    'Sales': [5000, 6000, 5500, 3000, 3200, 3100]
}
df = pd.DataFrame(data)
# a) Line Chart showing total sales trend over the months
# 'color' splits the lines by category
fig_line = px.line(df, x='Month', y='Sales', color='Category', title="Sales Trend")
fig_line.show()
# b) Bar Graph comparing total sales for each product category
fig_bar = px.bar(df, x='Category', y='Sales', title="Category Sales Comparison")
fig_bar.show()
# c) Pie Plot showing the percentage contribution of each category
fig_pie = px.pie(df, names='Category', values='Sales', title="Category Contribution")
fig_pie.show()
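In the bar graph above, each category has three rows (one per month), and Plotly Express stacks them as separate segments within the bar. If a single bar per category is preferred, one option is to add up the sales first. The sketch below uses the same sample data and only Pandas; the variable name totals is an assumption made for illustration.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({
    "Month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "Category": ["Tech", "Tech", "Tech", "Home", "Home", "Home"],
    "Sales": [5000, 6000, 5500, 3000, 3200, 3100],
})

# One row per category, with the sales of all months added together
totals = df.groupby("Category", as_index=False)["Sales"].sum()
print(totals)
```

The totals DataFrame can then be passed to px.bar() or px.pie() in place of df.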
Part 3: Python Practical Questions (18 Programs)
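# Q1: Add, subtract, multiply, and divide two numbers
a = float(input("Enter first number: "))
b = float(input("Enter second number: "))
print("Sum:", a + b)
print("Difference:", a - b)
print("Product:", a * b)
# Dividing by zero raises ZeroDivisionError, so check the divisor first
if b != 0:
    print("Quotient:", a / b)
else:
    print("Quotient: undefined (cannot divide by zero)")
# Q2: Greet the user by name
name = input("Enter your name: ")
# An f-string avoids the stray space that print() would put before "!"
print(f"Hello there, {name}! Welcome to Python.")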
# Q3: Check whether a number is positive, negative, or zero
num = int(input("Enter an integer: "))
if num > 0:
    print("The number is positive.")
elif num < 0:
    print("The number is negative.")
else:
    print("The number is exactly zero.")
# Q4: Swap two numbers
x = int(input("Enter X: "))
y = int(input("Enter Y: "))
# Python allows elegant simultaneous swapping!
x, y = y, x
print("After swapping: X =", x, "and Y =", y)
# Q5: Count the vowels in a sentence
sentence = input("Enter a sentence: ").lower()
vowel_count = 0
for char in sentence:
    if char in 'aeiou':
        vowel_count += 1
print("Number of vowels:", vowel_count)
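The vowel counter above can also be written as a reusable function, which makes it easy to check with fixed inputs instead of typing a sentence each time. This is only a sketch of one alternative; the function name count_vowels is made up for illustration.

```python
def count_vowels(text):
    """Return how many characters in text are vowels (a, e, i, o, u), ignoring case."""
    return sum(1 for char in text.lower() if char in "aeiou")


print("Number of vowels:", count_vowels("Hello World"))
```

The generator expression inside sum() does the same work as the for loop and counter variable in the original program.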
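# Q6: Create a text file and write a line to it
file = open("data.txt", "w")  # Open in Write mode (creates or overwrites the file)
file.write("Hello, this is a test file.\n")
file.close()
print("File created successfully.")
# Q7: Append a line to the existing file
file = open("data.txt", "a")  # Open in Append mode
file.write("This is an appended line.\n")
file.close()
print("Line appended successfully.")
# Q8: Read the file and display its contents
file = open("data.txt", "r")  # Open in Read mode
content = file.read()
print("--- File Contents ---")
print(content)
file.close()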
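The three file programs above call close() by hand. A common alternative, sketched below, is the with statement, which closes the file automatically when its block ends, even if an error occurs. The sketch writes to a temporary folder (an assumption made here so that no existing data.txt is overwritten).

```python
import os
import tempfile

# Work inside a temporary folder that is deleted afterwards
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data.txt")

    # Write mode: create the file
    with open(path, "w") as file:
        file.write("Hello, this is a test file.\n")

    # Append mode: add a line at the end
    with open(path, "a") as file:
        file.write("This is an appended line.\n")

    # Read mode: load the whole file into a string
    with open(path, "r") as file:
        content = file.read()

print("--- File Contents ---")
print(content)
```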
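# Q9: Write student records to a CSV file
import csv
data = [
    ["Name", "Marks"],
    ["Aarosh", 85],
    ["Subigya", 92],
    ["Binay", 78]
]
with open("students.csv", "w", newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)
print("students.csv created successfully.")
# Q10: Read the CSV file and print each row
import csv
# newline='' is recommended by the csv module when opening files for reading as well
with open("students.csv", "r", newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)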
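csv.reader returns each row as a plain list, so columns have to be picked by position. As a sketch of an alternative, csv.DictWriter and csv.DictReader from the same standard module use the header row to label each value. The code below writes the same three students to a temporary folder (an assumption, to avoid touching an existing students.csv) and reads them back.

```python
import csv
import os
import tempfile

rows = [
    {"Name": "Aarosh", "Marks": 85},
    {"Name": "Subigya", "Marks": 92},
    {"Name": "Binay", "Marks": 78},
]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "students.csv")

    # Write the header row followed by one line per dictionary
    with open(path, "w", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=["Name", "Marks"])
        writer.writeheader()
        writer.writerows(rows)

    # Read every line back as a dictionary keyed by the header names
    with open(path, "r", newline="") as file:
        students = list(csv.DictReader(file))

for student in students:
    # Values read from a CSV file are strings, so convert Marks to int
    print(student["Name"], int(student["Marks"]))
```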
# Q11: Load a CSV file into a Pandas DataFrame
import pandas as pd
# Using the file we created in Q9
df = pd.read_csv("students.csv")
print(df.head())  # .head() prints the first 5 rows by default
# Q12: Save a dictionary to a CSV file with Pandas
import pandas as pd
my_dict = {
    'Item': ['Laptop', 'Mouse', 'Keyboard'],
    'Stock': [15, 100, 45]
}
df = pd.DataFrame(my_dict)
df.to_csv("inventory.csv", index=False)
print("Dictionary saved to inventory.csv")
# Q13: Draw a pie chart from a DataFrame
import pandas as pd
import matplotlib.pyplot as plt
# A mock sales dataset for the same products as in Q12
data = {'Product': ['Laptop', 'Mouse', 'Keyboard'], 'Sales': [5000, 1500, 2500]}
df = pd.DataFrame(data)
plt.pie(df['Sales'], labels=df['Product'], autopct='%1.1f%%')
plt.title("Product Sales Distribution")
plt.show()
# Q14: A function that returns the sum of two numbers
def get_sum(num1, num2):
    return num1 + num2
print("Sum is:", get_sum(10, 20))
# Q15: A function that calculates the area of a circle
import math
def circle_area(radius):
    return math.pi * (radius ** 2)
r = float(input("Enter radius: "))
print("Area of circle:", circle_area(r))
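# Q16: A function that checks whether a number is prime
def is_prime(n):
    if n <= 1:
        return False
    # A factor larger than the square root always pairs with a smaller one,
    # so checking up to int(n**0.5) is enough
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True
num = int(input("Enter a number to check if prime: "))
if is_prime(num):
    print("It is a prime number.")
else:
    print("It is not a prime number.")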
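One way to gain confidence in the prime check above is to run it over a range of numbers and compare the result with primes you already know. The sketch below repeats the same is_prime() function so that it runs on its own; the upper limit of 30 is an arbitrary choice for illustration.

```python
def is_prime(n):
    """Return True if n is a prime number, otherwise False."""
    if n <= 1:
        return False
    # Only divisors up to the square root of n need to be tried
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True


# Collect every prime below 30 using the function
primes = [n for n in range(30) if is_prime(n)]
print(primes)  # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```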
# Q17: A function that finds the largest number in a list
def find_max(number_list):
    # Python has a built-in max() function!
    return max(number_list)
my_list = [4, 99, 23, 1, 85]
print("The maximum number is:", find_max(my_list))
# Q18: Handle a missing file with try/except
try:
    file = open("missing_file.txt", "r")
    print(file.read())
    file.close()
except FileNotFoundError:
    print("Error: The file you are looking for does not exist on this computer!")
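The same error handling can be wrapped in a small function so that the rest of a program can decide what to do when a file is missing. This is a sketch of one possible design; the function name read_text and the file name used below are made up for illustration.

```python
def read_text(path):
    """Return the text stored in the file at path, or None if the file does not exist."""
    try:
        # The with statement closes the file even if reading fails
        with open(path, "r") as file:
            return file.read()
    except FileNotFoundError:
        return None


content = read_text("no_such_file_for_this_sketch.txt")
if content is None:
    print("Error: the file does not exist.")
else:
    print(content)
```

Returning None instead of printing inside the function keeps the file handling separate from the message shown to the user.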
