Python Basics: Essential Coding Skills for Data Science

Importance of Python in Data Science

The Python programming language has become a cornerstone of data science. Its simple syntax and extensive library ecosystem make it a strong choice for data analysis and machine learning, which is why so many data scientists use it.

In this article, we'll cover Python's basic syntax, its versatile data structures, and the libraries most often used for data analysis and machine learning.

Why Learn Python Basics for Data Science

Understanding Python programming is the first step towards mastering data science. It provides a strong foundation for more advanced topics and techniques in data analysis and machine learning.

The Stack Overflow Developer Survey provides comprehensive insights into the most commonly used programming languages among developers globally, including those working in data science.

Figure: Stack Overflow Developer Survey insights on the most commonly used programming languages

Kaggle, a platform for data science and machine learning, also conducts an annual survey among its users, including data scientists. The Kaggle Machine Learning & Data Science Survey provides information about the tools and languages most frequently used in data science.

Figure: The importance of Python in data science, based on Kaggle survey data from 2022

Understanding Python Basics

Python Syntax and Indentation

Python's syntax is clean and easy to understand. Unlike many other languages, which mark blocks with braces or keywords, Python uses indentation to define blocks of code. This makes the code more readable and organized.

print("Hello, world!")
Hello, world!

And here's an example of a Python if-else statement:
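The if-else code itself does not appear in the text above, so here is a minimal sketch of what such a statement can look like. The variable `number` and the two messages are assumptions chosen for illustration. Notice how the indented line under each branch is what tells Python which code belongs to that branch.

```python
# Hypothetical value chosen for illustration
number = 5

# The indented line under each branch forms that branch's block
if number > 0:
    result = "Positive number"
else:
    result = "Zero or negative number"

print(result)  # prints: Positive number
```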

Variables and Data Types in Python

Python supports various data types, including integers, floats, strings, and booleans. Python is dynamically typed: a variable is just a name bound to a value, so you can later assign it a value of a different type.

x = 10  # x is an integer
print(x)

x = "Hello"  # x is now a string
print(x)
10
Hello
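To see dynamic typing in action, you can ask Python for the type of the value a variable currently holds. The snippet below is a small sketch using the built-in `type()` and `isinstance()` functions; the variable names `first_type` and `second_type` are only there for illustration.

```python
x = 10
first_type = type(x).__name__
print(first_type)   # int

x = "Hello"  # the same name now refers to a string
second_type = type(x).__name__
print(second_type)  # str

# isinstance() checks a value against a type
print(isinstance(4.5, float))  # True
print(isinstance(True, bool))  # True
```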

Python Operators

Python includes a variety of operators for performing operations on values, including arithmetic, comparison, assignment, logical, and bitwise operators.

# Arithmetic operators
x = 10 + 2  # addition
y = 10 - 2  # subtraction
z = 10 * 2  # multiplication
a = 10 / 2  # division

# Comparison operators
b = (x == y)  # equality
c = (x != y)  # inequality

# Logical operators
d = b and c  # logical AND
e = b or c   # logical OR
# Print all variables
print("x:", x)
print("y:", y)
print("z:", z)
print("a:", a)
print("b:", b)
print("c:", c)
print("d:", d)
print("e:", e)
x: 12
y: 8
z: 20
a: 5.0
b: False
c: True
d: False
e: True
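The example above covers arithmetic, comparison, and logical operators. As a complement, here is a short sketch of the remaining kinds mentioned earlier (assignment and bitwise operators), plus a few extra arithmetic operators. All values are arbitrary and chosen only for illustration.

```python
# Assignment operators update a variable in place
total = 10
total += 5   # same as total = total + 5, so total is 15
total *= 2   # total is now 30

# More arithmetic operators
floor_div = 7 // 2   # floor division: 3
remainder = 7 % 2    # modulo: 1
power = 2 ** 3       # exponentiation: 8

# Bitwise operators work on the binary representation of integers
bit_and = 6 & 3   # 0b110 & 0b011 = 0b010 = 2
bit_or = 6 | 3    # 0b110 | 0b011 = 0b111 = 7
bit_xor = 6 ^ 3   # 0b110 ^ 0b011 = 0b101 = 5
shifted = 1 << 3  # shift left by three bits: 8

print(total, floor_div, remainder, power)
print(bit_and, bit_or, bit_xor, shifted)
```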

Control Flow in Python

Control flow statements in Python, such as if, for, and while, let a program decide which blocks of code to run and how many times to repeat them.

Here's a Python for loop that prints the numbers 0 through 9:

for i in range(10):
    print(i)
0
1
2
3
4
5
6
7
8
9

And here's a Python while loop that prints the numbers 0 through 9:

i = 0
while i < 10:
    print(i)
    i += 1
0
1
2
3
4
5
6
7
8
9
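Loops and conditions are often combined. As a small sketch, the loop below walks through the same range of numbers but keeps only the even ones; the list name `evens` is just an illustrative choice.

```python
evens = []

for i in range(10):
    # The if statement is nested inside the loop body
    if i % 2 == 0:
        evens.append(i)

print(evens)  # [0, 2, 4, 6, 8]
```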

Python Data Structures

Lists in Python

Lists are one of the most commonly used data structures in Python. They are ordered, mutable, and can contain a mix of different data types.

my_list = [1, 2, "Python", 4.5, True]
print(my_list)
[1, 2, 'Python', 4.5, True]
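Because lists are ordered and mutable, you can read items by position and change the list after creating it. Here is a brief sketch using the same list; the replacement values are arbitrary.

```python
my_list = [1, 2, "Python", 4.5, True]

first = my_list[0]    # indexing starts at 0
last = my_list[-1]    # negative indexes count from the end

my_list[1] = "two"      # replace an item in place
my_list.append("new")   # add an item to the end

middle = my_list[1:3]   # slicing returns a new list

print(first, last)  # 1 True
print(middle)       # ['two', 'Python']
print(my_list)      # [1, 'two', 'Python', 4.5, True, 'new']
```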

Tuples in Python

Tuples are similar to lists but are immutable. This means that once a tuple is created, it cannot be changed.

my_tuple = (1, 2, "Python", 4.5, True)
print(my_tuple)
(1, 2, 'Python', 4.5, True)
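To make immutability concrete, the sketch below reads an item from the tuple and then tries to overwrite one. Python raises a `TypeError` on the assignment, which the example catches so the script keeps running. The flag `assignment_failed` is only there for illustration.

```python
my_tuple = (1, 2, "Python", 4.5, True)

# Reading by index works just like with a list
print(my_tuple[2])  # Python

# Writing is not allowed: tuples do not support item assignment
assignment_failed = False
try:
    my_tuple[0] = 99
except TypeError as error:
    assignment_failed = True
    print("Cannot modify a tuple:", error)
```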

Dictionaries in Python

Dictionaries in Python are collections of key-value pairs. They are mutable and indexed by their keys, making them ideal for fast data retrieval. Since Python 3.7, dictionaries also preserve the order in which keys were inserted.

my_dict = {"name": "John", "age": 30, "city": "New York"}
print(my_dict)
{'name': 'John', 'age': 30, 'city': 'New York'}
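Here is a short sketch of the most common dictionary operations: looking up a value by key, updating a value, adding a new pair, and reading a key that may be missing. The `"country"` and `"email"` keys are hypothetical additions for illustration.

```python
my_dict = {"name": "John", "age": 30, "city": "New York"}

# Look up a value by its key
print(my_dict["name"])  # John

my_dict["age"] = 31          # update an existing key
my_dict["country"] = "USA"   # add a new key (hypothetical example value)

# get() returns a default instead of raising KeyError for a missing key
email = my_dict.get("email", "not provided")
print(email)  # not provided

# Iterating over a dictionary yields its keys in insertion order
keys = list(my_dict)
print(keys)  # ['name', 'age', 'city', 'country']
```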

Sets in Python

Sets are unordered collections of unique elements. They are useful for removing duplicates from a collection and for mathematical set operations such as union and intersection.

my_set = {1, 2, 3, 4, 5, 5, 5}
print(my_set)  # Outputs: {1, 2, 3, 4, 5}
{1, 2, 3, 4, 5}
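The set operations mentioned above can be sketched as follows. The two sets `a` and `b` and the list of names are made up for illustration. Since sets are unordered, the results are sorted before printing so the output is predictable.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

union = a | b          # elements in either set
intersection = a & b   # elements in both sets
difference = a - b     # elements in a but not in b

print(sorted(union))         # [1, 2, 3, 4, 5, 6]
print(sorted(intersection))  # [3, 4]
print(sorted(difference))    # [1, 2]

# Converting a list to a set removes duplicates
names = ["John", "Anna", "John", "Peter", "Anna"]
unique_names = sorted(set(names))
print(unique_names)  # ['Anna', 'John', 'Peter']
```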

Python Functions and Modules

Defining and Calling Functions in Python

Functions in Python are blocks of reusable code that perform a specific task. They can be defined using the def keyword and called by their name.

def greet(name):
    print(f"Hello, {name}!")

greet("John")  # Outputs: Hello, John!
Hello, John!
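Functions become more reusable when they return a value instead of printing it, and when some parameters have default values. The variant below is a sketch along those lines; the `greeting` parameter is an addition for illustration and is not part of the example above.

```python
def greet(name, greeting="Hello"):
    """Build a greeting and return it instead of printing it."""
    return f"{greeting}, {name}!"

# The default value is used when the argument is omitted
message = greet("John")
print(message)  # Hello, John!

# Keyword arguments override the default
welcome = greet("Anna", greeting="Welcome")
print(welcome)  # Welcome, Anna!
```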

Python Modules and Import Statement

A Python module is a file containing Python code, such as functions, classes, or variables. Modules can be imported into other Python scripts using the import statement.

Here's an example of importing the math module in Python:

import math
print(math.sqrt(16))  # Outputs: 4.0
4.0
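The import statement has a few other common forms. As a quick sketch, you can give a module a shorter alias with `as`, or pull specific names out of it with `from ... import`. Both forms below use the standard library `math` module; the alias `m` is an arbitrary choice.

```python
import math as m             # import the module under an alias
from math import pi, floor   # import specific names directly

root = m.sqrt(25)
print(root)  # 5.0

whole_part = floor(pi)
print(whole_part)  # 3
```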

Python for Data Analysis

Introduction to Pandas

Pandas is a powerful Python library for data manipulation and analysis. It provides data structures and functions needed to manipulate structured data.

Here's an example of creating a DataFrame in Pandas:

import pandas as pd

data = {
    "Name": ["John", "Anna", "Peter"],
    "Age": [28, 24, 35],
}

df = pd.DataFrame(data)

print(df)
    Name  Age
0   John   28
1   Anna   24
2  Peter   35

Data Manipulation with Pandas

Pandas provides various functions for data manipulation, including filtering, grouping, merging, reshaping, and aggregating data.

Here's an example of filtering data in Pandas:

filtered_df = df[df["Age"] > 25]
print(filtered_df)
    Name  Age
0   John   28
2  Peter   35
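Grouping, aggregating, and merging follow the same style as filtering. The sketch below uses a slightly larger, made-up DataFrame: the `City` column, the extra row, and the `countries` lookup table are all assumptions added so there is something to group and merge on.

```python
import pandas as pd

# Hypothetical data: the City column is added for illustration
df = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Maria"],
    "City": ["Paris", "Paris", "Berlin", "Berlin"],
    "Age": [28, 24, 35, 31],
})

# Grouping and aggregating: mean age per city
mean_age = df.groupby("City")["Age"].mean()
print(mean_age)

# Merging: attach a country to each row via the shared City column
countries = pd.DataFrame({
    "City": ["Paris", "Berlin"],
    "Country": ["France", "Germany"],
})
merged = df.merge(countries, on="City", how="left")
print(merged)
```

Here `groupby` splits the rows by city before averaging, and `merge` with `how="left"` keeps every row of `df` while adding the matching `Country` value.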

Data Visualization with Matplotlib and Seaborn

Matplotlib and Seaborn are two popular Python libraries for data visualization. They provide functions to create a variety of plots and charts to visualize data.

Here's an example of creating a bar plot with Matplotlib:

import matplotlib.pyplot as plt

names = ["John", "Anna", "Peter"]
ages = [28, 24, 35]

plt.bar(names, ages)
plt.show()

Python for Machine Learning

Introduction to Scikit-Learn

Scikit-Learn is a popular Python library for machine learning. It provides simple and efficient tools for data mining and data analysis.

Here's an example of using Scikit-Learn to create a simple linear regression model:

from sklearn.linear_model import LinearRegression

X = [[0], [1], [2]]  # feature
y = [0, 1, 2]  # target

model = LinearRegression()
model.fit(X, y)

prediction = model.predict([[3]])
print(prediction)  # Outputs: [3.]
[3.]

This code trains a linear regression model on three data points: (0, 0), (1, 1), and (2, 2). The model learns the linear pattern in this data and then uses it to predict the output for an input of 3. As expected, it predicts 3.
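If you are curious what the model actually learned, you can inspect the fitted line. This is a small sketch using the `coef_` and `intercept_` attributes of Scikit-Learn's `LinearRegression`; the names `slope` and `intercept` are illustrative. For this data, the slope should be very close to 1 and the intercept very close to 0, up to floating-point rounding.

```python
from sklearn.linear_model import LinearRegression

X = [[0], [1], [2]]  # feature
y = [0, 1, 2]        # target

model = LinearRegression()
model.fit(X, y)

# The fitted line is: prediction = slope * input + intercept
slope = model.coef_[0]
intercept = model.intercept_

print(f"slope: {slope:.2f}")
print(f"intercept: {intercept:.2f}")
```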

Building a Simple Machine Learning Model in Python

With Scikit-Learn, you can build a simple machine-learning model in Python. This involves selecting a model, fitting it to the data, making predictions, and evaluating the model's performance.

Here's an example of building a simple logistic regression model with Scikit-Learn:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Load iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Scale the data
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a model with increased max_iter
model = LogisticRegression(max_iter=1000)

# Fit the model
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, predictions)
print(f"Model accuracy: {accuracy}")
Model accuracy: 1.0

The model's accuracy score of 1.0 means it predicted the species of iris correctly for every flower in the test set, which here holds 30 of the dataset's 150 samples.
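One detail worth knowing: the code above fits the scaler on the full dataset before splitting, so the test data influences the scaling. A common way to avoid this is to split first and let a pipeline fit the scaler on the training data only. The following is a sketch of that alternative using Scikit-Learn's `make_pipeline`; it is a variation on the example above, and its accuracy may differ slightly.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the data and split it before any preprocessing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on the training data only,
# then applies the same scaling when predicting on the test data
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X_train, y_train)

# score() returns the accuracy on the given data
accuracy = pipeline.score(X_test, y_test)
print(f"Pipeline accuracy: {accuracy:.2f}")
```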

Conclusion

The Future of Python in Data Science

Python's popularity in data science shows no signs of waning. Its ease of use, versatility, and robust community support make it an ideal language for data science. As more tools and libraries are developed, Python's role in data science will likely continue to grow.

Next Steps in Mastering Python for Data Science

After mastering the basics of Python for data science, the next steps involve delving deeper into data analysis and machine learning techniques, as well as learning more advanced Python libraries and tools.
