pagebase/Data-Science-Resources


numpy

NumPy (short for Numerical Python) is the fundamental package for scientific computing in Python. It's a powerful, open-source library that provides support for large, multi-dimensional arrays and matrices.

Syntax:

numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)

Parameters

1. object (Required)

This is the data you want to put into the array. It's the only parameter that is absolutely required.

  • What it is: An "array-like" object.

  • Common examples:

    - A Python list: [1, 2, 3]

    - A list of lists (for a 2D array): [[1, 2], [3, 4]]

    - A tuple: (1, 2, 3)

    - Another NumPy array.

import numpy as np
# A 1D array from a list
a = np.array([1, 2, 3])
# A 2D array from a list of lists
b = np.array([[1.5, 2.2], [3.1, 4.9]])

2. dtype (Optional)

This parameter specifies the data type of the elements in the array.

  • Default: None. When it's None, NumPy analyzes your object data and chooses the most appropriate type automatically (e.g., [1, 2, 3] becomes int64 on most 64-bit platforms, and [1.0, 2, 3] becomes float64).

  • Why use it?

    1. To force a type: You can force integers to be floats.

    2. To save memory: If you know your numbers are small (e.g., 0-255), you can use np.uint8 (8-bit integer) instead of the default np.int64 (64-bit integer). Each element then takes 1 byte instead of 8, so the array needs one eighth of the memory.

# NumPy auto-detects 'int'
arr_int = np.array([1, 2, 3])
print(f"Default Dtype: {arr_int.dtype}")
# Output: Default Dtype: int64

# 1. Force integers to be floats
arr_float = np.array([1, 2, 3], dtype=np.float64)
print(f"Forced Dtype: {arr_float.dtype}")
print(f"Array values: {arr_float}")
# Output: Forced Dtype: float64
# Output: Array values: [1. 2. 3.]

# 2. Save memory (using an 8-bit integer)
arr_small = np.array([10, 20, 255], dtype=np.uint8)
print(f"Small Dtype: {arr_small.dtype}")
# Output: Small Dtype: uint8
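
The memory claim can be checked with the .nbytes attribute. The following is a minimal sketch, assuming one million values that all fit in the range 0-255; the variable names are illustrative only.

```python
import numpy as np

# One million small values (all within 0-255), stored with two different dtypes.
values = np.arange(1_000_000) % 256

as_int64 = values.astype(np.int64)
as_uint8 = values.astype(np.uint8)

print(as_int64.nbytes)  # 8000000 (8 bytes per element)
print(as_uint8.nbytes)  # 1000000 (1 byte per element)

# Caveat: casting a value that does not fit wraps around instead of failing.
wrapped = np.array([300]).astype(np.uint8)
print(wrapped[0])  # 44, because 300 - 256 = 44
```

The smaller dtype is only safe when every value is known to fit in its range.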

3. copy (Optional)

This boolean parameter controls whether the function must create a new copy of the data in memory.

  • Default: True. This is the safest option. It guarantees that np.array() creates a brand new array in a new location in memory. Modifying the new array will never affect the original object.

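  • copy=False: This is a performance optimization. It tells NumPy to avoid making a new copy.

    - When is it possible? If the object you pass in is already a NumPy array and its dtype and order match what you're asking for. In this case, NumPy hands back the original array's data without copying it.

    - Version note: In NumPy 1.x, copy=False silently falls back to copying when a copy cannot be avoided. In NumPy 2.0 and later, copy=False raises a ValueError in that situation; pass copy=None (or use np.asarray()) to get the "copy only if needed" behavior.

    - Warning: If you use copy=False and no copy is made, changing the new array will change the original array.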

original_arr = np.array([10, 20, 30])

# Default behavior (copy=True)
arr_copy = np.array(original_arr, copy=True)
arr_copy[0] = 99
print(f"With copy=True: {original_arr}") # Original is unchanged
# Output: With copy=True: [10 20 30]

# Optimized behavior (copy=False)
arr_view = np.array(original_arr, copy=False)
arr_view[0] = 99 # This time, the original IS changed
print(f"With copy=False: {original_arr}")
# Output: With copy=False: [99 20 30]
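
Whether two arrays really share data can be checked with np.shares_memory(). The sketch below is one way to verify the copy behavior described above; it uses np.asarray() for the "copy only if needed" case because that behaves the same across NumPy versions.

```python
import numpy as np

original = np.array([10, 20, 30])

copied = np.array(original)          # copy=True is the default
maybe_shared = np.asarray(original)  # copies only when it has to

print(np.shares_memory(original, copied))        # False
print(np.shares_memory(original, maybe_shared))  # True

# Asking for a different dtype forces a copy, even through np.asarray().
converted = np.asarray(original, dtype=np.float64)
print(np.shares_memory(original, converted))     # False
```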

4. order (Optional)

This parameter specifies how multi-dimensional arrays are laid out in your computer's memory.

Who is this for? This is an advanced optimization. Most users can ignore this parameter and stick with the default. It mainly matters for performance tuning or for interfacing with code written in other languages (like Fortran).

  • Default: 'K' (which means "Keep"). It tries to match the memory layout of the input object if possible. If you pass a Python list, it defaults to 'C'.

  • 'C': (C-style, row-major). This is the standard in Python. It means elements of a row are stored next to each other in memory.

    - [[1, 2], [3, 4]] is stored as: 1, 2, 3, 4

  • 'F': (Fortran-style, column-major). This means elements of a column are stored next to each other in memory.

    - [[1, 2], [3, 4]] is stored as: 1, 3, 2, 4

# Create a 2D array
data = [[1, 2], [3, 4]]

# C-style (row-major), the default for Python lists
arr_c = np.array(data, order='C')

# F-style (column-major)
arr_f = np.array(data, order='F')

# You can check the flags
print(f"C-style contiguous: {arr_c.flags['C_CONTIGUOUS']}") # True
print(f"F-style contiguous: {arr_f.flags['F_CONTIGUOUS']}") # True
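
The two layouts hold the same values and differ only in how those values are arranged in memory. As a rough illustration (the dtype is fixed to int64 here so the stride values are predictable), the strides and a memory-order read make the difference visible:

```python
import numpy as np

data = [[1, 2], [3, 4]]
arr_c = np.array(data, dtype=np.int64, order='C')
arr_f = np.array(data, dtype=np.int64, order='F')

# Indexing gives the same values regardless of layout.
print(np.array_equal(arr_c, arr_f))  # True

# Strides: how many bytes to step to move one position along each axis.
print(arr_c.strides)  # (16, 8): the next row is 16 bytes away
print(arr_f.strides)  # (8, 16): the next row is only 8 bytes away

# ravel(order='K') reads the elements in the order they sit in memory.
print(arr_c.ravel(order='K'))  # [1 2 3 4]
print(arr_f.ravel(order='K'))  # [1 3 2 4]
```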

5. subok (Optional)

This boolean parameter controls whether the new array can be a sub-class of a NumPy array.

Who is this for? This is an advanced feature. If you don't know what a sub-class is (like np.matrix), you should always leave this as False.

  • Default: False. This is highly recommended. It guarantees the object you get back is a base np.ndarray. This makes code predictable.

  • subok=True: If the object you pass in is a sub-class (e.g., a np.matrix), the function will create a new array of that same sub-class type.

# np.matrix is a sub-class of np.ndarray
my_matrix = np.matrix([[1, 2], [3, 4]])

# Default behavior (subok=False)
# The function "flattens" the sub-class into a base ndarray
arr_base = np.array(my_matrix, subok=False)
print(f"With subok=False: {type(arr_base)}")
# Output: With subok=False: <class 'numpy.ndarray'>

# With subok=True
# The function preserves the sub-class type
arr_sub = np.array(my_matrix, subok=True)
print(f"With subok=True: {type(arr_sub)}")
# Output: With subok=True: <class 'numpy.matrix'>

6. ndmin (Optional)

This parameter specifies the minimum number of dimensions the resulting array should have.

  • Default: 0 (which means NumPy will just use the dimensions from your object).

  • Why use it? It's a convenient way to ensure your array is at least 1D, 2D, etc. It "wraps" the array in new dimensions (of size 1) from the outside until the minimum is met.

# A simple 1D array
my_data = [1, 2, 3]

# Default (ndmin=0)
arr_1d = np.array(my_data)
print(f"Shape: {arr_1d.shape}") # Shape: (3,)
print(arr_1d)
# [1 2 3]

# Force minimum 2 dimensions
arr_2d = np.array(my_data, ndmin=2)
print(f"Shape: {arr_2d.shape}") # Shape: (1, 3)
print(arr_2d)
# [[1 2 3]]

# Force minimum 3 dimensions
arr_3d = np.array(my_data, ndmin=3)
print(f"Shape: {arr_3d.shape}") # Shape: (1, 1, 3)
print(arr_3d)
# [[[1 2 3]]]
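
Because ndmin always adds the new axes at the front, it produces a row-shaped result. The sketch below compares it with two related tools, np.atleast_2d() and np.newaxis, which are not part of np.array() itself but are commonly used for the same purpose.

```python
import numpy as np

data = [1, 2, 3]

# ndmin=2 prepends an axis, giving a single row.
row = np.array(data, ndmin=2)
print(row.shape)                  # (1, 3)
print(np.atleast_2d(data).shape)  # (1, 3), same result

# To get a column instead, add the new axis at the end.
column = np.array(data)[:, np.newaxis]
print(column.shape)               # (3, 1)

# ndmin has no effect when the input already has enough dimensions.
matrix = np.array([[1, 2], [3, 4]], ndmin=1)
print(matrix.shape)               # (2, 2)
```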

What is dimension?

The dimension of an array, also called its dimensionality or rank, is the number of indices (or subscripts) required to uniquely identify a single element within that array.

|Dimension|Analogy|Description|Example Access|
|---|---|---|---|
|1-D|A line or a list|Requires one index to locate an element.|array[i]|
|2-D|A table or a grid (like a matrix)|Requires two indices (row and column) to locate an element.|array[row][col] or array[i, j]|
|3-D|A cube or a stack of tables|Requires three indices (depth/layer, row, and column) to locate an element.|array[depth][row][col] or array[i, j, k]|
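
The table can be tried out directly. This small sketch builds one array of each dimensionality (the names line, table and cube are illustrative) and reads a single element from each:

```python
import numpy as np

line = np.arange(4)                    # 1-D, shape (4,)
table = np.arange(12).reshape(3, 4)    # 2-D, shape (3, 4)
cube = np.arange(24).reshape(2, 3, 4)  # 3-D, shape (2, 3, 4)

print(line.ndim, table.ndim, cube.ndim)  # 1 2 3

print(line[2])        # 2  (one index)
print(table[1, 2])    # 6  (row 1, column 2)
print(cube[1, 2, 3])  # 23 (layer 1, row 2, column 3)

# Chained indexing reaches the same element as the comma form.
print(table[1][2] == table[1, 2])  # True
```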

Top 10 numpy array attributes

  1. .dtype:
    • Description: The data type of the elements in the array (e.g., int32, float64, bool). All elements in a NumPy array must have the same type.

    Example:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print(arr.dtype) # Output: int64
```

  2. .shape:
    • Description: A tuple that indicates the size of the array along each dimension. This defines the array's geometry.
    • Example: For a 3 × 4 matrix, .shape is (3, 4).

    Example:

```python
import numpy as np

arr = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8]
])
print(arr.shape) # Output: (2, 4)
```

  3. .size:
    • Description: The total number of elements in the array. This is equal to the product of the elements in the .shape tuple.

    Example:

```python
import numpy as np

arr = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8]
])
print(arr.size) # Output: 8
```

  4. .ndim:
    • Description: The number of dimensions (or axes) of the array, which is also the length of the .shape tuple.

    Example:

```python
import numpy as np

arr = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8]
])
print(arr.ndim) # Output: 2
```

  5. .itemsize:
    • Description: The size in bytes of a single element in the array.
    • Example: If the .dtype is float64, .itemsize is 8 (since 64 bits = 8 bytes).

    Example:

```python
import numpy as np

arr = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8]
])
print(arr.itemsize) # Output: 8 (the elements are int64 here)
```

  6. .nbytes:
    • Description: The total bytes consumed by the array's data. It's calculated as .size * .itemsize.

    Example:

```python
import numpy as np

arr = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8]
])
print(arr.nbytes) # Output: 64 (8 elements * 8 bytes)
```

  7. .T:
  • Description: The transposed view of the array. For a 2D array, rows become columns and columns become rows. It returns a view, not a copy.

Example:

import numpy as np
arr=np.array([
    [1,2,3,4],
    [5,6,7,8]
])
print(arr.T)
# Output:
# [[1 5]
#  [2 6]
#  [3 7]
#  [4 8]]
  8. .data:
  • Description: An object representing the actual memory buffer containing the array's elements. You rarely interact with this directly.

Example:

import numpy as np
arr=np.array([
    [1,2,3,4],
    [5,6,7,8]
])
print(arr.data) # Output: <memory at 0x000002486C13A0C0> (the address differs on every run)
  9. .flat:
    • Description: An iterator that allows you to loop through the array as if it were a single 1D array, regardless of its original shape.

    Example:

```python
import numpy as np

# Create a 2D array
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Access elements using flat iterator
print("Iterating through flat:")
for element in arr_2d.flat:
    print(element)
```

Output:

```
Iterating through flat:
1
2
3
4
5
6
```
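
The attributes above are related to one another, and a few of them can be checked against each other. The following sketch (with the dtype fixed to int64 as an assumption, so the byte counts are predictable) verifies those relationships and shows that .T and .flat both work on the original data rather than on a copy:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8]], dtype=np.int64)

# Relationships between the size-related attributes.
print(arr.ndim == len(arr.shape))             # True
print(arr.size == np.prod(arr.shape))         # True
print(arr.nbytes == arr.size * arr.itemsize)  # True

# .T is a view: writing through it changes the original array.
transposed = arr.T
print(transposed.shape)                       # (4, 2)
print(np.shares_memory(arr, transposed))      # True
transposed[0, 1] = 50                         # same element as arr[1, 0]
print(arr[1, 0])                              # 50

# .flat also supports assignment by flat (row-major) position.
arr.flat[3] = 40                              # same element as arr[0, 3]
print(arr[0, 3])                              # 40
```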

Converting to Standard Python Data Structures

  1. .tolist(): Converts the array (including multi-dimensional arrays) into a standard nested Python list. This is the most common conversion for interoperability.

    Example:

```python
import numpy as np

# Create a 2D array
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6]])

lst = arr_2d.tolist()
print(lst)
print(type(lst))
```

Output:

```
[[1, 2, 3], [4, 5, 6]]
<class 'list'>
```
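
One detail worth knowing is that .tolist() also converts the individual elements to plain Python scalars, whereas the built-in list() only unpacks the first axis. A brief sketch of the difference:

```python
import numpy as np

arr = np.array([1, 2, 3])

via_tolist = arr.tolist()  # elements become Python ints
via_list = list(arr)       # elements stay NumPy scalars

print(type(via_tolist[0]))                  # <class 'int'>
print(isinstance(via_list[0], np.integer))  # True

# On a 2D array, list() yields a list of 1D arrays, not a nested list.
nested = np.array([[1, 2], [3, 4]])
print(nested.tolist())                         # [[1, 2], [3, 4]]
print(isinstance(list(nested)[0], np.ndarray)) # True
```

This matters when the result is passed to code that expects native types, such as a JSON serializer.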


Data types

  1. Numerical Types:
  • Integers:

    • Signed Integers: int8, int16, int32, int64. These can store both positive and negative whole numbers, with the number indicating the bit size (e.g., int8 stores values from -128 to 127).

    • Unsigned Integers: uint8, uint16, uint32, uint64. These store only non-negative whole numbers, offering a larger positive range for the same bit size (e.g., uint8 stores values from 0 to 255).

  • Floating-Point Numbers: 

    float16, float32, float64, float128. These represent real numbers with decimal points, with the number indicating the total bit width (more bits give more precision and range). float64 is the default. float128 is platform-dependent and is not available on every system.

  • Complex Numbers: 

    complex64, complex128. These store complex numbers, composed of a real and an imaginary part. The number is the total bit width: complex64 holds two float32 parts and complex128 holds two float64 parts.

  2. Other Data Types:
  • Boolean: bool_. Stores True or False values.

  • Timedelta: timedelta64. Represents differences in time.

  • Datetime: datetime64. Represents specific points in time.

  • Object: object. Stores arbitrary Python objects.

  • String: S (byte-string; the older alias a is deprecated in recent NumPy releases), U (Unicode). Stores sequences of characters.

  • Void: V. Represents raw data (useful for structured arrays).
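
The ranges and sizes listed above do not have to be memorized; NumPy can report them. A minimal sketch using np.iinfo(), np.finfo() and np.dtype():

```python
import numpy as np

# Integer ranges follow from the bit size.
for dtype in (np.int8, np.uint8, np.int16):
    info = np.iinfo(dtype)
    print(dtype.__name__, info.min, info.max)
# int8 -128 127
# uint8 0 255
# int16 -32768 32767

# Sizes in bytes: the number in the name is the total bit width.
print(np.dtype(np.float32).itemsize)     # 4
print(np.dtype(np.complex128).itemsize)  # 16 (two 8-byte floats)

# Floating-point precision: smallest step above 1.0 for each type.
print(np.finfo(np.float32).eps > np.finfo(np.float64).eps)  # True
```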

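How to convert a NumPy array to another data type?

Use the .astype() method, for example arr.astype("float"). It returns a new array with the requested data type and leaves the original array unchanged.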

Common types:

  • int32
  • float64
  • bool_
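
A short sketch of .astype() with the common types listed above. Note that converting floats to integers drops the fractional part (it truncates toward zero rather than rounding), and that any non-zero value becomes True when converting to bool_.

```python
import numpy as np

floats = np.array([1.7, -2.3, 0.0])

as_ints = floats.astype(np.int32)   # fractional part is dropped
as_bools = floats.astype(np.bool_)  # non-zero values become True

print(as_ints.tolist())   # [1, -2, 0]
print(as_bools.tolist())  # [True, True, False]

# The original array keeps its dtype and values.
print(floats.dtype)       # float64
print(floats.tolist())    # [1.7, -2.3, 0.0]
```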

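Basic array creation functions

  1. np.asarray(): Converts array-like input (a list, a tuple, a list of lists) into a NumPy array. If the input is already a NumPy array with a matching dtype and order, it is returned as-is without copying.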

Syntax:

numpy.asarray(a, dtype=None, order=None, *, like=None)

| Parameter | Type | Description |
|---|---|---|
| a | array_like | The input data you want to convert to an array. This can be a Python list, tuple, list of lists, or an existing NumPy array. This is the only required parameter. |
| dtype | data-type, optional | The desired data type for the elements of the new array. If not specified (None), the data type is inferred from the input data. Common examples: np.float64, np.int32, np.bool_. |
| order | {'C', 'F', 'A', 'K'}, optional | Specifies the memory layout of the array. 'C': row-major (C-style) order. 'F': column-major (Fortran-style) order. 'A': Fortran order if the input is Fortran-contiguous, otherwise C order. 'K': preserve the storage order of the input. When left as None, the layout of an existing array is kept, and a list or tuple produces a C-ordered array. |
| like | array_like, optional | Reference object to allow creation of arrays not stored in NumPy's standard memory (e.g., CuPy, if supported). If None, a standard NumPy array is created. (Used for array interoperability protocols.) |

Example:

import numpy as np

list1 = [1, 2, 3, 4, 5, 6, 7]
tuple1 = (8, 9, 10)

arr1 = np.asarray(list1)
arr2 = np.asarray(tuple1)

print(type(arr1)) # Output: <class 'numpy.ndarray'>
print(type(arr2)) # Output: <class 'numpy.ndarray'>
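
The practical difference between np.asarray() and np.array() shows up when the input is already a NumPy array. A minimal sketch, with illustrative variable names:

```python
import numpy as np

existing = np.array([1, 2, 3])

same = np.asarray(existing)  # no copy: returns the input array itself
fresh = np.array(existing)   # copies by default

print(same is existing)   # True
print(fresh is existing)  # False

# Writing through the np.asarray() result changes the original.
same[0] = 100
print(existing.tolist())  # [100, 2, 3]
print(fresh.tolist())     # [1, 2, 3]
```

This makes np.asarray() a reasonable choice at the top of a function that accepts either lists or arrays, as long as the function does not modify its input in place.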
  2. np.arange(): Creates an array of evenly spaced values within a specified interval. The interval includes start and excludes stop. It is analogous to Python's built-in range() function but returns a NumPy array instead of a range object, and it also accepts non-integer steps.

Syntax:

numpy.arange([start, ]stop, [step, ]dtype=None)

start defaults to 0 and step defaults to 1; only stop is required.

Example:

import numpy as np

arr1 = np.arange(100)
print(arr1)
# Output:
# [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
#  24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
#  48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
#  72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
#  96 97 98 99]
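
The start and step arguments work as they do for range(). The sketch below shows a few combinations, plus np.linspace() as an alternative (not covered above) for cases where the end point should be included or the number of points matters more than the step size.

```python
import numpy as np

print(np.arange(5).tolist())          # [0, 1, 2, 3, 4]
print(np.arange(2, 10, 3).tolist())   # [2, 5, 8]
print(np.arange(10, 0, -2).tolist())  # [10, 8, 6, 4, 2]

# Unlike range(), the step may be a float. The stop value is still excluded.
quarters = np.arange(0, 1, 0.25)
print(quarters.tolist())              # [0.0, 0.25, 0.5, 0.75]

# np.linspace() takes the number of points and includes the end point.
points = np.linspace(0, 1, 5)
print(points.tolist())                # [0.0, 0.25, 0.5, 0.75, 1.0]
```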
