Keras Cheatsheet — The one stop shop

Anirban Sen
11 min read · Dec 1, 2023
Photo by Alexander Grey on Unsplash

Keras is usually the first framework one learns when getting into Deep Learning. It was mine too. But I tend to forget the syntax very quickly if I stop working with it even for a month. (Probably because the ML/DL space has so many things to remember… 🙈). So I thought of creating this cheatsheet, which I can come back to every once in a while. It might help both professionals who have gotten a bit rusty and newbies who want a quick walkthrough of the framework. I am not going to dive deep into theoretical concepts, as that is not the purpose of this blog.

Some sources that I went through and recommend are —

a. Krish Naik’s DL Playlist

b. Andrew Ng’s Deep Learning Specialization course

Once you are done with those, you will want the code — which is where this blog comes in. I will cover the following —
1. Creating a Simple Artificial Neural Network
2. Creating a Convolutional Neural Network
3. A few important layers (apart from the ones already used)
4. A few common Loss Functions and metrics
5. Using Pretrained model
6. Data Loading
7. Callbacks
8. Custom layer and model
9. Using Multi-GPU for training

1. Creating a Simple Artificial Neural Network

1.a Preparing the Dataset

#importing libraries
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import make_classification

#using sklearn's make_classification for creating random datasets for ANN
train_samples, train_labels = make_classification(n_samples=1050, n_features=1,
                                                  n_informative=1, n_redundant=0,
                                                  n_repeated=0, n_classes=2,
                                                  n_clusters_per_class=1)
#using sklearn's minmaxscaler to normalize the datasets
scaler = MinMaxScaler()
scaled_train_samples = scaler.fit_transform(train_samples)
1.b Actual Model Training

#importing libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential, load_model
#Dense implements the operation:output = activation(dot(input, kernel) + bias)
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
# Computes the crossentropy loss between the labels and predictions.
from tensorflow.keras.metrics import categorical_crossentropy

There are two ways to build a model in Keras — the Sequential API and the Functional API. For the ANN, we will look at the Sequential API. It allows you to create models layer-by-layer by stacking them. It is limited in that it does not allow you to create models that share layers or have multiple inputs or outputs. We will discuss other options for layers, optimizers and metrics later.

#Creating model structure using Sequential API - no separate Input layer needed
model = Sequential([
    Dense(units=16, input_shape=(1,), activation='relu'),
    Dense(units=32, activation='relu'),
    Dense(units=2, activation='softmax')
])
#Shows layers of model along with num of parameters
model.summary()
#Compile defines the loss function, the optimizer and the metrics.
#You need a compiled model to train but not for predicting.
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=["accuracy"])
#Train the model, which splits off a validation set and shuffles the train set
# default epochs = 1
history = model.fit(x=scaled_train_samples, y=train_labels, batch_size=16,
                    epochs=10, shuffle=True, verbose=2, validation_split=0.1)
#Predict using the model
preds = model.predict(x=scaled_train_samples, batch_size=64, verbose=0)
#saves architecture, weights, configs(loss, optim)
model.save('models.h5')
#to load the saved model
new_model = load_model('models.h5')
#model.to_json() - only saves architecture
#model.save_weights() - only saves weights

# evaluate the model on a dataset it has not seen yet
model.evaluate(X_test, y_test)

We can also set class_weight or sample_weight as parameters in model.fit(). The method returns an object containing the training parameters (history.params), the list of epochs it went through (history.epoch), and most importantly a dictionary (history.history) containing the loss and extra metrics measured at the end of each epoch on the training set and on the validation set (if any). We can plot it as shown below —

import pandas as pd
import matplotlib.pyplot as plt

pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.gca().set_ylim(0, 1)
plt.show()

2. Creating a Convolutional Neural Network

#import layers used in a CNN - Conv, Pool, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Input
#CNN using Functional API - an Input layer is required, unlike Sequential
inputs = Input(shape=(28, 28, 1), name="img")
x = Conv2D(filters=32, kernel_size=3, activation="relu")(inputs)
x = MaxPooling2D(pool_size=2)(x)
x = Conv2D(64, 3, activation="relu")(x)
x = MaxPooling2D(2)(x)
#Flatten the input. Typically used after the Conv-Pool layers before Dense
x = Flatten()(x)
output = Dense(3, activation='softmax')(x)
model = keras.Model(inputs, output, name="functional")
#Code for compiling, fitting, prediction and saving is the same for all models

The Functional API provides more flexibility: you can easily define models where layers connect to more than just the previous and next layers, and you can connect a layer to any other layer. As a result, you can create complex networks such as residual networks (ResNets).
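As an example, here is a minimal sketch of a skip connection built with the Functional API — an illustrative toy (my own example, not a full ResNet), using the Add layer to merge a branch back with its input:

from tensorflow.keras.layers import Add

inputs = Input(shape=(28, 28, 1))
x = Conv2D(32, 3, padding="same", activation="relu")(inputs)
# keep a reference to the branch input
residual = x
x = Conv2D(32, 3, padding="same", activation="relu")(x)
x = Conv2D(32, 3, padding="same")(x)
# skip connection - add the branch output back to its input
x = Add()([x, residual])
x = Flatten()(x)
outputs = Dense(3, activation="softmax")(x)
res_model = keras.Model(inputs, outputs, name="toy_residual")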

3. A few important layers (apart from the ones already used)

1. Embedding(1000, 128, input_length = 10) — Turns positive integers (indexes) into dense vectors of fixed size. e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]] where [4] is embedded to [0.25, 0.1]. Typically used with LSTMs: models only understand numbers, but raw inputs (like word indices) are not meaningful numerical representations. This layer learns along with the other layers and produces meaningful representations (128-dimensional embeddings) for each element of the input sequence.
2. Dot(axes=1)([x1, x2]) — Layer that computes a dot product between samples in two tensors. Each entry i will be the dot product between a[i] and b[i]. Typically used between embeddings to get similarity (the dot product is a faster/vectorized version of cosine similarity when the embeddings are unit-normalized).
3. Concatenate()([x1, x2]) — Layer that concatenates a list of inputs. Suppose we have to concatenate two inputs — image embeddings (128) and text embeddings (128). This concatenates the embeddings side by side and gives a 256-dimensional embedding.
4. BatchNormalization() — Batch normalization applies a transformation that keeps the mean output close to 0 and the output standard deviation close to 1. Helps with fast convergence and regularization.
5. Dropout(rate = .2, input_shape=(2,)) — The Dropout layer randomly sets input units to 0 with a frequency of rate. Helps with regularization.
6. LSTM(128, return_sequences=True) — Based on Long Short-Term Memory, explained here. Typically used after the Embedding layer (see the sketch after this list). If multiple LSTM layers are stacked, use return_sequences=True.
7. MultiHeadAttention(num_heads=2, key_dim=2) — Implementation of multi-headed attention as described in the paper “Attention Is All You Need”, explained here. In some cases you will not be able to use BERT directly, like when you want to do a sequence classification task on a non-word sequence (such as a sequence of product_ids); then you might have to build a lightweight model of your own using this layer. example
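To show how a couple of these fit together, here is a minimal sketch (the vocabulary size, dimensions and number of classes are my own illustrative choices) of an Embedding layer feeding stacked LSTMs for sequence classification:

from tensorflow.keras.layers import Embedding, LSTM

seq_model = Sequential([
    # 1000-word vocabulary, 128-dim embeddings, sequences of length 10
    Embedding(input_dim=1000, output_dim=128, input_length=10),
    # first LSTM returns the full sequence so the next LSTM can consume it
    LSTM(128, return_sequences=True),
    LSTM(64),
    Dense(2, activation="softmax")
])
seq_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])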

4. A few common Loss Functions and metrics

A loss function is used during training to optimize the model; it is not a judge of overall performance. An evaluation metric is used after training to measure overall performance. In Keras, both are passed to the compile function shown above. Knowing the metrics and losses mentioned below should be sufficient (as per my knowledge).

metrics
losses
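As a quick illustration (my own examples, not from the tables above), both the loss and the metrics are simply passed as strings or objects to compile; which loss you pick depends on the task and on how the labels are encoded:

# binary classification (labels are 0/1, single sigmoid output)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC()])

# multi-class classification with integer labels
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# multi-class classification with one-hot labels
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# regression
model.compile(optimizer="adam", loss="mse", metrics=["mae"])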

5. Using Pretrained model

A pretrained model is commonly used to save on training cost/model development time, as getting good accuracy from scratch is relatively hard and time-consuming. Pretrained models like CLIP, BERT, ResNet, and even LLMs are trained on huge amounts of data. Keras has a large library of pretrained models which can be used directly without any hassle. These generally also come with a preprocess_input function which transforms the input into what the model requires (such as the expected shape of the input image). While loading weights, include_top=False removes the final classification (Dense) layers so that a layer with the required number of output classes can be added (which is usually done when we fine-tune a pretrained model).

# Pretrained models are generally present in keras.applications
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.layers import GlobalAveragePooling2D
# load the model weights trained on imagenet, without the classifier head
vgg_model = VGG16(weights="imagenet", include_top=False)
# freeze the pretrained weights so only the new head is trained
vgg_model.trainable = False
# Functional API
inputs = keras.Input(shape=(224, 224, 3))
x = vgg_model(inputs, training=False)
# pool the conv feature maps into a single vector per image
x = GlobalAveragePooling2D()(x)
# Adding a layer at the end for binary classification (1 neuron)
outputs = Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
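The preprocess_input function mentioned above is applied to raw images before they reach the network. A small sketch (the image batch here is random, just to show the shapes):

import numpy as np

# a fake batch of 8 RGB images in the 0-255 range (stand-in for real data)
raw_images = np.random.randint(0, 256, size=(8, 224, 224, 3)).astype("float32")
# VGG16-specific preprocessing (channel-wise mean subtraction, BGR ordering)
x_batch = preprocess_input(raw_images)
# the frozen base + new head can now be compiled and used as usual
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
preds = model.predict(x_batch)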

6. Data Loading

This is probably one of the most important parts. Data can be present in many different forms, and we have to load it in a way that we can feed it to model.fit(). One of the easy cases is when the data/images are organized in a proper folder structure, something like the one shown below — then we can easily load the data using the Keras util image_dataset_from_directory and use it in model.fit() almost directly. (This is almost never the case in real-life problems.)

folder structure
# loading keras utils
from tensorflow.keras.utils import image_dataset_from_directory
# this creates a dataset from the mentioned folder, with the classes as
# subfolders, a batch_size of 32 and the given image_size
train_ds = image_dataset_from_directory(
    directory='train/',
    labels='inferred',
    label_mode='categorical',
    batch_size=32,
    image_size=(256, 256))
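The resulting tf.data.Dataset can then be passed straight to fit. A small sketch (assuming a model whose output matches the number of class subfolders and the categorical label mode):

# optional performance tweak: prefetch batches while the GPU is busy
train_ds = train_ds.prefetch(tf.data.AUTOTUNE)
model.fit(train_ds, epochs=10)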

Most of the time, though, we will have to create our own data generator. Usually all our images (image1.jpeg, image2.jpeg, …) are in one folder (say images/) and we have a metadata dataframe that maps each image to its label and split. We convert it into dictionaries which can be used as shown below (we will look at the custom DataGenerator itself after that).

import numpy as np
from tensorflow.keras.models import Sequential
from image_datagen import DataGenerator

# Datasets dictionary for each split
> partition
{'train': ['id-1', 'id-2', 'id-3'], 'val': ['id-4']}
# Labels dictionary
> labels
{'id-1': 0, 'id-2': 1, 'id-3': 2, 'id-4': 1}
# additional params
params = {'dim': (224, 224, 3),
          'batch_size': 64,
          'n_classes': 6,
          'shuffle': True}
# pass partition and label dictionaries with additional params
train_generator = DataGenerator(partition['train'], labels, **params)
val_generator = DataGenerator(partition['val'], labels, **params)
# Design model
model = Sequential()
[...] # Architecture
model.compile(loss="categorical_crossentropy", optimizer="adam")
# Train model on dataset
model.fit(train_generator, validation_data=val_generator,
          use_multiprocessing=True, workers=6)

Now let’s get to the main crux, i.e. the DataGenerator class (put in image_datagen.py for this example) — it inherits Keras’ Sequence class (the base object for fitting to a sequence of data, such as a dataset). Every Sequence must implement the __getitem__ and __len__ methods. __getitem__ should return a complete batch given the batch number, e.g. __getitem__(0) returns the first batch of (X, y) (with y one-hot encoded), and so on. If you want to modify your dataset between epochs, you may implement on_epoch_end, which in this case shuffles the indexes of the list that is used to build each batch. Suppose initially the indexes for elements [A, B, C, D] are [0, 1, 2, 3], which after shuffling become [2, 1, 0, 3], and we have a batch size of 2. The first batch will then be [C, B] ([2, 1] being the first batch of indexes, and list[2] = C).

import numpy as np
from tensorflow import keras
from PIL import Image

# Creating DataGenerator class inheriting the Sequence class.
class DataGenerator(keras.utils.Sequence):
    def __init__(self, list_IDs, labels, batch_size=32, dim=(224, 224, 3),
                 n_classes=10, shuffle=False):
        # Constructor - initializes all hyperparameters
        self.dim = dim
        self.batch_size = batch_size
        self.labels = labels
        self.list_IDs = list_IDs
        self.n_classes = n_classes
        self.shuffle = shuffle
        self.on_epoch_end()

    def on_epoch_end(self):
        # creates indexes for each item in list_IDs and shuffles them if shuffle=True
        self.indexes = np.arange(len(self.list_IDs))
        if self.shuffle == True:
            np.random.shuffle(self.indexes)

    def __len__(self):
        # Denotes the number of batches per epoch
        return int(np.ceil(len(self.list_IDs) / self.batch_size))

    def __getitem__(self, index):
        # Generate one batch of data
        # Generate indexes of the batch given the batch index (0, 1, 2, ...)
        indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        # Get list of IDs based on the (possibly shuffled) indexes
        list_IDs_temp = [self.list_IDs[k] for k in indexes]
        # Call __data_generation helper function to fetch the X, y
        X, y = self.__data_generation(list_IDs_temp)
        return X, y

    def __data_generation(self, list_IDs_temp):
        # Helper function to generate data containing up to batch_size samples
        # X : (batch_size, *dim)
        # Create empty arrays (the last batch may be smaller than batch_size)
        X = np.empty((len(list_IDs_temp), *self.dim))
        y = np.empty((len(list_IDs_temp),), dtype=int)
        # Iterate over the list to load data
        for i, ID in enumerate(list_IDs_temp):
            # load image and resize it to the expected spatial dimensions
            im = Image.open('data/' + ID + '.jpeg').convert("RGB")
            im = im.resize(self.dim[0:2])
            X[i, ] = np.array(im)
            # load the label given the ID from the labels dict
            y[i] = self.labels[ID]
        # return X array and expand y to the number of classes as one-hot
        # e.g. for 3 total classes and class 0, [1, 0, 0] will be the output
        return X, keras.utils.to_categorical(y, num_classes=self.n_classes)

7. Callbacks

A callback is an object that can perform actions at various stages of training (e.g. at the start or end of an epoch, before or after a single batch, etc.). This is very important, for example, when we want to save the model and log the performance at each epoch so that we don’t lose the progress made in case of some issue, or when we want to change the learning rate after a few epochs (which helps with faster convergence). Following are a few of the most important callbacks and how we add them to model.fit().

# function for LR scheduler callback to update LR
def scheduler(epoch, lr):
    # reduce LR when epoch >= 10 by a factor of exp(-0.1)
    if epoch < 10:
        return lr
    else:
        return lr * tf.math.exp(-0.1)

# list of all callbacks
my_callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=2),
    tf.keras.callbacks.ModelCheckpoint(filepath='model.{epoch:02d}-{val_loss:.2f}.h5',
                                       monitor="val_loss", save_best_only=True),
    tf.keras.callbacks.CSVLogger("training_log.csv", separator=",", append=False),
    tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                                         patience=5, min_lr=0.001),
    tf.keras.callbacks.LearningRateScheduler(scheduler)
    # we won't use this and ReduceLROnPlateau together
]
# assign the list to the callbacks parameter
model.fit(dataset, epochs=10, callbacks=my_callbacks)
callbacks

8. Custom layer and model

Sometimes you will need an operation that cannot be fulfilled by any of the existing layers; in that case we create a custom layer that performs the required computation and add it to the model. This is done by creating a class (which inherits from the Layer class) with three methods — __init__(), build and call. Here we rebuild the Dense layer just to show an example.

# present in layers
from tensorflow.keras.layers import Layer

# Create class Linear - inherit Layer
class Linear(Layer):
    # Constructor - Defines the layer's hyperparameters - units
    def __init__(self, units=32):
        super().__init__()
        self.units = units

    # build dynamically allocates weights and biases based on input shape
    # when the layer is first used
    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="random_normal", trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer="random_normal", trainable=True)

    # forward pass - Takes the input data and performs the core computations
    # of the layer. Returns the transformed output data.
    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
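To sanity-check the layer, here is a small usage sketch (the data is random, purely illustrative); the weights are created lazily the first time the layer sees an input:

linear = Linear(units=4)
# build() runs on the first call, creating w of shape (8, 4) and b of shape (4,)
dummy = tf.random.normal((2, 8))
out = linear(dummy)
print(out.shape)  # (2, 4)
# the custom layer can also be dropped into a Sequential or Functional model
stacked = Sequential([Linear(16), Linear(2)])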

When we want to go beyond the standard Sequential or Functional API constructs, we create a custom model, which I feel is a more OOP way of doing things and is cleaner. This is done by creating a class (which inherits from the Model class) with two methods — __init__() and call. Here we build a simple ANN with two Dense layers just to show an example.

from tensorflow.keras import Model

# Create class CustomModel - inherit Model
class CustomModel(Model):
    # Constructor - Define model hyperparameters like input/output dimensions,
    # number of layers, activation functions
    def __init__(self, num_classes=1000):
        super().__init__()
        self.block_1 = Dense(units=32)
        self.classifier = Dense(num_classes)

    # forward pass - Takes the input data as argument. Performs the
    # computations defined by the model's architecture. Applies the operations
    # of each layer sequentially on the input. Returns the model's output.
    def call(self, inputs):
        x = self.block_1(inputs)
        return self.classifier(x)

model = CustomModel()
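A subclassed model compiles and trains exactly like the earlier ones. A minimal sketch (random data and 10 classes, purely illustrative):

toy_model = CustomModel(num_classes=10)
# the classifier Dense layer has no softmax, so we use a from_logits loss
toy_model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=["accuracy"])
x_toy = np.random.rand(64, 20).astype("float32")
y_toy = np.random.randint(0, 10, size=(64,))
toy_model.fit(x_toy, y_toy, epochs=2, batch_size=16, verbose=0)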

9. Using Multi-GPU for training

Now, sometimes working on one GPU can be slow and you might want to use multiple GPUs. How does parallelization work in Keras so that we don’t have to take care of moving data around ourselves? It involves two steps — 1. defining the distribution strategy, and 2. creating the model, instantiating the metrics and compiling the model inside the strategy’s scope (model.fit() can remain outside the scope). Voilà, your model will train much faster using multiple GPUs.

#instantiate a "distribution strategy" object for using multiple GPUs
# if we want to use all gpus, leave devices at its default
mirrored_strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.ReductionToOneDevice(),
    devices=["/gpu:0", "/gpu:1"]
)

with mirrored_strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(1, input_shape=(1,),
                              kernel_regularizer=tf.keras.regularizers.L2(1e-4))])
    model.compile(loss='mse', optimizer='sgd')
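model.fit() itself can stay outside the scope, as mentioned above. A small sketch with synthetic data (just to complete the flow):

x_toy = np.random.rand(256, 1).astype("float32")
y_toy = 3 * x_toy + 0.5
# fit distributes the batches across the GPUs declared in the strategy
model.fit(x_toy, y_toy, epochs=2, batch_size=32)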

If you learnt something from the blog or have some feedback, do clap and comment, as it took considerable effort to create 😊 🙏
