Chapter04 Distributed Computing

If you find this content useful, consider buying this book:

  • Take the duke/coursera specialization
  • Chapter 4: Challenges and Opportunities in Distributed Computing #

    One of the critical tenants of life is tradeoffs. To gain one thing, you often lose another. With Distributed Computing, there is much to both gain and lose. Let’s talk through some of the keys concepts.

    Learn what Debugging Python Code is in the following screencast.

    Debugging Python Code

    Video Link: https://www.youtube.com/watch?v=BBCdU_3AI1E

    Eventual Consistency #

    The foundational infrastructure of the Cloud builds upon the concept of eventual consistency. This theory means that a relational database’s default behavior, for example, of strong consistency, is not needed in many Cloud Scale applications. For instance, if two users worldwide see a slightly different piece of text on a social media post and then 10 seconds later, a few more sentences change, and they become in sync, it is irrelevant to most humans viewing the post. Likewise, the social media company couldn’t care in the least in designing the application.

    An excellent example of a product built around the Cloud Scale concept is Amazon DynamoDB. This database scales automatically, is key value-based, and is “eventually consistent.”

    CAP Theorem #

    There is a tradeoff between consistency, availability, and fault-tolerance according to the CAP Theorem. This theorem speaks to the statement you cannot gain one thing without losing another in Distributed Computing.

    cap-theorem

    Learn what the CAP Theorem is in the following screencast.

    CAP Theorem

    Video Link: https://www.youtube.com/watch?v=hW6PL68yuB0

    Amdahl’s Law #

    Likewise, there are diminishing returns with parallelization. Just because more cores exist, it doesn’t mean they will be useful in all problems. Further, diminished efficiency occurs because most programs are not 100% parallel.

    Learn what Amdahl’s Law is in the following screencast.

    Amdahl’s Law

    Video Link: https://www.youtube.com/watch?v=8pzBLcyTcR4

    For each portion of the program that is not entirely parallel, the diminishing returns increase. Finally, some languages have critical flaws because they existed before multi-cores architectures were common. A great example of this is the Python language and the GIL (Global Interpreter Lock).

    Learn about Concurrency in Python in the following screencast.

    Concurrency in Python

    Video Link: https://www.youtube.com/watch?v=22BDJ4TIA5M

    Elasticity #

    Elasticity means the ability to adapt to a varying workload. In the world of Cloud Computing, this means you should only pay for what you use. It also means you can respond both to increased demand without failure, and you can scale down to reduce costs when the need diminishes.

    elasticity

    Learn what Elasticity is in the following screencast.

    Elasticity

    Video Link: https://www.youtube.com/watch?v=HA2bh0oDMwo

    Highly Available #

    Does the system always respond with a healthy response? An excellent example of a high availability system is Amazon S3. To build a HA (Highly Available) system, the Cloud provides the ability to architect against load balancers and regions.

    highly-available

    Learn what Highly Available is in the following screencast.

    Highly Available

    Video Link: https://www.youtube.com/watch?v=UmAnxEOJSpM

    End of Moore’s Law #

    Perhaps the best source to describe what is happening with chips is Dr. David Patterson, a professor at UC Berkeley and an architect for the TPU (Tensorflow Processing Unit).

    Screen Shot 2020-02-18 at 11 49 01 AM

    The high-level version is that CPU clock speed is effectively dead. This “problem” opens up an opportunity for new solutions. These solutions involve both cloud computing, and also specialized chips called ASICs.

    Learn about Moore’s Law in the following screencast.

    Moore’s Law

    Video Link: https://www.youtube.com/watch?v=adPvJpoj_tU

    ASICS: GPUs, TPUs, FPGA #

    Learn what an ASIC is in the following screencast.

    Moore’s Law

    Video Link: https://www.youtube.com/watch?v=SXLQ6ipVtxQ

    ASIC vs CPU vs GPU #

    TPU in Production TPU Production

    CPU How CPUs work

    GPU GPUs work

    TPU TPUs work

    Sources: https://storage.googleapis.com/nexttpu/index.html

    Using GPUs and JIT #

    Learn how to do GPU programming is in the following screencast.

    GPU programming

    Video Link: https://www.youtube.com/watch?v=3dHJ00mAQAY

    One of the easiest ways to use a Just in Time compiler (JIT) or a GPU is to use a library like numba and a hosted runtime like Google Colab.

    Learn what Colab Pro is in the following screencast.

    Colab Pro

    Video Link: https://www.youtube.com/watch?v=W8RcIP2-h7c

    There is a step by step example of how to use these operations in the following notebook https://github.com/noahgift/cloud-data-analysis-at-scale/blob/master/GPU_Programming.ipynb. The main high-level takeaway is that a GPU runtime must exist. This example is available via Google Colab, but it could also live on a server somewhere or your workstation if it has an NVidia GPU.

    Screen Shot 2020-02-18 at 12 44 56 PM

    Next up, you can install numba if not installed and double-check the CUDA .so libraries are available.

    !pip install numba
    !find / -iname 'libdevice'
    !find / -iname 'libnvvm.so'
    

    You should see something like this.

    /usr/local/cuda-10.0/nvvm/libdevice
    /usr/local/cuda-10.1/nvvm/libdevice
    /usr/local/cuda-10.0/nvvm/lib64/libnvvm.so
    /usr/local/cuda-10.1/nvvm/lib64/libnvvm.so
    

    GPU Workflow #

    A key concept in the GPU program is the following steps to code to the GPU from scratch.

    1. Create a Vectorized Function
    2. Move calculations to GPU memory
    3. Calculate on the GPU
    4. Move the Calculations back to the host

    Here is a contrived example below that is available in this Google Colab Notebook.

    from numba import (cuda, vectorize)
    import pandas as pd
    import numpy as np
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.cluster import KMeans
    
    from functools import wraps
    from time import time
    
    def real_estate_df():
        """30 Years of Housing Prices"""
    
        df = pd.read_csv("https://raw.githubusercontent.com/noahgift/real_estate_ml/master/data/Zip_Zhvi_SingleFamilyResidence.csv")
        df.rename(columns={"RegionName":"ZipCode"}, inplace=True)
        df["ZipCode"]=df["ZipCode"].map(lambda x: "{:.0f}".format(x))
        df["RegionID"]=df["RegionID"].map(lambda x: "{:.0f}".format(x))
        return df
    
    def numerical_real_estate_array(df):
        """Converts df to numpy numerical array"""
    
        columns_to_drop = ['RegionID', 'ZipCode', 'City', 'State', 'Metro', 'CountyName']
        df_numerical = df.dropna()
        df_numerical = df_numerical.drop(columns_to_drop, axis=1)
        return df_numerical.values
    
    def real_estate_array():
        """Returns Real Estate Array"""
    
        df = real_estate_df()
        rea = numerical_real_estate_array(df)
        return np.float32(rea)
    
    
    @vectorize(['float32(float32, float32)'], target='cuda')
    def add_ufunc(x, y):
        return x + y
    
    def cuda_operation():
        """Performs Vectorized Operations on GPU"""
    
        x = real_estate_array()
        y = real_estate_array()
    
        print("Moving calculations to GPU memory")
        x_device = cuda.to_device(x)
        y_device = cuda.to_device(y)
        out_device = cuda.device_array(
            shape=(x_device.shape[0],x_device.shape[1]), dtype=np.float32)
        print(x_device)
        print(x_device.shape)
        print(x_device.dtype)
    
        print("Calculating on GPU")
        add_ufunc(x_device,y_device, out=out_device)
    
        out_host = out_device.copy_to_host()
        print(f"Calculations from GPU {out_host}")
    
    cuda_operation()
    

    Exercise-GPU-Coding #

    • Topic: Go through colab example here
    • Estimated time: 20-30 minutes
    • People: Individual or Final Project Team
    • Slack Channel: #noisy-exercise-chatter
    • Directions:
      • Part A: Get code working in colab
      • Part B: Make your GPU or JIT code to speed up a project. Share in slack or create a technical blog post about it.

    Summary #

    This chapter covers the theory behind many Cloud Computing fundamentals, including Distributed Computing, CAP Theorem, Eventual Consistency, Amdahl’s Law, and Highly Available. It concludes with a hands-on example of doing GPU programming with CUDA in Python.