<div align="center" id="top">
<img src="./.github/app.gif" alt="Mlops_aws" />
&#xa0;
<!-- <a href="https://mlops_aws.netlify.app">Demo</a> -->
</div>
<h1 align="center">Mlops_aws</h1>
<!-- Status -->
<!-- <h4 align="center">
🚧 Mlops_aws 🚀 Under construction... 🚧
</h4>
<hr> -->
<br>
## AWS MLOps
Repository with the scripts needed to implement an MLOps architecture.
This implementation adds training and deployment capabilities to your ML code. It lets you change the code and the data while running tests to validate them.
## Prerequisites
### Services
Basic experience with the following is required:
- Training/testing an ML model
- Python ([scikit-learn](https://scikit-learn.org/stable/#))
- [Jupyter Notebook](https://jupyter.org/)
- [AWS CodePipeline](https://aws.amazon.com/codepipeline/)
- [AWS CodeCommit](https://aws.amazon.com/codecommit/)
- [AWS CodeBuild](https://aws.amazon.com/codebuild/)
- [Amazon ECR](https://aws.amazon.com/ecr/)
- [Amazon SageMaker](https://aws.amazon.com/sagemaker/)
- [AWS CloudFormation](https://aws.amazon.com/cloudformation/)
- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html)
### AWS Account
An AWS account is required; check whether the services are available in the free tier.
## Part 1: Building the Docker Image
### Architecture
![Build Docker Image](resources/imgs/MLOps_BuildImage.jpg)
* An ML developer creates the assets for the Docker image (a RandomForest model) and pushes them to CodeCommit
* CodePipeline listens for the CodeCommit push event, fetches the source code and launches CodeBuild
* CodeBuild authenticates against ECR, builds the Docker image and pushes it to the ECR repository (a hypothetical buildspec sketch follows this list)
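CodeBuild runs these steps from a buildspec.yml at the repository root (the build-image template below points at it). That file is not shown in this commit, so the following is only a plausible sketch, assuming the environment variables the CodeBuild project defines (AWS_ACCOUNT_ID, AWS_DEFAULT_REGION, IMAGE_REPO_NAME, IMAGE_TAG) and awscli v1 syntax for the ECR login, which matches the aws/codebuild/docker:17.09.0 build image:

```yaml
# Hypothetical buildspec.yml -- not part of this commit; adjust to your image layout.
version: 0.2
phases:
  pre_build:
    commands:
      # Authenticate the Docker client against ECR (awscli v1 syntax)
      - $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION)
  build:
    commands:
      - docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG .
      - docker tag $IMAGE_REPO_NAME:$IMAGE_TAG $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
  post_build:
    commands:
      # Push the freshly built image to the ECR repository
      - docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
```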
## Part 2: Training and Deployment
### Architecture
![Train, Deploy and Test Model](resources/imgs/MLOps_Train_Deploy_TestModel.png)
* The flow starts when a zip with the assets (deployment configuration) plus the datasets is uploaded (an upload sketch follows this list)
* CodePipeline listens for this event and runs the request Lambda to verify the file in the bucket and start the training
* The Lambda submits training jobs to SageMaker
* When training finishes, CodePipeline checks its status
* CodePipeline calls CloudFormation to deploy to a DEV environment
* It then waits for a manual approval
* CodePipeline calls CloudFormation to deploy to a PROD environment
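Concretely, the pipeline created for each model (see mlops_pipeline.yml below) watches the S3 key training_jobs/<ModelNamePrefix>/trainingjob.zip in the versioned MLOps bucket, so uploading a new version of that object is what starts the flow. A minimal sketch for the iris-model pipeline, assuming default credentials and a locally packaged trainingjob.zip:

```python
import boto3

# Hypothetical kick-off: upload the packaged job file to the key the
# pipeline's S3 source action watches (see mlops_pipeline.yml below).
region = boto3.session.Session().region_name
account_id = boto3.client("sts").get_caller_identity()["Account"]
bucket = f"mlops-{region}-{account_id}"           # bucket created by m.yml
key = "training_jobs/iris-model/trainingjob.zip"  # ModelNamePrefix = iris-model

s3 = boto3.client("s3")
s3.upload_file("trainingjob.zip", bucket, key)    # a new version triggers CodePipeline
```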
## Instructions
Option 1: Run CloudFormation using the template located at code/yml/m.yml (a minimal boto3 sketch follows the launch table below)
Option 2: Run CloudFormation using the same template hosted in a public S3 bucket
Region| Launch
------|-----
US East (N. Virginia) | [![Launch MLOps solution in us-east-1](resources/imgs/cloudformation-launch-stack.png)](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?stackName=AIWorkshop&templateURL=https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/m.yml)
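For Option 1, a minimal boto3 sketch of the same launch. The stack name mirrors the Option 2 button, and the two parameters are the ones m.yml declares; the subnet and security-group values are placeholders you must replace:

```python
import boto3

# Hypothetical launch of the main template; equivalent to Option 2's button.
cfn = boto3.client("cloudformation")
with open("code/yml/m.yml") as f:
    template_body = f.read()

cfn.create_stack(
    StackName="AIWorkshop",
    TemplateBody=template_body,
    Parameters=[
        # Both parameters are declared in m.yml; values here are placeholders.
        {"ParameterKey": "NotebookInstanceSubNetId", "ParameterValue": "subnet-xxxxxxxx"},
        {"ParameterKey": "NotebookInstanceSecGroupId", "ParameterValue": "sg-xxxxxxxx"},
    ],
    Capabilities=["CAPABILITY_NAMED_IAM"],  # the stack creates the named MLOps role
)
```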
### Contents - Code
The code/lambdas folder contains the Python files shown in the diagram.
- mlops-env-setup: Gets environment variables from CodeCommit to set up the environment with the credentials.
- mlops-op-deployment: Contains the code that automatically deploys a model trained by SageMaker.
- mlops-op-process-request: Verifies that the trainingjob.zip file stored in the bucket contains the files required for training and deployment.
- mlops-op-training: Extracts the data from the training file and sends it to SageMaker to run training jobs (a sketch of the expected zip layout follows this list).
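As the mlops-op-training code below shows, trainingjob.zip must contain a trainingjob.json, whose contents are passed almost verbatim to SageMaker's create_training_job (the Lambda injects TrainingJobName itself), and a deployment.json. A hedged packaging sketch; every value is a placeholder, and the deployment.json schema is defined in mlops_op_deploy.zip, which this commit does not inline:

```python
import json
import zipfile

# Hypothetical contents; field names follow sagemaker.create_training_job.
training_params = {
    "AlgorithmSpecification": {
        "TrainingImage": "<account>.dkr.ecr.<region>.amazonaws.com/iris-model:latest",
        "TrainingInputMode": "File",
    },
    "RoleArn": "arn:aws:iam::<account>:role/MLOps",
    "InputDataConfig": [{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://<bucket>/iris-model/input/train",
        }},
    }],
    "OutputDataConfig": {"S3OutputPath": "s3://<bucket>/iris-model/output"},
    "ResourceConfig": {"InstanceType": "ml.m4.xlarge", "InstanceCount": 1,
                       "VolumeSizeInGB": 30},
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}
deployment_params = {"EndpointInstanceType": "ml.m4.xlarge"}  # illustrative only

# Package both descriptors; the datasets referenced by InputDataConfig are
# uploaded to S3 separately.
with zipfile.ZipFile("trainingjob.zip", "w") as z:
    z.writestr("trainingjob.json", json.dumps(training_params))
    z.writestr("deployment.json", json.dumps(deployment_params))
```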
### Contents - Notebooks
1- Crear Imagen Docker/
- 01_crear_imagen_docker: builds the image, using custom models
- 02_Probar_local_mode: the previous notebook has a cell that deploys a local service, and with this notebook we can test it.
- 03_Probando_sagemaker_estimator: tests the Docker image that was built and uses it to run training jobs and make predictions.
2-Entrenamiento y Despliegue/
- 01_Training: shows the full deployment configuration and how it is attached to the dataset in a zip file that starts the Part 2 flow.
- 02_Check_Progress: optional; shows a more detailed view than the one in the CodePipeline interface.
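# ==== build-image.yml: stack that builds and pushes the model Docker image (filename inferred from the commands below) ====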
# aws cloudformation delete-stack --stack-name scikit-image
# aws cloudformation create-stack --stack-name scikit-image --template-body file://build-image.yml
Description: Create a CodePipeline for creating a Docker base image for training/serving models
Parameters:
RepoBranchName:
Type: String
Description: Name of the branch the code is located
ImageRepoName:
Type: String
Description: Name of the ECR repo without the image name
ImageTagName:
Type: String
Description: Name of the ECR image tag
Default: latest
Resources:
BuildImageProject:
Type: AWS::CodeBuild::Project
Properties:
Name: !Sub mlops-buildimage-${ImageRepoName}
Description: Build a Model Image
ServiceRole: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
Artifacts:
Type: CODEPIPELINE
Source:
Type: CODEPIPELINE
BuildSpec: buildspec.yml
Environment:
Type: LINUX_CONTAINER
ComputeType: BUILD_GENERAL1_SMALL
Image: aws/codebuild/docker:17.09.0
EnvironmentVariables:
- Name: IMAGE_REPO_NAME
Value:
Ref: ImageRepoName
- Name: IMAGE_TAG
Value:
Ref: ImageTagName
- Name: AWS_ACCOUNT_ID
Value: !Sub ${AWS::AccountId}
- Name: AWS_DEFAULT_REGION
Value: !Sub ${AWS::Region}
- Name: TEMPLATE_BUCKET
Value: !Sub mlops-${AWS::Region}-${AWS::AccountId}
- Name: TEMPLATE_PREFIX
Value: codebuild
Tags:
- Key: Name
Value: !Sub mlops-buildimage-${ImageRepoName}
DeployPipeline:
Type: "AWS::CodePipeline::Pipeline"
Properties:
Name: !Sub mlops-${ImageRepoName}
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
ArtifactStore:
Type: S3
Location: !Sub mlops-${AWS::Region}-${AWS::AccountId}
Stages:
-
Name: Source
Actions:
-
Name: GetSource
ActionTypeId:
Category: Source
Owner: AWS
Version: 1
Provider: CodeCommit
OutputArtifacts:
-
Name: ModelSourceOutput
Configuration:
BranchName:
Ref: RepoBranchName
RepositoryName: mlops
RunOrder: 1
-
Name: Build
Actions:
-
Name: BuildImage
InputArtifacts:
- Name: ModelSourceOutput
ActionTypeId:
Category: Build
Owner: AWS
Version: 1
Provider: CodeBuild
Configuration:
ProjectName:
Ref: BuildImageProject
RunOrder: 1
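# ==== m.yml: main workshop template from the Instructions section (creates the environment, repos, pipelines and notebook) ====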
# These are the parameters we'll ask the user before creating the environment
Parameters:
NotebookInstanceSubNetId:
Type: AWS::EC2::Subnet::Id
Description: "Select any subnet id"
AllowedPattern: ^subnet\-[a-zA-Z0-9]+$
ConstraintDescription: "You need to inform any subnetid"
NotebookInstanceSecGroupId:
Type: List<AWS::EC2::SecurityGroup::Id>
Description: "Select the default security group"
AllowedPattern: ^sg\-[a-zA-Z0-9]+$
ConstraintDescription: "Select the default security group"
Resources:
####################
## PERMISSIONS
####################
MLOpsSecurity:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_security.yml
MLOpsLambdaLayers:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_crhelper.yml
MLOpsProcessRequest:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_op_process_request.yml
DependsOn:
- MLOpsLambdaLayers
- MLOpsSecurity
MLOpsDeploymentOperator:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_op_deploy.yml
DependsOn:
- MLOpsLambdaLayers
- MLOpsSecurity
MLOpsTrainingOperator:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_op_training.yml
DependsOn:
- MLOpsLambdaLayers
- MLOpsSecurity
## OK. Then we'll create some repos for the source code
## We need to create two branches in the default repo, so
## let's use a custom resource with a Lambda Function to do that
## Also, when the stack is deleted we need to remove all the versioned
## files from the S3 bucket, otherwise it will fail
MLOpsEnvSetup:
Type: AWS::Lambda::Function
Properties:
Code:
ZipFile: !Sub |
import cfnresponse
import boto3
codeCommit = boto3.client('codecommit')
s3 = boto3.resource('s3')
ecr = boto3.client('ecr')
def lambda_handler(event, context):
responseData = {'status': 'NONE'}
if event['RequestType'] == 'Create':
repoName = event['ResourceProperties'].get('RepoName')
branch_names = event['ResourceProperties'].get('BranchNames')
branches = codeCommit.list_branches(repositoryName=repoName)['branches']
responseData['default_branch'] = branch_names[0]
if len(branches) == 0:
putFiles = {'filePath': 'buildspec.yml', 'fileContent': "version: 0.2\nphases:\n build:\n commands:\n - echo 'dummy'\n".encode()}
resp = codeCommit.create_commit(repositoryName=repoName, branchName='master', commitMessage=' - repo init', putFiles=[putFiles])
for i in branch_names:
codeCommit.create_branch(repositoryName=repoName, branchName=i, commitId=resp['commitId'])
responseData['status'] = 'CREATED'
elif event['RequestType'] == 'Delete':
s3.Bucket( event['ResourceProperties'].get('BucketName') ).object_versions.all().delete()
try:
for i in event['ResourceProperties'].get('ImageRepoNames'):
imgs = ecr.list_images(registryId='${AWS::AccountId}', repositoryName=i)
ecr.batch_delete_image(registryId='${AWS::AccountId}', repositoryName=i, imageIds=imgs['imageIds'])
except Exception as e:
pass
responseData['status'] = 'DELETED'
cfnresponse.send(event, context, cfnresponse.SUCCESS, responseData)
FunctionName: mlops-env-setup
Handler: "index.lambda_handler"
Timeout: 60
MemorySize: 512
Role: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
Runtime: python3.7
DependsOn:
- MLOpsSecurity
MLOpsEnvSetupCaller:
Type: Custom::EnvSetupCaller
Properties:
ServiceToken: !GetAtt MLOpsEnvSetup.Arn
RepoName: mlops
BranchNames:
- iris_model
ImageRepoNames:
- iris-model
BucketName: !Ref MLOpsBucket
DependsOn:
- MLOpsRepo
- MLOpsIrisModelRepo
####################
## REPOSITORIES
####################
## We have the custome resource that can invoke a lambda function to create the branches,
## So, let's create the CodeCommit repo and also the SageMaker Code repos
MLOpsRepo:
Type: AWS::CodeCommit::Repository
Properties:
RepositoryDescription: Repository for the ML models/images code
RepositoryName: mlops
MLOpsIrisModelRepo:
Type: AWS::ECR::Repository
Properties:
RepositoryName: iris-model
MLOpsBucket:
Type: AWS::S3::Bucket
Properties:
BucketName: !Sub mlops-${AWS::Region}-${AWS::AccountId}
Tags:
- Key: Name
Value: !Sub mlops-${AWS::Region}-${AWS::AccountId}
AccessControl: Private
VersioningConfiguration:
Status: Enabled
####################
## PIPELINES
####################
BuildPipelineIrisModel:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/build_image.yml
Parameters:
RepoBranchName: iris_model
ImageRepoName: iris-model
ImageTagName: latest
DependsOn:
- MLOpsBucket
- MLOpsCodeRepo
MLPipelineIrisModel:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_pipeline.yml
Parameters:
SourceBucketPath: !Ref MLOpsBucket
ModelNamePrefix: iris-model
DependsOn:
- MLOpsBucket
- MLOpsCodeRepo
####################
## Notebook Instance
####################
MLOpsExercisesRepo:
Type: AWS::SageMaker::CodeRepository
Properties:
CodeRepositoryName: MLOpsExercisesRepo
GitConfig:
RepositoryUrl: https://github.com/awslabs/amazon-sagemaker-mlops-workshop.git
MLOpsCodeRepo:
Type: AWS::SageMaker::CodeRepository
Properties:
CodeRepositoryName: MLOpsCodeRepo
GitConfig:
RepositoryUrl: !GetAtt MLOpsRepo.CloneUrlHttp
Branch: !Sub "${MLOpsEnvSetupCaller.default_branch}"
DependsOn: MLOpsEnvSetupCaller
IAWorkshopNotebookInstanceLifecycleConfig:
Type: "AWS::SageMaker::NotebookInstanceLifecycleConfig"
Properties:
NotebookInstanceLifecycleConfigName: !Sub ${AWS::StackName}-lifecycle-config
OnStart:
- Content: !Base64 |
#!/bin/bash
sudo -u ec2-user -i <<'EOF'
echo "Finally, let's clone and build an image for testing codebuild locally"
git clone https://github.com/aws/aws-codebuild-docker-images.git /tmp/aws-codebuild
chmod +x /tmp/aws-codebuild/local_builds/codebuild_build.sh
docker pull amazon/aws-codebuild-local:latest --disable-content-trust=false
# This will affect only the Jupyter kernel called "conda_python3".
source activate python3
# Update the sagemaker to the latest version
pip install -U sagemaker
source deactivate
EOF
MLOpsNotebookInstance:
Type: "AWS::SageMaker::NotebookInstance"
Properties:
NotebookInstanceName: MLOpsWorkshop
InstanceType: "ml.m4.xlarge"
SubnetId: !Ref NotebookInstanceSubNetId
SecurityGroupIds: !Ref NotebookInstanceSecGroupId
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
DefaultCodeRepository: MLOpsCodeRepo
AdditionalCodeRepositories:
- MLOpsExercisesRepo
VolumeSizeInGB: 15
LifecycleConfigName: !GetAtt IAWorkshopNotebookInstanceLifecycleConfig.NotebookInstanceLifecycleConfigName
DependsOn:
- IAWorkshopNotebookInstanceLifecycleConfig
- MLOpsCodeRepo
- MLOpsExercisesRepo
- MLOpsSecurity
Outputs:
MLOpsNotebookInstanceId:
Value: !Ref MLOpsNotebookInstance
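# ==== mlops_crhelper.yml: publishes the crhelper Lambda layer (referenced by MLOpsLambdaLayers above) ====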
Resources:
CloudFormationHelperLayer:
Type: AWS::Lambda::LayerVersion
Properties:
CompatibleRuntimes:
- python3.6
- python3.7
LayerName: crhelper
Description: https://github.com/aws-cloudformation/custom-resource-helper
LicenseInfo: Apache 2.0 License
Content:
S3Bucket: aws-ai-ml-aod-latam
S3Key: mlops-workshop/assets/crhelper.zip
Outputs:
LayerArn:
Description: Arn of the layer's latest version
Value: !Ref CloudFormationHelperLayer
Export:
Name: mlops-crhelper-LayerArn
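# ==== mlops_op_deploy.yml: deployment operator Lambda (referenced by MLOpsDeploymentOperator above) ====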
Resources:
MLOpsDeployment:
Type: "AWS::Lambda::Function"
Properties:
FunctionName: mlops-op-deployment
Handler: mlops_op_deploy.lambda_handler
MemorySize: 512
Role: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
Runtime: python3.7
Timeout: 60
Layers:
- Fn::ImportValue: mlops-crhelper-LayerArn
Code:
S3Bucket: aws-ai-ml-aod-latam
S3Key: mlops-workshop/assets/src/mlops_op_deploy.zip
Description: "Function that will start a new Sagemaker Deployment"
Tags:
- Key: Description
Value: Lambda function that process the request and prepares the cfn template for deployment
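# ==== mlops_op_process_request.yml: request-processing Lambda (referenced by MLOpsProcessRequest above) ====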
Resources:
MLOpsProcessRequest:
Type: "AWS::Lambda::Function"
Properties:
FunctionName: mlops-op-process-request
Handler: index.lambda_handler
MemorySize: 512
Role: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
Runtime: python3.7
Timeout: 60
Code:
ZipFile: !Sub |
import boto3
import io
import zipfile
import json
from datetime import datetime
s3 = boto3.client('s3')
codepipeline = boto3.client('codepipeline')
def lambda_handler(event, context):
trainingJob = None
deployment = None
try:
now = datetime.now()
jobId = event["CodePipeline.job"]["id"]
user_params = json.loads(event["CodePipeline.job"]["data"]["actionConfiguration"]["configuration"]["UserParameters"])
model_prefix = user_params['model_prefix']
mlops_operation_template = s3.get_object(Bucket=user_params['bucket'], Key=user_params['prefix'] )['Body'].read()
job_name = 'mlops-%s-%s' % (model_prefix, now.strftime("%Y-%m-%d-%H-%M-%S"))
s3Location = None
for inputArtifacts in event["CodePipeline.job"]["data"]["inputArtifacts"]:
if inputArtifacts['name'] == 'ModelSourceOutput':
s3Location = inputArtifacts['location']['s3Location']
params = {
"Parameters": {
"AssetsBucket": s3Location['bucketName'],
"AssetsKey": s3Location['objectKey'],
"Operation": "training",
"Environment": "none",
"JobName": job_name
}
}
for outputArtifacts in event["CodePipeline.job"]["data"]["outputArtifacts"]:
if outputArtifacts['name'] == 'RequestOutput':
s3Location = outputArtifacts['location']['s3Location']
zip_bytes = io.BytesIO()
with zipfile.ZipFile(zip_bytes, "w") as z:
z.writestr('assets/params_train.json', json.dumps(params))
params['Parameters']['Operation'] = 'deployment'
params['Parameters']['Environment'] = 'development'
z.writestr('assets/params_deploy_dev.json', json.dumps(params))
params['Parameters']['Environment'] = 'production'
z.writestr('assets/params_deploy_prd.json', json.dumps(params))
z.writestr('assets/mlops_operation_handler.yml', mlops_operation_template)
zip_bytes.seek(0)
s3.put_object(Bucket=s3Location['bucketName'], Key=s3Location['objectKey'], Body=zip_bytes.read())
# and update codepipeline
codepipeline.put_job_success_result(jobId=jobId)
except Exception as e:
resp = codepipeline.put_job_failure_result(
jobId=jobId,
failureDetails={
'type': 'ConfigurationError',
'message': str(e),
'externalExecutionId': context.aws_request_id
}
)
Description: "Function that will start a new Sagemaker Training Job"
Tags:
- Key: Description
Value: Lambda function that process the request and prepares the cfn template for training
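# ==== mlops_op_training.yml: training operator Lambda (referenced by MLOpsTrainingOperator above) ====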
Resources:
MLOpsTraining:
Type: "AWS::Lambda::Function"
Properties:
FunctionName: mlops-op-training
Handler: index.lambda_handler
MemorySize: 512
Role: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
Runtime: python3.7
Timeout: 60
Layers:
- Fn::ImportValue: mlops-crhelper-LayerArn
Code:
ZipFile: !Sub |
import boto3
import io
import zipfile
import json
import logging
from crhelper import CfnResource
logger = logging.getLogger(__name__)
# Initialise the helper, all inputs are optional, this example shows the defaults
helper = CfnResource(json_logging=False, log_level='DEBUG', boto_level='CRITICAL')
s3 = boto3.client('s3')
sm = boto3.client('sagemaker')
def lambda_handler(event, context):
helper(event, context)
@helper.create
@helper.update
def start_training_job(event, context):
try:
# Get the training job and deployment descriptors
training_params = None
deployment_params = None
job_name = event['ResourceProperties']['JobName']
helper.Data.update({'job_name': job_name})
try:
# We need to check if there is another training job with the same name
sm.describe_training_job(TrainingJobName=job_name)
## there is, let's let the poll to address this
except Exception as a:
# Ok. there isn't. so, let's start a new training job
resp = s3.get_object(Bucket=event['ResourceProperties']['AssetsBucket'], Key=event['ResourceProperties']['AssetsKey'])
with zipfile.ZipFile(io.BytesIO(resp['Body'].read()), "r") as z:
training_params = json.loads(z.read('trainingjob.json').decode('ascii'))
deployment_params = json.loads(z.read('deployment.json').decode('ascii'))
training_params['TrainingJobName'] = job_name
resp = sm.create_training_job(**training_params)
except Exception as e:
logger.error("start_training_job - Ops! Something went wrong: %s" % e)
raise e
@helper.delete
def stop_training_job(event, context):
try:
job_name = event['ResourceProperties']['JobName']
status = sm.describe_training_job(TrainingJobName=job_name)['TrainingJobStatus']
if status == 'InProgress':
logger.info('Stopping InProgress training job: %s', job_name)
sm.stop_training_job(TrainingJobName=job_name)
return False
else:
logger.info('Training job status: %s, nothing to stop', status)
except Exception as e:
logger.error("stop_training_job - Ops! Something went wrong: %s" % e)
return True
@helper.poll_create
@helper.poll_update
def check_training_job_progress(event, context):
failed = False
try:
job_name = helper.Data.get('job_name')
resp = sm.describe_training_job(TrainingJobName=job_name)
status = resp['TrainingJobStatus']
if status == 'Completed':
logger.info('Training Job (%s) is Completed', job_name)
return True
elif status in ['InProgress', 'Stopping' ]:
logger.info('Training job (%s) still in progress (%s), waiting and polling again...',
job_name, resp['SecondaryStatus'])
elif status == 'Failed':
failed = True
                      raise Exception('Training job has failed: {}'.format(resp['FailureReason']))
else:
raise Exception('Training job ({}) has unexpected status: {}'.format(job_name, status))
except Exception as e:
logger.error("check_training_job_progress - Ops! Something went wrong: %s" % e)
if failed:
raise e
return False
@helper.poll_delete
def check_stopping_training_job_progress(event, context):
logger.info("check_stopping_training_job_progress")
return stop_training_job(event, context)
Description: "Function that will start a new Sagemaker Training Job"
Tags:
- Key: Description
Value: Lambda function that process the request and prepares the cfn template for training
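# ==== mlops_pipeline.yml: per-model training/deployment pipeline (referenced by MLPipelineIrisModel above) ====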
Description: Create a CodePipeline for a Machine Learning Pipeline
Parameters:
SourceBucketPath:
Type: String
Description: Path of the S3 bucket that CodePipeline should find a sagemaker jobfile
ModelNamePrefix:
Type: String
Description: The name prefix of the model that will be supported by this pipeline
Resources:
DeployPipeline:
Type: "AWS::CodePipeline::Pipeline"
Properties:
Name: !Sub ${ModelNamePrefix}-pipeline
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
ArtifactStore:
Type: S3
Location: !Sub mlops-${AWS::Region}-${AWS::AccountId}
Stages:
-
Name: Source
Actions:
-
Name: SourceAction
ActionTypeId:
Category: Source
Owner: AWS
Version: 1
Provider: S3
OutputArtifacts:
-
Name: ModelSourceOutput
Configuration:
S3Bucket:
!Sub ${SourceBucketPath}
S3ObjectKey:
!Sub training_jobs/${ModelNamePrefix}/trainingjob.zip
RunOrder: 1
-
Name: ProcessRequest
Actions:
-
Name: ProcessRequest
InputArtifacts:
- Name: ModelSourceOutput
OutputArtifacts:
-
Name: RequestOutput
ActionTypeId:
Category: Invoke
Owner: AWS
Version: 1
Provider: Lambda
Configuration:
FunctionName: mlops-op-process-request
UserParameters: !Sub '{"model_prefix": "${ModelNamePrefix}", "bucket":"aws-ai-ml-aod-latam","prefix":"mlops-workshop/assets/mlops_operation_handler.yml" }'
RunOrder: 1
-
Name: Train
Actions:
-
Name: TrainModel
InputArtifacts:
- Name: ModelSourceOutput
- Name: RequestOutput
OutputArtifacts:
- Name: ModelTrainOutput
ActionTypeId:
Category: Deploy
Owner: AWS
Version: 1
Provider: CloudFormation
Configuration:
ActionMode: CREATE_UPDATE
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
StackName: !Sub mlops-training-${ModelNamePrefix}-job
TemplateConfiguration: RequestOutput::assets/params_train.json
TemplatePath: RequestOutput::assets/mlops_operation_handler.yml
RunOrder: 1
-
Name: DeployDev
Actions:
-
Name: DeployDevModel
InputArtifacts:
- Name: ModelSourceOutput
- Name: RequestOutput
OutputArtifacts:
- Name: ModelDeployDevOutput
ActionTypeId:
Category: Deploy
Owner: AWS
Version: 1
Provider: CloudFormation
Configuration:
ActionMode: CREATE_UPDATE
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
StackName: !Sub mlops-deploy-${ModelNamePrefix}-dev
TemplateConfiguration: RequestOutput::assets/params_deploy_dev.json
TemplatePath: RequestOutput::assets/mlops_operation_handler.yml
RunOrder: 1
-
Name: DeployApproval
Actions:
-
Name: ApproveDeploy
ActionTypeId:
Category: Approval
Owner: AWS
Version: 1
Provider: Manual
Configuration:
CustomData: 'Shall this model be put into production?'
RunOrder: 1
-
Name: DeployPrd
Actions:
-
Name: DeployModelPrd
InputArtifacts:
- Name: ModelSourceOutput
- Name: RequestOutput
OutputArtifacts:
- Name: ModelDeployPrdOutput
ActionTypeId:
Category: Deploy
Owner: AWS
Version: 1
Provider: CloudFormation
Configuration:
ActionMode: CREATE_UPDATE
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
StackName: !Sub mlops-deploy-${ModelNamePrefix}-prd
TemplateConfiguration: RequestOutput::assets/params_deploy_prd.json
TemplatePath: RequestOutput::assets/mlops_operation_handler.yml
RunOrder: 1
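# ==== mlops_security.yml: shared MLOps IAM role (referenced by MLOpsSecurity above) ====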
Resources:
MLOpsRole:
Type: "AWS::IAM::Role"
Properties:
RoleName: MLOps
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
-
Effect: "Allow"
Principal:
Service:
- "sagemaker.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "cloudformation.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "codepipeline.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "codebuild.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "lambda.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "events.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "states.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "glue.amazonaws.com"
Action:
- "sts:AssumeRole"
Path: "/"
Policies:
-
PolicyName: "Admin"
PolicyDocument:
Version: "2012-10-17"
Statement:
-
Effect: "Allow"
Action: "*"
Resource: "*"
Outputs:
LayerArn:
Description: Arn of the role
Value: !Ref MLOpsRole
Export:
Name: mlops-RoleArn
(One large file in this commit could not be displayed in the diff view and is omitted here.)
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Primero probemos haciendo ping (GET /ping)\n",
"Sagemaker utilizará este meétodo para comprobar el estado de nuestro modelo.\n",
"Debe devolver un código **200**."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"from urllib import request\n",
"\n",
"base_url='http://localhost:8080'"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Response code: 200\n"
]
}
],
"source": [
"resp = request.urlopen(\"%s/ping\" % base_url)\n",
"print(\"Response code: %d\" % resp.getcode() )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ahora podemos hacer predicciones (POST /invocations)\n",
"SAgemaker utilizará este meétodo para las predicciones. Aquí estamos simulando el parámetro de encabezado relacionado con CustomAttributes"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"================================================\n",
"Response code: 200, Prediction: b'0.0\\n'\n",
"\n",
"content-type text/csv\n",
"x-request-id a1d96868-7349-4cab-b308-9e83f0c0329c\n",
"Pragma no-cache\n",
"Cache-Control no-cache; no-store, must-revalidate, private\n",
"Expires Thu, 01 Jan 1970 00:00:00 UTC\n",
"content-length 4\n",
"connection keep-alive\n",
"================================================\n",
"Response code: 200, Prediction: b'2.0\\n'\n",
"\n",
"content-type text/csv\n",
"x-request-id 250bf8e7-a439-437a-a894-b38fe19aa625\n",
"Pragma no-cache\n",
"Cache-Control no-cache; no-store, must-revalidate, private\n",
"Expires Thu, 01 Jan 1970 00:00:00 UTC\n",
"content-length 4\n",
"connection keep-alive\n",
"================================================\n",
"Response code: 200, Prediction: b'1.0\\n'\n",
"\n",
"content-type text/csv\n",
"x-request-id 77c1b1c0-19da-4715-8adf-5d918e404fea\n",
"Pragma no-cache\n",
"Cache-Control no-cache; no-store, must-revalidate, private\n",
"Expires Thu, 01 Jan 1970 00:00:00 UTC\n",
"content-length 4\n",
"connection keep-alive\n",
"CPU times: user 8.59 ms, sys: 8.27 ms, total: 16.9 ms\n",
"Wall time: 342 ms\n"
]
}
],
"source": [
"%%time\n",
"from sagemaker.serializers import CSVSerializer\n",
"csv_serializer = CSVSerializer()\n",
"payloads = [\n",
" [4.6, 3.1, 1.5, 0.2], # 0\n",
" [7.7, 2.6, 6.9, 2.3], # 2\n",
" [6.1, 2.8, 4.7, 1.2] # 1\n",
"]\n",
"\n",
"def predict(payload):\n",
" headers = {\n",
" 'Content-type': 'text/csv',\n",
" 'Accept': 'text/csv'\n",
" }\n",
" \n",
" req = request.Request(\"%s/invocations\" % base_url, data=csv_serializer.serialize(payload).encode('utf-8'), headers=headers)\n",
" resp = request.urlopen(req)\n",
" print('================================================')\n",
" print(\"Response code: %d, Prediction: %s\\n\" % (resp.getcode(), resp.read()))\n",
" for i in resp.headers:\n",
" print(i, resp.headers[i])\n",
"\n",
"for p in payloads:\n",
" predict(p)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Todo Ok"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_python3",
"language": "python",
"name": "conda_python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}