Deploying Hugging Face Models to AWS: A Complete Guide with CDK, SageMaker, and Lambda

November 30, 2025 • 55 min read • Yenn j12 engineering team

Tags: AWS, CDK, SageMaker, Lambda, Hugging Face, Machine Learning, MLOps, TypeScript, Python, API Gateway, Infrastructure as Code, AI, Deep Learning

🎯 Introduction

Deploying machine learning models to production is a complex challenge that goes far beyond training a model. When working with large models from Hugging Face, whether for text-to-image generation or other AI tasks, you need robust infrastructure that handles:

Scalability: Auto-scaling to absorb variable load (the endpoint in this guide scales between 1 and 5 instances)
Cost Efficiency: Paying only for what you use while maintaining performance
Reliability: 99.9%+ uptime with proper error handling and monitoring
Security: Protecting models, data, and API endpoints
Observability: Comprehensive logging, metrics, and tracing

This comprehensive guide demonstrates how to deploy a Hugging Face model to AWS using infrastructure as code (CDK with TypeScript), combining SageMaker for model hosting and Lambda for API orchestration.

💡 Core Philosophy: “Production ML deployment isn’t about running inference—it’s about building a reliable, scalable, cost-effective system that serves predictions while handling failures gracefully”

🎬 What We’ll Build

We’ll deploy a complete ML inference system with:

Hugging Face Model on SageMaker for scalable inference
Lambda Functions for API endpoints and orchestration
API Gateway for RESTful API access
S3 for model artifacts and output storage
CloudWatch for monitoring and logging
IAM for fine-grained security controls
VPC configuration for network isolation

🏗️ System Architecture

📊 High-Level Architecture

graph TB
    Client[Client Application] --> APIG[API Gateway]
    APIG --> Lambda1[Lambda: API Handler]
    Lambda1 --> SQS[SQS Queue - Optional]
    Lambda1 --> SageMaker[SageMaker Endpoint]
    SageMaker --> Model[Hugging Face Model]
    Lambda1 --> S3[S3: Results Storage]
    Lambda1 --> DynamoDB[DynamoDB: Metadata]

    CloudWatch[CloudWatch Logs & Metrics]
    SageMaker -.-> CloudWatch
    Lambda1 -.-> CloudWatch

    CDK[CDK Stack TypeScript] -.->|Deploys| APIG
    CDK -.->|Deploys| Lambda1
    CDK -.->|Deploys| SageMaker
    CDK -.->|Deploys| S3

    style Client fill:#ff6b6b
    style SageMaker fill:#4ecdc4
    style Lambda1 fill:#feca57
    style CDK fill:#95e1d3
    style Model fill:#a29bfe

🔄 Request Flow

sequenceDiagram
    participant Client
    participant API Gateway
    participant Lambda
    participant SageMaker
    participant S3
    participant DynamoDB

    Client->>API Gateway: POST /predict
    API Gateway->>Lambda: Invoke with payload
    Lambda->>DynamoDB: Store request metadata
    Lambda->>SageMaker: InvokeEndpoint
    SageMaker->>SageMaker: Run inference
    SageMaker-->>Lambda: Return prediction
    Lambda->>S3: Store result (if large)
    Lambda->>DynamoDB: Update status
    Lambda-->>API Gateway: Return response
    API Gateway-->>Client: JSON response

🎯 Architecture Decisions

Decision	Choice	Reasoning
Model Hosting	SageMaker	Auto-scaling, managed infrastructure, optimized for ML
API Layer	Lambda + API Gateway	Serverless, cost-effective, scales automatically
Storage	S3 + DynamoDB	Durable storage for results, fast metadata access
IaC Tool	AWS CDK (TypeScript)	Type-safe, familiar language, great AWS integration
Async Processing	SQS (Optional)	Handles long-running inference, decouples components
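
The "Async Processing" row matters more than it may look for image generation. As a rough sketch (not part of the original stack), the helper below picks a mode from a measured latency. It assumes the default limits of 29 seconds for an API Gateway REST integration and 60 seconds for a SageMaker real-time invocation; the function name and the 0.8 headroom factor are illustrative choices, so check the current quotas for your account before relying on them.

```typescript
// Default service limits at the time of writing (assumptions to verify against current AWS quotas).
const API_GATEWAY_TIMEOUT_SECONDS = 29; // REST API integration timeout
const SAGEMAKER_REALTIME_TIMEOUT_SECONDS = 60; // InvokeEndpoint response timeout

type InvocationMode = 'sync' | 'async';

// Hypothetical helper: choose a mode from a measured p95 inference latency.
// `headroom` keeps a safety margin below the tightest timeout in the request path.
function chooseInvocationMode(
  p95LatencySeconds: number,
  headroom = 0.8
): InvocationMode {
  if (!(p95LatencySeconds > 0)) {
    throw new RangeError('p95LatencySeconds must be positive');
  }
  const tightestTimeout = Math.min(
    API_GATEWAY_TIMEOUT_SECONDS,
    SAGEMAKER_REALTIME_TIMEOUT_SECONDS
  );
  return p95LatencySeconds <= tightestTimeout * headroom ? 'sync' : 'async';
}
```

With these defaults, a model that answers in 5 seconds stays on the synchronous path, while one that needs 45 seconds should go through a queue or an asynchronous endpoint and be polled for status.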

📦 Prerequisites and Setup

🛠️ Required Tools

# Node.js and npm
node --version  # v20+ recommended (matches the Lambda runtime used below)
npm --version

# AWS CDK
npm install -g aws-cdk
cdk --version

# AWS CLI
aws --version
aws configure  # Set up credentials

# Python (for model code)
python3 --version  # 3.9+ recommended
pip3 --version

# Docker (for building container images)
docker --version

🔑 AWS Credentials Setup

# Configure AWS credentials
aws configure

# Verify credentials
aws sts get-caller-identity

# Bootstrap CDK (first time only)
cdk bootstrap aws://ACCOUNT-ID/REGION

🏗️ CDK Project Structure

ml-inference-cdk/
├── bin/
│   └── ml-inference.ts          # CDK app entry point
├── lib/
│   ├── stacks/
│   │   ├── vpc-stack.ts         # VPC configuration
│   │   ├── sagemaker-stack.ts   # SageMaker endpoint
│   │   ├── lambda-stack.ts      # Lambda functions
│   │   └── api-stack.ts         # API Gateway
│   ├── constructs/
│   │   ├── sagemaker-model.ts   # Reusable SageMaker construct
│   │   └── lambda-api.ts        # Lambda + API construct
│   └── config/
│       ├── model-config.ts      # Model configuration
│       └── app-config.ts        # Application config
├── lambda/
│   ├── predict/
│   │   ├── index.ts             # Prediction Lambda
│   │   └── package.json
│   ├── async-predict/
│   │   ├── index.ts             # Async prediction Lambda
│   │   └── package.json
│   └── status/
│       └── index.ts             # Status check Lambda
├── model/
│   ├── inference.py             # SageMaker inference script
│   ├── requirements.txt         # Python dependencies
│   └── Dockerfile               # Container image
├── test/
│   └── ml-inference.test.ts     # CDK tests
├── cdk.json                     # CDK configuration
├── tsconfig.json                # TypeScript config
└── package.json                 # Node.js dependencies

🚀 Step 1: Initialize CDK Project

📝 Create New CDK Project

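# Create project directory
mkdir ml-inference-cdk
cd ml-inference-cdk

# Initialize CDK project
cdk init app --language=typescript

# Install dependencies
# The stacks in this guide only use constructs from aws-cdk-lib (installed by
# `cdk init`), so no alpha packages are needed. The Lambda handlers import
# these runtime packages:
npm install @aws-sdk/client-sagemaker-runtime @aws-sdk/client-s3 \
  @aws-sdk/client-dynamodb @aws-sdk/lib-dynamodb @aws-sdk/s3-request-presigner uuid
npm install --save-dev @types/aws-lambda esbuild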

⚙️ Configuration Files

// lib/config/model-config.ts
export interface ModelConfig {
  modelId: string;
  modelVersion: string;
  instanceType: string;
  instanceCount: number;
  containerImage: string;
  environment: Record<string, string>;
}

export const modelConfigs = {
  development: {
    modelId: 'stabilityai/stable-diffusion-xl-base-1.0',
    modelVersion: '1.0',
    instanceType: 'ml.g4dn.xlarge',
    instanceCount: 1,
    containerImage: '', // Will be set after build
    environment: {
      MODEL_CACHE_DIR: '/opt/ml/model',
      TRANSFORMERS_CACHE: '/opt/ml/model',
      HF_HOME: '/opt/ml/model'
    }
  },
  production: {
    modelId: 'stabilityai/stable-diffusion-xl-base-1.0',
    modelVersion: '1.0',
    instanceType: 'ml.g4dn.2xlarge',
    instanceCount: 2,
    containerImage: '',
    environment: {
      MODEL_CACHE_DIR: '/opt/ml/model',
      TRANSFORMERS_CACHE: '/opt/ml/model',
      HF_HOME: '/opt/ml/model'
    }
  }
} as const;

export type Environment = keyof typeof modelConfigs;

// lib/config/app-config.ts
import * as cdk from 'aws-cdk-lib';

export interface AppConfig {
  environment: string;
  region: string;
  account: string;
  vpcCidr: string;
  enableVpc: boolean;
  tags: Record<string, string>;
}

export function getAppConfig(app: cdk.App): AppConfig {
  const environment = app.node.tryGetContext('environment') || 'development';

  return {
    environment,
    region: process.env.CDK_DEFAULT_REGION || 'us-east-1',
    account: process.env.CDK_DEFAULT_ACCOUNT || '',
    vpcCidr: '10.0.0.0/16',
    enableVpc: environment === 'production',
    tags: {
      Environment: environment,
      Project: 'MLInference',
      ManagedBy: 'CDK'
    }
  };
}
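
One gap in the configuration above is that the `environment` context value is an arbitrary string, so a typo such as `-c environment=prod` would silently disable the VPC and match no entry in `modelConfigs`. A minimal sketch of a guard follows; `resolveEnvironment` and `KNOWN_ENVIRONMENTS` are hypothetical names introduced here, and the list simply mirrors the two keys defined in `modelConfigs`.

```typescript
// Mirrors the keys of modelConfigs; kept local so the sketch is self-contained.
const KNOWN_ENVIRONMENTS = ['development', 'production'] as const;
type EnvironmentName = (typeof KNOWN_ENVIRONMENTS)[number];

// Hypothetical helper: turn a raw CDK context value into a checked environment name.
// A missing value falls back to 'development', matching getAppConfig above.
function resolveEnvironment(raw: unknown): EnvironmentName {
  const value = raw === undefined || raw === null ? 'development' : raw;
  if (
    typeof value === 'string' &&
    (KNOWN_ENVIRONMENTS as readonly string[]).indexOf(value) !== -1
  ) {
    return value as EnvironmentName;
  }
  throw new Error(
    `Unknown environment "${String(value)}"; expected one of: ${KNOWN_ENVIRONMENTS.join(', ')}`
  );
}
```

In `getAppConfig`, this could wrap the context lookup, for example `resolveEnvironment(app.node.tryGetContext('environment'))`, so that synthesis fails early instead of deploying a half-configured stack.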

🐳 Step 2: Create SageMaker Inference Container

🐍 Inference Script

# model/inference.py
import json
import os
import torch
from diffusers import DiffusionPipeline
import base64
from io import BytesIO

class ModelHandler:
    def __init__(self):
        self.model = None
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"Using device: {self.device}")

    def load_model(self):
        """Load the Hugging Face model"""
        model_id = os.environ.get('MODEL_ID', 'stabilityai/stable-diffusion-xl-base-1.0')

        print(f"Loading model: {model_id}")

        self.model = DiffusionPipeline.from_pretrained(
            model_id,
            torch_dtype=torch.float16 if self.device == "cuda" else torch.float32,
            use_safetensors=True,
            variant="fp16" if self.device == "cuda" else None
        )

        self.model = self.model.to(self.device)

        # Enable memory efficient attention if available
        if hasattr(self.model, 'enable_xformers_memory_efficient_attention'):
            try:
                self.model.enable_xformers_memory_efficient_attention()
            except Exception as e:
                print(f"Could not enable xformers: {e}")

        print("Model loaded successfully")

    def preprocess(self, request_body):
        """Preprocess the input request"""
        try:
            if isinstance(request_body, bytes):
                request_body = request_body.decode('utf-8')

            data = json.loads(request_body)

            prompt = data.get('prompt', '')
            negative_prompt = data.get('negative_prompt', '')
            num_inference_steps = data.get('num_inference_steps', 50)
            guidance_scale = data.get('guidance_scale', 7.5)
            width = data.get('width', 1024)
            height = data.get('height', 1024)
            seed = data.get('seed', None)

            return {
                'prompt': prompt,
                'negative_prompt': negative_prompt,
                'num_inference_steps': num_inference_steps,
                'guidance_scale': guidance_scale,
                'width': width,
                'height': height,
                'seed': seed
            }
        except Exception as e:
            raise ValueError(f"Error preprocessing request: {str(e)}")

    def predict(self, data):
        """Run inference"""
        if self.model is None:
            self.load_model()

        # Set seed for reproducibility
        if data['seed'] is not None:
            generator = torch.Generator(device=self.device).manual_seed(data['seed'])
        else:
            generator = None

        # Generate image
        with torch.no_grad():
            image = self.model(
                prompt=data['prompt'],
                negative_prompt=data['negative_prompt'],
                num_inference_steps=data['num_inference_steps'],
                guidance_scale=data['guidance_scale'],
                width=data['width'],
                height=data['height'],
                generator=generator
            ).images[0]

        return image

    def postprocess(self, image):
        """Convert image to base64"""
        buffered = BytesIO()
        image.save(buffered, format="PNG")
        img_str = base64.b64encode(buffered.getvalue()).decode()

        return {
            'image': img_str,
            'format': 'png'
        }

# Global model handler
model_handler = ModelHandler()

def model_fn(model_dir):
    """Load model - called once when container starts"""
    model_handler.load_model()
    return model_handler

def input_fn(request_body, request_content_type):
    """Parse input data"""
    if request_content_type == 'application/json':
        return model_handler.preprocess(request_body)
    else:
        raise ValueError(f"Unsupported content type: {request_content_type}")

def predict_fn(data, model):
    """Run prediction"""
    return model.predict(data)

def output_fn(prediction, response_content_type):
    """Format output"""
    if response_content_type == 'application/json':
        return json.dumps(model_handler.postprocess(prediction))
    else:
        raise ValueError(f"Unsupported content type: {response_content_type}")

📦 Requirements and Dockerfile

# model/requirements.txt
torch==2.1.0
diffusers==0.24.0
transformers==4.36.0
accelerate==0.25.0
safetensors==0.4.1
pillow==10.1.0
xformers==0.0.22.post7  # Optional, for memory efficiency; pick the release built against your torch version

# model/Dockerfile
FROM pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime

# Set working directory
WORKDIR /opt/ml/code

# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    wget \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy inference script
COPY inference.py .

# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV MODEL_CACHE_DIR=/opt/ml/model
ENV TRANSFORMERS_CACHE=/opt/ml/model
ENV HF_HOME=/opt/ml/model

# SageMaker sends GET /ping and POST /invocations to port 8080
ENV SAGEMAKER_BIND_TO_PORT=8080
ENV SAGEMAKER_PROGRAM=inference.py

# Health check for local runs (SageMaker itself probes /ping)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5m --retries=3 \
    CMD wget --quiet --tries=1 --spider http://localhost:8080/ping || exit 1

# NOTE: inference.py as shown only defines model_fn/input_fn/predict_fn/output_fn.
# The container still needs a serving layer (for example the SageMaker inference
# toolkit, or a small HTTP server) that listens on port 8080 and calls them;
# running the script on its own exits without serving requests.
ENTRYPOINT ["python", "inference.py"]

🏗️ Step 3: CDK Stacks Implementation

🌐 VPC Stack (Optional but Recommended)

// lib/stacks/vpc-stack.ts
import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import { Construct } from 'constructs';

export interface VpcStackProps extends cdk.StackProps {
  vpcCidr: string;
}

export class VpcStack extends cdk.Stack {
  public readonly vpc: ec2.Vpc;

  constructor(scope: Construct, id: string, props: VpcStackProps) {
    super(scope, id, props);

    // Create VPC with public and private subnets
    this.vpc = new ec2.Vpc(this, 'MLInferenceVpc', {
      ipAddresses: ec2.IpAddresses.cidr(props.vpcCidr),
      maxAzs: 2,
      natGateways: 1, // Cost optimization
      subnetConfiguration: [
        {
          name: 'Public',
          subnetType: ec2.SubnetType.PUBLIC,
          cidrMask: 24,
        },
        {
          name: 'Private',
          subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS,
          cidrMask: 24,
        },
      ],
      enableDnsHostnames: true,
      enableDnsSupport: true,
    });

    // VPC Endpoints for cost optimization (avoid NAT charges)
    this.vpc.addInterfaceEndpoint('SageMakerRuntimeEndpoint', {
      service: ec2.InterfaceVpcEndpointAwsService.SAGEMAKER_RUNTIME,
    });

    this.vpc.addGatewayEndpoint('S3Endpoint', {
      service: ec2.GatewayVpcEndpointAwsService.S3,
    });

    // Output VPC ID
    new cdk.CfnOutput(this, 'VpcId', {
      value: this.vpc.vpcId,
      description: 'VPC ID',
    });
  }
}
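
To sanity-check the subnet sizing, the arithmetic behind `cidrMask: 24` can be sketched as below. The function names are made up for this example; the one external fact used is that AWS reserves five addresses in every subnet.

```typescript
// AWS reserves the first four addresses and the last address of every subnet.
const AWS_RESERVED_IPS_PER_SUBNET = 5;

// Usable addresses in one subnet with the given prefix length.
function usableIpsPerSubnet(cidrMask: number): number {
  if (!Number.isInteger(cidrMask) || cidrMask < 16 || cidrMask > 28) {
    throw new RangeError('AWS subnets must be between /16 and /28');
  }
  return Math.pow(2, 32 - cidrMask) - AWS_RESERVED_IPS_PER_SUBNET;
}

// Number of subnets the Vpc construct creates: one per subnet group per AZ.
function subnetCount(maxAzs: number, subnetGroups: number): number {
  return maxAzs * subnetGroups;
}
```

For the stack above, `usableIpsPerSubnet(24)` gives 251 addresses per subnet and `subnetCount(2, 2)` gives four subnets, which leaves most of the 10.0.0.0/16 range free for later growth.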

🤖 SageMaker Stack

// lib/stacks/sagemaker-stack.ts
import * as cdk from 'aws-cdk-lib';
import * as sagemaker from 'aws-cdk-lib/aws-sagemaker';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecr from 'aws-cdk-lib/aws-ecr';
import { Construct } from 'constructs';
import { ModelConfig } from '../config/model-config';

export interface SageMakerStackProps extends cdk.StackProps {
  modelConfig: ModelConfig;
  vpc?: ec2.Vpc;
  ecrRepository: ecr.Repository;
}

export class SageMakerStack extends cdk.Stack {
  public readonly endpointName: string;
  public readonly endpoint: sagemaker.CfnEndpoint;

  constructor(scope: Construct, id: string, props: SageMakerStackProps) {
    super(scope, id, props);

    const { modelConfig, vpc, ecrRepository } = props;

    // IAM Role for SageMaker
    // AmazonSageMakerFullAccess is broad; scope it down once the deployment works.
    const sagemakerRole = new iam.Role(this, 'SageMakerExecutionRole', {
      assumedBy: new iam.ServicePrincipal('sagemaker.amazonaws.com'),
      managedPolicies: [
        iam.ManagedPolicy.fromAwsManagedPolicyName('AmazonSageMakerFullAccess'),
      ],
    });

    // Grant ECR access
    ecrRepository.grantPull(sagemakerRole);

    // Model
    const model = new sagemaker.CfnModel(this, 'HuggingFaceModel', {
      executionRoleArn: sagemakerRole.roleArn,
      primaryContainer: {
        image: `${ecrRepository.repositoryUri}:latest`,
        mode: 'SingleModel',
        environment: {
          ...modelConfig.environment,
          MODEL_ID: modelConfig.modelId,
        },
      },
      vpcConfig: vpc
        ? {
            subnets: vpc.privateSubnets.map((subnet) => subnet.subnetId),
            securityGroupIds: [
              new ec2.SecurityGroup(this, 'SageMakerSecurityGroup', {
                vpc,
                description: 'Security group for SageMaker endpoint',
                allowAllOutbound: true,
              }).securityGroupId,
            ],
          }
        : undefined,
    });

    // Endpoint Configuration
    // This is a real-time endpoint. Do not add asyncInferenceConfig here:
    // asynchronous endpoints only accept InvokeEndpointAsync, while the
    // prediction Lambda calls InvokeEndpoint.
    const endpointConfig = new sagemaker.CfnEndpointConfig(
      this,
      'EndpointConfig',
      {
        productionVariants: [
          {
            modelName: model.attrModelName,
            variantName: 'AllTraffic',
            initialInstanceCount: modelConfig.instanceCount,
            instanceType: modelConfig.instanceType,
            initialVariantWeight: 1.0,
          },
        ],
      }
    );

    endpointConfig.addDependency(model);

    // Endpoint
    // Use this.stackName rather than cdk.Aws.STACK_NAME: the pseudo parameter
    // would resolve to the *consuming* stack's name when the value is passed
    // to another stack (for example the Lambda stack).
    this.endpointName = `ml-inference-endpoint-${this.stackName}`;
    this.endpoint = new sagemaker.CfnEndpoint(this, 'Endpoint', {
      endpointName: this.endpointName,
      endpointConfigName: endpointConfig.attrEndpointConfigName,
    });

    this.endpoint.addDependency(endpointConfig);

    // Auto-scaling
    const scalableTarget = new cdk.aws_applicationautoscaling.ScalableTarget(
      this,
      'ScalableTarget',
      {
        serviceNamespace: cdk.aws_applicationautoscaling.ServiceNamespace.SAGEMAKER,
        resourceId: `endpoint/${this.endpointName}/variant/AllTraffic`,
        scalableDimension: 'sagemaker:variant:DesiredInstanceCount',
        minCapacity: 1,
        maxCapacity: 5,
      }
    );

    scalableTarget.node.addDependency(this.endpoint);

    // Target tracking scaling policy
    // targetValue is invocations per instance per minute.
    scalableTarget.scaleToTrackMetric('TargetTracking', {
      targetValue: 70,
      predefinedMetric: cdk.aws_applicationautoscaling.PredefinedMetric.SAGEMAKER_VARIANT_INVOCATIONS_PER_INSTANCE,
      scaleInCooldown: cdk.Duration.seconds(300),
      scaleOutCooldown: cdk.Duration.seconds(60),
    });

    // Outputs
    new cdk.CfnOutput(this, 'EndpointName', {
      value: this.endpointName,
      description: 'SageMaker Endpoint Name',
    });

    new cdk.CfnOutput(this, 'EndpointArn', {
      value: this.endpoint.ref,
      description: 'SageMaker Endpoint ARN',
    });
  }
}
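
The `targetValue` of 70 in the scaling policy is measured in invocations per instance per minute, so it should come from a load test rather than a guess. AWS's load-testing guidance derives it as the peak requests per second one instance can sustain, multiplied by a safety factor and by 60. A small sketch of that formula follows; the function name is hypothetical, and the 0.5 default is the commonly suggested starting point rather than a fixed rule.

```typescript
// Target for the SageMakerVariantInvocationsPerInstance metric
// (average invocations per instance per minute).
// maxRpsPerInstance: peak requests/second one instance sustained in a load test.
// safetyFactor: fraction of that peak to aim for, leaving room for bursts.
function invocationsPerInstanceTarget(
  maxRpsPerInstance: number,
  safetyFactor = 0.5
): number {
  if (!(maxRpsPerInstance > 0)) {
    throw new RangeError('maxRpsPerInstance must be positive');
  }
  if (!(safetyFactor > 0 && safetyFactor <= 1)) {
    throw new RangeError('safetyFactor must be in (0, 1]');
  }
  return maxRpsPerInstance * safetyFactor * 60;
}
```

An instance that sustains 2 requests per second yields a target of 60, close to the value in the stack. A diffusion model that completes one image every four seconds (0.25 requests per second) yields 7.5, so a target of 70 would leave such an endpoint saturated long before a scale-out triggers.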

⚡ Lambda Stack

// lib/stacks/lambda-stack.ts
import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as logs from 'aws-cdk-lib/aws-logs';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import { Construct } from 'constructs';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import * as path from 'path';

export interface LambdaStackProps extends cdk.StackProps {
  endpointName: string;
  resultsBucket: s3.Bucket;
  metadataTable: dynamodb.Table;
}

export class LambdaStack extends cdk.Stack {
  public readonly predictFunction: lambda.Function;
  public readonly statusFunction: lambda.Function;

  constructor(scope: Construct, id: string, props: LambdaStackProps) {
    super(scope, id, props);

    const { endpointName, resultsBucket, metadataTable } = props;

    // Lambda execution role (shared by both functions for brevity)
    const lambdaRole = new iam.Role(this, 'LambdaExecutionRole', {
      assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'),
      managedPolicies: [
        iam.ManagedPolicy.fromAwsManagedPolicyName(
          'service-role/AWSLambdaBasicExecutionRole'
        ),
      ],
    });

    // Grant SageMaker invoke permissions
    lambdaRole.addToPolicy(
      new iam.PolicyStatement({
        actions: ['sagemaker:InvokeEndpoint'],
        resources: [
          `arn:aws:sagemaker:${cdk.Aws.REGION}:${cdk.Aws.ACCOUNT_ID}:endpoint/${endpointName}`,
        ],
      })
    );

    // Grant S3 permissions
    resultsBucket.grantReadWrite(lambdaRole);

    // Grant DynamoDB permissions
    metadataTable.grantReadWriteData(lambdaRole);

    // Prediction Lambda Function
    // Note: behind API Gateway the caller is cut off at the integration timeout
    // (29 seconds by default), regardless of the Lambda timeout set here.
    this.predictFunction = new NodejsFunction(this, 'PredictFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'handler',
      entry: path.join(__dirname, '../../lambda/predict/index.ts'),
      timeout: cdk.Duration.minutes(5),
      memorySize: 512,
      role: lambdaRole,
      environment: {
        ENDPOINT_NAME: endpointName,
        RESULTS_BUCKET: resultsBucket.bucketName,
        METADATA_TABLE: metadataTable.tableName,
        REGION: cdk.Aws.REGION,
      },
      logRetention: logs.RetentionDays.ONE_WEEK,
      bundling: {
        minify: true,
        sourceMap: true,
        target: 'es2020',
      },
    });

    // Status Check Lambda Function
    this.statusFunction = new NodejsFunction(this, 'StatusFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'handler',
      entry: path.join(__dirname, '../../lambda/status/index.ts'),
      timeout: cdk.Duration.seconds(30),
      memorySize: 256,
      role: lambdaRole,
      environment: {
        RESULTS_BUCKET: resultsBucket.bucketName,
        METADATA_TABLE: metadataTable.tableName,
        REGION: cdk.Aws.REGION,
      },
      logRetention: logs.RetentionDays.ONE_WEEK,
    });

    // Outputs
    new cdk.CfnOutput(this, 'PredictFunctionArn', {
      value: this.predictFunction.functionArn,
      description: 'Predict Lambda Function ARN',
    });
  }
}

🌐 API Gateway Stack

// lib/stacks/api-stack.ts
import * as cdk from 'aws-cdk-lib';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as logs from 'aws-cdk-lib/aws-logs';
import { Construct } from 'constructs';

export interface ApiStackProps extends cdk.StackProps {
  predictFunction: lambda.Function;
  statusFunction: lambda.Function;
}

export class ApiStack extends cdk.Stack {
  public readonly api: apigateway.RestApi;

  constructor(scope: Construct, id: string, props: ApiStackProps) {
    super(scope, id, props);

    const { predictFunction, statusFunction } = props;

    // CloudWatch Logs for API Gateway
    const logGroup = new logs.LogGroup(this, 'ApiGatewayLogs', {
      retention: logs.RetentionDays.ONE_WEEK,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    // REST API
    this.api = new apigateway.RestApi(this, 'MLInferenceApi', {
      restApiName: 'ML Inference API',
      description: 'API for ML model inference',
      deployOptions: {
        stageName: 'prod',
        loggingLevel: apigateway.MethodLoggingLevel.INFO,
        // Logs full request and response bodies (including prompts);
        // consider turning this off outside of debugging.
        dataTraceEnabled: true,
        accessLogDestination: new apigateway.LogGroupLogDestination(logGroup),
        accessLogFormat: apigateway.AccessLogFormat.jsonWithStandardFields(),
        throttlingRateLimit: 100,
        throttlingBurstLimit: 200,
      },
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
        // X-Api-Key must be allowed, or browsers will block the keyed requests below
        allowHeaders: ['Content-Type', 'Authorization', 'X-Api-Key'],
      },
    });

    // API key for usage plans and throttling. API keys identify callers;
    // add an authorizer (IAM, Cognito, or Lambda) if you need real authentication.
    const apiKey = this.api.addApiKey('ApiKey', {
      apiKeyName: 'MLInferenceApiKey',
    });

    const usagePlan = this.api.addUsagePlan('UsagePlan', {
      name: 'Standard',
      throttle: {
        rateLimit: 100,
        burstLimit: 200,
      },
      quota: {
        limit: 10000,
        period: apigateway.Period.DAY,
      },
    });

    usagePlan.addApiKey(apiKey);
    usagePlan.addApiStage({
      stage: this.api.deploymentStage,
    });

    // Request validator
    const requestValidator = new apigateway.RequestValidator(
      this,
      'RequestValidator',
      {
        restApi: this.api,
        validateRequestBody: true,
        validateRequestParameters: true,
      }
    );

    // Request model
    const requestModel = this.api.addModel('PredictRequestModel', {
      contentType: 'application/json',
      modelName: 'PredictRequest',
      schema: {
        type: apigateway.JsonSchemaType.OBJECT,
        required: ['prompt'],
        properties: {
          prompt: { type: apigateway.JsonSchemaType.STRING },
          negative_prompt: { type: apigateway.JsonSchemaType.STRING },
          num_inference_steps: { type: apigateway.JsonSchemaType.INTEGER },
          guidance_scale: { type: apigateway.JsonSchemaType.NUMBER },
          width: { type: apigateway.JsonSchemaType.INTEGER },
          height: { type: apigateway.JsonSchemaType.INTEGER },
          seed: { type: apigateway.JsonSchemaType.INTEGER },
        },
      },
    });

    // /predict endpoint
    const predictResource = this.api.root.addResource('predict');
    predictResource.addMethod(
      'POST',
      new apigateway.LambdaIntegration(predictFunction, {
        proxy: true,
      }),
      {
        apiKeyRequired: true,
        requestValidator,
        requestModels: {
          'application/json': requestModel,
        },
      }
    );

    // /status/{jobId} endpoint
    const statusResource = this.api.root.addResource('status');
    const statusJobResource = statusResource.addResource('{jobId}');
    statusJobResource.addMethod(
      'GET',
      new apigateway.LambdaIntegration(statusFunction, {
        proxy: true,
      }),
      {
        apiKeyRequired: true,
      }
    );

    // Outputs
    new cdk.CfnOutput(this, 'ApiUrl', {
      value: this.api.url,
      description: 'API Gateway URL',
    });

    new cdk.CfnOutput(this, 'ApiKeyId', {
      value: apiKey.keyId,
      description: 'API Key ID',
    });
  }
}
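
Once the API is deployed, a client has to send the key in the `x-api-key` header on every call. Here is a minimal sketch of how a request to `/predict` might be assembled; `buildPredictCall` and `PredictCallInit` are hypothetical names, and the URL and key in the usage note are placeholders. Note that the stack outputs the key ID, not the key value; the value can be fetched with `aws apigateway get-api-key --api-key <id> --include-value`.

```typescript
// Shape compatible with the second argument of fetch().
interface PredictCallInit {
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds the URL and request options for POST /predict.
// apiUrl is the ApiUrl stack output (with or without a trailing slash).
function buildPredictCall(
  apiUrl: string,
  apiKey: string,
  prompt: string
): { url: string; init: PredictCallInit } {
  const base = apiUrl.replace(/\/+$/, ''); // strip trailing slashes
  return {
    url: `${base}/predict`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-api-key': apiKey, // required because apiKeyRequired is true
      },
      body: JSON.stringify({ prompt }),
    },
  };
}
```

On Node.js 18 or later the result can be passed straight to the built-in `fetch`, for example `const { url, init } = buildPredictCall(apiUrl, key, 'a red bicycle'); const res = await fetch(url, init);`.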

🔧 Step 4: Lambda Function Implementation

🎯 Prediction Lambda

  1// lambda/predict/index.ts
  2import {
  3  SageMakerRuntimeClient,
  4  InvokeEndpointCommand,
  5} from '@aws-sdk/client-sagemaker-runtime';
  6import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
  7import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
  8import { DynamoDBDocumentClient, PutCommand } from '@aws-sdk/lib-dynamodb';
  9import { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
 10import { v4 as uuidv4 } from 'uuid';
 11
 12const sagemakerClient = new SageMakerRuntimeClient({
 13  region: process.env.REGION,
 14});
 15
 16const s3Client = new S3Client({ region: process.env.REGION });
 17
 18const dynamoClient = DynamoDBDocumentClient.from(
 19  new DynamoDBClient({ region: process.env.REGION })
 20);
 21
 22interface PredictRequest {
 23  prompt: string;
 24  negative_prompt?: string;
 25  num_inference_steps?: number;
 26  guidance_scale?: number;
 27  width?: number;
 28  height?: number;
 29  seed?: number;
 30}
 31
 32interface PredictResponse {
 33  jobId: string;
 34  status: 'processing' | 'completed' | 'failed';
 35  message: string;
 36  result?: {
 37    image: string;
 38    s3Url?: string;
 39  };
 40}
 41
 42export const handler = async (
 43  event: APIGatewayProxyEvent
 44): Promise<APIGatewayProxyResult> => {
 45  console.log('Event:', JSON.stringify(event, null, 2));
 46
 47  try {
 48    // Parse request body
 49    if (!event.body) {
 50      return {
 51        statusCode: 400,
 52        body: JSON.stringify({ error: 'Request body is required' }),
 53      };
 54    }
 55
 56    const request: PredictRequest = JSON.parse(event.body);
 57
 58    // Validate request
 59    if (!request.prompt || request.prompt.trim() === '') {
 60      return {
 61        statusCode: 400,
 62        body: JSON.stringify({ error: 'Prompt is required' }),
 63      };
 64    }
 65
 66    // Generate job ID
 67    const jobId = uuidv4();
 68    const timestamp = new Date().toISOString();
 69
 70    // Store job metadata in DynamoDB
 71    await dynamoClient.send(
 72      new PutCommand({
 73        TableName: process.env.METADATA_TABLE,
 74        Item: {
 75          jobId,
 76          status: 'processing',
 77          prompt: request.prompt,
 78          timestamp,
 79          ttl: Math.floor(Date.now() / 1000) + 86400, // 24 hours
 80        },
 81      })
 82    );
 83
 84    // Prepare SageMaker request
 85    const sagemakerPayload = {
 86      prompt: request.prompt,
 87      negative_prompt: request.negative_prompt || '',
 88      num_inference_steps: request.num_inference_steps || 50,
 89      guidance_scale: request.guidance_scale || 7.5,
 90      width: request.width || 1024,
      height: request.height || 1024,
      seed: request.seed ?? null, // ?? keeps an explicit seed of 0
    };

    console.log('Invoking SageMaker endpoint:', process.env.ENDPOINT_NAME);

    // Invoke SageMaker endpoint
    const command = new InvokeEndpointCommand({
      EndpointName: process.env.ENDPOINT_NAME,
      ContentType: 'application/json',
      Body: JSON.stringify(sagemakerPayload),
    });

    const response = await sagemakerClient.send(command);

    // Parse response
    const result = JSON.parse(new TextDecoder().decode(response.Body));

    // Store image in S3 if it's large
    let s3Url: string | undefined;

    if (result.image && result.image.length > 100000) {
      // result.image is base64, so 100,000 characters is roughly 75 KB of image data
      const s3Key = `results/${jobId}.png`;

      await s3Client.send(
        new PutObjectCommand({
          Bucket: process.env.RESULTS_BUCKET,
          Key: s3Key,
          Body: Buffer.from(result.image, 'base64'),
          ContentType: 'image/png',
        })
      );

      s3Url = `s3://${process.env.RESULTS_BUCKET}/${s3Key}`;

      // Update DynamoDB with result
      await dynamoClient.send(
        new PutCommand({
          TableName: process.env.METADATA_TABLE,
          Item: {
            jobId,
            status: 'completed',
            prompt: request.prompt,
            timestamp,
            s3Url,
            completedAt: new Date().toISOString(),
            ttl: Math.floor(Date.now() / 1000) + 86400, // expire after 24 hours
          },
        })
      );

      // Return response with S3 URL
      const responseBody: PredictResponse = {
        jobId,
        status: 'completed',
        message: 'Inference completed successfully',
        result: {
          image: result.image.substring(0, 100) + '...', // Truncated
          s3Url,
        },
      };

      return {
        statusCode: 200,
        headers: {
          'Content-Type': 'application/json',
          'Access-Control-Allow-Origin': '*',
        },
        body: JSON.stringify(responseBody),
      };
    } else {
      // Update DynamoDB
      await dynamoClient.send(
        new PutCommand({
          TableName: process.env.METADATA_TABLE,
          Item: {
            jobId,
            status: 'completed',
            prompt: request.prompt,
            timestamp,
            completedAt: new Date().toISOString(),
            ttl: Math.floor(Date.now() / 1000) + 86400, // expire after 24 hours
          },
        })
      );

      // Return response with inline image
      const responseBody: PredictResponse = {
        jobId,
        status: 'completed',
        message: 'Inference completed successfully',
        result: {
          image: result.image,
        },
      };

      return {
        statusCode: 200,
        headers: {
          'Content-Type': 'application/json',
          'Access-Control-Allow-Origin': '*',
        },
        body: JSON.stringify(responseBody),
      };
    }
  } catch (error) {
    console.error('Error:', error);

    return {
      statusCode: 500,
      headers: {
        'Content-Type': 'application/json',
        'Access-Control-Allow-Origin': '*',
      },
      body: JSON.stringify({
        error: 'Internal server error',
        message: error instanceof Error ? error.message : 'Unknown error',
      }),
    };
  }
};
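
The size check in the handler compares the length of the base64 string, not the size of the decoded image, so the effective cutoff is lower than the number suggests. A minimal sketch of how the decoded size could be computed instead, assuming standard padded base64. The names `base64DecodedBytes`, `shouldStoreInS3`, and `INLINE_LIMIT_BYTES` are hypothetical and not part of the handler above:

```typescript
// Hypothetical helpers; names and the byte threshold are illustrative assumptions.
const INLINE_LIMIT_BYTES = 100 * 1024;

// Number of bytes a padded base64 string decodes to, computed without decoding it.
function base64DecodedBytes(base64: string): number {
  if (base64.length === 0) return 0;
  const padding = base64.endsWith('==') ? 2 : base64.endsWith('=') ? 1 : 0;
  return (base64.length / 4) * 3 - padding;
}

// True when the decoded image is too large to return inline in the API response.
function shouldStoreInS3(
  base64Image: string,
  limitBytes: number = INLINE_LIMIT_BYTES
): boolean {
  return base64DecodedBytes(base64Image) > limitBytes;
}
```

Every 4 base64 characters carry 3 bytes, which is why a 120,000-character string decodes to only 90,000 bytes and would stay inline under this check.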

📊 Status Lambda

// lambda/status/index.ts
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, GetCommand } from '@aws-sdk/lib-dynamodb';
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
import { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';

const dynamoClient = DynamoDBDocumentClient.from(
  new DynamoDBClient({ region: process.env.REGION })
);

const s3Client = new S3Client({ region: process.env.REGION });

// Sent with every response, including errors, so browser clients can read them
const headers = {
  'Content-Type': 'application/json',
  'Access-Control-Allow-Origin': '*',
};

export const handler = async (
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  try {
    const jobId = event.pathParameters?.jobId;

    if (!jobId) {
      return {
        statusCode: 400,
        headers,
        body: JSON.stringify({ error: 'Job ID is required' }),
      };
    }

    // Get job metadata from DynamoDB
    const result = await dynamoClient.send(
      new GetCommand({
        TableName: process.env.METADATA_TABLE,
        Key: { jobId },
      })
    );

    if (!result.Item) {
      return {
        statusCode: 404,
        headers,
        body: JSON.stringify({ error: 'Job not found' }),
      };
    }

    // Generate presigned URL if result is in S3
    let presignedUrl: string | undefined;

    if (result.Item.s3Url) {
      // Strip the s3://bucket/ prefix to recover the object key
      const s3Key = result.Item.s3Url.replace(
        `s3://${process.env.RESULTS_BUCKET}/`,
        ''
      );

      presignedUrl = await getSignedUrl(
        s3Client,
        new GetObjectCommand({
          Bucket: process.env.RESULTS_BUCKET,
          Key: s3Key,
        }),
        { expiresIn: 3600 } // 1 hour
      );
    }

    return {
      statusCode: 200,
      headers,
      body: JSON.stringify({
        jobId: result.Item.jobId,
        status: result.Item.status,
        prompt: result.Item.prompt,
        timestamp: result.Item.timestamp,
        completedAt: result.Item.completedAt,
        ...(presignedUrl && { downloadUrl: presignedUrl }),
      }),
    };
  } catch (error) {
    console.error('Error:', error);

    return {
      statusCode: 500,
      headers,
      body: JSON.stringify({
        error: 'Internal server error',
        message: error instanceof Error ? error.message : 'Unknown error',
      }),
    };
  }
};
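
The status handler recovers the object key by stripping the expected bucket prefix from the stored `s3Url`. If the stored URL ever points at a different bucket, that string replace leaves the value unchanged and the presigned URL targets a key that does not exist. A small sketch of a stricter alternative, where `parseS3Uri` and `S3Location` are hypothetical names introduced here for illustration:

```typescript
// Hypothetical helper: splits an s3://bucket/key URI into its parts.
interface S3Location {
  bucket: string;
  key: string;
}

function parseS3Uri(uri: string): S3Location {
  // Bucket is everything up to the first slash; the key is the remainder.
  const match = /^s3:\/\/([^/]+)\/(.+)$/.exec(uri);
  if (!match) {
    throw new Error(`Not a valid S3 URI: ${uri}`);
  }
  return { bucket: match[1], key: match[2] };
}
```

With this shape the handler could compare `bucket` against `process.env.RESULTS_BUCKET` and fail loudly on a mismatch instead of signing a URL for the wrong object.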

🏗️ Step 5: Main CDK App

#!/usr/bin/env node
// bin/ml-inference.ts
import 'source-map-support/register';
import * as cdk from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as ecr from 'aws-cdk-lib/aws-ecr';
import { VpcStack } from '../lib/stacks/vpc-stack';
import { SageMakerStack } from '../lib/stacks/sagemaker-stack';
import { LambdaStack } from '../lib/stacks/lambda-stack';
import { ApiStack } from '../lib/stacks/api-stack';
import { getAppConfig } from '../lib/config/app-config';
import { modelConfigs } from '../lib/config/model-config';

const app = new cdk.App();
const config = getAppConfig(app);
const env = { account: config.account, region: config.region };

// Shared resources. Constructs such as buckets and tables must be created
// inside a Stack, not directly under the App, so they get a stack of their own.
const sharedStack = new cdk.Stack(app, 'SharedResourcesStack', { env });

const resultsBucket = new s3.Bucket(sharedStack, 'ResultsBucket', {
  bucketName: `${config.account}-ml-inference-results`,
  removalPolicy: cdk.RemovalPolicy.DESTROY,
  autoDeleteObjects: true,
  encryption: s3.BucketEncryption.S3_MANAGED,
  lifecycleRules: [
    {
      expiration: cdk.Duration.days(7),
    },
  ],
});

const metadataTable = new dynamodb.Table(sharedStack, 'MetadataTable', {
  tableName: 'ml-inference-jobs',
  partitionKey: { name: 'jobId', type: dynamodb.AttributeType.STRING },
  billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
  removalPolicy: cdk.RemovalPolicy.DESTROY,
  timeToLiveAttribute: 'ttl',
  pointInTimeRecovery: true,
});

const ecrRepository = new ecr.Repository(sharedStack, 'ModelRepository', {
  repositoryName: 'ml-inference-model',
  removalPolicy: cdk.RemovalPolicy.DESTROY,
  autoDeleteImages: true,
});

// VPC Stack (optional)
let vpcStack: VpcStack | undefined;
if (config.enableVpc) {
  vpcStack = new VpcStack(app, 'VpcStack', {
    vpcCidr: config.vpcCidr,
    env,
  });
}

// SageMaker Stack
const sagemakerStack = new SageMakerStack(app, 'SageMakerStack', {
  modelConfig: modelConfigs[config.environment as keyof typeof modelConfigs],
  vpc: vpcStack?.vpc,
  ecrRepository,
  env,
});

// Lambda Stack
const lambdaStack = new LambdaStack(app, 'LambdaStack', {
  endpointName: sagemakerStack.endpointName,
  resultsBucket,
  metadataTable,
  env,
});

lambdaStack.addDependency(sagemakerStack);

// API Stack
const apiStack = new ApiStack(app, 'ApiStack', {
  predictFunction: lambdaStack.predictFunction,
  statusFunction: lambdaStack.statusFunction,
  env,
});

apiStack.addDependency(lambdaStack);

// Add tags to all resources
Object.entries(config.tags).forEach(([key, value]) => {
  cdk.Tags.of(app).add(key, value);
});

app.synth();

🚀 Step 6: Deployment

📦 Build and Push Docker Image

# Build Docker image
cd model
docker build -t ml-inference-model .

# Get ECR login
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin \
  ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com

# Tag image
docker tag ml-inference-model:latest \
  ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/ml-inference-model:latest

# Push to ECR
docker push ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/ml-inference-model:latest
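
The login, tag, and push commands all repeat the same registry host, and a typo in the account ID or region only surfaces when the push fails. As a sketch, a deployment script could build the URI once. `ecrImageUri` is a hypothetical helper, not something defined elsewhere in this guide:

```typescript
// Hypothetical helper: builds the image URI used by the tag and push commands.
function ecrImageUri(
  accountId: string,
  region: string,
  repository: string,
  tag: string = 'latest'
): string {
  // AWS account IDs are 12 digits; reject unreplaced placeholders early.
  if (!/^\d{12}$/.test(accountId)) {
    throw new Error(`Expected a 12-digit AWS account ID, got "${accountId}"`);
  }
  return `${accountId}.dkr.ecr.${region}.amazonaws.com/${repository}:${tag}`;
}
```

Calling it with the literal placeholder `ACCOUNT_ID` throws, which catches the most common copy-paste mistake before Docker is invoked.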

🚀 Deploy CDK Stacks

# Install dependencies
npm install

# Synthesize CloudFormation templates
cdk synth

# Deploy all stacks
cdk deploy --all --require-approval never

# Or deploy individually
cdk deploy VpcStack
cdk deploy SageMakerStack
cdk deploy LambdaStack
cdk deploy ApiStack

🧪 Test the API

# Get API Key
aws apigateway get-api-keys --include-values

# Make prediction request
curl -X POST https://YOUR_API_ID.execute-api.us-east-1.amazonaws.com/prod/predict \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "prompt": "A serene landscape with mountains and a lake at sunset",
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "width": 1024,
    "height": 1024
  }'

# Check job status
curl https://YOUR_API_ID.execute-api.us-east-1.amazonaws.com/prod/status/JOB_ID \
  -H "x-api-key: YOUR_API_KEY"
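
A client that checks the status endpoint repeatedly should space out its requests so it does not burn through the API's throttle limits. The following is a rough sketch of polling with capped exponential backoff. `waitForJob`, `backoffDelayMs`, and the `'failed'` status value are assumptions made for illustration; only `'completed'` appears in the handlers above:

```typescript
// Shape of the status response; downloadUrl is present only for S3-backed results.
type JobStatus = { jobId: string; status: string; downloadUrl?: string };

// Delay before the next poll: doubles on each attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 15000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Polls until the job reaches a terminal status or attempts run out.
// fetchStatus is injected so the caller decides how the HTTP call is made.
async function waitForJob(
  fetchStatus: () => Promise<JobStatus>,
  maxAttempts = 8,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchStatus();
    // 'failed' is an assumed terminal status, not confirmed by the handlers.
    if (job.status === 'completed' || job.status === 'failed') {
      return job;
    }
    await sleep(backoffDelayMs(attempt));
  }
  throw new Error(`Job did not finish after ${maxAttempts} attempts`);
}
```

In practice `fetchStatus` would wrap a request to the `/status/JOB_ID` URL with the `x-api-key` header, mirroring the second curl command.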

📊 Monitoring and Observability

📈 CloudWatch Dashboard

// Add to lib/stacks/monitoring-stack.ts
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';

const dashboard = new cloudwatch.Dashboard(this, 'MLInferenceDashboard', {
  dashboardName: 'ML-Inference-Metrics',
});

// SageMaker metrics
dashboard.addWidgets(
  new cloudwatch.GraphWidget({
    title: 'SageMaker Invocations',
    left: [
      new cloudwatch.Metric({
        namespace: 'AWS/SageMaker',
        metricName: 'Invocations',
        dimensionsMap: {
          EndpointName: endpointName,
          VariantName: 'AllTraffic',
        },
        statistic: 'Sum',
      }),
    ],
  })
);

// Lambda metrics
dashboard.addWidgets(
  new cloudwatch.GraphWidget({
    title: 'Lambda Duration',
    left: [
      predictFunction.metricDuration(),
    ],
  })
);

🔔 Alarms

// SageMaker endpoint alarm
// SageMaker publishes Invocation4XXErrors and Invocation5XXErrors
// in the AWS/SageMaker namespace
const endpointAlarm = new cloudwatch.Alarm(this, 'EndpointFailureAlarm', {
  metric: new cloudwatch.Metric({
    namespace: 'AWS/SageMaker',
    metricName: 'Invocation4XXErrors',
    dimensionsMap: {
      EndpointName: endpointName,
      VariantName: 'AllTraffic',
    },
    statistic: 'Sum',
  }),
  threshold: 10,
  evaluationPeriods: 2,
  alarmDescription: 'Alert when SageMaker endpoint has too many 4XX errors',
});

// Lambda error alarm
const lambdaAlarm = new cloudwatch.Alarm(this, 'LambdaErrorAlarm', {
  metric: predictFunction.metricErrors(),
  threshold: 5,
  evaluationPeriods: 2,
  alarmDescription: 'Alert when Lambda function has too many errors',
});
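
It can help to see what `threshold: 10` with `evaluationPeriods: 2` means in practice: a single bad period does not trip the alarm, two consecutive ones do. The sketch below simulates that rule in plain TypeScript. It assumes a greater-than-or-equal comparison and that every evaluated period must breach, and `isAlarming` is a hypothetical name, not a CloudWatch API:

```typescript
// Simplified model of alarm evaluation over per-period metric sums.
// Assumes a >= comparison and that all evaluated periods must breach.
function isAlarming(
  periodSums: number[],
  threshold: number,
  evaluationPeriods: number
): boolean {
  // Not enough data yet to evaluate the alarm.
  if (periodSums.length < evaluationPeriods) {
    return false;
  }
  // Only the most recent evaluationPeriods data points are considered.
  return periodSums
    .slice(-evaluationPeriods)
    .every((sum) => sum >= threshold);
}
```

For example, error sums of `[0, 12, 3]` do not alarm because the latest period recovered, while `[3, 10, 15]` does. Missing data handling and the datapoints-to-alarm setting are left out of this model.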

💰 Cost Optimization

💡 Cost Breakdown

Service	Cost Factor	Optimization Strategy
SageMaker	Instance hours	Use auto-scaling, smaller instances for dev
Lambda	Invocations + Duration	Optimize code, use appropriate memory
API Gateway	Requests	Cache responses when possible
S3	Storage + Requests	Lifecycle policies, intelligent tiering
DynamoDB	Read/Write units	Use on-demand pricing, TTL for cleanup
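
Because a real-time endpoint bills for every hour an instance is running, instance hours usually dominate the table above. A back-of-the-envelope sketch of that arithmetic follows. The function name and input shape are hypothetical, and the hourly price is a caller-supplied parameter because actual prices vary by instance type and region:

```typescript
// Hypothetical cost estimate: instance hours are billed per running instance.
interface EndpointCostInput {
  hourlyPriceUsd: number; // look up the current price for your instance type
  instanceCount: number;
  hoursPerDay: number;
  daysPerMonth?: number;
}

function monthlyEndpointCostUsd(input: EndpointCostInput): number {
  const days = input.daysPerMonth ?? 30;
  return input.hourlyPriceUsd * input.instanceCount * input.hoursPerDay * days;
}

// Placeholder price of 1.00 USD/hour, chosen only to keep the numbers readable.
const alwaysOn = monthlyEndpointCostUsd({
  hourlyPriceUsd: 1.0,
  instanceCount: 1,
  hoursPerDay: 24,
});
const businessHours = monthlyEndpointCostUsd({
  hourlyPriceUsd: 1.0,
  instanceCount: 1,
  hoursPerDay: 8,
});
```

At the placeholder price, running around the clock costs three times as much as running eight hours a day, which is the motivation for scaling down or removing development endpoints outside working hours.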

🎯 Optimization Tips

// 1. Use a smaller instance for non-production endpoints
// Real-time SageMaker endpoints cannot run on Spot capacity; managed spot
// applies to training jobs only, so right-size the instance instead
productionVariants: [{
  // ... other config
  instanceType: 'ml.g4dn.xlarge',
  initialInstanceCount: 1,
}]

// 2. Implement caching in Lambda
// The Map lives in module scope, so it only survives while this execution
// environment stays warm and is not shared across concurrent instances.
// It also grows without bound; cap its size in real use.
const cache = new Map<string, any>();

export const handler = async (event: any) => {
  // event.body is already a string; JSON.stringify just normalizes null/undefined
  const cacheKey = JSON.stringify(event.body);

  if (cache.has(cacheKey)) {
    return cache.get(cacheKey);
  }

  const result = await invokeModel(event);
  cache.set(cacheKey, result);

  return result;
};

// 3. Use Reserved Capacity for predictable workloads
// Purchase SageMaker Savings Plans for production workloads
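
The module-scope `Map` in tip 2 never evicts anything, so a long-lived execution environment could accumulate large image payloads until it runs out of memory. One way to bound it is sketched below. `TtlCache` is a hypothetical class that evicts the oldest inserted entry when full and expires entries after a fixed time; it assumes `maxEntries` is at least 1:

```typescript
// Hypothetical bounded cache: oldest-inserted eviction plus per-entry expiry.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;

  constructor(maxEntries: number, ttlMs: number) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      // Expired: drop it so it no longer counts toward the size limit.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    // Deleting first moves a rewritten key to the end of insertion order.
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }
}
```

The `now` parameters exist mainly to make the behavior testable without waiting. As with the original `Map`, each Lambda execution environment holds its own copy, so hit rates depend on how requests are distributed.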

🔒 Security Best Practices

🛡️ Security Checklist

import * as s3 from 'aws-cdk-lib/aws-s3';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as secretsmanager from 'aws-cdk-lib/aws-secretsmanager';

// 1. Enable encryption at rest and require TLS in transit
const resultsBucket = new s3.Bucket(this, 'ResultsBucket', {
  encryption: s3.BucketEncryption.S3_MANAGED,
  enforceSSL: true,
});

// 2. Deny non-TLS access explicitly
// (enforceSSL above already adds an equivalent statement; shown for reference)
resultsBucket.addToResourcePolicy(
  new iam.PolicyStatement({
    effect: iam.Effect.DENY,
    principals: [new iam.AnyPrincipal()],
    actions: ['s3:*'],
    // Cover both the bucket itself and the objects in it
    resources: [resultsBucket.bucketArn, resultsBucket.arnForObjects('*')],
    conditions: {
      Bool: {
        'aws:SecureTransport': 'false',
      },
    },
  })
);

// 3. Enable API Gateway authentication
// Use Cognito, API Keys, or Lambda Authorizers

// 4. Implement rate limiting
// Pass these as the `throttle` option of an API Gateway usage plan
const throttleSettings = {
  rateLimit: 100, // steady-state requests per second
  burstLimit: 200, // maximum short-term burst
};

// 5. Enable VPC for SageMaker (production)
// Isolate SageMaker endpoints in private subnets

// 6. Use Secrets Manager for sensitive data
const apiSecret = new secretsmanager.Secret(this, 'ApiSecret', {
  secretName: 'ml-inference-api-key',
});

// 7. Enable CloudTrail logging
// Monitor API calls and access patterns
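
The `rateLimit` and `burstLimit` values in item 4 interact in a way that is easy to misread: the burst is the size of a token bucket, and the rate is how fast it refills. The sketch below is a simplified local simulation of that idea, not API Gateway's implementation. `TokenBucket` is a hypothetical class, and timestamps are passed in explicitly so the behavior is deterministic:

```typescript
// Simplified token bucket: burst is the capacity, rate is the refill speed.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly ratePerSecond: number;
  private readonly burst: number;

  constructor(ratePerSecond: number, burst: number, nowMs: number = 0) {
    this.ratePerSecond = ratePerSecond;
    this.burst = burst;
    this.tokens = burst; // start with a full bucket
    this.lastRefillMs = nowMs;
  }

  // Returns true if the request is allowed, false if it would be throttled.
  tryAcquire(nowMs: number): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    // Refill in proportion to elapsed time, never above the burst capacity.
    this.tokens = Math.min(
      this.burst,
      this.tokens + elapsedSeconds * this.ratePerSecond
    );
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Counts how many of `attempts` simultaneous requests are allowed at nowMs.
function countAllowed(bucket: TokenBucket, attempts: number, nowMs: number): number {
  let allowed = 0;
  for (let i = 0; i < attempts; i++) {
    if (bucket.tryAcquire(nowMs)) allowed++;
  }
  return allowed;
}
```

With a rate of 100 and a burst of 200, an idle API absorbs 200 simultaneous requests, then recovers capacity at 100 requests per second.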

📚 Summary and Best Practices

🎯 Key Takeaways

Infrastructure as Code: Use CDK for reproducible, version-controlled infrastructure
Separation of Concerns: Keep model code (Python) separate from infrastructure (TypeScript)
Auto-scaling: Configure SageMaker and Lambda to scale based on demand
Monitoring: Implement comprehensive logging and alerting
Cost Management: Use auto-scaling, lifecycle policies, and appropriate instance types
Security: Enable encryption, use IAM roles, implement API authentication
Testing: Test locally with Docker before deploying to AWS

🛠️ Essential Commands

# Development
npm run build          # Build TypeScript
cdk synth             # Generate CloudFormation
cdk diff              # See changes before deploy

# Deployment
cdk deploy --all      # Deploy all stacks
cdk deploy VpcStack   # Deploy specific stack

# Docker
docker build -t model .
docker push ECR_URI

# Testing
# fileb:// sends the file as raw bytes, which AWS CLI v2 needs for --body
aws sagemaker-runtime invoke-endpoint \
  --endpoint-name ENDPOINT_NAME \
  --content-type application/json \
  --body fileb://request.json \
  output.json

# Cleanup
cdk destroy --all     # Delete all resources

📖 Project Checklist

Set up AWS credentials and CDK bootstrap
Create Hugging Face inference script
Build and test Docker image locally
Push image to ECR
Deploy VPC stack (if needed)
Deploy SageMaker stack
Test SageMaker endpoint directly
Deploy Lambda stack
Test Lambda functions
Deploy API Gateway stack
Test end-to-end API
Set up monitoring and alarms
Configure auto-scaling
Implement security best practices
Document API endpoints
Set up CI/CD pipeline

🎓 Further Learning

AWS CDK: AWS CDK Documentation
SageMaker: Amazon SageMaker Developer Guide
Hugging Face: Hugging Face Documentation
MLOps: ML Engineering Best Practices

🎯 Conclusion

Deploying machine learning models to production requires careful consideration of architecture, scalability, cost, and security. By using AWS CDK with TypeScript, you can create infrastructure as code that’s maintainable, testable, and reproducible.

This guide provided a complete solution for deploying Hugging Face models using SageMaker and Lambda, with:

Type-safe infrastructure code
Scalable architecture
Production-ready security
Comprehensive monitoring
Cost optimization strategies

Next Steps:

Implement CI/CD pipeline with GitHub Actions or AWS CodePipeline
Add A/B testing for model versions
Implement caching for frequently requested predictions
Set up multi-region deployment for global availability
Add custom domain and SSL certificates

Tags: #AWS #CDK #SageMaker #Lambda #HuggingFace #MachineLearning #MLOps #TypeScript #Python #InfrastructureAsCode #Serverless #AI