Building an NYC Taxi Data Pipeline with Spark and Kafka
A complete guide to building a production-ready data engineering pipeline that processes NYC taxi trip records using Apache Spark, Kafka streaming, the Hadoop ecosystem, and AWS cloud infrastructure.
Articles in big-data
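🎯 Project Overview
SpringDataPlatform is a full-featured, enterprise-grade big data platform designed for managing Apache Flink jobs. This full-stack project combines a modern web technology stack with an intuitive user interface for managing and monitoring distributed data processing workflows.
🏗️ System Architecture
graph TB
    A[Vue.js Frontend] --> B[Nginx Reverse Proxy]
    B --> C[Spring Boot Backend]
    C --> D[Apache Flink Cluster]
    C --> E[Apache Zeppelin]
    D --> F[Job Execution Engine]
    E --> G[Interactive Notebooks]
    subgraph "Core Features"
        H[JAR Job Submission]
        I[SQL Job Submission]
        J[Job Status Monitoring]
        K[Cluster Status Monitoring]
    end
    C --> H
    C --> I
    C --> J
    C --> K
🛠️ Technical Architecture
Frontend Tech Stack
Framework: Vue.js 2.x
Routing: Vue Router
HTTP client: Axios
UI enhancements: SweetAlert2
Syntax highlighting: Highlight.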
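The "Job Status Monitoring" and "Cluster Status Monitoring" features in the diagram above need to read state from the Flink cluster. Flink's JobManager REST API exposes this through `GET /jobs/overview` (a list of jobs, each with a `state`) and `GET /overview` (cluster counters such as `taskmanagers`, `slots-total` and `slots-available`). The sketch below shows one way a backend could condense those responses for a dashboard. It is written in Python for brevity and is not SpringDataPlatform's actual Spring Boot code. It assumes Flink's default REST port 8081 on `localhost`. The helper names `fetch_json`, `count_jobs_by_state` and `cluster_health` are hypothetical.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Assumption: the Flink JobManager REST endpoint runs on its default port 8081.
FLINK_REST_URL = "http://localhost:8081"


def fetch_json(path: str, base_url: str = FLINK_REST_URL) -> dict:
    """GET a Flink REST endpoint (e.g. "/jobs/overview") and decode the JSON body."""
    with urlopen(base_url + path, timeout=10) as resp:
        return json.load(resp)


def count_jobs_by_state(jobs_overview: dict) -> dict:
    """Count jobs per state (RUNNING, FAILED, ...) from a /jobs/overview body."""
    return dict(Counter(job["state"] for job in jobs_overview.get("jobs", [])))


def cluster_health(overview: dict) -> dict:
    """Pick the fields a hypothetical dashboard might show from an /overview body."""
    total = overview.get("slots-total", 0)
    available = overview.get("slots-available", 0)
    return {
        "taskmanagers": overview.get("taskmanagers", 0),
        "slots_used": total - available,
        "slots_total": total,
        "jobs_running": overview.get("jobs-running", 0),
        "jobs_failed": overview.get("jobs-failed", 0),
    }


if __name__ == "__main__":
    # Offline sample shaped like a /jobs/overview response (values are made up).
    sample_jobs = {"jobs": [
        {"jid": "a1", "name": "sql-job", "state": "RUNNING"},
        {"jid": "b2", "name": "jar-job", "state": "FAILED"},
        {"jid": "c3", "name": "etl-job", "state": "RUNNING"},
    ]}
    print(count_jobs_by_state(sample_jobs))  # {'RUNNING': 2, 'FAILED': 1}
    # Against a live cluster: count_jobs_by_state(fetch_json("/jobs/overview"))
```

The parsing functions take plain dictionaries rather than making HTTP calls themselves. That lets you test them against recorded responses, and it keeps transport concerns (timeouts, retries, authentication behind the Nginx proxy) in one place.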