Architecture overview

This guide explains the basic architecture of the platform.

FastLake provides a fully integrated, cloud-native solution designed to streamline the management of data lakehouses. Built to simplify data storage, processing, and orchestration, FastLake ensures that all infrastructure, data, and generated code are owned and controlled by the user. Below is an overview of the architecture and key features that allow you to retain full control while simplifying the complexities of data management.

Data Ownership and Control

FastLake ensures that users retain complete ownership and control over their infrastructure, data, and generated code. All data is stored and processed exclusively within the user’s cloud account, guaranteeing compliance with privacy and security regulations.

This design allows users to confidently manage their data, adhering to governance policies while preserving autonomy.

Data Lakehouse Framework

FastLake's architecture builds on the data lakehouse paradigm, combining the flexibility of data lakes with the performance of data warehouses. This approach keeps data storage, processing, and analytics within a single, unified framework.

The framework is built around three core zones:

  • Raw Zone: Stores ingested, unprocessed data.
  • Clean Zone: Contains curated and validated data ready for analytical queries.
  • Transform Zone: Holds datasets processed for specific use cases or outputs.

FastLake supports industry-standard storage formats, including Delta Lake, Parquet, ORC, and CSV, ensuring compatibility and performance.
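
As an illustration of how the three zones and the supported formats might map onto storage, here is a minimal Python sketch. The directory layout, the `dataset_path` helper, and the format keys are assumptions made for this example; they are not FastLake's actual layout or API.

```python
from enum import Enum
from pathlib import PurePosixPath


class Zone(Enum):
    """The three lakehouse zones described in this guide."""
    RAW = "raw"              # ingested, unprocessed data
    CLEAN = "clean"          # curated and validated data
    TRANSFORM = "transform"  # datasets processed for specific use cases


# Formats named in this guide; the string keys are illustrative only.
SUPPORTED_FORMATS = {"delta", "parquet", "orc", "csv"}


def dataset_path(root: str, zone: Zone, dataset: str, fmt: str = "delta") -> str:
    """Build a storage path for a dataset in a zone (hypothetical layout)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return str(PurePosixPath(root) / zone.value / fmt / dataset)


if __name__ == "__main__":
    print(dataset_path("/lake", Zone.RAW, "orders", "csv"))
    print(dataset_path("/lake", Zone.CLEAN, "orders"))
```

Keeping the zone in the path makes it easy to see at a glance how far a dataset has progressed from ingestion to curated output.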

Multiple zones

Asset-Driven Approach

FastLake adopts a logical asset-driven approach for data management. Instead of working directly with files or low-level storage abstractions, it emphasizes working with logical datasets. This approach simplifies the pipeline by clearly separating data definition and execution, allowing for efficient, reproducible workflows.

Logical datasets in FastLake can be reused and versioned, enabling a consistent and auditable data lifecycle. This aligns with modern data engineering best practices, fostering collaboration and improving maintainability across teams.

By treating data as assets, FastLake ensures that each stage in the pipeline has a well-defined purpose, enhancing scalability and reducing operational overhead.
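
The separation between data definition and execution can be sketched as follows. This is a hedged illustration only: the `Asset` class, its fields, and `execution_order` are hypothetical names, not part of FastLake. Each asset declares which assets it depends on, and the execution order is derived from those declarations rather than written by hand.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Asset:
    """A logical dataset definition (hypothetical structure)."""
    name: str
    version: int
    upstream: tuple[str, ...] = ()  # names of assets this one is built from


def execution_order(assets: list[Asset]) -> list[str]:
    """Return asset names ordered so that dependencies run first."""
    graph = {asset.name: set(asset.upstream) for asset in assets}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    assets = [
        Asset("orders_by_region", version=1, upstream=("clean_orders",)),
        Asset("clean_orders", version=3, upstream=("raw_orders",)),
        Asset("raw_orders", version=1),
    ]
    print(execution_order(assets))
```

Because the definitions are plain, versioned data, they can be reviewed and reused independently of the engine that eventually runs them.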

Data assets

Job Processing and Resource Scaling

FastLake’s processing architecture uses Azure Synapse Spark to handle data processing jobs:

  • Compute Resource Configuration: Define the size and limits of compute resources at the project or dataset level to match data complexity and processing demands.
  • Scalability: The infrastructure scales automatically, ensuring optimal resource utilization without manual intervention.

This flexibility lets FastLake match compute resources to workloads of different sizes.
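
One way to think about project-level and dataset-level compute settings is as defaults plus overrides. The sketch below assumes a hypothetical `ComputeConfig` with a node size and node limits; the field names and values are illustrative and do not reflect FastLake's real settings schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ComputeConfig:
    """Compute size and limits (hypothetical fields)."""
    node_size: str
    min_nodes: int
    max_nodes: int


def effective_config(project: ComputeConfig, dataset_overrides: dict) -> ComputeConfig:
    """Apply dataset-level overrides on top of the project-level defaults."""
    merged = replace(project, **dataset_overrides)
    if not 0 <= merged.min_nodes <= merged.max_nodes:
        raise ValueError("min_nodes must be between 0 and max_nodes")
    return merged


if __name__ == "__main__":
    project_default = ComputeConfig(node_size="Small", min_nodes=1, max_nodes=4)
    # A heavier dataset raises only the values it needs to change.
    heavy = effective_config(project_default, {"node_size": "Large", "max_nodes": 10})
    print(heavy)
```

Datasets that do not specify overrides simply inherit the project defaults, so limits stay consistent unless a dataset explicitly needs more.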

Settings page

Infrastructure Management and Visualization

FastLake offers a unified infrastructure management interface. The Infrastructure page provides tools to visualize, configure, and monitor all components of the system. A Directed Acyclic Graph (DAG) visually represents the relationships between project components, simplifying infrastructure navigation and management.

From the DAG, users can:

  • Start, stop, or configure services.
  • Monitor the status of individual components.
  • Access pre-installed Kubernetes tools such as Airflow, Superset, Airbyte, and Spark for enhanced functionality.

This centralized management interface streamlines operational tasks and enhances visibility across the infrastructure.
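
To show why a DAG is a useful model for component relationships, the following sketch finds every component that depends, directly or indirectly, on a given one. The topology in `DEPENDENTS` is invented for this example; only the tool names come from this guide, and the real relationships in a FastLake project may differ.

```python
from collections import deque

# component -> components that depend on it (hypothetical topology)
DEPENDENTS = {
    "storage": ["spark", "airbyte"],
    "airbyte": ["airflow"],
    "spark": ["airflow", "superset"],
    "airflow": [],
    "superset": [],
}


def downstream(component: str, dependents: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk over the DAG, returning affected components."""
    seen = {component}
    queue = deque([component])
    order = []
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


if __name__ == "__main__":
    # Which components would be affected if "spark" were stopped?
    print(downstream("spark", DEPENDENTS))
```

A traversal like this is what makes it possible to preview the impact of stopping or reconfiguring a service before acting on it.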

Infrastructure with Kubernetes

AI-Driven Tools for Data Transformation and Exploration

FastLake integrates external AI services via user-provided API keys or tokens, enabling advanced data workflows:

  • OpenAI: Automates code generation for data transformation, reducing manual effort.
  • Vanna AI: Supports intuitive data exploration by enabling natural language querying and analysis of datasets.

By delegating compute-intensive AI tasks to these external services, FastLake ensures seamless integration without compromising performance or security.
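
Since the AI integrations depend on user-provided keys, a feature is only available when its key is present. The sketch below shows one possible way to check this; the feature names and the environment variable names are assumptions for illustration, not FastLake configuration.

```python
import os
from typing import Mapping

# Hypothetical mapping: feature -> environment variable holding the user's key.
AI_KEY_VARS = {
    "code_generation": "OPENAI_API_KEY",   # OpenAI-backed code generation
    "data_exploration": "VANNA_API_KEY",   # Vanna AI natural language querying
}


def enabled_ai_features(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the AI features whose key is set to a non-empty value."""
    return [feature for feature, var in AI_KEY_VARS.items() if env.get(var)]


if __name__ == "__main__":
    print(enabled_ai_features({"OPENAI_API_KEY": "example-key"}))
```

Keeping the keys in the user's own environment means they are never stored alongside the feature definitions.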

AI Assistant

Security and Privacy

FastLake’s architecture is designed to protect security and privacy through the following measures:

  • Role-based access control (RBAC) ensures restricted access to sensitive resources.
  • Azure Active Directory (AAD) secures authentication and provides centralized identity management.
  • All operations occur within the user’s cloud environment, eliminating third-party exposure risks.

Role-based access control
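
The idea behind role-based access control can be sketched in a few lines. The role names and permission strings below are hypothetical examples, not FastLake's actual roles; in practice the user's roles would come from the identity provider after authentication.

```python
# Hypothetical roles and the permissions each one grants.
ROLE_PERMISSIONS = {
    "viewer": {"dataset:read"},
    "engineer": {"dataset:read", "dataset:write", "job:run"},
    "admin": {"dataset:read", "dataset:write", "job:run", "infra:configure"},
}


def is_allowed(roles: list[str], permission: str) -> bool:
    """Allow the action if any of the user's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


if __name__ == "__main__":
    print(is_allowed(["viewer"], "dataset:write"))
    print(is_allowed(["viewer", "engineer"], "dataset:write"))
```

Unknown roles grant nothing, so access is denied by default unless a role explicitly allows it.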

CI/CD and Automation

FastLake automatically pre-configures Azure DevOps pipelines, enabling fully automated builds and deployments for projects. This reduces the need for manual setup, allowing users to focus on development and iteration.

The CI/CD pipelines are tailored

In-built Azure DevOps

Billing and Cost Management

The billing interface provides detailed insights into resource usage and associated costs. With interactive dashboard

This feature empowers users to maintain control over their budgets and infrastructure costs.

Architecture Diagram

The overall architecture of our SaaS product follows industry best practices for efficiency, performance, and security. It combines modern frameworks, scalable infrastructure, and the security measures described above to deliver a reliable experience for users.

Architecture diagram