Squadbase

From Prototype to Production

A Practical Guide to Deploying and Operating Streamlit Applications

Building a BI dashboard is the first step. To deliver sustained value, an application requires a robust system for deployment, management, and continuous improvement.

This chapter guides you through turning a Streamlit prototype into a production-ready service, covering essential DevOps and DataOps concepts for the Streamlit ecosystem.

Why DevOps is Essential for Streamlit

Moving from a solo project to a team environment, or from a temporary analysis to a long-running business tool, introduces several challenges:

  • Code Management: Safely managing code changes from multiple developers.
  • Environment Consistency: Ensuring an app that works locally also works in production.
  • Deployment: Updating the live application without downtime.
  • Security & Access Control: Controlling who can access which data.
  • Performance & Scalability: Maintaining speed as data and user numbers grow.
  • Monitoring: Knowing if the application or a data pipeline has failed.

DevOps (for application development) and DataOps (for data pipelines) provide principles to solve these problems, enabling a reliable, scalable, and secure system for Streamlit applications.

How Streamlit Operations Differ

The operational model for Streamlit is fundamentally different from traditional BI tools, offering more flexibility but requiring more engineering discipline.

| Aspect | Traditional BI Tools | Streamlit Applications |
| --- | --- | --- |
| Change Management | Ticket-based, managed by IT | Code-based (Git), managed by the dev team |
| Deployment | Manual, requires admin rights | Automated via CI/CD |
| Scaling | Requires new license purchases | Based on cloud infrastructure |
| Operating Costs | Fixed license fees | Usage-based cloud costs |

This modern, code-centric approach is more agile but requires a solid operational framework to manage effectively.

Deployment and CI/CD Best Practices

Automating the deployment process is a key principle of DevOps. A CI/CD (Continuous Integration/Continuous Deployment) pipeline ensures that every code change is tested and deployed reliably.

Choosing a Deployment Method

There are several ways to host a Streamlit application, each with its own trade-offs.

| Deployment Method | Best For | Pros | Cons |
| --- | --- | --- | --- |
| Streamlit Community Cloud | Public-facing hobby projects | Free, easy setup | Limited resources, poor security for private data |
| Major Cloud Providers (AWS/GCP/Azure) | Teams with strong cloud expertise | Highly flexible and scalable | Complex to configure and maintain |
| Container Platforms (Docker/K8s) | Large enterprises | Complete control and portability | Very high technical barrier to entry |
| Managed Platforms | Most business use cases | Ease of use plus enterprise-grade security and operational features | Less raw flexibility than a pure cloud setup |

For most business applications, a managed platform offers a good balance of power and convenience. These platforms typically handle the complexities of security, authentication, and operations, allowing teams to focus on application development.

A Best-Practice CI/CD Workflow

A robust development process separates environments for development, testing, and production. This is typically managed using different Git branches.

Recommended Branching Strategy:

  • feature/* branches: Individual developers work on new features here.
  • staging branch: When a feature is ready for testing, it's merged into staging. This branch is deployed to a staging environment for user acceptance testing (UAT).
  • main branch: After passing UAT, the code is merged into main. This branch represents the production-ready code and is automatically deployed to the live environment.

This workflow ensures that no code reaches production without being reviewed and tested, improving the application's stability and reliability. Many deployment platforms can integrate directly with a Git repository to automate these steps.
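As a sketch of the "tested before deployed" step, a CI job can run lightweight checks against pure-Python helpers extracted from the app. The function and validation rule below are hypothetical, but the pattern (pytest-style tests that run on every push to staging or main) is the one the workflow above relies on:

```python
# Hypothetical data-quality helper a CI pipeline might test with pytest
# before a merge to `staging` or `main` triggers a deployment.

def validate_kpi_row(row: dict) -> bool:
    """Return True when a KPI record is safe to display on the dashboard."""
    required = {"date", "revenue", "region"}
    return required.issubset(row) and row["revenue"] >= 0

def test_validate_kpi_row():
    assert validate_kpi_row({"date": "2024-01-01", "revenue": 100, "region": "EU"})
    assert not validate_kpi_row({"date": "2024-01-01", "revenue": -5, "region": "EU"})
    assert not validate_kpi_row({"revenue": 100})  # missing required fields
```

Keeping business logic in plain functions like this (rather than inline in the Streamlit script) is what makes it testable in CI at all.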

Authentication and Access Control

In a business setting, not all users should see all data. Implementing robust authentication (who can log in) and authorization (what they can do) is critical.

Key Concepts

  • Authentication: Verifying a user's identity, typically via a login screen.
  • Authorization: Determining a user's permissions after they have logged in.
  • RBAC (Role-Based Access Control): A common authorization strategy where permissions are assigned to roles (e.g., admin, viewer) rather than individual users.

Implementing RBAC in Streamlit

While Streamlit has basic authentication features, implementing fine-grained RBAC often requires custom code or integration with a managed platform that provides this functionality.

The core concept is to fetch the current user's role and use conditional logic to control UI elements and data access.

import streamlit as st
# This is a conceptual example. 
# The actual function to get user info depends on your auth provider.
from my_auth_provider import get_current_user 

user = get_current_user()

st.write(f"Welcome, {user.name}")

# Role-based access control
if "admin" in user.roles:
    st.header("Admin Panel")
    # ... show admin-specific components ...
elif "editor" in user.roles:
    st.header("Editor View")
    # ... show editing tools ...
else:
    st.header("Viewer Dashboard")
    # ... show read-only charts ...

When designing roles, follow the Principle of Least Privilege: grant users the minimum permissions necessary to perform their tasks.
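One way to keep role checks consistent across pages is to centralize them in a small helper, so each view declares the roles it requires. The sketch below is illustrative only; the User class and require_role decorator are not a Streamlit or auth-provider API, and a real app would call st.error / st.stop instead of returning None:

```python
from dataclasses import dataclass, field
from functools import wraps

@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)

def require_role(user, *allowed):
    """Decorator: run the page function only if the user holds an allowed role."""
    def decorator(page_fn):
        @wraps(page_fn)
        def wrapper(*args, **kwargs):
            if user.roles & set(allowed):
                return page_fn(*args, **kwargs)
            return None  # in a real app: st.error("Access denied"); st.stop()
        return wrapper
    return decorator

user = User("dana", roles={"editor"})

@require_role(user, "admin", "editor")
def editor_view():
    return "editing tools"

@require_role(user, "admin")
def admin_panel():
    return "admin panel"
```

With this pattern, adding a new role or page means touching one declaration rather than scattering if/elif checks through the script.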

Practical Performance Tuning

As an app's user base and data volume grow, performance can become a bottleneck. Maintaining a responsive dashboard is crucial for user adoption. The following are effective strategies for keeping Streamlit apps fast.

1. Master Streamlit's Caching

Effective caching is the single most important technique for optimizing Streamlit performance. It allows you to skip re-running expensive computations or data queries.

@st.cache_data: Use for data-like objects (DataFrames, lists, dicts).

# This function re-runs only when the input `query` changes or the
# cached result expires. ttl=3600 caches results for one hour.
import pandas as pd

@st.cache_data(ttl=3600)
def run_query(query: str) -> pd.DataFrame:
    # `connection` is assumed to be an existing DB connection or engine.
    return pd.read_sql(query, connection)

# On a cache hit, the function body is skipped and the cached DataFrame is returned.
df = run_query("SELECT * FROM my_large_table")

@st.cache_resource: Use for global resources like database connections or ML models.

# This function runs only once per process; the resulting connection
# pool is shared across all sessions and reruns.
from sqlalchemy import create_engine

@st.cache_resource
def init_db_connection():
    # DATABASE_URL is assumed to come from your app's configuration.
    return create_engine(DATABASE_URL)

conn = init_db_connection()

2. Optimize Your Data Processing

  • Push Processing to the Database: Perform aggregations, filtering, and joins in SQL whenever possible. Databases are highly optimized for these operations and are typically faster than processing data in pandas.
  • Consider Polars for Large Datasets: For very large datasets (e.g., >1 million rows) processed in memory, the Polars library can be significantly faster and more memory-efficient than pandas, due to its lazy evaluation and Rust-based backend.
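To make the push-down idea concrete, here is a minimal sketch using Python's built-in sqlite3 (the sales table and its columns are invented for the example): the aggregation runs inside the database, so only two summary rows cross the wire instead of the full table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("north", 80.0), ("south", 50.0)],
)

# The GROUP BY is pushed into SQL: the database returns the small summary,
# rather than the app loading every row into pandas and aggregating there.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(rows)  # [('north', 200.0), ('south', 50.0)]
```

The same principle applies unchanged to Postgres, BigQuery, or any warehouse behind the dashboard: let the engine do the heavy lifting and fetch only the result.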

3. Use Fragments for Partial UI Updates

Introduced in Streamlit 1.37, st.fragment allows you to update a specific part of your UI without re-running the entire script. This is ideal for components that need to refresh frequently, like a real-time metrics display.

# `run_every=5` re-runs only this fragment every 5 seconds,
# without triggering a full script re-run.
@st.fragment(run_every=5)
def real_time_metrics():
    st.metric("Live Users", get_live_user_count())

real_time_metrics()

Use st.fragment for frequent, partial UI updates, and caching for expensive, infrequent computations.

Advanced Data Architecture: Data Marts

As an application grows, running complex analytical queries against a production database in real-time can cause performance bottlenecks. At this stage, consider building a data mart.

A data mart is a subject-specific database containing pre-aggregated, summarized data, designed for fast analytical queries.

Consider a Data Mart When:

  • Dashboard queries take more than a few seconds to run.
  • The same complex joins and aggregations are performed repeatedly.
  • The analytical workload needs to be separated from the transactional production database.

By pre-calculating heavy aggregations (e.g., nightly) and storing them in a data mart, a Streamlit app can run lightweight queries against this optimized source, ensuring consistently fast performance.
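A minimal sketch of that pattern, again using sqlite3 so it stays self-contained (the schema and job name are hypothetical): a nightly batch job materializes a summary table, and the dashboard queries only the mart:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (day TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("2024-06-01", "widget", 10.0),
     ("2024-06-01", "widget", 15.0),
     ("2024-06-01", "gadget", 7.0)],
)

def refresh_daily_sales_mart(conn):
    """Nightly batch job: rebuild the pre-aggregated mart table."""
    conn.execute("DROP TABLE IF EXISTS daily_sales_mart")
    conn.execute(
        """CREATE TABLE daily_sales_mart AS
           SELECT day, product, SUM(amount) AS revenue
           FROM orders GROUP BY day, product"""
    )

refresh_daily_sales_mart(conn)

# The dashboard now runs a cheap lookup against the mart instead of
# re-aggregating the transactional `orders` table on every page load.
mart = conn.execute(
    "SELECT product, revenue FROM daily_sales_mart ORDER BY product"
).fetchall()
print(mart)  # [('gadget', 7.0), ('widget', 25.0)]
```

In production the same refresh step would typically be a scheduled warehouse job (dbt, Airflow, a cron task), with the Streamlit app pointed only at the mart.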

A simplified star schema for a sales data mart.

Summary: Building Sustainable BI Systems

This chapter provided a roadmap for moving Streamlit applications from prototype to production. Applying DevOps and DataOps principles helps build BI systems that are reliable, scalable, and secure.

Key Takeaways:

  1. Automate Processes: Use CI/CD pipelines for automated testing and deployment to ensure reliability and speed.
  2. Design for Security: Implement robust authentication and role-based access control from the start.
  3. Prioritize Performance: Master Streamlit's caching and optimize data processing to ensure a fast user experience.
  4. Scale with Data: Adopt advanced data architectures like data marts as data volume and complexity grow.
  5. Foster Collaboration: Successful BI systems rely on a partnership between business users and engineers, facilitated by clear processes and shared tools.

This book has provided the tools and techniques to build effective BI applications with Streamlit and AI. The rest is in your hands. Happy building!