Newsletter #2: S3 Misconfigurations, MLOps, Vertical Integration

Issue #2 of Data Management Newsletter, a newsletter of curated content on topics related to data management and data protection
Dragon's Egg Data Management Newsletter

Welcome to Data Management Newsletter #2, where I curate interesting articles on data management and security for data practitioners and executives. This week's themes are S3 misconfigurations, MLOps, and vertical integration.

Data Management Newsletter #2: S3 Misconfigurations, MLOps, Vertical Integration

Last year, Cloudanix published A Complete List of AWS S3 Misconfigurations. It's a short but handy reference, written by @kedarghule, that enumerates the ways in which S3 buckets may end up vulnerable or insecure, along with a short description of each misconfiguration and its impact on compliance (PCI, HIPAA, etc.).

The most common misconfigurations (publicly accessible buckets, buckets with no encryption, etc.) are the ones that usually get talked about, but the article also serves as a convenient checklist (perhaps even one to automate) for ensuring that all of your S3 buckets are provisioned using security best practices. Check it out!
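As a rough illustration of what automating such a checklist could look like, here is a minimal sketch in Python. It is not taken from the Cloudanix article: the `BucketConfig` shape, the setting names, and the four checks are assumptions made up for this example, and a real audit would read these values from AWS rather than from a hand-written dictionary.

```python
from typing import Dict, List

# Hypothetical, simplified description of one bucket's settings. The field
# names are assumptions for this sketch, not the shape of any AWS API response.
BucketConfig = Dict[str, bool]

# Each check maps a setting name to the value considered safe.
# The selection is illustrative, not the article's full list.
CHECKS: Dict[str, bool] = {
    "public_access_blocked": True,
    "default_encryption_enabled": True,
    "versioning_enabled": True,
    "access_logging_enabled": True,
}


def find_misconfigurations(config: BucketConfig) -> List[str]:
    """Return the names of settings that are missing or not set to the safe value."""
    return [name for name, safe in CHECKS.items() if config.get(name) != safe]


# Example: encryption is switched off and access logging was never configured.
bucket = {
    "public_access_blocked": True,
    "default_encryption_enabled": False,
    "versioning_enabled": True,
}
print(find_misconfigurations(bucket))
# ['default_encryption_enabled', 'access_logging_enabled']
```

Treating a missing setting as a failure is a deliberate choice here: a checklist is more useful when it flags what was never configured, not only what was configured badly.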

Additional reading:

  • Cloudanix also has a companion article on recipes / best practices to pass an AWS S3 Audit that's quite handy, too.

Brian Costa at TheNewStack has published The Architect’s Guide to Using AI/ML with Object Storage.

It makes a case for why object storage (AWS S3, Azure Blob Storage) is a better fit for data curation, storage, and training of complex AI / ML models compared to traditional SAN or NAS storage. Using object storage has the following advantages:

  • Large-scale training datasets are usually in the range of tens or hundreds of petabytes, which far exceeds the capabilities of conventional SAN and NAS architectures.
  • Object storage is better suited for the storage, versioning, and processing of structured, semi-structured, and unstructured data.
  • Object storage supports object locking and lifecycle management, which are essential to ensure training data is valid, hasn't been tampered with, and is compliant with privacy laws.
  • RESTful APIs, such as the S3 API, are a modern approach to building complex distributed systems, and allow the decoupling of compute from storage, and interoperability across multiple services.
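To make the versioning and object-locking point more concrete, here is a toy in-memory model in Python. It is only a sketch of the idea: `ToyBucket`, `ObjectVersion`, and their methods are hypothetical names invented for this example, and they do not correspond to the S3 API or to any real storage client.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class ObjectVersion:
    data: bytes
    retain_until: Optional[datetime] = None  # None means the version is not locked


@dataclass
class ToyBucket:
    """In-memory stand-in for a versioned bucket; not a real storage client."""
    versions: Dict[str, List[ObjectVersion]] = field(default_factory=dict)

    def put(self, key: str, data: bytes, retain_days: int = 0) -> int:
        """Append a new version instead of overwriting; return its position."""
        until = None
        if retain_days > 0:
            until = datetime.now(timezone.utc) + timedelta(days=retain_days)
        self.versions.setdefault(key, []).append(ObjectVersion(data, until))
        return len(self.versions[key]) - 1

    def delete_version(self, key: str, index: int) -> bool:
        """Refuse to delete a version whose retention period has not expired."""
        version = self.versions[key][index]
        now = datetime.now(timezone.utc)
        if version.retain_until is not None and version.retain_until > now:
            return False
        # Positions of later versions shift after removal; fine for a toy model.
        self.versions[key].pop(index)
        return True


bucket = ToyBucket()
locked = bucket.put("train.csv", b"label,text\n", retain_days=30)
unlocked = bucket.put("train.csv", b"label,text\n1,hello\n")
print(bucket.delete_version("train.csv", locked))    # False: still under retention
print(bucket.delete_version("train.csv", unlocked))  # True: no lock on this version
```

The point of the sketch is that writes never overwrite earlier data and locked versions cannot be removed early, which is what lets a team show that a training set has not been altered.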

In Data Watcher #1, we looked at The Collision of App Platforms and Database, which discussed combining applications and databases into a single unit of abstraction, similar to what the Heroku and Aptible platforms provide.

@monkchips at Redmonk discusses the launch of Postgres Container Apps by Crunchy Data, which is based on this idea. It's meant to give developers a simple way to build, test, and run PostgreSQL apps quickly, as in the following examples:

  • Adding RESTful APIs, such as those provided by PostgREST, on top of the database.
  • Running administrator and reporting tools on top of the database.
  • Deploying monitoring agents (Datadog, New Relic, pganalyze) alongside the database. This is the classic sidecar pattern that has been popularized by Istio and Kubernetes!

That’s all for this edition of Data Watcher. Hope you enjoy reading the linked articles!

Huge shoutout to the folks at @cloudanix, @thenewstack, and @redmonk for the content!

Cheers, and hope you're having a great weekend!