Start with an S3 data lake, then add Snowflake, and then NiFi as needed for automated data and file movement flows. (Microphone drop.) Dare I say most may not even need HDFS? Just a cloud-based object store and linearly scalable performance for standard data access paths like ODBC/JDBC, SQL, AMQP, DAG-based ELT jobs, and file extracts. That’s the power of the new class of distributed storage/compute DB platforms like Snowflake (and maybe Redshift/Spectrum).
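To make the S3-plus-Snowflake pattern concrete, here’s a minimal sketch of wiring the two together: point a Snowflake external stage at the lake, bulk-load with COPY INTO, then query over plain SQL/ODBC/JDBC. Bucket, stage, table, and column names are all hypothetical, and credentials are placeholders.

```sql
-- Point Snowflake at the S3 data lake (bucket path and names are made up).
CREATE STAGE my_s3_stage
  URL = 's3://my-data-lake/events/'
  CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...');

-- Land semi-structured files into a single VARIANT column.
CREATE TABLE events (raw VARIANT);

-- Bulk-load Parquet files from the stage.
COPY INTO events
  FROM @my_s3_stage
  FILE_FORMAT = (TYPE = PARQUET);

-- Then it's just standard SQL from any ODBC/JDBC client.
SELECT raw:user_id::STRING AS user_id, COUNT(*)
FROM events
GROUP BY 1;
```

That’s the whole “data lake to warehouse” hop: no cluster to size, no HDFS to babysit, and the ELT DAG is a handful of SQL statements your orchestrator can schedule.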
If you are a true data platform play (health insurance, online media, etc.), then yes, you might want to start spinning up a few HDFS clusters, if only for the Spark hotness. But 90%+ of organizations will get 95%+ of what they need in an enterprise data warehouse from AWS + Snowflake. And it scales starting at under $100/mo. Sign me up. Oh wait, I already did.