At their core, most software systems collect, organize, and display data that has some connection to the real world. Take Facebook, for example: it collects data about people, their friends, the content they generate, and ads, among other things, and organizes that information using ranking algorithms to display it in a neat UI. Dating apps collect people's dating preferences, organize the information using matching algorithms, and display potential matches to users. Google collects information by crawling the web with bots and organizes content by running PageRank and advanced ML algorithms. Amazon does the same for retail products.
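
To make that "organize" step a little more concrete, here is a minimal sketch of PageRank via power iteration on a tiny, made-up link graph. The graph, damping factor, and iteration count are arbitrary choices for illustration; real web-scale ranking is obviously far more elaborate than this.

```python
# Minimal PageRank sketch via power iteration on a tiny, hypothetical link graph.
# An illustration of the idea only, not how Google actually computes rankings.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}           # start from a uniform distribution

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:                      # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    toy_web = {                                   # hypothetical four-page "web"
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
        "d": ["c"],
    }
    for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```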

Most of these behemoths started before the cloud era and gained their moat because of a couple of factors. Firstly, they built the most scalable and resilient distributed systems in the world. I think this innovation was mostly driven by necessity. Software in the pre-internet era consisted of developing operating systems, writing applications for them, or running simulations for scientific or industrial purposes on a single machine or a local cluster. None of these use cases involved handling millions of requests per second like modern web servers, or storing petabytes of data like the backend storage systems in big tech do today. As a result, there was no need to scale infrastructure and no need for distributed systems.

However, as internet usage gained widespread adoption, sites like Amazon and Google were confronted with huge traffic and had to scale or die. Amazon had to maintain metadata about the largest inventory in the world. Google had to store and retrieve metadata about the entirety of the web, which was ever-growing. To solve these scaling challenges, they hired distributed systems experts like this guy and this dude.

Secondly, the 2010s brought breakthroughs in AI and machine learning. With Moore's law following its course, cheap data storage and compute made deep learning, which was first explored in the 80s, practical. The trifecta of abundant data, fast compute, and complex model architectures has made computers better than humans at specialized tasks like object recognition, natural language translation, and generative art, among other things. The big tech companies have been quick to incubate AI technologies and have leveraged them to take their products to the next level. For instance, Facebook leads in recommendation systems and has developed frameworks like PyTorch. Google's DeepMind has made several breakthroughs, especially in reinforcement learning. Most of the products at these companies have been improved with some form of machine learning, and this isn't easy to compete against, mostly because of the AI infra and data needed to train the models.
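
As a rough sense of what "frameworks like PyTorch" buy you, here is a minimal, self-contained sketch of training a small classifier. The architecture and synthetic data are arbitrary picks just to show how little code a working deep-learning model takes; the hard part, as noted above, is the infra and data behind production-scale models.

```python
# Minimal PyTorch training loop on synthetic data, purely to illustrate how
# compact a working deep-learning model is with a modern framework.
import torch
import torch.nn as nn

# Synthetic binary-classification data (arbitrary, for illustration only).
torch.manual_seed(0)
X = torch.randn(512, 16)
y = (X.sum(dim=1) > 0).long()

model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

accuracy = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"final loss={loss.item():.3f}, train accuracy={accuracy:.2%}")
```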

To compete with AWS would mean solving distributed systems challenges in addition to hardware and networking challenges. To compete with Facebook or Google would mean developing the infra needed to deploy SOTA AI algorithms at scale. To compete with Uber or Lyft would mean developing low-latency, event-driven microservices along with pricing models trained on years of data. Ultimately, to compete with any of them would mean assembling a group of intrepid distributed systems and AI experts with a contrarian disposition, who can move fast and execute.