In the rapidly advancing world of data engineering, the ability to strike a balance between innovation and cost-efficiency has become paramount, as organizations increasingly rely on vast amounts of data to drive decision-making and maintain a competitive edge. This data explosion, coupled with ever-growing demands for real-time analytics and machine learning, has created immense pressure on infrastructure, pushing companies to innovate at a pace that often comes with significant costs.
At Oportun, Abhijit Joshi, a staff data engineer known for cloud cost-saving initiatives and for building data pipelines, has managed to cut cloud costs while improving the infrastructure as a whole. Through the redesign of data pipelines built on Databricks and AWS, the company reportedly recorded a 50% reduction in overall cloud costs. By restructuring the Databricks Lakehouse architecture, he gave Oportun the ability to handle large volumes of data at a reasonable processing cost. This initiative alone saved an impressive $100,000 every month on Databricks and $30,000 per month on AWS services, showcasing his ability to scale infrastructure without increasing operational costs.
The efficient deployment of AWS EC2 and S3 services enabled the organization to build a cloud-based solution that was both effective and economical. This made it possible for Oportun to expand quickly in response to business needs without significant changes to its spending profile. The optimization strategy has had implications beyond cost: it has also enhanced the company’s data processing capabilities, enabling timely decision making and uncovering additional revenue opportunities.
Leading a 20-person cross-functional engineering team, he has been integral in aligning the company’s data management platforms with its overall business strategy. One of the most significant changes he introduced was the adoption of Terraform to manage infrastructure. “Due to automation, deployment mistakes made by the team dropped by 95%, thereby enhancing operational efficiencies”, remarks Joshi. The organization’s growing adoption of cloud-native practices has ensured that infrastructure is consistently in place and ready for use, which is key for any organization operating at scale.
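The article does not show Joshi’s actual Terraform setup, but the general idea behind this kind of automation can be sketched. The hypothetical Python wrapper below drives a standard, non-interactive Terraform init/plan/apply cycle so every deployment goes through the same reviewed plan instead of being applied by hand; the module path is an assumption for illustration.

```python
import subprocess

def deploy(module_dir: str) -> None:
    """Run a non-interactive Terraform plan/apply cycle for one module.

    Automating this sequence (rather than applying changes manually) is the
    kind of step that cuts deployment mistakes: every environment is changed
    only through a recorded, reviewable plan.
    """
    steps = [
        ["terraform", "init", "-input=false"],                  # fetch providers/modules
        ["terraform", "plan", "-input=false", "-out=tfplan"],   # record the intended change
        ["terraform", "apply", "-input=false", "tfplan"],       # apply exactly that plan
    ]
    for cmd in steps:
        subprocess.run(cmd, cwd=module_dir, check=True)

if __name__ == "__main__":
    deploy("infra/databricks_workspace")  # hypothetical module path
```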
His cloud optimization and data engineering skills have carried across a number of other projects. One such effort was the migration from Minerva1.0 to Minerva2.0, in which many data pipelines were rebuilt on Databricks, Airflow, and DBT. The transition produced a multi-layered data architecture that simplified data access and reduced expenses by 50%.
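The Minerva pipelines themselves are not described in the article; as a minimal sketch of the general pattern behind pipelines built on Airflow and DBT, the example below (Airflow 2.x style) schedules a DBT transformation after a raw-ingestion step. The task names, schedule, scripts, and project path are assumptions.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Illustrative DAG: ingest raw data, then run DBT transformations over it.
# All identifiers here are hypothetical, not details from the article.
with DAG(
    dag_id="example_lakehouse_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest_raw = BashOperator(
        task_id="ingest_raw",
        bash_command="python ingest_raw.py",  # hypothetical ingestion script
    )
    dbt_transform = BashOperator(
        task_id="dbt_transform",
        bash_command="dbt run --project-dir /opt/dbt/minerva",  # hypothetical dbt project
    )
    ingest_raw >> dbt_transform
```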
Another sizeable project, at ViacomCBS, involved developing audience segmentation models that sharpened the corporation’s audience targeting. This led to a 40% improvement in the response rates of targeted advertising campaigns, underscoring the value of data engineering for businesses with intensive data processing needs.
Cost efficiency has been a recurring theme. In one initiative, he rolled out Kerberos-enabled Docker containers across the organization and cut Google Cloud virtualization costs by $10,000 per month. The approach kept the engineering process secure without compromising on functionality, demonstrating that security and performance need not be sacrificed for efficiency.
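Details of those Kerberos-enabled containers are not given in the article; as a rough sketch of how such a container commonly authenticates, a Python entrypoint along the following lines could obtain a Kerberos ticket from a keytab mounted into the container before starting the workload. The principal, keytab path, and workload script are all assumptions.

```python
import os
import subprocess

# Hypothetical container entrypoint: acquire a Kerberos ticket from a mounted
# keytab, then hand off to the actual workload. Values are illustrative only.
PRINCIPAL = os.environ.get("KRB_PRINCIPAL", "svc-data@EXAMPLE.COM")
KEYTAB = os.environ.get("KRB_KEYTAB", "/etc/keytabs/svc-data.keytab")

def main() -> None:
    # kinit -kt <keytab> <principal> obtains a ticket non-interactively
    subprocess.run(["kinit", "-kt", KEYTAB, PRINCIPAL], check=True)
    # start the application only after authentication succeeds
    subprocess.run(["python", "run_pipeline.py"], check=True)

if __name__ == "__main__":
    main()
```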
His work has left a lasting mark on measurable outcomes across these projects, including the 50% reduction in cloud infrastructure expenditure, $100,000 in monthly Databricks savings, and a further $30,000 saved monthly on AWS through the rebuilt data pipelines. “There was also a 95% reduction of mistakes in deployment due to automation, while targeted advertising at ViacomCBS improved by 40% due to sophisticated data models”, adds Abhijit Joshi.
The journey has not been without its struggles. One of the key challenges, encountered at both ViacomCBS and Oportun, was the shift from on-premise systems to the cloud. By modelling a cloud-first mindset, he kept cloud sprawl in check and balanced cost against performance. The transition also incorporated an architectural innovation, the Databricks Delta Lakehouse, designed to simplify and speed up the movement of data at scale.
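The specifics of that Delta Lakehouse design are not spelled out in the article; the PySpark sketch below illustrates the layered (raw-to-cleaned) pattern on Delta tables that such architectures typically follow. The storage paths and column names are assumptions, not details from the source.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative bronze -> silver step on Delta tables; paths and columns are
# hypothetical examples of the layered-lakehouse pattern.
spark = SparkSession.builder.appName("delta_lakehouse_example").getOrCreate()

# Bronze layer: land the raw events as-is in a Delta table.
raw = spark.read.json("s3://example-bucket/raw/events/")
raw.write.format("delta").mode("append").save("s3://example-bucket/bronze/events/")

# Silver layer: de-duplicate and enrich for downstream consumption.
bronze = spark.read.format("delta").load("s3://example-bucket/bronze/events/")
silver = (
    bronze.dropDuplicates(["event_id"])
          .withColumn("ingested_at", F.current_timestamp())
)
silver.write.format("delta").mode("overwrite").save("s3://example-bucket/silver/events/")
```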
Beyond hands-on data engineering work, he has published a number of works in the field, ranging from the integration of Delta Lake for data interoperability to ethical AI and the governance of innovation through quantum-safe algorithms. This focus on both the practice and the theory of data engineering reflects a desire to push the cloud and data management discipline forward.
Looking forward, the future of cloud and data engineering is set to be shaped by emerging technologies like serverless architectures and AI-driven analytics. With firsthand experience working with platforms like AWS and Databricks, Abhijit Joshi envisions a landscape where edge computing and real-time analytics take center stage. Organizations that can harness these technologies will be well-positioned to stay ahead in an increasingly competitive market, and those leading the charge in cloud optimization will undoubtedly play a crucial role in shaping this future.
This content is produced by Payal Sharma.