Databricks on Google Cloud
How this exciting development will simplify your modern data platform across cloud vendors
By DJ Maley / 17 Feb 2021 / Topics: Digital transformation Cloud Data and AI
In the last few years, Databricks has been making waves in the world of data and Artificial Intelligence (AI) by providing a platform for solving innovative data challenges. These challenges range from massive Hadoop migrations (shifting expensive, on-premises, legacy vendors into a more agile and cost-effective cloud framework), to creating highly dynamic and scalable data science environments — all while allowing teams to collaborate in order to drive business outcomes and value.
Databricks has been a focal point of many modern data architectures implemented in the cloud today. The demand for Databricks is only growing, leading to an exciting announcement about the debut of Databricks on Google Cloud.
Databricks provides the capabilities for a single platform experience within organizations, creating an avenue for many different types of users to leverage scalable compute, efficient data storage, and collaborative development and machine learning experimentation.
It provides a workspace for data engineers to transform and store unlimited amounts of data in cloud storage through batch-based processing or streaming Extract, Transform, Load (ETL) jobs. Data analysts can query that data with user-friendly interfaces for finding quick insights, exploring data or connecting their own Business Intelligence (BI) tools for driving reporting outcomes. Data scientists can use the platform to explore data, create and experiment with machine learning models, and build iterative MLOps processes for driving change and innovation.
The fact that these services are now offered in Google Cloud, as a fully managed, integrated Software as a Service (SaaS) solution, provides a first-class experience for organizations to innovate with data and AI. Let's evaluate a few different scenarios where Databricks on Google Cloud can enable that further.
Following this announcement, Databricks on Google Cloud supports several integrations within the Google Cloud Platform (GCP) ecosystem. Databricks integrates with GCP data services such as Google Cloud Storage, Google Cloud SQL, Google Pub/Sub and Google BigQuery. Additional integrations for end-to-end analytics and ML include Looker and the Google AI Platform.
Integration with Google Cloud Identity for Single Sign-On (SSO) and credential passthrough helps you onboard your user base into the environment within your existing infrastructure. These native integrations let users spend their time driving value. They also let you embed Databricks in a platform built on existing GCP services, or make it a component of a future architecture that uses those services.
With these integrations in place within Google Cloud, organizations can take advantage of the next evolution in open data architectures: the Lakehouse, a new data architecture that combines some of the best elements of scalable data lakes and the more familiar business-driven data warehouses.
The capabilities of the architecture allow for redesigning data warehouses for the modern world, or for breaking down the data silos and hurdles present across business groups or data science teams. The goal is to make an organization agile in its data ingestion, storage, processing and analysis, so that it can reach insights and business value quickly.
Here are a few key drivers we at Insight see for the Lakehouse concept — and how Databricks on Google Cloud works to address these problems.
There are other benefits to the Lakehouse concept as well, but in my experience, these are the ones that resonate most when combining the best of the data lake and data warehouse architectures.
One of the highlights of this announcement is that Databricks is now offered on Amazon Web Services (AWS), Azure and GCP as a fully managed solution. The impacts of this are tremendous in a world where many organizations are opting for multicloud approaches.
Some organizations are building analytics capabilities that avoid vendor lock-in. That is, they are opting for technologies and platforms that are built on open source, or that scale across different clouds, to allow for flexibility and agility when developing complex solutions. With Databricks offering the same experience across three of the largest cloud platforms, it just got easier for customers to migrate from one cloud to another, or to run their big data and data science workloads in the cloud of their choice.
With integrations supported across all three clouds' services as well, business units have the flexibility to choose where they want to work. A common example we're starting to see is data being hosted in one cloud while the data science team prefers to work in another, like GCP, and processes all of its ML models there. Migrating from an existing Databricks workspace in another cloud into a Databricks workspace on GCP is a much easier task, since the experience is the same and much of the existing work does not need to be re-platformed or recoded.
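To make the "same experience" point a little more concrete, here is a minimal sketch of one way a team might keep job code identical across clouds by isolating the storage location in a single place. This is an illustration under assumptions, not a Databricks API: the `LakeLocation` class, its field names and the bucket names are hypothetical. Only the URI schemes (`s3://`, `gs://` and `abfss://`) are the standard ones for Amazon S3, Google Cloud Storage and Azure Data Lake Storage Gen2.

```python
from dataclasses import dataclass


# Hypothetical helper for illustration only; not part of any Databricks library.
@dataclass(frozen=True)
class LakeLocation:
    cloud: str          # "aws", "azure" or "gcp"
    container: str      # bucket name (AWS/GCP) or container name (Azure)
    account: str = ""   # Azure storage account; unused on AWS and GCP

    def uri(self, path: str) -> str:
        """Return the cloud-specific storage URI for a relative path."""
        path = path.lstrip("/")
        if self.cloud == "aws":
            return f"s3://{self.container}/{path}"
        if self.cloud == "gcp":
            return f"gs://{self.container}/{path}"
        if self.cloud == "azure":
            if not self.account:
                raise ValueError("Azure locations need a storage account name")
            host = f"{self.account}.dfs.core.windows.net"
            return f"abfss://{self.container}@{host}/{path}"
        raise ValueError(f"unsupported cloud: {self.cloud!r}")


# Switching clouds then means changing one configuration value,
# while the rest of the job code stays the same.
source = LakeLocation(cloud="gcp", container="example-lake")
print(source.uri("sales/2021"))
```

In a notebook, the resulting string could then be passed to an ordinary Spark read such as `spark.read.format("delta").load(source.uri("sales/2021"))`, assuming the workspace already has access to that bucket.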
There's a lot to consider with how organizations can choose to utilize Databricks, and this post barely scratches the surface.
There's a reason Databricks has gained so much momentum and is becoming a large player in the data and AI space. Its ability to work across the three major clouds will bring even more flexibility and ease to solving these challenges, which is how Databricks became popular in the first place.