Cloud computing is booming and has become the foundation for digital businesses.
Today, it is difficult to find an organization that does not use cloud services. Whether it is web servers, development tools, operating systems, data storage, or individual application capabilities, the cloud offers it all.
Recent cloud adoption figures from Gartner, Inc. tell the same story: worldwide public cloud services revenue is forecast to reach nearly $331.2 billion in 2022, growing at roughly three times the rate of overall IT services.
In short, the cloud looks set to keep growing for the foreseeable future.
As cloud usage increases, organizations also need to understand how their systems in the cloud operate. Enterprises are therefore looking for ways to process and manage incoming data in real time so they can gain better insights and make better business decisions.
This is exactly where cloud-based data analytics solutions come in: they apply advanced analysis techniques to turn data into clear visualizations that can be synchronized and shared with key employees.
Data analytics: turning enterprise data into value
Several leading vendors are shaping the data analytics market. In this post, we cover five of the major players: Alibaba Cloud, Amazon Web Services (AWS), Google Cloud, IBM, and Microsoft Azure. Each has its own strengths, and the best choice depends on your organization’s goals. We compare them to help you choose the right option for your situation.
-
Alibaba Data Lake Analytics
Alibaba Data Lake Analytics (DLA) is a serverless, high-performance, interactive query service that quickly collects, stores, and processes flowing data and turns it into actionable insights.
With DLA, customers can run complex analytics on data in different formats from multiple sources to develop new insights. Users can also reliably process millions of events to make real-time decisions in areas such as social analytics and fraud detection.
Supported language elements: It supports standard SQL and works with BI tools for data analysis.
Integration and input sources: It can connect to multiple data sources once the relevant configuration settings are in place.
Price: Alibaba’s analytics service is billed according to actual usage.
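To make the idea of running standard SQL over data in different formats more concrete, here is a minimal Python sketch. It is not DLA’s API: it uses Python’s built-in `sqlite3` as a stand-in engine, and unlike DLA, which queries data where it is stored, it first loads a hypothetical CSV file and a JSON-lines log into in-memory tables. The table names, columns, and sample rows are made up for illustration.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample inputs: a CSV export and a JSON-lines log,
# standing in for files that might sit in object storage.
ORDERS_CSV = """order_id,customer_id,amount
1,c1,120.0
2,c2,35.5
3,c1,60.0
"""
CUSTOMERS_JSONL = """{"customer_id": "c1", "region": "east"}
{"customer_id": "c2", "region": "west"}
"""


def load_sources(conn: sqlite3.Connection) -> None:
    """Load both formats into SQL tables so a single query can span them."""
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id TEXT, amount REAL)")
    conn.execute("CREATE TABLE customers (customer_id TEXT, region TEXT)")
    rows = csv.DictReader(io.StringIO(ORDERS_CSV))
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(int(r["order_id"]), r["customer_id"], float(r["amount"])) for r in rows],
    )
    docs = [json.loads(line) for line in CUSTOMERS_JSONL.splitlines() if line.strip()]
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(d["customer_id"], d["region"]) for d in docs],
    )


def revenue_by_region(conn: sqlite3.Connection):
    """Standard SQL join plus aggregate across the two sources."""
    return conn.execute(
        """SELECT c.region, SUM(o.amount) AS revenue
           FROM orders o JOIN customers c ON o.customer_id = c.customer_id
           GROUP BY c.region
           ORDER BY c.region"""
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_sources(conn)
print(revenue_by_region(conn))  # [('east', 180.0), ('west', 35.5)]
```

Because both sources end up behind the same SQL interface, one `JOIN` answers a question that neither file could answer on its own.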
See the diagram of Alibaba’s Data Lake Analytics in the image below:

Image: Alibaba
Read reviews of Data Lake Analytics: Gartner
-
AWS Kinesis Data Analytics
Amazon’s Kinesis Data Analytics is a massively scalable and durable real-time service for data ingestion, analysis, and delivery. It can continuously collect gigabytes of data per second from multiple sources.
Users can capture large volumes of data within milliseconds to tackle streaming data problems, such as anomaly detection and dynamic pricing, as quickly as possible.
With Kinesis, data consumers can solve a wide variety of data streaming problems. Typically, Kinesis streams load aggregated data, such as application logs, IoT telemetry, website clickstreams, and social media streams, into data warehouses or data lakes (AWS data stores), providing durability and elasticity.
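Kinesis streams scale by spreading records across shards. AWS documents that a record’s partition key is hashed with MD5 into a 128-bit integer, and each shard owns a contiguous range of those hash values. The sketch below illustrates that routing in plain Python; it assumes the key space is split evenly across shards, and the function names and sensor keys are hypothetical rather than part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit integers


def shard_ranges(num_shards):
    """Split the hash key space into equal, contiguous ranges.

    Assumption: the stream was provisioned with evenly sized shards.
    """
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Hash the partition key with MD5 and return the index of the owning shard."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for shard_id, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return shard_id
    raise ValueError("hash key outside all shard ranges")


ranges = shard_ranges(4)
for key in ["sensor-1", "sensor-2", "sensor-1"]:
    print(key, "->", shard_for_key(key, ranges))
```

Because the hash is deterministic, all records with the same partition key land on the same shard, which preserves per-key ordering; a skewed key distribution, however, can overload a single shard.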
Supported language elements: It uses standard SQL, with some extensions for operating on streaming data.
Integration and input sources: It accepts input from Kinesis Data Streams and Kinesis Data Firehose delivery streams, and the results can be analyzed with BI (Business Intelligence) tools.
Price: The price of Kinesis Data Analytics depends on the volume of data you ingest, store and consume through the service.
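The streaming extensions to SQL mentioned above mostly exist to express windowed aggregations. The following Python sketch shows what a tumbling (fixed, non-overlapping) window count computes. It is not Kinesis Data Analytics syntax, it works on a finished list rather than an unbounded stream, and the click events are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) pairs.
    Returns a dict mapping (window_start, key) to the event count.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Each timestamp belongs to exactly one window.
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


clicks = [(0, "home"), (15, "cart"), (59, "home"), (60, "home"), (75, "cart")]
print(tumbling_window_counts(clicks, 60))
# {(0, 'home'): 2, (0, 'cart'): 1, (60, 'home'): 1, (60, 'cart'): 1}
```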
The image below shows the high-level architecture of Kinesis:

Image: Amazon
Read reviews of Kinesis Data Analytics: Gartner.
-
Google Cloud Dataflow
Google’s Cloud Dataflow is a serverless, highly efficient, fully managed service that lets you process huge amounts of data and analyze it in real time. This helps you derive insights and compute meaningful analytics over your streaming data.
With Dataflow, users can run analytics efficiently, build multi-step processing pipelines, monitor their execution, and receive advanced alerts so they can identify and respond quickly to complex issues.
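A multi-step processing pipeline chains small transforms, each consuming the previous step’s output. The sketch below shows that shape with plain Python generators: parse, filter, aggregate, then alert. It only mirrors the idea behind Dataflow’s Apache Beam pipelines and does not use the Beam API; it processes a bounded list instead of a stream, and the log records, field names, and alert threshold are assumptions.

```python
import json


def parse(lines):
    """Step 1: decode JSON records, skipping malformed lines."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def keep_errors(records):
    """Step 2: keep only error-level log records."""
    for rec in records:
        if rec.get("level") == "error":
            yield rec


def count_by_service(records):
    """Step 3: aggregate error counts per service."""
    counts = {}
    for rec in records:
        counts[rec["service"]] = counts.get(rec["service"], 0) + 1
    return counts


def alerts(counts, threshold):
    """Step 4: list every service whose error count reaches the threshold."""
    return sorted(svc for svc, n in counts.items() if n >= threshold)


raw = [
    '{"service": "checkout", "level": "error"}',
    '{"service": "checkout", "level": "error"}',
    '{"service": "search", "level": "info"}',
    "not json",
    '{"service": "search", "level": "error"}',
]
counts = count_by_service(keep_errors(parse(raw)))
print(counts)             # {'checkout': 2, 'search': 1}
print(alerts(counts, 2))  # ['checkout']
```

Keeping each step small makes it easy to test in isolation and to reorder or extend the pipeline later.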
Supported language elements: It can connect to various types of data sources and supports Java, Python, and Scala, with other languages to follow. It also supports SQL queries through Google BigQuery.
Integration and input sources: Dataflow supports streaming input from sources such as Cloud Storage and Pub/Sub. Through Apache Beam, it also integrates with Apache Kafka.
Price: The price of Google Dataflow varies with the resources and services you use.
See how Google Dataflow transforms data in the image below:

Image: Google
Read reviews of Google Dataflow: Gartner.
-
IBM Streaming Analytics
IBM’s Streaming Analytics enables users to extract value from data in motion, reduce infrastructure costs, and get faster insights and alerts. It is designed to help companies of all sizes by handling millions of events and messages per second.
Users can combine information from different applications, for example transaction processing systems, to spot threats and opportunities and make real-time decisions.
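Combining information from different applications often means joining each incoming event with reference data from another system. The minimal sketch below, written in plain Python rather than SPL or any IBM API, flags transactions by looking up a hypothetical account profile that holds an average spend and a watch-list flag; the field names and the five-times-average rule are assumptions chosen for illustration.

```python
def enrich_and_flag(transactions, account_profiles, multiplier=5.0):
    """Join each transaction with reference data about its account and
    return (tx_id, reason) for every transaction that looks risky.

    transactions: iterable of dicts with 'tx_id', 'account', 'amount'.
    account_profiles: dict mapping account -> {'avg_amount': float, 'watchlisted': bool}.
    """
    flagged = []
    for tx in transactions:
        profile = account_profiles.get(tx["account"])
        if profile is None:
            flagged.append((tx["tx_id"], "unknown account"))
        elif profile["watchlisted"]:
            flagged.append((tx["tx_id"], "watchlisted account"))
        elif tx["amount"] > multiplier * profile["avg_amount"]:
            flagged.append((tx["tx_id"], "unusually large amount"))
    return flagged


profiles = {
    "A": {"avg_amount": 50.0, "watchlisted": False},
    "B": {"avg_amount": 20.0, "watchlisted": True},
}
txs = [
    {"tx_id": 1, "account": "A", "amount": 40.0},
    {"tx_id": 2, "account": "A", "amount": 400.0},
    {"tx_id": 3, "account": "B", "amount": 10.0},
    {"tx_id": 4, "account": "C", "amount": 5.0},
]
print(enrich_and_flag(txs, profiles))
```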
Supported language elements: It supports applications written in SPL (Streams Processing Language), Java, Scala, and Python, as well as Apache Beam pipelines.
Integration and input sources: The solution ingests data from a variety of sources, including IBM Event Streams, HTTP, and Internet of Things (IoT), and connects with data streaming sources and systems configured to perform analytics.
Price: IBM Streaming Analytics is offered as part of IBM’s cloud services; detailed pricing is available on IBM’s website.
See the infrastructure of IBM Streaming Analytics in the image below:

Image: IBM
Read reviews of Streaming Analytics: Gartner.
-
Microsoft Azure Stream Analytics
Microsoft’s Azure Stream Analytics is a popular, fully managed, real-time analytics service for complex event processing. It enables users to unlock actionable insights from a wide range of data.
Users can examine huge volumes of data that manual analysis would miss. They can also easily detect anomalies such as spikes and dips, and predict positive or negative trends through online learning and scoring models. The results can be stored for later investigation, or the detected patterns can be used to trigger quick action.
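Spike and dip detection can be illustrated with a rolling z-score: compare each new reading with the mean and standard deviation of the readings just before it. This is a deliberately simple stand-in, not the machine-learning model behind Azure Stream Analytics’ own anomaly detection; the window size, threshold, and sample readings are assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def spikes_and_dips(values, window=5, z_threshold=3.0):
    """Label each value 'spike', 'dip', or None by comparing it with the
    mean and standard deviation of the preceding `window` values."""
    history = deque(maxlen=window)  # only the most recent readings are kept
    labels = []
    for v in values:
        label = None
        if len(history) == window:  # wait until the window is full
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0:
                z = (v - mu) / sigma
                if z >= z_threshold:
                    label = "spike"
                elif z <= -z_threshold:
                    label = "dip"
        labels.append(label)
        history.append(v)
    return labels


readings = [10, 11, 10, 9, 10, 30, 10, 11, 10, 9, 10, -5]
print(spikes_and_dips(readings))
```

Note that a spike stays in the history window for a while and inflates the standard deviation, which temporarily makes the detector less sensitive; production services use more robust models for this reason.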
Streaming platform: Azure Event Hubs.
Supported language elements: It uses a simple SQL-based query language that can be extended with JavaScript user-defined functions (UDFs) or user-defined aggregates, enabling users to perform complex business calculations.
Integration and input sources: It can connect to multiple IoT devices and supports inputs from Azure Event Hubs, Azure IoT Hub, and Azure Blob Storage.
Outputs: Results can be written to any of the following: Event Hubs, Azure Functions, Service Bus, Azure SQL Database, Cosmos DB, Blob or Table storage, Data Lake, or a streaming Power BI dashboard.
Price: The price of Azure Stream Analytics is based on the number of streaming units a user needs to process the data.
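A user-defined aggregate typically keeps running state, updates it once per event, and produces a result at the end of each group. The following Python sketch shows that pattern with a hypothetical quantity-weighted average price. It is written in Python rather than JavaScript and is not the Azure Stream Analytics UDA interface, so the class and method names are illustrative only.

```python
class WeightedAverage:
    """Hypothetical user-defined aggregate: the average of values weighted by
    a per-event weight (for example, price weighted by quantity)."""

    def __init__(self):
        self.total = 0.0
        self.weight = 0.0

    def accumulate(self, value, weight):
        # Update the running state once per event.
        self.total += value * weight
        self.weight += weight

    def compute_result(self):
        # Return None for an empty group instead of dividing by zero.
        return self.total / self.weight if self.weight else None


def aggregate_by_key(events, make_aggregate):
    """Apply a fresh aggregate instance to each key group.

    events: iterable of (key, value, weight) tuples.
    """
    groups = {}
    for key, value, weight in events:
        groups.setdefault(key, make_aggregate()).accumulate(value, weight)
    return {key: agg.compute_result() for key, agg in sorted(groups.items())}


trades = [("ABC", 10.0, 1), ("ABC", 20.0, 3), ("XYZ", 15.0, 2)]
print(aggregate_by_key(trades, WeightedAverage))  # {'ABC': 17.5, 'XYZ': 15.0}
```

Passing the aggregate class as a factory lets the same grouping code reuse any aggregate that exposes `accumulate` and `compute_result`.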
See the roles of the services within the architecture in the image below:

Image: Microsoft
Read reviews of Azure Stream Analytics: Gartner.
Feature | Azure Stream Analytics | AWS Kinesis Data Analytics | Google Cloud Dataflow | IBM Streaming Analytics | Alibaba Data Lake Analytics |
---|---|---|---|---|---|
Programmability | Stream analytics query language, JavaScript | Data analytics query language, standard SQL | Java, Python and a distributed compute platform | Java, Scala and Python | Standard SQL |
Programming model | Declarative | Flink programming model, Declarative | Apache Beam | Streams Processing Language (SPL), Declarative | Declarative |
Pricing model | Streaming units | Hourly rate based on the average number of Kinesis Processing Units (KPUs) | Based on Google Compute Engine (GCE) costs plus an additional charge per vCPU per minute | Subscription based | Based on the number of bytes scanned |
Inputs | Azure Event Hubs, Azure IoT Hub, Azure Blob storage | Data sources through SQL JOINS: Streaming data sources like Kinesis Data Streams and reference data sources like Amazon S3 | Cloud Storage and Pub/Sub | File, Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) | Alibaba Cloud Object Storage Service (OSS), PostgreSQL, MySQL, NoSQL (Table Store) and ApsaraDB, using DLA and Quick BI |
Sinks | Azure Data Lake Store, Azure SQL Database, Storage Blobs, Event Hubs, Power BI, Table Storage, Service Bus Queues, Service Bus Topics, Cosmos DB, Azure Functions | Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, Amazon DynamoDB, and Amazon S3 (through file sink integrations) | Cloud Storage, BigQuery, Bigtable, Pub/Sub, Datastore, etc. | TCP network connection, UDP network connection and User-defined Sink Operator | NA |
Built-in temporal/windowing support | Yes | Yes | Yes | Yes | NA |
Input data formats | Avro, JSON or CSV, UTF-8 encoded | JSON, CSV, and TSV | Avro, CSV, JSON | JSON | JSON, Vector and other multimedia resources |
Scalability | Query partitions | Shards | Shards | Horizontal partitions | Horizontal partitions |
Late arrival and out of order event handling support | Yes | Yes | Yes | NA | Yes |
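The last row of the table refers to late and out-of-order events. A common approach is a watermark: the stream’s estimate of how far event time has progressed, here taken as the largest event time seen so far minus a tolerance for delay. The sketch below accepts events that arrive within that tolerance, sorts them by event time, and sets aside anything that arrives too late. Each service configures this differently, so treat the `max_delay` parameter and the set-aside policy as assumptions.

```python
def split_late_events(arrivals, max_delay):
    """Separate events that arrive too far behind the stream's progress.

    arrivals: (event_time, payload) pairs in the order they were received.
    The watermark is the largest accepted event time minus `max_delay`;
    an event whose time is below the current watermark is treated as late.
    Returns (accepted events sorted by event time, late events).
    """
    watermark = float("-inf")
    max_seen = float("-inf")
    accepted, late = [], []
    for event_time, payload in arrivals:
        if event_time < watermark:
            late.append((event_time, payload))
            continue
        # Out-of-order but within tolerance: keep it and advance the watermark.
        accepted.append((event_time, payload))
        max_seen = max(max_seen, event_time)
        watermark = max_seen - max_delay
    accepted.sort(key=lambda e: e[0])
    return accepted, late


arrivals = [(1, "a"), (3, "b"), (2, "c"), (10, "d"), (4, "e"), (9, "f")]
accepted, late = split_late_events(arrivals, max_delay=3)
print(accepted)  # [(1, 'a'), (2, 'c'), (3, 'b'), (9, 'f'), (10, 'd')]
print(late)      # [(4, 'e')]
```

A larger `max_delay` tolerates more disorder but delays results, because a window cannot be finalized until the watermark passes its end.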
Because competition in the cloud data analytics space is intense, organizations find it increasingly difficult to choose among many options that offer similar services. We’ve tried to make the choice easier for you. Go through the comparison and tell us in the comments section which one you like most.
Disclaimer: The information contained in this article is for general information purposes only. Price and product information are subject to change. This information has been sourced from the websites and relevant resources of the named vendors available in the public domain as of 26 November 2019. Wire19 News makes its best endeavors to ensure that the information is accurate and up to date; however, it does not warrant or guarantee that anything written here is 100% accurate, timely, or relevant to website visitors.