Streaming data is becoming the next wave in data analytics and machine learning. The main reason is that processing large volumes of data is no longer sufficient on its own: businesses also need to process that data quickly and draw insights from it in real time so that they can react to a changing environment as it changes.
The shift to cloud computing requires stream processing engines to be highly scalable and fault-tolerant. Cloud-based data stream processing systems, in particular, are built to scale dynamically to hundreds of computing nodes and to cope with diverse workloads automatically.
Recognizing the importance of data streaming and the growing variety of use cases, organizations are adopting hybrid platforms so that they can leverage the advantages of both batch and streaming data analytics.
To help enterprises choose the best data streaming service, we have compiled a list of the most feature-rich tools for you and your business.
Alibaba Cloud
DataHub by Alibaba Cloud is a real-time data distribution platform designed to process streaming data in the cloud. Its key features are the ability to publish, subscribe to and distribute streaming data. It also lets you build applications on top of streaming data and analyze that data easily.
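The publish, subscribe and distribute model can be illustrated with a minimal in-memory sketch. This is not the DataHub SDK: the `Topic` class and its `publish`, `subscribe` and `poll` methods are hypothetical names chosen for illustration, and a real service adds persistence, sharding and network transport.

```python
from collections import defaultdict


class Topic:
    """Hypothetical in-memory topic; not the DataHub API."""

    def __init__(self, name):
        self.name = name
        self.records = []                # append-only log of published records
        self.cursors = defaultdict(int)  # next unread position per subscriber

    def publish(self, record):
        """Append a record and return its sequence number."""
        self.records.append(record)
        return len(self.records) - 1

    def subscribe(self, subscriber_id):
        """Register a subscriber that starts reading at the current end of the log."""
        self.cursors[subscriber_id] = len(self.records)

    def poll(self, subscriber_id, max_records=10):
        """Return unread records for one subscriber and advance its cursor."""
        start = self.cursors[subscriber_id]
        batch = self.records[start:start + max_records]
        self.cursors[subscriber_id] = start + len(batch)
        return batch


topic = Topic("clicks")
topic.subscribe("dashboard")
topic.subscribe("archiver")
topic.publish({"user": "u1", "page": "/home"})
topic.publish({"user": "u2", "page": "/cart"})

# Each subscriber keeps its own cursor, so both receive every record.
dashboard_batch = topic.poll("dashboard")
archiver_batch = topic.poll("archiver")
```

Because every subscriber tracks its own position in the log, one published record can be distributed to several independent consumers, for example a live dashboard and an archiving job.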
DataHub offers high availability, low latency, high throughput and high scalability. It can also deliver streaming data to other cloud services such as MaxCompute and OSS. Pricing is based on the resources you actually use.
See the architecture of the Alibaba big data demo system below.

Source: Alibaba Cloud
In the figure, the architecture comprises a data source system, a data warehouse, a big data platform, a web/app platform, process scheduling, data processing and a real-time data streaming platform. Here, real-time data is processed through DataHub + StreamCompute.
With this setup, a variety of processing results are produced in real time, including real-time charts, statistics and other information. Overall, Alibaba’s DataHub is a good fit if you want to stream complex data.
Concepts | Alibaba Cloud DataHub |
---|---|
Data Warehouse | MaxCompute |
Data Retention | Default – 24 hours |
SDK Support | MaxCompute Tunnel SDK |
Configuration | Writer plug-in |
Real-time Store | ApsaraDB |
Cost | Pay-As-You-Go |
AWS
AWS Kinesis processes data in real time. Its key built-in capability is processing hundreds of terabytes of streaming data per hour. It can also simplify the development of applications that make real-time decisions about business operations using streaming data.
AWS Kinesis consists of key concepts for stream storage and an API for implementing data producers and data consumers. A data producer sends records to a stream as they are generated, and a data consumer retrieves records from the stream as they arrive.
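The producer and consumer roles can be sketched in memory as follows. Kinesis routes each record to a shard by taking the MD5 hash of its partition key and matching the resulting 128-bit value against the shards' hash key ranges. The sketch assumes those ranges are equal-sized, and `InMemoryStream` is a hypothetical stand-in, not the AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits wide


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming equal-sized hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


class InMemoryStream:
    """Hypothetical stand-in for a stream; not the AWS SDK."""

    def __init__(self, shard_count):
        self.shards = [[] for _ in range(shard_count)]

    def put_record(self, partition_key, data):
        """Producer side: append the record to the shard chosen by its key."""
        index = shard_for_key(partition_key, len(self.shards))
        self.shards[index].append((partition_key, data))
        return index

    def get_records(self, shard_index, position=0):
        """Consumer side: read one shard in order from a given position."""
        return self.shards[shard_index][position:]


stream = InMemoryStream(shard_count=4)
first = stream.put_record("sensor-17", b"21.5")
second = stream.put_record("sensor-17", b"21.7")

# Records with the same partition key land on the same shard, in order.
records = stream.get_records(first)
```

Keeping all records for one key on one shard is what lets a consumer read them in the order they were produced.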
AWS charges per hour for each stream partition (shard) and per volume of data that flows through the stream.
See the diagram below summarizing key concepts of Amazon Kinesis.

Source: AWS
In terms of features, the AWS SDK for Amazon Kinesis supports Android, Java, Go and .NET. In terms of durability, Kinesis synchronously replicates each record across three Availability Zones. However, its configuration options are limited to the retention period (in days) and the number of shards.
Concepts | AWS Kinesis |
---|---|
Data Warehouse | Athena, Redshift |
Data Retention | Default – 24 hours, 1-7 days (maximum 7 days) |
SDK Support | AWS SDK supports Android, Java, Go, .NET |
Configuration | Days/Shards |
Real-time Store | Amazon DynamoDB |
Cost | Pay-As-You-Go |
Azure
Stream Analytics by Azure is a fully managed event-processing engine for real-time analytics on one or more data streams from sources such as social media, sensors, web data sources and other applications. It delivers low latency, high throughput and high scalability.
Stream Analytics is built on a pull-based communication model that offers built-in recovery and checkpointing. The service can also protect data from downstream failures. It supports two input types, stream data and reference data, with Azure Event Hubs and Azure Blob Storage as source types.
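As a rough illustration of how pull-based reading, checkpointing and recovery fit together, here is a minimal sketch. It does not reflect how Azure implements these features internally: `CheckpointedReader` and its parameters are hypothetical, and the checkpoint is held in memory rather than in durable storage.

```python
class CheckpointedReader:
    """Hypothetical pull-based reader with checkpoints; not the Azure API."""

    def __init__(self, source, checkpoint_every=3):
        self.source = source                    # any indexable sequence of events
        self.checkpoint_every = checkpoint_every
        self.checkpoint = 0                     # offset saved at the last checkpoint
        self.position = 0                       # offset of the next event to pull

    def pull(self):
        """Pull the next event, or return None when the source is exhausted."""
        if self.position >= len(self.source):
            return None
        event = self.source[self.position]
        self.position += 1
        if self.position % self.checkpoint_every == 0:
            self.checkpoint = self.position     # record progress (in memory here)
        return event

    def recover(self):
        """Simulate a restart: resume from the last checkpoint."""
        self.position = self.checkpoint


events = ["e0", "e1", "e2", "e3", "e4", "e5", "e6"]
reader = CheckpointedReader(events, checkpoint_every=3)

processed = [reader.pull() for _ in range(5)]  # e0..e4; checkpoint taken at offset 3
reader.recover()                               # failure after e4: rewind to offset 3
replayed = [reader.pull() for _ in range(2)]   # e3 and e4 are delivered again
```

In this sketch, events read after the last checkpoint are replayed on recovery, so downstream steps would see them at least once and need to tolerate duplicates.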
The diagram summarizes how data is received, analyzed and sent for other actions in Stream Analytics.

Source: Microsoft
Event Hubs can ingest millions of events per second in various formats and pass them to Stream Analytics. Blob Storage can also store data and feed it to Stream Analytics for processing. Currently, Stream Analytics is billed by the volume of data processed and the number of streaming units used.
Concepts | Azure Stream Analytics |
---|---|
Data Warehouse | Azure SQL |
Data Retention | - |
SDK Support | Management .NET SDK |
Configuration | - |
Real-time Store | Azure CosmosDB |
Cost | Pay-As-You-Go |
Google Cloud
Cloud Dataflow is a managed data processing service that uses data pipelines to ingest, transform and analyze both real-time and batch data. It is based on Apache Beam and supports Python and Java jobs.
In Dataflow, events pass through three steps: validation, enrichment and ingestion. The service streams, processes and stores over 120,000 events per second with very low latency. Every incoming event is validated and written to partitioned tables in BigQuery.
See the Dataflow stream and batch processing flow below.
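The three steps can be sketched in plain Python, without the Apache Beam SDK, to show how the stages chain together. The event fields (`user`, `ts`), the function names and the dictionary standing in for a date-partitioned table are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def validate(events):
    """Keep only events that carry the fields the later steps need."""
    for event in events:
        if isinstance(event.get("user"), str) and isinstance(event.get("ts"), (int, float)):
            yield event


def enrich(events):
    """Add a UTC date string derived from the epoch timestamp."""
    for event in events:
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).strftime("%Y-%m-%d")
        yield {**event, "day": day}


def ingest(events, table):
    """Write each event into the partition for its day; return the row count."""
    count = 0
    for event in events:
        table[event["day"]].append(event)
        count += 1
    return count


raw_events = [
    {"user": "u1", "ts": 0},       # 1970-01-01 UTC
    {"user": "u2", "ts": 86400},   # 1970-01-02 UTC
    {"ts": 86401},                 # invalid: no user, dropped by validate
]
partitioned_table = defaultdict(list)  # stand-in for a date-partitioned table
written = ingest(enrich(validate(raw_events)), partitioned_table)
```

Each stage is a generator, so events flow through one at a time rather than being collected into a full batch first, which is the same shape a streaming pipeline has.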

Source: Google
Google Cloud Dataflow is a great choice for organizations that want to run production-level data processing in the cloud. Users are charged in per-second increments based on actual use of the service. Any additional Google Cloud resource consumption is billed according to that service's pricing.
Concepts | Google Dataflow |
---|---|
Data Warehouse | BigQuery |
Data Retention | - |
SDK Support | Apache Beam SDK |
Configuration | - |
Real-time Store | Cloud Bigtable |
Cost | Based on the actual use of Dataflow batch or streaming workers |
IBM Cloud
IBM Streaming Analytics can manage high data rates and perform analysis with low latency. It can be used to ingest, analyze and monitor data coming from real-time data sources. With IBM Streams, companies can view information and events as they unfold.
The image below summarizes the architecture of IBM Streaming Analytics.

Source: IBM
The architecture offers a dynamic approach to resource allocation: organizations define the maximum number of nodes they want to use in their environment, and the service scales up or down accordingly. This ensures that a company pays only for the resources it uses while still being able to monitor, manage and make informed decisions with little effort.
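The scaling rule described here, a node count that follows the load but never exceeds a configured maximum, can be expressed as a small function. This is a sketch of the general idea only: `target_nodes`, the per-node capacity figure and the load numbers are hypothetical and do not come from IBM's service.

```python
import math


def target_nodes(events_per_second, per_node_capacity, max_nodes, min_nodes=1):
    """Return the node count needed for the load, clamped to [min_nodes, max_nodes]."""
    if per_node_capacity <= 0:
        raise ValueError("per_node_capacity must be positive")
    needed = math.ceil(events_per_second / per_node_capacity)
    return max(min_nodes, min(needed, max_nodes))


# Hypothetical loads against a cap of 8 nodes, each handling 1,000 events/s.
quiet = target_nodes(events_per_second=500, per_node_capacity=1000, max_nodes=8)
busy = target_nodes(events_per_second=4200, per_node_capacity=1000, max_nodes=8)
spike = target_nodes(events_per_second=50000, per_node_capacity=1000, max_nodes=8)
```

The cap bounds cost during a spike, at the price of a backlog building up once the load exceeds what the maximum number of nodes can handle.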
Concepts | IBM Streaming Analytics |
---|---|
Data Warehouse | IBM Db2 Warehouse |
Data Retention | - |
SDK Support | Eclipse SDK |
Configuration | - |
Real-time Store | IBM Cloud Object Storage |
Cost | Based on instance per hour |
The time is NOW!
Streaming data architecture is constantly evolving. So, before rushing to pick any of these solutions, it is important to get a deep understanding and a clear picture of your existing systems. Note that each of them is great at what it does in its own way.
The question, however, is which one is right for you. To answer it, go through the features of each and see which best suits your use case and available resources.
Brief comparison: Alibaba Cloud vs AWS vs Azure vs Google Cloud vs IBM Cloud
Concepts | Alibaba Cloud | AWS | Azure | Google Cloud | IBM Cloud |
---|---|---|---|---|---|
Data Warehouse | MaxCompute | Athena, Redshift | Azure SQL | BigQuery | IBM Db2 Warehouse |
Data Retention | Default – 24 hours | Default – 24 hours, 1-7 days (maximum 7 days) | - | - | - |
SDK Support | MaxCompute Tunnel SDK | AWS SDK supports Android, Java, Go, .NET | Management .NET SDK | Apache Beam SDK | Eclipse SDK |
Configuration | Writer plug-in | Days/Shards | - | - | - |
Real-time Store | ApsaraDB | Amazon DynamoDB | Azure CosmosDB | Cloud Bigtable | IBM Cloud Object Storage |
Cost | Pay-As-You-Go | Pay-As-You-Go | Pay-As-You-Go | Based on the actual use of Dataflow batch or streaming workers | Based on instance per hour |
Where did you get your info on the IBM Cloud offering? The info you provided is several years old. For more up to date info please see: https://www.ibm.com/cloud/streaming-analytics
Hi Andy, thanks for the feedback. We have updated the blog and confirmed it with the IBM team. If we have missed anything, please let us know and share the correct link with us.