A data stream management system (DSMS) is a software system that handles
continuous data streams. A data stream is a sequence of rows (tuples), where
each row carries its information in the form of attributes. Transactional data
streams log the interactions between two parties, such as credit card purchases
between consumers and merchants, telecommunications, and web accesses.
Measurement data streams track changes in the state of a system, for example
sensor networks that monitor parking lots, traffic, or weather conditions.

The data models used in DSMS
are relation-based, as in STREAM, TelegraphCQ and Borealis, or object-based, as
in COUGAR and Tribeca. Windows over a stream are classified in several ways: by
the direction in which their endpoints move (fixed, sliding, and landmark
windows); by whether they are time-based or tuple-based; and by their update
interval (eager re-evaluation on each newly arriving tuple, batch processing,
and non-overlapping tumbling windows).

Some of the principles and characteristics of data streams used in
DSMS to manage this streaming process are as follows. Data streams are
volatile, whereas a DBMS maintains persistent data. A DSMS accesses the data
step by step (sequentially) while executing a continuous query. Such a query is
processed and executed continuously until it is removed, and it generates new
outputs as new data comes in. This lets the system react to changing data
immediately. Because the query runs continuously, it cannot use blocking
operators, which must consume the entire input before generating any output.
Instead, a DSMS uses non-blocking operators, which do not need to know the
whole input.

There are many
real-time data stream applications: sensor networks, which support monitoring
applications (for example, raising an alarm when something goes wrong in the
system); network traffic analysis, which lets anyone, anywhere, see the traffic
at any point over the internet at the same time; financial tickers, which help
analyse and track changes in stock prices instantly; and transaction log
analysis, which focuses on streams of web content and telephone calls.
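The window types and the continuous, non-blocking style of query execution
described earlier in this section can be illustrated with a small Python
sketch. It is only an illustration, not how any particular DSMS is implemented.
The (timestamp, value) tuple layout and the function names sliding_average and
tumbling_sum are assumptions made for this example, and tuples are assumed to
arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

# Hypothetical tuple layout: (timestamp in seconds, measured value).
Reading = Tuple[float, float]


def sliding_average(stream: Iterable[Reading],
                    window_seconds: float) -> Iterator[Tuple[float, float]]:
    """Time-based sliding window, evaluated eagerly on every arriving tuple.

    The operator is non-blocking: it emits a result per input tuple and
    never needs to see the whole stream.
    """
    window: Deque[Reading] = deque()
    total = 0.0
    for ts, value in stream:
        window.append((ts, value))
        total += value
        # Evict tuples that have fallen out of the window (ts - w, ts].
        while window and window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)


def tumbling_sum(stream: Iterable[Reading],
                 window_seconds: float) -> Iterator[Tuple[float, float]]:
    """Non-overlapping (tumbling) windows: one result each time a window closes."""
    current_start = None
    total = 0.0
    for ts, value in stream:
        start = (ts // window_seconds) * window_seconds
        if current_start is not None and start != current_start:
            yield current_start, total
            total = 0.0
        current_start = start
        total += value
    if current_start is not None:
        # Only reached for a finite input; a real stream never ends.
        yield current_start, total


readings = [(0, 10.0), (1, 20.0), (2, 30.0), (5, 40.0)]
sliding = list(sliding_average(readings, window_seconds=3))
# -> [(0, 10.0), (1, 15.0), (2, 20.0), (5, 40.0)]
tumbling = list(tumbling_sum(readings, window_seconds=3))
# -> [(0, 60.0), (3, 40.0)]
```

The sliding version produces an output for every input tuple, while the
tumbling version produces one output per window, which matches the difference
between eager and batch-style update intervals.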
GSQL (Gigascope), CQL (STREAM) and EPL (ESPER) are data stream query languages,
based on different data models, that offer operations such as selection,
projection, aggregation and join.
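Rather than guess at the exact syntax of GSQL, CQL or EPL, the operations named
above can be sketched with plain Python generators. This is a minimal
illustration only. The row layout (card, merchant, amount) and the helper names
select, project and running_count are hypothetical, and joins are left out to
keep the sketch short.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def select(stream: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Selection: pass through only the rows that satisfy the predicate."""
    for row in stream:
        if predicate(row):
            yield row


def project(stream: Iterable[Row], attributes: List[str]) -> Iterator[Row]:
    """Projection: keep only the named attributes of each row."""
    for row in stream:
        yield {name: row[name] for name in attributes}


def running_count(stream: Iterable[Row], key: str) -> Iterator[Row]:
    """Aggregation: emit an updated per-key count for every arriving row.

    State grows with the number of distinct keys; a real system would
    bound it with a window.
    """
    counts: Dict[Any, int] = defaultdict(int)
    for row in stream:
        counts[row[key]] += 1
        yield {key: row[key], "count": counts[row[key]]}


# Hypothetical transactional stream of credit card purchases.
purchases = [
    {"card": "A", "merchant": "shop1", "amount": 120.0},
    {"card": "B", "merchant": "shop2", "amount": 15.0},
    {"card": "A", "merchant": "shop3", "amount": 300.0},
]

large = select(purchases, lambda r: r["amount"] > 100)
slim = project(large, ["card", "amount"])
output = list(running_count(slim, "card"))
# -> [{'card': 'A', 'count': 1}, {'card': 'A', 'count': 2}]
```

Because every operator is a generator, each row flows through the whole
pipeline as soon as it arrives, and no operator waits for the end of the input.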
A DSMS uses adaptive query operators and plans, such as query scrambling (for
cases where wide-area data is accessed), eddies, and Borealis.
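The idea behind adaptive operators can be sketched as follows. This is a
deliberately simplified illustration in the spirit of an eddy, not the actual
eddy algorithm: it only reorders filter predicates, using a greedy rule
(filter with the lowest observed pass rate first), whereas real eddies use
their own routing policies and also handle joins. The class name
AdaptiveFilterRouter and the sample filters are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List


class AdaptiveFilterRouter:
    """Routes each row through a set of filters, re-ordering them on the fly."""

    def __init__(self, filters: Dict[str, Callable[[dict], bool]]) -> None:
        self.filters = filters
        self.seen = {name: 0 for name in filters}    # rows each filter evaluated
        self.passed = {name: 0 for name in filters}  # rows each filter let through

    def pass_rate(self, name: str) -> float:
        # A filter with no observations yet is assumed to pass everything.
        return self.passed[name] / self.seen[name] if self.seen[name] else 1.0

    def order(self) -> List[str]:
        # Most selective filter (lowest pass rate) first; ties keep insertion order.
        return sorted(self.filters, key=self.pass_rate)

    def process(self, stream: Iterable[dict]) -> Iterator[dict]:
        for row in stream:
            for name in self.order():
                self.seen[name] += 1
                if self.filters[name](row):
                    self.passed[name] += 1
                else:
                    break  # row rejected; skip the remaining filters
            else:
                yield row  # row passed every filter


router = AdaptiveFilterRouter({
    "cheap": lambda r: r["price"] < 100,
    "big": lambda r: r["volume"] > 1000,
})
ticks = [
    {"price": 50, "volume": 10},
    {"price": 60, "volume": 20},
    {"price": 70, "volume": 5000},
    {"price": 500, "volume": 30},
]
results = list(router.process(ticks))
# results        -> [{'price': 70, 'volume': 5000}]
# router.order() -> ['big', 'cheap']
# router.seen    -> {'cheap': 2, 'big': 4}
```

After the first row the router learns that the "big" filter rejects most rows
and moves it to the front, so the "cheap" filter is evaluated for only two of
the four rows.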
Advantages:
Continuous query processing and execution helps a lot in real-time
applications.
Huge amounts of data are analysed in very little time.
It manages data streams end to end.
A DSMS accepts not only relational queries but also queries for pattern
matching and event processing, which are widely used in real-world
applications.
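A pattern-matching query of the kind mentioned above can be illustrated with a
small sketch: raise an alert when a sensor reports several consecutive readings
above a threshold. The (timestamp, value) layout, the function name
consecutive_over_threshold and the sample numbers are assumptions made for this
example.

```python
from typing import Iterable, Iterator, Tuple


def consecutive_over_threshold(stream: Iterable[Tuple[int, float]],
                               threshold: float,
                               run_length: int) -> Iterator[int]:
    """Yield the timestamp at which a run of high readings reaches run_length.

    Each run triggers at most one alert; the counter resets as soon as a
    reading falls back to or below the threshold.
    """
    run = 0
    for ts, value in stream:
        run = run + 1 if value > threshold else 0
        if run == run_length:
            yield ts


temperatures = [(1, 70.0), (2, 85.0), (3, 90.0), (4, 95.0),
                (5, 99.0), (6, 60.0), (7, 88.0)]
alerts = list(consecutive_over_threshold(temperatures, threshold=80.0, run_length=3))
# -> [4]
```

The only state kept is a single counter, so the pattern is detected in one pass
without storing the stream.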
Disadvantages:
Unlike a DBMS, which has practically unlimited secondary storage, a DSMS has
only limited main memory.
A DSMS depends on the order of its input for time-focused applications and
operations, such as a query over a 10-minute window, and maintaining that
order is a problem in distributed systems.
A DSMS relies on summary structures of the data streams, usually called
synopses, because it does not have enough storage for a whole data stream. As a
result, queries executed against those summary structures give approximate
rather than exact results.
Because of the storage constraints, algorithms may make only one pass over the
data, so backtracking is not possible.
System and software requirements may change over time, which causes problems
for queries that run for a long time.
Query optimization is difficult because it involves continuous, adaptive
optimization, which sometimes leads to situations where resource constraints
are hard to meet.
Scalability issues must be dealt with properly by sharing execution among the
continuous queries.
Consistency and concurrency control during query processing are not easy in the
case of continuous queries.
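The synopsis and single-pass constraints listed above can be made concrete with
reservoir sampling, a standard one-pass technique that keeps a fixed-size
uniform random sample of a stream. It is offered here as one example of a
synopsis, not as the structure any particular DSMS uses. The function name
reservoir_sample and the sample sizes are choices made for this sketch.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Keep a uniform random sample of at most k items in a single pass.

    Memory use is bounded by k, however long the stream is, and no item is
    ever read twice.
    """
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)       # fill the reservoir first
        else:
            j = rng.randint(0, i)        # inclusive on both ends
            if j < k:
                reservoir[j] = item      # replace with probability k / (i + 1)
    return reservoir


# Stand-in for a long stream of measurements.
sample = reservoir_sample(range(100_000), k=100, rng=random.Random(42))

# An aggregate computed from the synopsis is only an estimate of the
# true stream average.
estimated_mean = sum(sample) / len(sample)
```

The estimate depends on which items happen to be sampled, which is why answers
computed from a synopsis are approximate rather than exact.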
Feedback, opinion on the topic:
A DSMS plays an important role in real-time applications such as detecting
tornadoes from weather data, where it deals with continually updated query
results. It is used to analyse stocks and their upward and downward trends,
which is a complex querying process. It is also used to check the performance
of the internet worldwide, in what is known as the PingER project. It can even
analyse the performance of disks. With a data stream management system, large
amounts of data can be analysed and put to good use.