The data transformation process known as ETL, or extract, transform and load, is slowing down companies’ ability to do real-time data analysis, costing those companies business opportunities and making their analytics inefficient.
That is the finding of a survey of 502 IT professionals conducted by IDC on behalf of InterSystems Corp., a high-performance database management vendor. The survey also found that Change Data Capture (CDC) technology is slowing companies down and impeding their ability to do real-time data analysis.
ETL is a process that has been around since the 1970s. It transforms data to prepare it for storage and analysis in a data warehouse. It’s especially popular in business intelligence, the forerunner of big data analytics. But it can be a long, CPU-intensive process, and that’s the problem.
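To make the three stages concrete, here is a minimal sketch of a batch ETL run in Python. It is an illustration only, not anything described in the survey: the sample records, the `orders_fact` table, and all function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse. The `data_age_days` helper shows one simple way to measure how old a record is by the time it is loaded.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical raw records standing in for rows pulled from a transactional system.
RAW_ORDERS = [
    {"id": 1, "amount": "19.99", "placed_at": "2017-03-01T09:15:00+00:00"},
    {"id": 2, "amount": "5.00", "placed_at": "2017-03-01T10:40:00+00:00"},
]


def extract():
    """Extract: read the raw rows from the source (here, a static list)."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transform: convert amounts to integer cents and timestamps to dates."""
    cleaned = []
    for row in rows:
        placed = datetime.fromisoformat(row["placed_at"])
        cents = round(float(row["amount"]) * 100)
        cleaned.append((row["id"], cents, placed.date().isoformat()))
    return cleaned


def load(conn, rows):
    """Load: write the transformed rows into the analytics table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders_fact "
        "(id INTEGER PRIMARY KEY, amount_cents INTEGER, order_date TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders_fact VALUES (?, ?, ?)", rows)
    conn.commit()


def run_batch(conn):
    """Run one full extract-transform-load pass, as a nightly job might."""
    load(conn, transform(extract()))


def data_age_days(event_time, loaded_at):
    """Whole days elapsed between when an event occurred and when it was loaded."""
    return (loaded_at - event_time).days


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    run_batch(warehouse)
    print(warehouse.execute("SELECT * FROM orders_fact ORDER BY id").fetchall())
    print(data_age_days(
        datetime(2017, 3, 1, tzinfo=timezone.utc),
        datetime.now(timezone.utc),
    ))
```

Because nothing reaches the analytics table until a full pass completes, any delay in scheduling or running the batch adds directly to the age of the data that analysts see.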
The study found that nearly two-thirds of data moved via ETL was at least five days old by the time it reached an analytics database, making it useless for any real-time analytics. When it comes to CDC, which is supposed to be a real-time data replication technology, the survey found that on average, it takes 10 minutes or more to move 65 percent of CDC data into an analytics database. That’s better but still not suitable for real-time work, and the survey does not say how large that database is. With big data, data sets are only getting larger.
“This study highlights the importance of concurrent transaction processing and real-time data analytics for improving customer experience, business productivity, operations and more,” said Paul Grabscheid, vice president of InterSystems, in a statement. “InterSystems works directly with organizations across the globe to reduce data management complexity, enable real-time data analysis, and insights at the time critical decisions are made, speeding innovation and driving improved outcomes across the entire enterprise.”
The survey also found that 76 percent of IT executives said the lag in transformation has hurt their business, while 27 percent said it is slowing productivity. Meanwhile, more than half of respondents said slow data is limiting operational efficiency.
Improving real-time data analytics
And IT pros are taking steps to deal with the problem. More than a third, 37 percent, are looking at new database technologies, while 25 percent are looking to retire old databases, 21 percent are looking to move to SaaS, and 17 percent are considering open source databases.
Part of that problem stems from too many databases. More than 60 percent of those surveyed have five or more analytical databases, and 25 percent have more than 10. And they have to assign one or two DBAs per database to manage them.
This is what IDC calls “The Great Divide” between transactions and analytics. Too many data sources mean too much time is spent on transformation. Optimizing around a few databases will speed queries, as will new analytical frameworks capable of sorting through the data.
Plus, very few older databases are designed for anything real-time. They tend to run by batch process during slow times of the day, usually at night. So you can’t teach the old database dog new tricks. Real-time analytics requires new technologies designed for it, not 1970s technology.