What feeds data into enterprise systems

By Justin James, Special to ZDNet Asia
Thursday, March 06, 2008 02:36 PM

Work on a project is often about getting the right data into the system, so it pays to know the origin of the data.

As the IT industry matures, the value add to data changes. It used to be that having any data was the value proposition. After all, very few organizations had computers, let alone data that could easily be retrieved or searched or that was otherwise useful.

As computers became more widely adopted, ease of access became the differentiating factor. The availability of the RDBMS (relational database management system) enabled businesses to access data in a standardized way and allowed the database vendor to do a lot of the worrying about transaction integrity and performance, but enterprises still had to generate the data.

In the current environment, data comes from all over the place and much of it is standardized and/or commoditized. So where does all of this data actually come from, and why does it matter where it comes from? I'll answer these questions in reverse order.

It is important to know where the data comes from because much of the work on a project is often about getting the data into the system. The origin of the data affects things such as the trustworthiness of the data, how much scrubbing is needed, and what kinds of transformations may be applied. In short, where the data comes from dictates much of the work that needs to be done--even if the data is already neatly placed in a SQL database by the time your software sees it.

You may think that the answer to the question "where does the data come from" is obvious, but it's not anymore.

Even as recently as 10 years ago, much of the data came from one of a few sources. What all of these sources had in common was a relationship that was established through a channel separate from the data exchange itself. There is a measure of trust that is implicit when that occurs, which is much like the way public and private keys work in the encryption world. You trust a data vendor's weekly updates because you saw their building and know that they're not a fly-by-night company; or you trust a user because you know that Bob the systems administrator set up the user's account in person.

Many applications now use a great number of automated and untrusted data sources. When a data tape came in from a vendor and a manual process was needed to load the data, there were a lot of chances to see that the data might be bad, such as a file size being much different from the previous data tape or eyeballing the data for things that are obviously wrong. When the data comes in automatically or is linked in real-time through a Web service, the opportunity for spot-checking the data is lost. Another example of this effect is that we are allowing anonymous users (or users who sign up within the application itself, with no verification of identity) to add data to the system, which is subsequently used by and shown to other users.
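The manual spot-check described above can be partly automated. The following is only a minimal sketch, assuming each delivery can be summarized by its size and row count; the names `FeedStats` and `spot_check` and the 50 percent tolerance are hypothetical choices for illustration, not part of any particular product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FeedStats:
    """Summary of one delivery from a data source (hypothetical structure)."""
    size_bytes: int
    row_count: int


def spot_check(previous: FeedStats, current: FeedStats,
               tolerance: float = 0.5) -> list[str]:
    """Return warnings when the new delivery differs sharply from the last one.

    tolerance is the allowed relative change; 0.5 means plus or minus 50 percent.
    """
    warnings = []
    for name in ("size_bytes", "row_count"):
        old = getattr(previous, name)
        new = getattr(current, name)
        if old == 0:
            # No baseline to compare against; flag any non-empty delivery.
            if new != 0:
                warnings.append(f"{name} was 0 and is now {new}")
            continue
        change = abs(new - old) / old
        if change > tolerance:
            warnings.append(
                f"{name} changed by {change:.0%} (was {old}, now {new})"
            )
    return warnings
```

A loader could call `spot_check` before committing a new delivery and hold the load for a human to review whenever the list of warnings is not empty. It does not replace eyeballing the data, but it restores one of the checks that disappeared along with the data tape.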

While there is nothing inherently wrong with this, it does require a number of additional layers of protection; these layers are inconvenient to build and are often skipped in the rush to ship. In the period of time since the Web application boom, we have seen the rise of SQL injection attacks, followed by cross-site scripting (XSS) and cross-site request forgery (CSRF, sometimes written XSRF) attacks. The prevalence of XSS and CSRF attacks has already forced security-conscious programmers to reduce functionality. The accuracy of the information itself also needs to be verified if the data is used to make business decisions, not to mention where legal liability is involved.
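Two of the protective layers mentioned above can be illustrated in a few lines. This is a rough sketch using only Python's standard library, with a hypothetical `comments` table and hypothetical helper names: a parameterized query keeps user input from being interpreted as SQL, and escaping on output keeps stored text from being interpreted as markup. It is not a complete defense, and it does not address request forgery at all.

```python
import html
import sqlite3

# In-memory database with a hypothetical table of user-submitted comments.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (author TEXT, body TEXT)")


def add_comment(conn, author, body):
    """Store a comment; the ? placeholders keep the input as data, not SQL."""
    conn.execute(
        "INSERT INTO comments (author, body) VALUES (?, ?)",
        (author, body),
    )


def render_comments(conn):
    """Return HTML fragments with user-supplied text escaped on output."""
    rows = conn.execute(
        "SELECT author, body FROM comments ORDER BY rowid"
    ).fetchall()
    return [
        f"<p><b>{html.escape(author)}</b>: {html.escape(body)}</p>"
        for author, body in rows
    ]


# Hostile input from anonymous users is stored and displayed as plain text.
add_comment(conn, "guest", "nice'); DROP TABLE comments; --")
add_comment(conn, "guest2", "<script>alert(1)</script>")
```

After these two calls the table still exists and holds both rows, and the script tag is rendered as visible text rather than executed by the browser.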

In the modern data equation, there are data initiators, such as the people providing the raw map data; companies measuring overall warehouse shipments to distributors; someone typing in the last stock sale price (or agreeing to the sale price electronically); and so on. This data is carried by a data aggregator, which transforms data from disparate data initiators into a "single stream of consciousness".

Think of Google taking all sorts of geotagged data and putting it onto maps, IMS compiling its pharmaceutical databases, or Yahoo! Finance having all the stock prices from around the world. The data consumer gets the data from the aggregator and performs the business-specific transformations, like the Web developer embedding the Google Map onto the site to provide directions, the incentive compensation software calculating the bonus for salespeople, or the day trader making purchasing decisions.
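The three roles can be sketched in code. The example below is purely illustrative: the two initiator record formats, the `Quote` type, and every function name are assumptions invented for this sketch, not the format of any real feed. The aggregator's job is to map each initiator's shape onto one common record; the consumer then applies its own business-specific calculation to the unified stream.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """The aggregator's single, common record format (hypothetical)."""
    source: str
    symbol: str
    price: float


def from_initiator_a(record: dict) -> Quote:
    # Hypothetical initiator A reports prices in cents under "px_cents".
    return Quote("initiator_a", record["ticker"].upper(),
                 record["px_cents"] / 100)


def from_initiator_b(record: dict) -> Quote:
    # Hypothetical initiator B reports prices as decimal strings under "last".
    return Quote("initiator_b", record["symbol"].upper(),
                 float(record["last"]))


def aggregate(feed_a: list[dict], feed_b: list[dict]) -> list[Quote]:
    """Aggregator: turn disparate initiator records into one stream."""
    return ([from_initiator_a(r) for r in feed_a]
            + [from_initiator_b(r) for r in feed_b])


def position_value(quotes: list[Quote], symbol: str, shares: int) -> float:
    """Consumer: a business-specific calculation on the aggregated data."""
    for quote in quotes:
        if quote.symbol == symbol:
            return quote.price * shares
    raise KeyError(f"no quote for {symbol}")
```

Note that each `Quote` keeps a `source` field. Carrying the origin along with the value is what lets the consumer decide later how much to trust a given record.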

Developers can still add value, and companies can still make money in this ecosphere.

The initiators can be very software centric. Better software can make the data cleaner, more accurate, and more plentiful, which means that the initiator can make more money. After all, much of that data is manually created. Software with better usability or a wider user base or that is easier to hook up to an aggregator will rise to the top.

The aggregators have the boring work, but there is tons of money in it. It is such a pain in the neck doing the work that the market is typically dominated by only a few of these companies. To the data consumer, the value that was added is that they only need to program against one data feed and possibly join the data tables to a source identification table. The data initiators are saved the hassle of trying to sell their data to thousands of customers; instead, they cut a deal with a few aggregators. Sure, they make a bit less money, but it is worth not having to manage thousands of small customers. The consumers pay more for the data, but it is still cheaper than paying a developer for his time — and it's more reliable.
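From the consumer's side, programming against one feed and joining it to a source identification table might look something like the sketch below. The `prices` and `sources` tables, their columns, and the sample rows are all hypothetical; a real aggregator's schema will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source identification table supplied by the aggregator.
    CREATE TABLE sources (
        source_id INTEGER PRIMARY KEY,
        name      TEXT
    );
    -- Hypothetical single data feed; each row records which initiator it came from.
    CREATE TABLE prices (
        symbol    TEXT,
        price     REAL,
        source_id INTEGER REFERENCES sources(source_id)
    );
    INSERT INTO sources VALUES (1, 'Initiator A'), (2, 'Initiator B');
    INSERT INTO prices VALUES ('ABC', 12.5, 1), ('XYZ', 7.5, 2);
""")


def prices_with_source(conn):
    """Return each price alongside the name of the initiator that produced it."""
    return conn.execute("""
        SELECT p.symbol, p.price, s.name
        FROM prices AS p
        JOIN sources AS s ON s.source_id = p.source_id
        ORDER BY p.symbol
    """).fetchall()
```

The consumer writes one query against one schema, yet can still tell where each value originated, which matters for the trust and scrubbing decisions discussed earlier.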

The consumers' developers potentially have the most interesting work. I say potentially because, all too often, the software that gets put in front of users is barely more than a database browser that enforces a few business rules. There is a wide gap between business rules and business logic.

There are billions of dollars to be made by selling software that promises to tame the data from dozens of aggregators (turning the enterprise into a meta-aggregator or a mega-aggregator) and perform business logic on it. This is what all of these reporting portals, business intelligence suites, analytic tools, ERP systems, CRM systems, and so on are about. Anyone who has been involved with these initiatives can tell you about the difficulty of making one successful. After all, the project is so large that by the time you are finished implementing it, all of the underlying requirements and logic have changed anyway.

My personal preference for programming is to work on computational rather than transactional systems; however, the bulk of available work is primarily transactional. By understanding the relationship between your application and its underlying sources of data, you are well on your way to writing a better application.


