Challenges to Data Integration

One of the key challenges in the data integration process is setting realistic expectations. The term data integration evokes the seamless coordination of diverse databases, software, equipment, and personnel into a fully functioning alliance, free from the persistent headaches that characterize less comprehensive information management systems. Think again.

The requirements analysis phase offers one of the best opportunities in the process to recognize and digest the full scope and complexity of the data integration task. In-depth attention to this analysis is perhaps the most important ingredient in creating a system that will live to see adoption and maximum usage.

As the field of data integration progresses, however, other common obstacles and compensating solutions will become easier to identify. Current integration practices have already highlighted some familiar challenges and strategies to address them, as described below.

Heterogeneous Data

Challenges

For most transportation agencies, data integration involves synchronizing massive amounts of variable, heterogeneous data from internal legacy systems that differ in data format. Legacy systems may have been built around flat-file, network, or hierarchical databases, as opposed to newer generations of databases that use the relational model. Data in different formats from external sources continues to be added to legacy databases to improve the value of the information. Each generation, product, and internally developed system has unique needs that must be addressed when storing or extracting data. Data integration may therefore involve various strategies for dealing with heterogeneity. In some cases, the effort becomes a major data homogenization exercise, which may not improve the quality of the data offered.
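To make the heterogeneity problem concrete, the following minimal sketch maps one record from a fixed-width flat file and one record from a hierarchical source onto a single common layout. The field names, column positions, units, and sample values are all hypothetical, chosen only for illustration; a real legacy system would need its own mapping.

```python
from dataclasses import dataclass

MILES_TO_KM = 1.609344


@dataclass
class RoadSegment:
    """Common target layout (hypothetical) shared by all sources."""
    segment_id: str
    route: str
    length_km: float


def from_flat_file(line: str) -> RoadSegment:
    # Assumed fixed-width layout: columns 0-5 id, 6-11 route,
    # 12-19 length in miles.
    return RoadSegment(
        segment_id=line[0:6].strip(),
        route=line[6:12].strip(),
        length_km=round(float(line[12:20]) * MILES_TO_KM, 3),
    )


def from_hierarchical(node: dict) -> RoadSegment:
    # Assumed nested layout: a route node that owns a segment node,
    # with the length already stored in kilometres.
    seg = node["segment"]
    return RoadSegment(
        segment_id=str(seg["id"]),
        route=node["route"],
        length_km=float(seg["length_km"]),
    )


# Build a 20-character fixed-width record: 6 + 6 + 8 columns.
flat_record = f"{'S00012':<6}{'I-95':<6}{'10.0':<8}"
tree_record = {"route": "US-1", "segment": {"id": 77, "length_km": 4.2}}

segments = [from_flat_file(flat_record), from_hierarchical(tree_record)]
for s in segments:
    print(s)
```

Even in this toy case, each source needs its own parsing rules and unit conversion, which is why the effort can grow into a large homogenization exercise.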
Strategies

A detailed analysis of data characteristics and uses is necessary to mitigate problems with heterogeneous data. First, a model (a federated or data warehouse environment) is chosen that meets the requirements of business applications and other uses of the data. Next, the database developer will need to ensure that the various applications can use the format of that model or, alternatively, that standard operating procedures are in place to convert the data to another format.

Bringing disparate data together into a database system, or migrating and merging highly incompatible databases, is painstaking work that can sometimes seem like an overwhelming challenge. Fortunately, software technology has advanced to minimize these obstacles through sets of data access routines that allow structured query languages to access nearly all DBMSs and data file systems, relational and non-relational.

Poor Data Quality

Challenges

Poor data quality is a major concern in any data integration strategy. Legacy data must be cleansed before conversion and integration; otherwise an agency will almost certainly face serious data problems later. Legacy data impurities have a compounding effect; by nature, they tend to concentrate around high-volume data users. If this information is corrupt, so too will be the decisions made from it. It is not uncommon for previously undiscovered data quality problems to emerge in the process of cleaning information for use by the integrated system. The problem of bad data calls for procedures to check the quality of the information regularly, but it is not always clear who has ultimate responsibility for this work.

Strategies

The problem of data quality exists throughout the life of any data integration system. It is therefore best to establish both practices and responsibilities from the beginning and to expect both to continue indefinitely.
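A routine quality check of the kind described above can be quite small. The following is a minimal sketch, assuming legacy records arrive as dictionaries; the field names, validation rules, and sample rows are hypothetical and would in practice be agreed upon by the people responsible for the data.

```python
from collections import Counter

REQUIRED_FIELDS = ("segment_id", "route", "length_km")  # hypothetical schema


def audit(records):
    """Return a list of (row_index, problem) pairs; an empty list means clean."""
    problems = []
    # Count how often each identifier occurs so duplicates can be flagged.
    id_counts = Counter(r.get("segment_id") for r in records)
    for i, rec in enumerate(records):
        # Rule 1: every required field must be present and non-empty.
        for field in REQUIRED_FIELDS:
            if rec.get(field) in (None, ""):
                problems.append((i, f"missing {field}"))
        # Rule 2: a numeric length must be positive.
        length = rec.get("length_km")
        if isinstance(length, (int, float)) and length <= 0:
            problems.append((i, "non-positive length_km"))
        # Rule 3: identifiers must be unique across the data set.
        seg_id = rec.get("segment_id")
        if seg_id not in (None, "") and id_counts[seg_id] > 1:
            problems.append((i, f"duplicate segment_id {seg_id}"))
    return problems


rows = [
    {"segment_id": "S1", "route": "I-95", "length_km": 12.5},
    {"segment_id": "S1", "route": "I-95", "length_km": 12.5},  # duplicate id
    {"segment_id": "S2", "route": "", "length_km": -3.0},      # two problems
]

for index, problem in audit(rows):
    print(f"row {index}: {problem}")
```

Running a check like this on a schedule, with a named person assigned to act on its output, is one way to make the ongoing responsibility explicit.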
The best processes result when developers and users work together to determine the quality controls that will be put in place during both the development phase and the ongoing use of the system.

Lack of Storage Capacity

Challenges

The unexpected need for additional performance and capacity is one of the most common challenges in data integration, particularly in data warehousing. Two storage requirements generally come into play: extensibility and scalability. Because the need for storage can increase exponentially once a system goes live, the difficulty of anticipating the magnitude of growth raises concerns that storage costs will outweigh the benefits of data integration. Introducing such massive amounts of data can push the limits of hardware and software. Developers may be forced into expensive fixes if an architecture for processing much larger amounts of data has to be retrofitted into the planned system.

Strategies

Alternative storage is becoming routine for data warehouses that are likely to grow in size. Planning for these options helps keep growing databases accessible. The cost per gigabyte of hard drive storage continues to decline as technology improves. From 2000 to 2004, for example, the cost of data storage decreased tenfold. High-performance storage disks are expected to follow the same downward price trend.

Unexpected Costs

Challenges

Data integration costs are driven largely by elements that are difficult for the uninitiated to quantify and therefore to predict. These might include labor costs for initial planning, evaluation, programming, and additional data acquisition; software and hardware purchases; unanticipated technological changes or advances; and both the labor and the direct costs of data storage and maintenance.

It is important to note that, regardless of efforts to simplify maintenance, a fully functioning data integration system can require much more maintenance than anticipated.
Unrealistic estimates can be driven by an overly optimistic budget, particularly in times of budget deficits and pressure to do more with less. More users, more analytical needs, and more complex requirements can cause performance and capacity problems. Limited resources may cause project timelines to stretch without commensurate funding. Unanticipated or new problems may require expensive consulting help. Account must also be taken of the dynamic atmosphere of today's transportation agency, where staff shortages, changes in business processes, problems with hardware and software, and changing leadership can lead to additional expenses.

The investment in time and labor required to extract, clean, load, and manage data can creep upward if the quality of the source data is weak. It is not uncommon for this to produce unexpected labor costs that are alarmingly out of proportion to the total project budget.

Strategies

The approach to estimating project costs must be forward-looking and realistic. This requires an investment in experienced analysts, as well as cooperation, where possible, among sister agencies to share lessons learned. Special effort should be made to identify items that may seem unlikely but could have a dramatic impact on the total cost of the project.

Extraordinary attention to planning, investing in skills, gaining stakeholder buy-in and participation, and managing the process will help ensure that cost overruns are kept to a minimum and, when encountered, can be resolved effectively. Data integration is a fluid process in which such overruns can occur at every step, so trained staff working under careful supervision are likely to return dividends rather than increase costs.

A viable approach to data integration must recognize that the better data integration works for users, the more fundamental it becomes to business processes. This level of use must be supported by constant maintenance.
It may be tempting to think that a well-designed system will, by its nature, work without much maintenance or modification. In fact, the best systems and processes tend to thrive on the routine care and support of well-trained staff, a fact that wise managers generously anticipate in the data integration plan and budget.

Lack of Cooperation from Staff

Challenges

User groups within an agency may have developed databases on their own, sometimes independently of information systems staff, that are highly responsive to the users' particular needs. It is natural that the owners of these functioning, self-contained units may be skeptical that the new system can support their needs as effectively.

Other proprietary interests may come into play. For example, division staff may not want the data they collect and track to be transparently visible to headquarters staff at all times without the opportunity to address the nuances of what the data appears to show. Owners or users may fear that management, without appreciating the peculiarities of a given operating method, will gain more control over how data is collected and accessed organization-wide.

In some agencies, the level of staffing, consulting, and financial support from the highest levels of management may be insufficient to allay these fears and gain cooperation. Top management must be fully involved in the project. Otherwise, the strategic data integration plan and its associated resources are less likely to be approved. The additional support needed to engage everyone in the agency, and to convey the need for and benefits of data integration, is unlikely to come from leaders who have no awareness of or commitment to those benefits.

Strategies

Any large-scale data integration project, regardless of model, requires executive management to be fully involved. Without that involvement, the initiative risks, quite simply, failing.
Informing and involving the full range of stakeholders during the crucial requirements analysis phase, and then in every subsequent phase and step, is probably the most effective way to gain consensus, trust, and cooperation. Gathering and addressing each user's concerns can be a daunting undertaking, particularly for knowledgeable information professionals who prefer to "cut to the chase." However, without a personal stake in the process and a sense of ownership of the final product, users may feel that the change has been forced upon them rather than designed to advance their interests, and the long-term health of this important investment is likely to be compromised. Incremental training, another advantage