HiperFuse automates procedures for extracting, cleaning, restructuring, and provisioning data, enabling investigators to spend more of their time on the activities that make the best use of their domain expertise: deciding which hypotheses to test, which correlations to study, and which data to mine. Big data problems are typically put into three buckets: volume, velocity, and variety. HiperFuse tackles variety primarily and volume secondarily. Users need only declare the state of the input data and the desired output state to have the data integrated for their analyses. The state of the input data could be deduced, for example, from the data dictionary and quality-control statistics on the dataset. HiperFuse then uses a workflow planning and execution engine to discover and automate the extraction, cleaning, restructuring, and provisioning procedures in the environment that houses the data, freeing the user from having to specify each step. This is valuable to any data analyst working with a variety of datasets: it not only speeds up analytical research within PARC, but also helps external clients doing their own analytics across varied datasets.
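The abstract does not describe HiperFuse's internals, but the declare-input-state, declare-output-state interaction it describes can be illustrated with a minimal sketch. Here, dataset states are hypothetical sets of properties (e.g. `"raw"`, `"cleaned"`), each operation is an assumed precondition/effect pair, and a breadth-first search stands in for the workflow planner; all names and the state model are illustrative assumptions, not the actual system.

```python
from collections import deque

# Hypothetical operation catalog: each entry maps a precondition set
# (properties the data must already have) to an effect set (properties
# the operation adds). These mirror the four steps named in the abstract.
OPERATIONS = {
    "extract":     ({"raw"}, {"extracted"}),
    "clean":       ({"extracted"}, {"cleaned"}),
    "restructure": ({"extracted", "cleaned"}, {"tabular"}),
    "provision":   ({"extracted", "cleaned", "tabular"}, {"provisioned"}),
}

def plan(input_state, output_state):
    """Breadth-first search for an operation sequence that takes the
    declared input state to a state satisfying the desired output state."""
    start = frozenset(input_state)
    goal = frozenset(output_state)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:          # desired properties all achieved
            return steps
        for name, (pre, post) in OPERATIONS.items():
            if pre <= state:       # operation is applicable here
                nxt = frozenset(state | post)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [name]))
    return None                    # no workflow reaches the goal

# The user declares only the endpoints; the planner derives the steps.
print(plan({"raw"}, {"provisioned"}))
```

A real planner would reason over far richer state descriptions (schemas, data dictionaries, quality statistics), but the shape of the interaction is the same: endpoints declared, intermediate steps discovered.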
Huang, E. Automated Data Integration. IEEE International Conference on Big Data, Washington, DC, USA. Talk given 2014-10-27.