Query Processing with People and Machines: CrowdDB
The challenge of “Big Data” analytics is not just data size; it also involves diversity, ambiguity, and incompleteness in both queries and the underlying data.
While advances in scalable data processing are helping to address the data size problem, there remain many important data-centric tasks where humans are more proficient than current state-of-the-art algorithms. Crowdsourcing has emerged as a major problem-solving and data-gathering paradigm that provides the ability to leverage human intelligence and activity at large scale. Emerging popular crowdsourcing platforms have programmatic interfaces (APIs) that provide the opportunity to create hybrid human/computer systems for data-intensive applications. An ongoing effort to better understand the development of such hybrid computation systems, the CrowdDB project uses human input via crowdsourcing to process queries that neither database systems nor search engines can adequately answer. While CrowdDB leverages many aspects of traditional database systems, it also differs from them in important ways, in both implementation and concept.
In this talk, I’ll share an overview of CrowdDB (built with colleagues at ETH Zurich and developed as part of the UC Berkeley AMPLab) and of the development of hybrid human/machine query processing systems. I’ll also share an overview of the broader AMPLab research agenda. AMPLab’s research is supported in part by 18 leading technology companies, including founding sponsors Google and SAP.
Michael Franklin is a Professor of Computer Science at UC Berkeley, focusing on new approaches for data management and data analysis. His recent research projects have included work on data stream processing and continuous analytics, scalable query processing, large-scale sensing environments, data integration, and hybrid human/computer data processing systems.
At Berkeley, Michael directs the Algorithms, Machines and People Laboratory (AMPLab), a cross-disciplinary collaboration taking a new approach to the data analytics problem. He is also the founder and CTO of Truviso, Inc., a real-time data analytics company that enables customers to quickly make sense of diverse, high-speed, continuous streams of information.
Michael is a Fellow of the Association for Computing Machinery and recipient of the NSF CAREER award, ACM SIGMOD "Test of Time" award, and Outstanding Advisor Award from the Computer Science Graduate Student Association at Berkeley. Dr. Franklin received his Ph.D. from the University of Wisconsin and is currently serving as a committee member on the U.S. National Academy of Sciences study on Analysis of Massive Data.