A Comparison of Approaches to Large-Scale Data Analysis


possibly noteworthy
Topic: Technology 7:29 am EDT, Apr 15, 2009

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Sam Madden, and Michael Stonebraker:

There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs. For each task, we measure each system's performance for various degrees of parallelism on a cluster of 100 nodes. Our results reveal some interesting trade-offs. Although the process to load data into and tune the execution of parallel DBMSs took much longer than the MR system, the observed performance of these DBMSs was strikingly better. We speculate about the causes of the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures.
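To make the abstract's point about shared control flow concrete, here is a minimal sketch (not taken from the paper) that computes the same per-key totals two ways: once as explicit map, shuffle, and reduce steps, and once as a SQL GROUP BY. The table name, column names, and sample rows are hypothetical, and the in-memory sqlite3 database is only a single-node stand-in for a parallel DBMS.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows: (source_ip, ad_revenue). Not data from the paper.
RECORDS = [
    ("10.0.0.1", 5),
    ("10.0.0.2", 3),
    ("10.0.0.1", 7),
    ("10.0.0.3", 1),
    ("10.0.0.2", 4),
]


def map_phase(rows):
    """Map: emit one (key, value) pair per input row."""
    for source_ip, ad_revenue in rows:
        yield source_ip, ad_revenue


def shuffle_phase(pairs):
    """Shuffle: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def mapreduce_totals(rows):
    """Run the three phases in sequence on one machine."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


def sql_totals(rows):
    """Same aggregation as a declarative query (sqlite3 as a stand-in DBMS)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (source_ip TEXT, ad_revenue INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    cursor = conn.execute(
        "SELECT source_ip, SUM(ad_revenue) FROM visits GROUP BY source_ip"
    )
    totals = dict(cursor)  # each result row is a (key, total) pair
    conn.close()
    return totals


if __name__ == "__main__":
    print(mapreduce_totals(RECORDS))
    print(sql_totals(RECORDS))
```

Both functions return the same mapping from key to total. The sketch only illustrates that grouping by key and then aggregating appears in both models; it says nothing about the loading, tuning, or performance differences the paper measures.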

Previously, from Stonebraker:

Database management systems are 20 years out of date and should be completely rewritten to reflect modern use of computers.

Recently:

This is a guest post by Russell Jurney, a technologist and serial entrepreneur. His new startup, Cloud Stenography, will launch later this year. The article is an extension of a simple question on Twitter asking the importance of Map Reduce.




 
 