
MemeStreams Discussion


This page contains all of the posts and discussion on MemeStreams referencing the following web page: Timefire: On Reducing the Size of Compressed Javascript (by up to 20%). You can find discussions on MemeStreams as you surf the web, even if you aren't a MemeStreams member, using the Threads Bookmarklet.

Timefire: On Reducing the Size of Compressed Javascript (by up to 20%)
by Acidus at 1:55 pm EDT, Aug 19, 2009

One idea I started to think about was to repurpose Document Clustering techniques towards code. Document Clustering is commonly used in information retrieval systems to find related documents. Typically, a document is encoded using some technique to measure word importance, such as weighting each word by its term frequency-inverse document frequency (tf-idf). Then, any two documents can be compared by some distance metric, for example, treating the tf-idf weightings of terms as a vector in N-space and computing the cosine of the angle between the two vectors.
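To make the tf-idf and cosine comparison concrete, here is a minimal sketch in Python. It is not taken from the Timefire article: the function names `tfidf_vectors` and `cosine` are made up for illustration, documents are assumed to be already split into tokens, and the weighting is the plain tf * log(N/df) variant, which is only one of several in common use.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per tokenized document.

    Assumption: weight = (term count / document length) * log(N / df).
    """
    n = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two documents that share weighted terms get a cosine above zero, and documents with no terms in common get exactly zero. A term that appears in every document gets weight zero under this variant, so it contributes nothing to the comparison.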

In this case, we'd let each function be a separate document, and the entire program be like the corpus of documents. We'd then choose some encoding to weigh Javascript grammar nodes by importance in a way that would produce good LZ77 matches, and then proceed in a bottom-up clustering fashion. First, we'd construct all the pairs of functions which match best. Pick a function, pair it with its best match, call that Cluster 1. Pick another function, pair it with its best match, call that Cluster 2, and so on. After this procedure is done, pick a Cluster, and find its nearest Cluster (according to some metric) and pair them up in a Cluster of 4 functions. After that's done, pair up 4-Clusters into Clusters of 8, and so on, until the final cluster encompasses the whole program.
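The bottom-up pairing described above might look roughly like the following Python sketch. Several things here are assumptions rather than anything the article specifies: the name `pairwise_merge_order` is hypothetical, cluster-to-cluster similarity is taken as the average over all member pairs, the "pick a cluster" step simply takes the next one in list order, and `jaccard` over token sets stands in for whatever encoding would actually predict good LZ77 matches.

```python
def jaccard(a, b):
    """Stand-in similarity (an assumption): overlap of two token sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def pairwise_merge_order(items, similarity=jaccard):
    """Repeatedly pair each cluster with its most similar remaining cluster.

    Returns item indices in the order the final cluster lays them out.
    """
    clusters = [[i] for i in range(len(items))]

    def cluster_sim(a, b):
        # Average-linkage similarity between two clusters (an assumption).
        total = sum(similarity(items[i], items[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        remaining, merged = clusters, []
        while remaining:
            current = remaining.pop(0)
            if not remaining:
                # Odd cluster out: carry it into the next round unpaired.
                merged.append(current)
                break
            best = max(range(len(remaining)),
                       key=lambda k: cluster_sim(current, remaining[k]))
            merged.append(current + remaining.pop(best))
        clusters = merged
    return clusters[0] if clusters else []
```

Each round roughly halves the number of clusters, so functions end up in clusters of 2, then 4, then 8, and the final list gives one candidate layout in which similar functions sit next to each other.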

The idea to rearrange the layout of functions in a JS file to better utilize the sliding window characteristics of the Deflate compression algorithm is very sexy!
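For anyone who wants to try it, a quick way to check whether a given ordering actually helps is to Deflate the concatenated source in each order and compare byte counts. This is only a sketch: `deflate_size` is a hypothetical helper, and it uses Python's `zlib` module, whose output is Deflate data inside a small zlib wrapper. Deflate back-references reach at most 32 KiB, which is why layout can only matter for files larger than that window.

```python
import zlib

def deflate_size(functions, order):
    """Compressed size in bytes of the functions concatenated in the given order."""
    source = "\n".join(functions[i] for i in order)
    # Level 9 asks zlib for its most thorough match search.
    return len(zlib.compress(source.encode("utf-8"), 9))

def best_order(functions, candidate_orders):
    """Pick the candidate ordering with the smallest compressed size."""
    return min(candidate_orders, key=lambda order: deflate_size(functions, order))
```

Calling `best_order(functions, [original_order, clustered_order])` would show whether a clustered layout beats the original for a particular file; no particular saving is guaranteed.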


 
 