Tuesday, June 29, 2010

eigenvector and Pagerank

The dominant or principal eigenvector of a matrix is an eigenvector corresponding to the eigenvalue of the largest magnitude of that matrix.

Today, I just read a passage by chance talking about using power method to compute eigenvector. Power method is also called vector iteration, which suddenly struck me on pagerank's computation. here is the pagerank formation:

$R^(k+1)=dWR^k+(1-d)E$


and then let's look at how to use power method to compute it. from here.

Eigenvalues can be ordered in terms of their absolute values to find the dominant or largest eigenvalue of a matrix. Thus, if two distinct hypothetical matrices have the following set of eigenvalues

  • 5,8,-7: then |8|>|-7|>|5| and 8 is the dominant eigenvalue.
  • 0.2,-1,1: then |1|=|-1|>|0.2| and since |1|=|-1| there is no dominant eigenvalue.
One of the simplest methods for finding the largest eigenvalue and eigenvector of a matrix is the Power Method, also called the Vector Iteration Method. The method fails if there is no dominant eigenvalue.
In its basic form the Power Method is applied as follows:
  1. Asign to the candidate matrix an arbitrary eigenvector with at least one element being nonzero.
  2. Compute a new eigenvector.
  3. Normalize the eigenvector, where the normalization scalar is taken for an initial eigenvalue.
  4. Multiply the original matrix by the normalized eigenvector to calculate a new eigenvector.
  5. Normalize this eigenvector, where the normalization scalar is taken for a new eigenvalue.
  6. Repeat the entire process until the absolute relative error between successive eigenvalues satisfies an arbitrary tolerance (threshold) value.
It cannot get any easier than this. Let's take a look at a simple example.

Power Method.


What we have done here is apply repeatedly a matrix to an arbitrarily chosen eigenvector. The result converges nicely to the largest eigenvalue of the matrix; i.e.
Equation 6: AkXi = cik*Xi
Figure 7 provides a visual representation of the iteration process obtained through the Power Method for the matrix given in Figure 3. As expected, for its largest eigenvalue the iterated vector converges to an eigenvector of relative coordinates (1, 0.20).
Iterated Eigenvector.
Figure 7. Visual representation of vector iteration.
It can be demonstrated that guessing an initial eigenvector in which its first element is 1 and all others are zero produces in the next iteration step an eigenvector with elements being the first column of the matrix. Thus, one could simply choose the first column of a matrix as an initial seed.
Whether you want to try a matrix column as an initial seed, keep in mind that the rate of convergence of the power method actually depends on the nature of the eigenvalues. For closely spaced eigenvalues, the rate of convergence can be slow. Several methods for improving the rate of convergence have been proposed (Shifted Iteration, Shifted Inverse Iteration or transformation methods).

For more information about power method, look at here.

This method is so similar to Pagerank. Actually, for pagerank, we compute a web graph's dominant eigenvector, which represents the graph. It reduces the dimension from n to 1, n is the number of web pages. So the dominant eigenvector has n elements, each represents a web page, and the value indicates the importance of that page.

Saturday, June 19, 2010

Hiking around Quabbin Reservoir


A very interesting hiking day spent in Quabbin Reservoir. We started off at 10:30am. Although spending a lot of time on finding the correct entrance, it didn't affect our mood. Actually my muscle is quite tight and sour for yesterday's training, but anyway it is still OK for a strong man.











We also played killer game. A day with great fun.

Sunday, June 13, 2010

Yale trip

June 12th-14th, Yale Medical School Campus. Sadly, I am an experiment sample for a physical muscle research. We arrived at New Haven at 9pm, 12th, spent some time on surfing Internet before going to bed. What a tragedy, I lost my sleep in the first night, this is a common case for me when traveling to a new place... Getting up so early at 6am in the second morning, and then I stayed in the MRI hole for another 2 hours. Fortunately, this time I slept well. Followed by another MRI scan for 1 hour, I am done for the first half day, in the afternoon, a training session waiting for me.

1:00pm-2:45pm, I was supposed to be preparing for the later training session. But I asked to Ryan to let me get out and walk around during this time. Oh, year, I was on the way to Yale main campus. The followings are the photos what I took on campus...


New Haven hotel is just in front of my accommodation.

I don't know what it is, it seems famous...

I am on a large square, this man's status looks very popular...



This video was taken when I was in the square. I like this place, very peaceful, very quite...

I am on that square. Take a notice the building behind me, it is the symbol building of Yale.

I don't even know this building's name...anyway, just take it, never regret.

It should be auditorium or something...um..probably..

Library, I know this. You may notice that the windows are made by marble instead of glass. it is said that, the sunshine can filtered through marble, which make Yale University's library quite special...

In Memory of the Men of Yale who true to Her Traditions gave their Lives that Freedom might not perish from the Earth. 1914 Anno Domini 1918. ...

I like this style...


It feels like Tsinghua University's lawn square, isn't it?

Famous Yale's water table...


nice gallery...


Friday, June 11, 2010

This is Brothers -- Celtics has a brilliant game over Lakers

OH MY GOD, what a great victory!!!

When substitute Celtics rout luxurious Lakers, I saw "this is BASKETBALL". It is five, it is passion, it is combatant spirit, they are fighters. At that time, I think they are the best symbol of basketball, they give us a best explanation of basketball.


From the beginning of fourth quarter to the last 2 minutes, most time of the last quarter, Celtics substitute players (except for Ray Allen), had a brilliant performance, totally over Lakers, over Kobe. Big baby is so passionate that everybody had to love him. Robinson's great job out-played his faults. Tony Allen's impenetrable defense makes Kobe dark. Wallace's passion is no weaker than big baby, and he gave Lakers a unbelievable 3 points shoot. Ray Allen is a calm stone, his experience and coolness was done to a turn.

Beginning with this game, I love Celtics to be a fan of them.

Thursday, June 3, 2010

Finding Memory Leaks in Java Apps

Here is a small HOWTO on how to find memory leaks with Java SE.
I’ve written it while trying to find memory leaks in our testing tools: JTHarness and ME Framework, and then wanted to share the HOWTO with the world, but I didn’t have my blog at that time, so I posted this info as a comment to a relevant entry in the excellent Adam Bien’s blog.
Note: Use the latest JDK 6, because it has the latest tools, with lots of bug fixes and improvements. All the later examples assume that JDK6’s bin directory is in the PATH.

Step 1. Start the application.

Start the application as you usually do:
java -jar java_app.jar
Alternatively, you could start java with hprof agent. Java will run slower, but the huge benefit of this approach is that the stack traces for created objects will be available which improves memory leak analysis greatly:
java
  -Dcom.sun.management.jmxremote
  -Dcom.sun.management.jmxremote.port=9000
  -Dcom.sun.management.jmxremote.authenticate=false
  -Dcom.sun.management.jmxremote.ssl=false
  -agentlib:hprof=heap=dump,file=/tmp/hprof.bin,
   format=b,depth=10
  -jar java_app.jar
When the application is up, perform various actions that you think might lead to memory leaks.
For example, if you open some documents in your app, the memory graph could rapidly go up. If closing the docs and invocation of full garbage collection did not bring the memory back to normal level, there is probably a leak somewhere.You might use jconsole from JDK 6 to see the memory consumption graph to have a clue whether memory leak is present or not:
jconsole
It will pop up a dialog with a list of Java applications to connect to. Find the one with java_app.jar and connect. Also, jconsole allows to invoke full GC providing nice button just for that.

Step 2. Find the application pid.

Find out the application’s process id via:
jps
It will print something like:
15976 java_app.jar
7586 startup.jar
22476 Jps
12248 Main
5437 Bootstrap
In our case the pid is 15976.

Step 3. Dump the heap into file.

Dump heap into the file:
jmap -dump:format=b,file=/tmp/java_app-heap.bin 15976
We just told jmap to dump the heap into /tmp/java_app-heap.bin file, in binary from (which is optimized to work with large heaps). The third parameter is the pid we found in Step 2.
Alternatively, if you started java with hprof agent, you could just use Ctrl-\ on Solaris/Linux or Ctrl-Break on Windows to dump heap into the file, specified in hprof agent arguments.

Step 4. Visualize the heap.

Use jhat tool to visualize the heap:
jhat -J-Xmx326m /tmp/java_app-heap.bin
Jhat will parse the heap dump and start a web server at port 7000. Connect to Jhat server by pointing your browser to:
http://localhost:7000
And start investigating. :)
Jhat allows you to see what objects are present in the heap, who has references to those objects, etc.
Here are some tips:
  • Investigate _instances_, not _classes_.
  • Use the following URL to see the instances: http://localhost:7000/showInstanceCounts/
  • Use “Reference Chains from Rootset” (Exclude weak refs!!!) to see who’s holding the instance.