
The Winner is Jichun Wang

Dec 4, 2011 | Posted under: Code, Monthly Contest, New Technologies, Tips & Techniques

The winner of our last programming challenge is Jichun Wang. Jichun is a Sun Microsystems alumnus and now a senior software engineer at Synopsys. He used Perl to call a Google RESTful API and parse the JSON response to get the result count of a Google search. You can find his program in my SAS_ACADEMY group.

Because the API he used is deprecated, its counts differ significantly from those you get by searching Google directly in a browser; however, he got the order right!

Hero        Count by API   Count by browser
Batman      30,600,000     332 M
Iron Man    18,700,000     266 M
Superman    13,900,000     184 M
Spiderman   13,000,000     122 M
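Jichun wrote his solution in Perl. As a rough illustration of the same idea (not his code), the Python sketch below pulls the estimated hit count out of a search API's JSON response and ranks the heroes by it. The JSON layout is an assumption modeled on Google's old AJAX Search API: `responseData` → `cursor` → `estimatedResultCount`, with the count stored as a string. `estimated_count` and `rank_by_count` are hypothetical names, and the sample payloads reuse the API counts from the table.

```python
import json


def estimated_count(response_text):
    """Return the estimated hit count from a search-API JSON response.

    Assumption: the payload looks like Google's deprecated AJAX Search API,
    e.g. {"responseData": {"cursor": {"estimatedResultCount": "30600000"}}}.
    """
    data = json.loads(response_text)
    cursor = data["responseData"]["cursor"]
    # The count arrives as a string; treat a missing count as 0.
    return int(cursor.get("estimatedResultCount", 0))


def rank_by_count(counts):
    """Sort a {name: count} mapping into (name, count) pairs, most popular first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Sample payloads built from the API counts in the table above.
    responses = {
        "Superman": '{"responseData": {"cursor": {"estimatedResultCount": "13900000"}}}',
        "Batman": '{"responseData": {"cursor": {"estimatedResultCount": "30600000"}}}',
        "Spiderman": '{"responseData": {"cursor": {"estimatedResultCount": "13000000"}}}',
        "Iron Man": '{"responseData": {"cursor": {"estimatedResultCount": "18700000"}}}',
    }
    counts = {hero: estimated_count(text) for hero, text in responses.items()}
    for rank, (hero, count) in enumerate(rank_by_count(counts), start=1):
        print(f"{rank}. {hero}: {count:,}")
```

The ranking step does not care where the counts come from, so the same code would rank the browser counts too. In the table, the two sources differ by roughly a factor of ten but agree on the order.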

October Programming Challenge: Ranking American Heroes

Oct 5, 2011 | Posted under: Code, Monthly Contest

At the beginning of each month, I follow the updates to the TIOBE Index very closely. The TIOBE Index measures the popularity of programming languages based on search engine results. By the way, SAS ranked 24th there last month. The way the TIOBE Index is defined inspired me to come up with the following game for this month:

Use any programming language you feel comfortable with to rank the popularity of the following American heroes (each picture contains a link to the hero’s Wikipedia entry):

Superman, Batman, Spiderman, Iron Man

Send me your source code by October 31 to win a $25 gift card.
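The TIOBE Index reports each language's rating as a percentage. A simplified analogue for this game is each hero's share of the total hits. The Python sketch below assumes you already have one hit count per hero from a single search engine. The real index combines several engines, which this sketch ignores. `popularity_shares` is a hypothetical name, and the example counts are made up for illustration.

```python
def popularity_shares(counts):
    """Return (name, percent) pairs, largest share first.

    counts maps each hero to a hit count from one search engine (an
    assumption; TIOBE itself aggregates several engines). Each percent is
    that hero's fraction of the total hits, so the shares sum to 100.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one non-zero hit count")
    shares = {name: 100.0 * hits / total for name, hits in counts.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    example = {"Superman": 3, "Batman": 5, "Spiderman": 1, "Iron Man": 1}
    for name, pct in popularity_shares(example):
        print(f"{name:10s} {pct:5.1f}%")
```

The shares do not change the order of the raw counts. They are useful for comparing rankings taken at different times, as TIOBE does month to month.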

As for our previous challenge, “Motto of Hogwarts School”, the winner is Jiangtang Hu. This is Jiangtang’s second win in our Programming Challenge. Congratulations! :) His method taps into an online Latin-English dictionary. Other contenders approached the problem by using Google Translate or by digging into search results. As I said last time, Google Translate does not give the correct answer. For example, the following Perl one-liner

use LWP::UserAgent;$ua=LWP::UserAgent->new;$ua->agent('Mozilla/6.0');
print $ua->request(HTTP::Request->new('GET','http://translate.google.com/translate_a/t?client=xxxxxxx&text=draco%20dormiens%20nunquam%20titilandus&sl=la&tl=en'))->content;

returns “the dragon never sleeping titilandus” in JSON format. To me, the quickest and dirtiest solution is the following Perl one-liner. It searches Google and prints the phrase that the results say the motto “means”:

use LWP::UserAgent;$ua=LWP::UserAgent->new;$ua->agent('Mozilla/6.0');
foreach(split /\n/,($ua->request(HTTP::Request->new('GET','http://www.google.com/search?&q=draco+dormiens+nunquam+titillandus')))->content){
if(/mean/i){s/<[^>]*>//g,s/^.*means\s+&quot\;([^&]*)&quot\;.*$/$1/;print;}}
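For comparison, here is a rough Python translation of the parsing half of that one-liner. It takes the HTML of a results page as a string and leaves out the fetching. The `means &quot;…&quot;` pattern is carried over from the Perl code, and Google's result markup may no longer look like that. The Perl version prints every line containing “mean” even when the substitution does not match. This sketch differs: it returns only the captured phrases. `extract_meanings` is a hypothetical name.

```python
import re

# Strip HTML tags, like the Perl s/<[^>]*>//g.
TAG = re.compile(r"<[^>]*>")
# Same capture as the Perl one-liner: the text after 'means', wrapped in &quot;...&quot;.
MEANING = re.compile(r"means\s+&quot;([^&]*)&quot;")


def extract_meanings(html_text):
    """Return every quoted phrase that follows 'means' on lines mentioning 'mean'."""
    found = []
    for line in html_text.split("\n"):
        if "mean" not in line.lower():  # Perl: /mean/i
            continue
        plain = TAG.sub("", line)
        found.extend(MEANING.findall(plain))
    return found
```

For example, on a made-up line such as `<b>Draco</b> means &quot;never tickle a sleeping dragon&quot;`, `extract_meanings` returns `['never tickle a sleeping dragon']`.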

Programming Challenge: Web Crawling (L1)

Apr 5, 2011 | Posted under: Monthly Contest, New Technologies

First of all, Megha Agarwal is the winner of our first programming challenge: Web Crawling (Level 0). Her program extracts the laureate list for every year together with the hyperlinks. I will publish her code, along with my comments on this challenge, soon.

This time I would like to go one level deeper into the web by following the links we got from our first challenge. Start from the same page, http://nobelprize.org/nobel_prizes/physics/laureates/. Take 1999 for example. If you click the year, you get to the next web page, where you find a brief description of the achievement of ’t Hooft and Veltman: “for elucidating the quantum structure of electroweak interactions in physics”.

So here is the challenge: again, using any programming language you are most comfortable with, loop through the hyperlinks from 1901 to 2010 and extract the achievement of each year’s laureates from the next level.

Hint: Navigate to, say, the page for 1999. Try the following JavaScript one-liner as test code in your browser’s address bar:

javascript:alert(document.getElementsByClassName('ingress motivation')[0].innerHTML)
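The same extraction also works outside the browser. The Python sketch below relies on two assumptions about the 2011 site layout. First, as the hint suggests, the achievement text sits in the first element whose class list contains both `ingress` and `motivation`. Second, each year’s page lives at the base URL plus `<year>/`, following the link pattern in the Level 0 sample output. `MotivationParser`, `extract_motivation`, `year_url` and `crawl` are hypothetical names. `crawl` needs network access and yields `None` for any year whose page cannot be fetched.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser

# Assumption: the 2011 site layout; the pages may have moved since.
BASE = "http://nobelprize.org/nobel_prizes/physics/laureates/"
# Void elements never get an end tag, so they must not change the nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MotivationParser(HTMLParser):
    """Collect the text of the first element whose class list has 'ingress' and 'motivation'."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside the target element
        self.done = False   # set once the target element has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID:
            return
        if self.depth:
            self.depth += 1  # an element nested inside the target
            return
        classes = (dict(attrs).get("class") or "").split()
        if "ingress" in classes and "motivation" in classes:
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> carry no text

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_motivation(html_text):
    """Return the motivation text with whitespace collapsed ('' if not found)."""
    parser = MotivationParser()
    parser.feed(html_text)
    parser.close()
    return " ".join("".join(parser.parts).split())


def year_url(year, base=BASE):
    return f"{base}{year}/"


def crawl(first=1901, last=2010):
    """Yield (year, motivation or None) for each year; requires network access."""
    for year in range(first, last + 1):
        try:
            with urllib.request.urlopen(year_url(year), timeout=30) as resp:
                charset = resp.headers.get_content_charset() or "utf-8"
                html_text = resp.read().decode(charset, errors="replace")
        except urllib.error.URLError:
            yield year, None  # page missing or unreachable
            continue
        yield year, extract_motivation(html_text)
```

A typical run would be `for year, text in crawl(): print(year, text)`. The parser tracks nesting depth so that tags inside the motivation, such as `<em>`, do not cut the text short.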

Please send your code to me. The deadline is May 1, 2011, and the prize is again a $25 Fry’s gift card!

Programming Challenge: Web Crawling (Level 0)

Mar 5, 2011 | Posted under: Monthly Contest, New Technologies

Yes, SAS can read web pages via HTTP requests. This means you can retrieve information and analyze data directly from the web. Isn’t that cool? :)

Alright, here is the game: take the web page “All Nobel Prizes in Physics” (sorry for the idiosyncrasy in this choice :) ) and focus on the list of years and the laureates for each year. How can you get simple descriptive statistics from this list, such as “in how many years was no Nobel Prize in Physics awarded” or “in how many years were there three laureates”?

  • A programmer’s answer: Write a program
  • A SAS programmer’s answer: Write a SAS program
  • The most unacceptable answer: Copy and paste and count by hand :)
  • An answer à la copy and paste, but a little smarter: paste the list into vi, then use the command

    :%s/^\(\d\+\)\s*\n/\1\,/g

    and save the result as a CSV file. The first three lines of this CSV look like:

    2010,Andre Geim, Konstantin Novoselov
    2009,Charles Kuen Kao, Willard S. Boyle, George E. Smith
    2008,Yoichiro Nambu, Makoto Kobayashi, Toshihide Maskawa

    Once you have the CSV, the job is done, ’cause SAS can handle the rest for sure.
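Once the list is in CSV form, the two statistics from the game take only a few lines in any language. Here is a Python sketch. It assumes each line holds the year followed by comma-separated names, as above. It also assumes, hypothetically, that a year without a prize appears as the year alone. The function names are hypothetical too.

```python
import csv
from collections import Counter


def laureate_counts(lines):
    """Map year -> number of laureates from 'year,name,name,...' lines."""
    counts = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        year = int(row[0])
        # Names follow the year; drop padding spaces and empty fields.
        names = [name.strip() for name in row[1:] if name.strip()]
        counts[year] = len(names)
    return counts


def summarize(counts):
    """Return a Counter mapping number of laureates -> number of years."""
    return Counter(counts.values())


if __name__ == "__main__":
    sample = [
        "2010,Andre Geim, Konstantin Novoselov",
        "2009,Charles Kuen Kao, Willard S. Boyle, George E. Smith",
        "2008,Yoichiro Nambu, Makoto Kobayashi, Toshihide Maskawa",
    ]
    stats = summarize(laureate_counts(sample))
    print("years with no prize:", stats[0])
    print("years with three laureates:", stats[3])
```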

OK, now seriously, here is the challenge. In whichever programming language you feel most at home with, write a program that reads this web page such that:

  • The input is the URL.
  • The output shall contain the link behind the year, the year, and the list of laureates (or a note saying there is none) for that year.
  • The result shall be machine-readable, for example a SAS dataset or a CSV file:

    "/nobel_prizes/physics/laureates/2010/",2010,"Andre Geim, Konstantin Novoselov"
    "/nobel_prizes/physics/laureates/2009/",2009,"Charles Kuen Kao, Willard S. Boyle, George E. Smith"
    "/nobel_prizes/physics/laureates/2008/",2008,"Yoichiro Nambu, Makoto Kobayashi, Toshihide Maskawa"
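The output format above maps directly onto a CSV writer that quotes text fields and leaves numbers bare. Here is a partial Python sketch that collects the year links from the page's HTML and writes records in that format. It assumes year links have the form `/nobel_prizes/physics/laureates/YYYY/`, as in the sample output. Pairing each year with its laureates depends on the page's markup, so it is left out. `NO_PRIZE_NOTE` and the function and class names are hypothetical.

```python
import csv
import io
import re
from html.parser import HTMLParser

# Assumption: year links look like the ones in the sample output.
YEAR_LINK = re.compile(r"^/nobel_prizes/physics/laureates/(\d{4})/$")
NO_PRIZE_NOTE = "No prize awarded"  # hypothetical wording for years without laureates


class YearLinkParser(HTMLParser):
    """Collect (href, year) for every <a> whose href matches YEAR_LINK."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = YEAR_LINK.match(href)
        if match:
            self.links.append((href, int(match.group(1))))


def year_links(html_text):
    """Return the (href, year) pairs found in the page, in document order."""
    parser = YearLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


def to_csv(records):
    """Format (href, year, [laureates]) records like the sample output."""
    buffer = io.StringIO()
    # QUOTE_NONNUMERIC quotes the strings and leaves the integer year bare.
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for href, year, laureates in records:
        names = ", ".join(laureates) if laureates else NO_PRIZE_NOTE
        writer.writerow([href, year, names])
    return buffer.getvalue()
```

Storing the year as an integer lets `csv.QUOTE_NONNUMERIC` reproduce the sample lines exactly, with the link and the names quoted and the year unquoted.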

Some notes:

  • Some research into the HTML source behind this particular web page is necessary.
  • I call this level 0 web crawling ’cause only a single web page is involved.
  • We had better set a cutoff date: let’s say one month.

Have fun and happy coding :)

Update (3/8/2011): If you’d like to participate, you can send your code to me. You can check my credentials on my LinkedIn profile. Even though for now this challenge is just an informal initiative, some of my colleagues are very enthusiastic about making it a formal event. So stay tuned ;)

Update (3/10/2011): Thanks to Sophie, Melina and Leila, this challenge is now formal. The deadline is April 1, 2011, and the prize is a $25 Fry’s gift card!
