Archive for the 'Search' Category


Google Press Day 2006 and Google Co-op

Sunday, May 14th, 2006

Matt Cutts has done a nice write-up of Google’s Press Day, including a lot of the Q&A.

One thing that caught my eye is Google Co-op. It appears to be Google’s take on expert/topic/vertical search. The principle is that you subscribe to a topic and you tag your results. Their ‘tag’ is known as a ‘label’, and an ‘annotation’ matches a label to a web page or set of web pages. People can then subscribe to you to see your tagging for a given topic. Rather than having to find the best taggers yourself, there is a directory that finds them for you. Deciding who is a ‘good’ tagger, and hence gets displayed in the directory, is automated rather than left to manual editors. It’ll be interesting to see how this type of vertical search works out compared to systems such as Swicki, which rely a lot more on personalisation than on bulk tagging.


Looking for a good Java search engine

Saturday, May 6th, 2006

The title is a bit misleading. I’m not actually looking for a search engine written in Java; if I were, I’d head straight to Nutch. What I’m looking for is a search that just covers Java articles and tutorials. The company I work for has software to build vertical search engines (in fact I wrote most of it), so I really should be eating my own dog food. But you know how it is, the cobbler’s children never have any shoes, so while I wait for some servers to free up I decided to have a go with similar systems that are freely available on the web.

The two sites I’m trying out are Rollyo and Swicki. Both sites let you specify a list of web sites and search across them. The sites I used for my test were:

Rollyo

First up is Rollyo. It’s very simple to set up the search engine; I won’t bother going into the specifics because anyone could figure it out. Rollyo is actually backed by Yahoo! search, and it soon becomes clear that it’s just a site-restricted search in Yahoo. Two of the sites on my list cover many things and I just wanted a sub-directory of each, but Rollyo searches the whole site, which is not what I want.

My test search is concurrency deadlock detection. The first two results are of the same page, there are plenty of other duplicates, and I get non-Java results back, e.g. ‘DB2 for z/OS: DB2 Universal Database concurrency’. So far not so good. A few more searches turn up similar results. Restricting the set of data I want to search and getting rid of duplicates are basic functionality; Rollyo fails on both counts and won’t get any more of my time.
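To make those two requirements concrete, here is a minimal sketch of the kind of post-filter I would expect: keep only results under a chosen directory of a site, and drop results that point at the same page. This is not how Rollyo or Yahoo work internally; the function names, the `(host, path prefix)` configuration and the normalisation rules are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host, no fragment, no trailing slash."""
    parts = urlsplit(url)
    return (parts.netloc.lower(), parts.path.rstrip("/"), parts.query)


def filter_results(urls, allowed_prefixes):
    """Keep URLs under one of the allowed (host, path prefix) pairs, dropping duplicates.

    allowed_prefixes is a hypothetical configuration, e.g. [("example.com", "/java")].
    An empty prefix ("") allows the whole site.
    """
    seen = set()
    kept = []
    for url in urls:
        key = normalise(url)
        host, path, _query = key
        in_scope = any(
            host == allowed_host and (path == prefix or path.startswith(prefix + "/"))
            for allowed_host, prefix in allowed_prefixes
        )
        if in_scope and key not in seen:
            seen.add(key)
            kept.append(url)
    return kept


results = [
    "http://example.com/java/deadlock",
    "http://EXAMPLE.com/java/deadlock/",    # same page, different spelling
    "http://example.com/db2/concurrency",   # wrong part of the site
]
print(filter_results(results, [("example.com", "/java")]))
```

Matching on `prefix + "/"` rather than a bare string prefix is deliberate, so that a `/java` restriction does not accidentally let `/javascript` pages through.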

Rollyo - Java Articles

Swicki

Next up is Swicki. The first difference is that they have their own crawler, which means they have a lot more control over the data. The second is that they are community focused, i.e. a group of like-minded people contribute sites to the search engine, rather than you building it up yourself. Setting up the search was simple, and I didn’t have to create an account either (although I did so I could keep my search engine). Swicki also says you can search just parts of sites without any special configuration; just make sure the directory is included in the URL.

So how did concurrency deadlock detection fare this time? A lot better than Rollyo. There were plenty of relevant links and no duplicates, and Swicki actually covered more sites than I selected, but they appear to be relevant so that’s fine with me. Was it perfect? No, because some database deadlock information crept in, which I wasn’t interested in. But since they control the data set (remember they crawl the web themselves) you can personalise the system. They claim it learns from your behaviour. I haven’t used it enough to verify that claim, but what I can see is that for every result you can promote it, or the site, and delete the result or the site it comes from. So over time the results should get better.

So first impressions are good. I’ll have to use it a bit longer to see if it does learn. I’ll try to stick a search box for it on this site, but since I didn’t make this design and I’m not great with CSS, it might be a while. In the meantime, here’s the link:

Swicki - Java Articles


Google Movie Search

Saturday, May 6th, 2006

I came across this the other day: it’s now possible to use Google to find out what films are showing at cinemas close to you. If you punch ‘movie’ into the search box, another box appears above the results asking for your location:

Enter your location and it brings up a list of current releases at the cinemas closest to you. If you want information on a specific film you can enter:

movie: [movie name]

into the search box and it’ll bring up the cinemas, the show times and a link to a map. It’s much better than all the other film searches for London, give it a go.


Tracking the Internet into the 21st Century

Thursday, March 9th, 2006

On Tuesday night I went to see Vint Cerf speak at Google’s Open House in London. As he is one of the guys who started the Internet, it was interesting to see where he thought it was going next.

How the Internet is Changing

He first covered how the Internet is changing, who’s connected, what they’re connecting with, etc. I was surprised to hear that Asia already has the most people connected, although fewer than Europe and North America combined. But since Asia has most of the world’s population, its lead will eventually grow, so we should expect a shift in the culture and content of the Internet.

At the moment there are about one billion people connected to the net. There are about three billion fixed telephone lines and already two billion mobile phones. The number of mobiles is growing while the fixed lines are static so there should be a shift to more mobile connections (the consequences of that are covered later), but this is countered by the speed of the connection. Vint stressed that mobiles are going to become a much more important part of the Internet, but I think the speed and cost of the connection has to change dramatically first.

Broadband is becoming more prevalent, which means more and more people are online all the time with fast connections. He came up with the term ‘21st Century dialtone’ to describe how an Internet connection is something we assume to be there. A consequence of this is that you can provide more everyday services over the Internet.

I thought one such example of this is VOIP: turn the broadband connection into a cheap phone. But Vint brought up a very good point: the money is made in the Internet/POTN bridge and in placing the call, and this isn’t always going to be necessary. Once everyone has a connection all the time, VOIP is going to become as easy, and as profitable, as email.

Digital Media

Faster connections mean you can also start doing more things with digital media. Having warehouses and inventory is expensive, and it gets a lot more expensive if you try to scale it. As a result selections aren’t always the best, i.e. you get the current best sellers. By digitizing media you slash your costs, and the marginal cost of supplying an additional title is so low that you can start taking advantage of the long tail: each title has low demand, but by selling enough of them you make enough money for it to be worthwhile. He estimates the long tail may increase revenues by 30%.

There is still a way to go for this though, because the connections still aren’t fast enough. He used Netflix, an online DVD rental company, as an example. They ship 1.7m DVDs a day. Assuming each movie only has one DVD and a steady data flow, that requires a 740 Gbit/s connection! Maybe the movie industry will want to reconsider the Bittorrent protocol and the DivX codec. ;)
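The 740 Gbit/s figure can be reproduced with a quick back-of-the-envelope calculation. The 4.7 GB capacity of a single-layer DVD is my assumption (it isn’t in my notes from the talk), and the function name is just for illustration, but with that value the numbers line up:

```python
DVDS_PER_DAY = 1_700_000          # figure quoted in the talk
DVD_BYTES = 4.7e9                 # assumption: single-layer DVD, 4.7 GB (decimal)
SECONDS_PER_DAY = 24 * 60 * 60


def required_gbit_per_s(dvds_per_day, dvd_bytes=DVD_BYTES):
    """Steady bandwidth needed to move a day's worth of DVDs as data, in Gbit/s."""
    bits_per_day = dvds_per_day * dvd_bytes * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


print(round(required_gbit_per_s(DVDS_PER_DAY)))  # roughly 740
```

Written out: 1.7 million DVDs × 4.7 GB × 8 bits is about 6.4 × 10^16 bits a day, and dividing by 86,400 seconds gives about 7.4 × 10^11 bit/s.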

Vint then went off for a bit about turning online gaming into a potential video conferencing platform. An odd idea, but Joi Ito agrees, so maybe I’m the one who doesn’t ‘get it’.

Mobiles and Mobility

Back to mobiles. There are two billion of them out there, and most are general purpose programmable devices just waiting for the right software. They’ll stress the architecture of the net, I assume through sheer numbers if they all went online. They have plenty of potential uses. They’re already used for instant messaging (SMS) and as payment systems (also SMS), but combine them with geolocation services and they get a lot more interesting. They’ll create a great opportunity to monetize information that has been indexed with geographic information.

IMHO there’s so much scope for what you could do with just local search, but throw in a portable handset and geolocation and the sky’s the limit. It might be a bit sacrilegious to post Yahoo! stuff in this post, but check out Checkmates and Event Browser. Okay, the latter isn’t for phones, but imagine being able to use your phone to find something happening in your area, then get all your friends to meet up. And that’s just the tip of the iceberg, but enough about that, back to the talk.

New Business Models

I think this is the only point where Vint talked about Google. They have introduced new business models, the main one of course being making online advertising work. So much so that Google is now known as an advertising company rather than a search company in certain circles.

With their APIs they’re trying to expand their advertising space rather than charging directly for the services, e.g. you have a database of information and make a Google Maps mashup, then place Google ads on the site. You get to make a cool app fairly easily and Google gets its value from having more ads out there.

I’d be interested to know if Google’s main business plan is to simply create more places to put adverts, or if they plan to come up with different revenue streams.

Back to Basics

Here’s the bit for the PhD students. Vint brought up potential research areas, based on how he’d design the Internet if given the chance again today:

  • End/end connectivity - Basically get rid of NAT
  • Authentication - Get the end points of the network to authenticate themselves
  • Flexible VPN membership - Make it part of the Internet rather than a bolt on
  • Mobility (and spectrum sharing) - Consider a Boeing plane with Internet access, it’s basically a whole network that is getting connected to different points as it flies
  • Confidentiality
  • Early vs. Late Binding of DNS - Late binding is better for mobiles
  • Broadband everywhere - Assume faster connections

Policy and Governance

I didn’t take too many notes during this bit. Something about a World Summit creating a working group, which then decided to hold a forum, and all the people involved were shocked that something like the Internet was under central control.

But it wasn’t all negative: he reckons that with such a forum they might be able to start tackling problems like spam and fraud. Maybe.

Internet Enabled Appliances

More and more non-computers are connecting to the net. One particularly cool example was an Internet enabled photo frame that you can give to your family, then update the photos on a website. But as more devices get connected we will have to make the jump to IPv6 just to have enough addresses for them all.
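To put the address shortage in numbers: IPv4 addresses are 32 bits long and IPv6 addresses are 128 bits long. A small sketch using Python’s standard `ipaddress` module shows the difference in the size of the two address spaces (the variable names are mine):

```python
import ipaddress

# The "/0" networks cover every possible address in each protocol.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:.3e}")
print(f"IPv6 space is 2**{(ipv6_total // ipv4_total).bit_length() - 1} times larger")
```

IPv4 tops out at about 4.3 billion addresses in total, which is not much headroom once phones, photo frames and fridges all want one.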

RFID also makes things a lot more interesting by letting things collaborate, e.g. your Internet enabled fridge can identify what you have, check a recipe online, then text you what things you need to buy. Or maybe just bypass you and order them from the supermarket itself.

We’ll see a shift in how we use the Internet, from accessing it through one terminal to using multiple devices that collaborate.

InterPlaNetary Internet (IPN)

An interesting project, but ultimately related to the Internet only by name. Currently the things we send into space have custom networking hardware and protocols for efficiency reasons. The downside to this is that you cannot reuse equipment left in space for future missions.

Vint is working at the Jet Propulsion Laboratory on a standard networking protocol for things we send into space. So each mission means we’ll have another node in the network that can be used for things such as communication, sensor readings, etc. It looks like NASA is going to adopt this for their missions for the next 40 years.

Q&A

There was a brief Q&A session, nothing of note except for a book recommendation, The Singularity Is Near. Amazon describes it as:

Inventor and futurist Ray Kurzweil examines the next step in the evolutionary process of the union of human and machine. Kurzweil foresees the dawning of a new civilization where we will be able to transcend our biological limitations and amplify our creativity, combining our biological skills with the vastly greater capacity, speed and knowledge-sharing abilities of our creations. In practical terms, human ageing and illness will be reversed; pollution will be stopped and world hunger and poverty will be solved. There will be no clear distinction between human and machine, real reality and virtual reality. “The Singularity is Near” offers a view of the coming age that is both a dramatic culmination of centuries of technological ingenuity and a genuinely inspiring vision of our ultimate destiny.

I might pick up a copy, but I’ve been warned it’s quite long and goes into a lot of detail.

After the talk there were drinks and food. As I mentioned in an earlier post, I didn’t get a chance to meet anyone from Google but did catch up with a few people from Imperial. Cheers Google for hosting the night.

Here’s my friend Simon’s take on the evening, and here is another writeup of the same talk given at Imperial College earlier that day.

They filmed the talk but I haven’t been able to find the video online. If I find the link, I’ll post it here.


Google Talk Aftermath

Tuesday, March 7th, 2006

I’m confused. I’m not sure what I just attended. Don’t get me wrong, it was an interesting talk by Vint Cerf (which I’ll write up tomorrow), but I thought I was walking into a recruitment event. I even have the pad of paper with www.google.com/jobs printed along the bottom. Maybe I’ve been spoilt by the investment banking recruiting events I went to during university, but this just seemed like a missed opportunity for Google. You’re struggling to fill your London R&D centre, you get a hundred or more smart computing people in one room, so why aren’t we being told about what a great place Google is to work at, and all the interesting things we would be working on?

There were between five and ten Google employees there, I didn’t get a chance to speak to one of them but from what I overheard they were talking about Google in general rather than what is actually planned for the London office. I think part of the reason why I’m so disappointed was that I wanted to be tempted from my current job. I wanted to see this great office (it is very nice), with excited people, working on interesting problems, but I didn’t. From what I’ve read in the press and their job listings, Google plans to focus on mobile Internet in London but nothing tonight confirmed that.

On the bright side I did catch up with several people from my old department. The other reason I went along was to see if I could find any potential hires for the positions at Runtime, but that was a no go. There were some other people scouting too, and I found out from them that since the dot-com crash the number of computing graduates has fallen and a lot of them (from Imperial at least) are getting hoovered up by banks. I guess that is the problem Google is having too. Straight after university people join an investment bank, get paid a lot of money, then decide they don’t like the work and quit but aren’t willing to take the pay cut to join a software company. Annoying, but that’s the way it is in London.

Well, that’s enough ranting for one night.

Update: I was wondering why there weren’t any Imperial undergraduates at the talk (just postgraduates). It turns out Vint Cerf gave the same talk a few hours earlier at IC.


Google Open House - Vint Cerf

Tuesday, March 7th, 2006

Tonight I’m going to a talk by Vint Cerf at Google’s London office. It’s a great chance to hear one of the ‘founding fathers of the Internet’ speak, but it looks like it’s also going to be a recruiting event.

I’d be lying if I said the idea of working at Google wasn’t an attractive one, but their current engineering jobs in London aren’t geared towards my specialities, unless of course I make the switch to mobile technologies. Ironically at my current job at Runtime Collective, I get to work on more search related things than those listed at Google.

My friend Simon is going too, and I’m sure there will be other old faces from Imperial. I’ll write up the talk here either later tonight (unlikely if I manage to catch the end of the Barcelona/Chelsea game) or sometime tomorrow.


Yahoo! MyWeb 2.0 and 360° Update

Wednesday, March 1st, 2006

I’ve been using Yahoo!’s MyWeb 2.0, and 360° to a lesser extent, for a while now. I’m not overwhelmed by either.

MyWeb 2.0

I’ve made it my default search in Konqueror so virtually all my searches have been with it. In two weeks I’ve only saved six pages. I don’t want to save a page unless I feel it’s useful, but by the time I determine that, I’ve long left the search results page. If it integrated into your browser’s bookmarks, I think it would be a lot more useful, but it’s still just a bookmark tool.

It doesn’t use your saved pages or the searches you’ve done to personalise your search in any way, which is disappointing. It’s literally a bookmarking tool. It doesn’t suggest other pages you might be interested in, or other people you might want to add to your community. So unless you know a lot of people using MyWeb 2.0, you’re never going to build a web big enough to restrict your searches to it.

There are a few other nuisances as well. You can edit the tags of your saved pages, but not the notes. Most of the pages I just saved and planned to come back to annotate them, so it was frustrating to find that I couldn’t edit the description. MyWeb 2.0 is also limited to the American search so I couldn’t limit to UK sites, which is something I do quite often. I also found when I was looking for less common information, Google is still miles ahead of Yahoo!’s search.

360°

Yahoo! 360° is essentially a homepage maker for the social networking world. It gives you a blog, lets you put up links, photos, etc. It wasn’t what I originally thought it was and since it’s more geared to community and content generation I decided to give it a miss.


Collaborative Searching

Tuesday, February 14th, 2006

I’ve been asked to come up with ideas for improving searching when lots of people are involved. Actually it was more like ‘MMORPGs are great, apply their ideas to search’. A bit vague but I get the gist, basically what is it that keeps people coming back to MMORPGs? The obvious answer is they’re fun. Other answers could be:

  • Sense of community
  • Discovering new things
  • Reward for effort
  • Meeting place
  • Meeting new people

Keep in mind I haven’t played an MMORPG since the Ultima Online beta, and I was back in school then. World of Warcraft looks to be the current ‘in thing’, but no computer I own can run it.

Applying these ideas to search is hard because I think of search as a solitary activity and you have a good idea what you’re looking for. Sure there are things like delicious, digg, etc. but those aren’t so much search, but more just collaborative filtering. My current domain of expertise is crawling the web and building searches.

In the Web 2.0 world the closest thing I can find is Yahoo’s MyWeb 2.0:

http://myweb2.search.yahoo.com

It’s a search engine where you can build up a collection of pages to form your own web. That by itself isn’t too exciting since it’s just a glorified bookmarks tool with a search across it. What’s more interesting is the community web feature. You can hook up with people with similar interests and form your own web of interesting pages. This puts a different spin on the whole delicious/digg thing, so instead of topical pages you can build up a database of useful information.

I think applying this to the web isn’t going to take off because there is so much information out there you’re unlikely to want to limit yourself, and people in your community are going to be covering a wide range of subjects. But the concept is good and the search engines I write are for corporate customers so when the community is a research team, this idea begins to make a lot more sense.

So I’m going to make Yahoo! MyWeb my default search engine for a while and see what ideas come from it.


Google tracking result choices

Friday, January 21st, 2005

It looks like Google is starting to track which search results people select. This technique is used in many enterprise search products to allow the administrator to help users refine searches, create synonym lists, etc.

[Image: Google recording choices]

They aren’t doing it for all searches but I found one query. If you search for ‘oyster bay wines’ on Google UK, the result links will go to a preprocessor rather than directly to the site. Check out the status bar in the image:

I would be surprised if Google has started to manually analyze their searches so I assume they now have some clever software that does it for them.
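For anyone unfamiliar with the technique, here is a minimal sketch of how click tracking through a redirect generally works. This is not Google’s actual mechanism: the endpoint URL, the parameter names (`q`, `pos`, `dest`) and both function names are hypothetical, chosen only to illustrate the idea that the results page links to a tracker, which logs the choice and then forwards the user to the real destination.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

TRACKER = "https://search.example/url"  # hypothetical redirect endpoint


def tracked_link(target_url, query, position):
    """Build the link shown on the results page: it points at the tracker, not the site."""
    params = urlencode({"q": query, "pos": position, "dest": target_url})
    return f"{TRACKER}?{params}"


def record_click(link, log):
    """Simulate the tracker: log (query, position, destination), then return the redirect target."""
    params = parse_qs(urlsplit(link).query)
    dest = params["dest"][0]
    log.append((params["q"][0], int(params["pos"][0]), dest))
    return dest


clicks = []
link = tracked_link("http://www.example.co.nz/wines", "oyster bay wines", 1)
print(link)                        # what would appear in the status bar
print(record_click(link, clicks))  # where the user ends up
print(clicks)                      # what the search engine learns
```

With enough of these log entries you can see which result people actually pick for a given query, which is the raw material for the refinement and synonym features mentioned above.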


Yahoo! Once a portal, always a portal

Friday, January 21st, 2005

Yahoo! has had a separate search page for a while. Initially it was a copy of the simplicity of the Google home page (see Google’s reasons, although I’ve also heard it’s like that because the founders weren’t particularly good at HTML). I always thought that was a good idea because the main Yahoo! page has become so cluttered over the years it’s now annoying to use.

But it looks like they couldn’t stay away. The Yahoo! search page now features news headlines and some stock information. I wonder how long it will be until the two pages are identical.
