Kate Noerr: Federated search luminary. By Sol, http://federatedsearchblog.com/2008/08/08/kate-noerr-federated-search-luminary/, August 8, 2008
I am honored to have had the opportunity to interview Kate Noerr for the federated search luminary series. Kate is co-founder, Chairman, and CEO of MuseGlobal, a leading supplier of content integration software. Through MuseGlobal and in prior businesses, Kate has been developing innovative solutions to what are now termed "federated search" problems since the 1980s. Kate is not satisfied to address only the challenges of federation. Her company considers these interrelated areas to be critically important as well: harvesting, transformation, enhancement, security, source maintenance, and multiple delivery mechanisms. Kate is the first luminary I am recognizing on this blog. You can find future luminary interviews in the luminary category. I invite you to nominate people who deserve to hold the federated search luminary distinction.
My experience over many years has focused on content and search ("information retrieval" in the olden days). I've dealt with document delivery (before the Internet), with indexing solutions, with extended networks providing content of various sorts, and with libraries. A word about libraries: libraries are extensive users of content of all sorts, whether that content is contained in a book or other medium, or comes through a wide variety of journals and other sources of information. As technology has advanced, so has the technology applied to automating libraries. Thus I (and most of the executive team at MuseGlobal) have broad and deep expertise in understanding and working with all forms of content, including access to that content: 'search'. The problem of content spread across multiple repositories has always been an obvious one, and we built MuseGlobal to solve not only that problem but all the issues associated with it. These include integrating into third-party solutions, authentication and other rights management concerns, and of course processing the results, both iteratively and on a one-time basis, in both syntactic and semantic forms, and then delivering the results to a platform.
Well, the need for access to multiple content sources has always existed. The early online aggregators (Dialog, Orbit, BRS) all tackled this problem, and we (the founders of MuseGlobal) also tackled it in multiple ways. In fact, an early business of mine, well before the Internet, provided what we would now call 'federated' access to standards information, delivered by post on floppy disk in regularly updated form. (Note: this was a great idea, and a precursor to much of what we do now, but it was the 1980s, much too early to be widely accepted.) What is now called 'federated search' is, to me, only one aspect of what needs to be done to successfully gather, transform, and deliver content.
For a while, in the late 1990s (and well before that in the library space), some aspects of federation were called cross-search or broadcast search, and there was a concerted attempt to use the term metasearch, which, of course, is now used mainly to refer to web federation.
The URL was available! Initially Muse was an acronym, but I've long since forgotten what it stood for. [ Editor's note: Those of you who are dying to know what the "Muse" acronym stands for are invited to do a little bit of research on the Wayback Machine! ]
We are unique primarily because we offer the most complete and flexible content integration solution in the marketplace. Our functionality encompasses seven areas: federation, harvesting, transformation, enhancement, security, source maintenance, and multiple delivery mechanisms. In addition, several other elements of our technology are exclusive to MuseGlobal, including:
§ Our solutions are designed to address integration at the back end, and as such have a very robust Information Connection Engine (ICE) underlying our architecture.
§ We built the Source Factory, a constantly expanding library of more than 5,500 content sources, to handle building, fixing, and delivering source connections (variously called connectors, adaptors, workers, etc., depending on the company). This is a highly scalable workflow system that automates the daily checking of the many thousands of connectors we've built, and automates much of the building, the diagnosis of fixes, and some fixing of the tools.
§ We deliver 'fixes' in an automated broadcast way, similar to the way anti-virus definitions are sent out. As you know, a connector can be expected to 'break', whether because of something simple, such as a URL change, or something more complex, such as a switch to a different search engine, a format change, an internal record structure change, etc.
§ Early on we started what we call the Content Partner Program: non-commercial relationships with hundreds of content providers, which ensure that we have access to their most appropriate API or gateway, that results from their content are served up to users in the way the content provider wants and the users expect, and that we are proactively informed about forthcoming changes to their environment.
Wonderful! The first few sales are closing now, and Adhere and MuseGlobal anticipate a continued, highly successful partnership.
We have dealt with the cloud from the beginning, and our ICE architecture (described above) is fundamental in supporting that. By the way, I love the term cloud computing; it conjures up all sorts of lovely images. And of course, the data center in the sky (the cloud) is a variation on the ASP model. I see this as another opportunity for us with organizations that are using cloud computing, considering it, or offering it. We deal with a number of companies along these lines, and technically it's not really different from what we've been doing on the web all along.
Our aim is to continue to grow and be successful with multiple lines of business, as we have to date and expect to continue to do. We will remain an OEM provider, as it is my firm belief that we are but part of an overall solution, whether that's a CRM system, a content management system, an integrated library system, or any other system or service.
To my mind, the biggest challenge is educating the marketplace to understand what problems federated search (and, I would say, content integration more broadly) addresses, and how it fits into their total environment. There are a large number of activities underway in the marketplace in general, particularly in the enterprise, which is great news. In considering your recent blog post on challenges to federated search, here are my comments.
One of the fundamentals of federated search is normalizing search queries across different search engines, which for MuseGlobal means mapping the search query, in all its simplicity or complexity, to the multiple underlying search engines. This involves a true understanding of search: the ability to represent Boolean logic where an engine doesn't support it, and to expand queries where needed, up to the limits of what the search engine accessing the content can handle or be 'forced' to handle. An even more important aspect is mapping the content in as rich a form as possible, so as to deliver a full experience to the end user rather than a 'dumbed-down' version of a search or a result. This process is helped where standards are used and well implemented; as is well known, standards use tends to vary, but the standards themselves are critical, whether we're talking about SRU/SRW, XML and its variants, or even HTTP.
While some may think the 'ideal' situation is everything in one repository, that will assuredly never happen, under any circumstances. Nor do I think that would be an ideal situation. Why would anyone want everything in one vast repository? What happens to all the specialized information, to all your own unique information, to sensitive commercial (and other) information, and on and on? Google Scholar is one of many, many aggregations of collections of information. That's fine, and a great thing to do, but it does not and never will contain all the world's scholarly information, nor does it attempt to. It's one of the thousands of sources for which we provide a very well-mapped connector, which, combined with many other sources, gives users a more complete result set.
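To make the point about "representing Boolean where that doesn't exist" concrete, here is a minimal Python sketch. It is not MuseGlobal's Information Connection Engine: the names `Term`, `And`, `Or`, `to_dnf`, `render_boolean`, and `search_keyword_only` are hypothetical. It also assumes a target engine that accepts only space-separated keywords combined with an implicit AND. For such an engine, a Boolean query is rewritten into disjunctive normal form (an OR of AND-clauses). Each clause is sent as its own keyword query, and the result lists are merged without duplicates. An engine with native Boolean syntax simply receives the rendered query string.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product
from typing import Callable, Union


# Hypothetical query tree: single keywords combined with AND / OR.
@dataclass(frozen=True)
class Term:
    word: str


@dataclass(frozen=True)
class And:
    left: Query
    right: Query


@dataclass(frozen=True)
class Or:
    left: Query
    right: Query


Query = Union[Term, And, Or]


def to_dnf(q: Query) -> list[list[str]]:
    """Rewrite a query as an OR of AND-clauses (disjunctive normal form)."""
    if isinstance(q, Term):
        return [[q.word]]
    if isinstance(q, Or):
        # An OR simply collects the clauses of both sides.
        return to_dnf(q.left) + to_dnf(q.right)
    # AND distributes over OR: (a OR b) AND c -> (a AND c) OR (b AND c)
    return [
        left_clause + right_clause
        for left_clause, right_clause in product(to_dnf(q.left), to_dnf(q.right))
    ]


def render_boolean(q: Query) -> str:
    """For an engine with native Boolean syntax, emit the query as a string."""
    if isinstance(q, Term):
        return q.word
    op = "AND" if isinstance(q, And) else "OR"
    return f"({render_boolean(q.left)} {op} {render_boolean(q.right)})"


def search_keyword_only(q: Query, engine: Callable[[str], list[str]]) -> list[str]:
    """For a keyword-only engine (implicit AND): send one query per AND-clause
    and union the returned record IDs, keeping first-seen order."""
    merged: dict[str, None] = {}
    for clause in to_dnf(q):
        for record_id in engine(" ".join(clause)):
            merged.setdefault(record_id, None)
    return list(merged)


if __name__ == "__main__":
    # A toy keyword-only "engine" backed by a dict, standing in for a remote source.
    index = {"federated library": ["r1", "r2"], "metasearch library": ["r2", "r3"]}
    q = And(Or(Term("federated"), Term("metasearch")), Term("library"))
    print(render_boolean(q))  # ((federated OR metasearch) AND library)
    print(search_keyword_only(q, lambda s: index.get(s, [])))  # ['r1', 'r2', 'r3']
```

One design trade-off is worth noting. Distributing AND over OR can make the number of clauses grow exponentially with nested ORs. This is one concrete reason a connector has to respect "the limits of whatever the search engine accessing the content can handle."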
It's also well known in the library world that libraries are vastly under-utilized. However, my point about the biggest challenge being education of the marketplace is aimed at the non-library marketplace. That includes education (as in Blackboard, etc.), enterprise search (as in FAST, Endeca, Oracle's Secure Enterprise Search, IBM's OmniFind), content management systems (EMC's content suite, etc.), and CRM systems (Salesforce, etc.). The world at large, so to speak, is beginning to understand and address the issues surrounding multiple silos, and I consider this fabulous news. This, I believe, is what the Federated Search blog refers to as the 'new e-content business environment'. It gives enormous opportunity to technologies such as ours, and it's what I find most satisfying and thrilling about building a company. The world is coming to our door!
Hard to say. Perhaps that we've built a company and a technology that are both very scalable and adaptable to a wide variety of lines of business. We have several very large initiatives underway that will be revealed in the next two quarters.
Thank you, Kate, for a most informative interview that I'm sure blog readers will value.