Lucene Search for multilingual websites, or separating searches by node

I searched the internet but didn't find a proper solution for using Lucene Search on multilingual websites, so I decided to write this tutorial to help others.
In this tutorial I will show how to separate searches per language on a multilingual website using Lucene search in Umbraco.
Let's assume our website is in two languages, English and Danish.
Then the nodes in Umbraco will look something like this:
>EN   
      >>page1
      >>page2
      >>page3
>DK
      >>Page1
      >>Page2
      >>Page3
The EN node is the home page of the English website and the DK node is the home page of the Danish website.

To keep the searches separate we need to create two separate indexes, one for each language.
Each index will cover one language's home page node and all of its child nodes (IndexParentId sets the node where the index starts).
We can do it like this:


<IndexSet SetName="ENIndexSet" IndexPath="/App_Data/ExamineIndexes/EN/" 
IndexParentId="1165">
    <IndexAttributeFields>   <!--1165 is id of the "EN" node -->
       <add Name="id" />
      <add Name="nodeName" />
      <add Name="updateDate" />
      <add Name="writerName" />
      <add Name="path" />
      <add Name="nodeTypeAlias" />
      <add Name="parentID" />
    </IndexAttributeFields>
    <IndexUserFields />
    <IncludeNodeTypes />
    <ExcludeNodeTypes />
  </IndexSet>
  
  <IndexSet SetName="DKIndexSet" IndexPath="/App_Data/ExamineIndexes/DK/" 
IndexParentId="1624">
    <IndexAttributeFields> <!--1624 is id of the "DK" node -->
      <add Name="id" />
      <add Name="nodeName" />
      <add Name="updateDate" />
      <add Name="writerName" />
      <add Name="path" />
      <add Name="nodeTypeAlias" />
      <add Name="parentID" />
    </IndexAttributeFields>
    <IndexUserFields />
    <IncludeNodeTypes />
    <ExcludeNodeTypes />
  </IndexSet>
 
With this setup, the index set named ENIndexSet will only index the "EN" node and all of its child nodes, and DKIndexSet will only index the "DK" node and all of its child nodes.
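
If you only want certain document types or properties in each index, the empty elements in the index set can be filled in. Here is a minimal sketch, assuming a document type alias "TextPage" and a property alias "bodyText" (both are hypothetical, use the aliases from your own site). As far as I know, leaving these elements empty means all document types and all user fields get indexed, so this step is optional.

```xml
<!-- Sketch only: "TextPage" and "bodyText" are hypothetical aliases -->
<IndexSet SetName="ENIndexSet" IndexPath="/App_Data/ExamineIndexes/EN/"
          IndexParentId="1165">
  <IndexAttributeFields>
    <add Name="id" />
    <add Name="nodeName" />
    <add Name="path" />
  </IndexAttributeFields>
  <IndexUserFields>
    <!-- custom document type properties to index -->
    <add Name="bodyText" />
  </IndexUserFields>
  <IncludeNodeTypes>
    <!-- only nodes of these document types are indexed -->
    <add Name="TextPage" />
  </IncludeNodeTypes>
  <ExcludeNodeTypes />
</IndexSet>
```

The DK index set would get the same treatment, keeping its own SetName, IndexPath and IndexParentId.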

We also need to create index providers and searchers, a separate one of each for every index.
Providers:
<!-- UmbracoContentIndexer indexes content nodes; UmbracoMemberIndexer would index members instead -->
<add name="ENIndexer"
     type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
     indexSet="ENIndexSet"
     supportUnpublished="false"
     supportProtected="true"
     interval="10"
     analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>

<add name="DKIndexer"
     type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
     indexSet="DKIndexSet"
     supportUnpublished="false"
     supportProtected="true"
     interval="10"
     analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>


Searchers:
<add name="ENSearcher"  
type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" 
indexSet="ENIndexSet"
analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"  
enableLeadingWildcards="true"/>
<add name="DKSearcher" 
type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"  
indexSet="DKIndexSet"
analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" 
enableLeadingWildcards="true"/>

Now we need to create two user controls, one for each language, to take the search words from the user.
In its page load handler, each user control can redirect to its corresponding search results page, passing the search words in the "s" query string parameter.
The search results page calls the XSLT macro, which performs the search.
The XSLT will be something like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
  <!ENTITY nbsp "&#x00A0;">
]>
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:msxml="urn:schemas-microsoft-com:xslt"
  xmlns:umbraco.examine="urn:umbraco.examine"
  xmlns:umbraco.library="urn:umbraco.library"
  xmlns:Exslt.ExsltCommon="urn:Exslt.ExsltCommon"
  xmlns:Exslt.ExsltDatesAndTimes="urn:Exslt.ExsltDatesAndTimes"
  xmlns:Exslt.ExsltMath="urn:Exslt.ExsltMath"
  xmlns:Exslt.ExsltRegularExpressions="urn:Exslt.ExsltRegularExpressions"
  xmlns:Exslt.ExsltStrings="urn:Exslt.ExsltStrings"
  xmlns:Exslt.ExsltSets="urn:Exslt.ExsltSets"
  exclude-result-prefixes="msxml umbraco.examine umbraco.library Exslt.ExsltCommon Exslt.ExsltDatesAndTimes Exslt.ExsltMath Exslt.ExsltRegularExpressions Exslt.ExsltStrings Exslt.ExsltSets ">


  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:param name="currentPage"/>

  <!-- Get the search term from the query string-->
  <xsl:variable name="searchTerm" select="umbraco.library:RequestQueryString('s')" />

  <!-- "s" is the query string parameter that the user control passes when redirecting to the search page -->

  <xsl:template match="/">

    <!-- Check if there's a search term to search on-->
    <xsl:if test="string-length($searchTerm) > 0">

      
      <!-- Get the search results from examine -->
      <!--************************ THIS IS WHERE THE MAGIC HAPPENS *****************************-->
            
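      <!-- true() is the XPath boolean that enables wildcards; a bare "true" would be read as an element name -->
      <xsl:variable name="results" select="umbraco.examine:SearchContentOnly($searchTerm, true(), 'ENSearcher')"/>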

      <!--************************ END OF MAGIC ************************************************-->
      
      <p>
       &nbsp;&nbsp;&nbsp;&nbsp; Search text:&nbsp;<b>
          <u>
            <xsl:value-of select="$searchTerm"/>
          </u>
        </b>&nbsp;&nbsp;<i>
          <b>
            <xsl:value-of select="count($results//node)"/>
          </b>&nbsp;result(s)
        </i>
      </p>

      <xsl:if test="count($results//node) > 0">

        <!-- there is a result, so show them in the order of best score -->

        <ul>
          <xsl:for-each select="$results//node">
            <xsl:sort select="number(@score)" order="descending"/>

            <li>

              <!-- add alternating colors -->
<!--              <xsl:choose>
                <xsl:when test="position() mod 2 ">
                  <xsl:attribute name="class">secondary-a-5</xsl:attribute>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:attribute name="class">secondary-b-4</xsl:attribute>
                </xsl:otherwise>
              </xsl:choose>  -->

              <!-- Get the URL -->
              <xsl:variable name="url" select="concat('http://', umbraco.library:RequestServerVariables('SERVER_NAME') , ':' , umbraco.library:RequestServerVariables('SERVER_PORT') ,umbraco.library:NiceUrl(./data[@alias='id']))" />

              <!-- Create the search result line item-->
      <!--        <a href="{$url}">
                <span class="title">
                  <xsl:value-of select="./data[@alias='nodeName']"/>
                </span>
                <span class="link">
                  <xsl:value-of select="$url"/>
                </span>
                <span class="score">
                  <xsl:value-of select="./@score"/>
                </span>
              </a> -->
              <div>
                <div id="searchTitle"><a href="{$url}"><xsl:value-of select="./data[@alias='nodeName']"/></a> </div> <div id="searchTime"> <!-- <xsl:value-of select="./@score"/> --> </div>
              </div>
              <xsl:value-of select="umbraco.library:TruncateString(./data[@alias='content'], 360,'...')" disable-output-escaping="yes"/>
              <!--<xsl:value-of select = "./data[@alias='content']"/> -->
              <div id="searchLink">
                <a href="{$url}"><xsl:value-of select="$url"/></a>
              </div>
            </li>

          </xsl:for-each>
        </ul>
      </xsl:if>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>


This is the XSLT that searches the English website.

To create the XSLT for the Danish site you just need to change ENSearcher to DKSearcher, which is the name of the searcher.
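
If you would rather not keep two copies of the same XSLT, one option is to pass the searcher name in as a macro parameter. This is only a sketch: it assumes you add a text parameter with the alias "searcherName" (a hypothetical name) to the macro and set it to ENSearcher or DKSearcher on each search results page. These two variables would replace the single "results" variable in the XSLT for the English site.

```xml
<!-- Assumption: the macro has a parameter with the hypothetical alias
     "searcherName", set to ENSearcher or DKSearcher where the macro is inserted -->
<xsl:variable name="searcherName" select="string(/macro/searcherName)" />

<!-- Same search call as before, with the searcher name taken from the macro parameter -->
<xsl:variable name="results"
              select="umbraco.examine:SearchContentOnly($searchTerm, true(), $searcherName)" />
```

With this, the EN and DK search results pages can use the same macro and differ only in the parameter value.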

This is how we can separate the searches using Lucene Search for multilingual websites.

For any doubts or suggestions, please leave a comment here.
