Google Desktop for Linux With Apache2 On LAN

火星人 @ 2014-03-04 , reply:0



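Preface:
    Two years ago I made a first attempt at combining Google Desktop with Apache for document search on a LAN. The original post is here: "First original post: building an enterprise search server with Google Desktop Search" http://blog.chinaunix.net/u/13472/showart.php?id=73880
    At that time Google Desktop for Linux had not yet been released, so my Samba file server had no integrated search service, and I waited for it eagerly. When the Linux version did come out, it turned out not to support searching the proprietary MS document formats, and I was disappointed again for quite a while. At last Google Desktop for Linux v1.1.1.0075 arrived with indexing support for DOC, XLS, and PPT, so I set about putting it on my Samba server, to offer a simple search server alongside the Samba service.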
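Main text:
    The principle is the same as in the earlier post: Apache acts as a proxy for Google Desktop. The earlier post said a port mapper was needed. Later searching showed that the real cause was a missing reverse-proxy setting: adding a ProxyPassReverse directive after ProxyPass avoids the problem. As a result, Google Desktop behind Apache no longer needs any client-side setup. A browser is enough, and a file browser can fill that role.
    If this Apache has no other purpose, then as in the earlier post, assign the server a second IP dedicated to the Google Desktop proxy. A simple configuration file follows: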

NameVirtualHost 192.168.1.120:80
<VirtualHost 192.168.1.120:80>
        ServerAdmin webmaster@localhost
        ServerName 192.168.1.120

        # Note: port 30043 is different for every Linux user; note down the
        # Google Desktop start page address on the desktop beforehand.
        ProxyPass / http://127.0.0.1:30043/
        ProxyPassReverse / http://127.0.0.1:30043/
        <Proxy http://127.0.0.1:30043>
                Allow from all
        </Proxy>

        <Directory />
                Options FollowSymLinks
                AllowOverride None
                Allow from all
        </Directory>

        <Location /redir>
                Deny from all
        </Location>

        <Location /openfolder>
                Deny from all
        </Location>

</VirtualHost>
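
Because the Google Desktop port differs per Linux user, it can help to generate the proxy part of the virtual host from two parameters instead of editing it by hand. The following is only a minimal sketch: the template is a trimmed version of the configuration above, and `render_vhost` is a hypothetical helper name, not something that ships with Apache or Google Desktop.

```python
# Sketch: render the reverse-proxy virtual host for a given LAN IP and
# per-user Google Desktop port. Template and helper name are assumptions.
VHOST_TEMPLATE = """\
NameVirtualHost {ip}:80
<VirtualHost {ip}:80>
    ServerName {ip}

    ProxyPass / http://127.0.0.1:{port}/
    ProxyPassReverse / http://127.0.0.1:{port}/

    <Location /redir>
        Deny from all
    </Location>
    <Location /openfolder>
        Deny from all
    </Location>
</VirtualHost>
"""


def render_vhost(ip: str, port: int) -> str:
    """Fill in the listen address and the local Google Desktop port."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return VHOST_TEMPLATE.format(ip=ip, port=port)


if __name__ == "__main__":
    # 30043 is the port from the post; yours will differ.
    print(render_vhost("192.168.1.120", 30043))
```

The output can be pasted into the Apache site configuration after checking the port against the start page address of the user who runs Google Desktop.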

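    Before restarting Apache, change the user that Apache runs as to the user that runs Google Desktop. Google Desktop's index files are readable only by the single Linux user who owns them, so an Apache started as any other user cannot read Google Desktop's data and therefore cannot proxy it.
    With all of this in place and Apache restarted, the on-LAN Google Desktop can be reached at http://192.168.1.120/XXXXXXXX (the omitted part is the Google Desktop start address, which is different for every Linux desktop user).
    Next, I tried to integrate this on-LAN Google Desktop into the file server. Remembering that URL suffix is hard, so it makes sense to store the homepage file on the file server: users reach the file through the file server and click it to open the search proxy.
    Note that a homepage saved as-is cannot search when it is opened from the file server, because of its relative addresses. So I made the following changes: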
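
A quick way to confirm which account Apache is configured to run as is to read the `User` directive out of the configuration text. The sketch below is a simplified assumption-laden check: it only looks at a single file's text, does not follow `Include` lines, and on Debian-style layouts the value may be a variable such as `${APACHE_RUN_USER}` defined elsewhere. The function names are hypothetical.

```python
import re
from typing import Optional


def configured_user(conf_text: str) -> Optional[str]:
    """Return the value of the last `User` directive in conf_text, or None."""
    user = None
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip commented-out directives
        # `user\s+` requires whitespace, so `UserDir ...` does not match.
        match = re.match(r"(?i)user\s+(\S+)$", stripped)
        if match:
            user = match.group(1)
    return user


def runs_as(conf_text: str, expected_user: str) -> bool:
    """True if the configuration names expected_user as the run user."""
    return configured_user(conf_text) == expected_user


if __name__ == "__main__":
    sample = "# User nobody\nUser alice\nGroup alice\n"
    print(configured_user(sample))
```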

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"

"http://www.w3.org/TR/html4/loose.dtd">

<html>

<head>

<meta http-equiv="content-type" content="text/html; charset=utf-8">

<meta http-equiv="cache-control" content="no-cache">

<meta http-equiv="pragma" content="no-cache">

<meta http-equiv="expires" content="-1">

<title>Google 桌面</title>

<style>

body,p,td{font-family:arial,sans-serif;color:#000}body{background-color:#fff;margin:4px}img{border:0}table,td{border:0;margin:0;padding:0}.nowrap{white-space:nowrap}.none{display:none}.inline{display:inline}.float_left{float:left}.logo3{margin-top:9px;padding-bottom:10px}a:visited{color:#551a8b}a:link{color:#00c}a:active{color:#00c}a:hover{color:#00c}

.q{color:#00c;padding:4px 0 4px 4px;margin:0;white-space:nowrap}.q a:visited{color:#00c}.q a:link{color:#00c}.q a:hover{color:#00c}.q a:active{color:#00c}span{margin:0px}div{border:0;margin:0;padding:0}div#basic{margin:7px}div#advanced{margin:7px}div#search_box{padding-top:30px;padding-bottom:30px}div#line{background-color:#39c;height:1px}div#bottomquery{background-color:#e8f4f7}

div#querybuttons{padding-top:20px;padding-bottom:20px;text-align:center}div#bottom_links{text-align:center;font-size:small;padding-bottom:80px;white-space:nowrap}p#copyright{padding-top:3px;font-size:x-small;white-space:nowrap}div#home_bottom span#homelink{display:none}div#pref_bottom div#bottom_links{padding-bottom:10px}h1{color:#335cec;font-size:large;font-weight:bold}

div.centerwarning{text-align:center}h4#fixmsg,h4#lowdisk{color:#f60}

input#q { margin-bottom:1px }

div#idxprogress {

text-align: center;

color: #f60;

}

h4#idxongoing {

padding-top: 5px;

padding-bottom: 5px;

}

</style>

<script>

<!--

function sf() {

document.f.q.focus();

}

function sw() {

window.location = "http://www.google.com/search?sourceid=GGXD&rlz=1L1GGXD&hl=zh-CN&oe=UTF-8&sa=N&tab=xw&q=" + encodeURIComponent(document.f.q.value);

}

-->

</script>

</head>

<body onLoad=sf()>

<center>

<br>

<img src="image/hp-logo.gif?hl=zh_CN"

width=276 height=110 alt="Google 桌面">

<br><br>

<form name=f action="http://192.168.1.120/search" method=get>

<input type="hidden" name="hl" value="zh_CN">

<input type="hidden" name="s" value="IKfIRNbuy8oqOJPMZBNzffceB6c">

<div class="q">

<style>TD.q {white-space: nowrap}</style><style>#lgpd{display:none}</style><script defer><!--
function qs(el){if(window.RegExp&&window.encodeURIComponent){var ue=el.href,qe=encodeURIComponent(document.f.q.value);if(ue.indexOf("q=")!=-1){el.href=ue.replace(new RegExp("q=[^&$]*"),"q="+qe);}else{el.href=ue+"&q="+qe;}}return 1;}
//-->
</script><table border=0 cellspacing=0 cellpadding=4><tr><td nowrap><font size=-1><a class=q href="http://www.google.com/webhp?source_id=GGXD&rlz=1L1GGXD&hl=zh-CN&oe=UTF-8&q=GOOOOG&tab=xw" onclick="return qs(this)">網頁</a>&nbsp;&nbsp;&nbsp;&nbsp;<a class=q href="http://images.google.com/imghp?source_id=GGXD&rlz=1L1GGXD&hl=zh-CN&oe=UTF-8&q=GOOOOG&tab=xi" onclick="return qs(this)">圖片</a>&nbsp;&nbsp;&nbsp;&nbsp;<a class=q href="http://groups.google.com/grphp?source_id=GGXD&rlz=1L1GGXD&hl=zh-CN&oe=UTF-8&q=GOOOOG&tab=xg" onclick="return qs(this)">論壇</a>&nbsp;&nbsp;&nbsp;&nbsp;<a class=q href="http://news.google.com/nwshp?source_id=GGXD&rlz=1L1GGXD&hl=zh-CN&oe=UTF-8&q=GOOOOG&tab=xn" onclick="return qs(this)">資訊</a>&nbsp;&nbsp;&nbsp;&nbsp;<a class=q href="http://ditu.google.com/maps?source_id=GGXD&rlz=1L1GGXD&hl=zh-CN&oe=UTF-8&q=GOOOOG&tab=xl" onclick="return qs(this)">地圖</a>&nbsp;&nbsp;&nbsp;&nbsp;<b>桌面</b>&nbsp;&nbsp;&nbsp;&nbsp;<!--ENTERPRISE--><b><a href="http://www.google.com/intl/zh-CN/options/" class=q>更多&nbsp;&raquo;</a></b></font></td></tr></table></div>

<table cellspacing=0 cellpadding=0><tr><td width=25%>&nbsp;</td>

<td align=center>

<input id="q" maxlength=512 size=55 name=q value="" title="Google 桌面"><br>

<input type=submit value="搜索桌面">

<input type=button value="搜索網路" onclick=sw()>

</td>

<td valign=top nowrap width=25%><font size=-2>

&nbsp;&nbsp;<a href="http://192.168.1.120/options?hl=zh_CN&s=KfaPHSQRTXpTQAiCn0SSk8QuG2U">桌面使用偏好</a><br>

&nbsp;&nbsp;<a href="http://192.168.1.120/adv?hl=zh_CN&s=RInjEvQpp_ec0xuHJfV2RMld_nM">高級搜索</a><br>

</font></td>

</tr></table>

</form>

<br>

<div class="centerwarning">

<br>



</div>



<br>

<div id="home_bottom">

<div id="bottom_links">

<span id="homelink">

Google 桌面主頁

-&nbsp;

</span>

<a href="http://192.168.1.120/status?hl=zh_CN&s=1lSl07UkINdDG1B8XKs5981ICpM">索引狀態</a>

-&nbsp;

<a href="http://192.168.1.120/privacy?hl=zh_CN&s=7MTauqWsrjfyih2nmolpmXA1mfc">隱私權</a>

-&nbsp;

<a href="http://192.168.1.120/about?hl=zh_CN&s=9H5yl44biBXVBAjJUC77vy7RfPY">關於</a>

<p id="copyright">&copy;2007 Google</p>

</div>

</div>

</center>

</body>

</html>

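Every "http://192.168.1.120" string above was added by me. A homepage file opened this way can then trigger the search proxy server.

    My basic requirement was met, but this still fell short of what I wanted. The reason my server runs Apache in the first place is to give web access to the Samba file server across network segments, so the homepage file above can also be reached through the original Apache, yet search does not work from there (I have a limited number of IP addresses and cannot map every internal address to the outside). So the next step is to adapt the setup above to make it suitable for use over the Internet.
    Obviously the proxy can no longer sit at the root, because the root has to serve as the file server's homepage. So I proxy it at /googlesearch, and the proxy section becomes: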
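
The manual edit can also be scripted. The sketch below assumes the page as saved by the browser uses root-relative URLs such as `action="/search"` and `href="/options?..."`, which is my reading of the "relative addresses" problem; `absolutize` is a hypothetical helper, and the regular expression only handles `href` and `action` attributes.

```python
import re

# Matches href= or action= followed by an optional quote and a single
# leading slash (protocol-relative "//host" URLs are left alone).
_ROOT_RELATIVE = re.compile(r'\b(href|action)=(["\']?)/(?!/)')


def absolutize(html: str, base: str) -> str:
    """Prefix root-relative href/action URLs in html with base."""
    base = base.rstrip("/")
    return _ROOT_RELATIVE.sub(
        lambda m: f"{m.group(1)}={m.group(2)}{base}/", html
    )


if __name__ == "__main__":
    saved = '<form name=f action="/search" method=get>'
    print(absolutize(saved, "http://192.168.1.120"))
```

Absolute links such as the ones pointing at www.google.com are not touched, since they do not start with a slash.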

        ProxyPass /googlesearch/ http://127.0.0.1:30043/
        ProxyPassReverse /googlesearch/ http://127.0.0.1:30043/
        <Proxy http://127.0.0.1:30043>
                Allow from all
        </Proxy>

        <Location /googlesearch/redir>
                Deny from all
        </Location>

        <Location /googlesearch/openfolder>
                Deny from all
        </Location>

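    This proxy can open the homepage but cannot run a search at all. The reason is that the URLs Google Desktop uses when it starts a search all begin at the root, /search. So the URL has to be rewritten, as follows: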

        RewriteEngine On
        <Directory /Fileserver>
                # /Fileserver is the DocumentRoot directory.
                Options Indexes FollowSymLinks MultiViews
                AllowOverride None
                Order allow,deny
                allow from all
                RedirectMatch ^/search /googlesearch/search
        </Directory>
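
To see what the `RedirectMatch` line does to an incoming request, here is a small Python model of it. It is a sketch of my understanding of mod_alias, not a copy of its logic: the path is tested against the pattern, the whole path is replaced by the target, and the query string is carried over. `redirect_target` is a hypothetical name.

```python
import re
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

PATTERN = re.compile(r"^/search")   # first argument of RedirectMatch
TARGET = "/googlesearch/search"     # second argument of RedirectMatch


def redirect_target(request_uri: str) -> Optional[str]:
    """Return the redirect location for request_uri, or None if no match."""
    parts = urlsplit(request_uri)
    if not PATTERN.search(parts.path):
        return None
    # Keep the original query string (q=..., hl=..., s=...) on the new path.
    return urlunsplit(("", "", TARGET, parts.query, ""))


if __name__ == "__main__":
    print(redirect_target("/search?q=report&hl=zh_CN"))
```

Requests that already start with /googlesearch/ do not match the pattern, so the redirect does not loop.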

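    At last, Google Desktop is integrated into Apache. The final step is to modify the homepage file and save it as index.html under /Fileserver/文件搜索/ (the "file search" directory), so that Apache opens the homepage directly when that directory is visited.
    The change to the homepage file is simple: replace every http://192.168.1.120 above with "/googlesearch".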
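
This replacement is a plain string substitution, so it is easy to script and to sanity-check by counting hits. A minimal sketch, with `relocate` as a hypothetical helper name and the two base strings taken from the post as defaults:

```python
from typing import Tuple


def relocate(
    html: str,
    old_base: str = "http://192.168.1.120",
    new_base: str = "/googlesearch",
) -> Tuple[str, int]:
    """Swap old_base for new_base; return the new text and the hit count."""
    count = html.count(old_base)
    return html.replace(old_base, new_base), count


if __name__ == "__main__":
    page = '<a href="http://192.168.1.120/status?hl=zh_CN">status</a>'
    new_page, hits = relocate(page)
    print(hits, new_page)
```

A hit count of zero is a hint that the file being edited is not the modified homepage from the previous step.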

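Section notes:
    The remaining problem is opening the files that a search returns. The handling above simply blocks it. Achieving an effect like DNKA's requires rewriting the output, so I am simply pasting the mod_sar documentation here.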

NAME
mod_sar - apache2 module which works as output filter and it's
          purpose is to Search And Replace strings found in web
          content before it's sending to the client.


COMPILE
mod_sar can be compiled with apxs(8) or manually by hand.

1. Using apxs for compilation:
apxs -c mod_sar.c

If everything goes fine, you will find mod_sar.so under .libs in your
current directory.

2. Compiling mod_sar manually:
gcc -pthread -I/usr/include/httpd -c mod_sar.c
gcc -shared mod_sar.o -Wl,-soname -Wl,mod_sar.so -o mod_sar.so

If needed, modify path to your httpd include directory and if everything
goes fine, you will find mod_sar.so in your current directory.


INSTALL
mod_sar can be installed with apxs(8) or manually by hand.

1. Using apxs for installation:
This command will compile and install your mod_sar module.
apxs -i -a -c mod_sar.c

Restart apache by first stopping it and then starting it:
apachectl stop
apachectl start

2. Installing mod_sar manually:
cp mod_sar.so /usr/lib/httpd/modules
chown root: /usr/lib/httpd/modules/mod_sar.so
chmod 755 /usr/lib/httpd/modules/mod_sar.so

If needed, modify path to your httpd modules directory.
Now, you have to modify your httpd.conf file. Find the bunch of
LoadModule directives and append your own line under them:
LoadModule sar_module modules/mod_sar.so

Restart apache by first stopping it and then starting it:
apachectl stop
apachectl start


DESCRIPTION
mod_sar ("sar" stands for Search And Replace) is apache2 module which
works as output filter. It's purpose is to search and replace strings
found in web content before it's sending to the client.
Search performed can be case sensitive or case insensitive, depending
on configuration.
Perfect example of common usage of this module is reverse proxy.

Reverse proxy is proxy in front of the local server, which can be
accessed from Internet only through that proxy. In some cases such
configuration can be used effectively to prevent worms and other
unwanted guests but most commonly it just present a false layer of
security for those who do not understand server - client communication.

Whatever reason you have, for usable reverse proxy you will have to
solve two problems: modification of headers and modification of
content before it's sending to client.

1. Header modification
Header modification is not problem at all. It can be achieved two
ways.
You can use mod_proxy_http:
    <IfModule mod_proxy.c>
        <Proxy *>
            Order deny,allow
            Allow from all
        </Proxy>
        ProxyRequests On
        ProxyPass / http://some-domain.local/
        ProxyPassReverse / http://some-domain.local/
        ProxyErrorOverride On
    </IfModule>
Or, you can use mod_rewrite:
    <IfModule mod_rewrite.c>
        RewriteEngine on
        RewriteRule ^/(.*) http://some-domain.local/$1
        RewriteOptions inherit
    </IfModule>

2. Content modification
Header modification will make all relative links look like they are
coming from external domain some-domain.com instead of real, local
domain some-domain.local. But if the server behind the reverse proxy
serves pages with absolute links, we will have to modify content of
that pages on the fly, using apache2 output filter mechanism.

There are three choices: mod_proxy_html, mod_ext_filter and mod_sar.
The first uses a libxml2 and because of that, it is not good for
purpose such as reverse proxy. For example, libxml2 will seriously
corrupt HTML code in case of a minor errors in HTML such as missing
quote. mod_proxy_html inherits that nasty habit from libxml2 but
if you want to try it your own, you can find that module at
http://apache.webthing.com/mod_proxy_html/
The second one is not a third party module, it comes with apache2
and it can suit needs for reverse proxy but it is not good for heavy
loaded sites because external command is executed for every request.
Here is example of mod_ext_filter usage:
    <IfModule mod_ext_filter.c>
        ExtFilterDefine fixtext mode=output intype=text/html \
            cmd="/bin/sed s/some-domain\.local/some-domain\.com/g"
        <Location />
            SetOutputFilter fixtext
        </Location>
    </IfModule>
And the third one is the one you are just looking at: mod_sar.
See the DIRECTIVES and EXAMPLES sections for usage information.
mod_sar will do one simple thing. It will replace one string
with another, depending on configuration. It can perform case
insensitive search if needed. It has been tested under heavy load
without performance impact.


DIRECTIVES
SarStrings <search_string> <replace_string>
       This directive requires two parameters, search string and
       replace string enclosed with double quotes.
       It can be used in server config and virtual host context.

SarCaseInsensitive <On|Off>
       If set to On, case insensitive search will be performed instead
       of exact string match.
       Default is Off.
       It can be used in server config and virtual host context.

SarVerbose <On|Off>
       If set to On, every time mod_sar is used as filter, message is
       printed into apache error logs.
       Default is Off.
       It can be used in server config and virtual host context.


EXAMPLES
       <IfModule mod_sar.c>
           AddOutputFilterByType sar_filter text/html
           SarStrings "http://some-domain.local" "http://some-domain.com"
           SarCaseInsensitive Off
           SarVerbose Off
       </IfModule>


REQUIREMENTS
Apache-2.0.


COMPATIBILITY
It has been tested on Linux but there is no obvious reason why it
wouldn't work on other unix platforms supported by apache2.
             OS:  Linux
       compiler:  gcc-2.9x, gcc-3.x
         apache:  apache-2.0.x


BUGS
Current version of mod_sar does not contain known bugs.


SEE ALSO
apxs(8), http://www.apache.org/


AUTHOR
Josip Deanovic <djosip@linuxpages.org>
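
To make the README's description of `SarStrings` and `SarCaseInsensitive` concrete, the following Python function imitates the described behaviour on a string. It is only an illustration of "replace one string with another, optionally ignoring case" and is not derived from the mod_sar source; `sar` is a hypothetical name.

```python
import re


def sar(content: str, search: str, replace: str,
        case_insensitive: bool = False) -> str:
    """Replace every occurrence of search in content with replace."""
    flags = re.IGNORECASE if case_insensitive else 0
    # re.escape makes the search text literal; the lambda keeps the
    # replacement literal too (no backslash or group expansion).
    return re.sub(re.escape(search), lambda _m: replace, content, flags=flags)


if __name__ == "__main__":
    body = '<a href="http://some-domain.local/docs">docs</a>'
    print(sar(body, "http://some-domain.local", "http://some-domain.com"))
```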

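    The output URL rules of the new Google Desktop version are fairly complex, which makes rewriting difficult. On top of that, the Linux file system has many permission restrictions, and many directories will not allow Apache to access them. So I did not bother to go any further. After all, the search output already contains the full location of each file, and finding it through the file server is convenient enough.
    Finally, I hope an expert who reads this post can help write the mod_sar output rules and help me complete this Google Desktop For Linux with Apache2 On LAN. Thank you.
《Solution》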

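Clearing the zero-reply count myself. I look forward to an expert's help with the mod_sar rules. I cannot read source code, so I do not know the details of this module's rules.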




http://www.coctec.com/docs/service/show-post-31409.html