
Looking for someone with Chinese knowledge

January 11, 2007
Author
Peter Zaitsev

We’re looking to implement CJK (Chinese, Japanese, and Korean) support in Sphinx, the open source full-text search engine.
Initially we’re thinking of basing search on bigram indexing to keep it simple, especially as, according to research papers, it offers decent quality in most cases. This is not that complex to implement; however, there is no way for us to test it, as we have zero knowledge of Chinese or Japanese.
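The bigram approach described above can be sketched as follows. This is a minimal Python illustration, not Sphinx's actual tokenizer: each run of consecutive CJK characters becomes a sequence of overlapping two-character tokens, while other text is split into lowercase words. The Unicode ranges in `CJK_RANGES` are an assumption covering only the main Hiragana/Katakana, Han, and Hangul blocks, and the function names are hypothetical.

```python
import re

# Assumed Unicode blocks treated as CJK; real tokenizers use fuller tables.
CJK_RANGES = (
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
)
CJK_CLASS = "".join(f"{chr(lo)}-{chr(hi)}" for lo, hi in CJK_RANGES)

# Either a run of CJK characters, or a run of word characters that are not CJK.
TOKEN_RE = re.compile(f"[{CJK_CLASS}]+|[^\\W{CJK_CLASS}]+")


def is_cjk(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def bigram_tokens(text: str) -> list[str]:
    """Overlapping bigrams for CJK runs, lowercase words for everything else."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        chunk = match.group()
        if not is_cjk(chunk[0]):
            tokens.append(chunk.lower())
        elif len(chunk) == 1:
            tokens.append(chunk)  # a lone CJK character is kept as a unigram
        else:
            tokens.extend(chunk[i:i + 2] for i in range(len(chunk) - 1))
    return tokens


print(bigram_tokens("我的汽车 MySQL"))  # ['我的', '的汽', '汽车', 'mysql']
```

A query would be tokenized the same way, so a search for 我的汽车 looks up the bigrams 我的, 的汽 and 汽车. Overlapping bigrams avoid the need for a word dictionary, at the cost of a larger index and some spurious matches.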

If you know Chinese, Japanese, or Korean and would like to help us test Sphinx support for these languages, let us know. No special development skills are required: if you’re reading this blog, you should be technical enough.


25 Comments
islue
19 years ago

I’m a native Chinese speaker and know a little Japanese. I’d like to help.

YoungWoo Kim
19 years ago

Hi, I’m Kim and I’m Korean
I’m living in Seoul, Korea and now working for ‘Daum Communications’ as a DBA (Oracle, MySQL)
I wanna test Sphinx CJK support.

-YW Kim

jedy
19 years ago

I’m Chinese and also have some Japanese knowledge. I’d like to help test.

Hao
19 years ago

Hi there, I’d like to help with your Chinese testing. Write to [email protected] if I can join 😛

Sun
19 years ago

I am from China, and I would like to join this test.
Is that OK?

Nick Zhao
19 years ago

Hi Peter, I’m a Chinese guy living in Dalian, China. I’m a big fan of LAMP, though I have only a little knowledge of it. But if you just want someone who knows Chinese much better than you and also wants to help, please feel free to contact me via email or MSN.

P.S. Please bear with my poor English, and I should let you know that I only began learning LAMP a couple of days ago. 🙂

Best wishes.

Nick

liang
19 years ago

I am Chinese. I have 3 years of C++ programming experience and I like open source projects. Please contact me; I’d like to test Sphinx.

Dale
19 years ago

Peter, just as an FYI, I’ve actually implemented this in Sphinx for edgeio.com. You can see it in action at:

http://www.edgeio.com/ss/%E6%88%91%E7%9A%84%E6%B1%BD%E8%BD%A6?location=0

However, I don’t think we’re contributing the code back to Sphinx. We used bigrams along with proximity relevance scoring. Based on what I’ve seen, the relevance ranking is pretty good. So far we’re just doing Chinese UTF-8. We have some folks in China who have done some testing with it.

My knowledge of Chinese was just good enough to get by here, but I’d be interested in seeing how your effort goes, and helping out a bit if I can.
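To illustrate why pairing bigrams with proximity scoring helps, here is a hypothetical Python sketch; it is not the edgeio.com implementation, which the comment above says was not contributed back. It scores a document by the longest run of query bigrams that appear consecutively and in order (a longest-common-substring dynamic program over bigram sequences) and breaks ties by the number of distinct query bigrams matched. The names `bigrams`, `proximity_score`, and `rank` are illustrative assumptions.

```python
def bigrams(text: str) -> list[str]:
    """Overlapping character bigrams of a CJK string (a single char stays whole)."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def proximity_score(query: str, doc: str) -> int:
    """Longest run of query bigrams that occur consecutively, in order, in doc."""
    q, d = bigrams(query), bigrams(doc)
    best = 0
    prev = [0] * (len(d) + 1)
    for i in range(1, len(q) + 1):
        cur = [0] * (len(d) + 1)
        for j in range(1, len(d) + 1):
            if q[i - 1] == d[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the run ending at (i-1, j-1)
                best = max(best, cur[j])
        prev = cur
    return best


def rank(query: str, docs: list[str]) -> list[str]:
    """Sort docs by proximity first, then by how many distinct query bigrams match."""
    wanted = set(bigrams(query))

    def key(doc: str) -> tuple[int, int]:
        return proximity_score(query, doc), len(wanted & set(bigrams(doc)))

    return sorted(docs, key=key, reverse=True)


docs = ["我的朋友有汽车", "这是我的汽车"]
print(rank("我的汽车", docs))  # ['这是我的汽车', '我的朋友有汽车']
```

Both sample documents contain the bigrams 我的 and 汽车, so plain bigram matching alone would treat them as hits; the proximity factor ranks first the document in which the whole query appears as a contiguous phrase.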

mshk
19 years ago

Hi, I’m a Japanese web programmer and I’m interested in testing Sphinx.
How can I help you?

hongqn
19 years ago

Peter, I’m a Chinese programmer and I’d like to help. I have good Python/C skills and enough knowledge of CJK character encoding, just FYI.

Philip Tellis
19 years ago

You should collaborate with the Namazu developers (http://www.namazu.org/index.html.en). Namazu is a search engine made primarily for CJK languages, but it also works with English. The engine is written in C, and the indexer is written in Perl. I’ve found their code fairly easy to read and follow (and I do not know any of the CJK languages), and I have submitted a few patches in the past. The developers are quite helpful.

frank
19 years ago

I am Chinese. I like your products and use them all the time. I want to help you.

Gu Lei
19 years ago

Hi Peter,

I’m Chinese. I also want to join that test. Contact me if needed.

Bill
19 years ago

Hi

I am interested in this testing. Both Chinese and Japanese are OK for me.

[email protected]

Regards
Bill

Eric
19 years ago

I can test Chinese using OS X. eric18 @ gmail . com

yejr
19 years ago

Hi Peter, I’m the owner of http://imysql.cn. I’m Chinese and a DBA skilled in MySQL optimization, and I would like to join you 🙂

Louis
19 years ago

Hi, I’m Chinese and I hope to join the testing. Please contact me: [email protected]

Josh
19 years ago

Hi Peter, I am a Chinese web programmer with 3 years of PHP experience. If you want to test Sphinx CJK support on Debian AMD64, please contact me.

epaulin AT gmail dot com

xLight
19 years ago

I am a PHP/MySQL web application programmer.
I have also worked as a motherboard tester.

Lisa Lan
19 years ago

I am interested in this testing. I am an Oracle and MySQL DBA, and I’m Chinese.
Thanks

anakin
18 years ago

I am Chinese, with 3 years of LAMP experience, and I’m interested in search technology. Please contact me if you need help.
anakinsun AT gmail.com

KayL
17 years ago

I’m Chinese, with no technical skills.
Feel free to contact me if you need me.

Galen
17 years ago

How is progress on this Sphinx Chinese-language search test?

The quality of Chinese-language search also depends heavily on the quality of word segmentation.

I am wondering if we could index only unigrams (at the cost of a bigger index) and apply word segmentation to the user-submitted search query instead (or ask users to segment their query themselves, which makes sense since they know what they want to search for). We would then use Sphinx to search with, say, maximum length matching, relevance sorting, and so on.

Does this approach make sense, given that we cannot beat Google/Baidu on word segmentation?
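The approach suggested above (a unigram index plus segmentation of the query only) might look like the following Python sketch. Forward maximum matching stands in for the segmentation step: a simple dictionary-based method that greedily takes the longest dictionary word at each position. The tiny dictionary, the sample documents, and all function names are hypothetical; a real system would need a large lexicon. Each segmented word is then matched as a phrase of consecutive characters in a positional unigram index.

```python
from collections import defaultdict


def fmm_segment(text: str, dictionary: set[str]) -> list[str]:
    """Forward maximum matching: greedily take the longest dictionary word."""
    max_len = max((len(w) for w in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Unknown characters fall back to single-character "words".
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def build_unigram_index(docs: list[str]) -> dict:
    """Map each character to {doc_id: [positions]} (a positional unigram index)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, ch in enumerate(text):
            index[ch][doc_id].append(pos)
    return index


def phrase_match(index: dict, word: str, doc_id: int) -> bool:
    """True if the characters of word occur at consecutive positions in doc_id."""
    def positions(ch: str) -> list[int]:
        return index.get(ch, {}).get(doc_id, [])

    return any(
        all(start + k in positions(ch) for k, ch in enumerate(word))
        for start in positions(word[0])
    )


def search(index: dict, num_docs: int, words: list[str]) -> list[int]:
    """Return ids of documents containing every segmented query word as a phrase."""
    return [d for d in range(num_docs) if all(phrase_match(index, w, d) for w in words)]


dictionary = {"北京", "大学", "北京大学", "生活"}
docs = ["我在北京大学生活", "北京的大学很多"]
query_words = fmm_segment("北京大学生活", dictionary)
print(query_words)  # ['北京大学', '生活']
print(search(build_unigram_index(docs), len(docs), query_words))  # [0]
```

The second document contains the characters 北, 京, 大 and 学 but not 北京大学 as a contiguous phrase, so it is rejected. Greedy forward maximum matching is known to mis-segment ambiguous strings, which is one reason segmentation quality matters as much as the comment suggests.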

MySQL, PostgreSQL, InnoDB, MariaDB, MongoDB and Kubernetes are trademarks for their respective owners.
© 2026 Percona All Rights Reserved