using this library to detect arabic dialects #46

Open
odaymard opened this issue Apr 16, 2016 · 23 comments

Comments

@odaymard

I am very happy to have found this tool, and I have a question.
Can this library help me detect Arabic dialects (Syrian, Iraqi, Gulf)?
I plan to build a corpus for each dialect and add each one as a language profile.
Is that the right approach?

@rmtheis

rmtheis commented Apr 16, 2016

This is a very interesting idea. Is the spelling of words different from one Arabic dialect to another?

@odaymard
Author

Yes, of course.

@rmtheis

rmtheis commented Apr 16, 2016

I think the idea is sound and that it's worth trying. I don't think you would need to make any changes to this library in order to try it.

If you run into any trouble generating profiles from your corpora, feel free to post here and I'll be glad to help.
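
For anyone new to the idea, here is a rough sketch of what a per-dialect profile amounts to: n-gram counts built from a corpus, with new text compared against each profile. This is not this library's API and not its actual scoring method (cosine similarity is swapped in here for simplicity). The class `NgramProfileSketch`, its method names, and the tiny placeholder corpora are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a toy n-gram profile, not the library's implementation. */
class NgramProfileSketch {

    /** Counts every n-gram of length 1..maxN (in UTF-16 code units) in the text. */
    static Map<String, Integer> buildProfile(String text, int maxN) {
        Map<String, Integer> counts = new HashMap<>();
        for (int n = 1; n <= maxN; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                counts.merge(text.substring(i, i + n), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cosine similarity between two n-gram count maps (0.0 if either is empty). */
    static double similarity(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the name of the profile most similar to the text, or null if there are no profiles. */
    static String closest(String text, Map<String, Map<String, Integer>> profiles, int maxN) {
        Map<String, Integer> query = buildProfile(text, maxN);
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            double score = similarity(query, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Placeholder "corpora"; real profiles would come from large dialect corpora.
        Map<String, Map<String, Integer>> profiles = new HashMap<>();
        profiles.put("dialect-a", buildProfile("aaaa bbbb aaaa", 3));
        profiles.put("dialect-b", buildProfile("cccc dddd cccc", 3));
        System.out.println(closest("aabb", profiles, 3)); // prints "dialect-a"
    }
}
```

The point of the sketch is only that dialects which differ in spelling will produce different n-gram counts, which is why separate corpora per dialect should be enough to try this without changing the library.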

@odaymard
Author

Sorry, but I am new to Java.
How do I run the code to detect the language of some text?
Is it mandatory to have a Maven project,
or can I add the jar file as a library?

@rmtheis

rmtheis commented Apr 21, 2016

Yes, you can use it with a jar file--I find that to be the most convenient. There's sample code here, and here starting at line 16.

@odaymard
Author

odaymard commented May 1, 2016

Hi, how are you?
Thank you for helping me.
How do I generate a profile from a text file?

@rmtheis

rmtheis commented May 1, 2016

The easiest way is to use the jar from shuyo's repository. Here's an example of generating an Egyptian Arabic profile from its Wikipedia abstract, using Linux:

git clone git@github.com:shuyo/language-detection.git
cd language-detection
mkdir abstracts
cd abstracts
wget https://dumps.wikimedia.org/arzwiki/20160407/arzwiki-20160407-abstract.xml
mkdir profiles
java -jar ../lib/langdetect.jar --genprofile -d . arz

Run this process to make sure it works. Then replace the abstract text file with your dialect corpus and run the last step again.

@rmtheis

rmtheis commented May 23, 2016

@odaymard Any progress on this?

@odaymard
Author

I am using the Facebook API and the Twitter API to collect data, but facebook4j is slow, so I am trying to make it faster.

@odaymard
Author

Hi,
I have finished the Syrian profile.
How do I upload it?

@rmtheis

rmtheis commented Jun 11, 2016

Nice! You need to (1) fork this project, (2) add your new profile to your fork, and (3) create a pull request to this project.

@odaymard
Author

I did that, what next?

@rmtheis

rmtheis commented Jun 20, 2016

It's up to you. Some suggestions:

  1. Wait for your pull request to be merged. In the meantime, it may be useful to others if you publish the training data used for the various dialects in your own project on GitHub.
  2. Publish profiles for other dialects. Personally, I'd be interested in some test cases and/or test results showing how effective the profiles are for identifying/distinguishing actual regional texts or test cases. These tests would be most meaningful when more dialects are supported.
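
To make the kind of test results I have in mind concrete, here is a minimal sketch of an evaluation harness. It assumes you have labelled test sentences per dialect and some function that maps text to a dialect label. The class `DialectEvalSketch`, the `Sample` type, the labels, and the stand-in detector are all hypothetical; a real run would plug in a detector built from the actual profiles.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Illustrative only: tallies how often a detector's guess matches the labelled dialect. */
class DialectEvalSketch {

    /** One labelled test sentence: the expected dialect label and the text. */
    static final class Sample {
        final String expected;
        final String text;

        Sample(String expected, String text) {
            this.expected = expected;
            this.text = text;
        }
    }

    /**
     * Builds a confusion table: expected label -> (predicted label -> count).
     * The detector must return a non-null label for every text.
     */
    static Map<String, Map<String, Integer>> confusion(
            List<Sample> samples, Function<String, String> detector) {
        Map<String, Map<String, Integer>> table = new TreeMap<>();
        for (Sample s : samples) {
            String predicted = detector.apply(s.text);
            table.computeIfAbsent(s.expected, k -> new TreeMap<>())
                 .merge(predicted, 1, Integer::sum);
        }
        return table;
    }

    /** Fraction of samples whose predicted label equals the expected label. */
    static double accuracy(Map<String, Map<String, Integer>> table) {
        int total = 0;
        int correct = 0;
        for (Map.Entry<String, Map<String, Integer>> row : table.entrySet()) {
            for (Map.Entry<String, Integer> cell : row.getValue().entrySet()) {
                total += cell.getValue();
                if (cell.getKey().equals(row.getKey())) {
                    correct += cell.getValue();
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        // Stand-in detector for demonstration; replace with one backed by real profiles.
        Function<String, String> detector = text -> text.startsWith("s") ? "syrian" : "iraqi";
        List<Sample> samples = Arrays.asList(
                new Sample("syrian", "s1"),
                new Sample("syrian", "x2"),
                new Sample("iraqi", "i3"),
                new Sample("iraqi", "i4"));
        System.out.println(accuracy(confusion(samples, detector))); // prints 0.75
    }
}
```

The confusion table matters more than the single accuracy number here, since it shows which pairs of dialects get mistaken for each other.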

@fabiankessler
Contributor

Nice work, thanks Oday and Robert.

I believe that we should include Arabic dialect profiles in the library, and start separating how profiles are loaded. I suspect that most users who want "all" languages just want one Arabic profile, one Norwegian, one English, one German, and no dialects. Dialects are a special-purpose case.

@dansupriti

Hi @odaymard, @rmtheis, @fabiankessler,
I have started using the library and it's really helpful, but I may need to add a new language profile. Could you please tell me where I can get language corpora, and what steps are involved in generating a language profile from them? Once I have the new language profile, how do I add it to the profile folder?

@odaymard
Author

odaymard commented Mar 16, 2017 via email

@dansupriti

Hi @odaymard,
Thanks for the suggestion. I am now able to add a new language profile as required.

Regards,
Supriti

@odaymard
Author

odaymard commented Mar 31, 2017 via email

@safaahenno

Hi @odaymard, @rmtheis,
I have an Arabic dataset and want to identify the dialects in it (Egypt, the Levant, Iraq, and the Gulf). How can I use this library to do that?
Thanks in advance,
Safaa

@rmtheis

rmtheis commented Apr 25, 2018

@odaymard Are you willing to publish the other Arabic dialect profiles that you've generated? Apart from the interests of others here, I would like to make a basic free Android app with the profiles you've made. It would be a simple, free app that allows a user to paste in Arabic text and get a dialect estimate based on this library.

@safaahenno I think only the Syrian profile is available as of right now.

@safaahenno

@rmtheis Can I use this library in a NetBeans project? What are the steps to use it in NetBeans without running into problems like "package org.jetbrains.annotations does not exist"? The README file is not clear on this.
Thanks in advance.

@rmtheis

rmtheis commented Apr 25, 2018

@safaahenno You should open a separate issue for that. This issue pertains to Arabic dialects only.

@safaahenno

@rmtheis OK, will do.
