TOOLS FOR ARABIC PEOPLE NAMES PROCESSING AND RETRIEVAL: A STATISTICAL APPROACH
By Ali Salhi and Adnan Yahya
October 30, 2011
The Arabic Language between Automation and Philosophy at Birzeit University: Logical, Philosophical, and Computational Studies in the Arabic Language
OUTLINE
MOTIVATION AND BACKGROUND
WHAT ARE WE TRYING TO BUILD?
NAMES TOOLS RESOURCES AND CONSTRUCTION
NAMES TOOLS RESOURCES AND CONSTRUCTION: DIFFERENT FORMATS OF SOURCE DATA
NAMES TABLES AND FILTRATION
NAMES TABLES AND FILTRATION: MALE NAMES TABLE
MALE NAMES TABLE (CONT …)
MALE NAMES TABLE (CONT …)

| Item | Name | Frequency |
|---|---|---|
| 1 | محمد | 41280 |
| 2 | محمود | 15662 |
| 3 | أحمد | 11752 |
| 4 | ابرهيم | 9287 |
| 5 | حسن | 8359 |
| 6 | علي | 8008 |
| 7 | يوسف | 7965 |
| 8 | احمد | 7714 |
| 9 | خليل | 5483 |
| 10 | حسين | 5341 |
| 11 | مصطفى | 5031 |
| 12 | موسى | 4649 |
| 13 | خالد | 4199 |
| 14 | سليمان | 4042 |
| 15 | سعيد | 3897 |
| 16 | عبد الله | 3893 |
| 17 | جمال | 3442 |
| 18 | اسماعيل | 3438 |
| 19 | صالح | 3431 |
| 20 | عمر | 3093 |

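
The male names table lists أحمد (11752) and احمد (7714) as separate entries, although they differ only in the hamza on the first letter. The sketch below shows one way such spellings could be merged before counting. The folding rule and the helper names `normalize` and `merge_counts` are assumptions made for illustration; the slides do not state which normalization, if any, the authors applied.

```python
from collections import Counter

# Counts copied from the male names table (the two spellings of Ahmad only).
RAW_COUNTS = {"أحمد": 11752, "احمد": 7714}

# Assumed rule, not taken from the slides: fold hamza-carrying alef
# forms into the bare alef.
ALEF_VARIANTS = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا"})


def normalize(name: str) -> str:
    """Return the name with alef variants folded to a bare alef."""
    return name.translate(ALEF_VARIANTS)


def merge_counts(raw: dict) -> dict:
    """Sum the frequencies of all spellings that normalize to the same form."""
    merged = Counter()
    for name, count in raw.items():
        merged[normalize(name)] += count
    return dict(merged)


print(merge_counts(RAW_COUNTS))  # {'احمد': 19466}
```

Under this assumed rule the merged form would have 19466 occurrences, which would move it ahead of محمود (15662) in the ranking.
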
FEMALE NAMES TABLE
FEMALE NAMES TABLE (CONT …)

| Item | Name | Frequency |
|---|---|---|
| 1 | ايمان | 2177 |
| 2 | دعاء | 2034 |
| 3 | الاء | 1998 |
| 4 | ولاء | 1673 |
| 5 | حنين | 1663 |
| 6 | اسماء | 1506 |
| 7 | اسراء | 1297 |
| 8 | فداء | 1268 |
| 9 | ياسمين | 1218 |
| 10 | عبير | 1190 |
| 11 | هبة | 1178 |
| 12 | نداء | 1065 |
| 13 | سماح | 1037 |
| 14 | روان | 1030 |
| 15 | هديل | 1015 |
| 16 | مريم | 946 |
| 17 | حنان | 943 |
| 18 | فاطمة | 912 |
| 19 | صابرين | 875 |
| 20 | اماني | 871 |

FAMILY NAMES TABLE
FAMILY NAMES TABLE (CONT …)

| Item | Name | Frequency |
|---|---|---|
| 1 | تكروري | 952 |
| 2 | حلواني | 940 |
| 3 | النجار | 450 |
| 4 | عاصي | 438 |
| 5 | دراغمه | 356 |
| 6 | بشارات | 335 |
| 7 | جرادات | 319 |
| 8 | دويكات | 318 |
| 9 | المصري | 308 |
| 10 | ابو الرب | 280 |
| 11 | مصري | 268 |
| 12 | جرار | 208 |
| 13 | حروب | 208 |
| 14 | الشاعر | 203 |
| 15 | ربايعة | 198 |
| 16 | رجوب | 181 |
| 17 | سويطي | 177 |
| 18 | صلاحات | 175 |
| 19 | شويكي | 170 |
| 20 | صوافطه | 162 |

ENGLISH TRANSLATION TABLE

| Item | Name | Frequency |
|---|---|---|
| 1 | Mohammad | 5513 |
| 2 | Muhammad | 783 |
| 3 | Mohammed | 181 |
| 4 | Mohamad | 168 |
| 5 | Mohummad | 157 |
| 6 | Mohamed | 44 |
| 7 | Mohmmad | 20 |
| 8 | Mohammd | 12 |
| 9 | Muhamad | 11 |
| 10 | Muhammed | 11 |
| 11 | Mohmad | 8 |
| 12 | Moh'd | 8 |
| 13 | Mohamd | 5 |
| 14 | Mohmmed | 5 |
| 15 | Mouhamad | 4 |
| 16 | Mouhammad | 4 |
| 17 | Mhamad | 4 |
| 18 | Mhammed | 3 |
| 19 | Mhmmad | 3 |
| 20 | Mhmmed | 3 |

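
The English translation table is heavily skewed toward a few spellings. As a rough illustration, the sketch below computes how many of the 20 listed spellings are needed to cover a given share of the listed occurrences. The function name `variants_for_coverage` and the coverage thresholds are assumptions chosen for the example; the counts are copied from the table.

```python
# Frequencies copied from the English translation table.
VARIANTS = {
    "Mohammad": 5513, "Muhammad": 783, "Mohammed": 181, "Mohamad": 168,
    "Mohummad": 157, "Mohamed": 44, "Mohmmad": 20, "Mohammd": 12,
    "Muhamad": 11, "Muhammed": 11, "Mohmad": 8, "Moh'd": 8,
    "Mohamd": 5, "Mohmmed": 5, "Mouhamad": 4, "Mouhammad": 4,
    "Mhamad": 4, "Mhammed": 3, "Mhmmad": 3, "Mhmmed": 3,
}


def variants_for_coverage(freqs: dict, target: float) -> list:
    """Return the most frequent spellings whose combined share reaches target."""
    total = sum(freqs.values())
    chosen, covered = [], 0
    # Walk the spellings from most to least frequent, accumulating counts.
    for name, count in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(name)
        covered += count
        if covered / total >= target:
            break
    return chosen


print(variants_for_coverage(VARIANTS, 0.90))  # ['Mohammad', 'Muhammad']
print(variants_for_coverage(VARIANTS, 0.95))
# ['Mohammad', 'Muhammad', 'Mohammed', 'Mohamad']
```

Among the listed spellings, two account for over 90% of occurrences and four for over 95%, so a lookup that stores only the top few forms would already cover most of the data shown.
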
GENERAL NAMES TABLE
GENERAL NAMES TABLE (CONT …)
NAMES METHODS AND TOOLS
ERROR CORRECTION IN NAMES
ERROR CORRECTION (CONT…) NAMES WITH DIFFERENT FORMS ERRORS
ERROR CORRECTION(CONT…) NAMES WITH DIFFERENT FORMS ERRORS
ERROR CORRECTION: SIMPLE EXAMPLE
ERROR CORRECTION: COMPLEX EXAMPLE
ERROR CORRECTION: COMPLEX EXAMPLE
NAMES CORRECTION TOOL (CONT…) FORMS RANKING
NAMES CORRECTION TOOL (CONT…)

| # | Input | Output(s) |
|---|---|---|
| 1 | ديم | ريم , ديما , كيم , نديم |
| 2 | شوشن | سوسن , شوكت , سوزان , روان |
| 3 | خاقلين | تالين , جاكلين , مارلين , كاثلين , مادلين |
| 4 | اقراجيم | إبراهيم |
| 5 | اية | راية , آية |
| 6 | نوزالدين | نور الدين |
| 7 | رمري | رمزي , رازي |
| 8 | غبير | عبير , غدير |

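
The correction examples map a misspelled input to one or more known names. The slide text describing the authors' algorithm did not survive extraction, so the sketch below substitutes a generic technique: Levenshtein edit distance against a frequency dictionary, with ties broken by frequency. The dictionary is a subset of the female names table; the function names and the misspelling حنبن are hypothetical.

```python
# Frequencies copied from the female names table (subset).
NAME_FREQ = {
    "ايمان": 2177, "دعاء": 2034, "الاء": 1998, "ولاء": 1673, "حنين": 1663,
    "اسماء": 1506, "عبير": 1190, "مريم": 946, "حنان": 943, "فاطمة": 912,
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete, and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def suggest(word: str, freq: dict = NAME_FREQ, max_distance: int = 1) -> list:
    """Known names within max_distance edits, closest first, then most frequent."""
    scored = [(edit_distance(word, name), -count, name)
              for name, count in freq.items()]
    return [name for dist, _, name in sorted(scored) if dist <= max_distance]


print(suggest("غبير"))  # ['عبير']
print(suggest("حنبن"))  # ['حنين', 'حنان']
```

The first call reproduces one of the two outputs shown for input 8; the other output, غدير, is missing only because it is not in the sample dictionary. The second call shows frequency acting as the tie-breaker between two candidates at the same distance.
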
NAMES CORRECTION TOOL (CONT…) TEST RESULTS

| # | Test Type | Pass Percentage |
|---|---|---|
| 1 | Speed Writing (test1), Speed Writing (test2), Speed Writing (test3) | 87%, 84%, 85% |
| 2 | | 91%, 79%, 70% |

NAMES METHODS AND TOOLS: NAME GENDER DETECTOR (NGD)
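
The body of this slide was lost, so the detector's actual rule is unknown. Given the separate male and female frequency tables, one plausible statistical reading is to compare how often a name occurs in each table. The sketch below follows that assumption; the function name `detect_gender`, the 0.8 threshold, and the returned labels are all hypothetical.

```python
# Counts copied from the male and female names tables (subsets).
MALE = {"محمد": 41280, "محمود": 15662, "حسن": 8359, "علي": 8008, "يوسف": 7965}
FEMALE = {"ايمان": 2177, "دعاء": 2034, "الاء": 1998, "ولاء": 1673, "حنين": 1663}


def detect_gender(name, male=MALE, female=FEMALE, threshold=0.8):
    """Return (label, confidence); label is male, female, ambiguous, or unknown."""
    m, f = male.get(name, 0), female.get(name, 0)
    total = m + f
    if total == 0:
        return "unknown", 0.0          # name appears in neither table
    share = max(m, f) / total          # share held by the dominant gender
    if share < threshold:
        return "ambiguous", share      # used for both genders too evenly
    return ("male" if m > f else "female"), share


print(detect_gender("محمد"))   # ('male', 1.0)
print(detect_gender("ايمان"))  # ('female', 1.0)
```

None of the top-20 names shown in the tables appears in both lists, so the ambiguous branch only matters for names further down the full tables.
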
NAMES METHODS AND TOOLS: NAMES TRANSLATION TOOL

| # | Arabic Name | English Translation | Frequency |
|---|---|---|---|
| 1 | سمير | Samir | 299 |
| | | Sameer | 85 |
| 2 | نورا | Noura | 19 |
| | | Nora | 7 |
| | | Nura | 5 |
| | | Noora | 3 |
| 3 | رياض | Riyad | 148 |
| | | Riad | 24 |
| | | Reyad | 8 |
| 4 | أحمد | Ahmad | 1875 |
| | | Ahmed | 48 |
| | | Ahamad | 6 |
| 5 | مؤيد | Mo'ayad | 10 |
| | | Mu'ayad | 9 |
| | | Moayad | 5 |
| | | Mu'ayyad | 5 |
| | | Mo'ayyad | 3 |
| | | Muayad | 3 |

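
The translation table pairs each Arabic name with its observed English spellings and their frequencies. A minimal sketch of a frequency-ranked lookup over that data might look as follows. The function name `translate` and its `top` parameter are assumptions for illustration, not the tool's documented interface.

```python
# Rows copied from the translation table: Arabic name -> {spelling: frequency}.
TRANSLATIONS = {
    "سمير": {"Samir": 299, "Sameer": 85},
    "نورا": {"Noura": 19, "Nora": 7, "Nura": 5, "Noora": 3},
    "رياض": {"Riyad": 148, "Riad": 24, "Reyad": 8},
    "أحمد": {"Ahmad": 1875, "Ahmed": 48, "Ahamad": 6},
}


def translate(name: str, table: dict = TRANSLATIONS, top: int = 1) -> list:
    """Return up to `top` English spellings for a name, most frequent first."""
    spellings = table.get(name, {})
    ranked = sorted(spellings, key=spellings.get, reverse=True)
    return ranked[:top]


print(translate("سمير"))         # ['Samir']
print(translate("نورا", top=2))  # ['Noura', 'Nora']
```

Returning a ranked list rather than a single string leaves the caller free to show alternatives, which matters for names like مؤيد where the top two spellings are nearly tied (10 versus 9).
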
NAMES TRANSLATION TOOL (CONT…)
AUTO SUGGESTION TOOL
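
The details of the auto suggestion tool did not survive extraction. A common way to build such a tool from a frequency table is to return the names that start with the typed prefix, most frequent first. The sketch below assumes that design; `suggest_by_prefix` and its `limit` parameter are hypothetical names.

```python
# Counts copied from the male names table (subset).
MALE_NAMES = {
    "محمد": 41280, "محمود": 15662, "أحمد": 11752, "حسن": 8359,
    "علي": 8008, "يوسف": 7965, "احمد": 7714, "حسين": 5341, "مصطفى": 5031,
}


def suggest_by_prefix(prefix: str, freq: dict = MALE_NAMES, limit: int = 3) -> list:
    """Return up to `limit` names starting with prefix, most frequent first."""
    matches = [name for name in freq if name.startswith(prefix)]
    return sorted(matches, key=freq.get, reverse=True)[:limit]


print(suggest_by_prefix("مح"))  # ['محمد', 'محمود']
print(suggest_by_prefix("حس"))  # ['حسن', 'حسين']
```

A linear scan is adequate for a table of this size; a trie or a sorted list with binary search would be the usual replacement if the full names table were large.
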
NAMES EXTRACTION TOOL
NAMES EXTRACTION TOOL (CONT…)
NAMES EXTRACTION TOOL (CONT…)
POSSIBLE USES OF DEVELOPED TOOLS
CONCLUSIONS