Datatang Data Service Introduction
1. Data Services in the Big Data Era
Datatang Technology
Created by Yc Xu. All rights reserved by Beijing Datatang Technology Co., Ltd.
2. About Us
• Founded in 2011
• Headquarters: Beijing, China
• Subsidiaries: China (4), India (1, opening soon), US (1, opening soon)
• Employees: 1000+
• Stock Code: 831428
3. Data is the Catalyst
Data makes it spin faster.
“Data is driving change in traditional industries, from their technical frameworks and business models to their organizational structures.”
4. Where Do We Stand?
5. Largest Data Exchange Platform in Asia
✤ Explore ✤ Exchange ✤ Share
• 45,000+ sets of data
• 98% of the data are free of charge
• 1+ million exchanges in 2014
6. Data Services
Industrial & Governmental Data Bridging
• 100+ partner corporations and government offices
• A professional BD team
Massive Data Crawling
• 500+ partner corporations worldwide
• Efficient services in data crawling, updating, and integration
Data Customizing
• Online: 400,000+ registered users
• Offline: 1,000+ employees in 5 offices
• Data maps generated directly from clients’ demands
7. Industrial / Governmental Data
• Business
• Transportation & Geography
• Medical & Health
• Motor
• Social & Media
• Energy & Agriculture
8. Industrial / Governmental Data Example
Medical & Health
• Hospital Health Care Data
• Human Engineering Data
Transportation & Geography
• Taxi GPS Data
• Public Transportation Data
• Global Flight Information Data
Business
• Customs Import and Export Data
Motor
• Vehicle OBD (On-Board Diagnostic) Data
• Driving Behavior Data
Social & Media
• Online Purchase Records in E-Commerce
• User Internet Usage Behavior Data
• SNS Data
9. Massive Data Crawling
Web Crawling Tool
Basic principle: crawl whatever we can, as long as the data is legal.
• Social media
• Forums
• Web portals
10. How Do We Collect Massive Data?
In 2014, we processed:
• 2,000,000+ images
• 30,000+ hrs of speech data
• 231 projects
Online: 400,000+ registered users on our crowd sourcing platform
Offline: 1,000+ staff across 5 offices
Data QA
Figure: Data collection window on the crowd sourcing platform (mobile end)
11. Data Customization
Speech
1. Collection
2. Synthesis
3. Transcription
Image
1. Face
2. Vehicle & Road
3. Merchandise
4. OCR
Text
1. Corpus Construction
2. Text Understanding
3. Structure Analysis
Other Data Gathered by Crowd Sourcing
Data that can be sensed and collected manually
12. Speech: Collection and TTS
• Speech is recorded with different devices in various environments, such as in a car, a home office, or a professional studio.
• Tasks are assigned to speakers from different countries and of various backgrounds, balancing speakers’ accents and genders.
• We provide audio data generated from real-world apps.
13. Speech: Collection and TTS
Dataset | Brief Description
Chinese Mandarin and Mandarin with Regional Dialects | 4000 hrs, 8000 speakers, mobile
Spanish, Italian, French Speech | 420 hrs, 900 speakers, mobile
Thai, Vietnamese, Malay Speech | 350 hrs, 690 speakers, mobile
Japanese, Korean Speech | 500 hrs, 1050 speakers, mobile
In-Car Japanese Speech | 500 hrs, 690 speakers, in-car at different speeds
North American / Great Britain Speaker English Speech | 1400 hrs, 1400 speakers, mobile+PC
Hindi Speech | 100 hrs, 200 speakers, mobile
American Children Speech | 100 hrs, 200 speakers, mobile+PC
14. Speech: Transcription
Transcribed over 50,000 hrs of speech data, covering Chinese, Japanese, English, Korean, Thai, and other languages.
Over 1,000 transcribers across China, India, and Japan.
Category | Annotation
Does the speech contain useful information? | Yes
Starting and ending point of the effective speech | 0.07, 0.34
Transcription of the effective speech | Do you celebrate Christmas in China?
Is there any noise in the speech? | No
What language is used in the speech? | English
What is the gender of the speaker? | Male
Is an accent detected in the speech? | Yes
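The annotation categories on this slide can be read as one record per utterance. The sketch below is only an illustration of that idea, not Datatang's actual format: the class name, the field names, and the validation rules are all hypothetical, and it assumes that the sentence "Do you celebrate Christmas in China?" is the transcript of the example utterance. The slide does not state the unit of the start and end points, so the sketch leaves it unspecified.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SpeechAnnotation:
    """One annotated utterance. All field names are hypothetical."""
    has_useful_content: bool
    # (start, end) of the effective speech; the unit is not given on the slide.
    effective_span: Optional[Tuple[float, float]]
    transcript: str
    has_noise: bool
    language: str
    speaker_gender: str
    has_accent: bool

    def validate(self) -> None:
        """Raise ValueError if the record is internally inconsistent."""
        if not self.has_useful_content:
            return  # nothing further to check for an unusable recording
        if self.effective_span is None:
            raise ValueError("useful speech needs a start and end point")
        start, end = self.effective_span
        if not 0 <= start < end:
            raise ValueError("start must be non-negative and before end")
        if not self.transcript.strip():
            raise ValueError("useful speech needs a transcript")


# The example row values from the slide, expressed as one record.
example = SpeechAnnotation(
    has_useful_content=True,
    effective_span=(0.07, 0.34),
    transcript="Do you celebrate Christmas in China?",
    has_noise=False,
    language="English",
    speaker_gender="Male",
    has_accent=True,
)
example.validate()
```

Keeping the checks in a single `validate` method is one possible design; a QA step could call it on every record before delivery.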
15. Offices in China and India
• Cooperation with IIT
• Cooperation with several Chinese labs
16. Image/Video: Face Image Data
Example.
Collects and labels images of human faces at various angles and in various environments.
17. Image/Video: Objects in the Street
Example.
Collects and labels objects on the street in image data:
✤ highlighting objects such as people, vehicles, road signs, and traffic signals
✤ specific features, such as vehicle brand and pedestrians’ likely direction of travel
18. Image/Video: Merchandise Tag
Collects and labels image data in e-commerce.
The data is used to recommend related or similar products.
19. Text Understanding
Example.
Collects information on news and events, annotated with:
✤ Event Theme
✤ Description
✤ Event Category
✤ Subject
✤ Verb
✤ Object
✤ Time of Occurrence
✤ Event Location
✤ Cause
✤ Course
✤ Comments
✤ Number of Likes
20. Text: Corpus Construction
Example.
Constructed a text corpus that includes questions frequently asked in mobile search apps.
Topics: 30+, e.g. contacts, GPS, weather, calculator, calendar, stock, music, etc.
Question | Structure | Type of Service | Service Parameter
Take me to Times Square | take me to <L1> | Navigation | Destination = L1
Where is Times Square | where is <L1> | Navigation | Destination = L1
How can I drive to Times Square | how can i drive to <L1> | Navigation | Destination = L1, drive
Fastest route to Times Square | fastest route to <L1> | Navigation | Destination = L1
Address of Times Square | address of <L1> | Navigation | Destination = L1
Weather in Manhattan | weather in <L1> | Weather | Location = L1
What’s the temperature in Manhattan | what is the temperature in <L1> | Weather | Location = L1
Lowest degree in Manhattan | lowest degree in <L1> | Weather | Location = L1
Is Manhattan cold now | is <L1> cold now | Weather | Location = L1
Is it raining now in Manhattan | is it raining now in <L1> | Weather | Location = L1
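The corpus example on this slide maps each question to a lowercase structure with a location slot, a service type, and a service parameter. A minimal sketch of that mapping is shown here, under stated assumptions: the place-name list, the keyword rules, and every function name are hypothetical, and the routing rows are labelled `Navigation`, which is this sketch's choice of label. It does not reproduce the extra `drive` modifier that appears in one row, and it is not a description of how the corpus was actually built.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical list of known place names; a real system would need a much larger one.
LOCATIONS: List[str] = ["Times Square", "Manhattan"]

# Hypothetical keyword rules: (pattern, service type, parameter name).
SERVICE_RULES: List[Tuple[str, str, str]] = [
    (r"\b(weather|temperature|degree|cold|raining)\b", "Weather", "Location"),
    (r"\b(take me|where is|drive|route|address)\b", "Navigation", "Destination"),
]


def to_template(question: str) -> Tuple[str, Dict[str, str]]:
    """Lowercase the question and replace known locations with <L1>, <L2>, ..."""
    text = question.lower().replace("what’s", "what is").replace("what's", "what is")
    slots: Dict[str, str] = {}
    for place in LOCATIONS:
        if place.lower() in text:
            slot = f"L{len(slots) + 1}"
            text = text.replace(place.lower(), f"<{slot}>")
            slots[slot] = place
    return text, slots


def classify(template: str) -> Optional[Tuple[str, str]]:
    """Return (service type, parameter name) for the first matching rule."""
    for pattern, service, parameter in SERVICE_RULES:
        if re.search(pattern, template):
            return service, parameter
    return None


def annotate(question: str) -> Dict[str, Optional[str]]:
    """Build one corpus row: question, structure, service, parameter."""
    template, slots = to_template(question)
    match = classify(template)
    service, parameter = match if match else (None, None)
    filled = f"{parameter} = {', '.join(slots)}" if parameter and slots else None
    return {
        "question": question,
        "structure": template,
        "service": service,
        "parameter": filled,
    }


for q in ["Take me to Times Square", "Is it raining now in Manhattan"]:
    print(annotate(q))
```

Keyword rules are used only because they are easy to read; a trained classifier would be the more likely choice for a corpus with 30+ topics.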
21. Offline Crowd Sourcing Data
Example.
Collecting receipts from local supermarkets.
70,000 receipts collected within 3 weeks.
22. Data Safety
• All clients’ data is kept confidential.
• We trade only data for which we hold authorization or copyright.
• All data we provide is obtained legally and openly.
• All data is securely stored and processed.