Data Services in the Big Data Era
Datatang Technology
Created by Yc Xu. All rights reserved by Beijing Datatang Technology Co., Ltd.
About Us
• Founded in 2011
• Headquarters: Beijing, China
• Subsidiaries: China (4), India (1, opening soon), US (1, opening soon)
• Employees: 1,000+
• Stock code: 831428
Data is the Catalyst
Data makes it spin faster.
“Data is driving change in traditional industries, from their technical frameworks and business models to their organizational structures.”
Where Do We Stand?
• 45,000+ datasets
• 98% of the datasets are free of charge
• 1 million+ exchanges in 2014
Largest Data Exchange Platform in Asia
✤ Explore ✤ Exchange ✤ Share
Data Services
• 100+ corporate partners and government offices
• A professional business development (BD) team
• 500+ corporate partners worldwide
• Efficient data crawling, updating, and integration services
• Online: 400,000+ registered users
• Offline: 1,000+ employees in 5 offices
• Data maps generated directly from clients’ demands
Industrial & Governmental Data Bridging
Massive Data Crawling
Data Customizing
Industrial / Governmental Data
Business
Transportation & Geography
Medical & Health
Motor
Social & Media
Energy & Agriculture
Industrial / Governmental Data Example
Categories: Medical & Health, Transportation & Geography, Business, Motor, Social & Media
Example datasets:
• Hospital Health Care Data
• Human Engineering Data
• Taxi GPS Data
• Public Transportation Data
• Global Flight Information Data
• Customs Import and Export Data
• Vehicle OBD (On-Board Diagnostics) Data
• Driving Behavior Data
• Online Purchase Records in E-Commerce
• User Internet Usage Behavior Data
• SNS (Social Networking Service) Data
Massive Data Crawling
Web Crawling Tool
Basic principle: crawl whatever we can, as long as the data is legal.
• Social media
• Forum
• Web portals
How Do We Collect Massive Data?
In 2014, we processed:
• 2,000,000+ images
• 30,000+ hours of speech data
• 231 projects
Data Collection Window on the Crowdsourcing Platform (Mobile)
400,000+ registered users on our crowdsourcing platform
1,000+ staff across 5 offices
Data QA
Online
Offline
Data Customization
Speech: 1. Collection, 2. Synthesis, 3. Transcription
Image: 1. Face, 2. Vehicle & Road, 3. Merchandise, 4. OCR
Text: 1. Corpus Construction, 2. Text Understanding, 3. Structure Analysis
Other data gathered by crowdsourcing: data that can be sensed and collected manually
Speech: Collection and TTS
• Speech is recorded on different devices and in various environments, such as cars, home offices, and professional studios.
• Tasks are assigned to speakers from different countries and backgrounds, balancing speakers’ accents and genders.
• We also provide audio data generated from real-world apps.
Speech: Collection and TTS
Dataset | Brief Description
Chinese Mandarin and Mandarin with regional dialects | 4,000 hrs, 8,000 speakers, mobile
Spanish, Italian, French speech | 420 hrs, 900 speakers, mobile
Thai, Vietnamese, Malay speech | 350 hrs, 690 speakers, mobile
Japanese, Korean speech | 500 hrs, 1,050 speakers, mobile
In-car Japanese speech | 500 hrs, 690 speakers, in cars at different speeds
North American/British English speech | 1,400 hrs, 1,400 speakers, mobile + PC
Hindi speech | 100 hrs, 200 speakers, mobile
American children’s speech | 100 hrs, 200 speakers, mobile + PC
Speech: Transcription
We have transcribed over 50,000 hours of speech data, covering Chinese, Japanese, English, Korean, Thai, and other languages.
Category | Annotation
Does the speech contain useful information? | Yes
Start and end points of the effective speech | 0.07, 0.34
Transcript of the effective speech | Do you celebrate Christmas in China?
Is there any noise in the speech? | No
What language is used in the speech? | English
What is the speaker’s gender? | Male
Is an accent detected in the speech? | Yes
Over 1,000 transcribers across China, India, and Japan.
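The annotation categories above amount to one record per utterance. A minimal Python sketch of such a record follows, treating the example sentence as the transcript of the effective speech; the class and field names are hypothetical, and since the slide does not state the time unit for the start and end points, the code leaves it unspecified.

```python
from dataclasses import dataclass


@dataclass
class SpeechAnnotation:
    """One annotated utterance; field names are hypothetical, mirroring the slide's categories."""
    has_useful_info: bool   # Does the speech contain useful information?
    start: float            # Start point of the effective speech (unit not stated on the slide)
    end: float              # End point of the effective speech (same unit as start)
    transcript: str         # Text of the effective speech
    has_noise: bool         # Is there any noise in the speech?
    language: str           # Language used in the speech
    gender: str             # Gender of the speaker
    has_accent: bool        # Is an accent detected?

    def __post_init__(self):
        # Reject empty or reversed segments.
        if not 0.0 <= self.start < self.end:
            raise ValueError(f"invalid segment: start={self.start}, end={self.end}")

    @property
    def duration(self) -> float:
        """Length of the effective speech, in the same unit as start/end."""
        return self.end - self.start


# The example row from the slide.
example = SpeechAnnotation(
    has_useful_info=True, start=0.07, end=0.34,
    transcript="Do you celebrate Christmas in China?",
    has_noise=False, language="English", gender="Male", has_accent=True,
)
print(round(example.duration, 2))  # 0.27
```

Validating the segment at construction time catches reversed or empty start/end pairs before the record enters a dataset.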
Offices in China and India
Cooperation with IIT
Cooperation with several Chinese labs
Image/Video: Face Image Data
Example.
Collecting and labeling images of human faces at various angles and in various environments.
Image/Video: Objects on the Street
Example.
Collecting and labeling street objects in image data:
✤ Highlighted objects, such as people, vehicles, road signs, and traffic signals
✤ Specific features, such as vehicle brands and the likely direction of pedestrians’ routes
Image/Video: Merchandise Tags
Collecting and labeling e-commerce product images. The data is used to recommend related or similar products.
Text Understanding
Example.
Collecting news and event information, annotated with:
✤ Event Theme
✤ Description
✤ Event Category
✤ Subject
✤ Verb
✤ Object
✤ Time of Occurrence
✤ Event Location
✤ Cause
✤ Course
✤ Comments
✤ Number of Likes
Text: Corpus Construction
Example.
Constructed a text corpus, including FAQs, for mobile search apps.
Topics: 30+, e.g., contacts, GPS, weather, calculator, calendar, stocks, music.
Question | Structure | Type of Service | Service Parameter
Take me to Times Square | take me to <L1> | Navigation | Destination = L1
Where is Times Square | where is <L1> | Navigation | Destination = L1
How can I drive to Times Square | how can i drive to <L1> | Navigation | Destination = L1, drive
Fastest route to Times Square | fastest route to <L1> | Navigation | Destination = L1
Address of Times Square | address of <L1> | Navigation | Destination = L1
Weather in Manhattan | weather in <L1> | Weather | Location = L1
What’s the temperature in Manhattan | what is the temperature in <L1> | Weather | Location = L1
Lowest degree in Manhattan | lowest degree in <L1> | Weather | Location = L1
Is Manhattan cold now | is <L1> cold now | Weather | Location = L1
Is it raining now in Manhattan | is it raining now in <L1> | Weather | Location = L1
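The corpus pairs each sample question with a generalized structure in which the location is replaced by the slot <L1>. A minimal Python sketch, assuming a simple regex-based matcher (not necessarily how the corpus is consumed in practice), shows how such structures can map a new query to a service type and its parameters. The class and function names, the normalization rules, the "Mode" key for the "drive" parameter, and the "Navigation" label for the routing rows are all illustrative assumptions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Template:
    """One corpus row: a question structure with a single <L1> location slot."""
    structure: str                              # e.g. "take me to <L1>"
    service: str                                # type of service, e.g. "Weather"
    params: dict = field(default_factory=dict)  # values equal to "<L1>" are filled from the slot

    def __post_init__(self):
        # Escape the literal words, then turn the <L1> slot into a named capture group.
        body = re.escape(self.structure).replace(re.escape("<L1>"), r"(?P<L1>.+)")
        self.regex = re.compile(rf"^{body}$", re.IGNORECASE)


# A few rows from the corpus table; "Mode" is a hypothetical key for the "drive" parameter.
TEMPLATES = [
    Template("take me to <L1>", "Navigation", {"Destination": "<L1>"}),
    Template("how can i drive to <L1>", "Navigation", {"Destination": "<L1>", "Mode": "drive"}),
    Template("weather in <L1>", "Weather", {"Location": "<L1>"}),
    Template("what is the temperature in <L1>", "Weather", {"Location": "<L1>"}),
    Template("is it raining now in <L1>", "Weather", {"Location": "<L1>"}),
]


def normalize(query: str) -> str:
    """Assumed normalization: expand "what's" and drop trailing punctuation."""
    q = re.sub(r"\bwhat[’']s\b", "what is", query.strip(), flags=re.IGNORECASE)
    return q.rstrip("?.!")


def parse(query: str):
    """Return (service, params) for the first matching structure, or None."""
    q = normalize(query)
    for t in TEMPLATES:
        m = t.regex.match(q)
        if m:
            slot = m.group("L1")
            return t.service, {k: slot if v == "<L1>" else v for k, v in t.params.items()}
    return None


print(parse("What’s the temperature in Manhattan?"))
# ('Weather', {'Location': 'Manhattan'})
```

Matching case-insensitively on the original text, rather than lowercasing it, keeps the slot value's capitalization ("Times Square" rather than "times square").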
Offline Crowd Sourcing Data
Example.
Collecting receipts from local supermarkets.
70,000 receipts collected within 3 weeks.
Data Safety
• All clients’ data are kept confidential.
• We trade data only with authorization or copyright clearance.
• All data we provide are obtained legally and openly.
• All data are securely stored and processed.
Clients Wall
THANK YOU!
Contact us
+86 10 8260 0553
factory@datatang.com
