Data Driven Innovation - Rome
Self-Service Data
Preparation
Dr. Michele Stecca
24 Feb., 2017
SELF-SERVICE DATA
PREPARATION
• IoT systems generate massive amounts of data
• We store this huge amount of information in big
data platforms
• Then we extract value from it
So far, so good
BUT
Who knows how to do this?
• A "citizen data scientist" is a person who creates or generates
models that leverage predictive or prescriptive analytics but whose
primary job function is outside of the field of statistics and
analytics¹
• The person is not typically a member of an analytics team. Citizen
data scientists are typically in a line of business, outside of IT and
outside of a BI team
¹ Gartner 2015 Research – “Smart Data Discovery Will Enable a New Class of Citizen Data Scientist”
• Big data discovery will help expand the use of big data analytics,
because big data sources will be explored more often, much faster
and at a lower cost per analysis, by a broader range of users with
more rudimentary technical skills
• The global trend is to give less skilled users (i.e., citizen data
scientists) the ability to solve more complex problems or access
more insights using easier and quicker methods
• Through 2017, the number of citizen data scientists will grow five
times faster than the number of highly skilled data scientists
• Blending the ease of use, interactivity and agility of data
discovery with the richness of analysis and the scale, diversity or
immediacy of big data, in a single tool or tightly coupled
portfolio, will mark the inception of big data discovery
• Gartner has developed the concept of smart big data discovery
• Preparing data, finding patterns in large, complex data and sharing
findings with other users remain largely manual tasks
• Smart self-service data preparation is a smart data discovery
capability in which algorithms find relationships in data, profile
it, and recommend to users the best approaches to minimize modeling
time and improve quality
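To make "profiling" data a little more concrete, here is a minimal Python sketch that computes simple per-column statistics (null ratio, distinct-value count and a coarse inferred type) over records represented as dicts. The function names `profile_columns` and `infer_type` and the type-inference rule are hypothetical illustrations; the slides do not describe how doolytic or any Gartner-described product actually profiles data.

```python
from typing import Any, Dict, List


def infer_type(values: List[Any]) -> str:
    """Guess a coarse type for a column's non-null values (illustrative rule only)."""
    if not values:
        return "empty"
    if all(isinstance(v, bool) for v in values):
        return "bool"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "int"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "float"
    # Anything else (text, mixed types, ...) is reported as "string".
    return "string"


def profile_columns(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Compute null ratio, distinct count and inferred type for every column."""
    # Union of keys across all rows, in first-seen order.
    names: Dict[str, None] = {}
    for row in rows:
        for key in row:
            names.setdefault(key, None)
    profile: Dict[str, Dict[str, Any]] = {}
    for name in names:
        # A missing key counts as a null, just like an explicit None.
        values = [row.get(name) for row in rows]
        non_null = [v for v in values if v is not None]
        profile[name] = {
            "null_ratio": 1 - len(non_null) / len(values),
            "distinct": len(set(non_null)),
            "type": infer_type(non_null),
        }
    return profile
```

Statistics like these are a typical starting point for recommendations; for example, a column with many distinct, non-null values is a plausible join-key candidate.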
• doolytic simplifies access to big data with a modern BI user
experience and functionality
• doolytic enables smart data discovery on both structured and
unstructured data
• doolytic offers the advanced query capabilities required by
power users/citizen data scientists
• doolytic leverages supervised and unsupervised machine learning
features for further investigation
• Native Datalake Dictionary
• Join Recommender
  • Unlike traditional BI tools, not based on field-name conventions
  • Searches the Datalake Dictionary for links between fields and
draws them as graphs with confidence levels
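One way a join recommender can avoid relying on field names is to compare the values that columns actually contain. The sketch below uses a value-overlap (containment) heuristic, a common join-discovery technique swapped in here purely for illustration; the slides do not describe doolytic's actual algorithm. `recommend_joins`, `containment`, the `min_confidence` threshold and the dict-of-columns table representation are all assumptions.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical table representation: column name -> column values.
Table = Dict[str, Sequence[object]]


def containment(a: Sequence[object], b: Sequence[object]) -> float:
    """Fraction of the distinct non-null values of `a` that also appear in `b`."""
    set_a = {v for v in a if v is not None}
    set_b = {v for v in b if v is not None}
    if not set_a:
        return 0.0
    return len(set_a & set_b) / len(set_a)


def recommend_joins(left: Table, right: Table,
                    min_confidence: float = 0.5) -> List[Tuple[str, str, float]]:
    """Rank candidate join column pairs by value overlap, ignoring column names."""
    candidates = []
    for (l_name, l_vals), (r_name, r_vals) in product(left.items(), right.items()):
        # Take the larger of the two directional containments, so a small
        # dimension table fully contained in a large fact table still scores high.
        score = max(containment(l_vals, r_vals), containment(r_vals, l_vals))
        if score >= min_confidence:
            candidates.append((l_name, r_name, score))
    # Highest-confidence suggestions first.
    return sorted(candidates, key=lambda c: c[2], reverse=True)
```

For example, a probe table with an `msisdn` column and a subscriber table with a `phone` column would be paired even though the names differ: if two of their three distinct values match, the suggested confidence is about 0.67.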
How can doolytic help discover unknown correlations?
• The algorithm suggests potential correlations to the user, each
associated with a degree of confidence
• The user can accept or reject each recommendation
• Graphical visualization for usability
• The algorithm is scalable
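The slides state that the algorithm is scalable without saying how. A standard technique for making column-similarity comparisons scale, shown here as an assumption rather than as doolytic's method, is MinHash: each column is reduced once to a fixed-size signature, and the fraction of matching signature positions estimates the Jaccard similarity of the two columns' distinct values. Such an estimate could serve as the kind of confidence score a user then accepts or rejects. The function names and the `num_perm` default are illustrative.

```python
import hashlib
from typing import Iterable, List


def _hash(value: object, seed: int) -> int:
    """Deterministic 64-bit hash of `value`, salted with `seed` to simulate a family of hash functions."""
    digest = hashlib.blake2b(f"{seed}:{value!r}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(values: Iterable[object], num_perm: int = 128) -> List[int]:
    """MinHash signature: for each of `num_perm` salted hash functions, keep the minimum hash."""
    distinct = set(values)
    if not distinct:
        raise ValueError("cannot sketch an empty column")
    return [min(_hash(v, seed) for v in distinct) for seed in range(num_perm)]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of agreeing positions estimates the Jaccard similarity |A ∩ B| / |A ∪ B|."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

Comparing two columns then costs O(`num_perm`) regardless of column size. In practice, locality-sensitive hashing over the signatures is often added to avoid comparing every pair of columns; this sketch omits that step.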
Challenge
• The network planning department needs to optimize bandwidth allocation by user and traffic type
• Citizen data scientists are limited by the existing technology stack to high aggregation levels and small
fractions of data when performing statistical analyses
• Citizen data scientists must manually correlate data coming from different data sources (including
network probes)
Solution
• Business users keep track of frequently used queries with responsive, interactive dashboards and
visualizations
• Citizen data scientists drill down into data on the fly at maximum granularity (user, device and traffic
package level) and discover new paths and rules for network optimization through the Relation-Action
model
• Relationships among datasets are automatically recommended by a dedicated component
Benefits
• More accurate and effective network optimization algorithms are enabled by a wider and
deeper set of inputs
• Citizen data scientists are free to do big data discovery on their own
• Lower TCO than legacy tools
• IT department redirected from support to custom data inquiries
• ROI realized through a smaller required investment in optimized network equipment
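The challenge above involves manually correlating network-probe data with other sources in order to analyze bandwidth at user, device and traffic-type granularity. A minimal sketch of that correlation step follows, assuming a hypothetical schema in which probe records and subscriber records share an MSISDN (phone number) field; the record types, field names and function name are illustrative and are not taken from the case study.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ProbeRecord:
    """Hypothetical network-probe record."""
    msisdn: str
    traffic_type: str  # e.g. "video", "web"
    num_bytes: int


@dataclass
class Subscriber:
    """Hypothetical subscriber (CRM) record."""
    msisdn: str
    user_id: str
    device: str


def bandwidth_by_user_device_type(
    probes: List[ProbeRecord], subscribers: List[Subscriber]
) -> Dict[Tuple[str, str, str], int]:
    """Join probe records to subscribers on msisdn and sum bytes per (user, device, traffic type)."""
    by_msisdn = {s.msisdn: s for s in subscribers}
    totals: Dict[Tuple[str, str, str], int] = defaultdict(int)
    for p in probes:
        sub = by_msisdn.get(p.msisdn)
        if sub is None:
            # Probe traffic with no matching subscriber is skipped (inner join).
            continue
        totals[(sub.user_id, sub.device, p.traffic_type)] += p.num_bytes
    return dict(totals)
```

Keeping the per-(user, device, traffic type) totals rather than pre-aggregated figures is what allows analysis at the granularity the solution describes.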
• Moving from manual data preparation to smart data preparation is
an important trend for IoT/big data applications
• This is particularly true when dealing with heterogeneous data such
as sensor data, structured/unstructured data, etc.
• doolytic supports the citizen data scientist by providing advanced
tools for data preparation on large datasets, such as the Join
Recommender
@steccami
• Senior Big Data Analyst, doolytic
• Ph.D. Computer Engineering, Univ. of Genoa, Italy
• Visiting Researcher, ICSI - UC Berkeley, USA
• Principal Investigator, FP6 & FP7 projects co-funded by the EU
• Author of 30+ scientific papers in Computer Science
• Main interests: Big data (Hadoop, Spark, etc.), IoT