The document provides an overview of a presentation on data analysis, mobility, proximity and app-based marketing. The presentation covers topics including big data concepts, artificial intelligence/machine learning, and architectures for data flow and machine learning. It discusses technologies like Elasticsearch, Kafka, and columnar databases. Example applications of AI in areas like retail, banking, and manufacturing are also presented.
Agile Data Science is a lean methodology adapted from Agile Software Development. At its core it centers on people, interactions, and building minimum viable products that ship fast and often to solicit customer feedback. In this presentation, I describe how this work was done in the past, with examples. Get started today with our help by visiting http://www.alpinenow.com
A Practical-ish Introduction to Data Science - Mark West
In this talk I will share insights and knowledge that I have gained from building up a Data Science department from scratch. This talk will be split into three sections:
1. I'll begin by defining what Data Science is, how it is related to Machine Learning and share some tips for introducing Data Science to your organisation.
2. Next up we'll run through some commonly used Machine Learning algorithms used by Data Scientists, along with example use cases where these algorithms can be applied.
3. The final third of the talk will be a demonstration of how you can quickly get started with Data Science and Machine Learning using Python and the Open Source scikit-learn Library.
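The scikit-learn quick start the talk promises typically fits in a few lines. As a hedged illustration only (the talk's own demo code is not reproduced here; the dataset and model choice below are my assumptions), a minimal classifier might look like:

```python
# Minimal scikit-learn sketch: load data, split, fit, evaluate.
# Illustrative example, not the code from the talk.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)                # learn from the training examples
accuracy = model.score(X_test, y_test)     # fraction of correct predictions
```

The same load / split / fit / score shape carries over to nearly every scikit-learn estimator, which is what makes the library a good first step.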
A two-hour lecture I gave at the Jyväskylä Summer School. The purpose of the talk is to give a quick non-technical overview of concepts and methodologies in data science. Topics include a wide overview of both pattern mining and machine learning.
See also Part 2 of the lecture: Industrial Data Science. You can find it in my profile (click the face)
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne... - Edureka!
This Edureka Data Science course slides will take you through the basics of Data Science - why Data Science, what is Data Science, use cases, BI vs Data Science, Data Science tools and Data Science lifecycle process. This is ideal for beginners to get started with learning data science.
You can read the blog here: https://goo.gl/OoDCxz
You can also take a complete structured training, check out the details here: https://goo.gl/AfxwBc
Big Data and Data Science: The Technologies Shaping Our Lives - Rukshan Batuwita
Big Data and Data Science have become increasingly important areas in both industry and academia, to the extent that every company wants to hire a Data Scientist and every university wants to start dedicated degree programs and centres of excellence in Data Science. Big Data and Data Science have led to technologies that have already shaped different aspects of our lives, such as learning, working, travelling, purchasing, social relationships, entertainment, physical activities, and medical treatments. This talk will attempt to cover the landscape of some of the important topics in these exponentially growing areas of Data Science and Big Data, including state-of-the-art processes, commercial and open-source platforms, data processing and analytics algorithms (especially large-scale Machine Learning), application areas in academia and industry, the best industry practices, business challenges, and what it takes to become a Data Scientist.
Leveraging Open Source Automated Data Science Tools - Domino Data Lab
The data science process seeks to transform and empower organizations by finding and exploiting market inefficiencies and potentially hidden opportunities, but this is often an expensive, tedious process. However, many steps can be automated to provide a streamlined experience for data scientists. Eduardo Arino de la Rubia explores the tools being created by the open source community to free data scientists from tedium, enabling them to work on the high-value aspects of insight creation and impact validation.
The promise of the automated statistician is almost as old as statistics itself. From the creation of vast tables, which saved the labor of calculation, to modern tools that automatically mine datasets for correlations, there has been considerable advancement in this field. Eduardo compares and contrasts a number of open source tools, including TPOT and auto-sklearn for automated model generation and scikit-feature for feature generation and other aspects of the data science workflow, evaluates their results, and discusses their place in the modern data science workflow.
Along the way, Eduardo outlines the pitfalls of automated data science and applications of the “no free lunch” theorem and dives into alternate approaches, such as end-to-end deep learning, which seek to leverage massive-scale computing and architectures to handle automatic generation of features and advanced models.
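TPOT and auto-sklearn themselves are not reproduced here, but the core loop they automate, trying candidate models and hyperparameters and keeping the best cross-validated one, can be sketched with plain scikit-learn. The dataset, model, and parameter grid below are illustrative assumptions, not taken from the talk:

```python
# Sketch of what automated model-selection tools do under the hood:
# try candidate hyperparameters, cross-validate, keep the best.
# TPOT and auto-sklearn automate a far richer search than this.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=500))
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0]}  # candidate strengths

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)

best_score = search.best_score_    # mean cross-validated accuracy of the winner
best_params = search.best_params_  # which candidate won
```

The "no free lunch" caveat from the talk applies directly: no single searched-over model family wins on every dataset, which is why these tools search rather than prescribe.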
Big Data [sorry] & Data Science: What Does a Data Scientist Do? - Data Science London
What 'kind of things' does a data scientist do? What are the foundations and principles of data science? What is a Data Product? What does the data science process look like? Learning from data: Data Modeling or Algorithmic Modeling? - talk by Carlos Somohano @ds_ldn at The Cloud and Big Data: HDInsight on Azure, London, 25/01/13
Tales from an IP Worker in Consulting and Software - Greg Makowski
A discussion of intellectual property and how to leverage consulting projects to build vertical application software. In my use case, data mining, artificial intelligence, and intelligence augmentation are part of the value add. Also discusses software frameworks, open source software, and prior-invention clauses in hiring contracts.
Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo... - Thoughtworks
We are in the midst of an exciting time. There is an explosion of very interesting data, an emergence of powerful new technologies for harnessing it, and devices that enable humans to receive tremendous benefits from it. What is required are innovative processes that enable the creation and delivery of value from all of that data. More often than not, it is the predictive (what will happen?) and prescriptive (how to make it happen!) analytics that produce this value, not the raw data itself.
Agile software teams are continuously involved in projects that involve rich, complex, and messy data. Often this data represents innovative analytics opportunities. Being analytics-aware gives these teams the opportunity to collaborate with stakeholders to innovate by creating additional value from the data. This session is aimed at making Agile software teams more analytics-aware so that they will recognize these innovation opportunities.
The trouble with conventional analytics (like conventional software development) is that it involves long, phased, sequential steps that take too long and fail to deliver actionable results. This talk will examine the convergence of the following elements of an exciting emerging field called Agile Analytics:
•sophisticated analytics techniques, plus
•lean learning principles, plus
•agile delivery methods, plus
•so-called "big data" technologies
Learn:
•The analytical modeling process and techniques
•How analytical models are deployed using modern technologies
•The complexities of data discovery, harvesting, and preparation
•How to apply agile techniques to shorten the analytics development cycle
•How to apply lean learning principles to develop actionable and valuable analytics
•How to apply continuous delivery techniques to operationalize analytical models
Python is dominating the fast-growing data-science landscape. This talk provides a foundational overview of the practice of data science and some of the most popular Python libraries for doing data science. It also provides an overview of how Anaconda brings it all together.
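As a hedged illustration of the kind of libraries such an overview usually covers (the talk's own examples are not reproduced, and the data below is made up), NumPy and pandas handle the day-to-day wrangling:

```python
# Toy example of the NumPy + pandas workflow common in Python data science.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "sales":  [120.0, 80.0, 150.0, 95.0],
})

# Aggregate sales per region, then derive each region's share of the total.
per_region = df.groupby("region")["sales"].sum()
share = per_region / per_region.sum()
```

Distributions like Anaconda bundle these libraries (with compatible versions) so the environment setup step largely disappears, which is the "brings it all together" point.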
To be successful as a data science team, we need to continuously deliver data-driven insights and data products that generate business value. Identifying the best opportunities and building solutions that actually get used in production requires very close collaboration with business users and subject matter experts. What can we learn from agile software development methodologies, and how can we apply them to data science projects?
A brief introduction to data science and machine learning with an emphasis on application scenarios, from traditional to more innovative ones. The overview covers the basic definition of data science, an overview of machine learning, and examples from traditional scenarios, recommender systems and social network analysis, IoT, and deep learning.
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science - Mark West
Data Science has been described as the sexiest job of the 21st Century. But what is Data Science? And what has Machine Learning got to do with all this? In this session I will share insights and knowledge that I have gained from building up a Data Science department from scratch. The talk will be split into three sections:
1. I’ll begin by defining what Data Science is, how it is related to Machine Learning and share some tips for introducing Data Science to your organization.
2. Next up we’ll run through some commonly used Machine Learning algorithms used by Data Scientists, along with examples for use cases where these algorithms can be applied.
3. The final third of the talk will be a demonstration of how you can quickly get started with Data Science and Machine Learning using Python and the Open Source scikit-learn Library.
This video will give you an idea of Data Science for beginners.
It also explains the Data Science process, Data Science job roles, and the stages in a Data Science project.
An introduction to data science: from the very beginning of the idea, through the latest designs, changing trends, and enabling technologies, to applications already in real-world use today.
Data Science Tutorial | Introduction To Data Science | Data Science Training ... - Edureka!
This Edureka Data Science tutorial will help you understand in and out of Data Science with examples. This tutorial is ideal for both beginners as well as professionals who want to learn or brush up their Data Science concepts. Below are the topics covered in this tutorial:
1. Why Data Science?
2. What is Data Science?
3. Who is a Data Scientist?
4. How a Problem is Solved in Data Science?
5. Data Science Components
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah - Databricks
Insnap, a hyper-personalized ML-based platform acquired by The Honest Company, has been used to build a real-time data platform based on Apache Spark, Cassandra and Redshift. Users’ behavioral and transactional data have been used to build data models and ML models, and to drive use cases for marketing, growth, finance and operations.
Learn how The Honest Company has used Spark as a workhorse for 1) collecting, transforming (ETL) and storing data from various sources including MySQL, MongoDB, JDE, Google Analytics, Facebook, Localytics and REST APIs; 2) building data models, and aggregating and generating reports on revenue, order fulfillment tracking, data pipeline monitoring and subscriptions; 3) using ML to build models for user acquisition, LTV and recommendation use cases. Spark replaced the monolithic codebase with flexible, scalable and robust pipelines. Databricks helped The Honest Company focus on data instead of maintaining infrastructure. While Honest users got delightful recommendations that improved their experience, data users at Honest understood users much better by segmenting with behavioral information and advanced ML models, leading to increased revenue and retention.
TidalScale has created a software-defined computer.
At TidalScale, we have created a simple, cost-effective way for a data scientist, an analyst, an engineer, a scientist, a database administrator, or a software developer to access a group of servers through a single operating system instance as if it were a single supercomputer. This dramatically simplifies development while reducing software scaling complexity, not to mention dramatic cost savings in hardware and software.
We configure hosted hardware into one or more TidalPods. Each TidalPod is a virtual supercomputer comprising a set of commodity servers configured with the TidalScale HyperKernel. What the user sees is standard Linux, FreeBSD or Windows running with the sum of all memory, processors, networks, and I/O. The secret sauce is the HyperKernel that fools the guest OS into thinking it’s running directly on a huge, expensive machine when in fact it’s running on a set of smaller, less expensive servers.
We offer an incredibly simple user experience.
• Define the computer size you want (number of CPUs, amount of memory), boot the virtual machine, then log in to the computer.
Advanced Analytics and Machine Learning with Data Virtualization (India) - Denodo
Watch full webinar here: https://bit.ly/3dMN503
Advanced data science techniques, like machine learning, have proven to be an extremely useful tool for deriving valuable insights from existing data. Platforms like Spark, and complex libraries for R, Python, and Scala, put advanced techniques at the fingertips of data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative that addresses these issues in a more efficient and agile way.
Watch this session to learn how companies can use data virtualization to:
- Create a logical architecture to make all enterprise data available for advanced analytics exercises
- Accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
- Integrate popular tools from the data science ecosystem: Spark, Python, Zeppelin, Jupyter, etc.
Presentation on the OpenML initiative to enable open, collaborative machine learning during the data@Sheffield event. We discuss how data, machine learning algorithms and experiments can be analysed collaboratively by data scientists and domain scientists, as well as citizen scientists.
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi... - Ali Alkan
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi | Automating Machine Learning, Artificial Intelligence, and Data Science | Guided Analytics
Over 90% of today’s data has been generated in the last two years, and growth rates continue to climb. In this session, we’ll step through challenges and best practices with data capturing, how to derive meaningful insights to help predict the future, and common pitfalls in data analysis.
Come discover how integrated solutions involving Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon Machine Learning/Deep Learning result in effective data systems for data scientists and business users, alike.
How Data Virtualization Puts Machine Learning into Production (APAC) - Denodo
Watch full webinar here: https://bit.ly/3mJJ4w9
Advanced data science techniques, like machine learning, have proven to be an extremely useful tool for deriving valuable insights from existing data. Platforms like Spark, and complex libraries for R, Python, and Scala, put advanced techniques at the fingertips of data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative that addresses these issues in a more efficient and agile way.
Attend this session to learn how companies can use data virtualization to:
- Create a logical architecture to make all enterprise data available for advanced analytics exercises
- Accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
- Integrate popular tools from the data science ecosystem: Spark, Python, Zeppelin, Jupyter, etc.
Elasticsearch in Big Data Architectures - EsInADay 2017 - Alberto Paro
Elasticsearch has become an essential component in today's Big Data (FastData) architectures, not only for its search engine capabilities, but above all for the competitive advantage that its real-time analytics offer. In this short talk we will look at Elasticsearch's position within the NoSQL landscape, examples of Big Data architectures that exploit its characteristics, and its ease of integration with tools like Apache Spark.
2017-02-07 - Elastic & Spark: Building a Search Geo Locator - Alberto Paro
Using Elasticsearch in a Big Data environment is very simple. In this talk, we analyse what Big Data is and show how easy it is to integrate Elasticsearch with Apache Spark.
2017-02-07 - Elastic & Spark: Building a Search Geo Locator - Alberto Paro
Presentation from the EsInRome event of 7 February 2017 - integrating Elasticsearch into a Big Data architecture, and its ease of integration with Apache Spark.
2016-02-24 - Platforms for Big Data - Alberto Paro
Knowing how to evaluate the right NoSQL or Big Data solution for your business is essential. Not all NoSQL datastores are alike, just as data-processing needs differ from business to business. This talk tries to bring clarity to the main Big Data topics.
What's Big Data? - Big Data Tech - 2015 - Florence - Alberto Paro
Big Data Tech - 2015 - Florence
Big Data Technologies Explained to Management
Understanding Big Data concepts and the tools that exist to address them (NoSQL, Hadoop/Spark) is essential for today's management in order to face tomorrow's challenges.
Salma Karina Hayat is Conscious Digital Transformation Leader at Kudos | Empowering SMEs via CRM & Digital Automation | Award-Winning Entrepreneur & Philanthropist | Education & Homelessness Advocate
When listening to pitches for new ventures, marketplace ideas come up very frequently. In this session we will discuss reasons why you should stay away from them :P , by sharing real stories and misconceptions around them. If you still insist on going for it, however, you will at least get an idea of the important and critical strategies to optimize for success, like Product, Business Development & Marketing, and Operations :)
Reflect Festival Limassol May 2024.
Michael Economou is an Entrepreneur with Business & Technology foundations and a passion for Innovation. He is working with his team to launch a new venture, Exyde, an AI-powered booking platform for Activities & Experiences, aspiring to revolutionize the way we travel and experience the world. Michael has extensive entrepreneurial experience as the co-founder of Ideas2life, AtYourService, and Foody, an online delivery platform and one of the most prominent ventures in Cyprus' digital landscape, acquired by the Delivery Hero group in 2019. This journey reflects vast expertise in building and scaling marketplaces, enhancing everyday life through technology, and making a meaningful impact on local communities, which is what Michael and his team are pursuing once more with Exyde: www.goExyde.com
LUISS - Deep Learning and data analyses - 09/01/19
1. Data Analysis, Mobility, Proximity and App-based Marketing
Deep Learning and data analyses: a new perspective on how data supports companies in strategic decisions.
Presenter: Alberto Paro
Date: 09/01/19
2. ABOUT ME – ALBERTO PARO
Ø Master Degree in Computer Science Engineering at Politecnico di Milano
Ø Author of 3 books about ElasticSearch (versions 1.x to 5.x) + 6 tech reviews
Ø Big Data trainer, developer and consultant on Big Data technologies (Akka, Play Framework, Apache Spark, Reactive Programming), NoSQL (Accumulo, HBase, Cassandra, ElasticSearch, Kafka and MongoDB) and Machine Learning applied to Big Data
Ø Evangelist for the Scala and Scala.js languages
3. TOPICS
Ø Big Data Concepts
Ø Market position
Ø Build a Solution for Intelligence
6. The ‘Datafication’
Ø Activity
Ø Conversation
Ø Text
Ø Voice
Ø Social Media
Ø Browser logs
Ø Photos
Ø Videos
Ø IOT
Ø Etc.
Volume
Veracity
Variety
Velocity
Analyzing Big Data:
Ø Text analytics
Ø Sentiment
analysis
Ø Face recognition
Ø Voice analytics
Ø Movement
analytics
Ø Etc.
Value
TRANSFORM BIG DATA INTO VALUE
10. MACHINE LEARNING
The ability to
progressively improve
performance on a task
without being
explicitly
programmed
11. 4 BIG IDEAS
Data Driven Decision Making
Cloud Computing
Machine Learning
Cognitive
Computing:
ML + BigData + NLP
12. OUTLOOK
Worldwide Spending on
Cognitive and Artificial
Intelligence Systems reached
about $19.1 Billion in 2018
Source: IDC
40% of Digital Transformation
initiatives will use AI services.
AI spending will grow to $42.2
Billion in 2021.
Source: IDC
18. Ø Customer Recommendation
Ø Customer Profiling
Ø Customer Pre-Selling
Ø Customer Post-Selling
Ø Fraud Detection
Ø Prediction Systems for Brokers (banking/finance)
AI TECHNOLOGIES – RETAIL AND BANKING
19. Ø Cost reduction via robots
Ø Creation of new products
Ø Quality monitoring
Ø Learning by example
Ø Predictive Maintenance
AI TECHNOLOGIES – MANUFACTURING
33. Cognitive computers are:
Ø Made with algorithms
Ø Knowledgeable ONLY about what they are taught
Ø In control ONLY of what we give them control of
Ø Aware of nuances and able to keep learning
more
Cognitive algorithms:
Ø Do very boring work for you
Ø Often make better, more consistent decisions
than humans
Ø Are efficient and won't get tired
DATA SCIENTIST TEAM
34. Machine Learning is mathematics / statistics
Ø Linear Algebra
Ø Calculus
Ø Probability theory
Ø Graph theory
Ø …
Hardly anyone knows all of this.
It's a big field with lots of theory
It has two orthogonal aspects
Ø Analytics / machine learning
Ø Big data
Ø They can be combined or used separately
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
DATA SCIENTIST TEAM – SKILLS 1/3
38. Ø Applications circa year 2000
Ø Dozens of servers
Ø Response times on the order of seconds
Ø Hours of maintenance downtime
Ø Data on the order of gigabytes
Ø Accessed by desktop devices
Ø Applications from year >= 2010: modern ones.
Ø Clusters of thousands of multicore processors
Ø Response times on the order of milliseconds
Ø 100% uptime
Ø Data on the order of petabytes
Ø Accessed by any device
APPLICATION EVOLUTION
39. Ø Reactive applications
Ø Message-driven
Ø Scalable
Ø Resilient/Elastic
Ø Responsive
Ø React to
Ø Events: their event-driven nature enables the other qualities
Ø Failure: resilient systems can recover from errors
at all levels
Ø Load: scalability does not depend on shared resources
Ø Users: response times should not depend on workload
APPLICATION EVOLUTION: NEW REQUIREMENTS
40. Ø No redesign needed to achieve
scalability
Ø Scalability on demand
Ø Risk management
Ø Real-time, engaging,
rich, collaborative
Ø No latency in responses
Ø Loosely coupled design
Ø Communication orientation
Ø Efficient use of resources
Ø Downtime is a waste
of money
Ø Failure handling is part of the design
REACTIVE MANIFESTO
47. Ø A single data master (the data
lake) contains all the knowledge
Ø Each application receives
data from the main data
lake, but is then
independent for its analysis
Ø Allows you to crop /
anonymize / mask data for
the data scientists
BIG DATA – MULTI TENANT/INTELLIGENCE ARCHITECTURE
53. Most common tools:
Ø Apache Kafka
Ø RabbitMQ
Ø Apache ActiveMQ
Ø Redis
Ø Messages are produced into "topics" / queues
Ø They serve the messages to consumers
Ø Essential for back-pressure
Ø They have very little functionality: no queries
NOSQL – MESSAGE QUEUE 1/2
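The back-pressure point above can be sketched with Python's standard-library bounded queue. This is a toy, in-process stand-in for a broker like Kafka or RabbitMQ, not how those systems are implemented: when the queue is full, the producer's `put` blocks, which is the essence of back-pressure.

```python
import queue
import threading

# A bounded queue: the producer blocks when it is full,
# which is a toy form of the back-pressure brokers provide.
q = queue.Queue(maxsize=3)
consumed = []

def producer():
    for i in range(10):
        q.put(i)      # blocks while the queue already holds 3 items
    q.put(None)       # sentinel: no more messages

def consumer():
    while True:
        msg = q.get()
        if msg is None:
            break
        consumed.append(msg)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(consumed)  # all 10 messages arrive, in order
```

A real broker adds persistence, partitioning and multi-consumer topics on top of this basic blocking behaviour.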
55. Ø Initially developed by LinkedIn and made
open-source in 2011
Ø Apache project since October 23, 2012
Ø In 2014 Confluent was founded by former
LinkedIn developers to provide business
support
Ø Widespread in enterprise-level projects /
infrastructures.
Ø Performance scales linearly with the number
of nodes
NOSQL – APACHE KAFKA
56. Ø Designed to store very large
data sets (several petabytes)
Ø The market is dominated
by Apache HBase and
Accumulo
Ø Insert throughput depends
only on the number of nodes
Ø They offer functionality to
extend them
NOSQL – COLUMNAR DATABASES 1/3
59. Ø Full Text Search Engine
Ø Based on Lucene, written in Java 8
Ø "Distributed, (Near) Real Time, Search Engine"
Ø RESTful JSON over HTTP, easy to debug
Ø Schema-free
Ø Dynamic Mapping
Ø MultiTenant
Ø Scalable
Ø From 1 node to thousands of nodes
Ø Highly available
Ø Rich set of search functions
Ø Built-in Analytics
Ø Open Source (Apache 2.0)
Ø Originally written by Shay Banon (Kimchy)
Ø Easy to install
ELASTICSEARCH
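The "RESTful JSON over HTTP" point above means queries are plain JSON documents. Below is a minimal body using the standard `match` query from the Elasticsearch Query DSL; the index name `articles` and the field `title` are hypothetical, and in practice this body would be POSTed to `http://<host>:9200/articles/_search`.

```python
import json

# A minimal Elasticsearch search body: a full-text "match" query
# on a hypothetical "title" field, limited to 10 hits.
query = {
    "query": {
        "match": {"title": "big data"}
    },
    "size": 10
}

body = json.dumps(query)
print(body)
```

Because the interface is just JSON over HTTP, any language with an HTTP client can issue such queries, which is part of why it is easy to debug.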
60. Ø Near real-time analytics in ms
Ø Advanced Analytics
Ø Your company's own "Google" engine
Ø A new approach to the Business
Ø Fast time from data gathering to results
Ø A few low-cost servers can process as much
data in milliseconds as a big
Hadoop cluster or a very expensive DBMS
solution
WHY ELASTICSEARCH?
64. Ø Traditional databases also handle Big Data.
Ø NoSQL databases have poor analytics (except
Elasticsearch)
Ø MapReduce often works on text files
Ø It can also work on data from SQL and NoSQL
Ø NoSQL allows greater throughput
Ø In general, you may have a mix of sources
Ø Text files, NoSQL and SQL
MACHINE LEARNING – NOSQL AND BIGDATA
65. ØOne of the biggest problems
ØManually entered data is "suspicious"
ØMany datasets are profoundly problematic
ØSometimes retrieving data is problematic:
ØSystematic problems with sensors
ØErrors that cause data loss
ØIncorrect metadata on sensors
ØNever, ever, believe the data without checking!
ØGarbage in, garbage out, etc.
MACHINE LEARNING – DATA QUALITY
66. ØSupervised
ØWe have a training dataset with the correct answers
ØWe use the training data to teach the algorithm
ØThen we apply it to data without answers
ØUnsupervised
ØThere is no training data
ØThe data is fed to the algorithm hoping that it
makes sense of the data
ØAnd the data scientist can interpret the results
MACHINE LEARNING – TYPES
67. ØPredictive
ØPredict a variable from the data
ØClassification
ØAssign records to predefined groups
ØClustering
ØSplit records into similarity-based groups
ØAssociative Learning
ØEvaluate relationships between records: "what happens with
what"
MACHINE LEARNING – TYPES
68. ØThere is noise in the data
ØInput data is inaccurate
ØThere are hidden / latent values
ØInductive bias
ØEssentially the shape of the algorithm we
choose
ØNot all data can "fit”
ØIntroducing underfitting or overfitting
ØMachine Learning without Bias is not possible.
MACHINE LEARNING – PROBLEMS
69. ØTesting is essential
ØTesting means splitting data into 2 datasets:
ØTraining data (input for algorithms)
ØTest data (used for evaluation)
ØPerformance measures have to be calculated
ØPrecision / Recall
ØMean Squared Error
MACHINE LEARNING – PROBLEMS
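The evaluation measures above can be computed in a few lines. This is a toy sketch on made-up predictions for a held-out test set: precision and recall for a classifier, mean squared error for a regressor.

```python
# Hypothetical classifier output on a test set (1 = positive class)
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)   # of the predicted positives, how many are right
recall = tp / (tp + fn)      # of the true positives, how many were found

# Hypothetical regressor output: mean squared error
reg_true = [2.0, 3.0, 5.0]
reg_pred = [2.5, 2.5, 5.0]
mse = sum((t - p) ** 2 for t, p in zip(reg_true, reg_pred)) / len(reg_true)

print(precision, recall, mse)  # 0.75 0.75 0.1666...
```

The crucial point from the slide is that these numbers must be computed on the test split, never on the training split.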
70. 1. C4.5
2. k-means clustering
3. Support vector machines
4. The Apriori algorithm
5. The EM algorithm
6. PageRank
7. AdaBoost
8. k-nearest neighbors class.
9. Naïve Bayes
10.CART
MACHINE LEARNING – TOP 10 ALGORITHMS
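Of the list above, k-means clustering (#2) is the easiest to show end to end. This is a minimal 1-D sketch on made-up points: it alternately assigns each point to the nearest centroid and recomputes each centroid as the mean of its cluster (real data is multi-dimensional and initialization matters; with a bad start a cluster can end up empty).

```python
# Minimal 1-D k-means with k = 2 on hypothetical points
points = [1.0, 1.1, 0.9, 8.0, 8.2, 7.8]
centroids = [0.0, 10.0]  # deliberately poor initial guesses

for _ in range(10):
    # Assignment step: each point joins the nearest centroid's cluster
    clusters = [[], []]
    for p in points:
        idx = min(range(2), key=lambda i: abs(p - centroids[i]))
        clusters[idx].append(p)
    # Update step: each centroid moves to the mean of its cluster
    centroids = [sum(c) / len(c) for c in clusters]

print(centroids)  # converges to roughly [1.0, 8.0]
```

The same assign/update loop, with Euclidean distance instead of `abs`, is k-means in any number of dimensions.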
71. Ø Algorithm to build decision trees
Ø Essentially a tree of Boolean expressions
Ø Each node divides the data into 2
Ø Leaves associate objects with classes
Ø Decision trees do not serve only for categorization
Ø They also teach us a lot about the classes
Ø C4.5 is a bit complex to learn
Ø The ID3 algorithm is much simpler
Ø CART (#10) is another algorithm for learning decision trees
MACHINE LEARNING – C4.5
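The "tree of Boolean expressions" idea above can be made concrete with a tiny hand-built tree. C4.5 would *learn* such tests from data (choosing each split by information gain); here the two nodes, the fruit records and their features are invented purely for illustration.

```python
# A decision tree is essentially nested boolean tests.
# This two-node tree classifies hypothetical fruit records.
def classify(fruit):
    if fruit["diameter"] < 5:          # root node: a size test
        return "cherry"
    if fruit["color"] == "orange":     # inner node: a colour test
        return "orange"
    return "apple"                     # leaf reached by elimination

print(classify({"diameter": 3, "color": "red"}))     # cherry
print(classify({"diameter": 8, "color": "orange"}))  # orange
```

Reading the tests top-down is also why, as the slide says, the tree "teaches us a lot about the classes": the learned splits are human-readable rules.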
72. ØA way to perform binary classification using hyperplanes
ØSupport vectors are the data points closest to the hyperplane
dividing the classes
ØSVM maximizes the margin between the support vectors
(SV) and the dividing hyperplane.
MACHINE LEARNING – SUPPORT VECTOR MACHINES
73. ØAn algorithm for "frequent itemsets"
ØEssentially it extracts which items frequently appear
together
ØFor example, which products are bought together at
the supermarket?
ØUsed by Amazon: "Customers who bought this also
bought ..."
ØIt can also be used to create association rules
ØApriori is slow
MACHINE LEARNING – APRIORI
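The supermarket example above can be sketched in a few lines. This shows only the pair-counting core on invented baskets; full Apriori also generates larger candidate itemsets and prunes them using the frequent smaller ones, which is where its (slow) iterative structure comes from.

```python
from itertools import combinations

# Hypothetical shopping baskets; a pair is "frequent" if it
# appears together in at least min_support baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "beer"},
    {"milk", "beer"},
    {"bread", "milk"},
]
min_support = 2

counts = {}
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        counts[pair] = counts.get(pair, 0) + 1

frequent = {pair for pair, c in counts.items() if c >= min_support}
print(frequent)  # {('bread', 'milk'), ('beer', 'milk')}
```

Frequent pairs like `('bread', 'milk')` are then turned into association rules ("customers who bought bread also bought milk") by comparing their support with that of the individual items.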
74. ØUsed in various contexts
ØVery difficult to understand what it does
ØVery heavy mathematically
ØIt is an iterative algorithm
ØAlternates between an "expectation" step and a "maximization" step
ØTries to optimize the output of a function
ØIt can be used for clustering
MACHINE LEARNING – EXPECTATION MAXIMIZATION
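The expectation/maximization alternation above can be shown on the simplest possible case: fitting the means of two 1-D Gaussians to invented data. To keep it short this sketch fixes equal variances and equal mixing weights, which a real EM run would also estimate.

```python
import math

# Hypothetical data drawn around 0 and around 5
data = [-0.2, 0.1, 0.0, 4.9, 5.1, 5.0]
mu = [1.0, 4.0]   # initial guesses for the two means
var = 1.0         # fixed, shared variance (simplification)

def density(x, m):
    # Unnormalized Gaussian density (normalizer cancels below)
    return math.exp(-((x - m) ** 2) / (2 * var))

for _ in range(20):
    # E step: responsibility of component 0 for each point
    r0 = [density(x, mu[0]) / (density(x, mu[0]) + density(x, mu[1]))
          for x in data]
    # M step: re-estimate each mean as a responsibility-weighted average
    mu[0] = sum(r * x for r, x in zip(r0, data)) / sum(r0)
    mu[1] = sum((1 - r) * x for r, x in zip(r0, data)) / sum(1 - r for r in r0)

print(mu)  # close to [0.0, 5.0]
```

The soft, probabilistic assignments in the E step are what distinguish this from k-means, whose assignment step is all-or-nothing.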
75. ØIt is a graph algorithm
ØDetermines the most important nodes
ØIt is used by Google to weight search results
ØIt can be applied to any graph
ØFor example, to samples of RDF data
ØIt works by simulating random walks
ØEstimating the probability of visiting a given node at a given
time
ØImplemented with linear algebra
MACHINE LEARNING – PAGERANK
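The "random walks via linear algebra" point above is usually realized as power iteration. This is a minimal sketch on an invented three-page graph (it omits the handling of dangling nodes that a real implementation needs).

```python
# Tiny link graph: A links to B and C, B links to C, C links to A
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
damping = 0.85                         # standard damping factor
rank = {page: 1 / 3 for page in links} # start from a uniform distribution

for _ in range(50):
    # Each page keeps a (1 - damping) base share, plus the damped
    # rank its in-neighbours distribute over their outgoing links.
    new = {page: (1 - damping) / 3 for page in links}
    for page, outgoing in links.items():
        share = damping * rank[page] / len(outgoing)
        for target in outgoing:
            new[target] += share
    rank = new

print(rank)  # C, linked by both A and B, ends up with the most rank
```

Each iteration is one step of the random walk; the loop converges to the walk's stationary distribution, which is exactly the eigenvector computation the slide alludes to.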
76. ØAn "ensemble learning" algorithm
ØCombines several algorithms
ØRuns them on the same data
ØThe combination of multiple algorithms can work very
well
ØBetter than a single algorithm
ØAdaBoost essentially weighs the training samples
ØGiving more weight to those that are classified worst
MACHINE LEARNING – ADABOOST
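The reweighting idea above can be shown with a single boosting round on invented data. A weak "stump" classifier misclassifies one sample, and the standard AdaBoost update multiplies misclassified weights by e^α and correct ones by e^-α, so the next weak learner focuses on the hard sample. A full run would repeat this for several learners and combine them with weights α.

```python
import math

# Hypothetical 1-D training data; the stump "x > 2.5 -> +1"
# misclassifies the sample at x = 3 (label -1).
xs = [1, 2, 3, 4]
ys = [-1, -1, -1, 1]
weights = [0.25, 0.25, 0.25, 0.25]     # start uniform

preds = [1 if x > 2.5 else -1 for x in xs]
err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)  # 0.25
alpha = 0.5 * math.log((1 - err) / err)

# Misclassified samples get weight * e^alpha, correct ones * e^-alpha
weights = [w * math.exp(alpha if p != y else -alpha)
           for w, p, y in zip(weights, preds, ys)]
total = sum(weights)
weights = [w / total for w in weights]  # renormalize to sum to 1

print(weights)  # the misclassified sample (x = 3) now carries weight 0.5
```

After normalization the one misclassified sample holds half of the total weight, which is exactly the "more weight to those classified worst" behaviour the slide describes.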
77. ØGiven a group of items
ØMovies, books, ...
ØYou have user ratings
Ø1-5 stars, 1-10, ...
ØThey can be used to recommend items to a user based on
other people's scores
ØFor this reason it is called collaborative filtering
MACHINE LEARNING – COLLABORATIVE FILTERING
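A minimal user-based sketch of the idea above: predict a user's rating for an unseen item from her most similar user. The users, movies and ratings are invented, and the similarity function (inverse mean rating difference on shared items) is a deliberately simple stand-in for the cosine or Pearson measures real systems use.

```python
# Hypothetical 1-5 star ratings; carol has not seen "titanic"
ratings = {
    "alice": {"matrix": 5, "titanic": 1, "inception": 5},
    "bob":   {"matrix": 1, "titanic": 5, "inception": 2},
    "carol": {"matrix": 5, "inception": 4},
}

def similarity(u, v):
    # Inverse of the mean absolute rating difference on shared items
    common = set(ratings[u]) & set(ratings[v])
    diff = sum(abs(ratings[u][i] - ratings[v][i]) for i in common)
    return 1 / (1 + diff / len(common))

# Carol's most similar user lends her ratings for carol's unseen items
best = max(["alice", "bob"], key=lambda u: similarity(u, "carol"))
unseen = set(ratings[best]) - set(ratings["carol"])
prediction = {item: ratings[best][item] for item in unseen}
print(best, prediction)  # alice {'titanic': 1}
```

Since alice (who rates like carol) gave "titanic" one star, the system would predict carol will dislike it too; production systems average over many neighbours instead of copying a single one.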
78. ØA theorem for combining probabilities
ØObservation A says that H is true with probability
70%
ØObservation B says that H is true with probability
85%
ØWhat can I conclude?
ØBayes' theorem
ØWith the assumption that A and B are independent
ØThis assumption is almost always false, hence "naive"
MACHINE LEARNING – NAÏVE BAYES
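The two observations on the slide can actually be combined in one line. Assuming the observations are independent and the prior on H is uniform (both simplifying assumptions, the first being exactly the "naive" one), Bayes' theorem reduces to multiplying the odds:

```python
# Combining two independent observations about hypothesis H
p_a = 0.70   # observation A: P(H) = 70%
p_b = 0.85   # observation B: P(H) = 85%

# With a uniform prior, posterior probabilities combine as a
# product of odds in favour of H over odds against H.
combined = (p_a * p_b) / (p_a * p_b + (1 - p_a) * (1 - p_b))
print(round(combined, 3))  # 0.93
```

Two moderately confident independent observations yield a much more confident combined belief (about 93%), which is why Naïve Bayes works surprisingly well even when the independence assumption is false.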
79. ØWe have a set of numeric values for an object
ØWe want to use these values to predict a new value
ØExamples:
ØEstimating house prices
ØPredicting a rating for an object
Ø...
MACHINE LEARNING – LINEAR REGRESSION
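The house-price example above fits in a few lines with the closed-form least-squares solution for a single feature. The sizes and prices are invented (and deliberately lie exactly on a line so the fit is easy to check by eye).

```python
# Hypothetical training data: house size (m^2) vs price (k euro)
xs = [50, 70, 100, 120]
ys = [150, 210, 300, 360]   # exactly 3 * size, for illustration

# Closed-form least squares for one feature:
# slope = cov(x, y) / var(x), intercept = mean_y - slope * mean_x
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

def predict(x):
    return slope * x + intercept

print(predict(80))  # 240.0 on this exactly linear data
```

With more features the same idea is solved as a matrix equation (the normal equations), but the one-feature case already shows the supervised pattern: fit on known pairs, then predict a new value.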