This document provides an introduction to building machine learning models using IBM Data Science Experience. It first discusses data science and machine learning concepts like the CRISP-DM methodology and neural networks. It then introduces IBM Data Science Experience, describing how it allows users to work with big data on the cloud using Python or R on Spark. The document concludes by introducing TensorFlow and providing an overview of key TensorFlow concepts like tensors, data flow graphs, and how neural networks and deep learning are represented.
1. Building Your First Machine Learning Model
With IBM Data Science Experience
By Aoun Lutfi and Kunal Malhotra
IBM Cloud Developer Advocates
alutfi@ae.ibm.com, kunal.malhotra1@ibm.com
2. Agenda
1. Introduction to Data Science
2. Introduction to IBM Data Science Experience
3. Introduction to TensorFlow
4. Hands-On
IBM Confidential
3.
We are surrounded by, and are constantly creating, digital data. Whether it's in the emails we write, the photos we take, or where we drive, almost everything creates data today. Data Science is the discipline of acquiring all this data, finding insights in it, and sharing the discoveries.
12. Training – Backward Propagation
1. Initialize the weights and biases randomly.
2. Fix the input and output.
3. Forward-pass the inputs and calculate the cost.
4. Compute the gradients and errors.
5. Backpropagate and adjust the weights and biases accordingly.
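The steps above can be sketched for a tiny one-hidden-layer network. This is a minimal NumPy illustration; the layer sizes, the XOR toy task, and the learning rate are arbitrary choices for the example, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Step 1: initialize the weights and biases randomly
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)

# Step 2: fix the input and output (XOR as a toy task)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

initial_cost = np.mean((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - y) ** 2)

lr = 0.5
for _ in range(5000):
    # Step 3: forward-pass the inputs, then calculate the cost (MSE)
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    cost = np.mean((out - y) ** 2)

    # Step 4: compute the gradients and errors (chain rule;
    # constant factors are absorbed into the learning rate)
    d_out = (out - y) * out * (1 - out)   # error at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)    # error propagated backwards

    # Step 5: backpropagate - adjust the weights and biases accordingly
    W2 -= lr * h.T @ d_out / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X)
    b1 -= lr * d_h.mean(axis=0)

print(f"cost: {initial_cost:.3f} -> {cost:.3f}")
```

Running the loop drives the cost down from its random-initialization value, which is the whole point of steps 3–5 repeated many times.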
15. Data Science Experience
Data Science Experience offers the opportunity to work with big data on the cloud. Use Python or R on
Spark to process big data, build models, and deploy models. Data Science Experience allows you to
easily collaborate on descriptive, prescriptive, predictive analytics, and Machine Learning on the cloud.
19. TensorFlow
Originally developed by the Google Brain Team within Google's Machine Intelligence research organisation.
TensorFlow provides primitives for defining functions on tensors and automatically computing their derivatives.
An open source software library for numerical computation using data flow graphs.
20. Tensor?
Simply put: tensors can be viewed as multidimensional arrays of numbers. This means that:
• A scalar is a tensor,
• A vector is a tensor,
• A matrix is a tensor,
• ...
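As a quick illustration of this hierarchy (using NumPy arrays, which mirror how TensorFlow represents tensors), the rank of each object is simply its number of axes:

```python
import numpy as np

# A tensor is just an n-dimensional array; rank (ndim) is the number of axes.
scalar = np.array(3.0)                    # rank 0
vector = np.array([1.0, 2.0, 3.0])        # rank 1
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])           # rank 2
cube   = np.zeros((2, 3, 4))              # rank 3

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(f"{name}: rank {t.ndim}, shape {t.shape}")
```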
21. Data Flow Graph?
Computations are represented as graphs:
• Nodes are the operations (ops)
• Edges are the Tensors (multidimensional arrays)
A typical program consists of 2 phases:
• Construction phase: assembling a graph (model)
• Execution phase: pushing data through the graph
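A toy sketch of the two phases, written in plain Python rather than TensorFlow so the graph machinery is visible (the `Node` class, the `run` helper, and the op names are invented for this illustration):

```python
# Toy data-flow graph: nodes are operations, edges carry values (tensors).

class Node:
    def __init__(self, op, inputs=()):
        self.op, self.inputs = op, inputs

def run(node, feed):
    """Execution phase: push data through the graph by evaluating nodes."""
    if node.op == "placeholder":
        return feed[node]
    vals = [run(n, feed) for n in node.inputs]
    if node.op == "add":
        return vals[0] + vals[1]
    if node.op == "mul":
        return vals[0] * vals[1]
    raise ValueError(f"unknown op: {node.op}")

# Construction phase: assemble the graph total = (a + b) * c.
# No arithmetic happens here; we only build the structure.
a = Node("placeholder")
b = Node("placeholder")
c = Node("placeholder")
total = Node("mul", (Node("add", (a, b)), c))

# Execution phase: feed concrete values and evaluate.
print(run(total, {a: 2.0, b: 3.0, c: 4.0}))  # prints 20.0
```

In TensorFlow the same split appears as building ops on a graph first, then running them with data fed in.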
22. Neural Networks? Deep Learning?
● Neural Networks are represented by the lower figure, not the top one...
● Link: Tinker with a Neural Network in Your Browser
23.–26. (Image slides)
Source: https://www.udacity.com/course/deep-learning--ud730
27. Why would you use NN / Deep Learning?
• Neural Networks (NNs) are universal function approximators that work very well with huge datasets
• NNs / deep networks do unsupervised feature learning
• Strong track record, being state of the art (SotA) in:
  • image classification,
  • language processing,
  • speech recognition,
  • ...
28. Why TensorFlow?
There are a lot of alternatives:
● Torch
● Caffe
● Theano (Keras, Lasagne)
● cuDNN
● MXNet
● DSSTNE
● DL4J
● DIANNE
● Etc.
29. TensorFlow has the largest community
Sources: http://deliprao.com/archives/168
http://www.slideshare.net/JenAman/large-scale-deep-learning-with-tensorflow
30. TensorFlow is very portable/scalable
Runs on CPUs, GPUs, and TPUs over one or more machines, but also on phones (Android and iOS) and Raspberry Pis...
31. TensorFlow is more than an R&D project
• Specific functionalities for deployment (TF Serving / Cloud ML)
• Easier / more documentation (for a more general public)
• Included visualization tool (TensorBoard)
• Simplified interfaces like SKFlow
32. Hands On Lab
Building your first Machine Learning model on IBM Data Science Experience.
Sign in to IBM Cloud on: ibm.biz/Intro2MLonDSX
Access Data Science Experience on: datascience.ibm.com
GitHub Link: github.com/aounlutfi/building-first-ML-model
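As a self-contained warm-up before the lab, a "first model" can be fit locally in a few lines. This sketch uses ordinary least-squares linear regression on synthetic data; it is illustrative only and not part of the lab materials:

```python
import numpy as np

# Synthetic data: y = 3x + 2 plus Gaussian noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.5, size=100)

# Add a bias column and solve the least-squares problem for [slope, intercept].
Xb = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

print(f"slope ~= {w[0]:.2f}, intercept ~= {w[1]:.2f}")
```

The lab follows the same train-a-model-from-data shape, but on DSX with notebooks, Spark, and real datasets.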
Any good data science talk should start with the CRISP-DM model. Here you see the different hats a data scientist is asked to wear: you are expected to be a Business Analyst, a Data Engineer, a Data Scientist, and an App Developer.
When asked to describe data science beyond the abstract concept, most professionals in the field point to CRISP-DM, the Cross-Industry Standard Process for Data Mining.
It’s considered the “de facto standard for developing data mining and knowledge discovery projects” and breaks data science down into 6 phases.
Business Understanding: This initial phase focuses on understanding the project objectives and requirements from a business perspective, and then converting this knowledge into a data mining problem definition.
Data Understanding: The data understanding phase starts with an initial data collection and proceeds with activities in order to get familiar with the data, to identify data quality problems, to discover first insights into the data, or to detect interesting subsets to form hypotheses for hidden information.
Data Preparation (Data Science Experience): The data preparation phase covers all activities to construct the final dataset (data that will be fed into the modeling tool(s)) from the initial raw data. Data preparation tasks are likely to be performed multiple times, and not in any prescribed order. Tasks include table, record, and attribute selection as well as transformation and cleaning of data for modeling tools.
Modeling (Data Science Experience): In this phase, various modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Typically, there are several techniques for the same data mining problem type. Some techniques have specific requirements on the form of data. Therefore, stepping back to the data preparation phase is often needed.
Evaluation (Data Science Experience): At this stage in the project you have built a model (or models) that appears to have high quality from a data analysis perspective. Before proceeding to final deployment of the model, it is important to more thoroughly evaluate the model, and review the steps executed to construct it, to be certain it properly achieves the business objectives. A key objective is to determine if there is some important business issue that has not been sufficiently considered. At the end of this phase, a decision on the use of the data mining results should be reached.
Deployment (Watson Machine Learning): Creation of the model is generally not the end of the project. Even if the purpose of the model is to increase knowledge of the data, the knowledge gained will need to be organized and presented in a way that is useful to the customer. Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data scoring (e.g. segment allocation) or data mining process. In many cases it will be the customer, not the data analyst, who will carry out the deployment steps. Even if the analyst deploys the model, it is important for the customer to understand up front the actions which will need to be carried out in order to actually make use of the created models.
Look for audience participation here. Ask what tools they use in the different phases of the CRISP-DM Process
I like to think about all the tools and collaboration required to complete this cycle:
What tools are used by a business analyst with an understanding of the domain and objectives
What tools are used to initially explore the data for better understanding?
Where do your data sources intersect during the data preparation stage?
What tools do you use when building a pipeline and training your model?
How and where is your model evaluated?
At the final point when you believe the model is ready for production, how and where is it deployed, and how will it be consumed?