This presentation discusses key topics in data science and cloud computing including:
1. Data storage and processing resources in the cloud that help simplify and reduce costs of data science projects.
2. Machine learning services that provide automated algorithms and managed infrastructure as a service.
3. How cloud computing helps data science practitioners by simplifying access to resources and tools for tasks like data storage, processing, and applying machine learning models in applications through API services.
Data science and cloud computing
1. Data Science & Cloud Computing (Hands-on hack session)
2. JITHENDRA BALAKRISHNAN
Technical Leader, Cloud Product Solutions
Head of Technology, 47Line Technologies
@jitcompile | /jithendrabalakrishnan
DISCLAIMER 1: All copyrights and trademarks of images belong to their respective IP owners and are used under Fair Use for educational purposes.
DISCLAIMER 2: The opinions expressed in this presentation are my own views and not those of my employer.
3. AGENDA: Data Science | Cloud Computing | Storage | Compute | Learning | Hands-on Hack
4. Harvard Business Review: “Data Scientist: The Sexiest Job of the 21st Century”
5. Data Science Process
6. Paul Maritz, Pivotal: “Cloud is about how you do computing, not where you do computing”
7. CLOUD COMPUTING SERVICES: Storage | Compute | Learning
8. AMAZON WEB SERVICES
9. W. Edwards Deming, Scholar & Teacher: “In God we trust. All others must bring data”
11. DATA IS THE NEW OIL: Volume | Velocity | Variety | Value
12. STORAGE SERVICES
- AMAZON S3: Object storage to store and retrieve any amount of data from anywhere.
- AMAZON REDSHIFT: Fully managed petabyte-scale data warehouse.
- AMAZON NEPTUNE: Fully managed graph database engine.
- AMAZON RDS: Fully managed relational database service.
- AMAZON DYNAMODB: Fast and flexible NoSQL database service.
- AMAZON ELASTICACHE: Managed Redis and Memcached as a service.
- AMAZON AURORA: Fully managed MySQL- and PostgreSQL-compatible cloud database.
- AMAZON GLACIER: Secure, durable and low-cost data archival and long-term backup service.
- AMAZON SIMPLEDB: Highly available, secure and inexpensive NoSQL data store.
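To make the storage layer concrete, here is a minimal sketch of moving a day's training data in and out of S3 with Python's boto3; the bucket and key names are hypothetical placeholders, and AWS credentials are assumed to be configured in the environment.

```python
import boto3

# The S3 client picks up credentials from the environment (assumed configured).
s3 = boto3.client("s3")

# Upload one day's training file to a bucket (bucket/key names are hypothetical).
s3.upload_file("spot_prices_2018-05-01.csv",
               "my-training-data", "spot/2018-05-01.csv")

# Pull it back down later, e.g. on the machine that trains the model.
s3.download_file("my-training-data", "spot/2018-05-01.csv",
                 "/tmp/spot_prices.csv")
```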
13. Peter Norvig, Google Research: “More data beats clever algorithms, but better data beats more data.”
14. SCALABLE PROCESSING: Elasticity | Scalability | Cost
15. COMPUTE SERVICES
- EC2: Secure, resizable elastic compute capacity in the cloud.
- EMR: Managed Hadoop framework for easy, fast and cost-effective clusters that process large amounts of data.
- ATHENA: Interactive SQL query service to analyze data in S3.
- GLUE: Fully managed ETL service to prepare and load data for analytics.
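To illustrate the serverless-query idea behind Athena, here is a hedged sketch of running a query from Python with boto3; the database name, table and result bucket are hypothetical, and the table is assumed to have already been defined over data in S3.

```python
import time

import boto3

athena = boto3.client("athena")

# Start a query over data sitting in S3 (database/table/output are hypothetical).
qid = athena.start_query_execution(
    QueryString="SELECT instance_type, avg(price) AS avg_price "
                "FROM spot_prices GROUP BY instance_type",
    QueryExecutionContext={"Database": "pricing"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)["QueryExecutionId"]

# Athena runs asynchronously: poll until the query reaches a terminal state.
while True:
    status = athena.get_query_execution(QueryExecutionId=qid)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

# Print the result rows (the first row is the header).
if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=qid)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```

No cluster is provisioned here; Athena charges per query over the data scanned, which is what makes it attractive for ad-hoc analysis.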
16. COST OPTIONS
- SPOT INSTANCES: Spare AWS capacity available at up to 90% discount. Recommended for stateless, low-cost and flexibly timed applications.
- RESERVED INSTANCES: Up to 75% discount on committed usage over a 1- or 3-year period. Recommended for steady-state and planned capacity needs.
- SPOT BLOCK: Spare AWS capacity available at up to 40% discount on committed usage of 6 hours. Recommended for low-cost, low-risk, known-duration workloads.
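As an example of how these options surface in the API, a one-off Spot request can be made through EC2; a minimal sketch with boto3, where the AMI ID, instance type and bid price are hypothetical placeholders (the optional BlockDurationMinutes parameter turns the request into a Spot block).

```python
import boto3

ec2 = boto3.client("ec2")

# Request one Spot instance (AMI ID and price are hypothetical placeholders).
resp = ec2.request_spot_instances(
    SpotPrice="0.10",          # maximum hourly price we are willing to pay (USD)
    InstanceCount=1,
    LaunchSpecification={
        "ImageId": "ami-12345678",
        "InstanceType": "m4.large",
    },
    BlockDurationMinutes=360,  # optional: fixed 6-hour Spot block
)
print(resp["SpotInstanceRequests"][0]["SpotInstanceRequestId"])
```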
17. Andrew Ng, Chairman, Coursera: “Artificial Intelligence is the new Electricity”
18. (image slide)
19. MACHINE LEARNING AS A SERVICE
- Machine Learning for everyone
- API-driven ML services
- GPU instances
- Powerful compute
- FPGA hardware acceleration
20. (image slide)
21. (image slide)
22. SUMMARY
1. DATA SCIENCE: An inter-disciplinary field that involves the entire technology organization.
2. CLOUD COMPUTING: Helps data science practitioners by simplifying usage of resources and tools.
3. DATA STORAGE: Data is collected at volume, and a clear storage plan helps in reducing costs.
4. DATA PROCESSING: Cheap compute resources help in cleaning and extracting value from data.
5. MACHINE LEARNING: Automated algorithms available as a service with managed infrastructure.
6. MODEL USAGE: API services to apply machine learning models in real-world applications.
Editor's Notes
Understand audience distribution
Set the context
Basics of Data Science
How Cloud Computing helps in doing Data Science
Introduction
Explain cmpute.io & the data science work done there
Cisco acquisition of cmpute.io
Cisco Disclaimer
Image Fair Use Disclaimer
Agenda for the workshop
What data science process looks like
How cloud computing has changed the way things are done today
Storage concepts from data science perspective
Compute specific services for data science
ML & Deep Learning
Explain one problem of cmpute.io & walk through how it was resolved
Joke: LinkedIn profiles listing “Data Science” as a core skill increased after this article
Data Science history
Costly and difficult skill
Very niche and not available everywhere
Increase in data storage increased the need to find value in the data
Data Science is a must-have skill in today’s information age
Explain the Process
Continuous Learning model – Similarities to cmpute.io bid model
Cloud brings the best processes into the organization
Design for failure
Unlimited Scale
Data Science specific topics in Cloud
Storage – Store information
Compute – Clean and Process information
Learning – Ready to use services for AI & Deep Learning
Showcasing AWS to demonstrate Cloud Computing
Early innovator in Cloud space
Has multiple choices of Services for each of the previous areas
Fit for Beginners to Expert level
Presenter is familiar with this cloud
Data is the starting point for all analysis
Collect as much as you can
Collect in native forms and then transpose them for analysis
Companies now need varied storage choices
Structured – Traditional Relational Storage - SQL
Unstructured – Modern Storage – NOSQL
Graph – Significant focus on Relationships – Social information
Time Series – Streaming data – Metrics
Data is classified based on origin and scale
Variety
Twitter feed is saved to MongoDB
Website form information is saved to an RDBMS
Velocity
Downstream mainframes which drop files once a day
Twitter sending an unending stream of support requests to the company’s social media handle
Volume
IoT devices sending many metrics every second
Leave Management System receiving a few requests per day
Value
Finding value in all information is the goal of Data Science
Collecting data is important
Processing the collected data to make meaningful training sets is primary
Computers work on the principle of GIGO
Garbage In Garbage Out
Gold In Gold Out
Cloud Computing solves 3 important needs of data science
Elasticity
Scale up and down based on your needs
Scalability
Aim for any size cluster and cloud makes it available
Cost
Cost conscious computing choices available based on needs
Basic services for processing and querying large data sets
EC2
Write processing and scale based on your own framework
EMR
Process and scale on top of Hadoop, Pig, Hive models
Glue
Managed ETL without any code
Athena
Query data directly without any servers
Available cost choices
Reserved
For predictable workloads
Spot Block
For checkpoint-based, time-limited workloads
Spot
For interrupt-tolerant processing
Machine Learning became a widely discussed topic due to the free AI course on Coursera.
Machine Learning, a niche skill, has become a common skill due to commoditization and open source
Ready-to-use Machine Learning services and comparison with other clouds.
TensorFlow is backed by Google
Gluon is a deep learning project backed by Amazon & Microsoft
MXNet is open source and supported by Amazon
Amazon ML has limited choices
Built for beginners
Proprietary engine
Supports only 3 algorithms – Binary Classification, Multiclass Classification and Regression
Amazon SageMaker has both ready-made algorithms and support for custom algorithms
Built for data scientists
Uses TensorFlow and MXNet
Azure and GCP have much more advanced support for ML, AI and Deep Learning
Cognitive services are outside the scope of this presentation
We are focused only on data
Speech, image and other recognition services are considered cognitive
All clouds offer ready-to-use services which have advanced automation and are available over an API
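To show what “available over an API” means in practice, here is a hedged sketch of querying an Amazon ML real-time endpoint with boto3, assuming a regression model has already been trained and deployed; the model ID, endpoint URL and record fields are hypothetical.

```python
import boto3

ml = boto3.client("machinelearning")

# Call a deployed real-time regression endpoint (IDs and fields are hypothetical).
resp = ml.predict(
    MLModelId="ml-spot-price-model",
    Record={
        "region": "us-east-1",
        "availability_zone": "us-east-1a",
        "instance_type": "m4.large",
    },
    PredictEndpoint="https://realtime.machinelearning.us-east-1.amazonaws.com",
)
print(resp["Prediction"]["predictedValue"])  # regression output
```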
Cmpute.io Initial days
What we did
How we did it
Issues we faced
Why we turned to data science
Need for predictions
Need for classification
How we went about solving our problems
Explain flowchart
Demo Problem
Predict spot prices using historical data
Disclaimer: Cmpute.io used multiple sources of data and not just historical information
A simple Real time Spot prediction System
Infrastructure
Amazon RDS Aurora – Store information
AWS Fargate – Scheduled Container execution
AWS S3 – Training data storage
AWS ML – Machine Learning model and evaluation
AWS Api Gateway – REST Service
AWS Lambda – Actual functions for API
React – Front end
Background Services
Spot fetcher – Fetch prices every 5 minutes (see the sketch at the end of this walkthrough)
Training data – Convert daily data into training data every day – 1 file per day
Machine Learning
Create Data source from S3 Training Data
Create Model using Regression
Create Evaluation using Model
Create Real time API for evaluation
API
Get Current Prices – Fetch information from the AWS API and save it to the database
Get Prediction – Call AWS ML Real time prediction API
Front end
Simple grid that shows the matrix of region, availability zone, instance type, platform, current price and predicted price
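As a sketch of the “Spot fetcher” background service described above, the EC2 API exposes Spot price history directly; this minimal example (the region and instance-type filters are hypothetical) pulls the last five minutes of prices the way a scheduled job might before writing them to the Aurora table.

```python
from datetime import datetime, timedelta

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Fetch the last 5 minutes of Spot price history (filters are hypothetical).
history = ec2.describe_spot_price_history(
    StartTime=datetime.utcnow() - timedelta(minutes=5),
    InstanceTypes=["m4.large"],
    ProductDescriptions=["Linux/UNIX"],
)["SpotPriceHistory"]

for point in history:
    # A real fetcher job would INSERT these rows into the database.
    print(point["AvailabilityZone"], point["InstanceType"],
          point["SpotPrice"], point["Timestamp"])
```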
Data science is an inter-disciplinary process that involves the entire organization
Cloud computing is here to stay and offers significant advances to the data science process
Storage management solutions allow any type of data and are built for volume, variety and velocity
Cleaning and extraction bring out the value in data
Democratization of AI has made data processing easy
API-based models help in real-world usage