JavaOne 2016: Getting Started with Apache Spark: Use Scala, Java, Python, or ... - David Taieb
Apache Spark is the next-generation distributed computing framework, rapidly becoming the de facto standard for big data analytics. It provides rich, expressive APIs in multiple languages, including Scala, Java, Python, and R. However, depending on the use case—a data scientist working in a Jupyter Notebook or a data engineer implementing long-running Spark submit jobs—choosing the right language can be a dilemma. This session uses a Spark application that performs “sentiment analysis of Twitter data” to compare and contrast the feature differences between the languages, API coverage, and overall productivity. With concrete examples, it provides insight to help you decide when to use Scala, Java, Python, or perhaps a mix of these.
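The session's actual demo is not reproduced here, but a minimal lexicon-based sentiment scorer of the kind such a Twitter demo might start from (the word lists below are illustrative, not the session's model) looks like this in Python:

```python
# Tiny lexicon-based sentiment scorer (illustrative word lists, not the
# session's actual model). A Spark job would apply this per-tweet via map().
POSITIVE = {"great", "love", "happy", "good"}
NEGATIVE = {"bad", "hate", "sad", "awful"}

def sentiment(tweet: str) -> int:
    """Return +n for net-positive wording, -n for net-negative, 0 for neutral."""
    words = tweet.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment("I love this great talk"))   # 2
print(sentiment("bad and awful weather"))    # -2
```

The same scoring function could be written nearly verbatim in Scala or Java, which is exactly the kind of cross-language comparison the session explores.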
Guest lecture of the Bioinformatics Institute. More details: http://bioinformaticsinstitute.ru/lectures/1218
Despite its lighthearted title, the lecture addresses an important problem in the work of a bioinformatician, almost every real task of whom involves processing and analyzing large volumes of data. The task must be solved not only correctly but also efficiently. The solution process can be roughly divided into two parts: "figuring out" how to solve it and "teaching" the computer to do so. The lecture focuses precisely on efficient "teaching".
Naively implemented algorithms run unacceptably slowly once gigabytes of real data are involved. A bioinformatician now needs not just basic programming skills but also knowledge of technical nuances. Even a professional programmer will need considerable time, for example, to make good use of Hadoop's capabilities when working with Big Data. So can a modern scientist avoid painstakingly studying a pile of languages, libraries, and frameworks, and focus on the solution itself?
Analyze Twitter data completely in Bluemix. Collect data, add sentiment, copy to in-memory database, analyze with R or WatsonAnalytics. All in the cloud.
Is it harder to find a taxi when it is raining? Wilfried Hoge
Using open data to answer the question if it is harder to find a taxi, when it is raining. Live demo of analyzing taxi data with DashDB, R, and Bluemix.
Presented on data2day conference.
GDG Heraklion - Architecting for the Google Cloud Platform - Márton Kodok
Learn about cloud components, architecture overviews to build an app using GCP components.
You will get hands-on information on how to build highly scalable and flexible applications optimized to run in GCP on the same infrastructure that powers Google. We will discuss cloud concepts and highlight various design patterns and best practices.
By the end of the session you will have hands-on experience building a basic cloud application: a simple web tier powered by a highly distributed database, with background tasks executed on a pub/sub system. You will also learn how to go to the next level with advanced concepts like analytics warehouses, recommendation engines, and ML.
Getting started with Google Cloud Training Material - 2018 - JK Baseer
Explore and learn!
Note: This share is intended to help people learn about Google Cloud solutions. Neither I nor the company I am associated with have any other motive.
GDG DevFest Romania - Architecting for the Google Cloud Platform - Márton Kodok
Learn about FaaS and PaaS architectural patterns that make use of Cloud Functions, Pub/Sub, Dataflow, and Kubernetes: platforms that hide the management of servers from the user and have changed how we develop and deploy software.
We discuss the difference between an event-driven approach, which lets you trigger a function whenever something interesting happens within the cloud environment, and the simpler HTTP approach; per-invocation quotas and pricing; and the advantages and disadvantages of serverless systems.
Presto + Alluxio on Steroids: A Romantic Drama in Production with a Happy End - Alluxio, Inc.
Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019
Presto + Alluxio on Steroids: A Romantic Drama in Production with a Happy End
Speaker:
Danny Linden, Ryte
For more Alluxio events: https://www.alluxio.io/events/
Google Cloud Platform itself has been on a very rapid rise over the past few years. It has a lot of advantages over AWS or Microsoft Azure. In this slideshow, you can learn more about these top advantages. For more details, you can also read this post https://kinsta.com/blog/google-cloud-hosting/
Built on the same infrastructure that allows Google to return billions of search results in milliseconds, serve 6 billion hours of YouTube video per month and provide storage for 680 million Gmail users, Google Cloud Platform enables developers to build, test and deploy applications on Google’s highly scalable and reliable infrastructure. Whether you use Google Deployment Manager, Ansible, Chef, Puppet, or Salt, you can now virtually automate everything!
Google Cloud Platform for the Enterprise - VMware Tanzu
SpringOne Platform 2016
Speakers: Jay Marshall; Principal Strategic Advisor, Google. Vic Iglesias; Solutions Architect, Google.
Whether you are running Spring Apps on Tomcat or Spring Boot on Cloud Foundry, Google Cloud Platform allows you to deploy all of your applications on the same global infrastructure that allows Google to return billions of search results in milliseconds, serve six billion hours of YouTube video per month, and provide storage for almost a billion Gmail users. Join the Google team as they illustrate how Google's cloud was built for the enterprise.
Google Cloud Connect @ Korea
- Google Cloud Vision
- G Suite Product Roadmap
- Google Cloud Security
- Google Cloud Machine Learning
- G Suite Customer Stories
Scale with a smile with Google Cloud Platform at DevConTLV (June 2014) - Ido Green
What is new and hot on Google Cloud?
How can you work like a pro with some (or all) the new APIs and services... Here are some good starting points to follow.
Introduction to Google Cloud Services / Platforms - Nilanchal
The presentation provides a brief introduction to Google Cloud services and platforms. In the course of these slides, we introduce the different Google Cloud computing options: Compute Engine, App Engine, Cloud Functions, databases, file storage, and the security features of Google Cloud Platform.
Open data is available from an incredible number of data sources that can be linked to your own datasets. This talk shows how to visualise and combine data with Python notebooks in the Cloud. Examples include data from different sources such as weather and climate, statistics collected by individual countries and Open Street Map data.
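As a flavor of what combining such sources looks like (the datasets below are made-up stand-ins for, say, weather observations and country statistics), joining two open datasets on a shared key can be sketched in plain Python:

```python
# Two toy open datasets keyed by country code (stand-ins for real sources
# such as weather archives and national statistics offices).
weather = {"DE": {"avg_temp_c": 9.6}, "FR": {"avg_temp_c": 11.5}}
population = {"DE": {"millions": 83.2}, "FR": {"millions": 67.8}}

# Inner-join the two sources on their shared country-code key.
combined = {
    code: {**weather[code], **population[code]}
    for code in weather.keys() & population.keys()
}
print(combined["DE"])  # {'avg_temp_c': 9.6, 'millions': 83.2}
```

In a notebook, the same join would typically be done with pandas `merge` on real downloaded files, but the principle of linking datasets through a shared key is the same.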
Apache Spark is an open-source framework developed by the AMPLab at the University of California, Berkeley, and subsequently donated to the Apache Software Foundation. Unlike Hadoop's MapReduce paradigm, which is based on two-level disk storage, Spark's in-memory primitives allow performance up to 100 times better.
Lightning talk covering various aspects of software system performance. It goes through latency, data structures, garbage collection, troubleshooting methods such as the workload saturation method, quick diagnostic tools, flame graphs, and PerfView.
This talk discusses Spark (http://spark.apache.org), the Big Data computation system that is emerging as a replacement for MapReduce in Hadoop systems, while it also runs outside of Hadoop. I discuss the issues that make it necessary to replace MapReduce and how Spark addresses them with better performance and a more powerful API.
Introduction to Apache Spark. With an emphasis on the RDD API, Spark SQL (DataFrame and Dataset API) and Spark Streaming.
Presented at the Desert Code Camp:
http://oct2016.desertcodecamp.com/sessions/all
How Spark is Enabling the New Wave of Converged Applications - MapR Technologies
Apache Spark has become the de-facto compute engine of choice for data engineers, developers, and data scientists because of its ability to run multiple analytic workloads with a single compute engine. Spark is speeding up data pipeline development, enabling richer predictive analytics, and bringing a new class of applications to market.
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming - Paco Nathan
London Spark Meetup 2014-11-11 @Skimlinks
http://www.meetup.com/Spark-London/events/217362972/
To paraphrase the immortal crooner Don Ho: "Tiny Batches, in the wine, make me happy, make me feel fine." http://youtu.be/mlCiDEXuxxA
Apache Spark provides support for streaming use cases, such as real-time analytics on log files, by leveraging a model called discretized streams (D-Streams). These "micro batch" computations operate on small time intervals, generally from 500 milliseconds up. One major innovation of Spark Streaming is that it leverages a unified engine. In other words, the same business logic can be used across multiple use cases: streaming, but also interactive, iterative, machine learning, etc.
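The micro-batch idea can be illustrated in plain Python (the 500 ms interval matches the abstract; the event stream is made up, and Spark's actual D-Stream machinery is of course far richer):

```python
# Group timestamped events into fixed 500 ms micro-batches, the core idea
# behind D-Streams: the stream becomes a sequence of small batch jobs.
def micro_batches(events, interval_ms=500):
    """events: (timestamp_ms, payload) pairs. Returns batches in time order."""
    batches = {}
    for ts, payload in events:
        batches.setdefault(ts // interval_ms, []).append(payload)
    return [batches[k] for k in sorted(batches)]

stream = [(0, "a"), (120, "b"), (510, "c"), (990, "d"), (1200, "e")]
print(micro_batches(stream))  # [['a', 'b'], ['c', 'd'], ['e']]
```

Each batch can then be handed to the same engine that runs ordinary batch jobs, which is what lets one body of business logic serve both streaming and interactive workloads.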
This talk will compare case studies for production deployments of Spark Streaming, emerging design patterns for integration with popular complementary OSS frameworks, plus some of the more advanced features such as approximation algorithms, and take a look at what's ahead — including the new Python support for Spark Streaming that will be in the upcoming 1.2 release.
Also, let's chat a bit about the new Databricks + O'Reilly developer certification for Apache Spark…
The Jupyter Notebook has become the de facto platform used by data scientists and AI engineers to build interactive applications and develop their AI/ML models. In this scenario, it’s very common to decompose various phases of the development into multiple notebooks to simplify the development and management of the model lifecycle.
Luciano Resende details how to schedule these multiple notebooks, which correspond to different phases of the model lifecycle, into notebook-based AI pipelines and walks you through scenarios that demonstrate how to reuse notebooks via parameterization.
Getting to the Next Level with Eclipse Concierge - Jan Rellermeyer + Tim Verb... - mfrancis
OSGi Community Event 2016 Presentation by Jan Rellermeyer (IBM), Tim Verbelen (imec) & Jochen Hiller (Deutsche Telekom AG)
Eclipse Concierge provides a clean, small and lightweight implementation of the OSGi core framework specification, specifically tailored to embedded systems and IoT. In this talk, we will cover how to use and deploy the Concierge OSGi framework (e.g. using OSGi enRoute), and discuss many of the new and upcoming features in the Concierge project such as the OSGi REST interface and Cloud Ecosystems reference implementations. We will also present our work in progress on implementing the OSGi R6 core specification level and novel demonstrations that illustrate the advantages of having a lean and streamlined OSGi implementation to deal with deployment and dynamism in IoT applications.
How to deploy machine learning models into production - DataWorks Summit
Data scientists spend a lot of time on data cleaning and munging, so that they can finally start with the fun part of their job: building models. After you have engineered the features and tested different models, you see how the prediction performance improves. However, the job is not done when you have a high performing model. The deployment of your models is a crucial step in the overall workflow and it is the point in time when your models actually become useful to your company.
In this session you will learn about various possibilities and best practices for bringing machine learning models into production environments. The goal is not only to make live prediction calls or expose models as a REST API, but also to cover what needs to be considered to maintain them. This talk will focus on solutions with Python (Flask, Cloud Foundry, Docker, and more) and well-established ML packages such as Spark MLlib, scikit-learn, and XGBoost, but the concepts can be easily transferred to other languages and frameworks.
Speaker
Sumit Goyal, IBM, Software Engineer
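The serving pattern discussed in this session can be sketched with nothing but the standard library (the model class and weights below are hypothetical stand-ins for a trained scikit-learn or MLlib model; a real deployment would wrap the handler in a Flask route):

```python
import json
import pickle

# A stand-in "model": real deployments would persist a trained scikit-learn
# or Spark MLlib model instead of this hypothetical linear model.
class LinearModel:
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias

# Training side: persist the fitted model as a deployable artifact.
artifact = pickle.dumps(LinearModel([0.5, -0.2], 1.0))

# Serving side: load the artifact once at startup, then answer JSON requests.
model = pickle.loads(artifact)

def predict_endpoint(request_body: str) -> str:
    """What a Flask route handler would do: parse JSON, predict, return JSON."""
    features = json.loads(request_body)["features"]
    return json.dumps({"prediction": model.predict(features)})

print(predict_endpoint('{"features": [2.0, 5.0]}'))  # {"prediction": 1.0}
```

Separating the artifact from the serving code is the key design choice: it lets you version, roll back, and monitor models independently of the API layer, which is the maintenance concern the session emphasizes.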
Running Spark In Production in the Cloud is Not Easy with Nayur Khan - Databricks
Apache Spark is the engine powering many data-driven use cases, from data engineering to data science and machine learning applications. At QuantumBlack, Spark is considered a key technology and used in a number of client engagements, from a data engineering, data science, and platform engineering point of view. This talk covers the lessons learned from successfully running Apache Spark workloads in production in the cloud for a number of years. As public cloud adoption grows in the enterprise, more and more organizations are choosing to run Apache Spark workloads on cloud infrastructure. While the cloud presents many benefits, there are a number of challenges that aren't obvious until you start and sometimes require different approaches or thinking.
This talk will look into a few different areas, starting with the jigsaw pieces you face with open source software and balancing a platform for stability while allowing innovation. The talk will then look at approaches used to combat the not-so-obvious challenges and trade-offs of using cloud scalable storage backends for storing and retrieving data. Finally, there'll be a section on the considerations needed for reliability and manageability of robust analytic pipelines.
IBM i at the heart of cognitive solutions - David Spurway
Presentation delivered (twice) on the 16th of February 2017, covering: the heritage of IBM Power Systems, using Harry Potter's glasses to illustrate the benefit of real integration, which IBM i is all about; the IBM i Strategy Whitepaper and top IBM i customer projects; analytics and IBM i, with recent new options to trial capabilities for free; highlights of IBM i 7.2 and 7.3; RPG and open source; IBM i systems management, support, lifecycle, and recent Technology Refreshes; the new S812 server and new features of the Entitled Software Support website; and two customer examples of the benefits of mixing IBM i and Linux. Future plans and POWER9, ending with IBM i 7.3 announcement links.
Amazon EC2 F1 is a new compute instance with programmable hardware for application acceleration. With F1, you can directly access custom FPGA hardware on the instance in a few clicks.
Learning Objectives:
• Learn about the capabilities, features, and benefits of the new F1 instances
• Develop your FPGA using the F1 Hardware Developer Kit and FPGA Developer AMI
• Deploy your FPGA acceleration code using F1 instances
• Use F1 instances for hardware acceleration in your applications
• Learn how to offer pre-packaged Amazon FPGA Machine Images (AFIs) to your customers through the AWS Marketplace
Zabbix – Powerful enterprise grade monitoring driven by Open Source by Wolfga... - NETWAYS
When it comes to enterprise-level and open source network monitoring, one of the products that comes to mind is Zabbix. The tool, which has been continuously developed and improved for over 20 years, is a great choice for monitoring network devices, servers, applications, container, and cloud environments. It combines powerful monitoring capabilities with easy-to-use configuration and visualization options. This presentation will give a brief overview of its design, capabilities and key features that make it so special.
As a data scientist I frequently need to create web apps to provide interactive functionality, deliver data APIs or simply publish results. It is now easier than ever to deploy your data driven web app by using cloud based application platforms to do the heavy lifting. Cloud Foundry (http://cloudfoundry.org) is an open source public and private cloud platform that enables simple app deployment, scaling and connectivity to data services like PostgreSQL, MongoDB, Redis and Cassandra.
Resources: http://www.ianhuston.net/2015/01/cloud-foundry-for-data-science-talk/
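As a sketch of how lightweight such a deployment can be (the app and service names below are hypothetical), a minimal Cloud Foundry application manifest might look like:

```yaml
# manifest.yml -- hypothetical app name and settings
applications:
- name: my-data-app
  memory: 256M
  buildpack: python_buildpack
```

With this file in the project root, `cf push` deploys the app, and a command like `cf bind-service my-data-app my-postgres` would attach a provisioned PostgreSQL service instance (service name hypothetical), which is the "connectivity to data services" the talk describes.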
Notebook-based AI Pipelines with Elyra and Kubeflow - Nick Pentreath
A typical machine learning pipeline begins as a series of preprocessing steps followed by experimentation, optimization and model-tuning, and, finally deployment. Jupyter notebooks have become a hugely popular tool for data scientists and other machine learning practitioners to explore and experiment as part of this workflow, due to the flexibility and interactivity they provide. However, with notebooks it is often a challenge to move from the experimentation phase to creating a robust, modular and production-grade end-to-end AI pipeline.
Elyra is a set of open-source, AI centric extensions to JupyterLab. Elyra provides a visual editor for building notebook-based pipelines that simplifies the conversion of multiple notebooks into batch jobs or workflows. These workflows can be executed both locally (during the experimentation phase) and on Kubernetes via Kubeflow Pipelines for production deployment. In this way, Elyra combines the flexibility and ease-of-use of notebooks and JupyterLab, with the production-grade qualities of Kubeflow (and in future potentially other Kubernetes-based orchestration platforms).
In this talk I introduce Elyra and its capabilities, then give a deep dive of Elyra's pipeline editor and the underlying pipeline execution mechanics, showing a demo of using Elyra to construct an end-to-end analytics and machine learning pipeline. I will also explore how to integrate and scale out model-tuning as well as deployment via Kubeflow Serving.
PeopleSoft Cloud Architecture - OpenWorld 2016 - Graham Smith
Oracle’s PeopleSoft PeopleTools 8.55 saw the introduction of PeopleSoft’s cloud architecture: a platform and set of tools for solving many of the issues associated with effectively running PeopleSoft applications in the cloud. This session explores how you can take advantage of this exciting innovation in PeopleSoft, describes practical use cases for making PeopleSoft’s cloud architecture work for you, and discusses how Oracle Compute Cloud Service can play a key part in this.
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group (“MCG”) expects to see demand and the changing evolution of supply, facilitated through institutional investment rotation out of offices and into work from home (“WFH”), while the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
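As background for the comparison in the abstract, the Monolithic baseline is ordinary power-iteration PageRank, which can be sketched in a few lines (the graph, damping factor, and uniform dead-end handling below are illustrative, not the report's exact configuration):

```python
# Minimal power-iteration PageRank on an adjacency-list graph: every vertex
# is processed in every iteration, which is the "Monolithic" baseline.
def pagerank(graph, damping=0.85, iters=50):
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in graph.items():
            if outs:  # distribute this vertex's rank along its out-edges
                share = damping * rank[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:  # dead end: spread rank uniformly (one common strategy)
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank

# A tiny 3-node cycle: by symmetry, every rank converges to 1/3.
r = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
```

Levelwise PageRank instead runs this kind of iteration per strongly connected component in topological order, so ranks of upstream components are final before downstream ones are computed; the dead-end handling shown above is exactly the precondition issue the report addresses.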
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas