Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da... (Databricks)
PixieDust is a new open source library that helps data scientists and developers working in Jupyter Notebooks and Apache Spark be more efficient. PixieDust speeds up data manipulation and display with features like: auto-visualization of Spark DataFrames, real-time Spark job progress monitoring, automated local install of Python and Scala kernels running with Spark, and much more.
Come along and learn how you can use this tool in your own projects to visualize and explore data effortlessly with no coding. Oh, and if you prefer working with a Scala Notebook, this session is also for you, as PixieDust can also run on a Scala Kernel. Imagine being able to visualize your favorite Python chart engines from a Scala Notebook!
We’ll finish the session with a demo combining Twitter, Watson Tone Analyzer, Spark Streaming, and some fun real-time visualizations, all running within a Notebook.
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar (Databricks)
This session gives a new dimension to Apache Spark’s usage. See how Apache Spark and other open source projects can be used together to provide a scalable, real-time monitoring system. Apache Spark plays the central role in this solution, since without Spark Streaming we would not be able to process millions of events in real time. The approach offers the DevOps/Infrastructure domain lessons on how to build a scalable and automated logging and monitoring solution using Apache Spark, Apache Kafka, Grafana and some other open-source technologies.
Sony PlayStation’s monitoring pipeline processes about 40 billion events every day, and generates metrics in near real-time (within 30 seconds). All the components used along with Apache Spark are horizontally scalable using any auto-scaling technique, which enhances the reliability of this efficient and highly available monitoring solution. Sony Interactive Entertainment has been using Apache Spark, and specifically Spark Streaming, for the last three years. Hear about some important lessons they have learned. For example, they still use Spark Streaming’s receiver-based method instead of Direct Streaming in certain use cases, and they will share how they apply both methods, giving the knowledge back to the community.
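As a rough illustration of what "metrics within 30 seconds" can mean, the sketch below groups events into fixed 30-second (tumbling) windows and counts them per event name. It is plain Python rather than Spark Streaming, and the event tuples, function names, and window size constant are assumptions made for this example, not details from the talk.

```python
from collections import Counter

WINDOW_SECONDS = 30  # assumption: mirrors the "within 30 seconds" figure above


def window_start(timestamp, size=WINDOW_SECONDS):
    """Return the start of the tumbling window that contains `timestamp`."""
    return timestamp - (timestamp % size)


def count_per_window(events, size=WINDOW_SECONDS):
    """Count events per (window start, event name) pair."""
    counts = Counter()
    for timestamp, name in events:
        counts[(window_start(timestamp, size), name)] += 1
    return dict(counts)


# Hypothetical events: (seconds since start, event name)
events = [(0, "login"), (12, "login"), (29, "error"), (30, "login"), (61, "error")]
metrics = count_per_window(events)
```

In a streaming engine the same grouping would be applied incrementally as events arrive, so each window's counts can be emitted shortly after the window closes.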
At Netflix, we've spent a lot of time thinking about how we can make our analytics group move quickly. Netflix's Data Engineering & Analytics organization embraces the company's culture of "Freedom & Responsibility".
How does a company with a $40 billion market cap and $6 billion in annual revenue keep its data teams moving with the agility of a tiny company?
How do hundreds of data engineers and scientists make the best decisions for their projects independently, without the analytics environment devolving into chaos?
We'll talk about how Netflix equips its business intelligence and data engineers with:
– the freedom to leverage cloud-based data tools - Spark, Presto, Redshift, Tableau and others - in ways that solve our most difficult data problems
– the freedom to find and introduce the right software for the job - even if it isn't used anywhere else in-house
– the freedom to create and drop new tables in production without approval
– the freedom to choose when a question is a one-off, and when a question is asked often enough to require a self-service tool
– the freedom to retire analytics and data processes whose value doesn't justify their support costs
Speaker Bios
Monisha Kanoth is a Senior Data Architect at Netflix, and was one of the founding members of the current streaming Content Analytics team. She previously worked as a big data lead at Convertro (acquired by AOL) and as a data warehouse lead at MySpace.
Jason Flittner is a Senior Business Intelligence Engineer at Netflix, focusing on data transformation, analysis, and visualization as part of the Content Data Engineering & Analytics team. He previously led the EC2 Business Intelligence team at Amazon Web Services and was a business intelligence engineer with Cisco.
Chris Stephens is a Senior Data Engineer at Netflix. He previously served as the CTO at Deep 6 Analytics, a machine learning & content analytics company in Los Angeles, and on the data warehouse teams at the FOX Audience Network and Anheuser-Busch.
The presentation aims to demystify the practice of building reliable data processing pipelines. It includes a brief overview of the pieces needed to build a stable processing platform: data ingestion, processing engines, workflow management, and schemas. For each piece, suitable components are suggested, as well as best practices and pitfalls to avoid, most learnt through expensive mistakes.
Original document: https://goo.gl/rmKxZM
Detecting Mobile Malware with Apache Spark with David Pryce (Databricks)
“The ability to detect malware has needed to change drastically in the past few years, away from traditional signature or list-based techniques. Couple this with the rise of mobile device based attacks, where the scale of the data is predicted to be 60% of the internet in 2018*, and our online lives will need Machine Learning (ML) and Data Science to ensure their security. At Wandera we have successfully implemented a malware detection (and classification) ML model at scale with the use of Apache Spark (MLlib) and the PMML via OpenScoring paradigm.
In this talk we will touch on the training data and why we use Spark at all, the features we extract from mobile phone applications and how we then obtain our high accuracy scores in the cloud. *https://blog.cloudflare.com/our-predictions-for-2018/”
Realtime streaming architecture in INFINARIO (Jozo Kovac)
Our experience with real-time analysis of a never-ending stream of user events. We discuss the Lambda and Kappa architectures, Apache Kafka, and our own approach.
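For readers new to the two architectures named above, here is a minimal plain-Python sketch of the general idea. It is a textbook-style illustration, not INFINARIO's approach, and every name and event in it is hypothetical. Lambda keeps a batch view and a speed view and merges them at query time; Kappa keeps a single stream processor and rebuilds its view by replaying the log.

```python
from collections import Counter


def count_by_user(events):
    """Count events per user; events are (user, action) pairs."""
    return Counter(user for user, _action in events)


def lambda_view(archived_events, recent_events):
    """Lambda: merge a precomputed batch view with a speed-layer view."""
    batch_view = count_by_user(archived_events)  # recomputed periodically
    speed_view = count_by_user(recent_events)    # covers events since the last batch run
    return batch_view + speed_view


def kappa_view(log):
    """Kappa: one stream processor; replay the whole log to rebuild the view."""
    view = Counter()
    for user, _action in log:
        view[user] += 1
    return view


# Hypothetical event log, split into an archived part and a recent part
log = [("ann", "click"), ("bob", "view"), ("ann", "purchase"), ("ann", "view")]
archived, recent = log[:2], log[2:]
```

Both functions produce the same per-user counts here; the difference lies in how many code paths have to be maintained and how reprocessing is done.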
Building Robust Production Data Pipelines with Databricks Delta (Databricks)
"Most data practitioners grapple with data quality issues and data pipeline complexities—it's the bane of their existence. Data engineers, in particular, strive to design and deploy robust data pipelines that serve reliable data in a performant manner so that their organizations can make the most of their valuable corporate data assets.
Databricks Delta, part of Databricks Runtime, is a next-generation unified analytics engine built on top of Apache Spark. Built on open standards, Delta employs co-designed compute and storage and is compatible with Spark APIs. It powers high data reliability and query performance to support big data use cases, from batch and streaming ingest and fast interactive queries to machine learning. In this tutorial we will discuss the requirements of modern data pipelines, the challenges data engineers face when it comes to data reliability and performance, and how Delta can help. Through presentation, code examples and notebooks, we will explain pipeline challenges and the use of Delta to address them. You will walk away with an understanding of how you can apply this innovation to your data architecture and the benefits you can gain.
This tutorial will be both an instructor-led and a hands-on interactive session. Instructions on how to get tutorial materials will be covered in class.
WHAT YOU’LL LEARN:
– Understand the key data reliability and performance challenges of data pipelines
– How Databricks Delta helps build robust pipelines at scale
– Understand how Delta fits within an Apache Spark™ environment
– How to use Delta to realize data reliability improvements
– How to deliver performance gains using Delta
PREREQUISITES:
– A fully-charged laptop (8-16GB memory) with Chrome or Firefox
– Pre-register for Databricks Community Edition"
Speakers: Steven Yu, Burak Yavuz
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring (Databricks)
The Spark Listener interface provides a fast, simple and efficient route to monitoring and observing your Spark application - and you can start using it in minutes. In this talk, we'll introduce the Spark Listener interfaces available in core and streaming applications, and show a few ways in which they've changed our world for the better at SpotX. If you're looking for a "Eureka!" moment in monitoring or tracking of your Spark apps, look no further than Spark Listeners and this talk!
How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than... (Databricks)
Upwork has the biggest closed-loop online dataset of jobs and job seekers in labor history (>10M Profiles, >100M Job Posts, Job Proposals and Hiring Decisions, >10B of Messages, Transaction and Feedback Data). Besides sheer quantity, our data is also contextually very rich. We have client and contractor data for the entire job-funnel – from finding jobs to getting the job done.
For various machine learning applications including search and recommendations and labor marketplace optimization (rate, supply and demand), we heavily relied on a Greenplum-based data warehouse solution for data processing and ad-hoc ML pipelines (weka, scikit-learn, R) for offline model development and online model scoring.
In this talk, we present our modernization efforts in moving towards 1) a holistic data processing infrastructure for batch and stream data processing using S3, Kinesis, Spark and Spark Structured Streaming, 2) model development using Spark MLlib and other ML libraries for Spark, 3) model serving using Databricks Model Scoring, Scoring over Structured Streams and microservices, and 4) how we orchestrate and streamline all these processes using Apache Airflow and a CI/CD workflow customized to our Data Science product engineering needs. The focus of this talk is on how we were able to leverage the Databricks service offering to reduce DevOps overhead and costs, complete the entire modernization with moderate effort, and adopt a collaborative notebook-based solution for all our data scientists to develop models, reuse features and share results. We will share the core lessons learned and pitfalls we encountered during this journey.
Overview of Apache Flink: Next-Gen Big Data Analytics Framework (Slim Baltagi)
These are the slides of my talk on June 30, 2015 at the first event of the Chicago Apache Flink meetup. Although most of the current buzz is about Apache Spark, the talk shows how Apache Flink offers the only hybrid open source (Real-Time Streaming + Batch) distributed data processing engine supporting many use cases: Real-Time stream processing, machine learning at scale, graph analytics and batch processing.
In these slides, you will find answers to the following questions: What is the Apache Flink stack and how does it fit into the Big Data ecosystem? How does Apache Flink integrate with Apache Hadoop and other open source tools for data input and output as well as deployment? What is the architecture of Apache Flink? What are the different execution modes of Apache Flink? Why is Apache Flink an alternative to Apache Hadoop MapReduce, Apache Storm and Apache Spark? Who is using Apache Flink? Where can you learn more about Apache Flink?
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark (Databricks)
Around the world, businesses are turning to AI to transform the way they operate and serve their customers. But before they can implement these technologies, companies must address the roadblock of moving from batch analytics to making real-time decisions by rapidly accessing and analyzing the relevant information amidst a sea of data. Yaron will explain how to make Spark handle multivariate real-time, historical and event data simultaneously to provide immediate and intelligent responses. He will present several time-sensitive use cases, including fraud detection, prevention of outages and customer recommendations, to demonstrate how to perform predictive analytics and real-time actions with Spark.
Speaker: Yaron Ekshtein
Data Warehousing with Spark Streaming at Zalando (Databricks)
Zalando's AI-driven products and distributed landscape of analytical data marts cannot wait for long-running, hard-to-recover, monolithic batch jobs taking all night to calculate already outdated data. Modern data integration pipelines need to deliver high-quality data sets that are fast and easy to consume. Based on Spark Streaming and Delta, the central data warehousing team was able to deliver widely-used master data as S3 or Kafka streams and snapshots at the same time.
The talk will cover challenges in our fashion data platform and a detailed architectural deep dive about separation of integration from enrichment, providing streams as well as snapshots and feeding the data to distributed data marts. Finally, lessons learned and best practices about Delta’s MERGE command, Scala API vs Spark SQL and schema evolution give more insights and guidance for similar use cases.
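The MERGE command mentioned above performs an upsert: rows that match on a key are updated, and rows that do not match are inserted. The plain-Python sketch below illustrates only that matched/not-matched logic on in-memory rows. It is not the Delta API, and the `merge` function, the `sku` key, and the sample rows are assumptions made for this example.

```python
def merge(target, updates, key):
    """Upsert `updates` into `target`, matching rows on `key`.

    Matched rows are updated in place (like WHEN MATCHED THEN UPDATE);
    unmatched rows are appended (like WHEN NOT MATCHED THEN INSERT).
    """
    merged = {row[key]: dict(row) for row in target}
    for row in updates:
        if row[key] in merged:
            merged[row[key]].update(row)
        else:
            merged[row[key]] = dict(row)
    return list(merged.values())


# Hypothetical master data and an incoming batch of changes
articles = [{"sku": "A1", "price": 10}, {"sku": "B2", "price": 20}]
changes = [{"sku": "B2", "price": 25}, {"sku": "C3", "price": 30}]
snapshot = merge(articles, changes, key="sku")
```

The inputs are copied rather than mutated, so the previous snapshot stays available alongside the new one.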
Building an Open Source iOS app: lessons learned (Wojciech Koszek)
Dec 12, 2016, Hacker Dojo (Santa Clara), 6pm
In this talk I'm going to talk about lessons learned from building Sensorama (http://www.sensorama.org), an Open Source sensor platform for data science. The main theme of the talk will be Open Source: what is great about it, what is bad and how you must become a part of the Open Source community to really move quickly and benefit from it. For this project, I did both the code and the design, so you'll have a chance to see how a solo developer deals with time and feature constraints, which tools I've used and what my approach towards development in this mode is. In other words: I'll tell you what I did to stay sane. If iOS development were a walk in a dark city park, this talk may turn out to be your flashlight. If you like it, star it on GitHub: https://github.com/wkoszek/sensorama-ios
Agenda
https://www.meetup.com/svmobiledev/events/235836893/
Materials
https://github.com/wkoszek/talks/tree/master/svmobiledev2016
Some links from the slides
Fake it till you make it presentation https://developer.apple.com/videos/play/wwdc2014/223
Designing for Future Hardware https://developer.apple.com/videos/play/wwdc2015/801/
References
WWW: http://www.sensorama.org
GitHub (code): https://github.com/wkoszek/sensorama-ios
GitHub (artwork): https://github.com/wkoszek/sensorama-artwork
Author
WWW: http://www.koszek.com
Twitter: https://twitter.com/wkoszek
LinkedIn: https://www.linkedin.com/in/wkoszek/
Email: wojciech (at) koszek.com
Luckily, there’s plenty of good quality coding training available here on the Internet. What better place to learn web development than the web itself?
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch... (Simplilearn)
This presentation on Pig will help you understand why Pig is required, what is Pig, MapReduce vs Hive vs Pig, Pig architecture, working of Pig, Pig Latin data model, Pig Execution modes, and finally a demo which shows Pig Latin scripts. Pig is a scripting platform that runs on Hadoop clusters, designed to process and analyze large datasets. It operates on various types of data like structured, semi-structured and unstructured data. Pig Latin is the procedural data flow language used in Pig to analyze data. It is easy to program using Pig Latin as it is similar to SQL.
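To give a feel for the data-flow style described above, the sketch below counts tweets per user in plain Python, with a roughly equivalent Pig Latin script shown in the comments. The file name, relation names, and sample rows are hypothetical and are not taken from the presentation's demo.

```python
from collections import Counter

# Roughly equivalent Pig Latin (file and relation names are hypothetical):
#   tweets  = LOAD 'tweets.tsv' AS (user:chararray, text:chararray);
#   by_user = GROUP tweets BY user;
#   counts  = FOREACH by_user GENERATE group, COUNT(tweets);

# Hypothetical input rows: (user, text)
tweets = [("ann", "hello"), ("bob", "hi"), ("ann", "pig latin")]


def tweets_per_user(rows):
    """Group rows by user and count them, like GROUP ... BY plus COUNT."""
    return Counter(user for user, _text in rows)


counts = tweets_per_user(tweets)
```

Each Pig Latin statement names an intermediate relation, which is what makes the language read as a step-by-step data flow rather than a single declarative query.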
Now, let us get started with Pig.
The following topics are explained in this Pig presentation:
1. Why Pig?
2. What is Pig?
3. MapReduce vs Hive vs Pig
4. Pig architecture
5. Working of Pig
6. Pig Latin data model
7. Pig Execution modes
8. Use case – Twitter
9. Features of Pig
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course has been designed to impart an in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
This course will enable you to:
1. Understand the different components of Hadoop ecosystem such as Hadoop 2.7, Yarn, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro Schema, using Avro with Hive, and Sqoop and Schema evolution
7. Understand Flume, Flume architecture, sources, flume sinks, channels, and flume configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distributed datasets (RDDs) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, creating, transforming, and querying DataFrames
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
How Open Source / Open Technology Could Help On Your Project (Wan Leung Wong)
ITFest 2014, Seminar on Free & OSS in HK
How Open Source / Open Technology Could Help On Your Project?
A brief talk about how to use open source or open technology to help start a new project: how to choose technology, and what people should be concerned about.
Grid Reader is an Interaction Design Semester-long project.
We are developing an application that lets users read the feeds they are interested in instantly, efficiently, and in a customized way from anywhere.
Breaking out of the endless callback look - #jsday Italy keynote (Christian Heilmann)
As JavaScript developers we are hot property. This can be overwhelming, and in this keynote Chris Heilmann gives some history lessons to show that we don't need to get frustrated, but should pick our battles.
WebRTC has had a tough 3 or 4 years. But it's gone through a rebirth. Node.js developers are a perfect match for the technology. Come and play with it!
Talk given at Hackference in Birmingham
Introduction to GraphQL (or How I Learned to Stop Worrying about REST APIs) (Hafiz Ismail)
Talk for FOSSASIA 2016 (http://2016.fossasia.org)
----
This talk will give a brief and enlightening look into how GraphQL can help you address common weaknesses that you, as a web / mobile developer, would normally face with using / building typical REST API systems.
Let's stop fighting about whether we should implement the strictest interpretation of REST or how pragmatic REST-ful design is the only way to go, or debate about what REST is or what it should be.
A couple of demos (In Golang! Yay!) will be shown that are guaranteed to open up your eyes and see that the dawn of liberation for product developers is finally here.
Background: GraphQL is a data query language and runtime designed and used at Facebook to request and deliver data to mobile and web apps since 2012.
Hafiz Ismail (@sogko) is a contributor to the Go / Golang implementation of the GraphQL server library (https://github.com/graphql-go/graphql) and is looking to encourage fellow developers to join in the collaborative effort.
Deep learning has surpassed human ability on some tasks and has become highly popular in scientific computing, and its algorithms are applied in industries that solve complex problems.
Similar to Converging Big Data and Application Infrastructure by Steven Pousty (20)
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data Spain
Insights can only be as good as the data. The data quality domain is enormously large, so you need to understand your company pain points to know what to focus on first.
https://www.bigdataspain.org/2017/talk/big-data-big-quality
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Big Data Spain
2gether is a financial platform based on Blockchain, Big Data and Artificial Intelligence that allows interaction between users and third-party services in a single interface.
https://www.bigdataspain.org/2017/talk/scaling-a-backend-for-a-big-data-and-blockchain-environment
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Big Data Spain
All modern Big Data solutions, like Hadoop, Kafka or the rest of the ecosystem tools, are designed as distributed processes and as such include some sort of redundancy for High Availability.
https://www.bigdataspain.org/2017/talk/disaster-recovery-for-big-data
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Big Data Spain
In this presentation, attendees will see how to speed up existing Hadoop and Spark deployments by just making Apache Ignite responsible for RAM utilization. No code modifications, no new architecture from scratch!
https://www.bigdataspain.org/2017/talk/boost-hadoop-and-spark-with-in-memory-technologies
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Big Data Spain
Discover the power of this new set of tools for data science. It is really easy to start applying these techniques in your current workflow.
https://www.bigdataspain.org/2017/talk/data-science-for-lazy-people-automated-machine-learning
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Big Data Spain
GPUs in the cloud as Infrastructure as a Service (IaaS) seem to be a commodity. However, efficiently distributing deep learning tasks across several GPUs is challenging.
https://www.bigdataspain.org/2017/talk/training-deep-learning-models-on-multiple-gpus-in-the-cloud
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Big Data Spain
Unbalanced data is a specific data configuration that appears commonly in nature. Applying machine learning techniques to this kind of data is a difficult process, usually addressed by unbalanced reduction techniques.
https://www.bigdataspain.org/2017/talk/unbalanced-data-same-algorithms-different-techniques
State of the art time-series analysis with deep learning by Javier Ordóñez at...
Big Data Spain
Time series related problems have traditionally been solved using engineered features obtained by heuristic processes.
https://www.bigdataspain.org/2017/talk/state-of-the-art-time-series-analysis-with-deep-learning
Trading at market speed with the latest Kafka features by Iñigo González at B...
Big Data Spain
Not long ago, only banks and hedge funds could afford automated and high-frequency trading, that is, the ability to send buy orders for commodities at microsecond intervals.
https://www.bigdataspain.org/2017/talk/trading-at-market-speed-with-the-latest-kafka-features
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Big Data Spain
The shift to stream processing at LinkedIn has accelerated over the past few years. We now have over 200 Samza applications in production processing more than 260B events per day.
https://www.bigdataspain.org/2017/talk/apache-samza-jake-maes
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
Big Data Spain
IBM has built a “Data Science Experience” cloud service that exposes Notebook services at web scale.
https://www.bigdataspain.org/2017/talk/the-analytic-platform-behind-ibms-watson-data-platform
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Big Data Spain
Artificial Intelligence and Data-centric businesses.
https://www.bigdataspain.org/2017/talk/tbc
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Big Data Spain
Ten years ago there were rumours of the death of causal inference. Big data was supposed to enable us to rely on purely correlational data to predict and control the world.
https://www.bigdataspain.org/2017/talk/why-big-data-didnt-end-causal-inference
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Big Data Spain
The Meme of the Internet Index will be the new normal for analyzing and predicting the fads and sensations that circulate on the Internet.
https://www.bigdataspain.org/2017/talk/meme-index-analyzing-fads-and-sensations-on-the-internet
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Big Data Spain
Geotab is a leader in the expanding world of Internet of Things (IoT) and telematics industry with Big Data.
https://www.bigdataspain.org/2017/talk/vehicle-big-data-that-drives-smart-city-advancement
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
Big Data Spain
The talk will focus on explaining why operational databases do not scale due to limitations in legacy transactional management.
https://www.bigdataspain.org/2017/talk/end-of-the-myth-ultra-scalable-transactional-management
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Big Data Spain
In recent years Machine Learning (ML) and especially Deep Learning (DL) have achieved great success in many areas such as visual recognition, NLP or even aiding in medical research.
https://www.bigdataspain.org/2017/talk/attacking-machine-learning-used-in-antivirus-with-reinforcement
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
Big Data Spain
The primary function of the banking sector is promoting economic activity, which means “commerce”: exchanging what someone produces or has for something that someone else consumes or desires.
https://www.bigdataspain.org/2017/talk/more-people-less-banking-blockchain
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Big Data Spain
Bol.com was an early Hadoop user: it has run Hadoop since 2008, when it was first built for a recommendation algorithm.
https://www.bigdataspain.org/2017/talk/make-the-elephant-fly-once-again
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and for making things work, along with a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and on application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Welcome to ViralQR, your best QR code generator.
ViralQR
Welcome to ViralQR, your best QR code generator available on the market!
At ViralQR, we design static and dynamic QR codes. Our mission is to make business operations easier and customer engagement more powerful through the use of QR technology. Be it a small-scale business or a huge enterprise, our easy-to-use platform provides multiple choices that can be tailored according to your company's branding and marketing strategies.
Our Vision
We are here to make the process of creating QR codes easy and smooth, thus enhancing customer interaction and making business more fluid. We very strongly believe in the ability of QR codes to change the world for businesses in their interaction with customers and are set on making that technology accessible and usable far and wide.
Our Achievements
Since our inception, we have served many clients across various industries, providing QR codes for their marketing, service delivery, and feedback collection. Our platform has been recognized for its ease of use and for the features that help businesses create QR codes.
Our Services
ViralQR offers a comprehensive suite of services that caters to your needs:
Static QR Codes: Create free static QR codes. These QR codes can store information such as URLs, vCards, plain text, emails and SMS, Wi-Fi credentials, and Bitcoin addresses.
Dynamic QR Codes: These offer all the advanced features but are subscription-based. They can link directly to PDF files, images, micro-landing pages, social accounts, review forms, business pages, and applications. In addition, they can be branded with CTAs, frames, patterns, colors, and logos to enhance your branding.
Pricing and Packages
Additionally, ViralQR has a 14-day free offer, which is an opportunity for new users to get a feel for the platform. From there, you can easily subscribe and experience the full range of dynamic QR code features. The subscription plans are priced flexibly so that virtually every business can afford to benefit from our service.
Why choose us?
ViralQR provides services for marketing, advertising, catering, retail, and the like. The QR codes can be placed on fliers, packaging, merchandise, and banners, and can substitute for cash and cards in a restaurant or coffee shop. With QR codes integrated into your business, you can improve customer engagement and streamline operations.
Comprehensive Analytics
Subscribers of ViralQR receive detailed analytics and tracking tools that give a view of the key measures of QR code performance. Our analytics dashboard shows aggregate views and unique views, as well as detailed information about each impression, including time, device, browser, and estimated location by city and country.
So, thank you for choosing ViralQR; we offer nothing but the best in QR code services to meet diverse business needs!
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
In today's fast-changing business world, companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Converging Big Data and Application Infrastructure by Steven Pousty
2. Converging Big Data and Application Infrastructure
Steven Pousty
OpenShift/Red Hat
@TheSteve0 on Twitter, IRC, SmugMug, GitHub, Ingress, Instagram, Skype, and
3. Goals
1. Brief introduction to how things have evolved
2. Talk about what we have now
3. Demo of this in practice
4. Have fun!
4. Assumptions
1. You have written a web app at some point in your life
2. You know some basics of Spark and such
17. We have had some great developments
1. People figured out how to make commodity hardware do great things
2. Distributed software has gotten better
3. Programming languages have gotten easier
4. CS has put effort into statistical libraries
5. The browser has grown up
23. We are lucky
We can still do analysis on our desktop if we want
But now we can easily integrate
1. Data or APIs
2. Statistics and Machine Learning
3. Interactive Web Applications
24. To do this on your own:
1. To run your own OpenShift, either:
https://www.openshift.org/vm/
Or
oc cluster up (get oc here - https://github.com/openshift/origin/releases )
2. To get Oshinko:
https://github.com/radanalyticsio then you need oshinko-rest and oshinko-console
3. For the word analysis app get it here:
https://github.com/ophicleide
We LOVE pull requests