This document provides an overview of building a business intelligence (BI) system using a data lake ecosystem. It discusses using Hadoop, Hive, Teradata, Tableau, and Jenkins together. The goal is to address big data problems such as the high volume, variety, and velocity of data in a cost-effective way. Sample architectures are proposed to handle ETL processes, data storage and querying, and visualization. Key considerations for choosing components include storage and management costs, processing time, and balancing needs against costs. The document concludes that a good framework can help support business growth over time within a given cost curve.
In this presentation, we will examine various scalability options to improve the robustness and performance of your Spring Batch applications. We start with a single-threaded Spring Batch application, which we will refactor to demonstrate how to run it using:
* Concurrent Steps
* Remote Chunking
* AsyncItemProcessor and AsyncItemWriter
* Remote Partitioning
Additionally, we will show how you can deploy Spring Batch applications to Spring XD, which provides high availability and failover capabilities. Spring XD also lets you integrate Spring Batch applications with other big data processing needs.
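Partitioning, the last option in the list above, comes down to a manager that splits the input into ranges and workers that process each range independently. The sketch below illustrates only that idea, in plain Python with a thread pool; it is not Spring Batch code. In Spring Batch the splitting role belongs to a `Partitioner`, whose `partition(int gridSize)` method returns one `ExecutionContext` per partition; the ID-range strategy and the `process_partition` worker shown here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(min_id: int, max_id: int, grid_size: int) -> list[tuple[int, int]]:
    """Split the inclusive ID range [min_id, max_id] into at most grid_size contiguous ranges."""
    if grid_size < 1 or max_id < min_id:
        raise ValueError("need grid_size >= 1 and max_id >= min_id")
    total = max_id - min_id + 1
    size = -(-total // grid_size)  # ceiling division so no IDs are left over
    return [(start, min(start + size - 1, max_id))
            for start in range(min_id, max_id + 1, size)]


def process_partition(bounds: tuple[int, int]) -> int:
    """Hypothetical worker step: handles every item in its range and reports how many it saw."""
    low, high = bounds
    processed = 0
    for _item_id in range(low, high + 1):
        processed += 1  # real work (read, process, write) would happen here
    return processed


def run_partitioned(min_id: int, max_id: int, grid_size: int, workers: int = 4) -> list[int]:
    """Manager: partition the input, run the workers concurrently, collect per-partition counts."""
    ranges = partition(min_id, max_id, grid_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_partition, ranges))
```

For example, `run_partitioned(1, 10, 3)` splits the IDs into `(1, 4)`, `(5, 8)`, and `(9, 10)` and returns `[4, 4, 2]`. Because each range is independent, the same split could be handed to remote workers instead of local threads, which is the step remote partitioning adds.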
10 Things Learned Releasing Databricks Enterprise Wide (Databricks)
Implementing tools, let alone an entire unified data platform like Databricks, can be quite an undertaking. Implementing a tool whose ins and outs you have not yet learned can be even more frustrating. Have you ever wished you could take some of that uncertainty away? Four years ago, Western Governors University (WGU) took on the task of rewriting all of our ETL pipelines in Scala/Python and migrating our Enterprise Data Warehouse into Delta, all on the Databricks platform. Starting with 4 users and rapidly growing to over 120 users across 8 business units, our Databricks environment became an entire unified platform, used by individuals with varied skill levels, data requirements, and internal security requirements.
Through this process, our team had the opportunity to learn while making a lot of mistakes. Looking back at those mistakes, there are many things we wish we had known before opening the platform to our enterprise.
We would like to share 10 things we wish we had known before WGU started operating in our Databricks environment. Topics include user management from both an AWS and a Databricks perspective, understanding and managing costs, creating custom pipelines for efficient code management, new Apache Spark snippets that helped save us a fortune, and more. We will also share our recommendations for overcoming these pitfalls, to help new, current, and prospective users make their environments easier, safer, and more reliable to work in.
During this talk, the speaker provided a detailed overview of the Elasticsearch search engine, gave insight into offline search tools, and suggested how to fine-tune Elasticsearch for specific goals.
This presentation by Mykhailo Brodskyi (Senior Software Engineer, Consultant, GlobalLogic, Kharkiv) was delivered at GlobalLogic Kharkiv Java Conference 2018 on June 10, 2018.
Mario Cartia - SMACK is the new LAMP! - Codemotion Milan 2017 (Codemotion)
SMACK stands for Spark, Mesos, Akka, Cassandra, and Kafka. The talk's title "provocatively" compares the technology stack for developing Reactive applications with the one most commonly used in web development. The talk will cover the basic concepts of Reactive programming, the conceptual differences this paradigm introduces compared with the "classic" approach to web programming, and some success stories involving these technologies.
Lessons from the Trenches - Building Enterprise Applications with RavenDB (Oren Eini)
It's easy, fun, and simple to build a prototype application with RavenDB, but what happens when you reach the point of shipping v1.0 to production? Many of the subtle decisions made during development can have undesirable consequences in production. In this session, Dan Bishop will explore some of the pain points that arise when building, deploying, and supporting enterprise-grade applications with RavenDB.
Building a friendly .NET SDK to connect to Space (Maarten Balliauw)
Space is a team tool that integrates chats, meetings, Git hosting, automation, and more. It has an HTTP API for integrating third-party apps and workflows, but the API is massive, and slightly opinionated.
In this session, we will see how we built the .NET SDK for Space and how we made that massive API more digestible. We will see how we used code generation and incrementally made the API feel more like a real .NET SDK.
5 Amazing Reasons DBAs Need to Love Extended Events (Jason Strate)
Extended events provide DBAs with a powerful tool that can be used to troubleshoot and investigate SQL Server. Throughout this session, you’ll walk through five great reasons, with demos. By the end of the webcast, you’ll be itching to grab the scripts from the demos to start building your own extended event sessions today.
Continuous Intelligence - Streaming Apps That Are Always In Sync | Simon Cros... (HostedbyConfluent)
FREE NOW's business is growing rapidly, as is the ride-hailing industry in general, which creates a fair number of technical challenges in real-time data aggregation and processing. FREE NOW has long used Kafka and recently adopted Confluent Cloud as its main streaming data platform. We managed to scale it to several hundred topics containing information about trips, locations, and overall business performance. This information is used heavily to build streaming applications such as dynamic pricing computation, fraud detection, real-time analytics for marketing campaigns, and much more. We would like to share the details of our implementation of real-time dynamic tour pricing, which is based on more than 200 million events daily. We would also like to reflect on how Confluent helped us address development complexity while providing scalability options at the same time.
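As a rough illustration of the kind of computation a dynamic-pricing stream performs, the sketch below groups trip events into fixed (tumbling) time windows per zone and derives a demand-to-supply multiplier. Everything here is an assumption for illustration: the event shape, the 60-second window, the event kinds, and the pricing rule are hypothetical and are not FREE NOW's algorithm. In production this aggregation would run continuously in a stream processor over Kafka topics; here it runs over an in-memory list.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 60   # hypothetical tumbling-window length
MAX_MULTIPLIER = 3.0  # hypothetical cap on the price multiplier


def window_start(ts: int) -> int:
    """Align an event timestamp (in seconds) to the start of its tumbling window."""
    return ts - ts % WINDOW_SECONDS


def surge_multiplier(requests: int, drivers: int) -> float:
    """Hypothetical rule: demand/supply ratio, never below 1.0 and never above the cap."""
    if drivers == 0:
        return MAX_MULTIPLIER if requests else 1.0
    return max(1.0, min(MAX_MULTIPLIER, requests / drivers))


def price_multipliers(events) -> dict:
    """events: iterable of (timestamp_seconds, zone, kind), kind being 'request' or 'driver_available'.

    Returns {(window_start, zone): multiplier} for every window/zone that saw events.
    """
    counts = defaultdict(Counter)
    for ts, zone, kind in events:
        counts[(window_start(ts), zone)][kind] += 1
    return {key: surge_multiplier(c["request"], c["driver_available"])
            for key, c in counts.items()}
```

Counting per window keeps state bounded: once a window closes, its counts can be emitted and discarded, which is what makes this pattern workable at hundreds of millions of events per day.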
Building Codealike: a journey into the developers analytics world (Oren Eini)
Codealike plugins for Visual Studio, Eclipse, and Chrome track developers while they code and perform analytic calculations at the millisecond level. Handling such write-heavy workloads with RavenDB as the main and only database was not without challenges. In this talk, we will reveal how we built and scaled such a solution, how we improved performance with Voron, and take a look at our own mistakes and architectural choices along the way.
SenchaCon 2016: A Data-Driven Application for the Embedded World - Jean-Phili... (Sencha)
View this presentation to see a real-time, data-centric application designed to help people manage large facilities, buildings, and homes in a smart way. It notably features D3.js dashboards, user-friendly device mapping, and automatic alerts on suspicious power consumption.
Want to have some fun with GraphQL?
Join us for the meetup in Munich and enjoy an interesting talk on "GraphQL vs. (the) REST".
In this session, our colleagues Tsvetan Nikolov and Tom Sedlmeier (both Senior Developers at coliquio) will discuss the basic concepts and how to use them.
We will present GraphQL on the backend and on the client side with Apollo.
Logging is one of those things that everyone complains about but no one dedicates time to. Of course, the first rule of logging is "do it". Without logs, you have no visibility into system activity when investigations are required. But the end goal is much more than this. Almost all applications require security audit logs for compliance, application logs for visibility across all cloud properties, and application tracing for tracking usage patterns and business intelligence. The latter is the magic sauce that helps businesses learn about their customers; in some cases, the data is FOR the customer. Without a strategy, this can get very messy, fast. In this session, Michele will discuss design patterns for a sound logging and audit strategy, considerations for security and compliance, the benefits of a NoSQL approach, and more.
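One common building block of a logging and audit strategy is structured logging: each event is emitted as a single JSON object, so it can be shipped to a document (NoSQL) store and queried by field. The sketch below uses only Python's standard `logging` and `json` modules; the `audit` field name, the dedicated `audit` logger, and the example event are hypothetical choices for illustration, not the speaker's design.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via extra={"audit": {...}} become attributes of the record.
        audit = getattr(record, "audit", None)
        if audit is not None:
            entry["audit"] = audit
        return json.dumps(entry, sort_keys=True)


def make_audit_logger(stream) -> logging.Logger:
    """A dedicated logger so audit events can be routed and retained separately."""
    logger = logging.getLogger("audit")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep audit events out of the root/application handlers
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace any handlers left from earlier calls
    return logger


def demo() -> dict:
    """Log one hypothetical audit event to an in-memory stream and parse it back."""
    buffer = io.StringIO()
    log = make_audit_logger(buffer)
    log.info("invoice updated",
             extra={"audit": {"actor": "alice", "action": "update", "resource": "invoice/42"}})
    return json.loads(buffer.getvalue())
```

Keeping audit events on their own non-propagating logger is one way to give them different retention and access rules from ordinary application logs, which matters for compliance.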
Database migrations with Flyway and Liquibase (Lars Östling)
An agile world of continuous integration and deployment reinforces the need to be able to seamlessly and effortlessly update your database to keep it in sync with the latest changes in your code. Implementing database migrations with Flyway or Liquibase will help you do just that. This presentation gives a quick overview of the two frameworks accompanied by some simple demos.
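Both tools build on the same core idea: numbered migration scripts, applied in order, with a table in the database recording which versions have already run. The sketch below shows that idea with Python's built-in `sqlite3` module. It is not how Flyway or Liquibase are implemented; the `schema_history` table and the two migrations are hypothetical (Flyway keeps its own history in a table named `flyway_schema_history` by default).

```python
import sqlite3

# Hypothetical migrations, ordered by version in the spirit of Flyway's V1__..., V2__... scripts.
MIGRATIONS = [
    (1, "create users", "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"),
    (2, "add display name", "ALTER TABLE users ADD COLUMN display_name TEXT"),
]


def migrate(conn: sqlite3.Connection, migrations=MIGRATIONS) -> list[int]:
    """Apply every migration newer than the recorded schema version; return the versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_history (version INTEGER PRIMARY KEY, description TEXT)")
    current = conn.execute("SELECT COALESCE(MAX(version), 0) FROM schema_history").fetchone()[0]
    applied = []
    for version, description, sql in sorted(migrations):
        if version <= current:
            continue  # already applied on an earlier run
        with conn:  # commits on success; the history row is only written if the script ran
            conn.execute(sql)
            conn.execute("INSERT INTO schema_history VALUES (?, ?)", (version, description))
        applied.append(version)
    return applied
```

On a fresh database `migrate` returns `[1, 2]`; calling it again returns `[]`, because both versions are already recorded. That idempotence is what lets the same step run safely in every CI/CD deployment.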
Zapping ever faster: how Zap sped up by two orders of magnitude using RavenDB (Oren Eini)
Join Hagay Albo, CTO of the Zap/Yellow Page Group in Israel, for a real-world uplift story in which he explains how his team took a legacy (slow and hard to modify) group of sites and made them easier to work with and MUCH faster, while greatly simplifying the operational environment.
By prioritizing high availability and flexible data modeling, and by focusing on raw speed, Zap was able to reduce its load times by two orders of magnitude. Using RavenDB as the core engine behind Zap's new sites improved site traffic, reduced time to market, and made it possible to implement next-gen features that were previously beyond reach.
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy... (Databricks)
A traditional data team has roles including data engineer, data scientist, and data analyst. However, many organizations are finding success by adding a new role: the analytics engineer. The analytics engineer develops a code-based data infrastructure that can serve both analytics and data science teams. He or she develops reusable data models using the software engineering practices of version control and unit testing, and provides the critical domain expertise that ensures data products are relevant and insightful. In this talk, we'll discuss the role and skill set of the analytics engineer and how dbt, an open source programming environment, empowers anyone with SQL skills to fulfill this new role on the data team. We'll demonstrate how to use dbt to build version-controlled data models on top of Delta Lake, test both the code and our assumptions about the underlying data, and orchestrate complete data pipelines on Apache Spark™.
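dbt expresses "assumptions about the underlying data" as tests; its built-in generic tests include `not_null`, `unique`, and `accepted_values`. The sketch below mimics the intent of those three checks in plain Python over a list of rows, so the logic is easy to see. It is not dbt code (dbt compiles such tests into SQL queries that return the failing rows), and the `orders` rows and allowed status values are hypothetical.

```python
from collections import Counter


def not_null(rows, column):
    """Failing rows: those where `column` is missing or None."""
    return [row for row in rows if row.get(column) is None]


def unique(rows, column):
    """Failing values: non-null values of `column` that occur more than once."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return [value for value, n in counts.items() if n > 1]


def accepted_values(rows, column, allowed):
    """Failing rows: non-null values of `column` outside the allowed set."""
    return [row for row in rows
            if row.get(column) is not None and row[column] not in allowed]


def run_tests(rows):
    """Hypothetical test suite for an `orders` model; a test passes when it finds 0 failures."""
    return {
        "not_null(order_id)": len(not_null(rows, "order_id")),
        "unique(order_id)": len(unique(rows, "order_id")),
        "accepted_values(status)": len(
            accepted_values(rows, "status", {"placed", "shipped", "returned"})),
    }
```

Framing each test as "return the rows that violate the assumption" mirrors dbt's approach: a test passes when that result is empty, and a failure immediately points at the offending data.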
Blockchain: a Singularity-class technology. No other technology has the power to pull 2 billion people out of poverty overnight (with intermediary-free international remittances), produce a safe and orderly transition to the automation economy (with humans and machines collaborating, and enacting friendly artificial intelligence), and fundamentally transform the only remaining sectors not yet re-engineered for the Internet era: economics and politics. There are growing classes of activities for smartnetwork execution, moving up the stack, pushing different qualitative states through the Internet pipes, and building future smartnetworks. The smartnetworks thesis is that complex future operations will involve automated fleet coordination of “quantized” items via smartnetworks, using some kind of technology like blockchains with algorithmically derived trust.
Big Data is neither a fad nor something still to come. Most organizations already have databases so large that they require special tools. This presentation helps us take the first step: learning what Big Data really is and how it works, and venturing into this wonderful world of data in bulk.
Study "Big Data: challenges and opportunities for tourism" (Invattur)
A study on big data and tourism developed by Invat.tur and Territorio creativo.
More information: http://invattur.gva.es/
Video presentation: http://youtu.be/kKh0_-E8OC0
Download at http://DavidHubbard.net/powerpoint - This Introduction to Business Intelligence gives an overview of how Business Intelligence fits into business strategy in general. It does not go into the specific technologies of Business Intelligence; it is meant to explain Business Intelligence to those not already familiar with it.
Business Intelligence made easy! This is the first part of a two-part presentation I prepared for one of our customers to help them understand what Business Intelligence is and what it can do.
Challenges of Operationalising Data Science in Production (iguazio)
The topic of this meetup was covered in two sections, with no break in between:
Section 1: Business Aspects (20 mins)
Speaker: Rasmi Mohapatra, Product Owner, Experian
https://www.linkedin.com/in/rasmi-m-428b3a46/
Once your data science application is in production, it faces many typical operational challenges, experienced today across business domains. We will cover a few of these challenges with example scenarios.
Section 2: Tech Aspects (40 mins, slides & demo, Q&A)
Speaker: Santanu Dey, Solution Architect, Iguazio
https://www.linkedin.com/in/santanu/
In this part of the talk, we will cover how these operational challenges can be overcome (e.g., automating data collection and preparation, making ML models portable and deploying them in production, monitoring, and scaling), with relevant demos.
Measure and increase developer productivity with help of Serverless by Kazulki... (Vadym Kazulkin)
The goal of Serverless is to focus on writing the code that delivers business value and offload everything else to your trusted partners (such as cloud providers or SaaS vendors). You want to iterate quickly, and today's code quickly becomes tomorrow's technical debt. In this talk, we will show why Serverless adoption increases developer productivity and how to measure it. We will also go through AWS Serverless architectures in which you only glue together different Serverless managed services, relying solely on configuration and minimizing the amount of code written.
Geek Sync | Deployment and Management of Complex Azure Environments (IDERA Software)
You can watch the replay of this Geek Sync webinar in the IDERA Resource Center: http://ow.ly/pg7N50A4svf.
Today's data management professionals are finding their landscape changing. They have multiple database platforms to manage, multi-OS environments, and everyone wants everything now.
Join IDERA and Kellyn Pot’Vin-Gorman as she discusses the power of automated deployment in Azure when facing complex environments, along with tips for gaining the knowledge you need at the speed of light. Kellyn will cover scripting basics, advanced Portal features, opportunities to shorten the learning curve, and how multi-platform and multi-tier doesn't have to mean multi-cloud.
Attendees can expect to learn how to build automation scripts efficiently, even with little scripting experience, and how to work with Azure automation deployments. This session will allow you to begin building a repository of multi-platform development scripts to use as needed.
About Kellyn: Kellyn Pot’Vin-Gorman is a member of the Oak Table Network and an IDERA ACE and Oracle ACE Director alumnus. She is the newest Technical Solution Professional in Power BI with AI in the EdTech group at Microsoft. Kellyn is known for her extensive work with multi-database platforms, DevOps, cloud migrations, virtualization, visualizations, scripting, environment optimization tuning, automation, and architecture design. She has spoken at numerous technical conferences for Oracle, Big Data, DevOps, Testing, and SQL Server. Her blog, http://dbakevlar.com, and her social media activity under the handle DBAKevlar are well respected for their insight and content.
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users? (TechWell)
When you're building the next killer mobile app, how can you ensure that your app is both stable and capable of near-instant data updates? The answer: build a backend! Siva Katir says there's much more to building a backend than standing up a SQL server in your datacenter and calling it a day. Since different types of apps demand different backend services, how do you know what sort of backend you need? And, more importantly, how can you ensure that your backend scales so you can survive an explosion of users when you are featured in the app store? Siva discusses the common scenarios facing mobile app developers looking to expand beyond just the device. He'll share best practices learned while building PlayFab's and other companies' backends. Join Siva to learn how you can ensure that your app scales safely and affordably to millions of concurrent users across multiple platforms.
Measure and Increase Developer Productivity with Help of Serverless AWS Commu... (Vadym Kazulkin)
The goal of Serverless is to focus on writing the code that delivers business value and offload everything else to your trusted partners (such as cloud providers or SaaS vendors). You want to iterate quickly, and today's code quickly becomes tomorrow's technical debt. In this talk, we will show why Serverless adoption increases developer productivity and how to measure it. We will also go through AWS Serverless architectures in which you only glue together different Serverless managed services, relying solely on configuration and minimizing the amount of code written.
Delivering Insights from 20M+ Smart Homes with 500M+ Devices (Databricks)
We started out processing big data using AWS S3, EMR clusters, and Athena to serve analytics data extracts to Tableau BI.
However, as our data and team sizes increased, Avro schemas from source data evolved, and we attempted to serve analytics data through web apps, we hit a number of limitations in the AWS EMR and Glue/Athena approach.
This is a story of how we scaled out our data processing and boosted team productivity to meet our current demand for insights from 20M+ Smart Homes and 500M+ devices across the globe, from numerous internal business teams and our 150+ CSP partners.
We will describe lessons learnt and best practices established as we enabled our teams with Databricks autoscaling job clusters and notebooks and migrated our Avro/Parquet data to use MetaStore, SQL Endpoints, and the SQLA Console, while charting the path to Delta Lake…
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks (Databricks)
The cloud has become one of the most attractive ways for enterprises to purchase software, but it requires building products in a very different way from traditional software.
Presentation given at the OpenStack summit in Paris (Kilo) on Tue Nov 4th.
At the last summit, I had the pleasure of presenting a talk that was quite well received, "Are enterprise ready for the OpenStack transformation?" (also published on SlideShare). This talk is a follow-up on the best practices that have proven successful in carrying out that transformation. We will first focus on identifying the right use cases for a generic enterprise, then define a roadmap with an organisational track and a technical track, and finish by defining the success criteria for our group. This will take the form of a workshop summary based on the multiple engagements eNovance has delivered over the past 2 years.
Network Automation Journey, A systems engineer NetOps perspective (Walid Shaari)
Network devices play a crucial role, and not just in the data center: there is Wi-Fi, VoIP, WAN and, more recently, underlays and overlays. Network teams are essential to operations. It's about time we highlighted the importance of network teams to the configuration management community and included them in our discussions. This talk describes a systems engineer's personal experience of kickstarting a network team into automation: most importantly, how and where to start, the challenges faced, and the progress made. The network team in question uses multi-vendor network devices in a large traditional enterprise.
NetDevOps: we do not hear that term as often as we should. Every time we hear about automation or configuration management, it is usually about the application or, failing that, the systems that host the application. What about the network systems and devices that interconnect and protect our services? This talk describes the journey of a systems engineer on an automation assignment with the network management team. Building on lessons learned and challenges faced in systems automation, how can one kickstart an automation project and gain small wins quickly? Where and how should the journey start? What should be avoided? What should be prioritised? How can you overcome the automation engineer's lack of networking skills and the network engineers' lack of automation and Linux/Unix skills? What challenges were faced, and how were they overcome? Which fights should you give up? Where do I, as a systems engineer, see network automation and configuration management going? What is the status quo, and what are the future expectations?
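A typical first small win in network automation is detecting configuration drift: comparing the configuration you intend a device to have with the configuration it is actually running. The sketch below does this with Python's standard `difflib`. Fetching the running configuration from a device is out of scope, and the sample configuration lines used with it are hypothetical.

```python
import difflib


def normalize(config: str) -> list[str]:
    """Drop blank lines and trailing whitespace so cosmetic differences are not reported as drift."""
    return [line.rstrip() for line in config.splitlines() if line.strip()]


def config_drift(intended: str, running: str) -> list[str]:
    """Unified-diff lines describing how the running config deviates from the intended one.

    An empty list means no drift.
    """
    return list(difflib.unified_diff(
        normalize(intended), normalize(running),
        fromfile="intended", tofile="running", lineterm=""))
```

Because the check only reads configurations and reports differences, it carries little risk and can run against a multi-vendor fleet before anyone trusts automation to push changes.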
The BI team at LinkedIn hosted a user group meeting for MicroStrategy customers in the Bay Area. The presentation includes information about LinkedIn, the concepts of a metadata-driven model for business dashboards, and customizations using the SDK, JSP, and jQuery.
Demystifying Data Warehouse as a Service (DWaaS) (Kent Graziano)
This is from the talk I gave at the 30th Anniversary NoCOUG meeting in San Jose, CA.
We all know that data warehouses and best practices for them are changing dramatically today. As organizations build new data warehouses and modernize established ones, they are turning to Data Warehousing as a Service (DWaaS) in hopes of taking advantage of the performance, concurrency, simplicity, and lower cost of a SaaS solution or simply to reduce their data center footprint (and the maintenance that goes with that).
But what is a DWaaS really? How is it different from traditional on-premises data warehousing?
In this talk I will:
• Demystify DWaaS by defining it and its goals
• Discuss the real-world benefits of DWaaS
• Discuss some of the coolest features in a DWaaS solution as exemplified by the Snowflake Elastic Data Warehouse.
Measure and Increase Developer Productivity with Help of Serverless at Server..., by Vadym Kazulkin
The goal of Serverless is to let you focus on writing the code that delivers business value and to offload everything else to your trusted partners, such as cloud providers or SaaS vendors. You want to iterate quickly, and today's code quickly becomes tomorrow's technical debt. In this talk we will show why Serverless adoption increases developer productivity and how to measure it. We will also walk through AWS Serverless architectures in which you only glue together different managed Serverless services, relying solely on configuration and minimizing the amount of code written.
Working Software Over Comprehensive Documentation, by Andrii Dzynia
Each of us has seen this item of the Agile Manifesto dozens of times: on the official Agile Manifesto website, in books or articles, at trainings or conferences. It sounds right, obvious and simple, but putting it into practice raises real difficulties. How do you decide which documents need to be written and which are not worth writing? How do you maintain documents with the least effort? Which documents should be dropped or replaced with simpler solutions? What should a tester, developer or business analyst on an Agile project document in order to present the results of their work? I will try to answer all of these questions in my talk, backing the answers with examples that you can try to apply on your own projects.
Bring-your-ML-Project-into-Production-v2.pdf, by Liang Yan
Machine Learning (ML) is very popular today in the academic and research world. However, it is quite difficult to put ML into a product, especially a product with a huge customer base. This session takes a deep look at Kubeflow, the open source ML toolkit built on top of Kubernetes, from the MLOps perspective. We will also take a brief look at distributed ML systems and at how Kubeflow copes with scalability. Finally, we will demonstrate setting up the Kubeflow stack and deploying an ML project pipeline in the cloud.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23..., by John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ..., by Subhajit Sahu
Abstract: Levelwise PageRank is an alternative method of PageRank computation. It decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes those components in topological order, one level at a time. This allows ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method, where all vertices are processed in each iteration. However, the input graph must contain no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The GPU slowdown is likely caused by submitting a large number of small workloads, and it is expected to be a non-issue when the computation is performed on massive graphs.
Opendatabay - Open Data Marketplace.pptx, by Opendatabay
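To make the levelwise idea concrete, here is a minimal Python sketch under stated assumptions. It is not the report's implementation. It assumes a graph with no dead ends, uses the standard damped update r(v) = (1 - d)/N + d · Σ r(u)/outdeg(u) over in-neighbours u, and finds the strongly connected components with Tarjan's algorithm. Tarjan emits them in reverse topological order, so the list is reversed before processing. When a component is processed, every in-neighbour of its vertices is either inside the component or in an earlier, already-final component. Iterating each component to convergence in order therefore reaches the same fixed point as the monolithic iteration. The sketch handles one component at a time. Components on the same level of the block-graph are independent, and a real implementation could run them concurrently. All function names are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def build(n: int, edges: List[Edge]):
    """Out-degrees, in-neighbour lists and out-adjacency for vertices 0..n-1."""
    out_deg = [0] * n
    in_nbrs: List[List[int]] = [[] for _ in range(n)]
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        out_deg[u] += 1
        in_nbrs[v].append(u)
        adj[u].append(v)
    if any(k == 0 for k in out_deg):
        raise ValueError("this sketch assumes the graph has no dead ends")
    return out_deg, in_nbrs, adj


def sccs_in_topological_order(n: int, adj: List[List[int]]) -> List[List[int]]:
    """Tarjan's algorithm emits SCCs in reverse topological order, so reverse them."""
    counter = 0
    index, low = [-1] * n, [0] * n
    on_stack = [False] * n
    stack: List[int] = []
    comps: List[List[int]] = []

    def visit(v: int) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack[v] = True
        for w in adj[v]:
            if index[w] == -1:
                visit(w)
                low[v] = min(low[v], low[w])
            elif on_stack[w]:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a strongly connected component
            comp = []
            while True:
                w = stack.pop()
                on_stack[w] = False
                comp.append(w)
                if w == v:
                    break
            comps.append(comp)

    for v in range(n):
        if index[v] == -1:
            visit(v)
    return comps[::-1]


def update(v, r, in_nbrs, out_deg, n, d):
    return (1 - d) / n + d * sum(r[u] / out_deg[u] for u in in_nbrs[v])


def pagerank_monolithic(n, edges, d=0.85, tol=1e-10, max_iter=1000):
    out_deg, in_nbrs, _ = build(n, edges)
    r = [1.0 / n] * n
    for _ in range(max_iter):  # every vertex is updated in every iteration
        new = [update(v, r, in_nbrs, out_deg, n, d) for v in range(n)]
        err = max(abs(a - b) for a, b in zip(new, r))
        r = new
        if err < tol:
            break
    return r


def pagerank_levelwise(n, edges, d=0.85, tol=1e-10, max_iter=1000):
    out_deg, in_nbrs, adj = build(n, edges)
    r = [1.0 / n] * n
    for comp in sccs_in_topological_order(n, adj):
        # Ranks of upstream components are already final; iterate only inside comp.
        for _ in range(max_iter):
            new = {v: update(v, r, in_nbrs, out_deg, n, d) for v in comp}
            err = max(abs(new[v] - r[v]) for v in comp)
            for v, x in new.items():
                r[v] = x
            if err < tol:
                break
    return r


edges = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 2), (3, 4), (4, 4)]
print(sccs_in_topological_order(5, build(5, edges)[2]))  # [[1, 0], [3, 2], [4]]
```

The example graph has three components, {0, 1} → {2, 3} → {4}. Running `pagerank_levelwise` and `pagerank_monolithic` on it should give the same ranks up to the convergence tolerance.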
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. The marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices that have already converged can save iteration time. Skipping in-identical vertices (vertices with the same in-links) avoids duplicate computation and thus could reduce iteration time. Road networks often have chains, which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes are easy to calculate. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could reduce the iteration time and the number of iterations, and could also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Adjusting primitives for graph : SHORT REPORT / NOTES, by Subhajit Sahu
Graph algorithms, like PageRank …
Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is …
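The in-identical idea can be sketched briefly. A pull-based PageRank update for a vertex depends only on its in-neighbours, so vertices with the same multiset of in-links receive the same rank in every iteration, provided they start from the same initial rank. The sketch below therefore computes one representative per group and copies the value to the other members. It is a minimal Python illustration, assuming a graph without dangling nodes and a plain power iteration. The function names are hypothetical, and this is not the STICD code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def prepare(n: int, edges: List[Tuple[int, int]]):
    out_deg = [0] * n
    in_nbrs: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        out_deg[u] += 1
        in_nbrs[v].append(u)
    if any(k == 0 for k in out_deg):
        raise ValueError("this sketch assumes no dangling nodes")
    return out_deg, in_nbrs


def pagerank_plain(n, edges, d=0.85, tol=1e-10, max_iter=1000):
    out_deg, in_nbrs = prepare(n, edges)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1 - d) / n + d * sum(r[u] / out_deg[u] for u in in_nbrs[v]) for v in range(n)]
        err = max(abs(a - b) for a, b in zip(new, r))
        r = new
        if err < tol:
            break
    return r


def pagerank_skip_in_identical(n, edges, d=0.85, tol=1e-10, max_iter=1000):
    """Compute one representative per in-identical group and copy its rank to the rest."""
    out_deg, in_nbrs = prepare(n, edges)
    groups: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for v in range(n):
        groups[tuple(sorted(in_nbrs[v]))].append(v)  # same in-link multiset, same group
    reps = [(members[0], members) for members in groups.values()]
    r = [1.0 / n] * n
    for _ in range(max_iter):
        new = list(r)
        err = 0.0
        for rep, members in reps:  # one rank computation per group, not per vertex
            x = (1 - d) / n + d * sum(r[u] / out_deg[u] for u in in_nbrs[rep])
            err = max(err, abs(x - r[rep]))
            for v in members:
                new[v] = x
        r = new
        if err < tol:
            break
    return r, len(reps)


# Vertices 2 and 3 both have in-links {0, 1}, so only 3 of the 4 ranks are computed.
ranks, computed_per_iteration = pagerank_skip_in_identical(
    4, [(0, 2), (0, 3), (1, 2), (1, 3), (2, 0), (3, 1)]
)
print(computed_per_iteration)  # 3
```

The saving grows with the number of duplicated in-link sets. Building the groups costs one pass over the in-lists, which a real implementation would amortize across iterations or graph updates.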
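As a concrete illustration of CSR (the layout is standard, but this code is not from the report), the graph is stored as two flat arrays. `offsets[u]` to `offsets[u + 1]` delimits vertex `u`'s slice of `targets`. The sketch below builds both arrays from an edge list with a counting pass and a prefix sum. The function names are hypothetical.

```python
from typing import List, Tuple


def to_csr(n: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build CSR arrays; the neighbours of u are targets[offsets[u]:offsets[u + 1]]."""
    offsets = [0] * (n + 1)
    for u, _ in edges:  # count the out-degree of each source vertex
        offsets[u + 1] += 1
    for i in range(n):  # prefix sum turns counts into start positions
        offsets[i + 1] += offsets[i]
    targets = [0] * len(edges)
    cursor = offsets[:n]  # next free slot per vertex (slicing makes a copy)
    for u, v in edges:
        targets[cursor[u]] = v
        cursor[u] += 1
    return offsets, targets


def neighbours(offsets: List[int], targets: List[int], u: int) -> List[int]:
    return targets[offsets[u]:offsets[u + 1]]


offsets, targets = to_csr(3, [(0, 1), (0, 2), (2, 0), (1, 2)])
print(offsets, targets)  # [0, 2, 3, 4] [1, 2, 2, 0]
```

Because each vertex's neighbours are contiguous, a map or reduce over a vertex's edges reads one slice of memory. This is what makes CSR a common input format for OpenMP and CUDA graph kernels.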
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
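The "float vs bfloat16 as the storage type" experiment can be illustrated without a GPU. bfloat16 keeps float32's sign bit and 8-bit exponent but only the top 7 mantissa bits. Storing a vector in it therefore changes the stored values, even when the sum itself is accumulated in higher precision. The sketch below emulates this in Python under two simplifying assumptions. First, conversion truncates the lower 16 bits of the float32 encoding, whereas hardware conversions usually round to nearest even. Second, accumulation is done in Python's float64. It is not the report's benchmark, and the function names are hypothetical.

```python
import struct
from typing import Callable, List


def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest float32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 storage by keeping the top 16 bits of the float32 encoding.

    This truncates (rounds toward zero) as a simplification.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


def stored_sum(values: List[float], store: Callable[[float], float]) -> float:
    """Convert each value to the storage type, then accumulate in float64."""
    total = 0.0
    for v in values:
        total += store(v)
    return total


values = [0.1] * 1000
print(stored_sum(values, to_float32))   # roughly 100.0000015
print(stored_sum(values, to_bfloat16))  # 99.609375: 0.1 is stored as 0.099609375
```

Halving the storage halves the memory traffic of a bandwidth-bound reduction. As the example shows, the price is a relative error per element on the order of 2^-8 to 2^-7.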
The effect of service quality and online reviews on customer loyalty in the E...
Building your bi system-HadoopCon Taiwan 2015
1. BUILD YOUR BI SYSTEM
PRACTICE IN DATA LAKE ECOSYSTEM
Bryan@Vpon Data
2. • Experience
Vpon Data Engineer
TWM, Keywear, Nielsen
• Bryan’s notes for data analysis
http://bryannotes.blogspot.tw
• Spark.TW
• LinkedIn
https://tw.linkedin.com/pub/bryan-yang/7b/763/a79
ABOUT ME
13. 3 KINDS OF PROBLEMS
https://kavyamuthanna.wordpress.com/category/big-data/
14. BIG DATA BIG PROBLEM
http://www.mn.uio.no/ifi/studier/masteroppgaver/nd/masteroppgave_cloud_bigdata_hpc.html
15. BIG DATA BIG COST
• The cost of data storage
What data should be kept?
For how long?
• The cost of data management
Are the machines and infrastructure easy to maintain?
Data flow (ETL)?
• The time cost of data processing
How long can users wait?
Accessibility of the data
Human costs you cannot see
23. Overview
Business intelligence (BI) is the set of techniques and tools for
the transformation of raw data into meaningful and useful
information for business analysis purposes. —Wikipedia
24. DIFFERENT FEATURES
                Price        Performance   Accessibility
Hadoop          Low          Medium        Low
SQL Server      Low-Medium   Depends on    Medium
Data Warehouse  High         High          Medium
BI System       High         Depends on    High
29. HIVE
• Created at Facebook
• Data warehouse in the Hadoop ecosystem
• HiveQL (SQL-like interface)
• Metastore (stores the schema of the data;
schema on read)
• UDFs (user-defined functions)
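"Schema on read" means the raw files stay exactly as they were written, and a table definition (in Hive's case, kept in the Metastore) is applied only when a query reads them. The sketch below imitates that idea in plain Python; it is not Hive code. The log layout and schema names are hypothetical. Turning an unparsable field into `None` is an assumption meant to mirror Hive returning NULL for fields it cannot convert.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Raw text exactly as it might land in the lake (hypothetical sample data).
RAW_LOG = "2015-09-01,u1,3\n2015-09-01,u2,oops\n2015-09-02,u1,5\n"

Schema = List[Tuple[str, Callable[[str], object]]]


def safe_cast(cast: Callable[[str], object], value: str) -> Optional[object]:
    try:
        return cast(value)
    except ValueError:
        return None  # an unparsable field becomes NULL instead of failing the read


def read_with_schema(raw: str, schema: Schema) -> Iterator[Dict[str, object]]:
    """Apply the schema while reading; the stored text itself is never rewritten."""
    for row in csv.reader(io.StringIO(raw)):
        yield {name: safe_cast(cast, value) for (name, cast), value in zip(schema, row)}


# Two table definitions over the same stored bytes.
clicks_as_int: Schema = [("dt", str), ("user_id", str), ("clicks", int)]
clicks_as_text: Schema = [("dt", str), ("user_id", str), ("clicks", str)]

print([r["clicks"] for r in read_with_schema(RAW_LOG, clicks_as_int)])   # [3, None, 5]
print([r["clicks"] for r in read_with_schema(RAW_LOG, clicks_as_text)])  # ['3', 'oops', '5']
```

Loading is cheap because nothing is validated or converted up front. The cost moves to query time, where every read repeats the parsing.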
33. TERADATA
• Massively Parallel Processing
• Each processor handles different threads
of the program, and each processor
has its own disk
• Teradata SQL is fully SQL-92
certified
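The MPP idea on this slide can be sketched as follows. Rows are hash-distributed by a key so that each processing unit owns its own slice of the data. Every unit aggregates only its slice, and the partial results are merged at the end. This is loosely modelled on how MPP databases distribute rows by a hashed key. It does not reproduce Teradata's actual hashing or AMP architecture. The table layout, unit count and function names are hypothetical. Threads stand in for processors, and because of Python's GIL they show the structure rather than a real speed-up.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Row = Tuple[str, int]  # (user_id, bytes_served): hypothetical table


def owner(key: str, units: int) -> int:
    """Deterministically map a row's key to one processing unit."""
    return zlib.crc32(key.encode("utf-8")) % units


def distribute(rows: List[Row], units: int) -> List[List[Row]]:
    """Give each unit its own slice of the table, like a disk it does not share."""
    slices: List[List[Row]] = [[] for _ in range(units)]
    for row in rows:
        slices[owner(row[0], units)].append(row)
    return slices


def local_aggregate(slice_rows: List[Row]) -> Counter:
    """Run on one unit, touching only that unit's slice."""
    totals: Counter = Counter()
    for key, value in slice_rows:
        totals[key] += value
    return totals


def parallel_sum_by_key(rows: List[Row], units: int = 4) -> Counter:
    slices = distribute(rows, units)
    with ThreadPoolExecutor(max_workers=units) as pool:
        partials = list(pool.map(local_aggregate, slices))
    # Equal keys always land on the same unit, so merging is a simple union.
    result: Counter = Counter()
    for partial in partials:
        result.update(partial)
    return result


rows = [("u1", 10), ("u2", 5), ("u1", 7), ("u3", 1)]
print(dict(parallel_sum_by_key(rows)))
```

Because the distribution key matches the grouping key, no rows have to move between units during the aggregation.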
55. HOW TO CHOOSE THE
COMPONENTS IN YOUR BI
FRAMEWORK?
• The cost of data storage
• The cost of data management
• The time cost of data processing
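One way to weigh the three costs above against each other is to put them in a common monthly unit. The sketch below does this under loose assumptions. Storage cost is taken as volume times price. Management cost is a single lump sum. Time cost is query turnaround times query volume times a price per waiting minute. The class, the field names and every number in the example are made-up placeholders, not figures from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Option:
    """One candidate component; every number is a placeholder supplied by the reader."""
    name: str
    stored_tb: float                # data kept on this component
    storage_cost_per_tb: float      # monthly cost per TB
    management_cost: float          # monthly ops, maintenance and ETL upkeep
    minutes_per_query: float        # typical turnaround for one query
    queries_per_month: int
    cost_per_waiting_minute: float  # value of one minute of user waiting time

    def storage(self) -> float:
        return self.stored_tb * self.storage_cost_per_tb

    def waiting(self) -> float:
        return self.minutes_per_query * self.queries_per_month * self.cost_per_waiting_minute

    def monthly_total(self) -> float:
        return self.storage() + self.management_cost + self.waiting()


def cheapest(options: List[Option]) -> Option:
    return min(options, key=lambda o: o.monthly_total())


# Made-up figures for illustration only.
a = Option("option_a", 100, 10, 3000, 10, 1000, 1.0)
b = Option("option_b", 20, 200, 2000, 0.5, 1000, 1.0)
print(a.monthly_total(), b.monthly_total())  # 14000.0 6500.0
print(cheapest([a, b]).name)                 # option_b
```

The model is deliberately crude. Its point is that a component with expensive storage can still win once waiting time is priced in, and vice versa.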
56. CONSIDERATIONS AND
SUGGESTIONS
• Time is money
• Trade HDD space / money for time
• Understand the components and
their relationships
• Balance needs and costs
• A good framework will help the business grow
Big data brings problems in three ways:
Variety: many data types, data sources and databases
Volume: log data, transaction data, crawler data
Velocity: real time, near real time, batch
Vpon is a big data advertising company. We receive and produce a large amount of data every day.
Business requirements reflect how much processing time users can afford to wait.
We receive many ad hoc queries every day.
Queries come from every department, such as business development, sales, account services,
R&D and so on.
For example: how many users per day, how many requests per day, the click rate, and so on.
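In practice, ad hoc queries like these would be written in HiveQL or Teradata SQL against the warehouse. The Python sketch below only shows the aggregation logic behind them: distinct users per day, requests per day and click rate. The slides do not describe Vpon's log fields, so the `Event` record is hypothetical. Defining click rate as clicks per request is also an assumption, since an ad team may divide by impressions instead.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Event:
    """Hypothetical log record."""
    day: str
    user_id: str
    kind: str  # assumed to be "request" or "click"


def daily_metrics(events: Iterable[Event]) -> Dict[str, dict]:
    users = defaultdict(set)
    requests = defaultdict(int)
    clicks = defaultdict(int)
    for e in events:
        users[e.day].add(e.user_id)  # distinct users per day
        if e.kind == "request":
            requests[e.day] += 1
        elif e.kind == "click":
            clicks[e.day] += 1
    report = {}
    for day in sorted(users):
        req = requests[day]
        report[day] = {
            "users": len(users[day]),
            "requests": req,
            # Assumed definition: clicks per request (impressions may be used instead).
            "click_rate": clicks[day] / req if req else 0.0,
        }
    return report


events = [
    Event("2015-09-01", "u1", "request"),
    Event("2015-09-01", "u1", "click"),
    Event("2015-09-01", "u2", "request"),
    Event("2015-09-02", "u3", "request"),
]
print(daily_metrics(events)["2015-09-01"])  # {'users': 2, 'requests': 2, 'click_rate': 0.5}
```

When the same handful of questions keeps coming back, precomputing such a daily table trades storage for waiting time, in line with "Time is money".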