Approaches and Open Source Tools for Wrangling and Modeling Massive Datasets (Sarah Aerni)
Text Analytics at Scale on MPP (Srivatsan Ramanujam)
A Scalable Framework For Real Time Monitoring & Prediction Of Sensor Data (Jarrod Vawdrey)
Transforming Data to Unlock Its Latent Value - Tony Ojeda
At the heart of data analysis, there lies a need to understand the real world entities being represented in the data. Every data set we encounter is an attempt to capture a slice of our complex world and communicate some information about it in a way that has potential to be informative to humans, machines, or both. Moving from basic analyses to advanced analytics requires the ability to imagine multiple ways of conceptualizing the composition of entities and the relationships present in our data. It also requires the realization that different levels of aggregation, disaggregation, and transformation can open up new pathways to understanding our data and identifying the valuable insights it contains.
In this talk, we’ll discuss several ways to think about the composition and representation of our data. We’ll also demonstrate a series of methods that leverage tools like networks, hierarchical aggregations, and unsupervised clustering to visually explore our data, transform it to discover new insights, help frame analytical problems and questions, and even improve machine learning model performance. In exploring these approaches, and with the help of Python libraries such as Pandas, Scikit-Learn, Seaborn, and Yellowbrick, we will provide a practical framework for thinking creatively and visually about your data and unlocking latent value and insights hidden deep beneath its surface.
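The unsupervised clustering mentioned above can be illustrated with a small sketch. This is a minimal, dependency-free 2-D k-means on hypothetical synthetic data, standing in for what `sklearn.cluster.KMeans` would do in the talk's workflow; all names and data here are illustrative, not from the talk itself.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal 2-D k-means, a dependency-free stand-in for
    sklearn.cluster.KMeans, used only to illustrate the idea."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        for i, (x, y) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
        # update step: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

# two well-separated synthetic blobs; clustering recovers the grouping
# without ever seeing labels
rng = random.Random(42)
blob_a = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(50)]
blob_b = [(rng.gauss(5, 0.1), rng.gauss(5, 0.1)) for _ in range(50)]
centroids, labels = kmeans(blob_a + blob_b, k=2, seed=1)
```

Because the two blobs are far apart relative to their spread, the discovered cluster labels coincide with the hidden grouping, which is exactly the kind of structure-discovery step the talk describes.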
Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal - Srivatsan Ramanujam
These slides give an overview of the technology and the tools used by Data Scientists at Pivotal Data Labs. This includes Procedural Languages like PL/Python, PL/R, PL/Java, PL/Perl and the parallel, in-database machine learning library MADlib. The slides also highlight the power and flexibility of the Pivotal platform from embracing open source libraries in Python, R or Java to using new computing paradigms such as Spark on Pivotal HD.
Building a Data Ingestion & Processing Pipeline with Spark & Airflow - Tom Lous
Why we built a data ingestion & processing pipeline with Spark & Airflow @Datlinq, and all the parts needed to bring it together in our big data system.
Presented 2017-02-10 at Data Driven Rijnmond Meetup:
https://www.meetup.com/nl-NL/Data-Driven-Rijnmond/events/236256531/
Marlabs Capabilities Overview: DWBI, Analytics and Big Data Services - Marlabs
Marlabs’ Business Intelligence and Analytics practice can support customers’ needs throughout the information management lifecycle. As a vendor-agnostic and holistic service provider with expertise in a range of tools and technologies, we can help clients make informed decisions to employ the right technologies that align with their business needs.
Data Science Gravity is pulling us down! Watch this modern presentation, inspired by Star Trek and the movie Gravity, to understand how to solve your big data science problems.
Analyttica, established in early 2012, is a team of highly experienced professionals intensely focused on information management and analytics across industries and functions.
Analyttica’s goal is to enable businesses to make fact-based decisions with a blend of art and science through its proprietary solutions, to train its partners in those skills through its knowledge immersion platforms, and/or to build appropriate products and tools.
Mission:
Enable innovative analytical decisioning and learning, with solutions to create sustained business impact across the customer lifecycle.
Process mining provides new ways to utilize the abundance of event data in our society. This emerging scientific discipline can be viewed as a bridge between data science and process science: it is both data-driven and process-centric. Process mining provides a novel set of tools to discover the real processes, to detect deviations from normative processes, and to analyze bottlenecks and waste. The Internet of Events (IoE) not only includes classical sources of information like webpages, information systems, and social media, but also incorporates the Internet of Things (IoT), wearables, mobile devices, and Industry 4.0. Analogous to spreadsheets, process mining provides a generic, domain-independent technology (starting from events rather than numbers). In his talk, Wil van der Aalst will argue that process mining should be an integral part of the toolbox of tomorrow's data scientist. He will introduce basic concepts and elaborate on his collaboration with industry. His research group at TU/e has applied process mining in over 150 organizations, developed the open-source tool ProM, and influenced the 25+ commercial process mining tools available today.
Predictive Analytics: Context and Use Cases
Historical context for successful implementation of predictive analytic techniques and examples of implementation of successful use cases.
This presentation introduces big data and explains how to generate actionable insights using analytics techniques. The deck explains general steps involved in a typical analytics project and provides a brief overview of the most commonly used predictive analytics methods and their business applications.
Vijay Adamapure is a data science enthusiast with extensive experience in data mining, predictive modeling, and machine learning. He has worked on numerous analytics projects ranging from healthcare and business analytics to renewable energy and IoT.
Vijay presented these slides during the Internet of Everything Meetup event 'Predictive Analytics - An Overview' that took place on Jan. 9, 2015 in Mumbai. To join the Meetup group, register here: http://bit.ly/1A7T0A1
Building Highly Scalable Spring Applications using In-Memory Data Grids - John Blum
Slides from Luke Shannon's and my presentation at SpringOne2GX 2015 in Washington, D.C. on Tuesday, September 15th from 10:30 AM to 12:00 PM EDT.
Session details @ https://2015.event.springone2gx.com/schedule/sessions/building_highly_scalable_spring_applications_with_in_memory_distributed_data_grids.html.
Who Does What? Mapping Cloud Foundry Activities and Entitlements to IT Roles - VMware Tanzu
SpringOne Platform 2016
Speaker: Cornelia Davis; Sr. Director of Technology, Pivotal.
While “cf push” is the center of it all, there are many more things that various individuals can do with the Cloud Foundry platform. They can monitor, scale, and upgrade deployed apps, and also deploy, monitor, scale, and upgrade the platform itself. Further, to operationalize the platform in an enterprise, there are quotas, security groups, route services, environment variable groups, and many other “knobs” that may be tuned, and there are various roles and permission structures to govern these. In this session Cornelia will take a holistic view of the Cloud Foundry “control plane”, map the key functions to IT roles (perhaps with some redefinition), and show which entitlements allow which configurations. Ultimately the goal is to understand how Cloud Foundry can be effectively used to optimize development and operations (DevOps) in your organization. Participants will leave with a concrete framework for transforming current IT practices, roles, and responsibilities using the Cloud Foundry platform.
We examine how core features introduced in Spring Framework 5, such as Reactive support, are woven into the Spring Data, Spring Security, and Spring WebFlux projects, as presented at the SpringOne Platform event held in San Francisco in December 2017. We also look at how these features make your systems more responsive and more efficient.
IO State In Distributed API Architecture - Owen Rubel
The API pattern binds IO functionality to business functionality by binding IO state either through annotation (i.e., JAX) or by extending a RestfulController. As a result, the data associated with IO state cannot be shared with other architectural instances because it is bound to the controller. This creates architectural cross-cutting concerns, not only with the functionality but also with the data. By abstracting the functionality, we can create a versioned data object for IO state that can be shared, cached, synced, and reloaded on the fly across all architectural instances without having to restart any instance. This greatly improves the automation, performance, and flow of API applications and architecture.
Implementing a highly scalable stock prediction system with R, Geode, SpringX... - William Markito Oliveira
Finance market prediction has always been one of the hottest topics in data science and machine learning. However, the prediction algorithm is just a small piece of the puzzle. Building a data stream pipeline that constantly combines the latest price info with high-volume historical data is extremely challenging on traditional platforms, requiring a lot of code and thinking about how to scale or move into the cloud. This session will walk through the architecture and implementation details of an application built on top of open-source tools that demonstrates how to easily build a stock prediction solution with no source code - except a few lines of R and the web interface that consumes data through a RESTful endpoint in real time. The solution leverages in-memory data grid technology for high-speed ingestion, combining streaming of real-time data and distributed processing for stock indicator algorithms.
Buckets, Funnels, Mobs and Cats or: How We Learned to Love Scaling Apps To Th... - VMware Tanzu
SpringOne Platform 2018
Buckets, Funnels, Mobs and Cats or: How We Learned to Love Scaling Apps To The Cloud - Joe Szodfridt, Rohit Kelapure, Shaun Anderson
GAINING APPLICATION LIFECYCLE INTELLIGENCE
Applied Spring Track
Today we are facing an ever-increasing speed of product delivery. DevOps practices like continuous integration and deployment increase the dependence of systems like task tracking and source code repositories on build servers and test suites. With data moving rapidly through these different tools, it becomes challenging to maintain a grasp of the process, especially as the data is distributed and in a variety of formats. But it is still critical to maintain full visibility of the product development journey, from user stories to production data. By starting at the beginning of the product development lifecycle, you can track a problem in production all the way back to the code that was checked into the build and the developer responsible for that code.
In this session I'll demonstrate some of the ways in which Splunk software can be used to collect and correlate data throughout the various stages of the lifecycle of your code, to ultimately make you more efficient and make your code better.
Cloud-Native Streaming and Event-Driven Microservices - VMware Tanzu
Marius Bogoevici, Spring Cloud Stream Lead
Join us for an introduction to Spring Cloud Stream, a framework for creating event-driven microservices that builds on the ease of development and execution of Spring Boot, the cloud-native capabilities of Spring Cloud, and the message-driven programming model of Spring Integration. See how Spring Cloud Stream’s abstractions and opinionated primitives allow you to easily build applications that can interchangeably use RabbitMQ, Kafka or Google PubSub without changing the application logic. Finally, we will show how these applications can be orchestrated and deployed on different modern runtimes such as Cloud Foundry, Kubernetes or Mesos using Spring Cloud Data Flow.
Lattice: A Cloud-Native Platform for Your Spring Applications - Matt Stine
As presented at SpringOne2GX 2015 in Washington, DC.
Lattice is a cloud-native application platform that enables you to run your applications in containers like Docker, on your local machine via Vagrant. Lattice includes features like:
Cluster scheduling
HTTP load balancing
Log aggregation
Health management
Lattice does this by packaging a subset of the components found in the Cloud Foundry elastic runtime. The result is an open, single-tenant environment suitable for rapid application development, similar to Kubernetes and Mesos. Applications developed using Lattice should migrate unchanged to full Cloud Foundry deployments.
Lattice can be used by Spring developers to spin up powerful micro-cloud environments on their desktops, and can be useful for developing and testing cloud-native application architectures. Lattice already has deep integration with Spring Cloud and Spring XD, and you’ll have the opportunity to see deep dives into both at this year’s SpringOne 2GX. This session will introduce the basics:
Installing Lattice
Lattice’s Architecture
How Lattice Differs from Cloud Foundry
How to Package and Run Your Spring Apps on Lattice
Machines Can Learn - a Practical Take on Machine Intelligence Using Spring Cl... - Christian Tzolov
https://springoneplatform.io/2018/sessions/machines-can-learn-a-practical-take-on-machine-intelligence-using-spring-cloud-data-flow-and-tensorflow
Machine learning (ML) has brought unprecedented abilities to the software engineering field. ML allows you to reason about and solve otherwise "un-programmable" tasks such as computer vision and language processing. If you're a Java developer interested in leveraging ML to deliver richer business insights to your customers, in this talk you'll learn what it takes to build cloud-native applications that perform data-driven machine intelligence operations. This coding-centric talk walks through the different facets of iterative development and testing using Spring Cloud Stream, and the orchestration of such applications into coherent data pipelines using Spring Cloud Data Flow. Specifically, we will also review TensorFlow, a popular machine learning toolkit, and how it is integrated into the overall design. This talk will showcase how building complex use cases, such as real-time image recognition or object detection, can be simplified with the help of the Spring ecosystem and TensorFlow. More importantly, I will share findings from the ML space: tips and tricks on what goes into developing such complex solutions.
Software-Defined Security: The New School of Security Designed for DevOps - VMware Tanzu
SpringOne Platform 2019
Session Title: Software-Defined Security: The New School of Security Designed for DevOps
Speaker: Tim Reilly, COO, Zettaset
Youtube: https://youtu.be/3aBgj4DqASE
Sales and marketing teams in enterprises have too many leads to pursue but limited time and budget at their disposal. To build a strong sales pipeline, marketers should target their prospects with the right content to engage their interests and nurture them before handing them off to their sales teams. Prioritizing the right deals for sales teams requires effective strategies for scoring leads, and accurately forecasting opportunities helps them identify issues early and meet their targets. In this talk, we will look under the hood of the machine learning pipelines in Salesforce Einstein that help sales and marketing teams win more deals. Specifically, we'll look at the problem of scoring prospects based on their engagement so that marketers know when they are ready to buy. Next, we will share our journey on model interpretability in providing actionable insights with our predictions. Finally, we will describe how we generate scores and insights for all customers through a model tournament, so that enterprises and small businesses alike can reap the benefits of machine learning.
Climate Data Lake: Empowering Citizen Scientists in Acadia National Park - Srivatsan Ramanujam
Learn how EMC and Pivotal are teaming up to empower citizen scientists @ Acadia National Park to study climate change and its influence on phenology in the park, by building a Climate Data Lake.
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ... - Srivatsan Ramanujam
Unstructured data is everywhere: in the form of posts, status updates, bloglets, or news feeds in social media, or in the form of customer interactions in call center CRM systems. While many organizations study and monitor social media to track brand value and target specific customer segments, in our experience, blending unstructured data with structured data to supplement data science models has been far more effective than working with either independently.
In this talk we will showcase an end-to-end topic and sentiment analysis pipeline we've built on the Pivotal Greenplum Database platform for Twitter feeds from GNIP, using open source tools like MADlib and PL/Python. We've used this pipeline to build regression models that predict commodity futures from tweets, and to enhance churn models for telecom through topic and sentiment analysis of call center transcripts. All of this was possible because of the flexibility and extensibility of the platform we worked with.
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn... - Srivatsan Ramanujam
These are slides from my talk @ DataDay Texas, in Austin on 30 Mar 2013
(http://2013.datadaytexas.com/schedule)
Favorite and Fork PyMADlib on GitHub: https://github.com/gopivotal/pymadlib
MADlib: http://madlib.net
Adjusting primitives for graph : SHORT REPORT / NOTES - Subhajit Sahu
Notes on graph algorithms, like PageRank. Compressed Sparse Row (CSR) is an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
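The storage-type comparison above (float vs bfloat16 for a reduce) comes down to how a low-precision accumulator loses small addends. NumPy has no bfloat16, so this sketch substitutes float16 to show the same mechanism; the numbers and the stall behavior are properties of float16, not results from the report:

```python
import numpy as np

# 100,000 copies of 0.1 stored in half precision (a stand-in for bfloat16,
# which NumPy lacks; the precision-loss mechanism is analogous)
x16 = np.full(100_000, 0.1, dtype=np.float16)

# reference: accumulate in float64 (0.1 is not exactly representable
# in float16, so the true sum is ~9997.6 rather than 10000)
ref = x16.astype(np.float64).sum()

# naive sequential reduction that also keeps the accumulator in float16:
# once the running sum grows to where its ulp exceeds the addend,
# each addition rounds back to the accumulator and the sum stalls
acc = np.float16(0.0)
for v in x16:
    acc = np.float16(acc + v)
```

With these values the float16 accumulator stalls far below the true sum, which is why the choice of storage (and accumulator) type matters for the reduce benchmarks listed above.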
Opendatabay - Open Data Marketplace.pptx - Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
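The levelwise scheme in the abstract can be sketched compactly: decompose the graph into strongly connected components, process them in topological order of the condensation, and treat rank flowing in from already-finalized upstream components as a constant. This is a minimal Python illustration of that idea, not the report's CUDA/C++ implementation; the toy graph, function names, and iteration counts are all illustrative, and the no-dead-ends precondition from the abstract is assumed.

```python
from collections import defaultdict

def scc_topological(nodes, out):
    """Kosaraju's algorithm; returns SCCs in topological order of the condensation."""
    seen, order = set(), []
    def dfs(u):
        seen.add(u)
        for v in out[u]:
            if v not in seen:
                dfs(v)
        order.append(u)
    for u in nodes:
        if u not in seen:
            dfs(u)
    rev = defaultdict(list)
    for u in nodes:
        for v in out[u]:
            rev[v].append(u)
    comps, assigned = [], set()
    def rdfs(u, comp):
        assigned.add(u)
        comp.append(u)
        for v in rev[u]:
            if v not in assigned:
                rdfs(v, comp)
    for u in reversed(order):
        if u not in assigned:
            comps.append([])
            rdfs(u, comps[-1])
    return comps

def pagerank_monolithic(nodes, out, d=0.85, iters=200):
    """Standard power iteration over all vertices (assumes no dead ends)."""
    n = len(nodes)
    inn = defaultdict(list)
    for u in nodes:
        for v in out[u]:
            inn[v].append(u)
    r = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        r = {v: (1 - d) / n + d * sum(r[u] / len(out[u]) for u in inn[v])
             for v in nodes}
    return r

def pagerank_levelwise(nodes, out, d=0.85, iters=200):
    """Process SCCs in topological order; upstream contributions are constants."""
    n = len(nodes)
    inn = defaultdict(list)
    for u in nodes:
        for v in out[u]:
            inn[v].append(u)
    r = {u: 1.0 / n for u in nodes}
    done = set()
    for comp in scc_topological(nodes, out):
        cset = set(comp)
        # rank flowing in from earlier, already-finalized components is fixed
        ext = {v: d * sum(r[u] / len(out[u]) for u in inn[v] if u in done)
               for v in comp}
        for _ in range(iters):  # iterate only within this component
            r.update({v: (1 - d) / n + ext[v]
                      + d * sum(r[u] / len(out[u]) for u in inn[v] if u in cset)
                      for v in comp})
        done |= cset
    return r

# toy graph with two SCCs and no dead ends: {a,b,c} -> {d,e}
out = {'a': ['b'], 'b': ['c'], 'c': ['a', 'd'], 'd': ['e'], 'e': ['d']}
nodes = list(out)
mono = pagerank_monolithic(nodes, out)
level = pagerank_levelwise(nodes, out)
```

Because the condensation is acyclic and rank only flows along edges, finalizing components in topological order reaches the same fixed point as the monolithic iteration, which is the equivalence the report's benchmarks rely on.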