Why big data
What is big data
When big data is big data
Big data information system layers
Hadoop ecosystem
What is machine learning
Why machine learning with big data
2. Giza At A Glance
• We are a system integrator
• 43 years in the market
• Work in 25 countries
• 4 regions of operation
• Enterprise Business Solutions
• SCADA
• Transmission & Distribution
• Transportation Infrastructure
• Field Solutions
• Smart Buildings
3. Contents
• Introduction
• When Data is “Big”
• Big Data Information System Layers
Data Platform
Data Science & Advanced Analytics
Information Presentation
Actionable Insights
• Machine Intelligence
5. The Digital Universe
• 2014 EMC & IDC Digital Universe report
• A study to analyze and forecast the amount of data produced annually
• It is the universe of digital data
• Like the physical universe, it:
  - Expands fast
  - Includes stars
  - Includes dark matter
  - Is about everything
6. Digital Universe Expands Fast
• Digital data doubles every two years
• Expected to reach 44 ZB by 2020 (44 trillion GB)
  - 1 ZB = 10^3 EB = 10^6 PB = 10^9 TB
• Every second, ~205,000 new GB
• During this presentation, ~550 million new GB
• Less than 25% of recorded data is tagged
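The growth figures above can be sanity-checked with a short calculation. A minimal sketch, assuming the report's 2013 baseline of 4.4 ZB (an assumption here) and doubling every two years:

```python
# Sketch: projecting digital-universe size under a "doubles every two years"
# growth model. The 4.4 ZB / 2013 baseline is an assumption for illustration.
def projected_size_zb(base_zb: float, base_year: int, year: int) -> float:
    """Size in zettabytes, doubling every two years from the baseline."""
    return base_zb * 2 ** ((year - base_year) / 2)

print(projected_size_zb(4.4, 2013, 2020))  # ~49.8 ZB, in line with the ~44 ZB forecast
```

The simple doubling model slightly overshoots the report's 44 ZB figure, which is expected for such a rough extrapolation.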
7. Telecommunication Revolution
• Smart phones full of sensors
• Smart phone cameras
• High-speed networks
• Mobile penetration
• Multiple devices per customer
• Huge amounts of data transferred
• Communication control data
8. Social Networks
• YouTube statistics:
  - 1,300,000,000 users
  - 300 hours of video uploaded per minute
  - 30 million visitors per day
9. Internet of Things: Smart Cities
• Metering
• Smart homes
• Smart buildings
• Smart parking
• Street lighting
• Traffic monitoring
• And others
10. Internet of Things: Smart Farming
• Weather measuring
• Air sensors
• Water sensors
• Water leakage sensors
• Soil monitoring
• Irrigation monitoring and control
• Harvesting machines tracking and monitoring
• Farm animals tracking and monitoring
• And others
11. Internet of Things: Industrial
• Aircraft sensors gather ~1 TB per flight
• Jet engines produce ~25 MB per flight hour per engine
• Think about power plants, oil plants, water plants, etc.
13. Definitions
• Gartner, the origin of the "3Vs" of Big Data, defines Big Data as: high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
• IDC defines Big Data technologies as: a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis.
15. Data Variety
• Structured, semi-structured, and unstructured data
• Semi-structured:
  - Log files
  - Manually edited Excel files
  - Others
• Unstructured:
  - Chat conversations
  - Emails
  - Images & videos
  - Others
• Most of this data already belongs to organizations, but it sits there unused; that is why Gartner calls it "dark data"
16. Data Velocity
• The speed at which data is:
  - Created
  - Stored
  - Analyzed
• In Big Data systems, data is created in real time or near real time
17. Data Volume
• 90% of all data ever created was created in the past 2 years
• The estimated amount of data doubles every two years
• The era of a trillion sensors is upon us
21. Hadoop Distributed File System (HDFS)
• Open source project
• Java-based file system
• Scales up to 200 PB
• Up to 4,500 servers in a single cluster
• Close to a billion files and blocks
• Concurrent access through YARN
22. Map-Reduce Algorithm
• A framework for processing problems in parallel
• Uses multiple computing cluster nodes
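The programming model above can be sketched in plain Python with word count, the canonical map-reduce example. The map, shuffle, and reduce steps below are a single-machine simulation of what Hadoop would distribute across cluster nodes:

```python
# Sketch: the map-reduce pattern in plain Python (word count).
# On Hadoop, the map and reduce functions would run in parallel on
# different nodes; here we simulate the phases sequentially.
from collections import defaultdict

def map_phase(document: str):
    """Emit (word, 1) pairs for every word in the document."""
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Sum the counts emitted for one word."""
    return key, sum(values)

docs = ["big data is big", "data velocity and data volume"]
pairs = [p for d in docs for p in map_phase(d)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 3, 'is': 1, 'velocity': 1, 'and': 1, 'volume': 1}
```

The same map and reduce functions, unchanged, are what the framework fans out over cluster nodes; the shuffle is where the framework does the heavy lifting.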
23. Apache HBase
• Open source project
• Non-relational database
• Column-oriented key-value data store
• Part of the Hadoop project
• Can serve as input & output of map-reduce jobs in Hadoop
• Data access through a Java API
24. Apache Phoenix
• Open source
• Part of the Apache Hadoop project
• Based on Apache HBase
• Provides JDBC and ODBC drivers for HBase
25. Hadoop Distributions
• Top Known:-
- Cloudera
- MapR
- Hortonworks
- IBM
- Pivotal HD
- Intel distribution
• Cloud based:-
- Azure HDInsight
- Amazon Elastic MapReduce
27. Massively Parallel Processing (MPP) Data Warehouse Architecture
• Shared-nothing architecture, no single point of failure
• Scales horizontally by adding nodes
• Breaks large queries across nodes for parallel processing
• Higher data ingestion rates through parallelized data movement
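A minimal sketch of the query-splitting idea above: each node computes a partial aggregate over its own data shard (shared-nothing), and a coordinator combines the partials. The shard contents are made up, and a thread pool stands in for the cluster nodes:

```python
# Sketch: how an MPP warehouse parallelizes an aggregate query.
# Each "node" sums only its own shard; the coordinator merges partials.
from concurrent.futures import ThreadPoolExecutor

shards = [[3, 5, 8], [1, 9], [4, 4, 6]]  # illustrative per-node data

def node_partial_sum(shard):
    """Runs on each node independently over local data."""
    return sum(shard)

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(node_partial_sum, shards))  # [16, 10, 14]

total = sum(partials)  # coordinator combines the partial results
print(total)  # 40
```

The same split-then-merge shape works for any decomposable aggregate (SUM, COUNT, MIN, MAX); AVG needs both a partial sum and a partial count per node.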
28. MPP Database Examples
• Teradata
• Netezza
• Vertica
• Greenplum
• Microsoft PDW (Parallel Data Warehouse)
• DB2 UDB with the Database Partitioning Feature (DPF)
31. Types of Data Analytics
• Descriptive
• Diagnostic
• Predictive
• Prescriptive
32. Descriptive Analytics
• What happened?
  - Which KPIs?
  - Which time frame?
  - Which filters?
  - What chart type?
  - How to remove noise?
33. Diagnostic Analytics
• Why did it happen?
  - Why is this KPI low?
  - What are the factors behind the KPI?
  - Which factors to use for comparison?
  - How to compare while changing a single factor and holding the others fixed?
37. Data Mining
• Data mining is the computing process of discovering patterns in large data sets.
• Cross-Industry Standard Process for Data Mining (CRISP-DM):-
  - Business understanding
  - Data understanding
  - Data preparation
  - Modeling
  - Evaluation
  - Deployment
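As a toy illustration of the pattern discovery described above, the sketch below counts frequent item pairs across transactions, the core idea behind market-basket analysis. The transactions and support threshold are made up for illustration:

```python
# Sketch: frequent-pair mining over a handful of transactions.
# A pair is "frequent" if it appears in at least min_support transactions.
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]

pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):  # canonical order for each pair
        pair_counts[pair] += 1

min_support = 2
frequent = {pair: n for pair, n in pair_counts.items() if n >= min_support}
print(frequent)  # all three pairs appear twice
```

Real algorithms such as Apriori or FP-Growth do the same counting, but prune the candidate space so it scales to millions of transactions.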
42. Reporting / Dashboards
• Reporting
  - Richly formatted and interactive reports
  - Reports with or without parameters
  - Scheduling capabilities
• Dashboards
  - Publishing web-based / mobile reports
  - Interactive display for KPI comparisons against targets
  - Integration with operational applications and/or event processing engines
43. Alerts
• Alerts on business intelligence and analytics content via:
  - Emails
  - SMS
  - Customized receivers (e.g., a custom web service)
44. Geospatial and Location Intelligence
• Combining geographical and location-related data from sources including:-
  - Aerial maps
  - GISs
  - Consumer demographics
• Displaying relationships by overlaying data on interactive maps
45. Mobile Information Presentation
• Develop and deliver content to mobile devices
• Publishing mode and/or interactive mode
• Takes advantage of mobile devices' native capabilities, e.g.:-
  - Touch screens
  - Camera
  - Location awareness
  - Natural-language query
47. Linking Insights to Actions
• Forrester reports that 74% of firms want to be "data driven"
• But only 29% successfully connect analytics to action
• Actionable insights are the missing link
48. Attributes of Actionable Insights
• Aligned with your business goals
• Insight results have context
• Relevance: insights delivered to the right person, at the right time, in the right setting
• Insights are specific
• Novel insights have an advantage over familiar ones
• Clarity of the insight
51. Why Machine Learning for Big Data Analytics
• Dark data makes up more than 90% of the digital universe
• That is far too much data, in too many formats and from too many sources, to handle in a conventional way
• Analysis of unstructured data like images, videos, and sound files is usually done with machine learning algorithms
• More data yields better training results
52. Artificial Neural Networks (ANN)
• Computing systems inspired by biological neural networks
• Based on a collection of artificial "neurons" connected by "synaptic connections"
• Synaptic connections have weights that control transmitted signal strength
• Neurons may have thresholds that control aggregated signal transmission
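The weighted-sum-and-threshold behavior described above can be sketched as a single artificial neuron. The weights and threshold below are illustrative, chosen so the neuron computes a logical AND of two binary inputs:

```python
# Sketch: one artificial neuron with weighted synaptic inputs and a
# step-threshold activation, as described on the slide.
def neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of inputs exceeds the threshold."""
    aggregated = sum(x * w for x, w in zip(inputs, weights))
    return 1 if aggregated > threshold else 0

# With these weights, the neuron only fires when both inputs are 1 (AND):
weights, threshold = [0.6, 0.6], 1.0
print(neuron([1, 1], weights, threshold))  # 1
print(neuron([1, 0], weights, threshold))  # 0
```

Training a network amounts to adjusting these weights (and thresholds) so the neurons produce the desired outputs.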
53. Deep Neural Networks (DNN)
• An ANN with multiple hidden layers between the input and output layers
• The extra layers enable composition of features from lower layers
• The applied technology for tagging huge amounts of dark data: images, videos, speech, music, etc.
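A minimal sketch of a forward pass through such a layered network, with one hidden layer between input and output. The weights are illustrative, not trained:

```python
# Sketch: forward pass through a tiny network. Each fully connected
# layer computes a weighted sum per neuron, then a sigmoid activation;
# stacking layers lets higher layers compose features from lower ones.
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def layer(inputs, weight_rows):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)))
            for row in weight_rows]

x = [0.5, -1.0]                                # input layer (2 features)
hidden = layer(x, [[1.0, 0.5], [-0.5, 1.0]])   # hidden layer: 2 neurons
output = layer(hidden, [[1.0, -1.0]])          # output layer: 1 neuron
print(output)
```

A deep network is this same pattern repeated over many hidden layers, with the weights learned by backpropagation rather than written by hand.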
54. Graphics Processing Units (GPU)
• Rapidly create images in frame buffers for output to a display device
• General-Purpose GPU (GPGPU): a stream processor or vector processor running compute kernels
• Suitable for deep neural network training
• Throughput several orders of magnitude higher than a CPU for such workloads
• GPU clusters
• Cloud-based GPUs (IaaS)
55. Combining HDFS with GPUs
• Conventional large-scale distributed deep learning on Hadoop clusters