This document discusses big data concepts including what makes data "big", Hadoop, HDFS, MapReduce, and the Hadoop ecosystem. It also covers big data applications such as enhancing the multichannel customer experience, big data revenue opportunities, and the current limitations of big data analytics. Additionally, it discusses privacy and security issues, case studies of data breaches, dimensional modeling concepts, and governance best practices for big data.
Big Data Analytics in Bangladesh (Pridesys IT Ltd.)
Organizations want to use all the data available to them for analytics. But they’ve been thwarted by data silos and top-down, mostly manual approaches to unifying data for analytics. A new approach, based on machine learning combined with human expert sourcing, dramatically speeds analytics’ time-to-value. It automates data unification end-to-end: from finding and connecting diverse data to interactive consumption by virtually anyone using any analytic tool.
Dama Ireland slides - Data Trust event, 9th June 2016 (Ken O'Connor)
Do we need a Data Trust / Data Quality Mark?
Presentation by Data Management Specialist, Ken O'Connor:
Our food packaging provides facts about the food we buy. It's required by law. These facts enable us to make informed decisions about the food we consume. What about when we seek to make informed decisions in our business processes? What do we know about the data we're consuming? How can we trust that the data we depend on is fit for the purpose for which we need it? In this presentation, you will learn:
Your rights and responsibilities as a data consumer and provider;
The questions you should ask about the data you consume;
The facts you should provide about the data you provide;
The need for a "Data Q-Mark" or a "Data Trust-Level".
The presentation was followed by a panel discussion with Ronan Brennan, CTO of Silverfinch (a MoneyMate company). In October 2015, Silverfinch announced it was handling €2.5 trillion of look-through assets for asset manager clients worldwide. Ronan shared the Silverfinch success story, which is built on solid data management practices, with the attendees.
Tamr | MDM and the Data Unification Imperative (Tamr_Inc)
A successful digital information strategy depends on being able to find, connect and consume diverse data sources repeatably and at scale. But top-down, deterministic data unification approaches (such as ETL, ELT and MDM) weren’t designed to scale to the variety of hundreds, thousands or tens of thousands of data silos. A new bottom-up, probabilistic approach to data unification complements MDM by providing the agility and scalability to exploit data variety.
Rapid digitization has resulted in the production of large volumes of unstructured data. This trend is expected to provide significant opportunities for the graph database market in the coming years.
A Dynamic Data Catalog for Autonomy and Self-Service (Denodo)
Watch Dave's presentation on-demand from the Fast Data Strategy Virtual Summit here: https://buff.ly/2Kj7muc
Denodo’s new dynamic catalog is the new black. It combines the power of a data delivery infrastructure with a data catalog, providing contextual information and collective intelligence.
Attend this session to discover:
• What is unique about the Dynamic Data Catalog
• How it empowers a community of analysts and decision makers
• How it facilitates data curation and data stewardship in your organization
A short overview of Business Intelligence: what BI is, how the BI market is growing, which vendors operate in the market today, and future directions.
Real Time Data Processing using Spark Streaming | Data Day Texas 2015 (Cloudera, Inc.)
Speaker: Hari Shreedharan
Data Day Texas 2015
Apache Spark has emerged over the past year as the heir apparent to Hadoop MapReduce. Spark can process data in memory at very high speed, while still being able to spill to disk if required. Spark’s powerful yet flexible API allows users to write complex applications easily without worrying about the internal workings of how the data gets processed on the cluster.
Spark comes with an extremely powerful Streaming API to process data as it is ingested. Spark Streaming integrates with popular data ingest systems such as Apache Flume, Apache Kafka, and Amazon Kinesis, allowing users to process data as it comes in.
In this talk, Hari will discuss the basics of Spark Streaming, its API, and its integration with Flume, Kafka, and Kinesis. Hari will also discuss a real-world example of a Spark Streaming application, and how code can be shared between a Spark application and a Spark Streaming application. Each stage of the application's execution will be presented, which helps illustrate good practices for writing such applications. Hari will finally discuss how to write a custom application and a custom receiver to receive data from other systems.
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat... (Hortonworks)
How do you turn data from many different sources into actionable insights and manufacture those insights into innovative information-based products and services?
Industry leaders are accomplishing this by adding Hadoop as a critical component in their modern data architecture to build a data lake. A data lake collects and stores data across a wide variety of channels including social media, clickstream data, server logs, customer transactions and interactions, videos, and sensor data from equipment in the field. A data lake cost-effectively scales to collect and retain massive amounts of data over time, converting all this data into actionable information that can transform your business.
Join Hortonworks and Informatica as we discuss:
- What is a data lake?
- The modern data architecture for a data lake
- How Hadoop fits into the modern data architecture
- Innovative use-cases for a data lake
Real time Analytics with Apache Kafka and Apache Spark (Rahul Jain)
A presentation-cum-workshop on real-time analytics with Apache Kafka and Apache Spark. Apache Kafka is a distributed publish-subscribe messaging system, while Spark Streaming brings Spark's language-integrated API to stream processing, allowing streaming applications to be written quickly and easily in both Java and Scala. In this workshop we explore Apache Kafka, ZooKeeper, and Spark with a web clickstream example using Spark Streaming. A clickstream is a recording of the parts of the screen a computer user clicks on while web browsing.
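The clickstream idea described above can be illustrated without a cluster. Here is a minimal plain-Python sketch (all data invented for illustration) of the micro-batch model Spark Streaming uses: the stream is chopped into small batches, and clicks per page are counted within each batch.

```python
from collections import Counter

def count_clicks_per_batch(click_events, batch_size=3):
    """Split a stream of (user, page) click events into fixed-size
    micro-batches and count clicks per page within each batch."""
    batches = [click_events[i:i + batch_size]
               for i in range(0, len(click_events), batch_size)]
    return [Counter(page for _user, page in batch) for batch in batches]

clicks = [("u1", "/home"), ("u2", "/home"), ("u1", "/cart"),
          ("u3", "/home"), ("u2", "/checkout"), ("u3", "/cart")]
per_batch = count_clicks_per_batch(clicks)
# First batch counts: /home twice, /cart once.
```

In Spark Streaming the batching is driven by a time interval rather than a fixed count, and the counting runs distributed across the cluster, but the per-batch aggregation pattern is the same.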
The Top Skills That Can Get You Hired in 2017 (LinkedIn)
We analyzed all the recruiting activity on LinkedIn this year and identified the Top Skills employers seek. Starting Oct 24, learn these skills and much more for free during the Week of Learning.
#AlwaysBeLearning https://learning.linkedin.com/week-of-learning
B2B DATA: You Don't Have to Love It, But Don't Ignore It (amdia)
In the digital world, B2B advertisers are painfully aware of the importance of data for reaching their customers and prospects. Yet some of them find it a chore or a burden. In this session, we will explore what B2B advertisers can actually do to get the maximum business value from data about current and potential customers.
We will look at new data sources, how to keep your data up to date, the top 5 uses of data in B2B, and a case study of how a US company applies data to drive sustainable business growth.
Speaker:
Ruth P. Stevens
She is a marketing professor at several business schools in the United States and a frequent contributor to trade publications such as Biznology, Target Marketing Magazine, and AdAge.
She is considered one of the 100 most influential people in business marketing in the United States by Crain's BtoB magazine. Her latest book is "B2B Data-Driven Marketing: Sources, Uses, Results". Stevens has held senior marketing positions at Time Warner, Ziff-Davis, and IBM.
More info: www.ruthstevens.com.
Changing audience expectations mean your marketing message needs to be consistent, in context, personal, and relevant, irrespective of which channel your listener uses. Without that, your customer sales, retention, spend, and lifetime value all suffer. So you market, yet which marketing channel is most effective? To understand and anticipate your audience's needs, you need a real-time, 360-degree customer view. Creating it requires torrents of new structured and unstructured data, in addition to the means to derive insight from it. That's where most organisations falter.
Customer Success sits at the center of a company’s data web. But data challenges exist! It’s time to break through all data roadblocks standing in your way.
Talking about Big Data generates a lot of questions; however, most of the focus is on the technologies and skills required to collect and store this volume of information as opposed to the insight that companies need to derive from it. What factors should organizations consider in order to ensure that they are capitalizing on their investments with these technologies? How do you break through business silos to enable sharing of data to increase organizational value? Leveraging his cross-industry experience at companies like The Walt Disney Company, Travelers Insurance and Demand Media, Brendan Aldrich will discuss the question of “big value” with industry examples and a particular focus on his current work to deploy a “data democracy” within the City Colleges of Chicago.
Session Discovery Topics:
• Big value - keeping an eye on the forest (assumptions, judgment and bias)
• Data democracy - increasing productivity with data transparency and open access
The objective of this module is to gain an overview of how to use the data you already have in order to improve your business.
Upon completion of this module you will:
Gain an understanding of how to take advantage of the data you already have
Understand where internal data already resides within your company
Improve your knowledge of how data can help build your brand
Applying Data Quality Best Practices at Big Data Scale (Precisely)
Global organizations are investing aggressively in data lake infrastructures in the pursuit of new, breakthrough business insights. At the same time, however, 2 out of 3 business executives are not highly confident in the accuracy and reliability of their own Big Data. Regaining that confidence requires utilizing proven data quality tools at Big Data scale.
In this on-demand webinar, discover how to ensure your data lake is a trusted source for advanced business insights that lead to new revenue, cost savings and competitiveness. You will have the opportunity to:
• Compare your organization’s data lake “readiness” against initial findings from our upcoming annual Big Data Trends survey
• Gain insight into where and how to leverage data quality best practices for Big Data use cases
• Explore how a ‘Develop Once, Deploy Anywhere’ approach, including deployment to native Big Data infrastructures such as Hadoop and Spark, facilitates consistent data quality patterns
Data Done Right: Ensuring Information Integrity (Sharala Axryd)
It’s the ultimate “garbage in, garbage out” quandary. Data can be an organization’s most valuable asset — but only to the degree its quality can be validated and trusted. Without the right guidelines, processes, and solutions in place to control the way applications, systems, databases, messages, and documents are managed, "dirty" data can permeate systems across the enterprise, negatively impacting everything from strategic planning to day-to-day decision making. High-quality data drives a company’s success more efficiently because decisions are based on facts rather than habit or intuition.
To gain a better understanding of this topic, this speaking session will examine:
- what data quality and master data management are
- why they are crucial for successful business operations and strategies
- how to improve data quality through organizational, procedural, and technological means
Rplus offers an analytics solution for the retail industry through its cloud-based DemandSense application and big data analytics platform. Retail companies can leverage data to improve the profitability and efficiency of operations at low cost and in a shorter timeframe.
The business models across industries around the world are becoming customer-centric. Recent studies show that “knowing” customers based on internal as well as external data is one of the top priorities of business leaders. Various surveys also reveal that customers do not mind sharing their semi-personal data in exchange for differentiated service.

In that context, the 360-degree view of the customer, once thought to be a business process, master data management, data integration, and data warehouse / business intelligence problem, has now entered the whole new world of big data, including integration with unstructured data sources. The impact of big data on customer master data management spans from the integration and linkage of unstructured or semi-structured data with the structured master data maintained within the enterprise, to the analysis and visualization of that data to generate useful insights about customers. There are various patterns for handling the challenges across these steps: acquire, link, manage, analyze, and distribute the enhanced customer data for differentiated products or services.
How to Monetize Your Data Assets and Gain a Competitive Advantage (CCG)
Join us for this session where Doug Laney will share insights from his best-selling book, Infonomics, about how organizations can actually treat information as an enterprise asset.
Improve your business with your own business data (Data-Set)
The objective of this module is to gain an overview of how to use the data you already have in order to improve your business.
Upon completion of this module you will:
-Gain an understanding of how to take advantage of the data you already have
-Understand where internal data already resides within your company
-Improve your knowledge of how data can help build your brand
15. Hadoop Ecosystem
(MIS 6309 Business Data Warehousing, Fall 2014 - Group 6)
16. Big Data Landscape
17. What's in store for us?
• More jobs
• More opportunities
• More money!
18. Big Data Landscape
19. Big Data Landscape
20. Big Data Landscape
21. Sectors Using Big Data
Enhancing the multichannel consumer experience:
• Use big data to integrate promotions and pricing for shoppers seamlessly, whether those consumers are online, in-store, or perusing a catalog.
• Integrate customer databases with household information such as income, housing values, and number of children, and thus create different versions of catalogs, etc., attuned to the behavior and preferences of different groups of customers.
24. Current Limitations for Big Data Analytics
• Meeting the need for speed
• Understanding the data
• Addressing data quality
• Displaying meaningful results
• Big data skills are in short supply
25. Problems & Threats - Big Data
• Privacy breaches and embarrassments
• Anonymization could become impossible
• Data masking could be defeated to reveal personal information
• Unethical actions based on interpretations
• Big data analytics are not 100% accurate
• Discrimination
• Few (if any) legal protections exist for the individuals involved
• Big data will probably exist forever
• Concerns for e-discovery
• Making patents and copyrights irrelevant
26. Case Studies - Recent Data Breaches
• Target breach, in which 40 million credit and debit accounts were compromised over a three-week period; Target lost $148 million.
• JP Morgan reported that 76 million households and 8 million small businesses were exposed in a data breach.
• Customer names, addresses, phone numbers, and e-mail addresses were taken.
• Hackers also obtained internal data identifying customers by category, such as whether they are clients of the private-bank, mortgage, auto, or credit-card divisions, said a person briefed on the matter.
• Third-party / external data in the news: banks turn to Facebook and Twitter to keep track of education-loan takers.
27. Thinking Dimensionally

Sentiment_Analysis table (fact):
• Sentiment_ID (e.g. 1, 2, 3)
• Sentiment_description (e.g. "Wow", "Awesome", "Crap")
• Customer_ID
• Product_ID

Dim_Customer (dimension):
• Customer_ID
• Customer_Name
• Gender
• Age

Dim_Product (dimension):
• Product_ID
• Product_Name
• Category
• Product_Description

Data - big or small
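The star schema above can be exercised with a small join. The following Python sketch (sample rows and names invented for illustration) resolves the fact table's dimension IDs against the two dimension tables:

```python
# Fact table: one row per sentiment event, keyed by dimension IDs.
sentiment_analysis = [
    {"Sentiment_ID": 1, "Sentiment_description": "Wow", "Customer_ID": 10, "Product_ID": 100},
    {"Sentiment_ID": 2, "Sentiment_description": "Crap", "Customer_ID": 11, "Product_ID": 101},
]
# Dimension tables, indexed by primary key for O(1) lookup.
dim_customer = {10: {"Customer_Name": "Alice", "Gender": "F", "Age": 34},
                11: {"Customer_Name": "Bob", "Gender": "M", "Age": 29}}
dim_product = {100: {"Product_Name": "Phone", "Category": "Electronics"},
               101: {"Product_Name": "Mug", "Category": "Kitchen"}}

def join_star(fact_rows):
    """Resolve each fact row against its dimensions (a star-schema join)."""
    return [{"sentiment": f["Sentiment_description"],
             "customer": dim_customer[f["Customer_ID"]]["Customer_Name"],
             "product": dim_product[f["Product_ID"]]["Product_Name"]}
            for f in fact_rows]

rows = join_star(sentiment_analysis)
# Each result pairs a sentiment with a readable customer and product name.
```

In a real warehouse this join is expressed in SQL against fact and dimension tables, but the ID-to-attribute resolution is the same.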
28. Conformed Dimensions

Online_Customer table:          Store_Customer table:
Customer Name   Location        Customer Name   Location
Avadhoot Patil  Dallas          Ankur Kaushik   Dallas

Sort and merge into a single conformed customer dimension:

Customer Name   Location
Avadhoot Patil  Dallas
Ankur Kaushik   Dallas
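The sort-and-merge step behind conformed dimensions can be sketched in a few lines of Python. This is a minimal illustration (records and the cleansing rule are invented, including a deliberate "Dalllas" typo standing in for dirty source data):

```python
LOCATION_FIXES = {"Dalllas": "Dallas"}  # illustrative cleansing rule

def conform_customers(*sources):
    """Standardize, deduplicate, and sort customer records from
    multiple channels into one conformed customer dimension."""
    seen, merged = set(), []
    for source in sources:
        for rec in source:
            clean = {"name": rec["name"].strip(),
                     "location": LOCATION_FIXES.get(rec["location"], rec["location"])}
            key = (clean["name"], clean["location"])
            if key not in seen:          # same customer from two channels -> one row
                seen.add(key)
                merged.append(clean)
    return sorted(merged, key=lambda r: r["name"])

online_customers = [{"name": "Avadhoot Patil", "location": "Dallas"}]
store_customers = [{"name": "Ankur Kaushik", "location": "Dalllas"}]
conformed = conform_customers(online_customers, store_customers)
```

The key point of a conformed dimension is that every fact table (online sales, store sales) joins against this one cleaned customer list, so reports across channels agree.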
29. Selecting Keys

Airport Data_source (natural keys only):

Airport Name  City    Country
ABC           Dallas  USA

Data warehouse system (with durable surrogate keys):

Airport_ID  Airport Name  City    Country
1001        ABC           Dallas  USA
1002        XYZ           Dallas  USA

• Anchor dimensions with durable surrogate keys.
• Natural keys from the source are replaced by durable surrogate keys in the warehouse, which also supports slowly changing dimensions.
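The surrogate-key assignment shown on this slide can be sketched as a small key generator. This is an illustrative Python sketch (class name, starting value 1001, and airport rows are assumptions matching the slide's example):

```python
import itertools

class SurrogateKeyGenerator:
    """Assign a durable surrogate key to each new natural key.
    Reloading the same natural key returns the same surrogate,
    so the key stays stable across warehouse loads."""
    def __init__(self, start=1001):
        self._keys = {}
        self._counter = itertools.count(start)

    def key_for(self, natural_key):
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._counter)
        return self._keys[natural_key]

gen = SurrogateKeyGenerator()
abc = gen.key_for(("ABC", "Dallas", "USA"))    # first airport gets 1001
xyz = gen.key_for(("XYZ", "Dallas", "USA"))    # next gets 1002
again = gen.key_for(("ABC", "Dallas", "USA"))  # same natural key, same surrogate
```

Decoupling the warehouse key from the source's natural key is also what makes slowly changing dimensions possible: a changed attribute can get a new surrogate row without disturbing the natural key.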
30. Governance
• Dimensionalize data before applying governance.
• Dimensionalize data as early as possible in the data pipeline.
• Governance steps: parse, match, identify, and resolve on the fly.
31. Privacy
• Privacy is the most important governance perspective.
• For most forms of analysis, personal details should be masked.
• Data should be aggregated enough not to allow identification of individuals.
• Data should be masked or encrypted on write, or masked on read.
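The masking recommendation above can be illustrated with a simple pseudonymization sketch in Python (field names, the sample record, and the inline salt are invented for illustration; a real deployment would keep the salt in a secret store):

```python
import hashlib

def mask_record(record, pii_fields=("name", "email")):
    """Replace PII fields with a salted SHA-256 pseudonym so records
    remain joinable for analysis without exposing personal details."""
    salt = "example-salt"  # assumption: in practice, a managed secret
    masked = dict(record)  # leave the original record untouched
    for field in pii_fields:
        if field in masked:
            digest = hashlib.sha256((salt + str(masked[field])).encode()).hexdigest()
            masked[field] = digest[:12]  # short, stable pseudonym
    return masked

row = {"name": "Avadhoot Patil", "email": "a@example.com", "spend": 120}
safe = mask_record(row)
# Non-PII fields like "spend" pass through; "name" becomes a stable pseudonym.
```

Because the same input always hashes to the same pseudonym, analysts can still group and join on the masked column (masking on read), while the raw personal details never reach the analysis layer.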
32. THANK YOU !