Big Data and advanced analytics are critical topics for executives today. But many still aren't sure how to turn that promise into value. This presentation provides an overview of 16 examples and use cases that lay out the different ways companies have approached the issue and found value: everything from pricing flexibility to customer preference management to credit risk analysis to fraud protection and discount targeting. For the latest on Big Data & Advanced Analytics: http://mckinseyonmarketingandsales.com/topics/big-data
Big Data raises the challenge of how to process such a vast pool of raw data and how to extract value from it. To address these demands, an ecosystem of tools named Hadoop was conceived.
Top Hadoop Big Data Interview Questions and Answers for Freshers - JanBask Training
Fundamentals of Big Data, Hadoop project design, and a case study/use case
General planning considerations and key necessities in the Hadoop ecosystem and Hadoop projects
This will provide the basis for choosing the right Hadoop implementation, integrating and adopting Hadoop technologies, and creating an infrastructure.
Building applications using Apache Hadoop, with Wi-Fi log analysis as a real-life use case.
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
YouTube Link: https://youtu.be/ll_O9JsjwT4
** Big Data Hadoop Certification Training - https://www.edureka.co/big-data-hadoop-training-certification **
This Edureka PPT on "Hadoop components" will provide you with detailed knowledge about the top Hadoop Components and it will help you understand the different categories of Hadoop Components. This PPT covers the following topics:
What is Hadoop?
Core Components of Hadoop
Hadoop Architecture
Hadoop EcoSystem
Hadoop Components in Data Storage
General Purpose Execution Engines
Hadoop Components in Database Management
Hadoop Components in Data Abstraction
Hadoop Components in Real-time Data Streaming
Hadoop Components in Graph Processing
Hadoop Components in Machine Learning
Hadoop Cluster Management tools
Follow us to never miss an update in the future.
YouTube: https://www.youtube.com/user/edurekaIN
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Castbox: https://castbox.fm/networks/505?country=in
Big Data Architecture Workshop - Vahid Amiri, datastack
Big Data Architecture Workshop
This slide deck covers big data tools, technologies, and layers that can be used in enterprise solutions.
TopHPC Conference
2019
This presentation simplifies the concepts of Big Data, NoSQL databases, and Hadoop components.
The Original Source:
http://zohararad.github.io/presentations/big-data-introduction/
This presentation discusses the following topics:
What is Hadoop?
Need for Hadoop
History of Hadoop
Hadoop Overview
Advantages and Disadvantages of Hadoop
Hadoop Distributed File System
Comparing: RDBMS vs. Hadoop
Advantages and Disadvantages of HDFS
Hadoop frameworks
Modules of Hadoop frameworks
Features of Hadoop
Hadoop Analytics Tools
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
Enough talking about Big Data and Hadoop; let's see how Hadoop works in action.
We will locate a real dataset, ingest it into our cluster, connect it to a database, apply some queries and data transformations, save the result, and present it via a BI tool.
Introduction to Apache Hadoop. Covers Hadoop v1.0 (HDFS and MapReduce) through v2.0, and includes Impala, YARN, Tez, and the entire arsenal of Apache Hadoop projects.
This is a presentation on Hadoop basics. Hadoop is an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models.
Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools or processing applications. A lot of challenges such as capture, curation, storage, search, sharing, analysis, and visualization can be encountered while handling Big Data. On the other hand the Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Big Data certification is one of the most recognized credentials of today.
For more details Click http://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
Data infrastructure at Facebook, with reference to the conference paper "Data warehousing and analytics infrastructure at Facebook"
Data warehouse
Hadoop - Hive - scrive
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
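As an illustration of the automated data-validation idea above, here is a minimal Python sketch of a rule-based check that flags bad records at the source. The rule names and record fields are hypothetical, chosen only to make the example concrete:

```python
def validate(record, rules):
    """Return the names of the rules this record violates."""
    return [name for name, check in rules.items() if not check(record)]

# Hypothetical quality rules for a customer record.
rules = {
    "has_id": lambda r: bool(r.get("id")),
    "valid_age": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] < 130,
}

good = {"id": "c1", "age": 34}
bad = {"id": "", "age": -5}

good_errors = validate(good, rules)   # no violations
bad_errors = validate(bad, rules)     # both rules violated
```

In a real pipeline the same pattern runs automatically on every incoming batch, so errors are caught at the source rather than discovered downstream.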
Overview of Big data, Hadoop and Microsoft BI - version1
1. An Overview of Big Data & Hadoop
Prepared & presented by Tony Nguyen, July 2014
2. Presentation outline
This presentation gives Big Data concepts and an overview of different Big Data technologies:
Understand the different tools and use the right tools for DW and ETL
How does the current BI/DW fit into the Big Data context?
How do Microsoft BI and Hadoop get married?
3. What is big data?
Refers to any collection of data sets so large and complex (e.g. hundreds of petabytes) that they cannot be processed with traditional tools.
4. Why does Big Data matter?
• 2 billion internet users in the world today
• 7.3 billion active cell phones in 2014
• 7 TB of data is processed by Twitter every day
• 500 TB of data is processed by Facebook every day
• With massive quantities of data, businesses need fast, reliable, deeper data insight
6. What is Hadoop?
Refers to an ecosystem that includes a large-scale distributed filesystem used to store and process big data across multiple storage servers.
Hadoop technologies include MapReduce and the Hadoop Distributed Filesystem (HDFS).
7. Who are the major Hadoop vendors?
IBM InfoSphere BigInsights: IBM packs Hadoop with its products, including Text Analytics, Social Data Analytics Accelerator, Big SQL, and Big R.
Cloudera: packs the Hadoop core components with its well-known analytic SQL product named Impala and provides enterprise support. Current Cloudera Hadoop versions include CDH 4.7 and CDH 5.1.
Hortonworks: a company formed by Yahoo and Benchmark Capital; Hortonworks makes Hadoop ready for the enterprise with the latest version, HDP 2.1.
Microsoft: contributes HDInsight as Hadoop on the Windows platform.
8. HDFS
The Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable file system written in Java for the Hadoop framework.
It is designed to run across low-cost commodity hardware.
9. MapReduce
MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
From Hadoop version 2, YARN (Yet Another Resource Negotiator) was introduced.
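The map/shuffle/reduce phases can be sketched in plain Python. This is a toy, single-process illustration of the programming model (a word count), not the Hadoop API:

```python
from collections import defaultdict

def map_phase(documents):
    # Mapper: emit a (word, 1) pair for every word in every input record.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle/sort: group all values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big clusters", "big data"]
result = reduce_phase(shuffle(map_phase(docs)))
```

In real Hadoop, the mappers and reducers run in parallel across the cluster, and the shuffle moves intermediate data between nodes; the logic per record is the same.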
11. Core components on top of Hadoop
1. Hive (Facebook)
2. Pig (Yahoo)
3. Hbase
4. HCatalog
5. Knox
6. ZooKeeper
7. Sqoop
12. Pig
1. Originally developed by Yahoo
2. Best used for large data set ETL
3. Dataflow scripting language called Pig Latin, a high-level language designed to remove the complexities of coding MapReduce applications.
4. Pig converts its operators into MapReduce code.
5. Instead of needing Java programming skills and an understanding of the MapReduce coding infrastructure, people with little programming experience can simply invoke SORT or FILTER operators without having to code a MapReduce application to accomplish those tasks.
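The way Pig compiles declarative operators into MapReduce stages can be pictured with a rough single-process Python analogy (this is an illustration of the idea, not Pig's actual execution engine; the records and fields are made up):

```python
# Toy analogy: each Pig operator becomes a pass over the records,
# the way Pig compiles FILTER / ORDER BY into MapReduce stages.
records = [
    {"user": "a", "bytes": 120},
    {"user": "b", "bytes": 30},
    {"user": "c", "bytes": 500},
]

# FILTER sessions BY bytes > 100;   (compiles to a map-only job)
filtered = [r for r in records if r["bytes"] > 100]

# ORDER filtered BY bytes DESC;     (map phase plus a sort in the shuffle)
ordered = sorted(filtered, key=lambda r: r["bytes"], reverse=True)
```

The user writes only the two declarative lines; Pig generates and schedules the underlying MapReduce jobs.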
13. Hive
Originally developed by Facebook in 2007.
Hive is a data warehouse built on top of the Hadoop file system (HDFS), allowing developers to use SQL-like scripts (called Hive QL or HQL) to create databases and tables.
Hive translates the SQL-like scripts into MapReduce jobs to store and process large data sets.
Short learning curve, as BI developers use familiar SQL-like scripts.
14. Hive (Cont'd)
UPDATE or DELETE of a record isn't allowed in Hive, but INSERT INTO is acceptable.
A way to work around this limitation is to use partitions: if you're getting different batches of ids separately, you could redesign your table so that it is partitioned by id; then you would be able to easily drop partitions for the ids you want to get rid of.
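The partition workaround can be pictured with a plain-Python sketch of the idea (not HiveQL): dropping a partition removes all rows for that id at once, without any row-level DELETE. The partition names and rows below are hypothetical:

```python
# A table partitioned by id: each partition holds only that id's rows,
# the way Hive stores each partition in its own HDFS directory.
table = {
    "id=1": [("1", "alice"), ("1", "alice2")],
    "id=2": [("2", "bob")],
    "id=3": [("3", "carol")],
}

def drop_partition(table, partition):
    # In spirit: ALTER TABLE t DROP PARTITION (id=...)
    # The whole partition is discarded; no per-row DELETE is needed.
    table.pop(partition, None)

drop_partition(table, "id=2")
remaining = sorted(table)
```

This is why partitioning by the deletion key makes batch removal cheap even when row-level mutation is unavailable.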
15. HBase
HBase is a column-oriented database management system that runs on top of HDFS.
It is modelled after Google's BigTable technology and was created for hosting very large tables with billions of rows and millions of columns.
An HBase system comprises a set of tables; each table contains rows and columns, much like a traditional database.
HBase provides random, real-time access to your Big Data.
It does not support a structured query language like SQL and is referred to as a NoSQL technology (NoSQL means Not Only SQL), as HBase is not intended to replace your traditional RDBMS.
16. HCatalog
1. HCatalog is a table and storage management layer for Hadoop that enables users with different data processing tools (Apache Pig, Apache MapReduce, and Apache Hive) to more easily read and write data on the grid
2. Frees the user from having to know where the data is stored, via the table abstraction
3. Enables notifications of data availability
4. Provides visibility for data cleaning and archiving tools
17. Knox
A system that provides a single point of authentication and access for Apache Hadoop services in a cluster. The goal of the project is to simplify Hadoop security for users who access cluster data and execute jobs, and for operators who control access and manage the cluster.
21. Three popular open-source Hadoop-based SQL databases
1. Impala (Cloudera)
2. Stinger (Hortonworks), aka Hive 11, Hive 12, Hive 13, or Hive-on-Tez
3. Presto (Facebook)
22. Impala
1. Developed by Cloudera in 2012
2. SQL query engine that runs natively in Apache Hadoop
3. Queries data using SELECT, JOIN, and aggregate functions, in real time
4. Accesses HDFS directly and uses MPP computation instead of MapReduce, thereby providing near real-time data access
5. The entire process happens in memory, eliminating the disk I/O latency incurred extensively during a MapReduce job
23. MPP vs MapReduce
Both are distributed data processing systems, but they differ as follows:

MPP | MapReduce
Used on expensive, specialized hardware tuned for CPU, storage and network performance | Deployed to clusters of commodity servers that in turn use commodity disks
Faster | Slower
In-memory computation | Disk I/O computation
Queried with SQL | Programmed in Java
Declarative queries | Imperative code
SQL is easier and more productive | More difficult for IT professionals
24. Stinger
1. Refers to new versions of Hive (versions 0.11 to 0.13) that overcome the performance barrier of MapReduce computation
2. More SQL compliance for Hive SQL
http://hortonworks.com/labs/stinger/
26. Presto
1. In response to Cloudera Impala, Facebook introduced Presto in 2012
2. Presto is similar in approach to Impala in that it is designed to provide an interactive experience whilst still using your existing datasets stored in Hadoop
It provides:
JDBC drivers
ANSI-SQL syntax support (presumably ANSI-92)
A set of 'connectors' used to read data from existing data sources; connectors include HDFS, Hive, and Cassandra
Interop with the Hive metastore for schema sharing
28. Comparison of Hive, Impala, Presto and Stinger
Year: Hive 2007; Impala 2012; Presto in development; Stinger in development
Original developer: Hive by Facebook; Impala by Cloudera; Presto by Facebook; Stinger by Hortonworks
Main purpose: Hive is a data warehouse; Impala enables analysts and data scientists to directly interact with any data stored in Hadoop and offloads self-service business intelligence to Hadoop; Presto and Stinger act as RDBMS-style query engines
Computation approach: Hive uses MapReduce; Impala, Presto and Stinger use a massively parallel processing (MPP) architecture
Performance: Hive low; Impala, Presto and Stinger fast
Latency: Hive high; Impala, Presto and Stinger low
Language: Hive uses a SQL-like script; Impala supports ANSI-92 SQL with user-defined functions (UDFs); Presto supports SQL including RANK, LEAD and LAG; Stinger uses a SQL-like script
Interfaces: Hive: CLI, web, ODBC, JDBC; Impala: ODBC, JDBC, impala-shell, web; Presto: JDBC; Stinger: JDBC
High availability: Hadoop 2.0/CDH4 has HA at the HDFS level for Hive, Presto and Stinger; Impala: yes
Replication: Hive: yes; Impala: supported between two CDH 5 clusters; Presto and Stinger: unknown
29. Hive pros and cons
Advantages:
It's been around about 5 years; you could say it is a mature and proven solution
Runs on the proven MapReduce framework
Good support for user-defined functions
It can be mapped to HBase and other systems easily
Disadvantages:
Since it uses MapReduce, it carries all of MapReduce's drawbacks, such as an expensive shuffle phase and heavy I/O operations
Hive still does not support multiple reducers, which makes queries like GROUP BY and ORDER BY a lot slower
A lot slower compared to its competitors
Reference: http://bigdatanerd.wordpress.com/2013/11/19/war-on-sql-over-hadoop/
30. Impala pros and cons
Advantages:
Lightning speed; promises near real-time ad-hoc query processing
The computation happens in memory, which removes an enormous amount of latency and disk I/O
The latest version supports UDFs
Open source, Apache licensed
Disadvantages:
No fault tolerance for running queries: if a query fails on a node, the query has to be reissued; it can't resume from where it failed
Reference: http://bigdatanerd.wordpress.com/2013/11/19/war-on-sql-over-hadoop/
31. Presto pros and cons
Advantages:
Lightning fast; promises near real-time interactive querying
Used extensively at Facebook, so it is proven and stable
Open source, with strong momentum behind it ever since it was open sourced
It also uses a distributed query processing engine, which eliminates the latency and disk I/O issues of traditional MapReduce
Well documented; perhaps this is the first open-source software from Facebook to get a dedicated website from day one
Disadvantages:
It's a newborn; need to wait and watch, since there are some interesting active developments going on
As of now supports only Hive-managed tables; though the website claims one can also query HBase, that feature is still under development
Still no UDF support yet; this is the most requested feature to be added
Reference: http://bigdatanerd.wordpress.com/2013/11/19/war-on-sql-over-hadoop/
36. Comments on Impala
Among Impala, Hive and Presto, Impala appears to be the most mature SQL-on-Hadoop engine
Impala appears to be the winner in terms of performance and maturity
38. Combining Hadoop and SQL Server tools
Both Hadoop and SQL Server have strengths and weaknesses
Combining Hadoop and SQL Server tools lets the strengths of each technology offset the weaknesses of the other
39. SQL Server vs SQL on Hadoop
SQL Server: enforces data quality and consistency better (unique indexes, keys and foreign keys); has a scalability limit
SQL on Hadoop: lacks data quality enforcement; better for scaling and processing massive data
40. Deployment options
Hadoop on premise
Hadoop in the cloud:
1. Infrastructure as a Service (IaaS): providers of IaaS offer computers, physical or (more often) virtual machines
2. Platform as a Service (PaaS): includes the operating system, programming language execution environment, database, and web server
3. Software as a Service (SaaS): provides access to application software and databases
45. Use the right ETL tools
SSIS: existing skills in the organisation, transformations are needed, performance tuning is important
Pig: use for very large data sets, to take advantage of the scalability of Hadoop, when IT staff are comfortable learning a new language
Sqoop: little need to transform the data, easy to use, when IT staff aren't comfortable with SSIS or Pig; loads SQL tables directly into Hadoop
46. SQL Server Parallel Data Warehouse: a high-performance and expensive solution
SQL Server Parallel Data Warehouse (PDW) is the MPP edition of SQL Server.
Unlike the Standard, Enterprise or Data Center editions, PDW is actually a hardware and software bundle rather than just a piece of software; Microsoft calls it a database "appliance".
It isn't a substitute for SSIS, SSAS and SSRS. It's Microsoft's answer for customers needing to process tens or hundreds of terabytes who want the ability to scale out large workloads across multiple servers, large storage arrays and many processors.
It includes:
◦ Microsoft PolyBase
◦ Microsoft Analytics Platform System (APS)
◦ Runs on top of Hadoop
49. References
Microsoft Big Data Solutions, Wiley, February 2014
Microsoft SQL Server 2012 with Hadoop, Debarchan Sarkar, Packt Publishing, 2013
Cloudera.com
Hortonworks.com
Hadoop.apache.org
Microsoft.com/bigdata
Impala.io
Prestodb.io
Hive.apache.org