HDInsight & CosmosDB - Global IoT · Big data processing infrastructure (DataWorks Summit)
We introduce HDInsight, a PaaS offering for Hadoop and Spark, and an IoT and big data processing infrastructure built on CosmosDB, a globally deployable, distributed, multi-model database.
A Benchmark Test on Presto, Spark Sql and Hive on Tez (Gw Liu)
We examined the performance of Presto, Spark SQL, and Hive on Tez, measuring things such as the execution speed of common query patterns on data ranging from tens of thousands to billions of records.
We conducted a benchmark test on mainstream big data SQL engines, including Presto, Spark SQL, and Hive on Tez.
We focused on performance over medium-sized data (from tens of GB to 1 TB), which is the most common case in most services.
Start of a New era: Apache YARN 3.1 and Apache HBase 2.0 (DataWorks Summit)
The adoption of Machine Learning (ML) and Deep Learning (DL) is a necessary step in an organization’s digital transformation journey. The insights gained from these applications enable businesses to improve their internal and external processes to maintain a competitive advantage. To address ML and DL apps, organizations are procuring expensive hardware resources to handle the extensive processing power required by these workloads.
As organizations make these investments, it is becoming important that they consider the needs of business stakeholders in addition to the purely infrastructure-focused stakeholders. Hortonworks released HDP 3.0, a major HDP version change, in July of this year. HDP 3.0 includes major version changes of Apache Hadoop, Apache Hive, Apache HBase, and other components, and we added a large number of features. In this session, we cover what's new in Apache YARN 3.1 and Apache HBase 2.0.
This chapter introduces organizational behavior and discusses several key topics:
1) It defines organizational behavior as the study of human behavior in organizational settings and the interface between human behavior and organizations.
2) It outlines the disciplines of psychology, sociology, anthropology, and political science that contribute to the field of organizational behavior.
3) It describes common management functions like planning, organizing, staffing, and directing, as well as management roles and important skills for managers.
4) It discusses managing workforce diversity, globalization, improving customer service and employee skills, and creating a positive work environment.
This document contains summaries of multiple articles related to veganism and its impacts. Some of the key points covered include:
- Animal agriculture is a major contributor to environmental problems like deforestation, pollution and greenhouse gas emissions. Going vegan can significantly reduce an individual's environmental footprint.
- Animals raised for food endure stressful and inhumane conditions on factory farms. They are routinely subjected to painful procedures without painkillers.
- The number of vegans, especially young people, is rising in the UK and elsewhere due to health, environmental and animal welfare concerns. Social media has helped promote vegan lifestyles.
- Many cosmetics and household products are still tested on animals
RIWC_PARA_A036 Independent Living for learning disabled people (Marco Muscroft)
This document summarizes research on legislation in Korea to support independent living for adults with developmental disabilities. It describes how parent movements advocated for laws to provide services after schooling ends. The research interviewed parents and support center workers. It found that parents' advocacy led to a 2014 law establishing independence supports. However, adults with disabilities still face doubts, discomfort from others, and need lifelong education programs tailored to individual needs to facilitate community inclusion. Future improvements require better coordination between support organizations and research on continuous education models.
This document contains notes from a math lesson on rates. It includes the do now, exit ticket questions about comparing the speeds of dogs skateboarding, and a closing question about how unit rates are helpful for making comparisons. Students are assigned problem set page 4 as homework due on Wednesday.
The document is notes from a class reviewing for an end of module test scheduled for the next day. It instructs students to do a review packet on Socrative and review homework with a partner. Most of the document is blank pages indicating it is the end of the notes.
Module 3 lesson 19 vertical and angles at a point (Erik Tjersland)
The document outlines homework assignments for a math class, including completing problems from Lesson 19 and preparing for an exam on Friday. It also provides notes on vertical angles being opposite and equal and the sum of angles around a point being 360 degrees.
Psychokinesis is the ability of the mind to manipulate matter without any physical intervention. There are many types of psychokinesis. Telekinesis, the most common type, is performed by many psychics.
Opinion presented by Mr. Jean Jouzel and Ms. Agnès Michelot on behalf of the Environment Section, chaired by Ms. Anne-Marie Ducroux.
The aim of climate justice is to do everything possible to ensure that global warming does not increase inequalities. It emerged as a central theme at the opening of COP 21. A strong demand of civil society at the international level since 2003, it is also used ahead of negotiations by political leaders of developing countries.
The CESE supports the fight against all forms of inequality. Through its proposals, the opinion seeks to contribute to public policies that will make it possible, at the national level, to limit and if possible reduce the social and economic inequalities generated by global warming.
Data persistence for time series is an old and, in many cases, traditional task for databases. In general, a time series is just a sequence of data elements. The typical use case is a set of measurements made over a time interval. Much of the data generated by sensors, in machine-to-machine communication, and in the Internet of Things could be collected as time series. Time series are used in statistics, mathematics, and finance. In this paper, we provide a survey of data persistence solutions for time series data. The paper covers traditional relational databases as well as NoSQL-based solutions for time series data.
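As a rough illustration of the relational approach mentioned in the abstract above, the following sketch stores measurements in an in-memory SQLite table and reads back a time range. The table name `measurements`, its columns, and the helper functions are assumptions made for this example; they do not come from the surveyed paper.

```python
import sqlite3


def make_store():
    """Create an in-memory database with one table of time-stamped values."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per (sensor, timestamp) measurement.
    conn.execute(
        "CREATE TABLE measurements ("
        " sensor_id TEXT NOT NULL,"
        " ts INTEGER NOT NULL,"
        " value REAL NOT NULL,"
        " PRIMARY KEY (sensor_id, ts))"
    )
    return conn


def insert_points(conn, sensor_id, points):
    """Insert (timestamp, value) pairs for one sensor."""
    conn.executemany(
        "INSERT INTO measurements (sensor_id, ts, value) VALUES (?, ?, ?)",
        [(sensor_id, ts, value) for ts, value in points],
    )


def query_range(conn, sensor_id, start_ts, end_ts):
    """Return points with start_ts <= ts < end_ts, ordered by time."""
    rows = conn.execute(
        "SELECT ts, value FROM measurements"
        " WHERE sensor_id = ? AND ts >= ? AND ts < ? ORDER BY ts",
        (sensor_id, start_ts, end_ts),
    )
    return rows.fetchall()
```

The composite primary key on `(sensor_id, ts)` is one possible design choice: it keeps each sensor's points unique per timestamp and lets range queries use the key's index.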
This document summarizes a presentation about using multimedia technologies for emergency situations. It discusses:
1) The potential of emerging multimedia technologies like smartphones and social media to help with disaster response and management by providing real-time situation reports and coordinating relief efforts.
2) Prototype applications that have been developed like one for the Thai floods that used flood level and shelter data along with tweets to help direct aid.
3) Remaining research challenges around issues like human reporting of data, real-time situation recognition from multiple data streams, and predictive analytics.
4) The vision of building a global "situation map" by analyzing the billions of photos uploaded from smartphones to help recognize situations worldwide.
Opinion co-presented by Emelyn Weber on behalf of the Labour and Employment Section, chaired by Sylvie Brunet, and Etienne Caniard on behalf of the Social Affairs and Health Section, chaired by Aminata Koné.
Reconnecting with the European project: that is the ambition of a pillar of robust, effective, and universal social rights.
A tool in the service of a Europe that is united in solidarity, competitive, inclusive, and confidence-inspiring about the future, this pillar must help address several major challenges within the European Union: designing public policies that are closer to citizens and their needs, a new alignment between macroeconomic and social policies, and protection against social risks in order to improve social cohesion and strengthen business competitiveness and job quality.
APG West Social Media Week: David Wilding, Twitter (APGWest)
David looks at how Twitter is helping people navigate the world, and how brands can work with the platform to unlock more value in their communications.
Speaking to People: The Strategist’s Secret Weapon (Open Strategy)
'My secret weapon for developing and selling a strategic narrative' by Loz Horner at Lucky Generals. 1/3 of the second 'School of Planning' event on Strategic Narrative.
http://hortonworks.com/hadoop/spark/
Recording:
https://hortonworks.webex.com/hortonworks/lsr.php?RCID=03debab5ba04b34a033dc5c2f03c7967
As the ratio of memory to processing power rapidly evolves, many within the Hadoop community are gravitating towards Apache Spark for fast, in-memory data processing. And with YARN, they use Spark for machine learning and data science use cases alongside other workloads simultaneously. This is a continuation of our YARN Ready Series, aimed at helping developers learn the different ways to integrate with YARN and Hadoop. Tools and applications that are YARN Ready have been verified to work within YARN.
This document discusses appropriate and inappropriate uses of Apache Spark for different types of data and workloads. It provides guidance on when to use Spark versus other data stores like databases. Good uses of Spark include general purpose processing of file-based data, data transformation/ETL, and machine learning/data science. Bad uses include random access queries, frequent inserts/updates, external reporting with high load, and content searching with high load, as Spark is not optimized for these types of workloads. The document recommends using a database instead for workloads involving random access, frequent changes, or high query loads.
Azure Synapse Analytics for SQL Server Users - Introduction to Spark (Daiyu Hatakeyama)
Session material for "Azure Synapse Analytics - Introduction to SQL Pool" at the Japan SQL Server Users Group's 35th SQL Server 2019 study session.
It covers where Spark fits, introductory usage within Synapse, and the value that is unique to Synapse.
The document discusses running Hive/Spark on S3 object storage using S3A committers and running HBase on NFS file storage instead of HDFS. This separates compute and storage and avoids HDFS operations and complexity. S3A committers allow fast, atomic writes to S3 without renaming files. Benchmark results show the magic committer is faster than the file committer for S3 writes. HBase performance tests show FlashBlade NFS providing low latency for random reads/writes compared to Amazon EFS.
This document provides an introduction to Apache Kafka. It begins with an overview of Kafka as a distributed messaging system that is real-time, scalable, low latency, and fault tolerant. It then covers key concepts such as topics, partitions, producers, consumers, and replication. The document explains how Kafka achieves fast reads and writes through its design and use of disk flushing and replication for durability. It also discusses how Kafka can be used to build real-time systems and provides examples like connected cars. Finally, it introduces Apache Metron as an example of a cyber security solution built on Kafka.
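To make the topic, partition, producer, and consumer concepts in the summary above more concrete, here is a toy in-memory model. It is not the Kafka client API: the `ToyTopic` class and its methods are hypothetical, and the CRC32 hash is only a stand-in (Kafka's own default partitioner uses a different hash function). The sketch shows only the general idea that records with the same key land in the same partition and receive increasing offsets.

```python
import zlib


class ToyTopic:
    """Hypothetical in-memory stand-in for a partitioned topic."""

    def __init__(self, num_partitions):
        # Each partition is an append-only list; the list index is the offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32 is used here purely for illustration.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        """Append a record and return its (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def consume(self, partition, offset):
        """Return all records in a partition starting at the given offset."""
        return self.partitions[partition][offset:]
```

Because ordering is tracked per partition, a consumer that remembers its last offset can resume reading from exactly that position, which is the property this model is meant to show.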
Hive2 Introduction -- Interactive SQL for Big Data (Yifeng Jiang)
Introducing the new features of Hive 2 and how it achieves interactive SQL for big data. Features include the new LLAP engine, ACID merge, Hive + Druid integration, and more. I will explain what each is, how it works, and what use cases it suits. I will also show some benchmark numbers.
Introduction to Streaming Analytics Manager (Yifeng Jiang)
This document introduces Streaming Analytics Manager (SAM), an open source project led by Hortonworks to simplify building streaming analytics applications. SAM aims to provide the same easy experience for streaming analytics as NiFi does for flow management applications. It allows users to create a streaming analytics application in 10 minutes and supports prescriptive, predictive, and descriptive analytics functions including routing, filtering, predictive modeling, and real-time dashboards. SAM applications are scalable through one-click deployment on distributed streaming platforms.
This document discusses Hortonworks DataFlow (HDF) 3.0 for building IoT platforms. It introduces HDF 3.0 and its key components for data ingestion, management, security, and real-time analysis. These include NiFi for data movement, Streaming Analytics Manager (SAM) for building streaming analytics apps visually, and Schema Registry for managing schemas. The document also presents example IoT use cases and demonstrates building a real-time analytics app in SAM to analyze vehicle event data.
Hortonworks Data Cloud for AWS 1.11 Updates (Yifeng Jiang)
This document discusses Hortonworks Data Cloud, which provides an enterprise-ready Hadoop distribution on AWS. Key points include: HDC offers pre-configured Hortonworks Data Platform clusters on AWS that can be easily deployed and managed; the latest release of HDC (version 1.11) introduces compute nodes that allow using spot instances to reduce costs; and node recipes enable running custom scripts during cluster installation and configuration.
This document discusses security requirements and solutions for Apache Spark production deployments. It covers authenticating users with Kerberos/AD, authorizing access to Spark jobs and data with Ranger, auditing access, and encrypting data at rest and in motion. It provides examples of configuring Kerberos authentication for Spark, using Ranger to control authorization to HDFS and SparkSQL, and demonstrates dynamic row filtering and masking of sensitive data in SparkSQL queries based on user policies.
Introduction to Hortonworks Data Cloud for AWS (Yifeng Jiang)
Hortonworks Data Cloud is a new cloud product from Hortonworks that offers pay-as-you-go pricing for launching and managing Hadoop clusters on AWS. It handles common big data use cases and focuses on ease of use by providing prescriptive cluster types. The product aims to improve enterprise readiness in the cloud by providing scalable storage, security and governance features, and reliability through auto-recovery of unhealthy nodes. It also matches Hadoop with cloud capabilities like scalable storage, customizability, and cost-effective compute.
This document discusses real-time analytics in the financial industry. It describes a use case of detecting abnormal stock transactions in real-time and an architecture to handle it. The architecture uses Kafka as the messaging bus, Storm for real-time processing, and HBase for the data store. It discusses challenges like data ingestion, lookups, deduplication, and late events. Predictive analytics is also mentioned as an extension where machine learning models can be integrated to enhance detection.
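The deduplication and late-event challenges named in the summary above can be sketched in a few lines. This is a simplified in-memory illustration, not the Kafka, Storm, and HBase architecture the deck describes; the function name `process`, the `(event_id, event_time)` tuple shape, and the `allowed_lateness` parameter are all assumptions made for the example.

```python
def process(events, allowed_lateness):
    """Split (event_id, event_time) pairs into accepted, late, and duplicates.

    An event is a duplicate if its id was already seen. It is late if its
    event time is more than allowed_lateness behind the newest accepted one.
    """
    seen_ids = set()  # Assumption: unbounded here; a real system would expire ids.
    max_event_time = None
    accepted, late, duplicates = [], [], []
    for event_id, event_time in events:
        if event_id in seen_ids:
            duplicates.append(event_id)
            continue
        seen_ids.add(event_id)
        if max_event_time is not None and event_time < max_event_time - allowed_lateness:
            late.append(event_id)
            continue
        accepted.append(event_id)
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
    return accepted, late, duplicates
```

In this sketch late events are only collected in a separate list; whether to drop them, reprocess them, or correct earlier results is a policy decision left open.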
Yifeng Jiang gives a presentation introducing Apache NiFi. He begins with an overview of himself and the agenda. He then provides an introduction to NiFi, including terminology like FlowFile and Processor. Key aspects of NiFi are demonstrated, including the user interface, provenance tracking, queue prioritization, cluster architecture, and a demo of real-time data processing. Example use cases are discussed, like indexing JSON tweets and indexing data from a relational database. The presentation concludes that NiFi is an easy-to-use and powerful system for processing and distributing data, with 90 built-in processors.
This document discusses strategies for achieving sub-second SQL query performance on Hadoop at scale. It describes two use cases: highly parallel batch reporting on a massive dataset, and online reporting with low latency requirements. For the latter use case, the document evaluates Hive LLAP and Phoenix, finding that Phoenix generally has lower latency, especially for queries with large result sets, through optimizations like skip scans, merging improvements, and table splitting. Tuning HBase and Phoenix configurations can further reduce latency.
This document provides a summary of Amazon Kinesis and Apache Kafka, two platforms for processing real-time streaming data at large scale. It describes key features of each system such as durability, interfaces, processing options, and deployment. Kinesis is a fully managed cloud service that provides high durability for data across AWS availability zones. Kafka is an open source platform that offers lower latency and more flexibility in how data is processed but requires more operational overhead. The document also includes a deep dive on concepts and internals of the Kafka platform.
Yifeng Jiang presented on Apache Hive's present and future capabilities. Hive has achieved 100x performance improvements through technologies like ORC file format, Tez execution engine, and vectorized processing. Upcoming features like LLAP caching and a persistent Hive server aim to provide sub-second query response times for interactive analytics. Hive continues to evolve as the standard SQL interface for Hadoop, supporting a wide range of use cases from ETL and reporting to real-time analytics.
Hadoop Present - Open Enterprise Hadoop (Yifeng Jiang)
The document is a presentation on enterprise Hadoop given by Yifeng Jiang, a Solutions Engineer at Hortonworks. The presentation covers updates to Hadoop Core including HDFS and YARN, data access technologies like Hive, Spark and stream processing, security features in Hadoop, and Hadoop management with Apache Ambari.