Altiscale provides a big-data-as-a-service platform based on Apache Hadoop and related technologies such as Spark, Hive, and Tez. Interest in big data is growing rapidly, but many independent implementations fail. Altiscale aims to help with its experienced team and fully managed platform, which offers fast time to value, scalability, security, and lower total cost of ownership. The platform's core consists of Apache open source components such as Hadoop, Spark, Hive, and Tez. Altiscale handles administration of the Hadoop cluster, including hardware, upgrades, tuning, and addressing failures, so customers can focus on their data and jobs.
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...Cloudera, Inc.
You like to use R, and you need to use big data. dplyr, one of the most popular packages for R, makes it easy to query large data sets in scalable processing engines like Apache Spark and Apache Impala.
But there can be pitfalls: dplyr works differently with different data sources—and those differences can bite you if you don’t know what you’re doing.
Ian Cook is a data scientist, an R contributor, and a curriculum developer at Cloudera University. In this webinar, Ian will show you exactly what you need to know about sparklyr (from RStudio) and the package implyr (from Cloudera). He will show you how to write dplyr code that works across these different interfaces. And, he will solve mysteries:
Do I need to know SQL to use dplyr?
When is a “tbl” not a “tibble”?
Why is 1 not always equal to 1?
When should you collect(), collapse(), and compute()?
How can you use dplyr to combine data stored in different systems?
3 things to learn:
Do I need to know SQL to use dplyr?
When should you collect(), collapse(), and compute()?
How can you use dplyr to combine data stored in different systems?
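One of the mysteries above, the difference between building a query and collect()-ing its results, comes from dplyr backends translating verbs to SQL lazily. As a conceptual illustration only (a hypothetical LazyTbl class in Python, with sqlite3 standing in for Spark or Impala, not dplyr's actual implementation), the pattern looks like this:

```python
import sqlite3

# Sketch of a dplyr-style lazy table: verbs accumulate SQL,
# and nothing executes until collect() is called.
class LazyTbl:
    def __init__(self, conn, table):
        self.conn, self.table, self.wheres = conn, table, []

    def filter(self, clause):               # analogous to dplyr::filter()
        new = LazyTbl(self.conn, self.table)
        new.wheres = self.wheres + [clause]
        return new

    def show_query(self):                   # analogous to dplyr::show_query()
        sql = f"SELECT * FROM {self.table}"
        if self.wheres:
            sql += " WHERE " + " AND ".join(self.wheres)
        return sql

    def collect(self):                      # analogous to dplyr::collect()
        return self.conn.execute(self.show_query()).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (carrier TEXT, delay INT)")
conn.executemany("INSERT INTO flights VALUES (?, ?)",
                 [("AA", 5), ("UA", 40), ("AA", 90)])

tbl = LazyTbl(conn, "flights").filter("delay > 30")
print(tbl.show_query())   # the SQL is built, but nothing has run yet
print(tbl.collect())      # the query executes only here
```

The point of the sketch: until you collect(), you are manipulating a query description, not data, which is why a "tbl" backed by a remote engine is not a "tibble" in memory.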
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...DataWorks Summit
In this talk, Mark Baker (CSL) will show how CSL Behring integrates and analyzes data from multiple manufacturing sites, using Apache NiFi to feed a central Hadoop data lake.
The challenge of merging data from disparate systems has been a leading driver behind investments in data warehousing systems as well as in Hadoop. While data warehousing solutions are ready-built for RDBMS integration, Hadoop adds the benefits of economical, near-limitless scale – not to mention the variety of structured and unstructured formats it can handle. Whether using a data warehouse, Hadoop, or both, physical data movement and consolidation is the primary method of integration.
There may also be challenges in synchronizing rapidly changing data from a system of record to a consolidated Hadoop platform.
This introduces the need for “data federation”, where data is integrated without copying it between systems.
For historical/batch use cases, data is replicated from remote data hubs into a central data lake using Apache NiFi.
We will demo Apache Zeppelin for analyzing data with Apache Spark and Apache Hive.
Big Data Day LA 2016/ Use Case Driven track - How to Use Design Thinking to J...Data Con LA
There is a novel approach to identifying big data use cases, one which will ultimately lower the barrier to entry to big data projects and increase overall implementation success. This talk describes the approach used by big data pioneer and Datameer CEO Stefan Groschupf to drive over 200 production implementations.
This is the presentation from the Bangalore Big Data November Meetup, given by Davin Chaiken of Altiscale.
technology.inmobi.com/events/bigdata-meetup
Talk Outline:
- Altiscale Company Introduction and Perspective
- Altiscale Architecture
- Use Cases: Performance, Job Analysis, Scheduling
- Infinite Hadoop
- Challenges to the Hadoop Community
Insights into Real World Data Management ChallengesDataWorks Summit
Data is your most valuable business asset, and it's also your biggest challenge. This combination of challenge and opportunity means we continually face significant roadblocks on the path to becoming a data-driven organisation. From the management of data to the proliferation of open source frameworks, from limited industry skills to mounting time and cost pressures, our challenge in data is big.
We all want and need a “fit for purpose” approach to the management of data, especially Big Data, and overcoming the ongoing challenges around the ‘3Vs’ means we get to focus on the most important V: ‘Value’. Come along and join the discussion on how Oracle Big Data Cloud provides value in the management of data and supports your move toward becoming a data-driven organisation.
Speaker
Noble Raveendran, Principal Consultant, Oracle
Modern data management using Kappa and streaming architectures, including discussion by eBay's Connie Yang about the Rheos platform and the use of Oracle GoldenGate, Kafka, Flink, etc.
Addressing Enterprise Customer Pain Points with a Data Driven ArchitectureDataWorks Summit
Customers that are implementing Big Data Analytics projects in enterprise environments driven by line of business applications are faced with the three critical issues of Managing Complexity, Data Movement and Replication, and Cloud Integration. In this session you will learn about the characteristics of these pain points and how designing and implementing a data driven approach enables enterprises to implement quickly and efficiently with a future proof architecture of hybrid cloud.
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Rittman Analytics
Set of product roadmap + capabilities slides from Oracle Data Integration Product Management, and thoughts on data integration on big data implementations by Mark Rittman (Independent Analyst)
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...DataWorks Summit
Progressive Insurance is well known for its innovative use of data to better serve its customers, and the important role that Hortonworks Data Platform has played in that transformation. However, as with most things worth doing, the path to the Data Lake was not without its challenges. In this session, I’ll share our top use cases for Hadoop – including telematics and display ads, how a skills shortage turned supporting these applications into a nightmare, and how – and why – we now use Syncsort DMX-h to accelerate enterprise adoption by making it quick and easy (or faster and easier) to populate the data lake – and keep it up to date – with data from across the enterprise. I’ll discuss the different approaches we tried, the benefits of using a tool vs. open source, and how we created our Hadoop Ingestor app using Syncsort DMX-h.
Today, enterprises want to move more and more of their data lakes to the cloud to help them execute faster, increase productivity, and drive innovation while leveraging the scale and flexibility of the cloud. However, such gains come with risks and challenges in the areas of data security, privacy, and governance. In this talk we cover how enterprises can overcome governance and security obstacles to leverage the advances the cloud can provide to ease the management of their data lakes. We will also show how the enterprise can have consistent governance and security controls in the cloud for ephemeral analytic workloads in a multi-cluster cloud environment, without sacrificing any of the data security and privacy/compliance needs that their business context demands. Additionally, we will outline use cases, patterns, and best practices for rationally managing such a multi-cluster data lake infrastructure in the cloud.
Speaker:
Jeff Sposetti, Product Management, Hortonworks
Data science holds tremendous potential for organizations to uncover new insights and drivers of revenue and profitability. Big Data has brought the promise of doing data science at scale to enterprises; however, this promise also comes with challenges for data scientists to continuously learn and collaborate. Data scientists have many tools at their disposal: notebooks like Jupyter and Apache Zeppelin, IDEs such as RStudio, languages like R, Python, and Scala, and frameworks like Apache Spark. Given all these choices, how do you best collaborate to build your model and then work through the development lifecycle to deploy it from test into production?
In this session learn the attributes of a modern data science platform that empowers data scientists to build models using all the data in their data lake and foster continuous learning and collaboration. We will show a demo of DSX with HDP with the focus on integration, security and model deployment and management.
Speakers:
Sriram Srinivasan, Senior Technical Staff Member, Analytics Platform Architect, IBM
Vikram Murali, Program Director, Data Science and Machine Learning, IBM
Innovation in the Enterprise Rent-A-Car Data WarehouseDataWorks Summit
Big Data adoption is a journey. Depending on the business the process can take weeks, months, or even years. With any transformative technology the challenges have less to do with the technology and more to do with how a company adapts itself to a new way of thinking about data. Building a Center of Excellence is one way for IT to help drive success.
This talk will explore Enterprise Holdings Inc. (which operates Enterprise Rent-A-Car, National Car Rental, and Alamo Rent A Car) and its experience with Big Data. EHI's journey started in 2013 with Hadoop as a POC, and today the company is working to create the next-generation data warehouse in Microsoft's Azure cloud using a lambda architecture.
We’ll discuss the Center of Excellence, the roles in the new world, share the things which worked well, and rant about those which didn’t.
No deep Hadoop knowledge is necessary; the talk is suitable for the architect or executive level.
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionCloudera, Inc.
It’s no secret that Apache Spark is becoming the successor to MapReduce for data processing in Hadoop. With its easy development, flexible API, and performance benefits, Spark is a powerful data processing engine that has quickly gained popularity within the community. On the other hand, Hive continues to be the most widely used data warehouse/ETL engine, with large-scale adoption across enterprises. Therefore, it’s imperative to enable Spark as the underlying execution engine for Hive, seamlessly allowing existing and future Hive workloads to leverage the advantages of Spark.
With the recent release of Cloudera 5.7, we have delivered on this goal by adding support for Hive-on-Spark. Data engineers and ETL developers can now seamlessly transition their Hive workloads from MR to Spark, benefiting from the advantages of Spark without any disruption on their end.
Join Santosh Kumar, Senior Product Manager at Cloudera, and Rui Li, Apache Hive committer and engineer at Intel, as we discuss:
An Introduction to Spark and its advantages over MR
An introduction of Hive-on-Spark: Goals and Design Principles
Migrating to HoS and a live demo
Configuring and tuning for batch workloads
What’s next for both tools
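As context for the migration bullet above: switching a session (or, via hive-site.xml, a whole deployment) from MapReduce to Spark execution comes down to a single Hive property, hive.execution.engine. A minimal sketch; the tuning properties beyond this vary by workload and release:

```sql
-- In the Hive CLI or Beeline, per session:
SET hive.execution.engine=spark;
-- To revert to MapReduce:
SET hive.execution.engine=mr;
```

The same property name, set in hive-site.xml, applies the choice cluster-wide.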
Extreme Sports & Beyond: Exploring a new frontier in data with GoProCloudera, Inc.
GoPro is a powerful global brand, thanks in large part to its innovative cameras and accessories that capture moments other cameras just miss: surfing in Maui, skiing in Tahoe, recording your child’s first steps. And today, the company is nearly as well known for its user-generated social and content networks.
Join us for this special webinar hosted by Tableau, Trifacta, and Cloudera—featuring GoPro. We’ll dive into GoPro’s data strategy and architecture, from ingest and processing to data prep and reporting, all on AWS.
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...DataWorks Summit
Apache Metron (Incubating) is a streaming cybersecurity application built on Apache Storm and Hadoop. One of its core missions is to enable advanced analytics through machine learning and data science for its users. Because data science platform infrastructure that is integrated into Hadoop and oriented toward streaming analytics applications is still relatively immature, we have been forced to create the requisite platform components ourselves, utilizing many pieces of the Hadoop ecosystem.
In this talk, we will speak about the Metron analytics architecture and how it utilizes a custom data science model deployment and autodiscovery service that is tightly integrated with Hadoop via YARN and ZooKeeper. We will discuss how we interact with the deployed models via a custom domain-specific language that can query models as data streams past. We will also discuss the full-stack data science tooling that has been created to enable data science at scale in an advanced streaming analytics application.
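The model-as-a-service pattern the abstract describes can be sketched in a few lines: a registry maps model names to scoring functions, and each event is scored and enriched as it streams past. This is a simplified illustration only; all names below are hypothetical, and in Metron itself model endpoints are discovered via ZooKeeper and invoked from its Stellar query language rather than an in-process dictionary.

```python
from typing import Callable, Dict, Iterable

# Stand-in for service autodiscovery: model name -> scoring function.
MODEL_REGISTRY: Dict[str, Callable[[dict], float]] = {}

def register(name: str):
    def wrap(fn):
        MODEL_REGISTRY[name] = fn
        return fn
    return wrap

@register("dga_score")  # e.g. a toy domain-generation-algorithm detector
def dga_score(event: dict) -> float:
    # Toy heuristic for illustration: long hostnames look more suspicious.
    return min(1.0, len(event.get("host", "")) / 30.0)

def score_stream(events: Iterable[dict], model: str):
    fn = MODEL_REGISTRY[model]                   # "discover" the model
    for event in events:
        yield {**event, "score": fn(event)}      # enrich as data streams past

events = [{"host": "example.com"},
          {"host": "xk3qz9v0a2bb41mz8p7c4n5w.biz"}]
for e in score_stream(events, "dga_score"):
    print(e["host"], round(e["score"], 2))
```

The key design point, which Metron shares, is that models are addressed by name at scoring time, so they can be deployed, scaled, and replaced independently of the streaming topology.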
Data Science and Machine Learning for the EnterpriseCloudera, Inc.
Overview of Machine Learning and how the Cloudera Data Science Workbench provides full access to data while supporting IT SLAs. The presentation includes details on Fast Forward Labs and The Value of Interpretability in Models.
3 Things to Learn:
How to deploy community defined open data models to break vendor lock-in and gain complete enterprise visibility
How to open up application flexibility while building on a future proofed architecture
How to infinitely scale data storage, access, and machine learning
Part 3: Models in Production: A Look From Beginning to EndCloudera, Inc.
3 Things to Learn About:
-How to uplevel your existing analytics stack with a collaborative environment that supports the latest open source languages and libraries.
-How to get better use of your core data management investments while opening up new supported tools for data science.
-How to expand data science outside of siloed environments and enable self-service data science access.
The current Hadoop ecosystem is challenged and slowed by fragmented and duplicated efforts.
An industry standard is required that translates to immediate benefits, increasing stability, capabilities, and compatibility among Hadoop distributions. It's also important to include an open data management core, with an emphasis on making it enterprise-focused.
The ODPi is a shared industry effort focused on building such standards and on promoting and advancing the state of Big Data technologies. Linaro is actively involved in this effort, and also in making sure ODPi is ARM-compatible.
This talk will go over some of the specifications defined, Linaro's contributions, the roadmap, and a quick demo.
Yosef Kerzner's report on Toorcamp 2016. Presented at Houston Hadoop Meetup in July 2016.
• Your own drone to deliver vegetarian tacos from nearby town (of Seattle)
• Reverse engineering and attacking the .NET applications
• Hacking the North American railways, and more...
Vskills certification for Apache Cassandra Professional assesses the candidate's knowledge of the Apache Cassandra database. The certification tests candidates on various areas of Apache Cassandra, including installing, administering, and developing applications using Apache Cassandra.
http://www.vskills.in/certification/Certified-Apache-Cassandra-Professional
The TCO Calculator - Estimate the True Cost of Hadoop MapR Technologies
http://bit.ly/1wsAuRS - There are many hidden costs for Apache Hadoop, and they have different effects across different Hadoop distributions. With the new MapR TCO calculator, organisations have a simple, reliable, fact-based tool to compare those costs.
SAP HANA Cloud Portal - Overview PresentationSAP Portal
Get all the details on SAP HANA Cloud Portal, a cloud-based portal solution that delivers easy site creation and social consumption at the end-user level while leveraging SAP's differentiating assets: in-memory computing, business applications, and its installed base.
Lacking the technology to directly leverage Hadoop, some companies are forgoing its full benefits, opting to treat Hadoop as just another data source for their legacy BI tools. But storage is only one benefit of Hadoop, and treating it that way ignores its linear scalability and data flexibility across all data types. Using Hadoop natively for both storage and computation in an analytic capacity has already led to dramatic increases in business benefits. Hadoop analytics has identified over $2B in potential fraud at one of the world's largest credit card companies. Sears has reduced reporting times from 12 weeks with traditional BI to 3 days. A major internet security company increased customer conversion by 60% and revenue by $20 million. Meaningful returns are spread across Fortune 100 enterprises and fast-growing startups, with the common thread being self-service big data analytics leveraging Hadoop's native capabilities. In this talk, we'll highlight the core value proposition of building analytics natively on Hadoop, share real-world use cases that resulted in dramatic ROI, and reveal the next major step in visual big data analytics.
Oracle Cloud : Big Data Use Cases and ArchitectureRiccardo Romani
Oracle Italy Systems Presales Team presents Big Data in any flavor: on-prem, public cloud, and cloud at customer.
Presentation done at Digital Transformation event - February 2017
Strata 2015 presentation from Oracle for Big Data, announcing several new big data products including GoldenGate for Big Data, Big Data Discovery, Oracle Big Data SQL, and Oracle NoSQL.
Hot Technologies of 2013 with Robin Bloor, Rick Sherman and IBM
Live Webcast June 19, 2013
http://www.insideanalysis.com
The promise of Hadoop can be seen in all kinds of ways -- the proliferation of open source projects; the virtually limitless applications of Big Data; the sheer number of vendors getting involved. But the real value only comes from a mature environment, and that's Hadoop 2.0. What are the component parts of a robust solution? How are today's cutting-edge organizations leveraging the power of Big Data?
Register for this episode of Hot Technologies to hear veteran Analysts Dr. Robin Bloor of The Bloor Group, and Rick Sherman of Athena IT Solutions, as they offer perspective on how the Hadoop movement is shaping up. Larry Weber of IBM will then offer his take on the tools and architecture necessary to tackle the new challenges posed by Big Data. He'll discuss IBM's latest big data offerings including IBM InfoSphere BigInsights, IBM InfoSphere Streams, and IBM InfoSphere Data Explorer, and IBM's vision for simplifying an organization's big data journey.
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
Inefficient data workloads are all too common across enterprises - causing costly delays, breakages, hard-to-maintain complexity, and ultimately lost productivity. For a typical enterprise with multiple data warehouses, thousands of reports, and hundreds of thousands of ETL jobs being executed every day, this loss of productivity is a real problem. Add to all of this the complex handwritten SQL queries, and there can be nearly a million queries executed every month that desperately need to be optimized, especially to take advantage of the benefits of Apache Hadoop. How can enterprises dig through their workloads and inefficiencies to easily see which are the best fit for Hadoop and what’s the fastest path to get there?
Cloudera Navigator Optimizer is the solution - analyzing existing SQL workloads to provide instant insights and turning them into an intelligent optimization strategy so you can unlock peak performance and efficiency with Hadoop. As the newest addition to Cloudera’s enterprise Hadoop platform, and now available in limited beta, Navigator Optimizer has helped customers profile over 1.5 million queries and ultimately save millions by optimizing for Hadoop.
Azure Cafe Marketplace with Hortonworks March 31 2016Joan Novino
Azure Big Data: “Got Data? Go Modern and Monetize”.
In this session you will learn how Hortonworks Data Platform (HDP), architected, developed, and built completely in the open, provides an enterprise-ready data platform for adopting a Modern Data Architecture.
Today the terms "Big Data" and "Internet of Things" draw a lot of attention, but behind the hype there's a simple story. For decades, companies have been making business decisions based on traditional "enterprise data". Beyond that critical data, however, is a potential treasure trove of additional data: weblogs, social media, email, sensors, photographs and much more that can be mined for useful information. More and more organizations are therefore looking to include non-traditional yet potentially very valuable data with their traditional enterprise data in their business intelligence analysis.
As the world's most popular open source database, and the leading open source database for Web-based and Cloud-based applications, MySQL is a key component of numerous big data platforms. This presentation explores how you can unlock extremely valuable insights using MySQL with the Hadoop platform.
HP Helion Webinar #4 - Open stack the magic pillBeMyApp
We will go through a quick overview of the 5 years of the OpenStack cloud computing platform. This webinar explains the short history of this fast-growing open-source initiative and tries to answer the common questions about the place of infrastructure and platform services in the IT hierarchy.
The technology is ready, but are we ready for the cloud adoption? Does it really solve our business problems? Learn the basic terminology, get an insight about the IT operation and development transition steps required to win the efficiency race.
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Stefan Lipp
Take Data Management to the next level: connect Analytics and Machine Learning in a single governed platform consisting of a curated, portable open source stack. Run this platform on-prem, hybrid, or multicloud; reuse code and models; avoid lock-in.
Level Up – How to Achieve Hadoop AccelerationInside Analysis
The Briefing Room with Robin Bloor and HP Vertica
Live Webcast on August 26, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=3dd6d1b068fe395f665c75adb682ac41
Hadoop has long passed the point of being a nascent technology, but many users have found that when left to its own devices, Hadoop can be a one trick pony. To get the most out of Hadoop, organizations need a flexible platform that empowers analysts and data managers with a complete set of information lifecycle management and analytics tools without a performance tradeoff.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor as he outlines Hadoop’s role in a big data architecture. He’ll be briefed by Walt Maguire of HP Vertica, who will showcase his company’s big data solutions, including HAVEn and the HP Big Data Platform. He will demonstrate how HP Vertica acts as a complement to Hadoop, and how the combination of the two provides a versatile and highly performant solution.
Visit InsideAnalysis.com for more information.
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...NoSQLmatters
Come to this deep dive on how Pivotal's Data Lake Vision is evolving by embracing next generation in-memory data exchange and compute technologies around Spark and Tachyon. Did we say Hadoop, SQL, and what's the shortest path to get from past to future state? The next generation of data lake technology will leverage the availability of in-memory processing, with an architecture that supports multiple data analytics workloads within a single environment: SQL, R, Spark, batch and transactional.
Big Data Retrospective - STL Big Data IDEA Jan 2019Adam Doyle
Slides from the STL Big Data IDEA meeting from January 2019. The presenters discussed technologies to continue using, stop using, and start using in 2019.
Transform Your Business with Big Data and Hortonworks Pactera_US
Customer insight and marketplace predictions are a few of the profitable benefits found in big data technology. Leading companies are using the advanced analytics solution to find new revenue streams, increase customer satisfaction and optimize the supply chain.
Houston Technology Center presentation by SHMsoft. eDiscovery, data governance, and compliance vision that can be built on Hadoop clusters and public or private clouds.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I found myself wondering, as an “infrastructure container Kubernetes guy,” how this fancy AI technology gets managed from an infrastructure operations view. Is it possible to apply our beloved cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already gotten working for real.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We then had a lovely workshop with the participants, trying to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
2. • Market Background
• Who is Altiscale?
• Why are we different/better?
• Hadoop Admin
• Apache Hadoop Stack
• Platform/Access/Demo
• Q/A
Big Data As A Service
5. Big Data in The Cloud is Accelerating
Hadoop deployment mix: On-Premises 32%, Cloud Only 23%, Cloud Plus On-Premises 29%
Source: “Hadoop Expansion Boosts Cloud and Unsupported On-Premises Deployments,” Merv Adrian, Nick Heudecker, 3 September 2015
6. But the journey has dangers
Gartner: 70% of independent Big Data implementations will fail to meet revenue and cost objectives through 2018.
8. Altiscale Data Cloud GA in 2014
Financed by top-tier technology investors
Recognized innovator in Hadoop-as-a-Service
About Altiscale
9. About Altiscale
Led by experienced, renowned Hadoop team from Yahoo!
• Raymie Stata, CEO. Former Yahoo! CTO,
well-known advocate of Apache Software Foundation
• David Chaiken, CTO. Former Yahoo! Chief Architect
Built and managed by veterans of Big Data, SaaS, and enterprise software
• From Google, Netflix, LinkedIn, VMware, Oracle, and Yahoo!
40,000 nodes
500 PB
1,000 users
$ billions at stake
Raymie Stata, CEO · David Chaiken, CTO · Ricardo Jenez, VP of Engineering · Charles Wimmer, Head of Operations
10. Big data built for speed
Fast time to value—days not months
Easier, faster scalability—with elastic scaling
Operations support—so your jobs get done
Lower TCO—for fast investment payback
16. Altiscale Data Cloud is 100% based on Apache open source.
Our current Altiscale Data Cloud 4.0 release is composed of the following Apache components and versions:
• Apache Hadoop 2.7.1
• Apache Spark 1.5*
• Apache Hive (& HCatalog) 1.2
• Apache Tez 0.7.0
• Apache Pig 0.15.1
• Apache Oozie 4.2.0
• Apache Flume 1.5.2
• Avro 1.7.4
• JDK/JRE 7 (Sun/Oracle version)
• HttpFS
In addition to the above, we also support the three latest versions of Spark for our customers. That gives our customers the option of a conservative approach as well as the option to work with the “bleeding edge” of the fast-moving Spark community.
Concurrency with Apache Versioning
17. Hire an expert to take care of the cluster
• Hardware setup and Cluster installation
• Address hardware failure
• Upgrade Hadoop stack
• Tuning config parameters
• yarn-site.xml, e.g. yarn.nodemanager.resource.memory-mb
• mapred-site.xml, e.g. mapreduce.task.io.sort.mb
• hdfs-site.xml, e.g. dfs.blocksize
Hadoop Administration
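The tuning parameters named above live in ordinary Hadoop XML configuration files. As a minimal sketch, a yarn-site.xml override might look like the following (the value shown is purely illustrative; real settings depend on each node's hardware):

```xml
<?xml version="1.0"?>
<!-- yarn-site.xml fragment: illustrative value only -->
<configuration>
  <property>
    <!-- Total memory (MB) per node that YARN may allocate to containers -->
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>57344</value>
  </property>
</configuration>
```

The mapred-site.xml and hdfs-site.xml properties listed above are set the same way, each as a name/value property in its respective file.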
19. Spark example
• Build the Spark code on your laptop using Maven
• Build the jar and copy it over to Altiscale’s workbench (gateway) node
• Launch the Spark job on YARN
• Monitor using the YARN ResourceManager UI
Quick Spark Demo
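The demo steps above can be sketched as a few shell commands. This is a hedged illustration only: the hostname, jar name, and class name are hypothetical placeholders, not Altiscale specifics.

```shell
# 1. Build the Spark application jar locally with Maven
mvn -DskipTests clean package

# 2. Copy the jar to the workbench (gateway) node
#    "workbench.example.com" is a placeholder hostname
scp target/spark-demo-1.0.jar user@workbench.example.com:~/

# 3. On the gateway node, launch the job on YARN
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.SparkDemo \
  spark-demo-1.0.jar

# 4. Monitor progress in the YARN ResourceManager web UI
#    (typically http://<resourcemanager-host>:8088)
```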