This talk will detail the HSBC Big Data journey to date, walking through the genesis of the Big Data initiative, which was triggered by continual challenges in delivering data-driven products. The global scale, diversity and legacy of an organization like HSBC present challenges for Hadoop adoption not typically faced by younger companies. Big Data technologies are by their very nature disruptive to the established enterprise IT environment. Hadoop and the peripheral toolsets in the Big Data ecosystem do not fit comfortably into an enterprise data centre or IT operational processes, and can even prove disruptive to current organization structures. Alasdair will focus on the steps HSBC has taken to mitigate concerns about Hadoop and to raise awareness of the game-changing benefits a successful adoption of the technology will bring. HSBC has taken an innovative approach to proving out the value of the technology, engaging developers with a brakes-off opportunity to use the platform and placing Hadoop in a competitive scenario with traditional technologies. The Hadoop journey in HSBC was initiated in Scotland, blessed in London and proved out in China.
Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A S... (Revolution Analytics)
Everyone involved in high-stakes analytics wants power, speed and flexibility, regardless of the size of the data set and the complexity of the analysis. Trailblazing organizations that have deployed IBM Netezza Analytics on their IBM Netezza data warehouse appliances (TwinFin) together with Revolution R Enterprise are getting all three.
Geometric's interoperability solution offers greater efficiencies through Business Process Integration in the product realization value stream, speeding up our customers' business processes across disparate PLM systems.
Revolution R Enterprise - 100% R and More Webinar Presentation (Revolution Analytics)
R users already know why the R language is the lingua franca of statisticians today: because it's the most powerful statistical language in the world. Revolution Analytics builds on the power of open source R, and adds performance, productivity and integration features to create Revolution R Enterprise. In this presentation, author and blogger David Smith will introduce the additional capabilities of Revolution R Enterprise.
This informative presentation on the integration of PLM and ERP comes to you from Barry-Wehmiller International Resources (BWIR), global services & solutions partner to SolidWorks Enterprise PDM. It was presented at SolidWorks World 2010 in the specific context of integrating various ERP systems with Enterprise PDM. This presentation covers:
1. Role of PDM & ERP in Product Lifecycle
2. Need for integration between PDM/PLM and ERP
3. Understanding Industry-specific demands
4. SolidWorks Enterprise PDM and ERP integration
5. Case Study 1: SolidWorks EPDM – Infor XA Integration
6. Case Study 2: SolidWorks EPDM – SAP Integration
To Each Their Own: How to Solve Analytic Complexity (Inside Analysis)
The Briefing Room with Shawn Rogers and Noetix
Slides from the Live Webcast on Aug. 14, 2012
One size will never fit all in the complex world of information management. In fact, the variety of information systems in use continues to expand. That includes all kinds of systems: data-producing applications, data-processing apps, and the downstream tools used for reporting and analytics. How can data-savvy organizations stay ahead of the curve?
Check out this episode of The Briefing Room to learn from Analyst Shawn Rogers of Enterprise Management Associates, who will explain how effective use of standard data models can solve the complexity of increasingly heterogeneous information architectures. Rogers will be briefed by Daryl Orts of Noetix who will tout his company’s wide range of industry and application-specific data models which can be used to satisfy the particular needs of today’s diverse user community.
For more information, visit: http://www.insideanalysis.com
PDM: Key for Successful Management of Corporate-wide Product Data
Master data and parts management
Reuse, transparency, integrated network
Standard parts management
Atos organized the workshop "Reinventing the Media", held on 26 January 2012. We had the participation of Santiago Miralles, Director General of Inout TV, Digital Entertainment, who shared his view of the current landscape of the media sector, and together we discussed how Atos drives and supports the transformation of this sector.
We are facing a new era of media that brings new forms of personal communication, new devices and the expansion of broadband. IT services are a strategic element for the growth and transformation of organizations, making them a factor in a company's success and survival.
Atos Origin and Siemens IT Solutions & Services have joined to form Atos. With a presence in more than 42 countries and a workforce of 74,000 business technologists, we offer our clients deep market knowledge, an even more global presence and an impressive portfolio of services. We understand the challenges your business faces and offer IT solutions that meet your needs.
How do you combine comprehensive analysis running on large amounts of data with the responsiveness demanded of today's API services?
This talk illustrates one of the recipes we currently use at ING to tackle this problem. Our analytical stack combines machine learning algorithms running on a Hadoop cluster with API services executed by an Akka cluster.
Cassandra is used as a 'latency adapter' between the fast and the slow path. Our API services are executed by the Akka/Spray layer. Those services consume both live data sources and intermediate results promoted by the Hadoop layer via Cassandra. This approach allows us to provide internal API services that are both complete and responsive.
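To make the pattern concrete, here is a minimal Python sketch of the 'latency adapter' read path, assuming a hypothetical analytics.customer_scores table populated by the batch layer; the host, keyspace and schema are illustrative, not ING's:

```python
# Minimal sketch of the "latency adapter" read path: an API handler
# fetches results that the Hadoop batch layer precomputed into
# Cassandra. Keyspace, table and host names are hypothetical.
from typing import Optional
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-host"])      # assumed contact point
session = cluster.connect("analytics")     # hypothetical keyspace

def customer_score(customer_id: str) -> Optional[float]:
    """Serve a batch-computed score with low latency."""
    row = session.execute(
        "SELECT score FROM customer_scores WHERE customer_id = %s",
        (customer_id,),
    ).one()
    return row.score if row else None
```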
API Adoption Patterns in Banking & The Promise of Microservices (Akana)
Akana VP of Product Marketing, Sachin Agarwal, explains API adoption patterns that are specific to banking, and how microservices can be used to help develop financial applications.
Fundamentals of Big Data, Hadoop project design and a use case study
General planning considerations and the key requirements in the Hadoop ecosystem and Hadoop projects.
These provide the basis for choosing the right Hadoop implementation, integrating and adopting Hadoop technologies, and creating an infrastructure.
Building applications using Apache Hadoop, with a real-life Wi-Fi log analysis use case.
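As a flavour of what such a project involves, here is a minimal Hadoop Streaming sketch in Python for a hypothetical Wi-Fi access log (one line per connection: timestamp, access-point id, client MAC, bytes); the log format and paths are assumptions for illustration:

```python
#!/usr/bin/env python3
# mapper.py -- Hadoop Streaming mapper for a hypothetical Wi-Fi log with
# lines of the form "<timestamp> <ap_id> <client_mac> <bytes>".
# Emits "<ap_id>\t1" so connections can be counted per access point.
import sys

for line in sys.stdin:
    fields = line.split()
    if len(fields) >= 2:
        print(f"{fields[1]}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- sums the counts emitted by the mapper for each key.
# Hadoop Streaming delivers the mapper output sorted by key.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current_key:
        if current_key is not None:
            print(f"{current_key}\t{count}")
        current_key, count = key, 0
    count += int(value)
if current_key is not None:
    print(f"{current_key}\t{count}")

# Example invocation (paths are illustrative):
#   hadoop jar hadoop-streaming.jar -input /logs/wifi -output /out \
#     -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py
```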
Apache Hadoop is quickly becoming the technology of choice for organizations investing in big data, powering their next-generation data architecture. With Hadoop serving as both a scalable data platform and computational engine, data science is re-emerging as a centerpiece of enterprise innovation, with applied data solutions such as online product recommendation, automated fraud detection and customer sentiment analysis. In this talk Ofer will provide an overview of data science and how to take advantage of Hadoop for large-scale data science projects:
* What is data science?
* How can techniques like classification, regression, clustering and outlier detection help your organization?
* What questions do you ask and which problems do you go after?
* How do you instrument and prepare your organization for applied data science with Hadoop?
* Who do you hire to solve these problems?
You will learn how to plan, design and implement a data science project with Hadoop.
Franciscan Alliance Blazes New Trails in Healthcare Delivery (Avaya Inc.)
Franciscan Alliance operates 13 hospitals and more than 170 medical practices across Indiana, Illinois and Michigan. Avaya Fabric Networking gave them the bandwidth they need to support future technologies and the flexibility to grow. Learn more: http://bit.ly/1ICcUww
Sales effectivity and business intelligence (marekdan)
Some information about second-generation (in-memory) business intelligence, Tibco Spotfire and InfomatiX view, and how to use BI and mobile solutions to increase sales and marketing effectiveness.
Big Data Beyond Hadoop*: Research Directions for the Future (Odinot Stanislas)
Michael Wrinn, Research Program Director, University Research Office, Intel Corporation
Jason Dai, Engineering Director and Principal Engineer, Intel Corporation
Webinar: Open Source Business Intelligence Intro (SpagoWorld)
The presentation supported the webinar delivered by Stefano Scamuzzo, SpagoBI International Manager, on 22nd December 2010 within SpagoWorld Webinar Center. http://www.spagoworld.org/
Watch full webinar here: https://bit.ly/2vN59VK
Data virtualization started to evolve as the most agile and real-time enterprise data fabric; it is now proving to go beyond its initial promise and is becoming one of the most important enterprise big data fabrics.
Attend this session to learn:
- What data virtualization really is.
- How it differs from other enterprise data integration technologies.
- Why data virtualization is finding enterprise-wide deployment inside some of the largest organizations.
Building a business intelligence architecture fit for the 21st century by Jon... (Mark Tapley)
Objectives of the presentation:
- To record some history – what has happened in the past that makes the future quite challenging.
- To provide real examples of BI at work – good and bad.
- To illustrate the nature of data and why it has become so important in driving the business forward in the 21st century.
- To outline a way to align technology with the business so that effort and budget are spent in a way that will enable the future rather than support the past.
- To propose a set of principles and ideas that can guide a company in making data available to all who have the penchant to turn it into useful and valuable information.
- To describe the new organisation unit that will be needed to realise the dream.
About ActuateOne for Utility Analytics
Water and Energy Utilities are under tremendous pressure to demonstrate progress in asset optimization, grid optimization and performance gains across traditional business drivers such as customers, revenue protection, utility regulatory compliance and financials. ActuateOne for Utility Analytics provides a comprehensive portfolio of software and utility analytics industry expertise to ensure today’s utility leaders and customers always have access to the right information, insight and collaborative capabilities for accurate and informed decisions. Delivered through a single platform, ActuateOne for Utility Analytics ignites any utility or grid Analytics initiative with integrated asset optimization dashboards, grid optimization dashboards, utility compliance reports as well as Transformer Management Scorecards, Substation & Equipment Management Scorecards and Utility KPI Dashboards which help today’s Utility enhance performance and maximize grid performance.
A Strategic View of Enterprise Reporting and Analytics: The Data Funnel (Inside Analysis)
The Briefing Room with Colin White and Jaspersoft
Slides from the Live Webcast on June 12, 2012
As the corporate appetite for analytics and reporting grows, companies must find a way to secure a strategic view of their information architecture. End users with varying degrees of expertise need a wide range of data and reports delivered in a timely fashion. As the audience for analytics expands, that puts pressure on IT infrastructure and staff. And now with the promise of Hadoop and MapReduce, the organization's desire for business insight becomes even more significant.
In this episode of The Briefing Room, veteran Analyst Colin White of BI Research will explain the value of being strategic with enterprise reporting. White will be briefed by Karl Van den Bergh of Jaspersoft, who will tout his company's “data funnel” concept, which is designed to strategically manage an organization's information architecture. By aligning information assets along this funnel, IT can effectively address the spectrum of analytical needs – from simple reporting to complex, ad hoc analysis – without over-taxing personnel and system resources.
Microsoft Business Intelligence Vision and Strategy (Nic Smith)
Microsoft Business Intelligence slide deck: learn the Microsoft vision and strategy for business intelligence. These slides include the offering and value proposition for Microsoft BI.
David Thoumas, OpenDataSoft CTO, on data API strategy (rich API vs. multiple endpoints) for broadcasting data and doing business
At APIdays 2012, the 1st European event dedicated to API world
Karya develops mobile application services that fit the unique needs of your business. Our Mobile Application Services help users better utilize the power of mobile technology.
Introduction: This workshop will provide a hands-on introduction to Machine Learning (ML) with an overview of Deep Learning (DL).
Format: An introductory lecture on several supervised and unsupervised ML techniques, followed by a light introduction to DL and a short discussion of the current state of the art. Several Python code samples using the scikit-learn library will be introduced, which users will be able to run in the Cloudera Data Science Workbench (CDSW).
Objective: To provide a quick hands-on introduction to ML with Python's scikit-learn library. The environment in CDSW is interactive, and the step-by-step guide will walk you through setting up your environment, exploring datasets, and training and evaluating models on popular datasets. By the end of the crash course, attendees will have a high-level understanding of popular ML algorithms and the current state of DL, what problems they can solve, and will walk away with basic hands-on experience training and evaluating ML models.
Prerequisites: For the hands-on portion, registrants must bring a laptop with a Chrome or Firefox web browser. These labs will be done in the cloud, no installation needed. Everyone will be able to register and start using CDSW after the introductory lecture concludes (about 1hr in). Basic knowledge of python highly recommended.
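For a taste of the hands-on portion, the following is a minimal scikit-learn example of the train/evaluate loop on a popular dataset; the specific model and dataset are illustrative choices, not necessarily the ones used in the labs:

```python
# A minimal sketch of the kind of scikit-learn workflow the workshop
# walks through: load a popular dataset, train a supervised model,
# and evaluate it on a held-out split.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(f"accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")
```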
Floating on a RAFT: HBase Durability with Apache Ratis (DataWorks Summit)
In a world with a myriad of distributed storage systems to choose from, the majority of Apache HBase clusters still rely on Apache HDFS. Theoretically, any distributed file system could be used by HBase. One major reason HDFS is predominantly used is the specific durability requirements of HBase's write-ahead log (WAL), which HDFS guarantees correctly. However, HBase's use of HDFS for WALs can be replaced with sufficient effort.
This talk will cover the design of a "Log Service" which can be embedded inside of HBase that provides a sufficient level of durability that HBase requires for WALs. Apache Ratis (incubating) is a library-implementation of the RAFT consensus protocol in Java and is used to build this Log Service. We will cover the design choices of the Ratis Log Service, comparing and contrasting it to other log-based systems that exist today. Next, we'll cover how the Log Service "fits" into HBase and the necessary changes to HBase which enable this. Finally, we'll discuss how the Log Service can simplify the operational burden of HBase.
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi (DataWorks Summit)
Utilizing Apache NiFi, we read various open data REST APIs and camera feeds to ingest crime and related data in real time, streaming it into HBase and Phoenix tables. HBase makes an excellent storage option for our real-time, time-series data sources. We can immediately query our data with Apache Zeppelin against Phoenix tables, as well as via Hive external tables over HBase.
Apache Phoenix tables are also a great option, since we can easily put microservices on top of them for application usage. I have an example Spring Boot application that reads from our Philadelphia crime table for front-end web applications as well as RESTful APIs.
Apache NiFi makes it easy to push records with schemas to HBase and insert into Phoenix SQL tables; a query sketch follows the resource links below.
Resources:
https://community.hortonworks.com/articles/54947/reading-opendata-json-and-storing-into-phoenix-tab.html
https://community.hortonworks.com/articles/56642/creating-a-spring-boot-java-8-microservice-to-read.html
https://community.hortonworks.com/articles/64122/incrementally-streaming-rdbms-data-to-your-hadoop.html
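As a sketch of the kind of query a microservice might run against such tables, here is a hypothetical Python example using the phoenixdb client against the Phoenix Query Server; the table and column names are illustrative, not the actual crime schema:

```python
# Hypothetical sketch of querying a Phoenix crime table from Python
# through the Phoenix Query Server, in the spirit of the microservices
# above. Table and column names are illustrative, not the real schema.
import phoenixdb

conn = phoenixdb.connect("http://phoenix-queryserver:8765/", autocommit=True)
cursor = conn.cursor()
cursor.execute("SELECT crime_type, COUNT(*) FROM crimes GROUP BY crime_type")
for crime_type, total in cursor.fetchall():
    print(crime_type, total)
conn.close()
```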
HBase Tales From the Trenches - Short stories about most common HBase operati... (DataWorks Summit)
Whilst HBase is the most logical answer for use cases requiring random, real-time read/write access to Big Data, it may not be trivial to design applications that make the most of it, nor the simplest to operate. As it depends on and integrates with other components from the Hadoop ecosystem (ZooKeeper, HDFS, Spark, Hive, etc.) or external systems (Kerberos, LDAP), and its distributed nature requires a "Swiss clockwork" infrastructure, many variables must be considered when observing anomalies or even outages. Adding to the equation, HBase is still an evolving product, with different release versions in current use, some of which carry genuine software bugs. In this presentation, we'll go through the most common HBase issues faced by different organisations, describing the identified causes and resolution actions from my last 5 years supporting HBase for our heterogeneous customer base.
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac... (DataWorks Summit)
LocationTech GeoMesa enables spatial and spatiotemporal indexing and queries for HBase and Accumulo. In this talk, after an overview of GeoMesa’s capabilities in the Cloudera ecosystem, we will dive into how GeoMesa leverages Accumulo’s Iterator interface and HBase’s Filter and Coprocessor interfaces. The goal will be to discuss both what spatial operations can be pushed down into the distributed database and also how the GeoMesa codebase is organized to allow for consistent use across the two database systems.
OCLC has been using HBase since 2012 to enable single-search-box access to over a billion items from your library and the world's library collection. This talk will provide an overview of how HBase is structured to provide this information, some of the challenges encountered in scaling to support the world catalog, and how they have been overcome.
Many individuals/organizations have a desire to utilize NoSQL technology, but often lack an understanding of how the underlying functional bits can be utilized to enable their use case. This situation can result in drastic increases in the desire to put the SQL back in NoSQL.
Since the initial commit, Apache Accumulo has provided a number of examples to help jumpstart comprehension of how some of these bits function as well as potentially help tease out an understanding of how they might be applied to a NoSQL friendly use case. One very relatable example demonstrates how Accumulo could be used to emulate a filesystem (dirlist).
In this session we will walk through the dirlist implementation. Attendees should come away with an understanding of the supporting table designs, a simple text search supporting a single wildcard (on file/directory names), and how the dirlist elements work together to accomplish its feature set. Attendees should (hopefully) also come away with a justification for sometimes keeping the SQL out of NoSQL.
HBase Global Indexing to support large-scale data ingestion at Uber (DataWorks Summit)
Data serves as the platform for decision-making at Uber. To facilitate data driven decisions, many datasets at Uber are ingested in a Hadoop Data Lake and exposed to querying via Hive. Analytical queries joining various datasets are run to better understand business data at Uber.
Data ingestion, at its most basic form, is about organizing data to balance efficient reading and writing of newer data. Data organization for efficient reading involves factoring in query patterns to partition data to ensure read amplification is low. Data organization for efficient writing involves factoring the nature of input data - whether it is append only or updatable.
At Uber we ingest terabytes of data into many critical tables, such as trips, that are updatable. These tables are a fundamental part of Uber's data-driven solutions and act as the source of truth for all the analytical use cases across the entire company. Datasets such as trips constantly receive updates apart from inserts. To ingest such datasets we need a critical component that is responsible for bookkeeping the data layout and annotates each incoming change with the location in HDFS where the data should be written. This component is called Global Indexing. Without it, all records are treated as inserts and re-written to HDFS instead of being updated, leading to duplication of data and breaking data correctness and user queries. The component is key to scaling our jobs: we now handle greater than 500 billion writes a day in our current ingestion systems. It needs strong consistency and must provide high throughput for index writes and reads.
At Uber, we have chosen HBase as the backing store for the Global Indexing component, and it is critical in allowing us to scale our jobs to greater than 500 billion writes a day in our current ingestion systems. In this talk, we will discuss data@Uber and expound on why we built the global index using Apache HBase and how this helps scale our cluster usage. We'll give details on why we chose HBase over other storage systems; how and why we came up with a creative solution to load HFiles directly to the backend, circumventing the normal write path when bootstrapping our ingestion tables to avoid QPS constraints; as well as other learnings from bringing this system into production at the scale of data that Uber encounters daily.
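A conceptual sketch of the lookup such a component performs is below, using the happybase client against a hypothetical global_index table; this illustrates the idea rather than Uber's actual implementation:

```python
# Conceptual sketch (not Uber's implementation) of a global index lookup
# backed by HBase via the Thrift gateway: map a record key to the HDFS
# file that currently holds the record, so an update can be routed there
# instead of being re-written as an insert. Names are hypothetical.
from typing import Optional
import happybase

connection = happybase.Connection("hbase-thrift-host")  # assumed gateway
index = connection.table("global_index")                # hypothetical table

def locate(record_key: bytes) -> Optional[str]:
    """Return the HDFS path annotated for this record, if indexed."""
    row = index.row(record_key, columns=[b"loc:hdfs_path"])
    path = row.get(b"loc:hdfs_path")
    return path.decode() if path else None

def annotate(record_key: bytes, hdfs_path: str) -> None:
    """Record where this key's data now lives after a write."""
    index.put(record_key, {b"loc:hdfs_path": hdfs_path.encode()})
```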
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix (DataWorks Summit)
Recently, Apache Phoenix has been integrated with the Apache Omid (incubating) transaction processing service to provide ultra-high system throughput with ultra-low latency overhead. Phoenix has been shown to scale beyond 0.5M transactions per second with sub-5ms latency for short transactions on industry-standard hardware. On the other hand, Omid has been extended to support secondary indexes, multi-snapshot SQL queries, and massive-write transactions.
These innovative features make Phoenix an excellent choice for translytics applications, which allow converged transaction processing and analytics. We share the story of building the next-gen data tier for advertising platforms at Verizon Media that exploits Phoenix and Omid to support multi-feed real-time ingestion and AI pipelines in one place, and discuss the lessons learned.
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi (DataWorks Summit)
Cybersecurity requires an organization to collect data, analyze it, and alert on cyber anomalies in near real time. This is a challenging endeavor considering the variety of data sources that need to be collected and analyzed: everything from application logs, network events, authentication systems, IoT devices, business events, and cloud service logs needs to be taken into consideration. In addition, multiple data formats need to be transformed and conformed so they can be understood by both humans and ML/AI algorithms.
To solve this problem, the Aetna Global Security team developed the Unified Data Platform based on Apache NiFi, which allows them to remain agile and adapt to new security threats and the onboarding of new technologies in the Aetna environment. The platform currently has over 60 different data flows with 95% doing real-time ETL and handles over 20 billion events per day. In this session learn from Aetna’s experience building an edge to AI high-speed data pipeline with Apache NiFi.
In the healthcare sector, data security, governance, and quality are crucial for maintaining patient privacy and ensuring the highest standards of care. At Florida Blue, the leading health insurer of Florida serving over five million members, there is a multifaceted network of care providers, business users, sales agents, and other divisions relying on the same datasets to derive critical information for multiple applications across the enterprise. However, maintaining consistent data governance and security for protected health information and other extended data attributes has always been a complex challenge that did not easily accommodate the wide range of needs for Florida Blue’s many business units. Using Apache Ranger, we developed a federated Identity & Access Management (IAM) approach that allows each tenant to have their own IAM mechanism. All user groups and roles are propagated across the federation in order to determine users’ data entitlement and access authorization; this applies to all stages of the system, from the broadest tenant levels down to specific data rows and columns. We also enabled audit attributes to ensure data quality by documenting data sources, reasons for data collection, date and time of data collection, and more. In this discussion, we will outline our implementation approach, review the results, and highlight our “lessons learned.”
Presto: Optimizing Performance of SQL-on-Anything Engine (DataWorks Summit)
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, in the last few years Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments over Object Stores, HDFS, NoSQL and RDBMS data stores.
With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, the recently introduced Cost-Based Optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk will explore this topic in detail as well as discuss best use cases for Presto across several industries. In addition, we will present recent Presto advancements such as geospatial analytics at scale and the project roadmap going forward.
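For reference, issuing a query against a Presto coordinator from Python takes only a few lines with the presto-python-client package; the host, catalog and table names below are placeholders, not a specific deployment:

```python
# Minimal sketch of querying Presto from Python via presto-python-client.
# Coordinator host, catalog, schema and table are illustrative.
import prestodb

conn = prestodb.dbapi.connect(
    host="presto-coordinator", port=8080,
    user="analyst", catalog="hive", schema="default",
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM web_events")  # hypothetical table
print(cursor.fetchone()[0])
```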
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl... (DataWorks Summit)
Specialized tools for machine learning development and model governance are becoming essential. MLflow is an open source platform for managing the machine learning lifecycle. Just by adding a few lines of code to the function or script that trains their model, data scientists can log parameters, metrics, artifacts (plots, miscellaneous files, etc.) and a deployable packaging of the ML model. Every time that function or script is run, the results are logged automatically as a byproduct of those lines of code, even if the party doing the training run makes no special effort to record them. MLflow application programming interfaces (APIs) are available for the Python, R and Java programming languages, and MLflow sports a language-agnostic REST API as well. Over a relatively short time period, MLflow has garnered more than 3,300 stars on GitHub, almost 500,000 monthly downloads and 80 contributors from more than 40 companies. Most significantly, more than 200 companies are now using MLflow. We will demo the MLflow Tracking, Project and Model components with Azure Machine Learning (AML) Services and show you how easy it is to get started with MLflow on-prem or in the cloud.
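The "few lines of code" claim is easy to illustrate; this minimal MLflow Tracking sketch logs a parameter, a metric and a deployable model as a byproduct of a training run (the model and dataset are arbitrary examples, not from the talk):

```python
# Minimal MLflow Tracking sketch: log params, metrics and a model
# as a byproduct of an ordinary scikit-learn training run.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run():
    alpha = 0.5
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    mlflow.log_param("alpha", alpha)
    mlflow.log_metric("r2", model.score(X_test, y_test))
    mlflow.sklearn.log_model(model, "model")  # deployable packaging
```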
Extending Twitter's Data Platform to Google Cloud (DataWorks Summit)
Twitter's Data Platform is built using multiple complex open source and in-house projects to support data analytics on hundreds of petabytes of data. Our platform supports storage, compute, data ingestion, discovery and management, plus various tools and libraries that help users with both batch and real-time analytics. Our Data Platform operates on multiple clusters across different data centers to help thousands of users discover valuable insights. As we were scaling our Data Platform to multiple clusters, we also evaluated various cloud vendors to support use cases outside of our data centers. In this talk we share our architecture and how we extend our data platform to use the cloud as another data center. We walk through our evaluation process and the challenges we faced supporting data analytics at Twitter scale in the cloud, and present our current solution. Extending Twitter's Data Platform to the cloud was a complex task, which we dive into deeply in this presentation.
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi (DataWorks Summit)
At Comcast, our team has been architecting a customer experience platform which is able to react to near-real-time events and interactions and deliver appropriate and timely communications to customers. By combining the low latency capabilities of Apache Flink and the dataflow capabilities of Apache NiFi we are able to process events at high volume to trigger, enrich, filter, and act/communicate to enhance customer experiences. Apache Flink and Apache NiFi complement each other with their strengths in event streaming and correlation, state management, command-and-control, parallelism, development methodology, and interoperability with surrounding technologies. We will trace our journey from starting with Apache NiFi over three years ago and our more recent introduction of Apache Flink into our platform stack to handle more complex scenarios. In this presentation we will compare and contrast which business and technical use cases are best suited to which platform and explore different ways to integrate the two platforms into a single solution.
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger (DataWorks Summit)
Companies are increasingly moving to the cloud to store and process data. One of the challenges they face is securing data across hybrid environments while centrally managing policies in an easy way. In this session, we will talk through how companies can use Apache Ranger to protect access to data both on-premise and in cloud environments. We will go into detail on the challenges of hybrid environments and how Ranger can solve them. We will also talk through how companies can further enhance security by leveraging Ranger to anonymize or tokenize data while moving it into the cloud, and to de-anonymize it dynamically using Apache Hive or Apache Spark, or when accessing data from cloud storage systems. We will also deep dive into Ranger's integration with AWS S3, AWS Redshift and other cloud-native systems. We will wrap up with an end-to-end demo showing how policies can be created in Ranger and used to manage access to data in different systems, anonymize or de-anonymize data, and track where data is flowing.
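To illustrate centralized policy management, here is a hedged sketch of creating a policy through Ranger's public REST API; the service name, resource values and admin credentials are placeholders to adapt to your deployment:

```python
# Hedged sketch of creating an access policy via Ranger's public REST
# API (POST /service/public/v2/api/policy). Service, resources and
# credentials are illustrative placeholders.
import requests

policy = {
    "service": "cm_hive",              # hypothetical Ranger service name
    "name": "analysts_read_sales",
    "resources": {
        "database": {"values": ["sales"]},
        "table": {"values": ["transactions"]},
        "column": {"values": ["*"]},
    },
    "policyItems": [{
        "users": [],
        "groups": ["analysts"],
        "accesses": [{"type": "select", "isAllowed": True}],
    }],
}

resp = requests.post(
    "http://ranger-admin:6080/service/public/v2/api/policy",
    json=policy, auth=("admin", "admin"),  # placeholder credentials
)
resp.raise_for_status()
```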
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory... (DataWorks Summit)
Advanced Big Data Processing frameworks have been proposed to harness the fast data transmission capability of Remote Direct Memory Access (RDMA) over high-speed networks such as InfiniBand, RoCEv1, RoCEv2, iWARP, and OmniPath. However, with the introduction of the Non-Volatile Memory (NVM) and NVM express (NVMe) based SSD, these designs along with the default Big Data processing models need to be re-assessed to discover the possibilities of further enhanced performance. In this talk, we will present, NRCIO, a high-performance communication runtime for non-volatile memory over modern network interconnects that can be leveraged by existing Big Data processing middleware. We will show the performance of non-volatile memory-aware RDMA communication protocols using our proposed runtime and demonstrate its benefits by incorporating it into a high-performance in-memory key-value store, Apache Hadoop, Tez, Spark, and TensorFlow. Evaluation results illustrate that NRCIO can achieve up to 3.65x performance improvement for representative Big Data processing workloads on modern data centers.
Background: Some early applications of Computer Vision in Retail arose from e-commerce use cases - but increasingly, it is being used in physical stores in a variety of new and exciting ways, such as:
● Optimizing merchandising execution, in-stocks and sell-thru
● Enhancing operational efficiencies, enabling real-time customer engagement
● Enhancing loss prevention capabilities and response time
● Creating frictionless experiences for shoppers
Abstract: This talk will cover the use of Computer Vision in Retail, the implications to the broader Consumer Goods industry and share business drivers, use cases and benefits that are unfolding as an integral component in the remaking of an age-old industry.
We will also take a ‘peek under the hood’ of Computer Vision and Deep Learning, sharing technology design principles and skill set profiles to consider before starting your CV journey.
Deep learning has matured considerably in the past few years to produce human or superhuman abilities in a variety of computer vision paradigms. We will discuss ways to recognize these paradigms in retail settings, collect and organize data to create actionable outcomes with the new insights and applications that deep learning enables.
We will cover the basics of object detection, then move into the advanced processing of images, describing possible ways a retail store of the near future could operate: a deep learning system attached to a camera stream identifying various storefront situations, such as item stocks on shelves, a shelf in need of organization, or perhaps a wandering customer in need of assistance.
We will also cover how to use a computer vision system to automatically track customer purchases to enable a streamlined checkout process, and how deep learning can power plausible wardrobe suggestions based on what a customer is currently wearing or purchasing.
Finally, we will cover the various technologies powering these applications today: deep learning tools for research and development, production tools to distribute that intelligence to an entire inventory of cameras situated around a retail location, and tools for exploring and understanding the new data streams produced by the computer vision systems.
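As a generic illustration of the object-detection building block these applications rest on, the sketch below runs an off-the-shelf torchvision detector over a single frame; the image file and confidence threshold are assumptions, and a production system would run this continuously over camera streams:

```python
# Generic object-detection sketch with a pretrained torchvision model;
# a shelf/stock application would map detected classes to store logic.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = Image.open("shelf.jpg")            # hypothetical camera frame
with torch.no_grad():
    detections = model([to_tensor(frame)])[0]

for box, label, score in zip(detections["boxes"],
                             detections["labels"],
                             detections["scores"]):
    if score > 0.8:                        # assumed confidence threshold
        print(label.item(), [round(v) for v in box.tolist()], float(score))
```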
By the end of this talk, attendees should understand the impact Computer Vision and Deep Learning are having in the Consumer Goods industry, key use cases, techniques and key considerations leaders are exploring and implementing today.
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark (DataWorks Summit)
Whole-genome shotgun based next-generation transcriptomics and metagenomics studies often generate 100 to 1000 gigabytes (GB) of sequence data derived from tens of thousands of different genes or microbial species. De novo assembling these data requires a solution that both scales with data size and optimizes for individual genes or genomes. Here we developed an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions reads based on their molecule of origin to enable downstream assembly optimization. SpaRC produces high clustering performance on transcriptomics and metagenomics test datasets from both short-read and long-read sequencing technologies. It achieved near-linear scalability with respect to input data size and number of compute nodes. SpaRC can run on different cloud computing environments without modification while delivering similar performance. In summary, our results suggest SpaRC provides a scalable solution for clustering billions of reads from next-generation sequencing experiments, and Apache Spark represents a cost-effective solution with rapid development/deployment cycles for similar big data genomics problems.
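A conceptual PySpark sketch of the underlying idea (not the SpaRC codebase itself) is shown below: reads sharing a k-mer are grouped, producing the candidate links that read clustering builds on; the input path and k value are assumptions:

```python
# Conceptual sketch of k-mer-based read grouping in PySpark: reads that
# share k-mers are linked, yielding candidate edges for clustering reads
# by molecule of origin. Input path and K are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kmer-grouping-sketch").getOrCreate()
sc = spark.sparkContext
K = 31  # an assumed k-mer length, not SpaRC's actual setting

reads = sc.textFile("hdfs:///data/reads.txt")  # hypothetical: one read per line

def kmers(indexed_read):
    read_id, seq = indexed_read
    return [(seq[i:i + K], read_id) for i in range(len(seq) - K + 1)]

# k-mer -> read ids sharing it; each group of size > 1 induces edges
# between reads that downstream graph clustering would consume.
shared = (reads.zipWithIndex()
               .map(lambda t: (t[1], t[0]))
               .flatMap(kmers)
               .groupByKey()
               .mapValues(list)
               .filter(lambda kv: len(kv[1]) > 1))

print(shared.take(5))
```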
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps means. We closed with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Elevating Tactical DDD Patterns Through Object Calisthenics (Dorra BARTAGUIZ)
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don't know what they don't know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients' needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
The climate impact and sustainability of software testing are discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
A tale of scale & speed: How the US Navy is enabling software delivery from l... (sonjaschweigert1)
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs (Alex Pruden)
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. The constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a PASSION for technology and making things work, along with a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
The new frontiers of AI in RPA with UiPath Autopilot™ (UiPathCommunity)
In this free online event, organized by the Italian UiPath Community, you can explore the new features of Autopilot, the tool that integrates Artificial Intelligence into the development and use of Automations.
📕 Together we will look at some examples of using Autopilot across different tools in the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
3. Business Context: HSBC (HSS), a business with a lot of data…
Global Business
Global outsourcer of investment operations
Active in 40+ countries & jurisdictions
Over 150 operational technology systems
Outsourcing is a diverse and incrementally complex business
4. Challenges in building Big Data Environments
[Slide diagram: the traditional pipeline flows from Sources (Ops, ODS, Trades, External, Market Data, Client, Exchange) through ETL/Staging and Integration into an Enterprise Warehouse and logical strategic model, out to Division Marts (Product, Function, Corp Actions, Position) and Channels (eCommerce, Reporting, Analytical Tools).]
Design: ETL is brittle, one shot at success for "one version of the truth". Tight coupling to the relational model means any significant change initiates a data migration.
Run: RDBMSs scale vertically and struggle with scale-out; multi-marts increase duplication; big-batch appliances are uneconomic; cost increases with proliferation.
Time to Market: months for any given slice, years in total.
Total Cost: any high-volume or low-latency environment requires annual spend in the millions to tens of millions.
5. Building Big Data platforms has been an unhappy experience
Time to market has increased; proliferation, not consolidation
Delivery risk is high, as witnessed in industry-wide failure rates
Ultimate customer satisfaction is low; we often end up answering yesterday's questions tomorrow
The economics of traditional technologies are against proliferation of analytical platforms:
– Costs increase with the addition of data sources
– Costs of change increase with the addition of data sources
Processing ceilings are reached quickly when adding newer sources of data to traditional platforms
6. Crisis of Supply and Demand: we need a new approach
High-level requirements…
A single data platform that can provide 360° views of clients, operations and products
Functionally, the platform should support:
– Continual development, integration and deployment
– Parallel development streams
– Integration of poly-structured datasets
– Multiple views on single data sets
– …and act as an ENABLER of change
Non-functionally, the platform should support:
– A low-cost economic model for analytical platforms
– Scaling to terabytes with high-throughput ingest and integration
– Co-existence with our current estate
– Accessibility to business and technology teams
Enter Hadoop!
7. Introducing any new technology to an enterprise
Adoption Lifecycle (Hadoop): Learn (Proof of Concept) → Plan (Business Value) → Build (Pilot Projects, Strategic Stack)
What have we done? What's left, what's next?
9. Big Data Vision: The Agile Information Lifecycle
[Slide diagram: a cycle of Ingest (data, events, blotters) → MapReduce processing → analytical discovery → application.]
Insights rarely happen on the first query or build; they are more likely to occur after several iterations on a dataset
10. Hadoop Proof of Concept Scope: Guangzhou, China
Using a vendor Hadoop package: time to install, ease of maintaining the cluster, performance comparison
Developing on Hadoop: integration of existing databases, building applications on the cluster, porting existing code to Hadoop
Advanced Analytics on Hadoop: development skills levels, enhancing an existing analytics package, building out a new modelling service
11. Proof of Concept Results
Hadoop was installed and operational in a week
18 RDBMS warehouse and mart databases were ported to Hadoop in 4 weeks
An existing batch that takes 3 hours today was re-engineered on Hadoop: run time 10 minutes
A current Java-based analytics routine was ported onto Hadoop, increasing data coverage and reducing execution time
We lost the namenode and had to rebuild the cluster…
12. Hadoop Code Day: Guangzhou, China
We sponsored a 24-hour code competition to let the off-shore teams show their stuff
We had over 50 volunteers for the event
The volunteers were split into teams of 3 and given 24 hours to develop an application using the Proof of Concept cluster
One week's training was offered to the participants on a casual basis
All the teams delivered…
13. Next Step: Planning
Adoption Lifecycle
Learn (Proof of Concept) → Plan (Business Value) → Build (Pilot Projects, Strategic Stack)
14. Big Data Plan: Big Data Economics (names removed to protect the innocent)
15. Hadoop Economics: Technology for Austerity
[Chart: REVENUE, MARGIN and COST over time]
Hadoop speaks to the economics of today
Growing product and capacity at the same time as increasing margin
16. Generic HSBC Big Data Use Cases
Volume File Processing
– Characteristics: high-volume, high-throughput processing of legacy flat files, XML or other structured and semi-structured data
– Current challenges:
  – Cost: high-volume processing predominantly still resides on the mainframe, making low-complexity processing expensive
  – Scale: the ability to grow out mainframe capacity quickly is limited, and the ability to scale on distributed platforms is limited

Big Warehouse
– Characteristics: a multi-source warehouse analytics environment providing a single data platform across multiple business lines; integration of poly-structured data
– Current challenges:
  – Time to market: data warehouse / MI projects have proved extremely challenging to implement, in HSBC and in the finance industry in general
  – Complexity: data integration of even group-standard systems has proved difficult due to the variety of data structures and content
  – Latency: real-time MI is still only available via reporting from source directly

Advanced Analytics
– Characteristics: statistical modelling and what-if analysis on group-wide data across multiple business lines; production of data-derived products
– Current challenges:
  – Scale: traditional analytic data platforms have only been able to scale vertically
  – Cost: the amount of compute power required to perform volume statistical operations is cost-prohibitive
  – Fidelity: analytical calculations are typically run on aggregate totals, leading to a disconnect between events and the derived conclusions or decisions

The three use cases run left to right from Day 1 Value to Strategic Value
20. Remaining Challenges: Big Data Operations
Big Data Operations
– Is Hadoop anti-virtualisation?
– High availability / disaster recovery needs to improve
– Security and data privacy concerns
– Data federation

Big Data Organisation
– Segregation of duties
– Big Data doesn't want a separate app, database, OS & storage team; the platform demands skilled generalists

Hype / Cynicism
– USE IT AS A POSITIVE!!!
– Place Big Data into a competitive situation against your existing Information Management technologies; if you can't get the job done better/faster/cheaper then alter your decision tree
21. The art of the possible in 24 hours…..
Hadoop excites……
Hadoop on iPad & Android (and tires)
The Winners….
Hadoop on HTML5 & Flex
Hadoop & R for Portfolio Optimisation
Editor's Notes
In essence: we are a processor of other people's data. Challenges: nobody does data the same way, even within the same systems. Differences are in definitions, formats and content.
Dedicated ETL is an expensive way of doing things. Big RDBMS or dedicated appliances are expensive. Marts, marts everywhere. CONCLUSION: high volume and/or low latency is very expensive to run. RESULT: people are becoming reluctant to invest in these platforms and are looking for a service that can start small and grow.
The road to Damascus… the vision is HSS-only at this point in time. The search for an alternate way of doing things has led us to Hadoop. Hadoop lowers the barrier to entry for compute-style solutions to data problems. CONCLUSION: we view Hadoop as THE future technology for data platforms. RESULT: we have begun the tech adoption process in the bank.
Today's biggest business challenge: information management, which currently demands agility in delivering data integration and flexibility to present multiple views of data. Biggest business opportunity: analytics, in scenario modelling and portfolio efficiency measurement. These all require big compute.
…here's what it looks like. Walk left to right. Explain Map Reduce. Contrast with the old way and our vision of the new way. The EDW will be around for some time to come but will be gradually superseded. Map Reduce will be implemented via high-level languages. A single warehouse becomes achievable. Marts are demised in favour of views onto the base data. The value add will come via data discovery… iterative ETL… hypothesis testing. CONCLUSION: Hadoop brings massive compute levels to bear on these problems, affordably.
This is the next-generation ETL. ETL processes become truly iterative. Accept that you will get it wrong the first time round; Hadoop makes the penalty for failure minimal. The value add will come via data discovery… iterative ETL… hypothesis testing. CONCLUSION: ETL moves from brittle to bend-don't-break. RESULT: in building your Big Warehouse, adding additional data/systems/perspectives is a low-tax operation.
Where we've got to: go through the key challenges. CONCLUSION: it's a journey, and we're walking through it just now. RESULT: the first two have been addressed; challenges remain.
…our experience was: a vendor Hadoop package makes sense to an organisation like us. Data loads took days, not months. We were quickly able to automate the loads. Used Apache tools only. BONUS: Calypso data… new for HSS. HACKATHON: open invite to all markets staff. Objective: to use Hadoop against the business use case. Set judging criteria. Straight 24 hours over a weekend. Competition prizes. Attended by nearly 60 staff, equal to 20% of our China office. 18 teams, 17 delivered. The winning application was stunning. CONCLUSION: Hadoop is a great functional fit for our business demand. RESULT: high level of confidence around the technology.