Advanced data analysis and "big data" have climbed the trend lists in recent years and are now among the most prioritized areas in the development of new services and products at leading companies in the digital landscape.
The information that accumulates in these systems as customer interactions are digitized has proven to be worth its weight in gold. It holds everything we need to know to make our business more effective.
Since the summer of 2013, Connecta has had an established partnership with Google to help our customers transition to cloud services for, among other things, advanced data analysis. To make ourselves ready to help our customers, we have spent several years building up both knowledge of and hands-on experience with Google's various cloud products, such as BigQuery.
BigQuery is a cloud-based analytics tool and part of Google Cloud Platform. It makes it possible to run queries against enormous datasets and get answers back in a matter of seconds. BigQuery and Google Cloud Platform offer ready-made solutions for setting up and maintaining the infrastructure that, with simple means, makes all of this possible.
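To make this concrete, here is a minimal sketch of an interactive query using the google-cloud-bigquery Python client against one of Google's public datasets; the project ID is a placeholder, and an installed client library plus default credentials are assumed:

    # Minimal sketch: run an interactive SQL query against a public dataset.
    # Assumes `pip install google-cloud-bigquery` and default credentials.
    from google.cloud import bigquery

    client = bigquery.Client(project="your-project-id")  # placeholder project ID

    query = """
        SELECT name, SUM(number) AS total
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        GROUP BY name
        ORDER BY total DESC
        LIMIT 10
    """
    for row in client.query(query).result():  # blocks until the query finishes
        print(row["name"], row["total"])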
At Connecta Digital Consulting's third event of the spring, we introduced our customers and partners to the concepts of data analysis and BigQuery.
The event covered the following topics:
- Big Data and Business Intelligence (BI)
- "The Google Big Data tools" – success factors and how to get started
- Google Cloud Platform and how to carry out a successful cloud initiative
We presented cases and shared important lessons learned from our collaboration with Google and our customers.
The 'macro view' on BigQuery:
We start with an overview and some typical uses, then move on to project hierarchy, access control, and security.
At the end we touch on tools and demos.
Big Query - Utilizing Google Data Warehouse for Media Analytics (hafeeznazri)
This talk covers an intermediate understanding of Google BigQuery and how Media Prima Digital uses BigQuery as its production data warehouse.
In this webinar you'll learn about the best practices for Google BigQuery—and how Matillion ETL makes loading your data faster and easier. Find out from our experts how to leverage one of the largest, fastest, and most capable cloud data warehouses to improve your business and save money.
In this webinar:
- Discover how to work fast and efficiently with Google BigQuery
- Find out the best ways to monitor and control costs
- Learn to leverage Matillion ETL and optimize Google BigQuery
- Get tips and tricks for better performance
Google BigQuery for Everyday Developer (Márton Kodok)
IV. IT&C Innovation Conference - October 2016 - Sovata, Romania
A. Every scientist who needs big data analytics to save millions of lives should have that power
Legacy systems don’t provide the power.
B. The simple fact is that you are brilliant but your brilliant ideas require complex analytics.
Traditional solutions are not applicable.
The Plan: have oversight over developments as they happen.
Goal: Store everything accessible by SQL immediately.
What is BigQuery?
Analytics-as-a-Service - Data Warehouse in the Cloud
Fully-Managed by Google (US or EU zone)
Scales into Petabytes
Ridiculously fast
Decent pricing (queries $5/TB, storage: $20/TB) *October 2016 pricing
100,000 rows/sec Streaming API
Open Interfaces (Web UI, bq command-line tool, REST, ODBC)
Familiar DB structure (tables, views, records, nested fields, JSON)
Convenience of SQL + JavaScript UDFs (User Defined Functions)
Integrates with Google Sheets + Google Cloud Storage + Pub/Sub connectors
Client libraries available in YFL (your favorite languages)
Our benefits
no provisioning/deployment
no running out of resources
no more focus on large-scale execution plans
no need to re-implement tricky concepts (time windows / joining streams)
pay only for the columns referenced in your queries (see the cost sketch below)
run raw ad-hoc queries (whether by analysts, sales, or devs)
no more throwing away, expiring, or aggregating old data
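To make the pay-per-column point concrete, here is a back-of-the-envelope cost calculation using the October 2016 on-demand price quoted above ($5 per TB scanned); the table size and column fraction are invented for illustration:

    # Back-of-the-envelope query cost at the October 2016 on-demand rate of
    # $5 per TB scanned. BigQuery is columnar, so only the columns referenced
    # by the query are read. All sizes below are made up for illustration.
    PRICE_PER_TB = 5.00

    table_size_tb = 10.0             # hypothetical total table size
    queried_column_fraction = 0.1    # the selected columns hold ~10% of the bytes

    scanned_tb = table_size_tb * queried_column_fraction
    cost = scanned_tb * PRICE_PER_TB
    print(f"Scanned {scanned_tb} TB, cost ${cost:.2f}")  # Scanned 1.0 TB, cost $5.00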
A short introduction to BigQuery. With this presentation you'll quickly discover:
How to load data into BigQuery
How to build dashboards using BigQuery
How to work with BigQuery
and, last but not least, some best practices
We hope you'll enjoy this presentation and that it will help you start exploring this wonderful solution. Don't hesitate to send us your feedback or questions.
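For the "how to load data" item above, a minimal sketch with the google-cloud-bigquery Python client might look like this; the bucket, dataset, and table names are placeholders:

    # Minimal sketch: load a CSV file from Cloud Storage into a BigQuery table.
    # Bucket, dataset, and table names are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # skip the header row
        autodetect=True,       # let BigQuery infer the schema
    )
    load_job = client.load_table_from_uri(
        "gs://your-bucket/events.csv",
        "your_dataset.events",
        job_config=job_config,
    )
    load_job.result()  # wait for the load job to finish
    print(client.get_table("your_dataset.events").num_rows, "rows loaded")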
Quick Intro to Google Cloud Technologies (Chris Schalk)
This is the "Lightning Presentation" given at DreamForce 2011 on Google's Cloud Technologies. It covers App Engine, Google Storage, and BigQuery. #df11
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt... (Rittman Analytics)
As big data and data warehousing scale up and move into the cloud, they're increasingly likely to be delivered as services using distributed cloud query engines such as Google BigQuery, loaded using streaming data pipelines and queried using BI tools such as Looker. In this session the presenter walks through how data modelling and query processing work when storing petabytes of customer event-level activity in a distributed data store and query engine like BigQuery; how data ingestion and processing work in an always-on streaming data pipeline; how additional services such as the Google Natural Language API can be used to classify sentiment and extract entity nouns from incoming unstructured data; and how BI tools such as Looker and Google Data Studio bring data discovery and business metadata layers to cloud big data analytics.
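As a rough sketch of the sentiment step mentioned in that pipeline, the google-cloud-language Python client can score a piece of incoming text before the result is written onward to BigQuery; the sample text is invented, and an installed, authenticated client is assumed:

    # Rough sketch: score incoming unstructured text for sentiment with the
    # Google Natural Language API before loading the result into BigQuery.
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content="The checkout flow was quick and painless.",  # invented sample
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )
    sentiment = client.analyze_sentiment(document=document).document_sentiment
    print(f"score={sentiment.score:.2f}, magnitude={sentiment.magnitude:.2f}")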
Basic concepts, best practices, and pricing for BigQuery, the petabyte-scale analytics data platform from Google Cloud Platform. There is a lot to learn about this tool and its features, such as BI Engine and the AI Platform.
Google BigQuery is one of the largest, fastest, and most capable cloud data warehouses on the market. In this webinar, we review BigQuery best practices and show you how Matillion ETL can help you get the most out of the platform to gain a competitive edge.
In this webinar:
- Discover how to work quickly and efficiently with Google BigQuery
- Find out the best ways to monitor and control costs
- Hear tips and tricks for loading and transforming massive amounts of data in BigQuery with Matillion ETL
- Get expert advice on improving your performance in BigQuery for quicker data analysis
- Learn how to optimize BigQuery for your marketing analytics needs
In this presentation we go through the differences and similarities between Redshift and BigQuery. It was presented at the Athens Big Data meetup in May 2017.
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi... (Márton Kodok)
Every scientist who needs big data analytics to save millions of lives should have that power. Complex interactive big data analytics solutions require massive architecture and know-how to build a fast real-time computing system. BigQuery solves this problem by enabling super-fast, SQL-like queries against petabytes of data using the processing power of Google's infrastructure. We cover its core features, working with BigQuery, streaming inserts, User Defined Functions in JavaScript, and several use cases for the everyday developer: funnel analytics, behavioral analytics, and exploring unstructured data.
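To give a flavor of the JavaScript User Defined Functions mentioned above, a BigQuery standard-SQL query can define a temporary JS function inline; a minimal sketch, run here through the Python client:

    # Minimal sketch: a BigQuery standard-SQL query with an inline JavaScript UDF.
    from google.cloud import bigquery

    sql = """
    CREATE TEMP FUNCTION greet(name STRING)
    RETURNS STRING
    LANGUAGE js AS '''
      return "Hello, " + name + "!";
    ''';
    SELECT greet(name) AS greeting
    FROM UNNEST(["BigQuery", "Dremel"]) AS name;
    """
    client = bigquery.Client()
    for row in client.query(sql).result():
        print(row["greeting"])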
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014 (Jaroslav Gergic)
The recent boom in big data processing, and the democratization of the big data space, has been enabled by the fact that most of the concepts that originated in the research labs of companies such as Google, Amazon, Yahoo, and Facebook are now available as open source. Technologies such as Hadoop and Cassandra let businesses around the world become more data-driven and tap into their massive data feeds to mine valuable insights.
At the same time, we are still at a certain stage of the maturity curve of these new big data technologies and of the entire big data technology stack. Many of the technologies originated from a particular use case and attempts to apply them in a more generic fashion are hitting the limits of their technological foundations. In some areas, there are several competing technologies for the same set of use cases, which increases risks and costs of big data implementations.
We will show how GoodData solves the entire big data pipeline today, from raw data feeds all the way up to actionable business insights. All of this is provided as a hosted multi-tenant environment that lets customers solve their particular analytical use case, or many analytical use cases for thousands of their own customers, all on the same platform and tools, while abstracting them away from the technological details of the big data stack.
How TrafficGuard uses Druid to Fight Ad Fraud and Bots (Imply)
In this session, TrafficGuard’s Head of Data Science, Raigon Jolly, will discuss how TrafficGuard uses Druid and its partnership with Imply to:
- Provide granular reporting to clients in near-real time
- Monitor rules and concept drift
- Stay ahead of the moving target that is ad fraud
- Facilitate performance tuning and right-sizing infrastructure so our team can focus on innovation of our core product
How to Realize an Additional 270% ROI on Snowflake (AtScale)
Companies of all sizes have embraced the power, scale and ease of use of Snowflake’s cloud data platform, along with the promise of cost-savings. But if you aren’t careful, cloud compute usage can sneak up on you and leave you with runaway costs no matter what BI tool you are using.
The presentation from experts from Rakuten Rewards and AtScale shows practical techniques on how you can reduce unnecessary compute and boost BI performance to realize an additional 270% ROI on Snowflake. For the on-demand webinar, go to: https://www.atscale.com/resource/wbr-cloud-compute-cost-snowflake-tableau/
Counting Unique Users in Real-Time: Here's a Challenge for You! (DataWorks Summit)
Finding the number of unique users out of 10 billion events per day is challenging. At this session, we're going to describe how re-architecting our data infrastructure, relying on Druid and ThetaSketch, enables our customers to obtain these insights in real-time.
To put things into context, at NMC (Nielsen Marketing Cloud) we provide our customers (marketers and publishers) real-time analytics tools to profile their target audiences. Specifically, we provide them with the ability to see the number of unique users who meet a given criterion.
Historically, we used Elasticsearch to answer these types of questions; however, we encountered major scaling and stability issues.
In this presentation we will detail the journey of rebuilding our data infrastructure, including researching, benchmarking and productionizing a new technology, Druid, with ThetaSketch, to overcome the limitations we were facing.
We will also provide guidelines and best practices for Druid; a query sketch follows the topic list below.
Topics include:
* The need and possible solutions
* Intro to Druid and ThetaSketch
* How we use Druid
* Guidelines and pitfalls
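For flavor, here is a minimal sketch of an approximate unique-user count issued through Druid's SQL endpoint using the ThetaSketch-backed aggregator; it assumes the druid-datasketches extension is loaded, and the broker URL, datasource, and column names are placeholders:

    # Minimal sketch: approximate distinct count via Druid SQL with ThetaSketch.
    # Assumes the druid-datasketches extension is loaded; the broker URL,
    # datasource, and column names are placeholders.
    import requests

    sql = """
        SELECT APPROX_COUNT_DISTINCT_DS_THETA(user_id) AS unique_users
        FROM events
        WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
    """
    resp = requests.post("http://druid-broker:8082/druid/v2/sql",
                         json={"query": sql})
    print(resp.json())  # e.g. [{"unique_users": 1234567}]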
Data Con LA 2018 - Big Data as a Service: Running Elasticsearch on Pure by Br... (Data Con LA)
Big Data as a Service: Running Elasticsearch on Pure by Brian Gold, Founding Member, FlashBlade, PureStorage
As organizations look to scale their use of modern analytics, the traditional deployment model of these tools has become a drag on productivity. Existing big-data architectures typically run on fixed sets of server instances with tightly coupled storage. While originally designed for scalability, these rigid environments cause server sprawl and increase time-to-deployment.
Zeotap: Data Modeling in Druid for Non-temporal and Nested Data (Imply)
Druid has been the production workhorse at Zeotap for the past 2+ years, powering core audience planning across our Connect and Targeting products. Though Druid is best suited for data with time as a dimension, since it partitions data by time first, we have used Druid to serve ML-powered enhanced insights and estimation of potential dataset sizes, supporting our core business case of audience planning. These are datasets without a timestamp (i.e. non-temporal), at high scale, with nested dimensions. We achieved this through nuanced data modelling to store the datasets and retrieve them with millisecond latency. The core of the presentation is the data modelling journey behind these use cases, detailing the query access patterns. We also delve into the architecture (ingestion into the Druid sink and processing, including ML) and, at the end, go over the production setup, configuration, and the performance tunings applied. The presentation covers the following topics:
* Business case in the Ad-Tech and Mar-Tech vertical
* Audience Planner Use Case 1 - Insights
  - Lambda architecture and data flow
  - Deep dive on the data model
  - Takeaways
* Audience Planner Use Case 2 - Estimator
  - Architecture and data flow
  - Stratified sampling explained
  - Data model to solve nested data - deep dive
  - Takeaways
* Audience Planner Use Case 3 - Skew correction
  - Skew correction model
  - Query access
  - Data model in Druid to accommodate output from ML models
  - Takeaways
* Production setup, config, and tunings
* Production operation experience takeaways
Simplifying Your Journey to the Cloud: The Benefits of a Cloud-Based Data War... (Matillion)
As companies grow, so does the volume of their data. Without the proper solutions in place to quickly store, measure and analyze that data, its usefulness quickly declines.
See our latest webinar to learn about how companies are increasingly turning towards cloud-based data warehousing to derive more value out of their data and apply their findings to make smarter business decisions. The webinar covers core topics including:
- The benefits of using Snowflake’s unique architecture for interacting with data.
- How Matillion can help you quickly load and transform your data to maximize its value.
- Expert advice on how to apply data warehousing and ETL best practices.
Watch the full webinar: https://youtu.be/mIOm3j431OQ
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera (Cloudera, Inc.)
Transitioning to a big data architecture is a big step, and the complexity of moving existing analytical services onto modern platforms like Cloudera can seem overwhelming.
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions (Looker)
Infectious Media runs on data. But, as an ad-tech company that records hundreds of thousands of web events per second, they have to deal with data at a scale not seen by most companies. You cannot make decisions with data when people have to write SQL by hand and queries take 10-20 minutes to return. Infectious Media made the switch to Google BigQuery and Looker, and now every member of every team can get the data they need in seconds.
Infectious Media shares:
- Why they chose their current stack
- Why faster data means happier customers
- Advantages and practical implications of storing and processing that much data
Check out the recording at https://info.looker.com/h/i/308848878-power-to-the-people-a-stack-to-empower-every-user-to-make-data-driven-decisions
Data Architecture Strategies: Data Architecture for Digital Transformation (DATAVERSITY)
MDM, data quality, data architecture, and more: combining these foundational data management approaches with other innovative techniques can help drive organizational change as well as technological transformation. This webinar provides practical steps for creating a data foundation for effective digital transformation.
Revolution in Business Analytics - Zika Virus Example (Bardess Group)
Even from the “man in the street” perspective, there is a sense that we are living in an increasingly algorithmic world. Self-driving cars, pizza delivery by drone, and smart houses are commonplace. The technologies enabling this revolution are both simultaneously mature and evolving rapidly.
In this session, we took a look at a real-world problem, the recent global outbreak of the Zika virus, and used data analytics technologies to gain valuable insights that can assist authorities and the general public in understanding and potentially preventing the spread of this disease.
Bardess Group, a sponsor of the event and a business analytics consulting firm, demonstrates how huge, extremely jagged data from a variety of sources can be collected, prepared, and rapidly made available for analysis. Advanced machine learning and predictive analysis further enhance the value of those insights.
Finally, Bardess makes the case that using a systematic approach to conceptually visualize the strategic journey to insightful business analytics, the analytics value chain, can help any organization prepare for this revolution in analytics.
Also see http://cloudera.qlik.com for the demos.
Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and... (Cloudera, Inc.)
Are you struggling to validate the added costs of a Hadoop implementation? Are you struggling to manage your growing data?
The costs of implementing Hadoop may be more beneficial than you anticipate. Dell and Intel recently commissioned a study with Forrester Research to determine the Total Economic Impact of the Dell | Cloudera Apache Hadoop Solution, accelerated by Intel. The study determined customers can see a 6-month payback when implementing the Dell | Cloudera solution.
Join Dell, Intel and Cloudera, three big data market leaders, to understand how to begin a simplified and cost-effective big data journey and to hear case studies that demonstrate how users have benefited from the Dell | Cloudera Apache Hadoop Solution.
Watch here: https://bit.ly/3i2iJbu
You will often hear that "data is the new gold". In this context, data management is one of the areas that has received more attention by the software community in recent years. From Artificial Intelligence and Machine Learning to new ways to store and process data, the landscape for data management is in constant evolution. From the privileged perspective of an enterprise middleware platform, we at Denodo have the advantage of seeing many of these changes happen.
Join us for an exciting session that will cover:
- The most interesting trends in data management.
- Our predictions on how those trends will change the data management world.
- How these trends are shaping the future of data virtualization and our own software.
Capgemini Leap Data Transformation Framework with Cloudera (Capgemini)
https://www.capgemini.com/insights-data/data/leap-data-transformation-framework
The complexity of moving existing analytical services onto modern platforms like Cloudera can seem overwhelming. Capgemini’s Leap Data Transformation Framework helps clients by industrializing the entire process of bringing existing BI assets and capabilities to next-generation big data management platforms.
During this webinar, you will learn:
• The key drivers for industrializing your transformation to big data at all stages of the lifecycle – estimation, design, implementation, and testing
• How one of our largest clients reduced the transition to modern data architecture by over 30%
• How an end-to-end, fact-based transformation framework can deliver IT rationalization on top of big data architectures
As government agencies continue to advance their digital transformation and improve the citizen digital experience, they are moving to new platforms. A successful digital transformation, and continued compliance with federal technology update mandates, such as Cloud First and the Modernizing Government Technology Act, involves embracing multiple cloud platforms. This has a ripple effect of streamlining agency operations and presenting a new and updated digital experience for citizens.
Our government agency speaker will outline how one agency moved operations to the cloud and the lessons they learned along the way.
During today’s webcast, you will learn:
- How cloud platforms can help improve the citizen digital experience
- How to architect a multi-cloud platform that makes sense for your agency
- Lessons learned by other government agencies and from private sector companies
Looking to the Future: Embracing the Cloud for a More Modern Data Quality App... (Precisely)
Data quality: it’s what we all strive for, and yet we don’t always have what we need to achieve it.
Embracing the cloud with a more holistic, yet simplified user experience will help you find exponential value in your data today – and plan for tomorrow. Join us to learn about a more modern approach that will empower your teams to more deeply understand, trust, and pro-actively address anomalies in your critical data.
Learn more about the value of next-generation cloud solutions that will power your organization into the future by joining us on September 22 where you will hear from Precisely’s Emily Washington, SVP of Product Management, Chuck Kane, VP of Product Management, and David Woods, SVP of Strategic Services. Be sure to bring your questions for our team of experts to the live Q&A session following their presentations and demos.
Unleash the power of your data and gain instant insights without additional investments in IT infrastructure. We review the state of data analytics, discuss the differences between long-term, medium-term, and (near) real-time data, and show how companies can leverage each with Power BI.
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla... (Precisely)
Teams working on new business initiatives, whether for enhancing customer engagement, creating new value, or addressing compliance considerations, know that a successful strategy starts with the synchronization of operational and reporting data from across the organization into a centralized repository for use in advanced analytics and other projects. However, the range and complexity of data sources as well as the lack of specialized skills needed to extract data from critical legacy systems often causes inefficiencies and gaps in the data being used by the business.
The first part of our webcast series on Foundational Strategies for Trust in Big Data provides insight into how Syncsort Connect, with its design-once, deploy-anywhere approach, supports a repeatable pattern for data integration by enabling enterprise architects and developers to ensure data from all enterprise data sources, from mainframe to cloud, is available in downstream data lakes for use in these key business initiatives.
How manufacturers are evolving toward Industry 4.0 with virt... (Denodo)
Watch full webinar here: https://bit.ly/3cbpipB
One of the sectors where digital transformation is having the most disruptive effect is manufacturing. Leaders in the manufacturing sector are betting on big data, cloud computing, artificial intelligence, and the Internet of Things (IoT), among other technologies, as well as preparing for the arrival of 5G, in order to:
- Automate processes efficiently, enabling greater output in less time
- Create added value in manufactured products
- Connect the factory floor with the point of sale
- Drive real-time analysis of data coming from different production lines
However, to reach these goals and carry out this technological revolution, also known as Industry 4.0, manufacturers face a series of far-from-negligible challenges. The industrial sector generates more data than any other in the world, and in the digital era the speed, diversity, and exponential volume of data can overwhelm traditional IT architectures. In addition, most manufacturers contend with data silos, which make processing data slow and costly. They therefore need a reliable IT platform that can integrate, centralize, and analyze data from different sources and in different formats, in an agile and secure way, to put information at the service of the business.
The experts at Enki and Denodo offer this online seminar to explain what data virtualization is, and why industry leaders are betting on this innovative technology to optimize their IT strategy and achieve a significant ROI thanks to faster, simpler, and more unified access to industrial data.
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By Cloudera (MongoDB)
Bernard Doering, Senior Sales Director DACH, Cloudera.
Hadoop and the Future of Data Management. As Hadoop takes the data management market by storm, organisations are evolving the role it plays in the modern data centre. Explore how this disruptive technology is quickly transforming an industry and how you can leverage it today, in combination with MongoDB, to drive meaningful change in your business.
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in the sophistication of cyberattacks aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
Key findings include:
- Increased frequency and complexity of cyber threats.
- Escalation of state-sponsored and criminally motivated cyber operations.
- Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, come with the precondition that the input graph contain no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices that have already converged can save iteration time. Skipping in-identical vertices, which share the same in-links, reduces duplicate computation and can thus also reduce iteration time. Road networks often contain chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes are easy to calculate; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
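For reference, the baseline power-iteration ("monolithic") PageRank that these optimizations start from fits in a few lines; a toy Python sketch, with dead ends (dangling vertices) handled by redistributing their rank uniformly:

    # Toy power-iteration PageRank. Dead ends (vertices with no out-links)
    # have their rank redistributed uniformly across all vertices.
    def pagerank(adj, damping=0.85, tol=1e-10, max_iter=100):
        n = len(adj)
        rank = [1.0 / n] * n
        for _ in range(max_iter):
            dangling = sum(rank[u] for u in range(n) if not adj[u])
            nxt = [(1.0 - damping) / n + damping * dangling / n] * n
            for u in range(n):
                for v in adj[u]:
                    nxt[v] += damping * rank[u] / len(adj[u])
            if sum(abs(a - b) for a, b in zip(nxt, rank)) < tol:
                return nxt
            rank = nxt
        return rank

    # 4-vertex example; vertex 3 is a dead end.
    print(pagerank([[1, 2], [2], [0], []]))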
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
29. For the past 15 years, Google has been building out the world's fastest, most powerful, highest quality cloud infrastructure on the planet.
Images by Connie Zhou
30. Google has been running some of the world's largest distributed systems with unique and stringent requirements.
Images by Connie Zhou
35. The Last Year in the Cloud Platform
May 2013: Google Compute Engine (Preview), PHP for App Engine (Preview), Big JOIN in BigQuery
July 2013: Dedicated Memcache, Offline Disk Import
August 2013: Layer 3 Load Balancing, Encryption at Rest for Cloud Storage
November 2013: Cloud Endpoints GA, Dedicated Memcache GA
December 2013: Compute Engine GA, Live Migration, Persistent Disks
February 2014: HIPAA Support, Cloud SQL GA
38. We can do better
Lower and simplify pricing
Make developers more productive
39. Prices are falling
• Public cloud prices have dropped 6-8% annually (Source: Google internal data)
[Chart: public cloud prices, 2006-2014]
40. But prices are not falling fast enough
• Hardware costs have dropped 20-30% annually, while public cloud prices have dropped only 6-8% annually (Source: Google internal data)
[Chart: hardware cost vs. public cloud prices, 2006-2014]
41. Pricing Updates (Effective April 1st, 2014)
35% price drop on Compute Engine, across all sizes, regions, and classes
37% price drop on App Engine frontend instance hours, 33% on Datastore writes, and 50% on Dedicated Memcache
68% price drop on Cloud Storage
On-demand query pricing reduced by 85%, to $5/TB
42. You should get the best price with...
No Upfront Payments
No Lock-in
No Complexity
43. Sustained-use discounts
[Chart: net price per hour versus sustained use (0-100%), comparing the previous on-demand price, the new on-demand price, and sustained-use discount pricing]
45. Managed VMs
• The flexibility of Compute Engine
• The productivity of App Engine
• Provides the best of both worlds: IaaS + PaaS
Flexibility and management
46. Developer Productivity
• Use the tools you know and love
• Fast, reliable deployments
• Isolate and fix issues in production with Continuous Integration
Time to market and robust design
47. 1000X: BigQuery Streaming
• Near real-time analysis
• High fidelity, low latency
• Focus on results, not sharding and transforming
$0.01 per 100,000 rows; 100,000 rows per second; real-time availability of data
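A minimal sketch of row-by-row streaming with today's google-cloud-bigquery Python client; the dataset and table names are placeholders, and the destination table is assumed to already exist:

    # Minimal sketch: stream events row-by-row into an existing BigQuery table.
    # Dataset and table names are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = [
        {"user_id": "u1", "event": "click", "ts": "2014-03-25T12:00:00Z"},
        {"user_id": "u2", "event": "view", "ts": "2014-03-25T12:00:01Z"},
    ]
    errors = client.insert_rows_json("your_dataset.events", rows)
    if errors:
        print("Streaming insert failed:", errors)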
48. • Deployment Manager
• Replica Pools
• Cloud DNS
• Windows Server, SuSE, RHEL support
and so much more...
49. Agenda (25th, 2014)
1. Google Cloud Platform Introduction, Gaining Momentum
2. Big Data on Google Cloud Platform
3. Discussion
52. Key drivers in the growth of Big Data
• Data availability: applications at the heart of business interactions; devices and sensors; lower cost of storage & ingestion
• Ability to process: new programming models; new scale and capabilities for SQL; easily available software (open source)
• Cloud consumption model: easy on-ramp, cost-effective experimentation; unlimited scale, low TCO; combining open-source software and platform services
53. Mix and match storage and computation from OSS and Google Cloud Platform
[Diagram: Hadoop applications (Pig, Hive, HBase) running on Hadoop, connected through the BigQuery Connector, Datastore Connector, and Cloud Storage Connector to BigQuery, Datastore, and Google Cloud Storage.]
Hadoop, Pig, HBase, and Hive are trademarks of the Apache Software Foundation.
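Once the Cloud Storage connector is on the Hadoop classpath, Hadoop-based tools can read gs:// paths directly. As a rough sketch (using PySpark rather than the tools pictured in the slide), with a placeholder bucket and the connector jar plus credentials assumed to be configured:

    # Rough sketch: read files straight out of Cloud Storage from a Hadoop-based
    # tool (PySpark here). Assumes the GCS connector jar and credentials are
    # configured; the bucket and path are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("gcs-connector-demo").getOrCreate()
    logs = spark.read.text("gs://your-bucket/logs/*.log")
    print(logs.count(), "log lines read from Cloud Storage")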
56. BigQuery Streaming
Ease of use:
• Simplified infrastructure for real-time use cases
• Stream events row-by-row via a simple API
Use cases:
• Server logs, mobile apps, gaming, in-app real-time analytics
Low cost: $0.01 per 100,000 rows; 100,000 rows per second; real-time availability of data
Customer example: [logo]
57. Google Analytics + BigQuery
Native data pipeline to load data into a BigQuery project: Google Analytics Premium Platform -> Data Pipeline -> Google BigQuery
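Once the Analytics Premium export lands in BigQuery, the daily ga_sessions_ tables can be queried directly; a sketch in which the project and the numeric dataset ID (the Analytics view ID) are placeholders:

    # Sketch: query a Google Analytics Premium daily export table in BigQuery.
    # Export tables are named ga_sessions_YYYYMMDD; the project and dataset IDs
    # below are placeholders.
    from google.cloud import bigquery

    sql = """
        SELECT trafficSource.source AS source,
               SUM(totals.visits) AS visits
        FROM `your-project.12345678.ga_sessions_20140325`
        GROUP BY source
        ORDER BY visits DESC
        LIMIT 10
    """
    client = bigquery.Client()
    for row in client.query(sql).result():
        print(row["source"], row["visits"])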
59. BigQuery in Action
"The interactive performance of Google BigQuery, combined with Tableau's intuitive visualization tools, enabled our analysts to interactively explore huge quantities of data – hundreds of millions of rows – with incredible efficiency. Previously, analyses would require hours or days to complete, if they would even complete at all. With Google BigQuery it takes minutes, if that, to process. This time-to-insight was previously impossible."
– Giovanni DeMeo, Vice President, Global Marketing and Analytics
60. " The simulation cluster ran for nearly two months as
part of the ATLAS distributed compute grid, logging
over 5 million core-hours, completing 458,000
computationally intensive jobs and processing about
214 million events. The cluster achieved sustained
peak throughput of 15,000 jobs per day. “We had a
great experience with Google Compute Engine … and
think that it is modern cloud infrastructure that can
serve as a stable, high performance platform for
scientific computing”.
– Dr. Panitkin
CERN Atlas Project
CERN Atlas Compute Grid Extended on GCE
61. MapR Breaks MinuteSort Record
• 1.5 TB sorted in 60 seconds
• 8,412 cores
• Google Compute Engine
66. "[Google's] ability to build, organize, and operate a huge network of servers and fiber-optic cables with an efficiency and speed that rocks physics on its heels. This is what makes Google Google: its physical network, its thousands of fiber miles, and those many thousands of servers that, in aggregate, add up to the mother of all clouds."
– Wired
Images by Connie Zhou