wikibon #iot #hyperconvergence presentation via @thecube

The Journey To IoT Systems Of Intelligence: Determined By Combination of Tech and Enterprise Capabilities

Smart Grid

Adjunct Data Warehouse

Customer 360

Real-time loyalty, omni-channel, multi-touchpoint

Predictive model learns from and anticipates consumer in near real-time

Continuously updated predictive models of energy supply, demand tune end-point consumption

Autonomic Systems Management: System learns “normal” behavior of apps and infrastructure and flags or fixes anomalies

Data Lake with some production analytics offload from Data Warehouse

Enough internal and external customer data in a pipeline to start predictive modeling
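
As a rough illustration of the autonomic systems management idea on this slide (learn “normal” behavior, then flag anomalies), the sketch below keeps a running mean and variance of one metric and flags readings that fall far outside it. The class name `BaselineMonitor`, the z-score threshold of 3.0, and the warm-up count are illustrative assumptions, not part of the Wikibon material.

```python
import math


class BaselineMonitor:
    """Learns a running baseline for one metric and flags outliers.

    Hypothetical helper: the name and default thresholds are assumptions.
    """

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # z-score above which a reading is flagged
        self.warmup = warmup        # readings to observe before flagging starts
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations (Welford)

    def observe(self, value):
        """Return True if value looks anomalous; otherwise fold it into the baseline."""
        anomalous = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                anomalous = True
        if not anomalous:
            # Only readings judged normal update the learned baseline
            self.count += 1
            delta = value - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (value - self.mean)
        return anomalous


monitor = BaselineMonitor()
flags = [monitor.observe(100 + (i % 5)) for i in range(20)]  # steady readings
spike_flagged = monitor.observe(500)                         # sudden spike
```

A production system would track many metrics and likely use a richer model; the point here is only that the baseline is learned from the data rather than configured by hand.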

Applications

Foundation Capabilities: Speed, Richness of Analytics


Vendor New Services

Telco: Manage capacity of towers, cells, switches, connections, devices. Performance dashboards and reports on customer consumption for billing and infrastructure utilization for capacity planning.

Intelligent Service Provider: Real-time updates/integration between individual plans, consumption, and promotions; real-time integration of individual consumer SLAs and connection/bandwidth allocation in order to support tiered pricing.

Use Case

Systems of Record Transition to IoT Systems of Intelligence:

From Telco OSS/BSS to Intelligent Service Provider

Use Case: Bridging Carrier App Billing and Network Operations

Customer- and developer-facing services

Billing and settlement
• App store and in-app billing via carrier billing
• Provisioning app install order on credit verification
• Settle developer royalties based on splits

Offers
• Offer discount on monthly top-up of bandwidth if user is heavy consumer over time and approaching monthly limit
• Serve app install ads based on user profile

Network operations-facing services

Network performance and configuration management
• Real-time ingestion of CDRs to create heat map of network performance. This requires such fast ingest that it would likely be done by streaming products in the absence of an in-memory DBMS. (This is an example of an IoT machine data app.)
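
The CDR heat map described above can be sketched as a per-event aggregation. This is a minimal sketch under stated assumptions: the record fields `cell_id` and `dropped`, and the choice of dropped-call rate as the performance measure, are hypothetical and not taken from the presentation.

```python
from collections import defaultdict


def update_heat_map(heat_map, cdr):
    """Fold one call detail record (CDR) into per-cell counters."""
    cell = heat_map[cdr["cell_id"]]
    cell["calls"] += 1
    if cdr["dropped"]:
        cell["dropped"] += 1


def drop_rates(heat_map):
    """Return the fraction of dropped calls for each cell."""
    return {cell_id: c["dropped"] / c["calls"] for cell_id, c in heat_map.items()}


heat_map = defaultdict(lambda: {"calls": 0, "dropped": 0})
stream = [  # hypothetical records; a real feed would be unbounded
    {"cell_id": "A1", "dropped": False},
    {"cell_id": "A1", "dropped": True},
    {"cell_id": "B7", "dropped": False},
]
for cdr in stream:
    update_heat_map(heat_map, cdr)

rates = drop_rates(heat_map)
```

Because each record only touches one counter, the state stays small and can be updated as fast as records arrive, which is why a streaming product is a plausible fit.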

Bridging customer-facing and network-facing services
• Enrich CDR data with information about customer profitability
• Real-time prioritization of bandwidth on a per-customer basis when there is high congestion
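
One way to picture the bridging step is to join each CDR with a customer profitability score and then allocate scarce bandwidth in that order. The sketch below assumes strict priority by profitability, which is only one possible policy; the field names (`customer_id`, `requested_mbps`, `profitability`) and the function names are hypothetical.

```python
def enrich(cdr, profitability):
    """Attach the customer's profitability score to a CDR (0.0 if unknown)."""
    return {**cdr, "profitability": profitability.get(cdr["customer_id"], 0.0)}


def prioritize(sessions, capacity_mbps):
    """Grant requested bandwidth to the most profitable customers first."""
    grants = {}
    remaining = capacity_mbps
    ranked = sorted(sessions, key=lambda s: s["profitability"], reverse=True)
    for session in ranked:
        granted = min(session["requested_mbps"], remaining)
        grants[session["customer_id"]] = granted
        remaining -= granted
    return grants


profitability = {"c1": 0.9, "c2": 0.2}  # hypothetical scores from the billing side
cdrs = [
    {"customer_id": "c1", "requested_mbps": 6},
    {"customer_id": "c2", "requested_mbps": 6},
    {"customer_id": "c3", "requested_mbps": 2},
]
sessions = [enrich(cdr, profitability) for cdr in cdrs]
grants = prioritize(sessions, capacity_mbps=8)
```

With 8 Mbps available, the most profitable customer is served in full and the rest share what remains. A real policy would probably guarantee a minimum allocation rather than starving low-scoring customers.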

Spectrum of Applications: Fast Data vs. Big Data

Fast Data Big Data

Range of “Real-Time” Interactions
• REAL RT: high frequency algorithmic securities trading on one end of the spectrum
• Updates every couple hours: inventory levels accessed by ecommerce, mobile apps at other end of spectrum

Modern SoR makes it easier to get to fastest part of spectrum

Real-Time is a Matter of Degree: Choices Depend on Usage Scenario, Accessibility of Applications That Need to be Integrated – Including Legacy and Modern Systems of Record

Figure: Data Volume (GB, TB, PB) plotted against Data Velocity (Yr, Mo, Day, Hr, Min, Sec, MS, µS). Labeled regions: Advanced Analytics; Data Warehouse; OLTP, Operational Intelligence; Big Data: Machine Learning, Predictive Analytics; OLTP; Business Intelligence, Production Reporting; Fast Data: Streaming Data, Per Event Decisions.

*TRADITIONAL* Analytic Trade-Off: Speed vs. Richness

Traditional Data Warehouse Pipeline

Time-to-analysis bottlenecked by
• Design time: Need to decide questions before building the analytic pipeline
• Runtime: Batch ETL

Diagram labels: OLTP Applications; Batch ETL; Data Warehouse

Ingest: Slow

Analysis: Rich But Slow

Analytic Trade-Off: Speed vs. Richness

Hadoop Data Pipeline

Time-to-analysis bottlenecked by
• Design time: Iterative, incremental analysis and enrichment
• Runtime: Inherent batch design center

Diagram labels: OLTP Applications; Pig/Sqoop ETL; Hadoop/HDFS; Hive ETL; Iterative self-service and incremental database design; Data provisioning; Interactive BI; Production Reporting

Ingest: Slow

Analysis: Rich But Slow


Hadoop Data Pipeline with Streaming Ingest

Time-to-analysis bottlenecked by
• Design time: Still need iterative, incremental analysis and enrichment
• Runtime: Real-time ingest but data still needs to be stored before rich analytics

Diagram labels: OLTP Applications; Stream Processor; Hadoop/HDFS; Iterative self-service and incremental database design; Interactive BI; Production Reporting

Streaming Ingest: Fast Analysis but Limited

Hadoop Cluster Analysis: Rich but Slow

BOTTLENECK: DBMS Storage *Before* Rich Analysis


Integrated Streaming and Persistence: Real-Time, Rich Analysis

Diagram labels: Hadoop Cluster; Stream Processor; Store; E-Mail; Social Media; Operational apps; Customer interactions; Customer “Breadcrumbs”; Operational Data; IoT – Devices, Machines; Machine Data; Predictions, Recommendations; Improving Predictions (Machine Learning)

Better Integration of Real-Time and Batch: Analytic Trade-Off Between Speed vs. Richness Diminishes

Figure: Data Volume (GB, TB, PB) plotted against Data Velocity (Yr, Mo, Day, Hr, Min, Sec, MS, µS). Labeled regions: Advanced Analytics; OLTP; Big *AND* Fast Data: Machine Learning on Historical AND Recent Data Drives Per Event Decisions.

Better Integration of Real-Time and Batch: Analytic Trade-Off Between Speed vs. Richness Diminishes

Figure: Batch Processing (GB, TB, PB) plotted against Streaming - Velocity (Min, Sec, MS, µS).

Big Data: Maximum throughput of data; exploratory analysis of historical data

Fast Data: Fastest speed to make a decision on each event

Streaming is Newest Religious War: Use It For *All* Analytic Workloads? Processing Lots of Data vs. Analyzing Each Event = Inherent Conflict

“Streams can do it all” school: Big Data Apps are Just Fast Data Apps Scaled-Out
• If it can handle fast data, just scale it out to handle big data
• Big win: only one application needed

Wikibon recommendation (elaborated on next slide): Streaming and batch *will always* coexist
• Even batch programs on streaming platform will still have different application logic…
• High volume machine learning vs. incremental update
• Historical performance analysis vs. looking up a profile
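
The contrast between high volume machine learning and incremental update can be sketched as two functions whose application logic differs even though they maintain the same estimate. This is only an illustration: the function names, the use of a per-customer average as the “model”, and the smoothing factor `alpha` are assumptions made for the example.

```python
def batch_fit(history):
    """Batch-oriented: scan the full history to compute mean usage per customer."""
    totals, counts = {}, {}
    for customer, usage in history:
        totals[customer] = totals.get(customer, 0.0) + usage
        counts[customer] = counts.get(customer, 0) + 1
    return {c: totals[c] / counts[c] for c in totals}


def incremental_update(model, customer, usage, alpha=0.5):
    """Per-event: nudge one customer's estimate toward the new observation."""
    previous = model.get(customer, usage)
    model[customer] = (1 - alpha) * previous + alpha * usage
    return model[customer]


history = [("c1", 10.0), ("c1", 20.0), ("c2", 4.0)]  # hypothetical usage records
model = batch_fit(history)                           # periodic pass over everything
latest = incremental_update(model, "c1", 25.0)       # one event at a time
```

The batch function optimizes for throughput over all records, while the per-event function touches a single key and returns immediately, so the two would remain separate pieces of logic even on a shared engine.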

Even When Streaming Engines Support More Sophisticated Analytic Workloads, The Applications Are Likely to Differ Between Event-at-a-Time vs. Batch

Figure: Latency (Higher is Slower) plotted against Analytic Sophistication.
• Basic Streaming: What Happened (Counting)
• SQL: What Happened (Exploration, OLAP or Dashboard)
• Machine Learning: Anticipate or Act Automatically (Prediction or Prescription)

IMPLICATION: Converging on one application engine not critical

Stream processors: Spark, Flink, InfoStreams, Samza, DataTorrent, (DB): VoltDB / MemSQL
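
The “Basic Streaming” level on this slide (what happened, by counting) can be illustrated with a tumbling-window count. The sketch is engine-agnostic plain Python rather than the API of any of the stream processors listed above; the event shape `(timestamp_seconds, event_type)` and the function name are assumptions.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, event type) in fixed, non-overlapping windows."""
    counts = Counter()
    for timestamp, event_type in events:
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, event_type)] += 1
    return counts


events = [(0, "call"), (30, "call"), (61, "sms"), (119, "call")]  # hypothetical events
counts = tumbling_window_counts(events, window_seconds=60)
```

Moving up the sophistication axis to SQL-style exploration or machine learning generally means keeping more state per event, which is where latency tends to rise.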

Figure labels: Historical analysis; Batch-oriented; Per Event-Oriented; Profile lookup; Explore large, new data; Incremental model update

YARN – Cluster Resource Management

HDFS or operational database

Streaming: Storm, Flink, Samza, DataTorrent

SQL: Impala, Drill, Hive, HAWQ…

Machine Learning: Mahout…

Key Takeaway: Coexistence of Batch and Streaming Means One Application Engine Doesn’t Have to Rule All - Spark and Hadoop Can Live Together

Pro: Mix and match pipeline comprised of specialized processing *optimized* for each workload

Con: Batch-only - hand-off between processing engines via storage is slow. Each processing engine is standalone and can’t leverage the others’ functionality

Pro: Fast and simple - pipeline comprised of one in-memory engine with streaming, SQL, machine learning, graph personalities (libraries)

Con: Still immature: performance is an issue and integration hasn’t been fully delivered. But the Tungsten perf boost and IBM projects could add huge new value

Spark Core

Spark MLlib

Spark Streaming

Machine Learning

Spark SQL: Join, filter, aggregate

Streaming Ingest

Spark SQL

HDFS or operational database

YARN or Mesos or other Workload Mgr
