idol presentation
TRANSCRIPT
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
HP IDOL
Speaker’s name
Month day, 2014
Video surveillance
Wire tapping
Internet of Things
Facebook likes
Tweets
Drones
Online shopping
Search queries
Tweets
RBMS Social sentiment
CRM
Web logs
User clickstreams
Business data feeds
Mobile
SMS/MMS
User generated content
Apps YouTube
Service logs
The dawn of the information era
Improve customer relationship
Extend life expectancy
Deliver better, smarter products
Ensure governance & compliance
Protect and save lives
HP IDOL makes data matter
Understanding meaning is the key to solving information challenges
Risk modeling Fraud detection Competitive advantage Behavior analysis Knowledge delivery
?
Volume, Velocity, Variety, Veracity
Fin Services, Manufacturing, Life Sciences, Hospitality, Government, Telecom, Retail, Entertainment, Energy, Healthcare, Media
Future challenges
Understanding human information
• Access and understand virtually any source of information on-premise and in the cloud
• A strategic pillar of HP’s HAVEn Big Data platform
• Non-disruptive, manage-in-place approach complements any organization
Social Media Video Audio Email Texts Mobile
Transactional Data IT/OT Documents Search Engine Images
Harnessing the power
86% of corporations cannot deliver the right information, at the right time, to support enterprise outcomes all of the time³
³Source: Coleman Parkes Survey November 2012
Keyword, metatags, database technologies often fail
Legacy technologies fall short
• Manual process does not scale
• Multiple definitions of the same word
• Not real-time
• Inaccurate and subjective
• Limited definitions, no relativity
• No idea distancing
• Interoperability of tagging
• Retroactive reporting
How does HP Autonomy approach human information?
Adaptive: Continuous learning based on incoming data and context
Probabilistic: Mathematical, language-independent technology
Concept: Extract main concepts present in information
Modeling: Combination of proprietary technology and proven industry-standard methodologies
HP IDOL: Key enabling technology
• Mathematically based
• 15 years and over $280M in R&D
• >170 Patents
• Language independent
• Built for infrastructure
• All file types, all media types (voice/video)
• Scalable and with security
• Platform/OS/device agnostic
• Managed in place
Powered by IDOL, the OS for human information
Social Media Video Audio Email Texts Mobile Transactional Data
Documents IT/OT Search Engine Images
Apps for Exploratory Information Analytics
Apps for Information Governance and Management
Apps for Marketing Optimization
HP Autonomy connectors
Developer/Partner
External/Cloud
HP Autonomy Enterprise Applications
The OS for human information
Repositories
Information types
OS service layers 500+ functions
DigitalSafe SharePoint Hadoop
CRM
Jive
Exchange
Relational DB
ACA AeD
WorkSite HP Records Mgr MediaBin
Data Protector Connected LiveVault
Driven by advanced analytics to understand data in context from any source
Over 500 IDOL functions to augment your intelligence
Automatic hyperlinking
Conceptual search
Keyword search
Fieldtext search
Phrase search
Phonetic search
Field modulation
Fuzzy matching
Implicit profiling
Explicit profiling
Community and expertise network
Agents
Intent-based ranking
Alerting
Social feedback
Eduction
Automatic clustering
Clustering 2D/3D
Autoclassification
Auto language detection
Sentiment analysis
Automatic taxonomy generation
Automatic query guidance
Highlighting
Parametric refinement
Summarization
Real-time predictive query
Metadata extraction
Automatic tagging
Faceted navigation
Inquire: Search your data
Investigate: Analyze your data
Interact: Personalize your data
Improve: Enhance your data
Search your data
Accuracy
• Conceptual, Keyword or Object
• Extensive Field combinations
• Full Meta Search
Robust Architecture
• Linearly Scalable
• Fault Tolerant
• Disaster Recovery Friendly
Reach
• All Information
• Real-Time Data
• Audio and Video
Security
• Mapped Security
• Fully Extendable
• Leverages Existing Security
Analyze your data
Quickly evaluate the relevance of information
• Automatic Query Guidance (providing top themes from query results in real time)
• Concept navigation via advanced visualizations (node graphs, theme tracking, topic maps, broadcast analysis)
• Intelligent summarization (simple, concept and context)
• Intelligent highlighting (search terms, phrases, concepts, context, fidelity to query grammar)
• Concept streaming (Real-time summaries from audio that are contextual to queries and intent)
• Intelligent de-duplication, including “near” de-duplication
Use structure to navigate the data
• Structured, semi-structured and XML support
• Parametric search (unlimited nesting and association support)
• Directed navigation (create compelling navigation for users)
Personalize your data
We are what we…
Personalize your data
Explicit profiling (agent): user-defined
• Define your interest using:
- Natural language descriptions
- Keyword/Boolean rules
- Refine by example
• Automatically monitor information
• Customizable
• Share interests with knowledge community
Implicit profiling: capturing behavior data
• Fully automatic
• Ongoing monitoring of data consumption and contribution
• Multi-faceted profiles
• Always up-to-date
Dynamic communities of interest
• Expert identification
• Define business rules to guide relationships
• Automatically form and manage community
• Collaboration Networks
• Document rating
• Consumer groups
Expertise
Communities
Agents
Profiles
Exploratory analytics that help you discover the “unknown unknowns”
Enhance your data
Managed classification
• Create categories using business rules or training
Automatic classification and clustering
• Automatically determine categories based on patterns and relationships in information
• Spot analysis of all themes and grouping
• Time-sensitive analysis: What’s hot? What’s new?
Eduction
• Apply structure to unstructured data by extracting key fields and entities
• Hundreds of entities supported, including names, addresses, credit card information, sentiment, intent, etc.
Audio analysis
• Speaker-independent speech to text, speaker identification, audio events, language identification, etc.
Image and video analysis
• Next-generation image classification (is this a car? / find more like “this”)
• On-screen OCR, logo detection, intelligent scene analysis, color and texture analysis, story segmentation, etc.
HP Autonomy solutions family, powered by IDOL
IDOL
Compliance
Litigation Readiness
Storage Optimization
Database Archiving
eDiscovery
Supervision
Legal Hold
Enterprise Search & Analytics
Voice of the Customer
Voice of the Worker
Media Intelligence
Video Surveillance
Big Data Analytics
Knowledge Mgmt
Content Access & Extraction
Records Mgmt
Legal Content Mgmt
Business Process Mgmt
Document Mgmt
Records Mgmt
Legacy Clean Up
Server Data Protection
Virtual Machine Data Protection
Remote & Branch Office Data Protection
Endpoint Device Data Protection
Cloud Data Protection
Enterprise Content Mgmt
Archiving & eDiscovery
Data Protection
Web Experience Mgmt
Web Optimization
Search Engine Marketing
Marketing Analytics
Contact Center Mgmt
Rich Media Mgmt
Aurasma - Augmented Reality Mobile Experience
Digital Marketing Experience
Information Analytics
Information Management & Governance Marketing Optimization
Hybrid
OEM
Software
Cloud for human information
HP technology powered by IDOL
Enterprise Group: HP StoreAll, HP StoreOnce, HP Gen 8 Appliances
Enterprise Services: HP Social Command Center, HP Information Governance
PPS: HP Flow, HP Live Photo, HP Connected Backup
Big Data: IDOL + Hadoop, IDOL + Vertica, HAVEn
Security: IDOL + ArcSight
HP Labs: Compass
Foundational methodology
Strong information and weak information
Keywords are small amounts of very strong information without context.
Larger amounts of weaker information are what humans refer to as “context”.
“Mercury”
Is it a planet? Is it an element? Is it a car?
With high certainty, it’s an element!
“A heavy element and the only metal that is liquid at standard conditions for temperature and pressure, with the symbol Hg and atomic number 80, commonly known as quicksilver”
Uses pattern-matching and probabilistic modeling to form an understanding of content
HP IDOL understands the meaning of information
Fundamentally language-independent
• Treats words as symbols
Allows incoming data to dictate the model, not pre-defined rules or dictionaries
• Adapts to changing definitions
Optimized with language packs
• Eduction, sentiment analysis, speech analytics
Information Theory and Bayesian Inference
Best-in-class combination of approaches
XML and Boolean+
Natural language processing
Probabilistic
If we toss a coin 100 times and get heads every time, what’s the probability of getting a head on the 101st?
Traditional probability says: 50%
If we toss a coin 100 times and get heads every time, what’s the probability of getting a head on the 101st?
Adaptive intelligence: prior information changes the model of understanding
Bayesian Inference says: 99+%
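One way to arrive at a figure like this is Laplace's rule of succession. The slide does not state which prior it uses, so the uniform Beta(1, 1) prior below is an assumption, and `prob_next_head` is a hypothetical helper name. Under that prior, the probability that the next toss is a head after h heads in n tosses is (h + 1) / (n + 2).

```python
from fractions import Fraction


def prob_next_head(heads: int, tosses: int,
                   prior_heads: int = 1, prior_tails: int = 1) -> Fraction:
    """Posterior predictive P(next toss is a head).

    Assumes a Beta(prior_heads, prior_tails) prior on the coin's bias;
    the default Beta(1, 1) is the uniform prior (an assumption, not
    something the slide specifies).
    """
    return Fraction(heads + prior_heads,
                    tosses + prior_heads + prior_tails)


# Before any evidence the estimate is the familiar 1/2.
no_evidence = prob_next_head(heads=0, tosses=0)

# After 100 heads in 100 tosses the estimate is 101/102.
after_100_heads = prob_next_head(heads=100, tosses=100)
print(float(no_evidence), float(after_100_heads))
```

With no observations the sketch returns exactly 1/2, matching the "traditional" answer; the 100 observed heads are what move the estimate above 99%.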
What is in front of this wake?...
With high probability we can say there is a……..
BOAT!
Let’s play hangman
_ _ _ e _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ e _ _ _ _ _ _ _ _ _ _ _ _ _ t _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ e _ _ _ i _ _ _ _ i _ i _ _ t i _ _ _ _ i _ _ i _ _ _ i _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ x _ _ _ _ _ _ _ _ _ _ _ _ _
e t a i n o s r l d h c u m f p y g w v b k x j q z
Supercalifragilisticexpialidocious_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Platform features
Language independence
• Free from linguistic restraints and rules
• Automatically adapts to changing definitions
• Over 170 live customer languages
• Single, multibyte and Unicode languages
• Optional language packs for localization
Department of Homeland Security - Requires extremely precise handling of foreign languages, including Chinese and Arabic
Open V – China’s largest online video website
HP IDOL powers the largest systems in the world
Scalability
Millions of users
• Dept of Defense: 2.5 million users
Billions of documents
• Large bank: Over 1 bn emails
• Pharma: 50 terabytes of data in discovery repository alone
High throughput
• Bloomberg: Alert on 46m emails per day
Mapped security
• Fully integrated Kerberos authentication together with Secure Sockets Layer (SSL) encryption across all transactions
• Compliance with all major security standards, including US DoD 5015.2, UK TNA 2002, Australia’s VERS, ISO 15489
• Full range of customizable security functionality:
– Discretionary access control (ACL based)
– Mandatory access control (Based on metadata)
– Kerberized access to IDOL
– SSO authentication using Windows Active Directory
Single supplier to US Department of Homeland Security
Intelligent compaction
• Pause and resume the operation without causing corruption
• Monitor the progress
• Skip large sections of the index when appropriate to expedite the operation
http://host:ACIPort/action=admin
HP IDOL admin
Answer common questions and ease common actions:
• “Why is this query slow?”
• “What’s using up so much memory in my engine?”
• “Is my engine operating as expected?”
• “I need to perform some light maintenance (DREREPLACEs, etc.) but don’t want to bother writing a Perl script.”
Architecture
HP IDOL connector overview
Connector actions
• Synchronize (fetch)
• View
• Identifiers, Collect, Hold, ReleaseHold
• Insert, Delete, Update
[Diagram: Repository → Connector → Connector framework server → IDOL]
[Processing stages shown: Lua w/IDOL extensions, document format detection, pre-import processing, KeyView filtering, post-import processing, Lua w/IDOL extensions, index into IDOL]
[Diagram: multiple Repository/Connector pairs, Connector framework server, DIH, multiple IDOL servers]
HP IDOL data ingestion pipeline
Lua scripting engine is available within connectors
KeyView file format processing, Eduction and Lua scripting engine are available within CFS
Repository
Connector
Connector framework server
Content
Repository
Connector
Repository
Connector
DIH
IDOL Proxy
Index tasks
OCR
Audio/Video
Category
APA Agents
Provides a flexible way of batching, scheduling, routing, and aggregating information into IDOL servers
Distributed Index Handler (DIH)
Features
• Consistent hashing
• Batch indexing
• Index routing
• Virtual databases
• Categorization-based indexing
• Time-based indexing
Benefits
• Seamless integration with backend modules
• Resilience
• Scalability
• Flexibility
IDOL Server 1 IDOL Server 2
Distributed Index Handler
(DIH)
Connector
Mirror/non mirror index
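The slide lists consistent hashing as a DIH feature without describing the mechanism. As an illustration only, and not IDOL's actual implementation, a consistent-hash ring assigns each document reference to a server so that adding a server remaps only a fraction of the documents. The class name `HashRing`, the server names and the virtual-node count are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (illustrative, not the DIH algorithm)."""

    def __init__(self, servers, vnodes=100):
        # Each server is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def server_for(self, doc_ref):
        # Pick the first ring point clockwise from the document's hash.
        idx = bisect.bisect(self._points, self._hash(doc_ref)) % len(self._points)
        return self._ring[idx][1]


docs = [f"doc-{n}" for n in range(1000)]
ring2 = HashRing(["idol-server-1", "idol-server-2"])
ring3 = HashRing(["idol-server-1", "idol-server-2", "idol-server-3"])

# Documents whose assignment changes when a third server joins.
moved = [d for d in docs if ring2.server_for(d) != ring3.server_for(d)]
print(len(moved), "of", len(docs), "documents moved")
```

In this sketch every document that moves lands on the newly added server; documents on the two existing servers are never shuffled between them, which is the property that makes index routing stable as capacity grows.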
Intelligent query distribution
Distributed Action Handler (DAH)
Features
• Arbitrary distribution
• Mirrored configuration
• Non-mirror configuration
• Load balancing
• Fail-over
Benefits
• Linear scaling
• Improved performance
• Reduced processing time
• Robustness
[Diagram: IDOL 1 to IDOL 4 behind DAH1, DAH2, DAH3 and DIH1, DIH2, DIH3, connected by 1-to-N links]
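Load balancing and fail-over across mirrored engines can be pictured with a small sketch. This is not the DAH's implementation: the class `MirrorDistributor`, the engine names and the `send` callback are hypothetical, and a real deployment would send ACI requests over the network rather than call a local function.

```python
import itertools


class MirrorDistributor:
    """Round-robin over mirrored engines, failing over past dead ones."""

    def __init__(self, engines):
        self._engines = list(engines)
        self._rotation = itertools.cycle(range(len(self._engines)))

    def query(self, text, send):
        # Try each engine at most once, starting from the next in rotation.
        for _ in range(len(self._engines)):
            engine = self._engines[next(self._rotation)]
            try:
                return engine, send(engine, text)
            except ConnectionError:
                continue  # fail over to the next mirror
        raise RuntimeError("no engine available")


def fake_send(engine, text):
    # Stand-in transport: pretend that idol-2 is down.
    if engine == "idol-2":
        raise ConnectionError("engine down")
    return f"{engine} answered {text!r}"


dah = MirrorDistributor(["idol-1", "idol-2", "idol-3"])
served_by = [dah.query("mercury", fake_send)[0] for _ in range(4)]
print(served_by)
```

Because every mirror holds the same index, any healthy engine can answer; the rotation spreads the load and the `except` branch supplies the fail-over.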
Globally distributed system
Advanced functions in depth
HP IDOL retrieval methods
Conceptual
• Natural language
• Conceptual matching
• Unstructured refinement
Business rules
• Boolean
• Keyword
Parametric
• Structured refinement
Over 100 operators for Boolean search
AND
OR
NOT
NEAR
NEARn
DNEAR
DNEARn
WNEAR
WNEARn
BEFORE
AFTER
EOR
WHEN
WHENn
vAND
vSUBSTRING
vMATCHES
NEAR
NEAR/n
SENTENCE
PARAGRAPH
BEFORE
AFTER
ORDER
SOUNDEX
MANY
[n] WORD
CASE
PHRASE
. >
. >=
. <
. <=
. !=
. =
LANG/x
TODAY
YESTERDAY
NOW
NOW+n
NOW-n
term
term*
term?
vOR
vNOT
vACCRUE
vANY
vALL
vIN
vWHEN
vCONTAINS
vENDS
vSTARTS
vSUBSTRING
vCONTAINS
vENDS
vSTARTS
FREETEXT
STEM
TYPO
TYPO/n
YES-NO
PRODUCT
SUM
COMPLEMENT
LOGSUM
LOGSUM/n
MULT
MULT/n
FREQ
term~
term[100]
term[*1.5]
"term"
"term phrase"
term:field
"term phrase":field
~term
FUZZY()
FUZZYnn()
SOUNDEX()
APCMMOD[]
term[~]
Conceptual search
High recall and precision
• Return documents that do not contain query terms but are conceptually related
Input sentences or entire document as query
• Extracts main concepts in the query to deliver the most relevant results
Automatic hyperlinking
• Automatically retrieves conceptually related content
• Searches are run automatically for the user
• Increases productivity and reduces duplicate work
Add context to short queries by grouping results into concepts
Automatic query guidance
Query “Madonna”
Results: Documents containing “Madonna”
Query search
Documents about:
1. Singer
2. Italian Renaissance
3. Madonna
Further suggestions…
Most likely meaning…
Result documents
Conceptual clustering
Summarization
Quick summary (N+ lines)
Context summary (What is this doc about in relation to query terms?)
Concept summary (What do I look for with regard to interest rates?)
Information Theory and Bayesian Inference
Directed navigation
Narrow search with facets
Visualization of main topics
Understanding the customer at the level of a dialog
Contextual segmentation
Geo + Demo + Psychographic segments
Behavioral segments
Functions
Performance
Feature Driven
Reviews
News
Adverts
Social media
Buzz driven
18-35 yrs
35-65
Seniors
Have Kids
Male
Female
Semantic segments
Large Screen
Lots of storage
High Res Display
Would give it 5 stars
Great Value for Price
Intent-based ranking
Search results personalized and targeted based on user and context
Profile developed through complete behavior analysis… implicit or explicit profiling
Gather data from content consumption, content contribution, interaction with colleagues, etc.
Foster collaboration by automatically matching and connecting employees with similar needs
Connect with your colleagues
Experts
Communities
Files
Social
Clustering
Automatically partition the data so that similar information is clustered together
Product performance issues
Side letters
Off-balance-sheet transactions
Topical sentiment analysis
Decomposition and classification within a sentence to pull out specific topics
“I stayed at the Marriott last week, and though the mattresses were very nice, the service was awful.”
Is this Positive? Negative? Neutral?
How much Positive? How much Negative?
Hundreds of conceptual entities
Eduction
Quickly narrow search results with auto-identified facets and conceptual entities such as employee names from documents
Validate or customize entities
• Is this a valid credit card number?
• What are all docs that contain SSNs?
• If area code is 415, output as Home Office
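The first question, whether a candidate string is a valid credit card number, is commonly answered with the Luhn checksum. The slide does not say which check Eduction applies, so the following is an illustrative sketch only; `luhn_valid` is a hypothetical helper name.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum.

    Spaces and dashes are ignored. This checks the checksum only, not
    whether the number has actually been issued.
    """
    digits = [int(ch) for ch in candidate if ch.isdecimal()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


# A widely used test number passes; changing its last digit fails.
print(luhn_valid("4111 1111 1111 1111"))
print(luhn_valid("4111 1111 1111 1112"))
```

A checksum like this helps separate digit sequences that merely look like card numbers from ones that could be real, which is the kind of validation the bullet describes.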
Pinpoint accuracy for multibyte languages such as CJK, Thai and some European languages
Names, Places, IP addresses, Companies, Events, Relationships, Medicines, Airports, Cars, Social Security numbers, Phone numbers, Credit cards, Dates, Holidays, Job titles, Currencies… many more
Eduction
<Organization>
• National Security Agency
<Names>
• President Obama
• Vladimir Putin
• Edward Snowden
<Places>
• Moscow
• St. Petersburg
• Washington
• Syria
• Russia
Search video as easily as text
Transform rich media into intelligent assets
Live video or playback from archived footage
On-screen text recognition
Face identification
Automatically generated transcript using speech recognition
Speaker identification
Timecode synchronization
Automatic keyframe generation
Automate: Automatically create metadata, keyframes, transcriptions
Understand: Understand video footage and audio streams in real time
Act: Apply advanced analytics such as clustering and categorization, and link with other file types
Most advanced speech technology
Convert spoken words to text
• Acoustic + Language Model
• Speech-to-Text and IDOL’s conceptual understanding
Eliminate manually adding metadata to A/V clips
Phonetic approaches have major problems
• No Conceptual or Contextual Language Understanding
• Keyword-Based
Model of language disambiguates similar terms
• U.S. President “Bush”
• “bush” as in a large plant
Limitations of phonetic search
Phonetic sounds do not have a unique match
Only capable of keyword matching
• “Cambridge University”
• /k ey m b r ih jh y uw n ih v er s ax t iy/
• The University of Cambridge
• Cambridge colleges
• Kings College
• Trinity Hall
/k/ /ae/ /t/
“cat”
“category”
“scatty”
“catalogue”
?
Accurate speech technology
Language independent, statistical algorithms to recognize speech +
Language dependent, acoustic and language models for each supported language
How is the voice being recorded? Telephone models, Hz rates, broadcast models
What language + common phrases, product names, etc.
Trained dictionary with vocabulary and conceptual understanding
Recognized hypothesis
Front end processing
Speech
Statistical models of speech and language
Speech-to-text technology
P(W) = probability of word string W
P(A|W) = probability of an acoustic sequence A given W
Use Bayes’ rule to find the word string W that has the highest probability given the acoustic sequence A
\[ \hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} \frac{P(W)\,P(A \mid W)}{P(A)} \]
Language model: P(W)
Acoustic model: P(A|W)
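The decision rule can be illustrated with toy numbers. Because P(A) is the same for every candidate word string, it can be dropped from the arg max, leaving the product of the language-model and acoustic-model scores. All probabilities below are invented for illustration, and `decode` is a hypothetical helper, not an IDOL function.

```python
import math


def decode(candidates):
    """Return the word string W that maximizes log P(W) + log P(A|W).

    `candidates` maps each word string to a dict holding its
    language-model probability `p_w` and acoustic likelihood
    `p_a_given_w`. Logs are used to avoid underflow on long sequences.
    """
    def score(w):
        return math.log(candidates[w]["p_w"]) + math.log(candidates[w]["p_a_given_w"])

    return max(candidates, key=score)


# Two hypotheses that sound almost identical (made-up numbers):
# the acoustic model slightly prefers the second, but the language
# model considers it far less likely as a sentence.
candidates = {
    "can I help you": {"p_w": 1e-6, "p_a_given_w": 0.30},
    "can eye help you": {"p_w": 1e-10, "p_a_given_w": 0.32},
}
best = decode(candidates)
print(best)
```

The acoustic scores alone cannot separate the two hypotheses; the language-model prior is what settles the choice.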
Language model provides probability of word sequence
Forms a conceptual understanding of language
“Can I help you?” vs. “Can eye help you?”
Trained from large text corpora (Hundreds of millions of words)
Defines words that can be recognized
Use training text, e.g., broadcast news
Encompasses topic information, colloquial phrases, etc.
Adaptable for particular customer
Specialist vocabulary, e.g., product names
Acoustic model analyzes the sounds that comprise a spoken language
Audio analyzed to extract energy at various frequencies
Dependent on audio format
Complex statistical techniques model both the sounds and audio characteristics
Image technology: Text
Document field extraction
<item><price>$6.23</price><date>10/2/2012</date><purpose>Lunch</purpose>…</item>
OCR: Read text from images
1D and 2D barcode reading
ISBN (“9780140189865”) PDF-417 (“LASTNAME, FIRSTNAME,…”)
Data Matrix (“The Future of Ticketing…”)
Many more (about 20 barcode types)
Image artifacts such as wrinkled paper
Avoid non-text parts of the image
Column understanding
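For the 1D barcode case, a reader can reject misreads by verifying the check digit. ISBN-13 codes use the EAN-13 scheme, in which the first 12 digits are weighted alternately by 1 and 3 and the 13th digit brings the weighted sum to a multiple of 10. The sketch below shows that check; the function names and the sample ISBN are illustrative assumptions, not part of IDOL.

```python
def ean13_check_digit(first_twelve):
    """Check digit for the first 12 digits of an EAN-13 / ISBN-13 code."""
    total = sum(
        int(digit) * (3 if index % 2 else 1)   # weights 1, 3, 1, 3, ...
        for index, digit in enumerate(first_twelve)
    )
    return (10 - total % 10) % 10

def is_valid_ean13(code):
    """True if code is 13 digits and its last digit matches the check digit."""
    return (
        len(code) == 13
        and code.isdigit()
        and ean13_check_digit(code[:12]) == int(code[12])
    )

# Sample value chosen for illustration only.
print(is_valid_ean13("9780306406157"))
```

A failed check usually means a digit was misread or transposed, for example because of image artifacts such as wrinkled paper, so the reader can retry or flag the result.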
Image technology: 2D objects
Registered image
Test image
Generic Logo recognition
Registered Logos
Test image
Image technology: Human analysis
Primary clothing color = white; not nude
Primary clothing color = white; not nude
Primary clothing color = black; not nude
Face detection
Face analysis
Found “President Obama” face
Hadoop/HDFS connector
Ingest Hadoop data into IDOL for advanced retrieval
Extract metadata, enrich and conduct advanced analytics for files stored in Hadoop
Push enterprise documents into Hadoop (chat data, ODBC, documents) for MapReduce analysis
Collect documents in Hadoop for legal collection
What’s new
HP IDOL 10
Extending leadership in human information analytics
More powerful
• Analyze sentiment at a granular level
• Automatically extract 100s of entities for improved search
• Enhance your Hadoop investment
• Deliver search results personalized for each user
• Improve audio and image analysis
• Increase query speed by up to 30%
Easier to operate
• Quickly answer performance-related questions with our new visual dashboard, IDOL Admin
• Dynamically expand capacity without re-indexing for improved performance and no downtime
• Increase your indexing speed by as much as 47x with improved data transmission
Reliable
• Recover intelligently from system failures with improved self-diagnosis of indices
• Securely delete content from your index
• Prevent the loss of documents during the indexing process
Latest innovations in IDOL 10
Core IDOL algorithm enhancements
• Improved compaction
• Improved ability to repair indices
• Improved query speed
• Incremental backup and point-in-time restore
IDOL architecture improvements
• Indexing flow control
• IDOL Admin
Speech
• Multi-CPU support
Eduction
• Improved handling of multi-byte languages
• New grammars
• Degrees of sentiment analysis
• 3x performance improvement in sentiment analysis
Image
• Object detection
• Unified analysis
…and many more!
Key strategic themes of HP IDOL development
Platform for search-based applications
Enable internal and external partners to more easily leverage IDOL as a platform to build applications
Strengthen core functionality
Improve existing areas (e.g., sentiment), and continue growing in new areas (e.g., image)
Simplified consumption
Easier to install with more robust features
Consumable from private and public cloud for rapid web services
Next-generation enterprise search
Reinvent enterprise search in the era of cloud, mobile, and social computing
Big Data / analytics
Enable IDOL as content analytics platform in the broader Big Data / Information Analytics ecosystem; integrate Hadoop
Use cases
Insert slides from relevant decks
Thank you