The use of big data analytics—in short, finding insights in extremely large data sets—has been transforming the way many U.S. industries operate. And the legal industry is no exception: From case law to M&A transactions, attorneys are now able to see the forest for the trees.
But such insights do not come easy. Collecting and analyzing enough data to provide meaningful metrics to legal professionals can be an arduous affair, especially given the challenges of amassing certain types of data in the first place.
Federal case law data, for example, is fairly easy to obtain. There is PACER, which legal tech companies have learned to expertly mine. Meanwhile, other companies have been building their own case law repositories for years and tapping them for far-reaching analytics.
But with legal business data, such as legal service costs and industry-wide legal spend benchmarks, the challenges of building out big data analytics become more obvious. While law firms and legal departments would benefit from benchmarking data around legal services pricing, for example, they are often reluctant to disclose how much they pay or charge for their services in the first place.
“There is no reason for a private party such as a law firm or a spender on legal services, like the legal department of a big company, to give you access to your data unless you can offer some value on that,” said Ketan Jhaveri, co-founder and president of Bodhala, a big data legal spend analytics platform.
He added that finding initial databases of information to mine “is often a problem with data businesses. They call it the ‘data cold start’ problem.”
For many big data analytics companies, overcoming this problem means having to invest up front to find or elicit enough data to get going. Jhaveri said, “We spent millions of dollars creating our proprietary data set” to launch with. But Bodhala didn’t buy its initial data set—by its own estimation, any for-sale legal data on the market wasn’t up to par anyway.
Instead, the company built search indexing software, which uses proprietary machine learning algorithms to find, collect, and classify publicly available online data on companies’ finances and legal services costs.
“Whether it’s from the press, whether it’s what The American Lawyer reports, whether it shows up in a press release a law firm puts out, or whether it shows up [in] an SEC filing,” the technology will find it and add it to Bodhala’s database, Jhaveri said.
With this market information in hand, Bodhala found it could further grow its database with private data from legal departments and law firms, which would hand over their information in exchange for free access to Bodhala’s analytics.
To be sure, Bodhala isn’t the only legal spend analytics company in the market. Brightflag, for example, uses AI technology to automatically categorize invoices for legal departments and provide them with invoice and spend analytics. Wolters Kluwer’s LegalVIEW BillAnalyzer also analyzes corporate legal invoices to find overcharges or invoices that deviate from set departmental standards.
Both Brightflag’s and Wolters Kluwer’s tools, however, offer data analytics on a micro, client-by-client level. But they may be looking to expand to big data analytics down the road.
Brightflag CEO Ian Nolan, for example, told Legaltech News that while they don’t currently offer market-wide legal spend metrics, “certainly the potential is to do that in the future.”
There are a few ways Brightflag could do this. The company could get permission to collect its clients’ invoice and spend data and amass it into a big data analytics platform. Or it could go the way of Bodhala and create its own proprietary database by pulling information from public websites.
But that strategy may become harder in the future. After all, online publications and websites could start to push back on how others can index and collect the data they publish or host. And for some analytics companies, that is already happening.
Social media site LinkedIn, for example, recently sought legal action to stop startup hiQ, which creates analytics tools for employers, from mining data from LinkedIn profiles. The action, which is ongoing, may have far-reaching consequences for how internet and data analytics companies operate in the future.
“One of [the complex points] the courts will have to address for people who are dependent on LinkedIn data, is that LinkedIn does allow itself to get indexed from Google because that is a driver of its traffic,” Jhaveri said. “So are there limits of taking advantage of being indexed by Google and search engines versus keeping your information away from other startups? What’s the balance, what are the rules? That is something to be determined.”
Yet for the most part, Jhaveri is not too worried about the future of big data analytics. “If you’re a company that is dependent on building proprietary data sets that are dependent on other companies’ proprietary data sets, there is going to be a challenge. But if you’re building your datasets off of primary sources, I don’t think that raises a ton of issues.”
To read the article on Law.com, click here.